Ask HN: How do you find and format public data for your apps?

1 point by AdobiWanKenobi 12 hours ago

I'm working on a project that sources a lot of publicly accessible but unprocessed, unformatted and unconsolidated data. Currently I'm just running Brave API + Crawlee (Playwright) + AWS Bedrock in Python for real-time processing. This works fine but isn't exactly scalable.
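For the search step, a request to Brave can be built with the standard library alone. This is a sketch, assuming "Brave API" means the Brave Search API, that its web endpoint is `https://api.search.brave.com/res/v1/web/search`, that the key goes in an `X-Subscription-Token` header, and that results come back under `web.results[].url`. Check Brave's current docs before relying on any of that. `build_brave_request` and `result_urls` are hypothetical helper names, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

# Assumption: Brave Search API web endpoint; verify against Brave's current API reference.
BRAVE_WEB_SEARCH_URL = "https://api.search.brave.com/res/v1/web/search"


def build_brave_request(query: str, api_key: str, count: int = 10) -> urllib.request.Request:
    """Build (but do not send) a GET request for a Brave web search."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{BRAVE_WEB_SEARCH_URL}?{params}",
        headers={
            "Accept": "application/json",
            # Assumption: the API key is passed in this header.
            "X-Subscription-Token": api_key,
        },
        method="GET",
    )


def result_urls(response_json: dict) -> list[str]:
    """Pull result URLs out of a parsed response (assumed shape: web.results[].url)."""
    results = response_json.get("web", {}).get("results", [])
    return [r["url"] for r in results if "url" in r]
```

Feeding `json.load(urllib.request.urlopen(req, timeout=10))` into `result_urls` would give a plain list of URLs to hand to the Crawlee/Playwright stage, which keeps the search provider swappable.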

I have also tried doing some web scraping myself, building my own data processing pipeline from scratch, and honestly this is a bit of a pain.
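The core of a from-scratch pipeline tends to be the same few stages: extract text from raw HTML, normalize it, and consolidate duplicates. Below is a minimal standard-library sketch of those stages, assuming pages have already been fetched into a `{url: html}` dict. The names (`extract_text`, `consolidate`) are hypothetical, and real pages usually need a proper content extractor rather than "all visible text".

```python
import hashlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so layout differences don't matter.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def consolidate(pages: dict[str, str]) -> list[dict]:
    """Turn {url: raw_html} into deduplicated records keyed by a content hash."""
    seen: set[str] = set()
    records = []
    for url, html in pages.items():
        text = extract_text(html)
        if not text:
            continue  # nothing usable on this page
        # Case-insensitive hash so trivially different copies collapse together.
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already captured from another URL
        seen.add(digest)
        records.append({"url": url, "text": text, "sha256": digest})
    return records
```

Hashing normalized text is a cheap exact-duplicate check; near-duplicates (same article with different boilerplate) would need something fuzzier.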

So I'm asking HN: how do you find and process information online, especially when it's not something you can plan ahead for?

I know there's Tavily Search as well as the new Perplexity API, but they don't quite cover what I need.

sargstuff 10 hours ago

Finding (note: check where the data comes from and its terms of use!): run a general search-engine search for "publicly available data for <topic/area of interest, e.g. computer science>".
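On the "where data from & terms of use" point, one machine-checkable signal is a site's robots.txt. It is not a substitute for reading the actual terms or data license, but it is cheap to check before crawling. A minimal sketch using Python's standard `urllib.robotparser`, assuming the robots.txt body has already been downloaded; `allowed_urls` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> dict[str, bool]:
    """Check candidate URLs against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}
```

Passing the text in rather than calling `RobotFileParser.read()` keeps the check testable and lets the crawler reuse whatever HTTP client it already has.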

> ... building my own data processing pipeline from scratch and honestly this is a bit of a pain.

Yup. The only things that have really changed over the years are more ways to access data, more volume of data, and faster ways to process it.