Qwik Hackernews

15 points | by polaritymaking 2 hours ago

7 comments

I've done a lot of work in the scraping space for a side project I've worked on for years. At this point I've built my own tools, but over the years I've tested just about every paid scraping service that currently exists.

I think it would be really valuable to add some sort of tool that lets me enter a URL and see a subset of the data returned, a latency measurement, and a flag indicating whether Enhanced Bot Detection was required to get that data.

I can't count the number of times I've signed up for a service to see if they could get data from a URL that's giving me problems, only to find that they couldn't scrape it either. It would be really helpful to know before I sign up what I need to pay for and that your tool can actually get the data I want. While it may result in fewer sign-ups for your site, I think it will result in the customers who do sign up being higher-value customers, and it could potentially reduce your support burden.

jeromechoo 38 minutes ago

You’re probably also looking for that tool to be available unauthed. And yeah agreed. We do this at Diffbot and the test drive is the 2nd highest visited page.

djoldman an hour ago

> Runo extracts by meaning, not DOM position. Site redesigns and HTML changes won't ever break your pipeline.

Bold claim.

drewrbaker an hour ago

Do a lot of scraping at work. Curious how you handle simulating UI interactions? Or if I can supply cookies in my requests?

eddy-sekorti an hour ago

Good. How much can be done with 500 requests/month? I want to try it for something.

rvz an hour ago

How does this work against sites that use Google's next-gen reCAPTCHA that uses hardware attestations?

nlitened an hour ago

Now we’re the forum for captcha-evading scrapers, boys