12 comments

  • sippeangelo 3 hours ago

    With all respect to Mozilla, "respects robots.txt" makes this effectively DoA. AI agents are a form of user agent like any other when initiated by a human, no matter the personal opinion of the content publisher (unlike the egregious automated /scraping/ done for model training).

    • MrTravisB 2 hours ago

      This is a valid perspective. Since this is an emerging space, we are still figuring out how to show up in a healthy way for the open web.

      We recognize that the balance between content owners and the users or developers accessing that content is delicate. Because of that, our initial stance is to default to respecting websites as much as possible.

      That said, to be clear on our implementation: we currently only respond to explicit blocks directed at the Tabstack user agent. You can read more about how this works here: https://docs.tabstack.ai/trust/controlling-access
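
For illustration, the "explicit blocks only" behavior described above could be sketched roughly as follows. This is a simplified assumption, not Tabstack's actual implementation: the `Tabstack` token, the `explicitly_blocked` helper, and the plain prefix matching (no `Allow` precedence, no wildcards) are all hypothetical, and the linked docs remain the authority on the real rules.

```python
def explicitly_blocked(robots_txt: str, agent_token: str, path: str) -> bool:
    """Return True only if a group that names agent_token disallows path.

    Groups addressed to "*" are deliberately ignored, mirroring the
    "explicit blocks only" stance. Simplified: prefix matching only.
    """
    token = agent_token.lower()
    group_agents = []   # user agents named by the group being read
    in_rules = False    # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents = []
                in_rules = False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if (field == "disallow" and value
                    and token in group_agents
                    and path.startswith(value)):
                return True
    return False


# Hypothetical robots.txt: a blanket block plus a targeted one.
ROBOTS = """\
User-agent: *
Disallow: /

User-agent: Tabstack
Disallow: /private/
"""
```

Under these assumptions, `explicitly_blocked(ROBOTS, "Tabstack", "/private/page")` is true, while `/public` is not blocked because the blanket `*` group is not addressed to the agent by name.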

    • observationist 2 hours ago

      Exactly. robots.txt with regard to AI is not a standard and should be treated like the performative, politicized, ideologically incoherent virtue signalling that it is.

      There are technical improvements to web standards that can and should be made that don't favor adtech and exploitative commercial interests over the functionality, freedom, and technically sound operation of the internet.

    • mossTechnician 2 hours ago

      I agree with you in spirit, but I find it hard to explain that distinction. What's the difference between mass web scraping and an automated tool using this agent? The biggest differences I assume would be scope and intent... But because this API is open for general development, it's difficult to judge the intent and scope of how it could be used.

      • jakelazaroff an hour ago

        What's difficult to explain? If you're having an agent crawl a handful of pages to answer a targeted query, that's clearly not mass scraping. If you're pulling down entire websites and storing their contents, that's clearly not normal use. Sure, there's a gray area, but I bet almost everyone who doesn't work for an AI company would be able to agree whether any given activity was "mass scraping" or "normal use".

        • 1shooner 41 minutes ago

          What is worse: 10,000 agents running daily targeted queries on your site, or 1 query pulling 10,000 records to cache and post-process your content without unnecessarily burdening your service?

          • jakelazaroff 6 minutes ago

            I get that you want me to say the first one is worse, but it's impossible to say with so few details. Like: worse for whom? In what way? To what extent?

            If (for instance) my content changes often and I always want people to see an up-to-date version, the second option is clearly worse for me!

    • ugh123 3 hours ago

      100%

  • shevy-java an hour ago

    Mozilla giving up on Firefox every day ...

  • Diti 3 hours ago

    Pricing page is hidden behind a registration form. Why?

    I also wanted to see how/if it handled semantic data (schema.org and Wikidata ontologies), but the hidden pricing threw me off.

    • MrTravisB 2 hours ago

      Thanks for the feedback. We are definitely not trying to hide it. We actually do have pricing listed in the API section regarding the different operations, but we could definitely work on making this clearer and easier to parse.

      We are simply in an early stage and still finalizing our long-term subscription tiers. Currently, we use a simple credit model which is $1 per 10,000 credits. However, every account receives 50,000 credits for free every month ($5 value). We will have a dedicated public pricing page up as soon as our monthly plans are finalized.
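
As a rough worked example of the credit model quoted above ($1 per 10,000 credits, 50,000 free credits per month), the arithmetic might look like this. The function name is hypothetical, and the sketch assumes free credits are consumed before paid ones and do not roll over; neither detail is stated in the thread.

```python
CREDITS_PER_DOLLAR = 10_000      # from the comment: $1 per 10,000 credits
FREE_CREDITS_PER_MONTH = 50_000  # from the comment: 50,000 free credits monthly


def monthly_cost_usd(credits_used: int) -> float:
    """Estimated monthly bill in dollars.

    Assumption: the free allowance is applied first and unused free
    credits are not carried over to the next month.
    """
    billable = max(0, credits_used - FREE_CREDITS_PER_MONTH)
    return billable / CREDITS_PER_DOLLAR
```

With these numbers, the free allowance is worth 50,000 / 10,000 = $5, matching the figure in the comment, and a month using 120,000 credits would bill for the remaining 70,000, i.e. $7.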

      Regarding semantic data, our JSON extraction endpoint is designed to extract any data on the page. That said, we would love to know your specific use cases for those ontologies to see if we can further improve our support for them.

  • srameshc 2 hours ago

    This looks good, but it would be helpful if the pay-as-you-go pricing included more information about what your actual charges are per unit, or per whatever metric you use. I signed up but still cannot find the actual pricing.