26 comments

  • dboreham a minute ago

    In case anyone else is confused as to what "browser automations" means: this is about making a program that drives a target web site (typically owned by someone else), in the manner of Selenium or the like, inserting key press events and mouse move/click events to make that target web site do something. Once you know that, the rest of the description makes sense.

  • glorpsicle an hour ago

    Congrats on the launch! I've been keeping up with you folks since you last posted (a few months ago, I believe). How does Anthropic's recent announcement of Claude's "computer use" abilities grab you? What key differentiators does Skyvern have, at this point in time ("computer use" with Claude being relatively new)?

    • suchintan 21 minutes ago

      Great question -- I was waiting for someone to ask this!

      Their product and launch are super cool. It's incredible how much it's able to do by just relying on tool use + micro agents + screenshots + coordinates to interact with websites.

      There are a couple of thoughts here:

      (1) Will their competitors wait around and not build something similar? Will xAI / Gemini / OpenAI / Mistral / MetaAI teams wait around? Probably not. This is likely a huge part of the future, and one company will not "take it all"

      (2) How is value actually derived from these systems? Is a demo + cool usable product enough? Likely not. Most people actually want their workflow automated. For personal use cases this might be enough, but enterprises likely want something more complex.

      (3) Will this be optimized for Claude only? What if you want to run this with your own open source LLMs? Or you want to point this at the best model on the market all the time? Will you get that flexibility through a solution provided by a big player? Likely not -- Anthropic has incentive to get you to use Claude under the hood

      The last point is the one that gives me hope. Our open source users are able to pick their favourite model to run on. You're not locked into Claude. You can run it on Gemini, GPT-4o, or open source models such as Llama 3.2.
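
      As a rough illustration of the "pick your own model" point, here is a minimal sketch of a model registry. Every name in it (`register`, `get_model`, `plan_next_action`, the stand-in lambdas) is hypothetical and not Skyvern's actual interface; real entries would wrap each provider's API client.

```python
from typing import Callable, Dict

# A "model" here is just a function from prompt text to response text.
# Everything below is hypothetical, not Skyvern's actual interface.
LLMCall = Callable[[str], str]

_REGISTRY: Dict[str, LLMCall] = {}


def register(name: str, call: LLMCall) -> None:
    """Make a model available under a user-facing name."""
    _REGISTRY[name] = call


def get_model(name: str) -> LLMCall:
    """Look up a model by name, failing loudly on unknown names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown model {name!r}; known: {known}") from None


# Stand-ins for real provider clients, so the sketch runs offline.
register("claude", lambda prompt: f"[claude] {prompt}")
register("llama-3.2", lambda prompt: f"[llama-3.2] {prompt}")


def plan_next_action(model_name: str, page_summary: str) -> str:
    """The agent logic never names a provider; it only sees an LLMCall."""
    llm = get_model(model_name)
    return llm(f"Given this page, pick the next action: {page_summary}")
```

      Swapping models is then a one-string change, e.g. `plan_next_action("llama-3.2", "login form")`.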

  • mmaunder 2 hours ago

    Congrats!!! And super cool that you've open sourced it under the AGPL. Sorry if this is answered in the docs but I did a brief search on the source and noticed you're not using LangChain but do plan to integrate it so it can be offered to that community. I'm curious if you wouldn't mind talking about what you did use to create the chain of thought/actions logic in Skyvern and if you had to start work today if you'd consider going the LangChain/Graph route? Thanks.

    • suchintan an hour ago

      We actually started off using the AutoGPT framework. There are a ton of remnants of that (tasks, steps) but we found the framework extremely limiting as we wanted to expand and do more complex things

      For example, we're currently using a multi-agent architecture where micro agents run to analyze SVGs and fill out dynamic autocompletes. This would have been really hard to do within that framework.

      Frameworks like LangChain are good for early prototyping, but they're too restrictive when you want to push the limits.

  • drewsonian 41 minutes ago

    This is great, and I can think of several business uses and some personal.

    Like this: Could I use this to pull screenshots or PDFs of my grocery receipts from a major grocery chain?

    • suchintan 19 minutes ago

      Yes! We're helping a few companies with this right now. This use-case actually surprised me.

      I never realized how important it is to track invoices in Europe (where VAT needs to be closely tracked), and a large % of vendors require you to log into their portal to fetch them

  • msp26 an hour ago

    Awesome, I've been working on a similar thing at a smaller scale and I think this area is very promising.

    I've limited my problem scope to single page interactions / scraping which has been very reliable and useful for my company. But agentic automation does sound fun.

    • suchintan an hour ago

      Yeah! We've seen this be especially useful if you want to work in highly dynamic situations

      Ex: filling out contact forms on hundreds of websites? It's really tough for normal code to be able to handle that cardinality. No problem for an AI agent

      • msp26 an hour ago

        Just out of curiosity, what sort of challenges did you run into when scaling this up?

        I don't see a need for my current solution to go past a handful of browser instances but I'd imagine it might get crazy.

        • suchintan an hour ago

          I made a LinkedIn post about it yesterday, but the funniest has been our customer DoSing our service by accident (sending 10K tasks per hour for 24h straight)

          Toughest was Skyvern accidentally talking to a support agent when the website said "your request failed, please contact support"

          https://www.linkedin.com/posts/suchintansingh_we-received-20...

  • infocollector 28 minutes ago

    Quick question: What does DataDog's ddtrace do in the opensource version?

    • suchintan 20 minutes ago

      Nothing -- we use DataDog for our cloud telemetry and haven't built a great way to separate dependencies between cloud and open source
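
      One common way to keep a telemetry library out of the required dependencies is to import it only when it is installed and fall back to a no-op otherwise. The sketch below assumes `ddtrace` exposes `tracer.trace(name)` as a context manager (its documented usage); `get_span_factory` and `run_task` are hypothetical names, and this is not how Skyvern is actually structured.

```python
import contextlib
import importlib.util


def get_span_factory():
    """Return a callable that opens a tracing span, or a no-op stand-in."""
    if importlib.util.find_spec("ddtrace") is not None:
        # Imported lazily so the package stays optional.
        from ddtrace import tracer
        return tracer.trace
    # ddtrace is not installed: hand back a context manager that does nothing.
    return lambda name: contextlib.nullcontext()


span = get_span_factory()


def run_task(url: str) -> str:
    """Hypothetical task body; traced only when ddtrace is present."""
    with span("task.run"):
        return f"visited {url}"
```

      With this shape, the open source install can omit the package entirely while the cloud deployment installs it and gets spans.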

  • modo_ 2 hours ago

    Congrats on the launch! This is really cool - one of the applications of LLM I find most compelling. I've seen so many back office processes that have hundreds of steps, are incredibly error prone, and traditionally couldn't be automated due to API limitations. Solutions like Skyvern are going to supercharge businesses that have had historically low margins due to the number of humans required. (Not as a replacement for a human, but as a force multiplier)

    • suchintan an hour ago

      The most fascinating part is how tough that work really is. Everyone we've talked to loathes the manual stuff, but until a better solution comes out, you have to allocate X% of your time to it

  • DennisSFO an hour ago

    Congrats on the launch. I'm curious if you've had any experience running Skyvern on airline websites (for example, to extract award availability for miles tickets from point A to B)? It seems like airlines always change things around and have robust anti-scraping measures.

    • suchintan an hour ago

      Great question. We haven't helped anyone with that exact use case yet, but we're in the middle of integrating with a company to help them automate purchasing flights with Alaska and Southwest (on behalf of real people)

      It's going to be our way of beta testing CC transactions and checking them for reliability

  • BrandiATMuhkuh 2 hours ago

    Congratulations on the launch. This is really cool. I was recently tinkering with the same idea, but based on a browser extension.

    There are many back office tasks where people copy data from page 1 into a form on page 2.

    • suchintan 2 hours ago

      Yeah we've been surprised by how many interesting things companies do in the background to keep them running

      The craziest one we heard about was this government portal in India that was hard to automate because halfway through the portal you had to refresh the page a bunch of times to get a button to show up

      • selimthegrim 2 hours ago

        The railway ticket site?

        • suchintan an hour ago

          It was a state level permit website I think. Very interesting!
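
          The "refresh until the button shows up" behaviour described above can be sketched as a bounded retry loop. `refresh_until` and `FakePage` are hypothetical names used only for illustration; a real script would pass in calls from its browser driver and would usually sleep between reloads.

```python
from typing import Callable


def refresh_until(is_ready: Callable[[], bool],
                  refresh: Callable[[], None],
                  max_refreshes: int = 10) -> int:
    """Refresh until is_ready() is true; return how many refreshes it took."""
    for used in range(max_refreshes + 1):
        if is_ready():
            return used
        if used < max_refreshes:
            refresh()
    raise TimeoutError(f"not ready after {max_refreshes} refreshes")


class FakePage:
    """Stand-in for a browser page whose button appears after 3 reloads."""

    def __init__(self) -> None:
        self.reloads = 0

    def reload(self) -> None:
        self.reloads += 1

    def button_visible(self) -> bool:
        return self.reloads >= 3


page = FakePage()
used = refresh_until(page.button_visible, page.reload)
```

          The cap matters: without `max_refreshes`, a portal that never shows the button would hang the automation forever.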

  • delusional an hour ago

    The plaintext version of your signup email replaces the ampersand in the URL with an &amp; XML entity. You probably don't want that.

    • suchintan an hour ago

      Interesting. We will fix it
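
      A minimal sketch of how this kind of bug typically arises, assuming the email's HTML and plaintext parts are built from the same escaped value (an assumption; the thread does not say how the email is generated). The URL is a made-up example.

```python
import html

# Hypothetical signup link with two query parameters.
signup_url = "https://example.com/signup?token=abc&source=email"

# Correct for the HTML part: "&" must be escaped inside markup.
html_part = f'<a href="{html.escape(signup_url)}">Confirm</a>'

# The bug: reusing the HTML-escaped value in the plaintext part,
# where mail clients show the entity literally.
buggy_text_part = f"Confirm your account: {html.escape(signup_url)}"

# The fix: the plaintext part gets the raw, unescaped URL.
text_part = f"Confirm your account: {signup_url}"
```

      Escaping belongs to the output format, so it is safest to apply it at the point where each part is rendered rather than once up front.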

  • Cheesman123 2 hours ago

    Congrats on the launch - love the tool

  • ganeshkrishnan an hour ago

    Awesome work. I've had the GitHub repo starred since the day I saw it on Show HN but never got around to using it.

    I want to use this to automate approving/declining group members for our Facebook group, which is approaching half a million members, and FB admin tools are pretty lacking