17 comments

  • the_mitsuhiko an hour ago

    At this point I'm fully down the path of the agent just maintaining its own tools. I have a browser skill that continues to evolve as I use it. Beats every alternative I have tried so far.

    • kinduff an hour ago

      What's the name of the skill?

  • randito 3 hours ago

    If you look at the Elixir keynote for Phoenix.new (a cool agentic coding tool), you'll see some hints about browser control via an API tool call. It's called "web" in the video.

    Video: https://youtu.be/ojL_VHc4gLk?t=2132

    More discussion: https://simonwillison.net/2025/Jun/23/phoenix-new/

  • binalpatel 4 hours ago

    Cool to see lots of people independently come to "CLIs are all you need". I'm still not sure if it's a short-term band-aid because agents are so good at terminal use or if it's part of a longer-term trend, but it's definitely felt much more seamless to me than MCPs.

    (my contribution, one of many: https://github.com/caesarnine/binsmith)

    • cosinusalpha 2 hours ago

      I am also not sure if MCP will eventually be fixed to allow more control over context, or if the CLI approach really is the future for Agentic AI.

      Nevertheless, I prefer the CLI for other reasons: it is built for humans and is much easier to debug.

    • 0x696C6961 2 hours ago

      MCP lets you hide secrets from the LLM.

      • pylotlight 2 hours ago

        You can do the same thing with a CLI via env vars, no?
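
A minimal sketch of the env-var idea, under stated assumptions: the variable name `EXAMPLE_API_TOKEN` and both helper functions are hypothetical and do not come from any tool mentioned in this thread. The CLI reads the secret from its own environment and only ever echoes a masked view back to the agent.

```python
import os


def build_headers(env=os.environ, var="EXAMPLE_API_TOKEN"):
    """Read the token from the process environment (hypothetical variable name)."""
    token = env.get(var)
    if token is None:
        raise RuntimeError(f"{var} is not set")
    return {"Authorization": f"Bearer {token}"}


def describe_for_agent(headers):
    """Masked view that is safe to print into the agent's context."""
    return {key: "***" for key in headers}
```

This only keeps the value out of the prompt and the command output. An agent with unrestricted shell access in the same environment could still print the variable itself.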

    • desireco42 2 hours ago

      Hey this looks cool. So each agent or session is one thread. Nice. I like it.

  • renegat0x0 4 hours ago

    A little bit different, but it also allows efficient scraping. It uses JSON over HTTP rather than a CLI.

    https://github.com/rumca-js/crawler-buddy

    More like a framework for other mechanisms

  • philipbjorge 5 hours ago

    This looks remarkably similar to https://github.com/vercel-labs/agent-browser

    How is it different?

    • cosinusalpha 2 hours ago

      To be honest, I hadn't seen that one yet!

      The main difference is likely the targeting philosophy. webctl relies heavily on ARIA roles/semantics (e.g. role=button name="Save") rather than injected IDs or CSS selectors. I find this makes the automation much more robust to UI changes.

      Also, I went with Python for V1 simply for iteration speed and ecosystem integration. I'd love to rewrite in Rust eventually, but Python was the most efficient way to get a stable tool working for my specific use case.
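
To illustrate the role/name targeting idea, here is a rough standard-library sketch. It is not webctl's implementation: the `RoleIndex` class, `find_by_role` function, and the tiny implicit-role table are all assumptions made for illustration, and real accessible-name computation is far more involved.

```python
from html.parser import HTMLParser

# Tiny, illustrative subset of implicit roles; the real ARIA mapping is much larger.
IMPLICIT_ROLES = {"button": "button", "a": "link", "textarea": "textbox"}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class RoleIndex(HTMLParser):
    """Collects elements that have an explicit or implicit role."""

    def __init__(self):
        super().__init__()
        self.found = []   # finished elements with a role
        self._open = []   # elements whose end tag has not been seen yet

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements carry no text; skipped in this sketch
        a = dict(attrs)
        self._open.append({
            "tag": tag,
            "role": a.get("role") or IMPLICIT_ROLES.get(tag),
            "label": a.get("aria-label"),
            "text": [],
        })

    def handle_data(self, data):
        # Text counts toward every element that is currently open.
        for el in self._open:
            el["text"].append(data)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag name.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i]["tag"] == tag:
                el = self._open.pop(i)
                break
        else:
            return
        if el["role"]:
            # Simplified name: aria-label wins, else whitespace-normalized text.
            name = el["label"] or " ".join("".join(el["text"]).split())
            self.found.append({"tag": el["tag"], "role": el["role"], "name": name})


def find_by_role(html, role, name):
    """Return every element matching the given role and accessible name."""
    index = RoleIndex()
    index.feed(html)
    index.close()
    return [el for el in index.found if el["role"] == role and el["name"] == name]
```

The lookup never touches `id` or `class`, so renaming generated CSS classes does not break it. Playwright for Python exposes the same idea as `page.get_by_role("button", name="Save")`; whether webctl builds on that is not stated in this thread.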

    • hugs 4 hours ago

      vibium clicker, too. https://github.com/VibiumDev/vibium/blob/main/CONTRIBUTING.m...

      "browser automation for ai agents" is a popular idea these days.

  • grigio 3 hours ago

    Is there a benchmark? There are a lot of scraping agents nowadays.

    • cosinusalpha 2 hours ago

      I don't have an objective benchmark yet. I tried several existing solutions, especially the MCP servers for browser automation, and none of them were able to reproducibly solve my specific task.

      An objective benchmark is a great idea, especially to compare webctl against other similar CLI-based tools. I'll definitely look into how to set that up.

  • desireco42 2 hours ago

    How are you holding the session if every command is issued through the CLI? I assume this is essential for automation.

    • cosinusalpha 2 hours ago

      A background daemon holds the session state between different CLI calls. This daemon is started automatically on the first webctl call and auto-closes after a timeout period of inactivity to save resources.
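
The daemon pattern described above can be sketched roughly as follows. This is an assumption-laden illustration, not webctl's code: the Unix-socket transport, the JSON line protocol, the `serve` and `call` names, and the in-process thread (a real tool would spawn a detached process) are all hypothetical. It requires a platform with `AF_UNIX` sockets.

```python
import json
import os
import socket
import threading


def serve(sock_path, idle_timeout, ready):
    """Keep state in memory across client calls; exit after idle_timeout seconds of silence."""
    state = {"url": None, "calls": 0}  # stand-in for a live browser session
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)
        srv.listen()
        srv.settimeout(idle_timeout)  # accept() raises socket.timeout when idle
        ready.set()
        try:
            while True:
                try:
                    conn, _ = srv.accept()
                except socket.timeout:
                    break  # nobody called in time: shut down to free resources
                with conn, conn.makefile("rw") as f:
                    req = json.loads(f.readline())
                    state["calls"] += 1
                    if req.get("cmd") == "navigate":
                        state["url"] = req["url"]
                    f.write(json.dumps(state) + "\n")
                    f.flush()
        finally:
            os.unlink(sock_path)


def _send(sock_path, request):
    """Send one JSON request line and read one JSON reply line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as c:
        c.connect(sock_path)
        with c.makefile("rw") as f:
            f.write(json.dumps(request) + "\n")
            f.flush()
            return json.loads(f.readline())


def call(sock_path, request, idle_timeout=2.0):
    """One CLI invocation: reuse the daemon if it is running, else start it first."""
    try:
        return _send(sock_path, request)
    except (FileNotFoundError, ConnectionRefusedError):
        ready = threading.Event()
        threading.Thread(
            target=serve, args=(sock_path, idle_timeout, ready), daemon=True
        ).start()
        ready.wait()
        return _send(sock_path, request)
```

Each `call` is a separate short-lived connection, yet the second call still sees the URL set by the first because the state lives in the long-running `serve` loop.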

      • desireco42 an hour ago

        I see, nice. Is there a way to run multiple sessions?