At this point I'm fully down the path of the agent just maintaining his own tools. I have a browser skill that continues to evolve as I use it. Beats every alternative I have tried so far.
If you look at Elixir keynote for Phoenix.new -- a cool agentic coding tool -- you'll see some hints about a browser control using a API tool call. It's called "web" in the video.
Cool to see lots of people independently come to "CLIs are all you need". I'm still not sure if it's a short-term bandaid because agents are so good at terminal use or if it's part of a longer term trend but it's definitely felt much more seamless to me then MCPs.
The main difference is likely the targeting philosophy. webctl relies heavily on ARIA roles/semantics (e.g. role=button name="Save") rather than injected IDs or CSS selectors. I find this makes the automation much more robust to UI changes.
Also, I went with Python for V1 simply for iteration speed and ecosystem integration. I'd love to rewrite in Rust eventually, but Python was the most efficient way to get a stable tool working for my specific use case.
I don't have an objective benchmark yet. I tried several existing solutions, especially the MCP servers for browser automation, and none of them were able to reproducibly solve my specific task.
An objective benchmark is a great idea, especially to compare webctl against other similar CLI-based tools. I'll definitely look into how to set that up.
A background daemon holds the session state between different CLI calls. This daemon is started automatically on the first webctl call and auto-closes after a timeout period of inactivity to save resources.
At this point I'm fully down the path of the agent just maintaining his own tools. I have a browser skill that continues to evolve as I use it. Beats every alternative I have tried so far.
whats the name of the skill?
If you look at Elixir keynote for Phoenix.new -- a cool agentic coding tool -- you'll see some hints about a browser control using a API tool call. It's called "web" in the video.
Video: https://youtu.be/ojL_VHc4gLk?t=2132
More discussion: https://simonwillison.net/2025/Jun/23/phoenix-new/
Cool to see lots of people independently come to "CLIs are all you need". I'm still not sure if it's a short-term bandaid because agents are so good at terminal use or if it's part of a longer term trend but it's definitely felt much more seamless to me then MCPs.
(my one of many contribution https://github.com/caesarnine/binsmith)
I am also not sure if MCP will eventually be fixed to allow more control over context, or if the CLI approach really is the future for Agentic AI.
Nevertheless, I prefer the CLI for other reasons: it is built for humans and is much easier to debug.
MCP let's you hide secrets from the LLM
you can do same thing with cli via env vars no?
Hey this looks cool. So each agent or session is one thread. Nice. I like it.
A little bit different, but also allows to scrape efficiently. Json http communication rather than cli.
https://github.com/rumca-js/crawler-buddy
More like a framework for other mechanisms
This looks remarkably similar to https://github.com/vercel-labs/agent-browser
How is it different?
To be honest, I hadn't seen that one yet!
The main difference is likely the targeting philosophy. webctl relies heavily on ARIA roles/semantics (e.g. role=button name="Save") rather than injected IDs or CSS selectors. I find this makes the automation much more robust to UI changes.
Also, I went with Python for V1 simply for iteration speed and ecosystem integration. I'd love to rewrite in Rust eventually, but Python was the most efficient way to get a stable tool working for my specific use case.
vibium clicker, too. https://github.com/VibiumDev/vibium/blob/main/CONTRIBUTING.m...
"browser automation for ai agents" is a popular idea these days.
is there a benchmark? there are a lot of scraping agents nowdays..
I don't have an objective benchmark yet. I tried several existing solutions, especially the MCP servers for browser automation, and none of them were able to reproducibly solve my specific task.
An objective benchmark is a great idea, especially to compare webctl against other similar CLI-based tools. I'll definitely look into how to set that up.
How are you holding session if every command is issues through cli? I assume this is essential for automation.
A background daemon holds the session state between different CLI calls. This daemon is started automatically on the first webctl call and auto-closes after a timeout period of inactivity to save resources.
I see, nice. Is there a way to run multiple sessions?