Anthropic would be better off letting the community do this. Their harness sucks. Great scientists, but not the best app developers. I suspect they just don't want to relinquish control of anything because they think the world can't be trusted with AI; we can only be trusted to pay them.
Custom agents using the low level completion APIs tend to outperform these generic tools, especially when you are working with complex problems.
It's hard to beat domain specific code. I can avoid massive prompts and token bloat if my execution environment, tools and error feedback provide effectively the same constraints.
If I had to pick only one tool for a generic agent to use, it would definitely be ExecuteSqlQuery (or a superset like ExecuteShell). If you gave me an agent framework and this is all it could do, I'd probably be ok for quite a while. SQL can absorb the domain specific concerns quite well. Consider that tool definitions also consume tokens.
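The single-tool idea above can be sketched concretely. This is a minimal, hypothetical illustration (the tool name, JSON-schema layout, and result shape are assumptions, not any vendor's actual API), using sqlite as the backend: one SQL tool whose error feedback goes straight back to the model as text, so the environment itself supplies the constraints instead of a massive prompt.

```python
import sqlite3

# Hypothetical tool definition in the JSON-schema style common to
# completion APIs. The name and field layout are illustrative only.
EXECUTE_SQL_TOOL = {
    "name": "ExecuteSqlQuery",
    "description": "Run a SQL statement; return rows or an error message.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def execute_sql_query(conn: sqlite3.Connection, query: str) -> dict:
    """Execute a model-issued query. Errors are returned as plain text so
    the model can self-correct, rather than the harness needing retry logic."""
    try:
        cursor = conn.execute(query)
        return {"ok": True, "rows": cursor.fetchall()}
    except sqlite3.Error as exc:
        return {"ok": False, "error": str(exc)}

# Usage sketch: an agent loop would pass EXECUTE_SQL_TOOL to the completion
# API and route every tool call through execute_sql_query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
print(execute_sql_query(conn, "SELECT name FROM users"))
print(execute_sql_query(conn, "SELECT * FROM missing")["ok"])  # False
```

The design point is that the single tool plus its error channel carries the domain constraints, and only one tool definition spends tokens in the context.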
I've been greatly enjoying JetBrains' Air IDE for some tasks. It uses Claude behind the scenes.
Could you go into more detail about why their harness "sucks"? This gets treated as a shared conclusion, but I've used several, and theirs is better than many.
I generally agree that the harness isn't good, but it works and gets the job done and that seems to be the singular goal of the top 4 or 5 companies building them.
We saw what Claude Code looks like inside, and it's objectively bad-to-mediocre work, but the takeaway seemed to be 'yeah but it works and they've got crazy revenue'.
That's where we're at. The harness is kind of buggy. The LLM still wanders and cycles in it sometimes. It's a monolithic LLM herding machine. The underlying model is awesome and the harness works well enough to make it super effective.
We can do so much better but we could also do worse. It's a turbulent time. I'm not super pleased with it all the time, but it's hard to criticize in many ways. They're doing a good job under the circumstances.
I see it kind of like they're at war. If they slow down to perfect anything, they will begin to lose battles, and they will lose ground. It's a fiercely contested space. The harness isn't as good as it could be under better circumstances, but it's arguably a necessary trade-off Anthropic needs to make.
> We saw what Claude Code looks like inside, and it's objectively bad-to-mediocre work
Based on this, are there any open source harnesses that have objectively good-to-excellent work in their code?
pi.dev: very minimal, extensible.
I was using OpenCode until yesterday (with a plugin that let me use their model, until they implemented what seems to be very sophisticated detection to reject you).
It just has a sane workflow: it's easy to use and doesn't pester you with a thousand questions about whether to allow this or that to run. Since yesterday, now that I have to use Claude Code, the model generally feels dumber and makes more mistakes.
> We saw what Claude Code looks like inside, and it's objectively bad-to-mediocre
Do you have an example to contrast with? By what measure is it good, besides your word?
Anthropic made the most popular harness for developers.
Anthropic made the most popular desktop tool for AI automation.
Not sure popularity necessarily suggests it's good; possibly it's just what people have heard of most or what's easiest to set up. This will be even more true now that Claude subscriptions are essentially vendor-locked.
Yes. Gemini's web interface was atrocious even when the model was the best on the frontier.
And Codex still uses phrases and syntax in prose ostensibly written for the user, as though they forgot people actively read this stuff.
Product is unquestionably where Anthropic excels. It is what carried it through periods where its thinking model lagged.
It makes sense that Anthropic is cranking out these products, trying to find and maintain a foothold in the market.
But part of me just wishes they would go back to developing and refining an excellent and user-friendly harness.
I can't imagine what long-term support looks like for the dozens of products they release every three months.
Meanwhile, they're shipping an ever more buggy and byzantine Claude Code, with a million switches and tons of ways to use it wrong.
The subscription play really does feel like a bait-and-switch lock-in: "we can focus less on the harness because people with subscriptions have to use it, and focus on growth."
Interested to see if this works out for them.
I bet subscriptions are not their main source of revenue (not by far); big companies are throwing big dollars at them, and offerings like this entrench those corporations further in their products (it's harder to just switch to OpenAI if your entire infra is built on top of their stack).
This move is brilliant for them.
> The subscription play really does feel like a bait and switch lock-in: "we can focus less on the harness because people with subscriptions need to use it, and focus on growth."
Of course it is, and they're not hiding it. Paying $200 a month for the equivalent of maybe $2,000 is no secret. They're at the frontier of the models, and they need to stay there to stay relevant. Otherwise they will fall like the majority of these "AI" companies will when the bubble bursts.
How does releasing "Claude managed agents" keep them at the frontier of the models?