> Finally, we keep this file synced with an AGENTS.md file to maintain compatibility with other AI IDEs that our engineers might be using.
I researched this the other day; the recommended (by Anthropic) way to do this is to have a CLAUDE.md with a single line in it:

    @AGENTS.md

Then keep your actual content in the other file: https://docs.claude.com/en/docs/claude-code/claude-code-on-t...

Yeah, that's probably a slightly cleaner way of doing it.
You think it would be a good idea to use a symlink instead?
I'm still not 100% sure I understand what a symlink in a git repository actually does, especially across different operating systems. Maybe it's fine?
Anthropic say "put @AGENTS.md in your CLAUDE.md file", and my own experiments confirmed that this dumps the content into the system prompt the same way as if you had copied it into CLAUDE.md manually, so I'm happy with that solution - at least until Anthropic give in and support AGENTS.md directly.
I have AGENTS.md symlinked to CLAUDE.md and it works fine in my repos.
But I can’t speak to it working across OS.
Confirm on a new clone that if you modify one file, the other is updated.
I thought git by default treats symlinks simply as file copies on a fresh clone - i.e., git may not be aware of the symlink.
git very much supports symlinks, although depending on the system config it might not create actual symlinks on Windows.
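If you want to check what git actually recorded, here's a minimal sketch (Python, assuming you're inside a repo that already tracks AGENTS.md):

```python
import os
import subprocess

# Create CLAUDE.md as a symlink pointing at AGENTS.md, if it isn't one already.
if not os.path.lexists("CLAUDE.md"):
    os.symlink("AGENTS.md", "CLAUDE.md")

subprocess.run(["git", "add", "CLAUDE.md"], check=True)

# In `git ls-files -s` output, mode 120000 means git stored a symlink
# (the blob holds the target path, not the file contents); a regular
# file would show 100644 instead.
entry = subprocess.run(
    ["git", "ls-files", "-s", "CLAUDE.md"],
    capture_output=True, text=True, check=True,
).stdout
print(entry.strip())
print("symlink on disk:", os.path.islink("CLAUDE.md"))
```

On Windows the checkout side is the variable part: unless `core.symlinks` is enabled (which needs Developer Mode or admin rights), git writes out a plain text file containing the target path instead of a real link.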
I really like this take on MCP: https://blog.sshh.io/i/177742847/mcp-model-context-protocol
> Instead of a bloated API, an MCP should be a simple, secure gateway that provides a few powerful, high-level tools [...] In this model, MCP’s job isn’t to abstract reality for the agent; its job is to manage the auth, networking, and security boundaries and then get out of the way.
Thanks! I def don't think I would have guessed this use case when MCP first came out, but more and more it seems Claude just yearns for scripting on data rather than a bunch of "tools". My/MCP's job has become just getting it that data.
Have you tried using light CLIs rather than MCP? I've found that CLIs are just easier for Claude, especially if you write them with Claude and, during planning, instruct it to think about adding guidance for users who get confused.
Our auth, log diving, infra state, etc., is all usable via CLI, and it feels pretty good when pointing Claude at it.
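For what it's worth, the shape I mean is roughly this - a minimal sketch in Python, where `infra` and all of its subcommands are made-up names, with help and error text written for a reader (human or agent) who's confused:

```python
#!/usr/bin/env python3
"""Tiny ops CLI sketch. `infra` and its subcommands are illustrative names."""
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(
        prog="infra",
        description="Query infra state. Run `infra <command> -h` for details.",
        epilog="Tip: if a query returns nothing, run `infra envs` first - "
               "most mistakes are a wrong environment name.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    logs = sub.add_parser("logs", help="Tail logs for a service")
    logs.add_argument("service", help="Service name, e.g. api or worker")
    logs.add_argument("--env", default="staging", choices=["staging", "prod"])

    sub.add_parser("envs", help="List known environments")

    args = parser.parse_args()
    if args.command == "envs":
        print("staging\nprod")
    elif args.command == "logs":
        print(f"(would tail logs for {args.service} in {args.env})")

if __name__ == "__main__":
    main()
```

The nice property is that `--help` is self-discoverable: the agent can run it, misuse a flag, read the error, and recover on its own, without any tool schema.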
This is how MCP works if you use it as, essentially, an internal tool API gateway (stateless HTTP) instead of a client-facing service that end users connect to directly. It's basically just OpenAPI, but slightly more tuned for LLM inference.
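To make that "thin gateway" shape concrete, here's a minimal sketch using the official MCP Python SDK's FastMCP, where the `run_readonly_sql` tool and the `execute_on_replica` stub are hypothetical - one high-level tool, with the server's real job being the auth/safety boundary:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-gateway")

def execute_on_replica(query: str) -> str:
    # Stub standing in for the real database call behind the boundary.
    return f"(would run against the read replica: {query})"

@mcp.tool()
def run_readonly_sql(query: str) -> str:
    """Run a read-only SQL query against the analytics replica."""
    # The gateway enforces the security boundary; it doesn't try to
    # model every table or endpoint as its own tool.
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed here.")
    return execute_on_replica(query)

if __name__ == "__main__":
    mcp.run()
```

One powerful tool plus scripting on the agent's side goes a long way further than a few dozen narrow, parameterized ones.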
> If you’re not already using a CLI-based agent like Claude Code or Codex CLI, you probably should be.
Are the CLI-based agents better (much better?) than the Cursor app? Why?
I like how easy it is to get Cursor to focus a particular piece of code. I select the text and Cmd-L, saying "fix this part, it's broken like this ____."
I haven't really tried a CLI agent; sending snippets of code by CLI sounds really annoying. "Fix login.ts lines 148-160, it's broken like this ___"
Claude is able to detect the lines of code selected in VS Code anyway.
Yes, and you can select multiple files to give it focus. It can run anything in your PATH too; e.g., it's pretty good at using `gh` and so on.
They all have optional IDE integration; e.g., Claude knows the active VS Code tab and highlighted lines.
Is that better than Cursor? Same? Just different?
> The Takeaway: Skills are the right abstraction. They formalize the “scripting”-based agent model, which is more robust and flexible than the rigid, API-like model that MCP represents.
Just to avoid confusion: MCP is like an API, but the underlying API can execute a Skill. So it's not MCP vs. Skills as a contest. It's just the broad concept of a "flexible" skill vs. a "parameter"-based API. And parameter-based APIs can also be flexible, depending on how we write them, except that they lack the SKILL.md which, in the case of Skills, guides the LLM to be more generic than a pure API.
By the way, if you are a Mac user, you can execute Skills locally via OpenSkills[1], which I created using Apple containers.
1. OpenSkills - https://github.com/BandarLabs/open-skills
I use Claude Code every day and haven't had a chance to dig super deep into skills, but even though I've read a lot of people describe them and say they're the best thing so far, I still don't get them. They're things the agent chooses to call, right? They have different permissions? Is it a tool call with different permissions and more context? I have yet to see a single post give an actual real-world concrete example of how they're supposed to be used, or a compare-and-contrast with other approaches.
Maybe these might be handy:

- https://github.com/anthropics/skills
- https://www.anthropic.com/engineering/equipping-agents-for-t...
I think of it literally as a collection of .md files and scripts that help perform some set of actions. I'm excited about it not really as a "new thing" (as mentioned in the post) but as effectively an endorsement of this pattern of agent-data interaction.
apparently I missed Simon Willison's article; this at least somewhat explains them: https://simonwillison.net/2025/Oct/16/claude-skills/
So if you're building your own agent, this would be a directory of markdown documents with headers that you tell the agent to scan so that it's aware of them, and then if it thinks they could be useful it can choose to read the full instructions into its context? Is it any more than that?
I guess I don't understand how this isn't just RAG with an index you make the agent aware of?
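Mechanically it's close to that, minus the embedding search. A sketch of the agent side in Python (the frontmatter convention - a name and description between `---` fences at the top of each SKILL.md - matches what Anthropic documents; the helper names are mine):

```python
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Tiny frontmatter reader: `key: value` lines between --- fences."""
    meta = {}
    if text.startswith("---"):
        header = text.split("---", 2)[1]
        for line in header.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def skill_index(skills_dir: str = "skills") -> list[str]:
    """One line per skill - this is all that sits in context up front."""
    entries = []
    for skill_md in sorted(Path(skills_dir).glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text())
        entries.append(f"{meta.get('name', skill_md.parent.name)}: "
                       f"{meta.get('description', '(no description)')}")
    return entries

def load_skill(skills_dir: str, name: str) -> str:
    """Read in full only once the model decides the skill looks relevant."""
    return (Path(skills_dir) / name / "SKILL.md").read_text()
```

The difference from classic RAG is that nothing is retrieved by similarity: only the one-line descriptions sit in context, the model itself decides which SKILL.md to open, and a skill directory can ship scripts next to the markdown for the agent to run.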
As fascinating as these tools can be - are we (the industry) once again finding something other than our “customer” to focus our brains on (see Paul Graham’s “Top idea in your mind” essay)?
Just out of curiosity: Why are you producing so much code? Is it because it is now possible to do so with AI, or because you have a genuine need (a solid business use case) that requires a lot of code?
I just started developing self-hosted services largely with AI.
It wasn't possible for me to do any of this at this kind of scale before. Getting stuck on a bug could mean hours, days, or maybe even weeks of debugging, and I never made the kind of progress I wanted.
Many of the things I want do already exist, but they are often older, not as efficient or flexible as they could be, or just plain _look_ dated.
But now I can pump out React/shadcn frontends easily, generate APIs, and get going relatively quickly. It's still not pure magic. I'm still hitting issues and such, but they are not these demotivating, project-ending roadblocks anymore.
I can now move at a speed that matches the ideas I have.
I am giving up something to achieve that by letting AI take so much control, but it's a trade that seems worth it.
Often, code in SaaS companies like ours is indeed how we solve customer problems. It's not so much the amount of code but the rate (code per unit time) we can effectively use to solve problems and build solutions. AI, when tuned correctly, lets us do this faster than was ever possible before.
Blog posts like this would really benefit from specific examples. While I can get some mileage out of these tools for greenfield projects, I'm actually shocked that this has proven useful with projects of any substantial size or complexity. I'm very curious to understand the context where such tools are paying off.
Makes sense. I work for a growth-stage startup, and most of these apply to our internal monorepo, so it's hard to share specifics. We use this for both new and legacy code, each with its own unique AI coding challenges.
If there's enough interest, I might replicate some examples in an open-source project.
Why are so many still using CC and not Codex?
If you have no modifications or customization of Claude Code, then it comes down to a preference for proactivity (Codex) or a bit more restraint.
If you are using literally any of Claude Code's features, the experience isn't close, and regardless of model preference (Claude is my least favorite model by far) you should probably use Claude Code. It's just a much more extensible project that feels like it was made for teams to extend.
CC has better agent tools and is faster. The ability to switch from plan mode to execution mode and back is huge. Toggling thinking, too. And of course they keep innovating on all these agentic features: MCP, sub-agents, skills, etc.
Codex writes higher quality code, but is slower and less feature rich. I imagine this will change within months. The jury is still out. Exciting times!
Ecosystem features and cohesion.
> Generally my goal is to “shoot and forget”—to delegate, set the context, and let it work. Judging the tool by the final PR and not how it gets there.
This feels like a false economy to me for real sized changes, but maybe I’m just a weak code reviewer. For code I really don’t care about, I’m happy to do this, but if I ever need to understand that code I have an uphill battle. OTOH reading intermediate diffs and treating the process like actual pair programming has worked well for me, left me with changes I’m happy with, and codebases I understand well enough to debug.
I treat everything I find in code review as something to integrate into the prompts. Eventually, on a given project, you end up getting correct PRs without manual intervention. That's what they mean. You still have to review your code of course!
I've found planning to be key here for scaling to arbitrarily complex changes.
It's much easier to review larger changes when you've aligned on a Claude-generated plan up front.
Good article
I feel like these posts are interesting, but become irrelevant quickly. Does anyone actually follow these as guides, or just consume them as feedback for how we wish we could interface with LLMs and the workarounds we currently use?
Right now these read like a guide to Prolog in the 1980s.
Given that this space is evolving so rapidly, these kinds of posts are helpful just to make sure you aren't missing anything big. I've caught myself doing something the hard way after reading one of these. In this case, the framing of skills as basically man pages for CLIs was a helpful description that gives me some ideas about how to improve interaction with an in-house CLI my co. uses.
Yeah, I like to think not everyone can spend their day exploring/tinkering with all these features, so it's handy to just snapshot what exists and what works/doesn't.
This one is already out of date. The bit at the top about allocating space in CLAUDE.md for each tool is largely a waste of tokens these days. Use the skills feature.
It's a balance and we use both.
Skills don't totally deprecate documenting things in CLAUDE.md, but I agree that a lot of these can be defined as skills instead.
Skill frontmatter also still sits in the global context, so it's not really a token optimization either.
The skill lets you compress the amount loaded to just the briefest description, with the "where do I go to get more info" being implicit. You should use a SKILL.md for every added tool. At that point, putting instructions in CLAUDE.md becomes redundant and confusing to the LLM.
I wouldn't say I follow them as guides, but I think the field is changing quickly enough that it's good, or at least interesting, to read what's working well for other people.
right on. i usually just tell it "hey go update this function to do [x]" in horribly misspelled english and then yell at it until it does it right
Sometimes I write to Claude in mixed English and German with really bad typos, and it's amazing how well it works.
When touch typing and talking to someone, I accidentally typed something to claude with my fingers off the home row, e.g. ttoubg kuje tgus ubti tge ckayde cide ternubak. Claude understood it just fine. Didn't even remark on it.
Yeah, I also find it lovely to speak terrible Denglisch with Claude.
Enshittification needs its Moore's law.