Writing a good Claude.md

(humanlayer.dev)

325 points | by objcts 8 hours ago

109 comments

  • nico 2 hours ago

    > Claude often ignores CLAUDE.md

    > The more information you have in the file that's not universally applicable to the tasks you have it working on, the more likely it is that Claude will ignore your instructions in the file

    Claude.md files can get pretty long, and many times Claude Code just stops following a lot of the directions specified in the file

    A friend of mine tells Claude to always address him as “Mr Tinkleberry”; he says he can tell Claude is not paying attention to the instructions in Claude.md when Claude stops calling him “Mr Tinkleberry” consistently

    • stingraycharles 37 minutes ago

      That’s hilarious and a great way to test this.

      What I’m surprised about is that OP didn’t mention having multiple CLAUDE.md files, one per directory, specifically describing the current context / files in there. E.g. if you have some database layer and want to document some critical things about it, put that in “src/persistence/CLAUDE.md” instead of the main one.

      Claude pulls in those files automatically whenever it tries to read a file in that directory.

      I find that to be a very effective technique to leverage CLAUDE.md files and be able to put a lot of content in them, but still keep them focused and avoid context bloat.
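
      For illustration, a hypothetical src/persistence/CLAUDE.md might be nothing more than a few directory-local notes (contents invented):

        # Persistence layer

        - All DB access goes through the repository classes in this directory
        - Schema changes happen via migration files only; never edit an old migration
        - Integration tests for this layer expect a local database to be running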

  • vunderba 5 hours ago

    From the article:

    > We recommend keeping task-specific instructions in separate markdown files with self-descriptive names somewhere in your project. Then, in your CLAUDE.md file, you can include a list of these files with a brief description of each, and instruct Claude to decide which (if any) are relevant and to read them before it starts working.

    I've been doing this since the early days of agentic coding though I've always personally referred to it as the Table-of-Contents approach to keep the context window relatively streamlined. Here's a snippet of my CLAUDE.md file that demonstrates this approach:

      # Documentation References
    
      - When adding CSS, refer to: docs/ADDING_CSS.md
      - When adding assets, refer to: docs/ADDING_ASSETS.md
      - When working with user data, refer to: docs/STORAGE_MANAGER.md
    
    
    Full CLAUDE.md file for reference:

    https://gist.github.com/scpedicini/179626cfb022452bb39eff10b...

    • dimitri-vs 9 minutes ago

      Correct me if I'm wrong, but I think the new "skills" are exactly this, but better.

    • sothatsit an hour ago

      I have also done this, but my results are very hit or miss. Claude rarely actually reads the other documentation files I point it to.

      • dhorthy an hour ago

        I think the key here is “if X then Y syntax” - this seems to be quite effective at piercing through the “probably ignore this” system message by highlighting WHEN a given instruction is “highly relevant”
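
        Concretely, that means bullets shaped something like this (reusing the doc paths from the comment above):

          - IF you are adding or changing CSS, THEN read docs/ADDING_CSS.md first
          - IF you are working with user data, THEN read docs/STORAGE_MANAGER.md first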

  • johnsmith1840 4 hours ago

    I don't get the point. Point it at your relevant files, ask it to review and discuss the update, refine its understanding, and then tell it to go.

    I have found that more context, comments, and info damage quality on hard problems.

    For a long time now I have actually kept two views of my code.

    1. The raw code with no empty space or comments.
    2. Code with comments.

    I never give the second to my LLM. The more context you give, the lower its upper end of quality becomes. This is just a habit I've picked up using LLMs every day, hours a day, since GPT-3.5; it allows me to reach farther into extreme complexity.

    I suppose I don't know what most people are using LLMs for, but the higher the complexity your work entails, the less noise you should inject into it. It's tempting to add massive amounts of context, but I've routinely found that fails at the higher levels of coding complexity and uniqueness. It was more apparent in earlier models; newer ones will handle tons of context, you just won't be able to get those upper ends of quality.

    Compute-to-information ratio is all that matters. Compute is capped.

    • Aurornis 4 hours ago

      > I have found that more context comments and info damage quality on hard problems.

      There can be diminishing returns, but every time I’ve used Claude Code for a real project I’ve found myself repeating certain things over and over again and interrupting tool usage until I put it in the Claude notes file.

      You shouldn’t try to put everything in there all the time, but putting key info in there has been very high ROI for me.

      Disclaimer: I’m a casual user, not a hardcore vibe coder. Claude seems much more capable when you follow the happy path of common projects, but gets constantly turned around when you try to use new frameworks and tools and such.

      • MarkMarine 3 hours ago

        Setting hooks has been super helpful for me; you can reject certain uses of tools ("don't touch my tests for this session") with just simple scripting code.
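
        For example, a PreToolUse hook can block edits to test files for a session. A rough sketch in .claude/settings.json (the script name and matching logic are invented; check the current hooks docs):

          {
            "hooks": {
              "PreToolUse": [
                {
                  "matcher": "Edit|Write",
                  "hooks": [{ "type": "command", "command": "./hooks/no-test-edits.sh" }]
                }
              ]
            }
          }

        The script receives the tool call as JSON on stdin; exiting with code 2 blocks the call and feeds stderr back to Claude:

          #!/bin/sh
          file=$(jq -r '.tool_input.file_path // empty')
          case "$file" in
            *.test.*) echo "Tests are off-limits this session" >&2; exit 2 ;;
          esac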

        • brianwawok an hour ago

          Git lint hook has been key. No matter how many times I told it, it lints randomly. Sometimes not at all. Sometimes before running tests (but not after fixing test failures).
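
          A pre-commit hook that hard-fails has been the only reliable fix for me; something as simple as this (assuming an npm lint script) forces the agent to fix lint instead of skipping it:

            #!/bin/sh
            # .git/hooks/pre-commit: refuse the commit when lint fails
            npm run lint || exit 1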

      • lostdog 3 hours ago

        Agreed, I don't love the CLAUDE.md that gets autogenerated. It's too wordy for me to understand and for the model to follow consistently.

        I like to write my CLAUDE.md directly, with just a couple paragraphs describing the codebase at a high level, and then I add details as I see the model making mistakes.

    • Mtinie 4 hours ago

      > 1. The raw code with no empty space or comments. 2. Code with comments

      I like the sound of this but what technique do you use to maintain consistency across both views? Do you have a post-modification script which will strip comments and extraneous empty space after code has been modified?

      • wormpilled an hour ago

        Curious if that is the case, how you would put comments back too? Seems like a mess.

    • nightski 4 hours ago

      IMO, within the documentation .md files, the information density should be very high. Higher than trying to shove the entire codebase into context, that is for sure.

      • johnsmith1840 4 hours ago

        You definitely don't just push the entire code base. Previous models required you to be meticulous about your input: a function here, a class there.

        Even now if I am working on REALLY hard problems I will still manually copy and paste code sections out for discussion and algorithm designs. Depends on complexity.

        This is why I still believe OpenAI's o1-pro was the best model I've ever seen. The amount of compute you could throw at a problem was absurd.

    • senshan 4 hours ago

      > I never give the second to my LLM.

      How do you practically achieve this? Honest question. Thanks

      • johnsmith1840 3 hours ago

        Custom scripts.

        1. Turn off
        2. Code
        3. Turn on
        4. Commit

        I also delete all LLM comments; they 100% poison your codebase.
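
        For TypeScript, the stripping step can lean on the compiler's own printer instead of regexes; a minimal sketch, assuming a TS codebase (restoring comments afterwards is a separate problem):

          // strip-comments.ts: rewrite a source file with comments removed
          import * as ts from "typescript";
          import { readFileSync, writeFileSync } from "fs";

          const file = process.argv[2];
          const source = ts.createSourceFile(
            file,
            readFileSync(file, "utf8"),
            ts.ScriptTarget.Latest,
            true
          );
          const printer = ts.createPrinter({ removeComments: true });
          writeFileSync(file, printer.printFile(source));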

        • senshan an hour ago

          >> 1. The raw code with no empty space or comments. 2. Code with comments

          > 1. Turn off 2. Code 3. Turn on 4. Commit

          What do "turn off" / "turn on" mean?

          Do you have a script to strip comments?

          Okay, after the comments were stripped, does this become the common base for 3-way merge?

          After modifying the code stripped of comments, do you apply a 3-way merge to reconcile the changes and the comments?

          This seems like a lot of work. What is the benefit? I mean a demonstrable benefit.

          How does it compare to instructing through AGENTS.md to ignore all comments?

    • ra 4 hours ago

      This is exactly right. Attention is all you need. It's all about attention. Attention is finite.

      The more data you load into context, the more you dilute attention.

      • throwuxiytayq 4 hours ago

        people who criticize LLMs for merely regurgitating statistically related token sequences have very clearly never read a single HN comment

  • _pdp_ 6 hours ago

    There is a far easier way to do this, and one that is perfectly aligned with how these tools work.

    It is called documenting your code!

    Just write what this file is supposed to do in a clear concise way. It acts as a prompt, it provides much needed context specific to the file and it is used only when necessary.

    Another tip is to add README.md files where possible and where it helps. What is this folder for? Nobody knows! Write a README.md file. It is not rocket science.

    What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.

    You don't have to "prompt it just the right way".

    What you have to do is to use the same old good best practices.

    • dhorthy 6 hours ago

      For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.

      sure, readme.md is a great place to put content. But there are things I'd put in a readme that I'd never put in a claude.md if we want to squeeze the most out of these models.

      Further, claude/agents.md have special quality-of-life mechanics with the coding agent harnesses like e.g. `injecting this file into the context window whenever an agent touches this directory, no matter whether the model wants to read it or not`

      > What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.

      I don't think this is relevant at all - when you're working with coding agents, the more you can finesse and manage every token that goes into your model and how it's presented, the better results you can get. And the public data that goes into the models is near useless if you're working in a complex codebase, compared to the results you can get if you invest time into how context is collected and presented to your agent.

    • johnfn 5 hours ago

      So how exactly does one "write what this file is supposed to do in a clear concise way" in a way that is quickly comprehensible to AI? The gist of the article is that when your audience changes from "human" to "AI" the manner in which you write documentation changes. The article is fairly high quality, and presents excellent evidence that simply "documenting your code" won't get you as far as the guidelines it provides.

      Your comment comes off as if you're dispensing common-sense advice, but I don't think it actually applies here.

    • bastawhiz 5 hours ago

      This is missing the point. If I want to instruct Claude to never write a database query that doesn't hit a preexisting index, where exactly am I supposed to document that? You can either choose:

      1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)

      2. You add a docs folder (congrats, you've just done exactly what the author suggests under Progressive Disclosure)

      Moreover, you can't just do it all in a README, for the exact reasons that the author lays out under "CLAUDE.md file length & applicability".

      CLAUDE.md simply isn't about telling Claude what all the parts of your code are and how they work. You're right, that's what documenting your code is for. But even if you have READMEs everywhere, Claude has no idea where to put code when it starts a new task. If it has to read all your documentation every time it starts a new task, you're needlessly burning tokens. The whole point is to give Claude important information up front so it doesn't have to read all your docs and fill up its context window searching for the right information on every task.

      Think of it this way: incredibly well documented code has everything a new engineer needs to get started on a task, yes. But this engineer has amnesia and forgets everything it's learned after every task. Do you want them to have to reonboard from scratch every time? No! You structure your docs in a way so they don't have to start from scratch every time. This is an accommodation: humans don't need this, for the most part, because we don't reonboard to the same codebase over and over. And so yes, you do need to go above and beyond the "same old good best practices".

      • gitgud 16 minutes ago

        > 1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)

        README files are not a new concept, and have been used in software for like 5 decades now, whereas CLAUDE.md files were invented 12 months ago...

      • _pdp_ 5 hours ago

        You put a warning where it is most likely to be seen by a human coder.

        Besides, no amount of prompting will prevent this situation.

        If it is a concern, then you put a linter or unit tests in place to prevent it altogether, or make a wrapper around the tricky function with some warning in its doc strings.

        I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.

        • mvkel 5 hours ago

          Documenting for AI exactly like you would document for a human is ignoring how these tools work

          • anonzzzies 5 hours ago

            But they are right: Claude routinely ignores stuff from CLAUDE.md, even with warning bells etc. You need a linter preventing things. Like drizzle sql` templates: it just loves them.
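
            E.g. ESLint's no-restricted-syntax can ban the tagged template outright; a sketch (selector untested against your config):

              {
                "rules": {
                  "no-restricted-syntax": [
                    "error",
                    {
                      "selector": "TaggedTemplateExpression[tag.name='sql']",
                      "message": "Use the query builder, not raw sql templates."
                    }
                  ]
                }
              }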

          • CuriouslyC 4 hours ago

            You can make affordances for agent abilities without deviating from what humans find to be good documentation. Use hyperlinks, organize information, document in layers, use examples, be concise. It's not either/or unless you're being lazy.

        • bastawhiz 3 hours ago

          > no amount of prompting will prevent this situation.

          Again, missing the point. If you don't prompt for it and you document it in a place where the tool won't look first, the tool simply won't do it. "No amount of prompting" couldn't be more wrong; it works for me and all my coworkers.

          > If it is a concern then you put a linter or unit tests to prevent it altogether

          Sure, and then it'll always do things its own way, run the tests, and have to correct itself. Needlessly burning tokens. But if you want to pay for it to waste its time and yours, go for it.

          > I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.

          It's not about avoiding mistakes! It's about having it follow the norms of your codebase.

          - My codebase at work is slowly transitioning from Mocha to Jest. I can't write a linter to ban new mocha tests, and it would be a pain to keep a list of legacy mocha test suites. The solution is to simply have a bullet point in the CLAUDE.md file that says "don't write new Mocha test suites, only write new test suites in Jest". A more robust solution isn't necessary and doesn't avoid mistakes, it avoids the extra step of telling the LLM to rewrite the tests.

          - We have a bunch of terraform modules for convenience when defining new S3 buckets. No amount of documenting the modules will have Claude magically know they exist. You tell it that there are convenience modules and to consider using them.

          - Our ORM has findOne that returns one record or null. We have a convenience function getOne that returns a record or throws a NotFoundError to return a 404 error. There's no way to exhaustively detect with a linter that you used findOne and checked the result for null and threw a NotFoundError. And the hassle of maybe catching some instances isn't necessary, because avoiding it is just one line in CLAUDE.md.
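
          For reference, getOne is roughly this shape (types invented for illustration):

            class NotFoundError extends Error {}

            interface Finder<T> {
              findOne(): Promise<T | null>;
            }

            // Returns the record or throws; the route layer maps
            // NotFoundError to a 404 response.
            async function getOne<T>(finder: Finder<T>): Promise<T> {
              const row = await finder.findOne();
              if (row === null) throw new NotFoundError();
              return row;
            }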

          It's really not that hard.

          • girvo an hour ago

            > There's no way to exhaustively detect with a linter that you used findOne and checked the result for null and threw a NotFoundError

            Yes there is? Though this is usually better served with a type checker, it’s still totally feasible with a linter too if that’s your bag

            > because avoiding it is just one line in CLAUDE.md.

            Except no, it isn’t, because these tools still ignore that line sometimes so I still have to check for it myself.

    • 0xblacklight 5 hours ago

      I think you’re missing that CLAUDE.md is deterministically injected into the model’s context window

      This means that instead of behaving like a file the LLM reads, it effectively lets you customize the model’s prompt

      I also didn’t write that you have to “prompt it just the right way”, I think you’re missing the point entirely

  • gonzalohm 5 hours ago

    Probably a lot of people here disagree with this feeling, but my take is that if setting up all the AI infrastructure and onboarding it to my code is going to take this amount of effort, then I might as well code the damn thing myself, which is what I'm getting paid to do (and enjoy doing anyway).

    • Havoc 2 hours ago

      A lot of the style stuff you can write once and reuse. I started splitting mine into a universal file and project-specific files for this reason.

      The universal file has stuff I always want (use uv instead of pip, etc.) while the other describes the tech choices for this project.
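
      It ends up being mostly one-liners, roughly like this (contents illustrative):

        # Universal preferences
        - Use uv instead of pip for anything Python
        - Run the formatter before declaring a task done
        - Ask before adding new dependencies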

    • nichochar 2 hours ago

      The effort described in the article is maybe a couple hours of work.

      I understand the "enjoy doing anyway" part and it resonates, but not using AI is simply less productive.

      • TheRoque 2 hours ago

        > but not using AI is simply less productive

        Some studies show the opposite for experienced devs. And they also show that developers are delusional about said productivity gains: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

        If you have a counter-study (for experienced devs, not juniors), I'd be curious to see. My experience also has been that using AI as part of your main way to produce code, is not faster when you factor in everything.

    • vanviegen 5 hours ago

      Perhaps. But keep in mind that the setup work is typically mostly delegated to LLMs as well.

    • nvarsj 4 hours ago

      It really doesn't take that much effort. Like any tool, people can over-optimise on the setup rather than just use it.

    • fragmede 5 hours ago

      Whether it's setting up AI infrastructure or configuring Emacs/vim/VSCode, the important distinction to make is if the cost has to be paid continually, or if it's a one time/intermittent cost. If I had to configure my shell/git aliases every time I booted my computer, I wouldn't use them, but seeing as how they're saved in config files, they're pretty heavily customized by this point.

      Don't use AI if you don't want to, but "it takes too much effort to set up" is an excuse printf debuggers use to avoid setting up a debugger. Which is a whole other debate though.

    • kissgyorgy 4 hours ago

      I strongly disagree with the author's advice against using /init. It takes a minute to run and Claude produces surprisingly good results.

      • 0xblacklight 2 hours ago

        If you find it works for you, then that's great! This post mostly comes from our learnings getting it to solve hard problems in complex brownfield codebases, where auto-generation is almost never sufficient.

      • alwillis 3 hours ago

        /init has evolved since the early days; it's more concise than it used to be.

  • serial_dev 6 hours ago

    I’m sure I’m just working like a caveman, but I simply highlight the relevant code, add it to the chat, and talk to these tools as if they were my colleagues and I’m getting pretty good results.

    From about 12 to 6 months ago this was not the case (with or without .md files); I was getting mainly subpar results, so I'm assuming that the models have improved a lot.

    Basically, I found that they do not make that much of a difference; the model is either good enough or not…

    I know (or at least I suppose) that these markdown files could bring some marginal improvements, but at this point, I don’t really care.

    I assume this is an unpopular take, because I see so many people treat these files as if they were black magic or a silver bullet that 100x's their already 1000x productivity.

    • vanviegen 5 hours ago

      > I simply highlight the relevant code, add it to the chat, and talk to these tools

      Different use case. I assume the discussion is about having the agent implement whole features or research and fix bugs without much guidance.

      • 0xblacklight 5 hours ago

        Yep, it is opinionated toward getting coding agents to solve hard problems in complex brownfield codebases, which is what we are focused on at humanlayer :)

    • rmnclmnt 5 hours ago

      Matches my experience also. I bothered only once to set up a proper CLAUDE.md file, and now never do. Simply referring to the context properly for surgical recommendations and edits works relatively well.

      It feels a lot like bikeshedding to me, maybe I’m wrong

    • wredcoll 5 hours ago

      How about a list of existing database tables/columns so you don't need to repeat it each time?

      • girvo an hour ago

        I gave it a tool to execute to get that info if required, but it mostly doesn't need to, the Kysely migration files and the database type definition being enough.

      • anonzzzies 5 hours ago

        Claude code figures that out at startup every time. Never had issues with it.

      • HDThoreaun 3 hours ago

        Do you not use a model file for your orm?

        • wredcoll an hour ago

          ORMs are generally a bad idea, so.. hopefully not?

          • girvo an hour ago

            Even without the explicit magic ORMs, with data-mapper-style query builders like Kysely and similar, I still find I need to marshal selected rows into objects to, yknow, do things with them in a lot of cases.

            Perhaps a function of GraphQL though.

            • wredcoll 20 minutes ago

              Sure, but that's not the same thing. For example, whether or not you have to redeclare your entire database schema in a custom ORM language in a different repo.

    • jwpapi 5 hours ago

      === myExperience

  • bryanhogan 18 minutes ago

    I've been very satisfied with creating a short AGENTS.md file with the project basics, and then also including references to where to find more information / context, like a /context folder that has markdown files such as app-description.md.

  • andersco 6 hours ago

    I have found enabling the codebase itself to be the “Claude.md” to be most effective. In other words, set up effective automated checks for linting, type checking, unit tests etc and tell Claude to always run these before completing a task. If the agent keeps doing something you don’t like, then a linting update or an additional test often is more effective than trying to tinker with the Claude.md file. Also, ensure docs on the codebase are up to date and tell Claude to read relevant parts when working on a task and of course update the docs for each new task. YMMV but this has worked for me.
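
    With that approach, the CLAUDE.md itself can shrink to little more than the list of checks, e.g. (commands illustrative, from a typical Node project):

      # Before finishing any task, run:
      - npm run lint
      - npm run typecheck
      - npm test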

    • Aeolun 4 hours ago

      > Also, ensure docs on the codebase are up to date and tell Claude to read relevant parts when working on a task

      Yeah, if you do this every time it works fine. If you add what you tell it every time to CLAUDE.md, it also works fine, but you don’t have to tell it any more ;)

    • Havoc 2 hours ago

      > Claude.md

      It's case-sensitive, btw: CLAUDE.md. That might explain your mixed results with it.

  • astrostl 2 hours ago

    I have Claude itself write CLAUDE.md. Once it is informed of its context (e.g., "README.md is for users, CLAUDE.md is for you") you can say things like, "update readme and claudemd" and it will do it. I find this especially useful for prompts like, "update claudemd to make absolutely certain that you check the API docs every single time before making assumptions about its behavior" — I don't need to know what magick spell will make that happen, just that it does happen.

    • dexwiz 2 hours ago

      Do you have any proof that AI written instructions are better than human ones? I don't see why an AI would have an innate understanding on how best to prompt itself.

      • astrostl 2 hours ago

        Having been through cycles of manually writing with '#' and having it do it itself, it seems to have been a push on efficacy while spending less effort and getting less frustrated. Hard to quantify, except to say that I've had great results with it. I appreciate the spirit of OP's "CLAUDE.md is the highest leverage point of the harness, so avoid auto-generating it", but you can always ask Claude to tighten it up itself too.

      • michaelbuckbee 2 hours ago

        Generally speaking, it has a lot of information from things like OP's blog post on how best to structure the file and prompt itself, and you can also (from within Claude Code) ask it to look at posts or Anthropic's prompting best practices and adapt those to your own file.

  • mmaunder 3 hours ago

    That paper the article references is old at this point. No GPT 5.1, no Gemini 3, both of which were game changers. I'd love to see their instruction-following graphs.

  • grishka 3 hours ago

    Oh yeah I added a CLAUDE.md to my project the other day: https://github.com/grishka/Smithereen/blob/master/CLAUDE.md

    Is it a good one?

    • lijok 2 hours ago

      I copy/pasted it into my codebase to see if it’s any good and now Claude is refusing to do any work? I asked Copilot to investigate why Claude is not working but it too is not working. Do you know what happened?

  • DR_MING 3 hours ago

    I've already forgotten about CLAUDE.md; I generate and update it with AI. I prefer to keep design, tasks, and docs folders instead. It is always better to ask it to read some spec docs and the real code first before doing anything.

  • ctoth 5 hours ago

    I've gotten quite a bit of utility out of my current setup[0]:

    Some explicit things I found helpful: Have the agent address you as something specific! This way you know if the agent is paying attention to your detailed instructions.

    Rationality, as in the stuff practiced on early Less Wrong, gives you a great language for constraining the agent, and since it's read The Sequences and everything else, you can include pointers; the more you do, the more it will nudge it into that mode of thought.

    The explicit "This is what I'm doing, this is what I expect" pattern has been hugely useful, both for me monitoring it / coming back to see what it did, and for the model itself. It makes it more likely to recover when it goes down a bad path.

    The system reminder this article mentions is definitely there but I have not noticed it messing much with adherence. I wish there were some sort of power user mode to turn it off though!

    Also, this is probably too long! But I have been experimenting and iterating for a while, and this is what is working best currently. Not that I've been able to hold any other part constant -- Opus 4.5 really is remarkable.

    [0]: https://gist.github.com/ctoth/d8e629209ff1d9748185b9830fa4e7...

  • prettyblocks 6 hours ago

    The advice here seems to assume a single .md file with instructions for the whole project, but the AGENTS.md methodology, as supported by agents like GitHub Copilot, is to break out more specific AGENTS.md files in the subdirectories of your code base. I wonder whether, and how, the tips shared here change assuming a flow with a bunch of focused AGENTS.md files throughout the code.

    • 0xblacklight 6 hours ago

      Hi, post author here :)

      I didn’t dive into that because in a lot of cases it’s not necessary and I wanted to keep the post short, but for large monorepos it’s a good idea

  • jasonjmcghee 6 hours ago

    Interesting selection of models for the "instruction count vs. accuracy" plot. Curious when that was done and why they chose those models. How well does ChatGPT 5/5.1 (and codex/mini/nano variants), Gemini 3, Claude Haiku/Sonnet/Opus 4.5, recent grok models, Kimi 2 Thinking etc (this generation of models) do?

    • alansaber 6 hours ago

      Guessing they included some smaller models just to show how they drop accuracy at smaller context sizes

      • jasonjmcghee 6 hours ago

        Sure - I was more commenting that they are all > 6 months old, which sounds silly, but things have been changing fast, and instruction following is definitely an area that has been developing a lot recently. I would be surprised if accuracy drops off that hard still.

        • 0xblacklight 5 hours ago

          I imagine it's highly correlated to parameter count, but the research is a few months old and frontier model architecture is pretty opaque, so it's hard to draw too many conclusions about newer models that aren't in the study beyond what I wrote in the post

  • 0xcb0 3 hours ago

    Here is my take on writing a good claude.md. I had very good results with my 3-file approach, and it has also been inspired by the great blog posts that Human Layer publishes from time to time: https://github.com/marcuspuchalla/claude-project-management

  • eric-burel 6 hours ago

    "You can investigate this yourself by putting a logging proxy between the claude code CLI and the Anthropic API using ANTHROPIC_BASE_URL" I'd be eager to read a tutorial about that I never know which tool to favour for doing that when you're not a system or network expert.

    • 0xblacklight 6 hours ago

      Hi, post author here

      We used Cloudflare's AI gateway, which is pretty simple. Set one up, get the proxy URL, and set it through the env var; very plug-and-play
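
      Roughly (the URL shape is from memory; substitute the one Cloudflare gives you):

        export ANTHROPIC_BASE_URL="https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/anthropic"
        claude   # requests now flow through the logging gateway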

    • fishmicrowaver 6 hours ago

      Have you considered just asking claude? I'd wager you'd get up and running in <10 minutes.

      • dhorthy 6 hours ago

        agree - i've had claude one-shot this for me at least 10 times at this point cause i'm too lazy to lug whatever code around. literally made a new one this morning

    • Havoc 2 hours ago

      Just install mitmproxy. Takes like 5 mins to figure out. 2 with Claude.

      On phone, else I'd post exact commands. Roughly, from memory (double-check the flags):
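
        mitmproxy --mode reverse:https://api.anthropic.com --listen-port 8080
        ANTHROPIC_BASE_URL=http://127.0.0.1:8080 claude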

  • VimEscapeArtist 3 hours ago

    What's the actual completion rate for Advent of Code? I'd bet the majority of participants drop off before day 25, even among those aiming to complete it.

    Is this intentional? Is AoC designed as an elite challenge, or is the journey more important than finishing?

  • candiddevmike 6 hours ago

    None of this should be necessary if these tools did what they say on the tin, and most of this advice will probably age like milk.

    Write readmes for humans, not LLMs. That's where the ball is going.

    • 0xblacklight 5 hours ago

      Hi, post author here :)

      Yes README.md should still be written for humans and isn’t going away anytime soon.

      CLAUDE.md is a convention used by claude code, and AGENTS.md is used by other coding agents. Both are intended to be supplemental to the README and are deterministically injected into the agent’s context.

      It’s a configuration point for the harness, it’s not intended to replace the README.

      Some of the advice in here will undoubtedly age poorly as harnesses change and models improve, but some of the generic principles will stay the same - e.g. that you shouldn't use an LLM to do a linter & formatter's job, or that LLMs are stateless and need to be onboarded into the codebase, and that having some deterministically-injected instructions to achieve that is useful, instead of relying on the agent to non-deterministically derive all that info by reading config and package files

      The post isn’t really intended to be super forward-looking as much as “here’s how to use this coding agent harness configuration point as best as we know how to right now”

      • teiferer 4 hours ago

        > you shouldn't use an LLM to do a linter & formatter's job,

        Why is that good advice? If that thing is eventually supposed to do the trickiest coding tasks, and already a year ago could have won a medal at the informatics olympiad, then why wouldn't it eventually be able to tell whether I'm using 2 or 4 spaces and format my code accordingly? Either it's going to change the world, in which case this is a trivial task, or it's all vaporware, in which case what are we even discussing?

        > or that LLMs are stateless and need to be onboarded into the codebase

        What? Why would that be a reasonable assumption/prediction for even near term agent capabilities? Providing it with some kind of local memory to dump its learned-so-far state of the world shouldn't be too hard. Isn't it supposed to already be treated like a junior dev? All junior devs I'm working with remember what I told them 2 weeks ago. Surely a coding agent can eventually support that too.

        This whole CLAUDE.md thing seems a temporary kludge until such basic features are sorted out, and I'm seriously surprised how much time folks are spending to make that early broken state less painful to work with. All that precious knowledge y'all are building will be worthless a year or two from now.

        • Zerot 21 minutes ago

          > Why is that good advice? If that thing is eventually supposed to do the most tricky coding tasks, and already a year ago could have won a medal at the informatics olympics, then why wouldn't it eventually be able to tell if I'm using 2 or 4 spaces and format my code accordingly? Either it's going to change the world, then this is a trivial task, or it's all vaporware, then what are we even discussing..

          This is the exact reason for the advice: The LLM already is able to follow coding conventions by just looking at the surrounding code which was already included in the context. So by adding your coding conventions to the claude.md, you are just using more context for no gain.

          And another reason to not use an agent for linting/formatting(i.e. prompting to "format this code for me") is that dedicated linters/formatters are faster and only take maybe a single cent of electricity to run whereas using an LLM to do that job will cost multiple dollars if not more.

        • alwillis 2 hours ago

          > Then why wouldn't it eventually be able to tell if I'm using 2 or 4 spaces and format my code accordingly?

          It's not that an agent doesn't know if you're using 2 or 4 spaces in your code; it comes down to:

          - there are many ways to ensure your code is formatted correctly; that's what .editorconfig [1] is for.

          - in a halfway serious project, incorrectly formatted code shouldn't reach the LLM in the first place

          - tokens are relatively cheap but they're not free on a paid plan; why spend tokens on something linters and formatters can do deterministically and for free?

          If you wanted Claude Code to handle linting automatically, you're better off taking that out of CLAUDE.md and creating a Skill [2].

          > What? Why would that be a reasonable assumption/prediction for even near-term agent capabilities? Providing it with some kind of local memory to dump its learned-so-far state of the world shouldn't be too hard. Isn't it supposed to already be treated like a junior dev? All junior devs I'm working with remember what I told them 2 weeks ago. Surely a coding agent can eventually support that too.

          It wasn't mentioned in the article, but Claude Code, for example, does save each chat session by default. You can come back to a project and type `claude --resume` and you'll get a list of past Claude Code sessions that you can pick up from where you left off.

          [1]: https://editorconfig.org

          [2]: https://code.claude.com/docs/en/skills

        • lijok 2 hours ago

          > All junior devs I'm working with remember what I told them 2 weeks ago

          That’s why they’re junior

        • cruffle_duffle 3 hours ago

          The stateless nature of Claude Code is what annoys me so much. It has to spend so much time doing repetitious bootstrapping. And it "picks up and propagates" random shit it finds in some document it wrote: it will echo back something it wrote that "stood out", I'll forget where it got that and ask it to "find where you found that info so we can remove it", and it will do so, but then somehow mysteriously pick it up again, because of some git commit message or something. It's like a tune stuck in its head, only it's sticky for LLMs, not humans.

          And that describes the issues I had with the "automatic memories" features that things like ChatGPT had. It turns out to be an awful judge of what to remember. It would make memories like "cruffle is trying to make pepper soup with chicken stock", which it would then parrot back to me at some point 4 months later, and I'd be like "WTF, I figured that out". The "# remember this" approach is much more powerful, because I know how sticky this stuff gets, and I'd rather have it over-index on my own forceful memories than on random shit it decided.

          I dunno. All I’m saying is you are right. The future is in having these things do a better job of remembering. And I don’t know if LLMs are the right tool for that. Keyword search isn’t either though. And vector search might not be either—I think it suffers from the same kinds of “catchy tune attack” an LLM might.

          Somebody will figure it out somehow.

  • btbuildem 6 hours ago

    It seems overall a good set of guidelines. I appreciate some of the observations being backed up by data.

    What I find most interesting is how a hierarchical / recursive context construct begins to emerge. The author's note on a "root" claude.md, as well as the opening comments on LLMs being stateless, really resonates with me. I think soon we will start seeing stateful LLMs, via clever manipulation of scope and context. Something akin to memory, as we humans perceive it.

  • tietjens 4 hours ago

    I think this could work really well for infrastructure/ops-style work, where the LLM will not be able to grasp the full context of, say, the network from just the few files you have open.

    But as others are saying this is just basic documentation that should be done anyway.

  • malshe 4 hours ago

    I have been stuffing way too many instructions into Claude.md, so this article was an eye-opener. Btw, any tips for Claude.md when one uses subagents?

  • rootusrootus 6 hours ago

    Ha, I just tell Claude to write it. My results have been generally fine, but I only use Claude on a simple codebase that is well documented already. Maybe I will hand-edit it to see if I can see any improvements.

  • johnfn 5 hours ago

    I was expecting the traditional AI-written slop about AI, but this is actually really good. In particular, the "As instruction count increases, instruction-following quality decreases uniformly" section and associated graph are truly fantastic! To my mind, the ability to follow long lists of rules is one of the most obvious ways that virtually all AI models fail today. That's why I think that graph is so useful -- I've never seen someone go and systematically measure it before!

    I would love to see it extended to show Codex, which to my mind is by far the best at rule-following. (I'd also be curious to see how Gemini 3 performs.)

    • 0xblacklight 5 hours ago

      I looked when I wrote the post but the paper hasn’t been revisited with newer models :/

  • brcmthrowaway 2 hours ago

    Is CLAUDE.md required when claude has a --continue option?

    • Zerot 11 minutes ago

      I would recommend using it, yeah. You have limited context and it will be compacted/summarized occasionally. The compaction/summary will lose some information and it is easy for it to forget certain instructions you gave it. Afaik claude.md will be loaded into the context on every compaction which allows you to use it for instructions that should always be included in the context.

  • huqedato 5 hours ago

    Looking for a similar GEMINI.md

    • 0xblacklight 5 hours ago

      It might support AGENTS.md, you could check the site and see if it’s there

  • max-privatevoid 3 hours ago

    The only good Claude.md is a deleted Claude.md.

  • boredtofears 5 hours ago

    It would be nice to see an actual example of what a good claude.md that implements all of these recommendations looks like.

  • acedTrex 4 hours ago

    "Here's how to use the slop machine better" is such a ridiculous pretense for a blog or article. You simply write a sentence and it approximates it. That is hardly worth any literature being written as it is so self obvious.

    • 0xblacklight 3 minutes ago

      This is an excellent point - LLMs are autoregressive next-token predictors, and output token quality is a function of input token quality

      Consider that if the only code you get out of the autoregressive token prediction machine is slop, that this indicates more about the quality of your code than the quality of the autoregressive token prediction machine

  • vladsh 6 hours ago

    What is a good Claude.md?

    • testdelacc1 6 hours ago

      Claude.md - A markdown file you add to your code repository to explain how things work to Claude.

      A good Claude.md - I don’t know, presumably the article explains.