Using LLMs at Oxide

(rfd.shared.oxide.computer)

96 points | by steveklabnik 2 hours ago

39 comments

  • thundergolfer an hour ago

    A measured, comprehensive, and sensible take. Not surprising from Bryan. This was a nice line:

    > it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.

    I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization with junior engineers I'd add specific guidance to help them understand how they should approach LLM use.

    Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.

    The way Bryan approaches an LLM is super different to how a 2025 junior engineer does so. That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.

    • zackerydev 28 minutes ago

      I remember in the very first class I ever took on Web Design the teacher spent an entire semester teaching "first principles" of HTML, CSS and JavaScript by writing it in Notepad.

      It was only then that she introduced us to the glory that was Adobe Dreamweaver, which (obviously) increased our productivity tenfold.

    • pests 34 minutes ago

      > That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.

      Years ago I had to spend many months building nothing but Models (as in MVC) for a huge data import / ingest that the company I worked at was rewriting. It was just messy enough that it couldn't be automated. I almost lost my mind from the dull monotony and even started having attendance issues. I know today that could have been done with an LLM in minutes. Almost crazy how much time I put into that project compared to what it would take today.

  • mcqueenjordan 27 minutes ago

    As usual with Oxide's RFDs, I found myself vigorously head-nodding while reading. Somewhat unusually, though, I found a part I disagreed with:

    > Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.

    Don't the same arguments against using LLMs to write one's prose also apply to code? Was the structure of the code, and the ideas within it, the engineer's? Or was it from the LLM? And so on.

    Before I'm misunderstood as an LLM minimalist, I want to say that I think they're incredibly good at solving blank page syndrome -- just getting a starting point on the page is useful. But the code you actually want to ship is so far from what LLMs write that I think of them more as a crutch for blank page syndrome than as "good at writing code de novo".

    I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.

    • lukasb 24 minutes ago

      One difference is that clichéd prose is bad and clichéd code is generally good.

      • joshka 21 minutes ago

        Depends on what your prose is for. If it's for documentation, then prose which matches the expected tone and form of other similar docs would count as clichéd from this perspective. I think this is a really good use of LLMs - making docs consistent across a large library / codebase.

        • minimaxir 19 minutes ago

          I have been testing agentic coding with Claude Opus 4.5, and the problem is that it's too good at documentation and test cases. It's thorough in a way that goes out of scope, so I have to edit it down to increase the signal-to-noise ratio.

        • dcre 16 minutes ago

          Docs also often don’t have anyone’s name on them, in which case they’re already attributed to an unknown composite author.

        • danenania 16 minutes ago

          A problem I’ve found with LLMs for docs is that they are like ten times too wordy. They want to document every path and edge case rather than focusing on what really matters.

          It can be addressed with prompting, but you have to fight this constantly.

    • dcre 24 minutes ago

      In my experience, LLMs have been quite capable of producing code I am satisfied with (though of course it depends on the context — I have much lower standards for one-off tools than long-lived apps). They are able to follow conventions already present in a codebase and produce something passable. Whereas with writing prose, I am almost never happy with the feel of what an LLM produces (worth noting that Sonnet and Opus 4.5’s prose may be moving up from disgusting to tolerable). I think of it as prose being higher-dimensional — for a given goal, often the way to express it in code is pretty obvious, and many developers would do essentially the same thing. Not so for prose.

  • john01dav 41 minutes ago

    > Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it. Moreover, once in the loop of peer review, generation should more or less be removed: if code review comments are addressed by wholesale re-generation, iterative review becomes impossible.

    My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:

    1) First, feed the existing relevant code into an LLM. This is usually just a few source files in a larger project.

    2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it to not write code at this point.

    3) Let it speak about the plan, and make sure that I like it. I will converse to address any deficiencies that I see, and I almost always do.

    4) I then tell it to generate the code.

    5) I skim & test the code to see if it's generally correct, and have it make corrections as needed.

    6) Closely read the entire generated artifact at this point, and make manual corrections (or occasionally automatic corrections like "replace all C style casts with the appropriate C++ style casts", followed by a review of the diff; see the sketch below).

    The hardest part for me is #6, where I feel a strong emotional bias towards not doing it, since I am not yet aware of any errors compelling such action.
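
    As a concrete example of the kind of mechanical correction I mean in #6, here's a minimal sketch (my own illustration, not something from the RFD -- assume any C++ codebase):

        #include <cstdio>

        int main() {
            int completed = 7;
            int total = 9;

            // Before: the sort of line an LLM might emit, using a C-style cast.
            double before = (double)completed / total;

            // After: the same computation with the appropriate C++-style cast,
            // applied as a mechanical sweep and then verified by reading the diff.
            double after = static_cast<double>(completed) / total;

            std::printf("%f %f\n", before, after);
            return 0;
        }

    The behavior doesn't change; the point is that this kind of sweep can be requested automatically and then checked in the diff, rather than trusted blindly.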

    This allows me to operate at a higher level of abstraction (architecture) and remove the drudgery of turning an architectural idea into written, precise code. But, when doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or a higher-level VM language: with those tools, you can understand how they work, quickly get a good idea of what you're going to get, and rely on robust assurances. Understanding LLMs helps, but not to the same degree.

  • jhhh 42 minutes ago

    I've had the same thought about 'written' text with an LLM: if you didn't spend time writing it, don't expect me to read it. I'm glad he seems to be taking a hard stance on that, saying they won't use LLMs to write non-code artifacts. This principle extends to writing code as well, to some degree: you shouldn't expect other people to peer review 'your' code which was simply generated because, again, you spent no time making it. You have to be the first reviewer. Whether these cultural norms are held firmly remains to be seen (I don't work there), but I think they represent thoughtful application of emerging technologies.

  • john01dav an hour ago

    > it is presumed that of the reader and the writer, it is the writer that has undertaken the greater intellectual exertion. (That is, it is more work to write than to read!)

    This applies to natural language, but, interestingly, the opposite is true of code (in my experience and that of other people that I've discussed it with).

    • worble 41 minutes ago

      See: Kernighan's Law

      > Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

      https://www.laws-of-software.com/laws/kernighan/

      • DrewADesign 25 minutes ago

        I think people misunderstand this quote. Cleverness in this context refers to complexity, and generally stems from falling in love with some complex mechanism you dream up to solve a problem rather than challenging yourself to create something simpler and easier to maintain. Bolting together bits of LLM-created code is far more likely to be “clever” than good.

  • an_ko an hour ago

    I would have expected at least some consideration of public perception, given the extremely negative opinions many people hold about LLMs being trained on stolen data. Whether it's an ethical issue or a brand hazard depends on your opinions about that, but it's definitely at least one of those currently.

    • tolerance 39 minutes ago

      I made the mistake of first reading this as a document intended for everyone, rather than an internal document that happens to be public.

      This is a technical document, useful for illustrating how the guy (who once gave a talk I didn’t understand but was captivated by, and who is well respected in his field) intends to guide his company’s use of the technology, so that other companies and individual programmers may learn from it too.

      I don’t think the objective was to take any outright ethical stance, but to provide guidance about something ostensibly used at an employee’s discretion.

    • john01dav an hour ago

      He speaks of trust and LLMs breaking that trust. Is this not what you mean, but by another name?

      > First, to those who can recognize an LLM’s reveals (an expanding demographic!), it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open. But there are deeper problems: LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well. If the prose is automatically generated, might the ideas be too? The reader can’t be sure — and increasingly, the hallmarks of LLM generation cause readers to turn off (or worse).

      > Specifically, we must be careful to not use LLMs in such a way as to undermine the trust that we have in one another

      > our writing is an important vessel for building trust — and that trust can be quickly eroded if we are not speaking with our own voice

  • kace91 24 minutes ago

    The guide is generally very well thought out, but I see an issue in this part:

    It sets the rule that things must actually be read when there’s a social expectation (code interviews, for example) but otherwise… remarks that use of LLMs to assist comprehension has little downside.

    I find two problems with this:

    - there is an incoherence there: if LLMs are flawless at reading and summarization, there is no difference from reading the original; and if they aren’t flawless, then that flaw also extends to the non-social stuff.

    - in practice, I haven’t found LLMs that good as reading assistants. I’ve sent them to check a linked doc and they’ve just read the index and inferred the context, for example. Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than following the three links.

    There is a significant risk in placing a translation layer between content and reader.

  • rgoulter 39 minutes ago

    > LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well.

    I think this gets at a key point... but I'm not sure of the right way to articulate it.

    A human-written comment may be worth something, but an LLM-generated one is cheap/worthless.

    The nicest phrase capturing the thought I saw was: "I'd rather read the prompt".

    It's probably just as good to let an LLM generate it again as it is to publish something written by an LLM.

  • 000ooo000 40 minutes ago

    >Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it

    By this article's own standards, now there are 2 authors who don't understand what they've produced.

  • tonkinai 22 minutes ago

    Based on paragraph length, I would assume that "LLMs as writers" is the most extensive use case.

  • bryancoxwell an hour ago

    Find it interesting that the section about the tells LLMs leave when used for writing is absolutely littered with em dashes

    • minimaxir an hour ago

      You can stop LLMs from using em-dashes by just telling them to "never use em-dashes". This same type of prompt engineering works to mitigate almost every sign of AI-generated writing, which is one reason why AI writing heuristics/detectors can never be fully reliable.

      • dcre 19 minutes ago

        This does not work on Bryan, however.

    • matt_daemon an hour ago

      I believe Bryan is a well known em dash addict

      • bryancoxwell 44 minutes ago

        And I mean no disrespect to him for it, it’s just kind of funny

    • bccdee 25 minutes ago

      To be fair, LLMs usually use em-dashes correctly, whereas I think this document misuses them more often than not. For example:

      > This can be extraordinarily powerful for summarizing documents — or of answering more specific questions of a large document like a datasheet or specification.

      That dash shouldn't be there. That's not a parenthetical clause, that's an element in a list separated by "or." You can just remove the dash and the sentence becomes more correct.

  • thatxliner an hour ago

    The empathy section is quite interesting

  • monkaiju an hour ago

    Hmmm, I'm a bit confused by their conclusions (encouraging use) given some of the really damning caveats they point out. A tool that they themselves determine needs such careful oversight probably just shouldn't be used near prod at all.

    • gghffguhvc an hour ago

      For the same quality and quantity of output, if the cost of using LLMs plus the cost of careful oversight is less than the cost of not using LLMs, then the rational choice is to use them.

      Naturally this doesn’t factor in things like human obsolescence, motivation and self-worth.

      • ahepp 37 minutes ago

        It seems like this would be a really interesting field to research. Does AI assisted coding result in fewer bugs, or more bugs, vs an unassisted human?

        I've been thinking about this as I do AoC with Copilot enabled. It's been nice for those "hmm, how do I do that in $LANGUAGE again?" moments, but it's also written some nice-looking snippets that don't quite do what I want them to. And there have been many cases of "hmmm... that would work, but it would read the entire file twice for no reason".

        My guess, however, is that it's a net gain for quality and productivity. Humans introduce bugs too, and there need to be processes in place to discover and remediate those regardless.

      • zihotki 42 minutes ago

        And it doesn't factor in seniority/experience. What's good for a senior developer is not necessarily the same for a beginner.

    • ares623 32 minutes ago

      I would think some of their engineers love using LLMs; it would be unfair to them to completely disallow it, IMO (even as someone who hates LLMs).

    • mathgeek an hour ago

      Junior engineers are the usual comparison folks make to LLMs, which is apt as juniors need lots of oversight.

    • rgoulter an hour ago

      What do you find confusing about the document encouraging use of LLMs?

      The document includes statements like "LLMs are superlative at reading comprehension", "LLMs can be excellent editors", "LLMs are amazingly good at writing code".

      The caveats are really useful: if you've anchored your expectations on "these tools are amazing", the caveats bring you closer to what they've observed.

      Or, if you're anchored on "the tools aren't to be used", the caveats give credibility to the document's suggestions about what LLMs are useful for.

    • devmor an hour ago

      The ultimate conclusion seems to be one that leaves it to personal responsibility - the user of the LLM is responsible for ensuring the LLM has done its job correctly. While this is the ethical conclusion to me, the “gap” left to personal responsibility is so large that it makes me question how useful everything else in this document really is.

      I don’t think it is easy to create a concise set of rules to apply in this gap for something as general as LLM use, but I do think such a ruleset is noticeably absent here.

  • bgwalter 25 minutes ago

    Cantrill jumps on every bandwagon. When he assisted in cancelling a Node developer (not a native English speaker) over pronouns, he was following the Zeitgeist; now it's "Broadly speaking, LLM use is encouraged at Oxide."

    He is a long way from Sun.

  • fallat 26 minutes ago

    The problem with this text is that it's a written anecdote. It could all be fake.