If you're using llms to shit out large swathes of unreviewed code you're doing it wrong and your project is indeed doomed to become unmaintainable the minute it goes down a wrong path architecturally, or you get a bug with complex causes or whatever.
Where llms excel is in situations like:
* I have <special snowflake pile of existing data structures> that I want to apply <well known algorithm> to - bam, half a day's work done in 2 minutes (a sketch of what I mean is at the end of this comment).
* I want to set up test data and the bones of unit tests for <complicated thing with lots of dependencies> - bam, half a day's work done in 2 minutes (note I said to use the llms for a starting point - don't generate your actual test cases with it, at least not without very careful review - I've seen a lot of really dumb AI-generated unit tests).
* I want a visual web editor for <special snowflake pile of existing data structures> that saves to an sqlite db and has a separate backend api - bam, 3 days' work done in 2 minutes.
* I want to apply some repetitive change across a large codebase that's just too complicated for a clever regex, bam work you literally would have never bothered to do before done in 2 minutes.
You don't need to solve hard problems to massively increase your productivity with llms, you just need to shave yaks. Even when it's not a time save, it still lets you focus mental effort on interesting problems rather than burning out on endless chores.
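To make the first bullet concrete, here's a minimal sketch of the kind of glue work I mean - the `tasks` dict is a made-up stand-in for whatever snowflake structure you actually have, reshaped into what the stdlib topological sort expects:

```python
from graphlib import TopologicalSorter

# Made-up "snowflake" structure: tasks keyed by id, each with an ad-hoc
# 'needs' list buried inside a config dict.
tasks = {
    "load":      {"config": {"needs": []}},
    "transform": {"config": {"needs": ["load"]}},
    "report":    {"config": {"needs": ["transform", "load"]}},
}

# The boring adapter an LLM writes in seconds: reshape the snowflake into
# the {node: set_of_predecessors} form the stdlib algorithm expects.
graph = {name: task["config"]["needs"] for name, task in tasks.items()}

print(list(TopologicalSorter(graph).static_order()))
# ['load', 'transform', 'report']
```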
> * I want to apply some repetitive change across a large codebase that's just too complicated for a clever regex, bam work you literally would have never bothered to do before done in 2 minutes.
You would naively think that, as did I, but I've tested it against several big-name models and they are all eventually "lazy", sometimes make unrelated changes, and get worse as the context fills up.
On a small toy example they will do it flawlessly, but as you scale up to more and more code that requires repetitive changes the errors compound.
Agentic loops help the situation, but now you aren't getting it done in 2 minutes because you have to review to find out it wasn't done and then tell it to do it again N times until it's done.
Having the LLM write a program to make the changes is much more reliable.
> Having the LLM write a program to make the changes is much more reliable.
I ended up doing this when switching our 50k-LOC codebase to pnpm workspaces, and it was such a good experience. It still took me a day or two of moulding that script to get it to handle the dozens of edge cases, but it would have taken me far longer to split things up by hand.
I still feel like I am under-using the ability of LLMs to spit out custom scripts to handle one-off use-cases.
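For what it's worth, the scripts themselves are usually nothing fancy. A sketch of the shape mine tend to take (the module rename here is a made-up example; the real value is in the edge cases you bolt on as you find them):

```python
import re
from pathlib import Path

# One-off codemod: rewrite imports of a relocated module across the repo.
# The module names are placeholders; real scripts accumulate edge cases
# (aliased imports, re-exports, test fixtures) as you discover them.
OLD = re.compile(r"^from utils\.helpers import (\w+)$", re.MULTILINE)
NEW = r"from shared.helpers import \1"

changed = 0
for path in Path("src").rglob("*.py"):
    text = path.read_text(encoding="utf-8")
    new_text = OLD.sub(NEW, text)
    if new_text != text:
        path.write_text(new_text, encoding="utf-8")
        changed += 1

print(f"rewrote {changed} files")
```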
There is more to it than that; it's about modularization as well.
I run LLMs against a 500k LoC poker engine and they do well because the engine is modularized into many small parts with a focus on good naming schemes and DRY.
If it doesn't require a lot of context for an LLM to figure out how to direct effort then the codebase size is irrelevant -- what becomes relevant in those scenarios is module size and the number of modules implicated in any change or problem-solving. The LLM codebase 'navigation' becomes near-free with good naming and structure. If you code in a style that allows an LLM to navigate the codebase via just an `ls` output it can handle things deftly.
The LLMification of things has definitely made me embrace the concept of program-as-plugin-loader more-so than ever before.
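Concretely, by program-as-plugin-loader I mean something like this bare-bones sketch, assuming a local `plugins/` package where each submodule exposes a `register()` function (the names are just illustrative):

```python
import importlib
import pkgutil

import plugins  # a package directory; each submodule is one feature


def load_plugins(registry: dict) -> None:
    """Import every module under plugins/ and let it register itself."""
    for info in pkgutil.iter_modules(plugins.__path__):
        module = importlib.import_module(f"plugins.{info.name}")
        # Convention: each plugin exposes register(registry).
        module.register(registry)


registry: dict = {}
load_plugins(registry)
print(sorted(registry))  # one entry per plugin module
```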
Yeah, was thinking about this recently. A semantic patch is more reliable, but prompting an AI might be easier. So why not prompt the AI to write the semantic patch.
I like the spirit of these, but there are waaaay more. Like you only mentioned the ones for professional and skilled coders who have another option. What about all the sub-examples for people all the way from "technically unskilled" to "baby-step coders". There's a bunch of things they can now just do and get in front of ppl without us.
Going from "thing in my head that I need to pay someone $100/h to try" to "thing a user can literally use in 3 minutes that will make that hypothetical-but-nonexistent $100/h person cry"... like there is way more texture of roles in that territory than your punchy comment gives credit. No one cares is it's maintainable if they now know what's possible, and that matters 1000x more than future maintenance concerns. People spend years working up to this step that someone can now simply jank out* in 3 minutes.
* to jank out. verb. 1. to crank out via vibe-coding, in the sense of productive output.
The fact that people could make Excel monstrosities has never really been a real threat to the job security of programmers. IMO it increased it. LLMs are the new Excel.
Agreed. Video game idea that’s been in my head for years, but not sure if it’s actually fun? Too lazy to sit down for a few days and make it. Went back and forth with an llm for 30 mins and I had more of a game than was even in my head.
I agree with your comment but I wanted to share something that gave me a good chuckle the other day. I had asked claude to write some unit tests that, after reviewing, were sound and actually uncovered a bug in the code-under-test that I had written. When I pointed this out, claude had decided that to make the unit test pass, it would not patch the bug but it would simply not exercise the failing unit test LOL! Good times.
But yeah, LLMs are not good at defining requirements, architectures, or writing a spec to match the requirements. They are good at contained, bite-sized asks that don't have many implications outside the code they write.
There's also a middle ground, where you have the AI generate PRs and then review them manually. So that 2 minutes of code you spat out (really more like 5-10 using CC) takes another hour or three to review, and maybe 5 to 10 more commits before it's merged in.
I've done this successfully on multiple projects in the 10-20k LOC, ~100 file area - fully LLM generated w/ tons of my personal feedback - and it works just fine. 4/5 features I implement it gets pretty close to nailing from the spec I provide and the work is largely refactoring. But the times it doesn't get it right, it is a slog that could eat the better part of a day. On the whole though it probably is a 3-5x speedup.
I'm a little less confident about doing this on projects that are much bigger... then breaking things up into modules begins to look more attractive.
It's definitely a middle ground, but PR reviews are not perfect, so it's easy to miss a lot of things and to end up with a lot of extra baggage. From reviewing code it's not always easy to tell exactly what's necessary or duplicate. So I agree, this is a middle ground of using LLMs to be more productive. Removing one bad line of code is worth adding a hundred good lines of code.
> If you're using llms to shit out large swathes of unreviewed code you're doing it wrong
> bam, x days work done in 2 minutes
This is a bit of a misrepresentation, since those two minutes don't account for the reviewing time needed (if done properly), which vastly exceeds that time. Otherwise you end up in the situation of "doing it wrong" described in your first paragraph.
The implication with that example was it's some editor thing for use during the dev process separate from the actual product, so it doesn't matter if it's disposable and unmaintainable as long as it does the thing you needed it for. If the tool becomes an integral part of your workflow later on you stop and do it properly the second time around.
It’s not a misrepresentation, they’re saying the time it would take to write the code has been reduced to two minutes, not the reviewing and everything else (which still takes just as long)
Reviewing code another person wrote also takes longer than code I wrote. Hell reviewing code I wrote six months ago might as well be someone else’s code.
My job right now, depending on the week, is to either lead large projects dealing with code I don't write, or do smaller "full stack" POCs - design, cloud infrastructure (IaC), database, backend code and ETL jobs, and rarely front end code. Even before LLMs, if I had to look at a project I had done, it took me time to ramp up.
> Reviewing code another person wrote also takes longer than code I wrote.
Yes, and water is wet, but that's not exactly relevant. If you have an LLM generate slop at you that you have to review and adjust, you need to compare the time the whole process took you (not just the "generating slop" step) to the time needed to write the code by yourself.
It may still save you time, but it won't be anywhere close to 2 minutes anymore for anything but the most trivial stuff.
I have been developing a long time - 10 years as a hobbyist and 30 years professionally. For green field work especially, since all of the code I write these days is built around the AWS SDKs/CDKs, I find the generated code is just as structured as what I would write.
The only refactoring I ended up doing on my current project is extracting functions from a script and creating a library that I reused across other functionality.
Even then I just pasted the library into a new ChatGPT session and told it the requirements of my next piece of functionality and told it to use the library.
I don't trust an LLM to write more than 200 lines of code at a time. But I hardly ever write more than 200-300 lines at a time.
I can tell you that my latest project has around 1000 lines of Node CDK code between multiple apps (https://aws.amazon.com/cdk/) and around 1000 lines of Python code, and I didn't write a single line of any of it by hand. From reviewing it, it didn't make any choices that I wouldn't make, and I found some of the techniques it used for the CDK code were things I wouldn't have thought about.
The SQL code it generated for one of the functions was based on my just giving it the inbound JSON and the create table statements, and it produced idiomatic MySQL, with parameters (i.e. no SQL injection risk) and no unsafe code.
This was a 3 week project that I would have needed at least one if not two junior/mid level devs to do without Gen AI, since I also had to be in customer meetings, write documentation and help sales on another project coming up.
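To be clear about what "with parameters" means in practice: the generated code followed the usual placeholder pattern, roughly like this sketch (table/column names are made up, and I'm assuming the standard mysql-connector-python driver rather than whatever the generated code actually imported):

```python
import json

import mysql.connector  # assuming the standard MySQL driver

payload = json.loads('{"order_id": 123, "status": "shipped", "total": 49.95}')

conn = mysql.connector.connect(
    host="localhost", user="app", password="...", database="orders_db"
)
cur = conn.cursor()

# Placeholders (%s) keep user-supplied values out of the SQL text itself,
# which is what removes the injection risk.
cur.execute(
    "INSERT INTO orders (order_id, status, total) VALUES (%s, %s, %s)",
    (payload["order_id"], payload["status"], payload["total"]),
)
conn.commit()
cur.close()
conn.close()
```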
It's like artificial intelligence isn't intelligent at all but rather semi-useful for tedious, repetitive and non creative tasks. Who would have thought.
What's interesting is that I wouldn't really call any of the things you list software development. With the exception of the "testing starting point", they're mostly about translating from one programming language/API to another. Useful for sure, but not quite "the future of programming". Also, they all sound like the kind of thing that most "basic" models would do well, which means that the "thinking" models are a waste of money.
Finally, the productivity boost is significant from the perspective of the programmer, but I don't know how big it is from the perspective of the employer. Does this significantly shorten time-to-market?
Still fleshing out this idea, but it feels recently like LLMs are helping me "throw the first one away". Get the initial starting momentum on something with the LLM, continue to iterate until it mostly works, and then go in and ruthlessly strip out the weird LLMisms. Especially for tedium work where nothing is being designed, it's just transformations or boilerplate.
Scaffolding is another area where LLMs work great.
I want to create a new project using framework XYZ. I already know how to do it, but I don't remember how to set it up since I only do that once, or I don't know how to set up a class that inherits from the framework because I usually just copy it from another class in the same project. I can simply tell the bot to write the starting code and take it from there.
The sad thing is for a LOT of use cases an LLM is completely unnecessary. Like why do I even need an LLM for something like this? Why can't I just download a database of code examples, plug it into a search engine that appears in the sidebar, and then just type "new project XYZ" or "new class XYZ.foo" to find the necessary snippet? A lot of NPM frameworks have a setup script to get you started with a new project, but after that you are practically forced to use Google.
It's crazy that a problem that could be solved so easily with a local file search has been ignored for so long and the only solution has been something impossibly inefficient for the problem it's supposed to solve.
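Even a first cut of that local search is tiny. A sketch, assuming you've dumped snippets into a local folder and your sqlite build includes FTS5:

```python
import sqlite3
from pathlib import Path

# Index a local folder of snippets once, then query it offline.
db = sqlite3.connect("snippets.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS snippets USING fts5(name, body)")

for path in Path("snippets").glob("*.txt"):
    db.execute(
        "INSERT INTO snippets (name, body) VALUES (?, ?)",
        (path.stem, path.read_text(encoding="utf-8")),
    )
db.commit()

query = "new project XYZ"
for name, body in db.execute(
    "SELECT name, body FROM snippets WHERE snippets MATCH ? LIMIT 3", (query,)
):
    print(f"--- {name} ---\n{body}")
```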
That works as long as the LLM is up to date, or you really know the framework/tech well; otherwise you will be in a fair amount of pain with little understanding of how to reconcile what it got wrong.
> work you literally would have never bothered to do before done in 2 minutes.
That has been a nice outcome I didn't expect. Some of the trivial "nice to haves" I can get done by Claude, stuff I just don't have time for.
To your other points I agree as well, I think what's important isn't so much stuffing the context with data, but providing the context with the key insights to the problem you have in your head.
In your first example, perhaps the insight is knowing that you have a special snowflake data structure that needs to be explained first. Or another example is properly explaining historical context of a complex bug. Just saying "hey here's all the code please fix this error" yields less good results, if the problem space is particularly complex.
The article throws out a lot of potential issues with AI generated code, but doesn't stop for a moment to consider if solutions currently exist or might exist to these problems.
- Before LLMs, provided that your team did some of their due diligence, you could always expect to have some help when tackling new code-bases.
Has the author never worked on legacy code before?
- (oh, and it has forgotten everything about the initial writing process by then).
Does the author not think this can ever be solved?
- Because of the situation of a Bus Factor of zero that it creates, vibe coding is fundamentally flawed. That is, only until there is an AI that can generate 100% accurate code 100% of the time, and it is fed 100% accurate prompts.
Why does it require 100% accuracy 100% of the time? Humans are not 100% accurate 100% of the time and we seem to trust them with our code.
Yea, I found the article to be overly reductive. I work on a shared-ownership codebase where, at times, I am not going to be able to pull in the original author when I work on my branch.
At least having a partner explain some aspects is a huge unlock for me. Maybe the OP is shadowboxing a world with no humans, but no humans is already a status quo issue that my team and I face sometimes.
Hell, have they never had to maintain applications where the source code was lost? I haven't had to do it too often, and I wouldn't claim I'm very good at it, but on more than a handful of occasions I have had to decompile binaries to figure out what the fuck it was doing before I could write a wrapper around it to try and mitigate its unfixable issues.
The author was just saying that this will become the norm, not the exception. Ok, not this bad -- we can at least expect to have access to the AI-generated code for at least a little while longer. (I can imagine a future where AI programming gets just robust enough that some early adopter types will take their whiteboard sketches and prompts and prose test case descriptions, generate the code, compile it to a binary, then throw out the code. It's more agile! You can tweak the spec slightly and just turn the crank again!)
I agree with your first point, maybe AI will close some of those gaps with future advances, but I think a large part of the damage will have been done by then.
Regarding the memory of reasoning from LLMs, I think the issue is that even if you can solve it in the future, you already have code for which you've lost the artifacts associated with the original generation. Overall I find there's a lot of talk (especially in the mainstream media) about AI "always learning" when they don't actually learn anything new until a new model is released.
> Why does it require 100% accuracy 100% of the time? Humans are not 100% accurate 100% of the time and we seem to trust them with our code.
Correct, but humans writing code don't lead to a Bus Factor of 0, so it's easier to go back, understand what is wrong and address it.
If the other gaps mentioned above are addressed, then I agree that this also partially goes away.
> Regarding the memory of reasoning from LLMs, I think the issue is that even if you can solve it in the future, you already have code for which you've lost the artifacts associated with the original generation. Overall I find there's a lot of talk (especially in the mainstream media) about AI "always learning" when they don't actually learn anything new until a new model is released.
But this already exists! At work, our codebase is full of code where the original reasoning is lost. Sometimes someone has forgotten, sometimes the person who wrote it is no longer at the company, and so on.
> Correct, but humans writing code don't lead to a Bus Factor of 0, so it's easier to go back, understand what is wrong and address it.
But there are plenty of instances where I work with code that has a bus factor of 0.
The conclusion of your article is that vibe coding is "fundamentally flawed". But every aspect you've described about vibe coding has an analog in normal software engineering, and I don't think you would claim that is "fundamentally flawed".
> But there are plenty of instances where I work with code that has a bus factor of 0.
Do you think this is a problem?
As per my other replies, if all of these instances are in completely unimportant projects, then I could see you answering "no" (but I'd be concerned if you're spending a lot of time on unimportant things). If they are important, isn't the fact that knowledge about them has been lost indicative of a flaw in how your team/company operates?
I can't speak for the author, but I would definitely claim that having a bus factor of zero for any remotely-mission-critical piece of software is "fundamentally flawed", no matter the cause. I'd say the same for a bus factor of one in most settings.
I think that's moving goalposts. The original post never talks about vibe-coding mission-critical software - and I wouldn't advocate for that, either. The post says that all vibe coding is fundamentally flawed.
That's fair, and I agree with you that the generalizations in the article's conclusion go too far.
I added the "remotely-mission-critical" qualifier to capture additional nuance. Tolerance for a low bus factor should be inversely correlated with a project's importance. That wasn't explicitly stated in the article, but it seems uncontroversial, and I suspect the author would agree with me.
I recently joined a team with a very messy codebase. The devs were long gone, and even the ones maintaining it didn’t really understand large parts of the code. The bus factor was effectively zero.
What surprised me was how useful AI was. It helped me not only understand the code but also infer the probable intent behind it, which made debugging much faster. I started generating documentation directly from the code itself.
For me, this was a big win. Code is the source of truth. Developer documentation and even shared knowledge are often full of bias, selective memory, or the “Chinese whispers” problem where the story shifts every time it’s retold and never documented. Code doesn’t lie, it just needs interpretation. Using AI to cut through the noise and let the code explain itself felt like a net positive.
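The extraction step didn't need anything clever either; a stripped-down sketch of the kind of script I mean (the `legacy_app` path and markdown output are just placeholders):

```python
import ast
from pathlib import Path

# Pull module/function/class names and docstrings into a markdown outline,
# which then becomes the raw material for AI-assisted documentation.
lines = []
for path in sorted(Path("legacy_app").rglob("*.py")):
    tree = ast.parse(path.read_text(encoding="utf-8"))
    lines.append(f"## {path}")
    doc = ast.get_docstring(tree)
    if doc:
        lines.append(doc.splitlines()[0])
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            summary = (ast.get_docstring(node) or "undocumented").splitlines()[0]
            lines.append(f"- `{node.name}`: {summary}")

Path("CODE_MAP.md").write_text("\n".join(lines), encoding="utf-8")
```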
As a manager, I am considering enforcing a rule on my team that no README in any repo should ever go stale again: it should be near-trivial for every dev to ask Claude Code to read the existing README, read/interpret the code as it currently stands, read what's changed in the PR, then update the README as necessary. This does not mean Claude will be perfect or that engineers don't need to check that its summaries make sense (they do, and the human is always accountable for the changes at the end of the day); but it does mean that the typical amount of laziness we are all guilty of should no longer be a reason READMEs go stale.
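Mechanically this can be as dumb as a CI step that collects the diff plus the current README and hands them to the agent; the sketch below stops at writing the prompt to a file, since every agent CLI is wired differently and a human still reviews the result:

```python
import subprocess
from pathlib import Path

# Collect what the agent needs: the PR diff and the current README.
diff = subprocess.run(
    ["git", "diff", "origin/main...HEAD"], capture_output=True, text=True, check=True
).stdout
readme = Path("README.md").read_text(encoding="utf-8")

prompt = (
    "Given this diff and the current README, either reply 'NO CHANGE' or "
    "output a fully updated README.\n\n"
    f"--- DIFF ---\n{diff}\n--- README ---\n{readme}"
)

# Placeholder step: pipe this prompt into whatever agent CLI your team uses
# (Claude Code, etc.); a human still reviews the resulting README before merge.
Path("readme_update_prompt.txt").write_text(prompt, encoding="utf-8")
```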
Why have such a rule if at any moment the LLM could update the README ad hoc? Btw, your ingested READMEs will affect your LLM's code generation, and I've observed that more often than not it is better to exclude the READMEs from the context window.
The Bus Factor was an issue long before LLM-generated code. Very few companies structure work to allow a pool of >1 individuals to understand/contribute to it. What I found is -- when companies are well structured with multiple smart individuals per area, the output expectation just ends up creeping up until again there is too much to really know. You can only get away from this with really good engineering management that specifically tries to move people around the codebase and trade-off speed in the process. I have tried to do this, but sometimes pressure from the stakeholders for speed is just too great to do it perfectly.
Shameful plug: I've been writing a book on this with my retrospective as a CTO building like this. I just updated it so you can choose your price (even $0) to make this a less shameful plug on HN: https://ctoretrospective.gumroad.com/l/own-your-system
I don't think anyone has the perfect answer yet, but LLM-built systems aren't that different from having the system built by 10 different people on eLance/Upwork/Fiverr... so the principles are the same.
The Bus Factor was indeed an issue before LLMs, and in fact it's a jargon term that has been in use since forever.
What TFA is arguing is that never before have we had a trend towards Bus Factor zero. Before, the worst was often 1 (occasionally zero, of course, but now TFA argues we're aiming for zero whether we're aware of it or not).
True, but when the bus factor is 1, it might as well be zero -- soon you end up with employees (or contractors) who legitimately want more compensation realizing their critical nature. I totally sympathize from the employee's perspective, esp if the 1-factor means they cannot take holiday. Really, it is the company's job to control the bus factor (LLM or human) -- it is good for both the employee and company in the long run.
Agreed, it's the company's job to control the Bus Factor, that's a given. I think TFA's author worries that instead of controlling it, we're now aiming for zero (the worst possible factor).
When I was an architect for startups between 2016-2020 doing mostly green field development using new to the company AWS technologies, I made damn sure that any knowledge was disseminated so I could both take a vacation without being interrupted and I could “put myself out of a job”.
I considered it a success when I realized a company doesn’t need me anymore and I can move on and talk about what I did at my next interview in STAR format.
Agree, and also, promotion is hard if you are too tied to a specific system. Diagonal cross-department promotion becomes especially hard if you are a single point of failure.
But a Bus Factor of 1 has always been considered high risk. Sometimes companies take the risk, but that's a different issue.
This is precisely why the term "Bus Factor" was invented: to point out when it's 1, because it's both high risk to the company and unfair to the dev that cannot go on vacation or extended time off.
A Bus Factor of 1 has always been construed as high risk; that's why the term exists after all. Companies sometimes mitigate it, sometimes not, but in general they are vaguely aware it's a risk.
A Bus Factor of 0, especially as an implicit goal, seems doubly worrisome! Now it's a goal rather than a warning sign.
Yes, it's very much a goldfish problem, where work needed grows to fill what is possible, not what is advisable or good.
The only way I have seen people "solve" this is by putting a bunch of speed bumps in a process, and generally it just makes everyone lazy and deliver stuff at the last second anyway, not use the additional time to make something polished.
>> The only way I have seen people "solve" this is by putting a bunch of speed bumps in a process
I solve this by sufficient compartmentalization with good inter-component interfaces. Worst case, you excise part of your system and rebuild. Possibly you can take the schema and docs and rebuild with an LLM :-)
I talk about this in my upcoming book on the topic (link above.) Most good systems are rebuilt 3 or 4 times anyway.
There is a really important point here, and it's critical to be aware of it, but we're really just at the beginning of these tools and workflows and these issues can be solved, IMO, possibly better than with humans.
I've been trying to use LLMs to help code more, to mixed success, honestly, but it's clear that they're very good at some things, and pretty bad at others. One of the things they're good at, obviously, is producing lots of text; two important others are that they can be very persistent and thorough.
Producing a lot of code can be a liability, but an LLM won't get annoyed at you if you ask it for thorough comments and updates to docs, READMEs, and ADRs. It'll "happily" document what it just did and "why" - to the degree of accuracy it's able, of course.
So it's conceivable to me at least, that with the right guidance and structure an LLM-generated codebase might be easier to come into cold years later, for both humans and future LLMs, because it could have excellent documentation.
There is a really important point here, and it's critical to be aware of it, but we're really at the end of these tools and workflows and these issues can't be solved.
The problem is that our brains really don't like expending calories on information we don't repeatedly use so the further you get from something, the less you understand it or remember it.
So even if you aren't even vibe coding and are trying to review every change, your skills are atrophying. We see this all the time as engineers enter management, they become super competent at the new skills the role requires but quickly become pretty useless at solving technical problems.
It's similar to why it's so hard to go from level 2 to level 5 in driving automation. We're not really designed to be partially involved in a process - we quickly lose attention, become lazy, and blindly trust the machine. Which is maybe fine if the machine is 100% reliable, but we know that isn't the case.
I find that bus factor 0 code occurs due to lack of paradigmatic structure or adherence, both internal and external. If you have a paradigm, even a not so great paradigm, I will grok your code and start making changes on my first day.
So maybe that is a test, or something to strive for. If you can get a new developer to make a moderate change to your code base on their first day successfully, then your code is fine. If not you have work to do.
To expand a little on this thought, any one developer's experience is probably not significant but if you hire say 10 new devs and 6-8 are making changes on their first day or in the first few days, your code base is likely fine.
I recommend that a new dev should first make the most trivial change possible but see it through all the way to release, to expose them to process. Following that, a moderate change to expose them to the paradigm. All on the first day or in the first few days. If only say 3 of 10 new hires can accomplish the above, the problem is in your code base (or hiring practices).
> I think the article underestimates how much intent can be grasped from code alone. Even without comments.
I agree, the human thought process always ends up getting embedded in the choice of which of several possible ways any one thing gets done. But inferring it is still a process, and a vastly inferior one to having a knowledgeable person on hand. Reverse-engineering has up until now been reserved for times when it is necessary. (I mean, we all still do it, especially on legacy codebases, but it's not good for productivity at all.)
> Humans (and I strongly suspect LLMs, since they're statistical synthesis of human production) are fairly predictable.
I disagree with the parenthetical. That's what stands out to me the most about LLM code: there are definitely intentions embedded in the code, but they're a hodgepodge of different intentions all tangled up together. You can't infer much of anything from a small snippet, because the intention behind that snippet was likely relevant to a completely different origin codebase that was piled up into a compost heap to grow an LLM out of. It's actually harder to make sense of an LLM codebase because the intentions distract and confuse the reader -- just as with text, we implicitly assume that a generated artifact means the same thing as if a human wrote it, and it doesn't.
> I think the article underestimates how much intent can be grasped from code alone.
That's very scale related.
I rarely have any trouble reading and understanding Arduino code. But that's got a hard upper limit (at least on the common/original Arduinos) of 32kB of (compiled) code.
It's many weeks or months worth of effort, or possibly impossible, for me to read and understand a platform with a hundred or so interdependent microservices written in several languages. _Perhaps_ there was a very skilled and experienced architect for all of that, who demanded comprehensive API styles and docs? But if all that was vibe coded and then dropped on me to be responsible? I'd just quit.
No disrespect for your ability to read Arduino code, but no amount of experience will tell you why the code was written a certain way. Did the programmer not know of any alternatives? Did they specifically choose this one method because it was superior to the alternatives? Did they just run out of time? Is it due to regulatory requirements? The list goes on. To expand further:
FTA:
> but ultimately reading code remains much more complex than writing it no matter what.
I disagree. If reading code is complex, it's because that code was not documented well. If you've written a complex algorithm, that presumably took you hours or days to develop, the proper documentation should allow somebody to understand it (or at least grasp the major points) in a few minutes.
If you're not documenting your code to that level, i.e. to allow future devs to take less time to read and understand it than it took you to write it (let alone adding the information about why you made the decisions you did), then you're doing something wrong.
I agree with the premise and the conclusion, but over almost 20 years of writing, adapting and delivering software I've more than once been in exactly the same situation. No one to ask, the only person even vaguely familiar with software development left half a year ago. Half of the processes have changed since the software was written, and the people who owned them have left, too.
So while I agree that LLMs will accelerate this process, in my opinion it's not a new flavor, just more of an existing problem. Glad to see this kind of thinking though.
The author though neglects what a bus factor of 0 means in real terms and how it gets there, aside from the description of the definition upfront where all knowledge is lost.
A company that accepts a bus factor of zero is a company that is not willing to pay for the expertise required to do the work.
The economic demand for humans competing with AI is zero, because AI does the things it's good at with an order-of-magnitude difference in cost, and the deception and lies surrounding the marketing, coupled with communications-channel jamming, lead to predictable outcomes. What happens when that demand, and thus the economic benefit, goes to zero? Any investment in the knowledge in the first place has no return. No one goes into it, no one learns, and that's quite dangerous in economies based on money-printing.
So while there may not be a problem right now, there will no doubt be problems in the next proverbial quarter. Career development pipelines are years in the making. They are sequential pipelines. Zero into a sequential pipeline is zero out with the time-lag between the two (which is years).
~2 years without economic incentive is when you lose your best and brightest. From there it is a slow march to a 10 year deadline after which catastrophic losses occur.
I had a chance to have an interesting discussion with a local community college Dean. Apparently they have had to lower the number of Computer Science related program sections because of lack of demand. In intro courses, in 18 sections there were 13 people who had declared the major, with most students, when asked, citing AI concerns and the lack of a career development pipeline.
What happens when you have no expertise that you can pay at any price to fix the processes involved, because you put them all out of business using what amounts to a semi-thinking non-sentient slave?
Without supply, there can be no demand. Where supply becomes infinite because of external parties, there can be no demand. There can be need, but demand is not need.
So this all started in 2022. Best and brightest are re-skilling. There's a glut of lower competency talent too. Bad things happen when you mess with the foundations of economics, and they happen on a time lag, where you can't react fast enough after-the-fact.
What will it take? At some point there will be a crisis where they will have to treat it as triage on the battlefield. The people in charge didn't want to hear about this when it could have made a difference.
As programmers, the bus factor is something to be noted and avoided, but in medicine, it goes the other direction. Private practice is one doctor, and a whole support staff for that single individual. Why are we so eager to be replaceable?
Doctors keep patient notes, and EHRs and patients can recall histories separately to that doctor. Doctors also go through relatively standardised training. That single individual is important, but it's not the same.
A closer analogy would be if they were practicing healthcare on a species that they had invented, that no one else knew anything about, and that was crucial to a company's survival.
It's potentially the opposite. If you instrument a codebase with documentation and configuration for AI agents to work well in it, then in a year, that agent will be able to do that same work just as well (or better with model progress) at adding new features.
This assumes you're adding documentation, tests, instructions, and other scaffolding along the way, of course.
I wonder how soon (or if it's already happening) that AI coding tools will behave like early career developers who claim all the existing code written by others is crap and go on to convince management that a ground up rewrite is required.
(And now I'm wondering how soon the standard AI-first response to bug reports will be a complete rewrite by AI using the previous prompts plus the new bug report? Are people already working on CI/CD systems that replace the CI part with whole-project AI rewrites?)
As the cost of AI-generated code approaches zero (both in time and money), I see nothing wrong with letting the AI agent spin up a dev environment and take its best shot. If it can prove with rigorous testing that the new code works, is at least as reliable as the old code, and is written better, then it's a win/win. If not, delete that agent and move on.
On the other hand, if the agent is just as capable of fixing bugs in legacy code as rewriting it, and humans are no longer in the loop, who cares if it's legacy code?
But I can see it "working". At least for the values of "working" that would be "good enough" for a large portion of the production code I've written or overseen in my 30+ year career.
Some code pretty much outlasts all expectations because it just works. I had a Perl script I wrote in around 1995-1998 that ran from cron and sent email to my personal account. I quit that job, but the server running it got migrated to virtual machines and didn't stop sending me email until about 2017 - at least three sales or corporate takeovers later (It was _probably_ running on CentOS4 when I last touched it in around 2005, I'd love to know if it was just turned into a VM and running as part of critical infrastructure on CentOS4 12 years later).
But most code only lasts as long as the idea or the money or the people behind the idea last - all the website and differently skinned CRUD apps I built or managed rarely lasted 5 years without being either shut down or rewritten from the ground up by new developers or leadership in whatever the Resume Driven Development language or framework was at the time - toss out the Perl and rewrite it in Python, toss out the Python and rewrite it in Ruby On Rails, then decide we need Enterprise Java to post about on LinkedIn, then rewrite that in Nodejs, now toss out the Node and use Go or Rust. I'm reasonably sure this year's or perhaps next years LLM coding tools can do a better job of those rewrites than the people who actually did them...
Will the cost of AI-generated code approach zero? I thought the hardware and electricity needed to power and train the models and run inference was huge and only growing. Today the free and plus plans might be only $20/month; once moats are built I assume prices will skyrocket an order of magnitude (or a few) higher.
I used a similar metaphor in the past referencing "The Machine Stops" [0] by E.M. Forster. Yes, in the near future, we will still be able to read code and figure out what it does. I work on legacy code all the time.
But in the long term, when experienced developers actually feel comfortable letting LLMs write large swaths of code, or when the machine no longer needs to generate human-readable code, then we will start forgetting how it works.
A good dev can dive into a completely unknown codebase with or without tools like a debugger and figure it out. AI makes this far easier.
Some great devs/reverse-engineering experts can do the same without even the source code, just a compiled binary. Again, AI tools can now do this faster than any human.
Security researchers have figured out the intricacies of a system with no more than a single string as input and an error code as output.
I find myself acting as a brutal code reviewer more than a collaborator when I lean too heavily on an agent. I literally just typed this into the agent's chat pane (GPT-5, in this case), after finding some less-than-optimal code for examining and importing REST API documentation.
> Testing string prefixes or file extensions is bound to fail at some point on some edge case. I'd like to see more robust discovery of formats than this. This reeks of script-kiddie code, not professional-quality code.
It's true more often than I'd like that the quality of code I see generated is script-kiddie level. If I prompt carefully beforehand or review harshly after, it generally improves, but I have to keep my guard up.
I’ve got a new project I’ve been handling with Claude code. Up until now I’ve always pair coded with AIs, so I would know (and usually tweak) every bit of code generated. Now with the agent, it’s easy to miss what’s being made.
I've been trying to resolve this with things like "make a notebook that walks through this module's functions", etc, to make it easier for me to review.
In the spirit of literate programming though, why not have these agents spend more time (tokens... money) walking you through what they made?
Likewise, if dev A vibe codes something and leaves it to dev B to maintain, we should think about what AI workflows can get B up to speed fast. “Give me a tour of the code”
I think it is negative: it actually drains knowledge. It is an anti knowledge field because experts won’t be hired if they can be vibed. This sucks all the brains out of the room. Hence less than zero.
> The only thing you can rely on is on your ability to decipher what a highly imperfect system generated, and maybe ask explanations to that same imperfect system about your code its code (oh, and it has forgotten everything about the initial writing process by then).
This just sounds like every other story I hear about working on ossified code bases as it is. At least AI can ingest large amounts of code quickly, even if as of today it can't be trusted to actually decipher it all.
This concept of deploying unreviewed vibe code strikes me as very similar to using a fallen log as a bridge to cross a ravine. Yes, it works, until the day it doesn't. And that day is likely to be much sooner than if it had been a concrete-reinforced steel design signed out by a PE.
aw. I was hallucinating the article content from the title. Bus factor (aka truck factor, lottery factor, honeymoon number, etc.) is the number of team members you could lose, hopefully due to positive life events, before the project falls apart. The author argues this could be zero with vibecoded projects, meaning the project could spontaneously fall apart even with all members still working on it.
You want this factor to be +Inf, not 1/+Inf. Just in case it wasn't beyond abundantly clear to all...
LLMs are bad for bad programmers. LLMs will make a bad programmer worse and make a layperson think they're a prodigy.
Meanwhile, the truly skilled programmers are using LLMs to great success. You can get a huge amount of value and productivity from an LLM if and only if you have the skill to do it yourself in the first place.
LLMs are not a tool that magically makes anyone a good programmer. Expecting that to be the case is exactly why they don't work for you. You must already be a good programmer to use these tools effectively.
I have no idea what this will do to the rising generation of programmers and engineers. Frankly I'm terrified for them.
The flaw in this reasoning is AI can also help you understand code much more quickly than we could before. We are now in fractional bus factor territory.
A team unfamiliar with a code base demoed asking questions to an LLM about it. The answers genuinely excited some. But anyone who had spent a short time in the code base knew the answers were wrong. Oh well.
That is one anecdote, but it doesn't really have any information in it. To debug the process we'd need to know which LLM, the developers' backgrounds, what prompts they used, etc.
I've used a variety of LLMs to ask questions about probably dozens of unfamiliar code bases many of which are very complicated technically.
At this point I think LLMs are indispensable to understanding unfamiliar code bases. Nearly always much better than documentation and search engines combined.
It's fascinating... I didn't think about the Bus Factor at all wrt vibe coding. Feels obvious in retrospect. But I feel there's the other side of software beyond the maintainable, professional-grade software requirements. There are a lot of use cases for basic software to solve that one problem in that one specific way and get it over with. A bit like customized software with little scope and little expectation of long-term support. Vibe-coding excels there.
In a way, I have been thinking about it [1] as the difference between writing a book and writing a blog post - the production qualities expected in both are wildly different. And that's expected, almost as a feature!
I think as “writing” and distributing new software keeps getting easier - as easy as writing a new blog post - the way we consume software is going to change.
> Before LLMs, provided that your team did some of their due diligence, you could always expect to have some help when tackling new code-bases. Either a mentor, or at least some (even if maybe partially outdated) documentation. With LLMs, this is gone.
I love the conclusion. When no human holds the knowledge it is like the bus already struck everybody at the company.
I have worked in teams that share knowledge often and extensively. Anybody can go on a vacation with little disruption as other's can take the tasks. Everybody is happier and projects work better.
(If your first thought is that you can be replaced easily and you will be fired, then you live in a dystopian class-warfare country where the owner-class will fire you because they enjoy making the working-class suffer. I am sorry for you, but have hope. That can be changed with good laws and employee protections.)
The folks that think they can now suddenly program without any experience, and without needing to understand how their product works, are suffering from Dunning-Kruger syndrome. Actually, it is a much broader segment and includes product managers, executives, VCs and the general public.
The project foundation is everything. LLMs are sensitive to over-engineering. The LLM doesn't have an opinion about good code vs bad code.
If you show it bad code and ask it to add features on top, it will produce more bad code... It might work (kind of) but more likely to be buggy and have security holes. When the context you give to the LLM includes unnecessary complexity, it will assume that you want unnecessary complexity and it will generate more of it for you.
I tried Claude Code with both a bad codebase and a good codebase; the difference is stark. The first thing I notice is that, with the good code base without unnecessary complexity, it generates a lot LESS code for any given feature/prompt. It's really easy to review its output and it's very reliable.
With a bad, overengineered codebase, Claude Code will produce complex code that's hard to review... Even for similar size features. Also it will often get it wrong and the code won't work. Many times it adds code which does literally nothing at all. It says "I changed this so that ..., this should resolve the issue ..." - But then I test and the issue is still there.
Some bad coders may be tempted to keep asking Claude to do more to fix the issue and Claude keeps adding more mess on top. It becomes a giant buggy hack and eventually you have to ask it to rewrite a whole bunch of stuff because it becomes way too complicated and even Claude can't understand its own code... That's how you get to bus factor of 0. Claude will happily keep churning out code even if it doesn't know what it's doing. It will never tell you that your code is unmaintainable and unextendable. Show it the worst codebase in the world and it will adapt itself to become the worst coder in the world.
Unfortunately the corporate machine has been converging on a bus factor of 0. I've been part of multiple teams now where I was the only one holding knowledge over critical subsystems and whenever I attempted to train people on it, it was futile. Mainly because they would get laid off doing 'cost-savings measures'.
There were times where I was close to getting fed up and just quitting during some of the high profile ops I had to deal with which would've left the entire system inoperable for an extended period of time. And frankly from talking to a lot of other engineers, it sounds like a lot of companies operate in this manner.
I fully expect a lot of these issues to come home to roost as AI compounds loss of institutional knowledge and leads to rapid system decay.
My guess? The AI companies will keep the free and $20/month plans to entice developers and their managers. They will have $200/month plans with bigger context windows to allow effective work on larger-than-toy codebases. But sooner or later companies with large scale projects are going to need a much larger context window, and _that_ will suddenly become a $200k/year/developer subscription. There's a lot of correlation between "institutional knowledge" and context window, I think.
Interesting. I just posted a similar comment as a sister comment to yours above (at least at the time of reading the thread), replying to another person's comment about the cost of AI code going to zero... which was basically the same as what you believe here.
just not even ten years ago the discussion here was all about "software engineering" trying to be more legitimized as a formal engineering practice, if there should be licensing, if there should be certifications, lots of threads about formal methods to prove algorithms work, and look where we are now. Arguing if humans should even care look at the code we are producing and shipping. crazy shit man
Pretty sure that was 25 years ago, not 10 years ago. And then, 20 years ago, we were coming to terms with the fact that all of that stuff was a spectacular failure. And then, 15 year ago, we had found much better ways to do things.
I don't miss that phase of the evolution of software development practice at all.
I'm not talking about UML or waterfall. Talking more about formal methods which were still pretty common here amongst the various lisp/haskell/clojure discussions etc. I'm pretty sure this is still a relevant technique for certain classes of software.
By that same logic, if a project is documented so thoroughly that an agent could handle all the work, then the bus factor effectively becomes infinite.
The difference is this isn't some legacy system that still exists a decade later. It's brand new with the tag still on. And it wasn't designed by a conscious being but by probability.
I've seen everything from beautiful to crazy legacy systems in various domains. But when I encounter something off, there appears to always be a story. Not so much with LLMs.
> I ended up doing this when switching our 50k-LOC codebase to pnpm workspaces
That's not even a very large code base. My experience is definitely that anything with more than 100k LOC really makes the LLMs struggle.
> If you code in a style that allows an LLM to navigate the codebase via just an `ls` output it can handle things deftly.
This has the side benefit of likely being easier to navigate for humans too. The less I need to keep in my head to figure something out the better.
"bam work you literally would have never bothered to do before done in 2 minutes."
And I would never want to use a piece of software written by you ever.
If you think that writing the code was the hard part, your code was probably always shite.
Yeah well you definitely already do and don't know it so please spare us the pearl clutching.
*"to vibe out"
> claude had decided that to make the unit test pass, it would not patch the bug but it would simply not exercise the failing unit test
It has a strong preference to only change pieces of code you asked it to touch.
There's also a middle ground, where you have the AI generate PR reviews and then review them manually. So that 2 minutes of code you spat out (really more like 5-10 using CC) takes another hour or three to review, and maybe 5 to 10 more commits before it's merged in.
I've done this successfully on multiple projects in the 10-20k LOC, ~100 file area - fully LLM generated w/ tons of my personal feedback - and it works just fine. 4/5 features I implement it gets pretty close to nailing from the spec I provide and the work is largely refactoring. But the times it doesn't get it right, it is a slog that could eat the better part of a day. On the whole though it probably is a 3-5x speedup.
I'm a little less confident about doing this on projects that are much bigger... then breaking things up into modules begins to look more attractive.
It's definitely a middle ground, but PR reviews are not perfect. So it's easy to miss a lot of things and to have a lot of extra baggage. From reviewing code it's not always easy to tell exactly what's necessary or duplicated. So I agree, this is a middle ground of using LLMs to be more productive. Removing one bad line of code is worth adding a hundred good lines of code.
> If you're using llms to shit out large swathes of unreviewed code you're doing it wrong
> bam, x days work done in 2 minutes
This is a bit of a misrepresentation, since those two minutes don’t account for the reviewing time, which, done properly, vastly exceeds that time. Otherwise you end up in the situation of “doing it wrong” described in your first paragraph.
Most of these cases don't require "review". It either works or it doesn't.
If you have an LLM transform a big pile of structs, you plug them into your program and it will either compile or it won't.
All programmers write countless one-off throwaway scripts. I can't tell you how many times I've written scripts to generate boring boilerplate code.
How many hours do you spend reviewing such tools and their output? I'll bet anything it's just about zero.
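For a made-up but representative example, the sort of throwaway generator I mean is a dozen lines you run once, paste the output, and delete:

    # Hypothetical one-off: stamp out boring dataclass boilerplate from a table list,
    # print it to stdout, paste it into the codebase, then throw this script away.
    TABLES = {
        "user": ["id", "email", "created_at"],
        "invoice": ["id", "user_id", "total_cents"],
    }

    print("from dataclasses import dataclass\n")
    for name, fields in TABLES.items():
        print("@dataclass")
        print(f"class {name.title()}:")
        for field in fields:
            print(f"    {field}: str")
        print()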
What do you mean "reviewing" throwaway tools and scripts? If you wrote them yourself, presumably you understand what they do?
I've also spent countless hours debugging throwaway scripts I wrote myself and which don't work exactly like I intended when I try them on test data.
Working in aerospace, code generation tools are indeed reviewed pretty thoroughly.
The implication with that example was it's some editor thing for use during the dev process separate from the actual product, so it doesn't matter if it's disposable and unmaintainable as long as it does the thing you needed it for. If the tool becomes an integral part of your workflow later on you stop and do it properly the second time around.
It’s not a misrepresentation, they’re saying the time it would take to write the code has been reduced to two minutes, not the reviewing and everything else (which still takes just as long)
Reviewing code you didn't write takes much longer than reviewing code you did.
Reviewing code another person wrote also takes longer than code I wrote. Hell reviewing code I wrote six months ago might as well be someone else’s code.
My job right now depending on the week is to either lead large projects dealing with code I don’t write or smaller “full stack” POCs- design, cloud infrastructure (IAC), database, backend code and ETL jobs and rarely front end code. Even before LLMs if I had to look at a project I did it took me time to ramp up.
> Reviewing code another person wrote also takes longer than code I wrote.
Yes, and water is wet, but that's not exactly relevant. If you have an LLM generate slop at you that you have to review and adjust, you need to compare the time this whole process took you rather than just the "generating slop" step to the time needed to write the code by yourself.
It may still save you time, but it won't be anywhere close to 2 minutes anymore for anything but the most trivial stuff.
I have been developing a long time - 10 years as a hobbyist and 30 years professionally. For green field work especially, since all of the code I write these days is around the AWS SDKs/CDKs, I find the code is just as structured as what I would write.
The only refactoring I ended up doing on my current project is extracting functions from a script and creating a library that I reused across other functionality.
Even then I just pasted the library into a new ChatGPT session and told it the requirements of my next piece of functionality and told it to use the library.
I don’t trust an LLM to write more than 200 lines of code at a time. But I hardly ever write more than 200-300 lines at a time.
I can tell you that my latest project has around 1000 lines of Node CDK code between multiple apps (https://aws.amazon.com/cdk/) and around 1000 lines of Python code. I didn’t write a single line of any of it by hand, and from reviewing it, it didn’t make any choices that I wouldn’t make; I also found some of the techniques it used for the CDK code were things I wouldn’t have thought about.
The SQL code it generated for one of the functions was based on my just giving it the inbound JSON and the create table statements, and it produced idiomatic MySQL, with parameters (i.e. no SQL injection risk) and no unsafe code.
This was a 3 week project that I would have needed at least one if not two junior/mid level devs to do without Gen AI, since I also had to be in customer meetings, write documentation and help sales on another project coming up.
It's like artificial intelligence isn't intelligent at all but rather semi-useful for tedious, repetitive and non creative tasks. Who would have thought.
What's interesting is that I wouldn't really call any of the things you list software development. With the exception of the "testing starting point", they're mostly about translating from one programming language/API to another. Useful for sure, but not quite "the future of programming". Also, they all sound like the kind of thing that most "basic" models would do well, which means that the "thinking" models are a waste of money.
Finally, the productivity boost is significant from the perspective of the programmer, but I don't know how big it is from the perspective of the employer. Does this significantly shorten time-to-market?
Personal favorite of mine - I want to switch data APIs but I don't have time to port 2 different services, so here's their documentation. BAM. Done.
Still fleshing out this idea, but it feels recently like LLMs are helping me "throw the first one away". Get the initial starting momentum on something with the LLM, continue to iterate until it mostly works, and then go in and ruthlessly strip out the weird LLMisms. Especially for tedium work where nothing is being designed, it's just transformations or boilerplate.
Scaffolding is another area where LLMs work great.
I want to create a new project using framework XYZ. I already know how to do it, but I don't remember how to set it up since I only do that once, or I don't know how to set up a class that inherits from the framework because I usually just copy it from another class in the same project. I can simply tell the bot to write the starting code and take it from there.
The sad thing is that for a LOT of use cases an LLM is completely unnecessary. Like why do I even need an LLM for something like this? Why can't I just download a database of code examples, plug it into a search engine that appears in the sidebar, and then just type "new project XYZ" or "new class XYZ.foo" to find the necessary snippet? A lot of NPM frameworks have a setup script to get you started with a new project, but after that you are practically forced to use Google.
It's crazy that a problem that could be solved so easily with a local file search has been ignored for so long and the only solution has been something impossibly inefficient for the problem it's supposed to solve.
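Something like this is really all it would take - a minimal sketch, assuming the snippets just live as plain files under a ~/.snippets folder (the name and layout are made up):

    #!/usr/bin/env python3
    # Minimal local snippet lookup: no LLM, just a folder of example files and substring search.
    # Assumed layout: one snippet per file, e.g. ~/.snippets/new-project-xyz.md
    import sys
    from pathlib import Path

    SNIPPET_DIR = Path.home() / ".snippets"

    def find(query: str) -> None:
        terms = query.lower().split()
        for path in sorted(SNIPPET_DIR.glob("*")):
            text = path.read_text(errors="ignore")
            if all(t in (path.name + " " + text).lower() for t in terms):
                print(f"--- {path.name} ---")
                print(text)

    if __name__ == "__main__":
        find(" ".join(sys.argv[1:]))  # e.g. `python snippets.py new project xyz`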
This works as long as the LLM is up to date or you really know the framework/tech well; otherwise you will be in a fair amount of pain with little understanding of how to reconcile what it’s got wrong.
> work you literally would have never bothered to do before done in 2 minutes.
That has been a nice outcome I didn't expect. Some of the trivial "nice to haves" I can get done by Claude, stuff I just don't have time for.
To your other points I agree as well, I think what's important isn't so much stuffing the context with data, but providing the context with the key insights to the problem you have in your head.
In your first example, perhaps the insight is knowing that you have a special snowflake data structure that needs to be explained first. Or another example is properly explaining historical context of a complex bug. Just saying "hey here's all the code please fix this error" yields less good results, if the problem space is particularly complex.
YES, this is the way to make AI tools pretty much a strictly positive productivity tool on large codebases.
The article throws out a lot of potential issues with AI generated code, but doesn't stop for a moment to consider if solutions currently exist or might exist to these problems.
- Before LLMs, provided that your team did some of their due diligence, you could always expect to have some help when tackling new code-bases.
Has the author never worked on legacy code before?
- (oh, and it has forgotten everything about the initial writing process by then).
Does the author not think this can ever be solved?
- Because of the situation of a Bus Factor of zero that it creates, vibe coding is fundamentally flawed. That is, only until there is an AI that can generate 100% accurate code 100% of the time, and it is fed 100% accurate prompts.
Why does it require 100% accuracy 100% of the time? Humans are not 100% accurate 100% of the time and we seem to trust them with our code.
Yea, I found the article to be overly reductive. I work on a shared-ownership codebase where, at times, I'm not going to be able to pull in the original author when I work on my branch.
At least having a partner explain some aspects is a huge unlock for me. Maybe the OP is shadowboxing a world with no humans, but no humans is already a status quo issue that my team and I face sometimes.
Hell, have they never had to maintain applications where the source code was lost? I haven’t had to do it too often, and I wouldn’t claim I’m very good at it, but on more than a handful of occasions I have had to decompile binaries to figure out what the fuck they were doing before I could write a wrapper around them to try and mitigate their unfixable issues.
The author was just saying that this will become the norm, not the exception. Ok, not this bad -- we can at least expect to have access to the AI-generated code for at least a little while longer. (I can imagine a future where AI programming gets just robust enough that some early adopter types will take their whiteboard sketches and prompts and prose test case descriptions, generate the code, compile it to a binary, then throw out the code. It's more agile! You can tweak the spec slightly and just turn the crank again!)
Hello, author here. Thanks for your comment.
I agree with your first point, maybe AI will close some of those gaps with future advances, but I think a large part of the damage will have been done by then.
Regarding the memory of reasoning from LLMs, I think the issue is that even if you can solve it in the future, you already have code for which you've lost the artifacts associated with the original generation. Overall I find there's a lot of talk (especially in the mainstream media) about AI "always learning" when they don't actually learn anything new until a new model is released.
> Why does it require 100% accuracy 100% of the time? Humans are not 100% accurate 100% of the time and we seem to trust them with our code.
Correct, but humans writing code don't lead to a Bus Factor of 0, so it's easier to go back, understand what is wrong and address it.
If the other gaps mentioned above are addressed, then I agree that this also partially goes away.
> Regarding the memory of reasoning from LLMs, I think the issue is that even if you can solve it in the future, you already have code for which you've lost the artifacts associated with the original generation. Overall I find there's a lot of talk (especially in the mainstream media) about AI "always learning" when they don't actually learn anything new until a new model is released.
But this already exists! At work, our codebase is full of code where the original reasoning is lost. Sometimes someone has forgotten, sometimes the person who wrote it is no longer at the company, and so on.
> Correct, but humans writing code don't lead to a Bus Factor of 0, so it's easier to go back, understand what is wrong and address it.
But there are plenty of instances where I work with code that has a bus factor of 0.
The conclusion of your article is that vibe coding is "fundamentally flawed". But every aspect you've described about vibe coding has an analog in normal software engineering, and I don't think you would claim that is "fundamentally flawed".
To make sure I understand your position:
> But there are plenty of instances where I work with code that has a bus factor of 0.
Do you think this is a problem?
As per my other replies, if all of these instances are in completely unimportant projects, then I could see you answering "no" (but I'd be concerned if you're spending a lot of time on unimportant things). If they are important, isn't the fact that knowledge about them has been lost indicative of a flaw in how your team/company operates?
I can't speak for the author, but I would definitely claim that having a bus factor of zero for any remotely-mission-critical piece of software is "fundamentally flawed", no matter the cause. I'd say the same for a bus factor of one in most settings.
I think that's moving goalposts. The original post never talks about vibe-coding mission-critical software - and I wouldn't advocate for that, either. The post says that all vibe coding is fundamentally flawed.
That's fair, and I agree with you that the generalizations in the article's conclusion go too far.
I added the "remotely-mission-critical" qualifier to capture additional nuance. Tolerance for a low bus factor should be inversely correlated with a project's importance. That wasn't explicitly stated in the article, but it seems uncontroversial, and I suspect the author would agree with me.
Why would the damage have been done by then? What does that even mean?
How is that any different from any other legacy code where the reasons for the decisions made have long been forgotten?
I _wish_ I had an AI tool when digging into some legacy codebases in my past.
"Just ask the author of that perl file".
Sure, it was last edited in '97 and the author is the regional manager now, should I just book a meeting or...?
> Why does it require 100% accuracy 100% of the time?
I've said this before, but I'd say it again: anti-AI people, rather than AI users, are usually the ones who expect AI to be a magical panacea.
The vibe reminds me of some people who are against static typing because “it can’t catch logic errors anyway.”
No, we treat it as a bureaucratic, distracting and infantile way of software engineering that does not work.
I recently joined a team with a very messy codebase. The devs were long gone, and even the ones maintaining it didn’t really understand large parts of the code. The bus factor was effectively zero.
What surprised me was how useful AI was. It helped me not only understand the code but also infer the probable intent behind it, which made debugging much faster. I started generating documentation directly from the code itself.
For me, this was a big win. Code is the source of truth. Developer documentation and even shared knowledge are often full of bias, selective memory, or the “Chinese whispers” problem where the story shifts every time it’s retold and never documented. Code doesn’t lie, it just needs interpretation. Using AI to cut through the noise and let the code explain itself felt like a net positive.
As a manager, I am considering enforcing a rule on my team that no README in any repo should ever go stale again: it should be near-trivial for every dev to ask Claude Code to read the existing README, read/interpret the code as it currently stands, read what's changed in the PR, and then update the README as necessary. This does not mean Claude will be perfect or that engineers don't need to check that its summaries make sense (they do, and the human is always accountable for the changes at the end of the day); but it does mean that the ordinary laziness we are all guilty of should no longer be a reason READMEs go stale.
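A sketch of how that check might be wired up - assuming Claude Code's non-interactive `claude -p` mode and a made-up helper script; the exact CLI invocation and prompt are assumptions, and the engineer still reviews the resulting README diff before merging:

    #!/usr/bin/env python3
    # readme_check.py (hypothetical): feed the branch diff to the agent and ask it to
    # refresh README.md only where this change makes it stale.
    import subprocess

    def refresh_readme(base: str = "main") -> None:
        diff = subprocess.run(
            ["git", "diff", f"{base}...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout
        prompt = (
            "Read README.md and the current code, then review this diff and update "
            "README.md only where it has gone stale:\n\n" + diff
        )
        # Assumes `claude -p <prompt>` runs Claude Code non-interactively; treat this as an assumption.
        subprocess.run(["claude", "-p", prompt], check=True)

    if __name__ == "__main__":
        refresh_readme()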
Why have such a rule if at any moment of time the LLM could update the readme ad hoc? Btw, your ingested readmes will affect your LLM's code generation and I made the observation that more often than not it is better to exclude the readmes from the context window.
No LLM will by default touch a README.md
They will when you run /init, but after that they won't look at it unless directed to do so.
The Bus Factor was an issue long before LLM-generated code. Very few companies structure work to allow a pool of >1 individuals to understand/contribute to it. What I found is -- when companies are well structured with multiple smart individuals per area, the output expectation just ends up creeping up until again there is too much to really know. You can only get away from this with really good engineering management that specifically tries to move people around the codebase and trade-off speed in the process. I have tried to do this, but sometimes pressure from the stakeholders for speed is just too great to do it perfectly.
Shameful plug: I've been writing a book on this with my retrospective as a CTO building like this. I just updated it so you can choose your price (even $0) to make this a less shameful plug on HN: https://ctoretrospective.gumroad.com/l/own-your-system
I don't think anyone has the perfect answer yet, but LLM-built systems aren't that different from having the system built by 10 diff people on eLance/Upwork/Fiverr... so the principles are the same.
The Bus Factor was indeed an issue before LLMs, and in fact it's a jargon term that has been in use since forever.
What TFA is arguing is that never before have we had a trend towards Bus Factor zero. Before, the worst was often 1 (occasionally zero, of course, but now TFA argues we're aiming for zero whether we're aware of it or not).
True, but when the bus factor is 1, it might as well be zero -- soon you end up with employees (or contractors) who legitimately want more compensation realizing their critical nature. I totally sympathize from the employee's perspective, esp if the 1-factor means they cannot take holiday. Really, it is the company's job to control the bus factor (LLM or human) -- it is good for both the employee and company in the long run.
Agreed, it's the company's job to control the Bus Factor, that's a given. I think TFA's author worries that instead of controlling it, we're now aiming for zero (the worst possible factor).
Is there really a large difference between 0 and 1 when the average tenure of a software developer is 3 years or less at any given company?
> Is there really a large difference between 0 and 1 when the average tenure of a software developer is 3 years or less at any given company?
Spot on. 1 might as well be zero. Totally unfair to the worker also, who now cannot take time off.
When I was an architect for startups between 2016-2020 doing mostly green field development using new to the company AWS technologies, I made damn sure that any knowledge was disseminated so I could both take a vacation without being interrupted and I could “put myself out of a job”.
I considered it a success when I realized a company doesn’t need me anymore and I can move on and talk about what I did at my next interview in STAR format.
Agree, and also, promotion is hard if you are too tied to a specific system. Diagonal cross-department promotion becomes especially hard if you are a single point of failure.
But a Bus Factor of 1 has always been considered high risk. Sometimes companies take the risk, but that's a different issue.
This is precisely why the term "Bus Factor" was invented: to point out when it's 1, because it's both high risk to the company and unfair to the dev that cannot go on vacation or extended time off.
A Bus Factor of 1 has always been construed as high risk; that's why the term exists after all. Companies sometimes mitigate it, sometimes not, but in general they are vaguely aware it's a risk.
A Bus Factor of 0, especially as an implicit goal, seems doubly worrisome! Now it's a goal rather than a warning sign.
Yes, it's very much a goldfish problem, where work needed grows to fill what is possible, not what is advisable or good. The only way I have seen people "solve" this is by putting a bunch of speed bumps in a process, and generally it just makes everyone lazy and deliver stuff at the last second anyway, not use the additional time to make something polished.
>> The only way I have seen people "solve" this is by putting a bunch of speed bumps in a process
I solve this by sufficient compartmentalization with good inter-component interfaces. Worst case, you excise part of your system and rebuild. Possibly you can take the schema and docs and rebuild with an LLM :-)
I talk about this in my upcoming book on the topic (link above.) Most good systems are rebuilt 3 or 4 times anyway.
There is a really important point here, and it's critical to be aware of it, but we're really just at the beginning of these tools and workflows and these issues can be solved, IMO, possibly better than with humans.
I've been trying to use LLMs to help code more, to mixed success, honestly, but it's clear that they're very good at some things, and pretty bad at others. One of the things they're good at, obviously, is producing lots of text; two important others are that they can be very persistent and thorough.
Producing a lot of code can be a liability, but an LLM won't get annoyed at you if you ask it for thorough comments and updates to docs, READMEs, and ADRs. It'll "happily" document what it just did and "why" - to the degree of accuracy that it's able, of course.
So it's conceivable to me at least, that with the right guidance and structure an LLM-generated codebase might be easier to come into cold years later, for both humans and future LLMs, because it could have excellent documentation.
There is a really important point here, and it's critical to be aware of it, but we're really at the end of these tools and workflows and these issues can't be solved.
The problem is that our brains really don't like expending calories on information we don't repeatedly use so the further you get from something, the less you understand it or remember it.
So even if you aren't vibe coding and are trying to review every change, your skills are atrophying. We see this all the time as engineers enter management: they become super competent at the new skills the role requires but quickly become pretty useless at solving technical problems.
It's similar to why it's so hard to go from level 2 to level 5 in driving automation. We're not really designed to be partially involved in a process - we quickly lose attention, become lazy, and blindly trust the machine. Which is maybe fine if the machine is 100% reliable, but we know that isn't the case.
I find that bus factor 0 code occurs due to lack of paradigmatic structure or adherence, both internal and external. If you have a paradigm, even a not so great paradigm, I will grok your code and start making changes on my first day.
So maybe that is a test, or something to strive for. If you can get a new developer to make a moderate change to your code base on their first day successfully, then your code is fine. If not you have work to do.
To expand a little on this thought, any one developer's experience is probably not significant but if you hire say 10 new devs and 6-8 are making changes on their first day or in the first few days, your code base is likely fine.
I recommend that a new dev should first make the most trivial change possible but see it through all the way to release, to expose them to process. Following that, a moderate change to expose them to the paradigm. All on the first day or in the first few days. If only say 3 of 10 new hires can accomplish the above, the problem is in your code base (or hiring practices).
I think the article underestimates how much intent can be grasped from code alone. Even without comments.
Humans (and I strongly suspect LLMs, since they're a statistical synthesis of human production) are fairly predictable.
We tend to tackle the same problems the same way. So how something is solved, tells you a lot about why, who and when it was solved.
Still, it's a valid point that much of the knowledge is now obscured, but that could be said too of a high employee churn organization.
> I think the article underestimates how much intent can be grasped from code alone. Even without comments.
I agree, the human thought process always ends up getting embedded in the choice of which of several possible ways any one thing is done. But it's still a process, and a vastly inferior one to having a knowledgeable person on hand. Reverse-engineering has up until now been reserved for times when it is necessary. (I mean, we all still do it, especially on legacy codebases, but it's not good for productivity at all.)
> Humans (and I strongly suspect LLMs, since they're a statistical synthesis of human production) are fairly predictable.
I disagree with the parenthetical. That's what stands out to me the most about LLM code: there are definitely intentions embedded in the code, but they're a hodgepodge of different intentions all tangled up together. You can't infer much of anything from a small snippet, because the intention behind that snippet was likely relevant to a completely different origin codebase that was piled up into a compost heap to grow an LLM out of. It's actually harder to make sense of an LLM codebase because the intentions distract and confuse the reader -- just as with text, we implicitly assume that a generated artifact means the same thing as if a human wrote it, and it doesn't.
> I think the article underestimates how much intent can be grasped from code alone.
That's very scale related.
I rarely have any trouble reading and understanding Arduino code. But that's got a hard upper limit (at least on the common/original Arduinos) of 32kB of (compiled) code.
It's many weeks or months worth of effort, or possibly impossible, for me to read and understand a platform with a hundred or so interdependent microservices written in several languages. _Perhaps_ there was a very skilled and experienced architect for all of that, who demanded comprehensive API styles and docs? But if all that was vibe coded and then dropped on me to be responsible? I'd just quit.
No disrespect for your ability to read Arduino code, but no amount of experience will tell you why the code was written a certain way. Did the programmer not know of any alternatives? Did they specifically choose this one method because it was superior to the alternatives? Did they just run out of time? Is it due to regulatory requirements? The list goes on. To expand further:
FTA:
> but ultimately reading code remains much more complex than writing it no matter what.
I disagree. If reading code is complex, it's because that code was not documented well. If you've written a complex algorithm, that presumably took you hours or days to develop, the proper documentation should allow somebody to understand it (or at least grasp the major points) in a few minutes.
If you're not documenting your code to that level, i.e. to allow future devs to take less time to read and understand it than it took you to write it--let alone capture the additional information about why you made the decisions you did--then you're doing something wrong.
I agree with the premise and the conclusion, but over almost 20 years of writing, adapting and delivering software I've more than once been in exactly the same situation. No one to ask, the only person even vaguely familiar with software development left half a year ago. Half of the processes have changed since the software was written, and the people who owned them have left, too. So while I agree that LLMs will accelerate this process, in my opinion it's not a new flavor, just more of an existing problem. Glad to see this kind of thinking though.
The author, though, neglects what a bus factor of 0 means in real terms and how it gets there, aside from the description of the definition upfront where all knowledge is lost.
A company that accepts a bus factor of zero is a company that is not willing to pay the economic premium for the expertise required to do the work.
The economic demand for humans competing with AI is zero because AI does the things it's good at with an order of magnitude difference in cost, and the deception and lies surrounding the marketing, coupled with communications channel jamming, lead to predictable outcomes. What happens when that demand and thus economic benefit go to zero? Any investment in the knowledge in the first place has no return. No one goes into it, no one learns, and that's quite dangerous in economies based in money-printing.
So while there may not be a problem right now, there will no doubt be problems in the next proverbial quarter. Career development pipelines are years in the making. They are sequential pipelines. Zero into a sequential pipeline is zero out with the time-lag between the two (which is years).
~2 years without economic incentive is when you lose your best and brightest. From there is a slow march to a 10 year deadline after which catastrophic losses occur.
I had a chance to have a interesting discussion with a local community college Dean. Apparently they have had to lower the number of Computer Science related program sections because of lack of demand. In intro courses, for 18 sections there were 13 people who had declared for the major, most students when asked citing AI concerns and lack of career development pipeline.
What happens when you have no expertise that you can pay at any price to fix the processes involved, because you put them all out of business using what amounts to a semi-thinking non-sentient slave?
Without supply, there can be no demand. Where supply becomes infinite because of external parties, there can be no demand. There can be need, but demand is not need.
So this all started in 2022. Best and brightest are re-skilling. There's a glut of lower competency talent too. Bad things happen when you mess with the foundations of economics, and they happen on a time lag, where you can't react fast enough after-the-fact.
What will it take? At some point there will be a crisis where they will have to treat it as triage on the battlefield. The people in charge didn't want to hear about this when it could have made a difference.
As programmers, the bus factor is something to be noted and avoided, but in medicine, it goes the other direction. Private practice is one doctor, and a whole support staff for that single individual. Why are we so eager to be replaceable?
Doctors keep patient notes, and EHRs and patients can recall histories separately to that doctor. Doctors also go through relatively standardised training. That single individual is important, but it's not the same.
A closer analogy would be if they were practicing healthcare on a species they had invented, that no one knew anything about, and that was crucial to a company's survival.
It's potentially the opposite. If you instrument a codebase with documentation and configuration for AI agents to work well in it, then in a year that agent will be able to do the same work just as well (or better, with model progress) when adding new features.
This assumes you're adding documentation, tests, instructions, and other scaffolding along the way, of course.
I wonder how soon (or if it's already happening) that AI coding tools will behave like early career developers who claim all the existing code written by others is crap and go on to convince management that a ground up rewrite is required.
(And now I'm wondering how soon the standard AI-first response to bug reports will be a complete rewrite by AI using the previous prompts plus the new bug report? Are people already working on CI/CD systems that replace the CI part with whole-project AI rewrites?)
As the cost of AI-generated code approaches zero (both in time and money), I see nothing wrong with letting the AI agent spin up a dev environment and take its best shot. If it can prove with rigorous testing that the new code is at least as reliable as the old code, and is written better, then it's a win/win. If not, delete that agent and move on.
On the other hand, if the agent is just as capable of fixing bugs in legacy code as rewriting it, and humans are no longer in the loop, who cares if it's legacy code?
I kinda hate the idea of all that.
But I can see it "working". At least for the values of "working" that would be "good enough" for a large portion of the production code I've written or overseen in my 30+ year career.
Some code pretty much outlasts all expectations because it just works. I had a Perl script I wrote in around 1995-1998 that ran from cron and sent email to my personal account. I quit that job, but the server running it got migrated to virtual machines and didn't stop sending me email until about 2017 - at least three sales or corporate takeovers later (It was _probably_ running on CentOS4 when I last touched it in around 2005, I'd love to know if it was just turned into a VM and running as part of critical infrastructure on CentOS4 12 years later).
But most code only lasts as long as the idea or the money or the people behind the idea last - all the website and differently skinned CRUD apps I built or managed rarely lasted 5 years without being either shut down or rewritten from the ground up by new developers or leadership in whatever the Resume Driven Development language or framework was at the time - toss out the Perl and rewrite it in Python, toss out the Python and rewrite it in Ruby On Rails, then decide we need Enterprise Java to post about on LinkedIn, then rewrite that in Nodejs, now toss out the Node and use Go or Rust. I'm reasonably sure this year's or perhaps next years LLM coding tools can do a better job of those rewrites than the people who actually did them...
Will the cost of AI-generated code approach zero? I thought the hardware and electricity needed to power and train the models and run inference was huge and only growing. Today the free and plus plans might be only $20/month; once moats are built I assume prices will skyrocket an order of magnitude or a few higher.
Author here, you're right, but by definition when you do all of this the Bus Factor has already increased:
> This assumes you're adding documentation, tests, instructions, and other scaffolding along the way, of course.
It's not just about knowledge in someone's brain, but about knowledge persistence.
I used a similar metaphor in the past referencing "The Machine Stops" [0] by E.M. Forster. Yes, in the near future, we will still be able to read code and figure out what it does. I work on legacy code all the time.
But in the long term, when experienced developers actually feel comfortable letting LLMs write large swaths of code, or when the machine no longer needs to generate human-readable code, then we will start forgetting how it works.
[0]: https://news.ycombinator.com/item?id=43909111
Perhaps bus factor zero doesn't matter.
A good dev can dive into a completely unknown codebase with or without tools like a debugger and figure it out. AI makes this far easier.
Some great devs/reverse-engineering experts can do the same with only the compiled binary and no source code. Again AI tools can now do this faster than any human.
Security researchers have figured out the intricacies of a system with no more than a single string as input and an error code as output.
At least we also have LLMs to generate our status updates during outages of our SaaS products while groping around in the dark.
This is why it's important to be experienced in a range of different SaaS and AI tools.
So that when one of your employers has a SaaS related outage, you can just switch to one of your other employers and keep working.
All hail the 100x AI assisted developers doing 10x jobs at 5 different companies at the same time!
I find myself acting as a brutal code reviewer more than a collaborator when I lean too heavily on an agent. I literally just typed this into the agent's chat pane (GPT-5, in this case), after finding some less-than-optimal code for examining and importing REST API documentation.
> Testing string prefixes or file extensions is bound to fail at some point on some edge case. I'd like to see more robust discovery of formats than this. This reeks of script-kiddie code, not professional-quality code.
It's true more often than I'd like that the quality of code I see generated is script-kiddie level. If I prompt carefully beforehand or review harshly after, it generally improves, but I have to keep my guard up.
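A rough sketch of the kind of thing I'd rather see - hypothetical names, parse the content instead of trusting a suffix or prefix (assumes PyYAML is available for the YAML branch):

    import json
    from pathlib import Path

    # Hypothetical: decide whether an API description is JSON or YAML by parsing it,
    # rather than switching on a ".json"/".yaml" extension or a string prefix.
    def load_api_spec(path: Path) -> dict:
        text = path.read_text()
        try:
            return json.loads(text)  # try the stricter format first
        except json.JSONDecodeError:
            import yaml  # PyYAML, assumed installed
            return yaml.safe_load(text)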
I’ve got a new project I’ve been handling with Claude Code. Up until now I’ve always pair-coded with AIs, so I would know (and usually tweak) every bit of code generated. Now with the agent, it’s easy to miss what’s being made.
I've been trying to resolve this with things like “make a notebook that walks through this module's functions”, etc., to make it easier for me to review.
In the spirit of literate programming though, why not have these agents spend more time (tokens… money) walking you through what they made?
Likewise, if dev A vibe codes something and leaves it to dev B to maintain, we should think about what AI workflows can get B up to speed fast. “Give me a tour of the code”
I think it is negative: it actually drains knowledge. It is an anti knowledge field because experts won’t be hired if they can be vibed. This sucks all the brains out of the room. Hence less than zero.
Related premise / HN discussion from a bit ago - “AI code is legacy code from day one”:
https://news.ycombinator.com/item?id=43888225
> The only thing you can rely on is on your ability to decipher what a highly imperfect system generated, and maybe ask explanations to that same imperfect system about your code its code (oh, and it has forgotten everything about the initial writing process by then).
This just sounds like every other story I hear about working on ossified code bases as it is. At least AI can ingest large amounts of code quickly, even if as of today it can't be trusted to actually decipher it all.
This concept of deploying unreviewed vibe code strikes me as very similar to using a fallen log as a bridge to cross a ravine. Yes, it works, until the day it doesn't. And that day is likely to be much sooner than if it had been a concrete-reinforced steel design signed out by a PE.
Aw, I was hallucinating the article content from the title. Bus factor (aka truck factor, lottery factor, honeymoon number, etc.) is the number of team members you could lose, hopefully due to positive life events, before the project falls apart. The author argues this could be zero with vibecoded projects, meaning the project could spontaneously explode even with all members still present and working.
You want this factor to be +Inf, not 1/+Inf. Just in case it wasn't beyond abundantly clear to all...
LLMs aren't bad for programming in general.
LLMs are bad for bad programmers. LLMs will make a bad programmer worse and make a layperson think they're a prodigy.
Meanwhile, the truly skilled programmers are using LLMs to great success. You can get a huge amount of value and productivity from an LLM if and only if you have the skill to do it yourself in the first place.
LLMs are not a tool that magically makes anyone a good programmer. Expecting that to be the case is exactly why they don't work for you. You must already be a good programmer to use these tools effectively.
I have no idea what this will do to the rising generation of programmers and engineers. Frankly I'm terrified for them.
Yes, the engineer who coded it is already dead. No bus necessary.
The flaw in this reasoning is AI can also help you understand code much more quickly than we could before. We are now in fractional bus factor territory.
LLMs make it vastly easier to work on unfamiliar code bases
A team unfamiliar with a code base demoed asking questions to an LLM about it. The answers genuinely excited some. But anyone who had spent a short time in the code base knew the answers were wrong. Oh well.
That is one anecdote, but it doesn't really have any information in it. To debug the process we'd need to know which LLM, the developer's backgrounds, what prompts they used etc.
I've used a variety of LLMs to ask questions about probably dozens of unfamiliar code bases many of which are very complicated technically.
At this point I think LLMs are indispensable to understanding unfamiliar code bases. Nearly always much better than documentation and search engines combined.
It's fascinating... I didn't think about the Bus Factor at all wrt vibe coding. Feels obvious in retrospect. But I feel there's the other side of software beyond maintainable, professional-grade software requirements. There are a lot of use cases for basic software to solve that one problem in that one specific way and get it over with. A bit like customized software with little scope and little expectation of long-term support. Vibe-coding excels there.
In a way, I have been thinking about it [1] as the difference between writing a book and writing a blog post - the production qualities expected in both are wildly different. And that's expected, almost as a feature!
I think as “writing” and distributing new software keeps getting easier - as easy as writing a new blog post - the way we consume software is going to change.
[1]: https://world.hey.com/akumar/software-s-blog-era-2812c56c
> Before LLMs, provided that your team did some of their due diligence, you could always expect to have some help when tackling new code-bases. Either a mentor, or at least some (even if maybe partially outdated) documentation. With LLMs, this is gone.
I love the conclusion. When no human holds the knowledge it is like the bus already struck everybody at the company.
I have worked in teams that share knowledge often and extensively. Anybody can go on a vacation with little disruption as others can take over the tasks. Everybody is happier and projects work better.
(If your first thought is that you can be replaced easily and will be fired, then you live in a dystopian class-warfare country where the owner class will fire you because they enjoy making the working class suffer. I am sorry for you, but have hope. That can be changed with good laws and employee protections.)
We’re already at bus factor of close to zero for most bank code written in cobol lol
The folks who think they can now suddenly program without any experience, and without needing to understand how their product works, are suffering from Dunning-Kruger syndrome. Actually, it is a much broader segment and includes product managers, executives, VCs and the general public.
The project foundation is everything. LLMs are sensitive to over-engineering. The LLM doesn't have an opinion about good code vs bad code.
If you show it bad code and ask it to add features on top, it will produce more bad code... It might work (kind of) but more likely to be buggy and have security holes. When the context you give to the LLM includes unnecessary complexity, it will assume that you want unnecessary complexity and it will generate more of it for you.
I tried Claude Code with both a bad codebase and a good codebase; the difference is stark. The first thing I notice is that, with the good code base without unnecessary complexity, it generates a lot LESS code for any given feature/prompt. It's really easy to review its output and it's very reliable.
With a bad, overengineered codebase, Claude Code will produce complex code that's hard to review... Even for similar size features. Also it will often get it wrong and the code won't work. Many times it adds code which does literally nothing at all. It says "I changed this so that ..., this should resolve the issue ..." - But then I test and the issue is still there.
Some bad coders may be tempted to keep asking Claude to do more to fix the issue and Claude keeps adding more mess on top. It becomes a giant buggy hack and eventually you have to ask it to rewrite a whole bunch of stuff because it becomes way too complicated and even Claude can't understand its own code... That's how you get to bus factor of 0. Claude will happily keep churning out code even if it doesn't know what it's doing. It will never tell you that your code is unmaintainable and unextendable. Show it the worst codebase in the world and it will adapt itself to become the worst coder in the world.
Context management is hard.
Unfortunately the corporate machine has been converging on a bus factor of 0. I've been part of multiple teams now where I was the only one holding knowledge over critical subsystems and whenever I attempted to train people on it, it was futile. Mainly because they would get laid off doing 'cost-savings measures'.
There were times where I was close to getting fed up and just quitting during some of the high profile ops I had to deal with which would've left the entire system inoperable for an extended period of time. And frankly from talking to a lot of other engineers, it sounds like a lot of companies operate in this manner.
I fully expect a lot of these issues to come home to roost as AI compounds loss of institutional knowledge and leads to rapid system decay.
My guess? The AI companies will keep the free and $20/month plans to entice developers and their managers. They will have $200/month plans with bigger context windows to allow effective work on larger-than-toy codebases. But sooner or later companies with large scale projects are going to need a much larger context window, and _that_ will suddenly become a $200k/year/developer subscription. There's a lot of correlation between "institutional knowledge" and context window I think.
Interesting. I just posted a similar comment as a sister comment to yours above (at least at the time of reading the thread), replying to another person's comment about the cost of AI code going to zero... which was basically the same as what you believe here.
https://news.ycombinator.com/item?id=44970251
Just not even ten years ago the discussion here was all about "software engineering" trying to be legitimized as a formal engineering practice: whether there should be licensing, whether there should be certifications, lots of threads about formal methods to prove algorithms work. And look where we are now - arguing over whether humans should even care to look at the code we are producing and shipping. crazy shit man
Pretty sure that was 25 years ago, not 10 years ago. And then, 20 years ago, we were coming to terms with the fact that all of that stuff was a spectacular failure. And then, 15 years ago, we had found much better ways to do things.
I don't miss that phase of the evolution of software development practice at all.
I'm not talking about UML or waterfall. Talking more about formal methods which were still pretty common here amongst the various lisp/haskell/clojure discussions etc. I'm pretty sure this is still a relevant technique for certain classes of software.
By that same logic, if a project is documented so thoroughly that an agent could handle all the work, then the bus factor effectively becomes infinite.
This guy thinks bus factors of zero started with ChatGPT. Hahahahahaha. Adorable.
How many of you have asked about a process and been told that nobody knows how it works because the person who developed it left the company?
There was a blog post at the top of HN about this, like, yesterday.
I hate the current AI hype hothouse and everything it seems to be doing to the industry... but I couldn't help but laugh.
The post is great. Bus factor of zero is a great coinage.
The difference is this isn't some legacy system that still exists a decade later. It's brand new with the tag still on. And it wasn't designed by a conscious being but by probability.
I've seen from beautiful to crazy legacy systems in various domains. But when I encounter something off, there appears to always be a story. Not so much with LLMs.
It got hit by a bus before it was born.
Everything you interact with on a daily basis is either natural, or designed by a human. Until now.