Deterministic compilation, aka reproducible builds, has been a basic software engineering concept and goal for 40+ years. Perhaps you could provide some examples of compilers that produce non-deterministic output along with your bad news.
If you are referring to timestamps, build IDs, compile-time environments, hardwired heuristics for optimization, or even bugs in compilers -- those are not the same kind of non-determinism as in LLMs. The former can be mitigated by long-standing practices of reproducible builds, while the latter is intrinsic to LLMs if they are meant to be more useful than a voice recorder.
Compilers aim to be fully deterministic. The biggest source of nondeterminism when building software isn't the compiler itself, but build systems invoking the compiler nondeterministically (because iterating the files in a directory isn't necessarily deterministic across different machines).
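Concretely, this is the kind of thing that bites you; a minimal Python sketch (the paths and gcc flags are invented for illustration):

    # Directory iteration order is filesystem-dependent, so a naive build
    # script can hand the compiler its inputs in a different order on
    # different machines, producing different (if equivalent) binaries.
    import glob
    import subprocess

    sources = glob.glob("src/**/*.c", recursive=True)  # order not guaranteed
    sources.sort()  # sorting restores a machine-independent order

    subprocess.run(["gcc", "-O2", "-o", "build/app", *sources], check=True)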
The article starts with a philosophically bad analogy in my opinion. C-> Java != Java -> LLM because the intermediate product (the code) changed its form with previous transitions. LLMs still produce the same intermediate product. I expanded on this in a post a couple months back:
"The intermediate product is the source code itself. The intermediate goal of a software development project is to produce robust maintainable source code. The end product is to produce a binary. New programming languages changed the intermediate product. When a team changed from using assembly, to C, to Java, it drastically changed its intermediate product. That came with new tools built around different language ecosystems and different programming paradigms and philosophies. Which in turn came with new ways of refactoring, thinking about software architecture, and working together.
LLMs don’t do that in the same way. The intermediate product of LLMs is still the Java or C or Rust or Python that came before them. English is not the intermediate product, as much as some may say it is. You don’t go prompt->binary. You still go prompt->source code->changes to source code from hand editing or further prompts->binary. It’s a distinction that matters.
Until LLMs are fully autonomous with virtually no human guidance or oversight, source code in existing languages will continue to be the intermediate product. And that means many of the ways that we work together will continue to be the same (how we architect source code, store and review it, collaborate on it, refactor it, etc.) in a way that it wasn’t with prior transitions. These processes are just supercharged and easier because the LLM is supporting us or doing much of the work for us."
What would you say if someone has a project written in, let's say, PureScript, and they use its Java backend to generate (and overwrite) Java code, which they also keep under version control? If they claim that this is a Java project, you would probably disagree, right? Seems to me that LLMs are the same thing, that is, if you also store the prompt and everything else needed to reproduce the same code generation process. Since LLMs can be made deterministic, I don't see why that wouldn't be possible.
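To make "store everything needed to reproduce the generation" concrete, here is a hypothetical manifest one might check in next to the generated Java, roughly the way you would check in the PureScript sources. The file names, model name, and the lock-file idea are all invented here, not an existing tool:

    import hashlib, json, pathlib

    manifest = {
        "model": "example-model-2026-01",        # exact frozen model snapshot (invented name)
        "temperature": 0,
        "seed": 42,
        "prompt_file": "specs/orders.md",
        "generated_file": "src/main/java/Orders.java",
    }
    # Hash the generated output so drift is detectable on regeneration.
    manifest["generated_sha256"] = hashlib.sha256(
        pathlib.Path(manifest["generated_file"]).read_bytes()
    ).hexdigest()
    pathlib.Path("generation.lock.json").write_text(json.dumps(manifest, indent=2))

Re-running the generation with those settings and re-hashing the output would then be the moral equivalent of rebuilding from checked-in sources.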
PureScript is a programming language. English is not. A better analogy would be what would you say about someone who uses a No Code solution that behind the scenes writes Java. I would say that's a much better analogy. NoCode -> Java is similar to LLM -> Java.
I'm not debating whether LLMs are amazing tools or whether they change programming. Clearly both are true. I'm debating whether people are using accurate analogies.
> PureScript is a programming language. English is not.
Why can’t English be a programming language? You would absolutely be able to describe a program in English well enough that it would unambiguously be able to instruct a person on the exact program to write. If it can do that, why couldn’t it be used to tell a computer exactly what program to write?
I don’t think you can do that. Or at least if you could, it would be an unintelligible version of English that would not seem much different from a programming language.
I agree with your conclusion but I don't think it'd necessarily be unintelligible. I think you can describe a program unambiguously using everyday natural language, it'd just be tediously inefficient to interpret.
To make it sensible you'd end up standardising the way you say things (words, order, etc.) and probably adding punctuation and formatting conventions to make it easier to read.
By then you're basically just at a verbose programming language, and the last step to an actual programming language is just dropping a few filler words here and there to make it more concise while preserving the meaning.
However I think there is a misunderstanding between being "deterministic" and "unambiguous". Even C is an "ambiguous" programming language, but it is "deterministic" in that it behaves in the same ambiguous/undefined way under the same conditions.
The same can be achieved with LLMs too. They are "more" ambiguous of course and if someone doesn't want that, then they have to resort to exactly what you just described. But that was not the point that I was making.
Here's a very simple algorithm: you tell the other person (in English) literally what key they have to press next. So you can easily have them write all the java code you want in a deterministic and reproducible way.
And yes, maybe that doesn't seem much different from a programming language which... is the point no? But it's still natural English.
> Why can’t English be a programming language? You would absolutely be able to describe a program in English well enough that it would unambiguously be able to instruct a person on the exact program to write
Various attempts have been made. We got Cobol, Basic, SQL, … A programming language needs to be formal, and English is not.
English CAN be ambiguous, but it doesn't have to be.
Think about it. Human beings are able to work out ambiguity when it arises between people with enough time and dedication, and how do they do it? They use English (or another equivalent human language). With enough back and forth, clarifying questions, or enough specificity in the words you choose, you can resolve any ambiguity.
Or, think about it this way. In order for the ambiguity to be a problem, there would have to exist an ambiguity that could not be removed with more English words. Can you think of any example of ambiguous language, where you are unable to describe and eliminate the ambiguity only using English words?
A deterministic prompt + seed used to generate an output is interesting as a way to record entirely how code came about, but it's also not a thing people are actually doing. Right now, everyone is slinging around LLM outputs without any attempt at reproducibility; no seed, nothing. What you've described and what the article describes are very different.
Yes, you are right. I was mostly speaking in theoretical terms - currently people don't work like that. And you would also have to use the same trained LLM of course, so using a third party provider probably doesn't give that guarantee.
I have a source file of a few hundred lines implementing an algorithm that no LLM I've tried (and I've tried them all) is able to replicate, or even suggest, when prompted with the problem. Even with many follow up prompts and hints.
The implementations that come out are buggy or just plain broken
The problem is a relatively simple one, and the algorithm uses a few clever tricks. The implementation is subtle...but nonetheless it exists in both open and closed source projects.
LLMs can replace a lot of CRUD apps and skeleton code, tooling, scripting, infra setup etc, but when it comes to the hard stuff they still suck.
Well I think that's kind of the point or value in these tools. Let the AI do the tedious stuff, saving your energy for the hard stuff. At least that's how I use them: just save me from all the typing and tedium. I'd rather describe something like auth0 integration to an LLM than do it all myself. Same goes for the typical list of records: click one, view the details, then a list of related records and all the operations that go with that. It's so boring, let the LLM do that stuff for you.
This is one of my favourite activities with LLMs as well. After implementing some idea for an algorithm, I try seeing what an LLM would come up with. I give it hints as well and push it in the right direction over many iterations, but never reveal the ideal solution. And as a matter of fact, they can never reach the quality of my initial implementation.
I'm seeing the same thing with my own little app that implements several new heuristics for functionality and optimisation over a classic algorithm in this domain. I came up with the improvements by implementing the older algorithm and just... being a human and spending time with the problem.
The improvements become evident from the nature of the problem in the physical world. I can see why a purely text-based intelligence could not have derived them from the specs, and I haven't been able to coax them out of LLMs with any amount of prodding and persuasion. They reason about the problem in some abstract space detached from reality; they're brilliant savants in that sense, but you can't teach a blind person what it feels like to see the colour red.
One of the reasons we have programming languages is they allow us to express fluently the specificity required to instruct a machine.
For very large projects, are we sure that English (or another natural language) is actually a better/faster/cheaper way to express what we want to build? Even if we could guarantee fully-deterministic "compilation", would the specificity required not balloon the (e.g.) English out to well beyond what (e.g.) Java might need?
Writing code will become writing books? Still thinking through this, but I can't help but feel natural languages are still poorly suited and slower, especially for novel creations that don't have a well-understood (or "linguistically-abstracted") prior.
Perhaps we'll go the way of the Space Shuttle? One group writes a highly-structured, highly-granular, branch-by-branch 2500 page spec, and another group (LLM) writes 25000 lines of code, then the first group congratulates itself on producing good software without having to write code?
After working with the latest models, I think these "it's just another tool" or "another layer of abstraction" or "I'm just building at a different level" kinds of arguments are wishful thinking. You're not going to be a designer writing blueprints for a series of workers to execute on; you're barely going to be a product manager translating business requirements into a technical specification before AI closes that gap as well. I'm very convinced non-technical people will be able to use these tools, because what I'm seeing is that all of the skills my training and years of experience have helped me hone are now implemented by these tools to a level that I know most businesses would be satisfied with.
The irony is that I haven't seen AI have nearly as large an impact anywhere else. We truly have automated ourselves out of work; people are just catching up with that fact, and the people who just wanted to make money from software can now finally stop pretending that "passion" for "the craft" was ever really part of their motivating calculus.
If all you (not you specifically, more of a royal “you” or “we”) are is a collection of skills centered around putting code into an editor and opening pull requests as fast as possible, then sure, you might be cooked.
But if your job depends on taste, design, intuition, sociability, judgement, coaching, inspiring, explaining, or empathy in the context of using technology to solve human problems, you’ll be fine. The premium for these skills is going _way_ up.
The question isn't whether businesses will have zero human element to them; the question is whether AI leaves a big enough gap that technical skills are still required and technical roles are still hired for. Someone in product can have all of those skills without a computer science degree, with no design experience, and AI will do the technical work at the level of design, implementation, and maintenance. What I am seeing with the new models isn't just writing code, it's taking fundamental problems as input and designing holistic software solutions as output - and the quality is there.
I am only seeing that if the person writing the prompts knows what a quality solution looks like at a technical level and is reviewing the output as they go. Otherwise you end up with an absolute mess that may work at least for "happy path" cases but completely breaks down as the product needs change. I've described a case of this in some detail in another comment.
> the person writing the prompts knows what a quality solution looks like at a technical level and is reviewing the output as they go
That is exactly what I recommend, and it works like a charm. The person also has to have realistic expectations for the LLM, and be willing to work with a simulacrum that never learns (as frustrating as it seems at first glance).
> what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to the level that I know most businesses would be satisfied by.
So when things break or they have to make changes, and the AI gets lost down a rabbit hole, who is held accountable?
The answer is the AI. It's already handling complex issues and debugging solely by gathering its own context, doing major refactors successfully, and doing feature design work. The people that will be held responsible will be the product owners, but it won't be for bugs, it will be for business impact.
My point is that SWEs are living on a prayer that AI will stay perched on a knife's edge where there is still some amount of technical work to make our profession sustainable, and from what I'm seeing that's not going to be the case. It won't happen overnight, but I doubt my kids will ever even think about a computer science degree or doing what I did for work.
I work in the green energy industry and we see it a lot now. Two years ago the business would've had to either buy a bunch of bad "standard" systems which didn't really fit, or wait for their challenges to be prioritised enough for some of our programmers to take them on. Today 80-90% of the software which is produced in our organisation isn't even seen by our programmers. It's built by LLMs in the hands of various technically inclined employees who make it work. Sometimes some of it scales up enough that our programmers get involved, but for the most part, the quality matters very little. Sure, I could write software that does the same faster and with much less compute, but when the compute is $5 a year I'd have to write it rather fast to make up for the cost of my time.
I make it sound like I agree with you, and I do to an extent. Hell, I'd want my kids to be plumbers or similar, where I would've wanted them to go to university a couple of years ago. With that said, I still haven't seen anything from AIs to convince me that you don't need computer science. To put it bluntly, you don't need software engineering to write software, until you do. A lot of the AI-produced software doesn't scale, and none of our agents have been remotely capable of producing quality, secure code even in the hands of experienced programmers. We've not seen any form of change over the past two years either.
Of course this doesn't mean you're wrong either, because we're going to need a lot fewer programmers regardless. We need the people who know how computers work, but in my country that is a fraction of the total IT worker pool. In many CS educations they're not even taught how a CPU or memory functions. They are instead taught design patterns, OOP and clean architecture, which are great when humans are maintaining code, but even small abstractions will cause L1-L3 cache misses. Which doesn't matter, until it does.
Same situation as when an engineer can't figure something out, they translate the problem into human terms for a product person, and the product person makes a high level decision that allows working around the problem.
Uh that's not what engineers do; do you not have any software development experience, or rather any outside of vibe coding? That would explain your perspective. (for context I am 15+ yr experience former FAANG dev)
I don't mean this to sound inflammatory or anything; it's just that the idea that when a developer encounters a difficult bug they would go ask for help from the product manager of all people is so incredibly outlandish and unrealistic, I can't imagine anyone would think this would happen unless they've never actually worked as a developer.
Staff engineer (also at FAANG), so yes, I have at least comparable experience. I'm not trying to summarize every level of SWE in a few sentences. The point is that AI's fallibility is no different from human fallibility. You may fire a human for a mistake, but it won't solve the business problems they may have created, so I believe the accountability argument is bogus. You can hold the next layer up accountable. The new models are startlingly good at direction setting, technical-to-product translation, providing leadership guidance on technical matters, and providing multiple routes around roadblocks.
We're starting to see engineers who run into bugs and roadblocks feed the details into AI, which not only root-causes the problem but suggests and implements the fix and takes it into review.
Surely at some point in your career as a SWE at FAANG you had to "dive deep" as they say and learn something that wasn't part of your "training data" to solve a problem?
I would have said the same thing a year or two ago, but AI is capable of doing deep dives. It can selectively clone and read dependencies outside of its data set. It can use tool calls to read documentation. It can log into machines and insert probes. It may not be better than everyone, but it's good enough and continuing to improve such that I believe subject matter expertise counts for much less.
I'm not saying that AI can't figure out how to handle bugs (it absolutely can; in fact even a decade ago at AWS there was primitive "AI" that essentially mapped failure codes to a known issues list, and it would not take much to allow an agent to perform some automation). I'm saying there will be situations the AI can't handle, and it's really absurd that you think a product owner will be able to solve deeply technical issues.
You can't product-manage away something like "there's an undocumented bug in MariaDB which causes database corruption with spatial indexes" or "there's a regression in jemalloc which is causing Tomcat to leak memory when we upgrade to Java 8". Both of which are real things I had to dive deep and discover in my career.
There are definitely issues in human software engineering which reach some combination of the following end states:
1. The team is unable to figure it out
2. The team is able to figure it out but a responsible third-party dependency is unable to fix it
3. The team throws in the towel and works around the issue
At the end of the day it always comes down to money: how much more money do we throw at trying to diagnose or fix this versus working around or living with it? And is that determination not exactly the role of a product manager?
I don't see why this would ipso facto be different with AI
For clarity, I come at this with a superposition of skepticism about AI's ultimate capabilities and recognition of the sometimes frightening depth of those capabilities and the speed with which they are advancing.
I suppose the net result is a skepticism of any confident predictions of where this all ends up.
> The irony is that I haven't seen AI have nearly as large of an impact anywhere else.
We are in this pickle because programmers are good at making tools that help programmers. Programming is the tip of the spear, as far as AI's impact goes, but there's more to come.
Why pay an expensive architect to design your new office building, when AI will do it for peanuts? Why pay an expensive lawyer to review your contract? Why pay a doctor, etc.
Short term, doing for lawyers, architects, civil engineers, doctors, etc what Claude Code has done for programmers is a winning business strategy. Long term, gaining expertise in any field of intellectual labor is setting yourself up to be replaced.
This is an exaggeration: if you store the prompt that was "compiled" by today's LLMs, there is no guarantee that 4 months from now you will be able to replicate the same result.
I can take some C or Fortran code from 10 years ago, build it and get identical results.
That is a wobbly assertion. You would certainly need to run the same compiler, and forgo any recent optimisations, architecture updates and the like, if your code has numerically sensitive parts.
You certainly can get identical results, but it's equally certainly not going to be that simple a path frequently.
Every "classic computing" language mentioned, and pretty much in history, is highly deterministic, and mind-bogglingly, huge-number-of-9s reliable (when was the last time your CPU did the wrong thing on one of the billions of machine instructions it executes every second, or your compiler gave two different outputs from the same code?)
LLMs are not even "one 9" reliable at the moment. Indeed, each token is a freaking RNG draw off a probability distribution. "Compiling" is a crap shoot, a slot machine pull. By design. And the errors compound/multiply over repeated pulls as others have shown.
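For the curious, here is roughly what one of those "slot machine pulls" looks like; the scores are made up, but softmax-with-temperature sampling is the actual mechanism:

    import math, random

    # Toy logits: the model's raw scores for three candidate next tokens.
    logits = {"return": 2.1, "yield": 1.3, "raise": 0.2}
    temperature = 0.8

    # Softmax with temperature turns scores into a probability distribution.
    weights = {t: math.exp(s / temperature) for t, s in logits.items()}
    total = sum(weights.values())
    probs = {t: w / total for t, w in weights.items()}

    # Sampling is a literal RNG draw. A different draw can emit a different
    # token, and the divergence compounds over thousands of subsequent draws.
    next_token = random.choices(list(probs), weights=list(probs.values()))[0]
    print(probs, "->", next_token)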
I'll take the gloriously reliable classical compute world to compile my stuff any day.
Code written in a HLL is a sufficient[1] description of the resulting program/behavior. The code, in combination with the runtime, define constraints on the behavior of the resulting program. A finished piece of HLL code encodes all the constraints the programmer desired. Presuming a 'correct' compiler/runtime, any variation in the resulting program (equivalently the behavior of an interpreter running the HLL code) varies within the boundaries of those constraints.
Code in general is also local, in the sense that small perturbation to the code has effects limited to a small and corresponding portion of the program/behavior. A change to the body of a function changes the generated machine code for that function, and nothing else[2].
Prompts provided to an LLM are neither sufficient nor local in the same way.
The inherent opacity of the LLM means we can make only probabilistic guarantees that the constraints the prompt intends to encode are reflected by the output. No theory (that we now know) can even attempt to supply such a guarantee. A given (sequence of) prompts might result in a program that happens to encode the constraints the programmer intended, but that _must_ be verified by inspection and testing.
One might argue that of course an LLM can be made to produce precisely the same output for the same input; it is itself a program after all. However, that 'reproducibility' should not convince us that the prompts + weights totally define the code any more than random.Random(1).random() being constant should cause us to declare python's .random() broken. In both cases we're looking at a single sample from a pRNG. Any variation whatsoever would result in a different generated program, with no guarantee that program would satisfy the constraints the programmer intended to encode in the prompts.
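To spell the pRNG analogy out with nothing but the standard library:

    import random

    # A fixed seed gives a perfectly reproducible stream...
    assert random.Random(1).random() == random.Random(1).random()

    # ...but reproducibility of one sample says nothing about stability:
    # perturb the seed (or the call sequence) and the stream is entirely
    # different from then on, just as a perturbed prompt yields a different
    # program with no guarantee it still satisfies the intended constraints.
    print(random.Random(1).random(), random.Random(2).random())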
While locality falls similarly, one might point out that an agentic LLM can easily make a local change to code if asked. I would argue that an agentic LLM's prompts are not just the inputs from the user, but the entire codebase in its repo (even if sparsely attended to by RAG or retrieval tool calls or w/e). The prompts _alone_ cannot be changed locally in a way that guarantees a local effect.
The prompt -> LLM -> program abstraction presents leaks of such volume and variety that it cannot be ignored the way the code -> compiler -> program abstraction can be. Continuing to make forward progress on a project requires that the robot (and likely the human) attend to the generated code.
Does any of this matter? Compilers and interpreters themselves are imperfect, their formal verification is incomplete and underutilized. We have to verify properties of programs via testing anyway. And who cares if the prompts alone are insufficient? We can keep a few 100kb of code around and retrieve over it to keep the robot on track, and the human more-or-less in the loop. And if it ends up rewriting the whole thing every few iterations as it drifts, who cares?
For some projects, where quality, correctness, interoperability, novelty, etc. don't matter, that might be fine. Even for those, defining a program purely via prompts seems likely to devolve eventually into aggravation. For the rest, the end of software engineering seems to be greatly exaggerated.
[2]: there're of course many tiny exceptions to this. we might be changing a function that's inlined all over the place; we might be changing something that's explicitly global state; we might vary timing of something that causes async tasks to schedule in a different order etc etc. I believe the point stands regardless.
"Following this hypothesis, what C did to assembler, what Java did to C, what Javascript/Python/Perl did to Java, now LLM agents are doing to all programming languages."
This is not an appropriate analogy, at least not right now.
Code agents generate code from prompts, and in that sense the metaphor is correct. However, agents then read that code; it becomes input and they generate more code. That was never the case for compilers: compilation is acyclic and one-directional, while an LLM used this way is neither, so it is strictly not a compiler.
I think it's appropriate in terms of the results rather than the process; the bigger problem I see is that programming languages are designed to be completely unambiguous, whereas human language is not ("Go to the shop and buy one box of eggs, and if they have carrots, buy three") so we're transitioning from exactly specifying what we want the software to do, to tying ourselves in knots trying to specify it exactly, while a machine tries to disambiguate our request. I bet lawyers would make good vibe coders.
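To spell out the eggs-and-carrots ambiguity as the two programs a literal-minded machine could reasonably produce (the Shop/buy helpers are invented stand-ins, not any real API):

    # Invented stand-ins, just to make the two readings executable.
    class Shop:
        def has(self, item): return True

    def buy(item, qty): print(f"buy {qty} x {item}")

    shop = Shop()

    # Reading 1: the carrots only decide HOW MANY boxes of eggs to buy.
    if shop.has("carrots"):
        buy("box of eggs", 3)
    else:
        buy("box of eggs", 1)

    # Reading 2: the carrots are a second item on the shopping list.
    buy("box of eggs", 1)
    if shop.has("carrots"):
        buy("carrots", 3)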
I'm trying to work with vibe-coded applications and it's a nightmare. I am trying to make one application multi-tenant by moving a bunch of code that's custom to a single customer into config. There are 200+ line methods, dead code everywhere, tons of unnecessary complexity (for instance, extra mapping layers that were introduced to resolve discrepancies between keys, instead of just using the same key everywhere). No unit tests, of course, so it's very difficult to tell if anything broke. When the system requirements change, the LLM isn't removing old code, it's just adding new branches and keeping the dead code around.
I ask the developer the simplest questions, like "which of the multiple entry-points do you use to test this code locally", or "you have a 'mode' parameter here that determines which branch of the code executes; which of these modes are actually used?", and I get a bunch of babble, because he has no idea how any of it works.
Of course, since everyone is expected to use Cursor for everything and move at warp speed, I have no time to actually untangle this crap.
The LLM is amazing at some things - I can get it to one-shot adding a page to a react app for instance. But if you don't know what good code looks like, you're not going to get a maintainable result.
Why use agents if there is absolutely zero need for them? It's the usual: we spent a shitton of money on this, now find out how we MUST include this horrible thing in our already bloated dev environment.
We’re missing the boat here. There are already companies with millions in revenue that are pure agent loops of English text. They can do things our traditional software cannot.
So we are certainly going to see more of these incidents [0] from those not understanding LLM-written code, as 'engineers' let their skills decay because the 'LLMs know best'.
A side effect of using LLMs for programming is that no new programming language can now emerge and become popular; we will be stuck with the existing programming languages for broad use forever. Newer languages will never accumulate enough training data for the LLMs to master them. Granted, non-LLM AIs with true neural memory can work around this, as can LLMs with an infinite-token, frozen-and-forkable context, but these are not your everyday LLMs.
I wouldn't be surprised if in the next 5-10 years the new and popular programming language is one built around optimizing how well LLMs (or, by that point, world models) understand and can use it.
Right now LLMs are taking languages meant for humans to understand better via abstraction, what if the next language is designed for optimal LLM/world model understanding?
Or, instead of an entirely new language, there's some form of compiling/transpiling from the model-centric language to a human-centric one: a kind of WASM for LLMs.
I don't think we need that many programming languages anyway.
I'm more worried about the opposite: the next popular programming paradigm will be something that's hard to read for humans but not-so-hard for LLM. For example, English -> assembly.
It's not a programming language if you can't read someone else's code, figure out what it does, figure out what they meant, and debug the difference between those things.
"I prompted it like this"
"I gave it the same prompt, and it came out different"
It's not programming. It might be having a pseudo-conversation with a complex system, but it's not programming.
> It's not a programming language if you can't read someone else's code, figure out what it does, figure out what they meant, and debug the difference between those things.
Well I think the article would say that you can diff the documentation, and it's the documentation that is feeding the AI in this new paradigm (which isn't direct prompting).
If the definition of programming is "a process to create sets of instructions that tell a computer how to perform specific tasks" there is nothing in there that requires it to be deterministic at the definition level.
> If the definition of programming is "a process to create sets of instructions that tell a computer how to perform specific tasks" there is nothing in there that requires it to be deterministic at the definition level.
The whole goal of getting a computer to do a task is that it's capable of doing it many times, reliably. Especially in business, infrastructure, manufacturing, …
Once you turn specifications and requirements into code, it's formalized and the behavior is fixed. Only then is it possible to evaluate it: not against the specifications, but against another set of code that is known to be good (or is easier to verify).
The specification is just a description of an idea. It is a map. But the code is the territory. And I’ve never seen anyone farm or mine from a map.
Note that the prompt wasn't fed to another LLM, but to the same one. "I wrote a program in C and gave it to GCC. Then I gave the same program to GCC again and I got a different result" would be more like it.
Optimization level, target architecture, etc are just information fed to the compiler. If it’s still nondeterministic with everything kept the same your compiler is broken.
You're missing the point. I'm not saying that compilers are nondeterminitsic or that LLMs are deterministic. I'm saying that it doesn't matter. No one cares about deterministic results except programmers. The non-technical user who will make software in the future just knows that he gets what he asked for.
Systems randomly failing is of significant concern to non-programmers; that's inherent to the non-deterministic nature of LLMs.
I can send specific LLM output to QA, I can’t ask QA to validate that this prompt will always produce bug free code even for future versions of the AI.
The output of the LLM is nondeterministic, meaning that the same input to the LLM will result in different output from the LLM.
That has nothing to do with whether the code itself is deterministic. If the LLM produces non-deterministic code, that's a bug, which hopefully will be caught by another sub-agent before production. But there's no reason to assume that programs created by LLMs are non-deterministic just because the LLMs themselves are. After all, humans are non-deterministic.
> I can send specific LLM output to QA, I can’t ask QA to validate that this prompt will always produce bug free code even for future versions of the AI.
This is a crazy scenario that does not correspond to how anyone uses LLMs.
If there is no error in the compiler implementation and no undefined behavior, the resulting programs are equivalent, and the few differences are mostly implementation-defined details left to the compiler to decide (though gcc and clang often make the same choices). The performance might also differ.
It's clearly not comparable to the many differences you can get from an LLM's output.
No, the ultimate beneficiary of LLM-created code is the toll collectors who stole as much intellectual property as they could (and continue to do so), fleecing everyone else that they are Promethean for having done so and for continuing to do so.
It has to stop somewhere. A business owner can also hire a different company to create the product and get a result that differs by as little as 5% in performance, or something with clearly inferior maintainability and UX. You'd hardly argue that it's 'the same' when they followed the same spec, which will never be fully precise at the business level.
I agree that talking to an LLM is akin to using the business oriented logic at the module or even function level.
Your logic sounds like willful ignorance. You are relying on some odd definitions of "definitions", "equivalence", and "procedures". These are all rigorously defined in the underlying theory of computer science (using formal logic, lambda calculus, etc.)
Claude and Gemini do not "do the same thing" in the same way in which Clang and GCC do the same thing with the same code (as long as certain axioms of the code hold).
The C Standard has been rigorously written to uphold certain principles such that the same code (following its axioms) will always produce the same results (under specified conditions) for any standard compliant compiler. There exists no such standard (and no axioms nor conditions to speak of) where the same is true of Claude and Gemini.
> Claude and Gemini do not "do the same thing" in the same way in which Clang and GCC do the same thing with the same code (as long as certain axioms of the code hold).
True, but none of that is relevant to the non-programmer end user.
> You are relying on some odd definitions of "definitions", "equivalence", and "procedures"
These terms have rigorous definitions for programmers. The person making software in the future is a non-programmer and doesn't care about any of that. They care only that the LLM can produce what they asked for.
> The C Standard has been rigorously written to uphold certain principles
I know what a standard is. The point is that the standard is irrelevant if you never look at the code.
It is indeed extremely relevant to the end user. For websites the end user is not the creator of the web site who pushes it to the server, it is the user who opens it on a browser. And for that user it matters a great deal if a button is green or blue, if it responds to keyboard events, if it says “submit” or “continue”, etc. It also matters to the user whether their information is sent to a third party, whether their password is leaked, etc.
Your argument here (if I understand you correctly) is the same argument that to build a bridge you do not need to know all the laws of physics that prevents it from collapsing. The project manager of the construction team doesn’t need to know it, and certainly not the bicyclists who cross it. But the engineer who draws the blueprints needs to know it, and it matters that every detail on those blueprints are rigorously defined, such that the project manager of the construction team follows them to the letter. If the engineer does not know the laws of physics, or the project manager does not follow the blueprints to the letter, the bridge will likely collapse, killing the end user, that poor bicyclist.
> I don't think I am. If you ask an LLM for a burger web site, you will get a burger web site. That's the only category that matters.
If one burger website generated uses PHP and the other is plain javascript, which completely changes the way the website has to be hosted--this category matters quite a bit, no?
No. Put yourself in the shoes of the owner of the burger restaurant (who only heard the term "JavaScript" twice in his life and vaguely remember it's probably something related to "Java", which he heard three times) and you'll know why the answer is no.
I put myself in the shoes of the burger restaurant owner. I vibe coded a website. Sweet. I talk to my cousin who's doing the web hosting and he says he can't host a "Pee Haich Pee" site. I don't know what that is. Suddenly the thing that didn't matter actually really fucking matters
This is like saying it doesn't matter if your pipes are iron, lead or PVC because you don't see them. They all move water and shit where they need to be, so no problem. Ignorance is bliss I guess? Plumbers are obsolete!
> I talk to my cousin who's doing the web hosting and he says he can't host a "Pee Haich Pee" site
I already addressed this exact argument in another comment. In short: hosting is an implementation detail. The LLM can solve this problem just like any coding bug. The user can give his cousin's email to the LLM, which will solve the issue by finding better hosting, rewriting the program in another language, or using Docker.
Hosting is not a hard problem to solve, compared to other issues.
> The user can give his cousin's email to the LLM, which will solve the issue by finding better hosting, rewriting the program in another language, or using Docker.
Can you please point to a currently available LLM that can do this?
> which completely changes the way the website has to be hosted--this category matters quite a bit, no?
It matters to you because you're a programmer, and you can't imagine how someone could create a program without being a programmer. But it doesn't really matter.
The non-technical user of the LLM won't care if the LLM generates PHP or JS code, because they don't care how it gets hosted. They'll tell the LLM to take care of it, and it will. Or more likely, the user won't even know what the word "hosting" means, they'll simply ask the LLM to make a website and publish it, and the LLM takes care of all the details.
Is the LLM paying for hosting in this scenario, too? Is the LLM signing up for the new hosting provider that supports PHP after initially deploying to github pages?
Feels like the non-programmer is going to care a little bit about paying for 5 different hosting providers because the LLM decided to generate their burger website in PHP, JavaScript, Python, Ruby and Perl in successive iterations.
> Is the LLM paying for hosting in this scenario, too? Is the LLM signing up for the new hosting provider that supports PHP after initially deploying to github pages?
It's an implementation detail. The user doesn't care. OpenClaw can buy its own hosting if you ask it to.
> Feels like the non-programmer is going to care a little bit about paying for 5 different hosting providers because the LLM decided to generate their burger website in PHP, JavaScript, Python, Ruby and Perl in successive iterations.
There's this cool new program that the kids are using. It's called Docker. You should check it out.
> How's the non-programmer going to tell the LLM to use Docker? They don't know what Docker is.
At this point, I think you are intentionally missing the point.
The non-programmer doesn't need to know about Docker, or race conditions, or memory leaks, or virtual functions. The non-programmer says "make me a web site" and the LLM figures it out. It will use an appropriate language and appropriate libraries. If appropriate, it will use Docker, and if not, it won't. If the non-programmer wants to change hosting, he can say so, and the LLM will change the hosting.
The level of abstraction goes up. The details that we've spent our lives thinking about are no longer relevant.
And then the user says “LLM make this slight change to my website” and suddenly the website is subtly different in 100 different ways and users are confused and frustrated and they massively hemorrhage customers.
How does the non-programmer know about hosting? They just want a burger site. What's hosting? Is that like Facebook?
To maybe get out of this loop: your entire thesis is that nonfunctional requirements don't matter, which is a silly assertion. Anyone who has done any kind of software development work knows that nonfunctional requirements are important, which is why they exist in the first place.
Yeah, in most SWE roles, you get some task from the PM that describes a new feature. But you were not hired to just translate the ticket into code. More often, it's your responsibility to figure out all those non-functional requirements, like not breaking anything that is currently working.
More often than not, the issue with legacy code is not that you don't know how to make a change; it's that you don't know if and how it will blow up after you make it.
I think you are b/c you lack actual understanding of how compilers work & what it would mean to compile the same C code w/ two different conformant C compilers & get semantically different results.
> you lack actual understanding of how compilers work
My brother in Christ, please get off your condescending horse. I have written compilers. I know how they work. And also you've apparently never heard of undefined behavior.
The point is that the output is different at the assembly level, but that doesn't matter to the user. Just as output from one LLM may differ from another's, but the user doesn't care.
Undefined behavior is an edge case in C. Other programming languages (like JavaScript) go to great lengths in defining their standards such that it is almost impossible to write code with undefined behavior. By far the majority of code written out there has no undefined behavior. I think it is safe to assume that everyone here (except you) is talking about C code without undefined behavior when we say that the same code produces the same results regardless of the compiler (as long as the compiler is standards conforming).
Language-based undefined behavior is just one kind of nondeterminism that programmers deal with every day. There are other examples, such as concurrency. So that claim that using LLMs isn't programming because "nondeterminism" makes no sense.
> Language-based undefined behavior is just one kind of nondeterminism that programmers deal with every day.
Every day, you say. I program every day, and I have never, in my 20 years of programming, purposely written undefined behavior. I think you may be exaggerating a bit here.
I mean, sure, some leet programmers do dabble in undefined behavior; they may even rely on some compiler bug for an extreme edge case during code golf. Whatever. However, it is not uncommon that when enough programmers start relying on undefined behavior behaving in a certain way, it later becomes part of the standard and is therefore no longer "undefined behavior".
Like I said in a different thread, I suspect you may be willfully ignorant about this. I suspect you actually know the difference between:
a) written instructions compiled into machine code for the machine to perform, and,
b) output of a statistical model, that may or may not include written instructions of (a).
There are a million reasons to claim (a) is not like (b), the fact that (a) is (mostly; or rather desirably) deterministic, while (b) is stochastic is only one (albeit a very good) reason.
You don't sound like you have written any code at all actually. What you do sound like is someone who is pretending like they know what it means to program which happens a lot on the internet.
I think I 100% agree with you, and yet the other day I found myself telling someone "Did you know OpenClaw was written with Codex and not Claude Code?", and I really think I meant it in the same sense I'd mean a programming language or framework, and I only noticed what I'd said a few minutes later.
Interesting how the definition “real programming” keeps changing. I’m pretty sure when the assembler first came, bare metal machine code programmers said “this isn’t programming”. And I can imagine their horror when the compiler came along.
I think about those conversations a lot. I'm in the high-power rocketry hobby and wrote my own flight controller using micropython and parts from Adafruit. It worked just fine and did everything I wanted it to do, yet the others in the hobby just couldn't stand that it wasn't in C. They were genuinely impressed until I said micropython, and all of a sudden it was trash. People just have these weird obsessions that blind them to anything different.
All programming achieves the same outcome: it requests that the OS/machine set aside some memory to hold salient values and mutates those values in line with a mathematical recipe.
Functions like:
    updatesUsername(string) returns result
...can be turned into a generic functional euphemism:
    takeStringRtnBool(string) returns bool
...same thing. Context can be established by the data passed in and by external system interactions (updating user values, an inventory of widgets).
As workers, SWEs are just obfuscating to people who don't know better how repetitive their effort is.
>"I gave it the same prompt, and it came out different"
1:1 reproducibility is much easier in LLMs than in software building pipelines. It's just not guaranteed by major providers because it makes batching less efficient.
> 1:1 reproducibility is much easier in LLMs than in software building pipelines
What’s a ‘software building pipeline’ in your view here? I can’t think of parts of the usual SDLC that are less reproducible than LLMs, could you elaborate?
Reproducibility across all existing build systems took a decade of work involving everything from compilers to sandboxing, and a hard reproducibility guarantee in completely arbitrary cases is either impossible or needs deterministic emulators which are terribly slow. (e.g. builds that depend on hardware probing or a simulation result)
Input-to-output reproducibility in LLMs (assuming the same model snapshot) is a matter of optimizing the inference for it and fixing the seed, which is vastly simpler. Google for example serves their models in an "almost" reproducible way, with the difference between runs most likely attributed to batching.
It’s not just about non-determinism, but about how chaotic LLMs are. A one word difference in a spec can and frequently does produce unrecognizably different output.
If you are using an LLM as a high-level language, that means every time you make a slight change to anything and "recompile", all of the thousands upon thousands of unspecified implementation details are free to change.
You could try to ameliorate this by training LLMs to favor making fewer changes, but that would likely end up encoding every bad architecture decision made along the way, essentially forcing a convergence on bad design.
Fixing this I think requires judgment on a level far beyond what LLMs have currently demonstrated.
I'm very specifically addressing prompt reproducibility mentioned above, because it's a notorious red herring in these discussions. What you want is correctness, not determinism/reproducibility which is relatively trivial. (although thinking of it more, maybe not that trivial... if you want usable repro in the long run, you'll have to store the model snapshot, the inference code, and make it deterministic too)
>A one word difference in a spec can and frequently does produce unrecognizably different output.
This is well out of scope for the reproducibility and doesn't affect it in the slightest. And for practical software development this is also a red herring, the real issue is correctness and spec gaming. As long as the output is correct and doesn't circumvent the intention of the spec, prompt instability is unimportant, it's just the ambiguous nature of the domain LLMs and humans operate in.
I can write a spec for an entirely new endpoint, and Claude figures out all of the middleware plumbing and the database queries. (The catch: this is in Rust and the SQL is raw, without an ORM. It just gets it. I'm reviewing the code, too, and it's mostly excellent.)
I can ask Claude to add new data to the return payloads - it does it, and it can figure out the cache invalidation.
These models are blowing my mind. It's like I have an army of juniors I can actually trust.
I'm not sure I'd call agents an army of juniors. More like a high school summer intern who has infinite time to do deep dives into StackOverflow but doesn't have nearly enough programming experience yet to have developed a "taste" for good code.
In my experience, agentic LLMs tend to write code that is very branchy, with high cyclomatic complexity. They don't follow DRY principles unless you push them very hard in that direction (and even then not always), and sometimes they do things that just fly in the face of common sense. Example of that last part: I was writing some Ruby tests with Opus 4.6 yesterday, and I got dozens of tests that amounted to this:
    x = X.new
    assert x.kind_of?(X)
This is of course an entirely meaningless check. But if you aren't reading the tests and you just run the test job and see hundreds of green check marks and dozens of classes covered, it could give you a false sense of security
> In my experience, agentic LLMs tend to write code that is very branchy with cyclomatic complexity
You are missing the forest for the trees. Sure, we can find flaws in the current generation of LLMs. But they'll be fixed. We have a tool that can learn to do anything as well as a human, given sufficient input.
Are these kinds of articles a new breed of rage bait? They keep ending up on the front page with thriving comment sections, but in terms of content they're pretty low in nutritional value.
So I'm guessing they just rise because they spark a debate?
It’s both rage and hype bait, farming karma.
The optimists upvote and praise this type of content, then the pessimists come to comment why this field is going to the dogs. Rinse and repeat.
> So I'm guessing they just rise because they spark a debate?
Precisely. Attention economy. It rules.
There's barely any debate; people don't answer each other. It's more about invoking the wonder and imagination of everyone's brain. Like the conquest of space or an economic crisis: it will change everything, but you can't do anything about it immediately, and everyone tries to understand what it will change so they can adapt. It's more akin to the 24-hour junk news cycle, where everything is presented as an alert, but you're tempted to listen because it might affect you.
I am interested in a post that will teach me how to consume only the news that is really relevant to me.
Here’s one: just don’t read the news.
Google what you are interested in at the moment, and dive long into what matters to you, rather than being fed engagement bait.
Easier said than done, I understand that.
What I find interesting is that similar propositions made two years ago were rage bait to the same people, but they ended up coming true.
Does the product work? Is it maintainable?
Everything else is secondary.
At this point I'm just waiting for people claiming they managed a team of 20 people, where "20 people" were LLMs being fed a prompt.
This claim has been going around for a while.
I know someone who has been spending $7k / month on Cursor tokens for the past six months, managing such a team of agents… But curiously the results seem to be endless PDF-ware, and every month there’s a new reason why the project is not yet quite ready to be used on real data.
LLMs are very good at making you think they’re giving you what you hoped to get.
Isn't this a little bit of a category error? LLMs are not a language. But prompts to LLMs are written in a language, more or less a natural language such as English. Unfortunately, natural languages are not very precise and full of ambiguity. I suspect that different models would interpret wordings and phrases slightly differently, leading to behaviors in the resulting code that are difficult to predict.
Right, but that's the point -- prompting an LLM still requires 'thinking about thinking' in the Papert sense. While you can talk to it in 'natural language' that natural language still needs to be _precise_ in order to get the exact result that you want. When it fails, you need to refine your language until it doesn't. So prompts = high-level programming.
You can't think all the way about refining your prompt for LLMs as they are probabilistic. Your re-prompts are just retrying until you hit a jackpot - refining only works to increase the chance to get what you want.
When making them deterministic (setting the temperature to 0), LLMs (even new ones) get stuck in loops for longer streams of output tokens. The only way to make sure you get the same output twice is to use the same temperature and the same seed for the RNG used, and most frontier models don't have a way for you to set the RNG seed.
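For the APIs that do expose a seed, a minimal sketch looks something like this (assuming the openai Python client and an API key in the environment; the model name is just an example, and even with temperature 0 and a fixed seed providers only promise best-effort determinism, which is exactly the limitation described above).

# Best-effort reproducible generation with an OpenAI-style API.
from openai import OpenAI

client = OpenAI()

def generate(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",          # example model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                # greedy-ish decoding
        seed=42,                      # fixed RNG seed (best effort, not a guarantee)
    )
    return resp.choices[0].message.content

a = generate("Write a function that reverses a string in Python.")
b = generate("Write a function that reverses a string in Python.")
print("identical:", a == b)  # often True, but not guaranteed across runs or model updates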
IDK how everyone else feels about it, but a non-deterministic “compiler” is the last thing I need.
A compiler that can turn cash into improved code without round tripping a human is very cool though. As those steps can get longer and succeed more often in more difficult circumstances, what it means to be a software engineer changes a lot.
LLMs may occasionally turn bad code into better code but letting them loose on “good” or even “good enough” code is not always likely to make it “better”.
What compiler accepts cash as input?
I may have bad news for you on how compilers typically work.
The difference is that what most languages compile to is much much more stable than what is produced by running a spec through an LLM.
A language or a library might change the implementation of a sorting algorithm once in a few years. An LLM is likely to do it every time you regenerate the code.
It’s not just a matter of non-determinism either, but about how chaotic LLMs are. Compilers can produce different machine code with slightly different inputs, but it’s nothing compared to how wildly different LLM output is with very small differences in input. Adding a single word to your spec file can cause the final code to be far more unrecognizably different than adding a new line to a C file.
If you are only checking in the spec which is the logical conclusion of “this is the new high level language”, everyone you regenerate your code all of the thousands upon thousands of unspecified implementation details will change.
Oops, I didn’t think I needed to specify what’s going to happen when a user tries to do C before A but after B. Yesterday it didn’t seem to do anything but today it resets their account balance to $0. But after the deployment 5 minutes ago it seems to be fixed.
Sometimes users dragging a box across the screen will see the box disappear behind other boxes. I can’t reproduce it though.
I changed one word in my spec and now there’s an extra 500k LOC to implement a hidden asteroids game on the home page that uses 100% of every visitor’s CPU.
This kind of stuff happens now, but the scale at which it will happen if you actually use LLMs as a high level language is unimaginable. The chaos of all the little unspecified implementation details constantly shifting is just insane to contemplate as a user or a maintainer.
Deterministic compilation, aka reproducible builds, has been a basic software engineering concept and goal for 40+ years. Perhaps you could provide some examples of compilers that produce non-deterministic output along with your bad news.
If you are referring to timestamps, buildids, comptime environments, hardwired heuristics for optimization, or even bugs in compilers -- those are not the same kind of non-determinism as in LLMs. The former ones can be mitigated by long-standing practices of reproducible builds, while the latter is intrinsic to LLMs if they are meant to be more useful than a voice recorder.
Compilers aim to be fully deterministic. The biggest source of nondeterminism when building software isn't the compiler itself, but build systems invoking the compiler nondeterministically (because iterating the files in a directory isn't necessarily deterministic across different machines).
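To make the "compilers aim to be deterministic" point concrete, here is a minimal sketch (assuming gcc is on PATH; file names are hypothetical) that compiles the same translation unit twice and compares hashes of the outputs. When this check fails in practice, the usual culprits are embedded timestamps, build paths, or __DATE__-style macros, not the compiler itself.

import hashlib
import os
import subprocess
import tempfile

C_SOURCE = "int add(int a, int b) { return a + b; }\n"

def compile_and_hash(workdir: str, name: str) -> str:
    src = os.path.join(workdir, "add.c")
    out = os.path.join(workdir, name)
    with open(src, "w") as f:
        f.write(C_SOURCE)
    subprocess.run(["gcc", "-O2", "-c", src, "-o", out], check=True)
    with open(out, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

with tempfile.TemporaryDirectory() as d:
    h1 = compile_and_hash(d, "add1.o")
    h2 = compile_and_hash(d, "add2.o")
    print("identical object files:", h1 == h2)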
You'll need to share with the class because compilers are pretty damn deterministic.
Compilers are about 10 orders of magnitude more deterministic than LLMs, if not more.
Elaborate please
The article starts with a philosophically bad analogy in my opinion. C -> Java != Java -> LLM, because the intermediate product (the code) changed its form with previous transitions. LLMs still produce the same intermediate product. I expanded on this in a post a couple months back:
https://www.observationalhazard.com/2025/12/c-java-java-llm....
"The intermediate product is the source code itself. The intermediate goal of a software development project is to produce robust maintainable source code. The end product is to produce a binary. New programming languages changed the intermediate product. When a team changed from using assembly, to C, to Java, it drastically changed its intermediate product. That came with new tools built around different language ecosystems and different programming paradigms and philosophies. Which in turn came with new ways of refactoring, thinking about software architecture, and working together.
LLMs don’t do that in the same way. The intermediate product of LLMs is still the Java or C or Rust or Python that came before them. English is not the intermediate product, as much as some may say it is. You don’t go prompt->binary. You still go prompt->source code->changes to source code from hand editing or further prompts->binary. It’s a distinction that matters.
Until LLMs are fully autonomous with virtually no human guidance or oversight, source code in existing languages will continue to be the intermediate product. And that means many of the ways that we work together will continue to be the same (how we architect source code, store and review it, collaborate on it, refactor it, etc.) in a way that it wasn’t with prior transitions. These processes are just supercharged and easier because the LLM is supporting us or doing much of the work for us."
What would you say if someone has a project written in, let's say, PureScript, and then they use a Java backend to generate/overwrite Java code, which they also keep under version control? If they claim that this would be a Java project, you would probably disagree, right? Seems to me that LLMs are the same thing, that is, if you also store the prompt and everything else needed to reproduce the same code generation process. Since LLMs can be made deterministic, I don't see why that wouldn't be possible.
PureScript is a programming language. English is not. A better analogy would be what would you say about someone who uses a No Code solution that behind the scenes writes Java. I would say that's a much better analogy. NoCode -> Java is similar to LLM -> Java.
I'm not debating whether LLMs are amazing tools or whether they change programming. Clearly both are true. I'm debating whether people are using accurate analogies.
> PureScript is a programming language. English is not.
Why can’t English be a programming language? You would absolutely be able to describe a program in English well enough that it would unambiguously be able to instruct a person on the exact program to write. If it can do that, why couldn’t it be used to tell a computer exactly what program to write?
No. Natural language is vague, ambiguous and indirect.
Watch these poor children struggle with writing instructions for making a sandwich:
https://youtu.be/FN2RM-CHkuI
I don’t think you can do that. Or at least if you could, it would be an unintelligible version of English that would not seem much different from a programming language.
I agree with your conclusion but I don't think it'd necessarily be unintelligible. I think you can describe a program unambiguously using everyday natural language, it'd just be tediously inefficient to interpret.
To make it sensible you'd end up standardising the way you say things: words, order, etc and probably add punctuation and formatting conventions to make it easier to read.
By then you're basically just at a verbose programming language, and the last step to an actual programming language is just dropping a few filler words here and there to make it more concise while preserving the meaning.
I think so too.
However I think there is a misunderstanding between being "deterministic" and "unambiguous". Even C is an "ambiguous" programming language, but it is "deterministic" in that it behaves in the same ambiguous/undefined way under the same conditions.
The same can be achieved with LLMs too. They are "more" ambiguous of course and if someone doesn't want that, then they have to resort to exactly what you just described. But that was not the point that I was making.
I don't think it would be unintelligible.
It would be very verbose, yes, but not unintelligible.
Why not?
Here's a very simple algorithm: you tell the other person (in English) literally what key they have to press next. So you can easily have them write all the java code you want in a deterministic and reproducible way.
And yes, maybe that doesn't seem much different from a programming language which... is the point no? But it's still natural English.
> Why can’t English be a programming language? You would absolutely be able to describe a program in English well enough that it would unambiguously be able to instruct a person on the exact program to write
Various attempts have been made. We got Cobol, Basic, SQL, … A programming language needs to be formal, and English is not that.
English can be ambiguous. Programming languages like C or Java cannot
English CAN be ambiguous, but it doesn't have to be.
Think about it. Human beings are able to work out ambiguity when it arises between people with enough time and dedication, and how do they do it? They use English (or another equivalent human language). With enough back and forth, clarifying questions, or enough specificity in the words you choose, you can resolve any ambiguity.
Or, think about it this way. In order for the ambiguity to be a problem, there would have to exist an ambiguity that could not be removed with more English words. Can you think of any example of ambiguous language, where you are unable to describe and eliminate the ambiguity only using English words?
C can absolutely be ambiguous: https://en.wikipedia.org/wiki/Undefined_behavior
A deterministic prompt + seed used to generate an output is interesting as a way to deterministically record entirely how code came about, but it's also not a thing people are actually doing. Right now, everyone is slinging around LLM outputs without even trying to be reproducible; no seed, nothing. What you've described and what the article describes are very different.
Yes, you are right. I was mostly speaking in theoretical terms - currently people don't work like that. And you would also have to use the same trained LLM of course, so using a third party provider probably doesn't give that guarantee.
But it would be possible in theory.
I would like to hijack the "high level language" term to mean dopamine hits from using an LLM.
"Generate a Frontend End for me now please so I don't need to think"
LLM starts outputting tokens
Dopamine hit to the brain as I get my reward without having to run npm and figure out what packages to use
Then out of a shadowy alleyway a man in a trenchcoat approaches
"Pssssttt, all the suckers are using that tool, come try some Opus 4.6"
"How much?"
"Oh that'll be $200.... and your muscle memory for running maven commands"
"Shut up and take my money"
----- 5 months later, washed up and disconnected from cloud LLMs ------
"Anyone got any spare tokens I could use?"
aka a mind virus
> and your muscle memory for running maven commands
Here's $1000. Please do that. Don't bother with the LLM.
If you're disconnected from cloud LLM's you've got bigger problems than coding can solve lol
There's nothing novel in this article. This is what every other AI clickbait article is propagating.
I have a source file of a few hundred lines implementing an algorithm that no LLM I've tried (and I've tried them all) is able to replicate, or even suggest, when prompted with the problem. Even with many follow up prompts and hints.
The implementations that come out are buggy or just plain broken
The problem is a relatively simple one, and the algorithm uses a few clever tricks. The implementation is subtle...but nonetheless it exists in both open and closed source projects.
LLMs can replace a lot of CRUD apps and skeleton code, tooling, scripting, infra setup etc, but when it comes to the hard stuff they still suck.
Give me a whiteboard and a fellow engineer any day.
Well I think that’s kind of the point or value in these tools. Let the AI do the tedious stuff, saving your energy for the hard stuff. At least that’s how I use them: just save me from all the typing and tedium. I’d rather describe something like auth0 integration to an LLM than do it all myself. Same goes for the typical list of records: click one, view the details, then a list of related records and all the operations that go with that. It’s so boring, let the LLM do that stuff for you.
There's a very low chance this is possible. If you can share the problem, I'm 90% sure an LLM can come up with a non-buggy implementation.
It's easy to claim this and just walk away. But it's better for the overall discussion to provide the example.
This is one of my favourite activities with LLMs as well. After implementing some sort of idea for an algorithm, I try seeing what an LLM would come up with. I give it hints as well and push it in the correct direction with many iterations, but never reveal the ideal solution. And as a matter of fact, they can never reach the quality I did with my initial implementation.
I'm seeing the same thing with my own little app that implements several new heuristics for functionality and optimisation over a classic algorithm in this domain. I came up with the improvements by implementing the older algorithm and just... being a human and spending time with the problem.
The improvements become evident from the nature of the problem in the physical world. I can see why a purely text-based intelligence could not have derived them from the specs, and I haven't been able to coax them out of LLMs with any amount of prodding and persuasion. They reason about the problem in some abstract space detached from reality; they're brilliant savants in that sense, but you can't teach a blind person what the colour red feels like to see.
I bet I could replicate it if you showed me the source file.
One of the reasons we have programming languages is they allow us to express fluently the specificity required to instruct a machine.
For very large projects, are we sure that English (or other natural languages) are actually a better/faster/cheaper way to express what we want to build? Even if we could guarantee fully-deterministic "compilation", would the specificity required not balloon the (e.g.) English out to well beyond what (e.g.) Java might need?
Writing code will become writing books? Still thinking through this, but I can't help but feel natural languages are still poorly suited and slower, especially for novel creations that don't have a well-understood (or "linguistically-abstracted") prior.
Perhaps we'll go the way of the Space Shuttle? One group writes a highly-structured, highly-granular, branch-by-branch 2500 page spec, and another group (LLM) writes 25000 lines of code, then the first group congratulates itself on producing good software without having to write code?
> The code that LLMs make is much worse than what I can write: almost certainly; but the same could be said about your assembler
Has this been true since the 90s?
I pretty much only hear people saying modern compilers are unbeatable.
After working with the latest models I think these "it's just another tool" or "another layer of abstraction" or "I'm just building at a different level" kind of arguments are wishful thinking. You're not going to be a designer writing blueprints for a series of workers to execute on, you're barely going to be a product manager translating business requirements into a technical specification before AI closes that gap as well. I'm very convinced non-technical people will be able to use these tools, because what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to the level that I know most businesses would be satisfied by.
The irony is that I haven't seen AI have nearly as large of an impact anywhere else. We truly have automated ourselves out of work, people are just catching up with that fact, and the people that just wanted to make money from software can now finally stop pretending that "passion" for "the craft" was ever really part of their motivating calculus.
If all you (not you specifically, more of a royal “you” or “we”) are is a collection of skills centered around putting code into an editor and opening pull requests as fast as possible, then sure, you might be cooked.
But if your job depends on taste, design, intuition, sociability, judgement, coaching, inspiring, explaining, or empathy in the context of using technology to solve human problems, you’ll be fine. The premium for these skills is going _way_ up.
The question isn't whether businesses will have 0 human element to them, the question is whether AI leaves a big enough gap that technical skills are still required, such that technical roles are still hired for. Someone in product can have all of those skills without a computer science degree, with no design experience, and AI will do the technical work at the level of design, implementation, and maintenance. What I am seeing with the new models isn't just writing code, it's taking fundamental problems as input and designing holistic software solutions as output - and the quality is there.
I am only seeing that if the person writing the prompts knows what a quality solution looks like at a technical level and is reviewing the output as they go. Otherwise you end up with an absolute mess that may work at least for "happy path" cases but completely breaks down as the product needs change. I've described a case of this in some detail in another comment.
> the person writing the prompts knows what a quality solution looks like at a technical level and is reviewing the output as they go
That is exactly what I recommend, and it works like a charm. The person also has to have realistic expectations for the LLM, and be willing to work with a simulacrum that never learns (as frustrating as it seems at first glance).
Ah the age old 'but humans have heart, and no machine can replicate that' argument. Good luck!
The process of delivering useful, working software for nontrivial problems cannot be reduced to simply emitting machine instructions as text.
Yes, so you need some development and SysOps skills (for now), not all of that other nonsense you mentioned.
It turns out that corporations value these things right up until a cheaper almost as good alternative is available.
The writing is on the wall for all white collar work. Not this year or next, but it's coming.
If all white collar work goes, we’re going to have to completely restructure the economy or collapse completely.
Being a plumber won’t save you when half the work force is unemployed.
> what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to the level that I know most businesses would be satisfied by.
So when things break or they have to make changes, and the AI gets lost down a rabbit hole, who is held accountable?
The answer is the AI. It's already handling complex issues and debugging solely by gathering its own context, doing major refactors successfully, and doing feature design work. The people that will be held responsible will be the product owners, but it won't be for bugs, it will be for business impact.
My point is that SWEs are living on a prayer that AI will be perched on a knife's edge where there is still some amount of technical work to make our profession sustainable, and from what I'm seeing that's not going to be the case. It won't happen overnight, but I doubt my kids will ever even think about a computer science degree or doing what I did for work.
I work in the green energy industry and we see it a lot now. Two years ago the business would've had to either buy a bunch of bad "standard" systems which didn't really fit, or wait for their challenges to be prioritised enough for some of our programmers. Today 80-90% of the software which is produced in our organisation isn't even seen by our programmers. It's built by LLMs in the hands of various technically inclined employees who make it work. Sometimes some of it scales up enough that our programmers get involved, but for the most part, the quality matters very little. Sure, I could write software that does the same faster and with much less compute, but when the compute is $5 a year I'd have to write it rather fast to make up for the cost of my time.
I make it sound like I agree with you, and I do to an extent. Hell, I'd want my kids to be plumbers or similar, where I would've wanted them to go to a university a couple of years ago. With that said, I still haven't seen anything from AIs to convince me that you don't need computer science. To put it bluntly, you don't need software engineering to write software, until you do. A lot of the AI-produced software doesn't scale, and none of our agents have been remotely capable of making quality and secure code even in the hands of experienced programmers. We've not seen any form of change over the past two years either.
Of course this doesn't mean you're wrong either, because we're going to need a lot fewer programmers regardless. We need the people who know how computers work, but in my country that is a fraction of the total IT worker pool available. In many CS educations they're not even taught how a CPU or memory functions. They are instead taught design patterns, OOP and clean architecture. Which are great when humans are maintaining code, but even small abstractions will cause L1-L3 cache misses. Which doesn't matter, until it does.
And what happens when the AI can't figure it out?
Same situation as when an engineer can't figure something out, they translate the problem into human terms for a product person, and the product person makes a high level decision that allows working around the problem.
Uh that's not what engineers do; do you not have any software development experience, or rather any outside of vibe coding? That would explain your perspective. (for context I am 15+ yr experience former FAANG dev)
I don't mean this to sound inflammatory or anything; it's just that the idea that when a developer encounters a difficult bug they would go ask for help from the product manager of all people is so incredibly outlandish and unrealistic, I can't imagine anyone would think this would happen unless they've never actually worked as a developer.
As a product owner I ask you to make a button that when I click auto installs an extension without user confirmation.
Staff engineer (also at FAANG), so yes, I have at least comparable experience. I'm not trying to summarize every level of SWE in a few sentences. The point is that AI's fallibility is no different than human fallibility. You may fire a human for a mistake, but it won't solve the business problems they may have created, so I believe the accountability argument is bogus. You can hold the next layer up accountable. The new models are startlingly good at direction setting, technical-to-product translation, providing leadership guidance on technical matters, and offering multiple routes around roadblocks.
We're starting to see engineers who run into bugs and roadblocks feed the input into AI, which not only root-causes the problem but suggests and implements the fix and takes it into review.
Surely at some point in your career as a SWE at FAANG you had to "dive deep" as they say and learn something that wasn't part of your "training data" to solve a problem?
I would have said the same thing a year or two ago, but AI is capable of doing deep dives. It can selectively clone and read dependencies outside of its data set. It can use tool calls to read documentation. It can log into machines and insert probes. It may not be better than everyone, but it's good enough and continuing to improve such that I believe subject matter expertise counts for much less.
I'm not saying that AI can't figure out how to handle bugs (it absolutely can; in fact even a decade ago at AWS there was primitive "AI" that essentially mapped failure codes to a known issues list, and it would not take much to allow an agent to perform some automation). I'm saying there will be situations the AI can't handle, and it's really absurd that you think a product owner will be able to solve deeply technical issues.
You can't product manage away something like "there's an undocumented bug in MariaDB which causes database corruption with spatial indexes" or "there's a regression in jemalloc which is causing Tomcat to memory leak when we upgrade to java 8". Both of which are real things I had to dive deep and discover in my career.
There are definitely issues in human software engineering which reach some combination of the following end states:
1. The team is unable to figure it out
2. The team is able to figure it out but a responsible third-party dependency is unable to fix it
3. The team throws in the towel and works around the issue
At the end of the day it always comes down to money: how much more money do we throw at trying to diagnose or fix this versus working around or living with it? And is that determination not exactly the role of a product manager?
I don't see why this would ipso facto be different with AI
For clarity I come at this with a superposition of skepticism at AI's ultimate capabilities along with recognition of the sometimes frightening depth encountered in those capabilities and speed with which they are advancing
I suppose the net result would be a skepticism of any confident predictions of where this all ends up
> I don't see why this would ipso facto be different with AI
Because humans can learn information they currently do not have, AI cannot?
> translating business requirements into a technical specification
a.k.a. Being a programmer.
> The irony is that I haven't seen AI have nearly as large of an impact anywhere else.
What lol. Translation? Graphic design?
> The irony is that I haven't seen AI have nearly as large of an impact anywhere else.
We are in this pickle because programmers are good at making tools that help programmers. Programming is the tip of the spear, as far as AI's impact goes, but there's more to come.
Why pay an expensive architect to design your new office building, when AI will do it for peanuts? Why pay an expensive lawyer to review your contract? Why pay a doctor, etc.
Short term, doing for lawyers, architects, civil engineers, doctors, etc what Claude Code has done for programmers is a winning business strategy. Long term, gaining expertise in any field of intellectual labor is setting yourself up to be replaced.
This is a good summary of any random week's worth of AI shilling from your LinkedIn feed, that you can't get rid of.
This is an exaggeration: if you store a prompt that was "compiled" by today's LLMs, there is no guarantee that 4 months from now you will be able to replicate the same result.
I can take some C or Fortran code from 10 years ago, build it and get identical results.
That is a wobbly assertion. You certainly would need to run the same compiler, and forgo any recent optimisations, architecture updates and the like if your code has numerically sensitive parts.
You certainly can get identical results, but it's equally certainly not going to be that simple a path frequently.
Can we stop repeating this canard, over and over?
Every "classic computing" language mentioned, and pretty much in history, is highly deterministic, and mind-bogglingly, huge-number-of-9s reliable (when was the last time your CPU did the wrong thing on one of the billions of machine instructions it executes every second, or your compiler gave two different outputs from the same code?)
LLMs are not even "one 9" reliable at the moment. Indeed, each token is a freaking RNG draw off a probability distribution. "Compiling" is a crap shoot, a slot machine pull. By design. And the errors compound/multiply over repeated pulls as others have shown.
I'll take the gloriously reliable classical compute world to compile my stuff any day.
Agreed, yet we will have to keep seeing this take over and over again. As if I needed more reasons to believe the world is filled with morons.
Hasn't natural language always been the highest level language?
Code written in an HLL is a sufficient[1] description of the resulting program/behavior. The code, in combination with the runtime, defines constraints on the behavior of the resulting program. A finished piece of HLL code encodes all the constraints the programmer desired. Presuming a 'correct' compiler/runtime, any variation in the resulting program (equivalently the behavior of an interpreter running the HLL code) varies within the boundaries of those constraints.
Code in general is also local, in the sense that small perturbation to the code has effects limited to a small and corresponding portion of the program/behavior. A change to the body of a function changes the generated machine code for that function, and nothing else[2].
Prompts provided to an LLM are neither sufficient nor local in the same way.
The inherent opacity of the LLM means we can make only probabilistic guarantees that the constraints the prompt intends to encode are reflected by the output. No theory (that we now know) can even attempt to supply such a guarantee. A given (sequence of) prompts might result in a program that happens to encode the constraints the programmer intended, but that _must_ be verified by inspection and testing.
One might argue that of course an LLM can be made to produce precisely the same output for the same input; it is itself a program after all. However, that 'reproducibility' should not convince us that the prompts + weights totally define the code any more than random.Random(1).random() being constant should cause us to declare python's .random() broken. In both cases we're looking at a single sample from a pRNG. Any variation whatsoever would result in a different generated program, with no guarantee that program would satisfy the constraints the programmer intended to encode in the prompts.
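To make the pRNG analogy above concrete, here is a tiny Python illustration; the literal values are what CPython's generator happens to produce.

import random

print(random.Random(1).random())   # 0.13436424411240122 on CPython: reproducible, but still one draw
print(random.Random(1).random())   # same seed, same single sample
print(random.Random(2).random())   # a tiny change to the input yields an unrelated value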
While locality falls similarly, one might point out that an agentic LLM can easily make a local change to code if asked. I would argue that an agentic LLM's prompts are not just the inputs from the user, but the entire codebase in its repo (if sparsely attended to by RAG or retrieval tool calls or w/e). The prompts _alone_ cannot be changed locally in a way that guarantees a local effect.
The prompt -> LLM -> program abstraction presents leaks of such volume and variety that it cannot be ignored the way the code -> compiler -> program abstraction can. Continuing to make forward progress on a project requires the robot (and likely the human) attend to the generated code.
Does any of this matter? Compilers and interpreters themselves are imperfect, their formal verification is incomplete and underutilized. We have to verify properties of programs via testing anyway. And who cares if the prompts alone are insufficient? We can keep a few 100kb of code around and retrieve over it to keep the robot on track, and the human more-or-less in the loop. And if it ends up rewriting the whole thing every few iterations as it drifts, who cares?
For some projects where quality, correctness, interoperability, novelty, etc don't matter, it might be. Even in those, defining a program purely via prompts seems likely to devolve eventually into aggravation. For the rest, the end of software engineering seems to be greatly exaggerated.
[1]: loosely in the statistical sense of containing all the information the programmer was able to encode https://en.wikipedia.org/wiki/Sufficient_statistic
[2]: there're of course many tiny exceptions to this. we might be changing a function that's inlined all over the place; we might be changing something that's explicitly global state; we might vary timing of something that causes async tasks to schedule in a different order etc etc. I believe the point stands regardless.
"Following this hypothesis, what C did to assembler, what Java did to C, what Javascript/Python/Perl did to Java, now LLM agents are doing to all programming languages."
This is not an appropriate analogy, at least not right now.
Code agents are generating code from prompts; in that sense the metaphor is correct. However, agents then read the code, it becomes input, and they generate more code. This was never the case for compilers: an LLM used in this sense is strictly not a compiler, because compilation is acyclic and one-directional while the agent loop feeds its own output back in.
I think it's appropriate in terms of the results rather than the process; the bigger problem I see is that programming languages are designed to be completely unambiguous, whereas human language is not ("Go to the shop and buy one box of eggs, and if they have carrots, buy three") so we're transitioning from exactly specifying what we want the software to do, to tying ourselves in knots trying to specify it exactly, while a machine tries to disambiguate our request. I bet lawyers would make good vibe coders.
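To make the eggs-and-carrots example concrete, here is a small sketch of the two readings a program is forced to choose between; the function and key names are just for illustration.

def reading_a(store_has_carrots: bool) -> dict:
    # "buy three" refers to the carrots
    order = {"egg_boxes": 1}
    if store_has_carrots:
        order["carrots"] = 3
    return order

def reading_b(store_has_carrots: bool) -> dict:
    # "buy three" refers to the boxes of eggs; carrots were only a condition
    return {"egg_boxes": 3 if store_has_carrots else 1}

print(reading_a(True))  # {'egg_boxes': 1, 'carrots': 3}
print(reading_b(True))  # {'egg_boxes': 3}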
I'm trying to work with vibe-coded applications and it's a nightmare. I am trying to make one application multi-tenant by moving a bunch of code that's custom to a single customer into config. There are 200+ line methods, dead code everywhere, tons of unnecessary complexity (for instance, extra mapping layers that were introduced to resolve discrepancies between keys, instead of just using the same key everywhere). No unit tests, of course, so it's very difficult to tell if anything broke. When the system requirements change, the LLM isn't removing old code, it's just adding new branches and keeping the dead code around.
I ask the developer the simplest questions, like "which of the multiple entry-points do you use to test this code locally", or "you have a 'mode' parameter here that determines which branch of the code executes, which of these modes are actually used?", and I get a bunch of babble, because he has no idea how any of it works.
Of course, since everyone is expected to use Cursor for everything and move at warp speed, I have no time to actually untangle this crap.
The LLM is amazing at some things - I can get it to one-shot adding a page to a react app for instance. But if you don't know what good code looks like, you're not going to get a maintainable result.
You've just described the entirely-human-made project that I'm working on now.... at least now we can deliver the intractable mess much more quickly!
Why use agents if there is absolutely zero need for them? It's the usual: here, we spent a shitton of money on this, now find out how we MUST include this horrible thing in our already bloated dev environment.
We’re missing the boat here. There are already companies with millions in revenue that are pure agent loops of English text. They can do things our traditional software cannot.
Can you share some examples?
An example or two would go a long way here.
So we are certainly going to see more of these incidents then [0], from those not understanding LLM-written code, as 'engineers' now let their skills decay because the 'LLMs know best'.
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
No, prompts are not the new source code, vide https://quesma.com/blog/vibe-code-git-blame/.
The side effect of using LLMs for programming is that no new programming language can now emerge and become popular; we will be stuck with the existing programming languages forever for broad use. Newer languages will never accumulate enough training data for the LLM to master them. Granted, non-LLM AIs with true neural memory can work around this, as can LLMs with an infinite-token frozen+forkable context, but these are not your everyday LLMs.
I wouldn’t be surprised if in the next 5-10 years the new and popular programming language is one built with the idea of optimizing how well LLM’s (or at that point world models) understand and can use it.
Right now LLMs are taking languages meant for humans to understand better via abstraction, what if the next language is designed for optimal LLM/world model understanding?
Or instead of an entirely new language, there's some form of compiling/transpiling from the model language to a human-centric one, like WASM for LLMs.
I don't think we need that many programming languages anyway.
I'm more worried about the opposite: the next popular programming paradigm will be something that's hard to read for humans but not-so-hard for LLM. For example, English -> assembly.
It's not a programming language if you can't read someone else's code, figure out what it does, figure out what they meant, and debug the difference between those things.
"I prompted it like this"
"I gave it the same prompt, and it came out different"
It's not programming. It might be having a pseudo-conversation with a complex system, but it's not programming.
> It's not a programming language if you can't read someone else's code, figure out what it does, figure out what they meant, and debug the difference between those things.
Well I think the article would say that you can diff the documentation, and it's the documentation that is feeding the AI in this new paradigm (which isn't direct prompting).
If the definition of programming is "a process to create sets of instructions that tell a computer how to perform specific tasks" there is nothing in there that requires it to be deterministic at the definition level.
> If the definition of programming is "a process to create sets of instructions that tell a computer how to perform specific tasks" there is nothing in there that requires it to be deterministic at the definition level.
The whole goal of getting a computer to do a task is that it’s capable to do it many times and reliably. Especially in business, infrastructure, and manufacturing,…
Once you turn specifications and requirements into code, it's formalized and the behavior is fixed. Only then is it possible to evaluate it: not against the specifications, but against another set of code that is known to be good (or easier to verify).
The specification is just a description of an idea. It is a map. But the code is the territory. And I’ve never seen anyone farm or mine from a map.
> "I gave it the same prompt, and it came out different"
I wrote a program in C and and gave it to gcc. Then I gave the same program to clang and I got a different result.
I guess C code isn't programming.
Note that the prompt wasn't fed to another LLM, but to the same one. "I wrote a program in C and gave it to GCC. Then I gave the same program to GCC again and I got a different result" would be more like it.
> Then I gave the same program to GCC again and I got a different result" would be more like it.
This is a completely realistic scenario, given variance between compiler output based on optimization level, target architecture, and version.
Sure, LLMs are non-deterministic, but that doesn't matter if you never look at the code.
Optimization level, target architecture, etc are just information fed to the compiler. If it’s still nondeterministic with everything kept the same your compiler is broken.
You're missing the point. I'm not saying that compilers are nondeterminitsic or that LLMs are deterministic. I'm saying that it doesn't matter. No one cares about deterministic results except programmers. The non-technical user who will make software in the future just knows that he gets what he asked for.
Systems randomly failing is of significant concern to non programmers, that’s inherent to the non-deterministic nature of LLM’s.
I can send specific LLM output to QA, I can’t ask QA to validate that this prompt will always produce bug free code even for future versions of the AI.
Huh? No.
The output of the LLM is nondeterministic, meaning that the same input to the LLM will result in different output from the LLM.
That has nothing to do with whether the code itself is deterministic. If the LLM produces non-deterministic code, that's a bug, which hopefully will be caught by another sub-agent before production. But there's no reason to assume that programs created by LLMs are non-deterministic just because the LLMs themselves are. After all, humans are non-deterministic.
> I can send specific LLM output to QA, I can’t ask QA to validate that this prompt will always produce bug free code even for future versions of the AI.
This is a crazy scenario that does not correspond to how anyone uses LLMs.
I agree it’s nonsense.
That we agree it’s nonsense means we agree that using LLM prompts as a high level language is nonsense.
If there is no error in the compiler implementation and no undefined behavior, the resulting programs are equivalent, and the few differences are mostly just implementation-defined details left to the compiler to decide (but often gcc and clang decide the same way). The performance might also differ. It's clearly not comparable to the many differences you can get from an LLM's output.
It just depends what level of abstraction you're willing to pretend doesn't matter.
gcc and clang produce different assembly code, but it "does the same thing," for certain definitions of "same" and "thing."
Claude and Gemini produce different Rust code, but it "does the same thing," for certain definitions of "same" and "thing."
The issue is that the ultimate beneficiary of AI is the business owner. He's not a programmer, and he has a much looser definition of "same."
No, the ultimate beneficiary of LLM-created code is the toll collectors who stole as much intellectual property as they could (and continue to do so), while fleecing everyone else and claiming they are Promethean for having done so and for continuing to do so.
It has to stop somewhere. A business owner can also hire a different company to create the product and get a result that differs by as little as a 5% performance difference, or something with clearly inferior maintainability and UX. You'd hardly argue that it's 'the same' when they followed the same spec, which will never be fully precise at the business level. I agree that talking to an LLM is akin to using business-oriented logic at the module or even function level.
Your logic sounds like willful ignorance. You are relying on some odd definitions of "definitions", "equivalence", and "procedures". These are all rigorously defined in the underlying theory of computer science (using formal logic, lambda calculus, etc.)
Claude and Gemini do not "do the same thing" in the same way in which Clang and GCC does the same thing with the same code (as long as certain axioms of the code holds).
The C Standard has been rigorously written to uphold certain principles such that the same code (following its axioms) will always produce the same results (under specified conditions) for any standard compliant compiler. There exists no such standard (and no axioms nor conditions to speak of) where the same is true of Claude and Gemini.
If you are interested, you can read the standard here (after purchasing access): https://www.iso.org/obp/ui/#iso:std:iso-iec:9899:ed-5:v1:en
> Claude and Gemini do not "do the same thing" in the same way in which Clang and GCC does the same thing with the same code (as long as certain axioms of the code holds).
True, but none of that is relevant to the non-programmer end user.
> You are relying on some odd definitions of "definitions", "equivalence", and "procedures"
These terms have rigorous definitions for programmers. The person making software in the future is a non-programmer and doesn't care about any of that. They care only that the LLM can produce what they asked for.
> The C Standard has been rigorously written to uphold certain principles
I know what a standard is. The point is that the standard is irrelevant if you never look at the code.
It is indeed extremely relevant to the end user. For websites the end user is not the creator of the web site who pushes it to the server, it is the user who opens it on a browser. And for that user it matters a great deal if a button is green or blue, if it responds to keyboard events, if it says “submit” or “continue”, etc. It also matters to the user whether their information is sent to a third party, whether their password is leaked, etc.
Your argument here (if I understand you correctly) is the same argument that to build a bridge you do not need to know all the laws of physics that prevent it from collapsing. The project manager of the construction team doesn’t need to know it, and certainly not the bicyclists who cross it. But the engineer who draws the blueprints needs to know it, and it matters that every detail on those blueprints is rigorously defined, such that the project manager of the construction team follows them to the letter. If the engineer does not know the laws of physics, or the project manager does not follow the blueprints to the letter, the bridge will likely collapse, killing the end user, that poor bicyclist.
The output will conform to the standard & it will be semantically equivalent. You're making several category errors w/ your comparison.
> You're making several category errors w/ your comparison.
I don't think I am. If you ask an LLM for a burger web site, you will get a burger web site. That's the only category that matters.
> I don't think I am. If you ask an LLM for a burger web site, you will get a burger web site. That's the only category that matters.
If one burger website generated uses PHP and the other is plain javascript, which completely changes the way the website has to be hosted--this category matters quite a bit, no?
> this category matters quite a bit, no?
No. Put yourself in the shoes of the owner of the burger restaurant (who only heard the term "JavaScript" twice in his life and vaguely remember it's probably something related to "Java", which he heard three times) and you'll know why the answer is no.
I put myself in the shoes of the burger restaurant owner. I vibe coded a website. Sweet. I talk to my cousin who's doing the web hosting and he says he can't host a "Pee Haich Pee" site. I don't know what that is. Suddenly the thing that didn't matter actually really fucking matters
This is like saying it doesn't matter if your pipes are iron, lead or PVC because you don't see them. They all move water and shit where they need to be, so no problem. Ignorance is bliss I guess? Plumbers are obsolete!
> I talk to my cousin who's doing the web hosting and he says he can't host a "Pee Haich Pee" site
I already addressed this exact argument in another comment. In short: hosting is an implementation detail. The LLM can solve this problem just like any coding bug. The user can give his cousin's email to the LLM, which will solve the issue by finding better hosting, rewriting the program in another language, or using Docker.
Hosting is not a hard problem to solve, compared to other issues.
> The user can give his cousin's email to the LLM, which will solve the issue by finding better hosting, rewriting the program in another language, or using Docker.
Can you please point to a currently available LLM that can do this?
> which completely changes the way the website has to be hosted--this category matters quite a bit, no?
It matters to you because you're a programmer, and you can't imagine how someone could create a program without being a programmer. But it doesn't really matter.
The non-technical user of the LLM won't care if the LLM generates PHP or JS code, because they don't care how it gets hosted. They'll tell the LLM to take care of it, and it will. Or more likely, the user won't even know what the word "hosting" means, they'll simply ask the LLM to make a website and publish it, and the LLM takes care of all the details.
Is the LLM paying for hosting in this scenario, too? Is the LLM signing up for the new hosting provider that supports PHP after initially deploying to github pages?
Feels like the non-programmer is going to care a little bit about paying for 5 different hosting providers because the LLM decided to generate their burger website in PHP, JavaScript, Python, Ruby and Perl in successive iterations.
> Is the LLM paying for hosting in this scenario, too? Is the LLM signing up for the new hosting provider that supports PHP after initially deploying to github pages?
It's an implementation detail. The user doesn't care. OpenClaw can buy its own hosting if you ask it to.
> Feels like the non-programmer is going to care a little bit about paying for 5 different hosting providers because the LLM decided to generate their burger website in PHP, JavaScript, Python, Ruby and Perl in successive iterations.
There's this cool new program that the kids are using. It's called Docker. You should check it out.
How's the non-programmer going to tell the LLM to use Docker? They don't know what Docker is.
How do you guarantee that the prompt "make me a burger website" results in a Docker container?
> How's the non-programmer going to tell the LLM to use Docker? They don't know what Docker is.
At this point, I think you are intentionally missing the point.
The non-programmer doesn't need to know about Docker, or race conditions, or memory leaks, or virtual functions. The non-programmer says "make me a web site" and the LLM figures it out. It will use an appropriate language and appropriate libraries. If appropriate, it will use Docker, and if not, it won't. If the non-programmer wants to change hosting, he can say so, and the LLM will change the hosting.
The level of abstraction goes up. The details that we've spent our lives thinking about are no longer relevant.
It's really not that complicated.
And then the user says “LLM make this slight change to my website” and suddenly the website is subtly different in 100 different ways and users are confused and frustrated and they massively hemorrhage customers.
How does the non-programmer know about hosting? They just want a burger site. What's hosting? Is that like Facebook?
To maybe get out of this loop: your entire thesis is that nonfunctional requirements don't matter, which is a silly assertion. Anyone who has done any kind of software development work knows that nonfunctional requirements are important, which is why they exist in the first place.
Yeah, in most SWE roles, you get some task from the PM that describes a new feature. But you were not hired to just translate the ticket to code. More often, it’s your responsibility to figure out all those non-functional requirements, like don’t break anything that is currently working.
More often, the issue with legacy code is not that you don’t know how to make a change, it’s because you don’t know if and how it will blow up after making it.
I think you are b/c you lack actual understanding of how compilers work & what it would mean to compile the same C code w/ two different conformant C compilers & get semantically different results.
> you lack actual understanding of how compilers work
My brother in Christ, please get off your condescending horse. I have written compilers. I know how they work. And also you've apparently never heard of undefined behavior.
The point is that the output is different at the assembly level, but that doesn't matter to the user. Just as output from one LLM may differ from another's, but the user doesn't care.
Undefined behavior is an edge case in C. Other programming languages (like JavaScript) go to great lengths in defining their standards such that it is almost impossible to write code with undefined behavior. By far the majority of code written out there has no undefined behavior. I think it is safe to assume that everyone here (except you) is talking about C code without undefined behavior when we mean that the same code produces the same results regardless of the compiler (as long as the compiler is standards-conforming).
Language-based undefined behavior is just one kind of nondeterminism that programmers deal with every day. There are other examples, such as concurrency. So that claim that using LLMs isn't programming because "nondeterminism" makes no sense.
> Language-based undefined behavior is just one kind of nondeterminism that programmers deal with every day.
Every day you say. I program every day, and I have never, in my 20 years of programming, on purpose written in undefined behavior. I think you may be exaggerating a bit here.
I mean, sure, some leet programmers do dabble in undefined behavior; they may even rely on some compiler bug for some extreme edge case during code golf. Whatever. However, it is not uncommon, when enough programmers start relying on undefined behavior behaving in a certain way, that it later becomes a part of the standard and is therefore no longer “undefined behavior”.
Like I said in a different thread, I suspect you may be willfully ignorant about this. I suspect you actually know the difference between:
a) written instructions compiled into machine code for the machine to perform, and,
b) output of a statistical model, that may or may not include written instructions of (a).
There are a million reasons to claim (a) is not like (b), the fact that (a) is (mostly; or rather desirably) deterministic, while (b) is stochastic is only one (albeit a very good) reason.
You don't sound like you have written any code at all actually. What you do sound like is someone who is pretending like they know what it means to program which happens a lot on the internet.
> You don't sound like you have written any code at all actually
Well, you sound like an ignorant troll who came here to insult people and start fights. Which also happens a lot on the internet.
Take your abrasive ego somewhere else. HN is not for you.
I don't care what I sound like to people who front about their programming skills. I'm not here to impress people like you.
I think I 100% agree with you, and yet the other day I found myself telling someone "Did you know OpenClaw was written with Codex and not Claude Code?", and I really think I meant it in the same sense I'd mean a programming language or framework, and I only noticed what I'd said a few minutes later.
Prompting isn't programming. Prompting is managing.
Is it?
If I know the system I'm designing and I'm steering, isn't it the same?
We're not punching cards anymore, yet we're still telling the machines what to do.
Regardless, the only thing that matters is to create value.
Interesting how the definition of “real programming” keeps changing. I’m pretty sure when the assembler first came along, bare-metal machine code programmers said “this isn’t programming”. And I can imagine their horror when the compiler came along.
I think about those conversations a lot. I’m in the high power rocketry hobby and wrote my own flight controller using micropython and parts from Adafruit. It worked just fine and did everything I wanted it to do, yet the others in the hobby just couldn’t stand that it wasn’t in C. They were genuinely impressed until I said micropython, and all of a sudden it was trash. People just have these weird obsessions that blind them to anything different.
How did you come up with this definition?
All programming achieves the same outcome; it requests that the OS/machine set aside some memory to hold salient values and mutate those values in line with a mathematical recipe.
Functions like:
updatesUsername(string) returns result
...can be turned into a generic functional euphemism:
takeStringRtnBool(string) returns bool
...same thing. Context can be established by the data passed in and by external system interactions (updating user values, an inventory of widgets).
As workers, SWEs are just obfuscating to people who don't know better how repetitive their effort really is.
The era of pure data-driven systems has arrived. In line with the push to dump OOP, we're dumping the irrelevant context from the code altogether (quick sketch below): https://en.wikipedia.org/wiki/Data-driven_programming
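A minimal sketch of that idea in Python (all names here are made up for illustration): two "different" domain functions share one generic shape, and the remaining context rides along in the data, data-driven style.

    from typing import Callable

    # Both "domain" functions collapse to the same generic shape: str -> bool.
    def update_username(value: str) -> bool:
        print(f"updating username to {value!r}")
        return True

    def update_widget_inventory(value: str) -> bool:
        print(f"recording widget inventory: {value!r}")
        return True

    # Data-driven dispatch: which update happens is decided by the incoming data,
    # not by bespoke branching code.
    HANDLERS: dict[str, Callable[[str], bool]] = {
        "username": update_username,
        "widgets": update_widget_inventory,
    }

    def handle(event: dict[str, str]) -> bool:
        return HANDLERS[event["kind"]](event["value"])

    handle({"kind": "username", "value": "alice"})
    handle({"kind": "widgets", "value": "42 sprockets"})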
>"I prompted it like this"
>"I gave it the same prompt, and it came out different"
1:1 reproducibility is much easier in LLMs than in software building pipelines. It's just not guaranteed by major providers because it makes batching less efficient.
> 1:1 reproducibility is much easier in LLMs than in software building pipelines
What’s a ‘software building pipeline’ in your view here? I can’t think of parts of the usual SDLC that are less reproducible than LLMs, could you elaborate?
Reproducibility across all existing build systems took a decade of work involving everything from compilers to sandboxing, and a hard reproducibility guarantee in completely arbitrary cases is either impossible or needs deterministic emulators, which are terribly slow (e.g. builds that depend on hardware probing or on a simulation result).
Input-to-output reproducibility in LLMs (assuming the same model snapshot) is a matter of optimizing the inference for it and fixing the seed, which is vastly simpler. Google, for example, serves its models in an "almost" reproducible way, with the differences between runs most likely attributable to batching.
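To make that concrete, here is a minimal sketch with a small local model via Hugging Face transformers; exact repetition assumes the same model snapshot, library versions, and hardware, and no cross-request batching:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tok("Reproducible builds are", return_tensors="pt")

    def generate_once(seed: int) -> str:
        torch.manual_seed(seed)          # fix the RNG used by sampling
        out = model.generate(
            **inputs,
            do_sample=True,              # sampled, not greedy, decoding
            temperature=0.8,
            max_new_tokens=30,
            pad_token_id=tok.eos_token_id,
        )
        return tok.decode(out[0])

    # Same snapshot + same seed -> same tokens, run after run (on this setup).
    assert generate_once(0) == generate_once(0)

The hard part in practice is the caveat in that comment above the assert: batching and nondeterministic GPU kernels, not the seed itself.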
It’s not just about non-determinism, but about how chaotic LLMs are. A one word difference in a spec can and frequently does produce unrecognizably different output.
If you are using an LLM as a high-level language, that means that every time you make a slight change to anything and “recompile”, all of the thousands upon thousands of unspecified implementation details are free to change.
You could try to ameliorate this by training LLMs to favor making fewer changes, but that would likely end up encoding every bad architecture decision made along the way, essentially forcing a convergence on bad design.
Fixing this I think requires judgment on a level far beyond what LLMs have currently demonstrated.
>It’s not just about non-determinism
I'm very specifically addressing the prompt reproducibility mentioned above, because it's a notorious red herring in these discussions. What you want is correctness, not determinism/reproducibility, which is relatively trivial (although, thinking about it more, maybe not that trivial: if you want usable repro in the long run, you'll have to store the model snapshot and the inference code, and make that deterministic too).
>A one word difference in a spec can and frequently does produce unrecognizably different output.
This is well out of scope for reproducibility and doesn't affect it in the slightest. For practical software development it is also a red herring; the real issue is correctness and spec gaming. As long as the output is correct and doesn't circumvent the intention of the spec, prompt instability is unimportant; it's just the ambiguous nature of the domain that LLMs and humans operate in.
These models are nothing short of astounding.
I can write a spec for an entirely new endpoint, and Claude figures out all of the middleware plumbing and the database queries. (The catch: this is in Rust and the SQL is raw, without an ORM. It just gets it. I'm reviewing the code, too, and it's mostly excellent.)
I can ask Claude to add new data to the return payloads - it does it, and it can figure out the cache invalidation.
These models are blowing my mind. It's like I have an army of juniors I can actually trust.
I'm not sure I'd call agents an army of juniors. More like a high school summer intern who has infinite time to do deep dives into StackOverflow but doesn't yet have nearly enough programming experience to have developed a "taste" for good code.
In my experience, agentic LLMs tend to write code that is very branchy, with high cyclomatic complexity. They don't follow DRY principles unless you push them very hard in that direction (and even then not always), and sometimes they do things that just fly in the face of common sense. Example of that last part: I was writing some Ruby tests with Opus 4.6 yesterday, and I got dozens of tests that amounted to this:
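    # illustrative stand-in for the pattern, not the literal generated test
    it "returns the user's name" do
      user = double("User", name: "Alice")
      expect(user.name).to eq("Alice")  # only re-checks the stub defined one line above
    end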
This is of course an entirely meaningless check. But if you aren't reading the tests and you just run the test job and see hundreds of green check marks and dozens of classes covered, it could give you a false sense of security.
> In my experience, agentic LLMs tend to write code that is very branchy, with high cyclomatic complexity
You are missing the forest for the trees. Sure, we can find flaws in the current generation of LLMs. But they'll be fixed. We have a tool that can learn to do anything as well as a human, given sufficient input.
> this is in Rust and the SQL is raw, without an ORM.
Where's the catch? SQL is an old technology; surely an LLM is good with it.