The core of the entire argument is that the $150/hour is based on a developer's ability to physically write code, which is not true. Having something that can generate code reliably (which these things can barely do even with an expert at the wheel) doesn't address any of the actual hard problems we deal with on a daily basis.
Plus running AI tools is going to get much more expensive. The current prices aren't sustainable long term and they don't have any viable path to reducing costs. If anything, the cost of operations for the big companies is going to get worse. They're in the "get 'em hooked" stage of the drug deal.
> Having something that can generate code reliably (which these things can barely do even with an expert at the wheel) doesn't address any of the actual hard problems we deal with on a daily basis.
Not understanding that is something I've seen management do repeatedly for decades.
This article reads like all the things I discovered and the mistakes the company I worked for made while learning how to outsource software development back in the late 90s and early 2000s. The only difference is this is using AI to generate the code instead of lower paid developers from developing nations. And, just like software outsourcing as an industry created practices and working styles to maximise profit to outsourcing companies, anyone who builds their business relying on OpenAI/Anthropic/Google/Meta/whoever is going to need to address the risk of their chosen AI tool vendor ramping up the cost of using the tools to extract all the value of the apparent cost savings.
This bit matches exactly with my experience:
"The trouble comes in that most people don't know what code needs to be created to solve their problem, for any but the most trivial problems. Who does know what code would be needed to solve complex problems? Currently that's only known by software developers, development managers and product managers, three job classifications that are going to be merging rapidly."
We found that it was wrong to assume the people you employ as "developers" weren't also doing the dev management and product management roles. At least for our business, where there were 6 or 8 devs who all understood the business goals and existing codebase and technology. We only got successful outsourced development working after we realised that writing code from lists of tasks/requirements was way less than 50% of what our in-house development team had been doing for years. We ended up saving a lot of money on that 30 or 40% of the work, but the 60 or 70% of higher level _understanding the business and tech stack_ work still needed to be done by people who understood the whole business and had a vested interest in the business succeeding.
To be fair, the author does point to the many parts of software development that remain beyond the writing of code.
Completely agree on your first point: software development is so much more than writing code. LLMs are a threat to programmers for whom the job is 8 hours a day of writing code to detailed specifications provided by other people. I can't remember any point in my own career where I worked with people who got to do that.
There's a great example of that in the linked post itself:
> Let's build a property-based testing suite. It should create Java classes at random using the entire range of available Java features. These random classes should be checked to see whether they produce valid parse trees, satisfying a variety of invariants.
Knowing what that means is worth $150/hour even if you don't type a single line of code to implement it yourself!
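For anyone who hasn't seen property-based testing before, here's a rough sketch of what that prompt is asking for. I'm assuming jqwik for the property side and JavaParser for the parse-tree checks - my own picks for illustration, not necessarily what the article's suite used:

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import net.jqwik.api.*;

class RandomJavaClassProperties {

    // Invariant: every generated class should parse into exactly one top-level type.
    // StaticJavaParser.parse throws ParseProblemException on invalid source,
    // which also fails the property.
    @Property
    boolean generatedClassProducesValidParseTree(@ForAll("javaClasses") String source) {
        CompilationUnit unit = StaticJavaParser.parse(source);
        return unit.getTypes().size() == 1;
    }

    // Toy generator: one class with one field. A real suite would cover
    // "the entire range of available Java features".
    @Provide
    Arbitrary<String> javaClasses() {
        Arbitrary<String> names = Arbitraries.strings().alpha().ofMinLength(1).ofMaxLength(8);
        Arbitrary<String> types = Arbitraries.of("int", "long", "boolean", "String");
        return Combinators.combine(names, types, names)
                .as((cls, type, field) ->
                        // "C" / "f_" prefixes keep the generated identifiers from
                        // colliding with Java keywords.
                        "public class C" + cls + " { private " + type + " f_" + field + "; }");
    }
}
```

Even in that toy form, the hard part is deciding what to generate and which invariants matter - which is exactly the knowledge being priced at $150/hour.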
And to be fair, the author makes that point themselves later on:
> Agentic AI means that anything you know to code can be coded very rapidly. Read that sentence carefully. If you know just what code needs to be created to solve an issue you want, the angels will grant you that code at the cost of a prompt or two. The trouble comes in that most people don't know what code needs to be created to solve their problem, for any but the most trivial problems.
On your second point: I wouldn't recommend betting against costs continuing to fall. The cost reduction trend has been reliable over the past three years.
In 2022 the best available model was GPT-3 text-davinci-003 at $60/million input tokens.
GPT-5 today is $1.25/million input tokens - 48x cheaper for a massively more capable model.
... and we already know it can be even cheaper. Kimi K2 came out two weeks ago benchmarking close to (possibly even above) GPT-5 and can be run at an even lower cost.
I'm willing to bet there are still significantly more optimizations to be discovered, and prices will continue to drop - at least on a per-token basis.
We're beginning to find more expensive ways to use the models though. Coding Agents like Claude Code and Codex CLI can churn through tokens.
I get your point, but I don't think the pricing is long term viable. We're in the burn everything to the ground to earn market share phase. Once things start to stabilize and there is no more user growth, they'll start putting the screws to the users.
I said the same thing about Netflix in 2015 and Gamepass in 2020. It might have taken a while but eventually it happened. And they're gonna have to raise prices higher and faster at some point.
Netflix prices went up a little bit but not very much.
> In 2022 the best available model was GPT-3 text-davinci-003 at $60/million input tokens.
> GPT-5 today is $1.25/million input tokens - 48x cheaper for a massively more capable model.
Yes - but.
GPT-5 and all the other modern "reasoning models" and tools burn through way more tokens to answer the same prompts.
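A back-of-envelope sketch of how that can cancel out the per-token price drop - the per-million-token prices are the ones quoted upthread, but the token counts are made-up illustrations:

```java
public class TokenCostSketch {
    public static void main(String[] args) {
        // Prices per input token, from the per-million figures quoted above.
        double davinci2022 = 60.00 / 1_000_000;   // GPT-3 text-davinci-003
        double gpt5 = 1.25 / 1_000_000;           // GPT-5

        // Hypothetical usage: a single 2022-style prompt vs. an agent loop
        // full of reasoning tokens and tool calls.
        long oneShotTokens = 2_000;
        long agentTaskTokens = 100_000;

        System.out.printf("2022 one-shot prompt: $%.4f%n", oneShotTokens * davinci2022);
        System.out.printf("2025 agentic task:    $%.4f%n", agentTaskTokens * gpt5);
        // Per-token price fell ~48x, but if the agent burns ~50x more tokens,
        // the cost of getting one answer barely moves.
    }
}
```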
As you said:
> We're beginning to find more expensive ways to use the models though. Coding Agents like Claude Code and Codex CLI can churn through tokens.
Right now, it feels like the cost of using "frontier models" has stayed about the same for the entire ~5 year history of the current LLM/AI industry. But older models these days are, by comparison, effectively free.
I'm wondering when/if there'll be an asymptotic flattening, where new frontier models are insignificantly better than older ones, and running some model off Huggingface on a reasonably specced-up Mac Mini or gaming PC will provide AI coding assistance at basically electricity and hardware depreciation prices?
That really is the most interesting question for me: when will it be possible to run a model that is good enough to drive Claude Code or Codex CLI on consumer hardware?
gpt-oss-120b fits on a $4000 NVIDIA Spark and can be used by Codex - it's OK but still nowhere near the bigger ones: https://til.simonwillison.net/llms/codex-spark-gpt-oss
But... MiniMax M2 benchmarks close to Sonnet 4 and is 230B - too big for one Spark but can run on a $10,000 Mac Studio.
And Kimi K2 runs on two Mac Studios ($20,000).
So we are getting closer.
I so completely want to agree with you. If this were anything other than JavaScript agreeing with your comment would be simple.
I wrote JavaScript in the corporate world for 15 years. Here is the reality:
* Almost nobody wants to do it. The people that get paid for it don't want to do it. They just want to get paid. The result is that everybody who does get paid for it completely sucks. Complete garbage, at least at work. There are a lot of amazing people writing JavaScript, just not at work, and why would they try harder? Delivering quality at work far outside the bell curve just results in hostility, aside from some very rare exceptions. My exception was when I was doing A/B testing for a major .com.
* Since everybody in the corporate JavaScript world completely sucks, every major project eventually fails from a business perspective or stalls into lifeless maintenance mode. It just gets too expensive to maintain 5+ years later, or too fragile to pivot to the next business demand. So it has to get refactored or rebuilt. Sometimes that means hoping the next generation framework is ready, and that the business is willing to train people on it and go through the growing pains. More often it means calling in outside parties who can do it correctly the first time. It's not about scale. It's about the ability to actually build something original and justify every hour productively. I was on both sides of that fence.
* The reason the corporate overlords hire outside parties to fix problems from internal teams isn't just about talent. Keep in mind it's tremendously expensive. Yes, those people are capable of producing something that doesn't suck, and of doing so faster. The bigger issue is that they will always deliver reliably, because they are executing under a contract with a work performance statement. The internal teams have no contract performance definition that will kill their careers or terminate their incomes. They just have to hope the business remains financially solvent so they don't get caught in a mass layoff. This breeds a lot of entitlement and false expectations that feed on each other.
So, yes, in this case it really is about the ability to physically write code. Yes, you need to juggle client nonsense and have soft skills too, but those are layered on top of just being able to write the code. When your options are limited to a bunch of zeroes who depend on copy/paste from predefined framework templates, you need somebody who can actually justify their existence in a very practical solutions-delivery way.
This is a very insightful article:
"You might be expecting that here is where I would start proclaiming the death of software development. That I would start on how the strange new angels of agentic AI are simply going to replace us wholesale in order to feast on that $150/hour, and that it's time to consider alternative careers. I'm not going to do that, because I absolutely don't believe it. Agentic AI means that anything you know to code can be coded very rapidly. Read that sentence carefully. If you know just what code needs to be created to solve an issue you want, the angels will grant you that code at the cost of a prompt or two. The trouble comes in that most people don't know what code needs to be created to solve their problem, for any but the most trivial problems. Who does know what code would be needed to solve complex problems? Currently that's only known by software developers, development managers and product managers, three job classifications that are going to be merging rapidly."
This. AI is not replacing us, it is pulling the ladder up behind us.
I don’t understand why the software luminaries are coalescing around rapid-coding. Code generation was already a thing for forever. And deterministic code generation is as good as it possibly gets for “what I would have written but faster”—you know what it’s going to create. (Yes that was an EMDASH.) But code generation isn’t that much used, I think. I don’t count compilers or other black-box code generation. I mean any kind of ad hoc or more structured “what I would have written but faster” approach. Looking at the AI conversation you’d expect to see many more ad hoc code generation tools to deal with any and all boilerplate.
The downsides of code generation are only amplified with LLM code generation. Oh, it's just what I would have written. Now on the fifteenth iteration/rewrite. Generated idiomatic code from twelve years ago. _squints_ oh yeah, I would have written that back then... gosh it feels good to be in this exclusive club.
Here's what I don't understand.
Developers who get excited by agentic development put out posts like this. (I get excited too.)
Other developers tend to point out objections in terms of maintainability, scalability, overly complicated solutions, and so on. All of which are valid.
However, this part of AI evolves very quickly. So given these are known problems, why shouldn't we expect rapid improvements in agentic AI systems for software development, to the point where software developers who stick with the old paradigm will indeed be eroded in time? I'm genuinely curious because clearly the speed of advancement is significant.
> Other developers tend to point out objections in terms of maintainability, scalability, overly complicated solutions, and so on. All of which are valid.
I've spent the bulk of my 30+ year career in various in-house dev/management roles, and in small to medium sized digital agencies or IT consulting places.
In that time I have worked on many hundreds of projects, probably thousands.
There are maybe a few dozen that were still in production use, without major rewrites on the way, for more than 5 years.
I think for a huge number of commercial projects, "maintainability" is something that developers are passionate about, but that is of very little actual value to the client.
Back in the day when I spent a lot of time on comp.lang.perl.misc, there was a well known piece of advice: "always throw away the first version". My career-long takeaway from that has been to always race to a production-ready proof of concept quickly enough to get it in front of people - ideally the people who are then spending the money that generates the business profits. Then if it turns out to be successful, rewrite it from scratch incorporating everything you've learned from the first version - do not be tempted to continually tweak the hastily written code. These days people call something very like that "finding product-market fit", and a common startup plan is to prove a business model, then sell or be acquired before you need to spend the time/money on that rewrite.
Anecdotally, I find early mover advantage to be overrated (ask anyone who bought Betamax or HD-DVD players). It is significantly cheaper – on average – to exploit what you already know and learn from the mistakes of other, earlier movers.
That same argument points to all humans being irrelevant for all work in a few years.
Given that the problems are known and given that things are changing rapidly, we should expect them to be solved eventually (by some force)? No, I think the burden of proof is on whoever wants to address those problems. Not just refer to the never-changing answer “but why not?”[1]
All I see from “excited” developers is denial that there is a problem. But why would it be a problem that you have to review code generated by a program with the same fine-tooth comb that you use for human review?
[1] Some things change fast, some things never change at all.
It took me a while to get into it, but this is really good. You need to make it past the anecdote about building a property-based testing suite with Claude Code though, the real meat is in the second half.
I think they're useful tools but sooo many AI evangelists are just aggressively using them to slop up things.
The bit about Knight Capital implies that the software engineers were bad, which is notably untrue.
"A bad [software engineer] can easily destroy that much value even faster (A developer at Knight Capital destroyed $440 million in 45 minutes with a deployment error and some bad configuration logic, instantly bankrupting the firm by reusing a flag variable). "
Both the article's examples there are bogus -- yet in both cases the underlying points are true.
Google generates a lot of revenue per employee not because the employees are good (though many of them are of course), but because they own the front door to the web. And the Knight Capital story has many nuances left out by that summary.
In both cases the author needed a hard hitting but terse example. But as I said, both the claims are true, so in the voice of the courtroom judge, "I'll allow it."
There were decidedly shitty engineering decisions behind that dumpster fire.
The biggest being that the only safe way to recycle feature flag names is to put ample time separation between the last use of the previous meaning for the flag and the first application of the new use. They did not. If they had, they would have noticed that one server was not getting redeployed properly in the time gap between the two uses.
They also did not do a full rollback. They rolled back the code but not the toggles, which ignited the fire.
These are rookie mistakes. If you want to argue they are journeyman mistakes, I won’t fight you too much, but they absolutely demonstrate a lack of mastery of the problem domain. And when millions of dollars change hands per minute you’d better not be Faking it Til You Make It.
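To make the flag-recycling failure mode concrete, here's a schematic sketch - invented class and method names, not Knight's actual code - of why the same flag bit can mean two different things to two different builds:

```java
public class FlagReuseSketch {

    record Order(String symbol, int quantity) {}

    // Old build, still running on the one server that never got redeployed.
    static class RouterOldBuild {
        void handle(Order order, boolean reusedFlag) {
            if (reusedFlag) {
                // Old meaning: activate the long-retired test routine.
                System.out.println("Running retired test logic against " + order.symbol());
            }
        }
    }

    // New build, deployed to every other server, recycling the same flag.
    static class RouterNewBuild {
        void handle(Order order, boolean reusedFlag) {
            if (reusedFlag) {
                // New meaning: enable the new order-handling path.
                System.out.println("Routing " + order.symbol() + " via the new logic");
            }
        }
    }

    public static void main(String[] args) {
        Order order = new Order("XYZ", 100);
        boolean flagOnForNewFeature = true;

        new RouterNewBuild().handle(order, flagOnForNewFeature); // what was intended
        new RouterOldBuild().handle(order, flagOnForNewFeature); // what the stale server does
        // Rolling back the code but leaving the flag on keeps the old behaviour live,
        // which is why the rollback has to cover the toggles as well as the binaries.
    }
}
```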
The powerpeg feature flag had been deprecated for 9 years? In aggregate, yes, the engineering led to the disaster, but the specific engineers had been making everyone piles of money for a long time. The mistakes were fatal, but in a tiny amount of time on a system that old it's honestly surprising it didn't happen sooner.
https://specbranch.com/posts/knight-capital/
Ah, this one says the quiet part out loud: that OSS is on the chopping block
> Coding, the backbone and justification for the entire economic model of software development, went from something that could only be done slowly by an expensive few to something anyone could turn on like tap water.
The multitude of freely self-taught programmers would suggest otherwise.
Willing to accept AI agents can replace programmers
Not willing to accept ex-US devs can do a comparable job at half the price
If I hired a software developer a few years ago, I might expect them to do roughly what Claude Code does today on some task (?). If I hired a dev today I would expect much more from them than what Claude Code can currently do.
Ah.. you got me. Put (AI) in the title or something.