One of the nice things about the "dumber" models (like GPT-4) was that they were good enough to get you really far, but never good enough to complete the loop. They gave you maybe 90%, 20% of which you had to retrace -- so you still had to do about 30% of the tough work yourself, which meant manually learning things from scratch.
The models are too good now. One thing I've noticed recently is that I've stopped dreaming about tough problems, be it code or math. The greatest feeling in the world is pounding your head against a problem for a couple of days and waking up the next morning with the solution sketched out in your mind.
I don't think the solution is to go full natty, but to work more alongside the code in an editor, rather than doing things in the CLI.
The big issue I see coming is that leadership will care less and less about people and more about shipping features faster and faster. In other words, those who are still learning their craft are screwed.
The amount of context switching in my day-to-day work has become insane. There's this culture of “everyone should be able to do everything” (within reason, sure), but in practice it means a data scientist is expected to touch infra code if needed.
Underneath it all is an unspoken assumption that people will just lean on LLMs to make this work.
You still have the system design skills, and so far, LLMs are not that good in this field.
They can give you a plausible architecture, but most of the time it's not usable if you're starting from scratch.
When you design the system, you're an architect, not a coder, so I see no difference between handing the design to agents or to other developers; you've done the heavy lifting.
From that perspective, I find LLMs quite useful for learning. But instead of coding, I find myself in long back-and-forth sessions asking questions, requesting examples, sequence diagrams, etc., to visualise the final product.
> The greatest feeling in the world is pounding your head against a problem for a couple of days and waking up the next morning with the solution sketched out in your mind.
And then you find out someone else had already solved it. So you might as well use Google 2.0, aka ChatGPT.
The title of this submission is misleading, that's not what they're saying. They said it doesn't show productivity gains for inexperienced developers still gaining knowledge.
The study measures if participants learn the library, but what they should study is if they learn effective coding agent patterns to use the library well. Learning the library is not going to be what we need in the future.
> "We collect self-reported familiarity with AI coding tools, but we do not actually measure differences in prompting techniques."
Many people drive cars without being able to explain how cars work. Or use devices like that. Or interact with people whose thinking they can't explain. Society works like that: it is functional, but it does not run on full understanding. We need to develop the functional part, not the full-understanding part. We can write C without knowing the machine code.
You can often recognize a wrong note without being able to play the piece, spot a logical fallacy without being able to construct the valid argument yourself, catch a translation error with much less fluency than producing the translation would require. We need discriminative competence, not generative.
For years I maintained a library for formatting dates and numbers (prices, ints, ids, phones). It was a pile of regexes, but I maintained hundreds of test cases for each type of parsing, and as new edge cases appeared, I added them to my tests and iterated to keep the score high. I don't fully understand my own library; it emerged by scar accumulation. I mean, yes, I can explain any line, but why these regexes in this order is a data-dependent explanation I don't have anymore. All my edits run in a loop with the tests, and my PRs are sent only when the score is good.
Correctness was never grounded in understanding the implementation. Correctness was grounded in the test suite.
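A minimal sketch of what that test-grounded workflow might look like (the formatter and the test cases are made up for illustration, not the actual library):

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical miniature of the workflow described above: a regex-based formatter
// whose only correctness guarantee is the accumulated test suite, not the design.
public class PhoneFormatter {
    // The order of these patterns is the result of past failures, not a grand plan.
    private static final Pattern US_DASHED = Pattern.compile("^(\\d{3})-(\\d{3})-(\\d{4})$");
    private static final Pattern US_PLAIN  = Pattern.compile("^(\\d{10})$");

    public static String format(String raw) {
        String s = raw.replaceAll("[\\s().]", "");          // strip common noise first
        var m = US_DASHED.matcher(s);
        if (m.matches()) return "+1 " + m.group(1) + " " + m.group(2) + " " + m.group(3);
        if (US_PLAIN.matcher(s).matches())
            return "+1 " + s.substring(0, 3) + " " + s.substring(3, 6) + " " + s.substring(6);
        return raw;                                          // unknown shapes pass through unchanged
    }

    public static void main(String[] args) {
        // Each entry is a "scar": an input that once broke something.
        Map<String, String> cases = Map.of(
                "555-867-5309",   "+1 555 867 5309",
                "(555) 867 5309", "+1 555 867 5309",
                "5558675309",     "+1 555 867 5309");
        cases.forEach((in, want) -> {
            String got = format(in);
            System.out.println((got.equals(want) ? "PASS " : "FAIL ") + in + " -> " + got);
        });
    }
}
```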
You can, most certainly, drive a car without understanding how it works. A pilot of an aircraft, on the other hand, needs a fairly detailed understanding of the subsystems in order to fly it effectively.
I think being a programmer is closer to being an aircraft pilot than a car driver.
> Many people drive cars without being able to explain how cars work.
But fundamentally, all cars behave the same way all the time. Imagine running a courier company where the vehicles sometimes take a random left turn.
> Or interact with people whose thinking they can't explain
Sure, but they trust those service providers because they are reliable. And the reason they are reliable is that the service providers can explain their own thinking to themselves. Otherwise their business would be chaos and nobody would trust them.
How you approached your library was practical given the use case. But can you imagine writing a compiler like this? Or an industrial automation system? Not only would it be unreliable, it would also be extremely slow. It's much faster to deal with something that has a consistent model that attempts to distill the essence of the problem, rather than patching hack upon hack in response to failed test after failed test.
Interesting argument.
But isn't the correction of those errors what is valuable to society and what gets us a job?
People can tell when they've found a bug, or give a description of what they want from a piece of software, yet it requires skill to fix the bugs and to build the software. Though LLMs can speed up the process, expert human judgment is still required.
I think there's different levels to look at it.
If you know that you need O(n) "contains" checks and O(1) retrieval for items, for a given order of magnitude, it feels like you have all the pieces of the puzzle needed to keep the LLM on the straight and narrow, even if you didn't know off the top of your head that you should choose ArrayList.
Or if you know that string manipulation might be memory intensive and so you write automated tests around it for your order of magnitude, it probably doesn't really matter if you didn't know to choose StringBuilder.
That feels different to e.g. not knowing the difference between an array list and a linked list (or the concept of time/space complexity) in the first place.
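A rough sketch of what such a "keep the LLM honest" check could look like in practice: a crude timing test at the order of magnitude you actually care about, rather than caring which class got picked. N, the threshold, and the class name are illustrative assumptions, not anything from the study.

```java
import java.util.ArrayList;
import java.util.List;

// Pins down the requirement ("fast indexed retrieval and string building at my
// order of magnitude") with a coarse timing check, independent of the LLM's choice.
public class OrderOfMagnitudeCheck {
    static final int N = 100_000; // the scale I actually care about

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();   // whatever the LLM picked; the check below is what matters
        for (int i = 0; i < N; i++) items.add(i);

        long t0 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < N; i++) sum += items.get(i);   // O(1) retrieval per item is the requirement
        long retrievalMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        StringBuilder sb = new StringBuilder();            // amortised O(n) total, vs O(n^2) for naive concat
        for (int i = 0; i < N; i++) sb.append(i).append(',');
        long buildMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.printf("retrieval: %d ms, string build: %d ms, checksum: %d%n",
                retrievalMs, buildMs, sum);
        if (retrievalMs > 100 || buildMs > 100)
            throw new AssertionError("too slow for N=" + N + "; revisit the data structure choice");
    }
}
```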
I think the kind of judgement required here is to design ways to test the code without inspecting it manually line by line - that would be like walking a motorcycle, and you would only be vibe-testing. That is why we have seen the FastRender browser and the JustHTML parser: the testing part was solved upfront, so the AI could go nuts implementing.
I partially agree, but I don’t think “design ways to test the code without inspecting it manually line by line” is a good strategy.
Tests only cover cases you already know to look for. In my experience, many important edge cases are discovered by reading the implementation and noticing hidden assumptions or unintended interactions.
When something goes wrong, understanding why almost always requires looking at the code, and that understanding is what informs better tests.
Another possibility is to implement the same spec twice, and do differential testing, you can catch diverging assumptions and clarify them.
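A toy illustration of that idea, assuming a made-up spec ("parse a price string into cents") and two independent implementations: mismatches on random inputs are exactly the diverging assumptions you would then clarify.

```java
import java.util.Random;

// Differential testing sketch: two implementations of the same hypothetical spec,
// compared on random inputs so hidden disagreements surface as mismatches.
public class DifferentialPriceTest {
    // Implementation 1: split on the dot, multiply the integer part by 100.
    static long parseA(String s) {
        String[] parts = s.split("\\.", 2);
        long cents = Long.parseLong(parts[0]) * 100;
        if (parts.length == 2) cents += Long.parseLong(parts[1]);
        return cents;
    }

    // Implementation 2: strip the dot and parse the remaining digits once.
    static long parseB(String s) {
        return Long.parseLong(s.replace(".", ""));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int divergences = 0;
        for (int i = 0; i < 10_000; i++) {
            // Sometimes one decimal digit, sometimes two: the ambiguity in the "spec".
            String dec = rnd.nextBoolean() ? String.format("%02d", rnd.nextInt(100))
                                           : String.valueOf(rnd.nextInt(10));
            String input = rnd.nextInt(1000) + "." + dec;
            long a = parseA(input), b = parseB(input);
            if (a != b && divergences++ < 5)
                System.out.println("divergence on " + input + ": " + a + " vs " + b);
        }
        // Neither implementation is "right"; each divergence is a spec question to answer.
        System.out.println("total divergences: " + divergences);
    }
}
```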
Isn't that too much work?
Instead, learning concepts with AI and then using HI (Human Intelligence) and AI to solve the problem at hand - by going through the code line by line and writing tests - is a better approach productivity-, correctness-, efficiency-, and skill-wise.
I can only think of LLMs as fast typists with some domain knowledge.
Like typists of government/legal documents who know how to format documents but cannot practice law. Likewise, LLMs are code typists who can write good/decent/bad code but cannot practice software engineering - we need, and will need, a human for that.
I agree. It's very misleading. Here's what the authors actually say:
> AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation -- particularly in safety-critical domains.
That itself sounds contradictory to me.
> AI assistance produces significant productivity gains across professional domains, particularly for novice workers.

> We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.
Are the two sentences talking about non-overlapping domains? Is there an important distinction between productivity gains and efficiency gains? Does one focus on novice users and the other on experienced ones? Admittedly I haven't read the paper yet; it might be clearer than the abstract.
That doesn't really line up with my experience. I wanted to debug a CMake file recently, having done no such thing before, and AI helped me walk through the potential issues, explaining what I got wrong.
I learned a lot more in a short amount of time than I would've stumbling around on my own.
Afaik it's been known for a long time that the most effective way of learning a new skill is to get private tutoring from an expert.
Has the claim in your third paragraph been backed by research? Not snark, genuinely curious. I have some anecdotal, personal experience backing it up.
I agree the title should be changed, but as I commented on the dupe of this submission, learning is not something that happens as a beginner, student, or "junior" programmer and then stops. The job is learning, and after 25 years of doing it I learn more per day than ever.
> They said it doesn't show productivity gains for inexperienced developers still gaining knowledge.
But that's what "impairs learning" means.
When I use AI to write code, if I go back to the written code after a week or two, I have a hard time catching up. When I write code myself, I can just look at it and understand what I did.
A program is a function of the programmer; how you code is how you think. That is why it is really difficult, even after 60 years, for multiple people to work on the same codebase; over the years we have made all kinds of rules and processes so that code written by one person can be understood and changed by another.
You can also read human code and empathise with what they were thinking while writing it.
AI code is not for humans; it is just a stream of tokens that does something. You need to build skills to empirically verify that it does what you think it does, but it is pointless to "reason" about it.
++Hard Agree.
Edit: Changed title
Previous title: "Anthropic: AI Coding shows no productivity gains; impairs skill development"
The previous title oversimplified the claim to "all" developers. I found the previous title meaningful when submitting this post because the false AI claims that "the software engineer is finished" have mostly affected junior, `inexperienced` engineers. But I think `junior inexperienced` was implicit, which many people didn't pick up on.
The paper makes a more nuanced claim that AI Coding speeds up work for inexperienced developers, leading to some productivity gains at the cost of actual skill development.
Key snippet from the abstract:
> Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.
The library in question was Python trio and the model they used was GPT-4o.
Many say generative AI is like a vending machine. But if your vending machine has not one button but a keyboard, and you can type in anything you want and it makes it (a Star Trek replicator), and you use it 10,000 times to refine your recipes, did you learn something or not? How about a 3D printer: do you learn something making designs and printing them?
Instead of "vending machine", I see many people calling generative AI "slot machine", which more aptly describes current genAI tools.
Yes, we can use it 10,000 times to refine our recipes, but "did we learn from it"? I am doubtful about that, given that even when running the same prompt 10 times, it will give different answers in 8 of the 10 responses.
But I am very confident that I can learn by iterating and printing designs on a 3D printer.
3D printer: you learn something if you make the CAD designs yourself and print them, yes. It is a skill.
Often when I use it, I know that there is a way to do something and I know that I could figure it out by going through some API documents and maybe finding some examples on the web... IOW, I already have something in mind.
For example, I wanted to add a rate limiter to an API call, with proper HTTP codes, etc. I asked the AI (in IntelliJ it used to be Claude by default, but they've since switched to Gemini as the default) to generate one for me. The first version was not good, so I asked it to do it again with some changes.
What would take me a couple of hours or more took less than 10 minutes.
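For the curious, a minimal sketch of the kind of thing being described: a fixed-window, per-client rate limiter that maps excess requests to HTTP 429. The class and method names are hypothetical, not what the AI actually generated.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Fixed-window rate limiter: at most maxPerWindow requests per client per window.
public class SimpleRateLimiter {
    private final int maxPerWindow;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    public SimpleRateLimiter(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns 200 if the call is allowed, 429 (Too Many Requests) otherwise. */
    public int check(String clientId) {
        long now = System.currentTimeMillis();
        Window w = windows.compute(clientId, (id, old) ->
                (old == null || now - old.start >= windowMillis) ? new Window(now) : old);
        return w.count.incrementAndGet() <= maxPerWindow ? 200 : 429;
    }

    private static final class Window {
        final long start;
        final AtomicInteger count = new AtomicInteger();
        Window(long start) { this.start = start; }
    }

    public static void main(String[] args) {
        SimpleRateLimiter limiter = new SimpleRateLimiter(3, 1_000); // 3 requests per second
        for (int i = 1; i <= 5; i++)
            System.out.println("request " + i + " -> " + limiter.check("client-42"));
        // expected: 200, 200, 200, 429, 429
    }
}
```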
@dang the title here is bait. I’d suggest the paper title: “Anthropic: How AI Impacts Skill Formation”
This isn't Twitter. Email hn@ycombinator.com.
I’ve been making the case (e.g. https://youtu.be/uL8LiUu9M64?si=-XBHFMrz99VZsaAa [1]) that we have to be intentional about using AI to augment our skills, rather than outsourcing understanding: great to see Anthropic confirming that.
[1] plug: this is a video about the Patreon community I founded to do exactly that. Just want to make sure you're aware that's the pitch before you go ahead and watch.
They lost me in the abstract when they said "AI increases productivity, especially with novice workers". From my experience, it was the most experienced and fluent people in the engineering world who gained the most value from AI.
I've noticed this as well. I delegate to agentic coders tasks I need to have done efficiently, which I could do myself but lack the time to do. Or tasks in areas I simply don't care much for, in languages I don't like very much, etc.
I wonder why these Anthropic researchers chose GPT-4o for their study.
This is really strange and warrants some skepticism.
Anthropic paid a team to do a project, and gave them leeway to do it how they wanted. If anything, it's a good signal that Anthropic didn't lean on the scale to have the results go in their favor.
Isn't it technically in their favor if the competition is proven bad, even if it would be equally easy to show their own product is likely just as bad or worse?
Nice to see an AI coding company allow such studies to come out, and it looks decently designed.
This is a fancy way of saying that if you invent the calculator, people get worse at sums. I'm not an AI doomer or a boomer - but it's clear to me that some skills will be permanently relegated to AI.
Yes, except the calculator is right 100% of the time. LLMs are right ??% of the time, where ?? constantly changes, varies with prompts, etc.
For $100/hour I can fill in those gaps for you!