This really hit home for me:
> In some cases, workers are also being asked to automate the parts of their jobs they enjoy most, Hinds said on the podcast, pointing to customer-service employees who enjoy building relationships but are increasingly expected to supervise AI agents instead.
> "That's what gives you joy and meaning at work," she said. "That is very dangerous."
What's a 20% productivity gain if I constantly feel deflated by work that used to energize me? That's going to give back the productivity gain and more, while also decreasing my quality of life.
6 hours a week is low, unless it's the average spread across industries. I think I spend more time in Claude Code via the CLI than in any other app I have on my laptop.
Like others said, the frustration is when it gets something so wrong you just think "wow, how'd you mess that up?" but when it gets it right it's kind of nice. I also don't like that I basically tell Claude what to do, and then either go do busy work or waste time on the internet.
I kind of enjoy exploring black boxes, seeing how different inputs map to differences in outputs. It's kind of like hacking. The problem is, they keep altering the box.
The box is stochastic by design, and has an untraceable amount of complexity between its context and output by nature.
It may be fun to look at inputs and outputs, but it's not hackable and trying to map one into the other is more like astrology than a science.
Welcome to the slot machines!
My favourite personal experience is how they disabled yolo mode in Claude Code at my workplace
i've seen a number of articles claiming things like "devs self report they're +x% more productive with AI, but actually they're -y% LESS efficient!". and i think that this is the explanation for why.
as a boss (or researcher) i'm going to measure productivity based on amount of output per hour that i'm paying you; as a worker, i'm going to measure productivity based on amount of output relative to the amount of effort i'm putting in.
so what may be happening is that bosses see that output is at 80% (productivity down!) but workers see that they can give that 80% output with 40% effort (productivity up!).
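A quick way to see how both readings can hold at once is to put the numbers side by side. This is only an illustrative sketch in Python: the function names and the baseline (100% output for 100% effort over the same paid hours) are assumptions made for the example, not measurements from any study.

```python
def output_per_paid_hour(output: float, paid_hours: float) -> float:
    """The boss's yardstick: output divided by the hours being paid for."""
    return output / paid_hours


def output_per_effort(output: float, effort: float) -> float:
    """The worker's yardstick: output divided by the effort put in."""
    return output / effort


# Hypothetical baseline: full output, full effort, one unit of paid time.
before_boss = output_per_paid_hour(1.0, 1.0)
before_worker = output_per_effort(1.0, 1.0)

# With AI, using the figures from the comment: 80% output for 40% effort,
# while the paid hours stay the same.
after_boss = output_per_paid_hour(0.8, 1.0)
after_worker = output_per_effort(0.8, 0.4)

print(f"boss:   {before_boss:.1f} -> {after_boss:.1f}")
print(f"worker: {before_worker:.1f} -> {after_worker:.1f}")
```

Under these assumed numbers the boss's ratio falls from 1.0 to 0.8 while the worker's rises from 1.0 to 2.0, so the same change looks like a 20% loss on one yardstick and a doubling on the other.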
Not sure among devs, but I do know that in other positions in typical corporate bureaucracy, people have a propensity to not report their own automations or productivity gains upward, because the reward structure isn't there.
Early on in my days as a sysadmin, I automated a ton of my role when the rest of the team was still doing ClickOps. The reward for doing so was more work and expectations without the additional pay increase to justify my newfound productivity. That happens all over the workforce, and so people will just keep it to themselves. I learned my lesson at that first job real fast: if I'm able to have the same, or greater, output in half the time, I keep that to myself so I can use the automation to free up my own time instead of having it filled by the company.
I wonder how much of that is happening now with AI in non-technical roles.
https://www.youtube.com/watch?v=OwfNjGxa_D4
> so what may be happening is that bosses see that output is at 80% (productivity down!) but workers see that they can give that 80% output with 40% effort (productivity up!).
So why is it that the bosses are the ones that are so enthusiastic about adoption?
I've found that setting good guardrails, and running in a sandbox so that the agent doesn't keep asking tedious permission questions, makes things go a LOT smoother.
Generally, I spend anywhere between 15 mins and an hour setting things up (depending on how well the project is set up for AI work), and then set the agent going, coming back in a half-hour to an hour to check its progress. Generally, the tooling keeps it honest (for golang, forbidigo is AWESOME). 80% of the questions the agent asks me require a lot of thought. 20% of what it does needs correction.
The other thing to remember with LLMs is that they are NOT human, and won't react in a human way. So you'll see strokes of "brilliance" followed by the absolutely bizarre. But good guardrails keep that to a minimum.
> sandbox so that the agent doesn't keep asking tedious permission questions
> 80% of the questions the agent asks me require a lot of thought. 20% of what it does needs correction.
I've found even the permissions questions give me veto power over fruitless lines of exploration, especially in planning mode. For instance, it wants to use tools I don't have installed to access information that I have made available elsewhere? I get a chance to override this decision by declining the permissions check and redirecting it. Feels tedious, but helps me understand what information sources are influencing it. I head off a lot of bugs this way.
I never let it go into planning mode, other than to output a plan file that I can audit before giving it the go-ahead to implement. After that I don't want to be bothered, so --dangerously-skip-permissions keeps all but real questions out of the loop, and I can do something else while it works rather than babysit.
Your experience pretty much mirrors my own. I hate to be the 'they're holding it wrong' guy but there's certainly a lot of people out there that have no real idea how to effectively leverage AI.
It doesn't change the premise.
AI should be assisting us; instead it's doing the job and we're the assistant to it. This is a monumental shift in how knowledge work is changing that people seem to be missing, and it goes beyond mere coding.
Guardrails, prompts, whatever, it's us helping it do the job, not the other way around.
Opus 4.6 was the last genuinely good assistant LLM, but since then it's quite clear that the training/reinforcement is focused on "given prompt -> do task", so its behavior is more and more about doing it itself, not helping you. If you try to use it as an assistant it just sucks and is perma-wired into finding the solution. Many times I want it to help me investigate, and its answer will still be focused on the fix, not on answering my questions.
4.7 first, 4.8 later and Fable are absolute disasters as assistants.
Fable in particular is so "intelligent" that it will push with very strong and intelligent takes even if it is completely wrong.
I have never disliked our job more.
Wow... Our experiences have been very different, then. I've found each upgrade of Opus to be a noticeable improvement in its complex reasoning and delegation capabilities over its predecessor.
To me, this feels in many ways like a technical manager or team lead's job, where I guide the process along using my knowledge and experience, and then let the agent fill in the rest (to the best of its ability).
The agent can't really learn from its mistakes (at least, not without consuming precious context), so I apply a blameless postmortem process, updating the guardrails whenever it goes astray in the same way more than once.
And really, I'd rather be contemplating the more difficult and interesting questions of architecture, environment, ergonomics and market fit, so it suits me fine.
The problem (okay, one of the problems) with renting other people's models is, as you mentioned, that they can and will change out the model without notifying you ahead of time, and you don't always get to control which model you use. (They might decide to retire it, and you won't be able to get it back if they do).
Which is why (well, part of why) I think the long-term trend will be towards self-hosting models. Right now the frontier models are far enough ahead of the self-hosted ones that there are lots of people willing to pay by the token to rent someone else's model, because they get more value for money from that than from self-hosting models.
But the frontier companies won't be able to keep up their current levels of expenditure forever. At some point the investors are going to say "Hey, so, um, when am I going to see some return on my investment?" and then the current subsidized subscriptions (including the one my employer uses) are going to go away, much like what happened with Copilot this month.
And then the locally-hosted models are going to suddenly look like a more attractive picture. Because where you might have been willing to spend $100/month/employee to rent time on models in someone else's data center, you might suddenly balk at spending $500/month/employee. You might say "Hey, you know what? A $50,000 up-front capital investment is only, what, one month's worth of subscriptions for our 100 employees? Yeah, okay, I'll approve the hardware purchase. Get that self-hosted model set up and then we'll cancel the subscription and switch over."
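The back-of-the-envelope math in that pitch can be written out as below. This is a minimal sketch, assuming the only costs are the subscription fee and a one-time hardware purchase; the function name is made up for illustration, and a real comparison would also count power, maintenance, and staff time, which are ignored here.

```python
def months_to_break_even(hardware_cost: float,
                         monthly_fee_per_employee: float,
                         employees: int) -> float:
    """Months of subscription fees needed to equal a one-time hardware cost."""
    monthly_subscription = monthly_fee_per_employee * employees
    return hardware_cost / monthly_subscription


# Figures taken from the comment: $50,000 of hardware, 100 employees.
at_500 = months_to_break_even(50_000, 500, 100)  # hypothetical raised price
at_100 = months_to_break_even(50_000, 100, 100)  # the lower price mentioned

print(f"at $500/month/employee: {at_500:.1f} months")
print(f"at $100/month/employee: {at_100:.1f} months")
```

With these inputs the hardware equals one month of subscriptions at $500/month/employee and five months at $100/month/employee, which is why the price jump changes how the purchase looks.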
Not everyone is going to do that. But once the locally-hosted models are good enough, the first few people who do so and report success are going to start a snowball effect. It will likely be driven by money first, but it will also have a side effect that people will slowly discover: you can better predict the model you're using. It will continue to work the same way next year that it is working this year; or if it doesn't, it's because you chose to install the new version.
And when that happens (I'm saying "when", not "if" because although it might take some time, I think it's inevitable in the long run), the frontier-model rental companies are going to struggle to stay afloat. Except for the ones who saw this coming and transitioned to a non-subscription income source somehow (maybe by selling licenses to self-host their frontier models for $$BIGNUM), or who have some other revenue stream besides renting out models.
Well... as a human software engineer, I've been the one with very strong, intelligent, completely wrong takes. The question is, are the LLMs improving faster than you can improve a junior dev? And is their ceiling as high?
I spend at least 6 hours a week arguing with bots owned by other teams, as I’m unable to reach a human before I bypass their bot. 10k person company, clients are paying for my time.
I would be tempted to send my own bot to do that drudgery
Just build a bot to bypass their bot.
It may be that they’re protecting their time.
Right. Somewhere there’s a dashboard which lists those 6 hours as time saved.
Corpo bullshittery is the best kind of work. Get paid without actually ever doing anything. It's heaven.
Being alienated from the outcome of your labor is far from my idea of heaven.
Not if you enjoy making things and take pride in your work.
That's some odd image of heaven.
Bot-sitting is the new long compilation times.
I don't see a lot of talk about how AI development breaks the old feedback loop of write code, watch it run, change it, repeat. I really hate sitting around waiting for the agent to get done planning, reading the plan, then waiting for the agent to get done coding. It's those 5-10 minute windows when it's working that really sap my patience and suck all the fun out of our jobs. Writing code by hand is just more fun.
'Botsitting' -- that word is going into my 2026 lexicon! :-)
Isn’t this just the new type of work? Human in the loop of automated processes?
Welcome to the factory!
Like Chaplin in Modern Times, we will tighten screws until we lose our minds.
Yeah, Amazon warehouses are just the same. Humans are only used for tasks beyond the comprehension or physical ability of a machine at that point in time.
The problem is, we haven't had the debate on a societal level about whether we want to go the Star Trek route (aka, we do our darn best to automate everything so that humans have the time to do whatever they want) or the realcommunism route (we ward off automation so that we have jobs for people).
The result of that debate not having been had is the third possible outcome: rabid capitalism automates everything as soon as it is profitable and lays off the humans, focusing on getting higher margins out of fewer people if need be; the best example of that IMHO is Disneyland or Vegas going on ridiculous nickel-and-diming tours. In the end, however, there will be no one left who has employment and we'll be in for quite the riots.
Just 6 hours, lol!
“the incredible ground-level utility that many of us on HN celebrate every day through undeniable, massive productivity gains”
I’ve been told before.
I'm yet to be invited to the celebrations.
Understanding what is going on with AI productivity is … frustrating to say the least.
The best I can say is that genAI is a self-reported 20% efficiency boost, and for a very (very) small group of people, it’s maybe a 2-3x boost. (And if you are at a frontier lab, you go into the big bucket of exceptions.)
At this point, for most use cases, AI productivity is either the equivalent of giving people 3D printers, and seeing little benefit, or signing up for an outsourcing service, just without the development of human capital anywhere.
I think it depends on how you measure the boost. If you are talking about generating a first draft then yes, the boost is there. If you’re talking about completing the project with everything well tested and well architected, then overall there really isn’t a boost.
6 hours of debugging and docs reading is not equal to 6 hours of prompt fiddling. The return of value beyond the few fixes applied will be almost nil from the fiddling.
I could care less about bot sitting (haven’t we always written our own automation?), but it’s botsitting the unverified slop that people send you that fuels frustration. I thought I worked with competent people who respected me
Our product lead/manager recently sent me an AI-generated PRD (complete with a Claude Code spec!) to build a core feature which we have had for over 2 years (and is the most used feature by our customers).
I just can't imagine tanking my trust with my coworkers by doing something like that.
Maybe this is the AI layoff wave we'll see. Sorting out incompetent team members.
the ones who spend all day telling the bosses how great AI is?
So we're now in this world where everyone is instantly 10x more productive at turning their thoughts into code. Now, think about the coworkers you've had that are middling to mediocre. Do you want them to have a tool that makes them 10x more productive?
That's what I wonder about, what happens to all those folks.
Your coworkers haven't changed. What changed is that people can hand off work they never had to think through themselves. So you don't know what they checked and you don't know what you need to. You just have to read the whole thing.
It's not a lack of respect for you; it's a lack of respect for the work itself. That lack is being rewarded and encouraged.
Managers will be sure to tell you how much they respect you. Ask them if they respect the work and you'll get a blank stare.
*couldn’t care less