What happens when coding agents stop feeling like dialup?

(martinalderson.com)

134 points | by martinald 2 days ago

169 comments

  • SirensOfTitan a day ago

    > Each of these 'phases' of LLM growth is unlocking a lot more developer productivity, for teams and developers that know how to harness it.

    I still find myself incredibly skeptical LLM use is increasing productivity. Because AI reduces cognitive engagement with tasks, it feels to me like AI increases perceptive productivity but actually decreases it in many cases (and this probably compounds as AI-generated code piles up in a codebase, as there isn't an author who can attach context as to why decisions were made).

    https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

    I realize the author qualified his or her statement with "know how to harness it," which feels like a cop-out I'm seeing an awful lot in recent explorations of AI's relationship with productivity. In my mind, like TikTok or online dating, AI is just another product motion toward comfort maximizing over all things, as cognitive engagement is difficult and not always pleasant. In a nutshell, it is another instant gratification product from tech.

    That's not to say that I don't use AI, but I use it primarily as search to see what is out there. If I use it for coding at all, I tend to primarily use it for code review. Even when AI does a good job at implementation of a feature, unless I put in the cognitive engagement I typically put in during code review, its code feels alien to me and I feel uncomfortable merging it (and I employ similar levels of cognitive engagement during code reviews as I do while writing software).

    • enkrs 14 hours ago

      I use LLMs (like claude-code and codex-cli) the same way accountants use calculators. Without one, you waste all your focus on adding numbers; with one, you just enter values and check if the result makes sense. Programming feels the same—without LLMs, I’m stuck on both big problems (architecture, performance) and small ones (variable names). With LLMs, I type what I want and get code back. I still think about whether it works long-term, but I don’t need to handle every little algorithm detail myself.

      Of course there are going to be discussions about what is "real" programming (like I'm sure there were discussions about what was "real" accounting with the onset of the calculator).

      The moment we stop treating LLMs like people and see them as big calculators, it all clicks.

      • janalsncm 13 hours ago

        The issue with your analogy is that calculators do not hallucinate. They do not make mistakes. An accountant is able to fully offload the mental overhead of arithmetic because the calculator is reliable.

        • irjustin 11 hours ago

          > The issue with your analogy is that calculators do not hallucinate. They do not make mistakes. An accountant is able to fully offload the mental overhead of arithmetic because the calculator is reliable.

          If you've ever done any modeling/serious accounting, you'll find that you feel more like a DBA than a "person punching on a calculator". You ask questions and then you figure out how to get the answers you want by "querying" excel cells. Many times querying isn't in quotes.

          To me, the analogy of the parent is quite apt.

          • Jensson 11 hours ago

            But the database doesn't hallucinate data, it always does exactly what you ask it to do and gives you reliable numbers unless you ask it to do a random operation.

            • commakozzi 7 hours ago

              I really don't understand the hallucination problem now in 2025. If you know what you're doing, you know what you need to get from the LLM, and you can describe it well enough that it would be hard to screw up, LLMs are incredibly useful. They can nearly one-shot an entire skeleton architecture that I only need to nudge into the right place before adding what I want on top of it. Yes, I run into code from LLMs that I have to tweak, but it has been incredibly helpful for me. I haven't had hallucination problems in a couple of years now...

              • janalsncm 17 minutes ago

                > I really don't understand the hallucination problem now in 2025

                Perhaps this OpenAI paper would be interesting then (published September 4th):

                https://arxiv.org/pdf/2509.04664

                Hallucination is still absolutely an issue, and it doesn’t go away by reframing it as user error, saying the user didn’t know what they were doing, didn’t know what they needed to get from the LLM, or couldn’t describe it well enough.

            • fancyfredbot 9 hours ago

              I agree databases don't hallucinate but somehow most databases still end up full of garbage.

              Whenever people are doing the data entry you shouldn't trust your data. It's not the same as LLM hallucinations but it's not entirely different either.

            • 360MustangScope 9 hours ago

              That is why you check your results. If you know what the end outcome should be, it doesn't matter if it hallucinates. Even if it does, it has probably already gotten you 90% of the work done, which leaves less work for you to finish.

        • theshrike79 13 hours ago

          Replace calculator with the modern equivalent: Excel.

          It does make mistakes and is not reliable[0]. The user still needs to have a "feel" for the data.

          (to be pedantic "Excel" doesn't make mistakes, people trusting its defaults do)

          [0] https://timharford.com/2021/05/cautionary-tales-wrong-tools-...

          • Jensson 11 hours ago

            > (to be pedantic "Excel" doesn't make mistakes, people trusting its defaults do)

            So what is your point? An expert who has mastered Excel doesn't have to check that Excel calculated things correctly; he just needs to check that he gave Excel the right inputs and formulas. That is not true for an LLM: you do have to check that it actually did what you asked, regardless of how good you are at prompting.

            The only thing I trust an LLM to do correctly is translation; they are very reliable at that. Other than that, I always verify.

            • theshrike79 10 hours ago

              "Just" check that every cell in the million row xlxs file is correct.

              See the issue here?

              Excel has no proper built-in validation or test suite, not sure about 3rd party ones. The last time I checked some years back there was like one that didn't do much.

              All it takes is one person accidentally or unknowingly entering static data on top of a few formulas in the middle and nobody will catch it. Or Excel "helps" by changing the SEPT1 gene to "September 1. 2025"[0] - this case got so bad they had to RENAME the gene to make Excel behave. "Just" doing it properly didn't work at scale.

              The point I'm trying to get at here is that neither tool is perfect; both require validation afterwards. With agentic coding we can verify the results, we have the tools for it - and the agent can run them automatically.

              In this case Excel is even worse, because one human error can escalate massively and there is no simple way to verify the output; Excel has no unit-test equivalents or validators.
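
              As a purely illustrative sketch of that kind of after-the-fact validation (the column name and file are hypothetical, not from the linked article), an agent could run something like:

                  import pandas as pd

                  # Load the exported sheet and flag values in the "gene" column that a
                  # date parser accepts - the classic SEPT1 -> "1-Sep" style corruption.
                  df = pd.read_csv("export.csv")
                  suspicious = df[pd.to_datetime(df["gene"], errors="coerce").notna()]
                  print(suspicious)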

              [0] https://www.progress.org.uk/human-genes-renamed-as-microsoft...

            • 11 hours ago
              [deleted]
        • rusticpenn 11 hours ago

          It depends on how much you want the LLM to do. I personally work at the function level and can easily verify whether it works with a look and a few tests.

        • lupusreal 10 hours ago

          That's why you tell claude code to write tests, and use them, use linting tools, etc. And then you test the code yourself. If you're still concerned, /clear then tell claude code that some other idiot wrote the code and it needs to tear it apart and critique it.

          Hallucination is not an intractable problem, the stochastic nature of hallucinations makes it easy to use the same tools to catch them. I feel like hallucinations have become a cop out, an excuse, for people who don't want to learn how to use these new tools anyway.

          • otabdeveloper4 10 hours ago

            > you now have to not only review and double-check shitty AI code, but also hallucinated AI tests too

            Gee thanks for all that extra productivity, AI overlords.

            Maybe they should replace AI programmers with AI instead?

            • lupusreal 9 hours ago

              I said to make the chatbot do it, not to do all the reviewing yourself. You can do manual reviews once it makes something that works. In the meantime, you can be working on something else entirely.

      • MangoToupe 13 hours ago

        I'm assuming, based on the granularity, that you're referring to autocomplete, and surely that already doesn't feel like dialup.

    • polotics a day ago

      My experience is exactly the opposite of "AI reduces cognitive engagement with tasks": I have to constantly be on my toes to follow what the LLMs are proposing and make sure they are not getting off track over-engineering things, or entering something that's likely to turn into a death loop several turns later. AI use definitely makes my brain run warmer, got to get a FLIR camera to prove it I guess...

      • walleeee a day ago

        So, reduces cognitive engagement with the actual task at hand, and forces a huge attention share to hand-holding.

        I don't think you two are disagreeing.

        I have noticed this personally. It's a lot like the fatigue one gets from too long scrolling online. Engagement is shallower but not any less mentally exhausting than reading a book. You end up feeling more exhausted due to the involuntary attention-scattering.

        • notyourwork a day ago

          It would be analogous to having to double check the IDE added the lines of code I actually typed. That’s not a great productivity boost, it’s a toy still in many ways.

          • a day ago
            [deleted]
        • 17 hours ago
          [deleted]
        • JumpCrisscross 17 hours ago

          > reduces cognitive engagement with the actual task at hand, and forces a huge attention share to hand-holding

          You're in some senses managing an idiot savant, emphasis on the idiot part, except they're also a narcissist who will happily go out of scope if you let it.

          If you have management experience, the analogy is immediately obvious. If you don't, I can see how having to speed-run learning it while a kid runs around with dynamite would be taxing.

          • Jensson 11 hours ago

            Text completer is still the most apt analogy. If you ask it to open a file and do something with it, often it won't open the file and will just text-complete the output. That failure mode never happens with any sort of human; it happens because it's a text completer.

            Many people hate it when you don't anthropomorphize LLMs, but that is the best way to understand how they can fail so spectacularly in ways we would never expect a human to fail.

            • commakozzi 7 hours ago

              Except many of us are not having the problems you are having. I don't have LLMs "fail so spectacularly".

              Interestingly, I have friends who aren't coders who use LLMs for various personal needs, and they run into the same kind of problems you are describing. 100% of the time, I've found that it's because they do not understand how to work with an LLM. Once I help them, they start getting better results. I do not have any need to anthropomorphize an LLM. I do however understand that I can use natural language to get quite complex and yes ACCURATE results from AI, IF I know what I'm doing and how to ask for it. It's just a tool, not a person.

      • wildzzz a day ago

        I'm an electrical engineer. One of my jobs is maintaining a couple of racks of equipment and the scripts we use to test hardware. I've never been expected to be a programmer beyond things like Matlab, but over the past several years, I've been maintaining a python project we use to run these tests. With equipment upgrades and my amateur python skills, we now have a fully automated test: plug in the hardware, hit the green button, and wait for tests to complete and data to be validated. Codesurf absolutely chokes when trying to work on my code; it's just too much of a mess to handle. But I have been using our in-house chatgpt to write some utilities that I've been procrastinating on for years. For example, I needed a debug tool to view live telemetry and send commands as required, and had been procrastinating on writing it for a long time. My existing scripts aren't flexible; they are literally just a script for the test runner to follow. I have an old debug tool but it's not compatible with the existing workflow, so it's a pain to run. I told chatgpt what I needed, gave it some specs on the functions it would need from libraries I've written (but didn't want it to see), and it cranked out a perfectly functional python script. I ended up doing a bit of work on the script it gave me since I didn't trust it completely or know whether I could even get it to expand on the work properly. It would have taken me much longer to write on my own so I'm very grateful I could save so much time. Just last week, I had another idea for a different debug tool and did the same process (here's my idea, here's the specs, go) and after a few rounds of "can you add this next?", I had another quality tool ready to go with absolutely no touch-up work needed on my end. I want my tools to have simple Tkinter GUIs but I hate writing GUIs, so I'm absolutely thrilled chatgpt can handle that for me.

        I'm a bit of a luddite: I still just use notepad++ and a terminal window to develop my code. I don't want to get bogged down in using vscode, so trusting AI to handle things beyond "can you make this email sound better?" has been a big leap for me.
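
        For illustration only, a minimal sketch of the kind of Tkinter debug tool described above; read_telemetry and send_command are hypothetical stand-ins for the commenter's in-house library functions, not actual code from that project:

            import tkinter as tk

            # Hypothetical stand-ins for the in-house library.
            def read_telemetry() -> str:
                return "BUS_V=28.1 TEMP=41C MODE=IDLE"

            def send_command(cmd: str) -> None:
                print(f"sent: {cmd}")

            root = tk.Tk()
            root.title("Telemetry debug tool")

            log = tk.Text(root, height=20, width=60, state="disabled")
            log.pack(fill="both", expand=True)

            entry = tk.Entry(root)
            entry.pack(fill="x")

            def on_send(event=None):
                # Send whatever was typed, then clear the entry box.
                send_command(entry.get())
                entry.delete(0, tk.END)

            entry.bind("<Return>", on_send)

            def poll():
                # Append the latest telemetry line roughly once per second.
                log.configure(state="normal")
                log.insert(tk.END, read_telemetry() + "\n")
                log.see(tk.END)
                log.configure(state="disabled")
                root.after(1000, poll)

            poll()
            root.mainloop()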

        • jenkinomics a day ago

          In a few to several months you will learn the meaning of "big ball of mud". Then you will either speedrun the last 20-30 years of software development tooling evolution or crash out of your current modus operandi and help fuel future demand for actual software developers.

          • cadamsdotcom a day ago

            Do you have a definition of “actual software developers”?

            To me an “actual software developer” is always learning, always growing, always making mistakes, always looking back and blown away by how far they’ve come - and most importantly, is always willing to generously offer a hand up to anyone who cares enough to learn the craft.

            It’s ok to make a big ball of mud! You can deal with it later once you understand the problem v1 solves. Parallel rebuilds and migrations are part of software engineering. Or alternatively - maybe that big ball of mud does its job, has no new requirements, so can be left quietly chugging along - for potentially decades, never needing a v2.

            • ehnto 17 hours ago

              I think they just mean a professional whose responsibility at the company is to code. There are a lot of professionals who do some coding on the side of their main role, but it's not their responsibility to look after the code for 8 hours a day.

          • bsder a day ago

            I disagree. The quoted scenario is the absolute best for LLMs.

            1) An easily defined go/no-go task with defined end point which requires

            2) A bunch of programming code that nobody gives a single shit about

            3) With esoteric function calls that require staring at obscure documentation

            This is the LLM dream task.

            When the next person has to stare at this code, they will throw it out and rerun an LLM on it because the code is irrelevant and the end task is the only thing that matters.

            • citizenpaul a day ago

              Here's the thing. Those first two things don't exist.

              I'm revisiting this comment a lot with LLMs. I don't think many HN readers run into real-life mudball/spaghetti code. I think there is an SV bias here where posters think taking a shortcut a few times is what a mudball is.

              There will NEVER be a time in this business where the business is OK with simply scrapping these hundreds of inconsistent one-off generations and being OK with something that sorta kinda worked like before. The very places that do this won't use consistent generation methods either. The next person to stare at it will not just rerun the LLM, because at that time the ball will be so big that not even the LLMs can fix it without breaking something else. Worse, the new person won't even know what they don't know or even what to ask it to regenerate.

              Man I'm gonna buy stock in the big three as a stealth long term counter LLM play.

              I've seen mudballs outside of SV and they are messes that defy logical imagination. LLMs are only gonna make that worse. It's like giving children access to a functional tool shop. You are not gonna get a working product no matter how good the tools are.

              • theshrike79 13 hours ago

                A few cases just recently:

                Someone in the company manages a TON of questionnaires. They type the questions into the service, get the results. The results are in a CSV format or some shit. Then they need to manually copy them to Google Sheets and do some adjustments on them.

                Took me about 30 minutes of wall clock time, maybe 5 minutes of my time to have an LLM write me a simple python script that uses the API in the questionnaire service to pull down the data and insert it into a new Google Sheet.

                Saves the person a TON of time every day.
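
                Roughly the shape of that glue script, assuming a hypothetical questionnaire API and using the gspread library for Sheets (the names here are illustrative, not the actual code):

                    import requests
                    import gspread

                    # Hypothetical endpoint/token for the questionnaire service's API.
                    API_URL = "https://questionnaires.example.com/api/v1/results"
                    resp = requests.get(API_URL, headers={"Authorization": "Bearer TOKEN"}, timeout=30)
                    rows = resp.json()["rows"]  # assume each row is a list of cell values

                    # Push the rows into a new Google Sheet via a service account.
                    gc = gspread.service_account(filename="service_account.json")
                    sheet = gc.create("Questionnaire results").sheet1
                    sheet.append_rows(rows)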

                ---

                Second case was a person who had to do similar manual input to crappy Sheets daily, because that's what the next piece in the process can read.

                This person has a bit of an engineer mindset and vibe-coded a web tool themselves that has a UI that lets them easily fill the same information but view it in a more user friendly way. Then it'll export it in a CSV/JSON format for the next step in the process.

                None of these would've been granted the day(s) of engineering time before; now both were something that could be thrown together quickly over a coffee break or done by themselves over a weekend.

              • abm53 14 hours ago

                I’d go further than the other reply: not only do those first two things definitely exist, they probably represent the plurality of programming tasks.

              • kgdiem 19 hours ago

                I generated a script today to diff 2 CSVs into a Venn diagram, ran it twice, then deleted the code.
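
                Something like the throwaway sketch below, assuming both CSVs share an "id" column (purely illustrative, not the commenter's actual script):

                    import pandas as pd
                    import matplotlib.pyplot as plt
                    from matplotlib_venn import venn2

                    # Compare the two files on a shared "id" column (assumed).
                    a = set(pd.read_csv("a.csv")["id"])
                    b = set(pd.read_csv("b.csv")["id"])

                    venn2((a, b), set_labels=("a.csv", "b.csv"))
                    plt.show()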

                • makapuf 14 hours ago

                  The LLM itself could have done it; maybe you didn't need the code at all.

                  • theshrike79 13 hours ago

                    It's a language model, not a compiler. Which is what people get wrong.

                    Ask one to count the 'r's in "strawberry" and it may or may not get it right.

                    Ask it to create a program to do it, it'll get it right instantly and it'll work.

                    When we get to a point where "AI" can write a program like that in the background, run it and use its result as a tool, we'll get the next big leap in efficiency.
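
                    The program it writes for that is trivially deterministic, e.g. a minimal sketch:

                        # Counting characters in code is exact, unlike token-level "counting".
                        print("strawberry".count("r"))  # 3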

                • lupusreal 7 hours ago

                  I think the future of computing is ephemeral code like this, created rapidly on demand, then done when the immediate task is done.

              • bsder 21 hours ago

                > Here's the thing. Those first two things don't exist.

                You are 100% wrong on this. They exist all the time when I'm doing a hardware task.

                I need to test a new chip coming off the fab. I need to get the pins in the right place, the test code up and running, the jig positioned correctly, the test vectors for JTAG generated, etc.

                This ... is ... a ... pain ... in ... the ... ass.

                It changes every single time for every single chip. It changes for every jig and every JTAG and every new test machine. Nobody gives one iota of damn about the code as it will be completely different for the next revision of this chip. Once I validate that the chip actually powers on and does something sane, the folks who handle the real testing will generate real vectors--but not before.

                So, generating that go/no-go code is in the way of everything. And nobody cares about what it looks like because it is going to be thrown out immediately afterward.

          • phil21 18 hours ago

            This simply is untrue for a huge portion of work done in IT. It seems in the last decade or so many people have forgotten that programming is simply a way to achieve a business goal or task.

            Some things are highly valuable (e.g. validating electronic equipment via scores of hardware testing) and can be curated by a skilled "amateur" programmer (we used to call these folks "scripters" back in the day) more or less indefinitely. Adding "real programmers" to the mix would simply cause costs to skyrocket with no discernable impact on revenue produced - just some smug programmers showing off how much better their code looks and how much more maintainable it is.

            Stuff like this is domain knowledge distilled into a bash script. If you have the domain knowledge it is typically pretty trivial to simply do a full rewrite if you come in after this guy retires. The domain knowledge and understanding of what the automation is actually doing is the hard and skilled part of the job.

            I'm not downvoting the low-value comment because I believe it needs high visibility for many who come here and see the responses to it. You don't need to "engineer" software for every use-case. Sometimes the guy with deep domain knowledge who can hack and kludge a bash or python script together is 10x more valuable than some guy with a CS degree and a toolbox of "best practices" who doesn't give a shit about the underlying task at hand. I'm sure some fancy new frameworks will be used though!

            Sysadmins of yesteryear, who were expected to deeply understand hardware and OS-level things but not to be able to program, all understand this and would be able to make great use of AI. The advance of programmers into the sysadmin (aka devops) space is really a travesty of speciality skills being lost and discarded. A whole lot of very pretty overengineered code sitting on top of hardware and systems that are barely understood by those who wrote it, and it shows.

            • tayo42 16 hours ago

              Idk how you can be nostalgic for hacking things together with bash scripts.

              Bringing software development practices to the sysadmin world has improved it so much.

              Infra as code, no pet servers, languages that don't require massive maintenance every time a dependency or language version changes, testing frameworks.

              Things are so much better than clicking around VMware, or bugging the one guy that runs a bash script from cron off his laptop to write a feature.

          • fragmede 18 hours ago

            You made a new account just to shit on this human's report of being an electrical engineer and using LLMs to help him get shit done? c'mon man. I get that you're probably a software developer and that you see LLMs as an existential threat to the craft, but this person is saying that this hammer managed to drive in their nail. Their two 2x4's stuck together probably never need get more complex and you're telling them they'll never be able to build a skyscraper like that.

            Ofc we're screwed in a couple more generations of Moore's law, if/when AI is able to one-shot "untangle this big ball of mud for me please".

      • lkey 16 hours ago

        Echoing wallee,

        This task: "I have to constantly be on my toes to follow what the LLMs are proposing"

        and "understanding, then solving the specific problems you are being paid to solve"

        are not the same thing. It's been linked endlessly here but Programming as Theory Building is as relevant today as it was in '85: https://pages.cs.wisc.edu/~remzi/Naur.pdf

    • athrowaway3z 10 hours ago

      Talking about a specific set-up they use isn't the goal of the post, so I don't think it's a cop out.

      "How to harness it" is very clearly the difference between users right now, and I'd say we're currently bottom heavy with poor users stuck in a 'productivity illusion'.

      But there is the question of "what is productivity?"

      I'm finding myself (having AI) writing better structured docs and tests to make sure the AI can do what I ask it to.

      I suspect that turns into compounding interest (or a lack of technical debt).

      For an industry where devs have complained, for decades, about productivity metrics being extremely hard or outright bullshit, I see way too many of the same people now waving around studies regarding productivity.

    • Art9681 20 hours ago

      It shifts your cognitive tasks to other things. Like every tool. The tool itself is an abstraction over tedium. We built it for a reason. You will spend less time thinking about some things, and more time thinking about others.

      In that regard, nothing will change.

    • add-sub-mul-div a day ago

      > I realize the author qualified his or her statement with "know how to harness it," which feels like a cop-out I'm seeing an awful lot in recent explorations of AI's relationship with productivity.

      "You're doing AI wrong" is the new "you're doing agile wrong" which was the new "you're doing XP wrong".

      • pjmlp a day ago

        Unfortunately many of us are old enough to know how those wrongs eventually became the new normal, the wrong way.

      • CuriouslyC a day ago

        At this point I don't even care to argue with people who disagree with me on this. History will decide the winner and if you think you're not on the losing side, best of luck.

        • add-sub-mul-div a day ago

          I've been watching bad technology decisions win since before some of the people on this forum were born.

        • rsynnott 13 hours ago

          I mean, there's no real sharp edge here. If these things ultimately get to the point where they are demonstrably useful, sceptics will presumably adopt them. Given that the ecosystem changes drastically month-to-month, there's little obvious benefit to being an early adopter.

          • CuriouslyC 8 hours ago

            Actually I think Pascal's wager goes the other way. We're in a giant game of musical chairs in the software industry, and the people who will keep getting seats at each round are the overall most skilled and capable of using AI, and this is going to bias more towards AI skill over time.

            The consequence in this model of not being an early AI adopter is that unless you're a rock star performer already, you're going to fall behind the curve and get ejected from the game of software engineering early. The people who stay in the game until the end will be the ones that have vision now.

            If I'm wrong, then all the people who learned AI now will have just wasted a few years on a low value skill, which is honestly the story of the entire history of tech (hello flash devs?), i.e. not an existential threat to engineers.

            • rsynnott 5 hours ago

              > The consequence in this model of not being an early AI adopter is that unless you're a rock star performer already, you're going to fall behind the curve and get ejected from the game of software engineering early.

              This is assuming that AI _currently improves productivity_. There's little empirical evidence for this (there is evidence that it causes people to believe that they themselves are more productive, but that's not very useful; _all sorts_ of snake oil cause people to believe that they themselves are more productive).

              My baseline assumption right now would be that AI does not, in aggregate, improve productivity (at least in software engineering; it may in some other fields); if it ever _does_, then sure, I'll probably start using it?

              • CuriouslyC 4 hours ago

                AI 100% does currently improve productivity when used correctly. You can say that's a no true Scotsman, but you can look at my company GitHub page to see that I'm delivering.

                AI delivers results when you understand its characteristics and build a workflow around it designed to play to its strengths. AI doesn't deliver huge results when you try to shoehorn it into AI-unfriendly workflows. Even if you took the Stanford 95% study at face value (which you shouldn't, there are a lot of methodological issues), there are still 5% of projects that are returning value, and it's not random, it's process differences.

        • jamesnorden a day ago

          I'd say you're a bit biased.

          • CuriouslyC a day ago

            I am biased, but by my personal experience, not the desire to sell anything to anyone. I don't even have products for sale right now, I'm just building free things to try and help people figure out how to ride this wave safely.

            • svieira a day ago

              > I don't even have products for sale right now

              The link in your bio is a long series of tools for various personas with a "schedule a consultation" link next to each one. I'm not sure what "consultation" is if not "a product". But maybe they're all free?

              • CuriouslyC a day ago

                Please find a post where I've shilled anything that I charge money for. I explicitly don't mention that I do consulting here in the spirit of the community. I'm fine with being called out for a strongly pro-AI stance, but I would appreciate not having my patterns of frankness/honesty/community engagement called into question spuriously.

                • svieira 18 hours ago

                  I do not think you are shilling anything at all - merely that you are biased and that bias is somewhat driven by a source of income that you link to your hacker news identity. If you were a Forth hacker with a Forth consultancy, the same thing would apply.

                  That said, I did not intend to call out your bias as a means of questioning your honesty and I apologize if my communication came across as doing so!

                  • aaronbrethorst 14 hours ago

                    That you, in turn, do not immediately disclose your inevitable biases, makes me dubious of your motives. We all have biases. It’s important to be clear eyed and forthright

      • SideburnsOfDoom a day ago

        You mean "It can’t be that stupid, you must be prompting it wrong"

        • slaymaker1907 a day ago

          My favorite is when someone is demoing something that AI can do and they have to feed it some gigantic prompt. At that point, I often ask whether the AI has really made things faster/better or if we've just replaced the old way with an opaque black box.

          • naasking 20 hours ago

            The gigantic prompt has a side benefit though: it's documentation on design and rationale that otherwise would typically not get written.

            • ruszki 13 hours ago

              Until now, it was a code smell if you needed those often. There are exceptions to that, but those are a small minority.

              Also, the design and rationale that humans need is different from what LLMs need. Even what the humans writing code/documentation think is needed differs from what readers need; that's why we have so much bad documentation. There are a ton of Apache projects whose documentation is more of a burden than a help. It is long and absolutely useless.

              • naasking 6 hours ago

                > Until now, it was a code smell if you need those often. There are exceptions to that, but those are a small minority.

                Documentation for a system, particularly a rationale, is never a code smell.

                > There are ton of Apache projects whose documentation is rather a burden than helpful. They are long and absolutely useless.

                LLM prompts are short and to the point by comparison, that's part of my point.

                • ruszki 31 minutes ago

                  > is never a code smell

                  /* This changes the sorting of the original list based on the result, when you use a pipeline <- you have an architectural problem - this happens for example in Splunk */
                  map()

                  /* You need to call these in this exact order, one after another <- your architecture is terrible */
                  processFirst()
                  processSecond()
                  processThird()

                  /* We did this unusual thing because we hate encapsulation <- obvious paraphrase, and lie, you were just lazy, or you didn't have time */
                  class A {
                    public static String x
                  }
                  In unrelated code: A.x = "something";

                  /* We merged these two classes because they looked similar, in the code we have a lot of switches and ifs to differentiate between them, we explained them one-by-one <- do I need to explain this? */
                  class TwoCompletelyUnrelatedThingsInOne

                  > that's part of my point

                  > The gigantic prompt

                  It was clearly not.

            • hackable_sand 14 hours ago

              Null argument. I'd rather have developers who do system design before the system implementation.

              • naasking 6 hours ago

                Developers who don't document typically do system design beforehand, but that design process often just isn't documented/recorded properly; that's my point. A development environment that records prompt history and lets you use an LLM to query it is a goldmine for auditing and for avoiding the pitfalls of Chesterton's fence.

      • bitwize a day ago

        More like the new "you're holding it wrong"

    • tptacek a day ago

      It depends a lot on how you use it, and how much effort you put into getting a knack for using it (which is rough because you're always worried that knack might be out of date within a month or two).

      I use Claude, Codex, and the Gemini CLI (all "supervised command line agents"). I write Go. I am certain that agents improve my productivity in a couple common scenarios:

      1. Unsticking myself from stalling at the start of some big "lift" (like starting a new project, adding a major feature that will pull in some new dependency). The LLM can get things very wrong (this happens to me maybe 20% of the time), but that doesn't matter, because for me the motion of taking something wrong and making it "righter" is much less effort than getting past a blank page, assembling all the resources I need to know how to hello-world a new dependency, that kind of thing. Along with "wrestling for weeks with timing-dependent bugs", this kind of inertia is one of like two principal components of "effort" in programming for me, so clearing it is a very big deal. I'm left at a point where I'm actually jazzed about taking the wheel and delivering the functionality myself.

      2. Large mechanical changes (like moving an interface from one component to another, or major non-architectural refactors, or instrumentation). Things where there's one meta-solution and it's going to be repeated many times across a codebase. Easy to review, a lot of typing, no more kidding myself for 20 minutes that I can make just the right Emacs macro to pull it off.

      3. Bug hunting. To be clear: I am talking here about code I wrote, not the LLM. I run it, it does something stupid. My first instinct now is to drop into Claude or Gemini, paste the logs and an @-reference to a .go file as a starting point, and just say "wtf". My hit rate on it spotting things is very good. If the hit rate was even "meh borderline" that would be a huge win for the amount of effort it takes, but it isn't, it's much better.

      I'm guessing a lot of people do not have these 3 experiences with LLM agents. I'm sorry! But I do think that if you stipulate that I'm not just making this up, it's hard to go from here to "I'm kidding myself about the value this is providing". Note that these are three cases that look nothing at all like vibe-coding.

      • defatigable 20 hours ago

        This matches my experience exactly. #3 is the one I've found most surprising, and it can work outside the context of just analyzing your own code. For example I found a case where an automated system we use started failing due to syntax changes, despite no code changes on our part. I gave Claude the error message and the context that we had made no code changes, and it immediately and correctly identified the root cause as a version bump from an unpinned dependency (whoops) that introduced breaking syntax changes. The version bump had happened four hours prior.

        Could I have found this bug as quickly as Claude? Sure, in retrospect the cause seems quite obvious. But I could just as easily rabbit holed myself looking somewhere else, or taken a while to figure out exactly which dependency caused the issue.

        It's definitely the case that you cannot blindly accept the LLM's output, you have to treat it as a partner and often guide it towards better solutions. But it absolutely can improve your productivity.

      • zaptheimpaler 18 hours ago

        My experience with getting it to write code has not been good so far. Today I had a pretty mechanical task in Go. Basically some but not all functions from a package moved into another package so I asked Gemini to just change instances of pkg.Foo to pkgnew.Foo and import pkgnew in those files. It just got stuck on one file for 2 minutes and at that point I was already halfway through find/replacing it on my own.

        For me it's been somewhat useful to ask questions, but it always fails at modifying any code in a big codebase.

      • theshrike79 13 hours ago

        With LLM assistance I managed to pinpoint an esoteric Unity issue to within 5 lines of code.

        I've had one 3-day basic course in Unity, but I know how to prompt and guide an AI.

      • lupusreal 7 hours ago

        I agree with these points and would add another: writing code that I don't strictly need and would normally avoid writing at all simply because it's easier to take the lazy route and do without it. This morning while on the shitter, I had claude code use dbus to interface with tumbler for generating thumbnails. My application doesn't need thumbnails and normally my reaction would be "Dbus? Eww. I'll just do that later (never)." But five minutes of Claude Code churning got the job done.

      • naasking 20 hours ago

        > 3. Bug hunting.

        Agreed, and also configuration debugging. Treat the LLM as interactive documentation for libraries, frameworks, etc. that you may be using. They're great for solving problems in getting systems up and running, and they explain things better than the documentation because it's specific to your scenario.

    • srcreigh 21 hours ago

      > We do not provide evidence that:

      > AI systems do not currently speed up many or most software developers

      > AI systems in the near future will not speed up developers in our exact setting

      > There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting

    • zahlman 20 hours ago

      > increases perceptive productivity

      increases the perception of productivity?

    • Fire-Dragon-DoL a day ago

      I agree with you but have no metrics yet. It does help as a rubber duck though

    • vel0city a day ago

      Isn't most technology generally about "product motion toward comfort maximizing over all things"? Isn't a can opener "comfort maximizing"? Bicycles comfort-maximize over walking the same distance and leave more time for leisure. We use tractors to till the soil because tilling it by dragging plows by hand or by livestock was far less comfortable and left less time for other pursuits.

      "AI" might be a good thing or a bad thing especially depending on the context and implementation, but just generally saying something is about maximizing comfort as some inherently bad goal seems off to me.

    • breakfastduck a day ago

      It depends what environment you're operating within.

      I've used LLMs for code gen at work as well as for personal stuff.

      At work primarily for quick and dirty internal UIs / tools / CLIs it's been fantastic, but we've not unleashed it on our core codebases. It's worth noting all the stuff we've got out of it are things we'd not normally have the time to work on - so a net positive there.

      Outside of work I've built some bespoke software almost entirely generated with human tweaks here and there - again, super useful software for me and some friends to use for planning and managing music events we put on that I'd never normally have the time to build.

      So in those ways I see it as massively increasing productivity - to build lower stakes things that would normally just never get done due to lack of time.

      • phil21 17 hours ago

        I do wonder about the second order effects of the second bit.

        A lot of open source tooling gets created to solve those random "silly" things that are personal annoyances or needs. Then you find out others have the same or similar problem and entire standard tooling or libraries come into existence.

        I have pontificated on whether easy access to immediate "one offs" will kill this idea exchange. Instead of one tool maintained by hundreds to fulfill a common need, we will end up with millions of one-off LLM generations that are not shared with anyone else.

        Might be a net win, or a net loss. I'm really not sure!

    • dist-epoch a day ago

      > AI is just another product motion toward comfort maximizing over all things, as cognitive engagement is difficult and not always pleasant. In a nutshell, it is another instant gratification product from tech.

      For me it's the exact opposite. When not using AI, while coding you notice various things that could be improved, and you can think about the architecture and what features you want next.

      But AI codes so fast that it's a real struggle keeping up with it. I feel like I need to focus 10 times harder to think about features/architecture fast enough that the AI isn't waiting on me most of the time.

    • naasking 20 hours ago

      > and this probably compounds as AI-generated code piles up in a codebase, as there isn't an author who can attach context as to why decisions were made

      I don't see why. If anything there's more opportunity for documentation because the code was generated from a natural language prompt that documents exactly what was generated, and often why. Recording prompts in source control and linked to the generated code is probably the way to go here.

    • protocolture a day ago

      >Because AI reduces cognitive engagement with tasks

      Just a sideways thing. Cognitive offloading is something humans do with each other quite a lot. People offload onto colleagues, underlings and spouses all the time.

      People engage with AI through the prism of reducing engagement with the thing they don't like, and increasing engagement with the thing they do like.

      It isn't a straight-up productivity boost; it's more like going from being a screenwriter to a director.

  • bryanlarsen a day ago

    Juggling multiple contexts is hard, and often counter-productive. It used to be popular on HN to talk about keeping your "flow", and railing against everything that broke a programmer's flow. These slow AIs constantly break flow.

    • solumunus a day ago

      The new flow is cycling between 5 concurrent agent sessions while watching YouTube.

      • theshrike79 13 hours ago

        And my ADHD brain loves it :D

        Except no YouTube, I just watch my shows off Plex.

      • handfuloflight 20 hours ago

        as they say, "few"

    • hamdingers a day ago

      The ideal for me as a flow-seeker is the quick-edit feature of Void and presumably other editors. It saves me the context switching to Google or docs to find the right syntax and method names for what I want to express, without requiring me to waste time waiting for the LLM to figure out what I already know (the what and the where).

      • bryanlarsen 20 hours ago

        The autocomplete provided by your LSP server to your editor does a much better job of this than LLMs do, in my opinion.

        • theshrike79 13 hours ago

          My LLM-powered autocomplete can guess complete functions with 80-95% correctness.

          No LSP server autocomplete has managed to do that.

    • wahnfrieden 20 hours ago

      Flow sacredness made sense when we could only do our work serially and juggling tasks just meant switching around one thing at a time. Now we can parallelize more of our activities, so it's time to reevaluate the old wisdom.

      • zahlman 20 hours ago

        I disagree. Nothing has changed about how human brains work. Try to read something while listening to something else at the same time, and see how much you absorb, for example. We can still, in large part, only do our work serially.

        • solumunus 3 hours ago

          You still work serially. It’s just while you’re reviewing output or working on the next prompt you have other agent sessions doing work. It’s entirely possible to be in a flow state cycling between concurrent agent sessions constantly reviewing output, building context and honing prompts.

          The context switching may be difficult for some. Those that can manage it and excel at these new skills will undoubtedly become more productive than those who cannot and do not.

        • wahnfrieden 17 hours ago

          I don’t think you understand. I’m not suggesting you try to do two tasks at the same time. The old wisdom isn’t that we shouldn’t do two things at once with our hands/minds directly. It’s that we shouldn’t put another task into WIP while an existing task is still pending completion, if theres work we can do on that already WIP task. (Lean methodology)

          But now we can start the tasks and rest while they are ongoing, and wait until a later point to evaluate their result. So you have choices: do you avoid delegating any tasks because you think you can do them faster? Do you rest and do nothing or get distracted with non-work while you wait? Or do you begin parallel async work and interleave orchestration and evaluation tasks (performed serially as individual pieces)?

          None of this is about breaking human psychology. It’s about how to adapt processes within lean methodologies that became pervasive in our industry.

  • ryanmcgarvey 20 hours ago

    The only reason I want these things to be any smarter is because I need them to do more work over longer periods without screwing up. The only reason I need them to do more work over long periods is because they are too slow to properly pair with.

    If I could have it read more of my project in a single gulp and produce the 10-1000 lines of code I want in a few seconds, I wouldn't need it to go off and write the thousands of lines on its own in the background. But because even trivial changes can take minutes by the time it slurps up the right context and futzes with the linter and types, that ideal pair programmer loop is less attractive.

  • joz1-k a day ago

    From the article: Anthropic has been suffering from pretty terrible reliability problems.

    In the past, factories used to shut down when there was a shortage of coal for steam engines or when the electricity supply failed. In the future, programmers will have factory holidays when their AI-coding language model is down.

    • corentin88 a day ago

      Same as GitHub or Slack downtimes severely impact productivity.

      • thw_9a83c a day ago

        I would argue that dependency on GitHub and Slack is not the same as dependency on AI coding agents. GitHub/Slack are just straightforward tools. You can run them locally or have similar emergency backup tools ready to run locally. But depending on AI agents is like relying on external brains that have knowledge you suddenly don't have if they disappear. Moreover, how many companies could afford to run these models locally? Some of those models aren't even open.

        • danielbln a day ago

          There are plenty of open weight agentic coding models out there. Small ones you can run on a Macbook, big heavy ones you can run on some rented cloud instance. Also, if Anthropic is down, there is still Google, OpenAI, Mistral, Deepseek and so on. This seems like not much of an issue, honestly.

          • thw_9a83c a day ago

            The small ones that you can run on a MacBook are quite useless for programming. Once you have access to a state-of-the-art model, it's difficult to accept any downgrade. That's why I think AI-driven programming will always rely on data centers and the best models.

            > if Anthropic is down, there is still Google, OpenAI, Mistral, Deepseek and so on

            No company is going to pay for subscriptions to all of them. Either way, we'll see a new layer of fragility caused by overdependence on AI. Surely, though, we will adapt by learning from every major event related to this.

            • danielbln a day ago

              > The small ones that you can run on a MacBook are quite useless for programming.

              That really depends on your Macbook :). If you throw enough RAM at it, something like a qwen3-coder will be pretty good. It won't stack up to Claude, or Gemini or GPT, of course, but it's certainly better than nothing and better than useless.

              > No company is going to pay for subscriptions to all of them.

              They don't have to, every lab offers API based pricing. If Anthropic is down, I can hop straight into Codex and use GPT-5 via API, or Gemini via Vertex, or just hop onto AWS Bedrock and continue to use Claude etc.

              I don't think this is an issue in practice, honestly.

          • bongodongobob a day ago

            They exist, but you can't do serious work with them.

        • marcellus23 a day ago

          How exactly can you run GitHub or Slack locally? Their entire purpose is being a place where people can communicate, they need to be centrally available on a network to have any function at all.

          • zahlman 20 hours ago

            > or have similar emergency backup tools ready to run locally

            Developers used to share code through version control before there were websites to serve the "upstream", and they used to communicate without bespoke messenger apps.

            Their former ways of doing so still work just fine.

          • thw_9a83c a day ago

            > How exactly can you run GitHub or Slack locally?

            I meant locally as a company.

            • marcellus23 a day ago

              How does that solve the downtime issue? In my experience company-run instances tend to go down just as often, if not more often.

              • thw_9a83c a day ago

                The experience may differ from company to company, but I was talking about the backup system.

    • bongodongobob a day ago

      This joke is as old as typewriters.

    • catigula a day ago

      >in the future

      >programmers

      Don't Look Up

      • ActionHank a day ago

        There are still people who dictate their emails to a secretary.

        Technology changes, people often don't.

        Programmers will be around for a longer time than anyone realises because most people don't understand how the magic box works let alone the arcane magics that run on it.

        • some_random a day ago

          Sure and there are still people who take a horse and buggy into town, but we are well past Peak Horse.

          • ActionHank a day ago

            A closer analogy would be scribes and the printing press.

            Except it doesn’t fit. For a while now we’ve had access to basically all the knowledge in the world and most people don’t use it.

            Why would people make the effort to build a website using AI when they didn't do so with any of the existing no-code options available?

            They won’t.

            Will dev be exactly the same in 10 years time? No.

            Will there be more devs than there are now? 100%

            Will experienced devs make bank? Yes.

        • onionisafruit 17 hours ago

          Exactly. I know I’ll still be programming in 5 years. What I don’t know is whether anybody will be paying me.

        • catigula a day ago

          "We can remove your programmers valued at millions per month for $100/month secure cloud agents"

          Pretty easy sell.

          • svieira a day ago

            Yes, it is an easy sell. But when that happens this sentence will also be viable - "We can remove the need for your company, valued at multiple millions a month" ... because after all, PMs and CEOs aren't harder to replace than programmers at that point.

          • ActionHank a day ago

            Crazy how well that’s been working

            • catigula a day ago

              The technology isn't there yet.

              All signs point to it rapidly progressing towards this inflection.

              Current agents are insanely good, every decent programmer using them knows this.

              • ActionHank a day ago

                > The technology isn't there yet.

                Yes.

                > All signs point to it rapidly progressing towards this inflection.

                Not really.

                > Current agents are insanely good, every decent programmer using them knows this.

                Also not really.

                If you find them "insanely good" you were likely doing work that was already very repetitive or broadly implemented by others.

                They have their uses, but it isn't as pervasive as people believe.

                • catigula a day ago

                  Tell me you haven't extensively used Claude Code or Codex GPT-5 without telling me.

                  • ActionHank a day ago

                    Used both, they're great for boilerplate. Anything more and they break down.

                    • handfuloflight 20 hours ago

                      What breaks down is developer patience faced with a new paradigm.

                    • catigula a day ago

                      Okay well I and many other experienced engineers disagree so the only remaining conclusion is that you're misusing the technology.

                      • dinkumthinkum 16 hours ago

                        Argumentum ad populum. What kind of experience do those engineers have? That matters. Another possible conclusion is that the parent was talking about use cases that are not simple web apps or marketing pages but real issues in large software.

                        • ActionHank 10 hours ago

                          Yes, exactly this.

                          Most of my day job is not about bashing out nextjs websites. In fact if I did that one day a year it would be a lot.

                          • catigula 6 hours ago

                            You could assign me a task from your job and I could trivially complete it using agents. However, I will charge consulting rates.

                            • ActionHank 4 hours ago

                              This degree of ignorance is why we don't let consultants near the actually important stuff.

                        • catigula 6 hours ago

                          Again you're taking your own circumstance or even patent inability and extending that to the entire technology.

                          You've set yourself up such that all I need to do is go "I'm developing complex veterinary software including integrations with laboratory equipment" and you're completely falsified. Why expose yourself like this instead of being intellectually humble?

                          • ActionHank 4 hours ago

                            You're the one who has extended their limited experience to all of software engineering.

                            The only reason that anyone has engaged with you here is because you're being intellectually arrogant.

  • mark_l_watson 9 hours ago

    I rate the value of LLM-based AI on how it improves me personally. Positive experiences include being able to more easily read scientific papers, with an AI filling in details for embedded math, and coming out of a vibe coding session feeling like I understand the problem better because I was engaged with the process and understand the resulting code.

    Negative experiences include times when I am lazy, turn off my brain, and just accept LLM output. I am also a little skeptical about automating email handling, etc.; it is cool technology, but how useful is it, really? I can imagine insiders talking among themselves saying "feel the bubble!" but telling reporters "feel the AGI" or "oh no, our AI tech is so strong it will take over the world" - excellent strategies for pumping stock prices and valuations.

  • infecto a day ago

    Cursor imo is still one of the only real players in the space. I don't like the Claude Code style of coding; I feel too disconnected. Cursor is the right balance for me, and it is generally pretty darn quick, and I only expect it to get quicker. I hope more players pop up in this space.

    • sealeck a day ago

      Have you tried https://zed.dev ?

      • infecto 6 hours ago

        Yea, and for some reason it was not my cup of tea. I think partly because their paid version feels like an afterthought.

      • dmix a day ago

        How is the pricing? I see it says "500 prompts a month" and only Claude. Cursor is built around token usage and distributes it across multiple models when you hit limits on one, which turns out to be pretty economical.

        • ethmarks a day ago

          Zed supports BYOK so you can connect it to GitHub Copilot or Anthropic or OpenRouter or whatever. The 500-per-month limit is only for Zed-hosted prompts. I personally switch between Zed-hosted, GH Copilot, Gemini API, and Ollama. Zed's AI integration isn't quite as "it just works" as Cursor's, but it's still very good and it gives you much more freedom.

        • 85392_school a day ago

          You don't have to use the built in subscription. They have support for curated APIs (OpenAI, Anthropic, GitHub Copilot, OpenRouter, etc), any OpenAI-compatible API, and agents like Gemini CLI and Claude Code in the AI panel.

    • solumunus a day ago

      Wild to me. When I switched from Cursor to Claude it only took me a day to realise that as things stand I would never use Cursor again.

      • infecto 7 hours ago

        We will all have different experiences and workflows, but I am not sure why it's wild. For myself, I find tools like Claude Code or Codex have a place, but it's not me using the tool interactively. They are both too slow in the feedback loop and so verbose that it's hard, at least for me, to establish a good cadence for writing code.

      • vinnymac a day ago

        I am also surprised by this. Especially because we can just run Claude Code from anywhere. Cursor, VS Code, Zed, emacs, vim, JetBrains, etc.

        Cursor CLI, Codex and Gemini work too, but lag slightly behind in a variety of different ways that matter.

        And if you think you’re getting better visual feedback through Cursor it’s likely you’re just not using Claude with your IDE correctly.

        • wilg 19 hours ago

          I mean the problem with Claude Code is you have to use Claude

          • solumunus 3 hours ago

            The model most people have been recommending you use in Cursor for the last year or so… Which model do you find significantly better?

    • CuriouslyC a day ago

      You're going to have to get used to feeling disconnected if you want to stay in the game; that's the direction this is heading (and fast). You need to move up the ladder.

      Also, Cursor is both overpriced and very mediocre as an agent. Codex is the way to go.

      • gngoo 14 hours ago

        That is just a personal opinion, not a fact. Either option can be faster or more productive if it suits your personal coding style. I work with both, and I favor one. But money is not exactly an issue.

      • infecto 7 hours ago

        I get that you have a financial incentive to say that, but at least back it up. I do believe AI tooling is here now and a worthwhile endeavor, but in my view we have not settled on best practices yet and it comes down to individual preference.

        Tools are for us to figure out what works and what does not. Saying "be prepared to be disconnected" sounds like slop from someone getting forced into someone else's idea.

        If someone has a great workflow using a tool like Codex, that's great, but it does not mean it has to work for me. I love using Codex for code reviews, testing, and other changes that are independent of each other, like bugs. I don't like using it for feature work; I have spent years building software and I am not going to twiddle my thumbs waiting for Codex on something I am building in real time. Now, I think there is an argument that if you have the perfect blueprint of what to build you could leverage a tool like Codex, but I am often not in that position.

        • CuriouslyC 6 hours ago

          AI coding tools right now are rudimentary, and when used properly they can already massively increase velocity and enable new capabilities. This isn't random boosterism; it's based on pushing myself towards 100% AI-generated code over the last year, and working to improve my throughput and reduce the error rate of my generated code. The AI coding tools industry is being led by a 23-year-old with no software engineering or AI experience; that should tell you something about the hype-versus-rigor tradeoff being made.

          Once we collectively start actually engineering AI coding systems rather than trying to surf on vibes, their power will become much more apparent to the naysayers who haven't invested the time in the processes and tools.

          As for backing it up: if you want to hop on my company GitHub you can check out the spark graphs for my projects, and feel free to poke around the code I've spent time tightening to see that it's not slop (have fun with the async SIMD Rust in valknut/scribe), and keep in mind I have large private projects (>200k) that are active as well. I've probably delivered 400k LoC in the last month with >60% coverage on large codebases, 99.99% AI generated.

  • mmmllm a day ago

    Speed is not a problem for me. I feel it's at the right speed now, where I am able to see what it is doing in real time and check it's on the right track.

    Honestly if it were any faster I would want a feature to slow it down, as I often intervene if it's going in the wrong direction.

  • stuaxo 12 hours ago

    This completely ignores the energy cost of "infinite" token usage.

    These things are somewhat useful, but they don't come for free, and I'm talking about externalities here, which are not priced in.

  • dmix a day ago

    I looked up the graph they are using

    https://openrouter.ai/rankings

    It says "Grok Code Fast 1" is ranked first in token usage? That's surprising. Maybe it's just OpenRouter bias or how the LLM is used.

    I would have assumed Claude would be #1

    • Bolwin a day ago

      xAI has been offering it for free. Look at the #1 user: it's Kilo Code, and they've been giving Grok Code away for free for weeks.

      • theshrike79 13 hours ago

        Cline too. It was free for a week(end?) and they extended it for a longer time.

        It's pretty good tbh. Some quirks but it's efficient at making changes and seems to understand code pretty well.

    • Dlanv a day ago

      No one I know who uses Claude uses it through OpenRouter.

    • CuriouslyC a day ago

      Grok Code Fast is a legit good model. https://www.youtube.com/watch?v=Y-SyfYXupTQ

      • dmix a day ago

        Grok 4 was really good in my experience, but it was really slow. Might try the fast version if my Claude runs out of tokens. I stick to Claude because I know the model's output patterns and flaws (which are more predictable than GPT-5's).

        • theshrike79 13 hours ago

          Just yesterday I had Cline+Grok Code Fast fix an issue caused by Claude ... who ran out of credits mid-fix.

          (LLMs seem to think Go embeds can use ../../-style relative paths; they cannot. And when they notice it doesn't work like that, they use the weirdest shit to try to fix it.)
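
          For anyone who hasn't hit this, a minimal sketch of the rule (the package name and directory layout here are made up): a //go:embed pattern can only match files at or below the package's own directory, and a pattern containing ".." is rejected at compile time.

              package assets

              import "embed"

              // Fine: the pattern stays inside this package's directory tree
              // (assumes a templates/ directory with .html files next to this file).
              //go:embed templates/*.html
              var Templates embed.FS

              // Not allowed: embed patterns may not contain ".." (or "."), so reaching
              // up the tree like this is a compile-time error:
              //
              //   //go:embed ../../shared/templates/*.html
              //   var Shared embed.FS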

  • cadamsdotcom a day ago

    There’s a quality piece too - I don’t mind dialup speeds of tokens per second if the quality is high enough to avoid rework.

    If you want better speeds, your coding agent might perform better outside US office hours :)

  • arisAlexis a day ago

    Cerebras with Qwen, and Mistral on Cerebras, already feel like magic.

  • oceanparkway 10 hours ago

    This article mentions Cerebras. I tried it out and was 1) disappointed, and 2) they started a subscription but gave me no way of cancelling; their billing page is broken :/

  • MangoToupe 13 hours ago

    I'd take quality improvements over speed any day. AI gets you 80% of the way there, but you often spend more time fixing the last 20% than you would have approaching the problem yourself.

  • howmayiannoyyou a day ago

    I expected to see OpenAI, Google, Anthropic, etc. provide desktop applications with integrated local utility models and sandboxed MCP functionality to reduce unnecessary token and task flow, and I still expect this to occur at some point.

    The biggest long-term risk to the AI giants' profitability will be increasingly capable desktop GPUs and CPUs combined with the improving performance of local models.

    • mordymoop a day ago

      From experience, it seems like offloading context-scoping and routing decisions to smaller models just results in those models making bad judgements at very high speed.

      Whenever I experiment with agent frameworks that spawn subagents with scoped subtasks and restricted context, things go off the rails very quickly. A subagent with reduced context makes poorer choices, hallucinates assumptions about the greater codebase, and very often lacks a basic sense of the point of the work. This lack of situational awareness is where you are most likely to encounter JS scripts suddenly appearing in your Python repo. (A toy sketch of the pattern is below.)

      I don’t know if there is a “fix” for this or if I even want one. Perhaps the solution, in the limit, actually will be to just make the big-smart models faster and faster, so they can chew on the biggest and most comprehensive context possible, and use those exclusively.

      eta: The big models have gotten better and better at longer-running tasks because they are less likely to make a stupid mistake that derails the work at any given moment. More nines of reliability, etc. By introducing dumber models into this workflow, and restricting the context that you feed to the big models, you are pushing things back in the wrong direction.
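
      For what it's worth, the toy sketch of the dispatch pattern being described (the task and context strings are entirely made up), just to show where the context loss comes from: each subagent only ever sees the slice it was handed.

          package main

          import "fmt"

          // Task is what the parent agent hands to a subagent: a narrow goal plus
          // only the slice of project context the parent decided was relevant.
          type Task struct {
              Goal    string
              Context []string
          }

          // runSubagent stands in for a call to a smaller, cheaper model. It can only
          // reason about t.Context; anything outside that slice doesn't exist for it,
          // which is exactly where the hallucinated assumptions creep in.
          func runSubagent(t Task) string {
              return fmt.Sprintf("did %q knowing only %v", t.Goal, t.Context)
          }

          func main() {
              fullContext := []string{"repo layout", "build tooling", "house style", "open bug report"}

              // Scoped dispatch: each subtask gets a restricted view of the whole.
              tasks := []Task{
                  {Goal: "refactor the parser", Context: fullContext[:1]},
                  {Goal: "fix the reported bug", Context: fullContext[3:]},
              }
              for _, t := range tasks {
                  fmt.Println(runSubagent(t))
              }
          }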

    • casey2 10 hours ago

      Yup. I expected a Google LLM to coordinate with many local expert LLMs that know the local tools, and with other domain-expert LLMs in the cloud.

      I guess they don't see a viable path forward without specialty hardware.

  • brador 11 hours ago

    Imagine when we measure in Mtoks/s. That, with guardrails off, will be insane.

  • everyone a day ago

    I've used ChatGPT to help me learn new stuff about which I know nothing (this is its best use imo) and also to write boilerplatey functions, e.g. write a function that does precisely X.

    Having it integrated into my IDE sounds like a nightmare though. Even the "IntelliSense" stuff in Visual Studio is annoying af and I have to turn it off to stop it auto-wrecking my code (e.g. adding tonnes of pointless using statements). I don't know how the integrated LLM would actually work, but I defo don't want that.

    • theshrike79 13 hours ago

      > "An LLM agent runs tools in a loop to achieve a goal."

      This is the magic sauce compared to copy-pasting snippets to a web browser.

      It can automatically read code, search through it, and modify multiple files. Then it can compile the code, run the tests, run the actual application to check it works, etc.
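
      To make "tools in a loop" concrete, here's a rough sketch in Go; callModel is a made-up stand-in for whatever model API the agent actually talks to, not a real client library.

          package main

          import (
              "fmt"
              "os/exec"
          )

          // callModel is a hypothetical stand-in for the real LLM call: given the
          // transcript so far, it returns either a command to run next or done=true.
          func callModel(transcript []string) (cmd []string, done bool) {
              // a real agent would send the transcript to Claude/GPT/etc. here
              return nil, true
          }

          func main() {
              transcript := []string{"goal: make `go test ./...` pass"}

              // The loop: ask the model what to do, run the tool it picked,
              // feed the output back in, and repeat until it says it's finished.
              for step := 0; step < 25; step++ { // hard cap so a confused model can't spin forever
                  cmd, done := callModel(transcript)
                  if done || len(cmd) == 0 {
                      break
                  }
                  out, err := exec.Command(cmd[0], cmd[1:]...).CombinedOutput()
                  transcript = append(transcript, fmt.Sprintf("$ %v\n%s(err: %v)", cmd, out, err))
              }
          }

      The only "magic" is that the model both picks the next command and decides when to stop; everything around it is an ordinary loop.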

    • arealaccount 21 hours ago

      Eventually you start to anticipate what it will output, and you can get ahead of it, tabbing through tons of code that you were intending to write.

      But when it doesn't output what you want, you spend mental energy and extra time reorienting to get back on track.

      About 50% of the time it works every time.

    • vrighter a day ago

      Writing the mindless boilerplate is when I'm thinking about the next step. If I didn't write it, I'd still have to take the time to think things through for the next step.
