Token growth indicates future AI spend per dev

(blog.kilocode.ai)

156 points | by twapi 4 hours ago

125 comments

  • g42gregory 3 hours ago

    This is why it’s so critical to have open source models.

    In a year or so, the open source models will become good enough (in both quality and speed) to run locally.

    Arguably, OpenAI OSS 120B is already good enough, in both quality and speed, to run on Mac Studio.

    Then $10k, amortized over 3 years, will be enough to run code LLMs 24/7.

    I hope that’s the future.

    • habosa an hour ago

      Every business building on LLMs should also have a contingency plan for if they needed to go to an all open-weights model strategy. OpenAI / Anthropic / Google have nothing stopping them from 100x-ing the price or limiting access or dropping old models or outright competing with their customers. Building your whole business on top of them will prove to be as foolish as all of the media companies that built on top of Facebook and got crushed later.

      • OfficialTurkey an hour ago

        Couldn't you also make this argument about cloud infrastructure from the standard hyperscaler cloud providers (AWS, GCP, ...)? For that matter, couldn't you make this argument about any dependency your business purchases from other businesses that are competing against each other to provide it?

        • empiko an hour ago

          In general, you are right, but AI as a field is pretty volatile still. Token producers are still pivoting and are generally losing money. They will have to change their strategy sooner or later, and there is a good chance that the users will not be happy about it.

      • ivape 25 minutes ago

        > OpenAI / Anthropic / Google have nothing stopping them from 100x-ing the price

        There is also nothing stopping this silly world from breaking out into a dispute where chips are embargoed. Then we'll have high API prices and hardware prices (if there's any hardware at all). Even for the individual it's worth having that 2-3k AI machine around, perhaps two.

    • skybrian 3 hours ago

      Open source models could be run by low-cost cloud providers, too. They could offer discounts for a long term contract and run it on dedicated hardware.

      • qingcharles 3 hours ago

        This. Your local LLM, even if shared between a pool of devs, is probably only going to be working 8 hours a day. Better to use a cloud provider, especially if you can find a way to ensure data security, should that be an issue for you.

      • wongarsu 3 hours ago

        Exactly. There is no shortage of providers hosting open source models with per-token pricing, with a variety of speeds and context sizes at different price points. Competition is strong and barriers to entry are low, ensuring that margins stay low and prices fair.

        If you want complete control over your data and don't trust anyone's assurances that they keep it private (and why should you?), then you have to self-host. But if all you care about is a good price, then the free market already provides that for open models.

      • hkt an hour ago

        Hetzner and Scaleway already do instances with GPUs so this kinda already exists

        • hkt an hour ago

          In fact, does anybody want to hire a server with me? I suspect it'll work out cheaper than Claude max etc: a server from hetzner starts at £220ish: https://www.hetzner.com/dedicated-rootserver/matrix-gpu/

          It might be fun to work out how to share, too. A whole new breed of shell hosting.

    • 6thbit 2 hours ago

      Many of the larger enterprises (retail, manufacture, insurance, etc.) are consistently becoming cloud-only or have reduced their data center footprint massively over the last 10 years.

      Do you think these enterprises will begin hosting their own models? I'm not convinced they'll join the capex race to build AI data centers. It would make more sense they just end up consuming existing services.

      Then there are the smaller startups that just never had their own data center. Are those going to start self-hosting AI models? And all of the related requirements to allow, say, a few hundred employees to access a local service at once: networking, HA, upgrades, etc. Say you also have multiple offices in different countries, and so on.

      • nunez 2 hours ago

        > Do you think these enterprises will begin hosting their own models? I'm not convinced they'll join the capex race to build AI data centers. It would make more sense they just end up consuming existing services.

        they already are

      • physicsguy 2 hours ago

        > manufacture

        They're much less strict than they used to be about cloud, but the security practices are still really quite strict. I work in this sector and yes, they'll allow cloud, but strong data isolation + segregation, access controls, networking reqs, etc. etc. etc. are very much a thing in the industry still, particularly where the production process is commercially sensitive in itself.

      • g42gregory 2 hours ago

        Enterprises (depending on the sector, think semi manufacturing) will have no choice for two reasons:

        1. Protecting their intellectual property, and

        2. Unknown “safety” constraints baked in. Imagine an engineer unable to run some security tests because the LLM thinks it’s “unsafe”. Meanwhile, the VP of Sales is on the line with the customer.

    • hoppp 3 hours ago

      I am looking forward to the AMD 395 Max+ PCs coming down in price.

      Local inference speed will be acceptable in 5-10 years thanks to that generation of chips, and finally we can have good local AI apps.

    • asadm 3 hours ago

      Even if they do get better, the latest closed-source {gemini|anthropic|openai} model will always be insanely good, and it would be dumb to use a local one from 3 years back.

      Also, tooling: you can use aider, which is OK. But Claude Code and Gemini CLI will always be superior and will only work correctly with their respective models.

      • asgraham 3 hours ago

        I don’t know about your first point: at some point the three-year difference may not be worth the premium, as local models reach “good enough.”

        But the second point seems even less likely to be true: why will Claude code and Gemini cli always be superior? Other than advantageous token prices (which the people willing to pay the aforementioned premium shouldn’t even care about), what do they inherently have over third-party tooling?

        • nickstinemates 2 hours ago

          Even using Claude Code vs. something like Crush yields drastically different results. Same model, same prompt, same cost... the agent is a huge differentiator, which surprised me.

          • asgraham 2 hours ago

            I totally agree that the agent is essential, and that right now Claude Code is semi-unanimously the best agent. But agentic tooling is written, not trained (as far as I can tell—someone correct me) so it’s not immediately obvious to me that a third-party couldn’t eventually do it better.

            Maybe to answer my own question, LLM developers have one, potentially two advantages over third-party tooling developers: 1) virtually unlimited tokens, zero rate limiting with which to play around with tooling dev. 2) the opportunity to train the network on their own tooling.

            The first advantage is theoretically mitigated by insane VC funding, but will probably always be a problem for OSS.

            I’m probably overlooking news that the second advantage is where Anthropic is winning right now; I don’t have intuition for where this advantage will change with time.

      • SparkyMcUnicorn 3 hours ago

        I use Claude Code with other models sometimes.

        For well defined tasks that Claude creates, I'll pass off execution to a locally run model (running in another Claude Code instance) and it works just fine. Not for every task, but more than you might think.

    • okdood64 3 hours ago

      What's performance of running OpenAI OSS 120B on a Mac Studio as compared to running a paid subscription frontier LLM?

      • jermaustin1 3 hours ago

        I will answer for the 20B version on my RTX3090 for anyone who is interested (SUPER happy with the quality it outputs, as well). I've had it write a handful of HTML/CSS/JS SPAs already.

        With medium and high reasoning, I will see between 60 and 120 tokens per second, which is outrageous compared to the LLaMa models I was running before (20-40tps - I'm sure I could have adjusted parameters somewhere in there).

        • ivape 3 hours ago

          Do we know why it’s so fast barring hardware?

          • mattmanser 3 hours ago

            Because he's getting crap output. Open source running locally on something that underpowered is vastly worse than paid LLMs.

            I'm no shill, I'm fairly skeptical about AI, but been doing a lot of research and playing to see what I'm missing.

            I haven't bothered running anything locally as the overwhelming consensus is that it's just not good enough yet. And that's from posts and videos in the last two weeks.

            I've not seen something so positive about local LLMs anywhere else.

            It's simply not there yet, and definitely isn't for a 4090.

            • jermaustin1 an hour ago

              That is a bit harsh. I'm actually quite pleased with the code it is outputting currently.

              I'm not saying it is anywhere close to a paid foundation model, but the code it is outputting (albeit simple) has been generally well written and works. I do only get a handful of those high-thought responses before the 50k token window starts to delete stuff, though.

            • ivape 2 hours ago

              I guess I meant how is a 20b param model simply faster than another 20b model? What techniques are they using?

              • medvezhenok 2 hours ago

                It's a MoE (mixture of experts) architecture, which means that there's only 3.6 billion parameters activated per token (but a total of 20b parameters for the model). So it should run at the same speed that a 3.6b model would run assuming that all of the parameters fit in vRAM.

                Generally, 20b MoE will run faster but be less smart than a 20b dense model. In terms of "intelligence" the rule of thumb is the geometric mean between the number of active parameters and the number of total parameters.

                So a 20b model with 3.6b active (like the small gpt-oss) should be roughly comparable in terms of output quality to a sqrt(3.6*20) = 8.5b parameter model, but run with the speed of a 3.6b model.
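
                A rough sketch of that rule of thumb, using the gpt-oss numbers above (the geometric-mean heuristic is a community rule of thumb, not an exact law):

                  import math

                  total_params = 20e9    # total parameters (gpt-oss-20b)
                  active_params = 3.6e9  # parameters activated per token by the MoE router

                  # rule-of-thumb "effective" dense size: geometric mean of active and total
                  effective = math.sqrt(active_params * total_params)
                  print(f"~{effective / 1e9:.1f}B dense-equivalent")  # ~8.5B

                  # decode speed, by contrast, scales roughly with the 3.6B active parameters,
                  # assuming all weights fit in vRAM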

      • andrewmcwatters 3 hours ago

        Chiming in here, M1 Max MacBook Pro 64GB using gpt-oss:20b over ollama with Visual Studio Code with GitHub Copilot is unusably slow compared to using Claude Sonnet 4, which requires (I think?) GitHub Copilot Pro.

        But I'm happy to pay the subscription vs buying a Mac Studio for now.

        • Jimpulse 2 hours ago

          Ollama's implementation for gpt-oss is poor.

    • root_axis 3 hours ago

      > In a year or so, the open source models will become good enough (in both quality and speed) to run locally.

      "Good enough" for what is the question. You can already run them locally, the problem is that they aren't really practical for the use-cases we see with SOTA models, which are just now becoming passable as semi-reliable autonomous agents. There is no hope of running anything like today's SOTA models locally in the next decade.

      • cyanydeez 3 hours ago

        they might be passable, but there's zero chance they're economical atm.

    • coldtea 2 hours ago

      >In a year or so, the open source models will become good enough (in both quality and speed) to run locally.

      Based on what?

      And where? On systems < 48GB?

    • moritzwarhier 2 hours ago

      After trying gpt-oss:20b, I'm starting to lose faith in this argument, but I share your hope.

      Also, I've never tried really huge local models and especially not RAG with local models.

    • jvanderbot 2 hours ago

      It's not hard to imagine a future where I license their network for inference on my own machine, and they can focus on training.

    • holoduke 3 hours ago

      The problem is that running an LLM locally really eats all your resources. I tried it, but the whole system becomes unresponsive and slow. We need a minimum of 1TB of memory and dedicated processors to offload to.

    • cyanydeez 3 hours ago

      It's not; capitalism isn't about efficiency, it's about lock-in. You can't lock in open source models. If fascism under Republicans continues, you can bet they'll be shut down due to child safety or whatever excuse the large corporations need to turn off the free efficiency.

    • aydyn 3 hours ago

      This is unrealistic hopium, and deep down you probably know it.

      There's no such thing as models that are "good enough". There are models that are better and models that are worse and OS models will always be worse. Businesses that use better, more expensive models will be more successful.

      • ch4s3 3 hours ago

        > Businesses that use better, more expensive models will be more successful.

        Better back-of-house tech can differentiate you, but startup history is littered with failed companies using the best tech, and they were often beaten by companies using a worse-is-better approach. Anyone here who has been around long enough has seen this play out a number of times.

        • freedomben an hour ago

          > startup history is littered with failed companies using the best tech, and they were often beaten by companies using a worse-is-better approach.

          Indeed. In my idealistic youth I bought heavily into "if you build it, they will come," but that turned out to not be reality at all. Oftentimes the best product loses because of marketing, network effects, or some other reason that has nothing to do with the tech. I wish it weren't that way, but if wishes were fishes we'd all have a fry.

      • seabrookmx 3 hours ago

        Most tech hits a point of diminishing returns.

        I don't think we're there yet, but it's reasonable to expect at _some point_ your typical OS model could be 98% of the way to a cutting edge commercial model, and at that point your last sentence probably doesn't hold true.

      • Fade_Dance 3 hours ago

        There is a sweet spot, and at $100k per dev per year some businesses may choose lower-priced options.

        The business itself will also massively develop in the coming years. For example, there will be dozens of providers for integrating open source models with an in-house AI framework that smoothly works with their stack and deployment solution.

      • hsuduebc2 3 hours ago

        I agree. It isn't in the interest of any actor including openai to give out their tools for free.

    • mockingloris 3 hours ago

      Most devs where I'm from would have to scrape to cough up that amount.

      More niche, use-case-specific models have to be developed for cheaper and energy-optimized hardware.

      └── Dey well

      • skybrian 3 hours ago

        This would be a business expense. Compared to hiring a developer for a year, it would be more reasonable.

        For a short-term gig, though, I don’t think they would do that.

  • crestfallen33 3 hours ago

    I'm not sure where the author gets the $100k number, but I agree that Cursor and Claude Code have obfuscated the true cost of intelligence. Tools like Cline and its forks (Roo Code, Kilo Code) have shown what unmitigated inference can actually deliver.

    The irony is that Kilo itself is playing the same game they're criticizing. They're burning cash on free credits (with expiry dates) and paid marketing to grab market share -- essentially subsidizing inference just like Cursor, just with VC money instead of subscription revenue.

    The author is right that the "$20 → $200" subscription model is broken. But Kilo's approach of giving away $100+ in credits isn't sustainable either. Eventually, everyone has to face the same reality: frontier model inference is expensive, and someone has to pay for it.

    • patothon 43 minutes ago

      That's a good point. However, maybe the difference is that Kilo is not creating a situation for themselves where they either have to reprice or have to throttle.

      I believe it's pretty clear when you use these credits that it's temporary (and that it's a marketing strategy), vs claude/cursor where they have to fit their costs into the subscription price and make things opaque to you

    • fragmede 3 hours ago

      Also frontier model training is expensive, and at some point, eventually, that bill also needs to get paid, by amortizing over inference pricing.

    • fercircularbuf an hour ago

      It sounds like Uber

    • cyanydeez 3 hours ago

      oh go one more step: the reality is these models are more expensive than hiring an intern to do the same thing.

      Unless you've got a trove of self-starters with a lot of money, they aren't cost-efficient.

  • jeanlucas 3 hours ago

    So convenient that a future AI dev will cost as much as a human developer. Pure coincidence.

    • magicalhippo 3 hours ago

      Similar to housing in attractive places no? Price is related to what people can afford, rather than what the actual house/unit is worth in terms of material and labor.

      • maratc 3 hours ago

        Except that on top of "material and labor" there is the additional cost of land.

        That is already "related to what people can afford", in attractive places or not.

    • thisisit 3 hours ago

      This is just a ballpark number. It's like the AI dev will cost somewhat less than a human developer. Enough for AI providers to have huge margins and for CTOs to say - "I replaced all devs and saved so much money".

      • mattmanser 2 hours ago

        And then the CTOs will learn the truth that most product managers are just glorified admin assistants who couldn't write a spec for tic-tac-toe.

        And that to write the business analysis that the AI can actually turn into working code requires senior developers.

    • jgalt212 3 hours ago

      It's sort of like how high cost funds net of fees offer the same returns as low cost ETFs net of fees.

      • oblio 2 hours ago

        I'm not sure I understand this one.

    • insane_dreamer 2 hours ago

      The full cost of an employee is a fair bit more than just their base salary.

      • SoftTalker 2 hours ago

        Wait until the taxes on AI come, to pay for all the unemployment they are creating.

    • eli_gottlieb 3 hours ago

      I mean, hey, rather than use AI at work, I'll just take the extra $100k/year and be just that good.

    • naiv 3 hours ago

      But it works 24/7, at maybe 20x the output.

      • nicce 3 hours ago

        Why can't we keep the current jobs but accelerate humanity's development by more than 20x with AI? Everyone is just talking about replacement, without mentioning the potential.

        • dsign 2 hours ago

          There is great potential. But if humanity can't share a loaf of bread with the needy, nor stop the blood irrigation of the cracked, dusty soil of cursed Canaan[^1], what are the odds that that acceleration will benefit anybody?

          ([^1]: They have been at it for a long while now, a few thousand years?)

        • hx8 3 hours ago

          I don't think there is market demand for 20x more software produced each year. I suspect AI will actively _decrease_ demand for several major sectors of software development, as LLMs take over roles that were previously handled by independent applications.

          • nicce 2 hours ago

            I think it depends on how you view it. With 20x productivity you can start to minimize your supply chain and reduce costs in the long term. No more cloud usage in foreign countries, since you might be able to build the necessary software yourself. You can start dropping expensive SaaS and build enough for your own internal needs. Heck, I would expect demand to just increase because there is so much potential. Consultants and third-party software houses will likely decrease, unless they are even more efficient.

            LLMs act as interfaces to applications which you are now capable of building yourself and running on your own hardware, since you are much more capable.

            • LtWorf 7 minutes ago

              > and third-party software houses will likely decrease, unless they are even more efficient.

              It's going to be really fun for us people who love to write unicode symbols into numeric input boxes and such funny things.

          • taftster 3 hours ago

            Right. This is insightful. It's not so much about replacing developers, per se. It's about replacing applications that developers were previously employed to create/maintain.

            We talk about AI replacing a workforce, but your observation that it's more about replacing applications is spot on. That's definitely going to be the trend, especially for traditional back-office processing.

            • hx8 3 hours ago

              I'm specifically commenting on the double whammy of increased software developer productivity and decreased demand for independent applications.

        • buzzerbetrayed 3 hours ago

          I'm not entirely sure I understand exactly what you're suggesting. But I'd imagine it's because a company that doesn't have to pay people will outcompete the company that does.

          There could be some scenario where it is advantageous to have humans working with AI. But if that isn't how reality plays out then companies won't be able to afford to pay people.

      • SpaceNoodled 2 hours ago

        An LLM by itself has 0% output.

        An engineer shackled to an LLM has about 80% output.

      • croes 3 hours ago

        And is neither reliable nor liable.

      • crinkly 3 hours ago

      Like fuck that's happening. A human dev will spend the entire day gaslighting an electronic moron rather than an outsourced team.

        The only argument we have so far is wild extrapolation and faith. The burden of proof is on the proclaimer.

  • IshKebab 3 hours ago

    > Both effects together will push costs at the top level to $100k a year. Spending that magnitude of money on software is not without precedent, chip design licenses from Cadence or Synopsys are already $250k a year.

    For how many developers? Chip design companies aren't paying Synopsys $250k/year per developer. Even when using formal tools which are ludicrously expensive, developers can share licenses.

    In any case, the reason chip design companies pay EDA vendors these enormous sums is because there isn't really an alternative. Verilator exists, but ... there's a reason commercial EDA vendors can basically ignore it.

    That isn't true for AI. Why on earth would you pay more than a full-time developer's salary on AI tokens when you could just hire another person instead? I definitely think AI improves productivity, but it's like 10-20% maybe, not 100%.

    • cornstalks 3 hours ago

      > For how many developers? Chip design companies aren't paying Synopsys $250k/year per developer. Even when using formal tools which are ludicrously expensive, developers can share licenses.

      That actually probably is per developer. You might be able to reassign a seat to another developer, but that's still arguably one seat per user.

      • IshKebab 2 hours ago

        I don't think so. The company I worked for until recently had around 200 licenses for our main simulator - at that rate it would cost $50m/year, but our total run rate (including all salaries and EDA licenses) was only about $15m/year.

        They're super opaque about pricing but I don't think it's that expensive. Apparently formal tools are way more expensive than simulation though (which makes sense), so we only had a handful of those licenses.

        I managed to find a real price that someone posted:

        https://www.reddit.com/r/FPGA/comments/c8z1x9/modelsim_and_q...

        > Questa Prime licenses for ~$30000 USD.

        That sounds way more realistic, and I guess you get decent volume discounts if you want 200 licenses.

  • jjcm 3 hours ago

    At some point the value of remote inference becomes more expensive than just buying the hardware locally, even for server-grade components. A GB200 is ~$60-70k and will run for multiple years. If inference costs continue to scale, at some point it just makes more sense to run even the largest models locally.

    OSS models are only ~1 year behind SOTA proprietary, and we're already approaching a point where models are "good enough" for most usage. Where we're seeing advancements is more in tool calling, agentic frameworks, and thinking loops, all of which are independent of the base model. It's very likely that local, continuous thinking on an OSS model is the future.

    • tempest_ 3 hours ago

      Maybe $60-70k nominally, but where can you get one that isn't in its entire rack configuration?

      • jjcm an hour ago

        Fair, but even if you budget an additional $30k for a self-contained small-unit order, you've brought yourself to the equivalent proposed spend of 1 year of inference.

        My point is that at $100k/yr/eng inference spend, your options widen greatly.
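
        Rough back-of-envelope for that comparison (the $30k overhead and $100k/yr figures are the assumptions above, not quotes):

          gpu_cost = 70_000          # high end of the GB200 range quoted upthread
          rack_overhead = 30_000     # assumed budget for a self-contained small-unit order
          inference_spend = 100_000  # proposed $/yr/engineer on hosted inference

          payback_years = (gpu_cost + rack_overhead) / inference_spend
          print(f"hardware pays for itself in ~{payback_years:.1f} year(s)")  # ~1.0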

  • boltzmann_ 3 hours ago

    The author just chose a nice number and gave no argument for it.

    • mromanuk 3 hours ago

      Probably chose $100k/yr as an example of the salary of a developer.

  • paulhodge 27 minutes ago

    FYI, Kilocode has low credibility. They’ve been blasting AI subreddits with lots of clickbaity ads and posts, sometimes claiming things that are outright false.

    As far as spend per dev- I can’t even manage to use up the limits on my $100 Claude plan. It gets everything done and I run out of things to ask it. Considering that the models will get better and cheaper over time, I’m personally not seeing a future where I will need to spend that much more than $100 a month.

  • ankit219 an hour ago

    No justification is given for the $100k number. At $100k a year, or about $8k a month, you would end up using 1B tokens a month, per person (and that's at a generous blended $8 per million input/output tokens including caching; the real number is lower than that).
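
    As a rough sanity check of that arithmetic (the $8 per million blended rate is the assumption above, not a published price):

      annual_spend = 100_000  # $/year/dev, the figure under discussion
      blended_rate = 8.0      # assumed blended $ per million input/output tokens

      monthly_spend = annual_spend / 12                      # ~$8,333/month
      tokens_per_month = monthly_spend / blended_rate * 1e6
      print(f"~{tokens_per_month / 1e9:.1f}B tokens/month")  # ~1.0B, per person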

    I think there is a case that Claude did not reduce their pricing given that they have the best coding models out there. Their recent fundraise had them disclose gross margins of 60% (and -30% for usage via Bedrock etc.). This way they can offer 2.5x more tokens at the same price as the vibe-code companies and still break even. The market movement where the assumption did not work out was that we still only have Claude, which made vibe coding work and is the most tasteful when it comes to what users want. There are probably models better at thinking and logic, especially o3, but this signals the staying power of Claude - having lock-in and popularity - and challenges the more fundamental assumption that language models are commodities.

    (Speculating) Many companies would want to move away from Claude but can't because users love the models.

  • whateveracct 3 hours ago

    This is the goal. Create a reason to shave a bunch off the top of SWE salaries. Pay them less because you "have" to pay for AI tools. All so they don't have to do easy rote work - you still get them to do the high level stuff humans must do.

  • sovietmudkipz 3 hours ago

    What is everyone’s favorite parallel agent stack?

    I’ve just become comfortable using GH Copilot in agent mode, but I haven’t started letting it work in an isolated way in parallel to me. Any advice on getting started?

  • typs 3 hours ago

    This makes sense as long as people continue to value using the best models (which may or may not continue for lots of reasons).

    I’m not entirely sure that AI companies like Cursor necessarily miscalculated, though. It’s noted that the actual strategies the blog advertises are things used by tools like Cursor (via auto mode). The important thing for them is that they are able to successfully push users towards their auto mode and use more usage data to improve their routing, and that frontier models don’t continue to be so much better AND so expensive that users keep demanding them. I wouldn’t hate that bet if I were Cursor, personally.

  • thebigspacefuck 2 hours ago

    Never heard of kilo before, pretty sure this post is just an ad

    • lvl155 an hour ago

      I’ve not heard of them either, but now I am getting ads from them. I guess that was their plan.

  • dcre 2 hours ago

    "The bet was that by the following year, the application inference would cost 90% less, creating a $160 gross profit (+80% gross margins). But this didn't happen, instead of declining the application inference costs actually grew!"

    This doesn't make any sense to me. Why would Cursor et al expect they could pocket the difference if inference costs went down? There's no stickiness to the product; they would compete down to zero margins regardless. If anything, higher total spend is better for them because it's more to skim off of.

  • AstroBen 2 hours ago

    > charge users $200 while providing at least $400 worth of tokens, essentially operating at -100% gross margin.

    Why are we assuming everyone uses the full $400? Margins aren't calculated based on only the heaviest users..

    And where are they pulling the 100k number from?

  • zahlman 3 hours ago

    > The difference in pay between inference and training engineers is because of their relative impact. You train a model with a handful of people while it is used by millions of people.

    Okay, but when did that ever create a comparable effect for any other kind of software dev in history?

  • mockingloris 3 hours ago

    @g42gregory This would mean that for certain devs, an unfair advantage would be owning a decent on-prem rig running a fine-tuned and trained model that has been optimized for the user's specific use case.

    A fellow HN user's post I engaged with recently talked about low-hanging fruit.

    What that means for me and where I'm from is some sort of dev-loan initiative by NGOs and government grants, where devs have access to these models/hardware and repay with some form of value.

    What that is, I haven't thought that far. Thoughts?

    └── Dey well

  • hx8 3 hours ago

    How many parallel agents can one developer actively keep up with? Right now, my number seems to be about 3-5 tasks, if I review the output.

    If we assume 5 tasks, each running $400/mo of tokens, we reach an annual bill of $24,000. We would have to see a 4x increase in token cost to reach the $100,000/yr mark. This seems possible with increased context sizes. Additionally, we might see increased context sizes lead to longer-running, more complicated tasks, which would increase my number of parallel tasks.
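
    A minimal sketch of that estimate (the 5-task and $400/mo figures are the assumptions above, not measurements):

      parallel_tasks = 5   # tasks one dev can realistically review at once
      cost_per_task = 400  # assumed $/month of tokens per task

      annual_bill = parallel_tasks * cost_per_task * 12
      print(f"${annual_bill:,}/yr")                              # $24,000/yr
      print(f"~{100_000 / annual_bill:.1f}x to reach $100k/yr")  # ~4.2x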

  • StratusBen 3 hours ago

    I started https://www.vantage.sh/ - a cloud cost platform that tracks Infra & AI spend.

    The $100k/dev/year figure feels like sticker-shock math more than reality. Yes, AI bills are growing fast - but most teams I see are still spending substantially less annually, and that's before applying even basic optimizations like prompt caching, model routing, or splitting work across models.

    The real story is the AWS playbook all over again: vendors keep dropping unit costs, customers keep increasing consumption faster than prices fall, and in the end the bills still grow. If you’re not measuring it daily, the "marginal cost is trending down" narrative is meaningless - you’ll still get blindsided by scale.

    I'm biased but the winners will be the ones who treat AI like any other cloud resource: ruthlessly measured, budgeted, and tuned.

    • oblio an hour ago

      Ironically, except for Graviton (and that's also plateauing; plus it requires that you're able to use it), basically no old AWS service has been reduced in cost since 2019. EC2, S3, etc.

      • StratusBen 18 minutes ago

        Look at the early days of AWS vs recent years. The fact that AWS services have been basically flat since 2019 in a high-inflation environment is actually pretty dang good on a relative basis.

    • nunez 2 hours ago

      Dude, thank you for this service. I use ec2instance.info and vantage.sh for Azure all of the time.

  • 6thbit 3 hours ago

    An interesting metric is when token bills per dev exceed the cost of hiring a new dev. But also, if paying another dev's worth in tokens gets you further than 2 devs without AI, will you still pay it?

    I wonder how the economics will play out, especially when you add in all the different geographic locations for remote devs and their cost.

    • jjmarr 2 hours ago

      They already do for anything not in Western Europe/North America.

  • austin-cheney 3 hours ago

    There is nothing new here and the math on this is pretty simple. AI greatly increases automation, but its output is not trusted. All research so far shows AI assisted development is a zero sum game regarding time and productivity because time saved by AI is reinvested back into more thorough code reviews than were otherwise required.

    Ultimately, this will become a people problem more than a financial problem. People that lack the confidence to code without AI will cost less to hire and dramatically more to employ, no differently than people reliant on large frameworks. All historical data indicates employers will happily eat that extra cost if it means candidates are easier to identify and select because hiring and firing remain among the most serious considerations for technology selection.

    Candidates, currently thought of as 10x, that are productive without these helpers will remain no more or less elusive than they are now. That means employers must weigh higher risks and higher selection costs against a potentially higher return on investment, knowing that the ROI is only realized if these high-performance candidates are allowed to execute with high productivity. Employers will gladly eat increased expenses if they can qualify lower risks in candidate selection.

    • jjmarr 3 hours ago

      You're assuming it's a binary between coding with or without AI.

      In my experience, a 10x developer that can code without AI becomes a 100x developer because the menial tasks they'd delegate to less-skilled employees while setting technical direction can now be delegated to an AI instead.

      If your only skill is writing boilerplate in a framework, you won't be employed to do that with AI. You will not have a job at all and the 100xer will take your salary.

      • austin-cheney an hour ago

        Those are some strange guesses.

      • oblio an hour ago

        The thing is, the 100x can't be in all the verticals, speak all the languages, be a warm body required by legislation, etc, etc. Plus that 100x just became a 10x (x 10x) bus factor.

        This will reduce demand for devs but it's super likely that after a delay, demand for software development will go even higher.

        The only thing I don't know is what that demand for software development will look like. It could be included in DevOps work or IT Project Management work or whatever.

        I guess we'll see in a few years.

  • zeld4 3 hours ago

    Give me a $50k raise and I need only $10k/yr.

    seriously, I don't see the AI outcome worth that much yet.

    At the current level of AI tools, the attention you need to manage 10+ async tasks is over the limit for most humans.

    In 10 years maybe, but $100k will probably be worth much less by then.

  • daft_pink 2 hours ago

    If you are throttled at $200 per month, you should probably just pay another $200 a month for a second subscription, because the value is there. That’s my take from using Claude.

  • mockingloris 3 hours ago

    Doesn't this segue? [We'll need a universal basic income (UBI) in an AI-driven world] https://news.ycombinator.com/item?id=44866518#44866713

    └── Yarn me

    • throaway920181 3 hours ago

      What does "Dey well" and "Yarn me" mean at the bottom of your comments?

      • mockingloris 3 hours ago

        They are Nigerian Pidgin English words:

          - Dey well: Be well
          - Yarn me: Lets talk
        
        └── Dey well/Be well

        • SoftTalker 2 hours ago

          Please don't use signature lines in HN comments.

          Edit: Would have sworn that this was in the guidelines but I don't see it just now.

        • nmeofthestate 2 hours ago

          Ok, don't do that.

  • jvanderbot 2 hours ago

    It's not hard to imagine a future where I license their network for inference on my own machine, and they can focus on training.

    • oblio an hour ago

      The problem with this is that the temptation to do more is too big. Nobody wants to be a "dumb pipe", a utility.

  • mwkaufma 2 hours ago

    Title modded without merit.

  • lvl155 an hour ago

    What is Kilocode?

    • tirumario 30 minutes ago

      An open-source AI coding agent extension for VS Code.

  • masterj 3 hours ago

    Why even stop at 100k/yr? Surely the graph is up-and-to-the-right forever? https://xkcd.com/605/

  • chiffre01 3 hours ago

    Honestly we're in a race to the bottom right now with AI.

    It's only going to get cheaper to train and run these models as time goes on. Models running on single consumer-grade PCs today were almost unthinkable four years ago.

  • gedy 3 hours ago

    Maybe this is why companies are hyping the "replacing devs" angle, as "wow, see, we're still cheaper than that engineer!" is going to be the only viable pitch.

    • woeirua 2 hours ago

      It's not viable yet, and at current token spend rates, it's likely not going to be viable for several years.

  • turnsout 3 hours ago

    Tools like Cursor rely on the gym model—plenty of people will pay for a tier that they don't fully utilize. The heavy users are subsidized by the majority who may go months without using the tool.

  • AtNightWeCode 3 hours ago

    Don't know about the numbers, but is this not the cloud all over again? Promises of cheap storage that you don't have to maintain developed into maintenance hell, with storage costs steadily rising instead of dropping.

  • yieldcrv 3 hours ago

    I think what this model actually showed is a cyclical aspect of tokens as a commodity

    It is based on the supply and demand of GPUs: demand currently outstrips supply, while the 'frontier models' are also much more computationally efficient than last year's models in some ways - using far fewer computational resources to do the same thing.

    So now that everyone wants to use frontier models in "agentic mode", with reasoning eating up a ton more tokens before settling on a result, demand is outpacing supply, but it is possible it equalizes yet again before the cycle begins anew.

  • throwanem 3 hours ago

    "Tokenomics."

    • TranquilMarmot 3 hours ago

      I studied this in college but I think we had a different idea of what "toke" means

      • throwanem 2 hours ago

        Eh. The implicit claim is the same as everywhere, namely that that $100k/dev/year of AI opex is an enormous bargain over going up two orders of magnitude in capex to pay for the same output from a year's worth of a team. But now that Section 174's back and clearly set to stay for a good long while, it makes sense to see this line of discourse come along.

  • senko 3 hours ago

    tl;dr

    > This is driven by two developments: more parallel agents and more work done before human feedback is needed.