I don't get the widespread hatred of Gas Town. If you read Steve's writeup, it's clear that this is a big fun experiment.
It pushes and crosses boundaries, it is a mixture of technology and art, it is provocative. It takes stochastic neural nets and mashes them together in bizarre ways to see if anything coherent comes out the other end.
And the reaction is a bunch of Very Serious Engineers who cross their arms and harumph at it for being Unprofessional and Not Serious and Not Ready For Production.
I often feel like our industry has lost its sense of whimsy and experimentation from the early days, when people tried weird things to see what would work and what wouldn't.
Maybe it's because we also have suits telling us we have to use neural nets everywhere for everything Or Else, and there's no sense of fun in that.
Maybe it's the natural consequence of large-scale professionalization, and stock option plans and RSUs and levels and sprints and PMs, that today's gray hoodie is just the updated gray suit of the past but with no less dryness of imagination.
> If you read Steve's writeup, it's clear that this is a big fun experiment:
So, Steve has the big scary "YOU WILL DIE" statements in there, but he also has this:
> I went ahead and built what’s next. First I predicted it, back in March, in Revenge of the Junior Developer. I predicted someone would lash the Claude Code camels together into chariots, and that is exactly what I’ve done with Gas Town. I’ve tamed them to where you can use 20–30 at once, productively, on a sustained basis.
"What's next"? Not an experiment. A prediction about how we'll work. The word "productively"? "Productively" is not just "a big fun experiment." "Productively" is what you say when you've got something people should use.
Even when he's giving the warnings, he says things like "If you have any doubt whatsoever, then you can’t use it," implying that it's ready for the right sort of person to use, or "Working effectively in Gas Town involves committing to vibe coding," implying that working effectively with it is possible.
Every day I go on Hacker News and see the responses to a post where someone's blog post carries an inconsistent message like this.
If you say two different and contradictory things, and do not very explicitly resolve them, and say which one is the final answer, you will get blamed for both things you said, and you will not be entitled to complain about it, because you did it to yourself.
I agree. I'm one of the Very Serious Engineers, and I liked Steve’s post when I thought it was sort of tongue in cheek, but I was horrified to come to the HN comments and LinkedIn comments proclaiming Gastown as the future of engineering. There absolutely is a large contingent of engineers who believe this, and it has a real-world impact on my job if my bosses think you can just throw a dozen AI agents at our product roadmap and get better productivity than an engineer. This is not whimsical to me; I'm getting burnt out trying to reconcile the absurd expectations of investors and executives with the real-world engineering concerns of my day-to-day job.
> If you say two different and contradictory things, and do not very explicitly resolve them, and say which one is the final answer, you will get blamed for both things you said, and you will not be entitled to complain about it, because you did it to yourself.
Our industry is held back in so many ways by engineers clinging to black-and-white thinking.
Sometimes there isn’t a “final” answer, and sometimes there is no “right” answer. Sometimes two conflicting ideas can be “true” and “correct” simultaneously.
It would do us a world of good to get comfortable with that.
My background is in philosophy, though I am a programmer, for what it is worth. I think what I'm saying is subtly different from "black and white thinking".
The final answer can be "each of these positions has merit, and I don't know which is right." It can be "I don't understand what's going on here." It can be "I've raised some questions."
The final answer is not "the final answer that ends the discussion." Rather, it is the final statement about your current position. It can be revised in the future. It does not have to be definitive.
The problem comes when the same article says two contradictory things and does not even try to reconcile them, or try to give a careful reader an accurate picture.
And I think that the sustained argument over how to read that article shows that Yegge did a bad job of writing to make a clear point, albeit a good job of creating hype.
> Keep in mind that Steve has LLMs write his posts on that blog.
Ok, I can accept that, it's a choice.
> Things said there may not reflect his actual thoughts on the subject(s) at hand.
Nope, you don't get to have it both ways. LLMs are just tools, there is always a human behind them and that human is responsible for what they let the LLM do/say/post/etc.
We have seen the hell that comes from playing the "They said that but they don't mean it" or "It's just a joke" game (re: Trump); I'm not a fan of whitewashing with LLMs.
This is not an anti or pro Gas Town comment, just a comment on giving people a pass because they used an LLM.
> And the reaction is a bunch of Very Serious Engineers who cross their arms and harumph at it for being Unprofessional and Not Serious and Not Ready For Production.
> OK! That was like half a dozen great reasons not to use Gas Town. If I haven’t got rid of you yet, then I guess you’re one of the crazy ones. Hang on. This will be a long and complex ride. I’ve tried to go super top-down and simplify as much as I can, but it’s a bit of a textbook.
A sense of art and whimsy and experimentation is less compelling when it's jumping on the hypest of hype-trains. I'd love to see more folk art in programming, but Gas Town is closer to fucking Beeple than anything charming.
I like gastown's moxie, it's fun, and seems kind of tongue in cheek.
What I don't like is people me-tooing gastown as some breakthrough in orchestration. I also don't like how people are doing the same thing for ralph.
In truth, what I hate is people dogpiling thoughtlessly on things, and only caring about what social media has told them to care about. This tendency makes me get warm tingles at the thought of the end of the world. Agent Smith was right about humanity.
It’s not the whimsy. It’s that the whimsy is laced with casual disdain, a touch too much “let me buy you a stick of gum and show you how to chew it”, a frustrated tenor never stated but dog whistled “you dumb fucks”. A soft sharp stink of someone very smart shoving that fact in your face as they evangelise “the obvious truth” you’re too stupid to see.
And maybe he’s even right. But the reaction is to the flavour of chip on the shoulder delivery mixed into an otherwise fun piece.
Perhaps it was his followup post about how people are lining up to throw millions of VC dollars at his bizarre whimsical fever dream that disturbs people? I’m all for arts funding, but…
It isn't though. It crossed the chasm when Steve (who I would like to think is somewhat comfortable after writing a book and holding director-level positions at several startups) decided to endorse an outright crypto pump and dump.
When he decided to monetize the eyeballs on the project instead of anything related to the engineering. Which, of course, Steve says he isn't smart enough to understand (his own words), and he recommends you not buy it, but he still makes a tidy profit from it.
It's a memecoin now... that has a software project attached to it. Anything related to engineering died the day he failed to disavow the crypto BS and instead started shilling it.
What happened to engineers calling out BS as BS?
I work in a typical web app company which does accounting/banking etc.
A couple of days ago I was sitting in a meeting of 10-15 devs, discussing our AI agents. People were raising issues and brainstorming ways around the problems with AI. How to make the AI better.
Our devs were occupied doing AI things, not accounting/banking things.
If the time savings were as promised, we should have been 3 devs (with the remaining devs replaced by 7-10 AI agents) discussing accounting/banking.
If Gas Town succeeds, it will just be the next toy we play with instead of doing our jobs.
>I often feel like our industry has lost its sense of whimsy and experimentation from the early days, when people tried weird things to see what would work and what wouldn't.
Remember the days when people experimented with and talked about things that weren't LLMs?
I used to go to a lot of industry events and I really enjoyed hearing about the diversity of different things people worked on both as a hobby and at work.
Now it's all LLMs all the time and it's so goddamn tedious.
> I used to go to a lot of industry events and I really enjoyed hearing about the diversity of different things people worked on both as a hobby and at work.
I go to tech meetups regularly. The speed at which any conversation ends up on the topic of AI is extremely grating to me. No more discussions about interesting problems and creative solutions that people come up with. It's all just AI, agentic, vibe code.
At what point are we going to see the loss of practical skills if people keep on relying on LLMs for all their thinking?
> No more discussions about interesting problems and creative solutions that people come up with. It's all just AI, agentic, vibe code.
And then you give in and ask what they're building with AI, that activation energy finally available to build the side project they wouldn't have built otherwise.
It's like the entire software industry is gambling on "LLMs will get better faster than human skills will decay, so they will be good enough to clean up their own slop before things really fall apart".
I can't even say that's definitely a losing bet-- it could very well happen-- but boy does it seem risky to go all-in on it.
Yeah, it's unbelievably tiresome: endless complaints from people pushing up their glasses. IT'S A PROJECT ABOUT POLECATS CALLED GAS TOWN, MADE FOR FUN. Read that again; either admire it and enjoy it, or quit the umpteenth complaint about vibecoding.
>Yegge is leaning into the true definition of vibecoding with this project: “It is 100% vibecoded. I’ve never seen the code, and I never care to.”
I don't get it. Even with a very good understanding of the kind of work I'm doing, prior knowledge of the code, and a very well specced problem, Claude Code etc. just plain fail or produce sloppy code. How do these industry figures claim they've seen no part of 225K+ lines of code and promise that it works?
It feels like we're getting into an era where oceans of code that nobody understands are going to be produced, which we hope AGI swoops in and cleans up?
Where is the "super upvote button" when you need it?
YES! I have been playing with vibe coding tools since they came out. "Playing" because only on rare occasions have I created something that is good enough to commit/keep/use. I keep playing with them because, well, I have a subscription, but also so I don't fall into the fuddy-duddy camp of "all AI is bad" and can legitimately speak on the value, or lack thereof, of these tools.
Claude Code is super cool, no doubt, and with _highly targeted_ and _well planned_ tasks it can produce valuable output. Period. But every attempt at full-vibe-coding I've done has gotten hung up at some point and I have to step in and manually fix things. My experience is often:
1. First Prompt: Oh wow, this is amazing, this is the future
2. Second Prompt: Ok, let me just add/tweak a few things
10. 10th prompt: Ugh, every time I fix one thing, something else breaks
I'm not sure at all what I'm doing "wrong". Flogging the agents along doesn't work well for me; or maybe I'm just having trouble letting go of control and I'm not flogging enough?
But the bottom line is I am generally shocked that something like Gas Town was able to be vibe-coded. Maybe it's a case of the LLM overstating what it's accomplished (typical) and if you look under the hood it's doing 1% of what it says it is, but I really don't know. Clearly it's doing something, but then I sit over here trying to build a simple agent with some MCPs hooked up to it using an LLM agent framework and it's falling over after a few iterations.
This is also my experience. Everything I’ve ever tried to vibe code has ended up with off-by-one errors, logic errors, repeated instances of incorrect assumptions etc. Sometimes they appear to work at first, but, still, they have errors like this in them that are often immediately obvious on code review and would definitely show up in anything more than very light real world use.
They _can_ usually be manually tidied and fixed, with varying amounts of effort (small project = easy fixes, on a par with regular code review, large project = “this would’ve been easier to write myself...”)
I guess Gas Town’s multiple layers of supervisory entities are meant to replace this manual tidying and fixing, but, well, really?
I don’t understand how people are supposedly having so much success with things like this. Am I just holding it wrong?
If they are having real success, why are there no open source projects that are AI developed and maintained that are _not_ just systems for managing AI? (Or are there and I just haven’t seen them?...)
Like, why are you manually tidying and fixing things? The first pass is never perfect. Maybe the functionality is there but the code is spaghetti or untestable. Have another agent review and feed that review back into the original agent that built out the code. Keep iterating like that.
My usual workflow:
Agent 1 - Build feature
Agent 2 - Review these parts of the code, see if you find any code smells, bad architecture, scalability problems that will pop up, untestable code, or anything else falling outside of modern coding best practices
Agent 1 - Here's the code review for your changes, please fix
Agent 2 - Do another review
Agent 1 - Here's the code review for your changes, please fix
Repeat until testable, maybe throw in a full codebase review instead of just the feature.
Agent 1 - Code looks good, start writing unit tests, go step by step, let's walk through everything, etc. etc. etc.
Then update your .md directive files to tell the agents how to test.
Voila, you have an LLM agent loop that will write decent code and get features out the door.
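For the curious, here's roughly what that loop looks like mechanically. A minimal sketch using the Anthropic Python SDK; the role prompts, the three-round cap, and the LGTM stop signal are my own illustrative choices, not anyone's canonical setup:

    import anthropic  # official SDK; expects ANTHROPIC_API_KEY in the env

    client = anthropic.Anthropic()
    MODEL = "claude-3-5-sonnet-20241022"  # substitute whatever model you use

    def ask(system: str, prompt: str) -> str:
        msg = client.messages.create(
            model=MODEL, max_tokens=4096, system=system,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    feature_spec = "..."  # whatever you're building

    # Agent 1: build the feature.
    code = ask("You are a careful software engineer.",
               f"Implement this feature:\n{feature_spec}")

    # Agents 1 and 2 in a bounded loop, so nitpicks can't cycle forever.
    for _ in range(3):
        review = ask(
            "You are a strict code reviewer. Reply LGTM if nothing needs fixing.",
            "Review this code for code smells, bad architecture, scalability "
            f"problems, and untestable code:\n{code}")
        if review.strip().startswith("LGTM"):
            break
        code = ask("You are a careful software engineer.",
                   f"Here is a code review of your changes; apply the fixes.\n"
                   f"Review:\n{review}\n\nCode:\n{code}")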
I'm not trying to be rude here at all, but are you manually verifying any of that? When I've had LLMs write unit tests, they are quick to write pointless unit tests that seem impressive ("2123/2123 tests passed!") but in reality test mostly nothing of value. And that's when they aren't bypassing commit checks, or commenting out tests, or saying "I fixed it all" while multiple tests are broken.
Maybe I need a stricter harness but I feel like I did try that and still didn't get good results.
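For what it's worth, the cheapest harness I can think of (a rough sketch, assuming pytest-style tests under tests/) is to fail CI when a test function contains no assert at all. It won't catch weak assertions or pytest.raises-style tests, but it does catch the "2123 passed, tests nothing" pattern:

    import ast, pathlib, sys

    def has_assert(fn) -> bool:
        # True if the function body contains at least one assert statement.
        return any(isinstance(n, ast.Assert) for n in ast.walk(fn))

    bad = []
    for path in pathlib.Path("tests").rglob("test_*.py"):
        for node in ast.walk(ast.parse(path.read_text())):
            if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                    and node.name.startswith("test_") and not has_assert(node)):
                bad.append(f"{path}:{node.lineno} {node.name}")

    if bad:
        print("Test functions with no asserts:")
        print("\n".join(bad))
        sys.exit(1)  # fail CI so the agent has to write real assertions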
I worry about people who use this approach where they never look at the code. Vibe-coding IS possible, but you have to spend a lot of time in plan mode and be very clear about architecture and the abstractions you want it to use.
I've written two separate moderately-sized codebases using agentic techniques (oftentimes being very lazy and just blanket-approving changes), and I don't encounter logic or off-by-one errors very often, if at all. It seems quite good at the basic task of writing working code, but it sucks at architecture, and you need occasional code review rounds to keep the codebase tidy and readable. My code reviews with the AI are like 50% DRY and separating concerns.
In a recent Yegge interview, he mentions that he often throws away the entire codebase and starts from scratch rather than try to get LLMs to refactor their code for architecture.
This has been my best way to learn: put one agent on a big task, let it learn things about the problem and any gotchas, and then have it take notes; do it again until I'm happy with the result. If in the middle I think there are two choices that have merit, I ask for a subagent to go explore that solution in another worktree and make all its own decisions, then I compare. I also personally learn a lot about the problem space during the process, so my prompts and choices on subsequent iterations use the right language.
Honestly, in my experience so far, if an LLM starts going down a bad path, it’s better just to roll back to a point where things were OK and throw away whatever it was doing, rather than trying to course correct.
The secret is that it doesn't work. None of these people have built real software that anyone outside their bubble uses. They are not replacing anyone, they are just off in their own corner building sand castles.
Just because they're one-off tools that only one person uses doesn't mean it's not "real software". I'm actually pretty excited about the fact that it's now feasible for me to replace all my BloatedShittyCommercialApps that I only use 5% of with vibe-coded bespoke tools that only do the important 5%, just for me to use. If that makes it a "sand castle" to you, fine, but this is real software and I'm seeing real benefit here.
I have 100% vibecoded software that I now use instead of a commercial implementation that cost me almost 200 USD a month (a tool for radiology dictation and report generation).
Not saying it's right, but boy do I have stories about the code used in <insert any medical profession> healthcare applications. Not sure how "vibecoded" lines of code are any worse.
I built a clinical pharmacist "pocket calculator" kinda app for a specific function. It was like $.60 in claude credits I think. Built with flutter + dart. It's a simple tool suite and I've only built out one of the tools so far.
Now to be fair, that $.60 session was just the coding. I did some brainstorming in chatgpt and generated good markdown files (claude.md, gemini.md, agents.md) before I started.
And yet I notice you haven't mentioned publishing it and undercutting the market. You could make a lot of money out-competing the existing option if what you produced was production-grade software. I'm guessing the actual case is that you only needed a small subset of the functionality of the paid software, and the LLM stitched together a rough unpolished proof-of-concept that handled your exact specific use case. Which is still great for you! But it's not the future of coding. The world still needs real engineers to make real software that is suitable for the needs of many, and this doesn't replace that.
>The world still needs real engineers to make real software that is suitable for the needs of many, and this doesn't replace that.
I think azan_ is demonstrating that shipping products 'suitable for the needs of many' is going to have to compete with 'slopping software for the needs of one'.
The only people who think that are programmers already or programmer-adjacent. Your mother is never going to be able to use a Gas Town-like workflow to make software for her own needs, nor is she even going to want to spend her weekends trying. These tools still require a baseline minimum of technical knowledge, and a real time investment, and also a real money investment the way some people are using them. Moreover, most real software has interoperability needs. A world where everyone makes their own Twitter or WhatsApp is a world where nobody can talk to anyone else.
There is a small subset of the population who is now enabled to make proof-of-concepts with less effort than before. This in no way diminishes the need for delivering performant, secure, interoperable software at scale to serve humanity's needs.
> Your mother is never going to be able to use a Gas Town-like workflow to make software for her own needs, nor is she even going to want to spend her weekends trying.
I'm going on a tangent here but what's with this constant deprecation of mothers to make a point? There are many people here whose mothers can develop software.
People's mothers are statistically unlikely to be programmers, obviously. My own grandmother was a programmer, but it conveys the idea in two words rather than making up a clunky phrase to describe the exact degree of non-techiness of the hypothetical person.
An interface isn't enough. Even if you never look at the code, the results are going to be influenced significantly by having the vocabulary to accurately describe what you want. The less sufficient your technical vocabulary, the more ambiguous your prompts will be and the less likely it is that the Polecats will be able to deliver anything resembling your unspoken imagination. To say nothing of being able to guide the lost critters when they run into problems.
It sounds like a medical device, in which case marketing it may require FDA approval or notification. Whereas vibe-coding a one-off tool for yourself might still require validation but you're the one taking the risk and accepting liability for it.
I think the thing you're missing is that the tool doesn't need to be marketed because someone else could ask their LLM to make them a similar tool but fitting their use case.
If they're using a 100% vibe-coded tool that they've never read the code of to replace something that would require government approval, for use on real-world patients, they're probably committing medical malpractice as we speak. Let us pray that is not the case.
It doesn't matter if the tool "needs" to be marketed. There is a market of paying customers. If other people are paying $200/month, both your and their lives would be improved significantly by you offering a $100/month replacement software. For all the talk about LLMs replacing the need for packaged software, people are still paying for packaged software, and while they are, you could be making large amounts of money while saving them money. If you're altruistic, you could even release it as FOSS and save a lot of people $200/mo. Unless, of course, your vibe-coded app isn't actually remotely capable of replacing the software in question.
The experiment is fine if you treat it as an experiment. The problem is the state of the industry where it's treated as serious rather than silly — possibly even by Steve himself.
The 'experiment' isn't the issue. The problem is the entire culture around it. LLM tools are being shoved into everything, LLMs are soaking up trillions in investment, engineers are being told over and over that everything has changed and this garbage is making us obsolete, and software quality is decreasing where wide LLM usage is being mandated (e.g. Microsoft). Gas Town does not give the vibe of a neutral experiment but rather looks to be a full-on descent into AI psychosis, the way Yegge describes it.
To be clear, I think LLMs are useful technology. But the degree of increasing insanity surrounding it is putting people off for obvious reasons.
I share the frustration with the hype machine. I just don't think a guy with a blog is an appropriate target for our frustration with corporate hype culture.
> Ok but this entire idea is very new. Its not an honest criticism to say no one has tried the new idea when they are actively doing it.
Not really new. Back in the day, companies used to outsource their stuff to the lowest-bidder agencies in proverbial Elbonia, never looked at the code, and then panicked and hired another agency when the result visibly was not what was ordered. Case studies abound on TheDailyWTF from the last two decades.
Doing the same with agents will give you the same disastrous results for comparable money, just faster. Oh, and you can't sue them, really.
Fair point on the Elbonia comparison. But we can't sue the SQLite maintainers either, and yet we trust them with basically everything. The reason is that open source developed its own trust mechanisms over decades. We don't have anything close to that with LLMs today. What those mechanisms might look like is an open question that is getting more important as AI generated code becomes more common.
> saying that Yegge hasn't built real software is just not true
I mean... I feel like it's somewhat telling that his Wikipedia page spends half its words on his abrasive communication style, and the only things approximating products mentioned are a (lost) Rails-on-JavaScript port and 25 years spent developing a MUD on the side.
Certainly one doesn't get to stay a staff-level engineer at Google without writing code, but in terms of real, shipping software, Yegge's resume is a bit light for his tenure in BigTech.
> How do these industry figures claim they see no part of a 225K+ line of code and promise that it works?
The only promise is that you will get your face ripped off.
“WARNING DANGER CAUTION
- GET THE F** OUT - YOU WILL DIE
[…] Gas Town is an industrialized coding factory manned by superintelligent robot chimps, and when they feel like it, they can wreck your shit in an instant. They will wreck the other chimps, the workstations, the customers. They’ll rip your face off if you aren’t already an experienced chimp-wrangler.”
Yeah, I'm at that stage 6 or 7. I'm using multiple agents across multiple terminal windows. I'm not even coding any more, literally I haven't written code in like 2-4 months now beyond changing a config value or something.
But I still haven't actually used Gastown. It looks cool. I think it probably works, at least somewhat. I get it. But it's just not what I need right now. It's bleeding edge and experimental.
The main thing holding me back from even tinkering with it is the cost. Otherwise I'd probably play with it a little, but it's not something I'd expect to use and ship production code right now. And I ship a ton of production code with claude.
There is an incentive for dishonesty about what AI can and cannot do.
People from OpenAI were saying that GPT-2 had achieved AGI. There is a very clear incentive for that statement to be made by people who are not using AI for anything productive.
Even as increasingly bombastic claims are made, it is obvious to any actual user that the best AI cannot one-shot everything. And the worst ones: I was using Gemini yesterday and it wouldn't stop outputting emojis; I was using Grok and it refused to give me a code snippet because it claimed its system prompt forbade this... what can you say?
I don't understand why anyone would want to work on a codebase they didn't understand either. What happens when something goes wrong?
Again though, there is massive financial incentive to make these claims, and some other people will fall along with that because it is good for their career, etc. I have seen this in my own company where senior people are shoehorning this stuff in that they clearly do not actually use or understand (to be clear, this is engineering not management...these are people who definitely should understand but do not).
Great tool, but the 100% vibecoding without looking at the code, for something that you are actually expecting others to use, is a bad idea. Feels more like performance art than actual work. I like jokes, I like coding, room for both but don't confuse the two.
I don't get you guys that are getting such bad results.
Are you guys just trying to one shot stuff? Are you not using agents to iterate on things? Are you not putting agents against each other (have one code, one critique/test the code, and put them in a loop)?
I still look at the code that's produced, I'm not THAT far down the "vibe coding" path that I'm trusting everything being produced, but I get phenomenal results and I don't actually write any code any more.
So like, yeah, first pass the llm will create my feature and there's definitely some poorly written code or duplicate code or other code smells, but then I tell another agent to review and find all these problems. Then that review gets fed back in to the agent that created the feature. Wham, bam, clean code.
I'm not using gastown or ralph wiggum ($$$) but reading the docs, looking over how things work, I can see how it all comes together and should work. They've been built out to automatically do the review + iteration loop that I do.
My feeling has been that 'serious' software engineers aren't particularly suited to use these tools. Most don't have an interest in managing people or are attracted to the deterministic nature of computing. There's a whole psychology you have to learn when managing people, and a lot of those skills transfer to wrangling AI agents from my experience.
You can't be too prescriptive or verbose when interacting with them, you have to interact with them a bit to start understanding how they think and go from there to determine what information or context to provide. Same for understanding their programming styles, they will typically do what they're told but sometimes they go on a tangent.
You need to know how to communicate your expectations. Especially around testing and interaction with existing systems, performance standards, technology, the list goes on.
My (former) coworker who's heavy into this stuff produced a lot of unmaintainable slop on his way out while singing the agents' praises to higher-ups. He also felt he was getting a lot of value and had no issues.
I'm sympathetic to this view, but I also wonder if this is the same thing that assembly language programmers said about compilers. What do you mean that you never look at the machine code? What if the compiler does something inefficient?
Compilers are deterministic. People who write them test that they produce correct results. You can expect the same code to compile to the same assembly.
With LLMs, two people giving the exact same prompts can get wildly different results. That is not a tool you can use to blindly ship production code. Imagine if your compiler randomly threw in a syscall to delete your hard drive, or decided to pass credentials in plain text. LLMs can and will do those things.
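To make the contrast concrete, here's a toy sketch of why sampled output isn't reproducible: next-token selection is a weighted random draw, so two runs of the "same prompt" can diverge as soon as one draw differs. The tokens and logits below are made up, and real inference can drift even at temperature 0 because of batching and floating-point evaluation order:

    import math, random

    def sample_next(logits: dict, temperature: float, rng: random.Random) -> str:
        # Softmax-weighted draw over candidate next tokens.
        toks = list(logits)
        weights = [math.exp(logits[t] / temperature) for t in toks]
        return rng.choices(toks, weights=weights, k=1)[0]

    # Same "prompt", different RNG state: the chosen token is a dice roll,
    # and the low-probability disaster token is always on the table.
    logits = {"shutil.rmtree(tmp)": 2.0, "os.remove(path)": 1.8, "rm -rf /": 0.1}
    for seed in (1, 2, 3):
        print(seed, sample_next(logits, temperature=0.8, rng=random.Random(seed)))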
Even ignoring determinism, with traditional source code you have a durable, human-readable blueprint of what the software is meant to do that other humans can understand and tweak. There's no analogy in the case of "don't read the code" LLM usage. No artifacts exist that humans can read or verify to understand what the software is supposed to be doing.
Yeah, there is: it's called "documentation" and "requirements". And it's not like you can't go read the code if you want to understand how it works; it's just not necessary to do so while in the process of getting to working software. I truly do not understand why so many people are hung up on this "I need to understand every single line of code in my program" BS I keep reading here. Do you also disassemble every library you use and understand it? No, you just use it, because it's faster that way.
Not only that, but compiler optimizations are generally based on rigorous mathematical proofs, so even without testing them you can be pretty sure they will generate equivalent assembly. From the little I know of LLMs, no one has figured out what mathematical principles LLMs generate code from, so you can't be sure it's going to be right other than by testing it.
I write JS, and I have never directly observed the IRs or assembly code that my code becomes. Yet I certainly assume that the compiler author has looked at the compiled output in the process of writing a compiler!
For me the difference is prognosis. Gas Town has no ratchet of quality: its fate has been written on the wall since the day Steve decided he didn't want to know what the code says. It will grow to a moderate but unimpressive size before it collapses under its own weight. Even if someone tried to prop it up with stable infra, Steve would surely vibe the stable infra out of existence, since he does not care about that.
Or he will find a way to get the AI to create harnesses so it becomes stable. The lack of imagination and willingness to experiment in the HN crowd is AMAZING me and worrying me at the same time. I never thought a group of engineers would be the most conservative and closed-minded people I could discuss with.
No, it is not what assembly programmers said about compilers, because you can still look at the compiled assembly, and if the compiler makes a mistake, you can observe it and work around it with inline assembly or, if the source is available, improve the compiler. That is not the same as saying "never look at the code".
We can tell you weren't around for the advent of compilers. To be fair, neither was I, since the UNIX C compiler came out in '68 and was far from the first compiler. You can make that claim about modern compilers, but early compilers weren't deterministic.
All compilers have bugs. Any loss of semantics during compilation would be considered a bug. In order to do that, the source and target language need to be structured and specified. I wasn't around in the 60s either, but I think that hasn't changed.
>but I also wonder if this is the same thing that assembly language programmers said about compilers
But as a programmer writing C code, you're still building out the software by hand. You're having to read and write a slightly higher level encoding of the software.
With vibe coding, you don't even deal with encodings. You just prompt and move on.
The big difference is that compilation is deterministic: compile the same program twice and it'll generate the same output twice. It also doesn't involve any "creativity": a compiler is mostly translating a high-level concept into its predefined lower-level components. I don't know exactly what my code compiles to, but I can be pretty certain what the general idea of the assembly is going to be.
With LLMs all bets are off. Is your code going to import leftpad, call leftpad-as-a-service, write its own leftpad implementation, decide that padding isn't needed after all, use a close-enough rightpad instead? Who knows! It's just rolling dice, so have fun finding out!
> The big difference is that compilation is deterministic: compile the same program twice and it'll generate the same output twice.
That's barely true now. Nix comes close, but builds are only bit-for-bit identical if you set a bunch of extra flags that aren't set by default. The most obvious instability is CPU dispatch order (aka modern single-computer systems are themselves distributed, racy systems) changing the generated code ever so slightly.
We don't actually care, because if one compiled version of the code uses r8 for a variable but a different compilation uses r9 for that variable, it doesn't matter: we just assume the resulting binary works the same either way. R8 vs. r9 is an implementation detail that doesn't matter to humans. See where I'm going with this? If the LLM non-deterministically calls the variable fileName one day and file_name the next time it's given the same prompt, yeah, language syntax purists are going to suffer an aneurysm because one of those is clearly "wrong" for the language in use, but it's really more of an implementation detail at this point. Obviously you can't mix them; the generated code has to be consistent in which one it's using. But if compilers get to choose r8 one day and r9 the next, and we're fine with it, why is having the exact variable name that important, as long as it's being used correctly?
I’ve done builds for aerospace products where the only binary difference between two builds of the same source code is the embedded timestamp. And per FAA review guidelines, this deterministic attribute is required, or else something is wrong in the source code or build process.
I certainly don’t use all compilers everywhere, but I don’t think determinism in compilation is especially rare.
No one is promising anything. It's just a giant experiment, and the author explicitly tells you not to use it. I appreciate those that try new things, even if it's possibly akin to throwing s** at a wall and seeing what sticks.
Maybe it changes how we code or maybe it doesn't. Vibe coding has definitely helped me write throwaway tools that were useful.
It's an experiment to discover what the limits are. Maybe the experiment fails because it's scoped beyond the limits of LLMs. Maybe we learn something by how far it gets exactly. Maybe it changes as LLMs get better, or maybe it's a flawed approach to pushing the limits of these.
It's unintuitive, but having an LLM verification loop like a code reviewer works impeccably well; you can even create dedicated agents to check for specific problem areas like poor error handling.
This isn't about anthropomorphism, it's context engineering. By breaking things into more agents, you get more focused context windows.
I believe Gas Town has some review process built in, but my comment is more to address the idea that it's all slop.
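The shape of the "focused context" idea is trivial. Something like this sketch, where ask_model stands in for whatever completion call you use and the concern list is just illustrative: one narrow reviewer per concern instead of a kitchen-sink prompt, so each pass spends its whole context window on a single class of problem.

    CONCERNS = {
        "error-handling": "swallowed exceptions, bare excepts, and error "
                          "paths with no tests",
        "security": "injection risks, secrets in code, and unvalidated "
                    "input at trust boundaries",
        "architecture": "duplicated logic, leaky abstractions, and modules "
                        "with tangled responsibilities",
    }

    def focused_reviews(code: str, ask_model) -> dict:
        # One small reviewer per concern, each with a deliberately narrow brief.
        return {
            name: ask_model(f"You review code for ONE thing only: {brief}.\n"
                            f"---\n{code}")
            for name, brief in CONCERNS.items()
        }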
As an aside, Opus 4.5 is the first model I used that most of the time doesn't produce much slop, in case you haven't tried it. It still produces some, but not much human oversight is required for building things (it's mostly higher-level and architectural things they need guidance on).
Mostly, it's not the model that is lacking but the visibility it has. Often the top-level business context for a problem is out of reach, spread across Slack, email, internal knowledge bases, and meetings.
Once I digest some of this and give it to Claude, it's mostly smooth sailing, but then the context window becomes the problem. Compactions during implementation remove a lot of important info. There should really be a Claude monitoring top-level context and passing work to agents. I'm currently figuring out how to orchestrate that nicely with Claude Code MD files.
With respect to architecture, it generally makes sound decisions but I want to tweak it, often trading off simplicity vs. security and scale. These decisions seem very subtle and likely include some personal preferences I haven't written anywhere.
Do you understand at a molecular level how cooking works? Or do you just do some rote actions according to instructions? How do you know if your cooking worked properly without understanding chemistry? Without looking at its components under a microscope?
Simple: you follow the directions, eat the food, and if it tastes good, it worked.
If cooks don't understand physics, chemistry, biology, etc, how do all the cooks in the world ensure they don't get people sick? They follow a set of practices and guidelines developed to ensure the food comes out okay. At scale, businesses develop even more practices (pasteurization, sanitization, refrigeration, etc) to ensure more food safety. None of the people involved understand it at a base level. There are no scientists directly involved in building the machines or day-to-day operations. Yet the entire world's food supply works just fine.
It's all just abstractions. You don't need to see the code for the code to work.
Every gt command runs bd version to verify the minimum beads version requirement. Under high concurrency (17+ agent sessions), this check times out and blocks gt commands from running.
Impact:
With 17+ concurrent sessions each running gt commands:
- Each gt command spawns bd version
- Each bd version spawns 5-7 git processes
- This creates 85-120+ git processes competing for resources
- The 2-second timeout in gt is exceeded
- gt commands fail with "bd version check timed out"
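The fix shape seems obvious enough, if the maintainers want it. A hypothetical sketch of the idea in Python (gt itself isn't mine and I haven't read its source): cache the version-check result on disk with a TTL, so N concurrent sessions spawn at most a handful of subprocesses instead of N times the git fan-out.

    import os, subprocess, tempfile, time

    CACHE = os.path.join(tempfile.gettempdir(), "bd-version-cache")
    TTL = 300  # seconds; the installed bd version rarely changes

    def bd_version() -> str:
        try:
            if time.time() - os.path.getmtime(CACHE) < TTL:
                with open(CACHE) as f:
                    return f.read().strip()
        except OSError:
            pass  # no cache yet; fall through and ask bd once
        out = subprocess.run(["bd", "version"], capture_output=True,
                             text=True, timeout=10, check=True).stdout
        tmp = f"{CACHE}.{os.getpid()}"
        with open(tmp, "w") as f:
            f.write(out)
        os.replace(tmp, CACHE)  # atomic rename: readers never see partial files
        return out.strip()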
Many comments undermining Gas Town are inadvertently assisting it by revealing failure modes and solutions to those. I'm excited when the discourse evolves around building out these frameworks and ideas. Many of the comments come down to people not understanding something.
I have a feeling it's less about getting small components to always work and more about systems that are inherently resistant to failure of underlying components. Probably similar to biology: as I understand it, components of the human body fail every day, yet the body persists, and no one engineers it nor looks at its underlying code; instead it has built-in feedback/control systems, of which Gas Town has rudimentary versions.
>while Yegge made lots of his own ornate, zoopmorphic [sic] diagrams of Gas Town’s architecture and workflows, they are unhelpful. Primarily because they were made entirely by Gemini’s Nano Banana. And while Nano Banana is state-of-the-art at making diagrams, generative AI systems are still really shit at making illustrative diagrams. They are very hard to decipher, filled with cluttered details, have arrows pointing the wrong direction, and are often missing key information.
So true! Not to mention the garbled text and inconsistent visuals across the diagrams, an insult to the reader's intelligence. How do people tolerate this visual embodiment of slurred speech?
Yeah I couldn’t figure out if they were just intended as illustrations and gave up trying to read them after a while.
Which is unfortunate, as it would have been really helpful to have actually legible architecture diagrams, given the prose was so difficult for me to untangle due to the manic "fun" irreverent style (and it's fine to write with a distinctive voice to make it more interesting, but still... confusing).
Plus the dozens of new unique names and connections introduced every few paragraphs to try to keep in my head…
I first asked Gemini 3 Pro to condense it to a boring technical overview and it produced a single page outline and Mermaid diagrams that were nearly as unintelligible as the original post so even AI has issues digesting it apparently…
The author's high-value flowcharts vs. Steve Yegge's AI art are enough of a case in point for how confusing his posts and repos are. However, this is a pervasive problem with AI coding tools. Unsurprisingly, the creators of these tools are also the most bullish about agentic coding, so the source code shows the consequences. Even Claude Code itself seems to experience an unusually high number of regressions or undocumented changes for such a widely used product. I had the same problem when recently trying to understand the details of spec-kit or sprites from their docs. Still, I agree that Gas Town is a very instructive example of what the future of AI coding will look like. I'm confident mature orchestration workflows will arrive in 2026.
> Yegge deserves praise for exercising agency and taking a swing at a system like this [...] then running a public tour of his shitty, quarter-built plane while it’s mid-flight
This quote sums it all up for me. It's a crazy project that moves the conversation forward, which is the main value I see in it.
It very well could be a logjam breaker for those who are fortunate enough to get out more than they put into it... but it's very much a gamble, and the odds are against you.
Did you catch the part where it crossed over into a crypto pump-and-dump scam, with Yegge's approval? And then the guy behind the "Ralph" vibe coding thing endorsed the same scam, despite being a former crypto critic who should absolutely know better?
Brought to you by the creators (abstractly) of vibe coding, ralph and yolo mode. Either a conspiracy to deconstruct our view of reality, or just a tendency to invent funny words for novelty
I believe agentic coding could eventually be a paradigm shift, if and only if the agents become self-conscious of design decisions and their implications on the system and its surrounding systems as a whole.
If that doesn’t happen, the entire workflow devolves into specifying system states and behavior in natural language, which is something humans are exceedingly bad at.
Coincidentally, that is why we have invented programming languages: to be able to express program state and behavior unambiguously.
I’m not bullish on a future where I have to write specifications on all explicit and implicit corner and edge cases just to have an agent make software design choices which don’t feel batshit insane to humans.
We already have software corporations which produce that kind of code simply because the people doing the specifying don’t know the system or the domain it operates in, and the people doing the implementing of those specifications don’t necessarily know any of that either.
Lots of comments about Gas Town (which I get, it's hard not to talk about it!), but I thought this was a pretty good article -- nice job of summing up various questions and suggesting ways to think about them. I like this bit in particular:
> A more conservative, easier to consider, debate is: how close should the code be in agentic software development tools? How easy should it be to access? How often do we expect developers to edit it by hand?
> Framing this debate as an either/or – either you look at code or don’t, either you edit code by hand or you exclusively direct agents, either you’re the anti-AI-purist or the agentic-maxxer – is unhelpful.
> The right distance isn’t about what kind of person you are or what you believe about AI capabilities in the current moment. How far away you step from the syntax shifts based on what you’re building, who you’re building with, and what happens when things go wrong.
If it's stupid but it works, it isn't stupid. Gas Town transcends stupid. It is an abstract garbage generator. Call it art, call it an experiment, but you cannot call it a solution to a problem by any definition of the word.
> In the same way any poorly designed object or system gets abandoned
Hah, tell that to Docker, or React (the ecosystem, not the library), or any of the other terrible technologies that have better thought-out alternatives, but we're stuck with them being the de facto standard because they were first.
Design indeed becomes the bottleneck. I think this points to a step that is implied but still worth naming explicitly: design isn't just planning upfront. It is a loop where you see output, check whether it is directionally right, and refine.
While the agents can generate, they can't exercise that judgement; they can't see nuances, and they can't really walk their actions back in a "that's not quite what I meant" sense.
Exercising judgement is where design actually happens. It is iterative, in response to something concrete. The bottleneck isn't just thinking ahead; it's the judgement call when you see the result, the walking back, as well as the thinking forward.
I ran a similar operation over the summer where I treated vibecoding like a war. I was the general. I had recon (planning) and frontmen/infantry making the changes. Bugs and poor design were the enemy. Planning docs were OPORDs; we had sitreps and after-action reports: a complete e2e workflow. Even had hooks for sounds and sprites. It was fun for a bit, but I regressed to conceptually simpler and more boring workflows.
Anyways, we'll likely always settle on simpler/boring, but the game analogies are fun in the meantime. A lot of opportunity to enhance UX around design, planning, and review.
Yegge is just running arbitrage on an information gap.
It's the same chasm that all the AI vendors are exploiting: the gap between people who have some idea what is going on and the vast mass of people who don't but are addicted to excitement or fear of the future.
Yegge is being fake-playful about it but if you have read any of his other writing, this tracks. None of it is to be taken very seriously because he values provocation and mischief a little too highly, but bits of it have some ideas worth thinking about.
Has anyone contrasted Gas Town with Stanford's DSPy (https://dspy.ai/)? They seem related, but I have trouble understanding exactly what Gas Town is, so I can't do the comparison myself.
I have not tried Gas Town yet, but Steve's beads https://github.com/steveyegge/beads (used by Gas Town) has been a game-changer, on the order of what claude code was when it arrived.
I've been researching the usage of developer tooling at my own and other organizations for years now, and I'm genuinely trying to understand where agentic coding fits into the evolving landscape.
One of the most solid things I'm beginning to understand is that many people don't understand how these tools influence technical debt.
Debt doesn't come due immediately; it's accrued, and may allow for the purchase of things that were once too expensive, but eventually the bill comes due.
I've started referring to vibe-coding as "credit cards" for developers, allowing them to accrue massive amounts of technical debt that was previously out of reach. This can provide some competent developers with incredible improvements to their work. But for the people who accrue more technical debt than they have the ability to pay off, it can sink their project and cost our organization a lot in lost investment of both time and money.
I see Gas Town and tools like it as debt schemes where someone applies for more credit cards to make the payments on prior cards they've maxed out, compounding the issue with the vague goal of "eventually it pays off."
So color me skeptical.
Not sure if this analogy holds up in all cases, but it's been helping my organization navigate the application of agents, since it allows us to allocate spend depending on the seniority of each developer. Thus I've been feeling like an underwriter, having to figure out whether a developer requesting more credits or budget for agentic coding can be trusted to pay off the debt they will accrue.
Gas Town has a very clear "mad scientist/performance art" sort of thing going on, and I love that. It's taking a premise way past its logical conclusion, and I think that's fun to watch.
I haven't seen anything to suggest that Yegge is proposing it as a serious tool for serious work, so why all the hate?
> I also think Yegge deserves praise for exercising agency and taking a swing at a system like this, despite the inefficiencies and chaos of this iteration. And then running a public tour of his shitty, quarter-built plane while it’s mid-flight.
Can we please stop with the backhanded compliments and judgement? This is cutting edge technology in a brand new field of computing using experimental methods. Please give the guy a break. At least he's trying to advance the state of the art, unlike all the people that copy everyone else.
> Please give the guy a break. At least he's trying to advance the state of the art.
The problem is that as an outsider it really looks like someone is trying to herd a bunch of monkeys into writing Shakespeare, or trying to advance impressionist art by pretending a baby's first crayon scratches are equivalent to a Pollock.
I bet he's having a lot of fun playing around with "cutting-edge technology", but it's missing any kind of scientific rigor or analysis, so the results are going to be completely useless to anyone wanting to genuinely advance the use of LLMs for programming.
I agree that he probably has a lot of fun. What he's doing is an equivalent of throwing a hand grenade into a crowd and enjoying the chaos of it all - he's set in life, can comfortably retire while the rest of the industry tries to deal with that hand grenade. Where some people are fighting to get the safety pin out while others are trying to stop them.
> I'm getting burnt out trying to reconcile the absurd expectations of investors and executives with the real-world engineering concerns of my day-to-day job.
It's a half-joke. No need to take it that seriously or that jokingly. It's mostly only grifters and cryptocurrency scammers claiming it's amazing.
I think ideas from it will probably partially inspire future, simpler systems.
> Sometimes there isn’t a “final” answer, and sometimes there is no “right” answer. Sometimes two conflicting ideas can be “true” and “correct” simultaneously.
Or -- and hear me out -- unserious people are saying nonsense things for attention and pointing this out is the appropriate response.
Keep in mind that Steve has LLMs write his posts on that blog. Things said there may not reflect his actual thoughts on the subject(s) at hand.
There's a rather fine line between "don't believe everything you read" and "don't believe anything you read". At least in this case.
Is this confirmed true? Yegge has a very very long history of writing absurdly long posts / rants.
Back in the day they used to be coherent.
> If you read Steve's writeup
Personally I got about 3 paragraphs into what seemed like a twelve-page fevered dream and filed it under "not for me yet".
> And the reaction is a bunch of Very Serious Engineers who cross their arms and harumph at it for being Unprofessional and Not Serious and Not Ready For Production.
Exactly!
They’re part of Steve’s art project, they just don’t realise it.
> OK! That was like half a dozen great reasons not to use Gas Town. If I haven’t got rid of you yet, then I guess you’re one of the crazy ones. Hang on. This will be a long and complex ride. I’ve tried to go super top-down and simplify as much as I can, but it’s a bit of a textbook.
For better or worse, we are making history.
A sense of art and whimsy and experimentation is less compelling when it's jumping on the hypest of hype-trains. I'd love to see more folk art in programming, but Gas Town is closer to fucking Beeple than anything charming.
I like gastown's moxie, it's fun, and seems kind of tongue in cheek.
What I don't like is people me-tooing gastown as some breakthrough in orchestration. I also don't like how people are doing the same thing for ralph.
In truth, what I hate is people dogpiling thoughtlessly on things, and only caring about what social media has told them to care about. This tendency makes me get warm tingles at the thought of the end of the world. Agent Smith was right about humanity.
It’s not the whimsy. It’s that the whimsy is laced with casual disdain, a touch too much “let me buy you a stick of gum and show you how to chew it”, a frustrated tenor never stated but dog whistled “you dumb fucks”. A soft sharp stink of someone very smart shoving that fact in your face as they evangelise “the obvious truth” you’re too stupid to see.
And maybe he’s even right. But the reaction is to the flavour of chip on the shoulder delivery mixed into an otherwise fun piece.
Perhaps it was his followup post about how people are lining up to throw millions of VC dollars at his bizarre whimsical fever dream that disturbs people? I’m all for arts funding, but…
It's because people are treating the experiment like a serious path forward for their business.
"our industry has lost its sense of whimsy"
The first thing I thought as I read his post and saw the images of the weasels was that he should make a game of it. Maybe name it Bitborn.
It isn't though. It crossed the chasm when Steve (who I would like to think is somewhat comfortable after writing a book, holding a director level position at several startups) decided to endorse an outright crypto pump and dump.
When he decided to monetize the eyeballs on the project instead of anything related to the engineering. Which, of course, Steve isn't smart enough to understand (in his own words), and which he recommends you not buy, but he still makes a tidy profit from it.
It's a memecoin now... that has a software project attached to it. Anything related to engineering died the day he failed to disavow the crypto BS and instead started shilling it.
What happened to engineers calling out BS as BS?
> I don't get the widespread hatred of Gas Town.
Fear over what it means if it works.
I work in a typical web app company which does accounting/banking etc.
A couple of days ago I was sitting in a meeting of 10-15 devs, discussing our AI agents. People were raising issues and brainstorming ways around the problems with AI. How to make the AI better.
Our devs were occupied doing AI things, not accounting/banking things.
If the time savings were as promised, we should have been 3 devs (with the remaining devs replaced by 7-10 AI agents) discussing accounting/banking.
If Gas Town succeeds, it will just be the next toy we play with instead of doing our jobs.
>I often feel like our industry has lost its sense of whimsy and experimentation from the early days, when people tried weird things to see what would work and what wouldn't.
Remember the days when people experimented with and talked about things that weren't LLMs?
I used to go to a lot of industry events and I really enjoyed hearing about the diversity of different things people worked on both as a hobby and at work.
Now it's all LLMs all the time and it's so goddamn tedious.
> I used to go to a lot of industry events and I really enjoyed hearing about the diversity of different things people worked on both as a hobby and at work.
I go to tech meetups regularly. The speed at which any conversation ends up on the topic of AI is extremely grating to me. No more discussions about interesting problems and creative solutions that people come up with. It's all just AI, agentic, vibe code.
At what point are we going to see the loss of practical skills if people keep on relying on LLMs for all their thinking?
> No more discussions about interesting problems and creative solutions that people come up with. It's all just AI, agentic, vibe code.
And then you give in and ask what they're building with AI, that activation energy finally available to build the side project they wouldn't have built otherwise.
"Oh, I'm building a custom agentic harness!"
...
It's like the entire software industry is gambling on "LLMs will get better faster than human skills will decay, so they will be good enough to clean up their own slop before things really fall apart".
I can't even say that's definitely a losing bet-- it could very well happen-- but boy does it seem risky to go all-in on it.
Some of the heads like Altman seem to be putting all their chips in the "AGI in [single-digit number] years" pile.
Yeah, it's unbelievably tiresome, endless complaints from people pushing up their glasses: IT'S A PROJECT ABOUT POLECATS CALLED GAS TOWN, MADE FOR FUN. Read that again. Either admire it and enjoy it, or quit the umpteenth complaint about vibecoding.
Yeah, where he probably burns like a million dollars.
Just for fun!
He's paying $600 a month for 3x Claude Max subs. It's in his article.
>Yegge is leaning into the true definition of vibecoding with this project: “It is 100% vibecoded. I’ve never seen the code, and I never care to.”
I don't get it. Even with a very good understanding of the type of work I'm doing, prebuilt knowledge of the code, and a very well specced problem, Claude Code etc. just plain fail or produce sloppy code. How do these industry figures claim they see no part of 225K+ lines of code and promise that it works?
It feels like we're getting into an era where oceans of code that nobody understands are going to be produced, which we hope AGI swoops in and cleans up?
Where is the "super upvote button" when you need it?
YES! I have been playing with vibe coding tools since they came out. "Playing" because only on rare occasions have I created something that is good enough to commit/keep/use. I keep playing with them because, well I have a subscription, but also so I don't fall into the fuddy-duddy camp of "all AI is bad" and can legitimately speak on the value, or lack thereof, of these tools.
Claude Code is super cool, no doubt, and with _highly targeted_ and _well planned_ tasks it can produce valuable output. Period. But every attempt at full-vibe-coding I've done has gotten hung up at some point and I have to step in and manually fix things. My experience is often:
1. First Prompt: Oh wow, this is amazing, this is the future
2. Second Prompt: Ok, let me just add/tweak a few things
10. 10th prompt: Ugh, everytime I fix one thing, something else breaks
I'm not sure at all what I'm doing "wrong". Flogging the agents along doesn't work well for me, or maybe I'm just having trouble letting go of control and I'm not flogging enough?
But the bottom line is I am generally shocked that something like Gas Town was able to be vibe-coded. Maybe it's a case of the LLM overstating what it's accomplished (typical) and if you look under the hood it's doing 1% of what it says it is but I really don't know. Clearly it's doing something, but then I sit over here trying to build a simple agent with some MCPs hooked up to it using a LLM agent framework and it's falling over after a few iterations.
This is also my experience. Everything I’ve ever tried to vibe code has ended up with off-by-one errors, logic errors, repeated instances of incorrect assumptions etc. Sometimes they appear to work at first, but, still, they have errors like this in them that are often immediately obvious on code review and would definitely show up in anything more than very light real world use.
They _can_ usually be manually tidied and fixed, with varying amounts of effort (small project = easy fixes, on a par with regular code review, large project = “this would’ve been easier to write myself...”)
I guess Gas Town’s multiple layers of supervisory entities are meant to replace this manual tidying and fixing, but, well, really?
I don’t understand how people are supposedly having so much success with things like this. Am I just holding it wrong?
If they are having real success, why are there no open source projects that are AI developed and maintained that are _not_ just systems for managing AI? (Or are there and I just haven’t seen them?...)
Yeah, it sounds like "you're holding it wrong"
Like, why are you manually tidying and fixing things? The first pass is never perfect. Maybe the functionality is there but the code is spaghetti or untestable. Have another agent review and feed that review back into the original agent that built out the code. Keep iterating like that.
My usual workflow:
Agent 1 - Build feature
Agent 2 - Review these parts of the code; see if you find any code smells, bad architecture, scalability problems that will pop up, untestable code, or anything else falling outside of modern coding best practices
Agent 1 - Here's the code review for your changes, please fix
Agent 2 - Do another review
Agent 1 - Here's the code review for your changes, please fix
Repeat until testable, maybe throw in a full codebase review instead of just the feature.
Agent 1 - Code looks good, start writing unit tests, go step by step, let's walk through everything, etc. etc. etc.
Then update your .md directive files to tell the agents how to test.
Voila, you have an llm agent loop that will write decent code and get features out the door.
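For concreteness, here's roughly that loop mechanized; a minimal sketch assuming Claude Code's headless mode (`claude -p`), with prompts, round count, and stop condition that are my own illustrative choices, not anything Gas Town or Anthropic prescribes:

```python
# Minimal sketch of the builder/reviewer loop described above, driving
# Claude Code's headless mode (`claude -p "<prompt>"` runs one
# non-interactive turn against the current working tree and prints the
# result). Prompts, round count, and stop condition are illustrative.
import subprocess

def run_agent(prompt: str) -> str:
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def build_review_loop(feature_spec: str, max_rounds: int = 3) -> str:
    # Agent 1: build the feature (edits files in the working tree).
    output = run_agent(f"Implement this feature:\n{feature_spec}")
    for _ in range(max_rounds):
        # Agent 2: review with a narrow, critical brief.
        review = run_agent(
            "Review the uncommitted changes for code smells, bad "
            "architecture, scalability problems, and untestable code. "
            "List concrete issues, or reply NO ISSUES."
        )
        if "NO ISSUES" in review:
            break
        # Feed the review back to Agent 1.
        output = run_agent(
            f"Here is a code review of your changes; please fix:\n{review}"
        )
    return output
```

In practice you'd also resume sessions (Claude Code has flags for continuing a conversation) so the builder keeps its context between rounds instead of starting cold each turn.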
I'm not trying to be rude here at all but are you manually verifying any of that? When I've had LLMs write unit tests they are quick to write pointless unit tests that seem impressive "2123/2123 tests passed!" but in reality it's testing mostly nothing of value. And that's when they aren't bypassing commit checks or just commenting out tests or saying "I fixed it all" while multiple tests are broken.
Maybe I need a stricter harness but I feel like I did try that and still didn't get good results.
I worry about people who use this approach where they never look at the code. Vibe-coding IS possible, but you have to spend a lot of time in plan mode and be very clear about architecture and the abstractions you want it to use.
I've written two separate moderately-sized codebases using agentic techniques (oftentimes being very lazy and just blanket approving changes), and I don't encounter logic or off-by-one errors very often, if at all. It seems quite good at the basic task of writing working code, but it sucks at architecture, and you need occasional code review rounds to keep the codebase tidy and readable. My code reviews with the AI are like 50% DRY and separating concerns.
In a recent Yegge interview, he mentions that he often throws away the entire codebase and starts from scratch rather than try to get LLMs to refactor their code for architecture.
This has been my best way to learn: put one agent on a big task, let it learn things about the problem and any gotchas, and have it take notes; do it again until I'm happy with the result. If in the middle I think there are two choices that have merit, I ask a subagent to go explore that solution in another worktree and make all its own decisions, then I compare. I also personally learn a lot about the problem space during the process, so my prompts and choices on subsequent iterations use the right language.
Honestly, in my experience so far, if an LLM starts going down a bad path, it’s better just to roll back to a point where things were OK and throw away whatever it was doing, rather than trying to course correct.
The secret is that it doesn't work. None of these people have built real software that anyone outside their bubble uses. They are not replacing anyone, they are just off in their own corner building sand castles.
Just because they're one-off tools that only one person uses doesn't mean it's not "real software". I'm actually pretty excited about the fact that it's now feasible for me to replace all my BloatedShittyCommercialApps that I only use 5% of with vibe-coded bespoke tools that only do the important 5%, just for me to use. If that makes it a "sand castle" to you, fine, but this is real software and I'm seeing real benefit here.
> The secret is that it doesn't work.
I have 100% vibecoded software that I now use instead of a commercial implementation that cost me almost 200 USD a month (a tool for radiology dictation and report generation).
Wait, so you're a radiologist and you're using software you vibecoded to generate radiology reports for real patients? Is that, like, allowed?
Not saying it's right, but boy do I have stories about the code used in <insert any medical profession> healthcare applications. Not sure how "vibecoded" programming lines of code is any worse.
Depends where in the world they are. Here in Hungary, it’s not uncommon to email your-family-doctor@gmail.com
What does that have to do with vibe-coding?
My partner is a radiologist and I'd love to hear more about what you built. The engineer in me is also curious how much this cost in credits?
It CAN be cheap.
I built a clinical pharmacist "pocket calculator" kinda app for a specific function. It was like $.60 in claude credits I think. Built with flutter + dart. It's a simple tool suite and I've only built out one of the tools so far.
Now to be fair, that $.60 session was just the coding. I did some brainstorming in chatgpt and generated good markdown files (claude.md, gemini.md, agents.md) before I started.
And yet I notice you haven't mentioned publishing it and undercutting the market. You could make a lot of money out-competing the existing option if what you produced was production-grade software. I'm guessing the actual case is that you only needed a small subset of the functionality of the paid software, and the LLM stitched together a rough unpolished proof-of-concept that handled your exact specific use case. Which is still great for you! But it's not the future of coding. The world still needs real engineers to make real software that is suitable for the needs of many, and this doesn't replace that.
>The world still needs real engineers to make real software that is suitable for the needs of many, and this doesn't replace that.
I think azan_ is demonstrating that shipping products 'suitable for the needs of many' is going to have to compete with 'slopping software for the needs of one'.
The only people who think that are programmers already or programmer-adjacent. Your mother is never going to be able to use a Gas Town-like workflow to make software for her own needs, nor is she even going to want to spend her weekends trying. These tools still require a baseline minimum of technical knowledge, and a real time investment, and also a real money investment the way some people are using them. Moreover, most real software has interoperability needs. A world where everyone makes their own Twitter or WhatsApp is a world where nobody can talk to anyone else.
There is a small subset of the population who is now enabled to make proof-of-concepts with less effort than before. This is no way diminishes the need for delivering performant, secure, interoperable software at scale to serve humanity's needs.
> Your mother is never going to be able to use a Gas Town-like workflow to make software for her own needs, nor is she even going to want to spend her weekends trying.
I'm going on a tangent here but what's with this constant deprecation of mothers to make a point? There are many people here whose mothers can develop software.
People's mothers are statistically unlikely to be programmers, obviously. My own grandmother was a programmer, but it conveys the idea in two words rather than making up a clunky phrase to describe the exact degree of non-techiness of the hypothetical person.
What if we packaged Gas Town up in an operating system userspace, put it on rails, and gave people an interface to it?
An interface isn't enough. Even if you never look at the code, the results are going to be influenced significantly by having the vocabulary to accurately describe what you want. The less sufficient your technical vocabulary, the more ambiguous your prompts will be and the less likely it is that the Polecats will be able to deliver anything resembling your unspoken imagination. To say nothing of being able to guide the lost critters when they run into problems.
It sounds like a medical device, in which case marketing it may require FDA approval or notification. Whereas vibe-coding a one-off tool for yourself might still require validation but you're the one taking the risk and accepting liability for it.
I think the thing you're missing is that the tool doesn't need to be marketed because someone else could ask their LLM to make them a similar tool but fitting their use case.
If they're using a 100% vibe-coded tool that they've never read the code of to replace something that would require government approval, for use on real-world patients, they're probably committing medical malpractice as we speak. Let us pray that is not the case.
It doesn't matter if the tool "needs" to be marketed. There is a market of paying customers. If other people are paying $200/month, both your and their lives would be improved significantly by you offering a $100/month replacement software. For all the talk about LLMs replacing the need for packaged software, people are still paying for packaged software, and while they are, you could be making large amounts of money while saving them money. If you're altruistic, you could even release it as FOSS and save a lot of people $200/mo. Unless, of course, your vibe-coded app isn't actually remotely capable of replacing the software in question.
Not everything has to be monetized, buddy. It's okay to relax.
> If you're altruistic, you could even release it as FOSS and save a lot of people $200/mo. Unless, of course, your vibe-coded app isn't actually remotely capable of replacing the software in question.
Vibe-coded radiology reports, finally the 21st century will get its own Therac-25 incident.
How much does renting vibecoding tools cost you?
Such tools usually cost $10-20/mo?
No, that's not true. I rarely write a SINGLE line of code now, either at work or at home. Even for simple config switches, I ask codex/gemini to do it.
You always have to review the overall diff, though, and go back to the agent with broader corrections.
> You always have to review the overall diff, though, and go back to the agent with broader corrections.
This thread is about vibe coding _without_ looking at the code.
It is fine to have criticisms of this, I have many, but saying that Yegge hasn't built real software is just not true.
Yegge obviously built real software in the past. He has not built real software wherein he never looked at the code, as he is now promoting.
Ok, but this entire idea is very new. It's not an honest criticism to say no one has tried the new idea when they are actively doing it.
Honestly I don't get the hostility. Yegge is running an experiment. I don't think it will work, but it will be interesting and informative to watch.
The experiment is fine if you treat it as an experiment. The problem is the state of the industry where it's treated as serious rather than silly — possibly even by Steve himself.
The 'experiment' isn't the issue. The problem is the entire culture around it. LLM tools are being shoved into everything, LLMs are soaking up trillions in investment, engineers are being told over and over that everything has changed and this garbage is making us obsolete, software quality is decreasing where wide LLM usage is being mandated (eg. Microsoft). Gas Town does not give the vibe of a neutral experiment but rather looks be a full-on delve into AI psychosis with the way Yegge describes it.
To be clear, I think LLMs are useful technology. But the degree of increasing insanity surrounding it is putting people off for obvious reasons.
I share the frustration with the hype machine. I just don't think a guy with a blog is an appropriate target for our frustration with corporate hype culture.
> Ok, but this entire idea is very new. It's not an honest criticism to say no one has tried the new idea when they are actively doing it.
Not really new. Back in the day, companies used to outsource their stuff to the lowest-bidder agencies in proverbial Elbonia, never looked at the code, and then panickedly hired another agency when the result visibly was not what was ordered. Case studies abound on TheDailyWTF going back two decades.
Doing the same with agents will give you the same disastrous results for about the same money, just faster. Oh, and you can't sue them, really.
Maybe it's better, who knows.
Fair point on the Elbonia comparison. But we can't sue the SQLite maintainers either, and yet we trust them with basically everything. The reason is that open source developed its own trust mechanisms over decades. We don't have anything close to that with LLMs today. What those mechanisms might look like is an open question that is getting more important as AI generated code becomes more common.
> saying that Yegge hasn't built real software is just not true
I mean... I feel like it's somewhat telling that his wikipedia page spends half its words on his abrasive communication style, and the only thing approximating a product mentioned is a (lost) Rails-on-Javascript port, and 25 years spent developing a MUD on the side.
Certainly one doesn't get to stay a staff-level engineer at Google without writing code - but in terms of real, shipping software, Yegge's resume is a bit light for his tenure in BigTech
> How do these industry figures claim they see no part of 225K+ lines of code and promise that it works?
The only promise is that you will get your face ripped off.
“WARNING DANGER CAUTION - GET THE F** OUT - YOU WILL DIE […] Gas Town is an industrialized coding factory manned by superintelligent robot chimps, and when they feel like it, they can wreck your shit in an instant. They will wreck the other chimps, the workstations, the customers. They’ll rip your face off if you aren’t already an experienced chimp-wrangler.”
Yeah, I'm at that stage 6 or 7. I'm using multiple agents across multiple terminal windows. I'm not even coding any more, literally I haven't written code in like 2-4 months now beyond changing a config value or something.
But I still haven't actually used Gastown. It looks cool. I think it probably works, at least somewhat. I get it. But it's just not what I need right now. It's bleeding edge and experimental.
The main thing holding me back from even tinkering with it is the cost. Otherwise I'd probably play with it a little, but it's not something I'd expect to use and ship production code right now. And I ship a ton of production code with claude.
There is an incentive for dishonesty about what AI can and cannot do.
People from OpenAI were saying that GPT-2 had achieved AGI. There is a very clear incentive for that statement to be made by people who are not using AI for anything productive.
Even as increasingly bombastic claims are made, it is obvious to any actual user that the best AI cannot one-shot everything. And the worst ones: I was using Gemini yesterday and it wouldn't stop outputting emojis; I was using Grok and it refused to give me a code snippet because it claimed its system prompt forbade it. What can you say?
I don't understand why anyone would want to work on a codebase they didn't understand either. What happens when something goes wrong?
Again though, there is massive financial incentive to make these claims, and some other people will fall along with that because it is good for their career, etc. I have seen this in my own company where senior people are shoehorning this stuff in that they clearly do not actually use or understand (to be clear, this is engineering not management...these are people who definitely should understand but do not).
Great tool, but the 100% vibecoding without looking at the code, for something that you are actually expecting others to use, is a bad idea. Feels more like performance art than actual work. I like jokes, I like coding, room for both but don't confuse the two.
I don't get you guys that are getting such bad results.
Are you guys just trying to one shot stuff? Are you not using agents to iterate on things? Are you not putting agents against each other (have one code, one critique/test the code, and put them in a loop)?
I still look at the code that's produced, I'm not THAT far down the "vibe coding" path that I'm trusting everything being produced, but I get phenomenal results and I don't actually write any code any more.
So like, yeah, first pass the llm will create my feature and there's definitely some poorly written code or duplicate code or other code smells, but then I tell another agent to review and find all these problems. Then that review gets fed back in to the agent that created the feature. Wham, bam, clean code.
I'm not using gastown or ralph wiggum ($$$) but reading the docs, looking over how things work, I can see how it all comes together and should work. They've been built out to automatically do the review + iteration loop that I do.
My feeling has been that 'serious' software engineers aren't particularly suited to use these tools. Most don't have an interest in managing people and are instead attracted to the deterministic nature of computing. There's a whole psychology you have to learn when managing people, and a lot of those skills transfer to wrangling AI agents, in my experience.
You can't be too prescriptive or verbose when interacting with them, you have to interact with them a bit to start understanding how they think and go from there to determine what information or context to provide. Same for understanding their programming styles, they will typically do what they're told but sometimes they go on a tangent.
You need to know how to communicate your expectations. Especially around testing and interaction with existing systems, performance standards, technology, the list goes on.
All our best performing devs/engineers are using the tools the most.
I think this is something a lot of people are telling themselves though, sure.
I have some success but by the time I'm done I'm often not sure if I saved any time.
My (former) coworker who’s heavy into this stuff produced a lot of unmaintainable slop on his way out while singing the agents' praises to higher-ups. He also felt he was getting a lot of value and had no issues.
It lets 0.05X developers be 0.2X developers and 1X developers be 0.9-1.1X developers.
The problem is some 0.05X developers thought they were 0.5X and now they think they're 2X.
Nah, our best devs/engineers use the tools the most.
In my real life experience it's been the middling devs that always talk about "ai slop" and how the tools can't do their jobs.
I'm sympathetic to this view, but I also wonder if this is the same thing that assembly language programmers said about compilers. What do you mean that you never look at the machine code? What if the compiler does something inefficient?
Not even remotely close.
Compilers are deterministic. People who write them test that they will produce correct results. You can expect the same code to compile to the same assembly.
With LLMs two people giving the exact same prompts can get wildly different results. That is not a tool you can use to blindly ship production code. Imagine if your compiler randomly threw in a syscall to delete your hard drive, or decide to pass credentials in plain text. LLMs can and will do those things.
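To make that concrete: a toy sketch, assuming the `anthropic` Python SDK (the model name is a placeholder), of why "same prompt in, same code out" doesn't hold:

```python
# Toy demo of LLM nondeterminism. Assumes the `anthropic` SDK and an
# ANTHROPIC_API_KEY in the environment; the model id is a placeholder.
import anthropic

client = anthropic.Anthropic()

def generate(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-5",   # placeholder model id
        max_tokens=512,
        temperature=1.0,             # sampling: same input, varying output
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

prompt = "Write a function that left-pads a string to a given width."
print(generate(prompt) == generate(prompt))  # almost always False;
                                             # a compiler would print True
```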
Even ignoring determinism, with traditional source code you have a durable, human-readable blueprint of what the software is meant to do that other humans can understand and tweak. There's no analogy in the case of "don't read the code" LLM usage. No artifacts exist that humans can read or verify to understand what the software is supposed to be doing.
yeah there is. it's called "documentation" and "requirements". And it's not like you can't go read the code if you want to understand how it works, it's just not necessary to do so while in the process of getting to working software. I truly do not understand why so many people are hung up on this "I need to understand every single line of code in my program" bs I keep reading here, do you also disassemble every library you use and understand it? no, you just use it because it's faster that way.
Not only that, but compiler optimizations are generally based on rigorous mathematical proofs, so even without testing them you can be pretty sure the compiler will generate equivalent assembly. From the little I know of LLMs, no one has figured out what mathematical principles LLMs generate code from, so you can't be sure it's going to be right other than by testing it.
I write JS, and I have never directly observed the IRs or assembly code that my code becomes. Yet I certainly assume that the compiler author has looked at the compiled output in the process of writing a compiler!
For me the difference is prognosis. Gas Town has no ratchet of quality: its fate was written on the wall since the day Steve decided he didn't want to know what the code says: it will grow to a moderate but unimpressive size before it collapses under its own weight. Even if someone tried to prop it up with stable infra, Steve would surely vibe the stable infra out of existence since he does not care about that
or he will find a way to get the AI to create harnesses so it becomes stable. The lack of imagination and willingness to experiment in the HN crowd is AMAZING me and worrying me at the same time. Never thought a group of engineers would be the most conservative and close minded people I could discuss with.
No, it is not what assembly programmers said about compilers, because you can still look at the compiled assembly, and if the compiler makes a mistake, you can observe it and work around it with inline assembly or, if the source is available, improve the compiler. That is not the same as saying "never look at the code".
The compiler is deterministic and the translation does not lose semantics. The meaning of your code is an exact reflection of what is produced.
We can tell you weren't around for the advent of compilers. To be fair, neither was I, since the UNIX C compiler came out in the early '70s and was far from the first compiler. You can make that claim about modern compilers, but early ones weren't so reliable.
All compilers have bugs. Any loss of semantics during compilation would be considered a bug. In order to do that, the source and target language need to be structured and specified. I wasn't around in the 60s either, but I think that hasn't changed.
Which early compilers were nondeterministic?
I feel like this argument would make a lot more sense if LLMs had anywhere near the same level of determinism as a compiler.
>but I also wonder if this is the same thing that assembly language programmers said about compilers
But as a programmer writing C code, you're still building out the software by hand. You're having to read and write a slightly higher level encoding of the software.
With vibe coding, you don't even deal with encodings. You just prompt and move on.
I wonder if assembly programmers felt this way about the reliability of the electical components which their code relies upon...
I wonder if electrical engineers felt this way about the reliability of the silicon crystal lattice their circuits rely upon…
The big difference is that compilation is deterministic: compile the same program twice and it'll generate the same output twice. It also doesn't involve any "creativity": a compiler is mostly translating a high-level concept into its predefined lower-level components. I don't know exactly what my code compiles to, but I can be pretty certain what the general idea of the assembly is going to be.
With LLMs all bets are off. Is your code going to import leftpad, call leftpad-as-a-service, write its own leftpad implementation, decide that padding isn't needed after all, use a close-enough rightpad instead? Who knows! It's just rolling dice, so have fun finding out!
> The big difference is that compilation is deterministic: compile the same program twice and it'll generate the same output twice.
That's barely true now. Nix comes close, but builds are only bit-for-bit identical if you set a bunch of extra flags that aren't set by default. The most obvious instability is CPU dispatch order (aka modern single computer systems are themselves distributed, racy systems) changes the generated code ever so slightly.
We don't actually care, because if one compiled version of the code uses r8 for a variable but a different compilation uses r9 for that variable, it doesn't matter because we just assume the resulting binary works the same either way. R8 vs r9 are implementation details that don't matter to humans. See where I'm going with this? If the LLM non-deterministically calls the variable fileName one day, and file_name the next time it's given the same prompt, yeah language syntax purists are going to suffer an aneurysm because one of those is clearly "wrong" for the language in use, but it's really more of an implementation detail at this point. Obviously you can't mix them, the generated code has to be consistent in which one it's using, but if compilers get to chose r8 one day and r9 the next, and we're fine with it, why is having the exact variable name that important, as long as it's being used correctly?
I’ve done builds for aerospace products where the only binary difference between two builds of the same source code is the embedded timestamp. And per FAA review guidelines, this deterministic attribute is required, or else something is wrong in the source code or build process.
I certainly don’t use all compilers everywhere, but I don’t think determinism in compilation is especially rare.
No one is promising anything. It's just a giant experiment and the author explicitly tells you not to use it. I appreciate those who try new things, even if it's possibly akin to throwing s** at a wall and seeing what sticks.
Maybe it changes how we code or maybe it doesn't. Vibe coding has definitely helped me write throwaway tools that were useful.
After listening to Yegge's interview, I'm not sure this is accurate: https://www.youtube.com/watch?v=zuJyJP517Uw
For example, he makes a comment to the effect that anyone using an IDE to look at code in 2026 is a "bad engineer."
> It's just a giant experiment and the author explicitly tells you not to use it.
No, he threw up a hyperbolic warning and then dove deep into how this is the future of all coding in the rest of his talks/writing.
It’s as good a warning as someone saying “I’m not {X} but {something blatantly showing I am X}”
Reminds me of Matt Levine on https://www.lesswrong.com/posts/WACraar4p3o6oF2wD/sam-altman...
Who's promising it works?
It's an experiment to discover what the limits are. Maybe the experiment fails because it's scoped beyond the limits of LLMs. Maybe we learn something by how far it gets exactly. Maybe it changes as LLMs get better, or maybe it's a flawed approach to pushing the limits of these.
It's unintuitive, but having an LLM verification loop like a code reviewer works impeccably well; you can even create dedicated agents to check for specific problem areas like poor error handling.
This isn't about anthropomorphism; it's context engineering. By breaking things into more agents, you get more focused context windows.
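As a sketch of what that can look like (again assuming headless `claude -p` invocations; the briefs and area names are invented for illustration):

```python
# Sketch of "dedicated reviewer" agents: each one gets a single narrow
# brief, so its context window holds one concern plus the diff and
# nothing else. Briefs and area names are invented for illustration.
import subprocess

REVIEW_BRIEFS = {
    "error-handling": "Audit this diff for swallowed exceptions, missing "
                      "error paths, and failures that are never logged.",
    "security": "Audit this diff for injection risks, hardcoded secrets, "
                "and unsafe deserialization.",
}

def focused_reviews(diff: str) -> dict[str, str]:
    findings = {}
    for area, brief in REVIEW_BRIEFS.items():
        result = subprocess.run(
            ["claude", "-p", f"{brief}\n\n{diff}"],
            capture_output=True, text=True, check=True,
        )
        findings[area] = result.stdout
    return findings
```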
I believe gas town has some review process built in, but my comment is more to address the idea that it's all slop.
As an aside, Opus 4.5 is the first model I've used that, most of the time, doesn't produce much slop, in case you haven't tried it. It still produces some slop, but not much human input is required for building things (it's mostly higher level and architectural things they need guidance on).
> it's mostly higher level and architectural things they need guidance on
Any examples you can share?
Mostly, it's not the model that is lacking but the visibility it has. Often the top level business context for a problem is out of reach, spread across slack, email, internal knowledge and meetings.
Once I digest some of this and give it to Claude, it's mostly smooth sailing, but then the context window becomes the problem. Compactions during implementation remove a lot of important info. There should really be a Claude monitoring top-level context and passing work to agents. I'm currently figuring out how to orchestrate that nicely with Claude Code MD files.
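For what it's worth, the shape I'm converging on looks something like the following; the section layout is purely my own guess at a sensible structure, not an official Claude Code schema:

```markdown
# CLAUDE.md — top-level context (hypothetical layout)

## Business context
One-paragraph distillation of the problem, pulled from Slack, email,
and meeting notes, so it survives compaction.

## Architecture decisions
- Prefer simplicity over premature scaling.
- Note security trade-offs explicitly, with the reasoning.

## Delegation
For each feature, hand a subagent only the sections above that it
needs, so its context window stays focused.
```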
With respect to architecture, it generally makes sound decisions but I want to tweak it, often trading off simplicity vs. security and scale. These decisions seem very subtle and likely include some personal preferences I haven't written anywhere.
Do you understand at a molecular level how cooking works? Or do you just do some rote actions according to instructions? How do you know if your cooking worked properly without understanding chemistry? Without looking at its components under a microscope?
Simple: you follow the directions, eat the food, and if it tastes good, it worked.
If cooks don't understand physics, chemistry, biology, etc, how do all the cooks in the world ensure they don't get people sick? They follow a set of practices and guidelines developed to ensure the food comes out okay. At scale, businesses develop even more practices (pasteurization, sanitization, refrigeration, etc) to ensure more food safety. None of the people involved understand it at a base level. There are no scientists directly involved in building the machines or day-to-day operations. Yet the entire world's food supply works just fine.
It's all just abstractions. You don't need to see the code for the code to work.
That's a terrible analogy lol.
1. Chefs do learn the chemistry, at least enough to know why their techniques work.
2. Food scientist is a real job
3. The supply chain absolutely does have scientists involved in day to day operations lol.
A better analogy is just shoving the entire contents of the fridge into a pot, plastic containers and all, and assuming it'll be fine.
OP defines herself as a mediocre engineer. She's trying to sell you Slop Town, not engineering principles.
Originally I thought that Gas Town was some form of high level satire like GOODY-2 but it seems that some of you people have actually lost the plot.
Ralph loops are also stupid because they don't make use of kv cache properly.
---
https://github.com/steveyegge/gastown/issues/503
Problem:
Every gt command runs bd version to verify the minimum beads version requirement. Under high concurrency (17+ agent sessions), this check times out and blocks gt commands from running.
Impact:
With 17+ concurrent sessions each running gt commands:
- Each gt command spawns bd version
- Each bd version spawns 5-7 git processes
- This creates 85-120+ git processes competing for resources
- The 2-second timeout in gt is exceeded
- gt commands fail with "bd version check timed out"
I think it is satire, and a pretty obvious one at that; is anybody taking it for real?
> Ralph loops are also stupid because they don't make use of kv cache properly.
This is a cost/resources thing. If it's more effective and the resources are available, it's completely fine.
Many comments undermining Gas Town are inadvertently assisting it by revealing failure modes and solutions to them. I'm excited for when the discourse evolves toward building out these frameworks and ideas. There are many comments from people not understanding something.
I have a feeling it's less about getting small components to always work and more about systems that are inherently resistant to failure of underlying components. Probably similar to biology: as I understand it, components of the human body fail every day, yet the body persists; nobody engineers it or looks at its underlying code, but it has built-in feedback/control systems, of which Gas Town has rudimentary versions.
>while Yegge made lots of his own ornate, zoopmorphic [sic] diagrams of Gas Town’s architecture and workflows, they are unhelpful. Primarily because they were made entirely by Gemini’s Nano Banana. And while Nano Banana is state-of-the-art at making diagrams, generative AI systems are still really shit at making illustrative diagrams. They are very hard to decipher, filled with cluttered details, have arrows pointing the wrong direction, and are often missing key information.
So true! Not to mention the garbled text and inconsistent visuals across the diagrams, an insult to the reader's intelligence. How do people tolerate this visual embodiment of slurred speech?
Yeah I couldn’t figure out if they were just intended as illustrations and gave up trying to read them after a while.
Which is unfortunate as it would have been really helpful to have actually legible architecture diagrams, given the prose was so difficult for me to untangle due to the manic “fun” irreverent style (and it’s fine to write with a distinctive voice to make it more interesting, but still … confusing).
Plus the dozens of new unique names and connections introduced every few paragraphs to try to keep in my head…
I first asked Gemini 3 Pro to condense it to a boring technical overview and it produced a single page outline and Mermaid diagrams that were nearly as unintelligible as the original post so even AI has issues digesting it apparently…
The author's high-value flowcharts vs Steve Yegge's AI art is enough of a case-in-point for how confusing his posts and repos are. However this is a pervasive problem with AI coding tools. Unsurprisingly, the creators of these tools are also the most bullish about agentic coding, so the source code shows the consequences. Even Claude Code itself seems to experience an unusually high number of regressions or undocumented changes for such a widely used product. I had the same problem when recently trying to understand the details of spec-kit or sprites from their docs. Still, I agree that Gas Town is a very instructive example of what the future of AI coding will look like. I'm confident mature orchestration workflows will arrive in 2026.
> Yegge deserves praise for exercising agency and taking a swing at a system like this [...] then running a public tour of his shitty, quarter-built plane while it’s mid-flight
This quote sums it all up for me. It's a crazy project that moves the conversation forward, which is the main value I see in it.
It very well could be a logjam breaker for those who are fortunate enough to get out more than they put into it... but it's very much a gamble, and the odds are against you.
Everything I have learned about the schizophrenic thing "gas town" has been against my will.
Did you catch the part where it crossed over into a crypto pump-and-dump scam, with Yegge's approval? And then the guy behind the "Ralph" vibe coding thing endorsed the same scam, despite being a former crypto critic who should absolutely know better?
Brought to you by the creators (abstractly) of vibe coding, ralph and yolo mode. Either a conspiracy to deconstruct our view of reality, or just a tendency to invent funny words for novelty
It’s brainrot, that’s what it is.
I believe agentic coding could eventually be a paradigm shift, if and only if the agents become self-conscious of design decisions and their implications on the system and its surrounding systems as a whole.
If that doesn’t happen, the entire workflow devolves into specifying system states and behavior in natural language, which is something humans are exceedingly bad at.
Not coincidentally, that is why we invented programming languages: to be able to express program state and behavior unambiguously.
I’m not bullish on a future where I have to write specifications on all explicit and implicit corner and edge cases just to have an agent make software design choices which don’t feel batshit insane to humans.
We already have software corporations which produce that kind of code simply because the people doing the specifying don’t know the system or the domain it operates in, and the people doing the implementing of those specifications don’t necessarily know any of that either.
Lots of comments about Gas Town (which I get, it's hard not to talk about it!), but I thought this was a pretty good article -- nice job of summing up various questions and suggesting ways to think about them. I like this bit in particular:
> A more conservative, easier to consider, debate is: how close should the code be in agentic software development tools? How easy should it be to access? How often do we expect developers to edit it by hand?
> Framing this debate as an either/or – either you look at code or don’t, either you edit code by hand or you exclusively direct agents, either you’re the anti-AI-purist or the agentic-maxxer – is unhelpful.
> The right distance isn’t about what kind of person you are or what you believe about AI capabilities in the current moment. How far away you step from the syntax shifts based on what you’re building, who you’re building with, and what happens when things go wrong.
If it's stupid, but it works, it isn't stupid. Gas town transcends stupid. It is an abstract garbage generator. Call it art, call it an experiment, but you cannot call it a solution to a problem by any definition of the word.
"If it's stupid, but it works, it isn't stupid" is a maxim that only applies to luxury use cases where the results fundamentally don't matter.
As soon as the results actually matter, the maxim becomes "if it works, but it's stupid, it doesn't work".
It is simply because Mr. Yegge is seeking attention, as he always has.
> In the same way any poorly designed object or system gets abandoned
Hah, tell that to Docker, or React (the ecosystem, not the library), or any of the other terrible technologies that have better thought-out alternatives, but we're stuck with them being the de facto standard because they were first.
Design indeed becomes the bottleneck. I think this points to a step that is implied but still worth naming explicitly: design isn't just planning upfront. It is a loop where you see output, check whether it is directionally right, and refine.
While the agents can generate, they can't exercise that judgment, they can't see nuances, and they can't really walk their actions back in a "that's not quite what I meant" sense.
Exercising judgment is where design actually happens; it is iterative, in response to something concrete. The bottleneck isn't just thinking ahead, it's the judgment call when you see the result; it's the walking back as well as the thinking forward.
I ran a similar operation over summer where I treated vibecoding like a war. I was the general. I had recon (planning), and frontmen/infantry making the changes. Bugs and poor design were the enemy. Planning docs were OPORD, we had sit reps, and after action reports - complete e2e workflow. Even had hooks for sounds and sprites. Was fun for a bit but regressed to simpler conceptual and more boring workflows.
Anyways, we'll likely always settle on simpler/boring, but the game analogies are fun in the meantime. A lot of opportunity to enhance UX around design, planning, and review.
My instinct is that effective AI agent orchestration will resemble human agile software development more than Steve Yegge’s formulation:
> “It will be like kubernetes, but for agents,” I said.
> “It will have to have multiple levels of agents supervising other agents,” I said.
> “It will have a Merge Queue,” I said.
> “It will orchestrate workflows,” I said.
> “It will have plugins and quality gates,” I said.
More “agile for agents” than “Kubernetes for agents”.
Yegge is just running arbitrage on an information gap.
It's the same chasm that all the AI vendors are exploiting: the gap between people who have some idea what is going on and the vast mass of people who don't but are addicted to excitement or fear of the future.
Yegge is being fake-playful about it but if you have read any of his other writing, this tracks. None of it is to be taken very seriously because he values provocation and mischief a little too highly, but bits of it have some ideas worth thinking about.
Has anyone contrasted Gas Town with Stanford's DSPy (https://dspy.ai/)? They seem related, but I have trouble understanding exactly what Gas Town is, so I can't do the comparison myself.
Viable System Model when?
I have not tried Gas Town yet, but Steve's beads https://github.com/steveyegge/beads (used by Gas Town) has been a game-changer, on the order of what claude code was when it arrived.
I've been researching the usage of developer tooling at my own and other organizations for years now, and I'm genuinely trying to understand where agentic coding fits into the evolving landscape. One of the most solid things I'm beginning to understand is that many people don't understand how these tools influence technical debt.
Debt doesn't come due immediately; it's accrued, and may allow for the purchase of things that were once too expensive, but eventually the bill comes due.
I've started referring to vibe-coding as "credit cards" for developers, allowing them to accrue massive amounts of technical debt that were previously out of reach. This can provide some competent developers with incredible improvements to their work. But for the people who accrue more technical debt than they have the ability to pay off, it can sink their project and cost our organization a lot in lost investment of both time and money.
I see Gas Town and tools like as debt schemes where someone applies for more credit cards to pay the payments on prior cards they've maxed out, compounding the issue with the vague goal of "eventually it pays off." So color me skeptical.
Not sure if this analogy holds up to all things, but it's been helping my organization navigate the application of agents, since it allows us to allocate spend depending on the seniority of each developer. Thus I've been feeling like an underwriter, having to figure out whether a developer requesting more credits or budget for agentic coding can be trusted to pay off the debt they will accrue.
Gas Town has a very clear "mad scientist/performance art" sort of thing going on, and I love that. It's taking a premise way past its logical conclusion, and I think that's fun to watch.
I haven't seen anything to suggest that Yegge is proposing it as a serious tool for serious work, so why all the hate?
It doesn’t matter what Yegge means by it. Other folks are taking it seriously.
"I give it a hot minute before this type of task tracking lands in Claude Code."
aaaaand right on cue: https://github.com/anthropics/claude-code/commit/e431f5b4964... https://www.threads.com/@boris_cherny/post/DT15_k2juQH/at-th...
Gas Town could be good as a short film. Hell, I thought by all the criticism that it was a short film.
Which building in gastown is the infinite token burning machine?
Brawndo energy
Pretty hilarious write up and interesting frontier research project. I love it.
I love it! I'm at level 6 and brave enough to try. I'm in. Giving this a shot!
> I also think Yegge deserves praise for exercising agency and taking a swing at a system like this, despite the inefficiencies and chaos of this iteration. And then running a public tour of his shitty, quarter-built plane while it’s mid-flight.
Can we please stop with the backhanded compliments and judgement? This is cutting edge technology in a brand new field of computing using experimental methods. Please give the guy a break. At least he's trying to advance the state of the art, unlike all the people that copy everyone else.
> Please give the guy a break. At least he's trying to advance the state of the art.
The problem is that as an outsider it really looks like someone is trying to herd a bunch of monkeys into writing Shakespeare, or trying to advance impressionist art by pretending a baby's first crayon scratches are equivalent to a Pollock.
I bet he's having a lot of fun playing around with "cutting-edge technology", but it's missing any kind of scientific rigor or analysis, so the results are going to be completely useless to anyone wanting to genuinely advance the use of LLMs for programming.
I agree that he probably has a lot of fun. What he's doing is an equivalent of throwing a hand grenade into a crowd and enjoying the chaos of it all - he's set in life, can comfortably retire while the rest of the industry tries to deal with that hand grenade. Where some people are fighting to get the safety pin out while others are trying to stop them.