Welcome to Gas Town

(steve-yegge.medium.com)

26 points | by gmays 21 hours ago

5 comments

  • qcnguy 8 hours ago

    This is clearly going to develop the same problem Beads has. I've used it. I'm in stage 7. Beads is a good idea with a bad implementation. It's not a designed product in the sense we are used to; it's more like a stream of consciousness converted directly into code. It has many features that overlap significantly, strange bugs, and docs that are also AI-generated, so have fun reading them. It's a program that wasn't only vibe coded; it was vibe designed too.

    Gas Town is clearly the same thing multiplied by ten thousand. The number of overlapping and ad hoc concepts in this design is overwhelming. Steve is ahead of his time, but we aren't going to end up using this stuff. Instead, a few of the core insights will get incorporated into other agents in a simpler but no less effective way.

    And anyway, the big problem is accountability. The reason everyone makes a face when Steve preaches agent orchestration is that he must be in an unusual social situation. Gas Town sounds fun if you are accountable to nobody: not for code quality, design coherence, or inference costs. The rest of us are accountable for at least the first two, and even in corporate scenarios where there is a blank check for tokens, that can't last. So the bottleneck is going to be how fast humans can review code and agree to take responsibility for it. Meaning: if it's crap code with embarrassing bugs, that goes on your EOY perf review. Lots of parallel agents can't solve that fundamental bottleneck.

    • ac29 2 hours ago

      >This is clearly going to develop the same problem Beads has. I've used it. I'm in stage 7. Beads is a good idea with a bad implementation. It's not a designed product in the sense we are used to; it's more like a stream of consciousness converted directly into code. It has many features that overlap significantly, strange bugs, and docs that are also AI-generated, so have fun reading them. It's a program that wasn't only vibe coded; it was vibe designed too.

      Yeah, this describes my feeling on Beads too. I actually really like the idea - a lightweight task/issue tracker integrated with a coding agent does seem more useful than a pile of markdown todos/plans/etc. But it just doesn't work that well. It's really buggy, and the bugs seem to confuse the agent, since it was given instructions to do things a certain way that don't work consistently.

  • mccoyb 20 hours ago

    The article seems to be about fun, which I'm all for, and I really appreciate the use of MAKER as an evaluation task (finally, people are actually evaluating their theories on something quantitative), but the messaging here seems inherently contradictory:

    > Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.

    Then:

    > Working effectively in Gas Town involves committing to vibe coding. Work becomes fluid, an uncountable that you sling around freely, like slopping shiny fish into wooden barrels at the docks. Most work gets done; some work gets lost. Fish fall out of the barrel. Some escape back to sea, or get stepped on. More fish will come. The focus is throughput: creation and correction at the speed of thought.

    I see -- so where exactly is my focus supposed to sit?

    As someone who sits comfortably in the "Stage 8" category that this article defines, my concern has never been throughput. It has always been about retaining a high degree of quality while organizing work so that, when context switching occurs, it transitions me to near-orthogonal tasks that are easy to remember, so I can give high-quality feedback before switching again.

    For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C.

    On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.
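
    Concretely, the shape of that loop is something like this (a rough sketch only -- the CLI flags are plausible but check your versions, and the prompts and the LGTM convention are made up):

        import subprocess

        def run(cmd: list[str]) -> str:
            # Each invocation is a fresh process, hence a fresh context.
            return subprocess.run(cmd, capture_output=True, text=True,
                                  check=True).stdout

        task = "implement feature X in Project A"
        feedback = ""
        for _ in range(10):  # hard cap so the loop can't spin forever
            diff = run(["claude", "-p", f"Task: {task}\nAddress feedback: {feedback}"])
            # Independent review gates: different models, no shared context.
            reviews = [run(["codex", "exec", f"Review critically; reply LGTM if it passes:\n{diff}"]),
                       run(["gemini", "-p", f"Review critically; reply LGTM if it passes:\n{diff}"])]
            if all("LGTM" in r for r in reviews):
                break
            feedback = "\n---\n".join(reviews)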

    This works and actually gets shit done. I'm not convinced that 20 Claudes or massively parallel worktrees or whatever improves on quality, because, indeed, I always have to intervene at some point. The blocker for me is not throughput, it's me -- a human being -- my focus, and the random points of intervention which ... by definition ... occur stochastically (because agents).

    Finally:

    > Opus 4.5 can handle any reasonably sized task, so your job is to make tasks for it. That’s it.

    This is laughably untrue for anyone who has used Opus 4.5 on non-trivial tasks. Claude Code constantly gives up early, corrupts itself with self-bias -- the list goes on and on. It's getting better, but it's not that good.

    • iamwil 20 hours ago

      > For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C. On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.

      Can you talk more about the structure of your workflow and how you evolved it into that?

      • mccoyb 19 hours ago

        I've tried most of the agentic "let it rip" tools. I quickly realized that GPT 5~ was significantly better at reasoning, and more exhaustive, than Claude Code (Opus, RL-finetuned for Claude Code).

        "What if Opus wrote the code, and GPT 5~ reviewed it?" I started evaluating this question, and started to get higher quality results and better control of complexity.

        I could also trust this process to a greater degree than my previous one of trying to drive Opus, looking at the code myself, trying to drive Opus again, and so on. Codex was catching bugs I would not have caught in the same amount of time, including bugs in hard math -- so I came to trust its reasoning capabilities a great deal.

        I've codified this workflow into a plugin which I've started developing recently: https://github.com/evil-mind-evil-sword/idle

        It's a Claude Code plugin -- it combines the "don't let Claude stop until a condition is met" pattern (a Stop hook) with a few CLI tools to induce (what the article calls) review gates: Claude will work indefinitely until the reviewer is satisfied.

        In this case, the reviewer is a fresh Opus subagent which can invoke and discuss with Codex and Gemini.
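
        Mechanically, the gate looks something like this (a minimal sketch, not idle's actual code -- the marker-file check is a stand-in for the real reviewer subagent). A Stop hook receives session JSON on stdin, and printing {"decision": "block", "reason": ...} makes Claude keep working:

            #!/usr/bin/env python3
            # Registered as a Stop hook in .claude/settings.json.
            import json, pathlib, sys

            event = json.load(sys.stdin)
            # stop_hook_active is set when Claude is already continuing
            # because of a previous block; bail out to avoid an endless loop.
            if event.get("stop_hook_active"):
                sys.exit(0)

            # Stand-in for the real gate: a fresh reviewer subagent that
            # confers with Codex and Gemini and writes its verdict to disk.
            if pathlib.Path(".review-approved").exists():
                sys.exit(0)  # reviewer satisfied: let Claude stop

            print(json.dumps({"decision": "block",
                              "reason": "Reviewer not satisfied; keep going."}))
            sys.exit(0)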

        One perspective I have that relates to this article: the thing one wants to optimize for is minimizing error per unit of work. If you have a dynamic-programming-style orchestration pattern for agents, you want the thing that solves the smallest unit of work (a task) to have as low an error rate as possible, or else I suspect the error compounds quickly with these stochastic systems.
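
        Back-of-the-envelope (assuming independent tasks, which is generous): if each task succeeds with probability p, a chain of n tasks succeeds with probability p**n, so small per-task error rates dominate the end-to-end result:

            # Chain success under independence: p ** n.
            for p in (0.99, 0.95, 0.90):
                for n in (10, 50):
                    print(f"p={p:.2f}, n={n:2d}: {p**n:.3f}")
            # p=0.95 already drops to ~0.08 by n=50; hence the emphasis on
            # driving down error at the smallest unit of work.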

        I'm trying this stuff for fairly advanced work (in a PhD), so I'm dogfooding ideas (like the ones presented in this article) in complex settings. I think there is still a lot of room to learn here.