One of the biggest productivity improvements I've had as a developer was to make a habit of planning all my work upfront. Specifically, when I pick up a ticket, I break it down into a big bullet point list of TODOs. Doing it this way leads to a better design, to dealing with inter-ticket dependencies upfront, and to clarifying the spec upfront (which yes, is part of your job as a senior developer); most valuable of all, it allows me to get into flow state much more regularly when I am programming.
It's not a surprise to me that this approach also helps AI coding agents work more effectively, as in-depth planning essentially moves the thinking upfront.
I do this process for myself but I also make it a ritual with the juniors and mids I mentor. We'll sit with a new ticket together and do a little brainstorming session. This typically helps me give them a higher-level view of how the specific unit of work fits into the overall ecosystem and provides a lot of clarity that is often sorely lacking in ticket descriptions. Ultimately it'll culminate in a small series of bullet points that describe our intention for the implementation. And I'm very deliberate about the words I choose. Intentions are just that, and they will often not survive first contact with the enemy. That's totally fine, and when that happens we typically discuss it asynchronously or in a quick chat after standup. It's a collaborative process but it has a lot of autonomy baked in.
In my work notes (a giant Markdown file, e.g. 2025.md), my process is extremely similar to the OP, all the way down to the serialized checklist of items being marked off once they're completed.
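For anyone curious, one of those entries looks roughly like this (the ticket name and items are invented for illustration):

    ## PROJ-142: rate-limit the public API
    - [x] read the existing middleware to find where a limiter hooks in
    - [x] clarify with PM: per-user or per-IP? (answer: per-user)
    - [ ] implement token bucket in middleware
    - [ ] config flag to disable it in dev
    - [ ] tests: at the limit, over the limit, limit reset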
Nobody who delivers any system professionally thinks it’s a bad thing to plan out and codify every piece of the problem you’re trying to solve.
That’s part of what waterfall advocates for. Write a spec, and decompose to tasks until you can implement each piece in code.
Where the model breaks - and what software developers rightly hate - is unnecessarily rigid specifications.
If your project’s acceptance criteria are bound by a spec that has tasked you with the impossible, while simultaneously being impossible to change, then you, the dev, are screwed. This is doubly true in cases where you might not get to implementing the spec until months after the spec has been written - in which case, the spec has calcified into something immutable in stakeholders’ minds.
Agile is frequently used by weak product people and lousy project managers as an excuse to “figure it out when we get there”. It puts off any kind of strategic planning or decision making until the last possible second.
I’ve lost track of the number of times that this has caused rework in projects I’ve worked on.
>That’s part of what waterfall advocates for. Write a spec, and decompose to tasks until you can implement each piece in code.
That's what agile advocates for too. The difference is purely in how much spec you write before you start implementing.
Waterfall says specify the whole milestone up front before developing. Agile says create the minimum viable spec before implementing, then get back to iterating on the spec straight after putting it into a customer's hands.
Waterfall's bad rap is deserved. The longer those feedback loops are, the more scope you have for fucking up and not dealing with it quickly enough.
I don't think this whole distinction between waterfall and agile really exists. They are more like caricatures of what really happens. There have always been leaders who could guide a project in a reasonable way, plan as much as necessary, respond to changes, and keep everything on track. And there have always been people who did the opposite. There are plenty of agile teams that refuse to respond to changes because "the sprint is already planned", which then causes other teams to get stuck waiting for the changes they need. Or you have the next 8 sprints planned out in detail with no way to make changes.
In the end, there is project management that can keep a project on track while also being able to adapt to change, and project management that isn't able to do so and chooses to hide behind some bureaucratic process. It has always existed and will keep existing no matter what you call it.
I wrote this, a few years ago, about being careful to avoid "concrete galoshes"[0].
I've found that it's a balancing act, like so many things in software development. We can't rush in willy-nilly, but it's also possible to kill the project by spending too much time preparing (think "The Anal-Retentive Chef" skits from Saturday Night Live).
Also, I have found that "It Depends" is an excellent mantra for life in general, and software development in particular.
I think having LLM-managed specs might be a good idea, as it reduces the overhead required to maintain them.
> One of the biggest productivity improvements I've had as a developer was to make a habit of planning all my work upfront. Specifically, when I pick up a ticket, I break it down into a big bullet point list of TODOs.
How an individual developer chooses to process a single ticket is completely unrelated to agile or waterfall. Agile is about structuring your work over a project so that you get a happy customer who ends up with what they actually needed and not what they thought they wanted when they signed the contract and that turned out to be completely not what they needed after two months.
Just to further clarify what I said above: What I talk about here specifically is for the developer who picks up a ticket to flesh out and plan the ticket before diving into code.
This doesn't say anything about what is appropriate for larger project planning. I don't have much experience doing project planning, so I'd look to others for opinions on that.
This is how I approach my stories as well. I used to call this “action plan” way before it became fashionable with the rise of AI agents.
It helps me not only reduce the complexity into more manageable chunks but also go back to the business team to smooth out rough edges that would otherwise require rework after review.
The article itself advocates that one should "split complex requirements into multiple simple ones." So I don't disagree here, at least I don't think I do.
If we have a differing interpretation of what the article is motivating for, then please take the opportunity to contemplate an additional perspective and let it enrich your own.
There are two extremes: having everything you do planned up front, and literally planning nothing and just doing stuff.
The power of agile is supposed to be "I don't need to figure this out now, I'll figure it out based on experimentation" which doesn't mean nothing at all is planned.
If you're not planning a mission to Jupiter, you don't need every step planned out before you start. But in broad strokes it's also good to have a plan.
The optimum is to have some recorded shape of the work to come but to give yourself space to change your mind based on your experiences and to plan the work so you can change the plan.
The backlash against waterfall is the result of coming up with very detailed plans before you start, having those plans change constantly during the whole project (requiring you to throw away large amounts of completed work), and, when you find things that need to change, not being able to change them, because management has decided on The Plan (which they will revise later, but you can't change a thing).
For some decisions, the best time to plan is up front, for other decisions the best time to design is while you're implementing. There's a balance and these things need to be understood by everybody, but they are generally not.
I like to split out exploration and discovery (research) as a third step (the first step) in the process. Before a plan can be devised, research needs to be conducted. More time between research, planning, and execution increases the likelihood of rework or failure.
The best time to plan is dependent on how stable/unstable the environment is.
I vibe coded for months but switched to spec-driven development in the last six months.
I'm also old enough to have started my career learning the Rational Unified Process and then progressed through XP, agile, scrum, etc.
My process is I spend 2-3 hours writing a "spec" focusing on acceptance criteria and then by the end of the day I have a working, tested next version of a feature that I push to production.
I don't see how using a spec has made me less agile. My iteration takes 8 hours.
However, I see tons of useless specs. A spec is not a prompt. It's an actual definition of how to tell if something is behaving as intended or not.
People are notoriously bad at thinking about correctness in each scenario which is why vibe coding is so big.
People defer thinking about what correct and incorrect actually looks like for a whole wide scope of scenarios and instead choose to discover through trial and error.
I get 20x ROI on well defined, comprehensive, end to end acceptance tests that the AI can run. They fix everything from big picture functionality to minor logic errors.
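To illustrate the difference, this is the shape I mean - acceptance criteria, not a prompt (the feature and details are invented):

    Feature: password reset
    - Given a registered email, requesting a reset sends exactly one
      email containing a single-use token
    - Given an unregistered email, the response is identical to the
      registered case (no account enumeration)
    - Given a token older than 30 minutes, the reset form rejects it
    - Given a token that was already used, the reset form rejects it

Each criterion maps directly to an end-to-end test the AI can run.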
I'll probably be proven wrong eventually, but my main thought about spec-driven dev with LLMs is that it introduces an unreliable compiler. It will produce different results every time it is run and it's up to the developer to review the changes. Which just seems like a laborious error prone task.
No, this is the right take. Spec driven development is good, but having loose markdown "specs" that leave a bunch up to the discretion of the LLM is bad. The right approach is a project spec DSL that agents write, which can be compiled via codegen in a more controlled way.
Why would you want to rerun it? In that context a human is also an unreliable compiler. Put two humans on the task and you will get two different results. Even putting the same human on the same task again will yield something different. LLMs producing unreliable output that can't be reproduced is definitely a problem but not in this case.
Might be misunderstanding the workflow here, but I think if a change request comes and I alter the spec, I'd need to re run the llm bit that generates the code?
You'd want to have the alteration reference existing guides to the current implementation.
I haven't jumped in headfirst to the "AI revolution", but I have been systematically evaluating the tooling against various use cases.
The approach that tends to have the best result for me combines a collection of `RFI` (request for implementation) markdown documents to describe the work to be done, as well as "guide" documents.
The guide documents need to keep getting updated as the code changes. I do this manually but probably the more enthusiastic AI workflow users would make this an automated part of their AI workflow.
It's important to keep the guides brief. If they get too long they eat context for no good reason. When LLMs write for humans, they tend to be very descriptive. When generating the guide documents, I always add an instruction to tell the LLM to "be succinct and terse", followed by "don't be verbose". This makes the guides into valuable high-density context documents.
The RFIs are then used in a process. For complex problems, I first get the LLM to generate a design doc, then an implementation plan from that design doc, then finally I ask it to implement it while referencing the RFI, design doc, impl doc, and relevant guide docs as context.
If you're altering the spec, you wouldn't ask it to regen from scratch, but use the guide documents to compute the changes needed to implement the alteration.
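For concreteness, a rough sketch of one of my RFIs (the structure and names are illustrative, not a fixed template):

    # RFI-014: export report views as CSV
    ## Goal
    Any report view can be downloaded as CSV.
    ## Context
    See guides/reporting.md for how report queries are assembled.
    ## Constraints
    - Reuse the existing ReportQuery interface; no parallel code path
    - Stream rows; never buffer a full result set in memory
    ## Out of scope
    Excel export, scheduled exports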
Hm, maybe it's me who misunderstands the workflow. In that case I agree with you.
That said, I think the non-determinism when rerunning a coding task is actually pretty useful when you're trying to brainstorm solutions. I quite often rerun the same prompt multiple times (with slight modifications or using different models) and write down the implementation details that I like before writing the final prompt. When I'm not happy with the throwaway solutions at all I reconsider the overall specification.
However, the same non-determinism has also made me "lose" a solution that I threw out and where the real prompt actually performed worse. So nowadays I try to make it a habit to stash the throwaway solutions just in case. There's probably something in Cursor where you can dig out things you backtracked on but I'm not a power user.
You would need to rerun the LLM, but you wouldn't necessarily need to rebuild the codebase from scratch.
You can provide the existing spec, the new spec, and the existing codebase all as context, then have the LLM modify the codebase according to the updates to the spec.
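A sketch of what that prompt could look like (wording is illustrative):

    Context: spec-v1.md (current), spec-v2.md (proposed), and the codebase.
    Task: make the minimal set of code changes so the codebase conforms to
    spec-v2.md. For each change, cite the spec section that motivates it,
    and leave code unrelated to the spec diff untouched.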
Humans are unreliable compilers, but good devs are able to "think outside of the box" in terms of using creative ways to protect against their human foibles, while LLMs can't.
If I get a nonsensical requirement, I push back. If I see some risky code, I will think of some way to make it less risky.
You don't need this type of work to be deterministic. It doesn't really matter if the LLM names a function "IsEven" vs "IsNumberEven".
Have you ever written the EXACT same code twice?
> it introduces an unreliable compiler.
So then by definition so are humans. If compiling is "taking text and converting it to code", that's literally us.
> it's up to the developer to review the changes. Which just seems like a laborious error prone task.
There are trade-offs to everything. Have you ever worked with an off-shore team? They tend to produce worse code and have 1% of the context the LLM does. I'd much rather review LLM-written code than "I'm not even the person you hired because we're scamming the system" developers.
You want it to be as close to deterministic as possible to reduce the risk of the LLM doing something crazy like deleting a feature or functionality. Sure, the idea is for reviews to catch it but it's easier to miss there when there is a lot of noise. I agree that it's very similar to an offshore team that's just focused on cranking out code versus caring about what it does.
> People defer thinking about what correct and incorrect actually looks like for a whole wide scope of scenarios and instead choose to discover through trial and error.
LLMs are _still_ terrible at deriving even the simplest logical entailment. I've had the latest and greatest Claude and GPT derive 'B instead of '(not B) from '(and A (not B)) when 'A and 'B are anything but the simplest of English sentences.
I shudder to think what they decide the correct interpretation of a spec written in prose is.
Seems like you are all just redefining what spec and waterfall mean.
A spec came from a customer and would detail every feature. They would be huge, but usually lack enough detail or be ambiguous. They would be signed off by the customer, and then you'd deliver to the spec.
It would contain months, if not years, worth of work. Then after all this work the end product would not meet the actual customer needs.
A day's work is not a spec. It's a ticket's worth of work, which is agile.
Agile is an iterative process where you deliver small chunks of work and the customer course-corrects at regular intervals, commonly 3-4 week sprints, made up of many tickets that take hours or days, per course correction.
Generally each sprint had a spec, and each ticket had a spec. But it sounds like until now you've just been winging it, with vague definitions per feature. It's very common, especially where the PO or PM is bad at their job. Or the developer is informally acting as PO.
Now that you're making specs per ticket, you're just doing what many development teams already do. You're just bizarrely calling it a new process.
It's like watching someone point at a bicycle and insist it's a rocketship.
A customer generally provides requirements (the system should do...) which are translated into a spec (the module/function/method should do...). The set of specs maps to the requirements. Requirements may be derived from or represented by user stories, and specs may or may not be developed in an agile way or written down ahead of time. Whether you have or derive requirements and specs is entirely orthogonal to development methodology. People need to get away from the idea that having specs is any more than a formal description of what the code should do.
The approach we take is that the specs are developed from the tests, and the tests exercise the spec point in its entirety. That is, a test and a spec are semantically synonymous within the code base. An interesting thing we're playing with is using the specs alongside the signatures to have an LLM determine when the spec is incomplete.
A spec consists of three different kinds of requirements: functional requirements, non-functional requirements, and constraints. It’s supposed to fully describe how the product responds to the context and the desires of stakeholders.
The problem I see a lot with Agile is that people over-focus on functional requirements in the form of user stories. Which in your case would be statements like “X should do…”
Same. I fancy myself a decent technical communicator and architect. I write specs which consist of giant lists of acceptance criteria, on my phone, lying in bed...
Kick that over to some agents to bash on, check in and review here and there, maybe a little mix of vibe and careful corrections by me, and it's done!
Usually in less time, but! any time an agent is working on work shit, I'm working on my race car... so it's a win-win-win to me. I'm still using my brain, no longer slogging through awful "human centered" programming languages, and I have more time for my hobbies.
Isn't that the dream?
Now, to crack this research around generative gibber-lang programming... 90% of our generative code problems are related to the programming languages themselves: intended for humans, optimized for human interaction, speed, and parsing. Let the AIs design, speak, write, and run the code. All I care about is that the program passes my tests and does what I intended. I do not care if it has indents, or other stupid dogmatic aspects of what makes one language equally usable to any other; no "my programming language is better!", who cares. Loving this era.
A lot of software process exists to solve a specific problem IMO:
Devs get married to their first implementation; stakeholders don't tolerate rework.
If companies and individuals could throw more away, then we wouldn’t need to obsess over planning. The “spec” and “design” would get discovered through doing. I’ve never worked anywhere where a long up front design addressed the important design issues. Those get discovered after you’ve tried to implement a solution a few times and failed.
If we saw throwing away as a feature rather than a bug, we'd probably work more efficiently.
This article is for those who have already made up their minds that "spec-based development" isn't for them.
I believe (and practice) that spec-based development is one of the future methodologies for developing projects with LLMs. At least it will be one of the niches.
The author thinks about specs as waterfall. I think about them as a context entrypoint for LLMs. Giving enough info about the project (including user stories, tech design requirements, filesystem structure and meaning, core interfaces/models, functions, etc), the LLM will be able to build sufficient initial context for the solution and expand it by reading files and grepping text. And the most interesting part is that you can have the LLM keep the context/spec/project file updated each time it updates the project. Voilà: now you are agile again: just keep iterating on the context/spec/project file.
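As an example, such a context entrypoint might be skeletoned like this (the section names are my own habit, not a standard):

    # project-context.md
    ## Purpose
    One paragraph on what the system does and for whom.
    ## User stories
    - As a <role>, I want <capability>, so that <outcome>.
    ## Tech design requirements
    Runtime, storage, auth, non-negotiable constraints.
    ## Filesystem map
    src/api/   HTTP handlers, no business logic
    src/core/  domain logic, no I/O
    ## Core interfaces
    Signatures of the handful of types everything hangs off.
    ## Rule for the LLM
    Any change to structure or interfaces must update this file.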
This is the key, with test driven dev sprinkled in.
You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate.
I model this as "grounding". Just like you need to ground an electrical system, you need to ground the LLM to reality. The tests do this, so they are REQUIRED for all LLM coding.
Once a framework is established, you require tests for everything. No code is written without tests. These can also be perf tests. They need solid metrics in order to output quality.
The tests provide context and documentation for future LLM runs.
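A minimal illustration of a spec captured as a test, in pytest form (the pricing module and its behavior are invented for the example):

    # Spec: discounts stack multiplicatively, and a total never goes negative.
    from decimal import Decimal
    from pricing import apply_discounts  # hypothetical module under test

    def test_discounts_stack_multiplicatively():
        # 100 * 0.9 * 0.9 = 81, not 100 - 10 - 10 = 80
        total = apply_discounts(Decimal("100"), [Decimal("0.10"), Decimal("0.10")])
        assert total == Decimal("81.00")

    def test_total_never_negative():
        # a 150% "discount" clamps to zero rather than going negative
        assert apply_discounts(Decimal("10"), [Decimal("1.5")]) == Decimal("0")

The LLM can only make these pass by writing code with the specified behavior; hallucinated logic fails immediately.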
This is also the same way I'd handle foreign teams that, through no fault of their own, would often output subpar code. It was mainly because of a lack of cultural context, communication misunderstandings, and no solid metrics to measure against.
Our main job with LLMs now as software engineers is a strange sort of manager, with a mix of solutions architect, QA director, and patterns expertise. It is actually a lot of work and requires a lot of humans to manage, but the results are real.
I have been experimenting with how meta I can get with this, and the results have been exciting. At one point, I had well over 10 agents working on the same project in parallel, following several design patterns, and they worked so fast I could no longer follow the code. But with layers of tests, layers of agents auditing each other, and isolated domains with well defined interfaces (just as I would expect in a large scale project with multiple human teams), the results speak for themselves.
I write all this to encourage people to take a different approach. Treat the LLMs like they are junior devs or a foreign team speaking a different language. Remember all the design patterns used to get effective use out of people regardless of these barriers. Use them with the LLMs. It works.
> Once specs are captured as tests, the LLM can no longer hallucinate.
Tests are not a correctness proof. I can’t trust LLMs to correctly reason about their code, and tests are merely a sanity check, they can’t verify that the code was correctly reasoned.
> You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate.
Except when it decides to remove all the tests, change their meaning to make them pass, or write something not in the spec.
Hallucinations are not a problem of the input given; they're in the foundations of LLMs, and so far nobody has solved them. Thinking it won't happen can and will have really bad outcomes.
It doesn't matter, because use of version control is mandatory. When things go missing or get bypassed, audit-instructed LLMs detect these issues and roll back the changes.
I like to keep domains with their own isolated workspaces and git repos. I am not there yet, but I plan on making a sort of local-first gitflow where agents have to pull the codebase, make a new branch, make changes, and submit pull requests to the main codebase.
I would ultimately like to make this a oneliner for agents, where new agents are sandboxed with specific tools and permissions cloning the main codebase.
Fresh-context agents then can function as code reviewers, with escalation to higher tier agents (higher tier = higher token count = more expensive to run) as needed.
In my experience, with correct prompting, LLMs will self-correct when exposed to auditors.
If mistakes do make it through, it is all version controlled, so rolling back isn't hard.
This is the right flow. As agents get better, work will move from devs orchestrating in ides/tuis to reactive, event driven orchestration surfaced in VCS with developers on the loop. It cuts out the middleman and lets teams collaboratively orchestrate and steer.
But do you understand the problem and its context well enough to write tests for the solution?
Take Prolog and logic programming. It's all about describing the problem and its context and letting the solver find the solution. Try writing your specs in pseudo-Prolog code and you will be surprised by all the missing information you're leaving up to chance.
My objective is to write prompts for LLMs that can write prompts for LLMs that can write code.
When there is a problem downstream in the descendant hierarchy, it is a failure of the parent LLM's prompts, so I correct it at the highest level and allow the fix to trickle down.
This eventually resolves into a stable configuration with domain expertise towards whatever function I require, in whatever language is best suited for the task.
If I have to write tests manually, I have already failed. It doesn't matter how skilled I am at coding or capable I am at testing. It is irrelevant. Everything that can be automated should be automated, because it is a force amplifier.
> Giving enough info about the project (including user stories, tech design requirements, filesystem structure and meaning, core interfaces/models, functions, etc)
What's not waterfall about this is lost on me.
Sounds to me like you're arguing waterfall is fine if each full run is fast/cheap enough, which could happen with LLMs and simple enough projects. [0]
Agile was offering incremental spec production, which had the tremendous advantage of accumulating knowledge incrementally as well. It might not be a good fit for LLMs, but revising the definition to make it fit doesn't help IMHO.
[0] Reminds me that reducing the project scopes to smaller runs was also a well established way to make waterfall bearable.
Waterfall with short iteration time is not possible by definition.
You might as well say agile is still waterfall; what are sprints if not waterfall with a two-week iteration time? And Kanban is just a collection of independent waterfalls... It's not a useful definition of waterfall.
Just as most agile projects aren't Agile, most waterfall projects weren't strict Waterfall as it was preached.
That being said, when for instance you had a project that should take two years and involve a dozen teams, you'd try to cut it into 3 or 4 phases, even if it would only be "released" and fully tested at the end of it all. At least if your goal was to have it see the light of day in a reasonable time frame.
Where I worked we also did integration runs at given checkpoints to be able to iron out issues earlier in the process.
PS: on agile, the main specificity I see is the ability to infinitely extend a project, as the scope and specs are typically set on the go. Which is a feature if you're a contractor for a project. You can't do that with waterfall.
Most shops have a mix of pre-planning and on-the-go speccing to get a realistic process.
> Waterfall with short iteration time is not possible by definition.
What definition would that be?
Regardless, at this point it's all semantics. What I care about is how you do stuff, not the label you assign and in my book writing specs to ground the LLM is a good idea. And I don't even like specs, but in this instance, it works.
Exactly. There is a spec, but there is no waterfall required to work and maintain it. The article's author dismissed spec-based development exactly because they saw a resemblance to waterfall. But waterfall isn't required for spec-centric development.
> There is a spec, but there is no waterfall required to work and maintain it.
The problem with waterfall is not that you have to maintain the spec, but that a spec is the wrong way to build a solution. So, it doesn't matter if the spec is written by humans or by LLMs.
I don't see the point of maintaining a spec for LLMs to use as context. They should be able to grep and understand the code itself. A simple readme or a design document, which already should exist for humans, should be enough.
The thing is that the parallels you are drawing are to things that are very explicitly not the source of the code, but exist alongside it. Code is the ultimate truth. Documentation is a more humane way to describe it. Tests are there to ensure that what is there is what we want. And linters are there to warn us of specific errors. None of these create code.
To go from spec to code requires a lot of decisions (each introducing technical debt). Automating the process removes control over those decisions and over the ultimate truth that is the code. But why can't the LLM retain a trace of the decisions, so that it presents control points to alter the results? Instead, it's always a rewrite from scratch.
The downfall of Waterfall is that there are too many unproven assumptions in too long of a design cycle. You don't get to find out where you were wrong until testing.
If you break a waterfall project into multiple, smaller, iterative Waterfall processes (a sprint-like iteration), and limit the scope of each, you start to realize some of the benefits of Agile while providing a rich context for directing LLM use during development.
Comparing this to agile is missing the point a bit. The goal isn't to replace agile, it's to find a way that brings context and structure to vibe coding to keep the LLM focused.
"rapid, iterative Waterfall" is a contradiction. Waterfall means only one iteration. If you change the spec after implementation has started, then it's not waterfall. You can't change the requirements, you can't iterate.
Then again, Waterfall was never a real methodology; it was a straw man description of early software development. A hyperbole created only to highlight why we should iterate.
> Then again, Waterfall was never a real methodology; it was a straw man description of early software development. A hyperbole created only to highlight why we should iterate.
If only this were accurate. Royce's chart (at the beginning of the paper, what became Waterfall, but not what he recommended by the end of the paper) has been adopted by the DOD. They're slowly moving away from it, but it's used on many real-world projects and fails about as spectacularly as you'd expect. If projects deliver on-time, it's because they blow up their budget and have people work long days and weekends for months or years at a time. If it delivers on budget, it's because they deliver late or cut out features. Either way, the pretty plan put into the presentations is not met.
People really do (and did) think that the chart Royce started with was a good idea. They're not competent, but somehow they got into management positions from which they could force this stupidity.
I would maybe argue that there is a sweet spot of how much you feed in (with some variability depending on the task). I tend to keep my initial instructions succinct, then build them up iteratively. Others write small novels of instructions before they start, which I personally don't like as much. I don't always know what I don't know, so speccing ahead in great detail can sometimes be detrimental.
Agree. I don't use the term "spec" as it was used in "spec-based development" before LLMs, where details were required to be defined upfront. With LLMs you can start with a vague spec, missing some sections, and clarify it over iterations.
The sweet spot will be a moving target. LLMs' built-in assumptions and ways of expanding concepts will change as LLMs develop, so best practices will change with the change in LLM capabilities. The same set of instructions, not too detailed, was handled so much better by Sonnet 4 than Sonnet 3 in my experience. Sonnet 3.5 was for me the breaking point which showed that context-based LLM development is a feasible strategy.
You're right that this is the future, but I believe the thread is misdiagnosing the core 'system error'.
The frustration thomascountz describes (tweaking, refining, reshaping) isn't a failure of methodology (SDD vs. Iteration). It's 'cognitive overload' from applying a deterministic mental model to a probabilistic system.
With traditional code, the 'spec' is a blueprint for logic. With an LLM, the 'spec' is a protocol for alignment.
The 'bug' is no longer a logical flaw. It's a statistical deviation. We are no longer debugging the code; we are debugging the spec itself. The LLM is the system executing that spec.
This requires a fundamental shift in our own 'mental OS'—from 'software engineer' to 'cognitive systems architect'.
I know enough about Machine Learning and statistics to understand that errors are always there. It just needs to be small enough to not matter in the decisions that need to be taken (hopefully). But the thing is that computers can't differentiate errors from correct behavior. Anything in the code is true and if the result is catastrophic, so be it.
As software engineers, it's very often easy to specify what the system should do. But ensuring that it doesn't do what it shouldn't do is the tiresome part of the job. And most of the tools we created exist to ensure the latter.
I could not have said it better. We're on the same page.
I would add that, in my opinion, if previously code production/management was a limiting factor in software development, today it's not. The conceptualisation (ontology, methodology) of the framework (spec-centric development) for the system's production and maintenance (code, artifacts, running system) becomes the new limiting factor. But it's a matter of time before we figure out 2-3 methodologies (as happened with agile's scrum/kanban) which will become the new "baseline". We're at the early stage when the new "laws of LLM development" (as in "laws of physics") are still being figured out.
I would simply replace LLM by agent in your reasoning, in the sense that you'll need a strong preprocessing step and multiple iterations to exploit such complete specs.
There is sense in your words, especially in the context of the modern-day vocabulary.
I thought about the concept of this sort of methodology before "agent" (which I would define as "side effects with LLM integration") was marketed into the community vocabulary. And I'm still rigidly sticking to what I consider the "basics". Hope that does not impede understanding.
I had a small embedded project and I did it >70% using LLMs. This is exactly how I did it. Specs are great for grounding the LLM. Coding with LLMs is going to mean relying more on process, since you can't fully trust them. It means writing specs, writing small models to validate, writing tests, and a lot of code review to understand what the heck it's doing.
I just tried an experiment using Spec-Kit from GitHub to build a CLI tool. Perhaps the scope of the tool doesn't align with Spec-Driven Development, but I found the many, many hours - tweaking, asking, correcting, analyzing, adapting, refining, reshaping, etc. - before getting to see any code challenging. As would be the case with Waterfall today, the lack of iterative end-to-end feedback is foreign and frustrating to me.
After Claude finally produced a significant amount of code, and after realizing it hadn't built the right thing, I was back to the drawing board to find out what language in the spec had led it astray. Never mind digging through the code at this point; it would be just as good to start again as to try to onboard myself to the 1000s of lines of code it had built... and I suppose the point is to ignore the code as "implementation detail" anyway.
Just to make clear: I love writing code with an LLM, be it for brainstorming, research, or implementation. I often write—and have it output—small markdown notes and plans for it to ground itself. I think I just found this experience with SDD quite heavy-handed and the workflow unwieldy.
I did this first too. The trick is realising that the "spec" isn't a full system spec, per se, but a detailed description of what you want to do.
System specs are non trivial for current AI agents. Hand prompting every step is time consuming.
I think (and I am still learning!) SDD sits as a fix for that. I can give it two fairly simple prompts & get a reasonably complex result. It's not a full system but it's more than I could get with two prompts previously.
The verbose "spec" stuff is just feeding the LLMs love of context, and more importantly what I think we all know is you have to tell an agent over and over how to get the right answer or it will deviate.
Early on with speckit I found I was clarifying a lot but I've discovered that was just me being not so good at writing specs!
Example prompts for speckit:
(Specify) I want to build a simple admin interface. First I want to be able to access the interface, and I want to be able to log in with my Google Workspaces account (and you should restrict logins to my workspaces domain). I will be the global superadmin, but I also want a simple RBAC where I can apply a set of roles to any user account. For simplicity let's make a record user accounts when they first log in. The first roles I want are Admin, Editor and Viewer.
(Plan) I want to implement this as a NextJS app using the latest version of Next. Please also use Mantine for styling instead of Tailwind. I want to use DynamoDB as my database for this project, so you'll also need to use Auth.js over Better Auth. It's critical that when we implement you write tests first before writing code; forget UI tests, focus on unit and integration tests. All API endpoints should have a documented contract which is tested. I also need to be able to run the dev environment locally so make sure to localise things like the database.
The plan step is overly focused on the accidental complexity of the project. While the `Specify` part is doing a good job of defining the scope, the `Plan` part is just complicating it. Why? The choice of technology is usually the first step in introducing accidental complexity in a project, which is why it's often recommended to go with boring technology (so the cost of this technical debt is known). Otherwise go with something that is already used by the company (if it's a side project, do whatever). If you choose to go that route, there's a good chance you already have good knowledge of those tools and have code samples (and libraries) lying around.
The whole point of code is to be reliable and to help do something that we'd rather not do, not to exist on its own. Every decision (even a little one) needs to be connected to a specific need that is tied to the project and the team. It should not be just a receptacle for wishes.
I wouldn't call that accidental complexity? It's just a set of preferences.
Your last point feels a bit idealistic. The point of code is to achieve a goal; there are ways to achieve it with optimal efficiency in construction, but a lot of people call that gold plating.
The setup these prompts leave you with is boring, standard, and surely something I could do in a couple of hours. You might even skeleton it, right? The thing is, the AI can do it faster in elapsed time, and it also reduces my time to writing two prompts (<2 minutes) and some review (10-15 minutes, perhaps).
Also remember this was a simple example; once we get to real business logic, the efficiencies grow.
It may be a set of preferences for now, but it always grows into a monstrosity when future preferences don't align with current ones. That's what accidental complexity means. Instead of working on the essential needs (having an admin interface that works well), you get bogged down with the whims of the platform and technology (breaking changes, bugs, ...). It may not be relevant to you if you're planning on abandoning it (switching jobs, a side project you no longer care about, ...).
Something boring and standard is something that keeps going with minimal intervention while getting better each time.
I'm going to go out on a limb here and say NextJs with Auth.js is pretty boring technology.
I'm struggling to see what you'd choose to do differently here?
Edit: actually I'll go further and say I'm guarding against accidental complexity. For example, Auth.js is really boring technology, but I am annoyed they've deprecated it in favour of Better Auth - it's not better and it is definitely not boring technology!
I think the challenge is how to create a small but evolvable spec.
What LLMs bring to the picture is that "spec" is high-level coding. In normal coding you start by writing small functions then verify that they work. Similarly LLMs should perhaps be given small specs to start with, then add more functions/features to the spec incrementally. Would that work?
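Something like this, perhaps (a toy example of what incremental specs could mean):

    spec v1: CLI reads config.toml and prints normalized JSON. (implement, verify)
    spec v2: v1 + an `include = [...]` key that merges other config files,
             later files winning on conflicts. (implement, verify)
    spec v3: v2 + environment-variable interpolation. (implement, verify)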
Thanks! With Spec-Kit and Claude Sonnet 4.5, it wanted to design the whole prod-ready CLI up front. It was hard, if not impossible, to try to scope it to just a single feature or POC. This is what I struggled with most.
Were I to try again, I'd do a lot more manual spec writing or even template rewrites. I expected it to work more-or-less out-of-the-box. Maybe it would've for a standard web app using a popular framework.
It was also difficult to know where one "spec" ended and the next began; should I iterate on the existing one or create a new spec? This might be a solved problem in other SDD frameworks besides Spec-Kit, or else I'm just overthinking it!
I respect this take. As I understand it, in SDD, the code is not the source of truth, it's akin to bytecode; an intermediary between the spec and the observable behavior.
In my experience, spending 20–30 minutes writing a good spec results in code that is about 90% close to what I expected, which reduces the back-and-forth with the tool. It also helps me clarify and define with some level of precision what I actually want. During the specification phase, I can iterate until the design proposed by the tool is close to what I envision, reducing the number of surprises when the tool generates code. It’s not perfect, and there are still details the tool misses that require additional prompts, but overall I can get good results in a single session, whereas before I would exhaust the tokens and need to start a new session again.
My best "AI win" so far was in an area where I had to create a number of things that all followed a similar pattern. I created one hand-crafted example and created a general spec and specific ones for each component. It worked really well and I was, for a moment, experiencing a 10-30X productivity boost while having resultant code that I could review quickly and understand. It was also more consistent that I think I would have gotten from hand coding as it is easy to drift a little in terms of style and decisions.
Of course, this is all very situational and based on the problem being solved at the time. The risk with "practices" is they are generally not concerned with problem being solved and insist on applying the same template regardless.
It seems to me that most people (myself included) never experienced actual Waterfall anywhere other than in school curriculum descriptions.
It's a bit funny to see people describe a spec written in days (hours) and iterations lasting multiple weeks as "waterfall".
But these days I've already had people argue that barely stopping to think about a problem before starting to prompt a solution is "too tedious of a process".
To add to this: I've worked on projects that came out of waterfall process, and on projects that came out of too hasty agile iterations.
They both have issues, but they are very different. A waterfall project would have an inscrutable structure and a large number of "open doors", just in case the need for an extension at some place materialized. Paradoxically, this makes the code difficult to extend and debug because of overdone abstractions.
Hasty agile code has too many TODOs with "put this hardcoded value in a parameter". It is usually easier to add small features but when coming to a major design flaw it can be easier to throw everything out.
For UI code, AI seems to heavily tend towards the latter.
I did professional waterfall development and SDD is exactly waterfall. The problem with waterfall was never the time that it took, it is that the spec locks you into a small niche and iterative changes force enormous complexity to keep spec and code consistent.
The problems with waterfall come when much of the project is done and then you discover that your spec doesn't quite work, but the changes to your spec require half the requirements to subtly change, so that it can work at all. But then these subtle changes need to be reflected in code everywhere. Do this a couple of times (with LLM and without) and now your code and spec only superficially look like one another.
> the problem with waterfall wasn't the detailed spec
The detailed spec is exactly the problem with the waterfall development. The spec presumes that it is the solution, whereas Agile says “Heck, we don't even understand our problem well, let alone understanding a solution to it.”
Beginning with a detailed spec fast with an LLM already puts you into a complex solution space, which is difficult to navigate compared to a simpler solution space. Regardless of the iteration speed, waterfall is the method that puts you into a complex space; agile is the one where you begin with smaller spaces and arrive at a solution.
> How can you even develop something if you don’t have a clear idea what you’re building?
But the statement "we don't even understand our problem well" is typically correct. In most cases where new software is started, the problem isn't well-defined or amenable to off-the-shelf solutions. And you will never know as little about the problem as you do on day one. Your knowledge will only grow.
It is more useful to acknowledge this reality and develop coping strategies than to persist in denial of it. At the time that the agile manifesto was written, the failure of "big up-front design" was becoming plainly evident. You think that you know the whole spec, and then it meets reality much as the Titanic met an iceberg.
Agile does not say "no design, no idea"; it points out things that are more valuable than doomed attempts at "100% complete design and all the ideas before implementation", e.g. "while there is value in (comprehensive documentation, following a plan), we value (working software, responding to change) more" (see https://agilemanifesto.org/).
In other words, start by doing enough design, and then some working software to flush out the flawed thinking in the design. And then iterate with feedback.
You have an idea, but typically you have neither a complete understanding nor a detailed view of the solution, and of course things tend to change over time.
That's the key benefit of starting small and of iterating: it allows you to learn and to improve. You don't learn anything about your problem and solution by writing a comprehensive design spec upfront.
The delay is just irrelevant. It has nothing to do with it working or not.
>b) no (cheap) way to iterate or deliver outside the spec
You could always do this in a waterfall project. Just make whatever changes to the code and ship. The problem is the same for SDD: as soon as you want quick changes, you have to abandon the spec. Iterating the spec and the code quickly is impossible for any kind of significantly complex project.
Either the spec contains sufficient detail to make implementation feasible, and iteration times become long and the process of any change becomes tedious and complex, or the spec is insufficient in describing the complexity of the project, which makes it insufficient to guide an LLM adequately.
There is a fundamental contradiction here, which LLMs cannot resolve. People like SDD for exactly the reason managers like waterfall.
The detailed design spec is an issue hence Agile's "working code over comprehensive documentation". Your two points are consequences of this.
"Heavy documentation before coding" (article) is essentially a bad practice that Agile identified and proposed a remedy to.
Now the article is really about AI-driven development, in which the AI agent is a "code monkey" that must be told precisely what to do. I think the interesting thing here will be to find the right balance... IMHO this works best when using LLMs only for small bits at a time instead of trying to specify the whole feature or product.
The Agile manifesto states exactly what I wrote. It's not that comprehensive documentation isn't valuable, it's that working software is more valuable.
In addition, the big issue is when the comprehensive documentation is written first (as in waterfall) because it delays working software and feedback on how well the design works. Bluntly, this does not work.
That's why I think it is best to feed LLMs small chunks of work at a time and to keep the human dev in the driver's seat to quickly iterate and experiment, and to be able to easily reason about the AI-generated code (who will do maintenance?).
The article seems to miss many of those points.
IMHO a good start is to have the LLM prompt be a few lines at most and generate about 100 lines of code so you can read it and understand it quickly, tweak it, use it, repeat. Not even convinced you need to keep a record of the prompt at all.
> That's why I think it is best to feed LLMs small chunks of work at a time and to keep the human dev in the driver's seat to quickly iterate and experiment, and to be able to easily reason about the AI-generated code
REPL development and Live programming is similar to that. But when something works, it stays working. Even with the Edit-Compile-Run cycle, you can be very fast if the cycle is short enough (seconds). I see people going all in with LLMs (and wishing for very powerful machines) while ignoring other tools that could give better return on a 5 year old laptop.
Yeah, but can't we expect bureaucratic companies to adopt such a methodology exactly like that: write a spec for years, run the LLM agent every 6 months, blame the techs for the bad result, iterate, and also forbid coding outside of the spec?
I’ve done spec driven development using a bunch of markdown files. That works fine but what I have found really works is using beads: https://github.com/steveyegge/beads
I’m letting the agent help me draft the specs anyway and I found that the agent is a lot more focused when it can traverse a task tree using beads.
It’s the one spec or planning tool that I find really helps get things done without a bunch of intervention.
Another technique I employ is to require each task to be TDD. So every feature has two tasks: write tests that fail, then implement the feature and don't notify me until the tests pass. Then I ask the agent to tell me how to review the task, and I require that I review every task before moving to the next one. I love this process because the agent tells me exactly what commands to run to review the task. Then I do a code review and ask it questions. Reading agent code is exhausting, so I try to make the tasks as discrete and minimal as possible.
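Concretely, a feature's task pair reads something like this (my phrasing, not beads syntax):

    Task A: write unit + integration tests for the search endpoint;
            stop when they fail for the right reason, and show me the failures.
    Task B: implement the endpoint until Task A's tests pass; don't notify me
            until they do, then list the exact commands I should run to review.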
These are simple techniques that humans employ during development and I find it worked very well.
There are also times when I need to write some docs to help me better understand the problem and I usually just dump those in a specs folder.
I think spec-kit is an interesting idea but too heavy handed. Just use beads and you’ll see what I mean.
Another technique I employed for a fully vibed tool (https://github.com/neurosnap/zmx) is to have the agent get as far as possible in a project and then I completely rewrite it using the agent code purely as a reference.
Bit of a tangent, but this reminds me of a video[1] I watched a bit ago where there was someone who interviewed 20 or so people who were both engineers and programmers, and asked them what the two fields could learn from each other. One of the things it mentioned from the perspective of a physically-based engineer is that a little more up-front planning can make a big difference, and that's stuck with me ever since.
This is a weird article. How many times in your career have you been handed a grossly under-specified feature and had to muddle your way through, asking relevant people along the way and still being told at the end that it’s wrong?
This is exactly the same thing but for AIs. The user might think that the AI got it wrong, except the spec was under-specified and it had to make choices to fill in the gaps, just like a human would.
It’s all well and good if you don’t actually know what you want and you’re using the AI to explore possibilities, but if you already have a firm idea of what you want, just tell it in detail.
Maybe the article is actually about bad specs? It does seem to venture into that territory, but that isn’t the main thrust.
Overall I think this is just a part of the cottage industry that’s sprung up around agile, and an argument for that industry to stay relevant in the age of AI coding, without being well supported by anything.
I sometimes wonder how many comments here are driving a pro AI narrative. This very much seems like one of those:
The agent here is:
Look on HN for AI-skeptical posts. Then write a comment that highlights how the human got it wrong. And command your other AI agents to upvote that reply.
It has nothing to do with AI, the article is just plain wrong. You have to be either extremely dumb, extremely inexperienced or only working solo to not understand this.
All those small micro decisions, discussions, and dead ends can be recorded and captured by the AI. If you do something that doesn’t make sense given past choices, it can ask you.
Gradually, over time, it can gather more and more data that only lives in your brain at the time you’re building. It’s only partially captured by git commits but mostly lost to time.
Now, when you change code, the system can say, “Jim wrote that 5 years ago for this reason. Is the reason not valid anymore?”. You might get this on a good code review, but probably not. And definitely not if Jim left 2 years ago.
That's why in my workflow I don't write single monster specs. Rather, I work with the LLM to iterate on small, individual, highly constrained specs that provide useful context for what/why/how -- stories, if you will -- that include a small set of critical requirements and related context -- the criteria by which you might "accept" the work -- and then I build up a queue of those "stories" that form a, you might say, backlog of work that I then iterate with the LLM to implement.
I then organize that backlog so that I can front-load uncovering unknowns while delivering high-value features first.
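A typical "story" in my queue is small, on the order of this (invented example):

    Story: audit log for permission changes
    Why: compliance asked who granted admin to whom, and we couldn't answer.
    Critical requirements:
    - every role change writes an immutable row (actor, target, old, new, at)
    - the log write happens in the same transaction as the change itself
    Accept when: granting/revoking in the UI produces queryable rows, and a
    failed log write rolls back the change.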
This isn't rocket science.
By far the biggest challenge I experience is compounding error during those iterative cycles creating brittleness, code duplication, and generally bad architecture/design. Finding ways to incorporate key context or other hints in those individual work items is something I'm still sorting out.
There’s nothing keeping you from scoping the spec to an agile package of work. To the contrary: even if you start with a full spec for a multi-day-AI-coding-session, you are free to instruct it to follow agile principles. Just ask it to add checkpoints at which you want to be able to let users test a prototype, or where you want to revisit and discuss the plan.
I have similar feelings. I’m willing to believe there are scenarios where this kind of thing makes sense, maybe – a colleague has had great success on a small, predictable greenfield project with it. I don’t work on many of those. My main objections are that I’ve had plenty of success with LLMs without intermediate detailed specs (and I don’t think my failures would have been helped by them), and I just don’t like the idea of primarily reviewing specs. Some sort of plan or research document is a different matter - that’s fine. But the kind of code-like formalised spec thing? I want to look at code, it’s just easier. Plus, I’m going to be reviewing the code too (not doing so is irresponsible in my opinion), so having spec AND code is now double the text to read.
The part of the process the actually needs improving, in my experience in larger codebases, is the research phase, not the implementation. With good, even quite terse research, it’s easy to iterate on a good implementation and then probably take over to finish it off.
I really think LLMs and their agent systems should be kept in their place as tools, first and foremost. We’re still quite early in their development, and they’re still fundamentally unreliable, that I don’t think we should be re-working over-arching work practices around them.
I've found that SDD is actually what you need to be able to work with codebases once they go above around 100,000 lines of code. It's what unlocked getting LLMs to work well with large codebases for me.
I still don't get it, can you clarify? It's not the research phase that I'm disputing. Clearly for a large codebase, you need some good way to take all that information (code, product knowledge) and distill it down to something that can fit in the context, ready for implementation. And it's that research that is going to get harder the bigger the codebase. (My current experience is with a repo around 1.5 million lines.) I'm saying that the output of that research, in my experience, doesn't need to be anything like the detail of an exact spec. It can be a sort of one-to-two-pager Markdown doc, at most – and any further detail is much more ergonomic for me to iterate over in the form of code.
Yeah, I think you are right, and I am also finding that larger apps built using SDD steadily get harder to extend.
> For large existing codebases, SDD is mostly unusable.
I don't really agree with the overall blog post (my view is all of these approaches have value, and we are still too early on to find the One True Way) but that point is very true.
A point I like to make in discussions like this is that software and hardware specifications are very different. We think of software as the thing we're building. But it's really just a spec that gets turned into the thing we actually run. It's just that the building process is fully automated. What we do when we create software is creating a specification in source code form.
Compared to what an architect does when they create a blueprint for a building, creating blueprints for software source code is not a thing.
What in waterfall is considered the design phase is the equivalent of an architect doing sketches, prototypes, and other stuff very early in the project. It's not creating the actual blueprint. The building blueprint is the equivalent of source code here. It's a complete plan for actually constructing the building down to every nut and bolt.
The big difference here is that building construction is not automated; it is costly and risky. So architects try to get their blueprint to a level where they can minimize all of that cost and risk. And you only build the bridge once. So iterating is not really a thing either.
Software is very different; compiling and deploying is relatively cheap and risk free. And typically fully automated. All the effort and risk is contained in the specification process itself. Which is why iteration works.
Architects abandon their sketches and drafts after they've served their purpose. The same is true in waterfall development. The early designs (whiteboard, napkin, UML, brainfart on a wiki, etc.) don't matter once the development kicks off. As iterations happen, they fall behind and they just don't matter. Many projects don't have a design phase at all.
The fallacy that software is imperfect as an engineering discipline because we are sloppy with our designs doesn't hold up once you realize that essentially all the effort goes into creating hyper detailed specifications, i.e. the source code.
Having design specifications for your specifications just isn't a thing. Not for buildings, not for software.
We could just stop calling it an engineering discipline. You've laid out plenty of reasons why it is nothing like an engineering discipline in most contexts where people write software.
Real software engineering does exist. It does so precisely in places where you can't risk trying it and seeing it fail, like control systems for things which could kill someone if they failed.
People get offended when you claim most software engineering isn't engineering. I am pretty certain I would quickly get bored if I was actually an engineer. Most real-world non-software engineers don't even really get to build anything; they're just there to check designs/implementations for potential future problems.
Maybe there are also people in the software world who _do_ want to do real engineering and they are offended because of that. Who knows.
> it's really just a spec that gets turned into the thing we actually run. It's just that the building process is fully automated. What we do when we create software is creating a specification in source code form.
Agree. My favourite description of software development is specification and translation - done iteratively.
Today, there are two primary phases:
1. Specification by a non-developer and the translation of that into code. The former is led by BAs/PMs etc. and the output is feature specs/user stories/acceptance tests etc. The latter is done by developers: they translate the specs into code.
2. The resulting code is also, as you say, a spec. It gets translated into something the machine can run. This is automated by a compiler/interpreter (perhaps in multiple steps, e.g. when a VM is involved).
There have been several attempts over the years to automate the first step. COBOL was probably the first; since then we've had 4GLs, CASE tools, UML among others. They were all trying to close the gap: to take phase 1 specification closer to what non-developers can write - with the result automatically translated to working code.
Spec-driven development is another attempt at this. The translator (LLM) is quite different to previous efforts because it's non-deterministic. That brings some challenges but also offers opportunities to use input language that isn't constrained to be interpretable by conventional means (parsers implementing formal grammars).
We're in the early days of spec-driven. It may fail like its predecessors or it may not. But to first order, there's nothing sacrosanct about the use of 3rd-generation languages as the means to represent the specification. The pivotal challenge is whether the starting specification can be reliably translated into working software.
Where I work we do (for high assurance software) systems specifications, systems design, software specifications and software design and ultimately source code.
That said, there is a bit of redundancy between software design and source code. We tend to get rid of developing the latter rather than the former, though, i.e. by having the source code generated by some modelling tool.
I just feel that the community has learned nothing.
Opinions about "what works" being pushed as fact. No evidence, no attempt to create evidence (because it's hard). Endless commentary and opinion pieces, naive people being coached by believers into doing things that seem to work on specific examples.
If you have an example that worked for you it doesn't mean that it's a useful way to work for everyone else in every other situation.
I think I've seen enough of a trend: all these LLM ideas eventually get absorbed by the LLM provider and integrated. The OSS projects or companies with products eventually become irrelevant.
So they're more like 3rd party innovations to lobby LLM providers to integrate functionalities.
X prompting method/coding behaviors? Integrated. Media? Integrated. RAG? Integrated. Coding environment? Integrated. Agents? Integrated. Spec-driven development? It's definitely present, perhaps not as formal yet.
But specs are per feature; it’s just an upfront discussion first, like you’d have on many things, rather than questions -> immediate code writing from a model.
Thank you for writing this. It was also my first impression after seeing not only spec driven dev, but agentic systems that try to mimic human roles and processes 1:1. It feels a bit like putting a saddle on an automobile so that it feels more familiar.
It's a nice observation that Spec-Driven Development essentially implements the waterfall model.
Personally, I tried SDD, consciously trying to like it, but gave up. I find writing specs much harder than writing code, especially when trying to express the finer points of a project. And of course, there is also that personal preference: I like writing code, much more than text. Yes, there are times where I shout "Do What I Mean, not what I say!", but these are mostly learning opportunities.
You can do SDD in waterfall or Agile. The interaction I’m having with users right now is that the feedback and iterations are happening within hours or days instead of the usual 1-2 week sprint length. In my case SDD is enabling my team to be hyper-agile.
I have no idea how to reconcile sdd and waterfall. With SDD you’re working per feature, right? Waterfall is speccing the entire project upfront and with a strong force against any changes as you go before a final delivery.
"Agile methodologies killed the specification document long ago. Do we really need to bring it back from the dead?"
Not at FAANG. Or at least not at Google where I was for 10 years. They were obsessed with big upfront PRDs and design docs, and they were key to getting promotion and recognition.
These days those kinds of documents -- which were laborious to produce, mostly boilerplate, and a pain to maintain, and often not really read by anybody other than promo committee -- could be produced easily by prompting an LLM.
Having drunk from the wellspring of XP and agile early in my career, I found this continually frustrating. Actual development followed iterative practices, but not officially.
Developers spend most of their time reading long Markdown files, hunting for basic mistakes hidden in overly verbose, expert-sounding prose. It’s exhausting.
Those requirements exist regardless of whether you write them down in Markdown files or not. Spec-driven Development is just making what needs to be built explicit rather than assuming the whole team know what the code should do.
Practically every company already uses 'spec-driven development', just with incredibly vague specs in the form of poorly written Jira tickets. Developers like it because it gives them freedom to be creative in how they interpret what needs to be done, plus they don't need to plan things and their estimates can be total nonsense if they want, and Product Owners and BAs like it because it means they can blame shift to the dev team if something is missed by saying "We thought that was obvious!"
Every team should be capturing requirements at a level of detail that means they know how the code should work. That doesn't need to be done up front. You can iterate. Requirements are a thing that grow with a project. All that spec-driven development is doing is pushing teams to actually write them down.
You can get away with incredibly vague "specs" in tickets most of the time because of a shared understanding of the system and product – of the kind that an LLM won't have (without very careful and well-organised in-repo documentation). When it works, which in my experience is quite a lot, it's very efficient. Sometimes it fails, sure. Sometimes it's annoying because you actually do need more detail than is provided and have to ask for it in a conversation. In my experience that's a low cost because it happens much less often and isn't too laborious when it does.
But crucially, the details here are coming from the issue authors. Do you really think that issue authors are going to be reviewing LLM-generated specs? I don't think so. And so engineers will be the intermediary. If that's going to be me, I would rather mediate between the issue author, some kind of high-level plan, and code. Not the issue author, a high-level plan, code-like specs, and code. There is one extra layer in the latter that I don't see the value of.
> Developers like it because it gives them freedom to be creative in how they interpret what needs to be done, plus they don't need to plan things and their estimates can be total nonsense if they want
I like it because it moves me closer to the product, the thing actually being built. You seem to be asking to move the clock back to where there was a stricter division of labour. Maybe that's necessary in some industries, but none that I've worked in.
> Coding assistants are intimidating: instead of an IDE full of familiar menus and buttons, developers are left with a simple chat input. How can we ensure that the code is correct with so little guidance?
Isn't this what Kiro IDE is about? Spec-driven dev?
Most software developers are doomed to rediscover time and time again that 4 weeks of frantic development save two hours of calmly thinking about the task.
Worked in a company where they spent 6 months waiting for a vendor to implement a custom hardware solution instead of just optimizing the software to run faster in a month, no joke.
I expected testing and verification to come before the release, so the fact that you don't have any users yet was implied. As for produced hardware: when it's only prototypes, it should still be cheap to change; hardware also needs multiple revisions before all the bugs are ruled out.
When that is not the case, working without a spec won't help either.
I was thinking more about my experience in corporate settings.
Hardware needs to be procured or implemented in the cloud - there's a lot of work on the architectures and costs early in projects so as to ensure that things will cost in. Changing that can invalidate business cases, and also can be very difficult due to architectural and security controls.
In terms of users, in corporates the user communities must be identified, trained, sometimes made redundant, sometimes given extra responsibilities. Once you have got this all lined up, any changes become very hard, because suddenly, like ripples when a pebble is dropped into a lake, everyone who's touched has a reason why they are going to miss targets (you are that reason) and therefore wants a 100% bonus (there is no money for a 100% bonus for all).
In previous jobs I would have delighted in pointing out that if there are no users the system can't be funded!
I agree that working without a spec is madness, it's just not realistic in the real world either. People expect you to stand behind a commitment to deliver, and they also want to know what they are paying for. However, things do change, both really (as in something happens and the system must now accommodate it) and also due to discovery (we didn't know, we couldn't have known, but now we know and must accommodate this knowledge). It's really important to factor this in, although perfect flexibility is infinitely expensive and completely unrealistic...
A bit of flex can be cheap, easy and a lifesaver though.
Much of the hype around SDD is really about developers never having experienced a real waterfall project.
Of course SDD/Waterfall helps the LLM/outsourced labor to implement software in a predictable way. Waterfall was always a method to please managers, and in the case of SDD the manager is the user prompting the coding agent.
The problem with SDD/Waterfall is not the first part of the project. The problems come when you are deep into the project, your spec is a total mess and the tiniest feature you want to add requires extremely complex manipulation of the spec.
The success people are experiencing is the success managers have experienced at the beginning of their software projects. SDD will fail for the same reason Waterfall has failed: the constantly increasing complexity required to keep code and spec consistent cannot be managed by LLM or human.
For myself, I found that having a work methodology similar to spec-driven development is much better than vibe coding. The agent makes fewer mistakes, it stays on the path, and I have fewer issues to fix.
And while at it, I found out that using TDD also helps.
I, for one, welcome the fact that agile/scrum/daily standup/etc. rituals will become outdated. While they might be somewhat useful in some software development projects, in the past 10 years this turned into a cult of lunatics who want to apply it to any engineering work, not just software, and who think any other approach will result in bad outcomes and less productivity. Can't wait for the "open office" BS to die next too, literally a boomer mindset that came from government offices back in the day, and they think it's more productive that way.
Very valid. In the beginning all this was driven by developers. Then it was LinkedIn-ified and suddenly we had to deal with agile coaches: essentially people with no tech qualifications playing with developers as guinea pigs, without understanding the why.
Same is true for UX and DevOps: just create a bunch of positions based on some blog post, and congratulate yourself on a job well done. Screwing over the developers (engineers) as usual. Even though they actually might be interested in those jobs.
This is the main problem with big tech informing industry decisions: they win because they make sure they understand what all of this means. For all other companies this just creates a mess and the frustration you mention.
> Can't wait for the "open office" BS to die next too, literally a boomer mindset that came from government offices back in the day, and they think it's more productive that way.
Open office is the densest and cheapest office layout. That is the reason it exists and the reason it will persist. All other reasons are inferior.
Ah, another Dunning-Kruger-bait post (someone who doesn't know what they're talking about writes a blog post very confidently, and it's read and upvoted by people who also don't know what they're talking about). Very good.
The immediate pooh-poohing of Waterfall is the big tell here. If they don't give you an example of an actual Waterfall project they've worked on, or can't elucidate why it wasn't just that one project or organization that made Waterfall bad, they're likely parroting myths or a single anecdotal experience. And that bad experience was likely based on not understanding it to begin with (Waterfall in particular is the subject of many myths and lies). I've had terrible Agile experiences. Does that make Agile terrible?
In my experience, Agile has a tendency to succeed despite itself. Since you don't do planning, you just write one bit at a time. But of course eventually this doesn't work, so you spend more time rearchitecting, rewriting, and fixing things. But hey, look, you made something useful! ....it still isn't making the company any money yet, but it's a thing you can see, so everyone feels better. You can largely do this work by yourself, so you can move fast; until you need something controlled by someone else, at which point you show up at the 11th hour, and dump a demand on their desk that must be finished immediately. Often these recipients have no choice, because the organization needs the thing you're slapping together, and they're "being a blocker". And those recipients then can't accomplish what they need to, because they haven't been given any documentation to know what to do. Bugs, rushed deadlines (or worse, no deadlines), dead-cats-over-the-wall, wasted effort, dysfunction. Is this the only way to do Agile? Of course not. But it's easy for me to paint the entire model this way, based on my experience.
There does not exist a project management method which is inherently bad. I repeat: No formal project management method is bad. Methods are simply frameworks by which you organize and execute work. You will not be successful just because you used a framework. You still have to do the organizing and execute the work in a not-terrible-way. You have a lot of wiggle room about how things are done, and that is what determines the outcome. You can do things quickly or slow, skillfully or shoddily, half-assed or competently, individually or collaboratively. It's how you take each step, not which step you take.
As long as humans are at the reins, it doesn't matter what method you use. The same project, with the same method, can either go to shit, or turn out great. The difference between the two is how you use the methods. You have to do the work, and do it well. In organizations, with other humans, that's often very difficult, because it means you depend on something outside of your control. So leadership, skill, and a collaborative, positive culture, are critical to getting things done.
You can't code without specifications, period. Specifications can take various forms, but ultimately they define how your program should work.
The problem with what people call "Waterfall" is that there is an assumption that at some point you have a complete and correct spec and you code off of that.
A spec is never complete. Any methodology applied in a way that does not allow you to go back to revise and/or clarify specs will cause trouble. This was possible with waterfall and is more explicitly encouraged with various agile processes. How much it actually happens in practice differs regardless of how you name the methodology that you use.
This is a Tyranny of Structurelessness sort of thing. In that essay there is a structure in structureless organizations but it's implicit and unarticulated which makes it inscrutable, potentially more malleable (which could be good), but also constricting to those who don't know the actual structure when they try and accomplish things.
If you don't have explicit specifications (which don't have to be complete before starting to develop code), you still have specs, but they're unarticulated. They exist in the minds of the developers, managers, customers, and what you end up with is a confused mess. You have specs, but you don't necessarily know what they are, or you know something about them but you've failed to communicate them to others and they've failed to communicate with you.
And TDD would say "start with a failing test, it is an executable specification for the code that you will need to make it pass. Then make it pass, then refactor to make nice code that still passes all tests - red, green, refactor under tests"
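In code, that loop can start as small as this (a pytest-style sketch; the `calculator` module and `is_even` function are hypothetical):

```python
# Red: the failing test is the executable specification.
def test_is_even():
    from calculator import is_even  # hypothetical module under test
    assert is_even(2) is True
    assert is_even(3) is False
    assert is_even(0) is True

# Green: the simplest implementation that passes, e.g. in calculator.py:
#     def is_even(n: int) -> bool:
#         return n % 2 == 0
# Refactor: clean up while the test stays green.
```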
In the end, there is project management that can keep a project on track while also being able to adapt to change, and there is project management that can't and chooses to hide behind some bureaucratic process. That has always existed and will keep existing no matter what you call it.
>The difference is purely in how much spec you write before you start implementing.
Ah, and therein lies the problem.
I’ve seen companies frequently elect “none at all” as the right amount of spec to write.
I’d rather have far too many specs than none.
I wrote this, a few years ago, about being careful to avoid "concrete galoshes"[0].
I've found that it's a balancing act, like so many things in software development. We can't rush in willy-nilly, but it's also possible to kill the project by spending too much time preparing (think "The Anal-Retentive Chef" skits from Saturday Night Live).
Also, I have found that "It Depends" is an excellent mantra for life, in general, and software development, in specific.
I think having LLM-managed specs might be a good idea, as it reduces the overhead required to maintain them.
[0] https://littlegreenviper.com/various/concrete-galoshes/
I for one really like developing and co-managing specs with an LLM.
I think it’s a great conversational tool for evaluating and shaking out weak points in a design at an early phase.
> One of the biggest productivity improvements I've had as a developer was to make a habit of planning all my work upfront. Specifically, when I pick up a ticket, I break it down into a big bullet point list of TODOs.
You're describing Agile.
How an individual developer chooses to process a single ticket is completely unrelated to agile or waterfall. Agile is about structuring your work over a project so that you get a happy customer: one who ends up with what they actually needed, not with what they thought they wanted when they signed the contract, which turned out two months in to be completely not what they needed.
I agree with you. I think this is the Plan step in a "Plan-Act-Reflect" loop.
Just to further clarify what I said above: What I talk about here specifically is for the developer who picks up a ticket to flesh out and plan the ticket before diving into code.
This doesn't say anything about what is appropriate for larger project planning. I don't have much experience doing project planning, so I'd look to others for opinions on that.
This is how I approach my stories as well. I used to call this “action plan” way before it became fashionable with the rise of AI agents.
It helps me not only reduce the complexity into more manageable chunks but also go back to the business team to smooth out the rough edges which would otherwise require rework after review.
It doesn’t seem like you read the article, which argues against this sort of pre-planning.
The article itself advocates that one should "split complex requirements into multiple simple ones." So I don't disagree here, at least I don't think I do.
If we have a differing interpretation of what the article is motivating for, then please take the opportunity to contemplate an additional perspective and let it enrich your own.
This all seems like a re-hash of the top-down vs bottom-up arguments from before the 90’s (which were never resolved either).
There are two extremes, having everything you do planned up front, and literally planning nothing and just doing stuff.
The power of agile is supposed to be "I don't need to figure this out now, I'll figure it out based on experimentation" which doesn't mean nothing at all is planned.
If you're not planning a mission to Jupiter, you don't need every step planned out before you start. But in broad strokes it's also good to have a plan.
The optimum is to have some recorded shape of the work to come but to give yourself space to change your mind based on your experiences and to plan the work so you can change the plan.
The backlash against waterfall is the result of coming up with very detailed plans before you start, having those plans change constantly during the whole project (requiring you to throw away large amounts of completed work), and, when you find things that need to change, not being able to change them because management has decided on The Plan (which they will revise later, but you can't change a thing).
For some decisions, the best time to plan is up front, for other decisions the best time to design is while you're implementing. There's a balance and these things need to be understood by everybody, but they are generally not.
I like to split out exploration and discovery (research) as a third step (the first step) in the process. Before a plan can be devised, research needs to be conducted. The more time that passes between research, planning, and execution, the greater the likelihood of rework or failure.
The best time to plan is dependent on how stable/unstable the environment is.
I vibe coded for months but switched to spec-driven development in the last 6 months.
I'm also old enough to have started my career learning the rational unified process and then progressed through XP, agile, scrum etc
My process is I spend 2-3 hours writing a "spec" focusing on acceptance criteria and then by the end of the day I have a working, tested next version of a feature that I push to production.
I don't see how using a spec has made me less agile. My iteration takes 8 hours.
However, I see tons of useless specs. A spec is not a prompt. It's an actual definition of how to tell if something is behaving as intended or not.
People are notoriously bad at thinking about correctness in each scenario which is why vibe coding is so big.
People defer thinking about what correct and incorrect actually looks like for a whole wide scope of scenarios and instead choose to discover through trial and error.
I get 20x ROI on well-defined, comprehensive, end-to-end acceptance tests that the AI can run. They fix everything from big-picture functionality to minor logic errors.
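To make "acceptance test" concrete: the kind of test meant here pins down observable behaviour, not implementation details (a sketch; the `api` client fixture and the endpoint are hypothetical):

```python
# test_signup_acceptance.py -- pytest-style; `api` is a hypothetical HTTP client fixture.
def test_duplicate_signup_is_rejected(api):
    first = api.post("/signup", json={"email": "a@example.com", "password": "hunter22"})
    assert first.status_code == 201

    # Correct behaviour is a conflict, not a silently created second account.
    second = api.post("/signup", json={"email": "a@example.com", "password": "other"})
    assert second.status_code == 409
    assert "already registered" in second.json()["error"]
```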
I'll probably be proven wrong eventually, but my main thought about spec-driven dev with LLMs is that it introduces an unreliable compiler. It will produce different results every time it is run, and it's up to the developer to review the changes, which just seems like a laborious, error-prone task.
No, this is the right take. Spec driven development is good, but having loose markdown "specs" that leave a bunch up to the discretion of the LLM is bad. The right approach is a project spec DSL that agents write, which can be compiled via codegen in a more controlled way.
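As a toy illustration of that idea (nothing standard -- the schema here is invented for the sketch): the point is a closed, mechanically checkable vocabulary rather than freeform Markdown the LLM can interpret loosely.

```python
# A toy spec "DSL" embedded in Python, constrained enough to validate mechanically.
SPEC = {
    "entity": "User",
    "fields": {"email": "str", "created_at": "datetime"},
    "invariants": ["email is unique"],
}

ALLOWED_KEYS = {"entity", "fields", "invariants"}

def validate(spec: dict) -> None:
    # Reject anything outside the closed vocabulary -- no room for improvisation.
    unknown = set(spec) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown spec keys: {unknown}")

validate(SPEC)
```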
Why would you want to rerun it? In that context a human is also an unreliable compiler. Put two humans on the task and you will get two different results. Even putting the same human on the same task again will yield something different. LLMs producing unreliable output that can't be reproduced is definitely a problem but not in this case.
Might be misunderstanding the workflow here, but I think if a change request comes in and I alter the spec, I'd need to re-run the LLM bit that generates the code?
You'd want to have the alteration reference existing guides to the current implementation.
I haven't jumped in headfirst to the "AI revolution", but I have been systematically evaluating the tooling against various use cases.
The approach that tends to have the best result for me combines a collection of `RFI` (request for implementation) markdown documents to describe the work to be done, as well as "guide" documents.
The guide documents need to keep getting updated as the code changes. I do this manually but probably the more enthusiastic AI workflow users would make this an automated part of their AI workflow.
It's important to keep the guides brief. If they get too long they eat context for no good reason. When LLMs write for humans, they tend to be very descriptive. When generating the guide documents, I always add an instruction to tell the LLM to "be succinct and terse", followed by "don't be verbose". This makes the guides into valuable high-density context documents.
The RFIs are then used in a process. For complex problems, I first get the LLM to generate a design doc, then an implementation plan from that design doc, then finally I ask it to implement it while referencing the RFI, design doc, impl doc, and relevant guide docs as context.
If you're altering the spec, you wouldn't ask it to regen from scratch, but use the guide documents to compute the changes needed to implement the alteration.
I'm using claude code primarily.
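A sketch of what such a guide-regeneration instruction might look like (the paths and wording here are illustrative, not a fixed template):

```python
# Sketch: assembling a guide-regeneration prompt; the paths are hypothetical.
guide_prompt = """\
Update docs/guides/payments.md to reflect the current code in src/payments/.
Be succinct and terse. Don't be verbose.
Keep it a high-density map: key modules, data flow, invariants, gotchas.
Omit narrative prose and anything derivable from reading a single file.
"""
```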
Hm, maybe it's me who misunderstands the workflow. In that case I agree with you.
That said, I think the non-determinism when rerunning a coding task is actually pretty useful when you're trying to brainstorm solutions. I quite often rerun the same prompt multiple times (with slight modifications or using different models) and write down the implementation details that I like before writing the final prompt. When I'm not happy with the throwaway solutions at all I reconsider the overall specification.
However, the same non-determinism has also made me "lose" a solution that I threw out and where the real prompt actually performed worse. So nowadays I try to make it a habit to stash the throwaway solutions just in case. There's probably something in Cursor where you can dig out things you backtracked on but I'm not a power user.
You would need to rerun the LLM, but you wouldn't necessarily need to rebuild the codebase from scratch.
You can provide the existing spec, the new spec, and the existing codebase all as context, then have the LLM modify the codebase according to the updates to the spec.
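A minimal sketch of assembling that kind of delta prompt (the paths and wording are illustrative only):

```python
from pathlib import Path

# Sketch: hand the model both spec versions plus the code, and ask for a
# modification rather than a regeneration. Paths are hypothetical.
old_spec = Path("specs/checkout-v1.md").read_text()
new_spec = Path("specs/checkout-v2.md").read_text()

prompt = f"""The spec has changed. Old version:
{old_spec}

New version:
{new_spec}

Modify the existing codebase to satisfy the new spec with the smallest
coherent change set. Do not rewrite unrelated code; preserve public APIs.
List the files you intend to touch before editing anything."""
```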
Humans are unreliable compilers, but good devs are able to "think outside of the box" in terms of using creative ways to protect against their human foibles, while LLMs can't.
If I get a nonsensical requirement I push back. If I see some risky code I will think of some way to make it less risky.
You don't need this type of work to be deterministic. It doesn't really matter if the LLM names a function "IsEven" vs "IsNumberEven".
Have you ever written the EXACT same code twice?
> it introduces an unreliable compiler.
So then by definition so are humans. If compiling is "taking text and converting it to code", that's literally us.
> it's up to the developer to review the changes. Which just seems like a laborious error prone task.
There are trade-offs to everything. Have you ever worked with an off-shore team? They tend to produce worse code and have 1% of the context the LLM does. I'd much rather review LLM-written code than "I'm not even the person you hired because we're scamming the system" developers.
You want it to be as close to deterministic as possible to reduce the risk of the LLM doing something crazy like deleting a feature or functionality. Sure, the idea is for reviews to catch it but it's easier to miss there when there is a lot of noise. I agree that it's very similar to an offshore team that's just focused on cranking out code versus caring about what it does.
Could I see one of your specs as an example?
I shudder to think what they decide the correct interpretation of a spec written in prose is.
Lisp quotes are confusing in prose.
I would love to see a prompt where it fails such a thing. Do you have an example?
Still better than my coworkers ...
Seems like you are all just redefining what spec and waterfall means.
A spec came from a customer and would detail every feature. Specs would be huge, but usually lack enough detail or be ambiguous. They would be signed off by the customer and then you'd deliver to the spec.
It would contain months, if not years, worth of work. Then after all this work the end product would not meet the actual customer needs.
A day's work is not a spec. It's a ticket's worth of work, which is agile.
Agile is an iterative process where you deliver small chunks of work and the customer course-corrects at regular intervals. Commonly 3-4 week sprints, made up of many tickets that take hours or days, per course correction.
Generally each sprint had a spec, and each ticket had a spec. But it sounds like until now you've just been winging it, with vague definitions per feature. It's very common, especially where the PO or PM are bad at their job. Or the developer is informally acting as PO.
Now that you're making specs per ticket, you're just doing what many development teams already do. You're just bizarrely calling it a new process.
It's like watching someone point at a bicycle and insist it's a rocketship.
A customer generally provides requirements (the system should do...) which are translated into a spec (the module/function/method should do...). The set of specs maps to the requirements. Requirements may be derived from or represented by user stories, and specs may or may not be developed in an agile way or written down ahead of time. Whether you have or derive requirements and specs is entirely orthogonal to development methodology. People need to get away from the idea that having specs is any more than a formal description of what the code should do.
The approach we take is that specs are developed from the tests, and each test exercises its spec point in its entirety. That is, a test and a spec are semantically synonymous within the code base. An interesting thing we're playing with is using the specs alongside the signatures to have an LLM determine when the spec is incomplete.
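A tiny sketch of what "a test and a spec are semantically synonymous" can look like in practice (names invented for the example): each test name reads as a requirement sentence.

```python
# Each test is one spec point, named so it reads as a requirement.
class TestPasswordReset:
    def test_reset_link_expires_after_one_hour(self): ...
    def test_used_reset_link_cannot_be_reused(self): ...
    def test_reset_does_not_reveal_whether_email_exists(self): ...
```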
A spec consists of three different kinds of requirements: functional requirements, non-functional requirements, and constraints. It’s supposed to fully describe how the product responds to the context and the desires of stakeholders.
The problem I see a lot with Agile is that people over-focus on functional requirements in the form of user stories. Which in your case would be statements like “X should do…”
I don't necessarily disagree, but can you give an example of a non-functional requirement that influences the design?
Same. I fancy myself a decent technical communicator and architect. I write specs which consist of giant lists of acceptance criteria, on my phone, lying in bed...
Kick that over to some agents to bash on, check in and review here and there, maybe a little mix of vibe and careful corrections by me, and it's done!
Usually in less time, but! Any time an agent is working on work shit, I'm working on my race car... so it's a win-win-win to me. I'm still using my brain, no longer slogging through awful "human centered" programming languages, and I have more time for my hobbies.
Isn't that the dream?
Now, to crack this research around generative gibber-lang programming... 90% of our generative code problems are related to the programming languages themselves. Intended for humans, optimized for human interaction, speed, and parsing. Let the AIs design, speak, write, and run the code. All I care about is that the program passes my tests and does what I intended. I do not care if it has indents, or other stupid dogmatic aspects of what makes one language equally usable to any other, but no "my programming language is better!", who cares. Loving this era.
A lot of software process exists to solve a specific problem IMO:
Devs get married to their first implementation; Stakeholders don’t tolerate rework
If companies and individuals could throw more away, then we wouldn’t need to obsess over planning. The “spec” and “design” would get discovered through doing. I’ve never worked anywhere where a long up front design addressed the important design issues. Those get discovered after you’ve tried to implement a solution a few times and failed.
If we saw throwing work away as a feature rather than a bug, we'd probably work more efficiently.
This article is for those who have already made up their minds that "spec-based development" isn't for them.
I believe (and practice) that spec-based development is one of the future methodologies for developing projects with LLMs. At least it will be one of the niches.
Author thinks about specs as waterfalls. I think about them as a context entrypoint for LLMs. Given enough info about the project (including user stories, tech design requirements, filesystem structure and meaning, core interfaces/models, functions, etc.), the LLM will be able to build sufficient initial context for the solution and expand it by reading files and grepping text. And the most interesting part is that you can make the LLM keep the context/spec/project file updated each time it updates the project. Voilà: now you are in agile again: just keep iterating on the context/spec/project.
This is the key, with test driven dev sprinkled in.
You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate.
I model this as "grounding". Just like you need to ground an electrical system, you need to ground the LLM to reality. The tests do this, so they are REQUIRED for all LLM coding.
Once a framework is established, you require tests for everything. No code is written without tests. These can also be perf tests. They need solid metrics in order to output quality.
The tests provide context and documentation for future LLM runs.
This is also the same way I'd handle foreign teams, that at no fault of their own, would often output subpar code. It was mainly because of a lack of cultural context, communication misunderstandings, and no solid metrics to measure against.
Our main job with LLMs now as software engineers is a strange sort of manager, with a mix of solutions architect, QA director, and patterns expertise. It is actually a lot of work and requires a lot of people to manage, but the results are real.
I have been experimenting with how meta I can get with this, and the results have been exciting. At one point, I had well over 10 agents working on the same project in parallel, following several design patterns, and they worked so fast I could no longer follow the code. But with layers of tests, layers of agents auditing each other, and isolated domains with well defined interfaces (just as I would expect in a large scale project with multiple human teams), the results speak for themselves.
I write all this to encourage people to take a different approach. Treat the LLMs like they are junior devs or a foreign team speaking a different language. Remember all the design patterns used to get effective use out of people regardless of these barriers. Use them with the LLMs. It works.
> Once specs are captured as tests, the LLM can no longer hallucinate.
Tests are not a correctness proof. I can’t trust LLMs to correctly reason about their code, and tests are merely a sanity check, they can’t verify that the code was correctly reasoned.
> You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate.
Except when it decides to remove all the tests, change their meaning to make them pass, or write something not in the spec. Hallucinations are not a problem of the input given; they're in the foundations of LLMs, and so far nobody has solved them. Thinking it won't happen can and will have really bad outcomes.
You can solve this easily by having a separate agent write the tests, and not giving the implementing agent write permission on test files.
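One crude way to approximate that separation on a local checkout -- a sketch assuming POSIX-style file permissions, not a built-in feature of any agent tool:

```python
import stat
from pathlib import Path

# Strip write permission from test files before handing the working tree
# to the implementing agent; only the test-writing agent runs with them on.
for test_file in Path("tests").rglob("*.py"):
    mode = test_file.stat().st_mode
    test_file.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```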
It doesn't matter, because use of version control is mandatory. When things go missing or get bypassed, audit-instructed LLMs detect these issues and roll back the changes.
I like to keep domains with their own isolated workspaces and git repos. I am not there yet, but I plan on making a sort of local-first gitflow where agents have to pull the codebase, make a new branch, make changes, and submit pull requests to the main codebase.
I would ultimately like to make this a oneliner for agents, where new agents are sandboxed with specific tools and permissions cloning the main codebase.
Fresh-context agents then can function as code reviewers, with escalation to higher tier agents (higher tier = higher token count = more expensive to run) as needed.
In my experience, with correct prompting, LLMs will self-correct when exposed to auditors.
If mistakes do make it through, it is all version controlled, so rolling back isn't hard.
This is the right flow. As agents get better, work will move from devs orchestrating in ides/tuis to reactive, event driven orchestration surfaced in VCS with developers on the loop. It cuts out the middleman and lets teams collaboratively orchestrate and steer.
But do you understand the problem and its context well enough to write tests for the solution?
Take Prolog and logic programming. It's all about describing the problem and its context and letting the solver find the solution. Try writing your specs in pseudo-Prolog code and you will be surprised by all the missing information you're leaving up to chance.
I am not writing the tests, LLMs are.
My objective is to write prompts for LLMs that can write prompts for LLMs that can write code.
When there is a problem downstream in the descendant hierarchy, it is a failure of the parent LLM's prompts, so I correct it at the highest level and allow the fix to trickle down.
This eventually resolves into a stable configuration with domain expertise towards whatever function I require, in whatever language is best suited for the task.
If I have to write tests manually, I have already failed. It doesn't matter how skilled I am at coding or capable I am at testing. It is irrelevant. Everything that can be automated should be automated, because it is a force amplifier.
> Giving enough info about the project (including user stories, tech design requirements, filesystem structure and meaning, core interfaces/models, functions, etc)
What's not waterfall about this is lost on me.
Sounds to me like you're arguing waterfall is fine if each full run is fast/cheap enough, which could happen with LLMs and simple enough projects. [0]
Agile was offering incremental spec production, which had the tremendous advantage of accumulating knowledge incrementally as well. It might not be a good fit for LLMs, but revising the definition to make it fit doesn't help IMHO.
[0] Reminds me that reducing the project scopes to smaller runs was also a well established way to make waterfall bearable.
Waterfall with short iteration time is not possible by definition.
You might as well say agile is still waterfall: what are sprints if not waterfall with a 2-week iteration time? And Kanban is just a collection of independent waterfalls... It's not a useful definition of waterfall.
Just as most agile projects aren't Agile, most waterfall projects weren't strict Waterfall as it was preached.
That being said, when for instance you had a project that should take 2 years and involve a dozen teams, you'd try to cut it into 3 or 4 phases, even if it would only be "released" and fully tested at the end of it all. At least if your goal was to have it see the light of day in a reasonable time frame.
Where I worked we also did integration runs at given checkpoints to be able to iron out issues earlier in the process.
PS: on agile, the main specificity I'm seeing is the ability to infinitely extend a project, as the scope and specs are typically set on the go. Which is a feature if you're a contractor for a project. You can't do that with waterfall.
Most shops have a mix of pre-planning and on-the-go spec'ing to get a realistic process.
> Waterfall with short iteration time is not possible by definition.
What definition would that be?
Regardless, at this point it's all semantics. What I care about is how you do stuff, not the label you assign and in my book writing specs to ground the LLM is a good idea. And I don't even like specs, but in this instance, it works.
> What's not waterfall about this is lost on me.
Exactly. There is a spec, but there is no waterfall required to work on and maintain it. The author of the article dismissed spec-based development precisely because they saw a resemblance to waterfall. But waterfall isn't required for spec-centric development.
> There is a spec, but there is no waterfall required to work and maintain it.
The problem with waterfall is not that you have to maintain the spec, but that a spec is the wrong way to build a solution. So, it doesn't matter if the spec is written by humans or by LLMs.
I don't see the point of maintaining a spec for LLMs to use as context. They should be able to grep and understand the code itself. A simple readme or a design document, which already should exist for humans, should be enough.
> I don't see the point of maintaining a spec for LLMs to use as context. They should be able to grep and understand the code itself.
“I don’t see the point of maintaining a documentation for developers. They should be able to grep and understand the code itself”
“I don’t see the point of maintaining tests for developers. They should be able to grep and understand the code itself”
“I don’t see the point of compilers/linters for developers. They should be able to grep and find issues themselves”
The thing is, the parallels you are drawing are to things that are very explicitly not the source of the code, but exist alongside it. Code is the ultimate truth. Documentation is a more humane way to describe it. Tests are there to ensure that what is there is what we want. And linters are there to warn us of specific errors. None of these create code.
To go from spec to code requires a lot of decisions (each introducing technical debt). Automating the process removes control over those decisions and over the ultimate truth that is the code. But why can't the LLM retain a trace of those decisions so that it presents control points for altering the results? Instead, it's always a rewrite from scratch.
> “I don’t see the point of maintaining a documentation for developers. They should be able to grep and understand the code itself”
I cannot believe this comment was made in good faith, when I clearly wrote above that documentation should already exist for humans:
> A simple readme or a design document, which already should exist for humans, should be enough.
I see rapid, iterative Waterfall.
The downfall of Waterfall is that there are too many unproven assumptions in too long of a design cycle. You don't get to find out where you were wrong until testing.
If you break a waterfall project into multiple, smaller, iterative Waterfall processes (a sprint-like iteration), and limit the scope of each, you start to realize some of the benefits of Agile while providing a rich context for directing LLM use during development.
Comparing this to agile is missing the point a bit. The goal isn't to replace agile, it's to find a way that brings context and structure to vibe coding to keep the LLM focused.
"rapid, iterative Waterfall" is a contradiction. Waterfall means only one iteration. If you change the spec after implementation has started, then it's not waterfall. You can't change the requirements, you can't iterate.
Then again, Waterfall was never a real methodology; it was a straw man description of early software development. A hyperbole created only to highlight why we should iterate.
> Then again, Waterfall was never a real methodology; it was a straw man description of early software development. A hyperbole created only to highlight why we should iterate.
If only this were accurate. Royce's chart (at the beginning of the paper, what became Waterfall, but not what he recommended by the end of the paper) has been adopted by the DOD. They're slowly moving away from it, but it's used on many real-world projects and fails about as spectacularly as you'd expect. If projects deliver on-time, it's because they blow up their budget and have people work long days and weekends for months or years at a time. If it delivers on budget, it's because they deliver late or cut out features. Either way, the pretty plan put into the presentations is not met.
People really do (and did) think that the chart Royce started with was a good idea. They're not competent, but somehow they got into positions in management where they could force this stupidity.
I would maybe argue that there is a sweet spot of how much you feed in (with some variability depending on task). I tend to keep my initial instructions succinct, then build them up iteratively. Others write small novels of instructions before they start, which personally don't like as much. I don't always know what I don't know, so speccing ahead in great detail can sometimes be detrimental.
Agreed. I don't use the term "spec" as it was used with "spec-based development" before LLMs. There, details were required to be defined upfront. With LLMs you can start with a vague spec, missing some sections, and clarify it through iterations.
The sweet spot will be a moving target. LLMs' built-in assumptions and ways of expanding concepts will keep changing as LLMs develop, so best practices will change with LLM capabilities. The same set of instructions, not too detailed, was handled much better by Sonnet 4 than by Sonnet 3 in my experience. Sonnet 3.5 was for me the breaking point that showed context-based LLM development is a feasible strategy.
You're right that this is the future, but I believe the thread is misdiagnosing the core 'system error'.
The frustration thomascountz describes (tweaking, refining, reshaping) isn't a failure of methodology (SDD vs. Iteration). It's 'cognitive overload' from applying a deterministic mental model to a probabilistic system.
With traditional code, the 'spec' is a blueprint for logic. With an LLM, the 'spec' is a protocol for alignment.
The 'bug' is no longer a logical flaw. It's a statistical deviation. We are no longer debugging the code; we are debugging the spec itself. The LLM is the system executing that spec.
This requires a fundamental shift in our own 'mental OS'—from 'software engineer' to 'cognitive systems architect'.
I know enough about machine learning and statistics to understand that errors are always there. They just need to be small enough not to matter for the decisions that need to be taken (hopefully). But the thing is that computers can't differentiate errors from correct behavior. Anything in the code is true, and if the result is catastrophic, so be it.
As software engineers, it's very often easy to specify what the system should do. But ensuring that it doesn't do what it shouldn't do is the tiresome part of the job. And most of the tools we've created exist to ensure the latter.
I could not have said it better. We're on the same page.
I would add that, in my opinion, if code production/management was previously a limiting factor in software development, today it's not. The conceptualisation (ontology, methodology) of the framework (spec-centric development) for the system's production and maintenance (code, artifacts, running system) becomes the new limiting factor. But it's a matter of time before we figure out 2-3 methodologies (as happened with agile's Scrum/Kanban) which will become the new "baseline". We're at the early stage when the new "laws of LLM development" (as in "laws of physics") are still being figured out.
I would simply replace LLM by agent in your reasoning, in the sense that you'll need a strong preprocessing step and multiple iterations to exploit such complete specs.
There is sense in your words. Especially in the context of the modern day vocabulary.
I thought about the concept of this sort of methodology before "agent" (which I would define as "side effects with LLM integration") was marketed into the community vocabulary. And I'm still rigidly sticking to what I consider the "basics". I hope that does not impede understanding.
I had a small embedded project and I did it >70% using LLMs. This is exactly how I did it. Specs are great for grounding the LLM. Coding with LLMs is going to mean relying more on process, since you can't fully trust them. It means writing specs, writing small models to validate, writing tests, and a lot of code review to understand what the heck it's doing.
I just tried an experiment using Spec-Kit from GitHub to build a CLI tool. Perhaps the scope of the tool doesn't lend itself to Spec-Driven Development, but I found the many, many hours of tweaking, asking, correcting, analyzing, adapting, refining, reshaping, etc. before getting to see any code challenging. As would be the case with Waterfall today, the lack of iterative end-to-end feedback is foreign and frustrating to me.
After Claude finally produced a significant amount of code, and after realizing it hadn't built the right thing, I was back at the drawing board to find out what language in the spec had led it astray. Never mind digging through the code at this point; it would be just as good to start again as to try to onboard myself to the 1000s of lines of code it had built... and I suppose the point is to ignore the code as "implementation detail" anyway.
Just to make clear: I love writing code with an LLM, be it for brainstorming, research, or implementation. I often write—and have it output—small markdown notes and plans for it to ground itself. I think I just found this experience with SDD quite heavy-handed and the workflow unwieldy.
I did this first too. The trick is realising that the "spec" isn't a full system spec, per se, but a detailed description of what you want to do.
System specs are non-trivial for current AI agents. Hand-prompting every step is time-consuming.
I think (and I am still learning!) SDD sits as a fix for that. I can give it two fairly simple prompts & get a reasonably complex result. It's not a full system but it's more than I could get with two prompts previously.
The verbose "spec" stuff is just feeding the LLMs love of context, and more importantly what I think we all know is you have to tell an agent over and over how to get the right answer or it will deviate.
Early on with speckit I found I was clarifying a lot but I've discovered that was just me being not so good at writing specs!
Example prompts for speckit:
(Specify) I want to build a simple admin interface. First I want to be able to access the interface, and I want to be able to log in with my Google Workspaces account (and you should restrict logins to my Workspaces domain). I will be the global superadmin, but I also want a simple RBAC where I can apply a set of roles to any user account. For simplicity, let's create a record for each user account when they first log in. The first roles I want are Admin, Editor and Viewer.
(Plan) I want to implement this as a NextJS app using the latest version of Next. Please also use Mantine for styling instead of Tailwind. I want to use DynamoDB as my database for this project, so you'll also need to use Auth.js over Better Auth. It's critical that when we implement you write tests first before writing code; forget UI tests, focus on unit and integration tests. All API endpoints should have a documented contract which is tested. I also need to be able to run the dev environment locally so make sure to localise things like the database.
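For context on how these get fed in: in Spec-Kit those two prompts map onto its slash commands. If I remember the flow right (command names have shifted between versions; newer releases prefix them, e.g. /speckit.specify), a session looks roughly like this:

    specify init my-admin-app    # scaffold the project for your agent
    # then, inside the coding agent:
    /specify <the Specify prompt above>
    /plan    <the Plan prompt above>
    /tasks   # break the plan into small, ordered tasks
    # finally, tell the agent to work through the generated tasks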
Just picking on the example:
The plan step is overly focused on the accidental complexity of the project. While the `Specify` part does a good job of defining the scope, the `Plan` part is just complicating it. Why? The choice of technology is usually the first step in introducing accidental complexity into a project, which is why it's often recommended to go with boring technology (so the cost of this technical debt is known), or otherwise with something that is already used by the company (if it's a side project, do whatever). If you go that route, there's a good chance you already have good knowledge of those tools and have code samples (and libraries) lying around.
The whole point of code is to be reliable and to help do something that we'd rather not do ourselves, not to exist on its own. Every decision (even a little one) needs to be connected to a specific need tied to the project and the team. Code should not be just a receptacle for wishes.
I wouldn't call that accidental complexity? It's just a set of preferences.
Your last point feels a bit idealistic. The point of code is to achieve a goal; there are ways to achieve it with optimal efficiency in construction, but a lot of people call that gold plating.
The setup these prompts leave you with is boring, standard, and something I could surely do in a couple of hours. You might even skeleton it, right? The thing is, the AI can do it faster in elapsed time, and it also reduces my own time to writing two prompts (<2 minutes) and some review (10-15 minutes, perhaps?).
Also remember this was a simple example; once we get to real business logic, the efficiencies grow.
It may be a set of preferences for now, but it always grows into a monstrosity when future preferences don't align with current preferences. That's what accidental complexity means. Instead of working on the essential needs (having an admin interface that works well), you get bogged down in the whims of the platform and technology (breaking changes, bugs,...). It may not be relevant to you if you're planning on abandoning it (switching jobs, a side project you no longer care about,...).
Something boring and standard is something that keeps going with minimal intervention while getting better each time.
Your team's fixed preferences get stored into your .agents.md file so you don't type it over and over.
If you change your preferences, the team refactors.
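For a concrete sketch, lifting the preferences from the example prompts above (the exact file name and headings are just whatever your agent reads, e.g. AGENTS.md or CLAUDE.md):

    # AGENTS.md (excerpt)

    ## Stack preferences
    - Next.js (latest) with Mantine for styling, not Tailwind
    - DynamoDB for persistence; Auth.js rather than Better Auth
    - The dev environment must run fully locally (localise the database)

    ## Process preferences
    - Tests first: unit and integration tests, no UI tests
    - Every API endpoint has a documented, tested contract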
I'm going to go out on a limb here and say NextJs with Auth.js is pretty boring technology.
I'm struggling to see what you'd choose to do differently here?
Edit: actually I'll go further and say I'm guarding against accidental complexity. For example, Auth.js is really boring technology, but I am annoyed they've deprecated it in favour of Better Auth - it's not better and it is definitely not boring technology!
I think the challenge is how to create a small but evolvable spec.
What LLMs bring to the picture is that "spec" is high-level coding. In normal coding you start by writing small functions then verify that they work. Similarly LLMs should perhaps be given small specs to start with, then add more functions/features to the spec incrementally. Would that work?
Thanks! With Spec-Kit and Claude Sonnet 4.5, it wanted to design the whole prod-ready CLI up front. It was hard, if not impossible, to try to scope it to just a single feature or POC. This is what I struggled with most.
Were I to try again, I'd do a lot more manual spec writing or even template rewrites. I expected it to work more-or-less out-of-the-box. Maybe it would've for a standard web app using a popular framework.
It was also difficult to know where one "spec" ended and the next began; should I iterate on the existing one or create a new spec? This might be a solved problem in other SDD frameworks besides Spec-Kit, or else I'm just overthinking it!
I think the problem is you still care about the craft. You need to let go and let the tide take you.
I respect this take. As I understand it, in SDD, the code is not the source of truth, it's akin to bytecode; an intermediary between the spec and the observable behavior.
The rip tide
In my experience, spending 20–30 minutes writing a good spec results in code that is about 90% close to what I expected, which reduces the back-and-forth with the tool. It also helps me clarify and define with some level of precision what I actually want. During the specification phase, I can iterate until the design proposed by the tool is close to what I envision, reducing the number of surprises when the tool generates code. It’s not perfect, and there are still details the tool misses that require additional prompts, but overall I can get good results in a single session, whereas before I would exhaust the tokens and need to start a new session again.
My best "AI win" so far was in an area where I had to create a number of things that all followed a similar pattern. I created one hand-crafted example and created a general spec and specific ones for each component. It worked really well and I was, for a moment, experiencing a 10-30X productivity boost while having resultant code that I could review quickly and understand. It was also more consistent that I think I would have gotten from hand coding as it is easy to drift a little in terms of style and decisions.
Of course, this is all very situational and based on the problem being solved at the time. The risk with "practices" is they are generally not concerned with problem being solved and insist on applying the same template regardless.
And this is absolutely fine, because the problem with waterfall wasn't the detailed spec, it was
a) the multi-year lead time from starting the spec to getting a finished product
b) no (cheap) way to iterate or deliver outside the spec
Neither of these is a problem with SDD.
It seems to me that most people (myself included) have never experienced actual Waterfall anywhere other than in school curriculum descriptions.
It's a bit funny to see people describe a spec written in days (hours) and iterations lasting multiple weeks as "waterfall".
But these days I've already had people argue that barely stopping to think about a problem before starting to prompt a solution is "too tedious of a process".
To add to this: I've worked on projects that came out of waterfall process, and on projects that came out of too hasty agile iterations.
They both have issues, but very different ones. A waterfall project would have an inscrutable structure and a large number of "open doors" left just in case the need for an extension somewhere materialized. Paradoxically, this makes the code difficult to extend and debug because of overdone abstractions.
Hasty agile code has too many TODOs saying "put this hardcoded value in a parameter". It is usually easier to add small features, but when you hit a major design flaw it can be easier to throw everything out.
For UI code, AI seems to heavily tend towards the latter.
Preparation is waterfall!!! Just code without instructions, bruv!!!
Documentation gets out of date quickly!!!
I did professional waterfall development and SDD is exactly waterfall. The problem with waterfall was never the time that it took, it is that the spec locks you into a small niche and iterative changes force enormous complexity to keep spec and code consistent.
The problems with waterfall come when much of the project is done and then you discover that your spec doesn't quite work, but the changes to your spec require half the requirements to subtly change, so that it can work at all. But then these subtle changes need to be reflected in code everywhere. Do this a couple of times (with LLM and without) and now your code and spec only superficially look like one another.
> the problem with waterfall wasn't the detailed spec
The detailed spec is exactly the problem with the waterfall development. The spec presumes that it is the solution, whereas Agile says “Heck, we don't even understand our problem well, let alone understanding a solution to it.”
Starting fast with a detailed spec and an LLM already puts you into a complex solution space, which is difficult to navigate compared to a simpler one. Regardless of the iteration speed, waterfall is the method that puts you into a complex space; Agile is the one where you begin with smaller spaces and work toward a solution.
It wasn't the spec - it was the inability to change the spec.
It's the ability to _change_ quickly (or be agile) in response to feedback that marks the difference.
> whereas Agile says “Heck, we don't even understand our problem well, let alone understanding a solution to it.”
How can you even develop something if you don’t have a clear idea what you’re building?
> How can you even develop something if you don’t have a clear idea what you’re building?
But the statement "we don't even understand our problem well" is typically correct. In most cases where new software is started, the problem is neither well-defined nor amenable to off-the-shelf solutions. And you will never know as little about the problem as you do on day one. Your knowledge will only grow.
It is more useful to acknowledge this reality and develop coping strategies than to persist in denial of it. At the time that the agile manifesto was written, the failure of "big up-front design" was becoming plainly evident. You think that you know the whole spec, and then it meets reality much as the Titanic met an iceberg.
Agile does not say "no design, no idea"; it points out things that are more valuable than doomed attempts at "100% complete design and all the ideas before implementation", e.g. "while there is value in (comprehensive documentation, following a plan), we value (working software, responding to change) more" (see https://agilemanifesto.org/).
In other words, start by doing enough design, and then some working software to flush out the flawed thinking in the design. And then iterate with feedback.
You have an idea, but typically you have neither a complete understanding nor a detailed view of the solution, and of course things tend to change over time.
That's the key benefit of starting small and of iterating: it allows you to learn and to improve. You don't learn anything about your problem and solution by writing a comprehensive design spec upfront.
I have an idea to build a payments processor; how does that get me any closer to actual payments processing?
The spec is the problem.
The delay is just irrelevant. It has nothing to do with it working or not.
>b) no (cheap) way to iterate or deliver outside the spec
You could always do this in a waterfall project. Just make whatever changes to the code and ship. The problem is the same for SDD, as soon as you want quick changes you have to abandon the spec. Iterating the spec and the code quickly is impossible for any kind of significant complex project.
Either the spec contains sufficient detail to make implementation feasible, in which case iteration times become long and any change becomes tedious and complex, or the spec is insufficient to describe the complexity of the project, which makes it insufficient to guide an LLM adequately.
There is a fundamental contradiction here, which LLMs cannot resolve. People like SDD for exactly the reason managers like waterfall.
The detailed design spec is an issue hence Agile's "working code over comprehensive documentation". Your two points are consequences of this.
"Heavy documentation before coding" (article) is essentially a bad practice that Agile identified and proposed a remedy to.
Now, the article is really about AI-driven development, in which the AI agent is a "code monkey" that must be told precisely what to do. I think the interesting thing here will be to find the right balance... IMHO this works best when using LLMs only for small bits at a time instead of trying to specify the whole feature or product.
It's often forgotten that the last part of the agile manifesto states that e.g. "comprehensive documentation" _is_ valuable.
The key to Agile isn't documentation - it's in the ability to change at speed (perhaps as markets change). Literally "agile".
This approach allows for that comprehensive documentation without sacrificing agility.
The Agile manifesto states exactly what I wrote. It's not that comprehensive documentation isn't valuable, it's that working software is more valuable.
In addition, the big issue is when the comprehensive documentation is written first (as in waterfall) because it delays working software and feedback on how well the design works. Bluntly, this does not work.
That's why I think it is best to feed LLMs small chunks of work at a time and to keep the human dev in the driving seat to quickly iterate and experiment, and to be able to easily reason about the AI-generated code (who will do maintenance?).
The article seems to miss many of those points.
IMHO a good start is to have the LLM prompt be a few lines at most and generate about 100 lines of code so you can read it and understand it quickly, tweak it, use it, repeat. Not even convinced you need to keep a record of the prompt at all.
> That's why I think it is best to feed LLMs small chunks of work at a time and to keep the human dev in the driving seat to quickly iterate and experiment, and to be able to easily reason about the AI-generated code
REPL development and Live programming is similar to that. But when something works, it stays working. Even with the Edit-Compile-Run cycle, you can be very fast if the cycle is short enough (seconds). I see people going all in with LLMs (and wishing for very powerful machines) while ignoring other tools that could give better return on a 5 year old laptop.
Yeah, but can't we expect bureaucratic companies to adopt such a methodology exactly like that: write a spec for years, run the LLM agent every 6 months, blame the techies for the bad result, iterate, and also forbid coding outside of the spec?
I’ve done spec driven development using a bunch of markdown files. That works fine but what I have found really works is using beads: https://github.com/steveyegge/beads
I’m letting the agent help me draft the specs anyway and I found that the agent is a lot more focused when it can traverse a task tree using beads.
It’s the one spec or planning tool that I find really helps get things done without a bunch of intervention.
Another technique I employ is to require each task to be TDD. So every feature has two tasks: write tests that fail, then implement the feature, and don't notify me until the tests pass. Then I ask the agent to tell me how to review the task, and I require a review of every task before moving to the next one. I love this process because the agent tells me exactly what commands to run to review the task. Then I do a code review and ask it questions. Reading agent code is exhausting, so I try to make the tasks as discrete and minimal as possible.
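To illustrate the two-task split with a toy example, assuming a Python project using pytest (module and role names are hypothetical, loosely borrowing the RBAC roles from the example upthread): task one produces only a failing test file like the one below, and task two is "implement rbac.py until pytest passes, then stop and tell me how to review".

    # test_rbac.py -- output of task 1: tests only, no implementation yet.
    # Running pytest now fails, which is the point (red before green).
    from rbac import Role, can_edit  # rbac.py won't exist until task 2

    def test_admin_and_editor_can_edit():
        assert can_edit(Role.ADMIN)
        assert can_edit(Role.EDITOR)

    def test_viewer_cannot_edit():
        assert not can_edit(Role.VIEWER)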
These are simple techniques that humans employ during development, and I find they work very well.
There are also times when I need to write some docs to help me better understand the problem and I usually just dump those in a specs folder.
I think spec-kit is an interesting idea but too heavy handed. Just use beads and you’ll see what I mean.
Another technique I employed for a fully vibed tool (https://github.com/neurosnap/zmx) is to have the agent get as far as possible in a project and then I completely rewrite it using the agent code purely as a reference.
Not sure why Steve re-invented a VCS for agents; if you just use GitHub/GitLab, all this stuff is properly tracked and surfaced.
This sounds promising! Could you give us examples of how you give instructions to the coding agent?
Bit of a tangent, but this reminds me of a video[1] I watched a while ago, in which someone interviewed 20 or so people who were both engineers and programmers and asked them what the two fields could learn from each other. One of the things it mentioned, from the physical-engineering perspective, is that a little more up-front planning can make a big difference, and that's stuck with me ever since.
[1] (pretty sure this is the right one): https://youtu.be/CmIGPGPdxTI
This is a weird article. How many times in your career have you been handed a grossly under-specified feature and had to muddle your way through, asking relevant people along the way and still being told at the end that it’s wrong?
This is exactly the same thing but for AIs. The user might think that the AI got it wrong, except the spec was under-specified and it had to make choices to fill in the gaps, just like a human would.
It’s all well and good if you don’t actually know what you want and you’re using the AI to explore possibilities, but if you already have a firm idea of what you want, just tell it in detail.
Maybe the article is actually about bad specs? It does seem to venture into that territory, but that isn’t the main thrust.
Overall I think this is just a part of the cottage industry that’s sprung up around agile, and an argument for that industry to stay relevant in the age of AI coding, without being well supported by anything.
I sometimes wonder how many comments here are driving a pro-AI narrative. This very much seems like one of those.
The agent here is:
Look on HN for AI-skeptical posts. Then write a comment that highlights how the human got it wrong. And command your other AI agents to upvote that reply.
It has nothing to do with AI, the article is just plain wrong. You have to be either extremely dumb, extremely inexperienced or only working solo to not understand this.
> Agile methodologies killed the specification document long ago. Do we really need to bring it back from the dead?
it didn't really kill it - it just made the spec massively disjoint, split across hundreds to thousands of randomly filled Jira tickets.
I think that might be the key here.
All those small micro decisions, discussions, and dead ends can be recorded and captured by the AI. If you do something that doesn’t make sense given past choices, it can ask you.
Gradually, over time, it can gather more and more data that only lives in your brain at the time you’re building. It’s only partially captured by git commits but mostly lost to time.
Now, when you change code, the system can say, “Jim wrote that 5 years ago for this reason. Is the reason not valid anymore?”. You might get this on a good code review, but probably not. And definitely not if Jim left 2 years ago.
> it didn't really kill it - it just made the spec massively disjoint, split across hundreds to thousands of randomly filled Jira tickets.
Don’t forget 80% of project knowledge in Jim’s head, and nobody knows how it all connects ever since he left 5 years ago.
I mean, yeah?
That's why in my workflow I don't write single monster specs. Rather, I work with the LLM to iterate on small, individual, highly constrained specs that provide useful context for what/why/how -- stories, if you will -- that include a small set of critical requirements and related context -- the criteria by which you might "accept" the work -- and then I build up a queue of those "stories" that form a, you might say, backlog of work that I then iterate with the LLM to implement.
I then organize that backlog so that I can front-load uncovering unknowns while delivering high-value features first.
This isn't rocket science.
By far the biggest challenge I experience is compounding error during those iterative cycles creating brittleness, code duplication, and generally bad architecture/design. Finding ways to incorporate key context or other hints in those individual work items is something I'm still sorting out.
(and yes, I use en-dashes, and no I'm not an AI)
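For a flavour of what one of those constrained "stories" might look like (contents hypothetical, reusing the admin-interface example from upthread):

    ## Story: restrict logins to the Workspaces domain
    Why:         only employees should ever reach the admin UI
    What:        reject Google sign-ins from any other domain with a clear error
    Constraints: enforce it server-side in the auth callback, not just in the UI
    Accept when: an allowed-domain login succeeds, any other domain is refused,
                 and both paths are covered by an integration test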
There’s nothing keeping you from scoping the spec to an agile package of work. To the contrary: even if you start with a full spec for a multi-day-AI-coding-session, you are free to instruct it to follow agile principles. Just ask it to add checkpoints at which you want to be able to let users test a prototype, or where you want to revisit and discuss the plan.
Anything that ends with "Driven Development" is a little suspicious to me. It usually means that someone wants to sell books or conference tickets.
I have similar feelings. I’m willing to believe there are scenarios where this kind of thing makes sense, maybe – a colleague has had great success on a small, predictable greenfield project with it. I don’t work on many of those. My main objections are that I’ve had plenty of success with LLMs without intermediate detailed specs (and I don’t think my failures would have been helped by them), and I just don’t like the idea of primarily reviewing specs. Some sort of plan or research document is a different matter - that’s fine. But the kind of code-like formalised spec thing? I want to look at code, it’s just easier. Plus, I’m going to be reviewing the code too (not doing so is irresponsible in my opinion), so having spec AND code is now double the text to read.
The part of the process that actually needs improving, in my experience in larger codebases, is the research phase, not the implementation. With good, even quite terse, research it's easy to iterate on a good implementation and then probably take over to finish it off.
I really think LLMs and their agent systems should be kept in their place as tools, first and foremost. We're still quite early in their development, and they're still fundamentally unreliable enough that I don't think we should be re-working over-arching work practices around them.
I've found that SDD is actually what you need to be able to work with code bases when they go above around 100 000 lines of code. It's what unlocked getting LLMs to work well with large codebases for me.
I still don't get it, can you clarify? It's not the research phase that I'm disputing. Clearly for a large codebase, you need some good way to take all that information (code, product knowledge) and distill it down to something that can fit in the context, ready for implementation. And it's that research that is going to get harder the bigger the codebase. (My current experience is with a repo around 1.5 million lines.) I'm saying that the output of that research, in my experience, doesn't need to be anything like the detail of an exact spec. It can be a sort of one-to-two-pager Markdown doc, at most – and any further detail is much more ergonomic for me to iterate over in the form of code.
Yeh I think you are right and I am also finding larger apps built using SDD steadily get harder to extend.
> For large existing codebases, SDD is mostly unusable.
I don't really agree with the overall blog post (my view is that all of these approaches have value, and we are still too early on to find the One True Way), but that point is very true.
Replace the product people with LLMs and keep the engineers. You'll get better results that way.
A point I like to make in discussions like this is that software and hardware specifications are very different. We think of software as the thing we're building. But it's really just a spec that gets turned into the thing we actually run. It's just that the building process is fully automated. What we do when we create software is creating a specification in source code form.
Compared to what an architect does when they create a blueprint for a building, creating blueprints for software source code is not a thing.
What in waterfall is considered the design phase is the equivalent of an architect doing sketches, prototypes, and other stuff very early in the project. It's not creating the actual blueprint. The building blueprint is the equivalent of source code here: it's a complete plan for actually constructing the building down to every nut and bolt.
The big difference is that building construction is not automated, and it is costly and risky. So architects try to get their blueprint to a level where they can minimize all of that cost and risk. And you only build the bridge once, so iterating is not really a thing either.
Software is very different; compiling and deploying is relatively cheap and risk free. And typically fully automated. All the effort and risk is contained in the specification process itself. Which is why iteration works.
Architects abandon their sketches and drafts after they've served their purpose. The same is true in waterfall development: the early designs (whiteboard, napkin, UML, brainfart on a wiki, etc.) don't matter once development kicks off. As iterations happen, they fall behind and simply stop mattering. Many projects don't have a design phase at all.
The fallacy that software is imperfect as an engineering discipline because we are sloppy with our designs doesn't hold up once you realize that essentially all the effort goes into creating hyper detailed specifications, i.e. the source code.
Having design specifications for your specifications just isn't a thing. Not for buildings, not for software.
We could just stop calling it an engineering discipline. You've laid out plenty of reasons why it is nothing like an engineering discipline in most contexts where people write software.
Real software engineering does exist. It does so precisely in places where you can't risk trying it and seeing it fail, like control systems for things which could kill someone if they failed.
People get offended when you claim most software engineering isn't engineering. I am pretty certain I would quickly get bored if I was actually an engineer. Most real world non-software engineers don't even really get to build anything, they're just there to check designs/implementations for potential future problems.
Maybe there are also people in the software world who _do_ want to do real engineering and they are offended because of that. Who knows.
Extend your reasoning.
> it's really just a spec that gets turned into the thing we actually run. It's just that the building process is fully automated. What we do when we create software is creating a specification in source code form.
Agree. My favourite description of software development is specification and translation - done iteratively.
Today, there are two primary phases:
1. Specification by a non-developer and the translation of that into code. The former is led by BAs/PMs etc. and the output is feature specs/user stories/acceptance tests etc. The latter is done by developers: they translate the specs into code.
2. The resulting code is also, as you say, a spec. It gets translated into something the machine can run. This is automated by a compiler/interpreter (perhaps in multiple steps, e.g. when a VM is involved).
There have been several attempts over the years to automate the first step. COBOL was probably the first; since then we've had 4GLs, CASE tools, UML among others. They were all trying to close the gap: to take phase 1 specification closer to what non-developers can write - with the result automatically translated to working code.
Spec-driven development is another attempt at this. The translator (LLM) is quite different to previous efforts because it's non-deterministic. That brings some challenges but also offers opportunities to use input language that isn't constrained to be interpretable by conventional means (parsers implementing formal grammars).
We're in the early days of spec-driven. It may fail like its predecessors or it may not. But to first order, there's nothing sacrosanct about the use of 3rd-generation languages as the means to represent the specification. The pivotal challenge is whether the starting specification can be reliably translated into working software.
If it can (big if) then economics will win out.
Where I work we do (for high assurance software) systems specifications, systems design, software specifications and software design and ultimately source code.
That said, there is a bit of redundancy between software design and source code. We tend to rather get rid of the development of the latter than the former though, i.e. by having the source code be generated by some modelling tool.
I just feel that the community has learned nothing.
Opinions about "what works" being pushed as fact. No evidence, no attempt to create evidence (because it's hard). Enless commentary and opinion pieces, naive people being coached by believers into doing things that seem to work on specific examples.
If you have an example that worked for you it doesn't mean that it's a useful way to work for everyone else in every other situation.
Defining the spec is of course also needed in an agile process.
The difference is when this is done: in an upfront discussion, while developing, or after user feedback.
For LLMs we know it needs to be written down (at least if we want human traceability).
And agile of course is a shortened waterfall, to get user feedback early on.
Giving enough context is important in every case (humans and LLMs alike).
I think I've seen enough of a trend: all these LLM ideas eventually get absorbed by the LLM provider and integrated. The OSS projects or companies with products eventually become irrelevant.
So they're more like 3rd party innovations to lobby LLM providers to integrate functionalities.
X prompting method/coding behaviors? Integrated. Media? Integrated. RAG? Integrated. Coding environment? Integrated. Agents? Integrated. Spec-driven development? It's definitely present, perhaps not as formal yet.
Sounds like the makings of a healthy software ecosystem
I swear I've never seen the waterfall disappear. And I've never seen agile work out so far. I don't say it can't work, but I haven't seen it yet.
What I really want is to be able to do the things I'm good at. Usually that is not what gets assigned to me or is next in line.
But specs are per feature; it's just an up-front discussion first, like you'd have on many things, rather than questions -> immediate code writing from a model.
Don't really think the thesis is fair.
SDD as it's presented is a bit heavyweight; if you experiment with it a bit, there is a lighter version that can work.
For some mini modules, we keep a single-page spec as the 'source of truth' instead of the code.
It's nice and has its caveats, but they are less of a concern over time.
>> You can see my instructions in the coding session logs
such a rare (but valued!) occurrence in these posts. Thanks for sharing
Thank you for writing this. It was also my first impression after seeing not only spec driven dev, but agentic systems that try to mimic human roles and processes 1:1. It feels a bit like putting a saddle on an automobile so that it feels more familiar.
It's a nice observation that Spec-Driven Development essentially implements the waterfall model.
Personally, I tried SDD, consciously trying to like it, but gave up. I find writing specs much harder than writing code, especially when trying to express the finer points of a project. And of course, there is also that personal preference: I like writing code, much more than text. Yes, there are times where I shout "Do What I Mean, not what I say!", but these are mostly learning opportunities.
You can do SDD in waterfall or Agile. The interaction I’m having with users right now is that the feedback and iterations are happening within hours or days instead of the usual 1-2 week sprint length. In my case SDD is enabling my team to be hyper-agile.
I have no idea how to reconcile sdd and waterfall. With SDD you’re working per feature, right? Waterfall is speccing the entire project upfront and with a strong force against any changes as you go before a final delivery.
"Agile methodologies killed the specification document long ago. Do we really need to bring it back from the dead?"
Not at FAANG. Or at least not at Google where I was for 10 years. They were obsessed with big upfront PRDs and design docs, and they were key to getting promotion and recognition.
These days those kinds of documents -- which were laborious to produce, mostly boilerplate, a pain to maintain, and often not really read by anybody other than the promo committee -- could be produced easily by prompting an LLM.
Having drunk from the wellspring of XP and agile early in my career, I found this continually frustrating: actual development followed iterative practices, but not officially.
Developers spend most of their time reading long Markdown files, hunting for basic mistakes hidden in overly verbose, expert-sounding prose. It’s exhausting.
Those requirements exist regardless of whether you write them down in Markdown files or not. Spec-driven Development is just making what needs to be built explicit rather than assuming the whole team know what the code should do.
Practically every company already uses 'spec-driven development', just with incredibly vague specs in the form of poorly written Jira tickets. Developers like it because it gives them freedom to be creative in how they interpret what needs to be done, plus they don't need to plan things and their estimates can be total nonsense if they want, and Product Owners and BAs like it because it means they can blame shift to the dev team if something is missed by saying "We thought that was obvious!"
Every team should be capturing requirements at a level of detail that means they know how the code should work. That doesn't need to be done up front. You can iterate. Requirements are a thing that grow with a project. All that spec-driven development is doing is pushing teams to actually write them down.
You can get away with incredibly vague "specs" in tickets most of the time because of a shared understanding of the system and product – of the kind that an LLM won't have (without very careful and well-organised in-repo documentation). When it works, which in my experience is quite a lot, it's very efficient. Sometimes it fails, sure. Sometimes it's annoying because you actually do need more detail than is provided and have to ask for it in a conversation. In my experience that's a low cost because it happens much less often and isn't too laborious when it does.
But crucially, the details here are coming from the issue authors. Do you really think that issue authors are going to be reviewing LLM-generated specs? I don't think so. And so engineers will be the intermediary. If that's going to be me, I would rather mediate between the issue author, some kind of high-level plan, and code. Not the issue author, a high-level plan, code-like specs, and code. There is one extra layer in the latter that I don't see the value of.
> Developers like it because it gives them freedom to be creative in how they interpret what needs to be done, plus they don't need to plan things and their estimates can be total nonsense if they want
I like it because it moves me closer to the product, the thing actually being built. You seem to be asking to move the clock back to where there was a stricter division of labour. Maybe that's necessary in some industries, but none that I've worked in.
> Developers like it
Do they?
> Coding assistants are intimidating: instead of an IDE full of familiar menus and buttons, developers are left with a simple chat input. How can we ensure that the code is correct with so little guidance?
Isn't this what Kiro IDE is about? Spec-driven dev?
That's Event-B (minus the formal side) with LLM
Most software developers are doomed to rediscover, time and time again, that 4 weeks of frantic development save two hours of calmly thinking about the task.
I worked at a company where they spent 6 months waiting for a vendor to implement a custom hardware solution instead of just spending a month optimizing the software to run faster, no joke.
I'm still not convinced there's anything wrong with waterfall for some projects.
It is/was fine if you were willing to bet on your base specs being near perfect.
As long as you haven't started testing and verification, changing the spec is cheap.
If your architecture allows it.
If you are working with constrained hardware or users... it isn't.
I expected testing and verification to come before the release, so it was implied that you don't have any users yet. As for when you have produced hardware: while it's only prototypes, changes should still be cheap; hardware also needs multiple revisions before all the bugs are ruled out.
When that is not the case, working without a spec won't help either.
I was thinking more about my experience in corporate settings.
Hardware needs to be procured or implemented in the cloud - there's a lot of work on the architectures and costs early in projects so as to ensure that things will cost in. Changing that can invalidate business cases, and also can be very difficult due to architectural and security controls.
In terms of users, in corporates the user communities must be identified, trained, sometimes made redundant, sometimes given extra responsibilities. Once you have got this all lined up any changes become very hard because suddenly, like a ripple over a lake when a pebble is dropped in, everyone who's touched has a reason why they are going to miss targets (you are that reason) and therefore want 100% bonus (there is no money for 100% bonus for all).
In previous jobs I would have delighted in pointing out that if there are no users the system can't be funded!
I agree that working without a spec is madness; it's just not realistic in the real world either. People expect you to stand behind a commitment to deliver, and they also want to know what they are paying for. However, things do change, both really (as in something happens and the system must now accommodate it) and through discovery (we didn't know, we couldn't have known, but now we know and must accommodate this knowledge). It's really important to factor this in, although perfect flexibility is infinitely expensive and completely unrealistic...
A bit of flex can be cheap, easy and a lifesaver though.
Much of the hype around SDD is really about developers never having experienced a real waterfall project.
Of course SDD/Waterfall helps the LLM/outsourced labor implement software in a predictable way. Waterfall was always a method to please managers, and in the case of SDD the manager is the user prompting the coding agent.
The problem with SDD/Waterfall is not the first part of the project. The problems come when you are deep into the project, your spec is a total mess and the tiniest feature you want to add requires extremely complex manipulation of the spec.
The success people are experiencing is the success managers have experienced at the beginning of their software projects. SDD will fail for the same reason Waterfall has failed: the constantly increasing complexity in the project, required to keep code and spec consistent, cannot be managed by LLM or human.
For myself, I found that having a work methodology similar to spec-driven development is much better than vibe coding. The agent makes fewer mistakes, it stays on the path, and I have fewer issues to fix.
And while at it, I found out that using TDD also helps.
What LLM tools are folks seeing that do the most or the best to integrate specs?
Amazon's Kiro is incredibly spec driven. Haven't tried it but interested. Amplifier has a strong document-driven-development loop also built-in. https://github.com/microsoft/amplifier?tab=readme-ov-file#-d...
I, for one, welcome the fact that agile/scrum/daily standup/etc. rituals will become outdated. While they might be somewhat useful in some software development projects, in the past 10 years this turned into a cult of lunatics who want to apply it to any engineering work, not just software, and who think any approach other than theirs will result in bad outcomes and less productivity. Can't wait for the "open office" BS to die next too; it's literally a boomer mindset that came from government offices back in the day, and they think it's more productive that way.
Very valid. In the beginning all this was driven by developers. Then it was LinkedIn-ified, and suddenly we had to deal with agile coaches: essentially people with no tech qualifications playing with developers as guinea pigs, without understanding the why.
The same is true for UX and DevOps: just create a bunch of positions based on some blog post and congratulate yourself on a job well done. Screwing over the developers (engineers) as usual, even though they might actually be interested in those jobs.
This is the main problem with big tech informing industry decisions: big tech wins because they make sure they understand what all of this means. For all other companies it just creates a mess, and the frustration you mention.
No way, how will I develop without knowing what Bob the farter did yesterday?
> Can't wait for the "open office" BS to die next too, literally a boomer mindset that came from government offices back in the day, and they think it's more productive that way.
Open office is the densest and cheapest office layout. That is the reason it exists and the reason it will persist. All other reasons are inferior.
Ah, another Dunning-Kruger-bait post (someone who doesn't know what they're talking about writes a blog post very confidently, and it's read and upvoted by people who also don't know what they're talking about). Very good.
The immediate pooh-poohing of Waterfall is the big tell here. If they don't give you an example of an actual Waterfall project they've worked on, or can't elucidate why it wasn't just that one project or organization that made Waterfall bad, they're likely parroting myths or a single anecdotal experience. And that bad experience was likely based on not understanding it to begin with (Waterfall in particular is the subject of many myths and lies). I've had terrible Agile experiences. Does that make Agile terrible?
In my experience, Agile has a tendency to succeed despite itself. Since you don't do planning, you just write one bit at a time. But of course eventually this doesn't work, so you spend more time rearchitecting, rewriting, and fixing things. But hey, look, you made something useful! ....it still isn't making the company any money yet, but it's a thing you can see, so everyone feels better. You can largely do this work by yourself, so you can move fast; until you need something controlled by someone else, at which point you show up at the 11th hour, and dump a demand on their desk that must be finished immediately. Often these recipients have no choice, because the organization needs the thing you're slapping together, and they're "being a blocker". And those recipients then can't accomplish what they need to, because they haven't been given any documentation to know what to do. Bugs, rushed deadlines (or worse, no deadlines), dead-cats-over-the-wall, wasted effort, dysfunction. Is this the only way to do Agile? Of course not. But it's easy for me to paint the entire model this way, based on my experience.
There does not exist a project management method which is inherently bad. I repeat: No formal project management method is bad. Methods are simply frameworks by which you organize and execute work. You will not be successful just because you used a framework. You still have to do the organizing and execute the work in a not-terrible-way. You have a lot of wiggle room about how things are done, and that is what determines the outcome. You can do things quickly or slow, skillfully or shoddily, half-assed or competently, individually or collaboratively. It's how you take each step, not which step you take.
As long as humans are at the reins, it doesn't matter what method you use. The same project, with the same method, can either go to shit or turn out great. The difference between the two is how you use the method. You have to do the work, and do it well. In organizations, with other humans, that's often very difficult, because it means you depend on things outside of your control. So leadership, skill, and a collaborative, positive culture are critical to getting things done.
"The rise of specification"
OMG really? You think?
(Except people will get AI to write it.)
You can't code without specifications, period. Specifications can take various forms, but ultimately they define how your program should work.
The problem with what people call "Waterfall" is that there is an assumption that at some point you have a complete and correct spec and you code off of that.
A spec is never complete. Any methodology applied in a way that does not allow you to go back to revise and/or clarify specs will cause trouble. This was possible with waterfall and is more explicitly encouraged with various agile processes. How much it actually happens in practice differs regardless of how you name the methodology that you use.
Of course you can code without specifications. Most software projects don't have them these days.
In contrast they're still the standard in the hardware design world.
This is a Tyranny of Structurelessness sort of thing. In that essay, there is a structure in structureless organizations, but it's implicit and unarticulated, which makes it inscrutable: potentially more malleable (which could be good), but also constricting to those who don't know the actual structure when they try to accomplish things.
If you don't have explicit specifications (which don't have to be complete before starting to develop code), you still have specs, but they're unarticulated. They exist in the minds of the developers, managers, customers, and what you end up with is a confused mess. You have specs, but you don't necessarily know what they are, or you know something about them but you've failed to communicate them to others and they've failed to communicate with you.
> Of course you can code without specifications. Most software projects don't have them these days.
And most software projects are complete mess that waste unfathomable amounts of resources. But yeah, you “can” develop like that.
Code is a specification. You cannot have a program unless you specify how it works.
And TDD would say: start with a failing test, which is an executable specification for the code that you will need to make it pass. Then make it pass, then refactor to make nice code that still passes all tests. Red, green, refactor under tests.
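A minimal sketch of that cycle in Python (the function and behaviour are purely illustrative, and test and code share one file only for brevity):

    # Red: write the executable spec first; it fails while slugify
    # doesn't exist yet. (pytest resolves the name at call time,
    # so defining the test above the function is fine.)
    def test_slugify_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    # Green: the minimal implementation that makes the test pass.
    def slugify(title: str) -> str:
        return title.lower().replace(" ", "-")

    # Refactor: tidy up (e.g. collapse repeated whitespace) while the
    # test keeps guarding the specified behaviour.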