I work as a DevOps/SRE and have been doing it FinTech (bank, hedge funds, startups) and Crypto (L1 chain) for almost 20 years.
My thoughts on vibe coding vs production code:
- vibe coding can 100% get you to a PoC/MVP probably 10x faster than pre LLMs
- This is partly b/c it is good at things I'm not good at (e.g. front end design)
- But then I need to go in and double check performance, correctness, information flow, security etc
- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this but then that needs to get setup correctly etc)
- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs
- Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and a LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines you realize that the LLM is saving you some time but not 10x time.
I'm building a Java HFT engine and the amount of things AI gets wrong is eye opening. If I didn't benchmark everything I'd end up with much less optimized solution.
Examples: AI really wants to use Project Panama (FFM) and while that can be significantly faster than traditional OO approaches it is almost never the best. And I'm not taking about using deprecated Unsafe calls, I'm talking about using primative arrays being better for Vector/SIMD options on large sets of data. NIO being better than FFM + mmap for file reading.
You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.
I think there's a lot to pick apart here but I think the core premise is full of truth. This gap is real contrary to what you might see influencers saying and I think it comes from a lot of places but the biggest one is writing code is very different than architecting a product.
I've always said, the easiest part of building software is "making something work." The hardest part is building software that can sustain many iterations of development. This requires abstracting things out appropriately which LLMs are only moderately decent at and most vibe coders are horrible at. Great software engineers can architect a system and then prompt an LLM to build out various components of the system and create a sustainable codebase. This takes time an attention in a world of vibe coders that are less and less inclined to give their vibe coded products the attention they deserve.
Related anecdote: My 12yo son didn't like the speed cubing online timer he was using because it kept crashing the browser and interrupted him with ads. Instead of googling a better alternative we sat down with claude code and put together the version of the website that behaved and looked exactly as he wanted. He got it working all by himself in under an hour with less than 10 prompts, I only helped a bit putting it online with github pages so he can use it from anywhere.
I dont want that though, I want someone to spend much more time than I can afford thinking about and perfecting a product that I can pay for and dont worry about it
And some people do, both things can be true. I'd rather make a tool just for me that breaks when I introduce a new requirement and I just add into it and keep going.
I built a jira with attachments and all sorts of bells and whistles. Purrs like a kitten. Saas are going extinct. At least the jobs that charged $1000 a day to write jira plugins.
Some minor UX enhancement SaaS of the most recent VC-funded wave will do. Maybe those who forgot how to invest in R&D and spent last 20 years just fixing bugs. There’s plenty of SaaS on the market that offers added value beyond the code. Data brokers. Domain experts, etc. Even if homemade solution is sometimes possible, initial development costs are going to be just one of several important factors in choosing whether to build or to buy.
How many products are actually like that? If I could easily replace github, datadog/sentry/whatever, cloudflare, aws, tailscale that would be great. In my view building and owning is better than buying or renting. Especially when it comes to data--it would be much better for me to own my telemetry data for example than to ship it off to another company. But I don't think you (or anyone) will be vibecoding replacements for these services anytime soon. They solve big, hard, difficult problems.
This is a pipe dream and “sufficiently advanced” is doing a lot of heavy lifting. You really think people would rather spin up and debug their own self-made software rather than pay for something that has been tested, debugged, and proven by thousands of users? Why would anyone do that for anything more than a very simple script? It makes zero sense unless the LLM outputs literally perfect one-shot software reliably.
Perplexity just launched a tool that builds and hosts small bespoke tools.
I tried it works wells. I can do the same thing in my Linux machine, but even my 12 year old now can get perplexity to build him a tool to compare ram prices at different chinease vendors.
Author admittedly didn’t know how to scale his app for thousands or hundreds of thousands of users. He jokes about it working great on localhost or “my machine”.
Not knocking the premise of the post. It probably works well for one single user if it’s an iPhone or Android app. But his 100 power hours are probably just right for what he ended up launching as he iterated through the requirements and learned how to set this up through reinforced learning and user feedback.
Used Codex for the whole project. At first I used claude for the architect of the backend since thats where I usually work and got experience in. The code runner and API endpoints were easy to create for the first prototype. But then it got to the UI and here's where sh1t got real. The first UI was in react though I had specifically told it to use Vue. The code editor and output window were a mess in terms of height, there was too much space between the editor and the output window and no matter how much time I spent prompting it and explaining to it, it just never got it right. Got tired and opened figma, used it to refine it to what I wanted. Shared the code it generated to github, cloned the code locally then told codex to copy the design and finally it got it right.
Then came the hosting where I wanted the code runner endpoint to be in a docker container for security purpose since someone could execute malicious code that took over the server if I just hosted it without some protection and here it kept selecting out of date docker images. Had to manually guide it again on what I needed. Finally deployed and got it working especially with a domain name. Shared it with a few friends and they suggested some UI fixes which took some time.
For the runner security hardening I used Deepseek and claude to generate a list of code that I could run to show potential issues and despite codex showing all was fine, was able to uncover a number of issues then here is where it got weird, it started arguing with me despite showing all the issues present. So I compiled all the issues in one document, shared the dockerfile and linux secomp config tile with claude and the also issues document. It gave me a list of fixes for the docker file to help with security hardening which I shared back with codex and that's when it fixed them.
Currently most of the issues were resolved but the whole process took me a whole week and I am still not yet done, was working most evenings. So I agree that you cannot create a usable product used by lots of users in 30 minutes not unless it's some static website. It's too much work of constant testing and iteration.
I have not been coding for a few years now. I was wondering if vibe coding could unstick some of my ideas. Here is my question, can I use TDD to write tests to specify what I want and then get the llm to write code to pass those tests?
You absolutely can. This is one of recommended directions with agentic coding. But you can go farther and ask llm to write tests too. The review/approve them.
Yes, I mostly do spec driven developement. And at the design stage, I always add in tests. I repeat this pattern for any new features or bug fixes, get the agent to write a test (unit, intergration or playwright based), reproduce the issue and then implement the change and retest etc... and retest using all the other tests.
That's a great approach, though I'd also recommend setting up a strong basis for linting, type checking, compilation, etc depending on the language. An LLM given a full test suite and guard rails of basic code style rules will likely do a pretty good job.
I would find it a bit tricky to write a full test suite for a product without any code though. You'd need to understand the architecture a bit and likely end up assuming, or mocking, what helpers, classes, config, etc will be built.
I came across the following yesterday: "The Great Way is not difficult for those who have no preferences," a famous Zen teaching from the Hsin Hsin Ming by Sengstan
As we move from tailors to big box stores I think we have to get used to getting what we get, rather than feeling we can nitpick every single detail.
I'd also be more interested in how his 3rd, 4th or 5th vibe coded app goes.
I keep seeing things that were vibe coded and thinking, "That's really impressive for something that you only spent that much time on".
To have a polished software project, you must spend time somewhat menially iterating and refining (as each type of user).
To have a polished software project,
you need to have started with tests and test coverage from the start for the UI, too.
Writing tests later is not as good.
I have taken a number of projects from a sloppy vibe coded prototype to 100% test coverage. Modern coding llm agents are good at writing just enough tests for 100% coverage.
But 100% test coverage doesn't mean that it's quality software, that it's fuzzed, or that it's formally verified.
Quality software requires extensive manual testing, iteration, and revision.
I haven't even reviewed this specific project; it's possible that the author developed a quality (CLI?) UI without e2e tests in so much time?
Was the process for this more like "vibe coding" or "pair programming with an LLM"?
The 80/20 rule doesn’t go away. I am an AI true believer and I appreciate how fast we can get from nothing to 80% but the last “20%” still takes 80%+ of the time.
Woodworking is an analogy that I like to use in deciding how to apply coding agents. The finished product needs to be built by me, but now I can make more, and more sophisticated, jigs with the coding agents, and that in turn lets me improve both quality and quantity.
Can you expand on this? You definitely don’t need 6 months for a note taking app to be useable it is more you need to compete with the state of the art right
It depends entirely on what you want. You can literally code a JavaScript 1-liner that will make a <textarea> then put the content back in the URL and it will work serverless on pretty much any platform with a Web browser.
You can also write a note taking app that will be federated yet private, that will have its own scripting language, etc. I mean you can yak-shave your way to write your own OS or even designing your own CPU for that.
So... I'm not sure that metric, time, means much without a proper context, including who does it. It's quite different if to do that, regardless of the tooling used, if you are a professional developer, designer, fullstack dev, prototypist, PM, marketer, writer, etc.
Ah, note taking as hobby finally explains to me why these apps seem so popular. I don't think I have ever considered that I need one. And it to be something that shouldn't be fully solved multiple times now. But it really being hobby does kinda make the point for me.
whatever you prototype - the one who built it in 6 month will have economy of scale to make it cheaper than your diy solution, and because they serve many customers and developed it for 6 months - their product will be 100x better than the one you diy
there is very very rare use case when diy makes sense. in 99% of cases its just a toy that feels nice as you kinda did it. but if you factor in the time etc it is always costs 100x more than $5/month you could usually buy
I can't say I'm impressed by this at all. 100+ hours to build a shitty NFT app that takes one picture and a predefined prompt, then mints you a dinosaur NFT. This is the kind of thing I would've seen college students slam out over a weekend for a coding jam with no experience and a few cans of red bull with more quality and effort. Has our standards really gotten so low? I don't see any craftsmanship at play here.
If you hear someone spouting off about how vibe coding allows for creation of killer apps in a fraction of the time/cost, just ask them if you can see what successful killer apps they’ve created with it. It’s always crickets at that point because it’s somewhere between wishful thinking and an outright lie.
Of course vibe coding is going to be a headache if you have very particular aesthetic constraints around both the code and UX, and you aren't capable of clearly and explicitly explaining those constraints (which is often hard to do for aesthetics).
There are some good points here to improve harnesses around development and deployment though, like a deployment agent should ask if there is an existing S3 bucket instead of assuming it has to set everything up. Deployment these days is unnecessarily complicated in general, IMO.
I work as a DevOps/SRE and have been doing it FinTech (bank, hedge funds, startups) and Crypto (L1 chain) for almost 20 years.
My thoughts on vibe coding vs production code:
- vibe coding can 100% get you to a PoC/MVP probably 10x faster than pre LLMs
- This is partly b/c it is good at things I'm not good at (e.g. front end design)
- But then I need to go in and double check performance, correctness, information flow, security etc
- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this but then that needs to get setup correctly etc)
- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs
- Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and a LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines you realize that the LLM is saving you some time but not 10x time.
I'm building a Java HFT engine and the amount of things AI gets wrong is eye opening. If I didn't benchmark everything I'd end up with much less optimized solution.
Examples: AI really wants to use Project Panama (FFM) and while that can be significantly faster than traditional OO approaches it is almost never the best. And I'm not taking about using deprecated Unsafe calls, I'm talking about using primative arrays being better for Vector/SIMD options on large sets of data. NIO being better than FFM + mmap for file reading.
You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.
The magic is testing. Having locally available testing and high throughput testing with high amount of test cases now unlocks more speed.
The test cases themselves becomes the foci - the LLM usually can't get them right.
I think there's a lot to pick apart here but I think the core premise is full of truth. This gap is real contrary to what you might see influencers saying and I think it comes from a lot of places but the biggest one is writing code is very different than architecting a product.
I've always said, the easiest part of building software is "making something work." The hardest part is building software that can sustain many iterations of development. This requires abstracting things out appropriately which LLMs are only moderately decent at and most vibe coders are horrible at. Great software engineers can architect a system and then prompt an LLM to build out various components of the system and create a sustainable codebase. This takes time an attention in a world of vibe coders that are less and less inclined to give their vibe coded products the attention they deserve.
They're... launching an NFT product in 2026...
I know it's not the point of this article, but really?
Yep. As much as the rest of it resonated with LLM coding experiences I'm having, the NFT thing is unfortunate.
With sufficiently advanced vibe coding the need for certain type of product just vanishes.
I needed it, I quickly build it myself for myself, and for myself only.
Related anecdote: My 12yo son didn't like the speed cubing online timer he was using because it kept crashing the browser and interrupted him with ads. Instead of googling a better alternative we sat down with claude code and put together the version of the website that behaved and looked exactly as he wanted. He got it working all by himself in under an hour with less than 10 prompts, I only helped a bit putting it online with github pages so he can use it from anywhere.
I dont want that though, I want someone to spend much more time than I can afford thinking about and perfecting a product that I can pay for and dont worry about it
And some people do, both things can be true. I'd rather make a tool just for me that breaks when I introduce a new requirement and I just add into it and keep going.
If we could return to one-off payments without dark patterns I would agree. Hopefully at least the software that rely on grift will start to vanish.
I built a jira with attachments and all sorts of bells and whistles. Purrs like a kitten. Saas are going extinct. At least the jobs that charged $1000 a day to write jira plugins.
Some minor UX enhancement SaaS of the most recent VC-funded wave will do. Maybe those who forgot how to invest in R&D and spent last 20 years just fixing bugs. There’s plenty of SaaS on the market that offers added value beyond the code. Data brokers. Domain experts, etc. Even if homemade solution is sometimes possible, initial development costs are going to be just one of several important factors in choosing whether to build or to buy.
jira is a perfect example of an abysmal product that was marketed well.
How many products are actually like that? If I could easily replace github, datadog/sentry/whatever, cloudflare, aws, tailscale that would be great. In my view building and owning is better than buying or renting. Especially when it comes to data--it would be much better for me to own my telemetry data for example than to ship it off to another company. But I don't think you (or anyone) will be vibecoding replacements for these services anytime soon. They solve big, hard, difficult problems.
This is a pipe dream and “sufficiently advanced” is doing a lot of heavy lifting. You really think people would rather spin up and debug their own self-made software rather than pay for something that has been tested, debugged, and proven by thousands of users? Why would anyone do that for anything more than a very simple script? It makes zero sense unless the LLM outputs literally perfect one-shot software reliably.
Perplexity just launched a tool that builds and hosts small bespoke tools.
I tried it works wells. I can do the same thing in my Linux machine, but even my 12 year old now can get perplexity to build him a tool to compare ram prices at different chinease vendors.
It makes sense if you want bespoke software to do a specific job in a way best suited to your workflow.
Could you do the same in eg. Photoshop? Maybe, but even if, you would need to learn how.
This seems more like he is bad at describing what he wants and is prompting for “a UI” and then iterating “no, not like that” for 99 hours.
Author admittedly didn’t know how to scale his app for thousands or hundreds of thousands of users. He jokes about it working great on localhost or “my machine”.
Not knocking the premise of the post. It probably works well for one single user if it’s an iPhone or Android app. But his 100 power hours are probably just right for what he ended up launching as he iterated through the requirements and learned how to set this up through reinforced learning and user feedback.
I have had the experience with creating https://swiftbook.dev/learn
Used Codex for the whole project. At first I used claude for the architect of the backend since thats where I usually work and got experience in. The code runner and API endpoints were easy to create for the first prototype. But then it got to the UI and here's where sh1t got real. The first UI was in react though I had specifically told it to use Vue. The code editor and output window were a mess in terms of height, there was too much space between the editor and the output window and no matter how much time I spent prompting it and explaining to it, it just never got it right. Got tired and opened figma, used it to refine it to what I wanted. Shared the code it generated to github, cloned the code locally then told codex to copy the design and finally it got it right.
Then came the hosting where I wanted the code runner endpoint to be in a docker container for security purpose since someone could execute malicious code that took over the server if I just hosted it without some protection and here it kept selecting out of date docker images. Had to manually guide it again on what I needed. Finally deployed and got it working especially with a domain name. Shared it with a few friends and they suggested some UI fixes which took some time.
For the runner security hardening I used Deepseek and claude to generate a list of code that I could run to show potential issues and despite codex showing all was fine, was able to uncover a number of issues then here is where it got weird, it started arguing with me despite showing all the issues present. So I compiled all the issues in one document, shared the dockerfile and linux secomp config tile with claude and the also issues document. It gave me a list of fixes for the docker file to help with security hardening which I shared back with codex and that's when it fixed them.
Currently most of the issues were resolved but the whole process took me a whole week and I am still not yet done, was working most evenings. So I agree that you cannot create a usable product used by lots of users in 30 minutes not unless it's some static website. It's too much work of constant testing and iteration.
I have not been coding for a few years now. I was wondering if vibe coding could unstick some of my ideas. Here is my question, can I use TDD to write tests to specify what I want and then get the llm to write code to pass those tests?
You absolutely can. This is one of recommended directions with agentic coding. But you can go farther and ask llm to write tests too. The review/approve them.
Yes, I mostly do spec driven developement. And at the design stage, I always add in tests. I repeat this pattern for any new features or bug fixes, get the agent to write a test (unit, intergration or playwright based), reproduce the issue and then implement the change and retest etc... and retest using all the other tests.
That's a great approach, though I'd also recommend setting up a strong basis for linting, type checking, compilation, etc depending on the language. An LLM given a full test suite and guard rails of basic code style rules will likely do a pretty good job.
I would find it a bit tricky to write a full test suite for a product without any code though. You'd need to understand the architecture a bit and likely end up assuming, or mocking, what helpers, classes, config, etc will be built.
Yes
I came across the following yesterday: "The Great Way is not difficult for those who have no preferences," a famous Zen teaching from the Hsin Hsin Ming by Sengstan
As we move from tailors to big box stores I think we have to get used to getting what we get, rather than feeling we can nitpick every single detail.
I'd also be more interested in how his 3rd, 4th or 5th vibe coded app goes.
If you ask for something complicated this headline is more than true. But why complicate things, keep it simple and keep it fast.
Also this article uses 'pfp' like it's a word, I can't figure out what it means.
I'm able to vibe code simple apps in 30 minutes, polish it in four hours and now I've been enjoying it for 2 months.
I noticed this as well. I had to look it up. Apparently ‘pfp’ means ‘profile picture’.
Apparently it means profile photo.
I keep seeing things that were vibe coded and thinking, "That's really impressive for something that you only spent that much time on".
To have a polished software project, you must spend time somewhat menially iterating and refining (as each type of user).
To have a polished software project, you need to have started with tests and test coverage from the start for the UI, too.
Writing tests later is not as good.
I have taken a number of projects from a sloppy vibe coded prototype to 100% test coverage. Modern coding llm agents are good at writing just enough tests for 100% coverage.
But 100% test coverage doesn't mean that it's quality software, that it's fuzzed, or that it's formally verified.
Quality software requires extensive manual testing, iteration, and revision.
I haven't even reviewed this specific project; it's possible that the author developed a quality (CLI?) UI without e2e tests in so much time?
Was the process for this more like "vibe coding" or "pair programming with an LLM"?
this is why i use ai just for one file at the time, as extension of my own programming. not so fast, but keeps control
The 80/20 rule doesn’t go away. I am an AI true believer and I appreciate how fast we can get from nothing to 80% but the last “20%” still takes 80%+ of the time.
The old rules still apply mainly.
Woodworking is an analogy that I like to use in deciding how to apply coding agents. The finished product needs to be built by me, but now I can make more, and more sophisticated, jigs with the coding agents, and that in turn lets me improve both quality and quantity.
>> people who say they "vibecoded an app in 30 minutes" are either building simple copies of existing projects,
those are not copies, they aren't even features. usually part of a tiny feature that barely works only in demo.
with all vibe coding in the world today you still need at least 6 months full time to build a nice note taking app.
If we are talking something more difficult - it will be years - or you will need a team and it will still take a long time.
Everything less will result in an unusable product that works only for demo and has 80% churn.
Can you expand on this? You definitely don’t need 6 months for a note taking app to be useable it is more you need to compete with the state of the art right
I'd argue you need between 6 minutes and 6 years.
It depends entirely on what you want. You can literally code a JavaScript 1-liner that will make a <textarea> then put the content back in the URL and it will work serverless on pretty much any platform with a Web browser.
You can also write a note taking app that will be federated yet private, that will have its own scripting language, etc. I mean you can yak-shave your way to write your own OS or even designing your own CPU for that.
So... I'm not sure that metric, time, means much without a proper context, including who does it. It's quite different if to do that, regardless of the tooling used, if you are a professional developer, designer, fullstack dev, prototypist, PM, marketer, writer, etc.
> Can you expand on this?
sure. does your note taking app supports formatting? you don't need it today. you will need it at some point. images? same.
does it handle file corruption etc? no? then its pretty much useless.
does it work across devices? in modern world, again, it is pretty much useless without it
it works across devices? then it needs hosting. if it is hosted it needs auth, it needs backups
you can go on for ever.
the bar for very minimal note taking app that you actually will use is very high, with other software it is even higher.
and this is not even state of art, this is must haves
What universe do you live in
>with all vibe coding in the world today you still need at least 6 months full time to build a nice note taking app.
Bad example, note apps loaded with features are anti-productive and are for people who treat note taking as a hobby itself.
You have Obsidian anyway if you want something open source to work with.
Ah, note taking as hobby finally explains to me why these apps seem so popular. I don't think I have ever considered that I need one. And it to be something that shouldn't be fully solved multiple times now. But it really being hobby does kinda make the point for me.
You seem to be making the assumption that "app" means "sellable product", rather than "one off that works for me". It doesn't.
When everyone is able to make their own one off prototype in 30 minutes, no one will pay for the thing that took someone 6 months.
whatever you prototype - the one who built it in 6 month will have economy of scale to make it cheaper than your diy solution, and because they serve many customers and developed it for 6 months - their product will be 100x better than the one you diy
there is very very rare use case when diy makes sense. in 99% of cases its just a toy that feels nice as you kinda did it. but if you factor in the time etc it is always costs 100x more than $5/month you could usually buy
I can't say I'm impressed by this at all. 100+ hours to build a shitty NFT app that takes one picture and a predefined prompt, then mints you a dinosaur NFT. This is the kind of thing I would've seen college students slam out over a weekend for a coding jam with no experience and a few cans of red bull with more quality and effort. Has our standards really gotten so low? I don't see any craftsmanship at play here.
Look at the screenshots to understand what the author means by 'product'.
We don't need to shit on someone who shared their experiences and thoughts.
This would have been generic slop if it wasn't for AI.
If you hear someone spouting off about how vibe coding allows for creation of killer apps in a fraction of the time/cost, just ask them if you can see what successful killer apps they’ve created with it. It’s always crickets at that point because it’s somewhere between wishful thinking and an outright lie.
Of course vibe coding is going to be a headache if you have very particular aesthetic constraints around both the code and UX, and you aren't capable of clearly and explicitly explaining those constraints (which is often hard to do for aesthetics).
There are some good points here to improve harnesses around development and deployment though, like a deployment agent should ask if there is an existing S3 bucket instead of assuming it has to set everything up. Deployment these days is unnecessarily complicated in general, IMO.