I prefer the ancient Chinese science of Oracle Bone divination. You take the scapula of an ox, copy the PR diff onto the bone using jiǎgǔwén encoding, then throw it in a fire until thermal expansion causes the bone to crack.
You then take a photo of the cracked bone and feed it back to your coding agent, which has been properly trained in interpreting Oracle Bones to extract PR review comments.
If the PR is too big to fit on the bone, you reject it for being too big. If after three rounds of review the bones keep cracking in the same spot, reject the PR. You accept the PR once the bone starts to seep bone marrow before cracking (it will crack first if there are any PR comments left).
So before the gatekeeping starts, my first crack at optimization was in 1987 on a 1 MHz Apple //e:
1. I wrote the code in BASIC
2. I rewrote the code in assembly
3. I got a further improvement by moving data into the first page of memory (the 6502's zero page), where stores and reads took two clock cycles instead of three.
But this isn't 1987, this is 2026. I "vibe coded" my first project this year. I designed the AWS architecture from an empty account using IaC: I chose every service, verified every permission, designed the orchestration and the concurrency model, and gathered the requirements. What I didn't do is look at a single line of Python or infrastructure code, aside from the permissions Codex generated.
Now to answer your questions:
How did I validate the correctness? Just as if I had written it myself. I had Codex create a shell script to run end-to-end tests of all the scenarios I cared about, and when one broke, I went back to Codex to fix it. I was very detailed about the scenarios.
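A minimal sketch of that scenario-driven loop, in Python rather than the shell script the author actually used; the endpoint, payloads, and expected statuses are hypothetical:

    # Illustrative end-to-end scenario runner. The real harness was a shell
    # script; the URL and payloads here are hypothetical stand-ins.
    import json
    import urllib.request

    BASE_URL = "https://example.invalid/api"  # hypothetical deployment URL

    SCENARIOS = [
        # (name, request payload, expected status)
        ("single transaction", {"amount": 100, "currency": "USD"}, "accepted"),
        ("duplicate transaction", {"amount": 100, "idempotency_key": "abc"}, "rejected"),
    ]

    def run_scenario(name, payload, expected):
        req = urllib.request.Request(
            f"{BASE_URL}/transactions",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            status = json.load(resp).get("status")
        ok = status == expected
        print(f"{'PASS' if ok else 'FAIL'}: {name} (got {status!r}, want {expected!r})")
        return ok

    if __name__ == "__main__":
        results = [run_scenario(*s) for s in SCENARIOS]
        raise SystemExit(0 if all(results) else 1)

The point is less the mechanics than the discipline: every scenario you care about gets an executable check, and a failure sends you back to the agent with a precise reproduction.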
The web front end that I used was built by another developer. I haven’t touched web dev in a decade. I told Codex what changes I needed and I verified the changes by deploying it and testing it manually.
How did I validate the performance? Again, just like I would on something I wrote myself. I tested it first with a few hundred transactions to verify the functionality, and then I stress tested it with a real-world volume of transactions. The first iteration broke horribly. Not because of Claude Code; it was a bad design.
But here's the beauty: the bad implementation took me a day instead of the three or four days it would have taken by hand. I then redesigned it, dropped the AWS service, and built a design that was much more scalable; that took another day. I knew in theory how it worked under the hood, but not in practice. Again, I tested for scalability by testing the result.
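A minimal sketch of the two-stage check described above (a small functional pass, then a concurrent stress pass at real-world volume); the volumes, worker counts, and the submit_transaction stub are illustrative assumptions:

    # Illustrative two-stage load test: verify functionality at low volume,
    # then stress at a production-like volume. All numbers are made up.
    from concurrent.futures import ThreadPoolExecutor
    import time

    def submit_transaction(i):
        """Hypothetical stand-in for one end-to-end transaction call."""
        time.sleep(0.001)  # placeholder for the real network round trip
        return True

    def run_stage(name, volume, workers):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            ok = sum(pool.map(submit_transaction, range(volume)))
        elapsed = time.perf_counter() - start
        print(f"{name}: {ok}/{volume} ok in {elapsed:.1f}s ({volume / elapsed:.0f} tx/s)")

    run_stage("functional pass", volume=300, workers=4)   # a few hundred
    run_stage("stress pass", volume=50_000, workers=64)   # real-world volume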
The architectural quality? I validated it by synthesizing real-world traffic. ChatGPT in thinking mode did find a subtle concurrency bug, but that was my fault: I designed the concurrency implementation, and Codex just did what I told it to do.
Subtle bugs happen whether a person writes the code or an agent does. You do the best you can with your tests, and when they surface, you fix them.
How do I prevent technical debt? All large implementations have technical debt. Again, just like when I lead a team: I componentize everything with clean interfaces. That makes things easier for coding agents and for people.
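A minimal sketch of what that kind of component boundary can look like in Python; the PaymentStore name and its methods are hypothetical:

    # Illustrative clean interface: callers (and coding agents) program
    # against the Protocol, so the implementation behind it can be swapped
    # without touching the rest of the system. All names are hypothetical.
    from typing import Protocol

    class PaymentStore(Protocol):
        def save(self, txn_id: str, amount: int) -> None: ...
        def load(self, txn_id: str) -> int: ...

    class InMemoryStore:
        """Test implementation; a database-backed one could replace it
        without changing any caller."""
        def __init__(self):
            self._data = {}

        def save(self, txn_id, amount):
            self._data[txn_id] = amount

        def load(self, txn_id):
            return self._data[txn_id]

    def record_payment(store: PaymentStore, txn_id: str, amount: int):
        # Depends only on the interface, not the concrete component.
        store.save(txn_id, amount)

The narrower the interface, the smaller the blast radius when an agent (or a teammate) rewrites what's behind it.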
You don’t. A JS dev isn’t going to catch an uninitialized variable in C and probably doesn’t even know the damage nasal demons can cause. You either throw more LLMs at it or learn the language.
https://factory.strongdm.ai has some advice I've found useful.
How can a tree fall in the woods when nobody's there?
You do a cross-analysis (a sketch follows the list):
- Compile it with the maximum number of warnings enabled
- Run linters/analyzers/fuzzers on it
- Ask another LLM to review it
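A minimal sketch of the first two checks as a driver script; the file name is hypothetical, and it assumes gcc and clang-tidy are installed (the LLM review step is a separate, manual pass):

    # Illustrative cross-analysis driver: run independent checkers over the
    # same agent-generated C file and fail if any of them object.
    import subprocess
    import sys

    SOURCE = "generated.c"  # hypothetical agent-generated file

    CHECKS = [
        # Maximum warnings, treated as errors; -Wuninitialized catches the
        # classic bug a non-C reviewer would miss.
        ["gcc", "-Wall", "-Wextra", "-Werror", "-Wuninitialized",
         "-c", SOURCE, "-o", "/dev/null"],
        # Independent static-analysis pass.
        ["clang-tidy", SOURCE, "--", "-std=c11"],
    ]

    failed = False
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        ok = result.returncode == 0
        print(f"{'ok' if ok else 'FAILED'}: {' '.join(cmd)}")
        if not ok:
            failed = True
            print(result.stdout + result.stderr, file=sys.stderr)

    sys.exit(1 if failed else 0)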
If you audit it, then you're not vibing.
Isn't that like proofreading text in a language you're not familiar with?
That's the neat part - you don't!
By burying your head in the sand and convincing yourself that the LLM doesn't generate any slop.