Hi HN: I built Bauform to generate production-ready code, not demos

(bauform-beta.fly.dev)

1 point | by tekodu 14 hours ago

1 comment

tekodu 14 hours ago

Tested against GPT-5 and Claude 4.5 on 10 production specifications:

- Bauform: 10/10 pass all validation gates
- GPT-5: 0/10 (generates Streamlit UIs instead of REST APIs)
- Claude 4.5: 0/10 (same failure)

The problem: frontier models pattern-match "build a validator" → "create Streamlit demo", even when the requirements ask for a production API.

Try it yourself:

- Live beta: https://bauform-beta.fly.dev/
- Benchmark with all results: https://github.com/tekodu/bauform-evals
- Quick API test:

    curl -X POST https://bauform-beta.fly.dev/v1/engine/generate \
      -H "Content-Type: application/json" \
      -d '{"spec": "CSV validator with REST API", "params": {}}'

- Analysis paper (under peer review): https://www.dropbox.com/scl/fi/vtmztpdkm0ns86qapxp5p/bauform...
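If you would rather script the quick API test than use curl, here is a minimal Python sketch using only the standard library. The URL and JSON payload are copied from the curl command; the function names are mine, and since the response format isn't described here, the sketch just returns the raw body without parsing it.

```python
import json
import urllib.request

# URL and payload shape are copied from the curl command in the post.
GENERATE_URL = "https://bauform-beta.fly.dev/v1/engine/generate"


def build_generate_request(spec, params=None):
    """Build (but do not send) the POST request for a generation job."""
    body = json.dumps({"spec": spec, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(spec, params=None, timeout=60):
    """Send the request and return the raw response body as text.

    The response schema is an unknown here, so nothing is parsed.
    """
    req = build_generate_request(spec, params)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (performs a real network call):
#   print(generate("CSV validator with REST API"))
```

Keeping request construction separate from sending makes it easy to inspect exactly what goes over the wire before hitting the beta.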

We use 5-gate validation: functional, security, limits, latency, stability. Binary pass/fail - production either works or doesn't.
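To make the binary verdict concrete, here is a rough sketch of how an all-or-nothing gate runner could look. This is not Bauform's actual implementation: only the five gate names come from the description above, while `run_gates`, `passes`, and the check callables are hypothetical placeholders.

```python
from typing import Callable, Dict

# Gate names come from the post; everything else in this sketch is assumed.
GATE_NAMES = ("functional", "security", "limits", "latency", "stability")


def run_gates(artifact, checks: Dict[str, Callable[[object], bool]]) -> Dict[str, bool]:
    """Run every gate and record a strict pass/fail for each.

    A missing check, a non-True return value, or a raised exception
    all count as a failure.
    """
    results = {}
    for name in GATE_NAMES:
        check = checks.get(name)
        try:
            results[name] = check is not None and check(artifact) is True
        except Exception:
            results[name] = False
    return results


def passes(results: Dict[str, bool]) -> bool:
    """Binary verdict: pass only if all five gates passed."""
    return all(results.get(name, False) for name in GATE_NAMES)
```

Under this scheme there is no partial credit: an artifact that clears four gates and misses the latency gate is scored the same as one that fails everything.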

The results are cryptographically signed (Ed25519) and fully reproducible.

Happy to answer questions about the methodology or system architecture.