Typed languages are better suited for vibecoding

(solmaz.io)

143 points | by hosolmaz 7 hours ago

121 comments

timuckun 5 hours ago

It's been my experience that strongly opinionated frameworks are better for vibe coding regardless of the type system.

For example if you are using rails vibe coding is great because there is an MCP, there are published prompts, and there is basically only one way to do things in rails. You know how files are to be named, where they go, what format they should take etc.

Try the same thing in go and you end up with a very different result despite the fact that go has stronger typing. Both Claude and Gemini have struggled with one shotting simple apps in go but succeed with rails.

siva7 2 hours ago

In comparison, a completely unopinionated framework like FastAPI, which got a popularity boost in the early AI surge, is a mess to work with if you are vibe coding. Most popular frameworks follow the principle of having no single clear way to do things and leave it up to the developer. Opinionated frameworks went out of fashion after Rails, but it turns out they are significantly better suited to AI-assisted development.

solumunus 40 minutes ago

You can opinionate Claude remarkably well with context files. I use a very barebones routing framework with my own architecture and Claude knows how all the parts should fit together. I also publish to context files the entire database structure along with foreign key pairings, that made a tremendous difference.
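
As a rough illustration of publishing a database structure with its foreign key pairings to a context file, here is a minimal sketch. It assumes a SQLite database purely for convenience (the comment does not say which database is used), and `describe_schema`, `demo`, and the example tables are hypothetical names invented for this sketch.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Render tables, columns, and foreign keys as plain text for a context file."""
    lines: list[str] = []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        lines.append(f"## {table}")
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            suffix = " (primary key)" if pk else ""
            lines.append(f"- {name}: {col_type}{suffix}")
        # PRAGMA foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            lines.append(f"- FK: {table}.{fk[3]} -> {fk[2]}.{fk[4]}")
        lines.append("")
    return "\n".join(lines)


def demo() -> str:
    """Build a tiny in-memory example schema (hypothetical) and describe it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            total_cents INTEGER NOT NULL
        );
        """
    )
    return describe_schema(conn)
```

The returned text could be written to whatever context file the agent reads. Regenerating it after each migration would keep the agent's picture of the schema current.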

yurishimo an hour ago

That's an interesting assertion you make there about opinionated frameworks. Do you have a source for that? From my perspective, opinionated frameworks have only gotten more popular. Rails might not be the darling of every startup in existence anymore but I think that's largely down to other languages coming in and adopting the best parts of Rails and crafting their own flavor that plays to the strengths of their favorite programming language. Django, Laravel, Spring Boot, Blazor, Phoenix, etc etc.

While a lot of people here on this platform like to tinker and are often jumping to a new thing, most of my colleagues have no such ideas of grandeur and just want something that works. Rails and its acolytes work really well. I'm curious to know what popular frameworks you're referencing that don't fit into this Rails-like mold?

siva7 36 minutes ago

I'm not familiar with all the frameworks you listed, but I've worked extensively with Spring Boot and I can assure you that it's not an opinionated framework (as in one way to do things correctly). Blazor and Phoenix are niche frameworks that don't have wide adoption outside this site. Django has a shared history/competition with Rails but it's also not widely popular.

topato 5 hours ago

This is pretty anecdotal, but it feels like most of the published rails source code you find online (and by extension, an LLM has found) is from large, stable, and well-documented code.

rafamvc 5 hours ago

Claude Code with Rails is amazing. Shout out to Obie for Claude on Rails. Works phenomenally well.

globular-toast 10 minutes ago

Well yeah, it's like how a 5 year old can talk about what they want in their sandwich but will probably struggle to describe the flavours and textures they enjoy in detail.

delifue 5 hours ago

In my experience Gemini can one-shot go apps. Determining it requires sound eval instead of anecdotes.

timuckun 3 hours ago

My experience with Gemini has been pretty dismal. The CLI works much better than the VS code extension and both of them have struggled with one shotting go. Single files or single functions no problem though.

Tostino 4 hours ago

I'd really like to know what type of apps you're actually one-shotting with an AI. Seriously, can you please give me some example code or something because it seems like anything past a trivial program that doesn't actually do what you specified is far beyond their capabilities.

bobro 2 hours ago

if AI could really one-shot important, interesting apps, shouldn’t we be seeing them everywhere? where’s the surge of new apps that are so trivial to make? who’s hiding all this incredible innovation that can be so easily generated?

jdiff an hour ago

If AI could really accelerate or even take over the majority of work on an established codebase, we should be seeing a revolution in FOSS libraries and ecosystems. The gap has been noted many times, but so far all anyone's been able to dig up are one-off, laboriously-tended-to pull requests. No libraries or other projects with any actual downstream users.

EGreg 5 hours ago

Basically it's like this:

the more constraints you have, the more freedom you have to "vibe" code

and if someone actually built AI for writing tests, catching bugs and iterating 24/7 then you'd have something even cooler

delta_p_delta_x 39 minutes ago

> if someone actually built AI for writing tests, catching bugs and iterating 24/7

This is called a nightly CI/CD pipeline.

Run a build and run all tests and run all coverage at midnight, failed/regressed tests and reduced coverage automatically are assigned to new tickets for managers to review and assign.

woodruffw 5 hours ago

> I am managing projects in languages I am not fluent in—TypeScript, Rust and Go—and seem to be doing pretty well.

This framing reminds me of the classic problem in media literacy: people know when a journalistic source is poor when they’re a subject matter expert, but tend to assume that the same source is at least passably good when less familiar with the subject.

I’ve had the same experience as the author when doing web development with LLMs: it seems to be doing a pretty good job, at least compared to the mess I would make. But I’m not actually qualified to make that determination, and I think a nontrivial amount of AI value is derived from engineers thinking that they are qualified as such.

muglug 4 hours ago

Yup — this doesn't match my experience using Rust with Claude. I've spent 2.5 years writing Rust professionally, and I'm pretty good at it. Claude will hallucinate things about Rust code because it’s a statistical model, not a static analysis tool. When it’s able to create code that compiles, the code is invariably inefficient and ugly.

But if you want it to generate chunks of usable and eloquent Python from scratch, it’s pretty decent.

And, FWIW, I’m not fluent in Python.

micahscopes 3 hours ago

With access to good MCP tools, I've had really good experience using claude code to write rust: https://news.ycombinator.com/item?id=44702820

phi-go 2 hours ago

What MCP tools are you using?

js2 4 hours ago

Hah... yeah, no, its Python isn't great. It's definitely workable and better than what I see from 9/10 junior engineers, but it tends to be pretty verbose and over-engineered.

My repos all have pre-commit hooks which run the linters/formatters/type-checkers. Both Claude and Gemini will sometimes write code that won't get past mypy and they'll then struggle to get it typed correctly before eventually bypassing the pre-commit check with `git commit -n`.

I've had to add some fairly specific instructions to CLAUDE.md/GEMINI.md to get them to cut this out.

Claude is better about following the rules. Gemini just flat out ignores instructions. I've also found Gemini is more likely to get stuck in a loop and give up.

That said, I'm saying this after about 100 hours of experience with these LLMs. I'm sure they'll get better with their output and I'll get better with my input.

physicsguy an hour ago

To be fair, depending on what libraries you’re using, Python typing isn’t exactly easy even for a human, I spend more time battling with type checkers and stubs than I would like.

Mockapapella 3 hours ago

> When it’s able to create code that compiles, the code is invariably inefficient and ugly.

Why not have static analysis tools on the other side of those generations that constrain how the LLM can write the code?

jdiff an hour ago

I'd be interested to know the answer to this as well. Considering the wealth of AI IDE integrations, it's very eyebrow-raising that there are zero instances of this. Seems like somewhat low hanging fruit to rule out tokens that are clearly syntactically or semantically invalid.

tayo42 3 hours ago

> Claude will hallucinate things about Rust code because it’s a statistical model, not a static analysis tool.

I think that's the point of the article.

In a dynamic language or a compiled language, it's going to be hallucinating either way. If you're vibe coding in a compiled language, the errors are caught earlier, so you can vibe code them away before it blows up at run time.

muglug 3 hours ago

Static analysis tools like rustc and clippy are powerful, but there are large classes of errors that escape those analyses — e.g. things like off-by-one errors.
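
To make the off-by-one point concrete, here is a small sketch in Python with type annotations, used as a stand-in for the Rust case discussed above. Both functions have the same signature, so a type checker has no reason to prefer one over the other, and only a test tells them apart. The function names are hypothetical.

```python
def last_n(items: list[int], n: int) -> list[int]:
    """Intended to return the final n items of the list."""
    # Bug: the slice starts one position too late, so it returns n - 1 items.
    # The types are still list[int] in and list[int] out.
    return items[len(items) - n + 1:]


def last_n_fixed(items: list[int], n: int) -> list[int]:
    """Return the final n items of the list (assumes 0 <= n <= len(items))."""
    return items[len(items) - n:]
```

A single assertion such as `last_n([1, 2, 3, 4, 5], 2) == [4, 5]` fails for the buggy version, which is the kind of feedback the type system alone cannot give.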

js2 4 hours ago

After decades of writing software, I feel like I have a pretty good sense for "this can't possibly be idiomatic" in a new language. If I sniff something is off, I start Googling for reference code, large projects in that language, etc.

You can also just ask the LLM: are you sure this is idiomatic?

Of course it may lie to you...

NitpickLawyer 3 hours ago

> You can also just ask the LLM: are you sure this is idiomatic?

I found the reverse flow to be better. Never argue. Start asking questions first. "What is the idiomatic way of doing x in y?" or "Describe idiomatic y when working on x" or similar.

Then gather some stuff out of the "pedantic" generations and add to your constraints, model.md, task.md or whatever your stuff uses.

You can also use this for a feedback loop. "Here's a task and some code, here are some idiomatic concepts in y, please provide feedback on adherence to these standards".

woodruffw 4 hours ago

> If I sniff something is off, I start Googling for reference code, large projects in that language, etc.

This works so long as you know how to ask the question. But it's been my experience that an LLM directed on a task will do something, and I don't even know how to frame its behavior in language in a way that would make sense to search for.

(My experience here is with frontend in particular: I'm not much of a JS/TS/HTML/CSS person, and LLMs produce outputs that look really good to me. But I don't know how to even begin to verify that they are in fact good or idiomatic, since there's more often than not multiple layers of intermediating abstractions that I'm not already familiar with.)

fc417fc802 2 hours ago

> and I don't even know how to frame its behavior in language in a way that would make sense to search for.

Have you tried recursion? Something like: "Using idiomatic terminology from the foo language ecosystem, explain what function x is doing."

If all goes well it will hand you the correct terminology to frame your earlier question. Then you can do what the adjacent comment describes and ask it what the idiomatic way of doing p in q is.

woodruffw an hour ago

I think you’re missing the point. The point is that I’m not qualified to evaluate the LLM’s output in this context. Having it self-report doesn’t change that fact, it’s just playing hide the pickle by moving the evaluation around.

bravesoul2 5 hours ago

That's why I only use it on stuff I can properly judge.

giantrobot 5 hours ago

Gell-Mann Amnesia [0]

[0] https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect

woodruffw 5 hours ago

Thank you! I couldn’t remember the term.

teiferer 6 minutes ago

Nit upfront: Python is typed, just not statically typed.

What dynamically typed languages lack in compile-time safety, the programmer must make up using (automated) testing. With adequate tests, a python program doesn't break more than a Rust or Go program. It's just that people often regard testing as an annoying chore which is the first thing they skip when vibe coding (or "going fast and breaking things" which is then literally what happens).

tonyhart7 2 minutes ago

"a python program doesn't break more than a Rust or Go program"

But it does, though. You can literally just have the LLM check the LSP to analyze the code early, without writing tests to begin with. Their LSPs and compilers are just that smart.

isodev 8 minutes ago

Typed but maybe with the exception of the likes of Swift where Claude reveals just how complex and ambiguous the language can be. The lack of documentation and overly complex proposal documents also appear to overload the LLM context and confuse them.

sshine 9 minutes ago

I've been vibe-coding for weeks in Rust, and it works great.

I've been vibe-coding for a few days in Haskell, and I don't like the result.

Maybe I am just accustomed to being ok with verbose Rust, while Haskell comes with a great potential for elegance that the LLM does not explore.

Regardless, the argument that types guide the LLM in a very reliable way holds in both cases.

Reubend 4 hours ago

While I agree with the main thesis here, I find this extremely worrying:

> I am amazed every time how my 3-5k line diffs created in a few hours don’t end up breaking anything, and instead even increase stability.

In my personal opinion, there's no way you're going to get a high quality code base while adding 3,000 - 5,000 lines of code from LLMs on a regular basis. Those are huge diffs.

pablitokun 2 hours ago

Yeah imagine one of your colleagues doing this to your code base...

a96 an hour ago

Imagine a reviewer that doesn't block that patch immediately.

Of course, there might be some exceptions like if the codebase for some reason has some massive fixed tables or imports upstream files that may get updated occasionally. Those end up as massive patches or sets.

misja111 an hour ago

My experience with Python and Scala so far is different. With Python the LLMs do a pretty good job. The code always compiles, sometimes there are some logical or architectural errors but that's it.

With Scala, I have to give the LLM a super simple job, e.g. creating some mock data for a unit test, and even then it frequently screws up; every now and then it gives me code that doesn't even compile. So much for Scala's strong type system ..

lukev 6 hours ago

As has been said, actual evals are needed here.

Anecdotally, the worst and most common failure mode of an agent is when an agent starts spinning its wheels and unproductively trying to fix some error and failing, iterating wildly, eventually landing on a bullshit (if any) “solution”.

In my experience, in Typescript, these “spin out” situations are almost always type-related and often involve a lot of really horrible “any” casts.

resonious 5 hours ago

Right, I've noticed agents are very trigger happy with 'any'.

I have had a good time with Rust. It's not nearly as easy to skirt the type system in Rust, and I suspect the culture is also more disciplined when it comes to 'unwrap' and proper error management. I find I don't have to explicitly say "stop using unwrap" nearly as often as I have to say "stop using any".

smackeyacky 5 hours ago

Experienced devs coming in to TypeScript are also trigger happy with 'any' until they work out what's going on. Especially if they've come from Javascript.

monkpit 2 hours ago

I’ve tried enforcing no-explicit-any just to have the agent disable the linter rule. I guess I didn’t say you couldn’t do that…

rossjudson 2 hours ago

LLMs are minimizing energy to solve problems, and if they can convince the human to go away happy with 'any', so be it.

There's a fine line between gradient descent, pedantry, and mocking. I suspect we will learn more about it.

energy123 5 hours ago

The question can be asked two ways:

(1) Are current LLMs better at vibe coding typed languages, under some assumptions about user workflow?

(2) Are LLMs as a technology more suited to typed languages in principle, and should RL pipelines gravitate that way?

mewpmewp2 5 hours ago

This is why I have very specific ruleset and linting for my LLMs, not allowing any at all and other quality checks.

Mtinie 5 hours ago

Is this a shareable ruleset? I would completely understand if not but I’m interested in learning new ways to interact with my tools.

monkpit 2 hours ago

Until the agent disables the linter rule without you noticing!

js2 an hour ago

Yup. I've watched both Claude and especially Gemini get frustrated trying to deal with my pre-commit checks (usually mypy) and decide to do `git commit -n` even though my rules state explicitly, multiple times, that it's never okay to bypass the pre-commit checks.
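
One possible guard against silently disabled checks is to scan the diff for newly added suppression markers. The sketch below is only an illustration: `added_suppressions` and the pattern list are hypothetical, and the list is not exhaustive. Since `git commit -n` skips local hooks, a check like this would presumably need to run somewhere the agent cannot bypass, such as CI.

```python
import re

# Markers that silence a checker instead of fixing the underlying problem.
# This list is an illustrative assumption, not an exhaustive one.
SUPPRESSION_PATTERNS = [
    re.compile(r"#\s*type:\s*ignore"),
    re.compile(r"#\s*noqa"),
    re.compile(r"eslint-disable"),
    re.compile(r"@ts-ignore"),
]


def added_suppressions(unified_diff: str) -> list[str]:
    """Return added lines in a unified diff that contain a suppression marker."""
    hits: list[str] = []
    for line in unified_diff.splitlines():
        # Added lines start with "+"; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(pattern.search(line) for pattern in SUPPRESSION_PATTERNS):
            hits.append(line[1:].strip())
    return hits
```

A non-empty result could fail the build or simply be surfaced for human review.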

solatic an hour ago

It's not so much typing that is valuable for vibecoding, but being able to give the agent hooks into tooling that provides negative feedback for errors. The easiest is typing, sure, because it's built into the compiler. But you can also add in static analysis linters and automated testing, including - notably - testing for performance.

Of course, you have to tell the agent to set up static analysis linters first, and tell the agent to write tests. But then it'll obey them.

The reason why large enterprises could hire armies of juniors in the past, safely, was because they set up all manner of guardrails that juniors could bounce off of. Why would you "hire" a "junior" agent without the same guardrails in place?

linkage 6 hours ago

This claim needs to be backed up by evals. I could just as well argue the opposite, that LLMs are best at coding Python because there are two orders of magnitude more Python in their training sets than C++ or Rust.

In any case, you can easily get most of the benefits of typed languages by adding a rule that requires the LLM to always output Python code with type annotations and validate its output by running ruff and ty.

js2 an hour ago

ty still misses things caught by mypy. It also doesn't have the same level of support for Pydantic yet. I use it (because it's so damn fast), but along with mypy, not a replacement yet.

Yes, mypy is slow, but who cares if it's the agent waiting on it to complete.

yibers 5 hours ago

I agree that the training sets for LLMs have much more training data for Python than for Rust. But C++ existed before Python, I believe. So I doubt there is two orders of magnitude more Python code than C++.

a96 an hour ago

Python is pretty old, so I had a quick look.

https://en.wikipedia.org/wiki/C%2B%2B#History

In 1985, the first edition of The C++ Programming Language was released, which became the definitive reference for the language, as there was not yet an official standard.[31] The first commercial implementation of C++ was released in October of the same year.[28]

In 1998, C++98 was released, standardizing the language, and a minor update (C++03) was released in 2003.

https://en.wikipedia.org/wiki/History_of_Python

The programming language Python was conceived in the late 1980s,[1] and its implementation was started in December 1989[2] by Guido van Rossum at CWI in the Netherlands as a successor to ABC capable of exception handling and interfacing with the Amoeba operating system.[3]

Python reached version 1.0 in January 1994.

Of course, it's hard to say how much of that is reflected in the code available, and whether any of the old code is still valid input for modern use. It does broadly look like C++ is older, in general.

hibikir 5 hours ago

You miss how many fewer programmers were there in the early years, how much of that code was ever public, and even if it was, how useful it was, as C++ has changed drastically since, say, what we used to write in 2001.

vidarh 5 hours ago

It's not just a question of whether there is more actual code in a given language, but how much is available in the public and private training data.

I've done work on reviewing and fine-tuning training data with a couple of providers, and the amount of Python code I got to see at least out-distanced C++ code by far more than 2 orders of magnitude. It could be a heavily biased sample, but I have no problems believing it also could be representative.

dccsillag 5 hours ago

I think you vastly overestimate the capacity of Python typing.

herrington_d 6 hours ago

The logic above can support exactly the opposite conclusion: LLMs can handle dynamically typed languages better, since they do not need to solve type errors and so save context tokens.

Practically, it was reported that LLM-backed coding agents just worked around type errors by using `any` in a gradually typed language like TypeScript. I also personally observed such usage multiple times.

I also tried using LLM agents with stronger languages like Rust. When complex type errors occurred, the agents struggled to fix them and eventually just used `todo!()`.

The experience above can be caused by insufficient training data. But it illustrates the importance of eval instead of ideological speculation.

mithras 5 hours ago

In my experience you can get around it by having a linter rule disallowing it and using a local claude file instructing it to fix the linting issues every time it does something.

vidarh 5 hours ago

You can equally get around a significant portion of the purported issues with dynamically typed languages by having Claude run tests, and try to run the actual code.

I have no problem believing they will handle some languages better than others, but I don't think we'll know whether typing makes a significant difference vs. other factors without actual tests.

herrington_d 5 hours ago

It does not always work in my experience due to complex type definitions. Also, extra tool calls and time are needed to fix linting.

MattGaiser 5 hours ago

Or just bad training data. I've seen "any" casually used everywhere.

jjcm 5 hours ago

I've noticed a fairly similar pattern. I particularly like vibecoding with golang. Go is extremely verbose, which makes it almost like an opposite perl - writing go is a bad experience, but reading go is delightful. The verbosity of golang makes it so you're able to always jump in and understand context, often from just a single file.

Pre-llms, this was an up front cost when writing golang, which made the cost/benefit tradeoff often not worth it. With LLMs, the cost of writing verbose code not only goes down, it forces the LLM to be strict with what it's writing and keeps it on track. The cost/benefit tradeoff has increased greatly in go's favor as a result.

WD-42 2 hours ago

No shade on Go but you kinda just said that the language has always looked like AI generated code and this works in its favor now because you don’t actually have to write it anymore. Funny, but not sure I’d consider that in Go’s favor.

MutedEstate45 2 hours ago

The real win isn't static vs dynamic typing. It's immediate, structured feedback for LLM iteration. cargo check gives the LLM a perfectly formatted error it can fix in the next iteration. Python's runtime errors are often contextless ('NoneType has no attribute X') and only surface after execution. Even with mypy --strict, you need discipline to check it constantly. The compiler makes good feedback unavoidable.
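
The 'NoneType has no attribute X' case can be sketched as follows. This is a minimal, hypothetical example (`User`, `USERS`, and the function names are invented here): a type checker such as mypy is expected to flag the unchecked access before the code runs, whereas at runtime the failure only appears for inputs that hit the `None` branch.

```python
from typing import Optional


class User:
    def __init__(self, name: str) -> None:
        self.name = name


USERS = {1: User("ada")}


def find_user(user_id: int) -> Optional[User]:
    """Return the matching user, or None when the id is unknown."""
    return USERS.get(user_id)


def greeting_unchecked(user_id: int) -> str:
    # A type checker is expected to flag this line, because find_user
    # may return None. At runtime it only fails for unknown ids.
    return "hello " + find_user(user_id).name


def greeting_checked(user_id: int) -> str:
    user = find_user(user_id)
    if user is None:
        return "hello stranger"
    return "hello " + user.name
```

Calling `greeting_unchecked(1)` works, so a quick manual run looks fine. The `AttributeError` only surfaces for an id that is not in `USERS`.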

levocardia an hour ago

Interesting...my experience has been that LLMs are generally better at more common languages (not surprising: more data exists in those languages!). So, my admittedly amateur vibe coding experiences have been best in Python and pretty vanilla web development setups. When I push LLMs to, say, fit advanced statistical models in R, they fall apart pretty badly. Yet they can crush a PyTorch or SciKitLearn task no problem.

jbellis 5 hours ago

I'm really shocked at how slow people are to realize this, because it's blindingly obvious. I guess that just shows how much the early adopter crowd is dominated by Python and JavaScript.

(BTW the answer is Go, not Rust, because the other thing that makes a language well suited for AI development is fast compile times.)

woodruffw 4 hours ago

My experience with agent-assisted programming in Rust is that the agent typically runs `cargo check` instead of `cargo build` for this exact reason -- it's much faster and catches the relevant compilation errors.

(I don't have an opinion on one being better than the other for LLM-driven development; I've heard that Go benefits from having a lot more public data available, which makes sense to me and seems like a very strong advantage.)

rbalicki 3 hours ago

Folks here may be interested in checking out Isograph. In [this conference talk](https://www.youtube.com/watch?v=sf8ac2NtwPY), I vibe code an Isograph app, and make non-trivial refactors to it using Cursor. This is only feasible because the interface between components is very simple, and all the hard stuff (generating a query for exactly the needed data, wiring things up, etc.) is done deterministically, by a compiler.

It's not quite the same principle OP articulates, which is that a compiler provides safety and that certainty lets you move fast when vibe coding. Instead, what I'm claiming is that you can move fast by allowing the LLM to focus on fewer things. (Though, incidentally, the compiler does give you that safety net as well.)

chrisjharris 6 hours ago

I've been wondering about this for some time. My initial assumption was that LLMs will ultimately be the death of typed languages, because type systems are there to help programmers not make obvious mistakes, and near-perfect LLMs would almost never make obvious mistakes. So in a world of near-perfect LLMs, a type system is just adding pointless overhead.

In this current world of quite imperfect LLMs, I agree with the OP, though. I also wonder whether, even if LLMs improve, we will be able to use type systems not exactly for their original purpose but more as a way of establishing that the generated code is really doing what we want it to, something similar to formal verification.

ImprobableTruth 5 hours ago

Even near-perfect LLMs would benefit from the compiler optimizations that types allow.

However perfect LLMs would just replace compilers and programming languages above assembly completely.

exclipy 5 hours ago

The closest we got to vibe coding pre-LLMs was using a language with a very good strong type system in a good IDE and hitting Ctrl-Space to autocomplete your way to a working program.

I wonder if LLMs can use the type information more like a human with an IDE.

eg. It generates "(blah blah...); foo." and at that point it is constrained to only generate tokens corresponding to public members of foo's type.

Just like how current gen LLMs can reliably generate JSON that satisfies a schema, the next gen will be guaranteed to natively generate syntactically and type-correct code.
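
The idea of restricting what may follow `foo.` can be sketched at a toy level. Real constrained decoding would operate on token probabilities inside the sampler; the sketch below only filters whole candidate names against an object's public attributes, and `Account` and `allowed_completions` are hypothetical names.

```python
class Account:
    def __init__(self) -> None:
        self.balance = 0
        self._audit_log: list[str] = []

    def deposit(self, amount: int) -> None:
        self.balance += amount


def allowed_completions(obj: object, candidates: list[str]) -> list[str]:
    """Keep only candidates that name a public attribute of obj."""
    public = {name for name in dir(obj) if not name.startswith("_")}
    return [candidate for candidate in candidates if candidate in public]
```

Given the candidates `deposit`, `withdraw`, `_audit_log`, and `balance`, only `deposit` and `balance` survive, which mirrors what an IDE's autocomplete list would offer.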

koolba 5 hours ago

> I wonder if LLMs can use the type information more like a human with an IDE.

Just throw more GPUs at the problem and generate N responses in parallel and discard the ones that fail to match the required type signature. It’s like running a linter or type check step, but specific to that one line.
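
A very small sketch of the generate-and-discard idea: given several candidate functions, keep the first whose declared signature matches the required one. It compares annotations only and says nothing about whether the body is correct; a real filter would presumably run a type checker. All names here are hypothetical.

```python
import inspect
from typing import Callable, Optional


def required_shape(values: list[int]) -> int:
    """Template whose signature every accepted candidate must match."""
    raise NotImplementedError


REQUIRED = inspect.signature(required_shape)


def candidate_a(values: list[int]) -> str:
    return str(sum(values))


def candidate_b(values: list[int]) -> int:
    return sum(values)


def first_matching(
    candidates: list[Callable], required: inspect.Signature
) -> Optional[Callable]:
    """Return the first candidate whose parameters and return annotation match."""
    for candidate in candidates:
        # Signature equality compares parameter names, kinds, defaults,
        # and annotations, plus the return annotation.
        if inspect.signature(candidate) == required:
            return candidate
    return None
```

Here `candidate_a` is rejected because it declares a `str` return type, and `candidate_b` is kept.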

xwolfi 5 hours ago

We have infinite uranium anyway !

esafak 4 hours ago

LLMs can use LSPs. https://en.wikipedia.org/wiki/Language_Server_Protocol

treyd 5 hours ago

You already can use LLM engines that force generation according to an arbitrary CFG definition. I am not aware of any systems that apply that to generating actual programming language code.

J_Shelby_J 5 hours ago

When writing Rust, the LLM almost never gets function signatures and return types wrong.

That just leaves the business logic to sort out. I can only imagine that IDEs will eventually pair directly with the compiler for instant feedback to fix generations.

But Rust also has traits, lifetimes, async, and other type flavors that multiply complexity and cause issues. It's also an in-progress language… I'm about to add a “don’t use once_cell, it’s part of std now” to my system prompt. So it’s not all sunshine, and I’m deeply curious how a pure vibe-coded Rust app would turn out.

762236 5 hours ago

Gemini has been doing a fantastic job for me for Rust

randomifcpfan 3 hours ago

Here’s a study that found that for small problems Gemini is almost equally good at Python and Rust. Looking at the scores of all the languages tested, it seems that the popularity of the language is the most important factor:

https://jackpal.github.io/2025/03/29/Gemini_2.5_Pro_Advent_o...

whytevuhuni an hour ago

But isn't it the case that Python is vastly more popular than Rust?

If Gemini is equally good at them in spite of that, doesn't that mean it'd be better at Rust than at Python if it had equal training in both?

anupshinde 5 hours ago

I am comfortable with both Python and Go. I prefer Go for performance; however, the earlier issue was verbosity.

It is easier to write things using a Python dict than to create a struct in Go or use the weird `map[string]interface{}` and then deal with the resulting typecast code.

After I started using GitHub Copilot (before the Agents), that pain went away. It would auto-create the field names, just by looking at the intent or a couple of fields. It was just a matter of TAB, TAB, TAB... and of course I had to read and verify - the typing headache was done with.

I could refactor the code easily. The autocomplete is very productive. Type conversion was just a TAB. The loops are just a TAB.

With Agents, things have become even better - but also riskier, because I can't keep up with the code review now - it's overwhelming.

SteveJS 5 hours ago

I think this is true -- especially for new code.

I did this not knowing any Rust: https://github.com/KnowSeams/KnowSeams and Rust felt like a very easy-to-use scripting language.

NitpickLawyer 2 hours ago

Really cool stuff, I appreciate you sharing this.

Although, to be fair this is far from vibecoding. Your setup, at a glance, says a lot about how you use the tools, and it's clear you care about the end result a lot.

You have a PRD file, your tasks are logged, each task defines both why's and how's, your first tasks are about env setup, quality of dev flow, exploration and so on. (as a nice tidbit, the model(s) seem to have caught on to this, and I see some "WHY:" as inline comments throughout the code, with references to the PRD. This feels nice)

It's a really cool example of "HOW" one should approach LLM-assisted coding, and shows that methods and means matter more than your knowledge in langx or langy. You seem to have used systems meant to help you in both speed of dev and ease of testing that what you got is what you need. Kudos!

I might start using your repo as a good example of good LLM-assisted dev flows.

xwolfi 5 hours ago

That seems a little bit dangerous, why not do it in a language you know ? Plus, this is not launching rockets on the moon, it's a sentence splitter with a fancy state machine (probably very useful in your niche, not a critique) - the difficulty was for you to put the effort to build a complicated state machine, the rest was frankly... not very LLM-needing and now you can't maintain your own stuff without Nvidia burning uranium.

Did the LLM help at all in designing the core, the state machine itself ?

SteveJS 4 hours ago

Nah it was a hobby project because I was laid off for a bit.

Rust's RegEx was perfect because it doesn't allow anything that isn't a DFA. Yes-ish, the LLM facilitated designing the state machine, because it was part of the dev-loop I was trying out.

The speed is primarily what enabled finding all of the edge cases I cared about. Given it can split 'all' of a local Project Gutenberg mirror in < 10 seconds on my local dev box, I could do things I wouldn't otherwise attempt.

The whole thing is there in the ~100 "completed tasks" directory.

NischalM 6 hours ago

I have found this to be true as well. Although I exclusively used python and R at work and tried CC several times for small side projects, it always seemed to have problems and ended up in a loop trying to fix its own errors. CC seems much better at vibe coding with typescript. I went from no knowledge of node.js development to deploying a reasonable web app on vercel in a few days. Asking CC to run tsc after changes helps it fix any errors because of the faster feedback from the type system compared to python. Granted, this was only for a personal side project and may not be true for production systems that might be much larger, but I was pleasantly surprised how easy it was in typescript compared to python.

[-]

cttet 5 hours ago

It may be a Claude-specific thing. I tried asking Claude to do various tasks in machine learning, like implementing gradient boosting, without specifying the language, thinking it would use Python since it is the most common option and has utilities like NumPy to make it much easier. But Claude mostly chose JavaScript for the language and somehow managed to do it in JS.

koakuma-chan 6 hours ago

> I was pleasantly surprised how easy it was in typescript compared to python

It's time for people to wake up and stop using Python, and forcing me to use Python

rgoldfinger 4 hours ago

Totally agree. With AI coding, ensuring correctness is critical. Having types and compile-time checks helps a lot.

fluxkernel 6 hours ago

All existing programming languages are designed for human beings. Is it the right time to design something that is specifically for vibe coding? For example, ease of reading and understanding is probably much more important than all the syntactic sugar meant to reduce typing. Creating ten ways to accomplish the same task is not useful for LLMs.

[-]

largbae 5 hours ago

I've been wondering if Java would have a resurgence due to strong typing even into the error types, and widespread runtime availability. But so far, seems no.

poink 5 hours ago

Typed languages are also better suited to IDE assistance and static analysis

I'm a relatively old school lisp fan, but it's hard to do this job for a long time without eventually realizing helping your tools is more valuable than helping yourself

Myrmornis 4 hours ago

Python has static typing unless you don't add any types. The vast majority of reputable Python codebases nowadays use static typing rigorously. If you don't, you should. To enforce it when coding with an agent you can either tell the agent to run the type checker after every edit (e.g. via a hook in Claude Code), or if you're using an agent that has access to the LSP diagnostics then tell it to look at them and demand that they are clean after every edit (easy with Cursor, and achievable in Claude Code I believe via MCP).
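
As a rough illustration of what the annotations buy you, here is a minimal sketch. The `Order` class and `total_cents` function are made up for this example, not taken from any real codebase. A checker such as mypy or pyright should flag the commented-out call before the code ever runs, whereas unchecked Python would only fail at runtime.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type, invented for this illustration.
    sku: str
    quantity: int
    unit_price_cents: int


def total_cents(orders: list[Order]) -> int:
    """Sum the price of every order, in cents."""
    return sum(o.quantity * o.unit_price_cents for o in orders)


orders = [Order("widget", 2, 250), Order("gadget", 1, 1000)]
print(total_cents(orders))  # prints 1500

# A static checker is expected to reject the next line, because a list of
# dicts is not a list[Order]. Without a checker it fails only at runtime,
# with an AttributeError raised inside total_cents.
# total_cents([{"sku": "widget", "quantity": 2}])
```

The point of running the checker after every edit is that the agent sees this kind of mismatch immediately instead of discovering it several steps later.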

[-]

physicsguy an hour ago

> The vast majority of reputable Python codebases nowadays use static typing rigorously

As judged by who? And in what field?

I mean, if I look at the big Python libraries I use regularly, none of them have types - Django, DRF, NumPy, SciPy, Scikit-learn. That’s not to say there aren’t externally provided stubs, but the library authors themselves are often not the ones writing them.

heavyset_go 4 hours ago

Why isn't the agent smart enough to recognize typed Python code existing in a project or detect that an explicit py.typed file exists?

[-]

Myrmornis 3 hours ago

In the case of Claude Code the hook feature is ideal for this so I could imagine the designers deciding that it is more appropriate to put the user in control. That said I think I do agree with you that -- given Python's fairly unique position of having good static typing but not requiring it -- the agents should default to running the type checker if they see it configured in pyproject.toml.
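
For what it's worth, the detection itself is not hard. Below is a minimal sketch of how a tool might look for a configured checker or a `py.typed` marker. The function names and the `CHECKER_TABLES` list are assumptions made up for this example, and I am not claiming any agent actually does it this way.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:  # older interpreters
    tomllib = None

# Tool tables treated as "a type checker is configured".
# This list is an assumption for the sketch, not an exhaustive registry.
CHECKER_TABLES = ("mypy", "pyright")


def configured_checkers(project_root: Path) -> list[str]:
    """Return the checkers that pyproject.toml has a [tool.<name>] table for."""
    pyproject = project_root / "pyproject.toml"
    if not pyproject.is_file():
        return []
    text = pyproject.read_text(encoding="utf-8")
    if tomllib is not None:
        tools = tomllib.loads(text).get("tool", {})
        return [name for name in CHECKER_TABLES if name in tools]
    # Fallback for old interpreters: a crude scan for table headers.
    headers = {line.strip() for line in text.splitlines()}
    return [name for name in CHECKER_TABLES if f"[tool.{name}]" in headers]


def has_py_typed_marker(project_root: Path) -> bool:
    """True if any directory under the root contains a py.typed marker file."""
    return next(project_root.rglob("py.typed"), None) is not None
```

An agent could call both at the start of a session and, if either returns something, default to running the checker after each edit.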

brikym 5 hours ago

You could just leave it at "Typed languages are better."

throwawaymaths 2 hours ago

i find claude is very good with elixir, which is a dynamically typed language. i suspect strong conventions and value immutability help.

warrenmiller 5 hours ago

it aint great at c# i can tell you. this from grok yesterday:

foreach (string enumName in Enum.GetNames(typeof(Pair)))

{

  if (input.Contains($"${enumName}"))

macawfish 3 hours ago

I've had really good experiences with claude code + rust

jongjong 3 hours ago

My experience suggests the opposite of what this article claims. Claude Code is ridiculously good with vanilla JavaScript, provided that your code is well written. I tried it with a TypeScript code base and it wasn't anywhere near as good.

With JS, Claude has very high success rate. Only issue I had with it was that one time it forgot to update one part of the code which was in a different file but as soon as I told it, it updated it perfectly.

With TypeScript my experience was that it struggles to find things. Writing tests was a major pain because it kept trying to grep the build output because it had to mock out one of the functions in the test and it just couldn't figure it out.

Also, the typed code it produces is more complex for the same problem and spread across more files, and it struggles to get the right context. TS is also more verbose (this is objectively true and measurable), so it requires more tokens and literally costs more.

gompertz 6 hours ago

Curious, has it been proven that typed languages are easier for LLMs to work with as they don't have to infer types?

[-]

treve 6 hours ago

Do they infer anything? Correct me if I'm wrong but having the types right there in the source for training data just means more context.

benreesman 6 hours ago

I'm not aware of any rigorous study on it, but my personal anecdote is that I don't even bother with Claude Code or similar unless the language is Haskell, the deployment is Nix, the config is Dhall, and I did property tests. Once you set it up like that you just pour money in until it's too much money or it's stuck, and that's how far LLMs can go now.

I used to yell at Claude Code when it tried to con me with mocks to get the TODO scratched off, now I laugh at the little bastard when it tries to pull a fast one on -Werror.

Nice try Claude Code, but around here we come to work or we call in sick, so what's it going to be?

[-]

herrington_d 6 hours ago

There is research backing some form of "typed languages are better for LLMs". For example https://arxiv.org/abs/2504.09246, Type-Constrained Code Generation with Language Models, where the LLM's output is constrained by type checkers.

Also https://arxiv.org/abs/2406.03283, Enhancing Repository-Level Code Generation with Integrated Contextual Information, uses static analyzers to produce prompts with more context info.

Yet, the argument does not directly translate to the conclusion that typed languages are rigorously better for LLMs without external tools. However, typed languages and their static analysis information do seem to help LLMs.

[-]

vidarh 5 hours ago

Dynamically typed languages are far from "untyped". Though they may well require more effort to analyze from scratch without making assumptions, there is nothing inherently preventing type-constrained code generation of the kind the first paper proposes even without static typing.

A system doing type-constrained code-generation can certainly implement its own static type system by tracking a type for variables it uses and ensuring those constraints are maintained without actually emitting the type checks and annotations.

Similarly, static analyzers can be - and have been - applied to dynamically typed languages, though if these projects have been written using typical patterns of dynamic languages the types can get very complex, so this tends to work best with code-bases written for it.
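
To make the "track a type per variable without annotations" idea concrete, here is a toy sketch using Python's `ast` module. It is nowhere near what the paper does: it only looks at top-level assignments of literals, and the function name `find_type_conflicts` is invented for this example.

```python
import ast


def find_type_conflicts(source: str) -> list[str]:
    """Record a type for each variable and report rebinding to another type.

    Only top-level `name = <literal>` statements are considered; anything
    else is ignored in this sketch.
    """
    seen: dict[str, str] = {}
    conflicts: list[str] = []
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant)):
            continue
        value_type = type(node.value.value).__name__
        for target in node.targets:
            if not isinstance(target, ast.Name):
                continue
            # First binding fixes the variable's type; later ones must match.
            previous = seen.setdefault(target.id, value_type)
            if previous != value_type:
                conflicts.append(
                    f"line {node.lineno}: {target.id} was {previous}, now {value_type}"
                )
    return conflicts


print(find_type_conflicts("count = 0\nname = 'a'\ncount = 'zero'\n"))
# ['line 3: count was int, now str']
```

Nothing here emits an annotation into the generated code. The constraint lives entirely in the generator's own bookkeeping, which is the point being made above.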

cultofmetatron 5 hours ago

this is just the kind of sass I needed today. cheers!

nu11ptr 5 hours ago

Everything said is true without AI as well, at least for me. I don't hate Python, and I like it for very small scripts, but for large programs the lack of static types makes it much too brittle IMO. Static typing gives the confidence that not every single line needs testing, which reduces friction during the lifecycle of the code.

itsafarqueue 5 hours ago

This generalises to “Agents respond well to red/green feedback loops”.
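
A minimal sketch of such a loop, with everything abstracted behind callables. The name `run_until_green` and its parameters are hypothetical: `check` stands in for a type checker, compiler, or test run, and `revise` stands in for the agent.

```python
from typing import Callable


def run_until_green(
    candidate: str,
    check: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 5,
) -> tuple[str, bool]:
    """Check and revise until `check` reports no errors or rounds run out.

    Returns the final candidate and whether it ended up green.
    """
    for _ in range(max_rounds):
        errors = check(candidate)
        if not errors:
            return candidate, True  # green: nothing left to fix
        candidate = revise(candidate, errors)  # red: feed diagnostics back
    return candidate, not check(candidate)


# Toy stand-ins so the loop can be exercised without any real tooling.
def toy_check(code: str) -> list[str]:
    return [] if code.endswith(";") else ["missing semicolon"]


def toy_revise(code: str, errors: list[str]) -> str:
    return code + ";"


print(run_until_green("x = 1", toy_check, toy_revise))  # ('x = 1;', True)
```

The faster and more specific `check` is, the more useful each round becomes, which is arguably where type checkers earn their keep.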

OutOfHere 4 hours ago

The argument against Python is weak because Python can be written with types. Moreover, the types can be checked for correctness by various type checkers.

The issue is those who don't use type checkers religiously with Python - they give Python a bad name.

lvl155 5 hours ago

I can say with 100% certainty that they all stink at Rust. It’s laughably bad. Python, on the other hand, is surprisingly good.

[-]

energy123 5 hours ago

I scraped every comment on HN that discussed using Rust with LLMs and about half gave positive feedback, half negative feedback.

Can you explain more why you've arrived at this opinion?

OutOfHere 4 hours ago

With Python, it scales better if the Python is well-typed, not so much otherwise.

It's the best at Go imho since it has enforced types and a garbage collector.

rvz 5 hours ago

Such extraordinary claims require extraordinary evidence, not "vibes".

> It seems that typed, compiled, etc. languages are better suited for vibecoding, because of the safety guarantees.

There are no "safety guarantees" with typed, compiled languages such as C, C++, and the like. Even with Go, Rust and others, if you don't know the language well enough, you won't find the "logic bugs" and race conditions in your own code that the LLM creates; even with the claims of "safety guarantees".

Additionally, the author is slightly confusing the meaning of "safety guarantees", which refers to memory safety. What they really mean is "reasoning with the language's types", which is easier to do with Rust, Go, etc. and harder with Python (without types) and JavaScript.
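
A tiny sketch of the kind of logic bug meant here, in typed Python for brevity. The functions are invented for this example. Both versions should pass a type checker, since the signatures are identical, yet only one does what the docstring says.

```python
def last_n(items: list[int], n: int) -> list[int]:
    """Intended behaviour: return the last n items."""
    return items[:n]  # logic bug: this is the FIRST n items


def last_n_fixed(items: list[int], n: int) -> list[int]:
    """Return the last n items (all of them if n exceeds the length)."""
    return items[max(len(items) - n, 0):]


print(last_n([1, 2, 3, 4], 2))        # [1, 2], wrong but well-typed
print(last_n_fixed([1, 2, 3, 4], 2))  # [3, 4]
```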

Again we will see more of LLM written code like this example: [0]

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

adamnemecek 5 hours ago

They are also better suited for being ported to other languages, also unsurprisingly

Mistletoe 6 hours ago

I don't know what vibecoding is, and at this point I'm too afraid to ask.

[-]

bashtoni 5 hours ago

I wouldn't worry too much, no-one seems to be able to agree what it means anyway.

Depending on who you speak to it can be anything from coding only by describing the general idea of what you want, to just being another term for LLM assisted programming.

shric 5 hours ago

It’s fine to not know what it is, but what is the rationale for commenting that you don’t know? Why not just look it up? Or don’t, as you’re too afraid to ask.

OutOfHere 4 hours ago

The strict original definition of vibe coding is an LLM writing code with the programmer never caring about the code, only about the code's runtime output. It is easily the worst way to use LLMs for code, and I think even coining the term was a highly irresponsible and society-damaging move by Karpathy, making me lose much respect for him. This coined definition was taken literally by managers to fire workers.

In truth, for LLM-generated code to be maintainable and scalable, it first needs to be specced out super well by the engineer in collaboration with the LLM, and then the generated code must also be reviewed line-by-line by the engineer.

There is no room for vibe coding in making things that last and don't immediately get hacked.