47 comments

  • viktorcode 2 hours ago

    It's the second time today that I've seen a higher number of LoC presented as something positive. I would put it strictly in the "Ugly" category. I understand the business logic that says that as long as you can vibe code your way around any problem, what's the point of even looking at the code.

    • esafak an hour ago

      Think of it as 60 man-years of work.

    • pjmlp 40 minutes ago

      Remember, there used to be a time when programmers' productivity was measured in LoC per hour.

      As such, this is high productivity! /s

    • njhnjhnjh an hour ago

      Higher LOC means the code will be able to handle a wider range of different input conditions. In the next couple years we should be able to 10x or even 100x the amount of code we can manage with LLMs, to the extent that our systems' world models are effectively indistinguishable from the real world.

      • miningape an hour ago

        Yes, as we all know, when evaluating which programming language to use, you should get a line count of the compiler's repo. More lines = more capabilities.

        Why would I ever want a language with less capabilities?

      • enricotr 26 minutes ago

        'Means' according to what? Put up some (laughable) reference so I can laugh even louder.

      • adw an hour ago

        > to the extent that our systems' world models are effectively indistinguishable from the real world.

        https://genius.com/Jorge-luis-borges-on-exactitude-in-scienc...

      • quietbritishjim an hour ago

        Genuinely hard to tell if satire.

        Just in case not, consider whether the short function

            def is_even(x):
                return (x % 2) == 0

        handles a wider range of input conditions than the higher-LOC function

            def is_even(x):
                if x == 0:
                    return True
                if x == 2:
                    return True
                if x == 4:
                    return True
                ...
                return False

  • pmontra 3 hours ago

    > In Elixir tests, each test runs in a database transaction that rolls back at the end. Tests run async without hitting each other. No test data persists.

    And it confuses Claude.

    This way of running tests is also what Rails does, and AFAIK Django too. Tests are isolated and can be run in random order. In fact, Rails randomizes the order, so if there are tests that for any reason depend on the order of execution, they will eventually fail. To help debug those cases, it prints the seed, which can be used to rerun those tests deterministically, including the calls to methods returning random values.

    I thought that this is how all test frameworks work in 2026.

    • netghost 3 hours ago

      I did too, and I've had a challenging time convincing people outside of those ecosystems that this is possible, that it's reasonable, and that we've been doing it for over a decade.

      • njhnjhnjh an hour ago

        Get ChatGPT to explain it to me or it doesn't exist

      • gavmor 2 hours ago

        Story of my life in so many dimensions.

    • vmg12 2 hours ago

      Why not just write to the db? Just make every test independent, use uuids / random ids for ids.

      • mystifyingpoi an hour ago

        > Just make every test independent

        That's easier said than done. Simple example: an API that returns a count of all users in the database. The obvious correct implementation is just `select count(*) from users`. But if some other test touches the users table beforehand, it won't work. There is no uuid to latch onto here.
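A minimal sketch of that failure mode (hypothetical `users` table; sqlite stands in for the real database):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY)")

def user_count(c):
    # The endpoint under test: a global aggregate over the whole table
    return c.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Test A is "independent" by the random-id rule: it only touches its own row
conn.execute("INSERT INTO users VALUES (?)", (str(uuid.uuid4()),))

# Test B expects a fresh database, but COUNT(*) sees every row,
# so Test A's leftover data breaks it; no uuid can isolate an aggregate
fresh_db_count = 0
assert user_count(conn) != fresh_db_count  # Test A's row leaked in
```

Random ids isolate writes to individual rows, but any query that aggregates over the whole table still sees every other test's data.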

      • vladraz 16 minutes ago

        Frankly this is the better solution for async tests. If the app can handle multiple users interacting with it simultaneously, then it can handle multiple tests. If it can’t, then the dev has bigger problems.

        As for assertions, it's not that hard to think of a better way to check whether you made an insertion into the db than writing "assert user_count() == 0".
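One way to write such an assertion, sketched in Python with sqlite (hypothetical schema and `create_user` helper): compare counts before and after, or better, assert on the specific row this test created.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")

# Simulate another async test having written a row already
conn.execute("INSERT INTO users VALUES (?, ?)", (str(uuid.uuid4()), "someone_else"))

def create_user(c, name):
    uid = str(uuid.uuid4())
    c.execute("INSERT INTO users VALUES (?, ?)", (uid, name))
    return uid

# Delta assertion: no assumption that the table started empty
before = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
uid = create_user(conn, "alice")
after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
assert after == before + 1

# Stronger still: look up exactly the row this test created
row = conn.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()
assert row == ("alice",)
```

Both styles stay valid when other tests are writing to the same table concurrently, which is the whole point of the comment above.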

  • jonator 4 hours ago

    I can attest to everything. Using Tidewave MCP to give your agent access to the runtime via REPL is a superpower, especially with Elixir being functional. It's able to proactively debug and get runtime feedback on your modular code as it's being written. It can also access the DB via your ORM Ecto modules. It's a perfect fit and incredibly productive workflow.

    • ogig 2 hours ago

      Some MCPs really do give the models superpowers. Adding the Playwright MCP took my CC from mediocre frontend skills to really, really good ones. It also gives CC a way to check what it's done and, many times, correct obvious errors before coming back to you. Big leap.

    • ch4s3 4 hours ago

      Which models are you using? I’ve had mixed luck with GPT 5.2.

      • barkerja 2 hours ago

        Opus 4.5 with Elixir has been remarkably good for me. I've been writing Elixir in production since ~2018, and it continues to amaze me with the quality of the code it produces.

        I've been tweaking my skills to avoid nested cases, make better use of with/do for control flow, maintain good contexts, etc.

        • ch4s3 41 minutes ago

          I'll have to check it out. I've found GPT to be adequate at producing running code that I can improve either by hand or with very specific prompting.

          What does your workflow look like?

      • jonator 4 hours ago

        I've been using Opus 4.5 via Claude Code

  • botacode 5 hours ago

    Great article that concretizes a lot of intuitions I've had while vibe coding in Elixir.

    We don't 100% AI it but this very much matches our experience, especially the bits about defensiveness.

    Going to do some testing this week to see if a better agents file can't improve some of the author's testing struggles.

  • epolanski 2 hours ago

    I'm a bit lost on a few of the bad and ugly points.

    They could've been sorted with precise context injection of claude.md files and/or dedicated subagents, no?

    My experience using Claude suggests you should spend a good amount of time scaffolding its instructions in documents it can follow and refer to if you don't want it to end in the same loops over and over.

    The author hasn't said whether this was tried.

  • tossandthrow 4 hours ago

    It seems like the "100% vibe coded" claim is an exaggeration, given that Claude fails at certain tasks.

    The new generation of code assistants is great. But when I dogmatically try to let only the AI work on a project, it usually fails and shoots itself in the proverbial foot.

    If this is indeed 100% vibe coded, then there is some magic I would love to learn!

    • ogig 2 hours ago

      My last two projects have been 100% coded with Claude, and one of them has real complexity. I don't think there's any going back for me.

      • tossandthrow 15 minutes ago

        What is your secret sauce? How do you organize your project?

  • logicprog 5 hours ago

    It's interesting that Claude is able to effectively write Elixir, even if it isn't super idiomatic without established styles in the codebase, considering Elixir is a pretty niche and relatively recent language.

    What I'd really like to see, though, is experiments on whether you can few-shot prompt an AI to learn a new language in context with any level of success.

    • d3ckard 3 hours ago

      I would argue the effectiveness point.

      It's certainly helpful, but has a tendency to go for very non idiomatic patterns (like using exceptions for control flow).

      Plus, it has issues that I assume are an effect of reinforcement learning: it struggles with letting things crash, and tends to silence errors that should never fail silently.

    • majoe 3 hours ago

      I tried different LLMs with various languages so far: Python, C++, Julia, Elixir and JavaScript.

      The SOTA models do a great job with all of them, but if I had to rank their capabilities for each language, it would look like this:

      JavaScript, Julia > Elixir > Python > C++

      That's just a sample size of one, but I suspect that for all but the most esoteric programming languages there is more than enough code in the training data.

      • ogig 2 hours ago

        I've used CC with TypeScript, JavaScript and Python. IMO TypeScript gives the best results. Many times CC will be alerted by the TypeScript compile step and act on it, another useful layer in its context.

    • ch4s3 4 hours ago

      You can accurately describe elixir syntax in a few paragraphs, and the semantics are pretty straightforward. I’d imagine doing complex supervision trees falls flat.

    • dist-epoch 4 hours ago

      Unless that new language has truly esoteric concepts, it's trivial to pattern-match it to regular programming constructs (loops, functions, ...)

  • alecco 2 hours ago

    Async or mildly complex thread stuff is like kryptonite for LLMs.

    • catlifeonmars an hour ago

      Also for humans.

      • omnicognate 27 minutes ago

        Incompetent ones, sure.

        (Apologies for the snark, but the routine denial of what decent human programmers are very obviously capable of in order to claim equivalence to AI is getting quite annoying. Async and multithreaded code is definitely hard to write and reason about, but there are plenty of humans capable of doing it.)

  • calvinmorrison 28 minutes ago

    I don't know Erlang. My hobby LLM project is having it write a fully featured ERP in Erlang.

    An ERP is practically an OS.

    It now has:

    - pluggable modules with a core system
    - Users/Roles/ACLs/etc.
    - an event system (i.e. so we can roll up Sales Order journal entries into the G/L)
    - G/L, SO, AR, AP
    - rollback/retries on transactions

    I haven't written a line of code.

  • te_chris 2 hours ago

    The imperative thing is so frustrating. Even the latest models still write Elixir like a JS developer: checking for nils, maybe_do_blah helper functions everywhere, 30 lines when 8 would do.
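The pattern translates to any language; here is a sketch in Python (hypothetical `get_city` helpers) of the defensive nil-checking style versus the direct one the comment is asking for. In Elixir the direct version would lean on pattern matching or `with` instead:

```python
# Defensive, LLM-style: a nil-check at every step
def get_city_defensive(user):
    if user is None:
        return None
    address = user.get("address")
    if address is None:
        return None
    city = address.get("city")
    if city is None:
        return None
    return city

# Direct: let the data shape do the work
def get_city(user):
    return (user or {}).get("address", {}).get("city")

assert get_city({"address": {"city": "Lagos"}}) == "Lagos"
assert get_city({}) is None
```

Both return the same results on these inputs; the defensive version just spends four branches saying what one expression already says.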

  • deadbabe 2 hours ago

    Everyone always ends these articles with “I expect it will get better”

    What if it doesn't? What if LLMs stay at mostly the same level of usefulness they have now, but the costs continue to rise as the subsidization wears off?

    Is it still worth it? Maybe, but not worth abandoning having actual knowledge of what you’re doing.

    • solumunus 2 hours ago

      I expect the costs at source will go down even if model performance doesn’t improve much, and hopefully that will offset the unraveling of subsidisation. I’d be happy enough with that outcome, I don’t really need them to be any better although of course it would be nice. I would love for them to be faster and cheaper.

  • phplovesong 2 hours ago

    "It writes 100% of our code"

    - Silently closes the tab and makes a mental note to avoid the software in question at any cost.

    • Ronsenshi an hour ago

      You're not missing much. It seems to me like they wrote 150k lines of code for some glorified photo app with ChatGPT on the backend for image processing. Oh, and some note-taking, it seems.

      • timacles 25 minutes ago

        I await (and doubt) the day this produces something truly useful and not just generic, derivative functionality glued together.