This is a really great test for vibe coding. It isn't easy: it took me several hours to pass. Vibe coding the results is ... not exactly faster. I kept having to remind it to output logs (I'm just doing this in chat and manually copy/pasting the code), it kept getting hung up on 'the maximum wait time' exactly equaling the challenge, etc. Opus was able to generate a passing implementation up to level 7 on the first try but can't seem to pass level 12. Sonnet had to iterate on every level up to level 5, and couldn't pass that level.
This kind of stuff can be a great LLM benchmark, as Opus basically screwed it up and created a monstrosity of a solution on the first try.
AKA the hard drive scheduling game. Takes me back to my first algorithms class in school thirty-five years ago.
Fun!
Reminds me that one of my favourite exercises in TLA+ is to design an elevator call system.