
  • fuhsnn 2 hours ago

    Didn't see people mention this in the initial discussion, but despite not having internet access, the agents actually had access to the source code of GNU binutils (which includes an assembler, a linker, and readelf) and of many C compilers (SDCC, PCC, chibicc, cproc, etc.) in their working directory [1]. These are there to test compilation, but there's no way to prove that Claude didn't pull crucial info from these projects.

    I also found the compiler to bear an uncanny resemblance to chibicc. With 30-ish edge cases [2] yielding the same behavior, it's hard to believe chibicc had no influence on its algorithms.

    [1] https://github.com/anthropics/claudes-c-compiler/blob/6f1b99...

    [2] https://github.com/anthropics/claudes-c-compiler/issues/232

  • logicprog 2 hours ago

    It's really funny how the goalposts shift over time. Before this happened, they would all have been insisting that it was impossible for an agent swarm to do this. Now that it has happened, they're finding all the flaws with it, ignoring the trajectory of improvement over time, and insisting it isn't truly impressive just because it isn't perfect.

    Basically, every point they make feels like pointless nitpicking in order to be able to deny what's going on. It essentially feels like a cope.

    The argument that the model depended on extensive test suites and a custom harness doesn't really hold up to me, because that harness was just a simple, hacked-together version of concepts that people experimenting with Ralph loops, agent swarms, and agent orchestrators have been doing for a while now. It would be easy to build an out-of-the-box generalized solution for it, since very little of the actual harness was custom to the project itself.

    Similarly, the argument that this only worked because of extensive test suites, including torture tests, doesn't really make sense to me, because we have a long tradition of approaches that can provide that kind of test suite to specify any product you want to produce, from BDD to PBT to DST. The blog post explicitly acknowledges that part of the point of this is to show that the job of software engineers from now on might end up becoming specifying a problem sufficiently rather than directly writing the code that achieves it; and even that would vastly change the entire industry.
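
    To make that concrete, here's a minimal sketch of the property-based idea in Rust, using the proptest crate. Everything here is illustrative and my own invention, not anything from the actual project: the point is just that you specify behavior as "for all inputs, agree with a reference," and a compiler torture suite is basically that idea scaled up to whole programs.

        // Minimal sketch: property-based testing as specification.
        // Assumes proptest = "1" in Cargo.toml; names are hypothetical.
        use proptest::prelude::*;

        // Reference semantics: plain wrapping addition.
        fn reference_add(a: i32, b: i32) -> i32 {
            a.wrapping_add(b)
        }

        // Stand-in for "the system under test"
        // (e.g. code emitted by the compiler).
        fn system_under_test_add(a: i32, b: i32) -> i32 {
            a.wrapping_add(b)
        }

        proptest! {
            // The property: for all inputs, the system agrees
            // with the reference implementation.
            #[test]
            fn agrees_with_reference(a in any::<i32>(), b in any::<i32>()) {
                prop_assert_eq!(system_under_test_add(a, b),
                                reference_add(a, b));
            }
        }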

    Similarly, I find it very funny that even this article is forced to admit that the code is of pretty solid quality, even if it isn't as beautiful or elegant as something a Rust expert might write (and it's always easy to criticize code quality from the peanut gallery, so I take that point with a grain of salt).

    In a similar manner, I don't find it convincing to argue that this doesn't matter because things like TCC and GCC were in the model's training data. Previous C compilers that would be in its training data were implemented in C, almost certainly not Rust, and implementing almost anything in Rust that was originally written in C requires a substantially different architecture in the large, to account for lifetimes and borrow checking, and also, to maintain even basic Rust code quality and avoid unsafe, completely different idioms and approaches to algorithms in the small. I say this having written several tens of thousands of lines of Rust in the past. This means that, in my opinion, it's difficult to call anything these LLMs did mere retrieval and reorganization; the most you can plausibly say is that they picked up a few general approaches to compiler algorithms and structure, and an understanding of what a compiler is and generally how it should work, from the codebases in their training set. But you can't say that it's regurgitating or translating them directly. And at that point, that's the equivalent of someone who has taken courses on or read books about compilers producing a new compiler. It is still impressive.
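
    As a rough illustration of what I mean by different idioms (this is my own hypothetical sketch, not code from the compiler in question): where a C compiler like chibicc threads raw `struct Node *` pointers through its AST, idiomatic Rust pushes you toward owned enums and pattern matching, which changes how the surrounding algorithms have to be shaped.

        // Hypothetical sketch of an expression AST in idiomatic Rust.
        // Ownership is explicit: children are boxed, not raw pointers,
        // so the borrow checker rules out the shared mutable links that
        // C-style compiler code leans on.
        enum BinOp {
            Add,
            Mul,
        }

        enum Expr {
            Num(i64),
            Binary { op: BinOp, lhs: Box<Expr>, rhs: Box<Expr> },
        }

        // Evaluation by pattern matching rather than tag-switching
        // over a generic node struct.
        fn eval(e: &Expr) -> i64 {
            match e {
                Expr::Num(n) => *n,
                Expr::Binary { op, lhs, rhs } => {
                    let (l, r) = (eval(lhs), eval(rhs));
                    match op {
                        BinOp::Add => l + r,
                        BinOp::Mul => l * r,
                    }
                }
            }
        }

        fn main() {
            // (2 + 3) * 4 == 20
            let e = Expr::Binary {
                op: BinOp::Mul,
                lhs: Box::new(Expr::Binary {
                    op: BinOp::Add,
                    lhs: Box::new(Expr::Num(2)),
                    rhs: Box::new(Expr::Num(3)),
                }),
                rhs: Box::new(Expr::Num(4)),
            };
            assert_eq!(eval(&e), 20);
        }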

    Similarly, I find it very strange to argue that this isn't impressive because there's a course much cheaper than $20,000 that teaches you how to write a basic C compiler. The whole reason this is impressive is that it's something computers could not do autonomously before, and now they can. The price of doing it with a computer compared to having a human do it isn't really relevant yet, and the price will come down[0]. I also think it's very likely that the basic C compiler you'd get from a course like the one they linked would not actually be able to compile SQLite, DOOM, or Linux kernel 6.9 if you put it to the test.

    Also, it's really funny to me that they complain that this compiler project didn't also implement a linker and assembler. The entire point was to implement a compiler; that was the project under discussion. The fact that it uses an external linker and assembler is not a point against it. It's a complete non sequitur.