Fil-C: A memory-safe C implementation

(lwn.net)

113 points | by chmaynard 8 hours ago

28 comments

  • mbrock 3 hours ago

    I'm working on packaging Fil-C for Nix, as well as integrating Fil-C as a toolchain in Nix so you can build any Nix package with Fil-C.

    https://github.com/mbrock/filnix

    It's working. It builds tmux, nethack, coreutils, Perl, Tcl, Lua, SQLite, and a bunch of other stuff.

    Binary cache on https://filc.cachix.org so you don't have to wait 40 minutes for the Clang fork to build.

    If you have Nix with flakes on a 64-bit Linux computer, you can run

        nix run github:mbrock/filnix#nethack
    
    right now!
    • kragen an hour ago

      That's very exciting! Thank you!

  • kragen 4 hours ago

    Either Fil-C or a different implementation of the same idea seems essential to me. A great deal of software has been written in C, and without some way of running it, we lose access to that intellectual heritage. But pervasive security vulnerabilities mean that the traditional "YOLO" approach to C compilation is a bad idea for software that has to handle untrusted input, such as Web browsing or email.

    Pizlo seems to have found an astonishingly cheap way to do the necessary pointer checking, which hopefully I will be able to understand after more study. (The part I'm still confused about is how InvisiCaps work with memcpy.)

    tialaramex points out that we shouldn't expect C programmers to be excited about Fil-C. The point tialaramex mentions is "DWIM", like, accessing random memory and executing in constant time, but I think most C programmers won't be willing to take a 4× performance hit. After all, if they wanted to be using a slow language, they wouldn't be writing their code in C. But I think that's the wrong place to look for interest: Fil-C's target audience is users of C programs, not authors of C programs. We want the benefits of security and continued usage of existing working codebases, without having to pay the cost to rewrite everything in Rust or TypeScript or whatever. And for many of us, much of the time, the performance hit may be acceptable.

    • nielsbot 4 hours ago

      I like to share this every time there's a post about memory safe C:

      Apple has a memory-safer C compiler/variant they use to compile their boot loaders:

      https://saaramar.github.io/iBoot_firebloom/

      • pizlonator 2 hours ago

        That was my idea and I wrote a good chunk of the compiler and runtime.

        • kragen 2 hours ago

          (For those who didn't make the connection, pizlonator also wrote Fil-C.)

      • EPWN3D 38 minutes ago

        You don't even need to reverse it. It's in the public clang, and I'm working on helping my team adopt it in some test cases.

        And it's not just the bounds-checking that's great -- it makes a bunch of C anti-patterns much harder, and it makes you think a lot harder about pointer ownership and usage. Really a great addition to the language, and it's source-compatible with empty macro definitions (with two exceptions).

        • kragen 28 minutes ago

          Interesting! How do you get started?

      • kragen 4 hours ago

        Yeah, fat pointers are definitely a viable approach, but a lot of the existing C code that is the main argument for Fil-C assumes it can put a pointer in a long. (Most of the C code that assumed you could put it in an int has been flushed out by now, but that was a big problem when the Alpha came out.) I'm guessing that the amount of existing C code in Apple's bootloader is minimal, maybe 1000 lines, not the billions of lines you can compile with Fil-C.
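To make the pointer-in-a-long problem concrete, here is a minimal C sketch of a bounds-carrying fat pointer. The `fat_ptr` type and helper names are hypothetical and chosen for illustration; this is not the representation Firebloom or Fil-C actually uses. It shows why legacy code that stashes a pointer in a `long` (which works on LP64 Linux, where both are 64 bits) has no room for the extra bounds words.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: base and length of the object plus an offset.
   Illustrative only; not Firebloom's or Fil-C's real layout. */
typedef struct {
    char     *base;  /* start of the object */
    size_t    len;   /* object size in bytes */
    ptrdiff_t off;   /* current position; may wander out of range */
} fat_ptr;

fat_ptr fat_from_array(char *base, size_t len) {
    fat_ptr p = { base, len, 0 };
    return p;
}

/* Pointer arithmetic only changes the offset; the bounds travel along. */
fat_ptr fat_add(fat_ptr p, ptrdiff_t n) {
    p.off += n;
    return p;
}

int fat_in_bounds(fat_ptr p) {
    return p.off >= 0 && (size_t)p.off < p.len;
}

/* Every dereference is checked. assert() stands in for the unconditional
   trap a real implementation would use (assert vanishes under NDEBUG). */
char fat_load(fat_ptr p) {
    assert(fat_in_bounds(p));
    return p.base[p.off];
}

/* The legacy idiom: stash a pointer in a long. This round-trips on LP64,
   where both are 64 bits, but a three-word fat_ptr cannot fit. */
long stash_in_long(char *p) { return (long)p; }
char *unstash_from_long(long v) { return (char *)v; }
```

The size check `sizeof(fat_ptr) > sizeof(long)` is the whole compatibility problem in one line: code that round-trips pointers through `long` either truncates the bounds or needs the ABI changed.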

        • matthewfcarlson 3 hours ago

          You’re off by a few orders of magnitude. I’ll grant you that what counts as the bootloader is a very complex question. Even if you scope it to just “what is the code physically etched into the chip as the mask ROM” (secureROM), you’re talking hundreds of thousands of lines. If you’re talking about all the code that runs before the kernel starts executing, you’re talking hundreds of millions.

          • kragen 3 hours ago

            No, I was only talking about the pre-existing C code that wasn't written for the bootloader, which therefore might have incompatibilities with fat pointers you had to hunt down and fix.

            Also I'm really skeptical about your "hundreds of millions" number, even if we're talking about all the code that runs before the kernel starts. How do you figure? The entire Linux kernel doesn't contain a hundred million lines of code, and that includes all the drivers for network cards, SCSI controllers, and multiport serial boards that nobody's made in 30 years, plus ports to Alpha, HP PA-RISC, Loongson, Motorola 68000, and another half-dozen architectures. All of that contains maybe 30 million lines. glibc is half a million. Firefox 140.4.0esr is 33 million. You're saying that the bootloader is six times the size of Firefox?

            Are you really suggesting that tens of gigabytes of source code are compiled into the bootloader? That would make the bootloader at least a few hundred megabytes of executable code, probably gigabytes, wouldn't it?
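The size comparison in this comment can be checked with rough numbers. Taking 200 million lines as the low end of "hundreds of millions", and assuming for illustration an average of about 50 bytes of source text per line (an assumption, not a measured figure):

```latex
\frac{2\times10^{8}\ \text{lines}}{3.3\times10^{7}\ \text{lines (Firefox ESR)}} \approx 6,
\qquad
2\times10^{8}\ \text{lines}\times 50\ \tfrac{\text{bytes}}{\text{line}} = 10^{10}\ \text{bytes} \approx 10\ \text{GB}
```

At the upper end of "hundreds of millions" the same assumption gives tens of gigabytes of source.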

        • ummonk 3 hours ago

          Couldn’t one just make long bigger, then, so that it matches?

          • kragen 2 hours ago

            Maybe so; I haven't tried. Probably a lot less code depends on unsigned long wrapping at 2⁶⁴ than used to depend on unsigned int wrapping at 2¹⁶, and we got past that. But stability standards were lower then. Any code that runs on both 32-bit and 64-bit LP64 systems can't be too dependent on the exact sizeof long, and sizeof long already isn't sizeof int the way it was on 32-bit platforms.
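As an illustration of the kind of code that would notice a wider `long`, here is a hypothetical hash-mixing helper that relies on `unsigned long` wrapping modulo 2⁶⁴ (the constant is the familiar 64-bit golden-ratio multiplier). The function names are made up for this sketch. The `static_assert` states the width assumption explicitly, and the `uint64_t` variant shows the width-independent spelling.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* This code depends on unsigned long being exactly 64 bits: the multiply is
   meant to wrap modulo 2^64. The static_assert makes the assumption loud, so
   a platform with a wider long fails to compile instead of silently
   producing different hashes. */
static_assert(sizeof(unsigned long) * CHAR_BIT == 64,
              "this code assumes a 64-bit unsigned long");

unsigned long mix_ulong(unsigned long x) {
    return x * 0x9E3779B97F4A7C15UL;
}

/* Width-independent version: uint64_t always wraps modulo 2^64. */
uint64_t mix_u64(uint64_t x) {
    return x * UINT64_C(0x9E3779B97F4A7C15);
}
```

On an LP64 system the two functions agree; if `long` were widened to hold a fat pointer, only the `uint64_t` version would keep its meaning.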

      • conradev 3 hours ago

        and the author of Fil-C worked on that!

        • kragen 3 hours ago

          Oh, somehow I missed that connection!

      • astrange 4 hours ago

        A descendant of this is in clang as -fbounds-safety.
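For readers wondering what adopting `-fbounds-safety` looks like, here is a minimal sketch, assuming Clang's `__counted_by` annotation from `<ptrcheck.h>` and a `__has_feature(bounds_safety)` check. The `COUNTED_BY` wrapper and `sum_ints` are hypothetical names. The wrapper expands to nothing on other compilers, which is the empty-macro source compatibility EPWN3D mentions earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* With clang -fbounds-safety, <ptrcheck.h> supplies __counted_by.
   Everywhere else the hypothetical COUNTED_BY wrapper expands to nothing,
   so the same source still builds with any C compiler. */
#if defined(__has_feature)
#  if __has_feature(bounds_safety)
#    include <ptrcheck.h>
#    define COUNTED_BY(n) __counted_by(n)
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(n)
#endif

/* Under -fbounds-safety the annotation tells the compiler that `values`
   points to at least `count` ints, so out-of-range indexing traps. */
long sum_ints(size_t count, const int *COUNTED_BY(count) values) {
    assert(count == 0 || values != NULL);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

The count parameter is placed before the pointer so the annotation refers to an already-declared name.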

    • TuxSH 3 hours ago

      Also this is _de facto_ limited to userspace applications on the mainstream OSes, if my understanding is correct.

      Reading the Fil-C website's "InvisiCaps by example" page, I see that "Laundering Integers As Pointers" is disallowed. This essentially disqualifies Fil-C from low-level work, which makes up a substantial part of C programs.

      (int2ptr for MMIO or pre-allocated memory is UB in theory, but fine in practice as long as you don't otherwise break aliasing rules (and lifetime rules in C++), since the compiler will fail to track provenance at least once.)

      But that isn't really what Fil-C is aimed at - the value is, as you implied, in hardening userspace applications.

      • pizlonator 2 hours ago

        It’s not so fundamental of a limitation.

        Fil-C already allows memory mapped I/O in the form of mmap.

        The only thing missing that is needed for kernel level MMIO is a way to forge a capability. I don’t allow that right now, but that’s mostly a policy decision. It also falls out from the fact that InvisiCaps optimize the lower bound by having it double as a pointer to the top of the capability. That’s also not fundamental; it’s an implementation choice.

        It’s true that InvisiCaps will always disallow int to ptr casts, in the sense that you get a pointer with no capability. You’d want MMIO code to have some intrinsic like `zunsafe_forge_ptr` that clearly calls out what’s happening and then you’d use that wherever you define your memory mapped registers.
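A minimal sketch of that discipline in ordinary C follows. `FORGE_MMIO_PTR` is a hypothetical stand-in for an intrinsic like the `zunsafe_forge_ptr` described above (Fil-C does not provide one today), and the register layout is also made up. The idea is that the single integer-to-pointer cast lives in one clearly named macro, and everything else goes through typed register accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the one place an integer address becomes a pointer. Under
   Fil-C this would need a capability-forging intrinsic; in plain C it is
   just a cast. */
#define FORGE_MMIO_PTR(type, addr) ((type *)(uintptr_t)(addr))

/* Hypothetical register block for a UART-like device. */
typedef struct {
    volatile uint32_t data;
    volatile uint32_t status;
} uart_regs;

/* All register access goes through this typed accessor. */
static uart_regs *uart_at(uintptr_t phys_addr) {
    assert(phys_addr % _Alignof(uart_regs) == 0);  /* registers must be aligned */
    return FORGE_MMIO_PTR(uart_regs, phys_addr);
}

void uart_put(uintptr_t phys_addr, uint32_t byte) {
    uart_at(phys_addr)->data = byte;
}

uint32_t uart_status(uintptr_t phys_addr) {
    return uart_at(phys_addr)->status;
}
```

Under a normal compiler a local `uart_regs` can stand in for the device when testing, since the forge is just a cast. Under Fil-C as it stands, the same cast would yield a pointer with no capability, so the store would trap rather than reach the register.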

      • mbrock 3 hours ago

        Check out this document to see how the Fil-C ports of Python and Perl and so on work:

        https://github.com/mbrock/filnix/blob/main/ports/analysis.md

        This is still within the userspace application realm but it's good to know that Fil-C does have explicit capability-preserving operations (`zxorptr`, `zretagptr`, etc.) to do e.g. pointer tagging, and special support for mapping pointers to integer table indices and back (`zptrtable`, etc.).
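For context, this is the classic low-bit pointer tagging those operations exist to support, written the traditional way by round-tripping through `uintptr_t`. It is a generic sketch with hypothetical names, not code from the Python or Perl ports, and it is precisely the integer-to-pointer laundering that Fil-C refuses. The Fil-C ports use the capability-preserving operations listed above instead; their signatures are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Objects are 8-byte aligned, so the low 3 address bits are free for a tag. */
#define TAG_MASK ((uintptr_t)7)

typedef struct {
    _Alignas(8) long value;
} boxed_int;

/* Pack a small tag into the low bits of an aligned pointer. */
static inline void *tag_ptr(void *p, unsigned tag) {
    assert(((uintptr_t)p & TAG_MASK) == 0 && tag <= TAG_MASK);
    return (void *)((uintptr_t)p | tag);
}

/* Read the tag back out. */
static inline unsigned ptr_tag(const void *p) {
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Clear the tag bits to recover the original pointer. */
static inline void *untag_ptr(void *p) {
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

With an ordinary compiler `untag_ptr(tag_ptr(p, 3))` gives back a usable `p`; under Fil-C the pointer rebuilt from an integer would have no capability, which is why explicit tagging primitives are needed.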

      • kragen 3 hours ago

        Yes, I think that's reasonable. I imagine you wouldn't have to extend Fil-C very much to sneak some memory-mapped I/O addresses into your program, but maybe having the garbage collector pause the program in the middle of an interrupt handler would have other bad effects. Like, if you were generating a video signal, you'd surely get display glitches.

  • gnabgib 7 hours ago

    No discussion, but it was on the front page just last week (31 points): https://news.ycombinator.com/item?id=45655519

    Previous discussion:

    2025 Safepoints and Fil-C (87 points, 1 month ago, 44 comments) https://news.ycombinator.com/item?id=45258029

    2025 Fil's Unbelievable Garbage Collector (603 points, 2 months ago, 281 comments) https://news.ycombinator.com/item?id=45133938

    2024 The Fil-C Manifesto: Garbage In, Memory Safety Out (13 points, 17 comments) https://news.ycombinator.com/item?id=39449500

  • nextaccountic 2 hours ago

    Somewhat related: the Safe C++ proposal is not being continued.

    https://news.ycombinator.com/item?id=45234460

  • synergy20 5 hours ago

    Posted multiple times; x86-only last time I checked.

    • pizlonator 2 hours ago

      Yeah because I’m limiting my test matrix.

      There’s nothing about how Fil-C is designed that constrains it to x86_64. It doesn’t strongly rely on x86’s memory model. It doesn’t strongly rely on 64-bit.

      I’m focusing on one OS and arch until I have more contributors and so more bandwidth to track bugs across a wider set of platforms.

    • lambdaone 4 hours ago

      All the more reason to make it portable. I wonder if this can be implemented via LLVM?

      • kragen 4 hours ago

        It is implemented via LLVM.

  • dmitrygr 40 minutes ago

    TLDR: 4x slowdown in the normal case

    > the performance overhead of this approach for most programs makes them run about four times more slowly

    • pizlonator 14 minutes ago

      > TLDR: 4x slowdown in the normal case

      4x slower isn't the normal case. 4x is at the upper end of the overheads you'll see.