Perhaps Rust Needs "Defer"

(gaultier.github.io)

47 points | by broken_broken_ a day ago ago

50 comments

  • pwdisswordfishz a day ago

    > So now I am confused, am I allowed to free() the Vec's pointer directly or not?

    No, you are not; simple as that. Miri is right. Rust using malloc/free behind the scenes is an internal implementation detail you are not supposed to rely on. Rust used to use a completely different memory allocator, and this code would have crashed at runtime if it were still the case. Since when is undocumented information obtained from strace a stable API?

    It's not like you can rely on Rust references and C pointers being identical in the ABI either, but the sample in the post blithely conflates them.

    > It might be a bit surprising to a pure Rust developer given the Vec guarantees, but since the C side could pass anything, we must be defensive.

    This is just masking bugs that otherwise could have been caught by sanitizers. Better to leave it out.

    • JJJollyjim a day ago

      It is in fact documented that you can't do this:

      "Currently the default global allocator is unspecified. Libraries, however, like cdylibs and staticlibs are guaranteed to use the System by default.", however:

      "[std::alloc::System] is based on malloc on Unix platforms and HeapAlloc on Windows, plus related functions. However, it is not valid to mix use of the backing system allocator with System, as this implementation may include extra work, such as to serve alignment requests greater than the alignment provided directly by the backing system allocator."

      https://doc.rust-lang.org/std/alloc/index.html https://doc.rust-lang.org/std/alloc/struct.System.html

      • astrange a day ago

        > such as to serve alignment requests greater than the alignment provided directly by the backing system allocator

        Surely the system allocator provides memalign() or similar? Does Windows not have one of those?

        • jsheard a day ago

          Kind of but not really, the Windows heap doesn't support arbitrary alignments natively, so the aligned malloc function it provides is implemented as a wrapper around the native heap which over-allocates and aligns within the allocated block. The pointer you get back isn't the start of the allocation in the native heap, so you can't pass it to the standard free() without everything exploding.

          I don't think fixing that mess was ever a priority for Microsoft because it's mainly an issue for C, and their focus has long been on C++ instead. C++ new/delete knows the alignment of the type being allocated for so it can dispatch to the appropriate path automatically with no overhead in the common <=16 byte case.

        • jcelerier a day ago

          On macOS, aligned_alloc works starting from macOS 10.15.

    • josephg a day ago

      Yep. The right philosophy is to always put the allocation and deallocation code in the same language. If you create a struct with C, also make a destructor in C and call that from rust to destroy your object. (Usually by putting an ffi call in your drop impl).

      And the same is true in reverse: rust struct might be temporarily owned and used by C code, but it should always be destroyed from rust.

      • flohofwoe a day ago

        It's also a bad idea within the same language.

        A C library returning a pointer to allocated memory and then expecting the caller to free that memory with a function outside the library (like calling stdlib free()) is just bad API design (because you can't and shouldn't need to know whether the library is actually using the stdlib alloc functions under the hood - or whether the library has been linked with the same C stdlib than your own code - for instance when the library resides in a DLL, or the library might decide to bypass malloc and directly use lower-level OS calls for allocating memory).

        If you have a 'create' function in a C library, also always have a matching 'destroy' function.

        On top of that it's also usually a good idea to let the library user override things like memory allocation or file IO functions.

        ...and of course 'general purpose' global allocators are a bad idea to begin with :)

        • filmor a day ago

          That is exactly what what the parent post proposed.

    • CryZe a day ago

      > It's not like you can rely on Rust references and C pointers being identical in the ABI either.

      You absolutely can rely on that: https://doc.rust-lang.org/reference/type-layout.html#pointer...

      In fact, it's almost even best practice to do so, because it reduces the unsafe needed on Rust's side. E.g. if you want to pass a nullable pointer to T, declare it as e.g. a parameter x: Option<&T> on the Rust side and now can you do fully safe pattern matching as usual to handle the null case.

      Want to use some Rust type (not even repr(C)) from C? Just do something like this:

      #[no_mangle] pub extern "C" fn Foo_new() -> Box<Foo> { Box::new(Foo::new()) }

      #[no_mangle] pub extern "C" fn Foo_free(_: Box<Foo>) { /* automatic cleanup */ }

      #[no_mangle] pub extern "C" fn Foo_do_something(this: &mut Foo) { this.do_something() }

      all fully safe. Unfortunately for the OP, Vec<_> isn't FFI safe, so that's still one of those cases that are on the rough side.

      • pwdisswordfishz a day ago

        > Pointers and references have the same layout.

        > The layout of a type is its size, alignment, and the relative offsets of its fields.

        So “layout” only constrains the memory addresses where a value and its constituents are stored; it does not cover niches. Pointers are allowed to be null, references are not. There is talk of making it illegal to have a reference be unaligned, or even point to very low addresses: <https://github.com/rust-lang/rfcs/pull/3204>. At one point, there was even talk of certain kinds of references not even being stored as memory addresses at all: <https://github.com/rust-lang/rfcs/pull/2040>. And Box<_> is not #[repr(transparent)] either. To put it more generally, the only semantics of a Box<_> or a reference is that it grants you access to a value of a given type and is inter-convertible with a pointer. Only *const _ and *mut _ have a guaranteed ABI.

        Just because you write fewer “unsafe” keywords does not mean your code is more safe.

  • haileys a day ago

    The way to do this in idiomatic Rust is to make a wrapper type that implements drop:

        struct MyForeignPtr(*mut c_void);
    
        impl Drop for MyForeignPtr {
            fn drop(&mut self) {
                unsafe { my_free_func(self.0); }
            }
        }
    
    Then wrap the foreign pointer with MyForeignPtr as soon as it crosses the FFI boundary into your Rust code, and only ever access the raw pointer via this wrapper object. Don't pass the raw pointer around.
    • aapoalas a day ago

      In this case they seem to have Rust on both sides of the FFI API and seemingly want to avoid heap allocating the Vec's static parts, which makes sense.

      So, an extended version if your solution would be to initialize the static parts on the stack using `MaybeUninit`, and `assume_init` on that after the call to get an `OwningArrayC<T>` out, which then can have a Drop impl defined for it.

    • drdude a day ago

      This is a great advice, thank you. I am learning Rust and this definitely gives me good pointers (no pun) how I should idiomatically handle them.

      EDIT: can this be done/automated using macros?

      • skavi a day ago

        can’t really be handled nicely by macros outside of the case where the drop impl doesn’t need any data from the scope.

  • klauserc a day ago

    Would `Vec::into_boxed_slice` [1] be the answer here? It gives you a `Box<[Foo]>`, which doesn't have a capacity (it still knows its own length).

    1: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.int...

    • aapoalas a day ago

      Yup, it definitely would.

  • pornel a day ago

    Fun fact: Box<T> is allowed in C FFI in arguments and return types. From C's perspective it's a non-NULL pointer to T, but on the Rust side it gets memory management like a native Rust type.

    Option<Box<T>> is allowed too, and it's a nullable pointer. Similarly &mut T is supported.

    Using C-compatible Rust types in FFI function declarations can remove a lot of boilerplate. Unfortunately, Vec and slices aren't one of them.

  • jclulow a day ago

    > Let's ignore for now that this will surprise every C developer out there that have been doing if (NULL != ptr) free(ptr) for 50 years now.

    If you've been doing C for five decades, it's a shame not to have noticed that it's totally fine to pass a NULL pointer to free().

    FWIW, I don't otherwise agreed with the thesis. I've written probably ten FFI wrappers around C libraries now, and in every case I was able to store C pointers in some Rust struct or other, where I could free them in a Drop implementation.

    I also think it's not actually that unusual for C allocators (other than the truly ancient malloc(3C) family) to require you to pass the allocation size back to the free routine. Often this size is static, so you just use sizeof on the thing you're freeing, but when it's not, you keep track of it yourself. This avoids the need for the allocator to do something like intersperse capacity hints before the pointers it returns.

    • Traubenfuchs a day ago

      If it's really fine, shouldn't the compiler optimize that if away?

      • cozzyd 14 hours ago

        It has different performance characteristics

      • lifthrasiir a day ago

        Of course (there are several GCC attributes for that), but it is not really like that an additional check improves readability anyway.

    • a day ago
      [deleted]
    • dathinab a day ago

      > Let's ignore for now that this will surprise every C developer out there that have been doing if (NULL != ptr) free(ptr) for 50 years now.

      if you did 50 years of development and haven't realized that in projects crossing language (or just separately complied C libs) you have to free things where you create them and can't use the many many libraries which do require you to use their own free/destruction functions and don't even have understand that free is flat and in turn you need to also "destruct" each item in a struct manually and that if you don't have a custom function for this it can easily become a huge bug prone mess

      .. then I'm really not sure how this happened tbh.

      but if you are not a senior C/C++/FFI developer maybe not a C/C++/FFI developer at all it's not hard to see how you might have missed this things tbh.

    • SpaceNoodled a day ago

      Not all C runtimes allow you to deallocate a null ppinter without consequence.

      • lifthrasiir a day ago

        Such C runtime is not standard-compliant [1]:

        > The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs.

        [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf#p...

        • dathinab a day ago

          doesn't matter

          50 years ago non standard compliance was the norm, even today it is quite common (to use non compliant extensions)

          and there is also stuff like caches, arena allocations, jemalloc etc. which might not be linked against libc/free and might require manual free function usage, external APIs providing a their own free/destruction functions is really really normal

          • lifthrasiir a day ago

            Accepting a null pointer in free is a very easy part of the conformance and I am yet to see a libc implementation that doesn't so far, so your statement should be pretty much outdated even if it were true. And since malloc-free interface is so popular in C, other allocators are also commonly modelled after that (plus their own APIs) anyway; jemalloc for example has an usual free with the standard interface.

      • magicalhippo a day ago

        Even the C89 draft[1] says passing null should do nothing, and as far as I can tell it's still[2] the case.

        Which runtimes would this be?

        [1]: http://jfxpt.com/library/c89-draft.html#4.10.3.2

        [2]: https://en.cppreference.com/w/c/memory/free

      • jabl a day ago

        That was a long time ago. A decade or so ago Jim Meyering did experiments with a lot of allocators, and subsequently went on a spree removing NULL checks before free() in a lot free software toolchain projects (gcc, gdb, binutils, etc).

      • jcelerier a day ago

        If free(NULL) doesn't work it's not the C programming language but something else

      • epcoa a day ago

        Life is too short to deal with such brokenness then.

        Seriously, those jobs typically don't pay more than peanuts anyhow.

      • a day ago
        [deleted]
  • namjh a day ago

    Anyone who needs to handle FFI in Rust, should read the FFI chapter in Rustonomicon: https://doc.rust-lang.org/nomicon/ffi.html

    Unsafe Rust is indeed very hard to write correctly. Rustonomicon is a good start to learn unsafe Rust.

  • aapoalas a day ago

    The article only shows Rust code calling the FFI API but suggests that C/C++ might also be used to call the API.

    In these cases I can imagine the caller passing in a pointer/reference to uninitialised stack memory, which is also UB in the last version if the allocating code! A `&mut T` must always point to a valid `T` and must not point to uninitialised memory.

    It seems to me like it'd be best to take a `&mut MaybeUninit<T>` parameter instead, and write through that. A further upside is that now if the caller _is_ Rust code, you can use MaybeUninit to reserve the stack space for the `OwningArrayC<T>` and then after the FFI call you can use `assume_init` to move an owned, initialised `OwningArrayC<T>` out of the `MaybeUninit` and get all of Rust's usual automatic `Drop` guarantees: This is the defer you wanted all along.

  • John23832 a day ago

    Implement Drop on a custom guard type. Boom. There you go.

  • jtrueb a day ago

    I would totally use a Swift-style concise defer. Writing the idiomatic wrapper struct and implementing Drop is obtuse and repetitive. Both wrapper structs and the scopeguard macro approach have to be wasting compilation time.

    I have a lot of FFI code paths that end up easier to understand if you use the C-style instead of the Rusty approach.

    Readability and comp time should be important even in FFI code paths

  • marcodiego a day ago

    Never used it, but I follow wg14, ISO's workgroup that maintains the C programming language specification. From there, it looks like defer is one of the most important improvements since the go programming language and I've seen a good number of examples were it prevents mistakes, makes the code more readable and avoids use of goto.

    I can only hope it 'infects' other languages.

    • iainmerrick a day ago

      But Go's defer operates at function scope, which strikes me as the same kind of unfortunate mistake as JavaScript's "var" hoisting.

      What we want everywhere is block-scoped defer, right?

  • lalaithion a day ago

    Instead of passing around the capacity in a struct or as an extra out parameter until you call free, could you instead branch on capacity == 0 in get_foos and set *out_foos to a null pointer? Just because the vector struct never has a null pointer doesn’t mean you can’t use the null pointer in your own API.

  • GrantMoyer a day ago

    This only addresses a small point in the article, but you can shrink capacity to size before passing a Vec to C, then assume capacity = size when you need to free the Vec.

  • scotty79 a day ago

    Why would anyone expect that they can free in one language something that was allocated in another? The allocator might work completely differently and require completely different actions to free.

    • jeroenhd a day ago

      If you're writing any other language to comply with old C libraries, calling malloc()/free() is usually fine. Most C software uses malloc() on Linux, at least for the external API surface.

      That only works when calling into C code, and even then it assumes the C code implements allocation using malloc()/free(), with proxies if alternative memory allocators are used.

      This behaviour can't be relied upon, but for many programs glueing themselves together with C libraries that's not stopping anyone. This is why I'm not a big fan of sharing pointers between languages (or even libraries, to be honest); I'd much prefer intermediate identifiers (file descriptors, Windows HANDLEs) to track instances that pass in and out of a library, although sometimes those cause too much overhead.

      • magicalhippo a day ago

        I've long had the strong belief that allocation and deallocation should be done by the same party.

        Either the caller is responsible and the function just uses the provided buffers/structures, or there is a clear convention that the caller has to call a cleanup function on the returned buffers/structures.

        Mixing allocation and deallocation responsibilities between caller and callee almost always leads to pain and sorrow, in my experience.

        Of course, if you're using a shared library or similar which can be written in something else entirely, mixing is a complete no-go as GP mentions.

      • plorkyeran a day ago

        Calling free() on memory allocated by another library isn't fine even if that library promises that it allocated the memory with malloc(). You don't have to do anything particularly esoteric to be a problem; on Windows it's as simple as building a dynamic library with /MT (statically link the C runtime).

  • xanathar a day ago

    1) Do not free memory allocated in one language (even more: in one library unless explicitly documented so) into another. Rust can use a custom allocator, and what it uses by default is an implementation detail that can change at a blink.

    2) Do not use Vec<T> to pass arrays to C. Use boxed slices. Do not even try to allocate a Vec and free a Box... how can it even work?!

    3) free(ptr) can be called even in ptr is NULL

    4) ... I frankly stopped reading. I will never know if Rust actually needs Go's defer or not.

  • dathinab a day ago

    > libc::free > Hmm...ok...Well that's a bit weird,

    It _really_ isn't, it's actually exactly how C (or C++) works if you have library allocating something for you you also need to use that library to free it as especially in context of linked libraries you can't be sure how something was allocated, if it used some arena, if maybe some of you libraries use jemalloc and others do not etc. So it's IMHO a very basic 101 of using external C/C++ libraries fact fully unrelated to rust (through I guess it is common for this to be not thought well).

    Also it's normal even if everything uses the same alloc because of the following point:

    > So now I am confused, am I allowed to free() the Vec<T>'s pointer directly or not?

    no and again not rust specific, free is always a flat freeing `Vec<T>'s` aren't guaranteed to be flat (depending on T), and even if, some languages have small vec optimizations (through rust I thing guarantees that it's not done with `Vec` even in the future for FFI compatibility reasons)

    so the get to go solution for most FFI languages boundaries (not rust specific) is you create "C external" (here rust) types in their language, hand out pointer which sometimes are opaque (C doesn't know the layout) and then _hand them back for cleanup_ cleaning them up in their language.

    i.e. you would have e.g. a `drop_vec_u8` extern C function which just does "create vec from ptr and drop it" (which should get compiled to just a free in case of e.g. `Vec<u8>` but will also properly work for `Vec<MyComplexType>`.

    > Box::from_raw(foos);

    :wut_emoji:??

    in many languages memory objects are tagged and treating one type as another in context of allocations is always a hazard, this can even happen to you in C/C++ in some cases (e.g. certain arena allocators)

    again this is more a missing knowledge in context of cross language C FFI in general then rust specific (maybe someone should write a "Generic Cross Language C FFI" knowledge web book, I mean while IMHO it is basic/foundational knowledge it is very often not thought well at all)

    > OwningArrayC > defer!{ super::MYLIB_free_foos(&mut foos); }

    the issue here isn't rust missing defer or goto or the borrow checker, but trying to write C in rust while OwningArrayC as used in the blog is a overlap of anti-patterns wanting to use but not use rust memory management at the same time in a inconsistent way

    If you want to "free something except not if it has been moved" rust has a mechanic for it: `Drop`. I.e. the most fundamental parts of rust (memory) resource management.

    If you want to attach drop behavior to an existing type there is a well known pattern called drop guard, i.e. a wrapper type impl Drop i.e. `struct Guard(OwnedArrayC); impl Drop for Guard {...} maybe also impl DerefMut for Guard`. (Or `Guard(Option<..>)` or `Guard(&mut ...)` etc. depending on needs, like e.g. wanting to be able to move it out conveniently).

    In rust it is a huge anti pattern to have a guard for a resource and not needing to access the resource through the guard (through you will have to do it sometimes) as it often conflicts with borrow checker and for RAII like languages in general is more error prone. Which is also why `scopeguard` provides a guard which wrapps the data you need to cleanup. That is if you use `scopeguard::guard` and similar instead of `scopeguard::defer!` macro which is for convenience when the cleanup is on global state. I.e. you can use `guard(foos, |foos| super::MYLIB_free_foos(&mut foos))` instead of deferr and it would work just fin.

    Through also there is a design issue with super::MYLIB_free_foos(&mut foos) itself. If you want `OwningArrayC` to actually (in rust terms) own the array then passing `&mut foos` is a problem as after the function returns you still have foos with a dangling pointer. So again it shows that there is a the way `OwningArrayC` is done is like trying to both use and not use rusts memory management mechanics at the same time (also who owns the allocation of OwningArrayC itself is not clear in this APIs).

    I can give following recommendations (outside of using `guard`):

    - if Vec doesn't get modified use `Box<[T]>` instead

    - if vec is always accessed through rust consider passing a `Box<Vec<T>>` around instead and always converting to/from `Box<Vec<T>>`/`&Vec<T>`/`&mut Vec<T>`, Box/&/&mut have some defactor memory repr compatibilities with pointer so you can directly place them in a ffi boundary (I think it's guaranteed for Box/&/&mut T and de-facto for `Option<Box/&/&mut T>` (nullpointer == None)

    - if that is performance wise non desirable and you can't pass something like `OnwingArrayC` by value either specify that the caller always should (stack) allocate the `OnwingArrayC` itself then only use `OnwingArrayC` at the boundary i.e. directly convert it to `Vec<T>` as needed (through this can easily be less clear about `Vec<T>`, and `&mut Vec<T>` and `&mut [T]` dereferenced)

    - In general if `OwningArrayC` is just for passing parameter bundles with a convention of it always being stack allocated by the caller then you also really should only use it for the transfer of the parameters and not automatic resource management, i.e. you should directly convert it to `Vec` at the boundary (and maybe in some edge cases use scopeguard::guard, but then converting it to a Vec is likely faster/easier to do). Also specify exactly what you do with the callee owned pointer in `OwningArrayC` i.e. do we always treat it ass dangling even if there are errors, do we set it to empty vec /no capacity as part of conversion to Vec and it's only moved if that was done etc. Also write a `From<&mut OwningArrayC> for Vec` impl, I recommend setting cap+len to zero in it).

    And yes FFI across languages is _always_ hard, good teaching material often missing and in Rust can be even harder as you have to comply with C soundness on one side and Rust soundness on the other (but I mean also true for Python,Java etc.). Through not necessary for any of the problems in the article IMHO. And even if we just speak about C to C FFI of programs build separately the amount of subtle potentially silent food guns is pretty high (like all the issues in this article and more + a bunch of other issues of ABI incompatibility risks and potentially also linker time optimization related risk).

    • aapoalas 10 hours ago

      Small self-promotion: I've written about FFI matters on a slightly philosophical bent, in the context of C FFI between Deno JS runtime and 3rd party C FFI libraries: https://denonomicon.deno.dev/philosophy-of-ffi

      It's probably not quite the teaching material you seek, but it's something close to that I think.

  • neonsunset a day ago

    No, it doesn’t. There are better ways to abstract away RAII, and it is a strictly bad pattern in Rust to not rely on Drop. The author needs to stop writing C and Go, both being arguably bad languages.