15 comments

  • andrewaylett 17 minutes ago

    I'm very much a fan of the idea that language features — and especially library features — should not have privileged access to the compiler.

    Rust is generally pretty good at this, unlike (say) Go: most functionality is implemented as part of the standard library, and if I want to write my own `Vec` then (for the most part) I can. Some standard library code relies on compiler features that haven't been marked stable, which is occasionally frustrating, but the nightly compiler will let me use them if I really want to (most of the time I don't). Whereas in Go, I can't implement an equivalent to a goroutine. And even iterating over a container was "special" until generics came along.
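
    As a minimal illustration of what I mean (my own sketch, not from the article): a user-defined container plugs into `for` loops purely through the stable `IntoIterator` trait, with no compiler privilege involved.

        // A user-defined container that works with `for` loops purely via the
        // stable IntoIterator trait -- no special treatment from the compiler.
        struct Pair<T> {
            first: T,
            second: T,
        }

        impl<T> IntoIterator for Pair<T> {
            type Item = T;
            type IntoIter = std::array::IntoIter<T, 2>;

            fn into_iter(self) -> Self::IntoIter {
                [self.first, self.second].into_iter()
            }
        }

        fn main() {
            // Desugars to IntoIterator::into_iter + Iterator::next, same as for Vec.
            for x in (Pair { first: 1, second: 2 }) {
                println!("{x}");
            }
        }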

    This article was a really interesting look at where all that breaks down. There's obviously a trade-off between keeping all the plumbing user-visible (and therefore stable) and keeping it purely magic (and therefore changeable, so long as you don't break the observable behaviour). I think Rust strikes a fairly good compromise, allowing library implementations of core functionality while not needing to stabilise everything before releasing anything.

  • Animats 10 hours ago

    This is going to take some serious reading.

    I've been struggling with a related problem over at [1]. Feel free to read this, but it's nowhere near finished. I'm trying to figure out how to do back references cleanly and safely. The basic approach I'm taking is

    - We can do just about everything useful with Rc, Weak, RefCell, borrow(), borrow_mut(), upgrade, and downgrade. But it's really wordy and there's a lot of run-time overhead (a fuller sketch appears after these two points). Can we fix the ergonomics, at least for the single-owner case? Probably. The general idea is to be able to write a field access to a weak link as

        sometype.name
    
    when what's happening under the hood is

        sometype.upgrade().unwrap().borrow().name
    
    - After fixing the ergonomics, can we fix the performance by hoisting some of the checking? Probably. It's possible to check at the drop of sometype whether anybody is using it, strongly or weakly. That allows removing some of the per-reference checking. With compiler support, we can do even more.
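
    For concreteness, the wordy version I'm complaining about looks roughly like this today (a minimal sketch; the Parent/Child names are just for illustration):

        // Minimal sketch of the status quo: a child holding a back reference to
        // its parent through Weak, with every access spelled out by hand.
        use std::cell::RefCell;
        use std::rc::{Rc, Weak};

        struct Parent {
            name: String,
            children: Vec<Rc<RefCell<Child>>>,
        }

        struct Child {
            parent: Weak<RefCell<Parent>>,
        }

        fn parent_name(child: &Child) -> String {
            // What I'd like to write: child.parent.name
            // What has to be written today:
            child.parent.upgrade().unwrap().borrow().name.clone()
        }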

    What I've discovered so far is that the way to write about this is to come up with real-world use cases, then work on the machinery. Otherwise you get lost in type theory. The "Why" has to precede the "How" to get buy-in.

    I notice this paper is (2024). Any progress?

    [1] https://github.com/John-Nagle/technotes/blob/main/docs/rust/...

    • zozbot234 5 hours ago

      > The general idea is to be able to write a field access to a weak link as

        sometype.name
      
      > when what's happening under the hood is

        sometype.upgrade().unwrap().borrow().name
      
      You could easily implement this with no language-level changes, as an auto-fixable compiler diagnostic. The compiler would error out when it sees the type-mismatched .name, but it would give you an easy way of changing it to its proper form. You just avoid making the .name form permanent syntactic sugar (which is way too opaque for a low-level language like Rust); it gets replaced during development.
    • SkiFire13 3 hours ago

      > when what's happening under the hood is

      > sometype.upgrade().unwrap().borrow().name

      I suspect a hidden `.unwrap()` like that will be highly controversial.

    • kurante 9 hours ago

      Have you seen GhostCell[1]? Seems like this could be a solution to your problem.

      [1]: https://plv.mpi-sws.org/rustbelt/ghostcell/

      • Animats 8 hours ago

        Yes. There's an implementation at https://github.com/matthieu-m/ghost-cell

        Not clear why it never caught on.

        There have been many attempts to solve the Rust back reference problem, but nothing has become popular.

      • zozbot234 5 hours ago

        The qcell crate is perhaps the most popular implementation of GhostCell-like patterns. But the ergonomics are still a bit of a challenge.

    • mustache_kimono 9 hours ago

      > But it's really wordy and there's a lot of run time overhead.

      I'm curious: what do the benchmarks say about this?

  • Ericson2314 8 hours ago

    Oh this is really good!

    I wrote https://github.com/Ericson2314/rust-papers a decade ago for a slightly different purpose, but fundamentally we agree.

    For those trying to grok their stuff after reading the blog post, consider this.

    The borrow checker vs type checker distinction is a hack, a hack that works by relegating a bunch of stuff to be "second class". Second class means that the stuff only occurs within functions, and never across function boundaries.

    Proper type theories don't have this "within function, between function" distinction. Just as in the lambda calculus, you can slap a lambda around any term, in "platonic rust" you should be able to get any fragment and make it a reusable abstraction.
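
    A concrete example of that second-class-ness in today's Rust (my own sketch, not from the paper): inside one function body the borrow checker reasons about disjoint fields just fine, but there is no signature you can write that carries the same reasoning across a function boundary.

        struct S {
            a: i32,
            b: Vec<i32>,
        }

        fn main() {
            let mut s = S { a: 1, b: Vec::new() };

            // Within one function the checker reasons about *places*:
            // disjoint fields may be mutably borrowed at the same time.
            let a = &mut s.a;
            s.b.push(*a); // fine: s.a and s.b are disjoint
            *a += 1;

            // But there is no signature meaning "borrows only s.a", so moving
            // the same code behind a helper is rejected:
            //
            //     fn borrow_a(s: &mut S) -> &mut i32 { &mut s.a }
            //
            //     let a = borrow_a(&mut s); // borrows all of s
            //     s.b.push(*a);             // error: s is already borrowed
        }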

    The author's lens here is async, which is a good one: since we need to be able to slice functions apart into smaller fragments, with the boundaries at await points, we need this abstraction ability. With today's Rust, in contrast, the only way to do safe, manual, non-cheating await would instead be to drastically limit where one could "await" in practice, so as to never catch this interesting stuff in action.

    In my thing I hadn't considered async at all, but was considering a kind of dual thing. Since these inconceivable types do in fact exist (in a Rust Done Right), and since we can also combine our little functions into a bigger function, the inescapable conclusion is that locations do not have a single fixed type, but have types that vary at different points in the control flow graph. (You can try to model the control flow graph as a bunch of small functions and moves, but this runs afoul of non-movable stuff, including borrowed stuff, the ur-non-movable stuff.)
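
    To make the "varying type per program point" idea concrete (again my own toy example): the compiler already tracks this flow-sensitively, there is just no way to name the intermediate states.

        fn main() {
            let pair = (String::from("a"), String::from("b"));

            let first = pair.0;
            // From here on, `pair` is no longer a (String, String): its first
            // field has been moved out. The compiler tracks that per program
            // point, but the "tuple minus its first field" type is unnameable.

            // let whole = pair; // error: use of partially moved value: `pair`

            let second = pair.1; // still fine: the second field is intact
            println!("{first} {second}");
        }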

    Finally, if we're again trying to make everything first class, to have a language without cheating and without frustrating artificial limits on where abstraction boundaries go, we have to consider not just static locations changing type, but also pointers changing type. (We don't want to liberate some kinds of locations but not others.) That's where my thing comes in: references that have one type for the pointee at the beginning of the lifetime and another type at the end.

    This stuff might be mind-blowing, but it should be seriously pursued. Having second-class concepts in the language breeds epicycles over time. It's how you get C++. Taking the time to make everything first class like this might be scary, but it yields a much more "stable design" that is much more likely to stand the test of time.

    • Ericson2314 8 hours ago

      The post concludes by saying it's hopeless to get this stuff implemented because of back compat, but I don't think that's true. (It might be hopeless for other reasons. It certainly felt hopeless in 2015.)

      All this is about adding things to the language. That's backwards compatible. E.g. Drop doesn't need to be changed, because from every Drop instance a DropVer2 instance can be written instead. async v1 can also continue to exist, just by continuing to generate its existing shitty unsafe code. And if someone wants something better, they can just use async v2 instead.

      People get all freaked out about changing languages, but IMO the FUD is entirely due to sloppy imperative monkey brain. Languages are ideas, and ideas are immutable. The actual question is always: can we do "safe FFI" between two languages? Safe FFI between Rust Edition 20WX and 20YZ is so trivial that people forget to think about it that way. C and C++ are a better example, since C "continues to exist", but of course the bar for "safe FFI" is low when the languages themselves are internally unsafe, so safety between them couldn't mean very much.

      With harder edition breaks like this, the "safe FFI" mentality actually yields fruit.

  • shevy-java 3 hours ago

    Rust is not an easy language.

  • IshKebab 5 hours ago

    I think they should just implement position-independent borrows. So instead of the borrow being an absolute pointer that breaks when you move the self-borrowing struct, it would be stored as a relative offset, and you could move the struct just fine.

    Yes, it would add roughly one extra add to every access, but you hardly ever need self-borrows, so I think it's probably an acceptable cost in most cases.
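
    Something like this hand-rolled sketch, say (the names and the offset encoding are just mine for illustration):

        // Hand-rolled "position-independent" self-borrow: store an offset into
        // the struct's own inline storage instead of an absolute pointer.
        struct Parser {
            buf: [u8; 16],    // inline data that moves together with the struct
            token_off: usize, // offset into `buf` rather than a `&'self u8`
        }

        impl Parser {
            fn token(&self) -> &u8 {
                // One extra add per access, but `Parser` stays freely movable
                // because nothing in it stores an absolute address.
                &self.buf[self.token_off]
            }
        }

        fn main() {
            let mut p = Parser { buf: [0; 16], token_off: 3 };
            p.buf[3] = 42;
            let moved = p; // moving the struct doesn't invalidate anything
            assert_eq!(*moved.token(), 42);
        }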

    • tux3 4 hours ago

      Say I have this type:

          struct A {
            raw_data: Vec<u8>,
            parsed_data: B<&pie raw_data>,
            parsed_data2: B<&pie raw_data>
          }
      
          struct B<T> {
            foo: &pie T [u8],
          }
      
      Ignoring that my made-up notation doesn't make much sense, is the idea that B.foo would be an offset relative to its own address?

      So B.method(&self) might do addr(&self.foo) + self.foo, which is stable even if the parent struct A and its raw_data field move?

      Then I wonder how to handle the case where the relative &pie reference itself moves. Maybe parsed_data is std::mem::replaced with parsed_data2 (or maybe one of them is an Option<B> and we Option.take() it somewhere else.)

    • SkiFire13 3 hours ago

      This was proposed at the time, but it doesn't work for the case where the borrow points to stable memory (e.g. a `&str` pointing to the contents of a `String` in the same struct). In the general case a reference might point to either stable or unstable memory at runtime, so there's no way to make this always work (e.g. in async functions).
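
      A minimal sketch of the problematic case (my example, with made-up names): the borrow targets the `String`'s heap buffer, which stays put when the struct moves, so a self-relative offset would be exactly the wrong representation for it.

          struct Holder {
              owned: String,          // the heap buffer has a stable address
              first_word: *const str, // conceptually a &str borrowing from `owned`
          }

          fn fill(mut h: Holder) -> Holder {
              h.first_word = h.owned.split_whitespace().next().unwrap() as *const str;
              h
          }

          fn main() {
              let h = fill(Holder {
                  owned: String::from("hello world"),
                  first_word: "" as *const str,
              });
              let moved = Box::new(h); // move the struct somewhere else entirely
              // The absolute pointer still targets the (unmoved) heap buffer,
              // while an offset from `&moved` would point into the old location.
              assert_eq!(unsafe { &*moved.first_word }, "hello");
          }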

  • uecker an hour ago

    The people who say Rust is too complex just do not want to learn. /s