Is this concept of UB as poison actually sound? It seems to me (and I apologize for not using real notation, as I can read a little bit of Lean but I’ve never tried writing it, and my experience writing optimizers is nonexistent):
Suppose we start with an SSA-style function with inputs x and y:

    0

And we rewrite it as:

    let z = x +nsw y  -- or anything else that is UB for some input
    0
The original code is a function that takes x and y and returns an instance of iN {n}, namely bitvec 0.
The new code also maps any input to bitvec 0, although it may discard a poison value along the way. But this is UB in the C model!
I wonder whether this situation can be avoided by declaring that there is no control flow and that the program representation disallows discarding values. (And select would need to return poison if any argument is poison even if the selector says that the poison is not used.)
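A minimal Lean sketch (my own, not taken from any actual formalization) of what that stricter `select` might look like, with poison propagating from every argument, including the unchosen arm:

```lean
-- Sketch (hypothetical): a value is either a bitvector or poison.
inductive Val (n : Nat) where
  | poison
  | bv (x : BitVec n)

-- A `select` that is strict in all three arguments: if any of them is
-- poison, the result is poison, even when the chosen arm is not.
def select {n : Nat} : Val 1 → Val n → Val n → Val n
  | .bv c, .bv t, .bv f => if c = 1#1 then .bv t else .bv f
  | _,     _,     _     => .poison
```

With this definition the rewrite above cannot "launder" a poison value through a `select` whose condition happens to pick the other arm.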
The concept of poison here is not the same as C's definition of UB. LLVM has at least 3 different concepts that all vaguely count as UB (and probably closer to a half dozen when it gets fully formalized).
In C, UB is instant-UB--the moment you execute an operation that is UB, your program is undefined. LLVM also has instant-UB (I mean, any language that lets you dereference an arbitrary integer cast to a pointer has to have instant-UB). But poison isn't instant-UB--your program is perfectly safe to execute an operation that produces poison. It's only if a poison value reaches certain operations--essentially any control-flow decision point, or as an input to any operation that may cause instant-UB--that it causes UB in the C sense.
(This also means that operations that could trap--like x / 0--aren't modeled as poison in LLVM, but as instant-UB. It's safe to speculate an operation that causes poison, like x +nsw y, but it's not safe to speculate an operation that causes instant-UB, like x / y).
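One way to make that distinction concrete (a hedged sketch reusing a toy `Val` type; none of this is real LLVM semantics code): poison is just another value, while instant-UB makes the whole evaluation undefined, which can be modeled with `Option`:

```lean
inductive Val (n : Nat) where
  | poison
  | bv (x : BitVec n)

-- `add nsw`: always returns a value; signed overflow merely yields
-- poison, so the operation is safe to speculate.
def addNsw {n : Nat} : Val n → Val n → Val n
  | .bv a, .bv b =>
      if a.toInt + b.toInt = (a + b).toInt then .bv (a + b) else .poison
  | _, _ => .poison

-- `sdiv`: division by zero is instant-UB, modeled as `none`: the whole
-- execution becomes undefined, so it is not safe to speculate.
def sdiv {n : Nat} : Val n → Val n → Option (Val n)
  | .bv a, .bv b => if b = 0#n then none else some (.bv (a.sdiv b))
  | _, _ => none  -- a poison operand reaching a trapping op is also UB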
This kinda depends on how you model assignments. AFAIK in denotational semantics you usually express them as a state update function, taking the previous state of the program and returning a new state with the value for that variable updated to the new one, or an error state if the expression evaluated to `poison`.
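That state-update view could be sketched like this (hypothetical names, assuming a fixed 32-bit width and using `Option` for the error state):

```lean
inductive Val where
  | poison
  | bv (x : BitVec 32)

-- Program state: a map from variable names to values.
abbrev State := String → Val

-- Assignment as a state-update function: the new state maps `x` to `v`
-- and agrees with the old state elsewhere; a poison result is turned
-- into an error state (`none`).
def assign (s : State) (x : String) (v : Val) : Option State :=
  match v with
  | .poison => none
  | .bv _   => some (fun y => if y = x then v else s y)
```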
IIUC the model assumes no flow control, only select.

It does. But maybe someone should prove (in Lean?) that the lack of flow control is sufficient.
Without a constraint that values aren’t ignored, the lack of flow control is certainly not sufficient, so trying to do this right would require defining (in Lean!) what an expression is.
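Pinning that down would mean something like the following (a hypothetical Lean sketch; the point is only that the grammar has no let/sequencing form, so no computed value can be dropped):

```lean
-- Sketch: an expression grammar in which every subexpression's value
-- flows into the result; there is no construct that evaluates a
-- subexpression and then discards it.
inductive Expr where
  | var    : Nat → Expr                 -- function input x_i
  | const  : BitVec 32 → Expr
  | addNsw : Expr → Expr → Expr         -- may evaluate to poison
  | select : Expr → Expr → Expr → Expr  -- strict in all arguments
```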
> Let’s say for moral reasons you choose to link none of LLVM’s libraries (thanks for caring about your users!)

For what moral reasons would I avoid linking LLVM? I’m not familiar.

I'm not sure about moral reasons per se, but LLVM is a painful dependency in practice: it doesn't have API compatibility between versions, release builds are rapidly approaching a gigabyte, the system hands you some franken-version, and you won't have a good time if you end up linking or loading more than one version for whatever reason.

IIRC there was also a case where an LLVM bump caused tcmalloc to explode inside Wine.