How interesting. GCC does indeed remove that branch.
https://godbolt.org/z/aPcr1bfPe
> For example, GCC will happily remove the dest == NULL branch in the following code
I think the blog should mention `-fno-delete-null-pointer-checks`
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
> -fdelete-null-pointer-checks
> [...]
> This option is enabled by default on most targets.
What a footgun.
I understand that, in an effort to compete with other compilers for relevance, GCC pursued performance over safety. Has that era passed? Could GCC choose safer over fast?
Alternatively, has someone compiled a list of flags one might want to enable in latest GCC to avoid such kinds of dangerous optimizations?
Just for the record, that's not the main purpose of -fdelete-null-pointer-checks.
Normally, it only deletes null checks after actual null pointer dereferences. In principle this can't change observable behavior. Null dereferences are guaranteed to trap, so if you don't trap, it means the pointer wasn't null. In other words, unlike most C compiler optimizations, -fdelete-null-pointer-checks should be safe even if you do commit undefined behavior.
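A minimal sketch of that normal case (my own illustration, not code from the thread): the dereference lets the compiler treat the later test as dead.

int first_elem(int *p) {
    int v = *p;        /* if p were NULL, this dereference would trap */
    if (p == NULL)     /* so the compiler may delete this whole branch */
        return 0;
    return v;
}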
This once caused a kerfuffle with the Linux kernel. At the time, x86_64 CPUs allowed the kernel to dereference userspace addresses, and the kernel allowed userspace to map address 0. Therefore, it was possible for userspace to arrange for null pointers to not trap when dereferenced in the kernel. Which meant that the null check optimization could actually change observable behavior. Which introduced a security vulnerability. [1]
Since then, Linux has been compiled with `-fno-delete-null-pointer-checks`, but it's not really necessary: Linux systems have long since enforced that userspace can't map address 0, which means that deleting null pointer checks should be safe in both kernel and userspace. (Newer CPU security features also protect the kernel even if userspace is allowed to map address 0.)
But anyway, I didn't know that -fdelete-null-pointer-checks treated "memcpy with potentially-zero size" as a condition to remove subsequent null pointer checks. That means that the optimization actually isn't safe! Once GCC is updated to respect the newly well-defined behavior, though, it should become truly safe. Probably.
The same can't be said for most UB optimizations – most of which can't be turned off.
[1] https://lwn.net/Articles/342330/
> Null dereferences are guaranteed to trap, so if you don't trap, it means the pointer wasn't null.
<laughs in embedded-system-with-no-MMU>
Usually, when one marks an argument as nonnull via a function attribute, one wants NULL checks to be removed.
Irrelevant, because delete-null-pointer-checks happens even in the absence of the nonnull function attribute; see GP's godbolt link, and the documentation that omits any reference to that function attribute.
That's what makes it dangerous!
That is a side effect of passing the pointer as a function parameter marked nonnull. It implies that the pointer is nonnull and any NULL checks against it can be removed. Pass it to a normal function and you will not see the NULL check removed.
There's two similar but distinct function attributes for nullability. One affects codegen, one affects diagnostics only.
Which are those? I only know about nonnull, nonnull_if_nonzero and returns_nonnull:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...
Explanation for the above: passing NULL as the destination argument to memcpy() is undefined behaviour at present. gcc assumes that the fact that memcpy() is called therefore means that the destination argument can't be NULL, so "knows" that the dest == NULL check can never be true, and so removes the test and the do_thing1() branch entirely.
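A hedged reconstruction of the kind of test() being discussed (the godbolt snippet itself isn't reproduced here; do_thing1/do_thing2 are placeholder names):

#include <string.h>

void do_thing1(void);
void do_thing2(void *dest);

void test(void *dest, const void *src, size_t len) {
    memcpy(dest, src, len);   /* under the current rules, implies dest != NULL even when len == 0 */
    if (dest == NULL)
        do_thing1();          /* GCC treats this branch as dead code */
    else
        do_thing2(dest);
}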
Interestingly, replacing len with a literal 0 in the memcpy() call results in gcc instead removing the memcpy() call and retaining the check - presumably a different optimisation routine decides that it's a no-op in that case. https://godbolt.org/z/cPdx6v13r is, therefore, interesting - despite this only ever calling test() with a len of 0, the elision of the dest == NULL check is still there, but test() has been inlined without the memcpy (because len == 0) but with do_thing2() (because the behaviour is undefined and so it can assume dest isn't NULL even though there's a NULL literally right there!)
Fucking compilers, man.
How does gcc infer anything about memcpy? Can't I replace the c-library memcpy with my own, so how does it know that dest == NULL can never be true?
You can, but gcc may replace it with an equivalent set of instructions as a compiler optimization, so you would have no guarantee it is used unless you hack the compiler.
On a related note, GCC optimizing away things is a problem for memset when zeroing buffers containing sensitive data, as GCC can often tell that the buffers are going to be freed and thus the write is deemed unnecessary. That is a security issue and has to be resolved by breaking the compiler’s optimization through a clever trick:
https://github.com/openzfs/zfs/commit/d634d20d1be31dfa8cf06e... 12352
Similarly, GCC may delete a memcpy to a buffer about to be freed, although I have never observed that as you generally don’t do that in production code.
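One common way to keep such a zeroing from being elided, shown as a sketch (this is not necessarily the trick used in the linked ZFS commit): route the call through a volatile function pointer, so the compiler cannot prove the call away.

#include <string.h>

static void *(* const volatile memset_fn)(void *, int, size_t) = memset;

void scrub(void *buf, size_t len) {
    memset_fn(buf, 0, len);   /* the volatile read of memset_fn forces a real call */
}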
Per ISO C, the identifiers declared or defined with external linkage by any C standard library header are considered reserved, so the moment you define your own memcpy, you're already in UB land.
The valid inputs to memcpy() are defined by the C specification, so the compiler is free to make assumptions about what valid inputs are even if the library implementation chooses to allow a broader range of inputs
Many standard C functions are treated as “magic” by compilers. Malloc is treated as if it has no side effects (which of course it does, it changes allocator state) so the optimiser can elide allocations. Without that special treatment you wouldn’t be able to elide the call, because malloc looks like it has side effects, which it does, but not ones we care about observing.
Not only that, malloc is also assumed to return pointers that don't alias anything else.
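A sketch of the allocation elision being described (illustrative; whether it fires depends on the compiler and flags):

#include <stdlib.h>

int allocates_nothing(void) {
    int *p = malloc(sizeof *p);   /* only "side effect" is allocator state */
    if (p == NULL)
        return 0;
    *p = 1;
    int r = *p;
    free(p);
    return r;                     /* may be folded to a plain "return 1" with no malloc call */
}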
If I'm understanding the OP correctly, the C standard says so, i.e. the semantics of memcpy are defined by the standard and the standard says that it's UB to pass NULL.
Unlike all the more complicated languages the "freestanding" mode C doesn't even have a memcpy feature, so it may not define how one works - maybe you've decided to use the name "memcpy" for your function which generates a memorandum about large South American rodents, and "memo_capybara" was too much typing.
In something like C++ or Rust, even their bare metal "What do you mean Operating System?" modes quietly require memcpy and so on because we're not savages, clearly somebody should provide a way to copy bytes of memory, Rust is so civilised that even on bare metal (in Rust's "core" library) you get a working sort_unstable() for your arbitrary slice types!
The compiler is free to give a meaning to memcpy if run in the (default) hosted mode. There's -ffreestanding for freestanding environments.
Right, though I guess I wasn't clear enough about that for the down voters, but whatever.
If you do so you have to add -fno-builtins (or just -fno-builtin-memcpy).
> Fucking compilers, man.
They're just acting as agents that derive the logical consequences of the code.
The fact that the given example code is "surprising" is analogous to this mathematical derivation:
a = b
a*a = b*a
a*a - b*b = b*a - b*b
(a - b)(a + b) = b(a - b)
(a - b)(a + b)/(a - b) = b(a - b)/(a - b)
^ Divide by 0, undefined behavior!
Everything below is not necessarily true.
a + b = b
b + b = b
2b = b
2 = 1
2 - 1 = 1 - 1
1 = 0
The source of truth about what is/isn't allowed is the C standard, not your personal simplified model of it that may contain dangerous misconceptions. The fact that your mental model doesn't match the document is an education problem, not a problem with the compiler.
> They're just acting as agents that derive the logical consequences of the code.
In a particularly pedantic, uptight, and sometimes un-helpful way, yes.
Compilers don't have to be designed this way; in fact it is a relatively recent development in the history of such tools.
> The fact that your mental model doesn't match the document is an education problem, not a problem with the compiler.
Or it is a problem with the document, which is the entire reason we are having this discussion: N3322 argued the document should be fixed, and now it will be for C2y.
> However, the most vocal opposition came from a static analysis perspective: Making null pointers well-defined for zero length means that static analyzers can no longer unconditionally report NULL being passed to functions like memcpy—they also need to take the length into account now.
How does this make any sense? We don't want to remove a low-hanging footgun because static analyzers can no longer detect it?
My understanding is that with this change, static analyzers have three options:
1. False positive on code that would have been an issue previously
2. False negative on a ton of similar footguns
3. Add complexity to differentiate between these cases
None of these options are fun.
Yes, but that tradeoff exists for most things those tools do. If you can easily and perfectly detect an error, it should just go into the compiler (and perhaps language spec).
I just skimmed through the proposed wording in [N3322]. It looks like it silently fixes a defect too, NULL == NULL was also undefined up until C23. Hilarious.
[N3322] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3322.pdf
This is probably related to the issue with NULL - NULL mentioned in the article.
Imagine you’re working in real mode on x86, in the compact or large memory model[1]. This means that a data pointer is basically struct{uint16_t off,seg;} encoding linear address (seg<<4)+off. This makes it annoying to have individual allocations (“objects”) >64K in size (because of the weird carries), so these models don’t allow that. (The huge model does, and it’s significantly slower.) Thus you legitimately have sizeof(size_t) == 2 but sizeof(uintptr_t) == 4 (hi Rust), and God help you if you compare or subtract pointers not within the same allocation. [Also, sizeof(void *) == 4 but sizeof(void (*)(void)) == 2 in the compact model, and the other way around in the medium model.]
Note the addressing scheme is non-bijective. The C standard is generally careful not to require the implementation to canonicalize pointers: if, say, char a[16] happens to be immediately followed by int b[8], an independently declared variable, it may well be that a+16 (legal “one past” pointer) is {16,1} but &b is {0,2}, which refers to the exact same byte, but the compiler doesn’t have to do anything special because dereferencing a+16 is UB (duh) and comparing (char *)(a+16) with (char *)&b or subtracting one from the other is also UB (pointers to different objects).
The issue with NULL == NULL and also with NULL - NULL is that now the null pointer is required to be canonical, or these expressions must canonicalize their operands. I don’t know why you’d ever make an implementation that has non-canonical NULLs, but I guess the text prior to this change allowed such.
[1] https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=10...
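A toy model of the encoding described above (not real compiler output; the numbers just mirror the {16,1} vs {0,2} example): two different off:seg pairs can name the same linear address, so a bitwise comparison of representations would disagree with a comparison of the addresses they denote.

#include <stdint.h>
#include <stdio.h>

struct farptr { uint16_t off, seg; };

static uint32_t linear(struct farptr p) {
    return ((uint32_t)p.seg << 4) + p.off;   /* (seg<<4)+off */
}

int main(void) {
    struct farptr one_past_a = { 16, 1 };
    struct farptr b          = { 0, 2 };
    printf("%u %u\n", (unsigned)linear(one_past_a), (unsigned)linear(b));   /* same byte */
    return 0;
}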
> now the null pointer is required to be canonical
Yikes! This particular oddity seems annoying but sort of harmless in x86 real mode, but not necessarily in protected mode. Imagine code that wants to load a pointer into a register: it loads the offset into an ordinary register and the selector portion into a segment register. It’s permissible to load the 0 (null) selector, but loading garbage will fault immediately. So, if you allow non canonical NULL, then knowing that a pointer is either valid or NULL does not allow you to hoist a segment load above a condition that might mean you never actually dereference the pointer.
(I have plenty of experience with low-level OS code in all kinds of nasty x86 modes but, thankfully, not so much experience writing ordinary C code targeting protected mode. It sometimes boggles my mind that anyone ever got decent performance with anything involving far data pointers. Segment loads are slow, and there are not a lot of segment registers to go around.)
In real mode assembly days, ES and sometimes DS were just another base register that you could use in a loop. Given the dearth of addressing modes it was quite nice to assume that large arrays started at xxxx0h and therefore that the offset part of the far pointer was zero.
If so, it's one that's been introduced at some point post C99 -- the C99 spec explicitly defines the behaviour of NULL == NULL. Section 6.5.9 para 6 says "Two pointers compare equal if and only if both are null pointers, both are pointers to the same object [etc etc]".
Cannot find any confirmation of your statement. Otoh "All null pointer values (of compatible type within the same address space) are already required to compare equal." in the linked paper.
That's a reasonable intuitive interpretation of how it should behave, but according to the spec it's undefined behaviour and compilers have a great degree of freedom in what happens as a result.
More information on this behavior in the link below.
> Note that, apart from contrived examples with deleted null checks, the current rules do not actually help the compiler meaningfully optimize code. A memcpy implementation cannot rely on pointer validity to speculatively read because, even though memcpy(NULL, NULL, 0) is undefined, slices at the end of a buffer are fine. [And if the end of the buffer] were at the end of a page with nothing allocated afterwards, a speculative read from memcpy would break
> [And if the end of the buffer] were at the end of a page with nothing allocated afterwards, a speculative read from memcpy would break
‘Only’ on platforms that have memory protection hardware. Even there, the platform can always allocate an overflow page for a process, or have the page fault handler check whether the page fault happened due to a speculative read, and repair things (I think the latter is hugely, hugely, hugely impractical, but the standard cannot rule it out)
My comment is a reply to (part of) a comment that isn’t talking about reading from NULL. That’s what the [And if the end of the buffer] part implies.
Even if it didn’t, I don’t think the standard should assume that “Platforms without memory protection hardware also have no problem reading NULL”
An OS could, for example, have a very simple memory protection feature where the bottom half of the memory address range is reserved for the OS, the top half for user processes, and any read from an address with the high bit clear by code in the top half of the address range traps and makes the OS kill the process doing the read.
What does "speculative" mean in this case? I understand it as CPU-level speculative execution a.k.a. branch mis-prediction, but that shouldn't have any real-world effects (or else we'd have segfaults all the time due to executing code that didn't really happen)
I feel strongly they should split undefined behavior into behavior that is not defined, and things that the compiler is allowed to assume. The former basically already exists as "implementation defined behavior". The latter should be written out explicitly in the documentation:
> memcpy(dest, src, count)
> Copies count bytes from src to dest. [...] Note this is not a plain function, but a special form that applies the constraints dest != NULL and src != NULL to the surrounding scope. Equivalent to:
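The snippet the comment alludes to isn't shown, but an assume()-style helper along these lines is presumably what's meant (a sketch using a GCC/Clang builtin; the names are mine, not from any standard):

#include <stddef.h>

static inline void assume(int cond) {
    if (!cond)
        __builtin_unreachable();   /* GCC/Clang: tells the optimizer this can't happen */
}

void *my_memcpy(void *dest, const void *src, size_t count) {
    assume(dest != NULL);
    assume(src != NULL);
    /* ... copy count bytes ... */
    return dest;
}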
The conflation of both concepts breaks the mental model of many programmers, especially ones who learned C/C++ in the 90s where it was common to write very different code, with all kinds of now illegal things like type punning and checking this != NULL.
I'd love to have a flag "-fno-surprizing-ub" or "-fhighlevel-assembler" combined with the above `assume` function or some other syntax to let me help the compiler, so that I can write C like in the 90s - close to metal but with fewer surprises.
When C was conceived, CPU architectures and platforms were more varied than what we see today. In order to remain portable and yet performant, some details were left as either implementation defined, or completely undefined (i.e. the responsibility of the programmer). Seems archaic today, but it was necessary when C compilers had to be two-pass and run in mere kilobytes of RAM. Even warnings for risky and undefined behavior is a relatively modern concept (last 10-20 years) compared to the age of C.
When C was conceived, it was made for a specific DEC CPU, for making an operating system. The idea of a C standard was in the future.
If you wanted to know what (for instance) memcpy actually did, you looked at the source code, or even more likely, the assembler or machine code output. That was "the standard".
I think it's reasonable to assume that GP clearly meant the C standard being conceived, as, obviously, K&R's C implementation of the language was ad hoc rather than exhibiting any prescribed specification.
> Seems archaic today ... run in mere kilobytes of RAM
There is an entire industry that does pretty much that... today. They might run in flash instead of RAM, but still, a few kilobytes.
Probably there are more embedded devices out there than PCs. PIC, AVR, MSP, ARM, custom archs. There might be one of those right now under your hand, in that thing you use to move the cursor.
1. Initially, they just wanted to give compiler makers more freedom: both in the sense "do whatever is simplest" and "do something platform-specific which dev wants".
2. Compiler devs found that they can use UB for optimization: e.g. if we assume that a branch with UB is unreachable we can generate more efficient code.
3. Sadly, compiler devs started to exploit every opportunity for optimization, e.g. removing code with a potential segfault.
I.e. the people who made the standard thought that the compiler would remove a no-op call to memcpy, but GCC removes the whole branch that makes the call, as it considers the whole branch impossible. Standard makers thought that compiler devs would be more reasonable.
> Standard makers thought that compiler devs would be more reasonable
This is a bit of a terrible take? Compiler devs never did anything "unreasonable", they didn't sit down and go "mwahahaha we can exploit the heck out of UB to break everything!!!!"
Rather, repeatedly applying a series of targeted optimizations, each one in isolation being "reasonable", results in an eventual "unreasonable" total transformation. But this is more an emergent property of modern compilers having hundreds of optimization passes.
At the time the standards were created, the idea of compilers applying so many optimization passes was just not conceivable. Compilers struggled to just do basic compilation. The assumption was a near 1:1 mapping between code & assembly, and that just didn't age well at all.
One could argue that "optimizing based on signed overflow" was an unreasonable step to take, since any given platform will have some sane, consistent behavior when the underlying instructions cause an overflow. A developer using signed operations without poring over the standard might have easily expected incorrect values (or maybe a trap if the platform likes to use those), but not big changes in control flow. In my experience, signed overflow is generally the biggest cause of "they're putting UB in my reasonable C code!", followed by the rules against type punning, which are violated every day by ordinary usage of the POSIX socket functions.
I started to like signed overflow rules, because it is really easy to find problems using sanitizers.
The strict aliasing rules are not violated by typical POSIX socket code, as a cast to a different pointer type, i.e. to `struct sockaddr *`, is by itself well-defined behavior. (And POSIX could of course just define something even if ISO C leaves it undefined, but I don't think this is needed here.)
> One could argue that "optimizing based on signed overflow" was an unreasonable step to take
That optimization allows using 64-bit registers / offset loads for signed ints, which it can't do if it has to allow overflow, since that overflow would have to wrap at 32 bits. That's not an uncommon thing.
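A sketch of where that shows up (illustrative): with signed overflow treated as UB the compiler may keep the index in a 64-bit register and fold the scaling into the addressing mode, instead of redoing a 32-bit wrap and sign-extension on every iteration (as -fwrapv would require).

long sum_stride(const int *a, int start, int stride, int n) {
    long s = 0;
    int i = start;
    for (int k = 0; k < n; k++, i += stride)   /* i += stride is assumed not to overflow */
        s += a[i];
    return s;
}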
There isn't a "find UB branches" pass that is seeking out this stuff.
Instead what happens is that you have something like a constant folding or value constraint pass that computes a set of possible values that a variable can hold at various program points by applying constraints of various options. Then you have a dead code elimination pass that identifies dead branches. This pass doesn't know why the "dest" variable can't hold the NULL value at the branch. It just knows that it can't, so it kills the branch.
Imagine the following code:
int x = abs(get_int());
if (x < 0) {
// do stuff
}
Can the compiler eliminate the branch? Of course. All that's happened here is that the constraint propagation feels "reasonable" to you in this case and "unreasonable" to you in the memcpy case.
Calling abs(INT_MIN) on a two's-complement machine is not allowed by the C standard. The behavior of abs() is undefined if the result would not fit in the return value.
A charitable interpretation may be: back when the contract of this function was standardized, presumably in C89 which is ~35 years ago, CPUs but also C compilers were not as powerful, so wasting an extra couple of CPU cycles to check this condition was much more expensive than it is today. Because of that contract, as can be seen in the example in the comments below, the compiler is also free to eliminate the dead code, which has the effect of shaving off some extra CPU cycles.
Back when they wrote it they were trying to accommodate existing compilers, including those who did useful things to help people catch errors in their programs (e.g. making memcpy trap and send a signal if you called it with NULL). The current generation of compilers that use undefined behaviour as an excuse to do horrible things that screw over regular programmers but increase performance on microbenchmarks postdates the standard.
Because the benefit was probably seen as very little, and the cost significant.
When you're writing a compiler for an architecture where every byte counts you don't make it write extra code for little benefit.
Programmers were routinely counting bytes (both in code size and data) when writing Assembly code back then, and I mean that literally. Some of that carried into higher-level languages, and rightly so.
memcpy used to be a rep movsb on 8086 DOS compilers. I don't remember if rep movsb stops if cx=0 on entry, or decrements first and wraps around, copying 64K of data.
The specification does not explicitly say that, but the clear intention is that REP with CX=0 should be no-op (you get exactly that situation when REP gets interrupted during the last iteration, in that case CX is zero and IP points to the REP, not the following instruction).
I know at least MSVC's memcpy on x86_64 still results in a rep movsb if the cpuid flag that says rep movsb is fast is set, which it should be on all x86 chips from about 2011/2012 and onward ;)
Probably because they did not think of this special case when writing the standard, or did not find it important enough to consider complicating the standard text for.
In C89, there's just a general provision for all standard library functions:
> Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow. If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer), the behavior is undefined. [...]
And then there isn't anything on `memcpy` that would explicitly state otherwise.
Later versions of the standard explicitly clarified that this requirement applies even to size 0, but at that point it was only a clarification of an existing requirement from the earlier standard.
People like to read a lot more intention into the standard than is reasonable. Lots of it is just historical accident, really.
Every time they leave something undefined, they do so to leave implementations free to use the underlying platform's default behavior, and to allow compilers to use it as an optimization point
> Every time they leave something undefined, they do so to leave implementations free to use the underlying platform's default behavior
That's implementation defined (more or less), i.e. the compiler can do whatever makes most sense for its implementation.
Undefined means (more or less) that the compiler can assume the behaviour never happens so can apply transforms without taking it into account.
> to allow compilers to use it as an optimization point
That's the main advantage of undefined behaviour, i.e. if you can ignore the usage, you may be able to apply optimisations that you couldn't if you had to take it into account. In the article, for example, GCC eliminated what it considered dead code for a NULL check of a variable that couldn't be NULL according to the C spec.
That's also probably the most frustrating thing about optimisations based on undefined behaviour, i.e. checks that prevent undefined behaviour are removed because the compiler thinks that the check can't ever succeed, because if it did, there must already have been undefined behaviour. But the way the developer was ensuring defined behaviour was with the check!
AFAIK, something having undefined behavior in the spec does not prevent an implementation- (platform-)specific behavior being defined.
As to your point about checks being erased, that generally happens when the checks happen too late (according to the compiler), or in a wrong way. For example, checking that `src` is not NULL _after_ memcpy(dst, src, 0) is called. Or, checking for overflow by doing `if(x+y<0) ...` when x and y are nonnegative signed ints.
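A sketch of that "check happens too late" pattern (the names are mine):

#include <stdio.h>
#include <string.h>

void copy_ints(int *dst, const int *src, size_t n) {
    memcpy(dst, src, n * sizeof *src);   /* compiler: dst and src must be non-NULL */
    if (dst == NULL || src == NULL)      /* so this branch is dead in its view */
        fprintf(stderr, "unexpected NULL\n");
}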
I mean, they might not have given thought to that particular corner case, they probably wrote something like
> memcpy(void* ptr1, void* ptr2, int n)
Copy n bytes from ptr1 to ptr2.
UNDEFINED if ptr1 is NULL or ptr2 is NULL
-------
It might also have come from an "explicit better than implicit" opinion, as in "it is better to have developers explicitly handle cases where the null pointer is involved".
I think it's more a strategy. C was not created to be safe. It's pretty much a tiny wrapper around assembler. Every limitation requires extra cycles, compile time or runtime, both of which were scarce.
Of course, someone needs to check in the layers of abstraction. The user, programmer, compiler, cpu, architecture.. They chose for the programmer, who like to call themselves "engineers" these days.
I get that for the library. But I'm a bit puzzled about the optimizations done by a compiler based on this behavior.
E.g., suppose we patch GCC to preserve any conditional containing the string 'NULL' in it. Would that have a measurable performance impact on Linux/Chromium/Firefox?
People will only rely on UB when it is well defined by a particular implementation, either explicitly or because of a long history of past use. E.g. using unions for type punning in gcc, or allowing methods to be called on null pointers in MSVC.
A trivial implementation wouldn't dereference dest or src in case the length is 0. That's how a student would write it with a for loop (byte-by-byte copy). A non-trivial implementation might do something with the pointers before entering the copy loop.
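That student version, for reference (a sketch): with n == 0 the loop body never runs and neither pointer is touched.

#include <stddef.h>

void *naive_memcpy(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dest;
}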
I have asked this question in the past and was told that memcpy() is allowed to preemptively read before it has determined it needs to write to make it faster on some CPUs. The presumption is that if you are going to be copying data, there is at least one cache line there already, so reading can start early.
It does nothing, but is only defined when the pointers point into or one past the end of valid objects (live allocations), because that's how the standard defines the C VM, in terms of objects, not a flat byte array.
This is wrong. If you do p=malloc(256), p+256 is valid even though it does not point to a valid address (it might be in an unmapped page; check out ElectricFence). Rust's non-null, aligned dangling pointer is the same; memcpy can't assume it can be dereferenced if the size is zero. The standard text in the linked paper says the same.
also UB according to the spec, but LLVM is free to define it. e.g., clang often converts trivial C++ copy constructors to memcpy, which is UB for self-assignment, but I assume that's fine because the C++ front-end only targets LLVM, and LLVM presumably defines the behaviour to do what you'd expect.
Where I work, it is quite normal to link together C code compiled with GCC and Rust code compiled with LLVM, due to how the build system is set up.
As far as I know that disables LTO, but the build system is so complex, and the C code so large, that nobody bothers switching the C side to Clang/LLVM as well.
Purely mechanically, yes, but in terms of the definition of the behaviour in the C abstract machine, no, because certain operations on null pointers are undefined, even if the obvious low-level compilation turns into nothing.
If you do this, your C code will run significantly slower than, say, Java, Go, or C#, because the compiler is unable to apply even the most basic optimizations (which it can do still in all those other languages).
So, at that point why even use C at all? Today, C is used where the overhead of a managed language is unacceptable. If you could just eat the performance cost, you'd probably already be using a managed language. There's not much desire for a variant of C with what would be at least a 10x slowdown in many workloads.
Or it could be made faster because certain manual optimizations become possible.
An example would be a table of interned strings that you wanna match against (say you're writing a parser).
Since standard C says thou shall not compare pointers with < or > unless they both point into the same 'object' you are forbidden from doing the speed of light code:
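Presumably something along these lines (a reconstruction of the elided snippet; the pool name and size are made up): a two-comparison range check that standard C only defines when the pointer actually points into the table.

#define POOL_SIZE 4096
static char intern_pool[POOL_SIZE];   /* hypothetical interned-string storage */

static int is_interned(const char *s) {
    /* Well-defined only if s points into intern_pool; otherwise UB, which is the complaint. */
    return s >= intern_pool && s < intern_pool + POOL_SIZE;
}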
To elaborate, we treat pointers as more than just integers because it gives optimizers the latitude to reorder and eliminate pointer operations. In the example above we cannot do this, because we cannot prove at compile time that x doesn't live at the address returned by oracle.
However, given how low-level a language C++ is, we can actually break this assumption by setting i to y-x. Since &x[i] is the same as x+i, this means we are actually writing 23 to &y[0].
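A hedged reconstruction of that example (not the article's exact code):

#include <stddef.h>
#include <stdio.h>

int main(void) {
    int x[1] = {0}, y[1] = {0};
    ptrdiff_t i = y - x;   /* undefined: x and y are different objects */
    x[i] = 23;             /* "writes y[0]" only if pointers were plain integers */
    printf("%d\n", y[0]);  /* an optimizer may legitimately print 0 */
    return 0;
}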
But that is undefined, you can't do x + (y - x), i.e. pointer arithmetic that ends outside the bounds of an array. Since it is undefined, shouldn't C++ assume that changing x[..] can't change y[0]?
edit: welp, if I had read a few more lines into the article I would have seen that it also says this is undefined
to be clear, in my example the result of oracle() cannot possibly alias with 'x' in C or C++ (and in fact gcc will optimize accordingly). In a different language where addresses are mere integers, things would be more complicated.
The numerical value returned by oracle might physically match the address of the stack slot for 'x', assuming that it exists, but it doesn't mean that, from a language point of view, it is a valid pointer.
If forging pointers had defined behaviour, it would be impossible to use the language sanely or perform any kind of optimization.
When using C, this can return anything (or crash if the oracle function returns an invalid pointer, or rewrite its own code if the code section is writable). So if you get rid of the "abstract machine", nothing changes - the program can return anything or crash.
The point is that the C standard does guarantee that the function returns 1 if the program is a valid C program - which means there is no UB.
For example: If the oracle function returns an invalid pointer, then dereferencing that pointer is UB, and therefore the program isn't a valid C program.
That’s the point. C allows this function to be optimized to always return 1. A “pointers are addresses, just emit reads and writes and stop trying to be so clever” version of C would require x to be spilled to the stack, then the write, then reload x and return whatever it contained.
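A sketch of the kind of function being discussed (reconstructed; oracle() is assumed to be defined elsewhere):

extern int *oracle(void);

int always_one(void) {
    int x = 1;
    *oracle() = 2;   /* &x never escapes, so this store cannot legally alias x */
    return x;        /* C compilers may fold this to "return 1" */
}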
Then use the register keyword or just reword the standard to assume the register behavior if a variables address hasn't been taken.
The majority of useful optimizations can be kept in a "Sane C" with either code style changes (cache stuff in local vars to avoid aliasing for example) or with minor tweaks to the standard.
Register behavior is what you want essentially all of the time. So we’d just have to write `register` all over the place for no gain.
“Don’t optimize this, read and write it even if you think it’s not necessary” is a very rare case so it shouldn’t be the default. If you want it, use the volatile keyword.
There’s no need to reword the standard to assume the register behavior if the variable’s address hasn’t been taken. That’s already how it works. In this example, if you escape the value of `&x`, it’s not legal to optimize this function to always return 1.
the literal 1 is not an object in C or C++ hence it does not have an address. If you meant 'x', then also no, oracle() can't return the address of 'x' because of pointer provenance rules.
That would restrict C to memory models with a linear address space. That is usually the case nowadays for C implementations, but maybe we don’t want to set that in stone, because it would be virtually impossible to revert such a guarantee.
There’s also cases like memory address ranges that map to non-memory hardware (i.e. that don’t behave like “dumb” memory), and how would you have the C standard define behavior for those?
Lastly, CPU caches require some sort of abstract model as soon as you have multi-threading.
The value of an abstract machine is that it allows you to specify how a given program behaves without needing to point to a specific piece of hardware. Compilers then have this as a target when compiling a program for a specific piece of hardware so that they know when the compiler's output is correct.
The issue here is that the abstract machine is under or badly specified.
20 years ago, making a C compiler that provided sane behaviour and better guarantees (going beyond the minimum defined in the standard) to make code safer and programmers' lives easier, even at the cost of some performance, might have been a good idea. Today any programmer who thinks things like not having security bugs are more important than having bigger numbers on microbenchmarks has already moved on from C.
This is certainly not true. Many programmers also learned to use the tools available to write reasonably safe code in C. I do not personally find this problematic.
You're like a Japanese holdout in the 60s refusing to leave his bunker long after the war is over.
C lost. Memory safety is a huge boon for security. Human beings, even the best of them, cannot consistently write correct C code. (Look at OpenBSD.) You can keep fighting the war your side has already lost or you can move on.
Note that the key word here is sound. The more common static analyzers are unsound tools that will miss cases. Sound tools do not, but few people know of them, they are rare and they are typically proprietary and expensive.
Sure. I'm also a big fan of what Microsoft has done with SAL. And of course you have formally proven C, as used in seL4. I'd say that the contortions you have to go through to write code with these systems takes you out of the domain of "C" and into a domain of a different, safer language merely resembling C. Such a language might be a fine tool! But it's not arbitrary C.
I think the first one, stack overflow, is technically not a memory safety issue, just denial-of-service on resource exhaustion. Stack overflow is well defined as far as I know.
The other three are definitely memory safety issues.
I would consider a stack overflow to be a memory safety issue. The C++ language authors likely would too. C++ famously refused to support variable length stack allocated arrays because of memory safety concerns. In specific, they were worried that code at runtime would make an array so big that it would jump the OS guard page, allowing access to unallocated memory that of course is not noticed ahead of time during development. This is probably easy to do unintentionally if you have more stack variables after an enormous stack allocated array and touch them before you touch the array. The alternative is to force developers to use compiler extensions such as alloca(). That makes it easy to pass pointers outside of the stack frame where they are valid and is a definite safety issue. The C++ nitpicking over variable length arrays is silly since it gives us a status quo where C++ developers use alloca() anyway, but it shows that stack overflows are considered a memory safety issue.
In the general case, I think you might be right, although it's a bit mitigated by the fact that Rust does not have support for variable length arrays, alloca, or anything that uses them, in the standard library. As you said though, it's certainly possible.
I was more referring to that specific linked advisory, which is unlikely to use either VLAs or alloca. In that case, where stack overflow would be caused by recursion, a guard frame will always be enough to catch it, and will result in a safe abort [0].
> I think a memory address is a number that CPU considers to be a memory address
I meant to say that, indeed, there must be some concept of CPU for a memory address to have a meaning, and for this concept of CPU to be as widely applicable as possible, surely defining it as abstract as possible is the way to go. Ergo, the idea of a C abstract machine.
Anyway, other people in this thread are discussing the matter more accurately and in more details than I could hope to do, so I'll leave it like that.
That’s currently the case in C, in that you can convert pointers to and from uintptr_t. However, not every number representable in that type needs to be valid memory (that’s true on the assembly level as well), hence it’s only defined for valid pointers.
Undefined behaviour is undefined behaviour whatever optimisation level you use.
Some -f flags may extend the C standard and remove undefined behaviour in some cases (e.g. strict aliasing, signed integer overflow, writable string constants, etc.)
No, because ISO never said it must behave this way.
Yes, because every libc I've personally encountered acts this way. At a glance, glibc's x86 implementation[1, 2], musl, and picolibc all handle 0-length memcpy as you'd expect.
I'm sure other folks could dig up the code for Newlib, uclibc, and others, and they'd see the same thing.
On a related note, ISO C has THREE different things that most people tend to lump together as "undefined behavior." They are:
Implementation-defined behavior: ISO doesn't require any particular behavior, but they do require implementations to consistently apply a particular behavior, and document that behavior.
Unspecified behavior: ISO doesn't require any particular behavior; implementations choose from the allowed possibilities, don't have to document the choice, and aren't even required to make the same choice consistently.
Undefined behavior: ISO doesn't require any particular behavior, and they don't require implementations to define any particular behavior either.
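A few textbook instances of each bucket, as I read the standard (illustrative only):

#include <limits.h>

static int f(void) { return 1; }
static int g(void) { return 2; }

void examples(int x, int y) {
    int bits = CHAR_BIT;   /* implementation-defined: chosen and documented by the implementation */
    int sum  = f() + g();  /* unspecified: which call is evaluated first */
    int prod = x * y;      /* undefined if the multiplication overflows int */
    (void)bits; (void)sum; (void)prod;
}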
Isn't it more sensible to just check that the params that are about to be sent to memcpy are reasonable?
That is why I tend to wrap my system calls with my own internal function (which can be inlined in certain PLs), where I can standardize such tests. Otherwise, the resulting code that performs the checks and does the requisite error handling is bloated.
Note that I am also loath to #define such code because C is already rife with macros and my perspective is that the fewer of them the better.
At the end of the day, quick and dirty fixes will prove the adage "short cuts make long delays", and OpenBSD's approach is the only really viable long-term solution, where you just have to rewrite your code if it has ill-advised constructs.
For designing libraries such as C's stdlib, I don't believe in 'undefined behavior'; clearly define your semantics and say, "If you pass a NULL to memcpy, this is what will happen." Same for passing n == 0, or src == dst.
And if, for some strange reason, fixing the semantics breaks calling code, then I can't imagine that their code wasn't f_cked in the first place.
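A sketch of the wrapper idea from a few comments up (the names are mine): centralize the argument checks once instead of sprinkling them at every call site.

#include <assert.h>
#include <string.h>

static inline void *checked_memcpy(void *dst, const void *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n == 0)
        return dst;              /* nothing to copy; don't touch the pointers */
    return memcpy(dst, src, n);
}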
As the article points out, all major memcpy implementations already do this check inside memcpy. Sure, the caller can also check, but given that it's both redundant in practice and makes some common patterns harder to use than they would otherwise be, there's no reason to not just standardize what's already happening anyway and make everyone's lives easier in the process.
Generally, undefined behavior removes the need for systematically checking for special cases, the most common being out of bounds access.
But it can go further than that. Dereferencing a NULL pointer is undefined behavior, so if a pointer is dereferenced, it can be assumed by the compiler not to be NULL and the code can be optimized. For example:
void foo(int *p) {
    int val = *p;
    if (p == NULL) {
        printf("val is NULL\n");
    } else {
        printf("val is %d\n", val);
    }
}
can be optimized to:
void foo(int *p) {
    int val = *p;
    printf("val is %d\n", val);
}
Note that static analyzers will most likely issue a warning here as such a trivial case is most likely a mistake. But the check for NULL may be part of an inline function that is used in many places, and thanks to the undefined behavior, the code that handles the NULL case will only be generated when relevant. The problem, of course, is that it assumes that the programmer knows what he is doing and doesn't make mistakes.
In the case of memcpy(NULL, NULL, 0), there probably isn't much to gain from making it undefined. It most likely doesn't help with the memcpy implementation (len=0 is generally a no-op), and inference based on the fact that the arguments can't be NULL is more likely to screw the programmer up than to improve performance.
It all adds up. All those instructions you don't have to execute, especially memory access and cache misses from jumps, pipeline stalls from conditionals, not just from this optimization.
If we're inlining the call, then we can hoist the NULL check out of the loop. Now it's 1 check per 20 million operations. There's no need to eliminate it or have UB at that point.
>All of these 'problems' have simple and straigtforward workarounds, I'm not convinced these UB are needed at all.
He gave you a simple and straightforward example, but that example may not be representative of a real world program where complex analysis leads to better performing code.
As a programmer, it's far easier to just insert bounds checks everywhere, and trust the system to remove them when possible. This is what Rust does, and it is safe. The problem isn't the compiler, the problem is the standard. More broadly, the standard wasn't written with optimizing compilers in mind.
The simplest example of a compiler optimization enabled by UB would be the following:
int my_function() {
    int x = 1;
    another_function();
    return x;
}
The compiler can optimize that to:
int my_function() {
    another_function();
    return 1;
}
Because it's UB for another_function() to use an out-of-bounds pointer to access the stack of my_function() and modify the value of x.
And the most important example of a compiler optimization enabled by UB is related to that: being UB to access local variables through out-of-bounds pointers allows the compiler to place them in registers, instead of being forced to go through the stack for every operation.
I don't find those compelling reasons and, to the contrary, I think that kind of semantic circumvention to be a symptom of a poorly developed industry.
How can we have properly functioning programs without clearly-defined, and sensible, semantics?
If the developer needs to use registers, then they should choose a dev env/PL that provides them, otherwise such kludges will crash and burn, IMO.
Are you saying that C compilers should change every local variable access to read and write to the stack just in case some function intentionally does weird pointer arithmetic to change their values without referring to them in the source code?
We stopped explicitly declaring locals with the 'register' keyword circa 40 years ago. Register allocation is a low hanging fruit and one of those things that is definitely best left to a compiler for most code.
And now they have to manage register pressure for it to keep being faster. And false dependencies. And some more. It doesn’t work like that. Developers can’t optimize like compilers do, not with modern CPUs. The compilers do the very heavy lifting in exchange for the complexity of a set of constraints they (and you as a consequence, must) rely on. The more relaxed these constraints are, the less performant code you get. Modern CPUs run modern interpreters as fast as dumbest-compiled C code basically, so if you want sensible semantics, then Typescript is one of the absolutely non-ironic answers.
What you describe there is UB. If you define this in the standard, you are defining a kind of runtime behavior that can never happen in a well formed program and the compiler does not have to make a program that encounters this behavior do anything in particular.
Does this still matter today? I mean, first, registers are anyway saved on the stack when calling a function, and the caches of modern processors are really nearly as fast (if not as fast!) as a register. Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
Assuming that the program is "bug free" to me is a terrible idea, since even mitigations that the programmer puts in place to mitigate the effect of bugs (and no program is bug free) are skipped because the compiler can assume the program has no bug. To me security is more important than a 1% more boost in performance.
> I mean, first registers are anyway saved on the stack when calling a function
No, they aren't. For registers defined in the calling convention as "callee-saved", they don't have to be saved on the stack before calling a function (and the called function only has to save them if it actually uses that register). And for registers defined as "caller-saved", they only have to be saved if their value needs to be kept. The compiler knows all that, and tends to use caller-saved registers as scratch space (which doesn't have to be preserved), and callee-saved registers for longer-lived values.
> and caches of modern processors are really nearly as fast (if not as fast!) as a register.
No, they aren't. For instance, a quick web search tells me that the L1D cache for a modern AMD CPU has at least 4 cycles of latency. Which means: even if the value you want to read is already in the L1 cache, the processor has to wait 4 cycles before it has that value.
> Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
No, they aren't. The register file still exists, even though register renaming means which physical register corresponds to a logical register can change. And there's no VM, most common instructions are decoded directly (without going through microcode) into a single µOp or pair of µOps which is executed directly.
> To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
It's the opposite: these optimizations are more important nowadays, since memory speeds have not kept up with processor speeds, and power consumption became more relevant.
> To me security is more important than a 1% more boost in performance.
Newer programming languages agree with you, and do things like checking array bounds on every access; they rely on compiler optimizations so that the loss of performance is only that "1%".
Register allocation is one of the most basic optimizations that a compiler can do. Some modern cpus can alias stack memory with internal registers, but it is still not as fast as not spilling at all.
You can enjoy -O0 today and the compiler will happily allocate stack slots for all your variables and keep them up to date (which is useful for debugging). But the difference between -O0 and -O3 is orders of magnitude on many programs.
Many calling conventions use registers. And no, loads and stores are extremely complex and not free at all: fewer can issue in each cycle and there's some very expensive hardware spent to maintain the ordering on execution.
In a real world program removing all UB is in some cases impossible without adding new breaking features to the C language. But taking a real world program and removing all UB which IS possible to remove will introduce an overhead. In some programs this overhead is irrelevant. In others, it is probably the reason why C was picked.
If you want speed without overhead, you need to have more statically checked guarantees. This is what languages such as Rust attempt to achieve (quite successfully).
What Rust attempts to achieve is to remove the possibility of accidentally introducing UB, by designing the language in a way that makes it impossible to have UB when sticking to the safe subset.
It is also possible to ensure that C programs have no UB, and this does not require any breaking features to C. It usually requires some refactoring of the program.
Why? It's 2024. Make it not be? Sure, some older stuff already written might no longer compile and need to be updated. Put it behind a "newer" standard flag/version or whatever.
Or is it that it can't be caught at compile time and only run time... hmm...
I don't imagine NULL is defined as "pointing to an object", so I don't expect that clause to apply.
You completely skipped over the first part: "Two pointers compare equal if and only if both are null pointers"
NULL == NULL was already defined -- but NULL <= NULL wasn't :)
Cannot find any confirmation of your statement. OTOH, "All null pointer values (of compatible type within the same address space) are already required to compare equal." in the linked paper.
NULL is not a single type in any conventional sense (and is actually tricky to define in a way that makes it usable in the way most programmers expect).
Thus:
I feel like I've misunderstood something here... shouldn't memcpy(anything, anything, 0) just do nothing, because you're copying 0 bytes?
That's a reasonable intuitive interpretation of how it should behave, but according to the spec it's undefined behaviour and compilers have a great degree of freedom in what happens as a result.
More information on this behavior in the link below.
> Note that, apart from contrived examples with deleted null checks, the current rules do not actually help the compiler meaningfully optimize code. A memcpy implementation cannot rely on pointer validity to speculatively read because, even though memcpy(NULL, NULL, 0) is undefined, slices at the end of a buffer are fine. [And if the end of the buffer] were at the end of a page with nothing allocated afterwards, a speculative read from memcpy would break
https://davidben.net/2024/01/15/empty-slices.html
> [And if the end of the buffer] were at the end of a page with nothing allocated afterwards, a speculative read from memcpy would break
‘Only’ on platforms that have memory protection hardware. Even there, the platform can always allocate an overflow page for a process, or have the page fault handler check whether the page fault happened due to a speculative read, and repair things (I think the latter is hugely, hugely, hugely impractical, but the standard cannot rule it out)
Platforms without memory protection hardware also have no problem reading NULL.
My comment is a reply to (part of) a comment that isn’t talking about reading from NULL. That’s what the [And if the end of the buffer] part implies.
Even if it didn’t, I don’t think the standard should assume that “Platforms without memory protection hardware also have no problem reading NULL”
An OS could, for example, have a very simple memory protection feature where the bottom half of the memory address range is reserved for the OS, the top half for user processes, and any read from an address with the high bit clear by code in the top half of the address range traps and makes the OS kill the process doing the read.
Doesn't it take memory protection hardware to trap on a memory read?
They may also expect writes to address 0.
Not really. MMIO mapped at 0x0 for example.
Yikes! I would love sipping coffee watching the chief architect chew up whoever suggested that. That sounds awful even on a microcontroller.
AVR’s registers are mapped to address 0. So reading and writing NULL is actually modifying r0.
AVR’s r0 is also a totally normal register, unlike most other RISC which typically have r0 == 0.
On s390 the memory at address 0 (low core) has all sorts of important stuff. Of course s390 has paging enabled pretty much always but still...
What does "speculative" mean in this case? I understand it as CPU-level speculative execution a.k.a. branch mis-prediction, but that shouldn't have any real-world effects (or else we'd have segfaults all the time due to executing code that didn't really happen)
Turns out you can have that kind of speculative failure too! https://randomascii.wordpress.com/2018/01/07/finding-a-cpu-d...
I feel strongly they should split undefined behavior into behavior that is not defined, and things that the compiler is allowed to assume. The former basically already exists as "implementation defined behavior". The latter should be written out explicitly in the documentation:
> memcpy(dest, src, count)
> Copies count bytes from src to dest. [...] Note this is not a plain function, but a special form that applies the constraints dest != NULL and src != NULL to the surrounding scope. Equivalent to:
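The comment's "Equivalent to:" block didn't survive the copy; given the reference to an `assume` helper just below, it was presumably something along these lines (a hedged reconstruction using GCC/Clang's __builtin_unreachable; `checked_memcpy` is a made-up name):

    #include <stddef.h>

    /* Hypothetical assume() helper (GCC/Clang style). */
    #define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

    void *checked_memcpy(void *dest, const void *src, size_t count) {
        assume(dest != NULL);                        /* the constraints described above */
        assume(src != NULL);
        return __builtin_memcpy(dest, src, count);   /* plus the ordinary byte copy     */
    }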
The conflation of both concepts breaks the mental model of many programmers, especially ones who learned C/C++ in the 90s, where it was common to write very different code, with all kinds of now-illegal things like type punning and checking this != NULL.

I'd love to have a flag "-fno-surprising-ub" or "-fhighlevel-assembler" combined with the above `assume` function or some other syntax to let me help the compiler, so that I can write C like in the 90s: close to the metal but with fewer surprises.
>Note this is not a plain function, but a special form that applies the constraints dest != NULL and src != NULL to the surrounding scope.
Plain functions can apply constraints to the surrounding code:
https://godbolt.org/z/fP58WGz9f
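A small sketch of the same idea (hypothetical function names; whether the later check is actually removed depends on inlining or interprocedural analysis):

    #include <stddef.h>

    void touch(int *p) { *p = 0; }       /* dereferences p, so p must be non-NULL  */

    int caller(int *p) {
        touch(p);                        /* after inlining/IPA, the compiler may   */
        if (p == NULL)                   /* conclude this check can never be true  */
            return -1;
        return 1;
    }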
Why didn't they just... define it, back when they wrote it?
When C was conceived, CPU architectures and platforms were more varied than what we see today. In order to remain portable and yet performant, some details were left as either implementation defined, or completely undefined (i.e. the responsibility of the programmer). Seems archaic today, but it was necessary when C compilers had to be two-pass and run in mere kilobytes of RAM. Even warnings for risky and undefined behavior are a relatively modern concept (last 10-20 years) compared to the age of C.
When C was conceived, it was made for a specific DEC CPU, for making an operating system. The idea of a C standard was in the future.
If you wanted to know what (for instance) memcpy actually did, you looked at the source code, or even more likely, the assembler or machine code output. That was "the standard".
I think it's reasonable to assume that GP clearly meant the C standard being conceived, as, obviously, K&R's C implementation of the language was ad hoc rather than exhibiting any prescribed specification.
No, K&R's book was the standard.
First came the language, then a few years later they described it in a book.
> Seems archaic today ... run in mere kilobytes of RAM
There is an entire industry that does pretty much that... today. They might run in flash instead of RAM, but still, a few kilobytes.
Probably there are more embedded devices out there than PCs. PIC, AVR, MSP, ARM, custom archs. There might be one of those right now under your hand, in that thing you use to move the cursor.
> There is an entire industry that does pretty much that... today.
Which industry runs C compilers on embedded devices? Because that is what the part you ellipsized out was talking about.
many do tho. i have targeted c89 and maybe c99 on several embedded devices
But you're running the compiler on the device rather than cross-compile?
They cross compile. No one is compiling code on these machines.
Oh... yes. You are right. My bad.
From what I understand:
1. Initially, they just wanted to give compiler makers more freedom: both in the sense of "do whatever is simplest" and "do something platform-specific which the dev wants".

2. Compiler devs found that they can use UB for optimization: e.g. if we assume that a branch with UB is unreachable, we can generate more efficient code.

3. Sadly, compiler devs started to exploit every opportunity for optimization, e.g. removing code with a potential segfault.
I.e. people who made a standard thought that compiler would remove no-op call to memcpy, but GCC removes the whole branch which makes the call as it considers the whole branch impossible. Standard makers thought that compiler devs would be more reasonable
> Standard makers thought that compiler devs would be more reasonable
This is a bit of a terrible take? Compiler devs never did anything "unreasonable", they didn't sit down and go "mwahahaha we can exploit the heck out of UB to break everything!!!!"
Rather, repeatedly applying a series of targeted optimizations, each one in isolation being "reasonable", results in an eventual "unreasonable" total transformation. But this is more an emergent property of modern compilers having hundreds of optimization passes.
At the time the standards were created, the idea of compilers applying so many optimization passes was just not conceivable. Compilers struggled to just do basic compilation. The assumption was a near 1:1 mapping between code & assembly, and that just didn't age well at all.
One could argue that "optimizing based on signed overflow" was an unreasonable step to take, since any given platform will have some sane, consistent behavior when the underlying instructions cause an overflow. A developer using signed operations without poring over the standard might have easily expected incorrect values (or maybe a trap if the platform likes to use those), but not big changes in control flow. In my experience, signed overflow is generally the biggest cause of "they're putting UB in my reasonable C code!", followed by the rules against type punning, which are violated every day by ordinary usage of the POSIX socket functions.
I started to like signed overflow rules, because it is really easy to find problems using sanitizers.
The strict aliasing rules are not violated by typical POSIX socket code as a cast to a different pointer type, i.e. `struct sockaddr` by itself is well-defined behavior. (and POSIX could of course just define something even if ISO C leaves it undefined, but I don't think this is needed here)
> One could argue that "optimizing based on signed overflow" was an unreasonable step to take
That optimization allows using 64-bit registers / offset loads for signed ints, which it can't do if the arithmetic has to wrap, since that overflow would have to happen at 32 bits. That's not an uncommon thing.
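A commonly cited shape of this optimization, as a sketch (hypothetical code, not from the thread): with a signed 32-bit induction variable the compiler may assume no overflow and keep the index widened in a 64-bit register.

    long sum(const long *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)  /* signed i is assumed never to overflow (UB if it did), */
            s += a[i];               /* so it can be widened to a 64-bit index register       */
        return s;
    }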
There isn't a "find UB branches" pass that is seeking out this stuff.
Instead what happens is that you have something like a constant folding or value constraint pass that computes a set of possible values that a variable can hold at various program points by applying constraints from various operations. Then you have a dead code elimination pass that identifies dead branches. This pass doesn't know why the "dest" variable can't hold the NULL value at the branch. It just knows that it can't, so it kills the branch.
Imagine the following code:
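(The code block itself appears to have been lost in the copy; judging from the replies about abs(INT_MIN), it was presumably something like this reconstruction, with `foo` and `example` as placeholder names:)

    #include <stdlib.h>

    extern int foo(void);

    int example(void) {
        int x = abs(foo());   /* abs() is taken to return a non-negative value          */
        if (x < 0)            /* so constraint propagation marks this branch as dead... */
            return -1;        /* ...and dead code elimination removes it                */
        return x;
    }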
Can the compiler eliminate the branch? Of course. All that's happened here is that the constraint propagation feels "reasonable" to you in this case and "unreasonable" to you in the memcpy case.

Why is it allowed to eliminate the branch? On most architectures abs(INT_MIN) returns INT_MIN, which is negative.
Calling abs(INT_MIN) on twos-complement machine is not allowed by the C standard. The behavior of abs() is undefined if the result would not fit in the return value.
It's possible that there is an edge case in the output bounds here. I'm just using it as an example.
Replace it with "int x = foo() ? 1 : 2;" if you want.
I didn't believe this so I looked it up, and yup.
Because of 2's complement limitations, abs(INT_MIN) can't actually be represented and it ends up returning INT_MIN.
More reasonable: Emit a warning or error to make the code and human writing it better.
NOT-reasonable: silently 'optimize' a 'gotcha' into behavior the programmer(s) didn't intend.
NOT-reasonable: expecting the compiler to read the programmer's mind.
A charitable interpretation may be: back when the contract of this function was standardized, presumably in C89 ~35 years ago, CPUs and C compilers alike were not as powerful, so wasting an extra couple of CPU cycles to check this condition was much more expensive than it is today. Because of that contract (as can be seen in the example in the comments below), the compiler is also free to eliminate the dead code, which has the effect of shaving off some extra CPU cycles.
Back when they wrote it they were trying to accommodate existing compilers, including those who did useful things to help people catch errors in their programs (e.g. making memcpy trap and send a signal if you called it with NULL). The current generation of compilers that use undefined behaviour as an excuse to do horrible things that screw over regular programmers but increase performance on microbenchmarks postdates the standard.
Because the benefit was probably seen as very little, and the cost significant.
When you're writing a compiler for an architecture where every byte counts you don't make it write extra code for little benefit.
Programmers were routinely counting bytes (both in code size and data) when writing Assembly code back then, and I mean that literally. Some of that carried into higher-level languages, and rightly so.
memcpy used to be a rep movsb on 8086 DOS compilers. I don't remember if rep movsb stops if cx=0 on entry, or decrements first and wraps around, copying 64K of data.
The specification does not explicitly say that, but the clear intention is that REP with CX=0 should be a no-op (you get exactly that situation when REP gets interrupted during the last iteration: in that case CX is zero and IP points to the REP, not the following instruction).
Rep movsb copies 64K if CX=0 (that's actually very useful), but memcpy could be implemented as two instructions:
I know at least MSVC's memcpy on x86_64 still results in a rep movsb if the cpuid flag that says rep movsb is fast is set, which it should be on all x86 chips from about 2011/2012 and onward ;)
The original C standard was more descriptive than prescriptive. There was probably an implementation where it crashed or misbehaved.
Probably because they did not think of this special case when writing the standard, or did not find it important enough to consider complicating the standard text for.
In C89, there's just a general provision for all standard library functions:
> Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow. If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer), the behavior is undefined. [...]
And then there isn't anything on `memcpy` that would explicitly state otherwise. Later versions of the standard explicitly clarified that this requirement applies even to size 0, but at that point it was only a clarification of an existing requirement from the earlier standard.
People like to read a lot more intention into the standard than is reasonable. Lots of it is just historical accident, really.
Every time they leave something undefined, they do so to leave implementations free to use the underlying platform's default behavior, and to allow compilers to use it as an optimization point
> time they leave something undefined, they do so to leave implementations free to use the underlying platform's default behavior
That's implementation defined (more or less), i.e. the compiler can do whatever makes most sense for its implementation.
Undefined means (more or less) that the compiler can assume the behaviour never happens, so it can apply transforms without taking it into account.
> to allow compilers to use it as an optimization point
That's the main advantage of undefined behaviour, i.e. if you can ignore the usage, you may be able to apply optimisations that you couldn't if you had to take it into account. In the article, for example, GCC eliminated what it considered dead code for a NULL check of a variable that couldn't be NULL according to the C spec.

That's also probably the most frustrating thing about optimisations based on undefined behaviour, i.e. checks that prevent undefined behaviour are removed because the compiler thinks that the check can't ever succeed because, if it did, there must have been undefined behaviour. But the way the developer was ensuring defined behaviour was with the check!
AFAIK, something having undefined behavior in the spec does not prevent an implementation- (platform-)specific behavior being defined.
As to your point about checks being erased, that generally happens when the checks happen too late (according to the compiler), or in a wrong way. For example, checking that `src` is not NULL _after_ memcpy(dst, src, 0) is called. Or checking for overflow by doing `if (x+y < 0) ...` when x and y are non-negative signed ints.
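Two hedged sketches of such "too late" checks (hypothetical function names):

    #include <stdlib.h>
    #include <string.h>

    void copy_then_check(unsigned char *dst, const unsigned char *src, size_t n) {
        memcpy(dst, src, n);  /* under the current rules this implies src != NULL... */
        if (src == NULL)      /* ...so the compiler may treat this check as dead     */
            abort();
    }

    int sum_checked(int x, int y) {  /* callers only pass non-negative x and y       */
        if (x + y < 0)               /* the signed overflow (UB) already happened,   */
            return -1;               /* so this check may be optimized away          */
        return x + y;
    }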
Here it's more that it allows the compiler to assume that this is never the case, thus no need to have an additional check in it, I assume?
I mean, they might not have given thought to that particular corner case, they probably wrote something like
> memcpy(void* ptr1, void* ptr2, int n)
Copy n bytes from ptr1 to ptr2. UNDEFINED if ptr1 is NULL or ptr2 is NULL
-------
It might also have come from an "explicit better than implicit" opinion, as in "it is better to have developers explicitly handle cases where the null pointer is involved".
I think it's more a strategy. C was not created to be safe. It's pretty much a tiny wrapper around assembler. Every limitation requires extra cycles, compile time or runtime, both of which were scarce.
Of course, someone needs to check in the layers of abstraction: the user, programmer, compiler, CPU, architecture... They chose the programmer, who likes to call themselves an "engineer" these days.
I disagree with your premise. C was designed to be a high level (for its time) language, abstracted from actual hardware
>It's pretty much a tiny wrapper around assembler
Assembler has zero problem with adding "null + 4" or computing "null - null". C does, because it's not actually a tiny wrapper.
Not sure what your last remark means wrt everything else.
I get that for the library. But I'm a bit puzzled about the optimizations done by a compiler based on this behavior.
E.g., suppose we patch GCC to preserve any conditional containing the string 'NULL' in it. Would that have a measurable performance impact on Linux/Chromium/Firefox?
Upon which some people may rely...
People will only rely on UB when it is well defined by a particular implementation, either explicitly or because of a long history of past use. E.g. using unions for type punning in gcc, or allowing methods to be called on null pointers in MSVC.
But there's nothing like that here.
A trivial implementation wouldn't dereference dest or src in case the length is 0. That's how a student would write it with a for loop (byte-by-byte copy). A non-trivial implementation might do something with the pointers before entering the copy loop.
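For illustration, a sketch of that student-style implementation; with n == 0 the loop body never runs, so neither pointer is dereferenced:

    #include <stddef.h>

    void *my_memcpy(void *dest, const void *src, size_t n) {
        unsigned char *d = dest;             /* byte-by-byte copy                     */
        const unsigned char *s = src;
        for (size_t i = 0; i < n; i++)       /* with n == 0 this never executes, so   */
            d[i] = s[i];                     /* neither pointer is ever dereferenced  */
        return dest;
    }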
I have asked this question in the past and was told that memcpy() is allowed to preemptively read before it has determined it needs to write to make it faster on some CPUs. The presumption is that if you are going to be copying data, there is at least one cache line there already, so reading can start early.
It does nothing, but is only defined when the pointers point into or one past the end of valid objects (live allocations), because that's how the standard defines the C VM, in terms of objects, not a flat byte array.
What if the objects are non-NULL, but invalid (not actually allocated)?
For example, Rust will use address 1 with length 0 for static empty strings, because 1 is a properly aligned non-null pointer.
I would imagine such strings end up being passed to C code sometimes, which may end up calling memcpy with a length of 0 on them.
> What if the objects are non-NULL, but invalid (not actually allocated)?
Still UB, since they're restricted pointers that must be valid to begin with.
This is wrong. If you do p = malloc(256), p+256 is valid even though it does not point to a valid address (it might be in an unmapped page; check out ElectricFence). Rust's non-null, aligned dangling pointer is the same: memcpy can't assume it can be dereferenced if the size is zero. The standard text in the linked paper says the same.
also UB according to the spec, but LLVM is free to define it. e.g., clang often converts trivial C++ copy constructors to memcpy, which is UB for self-assignment, but I assume that's fine because the C++ front-end only targets LLVM, and LLVM presumably defines the behaviour to do what you'd expect.
Where I work, it is quite normal to link together C code compiled with GCC and Rust code compiled with LLVM, due to how the build system is set up.
As far as I know that disables LTO, but the build system is so complex, and the C code so large, that nobody bothers switching the C side to Clang/LLVM as well.
Still technically UB according to the proposed wording. The proposed wording only deals with allowing null pointers explicitly.
Purely mechanically, yes, but in terms of the definition of the behaviour in the C abstract machine, no, because certain operations on null pointers are undefined, even if the obvious low-level compilation turns into nothing.
Maybe we should get rid of "abstract machine" and treat pointers as memory addresses?
If you do this, your C code will run significantly slower than, say, Java, Go, or C#, because the compiler is unable to apply even the most basic optimizations (which it can still do in all those other languages).
So, at that point why even use C at all? Today, C is used where the overhead of a managed language is unacceptable. If you could just eat the performance cost, you'd probably already be using a managed language. There's not much desire for a variant of C with what would be at least a 10x slowdown in many workloads.
Or it could be made faster because certain manual optimizations become possible.
An example would be a table of interned strings that you wanna match against (say you're writing a parser). Since standard C says thou shall not compare pointers with < or > unless they both point into the same 'object', you are forbidden from doing the speed-of-light code:
Official standard-sanctioned workarounds would require extra indirection (using indices for example), which is suboptimal.

You can cast them to uintptr_t and compare them to your heart's desire.
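A sketch of both variants under the stated assumption of a single interned-string arena (`arena` and the function names are hypothetical): the direct comparison is UB for pointers unrelated to the arena, while the uintptr_t comparison itself is well defined (the pointer-to-integer mapping is implementation-defined):

    #include <stdbool.h>
    #include <stdint.h>

    extern char arena[1u << 20];                 /* hypothetical interned-string storage */

    bool is_interned_direct(const char *p) {     /* UB unless p points into arena        */
        return p >= arena && p < arena + sizeof arena;
    }

    bool is_interned_uintptr(const char *p) {    /* the comparison itself is well defined */
        uintptr_t a  = (uintptr_t)p;
        uintptr_t lo = (uintptr_t)arena;
        uintptr_t hi = (uintptr_t)(arena + sizeof arena);
        return a >= lo && a < hi;
    }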
To elaborate, we treat pointers as more than just integers because it gives optimizers the latitude to reorder and eliminate pointer operations. In the example above we cannot do this, because we cannot prove at compile time that x doesn't live at the address returned by oracle.
For some high-quality further discussion, see Ralf Jung's series of blog posts starting with https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
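The example being referred to isn't in this excerpt; based on the follow-up comments (a write through oracle()'s result, then returning a local known to be 1), it was presumably something like this reconstruction:

    extern int *oracle(void);   /* returns some pointer the compiler cannot see through */

    int returns_one(void) {
        int x = 1;
        *oracle() = 2;          /* x's address never escapes, so this store cannot      */
        return x;               /* legally alias x; the compiler may just return 1      */
    }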
edit: welp, if I had read a few more lines into the article I would have seen that it also says it is undefined
to be clear, in my example the result of oracle() cannot possibly alias with 'x' in C or C++ (and in fact gcc will optimize accordingly). In a different language where addresses are mere integers, things would be more complicated.
The result of oracle can point to anything if you write it as return (int *)rand();
Note that rand() returns a 32-bit value, so you have to call it twice and merge the results to obtain a 64-bit pointer.
The numerical value returned by oracle might physically match the address of the stack slot for 'x', assuming that it exists, but it doesn't mean that, from a language point of view, it is a valid pointer.
If forging pointers had defined behaviour, it would be impossible to use the language sanely or perform any kind of optimization.
When using C, this can return anything (or crash if the oracle function returns an invalid pointer, or rewrite its own code if the code section is writable). So if you get rid of the "abstract machine", nothing changes: the program can return anything or crash.
The point is that the C standard does guarantee that the function returns 1 if the program is a valid C program - which means there is no UB.
For example: If the oracle function returns an invalid pointer, then dereferencing that pointer is UB, and therefore the program isn't a valid C program.
A conforming C compiler is allowed to emit that function to perform the write and then return the constant 1. Should that be allowed?
Is it allowed to return anything else in C? Is there anything in standard C that would allow oracle() to access the memory address of x?

Sure, different compilers might allow inline assembly or some other ways to access x on the previous stack frame, perhaps, but then it is not really "C".
That’s the point. C allows this function to be optimized to always return 1. A “pointers are addresses, just emit reads and writes and stop trying to be so clever” version of C would require x to be spilled to the stack, then the write, then reload x and return whatever it contained.
Then use the register keyword, or just reword the standard to assume the register behavior if a variable's address hasn't been taken.
The majority of useful optimizations can be kept in a "Sane C" with either code style changes (cache stuff in local vars to avoid aliasing for example) or with minor tweaks to the standard.
Register behavior is what you want essentially all of the time. So we’d just have to write `register` all over the place for no gain.
“Don’t optimize this, read and write it even if you think it’s not necessary” is a very rare case so it shouldn’t be the default. If you want it, use the volatile keyword.
There’s no need to reword the standard to assume the register behavior if the variable’s address hasn’t been taken. That’s already how it works. In this example, if you escape the value of `&x`, it’s not legal to optimize this function to always return 1.
Well, even in C it is not guaranteed to return anything other than 1, since oracle() may return the memory address of variable 1.
the literal 1 is not an object in C or C++ hence it does not have an address. If you meant 'x', then also no, oracle() can't return the address of 'x' because of pointer provenance rules.
That would restrict C to memory models with a linear address space. That is usually the case nowadays for C implementations, but maybe we don’t want to set that in stone, because it would be virtually impossible to revert such a guarantee.
There’s also cases like memory address ranges that map to non-memory hardware (i.e. that don’t behave like “dumb” memory), and how would you have the C standard define behavior for those?
Lastly, CPU caches require some sort of abstract model as soon as you have multi-threading.
The value of an abstract machine is that it allows you to specify how a given program behaves without needing to point to a specific piece of hardware. Compilers then have this as a target when compiling a program for a specific piece of hardware so that they know when the compiler's output is correct.
The issue here is that the abstract machine is under or badly specified.
20 years ago, making a C compiler that provided sane behaviour and better guarantees (going beyond the minimum defined in the standard) to make code safer and programmers' lives easier, even at the cost of some performance, might have been a good idea. Today any programmer who thinks things like not having security bugs are more important than having bigger numbers on microbenchmarks has already moved on from C.
This is certainly not true. Many programmers also learned to use the tools available to write reasonably safe code in C. I do not personally find this problematic.
> safe code in C
You're like a Japanese holdout in the 60s refusing to leave his bunker long after the war is over.
C lost. Memory safety is a huge boon for security. Human beings, even the best of them, cannot consistently write correct C code. (Look at OpenBSD.) You can keep fighting the war your side has already lost or you can move on.
Use a sound static analyzer like astree and you can produce memory safe C code:
https://www.absint.com/astree/index.htm
Note that the key word here is sound. The more common static analyzers are unsound tools that will miss cases. Sound tools do not, but few people know of them, they are rare and they are typically proprietary and expensive.
Sure. I'm also a big fan of what Microsoft has done with SAL. And of course you have formally proven C, as used in seL4. I'd say that the contortions you have to go through to write code with these systems takes you out of the domain of "C" and into a domain of a different, safer language merely resembling C. Such a language might be a fine tool! But it's not arbitrary C.
Well, memory safety is great but it seems Rust programmers also manage to create memory safety issues just fine:
https://rustsec.org/advisories/RUSTSEC-2024-0401.html https://rustsec.org/advisories/RUSTSEC-2024-0400.html https://rustsec.org/advisories/RUSTSEC-2024-0377.html https://rustsec.org/advisories/RUSTSEC-2024-0374.html etc.
I think the first one, stack overflow, is technically not a memory safety issue, just denial-of-service on resource exhaustion. Stack overflow is well defined as far as I know.
The other three are definitely memory safety issues.
I would consider a stack overflow to be a memory safety issue. The C++ language authors likely would too. C++ famously refused to support variable length stack allocated arrays because of memory safety concerns. Specifically, they were worried that code at runtime would make an array so big that it would jump the OS guard page, allowing access to unallocated memory that of course is not noticed ahead of time during development. This is probably easy to do unintentionally if you have more stack variables after an enormous stack allocated array and touch them before you touch the array. The alternative is to force developers to use compiler extensions such as alloca(). That makes it easy to pass pointers outside of the stack frame where they are valid and is a definite safety issue. The C++ nitpicking over variable length arrays is silly since it gives us a status quo where C++ developers use alloca() anyway, but it shows that stack overflows are considered a memory safety issue.
In the general case, I think you might be right, although it's a bit mitigated by the fact that Rust does not have support for variable length arrays, alloca, or anything that uses them, in the standard library. As you said though, it's certainly possible.
I was more referring to that specific linked advisory, which is unlikely to use either VLAs or alloca. In that case, where stack overflow would be caused by recursion, a guard frame will always be enough to catch it, and will result in a safe abort [0].
[0] https://github.com/rust-lang/rust/pull/31333
C++ is a better unsafe language than unsafe Rust, IMHO. The thing about the social dynamic of Rust, though, is that it keeps unsafe code to a minimum.
How would you define what a memory address is without first defining in which context it has a meaning?
C was written as a portable assembly language, so I think a memory address is a number that CPU considers to be a memory address.
> I think a memory address is a number that CPU considers to be a memory address
I meant to say that, indeed, there must be some concept of CPU for a memory address to have a meaning, and for this concept of CPU to be as widely applicable as possible, surely defining it as abstract as possible is the way to go. Ergo, the idea of a C abstract machine.
Anyway, other people in this thread are discussing the matter more accurately and in more details than I could hope to do, so I'll leave it like that.
That’s currently the case in C, in that you can convert pointers to and from uintptr_t. However, not every number representable in that type needs to be valid memory (that’s true on the assembly level as well), hence it’s only defined for valid pointers.
Congratulations, you've invented an entirely new language.
Now, who's going to write the compiler for it?
No, it's C at -O0.
No, it's not.
Undefined behaviour is undefined behaviour whatever optimisation level you use.
Some -f flags may extend the C standard and remove undefined behaviour in some cases (e.g. strict aliasing, signed integer overflow, writable string constants, etc.)
Yes and no.
No, because ISO never said it must behave this way.
Yes, because every libc I've personally encountered acts this way. At a glance, glibc's x86 implementation[1, 2], musl, and picolibc all handle 0-length memcpy as you'd expect. I'm sure other folks could dig up the code for Newlib, uclibc, and others, and they'd see the same thing.
On a related note, ISO C has THREE different things that most people tend to lump together as "undefined behavior." They are:
Implementation-defined behavior: ISO doesn't require any particular behavior, but they do require implementations to consistently apply a particular behavior, and document that behavior.
Unspecified behavior: ISO doesn't require any particular behavior, but they do require implementations to consistently use a particular behavior, but they don't require that behavior to be documented.
Undefined behavior: ISO doesn't require any particular behavior, and they don't require implementations to define any particular behavior either.
[1]: https://github.com/lattera/glibc/blob/master/string/memcpy.c [2]: https://github.com/lattera/glibc/blob/895ef79e04a953cac14938...
"man bcopy" on BSD:
'If len is zero, no bytes are copied.'
Seems reasonable.
Isn't it more sensible to just check that the params that are about to be sent to memcpy are reasonable?
That is why I tend to wrap my system calls with my own internal function (which can be inlined in certain PLs), where I can standardize such tests. Otherwise, the resulting code that performs the checks and does the requisite error handling is bloated.
Note that I am also loath to #DEFINE such code because C is already rife with them and my perspective is that the less of them the better.
At the end of the day, quick and dirty fixes will prove the adage "short cuts make long delays", and OpenBSD's approach is the only really viable long-term solution, where you just have to rewrite your code if it has ill-advised constructs.
For designing libraries such as C's stdlib, I don't believe in 'undefined behavior'; clearly define your semantics and say, "If you pass a NULL to memcpy, this is what will happen." Same for what happens when (n == 0), or when (src == dst).
And if, for some strange reason, fixing the semantics breaks calling code, then I can't imagine that their code wasn't f_cked in the first place.
> internal function
every time you introduce something nonstandard, you add one little hardship to anyone trying to read or modify your code.
if a programmer is familiar with the language, its standard library, and the normal idioms, then they should be able to just jump in.
As the article points out, all major memcpy implementations already do this check inside memcpy. Sure, the caller can also check, but given that it's both redundant in practice and makes some common patterns harder to use than they would otherwise be, there's no reason to not just standardize what's already happening anyway and make everyone's lives easier in the process.
Only about 1000 more functions to do this to.
Well, that seems like something that should have been there from the beginning.
>On the one hand, UB can be important for compiler optimizations
e.g?
Generally, undefined behavior removes the need for systematically checking for special cases, the most common being out of bounds access.
But it can go further than that. Dereferencing a NULL pointer is undefined behavior, so if a pointer is dereferenced, it can be assumed by the compiler not to be NULL and the code can be optimized. For example:
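(The code blocks from this comment didn't survive the copy; a plausible shape, given the surrounding text, with `read_value` as a placeholder name:)

    int read_value(int *ptr) {
        int v = *ptr;        /* ptr is dereferenced, so the compiler may assume ptr != NULL */
        if (ptr == NULL)     /* ...and treat this branch as dead code                       */
            return -1;
        return v;            /* the whole function may become just "return *ptr;"           */
    }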
can be optimized to the same function with the NULL check and its branch removed, leaving just the dereference.

Note that static analyzers will most likely issue a warning here, as such a trivial case is most likely a mistake. But the check for NULL may be part of an inline function that is used in many places, and thanks to the undefined behavior, the code that handles the NULL case will only be generated when relevant. The problem, of course, is that it assumes that the programmer knows what he is doing and doesn't make mistakes.

In the case of memcpy(NULL, NULL, 0), there probably isn't much to gain from making it undefined. It most likely doesn't help with the memcpy implementation (len=0 is generally a no-op), and inference based on the fact that the arguments can't be NULL is more likely to screw the programmer up than to improve performance.
But how much actual performance is gained here?
It depends on your CPU microarchitectural details, on the complexity and size of your binary executable and the workload of your binary.
So there's no universal answer to your question but it could very well be "much".
It all adds up. All those instructions you don't have to execute, especially memory access and cache misses from jumps, pipeline stalls from conditionals, not just from this optimization.
Imagine that you created a function GetPixel that reads an RGB pixel at a memory address, and which has a NULL check as a precondition.
If the compiler can "prove" that the pointer is not NULL it can (after inlining the call) remove 20 million checks for a 20 megapixel image.
The silly issue is the compiler using "you accessed it before" (aka "undefined behaviour") to "prove" that the pointer is not NULL.
But I can attest that avoiding 20 million such checks does indeed make a huge difference.
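A hedged sketch of that scenario (hypothetical types and names); after inlining, proving img != NULL lets the compiler drop or hoist the per-pixel check:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint8_t r, g, b; } Pixel;

    static inline Pixel GetPixel(const Pixel *img, size_t i) {
        if (img == NULL)                  /* NULL-check precondition                      */
            return (Pixel){0, 0, 0};
        return img[i];
    }

    uint64_t sum_red(const Pixel *img, size_t n) {
        uint64_t s = 0;
        for (size_t i = 0; i < n; i++)    /* after inlining, proving img != NULL lets the */
            s += GetPixel(img, i).r;      /* compiler drop or hoist the per-pixel check   */
        return s;
    }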
If we're inlining the call, then we can hoist the NULL check out of the loop. Now it's 1 check per 20 million operations. There's no need to eliminate it or have UB at that point.
Just make a non-null-checking version, GetPixelUnsafe(), and leave the responsibility to the user to do the null check before the loop.

All of these 'problems' have simple and straightforward workarounds; I'm not convinced these UB are needed at all.
>All of these 'problems' have simple and straightforward workarounds, I'm not convinced these UB are needed at all.
He gave you a simple and straightforward example, but that example may not be representative of a real world program where complex analysis leads to better performing code.
As a programmer, it's far easier to just insert bounds checks everywhere, and trust the system to remove them when possible. This is what Rust does, and it is safe. The problem isn't the compiler, the problem is the standard. More broadly, the standard wasn't written with optimizing compilers in mind.
That's a non solution for existing code that already calls GetPixel 20 million times.
It's not like I'm saying C is the best possible way to write new code.
I'm just commenting why this matters for performance, and “remove all undefined behavior" from C compilers is a non-starter.
Now go write Rust for all I care.
The simplest example of a compiler optimization enabled by UB would be the following:
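(The code block here was lost in the copy; from the explanation that follows, which names my_function, another_function, and a local x, it was presumably along these lines:)

    extern void another_function(void);

    int my_function(void) {
        int x = 42;
        another_function();   /* never receives &x; reaching x through an out-of-bounds  */
                              /* pointer would be UB, so x need not be spilled to memory */
        return x;             /* and this can be folded to "return 42;"                  */
    }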
The compiler can optimize that to return the local's value directly, because it's UB for another_function() to use an out-of-bounds pointer to access the stack of my_function() and modify the value of x.

And the most important example of a compiler optimization enabled by UB is related to that: being UB to access local variables through out-of-bounds pointers allows the compiler to place them in registers, instead of being forced to go through the stack for every operation.
I don't find those compelling reasons and, to the contrary, I think that kind of semantic circumvention to be a symptom of a poorly developed industry.
How can we have properly functioning programs without clearly-defined, and sensible, semantics?
If the developer needs to use registers, then they should choose a dev env/PL that provides them, otherwise such kludges will crash and burn, IMO.
Are you saying that C compilers should change every local variable access to read and write to the stack just in case some function intentionally does weird pointer arithmetic to change their values without referring to them in the source code?
We stopped explicitly declaring locals with the 'register' keyword circa 40 years ago. Register allocation is a low hanging fruit and one of those things that is definitely best left to a compiler for most code.
And now they have to manage register pressure for it to keep being faster. And false dependencies. And some more. It doesn’t work like that. Developers can’t optimize like compilers do, not with modern CPUs. The compilers do the very heavy lifting in exchange for the complexity of a set of constraints they (and you as a consequence, must) rely on. The more relaxed these constraints are, the less performant code you get. Modern CPUs run modern interpreters as fast as dumbest-compiled C code basically, so if you want sensible semantics, then Typescript is one of the absolutely non-ironic answers.
We pay for the flexibility of not wearing seatbelts with increased consequences when we do crash.
You don't need UB for that.
A simple model for both compilers and programmers to understand:
"A variable whose address has not been taken need not be reachable via a random pointer".
I mean that's how an assembly programmer would think - if I put something in r0 I don't expect a store instruction to clobber it.
What you describe there is UB. If you define this in the standard, you are defining a kind of runtime behavior that can never happen in a well formed program and the compiler does not have to make a program that encounters this behavior do anything in particular.
Does this still matter today? I mean, first, registers are saved on the stack anyway when calling a function, and the caches of modern processors are really nearly as fast (if not as fast!) as a register. Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
Assuming that the program is "bug free" is, to me, a terrible idea, since even the mitigations that the programmer puts in place to limit the effect of bugs (and no program is bug free) are skipped, because the compiler can assume the program has no bugs. To me security is more important than a 1% boost in performance.
> I mean, first registers are anyway saved on the stack when calling a function
No, they aren't. For registers defined in the calling convention as "callee-saved", they don't have to be saved on the stack before calling a function (and the called function only has to save them if it actually uses that register). And for registers defined as "caller-saved", they only have to be saved if their value needs to be kept. The compiler knows all that, and tends to use caller-saved registers as scratch space (which doesn't have to be preserved), and callee-saved registers for longer-lived values.
> and caches of modern processors are really nearly as fast (if not as fast!) as a register.
No, they aren't. For instance, a quick web search tells me that the L1D cache for a modern AMD CPU has at least 4 cycles of latency. Which means: even if the value you want to read is already in the L1 cache, the processor has to wait 4 cycles before it has that value.
> Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
No, they aren't. The register file still exists, even though register renaming means which physical register corresponds to a logical register can change. And there's no VM, most common instructions are decoded directly (without going through microcode) into a single µOp or pair of µOps which is executed directly.
> To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
It's the opposite: these optimizations are more important nowadays, since memory speeds have not kept up with processor speeds, and power consumption became more relevant.
> To me security is more important than a 1% more boost in performance.
Newer programming languages agree with you, and do things like checking array bounds on every access; they rely on compiler optimizations so that the loss of performance is only that "1%".
Register allocation is one of the most basic optimizations that a compiler can do. Some modern cpus can alias stack memory with internal registers, but it is still not as fast as not spilling at all.
You can enjoy -O0 today and the compiler will happily allocate stack slots for all your variables and keep them up to date (which is useful for debugging). But the difference between -O0 and -O3 is orders of magnitude on many programs.
Many calling conventions use registers. And no, loads and stores are extremely complex and not free at all: fewer can issue in each cycle, and there's some very expensive hardware spent on maintaining ordering during execution.
This explanation of why signed int overflow is undefined is interesting (although the behaviour is still very annoying): https://kristerw.blogspot.com/2016/02/how-undefined-signed-o... (HN discussion: https://news.ycombinator.com/item?id=11146384)
More examples here: http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
In a real-world program, removing all UB is in some cases impossible without adding new breaking features to the C language. But taking a real-world program and removing all UB which IS possible to remove will introduce an overhead. In some programs this overhead is irrelevant. In others, it is probably the reason why C was picked.
If you want speed without overhead, you need to have more statically checked guarantees. This is what languages such as Rust attempt to achieve (quite successfully).
Many real world C programs have no UB.
What Rust attempts to achieve is to rule out the possibility of accidentally introducing UB, by designing the language in a way that makes it impossible to have UB when sticking to the safe subset.
It is also possible to ensure that C programs have no UB, and this does not require any breaking features in C. It usually requires some refactoring of the program.
The example in this blurb is a pretty good one: https://www.hboehm.info/c++mm/why_undef.html
> because NULL + 0 is undefined behavior in C.
Why? It's 2024. Make it not be? Sure, some older stuff already written might no longer compile and need to be updated. Put it behind a "newer" standard flag/version or whatever.
Or is it that it can't be caught at compile time and only run time... hmm...
They are making it not be. That’s the whole point of the article.