The examples aren't really undefined behavior. They are examples that could become UB based on input/circumstances. Which if you are going to be that generous, every function call is UB because it could exceed stack space. Which is basically true in any language (up to the equivalent def of UB in that language). I feel like c has enough actual rough edges that deserve attention that sensationalism like this muddies folks attention (particularly novices) and can end up doing more harm than good.
"STORAGE_ERROR This exception is raised in any of the following situations: (...) or during the execution of a subprogram call, if storage is not sufficient."
First, you can define what happens when stack space is exceeded. Second not all programs need an arbitrary amount of stack space, some only need a constant amount that can be calculated ahead of time. (And some languages don't use a stack at all in their implementations.)
Your language could also offer tools to probe how much stack space you have left, and make guarantees based on that. Or they could let you install some handlers for what to do when you run out of stack space.
How to think of this properly is that when you have UB, you are no longer under the auspices of a language standard. Things may work fine for a time, indefinitely even. But what happens instead is you unknowingly become subject to whimsies of your toolchain (swap/upgrade compilers), architecture, or runtime (libc version differences).
You end up building a foundation on quicksand. That's the danger of UB.
Tbh, already the first example (unaligned pointer access) is bogus and the C standard should be fixed (in the end the list of UB in the C standard is entirely "made up" and should be adapted to modern hardware, a lot of UB was important 30 years ago to allow optimizations on ancient CPUs, but a lot of those hardware restrictions are long gone).
In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not. On most modern CPUs unaligned load/stores are no problem at all (not even a performance penalty unless you straddle a cache line). There's no point in restricting the entire C standard because of the behaviour of a few esoteric CPUs that are stuck in the past.
The problem of UB is not really that it may crash in some architecture. The real problem is that the compiler expects UB code to NOT happen, so if you write UB code anyway the compiler (and especially the optimizer) is allowed to translate that to anything that's convenient for its happy path. And sometimes that "anything" can be really unexpected (like removing big chunks of code).
One example along this path as an example is that every function must either terminate or have a side effect. I don't think one has bitten me yet but I could completely see how you accidentally write some kind of infinite loop or recursion and the function gets deleted. Also, bonus points for tail recursion so this bug might only show up with a higher optimization level if during debug nothing hit the infinite loop.
Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).
Another commenter suggested using LLMs, but I disagree. Having clangd emit warning squiggles for unchecked operations (like signed addition) would be a good start.
> Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).
Dead code elimination is essential for performance, especially when using templates (this is basically what enables the fabled "zero cost abstraction" because complex template code may generate a lot of 'inactive' code which needs to be removed by the optimizer).
The actual issue is that the compiler is free to eliminate code paths after UB, but that's also not trivial to fix (and some optimizations are actually enabled by manually injectiong UB (like `__builtin_unreachable()` which can make a measurable difference in the right places).
Dead code elimination is run multiple times, including after other optimizations. So code that is not initially dead may become dead after propagating other information. Converting dead code into an error condition would make most generic code that is specialized for a particular context illegal.
This is trickier than it initially seems. Using preprocessor directives to include or exclude swaths of code is a very common thing, and implementing a compiler error as you described would break the building of countless C codebases.
It would already help a lot when the C and C++ standards start to clean up the list of Undefined Behaviour (e.g. there's a lot of nonsense UB currently in the C standard which could easily become Defined Behaviour - like the "file doesn't end in a new-line character" thing):
Our LLM powered coding assistance are pretty good at doing lots of busywork that doesn't require all that much smarts. So they can supervise running our UB checks, like Valgrind, and making the linters happy.
Yet, debugging memory corruption issues in C and C++ code with modern compiler toolchains and memory debugging tools is infinitely easier than 25 years ago.
(e.g. just compiling with address sanitizer and using static analyzers catch pretty much all of the 'trivial' memory corruption issues).
3.16 undefined behavior: Behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately valued objects, for which this International Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message).
Is it just me or did compiler writers apply overly legalistic interpretation to the "no requirements" part in this paragraph? The intent here is extremely clear, that undefined behavior means you're doing something not intended or specified by the language, but that the consequence of this should be somewhat bounded or as expected for the target machine. This is closer to our old school understanding of UB.
By 'bounded', this obviously ignores the security consequences of e.g. buffer overflows, but just because UB can be exploited doesn't mean it's appropriate for e.g. the compiler to exploit it too, that clearly violates the intent of this paragraph.
> Is it just me or did compiler writers apply overly legalistic interpretation to the "no requirements" part in this paragraph?
I've (fruitlessly) had this discussion on HN before - super-aggressive optimisations for diminishing rewards are the norm in modern compilers.
In old C compilers, dereferencing NULL was reliable - the code that dereferenced NULL will always be emitted. Now, dereferencing NULL is not reliable, because the compiler may remove that and the program may fail in ways not anticipated (i.e, no access is attempted to memory location 0).
The compiler authors are on the standard, and they tend to push for more cases of UB being added rather than removing what UB there is right now (for exampel, by replacing with Implementation Defined Behaviour).
Notice though "ignoring the situation" thru "documented manner characteristic of the environment". Even though truly you can read this in an uncharitable way, you could also try and understand the intent of this paragraph, and I think reading it for its intents is always the best way to interpret a language standard when the wording is ambiguous or soft, especially if you're writing a compiler.
I don't think you could sincerely argue that this definition intends to allow the compiler to totally rewrite your code because of one guaranteed UB detected on line 5, just that it would be good to print a diagnostic if it can be detected, and if not to do what's "characteristic of the environment". Does that make sense?
Bounding UB would be a nice idea, or at least prohibiting time-traveling UB (and there is an effort in that direction). But properly specifing it is actually hard.
In C / C++ there are two kinds of undefined behaviour. One is where there is written in standard what UB is. Another one is everthing else that is not in standard.
The concept of undefined behaviour is also a very useful lens for understanding LLM-based coding. Anything you don't explicitly specify is undefined behavior, so if you don't want the LLM to potentially pick a ridiculous implementation for some aspect of an application, make sure to explicitly specify how it should be implemented.
Author, if you are reading this, please cite the spec section explaining that this is UB. Dereferencing the produced pointer may be UB, but casting itself is not, since uint8_t is ~ char and char* can be cast to and from any type.
you might try to argue that uint8_t is not necessarily char, and while it is true that implementations of C can exist where CHAR_BIT > 8, but those do not have uint8_t defined (as per spec), so if you have uint8_t, then it is "unsigned char", which makes this cast perfectly safe and defined as far as i can tell. Of course CHAR_BIT is required to be >= 8, so if it is not >8, it is exactly 8. (In any case, whether uint8_t is literally a typedef of unsigned char is implementation-defined and not actually relevant to whether the cast itself is valid -- it is)
The issue is not type punning (itself a very common source of UB), but the fact that the `bytes` pointer might not be int-aligned. The spec is clear that the creation (not just the dereferencing) of an unaligned pointer is UB, see 6.3.2.3 paragraph 7 of the C11 (draft) spec.
Of course, this exchange just demonstrates the larger point, that even a world-class expert in low level programming can easily make mistakes in spotting potential UB.
C of course is ancient. It remembers the Cambrian explosion of CPU architectures, twelve-bit bytes and everything like that. I wonder if it is possible to codify some pragmatic subset of it that works nicely on currently available CPUs. Cause the author of the piece goes back in time to prove his point (SPARCs and Alphas).
That cast is valid. Spec does not guarantee same bit sequence for resulting pointer and source pointer. But as the cast is explicitly allowed, it is not UB. Compiler is free to round the pointer down. Or up. Or even sideways. All ok. Dereferencing it — indeed not ok. But the cast is explicitly allowed and not UB.
Pointer casts changing pointer bit sequences is common on weird platforms (eg: some TI DSPs, PIC, and aarch64+PAC). And it is valid as per spec. Pointer assignment is not required to be the same as memcpy-ing the pointer unto a pointer to another type.
You misunderstood the spec. No promises are made that that cast copies the pointer bit for bit (and thus creates an invalid pointer). Therefore, your objection to invalid pointers is null and void. :)
I'm not assuming anything about bit representations. In this case, the spec language is quite clear and unambiguous.
6.3.2.3 paragraph 7: A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned[footnote 68]) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
This is a subsection of section 6.3 which describes conversions, which include both implicit and conversions from a cast operation. This language is not saying anything about bit representations or derefencing.
I happen to be wearing my undefined behavior shirt at the moment, which lends me an extra layer of authority. I'm at RustWeek in Utrecht, and it's one of my favorite shirts to wear at Rust conferences. But let's say for the sake of argument that you are right and I am indeed misunderstanding the spec. Then the logical conclusion is that it's very difficult for even experienced programmers to agree on basic interpretations of what is and what isn't UB in C.
I do not see there a promise that the cast will produce an invalid pointer, nor anything prohibiting the compiler from rounding the pointer down, thus producing a valid one. “Converted” does not require bit copy. I don’t see how this interpretation is against any section of the spec.
Anyone who uses the construction "C/C++" doesn't write modern C++, and probably isn't very familiar with the recent revisions despite TFA's claims of writing it every day for decades.
Far from being just "C with classes", modern C++ is very different than C. The language is huge and complex, for sure, but nobody is forced to use all of it.
No HN comment can possibly cover all the use cases of C++ but in general, unless you have a very good reason not to:
- eschewing boomer loops in favor of ranges
- using RAII with smart pointers
- move semantics
- using STL containers instead of raw arrays
- borrowing using spans and string views
These things go a long way towards, shall we say, "safe-ish" code without UB. It is not memory-safe enforced at the language level, like Rust, but the upshot is you never need to deal with the Rust community :^)
Although some people, like Bjarne Stroustrup, object to the term C/C++, it's a bit like Richard Stallman objecting to the term "Linux". The fact is it can mean "C or C++", and I wouldn't assume the author thinks they're the same, but they're talking about both of them together in the same sentence. This seems reasonable given this is about undefined behavior, and it's trivial to accidentally write UB-inducing code in C++ even with modern style (although I'd say you should catch most trivial cases with e.g. ubsan, and a lot of bad cases would be avoided with e.g. ranges, so I think the article is exaggerating the issue).
C/C++ is a perfectly fine term for C or C-style C++. The languages can be very close, and personally I prefer C-style C++ miles over some of the half-baked modern nonsense. I mean, I do use C++23 since it has some great additions, but I'm ditching like 90% of the stuff that only adds complexity without much benefit.
I totally agree that modern c++ is pretty robust if you are both a well seasoned developer and only stick to a very blessed subset of it's features and avoid the historical baggage.
However, that's obviously not the point? Ignoring the idea that people can/should just "git gud" and write perfect code in a language with lots of old traps, you can't control how everyone else writes their code, even on your own team once it gets big enough. And there will always be junior devs stumbling into the bear traps of c/c++ (even if the rest of the codebase is all modern c++). So no matter how many great new features get added to C++, until (never) they start taking away the bad ones, the danger inherent to writing in that language doesn't go away.
Also, safe != non-UB. TFA isn't so much about memory safety anyway.
"C/C++" is still a useful term for the common C/C++ subset :)
As far as stdlib usage is concerned: that's just your opinion. The stdlib has a lot of footguns and terrible design decisions too, e.g. std::vector pulling in 20k lines of code into each compilation unit is simply bizarre.
The 5 stages of learning about UB in C:
-Denial: "I know what signed overflow does on my machine."
-Anger: "This compiler is trash! why doesn't it just do what I say!?"
-Bargaining: "I'm submitting this proposal to wg14 to fix C..."
-Depression: "Can you rely on C code for anything?"
-Acceptance: "Just dont write UB."
The examples aren't really undefined behavior. They are examples that could become UB based on input/circumstances. Which if you are going to be that generous, every function call is UB because it could exceed stack space. Which is basically true in any language (up to the equivalent def of UB in that language). I feel like c has enough actual rough edges that deserve attention that sensationalism like this muddies folks attention (particularly novices) and can end up doing more harm than good.
Ada 83 has no UB on call stack overflow, from the reference manual :
http://archive.adaic.com/standards/83lrm/html/lrm-11-01.html
"STORAGE_ERROR This exception is raised in any of the following situations: (...) or during the execution of a subprogram call, if storage is not sufficient."
That's not true at all.
First, you can define what happens when stack space is exceeded. Second not all programs need an arbitrary amount of stack space, some only need a constant amount that can be calculated ahead of time. (And some languages don't use a stack at all in their implementations.)
Your language could also offer tools to probe how much stack space you have left, and make guarantees based on that. Or they could let you install some handlers for what to do when you run out of stack space.
UB based on input can be an exploit vector.
The examples are unequivocally UB. Full stop.
How to think of this properly is that when you have UB, you are no longer under the auspices of a language standard. Things may work fine for a time, indefinitely even. But what happens instead is you unknowingly become subject to whimsies of your toolchain (swap/upgrade compilers), architecture, or runtime (libc version differences).
You end up building a foundation on quicksand. That's the danger of UB.
> The examples are unequivocally UB. Full stop.
Tbh, already the first example (unaligned pointer access) is bogus and the C standard should be fixed (in the end the list of UB in the C standard is entirely "made up" and should be adapted to modern hardware, a lot of UB was important 30 years ago to allow optimizations on ancient CPUs, but a lot of those hardware restrictions are long gone).
In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not. On most modern CPUs unaligned load/stores are no problem at all (not even a performance penalty unless you straddle a cache line). There's no point in restricting the entire C standard because of the behaviour of a few esoteric CPUs that are stuck in the past.
The problem of UB is not really that it may crash in some architecture. The real problem is that the compiler expects UB code to NOT happen, so if you write UB code anyway the compiler (and especially the optimizer) is allowed to translate that to anything that's convenient for its happy path. And sometimes that "anything" can be really unexpected (like removing big chunks of code).
Yes, a crash is about the most benign UB: at least it's highly visible.
In worse scenarios, your programme will silently continue with garbage, or format your hard disk or give attackers the key to the kingdom.
One example along this path as an example is that every function must either terminate or have a side effect. I don't think one has bitten me yet but I could completely see how you accidentally write some kind of infinite loop or recursion and the function gets deleted. Also, bonus points for tail recursion so this bug might only show up with a higher optimization level if during debug nothing hit the infinite loop.
Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).
Another commenter suggested using LLMs, but I disagree. Having clangd emit warning squiggles for unchecked operations (like signed addition) would be a good start.
> Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).
Dead code elimination is essential for performance, especially when using templates (this is basically what enables the fabled "zero cost abstraction" because complex template code may generate a lot of 'inactive' code which needs to be removed by the optimizer).
The actual issue is that the compiler is free to eliminate code paths after UB, but that's also not trivial to fix (and some optimizations are actually enabled by manually injectiong UB (like `__builtin_unreachable()` which can make a measurable difference in the right places).
Dead code elimination is run multiple times, including after other optimizations. So code that is not initially dead may become dead after propagating other information. Converting dead code into an error condition would make most generic code that is specialized for a particular context illegal.
This is trickier than it initially seems. Using preprocessor directives to include or exclude swaths of code is a very common thing, and implementing a compiler error as you described would break the building of countless C codebases.
As much as I agree with the intro, these examples aren't good and the overall article is just a veil for pushing LLM coding.
Not good how? Are they TRUE? If so that's super bad.
most languages don't even HAVE a specification so in most languages literally EVERYTHING everything is undefined behavior
> A problem with this is that in order to confirm the findings, you’ll need an expert human. But generally expert humans are busy doing other things.
The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.
LLM generated code will eventually contain UB.
EDIT: added "eventually"
It would already help a lot when the C and C++ standards start to clean up the list of Undefined Behaviour (e.g. there's a lot of nonsense UB currently in the C standard which could easily become Defined Behaviour - like the "file doesn't end in a new-line character" thing):
https://gist.github.com/Earnestly/7c903f481ff9d29a3dd1
Our LLM powered coding assistance are pretty good at doing lots of busywork that doesn't require all that much smarts. So they can supervise running our UB checks, like Valgrind, and making the linters happy.
> LLM generated code will eventually contain UB.
Yes.
Even in languages other than C (i.e. you will get behaviour that nothing in the input specified).
When LLMs generate code, all languages have UB.
That's a bit silly.
UB means literally no restrictions. So if you standard says 'you have to crash with an error message' that's already no longer UB.
Very bad advice. Of course good new LLM's know about UB, but you still need to use ubsan (ie - fsanitize=undefined), and not your LLM.
A fun one that'd fit list be sequence point violations like
Fun, sure, but also GCC and Clang will both warn with -Wall (-Wsequence-point / -Wunsequenced).
Debugging in C is soooo hard. When I was writing Malloc Lab in system course, there were uncountable undefined and out of range :(
Yet, debugging memory corruption issues in C and C++ code with modern compiler toolchains and memory debugging tools is infinitely easier than 25 years ago.
(e.g. just compiling with address sanitizer and using static analyzers catch pretty much all of the 'trivial' memory corruption issues).
From the ANSI C standard:
Is it just me or did compiler writers apply overly legalistic interpretation to the "no requirements" part in this paragraph? The intent here is extremely clear, that undefined behavior means you're doing something not intended or specified by the language, but that the consequence of this should be somewhat bounded or as expected for the target machine. This is closer to our old school understanding of UB.By 'bounded', this obviously ignores the security consequences of e.g. buffer overflows, but just because UB can be exploited doesn't mean it's appropriate for e.g. the compiler to exploit it too, that clearly violates the intent of this paragraph.
> Is it just me or did compiler writers apply overly legalistic interpretation to the "no requirements" part in this paragraph?
I've (fruitlessly) had this discussion on HN before - super-aggressive optimisations for diminishing rewards are the norm in modern compilers.
In old C compilers, dereferencing NULL was reliable - the code that dereferenced NULL will always be emitted. Now, dereferencing NULL is not reliable, because the compiler may remove that and the program may fail in ways not anticipated (i.e, no access is attempted to memory location 0).
The compiler authors are on the standard, and they tend to push for more cases of UB being added rather than removing what UB there is right now (for exampel, by replacing with Implementation Defined Behaviour).
> but that the consequence of this should be somewhat bounded or as expected for the target machine.
Aren't "unpredictable results" and "no requirements" contrary to the idea that the behavior would be "somewhat bounded"?
Notice though "ignoring the situation" thru "documented manner characteristic of the environment". Even though truly you can read this in an uncharitable way, you could also try and understand the intent of this paragraph, and I think reading it for its intents is always the best way to interpret a language standard when the wording is ambiguous or soft, especially if you're writing a compiler.
I don't think you could sincerely argue that this definition intends to allow the compiler to totally rewrite your code because of one guaranteed UB detected on line 5, just that it would be good to print a diagnostic if it can be detected, and if not to do what's "characteristic of the environment". Does that make sense?
Reading for intent is pragmatic.
Reading adversarially is what people do who are looking for ways that something can be abused, from an offensive or defensive position.
Personally I am tired of the entire topic.
What's bad is when your compiler writers and most of the people involved in standardisation are reading it adversarially.
Ex falso quodlibet.
Bounding UB would be a nice idea, or at least prohibiting time-traveling UB (and there is an effort in that direction). But properly specifing it is actually hard.
In C / C++ there are two kinds of undefined behaviour. One is where there is written in standard what UB is. Another one is everthing else that is not in standard.
https://en.wikipedia.org/wiki/There_are_unknown_unknowns
Technically, that's only one kind, because it's written in the standard that anything not mentioned in the standard is undefined behavior.
One kind, but two different classes of undefined behaviour.
Use Rust!
We know. This is not news.
It seems to be to many many programmers who keep using C++
Hello, it's me. I'm not afraid of UB.
To be honest, miscompilations because of UB is exceedingly rare, and we do a lot of weird shit in our code.
Wait until he discovers PowerShell ;D
When use C ,keep using char* not mess with int*
UB can also have impact in logical cohesion of codebase.
The concept of undefined behaviour is also a very useful lens for understanding LLM-based coding. Anything you don't explicitly specify is undefined behavior, so if you don't want the LLM to potentially pick a ridiculous implementation for some aspect of an application, make sure to explicitly specify how it should be implemented.
I stoped reading about here:
Author, if you are reading this, please cite the spec section explaining that this is UB. Dereferencing the produced pointer may be UB, but casting itself is not, since uint8_t is ~ char and char* can be cast to and from any type.you might try to argue that uint8_t is not necessarily char, and while it is true that implementations of C can exist where CHAR_BIT > 8, but those do not have uint8_t defined (as per spec), so if you have uint8_t, then it is "unsigned char", which makes this cast perfectly safe and defined as far as i can tell. Of course CHAR_BIT is required to be >= 8, so if it is not >8, it is exactly 8. (In any case, whether uint8_t is literally a typedef of unsigned char is implementation-defined and not actually relevant to whether the cast itself is valid -- it is)
The issue is not type punning (itself a very common source of UB), but the fact that the `bytes` pointer might not be int-aligned. The spec is clear that the creation (not just the dereferencing) of an unaligned pointer is UB, see 6.3.2.3 paragraph 7 of the C11 (draft) spec.
Of course, this exchange just demonstrates the larger point, that even a world-class expert in low level programming can easily make mistakes in spotting potential UB.
C of course is ancient. It remembers the Cambrian explosion of CPU architectures, twelve-bit bytes and everything like that. I wonder if it is possible to codify some pragmatic subset of it that works nicely on currently available CPUs. Cause the author of the piece goes back in time to prove his point (SPARCs and Alphas).
Fun story: even the latest C spec doesn’t require CHAR_BIT == 8, but it does now codify 2s complement int representation. (IIRC)
For unsigned ints, or also for signed ints?
For signed. Unsigned overflow was defined for a while now.
That cast is valid. Spec does not guarantee same bit sequence for resulting pointer and source pointer. But as the cast is explicitly allowed, it is not UB. Compiler is free to round the pointer down. Or up. Or even sideways. All ok. Dereferencing it — indeed not ok. But the cast is explicitly allowed and not UB.
Pointer casts changing pointer bit sequences is common on weird platforms (eg: some TI DSPs, PIC, and aarch64+PAC). And it is valid as per spec. Pointer assignment is not required to be the same as memcpy-ing the pointer unto a pointer to another type.
You misunderstood the spec. No promises are made that that cast copies the pointer bit for bit (and thus creates an invalid pointer). Therefore, your objection to invalid pointers is null and void. :)
I'm not assuming anything about bit representations. In this case, the spec language is quite clear and unambiguous.
6.3.2.3 paragraph 7: A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned[footnote 68]) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
This is a subsection of section 6.3 which describes conversions, which include both implicit and conversions from a cast operation. This language is not saying anything about bit representations or derefencing.
I happen to be wearing my undefined behavior shirt at the moment, which lends me an extra layer of authority. I'm at RustWeek in Utrecht, and it's one of my favorite shirts to wear at Rust conferences. But let's say for the sake of argument that you are right and I am indeed misunderstanding the spec. Then the logical conclusion is that it's very difficult for even experienced programmers to agree on basic interpretations of what is and what isn't UB in C.
I do not see there a promise that the cast will produce an invalid pointer, nor anything prohibiting the compiler from rounding the pointer down, thus producing a valid one. “Converted” does not require bit copy. I don’t see how this interpretation is against any section of the spec.
Byte and int has different alignment requirements. It is UB the moment you make such a ptr.
Great way to demonstrate the point of the article.
That better be marked "historical". At least, Lemire says:
On recent Intel and 64-bit ARM processors, data alignment does not make processing a lot faster. It is a micro-optimization. Data alignment for speed is a myth. // https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...
(while in the olden days, a program may crash on unaligned access, esp on RISC)
Don't mix up what processors do with what the C standard allows you to get away with.
Without memcpy there is no guarantee that that line produces an invalid pointer
I don’t see what spec part would prohibit that cast from validly compiling to
Spec only guaranteed round-trip through char* of properly aligned for type pointers. This doesn’t break that.Ok, and?
"Rewrite everything in Rust. OMG universe is written in Rust so memory safe with zero allocations"
Yet another push to use LLMs after casting fear. Now it should be illegal not to use LLMs. A good start of the day.
(I hope casting fear is not UB)
> (I hope casting fear is not UB)
I'm sure that's UB in C
In C++ just use <reinterpret_cast>
The irony is unmistakable.
There is nothing ironic in letting an llm have a pass at identifying potential UB and other correctness issues in C code.
I say this as an experienced C developer.
Anyone who uses the construction "C/C++" doesn't write modern C++, and probably isn't very familiar with the recent revisions despite TFA's claims of writing it every day for decades.
Far from being just "C with classes", modern C++ is very different than C. The language is huge and complex, for sure, but nobody is forced to use all of it.
No HN comment can possibly cover all the use cases of C++ but in general, unless you have a very good reason not to:
- eschewing boomer loops in favor of ranges
- using RAII with smart pointers
- move semantics
- using STL containers instead of raw arrays
- borrowing using spans and string views
These things go a long way towards, shall we say, "safe-ish" code without UB. It is not memory-safe enforced at the language level, like Rust, but the upshot is you never need to deal with the Rust community :^)
Although some people, like Bjarne Stroustrup, object to the term C/C++, it's a bit like Richard Stallman objecting to the term "Linux". The fact is it can mean "C or C++", and I wouldn't assume the author thinks they're the same, but they're talking about both of them together in the same sentence. This seems reasonable given this is about undefined behavior, and it's trivial to accidentally write UB-inducing code in C++ even with modern style (although I'd say you should catch most trivial cases with e.g. ubsan, and a lot of bad cases would be avoided with e.g. ranges, so I think the article is exaggerating the issue).
Well, the author explicitly refers to "C/C++" as one language:
>After all, C/C++ is not a memory safe language.
C/C++ is a perfectly fine term for C or C-style C++. The languages can be very close, and personally I prefer C-style C++ miles over some of the half-baked modern nonsense. I mean, I do use C++23 since it has some great additions, but I'm ditching like 90% of the stuff that only adds complexity without much benefit.
> the upshot is you never need to deal with the Rust community
In the end, everything comes down to culture war.
Perhaps we should rewrite our culture in Rust.
I totally agree that modern c++ is pretty robust if you are both a well seasoned developer and only stick to a very blessed subset of it's features and avoid the historical baggage.
However, that's obviously not the point? Ignoring the idea that people can/should just "git gud" and write perfect code in a language with lots of old traps, you can't control how everyone else writes their code, even on your own team once it gets big enough. And there will always be junior devs stumbling into the bear traps of c/c++ (even if the rest of the codebase is all modern c++). So no matter how many great new features get added to C++, until (never) they start taking away the bad ones, the danger inherent to writing in that language doesn't go away.
Also, safe != non-UB. TFA isn't so much about memory safety anyway.
"C/C++" is still a useful term for the common C/C++ subset :)
As far as stdlib usage is concerned: that's just your opinion. The stdlib has a lot of footguns and terrible design decisions too, e.g. std::vector pulling in 20k lines of code into each compilation unit is simply bizarre.
Also:
- eschewing boomer loops in favor of ranges
Those "boomer loops" compile infinitely faster than the new ranges stuff (and they are arguably more readable too): https://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/
- borrowing using spans and string views
Those are just as unsafe as raw pointers. It's not really "borrowing" when the referenced data can disappear while the "borrow" is active.