Previously, in JF's "Can we acknowledge that every real computer works this way?" series: "Signed Integers are Two’s Complement" <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...>
During an internship in 1986 I wrote C code for a machine with 10-bit bytes, the BBN C/70. It was a horrible experience, and the existence of the machine in the first place was due to a cosmic accident of the negative kind.
D made a great leap forward with the following:
1. bytes are 8 bits
2. shorts are 16 bits
3. ints are 32 bits
4. longs are 64 bits
5. arithmetic is 2's complement
6. IEEE floating point
and a big chunk of wasted time trying to abstract these away and getting it wrong anyway was saved. Millions of people cried out in relief!
Oh, and Unicode was the character set. Not EBCDIC, RADIX-50, etc.
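For contrast, here is roughly what a C++ codebase has to do today to claim the same ground; a minimal sketch of compile-time checks rather than language guarantees (though C++20 does already mandate two's complement):

```cpp
// Sketch: pinning down, as build-time checks, what D simply guarantees.
// C++ promises none of the size assumptions below; the asserts just make
// a port to an exotic platform fail loudly instead of silently.
#include <climits>
#include <limits>

static_assert(CHAR_BIT == 8, "bytes are 8 bits");
static_assert(sizeof(short) == 2, "shorts are 16 bits");
static_assert(sizeof(int) == 4, "ints are 32 bits");
static_assert(sizeof(long long) == 8, "64-bit integers (long is only 32 bits on LLP64)");
static_assert(-1 == ~0, "two's complement (guaranteed since C++20 anyway)");
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 floating point");
```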
Some people are still dealing with DSPs.
https://thephd.dev/conformance-should-mean-something-fputc-a...
Me? I just dabble with documenting an unimplemented "50% more bits per byte than the competition!" 12-bit fantasy console of my own invention - replete with inventions such as "UTF-12" - for shits and giggles.
Is C++ capable of deprecating or simplifying anything?
Honest question, haven't followed closely. rand() is broken, I'm told unfixable, and last I heard it still wasn't deprecated.
Is this proposal a test? "Can we even drop support for a solution to a problem literally nobody has?"
Hi! Thanks for the interest on my proposal. I have an updated draft based on feedback I've received so far: https://isocpp.org/files/papers/D3477R1.html
I have mixed feelings about this. On the one hand, it's obviously correct--there is no meaningful use for CHAR_BIT to be anything other than 8.
On the other hand, it seems like some sort of concession to the idea that you are entitled to some sort of just world where things make sense and can be reasoned out given your own personal, deeply oversimplified model of what's going on inside the computer. This approach can take you pretty far, but it's a garden path that goes nowhere--eventually you must admit that you know nothing and the best you can do is a formal argument that conditional on the documentation being correct you have constructed a correct program.
This is a huge intellectual leap, and in my personal experience the further you go without being forced to acknowledge it the harder it will be to make the jump.
That said, there seems to be an increasing popularity of physical electronics projects among the novice set these days... hopefully "read the damn spec sheet" will become the new "read the documentation".
This is both uncontroversial and incredibly spicy. I love it.
I'm totally fine with enforcing that int8_t == char == 8 bits; however, I'm not sure about spreading the misconception that a byte is 8 bits. A byte of 8 bits is called an octet.
At the same time, a `byte` is already an "alias" for `char` since C++17 anyway[1].
[1] https://en.cppreference.com/w/cpp/types/byte
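For precision's sake, std::byte is a distinct scoped enumeration over unsigned char rather than a true alias, which is most of its appeal: it holds raw bits without pretending to be a character or a number. A minimal sketch of the difference:

```cpp
// std::byte has the size of a char but no implicit arithmetic or
// character semantics; it is a separate type, not an alias.
#include <cstddef>      // std::byte, std::to_integer
#include <type_traits>

static_assert(sizeof(std::byte) == sizeof(char));           // always 1
static_assert(!std::is_same_v<std::byte, unsigned char>);   // but a distinct type

int main() {
    std::byte b{0x2A};
    b <<= 1;                                         // bitwise ops are provided
    return std::to_integer<int>(b) == 0x54 ? 0 : 1;  // conversions must be explicit
}
```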
Nothing to do with C++, but:
I kinda like the idea of a 6-bit-byte retro-microcomputer (with a 24-bit word). Because microcomputers typically deal with a small number of objects (and prefer arrays to pointers), it would save memory.
VGA was 6 bits per color, you can have a readable alphabet in a 6x4 bit matrix, you can stuff a basic LISP or Forth into a 6-bit alphabet, and the original System/360 only had 24-bit addresses.
What's not to love? 12MiB of memory, with independently addressable 6-bit units, should be enough for anyone. And if it's not enough, you can naturally extend FAT-12 to FAT-24 for external storage. Or you can use 48-bit pointers, which are pretty much as useful as 64-bit pointers.
There are DSP chips that have C compilers and do not have 8-bit bytes; the smallest addressable unit is 16 bits (or larger).
Less than a decade ago I worked with something like that: the TeakLite III DSP from CEVA.
I just put static_assert(CHAR_BIT == 8); in one place and move on. It hasn't fired since back when it was the #if equivalent.
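Spelled out, the whole guard is just this (the macro lives in <climits> and is named CHAR_BIT, no trailing S):

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
```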
Not sure about that, seems pretty controversial to me. Are we forgetting about the UNIVACs?
What will be the benefit?
- CHAR_BIT cannot go away; reams of code references it (see the sketch after this list).
- You still need the constant 8. It's better if it has a name.
- Neither the C nor C++ standard will be simplified if CHAR_BIT is declared to be 8. Only a few passages will change. It's just that certain possible implementations will be rendered nonconforming.
- There are specialized platforms with C compilers, such as DSP chips, that are not byte addressable machines. They are in current use; they are not museum pieces.
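On the first two points: CHAR_BIT shows up wherever a bit count is derived from an object size, so even with the value pinned at 8 the name keeps that code self-documenting. A typical sketch of the idiom:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdint>

// The pattern reams of existing code rely on: bits in a type = size * CHAR_BIT.
template <typename T>
constexpr std::size_t bits_in = sizeof(T) * CHAR_BIT;

static_assert(bits_in<std::uint32_t> == 32);
static_assert(bits_in<std::uint64_t> == 64);
```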
> We can find vestigial support, for example GCC dropped dsp16xx in 2004, and 1750a in 2002.
Honestly kind of surprised it was relevant as late as 2004. I thought the era of non-8-bit bytes was the 1970s or earlier.
JF Bastien is a legend for this, haha.
I would be amazed if there's any even remotely relevant code that deals meaningfully with CHAR_BIT != 8 these days.
(... and yes, it's about time.)
The current proposal says:
> A byte is 8 bits, which is at least large enough to contain the ordinary literal encoding of any element of the basic character set literal character set and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is bits in a byte.
But instead of the "and is composed" ending, it feels like you'd change the intro to say that "A byte is 8 contiguous bits, which is".
We can also remove the "at least", since that was there to imply a requirement on the number of bits being large enough for UTF-8.
Personally, I'd make "A byte is 8 contiguous bits." a standalone sentence, then explain as a follow-up that "A byte is large enough to contain...".
Hmm, I wonder if any modern languages can work on computers that use trits instead of bits.
https://en.wikipedia.org/wiki/Ternary_computer
While we're at it, perhaps we should also presume little-endian byte order. As much as I prefer big-endian, little-endian has won.
As consolation, big-endian will likely live on forever as the network byte order.
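For what it's worth, recent C++ gives host-vs-network order a portable spelling; a minimal sketch, assuming a C++23 compiler (std::endian is C++20, std::byteswap is C++23):

```cpp
#include <bit>       // std::endian (C++20), std::byteswap (C++23)
#include <cstdint>

// Convert a host-order 32-bit value to big-endian network order,
// swapping only when the host is little-endian.
constexpr std::uint32_t to_network_order(std::uint32_t host) {
    if constexpr (std::endian::native == std::endian::little)
        return std::byteswap(host);
    else
        return host;
}

static_assert(std::endian::native == std::endian::little ||
              std::endian::native == std::endian::big,
              "mixed-endian platforms not handled here");
```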
As a person who designed and built a hobby CPU with a sixteen-bit byte, I’m not sure how I feel about this proposal.
But how many bytes are there in a word?
So please excuse my ignorance, but is there a "logic"-related reason, other than hardware cost limitations à la "8 was cheaper than 10 for the same number of memory addresses," that bytes are 8 bits instead of 10? Genuinely curious; as a high-level dev of twenty years, I don't know why 8 was selected.
To my naive eye, it seems like moving to 10 bits per byte would both be logical and make learning the trade just a little bit easier?
I wish I knew what a 9-bit byte means.
One fun fact I found the other day: ASCII is 7 bits, but when it was used with punch cards there was an 8th bit to make sure you didn't punch the wrong number of holes. https://rabbit.eng.miami.edu/info/ascii.html
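The 8th bit described there is a parity bit; as a purely illustrative sketch (even parity picked arbitrarily here, not taken from the linked page), the computation is just a popcount:

```cpp
#include <bit>       // std::popcount (C++20)
#include <cstdint>

// Even parity for a 7-bit ASCII code: set bit 7 so the total number of
// 1-bits is even, making a single mispunched hole detectable.
constexpr std::uint8_t with_even_parity(std::uint8_t ascii7) {
    std::uint8_t data = ascii7 & 0x7F;
    int ones = std::popcount(data);
    return static_cast<std::uint8_t>(data | ((ones % 2) << 7));
}

static_assert(with_even_parity(0x41) == 0x41);  // 'A': two 1-bits, parity bit stays clear
static_assert(with_even_parity(0x43) == 0xC3);  // 'C': three 1-bits, parity bit gets set
```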
Ignoring this C++ proposal, especially because C and C++ seem like a complete nightmare when it comes to this stuff, I've almost gotten into the habit of treating a "byte" as an abstract concept. Many serial protocols define a "byte", and it might be 7, 8, 9, 11, 12, or however many bits long.
And then we lose communication with Europa Clipper.
Why? Pls no. We've been told (in school!) that a byte is a byte. It's only sometimes 8 bits long (ok, most of the time these days). Do not destroy the last bits of fun. Is network order little-endian too?
This is entertaining and probably a good idea but the justification is very abstract.
Specifically, has there ever been a C++ compiler on a system where bytes weren't 8 bits? If so, when was it last updated?
Don't Unisys' ClearPath mainframes (still commercially available, IIRC) have 36-bit words and 9-bit bytes?
OTOH, I believe C and C++ are not recommended as languages on the platform.
C++ 'programmers' demonstrating their continued brilliance at bullshitting people that they're being productive. (Had to check if the publishing date was April Fools'. It's not.) They should start a new committee next to formalize what direction electrons flow. If they do it now, they'll have it ready to bloat the next C++ standard no one reads or uses.
The fact that this isn't already done after all these years is one of the reasons why I no longer use C/C++. It takes years and years to get anything done, even the tiniest, most obvious, drama-free changes. Contrast with Go, which has had this since version 1, in 2012:
https://pkg.go.dev/builtin@go1#byte
Incredible things are happening in the C++ community.
I wish the types were all in bytes instead of bits too. u1 is unsigned 1 byte and u8 is 8 bytes.
That's probably not going to fly anymore though
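For what it's worth, nothing stops a codebase from adopting that scheme today; a sketch using the hypothetical u1/u8 names from the comment above:

```cpp
#include <cstdint>

// Width-in-bytes aliases, per the naming scheme suggested above.
using u1 = std::uint8_t;   // 1 byte
using u2 = std::uint16_t;  // 2 bytes
using u4 = std::uint32_t;  // 4 bytes
using u8 = std::uint64_t;  // 8 bytes

static_assert(sizeof(u1) == 1 && sizeof(u8) == 8);
```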
I like the diversity of hardware and strange machines. So this saddens me. But I'm in the minority I think.
Honestly, I thought this might be an Onion headline. But then I stopped to think about it.
There are FOUR bits.
Jean-Luc Picard
I'm appalled by the parochialism in these comments. Memory access sizes other than 8 bits being inconvenient doesn't make this a good idea.
Amazing stuff guys. Bravo.
This is an egotistical viewpoint, but if I want 8 bits in a byte I have plenty of choices anyway - Zig, Rust, D, you name it. Should the need for another byte width come up, whether for past or future architectures, C and C++ are my only practical choice.
Sure, it is selfish to expect C and C++ to do the dirty work while more modern languages get away with skimping on it. On the other hand, I think C++ especially is doing itself a disservice trying to become a kind of half-baked Rust.
Bold leadership
But think of ternary computers!
How many bytes is a devour?
In a char, not in a byte. Byte != char
Just mandate that everything must be run on an Intel or ARM chip and be done with it. Stop pretending anything else is viable.
formerly or formally?
Obviously