You pretty much need to use markup (or control codes) for rich text. Take bold, italic, underline, strikeout: those four can be, and are, used in nearly any combination. You would need one bit for each of them. You would need two bits to specify four levels of headings. If you don't allow for that, you are back to using markup. You would also need one bit to specify a proportional or fixed-width font, because that is a thing too. The one remaining bit would have to be used for superscript, since superscripts are commonly used for footnotes and simple mathematical expressions.
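To make that bit budget concrete, here is a minimal Python sketch of one way the eight style bits could be packed next to a 24-bit code point. The flag names, the bit order, putting the style byte in the high byte, and reading the heading field as a value from 0 to 3 are all my own assumptions for illustration; nothing in the proposal fixes them.

```python
from enum import IntFlag


class Style(IntFlag):
    """Six one-bit attributes; the bit positions are an arbitrary choice."""
    BOLD = 0x01
    ITALIC = 0x02
    UNDERLINE = 0x04
    STRIKEOUT = 0x08
    FIXED_WIDTH = 0x10
    SUPERSCRIPT = 0x20


HEADING_SHIFT = 6   # the top two bits of the style byte hold the heading field
FLAG_MASK = 0x3F    # the low six bits hold the Style flags


def pack(codepoint: int, style: Style = Style(0), heading: int = 0) -> int:
    """Pack a code point, its flags and a heading value (0..3) into 32 bits."""
    if not 0 <= codepoint < (1 << 24):
        raise ValueError("code point must fit in 24 bits")
    if not 0 <= heading <= 3:
        raise ValueError("heading must fit in 2 bits")
    style_byte = int(style) | (heading << HEADING_SHIFT)
    return (style_byte << 24) | codepoint


def unpack(cell: int) -> tuple[int, Style, int]:
    """Split a 32-bit cell back into (code point, flags, heading value)."""
    style_byte = (cell >> 24) & 0xFF
    return cell & 0xFFFFFF, Style(style_byte & FLAG_MASK), style_byte >> HEADING_SHIFT


cell = pack(ord("x"), Style.BOLD | Style.ITALIC, heading=2)
print(hex(cell), unpack(cell))
```

Because each attribute gets its own bit, bold plus italic is just the two flags OR-ed together, and all eight bits are spent once the heading field is included.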
Okay, you can now create passable rich text documents for a limited (though common) range of purposes with that 8/24-bit breakdown that was suggested. But you may have noticed the author mentioned subscripts, which weren't in my list. Well, it turns out that subscript and superscript have a terribly limited range of applications if you are specifying them per character: x^2^2 would be visually identical to x^22, and x^a_b would look different from x_b^a (with both presentations being nonsensical). The use of subscripts and superscripts in any technical applications would be severely limited. You need a much richer markup language to be truly expressive. So there really isn't much of a point in offering subscripts. Superscripts, sure, because they have a few non-technical uses.
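The x^2^2 versus x^22 point can be shown with a small Python sketch. The `Sup` node and the `flatten` helper are hypothetical names of mine: they model a nested expression and what is left of it once every character carries only a single superscript flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sup:
    """A superscript group; children are strings or further Sup groups."""
    children: tuple


def flatten(nodes, raised=False):
    """Reduce a nested expression to (character, is_superscript) pairs."""
    cells = []
    for node in nodes:
        if isinstance(node, Sup):
            # Nesting depth is lost here: one bit can only say "raised".
            cells.extend(flatten(node.children, raised=True))
        else:
            cells.extend((ch, raised) for ch in node)
    return cells


nested = ("x", Sup(("2", Sup(("2",)))))  # x^(2^2)
flat = ("x", Sup(("22",)))               # x^22

print(flatten(nested))
print(flatten(nested) == flatten(flat))
```

Both expressions flatten to the same three cells, so a renderer working from per-character flags has no way to tell them apart.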
Yet the reality is that people want a much richer set of formatting options. At a minimum, they want to select fonts and font sizes. Some of the formatting options have semantics. I know I crammed four levels of headings in those eight bits, but that only makes sense in headings. It doesn't make sense to specify it per character. Then there are other common document elements, like tables. You can create decent tables using monospaced fonts, but that is limiting and would produce undesirable results in some cases (try displaying April 5^th sensibly, using a monospace font so that it won't affect the width of the columns). On top of that, you are ditching the concept of styles because that implies some sort of markup.
Also, different languages have different formatting varieties. 256 combinations don't seem like nearly enough.
Note that is 256 combinations. If you want both bold and italics, either it's one of the 256 combinations, separate from the bold-only combination and from the italics-only combination, or you need another 8 bits for each option.
I think HN made a very aesthetically pleasing decision to exclude bold and underline. Imagine the appearance of comment pages if those were options.
Hah, I was about to criticise the text for far too lightly conflating markup and punctuation, only to see the afterword.
I actually do think the author has a point, in that most solutions today are inelegant. I also don't think this is a problem which has a really elegant solution. Where to draw the line? Why not encode fonts into the standard too, if we're doing bold? Etc.
I'm still mostly in favour of keeping everything markdown (in my own writing), however much it pollutes the "purity" of text.
This person is confused. He's citing a Ted Nelson paper about separating these things into layers (content, structure, & special effects), while personally advocating that we mash it all into unicode.
https://www.xml.com/pub/a/w3j/s3.nelson.html
The lack of a universally recognized rich-text format is really a problem. Why? Practically any rich text that needs to be rendered across platforms (web and mobile devices) is now being stored as HTML, Markdown, or app-dependent JSON.
HTML was never envisioned as a cross-platform rich-text format, and Markdown lacks almost half of all formatting features. Specialized JSON is even more evil, because the content becomes unrenderable when the parent app goes out of existence.
OP's suggestion (accommodating formatting as Unicode bytes) might not be optimal; however, I'm happy at least somebody thought of this as a problem to solve.
The odd thing is, you can do quite a bit of bold/italics/superscript in Unicode nowadays. Because, at least from the ASCII letter range, they have been used in symbolic ways in Mathematics, etc., and have been added to Unicode as symbols rather than bold variants of letters. For example:
𝐇𝐞𝐥𝐥𝐨, 𝐖𝐨𝐫𝐥𝐝!
𝐻𝑒𝑙𝑙𝑜, 𝑊𝑜𝑟𝑙𝑑!
ᴴᵉˡˡᵒ, ᵂᵒʳˡᵈ!
So, there's almost no bold/italic punctuation. And non-ASCII Unicode letters aren't "supported" this way either. But you can get quite far with "formatted" ASCII letters in Unicode, if you're so inclined.
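As a rough illustration of how far that goes, here is a short Python sketch that maps ASCII letters and digits onto the bold forms in the Mathematical Alphanumeric Symbols block. The function name `to_math_bold` is my own; the block offsets are the standard ones (bold capital A at U+1D400, bold small a at U+1D41A, bold digit zero at U+1D7CE). Everything else, including punctuation and non-ASCII letters, passes through unchanged because no bold counterpart exists for it.

```python
BOLD_CAPITAL_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A    # MATHEMATICAL BOLD SMALL A
BOLD_DIGIT_ZERO = 0x1D7CE  # MATHEMATICAL BOLD DIGIT ZERO


def to_math_bold(text: str) -> str:
    """Replace ASCII letters and digits with their mathematical bold forms."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(BOLD_DIGIT_ZERO + ord(ch) - ord("0")))
        else:
            # Punctuation, spaces and non-ASCII letters have no bold form.
            out.append(ch)
    return "".join(out)


print(to_math_bold("Hello, World!"))
```

The italic range needs more care than a plain offset, since the italic small h is not in that block but at U+210E (the Planck constant), so I left it out of the sketch.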
People are limited by their tools.
The author believes that plain text should encode bold, italic, etc., because that's all they had exposure to. Were the text written today, they would claim emojis belong in unicode as well.
Most social media don't support it, but on Tumblr, for example, you can specify the color of the text and even choose a different font. I think there was some other social media that allowed you to have animated effects on the text as well, but I forgot the name.
Sad what things like Markdown have done to people. It's like they forgot about all the amazing semantic markup of HTML 5 to create strong relations between their data. I'll take a Lexical editor with SQLite to store my data any day.
I like the idea of keeping the presentation out of the content, but keeping it in the character encoding. It's a cool idea. Never thought of it before reading this.