What's New in POSIX 2024

(blog.toast.cafe)

234 points | by signa11 5 days ago

238 comments

  • enriquto 4 days ago

    > We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now? Ok, that might be a bit much all at once. We’re heading there though!

    Oh my god. This makes me so happy. This is the most lovely thing I've read in the world of computing since the unix gods decided that newlines were to be a single character.

    The philosophy underlying the sentence "Wouldn’t it be nice if the naive scripts were just correct now?" is incredibly positive. We are surrounded by arrogant jerks who break old code by aggressively enforcing stricter compliance with some stupid rules. But here come these POSIX heroes who do the exact opposite: make old code correct! There is hope in mankind after all.

    • nneonneo 4 days ago

      Rather unfortunately, I happen to have a handful of files on my machine with newlines in them (the filenames were programmatically generated from a summary of their contents). I loathe the possibility that my shell tools are going to suddenly crash when confronted with these weird files, rather than just producing some slightly silly output. I wish we'd standardized the behaviour of just escaping such characters as `\n/\r` or `^J/^M`...

      • jraph 4 days ago

        They did the right thing for this: make the tools fail on file creation, but not on existing files.

        I guess it's still advisable to rename those files, I don't know how things like cp, mv or rsync will behave when copying such files in the future.

        • nneonneo 4 days ago

          No, they did not do the right thing:

          > the following utilities are now either encouraged to error out if they are to create a filename that contains a newline, and/or encouraged to error out if they are *about to print a pathname that contains a newline* in a context where newlines may be used as a separator

          It then proceeds to list a bunch of utilities including diff, file, find, grep, head, du, etc., none of which create files directly.

          These utilities could be updated to reject newlines in file paths if they're going to print in a "newline delimited" form - but for some of these utilities, that's the only available form.

          • jraph 4 days ago

            > error out if they are about to print a pathname that contains a newline in a context where newlines may be used as a separator

            But that's already broken. This is a situation where filenames with newlines in them are indistinguishable from two filenames in outputs. So instead of producing subtly broken output, tools are encouraged (not forced) to explicitly fail with a lot of noise.

            The "in a context where newlines may be used as a separator" part of this sentence is very important.

            IIUC the tools are still allowed to succeed in non-broken situations, for instance when a null separator is used and not a newline character. And I can't imagine the tools you listed will start breaking in situations that worked (apart from file creation - indeed this will likely start breaking, and newline characters in filenames need to be considered deprecated, and things using them need to be fixed).

            This is strictly better IMHO (if one thinks that newlines in filenames are not worth the trouble given how things work in POSIX, especially the part where things are line-based and newline characters have quite some significance)
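
            A minimal sketch of why the separator matters, in Python. It only models a tool's output as a string; the names and helper functions are made up for illustration and are not any real utility's behaviour.

```python
# Hypothetical names throughout; this models a tool's output, not any real utility.
NAMES = ["plain.txt", "two\nlines.txt"]


def encode_listing(names, separator):
    """Emit each name followed by the separator, like a line-oriented tool would."""
    return "".join(name + separator for name in names)


def decode_listing(text, separator):
    """Split on the separator and drop the empty piece after the final one."""
    return text.split(separator)[:-1]


via_newline = decode_listing(encode_listing(NAMES, "\n"), "\n")
via_nul = decode_listing(encode_listing(NAMES, "\0"), "\0")

print(via_newline)  # three entries: the second name was cut in two
print(via_nul)      # two entries, identical to NAMES
```

            With the newline separator the consumer cannot tell one odd name from two ordinary ones; with NUL the round trip is lossless, because NUL cannot appear in a pathname.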

        • ykonstant 4 days ago

          If your file system allows them, be careful with symlinks though!

          • jraph 4 days ago

            Why, specifically?

            I'm convinced we will need to be careful with symbolic links related to newline characters in filenames, but I'm curious about which specific aspect you had in mind.

            • ykonstant 4 days ago

              Oh, nothing specific to newlines. Just, when you rename files to fix newlines, you need to check if they break symlinks pointing to them.

              For instance, I had project folders for my individual research projects. In order to have a central repository of resources and not have copies of multi-megabyte pdfs in each folder, I put all referenced papers in a single directory and symlinked them for each project that needed them. Later, I wanted to rename the papers to remove newlines. The symlinks complicated this process quite a bit!

              • jraph 3 days ago

                Ah, right, indeed :-)

      • ykonstant 4 days ago

        In academia, I get (and used to create) pdfs with names like:

        "On the number of

        associative foobars

        of degree blah -

        Johnson and Anderson.pdf"

        all the time. It is very convenient for non-technical academics to have a descriptive file name, and to be able to see it entirely in the navigator they use newlines.

        • oneeyedpigeon 4 days ago

          Oh god. I already get upset enough by spaces in a file name, although I realise that fight is basically lost now!

          • curt15 4 days ago

            Didn't Windows name "Program Files" with a space to force application developers to handle spaces in paths properly?

            • mmcdermott 4 days ago

              For the longest time you could get away with this in cmd:

              > dir c:\progra~1

              So if forcing people to handle spaces was the goal, it took a long time to force it.

              • arp242 4 days ago

                I'm pretty sure that still works. I forgot the exact scenario, but my Windows CI on GitHub Actions output shorte~1 pathna~1 like that in a script just a few months ago. On one hand, the backwa~1 compati~1 is nice. On the other hand, there's just so much depreca~1 cruft that keeps popping up even on contemp~1 systems.

            • pjmlp 4 days ago

              In theory yes, in practice to this day many people don't bother to learn how to deal with pathnames in a proper way.

              • inetknght 4 days ago

                Top difficulties in computer science:

                1. naming things

                2. cache coherency

                3. off-by-one errors

                ???

                4. quoting pathnames

                • hawski 4 days ago

                  I would replace 4 with parameter expansion rules.

                • nneonneo 4 days ago

                  Eh, maybe. In practice I usually do all my moderately-heavy filesystem scripting in Python these days, for which pathname quoting is just a complete non-issue. Of course, I still use a shell for quick-and-dirty stuff, but usually only for pretty simple tasks where the simplest quoting setup ("$i") suffices.

            • ape4 4 days ago

              Not to mention C:\Program Files (x86)

              • account42 4 days ago

                And C:\Programme and other localized variants to force people to go through the proper APIs instead of hardcoding paths.

            • InfiniteRand 4 days ago

              I just got used to installing things I need to interact with in a program into a folder named C:\workspace

          • enriquto 4 days ago

            As a fellow spaces-in-filenames-hater, the fight is not lost. We are on the brink of winning it; it's just a mount option away!

            While we cannot avoid that people hit the spacebar when writing a filename on a gui, this does not mean at all that the resulting filename itself need contain a plain space character. Those spaces can and should be transparently translated to non-breaking space characters at some point. Maybe by the gui itself, or more robustly by the filesystem. This would make everybody happy: gui users and naive shell script writers.

            • poincaredisk 4 days ago

              >Those spaces can and should be transparently translated to non-breaking space characters at some point

              Why? This just introduces more complexity and interoperability headaches for seemingly no reason.

              • enriquto 4 days ago

                > Why?

                In order to preserve the sacrosanct simplicity of naive shell scripts. Seems like a very noble goal to me.

                The only unexpected complexity arises when you want to deal with filenames having mixed spaces and nbsps. But I'd say that people who do that had it coming.

                • alexvitkov 4 days ago

                  If you want simple shell scripts to work, make an actually good shell language without all the footguns.

                  The filesystem is way more important than /bin/sh and any complexity added there will trickle down to all programs, not just shell scripts.

                  It's not worth adding hacks on the FS to patch defects in poorly written shell scripts (which are being replaced en masse with python/nodejs/even weirder yaml files/systemd units/etc... anyways)

                  • consteval 4 days ago

                    Whitespace in filenames in general is difficult to deal with. Many, maybe most, programs get it wrong. It's not just about shell scripts, many GUI programs fail to handle those files properly too.

                    • alexvitkov 4 days ago

                      When GUI programs mishandle filenames with spaces, IME it's usually because they spawn a subshell in a naive way (system("rm " + filename)).

                      To mishandle spaces you have to split an input w/ filenames by whitespace, which is not that common of an operation outside of a shell.
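
                        A rough sketch of that failure mode in Python. `shlex.split()` only approximates how a POSIX shell would tokenize the string, and the file name is hypothetical; nothing is actually deleted here.

```python
import shlex

filename = "my report.txt"  # hypothetical file name with a space

# Naive: paste the name into a command string and let the shell tokenize it.
# shlex.split() approximates POSIX shell word splitting for this simple case.
naive_argv = shlex.split("rm " + filename)

# Safer: quote the name before it goes into a shell string...
quoted_argv = shlex.split("rm " + shlex.quote(filename))

# ...or skip the shell entirely and pass an argument list,
# e.g. subprocess.run(list_argv) instead of os.system(command_string).
list_argv = ["rm", filename]

print(naive_argv)   # the name is split into two arguments
print(quoted_argv)  # the name survives as one argument
```

                        The list form is usually the better choice, since there is no string for anything to split in the first place.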

                      • InfiniteRand 4 days ago

                        My favorite file+space issue is spaces at the end of file names, especially when you copy and paste text, or text gets trimmed from an input box, or the person forgets to trim space from an input box...

                    • nradov 4 days ago

                      The vast majority of Windows and MacOS programs get it right.

                      • consteval 3 days ago

                        No, no they don't - you just don't notice when they get it wrong, and you also don't name your files stupid things (I imagine).

                        If you actually test this, you'll realize a ton of Windows programs get it wrong.

                        Also, in general this is a poor argument. The goal of Linux isn't to be as much like Windows as possible, because Windows sucks ass. Nobody in their right mind would use Linux if it was just Windows but, presumably, shittier. The entire appeal of Linux is that it isn't Windows, and it isn't MacOS.

                    • arp242 4 days ago

                      Eh? It's really not a bother in pretty much any programming language, and you don't really need to do anything special for it. I don't know any program that has any problems with it.

                      Even zsh has fixed this. It's just /bin/sh and bash that are annoying.

                  • ksp-atlas 4 days ago

                    nushell uses real lists for things which means you don't need to care about separators except when dealing with external system things

                • lifthrasiir 4 days ago

                  Simplicity doesn't always mean stupidity. A simple but functional shell that correctly handles whitespace without much hassle has been available since the 90s, namely rc, which is also found in Plan 9. Adopting rc's string concatenator `^` in POSIXy shells shouldn't be too hard.

            • chasil 4 days ago

              It would be really nice if there was a mount option that would quietly remove spaces in filenames, or convert them to an underscore.

              If I had it, I would use it today.

            • vbezhenar 4 days ago

              Yep, works today:

                  sh-3.2$ f='Hello world'
                  sh-3.2$ echo $f
                  Hello world
                  sh-3.2$ for i in $f; do echo $i; done
                  Hello
                  world
              
                  sh-3.2$ f=$(printf 'Hello\302\240world')
                  sh-3.2$ echo $f
                  Hello world
                  sh-3.2$ for i in $f; do echo $i; done
                  Hello world
              • stouset 4 days ago

                Just always quote variable interpolation and you will never have problems.

                    sh-3.2$ f='Hello world'
                    sh-3.2$ echo "${f}"
                    Hello world
                    sh-3.2$ for i in "${f}"; do echo "${i}"; done
                    Hello world
                    sh-3.2$
          • jonhohle 4 days ago

            Convince (force?) your team to use make and soon everyone will forget spaces in file names are even a thing!

            • taneliv 4 days ago

              My team already uses `make` but there's no reason for me to run it in my Downloads folders. File names in there are sometimes wild. Yet I expect command line tools to work with them. If they will cease to do so, I will have to start using non-POSIX variants of those tools, I guess.

        • jodrellblank 4 days ago

          I don't know who "the Austin Group" mentioned in the article are, but how come they "could not find a single use-case for newlines in pathnames besides breaking naive scripts" when legitimate use-cases are so easy to find?

          (And if they're that incompetent, why does the article imply they are worth quoting and listening to?)

          • gpderetta 4 days ago

            It is [1] the joint working group that for the last 25+ years has been responsible for both the POSIX standard and the Single Unix Specification. It emerged after the UNIX wars as a consolidation of the various splintered UNIX standardization efforts (POSIX itself, X/OPEN, OSF, etc).

            [1] https://en.wikipedia.org/wiki/Austin_Group

          • nilamo 4 days ago

            Is that legitimate? A path name is just a unique identifier for a file, IMO it doesn't make sense to put a whole novel in there. If anything, a giant summary like that should be in the meta tags?

            • jodrellblank 4 days ago

              In what way is it not legitimate? It's not an accident, bug or data corruption. Someone put it there for a reason, and it benefits their use case. That's as legitimate as it gets.

            • ykonstant 4 days ago

              That's a core part of the problem: a path name is NOT just a unique identifier for a file. Desktop operating systems and their classical utilities conflate the "unique identifier" and the "displayed title" of a file through which the end user interacts with the file.

              Users care about "titles" or "summaries" of files, not "filesystem identifiers"; as long as the two are conflated, non-technical users will use the identifier to write titles and thus make the file easy to locate in an interactive GUI. Meta tags are not even in the cognitive horizon of most people.

          • shthed 2 days ago

            Name one legit use case.

            • jodrellblank 2 days ago

              ... the use case in the parent comment I was replying to.

              And no I'm not going to copy that here for you to quip "that's not a legitimate use case". Make an effort to make a point and support it with better justification than "because I said so".

        • Flimm 4 days ago

          How do these non-technical academics even create a PDF file with a name like that?

          • ykonstant 4 days ago

            Right click, rename, enter, enter, enter (until the entire file name is visible on the box)? That's how I did it when I used Windows.

            Edit: now I remember the most basic way: open the pdf, select and copy the title, click on rename and paste from clipboard. Works great to get the file name with the newlines exactly as they are on the title!

            • zelphirkalt 4 days ago

              Doesn't <enter> just confirm the typed input for the filename and finish the renaming? How does that insert newlines?

              • astroid 4 days ago

                Yes - I just tested on Win10+11 because I thought "there is no way I haven't done something like this by accident... and I would have remembered seeing a newline in my file name when I made that mistake."

                I just opened a folder in file explorer, clicked 'rename' and then tried the following combinations: Enter, L Ctrl + Enter, L Alt + Enter, Win + Enter, R Ctrl + Enter, R Alt + Enter.

                None of them let me put new lines in the filename - it either did nothing, or 'closed' the rename view.

              • ykonstant 4 days ago

                Shrug, I last used windows with Windows 7, so you are probably right. That being said, at least two of the students I am currently tutoring are on XP and one of my colleagues as well :D

                • pino82 4 days ago

                  No, it was always this way.

                  • ykonstant 4 days ago

                    Right, I just remembered the main way to create those filenames: open the pdf, select and copy the title, close, rename the file and paste from clipboard.

            • abenga 4 days ago

              I don't know if this is a Linux thing, but when renaming a file, when I press enter, I apply the new name, the file manager doesn't add a newline.

        • ykonstant 4 days ago

          I am interested in hearing the rationale for downvotes explicitly. I am describing a reality that exists and must be taken into account. Why are you downvoting?

      • nasretdinov 4 days ago

        The thing is, it's hard to predict what would happen to those scripts regardless... E.g. try naming your files "-rf" and see how many things break :)

        • ykonstant 4 days ago

          A correct script will have no problems with "-rf" or any other file name. I have (and recommend script writers make their own) a directory hierarchy of "dangerous" file names to test scripts.

          For example, it contains a directory where all file and subdirectory names are in unary, consisting only of repetitions of the newline character. A correct script should be able to enumerate, access and modify files in there without issue.
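
          A small sketch of such a hazard directory, written in Python so the test harness itself has no quoting problems. The list of names is only a hypothetical sample, and it assumes a POSIX-style filesystem that accepts them.

```python
import os
import tempfile

# Hypothetical sample of awkward names; extend it with whatever has bitten you.
DANGEROUS_NAMES = ["-rf", "--", " leading", "trailing ", "two\nlines", "\n", "\n\n", "*", "$HOME"]


def make_hazard_dir(root):
    """Create one file per dangerous name, storing the name as its content."""
    for name in DANGEROUS_NAMES:
        with open(os.path.join(root, name), "w", newline="") as handle:
            handle.write(name)


def read_back(root):
    """Enumerate the directory and map each file name to its content."""
    found = {}
    for entry in os.scandir(root):
        with open(entry.path, newline="") as handle:
            found[entry.name] = handle.read()
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_hazard_dir(root)
        result = read_back(root)
        # True when every awkward name was enumerated and opened intact.
        print(sorted(result) == sorted(DANGEROUS_NAMES))
```

          Point the script under test at a directory built this way and compare what it reports against `DANGEROUS_NAMES`.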

        • redserk 4 days ago

          If one really wanted to embrace chaos, introduce this as a new team file naming standard for "risk finding" files ;)

        • tetha 4 days ago

          I do enjoy "ls *; touch -- -lisah; ls *" as a fun little brainteaser for those uninitiated to this behavior.

        • nneonneo 4 days ago

          export TMPDIR=" / "

          to surprise the next person or script to do "rm -rf $TMPDIR/foo"...

      • funcDropShadow 3 days ago

        Of course, there is an xkcd for that: https://xkcd.com/1172/

      • stouset 4 days ago

        Dude, just fix the filenames.

    • anal_reactor 4 days ago

      It's a bandaid on a wider problem: the design of Unix shell is bonkers and the whole thing should be deleted. Why? Because I haven't seen any other tool ever have so many pitfalls. Take n random languages and m random developers and tell them to loop over a string array and print its contents, and count how many correct programs you get on average per language. There will be easy languages, then difficult languages, then a huge gap, then Unix shell because in your random sample you managed to get one guy who has a PhD in bash.

      • vbezhenar 4 days ago

        The main problem is using text as a common format between different applications.

        First: text is not well defined. Is it ASCII? Is it UTF-8? Some programs can spew UTF-32 with proper locale configured, it's a mess.

        Second: encoding and decoding of objects to text is not defined at all. Those problems with filenames are just one example. Using newline as a separator is a natural thing that is easy to implement, yet it is wrong.

        In my opinion two things should be done:

        1. Standardise on UTF-8. No other encodings allowed.

        2. Standardise on JSON. It is good enough to serve as a universal exchange format; tools like `jq` have existed for some time now.

        So any utility must read and write JSON objects with some standard env set. And shells can be developed with better syntax to deal with JSON. This way you can write something like

        `ps aux | while read row; do echo ${row.user} ${row.pid}; done`

        • poincaredisk 4 days ago

          >It is good enough to serve as universal exchange format, tools like `jq` exist for some time now.

          Please don't use that underdefined joke of a spec. Define "PosixJson" and use that instead. Right now it's not even clear what the result of parsing {"a": 1234678901234567890} is. Is this a parse error? A bigint? A float/double? Quiet wraparound? Something else? I've seen all these behaviors in real world JSON implementations across different languages.
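
          To make the ambiguity concrete, here is a small Python sketch. Python's own `json` module keeps the integer exact; the `parse_int=float` hook is used only to imitate parsers that store every number as a double, which is an assumption about those other implementations rather than a quote from any of them.

```python
import json

text = '{"a": 1234678901234567890}'

# Python's json module parses integer literals as arbitrary-precision ints.
exact = json.loads(text)

# Many other parsers store every number as an IEEE 754 double. The parse_int
# hook lets us imitate that here; doubles near 1.2e18 are spaced 256 apart,
# so this value cannot be represented exactly.
as_double = json.loads(text, parse_int=float)

print(exact["a"])
print(as_double["a"])
print(int(as_double["a"]) == exact["a"])  # False: precision was lost
```

          Two conforming parsers can therefore hand back different numbers for the same document, which is the kind of thing a "PosixJson" profile would have to pin down.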

        • aloisklink 4 days ago

          POSIX does actually define what a "text file" is, but the definition is a bit unusual:

          See https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...

          > 3.387 Text File

          > A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.

          So, if you have some non-printable characters like BEL/␇/ASCII 0x07, that's still a text file.

          (and I believe what bytes count as a valid character depends on your `LC_CTYPE`).

          But the moment you have a line longer than {LINE_MAX} bytes (which can depend on which POSIX environment you have), suddenly your text file is now a binary file.

          • WJW 4 days ago

            Kind of a weird definition indeed. One edge case: the definition states the file must contain characters, so presumably zero length files are out. But then how could you have zero lines?

            • Ukv 4 days ago

              POSIX defines a line as:

              > 3.185 Line

              > A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

              So a file with some characters but no trailing newline is reported by `wc -l` as having zero lines.
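
                The two definitions can be sketched as a check in Python. This is one strict reading, not an official test: it ignores locale-dependent character validity, and the 2048 default is just the minimum POSIX guarantees for {LINE_MAX}, so treat both function names and the limit as assumptions.

```python
LINE_MAX = 2048  # assumption: the POSIX minimum (_POSIX2_LINE_MAX); real systems may allow more


def count_lines(data: bytes) -> int:
    """Count lines as 3.185 defines them: one per terminating <newline>."""
    return data.count(b"\n")


def looks_like_text_file(data: bytes, line_max: int = LINE_MAX) -> bool:
    """Strict reading of 3.387: no NUL, only complete lines, none over line_max bytes."""
    if b"\0" in data:
        return False
    if data and not data.endswith(b"\n"):
        return False  # trailing characters without a <newline> do not form a line
    lines = data.split(b"\n")[:-1]  # drop the empty piece after the final newline
    return all(len(line) + 1 <= line_max for line in lines)  # +1 counts the newline


print(count_lines(b"no newline"))
print(looks_like_text_file(b"bell \x07 is fine\n"))
print(looks_like_text_file(b"x" * LINE_MAX + b"\n"))
```

                Under this reading an empty file passes (zero lines), while a file whose last line lacks its newline does not.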

            • rascul 4 days ago

              An empty file is not hard to make. It's just a matter of creating the file and not writing to it.

              • WJW 4 days ago

                Yes obviously. But the POSIX specification for a "text file" as above is that it contains characters, which an empty file by definition does not. So an empty file cannot be a text file if you read that specification strictly, and therefore you cannot have zero lines in a text file. As soon as you have a single character there is at least one line, and the amount of lines can only stay the same or grow from there.

                The definition should read "one or more lines" instead or (probably better) specify that a text file contains "zero or more characters".

                • rascul 4 days ago

                  Ahh I see what you're saying. I misunderstood at first.

        • arghwhat 4 days ago

          What cursed madness have you hit that spits out UTF-32 under normal conditions?! That can only be a bug - UTF-32/UCS-4 never saw external use, and has only ever been used for in-memory fixed-width character representation, e.g. runes in Go.

          You never have to worry about whether you're dealing with ASCII vs. UTF-8, but rather if you're dealing with UTF-8 vs. ISO-8859-1, or worse, Shift JIS or similar.

          • Netch 3 hours ago

            > That can only be a bug - UTF-32/UCS-4 never saw external use

            I regularly use `iconv -t utf-32be | hd` to look up which weird symbol (an itchy hedgehog, say) some bizarre sequence denotes.

            And what is a real reason to disallow this?

          • vbezhenar 4 days ago

            I think that I hit that with Java:

                % java -Dfile.encoding=UTF-32 Test | hexdump -C
                00000000  00 00 00 48 00 00 00 65  00 00 00 6c 00 00 00 6c  |...H...e...l...l|
                00000010  00 00 00 6f 00 00 00 2c  00 00 00 20 00 00 00 77  |...o...,... ...w|
                00000020  00 00 00 6f 00 00 00 72  00 00 00 6c 00 00 00 64  |...o...r...l...d|
                00000030  00 00 00 0a                                       |....|
                00000034
            
            
            From quick googling it seems that glibc does not support it, so it should not happen.
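
            For comparison, the same byte layout can be reproduced in Python. This is only a sketch of the encoding itself (big-endian, no byte-order mark, matching the dump above), not of what the JVM does internally.

```python
text = "Hello, world\n"

# UTF-32 big-endian without a byte-order mark: four bytes per code point.
encoded = text.encode("utf-32-be")

print(len(encoded))          # 52 bytes, i.e. 0x34, for 13 characters
print(encoded[:8].hex(" "))  # first two code points as hex bytes
```
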
            • Netch 3 hours ago

              > it seems that glibc does not support it

              `iconv` does, and that is usually enough. Along with tons of eerie EBCDIC/whatever...

        • ezoe 4 days ago

          Don't even assume UTF-something is the only character encoding. There are so many existing character encodings before Unicode. It's still widely used.

        • oneeyedpigeon 4 days ago

          I think a lot of tools should support json as well as plain text. Probably the latter by default, and the former with a "-o json" or similar option. I'm fine with wc giving me `5`, I'd prefer that to `{ "characters": 5 }`.

        • anal_reactor 4 days ago

          True, but this would be immensely difficult to pull off, because how do you convince other people to write programs that produce actual working JSON?

        • nly 4 days ago

          The primary purpose of command line program output is to convey information to a human, not to other programs.

          Command line scripting is supposed to be ad hoc and hacky.

          • consteval 4 days ago

            There are exchange formats that are well-defined enough to be useful to many computers while also being readable enough to be traversed by human eyes. There's no reason to make everything ad hoc, you don't get much by that. You also control the shell itself - there's no reason you can't display object representations in a pretty way.

          • mdavid626 4 days ago

            I disagree that it's supposed to be ad hoc and hacky. Look at PowerShell.

          • anthk 4 days ago

            That's under limited OSes such as DOS. Under Unix, piping has been the philosophy.

        • matrss 4 days ago

          JSON itself is bad for a streaming interface, as is common with CLI applications. You can't easily consume a JSON array without first reading it in its entirety. JSONL would be a better fit.

          But then, how well would it work for ad-hoc usage, which is probably one of the biggest uses of shells?
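
          A minimal sketch of the streaming difference, in Python. The two records stand in for the output of a hypothetical ps-like tool; no real utility emits this format.

```python
import io
import json


def iter_jsonl(stream):
    """Yield one parsed record per non-empty line, as soon as that line arrives."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


# Hypothetical output of a ps-like tool, one JSON object per line.
jsonl_output = io.StringIO('{"user": "root", "pid": 1}\n{"user": "alice", "pid": 4242}\n')
for row in iter_jsonl(jsonl_output):
    print(row["user"], row["pid"])

# A JSON array cut off mid-stream is not valid JSON at all.
try:
    json.loads('[{"user": "root", "pid": 1}, {"user": "al')
except json.JSONDecodeError:
    print("partial array cannot be parsed")
```

          Each JSONL line is usable the moment it arrives, whereas the array form gives a plain `json.loads` nothing until the closing bracket shows up.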

        • pif 4 days ago

          > The main problem is using text as a common format between different applications.

          If you can't get the immensity of the cleverness of Unix foundations, you should not talk about them.

          That idea is what made it possible for you to type that sentence in the first place.

      • akira2501 4 days ago

        > I haven't seen any other tool ever have so many pitfalls.

        I haven't seen any other tool with so much general utility and availability.

        > to loop over a string array and print its contents

        Is incredibly easy in bash and bash like shells. As highlighted the issue is that tools like 'ls' don't create "a string array." They create one giant string that has to be parsed. The rules in the shell are different than in other languages but it /will/ do most of the parsing for you, or all of it, if you do it carefully.

        This is a fine tradeoff. As evidenced by its wide usage and lack of convincing replacements.

        • anal_reactor 4 days ago

          > I haven't seen any other tool with so much general utility and availability.

          > availability

          That's the real reason why we use Unix shell. It's not good, but it's available. Like a cheap hooker.

          > but it /will/ do most of the parsing for you, or all of it, if you do it carefully.

          "It mostly works if you're careful" doesn't sound very convincing to me.

          • akira2501 4 days ago

            > "It mostly works if you're careful" doesn't sound very convincing to me.

            Would you rather write your own parser?

          • stephenr 4 days ago

            > but it's available. Like a cheap hooker.

            Username checks out.

      • blueflow 4 days ago

        Someone needs to come up with an interactive shell first, one that is comparable in usability. Then we can think about replacing the unix shell.

        I tried both python and lua interactively, but they are a pain when it comes to handling files. You have to type much more to get the same things done.

        • anal_reactor 4 days ago

          The bigger issue is the sheer momentum of Unix shell. Even if you come up with an alternative that is better by every objectively measurable metric, it's still going to be a monumental task to have it packaged with commonly used distros. Kinda like the "why can't the US switch to the metric system" problem.

          • Netch 3 hours ago

            OK let them add an explicit check to standard tools, and/or to open(), mkdir(), etc. with O_PORTABLECHARS. And an environment option to disable this check.

            Why do they force the restriction at syscall level?

          • blueflow 4 days ago

            People already use different shells, mksh, fish, and so on. With fish there is a non-posix shell in wide use.

            • oguz-ismail 4 days ago

              >wide use

              Five people around the globe isn't wide use.

              • blueflow 4 days ago

                I'm sure you might get more than 5 people on HN replying to you that they are using fish right now. Say something discrediting about fish and they show up.

                • fragmede 4 days ago

                  Heh, reminds me of how to get help with Linux back in the day. If you directly asked for help, you'd be told to RTFM. If you stated confidently that Windows could do something and that Linux sucks because it can't, you'd get users tripping over themselves with details and instructions, just to prove you wrong.

                  Human psychology is fascinating!

          • azalemeth 4 days ago

            There's a direct cost in money, time and lives that has come from the US's adherence to their US Customary Units (which are often different to the old imperial units). People have literally died because of the confusion caused by having multiple systems of units in common use with ambiguous names (degrees, gallons, etc). Each year industry worldwide spends an enormous amount of money indirectly precisely because of this problem and it's still incredibly unlikely to be fixed within my lifetime.

            Bash-alternatives that are not completely compatible frankly just don't have a chance.

          • stephenr 4 days ago

            If it isn't distributed out of the box with every *nix-like OS, it inherently isn't "better by every objectively measurable metric" - distribution of a common, stable standard is a huge benefit in and of itself.

            • blueflow 4 days ago

              > distributed out of the box with every *nix-like OS,

              Python and lua are pretty close to that.

              • stephenr 4 days ago

                > Python and lua are pretty close to that.

                Python may often be installed by default but it's definitely not an essential/required package "out of the box" on every install. Also, in a thread where one topic is how POSIX shell handles whitespace in filenames, it's hilarious (not in a good way) that someone suggests a language that handles whitespace the wrong way in its own code. Yes, significant whitespace is objectively wrong.

                What OS/distro is Lua included on out of the box? That doesn't mean "available in a package". I mean literally included in every single install and cannot reasonably be omitted?

                Regardless of the availability, the parent comment says

                > better by every objectively measurable metric

                Neither Python nor Lua are "better" than shell, at the types of things shell is commonly used for - they're objectively worse.

                • blueflow 4 days ago

                  Lua gets onto every other Linux distro as dependency of some base system component. For example, rpm or pipewire depend on lua. Ubuntu and Debian ship with pipewire per default.

                  You should use the word "objectively" less.

                  • stephenr 3 days ago

                    > Lua gets onto every other Linux distro

                    Just FYI, there are UNIX-like, POSIX compatible systems that are not a Linux distro.

                    > rpm or pipewire depend on lua. Ubuntu and Debian ship with pipewire per default.

                    Pipewire? Do you mean this? https://packages.debian.org/bookworm/pipewire

                    That isn't even close to "installed on every system". Best I can tell from the reverse dependencies, it's required for some Gnome Remote Desktop tool, and best I can tell, it doesn't rely on Lua anyway (at least on Debian).

                    > You should use the word "objectively" less.

                    I specifically used the word objectively, because the original comment that I replied to, said this:

                    > better by every objectively measurable metric

                    • blueflow 3 days ago

                      pipewire -> wireplumber -> libwireplumber -> liblua

                      Pipewire being the Pulseaudio replacement from Redhat.

                      Bookworm is probably the last Debian without :P

                      • stephenr 3 days ago

                        > Pipewire being the Pulseaudio replacement from Redhat.

                        Right, so it's a desktop package that ultimately will be installed on about 1% of all Linux machines because the vast majority are servers without a desktop environment.

                        Also worth pointing out: liblua on Debian at least, is the shared library. It's not the binary to execute standalone Lua scripts.

                        • blueflow 3 days ago

                          Is this like a game where you come up with bullshit and I have to come up with the facts to rectify it? RHEL/CentOS have more than 1% market share alone.

                          Check your own installs and tell me if you find some that don't have liblua or libluajit.

                          For the library thing: I said "Python and lua are pretty close to that." earlier. I did not say that they have interpreters ready everywhere. But if the language core is already installed on a large fraction of machines, then adding the interpreter is not a big cost.

                          • stephenr 3 days ago

                            > already installed on a large fraction of machines

                            So far you've presented no evidence of this though, just that it's used by a new desktop-focused package.

                            All linux desktops over the last 30 years is not even a "large fraction" of total Linux installs, much less the ones that have already migrated to this new audio system.

                            > adding the interpreter is not a big cost

                            It's nothing to do with cost. It's about "how do I know this will absolutely 100% run on any POSIX machine I throw it on without any extra steps".

                            Remember the argument here is about something that is claimed to be "objectively better" than Shell. The ubiquitous nature of POSIX shell is a huge barrier for any possible competitor, and saying "well you just need to install it" just defeats the purpose. You might as well write it in fucking java and say "well you just need to install a JVM".

                            Edit to Add: a good number of systems I manage do have liblua installed... because HAProxy requires it, and those systems have HAProxy installed. Not because it was installed as part of the base OS or even a default group of packages.

                            Incidentally, HAProxy and thus liblua were installed on those systems by infrastructure management that's implemented as shell script. So what kind of chicken and egg argument do we need to have here about how exactly I can run a Lua script to install Lua?

                            • blueflow 3 days ago

                              > a good number of systems I manage do have liblua installed

                              /thread

              • consteval 4 days ago

                Even outside of distribution, python and lua aren't objectively better. For starters, they're much more verbose.

                • blueflow 4 days ago

                  I just said that, scroll up.

        • throw16180339 4 days ago

          I certainly have my complaints about Powershell, but it's got pretty good coverage, decent documentation, and cross platform support.

          • felixgallo 4 days ago

            if it weren't so irregular, inconsistent, spotty and tasteless, it'd be a great option.

        • nly 4 days ago

          Oil shell?

          https://www.oilshell.org/

          Compatible with most bash scripts

      • throwaway19972 4 days ago

        > the design of Unix shell is bonkers

        Compared to what?

        • mdavid626 4 days ago

          Powershell?

          • poincaredisk 4 days ago

            PowerShell's designers could learn from decades of programming language progress and especially shell usage. They could improve many aspects indeed. This doesn't mean that the original design is "bonkers", only that it's not perfect.

            • pjmlp 3 days ago

              The way Powershell works is largely based on what the computing world was doing with shells outside Bell Labs, at IBM, Xerox, and other places, in the same timeframe as UNIX was happening.

            • mdavid626 2 days ago

              Can you give examples of what should be improved in PowerShell?

          • oguz-ismail 4 days ago

            Verbosity is a huge problem there

            • consteval 4 days ago

              Modern programming language designers have a bad relationship with verbosity. I don't know why they do this.

              It's a lang for an interactive shell, typing literally translates to developer speed. I understand the want for clarity and maybe that's nice in large scripts, but the main goal is to be a shell. So, optimize for that. Also, you probably shouldn't be using powershell for large scripts anyway.

              The only recent lang I've seen that has a handle on this is Rust. You can tell they put a lot of thought into having keywords be as short as possible while still being descriptive.

          • ggm 4 days ago

            FoundTheCamelCaseConvert.

            My God next you will say getopt() --longform is the bestest

            • throw16180339 4 days ago

              It's been years since I used Powershell, but IIRC there are shortcuts for the common commands, e.g. cat, ls, mv, rm, and such DTRT.

              • Diti 4 days ago

                Those aliases are, I believe, only defined on Windows PowerShell (the closed-source version 5; not PowerShell 7). I wish those default aliases you mentioned weren’t a thing. Especially `curl` (people should use `iwr` instead), which is an alias of `Invoke-WebRequest`, because it makes the `curl.exe` shipped with Windows nearly undiscoverable.

      • dailykoder 4 days ago

        Works on my machine!

      • dangsux 4 days ago

        [dead]

      • zelphirkalt 4 days ago

        This should not be as downvoted as it is. In a way shell is broken. The brokenness is in that it requires each command to serialize and deserialize again, considering all the weird things that can happen with the "all is a string" kind of approach, instead of having a proper data interchange format or even sending objects to next steps in the pipeline. This behavior is what necessitates even thinking about the changes listed in the post. We wouldn't even have that problem, if the design of shell was better thought out. Now we are dealing with decades of legacy built on these shaky foundations. I hate to admit it, but it seems Powershell got at least this aspect right, whatever one may think about the rest of it.

        • chasil 4 days ago

          On my rhel7 system, the Debian dash shell is this large:

             $ ll /bin/dash
            -rwxr-xr-x. 1 root root 113536 Nov  5  2018 /bin/dash
          
          I happen to have an old powershell installed:

            $ rpm -qi powershell | grep Size
            Size        : 126588370
          
          A strict POSIX shell is always going to be vastly smaller, for many reasons.

          I would prefer that the POSIX shell was an LR-parsed language, but you can't have everything.

      • enriquto 4 days ago

        > loop over a string array

        Dear anal_reactor, what is a "string array"? I have used unix shells for nearly 30 years and never heard about them. And I consider myself a script-fu master!

        There are two array-like constructions in the shell: list of words (separated by spaces) and list of lines (separated by newlines). Both cases are implemented as a single string, and the shell makes it trivial to iterate through its components.
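
        A minimal sketch of the two constructions described above, assuming a POSIX-style sh. The variable names and the count_lines helper are made up for illustration.

        ```shell
        # Hypothetical helper: counts the lines arriving on stdin.
        count_lines() {
            n=0
            while IFS= read -r line; do
                n=$((n + 1))
            done
            echo "$n"
        }

        # List of words: one string, split on whitespace by the shell.
        words='alpha beta gamma'
        word_count=0
        for w in $words; do    # unquoted on purpose so that word splitting happens
            word_count=$((word_count + 1))
        done

        # List of lines: one string, split on newlines by `read`.
        lines=$(printf 'first line\nsecond line')
        line_count=$(printf '%s\n' "$lines" | count_lines)

        echo "words: $word_count, lines: $line_count"
        ```

        Both loops work only as long as no element contains its own separator.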

        • ManBeardPc 4 days ago

          That is exactly the problem many people have with it. Encoding „arrays“ this way is foreign to everyone who comes from „normal“ programming languages. Both variants lead to problems because either character can occur in elements, worst case scenario they contain both at the same time. I can see why this leads to confusion and bugs.

          • skydhash 4 days ago

            It’s like people saying they won’t learn French because it has a different grammatical structure. There’s no “normal” natural language. If you’re used to the C-like syntax, learning C-like language will be easy. But that’s not an argument to say Lisp is confusing.

            • ManBeardPc 4 days ago

              That's why I put normal in quotes. There is however more to it than having a different grammatical structure: It works differently from many commonly used languages that have actual arrays/lists where elements can contain anything the type allows. If you come from any of the common modern programming languages (let's say Java, Kotlin, C#, JS/TS, Python, Swift, Go, Rust, etc.) and expect something similar (because many of them are very similar) you will be confused. Using spaces or newlines to encode elements in a single string is just not robust and leads to easy to make mistakes.

              • skydhash 4 days ago

                Most of these languages were created long after bash and the other shells. The fact is that shell scripts allow for unquoted strings and quoting is a specific operation, not syntax. Also shell scripts were meant for automations, not for writing general programs. The basic units are commands, arguments, input, output, files,… so the design makes these easy to manipulate.

                I’m not saying that we can’t improve, but I’m more in favor of making the tool more apt to solve a problem than making it easier to learn. Because the latter often wants to forego the requirement of understanding the problem space.

                • ManBeardPc 4 days ago

                  Yes, these are newer. I mainly wanted to make the point that it is confusing if you are new to bash and come from these newer languages with the wrong expectations. The concise nature and many subtle details make it very difficult for beginners and infrequent users.

                  Compare this to the newer programming languages where you explicitly call something with descriptive names like .Trim(), .EndsWith(), with support from compiler and IDE.

                  In my experience automation and general programs often are the same thing once things get more complicated. Bash scripts usually grow rapidly and are a giant PITA to maintain or refactor. Throw in build systems and helper scripts and you quickly receive a giant pile of spaghetti. Personally I just switch to one of the mentioned programming languages once it goes above a simple sequence of operations.

                  Personally I don't see how to improve it much without becoming a full blown programming language, at which point it would probably make more sense to just release a library for common automation tasks that is also composable. Maybe I'm just not the right target audience.

                  • skydhash 4 days ago

                    The issue with your otherwise good reply is that some people are bringing expectations to an expert tool (programming languages, software, OS) and blindly assuming that everything will work as they think it should. Familiarity helps with learning, but shouldn’t replace it. Someone new to bash should probably start with a book.

                    And for bigger automation projects, there are lots of projects and programming languages that can help.

                    • ManBeardPc 4 days ago

                      I agree it is an issue but it is how many people work and think. Most of the time they are not even wrong. "Hey, I have variables and loops, I know that!".

                      I would even make the case for expert tools being as unsurprising and familiar as possible unless there is a very good reason for them not to. Also they should be robust against misuse and guide the user towards good practices. There are always beginners, people that rarely need to use it, people that do programming as "just a job" and people that make mistakes because they are distracted, tired or just human. Something like "rm -r /" is a good reminder of that for many people.

                      Plus there are already a lot of tools required. Reading a book about every tool I have to use would be impractical for most projects. Maybe more expert tools should just be tools. The same way I can now just use Ubuntu and get a working desktop system including drivers for most common hardware. If I compare that to the past where I installed a Linux distribution and then found out I lack a driver for my network card but I need to download it from the internet... I still can modify my system if I need to, but it's nice that I don't have to. I think we can do similar things with many parts of development and free some capacity for other tasks.

    • account42 4 days ago

      Their proposed solution is not compatible with reality though where POSIX does not get to define what kind of files exist on filesystems you need to work with.

      All they did is introduce new error cases in C programs while not actually fixing anything for shell scripts.

      If anything, it's going to result in more exploits as people write shell scripts with the assumption that newlines cannot appear in filenames.

      • quotemstr 4 days ago

        In the real world, nobody writes shell scripts that handle newlines in filenames.

        • account42 4 days ago

          I do. Single files are handled with quotes around arguments just fine. For lists of files you need to use NUL as a separator. That's not really hard to do once you are aware of the problem but ergonomics could be better - which is something useful that POSIX could change.
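
          A minimal sketch of the NUL-separator approach, assuming `mktemp -d` is available and the filesystem accepts a newline in a name. The file names are made up for illustration.

          ```shell
          workdir=$(mktemp -d) || exit 1

          # One file whose name contains a newline, plus an ordinary one.
          : >"$workdir/$(printf 'two\nlines').txt"
          : >"$workdir/plain.txt"

          # Newline-delimited listing: the awkward name shows up as two lines.
          naive_count=$(find "$workdir" -type f | wc -l)

          # NUL-delimited listing: xargs -0 passes one argument per file,
          # and the inner sh reports how many arguments it received.
          safe_count=$(find "$workdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

          echo "newline-separated: $naive_count, NUL-separated: $safe_count"
          rm -rf "$workdir"
          ```

          The newline-separated count comes out one too high because the first file name spans two lines; the NUL-separated count matches the real number of files.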

    • zokier 4 days ago

      But they did not make old code correct. Filenames are still allowed to contain newlines. Shell scripts still need to be prepared to deal with that. Nothing really changed, they just added a feel-good half-measure.

      • quotemstr 4 days ago

        It's a step in the right direction. You have to understand that for decades a vocal group of Unix die-hards has opposed any limitations whatsoever on the bytewise content of file names. The newline restriction in this latest version of POSIX may be modest, but it represents a dam breaking. When (obviously) the sky doesn't fall, the next version of POSIX will have a lot more filename cleanup.

        • rixed 3 days ago

          Next step is to forbid newlines from file content itself, to fix conformant JSON parsers?

      • janderland 4 days ago

        This is pretty standard for a human run system. Gotta make the human feel good about an idea before they can do said idea.

        If you’re not familiar with humans, there are several manuals available online.

    • ezoe 4 days ago

      Don't assume UTF-8 is the only character encoding used in the wild. There are character encodings whose leading bytes are not as easily detectable as UTF-8's.

      • arghwhat 4 days ago

        In 2024, if you don't get the correct result decoding a text as UTF-8, the bug is the text, not the decoding. And luckily, adoption of UTF-8 in the past 30+ years has gone well enough that you don't need to worry.

        Caveats for cursed hardware standards demanding two-byte encodings like USB.

        • poincaredisk 4 days ago

          I hope you're happy in your ivory tower, but I personally work with a lot of files with other encodings, most often that weird utf16 (Windows), sometimes also legacy files with different ANSI encodings. Declaring "my decoder is fine, it's the text that is buggy" is not going to score a lot of points with my boss and clients.

          • arghwhat 4 days ago

            The only valid reason for still having files stored in legacy ANSI encodings is that their only use is input to software that has not been maintained for ~30 years and cannot be updated. That's fine because they're just binary inputs in a closed ecosystem that no one touches.

            But if they are supposed to be treated as text, then yes it's the text that's buggy - they should just be converted to UTF-8 once and have the originals thrown away.

            UTF-16 is something that Microsoft has cursed us with by inserting it into specifications (like USB) so that we cannot get rid of it, even if it never made any sense whatsoever. But those are in effect explicit protocols with a hard contract, very different from something where you would "assume an encoding".

          • zelphirkalt 4 days ago

            Shouldn't hurt to tell clients to right their weird proprietary software originated encodings though.

        • 1oooqooq 4 days ago

          why do people still assume utf8 is the only known locale encoding?

          you're probably guilty of the sin you preach and are showing wrongly decoded utf8 and don't even know.

    • hwc 4 days ago

      Now do that with all whitespace!

  • Netch 2 hours ago

    Filename character set and its interpretation shall be controlled per directory or, at least, per FS. This pertains not only to the permitted set, like with or without LF, but to collation rules as well (including case insensitivity with cases like Turkish/Crimean/etc. I/ı and İ/i). Also this shall include workarounds for already existing problems: if a directory already contains files I1 and ı1, there shall be a technique to deal with them separately even with a Turkish locale.

    But restricting this at the syscall level is definite insanity, along with the excuses for it.

  • chasil 4 days ago

      - find(1p) now supports -print0
      - xargs(1p) now supports the -0 argument
      - newlines in filenames now should throw errors in many utilities
      - a compiler implementing the c17 standard is now required
      - ulimit is expanded
      - renice can use relative values
      - a timeout utility has been added
      - make adds support for $^ $+ ::= :::= != ?= +=
      - logger is improved
      - gettext is adopted
      - readlink and realpath are adopted
      - rm now supports -d to remove empty directories and -v for verbose
      - various improvements to printf, sed, test
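
    A few of the utilities from this list, sketched under the assumption that the system already ships `timeout`, `readlink` and an `rm` with `-d` (GNU coreutils does; `mktemp -d` is assumed as well). The directory and link names are made up.

    ```shell
    workdir=$(mktemp -d) || exit 1
    mkdir "$workdir/empty"
    ln -s empty "$workdir/link"

    # readlink prints the target stored in the symbolic link.
    link_target=$(readlink "$workdir/link")

    # rm -d removes an empty directory without needing -r.
    rm -d "$workdir/empty"
    if [ -e "$workdir/empty" ]; then dir_removed=no; else dir_removed=yes; fi

    # timeout stops a command that runs too long and reports a non-zero status.
    timeout_status=0
    timeout 1 sleep 5 || timeout_status=$?

    echo "link -> $link_target, directory removed: $dir_removed, timeout status: $timeout_status"
    rm -rf "$workdir"
    ```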
    • greyw 4 days ago

      Looks like the BSD-family will have some implementing to do.

      • chasil 4 days ago

        I just booted OpenBSD 7.0 (which is a bit dated).

        The find utility has -print0, and xargs has -0. Notably, xargs also has -P for running processes in parallel.

        rm has both -d and -v.

        The renice command appears to be able to use relative adjustments with -n.

        There is a timeout command.

        There is a readlink command, but no realpath (but a manual page exists for it as a system call).

      • sneed_chucker 4 days ago

        Strict adherence to POSIX isn't a goal of any of the current BSDs is it?

        • washbear a day ago

          People get "POSIX compliance" confused with "Unix certification". The first is an API you implement, the second is a rubber stamp.

          All active Unix-like operating systems aim to implement the new interfaces as they're defined.

        • bryanlarsen 4 days ago

          I'm confident they'd accept patches.

        • saagarjha a day ago

          macOS

  • imrejonk 4 days ago

    This adds `set -o pipefail` to POSIX sh, which causes a whole pipeline to fail (non-zero exit code) if one or more of the commands in the pipeline fail.
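
    A small sketch of the difference, using `false | true` as a stand-in for a pipeline whose first stage fails. The feature probe is there because some shells do not implement the option yet; the variable names are made up.

    ```shell
    # Default behaviour: the pipeline's status is that of the last command.
    without_pipefail=0
    (false | true) || without_pipefail=$?

    # With pipefail: a failing stage makes the whole pipeline fail.
    if (set -o pipefail) 2>/dev/null; then
        with_pipefail=0
        (set -o pipefail; false | true) || with_pipefail=$?
    else
        with_pipefail=unsupported    # this shell does not know the option
    fi

    echo "without pipefail: $without_pipefail, with pipefail: $with_pipefail"
    ```

    Keep in mind that `grep` exits with status 1 when nothing matches, so a pipeline containing it can fail under pipefail even though nothing went wrong.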

    • deskr 4 days ago

      If you're writing scripts, use that and don't forget -e and -u

        -e      Exit  immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status
      
        -u      Treat  unset variables and parameters other than the special parameters "@" and "*" as an error when performing parameter expansion
      • ykonstant 4 days ago

        For `set -u` I mostly agree. For `set -e` see my comment below and Greg's wiki: http://mywiki.wooledge.org/BashFAQ/105

        • deskr 4 days ago

          > and they still fail to catch even some remarkably simple cases

          I totally agree. Although I'd say that there isn't anything "remarkably simple" about writing a bash script. Anything in the shell scripting world that seems remarkably simple is just because one hasn't realised the ghosts and horrors that lurk in the shadows.

          But I'll use -e anytime. It feels like having a protective proton pack at least.

      • 4 days ago
        [deleted]
    • zelphirkalt 4 days ago

      Does it? It is not mentioned anywhere in the post. Can you post a reference to your source?

    • akdor1154 4 days ago

      Holy balls that's like Christmas!

    • rightbyte 4 days ago

      Really? Won't that break piping grep?

      • WJW 4 days ago

        Probably, so don't `set -o pipefail` in scripts that pipe into grep.

        • rightbyte 4 days ago

          Ah ok I read it as 'sets it by default' for some reason.

    • throwaway984393 4 days ago

      Sad. Use of that option is almost always a mistake. It only leads to undebuggable silent failures.

      • Joker_vD 4 days ago

        I'd rather both have this option and have it work reliably. It's ridiculous that

            export VAR=$(cmd1 | cmd2)
        
        does not count as a pipefail when cmd1 or cmd2 fail but

            VAR=$(cmd1 | cmd2)
        
        does, so the "correct" way to set an environment variable from a pipeline's output is actually

            VAR=$(cmd1 | cmd2)
            export VAR
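
        The masking can be reproduced with `set -e` alone, using `false` as a stand-in for the failing pipeline. This is only a sketch; each snippet runs in its own `sh -c` so that neither affects the other, and the variable names are made up.

        ```shell
        # `export VAR=$(...)`: the status of `export` (0) hides the failure,
        # so the child shell carries on and reaches `exit 0`.
        if sh -c 'set -e; export VAR=$(false); exit 0'; then
            with_export=continued
        else
            with_export=aborted
        fi

        # Plain assignment: the status of the command substitution is kept,
        # so `set -e` stops the child shell before `exit 0`.
        if sh -c 'set -e; VAR=$(false); exit 0'; then
            plain=continued
        else
            plain=aborted
        fi

        echo "export form: $with_export, plain assignment: $plain"
        ```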
      • ykonstant 4 days ago

        Pipefail is useful and very hard to emulate on pure POSIX; you need to create named fifos, break the pipeline into individual redirections and check for error on each line.
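
        A rough sketch of that emulation for a two-stage pipeline, assuming `mktemp -d` is available (`mkfifo` is standard). The producer and consumer functions are hypothetical stand-ins for real commands.

        ```shell
        producer() { printf 'a\nb\n'; return 3; }    # writes two lines, then fails
        consumer() { wc -l; }                        # counts the lines it reads

        tmpdir=$(mktemp -d) || exit 1
        fifo=$tmpdir/pipe
        mkfifo "$fifo" || exit 1

        # Instead of `producer | consumer`, connect the two stages through the
        # FIFO so that each exit status can be collected separately.
        producer >"$fifo" &
        producer_pid=$!

        consumer_status=0
        count=$(consumer <"$fifo") || consumer_status=$?

        producer_status=0
        wait "$producer_pid" || producer_status=$?

        rm -rf "$tmpdir"
        echo "lines: $count, producer: $producer_status, consumer: $consumer_status"
        ```

        Every extra stage needs another FIFO and another `wait`, which is where the approach gets tedious.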

        And that is fine; but sometimes you want to treat a pipeline as a "single command" and then you can use pipefail to abort the pipeline on error. Then you can handle the error at the granularity of the entire pipeline without caring which part failed.

        Lastly, I am confused as to the "silent" failures; maybe you are thinking of combining this with `set -e`? Then yes, that is bad and I recommend against the combination; but then again, I and most advanced scripters recommend against shotgunning `set -e` in the first place. Use it in specific portions of the script when appropriate, and use proper error handling otherwise.

        • zelphirkalt 4 days ago

          Why does `set -e` make a pipeline fail silently?

          • ykonstant 4 days ago

            `set -e` makes the script abort and is often used in lieu of proper error handling:

              set -e
              command
              command [fails]
              command
            
            Whether the above reports error or not depends on the command; when you have a pipeline failing in the above way, it is even sneakier:

              set -e
              command
              command | command | command [fails]
              command
            
            You are reliant on all commands in the pipeline being verbose about failure to signal error.

            None of the above is advisable. The advisable code is

              error_handler() { proper error handling; }
            
              command || error_handler "parameter"
              command || error_handler "parameter"
            
              { command | command | command; } || error_handler "parameter"
            
              {
              set -e
              exceptional section that needs to be bailed out
              set +e
              }
            
              command || error_handler "parameter"
            • skydhash 4 days ago

              Error handling like that makes sense if you’re writing a program. But if you just want a script for an automation, `set -e` is enough.

              • ykonstant 4 days ago

                It is not; Greg's wiki further explains why, if the silent failure problem above is not enough reason.

                • Joker_vD 4 days ago

                  Gee, imagine if shells with errexit option enabled wrote some diagnostic output to stderr before exiting. "Add your own error checking instead", how do I check which piece of pipeline has failed, exactly? The PIPESTATUS variable is bash-specific and was not standardized.

                  • ykonstant 4 days ago

                    ? Why are you replying to me? My position was pretty clear:

                    "Pipefail is useful and very hard to emulate on pure POSIX; you need to create named fifos, break the pipeline into individual redirections and check for error on each line.

                    And that is fine; but sometimes you want to treat a pipeline as a "single command" and then you can use pipefail to abort the pipeline on error. Then you can handle the error at the granularity of the entire pipeline without caring which part failed."

                    By the way, I never script in Bash; I only script in POSIX primitives using dash as my executable.

  • relistan 4 days ago

    The history at the beginning of this is not correct. Two examples: the assertion that there was one compatible UNIX prior to United States vs AT&T, the statement that GNU and BSD started that same year. Very, very off.

    • unixhero 4 days ago

      Okay, but you would add more value if you could also state what is the correct order of things.

  • pelorat 4 days ago

    TIL the POSIX standard is still updated. Does it still suffer from the issues that make Linux break POSIX compatibility in some areas because they consider it a flawed standard?

  • Flimm 4 days ago

    Yes! Finally! Let's treat filenames with new lines as errors! I'm so delighted with this decision.

    • skissane 4 days ago

      The original request was to ban all bytes between 1 and 31.

      https://www.austingroupbugs.net/view.php?id=251

      At some point they decided to narrow the change to just ban the newline character.

      Which I personally think is a pity. Allowing escape in file names is a security risk because it enables you to embed ECMA-48 escape sequences in file names. Secure terminal emulators shouldn’t be made vulnerable by arbitrary escape sequences, but there are “too smart for their own good” terminal emulators out there that have escape sequences that let you do crazy things like run arbitrary executables.

      • ezoe 4 days ago

        There are many non-UTF-8/16/32 character encodings used in the wild which use these values in multi-byte character encoding. These values are used in the wild.

        I think the decision forbidding newline in pathname is also wrong. It may break tons of existing code.

        • skissane 4 days ago

          I wish Linux/etc had a mount option and/or superblock flag called “allow only sane file names”. And if you had that set, then attempting to create a file whose name wasn’t valid UTF-8, or which contained C0 or C1 controls, would fail. The small minority of people who really need pre-Unicode encodings such as ISO 2022 could just not turn that option on. And the majority who don’t need anything like that could reap the benefits of eliminating a whole category of potential bugs and vulnerabilities.

        • Joker_vD 4 days ago

          > There are many non-UTF-8/16/32 character encodings used in the wild which use these values in multi-byte character encoding.

          Like what? I am genuinely curious: Shift-JIS, GB2312, Big5, and all of the EUC variants do not use bytes that correspond to C0 characters in ASCII.

    • devit 4 days ago

      That's obviously impossible since it would break backward compatibility and the users' existing filesystems (and the Linux kernel will rightly never accept anything like that).

      The only reasonable fix is to enhance bash and shell IDEs to track for each variable whether it could possibly include all filename-valid characters (e.g. if it comes from read with no options then it can't contain \n) and warn (off by default unless stderr is a terminal) if they can't and it's used as a filename (conservatively determined when used as arguments to processes), and also warn when using find without -print0, etc. noninteractively and perhaps interactively as well.

    • IshKebab 4 days ago

      Why is that an issue?

      • shakna 4 days ago

        Run a program to list a directory. Everything that interfaces with that, will assume newline delimiters. Similar assumptions are baked into a lot of software.

        Enforcing that a newline isn't part of a path, ensures the security of those systems that are commonly relied on.

        • oguz-ismail 4 days ago

          Except no one's enforcing anything yet. Earlier versions of POSIX allowed rejecting filenames containing newlines, the newest version encourages it while mandating features required to handle such filenames safely (find -print0, xargs -0, read -d ''). So nothing's set in stone yet.

        • IshKebab 4 days ago

          > Everything that interfaces with that, will assume newline delimiters.

          Well, only badly written programs. nushell handles this fine, as will any program that doesn't try to do everything as plain strings:

            ~> touch "foo\nbar"
            ~> ls foo* | print
            ╭───┬──────┬──────┬──────┬──────────╮
            │ # │ name │ type │ size │ modified │
            ├───┼──────┼──────┼──────┼──────────┤
            │ 0 │ foo  │ file │  0 B │ now      │
            │   │ bar  │      │      │          │
            ╰───┴──────┴──────┴──────┴──────────╯
          
          However after reading it they're only making them illegal for the posix utilities from the 70s that aren't written properly, so I think that makes sense.
    • enriquto 4 days ago

      Next: spaces

      • lifthrasiir 4 days ago

        Still much better than mojibaked names.

        • enriquto 4 days ago

          What do you mean?

          • _ZeD_ 4 days ago

            What is the encoding of the filenames?

            • Joker_vD 4 days ago

              I am personally not aware of any MBCS that could have a 0x20 or 0x0D as a valid trailing byte. Are you?

              • lifthrasiir 4 days ago

                I think my comment correctly contrasted mojibake with newlines or spaces for that reason.

  • quotemstr 4 days ago

    > We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now?

    Finally. Now let's do the rest: https://dwheeler.com/essays/fixing-unix-linux-filenames.html

    Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook

    • cryptonector 4 days ago

      > Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook

      Ensuring normalization is hard. Where should you do it? There's only one good place: in the filesystem. But if you normalize on create then you'd better use the same form that everyone else uses, but, what's that? Input methods generally produce NFC, but there's no guarantee that they will not produce something else. HFS+ normalizes to NFD on create.

      ZFS uses form-insensitivity -- much like case-insensitivity, but for form. The reason ZFS went this way was exactly that HFS+ and input methods differ as to forms. I pushed hard for this way back when. IMO form-insensitivity is the best way forward.

      But as for guaranteeing that filenames are UTF-8... that's much harder. The best thing to do is to not allow the use of non-UTF-8, non-ASCII, non-C locales -- not a guarantee, but pretty good.
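
      A toy model of form-insensitive, form-preserving lookup, to make the idea concrete. This is only a sketch: the `FormInsensitiveDir` class and the file names are hypothetical, and using NFC as the comparison key is an assumption for illustration, not a statement about how ZFS does it internally.

```python
import unicodedata

class FormInsensitiveDir:
    """Toy directory: names compare equal across normalization forms, but are stored as given."""

    def __init__(self):
        self._entries = {}  # normalized key -> (name as created, contents)

    @staticmethod
    def _key(name):
        # Comparison key only; the original spelling is kept alongside it.
        return unicodedata.normalize("NFC", name)

    def create(self, name, data):
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = (name, data)

    def read(self, name):
        return self._entries[self._key(name)][1]

    def listdir(self):
        return [stored for stored, _ in self._entries.values()]

d = FormInsensitiveDir()
nfd_name = "cafe\u0301.txt"   # 'e' + combining acute, as an NFD-producing system would write it
nfc_name = "caf\u00e9.txt"    # precomposed 'e with acute', as most input methods produce
d.create(nfd_name, b"hello")

print(d.read(nfc_name))       # found, although the byte sequences differ
print(d.listdir() == [nfd_name])
```

      Lookup succeeds with either spelling, and listing returns the name exactly as it was created, so nothing is rewritten behind the user's back.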

      • quotemstr 3 days ago

        Sure. Form-insensitivity is another good option. I'd actually argue for full case insensitivity too (like macOS), although I realize that it's probably a stretch.

        • cryptonector 2 days ago

          Case-insensitivity is also an option in ZFS, but honestly case-insensitivity drives me nuts, especially if it's not case-preserving. Oh, that reminds me, ZFS is form-insensitive, and form-preserving.

  • BobbyTables2 3 days ago

    I really hate to say it, but the fretting about newlines used as delimiters after 50 years of misuse …

    … makes PowerShell start to look damn good.

  • donatj 4 days ago

    To build an internationalized shell script I'll need to compile multiple .mo language files and distribute them alongside the script itself.

    For shell scripts part of a large system, that's probably fine. For small scripts, that's not very practical. You are not only adding a compilation step, you're also requiring distribution of multiple files. That's a pain.

    It just kind of kills the convenience of a simple shell script. I would probably end up writing a makefile to manage all of this, and at that point I am only a hop, skip and a jump away from using a compiled language instead of shell.

  • somat 4 days ago

    Hopefully nothing. POSIX is, or at least it should be, a descriptive standard. This is why POSIX is so terrible, and why POSIX is so great.

    The way I feel POSIX and other descriptive standards work best is when they describe what everyone is already doing. This is opposed to prescriptive standards, which try to focus on the "correct" way to do something. Prescriptive standards tend to be over-engineered and may or may not actually work.

    see also: descriptive and prescriptive dictionaries. http://www.englishplus.com/news/news1100.htm

    • Flimm 4 days ago

      Both prescriptive standards and descriptive standards have their uses. If POSIX is a prescriptive standard, then maybe another standard should exist that is descriptive.

      • lifthrasiir 4 days ago

        Keep in mind that the Web standard eventually became prescriptive because descriptive standards failed to catch up. Likewise it can be argued that descriptive standards for the common OS interface are no longer usable.

        • vacuity 4 days ago

          To be crass, description is only useful for existing things and prescription hinders making innovative things. I think social forces make it natural that standards are treated both descriptively and prescriptively, and that too leads to angst. Case in point, POSIX was once more descriptive, but then people wanted backwards compatibility for existing and new OSes, which made it more prescriptive. The takeaway is that ad-hoc things become permanent once they are too difficult to remove, and then people are sad. Nothing is immune, so just make reasonable attempts for the standard and the culture to harmonize for a specific purpose.

    • zelphirkalt 4 days ago

      That is also a way to never progress beyond the status quo.

  • oguz-ismail 4 days ago

    Nitpick re: https://blog.toast.cafe/posix2024-xcu#fn:6

        SRC != ls *.c
    
    is fine in a makefile as far as POSIX is concerned, because:

    > Applications shall select target names from the set of characters consisting solely of slashes, hyphens, periods, underscores, digits, and alphabetics from the portable character set
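
    A small sketch of the rule quoted above, to show which names a conforming makefile is allowed to rely on. The regular expression is my reading of that sentence (ASCII letters standing in for "alphabetics from the portable character set"), and the candidate names are made up.

```python
import re

# Slashes, hyphens, periods, underscores, digits and ASCII letters, per the quoted rule.
PORTABLE_TARGET = re.compile(r"[A-Za-z0-9/._-]+")

def is_portable_target(name):
    """True if every character of name is in the quoted set."""
    return PORTABLE_TARGET.fullmatch(name) is not None

candidates = ["main.c", "src/util-2.c", "my file.c", "foo\nbar.c"]
portable = [n for n in candidates if is_portable_target(n)]

print(portable)
```

    Names with spaces or newlines fall outside the set, so a makefile that only promises to handle conforming target names never has to split them correctly.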

  • InfiniteRand 4 days ago

    I kind of would like to see a POSIX-strict profile which incorporates commonsense things like no newlines in file names (by commonsense I mean avoiding things that have repeatedly, over many years, tripped up programmers in frustrating ways). Operating systems (or distributions) could opt into this profile, and then someone programming on such an operating system could rely on the constraints of the profile, and additional facilities that need to rely on those constraints could be added on top. Hopefully, the use of the profile would gradually spread.

  • nh2 4 days ago

    > future editions will not require c17, but will simply require whatever C specification version is the most modern and already implemented by major toolchains

    Is this really good?

    If you can't rely on anything concrete being guaranteed, and it is open to interpretation what "modern" or "major toolchains" are, why have a standard?

  • rurban 3 days ago

    EILSEQ for \n finally, but why not for Unicode confusables? Path names are identifiers, and as such need to be identifiable. That means stricter rules than for plain buffers (not talking about strings).

  • pabs3 4 days ago

    Since old-POSIX systems will be in use for some time, I wonder how many things will be able to switch to using the new capabilities. And how many OSes already support all of the new changes.

  • guerrilla 4 days ago

    Why was `isascii()` removed?

    (Listed in the Sortix article linked in OP.)

  • snvzz 4 days ago

    This is a surprisingly greedy POSIX update.

    • BoingBoomTschak 4 days ago

      As someone who truly limits himself to POSIX when he can, I think they needed to push it forward to not become completely obsolete. I'm really sad `mktemp -d` and `set -o nullglob` didn't make the cut, but that's how it is, I guess.

      • ykonstant 4 days ago

        A bespoke `mktempd` script is one of the first things I install in a new system. Fortunately, it is not too hard to make a `mktemp -d` compatible script with POSIX tools. `set -o nullglob` is another story :D

        • pxeger1 4 days ago

          It's quite hard to write mktemp securely[1]. It would be great if POSIX didn't make people attempt to do that error-prone task themselves.

          [1]: There's some explanation in this recent post: https://dotat.at/@/2024-10-22-tmp.html

          • ykonstant 4 days ago

            This is correct (though of course a decent `mktempd` script will deal with the listed problems or crash loudly on failure), and there are even more reasons to avoid /tmp.

            Unfortunately, it is one of the very few directories that are somewhat POSIX-"guaranteed" writable by a non-root user and the fact that on modern systems it is usually mounted on a tmpfs makes it very attractive for pure POSIX usage without rich array support.

            If you have mount permissions, of course, you should tell your `mktempd` to base its directory on a private tmpfs.
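
            For comparison, a sketch of what a safe "make me a private temporary directory" primitive looks like when the platform provides one. Python's `tempfile.mkdtemp` stands in for `mktemp -d` here; it is not a POSIX shell solution, and the `job-` prefix is arbitrary.

```python
import os
import stat
import tempfile

# mkdtemp picks an unpredictable name and creates the directory in one step,
# failing rather than reusing a path that already exists.
workdir = tempfile.mkdtemp(prefix="job-")   # "job-" is an arbitrary prefix
try:
    # Permission bits only: the directory is created for the owner alone.
    mode = stat.S_IMODE(os.stat(workdir).st_mode)
    print(workdir, oct(mode))
finally:
    os.rmdir(workdir)
```

            The points a hand-rolled script has to reproduce are the same ones visible here: an unguessable name, creation that fails if the path exists, and owner-only permissions from the start.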

  • ggm 4 days ago

    File names with / in them

  • cryptonector 4 days ago

    > The problem is that pathnames (as per section 3.254 of POSIX 2024) are just strings (meaning they can contain any bytes except the NUL character), [...]

    Pathname components can contain neither NUL nor '/'.

    Re: `find -print0` / `xargs -0`:

    > Previous POSIX releases have considered -print0 before, but never ended up adopting it because using a null terminator meant that any utility that would need to process that output would need to have a new option to parse that type of output.

    What nonsense. Just add the `-0` or similar options as needed.

    > More precisely, this approach does not resolve our original problem. xargs(1p) can’t sort, and therefore we still have to handle that logic separately, unless sort(1p) also grows this support, even after read(1p). This problem continues with every other type of use-case. Importantly, it breaks the interoperability that POSIX was made to uphold.

    More nonsense.

    > A bunch of C functions are now encouraged to report EILSEQ if the last component of a pathname to a file they are to create contains a newline (put differently, they’re to error out instead of creating a filename that contains a newline).

    Ok, that's tolerable. Ditto utilities (notice here they were able to make a list of utilities).
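
    A sketch of the encouraged behaviour, to show how small the check is. `create_checked` is a hypothetical helper that imitates the rule in user space; it is not what any particular libc or kernel does, and the file names are made up.

```python
import errno
import os
import tempfile

def create_checked(path):
    """Create an empty file, refusing a final component that contains a newline.

    Hypothetical helper: it imitates the behaviour POSIX 2024 encourages,
    it is not what any particular libc or kernel does.
    """
    if "\n" in os.path.basename(path):
        raise OSError(errno.EILSEQ, os.strerror(errno.EILSEQ), path)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)

with tempfile.TemporaryDirectory() as d:
    create_checked(os.path.join(d, "ok.txt"))
    try:
        create_checked(os.path.join(d, "bad\nname"))
        rejected = None
    except OSError as e:
        rejected = e.errno
    created = os.listdir(d)

print(rejected == errno.EILSEQ, created)
```

    Only the last component is inspected, matching the quoted wording, so existing directories with newlines in their names remain reachable.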

    • chasil 4 days ago

      Note that GNU sort has...

      -z, --zero-terminated: end lines with 0 byte, not newline

  • EdSchouten 4 days ago

    strlcpy()!

  • johnisgood 4 days ago

    > Anyway, POSIX 2024 now requires c17, and does not require c89

    I wish it would have been c99. What does c17 add exactly, more C++-esque complexity or not? Why was it not c99 (or perhaps even c11) over c17? Genuine questions.

    • lifthrasiir 4 days ago

      > What does c17 add exactly, more C++-ish bullshit or not?

      Multithreading support and such (atomics, thread-local storage and a guarantee that `errno` is in TLS), explicitly aligned types and allocations, dedicated types for strings known to be Unicode, _Noreturn, _Generic, _Static_assert, anonymous structs and unions in the nested position, quick_exit, timespec, exclusive mode ("x") in f[re]open, CMPLX macros.

      I'm not even sure which one can be C++-ish bullshit possibly except for about two points:

      - Multithreading does seem farfetched for casual users. In fact, I do think it could have been minimized without any actual harm, but multithreading itself needed to be specified because it greatly affects the memory model. (Before C11, C had no thread-aware memory model and different threading implementations were subtly different beyond what the standard stated.) Even JavaScript, originally without any notion of threads, eventually got a thread-aware memory model due to shared web workers. But that never meant JS itself needed multithreading support in its standard library, and C could have done the same.

      - `_Generic` is even more debatable, though I believe it was the only way forward once we accept <tgmath.h>, which is known to be a response to Fortran (other responses include `restrict`) and was impossible to implement in a portable manner before C11. As long as it retains its scary underscore and title case, I guess it's fine.

      • gpderetta 4 days ago

        Most importantly, POSIX already has existing multithreading facilities in POSIX threads, so it is imperative that they are reformulated in terms of the C++11/C11 memory model.

      • johnisgood 4 days ago

        You quoted me before my edit, but fair enough. I do like the "atomics" support.

        > "guarantee that `errno` is in TLS"

        I suppose that does not mean that I can just avoid setting errno to 0 before calling a function after which I check for errno, right?

        Yeah, I do have an issue with stuff like "_Generic" but I assume I can just simply not use it.

        What is "quick_exit" exactly and what does it solve?

        As for multithreading, I stick to pthread. Are any of the new features a replacement for that, or what?

        At any rate, why C17 over C11 then?

        • lifthrasiir 4 days ago

          C17 is a bugfix version of C11 (the next major revision would be C23). The exact list of fixes is available in [1]. Mandating C11 instead of C17 when both are available seems not really useful now.

          You have the correct insight about errno. The new guarantee only means that other threads cannot mess with your errno, but clearing errno is still useful within an individual thread.

          exit is not guaranteed to work correctly when called simultaneously from multiple threads, while quick_exit will be okay even in that situation. I think this behavior was not even specified before C11, and only specified after observing existing implementations.

          It is expected that libc threading routines are thin wrappers around pthread in Linux. That's why I do think it can be minimized; the only actual problem before C11 was the lack of a thread-aware memory model. No need to actually be able to create threads from libc to be honest, especially given that each platform now almost always has a single dominant threading implementation like pthread.

          [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2244.htm

          • johnisgood 4 days ago

            My last question would be: is it "OK" to use pthread in my code or are there any alternatives (i.e. "best way") when using C17?

            • lifthrasiir 4 days ago

              No, just use pthread. There are some useful pthread APIs missing from C17 anyway too.

              • johnisgood 4 days ago

                Thank you for your answers, it is much appreciated.

                I suppose I will not use "quick_exit" either in that case. I have many workers and a job queue mutex, along with pthread_cond_wait and pthread_mutex_{lock,unlock}, and when the "job_quit_flag" is set to true, that means all jobs are done and I am ready to return NULL.

      • cryptonector 4 days ago

        > guarantee that `errno` is in TLS

        I mean, that is already true.

        • lifthrasiir 3 days ago

          There is no such guarantee in C99:

          7.5 ¶2: [...] and `errno` which expands to a modifiable lvalue that has type `int`, the value of which is set to a positive error number by several library functions. It is unspecified whether `errno` is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual object, or a program defines an identifier with the name `errno`, the behavior is undefined.

          7.5 ¶3: The value of `errno` is zero at program startup, but is never set to zero by any library function. The value of `errno` may be set to nonzero by a library function call whether or not there is an error, provided the use of `errno` is not documented in the description of the function in this International Standard.

          The fact that `errno` can expand to an lvalue does reflect what is required for multithreading implementations among others, but that's about all.

          • cryptonector 3 days ago

            Nor is it in POSIX, but it's true of all POSIX-like systems that support threading.