What's New in POSIX 2024

(blog.toast.cafe)

237 points | by signa11 9 months ago

185 comments

  • enriquto 9 months ago

    > We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now? Ok, that might be a bit much all at once. We’re heading there though!

    Oh my god. This makes me so happy. This is the most lovely thing I've read in the world of computing since the unix gods decided that newlines were to be a single character.

    The philosophy underlying the sentence "Wouldn’t it be nice if the naive scripts were just correct now?" is incredibly positive. We are surrounded by arrogant jerks who break old code by aggressively enforcing stricter compliance with some stupid rules. But here come these POSIX heroes who do the exact opposite: make old code correct! There is hope in mankind after all.

    • nneonneo 9 months ago

      Rather unfortunately, I happen to have a handful of files on my machine with newlines in them (the filenames were programmatically generated from a summary of their contents). I loathe the possibility that my shell tools are going to suddenly crash when confronted with these weird files, rather than just producing some slightly silly output. I wish we'd standardized the behaviour of just escaping such characters as `\n/\r` or `^J/^M`...

      • jraph 9 months ago

        They did the right thing for this: make the tools fail on file creation, but not on existing files.

        I guess it's still advisable to rename those files, I don't know how things like cp, mv or rsync will behave when copying such files in the future.

        • nneonneo 9 months ago

          No, they did not do the right thing:

          > the following utilities are now either encouraged to error out if they are to create a filename that contains a newline, and/or encouraged to error out if they are *about to print a pathname that contains a newline* in a context where newlines may be used as a separator

          It then proceeds to list a bunch of utilities including diff, file, find, grep, head, du, etc., none of which create files directly.

          These utilities could be updated to reject newlines in file paths if they're going to print in a "newline delimited" form - but for some of these utilities, that's the only available form.

          • jraph 9 months ago

            > error out if they are about to print a pathname that contains a newline in a context where newlines may be used as a separator

            But that's already broken. This is a situation where filenames with newlines in them are indistinguishable from two filenames in outputs. So instead of producing subtly broken output, tools are encouraged (not forced) to explicitly fail with a lot of noise.

            The "in a context where newlines may be used as a separator" part of this sentence is very important.

            IIUC the tools are still allowed to succeed in non broken situations, for instance when a null separator is used and not a new line character. And I can't imagine the tools you listed will start breaking in situations that worked (apart from file creation - indeed this will likely start breaking, and newline characters in filenames need to be considered deprecated and things using them to be fixed).

            This is strictly better IMHO (if one thinks that newlines in files are not worth the troubles given how things work in POSIX, especially the part where things are line-based and new line characters have quite some significance)
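
            A minimal sketch of that ambiguity, assuming a find that supports -print0 and a filesystem that still accepts newlines in names (the temporary directory and the file name are made up for illustration):

            ```shell
            dir=$(mktemp -d)
            nl=$(printf '\nx'); nl=${nl%x}     # a lone newline character
            : > "$dir/one${nl}file"            # ONE file with a newline in its name

            # Newline-delimited listing: the single file looks like two entries.
            by_lines=$(find "$dir" -type f | wc -l | tr -d ' ')

            # NUL-delimited listing: count the terminators, which is unambiguous.
            by_nuls=$(find "$dir" -type f -print0 | tr -cd '\000' | wc -c | tr -d ' ')

            echo "lines: $by_lines, NUL-terminated names: $by_nuls"
            rm -rf "$dir"
            ```

            The newline-delimited count comes out as 2 and the NUL-delimited count as 1, which is the "already broken" case the quoted sentence is about.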

        • ykonstant 9 months ago

          If your file system allows them, be careful with symlinks though!

          • jraph 9 months ago

            Why, specifically?

            I'm convinced we will need to be careful with symbolic links related to new line characters in filenames, but I'm curious of which specific aspect you had in mind.

            • ykonstant 9 months ago

              Oh, nothing specific to newlines. Just, when you rename files to fix newlines, you need to check if they break symlinks pointing to them.

              For instance, I had project folders for my individual research projects. In order to have a central repository of resources and not have copies of multi-megabyte pdfs in each folder, I put all referenced papers in a single directory and symlinked them for each project that needed them. Later, I wanted to rename the papers to remove newlines. The symlinks complicated this process quite a bit!
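
              A rough sketch of the check, assuming nothing beyond POSIX test(1); the directory layout and file names here are invented stand-ins, not my actual setup:

              ```shell
              dir=$(mktemp -d)
              : > "$dir/paper.pdf"
              ln -s "$dir/paper.pdf" "$dir/link.pdf"
              mv "$dir/paper.pdf" "$dir/renamed.pdf"   # the rename leaves the link dangling

              broken=0
              for l in "$dir"/*; do
                  # -L: the entry is a symlink; ! -e: its target no longer exists
                  if [ -L "$l" ] && [ ! -e "$l" ]; then
                      echo "dangling: $l"
                      broken=$((broken + 1))
                  fi
              done
              rm -rf "$dir"
              ```

              Running a loop like this after a batch rename lists the links that need to be re-pointed.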

      • ykonstant 9 months ago

        In academia, I get (and used to create) pdfs with names like:

        "On the number of

        associative foobars

        of degree blah -

        Johnson and Anderson.pdf"

        all the time. It is very convenient for non-technical academics to have a descriptive file name, and to be able to see it entirely in the navigator they use newlines.

        • oneeyedpigeon 9 months ago

          Oh god. I already get upset enough by spaces in a file name, although I realise that fight is basically lost now!

          • curt15 9 months ago

            Didn't Windows name "Program Files" with a space to force application developers to handle spaces in paths properly?

            • mmcdermott 9 months ago

              For the longest time you could get away with this in cmd:

              > dir c:\progra~1

              So if forcing people to handle spaces was the goal, it took a long time to force it.

            • pjmlp 9 months ago

              In theory yes, in practice to this day many people don't bother to learn how to deal with pathnames in a proper way.

            • ape4 9 months ago

              Not to mention C:\Program Files (x86)

            • InfiniteRand 9 months ago

              I just got used to installing things I need to interact with in a program into a folder named C:\workspace

          • enriquto 9 months ago

            As a fellow spaces-in-filenames-hater, the fight is not lost. We are on the brink of winning it; it's just a mount option away!

            While we cannot avoid that people hit the spacebar when writing a filename on a gui, this does not mean at all that the resulting filename itself need contain a plain space character. Those spaces can and should be transparently translated to non-breaking space characters at some point. Maybe by the gui itself, or more robustly by the filesystem. This would make everybody happy: gui users and naive shell script writers.

            • poincaredisk 9 months ago

              >Those spaces can and should be transparently translated to non-breaking space characters at some point

              Why? This just introduces more complexity and interoperability headaches for seemingly no reason.

            • chasil 9 months ago

              It would be really nice if there was a mount option that would quietly remove spaces in filenames, or convert them to an underscore.

              If I had it, I would use it today.

            • vbezhenar 9 months ago

              Yep, works today:

                  sh-3.2$ f='Hello world'
                  sh-3.2$ echo $f
                  Hello world
                  sh-3.2$ for i in $f; do echo $i; done
                  Hello
                  world
              
                  sh-3.2$ f='Hello\xC2\xA0world'
                  sh-3.2$ echo $f
                  Hello world
                  sh-3.2$ for i in $f; do echo $i; done
                  Hello world
          • jonhohle 9 months ago

            Convince (force?) your team to use make and soon everyone will forget spaces in file names are even a thing!

            • taneliv 9 months ago

              My team already uses `make` but there's no reason for me to run it in my Downloads folders. File names in there are sometimes wild. Yet I expect command line tools to work with them. If they will cease to do so, I will have to start using non-POSIX variants of those tools, I guess.

        • jodrellblank 9 months ago

          I don't know who "the Austin Group" mentioned in the article are, but how come they "could not find a single use-case for newlines in pathnames besides breaking naive scripts" when legitimate use-cases are so easy to find?

          (And if they're that incompetent, why does the article imply they are worth quoting and listening to?)

          • gpderetta 9 months ago

            It is [1] the joint working group that for the last 25+ years has been responsible for both the POSIX standard and the Single Unix Specification. It emerged after the UNIX wars as a consolidation of the various splintered UNIX standardization efforts (POSIX itself, X/OPEN, OSF, etc).

            [1] https://en.wikipedia.org/wiki/Austin_Group

          • nilamo 9 months ago

            Is that legitimate? A path name is just a unique identifier for a file, IMO it doesn't make sense to put a whole novel in there. If anything, a giant summary like that should be in the meta tags?

            • jodrellblank 9 months ago

              In what way is it not legitimate? It's not an accident, bug or data corruption. Someone put it there for a reason, and it benefits their use case. That's as legitimate as it gets.

            • ykonstant 9 months ago

              That's a core part of the problem: a path name is NOT just a unique identifier for a file. Desktop operating systems and their classical utilities conflate the "unique identifier" and whatever "displayed title" of a file through which the end user interacts with the file.

              Users care about "titles" or "summaries" of files, not "filesystem identifiers"; as long as the two are conflated, non-technical users will use the identifier to write titles and thus make the file easy to locate in an interactive GUI. Meta tags are not even in the cognitive horizon of most people.

          • shthed 8 months ago

            Name one legit use case.

            • jodrellblank 8 months ago

              ... the use case in the parent comment I was replying to.

              And no I'm not going to copy that here for you to quip "that's not a legitimate use case". Make an effort to make a point and support it with better justification than "because I said so".

        • Flimm 9 months ago

          How do these non-technical academics even create a PDF file with a name like that?

          • ykonstant 9 months ago

            Right click, rename, enter, enter, enter (until the entire file name is visible on the box)? That's how I did it when I used Windows.

            Edit: now I remember the most basic way: open the pdf, select and copy the title, click on rename and paste from clipboard. Works great to get the file name with the newlines exactly as they are on the title!

            • zelphirkalt 9 months ago

              Doesn't <enter> just confirm the typed input for the filename and finish the renaming? How does that insert newlines?

            • abenga 9 months ago

              I don't know if this is a Linux thing, but when renaming a file, when I press enter, I apply the new name, the file manager doesn't add a newline.

        • ykonstant 9 months ago

          I am interested in hearing the rationale for downvotes explicitly. I am describing a reality that exists and must be taken into account. Why are you downvoting?

      • nasretdinov 9 months ago

        The thing is, it's hard to predict what would happen to those scripts regardless... E.g. try naming your files "-rf" and see how many things break :)

        • ykonstant 9 months ago

          A correct script will have no problems with "-rf" or any other file name. I have (and recommend script writers make their own) a directory hierarchy of "dangerous" file names to test scripts.

          For example, it contains a directory where all file and subdirectory names are in unary, consisting only of repetitions of the newline character. A correct script should be able to enumerate, access and modify files in there without issue.
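
          As a small illustration (a sketch only; the scratch directory and the names in it are hypothetical), the "-rf" case is handled by globbing as ./* and by ending option parsing with "--":

          ```shell
          old_pwd=$PWD
          dir=$(mktemp -d)
          cd "$dir" || exit 1
          : > ./-rf                  # a file literally named "-rf"
          : > ./keep

          # Globbing as ./* keeps every name from looking like an option.
          count=0
          for f in ./*; do
              count=$((count + 1))
          done

          rm -- -rf                  # "--" ends option parsing, so only that file goes
          survived=no
          [ -e ./keep ] && survived=yes

          cd "$old_pwd" && rm -rf "$dir"
          ```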

        • redserk 9 months ago

          If one really wanted to embrace chaos, introduce this as a new team file naming standard for "risk finding" files ;)

        • tetha 9 months ago

          I do enjoy "ls *; touch -- -lisah; ls *" as a fun little brainteaser for those uninitiated to this behavior.

        • nneonneo 9 months ago

          export TMPDIR=" / "

          to surprise the next person or script to do "rm -rf $TMPDIR/foo"...
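
          A harmless way to see what rm would receive, as a sketch: TMPDIR_DEMO, count_args and first_arg are names invented for this demo, and nothing is deleted.

          ```shell
          TMPDIR_DEMO=" / "                        # stand-in for the TMPDIR above
          count_args() { echo "$#"; }
          first_arg() { printf '%s\n' "$1"; }

          unquoted=$(count_args $TMPDIR_DEMO/foo)  # field splitting yields "/" and "/foo"
          first=$(first_arg $TMPDIR_DEMO/foo)      # what rm -rf would see first
          quoted=$(count_args "$TMPDIR_DEMO/foo")  # quoting keeps it as one argument

          echo "unquoted: $unquoted args, first is '$first'; quoted: $quoted arg"
          ```

          Unquoted, the expansion becomes two arguments and the first one is "/"; quoted, it stays a single (nonexistent) path.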

      • funcDropShadow 9 months ago

        Of course, there is an xkcd for that: https://xkcd.com/1172/

      • stouset 9 months ago

        Dude, just fix the filenames.

    • anal_reactor 9 months ago

      It's a bandaid on a wider problem: the design of Unix shell is bonkers and the whole thing should be deleted. Why? Because I haven't seen any other tool ever have so many pitfalls. Take n random languages and m random developers and tell them to loop over a string array and print its contents, and count how many correct programs you get on average per language. There will be easy languages, then difficult languages, then a huge gap, then Unix shell because in your random sample you managed to get one guy who has PhD in bash.

      • vbezhenar 9 months ago

        The main problem is using text as a common format between different applications.

        First: text is not well defined. Is it ASCII? Is it UTF-8? Some programs can spew UTF-32 with proper locale configured, it's a mess.

        Second: encoding and decoding of objects to text is not defined at all. Those problems with filenames are just one example. Using newline as a separator is a natural thing that is easy to implement, yet it is wrong.

        In my opinion two things should be done:

        1. Standardise on UTF-8. No other encodings allowed.

        2. Standardise on JSON. It is good enough to serve as universal exchange format, tools like `jq` exist for some time now.

        So any utility must read and write JSON objects with some standard env set. And shells can be developed with better syntax to deal with JSON. This way you can write something like

        `ps aux | while read row; do echo ${row.user} ${row.pid}; done`

        • poincaredisk 9 months ago

          >It is good enough to serve as universal exchange format, tools like `jq` exist for some time now.

          Please don't use that underdefined joke of a spec. Define "PosixJson" and use that instead. Right now it's not even clear what the result of parsing {"a": 1234678901234567890} is. Is this a parse error? A bigint? A float/double? Quiet wraparound? Something else? I've seen all these behaviors in real world JSON implementations across different languages.

        • aloisklink 9 months ago

          POSIX does actually define what a "text file" is, but the definition is a bit unusual:

          See https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...

          > 3.387 Text File

          > A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.

          So, if you have some non-printable characters like BEL/␇/ASCII 0x07, that's still a text file.

          (and I believe what bytes count as a valid character depend on your `LC_CTYPE`).

          But the moment you have a line longer than {LINE_MAX} bytes (which can depend on which POSIX environment you have), suddenly your text file is now a binary file.

          • WJW 9 months ago

            Kind of a weird definition indeed. One edge case: the definition states the file must contain characters, so presumably zero length files are out. But then how could you have zero lines?

            • Ukv 9 months ago

              POSIX defines a line as:

              > 3.185 Line

              > A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

              So a file with some characters but no trailing newline is reported by `wc -l` as having zero lines.
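
              A quick check of that, as a sketch (the tr only strips the padding some wc implementations add):

              ```shell
              # wc -l counts <newline> characters, i.e. complete POSIX lines.
              no_newline=$(printf 'abc' | wc -l | tr -d ' ')
              with_newline=$(printf 'abc\n' | wc -l | tr -d ' ')
              echo "without trailing newline: $no_newline, with: $with_newline"
              ```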

            • rascul 9 months ago

              An empty file is not hard to make. It's just a matter of creating the file and not writing to it.

        • arghwhat 9 months ago

          What cursed madness have you hit that spits out UTF-32 under normal conditions?! That can only be a bug - UTF-32/UCS-4 never saw external use, and has only ever been used for in-memory fixed-width character representation, e.g. runes in Go.

          You never have to worry about whether you're dealing with ASCII vs. UTF-8, but rather if you're dealing with UTF-8 vs. ISO-8859-1, or worse, Shift JIS or similar.

          • vbezhenar 9 months ago

            I think that I hit that with Java:

                % java -Dfile.encoding=UTF-32 Test | hexdump -C
                00000000  00 00 00 48 00 00 00 65  00 00 00 6c 00 00 00 6c  |...H...e...l...l|
                00000010  00 00 00 6f 00 00 00 2c  00 00 00 20 00 00 00 77  |...o...,... ...w|
                00000020  00 00 00 6f 00 00 00 72  00 00 00 6c 00 00 00 64  |...o...r...l...d|
                00000030  00 00 00 0a                                       |....|
                00000034
            
            
            From quick googling it seems that glibc does not support it, so it should not happen.
            • Netch 8 months ago

              > it seems that glibc does not support it

              `iconv` does, and that is enough in common use. Along with tons of eerie EBCDIC/whatever...

          • Netch 8 months ago

            > That can only be a bug - UTF-32/UCS-4 never saw external use

            I regularly use `iconv -t utf-32be | hd` to look what a bizarre sequence is denoting yet another weird symbol like an itchy hedgehog.

            And what is a real reason to disallow this?

        • ezoe 9 months ago

          Don't even assume UTF-something is the only character encoding. There are so many existing character encodings before Unicode. It's still widely used.

        • oneeyedpigeon 9 months ago

          I think a lot of tools should support json as well as plain text. Probably the latter by default, and the former with a "-o json" or similar option. I'm fine with wc giving me `5`, I'd prefer that to `{ "characters": 5 }`.

        • anal_reactor 9 months ago

          True, but this would be immensely difficult to pull off, because how do you convince other people to write programs that produce actual working JSON?

        • nly 9 months ago

          The primary purpose of command line program output is to convey information to a human, not to other programs.

          Command line scripting is supposed to be adhoc and hack.

          • consteval 9 months ago

            There are exchange formats that are well-defined enough to be useful to many computers while also being readable enough to be traversed by human eyes. There's no reason to make everything ad-hoc, you don't gain much by that. You also control the shell itself - there's no reason you can't display object representations in a pretty way.

          • mdavid626 9 months ago

            I disagree that it is supposed to be adhoc and hack. Look at PowerShell.

          • anthk 9 months ago

            That's under limited OSes such as DOS. Under Unix, piping has been the philosophy.

        • matrss 9 months ago

          JSON itself is bad for a streaming interface, as is common with CLI applications. You can't easily consume a JSON array without first reading it in its entirety. JSONL would be a better fit.

          But then, how well would it work for ad-hoc usage, which is probably one of the biggest uses of shells?

        • pif 9 months ago

          > The main problem is using text as a common format between different applications.

          If you can't get the immensity of the cleverness of Unix foundations, you should not talk about them.

          That idea is what made it possible for you to type that sentence in the first place.

      • akira2501 9 months ago

        > I haven't seen any other tool ever have so many pitfalls.

        I haven't seen any other tool with so much general utility and availability.

        > to loop over a string array and print its contents

        Is incredibly easy in bash and bash like shells. As highlighted the issue is that tools like 'ls' don't create "a string array." They create one giant string that has to be parsed. The rules in the shell are different than in other languages but it /will/ do most of the parsing for you, or all of it, if you do it carefully.
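
        A sketch of the difference, with made-up file names in a scratch directory: parsing the output of ls splits on whitespace, while a glob hands the shell one word per file.

        ```shell
        dir=$(mktemp -d)
        : > "$dir/two words.txt"
        : > "$dir/plain.txt"

        # Parsing ls: the output is one string, split on every space and newline.
        from_ls=0
        for f in $(ls "$dir"); do
            from_ls=$((from_ls + 1))
        done

        # Globbing: the shell produces one word per file, whatever the name contains.
        from_glob=0
        for f in "$dir"/*; do
            from_glob=$((from_glob + 1))
        done

        echo "ls: $from_ls items, glob: $from_glob items"
        rm -rf "$dir"
        ```

        With two files present, the ls loop runs three times and the glob loop twice.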

        This is a fine tradeoff. As evidenced by its wide usage and lack of convincing replacements.

        • anal_reactor 9 months ago

          > I haven't seen any other tool with so much general utility and availability.

          > availability

          That's the real reason why we use Unix shell. It's not good, but it's available. Like a cheap hooker.

          > but it /will/ do most of the parsing for you, or all of it, if you do it carefully.

          "It mostly works if you're careful" doesn't sound very convincing to me.

          • stephenr 9 months ago

            > but it's available. Like a cheap hooker.

            Username checks out.

          • akira2501 9 months ago

            > "It mostly works if you're careful" doesn't sound very convincing to me.

            Would you rather write your own parser?

      • blueflow 9 months ago

        Someone needs to come up with an interactive shell first, one that is comparable in usability. Then we can think about replacing the unix shell.

        I tried both python and lua interactively, but they are a pain when it comes to handling files. You have to type much more to get the same things done.

        • anal_reactor 9 months ago

          The bigger issue is the sheer momentum of Unix shell. Even if you come up with an alternative that is better by every objectively measurable metric, it's still going to be a monumental task to have it packaged with commonly used distros. Kinda like the "why can't the US switch to the metric system" problem.

          • blueflow 9 months ago

            People already use different shells, mksh, fish, and so on. With fish there is a non-posix shell in wide use.

            • oguz-ismail 9 months ago

              >wide use

              Five people around the globe isn't wide use.

          • azalemeth 9 months ago

            There's a direct cost in money, time and lives that has come from the US's adherence to their US Customary Units (which are often different to the old imperial units). People have literally died because of the confusion caused by having multiple systems of units in common use with ambiguous names (degrees, gallons, etc). Each year industry worldwide spends an enormous amount of money indirectly precisely because of this problem and it's still incredibly unlikely to be fixed within my lifetime.

            Bash-alternatives that are not completely compatible frankly just don't have a chance.

          • Netch 8 months ago

            OK let them add an explicit check to standard tools, and/or to open(), mkdir(), etc. with O_PORTABLECHARS. And an environment option to disable this check.

            Why do they force the restriction at the syscall level?

          • stephenr 9 months ago

            If it isn't distributed out of the box with every *nix-like OS, it inherently isn't "better by every objectively measurable metric" - distribution of a common, stable standard is a huge benefit in and of itself.

        • throw16180339 9 months ago

          I certainly have my complaints about Powershell, but it's got pretty good coverage, decent documentation, and cross platform support.

          • felixgallo 9 months ago

            if it weren't so irregular, inconsistent, spotty and tasteless, it'd be a great option.

        • nly 9 months ago

          Oil shell?

          https://www.oilshell.org/

          Compatible with most bash scripts

      • throwaway19972 9 months ago

        > the design of Unix shell is bonkers

        Compared to what?

        • mdavid626 9 months ago

          Powershell?

          • poincaredisk 9 months ago

            PowerShell's designers could learn from decades of programming language progress and especially shell usage. They could improve many aspects indeed. This doesn't mean that the original design is "bonkers", only that it's not perfect.

          • oguz-ismail 9 months ago

            Verbosity is a huge problem there

          • ggm 9 months ago

            FoundTheCamelCaseConvert.

            My God next you will say getopt() --longform is the bestest

      • dailykoder 9 months ago

        Works on my machine!

      • zelphirkalt 9 months ago

        This should not be as downvoted as it is. In a way shell is broken. The brokenness is in that it requires each command to serialize and deserialize again, considering all the weird things that can happen with the "all is a string" kind of approach, instead of having a proper data interchange format or even sending objects to next steps in the pipeline. This behavior is what necessitates even thinking about the changes listed in the post. We wouldn't even have that problem, if the design of shell was better thought out. Now we are dealing with decades of legacy built on these shaky foundations. I hate to admit it, but seems at least this aspect Powershell got right, whatever one may think about the rest of it.

        • chasil 9 months ago

          On my rhel7 system, the Debian dash shell is this large:

             $ ll /bin/dash
            -rwxr-xr-x. 1 root root 113536 Nov  5  2018 /bin/dash
          
          I happen to have an old powershell installed:

            $ rpm -qi powershell | grep Size
            Size        : 126588370
          
          A strict POSIX shell is always going to be vastly smaller, for many reasons.

          I would prefer that the POSIX shell was an LR-parsed language, but you can't have everything.

      • enriquto 9 months ago

        > loop over a string array

        Dear anal_reactor, what is a "string array"? I have used unix shells for nearly 30 years and never heard of them. And I consider myself a script-fu master!

        There are two array-like constructions in the shell: list of words (separated by spaces) and list of lines (separated by newlines). Both cases are implemented as a single string, and the shell makes it trivial to iterate through its components.

        • ManBeardPc 9 months ago

          That is exactly the problem many people have with it. Encoding „arrays“ this way is foreign to everyone who comes from „normal“ programming languages. Both variants lead to problems because either character can occur in elements, worst case scenario they contain both at the same time. I can see why this leads to confusion and bugs.

          • skydhash 9 months ago

            It’s like people saying they won’t learn French because it has a different grammatical structure. There’s no “normal” natural language. If you’re used to the C-like syntax, learning C-like language will be easy. But that’s not an argument to say Lisp is confusing.

    • account42 9 months ago

      Their proposed solution is not compatible with reality though where POSIX does not get to define what kind of files exist on filesystems you need to work with.

      All they did is introduce new error cases in C programs while not actually fixing anything for shell scripts.

      If anything, it's going to result in more exploits as people write shell scripts with the assumption that newlines cannot appear in filenames.

      • quotemstr 9 months ago

        In the real world, nobody writes shell scripts that handle newlines in filenames.

        • account42 9 months ago

          I do. Single files are handled with quotes around arguments just fine. For lists of files you need to use NUL as a separator. That's not really hard to do once you are aware of the problem but ergonomics could be better - which is something useful that POSIX could change.
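
          Roughly like this, as a sketch that assumes find -print0 and xargs -0 are available (the file names are invented for the demo):

          ```shell
          dir=$(mktemp -d)
          nl=$(printf '\nx'); nl=${nl%x}     # a lone newline character
          : > "$dir/a${nl}b.txt"             # name containing a newline
          : > "$dir/plain.txt"

          # Each NUL-terminated name arrives as exactly one argument to the inner sh.
          nargs=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

          echo "files seen by the inner sh: $nargs"
          rm -rf "$dir"
          ```

          The inner sh sees two arguments for two files, newline or not.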

          • quotemstr 8 months ago

            Most shell scripts that correctly handle newlines do so by accident.

    • zokier 9 months ago

      But they did not make old code correct. Filenames are still allowed to contain newlines. Shell scripts still need to be prepared to deal with that. Nothing really changed, they just added a feel-good half-measure.

      • quotemstr 9 months ago

        It's a step in the right direction. You have to understand that for decades a vocal group of Unix die-hards has opposed any limitations whatsoever on the bytewise content of file names. The newline restriction in this latest version of POSIX may be modest, but it represents a dam breaking. When (obviously) the sky doesn't fall, the next version of POSIX will have a lot more filename cleanup.

        • rixed 9 months ago

          Next step is to forbid newlines from file content itself, to fix conformant JSON parsers?

      • janderland 9 months ago

        This is pretty standard for a human run system. Gotta make the human feel good about an idea before they can do said idea.

        If you’re not familiar with humans, there are several manuals available online.

    • ezoe 9 months ago

      Don't assume UTF-8 is the only character encoding used in the wild. There are character encodings whose leading bytes are not as easily detectable as UTF-8's.

      • arghwhat 9 months ago

        In 2024, if you don't get the correct result decoding a text as UTF-8, the bug is the text, not the decoding. And luckily, adoption of UTF-8 in the past 30+ years has gone well enough that you don't need to worry.

        Caveats for cursed hardware standards demanding two-byte encodings like USB.

        • poincaredisk 9 months ago

          I hope you're happy in your ivory tower, but I personally work with a lot of files with other encodings, most often that weird utf16 (Windows), sometimes also legacy files with different ANSI encodings. Declaring "my decoder is fine, it's the text that is buggy" is not going to score a lot of points with my boss and clients.

          • arghwhat 9 months ago

            The only valid reason for still having files stored in legacy ANSI encodings is that their only use is input to software that has not been maintained for ~30 years and cannot be updated. That's fine because they're just binary inputs in a closed ecosystem that no one touches.

            But if they are supposed to be treated as text, then yes it's the text that's buggy - they should just be converted to UTF-8 once and have the originals thrown away.

            UTF-16 is something that Microsoft has cursed us with by inserting it into specifications (like USB) so that we cannot get rid of it, even if it never made any sense whatsoever. But those are in effect explicit protocols with a hard contract, very different from something where you would "assume an encoding".

          • zelphirkalt 9 months ago

            Shouldn't hurt to tell clients to right their weird proprietary software originated encodings though.

        • 1oooqooq 9 months ago

          why people assume utf8 had only know locale encoding still?

          you're probably guilty of the sin you preach and is showing wrongly decoded utf8 and don't even know.

    • hwc 9 months ago

      Now do that with all whitespace!

  • chasil 9 months ago

      - find(1p) now supports -print0
      - xargs(1p) now supports the -0 argument
      - newlines in filenames now should throw errors in many utilities
      - a compiler implementing the c17 standard is now required
      - ulimit is expanded
      - renice can use relative values
      - a timeout utility has been added
      - make adds support for $^ $+ ::= :::= != ?= +=
      - logger is improved
      - gettext is adopted
      - readlink and realpath are adopted
      - rm now supports -d to remove empty directories and -v for verbose
      - various improvements to printf, sed, test
    • greyw 9 months ago

      Looks like the BSD-family will have some implementing to do.

      • chasil 9 months ago

        I just booted OpenBSD 7.0 (which is a bit dated).

        The find utility has -print0, and xargs has -0. Notably, xargs also has -P for running processes in parallel.

        rm has both -d and -v.

        The renice command appears to be able to use relative adjustments with -n.

        There is a timeout command.

        There is a readlink command, but no realpath (but a manual page exists for it as a system call).

      • sneed_chucker 9 months ago

        Strict adherence to POSIX isn't a goal of any of the current BSDs is it?

        • washbear 8 months ago

          People get "POSIX compliance" confused with "Unix certification". The first is an API you implement, the second is a rubber stamp.

          All active Unix-like operating systems aim to implement the new interfaces as they're defined.

        • bryanlarsen 9 months ago

          I'm confident they'd accept patches.

        • saagarjha 8 months ago

          macOS

  • imrejonk 9 months ago

    This adds `set -o pipefail` to POSIX sh, which causes a whole pipeline to fail (non-zero exit code) if one or more of the commands in the pipeline fail.
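
    A minimal sketch of the difference, assuming a shell that already implements the option (bash is assumed here). The helper name run_pipeline is made up for illustration, and the subshell keeps the option from leaking into the caller:

    ```shell
    # run_pipeline MODE: run `false | true` in a subshell, optionally with
    # pipefail enabled, and print the exit status of the whole pipeline.
    run_pipeline() {
        (
            if [ "$1" = "pipefail" ]; then
                set -o pipefail
            fi
            status=0
            false | true || status=$?
            echo "$status"
        )
    }

    run_pipeline default    # only the last command (true) counts
    run_pipeline pipefail   # the failing `false` now fails the pipeline
    ```

    Without the option the pipeline reports the status of its last command; with it, a failure anywhere in the pipeline is reported.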

    • deskr 9 months ago

      If you're writing scripts, use that and don't forget -e and -u

        -e      Exit  immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status
      
        -u      Treat  unset variables and parameters other than the special parameters "@" and "*" as an error when performing parameter expansion
      • ykonstant 9 months ago

        For `set -u` I mostly agree. For `set -e` see my comment below and Greg's wiki: http://mywiki.wooledge.org/BashFAQ/105
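
        One of the surprises that page covers can be sketched as follows (step and demo_errexit are hypothetical names; the subshell is only there so errexit does not leak into the caller):

        ```shell
        # step: with errexit on, one might expect the bare `false` to abort
        # the script before the echo runs.
        step() {
            false
            echo "reached"
        }

        # demo_errexit: call step as the left operand of ||. errexit is
        # ignored for every command executed in that context, including
        # the commands inside the function body.
        demo_errexit() {
            (
                set -e
                step || echo "handled"
            )
        }

        demo_errexit
        ```

        The function runs to completion and succeeds, so neither the abort nor the `||` branch happens.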

        • deskr 9 months ago

          > and they still fail to catch even some remarkably simple cases

          I totally agree. Although I'd say that there isn't anything "remarkably simple" about writing a bash script. Anything in the shell scripting world that seems remarkably simple is just because one hasn't realised the ghosts and horrors that lurk in the shadows.

          But I'll use -e anytime. It feels like having a protective proton pack at least.

    • zelphirkalt 9 months ago

      Does it? It is not mentioned anywhere in the post. Can you post a reference to your source?

    • akdor1154 9 months ago

      Holy balls that's like Christmas!

    • rightbyte 9 months ago

      Really? Won't that break piping grep?

      • WJW 9 months ago

        Probably, so don't `set -o pipefail` in scripts that pipe into grep.

        • rightbyte 9 months ago

          Ah ok I read it as 'sets it by default' for some reason.

    • throwaway984393 9 months ago

      Sad. Use of that option is almost always a mistake. It only leads to undebuggable silent failures.

      • Joker_vD 9 months ago

        I'd rather both have this option and have it work reliably. It's ridiculous that

            export VAR=$(cmd1 | cmd2)
        
        does not count as a pipefail when cmd1 or cmd2 fail but

            VAR=$(cmd1 | cmd2)
        
        does, so the "correct" way to set an environment variable from a pipeline's output is actually

            VAR=$(cmd1 | cmd2)
            export VAR
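
        A small sketch of that asymmetry, assuming bash. DEMO_VAR and both function names are invented for the example, and pipefail is enabled only inside the command substitution so the caller's options stay untouched:

        ```shell
        # export_inline: the assignment is an argument to `export`, so the
        # status seen afterwards is export's own, not the pipeline's.
        export_inline() {
            status=0
            export DEMO_VAR="$(set -o pipefail; false | cat)" || status=$?
            echo "$status"
        }

        # assign_then_export: a plain assignment reports the status of the
        # command substitution, so the pipeline failure is visible.
        assign_then_export() {
            status=0
            DEMO_VAR="$(set -o pipefail; false | cat)" || status=$?
            export DEMO_VAR
            echo "$status"
        }

        export_inline        # prints 0
        assign_then_export   # prints 1
        ```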
      • ykonstant 9 months ago

        Pipefail is useful and very hard to emulate on pure POSIX; you need to create named fifos, break the pipeline into individual redirections and check for error on each line.

        And that is fine; but sometimes you want to treat a pipeline as a "single command" and then you can use pipefail to abort the pipeline on error. Then you can handle the error at the granularity of the entire pipeline without caring which part failed.

        Lastly, I am confused as to the "silent" failures; maybe you are thinking of combining this with `set -e`? Then yes, that is bad and I recommend against the combination; but then again, I and most advanced scripters recommend against shotgunning `set -e` in the first place. Use it in specific portions of the script when appropriate, and use proper error handling otherwise.
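
        For a sense of what the FIFO-based emulation involves, here is a rough two-stage sketch. two_stage and emit are hypothetical names, and `mktemp -d` is used for brevity even though it is not in POSIX itself; a longer pipeline would need one FIFO and one `wait` per extra stage:

        ```shell
        # two_stage PRODUCER CONSUMER: run "PRODUCER | CONSUMER" through a
        # named FIFO and return the first non-zero status of the two stages.
        two_stage() {
            ts_dir=$(mktemp -d) || return 1
            mkfifo "$ts_dir/pipe" || { rm -rf "$ts_dir"; return 1; }
            "$1" > "$ts_dir/pipe" &          # producer writes in the background
            ts_pid=$!
            ts_consumer=0
            "$2" < "$ts_dir/pipe" || ts_consumer=$?
            ts_producer=0
            wait "$ts_pid" || ts_producer=$?  # collect the producer's status
            rm -rf "$ts_dir"
            [ "$ts_producer" -eq 0 ] || return "$ts_producer"
            return "$ts_consumer"
        }

        emit() { printf 'x\n'; }
        two_stage emit cat
        two_stage false cat || echo "pipeline failed with status $?"
        ```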

        • zelphirkalt 9 months ago

          Why does `set -e` make a pipeline fail silently?

          • ykonstant 9 months ago

            `set -e` makes the script abort and is often used in lieu of proper error handling:

              set -e
              command
              command [fails]
              command
            
            Whether the above reports error or not depends on the command; when you have a pipeline failing in the above way, it is even sneakier:

              set -e
              command
              command | command | command [fails]
              command
            
            You are reliant on all commands in the pipeline being verbose about failure to signal error.

            None of the above is advisable. The advisable code is

              error_handler() { proper error handling; }
            
              command || error_handler "parameter"
              command || error_handler "parameter"
            
              { command | command | command; } || error_handler "parameter"
            
              {
              set -e
              exceptional section that needs to be bailed out
              set +e
              }
            
              command || error_handler "parameter"
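
            A runnable instance of the explicit-handling pattern, as a sketch only: the steps are stand-ins, and what error_handler should really do (log, clean up, exit) is an assumption left to the script author.

            ```shell
            # error_handler WHAT: placeholder handler that reports the
            # failing step on stderr.
            error_handler() {
                echo "error: $1 failed" >&2
            }

            # run_steps: every step is checked explicitly; errexit is not
            # involved, and the pipeline is handled as a single unit.
            run_steps() {
                true || { error_handler "step one"; return 1; }
                { printf 'a\nb\n' | grep -q c; } || { error_handler "pipeline"; return 2; }
                echo "done"
            }

            run_steps || echo "run_steps returned $?"
            ```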
  • relistan 9 months ago

    The history at the beginning of this is not correct. Two examples: the assertion that there was one compatible UNIX prior to United States vs AT&T, the statement that GNU and BSD started that same year. Very, very off.

    • unixhero 9 months ago

      Okay, but you would add more value if you could also state what is the correct order of things.

  • pelorat 9 months ago

    TIL the POSIX standard is still updated. Does it still suffer from the issues that make Linux break POSIX compatibility in some areas because they consider it a flawed standard?

  • Flimm 9 months ago

    Yes! Finally! Let's treat filenames with new lines as errors! I'm so delighted with this decision.

    • skissane 9 months ago

      The original request was to ban all bytes between 1 and 31.

      https://www.austingroupbugs.net/view.php?id=251

      At some point they decided to narrow the change to just ban the newline character.

      Which I personally think is a pity. Allowing escape in file names is a security risk because it enables you to embed ECMA-48 escape sequences in file names. Secure terminal emulators shouldn’t be made vulnerable by arbitrary escape sequences, but there are “too smart for their own good” terminal emulators out there that have escape sequences that let you do crazy things like run arbitrary executables.

      • ezoe 9 months ago

        There are many non-UTF-8/16/32 character encodings used in the wild which use these values in multi-byte character encoding. These values are used in the wild.

        I think the decision forbidding newline in pathname is also wrong. It may break tons of existing code.

        • skissane 9 months ago

          I wish Linux/etc had a mount option and/or superblock flag called “allow only sane file names”. And if you had that set, then attempting to create a file whose name wasn’t valid UTF-8, or which contained C0 or C1 controls, would fail. The small minority of people who really need pre-Unicode encodings such as ISO 2022 could just not turn that option on. And the majority who don’t need anything like that could reap the benefits of eliminating a whole category of potential bugs and vulnerabilities.
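
          Until something like that exists at the filesystem level, a script can approximate the check in userspace. This is only a sketch: is_sane_name is a made-up helper, bash is assumed for the `$'...'` quoting in the usage lines, whether `[[:cntrl:]]` also covers C1 controls depends on the locale, and the UTF-8 check is skipped when iconv is not installed.

          ```shell
          # is_sane_name NAME: succeed only if NAME contains no control
          # characters and (when iconv is available) is valid UTF-8.
          is_sane_name() {
              case $1 in
                  *[[:cntrl:]]*) return 1 ;;
              esac
              command -v iconv >/dev/null 2>&1 || return 0
              printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
          }

          is_sane_name "notes.txt" && echo "accepted: notes.txt"
          is_sane_name "two"$'\n'"lines" || echo "rejected: name with a newline"
          ```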

        • Joker_vD 9 months ago

          > There are many non-UTF-8/16/32 character encodings used in the wild which use these values in multi-byte character encoding.

          Like what? I am genuinely curious: Shift-JIS, GB2312, Big5, and all of the EUC variants do not use bytes that correspond to C0 characters in ASCII.

    • devit 9 months ago

      That's obviously impossible since it would break backward compatibility and the users' existing filesystems (and the Linux kernel will rightly never accept anything like that).

      The only reasonable fix is to enhance bash and shell IDEs to track for each variable whether it could possibly include all filename-valid characters (e.g. if it comes from read with no options then it can't contain \n) and warn (off by default unless stderr is a terminal) if they can't and it's used as a filename (conservatively determined when used as arguments to processes), and also warn when using find without -print0, etc. noninteractively and perhaps interactively as well.

    • IshKebab 9 months ago

      Why is that an issue?

      • shakna 9 months ago

        Run a program to list a directory. Everything that interfaces with that, will assume newline delimiters. Similar assumptions are baked into a lot of software.

        Enforcing that a newline isn't part of a path, ensures the security of those systems that are commonly relied on.

        • oguz-ismail 9 months ago

          Except no one's enforcing anything yet. Earlier versions of POSIX allowed rejecting filenames containing newlines, the newest version encourages it while mandating features required to handle such filenames safely (find -print0, xargs -0, read -d ''). So nothing's set in stone yet.
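
          A short sketch of the NUL-delimited style those features enable, assuming bash (which already has `read -d ''` and `$'...'`). count_paths and newline_demo are hypothetical helpers:

          ```shell
          # count_paths DIR: count regular files under DIR, reading one
          # NUL-terminated pathname per iteration so that an embedded
          # newline cannot split a name in two.
          count_paths() {
              find "$1" -type f -print0 | {
                  n=0
                  while IFS= read -r -d '' path; do
                      n=$((n + 1))
                  done
                  echo "$n"
              }
          }

          # newline_demo: compare a line-based count with the NUL-based one
          # in a scratch directory holding two files, one with a newline.
          newline_demo() {
              demo_dir=$(mktemp -d) || return 1
              touch "$demo_dir/plain" "$demo_dir/two"$'\n'"lines"
              naive=$(find "$demo_dir" -type f | wc -l | tr -d ' ')
              safe=$(count_paths "$demo_dir")
              rm -rf "$demo_dir"
              echo "naive=$naive safe=$safe"
          }

          newline_demo
          ```

          The line-based count is thrown off by the embedded newline, while the NUL-delimited loop sees exactly two pathnames.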

        • IshKebab 9 months ago

          > Everything that interfaces with that, will assume newline delimiters.

          Well, only badly written programs. nushell handles this fine, as will any program that doesn't try to do everything as plain strings:

            ~> touch "foo\nbar"
            ~> ls foo* | print
            ╭───┬──────┬──────┬──────┬──────────╮
            │ # │ name │ type │ size │ modified │
            ├───┼──────┼──────┼──────┼──────────┤
            │ 0 │ foo  │ file │  0 B │ now      │
            │   │ bar  │      │      │          │
            ╰───┴──────┴──────┴──────┴──────────╯
          
          However after reading it they're only making them illegal for the posix utilities from the 70s that aren't written properly, so I think that makes sense.
    • enriquto 9 months ago

      Next: spaces

      • lifthrasiir 9 months ago

        Still much better than mojibaked names.

        • enriquto 9 months ago

          What do you mean?

          • _ZeD_ 9 months ago

            What is the encoding of the filenames?

  • quotemstr 9 months ago

    > We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now?

    Finally. Now let's do the rest: https://dwheeler.com/essays/fixing-unix-linux-filenames.html

    Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook

    • cryptonector 9 months ago

      > Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook

      Ensuring normalization is hard. Where should you do it? There's only one good place: in the filesystem. But if you normalize on create then you'd better use the same form that everyone else uses, but, what's that? Input methods generally produce NFC, but there's no guarantee that they will not produce something else. HFS+ normalizes to NFD on create.

      ZFS uses form-insensitivity -- much like case-insensitivity, but for form. The reason ZFS went this way was exactly that HFS+ and input methods differ as to forms. I pushed hard for this way back when. IMO form-insensitivity is the best way forward.

      But as for guaranteeing that filenames are UTF-8... that's much harder. The best thing to do is to not allow the use of non-UTF-8, non-ASCII, non-C locales -- not a guarantee, but pretty good.

      • quotemstr 8 months ago

        Sure. Form-insensitivity is another good option. I'd actually argue for full case insensitivity too (like macOS), although I realize that it's probably a stretch.

        • cryptonector 9 months ago

          Case-insensitivity is also an option in ZFS, but honestly case-insensitivity drives me nuts, especially if it's not case-preserving. Oh, that reminds me, ZFS is form-insensitive, and form-preserving.

  • oguz-ismail 9 months ago

    Nitpick re: https://blog.toast.cafe/posix2024-xcu#fn:6

        SRC != ls *.c
    
    is fine in a makefile as far as POSIX is concerned, because:

    > Applications shall select target names from the set of characters consisting solely of slashes, hyphens, periods, underscores, digits, and alphabetics from the portable character set

  • BobbyTables2 9 months ago

    I really hate to say it, but the fretting about newlines used as delimiters after 50 years of misuse …

    … makes PowerShell start to look damn good.

  • somat 9 months ago

    Hopefully nothing, posix is, or at least it should be, a descriptive standard. This is why posix is so terrible, and why posix is so great.

    The way I feel posix, and other descriptive standards work best is when they describe what everyone is already doing. This is opposed to prescriptive standards, which try to focus on the "correct" way to do something; prescriptive standards tend to be over-engineered and may or may not actually work.

    see also: descriptive and prescriptive dictionaries. http://www.englishplus.com/news/news1100.htm

    • Flimm 9 months ago

      Both prescriptive standards and descriptive standards have their uses. If POSIX is a prescriptive standard, then maybe another standard should exist that is descriptive.

      • lifthrasiir 9 months ago

        Keep in mind that the Web standard eventually became prescriptive because descriptive standards failed to catch up. Likewise it can be argued that descriptive standards for the common OS interface are no longer usable.

        • vacuity 9 months ago

          To be crass, description is only useful for existing things and prescription hinders making innovative things. I think social forces make it natural that standards are treated both descriptively and prescriptively, and that too leads to angst. Case in point, POSIX was once more descriptive, but then people wanted backwards compatibility for existing and new OSes, which made it more prescriptive. The takeaway is that ad-hoc things become permanent once they are too difficult to remove, and then people are sad. Nothing is immune, so just make reasonable attempts for the standard and the culture to harmonize for a specific purpose.

    • zelphirkalt 9 months ago

      That is also a way to never progress beyond the status quo.

  • donatj 9 months ago

    To build an internationalized shell script I'll need to compile multiple .mo language files and distribute them alongside the script itself.

    For shell scripts part of a large system, that's probably fine. For small scripts, that's not very practical. You are not only adding a compilation step, you're also requiring distribution of multiple files. That's a pain.

    It just kind of kills the convenience of a simple shell script. I would probably end up writing a makefile to manage all of this and at that point I am only a hop skip and jump away from using a compiled language instead of shell.

  • Netch 8 months ago

    Filename character set and its interpretation shall be controlled per directory or, at least, per FS. This pertains not only to the permitted set, like with or without LF, but to collation rules as well (including case insensitivity with cases like Turkish/Crimean/etc. I/ı and İ/i). Also this shall include workarounds for already existing problems: if a directory already contains files I1 and ı1, there shall be a technique to deal with them separately even with a Turkish locale.

    But restricting this at syscall level is definite insanity, along with the excuses.

  • nh2 9 months ago

    > future editions will not require c17, but will simply require whatever C specification version is the most modern and already implemented by major toolchains

    Is this really good?

    If you can't rely on anything concrete being guaranteed, and it is open to interpretation what "modern" or "major toolchains" are, why have a standard?

  • InfiniteRand 9 months ago

    I kind-of would like to see a POSIX-strict profile which incorporates commonsense (by commonsense I mean avoiding things that repeatedly over many years have tripped up programmers in frustrating ways) things like no newline in file names. Operating systems (or distributions) could opt into this profile, and then someone programming on such an operating system could rely on the constraints of the profile and additional facilities could be added on that might need to rely on those constraints. Hopefully, gradually the use of the profile would spread.

  • guerrilla 9 months ago

    Why was `isascii()` removed?

    (Listed in the Sortix article linked in OP.)

  • rurban 9 months ago

    EILSEQ for \n finally, but why not for unicode confusables? Path names are identifiers, and as such need to be identifiable. Meaning stricter rules than just buffers (not talking about strings).

  • pabs3 9 months ago

    Since old-POSIX systems will be in use for some time, I wonder how many things will be able to switch to using the new capabilities. And how many OSes already support all of the new changes.

  • snvzz 9 months ago

    This is a surprisingly greedy POSIX update.

    • BoingBoomTschak 9 months ago

      As someone who truly limits himself to POSIX when he can, I think they needed to push it forward to not become completely obsolete. I'm really sad `mktemp -d` and `set -o nullglob` didn't make the cut, but that's how it is, I guess.

      • ykonstant 9 months ago

        A bespoke `mktempd` script is one of the first things I install in a new system. Fortunately, it is not too hard to make a `mktemp -d` compatible script with POSIX tools. `set -o nullglob` is another story :D
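
          For illustration only, the core of such a script might look like the sketch below. mktempd_sketch is a hypothetical name, the suffix derived from awk's time-seeded `srand()` is predictable, and this is not a substitute for a vetted implementation; it relies on `mkdir` (without -p) failing when the name already exists.

          ```shell
          # mktempd_sketch [BASE]: create a private directory under BASE
          # (default ${TMPDIR:-/tmp}) and print its path on success.
          mktempd_sketch() {
              base=${1:-${TMPDIR:-/tmp}}
              tries=0
              while [ "$tries" -lt 10 ]; do
                  # Not cryptographically strong: time-seeded random
                  # number mixed with the PID and the attempt counter.
                  suffix=$(awk -v p="$$" -v t="$tries" \
                      'BEGIN { srand(); printf "%06d.%s.%s", int(rand() * 1000000), p, t }')
                  dir="$base/tmp.$suffix"
                  # umask 077 makes the directory owner-only from the start;
                  # mkdir refuses to reuse an existing name or symlink.
                  if (umask 077 && mkdir "$dir") 2>/dev/null; then
                      printf '%s\n' "$dir"
                      return 0
                  fi
                  tries=$((tries + 1))
              done
              return 1
          }
          ```

          A caller would use it as `dir=$(mktempd_sketch) || exit 1` and remove the directory in a trap.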

        • pxeger1 9 months ago

          It's quite hard to write mktemp securely[1]. It would be great if POSIX didn't make people attempt to do that error-prone task themselves.

          [1]: There's some explanation in this recent post: https://dotat.at/@/2024-10-22-tmp.html

          • ykonstant 9 months ago

            This is correct (though of course a decent `mktempd` script will deal with the listed problems or crash loudly on failure), and there are even more reasons to avoid /tmp.

            Unfortunately, it is one of the very few directories that are somewhat POSIX-"guaranteed" writable by a non-root user and the fact that on modern systems it is usually mounted on a tmpfs makes it very attractive for pure POSIX usage without rich array support.

            If you have mount permissions, of course, you should tell your `mktempd` to base its directory on a private tmpfs.

  • ggm 9 months ago

    File names with / in them

  • cryptonector 9 months ago

    > The problem is that pathnames (as per section 3.254 of POSIX 2024) are just strings (meaning they can contain any bytes except the NUL character), [...]

    Pathnames can neither contain NUL nor '/'.

    Re: `find -print0` / `xargs -0`:

    > Previous POSIX releases have considered -print0 before, but never ended up adopting it because using a null terminator meant that any utility that would need to process that output would need to have a new option to parse that type of output.

    What nonsense. Just add the `-0` or similar options as needed.

    > More precisely, this approach does not resolve our original problem. xargs(1p) can’t sort, and therefore we still have to handle that logic separately, unless sort(1p) also grows this support, even after read(1p). This problem continues with every other type of use-case. Importantly, it breaks the interoperability that POSIX was made to uphold.

    More nonsense.

    > A bunch of C functions are now encouraged to report EILSEQ if the last component of a pathname to a file they are to create contains a newline (put differently, they’re to error out instead of creating a filename that contains a newline).

    Ok, that's tolerable. Ditto utilities (notice here they were able to make a list of utilities).

    • chasil 9 months ago

      Note that GNU sort has...

      -z, --zero-terminated: end lines with 0 byte, not newline
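
      With that option the whole find/sort/xargs chain can stay NUL-delimited. A sketch, assuming GNU (or otherwise `-z`-capable) sort; sorted_names is a made-up helper and the brackets only make the name boundaries visible:

      ```shell
      # sorted_names DIR: print the regular files under DIR in sorted
      # order, one bracketed name per file, with NUL as the only
      # separator between the three stages.
      sorted_names() {
          (
              cd "$1" || exit 1
              find . -type f -print0 | sort -z | xargs -0 printf '[%s]\n'
          )
      }
      ```

      Calling `sorted_names some/dir` lists each file relative to that directory.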

  • EdSchouten 9 months ago

    strlcpy()!

  • johnisgood 9 months ago

    > Anyway, POSIX 2024 now requires c17, and does not require c89

    I wish it would have been c99. What does c17 add exactly, more C++-esque complexity or not? Why was it not c99 (or perhaps even c11) over c17? Genuine questions.

    • lifthrasiir 9 months ago

      > What does c17 add exactly, more C++-ish bullshit or not?

      Multithreading support and such (atomics, thread-local storage and a guarantee that `errno` is in TLS), explicitly aligned types and allocations, dedicated types for strings known to be Unicode, _Noreturn, _Generic, _Static_assert, anonymous structs and unions in the nested position, quick_exit, timespec, exclusive mode ("x") in f[re]open, CMPLX macros.

      I'm not even sure which one can be C++-ish bullshit possibly except for about two points:

      - Multithreading does seem farfetched for casual users. In fact, I do think it could have been minimized without any actual harm, but multithreading itself needed to be specified because it greatly affects a memory model. (Before C11, C had no thread-aware memory model and different threading implementations were subtly different beyond what the standard stated.) Even JavaScript, originally without any notion of threads, eventually got a thread-aware memory model due to shared web workers. But that never meant JS itself needed multithreading support in its standard library, and C could have done the same.

      - `_Generic` is even more debatable, though I believe it was the only way forward when we accept <tgmath.h>, which is known to be a response to Fortran (other responses include `restrict`) and was impossible to implement in the portable manner before C11. As long as it retains its scary underline and title case, I guess it's fine.

      • gpderetta 9 months ago

        Most importantly, posix already has existing multithreading facilities in posix threads, so it is imperative that they are reformulated in terms of the C++11/C11 memory model.

      • johnisgood 9 months ago

        You quoted me before my edit, but fair enough. I do like the "atomics" support.

        > "guarantee that `errno` is in TLS"

        I suppose that does not mean that I can just avoid setting errno to 0 before calling a function after which I check for errno, right?

        Yeah, I do have an issue with stuff like "_Generic" but I assume I can just simply not use it.

        What is "quick_exit" exactly and what does it solve?

        As for multithreading, I stick to pthread. Are any of the new features a replacement for that or what?

        At any rate, why C17 over C11 then?

        • lifthrasiir 9 months ago

          C17 is a bugfix version of C11 (the next major revision would be C23). The exact list of fixes is available in [1]. Mandating C11 instead of C17 when both are available seems not really useful now.

          You have the correct insight about errnos. The new guarantee only means that other threads cannot mess with your errnos, but clearing errno will still be useful within an individual thread.

          exit is not guaranteed to work correctly when called simultaneously from multiple threads, while quick_exit will be okay even in that situation. I think this behavior was not even specified before C11, and only specified after observing existing implementations.

          It is expected that libc threading routines are thin wrappers around pthread in Linux. That's why I do think it can be minimized; the only actual problem before C11 was the lack of thread-aware memory model. No need to actually be able to create threads from libc to be honest, especially given that each platform now almost always has a single dominant threading implementation like pthread.

          [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2244.htm

          • johnisgood 9 months ago

            My last question would be: is it "OK" to use pthread in my code or are there any alternatives (i.e. "best way") when using C17?

      • cryptonector 9 months ago

        > guarantee that `errno` is in TLS

        I mean, that is already true.

        • lifthrasiir 9 months ago

          There is no such guarantee in C99:

          7.5 ¶2: [...] and `errno` which expands to a modifiable lvalue that has type `int`, the value of which is set to a positive error number by several library functions. It is unspecified whether `errno` is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual object, or a program defines an identifier with the name `errno`, the behavior is undefined.

          7.5 ¶3: The value of `errno` is zero at program startup, but is never set to zero by any library function. The value of `errno` may be set to nonzero by a library function call whether or not there is an error, provided the use of `errno` is not documented in the description of the function in this International Standard.

          The fact that `errno` can expand to an lvalue does reflect what is required for multithreading implementations among others, but that's about all.

          • cryptonector 8 months ago

            Nor is it in POSIX, but it's true of all POSIX-like systems that support threading.