sh-3.2$ f='Hello world'
sh-3.2$ echo $f
Hello world
sh-3.2$ for i in $f; do echo $i; done
Hello
world
sh-3.2$ f=$'Hello\xC2\xA0world'
sh-3.2$ echo $f
Hello world
sh-3.2$ for i in $f; do echo $i; done
Hello world
My team already uses `make` but there's no reason for me to run it in my Downloads folders. File names in there are sometimes wild. Yet I expect command line tools to work with them. If they will cease to do so, I will have to start using non-POSIX variants of those tools, I guess.
I don't know who "the Austin Group" mentioned in the article are, but how come they "could not find a single use-case for newlines in pathnames besides breaking naive scripts" when legitimate use-cases are so easy to find?
(And if they're that incompetent, why does the article imply they are worth quoting and listening to?)
It is [1] the joint working group that for the last 25+ years has been responsible for both the POSIX standard and the Single Unix Specification. It emerged after the UNIX wars as a consolidation of the various splintered UNIX standardization efforts (POSIX itself, X/OPEN, OSF, etc).
Is that legitimate? A path name is just a unique identifier for a file, IMO it doesn't make sense to put a whole novel in there. If anything, a giant summary like that should be in the meta tags?
In what way is it not legitimate? It's not an accident, bug or data corruption. Someone put it there for a reason, and it benefits their use case. That's as legitimate as it gets.
That's a core part of the problem: a path name is NOT just a unique identifier for a file. Desktop operating systems and their classical utilities conflate the "unique identifier" and whatever "displayed title" of a file though which the end user interacts with the file.
Users care about "titles" or "summaries" of files, not "filesystem identifiers"; as long as the two are conflated, non-technical users will use the identifier to write titles and thus make the file easy to locate in an interactive GUI. Meta tags are not even in the cognitive horizon of most people.
... the use case in the parent comment I was replying to.
And no I'm not going to copy that here for you to quip "that's not a legitimate use case". Make an effort to make a point and support it with better justification than "because I said so".
Right click, rename, enter, enter, enter (until the entire file name is visible on the box)? That's how I did it when I used Windows.
Edit: now I remember the most basic way: open the pdf, select and copy the title, click on rename and paste from clipboard. Works great to get the file name with the newlines exactly as they are on the title!
Yes - I just tested on Win10+11 because I thought "there is no way I didn't accidentally do something like this at some point... and I would have remembered seeing a new line in my file name when I made that mistake."
I just opened a folder in file explorer, clicked 'rename' and then tried the following combinations:
Enter
L Ctrl + Enter
L Alt + Enter
Win + Enter
R Ctrl + Enter
R Alt + Enter
None of them let me put new lines in the filename - it either did nothing, or 'closed' the rename view.
Shrug, I last used windows with Windows 7, so you are probably right. That being said, at least two of the students I am currently tutoring are on XP and one of my colleagues as well :D
Right, I just remembered the main way to create those filenames: open the pdf, select and copy the title, close, rename the file and paste from clipboard.
I am interested in hearing the rationale for downvotes explicitly. I am describing a reality that exists and must be taken into account. Why are you downvoting?
A correct script will have no problems with "-rf" or any other file name. I have (and recommend script writers make their own) a directory hierarchy of "dangerous" file names to test scripts.
For example, it contains a directory where all file and subdirectory names are in unary, consisting only of repetitions of the newline character. A correct script should be able to enumerate, access and modify files in there without issue.
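A quick, hedged sketch of building such a corpus (the directory name and the particular hostile names are just illustrative choices):

mkdir -p hostile-names && cd hostile-names || exit 1
touch -- '-rf' '--help' ' leading space' 'trailing space ' '*glob*' '$(subst)' "it's"
# and one name that is literally two lines long
touch -- 'first line
second line'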
It's a bandaid on a wider problem: the design of Unix shell is bonkers and the whole thing should be deleted. Why? Because I haven't seen any other tool ever have so many pitfalls. Take n random languages and m random developers and tell them to loop over a string array and print its contents, and count how many correct programs you get on average per language. There will be easy languages, then difficult languages, then a huge gap, then Unix shell, because in your random sample you managed to get one guy who has a PhD in bash.
The main problem is using text as a common format between different applications.
First: text is not well defined. Is it ASCII? Is it UTF-8? Some programs can spew UTF-32 with the proper locale configured; it's a mess.
Second: encoding and decoding of objects to text is not defined at all. Those problems with filenames are just one example. Using newline as a separator is a natural thing that is easy to implement, yet it is wrong.
In my opinion two things should be done:
1. Standardise on UTF-8. No other encodings allowed.
2. Standardise on JSON. It is good enough to serve as universal exchange format, tools like `jq` exist for some time now.
So any utility must read and write JSON objects with some standard env set. And shells can be developed with better syntax to deal with JSON. This way you can write something like
`ps aux | while read row; do echo ${row.user} ${row.pid}; done`
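Something close to that is already possible today wherever a tool emits JSON, by leaning on `jq` (a minimal sketch; `ps` itself does not emit JSON, so a hand-written JSON array stands in for its output here):

printf '%s\n' '[{"user":"root","pid":1},{"user":"alice","pid":4242}]' |
  jq -r '.[] | "\(.user) \(.pid)"'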
>It is good enough to serve as universal exchange format, tools like `jq` exist for some time now.
Please don't use that underdefined joke of a spec. Define "PosixJson" and use that instead. Right now it's not even clear what the result of parsing {"a": 1234678901234567890} is. Is this a parse error? A bigint? A float/double? Quiet wraparound? Something else? I've seen all these behaviors in real world JSON implementations across different languages.
> A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.
So, if you have some non-printable characters like BEL/␇/ASCII 0x07, that's still a text file.
(and I believe what bytes count as a valid character depend on your `LC_CTYPE`).
But the moment you have a line longer than {LINE_MAX} bytes (which can depend on which POSIX environment you have), suddenly your text file is now a binary file.
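As a rough illustration, a file can be checked against that definition with ordinary POSIX tools (a sketch; "somefile" is a placeholder, and the +1 accounts for LINE_MAX including the trailing <newline>):

limit=$(getconf LINE_MAX)
nuls=$(tr -cd '\000' < somefile | wc -c)
if [ "$nuls" -eq 0 ] &&
   LC_ALL=C awk -v max="$limit" 'length($0) + 1 > max { bad = 1 } END { exit bad }' somefile
then
    echo 'a text file (by the POSIX definition)'
else
    echo 'not a text file (by the POSIX definition)'
fi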
Kind of a weird definition indeed. One edge case: the definition states the file must contain characters, so presumably zero length files are out. But then how could you have zero lines?
Yes obviously. But the POSIX specification for a "text file" as above is that it contains characters, which an empty file by definition does not. So an empty file cannot be a text file if you read that specification strictly, and therefore you cannot have zero lines in a text file. As soon as you have a single character there is at least one line, and the amount of lines can only stay the same or grow from there.
The definition should read "one or more lines" instead or (probably better) specify that a text file contains "zero or more characters".
What cursed madness have you hit that spits out UTF-32 under normal conditions?! That can only be a bug - UTF-32/UCS-4 never saw external use, and has only ever been used for in-memory fixed-width character representation, e.g. runes in Go.
You never have to worry about whether you're dealing with ASCII vs. UTF-8, but rather if you're dealing with UTF-8 vs. ISO-8859-1, or worse, Shift JIS or similar.
I think a lot of tools should support json as well as plain text. Probably the latter by default, and the former with a "-o json" or similar option. I'm fine with wc giving me `5`, I'd prefer that to `{ "characters": 5 }`.
There are exchange formats that are well-defined enough to be useful to many computers while also being readable enough to be traversed by human eyes. There's no reason to do everything ad hoc; you don't get much by that. You also control the shell itself - there's no reason you can't display object representations in a pretty way.
JSON itself is bad for a streaming interface, as is common with CLI applications. You can't easily consume a JSON array without first reading it in its entirety. JSONL would be a better fit.
But then, how well would it work for ad-hoc usage, which is probably one of the biggest uses of shells?
> I haven't seen any other tool ever have so many pitfalls.
I haven't seen any other tool with so much general utility and availability.
> to loop over a string array and print its contents
Is incredibly easy in bash and bash like shells. As highlighted the issue is that tools like 'ls' don't create "a string array." They create one giant string that has to be parsed. The rules in the shell are different than in other languages but it /will/ do most of the parsing for you, or all of it, if you do it carefully.
This is a fine tradeoff. As evidenced by its wide usage and lack of convincing replacements.
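For instance, the shell's own globbing already hands you real, separate arguments, with no parsing of ls output at all (a minimal sketch):

for f in ./*; do
    [ -e "$f" ] || continue    # the unexpanded pattern itself, if nothing matched
    printf '%s\n' "$f"
done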
Someone needs to come up with an interactive shell first, one that is comparable in usability. Then we can think about replacing the unix shell.
I tried both python and lua interactively, but they are a pain when it comes to handling files. You have to type much more to get the same things done.
The bigger issue is the sheer momentum of Unix shell. Even if you come up with an alternative that is better by every objectively measurable metric, it's still going to be a monumental task to have it packaged with commonly used distros. Kinda like the "why can't the US switch to the metric system" problem.
OK let them add an explicit check to standard tools, and/or to open(), mkdir(), etc. with O_PORTABLECHARS. And an environment option to disable this check.
I'm sure you might get more than 5 people on HN replying to you that they are using fish right now. Say something discrediting about fish and they show up.
Heh, reminds me of how to get help with Linux back in the day. If you directly asked for help, you'd be told to RTFM. If you stated confidently that Windows could do something and that Linux sucks because it can't, you'd get users tripping over themselves with details and instructions, just to prove you wrong.
There's a direct cost in money, time and lives that has come from the US's adherence to their US Customary Units (which are often different to the old imperial units). People have literally died because of the confusion caused by having multiple systems of units in common use with ambiguous names (degrees, gallons, etc). Each year industry worldwide spends an enormous amount of money indirectly precisely because of this problem and it's still incredibly unlikely to be fixed within my lifetime.
Bash-alternatives that are not completely compatible frankly just don't have a chance.
If it isn't distributed out of the box with every nix-like OS, it inherently isn't “better by every objectively measurable metric” - distribution of a common, stable standard is a huge benefit in and of itself.
Python maybe often installed by default but it's definitely not an essential/required package "out of the box" on every install.
Also, in a thread where one topic is how POSIX shell handles whitespace in filenames, it's hilarious (not in a good way) that someone suggests a language that handles whitespace the wrong way in its own code. Yes, significant whitespace is objectively wrong.
What OS/distro is Lua included on out of the box? That doesn't mean "available in a package". I mean literally included in every single install and cannot reasonably be omitted?
Regardless of the availability, the parent comment says
> better by every objectively measurable metric
Neither Python nor Lua are "better" than shell, at the types of things shell is commonly used for - they're objectively worse.
Lua gets onto every other Linux distro as a dependency of some base system component. For example, rpm or pipewire depend on lua. Ubuntu and Debian ship with pipewire per default.
That isn't even close to "installed on every system". Best I can tell from the reverse dependencies, it's required for some Gnome Remote Desktop tool, and best I can tell, it doesn't rely on Lua anyway (at least on Debian).
> You should use the word "objectively" less.
I specifically used the word objectively, because the original comment that I replied to, said this:
> Pipewire being the Pulseaudio replacement from Redhat.
Right, so it's a desktop package that ultimately will be installed on about 1% of all Linux machines because the vast majority are servers without a desktop environment.
Also worth pointing out: liblua on Debian at least, is the shared library. It's not the binary to execute standalone Lua scripts.
Is this like a game where you come up with bullshit and I have to come up with the facts to rectify it? RHEL/CentOS have more than 1% market share alone.
Check your own installs and tell me if you find some that don't have liblua or libluajit.
For the library thing: I said "Python and lua are pretty close to that." earlier. I did not say that they have interpreters ready everywhere. But if the language core is already installed on a large fraction of machines, then adding the interpreter is not a big cost.
> already installed on a large fraction of machines
So far you've presented no evidence of this though, just that it's used by a new desktop-focused package.
All linux desktops over the last 30 years is not even a "large fraction" of total Linux installs, much less the ones that have already migrated to this new audio system.
> adding the interpreter is not a big cost
It's nothing to do with cost. It's about "how do I know this will absolutely 100% run on any POSIX machine I throw it on without any extra steps".
Remember the argument here is about something that is claimed to be "objectively better" than Shell. The ubiquitous nature of POSIX shell is a huge barrier for any possible competitor, and saying "well you just need to install it" just defeats the purpose. You might as well write it in fucking java and say "well you just need to install a JVM".
Edit to Add:
a good number of systems I manage do have liblua installed... because HAProxy requires it, and those systems have HAProxy installed. Not because it was installed as part of the base OS or even a default group of packages.
Incidentally, HAProxy and thus liblua were installed on those systems by infrastructure management that's implemented as shell script. So what kind of chicken and egg argument do we need to have here about how exactly I can run a Lua script to install Lua?
PowerShell's designers could learn from decades of programming language progress and especially shell usage. They could improve many aspects indeed. This doesn't mean that the original design is "bonkers", only that it's not perfect.
The way Powershell works is largely based on what the computing world was doing with shells outside Bell Labs, at IBM, Xerox, and other places, in much the same timeframe as UNIX was happening.
Modern programming language designers have a bad relationship with verbosity. I don't know why they do this.
It's a lang for an interactive shell, typing literally translates to developer speed. I understand the want for clarity and maybe that's nice in large scripts, but the main goal is to be a shell. So, optimize for that. Also, you probably shouldn't be using powershell for large scripts anyway.
The only recent lang I've seen that has a handle on this is Rust. You can tell they put a lot of thought into having keywords be as short as possible while still being descriptive.
Those aliases are, I believe, only defined on Windows PowerShell (the closed-source version 5; not PowerShell 7). I wish those default aliases you mentioned weren’t a thing. Especially `curl` (people should use `iwr` instead), which is an alias of `Invoke-WebRequest`, because it makes the `curl.exe` shipped with Windows nearly undiscoverable.
This should not be as downvoted as it is. In a way shell is broken. The brokenness is in that it requires each command to serialize and deserialize again, considering all the weird things that can happen with the "all is a string" kind of approach, instead of having a proper data interchange format or even sending objects to next steps in the pipeline. This behavior is what necessitates even thinking about the changes listed in the post. We wouldn't even have that problem, if the design of shell was better thought out. Now we are dealing with decades of legacy built on these shaky foundations. I hate to admit it, but seems at least this aspect Powershell got right, whatever one may think about the rest of it.
Dear anal_reactor, what is a "string array"? I have used unix shells for nearly 30 years and never heard of one. And I consider myself a script-fu master!
There are two array-like constructions in the shell: list of words (separated by spaces) and list of lines (separated by newlines). Both cases are implemented as a single string, and the shell makes it trivial to iterate through its components.
That is exactly the problem many people have with it. Encoding „arrays“ this way is foreign to everyone who comes from „normal“ programming languages. Both variants lead to problems because either character can occur in elements, worst case scenario they contain both at the same time. I can see why this leads to confusion and bugs.
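A minimal illustration of the two encodings described above, and of why they are fragile (any element that contains the separator character silently splits):

words='alpha beta gamma'
for w in $words; do printf 'word: %s\n' "$w"; done

lines='first entry
second entry'
old_ifs=$IFS
IFS='
'    # split on newlines only, so spaces inside elements survive
for l in $lines; do printf 'line: %s\n' "$l"; done
IFS=$old_ifs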
It’s like people saying they won’t learn French because it has a different grammatical structure. There’s no “normal” natural language. If you’re used to the C-like syntax, learning C-like language will be easy. But that’s not an argument to say Lisp is confusing.
That's why I put normal in quotes. There is however more to it than having a different grammatical structure: it works differently from many commonly used languages that have actual arrays/lists where elements can contain anything the type allows. If you come from any of the common modern programming languages (let's say Java, Kotlin, C#, JS/TS, Python, Swift, Go, Rust, etc.) and expect something similar (because many of them are very similar) you will be confused. Using spaces or newlines to encode elements in a single string is just not robust and leads to easy-to-make mistakes.
Most of these languages were created long after bash and the other shells. The fact is that shell scripts allow for unquoted strings, and quoting is a specific operation, not syntax. Also, shell scripts were meant for automation, not for writing general programs. The basic units are commands, arguments, input, output, files,… so the design makes these easy to manipulate.
I’m not saying that we can’t improve, but I’m more in favor of making the tool more apt to solve a problem than making it easier to learn. Because the latter often wants to forego the requirement of understanding the problem space.
Yes, these are newer. I mainly wanted to make the point that it is confusing if you are new to bash and come from these newer languages with the wrong expectations. The concise nature and many subtle details make it very difficult for beginners and infrequent users.
Compare this to the newer programming languages where you explicitly call something with descriptive names like .Trim() or .EndsWith(), with support from the compiler and IDE.
In my experience automation and general programs often are the same thing once things get more complicated. Bash scripts usually grow rapidly and are a giant PITA to maintain or refactor. Throw in build systems and helper scripts and you quickly receive a giant pile of spaghetti. Personally I just switch to one the mentioned programming languages once it goes above a simple sequence of operations.
Personally I don't see how to improve it much without becoming a full blown programming language, at which point it would probably make more sense to just release a library for common automation tasks that is also composable. Maybe I'm just not the right target audience.
The issue with your otherwise good reply is that someone is bringing expectations to an expert tool (programming languages, software, OS) and blindly assuming that everything will work the way he thinks it should. Familiarity helps with learning, but shouldn’t replace it. Someone new to bash should probably start with a book.
And for bigger automation projects, there are lots of projects and programming languages that can help.
I agree it is an issue but it is how many people work and think. Most of the time they are not even wrong. "Hey, I have variables and loops, I know that!".
I would even make the case for expert tools being as unsurprising and familiar as possible unless there is a very good reason for them not to. Also they should be robust against misuse and guide the user towards good practices. There are always beginners, people that rarely need to use it, people that do programming as "just a job" and people that make mistakes because they are distracted, tired or just human. Something like "rm -r /" is a good reminder of that for many people.
Plus there are already a lot of tools required. Reading a book about every tool I have to use would be impractical for most projects. Maybe more expert tools should just be tools. The same way I can now just use Ubuntu and get a working desktop system including drivers for most common hardware. If I compare that to the past, where I installed a Linux distribution and then found out I lacked a driver for my network card and had to download it from the internet... I still can modify my system if I need to, but it's nice that I don't have to. I think we can do similar things with many parts of development and free some capacity for other tasks.
Their proposed solution is not compatible with reality though where POSIX does not get to define what kind of files exist on filesystems you need to work with.
All they did is introduce new error cases in C programs while not actually fixing anything for shell scripts.
If anything, it's going to result in more exploits as people write shell scripts with the assumption that newlines cannot appear in filenames.
I do. Single files are handled with quotes around arguments just fine. For lists of files you need to use NUL as a separator. That's not really hard to do once you are aware of the problem but ergonomics could be better - which is something useful that POSIX could change.
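For example, a minimal sketch of the NUL-separated version (safe even for names containing newlines, since NUL is the one byte a pathname cannot contain):

find . -type f -name '*.log' -print0 | xargs -0 rm --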
But they did not make old code correct. Filenames are still allowed to contain newlines. Shell scripts still need to be prepared to deal with that. Nothing really changed, they just added a feel-good half-measure.
It's a step in the right direction. You have to understand that for decades a vocal group of Unix die-hards has opposed any limitations whatsoever on the bytewise content of file names. The newline restriction in this latest version of POSIX may be modest, but it represents a dam breaking. When (obviously) the sky doesn't fall, the next version of POSIX will have a lot more filename cleanup.
In 2024, if you don't get the correct result decoding a text as UTF-8, the bug is the text, not the decoding. And luckily, adoption of UTF-8 in the past 30+ years has gone well enough that you don't need to worry.
Caveats for cursed hardware standards demanding two-byte encodings like USB.
I hope you're happy in your ivory tower, but I personally work with a lot of files in other encodings, most often that weird UTF-16 (Windows), sometimes also legacy files with different ANSI encodings. Declaring "my decoder is fine, it's the text that is buggy" is not going to score a lot of points with my boss and clients.
The only valid reason for still having files stored in legacy ANSI encodings is that their only use is input to software that has not been maintained for ~30 years and cannot be updated. That's fine because they're just binary inputs in a closed ecosystem that no one touches.
But if they are supposed to be treated as text, then yes it's the text that's buggy - they should just be converted to UTF-8 once and have the originals thrown away.
UTF-16 is something that Microsoft has cursed us with by inserting it into specifications (like USB) so that we cannot get rid of it, even if it never made any sense what so ever. But those are in effect explicit protocols with a hard contract, very different from something where you would "assume an encoding".
Filename character set and its interpretation shall be controlled per directory or, at least, per FS. This pertains not only to the permitted set (such as with or without LF), but to collation rules as well (including case insensitivity with cases like the Turkish/Crimean/etc. I/ı and İ/i). This shall also include workarounds for already existing problems: if a directory already contains files I1 and ı1, there shall be a technique to deal with them separately even with a Turkish locale.
But restricting this at the syscall level is definite insanity, along with the excuses for it.
- find(1p) now supports -print0
- xargs(1p) now supports the -0 argument
- newlines in filenames now should throw errors in many utilities
- a compiler implementing the c17 standard is now required
- ulimit is expanded
- renice can use relative values
- a timeout utility has been added
- make adds support for $^ $+ ::= :::= != ?= +=
- logger is improved
- gettext is adopted
- readlink and realpath are adopted
- rm now supports -d to remove empty directories and -v for verbose
- various improvements to printf, sed, test
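A few of those additions in action, assuming a POSIX 2024 environment (`long_running_command` and `emptydir` are placeholders):

timeout 5 long_running_command      # give up after five seconds
rm -d emptydir                      # remove an (empty) directory
realpath ../some/relative/path      # print the resolved absolute pathname
renice -n 5 -p "$$"                 # adjust niceness by a relative amount
find . -name '*.tmp' -print0 | xargs -0 rm --   # NUL-separated, newline-safe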
This adds `set -o pipefail` to POSIX sh, which causes a whole pipeline to fail (non-zero exit code) if one or more of the commands in the pipeline fail.
If you're writing scripts, use that and don't forget -e and -u
-e Exit immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status
-u Treat unset variables and parameters other than the special parameters "@" and "*" as an error when performing parameter expansion
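Putting the three together (a minimal sketch; `set -o pipefail` is only now standardized, so an older /bin/sh may reject the option):

#!/bin/sh
set -eu
set -o pipefail
# with pipefail, a failure in any stage (say, an unreadable input file)
# fails the whole pipeline, and -e then aborts the script right here
cut -d: -f1 /etc/passwd | sort | uniq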
> and they still fail to catch even some remarkably simple cases
I totally agree. Although I'd say that there isn't anything "remarkably simple" about writing a bash script. Anything in the shell scripting world that seems remarkably simple is just because one hasn't realised the ghosts and horrors that lurk in the shadows.
But I'll use -e anytime. It feels like having a protective proton pack at least.
Pipefail is useful and very hard to emulate on pure POSIX; you need to create named fifos, break the pipeline into individual redirections and check for error on each line.
And that is fine; but sometimes you want to treat a pipeline as a "single command" and then you can use pipefail to abort the pipeline on error. Then you can handle the error at the granularity of the entire pipeline without caring which part failed.
Lastly, I am confused as to the "silent" failures; maybe you are thinking of combining this with `set -e`? Then yes, that is bad and I recommend against the combination; but then again, I and most advanced scripters recommend against shotgunning `set -e` in the first place. Use it in specific portions of the script when appropriate, and use proper error handling otherwise.
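A hedged sketch of that fifo-based emulation, where `producer` and `consumer` are hypothetical stand-ins for the real pipeline stages:

fifo=${TMPDIR:-/tmp}/pipefail_demo.$$
mkfifo "$fifo" || exit 1
consumer < "$fifo" &            # start the reading side in the background
consumer_pid=$!
producer > "$fifo"              # run the writing side in the foreground
producer_status=$?
wait "$consumer_pid"
consumer_status=$?
rm -f "$fifo"
if [ "$producer_status" -ne 0 ] || [ "$consumer_status" -ne 0 ]; then
    echo "pipeline failed: producer=$producer_status consumer=$consumer_status" >&2
    exit 1
fi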
Gee, imagine if shells with errexit option enabled wrote some diagnostic output to stderr before exiting. "Add your own error checking instead", how do I check which piece of pipeline has failed, exactly? The PIPESTATUS variable is bash-specific and was not standardized.
? Why are you replying to me? My position was pretty clear:
"Pipefail is useful and very hard to emulate on pure POSIX; you need to create named fifos, break the pipeline into individual redirections and check for error on each line.
And that is fine; but sometimes you want to treat a pipeline as a "single command" and then you can use pipefail to abort the pipeline on error. Then you can handle the error at the granularity of the entire pipeline without caring which part failed."
By the way, I never script in Bash; I only script in POSIX primitives using dash as my executable.
The history at the beginning of this is not correct. Two examples: the assertion that there was one compatible UNIX prior to United States v. AT&T, and the statement that GNU and BSD started that same year. Very, very off.
https://en.m.wikipedia.org/wiki/History_of_Unix#/media/File%... is a good visual of (many of, not all) the various versions of UNIX and when they were released. BSD was first released in 1978. United States v. AT&T was implemented in 1984 (judgment 1982). GNU was first created in 1983.
TIL the POSIX standard is still updated. Does it still suffer from the issues that make Linux break POSIX compatibility in some areas because they consider it a flawed standard?
At some point they decided to narrow the change to just ban the newline character.
Which I personally think is a pity. Allowing the escape character in file names is a security risk because it enables you to embed ECMA-48 escape sequences in file names. Secure terminal emulators shouldn’t be made vulnerable by arbitrary escape sequences, but there are “too smart for their own good” terminal emulators out there that have escape sequences that let you do crazy things like run arbitrary executables.
There are many non-UTF-8/16/32 character encodings used in the wild which use these values in multi-byte character sequences.
I think the decision forbidding newlines in pathnames is also wrong. It may break tons of existing code.
I wish Linux/etc had a mount option and/or superblock flag called “allow only sane file names”. And if you had that set, then attempting to create a file whose name wasn’t valid UTF-8, or which contained C0 or C1 controls, would fail. The small minority of people who really need pre-Unicode encodings such as ISO 2022 could just not turn that option on. And the majority who don’t need anything like that could reap the benefits of eliminating a whole category of potential bugs and vulnerabilities.
That's obviously impossible since it would break backward compatibility and the users' existing filesystems (and the Linux kernel will rightly never accept anything like that).
The only reasonable fix is to enhance bash and shell IDEs to track for each variable whether it could possibly include all filename-valid characters (e.g. if it comes from read with no options then it can't contain \n) and warn (off by default unless stderr is a terminal) if they can't and it's used as a filename (conservatively determined when used as arguments to processes), and also warn when using find without -print0, etc. noninteractively and perhaps interactively as well.
Run a program to list a directory. Everything that interfaces with that, will assume newline delimiters. Similar assumptions are baked into a lot of software.
Enforcing that a newline isn't part of a path, ensures the security of those systems that are commonly relied on.
Except no one's enforcing anything yet. Earlier versions of POSIX allowed rejecting filenames containing newlines, the newest version encourages it while mandating features required to handle such filenames safely (find -print0, xargs -0, read -d ''). So nothing's set in stone yet.
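For the curious, the newline-proof read loop looks roughly like this (assuming a shell where `read -d ''` is available, i.e. bash today or a POSIX 2024 sh):

find . -type f -print0 |
while IFS= read -r -d '' f; do
    printf 'found: %s\n' "$f"
done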
> Everything that interfaces with that, will assume newline delimiters.
Well, only badly written programs. nushell handles this fine, as will any program that doesn't try to do everything as plain strings:
~> touch "foo\nbar"
~> ls foo* | print
╭───┬──────┬──────┬──────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼──────┼──────┼──────┼──────────┤
│ 0 │ foo │ file │ 0 B │ now │
│ │ bar │ │ │ │
╰───┴──────┴──────┴──────┴──────────╯
However after reading it they're only making them illegal for the posix utilities from the 70s that aren't written properly, so I think that makes sense.
> We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now?
Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook
> Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook
Ensuring normalization is hard. Where should you do it? There's only one good place: in the filesystem. But if you normalize on create then you'd better use the same form that everyone else uses, but, what's that? Input methods generally produce NFC, but there's no guarantee that they will not produce something else. HFS+ normalizes to NFD on create.
ZFS uses form-insensitivity -- much like case-insensitivity, but for form. The reason ZFS went this way was exactly that HFS+ and input methods differ as to forms. I pushed hard for this way back when. IMO form-insensitivity is the best way forward.
But as for guaranteeing that filenames are UTF-8... that's much harder. The best thing to do is to not allow the use of non-UTF-8, non-ASCII, non-C locales -- not a guarantee, but pretty good.
Sure. Form-insensitivity is another good option. I'd actually argue for full case insensitivity too (like macOS), although I realize that it's probably a stretch.
Case-insensitivity is also an option in ZFS, but honestly case-insensitivity drives me nuts, especially if it's not case-preserving. Oh, that reminds me, ZFS is form-insensitive, and form-preserving.
To build an internationalized shell script I'll need to compile multiple .mo language files and distribute them alongside the script itself.
For shell scripts part of a large system, that's probably fine. For small scripts, that's not very practical. You are not only adding a compilation step, you're also requiring distribution of multiple files. That's a pain.
It just kind of kills the convenience of a simple shell script. I would probably end up writing a makefile to manage all of this and at that point I am only a hop skip and jump away from using a compiled language instead of shell.
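For reference, the script side itself stays small (a sketch; "myscript" and the ./locale directory are illustrative names, and the compiled .mo files under locale/<lang>/LC_MESSAGES/ still have to be built and shipped separately):

#!/bin/sh
TEXTDOMAIN=myscript
TEXTDOMAINDIR=$(dirname -- "$0")/locale
export TEXTDOMAIN TEXTDOMAINDIR
printf '%s\n' "$(gettext 'Processing complete')"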
Hopefully nothing, posix is, or at least it should be, a descriptive standard. This is why posix is so terrible, and why posix is so great.
The way I feel posix, and other descriptive standards, work best is when they describe what everyone is already doing. This is opposed to prescriptive standards, which focus on the "correct" way to do something; prescriptive standards tend to be over-engineered and may or may not actually work.
Both prescriptive standards and descriptive standards have their uses. If POSIX is a prescriptive standard, then maybe another standard should exist that is descriptive.
Keep in mind that the Web standard eventually became prescriptive because descriptive standards failed to catch up. Likewise it can be argued that descriptive standards for the common OS interface are no longer usable.
To be crass, description is only useful for existing things and prescription hinders making innovative things. I think social forces make it natural that standards are treated both descriptively and prescriptively, and that too leads to angst. Case in point, POSIX was once more descriptive, but then people wanted backwards compatibility for existing and new OSes, which made it more prescriptive. The takeaway is that ad-hoc things become permanent once they are too difficult to remove, and then people are sad. Nothing is immune, so just make reasonable attempts for the standard and the culture to harmonize for a specific purpose.
is fine in a makefile as far as POSIX is concerned, because:
> Applications shall select target names from the set of characters consisting solely of slashes, hyphens, periods, underscores, digits, and alphabetics from the portable character set
I kind-of would like to see a POSIX-strict profile which incorporates commonsense things (by commonsense I mean avoiding things that repeatedly over many years have tripped up programmers in frustrating ways) like no newlines in file names. Operating systems (or distributions) could opt into this profile, and then someone programming on such an operating system could rely on the constraints of the profile, and additional facilities could be added on that might need to rely on those constraints. Hopefully, gradually the use of the profile would spread.
> future editions will not require c17, but will simply require whatever C specification version is the most modern and already implemented by major toolchains
Is this really good?
If you can't rely on anything concrete being guaranteed, and it is open to interpretation what "modern" or "major toolchains" are, why have a standard?
EILSEQ for \n finally, but why not for unicode confusables? Path names are identifiers, and as such need to be identifiable. Meaning stricter rules than just buffers (not talking about strings).
Since old-POSIX systems will be in use for some time, I wonder how many things will be able to switch to using the new capabilities. And how many OSes already support all of the new changes.
It would yield false-positives with non-UTF-8 encoded text. Big5 <https://en.wikipedia.org/wiki/Big5#Encoding> in particular was notorious for using ASCII values for trailing bytes. I don't know if it's still in use or if there are others.
As someone who truly limits himself to POSIX when he can, I think they needed to push it forward to not become completely obsolete. I'm really sad `mktemp -d` and `set -o nullglob` didn't make the cut, but that's how it is, I guess.
A bespoke `mktempd` script is one of the first things I install in a new system. Fortunately, it is not too hard to make a `mktemp -d` compatible script with POSIX tools. `set -o nullglob` is another story :D
This is correct (though of course a decent `mktempd` script will deal with the listed problems or crash loudly on failure), and there are even more reasons to avoid /tmp.
Unfortunately, it is one of the very few directories that are somewhat POSIX-"guaranteed" writable by a non-root user and the fact that on modern systems it is usually mounted on a tmpfs makes it very attractive for pure POSIX usage without rich array support.
If you have mount permissions, of course, you should tell your `mktempd` to base its directory on a private tmpfs.
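A minimal sketch of such a `mktempd` helper using only POSIX tools (the retry loop and the awk-generated suffix are just illustrative choices, not the only way to do it):

#!/bin/sh
base=${TMPDIR:-/tmp}
tries=0
while [ "$tries" -lt 100 ]; do
    suffix=$(awk 'BEGIN { srand(); printf "%06d", int(rand() * 1000000) }')
    dir=$base/mktempd.$$.$suffix
    if mkdir -m 700 "$dir" 2>/dev/null; then
        printf '%s\n' "$dir"    # success: print the directory and stop
        exit 0
    fi
    tries=$((tries + 1))
done
echo 'mktempd: could not create a temporary directory' >&2
exit 1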
> The problem is that pathnames2 (as per section 3.254 of POSIX 2024) are just strings (meaning they can contain any bytes except the NUL character), [...]
Pathnames can neither contain NUL nor '/'.
Re: `find -print0` / `xargs -0`:
> Previous POSIX releases have considered -print0 before, but never ended up adopting it because using a null terminator meant that any utility that would need to process that output would need to have a new option to parse that type of output.
What nonsense. Just add the `-0` or similar options as needed.
> More precisely, this approach does not resolve our original problem. xargs(1p) can’t sort, and therefore we still have to handle that logic separately, unless sort(1p) also grows this support, even after read(1p). This problem continues with every other type of use-case. Importantly, it breaks the interoperability that POSIX was made to uphold.
More nonsense.
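What that support looks like in practice today, using GNU extensions that are not (yet) POSIX (`sort -z` keeps the list NUL-separated between the stages):

find . -type f -print0 | sort -z | xargs -0 ls -ld --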
> A bunch of C functions3 are now encouraged to report EILSEQ if the last component of a pathname to a file they are to create contains a newline (put differently, they’re to error out instead of creating a filename that contains a newline).
Ok, that's tolerable. Ditto utilities (notice here they were able to make a list of utilities).
> Anyway, POSIX 2024 now requires c17, and does not require c89
I wish it would have been c99. What does c17 add exactly, more C++-esque complexity or not? Why was it not c99 (or perhaps even c11) over c17? Genuine questions.
> What does c17 add exactly, more C++-ish bullshit or not?
Multithreading support and such (atomics, thread-local storage and a guarantee that `errno` is in TLS), explicitly aligned types and allocations, dedicated types for strings known to be Unicode, _Noreturn, _Generic, _Static_assert, anonymous structs and unions in the nested position, quick_exit, timespec, exclusive mode ("x") in f[re]open, CMPLX macros.
I'm not even sure which of these could possibly be C++-ish bullshit, except for about two points:
- Multithreading does seem farfetched for casual users. In fact, I do think it could have been minimized without any actual harm, but multithreading itself needed to be specified because it greatly affects the memory model. (Before C11, C had no thread-aware memory model and different threading implementations were subtly different beyond what the standard stated.) Even JavaScript, originally with no notion of threads, eventually got a thread-aware memory model due to shared web workers. But that never meant JS itself needed multithreading support in its standard library, and C could have done the same.
- `_Generic` is even more debatable, though I believe it was the only way forward when we accept <tgmath.h>, which is known to be a response to Fortran (other responses include `restrict`) and was impossible to implement in the portable manner before C11. As long as it retains its scary underline and title case, I guess it's fine.
Most importantly, posix already has existing multithreading facilities in posix threads, so it is imperative that they are reformulated in terms of the C++11/C11 memory model.
C17 is a bugfix version of C11 (the next major revision would be C23). The exact list of fixes is available in [1]. Mandating C11 instead of C17 when both are available seems not really useful now.
You have the correct insight about errno. The new guarantee only means that other threads cannot mess with your errno, but clearing errno will still be useful within an individual thread.
exit is not guaranteed to work correctly when called simultaneously from multiple threads, while quick_exit will be okay even in that situation. I think this behavior was not even specified before C11, and only specified after observing existing implementations.
It is expected that libc threading routines are thin wrappers around pthread in Linux. That's why I do think it can be minimized; the only actual problem before C11 was the lack of thread-aware memory model. No need to actually be able to create threads from libc to be honest, especially given that each platform now almost always has a single dominant threading implementation like pthread.
Thank you for your answers, it is much appreciated.
I suppose I will not use "quick_exit" either in that case. I have many workers; there is a job queue mutex, along with pthread_cond_wait and pthread_mutex_{lock,unlock}, and when the "job_quit_flag" is set to true, that means all jobs are done and I am ready to return NULL.
7.5 ¶2: [...] and `errno` which expands to a modifiable lvalue that has type `int`, the value of which is set to a positive error number by several library functions. It is unspecified whether `errno` is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual object, or a program defines an identifier with the name `errno`, the behavior is undefined.
7.5 ¶3: The value of `errno` is zero at program startup, but is never set to zero by any library function. The value of `errno` may be set to nonzero by a library function call whether or not there is an error, provided the use of `errno` is not documented in the description of the function in this International Standard.
The fact that `errno` can expand to an lvalue does reflect what is required for multithreading implementations among others, but that's about all.
> We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now? Ok, that might be a bit much all at once. We’re heading there though!
Oh my god. This makes me so happy. This is the most lovely thing I've read in the world of computing since the unix gods decided that newlines were to be a single character.
The philosophy underlying the sentence "Wouldn’t it be nice if the naive scripts were just correct now?" is incredibly positive. We are surrounded by arrogant jerks who break old code by aggressively enforcing stricter compliance with some stupid rules. But here come these posix heroes who do the exact opposite: make old code correct! There is hope in mankind after all.
Rather unfortunately, I happen to have a handful of files on my machine with newlines in them (the filenames were programmatically generated from a summary of their contents). I loathe the possibility that my shell tools are going to suddenly crash when confronted with these weird files, rather than just producing some slightly silly output. I wish we'd standardized the behaviour of just escaping such characters as `\n/\r` or `^J/^M`...
They did the right thing for this: make the tools fail on file creation, but not on existing files.
I guess it's still advisable to rename those files; I don't know how things like cp, mv or rsync will behave when copying such files in the future.
No, they did not do the right thing:
> the following utilities are now either encouraged to error out if they are to create a filename that contains a newline, and/or encouraged to error out if they are *about to print a pathname that contains a newline* in a context where newlines may be used as a separator
It then proceeds to list a bunch of utilities including diff, file, find, grep, head, du, etc., none of which create files directly.
These utilities could be updated to reject newlines in file paths if they're going to print in a "newline delimited" form - but for some of these utilities, that's the only available form.
> error out if they are about to print a pathname that contains a newline in a context where newlines may be used as a separator
But that's already broken. This is a situation where filenames with newlines in them are indistinguishable from two filenames in outputs. So instead of producing subtly broken output, tools are encouraged (not forced) to explicitly fail with a lot of noise.
The "in a context where newlines may be used as a separator" part of this sentence is very important.
IIUC the tools are still allowed to succeed in non-broken situations, for instance when a null separator is used rather than a newline character. And I can't imagine the tools you listed will start breaking in situations that worked (apart from file creation - indeed this will likely start breaking, and newline characters in filenames need to be considered deprecated, and things using them fixed).
This is strictly better IMHO (if one thinks that newlines in filenames are not worth the trouble given how things work in POSIX, especially the part where things are line-based and newline characters have quite some significance).
If your file system allows them, be careful with symlinks though!
Why, specifically?
I'm convinced we will need to be careful with symbolic links related to new line characters in filenames, but I'm curious of which specific aspect you had in mind.
Oh, nothing specific to newlines. Just, when you rename files to fix newlines, you need to check if they break symlinks pointing to them.
For instance, I had project folders for my individual research projects. In order to have a central repository of resources and not have copies of multi-megabyte pdfs in each folder, I put all referenced papers in a single directory and symlinked them for each project that needed them. Later, I wanted to rename the papers to remove newlines. The symlinks complicated this process quite a bit!
Ah, right, indeed :-)
In academia, I get (and used to create) pdfs with names like:
"On the number of
associative foobars
of degree blah -
Johnson and Anderson.pdf"
all the time. It is very convenient for non-technical academics to have a descriptive file name, and in order to see it in its entirety in the navigator they use, they add newlines.
Oh god. I already get upset enough by spaces in a file name, although I realise that fight is basically lost now!
Didn't Windows name "Program Files" with a space to force application developers to handle spaces in paths properly?
For the longest time you could get away with this in cmd:
> dir c:\progra~1
So if forcing people to handle spaces was the goal, it took a long time to force it.
I'm pretty sure that still works. I forgot the exact scenario, but my Windows CI on GitHub Actions output shorte~1 pathna~1 like that in a script just a few months ago. On one hand, the backwa~1 compati~1 is nice. On the other hand, there's just so much depreca~1 cruft that keeps popping up even on contemp~1 systems.
In theory yes; in practice, to this day many people don't bother to learn how to deal with pathnames in a proper way.
Top difficulties in computer science:
1. naming things
2. cache coherency
3. off-by-one errors
???
4. quoting pathnames
I would replace 4 with parameter expansion rules.
Eh, maybe. In practice I usually do all my moderately-heavy filesystem scripting in Python these days, for which pathname quoting is just a complete non-issue. Of course, I still use a shell for quick-and-dirty stuff, but usually only for pretty simple tasks where the simplest quoting setup ("$i") suffices.
Not to mention C:\Program Files (x86)
And C:\Programme and other localized variants to force people to go through the proper APIs instead of hardcoding paths.
I just got used to installing things I need to interact with in a program into a folder named C:\workspace
As a fellow spaces-in-filenames-hater, the fight is not lost. We are on the brink of winning it; it's just a mount option away!
While we cannot avoid that people hit the spacebar when writing a filename on a gui, this does not mean at all that the resulting filename itself need contain a plain space character. Those spaces can and should be transparently translated to non-breaking space characters at some point. Maybe by the gui itself, or more robustly by the filesystem. This would make everybody happy: gui users and naive shell script writers.
>Those spaces can and should be transparently translated to non-breaking space characters at some point
Why? This just introduces more complexity and interoperability headaches for seemingly no reason.
> Why?
In order to preserve the sacrosanct simplicity of naive shell scripts. Seems like a very noble goal to me.
The only unexpected complexity arises when you want to deal with filenames having mixed spaces and nbsps. But I'd say that people who do that had it coming.
If you want simple shell scripts to work, make an actually good shell language without all the footguns.
The filesystem is way more important than /bin/sh, and any complexity added there will trickle down to all programs, not just shell scripts.
It's not worth adding hacks on the FS to patch defects in poorly written shell scripts (which are being replaced en masse with python/nodejs/even weirder yaml files/systemd units/etc... anyways)
Whitespace in filenames in general is difficult to deal with. Many, maybe most, programs get it wrong. It's not just about shell scripts, many GUI programs fail to handle those files properly too.
When GUI programs mishandle filenames with spaces, IME it's usually because they spawn a subshell in a naive way (system("rm " + filename)).
To mishandle spaces you have to split an input w/ filenames by whitespace, which is not that common of an operation outside of a shell.
My favorite file+space issue is spaces at the end of file names, especially when you copy and paste text, or text gets trimmed from an input box, or the person forgets to trim space from an input box...
The vast majority of Windows and MacOS programs get it right.
No, no they don't - you just don't notice when they get it wrong, and you also don't name your files stupid things (I imagine).
If you actually test this, you'll realize a ton of Windows programs get it wrong.
Also, in general this is a poor argument. The goal of Linux isn't to be as much like Windows as possible, because Windows sucks ass. Nobody in their right mind would use Linux if it was just Windows but, presumably, shittier. The entire appeal of Linux is that it isn't Windows, and it isn't MacOS.
Eh? It's really not a bother in pretty much any programming language, and you don't really need to do anything special for it. I don't know any program that has any problems with it.
Even zsh has fixed this. It's just /bin/sh and bash that are annoying.
nushell uses real lists for things which means you don't need to care about seperators except when dealing with external system things
Simplicity doesn't always mean stupidity. A simple but functional shell that correctly handles whitespace without much hassle has been available since the 90s, namely rc, which is also found in Plan 9. Adopting rc's string concatenator `^` in POSIXy shells shouldn't be too hard.
It would be really nice if there was a mount option that would quietly remove spaces in filenames, or convert them to an underscore.
If I had it, I would use it today.
Yep, works today:
Just always quote variable interpolation and you will never have problems.
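For instance (the filename here is made up):

    f='My Summer Photos.jpg'
    ls -l "$f"    # one argument: the whole name
    ls -l $f      # word-split into three arguments: My, Summer, Photos.jpg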
Convince (force?) your team to use make and soon everyone will forget spaces in file names are even a thing!
My team already uses `make` but there's no reason for me to run it in my Downloads folders. File names in there are sometimes wild. Yet I expect command line tools to work with them. If they will cease to do so, I will have to start using non-POSIX variants of those tools, I guess.
I don't know who "the Austin Group" mentioned in the article are, but how come they "could not find a single use-case for newlines in pathnames besides breaking naive scripts" when legitimate use-cases are so easy to find?
(And if they're that incompetent, why does the article imply they are worth quoting and listening to?)
It is [1] the joint working group that for the last 25+ years has been responsible for both the POSIX standard and the Single Unix Specification. It emerged after the UNIX wars as a consolidation of the various splintered UNIX standardization efforts (POSIX itself, X/OPEN, OSF, etc).
[1] https://en.wikipedia.org/wiki/Austin_Group
Is that legitimate? A path name is just a unique identifier for a file, IMO it doesn't make sense to put a whole novel in there. If anything, a giant summary like that should be in the meta tags?
In what way is it not legitimate? It's not an accident, bug or data corruption. Someone put it there for a reason, and it benefits their use case. That's as legitimate as it gets.
That's a core part of the problem: a path name is NOT just a unique identifier for a file. Desktop operating systems and their classical utilities conflate the "unique identifier" and whatever "displayed title" of a file though which the end user interacts with the file.
Users care about "titles" or "summaries" of files, not "filesystem identifiers"; as long as the two are conflated, non-technical users will use the identifier to write titles and thus make the file easy to locate in an interactive GUI. Meta tags are not even in the cognitive horizon of most people.
Name one legit use case.
... the use case in the parent comment I was replying to.
And no I'm not going to copy that here for you to quip "that's not a legitimate use case". Make an effort to make a point and support it with better justification than "because I said so".
How do these non-technical academics even create a PDF file with a name like that?
Right click, rename, enter, enter, enter (until the entire file name is visible on the box)? That's how I did it when I used Windows.
Edit: now I remember the most basic way: open the pdf, select and copy the title, click on rename and paste from clipboard. Works great to get the file name with the newlines exactly as they are on the title!
Doesn't <enter> just confirm the typed input for the filename and finish the renaming? How does that insert newlines?
Yes - I just tested on Win10+11 because I thought "there is no way I didn't accidentally do something like this on accident... and I would have remembered seeing a new line in my file name when I made that mistake."
I just opened a folder in file explorer, clicked 'rename' and then tried the following combinations: Enter, L Ctrl + Enter, L Alt + Enter, Win + Enter, R Ctrl + Enter, R Alt + Enter.
None of them let me put new lines in the filename - it either did nothing, or 'closed' the rename view.
Shrug, I last used windows with Windows 7, so you are probably right. That being said, at least two of the students I am currently tutoring are on XP and one of my colleagues as well :D
No, it was always this way.
Right, I just remembered the main way to create those filenames: open the pdf, select and copy the title, close, rename the file and paste from clipboard.
I don't know if this is a Linux thing, but when renaming a file, when I press enter, I apply the new name, the file manager doesn't add a newline.
I am interested in hearing the rationale for downvotes explicitly. I am describing a reality that exists and must be taken into account. Why are you downvoting?
The thing is, it's hard to predict what would happen to those scripts regardless... E.g. try naming your files "-rf" and see how many things break :)
A correct script will have no problems with "-rf" or any other file name. I have (and recommend script writers make their own) a directory hierarchy of "dangerous" file names to test scripts.
For example, it contains a directory where all file and subdirectory names are in unary, consisting only of repetitions of the newline character. A correct script should be able to enumerate, access and modify files in there without issue.
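A minimal sketch of creating a couple of such names (the `danger` directory is made up for illustration), plus one way to enumerate them that doesn't depend on any output separator:

    mkdir -p danger
    nl=$(printf '\nx'); nl=${nl%x}          # a variable holding one newline
    touch "danger/$nl" "danger/$nl$nl"      # names made of one and two newlines
    # enumerate without relying on newline- or space-separated output:
    find danger -type f -exec ls -ld {} +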
If one really wanted to embrace chaos, introduce this as a new team file naming standard for "risk finding" files ;)
I do enjoy "ls *; touch -- -lisah; ls *" as a fun little brainteaser for those uninitiated to this behavior.
export TMPDIR=" / "
to surprise the next person or script to do "rm -rf $TMPDIR/foo"...
Of course, there is an xkcd for that: https://xkcd.com/1172/
Dude, just fix the filenames.
It's a bandaid on a wider problem: the design of Unix shell is bonkers and the whole thing should be deleted. Why? Because I haven't seen any other tool ever have so many pitfalls. Take n random languages and m random developers and tell them to loop over a string array and print its contents, and count how many correct programs you get on average per language. There will be easy languages, then difficult languages, then a huge gap, then Unix shell because in your random sample you managed to get one guy who has PhD in bash.
The main problem is using text as a common format between different applications.
First: text is not well defined. Is it ASCII? Is it UTF-8? Some programs can spew UTF-32 with proper locale configured, it's a mess.
Second: encoding and decoding of objects to text is not defined at all. Those problems with filenames are just one example. Using newline as a separator is a natural thing that is easy to implement, yet it is wrong.
In my opinion two things should be done:
1. Standardise on UTF-8. No other encodings allowed.
2. Standardise on JSON. It is good enough to serve as universal exchange format, tools like `jq` exist for some time now.
So any utility must read and write JSON objects with some standard env set. And shells can be developed with better syntax to deal with JSON. This way you can write something like
`ps aux | while read row; do echo ${row.user} ${row.pid}; done`
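There is no standard `ps --json` today, so treat that flag as purely hypothetical; the point is just that once output is JSON, field access stops depending on columns and whitespace, and jq can already do the consuming end:

    # hypothetical: a ps that emits one JSON object per process
    ps --json | jq -r '"\(.user) \(.pid)"'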
>It is good enough to serve as universal exchange format, tools like `jq` exist for some time now.
Please don't use that underdefined joke of a spec. Define "PosixJson" and use that instead. Right now it's not even clear what the result of parsing {"a": 1234678901234567890} is. Is this a parse error? A bigint? A float/double? Quiet wraparound? Something else? I've seen all these behaviors in real world JSON implementations across different languages.
POSIX does actually define what a "text file" is, but the definition is a bit unusual:
See https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...
> 3.387 Text File
> A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.
So, if you have some non-printable characters like BEL/␇/ASCII 0x07, that's still a text file.
(and I believe what bytes count as a valid character depend on your `LC_CTYPE`).
But the moment you have a line longer than {LINE_MAX} bytes (which can depend on which POSIX environment you have), suddenly your text file is now a binary file.
Kind of a weird definition indeed. One edge case: the definition states the file must contain characters, so presumably zero length files are out. But then how could you have zero lines?
POSIX defines a line as:
> 3.185 Line
> A sequence of zero or more non-<newline> characters plus a terminating <newline> character.
So a file with some characters but no trailing newline is reported by `wc -l` as having zero lines.
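For example, in a scratch directory:

    printf 'no trailing newline' > f
    wc -l f        # reports 0: nothing was terminated by a newline
    printf 'one line\n' > f
    wc -l f        # reports 1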
An empty file is not hard to make. It's just a matter of creating the file and not writing to it.
Yes obviously. But the POSIX specification for a "text file" as above is that it contains characters, which an empty file by definition does not. So an empty file cannot be a text file if you read that specification strictly, and therefore you cannot have zero lines in a text file. As soon as you have a single character there is at least one line, and the amount of lines can only stay the same or grow from there.
The definition should read "one or more lines" instead or (probably better) specify that a text file contains "zero or more characters".
Ahh I see what you're saying. I misunderstood at first.
What cursed madness have you hit that spits out UTF-32 under normal conditions?! That can only be a bug - UTF-32/UCS-4 never saw external use, and has only ever been used for in-memory fixed-width character representation, e.g. runes in Go.
You never have to worry about whether you're dealing with ASCII vs. UTF-8, but rather if you're dealing with UTF-8 vs. ISO-8859-1, or worse, Shift JIS or similar.
> That can only be a bug - UTF-32/UCS-4 never saw external use
I regularly use `iconv -t utf-32be | hd` to look at what bizarre sequence is denoting yet another weird symbol, like an itchy hedgehog.
And what is a real reason to disallow this?
I think that I hit that with Java:
From quick googling it seems that glibc does not support it, so it should not happen.
> it seems that glibc does not support it
`iconv` does, and that is common enough. Along with tons of eerie EBCDIC/whatever...
Don't even assume UTF-something is the only character encoding. There are so many character encodings that existed before Unicode. They're still widely used.
I think a lot of tools should support json as well as plain text. Probably the latter by default, and the former with a "-o json" or similar option. I'm fine with wc giving me `5`, I'd prefer that to `{ "characters": 5 }`.
True, but this would be immensely difficult to pull off, because how do you convince other people to write programs that produce actual working JSON?
The primary purpose of command line program output is to convey information to a human, not to other programs.
Command line scripting is supposed to be ad hoc and hacky.
There are exchange formats that are well-defined enough to be useful to many computers while also being readable enough to be traversed by human eyes. There's no reason to do everything ad hoc; you don't get much from that. You also control the shell itself - there's no reason you can't display object representations in a pretty way.
I disagree that it is supposed to be ad hoc and hacky. Look at PowerShell.
That was under limited OSes such as DOS. Under Unix, piping has been the philosophy.
JSON itself is bad for a streaming interface, as is common with CLI applications. You can't easily consume a JSON array without first reading it in its entirety. JSONL would be a better fit.
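jq already consumes a stream of JSON values, so a JSONL-style protocol composes with ordinary pipes; a small made-up example:

    printf '{"user":"root","pid":1}\n{"user":"alice","pid":4242}\n' |
        jq -r 'select(.user == "alice") | .pid'      # prints 4242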
But then, how well would it work for ad-hoc usage, which is probably one of the biggest uses of shells?
> The main problem is using text as a common format between different applications.
If you can't get the immensity of the cleverness of Unix foundations, you should not talk about them.
That idea is what made it possible for you to type that sentence in the first place.
> I haven't seen any other tool ever have so many pitfalls.
I haven't seen any other tool with so much general utility and availability.
> to loop over a string array and print its contents
Is incredibly easy in bash and bash-like shells. As highlighted, the issue is that tools like 'ls' don't create "a string array." They create one giant string that has to be parsed. The rules in the shell are different from those in other languages, but it /will/ do most of the parsing for you, or all of it, if you do it carefully.
This is a fine tradeoff, as evidenced by its wide usage and lack of convincing replacements.
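For what it's worth, the careful version is short: let the glob build a real argument list instead of parsing ls, and keep every expansion quoted:

    for f in ./*; do
        [ -e "$f" ] || continue     # nothing matched: the pattern is left as-is
        printf '%s\n' "$f"
    done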
> I haven't seen any other tool with so much general utility and availability.
> availability
That's the real reason why we use Unix shell. It's not good, but it's available. Like a cheap hooker.
> but it /will/ do most of the parsing for you, or all of it, if you do it carefully.
"It mostly works if you're careful" doesn't sound very convincing to me.
> "It mostly works if you're careful" doesn't sound very convincing to me.
Would you rather write your own parser?
> but it's available. Like a cheap hooker.
Username checks out.
Someone needs to come up with a interactive shell first, one that is comparable in usability. Then we can think about replacing the unix shell.
I tried both python and lua interactively, but they are a pain when it comes to handling files. You have to type much more to get the same things done.
The bigger issue is the sheer momentum of Unix shell. Even if you come up with an alternative that is better by every objectively measurable metric, it's still going to be a monumental task to have it packages with commonly used distros. Kinda like the "why can't the US switch to the metric system" problem.
OK let them add an explicit check to standard tools, and/or to open(), mkdir(), etc. with O_PORTABLECHARS. And an environment option to disable this check.
Why do they force the restriction at the syscall level?
People already use different shells, mksh, fish, and so on. With fish there is a non-posix shell in wide use.
>wide use
Five people around the globe isn't wide use.
I'm sure you might get more than 5 people on HN replying to you that they are using fish right now. Say something discrediting about fish and they show up.
Heh, reminds me of how to get help with Linux back in the day. If you directly asked for help, you'd be told to RTFM. If you stated confidently that Windows could do something and that Linux sucks because it can't, you'd get users tripping over themselves with details and instructions, just to prove you wrong.
Human psychology is fascinating!
There's a direct cost in money, time and lives that has come from the US's adherence to their US Customary Units (which are often different to the old imperial units). People have literally died because of the confusion caused by having multiple systems of units in common use with ambiguous names (degrees, gallons, etc). Each year industry worldwide spends an enormous amount of money indirectly precisely because of this problem and it's still incredibly unlikely to be fixed within my lifetime.
Bash-alternatives that are not completely compatible frankly just don't have a chance.
If it isn't distributed out of the box with every nix-like OS, it inherently isn't* “better by every objectively measurable metric" - distribution of a common, stable standard is a huge benefit in and of itself.
> distributed out of the box with every nix-like OS,
Python and lua are pretty close to that.
> Python and lua are pretty close to that.
Python may often be installed by default, but it's definitely not an essential/required package "out of the box" on every install. Also, in a thread where one topic is how POSIX shell handles whitespace in filenames, it's hilarious (not in a good way) that someone suggests a language that handles whitespace the wrong way in its own code. Yes, significant whitespace is objectively wrong.
What OS/distro is Lua included on out of the box? That doesn't mean "available in a package". I mean literally included in every single install and cannot reasonably be omitted?
Regardless of the availability, the parent comment says
> better by every objectively measurable metric
Neither Python nor Lua are "better" than shell, at the types of things shell is commonly used for - they're objectively worse.
Lua gets onto every other Linux distro as dependency of some base system component. For example, rpm or pipewire depend on lua. Ubuntu and Debian ship with pipewire per default.
You should use the word "objectively" less.
> Lua gets onto every other Linux distro
Just FYI, there are UNIX-like, POSIX compatible systems that are not a Linux distro.
> rpm or pipewire depend on lua. Ubuntu and Debian ship with pipewire per default.
Pipewire? Do you mean this? https://packages.debian.org/bookworm/pipewire
That isn't even close to "installed on every system". Best I can tell from the reverse dependencies, it's required for some Gnome Remote Desktop tool, and best I can tell, it doesn't rely on Lua anyway (at least on Debian).
> You should use the word "objectively" less.
I specifically used the word objectively, because the original comment that I replied to, said this:
> better by every objectively measurable metric
pipewire -> wireplumber -> libwireplumber -> liblua
Pipewire being the Pulseaudio replacement from Redhat.
Bookworm is probably the last Debian without :P
> Pipewire being the Pulseaudio replacement from Redhat.
Right, so it's a desktop package that ultimately will be installed on about 1% of all Linux machines because the vast majority are servers without a desktop environment.
Also worth pointing out: liblua on Debian at least, is the shared library. It's not the binary to execute standalone Lua scripts.
Is this like a game where you come up with bullshit and I have to come up with the facts to rectify it? RHEL/CentOS have more than 1% market share alone.
Check your own installs and tell me if you find some that don't have liblua or libluajit.
For the library thing: I said "Python and lua are pretty close to that." earlier. I did not say that they have interpreters ready everywhere. But if the language core is already installed on a large fraction of machines, then adding the interpreter is not a big cost.
> already installed on a large fraction of machines
So far you've presented no evidence of this though, just that it's used by a new desktop-focused package.
All linux desktops over the last 30 years is not even a "large fraction" of total Linux installs, much less the ones that have already migrated to this new audio system.
> adding the interpreter is not a big cost
It's nothing to do with cost. It's about "how do I know this will absolutely 100% run on any POSIX machine I throw it on without any extra steps".
Remember the argument here is about something that is claimed to be "objectively better" than Shell. The ubiquitous nature of POSIX shell is a huge barrier for any possible competitor, and saying "well you just need to install it" just defeats the purpose. You might as well write it in fucking java and say "well you just need to install a JVM".
Edit to Add: a good number of systems I manage do have liblua installed... because HAProxy requires it, and those systems have HAProxy installed. Not because it was installed as part of the base OS or even a default group of packages.
Incidentally, HAProxy and thus liblua were installed on those systems by infrastructure management that's implemented as shell script. So what kind of chicken and egg argument do we need to have here about how exactly I can run a Lua script to install Lua?
> a good number of systems I manage do have liblua installed
/thread
Even outside of distribution, python and lua aren't objectively better. For starters, they're much more verbose.
I just said that, scroll up.
I certainly have my complaints about Powershell, but it's got pretty good coverage, decent documentation, and cross platform support.
if it weren't so irregular, inconsistent, spotty and tasteless, it'd be a great option.
Oil shell?
https://www.oilshell.org/
Compatible with most bash scripts
> the design of Unix shell is bonkers
Compared to what?
Powershell?
PowerShell's designers could learn from decades of programming language progress and especially shell usage. They could improve many aspects indeed. This doesn't mean that the original design is "bonkers", only that it's not perfect.
The way PowerShell works is largely based on what the computing world was doing with shells outside Bell Labs, at IBM, Xerox, and other places, in exactly the same timeframe as UNIX was happening.
Can you give examples of what should be improved in PowerShell?
Verbosity is a huge problem there
Modern programming language designers have a bad relationship with verbosity. I don't know why they do this.
It's a lang for an interactive shell, typing literally translates to developer speed. I understand the want for clarity and maybe that's nice in large scripts, but the main goal is to be a shell. So, optimize for that. Also, you probably shouldn't be using powershell for large scripts anyway.
The only recent lang I've seen that has a handle on this is Rust. You can tell they put a lot of thought into having keywords be as short as possible while still being descriptive.
FoundTheCamelCaseConvert.
My God next you will say getopt() --longform is the bestest
It's been years since I used Powershell, but IIRC there are shortcuts for the common commands, e.g. cat, ls, mv, rm, and such DTRT.
Those aliases are, I believe, only defined on Windows PowerShell (the closed-source version 5; not PowerShell 7). I wish those default aliases you mentioned weren’t a thing. Especially `curl` (people should use `iwr` instead), which is an alias of `Invoke-WebRequest`, because it makes the `curl.exe` shipped with Windows nearly undiscoverable.
Works on my machine!
This should not be as downvoted as it is. In a way shell is broken. The brokenness is in that it requires each command to serialize and deserialize again, considering all the weird things that can happen with the "all is a string" kind of approach, instead of having a proper data interchange format or even sending objects to next steps in the pipeline. This behavior is what necessitates even thinking about the changes listed in the post. We wouldn't even have that problem, if the design of shell was better thought out. Now we are dealing with decades of legacy built on these shaky foundations. I hate to admit it, but seems at least this aspect Powershell got right, whatever one may think about the rest of it.
On my rhel7 system, the Debian dash shell is this large:
I happen to have an old powershell installed:
A strict POSIX shell is always going to be vastly smaller, for many reasons.
I would prefer that the POSIX shell was an LR-parsed language, but you can't have everything.
> loop over a string array
Dear anal_reactor, what is a "string array"? I have used unix shells for nearly 30 years and never heard of them. And I consider myself a script-fu master!
There are two array-like constructions in the shell: list of words (separated by spaces) and list of lines (separated by newlines). Both cases are implemented as a single string, and the shell makes it trivial to iterate through its components.
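A small sketch of both shapes (and of why quoting rules matter so much here):

    words='alpha beta gamma'
    for w in $words; do                    # unquoted on purpose: field splitting
        printf 'word: [%s]\n' "$w"
    done

    printf 'first line\nsecond line\n' |
    while IFS= read -r line; do
        printf 'line: [%s]\n' "$line"
    done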
That is exactly the problem many people have with it. Encoding „arrays“ this way is foreign to everyone who comes from „normal“ programming languages. Both variants lead to problems because either character can occur in elements, worst case scenario they contain both at the same time. I can see why this leads to confusion and bugs.
It’s like people saying they won’t learn French because it has a different grammatical structure. There’s no “normal” natural language. If you’re used to the C-like syntax, learning C-like language will be easy. But that’s not an argument to say Lisp is confusing.
That's why I put normal in quotes. There is however more to it than having a different grammatical structure: it works differently from many commonly used languages that have actual arrays/lists where elements can contain anything the type allows. If you come from any of the common modern programming languages (let's say Java, Kotlin, C#, JS/TS, Python, Swift, Go, Rust, etc.) and expect something similar (because many of them are very similar), you will be confused. Using spaces or newlines to encode elements in a single string is just not robust and leads to easy-to-make mistakes.
Most of these languages were created long after bash and the other shells. The fact is that the shell allows for unquoted strings, and quoting is a specific operation, not syntax. Also, shell scripts were meant for automation, not for writing general programs. The basic units are commands, arguments, input, output, files,… so the design makes these easy to manipulate.
I’m not saying that we can’t improve, but I’m more in favor of making the tool more apt to solve a problem than making it easier to learn. Because the latter often wants to forego the requirement of understanding the problem space.
Yes, these are newer. I mainly wanted to make the point that it is confusing if you are new to bash and come from these newer languages with the wrong expectations. The concise nature and many subtle details make it very difficult for beginners and infrequent users.
Compare this to the newer programming languages where you explicitly call something with speaking names like .Trim(), .EndsWith(), support from compiler and IDE.
In my experience automation and general programs often are the same thing once things get more complicated. Bash scripts usually grow rapidly and are a giant PITA to maintain or refactor. Throw in build systems and helper scripts and you quickly receive a giant pile of spaghetti. Personally I just switch to one the mentioned programming languages once it goes above a simple sequence of operations.
Personally I don't see how to improve it much without becoming a full blown programming language, at which point it would probably make more sense to just release a library for common automation tasks that is also composable. Maybe I'm just not the right target audience.
The issue with your otherwise good reply is that someone is bringing expectations to an expert tool (programming languages, software, OS) and blindly assuming that everything will work as he thinks it should. Familiarity helps with learning, but shouldn't replace it. Someone new to bash should probably start with a book.
And for bigger automation projects, there are lots of projects and programming languages that can help.
I agree it is an issue but it is how many people work and think. Most of the time they are not even wrong. "Hey, I have variables and loops, I know that!".
I would even make the case for expert tools being as unsurprising and familiar as possible unless there is a very good reason for them not to. Also they should be robust against misuse and guide the user towards good practices. There are always beginners, people that rarely need to use it, people that do programming as "just a job" and people that make mistakes because they are distracted, tired or just human. Something like "rm -r /" is a good reminder of that for many people.
Plus there are already a lot of tools required. Reading a book about every tool I have to use would be impractical for most projects. Maybe more expert tools should just be tools. The same way I can now just use Ubuntu and get a working desktop system including drivers for most common hardware. If I compare that to the past, where I installed a Linux distribution and then found out I lacked a driver for my network card and needed to download it from the internet... I still can modify my system if I need to, but it's nice that I don't have to. I think we can do similar things with many parts of development and free some capacity for other tasks.
Their proposed solution is not compatible with reality though where POSIX does not get to define what kind of files exist on filesystems you need to work with.
All they did is introduce new error cases in C programs while not actually fixing anything for shell scripts.
If anything, it's going to result in more exploits as people write shell scripts with the assumption that newlines cannot appear in filenames.
In the real world, nobody writes shell scripts that handle newlines in filenames.
I do. Single files are handled with quotes around arguments just fine. For lists of files you need to use NUL as a separator. That's not really hard to do once you are aware of the problem but ergonomics could be better - which is something useful that POSIX could change.
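A sketch of what that looks like with the features POSIX 2024 now mandates (`find -print0`, `xargs -0`, `read -d ''`), assuming a shell new enough to support the last one (bash already does):

    # batch style:
    find . -type f -print0 | xargs -0 ls -ld

    # per-file loop (note the loop body runs in a subshell here):
    find . -type f -print0 |
    while IFS= read -r -d '' f; do
        printf 'found: %s\n' "$f"
    done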
But they did not make old code correct. Filenames are still allowed to contain newlines. Shell scripts still need to be prepared to deal with that. Nothing really changed, they just added a feel-good half-measure.
It's a step in the right direction. You have to understand that for decades a vocal group of Unix die-hards has opposed any limitations whatsoever on the bytewise content of file names. The newline restriction in this latest version of POSIX may be modest, but it represents a dam breaking. When (obviously) the sky doesn't fall, the next version of POSIX will have a lot more filename cleanup.
Next step is to forbid newlines from file content itself, to fix conformant JSON parsers?
This is pretty standard for a human run system. Gotta make the human feel good about an idea before they can do said idea.
If you’re not familiar with humans, there are several manuals available online.
Don't assume UTF-8 is the only character encoding used in the wild. There are character encoding with leading bytes not easily detectable like UTF-8.
In 2024, if you don't get the correct result decoding a text as UTF-8, the bug is in the text, not the decoding. And luckily, adoption of UTF-8 over the past 30+ years has gone well enough that you don't need to worry.
Caveats for cursed hardware standards demanding two-byte encodings like USB.
I hope you're happy in your ivory tower, but I personally work with a lot of files with other encoding, most often that weird utf16 (Windows), sometimes also legacy files with different ANSI encoding. Declaring "my decoder is fine, it's the text that is buggy" is not going to score a lot of points with my boss and clients.
The only valid reason for still having files stored in legacy ANSI encodings is that their only use is input to software that has not been maintained for ~30 years and cannot be updated. That's fine because they're just binary inputs in a closed ecosystem that no one touches.
But if they are supposed to be treated as text, then yes it's the text that's buggy - they should just be converted to UTF-8 once and have the originals thrown away.
UTF-16 is something that Microsoft has cursed us with by inserting it into specifications (like USB) so that we cannot get rid of it, even if it never made any sense what so ever. But those are in effect explicit protocols with a hard contract, very different from something where you would "assume an encoding".
Shouldn't hurt to tell clients to right their weird proprietary software originated encodings though.
why do people still assume utf8 is the only locale encoding in use?
you're probably guilty of the sin you preach against, and are showing wrongly decoded utf8 without even knowing it.
Now do that with all whitespace!
Filename character set and its interpretation shall be controllable per directory or, at least, per FS. This pertains not only to the permitted set, like with or without LF, but to collation rules as well (including case insensitivity with cases like Turkish/Crimean/etc. I/ı and İ/i). Also, this shall include workarounds for already existing problems: if a directory already contains files I1 and ı1, there shall be a technique to deal with them separately even with a Turkish locale.
But restricting this at the syscall level is definite insanity, along with the excuses for it.
Looks like the BSD-family will have some implementing to do.
I just booted OpenBSD 7.0 (which is a bit dated).
The find utility has -print0, and xargs has -0. Notably, xargs also has -P for running processes in parallel.
rm has both -d and -v.
The renice command appears to be able to use relative adjustments with -n.
There is a timeout command.
There is a readlink command, but no realpath (but a manual page exists for it as a system call).
Strict adherence to POSIX isn't a goal of any of the current BSDs is it?
People get "POSIX compliance" confused with "Unix certification". The first is an API you implement, the second is a rubber stamp.
All active Unix-like operating systems aim to implement the new interfaces as they're defined.
I'm confident they'd accept patches.
macOS
This adds `set -o pipefail` to POSIX sh, which causes a whole pipeline to fail (non-zero exit code) if one or more of the commands in the pipeline fail.
If you're writing scripts, use that and don't forget -e and -u
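A minimal demonstration (using bash, which already implements the option):

    bash -c 'false | cat; echo "status: $?"'                    # status: 0
    bash -c 'set -o pipefail; false | cat; echo "status: $?"'   # status: 1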
For `set -u` I mostly agree. For `set -e` see my comment below and Greg's wiki: http://mywiki.wooledge.org/BashFAQ/105
> and they still fail to catch even some remarkably simple cases
I totally agree. Although I'd say that there isn't anything "remarkably simple" about writing a bash script. Anything in the shell scripting world that seems remarkably simple is just because one hasn't realised the ghosts and horrors that lurk in the shadows.
But I'll use -e anytime. It feels like having a protective proton pack at least.
Does it? It is not mentioned anywhere in the post. Can you post a reference to your source?
The post only has a few highlights. The POSIX specs are only for paying IEEE customers, but https://pubs.opengroup.org/onlinepubs/9799919799/ mentions it.
That is the POSIX spec, no?
It's at: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V...
(no permalink, search for "pipefail")
Holy balls that's like Christmas!
Really? Won't that break piping grep?
Probably, so don't `set -o pipefail` in scripts that pipe into grep.
Ah ok I read it as 'sets it by default' for some reason.
Sad. Use of that option is almost always a mistake. It only leads to undebuggable silent failures.
I'd rather both have this option and have it work reliably. It's ridiculous that
does not count as a pipefail when cmd1 or cmd2 fail but does, so the "correct" way to set an environment variable from a pipeline's output is actually
Pipefail is useful and very hard to emulate on pure POSIX; you need to create named fifos, break the pipeline into individual redirections and check for error on each line.
And that is fine; but sometimes you want to treat a pipeline as a "single command" and then you can use pipefail to abort the pipeline on error. Then you can handle the error at the granularity of the entire pipeline without caring which part failed.
Lastly, I am confused as to the "silent" failures; maybe you are thinking of combining this with `set -e`? Then yes, that is bad and I recommend against the combination; but then again, I and most advanced scripters recommend against shotgunning `set -e` in the first place. Use it in specific portions of the script when appropriate, and use proper error handling otherwise.
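A sketch of that granularity, with cmd1 and cmd2 as stand-ins for real commands:

    set -o pipefail
    if ! cmd1 | cmd2 > out.txt; then
        echo "pipeline failed" >&2
        exit 1
    fi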
Why does `set -e` make a pipeline fail silently?
`set -e` makes the script abort and is often used in lieu of proper error handling:
Whether the above reports an error or not depends on the command; when you have a pipeline failing in the above way, it is even sneakier: you are reliant on all commands in the pipeline being verbose about failure to signal an error.
None of the above is advisable. The advisable code is
Error handling like that makes sense if you’re writing a program. But if you just want a script for an automation, `set -e` is enough.
It is not; Greg's wiki further explains why, if the silent failure problem above is not enough reason.
Gee, imagine if shells with errexit option enabled wrote some diagnostic output to stderr before exiting. "Add your own error checking instead", how do I check which piece of pipeline has failed, exactly? The PIPESTATUS variable is bash-specific and was not standardized.
? Why are you replying to me? My position was pretty clear:
"Pipefail is useful and very hard to emulate on pure POSIX; you need to create named fifos, break the pipeline into individual redirections and check for error on each line.
And that is fine; but sometimes you want to treat a pipeline as a "single command" and then you can use pipefail to abort the pipeline on error. Then you can handle the error at the granularity of the entire pipeline without caring which part failed."
By the way, I never script in Bash; I only script in POSIX primitives using dash as my executable.
The history at the beginning of this is not correct. Two examples: the assertion that there was one compatible UNIX prior to United States v. AT&T, and the statement that GNU and BSD started that same year. Very, very off.
Okay, but you would add more value if you could also state the correct order of things.
https://en.m.wikipedia.org/wiki/History_of_Unix#/media/File%... is a good visual of (many of, not all) the various versions of UNIX and when they were released. BSD was first released in 1978. United States v. AT&T was implemented in 1984 (judgment 1982) GNU was first created in 1983.
TIL the POSIX standard is still updated. Does it still suffer from the issues that make Linux break POSIX compatibility in some areas because they consider it a flawed standard?
Yes! Finally! Let's treat filenames with new lines as errors! I'm so delighted with this decision.
The original request was to ban all bytes between 1 and 31.
https://www.austingroupbugs.net/view.php?id=251
At some point they decided to narrow the change to just ban the newline character.
Which I personally think is a pity. Allowing escape in file names is a security risk because it enables you to embed ECMA-48 escape sequences in file names. Secure terminal emulators shouldn’t be made vulnerable by arbitrary escape sequences, but there are “too smart for their own good” terminal emulators out there that have escape sequences that let you do crazy things like run arbitrary executables.
There are many non-UTF-8/16/32 character encodings used in the wild whose multi-byte sequences use these byte values. These values are used in the wild.
I think the decision forbidding newlines in pathnames is also wrong. It may break tons of existing code.
I wish Linux/etc had a mount option and/or superblock flag called “allow only sane file names”. And if you had that set, then attempting to create a file whose name wasn’t valid UTF-8, or which contained C0 or C1 controls, would fail. The small minority of people who really need pre-Unicode encodings such as ISO 2022 could just not turn that option on. And the majority who don’t need anything like that could reap the benefits of eliminating a whole category of potential bugs and vulnerabilities.
> There are many non-UTF-8/16/32 character encoding used in the wild which use these value in multi-byte character encoding.
Like what? I am genuinely curious: Shift-JIS, GB2312, Big5, and all of the EUC variants do not use bytes that correspond to C0 characters in ASCII.
That's obviously impossible since it would break backward compatibility and the users' existing filesystems (and the Linux kernel will rightly never accept anything like that).
The only reasonable fix is to enhance bash and shell IDEs to track for each variable whether it could possibly include all filename-valid characters (e.g. if it comes from read with no options then it can't contain \n) and warn (off by default unless stderr is a terminal) if they can't and it's used as a filename (conservatively determined when used as arguments to processes), and also warn when using find without -print0, etc. noninteractively and perhaps interactively as well.
Why is that an issue?
Run a program to list a directory. Everything that interfaces with that, will assume newline delimiters. Similar assumptions are baked into a lot of software.
Enforcing that a newline isn't part of a path, ensures the security of those systems that are commonly relied on.
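A quick demonstration of the failure mode, in a throwaway directory (the filename below is made up):

    nl=$(printf '\nx'); nl=${nl%x}     # one newline character
    touch "evil${nl}passwd"            # a single file whose name contains it
    ls | while IFS= read -r name; do printf 'saw: %s\n' "$name"; done
    # saw: evil
    # saw: passwd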
Except no one's enforcing anything yet. Earlier versions of POSIX allowed rejecting filenames containing newlines, the newest version encourages it while mandating features required to handle such filenames safely (find -print0, xargs -0, read -d ''). So nothing's set in stone yet.
> Everything that interfaces with that, will assume newline delimiters.
Well, only badly written programs. nushell handles this fine, as will any program that doesn't try to do everything as plain strings:
However after reading it they're only making them illegal for the posix utilities from the 70s that aren't written properly, so I think that makes sense.
Next: spaces
Still much better than mojibaked names.
What do you mean?
What is the encoding of the filenames?
I am personally not aware of any MBCS that could have a 0x20 or 0x0D as a valid trailing byte. Are you?
I think my comment correctly contrasted mojibake from new lines or spaces for that reason.
> We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now?
Finally. Now let's do the rest: https://dwheeler.com/essays/fixing-unix-linux-filenames.html
Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook
> Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook
Ensuring normalization is hard. Where should you do it? There's only one good place: in the filesystem. But if you normalize on create then you'd better use the same form that everyone else uses, but, what's that? Input methods generally produce NFC, but there's no guarantee that they will not produce something else. HFS+ normalizes to NFD on create.
ZFS uses form-insensitivity -- much like case-insensitivity, but for form. The reason ZFS went this way was exactly that HFS+ and input methods differ as to forms. I pushed hard for this way back when. IMO form-insensitivity is the best way forward.
But as for guaranteeing that filenames are UTF-8... that's much harder. The best thing to do is to not allow the use of non-UTF-8, non-ASCII, non-C locales -- not a guarantee, but pretty good.
Sure. Form-insensitivity is another good option. I'd actually argue for full case insensitivity too (like macOS), although I realize that it's probably a stretch.
Case-insensitivity is also an option in ZFS, but honestly case-insensitivity drives me nuts, especially if it's not case-preserving. Oh, that reminds me, ZFS is form-insensitive, and form-preserving.
I really hate to say it, but the fretting about newlines used as delimiters after 50 years of misuse …
… makes PowerShell start to look damn good.
To build an internationalized shell script I'll need to compile multiple .mo language files and distribute them along side the script itself.
For shell scripts part of a large system, that's probably fine. For small scripts, that's not very practical. You are not only adding a compilation step, you're also requiring distribution of multiple files. That's a pain.
It just kind of kills the convenience of a simple shell script. I would probably end up writing a makefile to manage all of this and at that point I am only a hop skip and jump away from using a compiled language instead of shell.
Hopefully nothing, posix is, or at least it should be, a descriptive standard. This is why posix is so terrible, and why posix is so great.
The way I feel, posix and other descriptive standards work best when they describe what everyone is already doing. This is opposed to prescriptive standards, which focus on the "correct" way to do something; prescriptive standards tend to be over-engineered and may or may not actually work.
see also: descriptive and prescriptive dictionaries. http://www.englishplus.com/news/news1100.htm
Both prescriptive standards and descriptive standards have their uses. If POSIX is a prescriptive standard, then maybe another standard should exist that is descriptive.
Keep in mind that the Web standard eventually became prescriptive because descriptive standards failed to catch up. Likewise it can be argued that descriptive standards for the common OS interface are no longer usable.
To be crass, description is only useful for existing things and prescription hinders making innovative things. I think social forces make it natural that standards are treated both descriptively and prescriptively, and that too leads to angst. Case in point, POSIX was once more descriptive, but then people wanted backwards compatibility for existing and new OSes, which made it more prescriptive. The takeaway is that ad-hoc things become permanent once they are too difficult to remove, and then people are sad. Nothing is immune, so just make reasonable attempts for the standard and the culture to harmonize for a specific purpose.
That is also a way to never progress beyond the status quo.
Nitpick re: https://blog.toast.cafe/posix2024-xcu#fn:6
is fine in a makefile as far as POSIX is concerned, because:
> Applications shall select target names from the set of characters consisting solely of slashes, hyphens, periods, underscores, digits, and alphabetics from the portable character set
I kind-of would like to see a POSIX-strict profile which incorporates commonsense restrictions (by commonsense I mean avoiding things that repeatedly, over many years, have tripped up programmers in frustrating ways), like no newlines in file names. Operating systems (or distributions) could opt into this profile, and then someone programming on such an operating system could rely on the constraints of the profile, and additional facilities that might need to rely on those constraints could be added on. Hopefully, the use of the profile would gradually spread.
> future editions will not require c17, but will simply require whatever C specification version is the most modern and already implemented by major toolchains
Is this really good?
If you can't rely on anything concrete being guaranteed, and it is open to interpretation what "modern" or "major toolchains" are, why have a standard?
EILSEQ for \n finally, but why not for unicode confusables? Path names are identifiers, and as such need to be identifiable. Meaning stricter rules than just buffers (not talking about strings).
Since old-POSIX systems will be in use for some time, I wonder how many things will be able to switch to using the new capabilities. And how many OSes already support all of the new changes.
Why was `isascii()` removed?
(Listed in the Sortix article linked in OP.)
It would yield false-positives with non-UTF-8 encoded text. Big5 <https://en.wikipedia.org/wiki/Big5#Encoding> in particular was notorious for using ASCII values for trailing bytes. I don't know if it's still in use or if there are others.
This is a surprisingly greedy POSIX update.
As someone who truly limits himself to POSIX when he can, I think they needed to push it forward to not become completely obsolete. I'm really sad `mktemp -d` and `set -o nullglob` didn't make the cut, but that's how it is, I guess.
A bespoke `mktempd` script is one of the first things I install in a new system. Fortunately, it is not too hard to make a `mktemp -d` compatible script with POSIX tools. `set -o nullglob` is another story :D
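For the record, a minimal, unhardened sketch of such a script (the function name, retry count and name pattern are arbitrary); it leans on mkdir being atomic, so an existing path, including a planted symlink, makes creation fail rather than get followed:

    mktempd() (
        base=${TMPDIR:-/tmp}
        n=0
        while [ "$n" -lt 100 ]; do
            # awk's srand() seeds from the clock; $$ and the counter add variation
            r=$(awk 'BEGIN { srand(); printf "%06d", int(rand() * 1000000) }')
            dir="$base/tmp.$$.$n.$r"
            if mkdir -m 700 "$dir" 2>/dev/null; then
                printf '%s\n' "$dir"
                exit 0
            fi
            n=$((n + 1))
        done
        echo "mktempd: could not create a directory under $base" >&2
        exit 1
    )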
It's quite hard to write mktemp securely[1]. It would be great if POSIX didn't make people attempt to do that error-prone task themselves.
[1]: There's some explanation in this recent post: https://dotat.at/@/2024-10-22-tmp.html
This is correct (though of course a decent `mktempd` script will deal with the listed problems or crash loudly on failure), and there are even more reasons to avoid /tmp.
Unfortunately, it is one of the very few directories that are somewhat POSIX-"guaranteed" writable by a non-root user and the fact that on modern systems it is usually mounted on a tmpfs makes it very attractive for pure POSIX usage without rich array support.
If you have mount permissions, of course, you should tell your `mktempd` to base its directory on a private tmpfs.
File names with / in them
> The problem is that pathnames2 (as per section 3.254 of POSIX 2024) are just strings (meaning they can contain any bytes except the NUL character), [...]
Pathnames can neither contain NUL nor '/'.
Re: `find -print0` / `xargs -0`:
> Previous POSIX releases have considered -print0 before, but never ended up adopting it because using a null terminator meant that any utility that would need to process that output would need to have a new option to parse that type of output.
What nonsense. Just add the `-0` or similar options as needed.
> More precisely, this approach does not resolve our original problem. xargs(1p) can’t sort, and therefore we still have to handle that logic separately, unless sort(1p) also grows this support, even after read(1p). This problem continues with every other type of use-case. Importantly, it breaks the interoperability that POSIX was made to uphold.
More nonsense.
> A bunch of C functions3 are now encouraged to report EILSEQ if the last component of a pathname to a file they are to create contains a newline (put differently, they’re to error out instead of creating a filename that contains a newline).
Ok, that's tolerable. Ditto utilities (notice here they were able to make a list of utilities).
Note that GNU sort has...
-z, --zero-terminated: end lines with 0 byte, not newline
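So a fully NUL-separated version of the pipeline the article worries about already exists with the GNU extensions (not POSIX, but available today):

    find . -type f -print0 | sort -z | xargs -0 ls -ld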
strlcpy()!
> Anyway, POSIX 2024 now requires c17, and does not require c89
I wish it would have been c99. What does c17 add exactly, more C++-esque complexity or not? Why was it not c99 (or perhaps even c11) over c17? Genuine questions.
> What does c17 add exactly, more C++-ish bullshit or not?
Multithreading support and such (atomics, thread-local storage and a guarantee that `errno` is in TLS), explicitly aligned types and allocations, dedicated types for strings known to be Unicode, _Noreturn, _Generic, _Static_assert, anonymous structs and unions in the nested position, quick_exit, timespec, exclusive mode ("x") in f[re]open, CMPLX macros.
I'm not even sure which of these could possibly be C++-ish bullshit, except for about two points:
- Multithreading does seem farfetched for casual users. In fact, I do think it could have been minimized without any actual harm, but multithreading itself needed to be specified because it greatly affects the memory model. (Before C11, C had no thread-aware memory model and different threading implementations were subtly different beyond what the standard stated.) Even JavaScript, originally with no notion of threads, eventually got a thread-aware memory model due to shared web workers. But that never meant JS itself needed multithreading support in its standard library, and C could have done the same.
- `_Generic` is even more debatable, though I believe it was the only way forward when we accept <tgmath.h>, which is known to be a response to Fortran (other responses include `restrict`) and was impossible to implement in the portable manner before C11. As long as it retains its scary underline and title case, I guess it's fine.
Most importantly posix already has existing multithreading facilities in posix threads, so it is imperative that they are reformulated in term of the C++11/C11 memory model.
You quoted me before my edit, but fair enough. I do like the "atomics" support.
> "guarantee that `errno` is in TLS"
I suppose that does not mean that I can just avoid setting errno to 0 before calling a function after which I check for errno, right?
Yeah, I do have an issue with stuff like "_Generic" but I assume I can just simply not use it.
What is "quick_exit" exactly and what does it solve?
As for multithreading, I stick to pthreads. Is any of the new features a replacement for that, or what?
At any rate, why C17 over C11 then?
C17 is a bugfix version of C11 (the next major revision would be C23). The exact list of fixes is available in [1]. Mandating C11 instead of C17 when both are available seems not really useful now.
You have the correct insight about errnos. The new guarantee only means that other threads are not possible to mess with your errnos, but cleaning errnos will be still useful within an individual thread.
exit is not guaranteed to work correctly when called simultaneously from multiple threads, while quick_exit will be okay even in that situation. I think this behavior was not even specified before C11, and only specified after observing existing implementations.
It is expected that libc threading routines are thin wrappers around pthread in Linux. That's why I do think it can be minimized; the only actual problem before C11 was the lack of thread-aware memory model. No need to actually be able to create threads from libc to be honest, especially given that each platform now almost always has a single dominant threading implementation like pthread.
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2244.htm
My last question would be: is it "OK" to use pthreads in my code, or are there any alternatives (i.e. a "best way") when using C17?
No, just use pthread. There are some useful pthread APIs missing from C17 anyway too.
Thank you for your answers, it is much appreciated.
I suppose I will not use "quick_exit" either in that case. I have many workers, there is a job queue mutex, along with pthread_cond_wait and pthread_mutex_{lock,unlock}, and when the "job_quit_flag" is set to true, that means all jobs are done and I am ready to return NULL.
> guarantee that `errno` is in TLS
I mean, that is already true.
There is no such guarantee in C99:
7.5 ¶2: [...] and `errno` which expands to a modifiable lvalue that has type `int`, the value of which is set to a positive error number by several library functions. It is unspecified whether `errno` is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual object, or a program defines an identifier with the name `errno`, the behavior is undefined.
7.5 ¶3: The value of `errno` is zero at program startup, but is never set to zero by any library function. The value of `errno` may be set to nonzero by a library function call whether or not there is an error, provided the use of `errno` is not documented in the description of the function in this International Standard.
The fact that `errno` can expand to an lvalue does reflect what is required for multithreading implementations among others, but that's about all.
Nor is it in POSIX, but it's true of all POSIX-like systems that support threading.