What Unix pipelines got right and how we can do better

(programmingsimplicity.substack.com)

18 points | by rajiv_abraham 12 hours ago

21 comments

  • userbinator 12 hours ago

    > When cat writes to stdout, it doesn't block waiting for grep to process that data.

    It will certainly do that if the buffer is full.

    > prevents the implicit blocking

    No, that's exactly the case of implicit blocking mentioned above.

    Does anyone else find this article rather AI-ish? The extreme verbosity and repetitiveness, the use of dashes, and "The limitation isn't conceptual—it's syntactic" are notable artifacts.

    • geysersam 12 hours ago

      > Does anyone else find this article rather AI-ish?

      After reading the whole thing, yes! Specifically, it feels incoherent in the way AI text often is. It starts by praising unix pipes for their simple design and the explicit tradeoffs they make, and then proceeds to explain how we could and should make the complete opposite set of tradeoffs.

      • kej 11 hours ago

        That would explain the strangeness of the recent spherical cows article from the same site, as well.

      • 1718627440 11 hours ago

        Also the headings are just sprinkled at intervals and don't really fit the text.

    • Joker_vD 12 hours ago

      If anything, the pre-pipe style of

          prog1 -input input_file -output tmp1_file
          prog2 -input tmp1_file -output tmp2_file && del tmp1_file
          prog3 -input tmp2_file -output tmp1_file && del tmp2_file
          ...
          progN -input tmpX_file -output output_file && del tmpX_file
      
      is more in line with the author's claimed benefits of the pipes than the piped style itself. The process isolation is absolute: they are separated not just in space, but in time as well, entirely!

      • bediger4000 9 hours ago

        File management suddenly becomes an issue. If a stale tmp1_file remains from a previous run and prog1 then fails, you get the "old" output. Pipes avoid file management entirely.

    • 1718627440 12 hours ago

      > It will certainly do that if the buffer is full.

      You can consider that an OS/resource specific limitation, rather than a limitation in the concept.

      • Joker_vD 11 hours ago

        Nah. Having built-in automatic backpressure is one of the most underappreciated things about the UNIX pipes.

        • 1718627440 11 hours ago

          Fully agree. This is still a representation of the available resources.
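The backpressure being discussed can be observed directly. A minimal sketch (the 64 KiB pipe buffer is a common Linux default, not a guarantee; the byte counts are arbitrary): the writer can only run ahead of the reader by the size of the kernel pipe buffer, after which write(2) blocks until the reader drains data.

```shell
# dd pushes 256 KiB into the pipe; the reader deliberately starts slow.
# Every byte still arrives, because the writer is paced by the pipe itself
# rather than by any explicit flow control in either program.
( dd if=/dev/zero bs=1k count=256 2>/dev/null ) | ( sleep 1; wc -c )
```

No code in either process mentions blocking; the pacing falls out of the kernel's buffer size.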

  • quantified 2 months ago

    > This cross-language composition remains remarkably rare in modern development, where we typically force everything into a single language ecosystem and its assumptions.

    I think IPC via HTTP, gRPC, Kafka, files, etc allows language decoupling pretty well. Intra-process communication is primarily single-language, though you can generally call from language X into C-language libs. Cross-process, I don't see where the assertion comes from.

    • lenkite 2 months ago

      Something like Kafka should be part of the core operating system. Its API has been stable for years (decade+?) now.

      • cenamus 12 hours ago

        Isn't dbus pretty much that (not that it's particularly good)

    • all2 12 hours ago

      Wouldn't passing comms through a C ABI still be placing everything into a single language? Or am I conflating communication protocol with 'language'? My parser/combinator/interpreter senses are tingling.

  • AndrewDucker 9 hours ago

    I'm a big fan of how PowerShell passes objects.

    But without a common runtime, the closest you could really get to that in Unix would be to pass JSON or XML about, and give every program a "pipe" mode that accepted that as input.

    Which seems like an awful lot of work and unlikely to get the kind of buy in you'd need to make it work widely.
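The "pipe mode" idea can be sketched with jq standing in for an object-aware filter, the way PowerShell cmdlets filter on object properties (assumes jq is installed; the field names and values are made up):

```shell
# A producer emits one JSON object per line; jq filters on a structured
# field rather than on text columns.
printf '{"name":"a","size":3}\n{"name":"b","size":7}\n' |
  jq -r 'select(.size > 5) | .name'
# prints: b
```

The catch is exactly the buy-in problem above: this only composes if both sides agree on the format.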

  • kazinator 12 hours ago

    Unix pipelines got something right by being syntactic sugar for chaining pure function application. It's easy to get excited when you don't understand this.

    For instance sqrt(sin(cos(theta))) can be notated < theta | cos | sin | sqrt.

    Pipeline syntax implemented in functional languages expands into chained function invocation.

    Everything follows from that: what we know about combining functions applies to pipes.

    > When cat writes to stdout, it doesn't block waiting for grep to process that data.

    That says nothing more than that nested function invocations admit non-strict evaluation strategies. E.g. the argument of a function need not be reduced to a value before it is passed to another, which can proceed with a calculation which depends on that result before obtaining it.

    When you expand the actual data dependencies into a tree, it's easy to see what can be done in parallel.
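The composition reading above can be made concrete in plain shell, with awk's built-in math functions standing in for pure functions (the input value 0.5 is arbitrary):

```shell
# sqrt(sin(cos(x))) written as a pipeline: each stage is a pure
# function applied to the text flowing through it.
echo 0.5 |
  awk '{ print cos($1) }' |
  awk '{ print sin($1) }' |
  awk '{ print sqrt($1) }'
```

Reordering, dropping, or inserting a stage behaves exactly like editing the nested function expression.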

  • nickelpro 12 hours ago

    This looks and reads like AI slop.

    Also viewing Unix pipes as some special class of file descriptor because your Intro to OS professor didn't teach you anything more sophisticated than shell pipe syntax is kinda dumb.

    File descriptor-based IPC has none of the restrictions discussed in this article. They're not restricted to text (and the author does point this out), they're not restricted to linear topologies, they work perfectly fine in parallel environments (I have no idea what this section is talking about), and in Unix-land processes and threads are identically "heavy" (Windows is different).

  • rajiv_abraham 2 months ago

    I find Paul's take on simplicity (and complexity) very illuminating.

  • hnlmorg 12 hours ago

    > The lack of fan-out makes it awkward to express combinations where one sender feeds many receivers. In 1970, avoiding garbage collection was a practical necessity, but today garbage collection is available in most programming workflows and fan-out could be implemented much more easily through message copying rather than consumption.

    Fanout has precisely zero dependency on GC. For example, ‘tee’ has been around for decades and it can copy io streams just fine.

    There has been some effort to build fanout shells too, with a discussion on HN earlier this month of one called dgsh: https://news.ycombinator.com/item?id=45425298

    Edit: I agree with other comments that this feels like AI slop
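The tee-based fan-out mentioned above needs no garbage collection machinery at all; a POSIX sketch using a named pipe to carry the second copy to a second process (file names here are illustrative):

```shell
# One sender, two receivers: tee duplicates the stream, a FIFO
# delivers the copy. The kernel does all the "message copying".
mkfifo p1
tr a-z A-Z < p1 > upper.txt &    # consumer 2 reads the duplicate
printf 'a\nb\nc\n' | tee p1 | wc -l    # consumer 1 gets the original
wait
rm p1
```

The linear `|` syntax can't express this topology, but the underlying file-descriptor machinery handles it fine.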

  • jeffbee 12 hours ago

    "It's limited to unstructured text" requires ignoring ASCII unit and record separators. The people who came up with this stuff weren't dumb.
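A small sketch of those separators in use: ASCII 0x1E (record separator) and 0x1F (unit separator) carry structured records through an ordinary byte pipe, and awk can split on them directly (the sample data is made up):

```shell
# \036 = record separator (0x1E), \037 = unit separator (0x1F).
# awk treats each record as a row and each unit as a field.
printf 'alice\03730\036bob\03725\036' |
  awk 'BEGIN { RS = "\036"; FS = "\037" } NF { print $1 " is " $2 }'
# prints:
# alice is 30
# bob is 25
```

No quoting or escaping headaches, because the separators never appear in ordinary text data.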
