The Pythonic Emptiness

(blog.codingconfessions.com)

36 points | by todsacerdoti 8 months ago

49 comments

int_19h 8 months ago

I don't find any of the arguments here particularly convincing. This claim in particular is weird:

> Similarly, when you use len() to check a sequence for emptiness, you are reaching out for a more powerful tool than necessary. As a result it may make the reader question the intent behind it, e.g. if there is a possibility of handling objects other than a built-in sequence type here?

Given that checking for truthiness is less strict than a length test, by the same token, whenever you use it, you're reaching for an even more powerful tool than necessary. And, if anything, seeing `not items` is what makes me question the intent - did the author mean to also check for None etc here, or are they just assuming that it's never going to be that? And sure, well-written code will have other checks and asserts about it - but when I'm reading your code, I don't know if it's missing an assert like that because you intended it, or because you couldn't be bothered to write it.

OTOH len() is very explicit about what is actually checked, and will fail fast and loudly if the argument is not actually a sequence.

Also note that it's not, strictly speaking, an either-or - you can use `not len(x)` instead of `len(x) == 0` if you want a distinctive pattern specifically for empty collection checks.
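
A minimal sketch of the difference, with helper names (`is_empty_truthy`, `is_empty_len`, `describe`) made up purely for illustration: the truthiness check quietly treats `None` and `0` as empty, while the `len()` check raises `TypeError` for them.

```python
def is_empty_truthy(items):
    """Truthiness check: anything falsy counts as 'empty'."""
    return not items


def is_empty_len(items):
    """Length check: only objects that define __len__ are accepted."""
    return len(items) == 0


def describe(value):
    """Return what each check reports for value, or the error name it raises."""
    results = {}
    for check in (is_empty_truthy, is_empty_len):
        try:
            results[check.__name__] = check(value)
        except TypeError as exc:
            results[check.__name__] = type(exc).__name__
    return results


for value in ([], [1], None, 0):
    print(repr(value), describe(value))
```

Whether the loud failure or the permissive check is preferable depends on what the surrounding code is supposed to accept.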

wenc 8 months ago

For me, len() is much less ambiguous.

Also in Pandas, if you try to check the truth value of a dataframe to see if it’s empty, it will fail with a ValueError saying “The truth value of a DataFrame is ambiguous”.

df.empty is less ambiguous but you have to remember it specifically for dataframes.

But len(df) > 0 almost always works for any type of collection.
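
To illustrate without depending on pandas, here is a sketch using a hypothetical stand-in class, `AmbiguousTable`, that refuses truth testing in the way described for dataframes. It is not pandas code; it only shows why a length check keeps working where `if x:` does not.

```python
class AmbiguousTable:
    """Hypothetical stand-in for a container that refuses truth testing."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __bool__(self):
        # __bool__ takes precedence over __len__ for truth testing.
        raise ValueError("the truth value of an AmbiguousTable is ambiguous")


def has_items(collection):
    """Length-based check that works for any object defining __len__."""
    return len(collection) > 0


table = AmbiguousTable([{"a": 1}])
print(has_items(table))  # the length check works on the stand-in
print(has_items([]))     # and on a plain list

try:
    if table:
        pass
except ValueError as exc:
    print("truth test failed:", exc)
```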

adamc 8 months ago

I don't agree with this at all, and I wonder if it reflects what other languages you use that may have shaped your assumptions. `if mylist` feels very much like Common Lisp to me. In much of the code I've seen, the value would never be None because an empty list was definitely created.

Joker_vD 8 months ago

Ah yes, reusing an empty list NIL as the false value because having a separate #f atom is bad for whatever reason.

kazinator 8 months ago

But it's only one value in the entire system. There is only one nil; there is no other false but nil. And there is no other empty list but nil.

An empty array or string are not false. Zero is not false.

In Lisp dialects with nil, we don't have endless discussions about how to test for false, or for empty, as the case may be.

Joker_vD 8 months ago

> And there is no other empty list but nil.

Okay? That still doesn't justify using it as replacement for #f (which is also a unique value). Or as #t for that matter, there is also only one #t in the entire system.

> An empty array or string are not false. Zero is not false.

And an empty list is also not false, only #f is false. So simple and uniform! Wait, there is no #f, but an empty list is false? Why? Why is empty list so special? Or, alternatively, why are you using the false value as an empty list? The idea to represent a list as

    [1 | [2 | [3 | False]]]

is quite strange. Just spare an atom, it's only what, eight bytes nowadays?

> how to test for false

In sane programming languages, the booleans are self-testing, so to speak, and they are the only values with such property, so:

    if not val then ...

    (if (not val) ...

tests for false.

> or for empty

    if val_specific_is_empty(val) then ...

    (if (null? val) ...

But of course, it only matters for languages that have other options for building composite data types than CONS-ing two items together.

kazinator 8 months ago

No language with completely separate booleans has ever had a fun club.

(Scheme isn't that language, because everything that is not #f is true.)

List processing is cumbersome in Scheme because the empty list isn't false, and because car and cdr cannot be applied to (). (Which, stupidly, isn't even self-evaluating; you have to use '()).

The empty list is special because once upon a time Lisp emphasized list processing more. Complex list processing is still important in metaprogramming, and representations of flexible dynamic data sets.

If you're not working with lists, it utterly doesn't matter that nil also has a role as the empty list. Just like you don't care that your physician also plays saxophone in a jazz band; he's not doing it during your appointment.

adamc 8 months ago

It is very convenient and makes the code terse and that matters.

Joker_vD 8 months ago

May I suggest J (or K) to you then? It has even more conveniences, and allows for much terser code than LISP; even its name is terser! And has even less extraneous syntax: did you know that parentheses can also be dispensed with? It's true!

lihaoyi 8 months ago

Python's truthiness behavior was the trigger for one of my worst ever bugs early in my career, which not only pulled in senior engineering but also marketing/comms and legal to help sort out the mess. Not a fan!

adamc 8 months ago

I think this needs more explanation to know if this is a good argument or not.

FreakLegion 8 months ago

Keep in mind that truthiness comes from __bool__ and is overridable, so separate from Python itself, a lot of library authors have made questionable decisions here. A perennial contender is https://github.com/psf/requests/issues/2002.
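
A short sketch of how an overridden `__bool__` can surprise a caller. `FakeResponse` and both helper functions are hypothetical names written for this example, not the `requests` API; the class simply ties truthiness to a status check, so a perfectly real object becomes falsy.

```python
class FakeResponse:
    """Hypothetical response object whose truthiness depends on its status."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self.body = body

    def __bool__(self):
        # Truthy only for status codes below 400.
        return self.status_code < 400


def body_or_default(response, default="<no response>"):
    """Buggy: meant to guard against None, but also drops error responses."""
    if response:
        return response.body
    return default


def body_or_default_fixed(response, default="<no response>"):
    """An explicit None check keeps error responses visible."""
    if response is not None:
        return response.body
    return default


not_found = FakeResponse(404, "page missing")
print(body_or_default(not_found))
print(body_or_default_fixed(not_found))
```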

aguaviva 8 months ago

You know you're going to need to provide us with a little snippet demonstrating this behavior now, right?

lihaoyi 8 months ago

I went and dug up the original code that caused an issue. Here it is:

https://github.com/python/cpython/blob/v2.7.1/Lib/cgi.py#L60...

Python's std lib `cgi.FieldStorage` object was falsy if it did not define any headers, even if it contained file data.

Thus my conditional trying to check whether a file was being uploaded, `if request.field_storage`, was going through the False branch when files were being uploaded, but only in certain header-free scenarios not covered by automated and manual testing. This resulted in us dropping user data on the floor and losing uploaded files for a very large number of users before we realized and shut it off.

The other sibling post contains another example where people may be confused, and Google pulls up others. But this is the concrete case that caused us to send out a hundred thousand apology emails to affected customers after losing their files.
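
The general failure mode can be sketched with a hypothetical class. `UploadForm` below is not the `cgi.FieldStorage` implementation; it is an assumed stand-in whose `__len__` counts only header fields, so an instance carrying file data but no headers is falsy.

```python
class UploadForm:
    """Hypothetical form object: __len__ counts header fields only."""

    def __init__(self, headers=None, file_data=b""):
        self.headers = dict(headers or {})
        self.file_data = file_data

    def __len__(self):
        # No __bool__ is defined, so truth testing falls back to this.
        return len(self.headers)


def handle_buggy(form):
    """Truthiness check: silently skips header-free uploads."""
    if form:
        return "stored %d bytes" % len(form.file_data)
    return "no upload"


def handle_explicit(form):
    """Checks the thing that actually matters: is there file data?"""
    if form is not None and len(form.file_data) > 0:
        return "stored %d bytes" % len(form.file_data)
    return "no upload"


header_free = UploadForm(file_data=b"important bytes")
print(handle_buggy(header_free))
print(handle_explicit(header_free))
```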

sjsdaiuasgdia 8 months ago

I'm mostly annoyed that 'if len(items) == 0' / 'if len(items) > 0' aren't presented as options.

If we're talking about readability, they're far clearer than either of the options in the article and require no pre-knowledge of truthiness rules.

jasperry 8 months ago

I agree that `len(seq) == 0` is more readable. I don't mind the recommended truth test of the sequence itself, but I have no idea who would use their "wrong" option with the length as a truth value. Or maybe POSIX exit codes (0 is success) have made me shy about using integers as truth values.

warkdarrior 8 months ago

They are presented as options, it's just that their performance sucks. From the article:

    if not mylist:       # 1.061

    if len(mylist) == 0: # 1.924
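
For anyone who wants to reproduce a comparison like this locally, a rough sketch with the standard `timeit` module follows. The sample list and the iteration count are arbitrary choices, and absolute numbers will differ by machine and Python version, so no expected output is shown.

```python
import timeit

mylist = list(range(10))  # arbitrary sample data
number = 200_000          # arbitrary iteration count

# The three spellings discussed in the thread.
candidates = ["not mylist", "len(mylist) == 0", "not len(mylist)"]

timings = {
    stmt: timeit.timeit(stmt, globals={"mylist": mylist}, number=number)
    for stmt in candidates
}

for stmt, seconds in timings.items():
    print(f"{stmt:<18} {seconds:.4f}s")
```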

James_K 8 months ago

I feel like, if you care about performance, using Python at all is a mistake.

tantalor 8 months ago

For most applications, the choice of algorithms and data structures has more impact on performance than programming language.

LegionMammal978 8 months ago

The one big exception to this I've found is working with image data. When you have over a million pixels, you'll have a slow time transforming them unless you can process each one within nanoseconds. Even a compiled language like Rust will struggle with image data if you build it in debug mode. So there's really always going to be a place for optimized implementations of those sorts of things.

chikere232 8 months ago

Then caring about a micro-optimisation over readability would still be the wrong call.

bogwog 8 months ago

When you profile your code and find that all you gotta do is change a few if statements for a 2x perf boost, that will be a happy day.

James_K 8 months ago

Isn't this just a Perl feature (arrays are also their length and zero values are falsy)? I can't help but feel Python is getting closer to Perl as time goes on. Ironic, since their original goal was to be simple and make themselves distinct from Perl. What was the saying again? "There should be one-- and preferably only one --obvious way to do it." Honestly, I think it was all this "Zen" stuff that led Python down the path to weirdness. This article reads like a monk interpreting sacred text. I can think of no good reason for all this malarkey.

jerf 8 months ago

This can't be Python "getting closer" to Perl, since it's been this way in Python since inception, and I'm pretty sure, Perl as well. Both languages have always been exactly where they are now on the topic, with no motion, probably since the beginning, and certainly for multiple decades.

James_K 8 months ago

I'm not particularly well versed in the history of either language. I just know that Python was supposed to be "simple and intuitive", but I've had quite a different experience of it and this has often been down to Perl-like things going on.

jerf 8 months ago

My personal opinion is that Python gave up on "simple and intuitive" a long time ago and people still citing that as a guiding principle of the language really need to sit down and just read the Python documentation again from start to finish, and then ponder for a moment. If they still need a clue, they are invited to do the same thing with, say, Python 2.1, and ponder for a few more moments. It clearly isn't and hasn't been for a while. That is not, on its own, a bad thing necessarily. It just means it isn't a guiding principle anymore, or at the very least, it has moved way down the priority list. Plenty of languages have changed their guiding principles over time.

However, this is not an example of that. "if list:" has been the idiomatic Python since inception, and "if len(list):" has been an unnecessary complication for the same period of time. Python's "preferably one way to do it" has never been about "there is literally only one syntactically valid way to do it", for fairly obvious reasons if you think about it.

fphhotchips 8 months ago

Certainly "there should be one, and preferably only one, obvious way to solve a problem" hasn't been the case for a while, or maybe ever. See: tfa.

Perhaps it's just because I'm not Dutch.

James_K 8 months ago

> "if list:" has been the idiomatic Python since inception

I would argue the reverse. It was a bad idea to begin with and the start of something worse.

> Plenty of languages have changed their guiding principles over time.

What would you say is the guiding principle of Python now then?

jerf 8 months ago

"Idiomatic" doesn't mean "good". It means the normal way within the language. Personally after using both languages like Python with "truthiness" and languages that rigidly require all if clauses to evaluate to a boolean, rigidly and directly, I say the latter is unambiguously superior in practice. The former leads to surprises, sometimes even creating security vulnerabilities when a user can wedge an unexpected value into an if statement somehow.

It has unambiguously been idiomatic Python since the beginning. My opinion is that it is bad, but that's much more an opinion than the fact it has been idiomatic. And to Python's credit, part of the reason why I am so sure it's bad is precisely the experience I gained in Python using it. At the time Python was implementing the principle, I don't think the general experience of the programming language community was strong enough to know that it was a bad idea.

Python's guiding principle right now seems to be the same guiding principle as almost every other language, "let's solve as many problems as possible by adding features to the language". If a year goes by without at least one major new feature, the language must be "dead" or "failing". As the years wear on and so many languages have piled up so many features, I wonder when people will finally look around and realize that all these features, for all their superficial appeal in the small scale, are not generally helping them write better programs, or write programs they couldn't have written before, and often harm their programs on larger scales. There are exceptions. I have a hard time imagining any modern language without some concept of closures, for instance. I could name a few more; some sort of easy polymorphism (there's a few ways to get there but you need to take at least one of them... but preferably not all of them...), some sort of concurrency solution in this era, solutions for memory safety (again, multiple solutions, but you need at least one of them). But so many of these features are, in my opinion, not a net positive, their benefits far smaller than meets the eye and the costs so much greater.

Joker_vD 8 months ago

Well, lists in Python are not their length. They are convertible to a boolean, however: "if" tests any object for "truthiness" by calling its __bool__() method, falling back to __len__() when __bool__() is not defined (which is the case for lists).
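
One detail that is easy to check: truth testing uses `__bool__` when a class defines it and otherwise falls back to `__len__`, and in CPython the built-in `list` only provides the latter. A small sketch with two hypothetical classes (`OnlyLen`, `LenAndBool`) shows the lookup order:

```python
class OnlyLen:
    """Defines __len__ but not __bool__."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size


class LenAndBool(OnlyLen):
    """Defines both; __bool__ wins for truth testing."""

    def __bool__(self):
        return True


print(bool(OnlyLen(0)), bool(OnlyLen(3)))  # follows __len__
print(bool(LenAndBool(0)))                 # follows __bool__
print(hasattr(list, "__bool__"), hasattr(list, "__len__"))
```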

mulmboy 8 months ago

`if x: foo()` is a cancer on the Python community. Devs often use it with the intention of handling x being None, and carelessly lump in zero and empty lists/strings at the same time. Endless bugs.
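
A minimal sketch of that bug pattern, using made-up names (`DEFAULT_LIMIT`, `page_size_buggy`, `page_size_explicit`): the intent is "use the default when nothing was passed", but the truthiness check also discards a legitimate zero.

```python
DEFAULT_LIMIT = 50  # hypothetical default, chosen for illustration


def page_size_buggy(limit=None):
    """Intends to handle None, but also treats 0 as 'not provided'."""
    if limit:
        return limit
    return DEFAULT_LIMIT


def page_size_explicit(limit=None):
    """Only None means 'not provided'; 0 is passed through."""
    if limit is not None:
        return limit
    return DEFAULT_LIMIT


for limit in (None, 0, 10):
    print(limit, page_size_buggy(limit), page_size_explicit(limit))
```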

smetj 8 months ago

Yup!

FridgeSeal 8 months ago

Python’s “truthiness” is a cutesy feature that is just an excuse for bugs in your code. It’s opaque/ too magic, exhibits poor readability and endless confusion.

Just use a normal check, like everyone is expecting to see.

“Oh but what if it’s not a sequence”, well then you have bigger problems. Why are you emptiness testing something that may-or-may-not-be-a-sequence? Maybe solve that problem first.

Chris_Newton 8 months ago

> Python’s “truthiness” is a cutesy feature that is just an excuse for bugs in your code. It’s opaque/ too magic, exhibits poor readability and endless confusion.

Indeed. Relying on truthiness has always felt very un-Pythonic to me, not least because it contradicts several principles in the Zen of Python:

• Explicit is better than implicit.

• Special cases aren’t special enough to break the rules.

• There should be one — and preferably only one — obvious way to do it.

morkalork 8 months ago

Holy moly, that meme about type checkers and variable names, someone is arguing for Hungarian notation in 2024?!

pseudalopex 8 months ago

Someone argued user_list, user_count, and has_users are clearer than users, users, and users. Will you argue the opposite?

The original Hungarian notation was a less readable implementation of the same idea. The Hungarian notation most people hated replaced functional types like count and index with data types like unsigned long. And used them everywhere.

wodenokoto 8 months ago

The idea that `if len(items):` indicates that you expect other things than sequences seems backwards to me.

It’s the duck-typing nature of classic Python that leads the community to recommend the broader `if items:`, which allows for numbers and such.

tromp 8 months ago

I would prefer a standard list method/function to test for emptiness, which would be both readable and efficient.
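
Lists have no such method today; a hypothetical helper along these lines (the name `is_empty` is an assumption for illustration, not an existing built-in) would at least cover the readability half:

```python
from collections.abc import Sized


def is_empty(collection):
    """Return True if collection has length zero; reject objects without a length."""
    if not isinstance(collection, Sized):
        raise TypeError(f"{type(collection).__name__} has no length")
    return len(collection) == 0


print(is_empty([]))
print(is_empty("abc"))
```

Unlike `not items`, this sketch rejects `None` and numbers instead of treating them as empty.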

chikere232 8 months ago

It won't be efficient as it's python, but it could be readable

JackSlateur 8 months ago

tldr: do not use 'if len(list) == 0', use 'if not list' !

You just have to not write any bug in your code. Also, use type checking everywhere. And rewrite your mind, too.

The benefits are worth it .. ! Oh well;

bogwog 8 months ago

> Also, use type checking everywhere

I can't think of anything more Pythonic than that!

082349872349872 8 months ago

[flagged]

8 months ago

[deleted]

ReflectedImage 8 months ago

Use the simplest syntax and check the code works via unit testing like you should be doing. Don't statically type your code as it increases the bugs by a factor of 3x typically.

wiml 8 months ago

How does statically typing your code "increase[] your bugs by a factor of 3x" when it has no runtime effect? What orifice are you getting the 3x from?

ReflectedImage 8 months ago

Typically, software developers write 3x the number of lines of code when using static typing compared to duck typing. It's just the nature of the static typing code style. Write 3x the lines, get 3x the bugs.

The programming language that has the fewest measured bugs in practice is Clojure because it is duck typed and because it doesn't use OOP. Both static typing and OOP have a significant measurable negative effect on code correctness.

stnmtn 8 months ago

Clojure has the least measured bugs in practice? Where does that statistic come from?

xen0 8 months ago

I too can invent numbers.

My invented numbers say there's no meaningful difference in line counts between statically typed and dynamically typed languages.

So how will we prove any of us wrong?

8 months ago

[deleted]