Python 3.13, what didn't make the headlines

(bitecode.dev)

32 points | by pansa2 2 days ago

11 comments

  • sgarland 2 days ago

    I feel like (by which I mean, in my tests for personal projects) performance peaked with 3.11, and has been going downhill since.

    For example, 3.12 changed the bytecode that list comprehensions use, citing faster performance. In my tests, this is only true with small lists; as soon as you get to a large number of items (~10K), it falls off a cliff. While large lists are probably not the norm, to have that completely ignored is maddening.
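
    A quick way to reproduce this kind of comparison, assuming both interpreters are on PATH (the sizes here are just illustrative):

        $ python3.11 -m timeit -s 'N = 100' '[x * 2 for x in range(N)]'
        $ python3.12 -m timeit -s 'N = 100' '[x * 2 for x in range(N)]'
        $ python3.11 -m timeit -s 'N = 100_000' '[x * 2 for x in range(N)]'
        $ python3.12 -m timeit -s 'N = 100_000' '[x * 2 for x in range(N)]'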

    The 3.13 REPL is nice, I’ll give them that.

    • BiteCode_dev 2 days ago

      The new generic syntax is great as well, the debugging improvements are handy and I do appreciate the new @deprecated. But 3.13 really shines, once again, with the improvement of error messages.

      Now, when you are a professional dev, you see them less and less, because you make fewer mistakes, but also because you heavily use tooling to prevent errors or diagnose them.

      But if you observe beginners with modern versions of Python, it's quite magical, really. The autonomy and confidence they gain from those messages are huge.

      The energy that Pablo Galindo Salgado has injected into the culture of good error messages has been a boon to the language.
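
      A minimal sketch of two of the features mentioned, assuming Python 3.13 for warnings.deprecated (PEP 702) and 3.12+ for the bracketed generic syntax (PEP 695):

          from warnings import deprecated  # new in 3.13 (PEP 702)

          @deprecated("old_stack() is going away; use Stack instead")
          def old_stack() -> list:
              # Calling this emits a DeprecationWarning at runtime;
              # type checkers can also flag usage statically.
              return []

          class Stack[T]:  # PEP 695 generic syntax, no TypeVar boilerplate
              def __init__(self) -> None:
                  self.items: list[T] = []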

    • tomtom1337 a day ago

      I updated the benchmark [1] that I wrote for someone who responded to you to include an even/odd filter and to take the minimum time of 100 runs. Python 3.13 is much faster for small lists, and a tiny bit faster for larger lists. The times shown are per entry iterated in the list comprehension.

          Running on Python 3.12.7 (main, Oct  2 2024, 15:45:55) [Clang 18.1.8 ]
          16.69ns on N=100
          18.84ns on N=1_000
          15.88ns on N=10_000
          14.26ns on N=100_000
          14.70ns on N=1_000_000

          Running on Python 3.13.0 (main, Oct 16 2024, 08:05:40) [Clang 18.1.8 ]
          9.54ns on N=100
          17.88ns on N=1_000
          15.81ns on N=10_000
          14.14ns on N=100_000
          14.46ns on N=1_000_000

      [1] https://gist.github.com/thomasaarholt/2e4d42cbf3f0de60c811bb...

      • sgarland a day ago

        For simple loops, yes, I also found that 3.13 is roughly equivalent to 3.12. Both are much slower than 3.11, though [0] – roughly 6-9% slower for lists, and in one weird edge case, 57% slower for array.array, but only on macOS (I only tested ARM, no idea about x86). On Linux, array.array regressed about the same as lists.

        [0]: https://github.com/python/cpython/issues/123540

        • tomtom1337 a day ago

          Thank you for sharing! I am convinced! Completely agree that this is a problem!

    • sitkack 2 days ago

      Do you have some microbenchmarks for this? It would be a chance to use uv to set up Python 3.11 and Python 3.12 envs side by side.

      https://github.com/astral-sh/uv
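
      For example, something like this (assuming a recent uv; bench.py is a placeholder for whatever script is being measured):

          $ uv python install 3.11 3.12
          $ uv run --python 3.11 bench.py
          $ uv run --python 3.12 bench.py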

      • tomtom1337 2 days ago

        Great minds think alike. Here's one I just cooked up. Python 3.13 is faster, at least for the simplest possible list comprehension with no filters. N is the length of the list being built. Each test is repeated 100 times and the average is shown. Run on a MacBook Air M3.

        https://gist.github.com/thomasaarholt/2e4d42cbf3f0de60c811bb...

          Running on Python 3.12.7 (main, Oct  2 2024, 15:45:55) [Clang 18.1.8 ]
          11.32ns on N=100
          9.42ns on N=1000
          7.32ns on N=10000
          5.23ns on N=100000
          5.87ns on N=1000000
          
          Running on Python 3.13.0 (main, Oct 16 2024, 08:05:40) [Clang 18.1.8 ]
          7.30ns on N=100
          5.91ns on N=1000
          5.09ns on N=10000
          4.31ns on N=100000
          5.89ns on N=1000000
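
        Not the gist's exact code, but a minimal sketch of this kind of per-entry harness (averaging 100 runs, as above):

          import sys
          import time

          def per_entry_ns(n: int, repeats: int = 100) -> float:
              # Average per-entry cost of a bare list comprehension, in ns
              total = 0.0
              for _ in range(repeats):
                  start = time.perf_counter()
                  [x for x in range(n)]
                  total += time.perf_counter() - start
              return total / repeats / n * 1e9

          print(f"Running on Python {sys.version}")
          for n in (100, 1000, 10000, 100000, 1000000):
              print(f"{per_entry_ns(n):.2f}ns on N={n}")
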
  • zahlman 2 days ago

    >Python 3.13 is still not significantly faster. Sorry.

    Unsurprising. Attention to internals has been split by the GILectomy – I guess some people expected automatic gains from that, but that seemed unwarranted to me. It's normal for performance improvements in Python to be incremental (also in terms of memory efficiency – e.g. https://github.com/zahlman/python-dict-stats/), with here and there a big gain for a specific task (e.g. `math.factorial` getting a better algorithm in 3.x – https://stackoverflow.com/questions/9815252).

    There also isn't generally a huge amount of pressure for these kinds of optimizations, given typical Python use cases. People who need performance are often already using a Python binding to code written in another language anyway.

    > Ditto for zipfile.Path, a pathlib-compatible wrapper for traversing zip files you didn't know existed since 3.8. And you didn't know it because it sucked, frankly. But 3.13 brings many QoL patches to it, and improves a lot on how it handles directories, which used to require a lot of manual labor. So I expect it will see more use from now on.

    It'd be nice if the various archive-handling libraries offered more of a common interface, including that sort of wrapper. BTW, Barney Gale (the dev primarily responsible for `pathlib`) is relatively active on https://discuss.python.org and generally a joy to talk with.

    • sgarland a day ago

      > There also isn't generally a huge amount of pressure for these kinds of optimizations, given typical Python use cases. People who need performance are often already using a Python binding to code written in another language anyway.

      You’re of course not wrong, it’s just disappointing to see regressions. A side project of mine started as pure Python and has gradually shifted to Python calling various C functions via ctypes. Before anyone jumps in with “why not numpy” etc., I wanted to see how fast I could make Python using only the stdlib. Calling C code is only kind of cheating, IMO.
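
      A minimal sketch of that ctypes pattern, assuming a libc that ctypes.util.find_library can locate (so not Windows):

          import ctypes
          import ctypes.util

          # Load the C standard library; name resolution is platform-dependent
          libc = ctypes.CDLL(ctypes.util.find_library("c"))

          # Declare the signature so ctypes converts arguments correctly
          libc.strlen.argtypes = [ctypes.c_char_p]
          libc.strlen.restype = ctypes.c_size_t

          print(libc.strlen(b"hello"))  # -> 5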

      That aside, there do remain some surprising ways to gain performance in Python that aren’t usually looked at. For example, its string mini-format language (you know, before f-strings, and before .format()) is significantly faster than either alternative. Not surprising, since it’s essentially just printf. This doesn’t matter at all unless it’s in a hot loop, but most of my usage is, so I’ve spent an extremely long time in cProfile, perf, and memray doing optimizations.

      • zahlman a day ago

        "mini-format language" in the documentation generally refers to the common system of type-specifying letters, width/precision specifiers, etc.

        Anyway, f-strings can be faster since they're "pre-compiled", especially for simple cases:

            $ python -m timeit --setup 'foo="bar"' -- 'f"Is the {foo}tender here?"'
            5000000 loops, best of 5: 61.1 nsec per loop
            $ python -m timeit '"Is the %(foo)stender here?" % {"foo":"bar"}'
            1000000 loops, best of 5: 246 nsec per loop
            $ python -m timeit '"Is the {foo}tender here?".format(foo="bar")'
            1000000 loops, best of 5: 345 nsec per loop
        
        Similarly with positional rather than named substitutions:

            $ python -m timeit 'f"""Is the {"bar"}tender here?"""'
            5000000 loops, best of 5: 60.4 nsec per loop
            $ python -m timeit '"Is the %stender here?" % "bar"'
            2000000 loops, best of 5: 124 nsec per loop
            $ python -m timeit '"Is the {}tender here?".format("bar")'
            2000000 loops, best of 5: 181 nsec per loop

        • sgarland a day ago

          I get ~20% faster times with %-style formatting than with f-strings for this specific use, which is how I discovered it.

              import time
              from datetime import datetime

              def test_with_cstr(epoch: float) -> str:
                  # %-formatting over the first six fields of struct_time
                  return "%04d-%02d-%02d %02d:%02d:%02d" % time.gmtime(epoch)[:6]

              def test_with_fstr(dt: datetime) -> str:
                  # Equivalent f-string version over datetime attributes
                  return f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d} {dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}"
          
          EDIT: bad example, there’s other function overhead here as well.