I don't really have a dog in this race as I don't use Python much, but this sort of thing always seemed to be of questionable utility to me.
Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible, so high performance "python" is actually going to always rely on restricted subsets of the language that don't actually match the language's "real" semantics.
On the other hand, a lot of these changes to try and speed up the base language are going to be highly disruptive. E.g. disabling the GIL will break tonnes of code, lots of compilation projects involve changes to the ABI, etc.
I guess getting loops in Python to run 5-10x faster will still save some people time, but it's also never going to be a replacement for the zoo of specialized python-like compilers because it'll never get to actual high performance territory, and it's not clear that it's worth all the ecosystem churn it might cause.
There was a discussion the other day about how Python devs apparently don't care enough for backwards compatibility. I pointed out that I've often gotten Python 2 code running on Python 3 by just changing print to print().
But then a few hours later, I tried running a very small project I wrote last year and it turned out that a bunch of my dependencies had changed their APIs. I've had similar (and much worse) experiences trying to get older code with dependencies running.
My point with this comment is that if the average developer's reality is that backwards compatibility isn't really a thing anyway, then we are already paying for that downside, so we might as well get some upside out of it.
It's hard to comment on this without knowing more about the dependencies and when/how they changed their APIs. I would say if it was a major version change, that isn't too shocking. For a minor version change, it should be.
Stuff that is actually included with Python tends to be more stable than random PyPI packages, though.
NPM packages also sometimes change. That's the world.
The big difference is that npm will automatically (since 2017) save a version range to the project metadata, and will automatically create this metadata file if it doesn't exist. Same for other package managers in the Node world.
I just installed Python 3.13 with pip 24.2, created a venv and installed a package - and nothing, no file was created and nothing was saved. Even if I touch requirements.txt and pyproject.toml, pip doesn't save anything about the package.
This creates a massive gap in usability of projects by people not very familiar with the languages. Node-based projects sometimes have issues because dependencies changed without respecting semver, but Python projects often can't be installed and you have no idea why without spending lots of time looking through versions.
Of course there are other package managers for Python that do this better, but pip is still the de-facto default and is often used in tutorials for new developers. Hopefully uv can improve things!
> Of course there are other package managers for Python that do this better
I think if you are comparing with what NPM does, then you'd have to say that native pip can do that too. It is just one command:
`pip freeze > requirements.txt`
It does include everything in the venv (or in your environment in general), but if you stick to only adding required things (one venv per project) then you will get usable requirements.txt files.
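For what it's worth, here is a tiny sketch of the difference being discussed (package names and versions are illustrative, not real output from any project):

```text
# requirements.txt as written by `pip freeze` inside the project's venv:
# exact pins, one line per installed package (versions here are made up)
requests==2.32.3
numpy==2.1.2

# versus a hand-maintained file with looser, npm-style ranges:
# requests>=2.31,<3
# numpy>=2.0,<3
```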
Yeah, I guess I should have done a pip freeze to specify the versions in the requirements file. I wasn't thinking ahead.
Turns out one dependency had 3 major releases in the span of a year! (Which basically confirms what I was saying, though I don't know how typical that is.)
Pinning deps is discouraged by years of Python practice. And going back to an old project and finding versions that work, a year or more later, might be nigh on impossible.
Last week I was trying to install snakemake via Conda and couldn't find any way to satisfy the dependencies at all, so it's not just PyPI; pip tends to be one of the more forgiving dependency managers.
It's not just Python, trying to get npm to load the requirements has stopped me from compiling about half of the projects I've tried to build (which is not a ton of projects). And CRAN in the R universe can have similar problems as projects age.
> Pinning deps is discouraged by years of Python practice.
I'm not sure it is discouraged so much as just not what people did in Python-land for a long time. It's obviously the right thing to do, it's totally doable, it's just inertia and habit that might mean it isn't done.
It took me a few days to get some old Jupyter Notebooks working. I had to find the correct older version of Jupyter, the correct version of every plugin/extension that notebook used, and then the correct version of every dependency of those extensions. The only way to get it working was a bunch of pinned dependencies.
It's not about finding old packages, it's about not finding the magical compatible set of package versions.
Pip is nice in that you can install packages individually to get around some version conflicts. But with conda and npm and CRAN I have always found myself stuck, unable to install dependencies, after 15 minutes of mucking around.
It's rare that somebody has left the equivalent of the output of a `pip freeze` around to document their state.
With snakemake, I abandoned conda and went with pip in a venv, without filing an issue. Perhaps it was user error from being unfamiliar with conda, but I did not have more time to spend on the issue, much less doing the research to be able to file a competent issue and follow up later on.
What APIs were broken? They couldn't be in the standard library.
If the dependency was in external modules and you didn't have pinned versions, then it is to be expected (in almost any active language) that some APIs will break.
> This migration took the industry years because it was not that simple.
It was not that simple, but it was not that hard either.
It took the industry years because Python 2.7 was still good enough, and the tangible benefits of migrating to Python 3 didn't justify the effort for most projects.
Also some dependencies such as MySQL-python never updated to Python 3, which was also an issue for projects with many dependencies.
The Python 2 to 3 thing was worse when they started: people who made the mistake of falling for the rhetoric to port to python3 early on had a much more difficult time as basic things like u"" were broken under an argument that they weren't needed anymore; over time the porting process got better as they acquiesced and unified the two languages a bit.
I thereby kind of feel like this might have happened in the other direction: a ton of developers seem to have become demoralized by python3 and threw up their hands in defeat of "backwards compatibility isn't going to happen anyway", and now we live in a world with frozen dependencies running in virtual environments tied to specific copies of Python.
> these two things have absolutely nothing to do with each other - couldn't be a more apples to oranges comparison if you tried
I ran into both of these things in the same context, which is "the difficulty involved in getting old code working on the latest Python environment", which I understood to be the context of this discussion.
> Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible
The scientific computing community has a bunch of code calling numpy and the like. It's pretty fast because, well, numpy isn't written in Python. However, there is a scalability issue: they can only drive so many threads (not 1, but not many) in a process, due to the GIL.
Okay, you may ask, why not just use a lot of processes and message passing? That's how people have historically worked around the GIL. However, you then either swallow the cost of serializing data over and over again (pickle is quite slow, and even when it isn't, it wastes precious memory bandwidth), or do a very complicated dance with shared memory.
It's not for web app bois, who may just write TypeScript.
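To make the "complicated dance with shared memory" concrete, here is a minimal sketch (mine, not the commenter's) using multiprocessing.shared_memory so a NumPy array is shared between processes rather than pickled on every hand-off:

```python
# Sketch of the shared-memory workaround for the GIL: the parent creates a
# block of shared memory, the child attaches to it by name, and no array
# data is pickled between processes. Assumes NumPy is installed.
import numpy as np
from multiprocessing import Process, shared_memory

SHAPE, DTYPE = (1_000_000,), np.float64

def worker(shm_name):
    # Re-attach to the existing block and view it as an ndarray (zero copy).
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
    arr *= 2.0                     # work happens in place, visible to the parent
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=SHAPE[0] * 8)
    arr = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
    arr[:] = 1.0
    p = Process(target=worker, args=(shm.name,))
    p.start(); p.join()
    print(arr[:3])                 # [2. 2. 2.] -- no pickling of the array itself
    shm.close(); shm.unlink()
```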
This is misleading. Most of the compute intensive work in Numpy releases the GIL, and you can use traditional multithreading. That is the case for many other compute intensive compiled extensions as well.
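A small sketch of what this comment describes, assuming NumPy is installed: the big BLAS-backed call runs in C with the GIL released, so plain threads can overlap it even on a stock (GIL-enabled) interpreter. Sizes and worker counts below are arbitrary.

```python
# Plain threads overlapping NumPy work: np.dot on large arrays releases the
# GIL inside the C/BLAS call, so a ThreadPoolExecutor gets real parallelism
# here even without free-threading.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

def matmul(_):
    return np.dot(a, b)            # GIL is released while the BLAS call runs

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(matmul, range(4)))
print(len(results), results[0].shape)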
It's an Amdahl's law sort of thing: you can extract some of the parallelism with scikit-learn, but what's left is serialized. Particularly for those interactive jobs where you might write plain ordinary Python snippets that could get a 12x speedup (string parsing for a small 'data lake').
Insofar as it is all threaded, for C and Python alike, you can parallelize it all with one paradigm that also makes for a mean dynamic web server.
Numpy is not fast enough for actually performance-sensitive scientific computing. Yes, threading can help, but at the end of the day the single-threaded perf isn't where it needs to be, and it's held back too much by the Python glue between Numpy calls. This makes interprocedural optimizations impossible.
Accelerated sub-languages like Numba, Jax, PyTorch, etc., or just whole new languages, are really the only way forward here unless massive semantic changes are made to Python.
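For readers who haven't used one of these, a minimal sketch of the Numba flavour, assuming the numba and numpy packages are installed; inside the decorated function you are effectively in a restricted, compilable subset of Python, which is exactly the trade-off being described:

```python
# A loop compiled by Numba's @njit: nopython mode only accepts a typed
# subset of Python, in exchange for machine-code speed.
import numpy as np
from numba import njit

@njit
def pairwise_sum(x):
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

x = np.random.rand(1_000_000)
print(pairwise_sum(x))   # first call compiles; later calls run as machine code
```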
These "accelerated sub-languages" are still driven by, well, Python glue. That's why we need free-threading and faster Python. We want the glue to be faster because it's currently the most accessible glue to the community.
In fact, Sam, the man behind free-threading, works on PyTorch. From my understanding he decided to explore nogil because the GIL is holding back DL training written in PyTorch. Namely, the PyTorch DataLoader code itself and almost all data-loading pipelines in real training codebases are a hopeless bloody mess just because of all the IPC/SHM nonsense.
> so high performance "python" is actually going to always rely on restricted subsets of the language that don't actually match the language's "real" semantics.
I don't even understand what this means. If I write `def foo(x):` versus `def foo(x: int) -> float:`, one is a restricted subset of the other, but both are the language's "real" semantics. Restricted subsets of languages are wildly popular in programming languages, and for very varied reasons. Why should that be a barrier here?
Personally, if I have to annotate some of my code so that it runs with C-style semantics, but in return that part runs at C speed, for example, then I just don't really mind it. Different tools for different jobs.
Why does python have to be slow? Improvements over the last few releases have made it quite a bit faster. So that kind of counters that a bit. Apparently it didn't need to be quite as slow all along. Other languages can be fast. So, why not python?
I think with the GIL some people are overreacting: most python code is single threaded because of the GIL. So removing it doesn't actually break anything. The GIL was just making the use of threads kind of pointless. Removing it and making a lot of code thread safe benefits people who do want to use threads.
It's very simple. Either you didn't care about performance anyway, and nothing really changes for you. You'd need to add threading to your project to see any changes; unless you do that, there's no practical reason for you to disable the GIL, or to re-enable it once disabled becomes the default. If your Python project doesn't spawn threads now, it won't matter to you either way. Your code won't have deadlocking threads because it has only one thread, and there was never anything for the GIL to do anyway. For code like that, compatibility issues should be fairly minimal.
Or your project does use threads, against the most popular advice that this is fairly pointless in Python (because of the GIL), in which case you might see some benefits and you might have to deal with some threading issues.
I don't see why a lot of packages would break. At best some of them would be not thread safe and it's probably a good idea to mark the ones that are thread safe as such in some way. Some nice package management challenge there. And probably you'd want to know which packages you can safely use.
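To make the thread-safety point concrete, a minimal sketch (not from the comment above): shared mutable state needs a lock whether or not the GIL exists, because `+=` on a shared variable is several bytecodes and is not atomic; free-threading mainly makes such races easier to hit, and removes the protection C extensions implicitly relied on.

```python
# A shared counter updated from several threads. Without the lock the final
# count can come up short, GIL or no GIL; with it the result is deterministic.
import threading

n = 0
lock = threading.Lock()

def bump(times):
    global n
    for _ in range(times):
        with lock:                 # remove this and updates can be lost
            n += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(n)                           # 400000 with the lock
```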
> Why does python have to be slow?
Because the language's semantics promise that a bunch of insane stuff can happen at any time during the running of a program, including but not limited to the fields of classes changing at any time. Furthermore, they promise that integers are arbitrary precision, which makes operations on them fundamentally slower than on fixed-precision machine integers, and so on.
The list of stuff like this goes on and on. You fundamentally just cannot compile most Python programs to efficient machine code without making (sometimes subtle) changes to their semantics.
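A tiny illustration of both points, for readers who haven't hit them first-hand:

```python
# Two of the semantic promises that block ahead-of-time optimization:
# classes are mutable at runtime, and ints are arbitrary precision.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def norm2(self):
        return self.x * self.x + self.y * self.y

p = Point(3, 4)
print(p.norm2())          # 25

# Any code, anywhere, may rebind the method later...
Point.norm2 = lambda self: 0
print(p.norm2())          # 0 -- so calls can't simply be inlined or devirtualized

# ...and integer arithmetic can't be lowered to fixed-width machine ops:
print(2 ** 100 + 1)       # 1267650600228229401496703205377
```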
_________
> I don't see why a lot of packages would break. At best some of them would be not thread safe and it's probably a good idea to mark the ones that are thread safe as such in some way. Some nice package management challenge there. And probably you'd want to know which packages you can safely use.
They're not thread safe because it was semantically guaranteed to them that it was okay to write code that's not thread safe.
There are different definitions of slow, though. You might want arbitrary precision numbers but still want them to be reasonably fast in that context.
I don't agree that it is "insane stuff", but I agree that Python is not where you go if you need super fast execution. It can be a great solution for "hack together something in a day that is correct, but maybe not fast", though. There are a lot of situations where that is, by far, the preferred solution.
There are ways to design languages to be dynamic while still being friendly to optimizing compilers. Typically what you want to do is promise that various things are dynamic, but then static within a single compilation context.
Julia is a great example of a highly dynamic language which is still able to compile complicated programs to C-equivalent machine code. An older (and less performant, but still quite fast) example of such a language is Common Lisp.
Python makes certain choices though that make this stuff pretty much impossible.
Not disputing it, but people don't pick Python because they need the fastest language, they pick it for friendly syntax and extensive and well-supported libraries. I loved Lisp, but none of the lisps have anything like Python's ecology. Julia, even less so.
People don't pick languages for language features, mostly. They pick them for their ecosystems -- the quality of libraries, compiler/runtime support, the network of humans you can ask questions of, etc.
> loved Lisp, but none of the lisps have anything like Python's ecology. Julia, even less so.
None of the lisps have anything close to Julia's ecology in numerical computing, at least. Can't really speak to other niches though.
> People don't pick languages for language features, mostly. They pick them for their ecosystems -- the quality of libraries, compiler/runtime support, the network of humans you can ask questions of, etc.
Sure. And that's why Python is both popular and slow.
Common Lisp is probably not a good point of comparison. It offers comparable (if not more) dynamism to Python and still remains fast (for most implementations). You can redefine class definitions and function definitions on the fly in a Common Lisp program and other than the obvious overhead of invoking those things the whole system remains fast.
Common Lisp is in fact a good point of comparison once you look at how it's fast. The trick with Common Lisp is that they made a foundation of stuff that can actually be optimized pretty well by a compiler, and made that stuff exempt from being changed on the fly (or in some cases, just made the compiler assume it won't change on the fly even if it does, resulting in segfaults unless you recompile code and regenerate data after changing things).
This is how Common Lisp people can claim that the language is both performant and flexible. The performant parts and the flexible parts are more disjoint than one might expect based on the way people talk about it.
But anyways, Common Lisp does manage to give a high degree of dynamism and performance to a point that it surely can be used for any of the dynamic stuff you'd want to do in Python, while also giving the possibility of writing high performance code.
Python did not do this, and so it'll be impossible for it to offer something like Common Lisp perf without breaking changes, or without introducing a whole new set of alternatives to slow builtins like class, int, call, etc.
> Because the language's semantics promise that a bunch of insane stuff can happen at any time during the running of a program, including but not limited to the fields of classes changing at any time.
You originally claimed Python is slow because of its semantics and then compare later to CL. CL has a very similar degree of dynamism and remains fast. That's what I'm saying makes for a poor comparison.
CL is a demonstration that Python, contrary to your initial claim, doesn't have to forfeit dynamism to become fast.
> CL has a very similar degree of dynamism and remains fast.
But the dynamic parts don't remain "really" fast. Common Lisp introduced, very early on, a lot of features to support optimizing compilers -> some of those reduce "dynamism": code inlining (-> inline declarations), file compiler semantics, type declarations, optimization qualities (speed, compilation-speed, space, safety, debug, ...), stack allocation, tail call optimization, type inferencing, ...
I think you're missing the point. Common Lisp is very dynamic yes, but it was designed in a very careful way to make sure that dynamism does not make an optimizing compiler impossible. That is not the case for Python.
Not all dynamism is the same, even if the end result can feel the same. Python has a particularly difficult brand of dynamism to deal with.
> You can redefine class definitions and function definitions on the fly in a Common Lisp program and other than the obvious overhead of invoking those things the whole system remains fast.
You can also treat Julia as C and recompile vtables on the fly.
> On the other hand, a lot of these changes to try and speed up the base language are going to be highly disruptive. E.g. disabling the GIL will break tonnes of code, lots of compilation projects involve changes to the ABI, etc.
Kind of related, the other day I was cursing like a sailor because I was having issues with some code I wrote that uses StrEnum not working with older versions of Python, and wondering why I did that, and trying to find the combination of packages that would work for the version of Python I needed-- wondering why there was so much goddamn churn in this stupid [expletive] scripting language.
But then I took a step back and realized that, actually, I should be glad about the churn because it means that there is a community of developers who care enough about the language to add new features and maintain this language so that I can just pipe PyQt and Numpy into each other and get paid.
I don't have any argument, just trying to give an optimistic perspective.
At least bugfix versions could have kept Enum behavior the same, postponing breaking changes until the next minor version. Some Enum features work differently (incompatible) in Python 3.11.x versions.
> Some Enum features work differently (incompatible) in Python 3.11.x versions.
I wasn't aware of that, that's actually insane. It's odd to me that it took so long to get f-strings and Enums right in Python, I assumed those would be pretty easy language features to implement.
If JavaScript (V8) and PyPy can be fast, then CPython can be fast too.
It's just that the CPython developers and much of the Python community sat on their hands for 15 years and said stuff like "performance isn't a primary goal" and "speed doesn't really matter since most workloads are IO-bound anyway".
In this context, V8 and PyPy aren't fast. Or at least, not generally; they may actually do well on this kind of task, because pure numeric tasks are about the only things where they can sometimes, as long as you don't mess them up, reach compiled-language-like performance. But in general they don't reach compiled-language performance, despite common belief to the contrary.
Let's make this more concrete, because assigning speed to languages is a fool's errand. Python is doing a lot more per line of code than compiled languages in order to enable its very flexible semantics. In cases where this flexibility is desired, you won't see much more performance from a compiled language, as you'll have just implemented Python-like semantics on top of your compiled language; GObject is a good example of this. More famously, this is Greenspun's tenth rule.
> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
But where this flexibility isn't required, which covers a lot of performance-sensitive number-crunching code, the cost of the flexibility bites you. You can't "turn it off" when you want control down to the instruction for a truly massive performance win. Which is why I think the model Python has, of a highly expressive and flexible language backed by high-performance compiled libraries, is so successful.
Python will never be number crunching or parsing with the best of them, because that would require essentially a whole new language to express the low-level constraints; but for high-level code that relies on Python's semantics, you can get performance wins that can't be had just by switching to a compiled language. We've just taken the "embedded scripting language" and made it the primary interface.
This is a good question, and I think about it as well. My best guess for a simple explanation: Python is very popular; it makes sense to improve performance for python users, given many do not wish to learn to use a more performant language, or to use a more performant Python implementation. Becoming proficient in a range of tools so you can use the right one for the right job is high enough friction that it is not the path chosen by many.
You should really add that Python is also a very good tool for people who know more performant languages. I think one of the sides which often gets forgotten is that a lot of software will never actually need to be very performant and often you’re not going to know the bottlenecks beforehand.
If you even get to the bottlenecks it means you’ve succeeded enough to get to the bottlenecks. Somewhere you might not have gotten if you over engineered things before you needed it.
What makes Python brilliant is that it’s easy to deliver on business needs. It’s easy to include people who aren’t actually software engineers but can write Python to do their stuff. It’s easy to make that Wild West code sane. Most importantly, however, it’s extremely easy to replace parts of your Python code with something like C (or Zig).
So even if you know performant languages, you can still use Python for most things and then as glue for heavy computation.
Now I may have made it sound like I think Python is brilliant so I’d like to add that I actually think it’s absolute trash. Loveable trash.
> it’s extremely easy to replace parts of your Python code with something like C
I tend to use C++, so I use SWIG [1] to generate Python code that interfaces with C++ (or C). You can nearly just give it a header file, and a Python class pops out, with native types and interfaces. It's really magical.
Oh yeah, I totally get the motivation behind it. It's always very tempting to want to make things faster. But I can't help but wondering if these attempts to make it faster might end up just making it worse.
On the other hand though, Python is so big and there's so many corps using it with so much cash that maybe they can get away with just breaking shit every few releases and people will just go adapt packages to the changes.
That was, in many ways, a crazy difficult transition. I don't think most languages have gone through such a thing. Perl tried and died. So I don't agree that it reflects poorly on the community; I think the plan itself was too ambitious.
Many languages have. There were significant breaks in C++ when the std::string ABI changed, Swift has had major language changes, and Rust has editions.
The difference is in what motivates getting to the other end of that transition bump and how big the bump is. That’s why it took till past 2.7’s EOL to actually get people on to 3 in a big way because they’d drag their feet if they don’t see a big enough change.
Compiled languages have it easier because they don’t need to mix source between dependencies, they just have to be ABI compatible.
Python's community was significantly smaller and less flush with cash during the 2-to-3 transition. Since then there have been numerous 3.x releases with breaking changes, and people seem to have been quietly sucking it up and dealing with it so far.
The main thing is that unlike the 2 to 3 transition, they're not breaking syntax (for the most part?), which everyone experiences and has an opinion on, they're breaking rather deep down things that for the most part only the big packages rely on so most users don't experience it much at all.
The Python community consisted of tons of developers including very wealthy companies. At what point in the last few years would you even say they became “rich enough” to do the migration? Because people are STILL talking about trying to fork 2.7 into a 2.8.
I also disagree with your assertion that 3.x releases have significant breaking changes. Could you point to any specific major breaking changes between 3.x releases?
2 to 3 didn’t break syntax for most code either. It largely cleaned house on sensible API defaults.
> Could you point to any specific major breaking changes between 3.x releases?
I can not, but I can tell you that anything AI-related often requires finding the proper combination of Python + cuXXX + some library. And while I understand the CUDA implications, for some reason the Python version is also part of this formula.
I literally have four python versions installed and removed from PATH, because if I delete 3.9-3.11, they will be needed next day again and there’s no meaningful default.
If these were just ABI changes, packagers would simply re-package under a new ABI. Instead they specify ranges of versions in which "it works". The upper end often doesn't include the last python version and may be specified as "up to 3.x.y" even.
Sure, I'm not that knowledgeable on this topic (in Python). But you're telling me they go to the lengths of supporting e.g. 3.9-3.11.2, but out of laziness won't just compile it for 3.12?
I can only hypothesize that 3.9-3.xxx had the same ABI and they don't support multiple ABIs out of principle, but that sounds like a very strange idea.
Fair enough. You may be totally right here, as I mentioned I don't use Python much at all since like 2017 and haven't paid it much attention in a while. I retract my comment.
Regarding breakage in 3.x, all I know is that I recall several times where I did a linux system update (rolling release), and that updated my Python to a newly released version which broke various things in my system. I'm pretty sure one of these was v3.10, but I forget which others caused me problems which I could only solve by pinning Python to an older release.
It's entirely possible though that no actual APIs were broken and that this was just accidental bugs in the release, or the packages were being naughty and relying on internals they shouldn't have relied on, or something else.
Ah I see. Yeah, I guess I just think that for a language that's so dependent on FFI, instability in the ABI is de facto instability in the language, as far as I'm concerned. But I understand that not everyone feels the same.
Almost every major noteworthy Python package uses the ABI, so instability there is going to constantly be felt ecosystem wide.
2 to 3 broke lots of code. Print became a function. Imports moved around. And there were subtle changes in the semantics of some things. Famously, strings changed, and that definitely affected a lot of packages.
Quite a bit of that could be fixed by automated tooling, but not all of it, and the testing burden was huge, which meant a lot of smaller packages did not convert very quickly and there were ripple effects.
Most of those are old, long-deprecated things, and in general they are all straight-up improvements. Python is not my main thing, so I'm not really the best person to answer this, but I listed a few below that I am sure triggered errors in some code bases (I'm not saying they are all major). Python's philosophy makes most of these pretty easy to handle: for example, instead of foo you now have to be explicit and choose either foo_bar or foo_baz. By contrast, in C there is still a completely bonkers function 'gets', which has been deprecated for a long time and will probably be there for a long time to come. The C standard library, the Windows C API and the Linux C API are to a large extent add-only, because things should stay bug-for-bug compatible. Python is not like that. This has its perks, but it may cause your old Python code to just not run. It may be easy to modify, but easy is significantly harder than nothing at all.
> Hash randomization is enabled by default. Set the PYTHONHASHSEED environment variable to 0 to disable hash randomization. See also the object.__hash__() method.
> The deprecated urllib.request.Request getter and setter methods add_data, has_data, get_data, get_type, get_host, get_selector, set_proxy, get_origin_req_host, and is_unverifiable have been removed (use direct attribute access instead).
> All optional arguments of the dump(), dumps(), load() and loads() functions and JSONEncoder and JSONDecoder class constructors in the json module are now keyword-only. (Contributed by Serhiy Storchaka in bpo-18726.)
> The function time.clock() has been removed, after having been deprecated since Python 3.3: use time.perf_counter() or time.process_time() instead, depending on your requirements, to have well-defined behavior. (Contributed by Matthias Bussonnier in bpo-36895.)
> array.array: tostring() and fromstring() methods have been removed. They were aliases to tobytes() and frombytes(), deprecated since Python 3.2. (Contributed by Victor Stinner in bpo-38916.)
> Methods getchildren() and getiterator() of classes ElementTree and Element in the ElementTree module have been removed. They were deprecated in Python 3.2. Use iter(x) or list(x) instead of x.getchildren() and x.iter() or list(x.iter()) instead of x.getiterator(). (Contributed by Serhiy Storchaka in bpo-36543.)
> The encoding parameter of json.loads() has been removed. As of Python 3.1, it was deprecated and ignored; using it has emitted a DeprecationWarning since Python 3.8. (Contributed by Inada Naoki in bpo-39377)
> The asyncio.Task.current_task() and asyncio.Task.all_tasks() have been removed. They were deprecated since Python 3.7 and you can use asyncio.current_task() and asyncio.all_tasks() instead. (Contributed by Rémi Lapeyre in bpo-40967)
> The unescape() method in the html.parser.HTMLParser class has been removed (it was deprecated since Python 3.4). html.unescape() should be used for converting character references to the corresponding unicode characters.
Thanks. That’s a good list, though I think the majority of the changes were from deprecations early in the 3.x days and are API changes, whereas the OP was talking about syntax changes for the most part.
What I meant by that is that because the changes mostly aren't syntax changes, they won't upset most users who are just using big packages that are actively maintained and keeping ahead of the breakage.
But I still find the high level of instability in Python land rather disturbing, and I would be unhappy if the languages I used constantly made these sorts of breaking changes.
I'm even more extreme in that I also think the ABI instability is bad. Even though Python gives no guarantee of its stability, it's used by so many people it seems like a bad thing to constantly break and it probably should be stabilized.
(As a happy pypy user in previous jobs, I want to chime in and say python _can_ be fast.
It can be so fast that it completely mooted the discussions that often happen when wanting to move from a python prototype to 'fast enough for production'.)
PyPy is still slow compared to actual fast languages. It's just fast compared to Python, and it achieves that speed by not being compatible with most of the Python ecosystem.
Seems like a lose-lose to me. (which is presumably why it never caught on)
What isn't compatible with PyPy? I can run large frameworks using PyPy no problem. There certainly will be packages that aren't compatible, but far and away most of the ecosystem is fully compatible.
- People choose Python to get ease of programming, knowing that they give up performance.
- With multi-core machines now the norm, they’re relatively giving up more performance to get the same amount of ease of programming.
- so, basically, the price of ease of programming has gone up.
- economics 101 is that rising prices will decrease demand, in this case demand for programming in Python.
- that may be problematic for the long-term survival of Python, especially with other new languages aiming to provide Python's ease of use while supporting multi-processing.
So, Python must get even easier to use and/or it must get faster.
As a Java dev I think people don't always appreciate the flexibility of threads for managing parallelism, concurrency, and memory. In particular, with threads you can have a large number of threads share a large data structure. Say you have an ML inference model that takes 1GB of RAM. No way can you let 25 Celery workers each have a copy of that model, so if you are using Python you have to introduce a special worker class. It's one more thing to worry about and more parameters to tune. With Java all your "workers" can be in one process, even the same process as your web server, and with free-threading that can now be the case in Python too.
Threads break down so many bottlenecks around CPU resources, memory, data serialization, waiting for communications, etc. I have a 16-core computer on my desk, and a 12x speed-up is possible for many jobs and often worth the less efficient use of each individual CPU. Java has many threading primitives, including tools for specialized communication (e.g. barrier synchronization), that are necessary to really get those speed-ups, and I hope Python gets those too.
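Here is a rough sketch of that pattern in Python, written as one could hope to write it under free-threading; FakeModel is a stand-in for a large in-memory model, and threading.Barrier is the stdlib's barrier-synchronization primitive:

```python
# One in-process "model" shared by many worker threads (no per-worker copies),
# plus a Barrier so all workers start their phase together.
import threading

class FakeModel:
    def __init__(self):
        self.weights = [0.5] * 1_000_000     # stands in for a large structure
    def predict(self, x):
        return x * self.weights[0]

model = FakeModel()                          # loaded once, shared by reference
start = threading.Barrier(parties=4)         # barrier-style synchronization

def worker(wid):
    start.wait()                             # everyone begins the phase together
    out = sum(model.predict(i) for i in range(10_000))
    print(f"worker {wid}: {out:.1f}")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```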
You may be right. I personally think this work is a net benefit. Although I never expected to be using multiprocessing or threads, I now find, doing a lot of DNS work (processing end-of-day logs of 300m records per day, trying to farm them out over public DNS resolvers doing multiple RR checks per FQDN), that the multiprocessing efficiency is lower than threads because of this serialisation cost. So improving threading has shown me I could be 4-5x faster in this solution space, IFF I learn how to use threading.Lock to gatekeep updates on the shared structures.
My alternative is to serialise in heavy processes and then incur a post-process unification pass, because the cost of serialise/send/receive/deserialise to unify this stuff is too much. If somebody showed me how to use shared-memory models to do this, so it came back to the cost of a threading.Lock, I'd do the IPC over a shared-memory dict, but I can't find examples and now suspect multiprocessing in Python 3 just doesn't do that (happy, delighted even, to be proved wrong).
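Not the shared-memory dict, but here is a minimal sketch of the threads-plus-lock version described above (hostnames are illustrative; socket.getaddrinfo releases the GIL while it blocks on the network, and under free-threading the Python-level aggregation can overlap too):

```python
# Worker threads resolve names and merge results into one shared dict guarded
# by a single Lock: one copy of the data, no pickling between workers.
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

results = {}                       # shared structure
lock = threading.Lock()

def check(fqdn):
    try:
        addrs = sorted({ai[4][0] for ai in socket.getaddrinfo(fqdn, None)})
    except OSError:
        addrs = []
    with lock:                     # gatekeep updates to the shared dict
        results[fqdn] = addrs

names = ["example.com", "example.org", "example.net"]
with ThreadPoolExecutor(max_workers=16) as pool:
    pool.map(check, names)         # the with-block waits for all tasks to finish
print(results)
```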
One area where this absolutely makes a difference is when embedding python. Like it or not Python is extreme popular in data/AI/ML so if you want to build an experience where users can deploy custom functions, removing the GIL allows you to more efficiently scale these workloads.
For machine learning, (un)fortunately, lots of the stack runs on Python. Lots of ceremony was done to circumvent the GIL (e.g. the PyTorch DataLoader). "Reasonable-performance Python" is, I imagine, actually something in huge demand at lots of ML shops.
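For context, the "ceremony" is roughly this: DataLoader sidesteps the GIL by spawning worker processes and shipping batches back over IPC rather than using in-process threads. A minimal sketch, assuming PyTorch is installed; ToyDataset is a stand-in for a real dataset:

```python
# num_workers > 0 makes DataLoader start separate worker processes, which is
# the classic GIL workaround that free-threading aims to make unnecessary.
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __len__(self):
        return 1024
    def __getitem__(self, idx):
        return torch.randn(64), idx % 10      # (features, label)

if __name__ == "__main__":
    loader = DataLoader(ToyDataset(), batch_size=32, num_workers=4)
    for features, labels in loader:
        pass                                   # batches were produced by subprocesses
```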
Since Python has become the new Lisp, the minimum is to have the performance tooling Common Lisp has had for several decades, in native code generation and multithreading (yes I know that in CL this is implementation specific).
I spent some time looking into it. I believe it could be done with a source-to-source transpiler with zero-cost abstractions and some term rewriting. It’s a lot of work.
The real barrier my thought experiment hit were the extensions. Many uses of Python are glue around C extensions designed for the CPython interpreter. Accelerating “Python” might actually be accelerating Python, C, and hybrid code that’s CPython-specific. Every solution seemed like more trouble than just rewriting those libraries to not be CPython-specific. Or maybe to work with the accelerators better.
Most people are just using high-level C++ and Rust in the areas I was considering. If using Python, the slowdown of Python doesn’t impact them much anyway since their execution time is mostly in the C code. I’m not sure if much will change.
I'm glad the Python community is focusing more on CPython's performance. Getting speed ups on existing code for free feels great. As much as I hate how slow Python is, I do think its popularity indicates it made the correct tradeoffs in regards to developer ease vs being fast enough.
Learning it has only continued to be a huge benefit to my career, as it's used everywhere, which underlines how important the popularity of a language can be for developers when evaluating languages for career choices.
Performance for python3.14t alpha 1 is more like 3.11 in what I've tested. Not good enough if Python doesn't meet your needs, but this comes after 3.12 and 3.13 have both performed worse for me.
3.13t doesn't seem to have been meant for any serious use. Bugs in gc and so on are reported, and not all fixes will be backported apparently. And 3.14t still has unavoidable crashes. Just too early.
> 3.13t doesn't seem to have been meant for any serious use.
I don't think anyone would suggest using it in production. The point was to put something usable out into the world so package maintainers could kick the tires and start working on building compatible versions. Now is exactly the time for weird bug reports! It's a thirty year old runtime and one of its oldest constraints is being removed!
If it were ever open sourced, I could see Mojo filling the performance niche for Python programmers. I'm hopeful because Lattner certainly has the track record, if he doesn't move on beforehand.
Can someone share insight into what was technically done to enable this? What replaced the global lock? Is the GC stopping all threads during collection, or is there another locking mechanism?
The key enabling tech is thread-safe reference counting. There are many other problems that Sam Gross solved in order to make it happen, but the reference counting was one of the major blockers.
I'm not smart and I don't have any university title, but my opinion is this: it's very good, but effort should also go into removing features, not just speeding Python up. I get it, though, that would break everything.
I don't really have a dog in this race as I don't use Python much, but this sort of thing always seemed to be of questionable utility to me.
Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible, so high performance "python" is actually going to always rely on restricted subsets of the language that don't actually match language's "real" semantics.
On the other hand, a lot of these changes to try and speed up the base language are going to be highly disruptive. E.g. disabling the GIL will break tonnes of code, lots of compilation projects involve changes to the ABI, etc.
I guess getting loops in Python to run 5-10x faster will still save some people time, but it's also never going to be a replacement for the zoo of specialized python-like compilers because it'll never get to actual high performance territory, and it's not clear that it's worth all the ecosystem churn it might cause.
There was a discussion the other day about how Python devs apparently don't care enough for backwards compatibility. I pointed out that I've often gotten Python 2 code running on Python 3 by just changing print to print().
But then a few hours later, I tried running a very small project I wrote last year and it turned out that a bunch of my dependencies had changed their APIs. I've had similar (and much worse) experiences trying to get older code with dependencies running.
My meaning with this comment is, that if the average developer's reality is that backwards compatibility isn't really a thing anyway, then we are already paying for that downside so we might as well get some upside there, is my reasoning.
It's hard to comment on this without knowing more about the dependencies and when/how they changed their APIs. I would say if it was a major version change, that isn't too shocking. For a minor version change, it should be.
Stuff that is actually included with Python tends to be more stable than random Pypi packages, though.
NPM packages also sometimes change. That's the world.
The big difference is that npm will automatically (since 2017) save a version range to the project metadata, and will automatically create this metadata file if it doesn't exist. Same for other package managers in the Node world.
I just installed Python 3.13 with pip 24.2, created a venv and installed a package - and nothing, no file was created and nothing was saved. Even if I touch requirements.txt and pyproject.toml, pip doesn't save anything about the package.
This creates a massive gap in usability of projects by people not very familiar with the languages. Node-based projects sometimes have issues because dependencies changed without respecting semver, but Python projects often can't be installed and you have no idea why without spending lots of time looking through versions.
Of course there are other package managers for Python that do this better, but pip is still the de-facto default and is often used in tutorials for new developers. Hopefully uv can improve things!
I recommend to start using UV.
It is very fast and tracks the libraries you are using.
After years of venv/pip, I'm not going back (unless a client requires it).
> Of course there are other package managers for Python that do this better
I think if you are comparing with what NPM does then you would have to say that native pip can do that too. It is just one command
`pip freeze > requirements.txt`
It does include everything in the venv (or in you environment in general) but if you stick to only add required things (one venv for each project) then you will get requirements.txt files
Yeah, I guess I should have done a pip freeze to specify the versions in the requirements file. I wasn't thinking ahead.
Turns out one dependency had 3 major releases in the span of a year! (Which basically confirms what I was saying, though I don't know how typical that is.)
So pin your deps? Language backwards compatibility and an API from some random package changing are completely distinct.
Pinning deps is discouraged by years of Python practice. And going back to a an old project and finding versions that work, a year or more later, might be nigh on impossible.
Last week I was trying to install snakemake via Conda, and couldn't find any way to satisfy dependencies at all, so it's not just pypi, and pip tends to be one of the more forgiving version dependency managers.
It's not just Python, trying to get npm to load the requirements has stopped me from compiling about half of the projects I've tried to build (which is not a ton of projects). And CRAN in the R universe can have similar problems as projects age.
> Pinning deps is discouraged by years of Python practice.
I'm not sure it is discouraged so much as just not what people did in Python-land for a long time. It's obviously the right thing to do, it's totally doable, it's just inertia and habit that might mean it isn't done.
It took me few days to get some old Jupyter Notebooks working. I had to find the correct older version of Jupyter, correct version of the every plugin/extension that notebook used and then I had to find the correct version of every dependency of these extensions. Only way to get it working was a bunch of pinned dependencies.
That doesn’t match my experience at all. I have many Python projects going back years that all work fine with pinned dependencies
I’m curious as to which packages you are unable to find older versions for. You mention snakemake, but that doesn’t seem to have any sort of issues.
https://pypi.org/project/snakemake/#history
It's not about finding old packages, it's about not finding the magical compatible set of package versions.
Pip is nice in that you can install packages individually to get around some version conflicts. But with conda and npm and CRAN I have always found my stuck without being able to install dependencies after 15 minutes of mucking.
Its rare that somebody has left the equivalent of the output of a `pip freeze` around to document their state.
With snakemake, I abandoned conda and went with pip in a venv, without filing an issue. Perhaps it was user error from being unfamiliar with conda, but I did not have more time to spend on the issue, much less doing the research to be able to file a competent issue and follow up later on.
> So pin your deps?
Which is, fairly often, pinning your python version.
What APIs were broken? They couldn't be in the standard library.
If the dependency was in external modules and you didn't have pinned versions, then it is to be expected (in almost any active language) that some APIs will break.
> Python 2 code running on Python 3 by just changing print to print().
This was very much the opposite of my experience. Consider yourself lucky.
This migration took the industry years because it was not that simple.
> This migration took the industry years because it was not that simple.
It was not that simple, but it was not that hard either.
It took the industry years because Python 2.7 was still good enough, and the tangible benefits of migrating to Python 3 didn't justify the effort for most projects.
Also some dependencies such as MySQL-python never updated to Python 3, which was also an issue for projects with many dependencies.
The Python 2 to 3 thing was worse when they started: people who made the mistake of falling for the rhetoric to port to python3 early on had a much more difficult time as basic things like u"" were broken under an argument that they weren't needed anymore; over time the porting process got better as they acquiesced and unified the two languages a bit.
I thereby kind of feel like this might have happened in the other direction: a ton of developers seem to have become demoralized by python3 and threw up their hands in defeat of "backwards compatibility isn't going to happen anyway", and now we live in a world with frozen dependencies running in virtual environments tied to specific copies of Python.
> I pointed out that I've often gotten Python 2 code running on Python 3 by just changing print to print().
...
> I wrote last year and it turned out that a bunch of my dependencies had changed their APIs
these two things have absolutely nothing to do with each other - couldn't be a more apples to oranges comparison if you tried
I ran into both of these things in the same context, which is "the difficulty involved in getting old code working on the latest Python environment", which I understood as the context of this discussion.
> Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible
Scientific computing community have a bunch of code calling numpy or whatever stuff. They are pretty fast because, well, numpy isn't written in Python. However, there is a scalability issue: they can only drive so many threads (not 1, but not many) in a process due to GIL.
Okay, you may ask, why not just use a lot of processes and message-passing? That's how historically people work around the GIL issue. However, you need to either swallow the cost of serializing data over and over again (pickle is quite slow, even it's not, it's wasting precious memory bandwidth), or do very complicated dance with shared memory.
It's not for web app bois, who may just write TypeScript.
This is misleading. Most of the compute intensive work in Numpy releases the GIL, and you can use traditional multithreading. That is the case for many other compute intensive compiled extensions as well.
It’s an Amdahl’s law sort of thing, you can extract some of the parallelism with scikit-learn but what’s left is serialized. Particularly for those interactive jobs where you might write plain ordinary Python snippets that could get a 12x speedup (string parsing for a small ‘data lake’)
In so far as it is all threaded for C and Python you can parallelize it all with one paradigm that also makes a mean dynamic web server.
Numpy is not fast enough for actual performance sensitive scientific computing. Yes threading can help, but at the end of the day the single threaded perf isn't where it needs to be, and is held back too much by the python glue between Numpy calls. This makes interproceedural optimizations impossible.
Accellerated sub-languages like Numba, Jax, Pytorch, etc. or just whole new languages are really the only way forward here unless massive semantic changes are made to Python.
These "accelerated sub-languages" are still driven by, well, Python glue. That's why we need free-threading and faster Python. We want the glue to be faster because it's currently the most accessible glue to the community.
In fact, Sam, the man behind free-threading, works on PyTorch. From my understanding he decided to explore nogil because GIL is holding DL trainings written in PyTorch back. Namely, the PyTorch DataLoader code itself and almost all data loading pipelines in real training codebases are hopeless bloody mess just because all of the IPC/SHM nonsense.
> so high performance "python" is actually going to always rely on restricted subsets of the language that don't actually match language's "real" semantics.
I don't even understand what this means. If I write `def foo(x):` versus `def foo(x: int) -> float:`, one is a restricted subset of the other, but both are the language's "real" semantics. Restricted subsets of languages are wildly popular in programming languages, and for very varied reasons. Why should that be a barrier here?
Personally, if I have to annotate some of my code that run with C style semantics, but in return that part runs with C speed, for example, then I just don't really mind it. Different tools for different jobs.
Why does python have to be slow? Improvements over the last few releases have made it quite a bit faster. So that kind of counters that a bit. Apparently it didn't need to be quite as slow all along. Other languages can be fast. So, why not python?
I think with the GIL some people are overreacting: most python code is single threaded because of the GIL. So removing it doesn't actually break anything. The GIL was just making the use of threads kind of pointless. Removing it and making a lot of code thread safe benefits people who do want to use threads.
It's very simple. Either you did not care about performance anyway and nothing really changes for you. You'd need to add threading to your project to see any changes. Unless you do that, there's no practical reason to disable the GIL for you. Or to re-enable that once disabled becomes the default. If your python project doesn't spawn threads now, it won't matter to you either way. Your code won't have deadlocking threads because it has only 1 thread and there was never anything to do for the GIL anyway. For code like that compatibility issues would be fairly minimal.
If it does use threads, against most popular advise of that being quite pointless in python (because of the GIL), you might see some benefits and you might have to deal with some threading issues.
I don't see why a lot of packages would break. At best some of them would be not thread safe and it's probably a good idea to mark the ones that are thread safe as such in some way. Some nice package management challenge there. And probably you'd want to know which packages you can safely use.
> Why does python have to be slow?
Because the language's semantics promise that a bunch of insane stuff can happen at any time during the running of a program, including but not limited to the fields of classes changing at any time. Furthermore, they promise that their integers are aribtrary precision which are fundamentally slower to do operations with than fixed precision machine integers, etc.
The list of stuff like this goes on and on and on. You fundamentally just cannot compile most python programs to efficient machine code without making (sometimes subtle) changes to its semantics.
_________
> I don't see why a lot of packages would break. At best some of them would be not thread safe and it's probably a good idea to mark the ones that are thread safe as such in some way. Some nice package management challenge there. And probably you'd want to know which packages you can safely use.
They're not thread safe because it was semantically guaranteed to them that it was okay to write code that's not thread safe.
There are different definitions of slow, though. You might want arbitrary precision numbers but want it to be reasonable fast in that context.
I don't agree that it is "insane stuff", but I agree that Python is not where you go if you need super fast execution. It can be a great solution for "hack together something in a day that is correct, but maybe not fast", though. There are a lot of situations where that is, by far, the preferred solution.
There are ways to design languages to be dynamic while still being friendly to optimizing compilers. Typically what you want to do is promise that various things are dynamic, but then static within a single compilation context.
julia is a great example of a highly dynamic language which is still able to compile complicated programs to C-equivalent machine code. An older (and less performant but still quite fast) example of such a language is Common Lisp.
Python makes certain choices though that make this stuff pretty much impossible.
Not disputing it, but people don't pick Python because they need the fastest language, they pick it for friendly syntax and extensive and well-supported libraries. I loved Lisp, but none of the lisps have anything like Python's ecology. Julia, even less so.
People don't pick languages for language features, mostly. They pick them for their ecosystems -- the quality of libraries, compiler/runtime support, the network of humans you can ask questions of, etc.
> loved Lisp, but none of the lisps have anything like Python's ecology. Julia, even less so.
None of the lisps have anything close to julia's ecology in numerical computing at least. Can't really speak to other niches though.
> People don't pick languages for language features, mostly. They pick them for their ecosystems -- the quality of libraries, compiler/runtime support, the network of humans you can ask questions of, etc.
Sure. And that's why Python is both popular and slow.
I think that to a certain extent the quality of libraries can depend on language features.
Common Lisp is probably not a good point of comparison. It offers comparable (if not more) dynamism to Python and still remains fast (for most implementations). You can redefine class definitions and function definitions on the fly in a Common Lisp program and other than the obvious overhead of invoking those things the whole system remains fast.
Common lisp is in fact a good point of comparison once you look at how it's fast. The trick with Common Lisp is that they made a foundation of stuff that can actually be optimized pretty well by a compiler, and made that stuff exempt from being changed on the fly (or in some cases, just made the the compiler assume that they won't change on the fly even if they do, resulting in seg-faults unless you recompile code and re-generate data after changing stuff).
This is how Common Lisp people can claim that the language is both performant and flexible. The performant parts and the flexible parts are more disjoint than one might expect based on the way people talk about it.
But anyways, Common Lisp does manage to give a high degree of dynamism and performance to a point that it surely can be used for any of the dynamic stuff you'd want to do in Python, while also giving the possibility of writing high performance code.
Python did not do this, and so it'll be impossible for them to offer something like Common Lisp perf without breaking changes, or by just introducing a whole new set of alternatives to slow builtins like class, int, call, etc.
> > Why does python have to be slow?
> Because the language's semantics promise that a bunch of insane stuff can happen at any time during the running of a program, including but not limited to the fields of classes changing at any time.
You originally claimed Python is slow because of its semantics and then compare later to CL. CL has a very similar degree of dynamism and remains fast. That's what I'm saying makes for a poor comparison.
CL is a demonstration that Python, contrary to your initial claim, doesn't have to forfeit dynamism to become fast.
> CL has a very similar degree of dynamism and remains fast.
But the dynamic parts don't remain "really" fast. Common Lisp introduced, very early on, a lot of features to support optimizing compilers, and some of those reduce "dynamism": code inlining (inline declarations), file compiler semantics, type declarations, optimization qualities (speed, compilation-speed, space, safety, debug, ...), stack allocation, tail call optimization, type inference, ...
I think you're missing the point. Common Lisp is very dynamic yes, but it was designed in a very careful way to make sure that dynamism does not make an optimizing compiler impossible. That is not the case for Python.
Not all dynamism is the same, even if the end result can feel the same. Python has a particularly difficult brand of dynamism to deal with.
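To make that concrete, here's a minimal toy example of my own (not anything from CPython) of the kind of rewrite an optimizer has to assume can happen at any moment:

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    def norm2(p):
        return p.x * p.x + p.y * p.y

    p = Point(3, 4)
    print(norm2(p))                        # 25

    # Any code, anywhere, can later turn the plain attribute into computed
    # behaviour; the data descriptor on the class now shadows the value
    # stored on the instance.
    Point.x = property(lambda self: 42)
    print(norm2(p))                        # 1780 -- same call site, new semantics

An optimizer that wants to compile norm2 down to direct field loads has to guard against exactly this.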
> You can redefine class definitions and function definitions on the fly in a Common Lisp program and other than the obvious overhead of invoking those things the whole system remains fast.
You can also treat Julia as C and recompile vtables on the fly.
"hack together something in a day" JPM Athena trading platform had 35 million lines of code in 2019 with about 20k commits a week
> On the other hand, a lot of these changes to try and speed up the base language are going to be highly disruptive. E.g. disabling the GIL will break tonnes of code, lots of compilation projects involve changes to the ABI, etc.
Kind of related: the other day I was cursing like a sailor because some code I wrote that uses StrEnum wasn't working with older versions of Python. I was wondering why I did that, trying to find the combination of packages that would work for the Python version I needed, and wondering why there was so much goddamn churn in this stupid [expletive] scripting language.
But then I took a step back and realized that, actually, I should be glad about the churn because it means that there is a community of developers who care enough about the language to add new features and maintain this language so that I can just pipe PyQt and Numpy into each other and get paid.
I don't have any argument, just trying to give an optimistic perspective.
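(For what it's worth, the StrEnum part of my whining can usually be papered over with a small version shim; this is just my own workaround, nothing official:)

    import sys

    if sys.version_info >= (3, 11):
        from enum import StrEnum           # enum.StrEnum was added in 3.11
    else:
        from enum import Enum

        class StrEnum(str, Enum):          # rough stand-in for older Pythons
            def __str__(self):
                return str(self.value)

    class Channel(StrEnum):
        LOGS = "logs"
        METRICS = "metrics"

    print(f"writing to {Channel.LOGS}")    # "writing to logs" either way (roughly)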
At least bugfix versions could have kept Enum behavior the same, postponing breaking changes until the next minor version. Some Enum features work differently (incompatible) in Python 3.11.x versions.
> Some Enum features work differently (incompatible) in Python 3.11.x versions.
I wasn't aware of that, that's actually insane. It's odd to me that it took so long to get f-strings and Enums right in Python, I assumed those would be pretty easy language features to implement.
If JavaScript (V8) and PyPy can be fast, then CPython can be fast too.
It's just that the CPython developers and much of the Python community sat on their hands for 15 years and said stuff like "performance isn't a primary goal" and "speed doesn't really matter since most workloads are IO-bound anyway".
In this context, V8 and PyPy aren't fast. Or at least, not generally; they may actually do well on this task, because pure number crunching is the only area where they can sometimes (as long as you don't mess them up) get to compiled-language-like performance. But in general they don't get to compiled-language performance, despite common belief to the contrary.
This gets into the whole "fast for what purpose" discussion. For many purposes, JavaScript is quite acceptably fast. But it isn't C or Rust.
Let's make this more concrete, because assigning speed to languages is a fool's errand. Python is doing a lot more per line of code than compiled languages to enable its very flexible semantics. In cases where this flexibility is desired, you won't see much more performance in a compiled language, because you'll have just implemented Python-like semantics on top of your compiled language; GObject is a good example of this. More famously, this is Greenspun's tenth rule:
> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
But where this flexibility isn't required, which is the case for a lot of performance-sensitive number-crunching code, the cost of the flexibility bites you. You can't "turn it off" when you want control down to the instruction for a truly massive performance win. Which is why I think the model Python has, of a highly expressive and flexible language backed by high-performance compiled libraries, is so successful.
Python will never be number crunching or parsing with the best of them because it would require essentially a whole new language to express the low-level constraints but for high-level code that relies on Python's semantics you can get performance wins that can't be accomplished just by switching to a compiled language. We've just taken the "embedded scripting language" and made it the primary interface.
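A hedged illustration of that split (timings are indicative only; they'll vary by machine and versions): the same reduction written as an interpreted loop and as one call into a compiled library.

    import time
    import numpy as np                      # the compiled library doing the work

    data = list(range(10_000_000))
    arr = np.asarray(data, dtype=np.float64)

    t0 = time.perf_counter()
    total_py = sum(x * x for x in data)     # interpreter dispatches every step
    t1 = time.perf_counter()
    total_np = float(arr @ arr)             # one call into a tight compiled loop
    t2 = time.perf_counter()

    print(f"pure Python: {t1 - t0:.2f}s   numpy: {t2 - t1:.3f}s")
    # On a typical machine the NumPy call is one to two orders of magnitude
    # faster; the exact ratio obviously depends on hardware and versions.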
This is a good question, and I think about it as well. My best guess for a simple explanation: Python is very popular; it makes sense to improve performance for python users, given many do not wish to learn to use a more performant language, or to use a more performant Python implementation. Becoming proficient in a range of tools so you can use the right one for the right job is high enough friction that it is not the path chosen by many.
You should really add that Python is also a very good tool for people who do know more performant languages. One of the things that often gets forgotten is that a lot of software will never actually need to be very performant, and often you’re not going to know the bottlenecks beforehand. If you even get to the bottlenecks it means you’ve succeeded enough to have bottlenecks, somewhere you might never have gotten if you had over-engineered things before you needed to.
What makes Python brilliant is that it’s easy to deliver on business needs. It’s easy to include people who aren’t actually software engineers but can write Python to do their stuff. It’s easy to make that Wild West code sane. Most importantly, however, it’s extremely easy to replace parts of your Python code with something like C (or Zig).
So even if you know performant languages, you can still use Python for most things and then as glue for heavy computation.
Now I may have made it sound like I think Python is brilliant so I’d like to add that I actually think it’s absolute trash. Loveable trash.
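(To back up the "glue for heavy computation" point with something runnable-ish, a minimal ctypes sketch. The shared library and the dot function here are hypothetical; the C side would be a one-liner compiled with something like `cc -O3 -shared -fPIC fastlib.c -o fastlib.so`.)

    import ctypes

    # Hypothetical C function: double dot(const double *a, const double *b, size_t n);
    lib = ctypes.CDLL("./fastlib.so")
    lib.dot.restype = ctypes.c_double
    lib.dot.argtypes = [ctypes.POINTER(ctypes.c_double),
                        ctypes.POINTER(ctypes.c_double),
                        ctypes.c_size_t]

    def dot(a, b):
        n = len(a)
        Arr = ctypes.c_double * n           # build C arrays from Python floats
        return lib.dot(Arr(*a), Arr(*b), n)

    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 32.0, computed in C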
> it’s extremely easy to replace parts of your Python code with something like C
I tend to use C++, so I use SWIG [1] to generate Python code that interfaces with C++ (or C). You can nearly just give it a header file, and a Python class pops out, with native types and interfaces. It's really magical.
[1] https://www.swig.org
Oh yeah, I totally get the motivation behind it. It's always very tempting to want to make things faster. But I can't help but wondering if these attempts to make it faster might end up just making it worse.
On the other hand though, Python is so big and there's so many corps using it with so much cash that maybe they can get away with just breaking shit every few releases and people will just go adapt packages to the changes.
Python famously has a community that does NOT adapt to changes well. See the Python 2 to 3 transition.
That was, in many ways, a crazy difficult transition. I don't think most languages have gone through such a thing. Perl tried and died. So I don't agree that it reflects poorly on the community; I think the plan itself was too ambitious.
Many languages have. There were significant breaks in C++ when the std::string ABI changed, Swift has had major language changes, and Rust has editions.
The difference is in what motivates getting to the other end of that transition bump and how big the bump is. That’s why it took until past 2.7’s EOL to actually get people onto 3 in a big way: they’d drag their feet as long as they didn’t see a big enough change.
Compiled languages have it easier because they don’t need to mix source between dependencies, they just have to be ABI compatible.
Python's community was significantly smaller and less flush with cash during the 2 to 3 transition. Since then there have been numerous 3.x releases that were breaking, and people seem to have been sucking it up and dealing with it quietly so far.
The main thing is that, unlike the 2 to 3 transition, they're not breaking syntax (for the most part?), which everyone experiences and has an opinion on; they're breaking rather deep-down things that mostly only the big packages rely on, so most users don't experience it much at all.
I disagree with this entire comment.
The Python community consisted of tons of developers including very wealthy companies. At what point in the last few years would you even say they became “rich enough” to do the migration? Because people are STILL talking about trying to fork 2.7 into a 2.8.
I also disagree with your assertion that 3.x releases have significant breaking changes. Could you point to any specific major breaking changes between 3.x releases?
2 to 3 didn’t break syntax for most code either. It largely cleaned house on sensible API defaults.
> Could you point to any specific major breaking changes between 3.x releases?
I cannot, but I can tell you that anything AI-related often requires finding the right combination of Python + cuXXX + some library. And while I understand the CUDA implications, for some reason the Python version is also part of that formula.
I literally have four Python versions installed but removed from PATH, because if I delete 3.9-3.11 they will be needed again the next day, and there’s no meaningful default.
Those are ABI changes and not changes to the language.
If these were just ABI changes, packagers would simply re-package under a new ABI. Instead they specify ranges of versions in which "it works". The upper end often doesn't include the last python version and may be specified as "up to 3.x.y" even.
Sure, I'm not that knowledgeable on this topic (in Python). But you're telling me they go to the lengths of supporting e.g. 3.9-3.11.2, but out of laziness won't just compile it for 3.12?
I can only hypothesize that 3.9-3.xxx had the same ABI and they don't support multiple ABIs out of principle, but that sounds like a very strange idea.
Fair enough. You may be totally right here, as I mentioned I don't use Python much at all since like 2017 and haven't paid it much attention in a while. I retract my comment.
Regarding breakage in 3.x, all I know is that I recall several times where I did a linux system update (rolling release), and that updated my Python to a newly released version which broke various things in my system. I'm pretty sure one of these was v3.10, but I forget which others caused me problems which I could only solve by pinning Python to an older release.
It's entirely possible, though, that no actual APIs were broken and that these were just accidental bugs in the release, or that the packages were being naughty and relying on internals they shouldn't have relied on, or something else.
To your last point: it’s neither the language nor the packages but rather it’s the ABI.
Python isn’t fully ABI stable (though it’s improved greatly) so you can’t just intermix compiled dependencies between different versions of Python.
This is true for many packages in your distro as well.
There have been many breaking changes throughout python 3.x releases:
- standard library modules removed
- zip error handling behaves differently
- changes to collections module
- new reserved keywords (async, await, etc.; see the snippet after this list)
You can argue how big a deal it is or isn't, but there were definitely breakages that violate semantic versioning.
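To make the keyword item concrete (my own snippet, not taken from any particular package): async and await became reserved words in 3.7, so identifiers that parsed fine on 3.5/3.6 stopped even compiling.

    # Runnable on a current Python: shows that pre-3.7 code using `async`
    # as an ordinary name (as some older APIs did) no longer parses.
    snippet = "async = True"
    try:
        compile(snippet, "<example>", "exec")
        print("parses (pre-3.7 behaviour)")
    except SyntaxError as exc:
        print("SyntaxError on 3.7+:", exc)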
Python doesn't follow SemVer, that's why.
https://peps.python.org/pep-2026/
They removed entire standard library modules? Wut.
Yes, e.g. https://peps.python.org/pep-0594/
Ah I see. Yeah, I guess I just think that for a language that's so dependent on FFI, instability in the ABI is de facto instability in the language as far as I'm concerned. But I understand that not everyone feels the same.
Almost every major noteworthy Python package uses the ABI, so instability there is going to constantly be felt ecosystem wide.
2 to 3 broke lots of code. Print became a function. Imports moved around. And there were subtle changes in the semantics of some things. Famously, strings changed, and that definitely affected a lot of packages.
Quite a bit of that could be fixed by automated tooling, but not all of it, and the testing burden was huge, which meant a lot of smaller packages did not convert very quickly and there were ripple effects.
Yes 2 to 3 changed things. We’re discussing what changed in between different versions of 3.
Most of those are old, long-deprecated things, and in general they are all straight-up improvements. Python is not my main thing, so I'm not really the best person to answer this, but I listed a few that I am sure triggered errors in some code bases (I'm not saying they are all major). Python's philosophy makes most of these pretty easy to handle: for example, instead of foo you now have to be explicit and choose either foo_bar or foo_baz.

For comparison, C still has the completely bonkers function 'gets', which has been deprecated for a long time and will probably stick around for a long time as well. The C standard library, the Windows C API and the Linux C API are to a large extent add-only, because things should stay bug-for-bug compatible. Python is not like that. This has its perks, but it may cause your old Python code to just not run. It may be easy to modify, but easy is significantly harder than nothing at all.
https://docs.python.org/3/whatsnew/3.3.html#porting-to-pytho...
> Hash randomization is enabled by default. Set the PYTHONHASHSEED environment variable to 0 to disable hash randomization. See also the object.__hash__() method.
https://docs.python.org/3/whatsnew/3.4.html#porting-to-pytho...
> The deprecated urllib.request.Request getter and setter methods add_data, has_data, get_data, get_type, get_host, get_selector, set_proxy, get_origin_req_host, and is_unverifiable have been removed (use direct attribute access instead).
https://docs.python.org/3/whatsnew/3.5.html#porting-to-pytho...
https://docs.python.org/3/whatsnew/3.6.html#removed
> All optional arguments of the dump(), dumps(), load() and loads() functions and JSONEncoder and JSONDecoder class constructors in the json module are now keyword-only. (Contributed by Serhiy Storchaka in bpo-18726.)
https://docs.python.org/3/whatsnew/3.7.html#api-and-feature-...
> Removed support of the exclude argument in tarfile.TarFile.add(). It was deprecated in Python 2.7 and 3.2. Use the filter argument instead.
https://docs.python.org/3/whatsnew/3.8.html#api-and-feature-...
> The function time.clock() has been removed, after having been deprecated since Python 3.3: use time.perf_counter() or time.process_time() instead, depending on your requirements, to have well-defined behavior. (Contributed by Matthias Bussonnier in bpo-36895.)
https://docs.python.org/3/whatsnew/3.9.html#removed
> array.array: tostring() and fromstring() methods have been removed. They were aliases to tobytes() and frombytes(), deprecated since Python 3.2. (Contributed by Victor Stinner in bpo-38916.)
> Methods getchildren() and getiterator() of classes ElementTree and Element in the ElementTree module have been removed. They were deprecated in Python 3.2. Use iter(x) or list(x) instead of x.getchildren() and x.iter() or list(x.iter()) instead of x.getiterator(). (Contributed by Serhiy Storchaka in bpo-36543.)
> The encoding parameter of json.loads() has been removed. As of Python 3.1, it was deprecated and ignored; using it has emitted a DeprecationWarning since Python 3.8. (Contributed by Inada Naoki in bpo-39377)
> The asyncio.Task.current_task() and asyncio.Task.all_tasks() have been removed. They were deprecated since Python 3.7 and you can use asyncio.current_task() and asyncio.all_tasks() instead. (Contributed by Rémi Lapeyre in bpo-40967)
> The unescape() method in the html.parser.HTMLParser class has been removed (it was deprecated since Python 3.4). html.unescape() should be used for converting character references to the corresponding unicode characters.
https://docs.python.org/3/whatsnew/3.10.html#removed
https://docs.python.org/3/whatsnew/3.11.html#removed
https://docs.python.org/3/whatsnew/3.12.html#removed
Thanks. That’s a good list, though I think the majority of the changes were from deprecations early in the 3.x days and are API changes, whereas the OP was talking about syntax changes for the most part.
No I wasn't?
Maybe I misunderstood your argument here, where you scope it tightly to syntax changes being an issue but internal changes being fine.
https://news.ycombinator.com/item?id=42051745
What I meant by that is that because the changes mostly aren't syntax changes, they won't upset most users who are just using big packages that are actively maintained and keeping ahead of the breakage.
But I still find the high level of instability in Python land rather disturbing, and I would be unhappy if the languages I use constantly made these sorts of breaking changes.
I'm even more extreme in that I also think the ABI instability is bad. Even though Python gives no guarantee of its stability, it's used by so many people it seems like a bad thing to constantly break and it probably should be stabilized.
Are there communities that have handled such a change well? At least one where it went better than with Perl and Raku?
Anything where the language frontend isn’t tied to the ABI compatibility of the artifacts I think. They can mix versions/editions without worry.
I think it’s a larger problem with interpreted languages, where all the source has to be on a single version. In that case I can’t think of much.
(As a happy pypy user in previous jobs, I want to chime in and say python _can_ be fast.
It can be so fast that it completely mooted the discussions that often happen when wanting to move from a python prototype to 'fast enough for production'.)
PyPy is still slow compared to actual fast languages. It's just fast compared to Python, and it achieves that speed by not being compatible with most of the Python ecosystem.
Seems like a lose-lose to me. (which is presumably why it never caught on)
What isn't compatible with PyPy? I can run large frameworks using PyPy no problem. There certainly will be packages that aren't compatible, but far and away most of the ecosystem is fully compatible.
This depends a lot on your domain, e.g. pypy is not compatible with pytorch or tensorflow so DL is out of the picture.
I think the reasoning is like this:
- People choose Python to get ease of programming, knowing that they give up performance.
- With multi-core machines now the norm, they’re relatively giving up more performance to get the same amount of ease of programming.
- so, basically, the price of ease of programming has gone up.
- economics 101 is that rising prices will decrease demand, in this case demand for programming in Python.
- that may be problematic for the long-term survival of Python, especially with new other languages aiming to provide python’s ease of use while supporting multi-processing.
So, Python must get even easier to use and/or it must get faster.
As a Java dev I think people don’t always appreciate the flexibility of threads for managing parallelism, concurrency, and memory. In particular, with threads you can have a large number of threads share a large data structure. Say you have an ML inference model that takes 1GB of RAM. No way can you let 25 Celery workers each have a copy of that model, so if you are using Python you have to introduce a special worker class. It’s one more thing to worry about and more parameters to tune. With Java all your “workers” could be in one process, even the same process as your web server, which isn’t really an option in Python today.
Threads break down so many bottlenecks of CPU resources, memory, data serialization, waiting for communications, etc. I have a 16-core computer on my desk, and getting a 12x speed-up is possible for many jobs, often worth the less efficient use of the individual CPUs. Java has many thread primitives, including tools for specialized communication (e.g. barrier synchronization), that are necessary to really get those speed-ups, and I hope Python gets those too.
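(Python does already have some of these primitives, e.g. threading.Barrier; what it has lacked is threads that actually run in parallel for CPU-bound work. A rough sketch, with made-up numbers, of the shared-model pattern I mean:)

    import threading
    from concurrent.futures import ThreadPoolExecutor

    MODEL = {"weights": list(range(1_000_000))}   # stand-in for the 1GB model
    barrier = threading.Barrier(4)                # all workers start together

    def worker(wid):
        barrier.wait()                            # barrier synchronization
        # every thread reads the same MODEL object: no copies, no serialization
        return wid, sum(MODEL["weights"][::500])

    with ThreadPoolExecutor(max_workers=4) as pool:
        for wid, total in pool.map(worker, range(4)):
            print(wid, total)

    # Under the GIL the CPU-bound part still runs one thread at a time;
    # a free-threaded build is what would let this actually scale.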
You may be right. I personally think this work is a net benefit. Although I never expected to be into MP or threads, I now do a lot of DNS work (processing end-of-day logs of 300m records per day, trying to farm them out over public DNS resolvers doing multiple RR checks per FQDN), and I find the MP efficiency is lower than threads because of this serialisation cost. So improving threading has shown me I could be 4-5x faster in this solution space, IFF I learn how to use threading.Lock to gatekeep updates on the shared structures.

My alternative is to serialise in heavyweight processes and then incur a post-process unification pass, because the cost of serialise/send/receive/deserialise to unify this stuff is too high. If somebody showed me how to use shm models to do this so it came back to the cost of threading.Lock, I'd do the IPC over a shared-memory dict, but I can't find examples and now suspect multiprocessing in Python 3 just doesn't do that (happy, delighted even, to be proved wrong).
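(For reference, the closest stdlib approximations I'm aware of: multiprocessing.Manager().dict() gives the workers a dict they can share, but every access is proxied through the manager process over IPC, so it's nowhere near threading.Lock cost; multiprocessing.shared_memory gives true shared buffers but no dict semantics. A rough sketch of the Manager route, very much a guess at whether it would beat my current unification pass:)

    from multiprocessing import Manager, Pool

    def record_hit(args):
        shared, lock, fqdn = args
        with lock:                                  # read-modify-write needs a lock
            shared[fqdn] = shared.get(fqdn, 0) + 1  # each access is IPC to the manager

    if __name__ == "__main__":
        with Manager() as mgr, Pool(4) as pool:
            shared, lock = mgr.dict(), mgr.Lock()
            fqdns = ["a.example", "b.example", "a.example"] * 1000
            pool.map(record_hit, [(shared, lock, f) for f in fqdns])
            print(dict(shared))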
Python 3.12 will be officially supported until October 2028, so there's plenty of time to migrate to no-GIL if anyone wants to.
Python 3.13 is not removing the GIL. You just have an option to run without it.
One area where this absolutely makes a difference is when embedding Python. Like it or not, Python is extremely popular in data/AI/ML, so if you want to build an experience where users can deploy custom functions, removing the GIL allows you to scale these workloads more efficiently.
For machine learning, (un)fortunately, lots of the stack runs on Python. Lots of ceremony was done to circumvent the GIL (e.g. the PyTorch DataLoader). “Reasonable performance Python” is, I imagine, actually something in huge demand for lots of ML shops.
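A hedged sketch of what that looks like in practice: CPU-bound, pure-Python work fanned out over threads only scales once the GIL stops serializing it (i.e. on a free-threaded build, or when the hot loop lives in C code that releases the GIL).

    import time
    from concurrent.futures import ThreadPoolExecutor

    def preprocess(n):
        # stand-in for CPU-bound, pure-Python work (tokenizing, augmentation, ...)
        return sum(i * i for i in range(n))

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(preprocess, [2_000_000] * 8))
    print(f"{time.perf_counter() - t0:.2f}s")
    # On a regular (GIL) build the eight threads effectively run one at a time;
    # on a free-threaded build they can actually occupy eight cores.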
Given the scale at which python is run, how much energy are we saving by improving its performance by 1%?
Since Python has become the new Lisp, the minimum is to have the performance tooling Common Lisp has had for several decades, in native code generation and multithreading (yes I know that in CL this is implementation specific).
I don't really get this. They have already made Python faster in the past while maintaining the same semantics. Seems like a good goal to me.
> I guess getting loops in Python to run 5-10x faster will still save some people time
I would recommend being less reductively dismissive, after claiming you “don’t really have a dog in this race”.
Edit: Lots of recent changes have done way more than just loop unrolling JIT stuff.
How about when there are 128-256 core consumer CPUs?
I spent some time looking into it. I believe it could be done with a source-to-source transpiler with zero-cost abstractions and some term rewriting. It’s a lot of work.
The real barrier my thought experiment hit was the extensions. Many uses of Python are glue around C extensions designed for the CPython interpreter. Accelerating “Python” might actually mean accelerating Python, C, and hybrid code that’s CPython-specific. Every solution seemed like more trouble than just rewriting those libraries to not be CPython-specific. Or maybe to work with the accelerators better.
Most people are just using high-level C++ and Rust in the areas I was considering. If using Python, the slowdown of Python doesn’t impact them much anyway since their execution time is mostly in the C code. I’m not sure if much will change.
I'm glad the Python community is focusing more on CPython's performance. Getting speed ups on existing code for free feels great. As much as I hate how slow Python is, I do think its popularity indicates it made the correct tradeoffs in regards to developer ease vs being fast enough.
Learning it has only continued to be a huge benefit to my career, as it's used everywhere, which underscores how important the popularity of a language can be for developers when evaluating languages for career choices.
Performance for python3.14t alpha 1 is more like 3.11 in what I've tested. Not good enough if Python doesn't meet your needs, but this comes after 3.12 and 3.13 have both performed worse for me.
3.13t doesn't seem to have been meant for any serious use. Bugs in the GC and so on have been reported, and apparently not all fixes will be backported. And 3.14t still has unavoidable crashes. Just too early.
> 3.13t doesn't seem to have been meant for any serious use.
I don't think anyone would suggest using it in production. The point was to put something usable out into the world so package maintainers could kick the tires and start working on building compatible versions. Now is exactly the time for weird bug reports! It's a thirty year old runtime and one of its oldest constraints is being removed!
If it were ever open sourced, I could see Mojo filling the performance niche for Python programmers. I'm hopeful because Lattner certainly has the track record, if he doesn't move on beforehand.
https://en.wikipedia.org/wiki/Mojo_(programming_language)
https://github.com/modularml/mojo/blob/main/LICENSE
Can someone share insight into what was technically done to enable this? What replaced the global lock? Is the GC stopping all threads during collection, or is there another locking mechanism?
The key enabling tech is thread safe reference counting. There are many other problems that Sam Gross solved in order to make it happen but the reference counting was one of the major blockers.
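To sketch the idea (in Python for illustration only; the real thing is C with atomics, this is just my rough mental model of the biased reference counting scheme): the thread that owns an object can bump a local counter without synchronization, and only other threads pay for synchronized updates to a shared counter.

    import threading

    class BiasedRefCount:
        # toy illustration only; CPython does this in C with atomics, not a Lock
        def __init__(self):
            self.owner = threading.get_ident()   # thread that created the object
            self.local = 1                       # touched only by the owner thread
            self.shared = 0                      # touched by every other thread...
            self._lock = threading.Lock()        # ...under synchronization

        def incref(self):
            if threading.get_ident() == self.owner:
                self.local += 1                  # fast path: no synchronization
            else:
                with self._lock:
                    self.shared += 1             # slow path: atomic in real life

        def decref(self):
            if threading.get_ident() == self.owner:
                self.local -= 1
            else:
                with self._lock:
                    self.shared -= 1
            # the object dies when local + shared reach zero; merging the two
            # counters safely is the subtle part the real implementation handles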
Is this implemented with lockless programming? Is that the reason for the performance drop in single-threaded code?
Does it eliminate the need for a GC pause completely?
Lots of little locks littered all over the place.
Nice benchmarks. Hopefully some benevolent soul with more spare time than I have can pitch in on thread-safe CFFI.
I'm not smart, nor do I have any university title, but my opinion is that this is very good. Effort should also go into removing features, though, not just speeding Python up. I get it, it would break everything.