Following Up on the Python JIT

(lwn.net)

87 points | by Bogdanp 4 days ago

47 comments

  • klooney a day ago

    Is LWN the only professional media organization that covers this sort of thing? It seems like core, important news relevant to a large number of well-heeled professionals; it's a little wild that there's so little competition.

    • rs186 19 hours ago

      Most professional media organizations, including tech focused media, don't have a writer that remotely understands what's going on here.

      I know Ars Technica has a guy who can go deep into how a security attack works. That's the only one I am aware of from a well-known publication. Even so, if you look at the comment section, most people have no clue about technical details and are just talking about the story.

      Also bear in mind -- media in general is not a good industry to be in. Making money out of writing news articles is getting increasingly difficult. Most articles try to optimize for reach and clicks. Something like this one is not going to attract a lot of readers. I have no idea how LWN can be sustainable, but I can assure you that it's the exception, not the norm.

      Other than LWN, your best bet is the blog of someone who spends hours writing about it, trying to explain it understandably and avoid mistakes, expecting almost nothing in return, just as a hobby.

      Your next best bet is reading someone's 12 disjointed tweets, possibly riddled with errors.

      That's just the world we are living in.

      • wmanley 18 hours ago

        > I have no idea how LWN can be sustainable

        Speculation: it’s because Linux (the kernel) is a large centralised project, so there’s a critical mass of people willing to pay for Linux Weekly News.

        From there, the proven quality allowed the publication to cover other OSS projects. I suspect that without Linux, LWN would not be sustainable.

      • sophacles 19 hours ago

        > I have no idea how LWN can be sustainable, ...

        Speculation: LWN is sustainable because there are enough people out there who recognize the value of such a source of information and are willing to pay for a subscription.

        >... but I can assure you that it's the exception, not the norm.

        Anecdata: I credit a good portion of my success to knowledge and insight I've gained from LWN articles.

        If you're in a position to do so, I always recommend getting a paid membership, particularly if you've found their articles helpful to your tech journey.

        (I am not affiliated with LWN, just a happy subscriber.)

    • Sesse__ 19 hours ago

      Given that even LWN, with its consistently high-quality reporting, is struggling to make it, would there be room for a second one?

      • klooney 18 hours ago

        Probably not, which is sad.

  • satellite2 20 hours ago

    100% or even 200% seems nice, but at least when I compared it with JavaScript, Python would have needed to be 4200% faster. Let's not even mention Go.

    At least for now, it seems Python can still only be used to call into Pascal or CUDA bindings if one needs performance.

    • v3ss0n 2 hours ago

      PyPy is on average 4x faster, yet 95% of the Python community ignores it. It already has feature parity with 3.12, and most PyPI libraries work.

    • pjmlp 16 hours ago

      Back in the 2000s I was at a startup whose main language was Tcl, and we would write C extensions for any performance-critical command.

      The experience led me to avoid any language without a JIT or AOT compiler, unless it is forced upon me.

      Hence why I have used Python since version 1.6 only for OS scripting tasks.

      • mananaysiempre 14 hours ago

        I don’t know the timeline off-hand but I’m guessing this was before Tcl 8, at a time when everything in Tcl was a string not only logically, but also in the actual VM? There’s a whole chasm of implementation tradeoffs between that and a straight-up JIT.

        • pjmlp 8 hours ago

          Tcl 8 had just been released, and yes, we were using it; it hardly mattered that much.

          Our product was inspired by AOLServer and Vignette and provided similar capabilities. We had a Rails-like framework in Tcl, but we were not a famous SV startup; we mostly served the Portuguese market.

          The founders then went on to create OutSystems, built on top of .NET and Java, nowadays using other stacks, while offering the same kind of RAD tooling.

  • iberator a day ago

    "For native profilers and debuggers, such as perf and GDB, there is a need to unwind the stack through JIT frames, and interact with JIT frames, but 'the short answer is that it's really really complicated'."

    I'm homeless and broken, and I just spent like 2 weeks developing a low-level Python bytecode tracer, and it SEEMS that this is going to ruin everything.

    This is hilarious, as it's my first project in like 2 years.

    • achierius 21 hours ago

      It's not impossible: JavaScript engines have the same challenge and are able to handle it. You do need to dump a lot of extra info, but there's more or less a standard for this now -- look up JITDump.
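
      JITDump is a binary format with records for code loads, debug info, and unwind info, so it is too long to sketch here. Its simpler predecessor, the perf map file, shows the basic idea: the JIT tells perf which address ranges hold which generated functions, one `START SIZE symbolname` line per function (START and SIZE in hex) in `/tmp/perf-<pid>.map`. As far as I know, CPython 3.12's `-X perf` support emits this kind of file. A minimal sketch, where `perf_map_line` and `append_perf_map` are hypothetical helpers rather than any library's API; note that a map file only names JIT code and does not by itself let perf or GDB unwind through those frames.

```python
import os
import tempfile


def perf_map_line(start: int, size: int, name: str) -> str:
    """Format one perf map entry: hex START and SIZE (no 0x prefix), then the symbol name."""
    return f"{start:x} {size:x} {name}\n"


def append_perf_map(entries, directory: str = "/tmp") -> str:
    """Append (start, size, name) entries to <directory>/perf-<pid>.map and return its path.

    perf looks for /tmp/perf-<pid>.map when symbolizing samples from this process.
    """
    path = os.path.join(directory, f"perf-{os.getpid()}.map")
    with open(path, "a", encoding="utf-8") as f:
        for start, size, name in entries:
            f.write(perf_map_line(start, size, name))
    return path


# Usage: write into a scratch directory instead of /tmp so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as scratch:
    path = append_perf_map([(0x7F3A12340000, 0x80, "py::hypothetical_func:example.py")], scratch)
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")  # 7f3a12340000 80 py::hypothetical_func:example.py
```
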

  • miohtama 18 hours ago

    What's up with PyPy lately? Anyone using it in production?

    • ziihrs 17 hours ago

      Yep. It supports Python 3.11 now.

  • v3ss0n a day ago

    PyPy has existed for over a decade.

    • pansa2 a day ago

      PyPy deserves much more credit (and much wider use) than it gets. The underperformance of the Faster CPython project [0] shows how difficult it is to optimize a Python implementation, and highlights just how impressive PyPy really is.

      [0] The article says "Python has gotten nearly 50% faster in less than four years", but the original goal was a 5x speedup in the same timeframe [https://github.com/markshannon/faster-cpython/blob/master/pl...].

      • Qem 21 hours ago

        > The article says "Python has gotten nearly 50% faster in less than four years", but the original goal was a 5x speedup in the same timeframe

        IIRC they originally expected the JIT to be the single focus of CPython performance work. But then another front was opened in parallel to tackle the GIL[1]. Perhaps the overhead of two major "surgeries" on the CPython codebase at the same time contributed to slower progress than originally predicted.

        [1] https://peps.python.org/pep-0703/

      • pjmlp 16 hours ago

        The main culprit is not wanting to change the C ABI of the VM.

        Other equally dynamic languages have long shown the way.

        • incrudible 4 hours ago

          But what do people actually use Python for the most, at least as far as industry is concerned? Interfacing with those C extensions.

          PyPy does have an alternative ABI that integrates with the JIT and also works on CPython, so if people cared that much about those remaining bits of performance, they could support it.

    • nromiun 21 hours ago

      I really wish the PSF would adopt PyPy as a separate project. It is so underrated. People still think it supports only a subset of Python code and that it is slow with C FFI code.

      But the latest PyPy supports all of Python 3.12, and it is just as fast with C FFI code as with JIT-compiled Python code. It is literally magic, and if it were more popular, Python would not have a reputation for being slow.

      • ziml77 18 hours ago

        PyPy is amazing and it's actually a bit baffling that it's not the default that everyone is using in production. I've had Python jobs go from taking hours to run, down to minutes simply by switching to PyPy.

      • quibono 20 hours ago

        Do you happen to know whether Flask is supported?

        • Twirrim 19 hours ago

          Yes. I've had a small webapp running under it quite happily (complete overkill, but it's a personal project and I was curious).

          It's a very basic hello world app hosted under gunicorn (it just returns the string "hello world", so hopefully this measures the framework time). I set Siege to make 10k requests at a concurrency of 25 and ran it twice so that each interpreter had a chance to "warm up". The second (warmed-up) round gives me:

              pypy   : 8127.44 trans/sec
              cpython: 4512.64 trans/sec
          
          So it seems there are definitely things PyPy's JIT can do to speed up the Flask underpinnings.

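          Running the load twice and keeping only the second round is the usual way to give a JIT time to compile hot paths before measuring. A minimal in-process sketch of the same methodology, assuming a hypothetical `hello_view` handler that is called directly (no gunicorn, no HTTP, so it isolates interpreter speed rather than measuring the full stack the way siege does); run the same file under `python3` and `pypy3` to compare. The last line just works out the ratio of the two siege figures above.

```python
import time


def throughput(fn, n_calls: int, rounds: int = 2) -> float:
    """Return calls/sec for the final round; earlier rounds serve as JIT warm-up."""
    rate = 0.0
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(n_calls):
            fn()
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
        rate = n_calls / elapsed
    return rate


def speedup(new_rate: float, old_rate: float) -> float:
    """How many times faster new_rate is than old_rate."""
    return new_rate / old_rate


def hello_view() -> str:
    # Hypothetical stand-in for the Flask view body: no routing, no WSGI, no sockets.
    return "hello world"


print(f"hello_view: {throughput(hello_view, 100_000):,.0f} calls/sec")
print(f"siege figures above: {speedup(8127.44, 4512.64):.2f}x")  # 1.80x
```
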
        • tgbugs 20 hours ago

          Yes, I have been using Flask on PyPy3 for years. I get about a 4x speedup.

        • nromiun 20 hours ago

          I just tested it and it works perfectly.

    • pjmlp 16 hours ago

      Unfortunately it keeps being the black swan in the Python community.

      Python is probably the only programming language community that has been so against JITs, and where folks routinely call bindings to C libraries "Python".

      • IshKebab 15 hours ago

        It's not a black swan. The issue is that using Pypy means accepting some potential compatibility hassle, and in return you get a reasonable speedup in your Python code, from glacial to tolerable. But nobody who has accepted glacial speed really needs tolerable speed.

        It's like... imagine you ride a bike to most places. But now you want to visit Australia. "No problem, here take this racing bike! It's only a little less comfortable!".

        So really it's only of interest to people who have foolishly built their entire business on Python and don't have a choice. The only one I know of is Dropbox. I bet they use Pypy.

    • didip 18 hours ago

      I don't get why PyPy and CPython don't simply merge. It would be difficult organization-wise, but not impossible.

      • pjmlp 16 hours ago

        When people think of C library wrappers as Python, it is kind of a hard sell.

    • orbisvicis a day ago

      If memory serves, PyPy supports a subset of Python and focused their optimizations on software transactional memory.

      • iberator a day ago

        Back in 2022 it worked fine with literally all modules except some SSH, SSL, and C-based modules.

        With a little bit of tinkering (multiprocessing, choosing the right libraries written strictly in Python, PyPy plus a lot of memory), I was able to optimize some workflows from 24h down to just 17 minutes :) Good times...

        It felt like magic.

        • achierius 20 hours ago

          The "C based modules" bit is the kicker. A significant chunk of Python users essentially use it as a friendly wrapper for more-powerful C/C++ libraries underneath the hood.

          • Twirrim 19 hours ago

            They've long since fixed the interaction with C-based modules; unfortunately, a lot of common knowledge dates from when it couldn't interact with everything.

            If you've written it off on that basis, I'd suggest it's worth giving it another shot at some stage. It might surprise you.

            Last I saw there was still a little bit more overhead around the C interface, so hot loops that just call out to a C module in the loop can be just a smidgen slower, but I haven't seen it be appreciably slower in a fair while.

            • laurencerowe 17 hours ago

              The FAQ states it is often much slower:

              > We have support for c-extension modules (modules written using the C-API), so they run without modifications. This has been a part of PyPy since the 1.4 release, and support is almost complete. CPython extension modules in PyPy are often much slower than in CPython due to the need to emulate refcounting. It is often faster to take out your c-extension and replace it with a pure python or CFFI version that the JIT can optimize.

              https://doc.pypy.org/en/latest/faq.html#do-c-extension-modul...

              I have seen great success with cffi though.

            • orbisvicis 17 hours ago

              I see, and it's a pretty short list:

              https://doc.pypy.org/en/latest/cpython_differences.html#exte...

              """ The extension modules (i.e. modules written in C, in the standard CPython) that are neither mentioned above nor in lib_pypy/ are not available in PyPy. """

              Without refcounting, generator lifecycles make PyPy code very verbose. I've already been bitten by generator lifecycles and shared resources. PEP 533, which would fix this, was deferred. Probably for the best, as it seems a bit heavy-handed.
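
              Roughly, the problem: CPython's refcounting runs an abandoned generator's `finally` block as soon as the generator becomes unreachable, while PyPy's garbage collector finalizes it at some later, unpredictable point. A sketch of the usual explicit fix, `contextlib.closing` (which calls the generator's `close()` on exit), assuming a hypothetical `read_records` generator that holds some shared resource:

```python
from contextlib import closing

log = []


def read_records(resource_name):
    """Hypothetical generator that holds a resource until its finally block runs."""
    log.append(f"open {resource_name}")
    try:
        for i in range(3):
            yield i
    finally:
        # On CPython an abandoned generator reaches this promptly via refcounting;
        # on PyPy it would wait for the GC unless close() is called explicitly.
        log.append(f"close {resource_name}")


# closing() calls records.close() when the block exits, even after an early break.
# close() raises GeneratorExit at the paused yield, so the finally block runs now.
with closing(read_records("db")) as records:
    for record in records:
        if record == 1:
            break

print(log)  # ['open db', 'close db']
```

              This is the verbosity being described: every early-exit loop over a resource-holding generator needs the extra `with`, whereas CPython code often gets away without it.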

        • hnuser123456 21 hours ago

          Yep, I had a script that was doing some dict mapping and re-indexing, wrote the high level code to be as optimal as possible, and switching from cpython to pypy brought the run time from 5 minutes to 15 seconds.

        • anthk 15 hours ago

          If PyPy worked with Retux, the game would get a big boost, although the main issue is that it tried to redraw many objects at once per frame.

      • v3ss0n 2 hours ago

        Not a subset. It covers 100% of pure Python. CPyExt works fine; some parts just need optimization. The only things PyPy does not officially support are the private CPyExt calls that some libraries use as hacks (PyO3, the Rust-Python bindings, uses those).

    • almostgotcaught 20 hours ago

      Why do people feel the need to comment this on every single JIT post? Imagine commenting on every post about Pepsi, "Coca-Cola has existed since 1886".

      • v3ss0n an hour ago

        Because it is one of the most ambitious projects in the open-source world, and very little is known about it. It has been neglected by the Python contributor community for unknown reasons (something political, it seems). It was developed as a PhD research project by really good researchers. PyPy implemented Python in pure Python and surpassed the performance of the C implementation by 4-20x. They delivered Python with a JIT, and also static RPython: a subset of Python that compiles directly to a binary. I have also personally worked with some of the lead PyPy developers on commercial projects, and they are the best developers to work with.

      • a-french-anon 4 hours ago

        Because Pypy wasn't even _mentioned_ in the JIT PEP (https://peps.python.org/pep-0744/), like it's the black sheep the family isn't supposed to talk about.

      • pjmlp 16 hours ago

        Because as proven multiple times, the problem isn't Python, rather CPython, and many folks keep mixing languages with implementations.

  • bratao 19 hours ago

    I feel sad and disappointed in Microsoft for letting the entire Faster CPython team go. I was a big supporter, always leaving positive comments and sharing news about their work. I'd have figured the team paid for itself in goodwill alone. What a letdown, Microsoft. You ought to do better.

  • lynx97 20 hours ago

    I wonder what is going on with the strange ""double quoting"".

    • setupminimal 19 hours ago

      Hi — LWN editor here. We use <q> tags with some CSS to set off quotes from the main text. This _mostly_ works seamlessly, but a few browsers render "<q>something</q>" as ""something"". It's especially common if you copy/paste from the site.

      I've considered dropping the outer quotes and using CSS before/after text to add them back in to the rendered page, but we have a huge back-catalog of articles doing it this way, and it's usually not much of an issue.
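
      A toy model of the doubling, assuming (as the editor's description suggests) that the article markup carries literal quote marks around each `<q>` element and the site CSS suppresses the quotes a browser would otherwise generate for `<q>`. `TextRenderer` is a hypothetical renderer built on the standard library's `html.parser`: with `quote_q=False` it behaves like a browser honoring that CSS, and with `quote_q=True` like one that ignores it and adds its own quotes.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Toy text-mode renderer that optionally wraps <q> contents in quote marks."""

    def __init__(self, quote_q: bool = True):
        super().__init__()
        self.quote_q = quote_q  # False models site CSS that turns off generated quotes
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "q" and self.quote_q:
            self.out.append('"')

    def handle_endtag(self, tag):
        if tag == "q" and self.quote_q:
            self.out.append('"')

    def handle_data(self, data):
        self.out.append(data)


def render(html: str, quote_q: bool = True) -> str:
    renderer = TextRenderer(quote_q)
    renderer.feed(html)
    renderer.close()  # flush any buffered trailing text
    return "".join(renderer.out)


markup = 'He said "<q>something</q>".'
print(render(markup, quote_q=False))  # He said "something".
print(render(markup, quote_q=True))   # He said ""something"".
```
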

      • lynx97 an hour ago

        That's interesting. I was using Lynx. A quick test with <q>hello</q> gives "hello" as expected, so whatever it is, you must be doing something else. Please don't use CSS to insert characters into normal text; that will fall down if CSS is not supported. I know that sounds unusual in this day and age, but abusing CSS to do something it wasn't meant to do is still not a good idea, IMO.