Async Django in Production

(jonathanadly.com)

50 points | by jonathan-adly 8 months ago ago

47 comments

singularity2001 8 months ago

When will it be common knowledge that introducing async function coloring (on the callee side, instead of just 'go' on the call side) was the biggest mistake of the decade?

[-]

10000truths 8 months ago

Function coloring is only a symptom of a larger problem. The problem is that libraries implementing databases and network protocols are designed in such a way that they tightly couple themselves to a particular type of I/O dispatch. Ideally, libraries wouldn't be dispatching any I/O at all! Instead, they'd have a sans-I/O [0] implementation, where the caller creates a state machine object, feeds it a byte buffer, gets a byte buffer, and then the caller has to handle reading/writing those buffers from/to a socket or disk. A well-designed sans-IO library wouldn't care whether you're using it with sync I/O, or async I/O, or with gevent, or with a homegrown event loop built on epoll or io_uring or kqueue or IOCP.

[0] https://sans-io.readthedocs.io/

[-]

surajrmal 8 months ago

It's not always that simple. For a complicated self contained state machine this might work, but what about a situation where you are coordinating interactions with multiple distinct actors? You need a primitive to wait. At that point you need to pick a threading model.

[-]

10000truths 8 months ago

State machines compose - the state of each actor is a part of your overall application state. So you'd create a giant state machine that contains a smaller state machine for each actor. Waits and timeouts can be handled by encapsulating time into your state. Then, you can feed your state machine with "X amount of time has elapsed" messages to inform it of the passage of time, and your state machine can emit something like a "sessions 123, 456 timed out" message as a response, and you can act on that by closing the sockets associated with those sessions.

the__alchemist 8 months ago

I feel like I'm a loner; the odd man out regarding this in rust. I can no longer relate to my (online/OSS) peers, nor use the same libraries they do. I almost wish sync and async rust were separate languages. The web and embedded communities are almost fully committed. I stopped participating in the rust OSS embedded community for this reason.

I haven't tried Async Django, (But use normal Django on a few work and hobby project) and am hesitant based on my experience in rust, and not finding a fault I think coloring my Python/Django code would be advantageous to do.

gnlampx 8 months ago

This will happen when someone writes a new green thread framework and convinces van Rossum to preside over the corporate-sponsored integration into Python.

Four releases later async will be marketed as "this has been a mistake" and the green threads will be the best thing ever. Thousands of blog posts will be written and everyone has to rewrite their code bases. Repeat for goroutines etc.

[-]

simonw 8 months ago

As Python grows GIL-free threading and machines continue to grow more cores do you think we'll be able to skip green threads and use regular threads instead?

[-]

8 months ago

[deleted]

Eikon 8 months ago

I feel I may be an outlier here, but I love async functions (I mostly use Rust). I intuitively understand what will block and what won't, and can generally gauge the performance impact. I believe the few downsides are far outweighed by the massive benefits.

Async functions allow me to build massively parallel and concurrent systems with ease - it's beautiful.

In comparison, I'm not as fond of Go's approach to concurrency, which feels less elegant to me.

[-]

UltraSane 8 months ago

I found learning async Python to be very painful for months until I gained an intuitive mental model of the async code flow and then it all just started to "click" for me. I just think of async as parallelized waiting.

dangsux 8 months ago

[dead]

surajrmal 8 months ago

In languages where there is already considerable overhead I agree with you, but go style coroutines are not really a competitive option for some use cases. The alternative is callbacks or something else which is a lightweight abstraction on top of an event loop.

petters 8 months ago

Colored functions can be nice. It makes the type system give a hint that a function could take a long time. You can then e.g. avoid holding locks when awaiting.

But there is a big downside, that's true.

jfasydfw 7 months ago

gevent was created over a decade ago to solve this.

python users only have themselves to blame for ignoring it.

[-]

westurner 7 months ago

From "Asyncio, twisted, tornado, gevent walk into a bar" (2023) https://news.ycombinator.com/item?id=37227567 :

> IIRC the history of the async things in TLA in order: Twisted (callbacks), Eventlet (for Second Life by Linden Labs), tornado, gevent; gunicorn, Python 3.5+ asyncio, [uvicorn,]

Async/await > History: https://en.wikipedia.org/wiki/Async/await

jonathan-adly 8 months ago

We have traditionally used Django in all our projects. We believe it is one of the most underrated, beautifully designed, rock solid framework out there.

However, if we are to be honest, the history of async usage in Django wasn't very impressive. You could argue that for most products, you don’t really need async. It was just an extra layer of complexity without any significant practical benefit.

Over the last couple of years, AI use-cases have changed that perception. Many AI products have calling external APIs over the network as their bottleneck. This makes the complexity from async Python worth considering. FastAPI with its intuitive async usage and simplicity have risen to be the default API/web layer for AI projects.

I wrote about using async Django in a relatively complex AI open source project here: https://jonathanadly.com/is-async-django-ready-for-prime-tim...

tldr: Async django is ready! there is a couple of gotcha's here and there, but there should be no performance loss when using async Django instead of FastAPI for the same tasks. Django's built-in features greatly simplify and enhance the developer experience.

So - go ahead and use async Django in your next project. It should be a lot smoother that it was a year or even six months ago.

[-]

fulafel 8 months ago

(Disclaimer: I haven't used Django in a long time)

Can you expand more on why these AI cases make the complexity tradeoff different?

I'd imagine think waiting on a 3rd party LLM API call would be computationally very inexpensive compared to what's going on at the business end of that API call. Further lowering the cost, Django is usually configured to use multiple threads and/or processes so that this blocking call won't keep a CPU idle, no?

[-]

roughly 8 months ago

> Can you expand more on why these AI cases make the complexity tradeoff different?

They’re very slow. Like, several seconds to get a response slow. If you’re serving a very large number of very fast requests, you can argue that the simplicity of the sync model makes it worth it to just scale up the number of processes required to serve that many requests, but the LLM calls are slow enough that it means you need to dramatically scale up the number of serving processes available if you’re going to keep the sync model, and that’s mostly to have CPUs sitting around idle waiting for the LLM to come back. The async model can also let you parallelize calls to the LLMs if you’re making multiple independent calls within the same request - this can cut multiple seconds off your response time.

UltraSane 8 months ago

async is just parallelized waiting and LLMs make CPUs wait ages for responses to async calls to LLMs allow for much higher CPU utilization.

mulmboy 8 months ago

Yeah the article doesn't even mention "what about more threads". Responses to your comment suggest there's broadly a lack of clarity around what Async gets you especially vs threads.

If the alternative to Async is more processes, rather than threads, a clear benefit is reduced memory usage and reduced process startup time.

[-]

zbentley 7 months ago

> a clear benefit is reduced memory usage and reduced process startup time

Not necessarily true. Many process-parallel Python environments support using fork(2) for parallelism (multiprocessing, gunicorn, celery).

For similar processes (e.g. parallel waiting on RPCs) that removes the memory overhead. It also largely mitigates startup time costs (especially if forks are reused for multiple requests, which they are in most forking contexts).

While there is debate and grumbling in the Python community about fork(2)’s rough edges re: signals/threads/MacOS, these issues are usually handled inside parallelism-management library code and rarely concern application level developers.

flowingfocus 8 months ago

The explanation is in the article. Tldr is for sync functions, the CPU is blocked, with async functions, once the await statement is reached, other stuff can be handled in between

[-]

fulafel 8 months ago

Indeed it says "It enhances performance in areas where tasks are waiting for IO to complete by allowing the CPU to handle other tasks in the meantime".

To restate my comment: I argued (1) this CPU cost would be very marginal compared to the LLM API compute cost, and (2) the CPU blocking claim doesn't really hold, due to the wonders of threads and processes.

[-]

simonw 8 months ago

There are two ways to call out to an externally hosted LLM via an HTTP API:

1. A blocking call, which can take 3-10 seconds.

2. A streaming call, which can also take 3-10 seconds but where the content is streaming directly to you as it is generated (and you may be proxying it through to your user).

In both cases you risk blocking a thread or process for several seconds. That's the problem asyncio solves for you - it means you could have hundreds (or even thousands) of users all waiting for the response to that LLM call without needing hundreds or thousands of blocked threads/processes.

[-]

fulafel 7 months ago

Reading this my first reaction was that my questions still holds. Unless the slowness of the LLMs is of such a magnitude that having a thread or process waiting on the API call would substantially increase its cost, which I guess would mean the LLM server endpoint would be doing very heavy queuing and/or multitasking instead utilizing a powerful compute element for 90% of the call duration.

Or maybe the disconnect is where I'm taking for granted that "cost of parked thread" is the same as worrying about the nr of parked threads? Maybe everyone uses Django setups where it's nontrivial to add memory, increase serverless platform limits, etc if you get the happy problem of 10k concurrent users? Or maybe people don't know that the nr of threads/processes you can have on Linux is much more than hundreds or thousands. Or maybe there's some Python or Django specific limits to this.

[-]

simonw 7 months ago

Maybe I need to update my mental model of how many threads is too many on Linux (and benchmark the impact on the GIL here, which should at least be released for I/O network waits).

[-]

fulafel 7 months ago

On my Linux desktop the default cap seems[1] to be ~32k.

See eg https://www.baeldung.com/linux/max-threads-per-process for some runtime sysctl knobs if you want to go higher than the default limits. (Though the section 6 there seems out of date and written with the 32-bit OS in mind)

[1] as reported by the Python snippet from https://stackoverflow.com/a/64406494

olgeni 8 months ago

I was using Daphne and then found out that it read the whole files in a POST before passing it along - needless to say I am no longer using Daphne \o/

beestripes 7 months ago

Nearly all of the work done on Django ninja is by someone who lists their location on GitHub as Kharkiv. I hope that’s not current.

pacifika 8 months ago

Learned quite a bit thanks!

andrewstuart 8 months ago

Async Django is a bit of a puzzle …. who is it for?

People who like synchronous Python can use Django.

People who like asynch Python can use Starlette - the async web server also written by the guy who wrote Django.

It’s not clear why Django needs to be async, especially when I get the sense there’s developers who like async and developers who prefer synch. It’s a natural fit for Django to fulfill the sync demand and Starlette to fulfill the async. They’re both good.

[-]

senko 8 months ago

Starlette and Django are wildly different.

Django has a batteries-includes approach, benefits from tighter integration of orm, auth, form handling, etc, and has a huge 3rd party ecosystem.

First-class async support in Django allows Django users to avoid jumping through hoops (celery, channels, ...) for longer-running requests, something especially noticable if you're calling any kind of AI service.

[-]

coffeefirst 8 months ago

As a long time Django user (haven’t tried async in prod yet) the appeal is to have the full Django toolkit and be able to set something up async _if_ needed.

Which is a very Django way to think: lots of tools ready to go, use only what you need.

globular-toast 8 months ago

How much of that is relevant if you're going to be using django-ninja (pydantic) and the whole app has to be async, though?

Django is fine for writing a thin CRUD layer around a database. It makes the easy stuff easy. But doesn't seem to help much for the hard stuff and often actively hinders it.

Really the main reason for Django is its ORM and migrations. It's basically the other Python ORM (next to SQLAlchemy) but, unlike SQLAlchemy, it's not designed to be used standalone. In my experience I find Django (and active record ORMs in general) easier for people to get started with, but massively limiting in long run for complex domains.

[-]

senko 8 months ago

> if you're going to be using django-ninja (pydantic)

This assumes that people don't do multi-page apps or sites any more, which ... isn't true. And I believe django-ninja replaces forms/serialization/deserialization and routing, while nicely integrating with everything else.

> Django is fine for writing a thin CRUD layer around a database.

In my dozen or so years with Django, I confess I did more than a few thin CRUD layers around a database. But also worked on billing systems for telecoms, insurance provider API services, live/on demand audio/video streaming services, a bunch of business process apps, AI codegen tools, and other web apps and API backends that were way more than thin CRUD layers around databases.

Django was rarely a hindrance. In fact, Python being sync-only (or Django not supporting async) was usually more of a hindrance that anything Django specific.

> In my experience I find Django (and active record ORMs in general) easier for people to get started with, but massively limiting in long run for complex domains.

In my exprience the only situations where Django's ORM doesn't help much is when you have a lot of business logic in your database (views, stored procedures), or the database is organized in a way that's not Django's preffered way. Still works, mind you, just not as great a experience. However, the vast majority of projects I've encountered have none of those.

Otherwise, I've found its ORM quite powerful, and easy to drop down to raw() in cases where you really need it (which was maybe 1% on the projects I've worked).

[-]

bluewalt 8 months ago

> Django was rarely a hindrance.

+1 on this. Django scales pretty well when adopting a clean architecture like Django Model Behaviours with mixins.

> Otherwise, I've found its ORM quite powerful.

Same. In ten years, the only issue I had is with a very complex query that the ORM was not able to write properly. But a workaround existed.

I'm currently using FastAPI in a project. It's very enjoyable (especially with a strictly typed codebase) but I have to write lots of batteries by myself, which is not very productive (unless you keep this boilerplate code for future projects).

ensignavenger 8 months ago

In what ways has Django been a hindrance in your experience? In my experience, I have never felt that way. When i have used a micro framework, sooner or later I always regret it and wish I had Django. When I use Django, and I needed to do somthing different, I have never had a problem with it getting in the way, I have used alternate ORMs, Raw SQL, I by default use Jinja templates, and many other "non-standard" things with Django. Async has been about the only area where I would even consider anything different in the past.

[-]

globular-toast 8 months ago

The main problem with Django is where to put business logic. It's actually kinda funny to write it like that, because that's the most important thing if you want to do something that isn't just CRUD.

Quick, where would you put a high-level business function that touches multiple models? Where would you put the model-level validation for that? I've seen all of the following: in a view, in a form, in a service-layer function, arbitrarily in one of the models/managers, implicitly in the database schema (just catch the IntegrityErrror!).

There are endless discussions online about where to do this. This isn't a good sign. It should be easy. The business logic should be at the heart of an application. But instead we see Django's ORM at the heart.

Django models are really limiting and totally tied to the database. You can't even have a model that has a list/set of items without getting into ForeignKeys etc. You can't test any of it without having the database present. Why would I need the database to be there to test business logic?! The point of an ORM is to do object persistence, not to do business logic.

So I've seen people do a separate set of ORM models and domain models and then manually mapping between them. This is possible, but it's a whole lot easier with something like SQLAlchemy which is actually designed for that (it's a data mapper rather than active record type ORM).

Then there's stuff like django-admin and ModelForms etc. which are the last thing you want if you're doing more than CRUD.

Tbh it would be utterly remarkable if Django somehow made this stuff easy. But it doesn't and I find myself wishing I just put in the effort to set up the boring stuff like auth etc. but with a proper architecture.

[-]

ensignavenger 7 months ago

So you want a web framework that dictates how you structure your application code? I think one of Django's strengths is that if a component doesn't work well for the application you are building, you are always free to choose something else. If the default Django app structure doesn't work well for you, you can pick any structure that does. Though I would argue that the default structure is exactly what the vast majority of web applications need.

[-]

globular-toast 7 months ago

No, I want a web framework that is just a web framework. The web is just an entrypoint to an application, like a CLI or GUI. If you use something like Flask or Pyramid this is exactly what you get. They are roughly analogous to something like Qt. Something that sits at the edge of an application converting low-level user input into high-level commands and events etc.

You could use Django like this, but you'd end up manually mapping between Django models and high-level domain models. But this mapping is exactly what an ORM like SQLAlchemy does! It seems crazy to do it yourself. So nobody does. They build fat models with all the business logic right next to application logic like foreign keys etc.

Even if you did try to do something different developers would complain that it's no longer a "Django project". Have you ever seen any Django project do business logic completely independently of application logic?

[-]

ensignavenger 7 months ago

So then your answer is simple- just like in Flask, put the function wherever it makes sense for your application and desired architecture.

If you really prefer SQLAlchemy, there is nothing stopping you from using it over the Django ORM, I happen to prefer the Django ORM, but Django doesn't get jealous if you go out with a different ORM (it is a polyamorious web framework[1]).

Yes, I have seen Django applications do that. There are plenty of individuals out there advocating such a separation. I wouldn't worry about these "different developers' and their opinions.

[1] I was going to use a smiley face here, but then I recalled that a Python dev was de-flocked recently for using smiley emoji as it is offensive to some individuals, so insert whatever words or symbols you don't find offensive that indicate humor instead.

[-]

globular-toast 7 months ago

Again, I'm not saying you can't do this with Django, I'm saying it makes it much harder than it should be.

Using Django without its ORM really doesn't seem attractive. I'm really not sure what you'd be getting from Django at that point. Crap templates, views that aren't views. The ORM is the only good bit of Django. Well, that and the package ecosystem.

The more I think about it the more I realise it's that last point that is the real attraction.

Also please don't worry about offending me. I come from the old internet ;)

[-]

ensignavenger 7 months ago

Now I think I get it, you just don't like Django. And that is fine, there are plenty of good alternatives. I happen to like it and find that I miss many of its small convenience functions when I don't have it. Its not perfect, but it gets the job done or gets out of the way when it doesn't- at least for me.

the__alchemist 7 months ago

> Quick, where would you put a high-level business function that touches multiple models? Where would you put the model-level validation for that?

It's just python: `descriptive_name.py`. `ingest_validation.py` etc. If this seems too terse, it's because your question was posed in a general way. If you have more info on the sort of business logic you have in mind, or the sort of data validation, I'll reply with more.

Could you describe what you have in mind regarding models not tied to the database? If it's the naive interpretation I'm thinking of, use python dataclasses and enums. Django models are specifically to represent database schema.

Reading between the lines, perhaps you are looking for something not covered by a web framework? Django's features are for responding to HTTP requests, managing a relational database, auth, email, admin, templates etc. If you're trying to do something not part of this, use other parts of the Python language. Django is a library; Python is the more fundamental tool used to build applications.

bluewalt 8 months ago

Not a hindrance, but something like not having a typed database, no auto completion over time, can be a real drawback (I know about django-types).

Finally, in my opinion, the best reason to not use Django is not the project itself (because it will do the job in 99% case), it's because all you learn is tied to Django.

Having learn Pydantic recently was a breed of fresh air, and I would reuse it in lots of projects, not only web projects.

[-]

ensignavenger 7 months ago

I'm not sure what you mean? Django ORM works with common SQL databases, which are generally "typed"? Auto completion is generally a function of your IDE, and there are certainly IDE's with autocompletion that work well with Python and Django? I agree that Pydantic is a great library... I use it with Django quite often, as Django is a Python Library and it plays very well with other Python libraries like Pydantic.

simonw 8 months ago

Minor correction: Starlette was started by Tom Christie who is also the creator of Django REST Framework, but he didn't start the Django project itself.

Django having async support means you can use the Django ORM, and the Django request/response cycle, and generally not need to write your async and your sync web code using slightly different APIs.