A first look at Django's new background tasks

(roam.be)

101 points | by roam 11 hours ago

22 comments

  • JakaJancar 7 hours ago

    Assuming you're fine with keeping the queue in postgres, I've used Procrastinate and it's great:

    https://procrastinate.readthedocs.io/en/stable/index.html

    Core is not Django-specific, but it has an optional integration. Sync and async, retries/cancellation/etc., very extensible, and IMO super clean architecture and well tested.

    IIRC the codebase is like one-tenth the size of Celery's.

    • TkTech 6 hours ago

      If you like Procrastinate, you might like my Chancy, which is also built on postgres but with a goal of including the most common bells and whistles.

      Rate limiting, global uniqueness, timeouts, memory limits, mix asyncio/processes/threads/sub-interpreters in the same worker, workflows, cron jobs, dashboard, metrics, django integrations, reprioritization, triggers, pruning, Windows support, queue tagging (ex: run this queue on all machines running windows with a GPU, run this one on workers with py3.14 and this one on workers with py3.11) etc etc...

      https://tkte.ch/chancy/ & https://github.com/tktech/chancy

      The pending v0.26 includes stabilization of the HTTP API, dashboard improvements, performance improvements for workflows with thousands of steps, and django-tasks integration.

  • adamcharnock 2 hours ago

    Something about queuing systems that often gets me is that they can start to seem like the wrong abstraction as soon as one has tasks that enqueue additional tasks. Particularly when features start growing, and doubly so when modelling business processes.

    This is because the code enqueuing the task needs to be aware of what happens next, which breaks separation of concerns. Why should the user sign-up code have to know that a report generation job now needs queuing?

    Really what starts to make more sense to me is to fire off events. Code can say, "this thing just happened", and let other code decide if it wants to listen. Which then makes it an event stream rather than a queue, with consumer groups et al.

    I made the (now unmaintained) project https://lightbus.org around this, and it did work really well for our use case. Hopefully someone has now created something better.

    So I'd say this: before grabbing for a task queue, take a moment to think about what you're actually modelling. But be careful of the event streaming rabbit-hole!

    • globular-toast an hour ago

      They're not mutually exclusive. Nothing about "event driven" means async. I have an event driven modular monolith and all events are handled synchronously. It's up to the receiver to queue a task if it needs to, so context boundaries are not crossed.
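
A minimal sketch of the pattern described in this sub-thread, assuming a tiny in-process event registry. The names `on`, `emit`, `task_queue`, and the `user_signed_up` event are hypothetical and not from any library mentioned here. The sign-up code only announces what happened; the listener decides on its own to queue follow-up work.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical in-process registry: event name -> list of handlers.
_handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

# Stand-in for a real task queue; a list keeps the sketch self-contained.
task_queue: list[tuple[str, dict]] = []


def on(event_name: str):
    """Register a handler for an event (decorator)."""
    def register(handler: Callable[[dict], None]):
        _handlers[event_name].append(handler)
        return handler
    return register


def emit(event_name: str, payload: dict) -> None:
    """Call every registered handler synchronously, in registration order."""
    for handler in _handlers[event_name]:
        handler(payload)


# The reporting module listens and chooses to queue a task.
@on("user_signed_up")
def schedule_report(payload: dict) -> None:
    task_queue.append(("generate_report", {"user_id": payload["user_id"]}))


# The sign-up code knows nothing about reports.
def sign_up(user_id: int) -> None:
    emit("user_signed_up", {"user_id": user_id})


sign_up(7)
```

Event handling stays synchronous here; only the receiver touches the queue, so the sign-up code never learns what happens next.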

  • elliot07 2 hours ago

    Celery is such garbage to run/maintain at any sort of scale. Very excited for this. Rq/temporal also seem to solve this well.

    Anyone here done the migration off of celery to another thing? Any wisdom?

    • pmontra 2 hours ago

      A customer of mine has two projects. One running on their own hardware, Django + Celery. The other one running on AWS EC2, Django alone.

      In the first one we use Celery to run some jobs that may last from a few seconds to some minutes. In the other one we create a new VM and make it run the job and we make it self destroy on job termination. The communication is over a shared database and SQS queues.

      We have periodic problems with celery: workers losing connection with RabbitMQ, Celery itself getting stuck, gevent issues maybe caused by C libraries but we can't be sure (we use prefork for some workers but not for everything)

      We had no problems with EC2 VMs. By the way, we use VirtualBox to simulate EC2 locally: a Python class encapsulates the API to start the VMs and does it with boto3 in production and with VBoxManage in development.

      What I don't understand is: it's always Linux, amd64, RabbitMQ but my other customer using Rails and Sidekiq has no problems and they run many more jobs. There is something in the concurrency stack inside Celery that is too fragile.

    • odie5533 2 hours ago

      Migrated Celery to Argo Workflows. No wisdom as it was straightforward. You lose a lot of startup speed though, so it's not a drop-in replacement and is only a good choice for long-running workflows. Celery was easier than Argo Workflows. Celery is really easy to get started with. I like Airflow the best, but it's closer to Argo Workflows in terms of more long-lived workflows. I hope to try Hatchet soon. I've read Temporal is even harder to manage.

    • kbumsik 2 hours ago

      We switched from Celery to Temporal. Temporal is such a great piece of distributed systems software.

    • nojs 37 minutes ago

      What were the problems you had with Celery?

  • frankhsu 5 hours ago

    Really cool to see a batteries‑included option in Django for background jobs.

    For folks who’ve used Celery/Procrastinate/Chancy: how does retry/ACK behavior feel in real projects? Any rough edges?

    What about observability — dashboards, tracing, metrics — good enough out of the box, or did you bolt on extra stuff?

    Also, any gotchas with type hints or decorator-style tasks when refactoring? I’ve seen those bite before.

    And lastly, does swapping backends for tests actually feel seamless, or is that more of a “works in the demo” thing?

    • TkTech 5 hours ago

      (I'm biased, I'm the author of Chancy)

      One of the major complaints with Celery is observability. Database-backed options like Procrastinate and Chancy will never reach the potential peak throughput of Celery+RabbitMQ, but they're still sufficient to run millions upon millions of tasks per day even on a $14/month VPS. The tradeoff to this is excellent insight into what's going on - all state lives in the database, you can just query it. Both Procrastinate and Chancy come with Django integrations, so you can even query it with the ORM.

      For Chancy in particular, retries are a (very trivial) plugin (that's enabled by default) - https://github.com/TkTech/chancy/blob/main/chancy/plugins/re.... You can swap it out and add whatever complex retry strategies you'd like.

      Chancy also comes with a "good enough" metrics plugin and a dashboard. Not suitable for an incredibly busy instance with tens of thousands of distinct types of jobs, but good enough for most projects. You can see the new UI and some example screenshots in the upcoming 0.26 release - https://github.com/TkTech/chancy/pull/58 (and that dashboard is for a production app running ~600k jobs a day on what's practically a toaster). The dashboard can be run standalone locally and pointed at any database as needed, run inside a worker process, or embedded inside any existing asgi app.
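
To illustrate the "all state lives in the database, you can just query it" point, here is a rough sketch using SQLite from the standard library in place of postgres. The `jobs` table, its columns, and the state names are hypothetical and are not the actual schema of Chancy or Procrastinate.

```python
import sqlite3

# In-memory SQLite stands in for postgres so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        id       INTEGER PRIMARY KEY,
        func     TEXT    NOT NULL,
        state    TEXT    NOT NULL DEFAULT 'pending',
        attempts INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO jobs (func, state, attempts) VALUES (?, ?, ?)",
    [
        ("send_email", "succeeded", 1),
        ("send_email", "failed", 3),
        ("build_report", "pending", 0),
        ("build_report", "running", 1),
        ("resize_image", "failed", 4),
    ],
)

# "What is the queue doing right now?" is an ordinary GROUP BY.
state_counts = dict(
    conn.execute("SELECT state, COUNT(*) FROM jobs GROUP BY state")
)

# "Which jobs keep retrying?" is an ordinary WHERE clause.
flaky = [
    row[0]
    for row in conn.execute(
        "SELECT func FROM jobs WHERE attempts >= 3 ORDER BY id"
    )
]
```

With a Django integration, the same questions could presumably be asked through the ORM instead of raw SQL.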

  • vforgione 8 hours ago

    I’ve been using the django-tasks library in production for about a year. The database backend and simple interface have been great. It definitely isn’t intended to replace all of celery, but for a simple task queue that doesn’t require additional infrastructure it works quite well.

  • matsemann 10 hours ago

    How is the typing support? We just had downtime because a change to a celery task didn't trigger mypy to complain for all call sites until runtime. Too many Python decorators are written with pretty weak typing support.

    • roam 9 hours ago

      With regards to args and kwargs? None. Your callable is basically replaced with a Task instance that’s not callable. You need to invoke its enqueue(*args, **kwargs) method and yeah… that’s of course not typed.

    • halfcat 8 hours ago

      Static analysis will never be fully robust in Python. As a simple example, you can define a function that only exists at runtime, so even in principle it wouldn’t be possible to type check that statically, or even know what the call path of the functions is, without actually running the code in trace/profiler mode.

      You probably want something like pydantic’s @validate_call decorator.

      • tomjakubowski 7 hours ago

        > you can define a function that only exists at runtime, so even in principle it wouldn’t be possible to type check that statically

        Can you say more, maybe with an example, about a function which can't be typed? Are you talking about generating bytecode at runtime, defining functions with lambda expressions, or something else?
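
One possible reading of "a function that only exists at runtime", as a small sketch: functions built from source text with `exec` and attached by name. The names `make_handler` and `handlers` are hypothetical. This is only one interpretation of the claim quoted above, not necessarily what the parent meant.

```python
import types


def make_handler(name: str):
    """Build a function from source text; its name is known only at runtime."""
    namespace: dict = {}
    source = f"def {name}(x):\n    return x * 2\n"
    exec(source, namespace)
    return namespace[name]


# Attribute names are chosen by data, so a static checker sees no definitions.
handlers = types.SimpleNamespace()
for handler_name in ("double_a", "double_b"):
    setattr(handlers, handler_name, make_handler(handler_name))

# No `def double_a` exists anywhere in the source for a checker to inspect.
result = handlers.double_a(21)
```

A checker can still type the surrounding code; it just has no signature to verify for `handlers.double_a`.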

  • the__alchemist 9 hours ago

    This is great! The prev recommendation was usually a lib called celery that I wasn't able to get working. I don't remember the details, but it had high friction points or compatibility barriers I wasn't able to overcome. This integration fits Django's batteries included approach.

    I've been handling this, so far, with separate standalone scripts that hook into Django's models and ORM. You have to use certain incantations in an explicit order at the top of the module to make this happen.

  • ethagnawl 7 hours ago

    This is an exciting development. I imagine I'll continue using Celery in most cases but being able to transparently swap back-ends for testing, CI, etc. is very compelling.

    I haven't looked into this in any detail but I wonder if the API or defaults will shave off some of the rough edges in Celery, like early ACKs and retries.
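
On swapping back-ends for tests, a minimal settings sketch, assuming the `TASKS` setting and the built-in immediate backend path as I understand them from the Django 6.0 documentation. Verify both against the version you run; the file name is hypothetical.

```python
# settings_test.py (hypothetical file name for a test-only settings module)

# Assumption: Django 6.0 reads task backends from a TASKS dict keyed by alias,
# and ships an immediate backend that runs tasks inline when enqueued.
TASKS = {
    "default": {
        "BACKEND": "django.tasks.backends.immediate.ImmediateBackend",
    }
}
```

Production settings would point the same `default` alias at whatever backend actually runs workers, leaving the task code itself unchanged.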

  • fud101 7 hours ago

    For Django, this is about 10 years too late. It's frustrating because we use all manner of hacks to work around this not being part of the builtin story.

    • cuu508 9 minutes ago

      The best time to plant a tree is twenty years ago. The second best time is now.