Exceptions get a lot of hate, but of the three styles, I keep coming back to exceptions. Ages ago I built an application with error codes and went back to exceptions, because the ceremony of error checking wasn't worth it. On occasion I'll use a get-last-error style, particularly when the error is something the user is expected to address. But for most of my applications (which are usually not libraries and are code under my control) I like exceptions.
I always have a global error handler that logs and alerts on anything uncaught. This allows me to code the happy path. Most of the time it's not worth figuring out how to continue processing under every possible error, so fail-and-bail is my default approach. If I later determine that something can be handled to continue processing, then I update that code path to handle that case.
Most of my code is web applications, so that is where I'm coming from.
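A minimal sketch of that "global handler + happy path" setup, in Python for brevity (in a Java web app a servlet filter or Thread.setDefaultUncaughtExceptionHandler plays the same role); notify_oncall is a hypothetical stand-in for whatever alerting you use:

    import logging
    import sys

    def notify_oncall(exc):
        ...  # stand-in: email, PagerDuty, Slack webhook, etc.

    def fail_and_bail(exc_type, exc, tb):
        # log and alert on anything uncaught, then let the process die
        logging.critical("uncaught error", exc_info=(exc_type, exc, tb))
        notify_oncall(exc)

    sys.excepthook = fail_and_bail

    # everything below is happy path; unexpected errors bubble to the hook
    def handle_request(request):
        ...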
Hell yeah. I typed in my first C program, a terminal emulator, from a 1984 issue of Byte magazine. It was painful seeing how 5 lines of real logic were intertwined with 45 lines of error handling logic that, in the end, did what exceptions did for free -- it was a formative experience for me as a programmer and when I saw exceptions in Java in 1995 (still in beta) they made me so happy.
In the async case you can pass the Exception as an object as opposed to throwing it, but you're still left with the issue that the failure of one "task" in an asynchronous program can cause the failure of a supertask composed of other tasks, and handling that involves some thinking. That's chess, whereas the stuff talked about in that article is Tic-Tac-Toe in comparison.
Yeah, I agree in the async case. What I do there is wrap the async code in its own global error handler, so to speak. That handler logs to something that the outer process can get-last-error from.
But I can also get away with this because I don't write async-heavy code. My web applications are thread-per-request (Java). This fits 99% of the needs of business code, whose processing is mostly synchronous in nature.
By and large you want to avoid async if you can help it. Sometimes you can’t. The struggles Rustifarians have had with it are a cautionary tale (the stack and borrow checking go together like peanut butter and jelly). I used to have a lot of fun writing async Python; friends told me I was living dangerously. I finally rewrote my RSS reader/bookmark manager/personal web crawler/image sorter in sync style so some of it could run in Celery, because in async code any blocking on the CPU is too much, even on a 16-core machine.
People used to worry about the 10k connection problem but machines are bigger now, few services are really that big, and fronting with nginx or something like that helps a lot. (That image sorter serves images with IIS)
JavaScript is async and you gotta live with it because of deployability. No fight with the App Store. No InstallShield engineer. No army of IT people to deploy updates. “Just works” on PC, Mac, Linux, tablet, game consoles, VR headsets, etc. Kinda sad people are making waitlist forms with frameworks that couldn’t handle the kind of knowledge graph editor and decision support applications I was writing in 2006, but that’s life.
Exceptions can even work with remote APIs.
If you reach into the enterprise bucket of tricks, technologies like WCF/SOAP can propagate these across systems reliably. You can even forward the remote stack traces by turning on some scary flags in your app.config. Printing the final exception with .ToString() then produces a really magical narrative of what the fuck happened.
The entire reason exceptions are good is stack traces. It is amazing to me how many developers do not understand that having a stack trace at the exact instant of a bad thing is like having undetectable wall hacks in a competitive CS:GO match.
Yes, I've never quite understood the "But with exceptions it's hard to debug why the error occurred after the fact, it's better to be explicit in advance" argument. The stack trace points exactly to the line, and usually, with the error message and context, it's all I need. Maybe I'm missing something someone can inform me about.
Yeah, this kinda becomes a problem when the library you are using does not distribute its source code, so even if you get the line, this information is practically useless to you.
This has been my biggest problem with exceptions: one, for the reason outlined above, and two, for how much time you actually end up spending on figuring out which exception a certain situation produces. "Oh, you're making a database insertion; what's the error that's thrown if you get a constraint violation? I might want to handle that." And then it's all an adventure, because there's no way to know in advance. If the docs are good it's in the docs, otherwise "just try it" seems to be the way to do it.
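As a concrete instance of the adventure: with Python's built-in sqlite3 driver, a primary-key violation surfaces as sqlite3.IntegrityError, which you mostly learn from the DB-API docs or, indeed, by just trying it:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
    con.execute("INSERT INTO users VALUES ('a@example.com')")
    try:
        con.execute("INSERT INTO users VALUES ('a@example.com')")
    except sqlite3.IntegrityError as exc:
        # found in the DB-API docs, or by "just trying it"
        print("constraint violation:", exc)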
Yeah I agree with that, opaque errors from libraries are where this really sucks. The worst is when they swallow the original error and throw a generic exception instead.
> It is amazing to me how many developers do not understand that having a stack trace at the exact instant of a bad thing is like having undetectable wall hacks in a competitive CS:GO match.
Who doesn't understand that? If you aren't using exceptions you are using wrapping instead, and said wrapping is merely an alternative representation of what is ultimately the very same thing. This idea isn't lost on anyone, even if they don't use the call stack explicitly.
The benefit of wrapping over exceptions[1] is that each layer of the stack gains additional metadata to provide context around the whole execution. The tradeoff is that you need code at each layer in the stack to assign the metadata instead of being able to prepare the data structure all in one place at the point of instantiation.
[1] Technically you could wrap exceptions in exceptions, of course, so this binary framing isn't quite right; but since exceptions have proven useless once you find yourself here, with two stacks offering the same information, we will assume for the sake of discussion that the division is binary.
One could say the whole point of wrapping exceptions is to add additional metadata _if such data is available_. Otherwise, the most basic metadata is tracked automatically: stack locations.
Technically, the actual whole point of wrapping is to avoid leaking implementation details. If you let "FooLibraryException" bubble up, and then you stop using Foo Library, then all of the users of your code are going to end up broken waiting for "FooLibraryException" when now you throw "BarLibraryException". This diminishes any value exception handlers theoretically could provide since you end up having to wrap everything at each step anyway.
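A minimal sketch of that wrapping discipline in Python (FooLibraryError and StorageError are hypothetical names); "from exc" chains the original as __cause__, so the full story survives in the traceback without the library type leaking:

    class FooLibraryError(Exception):
        """Stands in for the third-party library's exception type."""

    class StorageError(Exception):
        """Our own stable type; callers depend on this, not on Foo Library."""

    def save_user(user_id):
        try:
            raise FooLibraryError("duplicate key")  # pretend the library call failed
        except FooLibraryError as exc:
            # swap Foo Library for Bar Library later and callers won't notice
            raise StorageError(f"could not save user {user_id}") from exc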
Checked exceptions were introduced to try to help with that problem, giving you at least a compiler error if an implementation changed from underneath you. But that comes with its own set of problems and at this point most consider it to be a bad idea.
Of course, many just throw caution to the wind and don't consider the future, believing they'll have moved on by then and it will be the next programmer's problem. Given the context of discussion, we have assumed that is the case.
Agreed, and even more heretically, I quite like declared exceptions. They make the interface of a method clear about all the ways it can fail, and you can directly choose what to handle, often without having to look at the docs to work out what they mean, because the names tell you what you need to know. You can ignore them, rethrow, and catch globally, but you can also handle them.
Having used Go for years now, frankly I prefer exceptions. Way too often there is nothing that can be done about an error locally, but it produces noise and if-branches all over the code base, and it's even worse to add an error to a method later than in Java, because every method has to have code added, not just a signature change. I really miss stack traces, and the current state of the art in Go has us writing code to produce them in every method.
Yep, checked exceptions are the shit. You can of course abuse them to create a monstrosity (as you can with anything), but when used responsibly I think they are by far the best error handling paradigm.
The problem with `getlasterror` and `errno` is that they're global (thread-local, whatever).
But if you make them take a `context` object, there's no longer a problem.
One interesting observation - you can use them even for the initial "failed to allocate a context" by interpreting a NULL pointer as always containing an "out of memory" error.
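A rough sketch of the context-object style, in Python rather than C (the names here are made up; the NULL-pointer trick only exists in the C version, so it's just described in a comment):

    class Ctx:
        """Per-call error context: no global or thread-local state."""
        def __init__(self):
            self.last_error = None

    def parse_port(text, ctx):
        """Returns a port number, or None with the error recorded on ctx."""
        try:
            port = int(text)
        except ValueError:
            ctx.last_error = f"not a number: {text!r}"
            return None
        if not 0 < port < 65536:
            ctx.last_error = f"out of range: {port}"
            return None
        return port

    ctx = Ctx()
    if parse_port("99999", ctx) is None:
        print(ctx.last_error)
    # in C, allocating the context itself can fail; the trick above is to
    # define a NULL ctx pointer to always read as an "out of memory" error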
Async anything is hard!
In async code, errors belong to the task, not the caller.
In sync code, the caller owns the stack, so it makes sense they own the error. But async splits that: now each async function runs like a background job. That job should handle its own failure (retry, fallback, log) because the caller usually can't do much anyway.
Write async blocks like isolated tasks. Contain errors inside unless the caller has a real decision to make; a global error handler picks up the rest.
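A sketch of that shape with asyncio (push_to_warehouse is a hypothetical flaky I/O call):

    import asyncio
    import logging

    async def push_to_warehouse(item_id):
        ...  # hypothetical I/O call that may raise ConnectionError

    async def sync_inventory(item_id):
        # an isolated task: retry, then fall back to logging; the caller
        # has no real decision to make, so nothing escapes
        for attempt in range(3):
            try:
                await push_to_warehouse(item_id)
                return
            except ConnectionError:
                await asyncio.sleep(2 ** attempt)
        logging.error("giving up on item %s", item_id)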
Structured concurrency [1] solves the issue of task (and exception) ownership. In languages / libraries that support it, when spawning a task you must specify some enclosing block that owns it. That block, called a nursery or task group, can be a long way outside the point where the task is spawned because the nursery is an object in its own right, so it can be passed into a function which can then call its start() method. All errors are handled at the nursery level.
They were introduced in the Trio library [2] for Python, but they're now also supported by Python's built-in asyncio module [3] as task groups (sketched below). I believe the idea has spread to other languages too.
[1] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...
[2] https://trio.readthedocs.io/en/stable/
[3] https://docs.python.org/3/library/asyncio-task.html#task-gro...
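A minimal sketch of the asyncio task-group version (Python 3.11+); when one task fails, its siblings are cancelled and everything is re-raised at the nursery boundary as an ExceptionGroup:

    import asyncio

    async def fetch(name):
        await asyncio.sleep(0.1)
        if name == "b":
            raise ValueError(f"{name} failed")
        return name

    async def main():
        # the TaskGroup is the nursery: it owns every task forked inside it
        async with asyncio.TaskGroup() as tg:
            tg.create_task(fetch("a"))
            tg.create_task(fetch("b"))
        # only reached if every task succeeded

    try:
        asyncio.run(main())
    except* ValueError as eg:
        print("handled at the nursery level:", eg.exceptions)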
well, that's partially true
the caller is itself a task / actor
The thing is that the caller might want to roll back what they're doing based on whether the subtask was rolled back... and so on, backtracking as far as needed.
Ideally, all the side effects should be queued up and executed only at the end, after your caller has successfully heard back from all the subtasks.
For example... don't commit DB transactions, send out emails, or post transactions onto a blockchain until you know everything went through. Exceptions mean rollback, a lot of the time.
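A sketch of that discipline (send_email and post_event are hypothetical stand-ins; sqlite3 is used because its connection context manager commits on success and rolls back on an exception):

    import sqlite3

    def send_email(to, subject): ...      # stand-ins for point-of-no-return
    def post_event(kind, payload): ...    # side effects

    def place_order(con, order_id, buyer):
        side_effects = []
        with con:  # sqlite3: COMMIT on success, ROLLBACK on exception
            con.execute("INSERT INTO orders VALUES (?, ?)", (order_id, buyer))
            side_effects.append(lambda: send_email(buyer, "order confirmed"))
            side_effects.append(lambda: post_event("order.created", order_id))
        # only reached after COMMIT; an exception above means rollback
        # and none of the queued effects ever run
        for effect in side_effects:
            effect()

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id TEXT, buyer TEXT)")
    place_order(con, "o-1", "a@example.com")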
On the other hand, "after" hooks are supposed to happen after a task completes fully, and their failure shouldn't make the task roll back anything. For really frequent events, you might want to debounce, as happens for example with browser "scroll" event listeners, which can't preventDefault anymore unless you set them with {passive: false}!
PS: To keep things simple, consider using single-threaded applications. I especially like PHP, because it's not only single-threaded but actually shared-nothing: as soon as your request handling ends, the memory is released. Unlike with Node.js, you don't worry about leaking memory or secrets between requests. But whether you use PHP or Node.js, you are essentially running on a single thread, and that means you can write code that basically does tasks sequentially, one after the other. If you need to fan out and do a few things at a time, you can do it with Node.js's Promise.all(), while in PHP you queue up a bunch of closures and then explicitly batch-execute them with e.g. the curl_multi_ functions. Either way... you'll need to explicitly write your commit logic at the end, e.g. in PHP's shutdown handler, and your database can help you isolate your transactions with COMMIT or ROLLBACK.
If you organize your entire code base around dispatching events instead of calling functions, as I did, then you can easily refactor it to do things like microservices at scale by using signed HTTPS requests as a transport (so you can isolate secrets, credentials, etc.) from the web server: https://github.com/Qbix/Platform/commit/a4885f1b94cab5d83aeb...
I liked where you started.
Any ASYNC operation, whether using coroutines or event-based actors or whatever else, should be modelled as a network call.
You need a handle that will contain information about the async call and will own the work it performs. You can have an API that explicitly says “I don’t care what happens to this thing, just that it happens” and will crash on failure. Or you can handle its errors, if there are any, and, importantly, decide how to handle those errors.
Oh and failing to allocate/create that handle should be a breach of invariants and immediately crash.
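One way to sketch that handle idea with asyncio (spawn and must_succeed are made-up names; asyncio.Task plays the role of the handle):

    import asyncio
    import logging

    def spawn(coro, *, must_succeed=False):
        """Return a handle (an asyncio.Task) that owns the async work."""
        # create_task raising (e.g. no running loop) is already a crash,
        # matching the "failed to create the handle" invariant
        task = asyncio.create_task(coro)

        def on_done(t):
            if t.cancelled() or t.exception() is None:
                return
            if must_succeed:
                # "I don't care what happens, just that it happens":
                # treat failure as a breach of invariants and crash
                logging.critical("async invariant broken",
                                 exc_info=t.exception())
                raise SystemExit(1)  # escapes the event loop, ends the run
            # otherwise the caller holds the handle, awaits it, and
            # decides how to handle the error

        task.add_done_callback(on_done)
        return task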
That way you have all the control and flexibility, and async error handling becomes trivial; you can use whatever async pattern you want to manage async operations at that point as well.
And you also know you have fundamentally done something expensive in latency for the benefit of performance or access to information, because if it was cheap you would have just done it on the thread you are already using.
> For example... don't commit DB transactions, send out emails, or post transactions onto a blockchain until you know everything went through. Exceptions mean rollback, a lot of the time.
But what if you need to send emails AND record it in a DB?
I had the same question, actually; it is very common to perform multiple point-of-no-return IO operations in a workflow, so deferring all IO to a specific spot does not, in practice, bring any advantages.
It does. You queue ALL of these side effects (simply tasks whose exceptions don't roll back your own task) until the end. Then you can perform them all, in parallel if you wish.
One of the things I love most about Elixir is that it makes asynchronous error handling easier than any other language I've used. Asynchronous code used to be the source of many difficult bugs in the teams I've worked with, but Elixir's (or, more accurately, Erlang's) "let it crash" architecture helps eliminate many of these issues.