How to Correctly Sum Up Numbers

(cedardb.com)

39 points | by sebg 5 days ago

43 comments

  • wongarsu 2 days ago

    I'd argue this is about safely summing up numbers, not correctly.

    For example, this is the proposed rust solution (imho the most readable):

        fn sum_up_numbers_correctly(numbers: &[i32]) -> Option<i32> {
            let mut sum: i32 = 0;
            for i in numbers {
                sum = sum.checked_add(*i)?;
            }
            Some(sum)
        }
    
    If you try to sum [i32::MAX, 2, i32::MIN] this will return None (the addition i32::MAX + 2 already overflows), while the naive wrapping addition would have returned 1. I'd call None a safe result, it's the algorithm bailing out when it's unsure, and 1 the correct result.

    The upside of wrapping addition is that you can overflow as often as you want: as long as the final result is within the range of the type, the result (of addition and subtraction) is correct. The downside is that you don't know whether the result is within range. So you trade correctly handling a larger range of inputs (wrapping) against being more certain of the result (checked overflow).
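    To make that concrete, here's a minimal sketch (not from the article) showing the wrapping variant recovering the correct sum:

        fn main() {
            let numbers = [i32::MAX, 2, i32::MIN];
            // wrapping_add never bails; the overflow at MAX + 2 is undone
            // by the later addition of MIN, leaving the true sum.
            let sum = numbers.iter().fold(0i32, |acc, &x| acc.wrapping_add(x));
            assert_eq!(sum, 1); // the checked version returns None here
        }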

    My preferred solution would have been to just upcast. If we make sum an i64, the intermediate sum can't overflow as long as you sum fewer than 2^32 i32 numbers, and on a 64-bit machine you lose basically nothing by doing that. Keep the checked_add if you are concerned about people summing extremely long lists of numbers:

        fn sum_up_numbers_correctly(numbers: &[i32]) -> Option<i32> {
            let mut sum: i64 = 0;
            for i in numbers {
                sum = sum.checked_add(*i as i64)?;
            }
            sum.try_into().ok()
        }
    
    (the try_into().ok() turns values that overflow i32 into None).

    When summing i64 you can do the same trick by summing into an i128.
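    Here's a sketch along those lines (an i128 accumulator can't overflow for any slice you could actually allocate):

        fn sum_up_i64(numbers: &[i64]) -> Option<i64> {
            let mut sum: i128 = 0;
            for &i in numbers {
                sum += i as i128; // would need ~2^64 summands to overflow
            }
            sum.try_into().ok() // None if the total doesn't fit back into i64
        }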

  • GMoromisato 2 days ago

    One of the things I love about software engineering is that there are no absolute answers, only trade-offs.

    You can't memorize your way into software engineering. You can't say, "always wrap-around when an operation overflows" or "always tell the user (panic) when an operation overflows".

    This article is a great example. C wraps unsigned integer addition on overflow, but Python promotes to arbitrary-precision integers. The trade-off, of course, is performance. If you're writing a real-time engine (for a rocket or a video game) and you can guarantee integer sizes, then you need to use the fastest algorithm possible. But if you're writing a general-purpose calculator, you might prefer arbitrary precision, or at least catching overflows and informing the user.

    In almost any other discipline there would be a "correct" way to do it, or at least an accepted "best practice." But for software engineers, it's always, "it depends."

    • Zamiel_Snawley a day ago

      Saturation arithmetic doesn’t get enough love, either.

      I’ve heard Zig has convenient facilities for it.

      • GMoromisato a day ago

        I didn't know it was called that, but I agree! Clamping is a valid option for overflow.
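        Rust has it built in too, via saturating_add; a minimal sketch:

            fn sum_saturating(numbers: &[i32]) -> i32 {
                // Clamps at i32::MIN / i32::MAX instead of wrapping or bailing out.
                numbers.iter().fold(0i32, |acc, &x| acc.saturating_add(x))
            }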

    • IshKebab a day ago

      There are plenty of absolutes. People just don't talk about them much because there's not much to say:

      * binary is the best way to encode information and numbers

      * two's complement is the best general-purpose way to encode negative integers

      * static typing and stronger typing lead to fewer bugs

      * length prefixes are better than using sentinel values and escaping

      > In almost any other discipline there would be a "correct" way to do it

      Tell me you know nothing about anything except computers without...

      You think there's a "correct" way to build a bridge?

      • poincaredisk 19 hours ago

        Well even your examples...

        * two's complement is the best general purpose way (tradeoffs!)

        * static typing leads to fewer bugs (but makes development slower - tradeoffs)

        * length prefixes are better than sentinels - but the largest system language uses sentinels for strings (because prefixes are not actually always better, and sentinels were deemed better by its creators)

        • IshKebab 10 hours ago

          > general purpose way (tradeoffs!)

          Not really, I was more hedging against HN pedantry...

          > but makes development slower - tradeoffs

          No it doesn't. It makes development faster!

          > because prefixes are not actually always better, and sentinels were deemed better by its creators

          Because they got it wrong.

      • GMoromisato a day ago

        I was a little sloppy. But the difference between software engineering and building bridges is that absolutes in software engineering are automated away. If there is an always correct way to do something, then the hardware/software always does it that way and we don't expose a choice to the software engineer. When a choice is exposed to the software engineer it is because there is a trade-off.

        In contrast, when making concrete, there are valid and invalid ratios of water/cement/etc. You have to memorize (or look up) the correct ratios. Building a bridge is filled with things you have to look up or just compute from basic parameters.

        Of course there are engineering trade-offs in building a bridge. But there are also many "just follow the algorithm" procedures. In software engineering, it's all trade-offs! Almost by definition, if there is a choice, it's a trade-off that requires human judgement, otherwise, the choice would be made for you by the computer.

    • cratermoon 2 days ago

      > You can't say, "always wrap-around when an operation overflows" or "always tell the user (panic) when an operation overflows".

      If you were designing the flight software for the Ariane 5, which choice would you make?

      • TeMPOraL a day ago

        The important choice was made elsewhere: the flight termination system successfully self-destructed the rocket once it was unlikely to recover from its problem.

        The Wiki quotes an interesting sentence from the official report: "An underlying theme in the development of Ariane 5 is the bias towards the mitigation of random failure." It's fortunate that that bias didn't extend to the flight termination system. At the point where flight control software intersects with real life, it's better to destroy an expensive rocket than have it rain fiery hell on some random people somewhere else. That principle holds to this day.

      • GMoromisato 2 days ago

        I understood that reference.

        But it just reinforces my point that there are no absolute rules. You have to think.

      • wang_li a day ago

        > If you were designing the flight software for the Ariane 5, which choice would you make?

        I would take note of the underlying hardware and what range of inputs is allowed, and code appropriately. Stop pretending that your software doesn't have a specific purpose and won't run on an actual computer. If you're doing finance you should be using fixed-point integer arithmetic with two or three decimal places. If you're doing physics simulations you need to be aware of how many significant digits exist. If you ignore your problem space and the kind of system your code is going to run on, you're living in spherical-cow world and it doesn't matter whether you over- or under-flow, because your software is garbage anyway.
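        For the finance case, that boils down to something like this sketch (hypothetical helper; amounts stored as integer cents):

            fn total_cents(prices_cents: &[i64]) -> Option<i64> {
                // Integer cents keep sums exact; checked_add still guards overflow.
                prices_cents.iter().try_fold(0i64, |acc, &p| acc.checked_add(p))
            }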

        • gizmo686 a day ago

          That is exactly what was done with Ariane, and it led to a complete failure and self-destruct. On Ariane 4, they realized that an overflow was physically impossible and so removed the check for efficiency reasons. The module containing the software ended up being reused on Ariane 5, where the overflow turned out to be very possible.

  • eesmith 5 days ago

    This essay only covers underflow and overflow, and gives an example of "correct" summation for Rust.

      fn sum_up_numbers_correctly(numbers: &[i32]) -> Option<i32> {
          let mut sum: i32 = 0;
          for i in numbers {
              sum = sum.checked_add(*i)?;
          }
          Some(sum)
      }
    
    
    I expected it to also mention the Kahan summation algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm), which gives more accurate results than this checked_add solution, but it did not. I wonder if that will be in one of the subsequent blog posts.
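    For reference, a sketch of what Kahan summation looks like (floating point, so a different problem than the integer checked_add above):

      fn kahan_sum(numbers: &[f64]) -> f64 {
          let mut sum = 0.0;
          let mut c = 0.0; // running compensation for lost low-order bits
          for &x in numbers {
              let y = x - c;     // subtract the previously lost error
              let t = sum + y;   // low-order bits of y may be lost here
              c = (t - sum) - y; // recover what was just lost
              sum = t;
          }
          sum
      }
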
    • msichert a day ago

      Hey, I'm the author of the blog post!

      I had planned to write about the Kahan summation, but I didn't want to overload the blog post with too many different concepts. I'm definitely considering writing about it in a follow-up post.

    • lifthrasiir 5 days ago

      To be clear, the checked_add solution is for integers while the Kahan summation is for IEEE 754 numbers. (There is no `f32::checked_add` or similar.)

      • SideQuark 16 hours ago

        > There is no `f32::checked_add`

        You don't need it. You get +-inf on overflow and can decide what to do then. It's also not hard to pre-check whether an op will succeed, if you want to pay the price (same as with integers, or any language libs that do the same). In general these checks are expensive and rarely needed.

        Making a checked math lib is not terribly hard.

      • eesmith 5 days ago

        You're right! I am not a Rust programmer and mis-read the "i32" as "float32".

        The text does imply the correct float solution has a similar structure to the int32 case, with "So, you will have to resort to writing machine code manually if you want to efficiently check overflows of float numbers."

        • tialaramex 2 days ago

          Rust's 32-bit floating point type is, unsurprisingly, named f32 (stable Rust also has f64, and unstable builds offer f16 and possibly f128 which are both newer IEEE formats)

          If you're sure your answer is a 32-bit IEEE floating point number, I think you could store the intermediate sums as a fully expanded fixed point number with enough bits to go from the smallest possible f32 to the largest possible, so you're just doing (big integer) integer summation. 36 bytes ought to be very possible. When you've finished summing, you find the top-most set bit, its position decides the exponent and the subsequent 23 bits are your mantissa.

          • lifthrasiir a day ago

            Indeed, this approach is known as a superaccumulator [1] and is faster overall than Kahan summation for large inputs. I don't know the currently accepted threshold exactly, but 10,000 numbers should be conservative enough for that.

            [1] https://arxiv.org/abs/1505.05571

        • jepler 2 days ago

          You don't need machine code these days to find overflow in IEEE floating point arithmetic.

          If any intermediate sum overflows, you get an infinity. Further arithmetic can never return you to the realm of finite numbers (though you can also get into the world of "not a number" NaNs by calculating -inf + inf for instance), so you can use a standard function like (C99 <math.h>) isfinite() to check if any intermediate step of summing FP numbers overflowed the representation.
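          In Rust terms that's roughly this sketch (note that is_finite is also false for NaN):

              fn sum_f32_checked(numbers: &[f32]) -> Option<f32> {
                  // An overflow produces an infinity that sticks around, so a
                  // single check at the end catches any intermediate overflow.
                  let sum: f32 = numbers.iter().sum();
                  sum.is_finite().then_some(sum)
              }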

  • Koshkin 2 days ago

    The order in which you sum a large collection of floating-point numbers is also important, if you want the result to be stable.

    • stabbles 2 days ago

      This is more relevant than overflow

    • SideQuark 16 hours ago

      Bingo. Higham's book on numerics (he died recently) has excellent coverage of a massive amount of summation knowledge and algorithms.

    • 77pt77 2 days ago

      Sum smaller absolute value numbers first.

      • stabbles 2 days ago

        It will still accumulate rounding errors of O(n eps). If you care, you can make it O(lg n eps) with pairwise summation or O(eps) with Kahan summation.
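        Pairwise summation is only a few lines, e.g. this sketch:

            fn pairwise_sum(xs: &[f64]) -> f64 {
                // Below a small cutoff a plain loop is fine (and faster).
                if xs.len() <= 8 {
                    return xs.iter().sum();
                }
                // Splitting in half keeps the error tree O(lg n) deep.
                let mid = xs.len() / 2;
                pairwise_sum(&xs[..mid]) + pairwise_sum(&xs[mid..])
            }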

        • SideQuark 16 hours ago

          You can also use multiple layers of Kahan, just like the double-double and quad-double tricks. Adding just one more layer and letting the hardware cover more bits of precision is often a much faster way to gain precision than any wider-precision library.

          If you're overflowing the exponent, then you need more than hardware magic. But for precision I've used these multi-double tricks with the same guarantees as IEEE-style formats, but with as much precision as I want.

        • Pesthuf 2 days ago

          I just wonder, in situations where precision is so important that these small rounding errors actually matter, aren't floats the wrong tool anyway?

          • stabbles 2 days ago

            In many numerical algorithms, rounding errors add up over iterations / time. Maybe you lose one significant digit in one iteration of your algorithm; let it run longer and rounding errors start to dominate -- the result is garbage.

            Since performance is important, not using floats is not an option, and you rarely see Kahan summation type of tricks because the heavy lifting is usually done by BLAS libraries (i.e. vectorized linear algebra operations).

            So, in numerical algorithms the typical trick is to use stable matrix decompositions like the QR decomposition A = QR where Q'Q = I (repeated multiplication with Q leaves the quantities roughly the same size, whereas with A things blow up).

            And there are other tricks. For example, if you need to orthogonalize a vector x against this Q, in full precision you compute y = x - Q * (Q' * x), so that Q' * y = 0. The Q' * x computation is a reduction, so it can accumulate O(n eps) errors in finite precision. Instead of doing a more stable reduction (e.g. Kahan summation), you can repeat the computation: z = y - Q * (Q' * y). The idea is that the second orthogonalization step kills the rounding-error noise introduced by the first step.

            So in short: you can still get accuracy and decent performance with floats, you just have to know the tricks.

            [Another somewhat interesting point is that on GPUs reductions are by necessity implemented with O(ln n eps) error, and the naive but vectorized CPU implementation has an error O(n/k eps) where k is the vector width]

          • TeMPOraL a day ago

            They are, but that didn't stop people from using them. They're convenient, they're bounded, and as such they ended up having decades of hardware optimization behind them. When precision turns out to be really important, it's often cheaper to get some smart people to restructure the computation to mitigate the errors than it is to switch to arbitrary-precision math and deal with the consequences. There's a whole field of CS that's spent the last half century or more developing ways to deal with floating-point math problems - extending the scope of their use, and making alternatives even less economical.

            There's an interesting case of it today: LLMs. In theory they should be fully deterministic with the "temperature" parameter set to 0, but they aren't - all that matrix multiplication is massively parallelized, and with floating-point math being order-dependent, partial results quickly accumulate non-deterministic errors.
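            The order dependence is easy to demonstrate (a tiny sketch, nothing LLM-specific):

                fn main() {
                    // Floating-point addition is not associative, so the grouping
                    // of partial sums in a parallel reduction changes the result.
                    let a = (0.1_f64 + 0.2) + 0.3; // 0.6000000000000001
                    let b = 0.1_f64 + (0.2 + 0.3); // 0.6
                    assert_ne!(a, b);
                }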

          • owlbite 2 days ago

            Sometimes it's more about numerical stability than absolute accuracy (numerical stability being defined as not having the property that small changes in input cause large changes in the output).

      • kardos a day ago

        Sorting is costly though!

        This summing problem is solved by xsum: https://gitlab.com/radfordneal/xsum

        There is room for more work here to make it faster or use less memory though.

  • phonon a day ago

    You can use math.fsum[0] in Python and the upcoming Math.sumPrecise[1] in JavaScript to address much of this for double-precision floating-point.

    [0] https://docs.python.org/3/library/math.html#math.fsum

    [1] https://github.com/tc39/proposal-math-sum

  • seoulbigchris 17 hours ago

    I've always wondered why programming languages define mathematical operations to produce a result of the same type as the operands. Mathematically, the result doesn't have to fit in that type.

    • SideQuark 16 hours ago

      Mostly for performance. It's ludicrously expensive in general to check and upsize as needed on each operation. It's wiser for performance computing to have the developer decide whether they want to pay the price, and to use suitable libraries as needed.

      • seoulbigchris 5 hours ago

        I think the reason this has always stuck in my craw is that a compiler can complain if the result and operands aren't the same size. But if they ARE the same size, the compiler doesn't similarly complain "your result may overflow the destination variable". If it did, all calculations would be flagged. Imagine a strict clerk in a government office who approves vital documents based solely on whether the paper size and ink color are correct, regardless of the documents' content.

        I'm not saying the status-quo is wrongheaded, and I see no obvious way to solve it -- just an observation.

        • SideQuark 5 hours ago

          > But if they ARE the same size, the compiler doesn't similarly complain "your result may overflow the destination variable"

          Pretty much any production compiler can do this. They can also warn when you truncate without doing so explicitly. Where I work, for certain projects, all C++ compilation does both of these, forcing programmers to explicitly cast into size to show they made a careful consideration of sizes.

          Other languages guarantee integers operate mod n; others guarantee integers grow as needed (e.g., Python).

          That people don't learn the numerics of their platform is not the problem of the language or compiler. Using languages that are closer to the hardware means less hand holding when developing code, whether it's memory, numerics, files, or any part where one needs to pay attention and complexity increases.

          • seoulbigchris 4 hours ago

            > Pretty much any production compiler can do this.

            I'm curious how a compiler could know this at compile time? If you mean a runtime checking feature, I get it. I'm referring to an example statement like `ix = ia + ib` where all three are, say, u8 integers.

            EDIT: I will agree in some situations, the compiler could deduce that a certain equation like above would or would not ever overflow, by context. But for the general case, I don't see how.

      • seoulbigchris 5 hours ago

        Yeah, I get that. Existing hardware supports double-width results for multiplication (`u16 = u8 * u8`, etc.). But for addition, typical ALUs don't support oddball widths like `u9 = u8 + u8`. And then what do you do about chaining calculations? Sum up a huge column of numbers that way and you could get a u41593 value. Maybe this could be solved if there were a concept of a floating-width integer, which is on paper just a special case of floating point. Maybe.

        My remarks have been about integers -- the typical programming language floating point rules (`f32 = f32 * f32`, etc) make sense to me.
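        The multiplication case, as a sketch in Rust (the upcast makes overflow impossible by construction):

            fn widening_mul(a: u8, b: u8) -> u16 {
                // The full product of two u8 values always fits in a u16.
                (a as u16) * (b as u16)
            }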

        EDIT: Markdown editor didn't like the asterisk in my example equation.

  • kwillets 18 hours ago

    It's a sad comment on my C++ knowledge that I didn't know about the checked-add intrinsics, but I do know that unsigned `(x + y) < x` compiles into checking the carry flag on most compilers.
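    The same trick as a Rust sketch (compilers typically lower the comparison to a carry-flag check):

        fn add_u32(x: u32, y: u32) -> Option<u32> {
            // For unsigned types, x + y overflows iff the wrapped
            // result is smaller than either operand.
            let s = x.wrapping_add(y);
            if s < x { None } else { Some(s) }
        }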

    • SideQuark 16 hours ago

      Those are compiler specific intrinsics, not standard C++.

      It's pretty easy to make a checked integer (or float) math lib/type in C++: make a class that holds an int, overload the ops, and check them. I regularly do this when proving boundedness of algorithms or the performance of an algorithm. Such a class can track the number of adds, mults, etc., can catch errors, can track statistics of values, anything.

      I also made and regularly use an IntN type: the idea is to take half the max int size your system has natively, call it IntN, and build an Int2N by hand as a class. Now templatize it on size. You can then recursively make Int2N, Int4N, Int8N, etc., from that template, basically for free (no more code needed), and have any size of int.

      There's a lot of cool numerics stuff you can do along these lines.
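      A minimal sketch of the doubling step (hypothetical names; here N = 64, with the carry propagated via overflowing_add):

          #[derive(Clone, Copy)]
          struct Int2N {
              hi: u64,
              lo: u64,
          }

          impl Int2N {
              fn add(self, rhs: Int2N) -> Int2N {
                  // Add the low halves, then fold the carry into the high halves.
                  let (lo, carry) = self.lo.overflowing_add(rhs.lo);
                  let hi = self.hi.wrapping_add(rhs.hi).wrapping_add(carry as u64);
                  Int2N { hi, lo }
              }
          }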

  • mmphosis a day ago

      alias tally='awk -F '\''[^-.0-9]'\'' '\''{ for (i = 1; i <= NF; i++) s += $i } END { print s }'\'''