I didn't peruse the source code. I just read the linked article in its entirety and it says
> The computation is quite slow. In order to stay as flexible as possible, I'm using the Monte Carlo method. Which means the calculator is running about 250K AST-based computations for every calculation you put forth.
So therefore I conclude Monte Carlo is being used.
Lines 19 to 21 should be the Monte Carlo sampling algorithm. The implementation is maybe a bit unintuitive, but apparently he creates a function from the expression in the calculator, and calling that function returns a random sample from the resulting distribution.
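To make that concrete, here is a minimal Python sketch of the same idea (the real calculator is written in Dart; the range-to-normal mapping follows the article's description, everything else here is my own naming):

```
import numpy as np

rng = np.random.default_rng()

def sample_range(lo, hi, n=250_000):
    # Interpret lo~hi as a 95% interval of a normal distribution:
    # mean at the midpoint, lo and hi two standard deviations away.
    mean = (lo + hi) / 2
    sigma = (hi - lo) / 4
    return rng.normal(mean, sigma, n)

def central_95(samples):
    # Report the central 95% of the Monte Carlo results.
    return np.percentile(samples, [2.5, 97.5])

# The default example: 100 / 4~6
print(central_95(100 / sample_range(4, 6)))  # roughly [17, 25]
```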
I'm guessing this is not an error. If you divide 1/normal(0,1), the full distribution would range from -inf to inf, but the 95% output doesn't have to.
I don't quite understand, probably because my math isn't good enough.
If you're treating -1~1 as a normal distribution, then it's centered on 0. If you're working out the answer using a Monte Carlo simulation, then you're going to be testing out different values from that distribution, right? And aren't you going to be more likely to test values closer to 0? So surely the most likely outputs should be far from 0, right?
When I look at the histogram it creates, it varies by run, but the most common output seems generally closest to zero (and sometimes is exactly zero). Wouldn't that mean that it's most frequently picking values closest to -1 or 1 in the denominator?
If X is normal and centered around 0, then the average of 1/X does not exist (math speak for "is infinity" in this case). In these cases Monte Carlo simulations are not reliable because they give high variance estimates (math speak for "the histogram varies run by run").
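A quick check of this claim, assuming numpy: the sample mean of 1/X is dominated by the few draws nearest zero and swings between runs, while the median barely moves.

```
import numpy as np

rng = np.random.default_rng()
for run in range(3):
    x = rng.normal(0, 1, 250_000)
    y = 1 / x
    # mean: unstable across runs; median: stays near zero
    print(f"run {run}: mean={y.mean():8.1f}  median={np.median(y):.3f}")
```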
OK, but do we necessarily just care about the central 95% range of the output? This calculation has the weird property that values in the tails of the input correspond to values in the middle of the output, and vice versa. If you follow the intuition that the range you specify in the input corresponds to the values you expect to see, the corresponding outputs would really include -inf and inf.
Now I'm realizing that this doesn't actually work, and even in more typical calculations the input values that produce the central 95% of the output are not necessarily drawn from the 95% CIs of the inputs. Which is fine and makes sense, but this example makes it very obvious how arbitrary it is to just drop the lowermost and uppermost 2.5%s rather than choosing any other 95/5 partition of the probability mass.
That may be true, but if you look at the distribution it puts out for this, it definitely smells funny. It looks like a very steep normal distribution, centered at 0 (ish). Seems like it should have two peaks? But maybe those are just getting compressed into one because of resolution of buckets?
https://qalculate.github.io can do this also for as long as I've used it (only a couple years to be fair). I've got it on my phone, my laptop, even my server with apt install qalc. Super convenient, supports everything from unit conversion to uncertainty tracking
The histogram is neat, I don't think qalc has that. On the other hand, it took 8 seconds to calculate the default (exceedingly trivial) example. Is that JavaScript, or is the server currently very busy?
It's all computed in the browser so yeah, it's JavaScript. Still, 8 seconds is a lot -- I was targeting sub-second computation times (which I find alright).
This is awesome. I used Causal years ago to do something similar, with perhaps slightly more complex modelling, and it was great. Unfortunately the product was targeted at high paying enterprise customers and seems to have pivoted into finance now, I've been looking for something similar ever since. This probably solves at least, err... 40~60% of my needs ;)
I actually stumbled upon this a while ago from social media and the web version has a somewhat annoying latency, so I wrote my own version in Python. It uses numpy so it's faster. https://gist.github.com/kccqzy/d3fa7cdb064e03b16acfbefb76645... Thank you filiph for this brilliant idea!
The reason I'm asking: unsure also has a CLI version (which is leaps and bounds faster and in some ways easier to use) but I rarely find myself using it. (Nowadays, I use https://filiph.github.io/napkin/, anyway, but it's still a web app rather than a CLI tool.)
Love it! I too have been toying with reasoning about uncertainty. I took a much less creative approach though and just ran a bunch of geometric brownian motion simulations for my personal finances [0]. My approach has some similarity to yours, though much less general. It displays the (un)certainty over time (using percentile curves), which was my main interest. Also, man, the UI, presentation, explanations: you did a great job, pretty inspiring.
Very cool. This can also be used for LLM cost estimation. Basically any cost estimation I suppose. I use cloudflare workers a lot and have a few workers running for a variable amount of time. This could be useful to calculate a ball park figure of my infra cost. Thank you!
Here (https://uncertainty.nist.gov/) is another similar Monte Carlo-style calculator designed by the statisticians at NIST. It is intended for propagating uncertainties in measurements and can handle various different assumed input distributions.
I think I was looking at this and several other similar calculators when creating the linked tool. This is what I mean when I say "you'll want to use something more sophisticated".
The problem with similar tools is that of the very high barrier to entry. This is what my project was trying to address, though imperfectly (the user still needs to understand, at the very least, the concept of probability distributions).
I want to ask about adjacent projects - user interface libraries that provide input elements for providing ranges and approximate values. I'm starting my search around https://www.inkandswitch.com/ and https://malleable.systems/catalog/ but I think our collective memory has seen more examples.
If I am reading this right, a range is expressed as a distance between the minimum and maximum values, and in the Monte Carlo part a number is generated from a uniform distribution within that range[1].
But if I just ask the calculator "1~2" (i.e. just a range without any operators), the histogram shows what looks like a normal distribution centered around 1.5[2].
Shouldn't the histogram be flat if the distribution is uniform?
> Range is always a normal distribution, with the lower number being two standard deviations below the mean, and the upper number two standard deviations above. Nothing fancier is possible, in terms of input probability distributions.
Part of the confusion here is likely that the tool, as seen on the web, probably lags significantly behind the code. I've started using a related but different tool (https://filiph.github.io/napkin/).
The HN mods gave me an opportunity to resubmit the link, so I did. If I had more time, I'd have also upgraded the tool to the latest version and fixed the wording. But unfortunately, I didn't find the time to do this.
Wow, this is fantastic! I did not know about squiggle language, and it's basically what I was trying to get to from my unsure calculator through my next project (https://filiph.github.io/napkin/). Squiggle looks and works much better.
An alternative approach is using fuzzy-numbers. If evaluated with interval arithmetic you can do very long calculations involving uncertain numbers very fast and with strong mathematical guarantees.
It would especially outperform the Monte-Carlo approach drastically.
I'm familiar with fuzzy numbers (e.g. see my https://filiph.net/fuzzy/ toy) but I didn't know there's arithmetic with fuzzy numbers. How is it done? Do you have a link?
There is a book by Hanss on it. It focuses on the sampling approach (he calls it "transformation method") though.
If you want to do arithmetic and not a black box approach you just have to realize that you can perform them on the alpha-cuts with ordinary interval arithmetic. Then you can evaluate arbitrary expressions involving fuzzy numbers, keeping the strengths and weaknesses of interval arithmetic.
The sampling based approach is very similar to Monte-Carlo, but you sample at certain well defined points.
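A minimal sketch of that alpha-cut idea, using triangular fuzzy numbers (the shapes and numbers below are purely illustrative):

```
def tri_cut(lo, peak, hi, alpha):
    # Alpha-cut of a triangular fuzzy number (lo, peak, hi):
    # the interval where membership is at least alpha.
    return (lo + alpha * (peak - lo), hi - alpha * (hi - peak))

def interval_mul(a, b):
    # Ordinary interval multiplication: min/max of the four endpoint products.
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

# e.g. a fuzzy cost (1200, 1400, 1800) times a fuzzy factor (1.0, 1.02, 1.05)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, interval_mul(tri_cut(1200, 1400, 1800, alpha),
                              tri_cut(1.0, 1.02, 1.05, alpha)))
```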
If they are truly independent of each other, some of the uncertainty cancels out. 10 people and a budget of $1/person are both unlikely events, and two unlikely events occurring independently of each other is even more unlikely. And because the calculator is not about the full range of possible values but about the values in the 95% confidence interval, this leads to the outer edges of the range now falling outside the 95% confidence interval.
Interesting. I like the notation and the histogram that comes out with the output. I also like the practical examples you gave (e.g. the application of the calculator to business and marketing cases). I will try it out with simple estimates in my marketing campaigns.
Cool! Some random requests to consider: Could the range x~y be uniform instead of 2 std dev normal (95.4%ile)? Sometimes the range of quantities is known. 95%ile is probably fine as a default though.
Also, could a symbolic JS package be used instead of Monte-Carlo? This would improve speed and precision, especially for many variables (high dimensions).
Could the result be shown in a line plot instead of ASCII bar chart?
”Without further knowledge, the calculator cannot know that a negative number is impossible (in other words, you can't have -5 civilizations, for example).”
Not true. If there are no negative terms, the equation cannot have negative values.
The calculator cannot know whether there are no negative terms. For example, if people's net worth is distributed 0.2–400, there's likely a significant chunk of people who are, on the whole, in debt. These will be represented as a negative term, even though their distribution was characterised by positive numbers.
The range notation indicates 95% confidence intervals, not the minima and maxima. If the lower bounds are close enough to zero (and the interval is large enough), then there may some residual probability mass associated with negative values of the variable.
Really cool! On iOS there's a noticeable delay when clicking the buttons and clicking the backspace button quickly zooms the page so it's very hard to use. Would love it in mobile friendly form!
I love this! As a tool for helping folks with a good base in arithmetic develop statistical intuition, I can't think offhand of what I've seen that's better.
It's hard for me to imagine _dividing_ by -1~1 in a real-world scenario, but let's say we divide by 0~10, which also includes zero. For example, we are dividing the income between 0 to 10 shareholders (still forced, but ok).
Clearly, it's possible to have a division by zero here, so "0 shareholders would each get infinity". And in fact, if you try to compute 500 / 0, or even 500~1000 / 0, it will correctly show infinity.
But if you divide by a range that merely _includes_ zero, I don't think it should give you infinity. Ask yourself this: does 95% of results of 500 / 0~10 become infinity?
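A quick numeric check of that question, assuming numpy and the same "range as normal" convention (0~10 becomes a normal with mean 5 and sigma 2.5):

```
import numpy as np

rng = np.random.default_rng()
d = rng.normal(5, 2.5, 250_000)   # 0~10 as a normal
result = 500 / d
# A small fraction of draws sit near (or below) zero and blow up,
# but the central 95% of the outputs stays finite.
print(np.percentile(result, [2.5, 97.5]))   # roughly [40, 600]
print(np.mean(np.abs(result) > 1e4))        # share of extreme draws: tiny
```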
See also Guesstimate https://getguesstimate.com. Strengths include treating label and data as a unit, a space for examining the reasoning for a result, and the ability to replace an estimated distribution with sample data => you can build a model and then refine it over time. I'm amazed Excel and Google Sheets still haven't incorporated these things, years later.
i like it and i skimmed the post but i don't understand why the default example 100 / 4~6 has a median of 20? there is no way of knowing why the range is between 4 and 6
mmh ok thanks, i guess i need extra maths training ;)
i didn't mean knowing _that_ the range is between 4 and 6 but _why_, i thought the weighting would be explained by the reasoning, like: "we divide a €100 bill between possibly 4, rather 5 and most probably not 6 persons"
> Range is always a normal distribution, with the lower number being two standard deviations below the mean, and the upper number two standard deviations above. Nothing fancier is possible, in terms of input probability distributions.
There's an amazing scene in "This is Spinal Tap" where Nigel Tufnel had been brainstorming a scene where Stonehenge would be lowered from above onto the stage during their performance, and he does some back-of-the-envelope calculations which he gives to the set designer. Unfortunately, he mixes up the symbol for feet with the symbol for inches, leading to the following:
I like this!
In the grand HN tradition of being triggered by a word in the post and going off on a not-quite-but-basically-totally-tangential rant:
There are (at least) three areas here that are footguns with these kinds of calculations:
1) 95% is usually a lot wider than people think - people take 95% as “I’m pretty sure it’s this,” whereas it’s really closer to “it’d be really surprising if it were not this” - by and large people keep their mental error bars too close.
2) probabilities are rarely truly uncorrelated - call this the “Mortgage Derivatives” maxim. In the family example, rent is very likely to be correlated with food costs - so, if rent is high, food costs are also likely to be high. This skews the distribution - modeling with an unweighted uniform distribution will lead to you being surprised at how improbable the actual outcome was.
3) In general, normal distributions are rarer than people think - they tend to require some kind of constraining factor on the values to enforce them. We see them a bunch in nature because there tend to be negative feedback loops all over the place, but once you leave the relatively tidy garden of Mother Nature for the chaos of human affairs, normal distributions get pretty abnormal.
I like this as a tool, and I like the implementation, I’ve just seen a lot of people pick up statistics for the first time and lose a finger.
I strongly agree with this, and particularly point 1. If you ask people to provide estimated ranges for answers that they are 90% confident in, people on average produce roughly 30% confidence intervals instead. Over 90% of people don't even get to 70% confidence intervals.
You can test yourself at https://blog.codinghorror.com/how-good-an-estimator-are-you/.
From link:
> Heaviest blue whale ever recorded
I don't think estimation errors regarding things outside of someone's area of familiarity say much.
You could ask a much "easier" question from the same topic area and still get terrible answers: "What percentage of blue whales are blue?" Or just "Are blue whales blue?"
Estimating something often encountered but uncounted seems like a better test. Like how many cars pass in front of my house every day. I could apply arithmetic, soft logic and intuition to that. But that would be a difficult question to grade, given it has no universal answer.
I have no familiarity with blue whales but I would guess they're 1--5 times the mass of lorries, which I guess weigh like 10--20 cars which I in turn estimate at 1.2--2 tonnes, so primitively 12--200 tonnes for a normal blue whale. This also aligns with it being at least twice as large as an elephant, something I estimate at 5 tonnes.
The question asks for the heaviest, which I think cannot be more than three times the normal weight, and probably no less than 1.3. That lands me at 15--600 tonnes using primitive arithmetic. The calculator in OP suggests 40--320.
The real value is apparently 170, but that doesn't really matter. The process of arriving at an interval that is as wide as necessary but no wider is the point.
Estimation is a skill that can be trained. It is a generic skill that does not rely on domain knowledge beyond some common sense.
I would say general knowledge in many domains may help with this as you can try and approximate to the nearest thing you know from that domain.
How you get good at being a generalist is the tricky part, my best bet is reading and doing a lot of trivia (I found crosswords to be somewhat effective at this, but far from being efficient)
I guess people didn't realise they are allowed to, and in fact are expected to, put very wide ranges for things they are not certain about.
So the context of the quiz is software estimation, where I assume it's an intentional parable of estimating something you haven't seen before. It's trying to demonstrate that your "5-7 days" estimate probably represents far more certainty than you intended.
For some of these, your answer could span orders of magnitude. E.g. my answer for the heaviest blue whale would probably be 5-500 tons because I don't have a good concept of things that weigh 500 tons. The important point is that I'm right around 9 times in 10, not that I had a precise estimate.
I don't know, an estimate spanning three orders of magnitude doesn't seem useful.
To continue your example of 5-7 days, it would turn into an estimate of 5-700 days. So somewhere between a week and two years. And fair enough, whatever you're estimating will land somewhere in between. But how do I proceed from there with actual planning or budget?
> But how do I proceed from there with actual planning or budget?
You make up the number you wanted to hear in the first place that ostensibly works with the rest of the schedule. That’s why engineering estimates are so useless - it’s not that they’re inaccurate or unrealistic - it’s that if we insisted on giving them realistic estimates we’d get fired and replaced by someone else who is willing to appease management and just kick the can down the road a few more weeks.
Your question is akin to asking ‘how do I make the tail wag the dog?’
Your budget should be allocated for say 80% confidence (which the tool helpfully provides behind a switch) and your stakeholders must be on board with this. It shouldn’t be too hard to do since everyone has some experience with missed engineering deadlines. (Bezos would probably say 70% or even less.)
I mean it's no less useful than a more precise, but less certain estimate. It means you either need to do some work to improve your certainty (e.g. in the case of this quiz, allow spending more than 10 minutes or allow research) or prepare for the possibility that it's 700 days.
Edit: And by the way given a large enough view, estimates like this can still be valuable, because when you add these estimates together the resulting probability distribution narrows considerably. e.g. at just 10 tasks of this size, you get a 95% CI of 245~460 per task. At 20, 225~430 per task.
Note that this is obviously reductive as there's no way an estimate of 5-700 would imply a normal distribution centred at 352.5, it would be more like a logarithmic distribution where the mean is around 10 days. And additionally, this treats each task as independent...i.e. one estimate being at the high end wouldn't mean another one would be as well.
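For what it's worth, the narrowing claim above can be reproduced with the same normal-distribution assumption the calculator makes (which, as noted, is itself questionable for a 5-700 estimate):

```
import numpy as np

rng = np.random.default_rng()
n_tasks, n_sims = 10, 250_000
# Each task: 5~700 days read as a normal with 95% of its mass in that range
tasks = rng.normal(352.5, 173.75, (n_sims, n_tasks))
per_task = tasks.sum(axis=1) / n_tasks
print(np.percentile(per_task, [2.5, 97.5]))   # roughly [245, 460]
```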
It shouldn't matter how familiar you are with the question. If you're pretty familiar, give a narrow 90% credence interval. If you're unfamiliar, give a wide interval.
This jibes with my general reaction to the post, which was that the added complexity and difficulty of reasoning about the ranges actually made me feel less confident in the result of their example calculation. I liked the $50 result; you can tack on a plus or minus range but generally feel like you're about breakeven. On the other hand, "95% sure the real balance will fall into the -$60 to +$220 range" feels like it's creating a false sense of having more concrete information when you've really just added compounding uncertainties at every step (if we don't know that each one is definitely 95%, or the true min/max, we're just adding more guesses to be potentially wrong about). That's why I don't like the Drake equation: every step is just compounding wild-ass guesses, is it really producing a useful number?
It is producing a useful number. As more truly independent terms are added, error grows with the square root while the point estimation grows linearly. In the aggregate, the error makes up less of the point estimation.
This is the reason Fermi estimation works. You can test people on it, and almost universally they get more accurate with this method.
If you got less certain of the result in the example, that's probably a good thing. People are default overconfident with their estimated error bars.
I read a bit on Fermi estimation, and I'm not quite sure exactly what the "method" is in contrast to a less accurate method; it's basically just getting people to think in terms of dimensional analysis? This passage from Wikipedia is interesting:
> By contrast, precise calculations can be extremely complex but with the expectation that the answer they produce is correct. The far larger number of factors and operations involved can obscure a very significant error, either in mathematical process or in the assumptions the equation is based on, but the result may still be assumed to be right because it has been derived from a precise formula that is expected to yield good results.
So the strength of it is in keeping it simple and not trying to get too fancy, with the understanding that it's just a ballpark/sanity check. I still feel like the Drake equation in particular has too many terms for which we don't have enough sample data to produce a reasonable guess. But I think this is generally understood and it's seen as more of a thought experiment.
> People are default overconfident with their estimated error bars.
You say this, but the top-level comment mentions people keep their error bars too close.
Sorry, my comment was phrased confusingly.
Being overconfident with error bars means placing them too close to the point estimation, i.e. the error bars are too narrow.
Ah right thanks, I read that backwards.
They mean the same thing. The original comment pointed out that people’s qualitative description and mental model of the 95% interval means they are overconfident… they think 95% means ‘pretty sure I’m right’ rather than ‘it would be surprising to be wrong’
I think the point is to create uncertainty, though, or to at least capture it. You mention tacking a plus/minus range to $50, but my suspicion is that people's expected plus/minus would be narrower than the actual - I think the primary value of the example is that it makes it clear there's a very real possibility of the outcome being negative, which I don't think most people would acknowledge when they got the initial positive result. The increased uncertainty and the decreased confidence in the result is a feature, not a bug.
I did a project with non-technical stakeholders modeling likely completion dates for a big Gantt chart. Business stakeholders wanted probabilistic task completion times because some of the tasks were new and impractical to quantify with fixed times.
Stakeholders really liked specifying work times as t_i ~ PERT(min, mode, max) because it mimics their thinking and handles typical real-world asymmetrical distributions.
[Background: PERT is just a re-parameterized beta distribution that's more user-friendly and intuitive https://rpubs.com/Kraj86186/985700]
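A minimal sketch of how PERT(min, mode, max) maps to a beta distribution (the standard modified-PERT form with shape parameter lambda = 4; the exact parameterization in the linked write-up may differ):

```
import numpy as np

rng = np.random.default_rng()

def pert_sample(low, mode, high, lam=4.0, size=10_000):
    # Modified PERT: a beta distribution rescaled to [low, high],
    # with the peak at `mode` and spread controlled by `lam`.
    alpha = 1 + lam * (mode - low) / (high - low)
    beta = 1 + lam * (high - mode) / (high - low)
    return low + (high - low) * rng.beta(alpha, beta, size)

# e.g. a task estimated as "best case 2, most likely 3, worst case 8 weeks"
t = pert_sample(2, 3, 8)
print(t.mean(), np.percentile(t, [5, 95]))
```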
This looks like a much more sophisticated version of PERT than I have seen used. When people around me have claimed to use PERT, they have just added together all the small numbers, all the middle numbers, and all the big numbers. That results in a distribution that is too extreme in both lower and upper bound.
That... is not PERT. It's 'I read a tweet about three-point estimates', and I'm using a generous interpretation of 'read'.
Arguably this is how it should always be done; fixed durations for any tasks are little more than wishful thinking.
>rent is very likely to be correlated with food costs - so, if rent is high, food costs are also likely to be high
Not sure I agree with this. It's reasonable to have a model where the mean rent may be correlated with the mean food cost, but given those two parameters we can model the fluctuations about the mean as uncorrelated. In any case at the point when you want to consider something like this you need to do proper Bayesian statistics anyways.
>In general normal distributions are rarer than people think - they tend to require some kind of constraining factor on the values to enforce.
I don't know where you're getting this from. One needs uncorrelated errors, but this isn't a "constraint" or "negative feedback".
The family example is a pat example, but take something like project planning - two tasks, each one takes between 2 and 4 weeks - except that they’re both reliant on Jim, and if Jim takes the “over” on task 1, what’s the odds he takes the “under” on task 2?
This is why I joked about it as the mortgage derivatives maxim - what happened in 2008 (mathematically, at least - the parts of the crisis that aren’t covered by the famous Upton Sinclair quote) was that the mortgage backed derivatives were modeled as an aggregate of a thousand uncorrelated outcomes (a mortgage going bust), without taking into account that at least a subset of the conditions leading to one mortgage going bust would also lead to a separate unrelated mortgage going bust - the results were not uncorrelated, and treating them as such meant the “1 in a million” outcome was substantially more likely in reality than the model allowed.
Re: negative feedback - that’s a separate point from the uncorrelated errors problem above, and a critique of using the normal distribution at all for modeling many different scenarios. Normal distributions rely on some kind of, well, normal scattering of the outcomes, which means there’s some reason why they’d tend to clump around a central value. We see it in natural systems because there’s some constraints on things like height and weight of an organism, etc, but without some form of constraint, you can’t rely on a normal distribution - the classic examples being wealth, income, sales, etc, where the outliers tend to be so much larger than average that they’re effectively precluded by a normal distribution, and yet there they are.
To be clear, I’m not saying there are not statistical methods for handling all of the above, I’m noting that the naive approach of modeling several different uncorrelated normally distributed outcomes, which is what the posted tool is doing, has severe flaws which are likely to lead to it underestimating the probability of outlier outcomes.
Normal distributions are the maximum entropy distributions for a given mean and variance. Therefore, in accordance with the principle of maximum entropy, unless you have some reason to not pick a normal distribution (e.g. you know your values must be non-negative), you should be using a normal distribution.
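For reference, the maximum-entropy statement being invoked (a standard result, stated here for completeness):

```
\max_{p} H[p] = -\int p(x)\,\ln p(x)\,dx
\quad \text{s.t.} \quad \int p(x)\,dx = 1, \quad \mathbb{E}[X] = \mu, \quad \mathrm{Var}[X] = \sigma^2

p^*(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad H[p^*] = \tfrac{1}{2}\ln(2\pi e \sigma^2)
```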
At least also accept a log-normal distribution. Sometimes you need a factor like .2 ~ 5, but that isn't the same as N(2.6, 1.2).
> you should be using a normal distribution.
...if the only things you know about an uncertain value are its expectation and variance, yes.
Often you know other things. Often you don't know expectation and variance with any certainty.
> I’ve just seen a lot of people pick up statistics for the first time and lose a finger.
I love this. I've never thought of statistics like a power tool or firearm, but the analogy fits really well.
Unfortunately it's usually someone else who loses a finger, not the person wielding the statistics.
I think to do all that you’d need a full on DSL rather than something pocket calculator like. I think adding a triangular distribution would be good though.
Great points. I think the idea of this calculator could just be simply extended to specific use cases to make the statistical calculation simple and take into account additional variables. Moving being one example.
Actually using it already after finding it a few days ago on HN.
> 2) probability is rarely truly uncorrelated
Without having fully digested how the Unsure Calculator computes, it seems to me you could perhaps "weight" the ranges you pass to the calculator. Rather than a standard bell curve the Calculator could apply a more tightly focused — or perhaps skewed curve for that term.
If you think your salary will be in the range of 10 to 20, but more likely closer to 10 you could:
10<~20 (not to be confused with less-than)
or: 10!~20 (not to be confused with factorial)
or even: 10~12~20 to indicate a range of 10 to 20 ... leaning toward 12.
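The calculator doesn't support any of these, but a three-point range like 10~12~20 could be backed by something as simple as a triangular distribution (a sketch using only the Python standard library; the interpretation is my own assumption):

```
import random

# Interpret 10~12~20 as a triangular distribution: min 10, mode 12, max 20.
samples = sorted(random.triangular(10, 20, 12) for _ in range(250_000))
print(samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))])
```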
The correlation in this case isn't about the distribution for the individual event, it's about the interactions between them - so, for instance, Rent could be anywhere between 1200 and 1800, and Food could be anywhere between 100 and 150, but if Rent is 1200, it means Food is more likely to be 100, and if Food is 150, it means Rent is more likely to be 1800. Basically, there's a shared factor that's influencing both (local cost of living) that's the actual thing you need to model.
So, a realistic modeling isn't 1200~1800 + 100~150, it's (1~1.5)*(1200 + 100) - the "cost of living" distribution applies to both factors.
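A quick way to see the difference, assuming numpy: model rent and food independently, then model them again with a single shared cost-of-living multiplier. The shared-factor version spreads out more, because the errors move together instead of partially cancelling.

```
import numpy as np

rng = np.random.default_rng()
n = 250_000

# Independent model: rent 1200~1800 plus food 100~150, each its own normal
independent = rng.normal(1500, 150, n) + rng.normal(125, 12.5, n)

# Shared-factor model: one cost-of-living multiplier (1~1.5) applied to both
col = rng.normal(1.25, 0.125, n)
correlated = col * (1200 + 100)

print(np.percentile(independent, [2.5, 97.5]))
print(np.percentile(correlated, [2.5, 97.5]))   # wider than the independent sum
```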
I have written similar tools
- for command line, fermi: https://git.nunosempere.com/NunoSempere/fermi
- for android, a distribution calculator: https://f-droid.org/en/packages/com.nunosempere.distribution...
People might also be interested in https://www.squiggle-language.com/, which is a more complex version (or possibly <https://git.nunosempere.com/personal/squiggle.c>, which is a faster but much more verbose version in C)
Fermi in particular has the following syntax
```
5M 12M # number of people living in Chicago
beta 1 200 # fraction of people that have a piano
30 180 # minutes it takes to tune a piano, including travel time
/ 48 52 # weeks a year that piano tuners work for
/ 5 6 # days a week in which piano tuners work
/ 6 8 # hours a day in which piano tuners work
/ 60 # minutes to an hour
```
Multiplication is implied as the default operation; fits are lognormal.
Here is a thread with some fun fermi estimates made with that tool: e.g., number of calories NK gets from Russia: https://x.com/NunoSempere/status/1857135650404966456
```
900K 1.5M # tonnes of rice per year NK gets from Russia
* 1K # kg in a tonne
* 1.2K 1.4K # calories per kg of rice
/ 1.9K 2.5K # daily caloric intake
/ 25M 28M # population of NK
/ 365 # years of food this buys
/ 1% # as a percentage
```
Oh, this is very similar to what I have with Precel, less syntax. Thanks for sharing!
Another tool in this spirit is <https://carlo.app/>, which allows you to do this kind of calculation on google sheets.
Their pricing is absolutely out of this world though. Their BASIC plan is $2990 USD per year, the pro plan is $9990/year. https://carlo.app/pricing
They have a free tier as well, just with fewer samples, and aren't in the zero marginal cost regime
Would be a nice touch if Squiggle supported the `a~b` syntax :^)
I tried the unsure calc and the android app and they seem to produce different results?
The Android app fits lognormals, and 90% rather than 95% confidence intervals. I think they are a more parsimonious distribution for doing these kinds of estimates. One hint might be that, per the central limit theorem, sums of independent variables will tend to normals, which means that products will tend to lognormals; and in the decompositions for which quick estimates are most useful, multiplications are more common.
Is there a way to do non-scalar multiplication? E.g if I want to say "what is the sum of three dice rolls" (ignoring the fact that that's not a normal distro) I want to do 1~6 * 3 = 1~6 + 1~6 + 1~6 = 6~15. But instead it does 1~6 * 3 = 3~18. It makes it really difficult to do something like "how long will it take to complete 1000 tasks that each take 10-100 days?"
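The distinction being asked for, in Monte Carlo terms: 3 * X scales a single draw, while X1 + X2 + X3 sums three independent draws and comes out narrower. A sketch assuming numpy and the 2-sigma range convention:

```
import numpy as np

rng = np.random.default_rng()
n = 250_000

scaled = 3 * rng.normal(3.5, 1.25, n)                # 1~6 drawn once, times 3
summed = rng.normal(3.5, 1.25, (n, 3)).sum(axis=1)   # three independent 1~6 draws

print(np.percentile(scaled, [2.5, 97.5]))   # roughly [3, 18]
print(np.percentile(summed, [2.5, 97.5]))   # narrower, roughly [6, 15]
```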
https://www.getguesstimate.com/ is this, as a spreadsheet
This is neat! If you enjoy the write up, you might be interested in the paper “Dissolving the Fermi Paradox” which goes even more on-depth into actually multiplying the probability density functions instead of the common point estimates. It has the somewhat surprising result that we may just be alone.
https://arxiv.org/abs/1806.02404
This was quite a fun read, thanks!
a bit depressing TBH... but ~everyone on this site should read this for the methodology
I have made a similar tool but for the command line[1] with similar but slightly more ambitious motivation[2].
I really like that more people are thinking in these terms. Reasoning about sources of variation is a capability not all people are trained in or develop, but it is increasingly important.[3]
[1]: https://git.sr.ht/~kqr/precel
[2]: https://entropicthoughts.com/precel-like-excel-for-uncertain...
[3]: https://entropicthoughts.com/statistical-literacy
The ASCII art (well technically ANSI art) histogram is neat. Cool hack to get something done quickly. I'd have spent 5x the time trying various chart libraries and giving up.
On a similar note, I like the crude hand-drawn illustrations a lot. Fits the "napkin" theme.
Here [1] is a nice implementation written in Awk. A bit rough around the edges, but could be easily extended.
[1] https://github.com/stefanhengl/histogram
Would be nice to retransform the output into an interval / gaussian distribution
The Drake Equation, or any equation multiplying probabilities, can also be seen in log space, where the uncertainty is on the log scale of each probability, and the final probability is the exponential of the sum of the log probabilities. And we wouldn't have this negative issue.

The default example `100 / 4~6` gives the output `17~25`.
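A sketch of that log-space idea, assuming numpy: fit each strictly-positive range as a lognormal (i.e. sample its logarithm as a normal), and the product can never go negative. The parameter mapping here is my own assumption, not what the Unsure Calculator does.

```
import numpy as np

rng = np.random.default_rng()

def lognormal_range(lo, hi, n=250_000):
    # Treat lo~hi as a 95% interval of a lognormal:
    # normal in log space, then exponentiate, so samples stay positive.
    mu = (np.log(lo) + np.log(hi)) / 2
    sigma = (np.log(hi) - np.log(lo)) / 4
    return np.exp(rng.normal(mu, sigma, n))

# Product of two strictly-positive Drake-style factors
product = lognormal_range(0.1, 0.4) * lognormal_range(304, 10_000)
print(np.percentile(product, [2.5, 97.5]))
print((product <= 0).any())   # False: no negative outcomes
```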
Amazing, thank you !
It sounds like a gimmick at first, but looks surprisingly useful. I'd surely install it if it was available as an app to use alongside my usual calculator, and while I cannot quite recall a situation when I needed it, it seems very plausible that I'll start finding use cases once I have it bound to some hotkey on my keyboard.
They use Dart as their primary language so it should be easy to make a Flutter app from it...
> if it was available as an app
Consider https://f-droid.org/en/packages/com.nunosempere.distribution...
Feature request: allow specifying the probability distribution. E.g.: ‘~’: normal, ‘_’: uniform, etc.
I think they should be functions: G(50, 1) for a Gaussian with µ=50, σ=1; N(3) for a negative exponential with λ=3; U(0, 1) for a uniform distribution between 0 and 1; UI(1, 6) for a uniform integer distribution from 1 to 6, etc. Seems much more flexible, and easier to remember.
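A rough sketch of what such constructors could look like on top of a Monte Carlo backend (the names follow the suggestion above; this is hypothetical, not the calculator's actual syntax):

```
import numpy as np

rng = np.random.default_rng()
N_SAMPLES = 250_000

dists = {
    "G":  lambda mu, sigma: rng.normal(mu, sigma, N_SAMPLES),   # Gaussian
    "N":  lambda lam: rng.exponential(1 / lam, N_SAMPLES),      # negative exponential
    "U":  lambda a, b: rng.uniform(a, b, N_SAMPLES),            # uniform
    "UI": lambda a, b: rng.integers(a, b + 1, N_SAMPLES),       # uniform integer
}

two_dice = dists["UI"](1, 6) + dists["UI"](1, 6)
print(np.bincount(two_dice)[2:])   # counts for sums 2..12
```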
Not having this feature is a feature—they mention this.
Not really, or at least not permanently; uniform distribution is mentioned in a github changelog, perhaps it’s an upcoming feature:
> 0.4.0
> BRAKING: x~y (read: range from x to y) now means "flat distribution from x to y". Every value between x and y is as likely to be emitted.
> For normal distribution, you can now use x+-d, which puts the mean at x, and the 95% (2 sigma) bounds at distance d from x.
https://github.com/filiph/unsure/blob/master/CHANGELOG.md#04...
Also very very good is Guesstimate - https://www.getguesstimate.com/.
Interval/affine arithmetic are alternatives which do not make use of probabilities for these kinds of calculations.
https://en.wikipedia.org/wiki/Interval_arithmetic
I think arbitrary distribution choice is dangerous. You're bound to end up using lots of quantities that are integers, or positive only (for example). "Confidence" will be very difficult to interpret.
Does it support constraints on solutions? E.g. A = 3~10, B = 4 - A, B > 0
On the whole it seems like a nice idea, but there's a couple of weird things, such as:
> Note: If you're curious why there is a negative number (-5) in the histogram, that's just an inevitable downside of the simplicity of the Unsure Calculator. Without further knowledge, the calculator cannot know that a negative number is impossible (in other words, you can't have -5 civilizations, for example).
The input to this was "1.5~3 x 0.9~1.0 x 0.1~0.4 x 0.1~1.0 x 0.1~1.0 x 0.1~0.2 x 304~10000" - every single range was positive, so regardless of what this represents, it should be impossible to get a negative result.
I guess this is a consequence of "I am not sure about the exact number here, but I am 95% sure it's somewhere in this range" so it's actually considering values outside of the specified range. In this case, 10% either side of all the ranges is positive except the large "304~10000".
Trying with a simpler example: "1~2 x 1~2" produces "1.3~3.4" as a result, even though "1~4" seems more intuitive. I assume this is because the confidence of 1 or 4 is now only 90% if 1~2 was at 95%, but it still feels off.
I wonder if the 95% thing actually makes sense, but I'm not especially good at stats, certainly not enough to be sure how viable this kind of calculator is with a tighter range. But just personally, I'd expect "1~2" to mean "I'm obviously not 100% sure, or else I wouldn't be using this calculator, but for this experiment assume that the range is definitely within 1~2, I just don't know where exactly".
>The input to this was "1.5~3 x 0.9~1.0 x 0.1~0.4 x 0.1~1.0 x 0.1~1.0 x 0.1~0.2 x 304~10000" - every single range was positive, so regardless of what this represents, it should be impossible to get a negative result.
Every single range here, once interpreted as a normal distribution, includes both positive and negative numbers. To get the correct resulting distribution you have to take into account the entire input distribution. Every normal distribution assigns non-zero probability to negative values.
If you want to consider only the numbers inside the range you can look at interval arithmetic, but that does not give you a resulting distribution.
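To make that concrete for the `1~2 x 1~2` example above, here is a small numpy sketch contrasting the two views (my own approximation of the calculator's behaviour, with x~y read as a normal whose 95% band is [x, y]):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_range(lo, hi, n=250_000):
    # x~y as a normal with 95% (two sigma) of the mass between x and y
    return rng.normal((lo + hi) / 2, (hi - lo) / 4, n)

product = sample_range(1, 2) * sample_range(1, 2)
print(np.round(np.percentile(product, [2.5, 97.5]), 1))   # roughly 1.3 and 3.4

# Interval arithmetic instead tracks hard bounds with no probabilities attached:
print(1 * 1, 2 * 2)   # 1 4 -- the "intuitive" answer, but not a distribution
```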
The calculator in Emacs has support for what you're requesting, which it calls "interval forms". Interval form arithmetic essentially means executing the operations in parallel on both ends of the interval.
It also has support for "error forms", which is close to what the calculator in OP uses. That takes a little more sophistication than just performing operations on the lower and upper number in parallel. In particular, the given points don't represent hard endpoints of a distribution, but rather unlikely low and high values. Things more extreme than those can still happen; it's just rare.
> I'm not especially good at stats
It shows! All the things you complain about make perfect sense given a little more background knowledge.
Is it actually just doing it at both ends, or something more complex? Because, for example, if I did 7 - (-1~2)^2, the actual range would be 3-7, but just doing both ends of the interval would give 3-6, as the function is maximised inside the range (at x = 0, where the square is smallest).
Oh, maybe it's performing more complicated interval arithmetic. I had no idea. That's kind of cool!
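For the curious, a toy sketch (my own, not Emacs calc's actual implementation) of how a proper interval square differs from naively squaring the endpoints:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __rsub__(self, c):
        # c - [lo, hi] = [c - hi, c - lo]
        return Interval(c - self.hi, c - self.lo)

    def square(self):
        # Squaring is not monotonic: if the interval straddles zero,
        # the minimum of x^2 is 0, not one of the squared endpoints.
        cands = [self.lo ** 2, self.hi ** 2]
        lo = 0.0 if self.lo <= 0 <= self.hi else min(cands)
        return Interval(lo, max(cands))

x = Interval(-1, 2)
print(7 - x.square())   # Interval(lo=3, hi=7.0): the true range 3..7, not 3..6
```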
> every single range was positive, so regardless of what this represents, it should be impossible to get a negative result.
They explain that the range you give as input is seen as only being 95% correct, so the calculator adds low-probability values outside of the ranges you specified.
I can see how that surprises you, but it's also a defensible design choice.
I perused the codebase but I'm unfamiliar with Dart:
https://github.com/filiph/unsure/blob/master/lib/src/calcula...
I assume this is a Monte Carlo approach? (Not to start a flamewar, at least for us data scientists :) ).
Yes it is.
Can you explain how? I'm an (aspiring)
I didn't peruse the source code. I just read the linked article in its entirety and it says
> The computation is quite slow. In order to stay as flexible as possible, I'm using the Monte Carlo method. Which means the calculator is running about 250K AST-based computations for every calculation you put forth.
So therefore I conclude Monte Carlo is being used.
It's dead simple. Here is the simplified version that returns the quantiles for '100 / 2 ~ 4'.
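The snippet itself doesn't appear to have survived the copy here; as a rough stand-in (in Python rather than the project's Dart, and with the 2~4 range read as a normal whose 95% band spans 2 to 4), the quantile computation could look like this:

```python
import random

def sample():
    # '2 ~ 4': mean 3, with the 95% band (two sigma) at +/- 1
    divisor = random.gauss(3, 0.5)
    return 100 / divisor

runs = sorted(sample() for _ in range(250_000))
quantiles = [runs[int(len(runs) * q)] for q in (0.025, 0.5, 0.975)]
print(quantiles)   # roughly [25, 33, 50]
```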
Lines 19 to 21 should be the Monte Carlo sampling algorithm. The implementation is maybe a bit unintuitive, but apparently he creates a function from the expression in the calculator; calling that function returns one random sample of the expression's value.
Smol Show HN thread a few years ago https://news.ycombinator.com/item?id=22630600
I put "1 / (-1~1)" and expected something around -infinity to +infinity. It instead gave me -35~35.
I really don't know how good it is.
I'm guessing this is not an error. If you divide 1/normal(0,1), the full distribution would range from -inf to inf, but the 95% output doesn't have to.
I don't quite understand, probably because my math isn't good enough.
If you're treating -1~1 as a normal distribution, then it's centered on 0. If you're working out the answer using a Monte Carlo simulation, then you're going to be testing out different values from that distribution, right? And aren't you going to be more likely to test values closer to 0? So surely the most likely outputs should be far from 0, right?
When I look at the histogram it creates, it varies by run, but the most common output seems generally closest to zero (and sometimes is exactly zero). Wouldn't that mean it's most frequently picking denominator values closest to -1 or 1?
If X is normal and centered around 0, then the average of 1/X does not exist (math speak for "is infinity" in this case). In these cases Monte Carlo simulations are not reliable because they give high variance estimates (math speak for "the histogram varies run by run").
The actual distribution of 1/X is fairly interesting, see https://en.m.wikipedia.org/wiki/Inverse_distribution#Recipro...
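A quick way to see that instability in practice (a small sketch; sigma = 0.5 is my reading of -1~1 as a 95% interval):

```python
import numpy as np

rng = np.random.default_rng()
for run in range(3):
    x = rng.normal(0.0, 0.5, 250_000)   # -1~1 as a 95% interval
    y = 1.0 / x
    # The central percentiles are comparatively stable; the mean is not.
    print(np.round(np.percentile(y, [2.5, 97.5]), 1), round(y.mean(), 2))
```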
Only about 1 percent of values would end up being 100+ in magnitude with a uniform denominator, since that requires |x| < 0.01.
For a normal it is higher, but maybe not much more so.
OK, but do we necessarily just care about the central 95% range of the output? This calculation has the weird property that values in the tails of the input correspond to values in the middle of the output, and vice versa. If you follow the intuition that the range you specify in the input corresponds to the values you expect to see, the corresponding outputs would really include -inf and inf.
Now I'm realizing that this doesn't actually work, and even in more typical calculations the input values that produce the central 95% of the output are not necessarily drawn from the 95% CIs of the inputs. Which is fine and makes sense, but this example makes it very obvious how arbitrary it is to just drop the lowermost and uppermost 2.5%s rather than choosing any other 95/5 partition of the probability mass.
That may be true, but if you look at the distribution it puts out for this, it definitely smells funny. It looks like a very steep normal distribution, centered at 0 (ish). Seems like it should have two peaks? But maybe those are just getting compressed into one because of resolution of buckets?
It does indeed have two peaks: https://en.m.wikipedia.org/wiki/Inverse_distribution#Recipro...
As the mean of this distribution does not exist (it's "infinite") the Monte Carlo estimates aren't reliable
https://qalculate.github.io can do this also for as long as I've used it (only a couple years to be fair). I've got it on my phone, my laptop, even my server with apt install qalc. Super convenient, supports everything from unit conversion to uncertainty tracking
The histogram is neat, I don't think qalc has that. On the other hand, it took 8 seconds to calculate the default (exceedingly trivial) example. Is that JavaScript, or is the server currently very busy?
It's all computed in the browser so yeah, it's JavaScript. Still, 8 seconds is a lot -- I was targeting sub-second computation times (which I find alright).
Yes! (5±6)*(9±12) => 45±81. Uncertainty propagation!
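For reference, that ±81 is exactly what first-order (linear) error propagation for a product gives, assuming that's roughly what qalc does under the hood:

```python
import math

a, sa = 5, 6      # 5 ± 6
b, sb = 9, 12     # 9 ± 12
mean = a * b
sigma = math.sqrt((b * sa) ** 2 + (a * sb) ** 2)   # first-order propagation for a*b
print(mean, round(sigma))   # 45 81
```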
This is awesome. I used Causal years ago to do something similar, with perhaps slightly more complex modelling, and it was great. Unfortunately the product was targeted at high paying enterprise customers and seems to have pivoted into finance now, I've been looking for something similar ever since. This probably solves at least, err... 40~60% of my needs ;)
I actually stumbled upon this a while ago from social media and the web version has a somewhat annoying latency, so I wrote my own version in Python. It uses numpy so it's faster. https://gist.github.com/kccqzy/d3fa7cdb064e03b16acfbefb76645... Thank you filiph for this brilliant idea!
Nice! Are you using your python script often?
The reason I'm asking: unsure also has a CLI version (which is leaps and bounds faster and in some ways easier to use) but I rarely find myself using it. (Nowadays, I use https://filiph.github.io/napkin/, anyway, but it's still a web app rather than a CLI tool.)
Yes. I have Python on my phone so I just run it.
Love it! I too have been toying with reasoning about uncertainty. I took a much less creative approach though and just ran a bunch of geometric brownian motion simulations for my personal finances [0]. My approach has some similarity to yours, though much less general. It displays the (un)certainty over time (using percentile curves), which was my main interest. Also, man, the UI, presentation, explanations: you did a great job, pretty inspiring.
[0] https://dmos62.github.io/personal-financial-growth-simulator...
Very cool. This can also be used for LLM cost estimation. Basically any cost estimation I suppose. I use cloudflare workers a lot and have a few workers running for a variable amount of time. This could be useful to calculate a ball park figure of my infra cost. Thank you!
Here (https://uncertainty.nist.gov/) is another similar Monte Carlo-style calculator designed by the statisticians at NIST. It is intended for propagating uncertainties in measurements and can handle various different assumed input distributions.
I think I was looking at this and several other similar calculators when creating the linked tool. This is what I mean when I say "you'll want to use something more sophisticated".
The problem with similar tools is the very high barrier to entry. This is what my project was trying to address, though imperfectly (the user still needs to understand, at the very least, the concept of probability distributions).
The histogram is great, nice work.
I want to ask about adjacent projects - user interface libraries that provide input elements for entering ranges and approximate values. I'm starting my search around https://www.inkandswitch.com/ and https://malleable.systems/catalog/ but I think our collective memory has seen more examples.
Yooo I already installed this on my Home Screen and used it like 20 times, great job, it’s so simple and genius!
If I am reading this right, a range is expressed as a distance between the minimum and maximum values, and in the Monte Carlo part a number is generated from a uniform distribution within that range[1].
But if I just ask the calculator "1~2" (i.e. just a range without any operators), the histogram shows what looks like a normal distribution centered around 1.5[2].
Shouldn't the histogram be flat if the distribution is uniform?
[1] https://github.com/filiph/unsure/blob/123712482b7053974cbef9...
[2] https://filiph.github.io/unsure/#f=1~2
Under the "Limitations" section:
> Range is always a normal distribution, with the lower number being two standard deviations below the mean, and the upper number two standard deviations above. Nothing fancier is possible, in terms of input probability distributions.
Part of the confusion here is likely that the tool, as seen on the web, probably lags significantly behind the code. I've started using a related but different tool (https://filiph.github.io/napkin/).
The HN mods gave me an opportunity to resubmit the link, so I did. If I had more time, I'd have also upgraded the tool to the latest version and fixed the wording. But unfortunately, I didn't find the time to do this.
Apologies for the confusion!
Cool. It would be great to extend with a confidence operator. Something like:
Without default confidence: 0~9
With confidence: 0%100~9%95
We are sure it is 0 or more, and we are 95% certain it is 9 or less.
Would that work?
This reminds me of https://www.getguesstimate.com/ , a probabilistic spreadsheet.
The authors of Guesstimate are now working on https://www.squiggle-language.com/
Someone also turned it into the https://github.com/rethinkpriorities/squigglepy python library
Wow, this is fantastic! I did not know about squiggle language, and it's basically what I was trying to get to from my unsure calculator through my next project (https://filiph.github.io/napkin/). Squiggle looks and works much better.
Thanks for the link!
I was looking for this. Seen it (or a similar tool) ages ago.
Want to use it every 3 months or so to pretend that we know what we can squeeze in the roadmap for the quarter.
An alternative approach is using fuzzy numbers. If evaluated with interval arithmetic, you can do very long calculations involving uncertain numbers very quickly and with strong mathematical guarantees.
It would drastically outperform the Monte Carlo approach in particular.
This assumes the inputs are uniform distributions, or perhaps normals depending on what exactly fuzzy numbers mean. M-C is not so limited.
No. It assumes the numbers aren't random at all.
Although fuzzy numbers can be used to model many different kinds of uncertainty.
I'm familiar with fuzzy numbers (e.g. see my https://filiph.net/fuzzy/ toy) but I didn't know there's arithmetic with fuzzy numbers. How is it done? Do you have a link?
There is a book by Hanss on it. It focuses on the sampling approach (he calls it "transformation method") though.
If you want to do the arithmetic directly rather than treat it as a black box, you just have to realize that you can perform the operations on the alpha-cuts with ordinary interval arithmetic. Then you can evaluate arbitrary expressions involving fuzzy numbers, keeping the strengths and weaknesses of interval arithmetic.
The sampling-based approach is very similar to Monte Carlo, but you sample at certain well-defined points.
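A toy sketch of the alpha-cut idea for triangular fuzzy numbers (my own illustration, only addition shown; not code from the Hanss book):

```python
def alpha_cut(tri, alpha):
    """Interval of a triangular fuzzy number (lo, peak, hi) at membership level alpha."""
    lo, peak, hi = tri
    return (lo + alpha * (peak - lo), hi - alpha * (hi - peak))

def fuzzy_add(a, b, levels=(0.0, 0.5, 1.0)):
    # Ordinary interval addition, applied to each alpha-cut in turn;
    # the result is again a fuzzy number described by its alpha-cuts.
    out = {}
    for alpha in levels:
        (alo, ahi), (blo, bhi) = alpha_cut(a, alpha), alpha_cut(b, alpha)
        out[alpha] = (alo + blo, ahi + bhi)
    return out

print(fuzzy_add((1, 2, 3), (4, 6, 8)))
# {0.0: (5.0, 11.0), 0.5: (6.5, 9.5), 1.0: (8.0, 8.0)}
```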
This is really useful, but is this correct?
persons = 10~15 // → 10~15
budget = persons * 1~2 // → 12~27
Should it not say 10-30?
If they are truly independent of each other, some of the uncertainty cancels out. 10 people and a budget of $1/person are both unlikely events, and two unlikely events occurring independently of each other is even more unlikely. And because the calculator is not about the full range of possible values but about the values in the 95% confidence interval, the outer edges of the range now fall outside that interval (a quick simulation below illustrates this).
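A minimal simulation of that example, assuming (as the quoted limitation elsewhere in the thread says) that x~y is a normal with its 95% band at [x, y]:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_range(lo, hi, n=250_000):
    # x~y as a normal with 95% (two sigma) of the mass between x and y
    return rng.normal((lo + hi) / 2, (hi - lo) / 4, n)

budget = sample_range(10, 15) * sample_range(1, 2)
print(np.round(np.percentile(budget, [2.5, 97.5])))   # roughly 12 and 26-27, not 10 and 30
```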
Interesting. I like the notation and the histogram that comes out with the output. I also like the practical examples you gave (e.g. the application of the calculator to business and marketing cases). I will try it out with simple estimates in my marketing campaigns.
Cool! Some random requests to consider: Could the range x~y be uniform instead of 2 std dev normal (95.4%ile)? Sometimes the range of quantities is known. 95%ile is probably fine as a default though. Also, could a symbolic JS package be used instead of Monte-Carlo? This would improve speed and precision, especially for many variables (high dimensions). Could the result be shown in a line plot instead of ASCII bar chart?
”Without further knowledge, the calculator cannot know that a negative number is impossible (in other words, you can't have -5 civilizations, for example).”
Not true. If there are no negative terms, the equation cannot have negative values.
The calculator cannot know whether there are no negative terms. For example, if people's net worth is distributed 0.2–400, there's likely a significant chunk of people who are, on the whole, in debt. These will be represented as a negative term, even though their distribution was characterised by positive numbers.
The range notation indicates 95% confidence intervals, not the minima and maxima. If the lower bound is close enough to zero (and the interval is large enough), then there may be some residual probability mass associated with negative values of the variable.
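Concretely, taking the 0.2~400 example above as a normal with 95% of its mass between 0.2 and 400:

```python
from statistics import NormalDist

lo, hi = 0.2, 400
mu, sigma = (lo + hi) / 2, (hi - lo) / 4   # mean and sigma implied by the 95% band
print(NormalDist(mu, sigma).cdf(0))        # about 0.023: ~2% of draws come out negative
```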
Chalk also supports uncertainty : https://chachatelier.fr/chalk/chalk-features.php (combined with arbitrary long numbers and interval arithmetic)
This reminded me of this submission a few days ago: Napkin Math Tool[1].
[1]: https://news.ycombinator.com/item?id=43389455
So is it like plugging in a normal distribution into some arithmetic?
Consider maybe 1 + 1 ~ +-2 like Q factor, if you know what I mean.
That would help filter out more of the probabilistic noise when using it to reason with.
No. It is sampling the resulting distribution with Monte-Carlo.
I made one that's much faster because it transforms the normal distribution directly instead of drawing thousands of samples: https://gistpreview.github.io/?757869a716cfa1560d6ea0286ee1b...
This is more limited, though. I just tested it, and for one example, exponentiation seems not to be supported.
Really cool! On iOS there's a noticeable delay when clicking the buttons and clicking the backspace button quickly zooms the page so it's very hard to use. Would love it in mobile friendly form!
is this the same as error propagation? I used to do a lot of that during my physics degree
It doesn't propagate uncertainty through the computation, but rather treats the expression as a single random variable.
So is it 250k calculations for every approximation window? I guess it will only be able to handle 3-4 approximations comfortably?
Any reason why it was kept at 250k and not a lower number like 10k?
I love this! As a tool for helping folks with a good base in arithmetic develop statistical intuition, I can't think offhand of what I've seen that's better.
> The UI is ugly, to say the least.
I actually quite like it. Really clean, easy to see all the important elements. Lovely clear legible monospace serif font.
This is super cool.
It seems to break for ranges including 0 though
100 / -1~1 = -3550~3500
I think the most correct answer here is -inf~inf
I'd argue this is WAI (working as intended).
It's hard for me to imagine _dividing_ by -1~1 in a real-world scenario, but let's say we divide by 0~10, which also includes zero. For example, we are dividing the income between 0 to 10 shareholders (still forced, but ok).
Clearly, it's possible to have a division by zero here, so "0 shareholders would each get infinity". And in fact, if you try to compute 500 / 0, or even 500~1000 / 0, it will correctly show infinity.
But if you divide by a range that merely _includes_ zero, I don't think it should give you infinity. Ask yourself this: does 95% of results of 500 / 0~10 become infinity?
similar to guesstimate, which does the same but for spreadsheets: https://www.getguesstimate.com/
See also Guesstimate https://getguesstimate.com. Strengths include treating label and data as a unit, a space for examining the reasoning for a result, and the ability to replace an estimated distribution with sample data => you can build a model and then refine it over time. I'm amazed Excel and Google Sheets still haven't incorporated these things, years later.
Thank you, I would have mentioned this myself, but forgot the name of it.
I think the SWI Prolog clpBNR package is the most complete interval arithmetic system. It also supports arbitrary constraints.
https://github.com/ridgeworks/clpBNR
This is terrific and it’s tempting to turn into a little Python package. +1 for notation to say it’s ~20,2 to mean 18~22
i like it and i skimmed the post but i don't understand why the default example 100 / 4~6 has a median of 20? there is no way of knowing why the range is between 4 and 6
The chance of 4~6 being less than 5 is 50%, the chance of it being greater is also 50%. The median of 100/4~6 has to be 100/5.
>there is no way of knowing why the range is between 4 and 6
??? There is. It is the ~ symbol.
mmh ok thanks, i guess i need extra maths training ;)
i didn't mean knowing _that_ the range is between 4 and 6 but _why_, i thought the weighting would be explained by the reasoning, like: "we divide a €100 bill between possibly 4, more likely 5, and most probably not 6 persons"
how do you mean?
Great implementation. I would love to see this syntax added to spreadsheet software. Far less complicated than current functions.
cool! are all ranges considered Poisson distributions?
No:
> Range is always a normal distribution, with the lower number being two standard deviations below the mean, and the upper number two standard deviations above. Nothing fancier is possible, in terms of input probability distributions.
love it! gonna use this instead of calculating my own extremes now
brilliant work, polished ui. although it sometimes gives wrong ranges for equations like 100/1~(200~2000)
Can you elaborate? What is the answer you’re getting and what answer would you expect?
How do you process this equation ? 100 divided by something from one to ...?
> 100 / 4~6
Means "100 divided by some number between 4 and 6"
Yes, but this is not what op has. Their formula is 100 / 1~(20~200), with a double tilde
"...some number with a 95% probability of falling between 4.0 and 6.0 inclusive," I believe.
I love it! Now I need it in every calculator
There's an amazing scene in "This is Spinal Tap" where Nigel Tufnel had been brainstorming a scene where Stonehenge would be lowered from above onto the stage during their performance, and he does some back of the envelope calculations which he gives to the set designer. Unfortunately, he mixes the symbol for feet with the symbol for inches. Leading to the following:
https://www.youtube.com/watch?v=Pyh1Va_mYWI
awesome