The sisters “paradox” – counter-intuitive probability

(blog.engora.com)

70 points | by Vermin2000 13 hours ago

108 comments

There is no sisters paradox. The trick is how the question is weirdly framed and has to be interpreted. What people think about when they hear the question would effectively lead to a probability of 0.5: if you see a family in the street with a girl and know they have two kids, the probability of the other kid being a girl is indeed 0.5.

The trick of the so-called "paradox" is turning the question into the Monty Hall problem, but with an ambitious enough formulation that you might not realize it is one.

6gvONxR4sf7o 6 hours ago

The way to see this is Bayes' rule. p(answer | data) = p(data | answer) * p(answer) / (sum_{all possible answers'} p(data | answer') * p(answer')). So for this question, that expands to:

    p(both are girls | you're told at least one is a girl)
     = p(you're told at least one is a girl | both are girls) * p(both are girls) / (
            p(you're told at least one is a girl | both are girls) * p(both are girls)
            +
            p(you're told at least one is a girl | they aren't both girls) * p(they aren't both girls)
        )

The problem is that we don't know p(you're told at least one is a girl | they aren't both girls). Clearly if both are boys, then you won't be told at least one is a girl (or at least it's implied that you're told the truth). But that still leaves us p(you're told at least one is a girl | one boy and one girl).

This is the crux of the thing. Different readings of the setup imply different answers to p(what you're told | the unknowns).
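
A minimal sketch of the expansion above in Python, under stated assumptions: sexes are independent and 50/50, a BB family never produces the statement, and a GG family always does. The remaining unknown is `q` (a hypothetical parameter name), standing for p(you're told at least one is a girl | one boy and one girl).

```python
from fractions import Fraction


def p_both_girls_given_told(q):
    """p(both are girls | you're told at least one is a girl).

    q is an assumed parameter: the chance that a one-boy-one-girl family
    produces the statement. BB never produces it; GG always does.
    """
    prior_gg = Fraction(1, 4)
    prior_mixed = Fraction(1, 2)  # BG and GB combined
    numerator = 1 * prior_gg
    # BB contributes 0 * 1/4 to the denominator, so it is left out.
    denominator = 1 * prior_gg + q * prior_mixed
    return numerator / denominator


print(p_both_girls_given_told(Fraction(1)))     # teller mentions any girl
print(p_both_girls_given_told(Fraction(1, 2)))  # teller reports a random child
```

Two readings of the setup correspond to two values of `q`. If q = 1, the teller mentions a girl whenever there is one, and the answer is 1/3. If q = 1/2, the teller reports the sex of a randomly chosen child, and the answer is 1/2.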

It's also a great case of where Bayes' rule shorthands can be slippery. You'll usually abbreviate it out (hell, it was tedious to write this way even with copy-paste). But if you abbreviate "you're told there's at least one girl" to "there's at least one girl", then you've stopped modeling a crucial part of the setup. p(there's at least one girl | they aren't both girls) has an unambiguous answer.

lqet 4 hours ago

This. A less confusing way to ask the question with the 1/3 answer would be:

  What is the probability that a family with 2 children has exactly 2 daughters *if you know that the family does not have 2 boys*?

The reason the original problem is so confusing is the same reason the Monty Hall problem is so confusing: people have different understandings of the question, and don't realize it in discussions. As I wrote a few years ago [0]:

Because most people don't talk formal probabilities, your explanations will be so vague that the other person will not realize your different understanding. You will discuss forever, you will both be right, and you will part ways with the strange feeling that maybe the other person was right, when all along you were talking about different problems. This is why this problem is so notorious.

[0] https://news.ycombinator.com/item?id=24707305

js2 6 hours ago

So it's "what is the probability both are girls?" vs. "what is the probability the other is a girl?" and most people will hear the latter and answer 1/2 whereas the question is the former and its answer is 1/3. Do I have that right?

AnotherGoodName 5 hours ago

"The question writer took all sets of two child families and ruled out the bb case. Then they asked the exact question above." This is a 1/3 chance: select gg from [gg, bg, gb].

"The question writer came across a girl from a two child family, then they asked the exact question above". This is 1/2 chance - select gg from [gg, gg, bg, gb] with gg listed twice since there's two ways to select a girl from that set; ie. coming across a girl is twice as likely to occur from the gg case than it is either gb or bg.

I think that's the clearest wording to get the message across. Either way it's the exact same question but it reasonably has a completely different answer. There's no way to resolve this ambiguity with the question as written.
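
Here is a rough Monte Carlo sketch of the two data-generating processes just described. It assumes independent 50/50 sexes. The function name `simulate` and the trial count are illustrative choices. The first estimate should land near 1/3 and the second near 1/2.

```python
import random


def simulate(trials=200_000, seed=0):
    """Estimate P(two girls) under two ways of learning about a girl."""
    rng = random.Random(seed)
    filtered_hits = filtered_total = 0  # process 1: every family except BB
    met_hits = met_total = 0            # process 2: meet one random child, keep if a girl
    for _ in range(trials):
        kids = [rng.choice("BG"), rng.choice("BG")]
        both_girls = kids == ["G", "G"]
        if "G" in kids:
            filtered_total += 1
            filtered_hits += both_girls
        if rng.choice(kids) == "G":
            met_total += 1
            met_hits += both_girls
    return filtered_hits / filtered_total, met_hits / met_total


p_filtered, p_met = simulate()
print(f"BB ruled out: {p_filtered:.3f}   met a girl: {p_met:.3f}")
```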

LegionMammal978 3 hours ago

That's a good framing. It's similar to the fact that the chance of a given star being in a multiple system (~47% in our vicinity [0]) is significantly higher than the chance of a given system having multiple stars (~30%), because counting by individual stars gives more weight to the multiple systems.

[0] https://astronomy.stackexchange.com/a/55505

LudwigNagasena 5 hours ago

Those questions are equivalent. What is important is the conditional “… given that I looked at a random child and it was a girl” / “… given that I looked at both children and at least one of them was a girl”.

taeric 5 hours ago

I'd hazard that people also typically hear "what are the odds of this from the outset?" Effectively, "you flip two quarters, and see one land heads; what were your odds to flip two tails?"

zahlman 6 hours ago

>ambitious

ambiguous?

tocs3 13 hours ago

"This might seem abstract, but I've seen variations of this problem pop up in business and I've had difficult conversations with non-technical people as a result."

Does anyone have some real life examples? I cannot think of any offhand but would like to be able to cite a couple if someone says "So, what is this good for?".

JHonaker 9 hours ago

Any time you start conditioning on something, i.e. selecting subsets of data to analyze. You can fool yourself quite often if you do something seemingly innocuous like select "everyone with at least one X" and compare expectations to what's true unconditionally (meaning not conditioning on anything, not "in all cases") with conditional computations.

jmount 12 hours ago

Peter Winkler shares some great variations of this: "Boy Born on Tuesday" (p. xix) and "Men with Sisters" (p. xxii) in "Mathematical Puzzles".

"Mrs. Chance has two children of different ages. At least one of them is a boy born on Tuesday. What is the probability that both of them are boys?"

(note: it is a puzzle, not a biology or demography problem, so there are 50/50 independence assumptions on gender and uniform day-of-week assumptions prior to adding the conditioning.)

layer8 12 hours ago

Here “on Tuesday” is ambiguous, in my opinion. I first thought it meant “on a Tuesday” and that it was just a diversion. But it is likely intended to mean “last Tuesday” or “this Tuesday” (which excludes the boy-then-girl case). Wording it more clearly would likely reduce the ratio of wrong answers.

Furthermore, “of different ages” is likely intended to exclude the case of twins. However, even with twins, one is generally nominally older than the other. (Not to mention that it’s possible for two non-twin siblings to be the same age in years, at certain points in time.) Why not just say “that aren’t twins”?

I loathe when logic puzzles are obscured by ambiguous language, turning them more into “gotcha” text interpretation riddles than logic puzzles.

jmount 11 hours ago

Puzzles are definitely odd birds. I myself have gotten into a literal screaming match trying to push my belief that they should never be used in interviews. The bulk of that was an interviewer saying the interviewee was "clearly confused when they were asked a puzzle" yet refusing to agree that this may be evidence that the presentation of the puzzle was in fact confusing (and not measuring anything).

I can't speak for Winkler, but both he and Jaynes implicitly separate the reading of the puzzle from the work. Winkler starts his book with a few awful "reading trick ones", but in the explanations gives a few reading directions to try and avoid that going forward. I happen to know he meant "on a Tuesday." But a correct solution to a different reading would be a correct solution even if it doesn't match the book text. I don't think he was trying to set a text trap; it is just hard to be clear, concise, and unambiguous at the same time. (Even "on a Tuesday" isn't completely clear if it means "all I am telling you is that the day of week was Tuesday" versus "it was a very specific Tuesday, that I am not telling.")

exmadscientist 9 hours ago

The value of puzzles in interviewing is never about reaching the solution. It is about seeing how candidates deal with tricky situations that stretch them a bit, because that happens all the time on the job. It should almost always be done interactively, so you can see what clarifications and extra information they ask for, when and how they give up, and if they're dumb enough to say HR violations out loud (it does happen).

This does require a rather skilled interviewer, so the benefits may well not be worth it. But it can be very interesting information to have.

teekert 5 hours ago

Or whether you watch Veritasium and just know you can jump out of that blender because weight scales with the cube of height and muscle power scales quadratically.

zeroonetwothree 4 hours ago

I think it clearly means “on a Tuesday”, anything else wouldn’t make sense as a puzzle. We are meant to assume each day of the week is equally likely.

Excluding twins is so that we can assume the probability of each day of the week is independent.

stronglikedan 11 hours ago

I agree with the first one, but age is measured in days at a minimum, so twins are always the same age. (I'm sure there are cases where they are born farther apart due to some issue with the pregnancy, but that is statistically insignificant here.)

layer8 11 hours ago

Even if you take days, one can be born at 23:58 and the other at 00:03 the next day. (And it could be New Year’s day — in some cultures that would even imply different ages in years.) Regardless of days, it’s not uncommon to talk about who is the older twin.

Of course colloquially twins are the same age, but we are talking about a mathematical puzzle about probabilities here, where precision is paramount.

zeroonetwothree 4 hours ago

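So we would write down the possibilities as (B-Tue, B-Tue), (B-Tue, G-Any), (B-Tue, B-NotTue) and the inverse of the latter two. Counting each of the 14 sex-and-weekday combinations separately, this results in 1 + 2×7 + 2×6 = 27 equally likely cases. Of those, 1 + 2×6 = 13 have two boys, so the answer is 13/27.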
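
A brute-force check of this count follows. It assumes, as the puzzle intends, that each child's sex and weekday of birth are independent and uniform, so all 14 × 14 ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILD = list(product("BG", DAYS))  # 14 equally likely (sex, weekday) pairs

families = list(product(CHILD, repeat=2))  # 196 ordered (older, younger) pairs
tuesday_boy = ("B", "Tue")
matching = [f for f in families if tuesday_boy in f]  # at least one boy born on a Tuesday
two_boys = [f for f in matching if f[0][0] == "B" and f[1][0] == "B"]

print(len(matching), len(two_boys), Fraction(len(two_boys), len(matching)))
```

It should report 27 matching families, 13 of them with two boys, i.e. 13/27.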

zeroonetwothree 4 hours ago

Similarly if you had the knowledge “at least one is a boy born on May 11” then it would be very close to but slightly less than 50%.

So we can see in the limit as the information becomes more and more specific it turns into the unconditional probability. That is, the case of “the first is a boy, what is the probability both are boys” (50%).

I think this clarifies the situation in the OP pretty well.
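
The limit can be made concrete. Assume n equally likely "tags" per child: n = 7 for weekdays, or n = 365 for calendar days, ignoring leap years. Then P(at least one tagged boy) = 1 - (1 - 1/(2n))^2 = (4n-1)/(4n^2). Similarly, P(two boys and at least one tagged) = (1/4)(1 - (1 - 1/n)^2) = (2n-1)/(4n^2). The conditional probability is their ratio, (2n-1)/(4n-1). The sketch below checks that closed form against brute-force enumeration for small n. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def closed_form(n):
    """P(both boys | at least one boy with a given tag), n equally likely tags."""
    return Fraction(2 * n - 1, 4 * n - 1)


def brute_force(n):
    child = list(product("BG", range(n)))
    tagged_boy = ("B", 0)
    matching = [f for f in product(child, repeat=2) if tagged_boy in f]
    two_boys = [f for f in matching if f[0][0] == "B" and f[1][0] == "B"]
    return Fraction(len(two_boys), len(matching))


for n in range(1, 8):
    assert closed_form(n) == brute_force(n)

print(closed_form(1), closed_form(7), float(closed_form(365)))
```

This gives 1/3 at n = 1 (no extra information) and 13/27 at n = 7. At n = 365 the value is just under 1/2.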

jimmaswell 12 hours ago

Why can't you just disregard the existing boy and reframe the question as the probability that the other child is a boy, and the space of all possible answers is BG and BB, equally probable (1/2)? Not really following explanations I find online.

joshuaissac 11 hours ago

Because GB is also a possibility. You are not told that the existing boy is the elder child.

jimmaswell 10 hours ago

https://www.online-python.com/Q5leTWuvb6

https://www.online-python.com/RueVd2514m

No matter how I frame or interpret this question, the birthdays and birth order appear completely irrelevant - the results are still ~0.50 as expected. Whatever the author was trying to say, they didn't communicate it well. I'm really curious exactly what word or phrase the author thought I was supposed to take to mean something else. If someone could edit one of these simulations to show what the author intended then that would probably be the clearest way.

LudwigNagasena 3 hours ago

https://www.online-python.com/6LDtZz7lMh

* Take all families with two children

* Take a subset where at least one child is a boy born on Tuesday

* Take a subset of the previous subset where all children are boys

* The share of the 2nd subset relative to the 1st subset is around 48%
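
The steps above can be sketched as a seeded Monte Carlo simulation. This is not the code behind the linked online-python.com snippet, which isn't reproduced here. It is an assumed equivalent with independent 50/50 sexes and uniform weekdays.

```python
import random


def tuesday_boy_share(families=300_000, seed=1):
    rng = random.Random(seed)
    with_tuesday_boy = all_boys = 0
    for _ in range(families):
        # Step 1: a two-child family; sex is 50/50, weekday uniform (0 = Mon, 1 = Tue, ...)
        kids = [(rng.choice("BG"), rng.randrange(7)) for _ in range(2)]
        # Step 2: keep it only if at least one child is a boy born on a Tuesday
        if ("B", 1) in kids:
            with_tuesday_boy += 1
            # Step 3: of those, count the families where all children are boys
            if all(sex == "B" for sex, _ in kids):
                all_boys += 1
    # Step 4: share of the step-3 subset within the step-2 subset
    return all_boys / with_tuesday_boy


share = tuesday_boy_share()
print(f"{share:.3f}")
```

With this many families, the share should sit near 13/27 ≈ 0.481.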

Rickasaurus 12 hours ago

Great book, I highly recommend it too.

JeffJor 7 hours ago

Q1: "A family has two children. You're told that at least one of them is a girl. What's the probability both are girls?"

Q2: "A family has two children. You're told that at least one of them is a boy. What's the probability both are boys?"

Note that these are symmetric problems, and must have the same answer.

Q3: "A family has two children. You're told that a gender, that applies to at least one, is written inside a sealed envelope. What's the probability both have that gender?"

In Q3, we have no information. So the answer is the proportion of two-child families that are single gendered. That is, 1/2.

But if we open the envelope, and read what is written inside, the problem becomes either Q1 or Q2, which have the same answer. So we don't have to open it; whatever the answer to Q1 and Q2 is, opening the envelope in Q3 makes its answer the same. If that answer is 1/3, we have a paradox. The answer has to be 1/2 if we don't look.

This is what is known as "Bertrand's Box Paradox." Well, if we add a fourth box to his problem, with one gold and one silver coin. I realize that in modern times the problem itself is called the paradox, but what Bertrand actually wrote (edited to this problem) was "How can it be that opening the envelope suffices to change the probability from 1/2 to 1/3?"

The resolution is that probability must be based on the full set of possibilities, not the possibilities that _could_ result from the full set of _states._ These are the possibilities for this problem:

1) BB and you are told that there is at least one boy.
2A) BG and you are told that there is at least one boy.
2B) BG and you are told that there is at least one girl.
3A) GB and you are told that there is at least one boy.
3B) GB and you are told that there is at least one girl.
4) GG and you are told that there is at least one girl.

Each numbered case has a prior probability of 1/4. Let's say the "A" subcases have a probability of Q/4, so the "B" subcases have a probability of (1-Q)/4.

The answer to Q2 (boys) is the probability of case 1, which is 1/4, divided by the total probability of cases 1, 2A, and 3A, which is (1+2Q)/4. That's 1/(1+2Q).

The answer to Q1 (girls) is the probability of case 4, which is also 1/4, divided by the total probability of cases 4, 2B, and 3B, which is (3-2Q)/4. That's 1/(3-2Q).

Bertrand's paradox, stated another way, is that these must be equal, but can only be equal if Q=1/2 and both answers are 1/2.
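
Here is a small sketch of the six-case model using exact fractions. `Q` is the assumed probability that the teller in a mixed family says "at least one boy". The function name `answers` is illustrative. The two answers agree only at Q = 1/2.

```python
from fractions import Fraction


def answers(Q):
    """Return (P(GG | told girl), P(BB | told boy)) for the six cases.

    Q is the assumed chance that a BG or GB family is described as
    having "at least one boy" rather than "at least one girl".
    """
    q = Fraction(Q)
    quarter = Fraction(1, 4)
    told_boy = [quarter, q * quarter, q * quarter]                # cases 1, 2A, 3A
    told_girl = [quarter, (1 - q) * quarter, (1 - q) * quarter]  # cases 4, 2B, 3B
    p_gg = told_girl[0] / sum(told_girl)  # equals 1/(3-2Q)
    p_bb = told_boy[0] / sum(told_boy)    # equals 1/(1+2Q)
    return p_gg, p_bb


for Q in ["0", "1/4", "1/2", "3/4", "1"]:
    p_gg, p_bb = answers(Q)
    print(f"Q={Q}: girls {p_gg}, boys {p_bb}, equal: {p_gg == p_bb}")
```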

Majromax 5 hours ago

In all of these questions, you're making an assumption about the data-generating process. In Q1 and Q2, you're assuming that you had a 0% chance (a priori) of hearing that 'neither is a (girl/boy)', and in Q3 you're assuming that there's a 0% chance of hearing that the envelope doesn't match the family.

Take a look at this problem beginning with no assumptions. We have two kids, and an envelope that contains 'B' or 'G'. Our probability space is (B,G)^3, with each having probability of 1/8.

Now, we add information about the match as conditioning. Conditional on being told that the envelope matches the family, we can exclude the BBG and GGB cases. That brings us down to 6, of which we have BBB, GGG, and (BG,GB)(B,G). With this additional information, the probability of matching genders becomes 1/3. This probability is still 1/3 if we open the envelope to find B or G, since we exclude all three cases where the envelope doesn't match our observation of it.
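
Here is an enumeration of the (B,G)^3 model just described. It assumes the two children and the envelope's letter are independent 50/50 draws.

```python
from fractions import Fraction
from itertools import product

# (child 1, child 2, envelope letter), assumed independent 50/50 draws,
# so each of the 8 outcomes has probability 1/8.
outcomes = list(product("BG", repeat=3))

# Condition on the envelope matching the family: its letter applies to at least one child.
matches = [o for o in outcomes if o[2] in o[:2]]
p_same = Fraction(sum(o[0] == o[1] for o in matches), len(matches))

# Opening the envelope and reading "G" keeps only the matching outcomes with letter G.
read_g = [o for o in matches if o[2] == "G"]
p_same_after_g = Fraction(sum(o[0] == o[1] for o in read_g), len(read_g))

print(len(matches), p_same, p_same_after_g)
```

Under this model, 6 outcomes survive the match condition, and both printed probabilities come out to 1/3.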

In my view, this is related to the Monty Hall problem; we have to realize that we're given additional information with the statement/envelope.

meatmanek 5 hours ago

In Q3, you've got 8 possibilities, expressed as (gender of 1st child, gender of 2nd child, which child's gender is written inside the sealed envelope?), each with presumably equal probability:

   1. B B 1
   2. B B 2
   3. B G 1
   4. B G 2
   5. G B 1
   6. G B 2
   7. G G 1
   8. G G 2

in which case 4 of the 8 possibilities have both children of the same gender (the first two and the last two).

Once you open the sealed envelope and it says "girl", it does not become Q1, it becomes a different question:

Q4: "A family has two children. I randomly sampled one of the children and it was a girl. What's the probability both are girls?"

In which case, we're looking at possibilities 4, 5, 7, and 8, and in only 2 of those 4 possibilities are both children girls.

In Q1, you're actually told "A family has two children. I looked at both children and can tell you that at least one of them is a girl. What's the probability that both are girls?". In which case, possibilities 3, 4, 5, 6, 7, 8 are all valid. Only in 2 of those 6 possibilities are both children girls.
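
The 8-row table above can be checked mechanically, assuming all eight rows are equally likely. `p_both_girls` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# (1st child's sex, 2nd child's sex, which child's sex is in the envelope),
# all 8 rows assumed equally likely.
cases = list(product("BG", "BG", (1, 2)))


def p_both_girls(keep):
    """Fraction of the kept rows in which both children are girls."""
    kept = [c for c in cases if keep(c)]
    return Fraction(sum(c[0] == c[1] == "G" for c in kept), len(kept))


# Q4: the randomly sampled child (the one described in the envelope) is a girl.
q4 = p_both_girls(lambda c: c[c[2] - 1] == "G")
# Q1: someone looked at both children and reports that at least one is a girl.
q1 = p_both_girls(lambda c: "G" in c[:2])
print(q4, q1)
```

It gives 1/2 for Q4 and 1/3 for the "looked at both" reading of Q1.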

So as in_cahoots said in https://news.ycombinator.com/item?id=45053187, it matters whether the person asking looked at both children or just a single one.

D13Fd 11 hours ago

The "paradox" problem is in the setup. It's easy to mistake it as "a couple has one girl, what is the probability that their next child will be a girl," in which case the answer is 50%.

tantalor 11 hours ago

If you allow misunderstanding the question, then any answer is allowed.

in_cahoots 11 hours ago

But it's a valid point, the question is not well-posed. If you said, "I looked at both children and saw that at least one was a girl" more people would get the right answer. Many people will assume that the author looked at only one child, not both. And there's nothing in the wording to indicate either way.

As others are pointing out, this is just the Monty Hall problem. But the way the question is posed there is much clearer.

tantalor 11 hours ago

I don't know how this could be made more clear:

"You're told that at least one of them is a girl"

> Many people will assume that the author looked at only one child

There is no mention of "looking".

in_cahoots 11 hours ago

How did you determine at least one is a girl? Presumably you looked in some way. But did you look at one child or both? That's the crux of the ambiguity.

tantalor 10 hours ago

I think you are asking "how did the person who told you there is at least one girl learn that".

The answer is: it doesn't matter how because that is an unambiguous statement.

It means "you can assume the family does not have two boys".

I think people are actually getting hung up on "you are told" as if that could be a lie, or some kind of trick, when it is really just supposed to mean "here is some more information that you can rely on".

kgwgk 5 hours ago

> It means "you can assume the family does not have two boys".

But it does not mean that you can assume that p(you're told at least one is a girl | both are girls) = p(you're told at least one is a girl | they aren't both girls) as explained by 6gvONxR4sf7o.

If you allow assuming whatever you want, then many answers are allowed!

That’s what it means that the problem is not “well-posed” as mentioned by in_cahoots. You need additional assumptions to get a definite answer - and the answer will depend on the assumptions.

As JeffJor noted it seems much more natural to have assumptions that keep the symmetry of the problem (because why not?) and the answer 1/2 is not just possible but arguably “better”.

simonh 4 hours ago

These two questions are not equivalent.

Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability the other I didn’t see is a girl?

Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability the other is also a girl?

zeroonetwothree 4 hours ago

There is no “the other” in the original question. Introducing that completely changes the meaning.

in_cahoots 22 minutes ago

The other part doesn't change anything at all. Here you go:

Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability there are two girls?

Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability there are two girls?

kbelder 4 hours ago

I agree, it's perfectly clear. In my humble opinion, people are bringing their incorrect assumptions to the question, and because they're wrong, are trying to blame the framing of the question. That happens a lot with the Monty Hall paradox as well.

And, of course, neither are paradoxes. They're just math that can seem paradoxical if you don't look closely at it.

kgwgk 4 hours ago

The "it's perfectly clear" crowd are also bringing their own assumptions into the answer.

"Different readings of the setup imply different answers to p(what you're told | the unknowns)." See https://news.ycombinator.com/item?id=45056790

Do you think that it's perfectly clear that the answer to all the questions here is 2/3? https://news.ycombinator.com/item?id=45057514

sixo 12 hours ago

It's not even worth mentioning this problem unless you talk about how the result depends on the data generating process. If you take it to be something like "you randomly sample from families with two children, discarding any without at least one girl", you get the 1/3 result, but there are various other ways to read a sampling process from the problem statement which lead to other results.

pontus 12 hours ago

Just to pile on here, there's also ambiguity around how the observed girl is selected. Consider the following framing:

I go to a random house on a random street and knock on the door. A young girl opens the door. I ask how many siblings they have and they say one. What's the probability that they have a sister?

Now it's 50% even though cosmetically it seems like it'd be fair to say that the family has at least one daughter. The reason is that once I see a girl at the door, I'm slightly more confident that it's a GG household, since a GB or BG household would sometimes show a boy opening the door (assuming the two kids are equally likely to open the door).

P(GG | G at door) = P(G at door | GG) P(GG) / P(G at door)

P(G at door) = 1/2 (by symmetry)

So, P(GG | G at door) = 1 * 1/4 * 2 = 1/2
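
Here is the same Bayes computation with exact fractions. It uses the comment's assumptions: the four families are equally likely, and each child is equally likely to open the door.

```python
from fractions import Fraction

# Assumptions from the comment: BB, BG, GB, GG equally likely, and each
# child equally likely to be the one who opens the door.
prior = {fam: Fraction(1, 4) for fam in ("BB", "BG", "GB", "GG")}
likelihood = {fam: Fraction(fam.count("G"), 2) for fam in prior}  # P(G at door | family)

evidence = sum(prior[f] * likelihood[f] for f in prior)  # P(G at door)
posterior = {f: prior[f] * likelihood[f] / evidence for f in prior}
print(evidence, posterior["GG"])
```

The posterior puts 1/2 on GG and 1/4 on each of BG and GB, so the girl at the door has a sister with probability 1/2.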

MontyCarloHall 12 hours ago

This is the crux of the "paradox," which is really just an interpretation problem. Most people assume that the question asks exactly your scenario, i.e. if a specific child is selected and it's a girl, what's the probability that the sibling is also a girl? In that case, the event space is just GB or GG, and p(GG)/(p(GB) + p(GG)) = 0.5. (BG is not in the event space because we are conditioning on a specific child being a girl.)

However, if the question is interpreted as "what's the probability of having two girls if we know there aren't two boys," then the event space is GB, BG, GG, and p(GG)/(p(GB) + p(BG) + p(GG)) = 1/3. Both GB and BG are in the event space because we are not conditioning on the sex of one specific child.

the_gipsy 12 hours ago

Why can you not frame it as: "a random family has been sampled, the sample family has two childs, one of them is a girl"?

I.e. without "discarding", just giving some additional, but not complete, information on the random sample. Is adding information about the picked sample the same as discarding all contrarian samples? Why is this relevant?

AnotherGoodName 12 hours ago

If there were two possible statements they asked

"a random family has been sampled, the sample family has two childs, one of them is a girl"?

and

"a random family has been sampled, the sample family has two childs, one of them is a boy"?

and they selected each statement based on randomly picking a child from a random family then the probability actually becomes 50% boy/girl for the next child since the boy/boy or girl/girl has twice the chance of generating the above statement for the respective gender compared to the mixed gender children family.

I.e. if they say one is a girl, that statement had a 50% chance of being generated by a girl/girl family (since we pick the statement based on a random selection of one of the two children's genders, and there are 2 girls, doubling the chance of a statement that one's a girl coming from a girl/girl family), a 25% chance of being generated by a girl/boy family, and a 25% chance of being generated by a boy/girl family.

If you take 50% chance girl/girl, 25% chance boy/girl and 25% girl/boy you'll see there's a 50/50 chance of the next child being either gender.

All this due to changing how we sampled.

florbnit 9 hours ago

> a random family has been sampled, the sample family has two childs, one of them is a girl"

It’s not a random family if it must have at least one girl. If you want to talk about a random family you can only make statements of the kind “one of the children is <gender>” where the gender depends on the specific family, or “the family has between 0 and 2 girls”.

ndr 12 hours ago

I took this to mean exactly that:

> Assume the family is selected at random because they have at least one girl.

And then again, if they sampled all families with 2 children the posterior would not change, would it?

Still assuming boys vs. girls are completely iid and equally probable.

two_handfuls 12 hours ago

That's how I read it. What other ways were you thinking about?

bloak 12 hours ago

Well, one way of getting families with two children, at least one of which is a girl, would be to go to a girls' school and ask the children to raise their hand if they have exactly one sibling.

aidenn0 12 hours ago

I would expect that would yield a 50% chance of the other being a girl, right?

renewiltord 5 hours ago

Indeed. One thing they haven't mentioned is that the mother wasn't Zharata The Man Hater, who would kill any boy child. Therefore, in the Zharata case the answer is 1, and we're missing the probability of Zharata's family being considered, which could be one of pure certainty since she always puts her family forward for any puzzle question - killing any philosopher who would pose one not relating to her own family.

taeric 4 hours ago

So, my problem with how this is modeled is it assumes order doesn't matter in one aspect, but that it does in another.

Simply stated, if you allow the possibility space of "boy-girl" and "girl-boy", you have to also have two "girl-girl" states. Since you don't know which of the kids is known. Why is that not correct?

State it with coins, if I know that you flipped a quarter and a dime and one turned up heads, what are the odds that both are heads?

sokoloff 2 hours ago

There aren't two "girl-girl" states, because of the stated assumption in the problem:

> Assume the family is selected at random because they have at least one girl.

Given that plus "a family has two children" and "Assume that the probability of having a girl or boy is 50%"

That means you're starting from the set of all two child families: BB, BG, GB, and GG, being told that you do not have the BB case, leaving 3 ways in which the family could be composed and being asked about "the one which is not a G".

That's different from the dime and quarter case, and would also be different if you were told "the oldest child is a girl", because being told "the oldest child is a girl" eliminates both BB and BG.

Being told "[at least] one of the coins is heads" or "[at least] one of the kids is a girl" only eliminates one of the four cases, while being told "the quarter is heads"/"the oldest is a girl" eliminates two cases.

zeroonetwothree 4 hours ago

The answer is the same in your coin version. There are four possible outcomes: (Q, D) = HH, HT, TH, TT. Given that one turned up heads that eliminates TT so we see that HH has 1/3 probability.

As you can see there aren’t two HH states just as there aren’t two GG states in the original question.

taeric 4 hours ago

Well, sorta? You were either in the world where dime was heads, in which case the space has TH, HH, or the Quarter was Heads, in which case you have HT, or again HH. Both with 50% probability, no? You were never in a scenario where HT and TH have the same probability as each other. (Well, again, sorta. Point being that you know one of them is possible.)

Edit: I want to be clear that my initial thinking was exactly what you said. I was trying to "steelman" the bad intuition and I think I've trapped myself. :D

jihadjihad 12 hours ago

From the Wikipedia article linked in TFA:

> Following classical probability arguments, we consider a large urn containing two children.

I like how they modified a classic from probability texts, drawing items from an urn, and made sure it would be big enough in this example to accommodate two kids.

bitwize 11 hours ago

This is better than an "assume a spherical cow" joke in the wild.

tromp 13 hours ago

Perhaps people would be more likely to give the correct answer if "at least one of them is a girl" is rephrased as the equivalent "the youngest is a girl or the oldest is a girl".

justonceokay 12 hours ago

Well constructing a different question to remove the “trick” of the problem is one approach. Kind of the “no child left behind” approach to riddles.

kgwgk 4 hours ago

Maybe people would be more likely to give the correct answer if the problem had _one_ correct answer. :-)

renewiltord 5 hours ago

Another way is to say: the probability is 1/3, what is the probability? In this way, more people can answer it correctly, though perhaps not all.

Ekaros 10 hours ago

This reminds me of the Monty Hall problem, which doesn't explicitly state that the host never reveals the car. For a lot of people, just screwing the contestant over instantly would be a sensible option for a game show. Well, at least the winner gets some mutton.

rml 11 hours ago

For something this small we can enumerate all the cases (this is a Scheme version of the Mathematica `Tuples` function)

    > (list-tuples '(B G) 2)
    ((B B) (B G) (G B) (G G))

3 cases have at least one girl

of those 3 cases, 1 is both girls, so the probability is 1/3
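
For comparison, Python's `itertools.product` with `repeat=2` produces the same ordered tuples as the `list-tuples` call above, which rml describes as a Scheme version of Mathematica's `Tuples`. Here is a minimal sketch of the count, assuming all four ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

tuples = list(product("BG", repeat=2))  # ('B','B'), ('B','G'), ('G','B'), ('G','G')
at_least_one_girl = [t for t in tuples if "G" in t]
both_girls = [t for t in at_least_one_girl if t == ("G", "G")]

print(len(at_least_one_girl), Fraction(len(both_girls), len(at_least_one_girl)))
```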

florbnit 9 hours ago

A coworker approached you and goes “hi, I have two children and one is a boy…” and is promptly vaporized because he doesn’t fit the selection criteria of the problem statement, another approaches and goes “err hi, I have two children and one is a girl…” looks nervously at the vaporizer but is left standing. What is the chance their other child is a girl, who has not been vaporized?

If you phrase the question as “someone with two children tells you the gender of a random one; what is the chance the other is the same gender?”, the chance is 50/50, because 50% of families will have BB or GG and the vaporizer isn’t active.

hdgvhicv 12 hours ago

> Select only the families that have at least one girl.

That’s not what the first question said. The first question was: select a family (bb, bg, gb, gg).

Then you learn that it happens to have a girl.

tantalor 11 hours ago

I fail to see the distinction.

hnbad 4 hours ago

It's explained in the Wikipedia article linked in TFA.

If the family was picked at random, a GG family is twice as likely to have resulted in us finding out one of the children is a girl as a BG or GB family (and a BB family would be ruled out by observation) so you end up with the intuitive but in this case also entirely correct 1/2 chance.

If the family was picked from only those that have at least one girl, a GG family is equally likely as a BG or GB family (and a BB family would be ruled out again but in this case by definition) you end up with the very unintuitive 1/3 chance the article describes.

So the initial filtering is necessary to create this "paradox" (it's actually not a "paradox" but a "problem", as others have mentioned). Without it, the intuitive answer is actually the correct answer.

Vermin2000 13 hours ago

The sisters paradox is a maddening example of counter-intuitive probability. The resolution is straightforward, but it's really easy to get tied up in knots.

EMM_386 11 hours ago

I don't understand at all how this is maddening or counter-intuitive.

When you have a child, the odds are ~50%, so the chance the next child is a boy or girl is almost equal. Is it the way it's framed that makes people think harder than they need to?

This is like when I (very rarely) play something like "pick six".

I play 1, 2, 3, 4, 5, and 6. People think I'm crazy. They don't realize I have the same odds as any ticket they purchase.

n4r9 5 hours ago

If you have to share the prize money with other people that guessed the same numbers, then it seems sensible to avoid obvious patterns.

bell-cot 12 hours ago

> maddening example of counter-intuitive probability.

Not how I'd describe it. The setup is mundane enough for people to just assume that their intuition will work fine. The difference between the naive and correct answers is too small to spot in a small-n dataset. And ~0% of the population is familiar enough with analyzing such situations for their "intuition" to be applicable.

It's a bit like Gell-Mann amnesia - people are too quick to apply an easy cognitive strategy, when (in theory) they know enough to rule that strategy out.

spadros 11 hours ago

Yes, I found this one easy. I was surprised my data management intuition came back after all these years since school. There are really only three options:

- boy - boy

- boy - girl

- girl - girl

So it must be 1/3 chance. If you’re looking at permutations in order, that’s a different question.

AIPedant 10 hours ago

This intuition is wrong even if it turned out to get the right answer. The three unordered options do not have equal probabilities: boy+girl is twice as likely to occur as boy+boy or girl+girl.

To get the right answer you must be careful about conditional probabilities (or draw out the sample space explicitly). The crux of the issue is that you are told extra information, which changes your estimate of the probability.

(This question as written is very easy to misinterpret. The Monty Hall problem, which illustrates the same thing, is better since the sample selection is much more carefully explained.)
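Drawing out the sample space, as suggested above, takes only a few lines. A sketch, assuming independent births with a 1/2 chance of each sex; it collapses the four ordered outcomes into the three unordered ones and then conditions on "at least one girl":

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Ordered outcomes (older, younger), each with probability 1/4.
ordered = list(product("BG", repeat=2))

# Collapse to unordered outcomes by sorting each pair: BG and GB merge.
unordered = Counter("".join(sorted(pair)) for pair in ordered)
probs = {k: Fraction(v, len(ordered)) for k, v in unordered.items()}
print(probs)  # {'BB': Fraction(1, 4), 'BG': Fraction(1, 2), 'GG': Fraction(1, 4)}

# Condition on "at least one girl": drop BB and renormalize.
p_at_least_one_girl = probs["BG"] + probs["GG"]
print(probs["GG"] / p_at_least_one_girl)  # 1/3
```

Counting the three unordered options as equally likely would give 1/2 here, not 1/3; the right answer only comes out once the mixed outcome carries its double weight.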

taeric 4 hours ago

Oddly, this is the part I'm stuck on with this problem.

Specifically, if you know that one is a girl, then the unordered options seem like they are back on equal footing? That is, it isn't twice as likely if you know that one ordering can't happen? (Or, stated differently, you don't know which version of two girls you are looking at.)

So, for this one, you know that either the youngest is a girl (so, girl-boy is not possible) or that the oldest is a girl (so boy-girl is out). That puts you back to the rest of the possibilities. Boy-boy is out, sure, as you have a girl. But every other path remains? So, you have one of (boy-girl(known), girl-girl(known), girl(known)-boy, girl(known)-girl). Which drops you back to 50/50?

AIPedant 4 hours ago

Like others in the thread have said, the question could have been phrased more precisely. Technically you are misreading it but in an annoying and trivial way.

What the problem is really saying is this:

1) You have a large collection of families with two kids of varying genders.

2) You draw one of them at random. At this point, your only estimate of P(2 girls) is 0.25.

3) Someone tells you that the family you drew has at least one girl.

4) This extra information changes your probability estimate because the possibility of two boys has been ruled out; the naive 1/4 estimate is refined to 1/3.

The way you are interpreting it is this:

1) You have a large collection of families with two kids, at least one of whom is a girl.

2) Then the probability that the other child is a girl is clearly 50%.

As a reminder this is how the original post phrased the question:

  Here's the problem: a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?

This is just too vague and admits both interpretations; they needed to be more specific about where the family "came from." That's why Monty Hall is a better illustration: it starts with you explicitly choosing a door at random. Here the family has been chosen at random from the pool of families with two children, but that's totally unclear.

taeric 3 hours ago

The annoying thing is this sits with my teaching fine, it is more my intuition that is failing to withstand trying to break it. :(

So, in the original: "a family has two children. You're told at least one of them is a girl." What are the possible states? Well, assume first born is the girl, then you have 50% that the next is a girl. Then, assume that the first born was a boy, then there is no chance and the second born is the girl that you know of. So, at 50/50 on those chances, you have 50% chance of having a 50% chance, or a 50% chance of it being 0. I can't see how to combine those to get 1/3. :(

And the Monty Hall explicitly covers the case that a decision is made on which door is shown to you. I don't see any similar framing to this problem. Yes, the total states are GB, BG, GG, but only if you treat GG in such a way that either BG or GB was not a possible state. (That is, using G for girl that you know of, and g for unknown, then possible states are GB, Gg, gG, BG. There is no version of Bg or gB that is possible, so to treat those as equal strikes me as problematic.)
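Under the filtered reading (family drawn at random, then you are told at least one child is a girl), the split into "first-born is the girl" versus "first-born is a boy" still works, but the two branches are not 50/50 once you condition on that information. Writing A for "at least one girl" and G_1 for "first-born is a girl" (notation introduced here for illustration), a worked version:

```latex
\begin{aligned}
P(G_1 \mid A) &= \frac{P(G_1)}{P(A)} = \frac{1/2}{3/4} = \frac{2}{3},
  \qquad P(\bar{G}_1 \mid A) = \frac{1}{3} \\
P(GG \mid A) &= P(GG \mid G_1)\,P(G_1 \mid A) + P(GG \mid \bar{G}_1, A)\,P(\bar{G}_1 \mid A) \\
  &= \frac{1}{2} \cdot \frac{2}{3} + 0 \cdot \frac{1}{3} = \frac{1}{3}
\end{aligned}
```

The first line uses the fact that G_1 implies A, so P(G_1 and A) = P(G_1). Learning that there is at least one girl pushes "first-born is a girl" up from 1/2 to 2/3, and that reweighting is what turns "50% of a 50% chance" into 1/3.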

kgwgk 3 hours ago

> 4) This extra information changes your probability estimate because the possibility of two boys has been ruled out; the naive 1/4 estimate is refined to 1/3.

That’s not correct in general.

It’s only correct if you assume that “3) Someone tells you that the family you drew has at least one girl.” was equally likely to happen whether or not there were two girls.

That’s a quite strong assumption.

One can make different assumptions and get answers different from 1/3. For example, 1/2.

a3w 5 hours ago

Becomes intuitive if you ask "nth one is a girl. What is nth+1 one likely to be?", for any arbitrary number

puppycodes 8 hours ago

And then there's that intersex kid that makes all the numbers wrong

AdrianB1 4 hours ago

I think the explanation is wrong. It is based on the probability of having combinations of boys and girls and then counting the combinations that have at least one girl, but this is not the situation: there is no probability in question for one of the kids, it is a confirmed, past event and the only other probability is the sex of the other kid.

Otherwise you can derive any probability as a branch of a probability tree that contains it and calculate the probabilities of the tree and then the one of the branch. This makes no sense.

For example, a family has a kid and the kid is a girl. The family wants another kid; what is the probability to be a girl? Is it 1/4 because having 2 girls is 1/4? No, it is 1/2 as it is for any new kid.

simonh 4 hours ago

Your example is different in a critical way. These two questions are not equivalent.

Q1: I looked at only one of a pair of two randomly selected children and it was a girl. What is the probability the other I didn’t see is a girl?

Q2: I looked at both of two randomly selected children and at least one of the pair of children is a girl. What is the probability the other is also a girl?

The question in the article is the second question, not the first. The fact that the observer looked at both children and not just one of them is crucial. As is often the case in these puzzles the exact information available is the critical issue.

pmg101 12 hours ago

GG / {BG, GB, GG} is 1/3.

What's the paradox?

AnotherGoodName 12 hours ago

Because it's entirely dependent on sampling assumptions. Go to a random house where there are two children, one of whom randomly opens the door. Each of bb, gg, bg, gb has equal probability, and a random child opens the door.

Now if you see a boy disregard that since you can't make the statement that one is a girl.

If you see a girl, go ahead and make the statement "a family has two children. You're told that at least one of them is a girl."

What is the probability now?

You have twice the chance of making that statement if you encounter a gg family rather than a bg/gb family, right? In a gg family either child answering the door is a girl, while in a bg/gb family only one of the two is.

So there's a 50% chance that the statement was enabled by a gg family, 25% that it came from a bg family, and 25% that it came from a gb family. That means a 50% chance the other child's a girl and a 50% chance the other child's a boy.

The probabilities here are entirely dependent on details of the sampling which is not made explicit here.
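The door-answering scenario is easy to simulate. A rough Monte Carlo sketch, assuming independent 50/50 births and that each child is equally likely to open the door; `simulate_doorstep` is an illustrative name:

```python
import random


def simulate_doorstep(trials=200_000, seed=1):
    """Estimate P(other child is a girl | a girl opened the door).

    Assumes each child is a girl with probability 1/2, independently,
    and that each of the two children is equally likely to answer the door.
    """
    rng = random.Random(seed)
    girl_at_door = other_is_girl = 0
    for _ in range(trials):
        kids = [rng.choice("BG"), rng.choice("BG")]
        door = rng.randrange(2)   # index of the child who answers
        if kids[door] == "G":     # only now can we say "at least one is a girl"
            girl_at_door += 1
            other_is_girl += kids[1 - door] == "G"
    return other_is_girl / girl_at_door


print(round(simulate_doorstep(), 2))  # close to 0.5
```

Changing who answers the door (say, the older child always does) changes the sampling process, which is the point being made: the answer depends on how the statement came to be made.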

teekert 12 hours ago

Paradoxes don't exist in reality (they do in hypothetical situations), so there is indeed no paradox as you correctly observe. Instead, most people answer this wrongly, for some reason. And for some reason we call situations where this happens "a paradox". Though I agree that we shouldn't.

Edit, ok, there are things like "This statement is false.", but we should perhaps stick to "self-referential problems" with those.

I think paradoxes just exist in our theories, languages, and formal systems when we make flawed assumptions or create inconsistent frameworks. But physical reality itself just is what it is - no contradictions, just phenomena we sometimes struggle to describe accurately.

If contradictions (paradoxes) can exist, then anything becomes possible through the principle of "explosion in logic". From a contradiction, any statement can be "proven" true. The whole foundation of rational thought would be undermined. Right?

luxcem 12 hours ago

The Medical Test Paradox, or whatever it's called, does exist, in the sense that when a test is positive for a rare disease we always run a second one.

teekert 12 hours ago

To me that is not a paradox, just logic: with very low incidence you simply find many more false positives than real positives. What is paradoxical about that?

It's why we don't screen for just any condition in the general population. E.g. we only do it for groups like 65+ year-olds or 3-packs-a-day smokers, because there it may actually be worth the cost of the program.

There's no contradiction anywhere in this scenario, just people's incorrect intuitions meeting (mathematical) reality.
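The false-positive argument is Bayes' rule with a small prior. A minimal sketch; the prevalence, sensitivity and specificity values below are made up for illustration (they are not from the thread), and the second step assumes the repeat test's errors are independent of the first's:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Illustrative numbers only: 1-in-1000 prevalence, 99% sensitive, 99% specific.
ppv = positive_predictive_value(0.001, 0.99, 0.99)
print(f"{ppv:.3f}")  # 0.090

# Assuming independent errors, the first result's PPV becomes the prior
# for a second test, which is why a repeat test is so informative.
ppv2 = positive_predictive_value(ppv, 0.99, 0.99)
print(f"{ppv2:.2f}")  # 0.91
```

With these assumed numbers roughly 9 out of 10 first positives are false, and a second positive flips that to roughly 9 out of 10 true.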

kgwgk 5 hours ago

The “paradox” may become apparent if you think the answer to all the questions below is the same (2/3). And if it’s not the same, why not?

——

You meet three people:

Alice has two children. You're told that at least one of them is a girl.

Bob has two children. You're told that at least one of them is a boy.

Csilla has two children. You're told that at least one of them is a lány. That clearly meant boy or girl because of the context, but you don’t know enough Hungarian to know what it is.

For each of them, what's the probability that they have a girl and a boy?

—-

You meet all the parents with two children in your neighborhood. Say there are 60 such families.

For 30 of them you’re told that at least one of them is a boy. What's the probability that they have a girl and a boy?

For the other 30 you’re told that at least one of them is a girl. What's the probability that they have a girl and a boy?
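One way to see why the answers for Alice and Bob cannot both be 2/3: whatever rule parents use to decide what to tell you, the two conditional answers must average out to the unconditional P(mixed) = 1/2, which is also Csilla's answer because her statement tells you nothing. A sketch under a hypothetical reporting rule, where q is an assumed parameter:

```python
from fractions import Fraction


def p_mixed(q):
    """Return P(mixed | told 'girl'), P(mixed | told 'boy') and overall P(mixed).

    Hypothetical reporting rule: GG parents always say "at least one is a
    girl", BB parents always say "boy", and mixed parents say "girl" with
    probability q, otherwise "boy".
    """
    told_girl = Fraction(1, 4) + Fraction(1, 2) * q
    told_boy = 1 - told_girl
    mixed_given_girl = Fraction(1, 2) * q / told_girl
    mixed_given_boy = Fraction(1, 2) * (1 - q) / told_boy
    overall = told_girl * mixed_given_girl + told_boy * mixed_given_boy
    return mixed_given_girl, mixed_given_boy, overall


for q in (Fraction(1), Fraction(1, 2), Fraction(0)):
    print(q, *p_mixed(q))
# 1 2/3 0 1/2
# 1/2 1/2 1/2 1/2
# 0 0 2/3 1/2
```

Under this rule, getting 2/3 for Alice (q = 1) forces 0 for Bob, and the symmetric rule (q = 1/2) gives 1/2 for everyone.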

hammock 12 hours ago

Confusing permutations and combinations

pmg101 12 hours ago

Oh yes. Strange.

hammock 12 hours ago

This is the Monty Hall problem, is it not?

teekert 12 hours ago

Is it? Would it change your intuition when I tell you "A couple has 100 kids, at least 99 are girls, what is the probability 100 are girls?"

I'm a bit at a loss I have to admit.

meatmanek 5 hours ago

That would change my intuition quite a bit. Even assuming 100 kids were feasible in a human lifespan, the odds of 99 girls or 100 girls happening is (1 [arrangement where all the kids are girls] + 100 [arrangements where there are 99 girls and 1 boy]) / 2^100, or about 8e-29. Practically zero.

If we assume that each child really did have a 50/50 chance of being boy or girl, then the result would be that there's a 1/101 chance that it's 100 girls.

Given what I know about the world and genetics and such, I think it's much more likely that there's some predisposition by the couple to have girls.

If we think it's, say, 90/10, then the prior probability of the 100-girls case would be 0.9^100, and the prior probability of each of the 99-girls cases would be 0.9^99 * 0.1, i.e. the all-girls case is 9x more likely. 0.9^100 / (100 * 0.9^99 * 0.1 + 0.9^100) = 0.9 / (100 * 0.1 + 0.9) = 0.9/10.9 = about 8% probability of having 100 girls, 92% probability of having 99 girls and 1 boy.

If we think the couple has 99:1 odds of girl:boy on each birth, then it's 0.99^100 / (100 * 0.99^99 * 0.01 + 0.99^100), or about 50/50 on whether they have 99 girls or 100 girls.

If we think the odds are 999:1, then it's 0.999^100 / (100 * 0.999^99 * 0.001 + 0.999^100) = around 90% chance they have 100 girls.

Someone else can do the math assuming an uninformative prior on the couple's girl:boy odds and calculating the posterior distribution given that we know there are 99 girls.
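The fixed-odds cases above share one formula: after cancelling the common factor p^99, P(100 girls | at least 99 girls) = p / (100(1 - p) + p). A short check, assuming births are independent with a fixed per-birth girl probability p; `p_all_girls` is an illustrative name:

```python
def p_all_girls(p, n=100):
    """P(all n are girls | at least n-1 are girls), each birth a girl with prob p."""
    all_girls = p ** n
    one_boy = n * p ** (n - 1) * (1 - p)  # n possible positions for the single boy
    return all_girls / (one_boy + all_girls)


print(f"{101 / 2**100:.1e}")  # 8.0e-29: prior chance of >= 99 girls at 50/50

for p in (0.5, 0.9, 0.99, 0.999):
    print(p, round(p_all_girls(p), 3))
# 0.5 0.01
# 0.9 0.083
# 0.99 0.497
# 0.999 0.909
```

The 0.5 row is the 1/101 figure, and the others match the roughly 8%, 50/50 and 90% estimates.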

luxcem 12 hours ago

The 'sample space' reduction method is indeed also used to solve the Monty Hall problem.

flappyeagle 11 hours ago

It is exactly the same

bitwize 11 hours ago

Related to the Monty Hall "paradox". Spoiler: You'll get the car if you switch doors with 2/3 probability.

https://en.m.wikipedia.org/wiki/Monty_Hall_problem

JeffJor 6 hours ago

Bertrand's Box Paradox, which I wrote about in my own comment, applies to it. The upshot is that probability is not based on which prize placements _could_ lead to the current game state; it is based on the set of all possible game states. Let's assume that the contestant starts off with door #3.

Case 1: The prize is behind door #1, and the host must open door #2. Probability 1/3.

Case 2: The prize is behind door #2, and the host must open door #1. Probability 1/3.

Case 3: The prize is behind door #3, and the host has a choice. Case 3A: The host opens door #1. Probability Q/3. Case 3B: The host opens door #2. Probability (1-Q)/3.

If the host actually opens door #1, the probability that door #2 has the prize is (Case 2)/(Case 2 + Case 3A) = (1/3)/(1/3+Q/3) = 1/(1+Q).

If the host actually opens door #2, the probability that door #1 has the prize is (Case 1)/(Case 1 + Case 3B) = (1/3)/(1/3+(1-Q)/3) = 1/(2-Q).

My point is that, since you get to see which door is opened, 2/3 is correct only if you assume Q=1/2. We aren't told what Q is, but we must assume it is 1/2 because otherwise the answer is different depending on which door is chosen.

zeroonetwothree 3 hours ago

Well if we frame the question as “what is the probability of winning by always switching” then this doesn’t play into it and the answer is indeed 2/3. Hence as a question about general strategy the standard answer is correct.

You’re right if we are asking about a specific case though.
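JeffJor's case analysis and the point about always switching can be checked together with exact fractions. A sketch that follows the cases above (contestant holds door #3, q is the host's probability of opening door #1 when he has a choice); `monty_hall` is an illustrative name:

```python
from fractions import Fraction


def monty_hall(q):
    """Contestant picks door 3. q = P(host opens door 1 | prize behind door 3).

    Returns P(prize behind 2 | host opened 1), P(prize behind 1 | host opened 2),
    and the overall win probability of always switching.
    """
    third = Fraction(1, 3)
    case1, case2 = third, third                  # prize behind door 1 or door 2
    case3a, case3b = q * third, (1 - q) * third  # prize behind 3, host opens 1 or 2
    win_after_open1 = case2 / (case2 + case3a)   # equals 1 / (1 + q)
    win_after_open2 = case1 / (case1 + case3b)   # equals 1 / (2 - q)
    # Weight each conditional answer by how often that door gets opened;
    # this should be 2/3 for any q, since switching wins unless the prize is behind 3.
    p_open1 = case2 + case3a
    p_open2 = case1 + case3b
    overall = p_open1 * win_after_open1 + p_open2 * win_after_open2
    return win_after_open1, win_after_open2, overall


for q in (Fraction(1, 2), Fraction(1), Fraction(0)):
    print(q, *monty_hall(q))
# 1/2 2/3 2/3 2/3
# 1 1/2 1 2/3
# 0 1 1/2 2/3
```

The specific-case answers move with q, while the always-switch strategy stays at 2/3.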

wpollock 4 hours ago

This always bugged me. It isn't switching your choice that gives the 2/3 probability, it is exercising a choice once one of the doors has been eliminated. The odds are the same as flipping a fair coin to decide on which of the two doors remaining to pick.

Am I wrong?
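The coin-flip question can be tested directly by simulation. A rough sketch, assuming the standard rules (the host knows where the car is, always opens a goat door other than your pick, and picks at random when he has two options); the strategy names are illustrative:

```python
import random


def play(strategy, rng):
    """Play one Monty Hall round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a goat door that isn't the contestant's pick.
    host = rng.choice([d for d in doors if d != pick and d != car])
    other = next(d for d in doors if d not in (pick, host))
    if strategy == "stay":
        final = pick
    elif strategy == "switch":
        final = other
    else:  # "coin": flip a fair coin between the two closed doors
        final = rng.choice([pick, other])
    return final == car


def win_rate(strategy, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(strategy, rng) for _ in range(trials)) / trials


for s in ("stay", "switch", "coin"):
    print(s, round(win_rate(s), 2))
```

Under these assumptions the estimates come out near 1/3 for staying, 2/3 for switching and 1/2 for the coin flip, so the coin flip does not reproduce the switching advantage.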

onecommentman an hour ago

When I first heard of the Monty Hall problem, I assumed the naive answer was true, then I thought more and drew up the decision tree and thought the “correct” answer was right. Now I think it is an underspecified problem and literally any probability can be assigned to the result. There are no bad answers because it is a bad(ly stated) problem.

“Here's the problem: a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?”

— This is the complete statement of the problem! Everything else is an assumption that may or may not be correct. And is certainly not necessarily a complete set of underlying assumptions relevant to the problem statement.

“Assume that the probability of having a girl or boy is 50% and that the birth order has no effect on the probability. Assume the family is selected at random because they have at least one girl.”

— This is not a part of the statement of the problem! These are a subset of assumptions that you can choose to accept, or not. As a modeler or decision analyst you have to make that distinction. Eh, let’s accept them, for the time being. We’ll even assume the narrator is honest, which isn’t a stated assumption.

But let’s add to that list of assumptions. The narrator telling you that one of them is a girl gets all winnings from bets on the outcome of the unknown gender of the “other child” and wants those winnings. The narrator knows that a probability tree analysis of the problem, with perhaps unwarranted assumptions of independence and prior probabilities, will lead to an assignment of 1/3 probability for the other sibling being a girl, and knows you know that result and believes you want to win. [A valid credible interpretation, not misinterpretation, of the original problem statement.]

“What do you think the probability is that both children are girls?”

— Let’s make this question more actionable. “Should you take the narrator's even-odds bet on both children being girls? The stake is $100: if they are both girls, the narrator wins $100 and you lose $100; if they aren’t, you win $100 and the narrator loses $100. The narrator and you both want to win the money.”

The answer to this question, which would seemingly be “yes” given the probability result, is in fact “no” under the additional valid, and quite credible, assumptions made. That is because the narrator will only ever present you with two-girl pairs. Let’s assume the “assumptions” are actually correct, and the families will indeed be selected at random, and in the general population there is a 50-50 mix of boys and girls. There is nothing, even in the “assumptions”, that precludes the narrator from preselecting and only presenting two-girl pairs to you, thus always winning when you believe and follow the 1/3 two-girl result.

The statement of the problem, and only the statement of the problem, underspecified as it is, leads to a whole suite of possibly correct answers. The problem is the territory, the problem statement and assumptions are the map.

None of these maps are the territory, necessarily. The probability tree answer is just as sloppy, from a decision analyst perspective, as the naive answer.

flappyeagle 11 hours ago

This is exactly the Monty Hall problem

lightvector 12 hours ago

One of the challenges with puzzles like this is that the puzzle gives you "at least one of them is a girl" as a mathematical assertion where you're not supposed to further introspect the context of how/why you're being given that fact.

But that's unrealistic. In real life, the context for how and why there would be a speaker telling you such a thing in the first place can be relevant and affect the probability!

How is this possible? Suppose among all the math-riddle-loving parents of two children who would ask such a puzzle in the first place there are an equal number of parents of B-B, B-G, G-B, G-G, and that each is equally likely to ask you such a riddle when you meet them.

Suppose when asking such a riddle the B-B parents tell you "at least one of them is a boy" (they don't have any girls, so that's the only way they can ask this kind of riddle), the G-G parents tell you "at least one of them is a girl" (same thing but in reverse), while the B-G and G-B parents say one of "at least one of them is a boy" and "at least one of them is a girl" equally at random.

Then, conditioned on being told that "at least one of them is a girl", the probability of another girl is actually 1/2, not 1/3 like the paradox answer claims. To see this, imagine the puzzle being asked 40 times. You get 10 B-B parents saying "at least one of them is a boy", 10 G-G parents saying "at least one of them is a girl", and among the 20 (B-G and G-B) parents, since they choose randomly, you have 10 saying "at least one of them is a boy" and 10 saying "at least one of them is a girl".

So out of the 20 times where "at least one of them is a girl" is said, there are 10 cases where it's a G-G family and 10 cases where it's a B-G or G-B family, therefore conditioned on being told "at least one of them is a girl", the probability of two girls is actually 1/2.

If there were some gender bias in how the B-G and G-B families might ask the question, or other differences that affect how likely each of these people would be to pose the puzzle to you, then the probability could differ from both 1/3 and 1/2.

So there's a difference between being presented with something as a flat mathematical assertion that you're supposed to take at face value and not question further (where the probability is 1/3, as the article claims), versus being told something in real life, where you always need to take into account the context and situation of the speaker, and the probability could be different.

There are real life implications of this too - the big classic one being publication bias / newsworthiness bias. As most people intuitively know by now, it is also often wrong to take the statistical analysis or claims of a particular research study or paper entirely at face value, because there is a bias in the fact that "positive" and "exciting" results are more likely to be reported in the first place, and so statistical outliers that aren't actually replicable are disproportionately likely to be reported (see also https://xkcd.com/882/). And publication bias still occurs with respect to the reporting of results, amplification (or not) in the media, etc., even when the authors themselves are trustworthy and have done their analysis within the paper in a statistically proper way. So conditioned on you hearing about the result in the first place, it is often less likely to be true (and less likely to replicate in the future, etc.) than you would think if you just took the statistical analysis in the paper at face value, even when that analysis was done correctly. The situation in the "sisters paradox" of computing a probability by taking a statement entirely at logical face value is rare in real life.
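lightvector's 40-family tally generalizes to any reporting behavior. A sketch under the comment's hypothetical model, where q (an assumed parameter) is the probability that a mixed B-G or G-B parent says "girl"; q = 1 is the textbook reading, q = 1/2 is the random choice above, and other values model a gender bias:

```python
from fractions import Fraction


def p_two_girls_given_told_girl(q):
    """P(GG | told "at least one of them is a girl").

    Hypothetical reporting model: GG parents always say "girl", BB parents
    never do, and mixed (BG or GB) parents say "girl" with probability q.
    """
    gg = Fraction(1, 4) * 1     # GG families, always say "girl"
    mixed = Fraction(1, 2) * q  # BG + GB families that say "girl"
    return gg / (gg + mixed)    # simplifies to 1 / (1 + 2q)


print(p_two_girls_given_told_girl(Fraction(1)))     # 1/3: textbook reading
print(p_two_girls_given_told_girl(Fraction(1, 2)))  # 1/2: mixed parents pick at random
print(p_two_girls_given_told_girl(Fraction(1, 4)))  # 2/3: mixed parents lean toward "boy"
```

With 40 families and q = 1/2 this reproduces the 10 G-G versus 10 mixed split among the 20 "girl" statements.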

zeroonetwothree 3 hours ago

It’s not meant to be a real life problem but a math one. Would you like it better if it was phrased instead as: X and Y are independent Bernoulli random variables with p=0.5. What is P(X+Y=2 | X=1 or Y=1)?
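Assuming X and Y are independent, the formal version evaluates in one line, since the event X+Y=2 is contained in the event "X=1 or Y=1":

```latex
P(X+Y=2 \mid X=1 \lor Y=1)
  = \frac{P(X=1,\ Y=1)}{1 - P(X=0,\ Y=0)}
  = \frac{1/4}{1 - 1/4}
  = \frac{1}{3}
```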

fkyoureadthedoc 12 hours ago

> a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?

    +-----------+-----------+-----------+-----------+
    |   Boy     |           |   Girl    |           |
    |  (0.5)    |           |  (0.5)    |           |
    +-----------+-----------+-----------+-----------+
    | Boy-Boy   | Boy-Girl  | Girl-Boy  | Girl-Girl |
    |  (0.25)   |  (0.25)   |  (0.25)   |  (0.25)   |
    +-----------+-----------+-----------+-----------+

Does this mean the probability that they are both boys is also 25% lol

randallsquared 12 hours ago

That one is ruled out by the "at least one is a girl".

fkyoureadthedoc 12 hours ago

Well brother I'm looking at the table and it clearly says 25%. You may have been lied to about the at least one girl it seems.

furyofantares 11 hours ago

The text directly above that table:

> A simpler question

> Let's imagine you're asked a simpler question.

> A family has two children. What's the probability both are girls?

fkyoureadthedoc 9 hours ago

Under the 1 child policy, probably slim 😔