This is the second in a series on Frequently Questioned Answers – that is, answers we have given that are often challenged by readers, either just out of confusion, or in the form of attacks on our intelligence or honesty. Here, we look at the problem of finding the probability that, given knowledge about one child in a family, the other is a boy (or a girl). This is discussed in our FAQ, Boy or Girl?, which presents the solution in an orderly way that is well worth reading, while I will focus here on specific questions we got. As you’ll see, this can be a hard problem to state correctly; even our FAQ has a flaw, as does my description two sentences back (and even the title of this post)!
The basic problem and its answer
Here is the first question we got on this, from Bret in 1996:
Probability of Two Male Children Several weeks ago, in one of the weekly periodicals I read, a person presented this logical statement: If a family has two children, and the older child is a boy, there is a 50 percent chance the family will have two boys. However, in a family with two children, if all we know is that one child is a boy (no age specified) there is a 1/3 chance of that family having two male children. This boggles me, as I can plainly see the first scenario - it's quite simple: the younger child may be a boy or girl (1/2). It is a mystery to me how by not knowing whether the known boy is older or younger could change the probability of the family as a whole. However, I have discussed this problem with my grandfather and he explains it like so: any two-child family has four possible outcomes: b:b, g:g, b:g, g:b. He states that, in the first case where the known boy is also known to be the older child, we thereby eliminate the g:g and g:b combinations, leaving two left. And obviously one of those two is b:b so our odds of having two male children is 1/2. By his same logic, in the 2nd instance all that is known is that one child is a boy, therefore of course the g:g combination cannot exist, leaving b:b, b:g, g:b. Hence, there is a 1/3 chance of two boys. Although his explanation is very clear, I fail to understand how the odds of the entire family can be swayed by whether or not a boy is older or not. Example: I see a common ordinary boy. I am told that he has a sibling. Am I to believe now that his sibling has a 2/3 chance of being a girl?? Now I am told he is the older child. Magically the odds change and his younger sibling has a 1/2 chance of being a girl now? It baffles me.
Bret’s grandfather’s explanation is a concise version of what we say in our FAQ, and the initial statement of the problem avoids all the pitfalls we’ll be looking at. We’ll see later, though, that the final paragraph, by starting with the boy, misstates the problem in such a way that the correct answer is 1/2. The problem is very sensitive to small changes in its statement!
To expand the explanation a bit, suppose we gathered a large random collection of families, each with exactly two children. Then (assuming, as we often do in probability, that boys and girls are equally likely and independent) they could be divided into four roughly equal groups according to the children’s genders in birth order, older first: BB, BG, GB, GG. If we put all those whose older child is a boy in one room, we would have only the BB and BG families; half of those would have a second boy. So a two-child family whose older child is a boy has a probability of 1/2 of having another boy. That’s just what we’d expect.
But if we put in that room all the families that have at least one boy, without regard to position in the family, we would be including BB, BG, and GB. Only 1/3 of these would have two boys! So not knowing which child is a boy reduces the probability. How can that be?
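The room-sorting argument can be checked by brute-force counting. Here is a minimal Python sketch, assuming the four birth-order patterns are equally likely; the helper name prob_two_boys is made up for this illustration and is not part of the FAQ.

```python
from fractions import Fraction
from itertools import product

# All two-child families, older child listed first.
# Assumption: the four patterns are equally likely.
families = ["".join(p) for p in product("BG", repeat=2)]  # BB, BG, GB, GG

def prob_two_boys(condition):
    """P(both boys | condition), found by counting the families in the room."""
    room = [f for f in families if condition(f)]
    return Fraction(sum(f == "BB" for f in room), len(room))

# Room 1: families whose older child is a boy.
older_is_boy = prob_two_boys(lambda f: f[0] == "B")
# Room 2: families with at least one boy, in either position.
at_least_one_boy = prob_two_boys(lambda f: "B" in f)

print(older_is_boy)      # 1/2
print(at_least_one_boy)  # 1/3
```

The only difference between the two calls is which families are admitted to the room: the looser condition lets in GB as well, which enlarges the denominator without adding any two-boy families.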
Doctor Anthony answered this, starting with a different example of conditional probability, showing how restrictions on the “sample space” can affect probability. He then briefly discussed this problem:
The more information you provide, the more you change the sample space (the denominator) of the probability calculation. In the case of sons or daughters in a family of four, if you exclude gg or gb, then as you said you can only have bb or bg, so 1/2 probability that second child is a girl, whereas if one child (unspecified) is a boy, the probability space is now bb, bg, gb and the chance of two boys is now only 1/3. The thing to remember is that CONDITIONAL probability can dramatically change the commonsense idea of what a particular probability should be.
More will have to be said …
The basic problem was asked and answered in much the same way in 1999:
Bayes Theorem
How to (mis)read the problem
In 2000, we got a challenge to the previous answer:
Boy or Girl: Two Interpretations While looking for something else, I stumbled upon this question: Probability of Two Male Children http://mathforum.org/dr.math/problems/mcclory.7.5.96.html and saw that you said the probability of the second boy having a brother was 1/3. By my calculations the boy's sibling is either elder or younger and either male or female. Assuming that the probabilities of each are equal: elder brother = 1/4 younger brother = 1/4 elder sister = 1/4 younger sister = 1/4 As such, the probability of him having a brother is 1/4 + 1/4 = 1/2. Where you went wrong was in saying b:b was as likely as b:g or g:b, when of course in b:b if we now call the boy we know about x, we have x:b and b:x, so we have the sets x:g, g:x, b:x, and x:b; and thus a probability of 1/2.
Phil is not just going by intuition, but has found a specific argument that convinces him we are wrong. Are we? Doctor TWE replied, focusing on a key idea:
One of the biggest problems in probability is stating the problem clearly. Either answer, your 1/2 or Dr. Anthony's 2/3, could be correct depending on how the problem is set up. In this problem, a key factor in determining the probability is how the child and family are selected. When we say, "in a two-child family, one child is a boy," how did we select the child? The selection process makes a big difference in the final probability (or, as Dr. Anthony would say, in the "sample space" of the problem.)
We often comment that the exact words (and, too often, unstated assumptions) are essential in any probability problem. We have to be careful both in how we state the problem, and in how we read it. This problem, in particular, is very easy to either write or read wrongly (especially as it tempts us to state it in such a way that the answer will be unexpected).
First, this is how Phil is seeing it:
Supposing that we randomly pick a _child_ from a two-child family. We see that he is a boy, and want to find out whether his sibling is a brother or a sister. (For example, from all the children of two-child families, we select a child at random who happens to be a boy.) In this case, an unambiguous statement of the question could be:
From the set of all families with two children, a child is selected at random and is found to be a boy. What is the probability that the other child of the family is a girl?
Note that here we have a pool of kids (all of whom are from two-child families) and we're pulling one kid out of the pool. This is like the problem you're talking about. The child selected could have an older brother, an older sister, a younger brother or a younger sister.
Let's look at the possible combinations of two children. We'll use B for Boy and G for girl, and for each combination we'll list the older child first, so GB means older sister while BG means younger sister. There are 4 possible combinations: {BB, BG, GB, GG}
From these possible combinations, we can eliminate the GG combination since we know that one child is a boy. The three remaining possible combinations are: {BB, BG, GB}
In these combinations there are four boys, of whom we have chosen one. Let's identify them from left to right as B1, B2, B3 and B4. So we have: {B1B2, B3G, GB4}
Of these four boys, only B3 and B4 have a sister, so our chance of randomly picking one of these boys is 2 in 4, and the probability is 1/2 - as you have indicated.
So, we put all our two-child families into that room, and half the boys will be from two-boy families (two of them supplied by each such family!), and the other half will be from one-boy families. Everything is as we expect. But here, we counted boys, not families.
But now let's look at a different way of selecting the "boy" in the problem. Suppose we randomly choose the two-child _family_ first. Once the family has been selected, we determine that at least one child is a boy. (For example, from all the mothers with two children, we select one and ask her whether she has at least one son.) In this case, an unambiguous statement of the question could be:
From the set of all families with two children, a family is selected at random and is found to have a boy. What is the probability that the other child of the family is a girl?
Note that here we have a pool of families (all of whom are two-child families) and we're pulling one family out of the pool. Once we've selected the family, we determine that there is, in fact, at least one boy.
Since we're told that one child (we don't know which) is a boy, we can eliminate the GG combination. Thus, our remaining possible combinations are: {BB, BG, GB}
Each of these combinations is still equally likely because we picked one of the four families. Now we want to count the combinations in which the "other" child is a girl. There are two such combinations: BG and GB. Since there are three combinations of possible families, and in two of them one child is a girl, the probability is 2/3.
This is the answer to the problem as intended by our FAQ.
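To see the two selection procedures side by side, here is a small Python sketch that counts exactly, assuming equally likely families and, for the child-first version, that each of the eight children is equally likely to be picked. The variable names are illustrative only.

```python
from fractions import Fraction

# Older child first. Assumption: the four families are equally likely.
families = ["BB", "BG", "GB", "GG"]

# Child-first: every child of every family is an equally likely pick.
# Each entry is (sex of the chosen child, sex of that child's sibling).
children = [(f[i], f[1 - i]) for f in families for i in (0, 1)]
boys = [c for c in children if c[0] == "B"]
child_first = Fraction(sum(sib == "G" for _, sib in boys), len(boys))

# Family-first: every family is an equally likely pick;
# we keep only the families that have at least one boy.
with_boy = [f for f in families if "B" in f]
family_first = Fraction(sum("G" in f for f in with_boy), len(with_boy))

print(child_first)   # 1/2
print(family_first)  # 2/3
```

The child-first list contains four boys, two of them from the single BB family, which is the B1, B2, B3, B4 count in Doctor TWE's answer; the family-first list contains three families, each counted once.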
Why are these two probabilities different? As in other probability problems, how information is obtained is as important as the information itself. Without knowledge of the data gathering process, ambiguity can result. How do we know that one child is a boy? In the first (your) interpretation, each _boy_ has an equal chance of being chosen. Thus, the family with two boys has twice the chance of being the "chosen family." The boys are equally probable, but the families are not. In the second (Dr. Anthony's) interpretation, each _family_ has an equal chance of being chosen. In a family with two boys, each boy has only half that chance of being "the boy" referenced in the statement. The families are equally probable, but the boys are not. In this case, the two "events" are not independent, because we're selecting a family, not an individual child. In fact, there's really only one "event" - the selection of the family. If you're still not convinced, try the following experiment. Take two fair coins and toss them. I think you'll agree that each coin has a 1/2 chance of being heads. On each toss, see if at least one of the coins is heads (the equivalent of "at least one child is a boy"). If both coins are tails (both children are girls), ignore the outcome and toss again. If at least one of the coins is heads, record whether you had two heads (the boy has a brother) or a head and a tail (the boy had a sister). Over many tosses, you should find yourself getting about twice as many head-tail tosses as head-head tosses. Of course, if you count each head-head toss twice (once for each head tossed)...
Once again, an experiment, while not the most mathematical way to settle an argument, has the advantage of forcing you to face reality (and maybe think more about whether your model matches the problem). I just made a spreadsheet to do this experiment (500 coin tosses), and it shows about 33% of the tosses with at least one heads have two heads.
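For readers who would rather script the experiment than build a spreadsheet, here is a rough Python simulation of the two-coin procedure. It is only a sketch: the function name coin_experiment, the seed, and the number of tosses are arbitrary choices made for this illustration, not taken from the spreadsheet mentioned above.

```python
import random

def coin_experiment(tosses, seed=0):
    """Toss two fair coins repeatedly and tally the results two ways."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    kept = two_heads = heads_seen = heads_with_partner = 0
    for _ in range(tosses):
        a, b = rng.choice("HT"), rng.choice("HT")
        n_heads = (a == "H") + (b == "H")
        if n_heads == 0:
            continue  # both tails (two girls): ignore and toss again
        kept += 1
        two_heads += n_heads == 2          # counting tosses (families)
        heads_seen += n_heads              # counting heads (boys)
        if n_heads == 2:
            heads_with_partner += 2        # each of the two heads has a head partner
    return two_heads / kept, heads_with_partner / heads_seen

per_toss, per_head = coin_experiment(100_000)
print(f"two heads, counted per toss: {per_toss:.3f}")
print(f"two heads, counted per head: {per_head:.3f}")
```

With this many tosses the first figure should land near 1/3 and the second near 1/2, which is the point of Doctor TWE's closing remark about counting each head-head toss twice.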
Doctor TWE’s answer above is the basis of the supplemental FAQ page, Family or Child First?
For similar problems, see
Cupcakes and Boxes: Conditional Probability
Boy, What Is Your Probability?
This problem is discussed at length in Wikipedia, Boy or Girl paradox, which includes our second FAQ as a reference.
Our FAQ used to be wrong!
Looking through the many unarchived questions on this topic, I found that in 2001 a reader challenged, not our answer, but the closing statement of the problem itself:
Page http://forum.swarthmore.edu/dr.math/faq/faq.boy.girl.html has near its end: "Remember: information that creates conditional probability can dramatically affect commonsense ideas about probability. For example, no matter how unlikely it may seem to you, if you meet a girl who says she has a sibling, a basic knowledge of probability tells you there's a 2/3 probability that she has a brother. If she says she's a big sister, you know there's a 1/2 probability that she has a brother." Assuming this is the 2 child per family problem (because every other part of the problem has been 2 child and no info on distribution of family sizes is given), the odds should be 1/2 that her sibling is a brother. The explanation for that is given on: http://forum.swarthmore.edu/dr.math/faq/faq.boygirl.choose.html
in the section "Choosing the Child First". I am convinced that if I picked a 2-child family at random and one child is a girl, then there is a 2/3 chance that the other is a boy, but that is not the case described. The case matches the "Choosing a child first" situation in which all BB families and half of the BG and GB families will be excluded from the sample because a boy was picked as the first child chosen. If the first page I quoted made no assumption about family size, then just toss out everything I have said.
Karl was right! Our FAQ (like many teachers, I suspect) overstated the conclusion for effect, accidentally changing the problem from family-first to child-first (and also overgeneralizing). (This is exactly Bret’s error in the first question above.) Doctor TWE changed the FAQ from what is quoted above to what it is now:
Remember: information that creates conditional probability can dramatically affect common sense ideas about probability. For example, no matter how unlikely it may seem to you, if you meet a mother of two who says she has a daughter, a basic knowledge of probability tells you there's a 2/3 probability that the daughter mentioned has a brother. If she says she's an older daughter, you know there's a 1/2 probability that the daughter has a younger brother.
Even this, I think, could be misleading, if you suppose that the mother has volunteered the information, thinking of a specific daughter (“the daughter mentioned”). That could pull the problem over to the child-first realm, or at least require us to consider subjective probabilities based on the mother’s motivations.
And it is still a little wrong!
Much later, in 2012, Mike wrote about the introduction to the problem:
... Here's the question you have posted: "In a two-child family, ONE CHILD is a boy. What is the probability that THE OTHER CHILD is a girl?" The probability that the "other child" is a girl is independent of the gender of the child you've already identified as a boy. "Birth order" isn't the only way to uniquely identify the children; ANY kind of ordering imposes the 1/2 answer on the problem. You're no longer talking about sets of children (i.e. families), you're talking about children. That's different. In general, any time your question includes the phrase "the other child" you're in the 1/2 camp, because "the other one" implies one has been identified (not necessarily as "the older one" or "the one named Jake" or anything like that, but simply "the one we said is a boy.") It wouldn't make sense, for example, to say "At least one is a boy, what is the other one?" The other what? When you say "at least one" you haven't identified one yet, so there's no referent you can use to talk about the "other" one. This may seem like mere semantics, but it's more than that. The way you have the question worded, the answer is definitively 1/2. You would need to rework it to make 2/3 an acceptable interpretation (e.g. "A family has two children, at least one of which is a boy. What is the probability that they're not both boys?")
This led to a long discussion of semantics; in the end, I proposed rewording the FAQ so that the answer is still surprising but becomes clear upon consideration, in order to show the power of math rather than its weirdness.
This is my final proposed beginning to the FAQ:
In a two-child family, we are told that one child is a boy. What is the probability that they also have a girl? What if we are told that the older child is a boy? Does this information change the probability that the second child is a girl? ----- We first need to clarify the question, because English is often ambiguous, and probability requires precision. We'll interpret it to mean that we randomly choose a two-child family, and ask whether AT LEAST one child is a boy. The answer is yes. We want to know how likely it is that they have one boy and one girl. (What if we choose the _child_ first?) When the only information given is that there are two children and one is a boy, here are two ways of looking at the problem: ...
This fixes several issues. Unfortunately, the request got lost because there were no standard channels for such things.