Binomial probability


Life is usually complicated. But sometimes you can simplify some specific problems to yes/no or good/bad answers. If you have a sequence of independent repetitions of an event, with two possible outcomes, and the probabilities of those outcomes does not vary between repetitions, we call that a Bernoulli trial.

If all the above conditions are fulfilled, we can denote p as the probability of success and q = (1 – p) the probability of failure for one such event. For a sequence of n events we have the probability to be successful in k of them:

P = C(n, k) \cdot p^{k} \cdot q^{n - k} (1)

While this may seem a bit boring, we can get use the above relation for various probability estimates, as long as we apply it correctly, and the events are Bernoulli trials.

Example 1

Let’s say you have to pass a test. There are 20 multiple choice questions, each one with 4 choices and one correct answer. What is the probability to pass if you do not study? That would be the probability to answer correctly to 12 questions from 20, with p being 0.25 (1 out of 4 possible choices):

P = C(20, 12) \cdot 0.25^{12} \cdot 0.75^{8} \approx 0.000752

A probability of 0.075% is really low, so you will probably fail without study. However this particular test also has other rules like: 3 questions are mandatory to answer correctly, or you fail regardless of the other answers. So let’s suppose you study somewhat so you’ll correctly answer those 3 mandatory questions. You probability to pass becomes:

P = C(17, 9) \cdot 0.25^{9} \cdot 0.75^{8} \approx 0.009284

which is 120 times larger that if you don’t study at all, but still low.

In the end the best idea to pass any test is to study for it, but in case you don’t feel like it at least you can estimate how much you should study based on how lucky you feel.

Example 2

Let’s suppose we’re playing a game where you can win 1 dollar with probability p = 0.5 and lose 1 dollar with q = 1 – p. The expected value of the game is:

E = p \cdot 1 - q \cdot 1 = 0

so in the long run we would expect to neither gain, nor loose from repeatedly playing this game.

However let’s calculate the probability to win at least 4 dollars after 20 games. This is the probability to win 11 games (this means 11 vs. 9 so we gain 2 dollars) or the probability to win 12 games (this means 12 vs. 8 so 4 dollars). We can add the probabilities because the 2 events are mutually exclusive. According to what we discussed this means:

P = C(20, 11) \cdot 0.5^{11} \cdot 0.5^{9} + C(20, 12) \cdot 0.5^{12} \cdot 0.5^{8} \approx 0.28

This looks like a good enough probability to win. For a run of 80 games we have:

P = C(80, 41) \cdot 0.5^{41} \cdot 0.5^{39} + C(80, 42) \cdot 0.5^{42} \cdot 0.5^{38} \approx 0.167

So although the probability decreases with the number of runs it is still looks significant.

So how does this work? On one side we have the expected value of the game 0, so you expect to win 0 dollars on the long run but if we actually calculate the probability to win a couple of dollars in a series of repeated games we get big numbers. The answer is simple actually, there is a probability to lose at least 4 dollars and in this case it is equal with the probability to win, so you will probably get nothing in the end of a long series of games.

Always think of all the probabilities or you will miss important ones.

Conditional probability


Conditional probability is the probability of an event A, given that some other event B occurred. There is no implicit causal or temporal relationship between the two events. In most of the practical examples, there is such a relationship, but this is not necessary for the mathematical relations described here.

Examplesource: A car is produced in 2 models: Model 1 and Model 2. Each model is produced in 2 designs: hatchback and wagon. Off all the cars produced, 30% are Model 1, and 20% are wagons. But 50% of Model 1 are wagons.

Let’s see this example in a Venn diagram.

I know Venn diagrams are usually made with circles, but I drew it with rectangles since it’s easier to represent the areas exactly. This diagram illustrates the basic rules of conditional probability:

P(A | B) = \frac{P(A \cap B)}{P(B)}, \text{ where } P(B) \neq 0 \quad (1)


P(B | A) = \frac{P(A \cap B)}{P(A)}, \text{ where } P(A) \neq 0 \quad (2)

Independent and mutually exclusive events

The event A is said to be independent of B iff:

P(A | B) = P(A)

in other words, if the fact that B occurs (or not) has no effect of the probability of A. Now, are two mutually exclusive events (events that have no basic outcomes in common) also independent? Two mutually exclusive events are actually dependent of each other, since the occurrence of one of them means the second event cannot happen.

If events A and B are independent it follows that events A and B’ (the complement of B); A’ (the complement of A) and B; and A’ and B’ are independent also. A Venn diagram of 2 mutually exclusive events A and B has no intersection between them, while the diagram of 2 independent events A and B will have the following relation P(A | B) = P(A) = P(A | B'). This means – according to equation (1) – that the ratio of A \cap B and B is equal with the ratio of A \cap B' and B' and equal with the proportion of A in the whole sample space.

Bayes’ theorem

If we equal P(A \cap B) in equations (1) and (2) above, we get Bayes’ theorem:

P(B | A) = \frac{P(A | B) \cdot P(B)}{P(A)}

Confusion of the inverse

Confusion of the inverse is the assumption that P(A|B) is the same as P(B|A). The classical example for this is the assumption that if you’re tested positive for a particular disease that means you have that disease. Let’s try to put some numbers on this example. Usually we know the following:

  • P(ill) – the probability that a specific population has the disease. This is usually determined through statistics by recording the number of known cases of the disease. You can probably find this number from your favorite statistics agency.
  • P(positive|not ill) – the false positive rate for the test (type I error). This is the probability that the test will be positive for a person that does not have the disease. This value is usually denoted by \alpha . Conversely P(negative|not ill) = 1 - \alpha
  • P(negative | ill) – the false negative rate for the test (type II error). This is the probability that the test will be negative for a person that really has the disease. This value is usually denoted by \beta . Conversely P(positive|ill) = 1 - \beta

Let’s apply Bayes’ theorem now:

P(\text{ill} | \text{positive}) = \frac{P(\text{positive} | \text{ill}) \cdot P(\text{ill})}{P(\text{positive})}

The people for which the test is positive can be split in ill or not ill. Since (being ill and testing positive) is mutually exclusive with (not being ill and testing positive) we can write:

P(\text{ill} | \text{positive}) = \frac{P(\text{positive} | \text{ill}) \cdot P(\text{ill})}{P(\text{ill } \cap \text{ positive}) + P(\text{not ill } \cap \text{ positive})}

and then by using equation (2) we get:

P(\text{ill} | \text{positive}) = \frac{P(\text{positive} | \text{ill}) \cdot P(\text{ill})}{P(\text{positive} | \text{ill}) \cdot P(\text{ill}) + P(\text{positive} | \text{not ill}) \cdot P(\text{not ill}))}

P(\text{ill} | \text{positive}) = \frac{(1 - \beta)P(\text{ill})}{(1 - \beta)P(\text{ill}) + \alpha(1 - P(\text{ill}))} \quad (3)

which is obviously different from P(positive|ill) = 1 - \beta .

How different? Well for any reasonable good test we should expect \alpha and \beta to be really small. So below you can see 2 graphs (3D because we have 2 independent variables \alpha < 0.1 and \beta < 0.1 ) for 2 different values of P(ill).

While P(positive|ill) depends only on the parameters of the test, P(ill|positive) depends on the parameters of the test and also on P(ill). The difference between the two is larger when P(ill) is smaller, meaning if the disease is rare, it is less probable that one test will tell you that you are sick (unless it has amazingly low false positive and false negative rates).

Conditional probabilities are easy to guesstimate and difficult to calculate. And remember that usually in this case guesstimate is more of a wild guess based on your emotions at that particular moment then an estimate. When dealing with conditional probabilities it is better to think, calculate them, calculate them again and then look for an independent confirmation just to be sure that you’ve done it right.

Poker probabilities


I got this problem from a friend, and although the answer can be found with a web search I will reproduce it here for future reference.

What are the probabilities of the different poker hands? First let me be specific in that I’m talking about the probability of getting one of the poker hands at the first draw from a full deck of cards. So we have 52 cards (no wild cards) and we draw 5, or we a number of players and we give 5 cards to each.

Both of the above situations are equivalent. There is no difference in the way you calculate the probability if you draw the first 5 cards or you draw the 2nd, 6th, 10th, 14th and 18th card from the deck.

Royal Flush

Royal flush is ace, king, queen, jack, ten in the same suit. The probability is:

P_{royal\_flush} = \frac{4}{C(52, 5)} \approx 0.0001539\%

There is only 1 possible combination for each suit that can give you a royal flush, multiplied with the number of suits.

Straight Flush

Straight flush is 5 cards in rank sequence and of the same suit. The probability is (including royal flush):

P_{straight\_flush} = \frac{4 \cdot 10}{C(52, 5)} \approx 0.001539\%

There are 10 ways of getting 5 cards from 14 in rank sequence (because we can have ace-high or an ace-low straight flush) for each suit, multiplied with the number of suits.

Four of a kind

Four of a kind means 4 cards of one rank, and an unmatched card of another rank. The probability is:

P_{4\_of\_kind} = \frac{13 \cdot 48}{C(52, 5)} \approx 0.024\%

There are 13 possible ways to get 4 cards of the same rank (that’s the number of ranks available) and there are (52 – 4) possible ways to get the remaining 5th card.

Full house

Full house means 3 matching cards of one rank, and 2 matching cards of another rank. The probability is:

P_{full\_house} = \frac{P(13, 2) \cdot C(4, 3) \cdot C(4, 2)}{C(52, 5)} \approx 0.144\%

There are P(52, 5) possible ways to get 2 suits from the 13 available. This is because in this case it matters if for example we have 3 cards of clubs and 2 cards of spades or 3 cards of spades and 2 cards of clubs. There are C(4, 3) and C(4, 2) possible ways of getting 3 and respectively 2 cards from the 4 cards of the same rank.


Flush means 5 cards of the same suit, not in rank sequence, excluding the straight flush and royal flush. The probability is:

P_{flush} = \frac{4 \cdot C(13, 5) - 40}{C(52, 5)} \approx 0.19654\%

There are C(13, 5) possible ways to get 5 cards from each suit and we have to subtract the 40 straight flushes (see above) from that number.


Straight means 5 cards in rank sequence but in more than one suit. The probability is:

P_{straight} = \frac{10 \cdot 4^5 - 40}{C(52, 5)} \approx 0.39246468\%

There are 10 ways of getting 5 cards in rank sequence and 4^5 combinations available for those 5 cards. As before we have to subtract the 40 straight flushes from that number.

Three of a kind

Three of a kind means 3 cards of the same rank, plus 2 unmatched cards. The probability is:

P_{3\_of\_kind} = \frac{13 \cdot C(4, 3) \cdot C(12, 2) \cdot 4^2}{C(52, 5)} \approx 2.1129\%

OK, so there are C(4, 3) possible ways of selecting 3 cards from 4 of the same rank, and we have to multiply this with the number of ranks. For the remaining 2 cards there are C(12, 2) possible ways of selecting the rank so it would not match the rank of the first 3 cards, and 4^2 combinations available for those 2 cards.

Two pair

Two pair means two cards of the same rank, plus two cards of another rank (that match each other but not the first pair), plus one unmatched card. The probability is:

P_{2\_pair} = \frac{C(13, 2) \cdot C(4, 2) \cdot C(4, 2) \cdot 11 \cdot 4}{C(52, 5)} \approx 4.7539\%

We have C(13, 2) possible ways of selecting 2 different ranks from the 13 available, multiplied twice with C(4, 2) which is the number of possible ways of selecting 3 cards from 4 of the same rank. For the remaining card we have only 11 suits available and 4 possible cards per suit.

One pair

One pair means two cards of the same rank, plus three other unmatched cards. The probability is:

P_{2\_pair} = \frac{13 \cdot C(4, 2) \cdot C(12, 3) \cdot 4^3}{C(52, 5)} \approx 42.2569\%

We have C(4, 2) possible ways of selecting 2 cards from 4 of the same rank multiplied with the number of ranks. For the remaining 3 cards we have C(12, 3) possible ways of selecting a different rank and 4^3 combinations available for those 3 cards.

For a final overview, the below pie chart shows all the probabilities discussed previously (all of them are in the chart, just that some of them are really small).

Gambler’s ruin


Let’s suppose you play with an opponent the following game: if you win you get 1 dollar, if you lose you lose 1 dollar. The probability to win the game is p (and conversely the probability to lose the game is q = 1 – p), you start with i dollars and your opponent with N – i dollars (N is the total sum in the game). What is the probability to win the whole sum of money, before you go broke? Or in scientific terms we’re looking for the probability to win everything P(N|i), that is the probability to win N dollars conditioned by starting with i dollars. After a not so complicated proof the result is:

P(N|i) = \begin{cases} \frac{1 - \rho^i}{1 - \rho^N} & p \neq q, (\rho = \frac{q}{p})\\ \frac{i}{N} & p = q = 0.5 \end{cases}

This problem is known as gambler’s ruin and was first proposed by Christiaan Huygens. Let’s plot the above result and see if we can draw some conclusions from this problem.

First, in the case of a perfectly fair game (p = q = 0.5, meaning no player has an advantage) the probability to win the total amount is dependent only on how much money you start with. In other words the one with the most money, has a bigger probability to win. Or if you prefer the corollary, if you want all the money in a game try to start the game with most of them in your pocket.

If the game is not perfectly fair it gets even worse. If N = 50 dollars and you start with 20 dollars, the probability to win everything drops to below 0.1 if the probability of winning one game is just slightly against you (0.48). The probability to win everything decreases slowly after that (if the probability to win a game goes below 0.48) but that just shows that a small difference is all it takes, you don’t need to cheat to badly to get an effect.

So the lesson is simple. If you want to maximize the probability to win all the money in a series of games, try to start with (much) more money than your opponent(s) and try to have a bigger chance of winning the game than your opponent(s). This is the basic principle why professional gambling establishments cannot lose since they will make sure that both of the above conditions are met for them.


To find P(N|i) let’s draw a Markov chain. This is basically a graphical representation of the gaming process as a series of states (characterized by the amount of money you have won i). The future state is determined only by the present state and the probabilities p and q. The states 0 and N are special end states, meaning if you reach them the game has ended. Also in the 0 and N states we have the boundary conditions P(N|0) = 0 (you can’t win the game if you have 0 dollars) and P(N|N) = 1 (you have won the game if you have N dollars).

For any state i except 0 and N we have probability p to go to state i + 1 (green transition) and probability q to go to state i – 1 (red transition). So we can write the following:

P(N|i) = qP(N|i-1) + pP(N|i+1)

P_i = qP_{i-1} + pP_{i+1}

(p+q)P_i = qP_{i-1} + pP_{i+1}

p(P_{i+1} - P_i) = q(P_i - P_{i-1})

P_{i+1} - P_i = \frac{q}{p}(P_i - P_{i-1}) = \rho(P_i - P_{i-1}) \quad (1)

We know that P(N|0) = 0 and by writing equation (1) for each transition:

P_2 - P_1 = \rho P_1

P_3 - P_2 = \rho (P_2 - P_1) = \rho^2 P_1


P_i - P_{i-1} = \rho^{i-1} P_1

Adding all the above:

P_i - P_1 = (\rho + \rho^2 + \cdots + \rho^{i-1}) P_1

P_i = (1 + \rho + \rho^2 + \cdots + \rho^{i-1}) P_1

P_i = P_1 \frac{1 - \rho^i}{1 - \rho} \quad (2)

Writing the other boundary condition P(N|N) = 1 for equation (2) we get:

P_N = P_1 \frac{1 - \rho^N}{1 - \rho} = 1

P_1 = \frac{1 - \rho}{1 - \rho^N} \quad (3)

Replacing (2) into (3) we get the relation we were looking for:

P_i = \frac{1 - \rho^i}{1 - \rho^N}

Of course this result is not valid for ρ = 1, but this case is easily demonstrated by replacing ρ into equation (2), where we get:

P_i = iP_1

In this case the boundary condition becomes:

P_N = NP_1 = 1

And the result for ρ = 1 (p = q = 0.5) is:

P_i = \frac{i}{N}