How to get probabilities wrong

2010-01-30

A couple of days ago, one of my friends asked me to calculate a probability related to Texas hold’em. The probability he wanted was quite simple, and that is exactly the problem: the easier a calculation looks, the more opportunities there are for mistakes. The reverse can also be true: if it looks hard to calculate, you probably won’t even try.

The problem is the following: given that the 2 cards you have in your hand are of the same suit, what is the probability that you will have 4 cards of the same suit after the flop?

Since people learn from their mistakes, but smart people learn from other people’s mistakes, let’s see how to calculate this probability. You know 2 cards, so there are 50 left. Now, it can be argued that the other players each have 2 cards too, so fewer than 50 cards remain in play. But from your point of view it really doesn’t matter whether the other players have cards or how many players there are. Imagine the following scenarios:

  • You have a pack of 52 cards. You give yourself 2 cards, then you draw some cards and just leave them on the table, then you draw 3 more and put them face up (the flop in Texas hold’em). Here it is obvious that you have 50 cards and you just draw 3 of them randomly.
  • You have a pack of 52 cards. You give 2 cards to yourself and to each of a number of other players, and then draw 3 more and place them face up. There is no difference in the way the cards are drawn compared to the previous scenario.

Of course, if you knew the cards of the other players you could calculate the probability we’re looking for more accurately. But based only on the knowledge of your 2 cards, you can use the first scenario regardless of the number of players at the table. It’s the same as with a die. If all you know is that it’s a die, you can say the probability of rolling a 1 is 1/6. But if you knew in detail the mass distribution of the die, its initial position, and the initial force and moment, you could give a more accurate probability of rolling a 1. But this is already getting into the philosophy of probability, so let’s leave it for another post.
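The claim that the other players’ hidden cards do not matter can be checked exactly on a toy deck. The sketch below is only an illustration: the 7-card deck, the 2-card "flop" and the function name `prob_two_hearts` are assumptions, chosen so that every ordering of the deck can be enumerated quickly.

```python
from fractions import Fraction
from itertools import permutations

# Toy deck (an assumption for illustration): 3 hearts and 4 other cards.
DECK = ["H1", "H2", "H3", "x1", "x2", "x3", "x4"]

def prob_two_hearts(skip):
    """Exact probability that a 2-card 'flop' is all hearts when `skip`
    cards are first dealt face down (e.g. to other players)."""
    hits = 0
    total = 0
    for order in permutations(DECK):
        flop = order[skip:skip + 2]  # the two cards turned face up
        total += 1
        hits += all(card.startswith("H") for card in flop)
    return Fraction(hits, total)

no_opponents = prob_two_hearts(skip=0)    # flop comes straight off the top
with_opponents = prob_two_hearts(skip=4)  # 4 unseen cards are dealt first
print(no_opponents, with_opponents)
```

Both values are the same fraction, C(3, 2) / C(7, 2), because every position in a shuffled deck is equally likely to hold any card.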

So we have 50 cards and we draw 3 of them. This means C(50, 3) is the total number of outcomes. We already have 2 cards of a particular suit and we want 2 more of the same suit. Since 11 cards of our suit are left among the remaining cards, and we want any 2 of them, that means C(11, 2) favorable outcomes. And for the third card in the flop we can have anything else, so that means 50 – 2 = 48 outcomes. This gives us:

p = \frac{C(11, 2) \cdot 48}{C(50, 3)} = 13.47\%

(Un)fortunately we had the right result, and it wasn’t what we got. The mistake was that for the third card we multiplied by the number of cards remaining in the deck. However, 9 of those remaining cards are of the same suit as the 4 we were interested in. So what we calculated also includes the flops that give 5 cards of the same suit (each of them counted three times, once for every way of picking 2 of the 3 suited flop cards). The requirement was to calculate the probability to get only 4 cards of the same suit.

So the correct approach is that for the third card we can have any card of another suit. That means 39 outcomes, giving the right result of:

p = \frac{C(11, 2) \cdot 39}{C(50, 3)} = 10.94\%
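These numbers are easy to double-check with a few lines of Python. The sketch below uses only `math.comb`; the helper name `flops_with_suited` is made up for this example. For comparison it also computes the exact probability of ending up with 4 or 5 suited cards, which is smaller than the naive product because that product counts every three-suited flop three times.

```python
from math import comb

SUITED_LEFT = 11   # unseen cards of our suit (13 minus the 2 in hand)
OTHER_LEFT = 39    # unseen cards of the other three suits
FLOPS = comb(50, 3)

def flops_with_suited(k):
    """Number of 3-card flops with exactly k cards of our suit."""
    return comb(SUITED_LEFT, k) * comb(OTHER_LEFT, 3 - k)

# Exactly 4 suited cards after the flop: 2 suited flop cards, 1 off-suit.
p_exactly_four = flops_with_suited(2) / FLOPS
# 4 or 5 suited cards after the flop: 2 or 3 suited flop cards.
p_four_or_five = (flops_with_suited(2) + flops_with_suited(3)) / FLOPS
# The naive product from the text: any 2 suited cards times any third card.
p_naive = comb(SUITED_LEFT, 2) * 48 / FLOPS

print(f"exactly 4: {p_exactly_four:.2%}")
print(f"4 or 5:    {p_four_or_five:.2%}")
print(f"naive:     {p_naive:.2%}")
```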


Poker probabilities

2009-12-09

I got this problem from a friend, and although the answer can be found with a web search I will reproduce it here for future reference.

What are the probabilities of the different poker hands? First let me be specific: I’m talking about the probability of getting one of the poker hands at the first draw from a full deck of cards. So we have 52 cards (no wild cards) and we draw 5, or we have a number of players and we give 5 cards to each.

Both of the above situations are equivalent. There is no difference in the way you calculate the probability if you draw the first 5 cards or you draw the 2nd, 6th, 10th, 14th and 18th card from the deck.

Royal Flush

Royal flush is ace, king, queen, jack, ten in the same suit. The probability is:

P_{royal\_flush} = \frac{4}{C(52, 5)} \approx 0.0001539\%

There is only 1 possible combination for each suit that can give you a royal flush, multiplied with the number of suits.

Straight Flush

Straight flush is 5 cards in rank sequence and of the same suit. The probability is (including royal flush):

P_{straight\_flush} = \frac{4 \cdot 10}{C(52, 5)} \approx 0.001539\%

There are 10 ways of getting 5 cards in rank sequence from 14 rank positions (14 because the ace is counted twice: we can have an ace-high or an ace-low straight flush) for each suit, multiplied by the number of suits.
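The count of 10 rank sequences can be seen by listing them. This is a small sketch, with the ace written at both ends of the (assumed) `RANKS` list so that it can play low or high:

```python
# Ranks in order, with the ace at both ends so it can be low or high.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]

# Every run of 5 consecutive positions in the 14-entry list is one straight.
straights = [RANKS[i:i + 5] for i in range(len(RANKS) - 4)]

for run in straights:
    print("-".join(run))
print(len(straights), "rank sequences,", 4 * len(straights), "straight flushes")
```

The last sequence in the list (10-J-Q-K-A) is the royal flush, which is why the straight flush count includes it.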

Four of a kind

Four of a kind means 4 cards of one rank, and an unmatched card of another rank. The probability is:

P_{4\_of\_kind} = \frac{13 \cdot 48}{C(52, 5)} \approx 0.024\%

There are 13 possible ways to get 4 cards of the same rank (that’s the number of ranks available) and there are (52 – 4) possible ways to get the remaining 5th card.

Full house

Full house means 3 matching cards of one rank, and 2 matching cards of another rank. The probability is:

P_{full\_house} = \frac{P(13, 2) \cdot C(4, 3) \cdot C(4, 2)}{C(52, 5)} \approx 0.144\%

There are P(13, 2) possible ways to pick 2 ranks from the 13 available. We use permutations because the order matters here: for example, 3 kings and 2 queens is a different hand from 3 queens and 2 kings. There are C(4, 3) and C(4, 2) possible ways of getting 3 and, respectively, 2 cards from the 4 cards of the same rank.
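To see why an ordered choice of two ranks is the right count, the full houses can simply be enumerated. The sketch below is illustrative (ranks are numbered 0 to 12 and suits are the letters C, D, H, S by assumption) and compares the loop count with the formula, using `math.perm` and `math.comb` from Python 3.8 or later.

```python
from itertools import combinations, permutations
from math import comb, perm

SUITS = "CDHS"

full_houses = 0
# Ordered pair of ranks: the first is the triple, the second is the pair.
for triple_rank, pair_rank in permutations(range(13), 2):
    for triple_suits in combinations(SUITS, 3):   # 3 of the 4 suits
        for pair_suits in combinations(SUITS, 2):  # 2 of the 4 suits
            full_houses += 1

by_formula = perm(13, 2) * comb(4, 3) * comb(4, 2)
print(full_houses, by_formula, f"{full_houses / comb(52, 5):.3%}")
```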

Flush

Flush means 5 cards of the same suit, not in rank sequence, excluding the straight flush and royal flush. The probability is:

P_{flush} = \frac{4 \cdot C(13, 5) - 40}{C(52, 5)} \approx 0.19654\%

There are C(13, 5) possible ways to get 5 cards from each suit and we have to subtract the 40 straight flushes (see above) from that number.

Straight

Straight means 5 cards in rank sequence but in more than one suit. The probability is:

P_{straight} = \frac{10 \cdot 4^5 - 40}{C(52, 5)} \approx 0.39246468\%

There are 10 ways of getting 5 cards in rank sequence and 4^5 combinations available for those 5 cards. As before we have to subtract the 40 straight flushes from that number.

Three of a kind

Three of a kind means 3 cards of the same rank, plus 2 unmatched cards. The probability is:

P_{3\_of\_kind} = \frac{13 \cdot C(4, 3) \cdot C(12, 2) \cdot 4^2}{C(52, 5)} \approx 2.1128\%

OK, so there are C(4, 3) possible ways of selecting 3 cards from 4 of the same rank, and we have to multiply this with the number of ranks. For the remaining 2 cards there are C(12, 2) possible ways of selecting the rank so it would not match the rank of the first 3 cards, and 4^2 combinations available for those 2 cards.

Two pair

Two pair means two cards of the same rank, plus two cards of another rank (that match each other but not the first pair), plus one unmatched card. The probability is:

P_{2\_pair} = \frac{C(13, 2) \cdot C(4, 2) \cdot C(4, 2) \cdot 11 \cdot 4}{C(52, 5)} \approx 4.7539\%

We have C(13, 2) possible ways of selecting 2 different ranks from the 13 available, multiplied twice by C(4, 2), which is the number of possible ways of selecting 2 cards from the 4 of the same rank. For the remaining card we have only 11 ranks available and 4 possible cards per rank.

One pair

One pair means two cards of the same rank, plus three other unmatched cards. The probability is:

P_{1\_pair} = \frac{13 \cdot C(4, 2) \cdot C(12, 3) \cdot 4^3}{C(52, 5)} \approx 42.2569\%

We have C(4, 2) possible ways of selecting 2 cards from 4 of the same rank, multiplied by the number of ranks. For the remaining 3 cards we have C(12, 3) possible ways of selecting 3 different ranks (none of them matching the pair) and 4^3 suit combinations available for those 3 cards.

For a final overview, the below pie chart shows all the probabilities discussed previously (all of them are in the chart, just that some of them are really small).
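As a cross-check, all of the counts above can be recomputed in one go. This is a minimal sketch using the formulas from this post; the dictionary layout is my own choice, and the last entry (a hand with none of the above, usually called high card) is not discussed in the text and is added only so that the percentages sum to 100%.

```python
from math import comb, perm

TOTAL = comb(52, 5)  # number of 5-card hands

counts = {
    "royal flush": 4,
    "straight flush (incl. royal)": 4 * 10,
    "four of a kind": 13 * 48,
    "full house": perm(13, 2) * comb(4, 3) * comb(4, 2),
    "flush": 4 * comb(13, 5) - 40,
    "straight": 10 * 4**5 - 40,
    "three of a kind": 13 * comb(4, 3) * comb(12, 2) * 4**2,
    "two pair": comb(13, 2) * comb(4, 2) ** 2 * 11 * 4,
    "one pair": 13 * comb(4, 2) * comb(12, 3) * 4**3,
}

# Royal flushes are already inside the straight flush count,
# so leave them out when summing the made hands.
made_hands = sum(n for name, n in counts.items() if name != "royal flush")
counts["none of the above (high card)"] = TOTAL - made_hands

for name, n in counts.items():
    print(f"{name:32s} {n:9d}  {n / TOTAL:.6%}")
```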


Gambler’s ruin

2009-12-05

Let’s suppose you play the following game with an opponent: if you win you get 1 dollar, if you lose you lose 1 dollar. The probability to win the game is p (and the probability to lose the game is q = 1 – p), you start with i dollars and your opponent with N – i dollars (N is the total sum in the game). What is the probability to win the whole sum of money before you go broke? Or, in scientific terms, we’re looking for the probability to win everything P(N|i), that is, the probability to reach N dollars conditioned on starting with i dollars. After a not so complicated proof the result is:

P(N|i) = \begin{cases} \frac{1 - \rho^i}{1 - \rho^N} & p \neq q, (\rho = \frac{q}{p})\\ \frac{i}{N} & p = q = 0.5 \end{cases}

This problem is known as gambler’s ruin and was first proposed by Christiaan Huygens. Let’s plot the above result and see if we can draw some conclusions from this problem.

First, in the case of a perfectly fair game (p = q = 0.5, meaning no player has an advantage) the probability to win the total amount depends only on what share of the money you start with. In other words, the one with the most money has a bigger probability to win. Or if you prefer the corollary: if you want all the money in a game, try to start the game with most of it in your pocket.

If the game is not perfectly fair it gets even worse. If N = 50 dollars and you start with 20 dollars, the probability to win everything drops below 0.1 if the probability of winning one game is just slightly against you (0.48). The probability to win everything decreases slowly after that (if the probability to win a game goes below 0.48), but that just shows that a small difference is all it takes; you don’t need to cheat too badly to get an effect.
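The numbers quoted here follow directly from the formula for P(N|i). A minimal sketch (the function name `win_probability` is just a label chosen for this example):

```python
from math import isclose

def win_probability(i, N, p):
    """P(N|i): chance of reaching N dollars before 0, starting with i,
    when each single game is won with probability p."""
    q = 1 - p
    if isclose(p, q):          # fair game, rho = 1
        return i / N
    rho = q / p
    return (1 - rho**i) / (1 - rho**N)

fair = win_probability(20, 50, 0.5)
slightly_unfair = win_probability(20, 50, 0.48)
print(f"p = 0.50: {fair:.3f}")
print(f"p = 0.48: {slightly_unfair:.3f}")
```

With a fair game the result is simply 20/50; with p = 0.48 it falls to roughly 0.07.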

So the lesson is simple. If you want to maximize the probability to win all the money in a series of games, try to start with (much) more money than your opponent(s) and try to have a bigger chance of winning the game than your opponent(s). This is the basic principle why professional gambling establishments come out ahead in the long run: they make sure that both of the above conditions are met for them.

Annex

To find P(N|i) let’s draw a Markov chain. This is basically a graphical representation of the gaming process as a series of states (characterized by the amount of money i you currently hold). The future state is determined only by the present state and the probabilities p and q. The states 0 and N are special end states, meaning that if you reach them the game has ended. Also, in the 0 and N states we have the boundary conditions P(N|0) = 0 (you can’t win the game if you have 0 dollars) and P(N|N) = 1 (you have won the game if you have N dollars).

For any state i except 0 and N we have probability p to go to state i + 1 (green transition) and probability q to go to state i – 1 (red transition). So we can write the following:

P(N|i) = qP(N|i-1) + pP(N|i+1)

P_i = qP_{i-1} + pP_{i+1}

(p+q)P_i = qP_{i-1} + pP_{i+1}

p(P_{i+1} - P_i) = q(P_i - P_{i-1})

P_{i+1} - P_i = \frac{q}{p}(P_i - P_{i-1}) = \rho(P_i - P_{i-1}) \quad (1)

We know that P(N|0) = 0 and by writing equation (1) for each transition:

P_2 - P_1 = \rho P_1

P_3 - P_2 = \rho (P_2 - P_1) = \rho^2 P_1

\cdots

P_i - P_{i-1} = \rho^{i-1} P_1

Adding all the above:

P_i - P_1 = (\rho + \rho^2 + \cdots + \rho^{i-1}) P_1

P_i = (1 + \rho + \rho^2 + \cdots + \rho^{i-1}) P_1

P_i = P_1 \frac{1 - \rho^i}{1 - \rho} \quad (2)

Writing the other boundary condition P(N|N) = 1 for equation (2) we get:

P_N = P_1 \frac{1 - \rho^N}{1 - \rho} = 1

P_1 = \frac{1 - \rho}{1 - \rho^N} \quad (3)

Substituting (3) into (2) we get the relation we were looking for:

P_i = \frac{1 - \rho^i}{1 - \rho^N}

Of course this result is not valid for ρ = 1 (the fraction in equation (2) becomes 0/0), but this case is easily handled by setting ρ = 1 in the sum that precedes equation (2), where every term equals 1 and we get:

P_i = iP_1

In this case the boundary condition becomes:

P_N = NP_1 = 1

And the result for ρ = 1 (p = q = 0.5) is:

P_i = \frac{i}{N}
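The derivation can be verified without any rounding by using exact fractions. The sketch below (function names are illustrative) checks that the closed form satisfies both boundary conditions and the relation P_i = qP_{i-1} + pP_{i+1} for every interior state.

```python
from fractions import Fraction

def closed_form(i, N, p):
    """P_i from the result above; p must be a Fraction."""
    q = 1 - p
    if p == q:                     # rho = 1
        return Fraction(i, N)
    rho = q / p
    return (1 - rho**i) / (1 - rho**N)

def satisfies_recurrence(N, p):
    """True if P_0 = 0, P_N = 1 and P_i = q*P_(i-1) + p*P_(i+1) in between."""
    q = 1 - p
    P = [closed_form(i, N, p) for i in range(N + 1)]
    boundary_ok = P[0] == 0 and P[N] == 1
    interior_ok = all(P[i] == q * P[i - 1] + p * P[i + 1] for i in range(1, N))
    return boundary_ok and interior_ok

print(satisfies_recurrence(50, Fraction(12, 25)))  # p = 0.48
print(satisfies_recurrence(50, Fraction(1, 2)))    # fair game
```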


Two dice can be tricky

2009-11-26

In 17th century French salons, one of the favourite pastimes was gambling. One of the essayists of those times, Chevalier de Méré, was playing the following game: roll a single die 4 times and bet on getting a 6. His reasoning was the following: if the probability of getting 6 for one roll is 1/6, then for 4 rolls it would be 4/6, so in the long run he would win more than he would lose.

When nobody wanted to play with him anymore because he was winning, he changed his game a little: roll 2 dice 24 times and bet on getting a double 6. In this case his reasoning was: the chance of getting a double 6 for one roll (of 2 dice) is 1/36, so by throwing 24 times the probability would be 24/36 = 4/6. He would still win (although it would take longer), and since the game was different he could convince people to play with him again. To his total surprise he started to lose in this game. Since he could not explain this, he asked one of his friends, Blaise Pascal, to help him with the mathematics.

So let’s see the correct way to estimate the probabilities in this (simple) case.

Game 1

First of all it is not correct to add probabilities in this case. If we have 2 events A and B then:

P(A \text{ or } B) = P(A) + P(B) - P(A \text { and } B)

P(A and B) is 0 only if the events are mutually exclusive, that is there is no basic outcome that is common for A and B. In our case if event A is: get 6 on the first roll and event B is: get 6 on the second roll, there is a basic outcome: get 6 on the first roll and get 6 on the second roll, which is common to both A and B, so that means we cannot ignore P(A and B) in the formula above. This reasoning obviously applies to the remaining die rolls.
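For two rolls the overlap is small enough to count by hand, or with a short enumeration of the 36 outcomes. This is a sketch; the helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two die rolls.
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

p_a = prob(lambda r: r[0] == 6)                    # 6 on the first roll
p_b = prob(lambda r: r[1] == 6)                    # 6 on the second roll
p_a_and_b = prob(lambda r: r[0] == 6 and r[1] == 6)
p_a_or_b = prob(lambda r: r[0] == 6 or r[1] == 6)

print(p_a + p_b)              # the naive sum
print(p_a + p_b - p_a_and_b)  # the formula above
print(p_a_or_b)               # direct count
```

The naive sum overshoots by exactly P(A and B), the double-six outcome that was counted twice.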

The correct way to calculate the probability is the following:

P(\text {getting at least one 6 in 4 die rolls}) =
\text{  }1 - P(\text{not getting 6 on any roll})

where

P(\text{not getting 6 on any die roll}) =
P(\text{(not getting 6 on the first roll) AND}
\text{(not getting 6 on the second roll) AND ... so on})

For 2 events:

P(A \text{ and } B) = P(A)P(B)

only if the 2 events are independent, that is, A happening has no influence over B happening. In our case that means that getting 6 on the first roll has no influence over getting 6 on the second roll, which is true.

So the correct probability for this game is:

P(win) = 1 - \left(\frac{5}{6}\right) \left(\frac{5}{6}\right) \left(\frac{5}{6}\right) \left(\frac{5}{6}\right)= 1 - \left(\frac{5}{6}\right)^4 = 0.51775

Thus, Chevalier de Méré had almost 52% chance of winning (instead of his estimate of 66.6%) so we can say he was lucky in his probability estimation.
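With only 6^4 = 1296 possible sequences of 4 rolls, the result can also be confirmed by brute force. A small sketch:

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of 4 die rolls, all equally likely.
outcomes = list(product(range(1, 7), repeat=4))

# The bet wins if at least one of the 4 rolls is a 6.
wins = sum(1 for rolls in outcomes if 6 in rolls)
p_win = Fraction(wins, len(outcomes))

print(wins, len(outcomes), float(p_win))
```

The count agrees with the formula: 1296 sequences in total, minus the 5^4 = 625 that contain no 6.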

The expected value is a useful measure of the winnings/losses in the long run. Assuming that each participant stakes 2 dollars per game (the winner gains 2 dollars, the loser loses 2 dollars), the expected value is:

E = 2 \cdot P(win) + (-2) \cdot P(lose) = 0.071

Game 2

As seen from the previous game, in this case the correct probability is:

P(win) = 1 - \left(\frac{35}{36}\right)^{24} = 0.4914

and the expected value:

E = 2 \cdot P(win) + (-2) \cdot P(lose) = -0.034
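Both games can be compared side by side with a couple of helper functions. This is only a sketch: the function names are made up, and the default `payoff=2` is an assumption that mirrors the +2 / -2 amounts used in the expected value formulas above.

```python
def p_at_least_one(p_single, tries):
    """Probability that an event with chance p_single happens at least
    once in `tries` independent attempts."""
    return 1 - (1 - p_single) ** tries

def expected_value(p_win, payoff=2):
    """Average gain per game: +payoff on a win, -payoff on a loss."""
    return payoff * p_win - payoff * (1 - p_win)

game1 = p_at_least_one(1 / 6, 4)     # at least one 6 in 4 rolls
game2 = p_at_least_one(1 / 36, 24)   # at least one double 6 in 24 rolls

for name, p in (("Game 1", game1), ("Game 2", game2)):
    print(f"{name}: P(win) = {p:.4f}, E = {expected_value(p):+.3f}")
```

The first game has a positive expected value and the second a negative one, even though de Méré’s additive reasoning gave 4/6 for both.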

As practice (presumably) indicated, Chevalier de Méré would lose in this second game. If there’s a lesson we can learn from this, it is: don’t estimate probabilities if you don’t know the rules, or you will get it wrong even for simple 2 dice games.