Conditional probability

Conditional probability is the probability of an event A, given that some other event B has occurred. There is no implicit causal or temporal relationship between the two events. In most practical examples there is such a relationship, but it is not necessary for the mathematical relations described here.

Example (source): A car is produced in 2 models: Model 1 and Model 2. Each model is produced in 2 designs: hatchback and wagon. Of all the cars produced, 30% are Model 1 and 20% are wagons. But 50% of Model 1 cars are wagons.

Let’s see this example in a Venn diagram.

I know Venn diagrams are usually made with circles, but I drew it with rectangles since it’s easier to represent the areas exactly. This diagram illustrates the basic rules of conditional probability:

P(A | B) = \frac{P(A \cap B)}{P(B)}, \text{ where } P(B) \neq 0 \quad (1)


P(B | A) = \frac{P(A \cap B)}{P(A)}, \text{ where } P(A) \neq 0 \quad (2)
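As a quick check of equations (1) and (2), here is a minimal Python sketch that rebuilds the full joint table of the car example from the three percentages given, using exact fractions. The names joint, marginal and conditional are hypothetical helpers for this illustration, not part of any library.

```python
from fractions import Fraction

# Figures from the car example, as exact fractions.
p_model1 = Fraction(30, 100)              # P(Model 1)
p_wagon = Fraction(20, 100)               # P(wagon)
p_wagon_given_model1 = Fraction(50, 100)  # P(wagon | Model 1)

# Rearranging equation (2): P(Model 1 ∩ wagon) = P(wagon | Model 1) * P(Model 1)
p_model1_and_wagon = p_wagon_given_model1 * p_model1

# Hypothetical joint table; the other three cells follow by subtraction.
joint = {
    ("Model 1", "wagon"): p_model1_and_wagon,
    ("Model 1", "hatchback"): p_model1 - p_model1_and_wagon,
    ("Model 2", "wagon"): p_wagon - p_model1_and_wagon,
    ("Model 2", "hatchback"): 1 - p_model1 - p_wagon + p_model1_and_wagon,
}

def marginal(label):
    """Total probability of a model or a design, summed over the joint table."""
    return sum(p for key, p in joint.items() if label in key)

def conditional(a, b):
    """Equation (1): P(a | b) = P(a ∩ b) / P(b), defined only when P(b) != 0."""
    p_b = marginal(b)
    if p_b == 0:
        raise ValueError("P(b) must be nonzero")
    p_a_and_b = sum(p for key, p in joint.items() if a in key and b in key)
    return p_a_and_b / p_b

for key, p in joint.items():
    print(key, p)
print("P(Model 1 | wagon) =", conditional("Model 1", "wagon"))  # 3/4
print("P(wagon | Model 2) =", conditional("wagon", "Model 2"))  # 1/14
```

So three quarters of all wagons are Model 1, while only 1 in 14 Model 2 cars is a wagon: the two conditional probabilities answer very different questions even though they share the same intersection.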

Independent and mutually exclusive events

The event A is said to be independent of B iff:

P(A | B) = P(A)

in other words, if whether B occurs or not has no effect on the probability of A. Now, are two mutually exclusive events (events that have no basic outcomes in common) also independent? No: two mutually exclusive events with nonzero probabilities are actually dependent on each other, since the occurrence of one of them means the other cannot happen.
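A small sketch makes the point concrete. It assumes a hypothetical example that is not in the original text: one roll of a fair six-sided die, with A = "roll is 1 or 2" and B = "roll is 3 or 4". The helpers prob and cond are illustrative names; cond is simply equation (1).

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

def cond(a, b):
    """Equation (1): P(a | b) = P(a ∩ b) / P(b), defined only when P(b) != 0."""
    if prob(b) == 0:
        raise ValueError("P(b) must be nonzero")
    return prob(a & b) / prob(b)

A = {1, 2}  # "roll is 1 or 2"
B = {3, 4}  # "roll is 3 or 4": no outcomes in common with A

print(A & B)       # set() -> A and B are mutually exclusive
print(prob(A))     # 1/3
print(cond(A, B))  # 0 -> once B has happened, A is ruled out
```

Since P(A | B) = 0 while P(A) = 1/3, the definition of independence fails.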

If events A and B are independent, it follows that A and B’ (the complement of B), A’ (the complement of A) and B, and A’ and B’ are also independent. A Venn diagram of 2 mutually exclusive events A and B shows no intersection between them, while the diagram of 2 independent events A and B satisfies the relation P(A | B) = P(A) = P(A | B'). According to equation (1), this means that the ratio of A \cap B to B equals the ratio of A \cap B' to B', and both equal the proportion of A in the whole sample space.
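The relation P(A | B) = P(A) = P(A | B') and the complement rule can be checked on a hypothetical example, again one roll of a fair die, with A = "roll is even" and B = "roll is 1 or 2". This is a minimal sketch. The helper names are illustrative, and the test for independence is exactly the definition given above.

```python
from fractions import Fraction
from itertools import product

omega = frozenset(range(1, 7))  # hypothetical example: one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

def cond(a, b):
    """Equation (1): P(a | b) = P(a ∩ b) / P(b)."""
    return prob(a & b) / prob(b)

def complement(event):
    return omega - event

def independent(a, b):
    """A is independent of B iff P(A | B) = P(A)."""
    return cond(a, b) == prob(a)

A = frozenset({2, 4, 6})  # "roll is even", P(A) = 1/2
B = frozenset({1, 2})     # "roll is 1 or 2", P(B) = 1/3

# P(A | B) = P(A) = P(A | B')
print(cond(A, B), prob(A), cond(A, complement(B)))  # 1/2 1/2 1/2

# Independence carries over to every pairing of an event with a complement.
for x, y in product([A, complement(A)], [B, complement(B)]):
    print(independent(x, y))  # True, four times
```

Applying the same test to the car example shows that Model 1 and wagon are not independent: P(wagon | Model 1) = 50%, while P(wagon) = 20%.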

Bayes’ theorem

If we equate the two expressions for P(A \cap B) obtained from equations (1) and (2) above, we get Bayes’ theorem:

P(B | A) = \frac{P(A | B) \cdot P(B)}{P(A)}
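In code, Bayes’ theorem is a one-line function. The sketch below (the function name bayes is just an illustration) applies it to the car example with A = "wagon" and B = "Model 1", using only the three percentages from the problem statement.

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a):
    """Bayes' theorem: P(B | A) = P(A | B) * P(B) / P(A), with P(A) != 0."""
    if p_a == 0:
        raise ValueError("P(A) must be nonzero")
    return p_a_given_b * p_b / p_a

# Car example, reading A = "wagon" and B = "Model 1":
# P(wagon | Model 1) = 50%, P(Model 1) = 30%, P(wagon) = 20%.
p = bayes(Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
print(p)  # 3/4, i.e. P(Model 1 | wagon)
```

Exact fractions are used only so that the result prints cleanly; plain floats work the same way up to rounding.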

Confusion of the inverse

Confusion of the inverse is the assumption that P(A|B) is the same as P(B|A). The classic example is assuming that testing positive for a particular disease means you have that disease. Let’s try to put some numbers on this example. Usually we know the following:

  • P(ill) – the probability that a randomly chosen person from a specific population has the disease (its prevalence). This is usually determined through statistics, by recording the number of known cases of the disease. You can probably find this number from your favorite statistics agency.
  • P(positive|not ill) – the false positive rate of the test (type I error). This is the probability that the test will be positive for a person who does not have the disease. This value is usually denoted by \alpha . Conversely, P(negative|not ill) = 1 - \alpha .
  • P(negative|ill) – the false negative rate of the test (type II error). This is the probability that the test will be negative for a person who really has the disease. This value is usually denoted by \beta . Conversely, P(positive|ill) = 1 - \beta .

Let’s apply Bayes’ theorem now:

P(\text{ill} | \text{positive}) = \frac{P(\text{positive} | \text{ill}) \cdot P(\text{ill})}{P(\text{positive})}

The people who test positive can be split into those who are ill and those who are not. Since (being ill and testing positive) is mutually exclusive with (not being ill and testing positive), and together the two cover every positive result, we can write:

P(\text{ill} | \text{positive}) = \frac{P(\text{positive} | \text{ill}) \cdot P(\text{ill})}{P(\text{ill } \cap \text{ positive}) + P(\text{not ill } \cap \text{ positive})}

and then, applying equation (2) to each term in the denominator, we get:

P(\text{ill} | \text{positive}) = \frac{P(\text{positive} | \text{ill}) \cdot P(\text{ill})}{P(\text{positive} | \text{ill}) \cdot P(\text{ill}) + P(\text{positive} | \text{not ill}) \cdot P(\text{not ill})}

P(\text{ill} | \text{positive}) = \frac{(1 - \beta)P(\text{ill})}{(1 - \beta)P(\text{ill}) + \alpha(1 - P(\text{ill}))} \quad (3)
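Equation (3) translates directly into a short function. The numbers below are hypothetical and not taken from any real test: 1% prevalence, with \alpha = \beta = 5%. Exact fractions keep the arithmetic easy to verify by hand.

```python
from fractions import Fraction

def p_ill_given_positive(p_ill, alpha, beta):
    """Equation (3): probability of being ill after a positive test.

    p_ill -- P(ill), the prevalence in the population
    alpha -- P(positive | not ill), the false positive rate
    beta  -- P(negative | ill), the false negative rate
    """
    true_pos = (1 - beta) * p_ill    # P(positive | ill) * P(ill)
    false_pos = alpha * (1 - p_ill)  # P(positive | not ill) * P(not ill)
    return true_pos / (true_pos + false_pos)

# Hypothetical numbers: 1% prevalence, 5% false positive and false negative rates.
p_ill, alpha, beta = Fraction(1, 100), Fraction(5, 100), Fraction(5, 100)
posterior = p_ill_given_positive(p_ill, alpha, beta)
print(posterior)                    # 19/118
print(round(float(posterior), 3))   # 0.161
print(1 - beta)                     # 19/20, i.e. P(positive | ill)
```

With these assumed rates, a positive result raises the probability of being ill from 1% to about 16%, far below the 95% you would get by confusing P(ill | positive) with P(positive | ill).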

which is obviously different from P(positive|ill) = 1 - \beta .

How different? Well, for any reasonably good test we should expect \alpha and \beta to be really small. So below you can see 2 graphs (3D because we have 2 independent variables, \alpha < 0.1 and \beta < 0.1 ) for 2 different values of P(ill).

While P(positive|ill) depends only on the parameters of the test, P(ill|positive) depends on the parameters of the test and also on P(ill). The difference between the two is larger when P(ill) is smaller: if the disease is rare, a single positive test is less likely to mean that you are actually sick (unless the test has amazingly low false positive and false negative rates).
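The effect of P(ill) can also be seen without the graphs by holding the test fixed and varying only the prevalence. This sketch assumes a hypothetical test with \alpha = \beta = 0.01 and a few illustrative prevalence values, and reuses equation (3).

```python
def p_ill_given_positive(p_ill, alpha, beta):
    """Equation (3): P(ill | positive) from prevalence and error rates."""
    true_pos = (1 - beta) * p_ill
    false_pos = alpha * (1 - p_ill)
    return true_pos / (true_pos + false_pos)

# Hypothetical test with alpha = beta = 0.01; only the prevalence changes.
alpha = beta = 0.01
for p_ill in (0.1, 0.01, 0.001, 0.0001):
    post = p_ill_given_positive(p_ill, alpha, beta)
    print(f"P(ill) = {p_ill:<7} P(ill|positive) = {post:.3f}  "
          f"P(positive|ill) = {1 - beta:.2f}")
```

P(positive|ill) stays at 0.99 on every row. P(ill|positive), by contrast, falls from about 0.92 at P(ill) = 0.1 to exactly 0.5 at P(ill) = 0.01, and to roughly 0.01 at P(ill) = 0.0001.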

Conditional probabilities are easy to guesstimate and difficult to calculate. And remember that in this case a guesstimate is usually more of a wild guess, based on your emotions at that particular moment, than an estimate. When dealing with conditional probabilities it is better to think, calculate them, calculate them again and then look for an independent confirmation, just to be sure that you’ve done it right.
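One way to get that independent confirmation for equation (3) is a quick Monte Carlo simulation. The sketch below draws hypothetical people one at a time, assigns illness and test results using assumed rates (P(ill) = 0.01, \alpha = \beta = 0.05), and counts how many of those who test positive are actually ill. The sample size and seed are arbitrary choices.

```python
import random

def simulate(p_ill, alpha, beta, n=200_000, seed=1):
    """Estimate P(ill | positive) by simulating n hypothetical people."""
    rng = random.Random(seed)
    positives = ill_and_positive = 0
    for _ in range(n):
        ill = rng.random() < p_ill
        if ill:
            positive = rng.random() >= beta  # missed with probability beta
        else:
            positive = rng.random() < alpha  # false alarm with probability alpha
        if positive:
            positives += 1
            if ill:
                ill_and_positive += 1
    return ill_and_positive / positives

def formula(p_ill, alpha, beta):
    """Equation (3), for comparison."""
    true_pos = (1 - beta) * p_ill
    return true_pos / (true_pos + alpha * (1 - p_ill))

p_ill, alpha, beta = 0.01, 0.05, 0.05  # hypothetical values
print(f"simulated:    {simulate(p_ill, alpha, beta):.3f}")
print(f"equation (3): {formula(p_ill, alpha, beta):.3f}")
```

The simulated value will not match exactly, because it is a random estimate. It should, however, land close to the formula, and a large gap would be a sign that something in the calculation or the simulation is wrong.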

