An Intuitive (and Short) Explanation of Bayes’ Theorem

Bayes’ theorem was the subject of a detailed article. The essay is good, but over 15,000 words long — here’s the condensed version for Bayesian newcomers like myself:

Bayes’ theorem converts the results from your test into the real probability of the event. For example, you can:

Anatomy of a Test

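The article describes a cancer testing scenario: 1% of people have cancer; the test (a mammogram) detects cancer 80% of the time when it is there, and so misses it 20% of the time; and the test reports cancer 9.6% of the time when it is not there, and so correctly returns a negative 90.4% of the time.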

Put in a table, the probabilities look like this:

[Figure: Bayes table]

How do we read it?

How Accurate Is The Test?

Now suppose you get a positive test result. What are the chances you have cancer? 80%? 99%? 1%?

Here’s how I think about it:

The table looks like this:

[Figure: Bayes table with computed probabilities]

And what was the question again? Oh yes: what’s the chance we really have cancer if we get a positive result? The chance of an event is the number of ways it could happen divided by the number of all possible outcomes:

\displaystyle{\text{Probability} = \frac{\text{desired event}}{\text{all possibilities}}}

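The chance of getting a true positive result is 0.008 (1% of people have cancer, and 80% of them test positive: 0.01 × 0.80). The chance of getting any type of positive result is the chance of a true positive plus the chance of a false positive, which is 0.09504 (99% don’t have cancer, and 9.6% of them test positive: 0.99 × 0.096). That gives 0.008 + 0.09504 = 0.10304.

So, our chance of cancer is 0.008/0.10304 = 0.0776, or about 7.8%.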
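
If you’d rather check the arithmetic than trust it, here is a minimal Python sketch of the same calculation. The variable names are my own; the three input figures (1%, 80%, 9.6%) are the ones quoted in this article.

```python
# Figures from the cancer-test scenario used in this article.
p_cancer = 0.01              # 1% of people have cancer
p_pos_given_cancer = 0.80    # the test catches cancer 80% of the time
p_pos_given_healthy = 0.096  # 9.6% false positive rate

true_positive = p_cancer * p_pos_given_cancer          # about 0.008
false_positive = (1 - p_cancer) * p_pos_given_healthy  # about 0.09504
any_positive = true_positive + false_positive          # about 0.10304

# Chance of cancer given a positive test: true positives over all positives.
p_cancer_given_pos = true_positive / any_positive
print(f"{p_cancer_given_pos:.4f}")  # 0.0776
```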

Interesting — a positive mammogram only means you have a 7.8% chance of cancer, rather than 80% (the supposed accuracy of the test). It might seem strange at first but it makes sense: the test gives a false positive 9.6% of the time (quite high), so there will be many false positives in a given population. For a rare disease, most of the positive test results will be wrong.

Let’s test our intuition by drawing a conclusion from simply eyeballing the table. If you take 100 people, only 1 person will have cancer (1%), and they’re most likely going to test positive (80% chance). Of the 99 remaining people, about 10% will test positive, so we’ll get roughly 10 false positives. Considering all the positive tests, just 1 in 11 is correct, so there’s a 1/11 chance of having cancer given a positive test. The real number is 7.8% (closer to 1/13, computed above), but we found a reasonable estimate without a calculator.
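
The head-count estimate can be sketched in a few lines of Python too. The function name `expected_counts` and its parameters are hypothetical, picked for this illustration; the default rates are the article’s. It shows where the gap between 1/11 and 7.8% comes from: we rounded 0.8 people up to 1 and 9.5 people up to 10.

```python
def expected_counts(population, p_cancer=0.01, p_detect=0.80, p_false_alarm=0.096):
    """Expected number of true and false positives in a group of people."""
    with_cancer = population * p_cancer
    without_cancer = population - with_cancer
    return with_cancer * p_detect, without_cancer * p_false_alarm

true_pos, false_pos = expected_counts(100)
print(f"true positives:  {true_pos:.1f}")   # 0.8
print(f"false positives: {false_pos:.1f}")  # 9.5
print(f"exact share:     {true_pos / (true_pos + false_pos):.3f}")  # 0.078
print(f"eyeball share:   {1 / (1 + 10):.3f}")                        # 0.091
```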

Bayes’ Theorem

We can turn the process above into an equation, which is Bayes’ Theorem. It lets you take the test results and correct for the “skew” introduced by false positives. You get the real chance of having the event. Here’s the equation:

[Figure: Bayes’ theorem, colorized equation]

And here’s the decoder key to read it:

Try it with any number:

It all comes down to the chance of a true positive divided by the chance of any positive. We can simplify the equation to:

\displaystyle{\Pr(\mathrm{H}|\mathrm{E}) = \frac{\Pr(\mathrm{E}|\mathrm{H})\Pr(\mathrm{H})}{\Pr(\mathrm{E})}}

Pr(E) is the normalizing constant: it tells us the chance of getting any positive result, whether a true positive in the cancer population (1%) or a false positive in the non-cancer population (99%). It acts like a weighting factor, adjusting the odds towards the more likely outcome.

Forgetting to account for false positives is what makes the low 7.8% chance of cancer (given a positive test) seem counter-intuitive. Thank you, normalizing constant, for setting us straight!
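
Here is one way the equation might look as a reusable Python function. Treat it as a sketch: the name `posterior` and its argument names are my own, with H standing for the hypothesis (having cancer) and E for the evidence (a positive test). The normalizing constant Pr(E) is built from the true positive and false positive branches, exactly as in the table.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Pr(H|E) via Bayes' theorem, with Pr(E) expanded over H and not-H."""
    # Pr(E): true positives plus false positives, each weighted by its population.
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    if p_evidence == 0:
        raise ValueError("the evidence has zero probability, nothing to update on")
    return p_evidence_given_h * prior / p_evidence

# Same test (80% detection, 9.6% false positives), different base rates.
for prior in (0.01, 0.10, 0.50):
    print(f"prior {prior:.2f} -> posterior {posterior(prior, 0.80, 0.096):.4f}")
# prior 0.01 -> posterior 0.0776
# prior 0.10 -> posterior 0.4808
# prior 0.50 -> posterior 0.8929
```

The same test becomes far more convincing as the condition gets more common, which is the normalizing constant doing its job.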

Intuitive Understanding: Shine The Light

The article mentions an intuitive understanding about shining a light through your real population and getting a test population. The analogy makes sense, but it takes a few thousand words to get there :).

Consider a real population. You run some tests, which “shine light” through that real population and create some test results. If the light is completely accurate, the test probabilities and real probabilities match up. Everyone who tests positive is actually “positive”. Everyone who tests negative is actually “negative”.

But this is the real world. Tests go wrong. Sometimes the people who have cancer don’t show up in the tests, and the other way around.

Bayes’ Theorem lets us look at the skewed test results and correct for errors, recreating the original population and finding the real chance of a true positive result.
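
To watch the “light” at work, we can simulate a population and run the imperfect test on everyone. This is a rough Monte Carlo sketch, not anything from the original article: the function `simulate`, the population size and the seed are all my own choices, and only the three rates come from the cancer scenario.

```python
import random

def simulate(n_people, p_cancer=0.01, p_detect=0.80, p_false_alarm=0.096, seed=0):
    """Test a random population; return (true positives, false positives)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    true_pos = false_pos = 0
    for _ in range(n_people):
        has_cancer = rng.random() < p_cancer
        # The chance of a positive result depends on the person's real state.
        p_positive = p_detect if has_cancer else p_false_alarm
        if rng.random() < p_positive:
            if has_cancer:
                true_pos += 1
            else:
                false_pos += 1
    return true_pos, false_pos

true_pos, false_pos = simulate(200_000)
share = true_pos / (true_pos + false_pos)
print(f"share of positive tests that are real: {share:.3f}")
```

With a population this large the share should land close to the 7.8% that the equation gives, though the exact digits depend on the random draws.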

Bayesian Spam Filtering

One clever application of Bayes’ Theorem is in spam filtering. We have an event (the message is spam) and a test (the message contains certain words).

Plugged into a more readable formula (from Wikipedia):

\displaystyle{\Pr(\mathrm{spam}|\mathrm{words}) = \frac{\Pr(\mathrm{words}|\mathrm{spam})\Pr(\mathrm{spam})}{\Pr(\mathrm{words})}}

Bayesian filtering allows us to predict the chance a message is really spam given the “test results” (the presence of certain words). Clearly, words like “viagra” have a higher chance of appearing in spam messages than in normal ones.

Spam filtering based on a blacklist is flawed: it’s too restrictive and produces too many false positives. Bayesian filtering gives us a middle ground by using probabilities. As we analyze the words in a message, we can compute the chance it is spam (rather than making a yes/no decision). If a message has a 99.9% chance of being spam, it probably is. As the filter gets trained with more and more messages, it updates the probabilities that certain words indicate spam. Advanced Bayesian filters can examine multiple words in a row, as another data point.
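
A toy version of such a filter might look like the Python sketch below. Several things here are assumptions on my part: the training messages are invented, the helper names are hypothetical, and the way words are combined is the “naive Bayes” shortcut (each word is treated as independent evidence), with add-one smoothing for words the filter has barely seen. Real filters are considerably more careful.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented for illustration only.
SPAM = ["cheap viagra now", "viagra discount offer", "win money now"]
HAM = ["meeting notes attached", "lunch offer tomorrow", "project money report"]

def word_counts(messages):
    """Count how many messages each word appears in."""
    counts = Counter()
    for message in messages:
        counts.update(set(message.split()))
    return counts

spam_counts, ham_counts = word_counts(SPAM), word_counts(HAM)

def p_word_given(counts, n_messages, word):
    # Add-one smoothing keeps every estimate strictly between 0 and 1.
    return (counts[word] + 1) / (n_messages + 2)

def p_spam_given_words(message):
    """Pr(spam | words), treating each word as independent evidence."""
    prior_spam = len(SPAM) / (len(SPAM) + len(HAM))
    log_spam = math.log(prior_spam)
    log_ham = math.log(1 - prior_spam)
    for word in set(message.split()):
        log_spam += math.log(p_word_given(spam_counts, len(SPAM), word))
        log_ham += math.log(p_word_given(ham_counts, len(HAM), word))
    # Pr(words) is the spam branch plus the non-spam branch.
    spam_score, ham_score = math.exp(log_spam), math.exp(log_ham)
    return spam_score / (spam_score + ham_score)

print(f"{p_spam_given_words('viagra offer'):.2f}")      # 0.75
print(f"{p_spam_given_words('meeting tomorrow'):.2f}")  # 0.20
```

“Training” the filter just means adding messages to the two lists and recounting, which shifts the per-word probabilities.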

Further Reading

There’s a lot being said about Bayes:
