Let’s set the scene. You’re driving alone at night along an empty road through the woods. It’s rumoured to be haunted, but you’re not particularly scared – you don’t believe in ghosts anyway. Then a spectral figure appears directly in front of you. You slam on the brakes, but as soon as you come to a stop the ghostly image has disappeared. Was it just your brain playing tricks on you? Or are the rumours true?
Even though you might still not say that ghosts are definitely real, you’d probably believe that the chance of ghosts existing is a little higher now than it was before. This is Bayes’ Theorem in action.
When you find new evidence that either supports or opposes your current view, you may update your beliefs using this new data. Mathematically, this can be represented as what we call a ‘conditional probability’.
Let’s take the familiar example of rolling a fair die: the probability of rolling a six is ⅙, as there are six numbers and all are equally likely to be rolled. The probability of rolling an even number is ½, as three of the six numbers (2, 4 and 6) are even. In mathematical notation, this is written as
P(6) = 1/6,
P(even number) = 1/2.
So far so familiar, so let’s make things a little trickier. Suppose we want to work out the conditional probability of rolling a six, given that the number rolled is even. This simply asks: what is the probability of rolling a six when we already know that we have rolled an even number? There are three possible even numbers (2, 4 and 6), one of which is 6. As all of them are equally likely, there is a ⅓ chance that the even number rolled is a 6.
On the other hand, if we reverse the statement, then the conditional probability of rolling an even number given we know the number rolled is a 6 must be 1, since 6 is an even number.
These two conditional probabilities are linked by the general formula for Bayes’ Theorem:
P(A|B) = P(B|A) × P(A) / P(B).
This might look like very strange notation, but in probability the ‘|’ symbol just means ‘given’, so P(A|B) is the probability of A happening given that B has happened. So overall, this equation states that the probability of A given B is equal to the probability of B given A, times the probability of A, all divided by the probability of B.
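That’s quite a mouthful, so let’s again use our die as an example. Let A be ‘rolling a six’ and B be ‘rolling an even number’. Then
P(6|even) = P(even|6) × P(6) / P(even) = 1 × (1/6) / (1/2) = 1/3,
which matches the answer we found by counting.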
Not so bad after all!
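If you like to check things by computer, here is a short Python sketch that verifies the die calculation two ways: by counting outcomes directly, and by plugging the probabilities into Bayes’ Theorem. It uses Python’s built-in `fractions` module so the answers come out as exact fractions; the helper names `prob` and `conditional` are just illustrative.

```python
from fractions import Fraction

# The six equally likely outcomes of a fair die
OUTCOMES = range(1, 7)

def prob(event):
    """P(event) for a fair die, where event is a test applied to each outcome."""
    hits = sum(1 for n in OUTCOMES if event(n))
    return Fraction(hits, len(OUTCOMES))

def conditional(event, given):
    """P(event | given), by counting only the outcomes where `given` holds."""
    pool = [n for n in OUTCOMES if given(n)]
    return Fraction(sum(1 for n in pool if event(n)), len(pool))

def is_six(n):
    return n == 6

def is_even(n):
    return n % 2 == 0

# Direct count: of the even outcomes (2, 4, 6), how many are a six?
direct = conditional(is_six, is_even)

# Bayes' Theorem: P(6|even) = P(even|6) * P(6) / P(even)
bayes = conditional(is_even, is_six) * prob(is_six) / prob(is_even)

print(direct, bayes)  # 1/3 1/3
```

Both routes give the same answer, which is exactly what the theorem promises.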
The die example hopefully makes sense, but Bayes’ Theorem can often be counterintuitive. Consider the following:
“Steve has been described by a neighbour as shy, withdrawn and invariably helpful but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.”
Is Steve more likely to be a librarian or a farmer?
The answer is, perhaps surprisingly, a farmer. At the time of the study, there were 20 times as many farmers in the US as librarians. Even if you thought 90 out of every 100 librarians fit the description of Steve, and only 10 in every 100 farmers fit it, it is still more likely that Steve is a farmer. Let’s break it down:
There are 20 farmers for every 1 librarian, so a sample of 210 farmers and librarians would contain 10 librarians and 200 farmers. Nine of the 10 librarians would fit Steve’s description, as would 20 of the 200 farmers. That makes 29 people who fit the description, 20 of whom are farmers, so the probability that Steve is a farmer is 20/29, or about 69%. It is more likely that Steve is a farmer, despite only 10% of farmers matching his description.
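The same reasoning can be written as Bayes’ Theorem in a few lines of Python. The figures are the illustrative ones from the text (20 farmers per librarian, 90% of librarians and 10% of farmers fitting the description), not real survey data.

```python
from fractions import Fraction

# Illustrative figures from the text, not real data
farmers_per_librarian = 20
p_fit_given_librarian = Fraction(90, 100)
p_fit_given_farmer = Fraction(10, 100)

# Prior probabilities, from the 20:1 ratio of farmers to librarians
p_librarian = Fraction(1, farmers_per_librarian + 1)
p_farmer = 1 - p_librarian

# P(fits the description), adding up both jobs
p_fit = p_fit_given_librarian * p_librarian + p_fit_given_farmer * p_farmer

# Bayes' Theorem: P(job | fits) = P(fits | job) * P(job) / P(fits)
p_farmer_given_fit = p_fit_given_farmer * p_farmer / p_fit
p_librarian_given_fit = p_fit_given_librarian * p_librarian / p_fit

print(p_farmer_given_fit, p_librarian_given_fit)  # 20/29 9/29
```

The 29 in the denominator is the same 29 people from the sample of 210: the formula is just a compact way of doing that counting.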
This is not about how accurate our stereotypes of farmers and librarians are. Instead, it shows that many people don’t even think to ask how many farmers there are compared with librarians, which, as you can see, changes the result dramatically. In other words, the information you take into account is key to your decision making.
As I’m assuming you don’t normally spend your time guessing the jobs of people named Steve, let’s look at some more practical applications of Bayes’ Theorem.
When insurers set your insurance policy and premium, they consider the conditional probability of an accident happening given your personal risk factors. This is why they ask for your age and your job, amongst other things: these can increase or lower your risk. Young drivers are most likely to have an accident in their first year of driving, so if you’re a new driver, your insurance may cost more than your car does!
As if dealing with your insurance company after an incident isn’t bad enough, your premiums usually rise afterwards, because you are now judged more likely to have another accident. Similarly, if your house has been flooded, your home insurance will become more expensive, because the probability of your house flooding, given that it has flooded before, is higher than for a house that has never flooded.
You can thank Bayes for that…
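One way to see why a past claim pushes premiums up is to imagine that every driver is either ‘higher-risk’ or ‘lower-risk’, and let Bayes’ Theorem update that label after an accident. The sketch below uses made-up numbers purely for illustration, and is not how any particular insurer prices policies: 20% of drivers are assumed to be higher-risk with a 30% yearly accident chance, and the rest are assumed to have a 5% chance.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only to illustrate the idea
p_high = Fraction(20, 100)            # share of drivers who are higher-risk
p_acc_given_high = Fraction(30, 100)  # yearly accident chance, higher-risk
p_acc_given_low = Fraction(5, 100)    # yearly accident chance, lower-risk

def accident_chance(p_high_risk):
    """Chance of an accident in a year, averaged over both kinds of driver."""
    return p_acc_given_high * p_high_risk + p_acc_given_low * (1 - p_high_risk)

# Before any claims: what the insurer expects for a random driver
before = accident_chance(p_high)

# After one accident: P(higher-risk | accident) by Bayes' Theorem
p_high_after = p_acc_given_high * p_high / accident_chance(p_high)

# The updated belief feeds into next year's accident chance
after = accident_chance(p_high_after)

print(before, p_high_after, after)  # 1/10 3/5 1/5
```

With these numbers a single accident doubles the expected chance of another one, from 10% to 20%, even though the only thing that has changed is what the insurer knows about you. The flooded house, and the building that keeps getting hit by lightning, follow the same pattern.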
You may have heard the phrase “lightning never strikes the same place twice”. Clearly, this is rubbish – the Empire State Building is struck by lightning between 25 and 100 times per year! If a tall, conducting object like the Empire State Building is struck, it’s very likely to be hit again. This, too, can be understood with Bayes’ Theorem: being struck once is evidence that the object is a good target, which raises the probability that it will be struck again.
Another very common use of Bayes’ Theorem is when filtering your email inbox. Most email accounts will filter out spam or junk email automatically, but how do they decide whether an email is spam?
A simple filter without Bayes’ Theorem would mark any email containing a specific set of words as spam. For example, every email containing the words ‘Nigerian Prince’ and ‘bank details’ might be automatically rejected. Unfortunately, if you are African royalty, some of your genuine emails will also be rejected, because they contain the spam words the filter checks for.
Let’s pretend you’re an email spam filter and you receive the following emails. Which ones are spam, and which ones are real? (Answers at the bottom of the page)
Without realising it, you were probably just using Bayes’ Theorem.
The email filter (you) calculates the probability of each email being spam given the words it contains. A single ‘spam’ word on its own will not lead to the email being rejected, but if there are several spam words, the probability of the email being spam becomes very high, and the filter will mark it as such. To lose as few of your important emails as possible, filters often require the probability of an email being spam to be above 0.99 (so the filter is at least 99% certain that it really is spam).
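A common textbook way to combine the evidence from several words is ‘naive Bayes’, which treats each word as independent evidence. The sketch below uses a made-up table of per-word spam probabilities (`WORD_SPAM_PROB` is hypothetical) and, for simplicity, assumes an email is equally likely to be spam or not before any words are read; real filters differ in the details. It then applies a 0.99 threshold as described in the text.

```python
import math

# Hypothetical P(spam | word appears); words not in the table count as neutral
WORD_SPAM_PROB = {
    "prince": 0.95,
    "bank": 0.80,
    "details": 0.70,
    "urgent": 0.90,
    "meeting": 0.10,
    "lunch": 0.05,
}
NEUTRAL = 0.5
THRESHOLD = 0.99

def spam_probability(text):
    """Naive Bayes estimate, assuming independent words and a 50/50 prior."""
    words = set(text.lower().split())
    # Add up log-odds instead of multiplying probabilities, to avoid underflow
    log_odds = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word, NEUTRAL)
        log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))

def is_spam(text):
    return spam_probability(text) > THRESHOLD

print(round(spam_probability("dinner with the prince"), 2))  # 0.95
print(is_spam("dinner with the prince"))  # False
print(is_spam("urgent the prince needs your bank details"))  # True
```

One suspicious word only gets the email to about 95%, below the threshold, but four of them together push it past 99.9%. Working with log-odds gives the same result as multiplying the probabilities directly, but keeps the numbers manageable for long emails.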
However, this means that if a spam email contains a lot of ‘normal’ text, it may not be filtered out. The filter can be improved when you manually mark an email as spam or not spam. It works out which words are common in your spam emails and which are not, and so can adjust how it calculates the probability of an email being spam. So don’t worry, your holiday romance with a Nigerian prince is safe from interference!
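Here is a minimal sketch of how a filter might learn word probabilities from the emails you mark. The `SpamLearner` class is hypothetical; it assumes equal prior odds of spam and not spam, and adds one to every count (‘add-one smoothing’) so that no word ever reaches a probability of exactly 0 or 1.

```python
from collections import Counter

class SpamLearner:
    """Hypothetical sketch: learn P(spam | word) from emails the user has marked."""

    def __init__(self):
        self.spam_counts = Counter()  # spam emails containing each word
        self.ham_counts = Counter()   # non-spam emails containing each word
        self.n_spam = 0
        self.n_ham = 0

    def mark(self, text, is_spam):
        words = set(text.lower().split())
        if is_spam:
            self.spam_counts.update(words)
            self.n_spam += 1
        else:
            self.ham_counts.update(words)
            self.n_ham += 1

    def word_spam_prob(self, word):
        """Bayes' Theorem for one word, with a 50/50 prior and add-one smoothing."""
        word = word.lower()
        p_word_given_spam = (self.spam_counts[word] + 1) / (self.n_spam + 2)
        p_word_given_ham = (self.ham_counts[word] + 1) / (self.n_ham + 2)
        return p_word_given_spam / (p_word_given_spam + p_word_given_ham)

learner = SpamLearner()
learner.mark("claim your prize now", is_spam=True)
learner.mark("your prize draw results", is_spam=True)
learner.mark("lunch meeting moved", is_spam=False)
learner.mark("see you at lunch", is_spam=False)
print(round(learner.word_spam_prob("prize"), 2))  # 0.75
print(round(learner.word_spam_prob("lunch"), 2))  # 0.25

# Marking a genuine email that mentions a prize lowers that word's spam score
learner.mark("the prize for best employee", is_spam=False)
print(round(learner.word_spam_prob("prize"), 2))  # 0.65
```

Each time you mark an email, the counts change and so do the word probabilities that feed into the overall spam calculation.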
A final example (and one of my favourites) of Bayes’ Theorem is from the Second World War.
During the Second World War, the Nazis used the Enigma machine to encode their messages. Each letter in the message was replaced by a different letter in the coded message, and because the machine’s rotors stepped forward with every key press, pressing the same letter again would usually produce a different coded letter. For example, GOOD MORNING could become LHJE BTSXMOK. One crucial flaw was that a letter would never be encoded as itself: T could become any letter of the alphabet except T, and so on.
This created a coded message that was incredibly difficult to crack without an Enigma machine and the right settings. To make things even harder, the settings were changed every day, so at the stroke of midnight all the previous work was worthless (a sort of Cinderella for mathematicians). For every coded message there was an astronomically large number of possible settings, and testing them all would take far longer than a day, by which time the code would have changed. The British and their allies needed a quick and reliable method of cracking the code to decipher these messages.
An A-Team of code breakers was gathered together at the now-famous Bletchley Park. People from all over the country were called up in total secrecy, including mathematicians, linguists, and even winners of a nationwide cryptic crossword competition! So those puzzle games on your phone might actually be making you a better code breaker…
This team, which included Alan Turing, realised that only the most likely translations needed to be tested. For example, every morning at 6am a weather report was broadcast, and Wetterbericht is the German word for weather report. The code breakers could place the word WETTERBERICHT underneath the coded message; if any letter sat directly below the same letter in the code, they knew they had matched the word to the code incorrectly, since Enigma never encoded a letter as itself. Let’s consider the following example:
Here you can see that only the second alignment can be correct, as each of the others has at least one letter that matches the code directly above it.
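The check the code breakers applied can be sketched in Python. The function name `possible_positions` and the ciphertext below are made up for illustration (the ciphertext is not a real intercept), and the code only captures the ‘no letter encodes to itself’ test, not the much harder job of recovering the machine’s settings.

```python
CRIB = "WETTERBERICHT"

def possible_positions(ciphertext, crib=CRIB):
    """Offsets where the crib could line up with the ciphertext.

    Relies on Enigma's flaw: a letter never encodes to itself, so any
    offset where a crib letter sits under the same ciphertext letter
    is ruled out.
    """
    ciphertext = ciphertext.upper()
    crib = crib.upper()
    positions = []
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        if all(c != p for c, p in zip(window, crib)):
            positions.append(start)
    return positions

ciphertext = "WQPLMNSAOKXZDVT"  # hypothetical ciphertext, made up for this example
allowed = possible_positions(ciphertext)
for start in range(len(ciphertext) - len(CRIB) + 1):
    print(ciphertext)
    print(" " * start + CRIB, "possible" if start in allowed else "ruled out")
print(allowed)  # [1]
```

Here the first alignment fails because W sits under W, and the third fails because the final T sits under T, leaving only the middle one.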
This guessed part of the message (known as a crib) could then be used to work out the rest of the message. The probability of the guess being correct was much higher because the code breakers only used phrases that were very likely to appear in the message in the first place. Cribs like this were used to deduce the Enigma settings for the day, allowing Nazi messages to be decoded quickly and reliably. Who could have known that such a small equation could save so many lives?
The work at Bletchley Park was so vital to the war effort that it was kept secret until the 1970s, and even then the full story wasn’t known until the 1990s. Two papers written by Alan Turing at the time were only released in 2012, 67 years after the Second World War ended.
Understanding Bayes’ Theorem was crucial to the eventual defeat of the Nazis almost 75 years ago, but there are still many cases today where conditional probabilities applied incorrectly may have life-threatening consequences. As we shall see in the rest of this series of articles, this seemingly simple equation could be the difference between life in prison and walking free, and can help to explain why accurate tests for diseases are so vital to stop outbreaks.
Answers: All 3 emails are spam, but 1 and 3 are particularly dangerous phishing emails – emails that try to get you to give up personal data and passwords. How could you specifically filter out phishing emails from the spam?
Article 2: Bayes’ Theorem and Disease Testing
Article 3: The Prosecutor’s Fallacy