The Normal Distribution and the Central Limit Theorem

Ruby Nixson

What do the length of a carrot, the height of a human female, and the number of times you check your phone per day have in common? Anyone would be forgiven for thinking these three things are completely unrelated, as they seem like pretty random measurements, but they are all connected by something called the ‘normal distribution’. So, why is it that these seemingly unrelated quantities can be modelled using the same distribution? The answer lies in one of the most useful theorems in probability theory, the Central Limit Theorem.

But first, let’s delve a little deeper into the normal distribution…

To fully define a normal distribution for a given set of data (e.g. the length of a carrot), we need to know two things: its mean and variance. The mean is the ‘average’ value of the data, and on a plot of the distribution, it corresponds to the value on the x-axis at the peak. The variance is a measure of how far the data is spread out from the mean – so a distribution with a higher variance will appear to have a wider, flatter peak when seen on a graph, and a distribution with a small variance will have a narrow, taller peak. A more familiar name for a normal distribution curve is a bell curve, which you might remember from school, but don’t worry if not as some examples are given below to remind you.

Here:

  • Red: mean = 0, variance = 1
  • Green: mean = 0, variance = 3
  • Purple: mean = 1, variance = 2

The green curve has the largest variance, so we see it is the widest and flattest of the curves. The green and red both have mean = 0, so their peaks correspond to zero on the x-axis, whereas the purple curve has a mean of 1, so its peak is at 1 on the x-axis.

The standard normal distribution has a mean of 0 and a variance of 1, and is the red plot above. By suitable operations (subtracting the mean and dividing by the square root of the variance), we can in fact write any other normal distribution in terms of the standard one. This is great news for mathematicians as the standard normal is often much easier to work with, and everything we know about it can be extended to any other normal distribution using the appropriate scaling. This is a classic maths trick – turn something you don’t know about into something you do, and work backwards to get what you want! 
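The scaling trick can be sketched in a few lines of Python. This is only an illustration: the function name `standardise` and the test value `x = 2.5` are my own choices, and the purple curve (mean 1, variance 2) is borrowed from the example above. It uses `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist


def standardise(x, mean, variance):
    """Map a value from a normal distribution with the given mean and
    variance onto the standard normal scale (mean 0, variance 1)."""
    return (x - mean) / variance ** 0.5


# Purple curve from the example above: mean = 1, variance = 2.
purple = NormalDist(mu=1, sigma=2 ** 0.5)
standard = NormalDist(mu=0, sigma=1)

x = 2.5  # arbitrary illustrative value, not taken from the article
z = standardise(x, mean=1, variance=2)

# The chance of being below x on the purple curve equals the chance of
# being below z on the standard normal curve.
p_direct = purple.cdf(x)
p_via_standard = standard.cdf(z)
print(z, p_direct, p_via_standard)
```

The two probabilities agree, which is why anything we can compute for the standard normal carries over to any other normal distribution.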

The heights of male and female humans are often both modelled as normally distributed. The figures below show the shape for female heights and male heights (both measured in inches).

The right-hand figure gives the plot for female heights, and has a mean of 64 inches, and the left-hand figure gives the plot for male heights, and has a mean of 70 inches. Both distributions have the same variance, so the curves are the same shape. It’s important to remember that these properties are specific to the data used to make these figures – if a different group of people were used instead, the mean and variance might differ from those seen here.

The normal distribution is especially useful in probability and statistics because it is used to define several other distributions. For example, if a random variable X has a standard normal distribution, then X² is said to have what is called a ‘chi-squared’ distribution (with one degree of freedom). We also define the ‘t-distribution’, which is often used by economists, by taking a standard normal variable and dividing it by the square root of an independent chi-squared variable that has itself been divided by its number of degrees of freedom.

Normally distributed random variables behave pretty well under independence too. Two random variables are independent if the outcome of one has no effect on the outcome of the other. For example, think about rolling a die twice – the value of the first roll has no effect on the value of the second, so the rolls are independent (the concept of independence is also explored in the Gambler’s Ruin article). When two normally distributed random variables are independent, their sum is also normally distributed, with a mean equal to the sum of the two means and a variance equal to the sum of the two variances. This isn’t true for many distributions, making the normal distribution an ideal distribution to be working with.
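Here is a small sketch of that property, assuming the red and purple curves from earlier (mean 0, variance 1 and mean 1, variance 2) describe two independent variables. Python's `NormalDist` treats `+` between two instances as the sum of independent normal variables, and the simulation underneath is just a sanity check with arbitrarily chosen seeds and sample size.

```python
from statistics import NormalDist, fmean

red = NormalDist(mu=0, sigma=1)            # mean 0, variance 1
purple = NormalDist(mu=1, sigma=2 ** 0.5)  # mean 1, variance 2

# "+" on two NormalDist objects models the sum of independent variables:
# the means add and the variances add.
total = red + purple
print(total.mean, total.variance)

# Sanity check by simulation (seeds and sample size are arbitrary choices).
n = 100_000
xs = red.samples(n, seed=1)
ys = purple.samples(n, seed=2)
sums = [x + y for x, y in zip(xs, ys)]

sim_mean = fmean(sums)
sim_var = fmean([(s - sim_mean) ** 2 for s in sums])
print(sim_mean, sim_var)
```

The simulated mean and variance should land close to the theoretical values of 1 and 3.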

So, in summary, we’ve got this well-behaved, mathematically useful, easy-to-use distribution, so surely only a small selection of things have this distribution, right? Once again the answer lies in the Central Limit Theorem…

The Central Limit Theorem (CLT) allows mathematicians to relate other distributions to the normal distribution, provided certain conditions hold. This means all of the nice properties we talked about above can be used. To use the theorem, we first need a sequence of random variables. A random variable is a quantity whose value we can measure, like height, so we might take ‘height of person 1’, ‘height of person 2’, and so on, to be the first and second variables in the sequence. We also need the variables to all be independent of each other (like with the dice example above). Finally, we need all of the variables to follow the same distribution, so that in particular they have the same (finite) mean and variance. If these conditions hold, then when we add up all the variables, we will get a value which is approximately normally distributed. The approximation becomes more and more accurate as the number of variables increases; in statistics, 30 is often used as a rule of thumb for ‘large enough’. The mathematical statement is as follows:

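$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \approx Z_n$$

or equivalently

$$\sum_{i=1}^{n} X_i \approx Y_n$$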

where:

  • The X_i are the independent variables
  • μ is the mean of the variables
  • n is the number of variables
  • σ² is the variance of the variables
  • Z_n is a standard normally distributed variable
  • Y_n is a normally distributed variable with mean = nμ, variance = nσ².

To really get to grips with this, let’s go back to the die rolling example and check the conditions for the CLT for 100 rolls:

  • Each roll is independent of the others – yes.
  • We assume that on each roll, each number is equally likely to come up (with probability ⅙) – so each roll has the same mean and variance.

All of the conditions are therefore satisfied and so the CLT will work! The total score of the 100 rolls will follow an approximate normal distribution.
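To see this in action, here is a rough simulation sketch. It assumes a fair six-sided die, for which the mean of a single roll is 3.5 and the variance is 35/12; the seed, the number of repeated experiments and the helper name `total_of_rolls` are my own illustrative choices.

```python
import random
from statistics import fmean

rng = random.Random(0)  # fixed seed so the run is repeatable


def total_of_rolls(n_rolls):
    """Add up the scores of n_rolls independent rolls of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls))


n_rolls = 100
n_experiments = 10_000  # how many times we repeat the 100-roll experiment
totals = [total_of_rolls(n_rolls) for _ in range(n_experiments)]

# What the CLT predicts for the total of 100 rolls of a fair die.
clt_mean = n_rolls * 3.5
clt_var = n_rolls * 35 / 12

# What the simulation actually gives.
sim_mean = fmean(totals)
sim_var = fmean([(t - sim_mean) ** 2 for t in totals])

print(clt_mean, sim_mean)
print(clt_var, sim_var)
```

The simulated mean and variance of the totals should sit close to the predicted values, and a histogram of `totals` would show the familiar bell shape.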

A similar experiment features in Tom’s Monopoly video. Here, two dice were rolled each time and the sum of the values on the two dice (from 2 to 12) recorded. This was repeated to get 105 values. Using the data from the video, the mean score per turn was 7.095, and the variance was 2.236. So if we were to add up the score from each of the 105 rolls, the CLT says that this final sum should have a normal distribution with:

  • Mean = 7.095 x 105 = 745.0
  • Variance = 2.236 x 105 = 234.8

We have 105 independent variables here, so we expect the normal approximation to be quite accurate. Plotting this gives the following shape:

Looking at the graph, we can conclude that we would expect the total of the 105 turns to lie somewhere between 700 and 790, with a probability very close to 1. This of course doesn’t mean other values are impossible, just that they are highly improbable.
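The ‘very close to 1’ claim can be checked with a short calculation. This sketch simply takes the mean and variance per turn quoted above at face value and uses the normal approximation; the variable names are mine.

```python
from statistics import NormalDist

n_turns = 105
mean_per_turn = 7.095  # from the article
var_per_turn = 2.236   # from the article

# CLT approximation for the total over all turns.
total_mean = n_turns * mean_per_turn
total_var = n_turns * var_per_turn
total = NormalDist(mu=total_mean, sigma=total_var ** 0.5)

# Probability that the total lies between 700 and 790.
p_between = total.cdf(790) - total.cdf(700)
print(total_mean, total_var, p_between)
```

The interval from 700 to 790 covers roughly three standard deviations either side of the mean, which is why the probability comes out so close to 1.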

For another example, consider the number of times you pick up your phone every day. We can divide this into the sum of the number of times you pick up your phone per hour, and assuming that the hourly rate has a constant mean and variance, and each hourly rate is independent of the others, this sum will also have an approximate normal distribution by the CLT.

Using data from a typical day, I picked up my phone an average of 7.4375 times per hour (I classed a day as 16 hours, which is the time I would be awake to use the phone). The sample variance of the hourly counts is 7.4625.

Hour    No. of times
1       10
2       8
3       9
4       7
5       4
6       9
7       8
8       4
9       5
10      13
11      7
12      6
13      6
14      9
15      11
16      3
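The mean and variance quoted above can be reproduced from the table. In this sketch the list `pickups_per_hour` is just the table typed in by hand, and `statistics.variance` computes the sample variance (dividing by n − 1), which is the version that matches the figure in the text.

```python
from statistics import mean, variance

# Number of phone pick-ups in each of the 16 waking hours (from the table).
pickups_per_hour = [10, 8, 9, 7, 4, 9, 8, 4, 5, 13, 7, 6, 6, 9, 11, 3]

hourly_mean = mean(pickups_per_hour)
hourly_var = variance(pickups_per_hour)  # sample variance, divides by n - 1
daily_total = sum(pickups_per_hour)

print(hourly_mean, hourly_var, daily_total)
```

Swapping `variance` for `pvariance` would divide by n instead and give a slightly smaller value.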

If we assume that the mean and variance are constant across each of the 16 hours in my day, the formula for the CLT suggests that the total number of times I pick up my phone has an approximate normal distribution with mean 119 (i.e. 16 x 7.4375), and variance 119.4 (i.e. 16 x 7.4625), giving a plot looking like:

Here, the graph suggests that I pick up my phone 119 times per day, on average. But, remember this graph was created using only one day of data. If I had instead picked a different day to take the data, the average would likely be different. The interesting thing here is the variance. It suggests that on subsequent days, I am likely to pick up my phone somewhere between 90 and 150 times per day, which is actually a pretty accurate prediction (based on my recollection).
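One way to turn the variance into a range like this is to ask for the interval that contains the middle 99% of the approximating normal distribution. This is only a sketch of that idea: the 99% level is my own choice, and it relies on the same assumptions as above (a constant hourly mean and variance, and independent hours).

```python
from statistics import NormalDist

hours_awake = 16
hourly_mean = 7.4375  # from the article
hourly_var = 7.4625   # from the article

# CLT approximation for the daily total.
daily = NormalDist(mu=hours_awake * hourly_mean,
                   sigma=(hours_awake * hourly_var) ** 0.5)

# Central 99% interval: 0.5% of the probability is left in each tail.
lower = daily.inv_cdf(0.005)
upper = daily.inv_cdf(0.995)
print(daily.mean, daily.variance, lower, upper)
```

The resulting interval is in line with the range of about 90 to 150 pick-ups read off the graph.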

Have a think for yourself about what else could be normally distributed – maybe even conduct your own experiment and let us know! As we’ve seen in this article, some things are naturally modelled well by a normal distribution (e.g. height, carrot length), but in other cases we can approximate distributions that we don’t know much about with a normal distribution. All thanks to the Central Limit Theorem!
