Oxbridge admission question: how many paths are there between opposite corners of a cube?

This is one of my favourite Maths questions. I encountered it among a list of past interview questions for Oxbridge admissions: how can we get from point A to point B using the edges of the cube, without going over the same edge twice?

pic1

I encourage you to try it yourself first, as getting to the solution gave me another glimpse at the beauty of Maths problem solving. And who knows? Maybe you’ll get to a different solution than the one below…

Now, let’s take a look at my process for coming to a solution. I often look at beautiful solutions and think to myself ‘how on Earth did they come up with that?’ So, I’ll show you my exact steps.

First, we need a way of representing the road we take from A to B. Since we can represent an edge by stating its two endpoints, let’s try to label all the vertices of the cube. The natural way to do that is with the standard 3D Cartesian coordinates.

pic2

As you can see, A is chosen to be the origin of the coordinate system. Also, for ease of notation B was chosen to be the point (1, 1, 1), which can be done without changing the problem (all we are doing is re-scaling the cube to have side length 1 which doesn’t affect the number of paths).

It was at this point my friend remarked that the cube was too hard to visualise on paper, which made me realise that we don’t actually have to visualise it in 3D. The only information we need is how the points are connected together – ie. where we can move to from a given point.

So, picking a random point say (1, 0, 1), where can we move? Looking at the diagram you can see that there are three points connected to this point: (1, 1, 1), (1, 0, 0) and (0, 0, 1). From here it is not too hard to deduce that any two points are connected by an edge if their coordinates are different in exactly one place (only in x, y or z from (x, y, z)). And we can use this to show the lines in a 2D way as follows:

pic3

Next, we need some logic for the way we arrange these points in a diagram. In other words, we need more symmetry in the picture. A cube is very symmetric (48 symmetries in total – can you work them all out?) and so our 2D representation should ideally be more symmetric. This was my second attempt:

pic4

…we can do better than that. The central part in the next one is less jumbled.

pic5

Now we’re getting somewhere. Do you notice something in the middle? The path in blue looks like a crumpled hexagon – what would happen if we unravelled it?

pic6

Much better. I’d say we’re almost at the point where we can start counting the ways to get from A to B, we just have a couple of little problems to iron out…

First, B is not symmetrical in the picture. Sure, you could say this is only an aesthetic thing, but symmetry in Maths often helps us to solve things more easily so let’s try to fix it. One idea would be to add 3 ‘B’s into the picture – one for each of the nodes connected to it. But, thinking about this a little deeper, we see that because the path has to end at B – and once we are there we cannot go back – then in fact B does not need to be connected to the rest of the picture. Instead, we can just draw some EXIT signs at the nodes connected to B. Once you reach one of these points, you have two choices: either “exit” the picture through a B (only one way to do that: move to B from that point), or continue along the hexagon. This observation will be very helpful when it comes to counting paths.

Second, we are not quite sure where to go from A. There is no obvious choice for a starting move. From A to… which of the three? Thankfully, due to the symmetry of the picture (and of the problem, of course), we can simply count the ways to get from A to B with the first move being ‘A to (0, 0, 1)’ and then multiply by 3 to get the result. So, let’s do exactly that!

Given our new-found knowledge, we can make very useful simplifications to the picture. Firstly, the line from A to (0, 0, 1) can be replaced simply by an ENTRY sign at (0, 0, 1), since that is our starting move (which we do in exactly one way). Secondly, since we cannot use that first line anymore, we can simply remove it.

pic7

Now things are really starting to take shape. In this form we see that the line through the point A is equivalent to just going from (1, 0, 0) to (0, 1, 0) or the reverse (once we are in A, we cannot go back to where we came from), and so we can simply delete the point A. Also, for neatness, let’s rotate the whole diagram 60 degrees clockwise.

pic8

Perfect! I’d say we are now in a position to start counting paths – remembering that we have to multiply by 3 at the end because of the symmetry of the first move. In fact, looking at our much simplified – and very symmetric – diagram, it becomes apparent that we can also count all of the paths that start by going from the ENTRY POINT to (1, 0, 1) and then multiply by 2 at the end. I told you symmetry was going to be helpful!

The following picture should help with the counting:

pic9

It perhaps looks a little complicated so let’s break it down. Following all the arrows, or doing it on your own, you can hopefully see that by going through an edge once and starting as decided above (ENTRY to (1,0,1)), we have the following:

  • one way to get to the (1, 0, 1) EXIT
  • two ways to get to the (1, 1, 0) EXIT (ah, that pesky purple arrow!)
  • two ways to get to the (0, 1, 1) EXIT (either only on the hexagon, or using the shortcut).

In total, with the chosen start, we have 5 ways to get from A to B. But, don’t forget we need to multiply due to the symmetries that we’ve used to simplify the problem. First, multiply by 2 (we can go to (0, 1, 1) instead of (1, 0, 1) starting from (0, 0, 1)) and get 10 ways. And then multiply by 3, the number of ways to choose an ENTRY point. Thus, the actual total is 30.

ANSWER: 30 Ways to get from A to B!

Whilst we take a well-deserved moment to pat ourselves on the back for getting the right answer, let’s look at some things to take away from this problem.

  1. Even if you are asked about a cube in a problem, you mustn’t get stuck on the idea of a cube. Maths is flexible that way. You can model your cube into any other structure that keeps the important elements in place. In our case, only the ways in which the vertices were connected mattered, not the angles between the lines or anything. Thus, a graph was a fairly good choice.
  2. You can sometimes make use of the symmetry of a problem to make your life easier, as we did with the way we counted the total number of paths. Symmetry plays a very important role in Mathematics, so keep an eye out for it.

I hope you enjoyed my solution and best of luck with your own Maths problems! If you want to discuss something, leave a comment below – this place is always open to interesting solutions or remarks.

The original version of this article can be found here.

Vlad Tuchilus

What’s your type? The Maths behind the ‘Qwerty’ Keyboard

The maths behind the most unshakeable technology of the 20th century

Martha Bozic

In 1867, a newspaper editor in Milwaukee contemplated a new kind of technology. He had previously patented a device which could be used to number the pages of books but, inspired by the suggestions of fellow inventors, he decided to develop it further. The idea itself wasn’t exactly new – it had been echoing around the scientific community for over 100 years. The challenge was to realise it in a way that was commercially viable, and Christopher Latham Sholes was ready.

His first design, in 1868, resembled something like a piano. Two rows of keys were arranged alphabetically in front of a large wooden box. It was not a success. Then, after almost 10 years of trial and error came something much more familiar. It had the foot-pedal of a sewing machine and most of the remaining mechanism was hidden by a bulky casing, but at the front were four rows of numbers, letters and punctuation… and a spacebar.

Surprisingly little is certain about why he chose to lay it out as he did, probably because to Sholes the layout was no more than a side-effect of the machine he was trying to create. But as the most influential component of the typewriter, the qwerty keyboard has attracted debates about its origin, its design and whether it is even fit for purpose. Without historical references, most arguments have centred on statistical evidence, jostling for the best compromise between the statistical properties of our language and the way that we type. More recently, questions have been posed about how generic ‘the way that we type’ actually is. Can it be generalised to design the perfect keyboard, or could it be unique enough to personally identify each and every one of us?

The first typewriter was designed for hunt-and-peck operation as opposed to touch typing.  In other words, the user was expected to search for each letter sequentially, rather than tapping out sentences using muscle-memory. Each of the 44 keys was connected to a long metal typebar which ended with an embossed character corresponding to the one on the key. The typebars themselves were liable to jam, leading to the commonly disputed myth that the qwerty arrangement was an effort to separate frequently used keys.

Throughout the 20th century new inventors claimed to have created better, more efficient keyboards, usually presenting a long list of reasons why their new design was superior. The most long-lasting of these was the Dvorak Simplified Keyboard, but other challengers arrived in a steady stream from 1909, four years after qwerty was established as the international standard.

Is it possible that there was a method behind the original arrangement of the keys? It really depends who you ask. The typebars themselves fell into a circular type-basket in a slightly different order to the one visible on the keyboard. Defining adjacency as two typebars which are immediately next to each other, the problem of separating them so that no two will jam is similar to sitting 44 guests around a circular dinner table randomly and hoping that no one is seated next to someone they actively dislike.

Picture 1
The qwerty keyboard and typebasket (Kay, 2013)

For any number, n, of guests, the number of possible arrangements is (n-1)!. That is, there are n places to seat the first guest, multiplied by (n-1) places left to seat the second guest, multiplied by (n-2) for the third guest and so on. Because the guests are seated round a circular table with n places, there are n ways of rotating each seating plan to give another arrangement that has already been counted. So, there are (n x (n-1) x (n-2) x…x 1)/n = (n-1) x (n-2) x…x 1 arrangements, which is written (n-1)!.

By pairing up two feuding guests and considering them as one, you can find the total number of arrangements where they are sat next to each other by considering a dinner party with one less person. From our calculation above we know the total number of possible arrangements is (n-2)!, but since the feuding pair could be seated together as XY or YX we have to multiply the total number of arrangements by two. From this, the final probability of the two feuding guests being sat together is 2(n-2)!/(n-1)! = 2/(n-1), and so the probability of them not being sat together is 1-(2/(n-1)) = (n-3)/(n-1).

But what if one or more of the guests is so unlikable that they have multiple enemies at the table? Say ‘h’ who has been known before now to clash with both ‘e’ and ‘t’. Assuming the events are independent (one doesn’t influence the other) we just multiply the probabilities together to get the chance of ‘h’ being next to neither of them as [(n-3)/(n-1)]2. And the probability that on the same table ‘e’ is also not next to her ex ‘r’ is [(n-3)/(n-1)]2 x [(n-3)/(n-1)] = [(n-3)/(n-1)]3. So, for any number of pairs of feuding guests, m, the probability of polite conversation between all is [(n-3)/(n-1)]m.

Now, returning to the problem of the typebars, a frequency analysis of the English language suggests there are roughly 12 pairings which occur often enough to be problematic. For n=44 symbols, the dinner party formula gives a probability of [(44-3)/(44-1)]12 = [41/43]12 = 0.56. That is a better than 50% chance that the most frequently occurring letter pairs could have been separated by random allocation. An alternative theory suggests that Sholes may have looked for the most infrequently occurring pairs of letters, numbers and punctuation and arranged these to be adjacent on the typebasket. The statistical evidence for this is much more compelling, but rivals of qwerty had other issues with its design.

August Dvorak and his successors treated keyboard design as an optimisation problem. With the advantage of hindsight now that the typewriter had been established, they were able to focus on factors which they believed would benefit the learning and efficiency of touch typing. Qwerty was regularly criticised as defective and awkward for reasons that competing keyboards were claimed to overcome.

The objectives used by Dvorak, qwerty’s biggest antagonist and inventor of the eponymous Dvorak Standard Keyboard (DSK), were that:

  • the right hand should be given more work than the left hand, at roughly 56%;
  • the amount of typing assigned to each finger should be proportional to its skill and strength;
  • 70% of typing should be carried out on the home row (the natural position of fingers on a keyboard);
  • letters often found together should be assigned positions such that alternate hands are required to strike them, and
  • finger motions from row to row should be reduced as much as possible.
Picture 2
The Dvorak Simplified Keyboard (Wikimedia commons)

To achieve these aims, Dvorak used frequency analysis data for one-, two-, three-, four- and five- letter sequences, and claimed that 35% of all English words could be typed exclusively from the home row. He also conducted multiple experiments on the ease of use of his new design over qwerty, although the specifics were sparsely published.

Of course, however good Dvorak’s new design may have been, there was a problem. Qwerty being pre-established meant that finding subjects who were unfamiliar with both keyboards was difficult. Participants who tested the DSK had to ‘unlearn’ touch typing, in order to relearn it for a different layout, while those using qwerty had the advantage of years of practice. The main metric used to determine the ‘better’ design was typing speed but clearly this was not only a test of the keyboard, it was also a measure of the skill of the typist.

Alone, average typing speed would not be enough to distinguish between individuals – any more than 40 words per minute (wpm) is considered above average and since a lot more than 40 people are average or below average typists, some of them must have the same wpm – but other information is available. Modern computer keyboards send feedback from each letter you type, leading to a wealth of data on the time between each consecutive key press. This can be broken down into the time between any particular letter pairing, building a profile on an individuals specific typing patterns, and combined with typing speed it is surprisingly effective at identifying typists.

In a battle of the keyboards, despite its suboptimal design and uncertain past, qwerty has remained undefeated. Today it is so ubiquitous that for most people to see a different layout would be jarring, yet our interactions with it are still identifiably unique. Nearly 150 years after its conception, the keyboard is embedded in our culture – it’s an old kind of technology, just not the one Scholes thought he was inventing.

What is P versus NP?

TRM intern and University of Oxford student Kai Laddiman speaks to St John’s College Computer Scientist Stefan Kiefer about the infamous million-dollar millennium problem: P versus NP. 

You can read more about P vs NP here.

Perfect Numbers and Mersenne Primes

Perfect numbers and Mersenne primes might seem like unrelated branches of math, but work by Euclid and Euler over 2000 years apart showed they are so deeply connected that a one-to-one correspondence exists between the two sets of numbers.

Produced by Tom Rocks Maths intern Kai Laddiman, with assistance from Tom Crawford. Thanks to St John’s College, Oxford for funding the placement.

From aliens to bees via tattoos…

A short review of intern Joe Double’s work with Tom Rocks Maths over the summer of 2018. Written for the OUS East Kent branch who provided funding for the project. 

‘First of all, I must thank you again for the grant, and for the warmth and friendliness at your event; it was an absolute delight to give my presentation and talk to your members, as it has been interacting with you in general.

I had the opportunity to work with one of my tutors over the summer to produce pieces for a general audience about complex mathematical topics. Without the help of the OUS East Kent group, I couldn’t have taken up this opportunity – with their grant’s help, I was able to afford to live in Oxford through a large part of the summer, allowing me to work in close contact with my tutor and use his studio for creating the videos and audio pieces I worked on. The OUSEK grant can be put to use far more flexibly than those from bigger schemes (which always have preconditions to meet about how the project will apply to industry, say), so I couldn’t recommend applying more if you have an idea for a project for your time at Oxford which is on the unusual side!’

Pieces I produced during the project:

Why do Bees Build Hexagons? Honeycomb Conjecture explained by Thomas Hales

A video I edited of Tom (my tutor) interviewing Thomas Hales about the mathematics behind beehives.

Would Alien (Non-Euclidean) Geometry Break Our Brains?

My main video, written, filmed and edited by me, about demystifying non-Euclidean geometry.

Take me to your chalkboard

My main audio piece, where I interview Professor Adrian Moore (also of St Hugh’s) about what philosophy can tell us about how aliens might do maths.

Maths proves that maths isn’t boring

An article about Gödel’s incompleteness theorems, and how they show maths is always risky.

Getting tattooed for science…

An audio piece I edited about a tattoo Tom got of the Platonic solids.

Alien maths – we’re counting on it

An article about how we use the mathematics of prime numbers to send messages to the stars.

Play Nice!

An article about a game theory paper which could amongst other things help stop deforestation.

The original article was published on the OUS East Kent website here.

What are the chances that two England teammates share a Birthday?

Cast your mind back to the summer of 2018… we saw the warmest ever weather in the UK, Brexit was not yet a complete and utter disaster, and seemingly against all the odds the England football team reached the semi-finals of the World Cup for the first time since 1990. No doubt the team had a huge celebration together afterwards – but it wouldn’t be the first time that two of them have celebrated an occasion at the same time. As well as playing together at the heart of England’s defence, Manchester City duo Kyle Walker and John Stones also share the same birthday! Stones was born on 28th May 1994, making him 24 years old; Walker was born on the same day in 1990, meaning that he is exactly four years older than his teammate. How strange! Or is it…?

John_Stones_2018-06-13_1 Kyle_Walker

On the face of it, it seems quite surprising that in an England squad of just 23 players, two of them happen to share a birthday. However, as we’re about to see, this isn’t a freakish coincidence – maths says that it’s quite likely! What we’re talking about here is commonly known as the birthday problem: if there are a group of people of a certain size, what is the likelihood that at least two of them have the same birthday?

Let’s start by saying that we have a group of N people, and assume that birthdays are equally likely on every day of the year. (There is some evidence to suggest that this isn’t the case for top athletes; some say that they tend to be born early in the school year, such as around September in England. This is because they are slightly older than the other children in the year, and so they have a slight head-start in their physical development. However, we don’t want to make things too complicated, so we’ll ignore that for now.)

The easiest way to think about the problem is to first try to work out what the probability is that none of the N people share a birthday. Suppose our N people walk into a room, that is empty at first, one at a time. When the first person walks in, it’s obvious that they don’t share a birthday with anyone else in the room, because there isn’t anyone else. Therefore, they have the maximum probability of not sharing a birthday with anyone else in the room, which is 1.

Now think of the second person who walks in. The only way that they could share a birthday with someone in the room is if it happens to be exactly the same day as the first person. That means there is a 1 in 365 chance that they do share a birthday, so there is a 364 in 365 chance that they don’t.

Suppose that the first two birthdays don’t match, and then the third person walks in. They now have 2 days that they can’t share a birthday with, so there are 363 possible choices out of 365. Because we assumed that the first two didn’t match, we multiply the probabilities, so now the chance that none of them share a birthday is (364/365) * (363/365).

We can repeat this process until we get to our final person, number N. For example, the fourth person has 3 birthdays that they cannot share, so we multiply by a chance of 362/365; the fifth person has 4 days to avoid, so we include a probability of 361/365… By the time the Nth person walks in, there are N-1 people already in the room, so there are N-1 days that their birthday cannot fall on. This leaves them with 365-(N-1) possibilities out of 365.

To work out the total probability, we multiply all of these terms together which gives the likelihood that none of the N people share a birthday as

1 * (364/365) * (363/365) * (362/365) * … * ((365-(N-1))/365).

You might be thinking that this still looks like quite a big probability that none of them share a birthday, because all of the terms are very close to 1. But, if we try some values of N in a calculator, then it tells a very different story. (The percentages are calculated by finding the probability from the equation above and multiplying by 100.)

When N = 10, we get an 88% chance that none of them share a birthday. However, this drops down to 59% when there are N = 20 people. When we get to N = 23, the number of players in the England squad, the probability reaches just under 50%. That means that, incredibly, the likelihood that at least two of the 23 people share a birthday is just bigger than 50%!

So, in a random group of 23 people, it’s more likely than not that two of them share a birthday! This seems very strange at first; surely you’d need more than 23 people for a shared birthday to be more likely than not?! This is why the problem is commonly known as the birthday paradox – it might be very hard to get your head around, but the maths doesn’t lie!

Perhaps, in order to convince ourselves, we should look at some real-life examples. This is where the World Cup squads come into play: each team is restricted to bringing 23 players to the tournament. (We’ve seen that number before…) If our calculations above are correct, then if we picked any one of the World Cup squads, there would be roughly a 50:50 chance that at least two of the squad members share a birthday, which means that out of all of the squads that went to Russia, we would expect about half of them to have a birthday match. Well, let’s take a look…

Of the 32 teams, which were divided into 8 groups of 4, the following teams have at least one pair of players who share a birthday:

Group A Russia
Group B Iran, Morocco, Portugal, Spain
Group C Australia, France, Peru
Group D Croatia, Nigeria
Group E Brazil, Costa Rica
Group F Germany, South Korea
Group G England
Group H Poland

So, not only is there at least one team in every group with a birthday match, but if we count the total, there are 16 squads with a shared birthday pair – exactly half of the teams! The experimental results have matched up with the mathematical theory to perfection. Hopefully that’s enough to convince you that our calculations were indeed sound!

A slightly different question that you might ask is as follows: if I am in a group with a certain number of people, what are the chances that at least one of them shares my birthday? Is it the same idea? What we have worked out above is the probability that any two people in the room share a birthday (or rather, we worked out the opposite, but we can find the right answer from our working). Note that the pair doesn’t necessarily include you; it’s a lot more likely that it’s some other pair in the group.

In order to work out the answer to this similar sounding question, we work the other way around again, by calculating the probability that none of the N people share my birthday. For each of the N people, there is only one birthday that they cannot have, and that is mine (14th November, in case you were wondering), which means there are 364 out of 365 possibilities for each person. We no longer care whether their birthdays match up; we only care if they match with mine. So each person has a 364/365 chance of not sharing my birthday; and the overall probability is just 364/365 * 364/365 * … * 364/365, N times, which we write as (364/365)N.

Once again, we can plug some values of N into a calculator: N = 10 gives a 97% chance that no-one else has my birthday. For N = 50 the probability is still very high: there is an 87% chance that none of these 50 people have the same birthday as me. N = 100 gives 76%; N = 200 gives 58%; you have to go all the way to N = 253 before the probability dips below 50%, and it becomes more likely than not that at least one person will celebrate their birthday with me.

Applying this idea to all 736 players (32 squads of 23 players) involved in the World Cup, we should expect around 3 of them to have been born on the same day as me – 14th November. And I am very happy to confirm that France’s Samuel Umtiti, Switzerland’s Roman Burki, and Belgium’s Thomas Vermaelen all have what is undoubtedly the best birthday of the year… Two similar problems with two very different solutions!

Thomas_Vermaelen_2018 Samuel_Umtiti_2018AUT_vs._SUI_2015-11-17_(250)

You can check which footballers share a birthday with you at www.famousbirthdays.com/date/monthDD-soccerplayer.html, where you enter the month in words and the day in numbers (no preceding zero required).

Kai Laddiman 

Why do Bees Build Hexagons? Honeycomb Conjecture explained by Thomas Hales

Mathematician Thomas Hales explains the Honeycomb Conjecture in the context of bees. Hales proved that the hexagon tiling (hexagonal honeycomb) is the most efficient way to maximise area whilst minimising perimeter.

Produced by Tom Rocks Maths intern Joe Double, with assistance from Tom Crawford. Thanks to the Oxford University Society East Kent Branch for funding the placement and to the Isaac Newton Institute for arranging the interview.

Would Alien (Non-Euclidean) Geometry Break Our Brains?

The author H. P. Lovecraft often described his fictional alien worlds as having ‘Non-Euclidean Geometry’, but what exactly is this? And would it really break our brains?

 

Produced by Tom Rocks Maths intern Joe Double, with assistance from Tom Crawford. Thanks to the Oxford University Society East Kent Branch for funding the placement.

Not so smooth criminals: how to use maths to catch a serial killer

The year is 1888, and the infamous serial killer Jack the Ripper is haunting the streets of Whitechapel. As a detective in Victorian London, your mission is to track down this notorious criminal – but you have a problem. The only information that you have to go on is the map below, which shows the locations of crimes attributed to Jack. Based on this information alone, where on earth should you start looking?

Picture1

The fact that Jack the Ripper was never caught suggests that the real Victorian detectives didn’t know the answer to this question any more than you do, and modern detectives are faced with the same problem when they are trying to track down serial offenders. Fortunately for us, there is a fascinating way in which we can apply maths to help us to catch these criminals – a technique known as geospatial profiling.

Geospatial profiling is the use of statistics to find patterns in the geographical locations of certain events. If we know the locations of the crimes committed by a serial offender, we can use geospatial profiling to work out their likely base location, or anchor point. This may be their home, place of work, or any other location of importance to them – meaning it’s a good place to start looking for clues!

Perhaps the simplest approach is to find the centre of minimum distance to the crime locations. That is, find the place which gives the overall shortest distance for the criminal to travel to commit their crimes. However, there are a couple of problems with this approach. Firstly, it doesn’t tend to consider criminal psychology and other important factors. For example, it might not be very sensible to assume that a criminal will commit crimes as close to home as they can! In fact, it is often the case that an offender will only commit crimes outside of a buffer zone around their base location. Secondly, this technique will provide us with a single point location, which is highly unlikely to exactly match the true anchor point. We would prefer to end up with a distribution of possible locations which we can use to identify the areas that have the highest probability of containing the anchor point, and are therefore the best places to search.

With this in mind, let’s call the anchor point of the criminal z. Our aim is then to find a probability distribution for z, which takes into account the locations of the crime scenes, so that we can work out where our criminal is most likely to be. In order to do this, we will need two things.

  1. A prior distribution for z. This is just a function which defines our best guess at what z might be, before we have used any of our information about the crime locations. The prior distribution is usually based off data from previous offenders whose location was successfully determined, but it’s usually not hugely important if we’re a bit wrong – this just gives us a place to start.
  2. A probability density function (PDF) for the locations of the crime sites. This is a function which describes how the criminal chooses the crime site, and therefore how the criminal is influenced by z. If we have a number of crimes committed at known locations, then the PDF describes the probability that a criminal with anchor point z commits crimes at these locations. Working out what we should choose for this is a little trickier…

We’ll see why we need these in a minute, but first, how do we choose our PDF? The answer is that it depends on the type of criminal, because different criminals behave in different ways. There are two main categories of offenders – resident offenders and non-resident offenders.

Resident offenders are those who commit crimes near to their anchor point, so their criminal region (the zone in which they commit crimes) and anchor region (a zone around their anchor point where they are often likely to be) largely overlap, as shown in the diagram:

Picture2

If we think that we may have this type of criminal, then we can use the famous normal distribution for our density function. Because we’re working in two dimensions, it looks like a little hill, with the peak at the anchor point:

Picture3

Alternatively, if we think the criminal has a buffer zone, meaning that they only commit crimes at least a certain distance from home, then we can adjust our distribution slightly to reflect this. In this case, we use something that looks like a hollowed-out hill, where the most likely region is in a ring around the centre as shown below:

Picture4

The second type of offenders are non-resident offenders. They commit crimes relatively far from their anchor point, so that their criminal region and anchor region do not overlap, as shown in the diagram:

Picture5

If we think that we have this type of criminal, then for our PDF we can pick something that looks a little like the normal distribution used above, but shifted away from the centre:

Picture6

Now, the million-dollar question is which model should we pick? Determining between resident and non-resident offenders in advance is often difficult. Some information can be made deduced from the geography of the region, but often assumptions are made based on the crime itself – for example more complex/clever crimes have a higher likelihood of being committed by non-residents.

Once we’ve decided on our type of offender, selected the prior distribution (1) and the PDF (2), how do we actually use the model to help us to find our criminal? This is where the mathematical magic happens in the form of Bayesian statistics (named after statistician and philosopher Thomas Bayes).

Bayes’ theorem tells us that if we multiply together our prior distribution and our PDF, then we’ll end up with a new probability distribution for the anchor point z, which now takes into account the locations of the crime scenes! We call this the posterior distribution, and it tells us the most likely locations for the criminal’s anchor point given the locations of the crime scenes, and therefore the best places to begin our search.

This fascinating technique is actually used today by police detectives when trying to locate serial offenders. They implement the same steps described above using an extremely sophisticated computer algorithm called Rigel, which has a very high accuracy of correctly locating criminals.

So, what about Jack?

If we apply this geospatial profiling technique to the locations of the crimes attributed to Jack the Ripper, then we can predict that it is most likely that his base location was in a road called Flower and Deane Street. This is marked on the map below, along with the five crime locations used to work it out.

Picture7

Unfortunately, we’re a little too late to know whether this prediction is accurate, because Flower and Deane street no longer exists, so any evidence is certainly long gone! However, if the detectives in Victorian London had known about geospatial profiling and the mathematics behind catching criminals, then it’s possible that the most infamous serial killer in British history might never have become quite so famous…

Francesca Lovell-Read

WordPress.com.

Up ↑