First year St John’s Maths student Brian discusses his favourite areas of undergraduate Maths (featuring a famous sequence) and his plans for a future career in Data Science. Produced for the SJC Inspire Programme.
Bayes’ Theorem allows us to assign a probability to an unknown fact. Thomas Bayes himself described an experiment with a billiard table, so James and I tried to recreate it. Unfortunately, it didn’t quite go to plan…
Check out James’ channel here: https://www.youtube.com/user/singingbanana
The maths behind the most unshakeable technology of the 20th century
In 1867, a newspaper editor in Milwaukee contemplated a new kind of technology. He had previously patented a device which could be used to number the pages of books but, inspired by the suggestions of fellow inventors, he decided to develop it further. The idea itself wasn’t exactly new – it had been echoing around the scientific community for over 100 years. The challenge was to realise it in a way that was commercially viable, and Christopher Latham Sholes was ready.
His first design, in 1868, resembled something like a piano. Two rows of keys were arranged alphabetically in front of a large wooden box. It was not a success. Then, after almost 10 years of trial and error came something much more familiar. It had the foot-pedal of a sewing machine and most of the remaining mechanism was hidden by a bulky casing, but at the front were four rows of numbers, letters and punctuation… and a spacebar.
Surprisingly little is certain about why he chose to lay it out as he did, probably because to Sholes the layout was no more than a side-effect of the machine he was trying to create. But as the most influential component of the typewriter, the qwerty keyboard has attracted debates about its origin, its design and whether it is even fit for purpose. Without historical references, most arguments have centred on statistical evidence, jostling for the best compromise between the statistical properties of our language and the way that we type. More recently, questions have been posed about how generic ‘the way that we type’ actually is. Can it be generalised to design the perfect keyboard, or could it be unique enough to personally identify each and every one of us?
The first typewriter was designed for hunt-and-peck operation as opposed to touch typing. In other words, the user was expected to search for each letter sequentially, rather than tapping out sentences using muscle-memory. Each of the 44 keys was connected to a long metal typebar which ended with an embossed character corresponding to the one on the key. The typebars themselves were liable to jam, leading to the commonly disputed myth that the qwerty arrangement was an effort to separate frequently used keys.
Throughout the 20th century new inventors claimed to have created better, more efficient keyboards, usually presenting a long list of reasons why their new design was superior. The most long-lasting of these was the Dvorak Simplified Keyboard, but other challengers arrived in a steady stream from 1909, four years after qwerty was established as the international standard.
Is it possible that there was a method behind the original arrangement of the keys? It really depends who you ask. The typebars themselves fell into a circular type-basket in a slightly different order to the one visible on the keyboard. Defining adjacency as two typebars which are immediately next to each other, the problem of separating them so that no two will jam is similar to sitting 44 guests around a circular dinner table randomly and hoping that no one is seated next to someone they actively dislike.
For any number, n, of guests, the number of possible arrangements is (n-1)!. That is, there are n places to seat the first guest, multiplied by (n-1) places left to seat the second guest, multiplied by (n-2) for the third guest and so on. Because the guests are seated round a circular table with n places, there are n ways of rotating each seating plan to give another arrangement that has already been counted. So, there are (n x (n-1) x (n-2) x…x 1)/n = (n-1) x (n-2) x…x 1 arrangements, which is written (n-1)!.
By pairing up two feuding guests and considering them as one, you can find the total number of arrangements where they are sat next to each other by considering a dinner party with one less person. From our calculation above we know that a party of n-1 has (n-2)! possible arrangements, but since the feuding pair could be seated together as XY or YX we have to multiply this total by two. From this, the final probability of the two feuding guests being sat together is 2(n-2)!/(n-1)! = 2/(n-1), and so the probability of them not being sat together is 1-(2/(n-1)) = (n-3)/(n-1).
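The 2/(n-1) result can be checked by brute force for small tables. The sketch below is only an illustration: the function name is made up, and guests are labelled 0 to n-1 with guests 0 and 1 standing in for the feuding pair. It fixes guest 0 in one seat to remove rotations, then counts every seating of the rest.

```python
from fractions import Fraction
from itertools import permutations


def prob_adjacent_circular(n: int) -> Fraction:
    """Exact probability that guests 0 and 1 sit together at a round table of n."""
    together = 0
    total = 0
    # Fixing guest 0 in one seat removes rotations, leaving (n-1)! seatings.
    for rest in permutations(range(1, n)):
        total += 1
        # Guest 0's two neighbours are the first and last of the other guests.
        if rest[0] == 1 or rest[-1] == 1:
            together += 1
    return Fraction(together, total)


for n in range(4, 8):
    # Compare the enumeration with the formula 2/(n-1).
    assert prob_adjacent_circular(n) == Fraction(2, n - 1)
    print(n, prob_adjacent_circular(n))
```

Enumeration is only practical for small n, since the number of seatings grows as (n-1)!.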
But what if one or more of the guests is so unlikable that they have multiple enemies at the table? Say ‘h’ who has been known before now to clash with both ‘e’ and ‘t’. Assuming the events are independent (one doesn’t influence the other) we just multiply the probabilities together to get the chance of ‘h’ being next to neither of them as [(n-3)/(n-1)]^2. And the probability that on the same table ‘e’ is also not next to her ex ‘r’ is [(n-3)/(n-1)]^2 x [(n-3)/(n-1)] = [(n-3)/(n-1)]^3. So, for any number of pairs of feuding guests, m, the probability of polite conversation between all is [(n-3)/(n-1)]^m.
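The independence assumption is a simplification, and a small enumeration shows how close it gets. In this sketch (names are hypothetical) guest 0 plays ‘h’ and guests 1 and 2 play ‘e’ and ‘t’. It compares the exact chance that ‘h’ avoids both with the square of (n-3)/(n-1).

```python
from fractions import Fraction
from itertools import permutations


def prob_h_clear(n: int) -> Fraction:
    """Exact chance that guest 0 sits next to neither guest 1 nor guest 2."""
    clear = 0
    total = 0
    for rest in permutations(range(1, n)):  # guest 0 is fixed in one seat
        total += 1
        # rest[0] and rest[-1] are the two neighbours of guest 0.
        if rest[0] not in (1, 2) and rest[-1] not in (1, 2):
            clear += 1
    return Fraction(clear, total)


n = 7
exact = prob_h_clear(n)
approx = Fraction(n - 3, n - 1) ** 2  # value under the independence assumption
print(float(exact), float(approx))
```

The exact value works out to (n-3)(n-4)/((n-1)(n-2)), which is slightly smaller than the independent estimate. The gap shrinks as n grows, so for a table of 44 the approximation is a reasonable one.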
Now, returning to the problem of the typebars, a frequency analysis of the English language suggests there are roughly 12 pairings which occur often enough to be problematic. For n=44 symbols, the dinner party formula gives a probability of [(44-3)/(44-1)]^12 = [41/43]^12 ≈ 0.56. That is a better than 50% chance that the most frequently occurring letter pairs could have been separated by random allocation. An alternative theory suggests that Sholes may have looked for the most infrequently occurring pairs of letters, numbers and punctuation and arranged these to be adjacent on the typebasket. The statistical evidence for this is much more compelling, but rivals of qwerty had other issues with its design.
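The arithmetic above is easy to reproduce. This short sketch evaluates the dinner party formula for 44 typebars and 12 troublesome pairs. The function name is made up for illustration, and the formula inherits the independence assumption made earlier.

```python
def prob_no_clash(n: int, m: int) -> float:
    """Dinner party estimate: chance that none of m feuding pairs are adjacent among n."""
    return ((n - 3) / (n - 1)) ** m


print(round(prob_no_clash(44, 12), 2))  # 0.56

# The estimate falls steadily as more pairs have to be kept apart.
for m in (1, 6, 12, 24):
    print(m, round(prob_no_clash(44, m), 3))
```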
August Dvorak and his successors treated keyboard design as an optimisation problem. With the advantage of hindsight now that the typewriter had been established, they were able to focus on factors which they believed would benefit the learning and efficiency of touch typing. Qwerty was regularly criticised as defective and awkward for reasons that competing keyboards were claimed to overcome.
The objectives used by Dvorak, qwerty’s biggest antagonist and inventor of the eponymous Dvorak Simplified Keyboard (DSK), were that:
- the right hand should be given more work than the left hand, at roughly 56%;
- the amount of typing assigned to each finger should be proportional to its skill and strength;
- 70% of typing should be carried out on the home row (the natural position of fingers on a keyboard);
- letters often found together should be assigned positions such that alternate hands are required to strike them, and
- finger motions from row to row should be reduced as much as possible.
To achieve these aims, Dvorak used frequency analysis data for one-, two-, three-, four- and five- letter sequences, and claimed that 35% of all English words could be typed exclusively from the home row. He also conducted multiple experiments on the ease of use of his new design over qwerty, although the specifics were sparsely published.
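The home row objective is simple to measure for any piece of text. The sketch below counts the share of letters that fall on the home row of each layout. The function name and sample sentence are illustrative choices, only letters are counted (the qwerty home row also holds a punctuation key), and a pangram is not representative of real English letter frequencies.

```python
QWERTY_HOME = set("asdfghjkl")
DVORAK_HOME = set("aoeuidhtns")


def home_row_share(text: str, home_row: set) -> float:
    """Fraction of the letters in text that sit on the given home row."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in home_row for c in letters) / len(letters)


sample = "the quick brown fox jumps over the lazy dog"
print("qwerty:", round(home_row_share(sample, QWERTY_HOME), 2))
print("dvorak:", round(home_row_share(sample, DVORAK_HOME), 2))
```

Running the same function over a large body of ordinary English text would give a fairer comparison with the 70% target.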
Of course, however good Dvorak’s new design may have been, there was a problem. Qwerty being pre-established meant that finding subjects who were unfamiliar with both keyboards was difficult. Participants who tested the DSK had to ‘unlearn’ touch typing, in order to relearn it for a different layout, while those using qwerty had the advantage of years of practice. The main metric used to determine the ‘better’ design was typing speed but clearly this was not only a test of the keyboard, it was also a measure of the skill of the typist.
Alone, average typing speed would not be enough to distinguish between individuals – any more than 40 words per minute (wpm) is considered above average and since a lot more than 40 people are average or below average typists, some of them must have the same wpm – but other information is available. Modern computer keyboards send feedback from each letter you type, leading to a wealth of data on the time between each consecutive key press. This can be broken down into the time between any particular letter pairing, building a profile of an individual’s specific typing patterns, and combined with typing speed it is surprisingly effective at identifying typists.
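A minimal sketch of such a profile, assuming key presses arrive as (key, timestamp in milliseconds) pairs, might look like this. The function names, the made-up timings and the simple distance measure are all illustrative assumptions rather than a description of any real identification system.

```python
from collections import defaultdict
from statistics import mean


def digraph_profile(events):
    """Map each consecutive key pair to its mean latency in milliseconds.

    events is a list of (key, timestamp_ms) tuples in the order typed.
    """
    gaps = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        gaps[k1 + k2].append(t2 - t1)
    return {pair: mean(times) for pair, times in gaps.items()}


def profile_distance(a, b):
    """Mean absolute latency difference over the pairs both profiles share."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")
    return mean(abs(a[p] - b[p]) for p in shared)


# Made-up timings for two hypothetical typists.
alice = digraph_profile([("t", 0), ("h", 90), ("e", 170), ("t", 400), ("h", 500)])
bob = digraph_profile([("t", 0), ("h", 150), ("e", 330)])
print(alice)
print(profile_distance(alice, bob))
```

A smaller distance suggests the two samples are more likely to come from the same typist. A realistic system would need far more data per letter pair than this toy example uses.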
In a battle of the keyboards, despite its suboptimal design and uncertain past, qwerty has remained undefeated. Today it is so ubiquitous that for most people to see a different layout would be jarring, yet our interactions with it are still identifiably unique. Nearly 150 years after its conception, the keyboard is embedded in our culture – it’s an old kind of technology, just not the one Sholes thought he was inventing.
The first in a new feature where I’ll be interviewing some of my students at the University of Oxford about their love of maths for the St John’s College Inspire Programme that aims to provide role models for students at non-selective state schools in the UK. Meet first year student Diamor…
The year is 1888, and the infamous serial killer Jack the Ripper is haunting the streets of Whitechapel. As a detective in Victorian London, your mission is to track down this notorious criminal – but you have a problem. The only information that you have to go on is the map below, which shows the locations of crimes attributed to Jack. Based on this information alone, where on earth should you start looking?
The fact that Jack the Ripper was never caught suggests that the real Victorian detectives didn’t know the answer to this question any more than you do, and modern detectives are faced with the same problem when they are trying to track down serial offenders. Fortunately for us, there is a fascinating way in which we can apply maths to help us to catch these criminals – a technique known as geospatial profiling.
Geospatial profiling is the use of statistics to find patterns in the geographical locations of certain events. If we know the locations of the crimes committed by a serial offender, we can use geospatial profiling to work out their likely base location, or anchor point. This may be their home, place of work, or any other location of importance to them – meaning it’s a good place to start looking for clues!
Perhaps the simplest approach is to find the centre of minimum distance to the crime locations. That is, find the place which gives the overall shortest distance for the criminal to travel to commit their crimes. However, there are a couple of problems with this approach. Firstly, it doesn’t tend to consider criminal psychology and other important factors. For example, it might not be very sensible to assume that a criminal will commit crimes as close to home as they can! In fact, it is often the case that an offender will only commit crimes outside of a buffer zone around their base location. Secondly, this technique will provide us with a single point location, which is highly unlikely to exactly match the true anchor point. We would prefer to end up with a distribution of possible locations which we can use to identify the areas that have the highest probability of containing the anchor point, and are therefore the best places to search.
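The centre of minimum distance described above is the point that minimises the total straight-line distance to the crime sites. One standard way to approximate it is Weiszfeld's iteration, sketched below. The coordinates are invented, the function names are hypothetical, and stopping as soon as the estimate lands on a crime site is a simplification.

```python
import math


def total_distance(point, sites):
    """Sum of straight-line distances from point to every site."""
    return sum(math.hypot(sx - point[0], sy - point[1]) for sx, sy in sites)


def centre_of_minimum_distance(sites, iterations=200, eps=1e-9):
    """Approximate the point minimising total distance (Weiszfeld's iteration)."""
    # Start from the centroid, i.e. the mean of the coordinates.
    x = sum(s[0] for s in sites) / len(sites)
    y = sum(s[1] for s in sites) / len(sites)
    for _ in range(iterations):
        wx = wy = wsum = 0.0
        for sx, sy in sites:
            d = math.hypot(sx - x, sy - y)
            if d < eps:
                # Landed on a crime site: stop here (a simplification).
                return (x, y)
            wx += sx / d  # nearer sites receive larger weights
            wy += sy / d
            wsum += 1.0 / d
        x, y = wx / wsum, wy / wsum
    return (x, y)


sites = [(0, 0), (4, 0), (0, 3), (10, 10)]  # invented crime locations
centre = centre_of_minimum_distance(sites)
print(centre, total_distance(centre, sites))
```

The output is a single point, which illustrates the second drawback noted above: it says nothing about how plausible nearby locations are.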
With this in mind, let’s call the anchor point of the criminal z. Our aim is then to find a probability distribution for z, which takes into account the locations of the crime scenes, so that we can work out where our criminal is most likely to be. In order to do this, we will need two things.
- A prior distribution for z. This is just a function which defines our best guess at what z might be, before we have used any of our information about the crime locations. The prior distribution is usually based on data from previous offenders whose location was successfully determined, but it’s not hugely important if we’re a bit wrong – this just gives us a place to start.
- A probability density function (PDF) for the locations of the crime sites. This is a function which describes how the criminal chooses the crime site, and therefore how the criminal is influenced by z. If we have a number of crimes committed at known locations, then the PDF describes the probability that a criminal with anchor point z commits crimes at these locations. Working out what we should choose for this is a little trickier…
We’ll see why we need these in a minute, but first, how do we choose our PDF? The answer is that it depends on the type of criminal, because different criminals behave in different ways. There are two main categories of offenders – resident offenders and non-resident offenders.
Resident offenders are those who commit crimes near to their anchor point, so their criminal region (the zone in which they commit crimes) and anchor region (a zone around their anchor point where they are often likely to be) largely overlap, as shown in the diagram:
If we think that we may have this type of criminal, then we can use the famous normal distribution for our density function. Because we’re working in two dimensions, it looks like a little hill, with the peak at the anchor point:
Alternatively, if we think the criminal has a buffer zone, meaning that they only commit crimes at least a certain distance from home, then we can adjust our distribution slightly to reflect this. In this case, we use something that looks like a hollowed-out hill, where the most likely region is in a ring around the centre as shown below:
The second type of offenders are non-resident offenders. They commit crimes relatively far from their anchor point, so that their criminal region and anchor region do not overlap, as shown in the diagram:
If we think that we have this type of criminal, then for our PDF we can pick something that looks a little like the normal distribution used above, but shifted away from the centre:
Now, the million-dollar question is which model should we pick? Distinguishing between resident and non-resident offenders in advance is often difficult. Some information can be deduced from the geography of the region, but often assumptions are made based on the crime itself – for example, more complex or clever crimes have a higher likelihood of being committed by non-residents.
Once we’ve decided on our type of offender, selected the prior distribution (1) and the PDF (2), how do we actually use the model to help us to find our criminal? This is where the mathematical magic happens in the form of Bayesian statistics (named after statistician and philosopher Thomas Bayes).
Bayes’ theorem tells us that if we multiply together our prior distribution and our PDF, and rescale the result so that the total probability is one, then we’ll end up with a new probability distribution for the anchor point z, which now takes into account the locations of the crime scenes! We call this the posterior distribution, and it tells us the most likely locations for the criminal’s anchor point given the locations of the crime scenes, and therefore the best places to begin our search.
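To make the prior-times-PDF step concrete, here is a minimal sketch on a coarse grid of candidate anchor points. Everything in it is an assumption made for illustration: the crime coordinates are invented, the prior is flat unless one is supplied, the crimes are treated as independent given z, and the two density shapes (a normal "hill" and a ring for the buffer zone) are simple stand-ins. It is not the algorithm used by any real profiling software.

```python
import math


def normal_likelihood(crime, anchor, sigma=1.0):
    """Resident-offender model: crimes are most likely at the anchor point."""
    r = math.dist(crime, anchor)
    return math.exp(-r * r / (2 * sigma * sigma))


def ring_likelihood(crime, anchor, ring_radius=2.0, sigma=1.0):
    """Buffer-zone model: crimes are most likely ring_radius away from the anchor."""
    r = math.dist(crime, anchor)
    return math.exp(-((r - ring_radius) ** 2) / (2 * sigma * sigma))


def posterior_grid(crimes, grid, likelihood, prior=None):
    """Return {candidate anchor point: posterior probability} over the grid."""
    weights = {}
    for z in grid:
        w = 1.0 if prior is None else prior(z)  # flat prior by default
        for crime in crimes:
            w *= likelihood(crime, z)  # assumes crimes are independent given z
        weights[z] = w
    total = sum(weights.values())
    # Rescale so that the probabilities over the grid sum to one.
    return {z: w / total for z, w in weights.items()}


crimes = [(4, 5), (5, 4), (6, 5), (5, 6)]  # invented crime sites
grid = [(x, y) for x in range(11) for y in range(11)]
post = posterior_grid(crimes, grid, normal_likelihood)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Passing `ring_likelihood` instead of `normal_likelihood` switches to the buffer-zone model. The densities are left unnormalised because their constant factors do not depend on z and cancel when the posterior is rescaled.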
This fascinating technique is actually used today by police detectives when trying to locate serial offenders. They implement the same steps described above using a sophisticated computer algorithm called Rigel, which is reported to be highly accurate at locating criminals.
So, what about Jack?
If we apply this geospatial profiling technique to the locations of the crimes attributed to Jack the Ripper, then we can predict that it is most likely that his base location was in a road called Flower and Deane Street. This is marked on the map below, along with the five crime locations used to work it out.
Unfortunately, we’re a little too late to know whether this prediction is accurate, because Flower and Deane Street no longer exists, so any evidence is certainly long gone! However, if the detectives in Victorian London had known about geospatial profiling and the mathematics behind catching criminals, then it’s possible that the most infamous serial killer in British history might never have become quite so famous…
Stripping back the most important equations in maths so that everyone can understand…
The Normal Distribution is one of the most important in the world of probability, modelling everything from height and weight to salaries and number of offspring. It is used by advertisers to better target their products and by pharmaceutical companies to test the success of new drugs. It seems to fit almost any set of data, which is what makes it SO incredibly important…
You can watch all of the Equations Stripped series here.