What’s your type? The Maths behind the ‘Qwerty’ Keyboard

The maths behind the most unshakeable technology of the 20th century

Martha Bozic

In 1867, a newspaper editor in Milwaukee contemplated a new kind of technology. He had previously patented a device which could be used to number the pages of books but, inspired by the suggestions of fellow inventors, he decided to develop it further. The idea itself wasn’t exactly new – it had been echoing around the scientific community for over 100 years. The challenge was to realise it in a way that was commercially viable, and Christopher Latham Sholes was ready.

His first design, in 1868, resembled something like a piano. Two rows of keys were arranged alphabetically in front of a large wooden box. It was not a success. Then, after almost 10 years of trial and error came something much more familiar. It had the foot-pedal of a sewing machine and most of the remaining mechanism was hidden by a bulky casing, but at the front were four rows of numbers, letters and punctuation… and a spacebar.

Surprisingly little is certain about why he chose to lay it out as he did, probably because to Sholes the layout was no more than a side-effect of the machine he was trying to create. But as the most influential component of the typewriter, the qwerty keyboard has attracted debates about its origin, its design and whether it is even fit for purpose. Without historical references, most arguments have centred on statistical evidence, jostling for the best compromise between the statistical properties of our language and the way that we type. More recently, questions have been posed about how generic ‘the way that we type’ actually is. Can it be generalised to design the perfect keyboard, or could it be unique enough to personally identify each and every one of us?

The first typewriter was designed for hunt-and-peck operation as opposed to touch typing.  In other words, the user was expected to search for each letter sequentially, rather than tapping out sentences using muscle-memory. Each of the 44 keys was connected to a long metal typebar which ended with an embossed character corresponding to the one on the key. The typebars themselves were liable to jam, leading to the commonly disputed myth that the qwerty arrangement was an effort to separate frequently used keys.

Throughout the 20th century new inventors claimed to have created better, more efficient keyboards, usually presenting a long list of reasons why their new design was superior. The most long-lasting of these was the Dvorak Simplified Keyboard, but other challengers arrived in a steady stream from 1909, four years after qwerty was established as the international standard.

Is it possible that there was a method behind the original arrangement of the keys? It really depends who you ask. The typebars themselves fell into a circular type-basket in a slightly different order to the one visible on the keyboard. Defining adjacency as two typebars which are immediately next to each other, the problem of separating them so that no two will jam is similar to sitting 44 guests around a circular dinner table randomly and hoping that no one is seated next to someone they actively dislike.

Picture 1
The qwerty keyboard and typebasket (Kay, 2013)

For any number, n, of guests, the number of possible arrangements is (n-1)!. That is, there are n places to seat the first guest, multiplied by (n-1) places left to seat the second guest, multiplied by (n-2) for the third guest and so on. Because the guests are seated round a circular table with n places, there are n ways of rotating each seating plan to give another arrangement that has already been counted. So, there are (n x (n-1) x (n-2) x…x 1)/n = (n-1) x (n-2) x…x 1 arrangements, which is written (n-1)!.

By pairing up two feuding guests and treating them as one, you can count the arrangements in which they sit next to each other by considering a dinner party with one fewer guest. From the calculation above, this smaller party has (n-2)! possible arrangements, but since the feuding pair could be seated together as XY or YX we have to multiply this by two. From this, the probability of the two feuding guests being sat together is 2(n-2)!/(n-1)! = 2/(n-1), and so the probability of them not being sat together is 1-(2/(n-1)) = (n-3)/(n-1).
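
As a sanity check (this sketch is an illustration, not part of the original argument), a few lines of Python can brute-force the 2/(n-1) figure for small tables:

```python
# A minimal sketch: brute-force the chance that two particular guests
# end up next to each other at a small circular table.
from itertools import permutations

def prob_pair_adjacent(n):
    """Fraction of circular arrangements of n guests in which
    guests 0 and 1 sit next to each other."""
    adjacent = total = 0
    # Fix guest 0 in seat 0 to remove rotational duplicates,
    # then permute the remaining n - 1 guests.
    for rest in permutations(range(1, n)):
        seating = (0,) + rest
        total += 1
        # Guest 0's neighbours are in seat 1 and the last seat.
        if seating[1] == 1 or seating[-1] == 1:
            adjacent += 1
    return adjacent / total

for n in range(4, 9):
    print(n, round(prob_pair_adjacent(n), 4), round(2 / (n - 1), 4))
```

The two printed columns agree, matching the 2/(n-1) result above.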

But what if one or more of the guests is so unlikable that they have multiple enemies at the table? Say ‘h’, who has been known before now to clash with both ‘e’ and ‘t’. Assuming the events are independent (one doesn’t influence the other), we just multiply the probabilities together to get the chance of ‘h’ being next to neither of them as [(n-3)/(n-1)]^2. And the probability that on the same table ‘e’ is also not next to her ex ‘r’ is [(n-3)/(n-1)]^2 x [(n-3)/(n-1)] = [(n-3)/(n-1)]^3. So, for any number of pairs of feuding guests, m, the probability of polite conversation between all is [(n-3)/(n-1)]^m.

Now, returning to the problem of the typebars, a frequency analysis of the English language suggests there are roughly 12 pairings which occur often enough to be problematic. For n=44 symbols, the dinner party formula gives a probability of [(44-3)/(44-1)]^12 = (41/43)^12 ≈ 0.56. That is, there is a better than 50% chance that the most frequently occurring letter pairs could have been separated by random allocation. An alternative theory suggests that Sholes may have looked for the most infrequently occurring pairs of letters, numbers and punctuation and arranged these to be adjacent on the typebasket. The statistical evidence for this is much more compelling, but rivals of qwerty had other issues with its design.
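
For the curious, here is a minimal sketch in Python that evaluates the formula and compares it with a simulation of random circular layouts. The 12 ‘problem’ pairs are arbitrary placeholders rather than real English bigram data, and since the formula assumes independence the two numbers will only roughly agree.

```python
# A minimal sketch: compare the closed-form estimate with a Monte Carlo
# simulation of random circular typebar layouts. The 12 pairs here are
# illustrative placeholders, not actual English letter-pair frequencies.
import random

n, m, trials = 44, 12, 50_000
pairs = [(2 * k, 2 * k + 1) for k in range(m)]   # 12 made-up 'problem' pairs

print("formula:  ", round(((n - 3) / (n - 1)) ** m, 3))   # about 0.56

def no_pair_adjacent(order):
    neighbours = {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}
    return all(frozenset(p) not in neighbours for p in pairs)

hits = sum(no_pair_adjacent(random.sample(range(n), n)) for _ in range(trials))
print("simulated:", round(hits / trials, 3))
```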

August Dvorak and his successors treated keyboard design as an optimisation problem. With the advantage of hindsight now that the typewriter had been established, they were able to focus on factors which they believed would benefit the learning and efficiency of touch typing. Qwerty was regularly criticised as defective and awkward for reasons that competing keyboards were claimed to overcome.

The objectives used by Dvorak, qwerty’s biggest antagonist and inventor of the eponymous Dvorak Simplified Keyboard (DSK), were that:

  • the right hand should be given more work than the left hand, at roughly 56%;
  • the amount of typing assigned to each finger should be proportional to its skill and strength;
  • 70% of typing should be carried out on the home row (the natural position of fingers on a keyboard);
  • letters often found together should be assigned positions such that alternate hands are required to strike them, and
  • finger motions from row to row should be reduced as much as possible.
Picture 2
The Dvorak Simplified Keyboard (Wikimedia commons)

To achieve these aims, Dvorak used frequency analysis data for one-, two-, three-, four- and five-letter sequences, and claimed that 35% of all English words could be typed exclusively from the home row. He also conducted multiple experiments on the ease of use of his new design over qwerty, although the specifics were sparsely published.
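
As a rough illustration of the home-row claim, the sketch below counts how many words in a dictionary file can be spelled using only the DSK home-row letters A O E U I D H T N S. The word list path is an assumption, and counting words in one particular dictionary file will not reproduce Dvorak’s 35% figure exactly; it is purely illustrative.

```python
# A minimal sketch: what fraction of words in an (assumed) dictionary file
# can be typed using only the Dvorak home row, A O E U I D H T N S?
HOME_ROW = set("aoeuidhtns")

with open("/usr/share/dict/words") as f:          # assumed word list location
    words = [w.strip().lower() for w in f if w.strip().isalpha()]

home_row_words = [w for w in words if set(w) <= HOME_ROW]
print(f"{len(home_row_words) / len(words):.1%} of listed words "
      "use only home-row letters")
```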

Of course, however good Dvorak’s new design may have been, there was a problem. Because qwerty was already established, finding subjects who were unfamiliar with both keyboards was difficult. Participants who tested the DSK had to ‘unlearn’ touch typing in order to relearn it for a different layout, while those using qwerty had the advantage of years of practice. The main metric used to determine the ‘better’ design was typing speed, but clearly this was not only a test of the keyboard; it was also a measure of the skill of the typist.

Alone, average typing speed would not be enough to distinguish between individuals – anything more than 40 words per minute (wpm) is considered above average, and since far more than 40 people are average or below-average typists, some of them must share the same wpm – but other information is available. Modern computer keyboards send feedback from each letter you type, leading to a wealth of data on the time between each consecutive key press. This can be broken down into the time between any particular letter pairing, building a profile of an individual’s specific typing patterns, and combined with typing speed it is surprisingly effective at identifying typists.
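
As a rough illustration (not a description of any particular identification system), here is a minimal sketch of how such a profile could be built from a log of key presses and timestamps:

```python
# A minimal sketch: average the delay between consecutive key presses
# for each letter pairing, given a log of (key, time in ms) events.
from collections import defaultdict

def digraph_profile(events):
    """events: list of (key, time_in_ms) tuples in the order typed."""
    delays = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        delays[(k1, k2)].append(t2 - t1)
    # Mean delay for every letter pairing seen in the log.
    return {pair: sum(ts) / len(ts) for pair, ts in delays.items()}

sample = [("t", 0), ("h", 95), ("e", 180), ("t", 400), ("h", 510), ("e", 600)]
print(digraph_profile(sample))   # e.g. ('t', 'h') averages (95 + 110) / 2
```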

In a battle of the keyboards, despite its suboptimal design and uncertain past, qwerty has remained undefeated. Today it is so ubiquitous that for most people to see a different layout would be jarring, yet our interactions with it are still identifiably unique. Nearly 150 years after its conception, the keyboard is embedded in our culture – it’s an old kind of technology, just not the one Sholes thought he was inventing.

What is P versus NP?

TRM intern and University of Oxford student Kai Laddiman speaks to St John’s College Computer Scientist Stefan Kiefer about the infamous million-dollar millennium problem: P versus NP. 

You can read more about P vs NP here.

Why do Bees Build Hexagons? Honeycomb Conjecture explained by Thomas Hales

Mathematician Thomas Hales explains the Honeycomb Conjecture in the context of bees. Hales proved that the hexagonal honeycomb is the most efficient way to divide a surface into cells of equal area while minimising the total perimeter.

Produced by Tom Rocks Maths intern Joe Double, with assistance from Tom Crawford. Thanks to the Oxford University Society East Kent Branch for funding the placement and to the Isaac Newton Institute for arranging the interview.

Would Alien (Non-Euclidean) Geometry Break Our Brains?

The author H. P. Lovecraft often described his fictional alien worlds as having ‘Non-Euclidean Geometry’, but what exactly is this? And would it really break our brains?


Produced by Tom Rocks Maths intern Joe Double, with assistance from Tom Crawford. Thanks to the Oxford University Society East Kent Branch for funding the placement.

Not so smooth criminals: how to use maths to catch a serial killer

The year is 1888, and the infamous serial killer Jack the Ripper is haunting the streets of Whitechapel. As a detective in Victorian London, your mission is to track down this notorious criminal – but you have a problem. The only information that you have to go on is the map below, which shows the locations of crimes attributed to Jack. Based on this information alone, where on earth should you start looking?

Picture1

The fact that Jack the Ripper was never caught suggests that the real Victorian detectives didn’t know the answer to this question any more than you do, and modern detectives are faced with the same problem when they are trying to track down serial offenders. Fortunately for us, there is a fascinating way in which we can apply maths to help us to catch these criminals – a technique known as geospatial profiling.

Geospatial profiling is the use of statistics to find patterns in the geographical locations of certain events. If we know the locations of the crimes committed by a serial offender, we can use geospatial profiling to work out their likely base location, or anchor point. This may be their home, place of work, or any other location of importance to them – meaning it’s a good place to start looking for clues!

Perhaps the simplest approach is to find the centre of minimum distance to the crime locations. That is, find the place which gives the overall shortest distance for the criminal to travel to commit their crimes. However, there are a couple of problems with this approach. Firstly, it doesn’t tend to consider criminal psychology and other important factors. For example, it might not be very sensible to assume that a criminal will commit crimes as close to home as they can! In fact, it is often the case that an offender will only commit crimes outside of a buffer zone around their base location. Secondly, this technique will provide us with a single point location, which is highly unlikely to exactly match the true anchor point. We would prefer to end up with a distribution of possible locations which we can use to identify the areas that have the highest probability of containing the anchor point, and are therefore the best places to search.
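
As an aside, the centre of minimum distance can be found numerically; the sketch below uses Weiszfeld’s algorithm, a standard method not mentioned in the article, with made-up crime coordinates:

```python
# A minimal sketch: find the point minimising the total distance to a set
# of crime locations (Weiszfeld's algorithm). Coordinates are made up.
import math

def centre_of_minimum_distance(points, iterations=200):
    # Start at the centroid, then repeatedly take a distance-weighted average.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        weights = [1 / max(math.dist((x, y), p), 1e-9) for p in points]
        total = sum(weights)
        x = sum(w * p[0] for w, p in zip(weights, points)) / total
        y = sum(w * p[1] for w, p in zip(weights, points)) / total
    return x, y

crimes = [(0.0, 0.0), (1.0, 0.2), (0.4, 1.1), (1.3, 0.9), (0.8, 1.6)]
print(centre_of_minimum_distance(crimes))
```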

With this in mind, let’s call the anchor point of the criminal z. Our aim is then to find a probability distribution for z, which takes into account the locations of the crime scenes, so that we can work out where our criminal is most likely to be. In order to do this, we will need two things.

  1. A prior distribution for z. This is just a function which defines our best guess at what z might be, before we have used any of our information about the crime locations. The prior distribution is usually based on data from previous offenders whose location was successfully determined, but it’s usually not hugely important if we’re a bit wrong – this just gives us a place to start.
  2. A probability density function (PDF) for the locations of the crime sites. This is a function which describes how the criminal chooses the crime site, and therefore how the criminal is influenced by z. If we have a number of crimes committed at known locations, then the PDF describes the probability that a criminal with anchor point z commits crimes at these locations. Working out what we should choose for this is a little trickier…

We’ll see why we need these in a minute, but first, how do we choose our PDF? The answer is that it depends on the type of criminal, because different criminals behave in different ways. There are two main categories of offenders – resident offenders and non-resident offenders.

Resident offenders are those who commit crimes near to their anchor point, so their criminal region (the zone in which they commit crimes) and anchor region (a zone around their anchor point where they are often likely to be) largely overlap, as shown in the diagram:

Picture2

If we think that we may have this type of criminal, then we can use the famous normal distribution for our density function. Because we’re working in two dimensions, it looks like a little hill, with the peak at the anchor point:

Picture3

Alternatively, if we think the criminal has a buffer zone, meaning that they only commit crimes at least a certain distance from home, then we can adjust our distribution slightly to reflect this. In this case, we use something that looks like a hollowed-out hill, where the most likely region is in a ring around the centre as shown below:

Picture4

The second type of offenders are non-resident offenders. They commit crimes relatively far from their anchor point, so that their criminal region and anchor region do not overlap, as shown in the diagram:

Picture5

If we think that we have this type of criminal, then for our PDF we can pick something that looks a little like the normal distribution used above, but shifted away from the centre:

Picture6
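
To make these shapes concrete, here is a minimal sketch of what the three densities might look like as functions of the distance r between a crime site and the anchor point. The formulas and parameter values are illustrative choices (up to normalising constants), not the ones used in real profiling software:

```python
# A minimal sketch of the three density shapes, as functions of the
# distance r from the anchor point. Parameter values are illustrative.
import math

def resident_pdf(r, sigma=1.0):
    """'Hill' centred on the anchor point: a normal-style density."""
    return math.exp(-r**2 / (2 * sigma**2))

def buffer_zone_pdf(r, sigma=1.0):
    """'Hollowed-out hill': low near the anchor, peaking in a ring at r = sigma."""
    return r * math.exp(-r**2 / (2 * sigma**2))

def non_resident_pdf(r, mu=5.0, sigma=1.0):
    """Peak shifted away from the anchor point by a typical distance mu."""
    return math.exp(-(r - mu)**2 / (2 * sigma**2))
```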

Now, the million-dollar question is: which model should we pick? Distinguishing between resident and non-resident offenders in advance is often difficult. Some information can be deduced from the geography of the region, but often assumptions are made based on the crime itself – for example, more complex or clever crimes have a higher likelihood of being committed by non-residents.

Once we’ve decided on our type of offender, selected the prior distribution (1) and the PDF (2), how do we actually use the model to help us to find our criminal? This is where the mathematical magic happens in the form of Bayesian statistics (named after statistician and philosopher Thomas Bayes).

Bayes’ theorem tells us that if we multiply together our prior distribution and our PDF, then we’ll end up with a new probability distribution for the anchor point z, which now takes into account the locations of the crime scenes! We call this the posterior distribution, and it tells us the most likely locations for the criminal’s anchor point given the locations of the crime scenes, and therefore the best places to begin our search.
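
Putting the pieces together, here is a minimal sketch of that Bayesian step on a grid of candidate anchor points, using a flat prior and a resident-offender style normal density. All coordinates and parameters are made up for illustration:

```python
# A minimal sketch: posterior for the anchor point z on a grid, using a
# flat prior and a normal-style likelihood. All numbers are illustrative.
import math

crimes = [(2.0, 3.0), (3.5, 2.5), (2.5, 4.0), (3.0, 3.5)]
sigma = 1.0

def likelihood(z):
    """Density of the observed crime sites for a candidate anchor point z."""
    l = 1.0
    for s in crimes:
        r = math.dist(z, s)
        l *= math.exp(-r**2 / (2 * sigma**2)) / (2 * math.pi * sigma**2)
    return l

grid = [(x / 10, y / 10) for x in range(61) for y in range(61)]
prior = 1 / len(grid)                                  # flat prior
posterior = {z: prior * likelihood(z) for z in grid}
total = sum(posterior.values())
posterior = {z: p / total for z, p in posterior.items()}

# The grid point with the highest posterior probability is the best place
# to start searching (here it sits near the centre of the crime sites).
print(max(posterior, key=posterior.get))
```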

This fascinating technique is actually used today by police detectives when trying to locate serial offenders. They implement the same steps described above using an extremely sophisticated computer algorithm called Rigel, which has a very high success rate in locating offenders.

So, what about Jack?

If we apply this geospatial profiling technique to the locations of the crimes attributed to Jack the Ripper, then we can predict that it is most likely that his base location was in a road called Flower and Deane Street. This is marked on the map below, along with the five crime locations used to work it out.

Picture7

Unfortunately, we’re a little too late to know whether this prediction is accurate, because Flower and Deane street no longer exists, so any evidence is certainly long gone! However, if the detectives in Victorian London had known about geospatial profiling and the mathematics behind catching criminals, then it’s possible that the most infamous serial killer in British history might never have become quite so famous…

Francesca Lovell-Read

Take me to your chalkboard

Is alien maths different from ours? And if it is, will they be able to understand the messages that we are sending into space? My summer intern Joe Double speaks to philosopher Professor Adrian Moore from BBC Radio 4’s ‘A History of the Infinite’ to find out…

Complex Numbers – they don’t have to be complex!

The idea of complex numbers stems from a question that bugged mathematicians for thousands of years: what is the square root of -1? That is, which number do you multiply by itself to get -1?

Such a simple question has blossomed into a vast mathematical theory, for the simple reason that the answer isn’t real! It can’t be 1, as 1 * 1 = 1; it can’t be -1, as -1 * -1 = 1; whichever real number you multiply by itself, you can never get a negative number. Up until the 16th century, almost everyone ignored this issue; perhaps they were afraid of the implications it could bring. But then, gradually, people began to realise that there was a whole new world of mathematics waiting to be discovered if they faced up to the question.

In order to explain this apparent gap in maths, the idea of an ‘imaginary’ number was introduced. The prolific Swiss mathematician Leonhard Euler first used the letter i to represent the square root of -1, and as with most of his ideas, it stuck. Now i isn’t something that you’ll see in everyday life in relation to physical quantities, such as money. If you’re lucky enough to have money in your bank account, then you’ll see a positive number on your bank statement. If, as is the case for most students, you currently owe money to the bank (for example, if you have an overdraft), then your statement will display a negative number. However, because i is an ‘imaginary’ unit, it is neither ‘positive’ nor ‘negative’ in this sense, and so it won’t crop up in these situations.

Helpfully, you can add, subtract, multiply and divide using i in the same way as with any other numbers. By doing so, we expand the idea of imaginary numbers to the idea of complex numbers.

Take two real numbers a and b – these are the type that we’re used to dealing with.

They could be positive, negative, whole numbers, fractions, whatever.

A complex number is then formed by taking the number a + b * i. Let’s call this number z.

We say that a is the real part of z, and b is the imaginary part of z.

Any number that you can make in this way is a complex number.

For example, let a = -3 and b = 2; then -3 + 2*i, which we write as -3 + 2i, is a complex number.

As we saw before, complex numbers don’t actually pop up in ‘real-life’ situations. So why do we care about them? The reason is that complex numbers have some very neat properties that allow them to be used in all sorts of mathematical contexts. So even though you may not see the number i in everyday life, it’s very likely that there are complex numbers involved behind the scenes wherever you look. Let’s have a quick glance at some of these properties.

The key observation is that the square of i is -1, that is, i * i = -1.

We can use this fact to multiply complex numbers together.

Let’s look at a concrete example: multiply 2 + 2i by 4 – 3i.

We use the grid method for multiplying out brackets:

        |  4            |  -3i
  2     |  2 * 4 = 8    |  2 * -3i = -6i
  +2i   |  2i * 4 = 8i  |  2i * -3i = -6 * i * i = -6 * -1 = 6

Adding the results together, we get (2 + 2i)(4 – 3i) = 8 + 6 – 6i + 8i = 14 + 2i.

Therefore, multiplying two complex numbers has given us another complex number!
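
If you want to check this (and the earlier example with a = -3 and b = 2), Python has complex numbers built in, writing i as j:

```python
# A quick check of the examples above using Python's built-in complex numbers.
z = -3 + 2j
print(z.real, z.imag)        # -3.0 2.0  (real and imaginary parts)

print((2 + 2j) * (4 - 3j))   # (14+2j), matching the grid-method result
```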

This is true in general, and it turns out to be very handy. In fact, Carl Friedrich Gauss proved a very famous result – known as the Fundamental Theorem of Algebra because it’s so important – which effectively tells us that the solutions to all polynomial equations can be written as complex numbers. This is extremely useful because we know that we don’t have to go any ‘deeper’ into numbers; once you’ve got your head around complex numbers, you can proudly declare that you’ve mastered them all!

Because of this fundamental theorem, our little friend i pops up all over the place in physics, engineering, computer science, and of course, in all sorts of areas of maths. While it may only be imaginary, its applications can be very real, from air traffic control, to animating characters in films. It plays a really important role in much of theoretical mathematics, which in turn is used in almost every scientific discipline. And to think, all of this stemmed from an innocent-looking question about -1; what were they so scared of?!

Kai Laddiman
