Keyboard designs have a long history, and each design has undergone several iterations. The most standard one, the QWERTY layout, has persisted since typewriters came into vogue; other layouts were invented after QWERTY, but they never became as popular. The primary reason is that, with keyboard layouts, function has followed form rather than the other way round: people have adapted themselves to this layout, even though there is evidence that QWERTY is not the fastest layout to type on. The Dvorak layout, for example, is often claimed to be faster because it is designed to minimize hand movement. A keyboard layout design could be based on many factors, and a few of them are:
In the list above, letter arrangement based on language is the most important factor, because the primary purpose of a keyboard is to communicate in the language it was designed for. Some articles suggest that Christopher Latham Sholes, the creator of the QWERTY layout, was inspired by the study of the bigram language model.
So what is a bigram? A bigram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. Applied to a keyboard, this means the letter arrangement or grouping would be based on the bigram (or, more generally, n-gram) model. Bigrams and n-grams are also used extensively in text mining, natural language processing, language analytics, spelling correction, word breaking, text summarization and much more.
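To make the definition concrete, here is a minimal Python sketch that pairs each token with the one after it. The function names `bigrams` and `ngrams` are illustrative and not taken from any particular library; the same code works whether the tokens are letters (a string) or words (a list).

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens, in order of appearance."""
    # Zip together n shifted copies of the sequence; zip stops at the
    # shortest copy, so a sequence of length L yields L - n + 1 n-grams.
    return list(zip(*(tokens[i:] for i in range(n))))


def bigrams(tokens):
    """A bigram is simply an n-gram with n = 2."""
    return ngrams(tokens, 2)


# Letters as tokens: a string is already a sequence of characters.
print(bigrams("quick"))
# Words as tokens.
print(bigrams("the quick brown fox".split()))
```

The first call prints `[('q', 'u'), ('u', 'i'), ('i', 'c'), ('c', 'k')]` and the second prints `[('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]`.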
A bigram model is a conditional probability method for word (or letter) prediction: it predicts the next word or letter from the one that precedes it. More generally, an n-gram model predicts the next item after observing the previous N-1 items, so a bigram model looks back at just one. So how do we calculate this probability?
It all starts with sampling. Any probability calculation requires a sample data set to begin with. There are many word corpora (collections of real-world language usage) that could serve as such a sample, and if Sholes did use a bigram method, he might also have used some kind of word corpus in the early 1870s, when the first keyboard was designed. Many frequency studies of this kind have been published; for example, a study conducted at Queen’s University, based on several large-scale English corpora (approximately 183 million words in total), lists the bigram frequencies of various letter combinations. If we take these frequency values and map them onto our keyboard design, this is how it would look (I have mapped only the first row). In the illustration below, 0.67 is the frequency value of the combination WQ, 1.95 is the value for QW, 11.8 for WE, 11.5 for EW, and so on.
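The probability is usually estimated by counting: the chance of letter b following letter a is the number of times the pair ab occurs divided by the number of times a is followed by any letter (the maximum-likelihood estimate). Below is a minimal Python sketch; the short sample sentence merely stands in for a real corpus, and counting only within words (ignoring pairs that span a space) is an assumption of this sketch.

```python
from collections import Counter


def letter_bigram_model(text):
    """Estimate P(next letter | previous letter) from a sample text."""
    pair_counts = Counter()  # how often each (prev, next) pair occurs
    prev_counts = Counter()  # how often each letter is followed by another letter
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        for prev, nxt in zip(letters, letters[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1

    def probability(prev, nxt):
        # A letter never seen as the first half of a pair gets probability 0.
        if prev_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, nxt)] / prev_counts[prev]

    return probability


p = letter_bigram_model("the queen quietly quit")
print(p("q", "u"))  # every 'q' in the sample is followed by 'u'
print(p("u", "i"))  # 'u' is followed by 'e' once and by 'i' twice
```

Real models usually also smooth these counts so that unseen pairs do not get a probability of exactly zero; that refinement is left out here.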
This shows how complex decoding the design can get, even though here we are analyzing only one of the factors, namely the letter arrangement.
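The first-row mapping can be sketched in a few lines of Python: list both orderings of every pair of neighboring keys in the top row and look each one up in a bigram frequency table. Only the four values quoted in this article are filled in; the table name `ARTICLE_FREQS` is hypothetical, and the remaining pairs would have to come from the full frequency study.

```python
TOP_ROW = "qwertyuiop"

# Only the four values quoted in the article; the other first-row pairs
# would be filled in from the same bigram frequency study.
ARTICLE_FREQS = {"wq": 0.67, "qw": 1.95, "we": 11.8, "ew": 11.5}


def adjacent_pairs(row):
    """Both orderings of every pair of neighboring keys in a keyboard row."""
    pairs = []
    for left, right in zip(row, row[1:]):
        pairs.append(left + right)
        pairs.append(right + left)
    return pairs


for pair in adjacent_pairs(TOP_ROW):
    freq = ARTICLE_FREQS.get(pair)
    print(pair.upper(), freq if freq is not None else "not listed")
```

A row of 10 keys has 9 neighboring pairs, so the loop prints 18 combinations, starting with QW, WQ, WE and EW.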
Smartphones and tablets have also widely adopted the QWERTY layout. Though mobile devices revolutionized keyboard design in terms of options, interactions and so on, the basic layout has been left unchanged. The language, on the other hand, has evolved from standard written English into SMS language, or text-messaging shorthand. If we follow the bigram approach to design a layout for mobile devices, the frequency values should be based on a sample from a text-messaging corpus. Using the same conventional QWERTY layout for mobile devices may therefore not be appropriate from a user experience point of view. Let’s examine this with the letters ‘u’ and ‘i’, which are next to each other in the first row of our keyboard layout.
According to a thesis on the corpus linguistics of SMS text messaging, the most frequently used word in text messages is “YOU”, which is typed as “u” (see the data below). Coming back to our example: because ‘i’ and ‘u’ sit next to each other, the probability of accidentally swapping them while typing is also high, and in many instances the swap would produce a meaning quite different from, or even opposite to, what the user intends.
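The ‘u’/‘i’ problem comes down to key adjacency, which is easy to check mechanically. The sketch below models the standard QWERTY letter rows and treats two keys as neighbors only when they sit side by side in the same row; ignoring the diagonal neighbors created by the staggered rows is a simplifying assumption, and the function names are illustrative.

```python
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def key_position(letter):
    """Return (row, column) of a letter key on the QWERTY layout."""
    if len(letter) != 1:
        raise ValueError("expected a single letter")
    for row_index, row in enumerate(QWERTY_ROWS):
        col = row.find(letter.lower())
        if col != -1:
            return row_index, col
    raise ValueError(f"{letter!r} is not a letter key")


def row_neighbors(letter):
    """Keys immediately to the left and right of `letter` in its row."""
    row, col = key_position(letter)
    keys = QWERTY_ROWS[row]
    return [keys[c] for c in (col - 1, col + 1) if 0 <= c < len(keys)]


def side_by_side(a, b):
    """True if the two keys sit next to each other in the same row."""
    return b.lower() in row_neighbors(a)


print(row_neighbors("u"))      # ['y', 'i']
print(side_by_side("u", "i"))  # True
```

Under this simple model, a single slip one key to the right turns “u” into “i”, which is exactly the swap described above.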
There may be similar use cases or examples showing that the QWERTY layout is not well suited to our mobile devices. Of the other keyboard layouts out there, the KALQ layout, which is designed especially for thumb-based typing, is the only one I have seen in which the problem described above would not exist. Perhaps this design is based on a text-messaging corpus along with the other factors? However, this layout also seems to be restricted to two-thumb use. I haven’t used it so far, but I am eager to try it out and will post my review in the next article.
I would love to hear your thoughts, ideas and input on this discussion. If you are interested in diving deeper into the studies I mentioned, I have provided the references below.