On the surface, keyboard shortcuts seem like a rather small topic: ctrl + s to save and alt + f4 to close, what else is there to know? But as with most things, there’s more than meets the eye.
First of all, let’s state what a keyboard shortcut is and what it does:
A keyboard shortcut (or accelerator key, shortcut key, hot key, key binding, keybinding, key combo, etc.) is a key or set of keys that performs a predefined function. These functions can often be done via some other, more indirect mechanism, such as using a menu, typing a longer command, and/or using a pointing device. By reducing such sequences to a few keystrokes, this can often save the user time, hence “shortcut”.
For a system to be successful it needs to be sympathetic to its user. We therefore need to look at limitations of the user and how to accommodate them.
Anyone who paid attention in biology class will have heard the term opposable thumbs. The “thumbs” of other animals evolved into wings, hooves or flippers, but ours have shifted around a bit to sit opposite our fingers. This change in position allows us to grab things; it has also resulted in our thumbs becoming the strongest and one of the more dextrous of our digits. Our fingers decrease in strength as they move away from the thumb. It therefore follows that any good design should utilise this fact by making good use of our superior digits: the thumb, index and middle fingers.
The brain (cognetics)
Designing an object so it is sympathetic to our bodies is only half of the story. A well designed ‘thing’ must be sympathetic to the constraints of our minds too.
Modes are a major design consideration for good shortcuts.
In user interface design, a mode is a distinct setting within a computer program or any physical machine interface, in which the same user input will produce perceived results different from those it would produce in other settings.
- The caps lock key is modal (i.e. it creates a mode). When the ‘k’ key is pressed a ‘k’ is displayed, but when caps lock is engaged pressing ‘k’ will display ‘K’.
- In TextEdit pressing cmd + t displays the font palette, but in Safari pressing cmd + t opens a new tab. The same key presses have different results. (Conflicts such as this are common because the current computing paradigm of independent applications is inherently modal.)
Modes confuse the user and should be avoided. There is an alternative to modes: quasi-modes. Quasi-modes are like modes but with one important difference: a quasi-mode requires a constant action from the user to remain active. The standard shift key is quasi-modal; if the key is not held down it has no effect. Quasi-modes rely on the fact that performing the action required to stay in the quasi-mode (e.g. holding the shift key) keeps the user aware that they are in a different mode. Quasi-modes therefore reduce the confusion that modes cause.
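The difference between a mode and a quasi-mode can be sketched in a few lines of Python. This is a toy model rather than how any real keyboard driver works: the Keyboard class and its method names are invented for illustration, and it assumes that caps lock and shift simply both produce upper case.

```python
class Keyboard:
    """Toy model contrasting a mode (caps lock) with a quasi-mode (shift)."""

    def __init__(self):
        self.caps_lock = False   # mode: persists after the key is released
        self.shift_held = False  # quasi-mode: active only while the key is held

    def key_down(self, key):
        """Return the character produced by the key press, or None."""
        if key == "caps_lock":
            self.caps_lock = not self.caps_lock  # toggles and stays toggled
            return None
        if key == "shift":
            self.shift_held = True
            return None
        # Simplifying assumption: either modifier yields upper case.
        upper = self.caps_lock or self.shift_held
        return key.upper() if upper else key.lower()

    def key_up(self, key):
        if key == "shift":
            self.shift_held = False  # releasing the key ends the quasi-mode
        # Releasing caps lock changes nothing: the mode lingers.


# Mode: caps lock is tapped and released, yet still affects the next key.
kb = Keyboard()
kb.key_down("caps_lock")
kb.key_up("caps_lock")
modal_result = kb.key_down("k")

# Quasi-mode: shift is tapped and released, so the next key is unaffected.
kb = Keyboard()
kb.key_down("shift")
kb.key_up("shift")
quasi_modal_result = kb.key_down("k")
```

The only state that survives a key release is caps_lock, and that lingering state is exactly what the user has to remember.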
The second consideration is human memory. Our short term memory is quite limited: we can store around 7 items of data in it. This has an impact on the way we use shortcuts. When a user is performing a task they will be concentrating on their data rather than on the tools for manipulating it.
There are a few ways to reduce the burden of remembering how to use a system. The first is to remove the need to remember at all; this is done by clear labelling. The second is to create meaningful relationships between the desired result and the required action. Meaningful relationships allow users to understand the system as a whole rather than having to learn a collection of unrelated and arbitrary actions.
Current systems shortcuts
Let’s see how Windows XP and OS X Leopard fare against the above criteria.
The physical designs of Mac and standard PC keyboards are almost identical. The most noticeable differences are the modifier keys: a standard PC keyboard has ctrl, alt and Windows keys, while a Mac keyboard has ctrl, alt and cmd keys.
Left side of a mac keyboard
Left side of a Windows keyboard
In Windows the most common modifier key used with shortcuts is the ctrl key. The ctrl keys are located on the bottom row at the far left and far right of a standard keyboard.
It is the little finger (pinkie finger) that people most often use to press the ctrl key. The little finger is a feeble thing and tires quickly. Also, the degree of stretch required to move the hand from the standard typing position is quite pronounced, which makes the user prone to RSI (see How To Avoid The Emacs Pinky Problem).
OS X fares better. The modifier key used is always the cmd key (additional modifier keys may also be used). The cmd keys are located directly to the left and right of the space bar, which allows them to be pressed with the thumb: ideal.
Windows has many mode based issues. To issue a shortcut we have to press either ctrl, alt or the ‘Windows key’, usually followed by another key. The ctrl key is the most common modifier key used in shortcuts, and fortunately it does not suffer from modal issues. Unfortunately the same is not true of the alt or Windows keys, both of which are modal. Worse still, the behaviour of the alt and Windows keys is inconsistent.
The alt key is generally used to move focus to the menu bar and for window management (e.g. alt + f4 to close, alt + tab to switch to another window), but occasionally the alt key is used in ‘normal’ shortcuts. The key press cycle of the alt key in a normal application (e.g. Notepad, Windows Explorer, Internet Explorer) is as follows:
- The alt key is pressed in. This moves the focus to the menu bar, illustrated by the underlining of the letters required to access the menus.
- At this point there are 4 possible sequences of events:
- Alt key is released, resulting in the focus moving to the first menu (normally the File menu).
- A key that is underlined is pressed, which results in the associated menu being displayed and the focus moving to the first item of that menu.
- Another valid key is pressed (eg, F4 or alt)
- A key that is not underlined is pressed, resulting in the focus remaining in its current location (e.g. the text area in Notepad).
The biggest culprit of modal operation in these possibilities is the first – releasing of the alt key without a second key press. This puts the system into an unnecessary and potentially confusing mode. The simplest way to remedy this is to not move the focus until a second key is pressed – i.e. make it quasi-modal.
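The four possibilities, and the suggested remedy, can be written down as a small decision function. This is only a sketch of the behaviour described above, not Windows' actual implementation; the function name, the outcome strings and the quasi_modal flag are invented for illustration, and the ‘Edit’ menu is an assumed example.

```python
def alt_key_outcome(second_key, menu_keys, other_shortcuts, quasi_modal=False):
    """Describe what happens after alt is pressed in.

    second_key is None when alt is released without any other key press.
    """
    if second_key is None:
        if quasi_modal:
            # Proposed remedy: nothing happens until a second key is pressed.
            return "focus unchanged"
        # Behaviour described above: the system is left in a lingering mode.
        return "focus moves to first menu"
    if second_key in menu_keys:
        # An underlined key opens the associated menu.
        return f"{menu_keys[second_key]} menu displayed"
    if second_key in other_shortcuts:
        # Another valid key, e.g. F4.
        return other_shortcuts[second_key]
    # A key that is not underlined and not otherwise valid.
    return "focus unchanged"


# Hypothetical application: two underlined menu keys plus window management keys.
MENU_KEYS = {"f": "File", "e": "Edit"}
OTHER_SHORTCUTS = {"f4": "close window", "tab": "switch window"}

outcomes = [
    alt_key_outcome(key, MENU_KEYS, OTHER_SHORTCUTS)
    for key in (None, "f", "f4", "q")
]
remedied = alt_key_outcome(None, MENU_KEYS, OTHER_SHORTCUTS, quasi_modal=True)
```

With quasi_modal set, the only branch that left the system in a different state after every key was released disappears.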
The key press cycle of the Windows key is as follows:
- Windows Key is pressed in (there is no on screen indication that this has occurred).
- At this point there are 3 possible sequences of events. Note that the second and third can occur numerous times without releasing the Windows key:
- The key is released resulting in the Start menu being displayed and the focus moving to it.
- A valid key is pressed resulting in the related action being executed (the only way to discover valid keys is to read the documentation). When the Windows key is finally released the Start menu is not displayed.
- An invalid key is pressed. The key press is ignored by the Windows key and is handled by the application with focus. When the Windows key is finally released the Start menu is not displayed.
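The cycle above amounts to a small state machine, sketched here in Python. It models only the behaviour described in this article, not the real Windows internals; the class, its method names and the log strings are invented, and the two valid keys in the example are illustrative.

```python
class WindowsKey:
    """Toy state machine for the Windows key press cycle described above."""

    def __init__(self, valid_keys):
        self.valid_keys = valid_keys    # maps a key to the action it triggers
        self.held = False
        self.other_key_pressed = False
        self.log = []                   # records everything that happens

    def press(self):
        # No on screen indication: only internal state changes.
        self.held = True
        self.other_key_pressed = False

    def key(self, k):
        if self.held:
            self.other_key_pressed = True
            if k in self.valid_keys:
                self.log.append(self.valid_keys[k])
                return
        # Invalid key, or Windows key not held: the application handles it.
        self.log.append(f"application handles {k}")

    def release(self):
        # The Start menu appears only if no other key was pressed meanwhile.
        if self.held and not self.other_key_pressed:
            self.log.append("show Start menu")
        self.held = False


# Illustrative valid keys.
VALID_KEYS = {"e": "open Explorer", "d": "show desktop"}

alone = WindowsKey(VALID_KEYS)
alone.press()
alone.release()

combined = WindowsKey(VALID_KEYS)
combined.press()
combined.key("e")   # valid key
combined.key("x")   # invalid key, passed to the application
combined.release()  # Start menu is not shown
```

Whether release() does anything depends on hidden state (other_key_pressed) that the user never sees.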
The behaviour of the Windows key is confusing. When it is pressed with no subsequent key presses it behaves modally, but when key presses do follow it behaves quasi-modally (depending on the command). The Windows key strikes me as a wasted opportunity. For such a prominent key it provides very little functionality, and the functionality it does provide is almost impossible to discover.
The on screen labelling of shortcuts in OS X and Windows is largely similar. Both try to use mnemonics to imply a system. For example cmd + s is save, cmd + l is load and cmd + p is print. There are limitations to this approach.
Firstly, conflicts soon arise: for example, should cmd + s be save or search? Windows applications tend to address this problem by using a different letter, thus breaking the mnemonic system. OS X sometimes uses a different letter and sometimes adds an additional modifier key, but which additional modifier key is unpredictable. Both of these approaches are somewhat arbitrary, as they are not part of a system which the user can learn and use to predict the shortcut. (OS X does have a convention of using the shift key to perform related actions. For example, cmd + s is save and cmd + shift + s is save as; cmd + z is undo and cmd + shift + z is redo.)
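How quickly a first-letter mnemonic scheme runs out of keys can be shown with a short sketch. The function below is hypothetical and does not reflect how either operating system assigns shortcuts; it simply gives each command the first letter of its name and records the commands that collide.

```python
def assign_mnemonic_shortcuts(commands, modifier="cmd"):
    """Assign modifier + first letter to each command, in order.

    Returns the assigned shortcuts and a list of (command, holder) pairs
    for commands whose mnemonic letter was already taken.
    """
    assigned = {}
    conflicts = []
    for command in commands:
        shortcut = f"{modifier} + {command[0].lower()}"
        if shortcut in assigned:
            conflicts.append((command, assigned[shortcut]))
        else:
            assigned[shortcut] = command
    return assigned, conflicts


assigned, conflicts = assign_mnemonic_shortcuts(["save", "search", "print"])
# assigned  -> {"cmd + s": "save", "cmd + p": "print"}
# conflicts -> [("search", "save")]
```

Every entry in conflicts needs a different letter or an extra modifier, and that is the point at which the mnemonic stops being predictable.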
The second problem is internationalisation. The mnemonic system is fine when the system is in English but becomes arbitrary when the same shortcuts are used in conjunction with other languages.
In addition to the on screen labelling it is also worth noting the keyboard labelling. In OS X the labelling of the modifier keys is largely consistent with their behaviour: the cmd key is used when issuing commands, the alt/option key will often give an alternative option, and the ctrl key will show controls. In Windows this is not the case; modifier keys are assigned without regard to their label.
Improvements & alternate implementation
What can be done to improve keyboard shortcuts? The most effective fixes need to take place at the operating system level, which is unfortunate as it means little can be done by individual developers. It is possible for an application to alter the standard operation, but this leads to inconsistencies between applications, which causes more problems than it solves.
My suggestions are simply to implement what I have discussed above.
Shortcuts should use thumb based modifier keys
Jef Raskin points out in The Humane Interface that the current keyboard design makes poor use of our thumbs. Raskin was involved with the Canon Cat which had two ‘leap’ keys beneath the space bar. The ‘leap’ keys allowed the user to ‘leap’ forward and backwards in a document. While I question the usefulness of leap keys in relation to modern GUIs it is certainly true that our thumbs are still ‘twiddling’ and should be put to better use.
Limited use of modes – shortcuts should utilize quasi-modes
Modes are an inherent feature of the desktop/windowing metaphor. However, we can certainly reduce their negative impact through carefully considered design. Keep to any standard shortcuts that are in use by the operating system. (The alternative is to pursue other computing metaphors such as ZUIs (as outlined in The Humane Interface), or life streams.)
Clearer labelling is harder to achieve, however there are a few systems for doing this. Digidesign produce custom keyboards for their Pro Tools system. Having used one of these keyboards I can testify to their usefulness. The problem with having the details printed on the keys is that they are only applicable to one application.
A more generic solution to keyboard labelling is the Optimus Maximus keyboard. Each key of the Optimus Maximus has an embedded OLED screen. These screens are used to show different glyphs in different circumstances. For example when using Photoshop the keyboard displays the icons of the on screen tools. Unfortunately the Optimus Maximus is considerably more expensive than a standard keyboard (it has also received criticism for its typing experience). However the Optimus Maximus is the first of its kind; I expect more affordable and more tightly integrated solutions will soon emerge.