Apple's Hardware and GUI Design Symmetry
02/09/10 16:44
Yesterday Apple released iTunes 10 along with updates to the iPod line. The iTunes interface has had a few tweaks: 'album list', grey icons, repositioning of the 'traffic lights' to match the mini player and a restyled volume slider. The volume slider mimics the appearance of the shuffle switch of the iPod Shuffle:


The symmetry between Apple GUIs and hardware has existed for a while. In 1998 Apple released the iMac. The iMac was an all-in-one PC with a trend-setting corrugated plastic case. OS X, released in 2001, included a redesigned interface called Aqua. Aqua mimicked the corrugated plastic of the iMac:


In 2003 the Power Mac G5 and PowerBook G4 were released. These Macs had a brushed aluminium case. OS X Panther was also released in 2003 and included a new brushed aluminium style for Finder:


By the release of Tiger in 2005 Apple's hardware line-up included the snow white iMac G5 and iBook. Tiger replaced the corrugated GUI appearance with a smooth polished appearance that matched the snow white hardware:



By the release of Leopard in 2007 the iMac, Mac Mini, MacBook Pro and Apple TV all had a brushed aluminium case, and once again the interface had evolved:



Another interface tweak in iTunes 10 is the 'flat' buttons:
These remind me of a patent for etched buttons. Could these flat buttons be a hint to the design of new Apple hardware?
HTML parsing/screen scraping in iOS
25/08/10 11:24
I asked a question on Stack Overflow about how to do screen scraping in iOS. The outcome was that running some JavaScript with UIWebView's stringByEvaluatingJavaScriptFromString: to extract and serialise the data is the best approach. However, there are a few gotchas.
Firstly, it's worth noting what stringByEvaluatingJavaScriptFromString actually 'returns'. It's a little strange but some examples make it clear. The comment at the end of the lines is the output of NSLog:
NSLog(@"%@", [webView stringByEvaluatingJavaScriptFromString:@"'hello';"]); // hello
NSLog(@"%@", [webView stringByEvaluatingJavaScriptFromString:@"'hello';'goodbye';"]); // goodbye
NSLog(@"%@", [webView stringByEvaluatingJavaScriptFromString:@"return 'hello';"]); //
NSLog(@"%@", [webView stringByEvaluatingJavaScriptFromString:@"var greeting = function(){return 'hello';}; greeting();"]); //hello
However, we can use stringByEvaluatingJavaScriptFromString: to inject javascript into the DOM and make additional stringByEvaluatingJavaScriptFromString calls to fetch the result:
NSLog(@"%@", [webView stringByEvaluatingJavaScriptFromString:@"document.wordOfTheDay = 'discorporate';"]); //
NSLog(@"%@", [webView stringByEvaluatingJavaScriptFromString:@"document.wordOfTheDay;"]); //discorporate
It’s worth noting that there are other ways to communicate with the javascript in a UIWebView.
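One way to reason about the first four return values above: they look like the completion value that JavaScript's eval gives for the same script, with a bare top-level return being a syntax error. The sketch below is plain JavaScript rather than UIWebView code. The evaluate helper is a hypothetical stand-in, and mapping a failed script to an empty string is an assumption based on the NSLog output shown earlier.

```javascript
// Hypothetical stand-in for stringByEvaluatingJavaScriptFromString:, written in
// plain JavaScript. Indirect eval runs the script in the global scope and
// yields the value of the last expression statement.
function evaluate(script) {
  try {
    const value = (0, eval)(script);
    // Assumption: a missing result is reported as an empty string.
    return value === undefined || value === null ? "" : String(value);
  } catch (error) {
    // Assumption: a script that fails to parse, or throws, also yields "".
    return "";
  }
}

console.log(evaluate("'hello';"));
console.log(evaluate("'hello';'goodbye';"));
console.log(evaluate("return 'hello';")); // a top-level return does not parse
console.log(evaluate("var greeting = function(){return 'hello';}; greeting();"));
```

The practical upshot is that a scraping script should end with an expression (or a call to a function) that produces the string you want, rather than a return statement.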
OK, on to the scraping.
The biggest problem is being certain that the DOM has loaded before the script is run. The UIWebViewDelegate protocol includes webViewDidFinishLoad: which at first glance seems perfect. If only life was that simple. I've encountered quite a few pages that trigger webViewDidFinishLoad: multiple times before the DOM is actually ready (presumably this is due to iframes or JavaScript).
The solution is to combine webViewDidFinishLoad: with the standard JavaScript approach of detecting when the DOM is ready. On the first invocation of webViewDidFinishLoad: we inject code to check the DOM for readiness (injecting this in webViewDidStartLoad: has unpredictable results):
if (/loaded|complete/.test(document.readyState))
{
document.UIWebViewDocumentIsReady = true;
} else
{
document.addEventListener('DOMContentLoaded', function(){document.UIWebViewDocumentIsReady = true;}, false);
}
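The same check can be wrapped in a function so that it can be tried outside a browser. This is only a sketch: installReadyFlag and makeStubDocument are hypothetical names, and also accepting the "interactive" state is my assumption (current browsers report "loading", "interactive" and "complete").

```javascript
// Sets the flag immediately if the document is already parsed, otherwise
// waits for DOMContentLoaded. Works on any document-like object.
function installReadyFlag(doc) {
  if (/loaded|interactive|complete/.test(doc.readyState)) {
    doc.UIWebViewDocumentIsReady = true;
  } else {
    doc.addEventListener("DOMContentLoaded", function () {
      doc.UIWebViewDocumentIsReady = true;
    }, false);
  }
}

// Hypothetical stub document for exercising the logic without a web view.
function makeStubDocument(readyState) {
  const listeners = [];
  return {
    readyState: readyState,
    addEventListener: function (name, listener) {
      if (name === "DOMContentLoaded") listeners.push(listener);
    },
    fireDOMContentLoaded: function () {
      listeners.forEach(function (listener) { listener(); });
    }
  };
}

const stub = makeStubDocument("loading");
installReadyFlag(stub);
console.log(stub.UIWebViewDocumentIsReady === true); // false, still loading
stub.fireDOMContentLoaded();
console.log(stub.UIWebViewDocumentIsReady === true); // true
```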
We then poll the UIWebView to determine when the DOM is ready:
-(void)pollDocumentReadyState
{
if ([@"true" caseInsensitiveCompare:[webView stringByEvaluatingJavaScriptFromString:@"document.UIWebViewDocumentIsReady;"]] == NSOrderedSame)
{
NSString *json = [webView stringByEvaluatingJavaScriptFromString:myFancyParsingAndSerializationScript];
//Do something with json
} else
{
[self performSelector:@selector(pollDocumentReadyState) withObject:nil afterDelay:1];
}
}
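One caveat: the polling method above retries forever if the flag is never set. A bounded version of the same loop is sketched below, in JavaScript for brevity rather than Objective-C. Every name here is hypothetical and the limit of 10 attempts is arbitrary.

```javascript
// check:     returns true once the page is ready
// onReady:   called when check succeeds
// onTimeout: called when the attempts run out
// schedule:  defers the next attempt (defaults to a 1 second timer)
function pollUntilReady(check, onReady, onTimeout, attemptsLeft = 10,
                        schedule = (retry) => setTimeout(retry, 1000)) {
  if (check()) {
    onReady();
  } else if (attemptsLeft <= 1) {
    onTimeout();
  } else {
    schedule(() => pollUntilReady(check, onReady, onTimeout, attemptsLeft - 1, schedule));
  }
}

// Example with an immediate scheduler so the outcome is visible straight away.
let calls = 0;
pollUntilReady(
  () => ++calls >= 3,
  () => console.log("ready after " + calls + " checks"),
  () => console.log("gave up"),
  10,
  (retry) => retry()
);
```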
That's it!
I've created a class, EMKJavascriptEvaluation (zip archive), to handle all of this. Here’s a usage example:
-(void)beginScrape
{
[[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(jsEvaluationCompleted:) name:EMKJavascriptEvaluationComplete object:nil];
NSString *scriptPath = [[NSBundle mainBundle] pathForResource:@"myFancyParsingAndSerializationScript" ofType:@"js"];
NSString *script = [NSString stringWithContentsOfFile:scriptPath encoding:NSUTF8StringEncoding error:NULL];
NSURL* url = [NSURL URLWithString: @"http://example.com"];
EMKJavascriptEvaluation *evaluation = [EMKJavascriptEvaluation evaluateScript:script withHtmlAtURL:url];
[evaluation injectLibraryAtPath:[[NSBundle mainBundle] pathForResource:@"jquery" ofType:@"js"]];
[evaluation injectLibraryAtPath:[[NSBundle mainBundle] pathForResource:@"json2" ofType:@"js"]];
[evaluation evaluate];
}
-(void)jsEvaluationCompleted:(NSNotification*)notification
{
NSLog(@"result: %@", [[notification object] result]);
}
Take a look at the .h for details.
The code is completely free and comes with no warranty whatsoever (MIT-style license). I haven’t used this code in a finished app yet so there’s probably a bug or two. Please send me an email if you have any comments.
Data access layer with PHP Streams
27/03/10 16:06
I had an idea: "Why not use the PHP Streams layer as an abstraction mechanism for a data access layer?". After reading the PHP streams manual I can see no reason why this wouldn't work. By using streams the boilerplate functionality of a request can be distilled into standard file operations. Authentication and authorization can be handled by defining a URI scheme that follows the standard form of scheme://username:password@.... and calling getperms() on the URI. If username:password is invalid then getperms() would return false, else it would return standard Unix permissions. It's a very RESTful solution.
@property, dot syntax and a sprinkling of code
22/03/10 18:54
Until recently I've avoided dot syntax. I reluctantly adopted dot syntax due to the fact that Apple uses it extensively in iPhone Xcode templates. I had no desire to re-write the templates but I also wanted my code to be consistent. Therefore I bit the bullet and started using dot syntax.
I'm glad I made the change. With dot syntax I find it easier to see the difference between when the state of an object is being changed and when the object is being asked to perform an action.
(See In defense of Objective-C 2.0 Properties for a round up of the criticism and a defense of dot syntax).
Although I've only recently been using dot syntax I have always used the @property directive. Sometimes when setting (and occasionally getting) a property the object being called needs to do a little extra work, for example updating an internal cache. When using standard message syntax for setters this was easy to implement:
@interface ImageGallery : NSView
{
    ...
    NSArray* images;
}
@property(readwrite, retain, setter=setImagesSynth:) NSArray* images;
-(void)setImages:(NSArray*)newImages;
...
@end

@implementation ImageGallery
...
@synthesize images;
-(void)setImages:(NSArray*)newImages
{
    [self setImagesSynth: newImages];
    //do additional work here
    [self resizeCanvas];
}
@end
(See Objective-C 2.0 Accessors & Memory Management at Stay Hungry, Stay Foolish for a full description of this solution).
Unfortunately this solution doesn't work with dot syntax. With dot syntax...
gallery.images = photos;
is compiled into...
[gallery setImagesSynth: photos];
My solution is to use 2 property declarations: a private one that does the basic getting and setting, and a public one to which the extra code can be added:
@interface ImageGallery : NSView
{
    ...
    NSArray* imagesProperty;
}
@property(readwrite, retain) NSArray* imagesProperty;
@property(readwrite, retain) NSArray* images;
...
@end

@implementation ImageGallery
...
@synthesize imagesProperty;
@dynamic images;
-(NSArray*)images
{
    return self.imagesProperty;
}
-(void)setImages:(NSArray*)newImages
{
    //store the value via the synthesized private property
    self.imagesProperty = newImages;
    //do additional work here
    [self resizeCanvas];
}
@end
Admittedly this is a lot of boilerplate code, which slightly undermines the convenience of @property. However this approach allows a class to present a uniform interface for all of its properties.
Don't forget the data model!
14/02/10 13:24
Regardless of the complexity of an application a data model can help glean a deeper understanding of the problem that the application is addressing. A data model is an abstract description of the data; it is not tied to any technology. A data model illustrates the core of an application. A data model will help find flaws in your application and provides a rationale for further development.
Reading the Wikipedia Data Modelling article may leave you scratching your head. There's a huge number of long, cryptic terms and it's not clear how or where to start the modelling. I'm not a data modelling expert, but my approach, outlined below, has served me well.
I use two types of diagrams. I start by sketching an Entity Relationship model (ERM). I do this on paper and iterate quickly. The ERM gives me a broad understanding of the problem. Once I'm happy with the ERM I move onto an Entity-attribute-value model (EAV). The EAV lets me flesh out the details and describe inheritance relationships within the data (EAV is the approach used by Core Data).
Zend Framework Quickstart internal server error
19/11/09 18:50
I’m learning the Zend Framework with the help of the quick start tutorial. I received a ‘500 internal server error’ when trying to access /guestbook. A look in the Apache log file (/private/var/log/apache2/error_log or via Console.app) reveals that mod_rewrite is exceeding 10 redirects. The solution is to edit the /quickStartRoot/public/.htaccess file so that the rewrite is an absolute path, i.e. the last line should be:
RewriteRule ^.*$ /index.php [NC,L]
Setting up a web development environment in OS X
19/11/09 18:47
- Follow Snow Leopard for web developer
- Use Coda by Panic
Tweaking HTTP
16/10/09 12:37
I recently watched Alan Kay's OOPSLA 1997 lecture "The Computer Revolution Hasn't Happened Yet". Watching it was mildly frustrating as Kay would make tantalising abstract points and not elaborate upon them.
Kay makes some points about the web which I initially found jarring. But after thinking about the web in the context of the message passing environments that Kay helped pioneer I began to see the flaws that Kay may have had in mind.
So, what's wrong with the web and how can it be fixed? In my view there are two main problems, both of which can be solved without compromising the simplicity of the web. These problems are the assumption of prior knowledge and the richness of communication.
Background
When the web was created it consisted of one data format, HyperText Markup Language documents (HTML), and a protocol for transporting them, HyperText Transfer Protocol (HTTP). The only web clients were web browsers and the only action they could perform was to GET documents. Browsers only had to know how to handle HTML data as that was the only type of data on the web. This was the web as codified in the 0.9 spec.
The shortcomings of the 0.9 spec were quickly addressed by the 1.0 spec (and further expanded in the 1.1 spec) which extended HTTP in two ways. Firstly it allowed arbitrary data, not just HTML, to be transported. Secondly, it augmented the GET request method with the POST method, thus providing a mechanism for the client to send data to the server. It is the implementation of these two features which I think Alan Kay may have had in mind when he criticised the web.
Arbitrary data and the assumption of prior knowledge
In the 0.9 spec HTTP was coupled to HTML - any data sent over HTTP was assumed to be HTML. The web was initially a system for sharing interlinked documents which were to be displayed on a screen for a human to read. This coupling allowed the system to remain simple but at a cost - if the data could not be represented in an HTML document then it could not be made available on the web.
The coupling of HTTP and HTML was addressed by the introduction of the content-type header. The content-type header allowed the server to send metadata that described the format of the data that the client had requested. The content-type header effectively de-coupled HTTP from HTML resulting in a more flexible system. This opened up HTTP to any situation that required data to be transmitted from point A to point B. The data no longer had to be HTML and there was no requirement for it to be consumable by humans. A client could be any software that consumed data, not just a browser.
This increase in scope created a conflict with the primary intent of the web as a system for sharing documents. In the 0.9 spec the browser only had to render HTML documents to be able to display all of the documents available on the web. The implication of the content-type header is that if a browser is to render everything on the web then it has to be able to render any data that the world can throw at it - which is a seemingly impossible task. This problem is currently addressed by the following measures:
- Standards based data formats are developed and promoted
- Browsers directly render common data types (i.e. the most common data formats, which are not necessarily the standards based formats)
- Browsers provide a plugin mechanism for other data types (the HTML spec provides a mechanism for embedding these plugins in the form of the
- Browsers save unknown data types to disk and hand them off to the OS
I believe this approach is defeatist. The system doesn't empower the browser, instead it defaults to the lowest common denominator.
A better solution would be for the server to provide rendering instructions in addition to the data. This would effectively separate the communication of the data from the rendering of the data. Therefore the browser would only have to fetch the data and provide a screen space for the data renderer. An HTTP header listing a URI would be an adequate mechanism for locating a renderer - for example
pragma: content-render http://example.com/renders/mathml
This approach could resolve or help to resolve all sorts of problems:
- 'browser' compatibility problems (rendering engine compatibility problems is technically more accurate) e.g. pragma: content-render http://microsoft.com/guano/trident
- codec support e.g. pragma: content-render http://xiph.org/codec/ogg
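To make the idea concrete, here is a rough sketch of how a client might pick the renderer URI out of a response. The content-render pragma is the hypothetical header proposed above, not part of any HTTP spec, and findRendererURI is an illustrative name.

```javascript
// Returns the renderer URI from a hypothetical
// "pragma: content-render <uri>" header line, or null if none is present.
function findRendererURI(headerLines) {
  for (const line of headerLines) {
    const match = /^pragma:\s*content-render\s+(\S+)\s*$/i.exec(line);
    if (match) {
      return match[1];
    }
  }
  return null;
}

const responseHeaders = [
  "Content-Type: application/mathml+xml",
  "Pragma: content-render http://example.com/renders/mathml"
];
console.log(findRendererURI(responseHeaders));
```

A browser built this way would fetch the renderer from that URI and hand it the response body, falling back to its built-in behaviour when the function returns null.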
I think this is what Alan Kay was getting at when he said the following:
"HTML on the Internet has gone back to the dark ages because it presupposes that there should be a browser that should understand its formats. This has to be one of the worst ideas since MS-DOS."
Richness of communication
In the 0.9 spec there was one request method that a client could use to communicate with the server: GET. Later this was augmented to include other methods, most notably POST, PUT and DELETE. The HTTP spec outlines the intent of these verbs, but due to technical limitations, ambiguity in the spec (and poor programming practice) request methods do not adequately describe the intent of the request. Some examples:
- The GET method can only use the query part of the URI to transmit data to the server. The URI has a practical maximum length, therefore there is a limit to the amount of data a client can send to the server. To overcome this problem clients often send query data via the POST method. The POST (and PUT) methods both imply the creation of a resource, which is not what we think of as happening when we query a resource.
- Cool URIs Don't Change states that "Pretty much the only good reason for a document to disappear from the Web is that the company which owned the domain name went out of business or can no longer afford to keep the server running." In which case why is DELETE one of the HTTP verbs?
These problems could be addressed by using semantically relevant methods. Such methods are permissible within the 1.1 spec:
"The set of common methods for HTTP/1.1 is defined below. Although this set can be expanded, additional methods cannot be assumed to share the same semantics for separately extended clients and servers."
Accompanied by the "405 Method Not Allowed" status code, HTTP provides a very clean mechanism for semantically correct communication. The two problems outlined above could then be easily addressed:
- A QUERY method which makes use of the message body. This would overcome the limitations of using GET and the semantic error of using POST (this ignores the failing of HTML forms, which can only use the GET and POST methods. HTML 5 will allow PUT and DELETE but not arbitrary methods)
- Using an EXPIRE or INVALIDATE method would be better than DELETE.
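As a sketch of how a server might treat such extension methods, the dispatcher below answers with 405 and an Allow header whenever a resource does not support the requested method. QUERY and EXPIRE are the methods suggested above rather than anything defined by HTTP/1.1, and dispatch and articleHandlers are hypothetical names.

```javascript
// Looks up the handler for the request method on a resource. Unsupported
// methods get "405 Method Not Allowed" plus an Allow header listing the
// methods that the resource does support.
function dispatch(handlers, method, body) {
  if (!Object.prototype.hasOwnProperty.call(handlers, method)) {
    return {
      status: 405,
      headers: { Allow: Object.keys(handlers).join(", ") },
      body: ""
    };
  }
  return handlers[method](body);
}

// Hypothetical resource supporting the extension methods proposed above.
const articleHandlers = {
  GET: () => ({ status: 200, headers: {}, body: "article text" }),
  QUERY: (body) => ({ status: 200, headers: {}, body: "results for " + body }),
  EXPIRE: () => ({ status: 204, headers: {}, body: "" })
};

console.log(dispatch(articleHandlers, "QUERY", "author=kay").body);
console.log(dispatch(articleHandlers, "DELETE", "").status);
console.log(dispatch(articleHandlers, "DELETE", "").headers.Allow);
```

A client that sends DELETE to this resource learns from the Allow header that EXPIRE is the supported way to retire it.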
Conclusion
The first of these suggestions is certainly grander than the second and would require significant changes to client software to implement in full. The second of these suggestions would only require a guiding hand as there are no (theoretical) infrastructure changes required. The creation of a registry for extended HTTP request methods is all that would be required. The HTTP 1.1 spec is currently being revised and it seems that such a registry has been suggested.
10/GUI
13/10/09 21:13
I've just watched the video at 10gui.com:
10/GUI from C. Miller on Vimeo.
I'm impressed. Most re-inventions of the desktop leave me with little more than a renewed contempt for technology. 3D desktops, for example, are a terrible idea. Superficially they look pretty, but they are inherently flawed as they try to reduce 3 dimensions' worth of information into 2 dimensions. I'm a huge fan of Jef Raskin's The Humane Interface, but I believe the zooming metaphor fails as it effectively locks data into a single hierarchical structure and also requires information to be presented in a document-orientated manner.
10/GUI strikes me as an attempt to keep the best bits of the current windowing metaphor (e.g. document or task based interface design) and the zooming metaphor (e.g. increasing the usefulness of spatial information).
Slugs and Airports
15/09/09 10:31
I've set up an NSLU2 to serve video to my iPod Touch. Here's what I did.
I have a conflicted position with regard to technology. On one hand I think technology should just work. I have no time for cryptic interfaces and poorly written instruction manuals. This is why I like Apple. I have a MacBook, an Airport Extreme (with printer and hard drives attached) and an iPod Touch. These three devices constitute my home network and I'm pleased to say that it works perfectly.
However, I'm also a geek. When I see the flash of a green LED I can't help but ask myself 'I wonder if...'. Over time my geekish tendencies to fiddle have been tempered by my desire to have elegant solutions, so much so that I now refuse to let a geekish endeavour compromise the integrity of a working solution. It turns out that this restriction has, possibly unsurprisingly, caused my geekish side to raise its game.
A few years ago I bit the bullet and ripped all of my CDs. It took a while, but it was a worthwhile investment. Ever since then I've had my sights set on my DVD collection. Ripping all my DVDs is a harder problem to crack. The sticking point was not the ripping, it was the watching. I have no problem with my music only being accessible from my laptop (or by transferring it to my iPod), but that won't do for movies - I don't want to mess with my laptop just to watch a movie. I want a solution that just works; I want to be able to watch a movie by expending less effort than if I were to put a disc in a DVD player. So how do I do it? First of all let's take a look at the elements that constitute the current solution. Firstly there are the shelves full of DVD discs, secondly there's the DVD player to play them on.
Replacing the DVD player is relatively straightforward. I have a TV-out cable for my iPod Touch which I'm very pleased with. This is the simplest replacement for the DVD player.
Replacing the shelves of DVDs is less straightforward. The first problem is deciding where to store the video. The files could be stored on the laptop, but there are problems with this. The laptop only has a 500GB hard drive and video files take up a lot of space. Even if the hard drive had nothing except movies on it, it wouldn't take too long before the hard drive was full. Also, if the movies were stored on the laptop then the laptop would have to be turned on for the movies to be accessible. My anal-retentive non-geeky side deems this unacceptable as it would require turning the laptop on whenever I wanted to watch a movie. The alternative to storing them on the laptop is to store them on a network drive. With this in mind I bought a 2TB WD RAID drive and attached it to the Airport Extreme. I set up the drive to act in mirror mode to provide a backup if one of the drives fails (it's not possible to back up network drives with Time Machine).
The second problem is how to get the movies from the Airport to the iPod. The movies can be transferred via iTunes or they can be accessed wirelessly via HTTP or a different protocol, such as UPnP ("there's an app for that"). Obviously transferring movies via iTunes is not an option for the same reasons that storing the movies on the laptop is not an option. That leaves the wireless network option - which raises the question of a server.
The Airport Extreme offers only 2 network file-sharing protocols, AFP and SAMBA, neither of which can be accessed from the iPod Touch. Game over? No. The solution is to add another network device which can access the Airport Extreme's network drives and then make that data available via a protocol that the iPod Touch can work with. The obvious device is another computer. Another computer could easily be set up to act as a server. But therein lies the problem: another computer could easily do a lot more than serve video. Another computer is massive overkill. The wasted energy and computing power of having a computer set up just to act as a web server for one or two clients makes me shudder.
So what else could do the job? The Linksys NSLU2, affectionately known as a slug. The NSLU2 has one LAN port and 2 USB ports. The idea is that you attach a USB hard drive to the NSLU2, hook it up to your network and the hard drive becomes accessible via the network. At least that's how Linksys expected the device to be used; some geeks had different ideas. NSLU2-Linux is the home of the alternative firmware for the NSLU2. By installing alternative firmware the NSLU2 becomes capable of much more. There are a few different firmwares to choose from. After a bit of playing with them I decided to go for Debian Lenny. This provides a full Linux system with a huge selection of software to install.
Martin Michlmayr has detailed instructions for installing Debian on the NSLU2. I used the unofficial image from Slug-Firmware.net and upslug2 for flashing the NSLU2. There isn't a pre-compiled version of upslug2 for OS X, so I used an Ubuntu 9.04 virtual machine. There is an upslug2 package in the pre-configured Ubuntu repository. To install it, use the following command:
Once Lenny is installed you will need access to edit files. This can be done via SSH, editing in the terminal with vi or nano, but Macfusion offers an alternative. Macfusion makes it possible for arbitrary resources to be mounted as part of the local file system (Macfusion is based on FUSE). FTPFS is such a resource. FTPFS clients can connect to an out of the box Lenny installation. If you install Macfusion you will be able to browse the NSLU2 filesystem with Finder and edit files with the app of your choice.
Once Lenny has installed, log in to the NSLU2 via SSH and install the SAMBA package (we only need the client package):
Create a directory to mount to:
Mount the shared drive:
Check that it worked:
The contents of the root directory of you shared drive should be listed.
Make sure there is a blank line at the end of fstab.
Check that the amendment has worked by rebooting:
After the reboot the contents of the network drive should be available in
Nginx is a very capable alternative to resource-hungry Apache. Nginx is resource-light thus making it well suited to the NSLU2.
Install nginx:
Check that nginx has installed by entering accessing your NSLU2 via a browser.
Organise the web server directory to make it easier to add extra features later:
Create a symlink to the mount point of the shared network drive:
Next, add the mime types for mpeg 4 movies to the Nginx mime type config file. The file is found at
The last step is to configure the Nginx site so that it serves the files from the correct directory and is capable of sending large files. Open the default site config file located at
Change the value of
Finally, restart nginx:
Access your slug via the browser and you will be able to files shared by the Airport. Movies will be accessible from an iPod Touch or iPhone. If everything went as planned the NSLU2 will mount the network drive and start Nginx when it is turned on.
Installing a server-side scripting language gives you greater control over the pages you server. Scripting languages have to run as a fastCGI server for them to be accessible from nginx. I installed PHP. Here's some screen shots of the site my NSLU2 servers up:


I did this by setting the autoindex to a script that is located in the root of public_html. The source code is a bit messy so I won't release it just yet. However, if you'd like the messy version just send me an email.
FireFly is an DAAP server. DAAP is the protocol used by iTunes and AppleTV to access remote media. Unfortunately the version of FireFly in the Lenny repository doesn't serve video correctly. There are other DAAP servers, but I haven't investigated them.
Setting up a dynamic DNS would allow you to access your NSLU2 from anywhere on the internet.
The problem
I have a conflicted position with regard to technology. On one hand I think technology should just work. I have no time for cryptic interfaces and poorly written instruction manuals. This is why I like Apple. I have a MacBook, an Airport Extreme (with printer and hard drives attached) and an iPod Touch. These three devices constitute my home network and I'm pleased to say that it works perfectly.
However, I'm also a geek. When I see the flash of a green LED I can't help but ask myself 'I wonder if...'. Over time my geekish tendencies to fiddle have been tempered by my desire to have elegant solutions, so much so that I now refuse to let a geekish endeavour compromise the integrity of a working solution. It turns out that this restriction has, possibly unsurprisingly, caused my geekish side to raise its game.
A few years ago I bit the bullet and ripped all of my CDs. It took a while, but it was a worthwhile investment. Ever since then I've had my sights set on my DVD collection. Ripping all my DVDs is a harder problem to crack. The sticking point was not the ripping, it was the watching. I have no problem with my music only being accessible from my laptop (or by transferring it to my iPod), but that won't do for movies - I don't want to mess with my laptop just to watch a movie. I want a solution that just works; I want to be able to watch a movie by expending less effort than if I were to put a disc in a DVD player.
So how do I do it? First of all let's take a look at the elements that constitute the current solution. Firstly there are the shelves full of DVD discs, secondly there's the DVD player to play them on.
Replacing the DVD player is relatively straightforward. I have a TV-out cable for my iPod Touch which I'm very pleased with. This is the simplest replacement for the DVD player.
Replacing the shelves of DVDs is less straightforward. The first problem is deciding where to store the video. The movies could be stored on the laptop, but there are problems with this. The laptop only has a 500GB hard drive and video files take up a lot of space. Even if the hard drive had nothing except movies on it, it wouldn't take too long before the hard drive was full. Also, if the movies were stored on the laptop then the laptop would have to be turned on for the movies to be accessible. My anal-retentive non-geeky side deems this unacceptable as it would require turning the laptop on whenever I wanted to watch a movie. The alternative to storing them on the laptop is to store them on a network drive. With this in mind I bought a 2TB WD RAID drive and attached it to the Airport Extreme. I set up the drive to act in mirror mode to provide a backup if one of the drives fails (it's not possible to back up network drives with Time Machine).
The second problem is how to get the movies from the Airport to the iPod. The movies can be transferred via iTunes or they can be accessed wirelessly via HTTP or a different protocol, such as UPnP ("there's an app for that"). Obviously transferring movies via iTunes is not an option, for the same reasons that storing the movies on the laptop is not an option. That leaves the wireless network option, which raises the question of a server.
The Airport Extreme offers only two file-sharing protocols, AFP and SAMBA, neither of which can be accessed from the iPod Touch. Game over? No. The solution is to add another network device which can access the Airport Extreme's network drives and then make that data available via a protocol that the iPod Touch can work with. The obvious device is another computer. Another computer could easily be set up to act as a server. But therein lies the problem: another computer could easily do a lot more than serve video. Another computer is massive overkill. The wasted energy and computing power of having a computer set up just to act as a web server for one or two clients makes me shudder.
So what else could do the job? The Linksys NSLU2, affectionately known as a slug. The NSLU2 has one LAN port and 2 USB ports. The idea is that you attach a USB hard drive to the NSLU2, hook it up to your network and the hard drive becomes accessible via the network. At least that's how Linksys expected the device to be used; some geeks had different ideas. NSLU2-Linux is the home of the alternative firmware for the NSLU2. By installing alternative firmware the NSLU2 becomes capable of much more. There are a few different firmwares to choose from. After a bit of playing with them I decided to go for Debian Lenny. This provides a full Linux system with a huge selection of software to install.
The solution - How to set up an NSLU2 to serve movies
1. Install Debian (Lenny)
Martin Michlmayr has detailed instructions for installing Debian on the NSLU2. I used the unofficial image from Slug-Firmware.net and upslug2 for flashing the NSLU2. There isn't a pre-compiled version of upslug2 for OS X, so I used an Ubuntu 9.04 virtual machine. There is an upslug2 package in the pre-configured Ubuntu repositories. To install it use the following command:
apt-get install upslug2
Macfusion
Once Lenny is installed you will need access to edit files. This can be done via SSH, editing in the terminal with vi or nano, but Macfusion offers an alternative. Macfusion makes it possible for arbitrary resources to be mounted as part of the local file system (Macfusion is based on FUSE). FTPFS is such a resource. FTPFS clients can connect to an out of the box Lenny installation. If you install Macfusion you will be able to browse the NSLU2 filesystem with Finder and edit files with the app of your choice.
2. Install SAMBA
Once Lenny has installed, log in to the NSLU2 via SSH, and install the SAMBA package (we only need the client package):
NSLU1:/# apt-get install smbfs
Create a directory to mount to:
NSLU1:/# mkdir /media/Movies
Mount the shared drive:
NSLU1:/# mount -t smbfs //AIRPORT-IP-ADDRESS/AIRPORT-VOLUME-NAME /media/Movies -o password=AIRPORT-PASSWORD
Check that it worked:
NSLU1:/# ls /media/Movies
The contents of the root directory of your shared drive should be listed.
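If you would rather not eyeball the output of ls, the same check can be scripted. This is only a minimal sketch under a couple of assumptions: is_mounted is a made-up helper name, and it relies on a Linux-style mounts table (/proc/mounts by default) in which the second field of each line is the mount point.

```shell
#!/bin/sh
# Sketch: succeed if MOUNT_POINT appears as a mount point in a mounts table.
# Usage: is_mounted MOUNT_POINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts; the second argument is an assumption
# added so the function can be tried against a sample file.
is_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 of each line in the mounts table is the mount point.
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit(found ? 0 : 1) }' "$mounts_file"
}
```

On the NSLU2, something like `is_mounted /media/Movies && echo mounted` should print mounted once the share is attached, and print nothing if the mount failed.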
3. Mount SAMBA drive on boot
fstab is used to determine where to mount devices. By adding an entry for the shared drive the NSLU2 will mount the drive at boot. Open /etc/fstab and add the following line:
//AIRPORT-IP-ADDRESS/AIRPORT-VOLUME-NAME /media/Movies smbfs password=AIRPORT-PASSWORD 0 0
Make sure there is a blank line at the end of fstab.
Check that the amendment has worked by rebooting:
NSLU1:/# reboot
After the reboot the contents of the network drive should be available in /media/Movies.
4. Install a web server (Nginx)
Nginx is a very capable alternative to resource-hungry Apache. Nginx is resource-light thus making it well suited to the NSLU2.
Install nginx:
NSLU1:/# apt-get install nginx
Check that nginx has installed by accessing your NSLU2 via a browser.
5. Configure Nginx
Organise the web server directory to make it easier to add extra features later:
NSLU1:/# mkdir /var/www/public_html
Create a symlink to the mount point of the shared network drive:
NSLU1:/# ln -s /media/Movies /var/www/public_html/Movies
Next, add the mime types for MPEG-4 movies to the Nginx mime type config file. The file is found at /etc/nginx/mime.types; open it and add the following lines above the closing curly bracket:
video/x-m4v m4v;
video/mp4 mp4;
The last step is to configure the Nginx site so that it serves the files from the correct directory and is capable of sending large files. Open the default site config file located at /etc/nginx/sites-enabled/default (this file is actually a symlink to a file in sites-available). Change the value of root from /var/www; to /var/www/public_html;. Add the following line into the server section: sendfile off;.
Finally, restart nginx:
NSLU1:/# /etc/init.d/nginx restart
Access your slug via the browser and you will be able to browse the files shared by the Airport. Movies will be accessible from an iPod Touch or iPhone. If everything went as planned the NSLU2 will mount the network drive and start Nginx when it is turned on.
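A quick way to confirm that the mime type change has taken effect is to look at the Content-Type header that nginx sends for a movie. The sketch below assumes curl is installed on the machine you are testing from; content_type is a hypothetical helper name, and NSLU2-IP-ADDRESS and example.m4v are placeholders for your own address and file.

```shell
#!/bin/sh
# Sketch: print the Content-Type value found in HTTP response headers on stdin.
content_type() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk -F': *' 'tolower($1) == "content-type" { print $2; exit }'
}

# Hypothetical usage (placeholders, not real names):
#   curl -sI http://NSLU2-IP-ADDRESS/Movies/example.m4v | content_type
```

If the mime.types edit worked, a .m4v file should come back as video/x-m4v and a .mp4 file as video/mp4. A generic type such as application/octet-stream would suggest the new lines were not picked up.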
Optional extras
Install a scripting language (eg PHP)
Installing a server-side scripting language gives you greater control over the pages you serve. Scripting languages have to run as a FastCGI server for them to be accessible from nginx. I installed PHP. Here are some screenshots of the site my NSLU2 serves up:


I did this by setting the autoindex to a script that is located in the root of public_html. The source code is a bit messy so I won't release it just yet. However, if you'd like the messy version just send me an email.
FireFly
FireFly is a DAAP server. DAAP is the protocol used by iTunes and AppleTV to access remote media. Unfortunately the version of FireFly in the Lenny repository doesn't serve video correctly. There are other DAAP servers, but I haven't investigated them.
Access your NSLU2 from anywhere
Setting up a dynamic DNS would allow you to access your NSLU2 from anywhere on the internet.
<select multiple> sucks
13/06/09 14:00
The select element is used to create a list of options. In ‘normal’ mode it presents a popup box. In ‘multiple’ mode it presents a list which requires the user to hold a key to select additional items. The native list control in Windows and OS X works exactly the same.
I really don’t like this control. There are no visual clues that the user can select multiple items, which means that most users don’t know that multiple selections are possible. To address this problem websites often add a label to explain how multiple selections are made:

When notes and labels are added to things it’s a huge clue that the thing in question suffers from poor design. Also, the label in the screenshot is inaccurate. It is true in Windows, but not in OS X (and possibly not true in GTK, Qt etc).
The control requires the user to press a key so that they can make multiple selections - this means that the control is quasi-modal. Modes confuse the user and should be avoided. For such a simple task these failings are inexcusable.
Here’s a better approach:
<div style="overflow-y:scroll;height:6em;width:20em;border:1px solid black;">
<label><input type="checkbox" /> Jimmy</label><br />
<label><input type="checkbox" /> Jimi</label><br />
<label><input type="checkbox" /> Frank</label><br />
<label><input type="checkbox" /> Dweezil</label><br />
<label><input type="checkbox" /> Jeff</label><br />
<label><input type="checkbox" /> Keef</label><br />
<label><input type="checkbox" /> John</label><br />
</div>
The above creates a scrolling checkbox list by setting the size and overflow style attributes of the parent block element (in this case a <div>, but it could be applied to the <form> directly). Checkbox lists are common in operating systems so the user will understand how to use the control.
The correct way to display time
28/02/09 08:36
I’ve just upgraded the hard drive of my MacBook. To do this I used the excellent Drive Genius 2 which was included in the MacUpdate Holiday Bundle. Drive Genius 2 is a great piece of software. It’s really easy to use compared to GParted and all the other fiddly Linux tools. The interface to Drive Genius contains lots of (superfluous) animation but on the whole is very clear. However, when I was duplicating the drive I noticed something that got my goat:

Why is time stated as a decimal? This is confusing. For example, does ‘4.33 hours’ mean 4 hours 33 minutes or 4 hours 20 minutes?
Time should be displayed in its natural units: hours, minutes and seconds. The work involved in converting ‘1.5 hours’ into ‘1 hour 30 minutes’ is trivial and the result is a better user experience.
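To show how little work the conversion is, here is a minimal sketch as a shell function. The name format_hours is made up for illustration, it assumes awk is available, and it rounds to the nearest second.

```shell
#!/bin/sh
# Sketch: convert a decimal number of hours into hours, minutes and seconds.
format_hours() {
    awk -v h="$1" 'BEGIN {
        total = int(h * 3600 + 0.5)         # whole seconds, rounded to nearest
        hours = int(total / 3600)
        minutes = int((total % 3600) / 60)
        seconds = total % 60
        printf "%d h %d min %d s\n", hours, minutes, seconds
    }'
}
```

With this, `format_hours 1.5` prints 1 h 30 min 0 s and `format_hours 4.33` prints 4 h 19 min 48 s, which settles the ambiguity: it is neither 33 minutes nor exactly 20.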
Bonus rant: I much prefer analogue clocks that have a second hand which moves at a constant rate rather than one that ticks. I prefer them for two reasons. Firstly, time is continuous; the discrete unit of a second is for our convenience, so continuous movement better represents it. Secondly, I hate the constant ‘tick. tick. tick.’; it drives me crazy!
NSCollectionView Tips
04/11/08 11:55
Recently I’ve been teaching myself Cocoa. I’ve been following the excellent Cocoa Programming For Mac OS X by Aaron Hillegass. Quality reference material like this book and Apple’s documentation makes learning much easier. Apple’s material is consistent, concise and for the most part complete. However, there is one class where Apple’s reference is quite poor: NSCollectionView.
NSCollectionView is similar to NSTableView; they both display data with the help of a prototype which is copied for each piece of data to be displayed. NSTableView uses NSCell for the prototype. The NSCell draws itself directly onto the NSTableView. NSCollectionView uses NSCollectionViewItem for the prototype. NSCollectionViewItem is a simple controller (it inherits directly from NSObject); it has no visual element. NSCollectionViewItem has two properties, representedObject and view. representedObject holds the object that the view will display and is set by the NSCollectionView. It is the responsibility of the NSCollectionViewItem to provide the view object.
When an NSCollectionView is created in Interface Builder two additional objects are created:
- An NSCollectionViewItem which is connected to the prototype outlet of the NSCollectionView
- An NSView which is connected to the view outlet of the NSCollectionViewItem
Well, it’s great for a short while. Problems arise when you want more than simple bindings between the view and the representedObject. There are two parts to this problem:
- where do we put our controller code?
- how do we access the IBOutlets specified in our controller code?
The next problem is that, unlike the bindings in the NSView, the IBOutlets specified in our NSCollectionViewItem subclass are not connected when the prototype is copied. So how do we connect the IBOutlets specified in our NSCollectionViewItem subclass to the controls in the view? This problem is trivial once you realise that Interface Builder is not being very clever.
Interface Builder puts the custom NSView in the same nib as the NSCollectionView and NSCollectionViewItem. This is dumb. The solution is to move the NSView to its own nib and get the controller to load the view programmatically:
- Move the NSView into its own nib (thus breaking the connection between the NSCollectionViewItem and NSView).
- In I.B., change the Class Identity of File Owner to the NSCollectionViewItem subclass.
- Connect the controls to the File Owner outlets.
- Finally get the NSCollectionViewItem subclass to load the nib:
-(id)copyWithZone:(NSZone *)zone
{
    id result = [super copyWithZone:zone];
    [NSBundle loadNibNamed:@"viewItem" owner:result];
    //we can configure other aspects of result too
    [result setPopupMenuDelegate:[self popupMenuDelegate]];
    return result;
}
/*This might not be the best place for loadNibNamed:owner:. If it were placed in setRepresentedObject: we could load different views depending on the class of the representedObject.*/
Problem solved. We now have much more control of NSCollectionView. (Remember you can still bind to representedObject).
Ramblings on Keyboard Shortcuts
27/07/08 11:11
On the surface keyboard shortcuts seem like a rather small topic: ctrl + s saves and alt + f4 closes, what else is there to know? But as with most things there’s always more than meets the eye.
First of all let’s state what a keyboard shortcut is and what it does:
A keyboard shortcut (or accelerator key, shortcut key, hot key, key binding, keybinding, key combo, etc.) is a key or set of keys that performs a predefined function. These functions can often be done via some other, more indirect mechanism, such as using a menu, typing a longer command, and/or using a pointing device. By reducing such sequences to a few keystrokes, this can often save the user time, hence “shortcut”.
Wikipedia - Keyboard shortcut
For a system to be successful it needs to be sympathetic to its user. We therefore need to look at the limitations of the user and how to accommodate them.
Physiology (ergonomics)
Anyone that paid attention in biology class will have heard the term opposable thumbs. The “thumbs” of other animals evolved into wings, hooves or flippers, but ours have shifted around a bit to be opposite our fingers. This change in position allows us to grab things; it has also resulted in our thumbs becoming the strongest and one of the more dextrous fingers. Our fingers decrease in strength as they move away from the thumb. It therefore follows that any good design should utilise this fact by making good use of our superior digits: the thumb, index and middle fingers.
The brain (cognetics)
Designing an object so it is sympathetic to our bodies is only half of the story. A well designed ‘thing’ must be sympathetic to the constraints of our minds too.
Modes are a major design consideration for good shortcuts.
In user interface design, a mode is a distinct setting within a computer program or any physical machine interface, in which the same user input will produce perceived different results than it would in other settings.
Wikipedia - Mode (computer interface)
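Some examples:
- The caps lock key is modal (ie it creates a mode). When the ‘k’ key is pressed a ‘k’ is displayed, but when caps lock is engaged pressing ‘k’ will display ‘K’.
- In TextEdit pressing cmd + t displays the font palette, but in Safari pressing cmd + t opens a new tab. The same key presses have different results. (Conflicts such as this are common due to the current computing paradigm of independent applications which is inherently modal).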
The second consideration is human memory. Our short term memory is quite limited. We can store around 7 items of data in our short term memory. This has an impact on the way we use shortcuts. When a user is performing a task they will be concentrating on their data rather than on the tools for manipulating the data.
There are a few ways to reduce the burden of remembering how to use a system. The first is to remove the burden of remembering - this is done by clear labelling. The second is by creating meaningful relationships between the desired result and required action. Meaningful relationships allow users to understand the system as a whole rather than having to learn a collection of unrelated and arbitrary actions.
Current systems shortcuts
Let’s see how Windows XP and OS X Leopard fare against the above criteria.
Ergonomics:
The physical designs of Mac and standard PC keyboards are almost identical. The most noticeable differences are the modifier keys. A standard PC keyboard has ctrl, alt and Windows keys; a Mac keyboard has ctrl, alt and cmd keys.
Left side of a mac keyboard
Left side of a Windows keyboard
In Windows the most common modifier key used with shortcuts is the ctrl key. The ctrl keys are located on the bottom row at the far left and far right of a standard keyboard.
It is the little finger (pinkie finger) that people most often use to press the ctrl key. The little finger is a feeble thing and tires quickly. Also the degree of stretch required to move the hands from the standard typing position is quite pronounced, which makes it prone to RSI (see How To Avoid The Emacs Pinky Problem).
OS X fares better. The modifier key used is always the cmd key, which is located directly to the left and right of the space bar (additional modifier keys may also be used). The positioning allows the modifier keys to be pressed with the thumb - ideal.
Modal design:
Windows has many mode based issues. To issue a shortcut we have to press either ctrl, alt or the ‘Windows key’, often followed by another key. The ctrl key is the most common modifier key used in shortcuts; fortunately the ctrl key does not suffer from modal issues. Unfortunately the same is not true of the alt or Windows keys, both of which are modal. Worse still, the behaviour of the alt and Windows keys is inconsistent.
The alt key is generally used to move focus to the menu bar and for window management (eg, alt + f4 to close, alt + tab to switch to another window), but occasionally the alt key is used in ‘normal’ shortcuts. The key press cycle of the alt key in a normal application (eg Notepad, Windows Explorer, Internet Explorer) is as follows:
The key press cycle of the Windows key is as follows:
Labelling:
The on screen labelling of shortcuts in OS X and Windows is largely similar. Both try to use mnemonics to imply a system. For example cmd + s is save, cmd + l is load and cmd + p is print. There are limitations to this approach.
Firstly, conflicts soon arise: for example, should cmd + s be save or search? Windows applications tend to address this problem by using a different letter, thus breaking the mnemonic system. OS X sometimes uses a different letter but also uses additional modifier keys, and which additional modifier key is used is unpredictable. Both of these approaches are somewhat arbitrary as they are not part of a system which the user can learn and use to predict the shortcut. (OS X has a convention of using the shift key to perform related actions. For example cmd + s is save and cmd + shift + s is save as; cmd + z is undo and cmd + shift + z is redo.)
The second problem is internationalisation. The mnemonic system is fine when the system is in English but becomes arbitrary when the same shortcuts are used in conjunction with other languages.
In addition to the on screen labelling it is also worth noting the keyboard labelling. In OS X the labelling of these keys is largely consistent with their behaviour; the cmd key is used when issuing commands, the alt/option key will often give an alternative option, and the ctrl key will show controls. In Windows this is not the case; modifier keys are assigned without regard to their label.
Sidenote: interaction with the mouse
It is not sensible to consider the keyboard without mentioning the mouse. Most people are right handed (70% to 90% according to Wikipedia) and therefore operate the mouse with their right hand. While mousing, the available keys are limited to those which are accessible by the non-dominant hand (ie the left hand in most cases). I am left handed. For example, when I use Safari to browse the web I use a combination of the cursor and keyboard to navigate. I use my right hand to switch between tabs while my left hand moves the cursor to links and other things that catch my eye. The shortcuts I use to switch between tabs are cmd + alt + left and cmd + alt + right. This approach would not work for a right handed person.
Improvements & alternate implementation
What can be done to improve keyboard shortcuts? The most effective fixes need to take place at the operating system level, which is unfortunate as it means little can be done by individual developers. It is possible for an application to alter the standard operation but this leads to inconsistencies between applications, which causes more problems than it solves.
My suggestions are simply to implement what I have discussed above.
Shortcuts should use thumb based modifier keys
Jef Raskin points out in The Humane Interface that the current keyboard design makes poor use of our thumbs. Raskin was involved with the Canon Cat which had two ‘leap’ keys beneath the space bar. The ‘leap’ keys allowed the user to ‘leap’ forward and backwards in a document. While I question the usefulness of leap keys in relation to modern GUIs it is certainly true that our thumbs are still ‘twiddling’ and should be put to better use.
Limited use of modes - shortcuts should utilize quasi-modes
Modes are an inherent feature of the desktop/windowing metaphor. However we can certainly reduce their negative impact by carefully considered design. Keep to any standard shortcuts that are in use by the operating system. (The alternative is to pursue other computing metaphors such as ZUIs (as outlined in The Humane Interface), or life streams).
Clear labelling
Clearer labelling is harder to achieve, however there are a few systems for doing this. Digidesign produce custom keyboards for their Pro Tools system. Having used one of these keyboards I can testify to their usefulness. The problem with having the details printed on the key is that they are only applicable to one application.
A more generic solution to keyboard labelling is the Optimus Maximus keyboard. Each key of the Optimus Maximus has an embedded OLED screen. These screens are used to show different glyphs in different circumstances. For example when using Photoshop the keyboard displays the icons of the on screen tools. Unfortunately the Optimus Maximus is considerably more expensive than a standard keyboard (it has also received criticism for its typing experience). However the Optimus Maximus is the first of its kind - I expect more affordable and more tightly integrated solutions will soon emerge.
Further Reading
First of all lets state what a keyboard shortcut is and what it does:
A keyboard shortcut (or accelerator key, shortcut key, hot key, key binding, keybinding, key combo, etc.) is a key or set of keys that performs a predefined function. These functions can often be done via some other, more indirect mechanism, such as using a menu, typing a longer command, and/or using a pointing device. By reducing such sequences to a few keystrokes, this can often save the user time, hence “shortcut”.
Wikipedia - Keyboard shortcut
For a system to be sucessful it needs to be sympathetic to its user. We therefore need to look at limitations of the user and how to accomodate them.
Physiology (ergonomics)
Anyone that paid attention in biology class will have heard the term opposable thumbs. The “thumbs” of other animals evolved into wings, hooves or flippers, but ours have shift around a bit to be opposite our fingers. This change in position allows us to grab things, it has also resulted in our thumbs becoming the strongest and one of the more dextrous fingers. Our fingers decease in strength as they move away from the thumb. It therefore follows that any good design should utilise this fact by making good use of our superior digits; the thumb, index and middle (ring) fingers.
The brain (cognetics)
Designing an object so it is sympathetic to our bodies is only half of the story. A well designed ‘thing’ must be sympathetic to the constraints of our minds too.
Modes are a major design consideration for good shortcuts.
In user interface design, a mode is a distinct setting within a computer program or any physical machine interface, in which the same user input will produce perceived different results than it would in other settings.
Wikipedia - Mode (computer interface)
Some examples:
- The caps lock key is modal (ie it creates a mode). When the ‘k’ key is pressed a ‘k’ is displayed, but when caps lock is engaged pressing ‘k’ will display ‘K’.
- In TextEdit pressing cmd + t displays the font palette, but in Safari pressing cmd + t opens a new tab. The same key presses have different results. (Conflicts such as this are common because the current computing paradigm of independent applications is inherently modal.)
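The second example can be put into a few lines of code. This is only a toy sketch: the applications and actions come from the examples above, while `keymaps` and `resolveShortcut` are hypothetical names made up for illustration.

```javascript
// Hypothetical per-application keymaps; the entries come from the examples above.
const keymaps = {
  TextEdit: { "cmd+t": "show font palette" },
  Safari: { "cmd+t": "open new tab" },
};

// The result of a key press depends on which application has focus.
// That dependence on state the user has to keep track of is what makes it modal.
function resolveShortcut(activeApp, keys) {
  const map = keymaps[activeApp] || {};
  return map[keys] || null;
}

console.log(resolveShortcut("TextEdit", "cmd+t")); // show font palette
console.log(resolveShortcut("Safari", "cmd+t"));   // open new tab
```

The same input string gives two different answers, and nothing about the key press itself tells you which one you will get.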
The second consideration is human memory. Our short term memory is quite limited: we can store around 7 items of data in it. This has an impact on the way we use shortcuts. When a user is performing a task they will be concentrating on their data rather than on the tools for manipulating it.
There are a few ways to reduce the burden of remembering how to use a system. The first is to remove the need to remember at all, which is done by clear labelling. The second is to create meaningful relationships between the desired result and the required action. Meaningful relationships allow users to understand the system as a whole rather than having to learn a collection of unrelated and arbitrary actions.
Current systems shortcuts
Let’s see how Windows XP and OS X Leopard fare against the above criteria.
Ergonomics:
The physical designs of Mac and standard PC keyboards are almost identical. The most noticeable differences are the modifier keys: a standard PC keyboard has ctrl, alt and Windows keys, while a Mac keyboard has ctrl, alt and cmd keys.
Left side of a mac keyboard
Left side of a Windows keyboard
In Windows the most common modifier key used with shortcuts is the ctrl key. The ctrl keys are located on the bottom row at the far left and far right of a standard keyboard.
It is the little finger (pinkie finger) that people most often use to press the ctrl key. The little finger is a feeble thing and tires quickly. The stretch required to reach ctrl from the standard typing position is also quite pronounced, which makes the hand prone to RSI (see How To Avoid The Emacs Pinky Problem).
OS X fares better. The main modifier key is always the cmd key, found directly to the left and right of the space bar (additional modifier keys may also be used). This positioning allows the modifier keys to be pressed with the thumb - ideal.
Modal design:
Windows has many mode based issues. To issue a shortcut we have to press either ctrl, alt or the ‘Windows key’, usually followed by another key. The ctrl key is the most common modifier key used in shortcuts, and fortunately it does not suffer from modal issues. Unfortunately the same is not true of the alt or Windows keys, both of which are modal. Worse still, the behaviour of the alt and Windows keys is inconsistent.
The alt key is generally used to move focus to the menu bar and for window management (eg, alt + f4 to close, alt + tab to switch to another window), but occasionally the alt key is used in ‘normal’ shortcuts. The key press cycle of the alt key in a normal application (eg Notepad, Windows Explorer, Internet Explorer) is as follows:
- The alt key is pressed. This moves the focus to the menu bar, indicated by the underlining of the letters required to access the menus.
- At this point there are 4 possible sequences of events:
  a. The alt key is released, resulting in the focus moving to the first menu (normally the File menu).
  b. A key that is underlined is pressed, which results in the associated menu being displayed and the focus moving to the first item of that menu.
  c. Another valid key is pressed (eg, F4 or alt).
  d. A key that is not underlined is pressed, resulting in the focus remaining in its current location (eg the text area in Notepad).
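The four outcomes can be written as one small decision function. This is a rough sketch of the cycle as I have described it, not of how Windows implements it; `altCycle`, `underlined` and `otherValid` are hypothetical names, and the key lists in the usage lines are made up.

```javascript
// Sketch of the alt key cycle described above. Step one has already happened:
// alt is held down and the focus is on the menu bar.
// `underlined` lists the letters underlined in the menu bar; `otherValid`
// lists other keys that form a shortcut with alt. Both are assumptions.
function altCycle(nextEvent, underlined, otherValid) {
  if (nextEvent.type === "release") {
    return "focus moves to the first menu";
  }
  const key = nextEvent.key.toLowerCase();
  if (underlined.includes(key)) {
    return "open the menu for '" + key + "'";
  }
  if (otherValid.includes(key)) {
    return "run the alt + " + key + " shortcut";
  }
  return "focus stays where it was";
}

// Example: a menu bar with underlined F(ile) and E(dit).
console.log(altCycle({ type: "release" }, ["f", "e"], ["f4"]));
console.log(altCycle({ type: "key", key: "F" }, ["f", "e"], ["f4"]));
```

Four different results from the same starting key press, and which one you get depends on what the current application happens to have underlined.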
The key press cycle of the Windows key is as follows:
- The Windows key is pressed (there is no on screen indication that this has occurred).
- At this point there are 3 possible sequences of events. Note that events b and c can occur numerous times without releasing the Windows key:
  a. The key is released, resulting in the Start menu being displayed and the focus moving to it.
  b. A valid key is pressed, resulting in the related action being executed (the only way to discover valid keys is to read the documentation). When the Windows key is finally released the Start menu is not displayed.
  c. An invalid key is pressed. The key press is ignored by the Windows key and is handled by the application with focus. When the Windows key is finally released the Start menu is not displayed.
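The Windows key cycle is easier to follow as a tiny state machine. Again this is only a sketch of the behaviour described above; `createWindowsKeyTracker` and `validKeys` are hypothetical, with `validKeys` standing in for whatever the documentation lists.

```javascript
// Sketch of the Windows key cycle described above.
// `validKeys` is a hypothetical stand-in for the documented shortcuts.
function createWindowsKeyTracker(validKeys) {
  let held = false;
  let keyPressedWhileHeld = false;

  return {
    winDown() {
      held = true;
      keyPressedWhileHeld = false;
      return "no visible change";
    },
    keyDown(key) {
      if (!held) return "handled by application";
      keyPressedWhileHeld = true; // events b and c both suppress the Start menu
      return validKeys.includes(key)
        ? "run Windows + " + key
        : "handled by application";
    },
    winUp() {
      held = false;
      return keyPressedWhileHeld ? "nothing" : "show Start menu";
    },
  };
}
```

The hidden `keyPressedWhileHeld` flag is the mode: what happens when the key is released depends on history the user cannot see on screen.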
Labelling:
The on screen labelling of shortcuts in OS X and Windows is largely similar. Both try to use mnemonics to imply a system. For example cmd + s saves, cmd + o opens and cmd + p prints. There are limitations to this approach.
Firstly, conflicts soon arise: for example, should cmd + s be save or search? Windows applications tend to address this problem by using a different letter, thus breaking the mnemonic system. OS X sometimes uses a different letter but also uses additional modifier keys, and which additional modifier key is used is unpredictable. Both of these approaches are somewhat arbitrary, as they are not part of a system which the user can learn and then use to predict a shortcut. (OS X does have a convention of using the shift key to perform related actions. For example cmd + s is save and cmd + shift + s is save as; cmd + z is undo and cmd + shift + z is redo.)
The second problem is internationalisation. The mnemonic system is fine when the system is in English but becomes arbitrary when the same shortcuts are used with other languages.
In addition to the on screen labelling it is also worth noting the keyboard labelling. In OS X the labelling of the modifier keys is largely consistent with their behaviour: the cmd key is used when issuing commands, the alt/option key will often give an alternative option, and the ctrl key will show controls. In Windows this is not the case; modifier keys are assigned without regard to their label.
Sidenote: interaction with the mouse
It is not sensible to consider the keyboard without mentioning the mouse. Most people are right handed (70% to 90% according to Wikipedia) and therefore operate the mouse with their right hand. While mousing, the available keys are limited to those which are accessible by the non-dominant hand (ie the left hand in most cases). I am left handed, so for me the arrangement is reversed. For example, when I use Safari to browse the web I use a combination of the cursor and keyboard to navigate. I use my right hand to switch between tabs while my left hand moves the cursor to links and other things that catch my eye. The shortcuts I use to switch between tabs are cmd + alt + left and cmd + alt + right. This approach would not work for a right handed person, whose mousing hand is the one nearest the arrow keys.
Improvements & alternate implementation
What can be done to improve keyboard shortcuts? The most effective fixes need to take place at the operating system level, which is unfortunate as it means little can be done by individual developers. It is possible for an application to alter the standard operation, but this leads to inconsistencies between applications, which causes more problems than it solves.
My suggestions are simply to implement what I have discussed above.
Shortcuts should use thumb based modifier keys
Jef Raskin points out in The Humane Interface that the current keyboard design makes poor use of our thumbs. Raskin was involved with the Canon Cat which had two ‘leap’ keys beneath the space bar. The ‘leap’ keys allowed the user to ‘leap’ forward and backwards in a document. While I question the usefulness of leap keys in relation to modern GUIs it is certainly true that our thumbs are still ‘twiddling’ and should be put to better use.
Limited use of modes - shortcuts should utilize quasi-modes
Modes are an inherent feature of the desktop/windowing metaphor. However, we can certainly reduce their negative impact through carefully considered design. Keep to any standard shortcuts that are in use by the operating system. (The alternative is to pursue other computing metaphors such as ZUIs, as outlined in The Humane Interface, or lifestreams.)
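The heading mentions quasi-modes, which is Raskin’s term for a mode that only lasts while the user is physically holding a key down. A minimal sketch of the difference, using caps lock and shift as the two cases (the function names are made up for illustration):

```javascript
// A true mode: caps lock toggles and then persists until toggled again.
function createCapsLock() {
  let on = false;
  return {
    press() { on = !on; },
    type(ch) { return on ? ch.toUpperCase() : ch; },
  };
}

// A quasi-mode: shift only has an effect while it is held down.
function createShift() {
  let held = false;
  return {
    down() { held = true; },
    up() { held = false; },
    type(ch) { return held ? ch.toUpperCase() : ch; },
  };
}
```

With shift the user’s own hand is the reminder that the mode is active, so it is hard to forget. Caps lock keeps working long after the user has stopped thinking about it.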
Clear labelling
Clearer labelling is harder to achieve; however, there are a few systems for doing this. Digidesign produce custom keyboards for their Pro Tools system. Having used one of these keyboards I can testify to their usefulness. The problem with having the details printed on the keys is that they are only applicable to one application.
A more generic solution to keyboard labelling is the Optimus Maximus keyboard. Each key of the Optimus Maximus has an embedded OLED screen. These screens are used to show different glyphs in different circumstances. For example, when using Photoshop the keyboard displays the icons of the on screen tools. Unfortunately the Optimus Maximus is considerably more expensive than a standard keyboard (it has also received criticism for its typing experience). However, the Optimus Maximus is the first of its kind - I expect more affordable and more tightly integrated solutions will soon emerge.
Further Reading
- Donald Norman - Design of Everyday Things (I highly recommend this book).
- Jef Raskin - The Humane Interface.
REST + XMLHTTPRequest + 401 != joy
27/05/08 15:38
For the last few months I’ve been working on a web app using PHP that I’m trying to make as RESTful as I can. It’s been a learning curve, and has been both satisfying and infuriating. Over the weekend I’ve been working on the authentication. As it is a RESTful app, HTTP digest authentication is the obvious (only?) choice. This being 2008 I also want the app to be pretty, like all the other web 2.0 apps. (I hate the term “2.0”. If the internet was made of technologies that could be broken into discrete versions then life would be so much easier. While IE6 is still around the web will be stuck in beta testing.)
Unfortunately, after much keyboard thumping and bouts of pseudo-tourette, I’ve reached the conclusion that RESTful apps haven’t got the style to get into Club Web 2.0.
This conclusion is based on the assumption that the standard browser authentication dialog box should be banished to 1997 and has no place in a 2008 web app. I must mention that I have only tested this on Firefox 2, Safari 3 and Opera 9 on OS X 10.4. FF and Safari both suffer from the problem outlined below. (Opera will always display the ugly dialog box even if a username and password were supplied to the XMLHttpRequest object.)
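For reference, supplying a username and password to the XHR object is done through the optional fourth and fifth arguments of `open()`. Here is a minimal sketch. The function name, URL and credentials are made up, and the constructor is passed in as a parameter purely so the snippet can also be run outside a browser.

```javascript
// `XhrCtor` is a parameter so the sketch is not tied to a browser;
// in a page you would pass the built-in XMLHttpRequest.
function getWithCredentials(XhrCtor, url, username, password, onDone) {
  const xhr = new XhrCtor();
  // open(method, url, async, user, password): the last two are optional.
  xhr.open("GET", url, true, username, password);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) { // 4 means the request has finished
      onDone(xhr.status, xhr.responseText);
    }
  };
  xhr.send(null);
  return xhr;
}

// In a browser (hypothetical URL and credentials):
// getWithCredentials(XMLHttpRequest, "/restricted/", "alice", "secret",
//   function (status) { console.log(status); });
```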
There are a few tutorials that say that you can use the XMLHttpRequest object to suppress the browser’s dialog box. It is true that the XMLHttpRequest object can do this, but it only works in very select conditions. This is because of two features (bugs?) of the XMLHttpRequest:
1. Firstly the XHR only answers the first 401 response per resource. Any subsequent 401 responses will cause the ugly dialog box to pop up. There are quite a few scenarios where the server might respond with a 2nd 401 response. The most common is when the username and password supplied to the XHR object are incorrect. Another common occurrence is that the nonce used in the digest expires, causing the server to send a new nonce with the 401.
2. Above I mentioned that “the XHR only answers the first 401 response per resource”. This means that we could re-authenticate using a different resource. The problem with this is that browsers send a mix and match of digest information. Here’s an example:
Let’s say that www.example.com/restricted/ requires authentication. Once the server has authenticated the client, the client will continue to send the authentication details to all resources within www.example.com/restricted/. This is the correct behaviour.
Next the client tries to access www.example.com/restricted/secrets.html. When the client sends the request it also sends the authentication headers as before, but this time the server responds with a 401. The credentials that were valid for the rest of the site are not valid for this one resource. If this request is made with the XHR object then it counts as the first 401 response from that resource. Therefore the XHR sends the request again using the username and password. So where’s the problem? The problem lies with the fact that the client (Firefox 2 and Safari 3 at least) continues to send the nonce from the very first 401 response. The server may not accept the old nonce for this request and will send another 401 response. This second 401 response will cause the client to show the ugly dialog box.
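The example can be reduced to a toy model. This is not real browser or server code, just the two behaviours I observed written down as rules: the client silently answers only the first 401 per resource, and it keeps reusing the nonce from the very first 401. Usernames, passwords and the digest hashing are left out, and all the names are hypothetical.

```javascript
// Toy client that follows the two observed rules.
function createClient() {
  const answered401 = new Set(); // resources whose first 401 was already answered
  let cachedNonce = null;        // the nonce from the very first 401 is kept

  return function request(server, resource) {
    let reply = server(resource, cachedNonce);
    while (reply.status === 401) {
      if (answered401.has(resource)) return "dialog box";
      answered401.add(resource);
      if (cachedNonce === null) cachedNonce = reply.nonce; // later nonces are ignored
      reply = server(resource, cachedNonce);
    }
    return "ok";
  };
}

// Hypothetical server: each resource accepts exactly one nonce.
function createServer(nonces) {
  return function (resource, nonce) {
    return nonce === nonces[resource]
      ? { status: 200 }
      : { status: 401, nonce: nonces[resource] };
  };
}

const server = createServer({
  "/restricted/": "n1",
  "/restricted/secrets.html": "n2",
});
const request = createClient();
console.log(request(server, "/restricted/"));             // ok
console.log(request(server, "/restricted/secrets.html")); // dialog box
```

The second request fails twice with the stale nonce `n1`, and the second failure is the one that brings up the dialog box.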
I have developed methods to work around both of these problems so that the XHR could be used. I’m not going to use either of them because they are ugly and complicated. I always strive for simplicity as simple code is easier to understand and there’s less chance for things to go wrong. Introducing unnecessary complication into an authentication system is asking for trouble.
Shuttle K45 - optical drive? Check.
20/04/08 18:41
I recently bought a Shuttle K45 to run Ubuntu server. I’m waiting until Hardy Heron is released proper before I set to work setting up the software. However, there are a few things about the K45 that I think are of interest.
- Firstly, there is space in the case for an optical drive:

As you can see from these photos there is space for an optical drive, albeit a slim line one. The problem is that the plastic image sheet and the perspex that cover the front (absent in the photos) don’t have a slot for the drive. This is only a minor problem, however.
- I bought my K45 from Misco. It didn’t come supplied with the Shuttle ICE cooling system. The ICE system is far quieter than the standard CPU fan that comes with the Celeron I bought. Also the instructions (which are a bit thin, consisting only of one large colour sheet) show and refer to thumb screws. I like thumb screws, they’re nice and chunky; however, my K45 only came with normal screws. Boo.
- The IDE socket is very picky. My original plan was to have two SATA hard drives and a CompactFlash to IDE converter. I was going to use the CF/IDE to run the OS from and use the hard drives exclusively for storing data. Unfortunately the K45 doesn’t like CF/IDE converters. I tried two of them and each time the K45 refused to boot from them; no BIOS fiddling seemed to fix this. I emphasise ‘boot’ as it does recognise the drive. In fact, I managed to install Ubuntu to the CF/IDE but it wouldn’t boot from it. This isn’t a major problem, but it is an inconvenience. My first attempt at a solution was to use a USB CF reader and run the OS from that. But the CF reader I bought wasn’t bootable, so I’ve gone for a normal USB flash drive, which will complicate the install, but where would the fun be if everything was easy?