The 4th Screen: Annual Report

Lately, I have not been blogging about my work. I have been working on an unannounced product and I would have had to omit too much about my work to make blogging about it worthwhile. Now that the product has been announced, I can say a few things about my work over the past year. And, since I just got back from China, it is fitting I should give a report of my work around the time of the Chinese New Year.

The communications user interface

I have been making a communications user interface for mobile handsets. That's kind of the software equivalent of working on rocket fuel: Lots of fun, but everyone else who tried has blown themselves to bits.

The art of the possible

In addition to being fun, I think it's possible. Even profitable. It didn't always look that way, but Google has broken down some barriers with Android. So, if you are careful about how you do it, getting a new mobile user interface into the market is possible now.

Working with a small team at a small company that is massively outweighed in resources by its competitors imposes a discipline: you have to focus. Nokia, Microsoft, Qualcomm, or half a dozen others could put 20 times as many engineers and designers on a similar project.

One thing that makes my project viable is that most other makers of handset software think communicating is a solved problem. They have moved on to music, or games, or cameras, or digital TV. Previously they turned the phone into a personal organizer, realized that PDAs are a dead species, and have ever since been fishing around for the next compelling add-on to the task of communicating.

Meanwhile, I set out to make communicating on a mobile device work better. Has the industry gone bonkers and left innovating in the central purpose of a handset to little D2 Technologies by default? Possibly, possibly...

The other reason making handset software is no longer a big-company pursuit is that the whole handset business is shifting away from a vertically integrated model, one that resembles the old minicomputer and mainframe model in the computer business. The new model is made of horizontal layers: interchangeable technology components at the lower layers, and software at the upper layers that is portable across the underlying hardware.

How market openings happen

There are two factors that explain how a market opening in communications user interfaces was left to a small company known mainly for high-quality DSP and soft-DSP software: IMS, and dual-mode handsets with WiFi and mobile radios. IMS was supposed to provide all the answers for how IP communication and mobile communication would work side-by-side. It won't.

IMS does a lot, including bringing IP voice to mobile handsets. The trouble is that it simultaneously does too much and too little: IMS requires a wide-ranging and expensive upgrade of infrastructure nodes. IMS also duplicates a lot of what people are already doing using Web browsers and AJAX applications. But IMS does not cover every way that an end-user wants to communicate, unless the carrier that user subscribes to enables those ways of communicating. The central goal of IMS isn't about choice and capability – it is about control and the ability to put up toll-gates.

And it gets worse. You can't say that IMS is a conspiracy of the telecom industry to control the user. The telecom industry itself is divided about IMS: Network operators suspect it is a conspiracy by equipment makers to sell them an expensive bundle of upgrades. And yet, IMS slowly grinds forward, propelled by the standards-driven way that progress happens in the telecom industry.

So, what will really happen?

If IMS were a failure, it would be irrelevant. IMS will get deployed. Handsets will have to be able to use IMS features, or rather IMS will start to appear in carriers' specifications for new handsets. But the check-box on the requirements won't be the only thing driving IMS capability in handsets. IMS is an important conceptual framework for thinking about what users will want to do with handsets. That is, the best way to think of IMS is to deconstruct it, identify the key use cases and functional requirements of each component, and refactor these components into a user interface that can manipulate IMS functionality, plus all the other useful ways to mix IP and circuit-switched communications.

Users will mix and match communications services to get what they want. For enterprise users, that means a mix of in-house PBX and messaging services and the mobile network. For consumers, it means a mix of Yahoo (or MSN, or AOL...) IM, fixed-line replacement VoIP, SMS, and, no doubt, many other ingredients. The WiFi radio is the Internet camel under the mobile carriers' tents. IMS was supposed to make the Internet “safe” for mobile carriers, but IMS is late to the game and dual-mode handsets are proliferating and exposing mobile customers to Internet freedom.

IMS could do all the things corporate and consumer users need. But it won't. IMS will not be connected to every corporate messaging system. IMS will not allow consumers to drive around the toll-gates to get to their favorite Web sites, and it won't have gateways to services that have not made a deal with the carrier. The intent of IMS is correct, and the functional requirements it imposes on the communications user interface are correct – just not general enough, and too much in service of the toll booth operator.

A multi-service world

Every service, every mode of communication, with multiple simultaneous sessions, delivered through a “push-to-X” presence-oriented user interface. That's what the multi-service future requires. It is also what IMS requires. Real users will want to make communications “mash-ups” that replace line appearances with presence in PBX call control operations. Or they will want to check your availability on GoogleTalk IM before calling you on a mobile network. The combinations are infinite, and unforeseeable.

Providing a user interface for this kind of communications was an unsolved problem. Current mobile user interfaces rely on the circuit-switched nature of the public land mobile network: One bearer channel to the handset means one real-time session and, perhaps, some store-and-forward messaging events or missed calls the user needs to be notified of. These interfaces may be highly polished over decades of product development. But they are fundamentally insufficient for a communications environment where the user is juggling a couple of real-time calls on a VoIP network that does not limit the bearer channels to the endpoint, some push-to-talk interactions, upwards of a dozen IM conversations, and some Web document-sharing sessions, while deciding whom they can contact via what medium based on multiple sources of presence and status information.

Put another way, nobody has rolled up all the point solutions for presence, IM, push-to-talk, etc. into a unified approach and an interface that operates on all these objects with generic verbs. Or, to take that from the API designer's point of view, nobody has funneled all the call flow state transitions for all the circuit switched and IP protocols through a single middleware layer and single user interface's call state-machine.
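To make the API designer's view concrete, here is a minimal sketch in Python of what funneling several protocols through one call state machine might look like. It is an illustration only, not mCUE or ISI code: the class names, the state and event names, and the heavily simplified signal tables are all assumptions of mine. The SIP and GSM signal names are real, but actual call flows have many more states.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    DIALING = auto()
    RINGING = auto()
    ACTIVE = auto()
    ENDED = auto()


class Event(Enum):
    DIAL = auto()
    REMOTE_RINGING = auto()
    ANSWERED = auto()
    HANGUP = auto()


# Hypothetical, simplified table: protocol-specific signals are normalized
# into generic events before they reach the state machine.
NORMALIZE = {
    ("sip", "180 Ringing"): Event.REMOTE_RINGING,
    ("sip", "200 OK"): Event.ANSWERED,
    ("sip", "BYE"): Event.HANGUP,
    ("gsm", "ALERTING"): Event.REMOTE_RINGING,
    ("gsm", "CONNECT"): Event.ANSWERED,
    ("gsm", "DISCONNECT"): Event.HANGUP,
}

# One transition table shared by every protocol.
TRANSITIONS = {
    (CallState.IDLE, Event.DIAL): CallState.DIALING,
    (CallState.DIALING, Event.REMOTE_RINGING): CallState.RINGING,
    (CallState.DIALING, Event.ANSWERED): CallState.ACTIVE,
    (CallState.RINGING, Event.ANSWERED): CallState.ACTIVE,
    (CallState.DIALING, Event.HANGUP): CallState.ENDED,
    (CallState.RINGING, Event.HANGUP): CallState.ENDED,
    (CallState.ACTIVE, Event.HANGUP): CallState.ENDED,
}


class Call:
    """A call whose state the UI can render without knowing the protocol."""

    def __init__(self, protocol):
        self.protocol = protocol
        self.state = CallState.IDLE

    def dial(self):
        self._apply(Event.DIAL)

    def on_signal(self, signal):
        # Signals with no generic meaning are ignored in this sketch.
        event = NORMALIZE.get((self.protocol, signal))
        if event is not None:
            self._apply(event)

    def _apply(self, event):
        # Transitions not listed in the table leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
```

A SIP call and a circuit-switched call driven through this class end up in the same CallState values, so the user interface only ever has to draw one kind of call.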

A better handset, or a smaller PC?

Personal computers manage this kind of communications environment pretty well: You can install every IM system your friends use, and you can fairly conveniently manage multiple sources of presence information simply by having multiple windows on your screen. That's because a PC is a general-purpose tool. It communicates and plays games and composes documents and, in addition to its general purpose functions, is also the basis of thousands of specialized knowledge-work tools, such as software development, CAD, media production, etc. PCs do a great job of enabling every user to have their preferred mix of tools, communication, and media at hand.

A phone, however, is not like that. The legacy smartphone “shrunken-PC” approach to mobile applications is a dead end. If you want a dozen ways to communicate, you need to build those ways of communicating into the user interface of your communications device. A dozen separate applications, as you might have on a PC, overwhelm both the ability of the device to display information and the ability of the user to manage multiple activities. Separate communications applications on a handset are as counterproductive as having spell-checking and graphing applications separate from your office productivity suite – anachronistic and unfriendly. This is why a new look at the communications user interface is now necessary.

mCUE

And that is what I have been working on. It's called mCUE, for mobile Communications User Experience. mCUE implements IMS requirements for communications interactions for the most commonly used modes of communication. It rides on top of a middleware system, called ISI, which abstracts the differences between communications services. ISI provides a flow of events from these services and a set of operations on them, which gives mCUE a uniform interface to things as diverse as FMI/VCC servers, Internet IM servers, mobile circuit-switched services, SIP and H.323 IP PBXs, etc. ISI future-proofs mCUE – you can add new services without modifying the user interface software.
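The general pattern behind that last claim can be sketched in a few lines of Python. To be clear, this is not ISI's actual API, which I am not describing here; every name below (ServiceAdapter, Middleware, the event dictionaries) is hypothetical and exists only to show how a layer of per-service adapters lets the user interface stay unchanged when a service is added.

```python
from abc import ABC, abstractmethod


class ServiceAdapter(ABC):
    """Hypothetical per-service adapter; one subclass per service."""

    name = "unnamed"

    @abstractmethod
    def send_message(self, contact, text):
        """Send text and return a generic event describing what happened."""


class SmsAdapter(ServiceAdapter):
    name = "sms"

    def send_message(self, contact, text):
        # A real adapter would talk to the radio stack here.
        return {"type": "message_sent", "service": self.name, "to": contact}


class ImAdapter(ServiceAdapter):
    name = "im"

    def send_message(self, contact, text):
        # A real adapter would talk to an IM server here.
        return {"type": "message_sent", "service": self.name, "to": contact}


class Middleware:
    """Uniform operations going down, uniform events coming back up."""

    def __init__(self):
        self._adapters = {}
        self._listeners = []

    def register(self, adapter):
        self._adapters[adapter.name] = adapter

    def subscribe(self, listener):
        self._listeners.append(listener)

    def send_message(self, service, contact, text):
        event = self._adapters[service].send_message(contact, text)
        for listener in self._listeners:
            listener(event)
```

The user interface subscribes once and calls send_message with a service name. Supporting a new service means writing one more adapter and registering it, and the code on the user interface side of the middleware does not change.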

What was the question?

One of the reasons I think I'm doing something more useful than quixotic is that mCUE actually answers a commercially important question: What do you do with dual mode handsets other than surf the Web?

The Nokia E-series of “business-oriented” phones and many of the N-series phones are already dual mode. The iPhone is dual mode. RIM's BlackBerry has added WiFi. Numerous models of phones running Windows Mobile are dual mode. And yet, nearly all of the bits sent over the WiFi radio are Web bits. The Nokia E-series have SIP and IP voice codecs built in, and God bless you if you can configure them to work with a SIP service.

Ironically, for a communications device, the greatest impact has been on data. Mobile data is now just like data on your PC. Web browsers have killed the concept of “mobile search” as a distinct product category from Internet search. But the conventions of mobile voice and text communication are unaltered and unadapted.

Compelling use cases

The most gratifying aspect of working on a new communications user interface is to see one's theories about use cases come to life. Eighteen months ago I wrote out a narrative about how communication has escaped the hierarchy that bred the features of the enterprise PBX: Even CEOs answer their own phones. Important calls go directly to mobile phones. Instant messaging is how you ask if someone has time for a call. Voicemail has gone from a convenience to a curse. At CES in January of 2008, I added my personal GoogleTalk account to the demo accounts on my mCUE-powered handset. I IM'ed a friend at his desk in Massachusetts. I asked if he wanted to try talking to me on my new mobile user interface and, a press of the green “call” button later, I was speaking to him through his laptop's speaker and mic, clear as a bell. And, had he not been on a service that has real-time voice capability, the press of the green button would have chosen his mobile number instead.

The magic of that moment was in the normalcy of my interaction with the handset: In addition to contacts, I saw the presence status of every person with whom I shared an IM service. I could transition from IM'ing to talking so easily that it instantly made calling from a normal phone feel like a shot in the dark, likely to land me in voicemail purgatory. “I really want this!” I thought. Which is a darn sight better than the typical feeling of “I hope the customer puts up with this until compelling use cases emerge and we can fix it in a point release.”
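For the technically curious, the green-button behavior from that demo comes down to a small routing decision. Here is a sketch of it in Python. It is my illustration of the idea rather than mCUE's code, and the names (Contact, ImAccount, choose_call_route) and the shape of the presence data are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ImAccount:
    service: str          # e.g. "googletalk"
    online: bool          # presence as reported by the IM service
    voice_capable: bool   # whether the service supports real-time voice


@dataclass
class Contact:
    name: str
    mobile_number: str
    im_accounts: list = field(default_factory=list)


def choose_call_route(contact):
    """Prefer IP voice on a service where the contact is present;
    otherwise fall back to the mobile number."""
    for account in contact.im_accounts:
        if account.online and account.voice_capable:
            return ("ip_voice", account.service)
    return ("mobile", contact.mobile_number)
```

The user presses one button either way. Which network carries the call is decided from presence, not from a menu.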

And that is why I feel very fortunate to be working on this stuff. The abstract notion that multi-service communication is useful outside the technology framework of IMS looks to be true. It is, in fact, worthwhile to create a mobile UI that combines all these modes of communication while retaining the obviousness of the best mobile UIs. Communication can be improved without burdening the user with a learning curve.