
Clique: A conversant, task-based audio display for GUI applications

Peter Parente

University of North Carolina at Chapel Hill, Computer Science

E-mail: parente@cs.unc.edu


Abstract

The purpose of the Clique project is to explore a new way of adapting applications with graphical user interfaces (GUIs) for use in audio. Existing adaptation methods retain the components and metaphors of visual interfaces in the audio displays they produce. Clique, on the other hand, presents the user with a conversational audio display based on the tasks supported by programs, not their visual representations. The user interacts solely with this audio display while Clique takes charge of inspecting and controlling the underlying programs via their GUIs. In effect, the graphical nature of program interfaces is hidden from the listening user who is free to concentrate on his or her tasks in audio. We hypothesize that audio displays produced in this manner will prove more effective and satisfying for common tasks than current solutions.

Introduction

Screen reading is a prominent method of making desktop applications accessible to people with visual impairments. A screen reader operates by speaking aloud the text and widgets programs draw on the computer screen. By listening to the screen reader and giving input via a keyboard, people with visual impairments are able to access many of the same applications used by their sighted peers.

Nevertheless, the screen reading paradigm was developed nearly twenty years ago for command line programs and is imperfect when applied to GUI applications. Studies by Edwards [1] over a decade ago and Barnicle [2] much more recently both give evidence of usability problems associated with attempts to screen read GUI programs. In short, the straightforward transform from on-screen visuals to a stream of speech describing them does not result in an effective audio display. Instead, it forces users to think about and interact with an application in terms of graphical concepts that make little or no sense in the audio domain. For instance, users must contend with scrolling a text box to hear clipped text even though limited display space and spatial scrolling are unnatural concepts in audio.

Because of their usability problems, screen readers are seen as special solutions for people with visual impairments who have little recourse for computer access. No sighted person would use a screen reader today, even if it could enable access to familiar programs in certain situations (e.g. while walking, working in the bright sunlight, using a small form factor device without a screen). As a result, the demand for screen readers is limited and their cost is high.

Re-Thinking Adaptation to Audio

For the typical user, the purpose of interacting with a computer is to complete a set of tasks. The user does not care how the computer supports the execution of his or her tasks, as long as it does so in an effective, unobtrusive manner. When the user is working in audio, how common tasks are manifested visually is often irrelevant to their successful completion. For instance, a user does not need a visual interface to write an email. He or she merely requires a way to state the recipient, subject, and body of the message and indicate the message should be sent. Only tasks tied to vision (e.g. designing a GUI, editing an image) require intimate knowledge of visual presentation.

Under this premise, the goal of audio adaptation is not to mimic visual interfaces but rather to best aid the completion of user tasks. To meet this goal, an audio display must do more than provide a superficial layer between user and screen. It must interpret what is on the screen and expose the meaning of the visual components, relationships, and metaphors to the user in forms appropriate to audio. Likewise, it must take charge of the menial work of controlling applications via their GUIs. Such an audio display, one that frees the listening user from dealing with visual concepts, is likely to avoid many of the usability problems associated with screen reading.

Clique: Task-based audio display

The Clique project follows this task-based approach to audio adaptation. The Clique software centers around an audio display based on the familiar metaphor of group conversation. Four assistants with unique voices are positioned around the user in a virtual sound space. Each assistant plays a specific role in the conversation: reporting content in the current task, summarizing the state of the current task, reporting events in related tasks, or reporting events in other tasks. All assistants take advantage of common features of conversation such as referencing, pacing, turn-taking, and interruptions. For instance, an assistant who wishes to inform the user that a web page has finished loading outside the active task will either speak immediately, play an audio icon, or wait for the floor. The assistant makes this decision by considering who else is speaking and whether the user is giving input.
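The floor-taking decision described above can be sketched in a few lines of Python. This is only an illustration: the text names the three possible outcomes and the two inputs considered, but not the rule that connects them, so the policy in `choose_delivery`, along with every name used here (`Delivery`, `ConversationState`, `choose_delivery`), is an assumption and not Clique's actual logic.

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    SPEAK_NOW = "speak immediately"
    AUDIO_ICON = "play an audio icon"
    WAIT_FOR_FLOOR = "wait for the floor"


@dataclass
class ConversationState:
    someone_speaking: bool   # another assistant currently has the floor
    user_giving_input: bool  # the user is in the middle of giving a command


def choose_delivery(state: ConversationState) -> Delivery:
    """Hypothetical policy for an assistant reporting an outside event."""
    if state.user_giving_input:
        # Assumed: never talk over the user; queue the report instead.
        return Delivery.WAIT_FOR_FLOOR
    if state.someone_speaking:
        # Assumed: a short non-speech cue can overlay ongoing speech.
        return Delivery.AUDIO_ICON
    # Nobody is speaking and the user is idle, so the floor is free.
    return Delivery.SPEAK_NOW
```

Under these assumed rules, an assistant that learns a page has loaded while the user is typing would hold its report until the user finishes.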

The user interacts with Clique using a set of global commands valid across tasks. The supported input ranges from simple statements like "Next", "Previous" and "Do that" to exploratory questions and commands such as "Where am I now?", "What can I do next?" and "Remember that for later." Any device that can support the command set can be used for input (e.g. full keyboard, one-handed keyboard, voice input).
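One way to picture a command set that stays valid across tasks is a single table that maps each command to an operation every task implements. The sketch below is hypothetical: `ListTask`, `COMMANDS`, and `handle` are illustrative names, only three of the commands quoted above are covered, and the task is assumed to hold a non-empty list of items.

```python
class ListTask:
    """Hypothetical task that steps through a non-empty list of items."""

    def __init__(self, name, items):
        self.name = name
        self.items = items
        self.index = 0

    def next(self):
        # Move forward, stopping at the last item.
        self.index = min(self.index + 1, len(self.items) - 1)
        return self.items[self.index]

    def previous(self):
        # Move backward, stopping at the first item.
        self.index = max(self.index - 1, 0)
        return self.items[self.index]

    def where_am_i(self):
        return f"{self.name}, item {self.index + 1} of {len(self.items)}"


# Global command table: the same entries apply to every task.
# An input device only needs a way to produce these commands.
COMMANDS = {
    "Next": "next",
    "Previous": "previous",
    "Where am I now?": "where_am_i",
}


def handle(task, command):
    """Dispatch a global command to whichever task is active."""
    return getattr(task, COMMANDS[command])()
```

Because the table is independent of any one task, a keyboard binding or a spoken phrase can be attached to each entry without the tasks knowing which device produced it.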

Clique automates the standard desktop applications running behind its audio display in order to carry out commands and answer questions. Task models written by third-party developers describe how program tasks should be represented in audio and how the program should be automated to effect task completion.

Expressing task models in terms of medium- and application-independent interaction patterns is the key to presenting a unified audio display to the user. For instance, the linked browsing pattern is manifested in common email programs (mailbox tree, messages list, message preview) and file managers (folder tree, folder contents list). In both cases, Clique knows to report how an action in one part of the browsing task affects another (e.g. choosing a mailbox updates the messages list).
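As a rough illustration of why a shared pattern yields a uniform display, the sketch below defines one linked browsing pattern and instantiates it for both an email program and a file manager. The class name `LinkedBrowsing`, its fields, the sample data, and the wording of the spoken report are all assumptions made for this example; the format of real Clique task models is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LinkedBrowsing:
    """Hypothetical pattern: choosing an item in one part updates another."""

    source_name: str                # e.g. "mailbox" or "folder"
    target_name: str                # e.g. "messages list"
    contents: Dict[str, List[str]]  # source item -> items shown in the target
    selected: Optional[str] = None

    def choose(self, item: str) -> str:
        """Select a source item and describe the effect on the linked part."""
        self.selected = item
        count = len(self.contents[item])
        return (f"{self.source_name} {item} chosen; "
                f"{self.target_name} now has {count} items")


# The same pattern, parameterized for two different applications.
email = LinkedBrowsing("mailbox", "messages list",
                       {"Inbox": ["Hello", "Meeting"], "Sent": []})
files = LinkedBrowsing("folder", "folder contents list",
                       {"Documents": ["a.txt", "b.txt", "c.txt"]})
```

Calling `email.choose("Inbox")` and `files.choose("Documents")` produces reports with the same structure, which is the property the pattern-based approach is meant to provide.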

Goals

To demonstrate the benefits of a task-based approach to audio adaptation, I intend to do the following:

  1. Implement a prototype of the Clique system supporting the primary tasks of four target applications - Outlook Express (email), Firefox (web browser), WinZip (archive utility), and Day by Day Professional (calendar).
  2. Compare how well users with visual impairments can learn and use these applications through Clique and a popular modern screen reader (e.g. JAWS).
  3. Study the performance of sighted users completing tasks with Clique versus with the original application GUIs while engaged in a simulated walking task.

Status

As of May 2005, an implementation of the Clique system on the Windows platform is well underway. The framework for managing tasks, responding to user input, and controlling underlying applications is in place. Task models for two of the four target applications are complete.

Two formative user studies have also been performed. The first investigated user understanding of the group conversation metaphor. The second explored the limits of user interaction with an "ideal" audio display - a human expert acting as an audio interface to the computer.

Related work

Finding an alternative to screen reading is not a new idea. The Mercator project [3] thoroughly explored the transform from a hierarchical model of GUIs to audio. However, to support sighted and blind user collaboration, Mercator retained the use of graphical concepts in its audio display. In the future work section of her dissertation on Mercator, Mynatt agrees that building an audio display based on the affordances of an application, rather than its visuals, is a more general goal. Frauenberger [4] recently expanded this idea by proposing the abstraction of GUI widgets to a semantic taxonomy followed by a mapping to audio forms.

Other projects, such as SpeechActs [5] and Chatter [6], have focused on building conversational audio interfaces. These solutions do not provide access to existing applications, but rather implement audio-aware programs from the ground up. Even so, much can be learned from their use of conversation as an interaction metaphor.

References

  1. A. Edwards. "Evaluation of Outspoken software for blind users." Technical report. University
  2. K. Barnicle. "Usability testing with screen reading technology in a Windows environment." in Proceedings of the 2000 conference on Universal Usability. pp. 102-109. 2000.
  3. E. Mynatt. "Transforming graphical interfaces into auditory interfaces for blind users." Human-Computer Interaction. 12. pp 7-45. 1997.
  4. C. Frauenberger, R. Höldrich, and A. de Campo. "A generic, semantically based design approach for spatial auditory computer displays." in Proceedings of the International Conference on Auditory Display. 2004.
  5. P. Martin, et al. "SpeechActs: A spoken language framework." IEEE Computer. vol. 29(7). pp. 33-40. 1996.
  6. E. Ly and C. Schmandt. "Chatter: A conversational learning speech interface." in Symposium on Multi-Media Multi-Modal Systems. 1994.