StEP: Standardized (Usability) Evaluation Plan
Grissom, Scott B. & Perlman, Gary (1995). StEP(3D): A Standardized
Evaluation Plan for Three-Dimensional Interaction Techniques.
International Journal of Human-Computer Studies, 43(1), 15-41.
Usability evaluation is a critical component of software development.
However, the skills necessary to develop a valid and reliable evaluation
plan may deter some organizations from performing usability evaluations.
These organizations would benefit from having an evaluation plan already
designed for their needs. A standardized evaluation plan (StEP) is
designed to evaluate or compare a wide variety of systems that share
certain capabilities. StEPs are developed for a specific domain by
usability specialists. These plans can then be used by evaluators with
limited experience or facilities, because the skills necessary to use a
StEP are less demanding than those needed to develop one.
Techniques have been proposed to make three-dimensional interfaces
more flexible and responsive to the user, but the usability of these
techniques has generally not been evaluated empirically. StEP(3D), a
standardized evaluation plan for the usability of three-dimensional
interaction techniques, combines performance-based evaluation with a
user satisfaction questionnaire. It is designed to be portable and
simple enough that evaluators can compare three-dimensional interaction
techniques without special equipment or experience. It evaluates the
usability of interaction techniques for performing quick and
unconstrained three-dimensional manipulations. Two empirical
experiments are reported that demonstrate the reliability and validity
of StEP(3D). Experiment 1 shows that StEP(3D) is appropriate for
comparing techniques on different hardware platforms during summative
evaluations. Experiment 2 shows that StEP(3D) is sensitive enough to
detect subtle changes in an interface during formative design.
We make recommendations for developing StEPs based on data we
collected and on our experiences with the development of StEP(3D).
However, the recommendations are not limited to three-dimensional
interaction techniques. Most of the recommendations apply to the
development of StEPs in any domain and address issues such as
portability, participant selection, experimental protocol and procedures,
and usability measures. A collection of StEPs designed for particular
domains and purposes would provide a library of reusable evaluation
plans. This reusable approach to usability evaluation should reduce the
cost of evaluations because organizations are able to take advantage of
previously designed plans. At the same time, this approach should
improve the quality of usability evaluations because StEPs are developed
and validated by usability specialists.
The following abbreviated recommendations are based on data collected
and on experiences with the development of StEP(3D). However, the
recommendations are not limited to three-dimensional interaction
techniques; most should apply to the evaluation of any interaction
technique. See the full paper in IJHCS for complete explanations and
background data.
TASK ANALYSIS
- (1) Require users to perform integrated, not simple tasks
- (2) Design core tasks that do not change over time
- (3) Develop common tasks that are portable to many systems
ASSIGNMENT OF PARTICIPANTS
- (4) Use within-subjects designs to remove variability among users
  (see the counterbalancing sketch after this list)
- (5) Use at least four participants per system
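
Recommendation (4) implies counterbalancing the order in which each
participant uses the systems, so that practice and fatigue do not favor
any one system. Below is a minimal Python sketch of one common scheme,
a rotated Latin square; the paper does not prescribe a particular
scheme, and the system names and function name here are illustrative
only.

    from itertools import cycle, islice

    def latin_square_orders(systems):
        # Rotate the system list so each system appears once in each
        # ordinal position across participants (a cyclic Latin square).
        n = len(systems)
        return [list(islice(cycle(systems), i, i + n)) for i in range(n)]

    # Example: three hypothetical 3D interaction techniques; in a
    # within-subjects design, each participant uses all three (rec. 4).
    systems = ["mouse", "trackball", "glove"]
    orders = latin_square_orders(systems)
    for p in range(6):
        print(f"participant {p + 1}: {orders[p % len(orders)]}")

Cycling through the orders keeps each system balanced across serial
positions as participants are added.
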
DESIGN OF TEST MATERIALS
- (6) Design portable instructions (e.g., paper) that can be used
to evaluate many platforms
- (7) Use visual instructions to avoid biasing users with the names of
commands they might use
- (8) Pilot test instructions to detect ambiguous or confusing parts
USABILITY MEASURES
- (9) Remove superfluous behavior that is not related to the evaluation
- (10) Provide adequate participant training about what constitutes
success on a task
- (11) Use within-evaluator analysis of performance times to control
for evaluator bias
- (12) Use simplified instructions for between-evaluator estimation of
performance times
- (13) Collect subjective measures from users after system use;
  they correlate well with performance time (see the analysis
  sketch after this list)
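
Recommendations (11) and (13) concern how the collected measures are
analyzed. The Python sketch below is illustrative only; the data,
evaluator names, and helper functions are invented, not taken from the
paper. It standardizes task times within each evaluator so comparisons
are made on a common scale (rec. 11), and it correlates post-use
satisfaction ratings with performance times (rec. 13).

    from statistics import mean, stdev

    # Hypothetical task times in seconds, grouped by evaluator.
    times_by_evaluator = {
        "evaluator_A": [42.0, 55.0, 48.0, 61.0],
        "evaluator_B": [30.0, 39.0, 33.0, 45.0],
    }

    def zscores(values):
        # Standardize within one evaluator (rec. 11) so differences in
        # how evaluators time tasks do not bias comparisons.
        m, s = mean(values), stdev(values)
        return [(v - m) / s for v in values]

    normalized = {e: zscores(ts) for e, ts in times_by_evaluator.items()}

    def pearson(x, y):
        # Pearson correlation between two equal-length sequences.
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        return cov / (stdev(x) * stdev(y) * (len(x) - 1))

    # Satisfaction ratings gathered after use (rec. 13); if ratings
    # track performance, slower tasks should receive lower ratings.
    times = [42.0, 55.0, 48.0, 61.0, 30.0, 39.0]
    ratings = [6, 4, 5, 3, 7, 5]
    print(f"r = {pearson(times, ratings):.2f}")

A strongly negative r would echo the observation that subjective
ratings track performance time.
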
PROCEDURE
- (14) Users should repeat tasks three to four times to remove
learning effects
- (15) To evaluate expert performance, ignore the first two trials,
which should be considered practice
- (16) To evaluate novice performance, include all trials to study
  the learnability of systems (see the trial-handling sketch
  after this list)
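
Recommendations (14) through (16) map directly onto how trial data are
summarized. Below is a minimal Python sketch under those rules; the
trial count and times are invented for illustration. Treat the first
two trials as practice when estimating expert performance, but keep
every trial when studying learnability.

    from statistics import mean

    PRACTICE_TRIALS = 2  # discarded for expert estimates (rec. 15)

    # Hypothetical times (seconds) for one user repeating a task four
    # times (rec. 14); times fall as the user learns.
    trial_times = [75.0, 62.0, 51.0, 49.0]

    # Expert performance: ignore the practice trials (rec. 15).
    expert_estimate = mean(trial_times[PRACTICE_TRIALS:])

    # Novice performance / learnability: use all trials (rec. 16),
    # e.g., improvement from the first trial to the last.
    learning_gain = trial_times[0] - trial_times[-1]

    print(f"expert estimate: {expert_estimate:.1f} s")
    print(f"improvement over trials: {learning_gain:.1f} s")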