StEP: Standardized (Usability) Evaluation Plan
Grissom, Scott B. & Perlman, Gary (1995). StEP(3D): A Standardized
Evaluation Plan for Three-Dimensional Interaction Techniques.
International Journal of Human-Computer Studies, 43(1), 15-41.
Usability evaluation is a critical component of software development.
However, the skills necessary to develop a valid and reliable evaluation
plan may deter some organizations from performing usability evaluations.
These organizations would benefit from having an evaluation plan already
designed for their needs. A standardized evaluation plan (StEP) is
designed to evaluate or compare a wide variety of systems that share
certain capabilities. StEPs are developed for a specific domain by
usability specialists. These plans can then be used by evaluators with
limited experience or facilities, because the skills necessary to use a
StEP are less demanding than those needed to develop one.
Techniques have been proposed to make three-dimensional interfaces
more flexible and responsive to the user, but the usability of these
techniques has generally not been evaluated empirically. StEP(3D), a
standardized evaluation plan for the usability of three-dimensional
interaction techniques, combines performance-based evaluation with a
user satisfaction questionnaire. It is designed to be portable and
simple enough that evaluators can compare three-dimensional interaction
techniques without special equipment or experience. It evaluates the
usability of interaction techniques for performing quick and
unconstrained three-dimensional manipulations. Two empirical
experiments are reported that demonstrate the reliability and validity
of StEP(3D). Experiment 1 shows that StEP(3D) is appropriate for
comparing techniques on different hardware platforms during summative
evaluations. Experiment 2 shows that StEP(3D) is sensitive enough to
detect subtle changes in an interface during formative design.
We make recommendations for developing StEPs based on data we
collected and on our experiences with the development of StEP(3D).
However, the recommendations are not limited to three-dimensional
interaction techniques. Most of the recommendations apply to the
development of StEPs in any domain and address issues such as
portability, participant selection, experimental protocol and procedures,
and usability measures. A collection of StEPs designed for particular
domains and purposes would provide a library of reusable evaluation
plans. This reusable approach to usability evaluation should reduce the
cost of evaluations because organizations are able to take advantage of
previously designed plans. At the same time, this approach should
improve the quality of usability evaluations because StEPs are developed
and validated by usability specialists.
The following abbreviated recommendations are based on data collected
and on experiences with the development of StEP(3D). However, the
recommendations are not limited to three-dimensional interaction
techniques; most should apply to the evaluation of any interaction
technique. See the full paper in IJHCS for complete explanations and
background data.
TASK ANALYSIS
- (1) Require users to perform integrated, not simple tasks
- (2) Design core tasks that do not change over time
- (3) Develop common tasks that are portable to many systems
ASSIGNMENT OF PARTICIPANTS
- (4) Use within-subjects designs to remove variability among users
  (see the counterbalancing sketch after this list)
- (5) Use at least four participants per system
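
Recommendation (4) implies counterbalancing the order in which each
participant uses the systems, so that practice and fatigue do not favor
any one system. Below is a minimal Python sketch of one common scheme,
a rotated Latin square; the paper does not prescribe a particular
scheme, and the system names and function name here are illustrative
only.

    from itertools import cycle, islice

    def latin_square_orders(systems):
        # Rotate the system list so each system appears once in each
        # ordinal position across participants (a cyclic Latin square).
        n = len(systems)
        return [list(islice(cycle(systems), i, i + n)) for i in range(n)]

    # Example: three hypothetical 3D interaction techniques; in a
    # within-subjects design, each participant uses all three (rec. 4).
    systems = ["mouse", "trackball", "glove"]
    orders = latin_square_orders(systems)
    for p in range(6):
        print(f"participant {p + 1}: {orders[p % len(orders)]}")

Cycling through the orders keeps each system balanced across serial
positions as participants are added.
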
DESIGN OF TEST MATERIALS
- (6) Design portable instructions (e.g., paper) that can be used
to evaluate many platforms
- (7) Use visual instructions to avoid biasing users with the names of
commands they might use
- (8) Pilot test instructions to detect ambiguous or confusing parts
USABILITY MEASURES
- (9) Remove superfluous behavior that is not related to the evaluation
- (10) Provide adequate participant training about what constitutes
success on a task
- (11) Use within-evaluator analysis of performance times to control
for evaluator bias
- (12) Use simplified instructions for between-evaluator estimation of
performance times
- (13) Collect subjective measures from users after system use;
  they correlate well with performance time (see the analysis
  sketch after this list)
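
Recommendations (11) and (13) concern how the collected measures are
analyzed. The Python sketch below is illustrative only; the data,
evaluator names, and helper functions are invented, not taken from the
paper. It standardizes task times within each evaluator so comparisons
are made on a common scale (rec. 11), and it correlates post-use
satisfaction ratings with performance times (rec. 13).

    from statistics import mean, stdev

    # Hypothetical task times in seconds, grouped by evaluator.
    times_by_evaluator = {
        "evaluator_A": [42.0, 55.0, 48.0, 61.0],
        "evaluator_B": [30.0, 39.0, 33.0, 45.0],
    }

    def zscores(values):
        # Standardize within one evaluator (rec. 11) so differences in
        # how evaluators time tasks do not bias comparisons.
        m, s = mean(values), stdev(values)
        return [(v - m) / s for v in values]

    normalized = {e: zscores(ts) for e, ts in times_by_evaluator.items()}

    def pearson(x, y):
        # Pearson correlation between two equal-length sequences.
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        return cov / (stdev(x) * stdev(y) * (len(x) - 1))

    # Satisfaction ratings gathered after use (rec. 13); if ratings
    # track performance, slower tasks should receive lower ratings.
    times = [42.0, 55.0, 48.0, 61.0, 30.0, 39.0]
    ratings = [6, 4, 5, 3, 7, 5]
    print(f"r = {pearson(times, ratings):.2f}")

A strongly negative r would echo the observation that subjective
ratings track performance time.
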
PROCEDURE
- (14) Users should repeat tasks three to four times to remove
learning effects
- (15) To evaluate expert performance, ignore the first two trials,
which should be considered practice
- (16) To evaluate novice performance, include all trials to study
  the learnability of systems (see the trial-handling sketch
  after this list)
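
Recommendations (14) through (16) map directly onto how trial data are
summarized. Below is a minimal Python sketch under those rules; the
trial count and times are invented for illustration. Treat the first
two trials as practice when estimating expert performance, but keep
every trial when studying learnability.

    from statistics import mean

    PRACTICE_TRIALS = 2  # discarded for expert estimates (rec. 15)

    # Hypothetical times (seconds) for one user repeating a task four
    # times (rec. 14); times fall as the user learns.
    trial_times = [75.0, 62.0, 51.0, 49.0]

    # Expert performance: ignore the practice trials (rec. 15).
    expert_estimate = mean(trial_times[PRACTICE_TRIALS:])

    # Novice performance / learnability: use all trials (rec. 16),
    # e.g., improvement from the first trial to the last.
    learning_gain = trial_times[0] - trial_times[-1]

    print(f"expert estimate: {expert_estimate:.1f} s")
    print(f"improvement over trials: {learning_gain:.1f} s")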