Providing a Free Web Service:
How to Do It and Why
Gary Perlman
perlman@acm.org
Presented 1998-10-13 to ACM BuckCHI
These days, if you're not on the Web, you don't exist.
While you're putting up your Web site,
you might consider what you can offer people
to make them come to your site and what can be gained
by offering a free service as part of your site.
The HCI Bibliography (HCIBIB) is a free-access online bibliography
on Human-Computer Interaction. It has over 18,000 entries,
most with abstracts, and with over 4000 links to full text.
It is browsable and searchable from its web site:
http://hcibib.org/
This presentation will cover:
If there is time and interest,
we will cover the construction of some parts of the site
using perl, CGI, and SSI (server-side includes).
The HCI Bibliography is the largest free-access online bibliography on
Human-Computer Interaction.
It includes entries on most major conferences, journals, and books in the field.
- 18,000+ records of publications on HCI
- Conference papers
- Journal articles
- Books and Reports (~400)
- Internet Resources (800+)
- Records contain a basic citation and often:
- keywords - uncontrolled vocabulary from publications
- abstract - OCR scanned and validated
- URL - link to full text, related web page, ...
- contents - especially for books
The HCI Bibliography is a free service
(although its materials are copyrighted).
The service is free thanks to contributions from publishers, volunteers,
and organizations.
The HCI Bibliography has its own domain on the Internet
and allows both Web and FTP access.
Special files have been generated from databases about journals, conferences,
and books, to provide various views of the structure of the database.
In 1998, the HCI Bibliography began cataloging Internet resources
and now describes over 800.
There are SIGCHI and SIGCAPH indexes
(with links to forms for suggesting new resources).
The HCI Bibliography search service allows
users around the world to find records via a Web form.
The HCI Bibliography began as an FTP site with an email file delivery service
but is now primarily a Web site with FTP access.
- site originally at OSU
- ftp and email server (pre-Web)
- no search service
- new site supported and hosted by ACM SIGCHI
- Ultrix OSF development machine
- UNIX AIX web server with CGI/SSI
- minimal standard set of software (perl, glimpse)
- no access to web server logs
- perl/ksh page generators
- standard header/footer, access counters
- summary pages based on metadata (proceedings, journal volumes)
- logical/temporal dependencies of generators on databases to update (make)
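The site itself uses make for this. As an illustration only, the timestamp rule behind such dependencies can be sketched in perl; the helper name and file names below are hypothetical, not taken from the site.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $target is missing or older than any source.
# This mirrors the rule make applies; it is not the site's actual generator.
sub needs_update {
    my ($target, @sources) = @_;
    return 1 unless -e $target;                 # never generated yet
    my $target_mtime = (stat($target))[9];      # element 9 of stat is mtime
    for my $source (@sources) {
        my $mtime = (stat($source))[9];
        next unless defined $mtime;             # skip sources that do not exist
        return 1 if $mtime > $target_mtime;     # database changed after the page
    }
    return 0;
}

# Usage with assumed file names: regenerate a summary page when its database changes.
print "regenerate journals.html\n" if needs_update('journals.html', 'journals.db');
```

A page generator would be run only when needs_update returns true, so an update touches just the pages whose databases changed.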
Getting high-quality bibliographic records online has always been
the primary focus of the Project.
The HCI Bibliography procedures are summarized below.
- new "module": conference proceedings / journal volume
- begin tracking in conference/journal database
- obtain copy of module by:
- donation by publisher
- available online
- personal libraries
- interlibrary loan (final resort)
- get module online
- human data entry (seldom used now)
- available online (sometimes uncorrected)
- OCR scan (excellent with high-quality text and new systems)
- hundreds of automatic error detection cases
- data quality control
- insertion of representative "bugs" into file
- volunteer validation via email
- calculate validator error detection rate
- estimate number of remaining errors in file
- release data (through filter) if "correct enough"
- update database
- Update of Site (1-5 minutes)
- automated update of web pages
- update of search index
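The data quality control steps above resemble classic error seeding. The source does not give the exact formula, so the following is a minimal sketch under one assumption: the validator finds genuine errors at the same rate as the inserted "bugs". The helper name and the sample numbers are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: estimate errors remaining after validation.
#   $seeded        number of artificial "bugs" inserted into the file
#   $seeded_found  how many of those the validator reported
#   $real_found    how many genuine errors the validator reported
# Returns (detection rate, estimated remaining errors), or an empty list
# when no seeded bug was found and the rate cannot be estimated.
sub estimate_remaining {
    my ($seeded, $seeded_found, $real_found) = @_;
    return if $seeded <= 0 || $seeded_found <= 0;
    my $rate  = $seeded_found / $seeded;     # validator error detection rate
    my $total = $real_found / $rate;         # estimated genuine errors in the file
    return ($rate, $total - $real_found);    # those not yet found
}

# Made-up numbers: 10 bugs seeded, 8 of them found, 20 genuine errors found.
my ($rate, $remaining) = estimate_remaining(10, 8, 20);
printf "detection rate %.0f%%, about %.1f errors remaining\n", 100 * $rate, $remaining;
```

With these numbers the detection rate is 80%, so the 20 errors found suggest about 25 in total and about 5 still in the file. Whether that is "correct enough" to release is a threshold the source does not specify.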
- search service based on glimpse search engine
- installed on service provider, free, reasonably fast
- full text search, limited ability to search on fields
- obscure query language: {dog,canine};{cat,feline}
- some bugs for complex searches
- maintaining query + options log (6000 / month)
It's ironic that the HCI Bibliography should have a search
service that is anything but usable,
but the constraints of budget and availability of software
have resulted in some compromises.
This makes for some interesting observations of system usage,
and some opportunities to evaluate ways to make the
system more usable.
Users do not plan searches, so an expert system provides feedback.
- catches search syntax from other services (and, &, +)
- reports terms (not) in index
- catches commonly misspelled author names
- catches words with British/American spelling
- how to expand/narrow a search with options based on #matches
- ...
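Two of the checks above can be sketched as follows. This is a minimal illustration, not the service's actual expert system: the helper name, the advice wording, and the small spelling table are assumptions.

```perl
use strict;
use warnings;

# Illustrative British-to-American spelling table; the real list is not shown.
my %us_for_uk = (
    visualisation => 'visualization',
    colour        => 'color',
);

# Hypothetical helper: return a list of advice strings for a raw query.
sub query_feedback {
    my ($query) = @_;
    my @advice;
    # Boolean syntax carried over from other search services (and, &, +).
    if ($query =~ /\band\b|[&+]/i) {
        push @advice, "Use ';' between terms that must all match";
    }
    # British spellings that may not match American-spelled records.
    for my $word (split /[^A-Za-z]+/, lc $query) {
        next unless exists $us_for_uk{$word};
        push @advice, "Also try '$us_for_uk{$word}' for '$word'";
    }
    return @advice;
}

print "$_\n" for query_feedback('Shneiderman and visualisation');
```

For the sample query this prints one line about the ';' syntax and one suggesting 'visualization'. A query already in glimpse form, such as Shneiderman;visualization, produces no advice.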
Users do not make good use of feedback.
- users may not understand the feedback
(perhaps especially because of the obscure glimpse syntax)
- many users write worse and worse queries
(e.g., they add terms to a search with no hits)
- users respond better to suggestions in the form of a button
(instead of "use Approximate Match to get more matches",
show button "[Approximate Match]" (get more matches))
- canned searches popular
(carefully planned and tuned searches on specific topics)
- buttons to turn on a single option
EXAMPLE
- highlight terms in query to show why record matched
EXAMPLE
- bookmarks: query that gets single full record
EXAMPLE
- field searching
- sorting records by field values
- email of results
- annotation
- search history
- hiding more glimpse syntax with forms
The most interesting analysis is the informal analysis of "sessions".
Schneiderman visualisation -i DATA FMT FULL
Ben's last name spelled incorrectly
Shneiderman visualisation -i DATA FMT FULL
terms need not be adjacent
Shneiderman;visualisation -i DATA FMT FULL
British spelling did not match
Shneiderman;visualization -i DATA FMT FULL
highlight the terms
Shneiderman;visualization -i DATA FMT FULL HIGHLIGHT
show top terms
Shneiderman;visualization -i SUMMARY FMT FULL TOKENS HIGHLIGHT
New initiative: Ask an Expert
- sublimates obvious compulsive disorder
- satisfaction of helping a worthwhile community
- work has lasting value (once online, forever online)
- a laboratory for evaluating ideas
- learn new ways to implement new ideas
- monitor how features are used, how often
- gather ideas, skills, tools for use in "real" website/service
(EXAMPLE, SOURCE)
- a place to advertise other goods & services
- positive perception from user community (altruism / expertise)
- low expectations from users (neutral hcibib.org domain)
- few requirements on performance, control of release
- try out new features on a daily basis, unannounced
What free service can you provide?
- Include the Same HTML on Each Page
<!--#include file="menu.inc" -->
- Control Time Format and Display Modification Time
<!--#config timefmt="%Y-%m-%d" -->
<!--#echo var="LAST_MODIFIED" -->
Activity Report: today, yesterday, day before yesterday
#!/usr/local/bin/perl
# Usage:  count.pl file
# Result: appends the day of week (0-6) to file and reports the file size
# Init:   the world-writable file must be created beforehand
# Insert in page: <!--#exec cmd="count.pl counter-file" -->
use strict;
use warnings;

my $count_file = $ARGV[0];
defined $count_file or die "Usage: count.pl file\n";
my $wday = (localtime)[6];                  # day of week, 0 = Sunday
if (open (my $fh, '>>', $count_file)) {     # appends to file, so no lock is used
    print $fh $wday;                        # one character per visit, for daily counts
    close ($fh);
}
my $size = -s $count_file;                  # file size in bytes (== visits)
print defined ($size) ? $size : 0;          # report the count to the caller
exit (0);
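Because the counter file holds one day-of-week digit per visit, recent daily counts can be read from the runs of digits at the end of the file. The source does not show how the activity report is computed, so the following is only one possible sketch, with a hypothetical helper name and sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: given the counter-file contents (one day-of-week
# digit per visit, oldest first) and today's day of week (0 = Sunday),
# return visit counts for today, yesterday, and the day before yesterday.
sub recent_counts {
    my ($digits, $today) = @_;
    my @counts;
    for my $back (0 .. 2) {
        my $day = ($today - $back) % 7;           # wraps Sunday (0) back to Saturday (6)
        my ($run) = $digits =~ /(${day}*)\z/;     # trailing run of that day's digit
        push @counts, length $run;
        # Drop the run so the next pass sees the previous day's visits.
        $digits = substr($digits, 0, length($digits) - length($run));
    }
    return @counts;
}

# Sample contents: 2 visits Monday, 3 Tuesday, 2 Wednesday; today is Wednesday (3).
my @counts = recent_counts('1122233', 3);
print "today $counts[0], yesterday $counts[1], day before yesterday $counts[2]\n";
```

Reading from the end of the file means older weeks do not inflate the counts, and a day with no visits is reported as zero.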
- read default data
- read form data (override default data)
- display screen using current state (hidden) in form
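The three steps above can be sketched in perl. This is a minimal illustration under stated assumptions: the option names, their defaults, and the search.cgi action are hypothetical, and the query string is passed in as an argument rather than read from the CGI environment.

```perl
use strict;
use warnings;

# Step 1: default data (illustrative option names and values).
my %defaults = (query => '', format => 'FULL', highlight => 'off');

# Step 2: decode URL-encoded form data (name=value&name=value) into a hash.
sub parse_form {
    my ($query_string) = @_;
    my %form;
    for my $pair (split /&/, $query_string) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                 # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # %XX encodes one byte
        }
        $form{$name} = $value;
    }
    return %form;
}

# Escape text for safe use inside an HTML attribute value.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Step 3: form data overrides the defaults; the merged state is carried
# forward in hidden fields of the form that is displayed.
sub render_form {
    my ($query_string) = @_;
    my %state = (%defaults, parse_form($query_string));
    my $html = qq(<form method="get" action="search.cgi">\n);
    for my $name (sort keys %state) {
        my ($n, $v) = (html_escape($name), html_escape($state{$name}));
        $html .= qq(<input type="hidden" name="$n" value="$v">\n);
    }
    $html .= qq(<input type="submit" value="Search">\n</form>\n);
    return $html;
}

print render_form('query=Shneiderman%3Bvisualization&highlight=on');
```

Because each page carries its full state in hidden fields, the script needs no server-side session storage, and a button that turns on a single option only has to change one field.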