Debbie Hysell
and
Gary Perlman
OCLC Online Computer Library Center, Inc.
Abstract: To translate the FirstSearch interface and help into French and Spanish in 1998, OCLC developed an entity substitution technique and a special interface for translators and reviewers. For the development of FirstSearch 5.0, OCLC is modifying its approach to the translation process.
Presented at: IWIPS99: 1st International Workshop on Internationalization of Products & Systems, Rochester, New York, USA, May 21-22, 1999
The OCLC FirstSearch® service is an online reference service now used by nearly 15,000 subscribing libraries in 64 countries to meet the information needs of their users. It provides Web access to more than 85 databases, including WorldCat (the OCLC Online Union Catalog, which is the world's largest bibliographic database now containing more than 41 million records), H.W. Wilson Select, ATLA Religion, Business & Industry, The New York Times, and RILM Abstracts. FirstSearch also provides document delivery and interlibrary loan services.
In February 1998, OCLC launched a project to internationalize the FirstSearch interface to meet the needs of an expanding global audience. Our goal was to provide a multilingual interface and help system (initially French and Spanish) for library staff and patrons who are not native speakers of English. The internationalization effort focused on the user interface (system screens and help) to eliminate language as a barrier to effective navigation and retrieval of needed information. Database content was outside the project's scope, although some of the databases include records in the targeted languages and OCLC plans to add complete databases in other languages in the future.
Besides the overall objective of increasing user satisfaction and efficiency, we wanted to produce the French and Spanish interfaces as quickly as possible. A major redesign of the system was just being initiated. We needed to complete the translation of the current interface in time for users to reap some benefit from it, and also early enough for OCLC staff to learn from this implementation for the system redesign work (now scheduled for release in mid-1999).
Some of the challenges that we faced in implementing this multilingual interface included:
The net effect of internationalization on performance was positive because the new requirements, in turn, required us to take a new approach. The FirstSearch service is built on the OCLC SiteSearch software, which is a Web server that implements the Z39.50 information retrieval protocol. A heavily used feature of SiteSearch is entity substitution, which allows entities (variables) to be inserted into HTML and substituted (evaluated) on the server-side before delivery. For example, a string to display the current database might be:
Before internationalization, FirstSearch used hundreds of entities, substituted into thousands of uses per transaction, with acceptable performance. Mainly because SiteSearch had been developed as a research prototype, the storage and lookup algorithm would not scale up to thousands of entities substituted into tens of thousands of uses. The storage and lookup of entities were converted to hash tables, and entity-lookup time in the existing system dropped from about 45% of the time used by SiteSearch to about 3%. Because hashing allows control over the average access time, the time used for entity lookup remained at about 3%, even after more than 5,000 language-independent entities were added to FirstSearch.
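The effect of hash-based lookup on entity substitution can be illustrated with a minimal Python sketch. It is not the SiteSearch implementation: the `&name;` entity syntax, the function name `substitute_entities`, and the sample entity names are assumptions made for illustration only. A Python dict is a hash table, so the cost of each lookup stays roughly constant on average as the number of entities grows.

```python
import re

# Hypothetical entity syntax "&name;" (the actual SiteSearch syntax is an assumption here).
ENTITY_PATTERN = re.compile(r"&([A-Za-z_][A-Za-z0-9_.]*);")


def substitute_entities(template, entities):
    """Replace each &name; in the template with its value from the entities dict.

    Unknown entities are left untouched so that missing definitions stay visible.
    """
    def lookup(match):
        name = match.group(1)
        # Dict access is a hash-table lookup: average cost does not depend on table size.
        return entities.get(name, match.group(0))

    return ENTITY_PATTERN.sub(lookup, template)


# Illustrative entity names and values (hypothetical).
entities = {"dbname": "WorldCat", "label.currentdb": "Current database"}
page = "<b>&label.currentdb;:</b> &dbname; &undefined;"
print(substitute_entities(page, entities))
```

Leaving undefined entities in place, rather than substituting an empty string, is a design choice of this sketch: it makes untranslated or missing strings easy to spot in the rendered page.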
To ensure that translators would see all parts of the system, we prepared a translators' map that showed how to progress from screen to screen so that every screen would be translated. For the interface simulation tool to be effective, a low-graphics version of the interface was required to eliminate copyfitting problems with text in graphics (e.g., translated text strings that might be too long for use in icons and buttons). To develop the simulation tool, all the text strings from the interface had to be redefined consistently as entities. The help files and error messages were not translated through the interface simulation tool; instead, we sent those to the translators in HTML via electronic mail.
Design constraints. Some of the initial limits that we had imposed on the design of the multilingual interface were later found to be unnecessary and unacceptable. For example, we originally intended for users to select a language when first logging on to FirstSearch. We found that users and OCLC staff preferred to be able to switch back and forth from one language to another on any screen. We decided to implement this additional flexibility when we discovered that it would not take significant resources.
Controlled vocabulary. Lack of a controlled vocabulary in all three languages posed problems. We found that inconsistent word choice and the use of acronyms in the interface slowed the translation process. We also found that disagreement among reviewers about the most appropriate vocabulary in French and Spanish delayed the revision process. This same type of debate over word choice occurs among reviewers of the English text as well, but because the translators and reviewers are located at remote locations, communicating about desired changes is more difficult and time-consuming.
Change control. Identifying and communicating changes in the interface and help files were difficult.
Communication problems. The team participating in this project was a virtual team, located in five countries on both sides of the Atlantic, communicating largely via e-mail. We experienced the typical misunderstandings associated with this medium. However, we also shared a sense of esprit de corps and excitement about the project's purpose and ultimate success.
Creeping English. Locating all the entities that required translation using the interface simulation tool was time-consuming. Throughout the project, we found English text tucked away in corners of the interface not yet inspected by translators and reviewers.
In FirstSearch 4.0, all entities were at the same level, by which we mean that an entity named search could be a verb or a noun, possibly upper case, but probably lower case. In part because of the novelty of the internationalization process, in part because some names were chosen before others, we had no naming conventions for language-independent strings. The main design strategy for FirstSearch 5.0 has been to identify the role of the entity, and, from that, to determine how the entity should be used and displayed. For example, several entities are named prefs (short for preferences): hotlink, icon tooltip, page title, submit button, help, status, etc. Each is stored in a separate section of a Windows-style INI (initialization) file, along with similar entities. The prefs entity for a page title is stored with other page titles in a pagetitle section:
[pagetitle]
prefs = Set Options
search = Basic Search
advanced = Advanced Search
The section determines how the variables should be displayed, in this case, as a title with mixed-case letters. Other rules apply to other sections, but within a section, the use is consistent. Naturally, the values of section entities differ from the names, but note that in the case of prefs, the name and value differ in English as well. Often, the terminology in a system cannot be decided before usability testing and other forms of quality assurance; in an internationalized system, terminology can be translated into each language, including the "native" language, from the terminology used internally.
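A minimal sketch of role-based lookup, using Python's standard `configparser` module, shows how the same entity name can carry different values in different sections. The `[pagetitle]` values come from the text; the `[button]` section, its value, the French string, and the helper names `load_language` and `entity` are hypothetical placeholders rather than actual FirstSearch content.

```python
import configparser

# English entities: [pagetitle] is taken from the text; [button] is a hypothetical section.
EN_INI = """
[pagetitle]
prefs = Set Options
search = Basic Search
advanced = Advanced Search

[button]
search = Search
"""

# Placeholder French value for illustration only, not the actual translation.
FR_INI = """
[pagetitle]
search = Recherche de base
"""


def load_language(ini_text):
    """Parse one language file into a section -> name -> value mapping."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser


languages = {"en": load_language(EN_INI), "fr": load_language(FR_INI)}


def entity(lang, section, name):
    """Look up an entity by language, role (section), and name."""
    return languages[lang][section][name]


print(entity("en", "pagetitle", "search"))  # the page-title role of "search"
print(entity("en", "button", "search"))     # the button role of the same name
print(entity("fr", "pagetitle", "search"))
```

Because the role is part of the key, a caller never has to guess whether `search` is a noun, a verb, or a title; the section it is requested from settles that.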
For FirstSearch 5.0, all language entities are held in language-specific files, with names based on two-character ISO 639 language codes, such as en.ini (English), es.ini (Español), and fr.ini (Français). Each language file has sections in several categories:
These have helped developers know where to find existing entities and create new ones that are consistent with the styles used in the system. The separation from the rest of the system (user interface style, functional aspects) has made it possible for non-programmers to make changes independent of other development. We are in the process of moving all the information in the language files into a content management system so that all language-specific material will be controlled from one source. We do not yet have the experience to tell if these measures will simplify translation, but we are in a much better position to say, for any entity, how and where it is used and what it should look like.
Glossary. We are creating a translation glossary to aid the translators in their selection of appropriate words. The English entity file (en.ini) will be copied to our content management system (an object-oriented database management system). There, along with the entity names and entities (text strings), we will store definitions and any comments that might help the translators (such as terminology preferences expressed by users and reviewers since the last translation).
Terminology review. When we are ready for the translation to begin, we will export from the content management system a Microsoft Word table containing:
The translators will select from the table any entities (terms) that might be used differently in different regions or among users of different backgrounds. The translators will enter the translation, along with any comments, and will then send the file to reviewers (via the content management system). While the reviewers are approving or suggesting changes to the first group of terms, the translators will continue to translate the remaining entities (those considered to be less debatable). The reviewers will return their comments to the translators and, if needed, a conference call will be arranged to resolve remaining issues. The second group of translated entities will be stored in the content management system without this pre-review process.
The translated entities will be exported to the development environment, where they will be available to translators and reviewers alike for checking the accuracy in the context of other text and formatting.
The en.ini file will be copied over to the content management system at regular intervals and will be compared with entities already in the database. Changed and new entities will be highlighted and sent sequentially to technical writers for review and definition, then to translators for translation, and finally to reviewers for review.
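The comparison step can be sketched as a diff between the entities already stored and the current en.ini. This is only an illustration under stated assumptions: the content management system's actual comparison mechanism is not described here, the function names `read_entities` and `diff_entities` are hypothetical, and the "stored" snapshot (including the earlier value `Preferences`) is invented sample data.

```python
import configparser


def read_entities(ini_text):
    """Flatten an INI file into a {(section, name): value} dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        (section, name): value
        for section in parser.sections()
        for name, value in parser[section].items()
    }


def diff_entities(stored, current):
    """Return (new, changed) entity keys in current relative to stored."""
    new = sorted(key for key in current if key not in stored)
    changed = sorted(
        key for key in current if key in stored and stored[key] != current[key]
    )
    return new, changed


# Hypothetical earlier snapshot held in the content management system.
stored = read_entities("""
[pagetitle]
prefs = Preferences
search = Basic Search
""")

# Current en.ini, using the [pagetitle] values shown in the text.
current = read_entities("""
[pagetitle]
prefs = Set Options
search = Basic Search
advanced = Advanced Search
""")

new, changed = diff_entities(stored, current)
print("new:", new)
print("changed:", changed)
```

Only the keys reported as new or changed would then need to be routed to technical writers, translators, and reviewers; unchanged entities keep their existing translations.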
Because the help system for FirstSearch 5.0 is multi-level (basic, advanced, and expert), we are using SGML to reduce redundancy of content and to maximize its reuse. The help files will also be stored in the content management system and will be routed for translation and review in a method similar to that used for the entities.
The process for routing entities for translation and review using the content management system should increase the accuracy of the translation and user satisfaction. Because the entities are under more rigorous control in the development environment, they are more likely to be used consistently in all languages. Providing translators and reviewers with definitions, comments, and an opportunity to interact should resolve any difficulties before they become too difficult to correct. Again, because control of the entities has been improved, it should be easier to implement changes.
OCLC has recently begun using Microsoft NetMeeting and WhitePine MeetingPoint software to conduct remote usability tests. This has allowed us to include, in testing our services, users whom we would previously have had to bypass because of the time and expense involved in bringing them to the usability lab at the OCLC headquarters in Dublin, Ohio. For FirstSearch 5.0, we will define tasks to be performed using the Spanish and French interfaces and their help systems and recruit users in France, Canada, Spain, and Latin America to participate.
Because our processes have been improved through better entity control and the use of SGML and a content management system, we expect that we will be able to respond effectively to whatever enhancements the usability tests suggest.
Areas that we plan to investigate in future internationalization efforts are: