Lessons Learned from Internationalizing a Global Resource

Debbie Hysell and Gary Perlman
OCLC Online Computer Library Center, Inc.

Contents

1 The First Translation of FirstSearch

1.1 Need for Performance Improvement
1.2 The Translation Tool
1.3 The Translation Process
1.4 Difficulties

2 Applying the Lessons Learned

2.1 Simply Stated Rules
2.2 Need for Language Control in Advance
2.3 Need for Process Simplification
2.4 Usability Testing

3 Conclusions

Abstract
To translate the FirstSearch interface and help into French and Spanish in 1998, OCLC developed an entity substitution technique and a special interface for translators and reviewers. For the development of FirstSearch 5.0, OCLC is modifying its approach to the translation process.

Presented at: IWIPS99: 1st International Workshop on Internationalization of Products & Systems, Rochester, New York, USA, May 21-22, 1999

The OCLC FirstSearch® service is an online reference service now used by nearly 15,000 subscribing libraries in 64 countries to meet the information needs of their users. It provides Web access to more than 85 databases, including WorldCat (the OCLC Online Union Catalog, which is the world's largest bibliographic database now containing more than 41 million records), H.W. Wilson Select, ATLA Religion, Business & Industry, The New York Times, and RILM Abstracts. FirstSearch also provides document delivery and interlibrary loan services.

In February 1998, OCLC launched a project to internationalize the FirstSearch interface to meet the needs of an expanding global audience. Our goal was to provide a multilingual interface and help system (initially French and Spanish) for library staff and patrons who are not native speakers of English. The internationalization effort focused on the user interface (system screens and help) to eliminate language as a barrier to effective navigation and retrieval of needed information. Database content was outside the project's scope, although some of the databases include records in the targeted languages and OCLC plans to add complete databases in other languages in the future.

Besides the overall objective of increasing user satisfaction and efficiency, we wanted to produce the French and Spanish interfaces as quickly as possible. A major redesign of the system was just being initiated. We needed to complete the translation of the current interface in time for users to reap some benefit from it, and also early enough for OCLC staff to learn from this implementation for the system redesign work (now scheduled for release in mid-1999).

Some of the challenges that we faced in implementing this multilingual interface included:

Design not optimal for translation. The FirstSearch service was introduced in 1991 and has undergone three major development changes since then. The original system design and subsequent changes had not anticipated the need for translations. For example, the methods of handling text strings (screen entities, error messages, and generated system responses) had been inconsistent.

Chasing a moving target. The fast pace of updates to the interface and databases continued, necessitating monthly maintenance releases, as we progressed toward completing the translations of the system screens and related help. This made change control and coordination more difficult.

Localization. The French interface was intended for use in France, Canada, and Francophone countries, while the Spanish interface was intended for users in Spain, Spanish-speaking Latin America, and Hispanic communities in the United States. The translators' word choice and style needed to be appropriate for audiences at opposite points on the globe.

Limited resources. Because OCLC staff were maintaining the existing system as well as beginning work on the design of a new FirstSearch service, resources were scanty for this internationalization effort.

1 The First Translation of FirstSearch

1.1 Need for Performance Improvement

The net effect of internationalization on performance was positive because the new requirements forced us to take a new approach. The FirstSearch service is built on the OCLC SiteSearch software, which is a Web server that implements the Z39.50 information retrieval protocol. A heavily used feature of SiteSearch is entity substitution, which allows entities (variables) to be inserted into HTML and substituted (evaluated) on the server side before delivery. For example, a string to display the current database might be:

Database: &dbname;

When searching WorldCat, the entity dbname (delimited like SGML/HTML entities with & and ;) would display as:

Database: WorldCat

One method for internationalization called for entities to be used for all strings. The label Database: would be replaced by &CurrentDatabaseLabel;, which, in English, would be set to:

Database:

and, in French (note the space before the colon), to:

Base de données :

and, in Spanish, to:

Base de datos:

Before internationalization, FirstSearch used hundreds of entities, substituted into thousands of uses per transaction, with acceptable performance. Mainly because SiteSearch had been developed as a research prototype, its storage and lookup algorithm would not scale up to thousands of entities substituted into tens of thousands of uses. We converted the storage and lookup of entities to hash tables, and entity-lookup time in the existing system dropped from about 45% of the time used by SiteSearch to about 3%. Because hashing allows control over the average access time, the time used for entity lookup remained at about 3%, even after more than 5,000 language-independent entities were added to FirstSearch.
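The substitution and hash-based lookup described above can be sketched as follows. This is a minimal illustration in Python, not the SiteSearch implementation: the function name substitute, the ENTITIES tables, and the session argument are assumptions made for the example, and Python dictionaries stand in for the hash tables.

```python
import re

# Hypothetical per-language entity tables. Python dicts are hash tables,
# so the average lookup cost does not grow with the number of entities.
ENTITIES = {
    "en": {"CurrentDatabaseLabel": "Database:"},
    "fr": {"CurrentDatabaseLabel": "Base de données :"},
    "es": {"CurrentDatabaseLabel": "Base de datos:"},
}

# Entities are delimited like SGML/HTML entities: &name;
ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def substitute(template, language, session):
    """Replace each &name; with its value; unknown entities are left as-is."""
    table = {**ENTITIES[language], **session}

    def lookup(match):
        # Fall back to the original text (e.g. a real HTML entity such as &amp;).
        return table.get(match.group(1), match.group(0))

    return ENTITY_PATTERN.sub(lookup, template)


template = "&CurrentDatabaseLabel; &dbname;"
for language in ("en", "fr", "es"):
    print(substitute(template, language, {"dbname": "WorldCat"}))
```

The same template yields the English, French, and Spanish labels followed by the database name, so only the entity tables differ between languages.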

1.2 The Translation Tool

A translators' interface simulation tool was developed to facilitate the translation process by letting the translators view the FirstSearch screens and translate them from within the Web browser. This tool was actually a working version of the current FirstSearch Web user interface. It allowed translators at remote locations to log on to FirstSearch, navigate sequentially through the FirstSearch screens and, for each screen, switch to a translation screen to enter the text strings in French or Spanish, and then switch back to the original interface screen to preview the effect of the translation (copy fitting on the screen, appropriate line breaks, etc.).

To ensure that translators would see all parts of the system, a translators' map was prepared telling how to progress from screen to screen and ensure that all would be translated. For the interface simulation tool to be effective, a low-graphics version of the interface was required to eliminate problems associated with the need to copy fit text for graphics (e.g., translated text strings that might be too long for use in icons and buttons). To develop the simulation tool, all the text strings from the interface had to be redefined consistently as entities. The help files and error messages were not translated through the interface simulation tool; instead, we sent those to the translators in HTML via electronic mail.

1.3 The Translation Process

We contracted with translators who were native speakers of French and Spanish and were experienced with translating library materials. We sent the translators the URL of the interface simulation tool, the translators' map (for successfully navigating through all screens included in the system), and copies of the HTML help files. Within three months, the translators completed the first round of interface and help translations. During the next few months, we arranged for review by bilingual staff in OCLC's international divisions as well as by selected librarians in several countries in Europe and the Americas. While review progressed, the translators went back and translated the text that had changed or been added to the interface and help system while the initial translation was in progress. In August, final review, revision, and testing were completed.

1.4 Difficulties

Design constraints. Some of the initial limits that we had imposed on the design of the multilingual interface were later found to be unnecessary and unacceptable. For example, we originally intended for users to select a language when first logging on to FirstSearch. We found that users and OCLC staff preferred to be able to switch back and forth from one language to another on any screen. We decided to implement this additional flexibility when we discovered that it would not take significant resources.

Controlled vocabulary. Lack of a controlled vocabulary in all three languages posed problems. We found that inconsistent word choice and the use of acronyms in the interface slowed the translation process. We also found that disagreement among reviewers about the most appropriate vocabulary in French and Spanish delayed the revision process. This same type of debate over word choice occurs among reviewers of the English text as well, but because the translators and reviewers work at remote locations, communicating about desired changes is more difficult and time-consuming.

Change control. Identifying and communicating changes in the interface and help files were difficult.

Communication problems. The team participating in this project was a virtual team, located in five countries on both sides of the Atlantic, communicating largely via e-mail. We experienced the typical misunderstandings associated with this medium. However, we also shared a sense of esprit de corps and excitement about the project's purpose and ultimate success.

Creeping English. Locating all the entities that required translation using the interface simulation tool was time-consuming. Throughout the project, we found English text tucked away in corners of the interface not yet inspected by translators and reviewers.

2 Applying the Lessons Learned

The transition from FirstSearch 4.0 to 5.0 has included a nearly complete change of software and hardware. SiteSearch has moved from C and several proprietary data manipulation languages to Java, and the user interface has been completely redesigned to take advantage of the new possibilities. This has required (and allowed) us to redesign the internationalization strategy, ideally, with fewer problems.

In FirstSearch 4.0, all entities were at the same level, by which we mean that an entity named search could be a verb or a noun, possibly upper case, but probably lower case. In part because of the novelty of the internationalization process, in part because some names were chosen before others, we had no naming conventions for language-independent strings. The main design strategy for FirstSearch 5.0 has been to identify the role of the entity, and, from that, to determine how the entity should be used and displayed. For example, several entities are named prefs (short for preferences), one for each role: hotlink, icon tooltip, page title, submit button, help, status, etc. Each is stored in a separate section of a Windows-style INI (initialization) file, along with similar entities. The prefs entity for a page title is stored with other page titles in a pagetitle section:

[pagetitle]
prefs = Set Options
search = Basic Search
advanced = Advanced Search

The section determines how the variables should be displayed, in this case, as a title with mixed-case letters. Other rules apply to other sections, but within a section, the use is consistent. Naturally, the values of section entities differ from the names, but note that in the case of prefs, the name and value differ in English as well. Often, the terminology in a system cannot be decided before usability testing and other forms of quality assurance; in an internationalized system, terminology can be translated into each language, including the "native" language, from the terminology used internally.
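As a rough illustration of how a section can determine both where an entity is found and how it is displayed, the following Python sketch reads an INI fragment with the standard configparser module. Only the pagetitle section comes from the example above; the button section, the DISPLAY_RULES table, and the function names are assumptions made for illustration.

```python
import configparser

# The pagetitle section is from the text; the button section is hypothetical.
EN_INI = """
[pagetitle]
prefs = Set Options
search = Basic Search
advanced = Advanced Search

[button]
prefs = set options
search = search
"""

# Hypothetical display rules: one rule per section, applied consistently.
DISPLAY_RULES = {
    "pagetitle": str.title,      # titles in mixed case
    "button": str.capitalize,    # button labels with an initial capital
}


def load_entities(text):
    """Parse an INI language file into a ConfigParser."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str     # keep entity names case-sensitive
    parser.read_string(text)
    return parser


def display(parser, section, name):
    """Look up an entity by role (section) and name, then apply the section rule."""
    value = parser[section][name]
    rule = DISPLAY_RULES.get(section, lambda s: s)
    return rule(value)


entities = load_entities(EN_INI)
print(display(entities, "pagetitle", "prefs"))
print(display(entities, "button", "prefs"))
```

The same name, prefs, resolves to a different value and display style depending on its role, which is the point of grouping entities by section.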

For FirstSearch 5.0, all language entities are held in language-specific files, with names based on two-letter ISO 639 language codes, like en.ini (English), es.ini (Español), and fr.ini (Français). Each language file has sections in several categories:

Parts of pages: labels, titles, purposes, tips, and status information

Database information: short and long database descriptions, index names

Data values: field names, values of some fields (e.g., record type, translated names of languages)

User interface: prompts, actions (e.g., labels for buttons)

Screen-specific: items used on only one screen (e.g., table headings, on-screen help)

These categories have helped developers know where to find existing entities and create new ones that are consistent with the styles used in the system. The separation from the rest of the system (user interface style, functional aspects) has made it possible for non-programmers to make changes independent of other development. We are in the process of moving all the information in the language files into a content management system so that all language-specific material will be controlled from one source. We do not yet have the experience to tell if these measures will simplify translation, but we are in a much better position to say, for any entity, how and where it is used and what it should look like.

2.1 Simply Stated Rules

In addition to standardizing the handling of entities, we have also developed rules to guide development staff in creating entities for the FirstSearch 5.0 interface:

No English in Java or HTML code. All English strings must be placed in en.ini (the English entity file) for translation. Any strings of one or more alphabetic characters that will be visible to users must be replaced by the corresponding &Lang.section.varname; entity.
No display-controlling HTML in Java. All HTML that controls the layout of the display should be placed in guiStyle.ini so that it can be modified for the Lynx version. Bad HTML to use: tables, fonts, tables, images, tables, and tables. Okay HTML to use: anchors, form elements. Borderline HTML: paragraphs, headings, lists, bold, italic. Form elements should have title attributes with language entities (for accessibility).
No references to images with image tags. All images should be defined in InterfaceStyle.ini, with ALT text defined in en.ini, so that they can be replaced by plain text for the Lynx version. References to images should look like: &Interface.images.name;.
No references to filenames. All references to filenames should be through entities in guiStyle.ini so that the Lynx version can refer to its own set of files. References to filenames should look like: &StyleTable.page.pagename;.
No new formatting classes. Brief records are formatted with BERBriefFmt.java. Full records are formatted with BERFullFmt.java. Both extend FirstSearchFormat.java, which contains shared methods, e.g., to determine the type of records. Distinct formatting of records should be based on record type, for which there are constants in FirstSearchFormat.java, such as ARTICLE, BOOK, and CONTENTS.
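The first rule lends itself to automated checking. The following Python sketch resolves &Lang.section.varname; references against a language table and flags alphabetic text left outside entities and tags. It is an illustration only: the LANG table, the function names, and the regular expressions are assumptions, and the check ignores text inside tag attributes.

```python
import re

# Hypothetical contents of en.ini, already parsed into nested dictionaries.
LANG = {
    "pagetitle": {"search": "Basic Search"},
    "button": {"search": "Search"},
}

LANG_ENTITY = re.compile(r"&Lang\.(\w+)\.(\w+);")
ANY_ENTITY = re.compile(r"&[\w.]+;")
TAG = re.compile(r"<[^>]*>")


def resolve(template, lang):
    """Replace each &Lang.section.varname; with its value from the language table."""
    def lookup(match):
        section, name = match.group(1), match.group(2)
        return lang[section][name]   # a KeyError points at a missing entity
    return LANG_ENTITY.sub(lookup, template)


def hardcoded_text(template):
    """Return alphabetic runs that are visible to users but are not entities."""
    visible = ANY_ENTITY.sub(" ", TAG.sub(" ", template))
    return re.findall(r"[A-Za-z]+", visible)


print(resolve("<h1>&Lang.pagetitle.search;</h1>", LANG))
print(hardcoded_text("<p>Database: &dbname;</p>"))
```

A check of this kind, run over templates before each release, is one way to catch English strings before translators and reviewers have to find them by inspection.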

2.2 Need for Language Control in Advance

Glossary. We are creating a translation glossary to aid the translators in their selection of appropriate words. The English entity file (en.ini) will be copied to our content management system (an object-oriented database management system). There, along with the entity names and entities (text strings), we will store definitions and any comments that might help the translators (such as terminology preferences expressed by users and reviewers since the last translation).

Terminology review. When we are ready for the translation to begin, we will export from the content management system a Microsoft Word table containing:

Entity name
Entity
Definition (when appropriate)
Comments
Location (screen or context in which entity is used)
Translation of any matching entities from FirstSearch 4.0
Space for entry of the 5.0 translation in the target language
Space for translation comments

The translators will select from the table any entities (terms) that might be used differently in different regions or among users of different backgrounds. The translators will enter the translation, along with any comments, and will then send the file to reviewers (via the content management system). While the reviewers are approving or suggesting changes to the first group of terms, the translators will continue to translate the remaining entities (those considered to be less debatable). The reviewers will return their comments to the translators and, if needed, a conference call will be arranged to resolve remaining issues. The second group of translated entities will be stored in the content management system without this pre-review process.

The translated entities will be exported to the development environment, where they will be available to translators and reviewers alike for checking the accuracy in the context of other text and formatting.

The en.ini file will be copied over to the content management system at regular intervals and will be compared with entities already in the database. Changed and new entities will be highlighted and sent sequentially to technical writers for review and definition, then to translators for translation, and finally to reviewers for review.
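The comparison step can be sketched as a simple difference between two snapshots. In this Python illustration, entities are keyed by (section, name) pairs; the function name compare_entities and the sample data are assumptions, not part of the content management system.

```python
def compare_entities(stored, current):
    """Return (new, changed) entities, keyed by (section, name).

    stored:  entities already in the content management system
    current: entities from the latest copy of en.ini
    """
    new = {key: value for key, value in current.items() if key not in stored}
    changed = {
        key: (stored[key], value)          # (old text, new text)
        for key, value in current.items()
        if key in stored and stored[key] != value
    }
    return new, changed


# Hypothetical snapshots for illustration.
stored = {
    ("pagetitle", "prefs"): "Set Options",
    ("pagetitle", "search"): "Search",
}
current = {
    ("pagetitle", "prefs"): "Set Options",
    ("pagetitle", "search"): "Basic Search",
    ("pagetitle", "advanced"): "Advanced Search",
}

new, changed = compare_entities(stored, current)
print("New:", new)
print("Changed:", changed)
```

Only the new and changed entries would then need to be routed to technical writers, translators, and reviewers; unchanged entities keep their existing translations.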

Because the help system for FirstSearch 5.0 is multi-level (basic, advanced, and expert), we are using SGML to reduce redundancy of content and to maximize its reuse. The help files will also be stored in the content management system and will be routed for translation and review in a method similar to that used for the entities.

The process for routing entities for translation and review using the content management system should increase the accuracy of the translation and user satisfaction. Because the entities are under more rigorous control in the development environment, they are more likely to be used consistently in all languages. Providing translators and reviewers with definitions, comments, and an opportunity to interact should resolve problems before they become too difficult to correct. Again, because control of the entities has been improved, it should be easier to implement changes.

2.3 Need for Process Simplification

The previous translation process based on a translators' interface was effective in allowing translators to see immediately the effect of their translations in the interface. However, it was painfully slow for translators to bring up each possible screen for more than 60 FirstSearch databases to ensure that they had seen and translated every possible text string. Translating the entities first will be more efficient for the translators and reviewers alike. In most cases, the entities contain a large enough unit of text for an accurate translation without seeing the full context.

2.4 Usability Testing

OCLC has recently begun using Microsoft NetMeeting and WhitePine MeetingPoint software to conduct remote usability tests. This has allowed us to include, in testing our services, users whom we would have previously had to bypass because of the time and expense involved in bringing them to the usability lab at the OCLC headquarters in Dublin, Ohio. For FirstSearch 5.0, we will define tasks to be performed using the Spanish and French interfaces and their help systems and recruit users in France, Canada, Spain, and Latin America to participate.

Because our processes have been improved through better entity control and the use of SGML and a content management system, we expect that we will be able to respond effectively to whatever enhancements the usability tests suggest.

3 Conclusions

Much of our experience in developing a multilingual interface for FirstSearch can be extended to the internationalization of other interfaces:

Any preparation separating and organizing translatable text will help in the translation process, but there will always be special cases that will cause problems.
Identifying what something is, and where and when it is used, is a continuing problem. Identifying what has changed from version to version is less of a problem, but important for maintaining translated versions on the same schedule.
Internationalization and translation must be addressed along with other concerns (e.g., performance, accessibility, and cross-platform portability).
It is difficult to communicate the subtleties of a multilingual interface to developers. Even if developers understand the issues, subtle and unrealized language differences (e.g., word order, plurals, and gender) make it difficult for them to make decisions. Review is always recommended, both by automated tools and by people with more expertise.
Developing a user interface while attending to all the concerns of translation, accessibility, etc., can seriously slow development. Often, it is easier to develop a prototype for a few screens in a platform-specific, language-specific form, followed by a cleanup (possibly by others with more expertise in those areas).

Areas that we plan to investigate in future internationalization efforts are:

Use of a simplified English vocabulary to improve the usability of text and to ease the translation task
Development of language-free and culturally neutral icons
Provision of easy access to machine-translation tools (like AltaVista's Babelfish) for users of databases available only in a single language