An Electronic Reading-Text of Chaucer's Canterbury Tales

By Paul G. Remley

Copyright 1993 by the author.

The capability of computers to mark the beginning of the third millennium by inaugurating a new phase of humanistic scholarship was recognized long ago. Monographs addressing the subject of computing in the humanities have appeared frequently over the course of the past twenty-five years1 and advanced research on the use of computers in literary scholarship is in its third decade.2 The journal Computers and the Humanities was established in 1969 and numerous serial publications have since followed its lead.3 In the area of computer-assisted pedagogy, there has been a proliferation of computational innovations across the curriculum.4 Nevertheless, after more than a quarter-century of intensive work, much of the initial promise of computer use in higher education remains unfulfilled. For a number of reasons, notably the vast discrepancy between the pace of advances in computer technology and increases in available resources for academic hardware acquisition, the "classrooms of the future" at most universities are still in early phases of development. Progress in another practical discipline, that of computer-assisted textual analysis, has also been slow. 
Despite intensive work in the field, few standard procedures for the treatment of texts with the aid of computers have achieved any measure of wide acceptance.5 Plans to utilize computers in the production of critical editions of texts, catalogues of manuscripts, full-text literary databases, stemmatic analyses of textual variants, digitized facsimiles of manuscript leaves, and so on, have all been announced, but specific applications often remain accessible only to select groups of specialists or have failed to emerge altogether.6 With the notable exception of the field of computer-assisted lexicography, which has already succeeded both in generating a wide range of publications and producing a substantial number of tangible results,7 it seems reasonable to conclude that humanistic computing is still at an early stage.

Many of the factors that might be adduced to account for these circumstances are wholly extradisciplinary: the public at large has had very limited access to powerful, multifunctional computers and the cost of disk storage remains dear, especially in comparison to that of traditional media. To date, there simply has been no pervasive replacement of traditional printed texts by electronically encoded text-files for purposes of the preservation and consultation of texts. It is still not entirely clear that the impact of the widespread reading of electronically transmitted texts will ever be as great as that occasioned by the transition from, say, oral to literary transmission of texts or the supersession of the scriptorium by the printing-plant.8

There are a few signs, however, that some changes may be imminent. As the footnotes to this article attest, recent years have seen a huge increase in the number of publications on computer-assisted pedagogy, textual research, and related areas. A significant amount of recent work on computer-assisted analysis of texts has been undertaken in service of research on medieval literature.9 Standard (and, previously, sometimes unwieldy) reference works have been published in computer-readable formats, among them the second edition of the Oxford English Dictionary, the Modern Language Association's International Bibliography, the Corpus Christianorum (CETEDOC), and the Riverside Chaucer, alongside other conspicuous diskette releases from major university presses. A few compendious and almost indispensable textual corpora have only appeared in computer-readable form (e.g., the IBYCUS [Latin] and Thesaurus Linguae Graecae CD-ROMs, materials assembled for the Dictionary of Old English, etc.). The past few years have also seen distribution of high-quality scholarly journals such as the Bryn Mawr Classical Review over international electronic-messaging networks and a huge and diverse group of computer users in all parts of the world have in fact become acclimated to reading texts on computer in the course of their daily exchanges of electronic mail. All of these developments have profound implications that may well become more widely recognized as the years progress. Since 1990, moreover, there has been an enormous increase in the power of generally available microcomputing hardware as well as a concurrent drop in price. It appears that the point may soon be reached at which computer-literate educators and researchers will be able to develop powerful applications suited to their own work on an ad hoc basis. The general release of flexible "authoring systems" has placed access to a wide variety of high-level computational routines in the hands of an unprecedented number of users.10

The main concerns of the present discussion are pedagogical. The primary goal of the project outlined below is the production of an electronically encoded reading-text of Chaucer's Canterbury Tales, specifically designed to assist undergraduate students in exploring this medieval work in its original (Middle English) language. The proposed text differs fundamentally from, say, the diskette release of the Riverside Chaucer in that it lays no claim to the status of a critical edition and is intended to serve mainly as a tool in the classroom. The following discussion, however, will also address a number of issues that have arisen in the course of the development of the text that bear more directly on areas of research in textual studies (such as lemmatization, homography, and full-text searching by collocation) as well as several kindred scholarly disciplines (e.g., lexicography and textual editing). Above all, I have tried to incorporate into the present essay a bibliographical summary of publications in these areas that have appeared over the course of recent decades. Indeed, this may be one of the last opportunities to undertake such a review in the space of a single article.

The Reading-Text of Chaucer's Canterbury Tales

The set of features that distinguishes the classroom text of the Canterbury Tales described below from other editions is fairly circumscribed. The appearance of the user interface, described in more detail below, in most essentials resembles that of a conventional printed book of poetry. The Middle English text of the Tales normally provides the only words visible to the user on the display of the computer. Students may obtain a Modern English explanation of any word in the Chaucerian text at any time by selecting it on-screen by means of a trackball, mouse, or similar pointing device. This explanation or gloss regularly contains a generous and diverse selection of possible synonyms for the chosen term and a brief linguistic gloss (e.g., "noun" or "comp. adj."). Students may, in addition, step through passages containing all occurrences of a word or phrase of their choosing by searching for a specified string of characters. Passages containing groups or collocations of several different terms may also be tracked down easily in the Chaucerian text. It is only necessary to specify multiple terms when invoking the search. A search for multiple terms will only succeed when the terms are found within a given, user-specified number of lines. So far, apart from the circumstance that it can only be consulted on a computer, the system described here would seem to offer few capabilities beyond those already available in any standard printed edition of Chaucer containing both a set of marginal glosses and a decent glossary. The reading-text includes two specific innovations, however, which set it apart from the standard classroom editions. First, although all searching results in the appearance of passages of Middle English, the term(s) specified for a search may be drawn from the vocabulary of either Middle English or Modern English. 
This last capability is made possible by the stipulation that all searching should take place simultaneously in (1) the Middle English text and (2) an off-screen buffer, termed below the shadow-text, which contains the Modern English treatments of individual terms that appear in the user-selected glosses described above. The whole process may be termed proximity-searching by collocation.11 One beneficial result of this procedure is that the specification of a particular group of Modern English terms will in almost every instance produce a number of different passages deserving individual consideration. Passages are selected on the basis of their inclusion of terms corresponding to all of the terms specified for the search and their proximity within a user-specified number of lines. As the merits of these must be evaluated independently by the user, the whole system has a tendency to encourage students to explore the text of the Tales at will and to evaluate discrete passages critically with little or no intervention on the part of the instructor.
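The search procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine used in the developmental system: the function name proximity_search, the list-of-lists representation of the text and shadow-text, and the sample glosses are all hypothetical. A term counts as found on a line if it matches either a Middle English word on that line or a Modern English word in one of that line's glosses, and a passage is reported only when every term is found within the specified number of lines.

```python
def proximity_search(text_lines, shadow_lines, terms, window):
    """Return (first_line, last_line) index pairs for passages in which every
    search term occurs within `window` consecutive lines.

    text_lines:   one list of Middle English words per line
    shadow_lines: one list of gloss strings per line, parallel to text_lines
    (both structures are assumptions made for this sketch)
    """
    terms = [t.lower() for t in terms]

    def line_matches(i, term):
        # Search the Middle English text and the shadow-text simultaneously.
        if term in (w.lower() for w in text_lines[i]):
            return True
        return any(term in gloss.lower().split() for gloss in shadow_lines[i])

    hits = []
    for start in range(len(text_lines)):
        end = min(start + window, len(text_lines))
        # Only report passages that begin on a line containing some term.
        if not any(line_matches(start, t) for t in terms):
            continue
        if all(any(line_matches(i, t) for i in range(start, end)) for t in terms):
            hits.append((start, end - 1))
    return hits


# Illustrative data: two lines of the General Prologue with invented glosses.
text = [
    ["Whan", "that", "Aprill", "with", "his", "shoures", "soote"],
    ["The", "droghte", "of", "March", "hath", "perced", "to", "the", "roote"],
]
shadow = [
    ["when", "that", "April", "with", "his", "showers rain", "sweet fragrant"],
    ["the", "drought dryness", "of", "March", "has", "pierced penetrated",
     "to", "the", "root"],
]
```

With this data, a search for the Modern English terms "rain" and "drought" within two lines returns the opening couplet, although neither word appears in the Middle English text itself.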

In the course of development, each of the two main features outlined here--on-screen glossing of vocabulary and proximity-searching by collocation--has revealed previously unsuspected benefits. The one-to-one correspondence resulting from the process of supplying each word of the Canterbury Tales with its own gloss (described in greater detail below) has in practice defined a generalized textual structure (or document architecture) capable of accommodating a scholarly apparatus of virtually any degree of complexity, precision, or erudition. Improvements to the glossing apparatus, however, must in every case be introduced through the labors of an expert, so this structural flexibility should for the moment be considered a long-term benefit of the system. The ability to carry out multiterm searches of a Middle English text by specifying search terms in Modern English produces a benefit that is more immediately available. The passages set out for the user's inspection in many cases embody what amount to conceptual equivalents of the generalized semantic range of the terms specified for the search. The Middle English text is revealed in a way hitherto unattainable through the use of conventional glossaries and lexicons. It becomes an easy matter for the student (or scholar) to search the literary text for an almost infinite variety of themes, formulas, topoi, examples of medieval rhetorical descriptio, and the like, without the necessity of undertaking an arduous program of indexing.

Beyond the fairly restricted set of features uniquely available within the computing environment, the model used for the electronic reading-text is conventional and largely in line with the history of texts reaching back to the codex and scroll. The fundamental computational "metaphor" of the project is that of the book. After launching or "opening" the reading environment containing the text, readers are first confronted with a decorated "cover" illustration. Readers may then turn to a Table of Contents containing coded pointers to the beginning of individual texts, allowing them to move to the beginning of a particular head-link, end-link, or Tale by selecting its title with the pointing device. Beyond the unit of the book, the fundamental divisions of the electronically encoded text are the page and the line.12 The sense in which these terms are used here differs little from that encountered in references to traditional parchment or paper documents. There are twenty-five lines to a page; page-breaks are mainly arbitrary--"hard" page-breaks occur only at the boundaries of discrete head-links, end-links, and Tales--due to the dimensions of the page and the custom typeface that appears on the computer display. A variable-width, Times Roman-based typeface is aligned with the left margin (or both margins in the case of continuous prose); pages are also supplied with graphical approximations of standard running heads and page numbers. One non-traditional feature that does make special use of the computer interface is the provision of limited scrolling abilities on individual pages. Although the user, turning to a particular page for the first time, is presented with exactly twenty-five lines of text, up to fifteen additional lines immediately preceding and following the visible excerpt are hidden off-screen in buffers or scrolling regions. This scheme offers two main benefits. 
First, the reader, on reaching the bottom of a page, may choose to read one or several additional lines before jumping to another part of the text of the Canterbury Tales or concluding a reading session altogether. (If, however, the reader decides to go to the next page after all, the words appearing at the top of that page will be those immediately following the last visible text on the preceding page.) Second, the inclusion of hidden regions containing scrolled text on each page provides a brute-force solution to the problem of searching for collocations across page-breaks.
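The page layout described above, twenty-five visible lines flanked by up to fifteen hidden lines on either side, might be modelled as follows. The function name build_pages and the dictionary keys are hypothetical, and the sketch ignores the "hard" page-breaks at the boundaries of head-links, end-links, and Tales; it is meant only to show why a collocation that straddles a page-break can still be found by searching a single page together with its scrolling regions.

```python
def build_pages(lines, visible=25, hidden=15):
    """Divide a list of text lines into pages of `visible` lines, each carrying
    up to `hidden` lines of preceding and following text in off-screen buffers."""
    pages = []
    for start in range(0, len(lines), visible):
        end = min(start + visible, len(lines))
        pages.append({
            "before": lines[max(0, start - hidden):start],   # hidden region above
            "visible": lines[start:end],                     # what the reader sees
            "after": lines[end:end + hidden],                # hidden region below
        })
    return pages


def searchable_lines(page):
    """All lines a proximity search may inspect when it is run on one page."""
    return page["before"] + page["visible"] + page["after"]
```

Because the first line of each page also sits in the "after" buffer of the preceding page, a search window of up to fifteen lines never falls into the gap between two pages.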

Although the project summarized here is described essentially as a fait accompli, the initial phases involved several false starts and a large amount of trial and error. The specific procedure followed during the developmental process (which may, in practice, never finish) and some technical specifications for the reading-text itself may now be set out concisely. Although it should be stressed at the outset that what is proposed here is essentially a text-independent system, the developmental text used for the project was based on that found in two nineteenth-century printed editions, i.e., those of W. W. Skeat and Thomas Wright. The developmental text was encoded in-house by my collaborator Eric Juvet and revised to constitute what amounts to a specially prepared private edition.13 We hope to release a publicly-distributed form of the reading-text as soon as possible, perhaps by the end of the year. This will embody a wholly new edition of the Middle English text of a critical standard, which according to the current design specification will comprise a semi-diplomatic edition of the Tales based on the readings of a single manuscript.

The ASCII-encoded text file produced by Juvet occupied 968,252 bytes (inclusive of punctuation), thus comprising almost a full megabyte of information. The file contained a total of 185,814 words. The first step in the text-processing phase of the effort involved the sorting of this full text of the Canterbury Tales to produce a lexicon of unique strings through a simple process of pattern-matching. (If the program had "seen" the string before, it was discarded; otherwise it was appended to the lexicon of unique strings.) When this sorting process had been completed, the one-megabyte text containing nearly two hundred thousand words had yielded only 12,181 unique strings. These strings were then sorted into a lemmatizing database (described in greater detail below), i.e., an arrangement of records containing, among other information, a single headword (or lemma) and a complete accounting of all inflected and variant forms of the word encountered in Middle English texts to date.14 (The building and maintenance of the lemmatizing database is in fact a protracted process; the system will surely continue to remain capable of receiving further refinement almost indefinitely.) When all of the unique strings had been integrated into the lemmatizing database, the records reflected a vocabulary of around six thousand words for Chaucer. The figure may seem low in comparison to a vocabulary of about thirty thousand words for the typical Ph.D. student, but it may in fact seem rather high when it is noted that the vocabulary is distributed throughout a single literary work.
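The "seen before" test used to build the lexicon of unique strings can be expressed very compactly. The following is a minimal sketch and not the program actually used; in particular, the folding of upper and lower case and the stripping of surrounding punctuation are assumptions, since the treatment of such details is not specified above.

```python
def unique_strings(text):
    """Return the distinct word-strings of `text` in order of first appearance."""
    seen = set()
    lexicon = []
    for token in text.split():
        # Assumed normalization: drop surrounding punctuation, ignore case.
        word = token.strip(".,;:!?\"'()").lower()
        if word and word not in seen:
            seen.add(word)        # remember the string...
            lexicon.append(word)  # ...and append it to the lexicon once only
    return lexicon
```

Applied to the phrase "Whan that Aprill, whan that", the function returns the three strings "whan", "that", and "aprill".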

It is perhaps too early to say precisely where the reading-text of the Canterbury Tales described here stands in relation to the main progression of computational studies of medieval texts.15 As far as I can ascertain, surprisingly few applications of computer technology to the Chaucerian canon have been made to date, and in most of these the Middle English text supplies the object example for a fairly restricted application of a particular methodological approach.16

Lemmatization and Homography

The first step in assembling the mechanism allowing users to access glosses to individual words of the Canterbury Tales involves a process of lexical sorting generally known as lemmatization. Although researchers who have addressed the issue at all invariably conclude that the term describes the nature of the task to which it refers imprecisely at best, lemmatization is regularly taken to entail the association of all forms of a given word that occur in a continuous text, whether inflected or not, with a normalized citation form or lemma, which is analogous to the headword or first element in a dictionary entry.17 (Medievalists will note that the use of the term lemma in this sense bears comparison with the nuance of the Medieval Latin term interpretamentum--as well as certain aspects of the term glossa). Systems of lemmatization have attracted a substantial amount of attention in recent years, mainly in connection with analyses of texts written in modern languages (notably German, Dutch, and French)18 and attempts to develop systems capable of performing automatic translation by machine.19 The lemmatization of English texts, however, has been discussed less often, presumably because of its frequent status as a target language in machine-translation experiments.20 In recent linguistic research, the process of lemmatization has proved crucial in attempts to achieve accurate interpretation of continuous prose (and, in advanced applications, speech) by computer, most of which fall under the rubric of natural-language processing.21 Literary scholars and lexicographers have had many occasions to address the issue for similar reasons. An exhaustive lemmatization of a text constitutes a preliminary stage in the compilation of a lexicon or, say, a glossary for a critical edition. 
It is in this connection that most of the work to date on the lemmatization of medieval vernacular texts has appeared; studies exist for Medieval Latin, Old French, Old and Middle High German, several dialects of medieval Spanish, and Old Icelandic (but not, to my knowledge, Middle English).22 In the course of the preparation of the reading-text of the Canterbury Tales, lemmatization provided a preliminary means of effecting the precise definition of specific words of the Middle English text, thereby facilitating the intelligent interpretation of entire passages.23 Indeed, lemmatization may be viewed as the main component of the whole system, since it is only through the association of a given string with a lemma that the on-screen text can be connected with its proper gloss. Once the link has been established, the apparatus of the electronically encoded text is in fact capable of accommodating an expanded gloss, capable of achieving virtually any degree of scholarly precision.
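A record in a lemmatizing database of the kind described here, and the index that connects an on-screen string with its lemma, might look something like the sketch below. The class name LemmaRecord, its fields, and the sample entry are illustrative assumptions rather than the design of the actual database. The index maps each form to a list of records because a single spelling may belong to more than one lemma.

```python
from dataclasses import dataclass, field


@dataclass
class LemmaRecord:
    headword: str                      # the normalized citation form (lemma)
    part_of_speech: str                # e.g. "noun"
    synonyms: list                     # Modern English synonyms for the gloss
    forms: set = field(default_factory=set)  # inflected and variant spellings


def build_form_index(records):
    """Map every attested form (and the headword itself) to its record(s)."""
    index = {}
    for rec in records:
        for form in rec.forms | {rec.headword}:
            index.setdefault(form.lower(), []).append(rec)
    return index


def gloss_for(string, index):
    """Assemble a simple gloss (synonyms plus part of speech) for a string."""
    return ["%s (%s)" % (", ".join(rec.synonyms), rec.part_of_speech)
            for rec in index.get(string.lower(), [])]


# Hypothetical sample entry for illustration only.
records = [LemmaRecord("shour", "noun", ["shower", "rain"], {"shoures"})]
index = build_form_index(records)
```

Looking up the string "shoures" in this index leads to the record headed "shour", from which a gloss such as "shower, rain (noun)" can be assembled.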

As all words in the text of the Canterbury Tales are treated initially by the lemmatizing database as discrete strings of characters, problems arise whenever two distinct terms share a single spelling. All specialists in the field of lemmatization have observed that the phenomenon of homography (and the concomitant ambiguity of meaning that it engenders) constitutes a major obstacle to performing an efficient and accurate lemmatization of any continuous text in a single pass.24 The problem of homography as a whole may be divided into many subcategories. There are special problems, for example, in the treatment of terms from certain grammatical categories (e.g., participles, possessives, and multiple terms written in Middle English as a single discrete string).25 Cases of homographic ambiguity have traditionally been resolved in one of two ways. The most direct method, involving the intervention of an expert reviser, is also the most time-consuming. Beyond this, complex algorithms have been developed that achieve a reasonable degree of precision in the resolution of ambiguity according to context in the treatment of modern languages. No system of lemmatization, to my knowledge, has yet laid claim to a level of perfect accuracy.26 In the treatment of medieval texts, moreover, it is by no means clear that a system of fully automatic lemmatization is a practical or even desirable goal. The corpus of texts preserved in a particular medieval vernacular may in most cases be viewed as finite and fairly stable. Given the variation in orthography, syntax, and punctuation observed in most medieval vernaculars (and many Latin texts as well), the development of an adequate system of algorithmic contextualization would require scarcely less effort than a thorough appraisal by an expert reviser. The whole question of the resolution of orthographic ambiguity in the computer-assisted analysis of medieval texts deserves closer attention than can be offered here.

No ideal method for dealing with homographic ambiguity has come to light in the course of the present project and none is likely to emerge at any point in the near future.27 In accordance with the efforts of other literary scholars who have used computers in the preparation of texts, however, a system of semiautomatic lemmatization has been adopted as a compromise solution.28 The first stage in this process, which may be termed prelemmatization, involves the assignment of groups of generalized, single-word definitions to homographic strings. (Again, this is an unending process; particularly in the case of medieval literature, additional homographs continue to come to light throughout the course of the preparation of the lemmatizing database.) Consider, for example, the Middle English homograph glede, which might represent a form of an adjective ("glad, cheerful, radiant"), a verb ("to make glad"), or any one of several nouns ("gladness," "burning coal," or "kite"). The string glede would thus be associated automatically at the prelemmatization stage with a group of possible single-word glosses: glad cheer happiness fire bird. The substitution of the general term "fire" for "burning coal" and "bird" for the more specific "kite" reflects a significant feature of the design of the prelemmatization resource. By assigning a group of possible glosses to a specific homograph which displays a broad semantic range, it is possible to begin the preliminary phase of proximity-searching by collocation (employing Modern English search terms) even at the prelemmatization stage.

Although it does not remove the necessity of the eventual intervention of an expert reviser altogether, the main strength of the system of prelemmatization sketched out here is that it remains a fully automatic process right up to that point. It allows the textual scholar, within hours of receiving a new copy of a computer-readable text, to associate every word of that text with its appropriate lemma and gloss. The main drawback to the scheme is that it also results initially in the association of homographic character strings with lemmata (and thus definitions) that are incorrect and must eventually be removed by an expert reviser. In practice, the loss of precision incurred by the association of multiple definitions with individual homographs is not great. The process in fact supplies the expert reviser at the outset with a predefined set of choices to assist in refining the precision of the electronically encoded apparatus. By supplying homographic strings with single-word definitions whose semantic compass is as broad as possible, it has proved possible in practice to carry out useful proximity-searching once the initial stage of prelemmatization has been completed. The student has convenient access to a range of synonyms for a given Middle English term and the literary scholar is immediately placed in a good position to search for themes, concepts, or topoi.
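The two stages of semiautomatic lemmatization, an automatic assignment of candidate glosses followed by selection on the part of an expert reviser, can be illustrated with the glede example given above. The table and function names below are hypothetical; only the five candidate glosses for glede are taken from the discussion itself.

```python
# Prelemmatization table: each homographic string carries a group of broad,
# single-word candidate glosses (the entry for "glede" follows the text above).
PRELEMMA = {"glede": ["glad", "cheer", "happiness", "fire", "bird"]}


def prelemmatize(words, table):
    """Fully automatic stage: attach every candidate gloss to each word."""
    return [(w, list(table.get(w.lower(), []))) for w in words]


def resolve(candidates, choice):
    """Reviser's stage: keep only the gloss chosen from the predefined set."""
    if choice not in candidates:
        raise ValueError("choice must be one of the predefined candidates")
    return [choice]


tagged = prelemmatize(["glede"], PRELEMMA)
```

Until the reviser intervenes, a Modern English search for either "fire" or "glad" will find every occurrence of glede; after resolve has been applied to a given occurrence, only the chosen sense remains searchable there.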

The lemma, in the procedure described above, serves a fundamental role in the production of the Chaucerian reading-text. It provides a "handle" allowing specific information--most commonly definitions, etymologies, and the like--to be linked to particular words of Middle English. The information in question, viewed collectively, has been referred to here with some deliberate vagueness as the gloss. It is worth stressing that the gloss in the electronically encoded text of the Canterbury Tales (or, for that matter, any other text prepared in a similar manner) might in theory contain any type of data. Though in the present instance the contents of the gloss have been restricted to the part of speech and a range of possible synonyms, expanded versions of the text might accommodate a "single best" translation, a detailed etymology, a set of textual variants, a pedagogical commentary, a scholar's personal notes, or any kind of sound, illustration, or animation.

Shadow-Text and Collocation

One crucial aspect of the reading-text has yet to be described: the procedure allowing individual glosses to be associated with specific words in the Canterbury Tales. It was noted above that the main part of the electronically encoded text may be described in terms of two fundamental structural units, i.e., the page and the line. This standard architecture in fact carries over to the glossing apparatus. The glosses reside in every instance on the same page as the continuous Middle English text to which they refer, but they are hidden from the reader in an off-screen field or buffer that is best envisioned as a grid or matrix of references that stand "in the background" of the Chaucerian text. The collected glosses for a particular page of the Canterbury Tales are referred to here as the shadow-text. The process of linking the constituent elements of the normally invisible shadow-text to the individual words in the text consulted by the reader requires the consideration of a third basic structural unit in the reading-text: the word. Each page of the shadow-text includes precisely the same number of lines and words as the corresponding page of the reading-text. Words in the main text of the Tales are defined as strings of characters bounded by blank spaces (regularly assigned an ASCII value of 32) or, in cases of line-breaks, hard returns (ASCII 13). Words in the shadow-text, however, are defined as strings of characters of any length and containing any number of "words" (in the traditional sense), spaces, returns, and the like, separated by one of the less commonly used eight-bit computational characters (in the developmental version, ASCII 255). These are accessed in response to events generated by the user. For example, if a reader selects word 4 of line 8 of page 3 in the on-screen reading-text, the glossing mechanism immediately fetches the corresponding word in the shadow-text and renders it visible on the computer display.
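The fetching of a gloss by page, line, and word can be sketched as follows. The delimiters follow the description above (space and hard return in the main text, character 255 in the shadow-text), but the function names, the counting of lines and words from 1, and the sample glosses are assumptions made for the purposes of illustration. Because a shadow-text "word" may itself contain spaces and returns, the sketch locates the gloss by counting words from the top of the page rather than by splitting the shadow-text into lines.

```python
SHADOW_SEP = chr(255)  # delimiter between "words" of the shadow-text


def split_text_page(page_text):
    """Split a page of the main text into lines (ASCII 13) and words (spaces)."""
    return [line.split() for line in page_text.split("\r")]


def fetch_gloss(page_text, shadow_page, line_no, word_no):
    """Return the shadow-text entry for word `word_no` of line `line_no`
    (both counted from 1) on a single page."""
    lines = split_text_page(page_text)
    # Position of the selected word counted from the first word on the page.
    offset = sum(len(line) for line in lines[:line_no - 1]) + (word_no - 1)
    return shadow_page.split(SHADOW_SEP)[offset]


# Illustrative page of two short lines with invented glosses.
page_text = "Whan that Aprill\rThe droghte of March"
shadow_page = SHADOW_SEP.join([
    "when (conj.)", "that (conj.)", "April (noun)",
    "the (art.)", "drought, dryness (noun)", "of (prep.)", "March (noun)",
])
```

Selecting word 2 of line 2 (droghte) on this page retrieves the entry "drought, dryness (noun)", spaces and all, since only character 255 is treated as a boundary in the shadow-text.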

One immediate objection to the scheme set out above would be that it greatly increases the size of the text file used to store the text of the Canterbury Tales. Every word in the main text has its own gloss and many of these glosses are identical and ostensibly redundant, arising as they do in response to multiple occurrences of a given string or lemma. The one-megabyte ASCII file used to generate the developmental version of the reading-text described here expanded to more than eight megabytes when equipped with the full apparatus of the shadow-text. An arguably more efficient scheme would involve a greatly reduced shadow-text that only provided pointers to individual records in a glossary, which would presumably be stored in the form of a flat-file database. This objection has been set aside for two main reasons. First, the proposed document architecture is intended to be capable of accommodating an apparatus able to achieve a high degree of scholarly precision, particularly in the treatment of unusual senses of words and idiomatic phrases, which might in practice comprise any number of words in various lines of the reading-text. The storage of glosses in a database would not absolutely preclude the achievement of this degree of precision, but it would greatly complicate both the initial developmental process and any future revision of the text. According to the plan set out above, however, any word or phrase in Chaucer's text may eventually receive any sort of treatment that an expert reviser deems appropriate. Second, the redundant and seemingly inefficient repetition of similar glosses at various points in the shadow-text is in fact the single feature of the system that allows searching for themes, concepts, and so on, to be undertaken by the reader on an ad hoc basis. The key to the system of proximity-searching, it will be recalled, is the availability of collocations of terms. 
One of the most striking (and unexpected) conclusions to emerge from the present course of research is the discovery that the arrangement of the fairly simple, lemmatized synonym sets that constitute the glosses described above in the form of a shadow-text immediately produces an extremely powerful searching capability that provides useful results even before the intervention of an expert reviser. This conclusion may have ramifications for future work on other medieval texts.

The Relationship of the Reading-Text to "Hypertext"

The association of the so-called glosses to the text of the Canterbury Tales (here comprising fairly concise sets of synonyms and near-synonyms) with individual terms in a shadow-text effectively produces, in a greatly expanded form, a new semantic approximation of the Chaucerian lexicon. Despite the expansive qualities of this text, after some consideration the term "hypertext" has been excluded from the present discussion in favor of the less contrived, if slightly awkward, phrase "electronic [or, better, 'electronically encoded'] reading-text." The longest-standing formulation of hypertext, primarily associated with the visionary theorization of Theodor Holm Nelson, consists in two main elements: (a) an addressing scheme for the whole of world literature and every other kind of text, intended to facilitate the storage and retrieval of digitally encoded documents of every kind; and (b) a system for embedding text in compound documents.29 The classical formulation of hypertext amounts to what may be termed a full-text model in which no instance of text is actually copied to or reproduced in a linked document. What might appear at first glance to be a quotation of, say, five lines of the Canterbury Tales is in fact a part of the original edition of the text of those lines, stored in a critical, electronically encoded master text of the Chaucerian canon; it is not a copy or representation of those lines. In another formulation, the citation simply points to two addresses in the master text, representing the beginning and end of the cited excerpt and providing a "window," as it were, to those five lines. Readers who wish to investigate, say, a critic's interpretation of the lines in question may quite simply "go into" the citation, that is, move from the citation to the continuous text of the Canterbury Tales from which it is drawn, where they may read the five cited lines in their full original context. 
Similarly, references to the work of other scholars in the footnotes of our hypothetical article would supply pointers to the full texts of published versions of the cited research. Clearly this notion of hypertext is far removed from the electronic reading-text described here, both in terms of specific design features and anticipated imminence of availability. This text might be modified to provide one node in such a system, but it cannot be held to constitute an example of hypertext in its own right.

In recent literature, the definition of the problematic term hypertext, itself an infelicitously contrived Greco-Latin hybrid that continues to dissonate in the ears of many scholars, has become extremely blurred.30 The introduction of methods of integrating sound and graphic art into computer presentations has exacerbated the situation, giving rise to an even more awkward neologism, i.e., hypermedia.31 The rationale for distancing the Chaucerian text under discussion from these developments, however, rests on criteria more firmly grounded than merely aesthetic objections. At least since the appearance of Cortazar's multistranded novel Hopscotch, experiments in hypertext have involved the introduction of specific linking of passages by an author. Such procedures may be said to rely on a system of explicit links.32 The shadow-text has been designed so as to facilitate the association of words, phrases, and concepts through the use of implicit rather than explicit links. (The Table of Contents mentioned above contains the only instance of the use of explicit links in the system.) This principle--the substitution of implicit links for explicit links--obtains in the cases of both the linking of the gloss to the reading-text and of entire passages to groups of search terms.

Even though it cannot be taken in itself as an example of hypertext, the approach sketched out here may serve to counter some of the objections to the viability of hypertext that have been raised in recent years. There are, after all, many purely practical difficulties involved in establishing an extensive set of explicit links in such a system. A single page of a newspaper text or academic publication would, in an ideal system, require hundreds if not thousands of links to be drawn up. As far as anyone has yet been able to ascertain, these links would have to be implemented one at a time, either by the author of the original document or by an expert reviser. This raises certain questions of trust, since the reader's subjectivity is in effect placed in the hands of unseen indexers whose levels of expertise in any given area that may be of interest to the user are unknown. Even if a comprehensive system of explicit links could be established successfully in a hypertext system, there is a second, even more subtly affective side effect resulting from the use of such links. I would characterize this as a "buried treasure" or "secret door" syndrome. Hypertext systems have the potential to produce a sense that there is something just out of reach, some gloss or interpretation that lies behind the passage at hand but whose precise nature is hard to fathom. In an exhaustively indexed system, the sheer volume of choices might well prove unsettling. It would be hard for the reader to know which way to turn.

Conclusion

Hypertext, while abandoning the linear model of text promoted traditionally by the scriptorium and printing press, may also produce on occasion the unfortunate effect of promoting compartmentalization and producing a greater sense of isolation in the user, perhaps conveying a sense of being manipulated by an unseen, external agent. The tantalizing prospect of achieving something approaching omniscience--or, at any rate, synchronous access to a diverse range of texts--by consulting texts on computers renders the limitations of existing hardware and software all the more frustrating. Most users are still forced to consult single screens of textual data sequentially.

The system used here for the production of the reading-text of the Canterbury Tales addresses several of the concerns raised above. Through its employment of a process of prelemmatization, it supplies a virtually infinite number of implicit links without the intervention of an expert agent. Readers are free to explore these at will by formulating queries in any manner that they choose. The fairly simple system of linguistic glossing introduced here produces a result that is, on the one hand, absolutely predictable--the reader who selects a word will invariably encounter a group of terms in a gloss that exemplifies a consistent style--without compromising the variety of choices available to the reader and, arguably, greatly increasing their number. Most researchers, I think, would agree that the ability to flip through the pages of a book at random, to scan the books surrounding the one you are seeking in the shelves of a library stack, to notice the title of an article quite by chance on the cover of a journal, is every bit as important as the systematic examination of sequences of continuous prose. It should be noted, however, that modern books and libraries reached their present state over centuries of trial, error, and refinement. Systems of textual computing will inevitably address their present limitations and continue to evolve. The project described here is intended as a small and preliminary contribution to the task at hand: bringing the use of computers more precisely in line with the way students and scholars work in their daily routines.33

University of Washington

NOTES

1 Sources include Computers in Humanistic Research, ed. Edmund A. Bowles (Englewood Cliffs: Prentice-Hall, 1967); Computing in the Humanities, ed. Serge Lusignan and John S. North (Waterloo, Ontario: University of Waterloo Press, 1977); Computing in the Humanities, ed. Richard W. Bailey (Amsterdam: North-Holland, 1982); George M. Kren and George Christakes, Scholars and Personal Computers: Microcomputing in the Humanities and Social Sciences (New York: Human Sciences Press, 1988); Rudy Hirschheim, Steve Smithson, and Diane Whitehouse, Microcomputers and the Humanities: Survey and Recommendation (New York: Ellis Horwood, 1990); Humanities and the Computer: New Directions, ed. David S. Miall (Oxford: Clarendon Press, 1990); Scholarship and Technology in the Humanities, ed. Mary Katzen, British Library Research (London: Bowker, 1991); and Humanities Research Using Computers, ed. Christopher Turk (London: Chapman, 1991).

2 On the use of computers in literary and textual studies, see Robert L. Oakman, Computer Methods for Literary Research (Columbia: University of South Carolina Press, 1980; 2nd ed.: Athens: University of Georgia Press, 1984); Computers in Literary and Linguistic Research, ed. L. Cignoni and C. Peters, supplement to Linguistica Computazionale, 3 (1983) (Pisa: Giardini, 1984); L'ordinateur et les recherches littéraires et linguistiques, ed. Jacqueline Hamesse and Antonio Zampolli (Paris: Champion, 1985); B. H. Rudall and T. Corns, Computers and Literature: A Practical Guide (Tunbridge Wells: Abacus Press, 1987); Literary Computing and Literary Criticism: Theoretical and Practical Essays on Theme and Rhetoric, ed. Rosanne G. Potter (Philadelphia: University of Pennsylvania Press, 1989); Nancy Ide, "The Relevance of Computational Linguistics to Textual Studies," Computers and Texts, 1 (1991):7-9; Computers and Written Texts, ed. Christopher S. Butler, Applied Language Studies (Oxford: Blackwell, 1992), esp. John F. Burrows, "Computers and the Study of Literature," at pp. 167-204; and M. Deegan, S. Lee, and C. Mullings, "Computing in Textual Studies," Computers and Education, 19 (1992):183-91.

3 Publications that may be consulted profitably for articles germane to the present discussion include the ALLC [Association for Literary and Linguistic Computing] Bulletin (Swansea: University College et al., 1973-85) and ALLC Journal (Cambridge: Cambridge University Library et al., 1980-85), succeeded by Literary and Linguistic Computing (Oxford: Association for Literary and Linguistic Computing, 1986-); Canadian Humanities Computing (Toronto: Center for Computing in the Humanities, 1987-); Language Technology, incorporating Language Monthly, succeeded by Electric Word (Amsterdam: Language Technology BV et al., 1987-90); Academic Computing (McKinney, Texas: Academic Computing Publications, 1987-90); Computers in Literature and Computers in Literature Update, now Computers and Texts (Oxford: CTI [Computers in Teaching Initiative] Centre for Textual Studies et al., 1990-); and Writing on the Edge (Davis: University of California at Davis Campus Writing Center, 1989-).

4 For details of specific applications, see Susan Hockey, A Guide to Computer Applications in the Humanities (Baltimore: The Johns Hopkins Press, 1980); Humanities Computing Yearbook, ed. Ian Lancashire et al. (New York: Oxford University Press, 1988); and CTI Centre for Textual Studies Resources Guide, March 1992, ed. Caroline Davis, Marilyn Deegan, and Stuart Lee (Oxford: CTI Centre for Textual Studies, 1992). For general discussion of issues involved in text-processing, see Peter Batke, "Text Specific Workstations: A Software Problem," Academic Computing, 4.1 (1988-89):32-35 and 70-72; and Ronald F. E. Weissman, "In Search of the Scholar's Workstation: Recent Trends and Software Challenges," Academic Computing, 4.1 (1988-89):28-30 and 59-64. On specifically pedagogical topics, see John M. Slatin, "Hypertext and the Teaching of Writing," in Text, Context, and Hypertext: Writing with and for the Computer, ed. Edward Barrett (Cambridge, Massachusetts: MIT Press, 1988), pp. 111-29; Susan J. Hockey, Jo Freedman, and J. Cooper, "Computers in the Study of Set Texts," in Humanities and the Computer, ed. Miall, pp. 113-22; and resources listed and reviewed in Bits and Bytes Review: Reviews and News of Products and Resources for Academic Computing (Whitefish, Montana: Bits and Bytes Computer Resources, 1986-).

5 See especially Peter Desmond Smith, An Introduction to Text Processing (Cambridge, Massachusetts: MIT Press, 1990); G. Salton, Automatic Information Organization and Retrieval (New York: McGraw Hill, 1968); Donald E. Knuth, Searching and Sorting, vol. 3 of The Art of Computer Programming (Reading, Massachusetts: Addison-Wesley, 1973). See also W. Martin, B. Al, and P. van Sterkenburg, "Text-Processing and Lexicographical Information--A State of the Art," ALLC Journal, 2 (1981):61-68. For specific applications, see J. McNaught, "Specialized Lexicography in the Context of a British Linguistic Data Bank," in Lexicography in the Electronic Age, ed. Goetschalckx and Rolling, pp. 171-84; Tove Fjeldvig and Anne Golden, "Experiments with Language-Based Aids in Information Retrieval Systems," Nordic Journal of Linguistics, 11 (1988):33-48; J. K. Proud, The Oxford Text Archive, British Library Research and Development Report, 5985 (London: British Library, 1989); G. Chartron, "Lexicon Management Tools for Large Textual Databases: The Leximet System," Journal of Information Science, 15 (1989):339-44; J. Carroll and C. Grover, "The Derivation of a Large Computational Lexicon for English from LDOCE," in Computational Lexicography for Natural Language Processing, ed. Boguraev and Briscoe, pp. 117-33.

6 Jacques Froger, La critique des textes et son automatisation, Initiation aux nouveautés de la science, 7 (Paris: Dunod, 1968); J. Mau, "Computertechnik im Dienst der Edition lateinischer Texte," in Probleme der Edition mittel- und neulateinischer Texte, ed. Ludwig Hödl and Dieter Wuttke (Boppard: Boldt, 1978), pp. 143-49; La pratique des ordinateurs dans la critique des textes, ed. Jean Irigoin and Gian Piero Zarri, Colloques internationaux du Centre National de la Recherche Scientifique, 579 (Paris: CNRS, 1979); Peter L. Shillingsburg, Scholarly Editing in the Computer Age: Theory and Practice (Duntroon: University of New South Wales, 1984); Ulrich Müller, "Personal Computer, wissenschaftliche Manuskripte und Editionen," Editio, 2 (1988):48-72; Gian Piero Zarri, "Some Experiments on Automated Textual Criticism," in Miscellanea di studi in onore di Aurelio Roncaglia, ed. Roberto Antonelli, et al., 1 vol. in 4 (Modena: Mucchi Editore, 1989), 1439-64; Rolf Bräuer, "Historische Edition und Computer. Internationale Tagung von 26. bis 30. Oktober 1988 in Graz," Zeitschrift für Germanistik, 10 (1989):608-11; and Wilhelm Ott, "Computers and Textual Editing," in Computers and Written Texts, ed. Butler, pp. 205-26.

7 See essays collected in Lexicography in the Electronic Age, ed. J. Goetschalckx and L. Rolling (Amsterdam: North-Holland, 1982) and Theorie und Praxis des lexikographischen Prozesses bei historischen Wörterbüchern, ed. Herbert Ernst Wiegand, Lexicographica, series maior, 23 (Tübingen: Niemeyer, 1987); see also Willem Meijs, "Computers and Dictionaries," in Computers and Written Texts, ed. Butler, pp. 141-65. Many groundbreaking advances in computational lexicography were made in the course of the treatment of medieval and early modern lexicons. See Computers and Old English Concordances, ed. Angus Cameron, Roberta Frank, and John Leyerle, and A Plan for the Dictionary of Old English, ed. Roberta Frank and Angus Cameron, Toronto Old English Series 1 and 2 (Toronto: University of Toronto Press, 1970 and 1973); Jeffrey F. Huntsman, "Computers and Medieval English Lexicography," Computers and the Humanities, 12 (1978):53-60; Jürgen Schäfer, "Elizabethan Glossaries: A Computer-Assisted Study of the Beginnings of Elizabethan Lexicography," I, ALLC Bulletin, 8 (1980):36-41, and II, in Computers in Literary and Linguistic Research, ed. Cignoni and Peters, pp. 235-42; La lexicographie du latin médiéval et ses rapports avec les recherches actuelles sur la civilisation du Moyen-Âge, Colloques internationaux du Centre National de la Recherche Scientifique, 589 (Paris: CNRS, 1981).

8 For some speculation in this area, see John Slatin, "Reading Hypertext: Order and Coherence in a New Medium," in Hypermedia and Literary Studies, ed. Paul Delany and George P. Landow (Cambridge, Massachusetts: MIT Press, 1991), pp. 153-69.

9 The earliest example of a medieval study I have found, on a Medieval Latin topic, is that of Anezka Vidmanová, "Stredolatinská textová kritika a pocítací stroje," Listy Filologické, 92 (1969):28-52; see also Paul Tombeur, "Le traitement électronique des documents et l'étude de textes médiévaux" (Beckmann et al. 1981), pp. 329-39; Computer Applications to Medieval Studies, ed. Gilmour-Bryson; Mary-Jo Arn, "The Systematic Representation of Early Manuscripts in Computer Form: A Proposal," in Historical and Editorial Studies in Medieval and Early Modern English for Johan Gerritsen, ed. Mary-Jo Arn, Hanneke Wirtjes, and Hans Jansen (Groningen: Wolters-Noordhoff, 1985), pp. 209-19; Paul Tombeur, "Informatique et étude de textes médiévaux," L'homme et son univers au Moyen Âge, ed. Christian Wenin, Philosophes Médiévaux, 26; 2 vols. (Louvain-la-Neuve: Éditions de l'Institut Supérieur de Philosophie, 1986), 1, pp. 174-86; and C. M. Sperberg-McQueen, "Text in the Electronic Age: Textual Study and Text Encoding with Examples from Medieval Texts," Literary and Linguistic Computing, 6.1 (1991):34-46. Cf. also Helmut Droop, Winfried Lenders, and Michael Zeller, Untersuchungen zur grammatischen Klassifizierung und maschinellen Bearbeitung spätmittelhochdeutscher Texte, Forschungsberichte des Instituts für Kommunikationsforschung und Phonetik der Universität Bonn, III: Linguistische Datenverarbeitung, 55 (Hamburg: Buske, 1976); Maschinelle Verarbeitung altdeutscher Texte, ed. Sappler and Strassner.

10 See, e.g., Patrick W. Conner, The Beowulf Workstation (Morgantown: West Virginia University, Department of English [1991]) and materials produced by Conner, Allen Frantzen, Clare Lees, John Ruffing, and others in connection with their Seafarer project (Chicago, New York, Ithaca, and elsewhere, 1990-). Although the document architecture proposed here is intended to be platform-independent, all prototyping to date has been done with HyperCard software, originally developed by Bill Atkinson for Apple Computer, and its native programming language, developed by Dan Winkler. Workstation hardware, purchased in part with an award from the University of Washington Faculty Workstation Initiative and optimized for speed, included a Macintosh IIci, with cache card and eight megabytes of RAM, operating under system 6.0.8 in one-bit (black-and-white) graphics mode. HyperCard, long regarded as slow and cumbersome, suddenly makes sense in such an environment and emerges as a versatile tool capable of almost unbounded manipulation of large quantities of multiple-typeface, multiple-font text. Remarkably, the configuration described here now constitutes a mid-range system.

11 For an elegant appraisal of related issues, see J. M. Sinclair, Corpus, Concordance, Collocation (Oxford: Oxford University Press, 1991); cf. also Gerald Purnelle, "Recherche automatique de groupes verbaux recurrents et de formules dans les fichiers latins lemmatisés," Revue: informatique, 25 (1989):157-92.

12 For an eloquent defense of this approach, see Wilhelm Ott, "Pages and Lines: Remarks on Some Fundamental Requirements of Text Processing Software," in Computers in Literary and Linguistic Research, ed. Cignoni and Peters, pp. 227-33.

13 In recent months, the project has benefited from substantial contributions by Professor Joseph Monda of Seattle University.

14 For accounts of other experiments in this area, see N. Calzolari, "Lexical Definitions in a Computerized Dictionary," Computers and Artificial Intelligence, 2 (1983):225-34; Elmar Seebold, "Die Lemma-Auswahl bei einem etymologischen Wörterbuch," in Theorie und Praxis des lexikographischen Prozesses, ed. Wiegand, pp. 157-71; and B. Slator, "Extracting Lexical Knowledge from Dictionary Text," Knowledge Acquisition, 1 (1989):113-37.

15 As noted above, work in this area is at an early stage. The only volume-length collections I have noted to date are Computer Applications to Medieval Studies, ed. Anne Gilmour-Bryson, Studies in Medieval Culture, 17 (Kalamazoo, Michigan: Medieval Institute Publications, 1984), and Le médiéviste et l'ordinateur. Actes de la Table ronde (Paris, CNRS, 17 novembre 1989), L. Fossier, chair (Paris: CNRS, 1990).

16 Examples include Charles Moorman, "Computing Housman's Fleas: A Statistical Analysis of Manly's Landmark Manuscripts in the General Prologue to the Canterbury Tales," Association for Literary and Linguistic Computing Journal, 3 (1982):15-35, also printed in Sixth International Conference on Computers and the Humanities, ed. Sarah K. Burton and Douglas D. Short (Rockville, Maryland: Computer Science Press, 1983), at pp. 431-46; Harry M. Logan and Barry W. Miller, "A Case for The Book of the Duchess: A Semantic Analysis of Sentence Structure," in Sixth International Conference on Computers and the Humanities, ed. Burton and Short, pp. 384-90; and Kari Anne Rand Schmidt, "Type/Token Ratio for Consecutive Units of Text as a Variable in Authorship Studies: An Assessment with Special Reference to the Attribution of The Equatorie of the Planetis," in L'ordinateur et les recherches littéraires et linguistiques, ed. Hamesse and Zampolli, pp. 333-43. More wide-ranging studies include Walter S. Phelan, "The Study of Chaucer's Vocabulary," Computers and the Humanities, 12 (1978):61-69; ibid., "From Morpheme to Motif in Chaucer's Canterbury Tales," in Proceedings of the International Conference on Literary and Linguistic Computing, ed. Zvi Malachi (Tel Aviv: Katz Research Institute, 1979), 291-316; Eugene Green, "Speech Acts and the Art of the Exemplum in the Poetry of Chaucer and Gower," in Literary Computing and Literary Criticism, ed. Potter, pp. 167-87; Harry M. Logan and Grace B. Logan, "The Case of the Canterbury Pilgrims: Sentence Semantics and World View in Frag. I of The Canterbury Tales," Literary and Linguistic Computing, 5 (1990):242-47; Charles Barber and Nicolas Barber, "The Versification of The Canterbury Tales: A Computer-Based Statistical Study," Leeds Studies in English, new series, 21 (1990):81-103 and 22 (1991):57-83.

17 For an early introductory treatment of the subject that remains valuable today, see M. L. Hann, "Principles of Automatic Lemmatization," ITL: Review of Applied Linguistics, 49 (1973):3-22.

18 On applications in the treatment of texts in modern languages, see Rainer Dietrich, Automatische Textwörterbücher. Studien zur maschinellen Lemmatisierung verbaler Wortformen des Deutschen (Tübingen: Niemeyer, 1973); Tove Fjeldvig and Anne Golden, Automatisk rotlemmatisering--et lingvistisk hjelpemiddel for tekstsøking (Oslo: Universitetsforlaget, 1984); Annette Ostling Andersson, L'identification automatique des lexèmes du français contemporain, Acta Universitatis Upsaliensis, Studia Romanica, 39 (Uppsala: Uppsala University Press, 1987). See also Rudy S. Spraycar, "Automatic Lemmatization in Serbo-Croatian," ALLC Journal, 1 (1980):55-59; Josse de Kock, "De la lematización," Lingüística española actual, 9 (1987):255-56. For details of specific projects, see Hans Eggers, et al., SALEM: Ein Verfahren zur automatischen Lemmatisierung deutscher Texte (Tübingen: Niemeyer, 1980); Christine Schneider, "Lemmatisierung im Projekt JUDO," ALLC Bulletin, 8 (1980):166-74; Wolfgang Krause and Gerd Willée, "Lemmatizing German Newspaper Texts with the Aid of an Algorithm," Computers and the Humanities, 15 (1981):101-13; Gerd Willée, "Anwendungen des Algorithmus LEMMA2 zur Lemmatisierung deutscher Wortformen," in Computers in Literary and Linguistic Research, ed. Cignoni and Peters, pp. 279-300; Normand Beauchemin and Michel Theoret, "MICRO-SOLIVO. Un Lemmatiseur semi-automatique pour le québécois parlé," Revue québécoise de linguistique, 3 (1984):19-38; Nicoletta Calzolari, Maria Luigia Ceccotti, and Adriana Roventini, "A Lexical Data Base for Interactive Lemmatization," in L'ordinateur et les recherches littéraires et linguistiques, ed. Hamesse and Zampolli, pp. 107-14; Étienne Evrard, "Le L.A.S.L.A," Revue: informatique, 25 (1989):206-7; Pieter Masereeuw, "Les travaux de l'Université d'Amsterdam," Revue: informatique, 25 (1989):207-11.

19 For surveys of recent advances in machine translation, see William John Hutchins, Machine Translation: Past, Present, Future (Chichester: Ellis Horwood, 1986); Derek Lewis, "Computers and Translation" in Computers and Written Texts, ed. Butler, pp. 75-113; and Muriel Vasconcellos, et al., "State of the Art: Machine Translation," Byte, 18.1 (1993):152-86.

20 For a general introduction, see Edward F. Kelly and P. J. Stone, Computer Recognition of English Word Senses (Amsterdam: North-Holland, 1975).

21 Computational Lexicography for Natural Language Processing, ed. Bran Boguraev and Ted Briscoe (London: Longman, 1989); Natural Language Processing, ed. M. Filgueiras, L. Damas, N. Moreira, and A. P. Tomás (New York: Springer-Verlag, 1991); Geoffrey Sampson, "Natural Language Processing," in Humanities Research Using Computers, ed. Turk, pp. 125-36; Terry Patten, "Computers and Natural Language Parsing," in Computers and Written Texts, ed. Butler, pp. 29-52. On specific applications, see Boris Katz, "Text Processing with the START Natural Language System," in Text, Context, and Hypertext, ed. Barrett, pp. 55-76; F. Antonacci, M. Russo, M. T. Pazienza, and P. Velardi, "A System for Text Analysis and Lexical Knowledge Acquisition," Data and Knowledge Engineering, 4 (1989):1-20.

22 On lemmatization in the treatment of medieval texts, see Hans Fix, "Automatische Normalisierung: Vorarbeit zur Lemmatisierung eines diplomatischen altisländischen Textes," in Maschinelle Verarbeitung altdeutscher Texte, ed. Paul Sappler and Erich Strassner (Tübingen: Niemeyer, 1980), pp. 92-100; H. Kamp, "Die automatische Lemmatisierung frühmittelalterlicher Personennamen," DAI, 42C.4 (1981):697 (no. 4543C); René Pellen, "DILEM: Construire un dictionnaire lemmatisé avec l'informatique. Texte d'expérience: Berceo, Los milagros de Nuestra Señora," La Licorne, 7 (1983):197-231; Bernard Derval, "A Computer-Aided System of Text Lemmatization Applied to the Romances of Chrétien de Troyes," ed. Charles Doutrelepont, in Computer Applications to Medieval Studies, ed. Gilmour-Bryson, pp. 31-44; and Dietmar Najock, "Lemmatization of Latin and Greek Concordances and Word-Indexes: Problems and Solutions," in L'ordinateur et les recherches littéraires et linguistiques, ed. Hamesse and Zampolli, vol. 2, pp. 53-66. For an early modern study, see P. S. di Virgilio, "Homographs and Lemmata in Thresor de la langue francoyse by Jean Nicot: A Diachronic Perspective?", Quaderni di semantica, 8 (1987):103-14.

23 For comparable approaches, see K. Devine and F. J. Smith, "Direct File Organization for Lemmatized Text Retrieval," Information Technology Research Development Applications, 3 (1984):25-32; G. David Huffman, Dennis A. Vital, and Royal G. Bivins, "Generating Indices with Lexical Association Methods: Term Uniqueness," Information Processing and Management, 26 (1990):549-58; P. S. Jacobs, G. R. Krupka, and L. F. Rau, "Lexico-semantic Pattern Matching as a Companion to Parsing in Text Understanding," in Speech and Natural Language. Proceedings of a Workshop, ed. P. Price (Palo Alto: Morgan Kaufmann, 1991), pp. 337-41.

24 See the monographs of Hans Dieter Maas, Homographie und maschinelle Sprachübersetzung, Linguistische Arbeiten, 8 (Saarbrücken: Germanistisches Institut, 1969) and M. Boot, Homographie. Ein Beitrag zur automatischen Wortklassenzuweisung in der Computerlinguistik (Utrecht: Rijksuniversiteit, 1979); see also M. Boot, "Homography and Lemmatization in Dutch Texts," ALLC Bulletin, 8 (1980):175-89; Normand Beauchemin, "Homographie et solutions pratiques en lemmatisation minimale," in Méthodes quantitatives et informatiques dans l'étude des textes: En hommage à Charles Muller, pref. Etienne Brunet, 1 vol. in 2, Travaux de linguistique quantitative, 35 (Geneva: Slatkine, 1986), pp. 25-36.

25 A. Duro, "Un angoissant problème de lemmatisation: le traitement du participe," in Proceedings of the Second International Round Table Conference on Historical Lexicography, ed. W. J. J. Pijnenburg and F. de Tollenare (Dordrecht: Foris, 1980), pp. 117-42, and Peter J. Lucas, "Computer Assistance in the Editorial Expansion of Contractions in Middle English Text," ALLC Bulletin, 9.3 (1981):9-10.

26 Notable attempts include Philip J. Hayes, Some Association-Based Techniques for Lexical Disambiguation by Machine, Computer Science Department Technical Report TR25 (Rochester, New York: University of Rochester, 1977); Yaacov Choueka and Serge Lusignan, "Disambiguation by Short Contexts," Computers and the Humanities, 19 (1985):147-57; and Dave Taylor, "Wordz that Almost Match," Computer Language, 3 (November 1986):47-59. On the larger linguistic issues, see Lexical Representation and Process, ed. William Marslen-Wilson (Cambridge, Massachusetts: MIT Press, 1989); N. Calzolari and A. Zampolli, "Methods and Tools for Lexical Acquisition," in Natural Language Processing, ed. Filgueiras et al., at pp. 4-24. For details of specific applications, see J. A. Leavitt and J. L. Mitchell, "SPAN: A Lexicostatistical Measure and Some Applications," in Computing in the Humanities, ed. Lusignan and North, pp. 59-71; J. Wiederman, "On The Complexity of Lexicographic Sorting and Searching," Aplikace Matematiky, 26 (1981):432-36; L. Blume, A. Brandenburger, and E. Dekel, "An Overview of Lexicographic Choice under Uncertainty," Annals of Operations Research, 19 (1989):247-72.

27 One possible approach to the problem would employ existing algorithms designed for phonetic analysis; see, e.g., V. W. Zue and D. P. Huttenlocher, "Computer Recognition of Isolated Words from Large Vocabularies: Lexical Access Using Partial Phonetic Information," in Proceedings of the International Conference on Advanced Automation, 1983, Julius T. Tou, chair (Taipei: Institute of Information Science, 1984), pp. 343-47; Jim Howell, "An Alternative to Soundex," Dr. Dobb's Journal, November 1987, 62-65; M. D. Riley and A. Ljolje, "Lexical Access with a Statistically-Derived Phonetic Network," in Speech and Natural Language, ed. Price, pp. 289-92.

28 Maria Assumpta Brossa i Alavedra, "Lematització semiautomatitzada i regularització gràfica de Tirant lo Blanch," Ph.D., Universidad de Barcelona, 1990; see DAI 51C.3 (1990):338-C (no. 1414).

29 Theodor Holm Nelson, Literary Machines 90.1, rev. ed. (Sausalito: Mindful Press, 1990). Nelson's addressing scheme in some respects resembles the Dewey decimal system insofar as any address (e.g., 768.1004.3.345620987) can form the basis of another by means of the incrementation of one of its numerical subsections or the interpolation of another decimal point. Nelson traces the origin of his scheme back to an article by Vannevar Bush, "As We May Think," Atlantic Monthly, July 1945:101-8. He also cites Douglas Engelbart as a major influence in the development of the concept of hypertext, but Engelbart's papers do not seem to have been issued in any readily accessible form.

30 On current notions of hypertext, see Text, Context, and Hypertext, ed. Barrett; George P. Landow, "Hypertext in Literary Education, Criticism, and Scholarship," Computers and the Humanities, 23 (1989):173-98, reissued as "Changing Texts, Changing Readers: Hypertext in Literary Education, Criticism, and Scholarship," in Reorientations: Critical Theories and Pedagogies, ed. Bruce Henricksen and Thaïs E. Morgan (Urbana: University of Illinois Press, 1990), pp. 133-61; Hypertext: Concepts, Systems, and Applications, ed. N. Streitz, A. Rizk, and J. André, Cambridge Series on Electronic Publishing (Cambridge: Cambridge University Press, 1990); Jay David Bolter, Writing Space: The Computer, Hypertext, and the History of Writing (Hillsdale, New Jersey: Erlbaum, 1991) and ibid., "Topographic Writing: Hypertext and the Electronic Writing Space," in Hypermedia and Literary Studies, ed. Delany and Landow, pp. 105-32; Emily Berk and Joseph Devlin, ed., The Hypertext/Hypermedia Handbook 1991 (New York: McGraw Hill, 1991). See also "[Bibliography:] Hypertext and Hypermedia," in CTI Centre for Textual Studies Resources Guide, March 1992, ed. Davis, Deegan, and Lee, p. 68, and Adam Hodgkin, "Tekst of Hypertekst," tr. Harald Engelstad, Vinduet, 45.2 (1991):62-64.

31 On "hypermedia" see Hypermedia and Literary Studies, ed. Delany and Landow; J. Bradley, "Research Challenges in Information Technology: Hypermedia," Canadian Humanities Computing, 3.1 (1989):4-8; and Geri Younggren, "Using an Object-Oriented Programming Language to Create Audience-Driven Hypermedia Events," in Text, Context, and Hypertext, ed. Barrett, pp. 77-92.

32 On the problems of linking, see Terence Harpold, "The Contingencies of the Hypertext Link," Writing on the Edge 2.2 (Spring 1991):126-38.

33 References to current literature on computing, a discipline that is founded on systematization, are only accessible through a surprisingly diverse group of resources. The longest-running annual bibliography is "Language, Literature, and the Computer," Annual Bibliography of English Language and Literature, 46- (1971-). Researchers, particularly those working on medieval topics, will generally need to supplement this source by reference to specialized bibliographies, e.g., Wilhelm Ott, "Bibliographie. Computer-Anwendung im Editionswesen," in Probleme der Edition mittel- und neulateinischer Texte, ed. Hödl and Wuttke, pp. 175-85, reissued in the Italian translation as "Bibliografia. Uso del computer nella scienza editoriale," in La critica dei testi latini medievali e umanistici, ed. H. Furhmann and A. d'Agostino (Rome: Jouvence, 1984), pp. 203-14; Joseph Rudman, "Selected Bibliography for Computer Courses in the Humanities," Computers and the Humanities, 21 (1987):245-54; Pauline Caras, "Literature and Computers: A Short Bibliography, 1980-87," College Literature, 15 (1988):69-72; S. N. Matsuba, "Computer Applications in the Humanities: A Reading List," Canadian Humanities Computing, 4.1 (1990):1-8; Heyward Ehrlich, "An Interdisciplinary Bibliography for Computers and the Humanities," Computers and the Humanities, 25 (1991):315-26; and "Bibliography," in CTI Centre for Textual Studies Resources Guide, March 1992, ed. Davis, Deegan, and Lee, 64-76. See also references indexed under "computers" and other terms in the MLA International Bibliography and International Medieval Bibliography (Leeds: Maney, 1977-). There is also an important section treating "Elaborazione elettronica dei dati" in the annual bibliography Medioevo Latino (Spoleto: Centro italiano di studi sull'alto medioevo, 1980-). 
I would like to thank Marilyn Deegan for the opportunity to read this paper at the 1992 International Congress on Medieval Studies at Western Michigan University, and the staffs of the University of Washington Engineering Library and the Microsoft Library at Redmond, Washington, for their assistance in tracking down several hard-to-find items.