The capability of computers to mark the beginning of the third millennium by inaugurating a new phase of humanistic scholarship was recognized long ago. Monographs addressing the subject of computing in the humanities have appeared frequently over the course of the past twenty-five years1 and advanced research on the use of computers in literary scholarship is in its third decade.2 The journal Computers and the Humanities was established in 1969 and numerous serial publications have since followed its lead.3 In the area of computer-assisted pedagogy, there has been a proliferation of computational innovations across the curriculum.4 Nevertheless, after more than a quarter-century of intensive work, much of the initial promise of computer use in higher education remains unfulfilled. For a number of reasons, notably the vast discrepancy between the pace of advances in computer technology and increases in available resources for academic hardware acquisition, the "classrooms of the future" at most universities are still in early phases of development. Progress in another practical discipline, that of computer-assisted textual analysis, has also been slow. 
Despite intensive work in the field, few standard procedures for the treatment of texts with the aid of computers have achieved any measure of wide acceptance.5 Plans to utilize computers in the production of critical editions of texts, catalogues of manuscripts, full-text literary databases, stemmatic analyses of textual variants, digitized facsimiles of manuscript leaves, and so on, have all been announced, but specific applications often remain accessible only by select groups of specialists or have failed to emerge altogether.6 With the notable exception of the field of computer-assisted lexicography, which has already succeeded both in generating a wide range of publications and producing a substantial amount of tangible results,7 it seems reasonable to conclude that humanistic computing is still at an early stage.
Many of the factors that might be adduced to account for these circumstances are wholly extradisciplinary: the public at large has had very limited access to powerful, multifunctional computers and the cost of disk storage remains dear, especially in comparison to that of traditional media. To date, there simply has been no pervasive replacement of traditional printed texts by electronically encoded text-files for purposes of the preservation and consultation of texts. It is still not entirely clear that the impact of the widespread reading of electronically transmitted texts will ever be as great as that occasioned by the transition from, say, oral to literary transmission of texts or the supersession of the scriptorium by the printing-plant.8
There are a few signs, however, that some changes may be imminent. As the footnotes to this article attest, recent years have seen a huge increase in the number of publications on computer-assisted pedagogy, textual research, and related areas. A significant amount of recent work on computer-assisted analysis of texts has been undertaken in service of research on medieval literature.9 Equally notable is the publication in computer-readable formats of such standard (and, previously, sometimes unwieldy) reference works as the second edition of the Oxford English Dictionary, the Modern Language Association's International Bibliography, the Corpus Christianorum (CETEDOC), and the Riverside Chaucer, along with other conspicuous diskette releases from major university presses. A few compendious and almost indispensable textual corpora have appeared only in computer-readable form (e.g., the IBYCUS [Latin] and Thesaurus Linguae Graecae CD-ROMs, materials assembled for the Dictionary of Old English, etc.). The past few years have also seen distribution of high-quality scholarly journals such as the Bryn Mawr Classical Review over international electronic-messaging networks, and a huge and diverse group of computer users in all parts of the world has in fact become acclimated to reading texts on computer in the course of their daily exchanges of electronic mail. All of these developments have profound implications that may well become more widely recognized as the years progress. Since 1990, moreover, there has been an enormous increase in the power of generally available microcomputing hardware as well as a concurrent drop in price. It appears that the point may soon be reached at which computer-literate educators and researchers will be able to develop powerful applications suited to their own work on an ad hoc basis. The general release of flexible "authoring systems" has placed access to a wide variety of high-level computational routines in the hands of an unprecedented number of users.10
The main concerns of the present discussion are pedagogical. The primary goal of the project outlined below is the production of an electronically-encoded reading-text of Chaucer's Canterbury Tales, specifically designed to assist undergraduate students in exploring this medieval work in its original (Middle English) language. The proposed text differs fundamentally from, say, the diskette release of the Riverside Chaucer in that it lays no claim to the status of a critical edition and is intended to serve mainly as a tool in the classroom. The following discussion, however, will also address a number of issues that have arisen in the course of the development of the text that bear more directly on areas of research in textual studies (such as lemmatization, homography, and full-text searching by collocation) as well as several kindred scholarly disciplines (e.g., lexicography and textual editing). Above all, I have tried to incorporate into the present essay a bibliographical summary of publications in these areas that have appeared over the course of recent decades. Indeed, this may be one of the last opportunities to undertake such a review in the space of a single article.
In the course of development, each of the two main features outlined here--on-screen glossing of vocabulary and proximity-searching by collocation--has revealed previously unsuspected benefits. The one-to-one correspondence resulting from the process of supplying each word of the Canterbury Tales with its own gloss (described in greater detail below) has in practice defined a generalized textual structure (or document architecture) capable of accommodating a scholarly apparatus of virtually any degree of complexity, precision, or erudition. Improvements to the glossing apparatus, however, must in every case be introduced through the labors of an expert, so this structural flexibility should for the moment be considered a long-term benefit of the system. The ability to carry out multiterm searches of a Middle English text by specifying search terms in Modern English produces a benefit that is more immediately available. The passages set out for the user's inspection in many cases embody what amount to conceptual equivalents of the generalized semantic range of the terms specified for the search. The Middle English text is revealed in a way hitherto unattainable through the use of conventional glossaries and lexicons. It becomes an easy matter for the student (or scholar) to search the literary text for an almost infinite variety of themes, formulas, topoi, examples of medieval rhetorical descriptio, and the like, without the necessity of undertaking an arduous program of indexing.
Beyond the fairly restricted set of features uniquely available within the computing environment, the model used for the electronic reading-text is conventional and largely in line with the history of texts reaching back to the codex and scroll. The fundamental computational "metaphor" of the project is that of the book. After launching or "opening" the reading environment containing the text, readers are first confronted with a decorated "cover" illustration. Readers may then turn to a Table of Contents containing coded pointers to the beginning of individual texts, allowing them to move to the beginning of a particular head-link, end-link, or Tale by selecting its title with the pointing device. Beyond the unit of the book, the fundamental divisions of the electronically encoded text are the page and the line.12 The sense in which these terms are used here differs little from that encountered in references to traditional parchment or paper documents. There are twenty-five lines to a page; page-breaks are mainly arbitrary--"hard" page-breaks occur only at the boundaries of discrete head-links, end-links, and Tales--due to the dimensions of the page and the custom typeface that appears on the computer display. A variable-width, Times Roman-based typeface is aligned with the left margin (or both margins in the case of continuous prose); pages are also supplied with graphical approximations of standard running heads and page numbers. One non-traditional feature that does make special use of the computer interface is the provision of limited scrolling abilities on individual pages. Although the user, turning to a particular page for the first time, is presented with exactly twenty-five lines of text, up to fifteen additional lines immediately preceding and following the visible excerpt are hidden off-screen in buffers or scrolling regions. This scheme offers two main benefits. 
First, the reader, on reaching the bottom of a page, may choose to read one or several additional lines before jumping to another part of the text of the Canterbury Tales or concluding a reading session altogether. (If, however, the reader decides to go to the next page after all, the words appearing at the top of that page will be those immediately following the last visible text on the preceding page.) Second, the inclusion of hidden regions containing scrolled text on each page provides a brute-force solution to the problem of searching for collocations across page-breaks.
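The page layout described above can be sketched in a few lines of Python. This is an illustrative reconstruction only, not the project's own code (the prototype was built in HyperCard): the names `Page`, `paginate`, and `searchable_text` are hypothetical, and the sketch ignores the "hard" page-breaks at the boundaries of head-links, end-links, and Tales. It shows only how twenty-five visible lines flanked by hidden buffers of up to fifteen lines allow a collocation that straddles a page-break to be found within a single page.

```python
from dataclasses import dataclass

VISIBLE_LINES = 25   # visible lines per page, as stated in the text
BUFFER_LINES = 15    # hidden lines kept before and after the visible excerpt


@dataclass
class Page:
    number: int
    before: list[str]   # hidden scrolling region preceding the visible lines
    visible: list[str]  # lines shown when the page is first opened
    after: list[str]    # hidden scrolling region following the visible lines

    def searchable_text(self) -> str:
        """All text held on the page, hidden regions included."""
        return " ".join(self.before + self.visible + self.after)


def paginate(lines, visible=VISIBLE_LINES, buffer=BUFFER_LINES):
    """Split lines into pages whose hidden regions overlap their neighbours."""
    pages = []
    for start in range(0, len(lines), visible):
        end = min(start + visible, len(lines))
        pages.append(Page(
            number=len(pages) + 1,
            before=lines[max(0, start - buffer):start],
            visible=lines[start:end],
            after=lines[end:end + buffer],
        ))
    return pages
```

Because each page carries a copy of the neighbouring lines, a search confined to one page at a time still sees any pair of terms separated by a page-break, at the cost of storing those lines more than once.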
Although the project summarized here is described essentially as a fait accompli, the initial phases involved several false starts and a large amount of trial and error. The specific procedure followed during the developmental process (which may, in practice, never finish) and some technical specifications for the reading-text itself may now be set out concisely. Although it should be stressed at the outset that what is proposed here is essentially a text-independent system, the developmental text used for the project was based on that found in two nineteenth-century printed editions, i.e., those of W. W. Skeat and Thomas Wright. The developmental text was encoded in-house by my collaborator Eric Juvet and revised to constitute what amounts to a specially prepared private edition.13 We hope to release a publicly-distributed form of the reading-text as soon as possible, perhaps by the end of the year. This will embody a wholly new edition of the Middle English text of a critical standard, which according to the current design specification will comprise a semi-diplomatic edition of the Tales based on the readings of a single manuscript.
The ASCII-encoded text file produced by Juvet occupied 968,252 bytes (inclusive of punctuation), thus comprising almost a full megabyte of information. The file contained a total of 185,814 words. The first step in the text-processing phase of the effort involved the sorting of this full text of the Canterbury Tales to produce a lexicon of unique strings through a simple process of pattern-matching. (If the program had "seen" the string before, it was discarded; otherwise it was appended to the lexicon of unique strings.) When this sorting process had been completed, the one-megabyte text containing nearly two hundred thousand words had yielded only 12,181 unique strings. These strings were then sorted into a lemmatizing database (described in greater detail below), i.e., an arrangement of records containing, among other information, a single headword (or lemma) and a complete accounting of all inflected and variant forms of the word encountered in Middle English texts to date.14 (The building and maintenance of the lemmatizing database is in fact a protracted process; the system will surely remain capable of further refinement almost indefinitely.) When all of the unique strings had been integrated into the lemmatizing database, the records reflected a vocabulary of around six thousand words for Chaucer. The figure may seem low in comparison to a vocabulary of about thirty thousand words for the typical Ph.D. student, but it may in fact seem rather high when it is noted that the vocabulary is distributed throughout a single literary work.
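The pattern-matching step that reduces the running text to a lexicon of unique strings might be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used: the function name `unique_strings` is hypothetical, and the tokenization (lowercased runs of the letters a-z) is a simplification that ignores apostrophes, hyphens, and any special characters in the encoded text. The sample lines are quoted only for illustration, and their spelling follows no particular edition.

```python
import re


def unique_strings(text):
    """Return each distinct string in text, in order of first appearance."""
    seen = set()
    lexicon = []
    # Assumed tokenization: lowercase the text and take runs of letters.
    for token in re.findall(r"[a-z]+", text.lower()):
        if token not in seen:       # a string already "seen" is discarded
            seen.add(token)
            lexicon.append(token)   # a new string is appended to the lexicon
    return lexicon


sample = (
    "Whan that Aprille with his shoures soote\n"
    "The droghte of March hath perced to the roote"
)
lexicon = unique_strings(sample)
```

Applied to the full text, a routine of this kind would account for the reduction reported above, from 185,814 running words to 12,181 unique strings.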
It is perhaps too early to say precisely where the reading-text of the Canterbury Tales described here stands in relation to the main progression of computational studies of medieval texts.15 As far as I can ascertain, surprisingly few applications of computer technology to the Chaucerian canon have been made to date, and in most of these the Middle English text supplies the object example for a fairly restricted application of a particular methodological approach.16
As all words in the text of the Canterbury Tales are treated initially by the lemmatizing database as discrete strings of characters, problems arise whenever two distinct terms share a single spelling. All specialists in the field of lemmatization have observed that the phenomenon of homography (and the concomitant ambiguity of meaning that it engenders) constitutes a major obstacle to performing an efficient and accurate lemmatization of any continuous text in a single pass.24 The problem of homography as a whole may be divided into many subcategories. There are special problems, for example, in the treatment of terms from certain grammatical categories (e.g., participles, possessives, and multiple terms written in Middle English as a single discrete string).25 Cases of homographic ambiguity have traditionally been resolved in one of two ways. The most direct method, involving the intervention of an expert reviser, is also the most time-consuming. Beyond this, complex algorithms have been developed that achieve a reasonable degree of precision in the resolution of ambiguity according to context in the treatment of modern languages. No system of lemmatization, to my knowledge, has yet laid claim to a level of perfect accuracy.26 In the treatment of medieval texts, moreover, it is by no means clear that a system of fully automatic lemmatization is a practical or even desirable goal. The corpus of texts preserved in a particular medieval vernacular may in most cases be viewed as finite and fairly stable. Given the variation in orthography, syntax, and punctuation observed in most medieval vernaculars (and many Latin texts as well), the development of an adequate system of algorithmic contextualization would require scarcely less effort than a thorough appraisal by an expert reviser. The whole question of the resolution of orthographic ambiguity in the computer-assisted analysis of medieval texts deserves closer attention than can be offered here.
No ideal method for dealing with homographic ambiguity has come to light in the course of the present project and none is likely to emerge at any point in the near future.27 In accordance with the efforts of other literary scholars who have used computers in the preparation of texts, however, a system of semiautomatic lemmatization has been adopted as a compromise solution.28 The first stage in this process, which may be termed prelemmatization, involves the assignment of groups of generalized, single-word definitions to homographic strings. (Again, this is an unending process; particularly in the case of medieval literature, additional homographs continue to come to light throughout the course of the preparation of the lemmatizing database.) Consider, for example, the Middle English homograph glede, which might represent a form of an adjective ("glad, cheerful, radiant"), a verb ("to make glad"), or any one of several nouns ("gladness," "burning coal," or "kite"). The string glede would thus be associated automatically at the prelemmatization stage with a group of possible single-word glosses: glad cheer happiness fire bird. The substitution of the general term "fire" for "burning coal" and "bird" for the more specific "kite" reflects a significant feature of the design of the prelemmatization resource. By assigning a group of possible glosses to a specific homograph which displays a broad semantic range, it is possible to begin the preliminary phase of proximity-searching by collocation (employing Modern English search terms) even at the prelemmatization stage.
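The prelemmatization step can be pictured as a simple table lookup. The sketch below is an assumption-laden illustration, not the lemmatizing database itself: the names `PRELEMMA` and `prelemmatize` are hypothetical, the entry for glede reproduces the gloss group given above, and the second entry is invented purely to show how a non-homographic string would sit alongside it.

```python
# Maps a Middle English string to its group of single-word glosses.
PRELEMMA = {
    # Gloss group for the homograph "glede", as given in the text.
    "glede": ("glad", "cheer", "happiness", "fire", "bird"),
    # Hypothetical entry, included only for illustration.
    "brenne": ("burn",),
}


def prelemmatize(tokens, table=PRELEMMA):
    """Pair every token with all of its candidate glosses.

    Strings absent from the table receive an empty gloss group, marking
    them for later attention by an expert reviser.
    """
    return [(token, table.get(token.lower(), ())) for token in tokens]


glossed = prelemmatize(["Glede", "brenne", "xyz"])
```

Every candidate gloss is retained at this stage, so a homograph such as glede will answer to a search on "fire" and to a search on "bird" alike until a reviser removes the readings that do not apply.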
Although it does not remove the necessity of the eventual intervention of an expert reviser altogether, the main strength of the system of prelemmatization sketched out here is that it remains a fully automatic process right up to that point. It allows the textual scholar, within hours of receiving a new copy of a computer-readable text, to associate every word of that text with its appropriate lemma and gloss. The main drawback to the scheme is that it also results initially in the association of homographic character strings with lemmata (and thus definitions) that are incorrect and must eventually be removed by an expert reviser. In practice, the loss of precision incurred by the association of multiple definitions with individual homographs is not great. The process in fact supplies the expert reviser at the outset with a predefined set of choices to assist in refining the precision of the electronically encoded apparatus. By supplying homographic strings with single-word definitions whose semantic compass is as broad as possible, it has proved possible in practice to carry out useful proximity-searching once the initial stage of prelemmatization has been completed. The student has convenient access to a range of synonyms for a given Middle English term and the literary scholar is immediately placed in a good position to search for themes, concepts, or topoi.
The lemma, in the procedure described above, serves a fundamental role in the production of the Chaucerian reading-text. It provides a "handle" allowing specific information--most commonly definitions, etymologies, and the like--to be linked to particular words of Middle English. The information in question, viewed collectively, has been referred to here with some deliberate vagueness as the gloss. It is worth stressing that the gloss in the electronically encoded text of the Canterbury Tales (or, for that matter, any other text prepared in a similar manner) might in theory contain any type of data. Though in the present instance the contents of the gloss have been restricted to the part of speech and a range of possible synonyms, expanded versions of the text might accommodate a "single best" translation, a detailed etymology, a set of textual variants, a pedagogical commentary, a scholar's personal notes, or any kind of sound, illustration, or animation.
One immediate objection to the scheme set out above would be that it greatly increases the size of the text file used to store the text of the Canterbury Tales. Every word in the main text has its own gloss and many of these glosses are identical and ostensibly redundant, arising as they do in response to multiple occurrences of a given string or lemma. The one-megabyte ASCII file used to generate the developmental version of the reading-text described here expanded to more than eight megabytes when equipped with the full apparatus of the shadow-text. An arguably more efficient scheme would involve a greatly reduced shadow-text that only provided pointers to individual records in a glossary, which would presumably be stored in the form of a flat-file database. This objection has been set aside for two main reasons. First, the proposed document architecture is intended to be capable of accommodating an apparatus able to achieve a high degree of scholarly precision, particularly in the treatment of unusual senses of words and idiomatic phrases, which might in practice comprise any number of words in various lines of the reading-text. The storage of glosses in a database would not absolutely preclude the achievement of this degree of precision, but it would greatly complicate both the initial developmental process and any future revision of the text. According to the plan set out above, however, any word or phrase in Chaucer's text may eventually receive any sort of treatment that an expert reviser deems appropriate. Second, the redundant and seemingly inefficient repetition of similar glosses at various points in the shadow-text is in fact the single feature of the system that allows searching for themes, concepts, and so on, to be undertaken by the reader on an ad hoc basis. The key to the system of proximity-searching, it will be recalled, is the availability of collocations of terms. 
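The way in which a fully expanded shadow-text supports proximity-searching by collocation may be illustrated with a short Python sketch. It rests on several assumptions: the shadow-text is modelled as a list of (word, glosses) pairs in text order, the function name `proximity_search` and the fixed word window are hypothetical, and the sample passage and its glosses (apart from the gloss group for glede) are invented for the purpose. The sketch is not a description of the search routine in the HyperCard prototype.

```python
def proximity_search(shadow, terms, window=10):
    """Find passages in which every Modern English term glosses some word.

    shadow: list of (middle_english_word, glosses) pairs in text order.
    Returns (start_index, passage) pairs; a passage is reported only if it
    begins on a word glossed by one of the search terms.
    """
    wanted = {term.lower() for term in terms}
    hits = []
    for start, (_, glosses) in enumerate(shadow):
        if not wanted & set(glosses):
            continue  # this word is not glossed by any search term
        span = shadow[start:start + window]
        found = set()
        for _, span_glosses in span:
            found |= wanted & set(span_glosses)
        if found == wanted:  # all terms collocate within the window
            hits.append((start, " ".join(word for word, _ in span)))
    return hits


# Invented sample; only the glosses for "glede" come from the discussion.
shadow = [
    ("the", ()),
    ("glede", ("glad", "cheer", "happiness", "fire", "bird")),
    ("gan", ("begin",)),
    ("brenne", ("burn",)),
    ("faste", ("fast",)),
]
hits = proximity_search(shadow, ["fire", "burn"], window=3)
```

Since every occurrence of a word carries its own glosses, the search needs nothing beyond the shadow-text itself. A scheme based on pointers into a separate glossary would require an additional lookup for each word examined.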
One of the most striking (and unexpected) conclusions to emerge from the present course of research is the discovery that the arrangement of the fairly simple, lemmatized synonym sets that constitute the glosses described above in the form of a shadow-text immediately produces an extremely powerful searching capability that provides useful results even before the intervention of an expert reviser. This conclusion may have ramifications for future work on other medieval texts.
In recent literature, the definition of the problematic term hypertext, itself an infelicitously contrived Greco-Latin hybrid that continues to dissonate in the ears of many scholars, has become extremely blurred.30 The introduction of methods of integrating sound and graphic art into computer presentations has exacerbated the situation, giving rise to an even more awkward neologism, i.e., hypermedia.31 The rationale for distancing the Chaucerian text under discussion from these developments, however, rests on criteria more firmly grounded than merely aesthetic objections. At least since the appearance of Cortazar's multistranded novel Hopscotch, experiments in hypertext have involved the introduction of specific linking of passages by an author. Such procedures may be said to rely on a system of explicit links.32 The shadow-text has been designed so as to facilitate the association of words, phrases, and concepts through the use of implicit rather than explicit links. (The Table of Contents mentioned above contains the only instance of the use of explicit links in the system.) This principle--the substitution of implicit links for explicit links--obtains in the cases of both the linking of the gloss to the reading-text and of entire passages to groups of search terms.
Even though it cannot be taken in itself as an example of hypertext, the approach sketched out here may serve to counter some of the objections to the viability of hypertext that have been raised in recent years. There are, after all, many purely practical difficulties involved in the establishment of an extensive system of explicit links in such a system. A single page of a newspaper text or academic publication would, in an ideal system, require hundreds if not thousands of links to be drawn up. As far as anyone has yet been able to ascertain, these links would have to be implemented one at a time either by the author of the original document or an expert reviser. This raises certain questions of trust, since the reader's subjectivity is in effect placed in the hands of unseen indexers whose levels of expertise in any given area that may be of interest to the user are unknown. Even if a comprehensive system of explicit links could be established successfully in a hypertext system, there is a second, even more subtly affective side effect resulting from the use of such links. I would characterize this as a "buried treasure" or "secret door" syndrome. Hypertext systems have the potential to produce a sense that there is something just out of reach, some gloss or interpretation that lies behind the passage at hand but whose precise nature is hard to fathom. In an exhaustively indexed system, the sheer volume of choices might well prove unsettling. It would be hard for the reader to know which way to turn.
The system used here for the production of the reading-text of the Canterbury Tales addresses several of the concerns raised above. Through its employment of a process of prelemmatization, it supplies a virtually infinite number of implicit links without the intervention of an expert agent. Readers are free to explore these at will by formulating queries in any manner that they choose. The fairly simple system of linguistic glossing introduced in the present system produces a result that is, on the one hand, absolutely predictable--the reader who selects a word will invariably encounter a group of terms in a gloss that exemplifies a consistent style--without compromising the variety of choices available to the reader and, arguably, greatly increasing their number. Most researchers, I think, would agree that the ability to flip through the pages of a book at random, to scan the books surrounding the one you are seeking in the shelves of a library stack, to notice the title of an article quite by chance on the cover of a journal, is every bit as important as the systematic examination of sequences of continuous prose. It should be noted, however, that modern books and libraries reached their present state over centuries of trial, error, and refinement. Systems of textual computing will inevitably address their present limitations and continue to evolve. The project described here is intended as a small and preliminary contribution to the task at hand: bringing the use of computers more precisely in line with the way students and scholars work in their daily routines.33
University of Washington
2 On the use of computers in literary and textual studies, see Robert L. Oakman, Computer Methods for Literary Research (Columbia: University of South Carolina Press, 1980; 2nd ed.: Athens: University of Georgia Press, 1984); Computers in Literary and Linguistic Research, ed. L. Cignoni and C. Peters, supplement to Linguistica Computazionale, 3 (1983) (Pisa: Giardini, 1984); L'ordinateur et les recherches littéraires et linguistiques, ed. Jacqueline Hamesse and Antonio Zampolli (Paris: Champion, 1985); B. H. Rudall and T. Corns, Computers and Literature: A Practical Guide (Tunbridge Wells: Abacus Press, 1987); Nancy Ide, "The Relevance of Computational Linguistics to Textual Studies," Computers and Texts, 1 (1991):7-9; Computers and Written Texts, ed. Christopher S. Butler, Applied Language Studies (Oxford: Blackwell, 1992), esp. John F. Burrows, "Computers and the Study of Literature," at pp. 167-204; M. Deegan, S. Lee, and C. Mullings, "Computing in Textual Studies," Computers and Education, 19 (1992):183-91; and Literary Computing and Literary Criticism: Theoretical and Practical Essays on Theme and Rhetoric, ed. Rosanne G. Potter (Philadelphia: University of Pennsylvania Press, 1989).
3 Publications that may be consulted profitably for articles germane to the present discussion include the ALLC [Association for Literary and Linguistic Computing] Bulletin (Swansea: University College et al., 1973-85) and ALLC Journal (Cambridge: Cambridge University Library et al., 1980-85), succeeded by Literary and Linguistic Computing (Oxford: Association for Literary and Linguistic Computing, 1986-); Canadian Humanities Computing (Toronto: Center for Computing in the Humanities, 1987-); Language Technology, incorporating Language Monthly, succeeded by Electric Word (Amsterdam: Language Technology BV et al., 1987-90); Academic Computing (McKinney, Texas: Academic Computing Publications, 1987-90); Computers in Literature and Computers in Literature Update, now Computers and Texts (Oxford: CTI [Computers in Teaching Initiative] Centre for Textual Studies et al., 1990-); and Writing on the Edge (Davis: University of California at Davis Campus Writing Center, 1989-).
4 For details of specific applications, see Susan Hockey, A Guide to Computer Applications in the Humanities (Baltimore: The Johns Hopkins Press, 1980); Humanities Computing Yearbook, ed. Ian Lancashire et al. (New York: Oxford University Press, 1988); and CTI Centre for Textual Studies Resources Guide, March 1992, ed. Caroline Davis, Marilyn Deegan, and Stuart Lee (Oxford: CTI Centre for Textual Studies, 1992). For general discussion of issues involved in text-processing, see Peter Batke, "Text Specific Workstations: A Software Problem," Academic Computing, 4.1 (1988-89):32-35 and 70-72; and Ronald F. E. Weissman, "In Search of the Scholar's Workstation: Recent Trends and Software Challenges," Academic Computing, 4.1 (1988-89):28-30 and 59-64. On specifically pedagogical topics, see John M. Slatin, "Hypertext and the Teaching of Writing," in Text, Context, and Hypertext: Writing with and for the Computer, ed. Edward Barrett (Cambridge, Massachusetts: MIT Press, 1988), pp. 111-29; Susan J. Hockey, Jo Freedman, and J. Cooper, "Computers in the Study of Set Texts," in Humanities and the Computer, ed. Miall, pp. 113-22; and resources listed and reviewed in Bits and Bytes Review: Reviews and News of Products and Resources for Academic Computing (Whitefish, Montana: Bits and Bytes Computer Resources, 1986-).
5 See especially Peter Desmond Smith, An Introduction to Text Processing (Cambridge, Massachusetts: MIT Press, 1990); G. Salton, Automatic Information Organization and Retrieval (New York: McGraw Hill, 1968); Donald E. Knuth, Searching and Sorting, vol. 3 of The Art of Computer Programming (Reading, Massachusetts: Addison-Wesley, 1973). See also W. Martin, B. Al, and P. van Sterkenburg, "Text-Processing and Lexicographical Information--A State of the Art," ALLC Journal, 2 (1981):61-68. For specific applications, see J. McNaught, "Specialized Lexicography in the Context of a British Linguistic Data Bank," in Lexicography in the Electronic Age, ed. Goetschalckx and Rolling, pp. 171-84; Tove Fjeldvig and Anne Golden, "Experiments with Language-Based Aids in Information Retrieval Systems," Nordic Journal of Linguistics, 11 (1988), 33-48; J. K. Proud, The Oxford Text Archive, British Library Research and Development Report, 5985 (London: British Library, 1989); G. Chartron, "Lexicon Management Tools for Large Textual Databases: The Leximet System," Journal of Information Science, 15 (1989):339-44; J. Carroll and C. Grover, "The Derivation of a Large Computational Lexicon for English from LDOCE," in Computational Lexicography for Natural Language Processing, ed. Boguraev and Briscoe, 117-33.
6 Jacques Froger, La critique des textes et son automatisation, Initiation aux nouveautés de la science, 7 (Paris: Dunod, 1968); J. Mau, "Computertechnik im Dienst der Edition lateinischer Texte," in Probleme der Edition mittel- und neulateinischer Texte, ed. Ludwig Hödl and Dieter Wuttke (Boppard: Boldt, 1978), pp. 143-49; La pratique des ordinateurs dans la critique des textes, ed. Jean Irigoin and Gian Piero Zarri, Colloques internationaux du Centre National de la Recherche Scientifique, 579 (Paris: CNRS, 1979); Peter L. Shillingsburg, Scholarly Editing in the Computer Age: Theory and Practice (Duntroon: University of New South Wales, 1984); Ulrich Müller, "Personal Computer, wissenschaftliche Manuskripte und Editionen," Editio, 2 (1988):48-72; Gian Piero Zarri, "Some Experiments on Automated Textual Criticism," in Miscellanea di studi in onore di Aurelio Roncaglia, ed. Roberto Antonelli, et al., 1 vol. in 4 (Modena: Mucchi Editore, 1989), 1439-64; Rolf Bräuer, "Historische Edition und Computer. Internationale Tagung von 26. bis 30. Oktober 1988 in Graz," Zeitschrift für Germanistik, 10 (1989):608-11; and Wilhelm Ott, "Computers and Textual Editing," in Computers and Written Texts, ed. Butler, pp. 205-26.
7 See essays collected in Lexicography in the Electronic Age, ed. J. Goetschalckx and L. Rolling (Amsterdam: North-Holland, 1982) and Theorie und Praxis des lexikographischen Prozesses bei historischen Wörterbüchern, ed. Herbert Ernst Wiegand, Lexicographica, series maior, 23 (Tübingen: Niemeyer, 1987); see also Willem Meijs, "Computers and Dictionaries," in Computers and Written Texts, ed. Butler, pp. 141-65. Many groundbreaking advances in computational lexicography were made in the course of the treatment of medieval and early modern lexicons. See Computers and Old English Concordances, ed. Angus Cameron, Roberta Frank, and John Leyerle, and A Plan for the Dictionary of Old English, ed. Roberta Frank and Angus Cameron, Toronto Old English Series 1 and 2 (Toronto: University of Toronto Press, 1970 and 1973); Jeffrey F. Huntsman, "Computers and Medieval English Lexicography," Computers and the Humanities, 12 (1978):53-60; Jürgen Schäfer, "Elizabethan Glossaries: A Computer-Assisted Study of the Beginnings of Elizabethan Lexicography," I, ALLC Bulletin, 8 (1980):36-41, and II, in Computers in Literary and Linguistic Research, ed. Cignoni and Peters, pp. 235-42; La lexicographie du latin médiéval et ses rapports avec les recherches actuelles sur la civilisation du Moyen-Âge, Colloques internationaux du Centre National de la Recherche Scientifique, 589 (Paris: CNRS, 1981).
8 For some speculation in this area, see John Slatin, "Reading Hypertext: Order and Coherence in a New Medium," in Hypermedia and Literary Studies, ed. Paul Delany and George P. Landow (Cambridge, Massachusetts: MIT Press, 1991), pp. 153-69.
9 The earliest example of a medieval study I have found, on a Medieval Latin topic, is that of Anežka Vidmanová, "Středolatinská textová kritika a počítací stroje," Listy Filologické, 92 (1969):28-52; see also Paul Tombeur, "Le traitement électronique des documents et l'étude de textes médiévaux" (Beckmann et al. 1981), pp. 329-39; Computer Applications to Medieval Studies, ed. Gilmour-Bryson; Mary-Jo Arn, "The Systematic Representation of Early Manuscripts in Computer Form: A Proposal," in Historical and Editorial Studies in Medieval and Early Modern English for Johan Gerritsen, ed. Mary-Jo Arn, Hanneke Wirtjes, and Hans Jansen (Groningen: Wolters-Noordhoff, 1985), pp. 209-19; Paul Tombeur, "Informatique et étude de textes médiévaux," L'homme et son univers au Moyen Âge, ed. Christian Wenin, Philosophes Médiévaux, 26; 2 vols. (Louvain-la-Neuve: Éditions de l'Institut Supérieur de Philosophie, 1986), 1, pp. 174-86; and C. M. Sperberg-McQueen, "Text in the Electronic Age: Textual Study and Text Encoding with Examples from Medieval Texts," Literary and Linguistic Computing, 6.1 (1991):34-46. Cf. also Helmut Droop, Winfried Lenders, and Michael Zeller, Untersuchungen zur grammatischen Klassifizierung und maschinellen Bearbeitung spätmittelhochdeutscher Texte, Forschungsberichte des Instituts für Kommunikationsforschung und Phonetik der Universität Bonn, III: Linguistische Datenverarbeitung, 55 (Hamburg: Buske, 1976); Maschinelle Verarbeitung altdeutscher Texte, ed. Sappler and Strassner.
10 See, e.g., Patrick W. Conner, The Beowulf Workstation (Morgantown: West Virginia University, Department of English [1991]) and materials produced by Conner, Allen Frantzen, Clare Lees, John Ruffing, and others in connection with their Seafarer project (Chicago, New York, Ithaca, and elsewhere, 1990-). Although the document architecture proposed here is intended to be platform-independent, all prototyping to date has been done with HyperCard software, originally developed by Bill Atkinson for Apple Computer, and its native programming language, developed by Dan Winkler. Workstation hardware, purchased in part with an award from the University of Washington Faculty Workstation Initiative and optimized for speed, included a Macintosh IIci, with cache card and eight megabytes of RAM, operating under system 6.0.8 in one-bit (black-and-white) graphics mode. HyperCard, long regarded as slow and cumbersome, suddenly makes sense in such an environment and emerges as a versatile tool capable of almost unbounded manipulation of large quantities of multiple-typeface, multiple-font text. Remarkably, the configuration described here now constitutes a mid-range system.
11 For an elegant appraisal of related issues, see J. M. Sinclair, Corpus, Concordance, Collocation (Oxford: Oxford University Press, 1991); cf. also Gerald Purnelle, "Recherche automatique de groupes verbaux récurrents et de formules dans les fichiers latins lemmatisés," Revue: informatique, 25 (1989):157-92.
12 For an eloquent defense of this approach, see Wilhelm Ott, "Pages and Lines: Remarks on Some Fundamental Requirements of Text Processing Software," in Computers in Literary and Linguistic Research, ed. Cignoni and Peters, pp. 227-33.
13 In recent months, the project has benefited from substantial contributions by Professor Joseph Monda of Seattle University.
14 For accounts of other experiments in this area, see N. Calzolari, "Lexical Definitions in a Computerized Dictionary," Computers and Artificial Intelligence, 2 (1983):225-34; Elmar Seebold, "Die Lemma-Auswahl bei einem etymologischen Wörterbuch," in Theorie und Praxis des lexikographischen Prozesses, ed. Wiegand, pp. 157-71; and B. Slator, "Extracting Lexical Knowledge from Dictionary Text," Knowledge Acquisition, 1 (1989):113-37.
15 As noted above, work in this area is at an early stage. The only volume-length collections I have noted to date are Computer Applications to Medieval Studies, ed. Anne Gilmour-Bryson, Studies in Medieval Culture, 17 (Kalamazoo, Michigan: Medieval Institute Publications, 1984), and Le médiéviste et l'ordinateur. Actes de la Table ronde (Paris, CNRS, 17 novembre 1989), L. Fossier, chair (Paris: CNRS, 1990).
16 Examples include Charles Moorman, "Computing Housman's Fleas: A Statistical Analysis of Manly's Landmark Manuscripts in the General Prologue to the Canterbury Tales," Association for Literary and Linguistic Computing Journal, 3 (1982):15-35, also printed in Sixth International Conference on Computers and the Humanities, ed. Sarah K. Burton and Douglas D. Short (Rockville, Maryland: Computer Science Press, 1983), at pp. 431-46; Harry M. Logan and Barry W. Miller, "A Case for The Book of the Duchess: A Semantic Analysis of Sentence Structure," in Sixth International Conference on Computers and the Humanities, ed. Burton and Short, pp. 384-90; and Kari Anne Rand Schmidt, "Type/Token Ratio for Consecutive Units of Text as a Variable in Authorship Studies: An Assessment with Special Reference to the Attribution of The Equatorie of the Planetis," in L'ordinateur et les recherches littéraires et linguistiques, ed. Hamesse and Zampolli, pp. 333-43. More wide-ranging studies include Walter S. Phelan, "The Study of Chaucer's Vocabulary," Computers and the Humanities, 12 (1978):61-69; idem, "From Morpheme to Motif in Chaucer's Canterbury Tales," in Proceedings of the International Conference on Literary and Linguistic Computing, ed. Zvi Malachi (Tel Aviv: Katz Research Institute, 1979), 291-316; Eugene Green, "Speech Acts and the Art of the Exemplum in the Poetry of Chaucer and Gower," in Literary Computing and Literary Criticism, ed. Potter, pp. 167-87; Harry M. Logan and Grace B. Logan, "The Case of the Canterbury Pilgrims: Sentence Semantics and World View in Frag. I of The Canterbury Tales," Literary and Linguistic Computing, 5 (1990):242-47; Charles Barber and Nicolas Barber, "The Versification of The Canterbury Tales: A Computer-Based Statistical Study," Leeds Studies in English, new series, 21 (1990):81-103 and 22 (1991):57-83.
17 For an early introductory treatment of the subject that remains valuable today, see M. L. Hann, "Principles of Automatic Lemmatization," ITL: Review of Applied Linguistics, 49 (1973):3-22.
18 On applications in the treatment of texts in modern languages, see Rainer Dietrich, Automatische Textwörterbücher. Studien zur maschinellen Lemmatisierung verbaler Wortformen des Deutschen (Tübingen: Niemeyer, 1973); Tove Fjeldvig and Anne Golden, Automatisk rotlemmatisering--et lingvistisk hjelpemiddel for tekstsøking (Oslo: Universitetsforlaget, 1984); Annette Ostling Andersson, L'identification automatique des lexèmes du français contemporain, Acta Universitatis Upsaliensis, Studia Romanica, 39 (Uppsala: Uppsala University Press, 1987). See also Rudy S. Spraycar, "Automatic Lemmatization in Serbo-Croatian," ALLC Journal, 1 (1980):55-59; Josse de Kock, "De la lematización," Lingüística española actual, 9 (1987):255-56. For details of specific projects, see Hans Eggers, et al., SALEM: Ein Verfahren zur automatischen Lemmatisierung deutscher Texte (Tübingen: Niemeyer, 1980); Christine Schneider, "Lemmatisierung im Projekt JUDO," ALLC Bulletin, 8 (1980):166-74; Wolfgang Krause and Gerd Willée, "Lemmatizing German Newspaper Texts with the Aid of an Algorithm," Computers and the Humanities, 15 (1981):101-13; Gerd Willée, "Anwendungen des Algorithmus LEMMA2 zur Lemmatisierung deutscher Wortformen," in Computers in Literary and Linguistic Research, ed. Cignoni and Peters, pp. 279-300; Normand Beauchemin and Michel Theoret, "MICRO-SOLIVO. Un lemmatiseur semi-automatique pour le québécois parlé," Revue québécoise de linguistique, 3 (1984):19-38; Nicoletta Calzolari, Maria Luigia Ceccotti, and Adriana Roventini, "A Lexical Data Base for Interactive Lemmatization," in L'ordinateur et les recherches littéraires et linguistiques, ed. Hamesse and Zampolli, pp. 107-14; Étienne Evrard, "Le L.A.S.L.A," Revue: informatique, 25 (1989):206-7; Pieter Masereeuw, "Les travaux de l'Université d'Amsterdam," Revue: informatique, 25 (1989):207-11.
19 For surveys of recent advances in machine translation, see William John Hutchins, Machine Translation: Past, Present, Future (Chichester: Ellis Horwood, 1986); Derek Lewis, "Computers and Translation" in Computers and Written Texts, ed. Butler, pp. 75-113; and Muriel Vasconcellos, et al., "State of the Art: Machine Translation," Byte, 18.1 (1993):152-86.
20 For a general introduction, see Edward F. Kelly and P. J. Stone, Computer Recognition of English Word Senses (Amsterdam: North-Holland, 1975).
21 Computational Lexicography for Natural Language Processing, ed. Bran Boguraev and Ted Briscoe (London: Longman, 1989); Natural Language Processing, ed. M. Filgueiras, L. Damas, N. Moreira, and A. P. Tomás (New York: Springer-Verlag, 1991); Geoffrey Sampson, "Natural Language Processing," in Humanities Research Using Computers, ed. Turk, pp. 125-36; Terry Patten, "Computers and Natural Language Parsing," in Computers and Written Texts, ed. Butler, pp. 29-52. On specific applications, see Boris Katz, "Text Processing with the START Natural Language System," in Text, Context, and Hypertext, ed. Barrett, pp. 55-76; F. Antonacci, M. Russo, M. T. Pazienza, and P. Velardi, "A System for Text Analysis and Lexical Knowledge Acquisition," Data and Knowledge Engineering, 4 (1989):1-20.
22 On lemmatization in the treatment of medieval texts, see Hans Fix, "Automatische Normalisierung: Vorarbeit zur Lemmatisierung eines diplomatischen altisländischen Textes," in Maschinelle Verarbeitung altdeutscher Texte, ed. Paul Sappler and Erich Strassner (Tübingen: Niemeyer, 1980), pp. 92-100; H. Kamp, "Die automatische Lemmatisierung frühmittelalterlicher Personennamen," DAI, 42C.4 (1981):697 (no. 4543C); René Pellen, "DILEM: Construire un dictionnaire lemmatisé avec l'informatique. Texte d'expérience: Berceo, Los milagros de Nuestra Señora," La Licorne, 7 (1983):197-231; Bernard Derval, "A Computer-Aided System of Text Lemmatization Applied to the Romances of Chrétien de Troyes," ed. Charles Doutrelepont, in Computer Applications to Medieval Studies, ed. Gilmour-Bryson, pp. 31-44; and Dietmar Najock, "Lemmatization of Latin and Greek Concordances and Word-Indexes: Problems and Solutions," in L'ordinateur et les recherches littéraires et linguistiques, ed. Hamesse and Zampolli, vol. 2, pp. 53-66. For an early modern study, see P. S. di Virgilio, "Homographs and Lemmata in Thresor de la langue francoyse by Jean Nicot: A Diachronic Perspective?", Quaderni di semantica, 8 (1987):103-14.
23 For comparable approaches, see K. Devine and F. J. Smith, "Direct File Organization for Lemmatized Text Retrieval," Information Technology Research Development Applications, 3 (1984):25-32; G. David Huffman, Dennis A. Vital, and Royal G. Bivins, "Generating Indices with Lexical Association Methods: Term Uniqueness," Information Processing and Management, 26 (1990):549-58; P. S. Jacobs, G. R. Krupka, and L. F. Rau, "Lexico-semantic Pattern Matching as a Companion to Parsing in Text Understanding," in Speech and Natural Language. Proceedings of a Workshop, ed. P. Price (Palo Alto: Morgan Kaufmann, 1991), pp. 337-41.
24 See the monographs of Hans Dieter Maas, Homographie und maschinelle Sprachübersetzung, Linguistische Arbeiten, 8 (Saarbrücken: Germanistisches Institut, 1969) and M. Boot, Homographie. Ein Beitrag zur automatischen Wortklassenzuweisung in der Computerlinguistik (Utrecht: Rijksuniversiteit, 1979); see also M. Boot, "Homography and Lemmatization in Dutch Texts," ALLC Bulletin, 8 (1980):175-89; Normand Beauchemin, "Homographie et solutions pratiques en lemmatisation minimale," in Méthodes quantitatives et informatiques dans l'étude des textes: En hommage à Charles Muller, pref. Étienne Brunet, 1 vol. in 2, Travaux de linguistique quantitative, 35 (Geneva: Slatkine, 1986), pp. 25-36.
25 A. Duro, "Un angoissant problème de lemmatisation: le traitement du participe," in Proceedings of the Second International Round Table Conference on Historical Lexicography, ed. W. J. J. Pijnenburg and F. de Tollenare (Dordrecht: Foris, 1980), pp. 117-42, and Peter J. Lucas, "Computer Assistance in the Editorial Expansion of Contractions in Middle English Text," ALLC Bulletin, 9.3 (1981):9-10.
26 Notable attempts include Philip J. Hayes, Some Association-Based Techniques for Lexical Disambiguation by Machine, Computer Science Department Technical Report TR25 (Rochester, New York: University of Rochester, 1977); Yaacov Choueka and Serge Lusignan, "Disambiguation by Short Contexts," Computers and the Humanities, 19 (1985):147-57; and Dave Taylor, "Wordz that Almost Match," Computer Language, 3 (November 1986):47-59. On the larger linguistic issues, see Lexical Representation and Process, ed. William Marslen-Wilson (Cambridge, Massachusetts: MIT Press, 1989); N. Calzolari and A. Zampolli, "Methods and Tools for Lexical Acquisition," in Natural Language Processing, ed. Filgueiras et al., at pp. 4-24. For details of specific applications, see J. A. Leavitt and J. L. Mitchell, "SPAN: A Lexicostatistical Measure and Some Applications," in Computing in the Humanities, ed. Lusignan and North, pp. 59-71; J. Wiederman, "On The Complexity of Lexicographic Sorting and Searching," Aplikace Matematiky, 26 (1981):432-36; L. Blume, A. Brandenburger, and E. Dekel, "An Overview of Lexicographic Choice under Uncertainty," Annals of Operations Research, 19 (1989):247-72.
27 One possible approach to the problem would employ existing algorithms designed for phonetic analysis; see, e.g., V. W. Zue and D. P. Huttenlocher, "Computer Recognition of Isolated Words from Large Vocabularies: Lexical Access Using Partial Phonetic Information," in Proceedings of the International Conference on Advanced Automation, 1983, Julius T. Tou, chair (Taipei: Institute of Information Science, 1984), pp. 343-47; Jim Howell, "An Alternative to Soundex," Dr. Dobb's Journal, November 1987, 62-65; M. D. Riley and A. Ljolje, "Lexical Access with a Statistically-Derived Phonetic Network," in Speech and Natural Language, ed. Price, pp. 289-92.
28 Maria Assumpta Brossa i Alavedra, "Lematització semiautomatitzada i regularització gràfica de Tirant lo Blanch," Ph.D., Universidad de Barcelona, 1990; see DAI 51C.3 (1990):338-C (no. 1414).
29 Theodor Holm Nelson, Literary Machines 90.1, rev. ed. (Sausalito: Mindful Press, 1990). Nelson's addressing scheme in some respects resembles the Dewey decimal system insofar as any address (e.g., 768.1004.3.345620987) can form the basis of another by means of the incrementation of one of its numerical subsections or the interpolation of another decimal point. Nelson traces the origin of his scheme back to an article by Vannevar Bush, "As We May Think," Atlantic Monthly, July 1945:101-8. He also cites Douglas Engelbart as a major influence in the development of the concept of hypertext, but Engelbart's papers do not seem to have been issued in any readily accessible form.
30 On current notions of hypertext, see Text, Context, and Hypertext, ed. Barrett; George P. Landow, "Hypertext in Literary Education, Criticism, and Scholarship," Computers and the Humanities, 23 (1989):173-98, reissued as "Changing Texts, Changing Readers: Hypertext in Literary Education, Criticism, and Scholarship," in Reorientations: Critical Theories and Pedagogies, ed. Bruce Henricksen and Thaïs E. Morgan (Urbana: University of Illinois Press, 1990), pp. 133-61; Hypertext: Concepts, Systems, and Applications, ed. N. Streitz, A. Rizk, and J. André, Cambridge Series on Electronic Publishing (Cambridge: Cambridge University Press, 1990); Jay David Bolter, Writing Space: The Computer, Hypertext, and the History of Writing (Hillsdale, New Jersey: Erlbaum, 1991) and idem, "Topographic Writing: Hypertext and the Electronic Writing Space," in Hypermedia and Literary Studies, ed. Delany and Landow, pp. 105-32; Emily Berk and Joseph Devlin, ed., The Hypertext/Hypermedia Handbook 1991 (New York: McGraw-Hill, 1991). See also "[Bibliography:] Hypertext and Hypermedia," in CTI Centre for Textual Studies Resources Guide, March 1992, ed. Davis, Deegan, and Lee, p. 68, and Adam Hodgkin, "Tekst of Hypertekst," tr. Harald Engelstad, Vinduet, 45.2 (1991):62-64.
31 On "hypermedia" see Hypermedia and Literary Studies, ed. Delany and Landow; J. Bradley, "Research Challenges in Information Technology: Hypermedia," Canadian Humanities Computing, 3.1 (1989):4-8; and Geri Younggren, "Using an Object-Oriented Programming Language to Create Audience-Driven Hypermedia Events," in Text, Context, and Hypertext, ed. Barrett, pp. 77-92.
32 On the problems of linking, see Terence Harpold, "The Contingencies of the Hypertext Link," Writing on the Edge, 2.2 (Spring 1991):126-38.
33 References to current literature on computing, a discipline that is founded on systematization, are accessible only through a surprisingly diverse group of resources. The longest-running annual bibliography is "Language, Literature, and the Computer," Annual Bibliography of English Language and Literature, 46- (1971-). Researchers, particularly those working on medieval topics, will generally need to supplement this source by reference to specialized bibliographies, e.g., Wilhelm Ott, "Bibliographie. Computer-Anwendung im Editionswesen," in Probleme der Edition mittel- und neulateinischer Texte, ed. Hödl and Wuttke, pp. 175-85, reissued in Italian translation as "Bibliografia. Uso del computer nella scienza editoriale," in La critica dei testi latini medievali e umanistici, ed. H. Furhmann and A. d'Agostino (Rome: Jouvence, 1984), pp. 203-14; Joseph Rudman, "Selected Bibliography for Computer Courses in the Humanities," Computers and the Humanities, 21 (1987):245-54; Pauline Caras, "Literature and Computers: A Short Bibliography, 1980-87," College Literature, 15 (1988):69-72; S. N. Matsuba, "Computer Applications in the Humanities: A Reading List," Canadian Humanities Computing, 4.1 (1990):1-8; Heyward Ehrlich, "An Interdisciplinary Bibliography for Computers and the Humanities," Computers and the Humanities, 25 (1991):315-26; and "Bibliography," in CTI Centre for Textual Studies Resources Guide, March 1992, ed. Davis, Deegan, and Lee, 64-76. See also references indexed under "computers" and other terms in the MLA International Bibliography and International Medieval Bibliography (Leeds: Maney, 1977-). There is also an important section treating "Elaborazione elettronica dei dati" in the annual bibliography Medioevo Latino (Spoleto: Centro italiano di studi sull'alto medioevo, 1980-).
I would like to thank Marilyn Deegan for the opportunity to read this paper at the 1992 International Congress on Medieval Studies at Western Michigan University, and the staffs of the University of Washington Engineering Library and the Microsoft Library at Redmond, Washington, for their assistance in tracking down several hard-to-find items.