| Publication Type | Journal Article | |
| Year of Publication | 1991 | |
| Authors / Editors | Sperberg-McQueen, C.M. | |
| Journal Title | Literary and Linguistic Computing | |
| Volume | 6 | |
| Issue | 1 | |
| Pages | 34-46 | |
| Abstract / Notes | This paper discusses characteristic problems in designing methods of encoding texts in machine-readable form for textual study. Any electronic representation of a text embodies specific ideas of what is important in that text. A well-developed encoding scheme is thus in some sense a theory of the texts it is intended to mark up. The paper describes, with examples, the theory implicit in the work of the Text Encoding Initiative (TEI), a project to develop guidelines for the encoding of machine-readable texts. Any machine-readable representation of texts must use markup, but no finite vocabulary of markup items can be complete, since neither the set of textual features worth marking nor the set of texts to be studied is finite. Any useful markup scheme must therefore be extensible. Additionally, a markup scheme must allow several discrete views of texts. Texts are both linguistic and physical objects. They have simultaneously a linear, a hierarchical, and a directed-graph structure. They refer to objects in real or fictive universes. Texts, finally, are cultural and thus historical objects; a useful encoding scheme must be able to represent textual variation, parallel texts, and the gradual accretion of interpretation and commentary with which human culture adorns venerated texts. | |