Literary and Linguistic Computing - current issue
What's going on?
Here I survey activities in the digital humanities as a primary source for our conceptualization of the field. I argue for the fundamental nature of modelling to these humanities and describe three varieties: analytical, synthetic and improvisational. I argue that these three kinds are distributed unevenly over the affected fields according to the degree to which each primarily reports on its objects of study, interprets them or invents new genres of expression. The changes in the disciplines are of course incremental—old things done better, more thoroughly and so forth. But what requires our attention and effort is the refiguration of them, of disciplinarity itself and of the conflicted economies in which academic work is increasingly taking place. I conclude by recommending that the institutional structures we build for the digital humanities should reflect the nature of the practice as it has emerged in the last few decades.
Thinking about interpretation: Pliny and scholarship in the humanities
Pliny is a piece of software that is meant to stimulate discussion within the Digital Humanities (DH) about how tools might be built that could find greater acceptance within the wider humanities community, something that has eluded the DH to date. Unlike many other tool projects within the DH, which are meant to show novel ways to apply technology to transform scholarly practice, Pliny is designed to support the act of conventional scholarly interpretation. It is meant to be a tool that blends so well into the task of developing an interpretation, as scholars conventionally practise it, as to be almost invisible. In this, it follows some of the H-LAM/T design principles of Douglas Engelbart, some of which can be seen in software such as the word processor. In this article, several of the principal elements of conventional scholarly practice are described, centred on annotation, notetaking, and the use of these notes as the basis for exploring ideas that emerge from working with the objects of study. Pliny's design is then discussed in terms of how its affordances support the scholar who is working with these elements. In particular, it illustrates an approach to the modelling of notes and associated ideas at the time when they are still unstructured or only partially structured.
Digital visualization as a scholarly activity
Thought processes are enhanced when ways are found to link external perception with internal mental processes by the use of graphic aids. Such aids range from scribbled diagrams to sophisticated linkages between thought, images, and text such as those employed by Leonardo da Vinci. These tools allow visual perception to be harnessed in the dynamic processes associated with the creation or discovery of new knowledge. Digital humanists are applying digital versions of these age-old tools in many areas of research, from the graphs generated by text analysis applications to virtual reality models of ancient buildings, methods known collectively as ‘digital visualization’. This article begins with a brief review of the current application of visualization in the digital humanities before moving on to establish a context for digital visualization within ‘traditional’ humanities scholarship. This provides a context for an examination of what is required in order to ensure that digital visualization work is performed with identifiable intellectual rigour. The London Charter is used as a case study for a possible framework for the development of appropriate methods and standards. Digital visualization as a scholarly methodology is discussed and demonstrated as being part of a continuum of established academic practice rather than something that is in some way new, ‘revolutionary’, or lacking in rigorous scholarly value.
What is transcription?
This paper describes preliminary sketches for a formal account of transcription as it is performed in scholarly editing and in the creation of digital resources. After a general outline of our approach, we present two formal models of transcription. The first addresses only the very simplest cases; the second addresses some, but not all, of the gaps in the first. Finally, we mention some less simple cases and discuss some elaborations of the model which we hope to develop in future work.
Expressing complex associations in medieval historical documents: the Henry III Fine Rolls Project
This article focuses on the use of technologies traditionally associated with knowledge representation to express complex associations between entities in historical texts that have been marked up in XML according to the Text Encoding Initiative (TEI) guidelines. In particular, we describe our exploration of the potential role of an ontology in facilitating the interpretation of implicit and hidden associations in the sources of interest, examining its use and limits in a digital humanities project in connection with editing tools and delivery issues. We demonstrate our findings based on the Henry III Fine Rolls project, where an ontology, built using the RDF (Resource Description Framework)/OWL (Web Ontology Language) technologies, is being developed to make explicit information about person, place, and subject entities marked up as instances in the core texts themselves. For any historian, there is a natural tension between primary sources (as documentary records) and the analysis that produces a context for interpretation. We will argue that the combination of core mark-up (encoded in TEI) and an ontology (in RDF/OWL) provides a powerful model for representing the complexity of this tension and facilitates the necessarily dynamic process of scholarly interpretation.
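As a rough illustration of how an ontology layer can make an implicit association explicit, the sketch below stores a handful of subject-predicate-object triples in plain Python and follows them to connect a payer to an official. It is not the project's ontology or tooling: every identifier, predicate name, and the `implicit_officials` helper are invented for the example, and no RDF library is used.

```python
# Hypothetical triples (subject, predicate, object). All names are invented
# placeholders, not entities or properties from the Fine Rolls ontology.
TRIPLES = {
    ("person_A", "holds_office", "sheriff_of_county_X"),
    ("sheriff_of_county_X", "has_jurisdiction", "county_X"),
    ("place_Y", "located_in", "county_X"),
    ("person_B", "pays_fine_for", "place_Y"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject s such that (s, predicate, obj) is asserted."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def implicit_officials(payer, triples=TRIPLES):
    """Find office holders linked to a payer only through a chain of triples.

    Chain followed: payer -> place -> county -> office -> office holder.
    """
    found = set()
    for place in objects(payer, "pays_fine_for", triples):
        for county in objects(place, "located_in", triples):
            for office in subjects("has_jurisdiction", county, triples):
                found |= subjects("holds_office", office, triples)
    return found


if __name__ == "__main__":
    print(implicit_officials("person_B"))
```

No single triple links `person_B` to `person_A`; the association appears only when the marked-up instance (the fine) is read together with the ontology's statements about places and offices.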
An evaluation of text classification methods for literary study
This article presents an empirical evaluation of text classification methods in the literary domain. The study compared the performance of two popular algorithms, naïve Bayes and support vector machines (SVMs), in two literary text classification tasks: the eroticism classification of Dickinson's poems and the sentimentalism classification of chapters in early American novels. The algorithms were also combined with three text pre-processing tools, namely stemming, stopword removal, and statistical feature selection, to study the impact of these tools on the classifiers’ performance in the literary setting. Existing studies outside the literary domain indicated that SVMs are generally better than naïve Bayes classifiers. In this study, however, SVMs did not always win. Both algorithms achieved high accuracy in sentimental chapter classification, but the naïve Bayes classifier outperformed the SVM classifier in erotic poem classification. Self-feature selection helped both algorithms improve their performance in both tasks. However, the two algorithms selected relevant features in different frequency ranges, and therefore captured different characteristics of the target classes. The evaluation results also suggest that arbitrary feature-reduction steps such as stemming and stopword removal should be applied with great care. Some stopwords were highly discriminative features for Dickinson's erotic poem classification. In sentimental chapter classification, stemming undermined subsequent feature selection by aggressively conflating and neutralizing discriminative features.
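The caution about stopword removal can be made concrete with a minimal sketch: a multinomial naïve Bayes classifier with add-one smoothing, trained twice on the same toy corpus, once with stopwords kept and once with them removed. This is not the study's code or data. The documents, labels, stopword list, and all function and class names below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy corpus; not taken from Dickinson or the novels in the study.
DOCS = [
    "the heart the tears the sorrow",
    "the tears of the heart",
    "ship sails harbor",
    "harbor ship cargo",
]
LABELS = ["sentimental", "sentimental", "other", "other"]
STOPWORDS = frozenset({"the", "of", "and"})


def tokenize(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return [t for t in tokens if t and t not in stopwords]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, stopwords=frozenset()):
        self.stopwords = stopwords
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc, stopwords))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc, self.stopwords) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        # Sorted iteration makes ties resolve to the alphabetically first label.
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    keep = NaiveBayes().fit(DOCS, LABELS)
    drop = NaiveBayes().fit(DOCS, LABELS, STOPWORDS)
    for text in ["tears and sorrow", "ship cargo", "the the the"]:
        print(text, "|", keep.predict(text), "|", drop.predict(text))
```

In this toy corpus the word "the" occurs only in one class, so a text made up of nothing but that word can be classified when stopwords are kept. Once they are removed, no evidence remains and the prediction falls back on the class prior and the tie-breaking order.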
Mining millions of metaphors
One of the first decisions made in any research concerns the selection of an appropriate scale of analysis—are we looking out into the heavens, or down into atoms? To conceive a digital library as a collection of a million books may restrict analysis to only one level of granularity. In this article, we examine the consequences and opportunities resulting from a shift in scale, where the desired unit of interpretation is something smaller than a text: it is a keyword, a motif, or a metaphor. A million books distilled into a billion meaningful components become raw material for a history of language, literature, and thought that has never before been possible. While books herded into genres and organized by period remain irregular, idiosyncratic, and meaningful in only the most shifting and context-dependent ways, keywords or metaphors are lowest common denominators. At the semantic level—the level of words, images, and metaphors—long-term regularity and patterns emerge in collection, analysis, and taxonomy. This article follows the foregoing course of thought through three stages: first, the manual curation of a high quality database of metaphors; second, the expansion of this database through automated and human-assisted techniques; finally, the description of future experiments and opportunities for the application of machine learning, data mining, and natural language processing techniques to help find patterns and meaning concealed at this important level of granularity.
'A thing not beginning and not ending': using digital tools to distant-read Gertrude Stein's The Making of Americans
The particular reading difficulties engendered by the complicated patterns of repetition in The Making of Americans by Gertrude Stein make it almost impossible to read this text in a traditional, linear manner. However, by visualizing certain patterns and looking at the text ‘from a distance’ through textual analytics and visualizations, we are able to make readings that were formerly inhibited. Initial analysis of Making within the MONK (Metadata Offer New Knowledge) project (http://www.monkproject.org/) has yielded evidence which suggests that the text is intricately and purposefully structured. Using text mining to retrieve repetitive patterns and treating each as a single object makes it possible to visualize and compare, in a single view, the three dimensions upon which these repetitions co-occur: length, frequency, and location. Certainly, reading The Making of Americans in a traditional way appears to have yielded limited material for scholarly work, but reading the text differently, as an object of pairings or as parts of combinations, ultimately works in contrast to the supposition that the text is only meaningful to the extent that it defeats making meaning. A distant view of the text's structure allows us to read the text as an object that becomes, as it continues to turn in on itself with a centrifugal force, a whole history without beginning or ending.
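The idea of treating each repeated pattern as a single object described by length, frequency, and location can be sketched as follows. This is a simple word n-gram count, not the MONK project's mining method. The function names, parameters, and the sample sentence are invented for the example, and the sentence is not a quotation from the novel.

```python
from collections import defaultdict


def repeated_ngrams(tokens, min_len=2, max_len=4, min_count=2):
    """Map each repeated word n-gram to the list of its start positions.

    Occurrences may overlap; only n-grams seen at least min_count times
    are returned.
    """
    positions = defaultdict(list)
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            positions[tuple(tokens[i:i + n])].append(i)
    return {g: p for g, p in positions.items() if len(p) >= min_count}


def summarize(repeats):
    """One record per pattern: (text, length, frequency, locations).

    Sorted by length and then frequency, both descending, then by text.
    """
    rows = [(" ".join(g), len(g), len(p), p) for g, p in repeats.items()]
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))


# Invented sample text in a repetitive style; not taken from the novel.
SAMPLE = "any one being one is being one and any one being one is one"

if __name__ == "__main__":
    for text, length, freq, locs in summarize(repeated_ngrams(SAMPLE.split())):
        print(f"{text!r}: length={length} frequency={freq} locations={locs}")
```

Each output record carries the three dimensions named in the abstract, so the records could be passed to any plotting tool to place length, frequency, and location in one view.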
The master builders: LAIRAH research on good practice in the construction of digital humanities projects
Although many digital humanities resources are being developed for online use, there is little understanding of why some become popular, whilst others are neglected. Through log analysis techniques, the LAIRAH project identified twenty-one popular and well-used digital humanities projects, and in order to ascertain the factors they had in common, which predisposed them to be well used, conducted in-depth interviews with the creators of these resources. This article presents the findings of the study, highlighting areas that developers should be aware of, and providing a set of recommendations for both funders and creators, which should ensure that a digital humanities resource will have the best possible chance of being used in the long term.





