blog: Text mining

Two weeks ago I was in Glasgow, discussing Text Mining for Historians. The workshop started with a couple of presentations that gave a more general introduction to the field, describing specific projects, tools or concepts such as corpus linguistics. We then moved on to explore tools and databases such as WMatrix or The Historical Thesaurus of English in several hands-on sessions.

In the concluding discussion, it was stressed that text mining is not only useful for a quantitative approach to corpora, but can also be employed to enhance the analysis of even a single text: conclusions drawn from a qualitative reading can be contextualised to show where 'your' text stands in relation to others, so that you can more easily test assumptions about its uniqueness.
To give an example from my research: for my Ph.D. dissertation on the maritime aspects of English national identity in the early modern period I looked at how topoi such as the Royal Navy as the Wooden Walls of the country were used in specific circumstances. I did learn a lot about how ‘navalist’ activists such as John Dee or 18th-century pamphleteers used this topos, but if I could have combined text mining tools with access to a large corpus of early modern English sources, it would have been much easier to find out how popular it was outside the special interest groups and how successful they were in disseminating it.
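
To make this a little more concrete, here is a minimal sketch in Python of the kind of check I have in mind. It is not one of the tools from the workshop: the sample texts, the list of spelling variants and the function name topos_share are all made up for illustration, and a real corpus would need far more careful handling of early modern spelling.

```python
import re


def topos_share(corpus, variants):
    """Return (share, titles): the fraction of documents that contain
    any of the given spelling variants, and the titles that matched."""
    # One case-insensitive pattern covering every variant as a whole phrase.
    pattern = re.compile(
        "|".join(r"\b" + re.escape(v) + r"\b" for v in variants),
        re.IGNORECASE,
    )
    hits = [title for title, text in corpus.items() if pattern.search(text)]
    share = len(hits) / len(corpus) if corpus else 0.0
    return share, hits


# Invented sample texts, standing in for a large corpus of sources.
corpus = {
    "pamphlet_a": "Our wooden walls are the surest defence of this island.",
    "sermon_b": "Let us give thanks for the harvest.",
    "tract_c": "The Wooden Wals of England guard her trade.",
    "letter_d": "The walls of the town were built of stone.",
}

# Hypothetical spelling variants, e.g. taken from a shared dictionary.
variants = ["wooden walls", "wooden wals", "wodden walles"]

share, hits = topos_share(corpus, variants)
print(f"{share:.0%} of texts use the topos: {hits}")
```

Splitting the corpus by genre, decade or author group before calling the function would then give a rough picture of how far the topos travelled beyond the special interest groups.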

We also talked about server-based tools that allow you to upload your sources, analyse and annotate them, and share information with others (dictionaries of spelling variants, for instance). What I would like to see is an interface that enables you to use different text mining tools on the same source (be it a source you have as full text or one that is available as part of a database), compare the results and then save and annotate the output. Imagine if you could run an in-depth analysis of your sources, identify topoi, rhetorical structures and so on, and then compare these results to thousands or hundreds of thousands of texts from different corpora just by ticking a few boxes!

The event in Glasgow, organised by Zoe Bliss and Ian Anderson, showed that there is a lot of interest in discussing the application of text mining methods, which is why we decided to set up an online group to continue the discussion and the exchange of ideas. We would like to invite you to join Text Mining at the Digital Arts & Humanities community site!

Materials from the workshop

Handouts and presentations from the workshop are available for download at our group forum. You can also comment on and discuss these resources!
