Abstract / Notes

The international, interdisciplinary and multilingual LICHEN project (The Linguistic and Cultural Heritage Electronic Network) was initiated by the Department of English and the MediaTeam research group (Department of Electrical and Information Engineering) at the University of Oulu, together with the SCOTS corpus project at the University of Glasgow. It focuses on the languages and cultures of the northern circumpolar region, that is, the region north of the 55th parallel. Its underlying assumption is that language and culture are as important to the survival and well-being of populations as more obvious ecological, social and health issues. We believe that a digital portal giving access to written and spoken texts in the languages of the region will further its well-being.
Humanities scholars have studied linguistic, educational and social questions related to minority-language speakers, but they have been held back by an inability to process and analyse large quantities of data effectively. Humanities computing researchers in particular have long recognized the need for new, more sophisticated tools to support scholarly research on textual data. Although a number of tools have been developed, they suffer from various restrictions. For example, they may be applicable only to the data they were developed for, importing data may be laborious, their user interfaces and encoding standards may be outdated, they may offer no support for multilinguality, or they may promise more than they deliver. A central aim of the present project is to develop an electronic framework that addresses all of these problems.
The aim of the project is twofold. Firstly, the project aims to collect, preserve and disseminate information about the languages spoken in the circumpolar region, thereby also enabling research on them. This will help to promote the linguistic confidence and self-image of the speakers of these languages, strengthening their cultural awareness and facilitating cross-cultural communication between these peoples in an age of rapid global change. In this way the project will benefit not only the academic community but also the speakers of the languages concerned, and indeed other communities around the world grappling with the same kinds of issues.
Secondly, and more importantly, the project aims to create an electronic framework for the collection, management, online display and exploitation of existing corpora of the languages of the circumpolar regions, a framework that is also applicable to other corpora representing regional, social and other varieties of language. Compilers of corpora documenting regional and social languages and language varieties inevitably have different needs and goals, yet they also face common problems. There is thus a need for collaboration, both among researchers in their use of computer tools and in making the corpora easier to use for varied audiences. The project has begun by exploring the parameters of the various corpora: whether they contain written and/or spoken language, and whether that language is recorded in writing, audio or video. We are also exploring the parameters of access and analysis: whether access is public or private, and whether the material is intended for a general audience or for specialists. It is indeed possible, practical and desirable for us to apply common methods to our common problems. To this end we propose specific recommendations on the methods to apply in our work, beginning with emerging international standards for metadata for language archives and continuing with best practices for the collection, preservation and presentation of corpus data. In a wider context, the project collaborates with other corpora intended for the regional and social analysis of language, including the American Linguistic Atlas (Professor William Kretzschmar, University of Georgia), The Newcastle Electronic Corpus of Tyneside English (Dr Karen Corrigan, University of Newcastle) and The Corpus of Sheffield Usage (Dr Joan Beal, University of Sheffield).