[Poster] Grapho-phonological parsing: Corpus annotation for historical phonology


Date
Nov 30, 2017 12:00 AM
Location
University of Edinburgh

B. Molineaux, J. Kopaczyk, V. Karaiskos, D. Smith, W. Maguire, R. Alcorn and B. Los

While electronic corpora have improved data access for historical linguists, they are rarely built with phonological questions in mind. Corpus annotation typically focuses on the identification and labelling of units at higher levels of linguistic analysis, such as morphology, syntax and semantics, overlooking the phonic layer. One of the main reasons for this is that sound substance, if encoded at all, is mediated by a graphic system which may not be altogether transparent. That said, variation in non-standardised alphabetic systems, such as those of pre-modern Europe, has long been exploited to reconstruct diachronic and diatopic variation in phonological histories, so it is surprising that no bespoke tools have been developed to assist in this painstaking process.

In this paper, we report on a corpus-annotation method developed for the From Inglis To Scots (FITS) project, which maps individual fifteenth-century Scots spellings onto their presumed sound values, allowing for a fine-grained examination of the phonotactic and morphotactic distribution of individual segments as well as variation in their values over time, space and text. This database of grapho-phonologically parsed forms is compiled on the basis of the Linguistic Atlas of Older Scots (LAOS; Williamson 2008), which brings together c. 1250 local Scots documents dating from 1380 to 1500.

We assume that our source materials were set down by scribes capable of “sophisticated and subtle linguistic analysis” (Laing and Lass 2003: 258), so we expect there to be a systematic connection – albeit not necessarily a one-to-one match – between orthographic choices and underlying sound systems. As a result, we are able to reconstruct the array of spellings for individual sounds and, conversely, the array of sounds that can be represented by individual graphemes, and to display the spatial and temporal distribution of individual sound-spelling pairings.
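
The many-to-many relation between spellings and sounds described above can be illustrated with a minimal Python sketch. It is not the FITS implementation: the record layout, the function names `index_pairings` and `distribution`, and all the toy values (graphemes, sound labels, years, localities "A" and "B") are assumptions invented purely for illustration and are not drawn from the FITS or LAOS data.

```python
from collections import defaultdict

# Hypothetical toy records: (spelling unit, presumed sound, year, locality).
# All values are invented for illustration; they are NOT taken from FITS/LAOS.
PAIRINGS = [
    ("quh", "ʍ", 1420, "A"),
    ("wh", "ʍ", 1480, "B"),
    ("quh", "ʍ", 1490, "A"),
    ("ch", "x", 1450, "A"),
    ("gh", "x", 1460, "B"),
    ("ch", "tʃ", 1470, "B"),
]


def index_pairings(records):
    """Build both directions of the sound-spelling relation."""
    spellings_for_sound = defaultdict(set)
    sounds_for_spelling = defaultdict(set)
    for grapheme, sound, _year, _place in records:
        spellings_for_sound[sound].add(grapheme)
        sounds_for_spelling[grapheme].add(sound)
    return spellings_for_sound, sounds_for_spelling


def distribution(records, grapheme, sound):
    """Return the sorted (year, locality) attestations of one pairing."""
    return sorted(
        (year, place)
        for g, s, year, place in records
        if g == grapheme and s == sound
    )


spellings_for_sound, sounds_for_spelling = index_pairings(PAIRINGS)
```

In this sketch, `spellings_for_sound` answers "which spellings represent this sound?", `sounds_for_spelling` answers the converse, and `distribution` lists when and where a single pairing is attested, which is the kind of information one would need in order to plot it over time and space.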

In addition, we link each Germanic root morpheme to its etymological source and propose a path for the development of its attested forms. The result is a corpus of detailed form histories, supported by a Corpus of Changes. This paper will discuss the technical and theoretical challenges of such procedures and exemplify the types of questions that such a quantitative yet dynamic approach affords researchers in historical phonology.

Benjamín Molineaux
Lecturer in Linguistics

I am a historical linguist, working on sounds, spellings, word structure and stress in Mapudungun and Older Scots.