TNA Fellow Theodorus Fransen
From Modern to Early Irish: retrogressive diachronic morphological tagging methods and UD tagset interoperability solutions based on detailed linguistic analysis
Theodorus Fransen is a post-doctoral researcher on the Irish Research Council-funded project Cardamom, led by Dr John P. McCrae at the University of Galway, which focuses on deep learning for under-resourced languages. The aim of this project is to work towards automated morphological tagging for Early Modern Irish (c. 1200-1650) and Early Irish (c. 600-1200), using the Universal Dependencies (UD) framework. A UD treebank does not yet exist for Early Modern Irish, and for Early Irish no tagger is available. The proposed methodology involves retrogressive diachronic morphological tagging methods based on a recently established Natural Language Processing pipeline that uses the UD framework and produces good results for pre-standard Modern Irish texts going back as far as c. 1600. Based on detailed linguistic analysis, this project will investigate how to adapt this pipeline further so that it can tag Early Modern Irish and Early Irish texts, which represent increasingly typologically different and morphologically complex stages of the language.