One question, different annotation depths
A case study in Early Slavic
Keywords: Old Church Slavonic, Participle, Clauses, Dative absolute, Discourse relations
Abstract: This paper addresses some of the challenges of carrying out corpus-based linguistic analyses on historical corpora of different sizes and annotation depths. Data from the TOROT Treebank is collected to carry out a case study on Early Slavic dative absolutes, showing the extent to which methodology and results may change depending on the amount of data and the levels of linguistic annotation available. The analysis indicates that deeply-annotated treebanks of limited size can be exploited to establish a solid guideline to analyze a phenomenon in shallowly-annotated corpora and even new, unannotated texts. This is particularly encouraging for historical languages, such as Early Slavic, showing very high diatopic and diachronic variation, which significantly undermines corpus-annotation automation and therefore calls for alternative strategies to counteract data scarcity.
Annotating Historical Corpora special issue
Articles appearing in Journal of Historical Syntax are published under a Creative Commons Attribution License. Authors retain copyright.