TNA Fellow Andressa Rodrigues Gomide
Compiling a literary corpus with minimal resources
Andressa Rodrigues Gomide is a researcher at the University of Coimbra. Her project addresses the creation of the literary subcorpus of the Corpus Pluricêntrico da Língua Portuguesa (CPLP), a large reference corpus of Portuguese language varieties. Given the obstacles to working with copyrighted documents, the project aims to devise a system that allows optimal collection of literary data with limited resources. To that end, quantitative linguistic analysis will be piloted on six literary datasets that pose distinct levels of difficulty for data collection. Knowing how these datasets differ from one another is expected to aid the creation of a framework for fast and simple data collection that does not compromise the quality of the final data.
Today I start my 10-week CLS INFRA Fellowship at the @TrierUni. Thanks to @CLSinfra, @CDHTrier, @christof77 for this opportunity. Looking forward to the upcoming weeks. #equipacelgailtec @CelgaIltec pic.twitter.com/ZVXqHtLxMN
— Andressa Gomide (@gomidear) September 26, 2022