Project outputs


CLS INFRA TNA Fellow Radim Hladík

In this video, CLS INFRA TNA Fellow Radim Hladík discusses a TNA Fellowship project: “A Sprint for FAIRer Data: development of features for data exchange and publication in research software for qualitative data analysis”.

D8.2: Report and prototypes for annotation as enrichment

Annotations (or textual encoding) can serve many purposes depending on the task at hand. In literary studies specifically, annotation is used for a number of reasons: 1) as a reference for a scholar’s own use or for others (i.e. a scholarly edition), 2) to share notes and ideas with others, 3) for linguistic markup, and so forth. These annotations are then used in many different ways, from serving as a source for scholarly editions to qualitative and quantitative pattern tracing, which can involve methods ranging from discourse or theme identification to NLP tasks such as the identification of named entities.

In this deliverable we report on the development of a number of tools that aim to fill the gap between TEI and NLP tool use. These include TEITOK, TXM, PoetryLab API, Rantanplan, and Alberti-stanzas.

Read the full report here:

CLS INFRA TNA Fellow Botond Szemes

In this video, CLS INFRA TNA Fellow Botond Szemes speaks about the project: ‘New Metrics for Computational Drama Analysis’

TNA Mini-Conference, UNED

Three CLS INFRA TNA Fellows at LINHD-UNED in Madrid took part in an online mini-conference on 24 April 2024. Each presented their research and discussed their plans with the others.

  • Jyothi Justin, Digital Humanities and Publishing Studies Research Group, IIT Indore. Project title: SEEING THE UNSEEN: LOCATING THE WOMEN OF INDEPENDENT
  • Marko Milosev, Central European University, Vienna, Austria: WORDS TO ACTION: HOW AND IF IDEOLOGY TRANSLATES TO
  • Bonus video: all three discuss their experience of being a CLS INFRA TNA Fellow

D7.3: On Versioning Living and Programmable Corpora

Digital corpora, which are increasingly proving to be the most important epistemic objects of Computational Literary Studies (CLS), are by no means always static objects. On the contrary, it is becoming increasingly clear that the digitisation of our cultural heritage needs to be understood as an ongoing process, which also implies that a number of the epistemic objects of CLS must be conceptualized as genuinely dynamic. We address this specific quality of some epistemic objects of CLS by speaking of “living corpora”. Where corpora, as the data of CLS, are also conceptually combined with code (e.g. in the form of an API) to form more complex research artifacts, we speak of “programmable corpora”, as described in detail in CLS INFRA Deliverable D7.1 “On Programmable Corpora”.

However, both living and programmable corpora usually face a considerable problem when discussed with regard to the reproducibility of research. This report considers possible solutions for the stabilization of living and programmable corpora and thus shows ways of making them available for reproducing research in a sustainable and long-term manner.
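One simple way to make a changing corpus citable is to record a fingerprint of the exact state that was analysed. The following is a minimal sketch under stated assumptions, not the approach taken in the report: the function name `corpus_fingerprint`, the document identifiers, and the toy texts are all hypothetical, and a real workflow would more likely rely on version control or archived releases.

```python
import hashlib


def corpus_fingerprint(documents: dict[str, str]) -> str:
    """Return a SHA-256 digest identifying one exact state of a corpus.

    `documents` maps a (hypothetical) document identifier to its full text.
    """
    digest = hashlib.sha256()
    # Sort by identifier so the result does not depend on insertion order.
    for doc_id in sorted(documents):
        digest.update(doc_id.encode("utf-8"))
        digest.update(b"\x00")  # separator between identifier and text
        digest.update(documents[doc_id].encode("utf-8"))
        digest.update(b"\x00")  # separator between documents
    return digest.hexdigest()


# Two states of the same toy "living corpus": one document was corrected.
snapshot_a = {
    "play-1": "<TEI>first version</TEI>",
    "play-2": "<TEI>other play</TEI>",
}
snapshot_b = {**snapshot_a, "play-1": "<TEI>corrected version</TEI>"}

# Differing fingerprints signal that the corpus changed between snapshots.
corpus_changed = corpus_fingerprint(snapshot_a) != corpus_fingerprint(snapshot_b)
```

Storing such a digest next to published results lets a later reader check whether the corpus they downloaded is the one that was actually analysed.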

Read the full report here:

CLS INFRA TNA Fellow Agnieszka Szulińska

In this video, CLS INFRA TNA Fellow Agnieszka Szulińska discusses her TNA Fellowship project: “Behind the scenes. Integrating TEI Panorama of Polish drama data into DraCor Programmable Corpora”

D7.2 API Libraries for R and Python for DraCor

Accessing literary corpora via application programming interfaces (APIs) is proving to be a promising approach in the development of an infrastructural ecosystem for Computational Literary Studies (CLS). The use of APIs is further facilitated by so-called “API wrappers”, i.e. programming libraries that simplify working with the API directly from within a programming language. For the system DraCor (“Drama Corpora Platform”), which is being developed as a prototype for Programmable Corpora within the framework of CLS INFRA, and for its DraCor API, such libraries were developed in the two widely used programming languages R and Python and published on the corresponding platforms: “rdracor” was published on CRAN, “pydracor” on PyPI.

Python library "pydracor"
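To illustrate what an API wrapper does in general, here is a minimal sketch that hides URL construction and JSON decoding behind a small class. It does not reproduce the pydracor or rdracor interfaces: the class `CorpusClient`, its methods, the base URL, and the path layout (`corpora/<corpus>/plays/<play>`) are assumptions made for illustration and should be checked against the DraCor API documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of the API; verify against the official documentation.
BASE_URL = "https://dracor.org/api/v1"


def build_url(*segments: str, base_url: str = BASE_URL) -> str:
    """Join path segments onto the base URL, percent-encoding each one."""
    encoded = [quote(segment, safe="") for segment in segments]
    return "/".join([base_url.rstrip("/")] + encoded)


def fetch_json(url: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


class CorpusClient:
    """Hypothetical minimal wrapper around one corpus (not the pydracor API)."""

    def __init__(self, corpus_name: str, base_url: str = BASE_URL):
        self.corpus_name = corpus_name
        self.base_url = base_url

    def corpus_url(self) -> str:
        # Assumed path layout: <base>/corpora/<corpus name>
        return build_url("corpora", self.corpus_name, base_url=self.base_url)

    def play_url(self, play_name: str) -> str:
        # Assumed path layout: <base>/corpora/<corpus name>/plays/<play name>
        return build_url(
            "corpora", self.corpus_name, "plays", play_name, base_url=self.base_url
        )

    def info(self):
        """Fetch corpus-level metadata as decoded JSON."""
        return fetch_json(self.corpus_url())
```

With such a wrapper, a call like `CorpusClient("example-corpus").info()` replaces hand-written request code; the corpus name here is a placeholder.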

D3.3 Showcases for the Application of CLS Methods and Tools

Are you an academic or non-academic literary researcher, linguist, or educator interested in how digital tools and methods can expand your knowledge? A scholar in the Digital Humanities field interested in CLS tools and methods?

The four showcases brought together in this deliverable illustrate, in a concrete, visual and interactive way, how some of the key methods in CLS work when they are applied to collections of literary texts. They are demonstrations, with a relatively low threshold for access, of how tools and datasets as elements of an integrated infrastructure such as the one developed in CLS INFRA, can be used in research.

Multilingual Stylometry Showcase

  • Designed for: scholars and students, linguists
  • Topics: the intersection of language, corpus composition, and stylometric methods of authorship attribution.
  • Dataset:  a curated dataset from the ELTeC corpus, including texts in English, French, Hungarian, and Ukrainian, each translated to facilitate comparative analysis across languages.

By presenting stylometric analysis through interactive heatmaps, our tool invites users to engage directly with the data, exploring how linguistic features and corpus characteristics influence the identification of authorial style.
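The distances behind such a heatmap can be sketched with Burrows's Delta, a widely used stylometric measure: the mean absolute difference between the z-scored relative frequencies of the most frequent words. This is an illustrative sketch only; the showcase may use other measures and preprocessing. The function names, the simple tokenizer, the choice to skip words with zero variance, and the toy texts are assumptions.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text: str) -> list[str]:
    """Very simple tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def relative_frequencies(text: str) -> dict[str, float]:
    tokens = tokenize(text)
    counts = Counter(tokens)
    return {word: n / len(tokens) for word, n in counts.items()}


def burrows_delta(texts: dict[str, str], n_words: int = 50) -> dict[tuple[str, str], float]:
    """Return pairwise Delta distances between the named texts."""
    freqs = {name: relative_frequencies(text) for name, text in texts.items()}

    # Most frequent words across the whole collection.
    overall = Counter()
    for text in texts.values():
        overall.update(tokenize(text))
    vocabulary = [word for word, _ in overall.most_common(n_words)]

    # z-score each word's relative frequency across the texts.
    z_scores: dict[str, list[float]] = {name: [] for name in texts}
    for word in vocabulary:
        column = [freqs[name].get(word, 0.0) for name in texts]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # the word does not discriminate between the texts
        for name, value in zip(texts, column):
            z_scores[name].append((value - mu) / sigma)

    # Delta: mean absolute difference of z-scores for each pair of texts.
    names = list(texts)
    distances = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            diffs = [abs(x - y) for x, y in zip(z_scores[first], z_scores[second])]
            distances[(first, second)] = mean(diffs) if diffs else 0.0
    return distances


sample_texts = {
    "text_a": "the cat sat on the mat and the dog sat on the rug",
    "text_b": "the dog sat on the mat and the cat sat on the rug",
    "text_c": "a bird flew over a hill while a fox ran under a tree",
}
distances = burrows_delta(sample_texts, n_words=20)
```

Each value in `distances` corresponds to one cell of a heatmap; smaller values indicate more similar word-frequency profiles.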

Detecting Small Worlds in a Corpus of Thousands of Theater Plays

  • Designed for: scholars and students with basic knowledge of network theory
  • Topics: typology of theatre plays from a network perspective
  • Dataset: VeBiDraCor – the “very big drama corpus” created by aggregating all individual corpora available on DraCor on 9 Aug 2022.

With platforms like DraCor, homogenized TEI corpora of theater plays from different languages are becoming more and more available. This enables a specific approach to comparative study which is based on the method of formal network analysis and its modeling of texts as semantic structures. In this showcase, we take the “Small World” concept from general network theory and try to identify “Small World”-structured texts in a huge multilingual corpus of almost 3,000 plays.
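A small-world test can be sketched as follows: compute a network's average clustering coefficient and average shortest path length, then compare both with the values expected for a random graph of the same size and density. This sketch assumes that a play is modeled as an undirected network in which characters are nodes and an edge links two characters who share a scene; the toy edge list, the function names, and the random-graph approximations (C ≈ k/n, L ≈ ln n / ln k) are assumptions, and the showcase's own criteria may differ.

```python
import math
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def average_clustering(graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph) -> float:
    """Mean shortest path length over reachable pairs, via breadth-first search."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_sigma(graph) -> float:
    """Ratio (C / C_random) / (L / L_random); needs mean degree above 1."""
    n = len(graph)
    mean_degree = sum(len(nb) for nb in graph.values()) / n
    c_random = mean_degree / n
    l_random = math.log(n) / math.log(mean_degree)
    return (average_clustering(graph) / c_random) / (average_path_length(graph) / l_random)


# Toy co-presence network: three characters who all meet, plus one outsider.
toy_play = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
sigma = small_world_sigma(toy_play)
```

Values of `sigma` clearly above 1 are commonly read as a sign of small-world structure, although a four-node toy network is far too small for the approximations to be meaningful.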

Averell: A corpus management tool to transform poetic corpora into a JSON format compliant with the POSTDATA ontology (Poetrylab Suite, part 1)

  • Designed for: scholars and students interested in poetry
  • Function: to help scholars create their own corpora of poetry by merging different corpora together

Averell is a tool that tries to lower the barrier for researchers interested in the study of multilingual poetry corpora. It provides a unified interface to query, manage, download, and merge corpora of poetic nature in multiple languages based on features relevant for poetry scansion and meter analysis.

Poetrylab + rantanplan: Using rantanplan for accurate scansion analysis and visualization (Poetrylab Suite, part 2)

  • Designed for: scholars, researchers in linguistics, literature, and culture, poetry enthusiasts
  • Topics: Spanish poetic forms
  • Function: explore the nuances of meter, rhyme, and structure within various literary traditions of Spanish poetry

Poetrylab presents a suite of tools for the analysis and visualization of Spanish poetry. This comprehensive platform is designed to cater to a diverse audience, including scholars, researchers, students, and poetry enthusiasts. At the heart of Poetrylab lies rantanplan, a specialized Python library meticulously crafted for the scansion of Spanish poems. Leveraging advanced computational linguistics algorithms, rantanplan facilitates the identification and categorization of syllabic patterns, meter, and rhythmic structures within Spanish poetic compositions. Users can engage with Poetrylab’s user-friendly interface to conduct detailed scansion analyses, gaining valuable insights into the metrical intricacies and poetic nuances of Spanish verse. Whether exploring established literary traditions, studying poetic techniques, or seeking inspiration for creative endeavors, Poetrylab serves as an invaluable resource, fostering deeper understanding and appreciation of Spanish poetry among its users.

CLS INFRA TNA Fellow Jana Mende

In this video, CLS INFRA TNA Fellow Jana Mende (Martin-Luther-Universität Halle-Wittenberg) speaks about the project: ‘Monolingualism Deconstructed: Modelling Hidden and Invisible Multilingualism in German literature (1790-1890)’

CLS INFRA TNA Fellow Haimo Stiemer

Seminar: The Clash of the Modernists

For a long time, literary journals played only a minor role in book-centred philologies. Since their mass digitisation in the early 2000s, however, this has changed. Using the example of two German-language journals from the first half of the 20th century, this talk will demonstrate the heuristic potential of computational philology working with literary journals. As field artefacts, not only the content but also the form of the journals will be considered. According to Pierre Bourdieu, the form finally has its own message with which the actors position themselves in the world of literature and fight against each other.

Haimo Stiemer was a CLS Infrastructure Fellow at the Moore Institute, University of Galway, and works as a research associate at the Technical University of Darmstadt. Besides Computational Literary Studies, his areas of research include the literature of German-language modernism, Literary Sociology and Theory.

TNA Review video

In this video, CLS INFRA TNA Fellow Haimo Stiemer speaks about the project: ‘The clash of the modernists – Comparing the expressionist journal “Der Sturm” and the journal “Der Querschnitt” from the Weimar Republic’

CLS INFRA TNA Fellow Ewa Data-Bukowska

In this video interview, Round 5 CLS INFRA TNA Fellow Ewa Data-Bukowska (Jagiellonian University, Krakow, Poland) speaks about the project: ‘Language use in the paradigm of sustainability: Swedish translations of the works of Witold Gombrowicz’. 

CLS INFRA TNA Fellow Eduardo Fernández

In this video interview, Round 4 CLS INFRA TNA Fellow Eduardo Fernández (European University Institute, Italy) speaks about the project: ‘Distant reading of early modern prophecies across Europe: corpus formation and testing’. 

CLS INFRA TNA Fellow Maciej Maryl

This seminar by Maciej Maryl was produced at the Moore Institute, University of Galway, 13 Dec 2023. It includes a presentation and discussion around the recent ALLEA report Recognising Digital Scholarly Outputs in the Humanities, which underscores the transformative impact of digital practices on humanities scholarship. It addresses challenges in digital humanities, focusing on transparency in linking resources to publications, recognising updates as scholarly contributions, reevaluating authorship, fostering digital skills, and adjusting evaluation methods. It also provides recommendations on the assessment of digital outputs like editions, databases, infographics, code, blogs, and podcasts. Each case study includes practical examples and suggested readings.

In this video, CLS INFRA TNA Fellow Assistant Professor Maciej Maryl speaks about the project: ‘Social Network Analysis of Career Trajectories in Polish Literature after 1989’

CLS INFRA TNA Fellow Richard Změlík

In this video interview, Round 3 TNA Fellow Richard Změlík discusses the project “Building a Literary Corpus of 19th Century Czech Prose”.

D6.3 Standards beyond TEI / Extended Transformation Matrix / Alternative Formats

This deliverable builds on and further extends the findings of D6.1 “Inventory of existing data sources and formats” surveying the landscape of literary corpora, as well as D8.1 “Tools for NLP” cataloguing the set of tools in the context of CLS. Focusing on the wealth of formats used when encoding and processing text, it offers a comprehensive overview of common formats for encoding textual data, beyond the “lingua franca”, TEI, both in the domain of computational literary studies and computational linguistics, highlighting potential discrepancies in the approach between these two areas of research. The overview reveals a very heterogeneous landscape with a plethora of formats, devised for differing tasks, from philological encoding of historical text material, to computational annotation and processing of text.

Considering interoperability an indispensable key to reusability, the deliverable explores the challenges of, and approaches to, converting between formats.
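One of the most common conversions, from TEI XML to plain text for downstream processing, can be sketched with Python's standard library. This is a minimal illustration rather than a tool from the deliverable: the function name, the decision to keep only `<p>` and `<l>` elements, and the sample document are assumptions. It also shows a typical challenge: inline annotations such as `<persName>` are lost in the target format.

```python
import xml.etree.ElementTree as ET

TEI_NAMESPACE = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_NAMESPACE}

# Elements treated as one output line each (an assumption of this sketch).
LINE_TAGS = {f"{{{TEI_NAMESPACE}}}p", f"{{{TEI_NAMESPACE}}}l"}


def tei_body_to_text(tei_xml: str) -> str:
    """Extract paragraphs and verse lines from a TEI body as plain text."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return ""
    lines = []
    for element in body.iter():
        if element.tag in LINE_TAGS:
            # Flatten nested markup and normalise whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                lines.append(text)
    return "\n".join(lines)


SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>She met <persName>Anna</persName> in <placeName>Prague</placeName>.</p>
    <lg><l>A first line,</l><l>and a second.</l></lg>
  </body></text>
</TEI>"""

plain_text = tei_body_to_text(SAMPLE_TEI)
```

The header metadata and the person and place annotations do not survive this conversion, which is why converting in the opposite direction cannot simply restore them.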

CLS Beyond academic research: Interviews for Task 3.5

Dr Jennifer Edmond (Trinity College Dublin and DARIAH-EU) and Vera Yakupova (CLS INFRA and Trinity College Dublin) discuss the result of their interviews with non-academics on current and potential uses of Computational Literary Studies tools in the fields of:

  • Policy
  • Consultancy
  • Journalism
  • Medicine/Psychology
  • Publishing
  • Arts

More will arise from this task – watch this space! If you are working in these fields and would like more information, contact us.

D5.2 Case Studies in Data Preparation and Sharing

Building on previous CLS INFRA deliverables, this report provides step-by-step case studies of research questions involving digitisation and transformation processes of literary corpora. The case studies: 

  1. Creation of an ELTeC affine corpus of the Slovak novel (chapter 2)
  2. Finding the haiku across multilingual corpora (chapter 3)
  3. Measuring entropy and surprisal in the prose of the Tsarist Empire Devoted to Terrorism (Russian and Polish Texts) (chapter 4)

These case studies uniquely address not only the tools and resources available to CLS researchers but also the complexities of collaborative decision-making regarding research methodologies.
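The two quantities named in the third case study can be illustrated with a unigram model: the surprisal of a token is the negative base-2 logarithm of its probability, and entropy is the expected surprisal over the vocabulary. This is a sketch under simplifying assumptions, not the method of the report, which may use different language models and units; the function names and the toy token list are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens: list[str]) -> dict[str, float]:
    """Estimate token probabilities from relative frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return {token: count / total for token, count in counts.items()}


def surprisal(token: str, model: dict[str, float]) -> float:
    """Surprisal in bits: rarer tokens carry more information."""
    return -math.log2(model[token])


def entropy(model: dict[str, float]) -> float:
    """Shannon entropy in bits: the expected surprisal of a token."""
    return -sum(p * math.log2(p) for p in model.values())


tokens = ["a", "a", "b", "c"]
model = unigram_model(tokens)
bits_for_b = surprisal("b", model)   # "b" has probability 0.25
average_bits = entropy(model)
```

A token seen in half of all positions has a surprisal of 1 bit, while one seen in a quarter of them has 2 bits; entropy averages these values, weighted by probability.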

Nauka reči Slovenskej (The Theory of Slovak) by Ľudovít Štúr (1846).
The original uploader was Adrian at Slovak Wikipedia. Public domain, via Wikimedia Commons.

D8.1: Tools for Basic Natural Language Processing (NLP) Tasks

In this video, Prof. Dr. Julie Birkholz and Mgr. Dr. Silvie Cinková discuss D8.1. This report lists and describes a selection of Natural Language Processing (NLP) tools which are considered to form a Corpus-Enrichment and NLP toolchain for common CLS research tasks. The tools were selected to be:

  • safely positioned in their life cycle, i.e., state-of-the-art and mature as well as continuously maintained, or in development and promised as CLS INFRA deliverables by March 2025
  • as multilingual as possible (beyond English and several major European languages)
  • as interoperable as possible with other tools and texts in other languages.

Read the deliverable here

D3.2: Survey of Methods

This survey documents current, widespread practices in research areas or issues that are prominent within CLS. Though it is not intended as a primer, the Survey Grid provides useful, targeted information in a format suitable for gaining a broad understanding of methods and issues. Fields include authorship attribution, genre analysis, literary history, gender analysis, and canonicity.

Click here to use the Survey Grid and review the deliverable.


In this video CLS INFRA TNA Fellow, Khanim Garayeva, speaks about the project: ‘Calculations of similarities or distances in Peter Ackroyd’s historiographic metafictions and lexical diversity in Dan Brown’s straightforward storytelling’.


The CLS INFRA Transnational Access Fellowship Programme funds scholars from literary studies or with an interest in Computational Literary Studies methods to visit leading research institutions and infrastructures and become part of the larger CLS community.

This archive includes interviews with the TNA Fellows on their experience, their research project and outcomes, video testimonials produced during their fellowship and access to their full reports. Check out the archives here.

D7.1: On programmable Corpora and DraCor

Work Package 7 of the CLS project, entitled “Building the Ecosystem of and for Programmable Corpora”, is developing a small-scale, but highly functional prototype for an infrastructural ecosystem for CLS research, following the concept of a network-based software architecture. The prototype, implemented as the multi-component system “DraCor” (Drama Corpora Platform), realizes the concept of “Programmable Corpora”, which is defined as corpora that expose an open, transparently documented and (at least partly) research-driven API to make texts machine-actionable. This report gives a detailed description of the DraCor system as a prototype for “Programmable Corpora”.  It also shares two first experiments in adapting and transferring the approach of an API-based CLS research infrastructure to other systems and resources.

The first stable version of the DraCor API (1.0.0) was released in December 2023, marking a significant milestone in the system’s development. Watch this space for more announcements!

DraCor: Drama Corpora Project.

CLS INFRA TNA Fellow Cassandra Ulph

In this video CLS INFRA TNA Fellow, Dr Cassandra Ulph, speaks about her project: ‘Developing Attribute-based Sentiment Analysis Model for Romantic-period Letters’.

CLS INFRA TNA Fellow Federico Pianzola

In this video CLS INFRA TransNational Access Fellow, Assistant Professor Federico Pianzola, speaks about the project: ‘Programmable Corpora as Linked Data’.

Training School: Madrid

From 9-11 May 2023, CLS INFRA offered a crash course in how to “Dig for Gold” in a corpus of texts. From Stylometry to Natural Language Processing, participants completed their own analyses and visualisations of the results in a hands-on way. We worked with existing code that is plug and play, so it was not necessary to have existing experience in Python or R. Most of all, it was a fun and safe environment to boost textual analysis skills. 
Training materials will be available on DARIAH-CAMPUS soon.

Group photo of participants & instructors at TS: Madrid, 2023


In this video CLS INFRA TNA Fellow, Ivan Pozdniakov, speaks about his project: ‘Building an R Package and Web Application to Interact Within a Digital Ecosystem for Literary Studies’. 

Deliverable 5.1 Review of the Data Landscape

In this video, PD Dr. Michał Mrugalski summarises CLS INFRA Deliverable 5.1 ‘Review of the Data Landscape’. This landscape review focuses on intellectual access, i.e. providing guidance for finding and sharing literary data, while D6.1 approaches the task from a more technological side, collecting and analyzing literary corpora, available formats, tools, and metadata in order to create an exploratory catalogue / inventory of literary corpora and to provide a transformation matrix/toolbox for solving common issues. We coordinate our efforts, beginning with the compilation of the table of literary collections, so the two deliverables can be regarded as two sides of the same coin. The review’s point of departure is the abundance of existing data and their diversity or heterogeneity as regards corpus design and underlying concepts, for example the definitions of text (is it a source, an edition, a data set? see chapter 3), the purpose of a corpus (e.g. general, reference, or monitoring corpora, special purpose corpora; see chapter 4), and central considerations or criteria regarding the construction of a corpus (sampling, balancing, representativeness, annotation model(s), data format(s); see likewise chapter 4). We ask: How can we assist literary scholars in searching for and finding existing data that are relevant to their own research questions? How can one go about obtaining data without transgressing ethical or legal boundaries (see chapter 5)? And additionally, what kind of research question is relevant concerning the present-day state of the data landscape and literariness and textuality?

Read the deliverable. 

Srishti Sharma - Can a Book Make You Happy?

On Wednesday 29 June Srishti Sharma presented the results of her fellowship as a Transnational Access Research Fellow on the H2020 Computational Literary Studies project at GhentCDH. Her research project explores the effect of the emotions expressed in fictional novels on the emotions experienced by their readers. The corpus includes more than 400 English books from 9 different genres and their corresponding reviews from the Goodreads platform. Using sentiment analysis and emotion recognition she seeks to investigate the emotional links between genre, plot, and reader response.


In this video CLS INFRA TNA Fellow, Srishti Sharma, speaks about her project ‘Can A Book Make You Happy?’ which is being hosted at Ghent University.

CLS INFRA TNA Fellow Riva Quiroga

In this video CLS INFRA TNA Fellow, Riva Quiroga, speaks about her ‘Improving Part of Speech Tagging for Latin-American Spanish Corpora’ project which is being hosted at Charles University, Prague.

CLS INFRA TNA Fellow Lou Burnard

In this video CLS INFRA TNA Fellow, Lou Burnard, speaks about his project ‘Reviving the Victorian Plays Project’, which is being hosted by the Moore Institute at NUI Galway, Ireland.

Deliverable 4.1: Skills gap analysis

We have explored gaps in the teaching of research skills for computational literary studies to inform the CLS INFRA project’s own approach to training schools and to chart the territory, gaining broader insight into current CLS teaching practices. To understand supply, we manually annotated a sample of European university courses in Digital Humanities and summer school workshops. To index demand, we set up an online survey asking the community to evaluate a set of predetermined ‘skills’ based on their perceived future prospects in the field and in teaching (1-5 scale response, 118 participants).

The survey also offered a chance to observe the demographic structure of the CLS community. The prevalence of early career respondents indicates a new generational wave within computational literary studies. Participant gender was balanced, although introduction of variables such as career stage, self-reported proficiency, and discipline demonstrated skewness. Researchers who work in the field of CLS also report more experience in computational methods, which suggests that these go hand in hand in current practice. Despite the gap in skills education being more general in nature, we identified areas of heightened interest. These are the skills that make up the backbone of computational research: from designing the study to text collection, to multivariate analysis and statistical modeling. Survey responses reiterated that the current gap in schooling is quantitative rather than qualitative. Moreover, there was a consensus among participants that the institutionalized training of a new generation of researchers is instrumental to disciplinary advancement of CLS.

  • Download and read the full report here.
  • Download and view the poster presentation of results here.

Deliverable 3.1: Baseline Methodological User Needs Analysis

The purpose of this task was to identify, document and show-case best practices in CLS research in order to specify infrastructure requirements for the community. The central concerns of our study are data formats, tools and methods most widely mentioned in the publications related to CLS. These findings play an important role also for the training programme within the project, as they show what key qualifications are required for literary studies, what data formats researchers deal with and what methods and tools are especially relevant in the CLS field.

Training School: Prague

The training school on Data and Annotation took place from 7 to 9 June 2022 and was hosted by the Institute of Formal and Applied Linguistics of the Faculty of Mathematics and Physics, Charles University, in Prague, Czechia.

Image of students at Prague Training School

The materials, videos of sessions, and other information can be found on DARIAH-CAMPUS.


CLS INFRA Principal Investigator, Professor Maciej Eder (IJP PAN) gives a quick tour around the CLS INFRA project for computational literary studies at the Kraków DH Lunch on 11th February 2022.

This special DH lunch introduces the Computational Literary Studies Infrastructure (CLS Infra) project – a multinational European collaboration to connect people, data, tools, and methods, focused on large-scale analysis of literary sources.


Researcher profile: Professor Maciej Eder (PI)

In this video, our project’s Principal Investigator Professor Maciej Eder introduces us to the CLS INFRA project and how Computational Literary Studies methodologies and research intersect with his own scholarship and teaching. Make sure to subscribe to our YouTube channel so that you don’t miss our upcoming researcher profiles.