Project outputs

CLS INFRA TNA Fellow Eduardo Fernández

In this video interview, Round 4 CLS INFRA TNA Fellow Eduardo Fernández (European University Institute, Italy) speaks about the project: ‘Distant reading of early modern prophecies across Europe: corpus formation and testing’. 

CLS INFRA TNA Fellow Marko Milosev

In this video, CLS INFRA TNA Fellow Marko Milosev (Central European University, Austria) speaks about the project: ‘Words to Actions: how and if ideology translates to violence’.

CLS INFRA TNA Fellow Maciej Maryl

This seminar by Maciej Maryl was produced at the Moore Institute, University of Galway, 13 Dec 2023. It includes a presentation and discussion around the recent ALLEA report Recognising Digital Scholarly Outputs in the Humanities, which underscores the transformative impact of digital practices on humanities scholarship. It addresses challenges in digital humanities, focusing on transparency in linking resources to publications, recognising updates as scholarly contributions, reevaluating authorship, fostering digital skills, and adjusting evaluation methods. It also provides recommendations on the assessment of digital outputs like editions, databases, infographics, code, blogs, and podcasts. Each case study includes practical examples and suggested readings.

In this video, CLS INFRA TNA Fellow Assistant Professor Maciej Maryl speaks about the project: ‘Social Network Analysis of Career Trajectories in Polish Literature after 1989’.

CLS INFRA TNA Fellow Richard Změlík

In this video interview, Round 3 CLS INFRA TNA Fellow Richard Změlík discusses the project: ‘Building a Literary Corpus of 19th Century Czech Prose’.

D6.3 Standards beyond TEI / Extended Transformation Matrix / Alternative Formats

This deliverable builds on and further extends the findings of D6.1 “Inventory of existing data sources and formats” surveying the landscape of literary corpora, as well as D8.1 “Tools for NLP” cataloguing the set of tools in the context of CLS. Focusing on the wealth of formats used when encoding and processing text, it offers a comprehensive overview of common formats for encoding textual data, beyond the “lingua franca”, TEI, both in the domain of computational literary studies and computational linguistics, highlighting potential discrepancies in the approach between these two areas of research. The overview reveals a very heterogeneous landscape with a plethora of formats, devised for differing tasks, from philological encoding of historical text material, to computational annotation and processing of text.

Considering interoperability an indispensable key to reusability, the deliverable explores the challenges of, and approaches to, converting between formats.

Typewriter transforming to digital

CLS Beyond academic research: Interviews for Task 3.5

Dr Jennifer Edmond (Trinity College Dublin and DARIAH-EU) and Vera Yakupova (CLS INFRA and Trinity College Dublin) discuss the results of their interviews with non-academics on current and potential uses of Computational Literary Studies tools in the fields of:

  • Policy
  • Consultancy
  • Journalism
  • Medicine/Psychology
  • Publishing
  • Arts

More will arise from this task – watch this space! If you are working in these fields and would like more information, contact us.

D5.2 Case Studies in Data Preparation and Sharing

Building on previous CLS INFRA deliverables, this report provides step-by-step case studies of research questions involving digitisation and transformation processes of literary corpora. The case studies are:

  1. Creation of an ELTeC affine corpus of the Slovak novel (chapter 2)
  2. Finding the haiku across multilingual corpora (chapter 3)
  3. Measuring entropy and surprisal in the prose of the Tsarist Empire Devoted to Terrorism (Russian and Polish Texts) (chapter 4)

These case studies uniquely address not only the tools and resources available to CLS researchers but also the complexities of collaborative decision-making regarding research methodologies.

Nauka reči Slovenskej (The Theory of Slovak) by Ľudovít Štúr (1846).
The original uploader was Adrian at Slovak Wikipedia. Public domain, via Wikimedia Commons.

D8.1: Tools for Basic Natural Language Processing (NLP) Tasks

In this video, Prof. Dr. Julie Birkholz and Mgr. Dr. Silvie Cinková discuss D8.1. This report lists and describes a selection of Natural Language Processing (NLP) tools which are considered to form a Corpus-Enrichment and NLP toolchain for common CLS research tasks. The tools were selected to be:

  • safely positioned in their life cycle, i.e., state-of-the-art and mature as well as continuously maintained, or in development and promised as CLS INFRA deliverables by March 2025
  • as multilingual as possible (beyond English and several major European languages)
  • as interoperable as possible with other tools and texts in other languages

Read the deliverable here

D3.2: Survey of Methods

This survey documents current, widespread practices in research areas or issues that are prominent within CLS. Though it is not intended as a primer, the Survey Grid provides useful, targeted information in a format suitable for gaining a broad understanding of methods and issues. Fields include authorship attribution, genre analysis, literary history, gender analysis, and canonicity.

Click here to use the Survey Grid and review the deliverable.


In this video CLS INFRA TNA Fellow, Khanim Garayeva, speaks about the project: ‘Calculations of similarities or distances in Peter Ackroyd’s historiographic metafictions and lexical diversity in Dan Brown’s straightforward storytelling’.


The CLS INFRA Transnational Access Fellowship Programme funds scholars from literary studies or with an interest in Computational Literary Studies methods to visit leading research institutions and infrastructures and become part of the larger CLS community.

This archive includes interviews with the TNA Fellows on their experience, their research project and outcomes, video testimonials produced during their fellowship and access to their full reports. Check out the archives here.

D7.1: On Programmable Corpora and DraCor

Work Package 7 of the CLS INFRA project, entitled “Building the Ecosystem of and for Programmable Corpora”, is developing a small-scale, but highly functional prototype for an infrastructural ecosystem for CLS research, following the concept of a network-based software architecture. The prototype, implemented as the multi-component system “DraCor” (Drama Corpora Platform), realizes the concept of “Programmable Corpora”, which is defined as corpora that expose an open, transparently documented and (at least partly) research-driven API to make texts machine-actionable. This report gives a detailed description of the DraCor system as a prototype for “Programmable Corpora”. It also shares two first experiments in adapting and transferring the approach of an API-based CLS research infrastructure to other systems and resources.

The first stable version of the DraCor API (1.0.0) was released in December 2023, marking a significant milestone in the system’s development. Watch this space for more announcements!
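
To give a rough idea of what “machine-actionable” can mean in practice, here is a minimal Python sketch of querying a programmable corpus over HTTP. It is not taken from the deliverable. The base URL, the `corpora` endpoint and the `name` and `title` fields are assumptions about version 1 of the DraCor API and should be checked against the official API documentation; `corpus_url` and `fetch_json` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTION: base URL of the DraCor API (version 1); verify in the API docs.
BASE_URL = "https://dracor.org/api/v1"


def corpus_url(corpus_name=None, base_url=BASE_URL):
    """Build the URL of the corpus list, or of one corpus if a name is given.

    ASSUMPTION: corpora are exposed under a 'corpora' path segment.
    """
    url = base_url.rstrip("/") + "/corpora"
    if corpus_name:
        # Percent-encode the name so it is safe to use as a path segment.
        url += "/" + quote(corpus_name, safe="")
    return url


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example use (requires network access, so it is left commented out):
# for corpus in fetch_json(corpus_url()):
#     # ASSUMPTION: each entry is an object with 'name' and 'title' fields.
#     print(corpus.get("name"), "-", corpus.get("title"))
```

Because the corpus is addressed by URL rather than downloaded as a file, the same few lines could in principle be pointed at any other resource that exposes a comparable API by passing a different `base_url`.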

DraCor: Drama Corpora Project. Click logo for link.

CLS INFRA TNA Fellow Cassandra Ulph

In this video CLS INFRA TNA Fellow, Dr Cassandra Ulph, speaks about her project: ‘Developing Attribute-based Sentiment Analysis Model for Romantic-period Letters’.

CLS INFRA TNA Fellow Federico Pianzola

In this video CLS INFRA TransNational Access Fellow, Assistant Professor Federico Pianzola, speaks about the project: ‘Programmable Corpora as Linked Data’.

Training School: Madrid

From 9 to 11 May 2023, CLS INFRA offered a crash course in how to “Dig for Gold” in a corpus of texts. From stylometry to Natural Language Processing, participants completed their own analyses and visualised the results in a hands-on way. We worked with existing plug-and-play code, so no prior experience in Python or R was necessary. Most of all, it was a fun and safe environment to boost textual analysis skills.
Training materials will be available on DARIAH-CAMPUS soon.

Group photo of participants & instructors at TS: Madrid, 2023


In this video CLS INFRA TNA Fellow, Ivan Pozdniakov, speaks about his project: ‘Building an R Package and Web Application to Interact Within a Digital Ecosystem for Literary Studies’. 

Deliverable 5.1 Review of the Data Landscape

In this video PD Dr. Michal Mrugalski summarises CLS INFRA Deliverable 5.1 ‘Review of the Data Landscape’. This landscape review focuses on intellectual access, i.e. providing guidance for finding and sharing literary data, while D6.1 approaches the task from a more technological side, collecting and analyzing literary corpora, available formats, tools, and metadata in order to create an exploratory catalogue / inventory of literary corpora and to provide a transformation matrix/toolbox for solving common issues. Yet we coordinate our efforts, beginning with the compilation of the table of literary collections, so one can regard these as two sides of the same coin. The review’s point of departure is the abundance of existing data and their diversity or heterogeneity as regards corpus design and underlying concepts, for example the definitions of text (is it a source, an edition, a data set? see chapter 3), the purpose of a corpus (e.g. general, reference, or monitoring corpora, special purpose corpora; see chapter 4), and central considerations or criteria regarding the construction of a corpus (sampling, balancing, representativeness, annotation model(s), data format(s); see likewise chapter 4). We ask: How can we assist literary scholars in searching for and finding existing data that are relevant to their own research questions? How can they go about obtaining data without transgressing ethical or legal boundaries (see chapter 5)? And additionally, what kind of research question is relevant concerning the present-day state of the data landscape and literariness and textuality?

Read the deliverable. 

Srishti Sharma - Can a Book Make You Happy?

On Wednesday 29 June Srishti Sharma presented the results of her fellowship as a Transnational Access Research Fellow on the H2020 Computational Literary Studies project at GhentCDH. Her research project explores the effect of the emotions expressed in fictional novels on the emotions experienced by their readers. The corpus includes more than 400 English books from 9 different genres and their corresponding reviews from the Goodreads platform. Using sentiment analysis and emotion recognition she seeks to investigate the emotional links between genre, plot, and reader response.


In this video CLS INFRA TNA Fellow, Srishti Sharma, speaks about her project ‘Can A Book Make You Happy?’ which is being hosted at Ghent University.

CLS INFRA TNA Fellow Riva Quiroga

In this video CLS INFRA TNA Fellow, Riva Quiroga, speaks about her ‘Improving Part of Speech Tagging for Latin-American Spanish Corpora’ project which is being hosted at Charles University, Prague.

CLS INFRA TNA Fellow Lou Burnard

In this video CLS INFRA TNA Fellow, Lou Burnard, speaks about his project ‘Reviving the Victorian Plays Project’, which is being hosted by the Moore Institute at NUI Galway, Ireland.

Deliverable 4.1: Skills gap analysis

We have explored gaps in the teaching of research skills for computational literary studies to inform the CLS INFRA project’s own approach to training schools and to chart the territory, gaining broader insight into current CLS teaching practices. To understand supply, we manually annotated a sample of European university courses in Digital Humanities and summer school workshops. To index demand, we set up an online survey asking the community to evaluate a set of predetermined ‘skills’ based on their perceived future prospects in the field and in teaching (1-5 scale responses, 118 participants).

The survey also offered a chance to observe the demographic structure of the CLS community. The prevalence of early career respondents indicates a new generational wave within computational literary studies. Participant gender was balanced, although introduction of variables such as career stage, self-reported proficiency, and discipline demonstrated skewness. Researchers who work in the field of CLS also report more experience in computational methods, which suggests that these go hand in hand in current practice. Despite the gap in skills education being more general in nature, we identified areas of heightened interest. These are the skills that make up the backbone of computational research: from designing the study to text collection, to multivariate analysis and statistical modeling. Survey responses reiterated that the current gap in schooling is quantitative rather than qualitative. Moreover, there was a consensus among participants that the institutionalized training of a new generation of researchers is instrumental to disciplinary advancement of CLS.

  • Download and read the full report here.
  • Download and view the poster presentation of results here.

Deliverable 3.1: Baseline Methodological User Needs Analysis

The purpose of this task was to identify, document and showcase best practices in CLS research in order to specify infrastructure requirements for the community. The central concerns of our study are the data formats, tools and methods most widely mentioned in publications related to CLS. These findings also play an important role in the project’s training programme, as they show what key qualifications are required for literary studies, what data formats researchers deal with and what methods and tools are especially relevant in the CLS field.

Training School: Prague

The training school on Data and Annotation took place from 7 to 9 June 2022 and was hosted by the Institute of Formal and Applied Linguistics of the Faculty of Mathematics and Physics, Charles University in Prague, Czechia.

Image of students at Prague Training School

The materials, videos of sessions, and other information can be found on DARIAH-CAMPUS.


CLS INFRA Principal Investigator Professor Maciej Eder (IJP PAN) gives a quick tour of the CLS INFRA project for computational literary studies at the Kraków DH Lunch on 11 February 2022.

This special DH lunch introduces the Computational Literary Studies Infrastructure (CLS INFRA) project – a multinational European collaboration to connect people, data, tools, and methods, focused on large-scale analysis of literary sources.


Researcher profile: Professor Maciej Eder (PI)

In this video, our project’s Principal Investigator Professor Maciej Eder introduces us to the CLS INFRA project and how Computational Literary Studies methodologies and research intersect with his own scholarship and teaching. Make sure to subscribe to our YouTube channel so that you don’t miss our upcoming researcher profiles.