Computational literary studies rely on extracting information from text corpora, whether about style or content. Many research questions can be answered from metadata alone (“Which is the oldest work by this author?”, “Which authors exchanged the most letters?”), from structural markup (“How many chapters do the books by that author have on average?”), or by extracting and counting individual tokens in the content data (e.g. for topic modeling or stylometry).

However, corpus linguistics offers yet another set of tools to enhance your content searches. For many languages, there are automatic tools that annotate each token with its basic dictionary form (lemma), its part of speech (“noun”), additional morphological details (“past tense”), and even syntactic relations between words (“this noun is the subject of this verb”). This additional markup helps you approximate quite abstract concepts such as “narration” (usually third person, past tense) or agentivity in selected actions (“who speaks the most?”, rendered as nouns or pronouns that are subjects of verbs of speaking).
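To make the idea concrete, here is a minimal sketch in Python of how such annotation could be queried. It is not the method taught at the training school: the three sentences are a hand-annotated toy example, the labels (`nsubj`, `PROPN`, `Tense=Past`) follow Universal Dependencies conventions, and the names `Token`, `SPEECH_LEMMAS`, `count_speakers` and `past_tense_share` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    """One annotated token (hypothetical structure, loosely modelled on CoNLL-U)."""
    idx: int     # position in the sentence, starting at 1
    form: str    # word as it appears in the text
    lemma: str   # basic dictionary form
    upos: str    # part of speech
    feats: str   # morphological details, "_" if none
    head: int    # idx of the syntactic head, 0 for the root
    deprel: str  # syntactic relation to the head


# Toy data, annotated by hand for illustration only.
SENTENCES = [
    [Token(1, "Anna", "Anna", "PROPN", "_", 2, "nsubj"),
     Token(2, "said", "say", "VERB", "Tense=Past", 0, "root"),
     Token(3, "that", "that", "SCONJ", "_", 6, "mark"),
     Token(4, "she", "she", "PRON", "_", 6, "nsubj"),
     Token(5, "was", "be", "AUX", "Tense=Past", 6, "cop"),
     Token(6, "tired", "tired", "ADJ", "_", 2, "ccomp")],
    [Token(1, "He", "he", "PRON", "_", 2, "nsubj"),
     Token(2, "replied", "reply", "VERB", "Tense=Past", 0, "root"),
     Token(3, "quickly", "quickly", "ADV", "_", 2, "advmod")],
    [Token(1, "Anna", "Anna", "PROPN", "_", 2, "nsubj"),
     Token(2, "whispered", "whisper", "VERB", "Tense=Past", 0, "root"),
     Token(3, "a", "a", "DET", "_", 4, "det"),
     Token(4, "secret", "secret", "NOUN", "_", 2, "obj")],
]

# Assumption: a small hand-picked list of verbs of speaking.
SPEECH_LEMMAS = {"say", "reply", "whisper"}


def count_speakers(sentences, speech_lemmas=SPEECH_LEMMAS):
    """Count nouns/pronouns that are subjects of verbs of speaking."""
    counts = Counter()
    for sentence in sentences:
        by_idx = {tok.idx: tok for tok in sentence}
        for tok in sentence:
            if tok.deprel != "nsubj" or tok.upos not in {"NOUN", "PROPN", "PRON"}:
                continue
            head = by_idx.get(tok.head)
            if head is not None and head.upos == "VERB" and head.lemma in speech_lemmas:
                counts[tok.lemma] += 1
    return counts


def past_tense_share(sentences):
    """Share of verbs and auxiliaries marked as past tense."""
    verbs = [tok for s in sentences for tok in s if tok.upos in {"VERB", "AUX"}]
    if not verbs:
        return 0.0
    past = [tok for tok in verbs if "Tense=Past" in tok.feats.split("|")]
    return len(past) / len(verbs)


if __name__ == "__main__":
    print(count_speakers(SENTENCES).most_common())
    print(past_tense_share(SENTENCES))
```

Note that “she” is not counted as a speaker: it is the subject of “tired”, not of a verb of speaking. This is exactly the kind of distinction that plain token counting cannot make and syntactic annotation can.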

The upcoming training school will guide you through simple corpus querying methods to explore these concepts.