TNA Fellow Lara Nugues: Building a digital corpus of Vaudeville to study humour: issues and challenges

Building a digital corpus that can be properly exploited is a challenge in itself. How should texts be selected? Which databases should be used? Which OCR techniques should be employed if the plain text is not directly accessible? What stance should be taken towards the original text: conservative or interventionist? And finally, what level of granularity should be chosen for XML tagging, and according to which criteria? Because these steps are preliminary, scholarly works often pass over them quickly, yet they are fundamental: the way a research object is created, whatever it may be, conditions the research itself and its results. This project proposes to examine all these questions through the construction of a digital corpus of early nineteenth-century French vaudevilles, a kind of play interspersed with songs, with the longer-term objective of studying the humour present in their couplets.