Uppsala Computational Literary Studies Group
The Computational Literary Studies group at Uppsala University (UCOL) was initiated in 2017 as a two-year collaborative project, funded by Uppsala University, that brought together scholars in literature, Scandinavian languages, and computational linguistics. The project proved fruitful, and in 2020 a more permanent research group was established.
UCOL's main focus is to deploy and develop computational methods for the investigation of Swedish literature and its contexts. This encompasses a broad range of research questions, methods, and materials: from literary stylistics and quantitative approaches to narrative, to the sociology of literature and textual scholarship; from 19th-century classics to contemporary popular fiction; from small or even single-novel corpora to large-scale datasets; and from basic word counts and descriptive statistics to complex machine learning algorithms.
UCOL's work currently centres on three major research projects:
"Patterns of Popularity: Towards a Holistic Understanding of Contemporary Bestselling Fiction" aims to investigate the most popular contemporary novels at scale through a combination of empirical approaches, drawing on digital text material (ebooks), contextual and book trade material, and reader consumption data. The ambition is to find out in what ways bestsellers stand out, and how formats such as the audiobook affect writing styles and narratives. The project includes a collaboration with Storytel, which provides access to data on real-time book consumption, a dataset that enables new ways of merging publishing studies and readership studies.
Participants: Karl Berglund (PI), Mats Dahllöf
Funder: Swedish Research Council
"Fictional Prose and Language Change: The Role of Colloquialization in the History of Swedish 1830–1930" aims to investigate whether language change in Swedish in the 19th century was driven by fiction and its move towards naturalism (the Modern Breakthrough). Since it has been claimed that colloquialization was first expressed in fictional prose, the project focuses on stylistic variability in literary texts and investigates whether colloquial linguistic features spread from dialogue to narrative, by developing and applying digital methods of corpus stylistics to large-scale material. The empirical point of departure is Litteraturbanken, a corpus of more than 4,200 Swedish works from 1650 to 1940.
Participants: David Håkansson (PI), Sara Stymne, Johan Svedjedal, Carin Östman
Funder: Swedish Research Council
"The Astrid Lindgren Code: Accessing Astrid Lindgren’s shorthand manuscripts through handwritten text recognition, media history, and genetic criticism" explores material previously untouched by research. It does so primarily by combining two digital methods: the development and adaptation of algorithms for handwritten text recognition (HTR), and crowdsourcing and expert sourcing. The project draws on the joint competences of literary scholars, computer scientists, and professional stenographers to unlock the potential of Lindgren’s original drafts, provide a starting point for the full digitisation and transliteration of her original manuscripts, and serve as a general vehicle for methodological development in the analysis of handwritten documents.
Participants: Malin Nauwerck (PI), Karolina Andersdotter, Anders Hast, Raphaela Heil
Funder: Riksbankens jubileumsfond (RJ)
Recent research output
- Karl Berglund & Ann Steiner (forthcoming). Is Backlist the New Frontlist? Large-Scale Data Analysis of Bestseller Book Consumption in Streaming Services. (Paper submitted October 2020.)
- Karl Berglund & Sarah Allison (forthcoming). A Computational Perspective on Transatlantic Publishing: Tracking Changes from the Swedish in Stieg Larsson’s Millennium Trilogy. (Paper submitted September 2020.)
- Karl Berglund (forthcoming 2021). Strömmade bästsäljare. Litteraturkonsumtion i digitala prenumerationstjänster utifrån Storytels användardata. [Streamed Bestsellers: Book Consumption in Digital Subscription Services through Storytel User Data.] In: Julia Pennlert & Lars Ilshammar (eds.), Från Strindberg till Storytel – korskopplingar mellan ljud och litteratur [From Strindberg to Storytel: Cross-Connections between Sound and Literature]. Göteborg: Daidalos.
- Sara Stymne & Carin Östman (2020). SLäNDa: An Annotated Corpus of Narrative and Dialogue in Swedish Literary Fiction. Twelfth International Conference on Language Resources and Evaluation (LREC'20). May 13–15, 2020, Marseille, France.
- Karl Berglund, Mats Dahllöf & Jerry Määttä (2019). Apples and Oranges? Large-Scale Thematic Comparisons of Contemporary Swedish Popular and Literary Fiction. Samlaren, vol. 140, pp. 228–260.
- Mats Dahllöf & Karl Berglund (2019). Faces, Fights, and Families: Topic Modeling and Gendered Themes in Two Corpora of Swedish Prose Fiction. DHN 2019 Copenhagen: Proceedings of the 4th Conference of the Association of Digital Humanities in the Nordic Countries. March 6–8, 2019, Copenhagen, Denmark, pp. 92–111.
- David Håkansson & Carin Östman (2019). ”afbröt skolläraren ifrigt”: En diakron studie av anföringssatsen i svensk skönlitteratur. [“the teacher interrupted eagerly”: A Diachronic Study of the Speech-Tag in Swedish Fiction.] Samlaren, vol. 140, pp. 261–280.
- Sara Stymne, Johan Svedjedal & Carin Östman (2018). Språklig rytm i skönlitterär prosa. En fallstudie i Karin Boyes Kallocain. [Linguistic Rhythm in Fictional Prose: A Case Study of Karin Boye’s Kallocain.] Samlaren, vol. 139, pp. 128–161.
- Interview with Malin Nauwerck and Anders Hast in Forskning & Framsteg on the use of AI and HTR to transcribe Astrid Lindgren's shorthand: "AI får pippi på Astrids kråkfötter" ["AI Goes Crazy for Astrid's Scrawl"].
- Popular article/review by Karl Berglund in Svenska Dagbladet on the computational analysis of race in literature.
Members
- Karl Berglund (Literature)
- Mats Dahllöf (Computational Linguistics)
- Anders Hast (Information Technology)
- David Håkansson (Scandinavian Languages)
- Malin Nauwerck (Literature, SBI)
- Sara Stymne (Computational Linguistics)
- Johan Svedjedal (Literature)
- Carin Östman (Nordic Languages)
- Karl Berglund (group coordinator)