Generative AI and Canon Studies

Abstract

Despite the dominance of novels, short stories play a crucial role in education and the global reach of literature; for example, in the case of Hans Christian Andersen or Anton Chekhov. This spark reflects on the process of mapping a world canon of impactful short stories, emphasising the importance of combining subjective knowledge with statistical data from sources like UNESCO’s Index Translationum and Goodreads. Furthermore, the potential for large language models (LLMs) like ChatGPT to enhance literary research represents a promising avenue for exploration, offering additional insights and context. Despite concerns about their accuracy, LLMs can expand the scope of literary studies and help identify influential short stories and authors overlooked by traditional methods. Integrating generative AI into canon studies offers a new dimension to understanding literary influence and impact.

During my stay at Freie Universität in February and March 2024 at Research Area 5 ‘Building Digital Communities’ of the EXC 2020 Temporal Communities, I worked on an article on short stories in World Literature. My key interest was, following the work of , to map instances of short fiction that have been successful over a long period of time and beyond their borders of origin. In a literary world increasingly dominated by novels at the expense of poetry, drama, and short forms, it is often the shorter forms that have a chance of being included in educational systems and reaching a much broader group of readers than texts in the open market. So, while short stories are not dominant and possibly only rarely canonised, their function in the global literary system can be significant. One obvious example, coming from my native country of Denmark, is Hans Christian Andersen, whose fairy tales have been translated into more than 200 languages. By studying which authors of short fiction are successful in World Literature, it is also my hope to uncover more traits of internationally canonised stories.

How, then, can we map a world canon of short stories? At any rate, it is not an exact science. Still, it is also far from a random list, as the perception of the international impact of writers should be supported by numbers showing that the works did, in fact, circulate widely. So, where to begin and how to finish? The first step is usually to rely on one’s own knowledge. If a work or an author does not come to mind for a literary scholar, how widely canonical can it be? This is highly subjective but provides, at least, a starting point with, for example, Hans Christian Andersen, Jorge Luis Borges, Anton Chekhov, Ernest Hemingway, Franz Kafka, and Alice Munro. Consulting literary histories and other sources would also be an obvious step in the initial survey of the field.

The next step would be to support any claims with numbers that show that the works are, in fact, impactful and still part of a living canon. With digitisation, this has become easier and more nuanced. UNESCO’s Index Translationum gives an indication of sustained interest in translating an author, and worldcat.org provides the number of editions in which works have been published. Ratings on Goodreads are interesting not so much for the actual rating as for the number of ratings, which expresses current interest in the works. Open Syllabus indicates, across more than a million courses, whether specific texts are being taught. Studies of Wikipedia have also proved valuable in mapping the international field of literature. Finally, Google Books’ Ngram Viewer can indicate the ebbs and flows in the interest afforded to an author’s works.
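One way to bring such heterogeneous counts together is to rank authors within each source and then average the ranks, so that no single source dominates through sheer scale. The following is only a minimal sketch of that idea: the function name `mean_ranks`, the source labels, the author labels, and all numbers are hypothetical placeholders, not figures from Index Translationum, WorldCat, or Goodreads.

```python
def mean_ranks(metrics):
    """Average each author's rank across sources.

    metrics maps a source name to {author: count}. Within a source,
    rank 1 goes to the highest count. Ties are broken by input order,
    and an author missing from a source is averaged only over the
    sources in which the author appears (both are simplifications).
    """
    ranks = {}
    for counts in metrics.values():
        ordered = sorted(counts, key=counts.get, reverse=True)
        for position, author in enumerate(ordered, start=1):
            ranks.setdefault(author, []).append(position)
    return {author: sum(r) / len(r) for author, r in ranks.items()}


# Placeholder values for illustration only; these are NOT real data.
metrics = {
    "translations": {"Author A": 300, "Author B": 120, "Author C": 40},
    "editions": {"Author A": 900, "Author B": 1500, "Author C": 200},
    "ratings_count": {"Author A": 5000, "Author B": 80000, "Author C": 1000},
}

result = mean_ranks(metrics)
for author in sorted(result, key=result.get):
    print(f"{author}: mean rank {result[author]:.2f}")
```

A lower mean rank suggests broader circulation across the sources consulted. Such a figure can support the scholar's judgement of each case, but it does not replace it.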

These are all good sources, and they make it much easier for studies of global literature to argue convincingly whether a work really has an impact or whether its standing is mostly a projection of a scholar’s personal preferences (to put it bluntly). However, with the recent advent of large language models (LLMs) and their easy accessibility through interfaces such as ChatGPT, Claude, Mistral, and Gemini, it is possible to address questions of the canonical landscape in World Literature, as well as many other questions, in new ways.

There are numerous reasons for being cautious when using LLMs. Some of their output may be completely unrelated to reality and simply made up. Other outputs can be inaccurate, for example, by being overly generous about a writer’s importance. Sometimes, the errors are obvious and comical; at other times, only double-checking the models’ claims will prevent the embarrassing distribution of fictional statements disguised as facts.

However, there are also several uses of LLMs that make them interesting and useful in the context of understanding literary influence, and they could be the missing third step after basic hermeneutic recollection and the gathering of statistical data. Since LLMs are essentially statistical models, they may be well-suited to answering questions that concern not the fringes of the literary world but what has been repeated again and again, as one would expect of influential works. And unlike the raw data of translations, scores on Goodreads, and inclusion in the curriculum, LLMs can present arguments for what is distinctive about certain authors and works, not least when suggesting writers who may not have been at the forefront of an academic’s (limited) mind.

For all the reservations one should have about generative AI, it can, when used the right way, expand the frame of reference in a way that traditional research and Digital Humanities approaches cannot quite provide. LLMs are well-suited to finding the right short stories in the proverbial haystacks of texts, both when it comes to identifying overlooked authors and when it comes to supporting claims one has already made.

Using the prompt ‘Give me a list of writers who are as known, if not more known, for their short stories than for their novels. Only widely circulated authors’ on three different LLMs (ChatGPT, Gemini, and Claude), I obtained a more extensive and annotated list, which could then be checked against the metrics provided by Index Translationum, Goodreads, etc. As a result, writers were added whom I knew but had not considered (like Raymond Carver) or with whom I was not familiar (like Katherine Mansfield). The LLMs also reproduced a number of my initial selections, such as Chekhov, Borges, and Munro; and while Andersen was not listed at first, dialogues with the LLMs provided context and clarification of why he should be listed and how he compared to other writers on the list.
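The comparison of the three lists with an initial selection can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the procedure actually followed: the functions `tally_mentions` and `new_candidates` and the labels `model_a` to `model_c` are hypothetical, and the three model lists are invented placeholders, not the responses the models actually gave. Only the initial selection is taken from the text above.

```python
from collections import Counter


def tally_mentions(model_lists):
    """Count in how many model lists each author appears (once per model)."""
    counts = Counter()
    for authors in model_lists.values():
        counts.update({name.strip() for name in authors})
    return counts


def new_candidates(counts, initial, min_models=2):
    """Authors named by at least min_models models but absent from the initial list."""
    known = set(initial)
    return sorted(a for a, n in counts.items() if n >= min_models and a not in known)


# Invented placeholder lists for illustration; NOT the models' real output.
model_lists = {
    "model_a": ["Anton Chekhov", "Jorge Luis Borges", "Raymond Carver"],
    "model_b": ["Anton Chekhov", "Alice Munro", "Raymond Carver", "Katherine Mansfield"],
    "model_c": ["Jorge Luis Borges", "Alice Munro", "Katherine Mansfield"],
}

# The scholar's own starting point, as named earlier in the text.
initial = [
    "Hans Christian Andersen", "Jorge Luis Borges", "Anton Chekhov",
    "Ernest Hemingway", "Franz Kafka", "Alice Munro",
]

counts = tally_mentions(model_lists)
candidates = new_candidates(counts, initial)
print(candidates)
```

Requiring agreement between at least two models is one simple guard against a single model's invention. The resulting candidates would still need to be checked against the translation, edition, and readership figures before being added.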

While LLMs may seem like black boxes, they are, in cases like this, remarkably consistent, even across different language models. However, we have only just begun to explore how they can analyse texts. If the understanding of literary influence was already bolstered by new data made available through digitisation, the ability to pose questions in a form of distant reading, which was once unimaginable when Franco Moretti coined the term in , remains a modest but very useful aspect of LLMs.

Author(s)	Mads Rosendahl Thomsen
Contribution Type	Spark
Published	July 2024
Author’s Tags	Canonisation Digital Humanities Reception
Editor’s Tags	Data Fiction Novella/Short Story
Licence	Attribution-NonCommercial CC BY-NC (4.0)
DOI	https://doi.org/10.60949/nhvp-cg07
Version	1.0
