Computational literary studies

Author of the report
Sebastien Dupont
Cite as: Dupont, Sebastien: Computational literary studies, Tagungsberichte, 2014. Online: <>, as of:


Within literary studies, there is growing interest in applying computational techniques. At the same time, a subfield of computational linguistics is developing that addresses a range of problems in literature. The contributors to this session demonstrate the range of possible uses of computational methods where those two fields meet. This report presents such techniques in action in three case studies. The first aims to identify the proportion of the actual writing done by each co-author of a collective literary work. The second traces the influence of genre on authorial style in French classical theatre. The third analyses the different character voices in the monologues of Virginia Woolf’s “The Waves”.

Beyond style: literary capitalism and the publishing industry

A pioneer of book advertising on television, the prolific and successful author James Patterson approaches the literary production process in a collaborative way: “I’ll write an elaborate outline, maybe 70 pages, very detailed, clear, and focused. The co-author will write the first draft, and I’ll see the work every few weeks. I’ll do two to seven more drafts.”1
JAMES O’SULLIVAN (Cork, PA, USA) and SIMON FULLER (Maynooth, Ireland) examine how much of the actual writing is done by each co-author. Using a "bootstrap consensus tree" cluster analysis over the most frequent words with Burrows's Delta as the distance metric, they classify the novels by main author (see Fig. 1). The data suggest that the bulk of the actual writing is done by Patterson’s collaborators, a pattern often observed in “literary assembly lines”2. The industrialization of the literary production process can be traced back to famous names such as Alexandre Dumas. Patterson provides a striking example of capitalist production methods applied to turn out plot-oriented books very efficiently.

Fig. 1 - Classification of novels by main author
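To make the distance measure mentioned above more concrete, here is a minimal sketch of Burrows's Delta in plain Python. It is not the presenters' pipeline: the function names, the toy token lists, and the use of the population standard deviation are assumptions made for illustration, and the bootstrap consensus tree step is left out entirely.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def burrows_delta(texts, n_words=100):
    """texts: dict mapping a text name to its list of tokens.

    Returns a dict mapping each pair of names to its Delta value
    (mean absolute difference of z-scored word frequencies).
    """
    # Vocabulary: the n most frequent words across the whole corpus.
    corpus_counts = Counter(tok for toks in texts.values() for tok in toks)
    vocab = [w for w, _ in corpus_counts.most_common(n_words)]
    names = sorted(texts)
    freqs = {name: relative_freqs(texts[name], vocab) for name in names}

    # z-score every word's frequency across the texts.
    # Assumption: population standard deviation, zero if a word never varies.
    z = {name: [] for name in names}
    for i in range(len(vocab)):
        column = [freqs[name][i] for name in names]
        mu, sigma = mean(column), pstdev(column)
        for name in names:
            z[name].append((freqs[name][i] - mu) / sigma if sigma else 0.0)

    deltas = {}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            deltas[(a, b)] = mean(abs(x - y) for x, y in zip(z[a], z[b]))
    return deltas


# Hypothetical toy corpus: two texts in one "style", one in another.
toy_texts = {
    "a1": "the the the and of the and".split(),
    "a2": "the the the and of the and the".split(),
    "b1": "of of of and the of and".split(),
}
toy_deltas = burrows_delta(toy_texts, n_words=3)
print(toy_deltas)
```

A smaller Delta means two texts use the most frequent words in more similar proportions. A clustering step would then group texts by these pairwise distances.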

Progress through Regression. Modeling Style across Genre in French Classical Theater

In French classical theater, genre (e.g. comedy, tragedy, tragi-comedy) is governed by a set of formal and stylistic constraints. CHRISTOF SCHÖCH (Würzburg, Germany) and ALLEN RIDDELL (Dartmouth, NH, USA) analyse the influence of genre on authorial style in French classical plays3. They compare three models that predict the author of a section of a play: the first is based on word frequencies alone, the second adds the likelihood of an author writing in each genre, and the third uses word frequencies for each genre separately (see Fig. 2). The analysis suggests that genre can be a useful factor in authorship attribution, and shows the value of logistic regression for modelling a range of information jointly with authorship.
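As a rough illustration of how logistic regression can combine word frequencies with genre information, the sketch below fits a small binary model by gradient descent in plain Python. It is not the presenters' model: the two-author setup, the feature layout (two word frequencies in percent plus a 0/1 tragedy flag), the numbers, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.05, epochs=2000):
    """Binary logistic regression fitted by batch gradient descent.

    X: list of feature lists, y: list of 0/1 labels.
    Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a section belongs to author A (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical sections: [freq of word 1 (%), freq of word 2 (%), is_tragedy]
X = [
    [6, 2, 1], [5, 3, 1], [6, 3, 0],   # author A (label 1)
    [3, 5, 0], [2, 6, 0], [3, 6, 1],   # author B (label 0)
]
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(X, y)
print(predict_proba(weights, bias, [6, 2, 1]))  # an A-like tragedy section
print(predict_proba(weights, bias, [2, 6, 0]))  # a B-like comedy section
```

Dropping the third feature gives a word-frequency-only baseline, so the two variants can be compared on held-out sections. That comparison is the general idea behind contrasting models with and without genre information.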


Making Waves: Algorithmic Criticism Revisited

“The Waves”, probably Virginia Woolf’s most experimental book, consists of soliloquies spoken by its six characters. DAVID L. HOOVER (New York, NY, USA) revisits Ramsay’s analysis of how these six distinct character voices are individualized. Various authorship attribution techniques4, such as tf-idf (see Fig. 3), word frequency lists, 2-grams and Zeta5, confirm that each of Virginia Woolf’s characters has a stylistic voice of its own. Using algorithms to analyse literary works also raises an interesting problem for literary critics: identifying which questions are computationally tractable and which are not, and what the boundary between them signifies.

Fig. 3 - Tf-idf and Character Individualization
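The tf-idf weighting mentioned above can be sketched in a few lines of plain Python by treating all the speech of one character as a single document. The speaker labels and token lists below are invented placeholders, not text from the novel, and the particular tf-idf variant (relative term frequency times the natural log of inverse document frequency) is one common choice, not necessarily the one used in the talk.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: dict mapping a speaker name to a list of tokens.

    Returns a dict mapping each speaker to {term: tf-idf score}.
    """
    n_docs = len(docs)
    # Document frequency: in how many speakers' texts each term occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical placeholder monologues, one "document" per speaker.
toy_docs = {
    "speaker_a": "i see a ring i see".split(),
    "speaker_b": "i hear a sound i hear".split(),
    "speaker_c": "i see a globe".split(),
}
toy_scores = tf_idf(toy_docs)
for speaker, terms in toy_scores.items():
    top = sorted(terms, key=terms.get, reverse=True)[:2]
    print(speaker, top)
```

Words shared by every speaker score zero, while words concentrated in one speaker's text rise to the top. This is why the measure is a candidate for detecting individualized voices.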

These applications of authorship attribution techniques show that some of the questions a literary scholar encounters can be answered with the help of computational methods. As new questions keep arising, such tools are likely to remain valuable.

1. Patterson, James. (2012) Life’s Work: James Patterson. Harvard Business Review. Web.
2. Deighton, John. (2006) Marketing James Patterson. Case Study. Boston: Harvard Business Publishing. Web. 24 August 2013. p. 4.
3. Fièvre, P., ed. (2007-2013). Théâtre classique,
4. Craig, H., and A. Kinney. (2010). Shakespeare, Computers, and the Mystery of Authorship. Cambridge UP.
5. This method measures consistency of use rather than frequency.

Paper Session 4, 10 July 2014
Chair: Jan Rybicki

Beyond style: literary capitalism and the publishing industry
James O’Sullivan (Cork, PA, USA) and Simon Fuller (Maynooth, Ireland)

Progress through Regression. Modeling Style across Genre in French Classical Theater
Christof Schöch (Würzburg, Germany) and Allen Riddell (Dartmouth, NH, USA)

Making Waves: Algorithmic Criticism Revisited
David L. Hoover (New York, NY, USA)

Digital Humanities Conference 2014
Organized by
Alliance of Digital Humanities Organizations


Type of report