What can data science tell us about literature?

In a paper recently published in Shakespeare Quarterly, IDSS postdoctoral associate Santiago Segarra and a team that included a Shakespearean scholar used a network-based tool to analyze Shakespearean plays, specifically all three parts of Henry VI. (“Attributing the Authorship of the Henry VI Plays by Word Adjacency” – Santiago Segarra, Mark Eisen, Gabriel Egan, Alejandro Ribeiro.) Scholars have long debated whether Shakespeare wrote these three plays alone, and the team used data science methods to help determine authorship.

Word adjacency networks were generated for each candidate author as well as for the three plays of interest. In these networks, the nodes represent “function” words (such as prepositions and articles, which carry little meaning on their own), and the edges encode how often pairs of function words appear near each other. Authorship is then attributed by comparing the networks of the Henry VI plays with those of the candidate authors using information-theoretic measures. This research is rooted in the idea that “when people write, they unconsciously tend to use the same structures,” says Segarra. “The words themselves become fingerprints of how authors write, independent of the topic they are writing about.”
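To make the idea concrete, here is a minimal Python sketch of a word adjacency comparison. It is an illustration under stated assumptions, not the team's implementation: the short function-word list, the five-word window, the smoothing constant, and the choice of an averaged row-wise relative entropy as the information-theoretic measure are all assumptions made for this example, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative list; the study's actual word list is not given here.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "a", "with", "that"]


def adjacency_network(text, window=5):
    """Count how often each function word is followed by another
    function word within the next `window` tokens (window size is assumed)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {word: Counter() for word in FUNCTION_WORDS}
    for i, source in enumerate(tokens):
        if source not in counts:
            continue
        for target in tokens[i + 1 : i + 1 + window]:
            if target in counts:
                counts[source][target] += 1
    return counts


def to_markov_chain(counts, smoothing=1e-3):
    """Normalize each node's outgoing edge counts into probabilities.
    Smoothing (an assumed constant) keeps every probability above zero."""
    chain = {}
    for source in FUNCTION_WORDS:
        total = sum(counts[source].values()) + smoothing * len(FUNCTION_WORDS)
        chain[source] = {
            target: (counts[source][target] + smoothing) / total
            for target in FUNCTION_WORDS
        }
    return chain


def relative_entropy(p, q):
    """Average, over function words, of the Kullback-Leibler divergence
    between matching rows of two chains. Zero means identical chains."""
    total = sum(
        p[s][t] * math.log(p[s][t] / q[s][t])
        for s in FUNCTION_WORDS
        for t in FUNCTION_WORDS
    )
    return total / len(FUNCTION_WORDS)


def attribute(play_text, candidate_texts):
    """Return the candidate whose network is closest to the play's network,
    together with the distance computed for every candidate."""
    play_chain = to_markov_chain(adjacency_network(play_text))
    distances = {
        author: relative_entropy(
            play_chain, to_markov_chain(adjacency_network(text))
        )
        for author, text in candidate_texts.items()
    }
    return min(distances, key=distances.get), distances
```

In this sketch, `attribute` would be called with the text of a play and a dictionary mapping each candidate author's name to a sample of that author's known writing. A smaller distance means the play's pattern of function-word usage looks more like that author's. A realistic analysis would need far longer texts and a much larger function-word list than the toy values used here.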

The analysis reveals that, with high probability, Christopher Marlowe’s hand is present in the three Henry VI plays. Moreover, when these findings are combined with other complementary studies and historical evidence, Marlowe’s collaboration with Shakespeare becomes certain. Indeed, Marlowe is now officially credited as William Shakespeare’s co-author on the three parts of Henry VI.

Segarra’s work at IDSS also focuses on networks and graphs, including work with Professor Richard Nielsen and others who use network-based tools to better understand the proliferation of radical Islam’s ideologies.

Related press coverage:

“New Oxford Shakespeare Edition Credits Christopher Marlowe as a Co-author” – The New York Times

“Big debate about Shakespeare finally settled by big data: Marlowe gets his due” – The Washington Post

“Penn engineers use big data to show Shakespeare had coauthor on ‘Henry VI’ plays” – The Philadelphia Inquirer

“Computer analysis reveals Shakespeare’s collaborators” – New Atlas

