The Henry James Sentence: New Quantitative Approaches

The house had a name and a history; the old gentleman taking his tea would have been delighted to tell you these things: how it had been built under Edward the Sixth, had offered a night’s hospitality to the great...

A Macro-Etymological Analysis of The Canterbury Tales

Chaucer's Canterbury Tales exhibits one of the richest vocabularies of Middle English literature, a vocabulary that reveals influences from a number of native and foreign languages: Old English, French, Latin, Greek, and Hebrew, among others. While some of this foreign...

Probabilistic Detection of Character Voices in Fiction

In James Joyce's novel Ulysses, the school headmaster Mr. Deasy quotes Shakespeare in a lecture on financial responsibility to his employee Stephen Dedalus. “[W]hat does Shakespeare say?” he asks, “Put but money in thy purse” (Joyce 1986, 25). As Stephen...

A Generator of Socratic Dialogues

In the influential 1948 paper “A Mathematical Theory of Communication,” the mathematician Claude Shannon conducts a thought experiment to construct an algorithmic approximation of language. The algorithm can be described like this: Choose a book at random from your bookshelf,...

Chapterize: a Tool for Automatically Splitting Electronic Texts into Chapters

If you do computational analyses of books and need to break up a book’s text file into its constituent chapters, I’ve just released a tool that you might find useful. It’s called chapterize, and it breaks a book into chapters....