Stephen Ramsay’s chapter “Algorithmic Criticism” (from Reading Machines, anthologized in A Companion to Digital Literary Studies) describes conducting a tf-idf analysis of the characters’ speech in Virginia Woolf’s novel The Waves. The Waves is probably the ideal novel for computationally studying fictional speech, since the characters’ speech is all in the form:
“A short clause,” said Speaker, “a continuation of the sentence.”
“I see a ring,” said Bernard, “hanging above me. It quivers and hangs in a loop of light.”
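For readers new to tf-idf: it weights each word by how often a speaker uses it, discounted by how many speakers use it at all, so words distinctive to one character score highest. Below is a minimal Python sketch of one common variant (length-normalized term frequency times the natural log of N/df), treating each character's collected speech as one document. Ramsay's own weighting and tokenization may differ, and the function names are only illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters and apostrophes as words."""
    return re.findall(r"[a-z'’]+", text.lower())


def tf_idf(speeches):
    """speeches: dict mapping a speaker's name to all of that speaker's dialog.

    Returns a dict mapping each speaker to {term: weight}, where
    weight = (count of term / speaker's total words) * ln(N / df),
    N is the number of speakers and df the number of speakers who use the term.
    """
    counts = {who: Counter(tokenize(text)) for who, text in speeches.items()}
    n_docs = len(counts)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # each speaker counts once per term
    scores = {}
    for who, c in counts.items():
        total = sum(c.values())
        scores[who] = {
            term: (freq / total) * math.log(n_docs / df[term])
            for term, freq in c.items()
        }
    return scores


def top_terms(scores, who, k=10):
    """The k highest-weighted terms for one speaker, best first."""
    return sorted(scores[who].items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Given a dict mapping each of the six speakers to their collected speech, top_terms(tf_idf(speeches), "Bernard") would list the words most characteristic of Bernard relative to the others. A word every speaker uses gets a weight of zero, since ln(N/N) = 0.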
Chris Forster describes a process for extracting the characters’ speech using some manual edits and some command-line-fu, but bemoans the lack of a TEI edition, which would simplify this process. I took that as a challenge, threw together a quick-and-dirty TEI version of The Waves as an exercise, and used an XSL transformation to color-code the characters’ dialog.
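How that TEI version was built isn't spelled out here, so as a rough illustration only: the regular-expression sketch below turns one paragraph in the “…,” said Name, “…” form into a TEI-style said element carrying a who attribute of the form #Name (the attribute the BeautifulSoup query in this post searches on). The choice of the said element, the regex, and the to_tei name are assumptions, not a description of the actual edition. Paragraphs that don't fit the pattern, such as the novel's interludes, come back as None and would need encoding by hand.

```python
import re
from xml.sax.saxutils import escape

# “First part,” said Name, “optional continuation.”
SAID = re.compile(
    r"“(?P<first>[^”]*)” said (?P<who>[A-Z][a-z]+)"
    r"(?:, “(?P<rest>[^”]*)”)?"
)


def to_tei(paragraph):
    """Convert one “…,” said Name, “…” paragraph to a TEI-style <said> element.

    Returns None when the paragraph does not fit the pattern
    (e.g. narration), so it can be encoded by hand instead.
    """
    m = SAID.match(paragraph.strip())
    if m is None:
        return None
    text = m.group("first")
    if m.group("rest"):
        text += " " + m.group("rest")
    # escape() protects &, < and > inside the speech text
    return '<said who="#{}">{}</said>'.format(m.group("who"), escape(text))
```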
From here, there are a number of ways to extract the dialog of a particular character or group of characters. One easy way is to find the character’s CSS rule (e.g. p.Bernard) in the web inspector of Firefox or Chrome and add display: none; to that rule.
If you do this for all the male characters, only the female characters’ dialog will display. You can also edit the XSLT to display dialog selectively, or extract it with an XML parser such as the BeautifulSoup Python module:
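    from bs4 import BeautifulSoup
    tei = open('waves-tei.xml', encoding='utf-8').read()  # read the file
    soup = BeautifulSoup(tei, 'xml')  # parse as XML (requires lxml)
    soup.find_all(who='#Bernard')  # all elements with who="#Bernard"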
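To pull out the dialog of a whole group of characters rather than a single speaker, here is a sketch using the standard library's xml.etree.ElementTree, swapped in for BeautifulSoup so it runs without extra installs. Like the query above, it assumes each speech is an element whose who attribute is #Name; the MALE and FEMALE groupings and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative groupings of the six speaking characters of The Waves.
MALE = ("Bernard", "Louis", "Neville")
FEMALE = ("Susan", "Jinny", "Rhoda")


def dialog_by_speaker(root, speakers):
    """Map each requested speaker to the texts of all elements whose
    who attribute is exactly "#<speaker>", in document order."""
    wanted = {"#" + name: name for name in speakers}
    result = {name: [] for name in speakers}
    for el in root.iter():
        name = wanted.get(el.get("who"))
        if name is not None:
            # itertext() also collects text inside nested elements;
            # split/join collapses line breaks and runs of whitespace.
            result[name].append(" ".join("".join(el.itertext()).split()))
    return result


def group_text(root, speakers):
    """All dialog by any member of the group, speaker by speaker, as one string."""
    by_speaker = dialog_by_speaker(root, speakers)
    return " ".join(text for name in speakers for text in by_speaker[name])
```

With root = ET.parse("waves-tei.xml").getroot(), group_text(root, FEMALE) would return the women's dialog as a single string, which could then serve as one document in a tf-idf comparison against group_text(root, MALE).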
Future versions of the webpage will, I hope, provide buttons that let visitors turn a character’s dialog on or off, or turn off the color-coding, as I attempted to do with the commentary and metadata in my TEI edition of The Art of Good Behaviour.
Disclaimer: this text is an experiment in textual analysis.