Grad School in Wonderland

This assignment, as much as this semester, has felt somewhat like an Alice in Wonderland experience. Weightlessly bumping up and down in slow motion I am still slightly confused while gazing into this new universe with wide eyed awe.

As Alice stands bewildered in the forest deciding upon the right path to take, so have also I been confused at where to start and exploring text mining has been yet another trip to wonderland.



For the project I chose to use a scene from “Antony and Cleopatra” by William Shakespeare. The play revolves around the romance between a Roman General and the Queen of Egypt set against the politics of territorial gain and political power. I used scene xv in the fourth act which is a short but significant scene where Antony dies whereupon Cleopatra decides to beat the Romans to it and “make death proud to take her”.


The text was sampled from the Gutenberg Project and was chosen based on a desire to get a sense of how text mining works and how useful it could be in the context of analyzing a play for research purposes. For an actor studying for a part text mining does not seem to be of much use as actors try to connect with the feeling of words rather than frequency but for scholars researching Shakespeare it’s a tool that could prove practical.


Initially the text was analyzed in Voyant with ambiguous results. The most frequent words came up as Antony, Cleopatra, Charmian, come and women. The program, however, wasn’t able to detect the difference between character names and other frequently used words such as “come”. The word “Antony” appears 17 times, but no further categorization is made differentiating how the word is used. When manually searching through the text it appears that seven of those times indicates a line spoken, seven refers to his name being uttered, and three incidents refers to the name being used in stage directions, demonstrating that Voyant does not differentiate between character name, spoken name and name in stage directions. So for instance used as a tool searching for detailed information regarding how many times a character speaks the tool can produce misleading information while it can produce an overview as to what a scene is about in singling out the most common words describing which actions take place in the scene as shown below where the words death, dying and dead pretty much sums it up.

In order to obtain a more nuanced filtering of words and character names I tried different text mining tools without success. The tools tested, such as “Bookworm” and especially “Orange” seemed to be great tools, but I was not able to operate them properly, thus not being able to utilize their potential.


As for challenges I have come across many. Coming from a non-tech background means starting from scratch, trying to figure out the basics, covering everything from understanding concepts to converting files into different formats. I might have been going into this new field being slightly naïve in thinking that technology has become so advanced, yet simple and intuitive that computer programs can be navigated with ease. So far I can say that technology holds the potential for some adventurous trips, there’s just a lot of planning and packing needed to get done before one can set off.