Text mining of Native American Speeches with Voyant

Analysis of Short Well-known Speeches by Native Americans using Voyant

My students have to recite these speeches when I teach Voice and Diction, so I have copies on my desktop.[1] These are not long speeches. They vary in length from 75 to 100 seconds, or two or three paragraphs.[2] The speeches are all about how badly the White Man has treated the speakers’ people. Most are defiant speeches, calling for resistance. A few are speeches of peace or surrender, and one in particular is a cry of pain (Standing Bear). I thought these would be interesting speeches to analyze.

At first, I decided to do a test run with three of them, so I copied and pasted three of the speeches into the text box.  This did not provide the results I wanted because Voyant treated them as one document, not as three short ones. This was my fault. I saw that Voyant let users upload individual files, but I didn’t do it.

Starting over, I uploaded all eleven speeches as Word documents. That worked. It gave me an analysis of all the speeches, as well as information about the individual speeches. Overall, this corpus has 2,180 total words and 757 unique word forms. The longest speech was Sitting Bull’s, with 301 words. The shortest was Chikataubat’s at 153.[3] Interestingly, although Chikataubat’s speech was the shortest, it had the most words per sentence (30.6) and the highest vocabulary density (0.758).[4][5]

Voyant did not calculate the overall vocabulary density of the corpus, but based on its formula, it’s 0.347. That seems low, but, at the same time, most of these speeches are calls to action, and the arguments presented in speeches like these might be more straightforward.
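The formula in note [4] is simple enough to check by hand. Here is a minimal Python sketch, assuming a naive tokenizer (lowercased runs of letters and apostrophes) that may not match how Voyant actually splits words. The function names are made up for illustration.

```python
import re

def tokenize(text):
    # Assumption: a word is a run of letters or apostrophes, lowercased.
    # Voyant's own tokenizer may split text differently.
    return re.findall(r"[a-z']+", text.lower())

def vocabulary_density(unique_words, total_words):
    # Note [4]: unique words / total number of words.
    return unique_words / total_words

def density_of_text(text):
    tokens = tokenize(text)
    return vocabulary_density(len(set(tokens)), len(tokens))

# Whole-corpus figures reported above: 757 unique word forms, 2,180 words.
print(round(vocabulary_density(757, 2180), 3))  # 0.347
```

The first function is only needed when starting from raw text; with the totals Voyant reports, the division alone is enough.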

The most common words were “people” (15 times), “man” (12), “shall” (9), “white” (9), and “away” (8). Voyant also provided the most common words in the individual pieces. For instance, “tells” occurs four times in Osceola’s speech and “neighbors” occurs three times in Sitting Bull’s speech.
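Counts like these come down to tallying words after dropping very common function words. Here is a rough Python sketch, assuming a tiny hand-picked stopword list and a placeholder sentence rather than any of the actual speeches. Voyant’s own stopword list and tokenizer will give different numbers.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "our", "we", "have"}

def most_common_words(text, n=5):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Placeholder text, not a quotation from any speech.
sample = "The people are strong. The people shall resist, and the people shall stay."
print(most_common_words(sample, 2))  # [('people', 3), ('shall', 2)]
```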

Considering that these are all speeches that arose out of conflict, the most common words are unsurprising: these words are used to call out to the community, and to tell them not just to resist, but whom to resist. The frequency of “tells” in Osceola’s speech is not surprising. Osceola is calling on the Seminole to resist relocation to what is now Oklahoma. “Neighbors” in Sitting Bull’s speech refers to White Americans who are continually encroaching on Dakota territory.

I noticed that Voyant didn’t make links between related words: “died” and “dead” are treated as separate words, and they aren’t linked in any way in the analysis.

Sometimes, though, the link is more subtle: Sitting Bull’s “neighbors” clearly refers to white people, just as “white” does, but Voyant doesn’t link them either, which isn’t really a surprise.
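One way to get around this outside Voyant is to group word forms by hand before counting. The sketch below assumes a small, hypothetical mapping made up for illustration. The groupings are editorial choices, not something Voyant or any library supplies.

```python
import re
from collections import Counter

# Hypothetical hand-built groups: each word form maps to a shared label.
CONCEPTS = {
    "died": "DEATH",
    "dead": "DEATH",
    "white": "WHITE PEOPLE",
    "neighbors": "WHITE PEOPLE",
}

def count_concepts(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    # Words without an entry in CONCEPTS are counted under their own form.
    return Counter(CONCEPTS.get(t, t) for t in tokens)

# Placeholder text, not a quotation from any speech.
sample = "Many died on the road. The dead were left behind."
print(count_concepts(sample)["DEATH"])  # 2
```

A mapping like this depends on context. Treating “neighbors” as a reference to white people fits Sitting Bull’s speech, but it would need to be checked speech by speech before being applied to the whole corpus.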

Voyant has a function called “links”, which showed the three most common words in the corpus and the words “in close proximity” to them. It also has a “context” function, where you can click on a word, and it will show you all the sentences that word appears in. It also marks which speeches those sentences come from.
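Both ideas are easy to approximate. Here is a minimal Python sketch, assuming that sentences end at a period, exclamation point, or question mark, and that “close proximity” means within a fixed window of words. The window size, function names, and sample documents are illustrative assumptions, and Voyant’s actual behavior may differ.

```python
import re
from collections import Counter

def contexts(documents, word):
    # documents: dict mapping a title to its text.
    # Returns (title, sentence) pairs for every sentence containing `word`.
    hits = []
    for title, text in documents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if word in re.findall(r"[a-z']+", sentence.lower()):
                hits.append((title, sentence))
    return hits

def nearby_words(text, word, window=2):
    # Counts words appearing within `window` positions of `word`.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == word:
            neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbors)
    return counts

# Placeholder documents, not quotations from the speeches.
docs = {
    "Speech A": "Our people are brave. We shall not leave.",
    "Speech B": "The people shall stay. The land is ours.",
}
for title, sentence in contexts(docs, "people"):
    print(title, "|", sentence)
print(nearby_words(docs["Speech B"], "people", window=1))
```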

Next, I decided to split the works up into categories to see what, if anything, changed. I chose to focus on word count, most common words, and vocabulary density for this.

First, I divided the speeches into two groups: those that were given before the Civil War, and those given after. The pre-Civil War speeches were by Metacom, Chikataubat, James Logan, Pushmataha, Wabashaw, and Osceola. The post-Civil War speeches were by Red Cloud, Spotted Tail, Sitting Bull, Chief Joseph, and Standing Bear.
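Dividing a corpus this way amounts to tagging each speech with a group label and summarizing each group. Here is a sketch in Python, assuming the texts are already loaded as strings (the two texts below are placeholders) and using the era assignments listed above. The function names are hypothetical.

```python
import re
from collections import defaultdict

PRE_CIVIL_WAR = {"Metacom", "Chikataubat", "James Logan",
                 "Pushmataha", "Wabashaw", "Osceola"}

def era(speaker):
    return "pre-Civil War" if speaker in PRE_CIVIL_WAR else "post-Civil War"

def summarize_by_group(speeches, group_of):
    # speeches: dict mapping speaker to text; group_of: speaker -> group label.
    grouped = defaultdict(list)
    for speaker, text in speeches.items():
        grouped[group_of(speaker)].extend(re.findall(r"[a-z']+", text.lower()))
    return {
        group: {"total": len(tokens), "unique": len(set(tokens)),
                "density": len(set(tokens)) / len(tokens)}
        for group, tokens in grouped.items()
    }

# Placeholder texts standing in for the real speeches.
speeches = {
    "Osceola": "We will stay. We will not go.",
    "Red Cloud": "They brought soldiers. They brought sickness.",
}
print(summarize_by_group(speeches, era))
```

Passing a different `group_of` function, such as one that maps each speaker to a language family, would reuse the same summary for any other grouping.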

The pre-Civil War speeches had a total of 1,095 words, with 480 unique words. The most common words were “people” (10), “father” (7), “man” (7), “English” (6), “White” (6), and “Logan” (6).

This makes sense: these speeches are all calls to resist, so they’d be making appeals and talking about their enemy. The interesting one is “Logan”. It is one of the most frequently encountered words in the entire corpus, yet it appears only in James Logan’s speech.

The longest speech here is James Logan’s at 202 words. The shortest is Chikataubat’s at 153.

Dividing unique words by total words (480 / 1,095), the overall vocabulary density is 0.438. I’m not sure why. If, as I said above, calls to action tend to be less complicated than other kinds of speeches, the density should be lower, not higher. My initial hypothesis is either wrong or too simplistic.

As above, Chikataubat’s speech is the most dense, at 0.758, while the least dense speech is Osceola’s at 0.456. The longest speech is James Logan’s (202 words), and the shortest is Chikataubat’s (153). This is interesting because they both spend time describing individual situations in their speeches: the murder of Logan’s family at the hands of the Whites, and the desecration of the graves of Chikataubat’s family. The other speeches in this category are more generalized calls to resist.

The post-Civil War speeches (Red Cloud, Spotted Tail, Sitting Bull, Chief Joseph, and Standing Bear) are more of a mix. Chief Joseph is surrendering[6], Standing Bear is asking for help[7], Spotted Tail is saying resistance is futile, and Red Cloud and Sitting Bull are calling for war.

The most frequently appearing words were “shall” (7 appearances), “children” (6), “died” (6), “men” (6), and “things” (6).

Standing Bear’s speech accounted for all the appearances of “died” and four of the appearances of “children”. The most common word in Spotted Tail’s speech is “alas” (3). Both of these make sense. Standing Bear is describing the state of his people: many died on the road to the new reservation, and more died once they got there. Spotted Tail has been defeated, and his speech reflects that. Meanwhile, the most common word in Red Cloud’s speech is “brought”, which appears three times. Again, context matters. Red Cloud is listing the things the White Man has done to his people, so the usage of “brought” makes sense.

The longest speech was Sitting Bull’s, and the shortest was Chief Joseph’s at 161 words. Chief Joseph’s speech, coming after weeks of flight and retreat, may be so short because he was exhausted and demoralized.

In terms of vocabulary density, the densest speech of this set is Spotted Tail’s at 0.655; Standing Bear’s is the least dense at 0.545. I don’t know why. I guess that linking vocabulary density to the theme of a speech doesn’t work, or at least doesn’t work with this corpus.

Finally, I decided to try to analyze these speeches by language family. This was difficult because the speakers’ languages came from five different language families. Only one language family, the Siouan, had more than two representatives. Since it had five (Standing Bear, Spotted Tail, Sitting Bull, Red Cloud, and Wabashaw), I decided to try it to see if there were similarities.

Overall, this corpus contains 1,117 total words and 454 unique words, for a vocabulary density of 0.406.

Sitting Bull’s speech was the longest, at 301 words, while Wabashaw’s speech was the shortest at 193. Wabashaw’s speech had the highest vocabulary density, however, at 0.665 while Standing Bear’s had the lowest at 0.545. Again, the shortest speech is the densest (at least in terms of vocabulary).
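Part of this pattern may come from the formula itself rather than from the speakers: as a text gets longer, common words repeat, so unique words divided by total words tends to fall. The reported totals already point this way, since the full corpus (757 / 2,180, or about 0.347) is less dense than any single speech whose density is given above. Here is a small Python sketch that tracks density over growing prefixes of a text. The sample is a placeholder and the tokenizer is a simplifying assumption.

```python
import re

def running_density(text, step):
    # Vocabulary density of the first `step`, 2 * `step`, ... words of the text.
    tokens = re.findall(r"[a-z']+", text.lower())
    results = []
    for end in range(step, len(tokens) + 1, step):
        prefix = tokens[:end]
        results.append((end, len(set(prefix)) / end))
    return results

# Placeholder text, not a quotation from any speech.
sample = ("we will stay on our land and we will keep our land "
          "and our children will stay on our land")
for words, density in running_density(sample, 5):
    print(words, round(density, 3))
```

If length drives much of the difference, comparing speeches of unequal length by raw density could be misleading. Comparing equal-sized samples (for example, the first 150 words of each speech) would be one way to check.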

The most common words overall were “died” (6), “man” (6), “shall” (6), “things” (6), and “children” (5). We can see Standing Bear’s influence here again, since, as mentioned above, all the occurrences of “died” and four of the occurrences of “children” were from his speech.

“Father” was the most common word in Pushmataha’s speech. Again, in context, this makes sense: Pushmataha was calling his people to war, extolling their bravery in the name of their father.

Overall, I thought this was interesting. I can see how almost all these tools can be useful. I’m not sure about vocabulary density, though: I can see that it has descriptive value. The argument can be made that a speech with higher vocabulary density might be more complex, but I don’t know that I saw that. I’d have to work with longer speeches to see if that bears out.


[1] These speeches are by Chief Joseph of the Nez Perce, Chikataubat of the Massachuset, James Logan of the Cayuga, Metacom (or King Philip) of the Wampanoag, Osceola of the Seminole, Pushmataha of the Choctaw, Red Cloud of the Oglala Lakota, Sitting Bull of the Hunkpapa Lakota, Spotted Tail of the Brulé Lakota, Standing Bear of the Ponca, and Wabashaw of the Dakota.

[2] Because I teach speaking skills, I think of length more as a function of time (how long it takes to recite) rather than word count.

[3] This makes me wonder if I shouldn’t replace this one with something a little longer.

[4] Voyant calculates vocabulary density like this: vocabulary density = unique words / total number of words.

[5] Chikataubat’s speech is essentially five run-on sentences. Maybe that’s why I’ve kept it in. It’s short, but it’s more difficult than the others.

[6] After being forcibly removed from their lands in Oregon to a reservation in Idaho, Chief Joseph and his people fled their lands, in an attempt to get to Canada while being pursued by the U.S. Cavalry. They had to surrender forty miles from the border. Look up his story. It’s worth a read.

[7] This is probably the saddest of the speeches.

2 thoughts on “Text mining of Native American Speeches with Voyant”

  1. Dax Oliver

    I agree with your ambivalence about vocabulary density. It shouldn’t be considered positive or negative. We tend to think of higher density as being “better” (since it’s rewarded in places like the GRE), but it can also be a sign of superficiality and vagueness. With translations, density is always very suspect, because it depends on the vocabulary of the translator.

    It was interesting to see how words change their connotations. I think of “neighbors” as having a generally positive connotation today, but that clearly isn’t the case here. Also “father,” which might sometimes be a reference to “White Father,” which was a euphemism for “white conqueror.”

    1. Sean Patrick Palmer Post author

      Re: Vocabulary density
      I wonder if this would be different if I compared these to a comparable set of speeches by native English speakers. Of all these speakers, James Logan was probably a native speaker of English, and the two from New England probably spoke the English-based pidgin spoken in the area at the time, but the others were not.

      Then again, it might not.

      Re: Connotations
      In Pushmataha’s speech, “father” specifically refers to the British king, whereas in Osceola’s speech, “Father” is “Great Father”, so God.
