
Some Resources with Indigenous Maps

Today is Indigenous People’s Day, so I thought I’d look around for some indigenous mapping projects, which certainly count as DH. I found a couple resources that I wanted to share here.

Native Land seems to be a well-known resource. The first thing a user of the website sees is a disclaimer:

This map does not represent or intend to represent official or legal boundaries of any Indigenous nations. To learn about definitive boundaries, contact the nations in question.

Also, this map is not perfect — it is a work in progress with tons of contributions from the community. Please send us fixes if you find errors.

If you would like to read more about the ideas behind Native Land or where we are going, check out the blog. You can also see the roadmap.

…So this may not be an accurate source of information, and it's definitely a work in progress. According to its "About" page, the website is run by Victor G. Temprano, who is not himself Native. However, I do think that being upfront about the potential flaws in the map is a good move. Additionally, the map links to a page about using its information critically. In particular, this page deals with some of the difficulties of using the colonial format of a map to illustrate overlapping indigenous territories. Interestingly, the map also doesn't address the dimension of time, so we can't see how territories may have changed.

All that said, I really like the immediacy of this map and the way it shows those overlaps. According to the map, the Graduate Center is on Lenape land, and the Delaware and Montauk languages were spoken here. The map also includes links to information about these languages and to the websites of the nations/tribes (both words seem to be used in the links).

In any case, the other interesting resource I came across was this Indigenous Mapping Workshop, which provides "geospatial training and capacity building to bring culturally relevant and appropriate earth observation technologies to support Indigenous mapping." The workshop has been offered annually since 2014. I poked around the website but didn't see links to any of the projects people have created in it. However, it reminds me of Miriam Posner's piece, "What's Next: The Radical, Unrealized Potential of Digital Humanities." In that piece, she critiques the way that many DH projects have built on existing, colonialist infrastructure. I'm interested in how the work done in this workshop breaks free of that.

I’m & love: an analysis of the lyrics of selected David Bowie albums

For this assignment, I selected the lyrics from five albums of David Bowie's corpus. Since this was an unscientific review of his work, the simple parameter was to choose an album from each decade in which he published. I started with Space Oddity, looked at 1983's Let's Dance, moved to Outside, then Reality, and finally, Blackstar. All the lyrics were taken from the website AZLyrics.com. Co-authored lyrics were not excluded, but should be for future studies.

I started by exploring each album and looking at the highest-frequency words from each. Space Oddity is the earliest album I examined; its highest-frequency word is "I'm," used 27 times. The next closest word is "want," at 19 times.

Moving forward to the 1983 album, Let's Dance, we see the word "long" used 44 times, mostly in the song "Cat People." The next album is 1995's Outside. This album seems to deviate from the others in that there are no standout words, with the highest-frequency word being "it's" at 28, followed by "filthy," "heart's," and "lesson," all tied at 24. Then, in the 2003 album Reality, we see a return not only to standout words but also to "I'm" as the leading term, used 64 times on that album. Finally, on his last album, Blackstar, "I'm" is used most frequently, at 71 times.

After looking at the most frequently used words, I wanted to see what they were linked to, and found that in Blackstar "I'm" is linked to the words "dying," "man," "blackstar," and "trying." As we know, Blackstar is Bowie's last album and the one he worked on after he was diagnosed with liver cancer.

Diagram of Blackstar.

Combining all the texts, I find that "I'm" is used 193 times, which is almost 3 times more than the next term, "it's," at 68 times, or "love" at 66 times. While "love" is not a consistent standout word, we can see by the trend line that it is used with some regularity. I didn't use stopwords, but I would probably add "it's" to that list; that needs to be more closely examined.
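The kind of frequency count Voyant produced here can be sketched in a few lines of Python. The lyric snippet and stopword list below are placeholders for illustration, not the actual corpus:

```python
from collections import Counter
import re

def top_terms(text, stopwords=frozenset(), n=5):
    """Lowercase the text, keep apostrophes (so "I'm" survives as a term),
    and count word frequencies, skipping anything in the stopword list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords).most_common(n)

# Placeholder lyrics for illustration only.
lyrics = "I'm a blackstar, I'm a blackstar, I'm over and it's done"
print(top_terms(lyrics, stopwords={"a", "and"}, n=2))  # [("i'm", 3), ("blackstar", 2)]
```

Keeping apostrophes in the token pattern matters here, since contractions like "I'm" and "it's" were the standout terms.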

And when we look at the final links diagram, we see that the three terms, “I’m,” “love,” and “it’s” are not related to each other at higher levels, but

Corpus links diagram level 3

even when we go several levels of word connections deep, it is hard to tie the terms together.

Corpus links diagram level 7

What could it mean that the terms are not correlated? Without thoroughly examining the entire body of Bowie's work (which is beyond the scope of this exercise), it's difficult to draw conclusions. It could suggest that his lyrics embody different concepts, and that those concepts don't overlap because they are embedded in each of his musical characters, such as Major Tom, the Thin White Duke, and his final character, Blackstar.

Voyant is an easy-to-use tool. Before Voyant, I tried to use MALLET and have written something about that too, which I will publish at a later time. The biggest problem I've had is getting the images to show up. I'm not sure what I'm doing wrong, but I'll have to pay a visit to the fellows, as I was about to lose my mind.

Link

I started by selecting the transcripts from the Kavanaugh-Ford Senate Hearing. I focused on the opening statements of Brett Kavanaugh and Dr. Christine Blasey Ford since it was an uninterrupted summary of what each had to say. I wasn’t sure what to expect from this assignment, given this was my first time using Voyant and having chosen a highly charged subject matter, but I was not disappointed.

Fig. 1 Kavanaugh    

Fig. 2 Ford

Figure one shows links between Brett Kavanaugh's five most frequently used words (blue) and those associated with them (orange). I highlighted Kavanaugh's second most frequently used word, "school," to demonstrate the interactive aspect of Voyant. Highlighting "school" also narrowed down the other charts to show detail for the selected term (fig. 3).

Fig. 3 Kavanaugh

Fig. 4 Ford

Additionally, I was struck by Kavanaugh’s frequent use of the word, “women”, and was quickly able to explore why with Voyant.

Fig. 5

In figure five, I was able to select “women” from the terms tab and view the context in several different ways.

I've avoided adding my interpretation to this blog post because I think Voyant is a powerful enough tool to speak for itself. I strongly suggest that everyone interested in this topic take some time to explore the transcripts themselves and share interesting finds that I may have missed. You can find the opening transcripts for Kavanaugh here and Ford here.

Data for Mapping Workshop

This workshop provided a definition and general overview of GIS (Geographic Information Systems), presented by Javier Otero Peña and Olivia Ildefonso, two of the GC’s Digital Fellows with expertise in this area.  Their presentation was very well-organized, and they both provided examples and useful tips along the way.

GIS tools enable one to manipulate and represent data spatially on maps. While these tools can be complex and very powerful, Javier and Olivia provided an introduction to the way maps are organized (vector or raster layers). Vector layers contain data from files, which can be in many different formats. It is these data layers that enrich a map by providing a spatial representation of the data in visual form (as opposed to reading a table with rows and columns).

I’ve done a few assignments using a GIS tool this past summer– and while those projects were focused on how to use the tool, there was little discussion on how one gets the data (it was already provided in the exercises).  I remember struggling when searching for data to add to my map:  how did I know which database was reliable, what format to use, how to search for the correct fields?  These were the questions I had, and I did my best to muddle through it.

I appreciated that this workshop focused on how to search for different data sets and load them into a GIS mapping program. The whole point of using GIS is to marry the data with the map, and I suspect that this critical step is often not touched on in GIS tutorials.

For this session, the intention was to walk the group through a mapping exercise using Carto, an open-source mapping program.  There was an unanticipated change in the software, so we were unable to open accounts and log into Carto.  No matter, as we were able to focus on the main point of the session.

Given that the subject of this workshop was locating the data and importing it into the program, we were able to focus on that (and not be distracted with creating a map at this point). I thought that both Javier and Olivia did a great job of walking us through each step and offering tips and strategies for saving files, naming fields, etc. We searched the US Census data, chose a table, narrowed down the fields we needed, and saved the file. Then we searched for a shapefile for the census tracts, and "joined" the information from the table with the shapefile (using a common field, in this case 'state').
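That "join" step, matching census rows to shapefile features on a shared field, is an attribute join. A minimal sketch in plain Python (the field names and values are invented for illustration; a GIS program does the same matching internally):

```python
# Sketch of an attribute join: census table rows matched to shapefile
# features on a common 'state' field. All data here is illustrative.
census_rows = [
    {"state": "NY", "median_income": 68486},
    {"state": "NJ", "median_income": 80088},
]
shapefile_features = [
    {"state": "NY", "geometry": "<polygon>"},
    {"state": "NJ", "geometry": "<polygon>"},
]

# Index the table by the join key, then attach its attributes to each feature.
by_state = {row["state"]: row for row in census_rows}
joined = [{**feat, **by_state.get(feat["state"], {})} for feat in shapefile_features]

print(joined[0]["median_income"])  # the NY feature now carries census data
```

The key point the workshop stressed survives even in this toy version: the join only works if both files share a cleanly named common field.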

The slides were very clear, and Javier emailed the Powerpoint slides to us afterwards – which now serve as a mini-tutorial for us to replicate on our own.

Javier and Olivia both knew their stuff and were very effective at tailoring their presentation to the group's level. I thought this was just enough for an intro to the topic, and I'm definitely interested in a follow-up that delves deeper into finding and evaluating sources of data.

Text Mining: Three Articles About Facebook

For the text mining praxis assignment, I decided to text mine three articles about Facebook. I had found these articles for an assignment in another class, “Media Literacy,” in which I had to select three clips to see how Facebook is portrayed in the news. These are the three articles I picked:

That same week, Facebook had a huge security breach that affected 50 million users, so needless to say… the coverage wasn't exactly good. (Then again, when was the last time Facebook's portrayal in the news was good?) All three articles I picked came in light of Facebook's security breach, so the breach was likely referenced in some way.

My conclusion from this little "scavenger hunt" for Media Literacy was essentially that Facebook has a very negative light cast on it in its coverage, and that the abundance of negative coverage makes it difficult for readers to know, at any specific moment, what exactly is going on with Facebook. With this text mining exercise, however, I wanted to see if there were any commonalities between the three articles I had chosen, as well as what may have been distinct about the topics each article covered.

Obviously, “Facebook” was the most common term in all of the articles…

Many of the other common terms, however, were very distinct to a particular article. For instance, you can see the term “stories” is rather big in this Cirrus view — that term only appeared in the article “Facebook Is Cannibalizing Itself.” Upon further investigation (in other words, simply looking back at that article), the piece focuses on Facebook Stories and how they compare to similar features on other social media platforms, so it’d make sense the term “stories” appears specifically for that article. “Sorry,” another big term on the Cirrus view, obviously applies to the article “Sorry, not sorry.”

In terms of what all articles had in common… not much. The terms “data” and “users” appear in all articles, but there were a few different contexts in which these terms were used. (Note: “user” in singular form is also a term on the Cirrus view, but it appears in only two of the three articles.) “Data” was only referenced once in the article “How Facebook Was Hacked And Why It’s A Disaster For Internet Security,” and it seemed to only be used as an afterthought. It came in the last sentence:

“They almost certainly DO do a better job securing sensitive data than a zillion small sites would. But when they get breached, it’s a catastrophe of ecological proportion.”

It surprised me that the term "data" would be used only once in this article, since in my mind the terms "data" and "security" go hand in hand. On another note, you see the term "privacy" in the Cirrus view as well, right? That term was distinct to the article "Sorry, not sorry," another surprise to me.

I'll end with some context about how the term "users" was used in each article. The obvious use is when referencing the number of users on the site, which was certainly the case.

“Next came the Cambridge Analytica news — a massive data privacy scandal that affected 87 million Facebook users.” (“Sorry, not sorry”)

“Facebook (NASDAQ:FB) now has 300 million daily active users (DAUs) on Messenger Stories and Facebook Stories.” (“Facebook Is Cannibalizing Itself”)

“Facebook dropped a bombshell on Friday when it revealed an unknown hacker had breached the site, compromising the accounts of 50 million users.” (“How Facebook Was Hacked And Why It’s A Disaster For Internet Security”)

The rest of the context of the term “users” in these articles pertained to these users taking specific actions on the platform. For example, from “Sorry, not sorry”:

“Gizmodo reported this week that Facebook allows its advertising partners to target a Facebook user by their phone number — where users gave Facebook that phone number for the implicit purpose of enabling 2FA account security.”

There wasn't much that was similar between the three articles' use of the term "users," which is a reminder of how important context is when text mining. This text mining may have yielded rather uninteresting results, but I was excited to get my feet wet with Voyant, and I've realized that there's much more work to be done than just recognizing common words used across a set of texts.
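The check I did by hand, seeing which terms appear in all three articles, amounts to a set intersection. A small sketch, with short snippets standing in for the real article texts:

```python
import re

def shared_terms(documents):
    """Return the set of words that occur in every document."""
    vocabularies = [set(re.findall(r"[a-z]+", doc.lower())) for doc in documents]
    return set.intersection(*vocabularies)

articles = [  # stand-in snippets, not the real articles
    "Facebook users saw their data exposed",
    "Sorry, not sorry: Facebook users and data privacy",
    "How Facebook was hacked: 50 million users, sensitive data",
]
print(sorted(shared_terms(articles)))  # ['data', 'facebook', 'users']
```

Of course, as the post notes, knowing that "users" appears everywhere says nothing about the different contexts in which it appears; that still takes reading.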

Text Mining – The Rap Songs of the Syrian Revolution/War

The purpose of this text mining assignment is to understand the main recurrent themes, phrases and terms in the rap songs of the Syrian revolution/war (originally in Arabic) and their relation (if any) to the overall unfolding of the Syrian war events, battles and displacement. In what follows, I will highlight the main findings and limitations of the tool for this case study.

The rap songs can be found on The Creative Memory of the Syrian Revolution, an online platform aiming to archive Syrian artistic expression (plastic art, poetry, songs, calligraphy, etc.) in the age of the revolution and war. Interestingly, the website also incorporates digital mapping tools to map the locations of demonstrations, battles, and the cities in which or for which songs were composed. It's useful to mention that I've worked for the website's songs & music section since March 2016, and thus translated most of these songs' lyrics. Overall, the songs cover a variety of themes, elucidating the horror of war, evoking the angry echo of death, and expressing aspirations for freedom and peace.

To begin with, I went over the 390 songs archived to pick the translated lyrics of the 32 rap songs stretching from 2011 until this day (take for example, Tetlayt). 

https://creativememory.org/en/archives/142345/tetlayt-tilt/

I then entered the lyrics, from the most recent to the oldest, into Voyant. And here:

fig. 1

fig. 2

Unsurprisingly, the top 4 trends are: people, country, want, revolution (fig. 1 & 2).

The analysis shows that the word "like" comes fourth, though the word mostly appears in a song where the rapper repeats "like [something/someone]" for amplification (fig. 2 & 3).

fig. 3

Next, I looked into when, or at what phase of the revolution/war, some terms were most used. It was revealing to see that the terms "want" and "leave" (fig. 4 & 5) were popular at the beginning of the revolution in 2011, when the leading slogans were "Leave, Leave, oh Bashar" and "the people want to bring down the regime".

fig. 4

fig. 5

fig. 6

On another note, it doesn't seem that Voyant can group the singular and plural forms of the same word (child/children in fig. 6). Or is there a way to group several words together?
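One workaround is to fold plurals into their singulars before uploading the lyrics. A deliberately naive sketch (the counts are made up, and real lemmatization would need a proper NLP library):

```python
from collections import Counter

def fold_plurals(counts):
    """Merge crude plural forms into their singulars. children->child is
    irregular, so it is special-cased; trailing -s is an illustrative
    rule only and would mangle many real English words."""
    merged = Counter()
    for word, n in counts.items():
        if word == "children":                       # irregular plural
            merged["child"] += n
        elif word.endswith("s") and word[:-1] in counts:
            merged[word[:-1]] += n                   # songs -> song, only if singular occurs
        else:
            merged[word] += n
    return merged

counts = Counter({"child": 7, "children": 4, "song": 2, "songs": 3})
print(fold_plurals(counts))  # Counter({'child': 11, 'song': 5})
```

This is exactly the kind of preprocessing a dedicated tool would have to do carefully, which reinforces the post's point that song texts need genre-aware tooling.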

So although the analysis gives good insight into general trends, I would argue that song texts require a tool adaptable to the special characteristics of the genre. After all, music reformulates language in performance, and what may be revealed as a trend in text may very well not be one in the experience of singing and listening. Beyond text, rap songs (any songs, really) play on paralinguistic features such as tones, rhythms, intonations, and pauses, and on musical ones, such as scales, tone systems, rhythmic temporal structures, and musical techniques, none of which a tool like Voyant can capture. I know there is speech-recognition software widely used for transcription, but that's not what I'm interested in. I'm thinking of tools that analyze speech as speech/sound. I'm curious to know what my colleagues who did speech analysis thought of this.

Text Mining, Diversity Mission Statements Across Several Colleges

So this is my first time using Voyant, and I'm pleasantly surprised by how intuitively and easily I was able to make use of some of its cool features. For my assignment, I wanted to reflect on the current and former academic institutions I've had experience with, both professionally and academically.

Throughout my academic journey, I've noticed that the meaning of the term "diversity" varies greatly depending on the needs and values each respective institution embodies. Depending on our individual perspectives, the term can be quite broad, making its application in college settings difficult to track. Going into this project, some guiding questions I pondered were: Would there be differences in the marketed values of diversity between private and public colleges? How would I narrow down the list of colleges? What were my own definitions of diversity, and what aspects are most valued in my own ideal of what diversity means in a college setting?

To begin, I narrowed down my college list to six institutions:
(1) Cuyahoga Community College
(2) Smith College
(3) LaGuardia Community College
(4) Amherst College
(5) CUNY Graduate Center
(6) Hunter College

After feeding the URLs of each college's "Diversity Mission Statement" into Voyant, the first image of "trends" appeared:

To dive in deeper, I made a separate chart of the top 5 terms from each institution's Diversity Statement, which excluded the glaring number of times each institution referenced itself, as seen in the large "Smith" and "Amherst" in the image above. While the number of times each college felt the need to reference itself was intriguing in its own right, I wanted to focus on terminology beyond the institutions' names, so I excluded them from the following charts. Aside from the college names, here are the top 5 terms associated with each college's Diversity Statement:

Reminder: (1) CuyahogaCC (2) Smith (3) LaGuardia CC (4) Amherst (5) CUNY GC (6) Hunter

As shown above, the top 5 trends are Diversity, Student, Academic, College, and Community. While the obvious mention of diversity did not surprise me, the remaining terms did. So I next wanted to compare these top trends with the five terms (values) I thought were most important in identifying what diversity should mean in a college setting:

Values: Inclusion, Equity, Disabilities, Gender, & Race

As demonstrated in the chart above, these 5 terms were mentioned less than half as often as the top trending terms. Personally, it was greatly disheartening to see that Race, which I had perceived to be the most important aspect of diversity, was mentioned the least of the terms I selected.

I also thought it noteworthy to explain why I chose to treat Inclusion and Equity as separate categories. Inclusion can be thought of as being granted permission, or in college terms "acceptance," into an academic setting, while Equity is what it means to be valued in a space without conforming to the standards and values of an institution. It was through this distinction that I found one of the most compelling differences between private and public institutional values. According to the chart above, Inclusion was mentioned most at (4) Amherst and (2) Smith, and Equity most at (1) Cuyahoga CC and (2) Smith. While Equity overlaps at Smith College, Inclusion is mentioned there at a remarkably higher rate, especially in comparison to the remaining public colleges, which barely mention either term.

Another important aspect of the data circles back to Race. While still the most underrepresented term within this category, it was mentioned most at (3) LaGuardia CC, (5) CUNY GC, and (6) Hunter College, in comparison to (4) Amherst and (2) Smith College, where it was almost invisible on the chart.

I could've delved much further into this project, but felt it could easily become overwhelming to distinguish, through Voyant, the demographics of each college and compare how those might be reflected in the terms prioritized in each Diversity Statement. Still, this is an intriguing indicator of how different colleges decide what terminology best encompasses their missions of diversity. In many ways, marketing diversity is a huge advertisement that entices students of all walks of life with an expected experience, rather than a declaration of equality within spaces of higher education. Affirmative action and other policies created in an attempt to equalize higher education can easily be lost in the growing definition of what it means to be a diverse space. I appreciated playing around with Voyant as a sort of "reality check" into how diversity is constantly manipulated in ways that can result in its impact and original meaning being lost in an ever-growing perception of #colorblindness in our nation.

Lastly (if you're still reading at this point), I thought it would be a nice bonus to include this chart Voyant suggested for me:

As titled, these distinctive words are the terms outside the corpus-wide trends that were mentioned most in each individual document. What can we continue to draw from these valued terms in each college's mission? And how do our own perceptions affect how we decide to gather data with these kinds of text mining tools?

https://voyant-tools.org/?corpus=31ea7d1a0f48ff91307440084261a51a&panels=cirrus,reader,documentterms,summary,contexts

From allegation to cloture: text mining US Senators’ formal statements on Kavanaugh

# overview

For this project I examined Senators' formal public statements on the Kavanaugh nomination in the wake of Dr. Christine Blasey Ford's allegation that he attempted to rape her as a teenager. I edited this out initially, but am including it now: this project is an attempt to do something productive with how sick I feel at how hostile American culture remains toward women, our sexuality, and our safety.

## process

I built my corpus for analysis by visiting every one of the 99* official (and incredibly banal) US Senator websites and searching the term "Kavanaugh" using each site's search function. I reviewed the first 20 search results** on each website and harvested the first result(s) (up to three) that met my criteria: direct, formal press-release statements about Kavanaugh issued on or after September 15, 2018, up until the time of my data collection, which took place from 5pm to 10pm EST on October 5, 2018. Some Senators had few or no formal statements in that period. I did not include any speeches, videos, news articles or shows, or op-eds; only formal statements, including officially issued press-release comments. For statements that included quoted text and text outside the quote area, I included only the quote area.

I have publicly posted all of my data and results.

My working list of Senators and their official websites is from an XML file I downloaded from the United States Senate website.

I opened the XML file in Excel and removed information not relevant to my text mining project, such as each Senate member’s office address. I kept each member’s last name, first name, state represented, party affiliation, and official webpage URL. This is my master list, posted to Google Sheets here.

I created a second sheet for the statements. It contains the Senators’ last name along with date, title and content of the statement. I did a search for quote marks and effectively removed most or all of them. This statement content data is available in a Google Sheet here.

I joined the two sheets in Tableau (an outer join, to accommodate future work I may do with this) and used Tableau's filtering capabilities to get plain-text files separating out the Democrat statements, Republican statements, and Independent statements, along with a fourth file consolidating all statements. The plan was to perform topic modeling on each and compare.

### in the mangle

Mallet wasn't too hard to install following these instructions. I input (inputted?) my consolidated Democrat, Republican, and Independent statements and had it output a joined Mallet file with stopwords removed. Then I ran the train-topics command, and here I really don't know what I was doing other than closely following the instructions. It worked? It made the 3 files it was supposed to make: two text files and a compressed .gz file. I have no idea what to do with any of them. Honestly, this is over my head, and the explanations on the Mallet site presuppose more familiarity with topic modeling than I have. Here is a link to the inputs I fed Mallet and what it gave back to me.

#### discussion

At this point I’m frustrated with Mallet and my ignorance thereof (and, in the spirit of showing obstacles along the way, I’m cranky from operating without full use of my right arm which was injured a few days ago). I’d like to know more about topic modeling, but I’d like the learning process to be at least somewhat guided by an actual in-person person who knows what they’re doing. The readings this week are not adequate as sole preparation or context for trying to execute topic modeling or text mining, and my supplemental research didn’t make a significant difference.

I like my topic and corpus. Something I found interesting while collecting my data is that not all Senators issued formal press-release statements on Kavanaugh during the period I examined. I was surprised by some who didn't. Kamala Harris, Elizabeth Warren, and Kirsten Gillibrand issued no formal statements referencing Kavanaugh between September 15th and the date of writing (October 5th), whereas Lindsey Graham issued four. This is not to say the former Senators were silent on the topic, just that they did not choose to issue formal statements. Somewhat alarmingly, searching for "Kavanaugh" on Chuck Schumer's site returned no results at all. Thinking this was an error, I manually reviewed his press-release section going back to September 15th. Indeed, though Schumer issued very many press releases during that period, Kavanaugh was not mentioned a single time in the title of any.

And here’s where I need collaborators, perhaps a political scientist and/or public relations expert who could contextualize the role that formal statements play in politics and why different Senators make different choices about issuing them.

There were other interesting findings as well. The search functions on the websites I visited were all over the map. Many had terrible indexing, returning the same result over and over in the list. Cory Booker's website returned 2,080 results for "Kavanaugh"; Dianne Feinstein's site returned 6. The majority of Senators did engage with the Kavanaugh nomination through the vehicle of formal statements. Only ten Senators' websites either lacked a search function entirely or returned zero results for Kavanaugh.

I will likely run the data I gathered through Voyant or perform a different analysis tomorrow. If so, I will update this post accordingly.

##### update 10/7

I wonder if I should be feeding Mallet the statements individually, rather than in consolidated text files grouped by party affiliation. I also realized I wanted these as individual files, rather than as cells in a CSV, so that I could feed them into Voyant and compare statements. I don't know how to write macros in Excel, but this seemed like a great application for a Python script. I've been trying to learn Python, so I decided to write a script to import a CSV and export parts of the individual records as individual text files.

I wrote some Python code and got it working (with an assist from Reddit when an extraneous variable was tripping me up, and suggestions from Patrick Smyth on how I could improve a future iteration). I've posted the individual statements in a shared folder here. The file-naming convention is as follows: filenames start with "D", "R", or "I" to indicate the senator's party (Democrat/Republican/Independent), followed by the Senator's surname and a number that keeps multiple statements from the same senator from overwriting each other.
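A sketch of what such a script does, assuming the CSV has 'party', 'last_name', and 'statement' columns (the real column names in my sheet may differ):

```python
import csv
from collections import Counter
from pathlib import Path

def export_statements(csv_path, out_dir):
    """Write each CSV row's statement to its own text file, named
    <party-initial><Surname><n>.txt so multiple statements from the
    same senator don't overwrite each other."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    seen = Counter()  # tracks how many statements each senator has so far
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = row["party"][0] + row["last_name"]  # e.g. "D" + "Schumer"
            seen[key] += 1
            (out / f"{key}{seen[key]}.txt").write_text(
                row["statement"], encoding="utf-8"
            )
```

The per-senator counter is the piece that implements the numbering in the naming convention described above.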

I plan to try analyzing these individual statements tomorrow.

###### update 10/8

I took the statements I broke out in Python and ran them through Voyant. I ran the 56 statements from Democrats separately from the 42 statements from Republicans. I did not analyze the 4 statements from Independents, 3 of which were from Bernie Sanders.

Voyant seems to be a bit buggy. I added "Kavanaugh" and "judge" to Voyant's default stopword list, as "Judge Kavanaugh" appeared in every single result, but it took a couple of tries and ultimately only worked in the Cirrus tool; Voyant refused to acknowledge my stopword list in the other tools. I'd also attempted to suppress "Kavanaugh's", but Voyant kept showing it, including in the Cirrus tool, despite my adding it to the stopword list. "Fire" is on the default stoplist, and I think it shouldn't be. Voyant also would not honor font changes, though there was a dropdown menu to do so.

Both groups showed great variability in length. Democrats’ statements ranged from 24 to 612 words. Republicans’ statements ranged from 48 to 887 words.

The Collocates tool was interesting but mysterious. There was a little slidey bar at the bottom that changed the results, but there were no labels or other support to interpret why that was happening or what was being measured. I made sure to keep both my Democrat and Republican analyses at “5” so at least I had consistency. I searched for more information on the tool in the documentation, but the Collocates tool isn’t even listed.

Republicans often linked Dr. Ford’s name with verbs such as heard, said, appear, provide, and named. Democrats used more descriptors, such as credible, courage, and bravely.

Collocator graphs from Voyant Tools

It was fun watching the Mandalas tool build, showing relationships between the documents in the corpus and the top 10 terms used. The Democrat mandala (shown first) built off the words “court”, “ford”, “dr”, “senate”, “investigation”, “supreme”, “fbi”, “allegations”, “sexual”, and “assault”. The Republican mandala (shown second) built off its top 10 words, which were “dr”, “committee”, “senate”, “process”, “court”, “ford”, “fbi”, “supreme”, “evidence”, and “judiciary”. The Democrats’ statements called attention to the specific nature of the allegations, while the Republicans’ statements focused on the legal process.

Voyant Tools text analysis mandala visualization

Voyant tools text analysis mandala visualization

Another fun but under-documented tool is called the StreamGraph. This seems more about visual interest than effectively communicating information, as the areas of the different segments are quite hard to compare. Again, the Democrats’ statements visualization is shown first, followed by the Republican. The Democrats highlight “investigation”, whereas the Republicans highlight “process.”

Voyant text mining Stream graph

Voyant text analysis Stream graph

###### text mining tools review

In closing, here are my reviews of the text mining tools I used.

Voyant: buggy, unreliable, good fun but about as rigorous as a party game
Mallet: a machine may have learned something, but I didn’t


NOTES

*Jon Kyl, John McCain’s replacement, does not yet have an official Senate website of his own. A quick Google search revealed no official press release statements in the first 20 results.

**Bob Corker, Cindy Hyde-Smith, and John Kennedy did not have a search function on their sites. The search function on Rand Paul’s site was not functioning. Each has a news or media section of their site, and that is where I looked for press releases. Chuck Schumer and Tina Smith’s sites’ search functions returned zero results for “Kavanaugh”. I reviewed titles of all press releases on their sites since September 15th and found no reference to Kavanaugh.

TEXT MINING — OBITS + SONGS + ODES

My process with the Praxis 1 Text Mining Assignment began with a seed that was planted during the self-Googling audits we did in the first weeks of class, when I found an obituary for a woman with my same name (sans middle name or initial).

From this, my thoughts went to the exquisite obituaries The New York Times wrote after 9/11, which were published as a beautiful book titled Portraits. One of my dearest friends has a wonderful father who was engaged to a woman who perished that most fateful of New York Tuesdays. My first Voyant text mining text, therefore, was his fiancée’s NYT obituary. And the last text I mined for this project was the obituary for the great soprano Montserrat Caballé, when I heard the news of her passing as I was drafting this post.

The word REVEAL that appears above the Voyant text box is an understatement. When the words appeared as visuals, I felt like I was learning something about her, and about them as a couple, that I would never have been able to grasp by just reading her obituary. Indeed, I had read it many times prior. Was it the revelation of some extraordinary kind of subtext? Is this what “close reading” is or should be? The experience hit me in an unexpected way: between the eyes as I looked at the screen, and in the gut.

My process then shifted immediately to song lyrics because, as a singer myself who moonlights as a voice teacher and vocal coach, I’m always reviewing, teaching and learning lyrics. I saw in high relief the potential value of using Voyant in this way. I got really juiced by the prospect of all the subtexts and feeling tones that would be revealed to actors/singers via Voyant. When I started entering lyrics, this was confirmed a thousandfold on the screen. So, completely unexpectedly, I now have an awesome new tool in my music skill set. The most amazing thing about this is that I will be participating in “Performing Knowledge,” an all-day theatrical offering at The Segal Center on Dec. 10, for which I submitted the following proposal that was accepted by the Theater Dept.:

“Muscle Memory: How the Body + Voice Em”body” Songs, Poems, Arias, Odes, Monologues & Chants — Learning vocal/spoken word content, performing it, and recording it with audio technology is an intensely physical/psychological/organic process that taps into and connects with a performer’s individually unique “muscle memory”, leading to the creation of vocal/sound art with the body + voice as the vehicle of such audio content. This proposed idea seeks to analyze “songs” as “maps” in the Digital Humanities context. Participants are highly encouraged to bring a song, poem, monologue, etc. with lyric/text sheet to “map out”. The take-away will be a “working map” that employs muscle memory toward learning, memorizing, auditioning, recording and performing any vocal/spoken word content. –Conceived, written and submitted by Carolyn A. McDonough, Monday, Sept. 17, 2018.” [I’m excited to add that during the first creative meeting toward this all-day production, I connected my proposed idea to readings of Donna Haraway and Katherine Hayles from ITP Core 1]

What better way to celebrate this, than to “voyant” song/lyric content and today’s “sad news day” obituary of a great operatic soprano. Rather than describe these Voyant Reveals through writing further, I was SO struck by the visuals generated on my screen that I wanted to show and share these as the findings of my research.

My first choice was “What I Did For Love” from A Chorus Line. (On a sidenote, I’ve seen the actual legal pad on which lyricist Edward Kleban wrote the lyrics at the NYPL Lincoln Center performing arts branch. I thought I had a photo, but alas I do not, as I really wanted to include it to show the evolution from handwritten word/text to Voyant text analysis.)

I was screaming as the results JUMPED out of the screen at me of the keyword “GONE” that is indeed the KEY to the emotional subtext an actor/singer needs to convey within this song in an audition or performance which I KNOW from having heard, studied, taught, and seen this song performed MANY times. And it’s only sung ONCE! How does Voyant achieve this super-wordle superpower?

I then chose “Nothing” also from A Chorus Line as both of these songs are sung by my favorite character, Diana Morales, aka Morales.

Can you hear the screams of discovery?!

Next was today’s obit for a great soprano which made me sad to hear on WQXR this morning because I once attended one of her rehearsals at Lincoln Center:

A complex REVEAL of a complex human being and vocal artist by profession.

AMAZING. Such visuals of texts, especially texts I know “by heart” are extremely powerful.

Lastly, over the long weekend, I’m going to “Voyant” this blog post itself, so that its layers of meaning can be revealed to me even further. –CAM