
A Genealogy of Distant Reading and my own spin…

Patrick Grady O’Malley


This article offers fascinating input that clarifies the history of distant reading, the possibilities for its future, and diplomatic terms for engaging more closely with the Digital Humanities as a separate, but perhaps mergeable, field. While it may sound at first as if DH and distant reading automatically work together, there are differences. “Digital humanists don’t necessarily share distant readers’ admiration for social science. On the contrary, they are often concerned to defend a boundary between quantitative social science and humane reflection,” whereas “distant reading, on the other hand, is not primarily concerned with technology at all: it centers on a social-scientific approach to the literary past.” What’s more, this article tells us that distant reading has been going on since the latter part of the 20th century (maybe even before) and that computers have only recently begun to emerge as something usable in literary history.


What I would be interested in learning more about, or working more closely on, is how distant reading could be used to identify the underlying literary theory in a text. As an emerging/wannabe theorist, I see great value in using computational methods to look beyond the plot and themes and into the guiding principles that define a particular work. Would it be possible for computers to read a text and determine whether or not something is post-colonial in nature? How could we as scholars benefit from machine learning that recognizes feminism and gender studies and could then refer a reader to relevant information regarding that theoretical tradition? If a text’s underlying theory is ambiguous, would it be possible for a machine to detect what it most likely includes, guiding close readings more carefully and, in turn, suggesting newer and more relevant questions for distant reading?


“Critics of digital humanities often assume that computer science ought to remain merely instrumental for humanists; it should never ‘challenge’ our ‘fundamental standards or procedures.’” While this statement is not Underwood’s position in the article, I want to show my support for the author in saying how wrong it strikes me. As computer science evolves and changes rapidly, how could we as humanists not challenge our work as computational methods become richer? I think the problem is, as we discussed in our last class meeting, that academia is so afraid of the humanities evolving, simply because that puts those critics at risk of becoming irrelevant. Computers don’t necessarily change the humanities; the humanities aren’t going anywhere. But this article was very motivating for me in thinking of learning to engage with literature in new and interesting ways. Work has already been done exploring genre with computation; I wonder how long it will take for my theoretical aspirations to be realized. Certainly, and if nothing else, this gives me a goal to work toward to guide future research.


Overall, it was extremely interesting to learn how DH and distant reading relate and differ, and how distant reading has a vivid and diverse history all its own. This certainly attests to the creativity of the researchers who engaged in these projects before the advent of digital tools and gives us as Digital Humanists a great jumping-off point for understanding the theory of our own research more clearly.

Text mining the ASA Dissertation Prize

The mangle came out in full force as I was deciding which texts to use for this assignment. My first thought was to explore the language of DH research centers around the world, aiming to identify similarities or differences in their mission statements. This idea proved to be too ambitious (and a little too meta for me today), and in my efforts to scale it back I ran into the problem of picking a few centers to compare and, by extension, determining which centers were “representative” of DH. Even on a small scale with low stakes, that task felt a little loaded, so I switched directions.

I then uploaded a corpus of 70 texts from an undergraduate sociology seminar on cultural dimensions of violence, covering a range of historical and contemporary examples from a variety of disciplines, but ran into the problem(s) of formatting. The PDF format clouded Voyant’s reading process, as the language on the cover page of each document (“use,” “published,” “rights”) registered as the most frequently used words. As much as I wanted to work with a larger corpus, I could not figure out how to upload only the text of the articles or a page range from a PDF — not to mention that multiple articles registered as having 0 words in them — so I decided to take a different tack and pick texts that I could streamline more easily.
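(For anyone who hits the same wall: it is possible to script this step outside of Voyant. Here is a minimal sketch using the pypdf library; the file name and page range are hypothetical, and cover pages vary, so the slice would need adjusting per article.)

```python
from pypdf import PdfReader

# Skip the rights/cover page and keep a page range (0-indexed slice).
reader = PdfReader("article.pdf")
body_pages = reader.pages[1:11]  # pages 2 through 11 of the PDF

text = "\n".join(page.extract_text() or "" for page in body_pages)

# Save as plain text, which Voyant reads much more cleanly than PDF.
with open("article.txt", "w", encoding="utf-8") as f:
    f.write(text)
```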

With the writing process, publishing, and peer review on my mind from last class, I decided to think about the early stages of the prestige/reputation economy and went to the American Sociological Association website. Of the ASA’s annual awards, including teaching and career accomplishment awards as well as more outward-facing awards for public sociology and social reporting, I was drawn to the Dissertation Award as a way to explore the language of peer review and evaluation. The submission instructions prove to be fairly detailed, but the selection criteria, not so much:

“The ASA Dissertation Award honors the ASA members’ best PhD dissertation from among those submitted by advisers and mentors in the discipline. Dissertations from PhD recipients with degrees awarded in the current year, will be eligible for consideration for the following year’s award. (e.g. PhD recipients with degrees awarded in the 2018 calendar year will be eligible for consideration for the 2019 ASA Dissertation Award.)”

To get more information about this particular prize, I pulled the press releases for each award decision from 2008-2018 to give me information about the 13 dissertations (counting two years with joint winners and not counting years that included an honorable mention), and put the body of each release into a separate PDF. With much cleaner documents to run through Voyant, I uploaded all 13 at the same time and started to dig in. I was curious to see if the language of these press releases betrayed some logic or reasoning behind the language of the selection criteria. I hypothesized that each press release would include some mention of timeliness and/or novelty (why this particular research mattered to the field at the time and/or why the method contributed something to a subfield or entire field) and that the results in Voyant would show this language accordingly.

Instead, I was struck by the most frequently used words (besides “dissertation”): global (31 times total in the 13 documents); political (25); cultural (24); and social (24). None of these words were used in every text; “global” was concentrated heavily in Kimberly Kay Hoang’s “New Economies of Sex and Intimacy in Vietnam” (which won the award in 2012) and Larissa Buchholz’s “The Global Rules of Art” (2013). Of the most frequently used words, though, “political” appeared in 12 out of 13, absent only from the write-up of Alice Goffman’s “On the Run” (2011), and “social” was absent only from Christopher Michael Muller’s “Historical Origins of Racial Inequality in Incarceration in the United States” (2015). In both cases, the absent word could easily have applied to the project at hand (the announcement of “On the Run” could have mentioned the political components of Goffman’s methods, and “Historical Origins of Racial Inequality in Incarceration in the United States” touches on undeniably social dimensions), which made me wonder who was writing these releases in the first place and choosing which dimensions of each project to emphasize: intensive fieldwork in one case, novel methods in another.

On this note, the releases for the 2017 and 2018 projects (Karida Brown’s “Before they were Diamonds: The Intergenerational Migration of Kentucky’s Coal Camp Blacks” and Juliette Galonnier’s “Choosing Faith and Facing Race: Converting to Islam in France and the United States,” respectively) plummeted in word count (105 and 149) compared to the average of 642 words from 2008-2016, which peaked with Goffman’s 789-word announcement in 2011 and had been decreasing since 2013. The vocabulary density of these announcements has also been on the rise, though not consistently, fluctuating from a high of .771 in 2017 (Brown’s “Before they were Diamonds”) to a low of .451 in 2013 (Daniel Menchik’s “The Practices of Medicine”), and the average words per sentence has varied widely as well, from 85.5 words/sentence in 2009 (Claire Laurier Decoteau’s “The Bio-Politics of HIV/AIDS in Post-Apartheid South Africa”) to 20.9 words/sentence in 2015 (Muller’s “Historical Origins of Racial Inequality in Incarceration in the United States”). Although vocabulary density and average words/sentence tell their own stories, the most striking difference in my eyes has been document length. The sudden change from ~400 to <150 words makes me think that the winners of the Dissertation Award used to write their own announcements, but there was a shift between 2016 and 2017 that moved the announcements much closer to a dense, factual press release format with little embellishment and no outside quotations from supervisors or mentors.

I was also interested to discover through Voyant that these announcements generally do not make a big deal out of each dissertation’s timeliness or novelty: “timely” appeared in three of the 13 announcements, “new” (in context) in two, “groundbreaking” or “breaking new ground” in two, and “ambitious” in three; not the kind of language I predicted. Instead the announcements often mentioned other aspects of quality, from Decoteau’s “masterful” research to Buchholz’s “theoretically and methodologically sophisticated analysis,” and — in seven of the 13 documents — a “contribution” to the field without pointing back to novelty or newness specifically. In a certain way, this lack of specific language about timeliness or novelty, paired with a focus on overall quality, creates an in-group feeling. The reader learns about the content of each project — what each scholar studied and how they approached it — and is left to read the rest for themselves.

In hindsight, I think I was hoping that the write-ups for each dissertation award would give a bit more insight into the selection and review process for this particular prize. However, the selection process for many prizes — from receiving a named award from a scholarly organization to securing a spot at a music festival to being signed to a particular modeling agency — is often deliberately vague, and even after the fact, information can be limited about how a committee arrived at a decision. This example suggests that, for now, certain academic prizes are no exception.

Data for Mapping workshop notes

This past Tuesday I attended a Digital Fellows workshop called Data for Mapping: Tips and Strategies. The workshop was presented by Digital Fellows Javier Otero Peña and Olivia Ildefonso. Highlights of this workshop were learning how to access US Census data and seeing a demo of mapping software called Carto.

Javier started the workshop encouraging us to interject with any questions we had at any time. The group maybe too enthusiastically took him up on this, and he had to walk it back in the interests of time after we spent 20+ minutes on a single slide. After that, the workshop moved along at a nice, steady clip.

There was a technical challenge, which I see as an unexpected boon. Carto changed their access permissions within the few days before the workshop, and nobody except the Digital Fellows could access it. The Digital Fellows had an existing account, so they were still able to demo for us how to use Carto. 

I think it’s for the best that we weren’t able to access Carto and set up accounts. Many workshops, including a Zotero one I went to a couple of weeks ago, bleed pretty much all their allotted time on getting software set up on each of the 10-20 attendees’ varied personal laptops. I find this incredibly painful to sit through. But in this workshop we established early on that we wouldn’t be able to individually install Carto, and so we were able to cover many more specifics on how to actually use Carto. Users who need installation help can always go to Digital Fellows office hours on their own.

Javier and Olivia shared their presentation deck with us. It is a thorough walkthrough of the steps needed to get Census data on the median age by state and map that data in Carto. One note: where the opening slides say the contents are for QGIS, mentally replace that with Carto. It is all about Carto. The QGIS references were accidentally carried over from an older version.
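(If you would rather skip the point-and-click steps in the deck, the same data is also available through the Census Bureau’s API. A rough sketch below; I’m assuming the ACS 5-year dataset and the median-age variable B01002_001E, so double-check the codes against the deck before relying on this.)

```python
import csv
import requests

# ACS 5-year estimates: NAME = state name, B01002_001E = median age.
url = "https://api.census.gov/data/2017/acs/acs5"
params = {"get": "NAME,B01002_001E", "for": "state:*"}
rows = requests.get(url, params=params).json()

# The first row returned is the header; write a CSV ready to upload to Carto.
with open("median_age_by_state.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```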

I did some digging after the workshop on how to register to use Carto. Student access for Carto now requires a student developer GitHub account (which also includes free versions of other fun-looking tools). GitHub says it can take from 1 hour to 5 days after applying on their site for your student developer account to be approved. I applied to have my regular GitHub account classified as a student developer account 5 hours ago using a photo of my GC ID card and haven’t heard anything yet, so I guess this really does go through some sort of vetting process. Maybe using a GC email address for verification would be faster.

This workshop was a good time, not least because Javier was extremely funny and Olivia was super helpful coming around to us to address individual questions. Five out of five stars. Would workshop again.

Text Mining Game Comments (Probably Too Many at Once!)

To tell the truth, I’ve been playing with Voyant a lot, trying to figure out what the most interesting thing is that I could do with it! Tenen could critique my analysis on the grounds that it’s definitely doing some things I don’t fully understand; Underwood would probably quibble with my construction of a corpus and my method of selecting words to consider.  Multiple authors could very reasonably take issue with the lack of political engagement in my choice. However, if the purpose here is to get my feet wet, I think it’s a good idea to start with a very familiar subject matter, and in my case, that means board games.

Risk Legacy was published in 2011. This game reimagined the classic Risk as a series of scenarios, played by the same group, in which players would make changes to the board between (or during!) scenarios. Several years later,* the popularity and prevalence of legacy-style, campaign-style, and scenario-based board games have skyrocketed. Two such games, Gloomhaven and Pandemic Legacy, are the top two games on BoardGameGeek as of this writing.

I was interested in learning more about the reception of this type of game in the board gaming community. The most obvious source for such information is BoardGameGeek (BGG).  I could have looked at detailed reviews, but since I preferred to look at reactions from a broader section of the community, I chose to look at the comments for each game.  BGG allows users to rate games and comment on them, and since all the games I had in mind were quite popular, there was ample data for each.  Additionally, BGG has an API that made extracting this data relatively easy.**
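(For the curious, my extraction step boiled down to something like the sketch below, using BGG’s XML API. The game ID shown is a placeholder, and the real script had a bit more error handling.)

```python
import requests
import xml.etree.ElementTree as ET

# Fetch the 100 most recent comments for one game via BGG's XML API.
game_id = 174430  # placeholder: a BGG game ID
resp = requests.get(
    "https://boardgamegeek.com/xmlapi2/thing",
    params={"id": game_id, "comments": 1, "pagesize": 100},
)

# Each <comment> element stores its text in the "value" attribute.
root = ET.fromstring(resp.content)
comments = [c.get("value", "") for c in root.iter("comment")]

with open(f"game_{game_id}_comments.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(comments))
```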

As I was only able to download the most recent 100 comments for each game, this is where I started.  I listed all the games of this style that I could think of, created a file for each set of comments, and loaded them into Voyant. Note that I personally have only played five of these nine games. The games in question are:

  • The 7th Continent, a cooperative exploration game
  • Charterstone, a worker-placement strategy game
  • Gloomhaven, a cooperative dungeon crawl
  • Star Wars: Imperial Assault, a game based on the second edition of the older dungeon crawl, Descent, but with a Star Wars theme. It’s cooperative, but with the equivalent of a dungeon master.
  • Near and Far, a strategy game with “adventures” that involve reading paragraphs from a book. This is a sequel to Above and Below, an earlier, simpler game by the same designer.
  • Pandemic Legacy Season One, a legacy-style adaptation of the popular cooperative game, Pandemic
  • Pandemic Legacy Season Two, a sequel to Pandemic Legacy Season One
  • Risk Legacy, described above
  • Seafall, a competitive nautical-themed game with an exploration element

The 7th Continent is a slightly controversial inclusion on this list; I have it here because it is often discussed with the others. I excluded Descent because it isn’t often considered part of this genealogy (although perhaps it should be). Both these decisions felt a little arbitrary; I can certainly understand why building a corpus is such an important and difficult part of the text-mining process!

These comments included 4,535 unique word forms, with the length of each document varying from 4,059 words (Risk Legacy) to 2,615 (7th Continent).  Voyant found the most frequent words across this corpus, but also the most distinctive words for each game. The most frequent words weren’t very interesting: game, play, games, like, campaign.*** Most of these words would probably be the most frequent for any set of game comments I loaded into Voyant! However, I noticed some interesting patterns among the distinctive words. These included:

Game Jargon referring to scenarios. That includes: “curse” for The 7th Continent (7 instances), “month” for Pandemic Legacy (15 instances), and “skirmish” for Imperial Assault (15 instances). “Prologue” was mentioned 8 times for Pandemic Legacy Season 2, in reference to the practice scenario included in the game.

References to related games or other editions. “Legacy” was mentioned 15 times for Charterstone, although it is not officially a legacy game. “Descent” was mentioned 15 times for Imperial Assault, which is based on Descent. “Below” was mentioned 19 times for Near and Far, which is a sequel to the game Above and Below. “Above” was also mentioned much more often for Near and Far than for other games; I’m not sure why it didn’t show up among the distinctive words.

References to game mechanics or game genres. Charterstone, a worker placement game, had 20 mentions of “worker” and 17 of “placement.” The word “worker” was also used 9 times for Near and Far, which also has a worker placement element; “threats” (another mechanic in the game) were mentioned 8 times. For Gloomhaven, a dungeon crawl, the word “dungeon” turned up 20 times.  Risk Legacy had four mentions of “packets” in which the new materials were kept. The comments about Seafall included 6 references to “vp” (victory points).  Near and Far and Charterstone also use victory points, but for some reason they were mentioned far less often in reference to those games.

The means by which the game was published. Kickstarter, a crowdfunding website, is very frequently used to publish board games these days. In this group, The 7th Continent, Gloomhaven, and Near and Far were all published via Kickstarter. Curiously, both the name “Kickstarter” and the abbreviation “KS” appeared with much higher frequency in the comments on the 7th Continent and Near and Far than in the comments for Gloomhaven. 7th Continent players were also much more likely to use the abbreviation than to type out the full word; I have no idea why this might be.

Thus, it appears that most of the words that stand out statistically (in this automated analysis) in the comments refer to facts about the game, rather than directly expressing an opinion. The exception to this rule was Seafall, which is by far the lowest-ranked of these games and which received some strongly negative reviews when it was first published. The distinctive words for Seafall included two very ominous ones: “willing” and “faq” (each used five times).

In any case, I suspected I could find more interesting information outside the selected terms. Here, again, Underwood worries me; if I select terms out of my own head, I risk biasing my results. However, I decided to risk it, because I wanted to see what aspects of the campaign game experience commenters found important or at least noteworthy. If I had more time to work on this, it would be a good idea to read through some reviews for good words describing various aspects of this style of game, or perhaps go back to a podcast where this was discussed, and see how the terms used there were (or weren’t) reflected in the comments. Without taking this step, I’m likely to miss things; for instance, the fact that the word “runaway” (as in, runaway leader) constitutes 0.0008 of the words used to describe Seafall, and is never used in the comments of any of the other games except Charterstone, where it appears at a much lower rate.**** As it is, however, I took the unscientific step of searching for the words that I thought seemed likely to matter. My results were interesting:

(Please note that, because of how I named the files, Pandemic Legacy Season Two is the first of the two Pandemics listed!)

It’s very striking to me how different each of these bars looks. Some characteristics are hugely important to some of the games but not at all mentioned in the others! “Story*” (including both story and storytelling) is mentioned unsurprisingly often when discussing Near and Far; one important part of that game involves reading story paragraphs from a book. It’s interesting, though, that story features so much more heavily in the first season of Pandemic Legacy than the second. Of course, the mere mention of a story doesn’t mean that the story of a game met with approval; most of the comments on Pandemic Legacy’s story are positive, while the comments on Charterstone’s are a bit more mixed.

Gloomhaven comments mention characters far more than any of the other terms I searched for; one of the distinguishing characteristics of this game is the way that characters change over time. Many of the comments also mentioned that the characters do not conform to common dungeon crawl tropes. However, the fact that characters are mentioned in the comments of all but two games suggests that characters are important to players of campaign-style games.

I also experimented with some of the words that appeared in the word cloud, but since this post is already quite long, I won’t detail everything I noticed! It was interesting, for instance, to note how the use of words like “experience” and “campaign” varied strongly among these games.  (For instance: “experience” turned out to be a strongly positive word in this corpus, and applied mainly to Pandemic Legacy.)

In any case, I had several takeaways from this experience:

  • Selecting an appropriate corpus is difficult. Familiarity with the subject matter was helpful, but someone less familiar may have selected a less biased corpus.
  • The more games I included, the more difficult this analysis became!
  • My knowledge of the subject area allowed me to more easily interpret the prevalence of certain words, particularly those that constituted some kind of game jargon.
  • Words often have a particularly positive or negative connotation throughout a corpus, though they may not have that connotation outside that corpus. (For instance: rulebook. If a comment brings up the rulebook of a game, it is never to compliment it.)
  • Even a simple tool like this includes some math that isn’t totally transparent to me. I can appreciate the general concept of “distinctive words,” but I don’t know exactly how they are calculated. (I’m reading through the help files now to figure it out; see the sketch below for one standard approach!)
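(On that last point: one common way to find distinctive words is TF-IDF, which weights a word’s frequency in one document against how many documents it appears in. I am not certain this matches Voyant’s exact formula, but a toy version on made-up data looks like this.)

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: {document name: list of word tokens}.
    Scores each word by term frequency times inverse document
    frequency; higher scores mean more distinctive words."""
    n = len(docs)
    # Number of documents each word appears in.
    doc_freq = Counter(w for tokens in docs.values() for w in set(tokens))
    return {
        name: {
            w: (count / len(tokens)) * math.log(n / doc_freq[w])
            for w, count in Counter(tokens).items()
        }
        for name, tokens in docs.items()
    }

# Tiny illustrative corpus (not the real comment data).
docs = {
    "gloomhaven": "dungeon crawl campaign game great dungeon".split(),
    "seafall": "faq rules campaign game willing willing".split(),
}
for name, scores in tfidf(docs).items():
    print(name, sorted(scores, key=scores.get, reverse=True)[:2])
```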

I also consolidated all the comments on each game into a single file, which was very convenient for this analysis, but prevented me from distinguishing among the commenters.  This could be important if, for example, all five instances of a word were by the same author.

*Note that there was a lag of several years due to the immense amount of playtesting and design work required for this type of game.

**Thanks to Olivia Ildefonso who helped me with this during Digital Fellows’ office hours!

***Note that “like” and “game” are both ambiguous terms. “Like” is used both to express approval and to compare one game to another. “Game” could refer to the overall game or to one session of it (e.g. “I didn’t enjoy my first game of this, but later I came to like it.”).

****To be fair, it is unlikely anyone would complain of a runaway leader in 7th Continent, Gloomhaven, Imperial Assault, or either of the Pandemics, as they are all cooperative games.

Text mining the Billboard Country Top 10

My apologies to anyone who read this before the evening of October 8. I set this to post automatically, but for the wrong date and without all that I wanted to include.

I’m a big fan of music, but as I’ve gotten further away from my undergrad years, I’ve become less familiar with what is currently playing on the radio. Thanks to my brother’s children, I have some semblance of a grasp on certain musical genres, but I have absolutely no idea what’s happening in the world of country music (I did at one point, as I went to undergrad in Virginia).

I decided to use Voyant Tools to do a text analysis of the first 10 songs on the Billboard Country chart from the week of September 8, 2018. The joke about country music is that it’s about dogs, trucks, and your wife leaving you. When I was more familiar with country music, I found it to be more complex than this, but a lot could have changed since I last paid attention. Will a look at the country songs with the most sales/airplay during this week support these assumptions? For the sake of uniformity, I accepted the lyrics on Genius.com as being correct and removed all extraneous section labels (chorus, bridge, etc.) from the lyrics.
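(That cleanup step can also be scripted. A small sketch below, assuming each song is saved as a text file with a hypothetical name; Genius marks sections in square brackets, so a regular expression can strip them.)

```python
import re

# Genius-style section headers look like [Chorus] or [Verse 1: Bebe Rexha].
with open("meant_to_be.txt", encoding="utf-8") as f:
    lyrics = f.read()

cleaned = re.sub(r"\[[^\]]*\]", "", lyrics)
# Collapse the blank lines the removed headers leave behind.
cleaned = re.sub(r"\n{3,}", "\n\n", cleaned).strip()

with open("meant_to_be_clean.txt", "w", encoding="utf-8") as f:
    f.write(cleaned)
```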

The songs in the top 10 are as follows:

  1. Meant to Be – Bebe Rexha & Florida Georgia Line
  2. Tequila – Dan + Shay
  3. Simple – Florida Georgia Line
  4. Drowns the Whiskey – Jason Aldean featuring Miranda Lambert
  5. Sunrise, Sunburn, Sunset – Luke Bryan
  6. Life Changes – Thomas Rhett
  7. Heaven – Kane Brown
  8. Mercy – Brett Young
  9. Get Along – Kenny Chesney
  10. Hotel Key – Old Dominion

If you would like to view these lyrics for yourself, I’ve left the files in a Google folder.

As we can see, the words “truck,” “dog,” “wife,” and “left” were not among the most frequently used, although it may not be entirely surprising that “ain’t” was.

The most frequently used word in the corpus, “it’s,” appeared only 19 times, showing that there is quite a bit of diversity in these lyrics. I looked for other patterns, such as whether vocabulary density or average words per sentence had an effect on a song’s position on the chart, but there was no correlation.
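(For anyone who wants to check a correlation like this themselves, Python’s statistics module can compute Pearson’s r directly. The numbers below are illustrative placeholders, not my actual Voyant figures.)

```python
from statistics import correlation  # requires Python 3.10+

# Chart position (1 = top of the chart) and vocabulary density per song.
# Placeholder values for illustration only.
positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
densities = [0.42, 0.55, 0.38, 0.61, 0.47, 0.52, 0.44, 0.58, 0.40, 0.49]

# Pearson's r near 0 suggests no linear relationship.
print(correlation(positions, densities))
```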

I think we should talk about this…

Given that we are in the Humanities, this seems to be attacking some of the scholarship done there.

I will say this: on one level, I agree with one point that Lindsay, Pluckrose, and Boghossian are making, and that is that there is a lot of crap scholarship out there.

I’ve read many pieces that were written in a deliberately confusing way.

For instance, I once read a (in my humble opinion) poorly written piece on colonialism. When I brought up my issues with the article in class, I was told that I “fetishize clarity.”

This set off my crap detector.

I understand that complicated ideas require complex discussions, but I think there’s a difference between complex and convoluted. Convoluted arguments are built to confuse the reader.

Having said that, I don’t think that Lindsay, Pluckrose, and Boghossian really showed us all that much, and their whole concept of “grievance studies” is openly dismissive of the real struggles faced by women and minorities in society.

The way they describe gender studies, ethnic studies, LGBT studies, etc. as creating or encouraging a cult of victimization strikes me as wrongheaded. From what I have read in these fields, most scholars are calling out the issues in our culture and proposing change. They aren’t just whining “Woe is me!” as Lindsay, Pluckrose, and Boghossian imply.


Text Mining Using Voyant Tools – A Comparison of Barack Obama & Donald Trump’s Respective Inauguration Speeches

So a couple of years ago I was introduced to Voyant Tools, and back then I just splashed around in it to get a feel for text mining. Today, however, I dove right in. I figured a fun way to get a really diversified experience out of the software would be to text mine two separate things and compare them. This raises the question: what would be a solid set of texts to compare? You guessed it: I went ahead and plugged in former President Barack Obama’s and current “President” Donald Trump’s inauguration speeches to see the difference between their directions as newly elected presidents. The results I came across were very interesting, yet not entirely surprising, all at the same time.

To preface the impending conversation, we all know how Donald Trump’s campaigning for the 2016 election went versus Barack Obama’s campaigning for the 2008 election. So the speeches they gave at their respective inauguration ceremonies really echoed what they were preaching throughout their campaigns. To start off with some simple numbers, Trump’s speech ended up being a total of roughly 1,434 words, containing 542 unique word forms. Unique word forms are basically distinct words, so a word such as “the” is only counted once. Meanwhile, Obama’s speech reached a staggering 2,439 words and 910 unique word forms! That is almost double the length of Trump’s speech. Even their average words per sentence were quite different, with Obama’s average being 21.6 and Trump’s only 16.5. This could lead to a lot of sociological theories on why these speeches needed to be so different, but we’ll get to that later.
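(These summary numbers are easy to reproduce outside Voyant, too. A quick sketch, assuming each speech is saved as a plain-text file; the file names are hypothetical, and this crude sentence splitter won’t match Voyant’s counts exactly.)

```python
import re

def speech_stats(path):
    text = open(path, encoding="utf-8").read().lower()
    words = re.findall(r"[a-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "total words": len(words),
        "unique word forms": len(set(words)),
        "avg words/sentence": round(len(words) / len(sentences), 1),
    }

for path in ["obama_2009.txt", "trump_2017.txt"]:
    print(path, speech_stats(path))
```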

These word counts were just the tip of the iceberg. Next, we’re going to look at the Cirrus feature in order to receive a visual of what words were emphasized more in each of their speeches (visual provided below). Right off the bat, we notice that Obama (left) had a much wider range of vocabulary, which clearly shows why his unique word count was so high. From this image, you can understand how Obama was really emphasizing the idea of a new America in his speech. He used phrases such as “new,” “common,” “world,” “generation,” “peace,” and “spirit.” As for Trump (right), he kept his speech rather simple and drove the very nationalistic nail in the ground. Notice he used phrases such as “America,” “American,” “country,” “wealth,” “power,” “allegiance,” “fight,” “action,” and “destiny.”

Obama went about his campaign by attempting to give volume to the voices that weren’t being heard, and we can see that in his diverse choice of words and the direction his speech took. Meanwhile, Trump preached much hatred toward foreigners. He boasted about the nationalism of American citizens and how he would get them jobs, and he claimed he’d grant them power with himself as a vessel; all of these points come through in his speech.

Next up, I decided to dig (or should I say “mine”) a little deeper into the context of these repeated words. This led me to the very convenient Knots tool, which took those repeated words and phrases and provided their context, so I got to see where and how these words were used and where they overlapped. For the sake of comparison, I looked specifically at each president’s use of the word “America.” When it came to Obama’s speech (left), he always used the term when discussing creating a newer, more ambitious, and more equal era for the United States; it was very closely associated with his other commonly used word, “new.” Trump, however, repeatedly used the term when referring to reclaiming American power and greatness: more specifically, how Americans will come first before immigrants, and how America will prosper because of it.

The last discovery I’ll share with you guys is quite a hilarious one. Voyant Tools features a messaging tool called Veliza, where you can chat with a bot about your uploaded text. You can also click a “from text” button that pulls a sentence from your uploaded text and responds to it accordingly. Veliza is a sister program to the much more widely known ELIZA, an AI program developed at the MIT Artificial Intelligence Laboratory back in the 1960s. It simulated conversation by matching basic text patterns, creating the illusion that it was communicating with you. However, the conversations are obviously very shallow since the software isn’t a sentient being. So, I went ahead and played with Veliza using lines from the speeches, and well, it was really funny! Why? Because the sentences pulled from Trump’s speech (right) were simplistic enough that Veliza was able to simulate a realistic conversation from what she was given! Meanwhile, Obama’s speech (middle) was too complex for Veliza to formulate a realistic response.
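(For a sense of how ELIZA-style bots manage this, here is my own toy illustration, not Veliza’s actual rule set: the bot matches a simple text pattern and reflects the matched words back in a canned response, which works best on short, simple sentences.)

```python
import re

# Each rule pairs a text pattern with a response template.
RULES = [
    (r"\bI am (.+)", "How long have you been {0}?"),
    (r"\bwe will (.+)", "Why do you believe you will {0}?"),
    (r"\bAmerica\b", "Tell me more about what America means to you."),
]

def respond(line):
    line = line.rstrip(".!?")  # drop trailing punctuation before matching
    for pattern, template in RULES:
        match = re.search(pattern, line, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # fallback when nothing matches

print(respond("We will make America strong again."))
# -> Why do you believe you will make America strong again?
```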

Overall, I had a very intriguing and pleasant experience text mining with Voyant Tools! It was incredibly user-friendly, and I would recommend it to any aspiring digital humanists out there. Text mining as a whole is super fun, so I would also suggest taking random works you like and plugging them in. There are loads of discoveries to be made out there. Physical texts are just what’s on the surface; using tools like this really gets at the heart of DH. You just have to dive in!

Do the Humanities have Impostor Syndrome?

Kathleen Fitzpatrick opens her book ‘Planned Obsolescence’ by exposing anxiety over the supposed decline of reading as anxiety over a perceived threat to the form of the book, which is a status-laden cultural token. From our previous readings, I gather the humanities community had a similar defensive threat response to the waxing digital humanities. The digital humanities have not obviated the traditional humanities so much as cut into their degree of cultural power. Yet casting the threat to the humanities as critical created a protective reflex to maintain the status quo, driven by the implied threat of extinction.

What do the traditional humanities have to be so anxious about? From the first section of this book, it sounds like plenty.

The humanities are anxious about the increasingly common requirements from administrations and funding sources to show empirical processes and/or quantified results (p. 47).

Humanities scholars have also failed to apply the very critical thinking they claim as their vital contribution to society to their own academic practices and norms, as evidenced by the long-standing lack of challenge to traditional peer review (p. 10).

Why might humanities scholars have avoided interrogating the peer review process that had such heavy influence on their professional lives? Fitzpatrick proposes five reasons.

  1. philosophical slipperiness around what “truth” is
  2. insufficient empirical skills to prove the disutility of traditional peer review
  3. resistance to self-analysis due to anxiety about self-exposure
  4. resistance to changing traditions
  5. fear of loss of power and prestige

Fitzpatrick also tells us that traditional humanities work, more than other fields, happens in isolation. Collaboration is relatively uncommon. Humanities scholars are in general therefore unused to navigating the sharing of credit or open sourcing of work, and may have anxieties around the implications of doing so to their academic identities.

Taken together, this reads to me like the traditional humanities may have a case of impostor syndrome. Impostor syndrome, per Wikipedia (deliberately chosen as a source, as it was in the reading), is “a psychological pattern in which an individual doubts their accomplishments and has a persistent internalized fear of being exposed as a ‘fraud.’”

Let the record show I do not think the humanities are fraudulent. Only that they seem to be behaving in an overly fearful and slightly neurotic way. How to alleviate this counterproductive mindset?

Fitzpatrick exhorts the reader to embrace an elevation of the community product and good over that of any individual. The traditional humanities may be happy to realize that working collaboratively balances the risks to any one individual. With the input and support of a community, loner scholars no longer need to fear being caught out. New methods can be seen as new opportunities for collaboration. The digital humanities cease to be a threat and become a new playing field.

Workshop: Python

Last Tuesday, a few others from our class and I attended the Python workshop offered by the GCDI fellows.

Clearly, the purpose of the workshop was to introduce people with little programming background to some basic principles of programming and to a few foundations of Python syntax. First off, I think it is next to impossible for a two-hour programming workshop to do more than help participants clear the initial hurdles (or inertia) necessary to start the long, hopefully exciting and empowering, journey of learning how to program. Secondly, the specific hurdles you need to clear really depend on where you are in the learning process. Many of the participants, for example, had never programmed before, but some in that group came in with a clearer idea of what coding is than others — having absorbed mental models of the abstract spaces in which it works. So, it’s not an easy workshop to design.

That said, I think Rachel (the lead instructor) and Pablo (the support instructor) did a really good job of getting us going with the language.  What I would have liked to have seen was a little bit more on where we should go next to make Python a useful tool for us, and to give some of us an idea of the kind of investment needed to do so.  Rachel did mention something that seems hugely important to approaching learning Python: instead of trying to learn the language in some sort of systematic and holistic way from the outset, start with a problem you are trying to solve, a thing you want to do, and learn how to do that thing.  You’ll have stakes, then, that will motivate you to push on when you run into inevitable impediments. You’ll also pick up a lot of the surrounding programming, API and implementation principles in a more grounded and transportable way.

Okay, so what did we cover?

1.

Initially, why program in the first place?

  • Learning programming helps you understand computing better in general
  • It thus makes you a smarter computer user
  • Importantly, it develops problem solving skills, systems thinking skills, and gives you experience in reducing complex problems into simpler components
  • And, hey, if you get good at it, it’s extremely marketable

2.

Why use Python in particular?

  • It’s not the hardest language to start with
    • It’s interpreted rather than compiled, so you can very quickly and easily see the results of what you are coding
  • Lots of online resources for learning (famous for its great documentation)
  • Many open-source libraries for Python, which means lots of tools you can use to build programs
  • Quickly becoming industry standard for certain fields (particularly machine learning and text analysis)*

3.

As far as the language goes, we covered:

  • Data types (e.g. integers, floats, strings)
  • Assignation of data values to variables
  • How Python stores and manipulates those variable values in memory
  • Defining and calling functions
  • The “list” data structure (called “array” in other languages)
  • And the use of “Definite Loops” (loops that iterate through a list a fixed number of times)

We didn’t get to what Rachel called “Decision Structures” due to time — decision structures manifest as if/elif/else constructions that evaluate inputs and run different code based on the value of said inputs. One appears at the end of the sketch below.
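(To recap the material in code form, here is a small sketch of my own touching each piece we covered, plus the decision structure we ran out of time for; it is not Rachel’s example.)

```python
# Data types and variable assignment
count = 3            # integer
price = 4.5          # float
name = "workshop"    # string

# A list (what some other languages call an array)
scores = [88, 92, 79]

# Defining a function...
def average(numbers):
    return sum(numbers) / len(numbers)

# A definite loop: runs once per element in the list
for score in scores:
    print(score)

# ...and calling it inside a decision structure (if/elif/else)
avg = average(scores)
if avg >= 90:
    print("high average:", avg)
elif avg >= 80:
    print("middling average:", avg)
else:
    print("low average:", avg)
```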

All of this stuff, including the decision structures lesson, you can see up on Rachel’s GitHub page for the Workshop here: https://github.com/rachelrakov/df_code/blob/master/Intro_to_Python_Programming_py3.6.ipynb

One of my favorite parts of the workshop, however, was being introduced to Jupyter Notebook (http://jupyter.org/) which Rachel used as the presentation mechanism.  You can see its output on the GitHub page. It seems like an amazing tool for teaching (particularly code), because you can include instructional text alongside code blocks that actually run in the notebook.  Pablo mentioned that Jupyter Notebook also works with an assortment of visualization packages. So, while I went in to get some Python information, I came out with a new pedagogical tool to explore!

Final thoughts:

As has been mentioned, I’ve done a lot of programming in the past, just not with Python.  If this is true of you as well, I would not recommend the workshop — you are not the intended audience.  However, if you want to get started programming in general, and/or with Python in particular, I think it’s great.  Not only will you get the initial nudge everyone needs, but you’ll meet some great Digital Fellows who can be resources for you in the future.  I recommend you ask them where to go next to start using Python productively in your work.

Edit: one final thing, don’t forget that Patrick Smyth, our fearless advisor, is highly proficient and experienced in using Python; he is a tremendous resource both for getting started and hacking on the code you’re working on.

*I pulled this section almost directly from the GitHub page

Peer review + power dynamics in Planned Obsolescence

Keeping in the spirit of Sandy’s post on collaboration vs. “ownership,” I wanted to mention Fitzpatrick’s idea of peer review, share my hesitancy about her diagnosis of the problem and solution, and hopefully hear what everyone else thought about it.

In Planned Obsolescence, Fitzpatrick considers From Book Censorship to Academic Peer Review by Mario Biagioli (full text at http://innovation.ucdavis.edu/people/publications/Biagioli%202008%20Censorship_review.pdf) to describe how “peer review functions as a self-perpetuating disciplinary system, inculcating the objects of discipline into becoming its subjects” (Fitzpatrick 22). As Biagioli puts it, “subjects take turns at disciplining each other into disciplines” in academia (12). This concept makes sense across types of peer review; Biagioli focuses on the royal academies and the associated “republic of letters” as a way to conceptualize peer review beyond a singular project, and I am also thinking of contemporary practices that are designed to evaluate and recalibrate a power dynamic (like the time I realized that the department head in the back of a classroom was actually there to evaluate the instructor).

This entire process of peer review, particularly the familiar version that Fitzpatrick considers in detail in her first and third chapters, is wrapped up in notions of who counts as a peer. We have discussed the idea of collaboration throughout the semester, starting with the notion that DH projects often accommodate, even require, a variety of skills and contributions; Sandy’s post speaks to this point and flags the critical “decision point about whose contributions to include” in the first place as a good place to start for identifying a project’s collaborators and expanding our notion of a peer. All of this points to a more inclusive notion of the peer which, in turn, aligns with a field like DH that strives to be participatory and democratic in multiple senses of those words.

The peer review process that Fitzpatrick outlines in Chapter 3 seems like a good place to start putting this expanded idea of the peer into practice. She compares how digital commenting functions as one level of peer review for projects such as “Holy of Holies,” the Iraq Study Group Report, her own article “CommentPress: New (Social) Structures for New (Networked) Texts,” Expressive Processing, and a digital release of The Golden Notebook (112-117), describing a spectrum of options from an entirely open commenting feature where any reader could leave a comment to relatively closed-off systems where only select readers could provide feedback. As I made my way through this chapter, the phrase “the wisdom of the crowd” (which we first encountered in the context of The Digital Humanities Manifesto 2.0 as described in “This Is Why We Fight” by Lisa Spiro) kept coming to mind. From my perspective, this notion underlies Fitzpatrick’s model for online peer review, which strives to be a social, open process while “managing the potential for chaos” (117). (Granted, this chaotic or more generally negative mob/mass/crowd was much more familiar to me from French history, Romantic literature, early urban sociology, and general concern about trolling, but I have come around to the idea that the crowd can be a force for good in so many DH contexts.)

However, Fitzpatrick also notes that the author of  Expressive Processing experienced that “the preexistence of the community was an absolute necessity” (116) to make its comment structure useful. This experience logically translates to other projects: peer review that turns to the “wisdom of the crowd” can only be as helpful as its crowd. I see how the crowd might offer more variety of feedback and how a more expansive notion of peer review in general could magnify the voices of individuals who may not have gotten the chance to participate in the process otherwise, whether because they fall slightly outside of academic circles, have not yet acquired the prestige to “do peer review” for a publisher, or any other reason. But to become a member of that peer review community or crowd — one of the seven women with commenting privileges on The Golden Notebook, for example — in the first place, I see the same social and technical barriers to access that we have talked about in class. As a result, I am struggling to see how a more democratic comment structure in digital spaces changes the disciplinary power dynamic of peer review. In your reading, does Fitzpatrick’s proposed version of peer review (in certain contexts) adequately address this power dynamic?