Text mining the ASA Dissertation Prize

The mangle came out in full force as I was deciding which texts to use for this assignment. My first thought was to explore the language of DH research centers around the world, aiming to identify similarities or differences in their mission statements. This idea proved to be too ambitious (and a little too meta for me today), and in my efforts to scale it back I ran into the problem of picking a few centers to compare and, by extension, determining which centers were “representative” of DH. Even on a small scale with low stakes, that task felt a little loaded, so I switched directions.

I then uploaded a corpus of 70 texts from an undergraduate sociology seminar on cultural dimensions of violence, covering a range of historical and contemporary examples from a variety of disciplines, but ran into the problem(s) of formatting. The PDF format clouded Voyant’s reading process: the language on each document’s cover page (“use,” “published,” “rights”) registered as the most frequently used words. As much as I wanted to work with a larger corpus, I could not figure out how to upload only the text of each article, or only a page range from a PDF (not to mention that multiple articles registered as having 0 words in them), so I decided to take a different tack and pick texts that I could streamline more easily.

With the writing process, publishing, and peer review on my mind from last class, I decided to think about the early stages of the prestige/reputation economy and went to the American Sociological Association (ASA) website. Of the ASA’s annual awards, which include teaching and career accomplishment awards as well as more outward-facing awards for public sociology and social reporting, I was drawn to the Dissertation Award as a way to explore the language of peer review and evaluation. The submission instructions are fairly detailed, but the selection criteria are not:

“The ASA Dissertation Award honors the ASA members’ best PhD dissertation from among those submitted by advisers and mentors in the discipline. Dissertations from PhD recipients with degrees awarded in the current year, will be eligible for consideration for the following year’s award. (e.g. PhD recipients with degrees awarded in the 2018 calendar year will be eligible for consideration for the 2019 ASA Dissertation Award.)”

To learn more about this particular prize, I pulled the press release for each award decision from 2008 to 2018, which covered 13 dissertations (counting two years with joint winners and not counting honorable mentions), and put the body of each release into a separate PDF. With much cleaner documents to run through Voyant, I uploaded all 13 at once and started to dig in. I was curious whether the language of these press releases would reveal some logic or reasoning behind the sparse selection criteria. I hypothesized that each press release would mention timeliness and/or novelty (why this particular research mattered to the field at the time and/or how its method contributed something to a subfield or to the field as a whole) and that the results in Voyant would reflect this language accordingly.

Instead, I was struck by the most frequently used words (besides “dissertation”): global (31 times in total across the 13 documents), political (25), cultural (24), and social (24). None of these words appeared in every text; “global” was concentrated heavily in the releases for Kimberly Kay Hoang’s “New Economies of Sex and Intimacy in Vietnam” (which won the award in 2012) and Larissa Buchholz’s “The Global Rules of Art” (2013). Among the most frequent words, though, “political” appeared in 12 of the 13 releases, absent only from the write-up of Alice Goffman’s “On the Run” (2011), and “social” was absent only from the release for Christopher Michael Muller’s “Historical Origins of Racial Inequality in Incarceration in the United States” (2015). In both cases, the missing word could easily have applied to the project at hand: the announcement of “On the Run” could have mentioned the political components of Goffman’s methods, and Muller’s project touches on undeniably social dimensions. That made me wonder who was writing these releases in the first place, and why they chose to foreground particular dimensions of each project: intensive fieldwork in one case, novel methods in another.
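Voyant reports both how often a term occurs across the whole corpus and how many documents it appears in, and that distinction is what separates “global” (frequent overall, but concentrated in two releases) from “political” (present in 12 of 13). The Python sketch below is a rough approximation of those two counts, not Voyant’s actual implementation. The three “releases” are invented toy strings rather than the ASA texts, and the tokenizer and tiny stopword list are simplifying assumptions (Voyant uses its own tokenizer and much longer stopword lists).

```python
import re
from collections import Counter

# Invented toy texts for illustration; these are NOT the ASA press releases.
corpus = {
    "release_a": "A global study of political and cultural change.",
    "release_b": "Political economy and social life in a global city.",
    "release_c": "An ethnography of social ties under surveillance.",
}

# Assumed, deliberately tiny stopword list; Voyant ships much longer lists.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "under"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_stats(docs: dict[str, str]) -> tuple[Counter, Counter]:
    """Return (corpus-wide term frequency, document frequency) for non-stopwords."""
    total = Counter()     # how many times each term occurs across all documents
    doc_freq = Counter()  # how many documents contain each term at least once
    for text in docs.values():
        tokens = [t for t in tokenize(text) if t not in STOPWORDS]
        total.update(tokens)
        doc_freq.update(set(tokens))  # a set counts each term once per document
    return total, doc_freq


def documents_missing(docs: dict[str, str], term: str) -> list[str]:
    """List the documents in which `term` never appears."""
    return sorted(name for name, text in docs.items() if term not in tokenize(text))


total, doc_freq = term_stats(corpus)
print(total.most_common(3))  # the three most frequent terms with their counts
print(doc_freq["political"], "of", len(corpus), "documents")
print(documents_missing(corpus, "political"))
```

Because `doc_freq` counts each term at most once per document, a word can sit near the top of the overall frequency list while still being missing from some documents, which is the kind of gap that made the absences of “political” and “social” stand out.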

On this note, the releases for the 2017 and 2018 winners (Karida Brown’s “Before they were Diamonds: The Intergenerational Migration of Kentucky’s Coal Camp Blacks” and Juliette Galonnier’s “Choosing Faith and Facing Race: Converting to Islam in France and the United States,” respectively) were far shorter (105 and 149 words) than the average of 642 words from 2008 to 2016. Length had peaked with Goffman’s 789-word announcement in 2011 and had been decreasing since 2013. The vocabulary density of these announcements has also risen, though not consistently, ranging from a high of .771 in 2017 (Brown’s “Before they were Diamonds”) to a low of .451 in 2013 (Daniel Menchik’s “The Practices of Medicine”). Average words per sentence has also varied widely, from 85.5 in 2009 (Claire Laurier Decoteau’s “The Bio-Politics of HIV/AIDS in Post-Apartheid South Africa”) to 20.9 in 2015 (Muller’s “Historical Origins of Racial Inequality in Incarceration in the United States”). Although vocabulary density and average words per sentence tell their own stories, the most striking difference in my eyes is document length. The sudden drop from ~400 to <150 words makes me think that the winners of the Dissertation Award used to write their own announcements, and that something shifted between 2016 and 2017, moving the announcements much closer to a dense, factual press release format with little embellishment and no outside quotations from supervisors or mentors.
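Voyant describes vocabulary density as the number of unique words in a document divided by its total number of words. The following Python sketch computes that ratio and average words per sentence under simplifying assumptions (a lowercase alphabetic tokenizer, and sentences split naively on periods, exclamation marks, and question marks), so its numbers will not match Voyant’s exactly. The sample sentence is invented for illustration.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase alphabetic tokens; an assumed stand-in for Voyant's tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary_density(text: str) -> float:
    """Unique word forms divided by total words (0.0 for an empty text)."""
    tokens = words(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def avg_words_per_sentence(text: str) -> float:
    """Total words divided by sentences, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if words(s)]
    return len(words(text)) / len(sentences) if sentences else 0.0


sample = ("The award honors a dissertation. "
          "The dissertation studies migration and the award recognizes it.")
print(round(vocabulary_density(sample), 3))  # 10 unique / 14 total words
print(avg_words_per_sentence(sample))        # 14 words / 2 sentences

# Repeating the same text adds words but no new word forms, so density drops.
print(round(vocabulary_density(sample + " " + sample), 3))
```

The last line hints at a caveat worth keeping in mind: this kind of ratio tends to fall as a text gets longer, because common words keep repeating, so the very short 2017 and 2018 releases would be expected to score higher on density partly because of their length.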

I was also interested to discover through Voyant that these announcements generally do not make much of each dissertation’s timeliness or novelty: “timely” appeared in three of the 13 announcements, “new” (in context) in two, “groundbreaking” or “breaking new ground” in two, and “ambitious” in three, far less often than I had predicted. Instead, the announcements often pointed to other aspects of quality, from Decoteau’s “masterful” research to Buchholz’s “theoretically and methodologically sophisticated analysis,” and, in seven of the 13 documents, to a “contribution” to the field without tying it specifically to novelty or newness. In a certain way, this absence of language about timeliness or novelty, paired with a focus on overall quality, creates an in-group feeling. The reader learns about the content of each project (what each scholar studied and how they approached it) and is left to read the rest for themselves.
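Counting “new” only “in context” means reading each hit rather than trusting the raw count, which is what Voyant’s Contexts (keyword-in-context) view supports. A minimal keyword-in-context sketch in Python shows the idea. The two toy “releases” are invented for illustration, and `kwic` and `docs_containing` are hypothetical helper names, not Voyant functions.

```python
import re

# Invented toy texts for illustration; these are NOT the ASA press releases.
corpus = {
    "release_a": "This timely study is breaking new ground in the sociology of art.",
    "release_b": "A new dataset makes a major contribution to the field.",
}


def kwic(text: str, phrase: str, window: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, match, right context) for each case-insensitive,
    whole-word match of `phrase`, keeping up to `window` words on each side."""
    hits = []
    pattern = r"\b" + re.escape(phrase) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[: m.start()].split()[-window:]
        right = text[m.end():].split()[:window]
        hits.append((" ".join(left), m.group(0), " ".join(right)))
    return hits


def docs_containing(docs: dict[str, str], phrases: list[str]) -> list[str]:
    """Names of documents matching at least one of the phrases."""
    return sorted(name for name, text in docs.items()
                  if any(kwic(text, p) for p in phrases))


# Print every hit for "new" with its surrounding words, so each can be judged.
for name, text in corpus.items():
    for left, match, right in kwic(text, "new"):
        print(f"{name}: {left} [{match}] {right}")

print(docs_containing(corpus, ["groundbreaking", "breaking new ground"]))
```

A raw search for “new” matches both toy documents, but reading the contexts shows that one hit belongs to the phrase “breaking new ground,” which is why it makes sense to tally that phrase separately from “new” on its own.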

In hindsight, I think I was hoping that the write-ups for each dissertation award would offer more insight into the selection and review process for this particular prize. However, the selection process for many prizes, from receiving a named award from a scholarly organization to securing a spot at a music festival to being signed by a modeling agency, is often deliberately vague, and even after the fact, information about how a committee arrived at its decision can be limited. This example suggests that, for now, certain academic prizes are no exception.

1 thought on “Text mining the ASA Dissertation Prize”

  1. Nancy Foasberg

    What an interesting project!
    I particularly appreciated your discussion of the various false starts on the way to the project you initially settled on, as my process was very similar! Finding the right corpus is really hard — and sometimes, as you point out here, the information you want isn’t what the people producing the data want to make clear!
