Many of our readings, and other readings to which these have pointed, refer to the difficulties digital humanists have in finding and working with data. Some authors make explicit calls for more humanities datasets in general, while others focus on issues with using data in research and pedagogy. It is important to observe that there is a wide range of notions as to what constitutes data for the discipline. Most broadly these include the internet itself, or the entirety of content on social media platforms, but often the term refers to material that has been digitized and made available as “unstructured data” on the internet. The latter comes in the form of e-texts, images (of photographs, art, and text), video, and so on.
There is an important distinction between data that is collected through experiment or exists already in a form delineated as computationally interrogatable, and what humanists generally designate as objects of study — which tend to have prior significance in their own cultural domains. Humanistic “raw data” (a troubled term) is functionally transformed when it is viewed as data.
Miriam Posner writes, “When you call something data, you imply that it exists in discrete, fungible units; that it is computationally tractable…” (Posner, “Humanities Data”). She notes, however, that for most humanists the “data” in use is “computationally tractable” only to the extent that it arrives in digitized form. A film, say, comes to us as something that is primarily understood as it is observed across its visual, aural (often), temporal and cultural dimensions, whereas what a scientist would generally consider data might take the form of a list of each of the film’s frames.
Clearly, this distinction implies the need for an additional step: converting digitized, culturally significant objects into data that can be analyzed computationally. The problem of transforming the objects of interest to a digital humanist (e.g. a digitized novel, or a collection of images of a historical correspondence) into such a form, while not unique to the digital humanities, is certainly fundamental to any digital humanities project that analyzes data.
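As a minimal sketch of what that additional step can look like, assume we already have a plain-text transcription of a novel. The short Python example below turns a passage into word counts; the function names `tokenize` and `word_frequencies` are our own, chosen for illustration, and the sample sentence simply stands in for a full text. Even in so small a case, the transformation discards punctuation, capitalization and word order, which illustrates how much of the object is lost when it becomes data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only runs of letters and apostrophes.

    This is one simplifying choice among many; it drops punctuation,
    digits, and any non-ASCII characters.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text: str) -> Counter:
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


# A stand-in for a digitized novel (hypothetical input for illustration).
sample = "It was the best of times, it was the worst of times."
freqs = word_frequencies(sample)

# Print the three most frequent tokens and their counts.
print(freqs.most_common(3))
```

Each decision in `tokenize` (what counts as a word, whether case matters) is already an interpretive one, and a different inquiry might reasonably call for different choices.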
Posner’s article goes on to describe how this distinct relationship to data troubles the humanities around issues of computational research:
There’s just such a drastic difference between the richness of the actual film and the data we’re able to capture about it…. And I would argue that the notion of reproducible research in the humanities just doesn’t have much currency, the way it does in the sciences, because humanists tend to believe that the scholar’s own subject position is inextricably linked to the scholarship she produces. (“Humanities Data”)
But she does recognize the importance of being able to use quantitative data. One of her students engaged in art history research on the importance of physical frames in the valuation of art in the late 17th through the 18th century. He was able to make a statement about the attractiveness of “authenticity” based on an analysis of sales records, textual accounts and secondary readings. Posner concludes, “So it’s quantitative evidence that seems to show something, but it’s the scholar’s knowledge of the surrounding debates and historiography that give this data any meaning. It requires a lot of interpretive work.” (“Humanities Data”)
The problematics of humanities data Posner identifies include:
- the open availability of data, i.e. conflicts with publisher pay walls and other kinds of gating
- the lack of organized data sets to begin with and the difficulty of finding what data sets have been released
- the fact that humanities data is generally “mined”, requiring specific tools both for mining and organizing what is mined
- the lack of tools (and/or of datasets that include tools) for modeling the data in ways appropriate to a specific inquiry, given a humanist’s understandable lack of experience with manipulating data
Our own praxis assignments in the Intro to Digital Humanities class have brought us face to face with these problematics. With the aim of exploring digital humanities praxis more generally, we were tasked to create our own inquiries, find our own data, “clean” said data, use a specific methodology for transforming or visualizing it, and then use these transformations to address our original inquiries. If we started with an inquiry in mind, we had to find suitable data that promised to reveal something interesting with respect to our questions, if not outright answer them. We also had to wrangle the data into forms that could be addressed with digital tools, and then make decisions about what it means, given our goals, for data to be “clean” in the first place. In so doing we ran into a number of other issues, such as: what is being left out of the data we have identified? What assumptions does the choice of data make? What reasonable conclusions can we draw from specific methodologies?
As James Smithies writes in “Digital Humanities, Postfoundationalism, Postindustrial Culture,” digital humanists tend to regard their practice as “a process of continuous methodological and, yes, theoretical refinement that produces research outputs as snapshots of an ongoing activity rather than the culmination of ‘completed’ research” (Smithies). Using that idea as a springboard, it seems fair to posit that humanists often adopt an attitude towards data that does not halt interpretation and analysis at the point when the data’s incompleteness and necessary bias are discovered. Instead, they seek to foreground the data’s unsuitability as a point of critique, and thus incorporate it into the conclusions of the theoretical work as a whole.
An example from our readings of this approach done well is Lauren Klein’s article “The Image of Absence,” in which she recounts the story of a man (a former slave of Thomas Jefferson’s) of whom little trace remains on record. She does so by looking at his absence in the available sources (primarily a set of correspondence) and, in so doing, reconstructs the social milieu that contributed to his erasure. What is especially exciting about Klein’s work is how she maintains her humanistic orientation, which enables her to use data critique as a vehicle for forming a substantive statement. Indeed, this is a wonderful example of turning the very fact of a data set’s incompleteness into a window on a historical moment, of choosing the right visualizations to make a point, and of focusing on what is most important to humanists: the human experience itself.
It was clear to us that completing projects of this scope responsibly, and with an impact similar to that of Lauren Klein’s work, requires not only a significant time investment but also specific skills. Our class provided us with a generalist’s knowledge of what skills a complete digital humanities project might require, but it was beyond the scope of the class to train us in every aspect of digital humanities praxis.
Project T.R.I.K.E. is thus designed to support students who might lack some of the skills necessary to contend with digital humanities praxis by providing them with practical references, and by providing their instructors with the tools to focus on domains that fit the pedagogical goals of their classes and institutions. It is important to note that we do not advance a methodological agenda: our goal is not to prescribe pedagogy but to support it in all its reasonable forms.