“it’s just awful trying to find a humanities dataset”

What is the value of teaching methodological tools with no inclusion of the theoretical support that informs analysis? But what good is the theoretical when students struggle to learn arduous methodology in software like RStudio? Learning to program, from the outset, seems impossible. It is literally learning a new language, one that is mathematical and statistical. Andrew Goldstone articulates some very promising angles for approaching these dilemmas.

When it comes to having the methodological skillset, the big question is “so what?” What can you say about the visualization on your screen? I recently had this problem with my network analysis… okay, I curated data and created networks of characters that conversed with one another in Hamlet. What good is that for scholarship? Well, it is certainly good for my own personal scholarship. We all were told going into these Praxis assignments that the projects were more about getting experience with digital tools than necessarily revealing anything groundbreaking. You need to test the waters before you can commit to a full-on swan-dive. Goldstone understands this, but at the same time was teaching at the Ph.D. level where results mattered. His course sounded intensely painful but very rewarding at the same time.

My experience with RStudio is limited to one class I took. It is really hard software to learn because, beyond software functionality, there is also the problem of interpreting the R language and making it “do stuff” for you. In Goldstone’s fast-paced, one-semester textual analysis course, the students sounded highly committed, intelligent and professional, but that would be a given in the design of the course from the outset; I mean, they are Ph.D. students. How could he design pedagogy that would equip his students to create intelligent work, and mobilize them to ask worthwhile questions of that work, in a very short time frame?

It seems that Goldstone had three major takeaways from his experience with this trial run of his course:


“1. Cultivating technical facility with computer tools—including programming languages—should receive less attention than methodologies for analyzing quantitative or aggregative evidence. Despite the widespread DH interest in the former, it has little scholarly use without the latter.

  2. Studying method requires pedagogically suitable material for study, but good teaching datasets do not exist. It will require communal effort to create them on the basis of existing research.

  3. Following the “theory” model, DH has typically been inserted into curricula as a single-semester course. Yet as a training in method, the analysis of aggregate data will undoubtedly require more time, and a different rationale, than that offered by what Gerald Graff calls “the field-coverage principle” in the curriculum.”


When I took Digital Humanities courses at NYU, the layout of the program was quite different from ours. In the first semester of their sequence, you take an Intro to Python course, which proved to be very challenging, especially for people with little programming experience (like me), because, like Goldstone’s course, it met once a week. I struggled with homework and went to office hours every Monday morning. I would have benefited from this back-end approach of learning to look at and analyze what is being quantified before being expected to create it on my own. Then, when I went back to the programming course at a later time, I would have known what to expect to come out of the “other end.”

In DH courses, the internet is our oyster, to speak to Goldstone’s second point. In other words, it is the responsibility of all of us to keep an eye out for that perfect database that has everything we need (does that exist?). Sometimes it does take being a little creative and resourceful. I had to make my Hamlet dataset by hand, but is that the worst thing for an intro-level course?
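To make the idea of a hand-made dataset concrete, here is a minimal sketch in Python of how a small, hand-typed list of conversations could become a character network. The edge list is a hypothetical illustration, not the dataset described in this post, and the function names `build_network` and `degree_ranking` are made up for the example. It uses only the standard library and counts how many distinct conversation partners each character has.

```python
from collections import defaultdict

# Hypothetical hand-curated edge list: each pair is two characters who
# speak to each other at least once. Illustrative only, NOT the dataset
# from this post.
EDGES = [
    ("Hamlet", "Horatio"),
    ("Hamlet", "Gertrude"),
    ("Hamlet", "Claudius"),
    ("Hamlet", "Ophelia"),
    ("Polonius", "Ophelia"),
    ("Polonius", "Laertes"),
    ("Claudius", "Gertrude"),
    ("Claudius", "Laertes"),
]


def build_network(edges):
    """Return a dict mapping each character to the set of their partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        # Conversations are treated as undirected: record both directions.
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


def degree_ranking(network):
    """Rank characters by number of distinct partners, ties broken by name."""
    pairs = ((name, len(partners)) for name, partners in network.items())
    return sorted(pairs, key=lambda item: (-item[1], item[0]))


network = build_network(EDGES)
ranking = degree_ranking(network)

for name, degree in ranking:
    print(f"{name}: {degree}")
```

A count like this is only a starting point. The “so what?” question still has to come from the reader: for example, whether a character’s position in the network matches or complicates how the play is usually read.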

There isn’t much to say about the third point other than that, having already drunk the Kool-Aid of this program, we know one semester just won’t cut it. There are so many concepts, theories, methods, programs, languages, practitioners and articles to read. We are lucky to call our program home because we get the time we need to delve into all that.

While learning to work with data, we must learn not only how to make the data “do stuff” but also how to ask the right questions of it at the right time, because, as Goldstone points out, how can one be sure a “trend is real, and not a random fluctuation?” It’s fun to look at data and believe you are pointing out something worthwhile. It’s less fun to learn from someone who knows that what you’re looking at isn’t actually interesting.
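One common way to probe whether a trend could be a random fluctuation is a permutation test. The sketch below is an assumption-laden illustration, not Goldstone’s method. The yearly counts are invented, and the helper names `slope` and `permutation_p_value` are hypothetical. The idea is to fit a straight line to the counts, then shuffle the counts across years many times and see how often chance alone produces a slope at least as steep.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data gives a slope as steep as the real one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between year and count
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate from ever being exactly zero.
    return (hits + 1) / (trials + 1)


# Invented example data: occurrences of some term per year.
years = list(range(1900, 1910))
counts = [3, 4, 4, 6, 5, 7, 8, 8, 10, 11]

p_value = permutation_p_value(years, counts)
print(f"slope = {slope(years, counts):.3f}, p = {p_value:.4f}")
```

A small p-value only says the pattern is unlikely under random shuffling. It does not say the trend is interesting, which is where the theoretical side of the training comes back in.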

It is important to learn to program because, as Goldstone also points out, when you use a GUI you are limited to the confines of the system. He uses Voyant as an example. Without knowledge of coding, you are literally locked out of asking any questions other than the ones Voyant allows. Perhaps this is another weakness in the tool that could be addressed in our letter to Voyant (if that hasn’t happened already).

The problem with learning too much methodology at once is the question of what scholarly good it serves. A balance of the methodological and the theoretical is essential for keeping each in check. I know in my course with RStudio, there was a great deal of both. My professor was a proponent of being sure to include theoretical readings along with practical assignments every week. I learned a great deal, and this class is what turned me on to data and DH. It is only through understanding the theoretical that the methodological clicks in such a way that scholars can ask appropriate questions. This is a very important aspect of pedagogy to me and is something that is put into practice in our program.

And of course, Goldstone makes an excellent point in that having guided datasets for beginner students is a great way for them to get their feet wet, “so that instead of being forced to fish for interesting phenomena in an empty ocean, students can follow a trajectory from exploration to valid argument.” It is always helpful to have a guide, especially when learning something as new and complex as programming or any other kind of work with data.

12 thoughts on “it’s just awful trying to find a humanities dataset”

  1. Nancy Foasberg

    Yep yep yep. There’s definitely a part of me that wishes I’d somehow learned Python beforehand! (In other news: I’ve ordered two Python books through the mail and I’m not being especially patient as I wait for them.) It can be challenging to learn to do something and learn to think about it at the same time! But I constantly find myself running into methodological questions. I worked with Gephi and it was really exciting, but I had to confess that I didn’t really understand what all the different modes did! It felt a little like I was pushing a “statistics” button and seeing what came out rather than directing a scholarly inquiry or even really exploring the data.
    And I think that’s okay in the context in which I did it, because it’s best to have an idea of what sorts of things a tool can do and why you’d want to use it, whether we’re talking about technical tools or the more metaphorical tools of methodology. But this also tells me I need to read more, and what sorts of things I need to be learning about … although it stops short of providing a reading list.

    1. Rob Garfield

      Hey Nancy, perhaps before next semester we can create a cohort for learning Python (and other programming languages/environments) from this class and the DH program. I’m having a blast learning Python and would greatly benefit from working with others and sharing my knowledge.

  2. Nancy Foasberg

    AND a second comment, because I think this is important enough to deserve its own — there is such a need for humanities datasets, but they’re not going to materialize until we have good infrastructure for them.
    There are so many pdfs out there, and so many HTML files, because we have developed ways to easily store and transmit them. But datasets….
    CUNY Academic Works, for instance, is really great for sharing scholarly work…as long as it’s a Word document or a PDF. Sure, we accept other formats, but the platform is MUCH less well developed for them. Even video is pretty tricky, but data? well…
    And this is a real problem, because as Bode pointed out, we need to be sharing our data! Not only is this important for transparency of research and allowing people to check whether our conclusions are valid, but think of how many different analyses can be done with the same dataset!
    And anyway, if we want to value this kind of work, we need to make sure it’s possible to share what’s behind it.

    1. Rob Garfield

      I’m thinking this could fit into a final project proposal. After reading the Goldstone piece (as well as most of the rest of this semester’s readings), the idea started to take shape in my head.

    2. Hannah House

      This would benefit something I’ve been thinking about, which is a potential pedagogy-based project around making the intro to DH more inclusive of different learning styles and improving the scaffolding outward for students who are inspired and want to dig further into one of the praxis areas.

      Having clean datasets to work with and further explication of good practices and ‘watch-outs’ in each area would be really helpful!

      It looks like CUNY is part of the NYC OpenData platform, which is a platform for sharing data, but CUNY only has one thing up.

      I’d love to discuss this more. I’ll reach out to you both via Commons message bc I don’t believe I have your email, Nancy. If you don’t hear from me let me know. I’ve had Commons eat both incoming and outgoing messages.
