I spent some time this week with Phototrails, a Mellon Foundation-funded collaboration between the University of Pittsburgh’s Department of History of Art and Architecture, the California Institute for Telecommunications and Information Technology’s Software Studies Initiative, and The Graduate Center. Phototrails maps patterns across 2.3 million Instagram photos from 13 global cities and describes itself as a work of cultural analytics, using computational methods to identify “visual signatures” for each city within this vast amount of data. Our conversation last week about mapping, representing, or otherwise visualizing personal places was on my mind; I was drawn to Phototrails in part because my moves this past summer (to Boston in June and New York in August) have prompted me to think about how to represent my time in these distinctive cities via social media to family and friends in Virginia. Do I share the iconic or expected images (the Manhattan skyline from the Manhattan Bridge, for example, or the arch in Washington Square Park), offer variations on themes (a crowd of people in a museum with Starry Night tucked in a corner), or geotag otherwise nondescript images to signal that, yes, I am in these spaces (a top-down view of my coffee on a table that could be anywhere in the world, only identifiable as my neighborhood coffee shop in New York because of the geotag)? After learning more about what Phototrails aimed to accomplish, I wanted not only to evaluate how my own visual data about, and representations of, my experiences of New York might fit into a dataset or approach to data, but also to share a few takeaways about visualizing photographs as qualitative data points and photographic metadata.
First, I want to describe the project’s visualization layouts, borrowing language from those sections on the website. The team describes four options for presenting the data: radial visualizations, which organize photos in a circle according to their visual attributes (hue, brightness, texture), location, and timing; montage visualizations, which offer a more grid-like organization; PhotoPlot software, which the site makes available for further investigation; and points and lines, which use a color-coded gradient to capture the time of day that each photo was taken. The idea behind these various layouts is that the display can be adjusted to show visual characteristics of the photos as well as their metadata (filters, spatial coordinates, upload date and time). Phototrails describes a “multi-scale reading” capable of “moving between the global-scale cultural and social patterns and the close-ups revealing patterns of individual users,” a middle ground between close and distant reading of behavior, experiences, and representations. With this information in hand, however, I began to wonder what other information may have been captured. (This is where I loved Drucker’s distinction between data (a “given”) and capta (that which is captured). She elaborates that “capta is not an expression of idiosyncrasy, emotion, or individual quirks, but a systematic expression of information understood as constructed, as phenomena perceived according to principles of interpretation,” and I am still puzzling over whether this notion undercuts the idea that we can find something like a pattern across 2.3 million individual photographs.)
In this sense, Phototrails reminded me of our conversations about text analysis, as when some of us became uncertain about whether and how Voyant would store our data and ended up pursuing different lines of thought than originally planned. In the case of Phototrails, I was curious about how the team gained access to 2.3 million photographs, then realized that they were publicly posted on Instagram. What are the ethical implications of conducting a large-scale project like this, drawing on social media where those who “participated” in the project might not know that they offered data for this purpose? How do ideas about informed consent (those ideas that shape the concept of the IRB and standards for human-based research, but also notions of privacy more broadly) intersect with this type of scholarship that necessarily casts a wide net and, in many ways, crowdsources from a crowd that often does not recognize itself? It reminds me of when I noticed signs at The Grad Center orientation that said, essentially, “your presence in this space is consent to be photographed and documented on film,” and found myself acting differently (smiling and gesturing more, going into a corner to check a notification on my phone) because I had a heightened awareness of the potential future uses of my image. Because the participants in this project did not have the benefit of such a sign, the Phototrails data is arguably more “real” or “authentic,” but the uneasiness lingers.
At the same time, there are parallels between this challenge of digital data collection and more traditional methods of analysis. It makes me uncomfortable to know that any of my own data, visual and otherwise, might very well end up in someone’s research and take on meanings that I did not intend, and that I will probably never even know about. In the same way, a farmhand in the nineteenth century might have kept a diary to offer future insight into labor conditions in their industry, but the diary more likely served a set of purposes in its own time and took on new meaning later. Part of conducting responsible research, whether it focuses on objects or literature or documents, is recognizing these multiple layers and not distorting or overstating one aspect to get a desired result.
This is where I found myself disagreeing with Phototrails’s own distinction between big data and thick data. “Zooming into a particular city in specific times, we suggest that social media can also be used for local reading of social and cultural activity,” the Phototrails team wrote. “In other words, we do not necessarily have to aggregate user generated content and digital traces for the purpose of Durkheim-like mapping of society (where individual people and their particular data trajectories and media diaries become invisible). Instead, we can do ‘thick reading’ of the data, practicing ‘data ethnography’ and ‘data anthropology.’” In my mind, a thick reading of this data would include explanations for why a user shared one location and not another (even to the level of sharing the street address versus a building name, as Sandy mentioned in her discussion of mapping stops on a global tour), information about captions, details about the hashtags, and consideration of whether the photo was taken in New York or just tagged there, all non-visual components that influence a visual signature. Without such context, I think this project is an ambitious and impressive example of visualizing big data, but it falls just short of a thick reading that reaches the depth of “cultural, social, and political insights about particular (local) places and particular time periods” it aimed for.
As many of us have mentioned in class and in other conversations, I am also troubled by the sense of collapsed boundaries: the idea that we are all constantly, quietly, accidentally providing data, whether it ends up in a peer-reviewed academic journal and helps provide a new perspective on an important social issue or whether someone uses the same data for something far more unsettling. To illustrate this, I followed the Phototrails website links to its new project, Selfiecity (http://selfiecity.net/). Selfiecity addresses the selfie in artistic, theoretical, and quantitative frameworks, including visualizations of 3,200 selfies from around the world and an interactive photoset. Close to the bottom of a page of insights, the website offers headshots and bios of a team of eight and then, at the very bottom, a single attribution of sorts: “A DigitalThoughtFacility project, 2014.” The link takes you to http://www.offc.co/web/index.php, which greets you with a description of OFFC, “a research and design studio based in New York City” that describes its work as follows: “We work with global brands, research institutions and start-ups to explore new product applications for today’s emerging technologies.” This isn’t to say that corporate interests can’t engage with DH scholarship (that’s a huge, ongoing conversation about higher education and business in general) but just to note the curious flow from project to project. This week of readings and projects has provided a good path forward for continuing to explore the interplay between access, democracy, inclusion, and privacy, particularly in the middle ground between close and distant reading.