Wikidata Workshop
Yesterday, December 13, Megan Wacha presented a small workshop on Wikidata for librarians, and because I am totally not procrastinating from working on the final project (sorry, Rob and Hannah), I decided to share what I learned with all of you.
What is Wikidata?
Some background for those who may not be familiar: the Wikimedia Foundation is the organization that sponsors Wikipedia and several other, related projects, including Wikimedia Commons, Wikibooks, Wikiquote, and others. Wikidata, which started in 2012, is the youngest of these projects. In the last six months, it has attracted a lot of attention and funding as Ivy League universities, national libraries, and major museums have gotten involved, including the National Library of Wales, the Met, and others. Like other Wiki projects, Wikidata is the work of volunteers who believe in the mission of making information available to everybody, so it operates on an ethos of openness.
Wikidata collects structured, linked data (which I’ll explain a bit further on) about all sorts of things; the example Megan used was Sarah Schulman (a writer and CUNY faculty member), and the entries I’ve found have included:
…so as you can see, it’s really versatile. However, as a relatively new project, it’s still very incomplete!
How does Wikidata Work?
Wikidata can be edited by both humans and bots, but making the interface approachable for humans is very important. For that reason, it adjusts the usual language of linked data to make it more accessible to people who aren’t metadata or grammar experts. Where most linked data sets refer to subjects, predicates, and objects, with the three together forming a triple, Wikidata uses the language of items, properties, and values.
So, to stick with Reiner Knizia for a moment (and I can’t screenshot the whole thing, so this will make a lot more sense if you click through and correlate with the Wikidata item): his entry would be an item. It’s under his name, but there is also a number associated with each item as a unique identifier. His is Q61838. Knizia has several properties, that is, characteristics you could associate with him. The first is always “instance of,” which groups entries into very general categories – in this case, “human.” Other properties may vary depending on what’s in the instance property, but for humans, you get categories like “sex or gender,” “country of citizenship,” and “occupation.” The answers to the questions implied by the properties are values, so in Knizia’s case, the values of the properties listed above are “male,” “Germany,” and “game designer,” respectively. Properties can have multiple values; in this case “occupation” is “game designer,” “university professor,” and “game author.” I’m a little confused about the distinction between “game designer” and “game author,” but there are still some aspects of Wikidata that don’t quite work smoothly. There is a process for working this out, though.
In any case, the item, property and value come together to form a claim, like “Reiner Knizia is a game designer;” multiple claims can form a statement (“Reiner Knizia is a Spiel des Jahres-winning game designer from Germany.”).
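If you want to see this structure for yourself, every item’s claims are available as plain JSON through Wikidata’s public API. Here’s a minimal sketch in Python (my own, not something from the workshop) that pulls Knizia’s item and lists the values of P106, the property ID for “occupation”:

```python
# A minimal sketch (mine, not from the workshop): fetch the raw JSON for
# item Q61838 from Wikidata's public API and list its occupation values.
# P106 is Wikidata's property ID for "occupation."
import requests

ITEM = "Q61838"  # Reiner Knizia's unique identifier
url = f"https://www.wikidata.org/wiki/Special:EntityData/{ITEM}.json"
entity = requests.get(url).json()["entities"][ITEM]

print(entity["labels"]["en"]["value"])  # the item's English label

# Each property maps to a list of claims; for item-valued properties like
# P106, the value is itself an item, identified by its own Q-number.
for claim in entity["claims"].get("P106", []):
    value = claim["mainsnak"].get("datavalue")
    if value:  # some claims are "no value"/"unknown value" and lack one
        print("occupation:", value["value"]["id"])
```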
One nice thing about this is that it provides what catalogers call “authority control”: it can distinguish among entities with the same name, and it can connect information about the same entity under different names, even across languages.
Why is Wikidata Useful?
One of the primary original purposes of Wikidata was to gather information scattered across multiple articles. This could allow for the automatic generation of infoboxes (still not too common on English Wikipedia, largely due to a lack of references), but it has other uses as well. Wikipedia puts a lot of its articles into various categories. For instance, Knizia’s Wikipedia article puts him in “Living people,” “Board game designers,” “1957 births,” and “People from Illertissen,” which means he will automatically be included in each of these lists, and that I can go to the list of board game designers and see all the designers who have articles in Wikipedia (and I can also notice that it doesn’t include Hisashi Hayashi OR Ryan Laukat, which is nonsense and I should get on that). However, Wikipedia doesn’t provide a good way to combine these characteristics, the way you could with an AND search in any catalog, database, or search engine.
For instance: I apologize for not remembering whose project this is, but one of the projects presented last week was by someone who was interested in art stolen from colonized countries and currently held by Western museums, and who had noted that getting this information isn’t easy! Wikipedia could be a useful tool to start with, except that you can’t look at where a work of art came from and where it’s currently located at the same time. Wikidata provides a possible way to do that, if the data is there. Right now the data mostly isn’t there, though Wikidata could certainly serve as a platform on which this sort of work could be conducted.
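To make that concrete, here’s a hedged sketch (again mine, not from the workshop) of the kind of AND query you could run against Wikidata’s public SPARQL endpoint: paintings that have both a recorded country of origin (P495) and a recorded current location (P276). The properties and endpoint are real; whether the data has actually been entered for any given work is exactly the gap described above.

```python
# A sketch of the kind of AND query Wikipedia categories can't do:
# paintings with a recorded country of origin AND a recorded current
# location, via Wikidata's public SPARQL endpoint.
# P495 = country of origin, P276 = location, Q3305213 = painting.
import requests

QUERY = """
SELECT ?work ?workLabel ?originLabel ?locationLabel WHERE {
  ?work wdt:P31 wd:Q3305213 ;   # instance of: painting
        wdt:P495 ?origin ;      # country of origin
        wdt:P276 ?location .    # current location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 25
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikidata-workshop-demo/0.1"},  # polite etiquette
)
for row in resp.json()["results"]["bindings"]:
    print(row["workLabel"]["value"],
          "| from:", row["originLabel"]["value"],
          "| now at:", row["locationLabel"]["value"])
```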
There are also several projects built on Wikidata, including:
- Wiki ShootMe! — not a good name, but this finds articles near your location that could be improved with a photograph
- Crotos — helps get at works of art in Wikimedia Commons
- Histropedia — generates timelines using SPARQL queries
- Scholia — generates information about scholarly authors, their co-authors, and where they are publishing
Questions, Problems, and Hopes
There are a lot of things that the community is still figuring out about Wikidata.
Scope could be a problem. Wikipedia has strict notability guidelines, but those guidelines don’t apply to Wikidata, and the potential scope is very large (think of the list I included above). However, this data is hosted on Wikibase, the capacity of which is not infinite. One possible solution is federated wikibases. This is especially interesting in a scholarly communication context, because it creates the possibility that each institution could curate its own information about the people affiliated with it, or even create a form allowing faculty to submit bibliographic information about their work. That would address the server-load problem as well as the ever-present problems of authority and vandalism. Of course, there is also a need for policies that will protect faculty from surveillance and trolling.
If libraries take Wikidata seriously, it could change how cataloging is done. There are still a lot of conversations happening about how to model this data, and this is an excellent time for academics to join them. Currently, a lot of shared cataloging goes through OCLC, which facilitates the process but also behaves very much like a commercial company, despite technically being a nonprofit, and which charges libraries for work done by library colleagues.
And of course, there’s the question of how the data should be modeled and what the properties should be. Some of the data is entered automatically and edited by bots, which is very efficient but can certainly be problematic at times. The way the bots deal with gender, for instance, is not ideal. There’s also a question of who gets to be involved in this conversation, and a need for the participation of more types of libraries. Public libraries and special libraries definitely have information and models that could be useful here!
*
So, Wikidata is definitely a work in progress, but it’s a very interesting project with a lot of potential, and a way for the public to engage with data on a level that we don’t often see.
Mapping Praxis: Bonus Round
Hey all,
I just did a little thing that’s essentially another mapping praxis project. I mapped the character Stephen Dedalus as he moves around Dublin in James Joyce’s A Portrait of the Artist as a Young Man, and I wrote about what geographic coordinates can and can’t show. This was for Jonathan Reeve up at Columbia (some of you met him at Studio@Butler).
I currently have the work posted on my GitHub scratchpad blog. Have a read if you’re interested, and feel free to give any feedback.
https://hannimalcrackers.github.io/parseltongue/posts/007_joyce_portrait.html
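If you’re curious about the mechanics, the plotting side of a project like this is pretty small. Here’s an illustrative sketch in Python with folium; the stops and coordinates below are hypothetical stand-ins, not the actual data from the post:

```python
# Illustrative sketch only: plotting a character's movements as ordered
# points on a Leaflet map with folium. The stops and coordinates are
# hypothetical stand-ins, not the data from the actual Portrait project.
import folium

stops = [
    ("Stop 1 (hypothetical)", 53.3331, -6.2489),
    ("Stop 2 (hypothetical)", 53.3438, -6.2546),
    ("Stop 3 (hypothetical)", 53.3606, -6.2564),
]

m = folium.Map(location=[53.3498, -6.2603], zoom_start=13)  # central Dublin
for label, lat, lon in stops:
    folium.Marker([lat, lon], popup=label).add_to(m)

# Connect the stops in narrative order to suggest movement through the city.
folium.PolyLine([(lat, lon) for _, lat, lon in stops]).add_to(m)
m.save("dedalus_map.html")
```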
Workshop: ESRI Story Maps
As you may remember, I created an ArcGIS story map for my 5-minute project presentation last week. Although I created it in a rush and it was longer than I had time for, I think story maps can be a good alternative to PowerPoint slides, depending on what you’re doing, and are also a fun way of keeping a journal, so I want to share what I learned in the ESRI story map workshop I took with Olivia Ildefonso and Javier Otero Peña.
Olivia told us that ESRI is the leading story-mapping platform right now; they’ve really taken over the market, she said. The cool thing about ESRI is that you don’t even need a map to create a story map; you just need pictures. Olivia herself doesn’t use it for mapping at all, just for pictures and text.
ESRI is a for-profit company, but they offer the story-mapping tool for free. Why use ESRI? Because it’s free and open source: if you’re a developer and want to customize it further, you can download the code from GitHub. Furthermore, it’s easy; you don’t have to know how to code or map. You can embed maps if you want, but they have to be ESRI maps (you can’t use Carto.com). Now here’s the rub: if you want to create a map with ESRI, you have to pay. But there’s a way around this: GCDI has one-year ESRI interactive-map access for students. Go chat with GCDI if you want to use it.
ESRI story maps are like some of the articles we see in the NY Times (this article on Yemen, for example, was built with ESRI or a very similar program).
To build a story map with ESRI, go to https://storymaps.arcgis.com and create an account, then choose the kind of story map you’re going to create; Olivia suggested starting with Cascade. Create a storyboard before you start building the story map! Olivia recommends doing it in PowerPoint, including photos and notes. Videos have to be shared from YouTube. This is the story map I built in the workshop.
In it you’ll find notes I took on some of Olivia’s and Javier’s suggestions, interspersed with a lot of nonsense I wrote as I feverishly followed Olivia’s directions, and an odd assortment of photos I pulled at random from my files. I like how the little dog becomes the big dog; that was a lucky accident. You’ll figure out how to use ESRI easily if you just dive in and play around.
Voyant Inquiry + Intro to DH Part II Spring 2019
With Matt’s permission, I’m reaching out to classmates about continuing the Voyant Inquiry as a possible group project in Intro to DH Part II Spring 2019.
Anyone interested?
The course description for Intro to DH Part II mentions group work, so I’m thinking ahead, as I’d like to follow through with the inquiry that began in Intro to DH Part I Fall 2018.
The above would be subject to Prof. Silva’s approval of the project.
I hope to hear from any interested classmates!
It’s a bird! It’s a plane! It’s a project proposal!
Background
I grew up in a small town south of Pittsburgh, and I have always been fascinated by history. As I read about Native Americans, I thought to myself, “There really aren’t any Native Americans around here. Why is that?” So, I started looking into it[i].
As part of the Treaty of Paris (1783)[ii], the Northwest Territory (now the Midwest) was awarded to the United States. At the time, white settlements were just starting in the region, while the Native American tribes lived on much of the land.
Conflict between the United States and the native tribes began almost immediately, with the outbreak of the Northwest Indian War[iii]. This war lasted for ten years, with the forces of the United States eventually winning. The Treaty of Greenville (1795)[iv] ended the conflict, establishing a clear line dividing white and Native American territories. It also included provisions allowing the tribes to sell their lands. As you can imagine, this treaty was broken with some frequency.
This is a fascinating time period, and I thought about focusing on it, but the sources are scarce, especially for the Native Americans[v].
After years of dealing with the loss of their land, Native American leaders began to resist, and a confederacy of tribes formed. Tecumseh, a Shawnee leader, became the head of this confederacy and traveled widely to organize and to encourage support from those on the fence.
In 1809, William Henry Harrison, governor of the Indiana Territory, signed the Treaty of Fort Wayne[vi] with the Miami, Kickapoo, and Potawatomi tribes (among others), in which they ceded territory along the Wabash River.
Tecumseh, and other Native leaders, disputed the legality of this treaty, saying that the land didn’t belong to any one tribe, but, rather, to all. Harrison ignored this. This led to Tecumseh’s War in 1811, which, in 1812, became part of the larger War of 1812.
This project aims to map and analyze the rhetoric of Tecumseh’s War. As was pointed out in class, that focus is very broad, and I need to scale it back. One option would be to focus on Tecumseh’s travels, which would work. Another might be to focus on the treaties themselves, as the Treaties of Fort Wayne and Greenville are not the only treaties involving Native American rights in this region in this period. I’ll think about this over the weekend. Honestly, I could just focus on these treaties.
______________
[i] As I said in class, I’ve also done research on the Holocaust and the AIDS crisis. I’m not sure what it says about me that I’m interested in these subjects, but, odds are, it isn’t complimentary.
[ii] The date is included because there are literally dozens of Treaties of Paris.
[iii] When I first researched this topic, it was called Little Turtle’s War, after Little Turtle, a Miami chief who was a Native American leader in the conflict. It is also known as the Ohio War.
[iv] There are only two Treaties of Greenville.
[v] Honestly, the sources for Tecumseh’s War aren’t exactly abundant, but I’ve found more of them.
[vi] Again, there are two Treaties of Fort Wayne.
I Guess We Should Talk about Tumblr
We talked last week about the oppressive effects of a lot of media-based technology, so I guess it’s a convenient time for Tumblr to have announced their latest policies.
For those whose news stream may differ from mine, what happened is this: Tumblr was removed from Apple’s App Store after child pornography (a genuine problem) was found on the platform. Tumblr said it was planning to change its policies, and yesterday it announced this one, which includes a ban on “female-presenting nipples” (???) and vaguely defined “sex acts,” although there are exceptions for political or medical situations and fine art (who gets to decide what’s fine art and what’s just smutty art? Tumblr does, apparently). There are lots of news stories about it; this is one: https://www.latimes.com/business/technology/la-fi-tn-tumblr-adult-content-ban-20181203-story.html
In any case, there are several interesting features of this situation in relation to our readings and discussions:
- The enforcement mechanism here is Apple. These changes don’t come in response to the needs of people using the platform. There have been multiple complaints from Tumblr users about the content that’s made available on the platform, but these complaints are apparently ineffective (judging by the fact that this is even happening). They also don’t come in response to any kind of official regulation. Noble asks a lot of questions about how companies online are to be held accountable for the content they present; it appears that the answer at present is via rules set by other kinds of corporate gatekeepers (advertisers, of course, also wield a lot of power in this area).
- Tumblr is relying on algorithms to do this work. These algorithms aren’t very accurate — many users have already posted examples of content that has been flagged for no obvious reason at all — but they’re probably a lot faster than humans, and Tumblr apparently has so much trust in them that it’s removing Safe Mode (another algorithmic tool, doubtless with many problems of its own) to rely on this general filtering exclusively. Noble, again, has a lot to say about how relying on algorithms to identify acceptable and unacceptable content can shield a company from accountability; that seems pretty clearly what is happening here. They’ve muted certain hashtags entirely; there isn’t a lot of subtlety to their approach.
- The focus on pornographic content, especially when it is poorly defined, puts a particular kind of bracket around what’s considered acceptable and unacceptable. Child pornography should obviously be removed (and prosecuted), but the particular framing of sexual content in this (and similar) policies tells us something about the platform’s priorities. Tumblr is not, for instance, trying to eliminate racist content: many users have posted examples of white supremacist blogs that show up quite easily in a search. And by framing the policy around what is and isn’t pornographic, the context and purpose of a blog drops out of consideration. Tumblr isn’t specifically going after pornbots, which are a persistent nuisance on the site, but we know from experience with other kinds of internet “filtering” that policies like this often target LGBTQ information, because they’re created with the unspoken assumption that sexual minorities are somehow inherently sexualized. Sex workers who use Tumblr are also likely to be targeted (which is in line with a “no sexual content” policy, and which I can understand from a corporate point of view, but which is dangerous to sex workers who’ve been using the platform as a relatively safe place to do their work). So the policy is enforced in a way that is more harmful to Tumblr’s more vulnerable users. One of the things I really appreciated about Noble’s analysis was that she was critical of pornographic content specifically as it sexualizes marginalized people in an exploitative way; she writes that “[m]arginalized and oppressed people are linked to the status of their group and are less likely to be afforded individual status and insulation from the status of the group with which they are identified” (26). She is interested in how this reflects from online pornographic content into the broader society, but here is an example where we can see that it also reflects back into how the work created by members of marginalized groups is treated when internet companies decide to take a more active role.
It’s not clear what this will do to Tumblr — a lot of people are comparing this to “Strikethrough,” which happened on LiveJournal in 2007, and I don’t know enough about LiveJournal to say to what extent that contributed to the decline of that site.
But it’s interesting. I have lots of questions about who CAN be trusted to make decisions about how searching and social media can work; corporate platforms aren’t making a great argument for themselves as the appropriate guardians for this sort of thing.
(aaaannd I should note that as I’m choosing the tags for this post, I’m making decisions based on how it’ll be algorithmically “seen.” If I tag it “pornography” because it talks about that kind of content, will it disappear from the blog and/or from Google? Have I already used the word too many times in this post?? Remember how we talked about the Panopticon in class???)
Post-Class Response to Simone Browne’s “Race and Surveillance”
As we discussed in class on Nov 27, Simone Browne’s chapter “Race and Surveillance” brings us back to the task of building infrastructure that distributes social materials, tools, and knowledge equally to everyone engaged in constructing a community. Browne’s careful (and, I sense, deliberately scholarly) methodology treats technology and its operation as racializing surveillance, focusing on its unequal “racializing outcomes” at the level of governmental control of populations. Rather than widening her attention to marginalizing technology as a lens onto the subconscious layers of science and technology (the ways it perceives every social agent, and their socio-geographies, along lines of race and class), Browne examines the factual record of racializing outcomes, addressing how governmental agencies use photography (e.g., mug shots), collected data, biometrics, and so forth. However, I believe the correlations between technology and racializing surveillance also permeate the everyday suspicion, paranoia, and interpersonal (and communal) interaction that all of us shape and control in the name of safety, sanity, and self-care. As we live with class anxieties and pursue idealized or standardized values of well-being, we tend to categorize others who don’t efficiently embody those values as “shameful” or even “dangerous” beings, and to push them away from our boundaries of living and communicating. Without wanting to sound emotional or metaphorical, I would say that the technological, cognitive, and social mechanism of surveillance keeps shaming endlessly, so long as anxieties about class and racial degeneration persist in a system that uses marginalizing technologies and others social differences along racial lines.
Browne’s article centers on state surveillance, its technology, and its history; after reading it, I wanted to ask what happens when these racializing (and classifying) technologies are used in the private sector, in agreement with governmental policies that justify surveilling technologies and media as means of perceiving the “truth” of individuals, against the privacy and dignity of selected individuals and communities. This question reminded me of an article I read earlier this year about surveillance technology that groups, and potentially criminalizes, neighbors of color: http://bostonreview.net/race-law-justice/clarence-harlan-orsi-hoverboarding-while-black (Please take a look when you have a moment.) My question is this: if contemporary modes of self-care and well-being are promoted by such marginalizing technology and “citizens” support it, how would we structure and operate the digital humanities as resistance to it? Although Browne’s and the other readings didn’t discuss it in particular, I believe that bell hooks’ argument for the “oppositional gaze” still offers an alternative way for us to approach DH, differently from governmental (as well as privatizing) technologies and networks across virtual and actual socio-geographies. How can we build DH platforms where the marginalized can “look back” at the system-builders and agencies without fear of being punished and stigmatized? As we discussed in class, learning from the narratives, storytelling, and experiences of “repair” on the side of the marginalized within social infrastructure would be a first step. But I believe this will require interdisciplinary work, including legal aid for social actors who counteract the dominant uses of technology and its networks for the governmentalization of the population, since such oppositional acts of resistance can be promptly criminalized under the norms of safety in any modern society. I don’t have a concrete answer here, but I think binding the purpose of DH to the “oppositional gaze,” with appropriate legal support, is important, because reparative or resistant narratives can easily dissipate under the master narrative of safety and development. (For example, if activist hackers deliberately create glitches to oppose marginalizing technologies, how are they going to avoid punishment?)