Tag Archives: Dataset

Network Analysis of Wes Anderson’s Stable of Actors

I had initially planned to have my network analysis praxis build on the work I had started in my mapping praxis, which involved visualizing the avant-garde poets and presses represented in Craig Dworkin’s Eclipse, the free on-line archive focusing on digital facsimiles of the most radical small-press writing from the last quarter century. Having already mapped the location of presses that had published work in Eclipse’s “Black Radical Tradition” list, I thought that I might try to expand my dataset to include the names and addresses for those presses that had published works captured in other lists in the archive (e.g., periodicals, L=A=N=G=U=A=G=E poets). My working suspicion was that I would find through these mapping and networking visualizations unexpected connections among the disparate poets in Eclipse and (possibly, later) those featured in other similar archives like UbuWeb or PennSound, which could potentially yield new comparative and historical readings of these limited-run works by important poets.

The dataset I wanted and needed didn’t already exist, though, and the manual labor involved in my creating it–I would have to open the facsimile for each of multiple dozens of titles and read through its front and back matter hunting for press names and affiliated addresses–was more than I was able to offer this week. So I’ve tabled the Eclipse work only momentarily in favor of experimenting with a more or less already-ready dataset whose network analysis I could actually see through from beginning (collection) to end (interpretation).

Unapologetically twee, I built a quick dataset of all the credited actors and voice actors in each of Wes Anderson’s first nine feature-length films: Bottle Rocket (1996), Rushmore (1998), The Royal Tenenbaums (2001), The Life Aquatic with Steve Zissou (2004), The Darjeeling Limited (2007), Fantastic Mr. Fox (2009), Moonrise Kingdom (2012), The Grand Budapest Hotel (2014), and Isle of Dogs (2018). As anyone who has seen any of Anderson’s films knows, his aesthetic is markedly distinct and immediately recognizable by its right angles, symmetrical frames, unified color palettes, and object-work/tableaux. He also relies on the flat affective delivery of lines from a core stable of actors, many of whom return again and again to the worlds that Anderson creates. Because of the way these actors both confirm and surprise expectations–of course Adrien Brody would be an Anderson guy, but Bruce Willis?–I wanted to use this network analysis praxis to visualize the stable in relation to itself and to start to pick at interpreting the various patterns or anomalies therein.

Fortunately, IMDb automated a significant portion of the necessary prep work by providing the full cast list for each film and formatting each cast member’s first and last name in a long column–a useful tip I picked up while digging around Miriam Posner’s page of DH101 network analysis resources–so I was able to easily copy and paste all of my actor data into a Google Sheet and manually add the individual film data after. (I couldn’t copy and paste actor names from IMDb without grabbing character names as well, so I kept them, not knowing if they would end up being useful. For this brief experiment, they weren’t.)

I used Google’s Fusion Tables and its accompanying instructions to build a Network Graph of the Anderson stable, the final result of which you can access here. As far as other tools went, Palladio timed out on my initial upload, buffering forever, and Gephi had an intimidating interface for what I intended to be a light-hearted jaunt. Fusion Tables was familiar enough and seemed to have sufficient default options for analyzing my relatively small dataset (500-ish rows in three columns), so I took the path of least resistance, for now.

A quick upload of my Sheet and a + Add Chart later, my first (default) visualization looked taxonomical and useless, showing links between actor and character that, as you might expect, mapped pretty much one-to-one except in those instances where multiple actors played generic background roles with identical character names (e.g., Pirate, Villager).

A poorly organized periodic table of characters

I changed the visualization to instead show a link between actor and film, and was surprised to find that this still didn’t show me anything expected (only one film?) or intriguing. Then I noticed that only 113 of the 449 nodes were showing, so I upped the number to show all 449 nodes. Suddenly, the visualization became not only more robust and legible, but also quite beautiful! Something like a flower bloom, or simultaneous and overlapping fireworks.

Beautiful as the fireworks were, I felt like the visualization was still telling me too much information, with each of the semi-circles consisting primarily of actors who had one-off relationships to these films. Because I wanted to know more about the stable of actors and not the one-offs, I filtered my actor column to include only those who had appeared in more than one of Anderson’s films (i.e., names that showed up on the list two or more times). I also clicked a helpful button that automatically color-coded columns so that the films appeared in orange and the actors in blue. This resulted in a visualization just complex enough to be worth my interrogating and/or playing with, yet fixed or structured enough to keep my queries contained.
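The same "two or more films" filter can be sketched outside of Fusion Tables. This is a minimal Python illustration, assuming the sheet has been reduced to (actor, film) pairs; the `ROWS` sample is a handful of hand-typed rows rather than the full dataset, and the function name `repeat_actor_edges` is made up for this example.

```python
from collections import defaultdict

# A tiny hand-typed sample of (actor, film) rows, NOT the full 449-node sheet.
ROWS = [
    ("Bill Murray", "Rushmore"),
    ("Bill Murray", "The Royal Tenenbaums"),
    ("Owen Wilson", "Bottle Rocket"),
    ("Owen Wilson", "The Royal Tenenbaums"),
    ("Bruce Willis", "Moonrise Kingdom"),
]


def repeat_actor_edges(rows, min_films=2):
    """Keep only the rows whose actor appears in at least `min_films` distinct films."""
    films_by_actor = defaultdict(set)
    for actor, film in rows:
        films_by_actor[actor].add(film)
    return [(a, f) for a, f in rows if len(films_by_actor[a]) >= min_films]


edges = repeat_actor_edges(ROWS)
for actor, film in edges:
    print(f"{actor} -> {film}")
```

Counting distinct films (a set per actor) rather than raw rows keeps an actor who is credited twice in the same film from being mistaken for a repeat player.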

As far as reading these visualizations goes, it’s something like this: Anderson’s first three films fall bottom-left; his next three films fall top-center; and his three most recent films fall bottom-right. Thus, the blue dots bottom-left are actors featured among the first three films only; blue dots bottom-center are actors who appear consistently throughout Anderson’s work; and blue dots bottom-right are actors included among his most recent films. As you can see by hovering over an individual actor node, the data suggests (e.g.) that Bill Murray is the most central (or at least, most frequently recurring) actor in the Anderson oeuvre, appearing in eight of the nine feature-length films; meanwhile, Tilda Swinton, along with fellow heavyweights Ed Norton and Harvey Keitel, appears to be a more recent Anderson favorite, surfacing in each of his last three films.
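What the hover reveals is, in network terms, each actor's degree: the number of films a node links to. A rough Python sketch of that count, plus a check for who appears in every one of a chosen set of films, might look like the following. The `ROWS` sample is partial and hand-typed, and the helper names are hypothetical, so the counts it prints describe the sample only, not the full dataset.

```python
from collections import defaultdict

# Partial, hand-typed (actor, film) sample; counts reflect this sample only.
ROWS = [
    ("Bill Murray", "Rushmore"),
    ("Bill Murray", "Moonrise Kingdom"),
    ("Bill Murray", "The Grand Budapest Hotel"),
    ("Bill Murray", "Isle of Dogs"),
    ("Tilda Swinton", "Moonrise Kingdom"),
    ("Tilda Swinton", "The Grand Budapest Hotel"),
    ("Tilda Swinton", "Isle of Dogs"),
    ("Owen Wilson", "Bottle Rocket"),
    ("Owen Wilson", "The Grand Budapest Hotel"),
]

LAST_THREE = {"Moonrise Kingdom", "The Grand Budapest Hotel", "Isle of Dogs"}


def films_per_actor(rows):
    """Map each actor to the set of distinct films they appear in."""
    films = defaultdict(set)
    for actor, film in rows:
        films[actor].add(film)
    return dict(films)


def rank_by_degree(rows):
    """Return (actor, film_count) pairs, most films first, ties broken by name."""
    films = films_per_actor(rows)
    return sorted(((a, len(f)) for a, f in films.items()),
                  key=lambda pair: (-pair[1], pair[0]))


def in_every_film(rows, wanted):
    """Return the actors who appear in all of the films in `wanted`."""
    films = films_per_actor(rows)
    return sorted(a for a, f in films.items() if wanted <= f)


ranking = rank_by_degree(ROWS)
recent_regulars = in_every_film(ROWS, LAST_THREE)
print(ranking)
print(recent_regulars)
```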

Also of interest: the name Eric Chase Anderson sits right next to Murray at the center of the network; Eric is the brother of Wes, the illustrator of much of what we associate with Wes Anderson’s aesthetic, and apparently also an actor in the vast majority of his brother’s films. (I’m not sure this find would have surfaced as quickly without the visualization.)

Elsewhere, the data suggests that Anderson’s first film Bottle Rocket was more of a boutique operation that consisted of a relatively small number of repeat actors (8), only two of whom–Kumar Pallana and Owen Wilson–appeared in films beyond the first three. Anderson’s eighth film The Grand Budapest Hotel, released nearly twenty years later, expanded to include a considerable number of repeat actors (22: the highest total on the list), nine of whom were first “introduced” to the Anderson universe here and subsequently appeared in the next film or two.

I wonder what we would see if we visualized nodes according to some sort of sliding scale from “lead actor” to “ensemble actor” in each of these films, perhaps by implementing darker/more vibrant edges depending on screen time or number of lines? Would Bill Murray be more or less central than he is now? Would Eric Chase Anderson materialize at all?

And I wonder what opportunities there are to further visualize nodes based on actor prestige (say, award nominations and wins get you a bigger circle) or to create “famous actor” heat maps (maybe actors within X number of years of a major award nomination or win get hot reds and others cool blues) that might show us how Anderson’s casting choices change over time to include more big names. Conversely, what could these theoretical large but cool-temperature circles tell us about Anderson’s use of repeat “no-name” character actors to flesh out his worlds?

Further, I wonder if there are ways of using machine learning to analyze these networks and to predict the likelihood of certain actors’ being cast in Anderson’s next film based on previous appearances (i.e., the “once you’re in, you’re in” phenomenon) or recent success. Could we compare the Anderson stable versus, say, the Sofia Coppola or Martin Scorsese stables, to learn about casting preferences or actor “types”?

Ten Things: Mapping the Eclipse Archive’s “Black Radical Tradition”

1 // Most of my reading and writing centers on poetic experiments. Usually the adjectives involved include at least one from a short list that is: computational, constraint-based, conceptual. Other common adjectives are avant-garde and radical, the latter of which appears twice in the source material for my mapping praxis.

2 // Constraint-based, conceptual poet Craig Dworkin manages Eclipse, the free on-line archive focusing on digital facsimiles of the most radical small-press writing from the last quarter century. I return to the Eclipse archive regularly to look at works from poets like Clark Coolidge, Lyn Hejinian, Bernadette Mayer, and Michael Palmer. These are the poets with whom I am most familiar. There are many poets in this particular archive with whom I am not familiar at all. In fact, I would say most. These are the poets with whom I want to get familiar. My sense is that I would say most of the poets with whom I am not familiar at all, given their proximity in this particular archive to those poets with whom I am familiar, deserve to have I would say most of their work looked at regularly alongside the others’.

3 // “Given their proximity in this particular archive…”: I am jumping ahead and have one eye on our third dataset/network praxis assignment, wondering to what extent spatial, temporal, racial, gendered, and influential proximity manifests in this particular network of poetic experiments. Conceptual poetry is notoriously white and male, but where isn’t it that way? Where are the radical and avant-garde titles that aren’t being looked at? Where are they? With one eye on our third praxis assignment, I start building a dataset to use for the second. I start with the Black Radical Tradition.

4 // As a rule, for each title in the archive, Eclipse offers: a graf on the title’s publication and material history, a facsimile view of each page, and a PDF download. With lousy Amtrak wifi, I let the facsimiles of each of the 39 titles in the Black Radical Tradition slowly drip down my screen. I don’t yet know what I’ll want for my dataset down the line, but to get started I try to snag from Dworkin’s notes and the first three/last three pages the most obvious data points: author, title, publisher, publication date. Because Eclipse features both authored titles and edited volumes, I learn to add a column to distinguish between the two. I soon add another column to capture notes on the edition, usually to reflect whether the title is part of a series or is significantly different in a subsequent printing. Because I aim to map these spatially–I’m guessing these will cluster on the coasts, but I don’t know this for sure–I snag addresses (street, city, state, zip, country) for each of the publishers. Except for Russell Atkins’s Juxtapositions, which Dworkin notes is self-published and for which I can find no address.
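The columns described above could be written down as one record per title. Here is a loose Python sketch, assuming a flat CSV is the end goal. The class name `TitleRecord`, the field names, and the `has_address` check are inventions for illustration; the two sample rows use only details mentioned in this post, with anything not stated here left blank rather than guessed.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class TitleRecord:
    author: str
    title: str
    publisher: str
    year: str = ""           # kept as text; blank when not recorded here
    kind: str = "authored"   # "authored" or "edited"
    edition_notes: str = ""  # series membership, changes between printings
    street: str = ""
    city: str = ""
    state: str = ""
    zip_code: str = ""
    country: str = ""

    def has_address(self) -> bool:
        """True when there is at least a city to hand to a geocoder."""
        return bool(self.city)


def to_csv(records):
    """Serialize records to CSV text with one header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(TitleRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


records = [
    TitleRecord("N. H. Pritchard", "The Matrix", "Doubleday",
                year="1970", city="Garden City"),
    # Self-published, with no address to be found, so the location stays blank.
    TitleRecord("Russell Atkins", "Juxtapositions", "self-published"),
]

csv_text = to_csv(records)
mappable = [r.title for r in records if r.has_address()]
print(csv_text)
print(mappable)
```

Flagging rows without an address before upload makes it easier to tell a gap in the data from a geocoding failure in the mapping tool.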

5 // I start my map with ArcGIS’s simplest template, noting two other available templates–the Story Map Shortlist, which allows you to curate sets of places like Great Places in America’s three “neighborhoods,” “public spaces,” and “streets” maps, and the Story Map Swipe, which allows you to swipe between two contiguous maps like in the Hurricane Florence Damage Viewer–that I might return to in the future if I want to, say, provide curated maps by individual poet, or else compare “publisher maps” of the Black Radical Tradition and the L=A=N=G=U=A=G=E poets (another set of titles in the Eclipse archive).

6 // Even with the basic template, I experience four early issues with ArcGIS:

First, the map doesn’t recognize, and therefore can’t map, the addresses for each of my three United Kingdom-based publishers. This seems to be a limit of the free version of ArcGIS or possibly the specific template I am working with. This is problematic because it keeps me from making an international analysis or comparison, if I want to.

Second, as I click ahead without a lot of customization, the default visualization presented to me assigns each author a different colored circle (fine). The problem with this is that it, for some reason, lumps four of the poets into a single grey color as “Other,” making it impossible to distinguish Bob Kaufman in San Francisco from Joseph Jarman in Chicago. Those in the grey “Other” category each have one title to their name, but, confusingly, so do several “named” authors, including Fred Moten in green and Gwendolyn Brooks in purple.

Third, beyond placing a dot on each location (fine), the map suggests and kind of defaults to confusing aesthetic labels/styles, such as making the size of the dot correspond to its publication year. In my first map, the big dots signal the most recently published title, which, worse than telling me nothing, appears to tell me something it doesn’t, like how many titles were published out of a single city or zip code. The correlation between year and dot size seems irrelevant, and ArcGIS is unable to read my data in such a way as to offer me any other categories to filter on (e.g., number of titles by a single author in the dataset, so that more prolific authors look bigger, or smaller, I’m not sure).

Once I make all the dots equally sized, a fourth problem appears: from a fully scoped-out view, multiple authors published in the same city (e.g. San Francisco) vanish under whichever colored circle (here: grey) sits “on top.” This masks the fact that San Francisco houses three publishers, not just one. You don’t know it until you drill down nearly all the way (and, even then, you can barely see it: I had to draw arrows for you).
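One way to catch this kind of stacking before it hides anything is to group the rows by city and count distinct publishers. A small Python sketch follows, assuming the data has been reduced to (city, publisher) pairs. The publisher names are placeholders ("Press A" and so on), not the actual presses in the dataset, and the function names are hypothetical.

```python
from collections import defaultdict

# Placeholder (city, publisher) rows; the press names are stand-ins.
POINTS = [
    ("San Francisco", "Press A"),
    ("San Francisco", "Press B"),
    ("San Francisco", "Press C"),
    ("Chicago", "Press D"),
]


def publishers_by_city(points):
    """Group distinct publisher names under each city."""
    grouped = defaultdict(set)
    for city, publisher in points:
        grouped[city].add(publisher)
    return {city: sorted(pubs) for city, pubs in grouped.items()}


def stacked_cities(points):
    """Return only the cities where more than one publisher shares the spot."""
    return {city: pubs
            for city, pubs in publishers_by_city(points).items()
            if len(pubs) > 1}


stacked = stacked_cities(POINTS)
for city, pubs in stacked.items():
    print(f"{city}: {len(pubs)} publishers ({', '.join(pubs)})")
```

A count like this could also drive dot size, which would make the circles say how many publishers sit in a place rather than when a title appeared.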

7 // I test out the same dataset in Google Maps, just to compare. I find the upload both faster and more intuitive. Google Maps is also able to handle all three of my UK addresses, better than the ArcGIS zero. Unlike ArcGIS, though, Google Maps is unable to map one of my P.O. boxes in Chicago, despite having a working zip code; this is almost certainly a problem with my formatting of the data set, but Google Maps does virtually nothing to let me know what the actual problem is or how I can fix it. Nevertheless, Google Maps proves to be more responsive and easier to see (big pins rather than small circles), so I continue my mapping exploration there.

8 // A sample case study: my dataset tells me that New York in 1970 saw the publication of Lloyd Addison’s Beau-Cocoa Volume 3 Numbers 1 and 2 in Harlem; Tom Weatherly’s Mau Mau American Cantos from Corinth Press in the West Village; and N. H. Pritchard’s The Matrix from Doubleday in Garden City, Long Island. When I look on the map, the triangulation of these 1970 titles “uptown,” “downtown,” and “out of town” roughly corresponds to the distribution of other titles in the following decade. Is there any correlation between the spatial placement of publishers and the qualities of the individual literary titles? Do downtown titles resemble each other in some ways, out of town titles in other ways? Is the location of the publisher as important as, say, the location of the author–and even then, would I want the hometown, the known residence(s) at the time of writing, the city or the neighborhood?

9 // And what about this “around the corner” phenomenon I see in New York, where clusters of titles are published on the same block as one another? My dataset is small–a larger one would tell me more–but, as a gathering hypothesis, perhaps there’s something to having a single author’s titles “walk up the street,” moving through both space and time. What, or who, motivates this walk? There’s a narrative to it. What might the narrative be in, say, Harlem, where after publishing the first two instances (Volume 1 and Volume 2 Number 1) of the periodical Beau-Cocoa from (his home?) 100 East 123 Street, editor/poet/publisher Lloyd Addison moves (in the middle of 1969) Beau Cocoa, Inc. to a P.O. box at the post office around the corner? Did an increased national or international demand for this periodical require more firepower than Addison’s personal mailbox?

And what might the narrative be in the West Village, where Tom Weatherly publishes his 1970 Mau Mau American Cantos and his 1971 Thumbprint with two publishers in a four block radius? A larger dataset might show me a network of poets publishing within this neighborhood. Could it lead me to finding information about poetry readings, salons, collaborative projects? (I’m making a leap without evidence here to evoke a possible trajectory.)

10 // Future steps could have me expand this dataset to include data from the rest of the titles in the Eclipse archive (see #5 // above). It could also go the other direction and have me double down on collecting bibliographic data for these authors in the Black Radical Tradition: the material details and individual printings of their titles (some of which Dworkin provides in an unstructured way, but which I skipped over during my first pass through my emerging dataset), perhaps performances of individual poems from these titles that have been documented in poetry/sound archives like PennSound, maybe related titles (by these authors, by others) in other “little databases” like UbuWeb. Stay tuned.