Digital Humanities for What?

ASA 2012 talk, with sources:

American Studies as a field remains committed to social activism and the critique of power relations, on the one hand, and confronts the current rapid transformation of technologies, on the other. It has a strong “political consciousness,” but lacks the necessary “technological consciousness.” Given that algorithmic knowledge has become indispensable for research on contemporary social protest–research that depends, among other things, on born-digital archives–of what use can Digital Humanities, a field that does have a strong “technological consciousness,” be to critical American Studies?

Digital archives and search ranking algorithms structure the way we access news and conduct research, from the Google Scholar search engine, to facial recognition and text mining algorithms used by digital humanities scholars. Research on contemporary social movements especially relies on data provided via the Internet in general and social media in particular. Just one example: when in March 2011 Egyptian protesters broke into Hosni Mubarak’s state security buildings, one of the first places they posted the scanned secret police files was Facebook.

Of course, Twitter did not cause the Arab Spring. But we can learn from online tools like Twitter about the development of global social movements and political sensibilities. As the Wikileaks, Arab Spring, and Occupy movements unfolded, Twitter users repeatedly complained that the service did not “trend” these seemingly globally important events–they didn’t appear on the most popular tweets list. At the time, Tarleton Gillespie asked, “Can an Algorithm Be Wrong?” His answer was basically “Yes.” He explained that Twitter programmers chose to build an algorithm that registered only short-term spikes in themes, not themes that persist and slowly build momentum. Twitter is thus incapable of seeing social movements: had it existed in the mid-twentieth century, it would have trended the Watts Riots but would have completely missed the “long civil rights movement” crucial for understanding why the riots happened.

But that does not mean that algorithms in general cannot see historical social change; it means that Twitter’s particular algorithm has been designed specifically, if unintentionally, to ignore it. Gillespie argues that the Twitter algorithm’s social blindness (it could see Justin Bieber’s haircut but couldn’t see hundreds of people in Zuccotti Park) was an unlucky byproduct of Twitter’s programming. “These algorithms are not perfect,” he argues, “they are still cudgels, where one might want scalpels.” We need to understand their limitations and find ways to improve their analytical power.
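The distinction Gillespie draws, between short-term spikes and themes that slowly build momentum, can be made concrete with a toy sketch. This is not Twitter's actual algorithm, which is not public; the function names, the threshold, and the hourly counts below are all invented for illustration. The sketch only shows how a measure tuned to sudden bursts can rank a brief burst above a much larger topic that grows steadily.

```python
from statistics import mean


def spike_score(counts, window=3):
    """Latest count divided by the mean of the `window` counts before it.

    A hypothetical "burstiness" measure: it rewards sudden jumps and
    barely notices steady growth.
    """
    baseline = mean(counts[-window - 1:-1])
    return counts[-1] / max(baseline, 1)


def is_trending(counts, threshold=5):
    """Assumed rule: a topic trends when its spike score passes a threshold."""
    return spike_score(counts) >= threshold


# Hypothetical hourly mention counts, invented for this example.
celebrity = [10, 10, 10, 10, 10, 10, 10, 200]          # one sudden burst
movement = [100, 120, 144, 173, 207, 249, 299, 358]    # steady growth

for name, counts in [("celebrity", celebrity), ("movement", movement)]:
    print(name, "total mentions:", sum(counts), "trending:", is_trending(counts))
```

In this toy data the slowly growing topic has far more mentions overall, yet only the burst passes the threshold. A different design choice, such as comparing the latest count with the start of a much longer window, would register the opposite pattern.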

As we do this, it is useful to look back in Internet history as well as forward. Consider the history of the “These Weapons of Mass Destruction Cannot Be Displayed” page, a spoof of the Internet Explorer error page that used to come up when a web page was unavailable. The mock version offers advice such as “click About US foreign policy to determine what regime they will install,” and “If you are an Old European Country trying to protect your interests, … scroll to the Head in the Sand section and check settings for your exports to Iraq.” Antony Cox, a pharmacist from the West Midlands, England, created this page as a private joke in February 2003, but within five months it went viral. By July, the fake error page came up when one searched in Google for “weapons of mass destruction” and clicked the “I’m feeling lucky” button. This unexpected top ranking demonstrated a shift in global opinion: the gradual spread, among a growing number of people, of the belief that George W. Bush lied. Google’s PageRank algorithm was not intentionally designed to show that kind of global public opinion, but in fact it produced the first instance of “trending” that demonstrated the emergence of a mass anti-war political consciousness.

Unfortunately, net historians and Google programmers have interpreted this serendipitous algorithmic political insight as an error. Today, Wikipedia includes the page among its examples of “Google bombs”–intentional attempts to “game the algorithm” and place a webpage at the top of Google rankings. Google programmers later redesigned PageRank to make such manipulation more difficult, in the process also making it impossible to detect the kind of popular belief that the “Weapons of Mass Destruction” error page represented. If we knew more about the past and present of algorithmic programming, we could more easily discern the particular politics algorithms acquire because of the people and institutions that create them, and could better recognize the unexpected political insights that certain iterations of algorithms can provide.

Digital Humanities is the one field that intersects with American Studies where knowledge of coding and algorithms is a requirement. This field has been justly credited with opening public access to scholarly research, creating new jobs and careers for humanities scholars in and beyond academia, and pioneering new methods of collecting and processing data, such as digital archives, text mining, and visualization. Nevertheless, there is still a certain disconnect between the critical American Studies tradition, focused on power relations, and digital humanities, where the fight for open access and collaboration often overwhelms concerns for underlying social inequalities. While pioneering digital humanists in the 1990s vowed to share the task of interpreting history and society with lay audiences, today’s builders of digital tools and archives herald a non-interpretive, “post-theoretical age,” analogous to the cataloguing and collating trend in late-nineteenth- and early-twentieth-century science. (For more on the hacking vs. theory debate, see “Conversations,” in the Winter 2011 issue of the Journal of Digital Humanities.)

Digital humanists (myself included) tout experimentation and unfinished projects, and perhaps for this reason the critical insights these projects have provided have been minor so far. One promising data mining study, With Criminal Intent, analyzed the Old Bailey database of trial records of the Central Criminal Court in London from 1674 to 1913. The results included the frequent mention of coffee in poison cases and the rise in plea bargains from the second quarter of the nineteenth century. These two findings still need to be explained; they may never be, however, because the project ran out of funding and the scholars moved on. Another well-known text mining project, on the works of Agatha Christie, at the University of Toronto, found evidence that Christie had Alzheimer’s: by her early eighties, her vocabulary had shrunk and her use of indefinite nouns had increased, and both features are signs of the illness. According to press reports, this discovery proved more useful for medical research on the disease than for the study of literature.
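The kind of measurement behind the Christie finding can be sketched in a few lines. This is a minimal illustration and not the Toronto project's actual method: the list of indefinite words, the tokenizer, and the sample size are assumptions made for the example. It counts the distinct words in a fixed-length sample of a text and the share of tokens that are indefinite words.

```python
import re

# Assumed list of "indefinite" words, chosen only for illustration.
INDEFINITE_WORDS = {"thing", "things", "something", "anything"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_profile(text, sample_size=50000):
    """Profile the first `sample_size` tokens of a text.

    Using the same sample size for every text keeps the counts of
    distinct words comparable between shorter and longer works.
    """
    tokens = tokenize(text)[:sample_size]
    if not tokens:
        return {"tokens": 0, "distinct": 0, "indefinite_rate": 0.0}
    indefinite = sum(1 for token in tokens if token in INDEFINITE_WORDS)
    return {
        "tokens": len(tokens),
        "distinct": len(set(tokens)),
        "indefinite_rate": indefinite / len(tokens),
    }
```

Running `vocabulary_profile` over each novel in order of composition and plotting the two numbers against the author's age would show the trend the project reported, if it is present. Explaining what that trend means remains a job for interpretation.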

Moreover, scholars who use computational methods sometimes overstate their power or understate their problematic provenance. Franco Moretti, in his influential book Graphs, Maps, Trees, proposes that “distant reading,” via data mining of thousands of texts, will replace conventional reading of individual novels. However, once after a talk Moretti was forced to admit that the computational analysis he described in his book would not have been possible without his having read several novels closely in the first place. An innovative online historical project, The Real Face of White Australia, uses facial recognition software to find and catalogue documents on non-white immigrants and indigenous residents in Australian archives. But the site does not consider that the software used by the project has roots in algorithms originally developed for surveillance purposes and built on the very assumptions about exclusion, appearance, and race that the project is trying to contest.

I would suggest that DH technical experiments lack self-awareness because their research lacks a proper purpose. For example, scholars of contemporary U.S. imperialism working with digital archives–often stolen data provided by transparency groups like Wikileaks and Anonymous–could benefit from digital humanities methods such as text mining and visualization. Unfortunately, they do not have the technical expertise to fully examine these archives. Jeremy Kuzmarov used the Wikileaks U.S. war logs and diplomatic dispatches for his book Modernizing Repression, which describes how American officials trained police in occupied territories and client nations, from the Philippines to Latin America to Iraq and Afghanistan. Kuzmarov employed two research assistants to go through the Wikileaks documents and find data relevant to a small part of his book. To read and analyze the entire set without the help of digital tools would be nearly impossible.

But digital humanities researchers do not seem to be interested in materials and questions concerning U.S. imperialism. I tried to take an informal poll of digital humanities scholars on whether they would be interested in analyzing Wikileaks data sets. Most did not reply at all. One responded, “I don’t think the data is really a good fit for the … interests of the DH folks. … With so much other material and so many wide open topics it just seems like a bad move to jump into this kind of a hornets nest. This is to say, there is a lot of data out there to explore that comes without any baggage.” This idea, that one data set is just as good as another for computational experiments, is symptomatic of the entire field. It also seems antithetical to the culture of commitment in American Studies.

We are left with a gap between political commitment on the one hand, and technical knowledge on the other. In 1939, sociologist Robert S. Lynd entitled his book about the place of social science in American culture Knowledge for What? One could ask the same question of the digital humanities.

Science and Technology Studies Caucus Sponsored Session, “What Is the Future of Technology in American Studies?” American Studies Association annual meeting, San Antonio, Texas, November 2012.