Tow Tea: Network Analysis for Investigations

“Everything is connected” may be a popular refrain, but as journalists we always want to know, “In what way?” This was the theme of the final Tow Tea of the 2015 fall semester, where Joe Karaganis of the American Assembly, a public policy institute hosted by Columbia University, and Matthew Weber, a network analyst and assistant professor at the School of Communication and Information at Rutgers University, discussed their use of network analysis for research and investigation. In a packed room, the presentations and the subsequent Q&A, moderated by Susan McGregor, assistant director of the Tow Center at Columbia Journalism School, centered on the concept, practical applications, and pitfalls of analyzing networks.

What kinds of networks make good subjects for investigative analysis? As Weber explains, any group whose members can be connected according to a commonality they share can be interrogated this way. Weber’s own research, for example, focuses on the news ecosystem and the way the Internet’s technological disruption has altered it over the last 20 years. Presenting some of his recent work on media organizations in New Jersey, Weber showed visualizations of the relationship between digital and traditional organizations in the local New Jersey news industry. An abstract map of colored cluster points, each representing a publisher or news website, modeled the spread and relative “distance” between the two. Whereas traditional newspapers and emergent news websites occupied clearly demarcated spaces in the media industry around 1998, the colored cluster points had begun to merge and mingle into a more integrated whole by 2006.

Networks

“Network maps are visualized data,” said Weber, referring to the double nature of such maps: at once visually legible enough to provide insight, yet abstract enough to display the entities or members in a group (“nodes”) and the relationship between members (“edges” or lines representing the relationship) in multiple ways.

Joe Karaganis, meanwhile, employs in practice what Weber researches: he uses network analysis to map slow or large-scale trends in public policy, markets, and intellectual property. Karaganis’s work showcases network-based analysis in action, such as in the American Assembly’s Media Piracy Report on systematic intellectual property theft in developing countries, or littlesis.org (as opposed to Big Brother), a grassroots watchdog organization that maps connections between members of the social elite and financial organizations in the United States, based on campaign finance data. Matching political candidates with recurring transactions from familiar donors is a key part of the methodology in this project, which relies heavily on network analysis.

“How do you tell stories with huge datasets? Do you use clouds? Topography?” asks Karaganis, referring to a problem often faced by investigative journalists, and data journalists in particular. While extensive datasets can make attractive sources due to their apparent empirical authority and the promise of computational analysis, it can be very difficult to present findings from data in a way that makes the conclusions meaningful to a broad audience.

The problematic relationship between big data and a good story is a recurring one in many of the Journalism School’s conferences and workshops throughout this last semester. As McGregor pointed out during the panel discussion: “You don’t need a p-value below 0.05 to write a good story.” But the flip-side, unfortunately, is also true: “Sometimes data doesn’t pan out into a story, [but] they get published anyway because of the pretty pictures,” said Karaganis.

Beyond the philosophical deliberation, data and the network maps based on data are increasingly useful—even indispensable—in a journalist’s work today. But how does one begin using network analysis?

Weber’s advice is to think of it first in simple terms. “For people in journalism: first ask right questions, find specific questions. For example: take legislation A and legislation B. Find the connections and record them in an Excel file. Excel then can calculate centrality.”
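Weber’s Excel-based workflow can also be sketched in plain Python. The edge list below is hypothetical (two pieces of legislation are connected when, say, they share a sponsor), and the script computes degree centrality, the same measure a spreadsheet could produce: each node’s number of connections divided by the maximum possible (n − 1).

```python
# Hypothetical edge list, like the two-column table one might keep in Excel.
edges = [
    ("Bill A", "Bill B"),
    ("Bill A", "Bill C"),
    ("Bill B", "Bill C"),
    ("Bill A", "Bill D"),
]

# Collect the nodes and count each node's connections (its degree).
nodes = {n for edge in edges for n in edge}
degree = {n: 0 for n in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Degree centrality: degree divided by the maximum possible number
# of connections, i.e. the n - 1 other nodes in the network.
n = len(nodes)
centrality = {node: d / (n - 1) for node, d in degree.items()}

for node, c in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {c:.2f}")  # "Bill A" tops the list at 1.00
```

In a real project the edge list would be exported from the spreadsheet as a CSV file and read with Python’s `csv` module; the calculation is the same.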

Measures of centrality, along with “distance,” “bridge,” and “degree,” are the basic conceptual building blocks of network analysis. Which entities in a network are close to one another? Which member bridges two major clusters? How far apart are particular nodes? These are the features of a networked group that can be computed and, eventually, visualized as a network map.
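Two of these measures, distance and bridging, can be illustrated with a minimal sketch on a hypothetical toy graph: two tight clusters joined through a single member, “E”. Breadth-first search gives the shortest-path distance between any two nodes, and every path between the clusters runs through “E”, making it the bridge.

```python
from collections import deque

# Hypothetical toy network: cluster {A, B, C} and cluster {F, G, H},
# connected only through the bridging node "E".
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "E"},
    "E": {"C", "F"},
    "F": {"E", "G", "H"}, "G": {"F", "H"}, "H": {"F", "G"},
}

def distance(start, goal):
    """Shortest-path length between two nodes via breadth-first search."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None  # goal unreachable from start

print(distance("A", "H"))  # A -> C -> E -> F -> H: 4 hops
print(len(adj["E"]))       # "E"'s degree is only 2, yet it bridges both clusters
```

The point of the toy example is that a bridge is not necessarily the best-connected node: “E” has a low degree, but removing it would split the network in two.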

Karaganis recommends kumu.io as an easy-to-use, free online tool for building visual presentations of networks. Weber suggests Excel and NodeXL (a Windows-only add-in), which allows users to create and analyze lists of network data.

Open-source visualization tools are also plentiful: Weber cites gephi.org, free beta software that draws network maps from imported data. The School of Communication and Information at Rutgers has also released its own network-analysis code on GitHub (aekeus). Finally, Karaganis advises students to take advantage of Lada Adamic’s (University of Michigan) in-depth Coursera tutorial on social network analysis.

Network analysis is an increasingly popular way to approach data, more and more of which seems to be interconnected: financial transactions, social interactions, documents, court decisions, and so on. But while computational tools are essential to this kind of analysis, Karaganis stressed that crafting good analyses and visualizations requires both subject knowledge and creativity: “At this point,” he said, “it’s more like an art than a science.”