Event: Doing Data Journalism

The archived video of this event can also be accessed here.

A panel of six of journalism’s movers and shakers convened at Columbia Journalism School on March 28 to debate the current state of data-centered reporting and interactive visualizations.

The panel, moderated by Columbia professor Susan McGregor, tackled numerous issues surrounding data journalism, but its first hurdle was simply to define data journalism, a type of storytelling quickly gaining momentum in newsrooms.

“Data journalism is just journalism,” said Julia Angwin, The Wall Street Journal’s technology editor.

Angwin likened the collection of data sets to the age-old process of conducting interviews. The difference is that the technology available today allows journalists to examine data sets more exhaustively, beyond the limits of interviews and common knowledge, she added.

Angwin has worked on projects like “What They Know,” which examined the tangled nature of online privacy. Her team secured data using code forensics, which she said helps “break stories” and “expand journalism.”

Jo Craven McGinty, projects editor for Computer Assisted Reporting at The New York Times, called data journalism “documents reporting on steroids,” implying data journalism allows journalists to dive into larger and more complicated data sets with the help of database systems and spreadsheets.

Scott Klein, editor of news applications at ProPublica, said the field of data journalism should also recognize the potential of “news applications,” which weigh the presentation of data as heavily as its gathering, reporting and analysis.

Web scraping for journalism

Blog post on data scraping for ProPublica’s “Dollars for Docs.” Photo: Rani Molla.

ProPublica projects like Dollars for Docs, which examined doctor payoffs from drug companies using data scraped from pharmaceutical websites, allow users to search for their own doctors and view any payments they received.

Klein said this type of user interface was a key component of data journalism: “It can tell your personal story…and how it matters to you.”

But the use of data or technology in storytelling does not change the inherent concepts of journalism, Klein added.

“This is journalism that is native to the web, but it’s still just journalism,” he said. “The rules all still apply, the methodology is the same, the rigor is the same…the editorial judgement is all the same.”

It is the concept of data, though, that might need restructuring, according to Aron Pilhofer, editor of interactive news at The New York Times. Tools like DocumentCloud (which he and Klein helped develop) allow even plain text documents to become data, by enriching them with metadata and providing search functionality.
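To make the idea of documents becoming data concrete, here is a minimal sketch in Python. It is not DocumentCloud's API or implementation; the `Document` class, the `build_index` and `search` functions, and the sample memos are all hypothetical, invented only to show how attaching metadata and a word index can make plain text searchable.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """A plain-text document plus reporter-supplied metadata (hypothetical structure)."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc in documents:
        for word in re.findall(r"[a-z0-9]+", doc.text.lower()):
            index[word].add(doc.doc_id)
    return index


def search(index, documents, term, **filters):
    """Return sorted ids of documents containing `term` whose metadata matches every filter."""
    by_id = {doc.doc_id: doc for doc in documents}
    hits = index.get(term.lower(), set())
    return sorted(
        doc_id
        for doc_id in hits
        if all(by_id[doc_id].metadata.get(key) == value for key, value in filters.items())
    )


# Invented sample documents, for illustration only.
docs = [
    Document("memo-1", "Payments to physicians rose sharply.", {"source": "company", "year": 2010}),
    Document("memo-2", "No payments were disclosed.", {"source": "regulator", "year": 2011}),
]
index = build_index(docs)

print(search(index, docs, "payments"))                    # matches both memos
print(search(index, docs, "payments", source="company"))  # narrowed by metadata
```

The point of the sketch is that the text itself is unchanged; it is the added metadata and index that let a reporter query a pile of documents the way they would query a table.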

The panel then turned to a discussion about which comes first: the data or the story. The panelists unanimously agreed the story idea almost always leads to the data research.

But data analysis rarely — almost never, according to Mo Tamman, a Reuters data journalist — yields the expected results. It almost always reroutes the story to an unanticipated conclusion.

Angwin stressed this kind of journalism can be thought of as “testing hypotheses.” It is ultimately using data to verify or rethink story ideas.

Tamman added it is crucial to bring in outside experts almost immediately and “suck their brains dry” in order to better understand, authenticate and contextualize the meaning of the data.

But as the burgeoning practice of data journalism expands, newsrooms must adapt, according to Angwin. Newsrooms are currently “allergic to margins of error,” she said, and they must learn to cope with results that cannot be verified 100 percent, a typical situation when dealing with large data sets.
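For readers unfamiliar with the term, a margin of error can be sketched in a few lines of Python. The panel did not describe any particular method; this example assumes a simple random sample and uses the standard normal approximation for a proportion at roughly 95 percent confidence. The function names and the sample figures are invented for illustration.

```python
import math


def margin_of_error(successes, sample_size, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to roughly 95 percent confidence.
    """
    p = successes / sample_size
    return z * math.sqrt(p * (1 - p) / sample_size)


def describe(successes, sample_size):
    """Format a proportion together with its margin of error."""
    p = successes / sample_size
    moe = margin_of_error(successes, sample_size)
    return f"{p:.1%} (plus or minus {moe:.1%})"


# Invented example: 120 of 400 sampled records show the pattern of interest.
print(describe(120, 400))
```

Reporting the range alongside the headline figure is one way a newsroom can publish a finding without claiming it has been verified 100 percent.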

Newsrooms must also become more math friendly and data literate — something McGinty says can mean simply knowing when data and documents can successfully augment one’s storytelling.

Embracing data journalism may even support new business models, such as the new joint venture from Reuters and The New York Times data teams which will offer “white glove” Olympics coverage, Pilhofer said. However, even small changes in newsrooms — like seating data teams together — can be essential in fostering innovative thinking among the staffers.

And most importantly, Tamman said, newsrooms and journalists doing data-driven journalism must incorporate into their reporting practice the process of finding a story’s “empirical spine.”

The fundamentals of this process rely on using data analysis to develop the story’s hypothesis, and then allowing the reporting to “flow” from that analysis or “spine.” This process contrasts with the practice of many journalists, panelists said, who look to data only after substantively completing their story, sometimes discovering that the story is completely inconsistent with the data.

Technical consultant and privacy expert Ashkan Soltani says this issue can be addressed in part by having a reporter seek out qualitative interviews while a data team independently looks into the quantitative data, thereby simultaneously obtaining both sides of the story.

“You can come together and ask, ‘Do they confirm each other or have different findings?’” he said. “That can then merge together to form the spine of the story.”

Angwin says another solution could stem from news organizations collecting their own data sets.

“Data itself is political,” Angwin said, referring to the choices and processes involved in gathering data. If news organizations amass their own data, she said, it could help reporters find the data that best addresses the questions raised by their qualitative reporting, something existing data sets are not always sufficient to do.

The panelists also debated the best platform to convey a data-driven story, but ultimately felt nuance can be expressed in graphic visualizations just as well as in long-form narratives or news apps.

It is the journalistic backbone and purpose of such pieces — which use data intelligently and appropriately — that truly makes them data journalism. Visualizations or data sets without these qualities don’t deserve the title.

As Pilhofer put it: “If you aren’t telling a story in the presentation piece or approaching it with a journalistic intent, then you’re wasting everyone’s time.”

Featured image by Rani Molla.