
A Brief History of Computer-Assisted Reporting


It’s been more than a year since The New York Times declared this The Age of Big Data, but for most Americans, the news really hit home on election night, 2012.

Nate Silver’s uncannily accurate predictions about how the presidential race would turn out made him one of the most talked-about people of the campaign, even in media circles, where the journalistic merit of his statistically driven work was vigorously debated.

Yet Silver’s work is arguably less revolution than evolution, one facet of a journalistic practice that has been around for decades, even if, like Silver, it has only recently made it into the mainstream.

“We started out with this a long time ago—before the Web, before even reasonably simple computers,” says Sarah Cohen, editor of the computer-assisted reporting (CAR) team at The New York Times. As early as the late 1960s, journalists like Philip Meyer and Elliott Jaspin were using social science methods and data analysis—sometimes with the help of mainframe computers—to generate and test their journalistic hypotheses. “That was how a generation of us learned what [computer-assisted reporting] was,” says Cohen.

Although CAR produced powerful results (see the Pulitzer-winning work of Jaspin, Meyer, Dedman, and others), for many years it existed only at the margins of most newsrooms, the domain of a few motivated reporters. For much of that time, the methods of CAR hewed closely to those described in Meyer’s seminal book, Precision Journalism, and the tools remained fairly constant: spreadsheets, database software, and, eventually, online resources. Likewise, the end product was the same as for any other news story: a printed text article.

In recent years, however, a slew of new terms has filtered into journalists’ vocabularies and job titles: data journalism, computational journalism, news apps, and data visualization. To the uninitiated, what these descriptors mean, much less how they differ, may seem inscrutable. Yet even for insiders, their intersections and boundaries are often hard to pin down, and somewhere behind the semantics hovers a difficult question: Are these just new methods for executing the old jobs of journalism, or are they a fundamentally new philosophy of what journalism can be?

“In terms of terminology, I think it can be both misleading and enlightening,” says Troy Thibodeaux, the editor for newsroom innovation at the Associated Press. “It’s a very strange thing, because we’re all doing very closely related work.”

But what is it?

Perhaps the first step in discussing these practices is to distinguish between process and product. News apps and data visualization generally describe a class of publishing formats, usually a combination of graphics (interactive or otherwise) and reader-accessible databases. Because these end products are typically driven by relatively substantial data sets, their development often shares processes with CAR, data journalism, and computational journalism. In theory, at least, the latter group is format-agnostic, more concerned with the mechanisms of reporting than with the form of the output.

“CAR reporters are good at getting records,” says Reg Chua, data editor at Thomson Reuters. “A lot of CAR is data journalism; it’s interrogating data. Computational journalism represents a new step in what you can do—use of computers, and the processing power of computers and programming, to do types of reporting that were unimaginable even a few years ago.”

Harnessing that computational power, however, has meant bringing new practitioners into the field, and their ideas come from outside the typical CAR tradition.

“Now there’s this whole other path of people who were developers who have very different perspectives,” says Thibodeaux.

At this year’s National Institute of Computer-Assisted Reporting (NICAR) conference in early March, Thibodeaux created and moderated a panel called “From CAR to newsapps and back again,” composed of two-person teams that have collaborated to produce some of the most influential work in digital journalism.

While on the whole the tone of the panel was mutually complimentary, Sarah Cohen conceded that some journalists still tend to trivialize the visual aspects of journalism.

“There are still some editors, though they are fewer and fewer, who really just think of graphics and interactive as just the candy,” rather than a legitimate news format, she says.

For their part, however, the developers present seemed to welcome a move towards more story-driven news apps and visualizations.

“News apps are now edited, which is fairly new,” said Derek Willis, interactive developer at The New York Times. “I think we now hope to treat the editing process as seriously as you do with any story, including asking, ‘Does this work? Does it deserve to stay up?’ I think this has been the growth in the domain.”

Yet while many news app developers agree that news apps need story, they also assert that journalism needs news apps, which Thibodeaux says do “the thing that a story can’t do, which is let you drill down.”

Rather than focusing only on individual, moment-in-time accounts, Chua says, journalistic publishing needs to include work that is both more focused and more incremental. “The real example of this is Homicide Watch: It updates in essentially real time, and you can drop in anytime and see what the trends are.” This sort of in-progress publishing, Chua believes, is essential, “if we want to get all the value of all the reporting we do every day, and also better serve these communities.”

What’s next?

Whether or not they agree on the need to diversify the way news is published, CAR reporters, data editors, and news app developers alike see new technologies changing the way that journalism is both conceptualized and executed.

The strong impression that Jeff Larson and Chase Davis’s NICAR presentation, “Practical machine learning: Tips, tricks and real-world examples for using machine learning in the newsroom,” made on many attendees suggests as much.

“I’m pretty conservative on this stuff,” says Thibodeaux. “Source reporting leads to the best data reporting.” But after Larson and Davis’s presentation, he says, he can see how “the techniques start to act like sources. The tools let us ask questions that we couldn’t even conceive of before.”

Likewise, Cohen sees significant opportunities in algorithmic document analysis. “Our ability to make sense of messy original records has been revolutionized,” she says.

Whether the broader use of data science tools in journalism will increase the acceptance of work like Silver’s remains to be seen, but his methods are more likely to be embraced than abandoned. If nothing else, the economic advantages of offloading more work to machines are hard to dispute:

“We don’t have the financial wherewithal to waste the kind of time we waste,” says Cohen. “If we spend a week doing document analysis that could be done by an algorithm, then we deserve to be replaced by machines.”

“We need to reserve the work for things that take human creativity and human insight.”


What the Tesla Affair Tells Us About Data Journalism


Consider for a moment two scenarios.

In the first, a malicious energy reporter tasked with reviewing an electric car decides he is going to fake the review. Part of this fictional narrative is that the car needs to run out of battery power at some point during the review. He arrives at one of the charging stations and, instead of plugging in, spends a few minutes circling the parking lot trying to drain the battery.

In the second, an energy reporter is tasked with reviewing the potential of a new electric car charging network. He arrives at one of the charging locations in the dark and can’t find the charging station. He drives around the parking lot several times looking for it before finding it and charging his car.

Here is the thing. As Craig Silverman recently pointed out to me, based on the interpretation of the review data released by Tesla, we actually have no idea which narrative is true. All the data shows is a car driving around a parking lot. And herein lies the principal lesson of the whole Tesla affair: data is laden with intentionality and cannot be removed from the context in which it was derived. We do not know, from these data alone, what happened in that parking lot.

David Brooks touched on this very issue in a recent (and, in my opinion, somewhat unfairly maligned) column on the limits of big data. While his Italian banking analogy felt misplaced, there is actually a large body of research backing up his general themes. And his point that data struggles with context is directly relevant to the Tesla dispute:

Data struggles with context. Human decisions are not discrete events. They are embedded in sequences and contexts. The human brain has evolved to account for this reality. People are really good at telling stories that weave together multiple causes and multiple contexts. Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel.

In the case of the Tesla review, it is this context that was both poorly recorded by Broder and missing from the Tesla data analysis. This does not mean the analysis is wrong. But it does mean it’s incomplete.

A few further points about the role data played in this journalistic dispute.

First, the early triumphalism against the New York Times in the name of both Tesla and data transparency was clearly premature. In Tesla’s grand rebuttal, Musk overplayed his rhetorical hand by arguing that the review was faked, but he also overstated both the case he could make with the data and the level of transparency he was actually providing. Tesla didn’t release the data from the review. Tesla released its interpretation of the data from the review. This interpretation took the form of the graphical representation the company chose to give it, as well as the subjective write-up it attached.

What is interesting is that even from this limited and selective data release (i.e., without the raw data), entirely different narrative interpretations could be built. Broder and his New York Times team presented one. But Rebecca Greenfield at the Atlantic provided an even more detailed one. There are likely elements of truth scattered across these three interpretations of the data. But they are just that: interpretations.

Second, the only person who can provide the needed context for this data is Broder, the reviewer himself. And the only way he can convey this information is if we trust him. Because of his “problems with precision and judgement,” as the New York Times’ Public Editor Margaret Sullivan put it, his credibility was diminished. So the missing journalistic piece of this story is lost. Even in a world of data journalism, trust, integrity, and journalistic process still matter. In fact, they matter all the more.

Finally, we can’t lose sight of the outcome Tesla wanted from this. They wanted PR for their new vehicle. So amongst all of the righteous indignation, it is worth noting that journalistic principles are not their core objective – good stories about their products are. These may or may not be aligned. This is why, for example, Broder was given significant support and access during his review trip (some of which ultimately proved to be misguided).

An example of this divergence is the one clear reality about the Model S (and presumably electric cars in general) that the review revealed: they lose significant charge when left unplugged in cold weather. Tesla would rather this fact had not emerged in the review. But it did. And as Steven Johnson pointed out, this has significant implications, specifically for city drivers. For one, it makes parking the Model S on the street in winter (as many urban dwellers would have to do) largely impractical.

So, to recap, the Tesla affair reinforces that data does not equal fact; that context matters enormously to data journalism; that trust and documentation are even more important in a world of data journalism; and that companies will continue to prioritize positive PR over good journalism in reviews of their products.


Tow is hiring!


The Tow Center is hiring a research administrator.

The research administrator will work closely with the Director and the Associate Research Director on the management of grant funds, research support for Tow projects, and day-to-day administrative tasks related to the grants, the Tow website, and Tow events.

The position is a one-year temporary appointment, with the possibility of renewal for up to a second year.

The ideal candidate will manage grant funds and processes; manage spending and ensure compliance with terms and restrictions for ongoing, proposed, and new initiatives; work closely with PIs, funders, and finance and budget managers to review budget proposals, submissions, and revisions; and write and review grant proposals and reports.

Click here for more information and to apply.


What can journalism learn from computer science?


Journalism needs an algorithm. That’s not to say machines should replace reporters, but that reporters should be thinking more like machines: systematically. From computer programs that automate news stories to data-driven narratives and mobile app development, journalism’s relationship with computer science is becoming ever more involved. Integrating technology into journalism, however, doesn’t simply mean installing Excel on newsroom computers or teaching journalism students basic HTML and CSS. Applying core computing concepts to reporting and storytelling can not only improve journalists’ production efficiency but also shape their narratives.


Columbia University and Reuters to work on Advanced Data Visualization Project


Columbia University and Thomson Reuters announced the launch of the Advanced Data Visualization Project (ADVP) based at Columbia’s Graduate School of Architecture, Planning and Preservation (GSAPP). The initiative, sponsored by Thomson Reuters, will facilitate research into data visualization and its implications for academia and industry in a world increasingly awash with data.

Read the full Reuters press release here.


(Photo: AP/Diane Bondareff)


Contribute to the Blog


We welcome and encourage submissions to the Tow blog. In particular, we’re looking for people to cover events, tell us How It’s Made or write tips and tutorials. We’d also love to hear from you if you have a critique of our research.

We accept submissions from anyone, including student journalists, as long as the pitch is in line with Tow’s areas of interest. The best way to get a sense of the type of content we’d accept is to have a look around the site.

If you’d like to contribute, contact Tow’s DMA Anna Codrea-Rado with an outline of your idea.