Follow our coverage of the co-hosted Dart Center and Tow Center symposium on Sandy Hook, breaking news and trauma.
It’s been more than a year since The New York Times declared this The Age of Big Data, but for most Americans, the news really hit home on election night, 2012.
Nate Silver’s uncannily accurate predictions about how the presidential race would turn out made him one of the most talked about people of the campaign, even in media circles, where the journalistic merit of Silver’s statistically driven work was vigorously debated.
Yet Silver’s work is arguably less revolution than evolution, one facet of a journalistic practice that has actually been around for decades, even if, like Silver, it only recently made it into the mainstream.
“We started out with this a long time ago—before the Web, before even reasonably simple computers,” says Sarah Cohen, editor of the computer-assisted reporting (CAR) team at The New York Times. As early as the late 1960s, journalists like Philip Meyer and Elliott Jaspin were using social science methods and data analysis—sometimes with the help of mainframe computers—to generate and test their journalistic hypotheses. “That was how a generation of us learned what [computer-assisted reporting] was,” says Cohen.
CAR is a practice that, while producing powerful results (see the Pulitzers of Jaspin, Meyer, Dedman, and others), for many years existed only at the margins of most newsrooms, the domain of a few motivated reporters. For much of that time, the methods of CAR hewed closely to those described in Meyer’s seminal book, Precision Journalism, and the tools remained fairly constant: spreadsheets, database software, and, eventually, online resources. Likewise, the end product was the same as for any other news story: a printed text article.
In recent years, however, a slew of new terms have filtered into journalists’ vocabularies and job titles, like data journalism, computational journalism, news apps, and data visualization. To the uninitiated, what these descriptors mean—much less how they differ—may seem inscrutable. Yet even to insiders, their intersections and boundaries are often hard to resolve, and somewhere behind the semantics hovers a difficult question: Are these just new methods for executing the old jobs of journalism, or are they a fundamentally new philosophy of what journalism can be?
“In terms of terminology, I think it can be both misleading and enlightening,” says Troy Thibodeaux, the editor for newsroom innovation at the Associated Press. “It’s a very strange thing, because we’re all doing very closely related work.”
But what is it?
Perhaps the first step in discussing these practices is to distinguish between process and product. News apps and data visualization generally describe a class of publishing formats, usually a combination of graphics (interactive or otherwise) and reader-accessible databases. Because these end products are typically driven by relatively substantial data sets, their development often shares processes with CAR, data journalism, and computational journalism. In theory, at least, the latter group is format agnostic, more concerned with the mechanisms of reporting than the form of the output.
“CAR reporters are good at getting records,” says Reg Chua, data editor at Thomson Reuters. “A lot of CAR is data journalism; it’s interrogating data. Computational journalism represents a new step in what you can do—use of computers, and the processing power of computers and programming, to do types of reporting that were unimaginable even a few years ago.”
Harnessing that computational power, however, has meant bringing new practitioners into the field, and their ideas come from outside the typical CAR tradition.
“Now there’s this whole other path of people who were developers who have a very different perspective,” says Thibodeaux.
At this year’s National Institute of Computer-Assisted Reporting (NICAR) conference in early March, Thibodeaux created and moderated a panel called “From CAR to newsapps and back again,” composed of two-person teams that have collaborated to produce some of the most influential work in digital journalism.
While on the whole the tone of the panel was mutually complimentary, Sarah Cohen conceded that some journalists still tend to trivialize the visual aspects of journalism.
“There are still some editors, though they are fewer and fewer, who really just think of graphics and interactive as just the candy,” rather than a legitimate news format, she says.
For their part, however, the developers present seemed to welcome a move towards more story-driven news apps and visualizations.
“News apps are now edited, which is fairly new,” said Derek Willis, interactive developer at The New York Times. “I think we now hope to treat the editing process as seriously as you do with any story, including asking, ‘Does this work? Does it deserve to stay up?’ I think this has been the growth in the domain.”
Yet while many news app developers will agree that news apps need story, they also assert that journalism needs news apps, which Thibodeaux says do “the thing that a story can’t do, which is let you drill down.”
Rather than focusing only on individual, moment-in-time accounts, Chua says, journalistic publishing needs to include work that is both more focused and more incremental. “The real example of this is Homicide Watch: It updates in essentially real time, and you can drop in anytime and see what the trends are.” This sort of in-progress publishing, Chua believes, is essential, “if we want to get all the value of all the reporting we do every day, and also better serve these communities.”
Whether or not they agree on the need to diversify the way news is published, CAR reporters, data editors, and news app developers alike see new technologies changing the way that journalism is both conceptualized and executed.
That much was clear from the strong impression made on many attendees by Jeff Larson and Chase Davis’s NICAR presentation, “Practical machine learning: Tips, tricks and real-world examples for using machine learning in the newsroom.”
“I’m pretty conservative on this stuff,” says Thibodeaux. “Source reporting leads to the best data reporting.” But after Larson and Davis’s presentation, he says, he can see how “the techniques start to act like sources. The tools let us ask questions that we couldn’t even conceive of before.”
Likewise, Cohen sees significant opportunities in algorithmic document analysis. “Our ability to make sense of messy original records has been revolutionized,” she says.
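Neither Cohen nor the NICAR presenters published their code, but the flavor of such algorithmic document triage can be sketched in a few lines of Python: score each record in a pile against a reporter-supplied list of terms so the most promising documents surface first. This is a toy stand-in, not anyone’s actual newsroom pipeline; the documents, keywords, and function name below are invented for illustration, and real systems use far more sophisticated models (classifiers, clustering, entity extraction).

```python
from collections import Counter

def score_documents(documents, keywords):
    """Rank documents by how often reporter-chosen keywords appear.

    A deliberately simple sketch of machine-assisted document
    triage: count keyword hits per document and sort descending.
    """
    keywords = {k.lower() for k in keywords}
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        score = sum(counts[k] for k in keywords)
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical document pile and search terms.
docs = {
    "memo-1": "contract awarded to vendor after no-bid contract review",
    "memo-2": "annual picnic schedule and parking notice",
    "memo-3": "vendor payment approved despite audit findings on the vendor contract",
}
ranking = score_documents(docs, ["contract", "audit", "vendor"])
print(ranking[0][0])  # → memo-3
```

Even this crude version illustrates Cohen’s point: a task that might take a reporter a week of reading can be narrowed to a handful of candidate documents in seconds, reserving the human work for judgment and verification.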
Whether the broader use of data science tools to do journalism will increase the acceptance of work like Silver’s remains to be seen, but his methods are more likely to be embraced than abandoned. If nothing else, the economic advantage of offloading more work to machines is hard to ignore:
“We don’t have the financial wherewithal to waste the kind of time we waste,” says Cohen. “If we spend a week doing document analysis that could be done by an algorithm, then we deserve to be replaced by machines.”
“We need to reserve the work for things that take human creativity and human insight.”
A collection of online resources about data journalism:
+ Jonathan Stray’s Computational Journalism Class. Journalism and Media Studies Center at the University of Hong Kong, Spring 2013
+ Brian Boyer, Hacker Journalism 101
+ Online Book: The New Precision Journalism, Philip Meyer
+ The Tech Data Journalism Bootcamp, Civic Media Lab, MIT
+ The Reporters Lab, Tools, Techniques and Research for Public Affairs Reporting. Sarah Cohen’s and now Tyler Dukes’
+ Facts Are Sacred: The Power of Data, Simon Rogers
Coding the news now has a manifesto. ProPublica’s developers launched a series of news application guides, including a coding manifesto, this morning. The guides, which all live on GitHub, are intended to give insight into the programming ethos of the non-profit investigative journalism outfit. As the manifesto says, “We’re not making any general statements about anything beyond the environment we know: Doing journalism on deadline with code.”
Scott Klein, Jeff Larson and Jennifer LaFleur wrote the guides, which include a news app style guide, a data check-list and a design guide. These resources add to the ever-growing community of news application developers, many of whom are actively blogging about and sharing their working processes.
How western technology companies are helping autocratic governments monitor and control their citizens
There is a new arms race emerging between people who want to communicate freely and securely and governments that want to monitor and limit this communication. In democratic countries, this government interference ranges from the mass monitoring of telecoms to flirtations with cutting off social media flows and shutting down cell towers in protest areas. When autocratic countries face crisis and conflict, however, the battle for control over communication is more troublesome and the risks are more acute.
What links the interference being run by governments in democratic and autocratic countries is the technology being deployed by both. And therein lies a paradox: The tools that enable autocratic governments to monitor and control their citizens are produced by western technology companies.
Much like the arms trade, this often creates an awkward scenario in which western countries end up supporting opposition movements that are fighting against technology bought from western countries. Sometimes this collusion backfires in provocative and potentially controversial ways. For example, in Syria, American journalist Marie Colvin and French photographer Rémi Ochlik were killed by a mortar attack that was most likely carried out by targeting their satellite phones. It is widely held that this technology was provided by western companies.
There are many recent examples of this phenomenon, especially within the context of the Arab Spring. High-profile technology companies such as the UK’s Gamma Group, maker of the FinSpy surveillance suite, offered surveillance services to regimes in Egypt, Tunisia, Libya, Bahrain, and Syria. Google engineers discovered contract proposals between Gamma and the Mubarak regime – €250,000 worth of spy technology that would “enable them [Egypt] to intercept dissidents’ emails, record audio and video chats, and take copies of computer hard drives.” The SpyFiles operation by WikiLeaks and Privacy International further revealed 287 documents indicating that surveillance companies, such as the French arms dealer Amesys, sold both spyware and malware technologies (including Trojans) to the Gaddafi regime.
The Citizen Lab at the University of Toronto has uncovered a wide range of examples of complicity between western companies and authoritarian regimes. Most recently, it showed that devices manufactured by Blue Coat Systems, a California-based hardware company, were in use in 61 countries, including some with histories of human rights abuses. In 2011, it detailed how Syria used Blue Coat software both to censor the Internet and to root out particular activities linked to pro-democracy activists.
Western governments use the same type of commercial filtering and monitoring technology to restrict the online behaviour of their employees. This means that western governments could very well be implicitly supporting private companies that develop technologies that assist the oppressive regimes they oppose.
Indeed, if one were to attend a trade show for such technologies, as a Washington Post journalist recently did, one would find more than 35 United States federal agencies buying the very same technologies as the autocrats. As reported in the Atlantic, Jerry Lucas, who runs a trade show called ISS World, known as the “Wiretapper’s Ball,” was asked by the Guardian whether he would be comfortable with Zimbabwe and North Korea buying technology at his trade shows. He responded: “That’s just not my job to determine who’s a bad country and who’s a good country. That’s not our business, we’re not politicians … we’re a for-profit company. Our business is bringing governments together who want to buy this technology.”
The U.S. State Department, which has spent $70 million promoting internet freedom abroad, is part of a government that has few regulations on the trade of the technology that prevents such freedom. A bill has been before the United States Congress to prevent the sale of this technology to “Internet-restricting countries” since 2006, but the bill faces implementation challenges, as the list of countries in question now includes most nation states. And there are other real limits to what western governments can do, due to both the scale of the industry, estimated at $5 billion a year globally, and the limits of contemporary international law.
There have been some positive steps: Last year a U.S. congressional subcommittee passed the Global Online Freedom Act (GOFA), “creating a new transparency standard for Internet companies listed on U.S. stock exchanges and operating in countries that substantially censor or control the Internet.” The GOFA would force U.S. companies listed on the U.S. Stock exchange to release information on their human rights due diligence.
Of course, these technologies have the potential to be used for both positive and negative impact (they are dual-use). This poses a particular challenge to governments trying to use these technologies for good. For example, the U.S. government is funding Commotion Wireless, a sophisticated hacking project that seeks to enable activists by undermining internet censorship in countries such as Syria and Iran. However, the FBI recently warned that these same anonymizing and encryption tools might be “indicators of terrorist activities.”
The question for policymakers, then, is whether anything beyond challenging regulatory measures can be done to overcome the dual-use dilemma, or whether it is simply a fact of life in a radically open operating environment. Whatever the reply, a relatively simple place to start would be to support the development of technologies that empower individuals, rather than enabling the production and trade of tools used for surveillance and oppression.
For example, a Swedish research team recently developed a new tool that allows Tor communication (a tool that anonymizes internet use) to be cloaked within services like Skype in order to circumvent recent changes to the Chinese “firewall” that had compromised those who used those services. Similarly, a team at Columbia University’s Graduate School of Journalism, in partnership with Stanford Computer Science, has built an app called Dispatch that allows for secure communication between journalists and their sources in areas of conflict. Another app, Silent Circle, allows users to send encrypted files of up to 60 megabytes via text message. These are tools that our governments should support. One can even imagine a virtual embassy incentivizing such projects. Too often, however, these surveillance-evading tools ruffle the feathers of autocratic and democratic governments alike.
What we are ultimately seeing is an arms race between oppressive governments and their citizens. It is high time that our democratically elected governments cease supporting, either tacitly or explicitly, the technologies enabling government surveillance.
Consider for a moment two scenarios.
One: a malicious energy reporter tasked with reviewing an electric car decides he is going to fake the review. Part of this fictional narrative is that the car needs to run out of battery power at some point during the review. He arrives at one of the charging stations and, instead of plugging in, spends a few minutes circling the parking lot trying to drain the battery.
Two: an energy reporter is tasked with reviewing the potential of a new electric car charging network. He arrives at one of the charging locations in the dark and can’t find the charging station. He drives around the parking lot several times looking for it before finding it and charging his car.
Here is the thing. As Craig Silverman recently pointed out to me, based on the interpretation of the review data released by Tesla, we actually have no idea which narrative is true. All the data shows is a car driving around a parking lot. And herein lies the principal lesson of the whole Tesla affair: data is laden with intentionality and cannot be removed from the context in which it was derived. We do not know, from these data alone, what happened in that parking lot.
David Brooks touched on this very issue in a recent (somewhat overly maligned, in my opinion) column on the limits of big data. While his Italian banking analogy felt misplaced, there is actually a large amount of research backing up his general themes. And his point that data struggles with context is directly relevant to the Tesla dispute:
Data struggles with context. Human decisions are not discrete events. They are embedded in sequences and contexts. The human brain has evolved to account for this reality. People are really good at telling stories that weave together multiple causes and multiple contexts. Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel.
In the case of the Tesla review, it is this context that was both poorly recorded by Broder, and which is missing from the Tesla data analysis. This does not mean the analysis is wrong. But it does mean it’s incomplete.
A couple of further points about the role data played in this journalistic dispute.
First, the early triumphalism against the New York Times, in the name of both Tesla and data transparency, was clearly premature. In Tesla’s grand rebuttal, Musk overplayed his rhetorical hand by arguing that the review was faked, but he also overstated both the case he could make with the data and the level of transparency he was actually providing. Tesla didn’t release the data from the review. Tesla released its interpretation of the data from the review. That interpretation took the form of the graphical representation they chose to give it, as well as the subjective write-up they imposed on it.
What is interesting is that even with this limited and selective data release (i.e., without the raw data), entirely different narrative interpretations could be built. Broder and his New York Times team presented one. But Rebecca Greenfield at the Atlantic provided an even more detailed one. There are likely elements of truth scattered across these three interpretations of the data. But they are just that: interpretations.
Second, the only person who can provide the needed context to this data is Broder, the reviewer himself. And the only way he can convey this information is if we trust him. Because of his “problems with precision and judgement,” as the New York Times’ public editor Margaret Sullivan put it, trust in him was devalued. So the missing journalistic piece of this story is lost. Even in a world of data journalism, trust, integrity, and journalistic process still matter. In fact, they matter all the more.
Finally, we can’t lose sight of the outcome Tesla wanted from this. They wanted PR for their new vehicle. So amongst all of the righteous indignation, it is worth noting that journalistic principles are not their core objective – good stories about their products are. These may or may not be aligned. This is why, for example, Broder was given significant support and access during his review trip (some of which ultimately proved to be misguided).
An example of this discrepancy surrounds the one clear reality about the Model S (and presumably electric cars in general) that was revealed in the review: it loses significant charge when not plugged in during cold weather. Now, Tesla would rather this fact had not emerged in the review. But it did. And as Steven Johnson pointed out, this has significant implications, specifically for city drivers. For one, it makes parking the Model S on the street in the winter (as many urban dwellers would have to do) largely impractical.
So, to recap. The Tesla Affair reinforces that: data does not equal fact; that context matters enormously to data journalism; that trust and documentation are even more important in a world of data journalism; and that companies will continue to prioritize positive PR over good journalism in reviews of their products.
The Pew Research Center is launching a data blog. The nonpartisan think tank is currently recruiting a senior writer to lead the new project, which will produce editorial content based on the data Pew produces, president Alan Murray said.