Imagining a newsroom powered by artificial intelligence

Artificial intelligence can help journalists automatically adapt stories to the personalities, moods and locations of their readers.


The news and information ecosystem is in the midst of change — again.

Mobile-first consumption is on the rise, smart homes are becoming mainstream and connected cars will soon take over the roads of major cities around the world.

Smart devices will require “smart content.” It’s only a matter of time before artificial intelligence (AI) becomes the backbone of the media industry of the future.

A change in how we interact with news

Today, most people find information via search or social. And while these two channels are radically different in functionality, they have one thing in common — any given article surfaced through these platforms is exactly the same for everyone in the world.

For example, The New York Times article, “Health Officials Split Over Advice on Pregnancy in Zika Areas,” reads the same to me, a 20-something male living in New York, and for an expecting mother in her mid-30s residing in Rio de Janeiro.

Content today is one size fits all. And why wouldn’t it be? A journalist writes a story hoping to reach as many people as possible.

Search and social help tailor information choices to individuals to a degree, but Google, Facebook and Twitter know that artificial intelligence will fundamentally change the equation. That’s why, since 2013, these companies have been investing substantial resources into the space and acquiring startups.

In Facebook Messenger, for example, several news organizations such as CNN and The Wall Street Journal are already using bots and some level of automation to deliver news through the platform.

Beyond automation: What is artificial intelligence?

Artificial intelligence perceives the environment it operates in and takes actions in response. It seeks to learn what its users want and how they want it.

In the specific case of news media, articles can be processed through algorithms that analyze readers’ locations, social media posts and other publicly available data. Readers can then be served content tailored to their personality, mood and socioeconomic status, among other things.
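To make the idea concrete, here is a minimal, entirely hypothetical sketch of matching an article variant to a reader profile. The profile fields, tags, and scoring rule are invented for illustration and do not describe any real news system:

```python
# Hypothetical sketch: choosing an article variant based on a reader profile.
# Profile fields and variant tags are illustrative, not a real news API.

def pick_variant(article_variants, reader):
    """Return the variant whose tags best match the reader profile."""
    def score(variant):
        tags = variant["tags"]
        s = 0
        if reader.get("region") in tags:
            s += 2  # a local angle weighs most
        if reader.get("interest") in tags:
            s += 1
        return s
    return max(article_variants, key=score)

variants = [
    {"headline": "Zika advice splits health officials", "tags": {"general"}},
    {"headline": "What Zika guidance means for expecting mothers in Rio",
     "tags": {"rio", "pregnancy"}},
]
reader = {"region": "rio", "interest": "pregnancy"}
print(pick_variant(variants, reader)["headline"])
```

A reader with no matching profile fields would simply fall back to the first (general) variant, which mirrors the one-size-fits-all status quo.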

AI allows journalists and media companies to create infinite versions of an article, resulting in increasingly relevant information that speaks directly to individuals — ultimately forming a more engaged audience.

Examples of artificial intelligence, machine learning and automation in content creation

Crystal is a program that adapts emails you write to the personality of recipients. For example, if you’re sending a note to a more laid-back person, the software suggests a change in tone from a formal introduction such as “Dear John” to a more colloquial “Hi” or “Hey.”

Crystal uses previous emails to that recipient, as well as their social media posts, to recommend certain language, tone and sentiment. It’s easy to see how this approach could be adapted for a newsroom; in fact, it could build on pioneering efforts already underway, such as automated earnings reports by The Associated Press. The Tow Center has also conducted an extensive analysis of the implications of automation in journalism.

Artificial intelligence can even localize stories. If you live in California, you might not read a story entitled “Texas residents poisoned by toxic waste plant for years.” But if the story included an automated note highlighting a similar past incident in your city, you would probably be more inclined to look at it.

The future of journalism powered by artificial intelligence

Beyond tailoring content to users, AI can help journalists do more investigative work by analyzing massive sets of data and pointing to relationships not easily visible to even the most experienced reporter.

While this technology can improve efficiencies in newsrooms, though, it should work in tandem with journalists, not replace them. Going forward, the challenge will be to make sure we continue to adhere to our standards and ethics.

What will the future of news look like when it becomes powered by AI? We will soon find out!

Francesco Marconi is the strategy manager at The Associated Press and an Innovation Fellow at the Tow Center.

What is the NYC Media Ecosystem?

How do you define news media in New York City? Who counts? Who doesn’t count? What companies are at the center of the New York City media hub? What about freelancers? New Jersey?

These questions are just some of the challenges that we faced as we began our research into understanding the 21st century newsroom workforce.  

Technological disruption continues to impact the news industry, and many organizations are grappling with the transition to a digital environment, as well as the increasing importance of mobile and social technologies. According to the 2015 Reuters Institute Digital News Report, 25% of those sampled across the globe use a smartphone as the main device for news consumption; in the US alone, 44% use a smartphone to access the news. Further, the report found that 41% of the sample use Facebook for news each week.  The findings are reinforced by Pew’s 2015 study, which found that half of Web-using adults in the US get political news from Facebook.  

So how do modern newsrooms respond to these changes in audience news consumption behavior?

Our research examines changes in the nature of newsroom workforces and changes in the prerequisite skills for news company workers. Digital and data-centric roles requiring computational science and advanced analytic skills occupy a key role in the reinvented production and distribution of news. This is evident, for example, in the ways newsrooms are integrating algorithmic and automated production into traditional news processes. Moreover, a new area of professional expertise is emerging as computational skills and journalistic practice integrate (see our Tow colleagues’ work on Muck, specifically directed at reducing barriers between programmers and non-programmers in the newsroom). The new space is quickly developing, and yet our understanding of this transformation is skin deep.

Our goal is to present a systematic analysis of the challenges facing managers of modern news organizations as newsrooms adapt to increasing complexity and new skill sets in the digital news environment. Our starting point is the NYC media market. This is a biased sample, as NYC is clearly one of the most active media markets in the country, but it is also an opportunity to examine a market at the forefront of digital innovation.

Of course, the process must begin with a defensible and representative sample of news media outlets. But again, what counts? We used a combination of databases and sources, including Cision PR and LinkedIn, to get a sense of the major news organizations headquartered in NYC. We focused exclusively on organizations that produce daily news and are headquartered in NYC. These were necessary decisions to create a manageable yet defensible study.

This resulted in a sample population consisting of 8,027 employees from a list of 17 organizations including: TheBlaze, NowThis, Slate, The Daily Beast, Mic, Patch, FOX Business Network, MSNBC, BuzzFeed, The New York Daily News, The Wall Street Journal, The Huffington Post, FOX News Channel, NBC News, CBS News, ABC News, and The New York Times.

This gives us a starting point for analyzing the employment histories of the individuals working at these organizations. There are, of course, limitations to this study, as with any research. Factors such as data and resource availability force tough decisions about where to focus the analysis. Moreover, we’re working with imperfect data: US Census data is outdated, the Cision PR database serves another purpose, and LinkedIn data is self-reported. But these are all examples of “best available” data, and we’re working to creatively address many of the challenges associated with this project.


This network visualization maps the job histories of 300 employees from the Newsroom21 pilot study analyzing the workforces at Asbury Park Press, Vox Media, and New York Daily News. Organizations are connected with other organizations when multiple employees came from a given organization.

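The edge rule described in the caption, connecting two organizations only when multiple employees moved between them, can be sketched as follows. The job histories below are fabricated examples, not study data:

```python
# Minimal sketch of building the organization network from job histories:
# connect two organizations when more than one employee links them.
from collections import Counter
from itertools import combinations

histories = [
    ["Asbury Park Press", "New York Daily News"],
    ["Asbury Park Press", "New York Daily News"],
    ["Vox Media", "New York Daily News"],
]

edge_counts = Counter()
for history in histories:
    # Each pair of employers in one person's history is a potential tie.
    for a, b in combinations(sorted(set(history)), 2):
        edge_counts[(a, b)] += 1

# Keep an edge only when multiple employees share the transition.
edges = [pair for pair, n in edge_counts.items() if n >= 2]
print(edges)
```

With these made-up histories, only the Asbury Park Press / New York Daily News tie survives the threshold; the single Vox Media transition is dropped.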

Computational Campaign Coverage with the PollyVote: The data

Our goal within the research project Computational Campaign Coverage is to (a) develop a fully automated news platform for covering forecasts of the 2016 US presidential election and (b) analyze how people perceive the quality of the automated news content. For creating the automated news, we rely on forecasting data provided by the PollyVote project. As described in a previous post, the PollyVote has successfully predicted elections since 2004 by applying evidence-based forecasting principles. In particular, the PollyVote applies the principle of combining by averaging forecasts from six different methods, namely polls, prediction markets, expert judgment, citizen forecasts, econometric models, and index models.

In order to generate automated news from these data, the first step is to ensure that the underlying data are available and of high quality. That is, you want to have data that are accurate and complete. This blog post describes our efforts in gathering these data and transferring them to a format that can be used to automatically generate news.

The PollyVote method and the underlying data are published in peer-reviewed scientific journals and are thus fully transparent and publicly available. Since the PollyVote incorporates all available forecasts in the combination, the dataset is quite extensive. For example, the data that were used to predict the 2012 election include nearly 8,000 individual daily forecasts (e.g., individual polls or model predictions). Note, however, that this figure only refers to predictions at the national (popular vote) level. If one also includes forecasts at the state level, which is our goal for the 2016 election, the dataset grows dramatically. Needless to say, this situation perfectly meets the conditions under which automation is most useful: (a) good data are available and (b) a large number of routine news stories need to be written.
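The combining-by-averaging principle can be illustrated with made-up numbers: average the forecasts within each of the six component methods, then average those six means with equal weight. The figures below are invented for illustration only:

```python
# Sketch of the PollyVote "combining by averaging" principle.
# All numbers are fabricated; the real data are updated daily.

forecasts = {
    "polls":              [51.2, 50.8, 51.5],
    "prediction_markets": [52.0, 51.6],
    "expert_judgment":    [50.5],
    "citizen_forecasts":  [51.0, 51.4],
    "econometric_models": [52.3, 51.9, 52.1],
    "index_models":       [51.7],
}

def pollyvote_combine(by_method):
    # Step 1: average the forecasts within each component method.
    method_means = {m: sum(v) / len(v) for m, v in by_method.items()}
    # Step 2: average the component means with equal weight.
    return sum(method_means.values()) / len(method_means)

print(round(pollyvote_combine(forecasts), 2))
```

Averaging within components first keeps a method with many individual forecasts (such as polls) from dominating methods with only one or two.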

For generating automated news stories, we collaborate with the German company AX Semantics, which is responsible for developing the underlying algorithms. Therefore, a first challenge within our project was to develop an interface through which AX Semantics can automatically obtain the PollyVote data in a structured (i.e., machine-readable) format. To allow for this possibility, project member Mario Haim developed an API, which contains both historical and 2016 forecast data for the combined PollyVote as well as its components at the national and the state level. However, access to the API is not limited to our project partners. Instead, in an effort to make our procedures fully transparent, we decided to make all data publicly available and free to use under the MIT license. Interested users may obtain data through a specific URL, and a dedicated API call generator allows for specifying an exact request. Details on the data as well as instructions for how to obtain them can be found here. Also, note that this is work in progress. Please write to us if you find any errors in the data.
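As a rough sketch of what consuming such a machine-readable feed might look like, here is a hypothetical example. The response shape and field names are invented and may well differ from the actual PollyVote API:

```python
# Hypothetical sketch of consuming a machine-readable forecast feed.
# The record structure and field names are invented for illustration.
import json

sample_response = """
[
  {"date": "2016-04-01", "component": "polls", "level": "national", "share": 51.2},
  {"date": "2016-04-01", "component": "pollyvote", "level": "national", "share": 51.4}
]
"""

def latest_combined(records):
    """Pick the most recent combined (PollyVote) national forecast."""
    combined = [r for r in records
                if r["component"] == "pollyvote" and r["level"] == "national"]
    return max(combined, key=lambda r: r["date"])

records = json.loads(sample_response)
print(latest_combined(records)["share"])  # 51.4
```

A natural-language generation system would then slot values like this into sentence templates, which is essentially the routine-story scenario described above.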

In the next post, I will describe our approach for generating automated news articles, some of which have already been published in both English and German. Note, however, that we are still early in the process. The quality of the texts will further improve. Yet, we decided to start publishing right away so that users can track how the texts have improved over time.

Creating Community at the International Journalism Festival


(Photo credit to Andrew DeVigal)

Last week I had the pleasure of attending the International Journalism Festival in Perugia, Italy – a distinctly medieval town wherein we discussed decidedly modern issues about the future of journalism. Now in its 10th year, the event attracts throngs of journalists from across Europe, and the world, eager to learn about the latest in the field. I participated in two panels, one called “Can a Robot Do My Job?” (No, obviously) and another about “Creating Community,” which I’ll recap in this post (you can also watch the video here). Creating Community was a discussion among Greg Barber (The Washington Post), Mary Hamilton (The Guardian), Mathew Ingram (Fortune Magazine), and myself, Nick Diakopoulos (University of Maryland), moderated by Federica Cherubini.

Greg kicked off the panel by defining community as an “interaction among people” including sharing of information, and acknowledging the range and diversity of types of communities that emerge around news information. Mary spoke about serving different types of community on The Guardian including communities around shared circumstances, desires, aspirations, and needs. Each of those types of communities can be served in different ways, sometimes by meeting and finding communities that are forming in other (off-site) places. Mathew spoke about the ideal or fictitious community that we all think exists, but doesn’t really. Communities are filled with real human beings, some of whom are flawed. The struggle of media companies is to deal with the community they have rather than some ideal community.

For my part I spoke about some of the ways that algorithms might be able to shape community. In particular to help (1) filter out the low quality (profane, vulgar, inappropriate) stuff, (2) surface and highlight the really great comments, (3) identify and build social context that can inform moderation decisions, and (4) alert moderators to threads or articles where “having an adult in the room” might be good for getting the conversation back on track. The Coral Project is working on some of these problems, and other tools, like KeepCon can help with moderating out the low end. My own project at UMD called CommentIQ is focused on trying to automatically rank comments based on editorial quality criteria so that really good and interesting comments can be surfaced.
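In the spirit of that kind of comment ranking (though not taken from CommentIQ itself), a toy scorer might combine a few editorial signals. The features, word list, and weights below are invented:

```python
# Illustrative sketch of ranking comments on simple editorial criteria.
# Features and weights are invented; real systems use richer models.

PROFANITY = {"stupid", "idiot"}

def quality_score(comment):
    words = comment["text"].lower().split()
    length_score = min(len(words), 50) / 50        # reward substantive length
    profane = any(w.strip(".,!?") in PROFANITY for w in words)
    relevance = comment.get("on_topic_terms", 0)   # e.g., overlap with article
    return (length_score + 0.5 * relevance) * (0 if profane else 1)

comments = [
    {"text": "You are an idiot.", "on_topic_terms": 0},
    {"text": "The study's sample size matters here; the 2012 data had nearly "
             "8,000 forecasts, so the comparison is fairer than it looks.",
     "on_topic_terms": 2},
]
ranked = sorted(comments, key=quality_score, reverse=True)
print(ranked[0]["text"][:20])
```

Even this crude scheme captures the two moderation goals mentioned above: zeroing out the low end while surfacing on-topic, substantive contributions.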

A key point of discussion that emerged was what to do about the context collapse of so many different types of communities operating in what is essentially the same public space on news websites. From a design perspective you can have small communities and large communities, communities with high turnover or low turnover, and indeed communities that only need to be around for a day versus those that could last a lifetime. Different people can have vastly different motivations for being in an online news community: some want more facts, others want to be entertained, or to test their opinion out and develop their identity. Essentially there are a multitude of communities on any given news site, and each of those communities might need different feeding and care. Each of these types of communities potentially demands a different suitable design such as in the way contributions are moderated, in the use of “real” names or anonymity, and in the “rewards” that people are seeking in their community interactions. In terms of designing an online community it’s hard to be everything to everyone all the time.

Underlying the discussion about design challenges for creating a successful sociotechnical commenting system was this simple observation: News organizations first need to define what they want to do with their online communities. What’s the strategy? It’s only when news organizations get past the simple conceptualization of comments as something you hang off the bottom of the page, and start thinking about them as something that’s there strategically, that they’ll be able to reap the benefits and rewards of cultivating relationships with the individuals in those communities.

Report Launch: The Curious Journalist’s Guide to Data

By Efrat Nechushtai

On March 24, the Tow Center launched “The Curious Journalist’s Guide to Data” – a research project led by Tow Fellow Jonathan Stray. (The report is available to download and read at the Tow Center’s GitBook repository.) In this book, Stray examines the principles behind data journalism and, more broadly, the fundamental ideas behind the human tradition of counting things.

The launch event, held at Columbia Journalism School, featured a presentation of the report followed by a panel discussion with Meredith Broussard, Assistant Professor at the Arthur L. Carter Journalism Institute of New York University; Mark Hansen, Director of the David and Helen Gurley Brown Institute for Media Innovation; and Scott Klein, Assistant Managing Editor at ProPublica. A full audio recording of the discussion is available on SoundCloud.

The event started with a short presentation by Stray, who explained why journalists in particular will benefit from adding quantitative concepts to their toolkit: “Sometimes you look at a chart and you think you see the story, but do you really? There’s more than one story you can pull out of a dataset. In fact, there’s more than one story you can pull out of a single number. Which one do you report? That’s a point of journalistic ethics. The story that you can’t prove wrong is your best shot. If you want to prove that something didn’t happen by chance, calculate how unlikely it is.”

In the panel discussion that followed, Mark Hansen noted, “Every discipline on campus is seeing its core artifacts digitized and opened to some kind of data analytics, whether we’re talking about History, English, Architecture, Business, and so on. In journalism, as part of this larger process, we have tremendous possibilities to tell stories in dramatically new, engaging, and frankly beautiful ways, that have nothing to do with spreadsheets. We should look for data opportunities, and it doesn’t always have to mean going to the census – there are so many things we can bring in.”

“Data is socially constructed,” said Meredith Broussard. “One of the things I say to my students is that working with data is not the same as doing pure math. Data is about people counting things, so if you can understand people, you can understand data.”

When asked by the audience about the option of relying on expert statisticians, Scott Klein said, “We are starting to see more complicated models, but I would say, know how to calculate the odds. That’s a basic, basic skill that will tell you if something that happened is unusual. That will solve a lot of problems in your stories.”

Broussard added, “I recommend the buddy system – having a buddy that you can talk to about all the different kinds of topics that you walk into when you’re a journalist. Have a buddy who’s an accountant, a lawyer, a doctor, a mathematician. When I get stuck on something as a journalist, or when I feel like I’m in it over my head, I call my buddy who’s a mathematician and we talk about it.”

Panel members were then asked how news teams can render complicated graphs and figures on mobile phones, where more people are likely to read the story. “My only advice is to care enough to do it,” said Klein. “It’s a long process of testing, and it’s worth doing. I think that there is no such thing as mobile content. Wherever it is I’m reading at that moment, I want to know the same things.”


Efrat Nechushtai is a Ph.D. candidate at the Columbia University Graduate School of Journalism.

New Report — Engaging Communities Through Solutions Journalism

On April 14, the Tow Center launched “Engaging Communities Through Solutions Journalism” – a research project led by Tow Fellows Andrea Wenzel, Daniela Gerson, and Evelyn Moreno in collaboration with the Metamorphosis research group.

The report is available to download and read at the Tow Center’s GitBook repository.

Executive Summary

Substantive local news is a rare commodity in many communities across the United States. For areas with high levels of violence, crime, and poverty, this absence can be compounded by a history of stigmatization. Often the only local news available is negative.

This report explores potential impacts of local solutions journalism, particularly for underrepresented and stigmatized communities. Solutions journalism explores responses to systemic social problems—critically examining problem-solving efforts that have the potential to be scaled.

Proponents of this genre suggest these stories offer a pathway to engaging audiences. Preliminary research suggests readers of solutions-oriented stories may be more likely to share articles and seek related information.

However, little research has explored solutions journalism at the local level or in stigmatized communities. This study attempts to address this gap. Following a community-based media project in South Los Angeles, six focus groups were held with 48 African American and Latino residents examining how participants responded to the solutions journalism format.

Study findings illustrate how residents navigate and critically interpret mainstream local coverage — often using alternative digital sources to cross-check stories and seek information. The study also suggests that these residents respond positively to solutions journalism — though participants’ enthusiasm may be tempered by larger concerns regarding structural inequalities. Participants suggested they would be more likely to seek out news and share stories if solutions journalism were more common, and many suggested these stories helped them envision a way to become personally involved in community problem-solving.

Introducing “Beyond 140 Characters”: A Tow Center project about the forces that shape journalists’ strategic Twitter engagement


[Photo credit: id-iom, CC BY-NC 2.0]

In 2009, The Washington Post’s Paul Farhi wrote a feature for the American Journalism Review on what he called the “Twitter Explosion.” He asked whether the micro-blogging platform was the “world’s most overhyped technology” and the “latest media obsession,” and set out to argue for its value as a journalistic reporting tool. Seven years is a lifetime in the tech world, and today we don’t need much convincing (if any at all) that Twitter has long since found its place in American newsrooms and journalists’ digital lives.

In an attempt to control the negative repercussions of the “Twitter Explosion” and benefit from the positive ones, legacy media organizations have since created social media news desks and introduced social media training for reporters, as well as organizational policies that guide desirable, acceptable and punishable behavior (in some cases more successfully than in others; take, for example, CNN’s recent suspension of Elise Labott).

Journalists themselves, like every new Twitter user at the time, had to go through a trial and error phase to discover the risks and opportunities of Twitter. We have witnessed a learning curve that united early adopters, laggards, and everyone in between in the promise of capitalizing on what Twitter can do for them as professionals and for news production as a whole. Oftentimes, as we know from previous research, journalists have embraced Twitter to break news, disseminate content, source stories and engage their audiences with an unprecedented immediacy. While none of these are genuinely novel elements in the news cycle, Twitter as a medium enables professional journalists to pursue these activities in a broader variety of ways and under profoundly different conditions than ever before.

Despite an overall stagnating user base and recent reports of Twitter struggling as a business, it remains one of the key social media platforms for journalists and news organizations. For example, the share of users for whom Twitter serves as a source for news about events and issues outside the realm of friends and family (62%, according to Pew data from 2015) is consistently rising. More surprisingly, those users now cut across nearly every demographic group.

Over the short span of just a few years, news organizations and journalists have been propelled into a time where they are now ever more strategic in identifying the prospects of platforms like Twitter. Amidst the myriad opportunities of the current age of post-industrial journalism, they are pondering new monetization strategies, the power of clicks, user generated content and crowdsourcing.

If the on-going social media hype puts pressure on journalists to be constantly active on platforms like Twitter, and to capitalize on the many affordances of digital technologies, then what are the underlying forces that shape the conditions, degree and breadth of their Twitter engagement? Many of their activities on Twitter are directly observable, but we know little about why, how and when they engage in one way or another, or which outcomes (both actual and sought after) journalists’ efforts on the platform ultimately generate.

As a Tow Fellow, I’ve spent the fall of 2015 speaking to 26 political journalists about the motivations and strategies that shape and drive their Twitter use, as well as the benefits they get out of engaging with the platform and its users. These journalists work for some of the most influential legacy media organizations in the country, both in print and broadcasting. While I’m still sifting through the data I’ve collected, I can already say this much: journalists’ individual approaches to and skills at tweeting range across a broad spectrum, from submitting to the mandates of organizational social media policies, to competitive pressures and a “fear of missing out,” to making substantive investments in Twitter because the platform is viewed as a potential career asset in an unstable labor market.

So watch this space for an upcoming mini-series of blog posts that discuss the project’s key findings.

Illuminating 2016: Helping political reporters cover social media during the 2016 presidential campaign

Although Donald Trump seems to be getting a lot of love from journalists who cover what he is tweeting about, all of the presidential candidates are tweeting, posting, Snapchatting, and Instagramming. Their social media not only drives what reporters say about the campaigns, but also mobilizes their supporters, generates needed cash, and draws out opponents in extended tit-for-tats.

Several projects have begun to visualize the frequency of social media postings by candidates, mostly on Twitter, reporting the numbers of followers and changes in follower rates, for example, but none are looking carefully at what the candidates are actually saying. Part of the reason is that it’s easy to count frequencies of already structured data, like the number of supporters. It’s much harder to categorize, analyze, and count what candidates are saying in the unstructured data of tweets and Facebook posts.

That’s where Illuminating 2016 comes in. Our goal is to advance public understanding of what the presidential candidates are saying through their social media accounts. We are doing that by using state-of-the-art computational approaches for studying unstructured text.

We’ve been collecting all of the announced major party candidates’ Twitter and Facebook posts since they declared their presidential bids. In all, we have filled 6 servers with 24 presidential candidates’ social media messages, and of course we’re still collecting. We’ve developed categories for classifying the candidates’ messages, looking at their strategic messages that promote their policies or attack their opponents, their calls-to-action, their conversations with the public, and their attempts to inform. We’ve trained computer models to categorize the messages into 9 categories, and have so far achieved 70% accuracy with the classifications (by comparing how the algorithm performs when categorizing data where the “truth” has been established). Our categories are: attack, advocacy, image, issue, endorsement, calls-to-action, conversation, information, and ceremonial.
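A toy version of the evaluation step, comparing a classifier against hand-labeled “truth,” might look like the sketch below. The keyword rules and labels are invented for illustration; the real Illuminating 2016 models are statistical, not keyword-based:

```python
# Toy sketch of evaluating a message classifier against labeled "truth."
# The keyword rules and examples are fabricated for illustration only.

KEYWORDS = {
    "attack": {"wrong", "failed", "disaster"},
    "advocacy": {"plan", "propose", "will"},
    "calls-to-action": {"donate", "volunteer", "vote"},
}

def classify(message):
    words = set(message.lower().split())
    for label, kws in KEYWORDS.items():
        if words & kws:
            return label
    return "information"  # fallback category

labeled = [
    ("My opponent's policy is a disaster", "attack"),
    ("I propose a new jobs plan", "advocacy"),
    ("Donate today to keep us going", "calls-to-action"),
    ("The debate airs at 9pm", "information"),
]
correct = sum(classify(text) == truth for text, truth in labeled)
print(f"accuracy: {correct / len(labeled):.0%}")
```

Reporting accuracy against a held-out labeled set, as here, is the same basic procedure behind the 70% figure cited above, just at a trivially small scale.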

Our next step is to learn more about what political reporters need to help them understand what the candidates are saying on social media. We will be conducting interviews in the month of April. We also need to learn how journalists can best make use of the data we have been collecting and generating to support their job of covering what has become a remarkable election year. In May, we will do user tests with journalists to see what they find useful and what they don’t based on our data and visualizations so far.

Capturing Journalistic Practices and Norms in the Digital Age

In my research on journalism and emerging media, I am often asked to answer questions about the fast changing nature and uses of social media. How can researchers capture the constantly changing uses and nature of social media of journalists across time? And how can we use this knowledge to help academics, students, and practitioners adapt to the fast changing media world? These are two important questions that Monika Djerf-Pierre, Marina Ghersetti, and Ulrika Hedman (from the Institutionen för journalistik, medier och kommunikation) answered in a recent article.

In their article, titled “Appropriating Social Media. The changing uses of social media among journalists across time”, Djerf-Pierre, Ghersetti, and Hedman examined a number of pertinent questions: To what extent do journalists use social media, and how has the usage changed across time? How are the changing uses of social media related to journalists’ personal (age, gender) and organizational (type of work, workplace location, workplace) identities and positions? And which social media affordances do journalists recognize and value, and how does the valuation change across time?

In 2012 and 2014, the authors conducted a web survey with Swedish journalists. They found that journalistic usage of social media increased slightly in the two years between their studies. However, they argued that “some early adopters were abandoning social media, and there was a noticeable decline in the journalists’ valuation of social media affordances.” These are important findings that show how journalistic uses of social media vary through time.

My colleagues (Colin Agur and Nick Frisch) and I have also found similar results in our study of chat apps. To build on the survey findings of Djerf-Pierre et al., we conducted in-depth interviews. Our findings suggest that during crises, citizens tend to contribute more user-generated content, and journalists tend to use social media more to gather first-hand accounts of events from on-the-ground sources. I made a similar argument in my book Social Media at BBC News: The Re-Making of Crisis Reporting (Routledge, 2015). FirstDraftNews, too, has done a marvellous job documenting journalistic uses of social media in crises.

Several factors influence the type and level of journalistic uses of chat apps during political unrest. For example, journalists who have strong knowledge of the subject matter and context (e.g., geography, culture, and language) will have advantages over journalists who have flown in to report a story. Large organizations offer advantages to their journalists: they have in-house technical teams and a collaborative approach to chat app networks. Large organizations use these advantages to maximize the institutional benefits of network effects. This is an important finding that we will explore in more detail in our final report.

To learn more about our upcoming report, join Colin Agur and me at the 2016 International Journalism Festival panel titled The good, the bad and the ugly: roadmap to mobile technologies and digital journalism. We will talk about how mobile journalism is changing the ways reporters and audiences produce news. We will explore the ways mobile technology is used in news production, as well as its limits and the privacy and security issues associated with it.

(co-written with Colin Agur, Yale University)

New Report – The Curious Journalist’s Guide to Data

On March 24, the Tow Center launched “The Curious Journalist’s Guide to Data” – a research project led by Tow Fellow Jonathan Stray. The launch event, held at Columbia Journalism School, featured a presentation of the report followed by a panel discussion with Meredith Broussard, Assistant Professor at the Arthur L. Carter Journalism Institute of New York University; Mark Hansen, Director of the David and Helen Gurley Brown Institute for Media Innovation; and Scott Klein, Assistant Managing Editor at ProPublica.

The report is available to download and read at the Tow Center’s GitBook repository.

Executive Summary

This is a book about the principles behind data journalism. Not what visualization software to use and how to scrape a website, but the fundamental ideas that underlie the human use of data. This isn’t “how to use data” but “how data works.”

This gets into some of the mathy parts of statistics, but also the difficulty of taking a census of race and the cognitive psychology of probabilities. It traces where data comes from, what journalists do with it, and where it goes after—and tries to understand the possibilities and limitations. Data journalism is as interdisciplinary as it gets, which can make it difficult to assemble all the pieces you need. This is one attempt. This is a technical book, and uses standard technical language, but all mathematical concepts are explained through pictures and examples rather than formulas.

The life of data has three parts: quantification, analysis, and communication. Quantification is the process that creates data. Analysis involves rearranging the data or combining it with other information to produce new knowledge. And none of this is useful without communicating the result.

Quantification is a problem without a home. Although physicists study measurement extensively, physical theory doesn’t say much about how to quantify things like “educational attainment” or even “unemployment.” There are deep philosophical issues here, but the most useful question to a journalist is simply, how was this data created? Data is useful because it represents the world, but we can only understand data if we correctly understand how it came to be. Representation through data is never perfect: all data has error. Randomly sampled surveys are both a powerful quantification technique and the prototype for all measurement error, so this report explains where the margin of error comes from and what it means – from first principles, using pictures.
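The report’s from-first-principles treatment of the margin of error can be sketched in code: simulate many random surveys of the same population and watch how the estimates spread around the truth. This is an illustrative sketch, not taken from the report; the population share (50% “yes”) and sample size are assumptions chosen to make the familiar ±3-point margin visible.

```python
import random

# Illustrative sketch (not from the report): where a survey's margin of
# error comes from. We simulate many polls of n respondents drawn from a
# population where the true "yes" share is 50%, then look at the spread.
rng = random.Random(42)

def sample_proportion(n, p=0.5):
    """Run one simulated yes/no survey and return the observed 'yes' share."""
    return sum(rng.random() < p for _ in range(n)) / n

n = 1000  # respondents per simulated survey
estimates = sorted(sample_proportion(n) for _ in range(5000))

# The middle 95% of simulated results: the empirical margin of error.
lo = estimates[int(0.025 * len(estimates))]
hi = estimates[int(0.975 * len(estimates))]
print(f"95% of simulated polls fell between {lo:.3f} and {hi:.3f}")
```

With 1,000 respondents the simulated estimates cluster within roughly three percentage points of the true 50%, which is why pollsters quote a ±3% margin of error at that sample size: the error comes purely from random sampling, before any other source of error is considered.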

All data analysis is really data interpretation, which requires much more than math. Data needs context to mean anything at all: Imagine if someone gave you a spreadsheet with no column names. Each data set could be the source of many different stories, and there is no objective theory that tells us which true stories are the best. But the stories still have to be true, which is where data journalism relies on established statistical principles. The theory of statistics solves several problems: accounting for the possibility that the pattern you see in the data was purely a fluke, reasoning from incomplete and conflicting information, and attempting to isolate causes. Stats has been taught as something mysterious, but it’s not. The analysis chapter centers on a single problem – asking if an earlier bar closing time really did reduce assaults in a downtown neighborhood – and traces through the entire process of analysis by explaining the statistical principles invoked at each step, building up to the state-of-the-art methods of Bayesian inference and causal graphs.
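One way to make the “was it purely a fluke?” question concrete is a permutation test on the chapter’s bar-closing example. The assault counts below are hypothetical stand-ins (the report’s actual case uses different data), and the test shown here is one standard technique for ruling out chance, not necessarily the method the report walks through.

```python
import random

# Hypothetical monthly assault counts before and after an earlier closing
# time (illustrative numbers, not the report's real data).
before = [42, 38, 45, 40, 44, 39]
after = [33, 35, 30, 36, 32, 34]
observed_drop = sum(before) / len(before) - sum(after) / len(after)

# Permutation test: if the closing time changed nothing, the before/after
# labels are arbitrary. Shuffle the labels many times and count how often
# a drop at least this large appears by chance alone.
rng = random.Random(0)
pooled = before + after
trials = 10_000
extreme = 0
for _ in range(trials):
    rng.shuffle(pooled)
    drop = sum(pooled[:6]) / 6 - sum(pooled[6:]) / 6
    if drop >= observed_drop:
        extreme += 1

p_value = extreme / trials
print(f"observed drop: {observed_drop:.1f} assaults/month, p ~ {p_value:.4f}")
```

A small p-value says the drop is hard to produce by shuffling alone, which rules out a fluke but not a lurking cause, which is exactly why the chapter then moves on to causal reasoning.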

A story isn’t finished until you’ve communicated your results. Data visualization works because it relies on the biology of human visual perception, just as all data communication relies on human cognitive processing. People tend to overestimate small risks and underestimate large risks; examples leave a much stronger impression than statistics; and data about some will, unconsciously, come to represent all, no matter how well you warn that your sample doesn’t generalize. If you’re not aware of these issues you can leave people with skewed impressions or reinforce harmful stereotypes. The journalist isn’t only responsible for what they put in the story, but for what ends up in the mind of the audience.

This report brings together many fields to explore where data comes from, how to analyze it, and how to communicate your results. It uses examples from journalism to explain everything from Bayesian statistics to the neurobiology of data visualization, all in plain language with lots of illustrations. Some of these ideas are thousands of years old, some were developed only a decade ago, and all of them have come together to create the 21st century practice of data journalism.
