LIVE BLOG: Quantifying Journalism: Data, Metrics, and Computation
UPDATED June 3, 2014:
The following is the live blog from the Tow Center’s first research conference, Quantifying Journalism: Data, Metrics, and Computation, held Friday, May 30, 2014 at Columbia Journalism School.
The day-long conference included panel discussions, lectures, lightning talks, and the launch of three Tow Center reports. All sessions can be viewed here: http://cuj.tw/1g8MvZU
Download a PDF of the Conference Program.
9:15am–10:30am PANEL | Beyond Clickbait: How are news organizations actually using analytics, and what does it mean for content?
- Caitlin Petre, Tow Fellow (@cbpetre)
- James Robinson, Director of News Analytics, The New York Times (@JamesGRobinson)
- Tony Haile, CEO, Chartbeat (@arctictony)
- Daniel Mintz, Director of Business Intelligence, Upworthy (@danielmintz)
[9:10 a.m.] Welcome to the live blog! We’re kicking off today’s talks with a subject that is on the minds of many newsrooms: how to leverage analytics to drive meaningful traffic. Our panelists are just about to hit the stage.
Tune into our live stream!
[9:15 a.m.] Tow Center Director Emily Bell takes the stage and welcomes the panel to the Tow Center’s first research conference.
[9:20 a.m.] “We need to talk about the hashtag!” Bell encourages the audience to tweet with #towtalk.
[9:21 a.m.] Caitlin Petre introduces the panelists on stage. Petre says the goal for the talk is to map out the landscape of metrics and to gain a more nuanced understanding of how newsrooms are using them.
[9:27 a.m.] Tony Haile, CEO of Chartbeat, takes the mic. Haile tells the crowd that even when newsrooms have data, it’s difficult to precisely predict reader engagement; choosing metrics that align with the end goal is important. Here are some of the trends Haile has noticed in the world of analytics:
“Caring about traffic to caring about audience.”
“New and better ways to measure.”
[9:35 a.m.] Petre introduces Daniel Mintz, Director of Business Intelligence at Upworthy. Mintz says choosing the right metric to measure audience engagement is vital: there is a distinction between page views, clicks, and how long a user actually spends with a page or article. (Upworthy uses a metric called Attention Minutes.)
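For readers curious what an engaged-time metric of this kind might look like under the hood, here is a minimal sketch in Python. Upworthy has described Attention Minutes as built from frequent signals of reader activity; everything below — the function name, the ping format, and the 15-second idle threshold — is our own illustration, not Upworthy’s actual implementation.

```python
# Hypothetical sketch of an "engaged time" metric in the spirit of
# Attention Minutes: the browser sends a ping every few seconds while
# there are signs of activity (focus, scroll, video playing), and the
# server sums the time between pings, ignoring long gaps that suggest
# the reader walked away. Threshold and names are illustrative only.

MAX_GAP_SECONDS = 15  # gaps longer than this count as inattention

def attention_seconds(ping_timestamps):
    """Sum the time covered by consecutive activity pings.

    ping_timestamps: sorted list of Unix timestamps (seconds).
    """
    total = 0.0
    for earlier, later in zip(ping_timestamps, ping_timestamps[1:]):
        gap = later - earlier
        if gap <= MAX_GAP_SECONDS:
            total += gap
    return total

# A reader active for ~15 seconds, idle for 5 minutes, then active again:
pings = [0, 5, 10, 15, 315, 320, 325]
print(attention_seconds(pings))  # engaged time, not elapsed time
```

The gap threshold captures exactly the distinction Mintz draws: a tab left open for an hour adds nothing, while a few minutes of genuine reading counts in full.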
“Data just for data’s sake is useless.” “You are what you measure.”
Mintz says that as Upworthy competes in the zero-sum game for attention on behalf of things that really matter, choosing the right metrics to advance the site’s goals is the way to go.
[9:41 a.m.] Petre introduces James Robinson, Director of News Analytics at The New York Times. Robinson distinguishes between reporting analytics and insight-generating analytics.
“How did my story do?”
Metrics are “a means to an end.”
[9:50 a.m.] Petre throws out a question to the panel about commensuration: how do we compare metrics? Can we compare them?
Petre then asks, who is going to be interpreting the data? Who should and who does have the role of interpreting the data? Should it be reporters? Should it be the ones who understand what a p-value is? Editors? “The answer is ‘yes’,” says Haile. Haile says the key question is: “What can I do for this story right now?”
“If you just give numbers to people, that’s no good.”
Mintz says that data is only useful insofar as it helps make decisions in context. His analytics team handles engagement; his business team handles monetizing the engagement. Robinson doesn’t have a rule about who handles the analytics.
[10:09 a.m.] Questions from the audience:
Q: What’re some of the metrics used?
A: Are you paying attention to the content or not? Mintz says you can query a video player’s API (application programming interface) to see how long a user has been playing a video. (This is akin to Upworthy’s Attention Minutes.) Google Analytics is “super janky” and better suited to measuring engagement on e-commerce sites than on news and content sites.
Q: To what extent are advertisers considering attention?
A: Haile says “increasingly.” Brand advertisers want to communicate their message to audiences that are paying attention, and they are getting more specific about how long they want their ads shown to a given customer.
Q: What are good tools to measure social shares?
A: There is no single standard set of tracking tools; the panelists recommend off-the-shelf analytics suites over building them in-house. Robinson says social and mobile are often connected.
Q: [To Robinson] – Any advice for building up a baseline for parsing out differentiation (of users)? What are the most valuable lines to draw–is it age? Demographics?
A: Robinson says they’re still in the prototype days [so it's hard to say, exactly].
Q: Where do we draw the line between making decisions based on data versus based on intuition and experience?
A: Robinson says it’s a combination of relying on analytics, statistics, and intuition. Mintz says, “I ask people to tell me a story.” If you can’t tell him how A got to B, the correlation doesn’t mean anything.
That’s a wrap! Stay tuned in for our next panel.
TOW REPORT LAUNCH | The Art and Science of Data Journalism
- Alexander Howard, Tow Fellow
[10:35 a.m.] Alexander Howard says data journalism originated in the 1960s with computer-assisted reporting, and new devices have since driven a surge in data creation: “This is a trendy thing, but not a new thing.”
[10:40 a.m.] Howard applauds news outlets such as WNYC, New York Times and La Nación for innovative data journalism.
[10:44 a.m.] “This is just another set of tools, but the story itself still matters,” Howard says, predicting that data journalism will cease to be a niche in the future. “We don’t talk about telephone journalism, or email journalism — it’s just journalism.”
[10:48 a.m.] Howard says people need to understand the basics of data analysis and numeracy: average vs. median, statistical significance, and the difference between correlation and causation.
[10:56 a.m.] There is a huge amount of data flowing now, Howard says. From startups to social data flowing across social networks, along with open government data platforms, there is an explosion of tools that allow people to put data to use and make sense of it. The question of who has rights to these data mining tools becomes relevant.
[10:59 a.m.] Howard says there are new risks of discrimination, such as personalized redlining. “People who understand data and statistics will find examples of it.”
[11:05 a.m.] “Data journalism is the new punk,” Howard says. “Anyone can learn punk. And there is a lot of bad punk music out there, but the fact is that we all can learn these things.” He says we won’t all be computer whizzes right away, but there are many opportunities for data journalism for the masses.
[11:09 a.m.] Howard notes the necessity of government data.
[11:09 a.m.] Data-ism is a thing, Howard says. Embrace it. Be a skeptic. This kind of work matters to reach everyone and report on everyone.
@rachelllowry: Stay tuned for our next panel.
11:30am–12:45pm PANEL | Data: What is (and isn’t) it good for?
- Jonathan Stray, Tow Fellow
- Amanda Cox, Graphics Editor, The New York Times
- Dan Gardner, Author and Journalist
- Jen Lowe, Data Scientist, datatelling
- Mark Hansen, Director, David and Helen Gurley Brown Institute for Media Innovation & Professor of Journalism, Columbia University
[11:32 a.m.] “Data is never just data,” Jonathan Stray says. “It’s never about answering the question.” There are politics attached to it. “Can you really use data to decide whether two people of the same gender can marry?” Stray says journalists deal with a mix of empirical questions, which data can answer, and questions that data cannot answer; sometimes, he says, it’s difficult to determine which is which.
[11:41 a.m.] Data-backed journalism is opinion journalism, Stray says, quoting Richard Lanham: “There is no truth. There is only opinion.”
[11:48 a.m.] And yet, on the other hand, our political system would not work without data, Stray says. How, then, to reconcile the two and distinguish the quantitative from the qualitative?
[11:48 a.m.] Dan Gardner says the problem today is ignoring empirical evidence unless it happens to coincide with our biases. How to guard against such biases? “You have to demand more and better evidence,” Gardner says. He hopes that one day the real problem will be that we are paying too much attention to the data; for Gardner, we’ve got a long way to go.
[12:02 p.m.] It is important to be aware of the human side of data and its implications, Mark Hansen says. Journalists should be able to interrogate and tell stories around data: “Stories may come from a clever use of data that was collected for an entirely different purpose,” Hansen says. “Data can be a source of speculation, exploration and answers. It can be useful for helping us arrive at the right question.”
[12:08 p.m.] Stray: With increasingly sophisticated models and techniques, if we can’t explain how we arrived at a conclusion, nobody will believe us.
[12:10 p.m.] Gardner speaks about an upcoming book he is writing. The book pits volunteer forecasters against intelligence analysts who have access to classified information: “One of them is a pipeline worker in Alaska,” Gardner says. “And he’s kicking the CIA’s ass.”
[12:25 p.m.] Stray asks whether journalism should be representative, and if so, of whom and how. “Journalists do a lot of generalization without really looking at it closely.” How to guard against that?
[12:28 p.m.] Often, Gardner says, journalists and politicians make an empirical claim to cover a moral claim. Good journalism has to have both a data and a non-data component.
[12:31 p.m.] Stray asks: if we want to improve the quality of data journalism, does there have to be a standard? Gardner: “It’s that great collective argument that eventually hashes out the truth.”
[12:35 p.m.] Hansen adds: Yes, we need to set best practices and be tool builders, not just tool users, but it can be a trap to focus on the places where we get it wrong rather than pay attention to the places where we get it right.
[12:36 p.m.] Questions from the audience:
Q: Is it even possible to do representative journalism and can we learn from non-representative journalism?
A: No, Cox says. She is comfortable with slight bias: “Representativeness is not always desirable.”
Q: Are journalists going to be able to use data to call out bullshit on politicians?
A: Gardner says there are many signs of improvement: you have to have some faith in human progress, and as politicians’ feet are held to the fire, we are seeing more evidence that data can reward that kind of accountability. Hansen agrees, but says there needs to be a place, at the K-12 level, where children learn how to use data.
LUNCH TALKS | Reports from Tow Fellows on ongoing Tow Research Projects
- Andy Carvin
- Brian Abelson and Michael Keller | NewsLynx
- Nicholas Diakopoulos | Data Journalism: Algorithmic Accountability
- Susan E. McGregor | Journalism Security
Brian Abelson and Michael Keller discuss NewsLynx, a suite of open source tools for online analytics and a research project. It combines data from many sources (Google Analytics, Twitter, Facebook, press clippings, etc.) and incorporates a framework for logging qualitative “impact events.”
Software features: tracking of social media “mentions” and “likes” over time, and integration with Google Analytics.
Nicholas Diakopoulos, Tow Fellow, talks about algorithmic power. Diakopoulos says algorithms are becoming pervasive in society, even in romance. Open questions about algorithms: Is an algorithm discriminatory or unfair? Does it make mistakes that deny people a service? Does it censor? Does it break a law or social norm? Does it make false predictions? Diakopoulos’ research addresses teaching journalists algorithmic accountability, legal issues, algorithms in the newsroom, and transparency policy.
[1:37 p.m.] Susan E. McGregor presents her paper on source protection, which will be released in full next month. “If we don’t have sources, we don’t have journalism,” she said.
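NewsLynx’s idea of pairing quantitative metrics with a log of qualitative “impact events” can be pictured as a simple per-article record. The sketch below is a hypothetical illustration of that pairing; the field names and methods are our own, not NewsLynx’s actual schema.

```python
# Hypothetical per-article record in the spirit of NewsLynx:
# quantitative metrics from several sources alongside a log of
# qualitative "impact events" for the same story.
from dataclasses import dataclass, field

@dataclass
class ArticleRecord:
    url: str
    metrics: dict = field(default_factory=dict)
    impact_events: list = field(default_factory=list)

    def log_metric(self, source, name, value):
        # Namespace each metric by its source, e.g. "ga_pageviews".
        self.metrics[f"{source}_{name}"] = value

    def log_impact(self, date, category, note):
        # Qualitative events: a citation in a hearing, a policy change...
        self.impact_events.append(
            {"date": date, "category": category, "note": note}
        )

record = ArticleRecord("https://example.com/investigation")
record.log_metric("ga", "pageviews", 48210)
record.log_metric("twitter", "shares", 950)
record.log_impact("2014-05-12", "citation", "Cited in a city council hearing")
print(record.metrics)
```

The design choice worth noting is that the qualitative log sits beside, not inside, the numbers: an impact event is a dated, categorized note a human writes, which no analytics feed can supply.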
Source protection is non-negotiable, she said, and all reporters need to worry about it, not just national security reporters. There is a technology known as a Stingray that can be used to intercept cell phone communications. These devices mimic a cell phone tower and can be used to triangulate the location of a cell phone signal. The majority of the devices are controlled by the federal government, but sometimes they are shared with local law enforcement officers who use the tech to identify communications. Unfortunately, there isn’t a lot of clarity about what our [reporters’] rights are in the context of the law, and there is often a sense of helplessness when thinking about how to resolve these protection issues.
McGregor suggests that we need to educate ourselves about what is visible and how these systems work so we can protect ourselves. We need to educate; we need to organize; we need to innovate. “Digital security is herd protection.” By doing due diligence and learning about and using digital security, we will be doing a service to reporters around the world who may not have access to these sorts of technologies. Look out for McGregor’s full report, which will be released on June 18th!
[1:48 p.m.] Andy Carvin on Broken News. Carvin speaks about what happens when news organizations get it wrong, and social media makes it worse.
He recounts NPR’s misreporting of Gabby Giffords’s death after her shooting. The misreporting took place on social media, but so did the correction: because of social media’s inherent call-and-response nature, it offers a quick way to correct misinformation.
A similar situation happened with the Newtown massacre, when CNN misreported Ryan Lanza as the “murderer.” This information spread like wildfire on Twitter.
Boston bombing: Carvin flips the pattern on its head. Here, online communities made the mistake and the media followed, as opposed to the other way around, as in the Giffords and Newtown cases. What I’m really trying to do is understand the interplay between social media and the news cycle, he says. Carvin’s project will look at how journalists can embed themselves in these communities and avoid these mistakes.
[2:00pm] Journalism by the numbers: Measuring a rapidly moving target. Jesse Holcomb (speaker)
“We do data–that’s our hedgehog on politics.” “We try to tell big stories.” How is journalism being produced, consumed, and distributed?
[2:06pm] “Let’s get journalism out of the ivory tower,” says Holcomb.
[2:09pm] Holcomb explores the nonprofit news landscape. Remark: we seem to be publishing data journalism about data journalism… Questions that remain from ongoing conversations: What is a nonprofit newsroom? How many are there? 16,000 news jobs have been lost in the past decade. What’s happening among digital news publishers? About 5,000 jobs at roughly 500 digital news outlets.
[2:13pm] Despite rising citizen efforts in measurement, we still face limitations in how well we can collect data. Data on digital revenue has been harder to come by.
[2:15pm] Professional journalism revenue: $63–65 billion today. The ratio of revenue distribution has changed, more in favor of audience and non-traditional revenue. Media deserts: communities that lack good information sources, often communities of color or communities that speak English as a second language. What are some data challenges we’re encountering today? The CNNs, BuzzFeeds, etc.–how are we evaluating the engagement, consumption, and quality of these sources? It’s harder than ever for people to remember where they got their news–so many sources, whether radio, newspaper, TV, or online/mobile. There is variation in the social desirability of “important” information. Young people consider news part of the social atmosphere–present on Facebook, etc. “Ambient” news is present in feeds but not actively sought out.
[2:20pm] We are finding it increasingly difficult to aggregate and normalize data from social media sites–data secured by firms, guarded by individual users, etc.–which makes studying digital and social news behavior more and more difficult. Shout-out to work being done at MIT on mapping the diffusion of information across digital and social networks:
- Detecting and Tracking Political Abuse in Social Media: http://bit.ly/1nB0mvx
- Modeling Social Diffusion Phenomena using Reality Mining: http://bit.ly/1mRroLL
- Trends Prediction Using Social Diffusion Models: http://web.media.mit.edu/~yanival/SBP-Behavior-shaping.pdf
- Information Diffusion Through Blogspace: http://people.csail.mit.edu/dln/papers/blogs/idib.pdf

Holcomb warns that audiences ought to continue to expect incomplete, imperfect sources of data and information.
He emphasizes the importance of remaining transparent about exactly how much the data can say and about the validity of the studies behind it.
[2:30pm] Families are increasingly multi-screen users, watching various news sources simultaneously. So while we are starting to see the disappearance or obsolescence of certain legacy platforms, they’re not going away tomorrow and are still important to our studies. Our obligation is to understand how people are still engaging with news and information in places that have not become completely digitally immersed.
Check out this Storify, made by Yangbo Du @mitgc_cm
2:45pm–3:30pm TOW REPORT LAUNCH | Amateur Footage: A Global Study of User-Generated Content in TV and Online News Output
- Claire Wardle, Tow Fellow
- Sam Dubberley, Tow Fellow
[2:51pm] Now, in 2014, we take social media’s role in disseminating information about current events for granted, says Wardle. “User-generated content”–anyone have suggestions for a better phrase? requests Wardle. UGC is the wild west, with no normalization of practices; many newsrooms, as a consequence, really wanted to know what other newsrooms were doing.
[2:55pm] Wardle describes the two phases of the research: 1) the how, when, and why that the issues are built upon. The team studied eight news channels from around the world to look at UGC with an international scope, and found a way to record the channels (surprisingly, most news channels don’t actually record their own output). They observed the differences among news sources–internet vs. TV–in how they used UGC. News sources in different countries also broadcast UGC at different rates and in different amounts, and some are moving to use UGC as data.
[3:05pm] “The V word: verification.” Journalists often groan when asked to verify things, yet news organizations are absolutely terrified of putting out incorrect information, which is why they are especially hesitant to rely on social media sources. How do we verify information? Claiming experience as a journalist who has been in the field for X number of years is simply not sufficient–what about the empirical methods we have developed to study the flow and sourcing of information put out on social media? –Wardle, on verification.
Journalists need to take the legal aspects of crediting and verification seriously. Their reaction to advocates for properly crediting sources is often that their creativity is being “stifled” and that the whole issue is simply “bollocks.” –Sam Dubberley, on crediting.
[3:10pm] Crediting vs. labeling: naming the source of a photo vs. simply acknowledging that it isn’t your own. Being transparent with your audience about where the information and photography come from: whose work is it?
We need an industry standard for how to cite and label content.
[3:15pm] Wardle and Dubberley present some suggestions:
1) Crediting: newsrooms often gave “screen cluttering” as a reason for not crediting sources.
2) Newsroom technology: even newsrooms with advanced media asset management systems could not preserve the details about the content they used. Credit needs to be burned into the UGC video before it enters the system, so that information is not lost in the process.
3) Agencies: all have different standards, and newsrooms need to ask some important questions. Reuters, for example, does not credit, and can’t even if it wanted to, because it so often does not speak to its sources directly, so there’s no way of confirming the sources’ identities.
4) Social networks need to work toward a standard of use for news organizations, something like Creative Commons–a common standard across the industry so newsrooms know what they can do with content.
5) There are very limited resources and very limited training for journalists to develop these skills and become aware of these practices.
Question: What sources can journalists look to learn these skills?
Answer: The Verification Handbook, by Craig Silverman, which specifies how to verify videos and tweets. You can use Storify as well. There is a rise of resources for learning how to verify and credit, and an absence of these skills in newsrooms–if you’re a freelancer, it’s really worth the time to look into them.
REPORT LAUNCH | Sensors and Journalism
- Fergus Pitt, Tow Fellow
- Scott Klein, Assistant Managing Editor, ProPublica
- Shannon Dosemagen, Co-founder and Executive Director, Public Lab
- Joe Procopio, VP of Product, Automated Insights
- Nabiha Syed, Associate, Levine Sullivan Koch & Schulz
Fergus Pitt on robot reporters, and on the new journalistic data-gathering tools, such as drones and sensors, that are on the horizon.
Sensors and Journalism — report.
[3:30pm] Joe Procopio elaborates on the development of automated content and its application to a wide array of websites: algorithms for tone, style, topic, lexicon, and prioritization of content. It is usually applied to content that journalists leave out–fantasy football recaps and other sports stories–produced by the thousands at a time, compiling information in a timely manner and screening out outliers. As long as the data is there, Automated Insights can provide insights.
[3:35pm] Scott Klein of ProPublica, a nonprofit news outlet producing long-form investigative journalism, speaks on satellite journalism. News Applications is ProPublica’s data journalism effort: statistics, data science, and large-scale interactive databases–tools that let readers look up why the data is important and interesting to them and their communities.
[3:40pm] Reference: the sensor journalism workshop at the Tow Center: http://bit.ly/1mRQWsb
[3:45pm] Using satellite journalism to draw attention to important phenomena–for instance, the serious land erosion on Louisiana’s southern coast: seen from above, the state is losing football-field-sized pieces of land every hour.
Using people’s stories, audio, photographs, satellite imagery, to tell stories and to enhance journalism.
New way of recording history, live cams of ongoing problems, orbiting satellites capturing sections of the earth quickly every day.
[3:50pm] Shannon Dosemagen, Co-founder + Executive Director of Public Lab
Public Lab is a nonprofit, open source community supporter for open technology and science. It started with low-cost air sampling and worked with environmental justice groups. During the BP oil spill, amid a near-complete media blackout, members took to boats and beaches with basic cameras on balloons to photograph events as they unfolded–100,000 different images, mapping about 100 miles of coastline in a community-driven manner.
Created open source software platforms to allow this data to be sent out to the public. Created open-archiving systems to allow individuals to download metadata, maps, to see exact coordinates, the individuals involved in creating the data, videos, pictures, ground-field notes, etc. Doing work now on air quality sensing, water quality sensing.
Community-driven monitoring. Engage people as researchers, not as subjects. Creating access–to low-cost tools to involve people in community monitoring, journalism, involve individual in the process of science and journalism. Pull complexity off the shelf. Turn simple camera into effective data-collection device. Reimagining our relationship with the manufacturing environment.
[3:55pm] Build in openness and accountability. Create collaborative workflows. Maintain public data archives. Mainstream true accountability. Create local versions of tools.
— Nick Diakopoulos (@ndiakopoulos) May 30, 2014
[4:00pm] Moving away from concepts like good, bad, superlatives, and raw numbers like “the number 10,” in favor of percentages, ratios, etc.–values in context rather than raw data–produces a more robust report: saying an athlete “had a great day” vs. reporting x touchdowns and the percentile at which the athlete performed, compared to other players or to him/herself. — Joe Procopio
“We don’t have to be great but we can never be wrong.” – Joe Procopio, VP of Product at Automated Insights, on automated journalism #towtalk — Matt Waite (@mattwaite) May 30, 2014
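Procopio’s point about values in context can be illustrated with a toy example: compare a raw stat to a reference distribution and emit a phrase instead of a number. The thresholds, wording, and function names below are invented for illustration; this is not Automated Insights’ actual system.

```python
# Toy illustration of "values in context rather than raw data":
# rank a raw stat against a reference distribution, then map the
# percentile to a qualitative phrase for the generated sentence.
from bisect import bisect_left

def percentile_rank(value, reference):
    """Percent of reference values strictly below `value`."""
    reference = sorted(reference)
    return 100.0 * bisect_left(reference, value) / len(reference)

def describe_performance(player, touchdowns, league_touchdowns):
    pct = percentile_rank(touchdowns, league_touchdowns)
    if pct >= 90:
        quality = "an outstanding day"
    elif pct >= 60:
        quality = "a strong day"
    else:
        quality = "a quiet day"
    return (f"{player} had {quality} "
            f"({touchdowns} TDs, better than {pct:.0f}% of the league).")

# Illustrative league-wide touchdown counts for one week:
league = [0, 0, 1, 1, 1, 2, 2, 3, 4, 5]
print(describe_performance("Smith", 4, league))
```

The guardrail Procopio describes (“we can never be wrong”) lives in the mapping: the system only asserts what the percentile arithmetic supports, never an unverifiable superlative.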
[4:08pm] Public Lab: emphasizing access. — Shannon Dosemagen
Federal agencies have approached @PublicLab about helping to fill gaps in their data #towtalk — Current Public Media (@currentpubmedia) May 30, 2014
Lots for newsrooms to learn from @PublicLab‘s process in terms of openness, transparency, engagement and more. Thanks @sdosemagen. #towtalk — Josh Stearns (@jcstearns) May 30, 2014
[4:09pm] Producing new knowledge and information: research scientists and journalists come at things with different intentions and use the same tools, but are after the same “truth.” There is very much a symbiotic flow between the two fields. — Scott Klein
9 Key Principles for Open Tech, by @PublicLab: Summary of @SDosemagen‘s #TowTalk https://t.co/EpXStaKPeI — Jeremy Caplan (@jeremycaplan) May 30, 2014
[4:11pm] Bottom-up research: a problem identified by a community member calls upon an interdisciplinary team to come together and engage people in the solution. — Shannon Dosemagen
The idea of translating the intention of these satellite tools will be the next challenge. So long as we emphasize that these efforts are expressive, they are safe in First Amendment territory. It is key, then, to highlight the fact that these robots are actually directed by real people with real goals. — Nabiha Syed
Thanks, all, for coming and following the Tow Center at today’s “Quantifying Journalism” conference at Columbia University! Feel free to follow up with any questions and comments you have after the event. We look forward to hearing from you!
Top Tweets from Today’s #TowTalk “Quantifying Journalism” Event http://t.co/EL1Lsd4b6N via @SeenCo
— Jeremy Caplan (@jeremycaplan) May 30, 2014