Follow the Leaders? Jill Stein and Gary Johnson’s Twitter and Facebook Activity

Last week, the Green Party held its convention in Houston, Texas, to nominate Jill Stein as its candidate for president. The other major third party in the U.S., the Libertarian Party, chose Gary Johnson as its candidate back in May. While no third-party candidate has ever been elected president, the record-high disapproval ratings for the two major-party candidates (Hillary Clinton for the Democrats and Donald Trump for the Republicans) suggest that voters may pay more attention to third-party candidates than they have in the past.

Third parties play a strange role in American politics. Some scholars have observed that the structure of the American political system makes it hard for third-party candidates to win, in large part because of its winner-take-all elections. Third parties in America more often act as pressure groups. For example, Ross Perot, who ran for president as an independent in 1992 (and as the Reform Party's candidate in 1996), brought the issue of budget deficits to the public's attention, which encouraged Bill Clinton to reduce the annual deficit while he was president (even to the point of running budget surpluses in 1998–2001). Though a third-party candidate is still unlikely to win (on July 30, RealClearPolitics's poll average for July 18–29 put Johnson at 7.3% and Stein at 3.0%), perhaps one of these candidates will be seen as playing the role of spoiler, like Ralph Nader when he ran as the Green Party's presidential candidate in 2000.

Given the possibility that Johnson and Stein may be more prominent than third-party candidates usually are, it's worth paying attention to how they campaign. This article looks at how Johnson and Stein used Facebook and Twitter over the past several months, comparing their activity with candidates from the major parties. (I chose Stein and Johnson because they appear on enough state ballots that they could win the 270 electoral votes needed to be elected. Other third-party candidates do not appear on enough state ballots to win the election.) The Illuminating 2016 project uses computational methods to provide a data-driven look at how the candidates use Facebook and Twitter. Our algorithms classify each message by what it is trying to do: for example, calling the reader to engage in some type of action, providing information in a neutral manner, or advocating for or attacking a candidate. We break some of these categories down further, for example classifying attacks as focused on a candidate's image or their issues.

Our data collection for Johnson begins on January 6th, 2016 and our collection for Stein begins on May 5th, 2016. Generally speaking, Stein uses social media as much as or more than major party candidates; Johnson, on the other hand, is far less active. For example, in the month of June, Stein was the most active presidential candidate on social media, posting 785 messages. Clinton followed with 731 messages. Trump sent 516, while Johnson posted only 221 times. This trend holds looking at May through July: Stein posts more than 3 times as often as Johnson does.

[Charts: message types used by Stein, Clinton, Johnson, and Trump]


Stein’s strategy

The Green Party is a progressive party with a strong focus on environmental issues. The 2014 Green Party Platform describes the party as “committed to environmentalism, non-violence, social justice and grassroots organization.” The Platform advocates “safe, legal abortion”; “affirms the rights of all individuals to freely choose intimate partners”; calls for “thoughtful, carefully considered gun control”; proposes “shifting tax from individuals to corporations” and “taxing ‘bads’ not ‘goods’”; and suggests reducing the federal debt while funding “our environmental and social needs.”

Even before receiving her party’s nomination, Stein was actively pushing the Green Party message out through Twitter and Facebook. By early July, it became clear that Hillary Clinton would be the Democratic nominee after a hard-fought Democratic primary campaign where Vermont Senator Bernie Sanders tried to convince Democrats to support him instead of Clinton in order to advance a more progressive agenda for the Democratic Party. Since Sanders endorsed Clinton on July 12th, Stein has worked hard to convince Sanders’ supporters that they should shift their support to her as the standard-bearer of true progressive policies and as a genuine outsider. A few hours after Sanders’ endorsement of Clinton, Stein took to Twitter to call on his supporters to join her to “keep the political revolution going,” saying “We don’t need the Democrats.” In her attempt to win over the “Berners,” did Stein adopt a social media strategy like Sanders?



Looking at overall numbers from January 1 to July 31, her strategy does not look that similar to Sanders’. In many ways, it appears to be closer to Clinton’s. For example, consider calls to action—messages that ask people to take some action on behalf of the campaign, such as sharing a campaign message with their friends or attending a campaign event. In a recent study, the Illuminating 2016 project found that Sanders used considerably more calls to action than Clinton. Stein’s use of calls to action (16%) is closer to Clinton’s (15%) than to Sanders’ (21%). Stein’s use of informative messages (19%) is also more similar to Clinton’s (18%) than Sanders’ (13%).

[Charts: message types used by Clinton, Sanders, and Stein]

If you look at types of strategic messaging, the story is similar. Of Clinton’s strategic messages, 61% were advocacy, while 39% were attack; 71% of Sanders’ were advocacy, while 29% were attack; 62% of Stein’s were advocacy, while 38% were attack. While all three of these candidates advocated more often than they attacked, Sanders went on the attack substantially less frequently than Clinton or Stein. Given Stein’s hard push for Sanders’ supporters, it’s somewhat surprising that her social media strategy is in many ways closer to Clinton’s than to Sanders’.

As the Illuminating 2016 project has found, the Democrats in general talk about issues in their strategic messaging more than Republicans do. That said, Clinton still advocates or attacks on the basis of image more often than issues (see here for how we use the terms “image” and “issue”), with 53% of her strategic messages focusing on image rather than issues. Sanders, on the other hand, advocates and attacks on the basis of issues almost twice as often (63%) as he does on the basis of image (37%). Here, Stein charts her own course, splitting the difference: 47% of her strategic messages focus on image, leaving a slight majority to focus on issues.

Stein also does not closely mirror Clinton’s or Sanders’ strategy when it comes to types of calls to action. Clinton encourages far more digital engagement (49%) than Sanders (30%) or Stein (29%). Sanders encourages people to get out and vote considerably more (30%) than Clinton (23%) or Stein (21%). Stein encourages more traditional engagement (36%) than Clinton (20%) or Sanders (29%). Each campaign chose to emphasize different types of engagement, perhaps reflecting larger campaign strategies.

For example, Stein encourages traditional engagement to help her campaign gain access to the ballot in as many states as possible (as of August 7th, Stein is on the ballot in 24 states and in Washington, D.C.).


Sanders sent more messages encouraging people to vote, perhaps reflecting a concern that many people who support him were less likely to vote, especially during the primaries.


Clinton may be less concerned with encouraging traditional engagement (like volunteering) or voting because she has in place a strong field organization for mobilizing supporters in key states. Instead, she has used Facebook and Twitter to encourage people to learn more about her by participating in Q&As or posting their support for her where others could see it, creating a climate of positive opinion around her candidacy.


Stein's activity on Facebook and on Twitter differs in several noteworthy ways. First, she sent far more messages on Twitter (2265 between May 5th and July 31st) than on Facebook (359 during the same period). Second, more of her messages on Twitter were strategic (63%) than on Facebook (44%); she used Facebook more for informative messages (34% versus 17% on Twitter).

[Charts: Stein's message types on Twitter and on Facebook]

So far in the campaign, Stein’s use of the different types of messages has remained fairly consistent. While she has sent more Facebook and Twitter messages each month than the previous month, the relative frequencies of each type of message have not changed much. It’s still early in her campaign, but so far it looks like her campaign has found a social media strategy they like and they’re sticking with it.



Johnson’s strategy

The Libertarian Party is the party of small government, sometimes described as socially liberal and fiscally conservative. The 2016 Libertarian Party Platform says this: the “government should be kept out of the matter” of abortion; “consenting adults should be free to choose their own sexual practices and personal relationships”; the government should not make laws restricting or monitoring ownership of firearms; there should be no income tax; and the government should not be allowed to spend more money than it brings in.

While Stein works hard to attract Sanders' supporters, positioning herself as an alternative to the Democratic nominee, Johnson's positioning is less clear (the combination of social liberalism and fiscal conservatism is an unusual one in American politics). Some polls suggest that he could pull support from both Clinton and Trump in a three-way race.

Johnson generally uses social media less than other candidates. He also uses it in unique ways. Johnson uses calls to action more often (19%) than Clinton (15%) or Trump (10%). He also uses informative messages (31%) far more often than Clinton (18%), but roughly at the same rate as Trump (29%). He uses strategic messages (40%) far less often than either Clinton (57%) or Trump (47%).

[Charts: message types used by Clinton, Johnson, and Trump]

Drilling down into types of strategic messages, Johnson advocates (64%) more often than he attacks (36%), and he focuses on image (74%) more often than on issues (26%) when doing either. He attacks less often than do Clinton (39%) or Trump (43%), and he talks about image more often than Clinton (53%) or Trump (71%).

Johnson’s style of engagement is where he really stands out. While Clinton has a strong emphasis on digital engagement and Trump focuses on a combination of traditional, digital, and “get out the vote” engagement, Johnson stresses his media appearances (40%, to Clinton’s 2% and Trump’s 8%).


This follows from his campaign goal of getting “earned media”—where media outlets cover Johnson’s campaign, giving him publicity that his campaign does not have to pay for. Johnson seems to want to drive traffic to his media appearances by drawing attention to those appearances on Twitter and Facebook. This allows him to get out his message without paying for advertisements, perhaps leading to a self-reinforcing cycle of increased attention. As a third-party candidate with a smaller pool of people contributing money to his campaign, Johnson cannot afford to put as much money into spreading awareness of his name and policies as Clinton or Trump. Free coverage is particularly important for him.

[Charts: engagement types emphasized by Johnson, Clinton, and Trump]

There are noteworthy differences in Johnson’s messages on Facebook and his messages on Twitter. Johnson sends far more messages on Twitter (715 between January 1st and July 31st) than on Facebook (276 messages during the same time). On Facebook, more than 50% of Johnson’s messages are informative, with only 24% being strategic. On Twitter, 40% of his messages are strategic and only 31% are informative. Like Stein, Johnson uses Facebook more frequently for informative messages and Twitter more frequently for strategic messages.

[Charts: Johnson's message types on Twitter and on Facebook]

Unlike Stein, Johnson has emphasized different types of messages at different points in the campaign. In the months before the Libertarian National Convention, Johnson increasingly used strategic messages, highlighting his strengths and others’ weaknesses. Beginning in late May, Johnson began to use social media to inform his followers of media appearances, coinciding with a decrease in the frequency of strategic messages. This use of informative messages matches with his “earned media” strategy mentioned above (many of the informative messages inform readers of his media appearances). Other categories show more moderate changes over time, including a decline in the frequency of calls to action and conversational messages and an increase in ceremonial messages.



We might expect third-party candidates to be avid users of social media to get their message to the public. Compared to traditional campaign strategies, social media is cheaper, requires fewer staff members to reach the same size audience, and does not rely on traditional media to amplify the message. This analysis reveals that Johnson isn’t using social media nearly as much as the other candidates, though, missing an opportunity to engage his supporters directly.

This analysis also suggests that Stein isn’t simply Sanders 2.0. Rather than emulating his campaign’s social media strategy, Stein has acted more like Clinton on Facebook and Twitter. She may be calling for Sanders’ supporters to join her, but she isn’t using the social media strategies that helped Sanders start his revolution.

Sam Jackson is a PhD candidate at Syracuse University’s Maxwell School and a research assistant on the Illuminating 2016 project. For more on Illuminating 2016, visit their site.

Curious Communities: An online engagement platform meets face-to-face outreach

“Would you like to be on the radio?” The outreach producer for WBEZ’s Curious City project approaches residents with variations of this question as I shadow him in a park on Chicago’s South Side. He is not here to record a vox pop or to get person-on-the-street reactions to the latest news. Instead he is inviting people to share questions they want answered about any aspect of life in the Chicago region.


Since 2012, the Curious City project has been inviting Chicago public radio listeners to go online and nominate and vote on questions that they want a reporter to explore. A mix of staff and freelance reporters produce radio features about the questions that are selected, sometimes involving the question-askers in the storytelling process. The project was founded by Jennifer Brandel, who went on to set up Hearken, a digital platform that allows media outlets to adapt their own interactive engagement projects—and is now used by 44 state and regional broadcasters across the U.S.


Using Hearken, Curious City has produced a number of traffic-generating stories for WBEZ—exploring heavy topics like what happened to the people displaced by the construction of a major expressway, as well as lighter fare like the origins of the Chicago accent. Question-askers, though, have tended to come from areas of metro Chicago that are public radio strongholds. Seeking to expand their reach, the team is undertaking a foundation-supported initiative to engage potential audiences from areas of the city where questions have not been coming—primarily African American and Latino neighborhoods, as well as some predominantly white suburbs. The team openly acknowledges that their efforts are experimental. They are trying out a range of offline approaches—direct outreach versus outreach mediated by community institutions. They’re even seeing if it makes a difference whether the producer soliciting questions uses a microphone and recorder or pen and paper. In the end they will compare and see which tactics prove most successful at generating “novel” questions.

For the next several months, I will be following this project, with the support of the Tow Center, to explore whether this initiative has an effect on the local news communication infrastructure, and what the initiative suggests about journalistic norms regarding collaboration with audiences. I will interview journalists, editors, question-askers, community stakeholders, and residents of areas targeted by the outreach campaign.

Initial field outings have taken us to places like Jesse Owens Park in the South Side’s Pill Hill neighborhood, where we met a woman getting a golf lesson. When invited to share a question, she responded with a series of thoughtful queries about the distribution of resources between Chicago’s North and South sides. But afterwards she acknowledged she had been surprised to see us. “Quite honestly, I was like, ‘Why are these white people over here?’” she laughed. While she had never heard of the Curious City project, she liked the idea of journalists physically venturing out to get the perspectives of residents, and genuinely learning about her community. She complained that media representations of Chicago’s South Side tended to paint a monolithic picture of violence, when the reality was a tapestry of very different neighborhoods.

Sentiments like hers echo perspectives documented by a prior Tow Center study on community-based solutions journalism that my colleagues from the Metamorphosis research group and I conducted in South Los Angeles last year. Focus group participants told us they were frustrated with how media coverage stigmatized their neighborhoods. They suggested they largely welcomed reporting that took a more problem-solving approach to exploring community challenges, and that more could be done to engage residents in the process. The study called for foundations and media outlets to do more to support the process of listening to communities.

Curious City's initiative seems to be combining Hearken's digital platform with old-school, pavement-pounding outreach. So far, following their efforts is raising numerous questions about journalistic approaches to participatory media and relations between public media and marginalized publics. What makes a good question, and what happens when the burning thought on a resident's mind is a concern rather than a question? How do producers and reporters navigate power dynamics and differences of race and class when they often come into a community as outsiders? How can the question-asking process be something of value not only for the primary media outlet, but also for community institutions and hyperlocal and ethnic media?

The project also offers an opportunity to examine a media-driven effort to strengthen what communication infrastructure theory (CIT) calls the local “storytelling network.” CIT researchers have previously found that communities are more cohesive when they have stronger links between residents, local media, and community organizations—and all of these actors share an understanding of what is happening in the community. Residents who connect to strong storytelling networks tend to have higher levels of civic engagement and self-efficacy. By reaching out to residents and community groups, WBEZ may be altering the storytelling network. However, there are likely to be barriers of professional culture and language that result in mutual skepticism and incomplete communication.

In a future blog post, I will report back on how Curious City assesses its online and offline attempts to connect with new communities of Chicago metro residents, what those residents are curious about, and how they would like media to engage with their communities.

Of Twitter, time, and talking: Reflections on interviewing political journalists

Journalists are busy people. It can be hard enough to get a hold of them and even more difficult to get them to meet for coffee as participants in a research project. I braced myself to weather the storm I was certain would come as I embarked upon my project’s field work. After all, I thought, I am about to enter the news industry as one of the many players to compete for a journalist’s attention. To my surprise, things took a slightly different turn.

I recently wrote an introductory blog post about my Tow Fellow project, “Beyond 140 characters.” The piece kicked off a mini-series of blog posts that will outline the project’s key findings on how, why, and under what circumstances political journalists engage with Twitter, and which outcomes (both actual and sought after) their efforts on the platform yield. Here, “engagement” refers to all of journalists’ considerations and activities related to designing, managing, and monitoring a Twitter profile for journalistic and non-journalistic purposes, focusing primarily on active uses of the platform (e.g. tweeting and interacting with other users) but also on passive ones (e.g. merely following other profiles or discussions without publishing or sharing any content). This post is a reflection on my field work and provides transparency about how the data were gathered. It also discusses what it was like to interview political journalists, which tells us something about the myriad intricacies of their occupational realities in a time of post-industrial journalism.

If you’ve ever done research, you know that the choice of method determines the kind of data you obtain, and, ultimately, how this allows you to shed light on your chosen topic. For the “Beyond 140 characters” project it quickly became clear that if I wanted to find out about journalists’ subjective experiences and perceptions of Twitter, I needed to speak with them directly. Expert interviews became the method of choice. This qualitative research technique allows for in-depth inquiries into subjects’ individual perspectives and points of view that can be difficult to gain access to via other methodologies. Because of the semi-structured, conversational style of the interview, interviewees may speak more readily and spontaneously about the meanings and factors that motivate some of their choices and behaviors.

Access, recruiting, and the art of pleasant persistence

The sampling rationale followed four pre-defined criteria. First, as this study focuses on legacy media organizations, each journalist had to work for one of the top 25 commercial broadsheet newspapers or top three cable news channels in the United States. Second, selected journalists had to specialize in the genre of political news, as ascertained by a combination of news organizations’ staff pages and recurring authorship of political news stories. Third, due to the study’s primary concern with active Twitter usage, journalists had to have a minimum amount of platform engagement (i.e. at least 10 tweets per week during a select period). Fourth, journalists were selected in a manner so as to reflect aspects of diversity within their occupational group (e.g. age, gender, professional socialization, and position within the employing organization’s hierarchy).

I reached out to more than 100 journalists and often followed up three more times. Some were unresponsive. Some declined (either because they didn’t want to contribute, were too busy, or because their news organization had explicitly told them not to participate). Some accepted. Some interviews fell through because of continuing scheduling conflicts, others required a few attempts as journalists were pulled into covering stories as they emerged. The news cycle doesn’t stop, as one journalist reminded me when we were trying to set up our conversation:

We can talk, assuming news doesn’t break.

The digital age has fundamentally changed the way journalists do their jobs and the online environment more often than not exacerbates existing pressures. While journalists have more visibility than ever before, this does not necessarily mean they are easily accessible or readily available. Even among journalists who spoke with me, time constraints remained a central reason why some were initially conflicted about contributing to my study. One journalist later outlined those pressures and what it is like to be a journalist today:

Intense. Intense demands to provide content. Intense demands to share information all the time. Intense demands to be correct and accurate. And intense demands to promote your material; to promote it across television, Twitter, radio, TV and print and the web. And that’s a lot. […] Now you have to share it in a compelling way, you have to tell the story on Twitter.  You have to tell it in other ways.  So the demands on your time are significant.

The final sample comprised 26 participants, 24 of whom worked in editorial staff roles and two in editorial leadership. Twenty-three journalists were employed by broadsheet newspapers, while only three worked for cable news channels, somewhat limiting the insights into the possible diversity of perspectives and experiences among broadcast journalists. I interviewed 20 male and six female journalists. The following graph shows the sample distribution by age group and gender:


[Chart: sample characteristics by age group and gender]


The interview as a platform for journalistic reflection

For a researcher, participant-based data collection can be intense. You are always on and ready to (even spontaneously) fit into journalists’ schedules. You are trying to make it as easy as possible for them to meet with you (inadvertently becoming a quasi-connoisseur of a news organization’s local coffee shop scene) or talk on the phone (e.g. taking calls when they are in between meetings, while commuting or traveling). While consistency across interview modes and settings is desirable, there are practical challenges. Realistically, you are far from being on top of their list of priorities and you take the chances you get.

Once I got the chance to interview those 26 journalists, a curious thing happened. Many of them suddenly overrode their previous concerns centered on time constraints (along the lines of “your project sounds fascinating, but it’s just so busy right now” or “I would only have 10 to 15 minutes max”). The vast majority of journalists ended up speaking with me for much longer than they had initially said they were able to.

Take a look at the following graph that visualizes interview length based on interview mode for all 26 study participants:


[Chart: interview length by interview mode]


As you can see, no interviews were shorter than 25 minutes. In fact, over half were longer than 45 minutes and almost a third even lasted up to one and a half hours. Unsurprisingly, face-to-face conversations tended to be longer than phone calls. As a matter of principle, this is not to say that longer interviews are always more insightful. But we may reasonably expect that a 90-minute conversation allows greater opportunity to ask questions, clarify statements and follow up, and thus yield richer, more in-depth data than a 15-minute chat.

Overall, the vast majority of journalists confirmed that Twitter has long been pertinent to their everyday work. Yet two distinct realities of engagement, and of the discourse about it, emerged, which perhaps relate to the substantial difference between journalists' projected availability and the time they actually spent speaking with me.

1. Some journalists have a carefully curated presence on the platform. Their engagement stems from deliberate and conscious efforts and is often goal-oriented. These journalists make a substantial and strategic investment into the platform, and Twitter is something they feel they genuinely have a stake in. Naturally, they have a lot to say about it, but they rarely get a chance to discuss the distinct considerations, choices and evaluations that shape their engagement. One journalist explained:

I mean you’re trying to not only create a news source with your Twitter account, you’re trying to cultivate your own brand as a reporter; your reputation.  And Twitter is a useful tool for building your professional reputation because if you think about it, you have immediate access to some of the most influential people in the country and your followers to their phones.  They’re reading you in real-time.  So not only can you offer them smart analysis and reporting, you can also show about where you went to school or you can show about the kind of things you do once in a while with your life.  And it is the face you give to the public.

2. Other journalists are less strategic about, and less preoccupied with, their presence on Twitter. One journalist told me:

[I] thought about some things and others I just roll… just roll with.

Many admitted that the interview provided them with a rare opportunity for reflection and offered a platform for contemplating their relationship with and approach to Twitter, away from their workplace and digital lives. For example, at the end of the interview, one journalist said:

It’s hard to believe… we just talked for almost an hour and a half. I guess I had so much more to say than I realized. You know… You know in my job I don’t really get the chance to think about many of these things.

Learning from the content and context of interviews

To conclude, I would like to highlight two key take-aways from the field work stage of this project. First, prepare to be persistent, continue to follow up with potential interviewees, and accept that some journalists cannot or may not want to speak with you. You rely on journalists' voluntary participation in your project, and many may be extraordinarily generous with their time, providing you with rich data for research. Second, consider the context of your interviews, such as the interviewee's motivations to speak with you, as well as the spatial, temporal, and social aspects of the interview. This will help you to be reflexive about the interview content, but also aid in better understanding journalists' contemporary occupational environment and how they find themselves in a web of demands, risks, and opportunities. This adds an important perspective to the subsequent stages of data analysis and interpretation.

Finally, I am utterly grateful to those 26 individuals who spent their valuable time telling me about their engagement and experiences with Twitter.

I will soon be blogging about the project's key findings. So watch this space for the next post in the “Beyond 140 characters” series.

[Image credit: E. Vargas, CC BY-SA 2.0]

What will the Internet of Things do to journalism?


According to Cisco, the number of connected objects is expected to reach 50 billion by 2020, equating to 6.58 connected devices per person. They are all controlled by tiny computers that communicate with each other, in an ecosystem commonly known as the “Internet of Things” (IoT).

The IoT has implications for two distinct aspects of journalism – newsgathering and consumption. Smart devices connected to each other can be used to provide better context to a story, such as data on traffic, weather, population density or power consumption.

Innovation Day 

My team at the Associated Press recently provided our engineers with Raspberry Pis – small, easily programmable computers – and access to various sensors, so they could build innovative data-gathering prototypes and display them to coworkers.

We also explored cloud-based platforms, like Amazon’s IoT Cloud, that manage both smart devices and the terabytes of data generated by them, and that facilitate meaningful analysis and decision making.

“How can we make the Internet of Things revolution work for AP, both for our products and our journalism?” asked Vince Tripodi, our vice president of research and development. “That’s what we’re trying to find out.”

A few ideas for how news organizations can incorporate the Raspberry Pis and sensors into their reporting quickly materialized:

– We can monitor vibration and noise from entertainment and political venues to identify the most popular songs at a concert, or the biggest plays of a game, or even the quotes that resonate the most at campaign rallies.

– We can measure water quality in Rio de Janeiro or air quality in Beijing, validating data from environmental protection agencies. More broadly, we can track climate change through conditions of drought or other macro events.

– We can monitor vibrations to measure the impact of construction sites and how they affect nearby residents and businesses, or foot traffic at new and current public transportation stops to gauge their usage.
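The common thread in these ideas is watching a sensor stream for moments that stand out. As a purely illustrative simulation (the readings, threshold, and function name are invented here, not AP prototype code), peak detection over a noisy baseline might look like this:

```python
import random

def detect_peaks(readings, window=5, factor=3.0):
    """Return indices where a reading spikes well above the trailing
    average -- a crude stand-in for 'biggest play of the game' detection."""
    peaks = []
    for i in range(window, len(readings)):
        baseline = sum(readings[i - window:i]) / window
        if readings[i] > baseline * factor:
            peaks.append(i)
    return peaks

# Simulated vibration readings: a quiet baseline with two loud moments.
random.seed(0)
readings = [random.uniform(0.8, 1.2) for _ in range(60)]
readings[20] = 9.5   # e.g. the crowd reacts to a big play
readings[45] = 8.0
print(detect_peaks(readings))  # [20, 45]
```

A real deployment would read from a sensor on the Raspberry Pi instead of a simulated list, but the detection logic is the same shape.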

Beyond reporting

These new technologies will allow journalists to break more stories and dig deeper into them, further closing the gap between the media and technology industries. It’s not just the gathering of news that promises to be affected, though – how audiences consume news will also undoubtedly change.

Key drivers of the growing Internet of Things are connected cars and smart homes. The experience of talking to a dashboard in your car or asking a device at home to tell you the latest news all depends on personalized voice recognition and natural language processing.

As adoption of these smart devices grows, there will still be privacy, security and technical concerns that need to be worked through. The inconsistency of available data in lower population areas also represents a major challenge.

But it should be clear by now that the Internet of Things, like the internet in general, won’t simply go away. As media companies, we need to start thinking about how these new technologies can help us better inform the world.

Francesco Marconi is the Strategy Manager for The Associated Press and an innovation fellow at the Tow Center. Follow him @fpmarconi

Using machine learning to classify presidential candidate social media messages

Since presidential campaigns have incorporated social media into their strategic messaging, it has become more challenging for journalists to cover the election in depth, because of the large amount of data generated by candidates and the public every day. Journalists tend to focus on single quotes or tweets rather than providing analysis and reporting on the aggregate of messages on social media. But single tweets may not give people a full appreciation for the style of campaigning or the substance of the rest of the tweets.

In order to get a sense of what the candidates and public are actually saying and how candidates communicate over time, we have taken a computational approach to predicting categories of candidate-produced tweets and posts (as described in a blog post introducing the Illuminating 2016 project). We have been working on a system that automatically classifies each message into a category based on what the message is trying to do: urge people to act, persuade them to change their opinions, inform them about an activity or event, honor or mourn people or holidays, or, on Twitter, converse with members of the public. This post introduces how we use machine learning to predict categories for candidates’ messages. The data currently presented on Illuminating 2016 is accurately categorized 77% of the time. For some categories, such as call-to-action and the strategic message types of advocacy and attack, accuracy reaches 84%.

To predict presidential campaign message types, we used gubernatorial campaign data from 2014 to build initial categories and to train machine-learning models. We then tested the reliability of the best models built from the gubernatorial data and applied them to classify messages from the 2016 presidential campaign in real time. We’ve been collecting all of the announced major-party candidates’ Twitter and Facebook posts since they declared their presidential bids. In all, we have filled six servers with 24 presidential candidates’ social media messages, and of course we’re still collecting. The diagram below demonstrates how we use machine learning to train the models.

Diagram 1: Models training

To understand candidates’ social media message strategy, we collected the Facebook and Twitter messages produced by the campaign accounts of 79 viable candidates who ran for governor. Collection started September 15th, when all states had completed their primaries and shifted into the general election phase, and continued through November 7th, three days after the election. We ended up with a total of 34,275 tweets and 9,128 Facebook posts. We categorized these messages by their performative speech categories: strategic message, informative, call-to-action, and ceremonial. We also added non-English and conversational categories (conversational applies only to Twitter). These categories allow us to understand when candidates advocate for themselves, go on the attack, urge supporters to act, and use the affordances of social media to interact with the public.

These categories were developed deductively and were revised based on inductive analysis. We trained annotators and refined the codebook over several rounds until two or more annotators could look at the same message and agree on the category. We generate an inter-coder agreement score to determine how easy or hard it is for humans to categorize the messages, and also to make sure our categories are clearly defined and as mutually exclusive as possible. Our agreement reached a Krippendorff’s alpha of .70 or greater on all categories. After annotating data independently, annotators developed gold-standard annotations: two coders categorized the same messages, and where they disagreed, they talked it through and decided which was the “best” category for that message. This process produced gold-standard labels for 4,147 tweets and 2,493 Facebook messages from the candidates.
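Krippendorff’s alpha corrects for chance agreement and is usually computed with specialized statistical packages. As a simpler illustration of the underlying idea, here is a sketch of raw percent agreement between two hypothetical annotators (the labels below are invented for illustration):

```python
def percent_agreement(coder_a, coder_b):
    """Fraction of messages two annotators placed in the same category."""
    assert len(coder_a) == len(coder_b)
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Hypothetical labels for eight tweets from two annotators.
a = ['advocacy', 'attack', 'informative', 'call-to-action',
     'advocacy', 'ceremonial', 'attack', 'informative']
b = ['advocacy', 'attack', 'informative', 'informative',
     'advocacy', 'ceremonial', 'advocacy', 'informative']
print(percent_agreement(a, b))  # 0.75
```

Raw agreement overstates reliability because some agreement happens by chance, which is why the project reports alpha rather than simple percent agreement.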

We used these gold-standard data as training data to build models, and then applied the best models to unlabeled candidate messages on Facebook and Twitter. Before building the models, we represented the text and added relevant language features and political characteristics for training purposes. For example:

  • We replaced instances of user tagging (e.g. @abc) and URLs (e.g. http://abc) with the generic tokens USERNAME and URL;
  • We removed stop words;
  • We used unigram and bigram document representations;
  • We transformed messages into sequences of Boolean features;
  • We added a feature indicating whether a tweet starts with an @mention;
  • We used part-of-speech tagging, e.g. whether a tweet or Facebook post starts with a verb;
  • Given the characteristics of election data, we also added relevant political features, e.g. political party (Republican, Democrat, Third Party), to help model training.
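A minimal sketch of these preprocessing steps in plain Python (the stop-word list and example message are illustrative; the project’s actual feature pipeline is richer):

```python
import re

# Tiny illustrative stop-word list; real pipelines use much larger ones.
STOP_WORDS = {'a', 'an', 'the', 'to', 'of', 'in', 'is', 'for', 'and'}

def preprocess(message):
    """Normalize a message roughly as the bullet points describe."""
    text = re.sub(r'@\w+', 'USERNAME', message)      # user tags -> USERNAME
    text = re.sub(r'https?://\S+', 'URL', text)      # links -> URL
    tokens = [t for t in re.findall(r"[A-Za-z']+", text.lower())
              if t not in STOP_WORDS]                # remove stop words
    unigrams = tokens
    bigrams = ['{} {}'.format(x, y) for x, y in zip(tokens, tokens[1:])]
    # Boolean feature set: a term is either present or absent.
    return set(unigrams + bigrams)

feats = preprocess('Join @jillstein in Houston! Details: http://example.com')
print(sorted(feats))
```

Each message thus becomes a set of present/absent terms, which is the Boolean unigram-and-bigram representation fed to the classifiers.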

For algorithm building, we used Scikit-Learn to run experiments with the following multi-class classification algorithms: Support Vector Machine (SVM), Naïve Bayes (NB), MaxEnt/Logistic Regression, and Stochastic Gradient Descent (SGD). All classification tasks were evaluated with 10-fold cross-validation, and we use a micro-averaged F1 score to measure prediction accuracy (the F1 score reaches its best value at 1 and its worst at 0). For Twitter data, the best micro-averaged F1 score is 0.72, as shown in Table 1, achieved by an SGD classifier using Boolean features together with the starts-with-@mention, starts-with-verb, and party features. The F1 value for strategic messages reaches 0.75. For Facebook data, the best micro-averaged F1 value is 0.73, achieved by a Linear SVC classifier using Boolean features and the party feature. The F1 value for call-to-action reaches 0.80. By comparison, the majority baseline is 37.6% (1,559/4,147) for Twitter and 40.1% (999/2,493) for Facebook. It should be noted that the F1 score for ceremonial messages is low: there are far fewer of these messages, and they express a wider range of features, making them harder to classify.
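A toy version of this setup in Scikit-Learn might look like the following; the training messages and labels here are invented, and the real models add the engineered features described above alongside the text features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; the real models trained on thousands of
# gold-standard messages.
texts = [
    'Donate today and join our movement', 'Sign up to volunteer now',
    'My opponent voted against the budget', 'Their record on jobs is a failure',
    'I will fight for clean energy', 'Our plan puts families first',
    'Town hall tonight at 7pm in Des Moines', 'New poll numbers out this morning',
]
labels = ['call-to-action', 'call-to-action', 'attack', 'attack',
          'advocacy', 'advocacy', 'informative', 'informative']

# Boolean unigram+bigram features with stop-word removal, SGD classifier.
model = make_pipeline(
    CountVectorizer(binary=True, ngram_range=(1, 2), stop_words='english'),
    SGDClassifier(random_state=0),
)
model.fit(texts, labels)
print(model.predict(['Volunteer with us today']))
```

With real data one would evaluate via `sklearn.model_selection.cross_val_score` with ten folds and `scoring='f1_micro'`; the toy corpus here is far too small for that to be meaningful.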

Table 1: Machine prediction performance for Main Categories in Twitter and Facebook

For Strategic Message type prediction, we trained the classifiers with training data labeled as Strategic Message: 1,559 tweets and 860 Facebook posts. Each message is classified as either Advocacy or Attack. As shown in Table 2, the micro-averaged F1 scores for Twitter and Facebook data are 0.80 and 0.84. By comparison, the majority baseline is 69.4% (1,082/1,559) for Twitter and 62.8% (540/860) for Facebook. Similarly, our Strategic Message focus classifiers were trained on the messages labeled as Strategic Message. Each message is classified as Image, Issue, or Endorsement. As shown in Table 3, the micro-averaged F1 score for the focus category is 0.77 on both Twitter and Facebook. By comparison, the majority baseline is 48.2% (751/1,559) for Twitter and 50.6% (435/860) for Facebook.

Table 2: Machine prediction performance for Types of Strategic Messages in Twitter and Facebook

Table 3: Machine prediction performance for Focus of Strategic Messages in Twitter and Facebook

All of the micro-averaged F1 scores reported above are much higher than the baseline scores, which suggests that the models have been trained to categorize candidate-produced messages well.

We are still testing the reliability of the current best models on presidential campaign data. When applying the best models reported above to 2,989 human-corrected presidential tweets and 2,638 Facebook posts, we found that the models generally still worked well, as shown in Table 4. However, the F1 score for the conversational category dropped by 20%. We suspect there are differences between gubernatorial and presidential campaign data in this category, and we are currently investigating the possible reasons.

Table 4: Machine prediction performance on presidential data using gubernatorial data as training data

We also ran experiments using only presidential data as training data to test model performance. For Facebook, the model performs well at predicting strategic messages (F1 = 0.77) and call-to-action (F1 = 0.86), as shown in Table 5.

Table 5: Machine prediction performance on presidential data using presidential data as training data

Our next step is to run more experiments to improve the models, e.g., experimenting with binary classification and adding opinion and sentiment classification. We are now applying the best models to categorize messages generated by candidates in the 2016 presidential campaign. On the Illuminating 2016 website, reporters and the public can see presidential campaign message types instantaneously. We are also pulling public commentary on the election from social media and categorizing it; public commentary analysis will appear on the site in August.

Thanks to Sikana Tanupambrungsun and Yatish Hegde at the School of Information Studies at Syracuse University for data collection and model training. 

A Guide to Journalism and Design

Executive Summary

It seems rare, these days, to encounter a conversation about the future of journalism that does not make some reference to the cluster of concepts known variously as design thinking, design practice, or human-centered design. Innovative news organizations, for example, are successfully deploying versions of this philosophy to develop journalism products with remarkably high user engagement. But there is much confusion over what design and design thinking really mean, especially in a journalistic context – never mind how the philosophy might actually be implemented with successful results.

This report first proposes a clearer definition of design – as a practice based on a set of processes and a mindset. It then suggests moving away from the phrase “design thinking,” which has become closely identified with a specific five-step process that could actually be limiting to news organizations. The report also identifies those types of problems, known as “wicked problems,” which could benefit most from the design process, arguing that many of the severe challenges journalism faces today belong to this category. Drawing on interviews with designers and journalists, and four in-depth studies of design in use – at BuzzFeed, The New York Times, National Public Radio, and – the report next explores concrete ways in which others might use these processes as a foundation for news innovation.

The research in this paper identifies several key benefits of design philosophy in creating new possibilities for journalism, including the ability to rapidly prototype and modify new products before major resources have been committed; to improve journalism by deeply understanding its role in the real lives of those who consume it; and to work directly with the communities in which news organizations are embedded to generate coverage and tell stories of direct relevance.

The report also sounds some cautionary notes. First, we must avoid fixing the definition of design too rigidly into a specific sequence of steps that must always be followed; otherwise we risk undermining the very flexibility and responsiveness to context that are central benefits of the approach. Second, while embracing design’s emphasis on paying close attention to the needs and preferences of users, as journalists we must retain a commitment to reporting in the public interest, rather than making editorial decisions solely in favor of stories and products that bring the most financial success.

Key Observations

This report specifies the following eight aspects as central to design in the context of journalism:

  • Thinking in systems; understanding news stories and news organizations as existing in a wide variety of larger informational, social, and organizational ecosystems.
  • Centering innovation on humans, not technology; serving audiences while resisting the assumption that an innovation is worthwhile just because new technology makes it possible.
  • Identifying the true problem, thereby avoiding the many pitfalls of simply assuming you know what it is.
  • Deep listening and other tools to empathize in profound ways with the realities of users’ lives so as to meet their needs more effectively.
  • Open ideation; a democratic and transparent set of approaches for generating ideas (in which brainstorming is only a first step).
  • Synthesizing and interrogating ideas, a process distinct from initial ideation, to winnow the best ideas from the rest and combine related ideas into coherent wholes.
  • Prototyping and iterating, or “the learning that happens through doing” – the process of making and using versions of the product from the earliest stages to reach understandings that could not be achieved through thinking alone.  
  • Testing, part of the prototyping and iterating cycle wherein designers observe people engaging with what they’ve made to see how it’s actually used – not how they assumed it would be used.

The report also identifies several primary applications of design in journalistic contexts, offering detailed suggestions for implementation in each case:

  • Testing and adapting new product ideas before they absorb vast organizational resources, to rapidly and affordably identify the most promising avenues for innovation.
  • As a tool for directly interacting with news audiences to better understand how they really use news organizations’ products.
  • As a way to reconceptualize each instance of journalism as belonging to a wider journalistic system, so that stories and other elements can be created to exist and reach users in a wide variety of forms and on a variety of platforms.
  • To facilitate civic journalism – drawing directly upon the experiences of news consumers as a primary source of stories – and solutions-focused journalism, exploring not only the problems of the communities served by news organizations, but also ways to address them.

Read the full report, Guide to Journalism and Design, here, plus: Heather Chaplin’s article at Columbia Journalism Review on how The New York Times is incorporating design into audience research.

Data Cleaning With Muck

Data journalism is a growing field, and as the practice becomes more sophisticated, data journalists are facing all the challenges of modern software development. To manage the increasing complexity of data projects, journalists will need good tools. Muck is a prototype tool for data analysis, with a particular focus on data journalism. It allows users to take an incremental, iterative approach to their work. By breaking programming tasks into smaller steps, journalists can better explore their data, develop stories and try new techniques, while at the same time making their work more transparent and reproducible.

Projects built with Muck are structured as a collection of interdependent steps (we refer to this structure as the “dependency graph”). With a simple naming convention and a little behind-the-scenes magic, Muck is able to infer the dependencies between source code and data files. With this information, Muck rebuilds only the parts of a project that have changed, and their dependent parts. For large datasets, the time savings can be dramatic, allowing users to explore their data with little overhead. Our goal is to provide an environment that encourages correctness and clarity of both code and data, while remaining fast, pragmatic and ergonomic. This post describes a few early results, specifically facilities for patching text and transforming records.


Muck fills a role similar to the traditional Unix tool Make (see: paper and reference). Make, first released in 1976, is a command-line tool for automatically building products (typically compiled executables) from source files in an incremental fashion; when asked to produce a particular product (or “target”), it will only rebuild relevant portions of the project that have changed since the last build. The process is directed by a “makefile”, which specifies, for each target product to be built, the list of source files and products that the target depends on (the “dependencies”), as well as the commands to build the product (the “recipe”). In essence, the developer explicitly writes out the project’s dependency graph in the makefile, thus enabling Make to perform incremental builds.

Muck improves on this paradigm by eliminating the need for the makefile. Instead, it examines the contents of source files whose names match those of the target products, and infers the dependency relationships. While Muck is certainly not the first system to boast this capability, it is notable for supporting both source and data formats common to data journalism.
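One simple way such inference can work is to scan a script’s source for references to data files. The sketch below is an assumption about the mechanism rather than Muck’s actual code; it looks only for `muck.source(...)` calls:

```python
import re

def infer_deps(source_text):
    """Guess a script's data dependencies by scanning its source for
    muck.source(...) calls -- a simplified sketch of dependency inference."""
    pattern = r"muck\.source\(\s*['\"]([^'\"]+)['\"]"
    return sorted(set(re.findall(pattern, source_text)))

script = '''
import muck
rows = muck.source('entries.csv')
patched = muck.source("fixes.pat")
'''
print(infer_deps(script))  # ['entries.csv', 'fixes.pat']
```

Muck’s real inference also relies on the file-naming convention described above (matching source names to target product names); this sketch shows only the source-scanning half of the idea.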

Test Project: Webster’s 1913 Dictionary

Once we got the basic dependency calculations and build commands working, we began several test projects that could help guide development in a practical direction. One ongoing experiment parses and analyzes the Project Gutenberg version of Webster’s 1913 Unabridged Dictionary. The work required a variety of techniques that are fundamental to data journalism: web scraping, error correction, and text parsing. The project is not yet finished, but our experience thus far has led to some interesting additions to Muck.

The dictionary text that we have been working with is quite messy. Its basic structure is straightforward, but we have encountered exceptions in the text at nearly every step of development. At over 100,000 entries, the dictionary is too large to correct by hand (it appears that there have been valiant efforts over the years, but a variety of problems remain), so getting good results is a real challenge.

Structural Problems

The first, most glaring problem is that splitting the text into discrete records fails in a few places. These flaws are easily understood, but the code to correct them ranges from straightforward to convoluted.

Ambiguous Escape Sequences

The text contains a variety of obscure escape sequences (specific patterns of text intended to encode symbols or meaning that is not otherwise representable), some of which cannot be automatically parsed because the sequences also occur as legitimate, unescaped text. Simple find-and-replace operations using regular expressions yield lots of false positives.

Common and Rare Flaws

Some flaws occur once or a handful of times, while others occur thousands of times. Time-efficient strategies for correcting rare versus common flaws tend to be quite different. Sadly, it seems that the only way to know whether it makes more sense to correct by hand or programmatically is to try both!

Structure and Information Loss

One interesting thing about the English dictionary (for a programmer at least) is that the text is much more mechanical than prose, but still not so rigidly defined that it can be parsed like a programming language. The pronunciations, parts of speech descriptions, etymologies, and even the definitions themselves are written in a systematic style (although the early Webster’s Dictionary is famous for its colorful definitions: James Somers’ blog post was our initial inspiration for the project). Nonetheless, there seems to be an exception to any syntactic rule that one might conceive, and as with natural languages, some ambiguities can only be resolved by semantic understanding of the content. To make matters worse, crucial punctuation like matching parentheses and brackets is missing in some cases. Writing code to parse these various elements into structured data has been downright maddening.

Patching Data

Data cleaning can be a challenging, time consuming process. The primary goal of Muck is to make it easy to create perfectly reproducible data projects; a curious reader should be able to check out a project from a source repository like Github, run the `muck` command, and reproduce all of the computations that create the analysis. Ideally, the various processes that go into the computation will be well organized and easily audited for correctness.

Programming languages offer tremendous capabilities to automate such tasks, but such power comes with the risk of overcorrection. Sometimes it is easier (and less confusing) to make a correction by hand, rather than via code. However, simply altering the original source data is not a reproducible practice, and too many hand corrections make it impossible to properly fact-check a data-driven story.

A classic solution to this problem has been available in the Unix programming world for many years: the `diff` and `patch` tools. “Diffing” is the process of calculating a “diff” (also called a “delta” or “patch”) from two versions of the same document: the diff shows the edits needed to transform the original version into the modified version. “Patching” is the process of applying a patch file to an original document to produce the modified result. Traditionally, these tools have been used to track and communicate changes to program source files, and form the conceptual basis for modern version control systems like Git. However, thanks to the Unix tradition of designing tools to operate on text, `diff` and `patch` are easily applied to text-based data formats as well.
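Python’s standard library exposes the same idea through `difflib`. This sketch (with invented file contents) shows how a unified diff records two hand corrections to a small data file:

```python
import difflib

# Two versions of a tiny data file: the original, and a hand-corrected copy.
original = ['colour\n', 'centre\n', 'theatre\n']
modified = ['color\n', 'centre\n', 'theater\n']

patch = difflib.unified_diff(original, modified,
                             fromfile='raw/entries.txt',
                             tofile='clean/entries.txt')
print(''.join(patch))
```

The resulting `-`/`+` lines are a human-readable record of exactly what was changed by hand, which is what makes patches auditable.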

So, we added support for patch files. Muck treats them as just another source format, with a single data dependency (the “original” file) and a single output (the “modified” file). The benefit was immediately obvious: patch files are human-readable, allowing the patches to function as a reproducible version of the “data diary” that many data journalists use. Unfortunately, the various traditional patch formats have some shortcomings for our purposes:

  • “Unified diff” (the standard modern patch format) hunks contain hard-to-read positional information at the top of each hunk; diffs produced by `git` are even worse, containing hash values and contextual “@” lines designed specifically for source code.
  • Empty Unix patch files are truly empty, omitting the original file name needed by Muck for dependency inference.
  • The file paths in patch hunks often contain the build directory prefix, a minor but annoying detail.

Of these, the most significant (and surprising) problem is that for some workflows, once a patch is created, it makes more sense to edit the patch directly rather than to edit the “modified” file and recompute the patch. This is especially true when reviewing a set of patches; occasionally we would find a typo or change our correction strategy part way through. Unix patch formats were not designed with hand-editing in mind. As an experiment, we created our own patch format and tool, called Pat. The pat format is similar to the traditional “unified diff” format, but addresses the above shortcomings directly, and provides us with a means of experimenting further with this sort of workflow. In particular we would like to add commenting syntax, escaping of control characters, line splitting, and intra-line highlighting for ease of use. Pat is still in early development, and currently lacks documentation, but the code can be found at

While the patching methodology in Muck needs some refinement, it has already proved useful. We believe patching is an important tool for achieving reproducibility in real-world projects because it offers a middle road between manual editing and programmatic correction. The technique should be used with discretion though, and knowing when to switch strategies from patching to programmatic correction is largely a matter of experience. Choosing the right strategy for a given problem often requires experimentation.

Cleaning and Verification

Regardless of whether flaws are fixed by hand or via code, a fundamental challenge is to apply fixes without introducing new flaws. Good developers typically use some sort of testing framework to verify that their code works as intended, but how best to apply these methodologies to data problems is not obvious. Often, flaws are discovered in the course of implementing some end goal or feature, but are best fixed somewhere earlier in the data pipeline. The result is a disconnect between the logic that checks for a flaw and the process that fixes it.

Before proceeding, we should explain our use of the terms “check” and “test”. We make a distinction between “checking” for flaws in the data, with logic in the main program, and “testing” the program for logical flaws, via external testing code. This distinction becomes fuzzy if “checks” get factored out into scripts that are not part of the main computational graph, because once externalized they essentially become tests against the data. There are further elaborations to be explored (e.g. best practices regarding assertions and exceptions versus non-fatal logging in data cleaning code) but the main point is that traditional software testing methodologies do not map perfectly to the needs of data cleaning projects.

Once a data cleaning fix is implemented, the checking logic takes on a questionable role: either it issues a warning prior to the fix, or it is applied afterwards and remains silent. In the former case, the programmer quickly learns to ignore the message, and it only serves to obscure more meaningful warnings. In the latter, the developer has several options:

  • remove the check because it is slow, confusing or inconveniently located;
  • leave the check in place, where it no longer executes;
  • factor it out into some side branch of the computational graph.

Removing the checking code is undesirable, because doing so eliminates the evidence of the flaw and thus the primary explanation of why the fix exists. At the same time, code that never executes is a notorious liability; as projects evolve, unexercised code tends to become incorrect. Only the last option sounds promising, but just moving the code into a separate program does not ensure that the code will get executed or maintained. The broad question we need to address is this: what are good strategies for clarity and correctness when working with these sorts of checks and fixes? At the very least, factoring out the checking code from the program pipeline and into a side branch allows it to be read and run as part of a manual audit, or via automated testing.

As a first step, we implemented a new file source type in Muck, “.list”, which is simply a list of scripts to execute. By creating a top-level ‘test.list’ file listing the test scripts that have accumulated in the project, we can simply run `muck test` and all of the checks will run.
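A sketch of such a runner in Python (the file names are illustrative, and this is not Muck’s implementation):

```python
import os
import subprocess
import sys
import tempfile

def run_list(list_path):
    """Run each script named in a '.list' file, stopping on the first
    failure -- a sketch of what a `muck test`-style runner might do."""
    with open(list_path) as f:
        scripts = [line.strip() for line in f if line.strip()]
    for script in scripts:
        subprocess.run([sys.executable, script], check=True)
    return scripts

# Demo: one hypothetical check script listed in a test.list file.
d = tempfile.mkdtemp()
check = os.path.join(d, 'check_entries.py')
with open(check, 'w') as f:
    f.write("assert 2 + 2 == 4\n")
lst = os.path.join(d, 'test.list')
with open(lst, 'w') as f:
    f.write(check + '\n')
print(run_list(lst))
```

Because `check=True` raises on a non-zero exit status, a failing check script halts the whole run, which is the behavior you want from a test suite.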

Standalone tests are useful, but by themselves they don’t make the work much easier. At worst, they become a dumping ground for old code. What we need is a way to show how the flaws and fixes are related to the data pipeline. Ideally, we would be able to log records that fail checks, even after fixes have been added, so that reviewers can confirm that the fix behaves as intended. The intent of the code would also be much clearer if check, fix, and logging logic were colocated in some fashion.

All of this suggests a major limitation of the file-based perspective inherent to Muck: many operations apply on a per-record basis, rather than per-file. Thus, an emerging goal is to articulate and enable a per-record data cleaning strategy:

  • Identify a flaw and implement checks for it.
  • Implement a fix.
  • Preserve both the check and the fix in the codebase in a clear, coherent way.
  • Continually verify that the check and fix remain well behaved using tests.

Auditable Data Transformations

We want the check for a flaw, the application of a fix, and any reporting of the occurrence to be expressed together in a cohesive, clearly written unit. After some experimentation and quite a bit of supporting work, Muck now supports just such a workflow, via a function called `transform`. The technical description of how it works is more painful than using it, so we’ll start with an example:

import re
import muck

with muck.transform('input.txt') as t:

  @t.keep
  def non_empty_lines(line):
    'lines containing only whitespace are omitted, without logging.'
    return bool(line.strip())

  @t.drop
  def vulgar(line):
    'profane lines are omitted and logged.'
    return re.match(r'darn|gosh|shucks', line)

  @t.edit
  def trumpet(line):
    'all occurrences of the word "tweet" are replaced with "trumpet",'
    'preserving the leading capital; altered lines are logged.'
    return re.sub(r'([Tt])weet', r'\1rumpet', line)

  @t.convert
  def br_tags(line):
    'capitalize each line; no logging.'
    return line.capitalize()

  @t.flag
  def big_words(line):
    'log lines with long words, but do not alter them.'
    return any(len(word) > 16 for word in line.split())

  @t.put
  def out(line):
    'write the final result line.'
    print(line, end='')

It’s worth admitting up front that `muck.transform` uses several advanced python features in combination – this makes it a bit tough to describe. `transform` is meant to be used at the top level of a python script, and just like `muck.source`, it takes the name of the input data as its first parameter. It returns a `Transformer` object (`t` in the example), which provides several methods that serve as function decorators. Each decorator indicates a different kind of transformation. When applied to a function, a decorator adds a new stage to the pipeline.

All transformation functions must take a record as their sole parameter. Each kind of transformation has a different behavior:

  • `drop`: if the function returns True, the record is logged and dropped from subsequent processing.
  • `keep`: if the function returns False, the record is dropped; no logging takes place.
  • `edit`: the function returns a value to be passed to the next stage; if the returned value is not equal to the original value, then both are logged as a pair of -/+ lines.
  • `convert`: like `edit`, the function can return either the original or a new value; no logging takes place.
  • `flag`: if the function returns true, then the value is logged; the value is never altered.
  • `put`: the function is expected to write the output or otherwise consume the value; no logging occurs. Usually there is a single `put` at the end of the pipeline, but `transform` allows zero puts (perhaps for a script that just checks validity) or many puts (for example, if some intermediate output or multiple output formats are needed).

All stages are applied to each record in the order in which they were decorated. For those modes that feature automatic logging, the complete effect of the transformation is reported in a dedicated file, without the user having written any reporting code. This leads to a much better verification experience for the user, because Muck’s logging facilities are fancier and more organized than the typical debugging code the user would write. The logging feature also obviates the need for such clutter in project code.
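To make the mechanics concrete, here is a minimal sketch of how a decorator-driven pipeline of this shape could be implemented. The class and function names here are hypothetical stand-ins, not Muck’s actual internals, and only four of the six modes are shown (logging is noted in comments but omitted):

```python
import re

class Pipeline:
  'Hypothetical sketch of a decorator-driven record pipeline (not Muck internals).'

  def __init__(self, records):
    self.records = records
    self.stages = []  # (mode, function) pairs, applied in decoration order.

  def _add(self, mode):
    def decorator(fn):
      self.stages.append((mode, fn))
      return fn
    return decorator

  def keep(self, fn): return self._add('keep')(fn)
  def drop(self, fn): return self._add('drop')(fn)
  def convert(self, fn): return self._add('convert')(fn)
  def put(self, fn): return self._add('put')(fn)

  def run(self):
    for record in self.records:
      for mode, fn in self.stages:
        if mode == 'drop':
          if fn(record): break  # record dropped; real Muck would also log it.
        elif mode == 'keep':
          if not fn(record): break  # record dropped silently.
        elif mode == 'convert':
          record = fn(record)  # replace the record for subsequent stages.
        elif mode == 'put':
          fn(record)  # consume the record.

# Usage mirroring the article's example:
results = []
p = Pipeline(['hello\n', '   \n', 'darn it\n', 'world\n'])

@p.keep
def non_empty(line):
  return bool(line.strip())

@p.drop
def vulgar(line):
  return bool(re.match(r'darn|gosh|shucks', line))

@p.convert
def capitalized(line):
  return line.capitalize()

@p.put
def out(line):
  results.append(line)

p.run()
# results == ['Hello\n', 'World\n']
```

The key design point is that each decorator only records the function and its mode; nothing happens until `run` iterates the records through the stages in decoration order.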

For the technically inclined: there is a `run` method that actually performs the transformation on the input sequence. Note that `run` is not called in our example; instead we use the `with` form to treat the `Transformer` `t` as a Python “context manager”. The `__exit__` method of `Transformer`, which is automatically called at the end of the `with` scope, simply calls `run`. This usage pattern is optional, and exists purely as a convenience.
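That context-manager behavior can be sketched in a few lines; again, the names here are hypothetical stand-ins rather than Muck’s actual code:

```python
class Transformer:
  'Sketch of the context-manager convenience (hypothetical, not Muck internals).'

  def __init__(self, records):
    self.records = records
    self.ran = False

  def run(self):
    # Stand-in for the real pipeline execution over self.records.
    self.ran = True

  def __enter__(self):
    return self  # the object bound by `with ... as t`.

  def __exit__(self, exc_type, exc, traceback):
    if exc_type is None:
      self.run()  # fire the pipeline when the `with` block ends cleanly.
    return False  # never suppress exceptions.

with Transformer(['a', 'b']) as t:
  pass  # stages would be decorated here.
# At this point the `with` block has exited and run() has been called.
```

Guarding on `exc_type is None` means the pipeline does not run if the `with` block raised, which is the sensible default for this pattern.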

Our experience with `transform` so far has been that it speeds up development by letting us alter and rearrange multiple stages easily. The individual stages tend to be quite simple and easy to read, whereas our previous solutions were larger blocks of code, often inside nested `for` loops, which are more difficult to reason about. The automatic logging performed by `transform` makes it easy to verify that a given stage is performing as intended, which saves even more time over the course of a project.


Muck is not yet a mature tool, but our experience with it thus far has been promising. The framework it provides for data scripting tasks reduces clutter and boilerplate (an industry term for uninteresting, repetitive setup code), which lowers the effort required to create or alter steps in the project dependency graph. As a result, the conceptual clarity of our test project improved over time, as we gradually organized the code into small, meaningful pieces. This sort of progression stands in contrast to our prior experiences, in which monolithic analysis scripts became increasingly convoluted over the course of development.

The patch and transform techniques demonstrated here are conceptually simple, but they dramatically improve the programming experience for certain kinds of tasks common to data journalism. As we develop Muck further, we hope to identify other classes of problems which can be made less painful with support from the build system. If you have any ideas, let us know!


Watch Live: A First Amendment For Social Platforms

On June 23, 2016, BuzzFeed and the Tow Center for Digital Journalism will host a breakfast discussion on the power of social media platforms and the threat to free press and free speech standards.

As journalists, news organizations and citizens increasingly turn to social platforms as their principal means of publishing, are there enough safeguards against tech giants misusing their new power?

BuzzFeed Editor in Chief Ben Smith and BuzzFeed Assistant General Counsel Nabiha Syed recently authored an article which put the case for a First Amendment for social platforms: “We are suggesting that the platforms make a public commitment not just to opaque and ad hoc rules, but to time-honored principles and process.”

As BuzzFeed takes on the platforms, join us for a panel discussion at Columbia Journalism School:

Ben Smith, BuzzFeed Editor-in-Chief

Nabiha Syed, BuzzFeed Assistant General Counsel

Stuart Karle, William J. Brennan Jr. Visiting Professor of First Amendment Issues at Columbia Journalism School

Emily Bell, Director of the Tow Center for Digital Journalism


The livestream will begin June 23 at 9:00 a.m. ET.



Digital News in a Distributed Environment

Watch the full event above.

On June 21, the Tow Center hosted “Digital News in a Distributed Environment” at Columbia Journalism School, which featured the US launch of the Reuters Institute for the Study of Journalism’s 2016 Digital News Report, as well as the release of preliminary findings from a new research project by the Tow Center examining the relationship between social platforms and publishers.

The Report, which is the largest ongoing international study about news consumption, supports what we know about the growth of smartphones and social media for finding news, but also looks below these macro trends to reveal a number of surprising results about how people consume news worldwide.

Rasmus Kleis Nielsen, the Director of Research at the Institute, presented the research, which is drawn from a YouGov survey of 50,000 online users in 26 countries, as well as a series of focus groups. The presentation was followed by a panel led by Nielsen, with Liz Heron, the executive editor of The Huffington Post; Edward Roussel, Chief Innovation Officer at Dow Jones and The Wall Street Journal; Vivian Schiller, the former president and CEO of National Public Radio and now an independent strategist; and Tow’s own director Emily Bell.

The Reuters Institute Digital News Report and Panel

There is a great deal of information in the report, and texture in its geographical range, so we encourage everyone to go and look at the fact-sheets from particular countries. The full report is available here. Many of the numbers challenge industry expectations about consumer behavior—but not necessarily in a bad way. Take the figure that nine percent of English speakers pay for news. On the one hand, this number seems pretty dismal: how can an entire industry survive on nine percent? But on the other hand (as Vivian Schiller pointed out), nine percent approaches the percentage of NPR listeners who are also members—the pinnacle of a successful membership organization.

Nielsen highlighted other surprising findings. Video, in particular, did not grow as much as the industry expected. Just this month, a Facebook executive predicted that in five years, our news feeds will be entirely video. But the Report shows that readers are still resistant; 78 percent of those sampled say they only read news or watch video occasionally.

[Chart: main barriers to watching news video content]

The Report suggests that pre-roll advertisements and the fact that it is faster to read than to watch may be contributing to this.

The other major finding in this vein was on reader trust in news and news brands. The Report finds that people like the concept of personalized news. But when respondents were asked how that news should be personalized—through algorithms based on their news habits, through friends, or through the editorial judgment of a news organization—they backpedaled a bit. They prefer their own judgment first, then editorial judgment, and then their friends. But all of these approval numbers were under 40 percent. And many people are worried about missing key information or other viewpoints.


In addition, the Report finds that when readers consume news on smartphones and social media, they may simply be unaware of which news organization produced what they are reading. At the same time, people have a pretty good idea of which news organizations they trust; they just don’t always know whose work they’re reading.

As Nielsen astutely put it, these issues amount to a PR problem for journalists. People don’t appreciate the editorial judgment behind what news gets onto their feeds, but at the same time, they do have a real appreciation of brands. How does that translate into business models for news organizations, especially when adblocking is growing, video isn’t as popular as predicted, and publishers aren’t seeing returns from the platforms they are now tied to?

The panel emphasized the amount of variation in the ways in which publishers approach experimentation with social platforms and how they are trying to build a sustainable business in this new environment. The contrast between The Wall Street Journal’s strategy and The Huffington Post’s strategy is particularly stark.

The Wall Street Journal, which has embraced digital subscriptions since the very beginning, is cultivating its relationships with social media platforms very carefully. Roussel said they see Apple News, for instance, as a way to introduce a new audience to WSJ content, which they can then drive back to the WSJ site and convert to digital subscriptions. But The Wall Street Journal is picking and choosing which platforms it feels are a “natural fit” for the brand.

But of course, digital subscription models are difficult to implement, and will only be successful for a small number of brands. The Huffington Post, as a very successful digital-born brand, has embraced a strategy where they work hand in hand with a large number of social media platforms.  Their strategy is to cultivate specific audiences on specific platforms, and to tailor Huffington Post content to those audiences. They have 79 Facebook pages, for instance, about 12 of which are robust with their own specific audiences. To the question of reader trust, Heron sees their social strategy as adapting editorial judgment to the new space: “You can express your editorial judgment in other ways [besides what goes on the front page] and get kind of exciting and abstract about it, and we’re very keen to do that.”

But as Vivian Schiller emphasized, while many publishers have been quick to embrace new technology on platforms, many of these creative ways of presenting stories are experimental, risky, and have yet to pay off in terms of revenue for the publishers. Take Facebook Instant Articles as an example. Many publishers were enthusiastic about jumping into Facebook Instant Articles, but they have not seen the returns: either financial returns or the increased engagement promised for stories that load more quickly.

Or video. Video is expensive to produce, and is hard to produce consistently. If readers aren’t actually flocking to video as expected, is it worth putting so many resources into it? Again, the panel responses differed greatly by brand. Roussel noted that video shouldn’t be made just for the sake of video: it should be additive or natural to the story being told. Heron was much more optimistic about the future being in video. The Huffington Post’s strategy, again, is to know the audience, and to make platform-specific content. But they do try to limit their video production to more enduring pieces and to live coverage. And Schiller pointed out that CPMs—what advertisers pay per thousand impressions—are strong for video, which isn’t true of anything else besides branded content.

The takeaway is that no one strategy will work for every organization, and the current environment requires taking a lot of risks, as new platforms and modes of distributing news keep cropping up, and consistent returns on investment have not been figured out yet.

The Tow Center’s Preliminary Research on Publishers and Platforms

The Tow Center’s brand new data on publishers and platforms dovetailed with this discussion, showing the sheer extent to which publishers now rely on platforms to distribute content. Platforms are publishers, whether they like it or not, said Emily Bell.

The research is based on a week’s worth of tracking of several different news organizations: where they post, and how much they post on each platform. Tow also reached out to a number of social media teams at various news organizations, and a number of platforms, to understand the different experiences they’re having working with each other. The slides from the presentation are embedded below, and the new research is accompanied by analysis by Bell at Columbia Journalism Review: “Who owns the news consumer: Social media platforms or publishers?”

The Tow research breaks down which publishers are using what platforms at what rate. Some publishers are posting more on platforms than they are on their own sites. BuzzFeed said 80 percent of their reach is off their own site.


The above chart shows which social platforms a selection of publishers are using.

And they aren’t posting the same story in the same way to every platform; they now tailor their pieces—changing length, art, headline, etc.—to fit each platform. Some publishers are embracing native posts on social media platforms, such as Facebook Instant Articles, while others use social posts to drive traffic back to their own sites. The Washington Post is now sharing the vast majority of its stories on Instant Articles, but The Wall Street Journal is driving most posts back to its own site.


Each donut above shows what percentage of a publisher’s posts on social platforms are native Facebook posts (such as Instant Articles, videos, and Live), as opposed to posts that link back to the publisher’s home site.

Tow also spoke with representatives from the platforms, many of whom felt that they had a PR problem. Despite the variation among them, the platforms are often lumped together. And every time they change an algorithm, they get bad press.

Publishers are dying to get real metrics about how consumers on social media are interacting with their content. But many platforms are walled gardens: you have to be logged in to see any content on Snapchat, for instance. The biggest thing moving forward, apart from more research, is transparency on all sides. The platforms need to share metrics and be clearer about algorithmic changes with the publishers. But publishers also need to be transparent with each other. One local publisher Tow spoke to expressed frustration about the uneven way the industry is moving forward:

I think the New York Times and the Washington Post did a disservice for a lot of us by jumping into bed with Facebook on Instant Articles so quickly without really scrutinizing [the deal]. It really ends up hurting us in the long haul.

And of course, readers should be informed about the support—financial or otherwise—that publishers receive from platforms. The research suggested that some publishers were enthusiastic about working more closely with each other to leverage power with the platforms.

We are facing the same anxieties that we did twenty years ago when news first started moving onto the web, and publishers are taking as varied approaches now as they did then. But as Liz Heron pointed out, there is an “appetite for collaboration” on both sides.

View the full Tow presentation below. For the full recording of the event, click here. The Reuters Institute 2016 Digital News Report is available here.

The Tow Center research team was led by research director Claire Wardle, and research was conducted by Tow Fellows Pete Brown, Nushin Rashidian, Priyanjana Bengani, and Alex Gonclaves.

@risj_oxford | #DNP2016
@towcenter | #towpnp
Questions? Email

You Are Here: Two Approaches to Site-Specific Storytelling

When the You Are Here team first started talking about potential sites for our devices, we knew there were a few key features that any potential location would need to have.

Access, of course, was one of the most important: You Are Here is about the culture and experience of a particular place, so our locations needed to be ones that were freely accessible to all kinds of people, and where they could really stop and spend a while. For that same reason, we also wanted spaces with visual and physical interest: while it’s tough to find a truly boring street corner in New York, standing in one place and staring at a building facade didn’t offer the kind of interaction we were looking for; our listeners are meant to be contributors as well, so choosing sites they could actively explore was essential. Perhaps most crucially, though, we wanted to engage with spaces that had both character and community: locations that were significant to the people who used and moved through them, and even significant to the broader life of the city itself.

After discussing our options and evaluating our constraints (the devices would need both power and substantial protection from the weather), we eventually selected two iconic New York City parks: Tompkins Square Park in the East Village, and the High Line in Chelsea. We liked that while both parks met our baseline criteria, they also embodied an important contrast: Tompkins Square Park has long been a fixture in the city’s political life – as a site of tent cities and (occasionally anarchic) political movements – while the much more recent High Line is a prime example of carefully executed and well-polished central planning. These contrasts also offered a great opportunity for us to explore distinct approaches to engaging our listeners.

Tompkins Square Park: Beyond a Bellwether

A recent resurgence in the city’s homeless population has once again brought media attention to Tompkins Square Park, which is often treated as a bellwether for housing issues in the city. But does homelessness really dominate the character of the park? For this piece, You Are Here team member Benjamen Walker worked with audio producer Hillary Brueck to construct a site-specific audio tour of Tompkins Square using this issue as a starting point.

On the tour, listeners meet current and former homeless individuals, as well as a local who has lived across from the park since 1988. Creating a piece that would both outlive a given news cycle and still provide listeners with an opportunity to participate in a timely conversation was a real challenge.

“For everyone who works with site-specific audio, tying it to a temporal event is dangerous,” says Walker. “I feel like this was a nice challenge, and shows that it’s something that you shouldn’t just block off – especially for projects with a local community.”

Anchoring the audio to a topic like homelessness – which is temporal, but also politically charged – also required delicacy and balance.

“The challenge became how to branch out from it,” says Walker. In working with Brueck, Walker says, “I wanted her to look at the homeless issue, but not be limited by it.”

“There are so many different communities in this park, from parents to punk rockers to sun bathers. You see people with cameras – especially in the spring with the hawks.”

While it does capture many of these voices, in the end Brueck and Walker’s piece is really meant to be a jumping off point for listeners who contribute their own observations about the park, which we’re eager to hear when the installation goes live in a few weeks.

The High Line: Not Just for Tourists

On the far side of town, the High Line has been a major tourist destination since it first opened in 2009, attracting over 5 million visitors a year. Though not even a decade old, it has been – and continues to be – a major influence on both the sensibility and direction of the neighborhood.

“It’s one of New York’s newest sites, and it’s gotten its reputation as being a tourist site,” says Walker. But as producer Dasha Lisitsina illustrates in her audio collage, there are “a lot of New York City residents who are drawn to this place, for reasons which are quite surprising,” says Walker.

Part of the uniqueness of the High Line is its actual topography: it is a relatively narrow walkway that stretches from Gansevoort Street to West 34th Street.

“Because it’s this long strip, it’s kind of hard to describe as a place,” says Walker.

To overcome the difficulty of picking a “where” within this beautiful – but ultimately transient – space, Lisitsina ultimately decided to focus on “who.”

“There are not only a lot of artists and musicians who are camping out there and doing something, there are also quite a lot of New Yorkers who use it,” says Walker.

Lisitsina’s audio collage introduces listeners to artists working or performing on the High Line, as well as tourists who have come to participate and marvel. The highlight of this tour, though, is the New Yorkers, who all have different reasons for coming to the High Line and embody some of the themes that emerged through the editing process.

“A lot of people are looking for a place to be more contemplative in the crowd,” says Walker, which he also sees as meshing well with the audio piece itself. “If you were wearing headphones and people watching – which is what people do there – it would be kind of great,” he says.

“The other theme that I think comes out is how artists are using the space,” Walker continues. “They’re all battling for a little atmosphere to connect with audiences and not step on each other – which also feels very New York, the battle over space.”

These very New York stories are exactly what we hope to bring together through You Are Here, and we look forward to collecting and sharing more of them as our installations go live in the next few weeks. Keep an eye out for more details on the Tow website!
