According to Cisco, the number of connected objects is expected to reach 50 billion by 2020, equating to 6.58 connected devices per person. They are all controlled by tiny computers that communicate with each other, in an ecosystem commonly known as the “Internet of Things” (IoT).
The IoT has implications for two distinct aspects of journalism – newsgathering and consumption. Smart devices connected to each other can be used to provide better context to a story, such as data on traffic, weather, population density or power consumption.
My team at the Associated Press recently provided our engineers with Raspberry Pis – small, easily programmable computers – and access to various sensors, so they could build innovative data-gathering prototypes and display them to coworkers.
We also explored cloud-based platforms, like Amazon’s IoT Cloud, that manage both smart devices and the terabytes of data generated by them, and that facilitate meaningful analysis and decision making.
“How can we make the Internet of Things revolution work for AP, both for our products and our journalism?” asked Vince Tripodi, our vice president of research and development. “That’s what we’re trying to find out.”
A few ideas for how news organizations can incorporate the Raspberry Pis and sensors into their reporting quickly materialized:
– We can monitor vibration and noise from entertainment and political venues to identify the most popular songs at a concert, or the biggest plays of a game, or even the quotes that resonate the most at campaign rallies.
– We can measure water quality in Rio de Janeiro or air quality in Beijing, validating data from environmental protection agencies. More broadly, we can track climate change through conditions of drought or other macro events.
– We can monitor vibrations to measure the impact of construction sites and how they affect nearby residents and businesses, or foot traffic at new and current public transportation stops to gauge their usage.
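As a rough illustration of the kind of prototype these ideas imply, here is a minimal sketch, not AP's actual prototype code: it tallies pulses from a simple digital vibration sensor attached to a Raspberry Pi. The pin number and the use of the gpiozero library are our assumptions.

```python
# Hypothetical sketch: tally vibration events per minute on a Raspberry Pi.
# Assumes a digital vibration sensor wired to GPIO pin 17 and the gpiozero
# library; illustrative only.
from datetime import datetime
from time import sleep

from gpiozero import DigitalInputDevice

sensor = DigitalInputDevice(17)          # pin number is illustrative
events = []
sensor.when_activated = lambda: events.append(datetime.now())

while True:
    sleep(60)
    print(datetime.now().isoformat(), "vibration events in the last minute:", len(events))
    events.clear()                       # reset the tally for the next minute
```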
These new technologies will allow journalists to break more stories and dig deeper into them, further closing the gap between the media and technology industries. It’s not just the gathering of news that promises to be affected, though – how audiences consume news will also undoubtedly change.
Key drivers of the growing Internet of Things are connected cars and smart homes. The experience of talking to a dashboard in your car or asking a device at home to tell you the latest news all depends on personalized voice recognition and natural language processing.
As adoption of these smart devices grows, there will still be privacy, security and technical concerns that need to be worked through. The inconsistency of available data in lower population areas also represents a major challenge.
But it should be clear by now that the Internet of Things, like the internet in general, won’t simply go away. As media companies, we need to start thinking about how these new technologies can help us better inform the world.
Francesco Marconi is the Strategy Manager for The Associated Press and an innovation fellow at the Tow Center. Follow him @fpmarconi
In order to get a sense of what the candidates and the public are actually saying, and how candidates communicate over time, we have taken a computational approach to predicting categories of candidate-produced tweets and posts (as described in a blog post introducing the Illuminating 2016 project). We have been working on a system that automatically classifies each message into a category based on what the message is trying to do: urge people to act, change their opinions through persuasion, inform them about some activity or event, honor or mourn people or holidays, or, on Twitter, have a conversation with members of the public. This post introduces how we use machine learning to predict categories for candidate-generated messages. The data currently presented on Illuminating 2016 is accurately categorized 77% of the time. For some categories, such as call-to-action and the strategic message types of advocacy and attack, accuracy reaches 84%.
To predict presidential campaign message types, we used gubernatorial campaign data from 2014 to build initial categories and to train machine-learning models. We then tested the reliability of the best models built from the gubernatorial data and applied them to classify messages from the 2016 presidential campaign in real time. We have been collecting all of the announced major party candidates' Twitter and Facebook posts since they declared their presidential bids. In all, we have filled six servers with 24 presidential candidates' social media messages, and of course we're still collecting. The diagram below demonstrates how we use machine learning to train the models.
Diagram 1: Model training
To understand candidates' social media message strategy, we collected the Facebook and Twitter messages produced by the campaign accounts of 79 viable candidates who ran for governor. The collection started on September 15, when all states had completed their primaries and shifted into the general election phase, and continued through November 7, three days after the election. We ended up with a total of 34,275 tweets and 9,128 Facebook posts. We categorized these messages by their performative speech categories of strategic message, informative, call-to-action, and ceremonial. We also added non-English and conversational categories (conversational only applies to Twitter). These categories allow us to understand when candidates advocate for themselves, go on the attack, urge supporters to act, and use the affordances of social media to interact with the public.
These categories were developed deductively and were revised based on inductive analysis. We trained annotators and refined the codebook over several rounds until two or more annotators could look at the same message and agree on the category. We generated an inter-coder agreement score to determine how easy or hard it is for humans to categorize the messages, and also to make sure our categories are clearly defined and as mutually exclusive as possible. Our agreement reached a Krippendorff's alpha of .70 or greater on all categories. After annotating data independently, annotators developed gold standard annotations: two coders categorized the same messages, and wherever they disagreed on a category, they talked it through and decided which was the "best" category for that message. This process produced a gold standard dataset of 4,147 labeled tweets and 2,493 labeled Facebook messages from the candidates.
We used these gold standard data as training data to build models, and then applied the best models to unlabeled candidate messages on Facebook and Twitter. Before building the models, we converted each message into a feature representation, adding relevant language features and political characteristics. For example, we noted whether a tweet begins with @username or with a verb, as well as the candidate's party.
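A minimal sketch of that kind of feature representation, using scikit-learn, is below; the example messages, the party encoding, and the `starts_with_mention` feature are illustrative, not the project's actual code.

```python
# Illustrative only: binary bag-of-words features plus two hand-crafted
# columns (whether the message starts with an @mention, and party).
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

messages = ["@JaneDoe thanks for your support!",
            "Join us tonight at the rally."]
party = [0, 1]  # hypothetical encoding of each candidate's party

words = CountVectorizer(binary=True).fit_transform(messages)   # Boolean word features
starts_with_mention = [int(m.startswith("@")) for m in messages]

extra = csr_matrix(np.column_stack([starts_with_mention, party]))
X = hstack([words, extra])   # the feature matrix fed to the classifiers
```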
For algorithm building, using Scikit-Learn, we performed several experiments with the following multi-class classification algorithms: Support Vector Machine (SVM), Naïve Bayes (NB), MaxEnt/Logistic Regression, and Stochastic Gradient Descent (SGD). All classification tasks were evaluated with 10-fold cross validation. We used a micro-averaged F1 score to measure prediction accuracy (the F1 score reaches its best value at 1 and worst at 0). For Twitter data, the best micro-averaged F1 score is 0.72, as shown in Table 1, using an SGD classifier with Boolean word features plus features for whether a tweet starts with @username or with a verb, and the candidate's party. The F1 value for strategic messages reaches 0.75. For Facebook data, the best micro-averaged F1 value is 0.73, using a linear SVC classifier with Boolean word features and the party feature. The F1 value for call-to-action reaches 0.80. By comparison, the majority baseline is 37.6% (1559/4147) for Twitter data and 40.1% (999/2493) for Facebook. It should be noted that the F1 score for ceremonial messages is low. The reason is that there are far fewer of these messages, and they often express a wider range of features, making them harder to classify.
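As a rough sketch of this evaluation step (not the project's actual pipeline), one of these classifiers can be scored with 10-fold cross validation and a micro-averaged F1 as follows; `X` and `y` are assumed to hold the full feature matrix and the gold standard labels.

```python
# Sketch: 10-fold cross validation with a micro-averaged F1 score.
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

def evaluate(X, y):
    clf = SGDClassifier()  # NB, LinearSVC, or LogisticRegression slot in the same way
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1_micro")
    return scores.mean()   # mean micro-averaged F1 across the ten folds
```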
Table 1: Machine prediction performance for Main Categories in Twitter and Facebook
For Strategic Message type prediction, we trained the classifiers with training data labeled as Strategic Message: 1,559 tweets and 860 Facebook posts. Each message is classified as either Advocacy or Attack. As shown in Table 2, the micro-averaged F1 scores for Twitter and Facebook data are 0.80 and 0.84, respectively. By comparison, the majority baseline is 69.4% (1082/1559) for Twitter data and 62.8% (540/860) for Facebook. Our Strategic Message focus classifiers were likewise trained on the messages labeled as Strategic Message. Each message can be classified as Image, Issue, or Endorsement. As shown in Table 3, the micro-averaged F1 scores for the Strategic Message focus category in both Twitter and Facebook are 0.77. By comparison, the majority baseline is 48.2% (751/1559) for Twitter data and 50.6% (435/860) for Facebook.
Table 2: Machine prediction performance for Types of Strategic Messages in Twitter and Facebook
Table 3: Machine prediction performance for Focus of Strategic Messages in Twitter and Facebook
All of the micro-averaged F1 scores reported above are much higher than the baseline scores. This suggests that the models have been trained well enough to classify candidate-produced messages reliably.
We are still testing the reliability of the current best models on presidential campaign data. When we applied the best models reported above to 2,989 human-corrected presidential tweets and 2,638 Facebook posts, we found that the models generally still worked well, as shown in Table 4. However, the F1 score for the Conversational category dropped by 20%. We suspect there are differences between gubernatorial and presidential campaign data in this category, and we are currently investigating the possible reasons.
Table 4: Machine prediction performance on presidential data, using gubernatorial data as training data
We also ran experiments using only presidential data as training data to test model performance. For Facebook, we found that the model performs quite well at predicting strategic messages (F1=0.77) and call-to-action (F1=0.86), as shown in Table 5.
Table 5: Machine prediction performance on presidential data, using presidential data as training data
Our next step is to run more experiments to improve the models, e.g., experimenting with binary classification and adding opinion and sentiment classification. We are now applying the best models to predict categories for messages generated by candidates in the 2016 presidential campaign. On our Illuminating 2016 website, reporters and the public can see presidential campaign message types nearly instantaneously. We are also pulling public commentary on the election from social media and categorizing it; you will see public commentary analysis on our website in August.
Thanks to Sikana Tanupambrungsun and Yatish Hegde at the School of Information Studies at Syracuse University for data collection and model training.
It seems rare, these days, to encounter a conversation about the future of journalism that does not make some reference to the cluster of concepts known variously as design thinking, design practice, or human-centered design. Innovative news organizations, for example, are successfully deploying versions of this philosophy to develop journalism products with remarkably high user engagement. But there is much confusion over what design and design thinking really mean, especially in a journalistic context – never mind how the philosophy might actually be implemented with successful results.
This report first proposes a clearer definition of design – as a practice based on a set of processes and a mindset. It then suggests moving away from the phrase “design thinking,” which has become closely identified with a specific five-step process that could actually be limiting to news organizations. The report also identifies those types of problems, known as “wicked problems,” which could benefit most from the design process, arguing that many of the severe challenges journalism faces today belong to this category. Drawing on interviews with designers and journalists, and four in-depth studies of design in use – at BuzzFeed, The New York Times, National Public Radio, and AL.com – the report next explores concrete ways in which others might use these processes as a foundation for news innovation.
The research in this paper identifies several key benefits of design philosophy in creating new possibilities for journalism, including the ability to rapidly prototype and modify new products before major resources have been committed; to improve journalism by deeply understanding its role in the real lives of those who consume it; and to work directly with the communities in which news organizations are embedded to generate coverage and tell stories of direct relevance.
The report also sounds some cautionary notes. First, we must avoid fixing the definition of design too rigidly into a specific sequence of steps that must always be followed; otherwise we risk undermining the very flexibility and responsiveness to context that are central benefits of the approach. Second, while embracing design’s emphasis on paying close attention to the needs and preferences of users, as journalists we must retain a commitment to reporting in the public interest, rather than making editorial decisions solely in favor of stories and products that bring the most success in financial terms.
This report specifies the following eight aspects as central to design in the context of journalism:
The report also identifies several primary applications of design in journalistic contexts, offering detailed suggestions for implementation in each case:
Read the full report, Guide to Journalism and Design, here, plus: Heather Chaplin’s article at Columbia Journalism Review on how The New York Times is incorporating design into audience research.
Projects built with Muck are structured as a collection of interdependent steps (we refer to this structure as the “dependency graph”). With a simple naming convention and a little behind-the-scenes magic, Muck is able to infer the dependencies between source code and data files. With this information, Muck rebuilds only the parts of a project that have changed, and their dependent parts. For large datasets, the time savings can be dramatic, allowing users to explore their data with little overhead. Our goal is to provide an environment that encourages correctness and clarity of both code and data, while remaining fast, pragmatic and ergonomic. This post describes a few early results, specifically facilities for patching text and transforming records.
Muck fills a role similar to the traditional Unix tool Make (see: paper and reference). Make, first released in 1976, is a command-line tool for automatically building products (typically compiled executables) from source files in an incremental fashion; when asked to produce a particular product (or “target”), it will only rebuild relevant portions of the project that have changed since the last build. The process is directed by a “makefile”, which specifies, for each target product to be built, the list of source files and products that the target depends on (the “dependencies”), as well as the commands to build the product (the “recipe”). In essence, the developer explicitly writes out the project’s dependency graph in the makefile, thus enabling Make to perform incremental builds.
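A toy makefile, with hypothetical file names, makes this explicit dependency graph concrete: the target names its dependencies on the first line, and the recipe rebuilds it only when one of them changes.

```makefile
# Hypothetical example, not from any real project. report.txt depends on
# analyze.py and data.csv; the recipe line below must be indented with a tab.
report.txt: analyze.py data.csv
	python3 analyze.py data.csv > report.txt
```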
Muck improves on this paradigm by eliminating the need for the makefile. Instead, it examines the contents of source files whose names match those of the target products, and infers the dependency relationships. While Muck is certainly not the first system to boast this capability, it is notable for supporting both source and data formats common to data journalism.
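As a hypothetical illustration of that convention (the file and target names here are ours, not from the dictionary project): to build a target called `definitions.txt`, Muck would look for a source whose name matches, such as `definitions.txt.py`, and infer the step's dependencies from calls like `muck.source(...)` in its contents, on the assumption that whatever the script prints becomes the target's contents.

```python
# definitions.txt.py -- a hypothetical Muck step. The script's name tells
# Muck which target it produces; the muck.source() call lets Muck infer
# that this step depends on webster.txt. No makefile is needed.
import muck

webster = muck.source('webster.txt')  # declares and loads the dependency
# ... parse `webster` and print the extracted entries to stdout ...
```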
Once we got the basic dependency calculations and build commands working, we began several test projects that could help guide development in a practical direction. One ongoing experiment parses and analyzes the Project Gutenberg version of Webster’s 1913 Unabridged Dictionary. The work required a variety of techniques that are fundamental to data journalism: web scraping, error correction, and text parsing. The project is not yet finished, but our experience thus far has led to some interesting additions to Muck.
The dictionary text that we have been working with is quite messy. Its basic structure is straightforward, but we have encountered exceptions in the text at nearly every step of development. At over 100,000 entries, the dictionary is too large to correct by hand (it appears that there have been valiant efforts over the years, but a variety of problems remain), so getting good results is a real challenge.
The first, most glaring problem is that splitting the text into discrete records fails in a few places. These flaws are easily understood, but the code to correct them ranges from straightforward to convoluted.
The text contains a variety of obscure escape sequences (specific patterns of text intended to encode symbols or meaning that is not otherwise representable), some of which cannot be automatically parsed because the sequences also occur as legitimate, unescaped text. Simple find-and-replace operations using regular expressions yield lots of false positives.
Some flaws occur once or a handful of times, while others occur thousands of times. Time-efficient strategies for correcting rare versus common flaws tend to be quite different. Sadly, it seems that the only way to know whether it makes more sense to correct by hand or programmatically is to try both!
One interesting thing about the English dictionary (for a programmer at least) is that the text is much more mechanical than prose, but still not so rigidly defined that it can be parsed like a programming language. The pronunciations, parts of speech descriptions, etymologies, and even the definitions themselves are written in a systematic style (although the early Webster’s Dictionary is famous for its colorful definitions: James Somers’ blog post was our initial inspiration for the project). Nonetheless, there seems to be an exception to any syntactic rule that one might conceive, and like natural languages, some ambiguities can only be resolved by semantic understanding of the content. To make matters worse, crucial punctuation like matching parentheses and brackets are missing in some cases. Writing code to parse these various elements into structured data has been downright maddening.
Data cleaning can be a challenging, time consuming process. The primary goal of Muck is to make it easy to create perfectly reproducible data projects; a curious reader should be able to check out a project from a source repository like Github, run the `muck` command, and reproduce all of the computations that create the analysis. Ideally, the various processes that go into the computation will be well organized and easily audited for correctness.
Programming languages offer tremendous capabilities to automate such tasks, but such power comes with the risk of overcorrection. Sometimes it is easier (and less confusing) to make a correction by hand, rather than via code. However, simply altering the original source data is not a reproducible practice, and too many hand corrections make it impossible to properly fact-check a data-driven story.
A classic solution to this problem has been available in the Unix programming world for many years: the `diff` and `patch` tools. “Diffing” is the process of calculating a “diff” (also called a “delta” or “patch”) from two versions of the same document: the diff shows the edits needed to transform the original version into the modified version. “Patching” is the process of applying a patch file to an original document to produce the modified result. Traditionally, these tools have been used to track and communicate changes to program source files, and form the conceptual basis for modern version control systems like Git. However, thanks to the Unix tradition of designing tools to operate on text, `diff` and `patch` are easily applied to text-based data formats as well.
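For a small, self-contained illustration of the diffing half of this idea (separate from Muck's own tooling), Python's standard-library difflib can compute a unified diff between an original and a hand-corrected version of a file's lines:

```python
# Illustrative example with made-up data: the printed patch shows the edit
# needed to turn the original lines into the corrected lines.
import difflib

original  = ["Webter's Dictionary\n", "the quick brown fox\n"]
corrected = ["Webster's Dictionary\n", "the quick brown fox\n"]

patch = difflib.unified_diff(original, corrected,
                             fromfile="original.txt", tofile="modified.txt")
print("".join(patch))
```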
So, we added support for patch files. Muck treats them as just another source format, with a single data dependency (the “original” file) and a single output (the “modified” file). The benefit was immediately obvious: patch files are human-readable, allowing the patches to function as a reproducible version of the “data diary” that many data journalists use. Unfortunately, the various traditional patch formats have some shortcomings for our purposes:
Of these, the most significant (and surprising) problem is that for some workflows, once a patch is created, it makes more sense to edit the patch directly rather than to edit the “modified” file and recompute the patch. This is especially true when reviewing a set of patches; occasionally we would find a typo or change our correction strategy part way through. Unix patch formats were not designed with hand-editing in mind. As an experiment, we created our own patch format and tool, called Pat. The pat format is similar to the traditional “unified diff” format, but addresses the above shortcomings directly, and provides us with a means of experimenting further with this sort of workflow. In particular we would like to add commenting syntax, escaping of control characters, line splitting, and intra-line highlighting for ease of use. Pat is still in early development, and currently lacks documentation, but the code can be found at https://github.com/gwk/pat.
While the patching methodology in Muck needs some refinement, it has already proved useful. We believe patching is an important tool for achieving reproducibility in real-world projects because it offers a middle road between manual editing and programmatic correction. The technique should be used with discretion though, and knowing when to switch strategies from patching to programmatic correction is largely a matter of experience. Choosing the right strategy for a given problem often requires experimentation.
Regardless of whether flaws are fixed by hand or via code, a fundamental challenge is to apply fixes without introducing new flaws. Good developers typically use some sort of testing framework to verify that their code works as intended, but how best to apply these methodologies to data problems is not obvious. Often, flaws are discovered in the course of implementing some end goal or feature, but are best fixed somewhere earlier in the data pipeline. The result is a disconnect between the logic that checks for a flaw and the process that fixes it.
Before proceeding, we should explain our use of the terms “check” and “test”. We make a distinction between “checking” for flaws in the data, with logic in the main program, and “testing” the program for logical flaws, via external testing code. This distinction becomes fuzzy if “checks” get factored out into scripts that are not part of the main computational graph, because once externalized they essentially become tests against the data. There are further elaborations to be explored (e.g. best practices regarding assertions and exceptions versus non-fatal logging in data cleaning code), but the main point is that traditional software testing methodologies do not map perfectly to the needs of data cleaning projects.
Once a data cleaning fix is implemented, the checking logic takes on a questionable role: either it issues a warning prior to the fix, or it is applied afterwards and remains silent. In the former case, the programmer quickly learns to ignore the message, and it only serves to obscure more meaningful warnings. In the latter, the developer has several options: remove the checking code entirely; leave it in place, even though it will never execute again; or factor it out of the main program into a separate script.
Removing the checking code is undesirable, because doing so eliminates the evidence of the flaw and thus the primary explanation of why the fix exists. At the same time, code that never executes is a notorious liability; as projects evolve, unexercised code tends to become incorrect. Only the last option sounds promising, but just moving the code into a separate program does not ensure that the code will get executed or maintained. The broad question we need to address is this: what are good strategies for clarity and correctness when working with these sorts of checks and fixes? At the very least, factoring out the checking code from the program pipeline and into a side branch allows it to be read and run as part of a manual audit, or via automated testing.
As a first step, we implemented a new file source type in Muck, “.list”, which is simply a list of scripts to execute. By creating a top-level ‘test.list’ file listing the test scripts that have accumulated in the project, we can simply run `muck test` and all of the checks will run.
Standalone tests are useful, but by themselves they don’t make the work much easier. At worst, they become a dumping ground for old code. What we need is a way to show how the flaws and fixes are related to the data pipeline. Ideally, we would be able to log records that fail checks, even after fixes have been added, so that reviewers can confirm that the fix behaves as intended. The intent of the code would also be much clearer if check, fix, and logging logic were colocated in some fashion.
All of this suggests a major limitation of the file-based perspective inherent to Muck: many operations apply on a per-record basis, rather than per-file. Thus, an emerging goal is to articulate and enable a per-record data cleaning strategy:
We want the check for a flaw, the application of a fix, and any reporting of the occurrence to be expressed together in a cohesive, clearly written unit. After some experimentation and quite a bit of supporting work, Muck now supports just such a workflow, via a function called `transform`. The technical description of how it works is more painful than using it, so we’ll start with an example:
    import re
    import muck

    with muck.transform('input.txt') as t:

        @t.keep
        def non_empty_lines(line):
            'Lines containing only whitespace are omitted, without logging.'
            return bool(line.strip())

        @t.drop
        def vulgar(line):
            'Profane lines are omitted and logged.'
            return re.match(r'darn|gosh|shucks', line)

        @t.edit
        def tweet_to_trumpet(line):
            'All occurrences of the word "tweet" are replaced with "trumpet", preserving the leading capital; altered lines are logged.'
            return re.sub(r'([Tt])weet', r'\1rumpet', line)

        @t.convert
        def br_tags(line):
            'Capitalize each line; no logging.'
            return line.capitalize()

        @t.flag
        def big_words(line):
            'Log lines with long words, but do not alter them.'
            return any(len(word) > 16 for word in line.split())

        @t.put
        def out(line):
            'Write the final result line.'
            print(line, end='')
It’s worth admitting up front that `muck.transform` uses several advanced Python features in combination – this makes it a bit tough to describe. `transform` is meant to be used at the top level of a Python script, and just like `muck.source`, it takes the name of the input data as its first parameter. It returns a `Transformer` object (`t` in the example), which provides several methods that serve as function decorators. Each decorator indicates a different kind of transformation. When applied to a function, a decorator adds a new stage to the pipeline.
All transformation functions must take a record as their sole parameter. Each kind of transformation has a different behavior: `keep` retains only the records for which the function returns true, without logging; `drop` removes the records for which it returns true and logs them; `edit` replaces each record with the function’s return value and logs any records that changed; `convert` also replaces each record, but without logging; `flag` logs matching records without altering them; and `put` writes out the final result.
All stages are applied to each record in the order in which they were decorated. For those modes that feature automatic logging, the complete effect of the transformation is reported in a dedicated file, without the user having written any reporting code. This leads to a much better verification experience for the user, because Muck’s logging facilities are fancier and more organized than the typical debugging code the user would write. The logging feature also obviates the need for such clutter in project code.
For the technically inclined: there is a `run` method that actually performs the transformation on the input sequence. Note that `run` is not called in our example; instead we use the `with` form to treat the `Transformer` `t` as a Python “context manager”. The `__exit__` method of `Transformer`, which is automatically called at the end of the `with` scope, simply calls `run`. This usage pattern is optional and preferred purely as a convenience.
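For illustration, the equivalent explicit form might look roughly like this (a sketch; it assumes the object returned by `muck.transform` can be driven outside a `with` block):

```python
# Sketch of the explicit form: build the Transformer, register stages, then
# call run() yourself instead of relying on the with-block's __exit__.
import muck

t = muck.transform('input.txt')
# ... register stages with @t.keep, @t.drop, @t.edit, etc., as above ...
t.run()
```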
Our experience with `transform` so far has been that it speeds up development by letting us alter and rearrange multiple stages easily. The individual stages tend to be quite simple and easy to read, whereas our previous solutions were larger blocks of code, often inside nested `for` loops, which are more difficult to reason about. The automatic logging performed by `transform` makes it easy to verify that a given stage is performing as intended, which saves even more time over the course of a project.
Muck is not yet a mature tool, but our experience with it thus far has been promising. The framework it provides for data scripting tasks reduces clutter and boilerplate (an industry term for uninteresting, repetitive setup code), which lowers the effort required to create or alter steps in the project dependency graph. As a result, the conceptual clarity of our test project improved over time, as we gradually organized the code into small, meaningful pieces. This sort of progression stands in contrast to our prior experiences, in which monolithic analysis scripts became increasingly convoluted over the course of development.
The patch and transform techniques demonstrated here are conceptually simple, but they dramatically improve the programming experience for certain kinds of tasks common to data journalism. As we develop Muck further, we hope to identify other classes of problems which can be made less painful with support from the build system. If you have any ideas, let us know!
As journalists, news organizations and citizens increasingly turn to social platforms as their principal means of publishing, are there enough safeguards against tech giants misusing their new power?
BuzzFeed Editor in Chief Ben Smith and BuzzFeed Assistant General Counsel Nabiha Syed recently authored an article which put the case for a First Amendment for social platforms: “We are suggesting that the platforms make a public commitment not just to opaque and ad hoc rules, but to time-honored principles and process.”
As BuzzFeed takes on the platforms, join us for a panel discussion at Columbia Journalism School:
Ben Smith, BuzzFeed Editor-in-Chief
Nabiha Syed, BuzzFeed Assistant General Counsel
Stuart Karle, William J. Brennan Jr. Visiting Professor of First Amendment Issues at Columbia Journalism School
Emily Bell, Director of the Tow Center for Digital Journalism
The livestream will begin June 23 at 9:00 a.m. EDT.
On June 21, the Tow Center hosted “Digital News in a Distributed Environment” at Columbia Journalism School, which featured the US launch of the Reuters Institute for the Study of Journalism’s 2016 Digital News Report, as well as the release of preliminary findings from a new research project by the Tow Center examining the relationship between social platforms and publishers.
The Report, which is the largest ongoing international study about news consumption, supports what we know about the growth of smartphones and social media for finding news, but also looks below these macro trends to reveal a number of surprising results about how people consume news worldwide.
Rasmus Kleis Nielsen, the Director of Research at the Institute, presented the research, which is drawn from a YouGov survey of 50,000 online users in 26 countries, as well as a series of focus groups. The presentation was followed by a panel led by Nielsen, with Liz Heron, the executive editor of The Huffington Post; Edward Roussel, Chief Innovation Officer at Dow Jones and The Wall Street Journal; Vivian Schiller, the former president and CEO of National Public Radio and now an independent strategist; and Tow’s own director, Emily Bell.
The Reuters Institute Digital News Report and Panel
There is a great deal of information in the report, and texture in its geographical range, so we encourage everyone to go and look at the fact-sheets from particular countries. The full report is available here. Many of the numbers challenge industry expectations about consumer behavior—but not necessarily in a bad way. Take the figure that nine percent of English speakers pay for news. On the one hand, this number seems pretty dismal: how can an entire industry survive on nine percent? But on the other hand (as Vivian Schiller pointed out), nine percent approaches the percentage of NPR listeners who are also members—the pinnacle of a successful membership organization.
Nielsen highlighted other surprising findings. Video, in particular, did not grow as much as the industry expected. Just this month, a Facebook executive predicted that in five years, our news feeds will be entirely video. But the Report shows that readers are still resistant; 78 percent of those sampled say they only read news or watch video occasionally.
The Report suggests that pre-roll advertisements and the fact that it is faster to read than to watch may be contributing to this.
The other major finding in this vein was on reader trust in news and news brands. The Report finds that people like the concept of personalized news. But when asked how that news should be personalized—through algorithms based on their news habits, through friends, or through the editorial judgment of a news organization—respondents backpedaled a bit. They prefer their own judgment first, then editorial judgment, and then their friends. But all of the approval numbers were below 40 percent. And many people are worried about missing key information or other viewpoints.
In addition, the Report finds that when readers consume news on smartphones and social media, they may simply be unaware of which news organization produced what they are reading. At the same time, people do have a pretty good idea of which news organizations they trust, even if they don’t always know whose story they’re reading.
As Nielsen astutely put it, these issues amount to a PR problem for journalists. People aren’t appreciating editorial opinion about what news gets onto our feeds, but at the same time, they do have a real appreciation of brands. How does that translate into business models for news organizations, especially when adblocking is growing, video isn’t as popular as predicted, and publishers aren’t seeing returns from the platforms they are now tied to?
The panel emphasized the amount of variation in the ways in which publishers approach experimentation with social platforms and how they are trying to build a sustainable business in this new environment. The contrast between The Wall Street Journal’s strategy and The Huffington Post’s strategy is particularly stark.
The Wall Street Journal, which has embraced digital subscriptions since the very beginning, is cultivating its relationships with social media platforms very carefully. Roussel said they see Apple News, for instance, as a way to introduce a new audience to WSJ content, which they can then drive back to the WSJ site and convert to digital subscriptions. But The Wall Street Journal is picking and choosing which platforms it feels are a “natural fit” for the brand.
But of course, digital subscription models are difficult to implement, and will only be successful for a small number of brands. The Huffington Post, as a very successful digital-born brand, has embraced a strategy where they work hand in hand with a large number of social media platforms. Their strategy is to cultivate specific audiences on specific platforms, and to tailor Huffington Post content to those audiences. They have 79 Facebook pages, for instance, about 12 of which are robust with their own specific audiences. To the question of reader trust, Heron sees their social strategy as adapting editorial judgment to the new space: “You can express your editorial judgment in other ways [besides what goes on the front page] and get kind of exciting and abstract about it, and we’re very keen to do that.”
But as Vivian Schiller emphasized, while many publishers have been quick to embrace new technology on platforms, many of these creative ways of presenting stories are experimental, risky, and have yet to pay off in terms of revenue for the publishers. Take Facebook Instant Articles as an example. Many publishers were enthusiastic about jumping into Facebook Instant Articles, but they have not seen the returns: either financial returns or the increased engagement promised for stories that load more quickly.
Or video. Video is expensive to produce, and is hard to produce consistently. If readers aren’t actually flocking to video as expected, is it worth it to put so many resources into it? Again, the panel responses differed greatly by brand. Roussel noted that video shouldn’t be made just for the sake of video: it should be additive or natural to the story being told. Heron was much more optimistic about the future being in video. Huffington Post’s strategy, again, is to know the audience, and to make platform-specific content. But they do try to limit their video production to more enduring pieces and to live coverage. And Schiller pointed out that CPMs—the cost per thousand impressions—are strong for video, which isn’t true of anything else besides branded content.
The takeaway is that no one strategy will work for every organization, and the current environment requires taking a lot of risks, as new platforms and modes of distributing news keep cropping up, and consistent returns on investment have not been figured out yet.
The Tow Center’s Preliminary Research on Publishers and Platforms
The Tow Center’s brand new data on publishers and platforms dovetailed with this discussion, showing the sheer extent to which publishers now rely on platforms to distribute content. Platforms are publishers, whether they like it or not, said Emily Bell.
The research is based on a week’s worth of tracking of several different news organizations: where they post, and how much they post on each platform. Tow also reached out to a number of social media teams at various news organizations, and to a number of platforms, to understand the different experiences they are having working with each other. The slides from the presentation are embedded below, and the new research is accompanied by analysis by Bell at Columbia Journalism Review: “Who owns the news consumer: Social media platforms or publishers?”
The Tow research breaks down which publishers are using what platforms at what rate. Some publishers are posting more on platforms than they are on their own sites. BuzzFeed said 80 percent of their reach is off their own site.
The above chart shows which social platforms a selection of publishers are using.
And they aren’t posting the same story in the same way to every platform; they now tailor their pieces—changing length, art, headline, etc.—to fit each platform. Some publishers are embracing native posts on social media platforms, such as Facebook Instant Articles, while others use their posts to drive traffic back to their own sites. The Washington Post is now sharing the vast majority of its stories on Instant Articles, but The Wall Street Journal is driving most posts back to its own site.
Each donut above shows what percentage of posts the publisher is making on social platforms are native Facebook posts (such as Instant Articles, videos, Live), as opposed to posts that link back to the publisher’s home site.
Tow also spoke with representatives from the platforms, many of whom felt that they had a PR problem. Despite their variation, they are grouped together. And every time they change an algorithm, they get bad press.
Publishers are dying to get real metrics about how consumers on social media are interacting with their content. But many platforms are walled gardens: you have to be logged in to see any content on Snapchat, for instance. The biggest thing moving forward, apart from more research, is transparency on all sides. The platforms need to share metrics and be clearer about algorithmic changes with the publishers. But publishers also need to be transparent with each other. One local publisher Tow spoke to expressed frustration about the uneven way the industry is moving forward:
I think the New York Times and the Washington Post did a disservice for a lot of us by jumping into bed with Facebook on Instant Articles so quickly without really scrutinizing [the deal]. It really ends up hurting us in the long haul.
And of course, readers should be informed about the support—financial or otherwise—that publishers receive from platforms. The research suggested that some publishers were enthusiastic about working more closely with each other to leverage power with the platforms.
We are facing the same anxieties that we did twenty years ago when news first started moving onto the web, and publishers are taking as varied approaches now as they did then. But as Liz Heron points out, there is an “appetite for collaboration” on both sides.
View the full Tow presentation below. For the full recording of the event, click here. The Reuters Institute 2016 Digital News Report is available here.
The Tow Center research team was led by research director Claire Wardle, and research was conducted by Tow Fellows Pete Brown, Nushin Rashidian, Priyanjana Bengani, and Alex Gonclaves.
JOIN THE ONLINE CONVERSATION:
@risj_oxford | #DNP2016
@towcenter | #towpnp
Questions? Email email@example.com
Access, of course, was one of the most important criteria: You Are Here is about the culture and experience of a particular place, so our locations needed to be ones that were freely accessible to all kinds of people, and where they could really stop and spend a while. For that same reason, we also wanted spaces with visual and physical interest: while it’s tough to find a truly boring street corner in New York, standing in one place and staring at a building facade didn’t offer the kind of interaction we were looking for; our listeners are meant to be contributors as well, so choosing sites they could actively explore was essential. Perhaps most crucially, though, we wanted to engage with spaces that had both character and community: locations that were significant to the people who used and moved through them, and even significant to the broader life of the city itself.
After discussing our options and evaluating our constraints (the devices would need both power and substantial protection from the weather), we eventually selected two iconic New York City parks: Tompkins Square Park in the East Village, and the High Line in Chelsea. We liked that while both parks met our baseline criteria, they also embodied an important contrast: Tompkins Square Park has long been a fixture in the city’s political life – as a site of tent cities and (occasionally anarchic) political movements – while the much more recent High Line is a prime example of carefully executed and well-polished central planning. These contrasts also offered a great opportunity for us to explore distinct approaches to engaging our listeners.
Tompkins Square Park: Beyond a Bellwether
A recent resurgence in the city’s homeless population has once again brought media attention to Tompkins Square Park, which is often treated as a bellwether for housing issues in the city. But does homelessness really dominate the character of the park? For this piece, You Are Here team member Benjamen Walker worked with audio producer Hillary Brueck to construct a site-specific audio tour of Tompkins Square using this issue as a starting point.
On the tour, listeners meet current and former homeless individuals, as well as a local who has lived across from the park since 1988. Creating a piece that would both outlive a given news cycle and still provide listeners with an opportunity to participate in a timely conversation was a real challenge.
“For everyone who works with site-specific audio, tying it to a temporal event is dangerous,” says Walker. “I feel like this was a nice challenge, and shows that it’s something that you shouldn’t just block off – especially for projects with a local community.”
Anchoring the audio to a topic like homelessness – which is temporal, but also politically charged – also required delicacy and balance.
“The challenge became how to branch out from it,” says Walker. In working with Brueck, Walker says, “I wanted her to look at the homeless issue, but not be limited by it.”
“There are so many different communities in this park, from parents to punk rockers to sun bathers. You see people with cameras – especially in the spring with the hawks.”
While it does capture many of these voices, in the end Brueck and Walker’s piece is really meant to be a jumping off point for listeners who contribute their own observations about the park, which we’re eager to hear when the installation goes live in a few weeks.
The High Line: Not Just for Tourists
On the far side of town, the High Line has been a major tourist destination since it first opened in 2009, attracting over 5 million visitors a year. Though not even a decade old, it has been – and continues to be – a major influence on both the sensibility and direction of the neighborhood.
“It’s one of New York’s newest sites, and it’s gotten its reputation as being a tourist site,” says Walker. But as producer Dasha Lisitsina illustrates in her audio collage, there are “a lot of New York City residents who are drawn to this place, for reasons which are quite surprising,” says Walker.
Part of the uniqueness of the High Line is its actual topography: it is a relatively narrow walkway that stretches from Gansevoort Street to West 34th Street.
“Because it’s this long strip, it’s kind of hard to describe as a place,” says Walker.
To overcome the difficulty of picking a “where” within this beautiful – but ultimately transient – space, Lisitsina ultimately decided to focus on “who.”
“There are not only a lot of artists and musicians who are camping out there and doing something, there are also quite a lot of New Yorkers who use it,” says Walker.
Lisitsina’s audio collage introduces listeners to artists working or performing on the High Line, as well as tourists who have come to participate and marvel. The highlight of this tour, though, is the New Yorkers, who all have different reasons for coming to the High Line and embody some of the themes that emerged through the editing process.
“A lot of people are looking for a place to be more contemplative in the crowd,” says Walker, which he also sees as meshing well with the audio piece itself. “If you were wearing headphones and people watching – which is what people do there – it would be kind of great,” he says.
“The other theme that I think comes out is how artists are using the space,” Walker continues. “They’re all battling for a little atmosphere to connect with audiences and not step on each other – which also feels very New York, the battle over space.”
These very New York stories are exactly what we hope to bring together through You Are Here, and we look forward to collecting and sharing more of them as our installations go live in the next few weeks. Keep an eye out for more details on the Tow website and at youarehere.network!
If her post is interesting enough (and it certainly might not be—aren’t podcasts for geeks and your weird comedy friends?), you may click on the link. The link takes you to the podcast’s website, where you can stream an HOUR-LONG episode. If you do hit play on the podcast (and again, this if is a strong one), you’re subject to two minutes of ads and an intro to the show.
As this all unfolds you’re likely on the train, in line at the coffee shop, or procrastinating at work. You probably don’t have time for this hour-long podcast nonsense. You hit the back button. You never hear the show.
Compare this to how people discover video and print online—by way of little segments that are easy to share on social media, whether it be in gifs, images with captions or highlighted and screen-capped sections of text. Users can perform their fandom by spreading samples of the work they love and drawing their social network in. That’s how content goes viral: it’s how a video can garner 22 million views.
If we want audio storytelling to reach similarly wide audiences, people have to be able to sample it in the same way. We believe the fact that podcasts can’t be easily snipped and shared online is inhibiting the growth of the podcast industry. So our team has been attempting to change that.
Over the past few months, with funding from the Tow Center for Digital Journalism and the Knight Foundation Prototype Fund, This American Life has hired a team of developers and designers to solve the problem of how to share audio online. We currently have a functioning prototype that allows a user to easily access the entirety of This American Life’s 20 years of content, scroll through it to find their favorite clips, convert those audio clips into beautiful, transcribed .mp4 videos, and share those videos on social media. We’ve worked hard with our UX and UI designers to make this process intuitive, fast, and fun.
We hope that if social media users are inundated with moving, funny or tragic audio snippets from our episodes shared by their friends, they will feel far more compelled to listen to our podcast. Maybe their narrow ideas about podcasts will be challenged. Maybe they will even become new fans of our show. We also hope our tool will allow listeners to meme-ify and play with audio—again, performing their fandom.
Our project will be open-source, so any podcast creator can adapt this technology to their own website and their own catalogue. We don’t just want to make it easier to share This American Life audio—we want it to be easier to share all audio.
“You Are Here” is an experiment in using digital technologies to represent and engage truly local communities, by offering content and interaction that is only available in a particular place. Using small, inexpensive, open-source wireless routers to deliver compelling, location-specific content, “You Are Here” seeks to facilitate conversations that are informed by the character of the physical spaces where they are installed, and which, in turn, shape the lives of the people who live, work, and play there.
For the past several months, the You Are Here team has been working on bringing together the many different pieces required to make this happen: we have been building new technologies that make it simple for anyone to set up and run a web server on an inexpensive hardware setup that can fit inside a breadbox.
The You Are Here web server, which can run a standalone wifi network anywhere you have an outlet.
We have also been creating audio stories that share a sampling of the issues and voices present in each of our two sites: Tompkins Square Park and the High Line. And we’ve been reaching out to local businesses and non-profits to find partners who can host our devices and help us connect with the local community that we hope will contribute to and grow the stories we’ve seeded.
In the coming weeks, we’re excited to share details about where you can find You Are Here and what we’ve learned from making site-specific audio, tiny web servers, and new connections. We’ll also share news about launch events and community contributions, and invite you to share your ideas about how the You Are Here technology could be used by other communities in other ways.
In the meantime, feel free to take a look at our (Internet-accessible) website at youarehere.network, and when the time comes we hope you’ll join the conversation!
On May 12, the Tow Center for Digital Journalism launched Guide to SecureDrop, a research project led by Tow Fellow Charles Berret, with a panel discussion at Columbia Journalism School. The report is available to download and read at the Tow Center’s GitBook repository.
SecureDrop is a platform that enables sources to contact journalists securely and anonymously through encrypted communications. It has been dubbed “a WikiLeaks for every newsroom,” and is currently used in roughly twelve news organizations, including The Guardian, The New Yorker, ProPublica, The Washington Post and The Globe and Mail. Another dozen independent journalists and non-profits are using the system, and there is a waiting list of about 75.
Berret’s report includes, for the first time, interviews with many of the journalists using SecureDrop. “The findings are promising,” said Berret at the event. “The impact of SecureDrop as a reporting tool is reported to be strong. The typical contents of SecureDrop inboxes fall into several categories. There is little variation in methods of newsroom coordination, but publishers differ on whether or not to cite the origin of SecureDrop stories.”
The panelists discussed at length why such a system, which costs over $2,000 and is less convenient to use than common communication applications, is necessary. “A Pew poll last year showed that 64% of investigative journalists believe that they are under surveillance, and almost half of them changed their practices in light of these concerns,” said Berret. “SecureDrop responds to a real need for more encryption.”
“In the past few years, the Obama administration has prosecuted more journalists’ sources than all other administrations combined. One of the reasons they have been able to do this is their surveillance capabilities,” said Trevor Timm, co-founder and executive director of the Freedom of the Press Foundation. “When you look at the indictments of these sources, you see that they have conducted some sort of digital surveillance on the sources, on the journalists, or on both.
“It has become increasingly important for journalists to start protecting sources, not after the fact, but immediately when they start talking to them,” said Timm. “There are many advantages to using Gmail, but it puts your communications in the hands of a third party. They can have a court order served on them for all your communications, a gag order will be attached, and you may never know. SecureDrop tries to put that power back in the hands of journalists. There are no third parties involved when you set up SecureDrop in your news organization. Each organization owns its own SecureDrop system, and even we don’t have access to any of the data.”
In recent years, Susan McGregor, Assistant Director of the Tow Center for Digital Journalism and Assistant Professor at Columbia Journalism School, has been studying digital security practices among journalists. “Before the Snowden revelations, there wasn’t great awareness of the surveillance capabilities that are a side effect of electronic communications,” said McGregor. “We now see a second phase – an understanding that this is not just about national security reporting. Journalists who cover a wide range of themes, like finance or regulation, are appreciating the risks that come with that.”
“Default encryption is the direction we are seeing things move, and journalists too need to make this a default. News organizations should say: ‘This is how you talk to us.’”
Garrett Robinson, the Lead Developer on SecureDrop and one of the panelists, provided a quick demo. “The goal is to make leaking fairly easy and fairly safe,” Robinson explained. “SecureDrop allows journalists to ask the source questions without knowing who they are talking to, which gives them plausible deniability.”
“SecureDrop is pretty onerous for journalists to use, and in our training we help them find the workflow. When you make something easy to use, it is almost always the case that you will sacrifice security. We are trying to balance that and bring it back to usability,” said Robinson.
Betsy Reed, Editor-in-Chief of The Intercept, reported that implementing SecureDrop granted The Intercept access to a wider range of stories. “We are seeing a higher volume of submissions and a higher quality of material. One thing came through that looked interesting, but had nothing to do with anything The Intercept is doing. The source heard about our use of SecureDrop, and that is why they came to us. We can now give it to someone else who does not have access to the system.”
According to Berret, “The Globe and Mail said that if they get one story a year from SecureDrop, it will be worth the expense. They got a story immediately, and have continued to get stories since.”
Mike Tigas, a News Applications Developer at ProPublica, emphasized that “SecureDrop tries to make it as easy for the source as possible. If a source comes to you and says ‘I have this thing and I want to give it to you securely,’ there is already a record of that. SecureDrop provides a first contact that is more secure than what sources would normally do otherwise.”
“We got to a point where it is relatively easy for a source to use SecureDrop,” said Timm. “This is important, because we can spend a day or two in a newsroom and teach them tricks for keeping things as secure as possible, but it is impossible to train sources that we don’t know.”
Efrat Nechushtai is a current Ph.D. candidate at the Columbia University Graduate School of Journalism.