How It's Made, Research

Hyper-compensation: Ted Nelson and the impact of journalism


NewsLynx is a Tow Center research project and platform aimed at better understanding the impact of news. It is conducted by Tow Fellows Brian Abelson, Stijn DeBrouwere & Michael Keller.

“If you want to make an apple pie from scratch, you must first invent the universe.” — Carl Sagan

Before you can begin to measure impact, you need to first know who’s talking about you. While analytics platforms provide referrers, social media sites track reposts, and media monitoring tools follow mentions, these services are often incomplete and come with a price. Why is it that, on the internet — the most interconnected medium in history — tracking linkages between content is so difficult?

The simple answer is that the web wasn’t built to be *fully* connected, per se. It’s an idiosyncratic, labyrinthine garden of forking paths with no way to navigate from one page to pages that reference it.

We’ve spent the last few months thinking about and building an analytics platform called NewsLynx, which aims to help newsrooms better capture the quantitative and qualitative effects of their work. Many of our features are aimed at giving newsrooms a better sense of who is talking about their work. This seemingly simple feature, understanding the links among web pages, has taken up the majority of our time. The obstacle turns out to be a shortcoming in the fundamental architecture of the web. Yet without that shortcoming, the web might never have succeeded.

The creator of the web, Tim Berners-Lee, didn’t provide a means for contextual links in the specification for HTML. The world wide web wasn’t the only idea for networking computers, however. Over 50 years ago, an early figure in computing had a different vision of the web – a vision that would have made the construction of NewsLynx a lot easier today, if not completely unnecessary.

Around 1960, a man named Ted Nelson came up with an idea for a structure of linking pieces of information in a two-way fashion. Whereas links on the web today just point one way — to the place you want to go — pages on Nelson’s internet would have a “What links here?” capability, so you would know all the websites that point to your page.
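Nelson’s two-way structure is easy to model: every link is simply recorded in both directions. A toy sketch in Python (the page names are invented):

```python
from collections import defaultdict

class TwoWayIndex:
    """Toy model of a bidirectionally linked web: adding a link
    from one page to another also records the reverse direction."""

    def __init__(self):
        self.links_from = defaultdict(set)  # page -> pages it links to
        self.links_to = defaultdict(set)    # page -> pages that link to it

    def add_link(self, source, target):
        self.links_from[source].add(target)
        self.links_to[target].add(source)   # the "what links here?" half

    def what_links_here(self, page):
        return sorted(self.links_to[page])

index = TwoWayIndex()
index.add_link("blog.example/post", "newsroom.example/story")
index.add_link("aggregator.example", "newsroom.example/story")
print(index.what_links_here("newsroom.example/story"))
```

The whole difficulty of impact tracking on the real web is that only `links_from` exists; `links_to` is what Xanadu would have given every page for free.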

And if you were dreaming up the ideal information web, this structure makes complete sense: why not make the most connections possible? As Borges writes, “I thought of a labyrinth of labyrinths, of one sinuous spreading labyrinth that would encompass the past and the future and in some way involve the stars.”

Nelson called his project Xanadu, but it had the misfortune of being both extremely ahead of its time and incredibly late to the game. Project Xanadu’s first and somewhat cryptic release debuted this year: over 50 years after it was first conceived.

In the meantime, Berners-Lee put forward HTML, with its one-way links, in the early ’90s, and it took off into what we know today. One of the reasons for the web’s success is its extremely informal, ad hoc functionality: anyone can put up an HTML page without hooking into, or caring about, a more elaborate system. Compared to Xanadu, what we use today is the quick-and-dirty implementation of a potentially much richer, but also much harder to maintain, ecosystem.

Two-way linking would not only make impact research easier but would also ease a number of other problems on the web. In his latest book, “Who Owns the Future?”, Jaron Lanier discusses two-way linking as a potential solution to copyright infringement and a host of other web maladies. His logic is that if you could always know who is linking where, you could create a system of micropayments to make sure authors get proper credit. His idea has its own caveats, but it shows the kinds of systems that two-way linking might enable. Chapter Seven of Lanier’s book discusses some of the other reasons Nelson’s idea never took off.

The desire for two-way links has not gone away, however. In fact, the *lack* of two-way links is an interesting lens through which to view the current tech environment. By creating a central server that catalogs and makes sense of the one-way web, Google adds value through its ability to make the internet seem more like Project Xanadu. If two-way links existed, you wouldn’t need all of the features of Google Analytics. People could implement their own search engines, with their own page rank algorithms, based on publicly available citation information.
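To illustrate how little would stand between public citation data and a homegrown search engine, here is a toy power-iteration PageRank over a small backlink graph (the graph, damping factor, and iteration count are all illustrative, not any search engine’s actual parameters):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

# Tiny hypothetical citation graph: "a" is cited by both other pages
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(graph)
```

The hard part was never the ranking math; it is assembling `links` at web scale, which is exactly the librarian role the one-way web handed to Google.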

The inefficiency of one-way links left a hole at the center of the web for a powerful player to step in and play librarian. As a result, if you want to know how your content lives online, you have to go shopping for analytics. To effectively monitor the life of an article, newsrooms currently use a host of services from trackbacks and Google Alerts to Twitter searches and ad hoc scanning. Short link services break web links even further. Instead of one canonical URL for a page, you can have a bit.ly, t.co, j.mp or thousands of other custom domains.
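Untangling short links is, at bottom, just following redirect chains to their end. A network-free sketch of that logic, with a dictionary standing in for the HTTP 301/302 responses a real resolver would fetch (the URLs are invented):

```python
def resolve(url, redirects, max_hops=10):
    """Follow a chain of redirects (short link -> ... -> canonical URL).

    `redirects` stands in for HTTP redirect lookups; a real resolver
    would issue HEAD requests instead of consulting a dict."""
    seen = set()
    for _ in range(max_hops):
        if url not in redirects:
            return url  # no further redirect: this is the canonical URL
        if url in seen:
            raise ValueError("redirect loop at " + url)
        seen.add(url)
        url = redirects[url]
    raise ValueError("too many redirects")

# Hypothetical shortener chain: bit.ly -> t.co -> publisher page
table = {
    "https://bit.ly/abc": "https://t.co/xyz",
    "https://t.co/xyz": "https://news.example/story",
}
print(resolve("https://bit.ly/abc", table))
```

Every mention of an article has to be normalized this way before citations from bit.ly, t.co, j.mp, and the rest can be counted as pointing at one canonical page.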

NewsLynx doesn’t have the power of Google. But we have been working on a core feature that would leverage Google features and other two-way link surfacing techniques to make monitoring the life of an article much easier: we’re calling them “recipes,” for now (#branding suggestions welcome). In NewsLynx, you’ll add these “recipes” to the system and it will alert you to all pending mentions in one filterable display. If a citation is important, you can assign it to an article or to your organization more generally. We also have a few built-in recipes to get you started.
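Under the hood, a “recipe” amounts to a saved search plus a routing rule. A hypothetical sketch of the idea (the field names and matching logic here are ours, not NewsLynx’s actual schema):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Recipe:
    """A saved monitoring rule: which mentions count, and where they route."""
    name: str
    keywords: List[str]
    article_id: Optional[str] = None  # None = assign to the org generally

    def matches(self, mention_text):
        text = mention_text.lower()
        return any(k.lower() in text for k in self.keywords)

def pending_mentions(recipes, mentions):
    """Return (recipe name, mention) pairs awaiting review in one display."""
    return [(r.name, m) for m in mentions for r in recipes if r.matches(m)]

recipes = [Recipe("investigation follow-up",
                  ["prison sentences", "court data"], "article-42")]
hits = pending_mentions(recipes, [
    "New analysis of court data released today",
    "Unrelated post about sports",
])
```

A real recipe would pull its `mentions` stream from sources like Google Alerts, Twitter searches, and trackbacks rather than a list.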

We’re excited to get this tool into the hands of news sites and see how it helps them better understand their place in the world wide web. As we prepare to launch the platform in the next month or so, check back here for any updates.

Past Events

Why We Like Pinterest for Fieldwork: Research by Nikki Usher and Phil Howard


Nikki Usher, GWU

Phil Howard, UW and CEU

7/16/2014

Anyone tackling fieldwork these days can choose from a wide selection of digital tools to put in their methodological toolkit. Among the best of these tools are platforms that let you archive, analyze, and disseminate at the same time. It used to be that these were fairly distinct stages of research, especially for the most positivist among us. You came up with research questions, chose a field site, entered the field site, left the field site, analyzed your findings, got them published, and shared your research output with friends and colleagues.


But the post-positivist approach that many of us like involves adapting your research questions—reflexively and responsively—while doing fieldwork. Entering and leaving your field site is not a cool, clean and complete process. We analyze findings as we go, and involve our research subjects in the analysis. We publish, but often in journals or books that can’t reproduce the myriad digital artifacts that are meaningful in network ethnography. Actor network theory, activity theory, science and technology studies and several other modes of social and humanistic inquiry approach research as something that involves both people and devices. Moreover, the dissemination of work doesn’t have to be something that happens after publication or even at the end of a research plan.


Nikki’s work involves qualitative ethnographic work at field sites where research can last anywhere from five months to a brief week-long visit to a quick drop-in day. She learned the hard way, from her research for Making News at The New York Times, that failing to find a good way to organize and capture images was a missed opportunity post-data collection. Since then, Nikki has been using Pinterest for fieldwork image gathering quite a bit. Phil’s work on The Managed Citizen was set back when he lost two weeks of field notes on the chaotic floor of the Republican National Convention in 2000 (security incinerates all the detritus left by convention-goers). He’s been digitizing field observations ever since.


Some people put together personal websites about their research journey. Some share over Twitter. And there are plenty of beta tools, open source or otherwise, that people play with. We’ve both enjoyed using Pinterest for our research projects. Here are some points on how we use it and why we like it.


How To Use It

  1. When you start, think of this as your research tool and your resource. If you dedicate yourself to this as your primary archiving system for digital artifacts, you are more likely to build it up over time. If you think of it as a social media publicity gimmick for your research, you’ll eventually lose interest and it is less likely to be useful for anyone else.
  2. Integrate it with your mobile phone because this amps up your capacity for portable, taggable, image data collection.
  3. Link the board posts to Twitter or your other social media feeds. Pinterest itself isn’t that lively a place for researchers yet. The people who want to visit your Pinterest page are probably actively following your activities on other platforms so be sure to let content flow across platforms.
  4. Pin lots of things, and lots of different kinds of things. Include decent captions, though be aware that if you are feeding Twitter you need to fit character limits.
  5. Use it to collect images you have found online, images you’ve taken yourself during your fieldwork, and invite the communities you are working with to contribute.
  6. Backup and export things once in a while for safe keeping. There is no built-in export function, but there are a wide variety of hacks and workarounds for transporting your archive.


What You Get

  1. Pinterest makes it easy to track the progress of the image data you gather. You may find yourself taking more photos in the field because they can be easily arranged, saved and categorized.
  2. Using it regularly adds another layer of data: photos and documents captured on a phone and added to Pinterest can be quickly captioned in the field and re-catalogued, giving you a chance to review the visual and built environment of your field site and interrogate your observations afresh.
  3. Visually-enhanced constant comparative methods: post-data collection, you can go beyond notes to images and captions that are easily scanned for patterns and points of divergence. This may be going far beyond what Glaser and Strauss had imagined, of course.
  4. Perhaps most important, when you forget what something looks like when you’re writing up your results, you’ve got an instant, easily searchable database of images and clues to refresh your memory.

Why We Like It

  1. It’s great for spontaneous presentations. Images are such an important part of presenting any research. Having a quick, publicly accessible archive of content allows you to speak, on the fly, about what you are up to. You can’t give a tour of your Pinterest page for a job talk. But having the resource there means you can call on images quickly during a Q&A period, or quickly load something relevant on a phone or browser during a casual conversation about your work.
  2. It gives you a way to interact with subjects. Having the Pinterest link allows you to show a potential research subject what you are up to and what you are interested in. During interviews it allows you to engage people on their interpretation of things. Having visual prompts handy can enrich and enliven any focus group or single subject interview. These don’t only prompt further conversation, they can prompt subjects to give you even more links, images, videos and other digital artifacts.
  3. It makes your research interests transparent. Having the images, videos and artifacts for anyone to see is a way for us to show what we are doing. Anyone with interest in the project and the board link is privy to our research goals. Our Pinterest page may be far less complicated than many of our other efforts to explain our work to a general audience.
  4. You can disseminate as you go. If you get the content flow right, you can tell people about your research as you are doing it. Letting people know what you are working on is always a good career strategy. Giving people images rather than article abstracts and draft chapters gives them something to visualize and improves the ambient contact with your research community.
  5. It makes digital artifacts more permanent. As long as you keep your Pinterest, what you have gathered can become a stable resource for anyone interested in your subjects. As sites and material artifacts change, what you have gathered offers a permanent and easily accessible snapshot of a particular moment of inquiry for posterity.


Pinterest Wish-list

One of us is a Windows Phone user (yes, really) and it would be great if there were a real Pinterest app for the Windows Phone. One-touch integration from the iPhone camera roll, much like Twitter, Facebook, and Flickr offer, would be great (though there is an easy hack).


We wish it would be easier to have open, collaborative boards. Right now, the only person who can add to a board is you, at least at first. You can invite other people to join a “group board” via email, but Pinterest does not have open boards that allow anyone with a board link to add content.


Here’s a look at our Pinboards: Phil Howard’s Tech + Politics board, and Nikki Usher’s boards on U.S. Newspapers. We welcome your thoughts…and send us images!


Nikki Usher is an assistant professor at the George Washington University’s School of Media and Public Affairs. Her project is Post Industrial News Spaces and Places with Columbia’s Tow Center for Digital Journalism. Phil Howard is a professor at the Central European University and the University of Washington. His project is a book on Political Power and the Internet of Things for Yale University Press.


Research

Knight Foundation joins The Tow Foundation as a sponsor for the initiative headed by Columbia University’s Tow Center for Digital Journalism



“Tow Center program defends journalism from the threat of mass surveillance” by Jennifer Henrichsen and Taylor Owen on Knight Blog

NEW YORK – June 10, 2014 – The Journalism After Snowden initiative, a project of The Tow Center for Digital Journalism at Columbia University Graduate School of Journalism, will expand to further explore the role of journalism in the age of surveillance, thanks to new funding from the John S. and James L. Knight Foundation.

Journalism After Snowden will contribute high-quality conversations and research to the national debate around state surveillance and freedom of expression through a yearlong series of events, research projects and articles that will be published in coordination with the Columbia Journalism Review.

Generous funding from The Tow Foundation established the initiative earlier in the academic year. The initiative officially kicked off in January with a high-level panel of prominent journalists and First Amendment scholars who tackled digital privacy, state surveillance and the First Amendment rights of journalists.

Read more in the press release from the Knight Foundation.

Past Events

Glenn Greenwald Speaks | Join the Tow Center for an #AfterSnowden Talk in San Francisco on June 18, 2014


Join the Tow Center for an evening lecture with Glenn Greenwald, who will discuss the state of journalism today and his recent reporting on surveillance and national security issues, on June 18, 2014 at 7pm at the Nourse Theater in San Francisco.

In April 2014, Greenwald and his colleagues at the Guardian received the Pulitzer Prize for Public Service. Don’t miss the chance to hear Greenwald speak in person as he fits all the pieces together, recounting his high-intensity eleven-day trip to Hong Kong, examining the broader implications of the surveillance detailed in his reporting, and revealing fresh information on the NSA’s unprecedented abuse of power with never-before-seen documents entrusted to him by Snowden himself. The event is sponsored by Haymarket Books, the Center for Economic Research and Social Change, the Glaser Progress Foundation, and the Tow Center for Digital Journalism at Columbia Journalism School. Reserve your seat for Glenn Greenwald Speaks / Edward Snowden, the NSA, and the U.S. Surveillance State.

Please note: this is a ticketed event. Tickets are $4.75 each.  | Purchase Tickets

This event is part of Journalism After Snowden, a yearlong series of events, research projects and writing from the Tow Center for Digital Journalism in collaboration with the Columbia Journalism Review. For updates on Journalism After Snowden, follow the Tow Center on Twitter @TowCenter #AfterSnowden.

Journalism After Snowden is funded by The Tow Foundation and the John S. and James L. Knight Foundation.

Lauren Mack is the Research Associate at the Tow Center. Follow her on Twitter @lmack.

Past Events

Tow Center Launches Amateur Footage: A Global Study of User-Generated Content in TV and Online News Output


Crediting is rare, there’s a huge gulf in how senior managers and newsdesks talk about it, and there’s a significant reliance on news agencies for discovery and verification. These are some of the key takeaways of Amateur Footage: A Global Study of User-Generated Content in TV and Online News Output, published today by the Tow Center for Digital Journalism.


The aim of this research project was to provide the first comprehensive report on the use of user-generated content (UGC) among broadcast news channels. UGC is defined, for this report, as photographs and videos captured by people unrelated to the newsroom who would not describe themselves as professional journalists.


Some of the Principal Findings are:

  • UGC is used by news organizations daily and can produce stories that otherwise would not, or could not, be told. However, it is often used only when other imagery is not available. 40% of UGC on television was related to Syria.
  • There is a significant reliance on news agencies in terms of discovering and verifying UGC. The news agencies have different practices and standards in terms of how they work with UGC.
  • News organizations are poor at acknowledging when they are using UGC and worse at crediting the individuals responsible for capturing it. Our data showed that 72 percent of UGC was not labeled or described as UGC, and just 16 percent of UGC on TV had an onscreen credit.
  • News managers are often unaware of the complexities involved in the everyday work of discovering, verifying, and clearing rights for UGC. Consequently, staff in many newsrooms do not receive the training and support required to develop these skills.
  • Vicarious trauma is a real issue for journalists working with UGC every day – and it’s different from traditional newsroom trauma. Some newsrooms are aware of this – but many have no structured approach or policy in place to deal with it.
  • There is a fear amongst rights managers in newsrooms that a legal case could seriously impact the use of UGC by news organizations in the future.


This research was designed to answer two key questions.  First, when and how is UGC used by broadcast news organizations, on air as well as online?  Second, does the integration of UGC into output cause any particular issues for news organizations? What are those issues and how do newsrooms handle them?


The work was completed in two phases. The first involved an in-depth, quantitative content analysis examining when and how eight international news broadcasters use UGC.  1,164 hours of TV output and 2,254 Web pages were analyzed here. The second was entirely qualitative and saw the team interview 64 news managers, editors, and journalists from 38 news organizations based in 24 countries across five continents. This report takes both phases to provide a detailed overview of the key findings.


The research provides the first concrete figures we have about the level of reliance on UGC by international news channels. It also explores six key issues that newsrooms face in terms of UGC. The report is designed around those six issues, meaning you can dip into any one particular issue:

1) Workflow – how is UGC discovered and verified? Do newsrooms do this themselves, and if so, which desk is responsible? Or is UGC ‘outsourced’ to news agencies?

2) Verification – are there systematic processes for verifying UGC? Is there a threshold that has to be reached before a piece of content can be used?

3) Permissions – how do newsrooms seek permissions? Do newsrooms understand the copyright implications around UGC?

4) Crediting – do newsrooms credit UGC?

5) Labeling – are newsrooms transparent about the types of UGC that they use in terms of who uploaded the UGC and whether they have a specific agenda?

6) Ethics and Responsibilities – how do newsrooms consider their responsibilities to uploaders, the audience and their own staff?


The full report can be viewed here.

Announcements, Events, Past Events, Research

Digital Security and Source Protection For Journalists: Research by Susan McGregor


EXECUTIVE SUMMARY

The law and technologies that govern the functioning of today’s digital communication systems have dramatically affected journalists’ ability to protect their sources.  This paper offers an overview of how these legal and technical systems developed, and how their intersection exposes all digital communications – not just those of journalists and their sources – to scrutiny. Strategies for reducing this exposure are explored, along with recommendations for individuals and organizations about how to address this pervasive issue.


DOWNLOAD THE PDF

Order a (bound) printed copy.

Comments, questions & contributions are welcome on the version-controlled text, available as a GitBook here:

http://susanemcg.gitbooks.io/digital-security-for-journalists/

DIGITAL SECURITY AND SOURCE PROTECTION FOR JOURNALISTS

Preamble

Digital Security for Journalists: A 21st Century Imperative

The Law: Security and Privacy in Context

The Technology: Understanding the Infrastructure of Digital Communications

The Strategies: Understanding the Infrastructure of Digital Communications

Looking Ahead

Footnotes


Research

Diversity in the Robot Reporter Newsroom


The Associated Press recently announced a big new hire: A robot reporter from Automated Insights (AI) would be employed to write up to 4,400 earnings report stories per quarter. Last year, that same automated writing software produced over 300 million stories — that’s some serious scale from a single algorithmic entity.

So what happens to media diversity in the face of massive automated content production platforms like the one Automated Insights created? Despite the fact that we’ve done pretty abysmally at incorporating a balance of minority and gender perspectives in the news media, I think we’d all like to believe that by including diverse perspectives in the reporting and editing of news we fly closer to the truth. A silver lining to the newspaper industry crash has been a profusion of smaller, more nimble media outlets, allowing for far more variability and diversity in the ideas that we’re exposed to.

Of course software has biases, and although the basic anatomy of robot journalists is comparable, there are variations within and amongst different systems, such as the style and tone that’s produced, as well as the editorial criteria that are coded into the systems. Algorithms are the product of a range of human choices, including the criteria, parameters, or training data that can pass along inherited, systematic biases. So while a robot reporter offers the promise of scale (and of reducing costs), we need to be wary of over-reliance on any single automated system. For the sake of media diversity, the one bot needs to fork itself and become 100,000.

We saw this unfold in microcosm over the last week. The @wikiparliament bot was launched in the UK to monitor edits to Wikipedia from IP addresses within parliament (a form of transparency and accountability for who was editing what). Within days it had been mimicked by the @congressedits bot, which was set up to monitor the U.S. Congress. What was particularly interesting about @congressedits, though, is that it was open sourced by creator Ed Summers. That allowed the bot to quickly spread and be adapted for different jurisdictions like Australia, Canada, France, Sweden, Chile, Germany, and even Russia.

Tailoring a bot for different countries is just one (relatively simple) form of adaptation, but I think diversifying bots for different editorial perspectives could similarly benefit from a platform. I would propose that we need to build an open-source news bot architecture that different news and journalistic organizations could use as a scaffolding to encode their own editorial intents, newsworthiness criteria, parameters, data sets, ranking algorithms, cultures, and souls into. By creating a flexible platform as an underlying starting point, the automated media ecology could adapt and diversify faster and into new domains or applications.
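To make that proposal concrete, here is one hypothetical shape such a scaffolding could take: a bot whose data source, newsworthiness criteria, and rendering are all pluggable, so each newsroom supplies its own editorial pieces (all names and hooks here are ours, not an existing library’s):

```python
class NewsBot:
    """Minimal pluggable bot: callers supply the editorial pieces."""

    def __init__(self, source, is_newsworthy, render):
        self.source = source                # callable returning an iterable of events
        self.is_newsworthy = is_newsworthy  # the encoded editorial criteria
        self.render = render                # event -> post text

    def run(self):
        return [self.render(e) for e in self.source() if self.is_newsworthy(e)]

# One hypothetical instantiation: flag large edits to pages we watch
edits = lambda: [{"page": "Budget", "delta": 1200},
                 {"page": "Budget", "delta": 3}]
bot = NewsBot(
    source=edits,
    is_newsworthy=lambda e: abs(e["delta"]) > 500,  # this bot's editorial threshold
    render=lambda e: f'{e["page"]} changed by {e["delta"]} characters',
)
posts = bot.run()
```

Two newsrooms could share this scaffolding yet behave quite differently, simply by swapping in their own `is_newsworthy` predicates and data feeds.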

Such a platform would also enable the expansion of bots oriented towards different journalistic tasks. A lot of the news and information bots you find on social media these days are parrots of various ilks: they aggregate content on a particular topical niche, like @BadBluePrep, @FintechBot and @CelebNewsBot or for a geographical area like @North_GA, or they simply retweet other accounts based on some trigger words. Some of the more sophisticated bots do look at data feeds to generate novel insights, like @treasuryio or @mediagalleries, but there’s so much more that could be done if we had a flexible bot platform.

For instance, we might consider building bots that act as information collectors and solicitors, moving away from pure content production toward content acquisition. This isn’t so far off, really. Researchers at IBM have been working on this for a couple of years already and have built a prototype system that “automatically identifies and ask[s] targeted strangers on Twitter for desired information.” The technology is oriented toward collecting accurate and up-to-date information in specific situations where crowd information may be valuable. It’s relatively easy to imagine an automated news bot being launched after a major news event to identify and solicit information, facts, or photos from people most likely nearby or involved in the event. In another related project, the same group at IBM has been developing technology to identify people on Twitter who are more likely to propagate (read: retweet) information relating to public-safety news alerts. Essentially, they grease the gears of social dissemination by identifying just the right people, for a given topic and at a particular time, who are most likely to further share the information.

There are tons of applications for news bots just waiting for journalists to build them: fact-checking, information gathering, network bridging, audience development, and so on. Robot journalists don’t just have to be reporters. They can be editors, or even (hush) work on the business side.

What I think we don’t want to end up with is the Facebook or Google of robot reporting: “one algorithm to rule them all”. It’s great that the Associated Press is exploring the use of these technologies to scale up their content creation, but down the line when the use of writing algorithms extends far beyond earnings reports, utilizing only one platform may ultimately lead to homogenization and frustrate attempts to build a diverse media sphere. Instead the world that we need to actively create is one where there are thousands of artisanal news bots serving communities and variegated audiences, each crafted to fit a particular context and perhaps with a unique editorial intent. Having an open source platform would help enable that, and offer possibilities to plug in and explore a host of new applications for bots as well.

Research

The Anatomy of a Robot Journalist


Given that an entire afternoon was dedicated to a “Robot Journalism Bootcamp” at the Global Editors Network Summit this week, it’s probably safe to say that automated journalism has finally gone mainstream — hey, it’s only taken close to 40 years since the first story-writing algorithm was created at Yale. But there are still lots of ethical questions and debates that we need to sort out, from source transparency to corrections policies for bots. Part of that hinges on exactly how these auto-writing algorithms work: What are their limitations, and how might we design them to be more sensitive to journalistic values?

Despite the proprietary nature of most robot journalists, the great thing about patents is that they’re public. And patents have been granted to several major players in the robo-journalism space already, including Narrative Science, Automated Insights, and Yseop, making their algorithms just a little bit less opaque in terms of how they operate. More patents are in the pipeline from both heavyweights like CBS Interactive and start-ups like Fantasy Journalist. So how does a robo-writer from Narrative Science really work?

Every robot journalist first needs to ingest a bunch of data. Data rich domains like weather were some of the first to have practical natural language generation systems. Now we’re seeing a lot of robot journalism applied to sports and finance — domains where the data can be standardized and made fairly clean. The development of sensor journalism may provide entirely new troves of data for producing automated stories. Key here is having clean and comprehensive data, so if you’re working in a domain that’s still stuck with PDFs or sparse access, the robots haven’t gotten there yet.

After data is read in by the algorithm, the next step is to compute interesting or newsworthy features from the data. Basically, the algorithm is trying to figure out the most critical aspects of an event, like a sports game. It has newsworthiness criteria built into its statistics. So, for example, it looks for surprising statistical deviations like minimums, maximums, or outliers, big swings and changes in a value, violations of an expectation, a threshold being crossed, or a substantial change in a predictive model. “Any feature the value of which deviates significantly from prior expectation, whether the source of that expectation is due to a local computation or from an external source, is interesting by virtue of that deviation from expectation,” the Narrative Science patent reads. So for a baseball game, the algorithm computes “win probability” after every play. If win probability has a big delta between two plays, it probably means something important just happened, and the algorithm puts that on a list of events that might be worthy of inclusion in the final story.
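That deviation test reduces to a few lines: walk the win-probability series and flag any play whose swing crosses a threshold (the numbers and the threshold below are invented, not the patent’s):

```python
def interesting_plays(win_prob, threshold=0.2):
    """Flag plays where win probability swings by more than `threshold`.

    Returns (play index, delta) pairs: the raw material for the
    story's candidate-event list."""
    events = []
    for i in range(1, len(win_prob)):
        delta = win_prob[i] - win_prob[i - 1]
        if abs(delta) > threshold:
            events.append((i, delta))
    return events

# Hypothetical game: one big late-game swing becomes a candidate event
probs = [0.50, 0.55, 0.52, 0.14, 0.18]
print(interesting_plays(probs))
```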

Once some interesting features have been identified, angles are then selected from a pre-authored library. Angles are explanatory or narrative structures that provide coherence to the overall story. Basically they are patterns of events, circumstances, entities, and their features. An angle for a sports story might be “back-and-forth horserace”, “heroic individual performance”, “strong team effort”, or “came out of a slump”. Certain angles are triggered according to the presence of certain derived features (from the previous step). Each angle is given an importance value from 1 to 10 which is then used to rank that angle against all of the other proposed angles.
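The selection step can be sketched as a small library of (importance, trigger) pairs evaluated against the derived features. The angle names echo the examples above, but the trigger predicates and scores are our own invention:

```python
# Each angle: a name, an importance score (1-10), and a trigger predicate
# over the features derived in the previous step.
ANGLE_LIBRARY = [
    ("heroic individual performance", 8, lambda f: f.get("top_player_runs", 0) >= 4),
    ("back-and-forth horserace", 7, lambda f: f.get("lead_changes", 0) >= 3),
    ("strong team effort", 5, lambda f: f.get("players_scoring", 0) >= 6),
]

def select_angles(features):
    """Return triggered angles, highest importance first."""
    triggered = [(imp, name) for name, imp, trig in ANGLE_LIBRARY if trig(features)]
    return [name for imp, name in sorted(triggered, reverse=True)]

angles = select_angles({"lead_changes": 4, "players_scoring": 7})
```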

Once the angles have been determined and ordered they are linked to specific story points, which connect back to individual pieces of data like names of players or specific numeric values like score. Story points can also be chosen and prioritized to account for personal interests such as home team players. These points can then be augmented with additional factual content drawn from internet databases such as where a player is from, or a quote or picture of them.

The last step the robot journalist takes is natural language generation, which in the Narrative Science system is done by recursively traversing the angle and story-point representations and using phrasal generation routines to generate and splice together the actual English text. This is probably the most straightforward aspect of the entire pipeline: it’s pretty much just fancy templates.
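
In its simplest form, “fancy templates” means filling phrasal patterns with story-point data. The templates and field names below are toy assumptions, not the phrasal generation routines the patent describes.

```python
# One phrasal template per angle; placeholders match story-point keys.
TEMPLATES = {
    "heroic individual performance":
        "{player} powered the {team} with {points} points in a {score} win.",
    "strong team effort":
        "The {team} spread the scoring around in a {score} victory.",
}

def realize(angle, story_points):
    """Fill the template for an angle with its linked story-point data."""
    return TEMPLATES[angle].format(**story_points)

print(realize("heroic individual performance",
              {"player": "J. Smith", "team": "Hawks",
               "points": 38, "score": "101-94"}))
```

A real system would recurse over nested angle structures and vary the phrasing, but the core mechanism is template substitution like this.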

So, there you have it, the pipeline for a robot journalist: (1) ingest data, (2) compute newsworthy aspects of the data, (3) identify relevant angles and prioritize them, (4) link angles to story points, and (5) generate the output text.

Obviously there can be variations on this basic pipeline as well. Automated Insights, for example, uses randomization to provide variability in output stories and also incorporates a more sophisticated notion of narrative tone: based on a desired tone, different text might be generated to read as apathetic, confident, pessimistic, or enthusiastic. YSeop, on the other hand, augments templates with metadata so that they’re more flexible, allowing a template to, for instance, conjugate verbs depending on the data being used. A post-generation analyzer (you might call it a robot editor) from YSeop further improves the style of a written text by looking for repeated words and substituting synonyms or alternate words.
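
The “robot editor” idea of swapping repeated words for synonyms can be illustrated with a few lines of Python. The synonym table and the second-occurrence rule are assumptions for demonstration, not YSeop's actual analyzer.

```python
import re

# Toy synonym table; a real system would use a much larger lexicon.
SYNONYMS = {"win": ["victory", "triumph"], "big": ["large", "sizable"]}

def vary_repeats(text):
    """Replace second and later occurrences of a word with listed synonyms."""
    seen = {}

    def swap(match):
        word = match.group(0).lower()
        count = seen.get(word, 0)
        seen[word] = count + 1
        if count > 0 and word in SYNONYMS:
            # Cycle through the alternatives on each repetition.
            return SYNONYMS[word][(count - 1) % len(SYNONYMS[word])]
        return match.group(0)

    return re.sub(r"[A-Za-z]+", swap, text)

print(vary_repeats("A big win followed another big win."))
```

The second “big” and “win” get replaced, leaving the first occurrences untouched.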

From my reading, I’d have to say that the Narrative Science patent seems the most informed by journalism. It stresses the notions of newsworthiness and editorial judgment in crafting a narrative. But that’s not to say that the stylistic innovations from Automated Insights and the template flexibility of YSeop aren’t important. What still seems to be lacking in these algorithms, though, is a broader sense of newsworthiness beyond “deviance.” Harcup and O’Neill identified 10 modern newsworthiness values, each of which we might attempt to mimic in code: reference to the power elite, reference to celebrities, entertainment, surprise, bad news, good news, magnitude (i.e., significance to a large number of people), cultural relevance to the audience, follow-up, and newspaper agenda. How might robot journalists evolve when they have a fuller palette of editorial intents available to them?

Research

Sensors and Journalism, A Major Report


Today, the Tow Center is publishing a report about the uses, the opportunities and the risks of sensors and journalism.

The report describes the landscape where sensors and journalism combine, then defines the terms necessary for understanding this area of research. Reporters are using sensors in an era when the rapid development of technology is moving data into the mainstream of journalism. The increasing ubiquity, capability, and accessibility of sensors are on the supply side, while investigative reporters, computer-assisted reporters, and journalist/technologists are on the demand side.

We are including drones in the field of sensing, partially because of the amount of attention they’re currently receiving, and partially because of their potential to extend human sight far beyond our bodily bounds.

While recent commentaries about journalistic sensing have focused just on sensors that journalists have built themselves (or commissioned), our definition also includes journalistic uses of data from sensor systems that are not controlled by the reporters themselves. We have excluded opinion polling, information gathered by humans’ five senses, and data produced by monitoring computer processes like bit-torrent networks.

That said, our description should not be used to separate sensor-based journalism from other reporting processes. The intellectual tools we discuss may be useful for many data-intensive projects, and sensor reporting needs to be integrated with traditional forms.

The introduction also includes a chapter by scholar Charles Berret, who has written a sensor history, charting humanity’s efforts to extend the reach of our five natural senses. It starts with the scales unearthed by archeologists, the Neolithic markers like Stonehenge, and the agricultural tools from the Nile region. Berret notes that, in the 1500s, the astronomer Tycho Brahe built a data network using the post, which compiled sensor readings to draw the most accurate and comprehensive star maps of his time. Cameras and sound sensors came in the nineteenth century, a moment ‘when mechanical sensors were first treated with greater credibility than the human observer.’ The history outlined here only goes as far as the first half of the twentieth century, but within our time period it covers early medical sensors in the form of René Laennec’s stethoscope and Willem Einthoven’s electrocardiogram, and meteorologists’ use of Doppler radar.
The introduction finishes by outlining the characteristics of sensors that make them useful—or not—and helping readers identify what elements of the world can be sensed.

Case Studies

The report’s second section, containing case studies, examines seven projects that used sensors for journalism. Each study includes the story of what happened and then offers analysis in which we identify its distinctive or noteworthy elements, as well as the lessons journalists may take from the projects. The case studies start to show distinct types of sensor uses that suit different journalistic goals. The first type is when investigative reporters (environmental reporters in these two examples) design a sensing process to collect data with the intent of testing a hypothesis. They used relatively mature professional equipment and consulted with experts. They had justifiable confidence in their data, even though their processes were quite different from how scientists work when intending to publish in a peer-reviewed journal, or when doing work on behalf of regulators.
Another type of sensor use by journalists is accessing data from municipal sensor systems. The Sun Sentinel won a Pulitzer Prize by using tollgate data to investigate widespread speeding by off-duty Florida police. The Washington Post published an extensive explanatory feature based on data from a network of microphones installed by city law enforcement.

A separate type of journalistic sensing involves DIY hardware development. At the moment, these projects value participation and informal science education, and the equipment they use is unlikely to produce data that can be heavily relied upon in legal or health settings. However, the makers in this part of the field see great long-term potential, inspired by open-source software, a phenomenon that has returned great value to newsrooms.

The case studies also include an analysis of the U.S. drone journalism industry as it stands right now. At the high end, a small number of organizations are using footage shot by specialist pilots of professional cinematography drones. In the middle, enterprising news industry employees are experimenting with pro-am equipment costing hundreds, not thousands, of dollars. Mainstream media organizations are also sourcing drone footage shot by hobbyists. All of this activity is proceeding despite a rapidly changing, highly contested regulatory environment.

Laws, Ethics, Sensors and Journalism

For the next section, about laws and ethics for reporting with sensors, we recruited 12 experts to write a chapter each, drawing on their considerable expertise in law, technology, ethics, academic research, and the sciences. In the individual essays, each helps identify and navigate the key issues that arise when their field intersects with sensors and journalism.

The authors who address the privacy and surveillance issues write that these are early days for the field. The courts have thus far dealt with consent to record, defamation, and false light in the context of cameras and microphones. The potential for journalists to break those laws using different types of sensors certainly exists, and if legal claims are made, courts will likely consider the ethical standards that emerge in these next few years of sensor reporting. The field is emerging even as the relationship between newsrooms and their audiences transforms. Our authors suggest that journalists should involve their communities as they negotiate the tricky questions of who owns and controls personal data from sensors.

Newsroom managers who have staff making or acquiring hardware should also acquaint themselves with the basics of open-source licensing. Often, journalists who design their own sensing systems will lean towards sharing their work under open-source principles, but this may involve legal liability if hardware goes wrong and causes physical damage. The risks are avoidable, however, as the article by Diana Cooper makes clear.

Still in the realm of legal issues for hardware makers, this current phase of rapid, widespread DIY development is moving a lot faster than the Federal Communications Commission. The FCC requires that any electronic device that might emit radio interference be tested and approved before marketing. However, that regime did not anticipate many conceivable journalistic uses of custom sensors produced in small batches.

If and when sensing, in particular drone use, becomes a widespread journalistic practice, human error is likely. Serious mistakes will attract negligence claims, following in a tradition codified by The Digest of Justinian in the mid-sixth century. It contained a section on ‘Those Who Pour or Throw Things Out of Buildings’. The laws extended to falling things, as well. Despite this history, the novelty of drone journalism will make insurance tricky and expensive until the industry has more data on which to model risk profiles.

The last group within the legal and ethical section concerns truth and accuracy. Sensors may seduce journalists into thinking their output is objective and free from the errors inherent in human testimony. That is a risky belief. We have drawn on the expertise of the EPA to show how reporters might design a sensor-based data collection process to improve their accuracy. For journalists, the concept of ‘ground-truthing’—supplementing sensor information with human input—will be valuable. It should help introduce nuance, guard against mistakes and treat fairly the people at the heart of our stories.

For the final section, we have distilled this report into a set of recommendations, including groups of strategic moves, good work practices, and efforts the industry may collectively consider.

Strategic Recommendations

  • Identify and cultivate sensor sources for the beats you’ve prioritized.
  • Put a watching brief on open source sensing systems.
  • News nerds should do hardware too.

Work Practice Recommendations

  • Before sensing, articulate your hypothesis.
  • Work with experts on complex stories.
  • Understand the entire pipeline for your story’s sensor data.
  • Combine sensing with traditional reporting.

Recommendations for the industry, collectively

  • Journalists have an opportunity and a responsibility to report on sensor systems.
  • Advocate for access to data from publicly funded sensor systems.

Even though we have documented a significant amount of journalistic sensing here, we hope that this report will need updates as newsrooms keep combining their reporters’ ideas with new sensing opportunities.

Research

The Art and Science of Data-Driven Journalism


Journalists have been using data in their stories for as long as the profession has existed. A revolution in computing in the 20th century created opportunities to integrate data into investigations, as journalists began to bring technology into their work. In the 21st century, a revolution in connectivity is leading the media toward new horizons. The Internet, cloud computing, agile development, mobile devices, and open source software have transformed the practice of journalism, leading to the emergence of a new term: data journalism.

Although journalists have been using data in their stories for as long as they have been engaged in reporting, data journalism is more than traditional journalism with more data. Decades after early pioneers successfully applied computer-assisted reporting and social science to investigative journalism, journalists are creating news apps and interactive features that help people understand data, explore it, and act upon the insights derived from it. New business models are emerging in which data is a raw material for profit, impact, and insight, co-created with an audience that was formerly reduced to passive consumption. Journalists around the world are grappling with the excitement and the challenge of telling compelling stories by harnessing the vast quantity of data that our increasingly networked lives, devices, businesses, and governments produce every day.

While the potential of data journalism is immense, the pitfalls and challenges to its adoption throughout the media are similarly significant, from digital literacy to competition for scarce resources in newsrooms. Global threats to press freedom, digital security, and limited access to data create difficult working conditions for journalists in many countries. A combination of peer-to-peer learning, mentorship, online training, open data initiatives, and new programs at journalism schools rising to the challenge, however, offer reasons to be optimistic about more journalists learning to treat data as a source.

Following is a list of the 14 findings, recommendations and predictions explored in detail in the full report, which can be downloaded here (PDF).

1) Data will become even more of a strategic resource for media.

2) Better tools will emerge that democratize data skills.

3) News apps will explode as a primary way for people to consume data journalism.

4) Being digital first means being data-centric and mobile-friendly.

5) Expect more robojournalism, but know that human relationships and storytelling still matter.

6) More journalists will need to study the social sciences and statistics.

7) There will be higher standards for accuracy and corrections.

8) Competency in security and data protection will become more important.

9) Audiences will demand more transparency on reader data collection and use.

10) Conflicts over public records, data scraping, and ethics will surely arise.

11) Collaborate with libraries and universities as archives, hosts, and educators.

12) Expect data-driven personalization and predictive news in wearable interfaces.

13) More diverse newsrooms will produce better data journalism.

14) Be mindful of data-ism and bad data. Embrace skepticism.

Past Events

Tow Center Launches Three Tow Reports on UGC, Sensors, and Data-driven Journalism


The Tow Center team is thrilled to launch three new research reports.

Amateur Footage: A Global Study of User-Generated Content in TV and Online News Output, written by Claire Wardle and Sam Dubberley, is the result of a major global study into the integration of User-Generated Content (UGC) in news output in television broadcasts and online.

Sensors and Journalism, led by Fergus Pitt and including a wide range of contributors, explores how recent advances in sensor networks, citizen science, unmanned vehicles, and community-based data collection can be used by a new generation of sensor journalists to move from data analysis to data collection. The report critically reviews recent prominent uses of sensors by journalists, explores the ethical and legal implications of sensing for journalism, and makes a series of recommendations for how sensors can be integrated into newsrooms.

The Art and Science of Data-Driven Journalism, by Alex Howard, provides a recent history of, and current best practices in, data and computational journalism, based on dozens of interviews with industry leaders.

This research was made possible by grants from the Knight and Tow Foundations. More details on the Tow Center research program can be found here.

All three reports will be launched at today’s Tow Center conference, Quantifying Journalism: Data, Metrics, and Computation, which will also include panels on newsroom metrics, data journalism, and sensors, as well as talks by a range of Tow Fellows. Further information on today’s conference can be found here. All sessions will be broadcast live starting at 9 a.m. (EST) here.