
Behind “Losing Ground” II: Q&A with Scott Klein and Al Shaw of ProPublica


We spoke with Scott Klein, assistant managing editor, and Al Shaw, news application developer at ProPublica, about the editorial decisions and satellite imagery used in “Losing Ground.” To read more about the science behind the imagery, read a case study by Fergus Pitt here.

Smitha: Where did the idea for “Losing Ground” come from?

Scott: Mapping is an important part of what we do. We hadn’t done all that much with satellite—maybe an image or two.

I attended the Tow Center’s Remote Sensing Seminar, a two-day conference, and I started thinking, people have used this in news, but no one’s used this to do an investigative story. I started talking to Fergus Pitt about what makes sense for an investigative story and among the things we thought would be interesting was using satellites–using satellites not just to illustrate a story but also as a key way of analyzing information. So Fergus and the Tow Center very generously offered to help us.

Another thing that happened is that we had been working in the Mississippi river delta for a long time, and we knew that there was a big story happening in Southeast Louisiana that was not known outside the delta which was this soil erosion, subsidence, land loss issue.

Al: Bob Marshall at the Lens had been writing about this extensively. I had also read a book called Bayou Farewell, which is a fantastic book about the people who live outside the levees and watching the land disappear, so there were a lot of different inspirations.

Scott: One of the things we wanted to look at with satellites was the Mississippi delta. In our reporting, we found that NOAA, the National Oceanic and Atmospheric Administration, maintained a list of places to remove from maps because of climate change and soil erosion. These are settlements, marshes, bays and rivers and things like that that are now just open water, so that NOAA knows they don’t have to provide weather alerts there, because you don’t have to provide weather alerts for places that don’t exist.

So this is fascinating for us, and we had seen that nobody had covered this before. This really inspired us to start asking the question, what’s being lost?

If you drew a circle around New Orleans, everyone inside the circle knows that this is happening, everyone outside the circle doesn’t.

We said this is a story we want them all to hear, we really think there’s a compelling reason to use satellites to analyze the information and then to use the satellite images to tell the story and let them see what’s being lost, both from a 30,000 foot view, almost literally, all the way down to personal stories.

And that’s when we started thinking, who has done really compelling work in southeast Louisiana? Who has the contacts and the means and the understanding to be able to go and tell personal stories of the people whose settlements, whose culture, whose livelihoods are being destroyed?

Al: Bob Marshall has grown up and spent his entire life down there.

He actually has a boat and knows these bayous by heart. When we first started talking about this, I went down there and asked him, “Say we had satellite imagery— what are the areas that you would like to see most illuminated?” and we kind of went through a map and he said, “This is one of the most important places.” We actually started drawing boxes around places, and that’s how the reporting built up from there.

Smitha: So you had a lot of local knowledge from Bob to rely on.

Scott: That was really a key thing that we didn’t have here in New York. Again, thanks to the Tow Center, Al and some other folks got some pretty intensive training from a studio that does satellite work, as well as from Lela Prashad, a remote sensing specialist who used to work for NASA.

It was sort of like when the people in The Matrix have to fly a helicopter: in just over three days, there was an incredibly intensive knowledge exchange, and Lela Prashad taught us exactly how to understand and work with satellite images.

And that was when things really got started.

Smitha: How easy was it to pick up this technology? Was it challenging, or was there a steep learning curve?

Al: It is somewhat of a steep learning curve. When you download these images right from NASA, they don’t look like anything, really. And the tools are still somewhat rudimentary. You kind of have to cobble stuff together. We had to actually write a bunch of new software.

Scott: Now there are also a number of satellites with different capabilities and different caveats that you have to understand.

Al: The sediment kind of looked indistinguishable from the land, so that didn’t tell our story that well. We had to color correct the images because satellites don’t make water look like what we think of as water, like bright blue, or bright green for land. The images are also shot on different days, shot from different angles, there’s cloud cover. To turn that into the big image you see on the site took a fair amount of processing.

The 1922 map is a United States Geological Survey (USGS) map we got from Louisiana State University, and we geo-referenced it, which basically means adding geographic data to the scanned image so that it lines up with the 2014 Landsat satellite image.

Smitha: So there was a lot of stitching together geospatial information from very different sources as well.

Al: In the intro slideshow, there are overlays from the levees, the canals, oil and gas infrastructure, and pipelines, which came from different sources in USGS, from government sources in Louisiana, from a dissertation that an LSU student had done. So a lot of different sources went into it.

Smitha: One thing that really struck me about the way the piece is put together is that it’s very simple. Even though there is a lot of complex information, it feels very easy to navigate. How did you approach the issue of usability, the user’s experience of the site?

Scott: From the beginning we knew we wanted to make something where the maps were the main kind of metaphor. The maps were going to be the biggest thing on the page, the central spine of the interaction.

A few weeks ago we did some semi-formal user testing. We put a tweet out and sent an e-mail asking for volunteers to come and take a look at this. We watched them navigate through a draft of the app on a big screen and we asked them questions. It taught us a lot, and we cut out a lot of stuff. There was a whole different navigation metaphor that we left out.

Smitha: What are your hopes for the policy implications of this piece?

Scott: Our job as journalists is to inform the debate and give people as much information as they need to make really good decisions. Our hope for the policy piece is that we inform the debate in Louisiana. More importantly, I think focusing national attention on this will bring it needed scrutiny.

Smitha: What challenges are unique to working with satellite imagery?

Al: The raw size of the data is a big one. We went through tens of gigabytes of satellite imagery and other sources and being able to chew through that is a big barrier itself.

Smitha: The collaboration with The Lens, based out of New Orleans—is this something you do a lot, working closely with local papers?

Scott: We do it very extensively. It’s a long tradition for us to work with local newsrooms.

Smitha: Has the cost of working on an interactive piece like this been prohibitive at all, or has it been a worthy investment?

Scott: It has absolutely been a worthy investment. The only costs have been staff time, travel—the imagery is all from the government.

Smitha: Do you have any projects similar to “Losing Ground” currently in the works or in gestation?

Scott: We do! We can’t talk about it. And this isn’t even the end of the Louisiana project, so we will have more to come.

 

 


Sensors and Journalism: ProPublica, Satellites and The Shrinking Louisiana Coast


Two months before the programmer-journalists at ProPublica would be ready to go live with an impressive news app illustrating the huge loss of land along the Louisiana coastline, the development team gathered in their conference room above the financial district in Manhattan.

This was partly a show-off session and partly a review. Journalist-developers Brian Jacobs and Al Shaw pulled a browser up onto the glossy 46-inch screen and loaded up their latest designs. At first it appears to be simple satellite photography, spanning about 20,000 square miles, but the elegance hides a complicated process to pull layers of meaning from many rich data sets.

At the heart of the story is the fact that the Louisiana coastline loses land at a rate equivalent to a football field each hour. That comes to 16 square miles per year. The land south of New Orleans has always been low-lying, but since the Army Corps of Engineers built levees along the Mississippi after the huge 1927 floods, the delta has been losing ground. Previously, the river carried sediment down and deposited it to gradually build up dry land throughout the delta. The same levees that protect upstream communities also keep that sediment confined to the river channel, preventing it from spreading out and rebuilding the Louisiana coastline. Environmental researchers say that the energy industry’s canal-dredging and well-drilling have accelerated natural erosion. Together, the constricted river and the oil extraction have exacerbated the effect of sea-level rise from climate change.

The loss of ground endangers people: The dry land used to provide protection to New Orleans’ people and businesses, because when storms like Hurricane Katrina sweep in from the Gulf, they lose power as they move from water to land. It’s therefore crucial to have a wide buffer between the sea and the city. Now, with roughly 2,000 fewer square miles of protective land, the state will have to spend more money building tougher, higher walls, flood insurance will be more costly, infrastructure could break, and the people inside those walls risk death and injury at much higher rates. If the land loss isn’t slowed, the costs will only get higher.

Satellites Clearly Show The Story

For this story, Al Shaw’s goal was to illustrate the scale and severity of the problem. Print journalists have written extensively on the story. But the forty years’ worth of remote sensing data available from NASA’s Landsat satellites helped the ProPublica journalists show the story with immediate power and clarity. They processed Landsat 8 sensing data themselves and drew on the US Geological Survey’s interpretations of data from earlier Landsat craft.

The project combines a high-level view with eight zoomed-in case studies. The scene of land, marsh and water known locally as the Texaco Canals forms one of the most dramatic examples. Starting with data collected from aerial photography in the 1950s and ending with 2012 satellite data, the layered maps show how the canals sliced up the marshlands and how the relocated soil stopped sediment from replenishing the land. The result is an area that starts mostly as land and ends mostly as open water. Contemporary and archival photos complement the bird’s-eye view with a human-level perspective.

This is Satellite Sensing’s Learning Curve

At this point, we need to reveal a conflict of interest. In February 2014 the Tow Center provided training to four journalists from ProPublica. Lela Prashad, a remote sensing specialist who has worked with NASA, led a two-day workshop covering the fundamental physics of satellite sensing, the different satellite types and their qualities, where to find satellite data, and the basics of how to process it. ProPublica news apps director Scott Klein had attended a Tow Center journalistic sensing conference eight months earlier, where he saw a presentation by Arlene Ducao and Ilias Koen on their satellite infrared maps of Jakarta and realized that ProPublica’s innovative newsroom might be able to use remote sensing to cover some of its environmental stories in new ways.

To produce this work, the ProPublica journalists learned the physics and applications of remote sensing technology. The earth’s surface pushes energy out into the atmosphere and space – some is an immediate reflection of the sun’s rays, some is energy that was absorbed earlier and re-emitted. Human sources like city lights and industrial activity also produce emissions. Energy waves range from high-frequency, short-wavelength gamma rays and x-rays, through ultraviolet, into the visible spectrum (what human eyes sense), and on towards the longer wavelengths of infrared, microwave and radio.
Satellites flown by NASA (and increasing numbers of private companies) point cameras towards Earth, taking pictures of the energy that passes through the atmosphere – the ultraviolet, visible and infrared bands. (The various generations of satellites have had different capabilities. As they have developed, they have recorded Earth in more detail and passed overhead more frequently.)
Those scenes, when processed, can reveal with great accuracy the materials that form Earth’s surface. The exact hue of each pixel represents the specific substance below. Geologists needing to identify types of rock take advantage of the fact that, for example, sandstone reflects a different combination of energy waves than granite. Food security analysts can assess the moisture, and therefore the health, of a country’s wheat crop – helping them predict shortages (and speculators predict pricing). ProPublica used the data to show the Louisiana coastline changing over time from dry land, to marsh, to open water.

The US Geological Survey (USGS) makes its data available through a series of free online catalogues. Registered users can nominate the area they are interested in, pick a series of dates and download image files which include all the available energy bands. Crucially, those image files include the Geographic Information Systems (GIS) meta-data that allow the journalists to precisely match the pixels in data files to known locations.
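As a rough illustration of what that meta-data looks like in practice, here is a minimal sketch using the open-source rasterio library (not a tool the case study names); the filename is a placeholder for any single-band GeoTIFF downloaded from the USGS catalogues.

```python
import rasterio

# Inspect the GIS meta-data bundled with a downloaded Landsat band.
# The filename is hypothetical; any single-band GeoTIFF from the USGS
# catalogues carries the same kind of information.
with rasterio.open("LC08_L1TP_022030_band4.TIF") as src:
    print(src.crs)        # the coordinate reference system of the pixels
    print(src.bounds)     # the geographic bounding box of the scene
    print(src.transform)  # maps row/column indices to map coordinates
    print(src.res)        # pixel size in map units (30m for Landsat color bands)
```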

 

How The Developers Built it

Brian Jacobs learned how to reproduce and combine the information in an accessible form for ProPublica’s online audience. The opening scene of the app has eight layers. The top one uses a scanned copy of a 1922 survey map owned by the USGS and scanned by the Louisiana State University library. Jacobs pulled it into his mapping software to match the geographic features with GIS location data and used Photoshop to prepare it for online display, cutting out the water and normalizing the color.
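ProPublica did this step in desktop mapping software; as a rough sketch of the same idea in code, one could attach ground control points (pixel positions matched to known coordinates) to the scanned map and derive a transform with rasterio. Every value below – filenames, pixel positions, coordinates – is invented for illustration.

```python
import rasterio
from rasterio.control import GroundControlPoint
from rasterio.crs import CRS
from rasterio.transform import from_gcps

# Hypothetical control points: (row, col) positions on the scanned 1922
# map matched to known longitude/latitude coordinates of the same features.
gcps = [
    GroundControlPoint(row=120, col=340, x=-90.07, y=29.95),
    GroundControlPoint(row=118, col=2900, x=-89.40, y=29.96),
    GroundControlPoint(row=2650, col=360, x=-90.05, y=29.20),
    GroundControlPoint(row=2660, col=2915, x=-89.38, y=29.21),
]

with rasterio.open("usgs_1922_scan.tif") as src:
    data = src.read()
    profile = src.profile

# Derive an affine transform from the control points and write a
# geo-referenced copy that lines up with the Landsat imagery.
profile.update(transform=from_gcps(gcps), crs=CRS.from_epsg(4326))
with rasterio.open("usgs_1922_georeferenced.tif", "w", **profile) as dst:
    dst.write(data)
```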

The bottom layer displays the 2014 coastline – stitched together from six Landsat 8 tiles through many steps of processing. Jacobs picked out images from satellite passes when the skies were free of cloud cover. After pulling in the image tiles from the infrared and true-color bands and merging them together, Jacobs normalized the distortions and color differences so the separate images would mosaic consistently.
Working with the command-line tools GDAL (a geospatial library) and ImageMagick (an image editing suite), he prepared them for online display. Pictures of the Earth’s curved surface need to be slightly warped to make sense as flat images; the types of warps are called projections. The raw USGS images come in the space industry’s WGS84-based projection standard, but the web mostly uses Mercator. (Here’s Wikipedia‘s explanation, and xkcd’s cartoon version.)
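A minimal sketch of that reprojection step, using rasterio’s warp utilities rather than the exact GDAL command line the team used; filenames are placeholders.

```python
import rasterio
from rasterio.warp import Resampling, calculate_default_transform, reproject

# Reproject a mosaicked scene into Web Mercator (EPSG:3857) so it lines
# up with standard web map tiles. Filenames are placeholders.
dst_crs = "EPSG:3857"
with rasterio.open("mosaic_wgs84.tif") as src:
    transform, width, height = calculate_default_transform(
        src.crs, dst_crs, src.width, src.height, *src.bounds)
    profile = src.profile.copy()
    profile.update(crs=dst_crs, transform=transform, width=width, height=height)

    with rasterio.open("mosaic_webmercator.tif", "w", **profile) as dst:
        for band in range(1, src.count + 1):
            reproject(
                source=rasterio.band(src, band),
                destination=rasterio.band(dst, band),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=transform,
                dst_crs=dst_crs,
                resampling=Resampling.bilinear,
            )
```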

Researchers who work with remote sensing have a specific language and sets of practices for how they treat color in their visualizations. The ProPublica journalists adopted some of those practices, but also needed to produce their work for a lay audience. So, although the features on ProPublica’s maps are easily recognizable, they are not what’s known as ‘true color’. When viewers look closely at the bottom layer, it’s clear that these are not simply aerial photographs. In comparison to satellite photography displayed via Google Maps, the ProPublica layer has a much sharper contrast between land and water. The green pixels showing land are vibrant, while the blue sections showing water are rich, deep blues.

The color palette is, in fact, a combination of two sets of satellite data: The water pixels are interpreted from Landsat’s infrared and green bands, while the land pixels come from Landsat’s ‘true color’ red, green and blue bands, with extra sharpening from the panchromatic band (panchromatic imagery appears as shades of gray, but can be used to sharpen the color bands). At 30m per pixel, Landsat’s color bands are lower resolution than its 15m-per-pixel panchromatic band.
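The case study doesn’t give the exact recipe, but a common way to combine the two resolutions is a Brovey-style pansharpen: upsample the 30m color bands onto the 15m panchromatic grid, then scale them by the ratio of the pan band to their mean intensity. A rough sketch under that assumption, with placeholder filenames:

```python
import numpy as np
import rasterio
from rasterio.enums import Resampling

# Read the 30m red/green/blue composite, upsampled onto the 15m
# panchromatic grid, then apply a simple Brovey-style pansharpen.
with rasterio.open("landsat_rgb_30m.tif") as rgb_src, \
        rasterio.open("landsat_pan_15m.tif") as pan_src:
    pan = pan_src.read(1).astype("float64")
    rgb = rgb_src.read(
        out_shape=(rgb_src.count, pan_src.height, pan_src.width),
        resampling=Resampling.bilinear,
    ).astype("float64")

# Scale each color band by the ratio of the sharp pan band to the mean
# intensity of the upsampled color bands.
intensity = rgb.mean(axis=0)
sharpened = rgb * (pan / np.maximum(intensity, 1e-6))
```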

Step By Step Frames

A detail of a single tile in true color, somewhat color-corrected.

At this point, the developers have stitched together multiple tiles of their area and combined images from the true-color and panchromatic bands, a process known as pansharpening.

The water mask

This is the mask that ProPublica produced from the near-infrared and green bands. It’s used to make a distinction between the areas of land and water.
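The case study doesn’t spell out the formula, but a standard way to separate water from land using exactly those two bands is a normalized difference water index, where positive values indicate open water. A sketch under that assumption, with placeholder filenames:

```python
import numpy as np
import rasterio

# Build a land/water mask from the green and near-infrared bands.
# Open water absorbs near-infrared strongly, so the normalized
# difference (green - NIR) / (green + NIR) is positive over water.
with rasterio.open("landsat_green.tif") as g, rasterio.open("landsat_nir.tif") as n:
    green = g.read(1).astype("float64")
    nir = n.read(1).astype("float64")

ndwi = (green - nir) / np.maximum(green + nir, 1e-6)
water_mask = ndwi > 0.0  # True where the pixel is likely open water
```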

Pansharpened, zoomed

This frame shows the final result of ProPublica’s satellite image processing. At this point the images have been pansharpened and the water layer has been included from the near IR and green band.

The final view that ProPublica showed their users.

This shows the underlay for ProPublica’s case studies. The land pixels combine the true-color bands and the high-resolution panchromatic band. The water pixels come from the infrared and green bands.

Google satellite view, zoomed

The same area, as shown in Google Maps’ satellite view. Mostly, it uses true-color satellite imagery for land and bathymetry data for water.

A detail of the USGS map of the region. Each color represents a period of land loss. ProPublica extracted each period to a separate layer in their interactive map.

The other layers come from a range of sources. In the opening scene, viewers can bring up overlays of the built infrastructure associated with the oil and gas industry: the wells and pipelines, the dredged canals, and the levees that protect the homes and businesses around the coastline.

When users zoom in to one of ProPublica’s case studies, they can scrub through another 16 layers. Each one shows a slice of time when the coastline receded. A layer of olive-green pixels indicates the remaining dry land. The data for these 16 layers came from researchers at the US Geological Survey (USGS), who had analyzed 37 years of satellite data combined with historical surveys and mid-century aerial photography. ProPublica worked with John Barras at the USGS, a specialist who could draw on years of his own work and decades of published studies. He handed over a large geo-referenced image file exported from the software suite ERDAS Imagine, in which each period’s land loss was rendered in a separate color.
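Because each period of land loss sits in that file as its own pixel value, splitting it into separate scrubber layers amounts to isolating pixels by value. A simplified sketch; the value-to-period mapping and filenames below are invented for illustration.

```python
import rasterio

# Split a classified raster, in which each land-loss period is stored
# as a distinct pixel value, into one binary layer per period.
# The value-to-period mapping here is invented for illustration.
periods = {1: "1932-1956", 2: "1956-1973", 3: "1973-1990", 4: "1990-2010"}

with rasterio.open("usgs_land_loss_classified.tif") as src:
    classified = src.read(1)
    profile = src.profile.copy()
    profile.update(count=1, dtype="uint8", nodata=0)

    for value, label in periods.items():
        layer = (classified == value).astype("uint8")
        with rasterio.open(f"land_loss_{label}.tif", "w", **profile) as dst:
            dst.write(layer, 1)
```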

The Amount of Time, Skill and Effort

Scott Klein described this project as one of ProPublica’s larger ones, but not abnormally so. His team of developer-journalists release around twelve projects of this size each year, as well as producing smaller pieces to accompany work by the rest of ProPublica’s newsroom.

For six months, the project was a major focus for Al Shaw and Brian Jacobs. Both Shaw and Jacobs are young, highly skilled, and prized developer-journalists. Al Shaw has a BA and is highly active in New York’s Hacks/Hackers community. Brian Jacobs is a Knight-Mozilla Fellow working at ProPublica, with a background that includes a year at MIT’s Senseable City Lab and four years as a UI designer at Azavea, a Philadelphia-based geospatial software company. They worked on the project close to full time, with oversight from their director, Scott Klein. During the later stages, ProPublica’s design director David Sleight advised on the interaction design, hired a freelance illustrator, and led user testing. ProPublica partnered with The Lens, a non-profit public-interest newsroom based in New Orleans, whose environmental reporter Bob Marshall wrote the text. The Lens also sourced three freelance photo researchers and photographers for ProPublica.

ProPublica Have Shared Their Tools

To produce the work, ProPublica had to extend the ‘simple-tiles’ software library they use to publish maps – a process that soaked up months of developer time. They’ve now open-sourced that code – a move that can radically speed up the development process for other newsrooms with skilled developers. In common with most news organizations, the interactive maps ProPublica has historically published have used vector graphics, which display as outlines of relatively simple geographic and city features like states, roads and building footprints. This project renders raster (aka bitmap) images, the kind of file used for complicated or very detailed visual information.

ProPublica’s explanation of their update to simple-tiles is available on their blog, and the code is available via GitHub.

Their Launch

ProPublica put the app live on the 28th of August, exactly nine years after Hurricane Katrina forced New Orleans’ mayor to order the city’s first-ever mandatory evacuation.


Hyper-compensation: Ted Nelson and the impact of journalism


NewsLynx is a Tow Center research project and platform aimed at better understanding the impact of news. It is conducted by Tow Fellows Brian Abelson, Stijn DeBrouwere & Michael Keller.

“If you want to make an apple pie from scratch, you must first invent the universe.” — Carl Sagan

Before you can begin to measure impact, you need to first know who’s talking about you. While analytics platforms provide referrers, social media sites track reposts, and media monitoring tools follow mentions, these services are often incomplete and come with a price. Why is it that, on the internet — the most interconnected medium in history — tracking linkages between content is so difficult?

The simple answer is that the web wasn’t built to be *fully* connected, per se. It’s an idiosyncratic, labyrinthine garden of forking paths with no way to navigate from one page to pages that reference it.

We’ve spent the last few months thinking about and building an analytics platform called NewsLynx, which aims to help newsrooms better capture the quantitative and qualitative effects of their work. Many of our features are aimed at giving newsrooms a better sense of who is talking about their work. This seemingly simple feature, to understand the links among web pages, has taken up the majority of our time. The obstacle turns out to be a shortcoming in the fundamental architecture of the web. Without that shortcoming, however, the web might never have succeeded.

The creator of the web, Tim Berners-Lee, didn’t provide a means for contextual links in the specification for HTML. The world wide web wasn’t the only idea for networking computers, however. Over 50 years ago an early figure in computing had a different vision of the web – a vision that would have made the construction of NewsLynx a lot easier today, if not completely unnecessary.

Around 1960, a man named Ted Nelson came up with an idea for a structure of linking pieces of information in a two-way fashion. Whereas links on the web today just point one way — to the place you want to go — pages on Nelson’s internet would have a “What links here?” capability, so you would know all the websites that point to your page.
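As a thought experiment, the data structure Nelson imagined is easy to sketch: record the reverse edge every time a page links out, and “what links here?” becomes a single lookup. A toy version in Python, with made-up URLs:

```python
from collections import defaultdict

# A toy two-way link index: every outbound link also records the
# reverse edge, so "What links here?" is a single lookup.
outbound = defaultdict(set)  # page -> pages it links to
inbound = defaultdict(set)   # page -> pages that link to it

def add_link(source: str, target: str) -> None:
    outbound[source].add(target)
    inbound[target].add(source)

add_link("blog.example.com/post", "newsroom.example.org/investigation")

# On today's one-way web the investigation's publisher can't see this
# without crawling, analytics, or a search engine; here it's one lookup.
print(inbound["newsroom.example.org/investigation"])
```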

And if you were dreaming up the ideal information web, this structure makes complete sense: why not make the most connections possible? As Borges writes, “I thought of a labyrinth of labyrinths, of one sinuous spreading labyrinth that would encompass the past and the future and in some way involve the stars.”

Nelson called his project Xanadu, but it had the misfortune of being both extremely ahead of its time and incredibly late to the game. Project Xanadu’s first and somewhat cryptic release debuted this year: over 50 years after it was first conceived.

In the meantime, Berners-Lee put forward HTML, with its one-way links, in the early ’90s, and it took off into what we know today. One of the reasons for the web’s success is its extremely informal, ad-hoc functionality: anyone can put up an HTML page without hooking into or caring about a more elaborate system. Compared to Xanadu, what we use today is the quick-and-dirty implementation of a potentially much richer, and also much harder to maintain, ecosystem.

Two-way linking would not only make impact research easier but would also ease a number of other problems on the web. In his latest book, “Who Owns the Future?”, Jaron Lanier discusses two-way linking as a potential solution to copyright infringement and a host of other web maladies. His logic is that if you could always know who is linking where, then you could create a system of micropayments to make sure authors get proper credit. His idea has its own caveats, but it shows the systems that two-way linking might enable. Chapter Seven of Lanier’s book discusses some of the other reasons Nelson’s idea never took off.

The desire for two-way links has not gone away, however. In fact, the *lack* of two-way links is an interesting lens through which to view the current tech environment. By creating a central server that catalogs and makes sense of the one-way web, Google adds value with its ability to make the internet seem more like Project Xanadu. If two-way links existed, you wouldn’t need all of the features of Google Analytics. People could implement their own search engines with their own page-rank algorithms based on publicly available citation information.

The inefficiency of one-way links left a hole at the center of the web for a powerful player to step in and play librarian. As a result, if you want to know how your content lives online, you have to go shopping for analytics. To effectively monitor the life of an article, newsrooms currently use a host of services from trackbacks and Google Alerts to Twitter searches and ad hoc scanning. Short link services break web links even further. Instead of one canonical URL for a page, you can have a bit.ly, t.co, j.mp or thousands of other custom domains.

NewsLynx doesn’t have the power of Google. But we have been working on a core feature that leverages Google features and other two-way-link surfacing techniques to make monitoring the life of an article much easier: we’re calling them “recipes,” for now (#branding suggestions welcome). In NewsLynx, you’ll add these “recipes” to the system and it will alert you to all pending mentions in one filterable display. If a citation is important, you can assign it to an article or to your organization more generally. We also have a few built-in recipes to get you started.

We’re excited to get this tool into the hands of news sites and see how it helps them better understand their place in the world wide web. As we prepare to launch the platform in the next month or so, check back here for any updates.


Why We Like Pinterest for Fieldwork: Research by Nikki Usher and Phil Howard


Nikki Usher, GWU

Phil Howard, UW and CEU

7/16/2014

Anyone tackling fieldwork these days can choose from a wide selection of digital tools to put in their methodological toolkit. Among the best of these tools are platforms that let you archive, analyze, and disseminate at the same time. It used to be that these were fairly distinct stages of research, especially for the most positivist among us. You came up with research questions, chose a field site, entered the field site, left the field site, analyzed your findings, got them published, and shared your research output with friends and colleagues.

 

But the post-positivist approach that many of us like involves adapting your research questions—reflexively and responsively—while doing fieldwork. Entering and leaving your field site is not a cool, clean and complete process. We analyze findings as we go, and involve our research subjects in the analysis. We publish, but often in journals or books that can’t reproduce the myriad digital artifacts that are meaningful in network ethnography. Actor network theory, activity theory, science and technology studies and several other modes of social and humanistic inquiry approach research as something that involves both people and devices. Moreover, the dissemination of work doesn’t have to be something that happens after publication or even at the end of a research plan.

 

Nikki’s work involves qualitative ethnographic work at field sites where research can last from five months to a brief week-long visit to a quick drop-in day. She learned the hard way from her research for Making News at The New York Times that failing to find a good way to organize and capture images was a missed opportunity post-data collection. Since then, Nikki’s been using Pinterest for fieldwork image gathering quite a bit. Phil’s work on The Managed Citizen was set back when he lost two weeks of field notes on the chaotic floor of the Republican National Convention in 2000 (security incinerates all the detritus left by convention goers). He’s been digitizing field observations ever since.

 

Some people put together personal websites about their research journey. Some share over Twitter. And there are plenty of beta tools, open source or otherwise, that people play with. We’ve both enjoyed using Pinterest for our research projects. Here are some points on how we use it and why we like it.

 

How To Use It

  1. When you start, think of this as your research tool and your resource. If you dedicate yourself to this as your primary archiving system for digital artifacts you are more likely to build it up over time. If you think of this as a social media publicity gimmick for your research, you’ll eventually lose interest and it is less likely to be useful for anyone else.
  2. Integrate it with your mobile phone because this amps up your capacity for portable, taggable, image data collection.
  3. Link the board posts to Twitter or your other social media feeds. Pinterest itself isn’t that lively a place for researchers yet. The people who want to visit your Pinterest page are probably actively following your activities on other platforms so be sure to let content flow across platforms.
  4. Pin lots of things, and lots of different kinds of things. Include decent captions though be aware that if you are feeding Twitter you need to fit character limits.
  5. Use it to collect images you have found online, images you’ve taken yourself during your fieldwork, and invite the communities you are working with to contribute.
  6. Backup and export things once in a while for safe keeping. There is no built-in export function, but there are a wide variety of hacks and workarounds for transporting your archive.

 

What You Get

  1. Pinterest makes it easy to track the progress of the image data you gather. You may find yourself taking more photos in the field because they can be easily arranged, saved and categorized.
  2. Using it regularly adds another level of data: photos and documents captured on your phone and then added to Pinterest can be quickly captioned in the field and then re-catalogued, giving you a chance to review the visual and built environment of your field site and interrogate your observations afresh.
  3. Visually-enhanced constant comparative methods: post-data collection, you can go beyond notes to images and captions that are easily scanned for patterns and points of divergence. This may be going far beyond what Glaser and Strauss had imagined, of course.
  4. Perhaps most important, when you forget what something looks like when you’re writing up your results, you’ve got an instant, easily searchable database of images and clues to refresh your memory.

Why We Like It

  1. It’s great for spontaneous presentations. Images are such an important part of presenting any research. Having a quick, publicly accessible archive of content allows you to speak, on the fly, about what you are up to. You can’t give a tour of your Pinterest page for a job talk. But having the resource there means you can call on images quickly during a Q&A period, or quickly load something relevant on a phone or browser during a casual conversation about your work.
  2. It gives you a way to interact with subjects. Having the Pinterest link allows you to show a potential research subject what you are up to and what you are interested in. During interviews it allows you to engage people on their interpretation of things. Having visual prompts handy can enrich and enliven any focus group or single subject interview. These don’t only prompt further conversation, they can prompt subjects to give you even more links, images, videos and other digital artifacts.
  3. It makes your research interests transparent. Having the images, videos and artifacts for anyone to see is a way for us to show what we are doing. Anyone with interest in the project and the board link is privy to our research goals. Our Pinterest page may be far less complicated than many of our other efforts to explain our work to a general audience.
  4. You can disseminate as you go. If you get the content flow right, you can tell people about your research as you are doing it. Letting people know about what you are working on is always a good career strategy. Giving people images rather than article abstracts and draft chapters gives them something to visualize and improves the ambient contact with your research community.
  5. It makes digital artifacts more permanent. As long as you keep your Pinterest, what you have gathered can become a stable resource for anyone interested in your subjects. As sites and material artifacts change, what you have gathered offers a permanent and easily accessible snapshot of a particular moment of inquiry for posterity.

 

Pinterest Wish-list

One of us is a Windows Phone user (yes, really) and it would be great if there were a real Pinterest app for the Windows Phone. One-touch integration from the iPhone camera roll, much like Twitter, Facebook, and Flickr offer, would be great (though there is an easy hack).

 

We wish it would be easier to have open, collaborative boards. Right now, the only person who can add to a board is you, at least at first. You can invite other people to join a “group board” via email, but Pinterest does not have open boards that allow anyone with a board link to add content.

 

Here’s a look at our Pinboards: Phil Howard’s Tech + Politics board, and Nikki Usher’s boards on U.S. Newspapers. We welcome your thoughts…and send us images!

 

 

 

 

Nikki Usher is an assistant professor at the George Washington University’s School of Media and Public Affairs. Her project with Columbia’s Tow Center for Digital Journalism is Post-Industrial News Spaces and Places. Phil Howard is a professor at the Central European University and the University of Washington. His project is a book on Political Power and the Internet of Things for Yale University Press.

 


Knight Foundation joins The Tow Foundation as a sponsor for the initiative headed by Columbia University’s Tow Center for Digital Journalism


“Tow Center program defends journalism from the threat of mass surveillance,” by Jennifer Henrichsen and Taylor Owen on Knight Blog

NEW YORK – June 10, 2014 – The Journalism After Snowden initiative, a project of The Tow Center for Digital Journalism at Columbia University Graduate School of Journalism, will expand to further explore the role of journalism in the age of surveillance, thanks to new funding from the John S. and James L. Knight Foundation.

Journalism After Snowden will contribute high-quality conversations and research to the national debate around state surveillance and freedom of expression through a yearlong series of events, research projects and articles that will be published in coordination with the Columbia Journalism Review.

Generous funding from The Tow Foundation established the initiative earlier in the academic year. The initiative officially kicked off in January with a high-level panel of prominent journalists and First Amendment scholars who tackled digital privacy, state surveillance and the First Amendment rights of journalists.

Read more in the press release from the Knight Foundation.


Glenn Greenwald Speaks | Join the Tow Center for an #AfterSnowden Talk in San Francisco on June 18, 2014


Join the Tow Center for an evening lecture with Glenn Greenwald, who will discuss the state of journalism today and his recent reporting on surveillance and national security issues, on June 18, 2014 at 7pm at the Nourse Theater in San Francisco.

In April 2014, Greenwald and his colleagues at the Guardian received the Pulitzer Prize for Public Service. Don’t miss the chance to hear Greenwald speak in person as he fits all the pieces together, recounting his high-intensity eleven-day trip to Hong Kong, examining the broader implications of the surveillance detailed in his reporting, and revealing fresh information on the NSA’s unprecedented abuse of power with never-before-seen documents entrusted to him by Snowden himself. The event is sponsored by Haymarket Books, the Center for Economic Research and Social Change, the Glaser Progress Foundation, and the Tow Center for Digital Journalism at Columbia Journalism School. Reserve your seat for Glenn Greenwald Speaks / Edward Snowden, the NSA, and the U.S. Surveillance State.

Please note: this is a ticketed event. Tickets are $4.75 each.  | Purchase Tickets

This event is part of Journalism After Snowden, a yearlong series of events, research projects and writing from the Tow Center for Digital Journalism in collaboration with the Columbia Journalism Review. For updates on Journalism After Snowden, follow the Tow Center on Twitter @TowCenter #AfterSnowden.

Journalism After Snowden is funded by The Tow Foundation and the John S. and James L. Knight Foundation.

Lauren Mack is the Research Associate at the Tow Center. Follow her on Twitter @lmack.


Upcoming Tow Tea: New Beats in Digital Media Thursday, September 25, 4-6 PM


Tow Tea
New Beats and Covering Diverse Beats in Digital Media

Thursday, September 25th, 2014
4:00 pm – 6:00 pm
The Brown Institute for Media Innovation

The Tow Center for Digital Journalism is thrilled to kick off our Fall 2014 series of Tow Teas with an exciting conversation between Zavé Martohardjono and David Noriega. All are welcome and encouraged to attend.

Zavé Martohardjono is a Brooklyn-based trans* artist who works in performance, movement, video, and text. With roots in documentary filmmaking, Zavé became interested in video and media while studying International Relations and Political Economy at Brown University. He went on to grassroots filmmaking and youth media education, later receiving an M.F.A. in Media Arts Production at the City College of New York.

David Noriega is a reporter with BuzzFeed. Born in Bogotá, Colombia, he moved to the U.S. as a teenager. He graduated from Brown University with a degree in comparative literature in 2008 before pursuing his master’s at the Columbia University Graduate School of Journalism. Noriega is interested in covering social justice issues and the Latin American diaspora.

Open to the public


Upcoming Events


Mapping Issues with the Web: An Introduction to Digital Methods

How can digital traces be used to understand issues and map controversies? Presenters: Liliana Bounegru and Jonathan Gray

Tuesday, September 23, 5-6:30

RSVP Required Via Eventbrite

Liliana Bounegru and Jonathan Gray

On the occasion of Bruno Latour’s visit to Columbia University, this presentation will show participants how to operationalize his seminal Actor-Network Theory using digital data and methods in the service of social and cultural research.

Participants will be introduced to some of the digital methods and tools developed at the University of Amsterdam and Sciences Po over the past decade and how they have been used to generate insights around a wide variety of topics, from human rights to extremism, global health to climate change.

Professor Bruno Latour will provide a short response to this presentation and join the subsequent discussion.

Liliana Bounegru and Jonathan Gray collaborated to produce the popular, prescient ‘Data Journalism Handbook‘, published in 2012. They are currently working on a new project exploring how journalists can use new digital tools and methods developed by social science researchers to transform coverage of complex issues and events – using the Paris 2015 climate negotiations as a case study.

Please RSVP via Eventbrite


Behind ProPublica’s “Losing Ground”


Today, almost nine years after Hurricane Katrina made landfall in Louisiana, ProPublica is launching “Losing Ground,” a mixed media piece that shows the erosion of the Louisiana coastline using maps, photographs, text, a timeline, and audio interviews of residents.

The piece relies heavily on satellite imagery and is the product of a unique collaboration between ProPublica, a New York-based newsroom, and The Lens, a public-interest newsroom based in New Orleans. The project also represents a first for the Tow Center, which helped to train journalists at ProPublica in remote sensing techniques as part of its extensive Sensor Newsroom research program, a Tow-Knight research project. It is the first time the Tow Center has collaborated directly with a newsroom as part of its field research.

Tow’s Sensor Newsroom research, led by Fergus Pitt, has had several strands. This past February, the Tow Center taught the ProPublica team the fundamental physics of remote sensing and key concepts of temporal and spatial resolution, and ran practical exercises analyzing ground moisture levels. This was followed by the release of the report “Sensors and Journalism,” which covers everything from the legal dimensions of using sensors to case studies of newsrooms across the country. This summer, Columbia was the first J-school in the country to offer a course, the Sensor Newsroom, to teach students how to use sensor technology to enrich their reporting and storytelling abilities.

Accompanying “Losing Ground” is a case study, “ProPublica, Satellites, and the Shrinking Louisiana Coast,” by Pitt, detailing the technical process behind the production of the piece.

 

Louisiana

 

Since Hurricane Katrina, the city of New Orleans and the bayous of Louisiana have become part of the American cultural imagination. The HBO series Treme depicts the lives of characters in New Orleans in the immediate aftermath of the storm. The film Beasts of the Southern Wild is about a young girl in a fictional bayou community (affectionately called “the Bathtub”) that is almost wiped out by a storm and subsequent salt-water erosion. The novel Salvage the Bones by Jesmyn Ward chronicles one family’s struggle during the ten days preceding Hurricane Katrina.

There is one constant in the lives of these characters: the persistent fear of flooding, of erasure.

In the closing chapter of Salvage the Bones, the protagonist reflects, “She left us a dark Gulf and salt-burned land. She left us to learn to crawl. She left us to salvage. Katrina is the mother we will remember until the next mother with large, merciless hands, committed to blood, comes.”

Yet nothing captures the reality of this fear as well as “Losing Ground.” As you hover over maps showing the land loss between 1930 and 2010, the effect is chilling. This is no longer a potentiality, but rather an inevitability. In audio clips, residents describe hometowns that have already been swallowed up by the Gulf of Mexico.

“ProPublica’s remote sensing work is a great example of how reporters have all these powerful new tools at their disposal. The Sensors + Journalism report we just released shows how they’re used by other top newsrooms and gives readers an overview of how to think about them,” says Pitt.

Emily Bell, director of the Tow Center, says, “Through the Tow-Knight research projects we have been building a type of research methodology into digital journalism which investigates emerging practices and technologies and encourages collaboration with newsrooms. Ultimately we want to be a place where academics, practitioners and students can learn together through actual application of new techniques and then share the findings with the broader journalism community.”

An upcoming research project by Tow will introduce virtual reality technology to newsrooms in New York in order to create vivid experiences of current affairs.

Read more about the editorial decisions and satellite imagery in “Losing Ground” in a Q&A with Scott Klein and Al Shaw of ProPublica. 


Interested in participating in or proposing research relating to emerging technology, newsrooms, or new media? Send an e-mail to TowCenter@Columbia.edu.


Upcoming Events


All-Class Lecture: The New Global Journalism

Tuesday, Sep. 30, 2014, 6:00pm

(Lecture Hall)

Based on a new report from the Tow Center, a panel discussion on how digital technology and social media have changed the work of journalists covering international events. #CJSACL

Panelists include report co-authors: 

Ahmed Al Omran, Saudi Arabia correspondent at The Wall Street Journal

Burcu Baykurt, Ph.D. candidate in Communications at Columbia Journalism School

Jessie Graham, Senior Multimedia Producer at Human Rights Watch

Kelly Golnoush Niknejad, Editor-in-Chief at Tehran Bureau

The program will be moderated by Dean of Academic Affairs Sheila Coronel.

Event begins at 6 PM

RSVP is requested at JSchoolRSVP@Columbia.edu


Upcoming Tow Event: Just Between You and Me?


Just between you and me?

(Pulitzer Hall – 3rd Floor Lecture Hall)

In the wake of the Snowden disclosures, digital privacy has become more than just a hot topic, especially for journalists. Join us for a conversation about surveillance, security and the ways in which “protecting your source” means something different today than it did just a few years ago. And, if you want to learn some practical, hands-on digital security skills—including tools and techniques relevant to all journalists, not just investigative reporters on the national security beat—stick around to find out what the Tow Center research fellows have in store for the semester.

The event will be held at 6 p.m. on Monday, August 25th in the 3rd Floor Lecture Hall of Pulitzer Hall. We welcome and encourage all interested students, faculty and staff to attend.


Digital Security and Source Protection For Journalists: Research by Susan McGregor


EXECUTIVE SUMMARY

The law and technologies that govern the functioning of today’s digital communication systems have dramatically affected journalists’ ability to protect their sources.  This paper offers an overview of how these legal and technical systems developed, and how their intersection exposes all digital communications – not just those of journalists and their sources – to scrutiny. Strategies for reducing this exposure are explored, along with recommendations for individuals and organizations about how to address this pervasive issue.

 

DOWNLOAD THE PDF


 

 

 



Order a (bound) printed copy.

Comments, questions & contributions are welcome on the version-controlled text, available as a GitBook here:

http://susanemcg.gitbooks.io/digital-security-for-journalists/

DIGITAL SECURITY AND SOURCE PROTECTION FOR JOURNALISTS

Preamble

Digital Security for Journalists: A 21st Century Imperative

The Law: Security and Privacy in Context

The Technology: Understanding the Infrastructure of Digital Communications

The Strategies: Understanding the Infrastructure of Digital Communications

Looking Ahead

Footnotes

 


Tow Center Launches Amateur Footage: A Global Study of User-Generated Content in TV and Online News Output


Crediting is rare, there’s a huge gulf between how senior managers and newsdesks talk about it, and there’s a significant reliance on news agencies for discovery and verification. These are some of the key takeaways of Amateur Footage: A Global Study of User-Generated Content in TV and Online News Output, published today by the Tow Center for Digital Journalism.

 

The aim of this research project was to provide the first comprehensive report about the use of user-generated content (UGC) among broadcast news channels. UGC is defined, for this report, as photographs and videos captured by people unrelated to the newsroom who would not describe themselves as professional journalists.

 

Some of the Principal Findings are:

  • UGC is used by news organizations daily and can produce stories that otherwise would not, or could not, be told. However, it is often used only when other imagery is not available. 40% of UGC on television was related to Syria.
  • There is a significant reliance on news agencies in terms of discovering and verifying UGC. The news agencies have different practices and standards in terms of how they work with UGC.
  • News organizations are poor at acknowledging when they are using UGC and worse at crediting the individuals responsible for capturing it. Our data showed that 72 percent of UGC was not labeled or described as UGC, and just 16 percent of UGC on TV had an onscreen credit.
  • News managers are often unaware of the complexities involved in the everyday work of discovering, verifying, and clearing rights for UGC. Consequently, staff in many newsrooms do not receive the training and support required to develop these skills.
  • Vicarious trauma is a real issue for journalists working with UGC every day – and it’s different from traditional newsroom trauma. Some newsrooms are aware of this – but many have no structured approach or policy in place to deal with it.
  • There is a fear among rights managers in newsrooms that a legal case could seriously impact the use of UGC by news organizations in the future.

 

This research was designed to answer two key questions.  First, when and how is UGC used by broadcast news organizations, on air as well as online?  Second, does the integration of UGC into output cause any particular issues for news organizations? What are those issues and how do newsrooms handle them?

 

The work was completed in two phases. The first involved an in-depth, quantitative content analysis examining when and how eight international news broadcasters use UGC.  1,164 hours of TV output and 2,254 Web pages were analyzed here. The second was entirely qualitative and saw the team interview 64 news managers, editors, and journalists from 38 news organizations based in 24 countries across five continents. This report takes both phases to provide a detailed overview of the key findings.

 

The research provides the first concrete figures we have about the level of reliance on UGC by international news channels. It also explores six key issues that newsrooms face in terms of UGC. The report is designed around those six issues, meaning you can dip into any one particular issue:

1) Workflow – how is UGC discovered and verified? Do newsrooms do this themselves, and if so, which desk is responsible? Or is UGC ‘outsourced’ to news agencies?

2) Verification – are there systematic processes for verifying UGC? Is there a threshold that has to be reached before a piece of content can be used?

3) Permissions – how do newsrooms seek permissions? Do newsrooms understand the copyright implications around UGC?

4) Crediting – do newsrooms credit UGC?

5) Labeling – are newsrooms transparent about the types of UGC that they use in terms of who uploaded the UGC and whether they have a specific agenda?

6) Ethics and Responsibilities – how do newsrooms consider their responsibilities to uploaders, the audience and their own staff?

 

The full report can be viewed here.


The Data Journalist’s Eye, An Introduction


This is the first of a series of essays by Jonathan Stray to help data journalists see the processes they must master. Anybody can read a graph — that’s the point of publishing it — but journalists who are committed to producing great work need to look deeper. Stray’s essays will guide journalists towards answering four crucial questions: Where did the data come from? How was it analyzed? What is a reader going to believe when they see the story, and what can they do with that knowledge? Although data has just recently exploded into every corner of society, data journalism draws from ideas and traditions that go back hundreds of years. 

This is a graph of the U.S. unemployment rate over the last ten years. There is a whole world just beneath the surface of this image, an intricate web of people and ideas and actions.

The US unemployment rate from 2004 to 2014

It’s clear that a lot of people lost their jobs after the 2008 financial crash. You can read this chart and say how many: the unemployment rate went up by about five percentage points, and some 15 million people were out of work. This is a very ordinary, very reasonable way of talking about this data, exactly the sort of thing that should pop into your head when you see this image. The data journalist needs to look deeper.

 

What is this little squiggly line, where did it come from, and why do we think it’s so important? This trace is not the detached, professional abstraction it appears to be. There was much craft in its making; the crash was real enough but the graph is an invention. Yet it represents something very real, if you’re someone looking for work. Graphs like this can tell us what is happening, as a society, and suggest what to do. Journalists use charts like this all the time to understand and illustrate stories.

 

How does this work? Why do we invest this collection of numbers with such authority, and should we?

The journalist who works with data has to answer these questions. The broadest answers come from an intellectual tradition that predates the written word, beginning with the concept of counting. Yet each story demands specific answers, and you can’t do good data journalism without answering: why this chart and this data? How is an unemployment chart any better, or different, than walking into the street and asking people about their post-crash lives? Is data really any use at all, here?

This simple chart is not as simple as it seems, when you look closely.

My goal is to help you do good data work. But you can’t prove that a piece of data journalism is correct in the same way that you can prove a mathematical theorem is correct. Journalism starts and ends in the world of human experience, and math is just one part in the middle. Still it seems that some journalism uses data better than others, and there are many things you could do with data that are widely recognized as mistaken or deceptive. Within these constraints there is endless space for creation.

The principles of data work go back to the dawn of civilization: the very first writing systems were used for accounting, long before they were sophisticated enough to express language.[i] At that time the rules of addition must have seemed incredibly arcane (in base 60, at first!) and it must have been a powerful trick to be able to tell in advance how many stones you would need for a building. There is no doubt that numbers, like words, are a type of practical magic. But you already know how to count. I want to talk about some of the main ideas that were developed during the Enlightenment, then massively refined and expanded in the 20th century, with modern statistics and computers.

And so I’ve been collecting pieces, trying to understand what I can take from other fields, hoping to use data thoughtfully and effectively in my journalism work. I suspect that what I have left to learn is a lot more than what I can say now. But I’ve come to appreciate certain ideas, cherished principles from other traditions.

I’ve tried to organize the things that can be said about using data in journalism into four parts: quantification, analysis, communication, and action. These are roughly the stages of any data-based story. I don’t think anyone will be surprised to hear that data journalism includes analysis and communication. But I don’t find that nearly enough. A data story necessarily stretches back through time to the creation of the data, and hopefully it also stretches forward to the point where someone finds it helpful in deciding what to do.

Data journalism begins with quantification, and questions about quantification. Data is not something that exists in nature. Unemployed people are a very different thing than unemployment data! What is the process that turns one into the other? To put it another way: what is counted and how?

Who is unemployed? There are at least six different ways that the U.S. government counts, which give rise to data sets labeled U1 to U6.[ii] The official unemployment rate – it’s officially called the “official” rate — is known as U3. But U3 does not count people who gave up looking for a job, as U4 does, or people who hold part time jobs because they can’t get a full time job, as U6 does.
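
To make the difference concrete, here is a minimal sketch of how the choice of measure moves the headline number. The component counts below are hypothetical round figures, and the formulas follow the U-3, U-4, and U-6 definitions in the BLS table cited in the endnotes; a real calculation would start from the published survey data.

```python
# A minimal sketch: how the choice of measure changes "the" unemployment rate.
# The counts are hypothetical round numbers (in thousands); the formulas follow
# the BLS definitions of U-3, U-4, and U-6 in Table A-15 (see endnote [ii]).

unemployed = 9_600            # actively looked for work in the last four weeks
discouraged = 700             # stopped looking, believing no jobs are available
marginally_attached = 2_100   # want a job and searched in the last year (includes discouraged)
part_time_economic = 7_200    # working part time only because full-time work is unavailable
labor_force = 156_000         # employed plus unemployed

u3 = unemployed / labor_force
u4 = (unemployed + discouraged) / (labor_force + discouraged)
u6 = (unemployed + marginally_attached + part_time_economic) / (labor_force + marginally_attached)

for name, rate in [("U-3 (official)", u3), ("U-4", u4), ("U-6", u6)]:
    print(f"{name}: {rate:.1%}")
```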

And this says nothing of how these statistics are actually tabulated. No one goes around to ask every single American about their employment status every single month. The official numbers are not “raw” counts but must be derived from other data in a vast and sophisticated ongoing estimation process. Unemployment figures, being estimates, have statistical estimation error, far more error than is generally realized. This makes most stories about short-term increases or decreases irrelevant.[iii]
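
To get a feel for why month-to-month wiggles can be phantoms, here is a toy simulation with made-up numbers and a far simpler design than the real survey: it draws two independent samples from a population whose true rate never changes and measures how large the apparent “change” is anyway.

```python
# A toy illustration of sampling error, not the actual survey methodology:
# sample twice from a population whose true unemployment rate never moves,
# and see how big the phantom month-over-month "change" can be.
import random

random.seed(1)
TRUE_RATE = 0.062      # assume the true rate is 6.2% and does not change
SAMPLE_SIZE = 60_000   # a stand-in for the survey's sample size

def estimated_rate():
    """One month's estimate: the unemployment share in a fresh random sample."""
    hits = sum(random.random() < TRUE_RATE for _ in range(SAMPLE_SIZE))
    return hits / SAMPLE_SIZE

changes = sorted(abs(estimated_rate() - estimated_rate()) for _ in range(200))
print(f"median phantom month-over-month change: {changes[100]:.2%}")
print(f"95th-percentile phantom change: {changes[190]:.2%}")
```

Even this crude version regularly produces apparent swings on the order of a tenth of a percentage point, the scale of many month-over-month headlines.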

There is some complex relationship between the idea conveyed by the words “unemployment rate” and the process that produces a particular set of numbers.

Normally all of this is backstage, hidden behind the chart. This is so for all data. Data is created; it is a record, a document, an artifact, dripping with meaning and circumstance. Something specific happened in the creation of every item of data, every digit and bit of stored information. A machine recorded a number at some point on some medium, or a particular human on a particular day made a judgment that some aspect of the world was this and not that, and marked a 0 or a 1. Even before that, someone had to decide that some sort of information was worth recording, had to conceive of the categories and meanings and ways of measurement, and had to set up the whole apparatus of data production.

Data production is an astonishing process involving humans, machines, ideas, and reality. It is social, physical,  and particular. I’m going to call this whole process “quantification,” a word which I’ll use to include everything from the conception of quantities all the way through to their concrete measurement and recording.

If quantification turns the world into data, analysis turns data into knowledge. Here is where data journalism comes closest to reproducible science, and leans heavily on math, statistics and logic. There are rules here, and we want those rules: it is hard to forgive arithmetic errors or a reporter’s confused causality. Journalists have a duty to get this sort of thing right, so data journalism demands deep and specific technical knowledge.

Suppose you want to know if the unemployment rate is affected by, say, tax policy. You might compare the unemployment rates of countries with different tax rates. The logic here is sound, but a simple comparison is wrong. A great many things can and do affect the unemployment rate, so it’s difficult to isolate just the effect of taxes. Even so, there are statistical techniques that can help you guess at what the unemployment rate would have been if all factors other than tax policy were the same between countries. We’re now talking about imaginary worlds, derived from the real through force of logic. That’s a tricky thing, not always possible, and not always defensible even when formally possible. Fortunately we have hundreds of years of guidance to help us.
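
As a sketch of the simplest version of “holding other factors equal,” the snippet below compares high-tax and low-tax countries only within groups that are otherwise similar, a technique known as stratification. The figures and the grouping variable are invented purely to show the mechanics; a real analysis would use richer methods and real data.

```python
# A minimal sketch of one way to "hold other factors equal": compare high-tax
# and low-tax countries only within groups that are otherwise similar.
# All numbers are invented purely to show the mechanics of stratification.
from statistics import mean

countries = [
    # (tax level, economy type, unemployment rate)
    ("high", "export-heavy", 6.1), ("low", "export-heavy", 5.8),
    ("low",  "export-heavy", 6.0), ("low", "export-heavy", 5.9),
    ("high", "domestic",     8.9), ("high", "domestic",    9.2),
    ("high", "domestic",     9.0), ("low",  "domestic",    8.6),
]

# Naive comparison: pool everything and compare the two averages.
naive = {t: mean(r for tax, _, r in countries if tax == t) for t in ("high", "low")}

# Stratified comparison: take the high-vs-low gap inside each group, then average.
gaps = []
for group in {g for _, g, _ in countries}:
    high = mean(r for t, g, r in countries if t == "high" and g == group)
    low = mean(r for t, g, r in countries if t == "low" and g == group)
    gaps.append(high - low)

print(f"naive gap:      {naive['high'] - naive['low']:+.2f} points")
print(f"stratified gap: {mean(gaps):+.2f} points")
```

With these invented numbers the naive gap is about 1.7 points while the stratified gap is closer to 0.3, which is exactly the kind of confounding the paragraph describes.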

Journalists are not economists, of course. They’re not really specialists of any kind, if journalism is all they have studied and practiced. We already have economists, epidemiologists, criminologists, climatologists, on and on. But a data journalist needs to understand the methods of any field they touch or they will be unable to tell good work from bad. They won’t know which analyses are worth repeating. Even worse, they will not understand which data matters, or how. And increasingly journalists are attempting their own analyses, when they discover that the knowledge they want does not yet exist. There is no avoiding quantitative methods.

Many people recoil at the idea of studying statistics. Perhaps they had bad experiences with math in school. I find this a little sad, though I don’t blame you if you feel this way. Statistics in particular is often taught badly, using an outdated curriculum that is neither sensitive to the needs of the non-specialist nor particularly attuned to the wide availability of computing.[iv] We can do better. This isn’t a statistics course, but I’ll try to point out the specific ideas that are most relevant to data work in journalism. And they are such beautiful ideas!

The best way to learn quantitative methods is to get your hands dirty taking the machines apart. To do good data journalism work, or even to recognize good data journalism work, you need the grime of statistical method under your fingernails. That only comes from practice, but I can point to a few fundamentals, big ideas like distributions, models, causation and prediction. All of this knowledge is standard stuff, part of our shared heritage, but it can be remarkably difficult to find a description of how it all fits together.

The result of all of this work is something presented to the world, an act of communication. This is required of journalism. It’s one of the things that makes journalism different from research or scholarship or intelligence or science, or any field that produces knowledge but doesn’t feel the compulsion to shout it from the rooftops.

Communication always depends on the audience. The journalist doesn’t publish their story into a vacuum, but into human minds and human societies. A story includes an unemployment chart because it is a better way of communicating changes in the unemployment rate than a table of numbers. And that is true because human eyes and brains process visual information in a certain way. Your visual system is attuned to the orientation of lines, which allows you to perceive trends without conscious effort. What a marvelous inborn ability!

Communication starts with the senses and moves ever deeper into consciousness. We know quite a lot about how minds work with data. Raw numbers are difficult to interpret without comparisons, which leads to all sorts of normalization formulas. Variation tends to get collapsed into stereotypes, and uncertainty tends to be ignored as we look for patterns and simplifications. Risk is personal and subjective, but there are sensible ways to compare and communicate odds.

But more than these technical concerns is the question of what is being said about whom. Journalism is supposed to reflect society back to itself for the benefit of us all, but who is the “we” in the data? Certain people are excluded from any count, and the astonishing variation of life is necessarily abstracted away into a useful fiction of uniformity. A vast social media data set seems like it ought to tell us deep truths about society, but it cannot say anything about the people who do not post, or the things they don’t post about. It does not speak for all. The unemployment rate reduces each voice to a single interchangeable bit: are you looking for work, yes/no?

An act of data journalism is a representation of reality that relies on stereotypes to fill in the lives behind the numbers. By stereotypes I mean our everyday understanding of people we’ve never met and situations we’ve always been lucky enough to avoid. Regardless of whether our image of “unemployed person” is positive or negative, we have to draw on this image to bring meaning to the idea of an unemployment rate. What the audience understands when they look at the data depends on what they already believe. Data can demolish or reinforce stereotypes.  So it is not enough for data to be presented “accurately.” We have to ask what the recipient will end up believing about the world, and about the people represented by the data. Often, data is best communicated by connecting it to specific human stories that bring life and depth to the numbers.

We’re not quite done. I adore curiosity, and learning for the pleasure of knowing. But that’s not enough for journalism, which is supposed to inform an active democracy. Knowing the unemployment rate is interesting. Much better is knowing that a specific plan would plausibly create jobs. This is the type of knowledge that allows us to shape our future.

What good is journalism that never touches action? Action is not only essential, it is a powerfully clarifying perspective. Asking what someone could want to do is a question that will ripple through all the stages of your work, if you let it.

Data cannot tell us what to do, but it can sometimes tell us about consequences. The 20th century saw great advances in our understanding of causality and prediction. Prediction is the queen of knowledge; it is knowledge of the future. Prediction can give us instrumental knowledge: the knowledge of how to bring the world from the way it is now to the way we want it to be.

But prediction is very hard. Most things can’t be predicted well, for fundamental reasons such as lack of data, intrinsic randomness, free will, or chaos theory. There are profound limits to what we can know about the future. Yet where prediction is possible, there is convincing evidence that data is essential. Purely qualitative methods, no matter how sophisticated, just don’t seem to be as accurate. (The best methods are mixed.) Data is an irreplaceable resource for journalism that asks what will happen, what should be done, or how best to do it.

Predictions hide everywhere in data work. A claim to generalization is also a claim to prediction, and prediction is also one of the very best ways we have of validating our knowledge. This is the logic of testing and “falsification” in the scientific method. There might be many stories that match the data we have now, but only true stories — stories that match the world — can match the data that is yet to exist.
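
A rough sketch of that logic in miniature: fit a simple trend to the first part of a made-up monthly series, then ask how well it predicts the months it never saw. The series and the linear “story” are stand-ins; the point is only that held-out data is what tests a generalization.

```python
# A minimal sketch of validating a "story" by prediction: fit a straight-line
# trend to the first twelve months of an invented series, then measure how far
# off it is on the six held-out months it never saw.
from statistics import mean

series = [7.9, 7.7, 7.6, 7.5, 7.5, 7.3, 7.2, 7.2, 7.0, 6.9, 6.7, 6.7,
          6.6, 6.7, 6.3, 6.2, 6.1, 6.2]
train, test = series[:12], series[12:]

# Least-squares line through the training months.
xs = list(range(len(train)))
slope = (sum(x * y for x, y in zip(xs, train)) - len(xs) * mean(xs) * mean(train)) \
        / (sum(x * x for x in xs) - len(xs) * mean(xs) ** 2)
intercept = mean(train) - slope * mean(xs)

# How far off is the trend on the months it never saw?
errors = [abs((intercept + slope * (len(train) + i)) - actual)
          for i, actual in enumerate(test)]
print(f"fitted trend: {slope:+.3f} points per month")
print(f"mean out-of-sample error: {mean(errors):.2f} points")
```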

But don’t believe for a second that all we need to do is run the equations forward and read off what to do. We’ve seen that broken dream before. At an individual level, the ancient desire for universal quantification can be a harmless fantasy, even an inspiration for the creation of new and powerful abstractions. At a societal level, utopian technocratic visions have been uniformly disastrous. A fully quantified social order is an insult to freedom, and there are good reasons to suspect that such a system can never really work.[v] Questions of action can hone and refine our data journalism, but actual action — making a choice and doing — requires practical knowledge, wisdom, and creativity. The use of data in journalism, like the use of data in society, will always involve artistry.

Quantification produces data from the world. Analysis finds a story in the data. Communication is where the story leaps to the mind of the audience. The audience acts on the world.

The Data Journalism Cycle: Quantification, Analysis, Communication, Action

 

All of this is implicit in every use of data in journalism. All of it is just below the surface of an unemployment chart in the news, to say nothing of the complex visualizations that journalists now create routinely. Data journalism depends on what we have decided to count, the techniques used to interpret those counts, how we have decided to show the results, and what happens after we do. And then the world changes, and we report again. The data journalist sees this rich web of people, ideas, and action behind every number.

Endnotes

[i] Denise Schmandt-Besserat, “Tokens and Writing: The Cognitive Development,” SCRIPTA 1 (2009): 145–154. http://sites.utexas.edu/dsb/files/2014/01/TokensWriting_the_Cognitive_Development.pdf

[ii] Table A-15. Alternative measures of labor underutilization. http://www.bls.gov/news.release/empsit.t15.htm.

[iii] A nice visualization of how the error in unemployment rates can lead to incorrect interpretations is How Not To be Misled by the Jobs Report. New York Times, 2014-5-1. http://www.nytimes.com/2014/05/02/upshot/how-not-to-be-misled-by-the-jobs-report.html

[iv] George Cobb. The Introductory Statistics Course: a Ptolemaic Curriculum. http://escholarship.org/uc/item/6hb3k0nz.

[v] See, for example, James C. Scott, Seeing Like a State (Yale University Press, 1998).

Announcements

Tow Center at SXSW 2015


The Tow Center has entered three panel ideas for the upcoming SXSW Interactive 2015, which will take place between March 13 and March 17 in Austin, Texas. Since 2007, 30% of the decision-making process about which panels are chosen has been left up to the public. Read about our panels below and vote here!


Privacy and Publication: Open Information Ethics

Digital Misdirection: Hiding Online

Virtual Reality Journalism

 

Digital Misdirection: Hiding Online

The data thrown off by our digital devices and communications provides a disturbingly intimate and accurate portrait of our interests and relationships to everyone from our internet and email providers to online advertisers. While encryption technologies and the like can stem the flow of some of this information, this session will outline some creative ways that you can fool the system without feeling like you need an advanced degree in technology.

Privacy and Publication: Open Information Ethics

Information made public through police departments, lawsuits, and FOIA requests is an essential component of public interest and accountability reporting. At the same time, intersecting datasets can quickly de-anonymize ordinary people, compromising their privacy and even putting them at risk. From publishing the addresses of gun permit holders in Westchester to online mugshot records to the Snowden revelations and the Right to be Forgotten, debates over the meaning of privacy and the consequences of publishing have raised the stakes for journalists trying to responsibly work with sensitive sources and publish controversial stories. Alice Brennan, Fidel Martinez and Susan McGregor will discuss real-world journalism situations as well as strategies applicable to anyone looking to be more privacy-conscious in their online life.

See more on SXSW’s website

 

Virtual Reality Journalism

While the potential for virtual reality has long been theorized, the launch of the developer kit for the Oculus Rift virtual reality headset has brought the technology into the mainstream. While most of the content development to date has occurred in the fields of gaming and film, there is tremendous potential to use virtual reality for a new form of journalism: immersing users in events that they would otherwise not have access to, such as war, hardship, and faraway places. Highlighting the multiple immersive journalism projects built by de la Peña and exploring the virtual reality journalism pilot project between the Tow Center, Vice News and the Secret Location, this panel will bring together leading technologists, academics, and journalists at the cutting edge of this new field to ask whether virtual reality can affect how users empathize with and better understand international events.

See more on SXSW’s website

Don’t forget to vote!

Research

Diversity in the Robot Reporter Newsroom



The Associated Press recently announced a big new hire: A robot reporter from Automated Insights (AI) would be employed to write up to 4,400 earnings report stories per quarter. Last year, that same automated writing software produced over 300 million stories — that’s some serious scale from a single algorithmic entity.

So what happens to media diversity in the face of massive automated content production platforms like the one Automated Insights created? Despite the fact that we’ve done pretty abysmally at incorporating a balance of minority and gender perspectives in the news media, I think we’d all like to believe that by including diverse perspectives in the reporting and editing of news we fly closer to the truth. A silver lining to the newspaper industry crash has been a profusion of smaller, more nimble media outlets, allowing for far more variability and diversity in the ideas that we’re exposed to.

Of course software has biases. Although the basic anatomy of robot journalists is comparable, there are variations within and among different systems, such as the style and tone they produce and the editorial criteria coded into them. Algorithms are the product of a range of human choices, including the criteria, parameters, and training data that can pass along inherited, systematic biases. So while a robot reporter offers the promise of scale (and of reducing costs), we need to be wary of over-reliance on any one automated system. For the sake of media diversity, the one bot needs to fork itself and become 100,000.

We saw this unfold in microcosm over the last week. The @wikiparliament bot was launched in the UK to monitor edits to Wikipedia from IP addresses within parliament (a form of transparency and accountability for who was editing what). Within days it had been mimicked by the @congressedits bot, which was set up to monitor the U.S. Congress. What was particularly interesting about @congressedits, though, is that it was open-sourced by creator Ed Summers. That allowed the bot to quickly spread and be adapted for different jurisdictions like Australia, Canada, France, Sweden, Chile, Germany, and even Russia.
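
The mechanics are simple enough to sketch. Wikipedia attributes anonymous edits to an IP address, so the core of such a bot is checking each incoming edit against a list of institutional address ranges. The snippet below is not Ed Summers’s code; the ranges and sample edits are placeholders, and a real bot would read from a live feed of edits and post its findings.

```python
# A minimal sketch of the idea behind bots like @congressedits: check whether
# the IP address attached to an anonymous edit falls inside a watched range.
# The ranges below are documentation-only placeholders, not real addresses.
import ipaddress

WATCHED_RANGES = {
    "Example Parliament": [ipaddress.ip_network("192.0.2.0/24")],
    "Example Congress":   [ipaddress.ip_network("198.51.100.0/24")],
}

def match_institution(ip_string):
    """Return the institution whose range contains this IP, if any."""
    ip = ipaddress.ip_address(ip_string)
    for institution, networks in WATCHED_RANGES.items():
        if any(ip in network for network in networks):
            return institution
    return None

# In a real bot these would come from a live stream of anonymous edits.
sample_edits = [
    {"title": "Some article", "ip": "192.0.2.45"},
    {"title": "Another article", "ip": "203.0.113.9"},
]
for edit in sample_edits:
    institution = match_institution(edit["ip"])
    if institution:
        print(f'"{edit["title"]}" was edited anonymously from {institution}')
```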

Tailoring a bot for different countries is just one (relatively simple) form of adaptation, but I think diversifying bots for different editorial perspectives could similarly benefit from a platform. I would propose that we need to build an open-source news bot architecture that different news and journalistic organizations could use as a scaffolding to encode their own editorial intents, newsworthiness criteria, parameters, data sets, ranking algorithms, cultures, and souls into. By creating a flexible platform as an underlying starting point, the automated media ecology could adapt and diversify faster and into new domains or applications.
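
No such platform exists, so the sketch below is speculative: a base class that fixes the pipeline while leaving the editorial judgment (what to ingest, what counts as newsworthy, how to phrase it) to each newsroom that subclasses it. The QuakeBot example is hypothetical and its data is hard-coded for illustration.

```python
# A speculative sketch of a shared news-bot scaffolding: the pipeline is fixed,
# but the editorial choices are abstract methods each newsroom must supply.
# Nothing here corresponds to an existing platform.
from abc import ABC, abstractmethod

class NewsBot(ABC):
    @abstractmethod
    def ingest(self):
        """Fetch raw records from whatever source this newsroom cares about."""

    @abstractmethod
    def is_newsworthy(self, record) -> bool:
        """Encode this newsroom's own newsworthiness criteria."""

    @abstractmethod
    def write(self, record) -> str:
        """Turn one newsworthy record into publishable text."""

    def run(self):
        for record in self.ingest():
            if self.is_newsworthy(record):
                print(self.write(record))

class QuakeBot(NewsBot):
    """A hypothetical subclass: report earthquakes above a local threshold."""
    def ingest(self):
        return [{"place": "offshore", "magnitude": 3.1},
                {"place": "near the city", "magnitude": 5.4}]

    def is_newsworthy(self, record):
        return record["magnitude"] >= 4.5

    def write(self, record):
        return f"A magnitude {record['magnitude']} earthquake struck {record['place']}."

QuakeBot().run()
```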

Such a platform would also enable the expansion of bots oriented towards different journalistic tasks. A lot of the news and information bots you find on social media these days are parrots of various ilks: they aggregate content on a particular topical niche, like @BadBluePrep, @FintechBot and @CelebNewsBot or for a geographical area like @North_GA, or they simply retweet other accounts based on some trigger words. Some of the more sophisticated bots do look at data feeds to generate novel insights, like @treasuryio or @mediagalleries, but there’s so much more that could be done if we had a flexible bot platform.

For instance we might consider building bots that act as information collectors and solicitors, moving away from pure content production to content acquisition. This isn’t so far off, really. Researchers at IBM have been working on this for a couple of years and have already built a prototype system that “automatically identifies and ask[s] targeted strangers on Twitter for desired information.” The technology is oriented towards collecting accurate and up-to-date information from specific situations where crowd information may be valuable. It’s relatively easy to imagine an automated news bot being launched after a major news event to identify and solicit information, facts, or photos from people most likely nearby or involved in the event. In a related project the same group at IBM has been developing technology to identify people on Twitter who are more likely to propagate (read: retweet) information relating to public safety news alerts. Essentially they grease the gears of social dissemination by identifying just the right people, for a given topic and at a particular time, who are most likely to further share the information.

There are tons of applications for news bots just waiting for journalists to build them: fact-checking, information gathering, network bridging, audience development, and so on. Robot journalists don’t just have to be reporters. They can be editors, or even (hush) work on the business side.

What I think we don’t want to end up with is the Facebook or Google of robot reporting: “one algorithm to rule them all”. It’s great that the Associated Press is exploring the use of these technologies to scale up their content creation, but down the line when the use of writing algorithms extends far beyond earnings reports, utilizing only one platform may ultimately lead to homogenization and frustrate attempts to build a diverse media sphere. Instead the world that we need to actively create is one where there are thousands of artisanal news bots serving communities and variegated audiences, each crafted to fit a particular context and perhaps with a unique editorial intent. Having an open source platform would help enable that, and offer possibilities to plug in and explore a host of new applications for bots as well.

Research

The Anatomy of a Robot Journalist


Given that an entire afternoon was dedicated to a “Robot Journalism Bootcamp” at the Global Editors Network Summit this week, it’s probably safe to say that automated journalism has finally gone mainstream — hey it’s only taken close to 40 years since the first story writing algorithm was created at Yale. But there are still lots of ethical questions and debates that we need to sort out, from source transparency to corrections policies for bots. Part of that hinges on exactly how these auto-writing algorithms work: What are their limitations and how might we design them to be more value-sensitive to journalism?

Despite the proprietary nature of most robot journalists, the great thing about patents is that they’re public. And patents have been granted to several major players in the robo-journalism space already, including Narrative Science, Automated Insights, and Yseop, making their algorithms just a little bit less opaque in terms of how they operate. More patents are in the pipeline from both heavyweights like CBS Interactive and start-ups like Fantasy Journalist. So how does a robo-writer from Narrative Science really work?

Every robot journalist first needs to ingest a bunch of data. Data rich domains like weather were some of the first to have practical natural language generation systems. Now we’re seeing a lot of robot journalism applied to sports and finance — domains where the data can be standardized and made fairly clean. The development of sensor journalism may provide entirely new troves of data for producing automated stories. Key here is having clean and comprehensive data, so if you’re working in a domain that’s still stuck with PDFs or sparse access, the robots haven’t gotten there yet.

After data is read in by the algorithm, the next step is to compute interesting or newsworthy features from the data. Basically the algorithm is trying to figure out the most critical aspects of an event, like a sports game. It has newsworthiness criteria built into its statistics. So for example, it looks for surprising statistical deviations like minimums, maximums, or outliers, big swings and changes in a value, violations of an expectation, a threshold being crossed, or a substantial change in a predictive model. “Any feature the value of which deviates significantly from prior expectation, whether the source of that expectation is due to a local computation or from an external source, is interesting by virtue of that deviation from expectation,” the Narrative Science patent reads. So for a baseball game the algorithm computes “win probability” after every play. If win probability has a big delta between two plays, it probably means something important just happened, and the algorithm puts that on a list of events that might be worthy of inclusion in the final story.
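
A toy version of that “deviation from expectation” test, with an invented win-probability series and an arbitrary threshold (the patent describes far more elaborate criteria):

```python
# Walk through a game's (invented) win-probability series and flag any play
# where the swing crosses a threshold, producing candidate events for the story.
win_probability = [0.50, 0.52, 0.51, 0.78, 0.75, 0.74, 0.41, 0.43, 0.90, 1.00]
THRESHOLD = 0.20   # how big a swing must be before it counts as "interesting"

candidate_events = []
for play, (before, after) in enumerate(zip(win_probability, win_probability[1:]), start=1):
    delta = after - before
    if abs(delta) >= THRESHOLD:
        candidate_events.append({"play": play, "delta": delta})

for event in candidate_events:
    direction = "toward" if event["delta"] > 0 else "away from"
    print(f"play {event['play']}: win probability swung {abs(event['delta']):.0%} {direction} the home team")
```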

Once some interesting features have been identified, angles are then selected from a pre-authored library. Angles are explanatory or narrative structures that provide coherence to the overall story. Basically they are patterns of events, circumstances, entities, and their features. An angle for a sports story might be “back-and-forth horserace”, “heroic individual performance”, “strong team effort”, or “came out of a slump”. Certain angles are triggered according to the presence of certain derived features (from the previous step). Each angle is given an importance value from 1 to 10 which is then used to rank that angle against all of the other proposed angles.
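
In code, angle selection can be sketched as a small library of pre-authored angles, each with a trigger condition over the derived features and an importance score; triggered angles are ranked and the winner frames the story. The angles, features, and scores below are invented for illustration.

```python
# A sketch of angle selection and ranking over derived features (all invented).
features = {"lead_changes": 5, "max_single_player_points": 41, "final_margin": 2}

ANGLE_LIBRARY = [
    {"name": "back-and-forth horserace",
     "importance": 8,
     "triggered": lambda f: f["lead_changes"] >= 4 and f["final_margin"] <= 5},
    {"name": "heroic individual performance",
     "importance": 7,
     "triggered": lambda f: f["max_single_player_points"] >= 40},
    {"name": "blowout",
     "importance": 5,
     "triggered": lambda f: f["final_margin"] >= 20},
]

triggered = [angle for angle in ANGLE_LIBRARY if angle["triggered"](features)]
ranked = sorted(triggered, key=lambda angle: angle["importance"], reverse=True)
print("angles, in order:", [angle["name"] for angle in ranked])
# -> ['back-and-forth horserace', 'heroic individual performance']
```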

Once the angles have been determined and ordered they are linked to specific story points, which connect back to individual pieces of data like names of players or specific numeric values like score. Story points can also be chosen and prioritized to account for personal interests such as home team players. These points can then be augmented with additional factual content drawn from internet databases such as where a player is from, or a quote or picture of them.

The last step the robot journalist takes is natural language generation, which for the Narrative Science system is done by recursively traversing all of the angle and story point representations and using phrasal generation routines to generate and splice together the actual English text. This is probably by far the most straightforward aspect of the entire pipeline — it’s pretty much just fancy templates.
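
A minimal sketch of that last step: splice the story points into phrasal templates keyed by angle and join the results. Real systems are far richer, with recursive structures and grammatical machinery, but the basic move is the same; the templates and story points here are made up.

```python
# Phrasal templates keyed by angle, filled in with story points (all invented).
story_points = {"winner": "Springfield", "loser": "Shelbyville",
                "winner_score": 78, "loser_score": 76,
                "star": "A. Player", "star_points": 41}

TEMPLATES = {
    "back-and-forth horserace":
        "{winner} edged {loser} {winner_score}-{loser_score} in a game that "
        "saw the lead change hands again and again.",
    "heroic individual performance":
        "{star} poured in {star_points} points to carry {winner} past {loser}.",
}

def generate(angles, points):
    """Render the ranked angles, in order, and splice them into one story."""
    return " ".join(TEMPLATES[angle].format(**points) for angle in angles if angle in TEMPLATES)

print(generate(["back-and-forth horserace", "heroic individual performance"], story_points))
```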

So, there you have it, the pipeline for a robot journalist: (1) ingest data, (2) compute newsworthy aspects of the data, (3) identify relevant angles and prioritize them, (4) link angles to story points, and (5) generate the output text.

Obviously there can be variations on this basic pipeline as well. Automated Insights, for example, uses randomization to provide variability in its output stories and incorporates a more sophisticated use of narrative tone: depending on the desired tone, different text might be generated to sound apathetic, confident, pessimistic, or enthusiastic. Yseop, on the other hand, augments templates with metadata so that they’re more flexible, allowing a template, for instance, to conjugate verbs depending on the data being used. A post-generation analyzer (you might call it a robot editor) from Yseop further improves the style of the written text by looking for repeated words and substituting synonyms or alternate words.
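
A toy rendering of those two ideas, which is emphatically not either vendor’s code: pick phrasing by desired tone, then run a crude “robot editor” pass that swaps later occurrences of a repeated word for synonyms.

```python
# Tone-dependent phrasing plus a crude post-generation synonym pass (a sketch).
import re

TONE_PHRASES = {
    "enthusiastic": "surged an impressive {pct}%",
    "pessimistic":  "managed to climb only {pct}%",
    "apathetic":    "changed by {pct}%",
}

SYNONYMS = {"rose": ["climbed", "increased"]}

def robot_edit(text):
    """If a tracked word repeats, swap its later occurrences for synonyms."""
    for word, alternates in SYNONYMS.items():
        parts = re.split(rf"\b{word}\b", text)
        if len(parts) > 2:                      # the word appeared at least twice
            rebuilt = parts[0] + word           # keep the first occurrence as-is
            for i, chunk in enumerate(parts[1:-1]):
                rebuilt += chunk + alternates[i % len(alternates)]
            rebuilt += parts[-1]
            text = rebuilt
    return text

print("Acme shares " + TONE_PHRASES["enthusiastic"].format(pct=4.2) + " after earnings.")
print(robot_edit("Revenue rose 4% as costs rose 1% and shipments rose 2%."))
```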

From my reading, I’d have to say that the Narrative Science patent seems to be the most informed by journalism. It stresses the notion of newsworthiness and editorial in crafting a narrative. But that’s not to say that the stylistic innovations from Automated Insights, and template flexibility of YSeop aren’t important. What still seems to be lacking though is a broader sense of newsworthiness besides “deviance” in these algorithms. Harcup and O’Neill identified 10 modern newsworthiness values, each of which we might make an attempt at mimicking in code: reference to the power elite, reference to celebrities, entertainment, surprise, bad news, good news, magnitude (i.e. significance to a large number of people), cultural relevance to audience, follow-up, and newspaper agenda. How might robot journalists evolve when they have a fuller palette of editorial intents available to them?