Susan E. McGregor

Innovation Showcase

Innovation Showcase / NewsBeast Labs


NewsBeast Labs – Data Journalism at Newsweek & The Daily Beast

Since June, NewsBeast Labs has been developing interactive data projects on topics such as the 2012 election, access to abortion services in America, and the national gun debate. Our projects use everything from traditional computer-assisted reporting techniques and web scraping to new techniques in interactive mapping and presenting readers’ stories as part of the narrative.

When we come up with a project topic, we design for two speeds: What can we do today, and what could we do if we had a whole month? For instance, following the Newtown shooting, we did a quick-turnaround interactive asking readers to complete the sentences “I own a gun because…” or “I don’t own a gun because…” to see what they had to say about gun ownership in the wake of the shooting. As a longer project, we created a database of all 530+ members of Congress and categorized their stances on gun control based on recent statements and voting records. We created a tool, This Is Your Rep On Guns, that lets readers find their representatives, see their recent statements, and contact them. We also created a Twitter bot, @YourRepsOnGuns, that retweets representatives when they discuss guns. The bot keeps the project alive and lets the topic become part of readers’ daily feeds.
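The core of a bot like this is a filter that decides which tweets qualify. Below is a minimal Python sketch of such a filter. It assumes the bot already has each tweet's author handle and text. The keyword list and function names are hypothetical. This is not the code behind @YourRepsOnGuns, and it leaves out the Twitter API calls for fetching and retweeting.

```python
import re

# Hypothetical keyword list; the terms the real bot matched on are not documented here.
GUN_TERMS = [
    "gun", "guns", "firearm", "firearms", "nra",
    "second amendment", "2nd amendment",
    "background check", "background checks",
    "assault weapon", "assault weapons",
]

# One case-insensitive pattern; \b word boundaries keep "gun" from matching "begun".
GUN_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in GUN_TERMS) + r")\b",
    re.IGNORECASE,
)


def mentions_guns(text: str) -> bool:
    """Return True if the tweet text contains any gun-related term."""
    return GUN_PATTERN.search(text) is not None


def tweets_to_retweet(tweets, rep_handles):
    """Keep (handle, text) pairs posted by tracked representatives that mention guns."""
    tracked = {handle.lower() for handle in rep_handles}
    return [
        (handle, text)
        for handle, text in tweets
        if handle.lower() in tracked and mentions_guns(text)
    ]
```

The word boundaries are the main design choice. A plain substring test would also flag tweets containing words such as "begun" or "Burgundy".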

On a visual level, we also design for two speeds: Some readers will spend two seconds with the story and some will spend two hours — we design the visuals to convey the story instantly but also include all the complexity of the issue in data that interested readers can explore. You can read more about our projects at http://newsbeastlabs.tumblr.com or get in touch at michael.keller@thedailybeast.com


INNOVATION SHOWCASE 2013 • MAY 18TH, 12PM – 4PM • COLUMBIA JOURNALISM SCHOOL

Innovation Showcase

Innovation Showcase 2013 / Shadi Bushra


The Road to Nimule: The Story of South Sudan’s First Paved Highway

Shadi Bushra / @shadibushra

Decades of civil war left South Sudan with no infrastructure to speak of. When the war ended in 2005, the international community, led by the United States, took it upon itself to rebuild the country. For six years, USAID constructed a road leading from the capital, Juba, to the border town of Nimule, along the frontier with Uganda. The road was meant to link South Sudan with its East African neighbors, offering a new trade route and reducing dependence on the rest of Sudan. The road has done that, but the accidents and crime associated with it have irked local communities. Local leaders assert that the companies that built the road deceived them about its width (and therefore its safety). There have also been incidents of mob violence to punish drivers who speed or endanger civilians. As a whole, the road embodies both sides of rapid development: modernity and possible economic prosperity, but also the disruption that rapid change brings at the local level. The project is a collection of photos and video that speak to this theme.


Announcements, CU Community

A Brief History of Computer-Assisted Reporting


It’s been more than a year since The New York Times declared this The Age of Big Data, but for most Americans, the news really hit home on election night, 2012.

Nate Silver’s uncannily accurate predictions about how the presidential race would turn out made him one of the most talked about people of the campaign, even in media circles, where the journalistic merit of Silver’s statistically driven work was vigorously debated.

Yet Silver’s work is arguably less revolution than evolution, one facet of a journalistic practice that has actually been around for decades, even if, like Silver, it only recently made it into the mainstream.

“We started out with this a long time ago—before the Web, before even reasonably simple computers,” says Sarah Cohen, editor of the computer-assisted reporting (CAR) team at The New York Times. As early as the late 1960s, journalists like Philip Meyer and Elliott Jaspin were using social science methods and data analysis—sometimes with the help of mainframe computers—to generate and test their journalistic hypotheses. “That was how a generation of us learned what [computer-assisted reporting] was,” says Cohen.

CAR is a practice that, while producing powerful results (see the Pulitzers of Jaspin, Meyer, Dedman, and others), for many years existed only at the margins of most newsrooms, the domain of a few motivated reporters. For much of that time, the methods of CAR hewed closely to those described in Meyer’s seminal book, Precision Journalism, and the tools remained fairly constant: spreadsheets, database software, and, eventually, online resources. Likewise, the end product was the same as for any other news story: a printed text article.

In recent years, however, a slew of new terms have filtered into journalists’ vocabularies and job titles, like data journalism, computational journalism, news apps, and data visualization. To the uninitiated, what these descriptors mean—much less how they differ—may seem inscrutable. Yet even to insiders, their intersections and boundaries are often hard to resolve, and somewhere behind the semantics hovers a difficult question: Are these just new methods for executing the old jobs of journalism, or are they a fundamentally new philosophy of what journalism can be?

“In terms of terminology, I think it can be both misleading and enlightening,” says Troy Thibodeaux, the editor for newsroom innovation at the Associated Press. “It’s a very strange thing, because we’re all doing very closely related work.”

But what is it?

Perhaps the first step in discussing these practices is to distinguish between process and product. News apps and data visualization generally describe a class of publishing formats, usually a combination of graphics (interactive or otherwise) and reader-accessible databases. Because these end products are typically driven by relatively substantial data sets, their development often shares processes with CAR, data journalism, and computational journalism. In theory, at least, the latter group is format agnostic, more concerned with the mechanisms of reporting than the form of the output.

“CAR reporters are good at getting records,” says Reg Chua, data editor at Thomson Reuters. “A lot of CAR is data journalism; it’s interrogating data. Computational journalism represents a new step in what you can do—use of computers, and the processing power of computers and programming, to do types of reporting that were unimaginable even a few years ago.”

Harnessing that computational power, however, has meant bringing new practitioners into the field, and their ideas come from outside the typical CAR tradition.

“Now there’s this whole other path of people who were developers who have very different perspectives,” says Thibodeaux.

At this year’s National Institute of Computer-Assisted Reporting (NICAR) conference in early March, Thibodeaux created and moderated a panel called “From CAR to newsapps and back again,” composed of two-person teams that have collaborated to produce some of the most influential work in digital journalism.

While on the whole the tone of the panel was mutually complimentary, Sarah Cohen conceded that some journalists still tend to trivialize the visual aspects of journalism.

“There are still some editors, though they are fewer and fewer, who really just think of graphics and interactive as just the candy,” rather than a legitimate news format, she says.

For their part, however, the developers present seemed to welcome a move towards more story-driven news apps and visualizations.

“News apps are now edited, which is fairly new,” said Derek Willis, interactive developer at The New York Times. “I think we now hope to treat the editing process as seriously as you do with any story, including asking, ‘Does this work? Does it deserve to stay up?’ I think this has been the growth in the domain.”

Yet while many news app developers will agree that news apps need story, they also assert that journalism needs news apps, which Thibodeaux says do “the thing that a story can’t do, which is let you drill down.”

Rather than focusing only on individual, moment-in-time accounts, Chua says, journalistic publishing needs to include work that is both more focused and more incremental. “The real example of this is Homicide Watch: It updates in essentially real time, and you can drop in anytime and see what the trends are.” This sort of in-progress publishing, Chua believes, is essential, “if we want to get all the value of all the reporting we do every day, and also better serve these communities.”

What’s next?

Whether or not they agree on the need to diversify the way news is published, CAR reporters, data editors, and news app developers alike see new technologies changing the way that journalism is both conceptualized and executed.

One sign of this was the strong impression that Jeff Larson and Chase Davis’s NICAR presentation, “Practical machine learning: Tips, tricks and real-world examples for using machine learning in the newsroom,” made on many attendees.

“I’m pretty conservative on this stuff,” says Thibodeaux. “Source reporting leads to the best data reporting.” But after Larson and Davis’s presentation, he says, he can see how “the techniques start to act like sources. The tools let us ask questions that we couldn’t even conceive of before.”

Likewise, Cohen sees significant opportunities in algorithmic document analysis. “Our ability to make sense of messy original records has been revolutionized,” she says.

Whether the broader use of data science tools in journalism will increase the acceptance of work like Silver’s remains to be seen, but his methods are more likely to be embraced than abandoned. If nothing else, the economic advantages of offloading more work to machines are hard to ignore:

“We don’t have the financial wherewithal to waste the kind of time we waste,” says Cohen. “If we spend a week doing document analysis that could be done by an algorithm, then we deserve to be replaced by machines.”

“We need to reserve the work for things that take human creativity and human insight.”

Tips & Tutorials

Google Fusion Tables Map Manager


A tool for creating, managing, exploring, and publishing Google Fusion Tables maps that have one or more layers.

Past Events

Tow Center Mapathon! 2012


Despite a looming Hurricane Sandy, the Tow Center’s Mapathon 2012! with MapBox and CartoDB produced great results, with student teams, mentors, and a handful of outside participants collaborating to map and visualize data ranging from West Nile outbreaks to rat populations in the NYC area. Very special thanks to Dave Johnson (@djindc) and Andrew Hill (@andrewxhill) for their invaluable instruction and tireless assistance to our teams.

Tips & Tutorials

#stealthis: Security Essentials for Journalists


The following is an abridged version of “Steal this Digital Security Toolbox”, which I presented at the 2012 Online News Association conference. Please note that while the below is only a brief overview of security techniques, there are two major takeaways. (1) Encrypt your devices. (2) Set yourself up with a Virtual Private Network (VPN) to protect your wireless activity. Get in the habit; these practices are important no matter where or with whom you are working. The CUJ community can learn more on October 18, when Jeff Sieben, Columbia Journalism School’s head of IT, will be leading a Tow Tea on digital security.

Past Events

Cyber Scholars and Doing Data Journalism


In March the Tow Center will sponsor three events related to major issues in digital journalism.

March 27th – CyberScholar Working Group Forum
On March 27th, we will host this month’s CyberScholar Working Group, a forum for fellows and affiliates of MIT, the Yale Law School Information Society Project, Columbia University, and the Berkman Center for Internet & Society at Harvard University to discuss their ongoing research. This month’s gathering will take place at Columbia University’s Graduate School of Journalism on March 27th, 6–9 p.m., in Room 107B. Presentations will include:

Harris Chen, Harvard: “The Future Criminal Investigation in the Digital Age”
Shlomit Yanisky-Ravid, Yale: “Traditional Knowledge – Culture Expression and Access to Knowledge: The Open Questions”
David Thaw, Yale: “Comparing Management-Based Regulation and Prescriptive Legislation: How to Improve Information Security Through Regulation”
John Kelly, Columbia/Harvard: “Analyzing Russian Social Media”

Details and RSVP: http://cwgmar2012.eventbrite.com/
Contact: Kate Fink, kaf2155@columbia.edu

 

March 28th – Doing Data Journalism: It’s Not Just Numbers

Data journalism is quickly becoming one of the hottest topics in the industry – but what exactly is it, and what tools, teams and techniques are necessary for doing it well?

On March 28th, the Tow Center for Digital Journalism will host several of data journalism’s most prominent innovators and practitioners for a discussion about the possibilities and pitfalls of this evolving field. We hope you will join us at Columbia Journalism School from 6 to 7:30 p.m. to hear their perspectives and take part in the discussion.

Panelists include:

Julia Angwin, tech editor for The Wall Street Journal
Jo Craven McGinty, projects editor for Computer Assisted Reporting at The New York Times
Scott Klein, editor of News Applications at ProPublica
Aron Pilhofer, editor of Interactive News at The New York Times
Ashkan Soltani, technical consultant and privacy expert
Mo Tamman, award-winning data journalist at Reuters

The event is free and open to the public.

Details and RSVP: http://cujtowdoingdatajournalism.eventbrite.com/
Contact: Susan E. McGregor, sem2196@columbia.edu

 

Finally, on March 6 the Tow Center partnered with the New York World to sponsor a unique panel discussion about government accountability and transparency in the age of digital records. The event brought together panelists from government and non-profit sectors to help elucidate the goals and challenges of open government initiatives. The lively discussion addressed how legislative wording can have significant impact on the types of records that are made publicly available, and what the opportunities are for further development in this area.

Panelists included:

Philip Ashlock, OpenPlans
Andrew Hoppin, New Amsterdam Ideas and former Chief Information Officer, New York State Senate
Amy Ngai, Sunlight Foundation
New York City Council Member Gale Brewer (invited)
Michael Powell, The New York Times
Moderated by Alex Howard, Government 2.0 correspondent, O’Reilly Radar

Past Events

Tow Center/ScraperWiki Datacamp


On February 3rd & 4th, the Tow Center will be hosting a two-day “DataCamp” along with ScraperWiki, a recent Knight News Challenge winner and innovative platform for collecting and accessing online data.

The focus of the DataCamp is “New York accountability,” and much of the activity will revolve around cleaning up, publishing, and generating stories with hard-to-access New York data sets. In addition to speakers and workshop sessions, the event is designed to provide a space for students, journalists, designers and programmers to explore and develop interesting data-driven journalism with help and support from both the Tow Center and ScraperWiki.

There are three “streams” of participation for the event: one for journalists interested in developing data-driven stories, another for (Ruby or Python) programmers looking to flex their “scraping” skills, and a third for developers interested in learning the finer points of the ScraperWiki platform.

For more details about the three streams and to sign up for the event, please visit the ScraperWiki events page. We recommend signing up soon – the event is free, but space is limited.

We are also interested in hearing from anyone who would like to volunteer to facilitate any or all of these streams during the event. For details, please contact sem2196@columbia.edu.

Tips & Tutorials

Embedding Google Visualizations in WordPress and Tumblr


The Google Code Playground is a good place to play around and experiment with many of Google’s Visualization widgets. You can even sign in with your Google account at the top right and save your work as you go (once logged in, click the small dropdown arrow just to the right of the “Edit HTML” button to save).

Once your visualization is complete, however, you’ll want to be able to publish it. Both WordPress and Tumblr can display these visualizations – all that’s required is updating a few plugins and settings first.

To embed a visualization in WordPress


By default, WordPress will essentially erase any complex code that is simply pasted into a regular post; if it didn’t, a little bit of bad code would easily break a whole blog. Instead, WordPress requires that any custom code be put into a “Custom Field” container, which separates and protects the rest of the page from whatever code it contains. In most basic WordPress installs, Custom Fields aren’t activated by default – first, you must choose “Screen Options” at the top right of the dashboard page, and check off the “Custom Fields” option. You’ll then see a Custom Fields area appear below your post on the edit page. To add your visualization, click “Enter new” and type a name (no spaces or special characters) for your visualization in the Name input box; in the Value area paste the code that the playground generates once you’ve hit the “Edit HTML” button. To save it, click the “Add Custom Field” button.

Now that your field is created, you need to actually insert it into your post. Though Custom Fields were originally intended for widgets that would appear on every post, their usefulness for adding arbitrary code has made them popular for embedding unique content in a single post as well, such as Google Visualizations and other multimedia. The easiest way to do this is by using a so-called “shortcode”, an HTML-like tag that works specifically in WordPress. To use one, you’ll first need to install and activate the Custom Fields Shortcode plugin. Then, to insert your visualization in a post, switch to HTML editing mode (as opposed to “Visual”) and use the code:

[cf]Your_Visualization_Custom_Field_Name[/cf]
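Conceptually, the shortcode is a placeholder that is swapped for the stored field value when the post is displayed. The Python sketch below models that substitution under stated assumptions. It is not the plugin's source code, and the field name and value are hypothetical. Rendering an unknown name as an empty string is a choice made for this sketch only.

```python
import re

# Matches [cf]name[/cf]; names are limited to letters, digits and underscores,
# in line with the advice to avoid spaces and special characters.
SHORTCODE = re.compile(r"\[cf\](\w+)\[/cf\]")


def render_post(body: str, custom_fields: dict) -> str:
    """Replace each [cf]name[/cf] with its stored value (empty string if the name is unknown)."""
    return SHORTCODE.sub(lambda match: custom_fields.get(match.group(1), ""), body)


fields = {"rat_map": '<div id="rat_map"></div>'}  # hypothetical field name and value
print(render_post("Sightings by borough: [cf]rat_map[/cf]", fields))
```

This also shows why the field name must match exactly. A shortcode whose name has no stored field simply renders nothing.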


To embed a visualization in Tumblr

Embedding a Google Visualization in Tumblr is even easier than adding it in WordPress, though overall it can be less flexible. The first step is to switch your editing mode to HTML.

On your dashboard, click on the gear-shaped “Preferences” icon at top right. On the third option – “Edit posts using” – select the “plain text/HTML” option. To add your visualization to a post, simply paste the code that the Google Visualization Playground generates once you’ve hit the “Edit HTML” button wherever you want the visualization to appear.

Tips & Tutorials

Google Fusion Tables Basics: Uploading, Editing, Visualizing


Google Fusion Tables (now incorporated into Google Docs as “tables”) is a tool that allows quick previews and publishing of interactive data-driven visualizations, including maps, charts, and graphs. Once you have selected your data set, you simply need to edit it in your favorite spreadsheet program, upload it to Google Docs as a new “Table”, and then choose the visualization you want to preview from a dropdown menu. You can even refine and manipulate exactly what data a given visualization displays and, to a limited extent, customize its look and feel. If you decide you want to publish it live on the web, Fusion Tables will also generate the code for you.

Step 1: Choose your data source


The interface for tables is similar to that of a spreadsheet program, so your data should be in a spreadsheet-compatible format, such as .xls(x), .csv, or tab-delimited .txt.
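If your source is tab-delimited, you can also convert it to .csv before uploading. Here is a minimal sketch using Python's standard csv module, assuming UTF-8 text. The file names in the commented usage lines are hypothetical.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-delimited text to comma-separated text.

    csv.writer quotes any field that itself contains a comma, so values like
    "Smith, J." survive the conversion intact.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Usage (hypothetical file names):
# with open("data.txt", encoding="utf-8", newline="") as src:
#     converted = tsv_to_csv(src.read())
# with open("data.csv", "w", encoding="utf-8", newline="") as dst:
#     dst.write(converted)
```

Using the csv module rather than a plain text replacement of tabs with commas is what keeps embedded commas from splitting a value into two columns.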

Step 2: Upload and edit metadata

Log in to your Gmail account and go to the “Documents” tab. Select Create >> Table (beta). In the dialog box that appears, click Browse and select your data file. If your data is in a .txt file, you may need to change the “Separator character” to Tab before clicking Next.

The next dialog offers a preview of your data as it will appear in the table; it will assume that column titles are in the first row, though this can be changed in the dropdown. By default, all columns and rows will be imported, but columns can be removed by unchecking the box at the top. Once you’ve made your selections, click Next.
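You can also make the same two choices before uploading: treat the first row as column titles, and drop the columns you don't need. Here is a minimal Python sketch. The column names in the example are hypothetical.

```python
import csv
import io


def select_columns(csv_text: str, keep: list) -> list:
    """Read rows using the first line as column titles, keeping only the listed columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in keep if name not in header]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    return [{name: row[name] for name in keep} for row in reader]


rows = select_columns("name,age,city\nAnn,30,NYC\nBo,25,LA\n", ["name", "city"])
```

Raising an error for a misspelled column name, rather than silently skipping it, makes typos in the `keep` list obvious.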

The final import screen allows you to edit the metadata for the table: name, attribution, attribution link, and description. Remember: you cannot use or publish data without a source, so fill out this section as thoroughly as possible, especially the attribution and page link. You will save yourself time down the line by recording the relevant metadata, so you don’t have to go hunting for it later.
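One lightweight way to keep that metadata with the data is a small sidecar file. The Python sketch below is one possible approach, not a Fusion Tables feature. Its fields simply mirror the four on the import screen, and the JSON file name you pass to it is your choice.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TableMetadata:
    name: str
    attribution: str
    attribution_link: str
    description: str = ""

    def missing_source_fields(self) -> list:
        """List the source fields that are still blank."""
        return [f for f in ("attribution", "attribution_link") if not getattr(self, f).strip()]


def save_metadata(meta: TableMetadata, path: str) -> None:
    """Write the metadata as JSON next to the data file, refusing if the source is blank."""
    problems = meta.missing_source_fields()
    if problems:
        raise ValueError(f"fill in the source before saving: {problems}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(meta), f, indent=2)
```

Refusing to save without an attribution enforces the rule above: data without a source cannot be used or published.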


Click Finish.


Step 3: Editing the 'schema'

“Schema” is a database term that describes the type of information in each column of a table. In Fusion Tables, columns may contain text, numbers, locations, or date/times. In general, the program makes fairly good guesses automatically about what type of information is in each column, but you can always edit this by choosing Edit >> Modify columns. In the dialog box, you can select each column and use the Type dropdown to modify the data type.
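Here is a simplified Python heuristic that shows what such a guess involves. It labels a column as a number, a date, or text by checking every non-empty value. It is an illustration, not Fusion Tables' actual logic. The accepted date formats are assumptions, and location detection is left out.

```python
from datetime import datetime

# Assumed date formats for this sketch; real data may need more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S")


def _is_number(value: str) -> bool:
    """True if the value parses as a number once thousands separators are removed."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


def _is_date(value: str) -> bool:
    """True if the value matches one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


def guess_column_type(values: list) -> str:
    """Guess 'number', 'date', or 'text' for a column from its non-empty values."""
    filled = [v.strip() for v in values if v.strip()]
    if not filled:
        return "text"
    if all(_is_number(v) for v in filled):
        return "number"
    if all(_is_date(v) for v in filled):
        return "date"
    return "text"
```

Checking every value, rather than only the first few, is a conservative choice. One stray word in a numeric column makes the guess fall back to text, which is one reason to clean data before uploading it.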

Data highlighted in yellow indicates which columns the table assumes to be “location” information (such as street addresses; city, state, or country names; latitude and longitude; or KML). Although multiple “location” columns are allowed, it is recommended that you edit the table to have only one. If you have latitude and longitude in two separate columns, the table will detect this and use both when mapping. For best results with street addresses, keep each full address in a single column.
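If the street, city, state, and ZIP code arrive in separate columns, you can merge them into one address column before uploading. Here is a minimal Python sketch. The column names are hypothetical, and empty parts are skipped so that no stray commas appear.

```python
def combine_address(row: dict, parts=("street", "city", "state", "zip")) -> str:
    """Join the non-empty address parts of one row into a single string."""
    return ", ".join(row[p].strip() for p in parts if row.get(p, "").strip())


def add_address_column(rows: list, name: str = "address") -> list:
    """Return copies of the rows with a combined address column added."""
    return [{**row, name: combine_address(row)} for row in rows]


rows = add_address_column([
    {"street": "123 Main St", "city": "Springfield", "state": "NY", "zip": "10001"},
    {"street": "", "city": "Juba", "state": "", "zip": ""},
])
```

The function returns new dictionaries rather than modifying the originals, so the separate columns are still available if you need them.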


Step 4: Editing the data

Although column names and data points can be edited directly in Fusion Tables, this is not generally recommended. Fusion Tables does not support many common spreadsheet functions, such as “find and replace” or mathematical formulas. In general, information should be uploaded to Fusion Tables only after it has been “cleaned up.”
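Because Fusion Tables lacks find and replace, this cleanup is easiest to do before uploading. Here is a minimal Python sketch that trims stray whitespace and applies exact-match replacements to every cell. The replacement mapping in the example is hypothetical.

```python
import csv
import io


def clean_csv(csv_text: str, replacements: dict) -> str:
    """Trim whitespace in every cell, then apply exact-match find-and-replace."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cleaned = []
        for cell in row:
            cell = cell.strip()
            cleaned.append(replacements.get(cell, cell))
        writer.writerow(cleaned)
    return out.getvalue()


cleaned = clean_csv("borough,count\n Bklyn ,12\nBrooklyn,3\n", {"Bklyn": "Brooklyn"})
```

Matching whole cell values rather than substrings avoids accidental edits, such as changing part of a longer name.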

If you need to make individual edits, simply click on a data cell, wait for it to change to an “input” box, and type in your changes.


Step 5: Visualizing the data

One of the great strengths of Fusion Tables is the ease of visualizing data; it can quickly generate maps, line charts, bar charts and pie charts. Simply choose Visualize >> and select the visualization you want to use.