Research, Tips & Tutorials

Applying data science to all the news that’s fit to print


Over the past year, an important element of my research into data journalism’s past, present and future has been interviews with young journalists like Jeremy Bowers or Dan Hill and (relatively) hoary veteran practitioners like Aron Pilhofer. Their experience has provided invaluable evidence for debugging debates about the topic.

That was certainly the case with Chase Davis, an assistant editor on the Interactive News Desk at the New York Times. I first met Chase in 2011 at the first Newsfoo, in Phoenix, Arizona, where he gave an Ignite talk on three news problems data science can help solve. Davis followed up in 2012 with an Ignite on using data to explore the evolution of data journalism. Both five-minute videos are well worth watching if you’re even remotely interested in journalism and technology. (Davis also open sourced his data journalism Ignite on GitHub, if you’d like to explore it that way.)

Today, Davis teaches an advanced data journalism class at Mizzou, where he helps transfer his skills and perspective (treat data as a source). Our interview, lightly edited for clarity and content, [bracketed] and hyperlinked for context, follows.

What is a day in your life like?

I help supervise the developer/journalists who build many of our cool Web projects. I have a background as a reporter, primarily doing investigations and covering politics, so I try to dabble in that world as well. I also teach a class in advanced data journalism at the Missouri School of Journalism and do some consulting on the side.

How did you get started? Did you get any special degrees or certificates? Quantitative skills?

I got started in data journalism almost by accident. I started learning to program for fun in middle school, then fell in love with journalism and ended up at Mizzou. I lived a typical j-student life for a few years, writing a bunch for the student paper and doing internships, then applied (based on a total misunderstanding) to start working for NICAR. The couple years I spent there really tied those two skillsets together.

Did you have any mentors? Who? What were the most important resources they shared with you?

Too many to list, but I’ll name a few. Jacquee Petchel, Lise Olsen and Mark Katches for schooling me in the ways of capital-J Journalism. Brant Houston and Jeff Porter for taking me in at NICAR and showing me how journalism and data can work together. And, really, the entire IRE and NICAR community, which is outrageously giving of its collective time.

What does your personal data journalism “stack” look like? What tools could you not live without?

I’m pretty minimalist: a terminal window and some type of text editor. The only place I splurge is on a database GUI (I like Navicat). The one tool I couldn’t live without is Python, which is the best Swiss Army knife a data journalist can have.
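
[For readers who haven’t used Python this way, here is a minimal sketch of the kind of quick, throwaway analysis that makes it such a versatile newsroom tool. The file name and column names are invented for illustration; this is not code from Davis or the Times.]

```python
import csv
from collections import defaultdict

totals = defaultdict(float)

# Sum contribution amounts by donor; amounts may look like "$1,200.00".
with open("contributions.csv", newline="") as f:
    for row in csv.DictReader(f):
        amount = float(row["amount"].replace("$", "").replace(",", ""))
        totals[row["donor"].strip().title()] += amount

# Show the ten biggest donors, largest first.
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]
for donor, total in top:
    print("{}: ${:,.2f}".format(donor, total))
```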

What are the foundational skills that someone needs to practice data journalism?

The same core skills you need to practice any kind of journalism: curiosity, skepticism, an eye for detail and a sense of a good story. [They] also [need] numeracy, or at least conceptual mathematical literacy, which is still unfortunately too rare. Also important are databases and spreadsheets, statistics, and some kind of programming language — doesn’t matter which one. Being your own worst critic doesn’t hurt. And intellectual courage. You need to be motivated, not intimidated, to learn new and difficult things.

Where do you turn to keep your skills updated or learn new things?

Personal projects. I always have at least one on the backburner, and I make sure it stretches me in a new direction. Working on something I care about is the best way for me to stay motivated. I get bored learning from books.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

The oversimplified explanation is that most journalism students can’t code or do math, while most computer science students don’t know storytelling.

Hybrids on either side are rare, and we’re scooping them up as fast as we can.

Journalism schools could be doing more, but it’s not all their fault. It takes intellectual agility and natural curiosity to effectively develop hybrid skills. I don’t think that’s something we can teach solely through curriculum. That’s why I don’t think every journalism student should “learn how to code.” Being able to write a few lines of JavaScript is great, but if you let your skills dead-end with that, you’re not going to be a great newsroom developer.

Folks on our interactive and graphics teams at the Times have remarkably diverse backgrounds: journalism and computer science, sure, but also cartography, art history, and no college degree at all. What makes them great is that they have an instinct to self-teach and explore.

That’s what journalism schools can encourage: introduce data journalism in the curriculum, then provide a venue for students to tinker and explore. Ideally, someone on faculty should know enough to guide them. The school should show an interest in data journalism work on par with more traditional storytelling.

Oh, and they should require more math classes.

What data journalism project are you the most proud of working on or creating?

Hard question, but I’ll offer up pretty much anything that my old team at the Center for Investigative Reporting has done. That was my first turn at being a boss, and the fact that they haven’t all been fired suggests that I didn’t mess them up too bad.

What data journalism project created by someone else do you most admire?

Look at the Philip Meyer Awards every year and you pretty much have that answer. Anyone who can take a spreadsheet full of rows and columns, or a bunch of code, and turn it into something that changes (or starts) the conversation about an important topic is the whole reason many of us got into this game in the first place.

How has the environment for doing this kind of work changed in the past five years?

It’s night and day. Five years ago, this kind of thing was still seen in a lot of places at best as a curiosity, and at worst as something threatening or frivolous. Some newsrooms got it, but most data journalists I knew still had to beg, borrow and steal for simple things like access to servers.

Solid programming practices were unheard of — version control? What’s that? If newsroom developers today saw Matt Waite’s code when he first launched PolitiFact, their faces would melt like “Raiders of the Lost Ark.”

Now, our team at the Times runs dozens of servers. Being able to code is table stakes. Reporters are talking about machine-frickin’-learning, and newsroom devs are inventing pieces of software that power huge chunks of the web. The game done changed.

What’s different about practicing data journalism today, versus 10 years ago?

It was actually 10 years ago that I first got into data journalism, which makes me feel old even though I’m not.

Back then, data journalism was mostly seen as doing analyses for stories. Great stories, for sure, but interactives and data visualizations were more rare.

Now, data journalism is much more of a Big Tent speciality. Data journalists report and write, craft interactives and visualizations, develop storytelling platforms, run predictive models, build open source software, and much, much more. The pace has really picked up, which is why self-teaching is so important.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

I don’t think the semantics are important. Journalism is journalism. It should be defined on its own merits, not by the tools we use to accomplish it. Treating these things as exotic specialties makes it too easy to pigeonhole the people who practice them. And I hate that.

What’s the one thing people always get wrong when they talk about data journalism?

That data journalists are unicorns.

Or wizards. Or that they can somehow pull swords from stones in a way that mere laypeople can’t. That kind of attitude is dangerous — not because it mythologizes tech skills, or demonstrates willful ignorance on the part of technophobes (both of which are sad), but because it drives a cultural wedge between data journalists and the rest of the newsroom.

[Imagine hearing] “I’m a conventional reporter, so my specialty is reporting. You’re a tech person, so you write code.”

I think that’s crap. I know plenty of reporters who can code, and plenty of data journalists who can report the hell out of a good story. By dividing them culturally, we almost let people see the “journalist” in “data journalist” as secondary. We turn them into specialists, rather than letting them bring journalism and technology together in new and creative ways.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

Numeracy is important. A more universal appreciation of technology in our industry is important. A culture of rapid, constant experimentation is important. To the extent that data journalism has encouraged those things in newsrooms, I think it’s been hugely important.

The actual product of data journalism — news apps, visualizations, stories — those will all continue to evolve, but data journalism’s continuing contribution to newsroom culture is something that I hope is permanent.

How It's Made, Research, Tips & Tutorials

Oakland Police Beat applies data-driven investigative journalism in California


One of the explicit connections I’ve made over the years lies between data-driven investigative journalism and government or corporate accountability. In debugging the backlash to data journalism, I highlighted the work of The Los Angeles Times Data Desk, which has analyzed government performance data for accountability, among other notable projects. I could also have pointed to the Chicago Sun-Times, which applied data-driven investigative methods to determine  that the City of Chicago’s 911 dispatch times vary widely depending on where you live, publishing an interactive map online for context, or to a Pulitzer Prize-winning story on speeding cops in Florida.


This week, there’s a new experiment in applying data journalism  to local government accountability in Oakland, California, where the Oakland Police Beat has gone online. The nonprofit website, which is part of Oakland Local and The Center for Media Change and funded by The Ethics and Excellence in Journalism Foundation and The Fund for Investigative Journalism, was co-founded by Susan Mernit and Abraham Hyatt, the former managing editor of ReadWrite. (Disclosure: Hyatt edited my posts there.)

Oakland Police Beat is squarely aimed at shining sunlight on the practices of Oakland’s law enforcement officers. Their first story out of the gate pulled no punches, finding that Oakland’s most decorated officers were responsible for a high number of brutality lawsuits and shootings.

The site also demonstrated two important practices that deserve to become standard in data journalism: explaining the methodology behind their analysis, including source notes, and (eventually) publishing the data behind the investigation. 

To learn more about why Oakland Police Beat did that, how they’ve approached their work and what the long game is, I contacted Hyatt. Our interview follows, lightly edited and hyperlinked for context. Any [bracketed] comments are my own.

So, what exactly did you launch? What’s the goal?

Hyatt: We launched a news site and a database with 25 years worth of data about individual Oakland Police Department (OPD) officers who have been involved in shootings and misconduct lawsuits.

Oakland journalists usually focus (and rightfully so) on the city’s violent crime rate and the latest problems with the OPD. We started this project by asking if we could create a comprehensive picture of the officers with the most violent behavior, which is why the OPD is where it is today. We started requesting records and tracking down information. That eventually became the database. It’s the first time anyone in Oakland has created a resource like this.

What makes this “data-driven journalism?”

Hyatt: We started with the data and let it guide the course of the entire project. The stories we’ve written all came from the data.

Why is sharing the data behind the work important?

Hyatt: Sharing is critical. Sharing, not traffic, is the metric I’m using to gauge our success, although traffic certainly is fun to watch, too. That’s the main reason that we’re allowing people to download all of our data. (The settlement database will be available for download next week.)

How will journalists, activists, and data nerds use it over time? That’s going to be the indicator of how important this work was.

[Like ProPublica, Oakland Police Beat is encouraging reuse. The site says that "You’re welcome to republish our stories and use our data for free. We publish our stories under an Attribution-NonCommercial-ShareAlike 4.0 License."]

Where do you get the data?

Hyatt: All of it came from city and court documents. Some of it came as .CSV files, some as PDFs that we had to scrape.

How much time and effort did it take to ingest, clean, structure and present?

Hyatt: Almost all of the court docs had to be human-read. It was a laborious process of digging to find officer names and what the allegations were. Combining city settlement data records and court docs took close to five months. Then, we discovered that the city’s data had flaws and that took another couple of months to resolve.

Some of the data was surprisingly easy to get. I didn’t expect the City Attorney’s office to be so forthcoming with information. Other stuff was surprisingly difficult. The OPD refused to give us awards data before 2007. They claim that they didn’t keep that data on individual officers before then. I know that’s completely false, but we’re a tiny project. We don’t have the resources to take them to court over it.

Did you pay for it?

Hyatt: We used PACER a ton. The bill was close to $900 by the time we were done. Our tools were very simple: we mainly worked out of spreadsheets. I had a handful of command-line tools that I used to clean and process data. I ran a virtual machine so that I could use some Linux-based tools as well. I heart Open Refine. We experimented with using Git for version control on stories we were writing.
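
[To make the kind of cleanup Hyatt describes a little more concrete, here is a hedged sketch of one step: normalizing officer names so the same person can be matched across settlement and court records. The file names, column names and normalization rules are hypothetical, not Oakland Police Beat’s actual code, and the real work involved months of hand checking.]

```python
import csv
import re

def normalize(name):
    """Uppercase, drop punctuation and rank abbreviations, collapse spaces."""
    name = re.sub(r"[.,]", " ", name.upper())
    name = re.sub(r"\b(OFC|OFFICER|SGT|SERGEANT|LT)\b", " ", name)
    return re.sub(r"\s+", " ", name).strip()

# Group rows from both sources under a normalized officer-name key.
officers = {}
for path in ("settlements.csv", "court_cases.csv"):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            row["source"] = path
            officers.setdefault(normalize(row["officer_name"]), []).append(row)

# Officers who appear in more than one record are candidates for a merged entry.
for name, rows in sorted(officers.items()):
    if len(rows) > 1:
        print(name, "-", len(rows), "records")
```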

[Photo: A used chemical agent grenade found on the streets in downtown Oakland following Occupy demonstrations in 2011. Photo by Eric K Arnold.]

Will you be publishing data and methodology as you go along?

Hyatt: The methodology post covers all of our stories. We’ll continue to publish stories, as well as some data sets that we got along the way that we decided not to put into our main dataset, like several hundred city attorney reports about the settled cases.

What’s the funding or revenue model for the site? Where will this be in one year? Or 5?

Hyatt: Everyone wants grant-funded journalism startups to be sustainable, but, so often, they start strong and then peter out when resources run dry.

Instead of following that model, I knew from the start that this was going to be a phased project. We had some great grants that got us started, but I didn’t know what the funding picture was going to look like once we started running stories. So, I tried to turn that limitation into a strength.

We’re publishing eight weeks worth of stories and data. We’re going to cram as much awesome into those weeks as we can and then, if needed, we can step away and let this project stand on its own.

With that said, we’re already looking for funding for a second phase (which will focus on teens and the OPD). When we get it, we’ll use this current data as a springboard for Phase 2.

Could this approach be extended to other cities?

Hyatt: The OPD and its problems are pretty unique in the USA. This was successful because there was so much stuff to work with in Oakland, but I don’t think our mentality for creating and building this project was unique.

Research, Tips & Tutorials

Profile of the Data Journalist: Dan Hill


Part of my research into data journalism’s past, present and future has been interviews with veteran practitioners like Aron Pilhofer, given the insight that those talks offer for debugging debates about “what it all means,” and younger journalists like Jeremy Bowers or Dan Hill. Their recent paths to the profession should offer insight and inspiration to others who would follow them.


Hill was kind enough to discuss his work with me this spring. Our interview follows, lightly edited for clarity, content and hyperlinked for context.

Where do you work now? What is a day in your life like?

I joined The Texas Tribune as a full-time news apps developer in January. Our team is responsible for both larger-scale “explorer” apps and what I’d call “daily interactives.” My day often involves writing and processing public information requests, designing interactives and working on Django apps, depending on the scale of my project.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you start with?

I’ve always wanted to be a reporter, but the work of Phillip Reese at The Sacramento Bee and The Chicago Tribune’s news apps team inspired me to enhance my storytelling with data. I was a student fellow for the Northwestern University Knight Lab and studied journalism and computer science, but an internship with The Washington Post taught me how to apply what I was learning in a newsroom.

Did you have any mentors? Who? What were the most important resources they shared with you?

I’ve had awesome mentors. Bobby Calvan and Josh Freedom du Lac were the first to treat me like a real reporter. Jon Marshall helped me explore my interests. Phillip Reese showed me how to find untold stories in spreadsheets and Brian Boyer encouraged me to learn Python. Serdar Tumgoren and Jeremy Bowers showed me how a team of news developers operates. Travis Swicegood taught me how to deal with real-world data.
My mentors remind me to always be learning and asking questions.

What does your personal data journalism “stack” look like? What tools could you not live without?

I use Excel, OpenOffice, Google Docs, Django and iPython notebooks for data analysis. R is creeping into my workflow for exploring datasets and experimenting with visualizations. We use d3 and Chart.js for web graphics and Mapbox for web maps. I could probably survive without Backbone, but we use it a lot.
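
[As a rough illustration of the notebook-style analysis Hill mentions, here is a minimal pandas sketch of the sort of exploration you might run in an iPython notebook. The dataset and column names are invented for this example.]

```python
import pandas as pd

# Load a hypothetical log of public-records requests with filing/fulfillment dates.
requests = pd.read_csv("records_requests.csv", parse_dates=["filed", "fulfilled"])

# How long did each agency take to respond?
requests["days_to_fulfill"] = (requests["fulfilled"] - requests["filed"]).dt.days

# Summary statistics per agency: count, mean, min, max, and so on.
print(requests.groupby("agency")["days_to_fulfill"].describe())
```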

What are the foundational skills that someone needs to practice data journalism?

I think a data journalist needs news judgment and attention to detail in order to identify the newsworthiness and limitations of datasets.
Statistics can help explain a dataset’s strengths and weaknesses, so I wish I had paid more attention during my stats classes in school.
In addition to finding the stories, data journalists also need to be able to explain why data is significant to their audience, so visual journalists need design skills — and, of course, reporting and writing.

Where do you turn to keep your skills updated or learn new things?

I check Source, the Northwestern Knight Lab blog and the NICAR listserv for new ideas. Lately, I’ve been teaching myself statistics and R with r-tutor and Machine Learning for Hackers.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

I think the differences between the developer and newsroom cultures make it hard for newsrooms to find people with tech and journalism skills, and to coordinate projects with developers and reporters.
As a student in journalism school, I was inspired to learn more about data when professor Darnell Little showed how it could enhance my reporting and help me find stories hidden in datasets.
I learned more developer-journalist skills like database management and web design from meetups, tutorials and classes outside the j-school, but the journalism school exposed me to what journalists with those skills could do.
I’ll add that I’m impressed with the data literacy of the Texas Tribune newsroom, where reporters request spreadsheets and use data to verify claims on their beats. Even if reporters don’t have the programming chops to make an interactive graphic, for example, they’re great about identifying potential data stories.

What data journalism project are you the most proud of working on or creating?

My summer intern project at The Washington Post, a study of every Washington D.C. homicide case between 2000 and 2011, was my first experience making a news app in a newsroom. I was honored to get to work with the investigative reporters as a newbie intern and learned a ton from building the database and doing analysis with Serdar. All of my contributions were on the backend, but I was thrilled to work with that dataset as an intern.

What data journalism project created by someone else do you most admire?

ProPublica’s Message Machine was my favorite project from the 2012 presidential election, because it took a unique approach to identify trends in email metadata.
I’m excited for more stories that collect everyday metadata or use sensors to explore the data around us.

How has the environment for doing this kind of work changed in the past five years?

I’d never heard of a “news apps team” five years ago. I knew I wanted to be an investigative reporter but never thought I would write code every day. I admired reporters like Phillip Reese who were working with data and making interactive graphics, but I didn’t see as many teams of specialized developer-journalists.

What’s different about practicing data journalism today, versus 10 years ago?

I wasn’t even a teenager 10 years ago, but I would wager… THE INTERNET. Online data portals, open government and open Web stuff are important to the data journalism I do. I’m not sure they were as common a decade ago.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

I think of “data journalism” as an umbrella term that refers to the use of data in reporting or presentation, whereas I think of CAR and computational journalism as subsets of data journalism that involve analyzing a dataset.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

I’m excited to work with data because of its widespread use in decision making. I think news apps can help people understand meaningful data and uphold accountability for people who create and make decisions with data.
(Be A Newsnerd has better answers.)

What’s the one thing people always get wrong when they talk about data journalism?

Although the web plays a big role in the growth of data journalism, I don’t think you need to be online to do data journalism.

Research, Tips & Tutorials

Profile of the Data Journalist: Jeremy Bowers


As in 2012, when I published a series of profiles of data journalists at Radar, I’ve conducted a series of email interviews after the annual conference of the National Institute for Computer-Assisted Reporting (NICAR). They’re part of my research into data journalism’s past, present and future, helping me to debug debates about “what it all means.”

The following interview is with Jeremy Bowers, a news application developer at NPR. (He also knows a lot about how the Internet works.) It has been lightly edited for clarity, content and hyperlinked for context.

Where do you work now? What is a day in your life like?

I work on the Visuals team at NPR.

Our team adheres to modified scrum principles. We have a daily scrum at 10:00am for all of the members of our Visuals team. We work on projects in one-week sprints. At the end of each sprint we have an iteration review and a “ticket shuffle” where we decide what tickets each of us should work on in the next sprint. Our typical projects rarely exceed four sprint cycles.

Our projects involve at least four people: one developer, one designer, one project manager and one stakeholder. Some projects add more designers or more developers as necessary. And sometimes we have a lot of stakeholders.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you start with?

I started in data journalism at the St. Petersburg Times. I’d been working as the blog administrator for our “online” team and was informally recruited by Matt Waite to help out with a project that would turn into MugShots.

I have no special degrees or certificates. I was a political science major and I had planned to go to law school before a mediocre LSAT performance made me rethink my priorities.

I did have a background in server administration and was really familiar with Linux because of a few semesters spent hacking with a good friend in college, so that’s been pretty helpful.

Did you have any mentors? Who? What were the most important resources they shared with you?

Matt Waite from the St. Petersburg Times got me started in data journalism and has been my mentor for as long as I can remember. I don’t call him as much anymore now that he’s “Professor Waite,” but I still use a lot of our conversations as guidelines even now.

I also owe a debt to Derek Willis, though we’ve never worked together. I’m pretty much daily inspired by Ben Welsh and Ken Schwenke at the Los Angeles Times. They build apps that matter and make me look critically at what I am building. Finally, my co-worker Chris Groskopf is a great source of inspiration about keeping my code and work habits clean and professional.

What does your personal data journalism “stack” look like? What tools could you not live without?

I live and die with three tools.

First, a terminal emulator. Right now, I’m using iTerm2 with a customized Solarized theme.

Second, a text editor. Right now, I’m using Sublime Text 3 with a ton of installed packages.

Finally, I need a Web browser. Right now, I’m using Chrome.

At NPR Visuals, we’ve documented our stack so that anyone can code like we do.

What are the foundational skills that someone needs to practice data journalism? Where do you turn to keep your skills updated or learn new things?

I think the foundational skill for data journalism is curiosity. Rabid, all-consuming curiosity has done more for my career than any particular skill or talent.

I also think that good data journalists are passionate about their projects. Unlike many tech or newsroom jobs, it’s difficult to punch a clock and work 9-to-5 as a data journalist. I’m constantly pulling out my laptop and hacking on something, even when it’s not directly tied to a work project.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

The easy answer would be to say that there aren’t enough data journalists to go around. But that’s not exactly true. With Knight and Google fellowships, recent college graduates, interns, and proto-news-nerds miscast in other roles, media companies are surrounded by possibilities. Our challenge, as I see it, is building an environment where hackers and the hacker ethic can thrive. And that’s a tough thing to do at any large company, let alone a media company. But we’ve got to make that our personal mission and not be confounded by what feels like an impersonal bureaucracy.

What data journalism project are you the most proud of working on or creating?

Without a doubt, PolitiFact is the most exciting project I’ve worked on. I also really enjoyed working on the Arrested Development app for NPR — so much so, that I binge-watched the fourth season and coded up the jokes over 24 hours the day the episodes were released!

What data journalism project created by someone else do you most admire?

I love everything about the Los Angeles Times’s data desk. Their homicide and crime apps are nothing short of miraculous.

How has the environment for doing this kind of work changed in the past five years?

I released my first news app to the wild in April of 2009. At that time, there were only a handful of groups that I knew of writing code in newsrooms — the New York Times, the Washington Post, the Chicago Tribune, and Ben Welsh at the Los Angeles Times. About 300 people attended NICAR that year, and it was focused on print CAR reporting.

This year, NICAR hosted 997 people and had well-attended sessions on JavaScript and D3, tools that basically only work on the Web. There are probably 20 teams writing code in newsrooms today, and there are entire college classes dedicated to producing hacker journalists. So the environment has gotten much richer and larger in the last five years.

What’s different about practicing data journalism today, versus 10 years ago?

I can’t speak generally since I only started doing “data journalism” about five years ago. I hesitate to argue that what I’m currently doing is really data journalism as opposed to newsroom product development.

So as not to cheat you out of an answer, I can say that my job now involves much more rigor. When I first started writing code at the St. Petersburg Times, we didn’t use version control. We didn’t sandbox our code. We didn’t have automated deployment tools. These days, we have GitHub to store our code, tons of command-line tricks to keep our code in separate virtual environments, and we have fantastic deployment tools that make updating our code a snap.

Additionally, my organization is much more aware of what I’m doing. My manager and his managers are much more cognizant of data journalism generally and specifically about how our work fits in with the organization’s strategy. When I started, we basically worked invisibly on products that almost nobody really knew or cared about.
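
[Bowers’s point about automated deployment is easy to picture with a toy example. The sketch below is not NPR’s tooling; it’s a hypothetical illustration of the idea that shipping code should be one repeatable command rather than hand-copying files, with invented paths and service names.]

```python
import subprocess

def deploy(app_dir="/srv/news-app", service="news-app"):
    """Pull the latest code and restart the app -- one repeatable command."""
    subprocess.check_call(["git", "pull", "origin", "master"], cwd=app_dir)
    subprocess.check_call(["sudo", "service", service, "restart"])

if __name__ == "__main__":
    deploy()
```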

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

I’m terrible at semantic differences. I’ll take the broad view on this one: If you’re writing code in a newsroom, you’re probably committing acts of journalism. I don’t feel terribly strongly about what we decide to call this or how we decide to slice up what an investigative journalist, a news librarian or a news apps developer might be doing every day. If they’re writing code and making journalism, I want them to have every opportunity to succeed. I don’t feel any need to give them labels or have their titles prevent them from writing code to get their jobs done.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

I think that Scott Klein said it best, and I’ll paraphrase: If you’re not using algorithmic or computational methods to analyze data, someone is scooping you on your beat.

There’s hardly a beat in journalism anymore that doesn’t involve structured data, which means that there’s hardly a journalist that wouldn’t benefit from automated methods for analyzing that data. Folks who are passing up that aspect of their jobs are just handing that opportunity over to someone else.

Print and radio are so time- and space-limited. If you’re not using the Web to tell all of the stories rather than just one story, you’re probably doing that wrong as well.
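
[One hedged way to picture those “automated methods” on a beat is a small script that fetches a structured dataset on a schedule and flags records it hasn’t seen before. The feed URL and field names below are invented for illustration; a real version would also handle errors and send alerts.]

```python
import csv
import json
import os
import urllib.request

FEED_URL = "https://example.gov/inspections.csv"  # hypothetical data feed
SEEN_FILE = "seen_ids.json"                       # IDs flagged on earlier runs

def load_seen():
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE) as f:
            return set(json.load(f))
    return set()

def main():
    seen = load_seen()
    with urllib.request.urlopen(FEED_URL) as resp:
        rows = list(csv.DictReader(resp.read().decode("utf-8").splitlines()))

    # Anything we haven't seen before is worth a reporter's look.
    for row in rows:
        if row["inspection_id"] not in seen:
            print("New record:", row["inspection_id"], row.get("facility", ""))

    with open(SEEN_FILE, "w") as f:
        json.dump(sorted(seen | {r["inspection_id"] for r in rows}), f)

if __name__ == "__main__":
    main()
```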

What’s the one thing people always get wrong when they talk about data journalism?

I don’t like it when people talk about how “organizations don’t get” data journalism, and I don’t like it for a very specific reason: The inability to create a news hacker culture doesn’t rest on the shoulders of some amorphous “organization.”

We should place that blame where it belongs: Squarely on the shoulders of individuals in your newsroom.

What we’ve got is a people problem. Editors and other newsroom opinion leaders should be making an environment for their reporters or others to participate in hacker journalism.

The same ethics that Eric Raymond elucidated in the Hacker How-To should guide journalists in newsrooms everywhere:

  • The world is full of fascinating problems waiting to be solved.
  • No problem should ever have to be solved twice.
  • Boredom and drudgery are evil.
  • Freedom is good.
  • Attitude is no substitute for competence.

To make your organization a place where hacker ethics are practiced requires positive action — it won’t just spring into being because of a memo. So, don’t blame your company because there’s no room to operate like a hacker. Instead, blame your boss or your boss’s boss. It’s most effective when you discuss this with them personally. But make sure you give those people an opportunity to correct their wrongs. Few people are actually hostile to the hacker ethic; most are just unfamiliar.

Research, Tips & Tutorials

Profile of the Data Journalist: Serdar Tumgoren


As in 2012, when I published a series of profiles of data journalists at Radar, I’ve conducted a series of email interviews after the annual conference of the National Institute for Computer-Assisted Reporting (NICAR). They’re part of my research into data journalism’s past, present and future, helping me to debug debates about “what it all means.”

The second interview, with Serdar Tumgoren, co-creator of the Knight News Challenge-funded OpenElections project, follows. It has been lightly edited for clarity, content and hyperlinked for context.

Where do you work now? What is a day in your life like?

The Associated Press. As a member of the relatively new Data Team, it’s a pretty mixed bag: devops/server work; newsroom tool-building, such as our in-house install of DocumentCloud; data wrangling and analysis for sundry investigative and spot stories.

I also help reporters hunt down and navigate data sources for stories, and help them apply budding technical skills when I don’t have the time to get involved on a project myself.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you start with?

I started as a print reporter but was bitten early on by the investigative journalism bug. The document chase quickly broadened to include data, and led me down a traditional “CAR path” of spreadsheets to databases to programming languages and web development. When I first started programming around 2005, I took a Perl class at a community college. My grade in that class shall remain hidden under lock and key.

…but seriously, you don’t need a computer science degree to master the various skills of “data journalism.” I learned how to apply technology to journalism through lots of late-night hacking, tons of programming books, and the limitless generosity of NICARians who shared technical advice, provided moral support, and taught classes at NICAR conferences.

Did you have any mentors? Who? What were the most important resources they shared with you?

Dave Sheingold at The (Bergen) Record. Derek Willis and Aron Pilhofer at NYT. Troy Thibodeaux at the AP.

Unbiased technical knowledge and advice that always brought the focus back to journalism. It’s easy to get obsessed with the tech side, something Phil Meyer warned us about.

What does your personal data journalism “stack” look like? What tools could you not live without?

Python, Ruby, Linux, spreadsheets and databases, QGIS, and myriad command-line tools for wrangling data.

If I could only keep one tool (and all of its libraries), I’d have to say Python. Nowadays, it can handle most everything you’d need, from general data wrangling to analysis to visualization to Web frameworks.

Ruby is a solid alternative, so I’m not looking to start any flame wars here. In my case, I’m still partial to Python because it’s the first programming language in which I gained a degree of fluency.
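
[To illustrate Tumgoren’s point that Python can carry a project from wrangling through visualization, here is a hypothetical mini-pipeline that reads a CSV, aggregates it and saves a chart in one short script. The input file, columns and output name are invented; it’s a sketch of the workflow, not anyone’s production code.]

```python
import csv
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt

# Wrangle: count line items per agency in a hypothetical expenditures file.
counts = Counter()
with open("expenditures.csv", newline="") as f:
    for row in csv.DictReader(f):
        counts[row["agency"]] += 1

# Visualize: bar chart of the five agencies with the most line items.
agencies, values = zip(*counts.most_common(5))
plt.bar(range(len(values)), values)
plt.xticks(range(len(values)), agencies, rotation=45, ha="right")
plt.tight_layout()
plt.savefig("agencies.png")
```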

What are the foundational skills that someone needs to practice data journalism?

Same as a good investigative reporter: Curiosity and doggedness.

Mastering a programming language requires the same curiosity and persistence as unravelling a bureaucratic maze. You have to be willing to put in the hours and not give up when you hit a dead end.

Where do you turn to keep your skills updated or learn new things?

IRE/NICAR, PythonJournos, colleagues past and present.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

Sometimes. News organizations see us as an amorphous group of nerds who can be plugged into an org chart, and out will come Data Journalism! Unfortunately, it’s not as simple as that.

My skill set is wildly different than those of the next “data journalist.” I think schools are making a better effort to train young journalists in many of the skills that fall under the umbrella of data journalism: data wrangling, analysis, visualization; statistics; digital literacy (how does the Web work?); Web development.

But very few journalists are actually masters of all of these skills. (I can’t name one.) The real question is, do news organizations know what they want or need? If you understand the goals of your organization, you can go out and find the right kinds of nerds. Otherwise, you’re hiring in the dark.

What data journalism project are you the most proud of working on or creating?

Rebuilding the elections data system for The Washington Post’s Web operation. It was a monumental — and some would say foolhardy — undertaking, but we ultimately created a system that helped power many Web and mobile interactives/graphics during the 2012 primaries and presidential election.

What data journalism project created by someone else do you most admire?

Congress Votes, an app at The Washington Post created by Derek Willis. This is the first big political news app (that I’m aware of) that tried to provide an interactive experience of such a high-profile government data set, with a high degree of back-end automation. It also tried to have a little fun. (Zodiac signs for Congress critters, anyone?)

It inspired many of us to start thinking about how we could be more creative and engaging on the web with government data. While we’ve seen many advances in the years since, I think Congress Votes stands out as a milestone in the history of news apps development.

How has the environment for doing this kind of work changed in the past five years?

The tools and knowledge have exploded. Powerful open source tools are increasingly available, along with countless free books and tutorials online. Cloud computing platforms are providing cheap or free ways to experiment with data tools. It’s had a massive democratizing effect, and that’s a good thing.

What’s different about practicing data journalism today, versus 10 years ago?

There are way more nerds at NICAR conferences. Seriously, the tent has grown bigger to include programmers, Web developers, data scientists (I’ll leave it to others to debate whether that’s a new name for stats geeks), and sundry other nerds.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

Yes. Ultimately, we’re trying to marshal technology to bring context to people’s lives. The tools and methods and specialties evolve, but the goal remains the same: Keep the public informed.

Why are data journalism and news apps important, in the context of the contemporary digital environment for information?

Data journalism is vital, because we’re confronted with a growing ocean of information in electronic form.

A data journalist is anyone, in my book, who can fluently work with this primary source. It’s the same as a traditional reporter, who should know how to hunt down human sources and interview them.

News apps are harder to pin down, and I often find folks don’t have quite the same definition. If we’re talking about telling data-driven stories in a digital context (mobile, Web), well, then yes, I’d say news apps are an important piece of the puzzle for informing people in a different medium.

What’s the one thing people always get wrong when they talk about data journalism?

That it’s new. Ben Welsh and “KRS One” summed that one up quite nicely at NICAR 2014.

Research, Tips & Tutorials

2014 NICAR conference highlights data journalism’s past, present and future


When I first attended the annual conference of the National Institute for Computer-Assisted Reporting (NICAR) in 2012, it was as a speaker. I was there to give a short talk about new data coming from the open government movement. While it went well, and, somewhat remarkably, the presentation I used has over 10,000 views on Slideshare, I feel somewhat embarrassed about it in retrospect, given what I have learned about the NICAR community in 2012 and in the years since. This tribe of journalists is the best at using technology to support investigative journalism in the United States. They know, perhaps better than anyone else in the country, exactly how much data is available, where and of what quality and format. I figured out quite quickly that I’d be learning far more from them than the inverse.

The NICAR community is also generous: in 2012, conference attendees were quite kind about what I had to say. Thankfully, my host, David Herzog, an associate professor at the Missouri School of Journalism and academic advisor to NICAR, was happy about the presentation and discussion that followed. As he told me afterwards, “Gov 2.0, FOSS and agile development are breathing new life into data and journalism.” To the extent that I explained that relationship at NICAR or the international open data conference later in 2012, I’m happy.

The silicon lining from attending the 2012 NICAR conference was that it inspired me to write a feature story about the importance of data journalism to society and to publish a series of profiles of the people practicing it, research that led eventually to my fellowship at the Tow Center. In the years since, I’ve continued investigating data journalism, a focus that brought me back to the 2014 NICAR conference in Baltimore, Maryland.

What I found at the Inner Harbor Marriott was even more breathtaking than the huge crab cake I devoured at lunch at Faidley’s, up in Lexington Market: nearly a thousand data journalists from 20 countries gathered to network, teach and learn from one another. The NICAR conference has more than tripled in size since 2009.

“I released my first news app to the wild in April of 2009,” said NPR developer Jeremy Bowers, in an interview. “At that time, there were only a handful of groups that I knew of writing code in newsrooms — the New York Times, the Washington Post, the Chicago Tribune, and Ben Welsh at the Los Angeles Times. About 300 people attended NICAR that year, and it was focused on print CAR reporting. This year, NICAR hosted 950 people and had well-attended sessions on JavaScript and D3, tools that basically only work on the Web. There are probably 20 teams writing code in newsrooms today, and there are entire college classes dedicated to producing hacker journalists. So the environment has gotten much richer and larger in the last five years.”

As in 2012, I left feeling exhilarated, even though I couldn’t attend all four days of the conference. Everywhere I looked, there were journalists talking about how they had used tools, applied programming, told better stories, made platforms, or taught others to do those things. Instead of chewing over the future of news, they were building it. The conference is unquestionably the best forum in the world to tap into the zeitgeist around data journalism and learn from people practicing it at all levels.

“What’s different about practicing data journalism today, versus 10 or 20 years ago,” said Matt Waite, a journalism professor at the University of Nebraska and long-time NICAR attendee, “was that from the early 1990s to mid 2000s, the tools didn’t really change all that much. The big change was we switched from FoxPro to Access for databases. Around 2000, with the Census, more people got into GIS. But really, the tools and techniques were pretty confined to that toolchain: spreadsheet, database, GIS. Now you can do really, really sophisticated data journalism and never leave Python. There’s so many tools now to do the job that it’s really expanding the universe of ideas and possibilities in ways that just didn’t happen in the early days.”

Sadly, there is no way for anyone to attend all of the 150 or so panels or workshops at NICAR 14. Fortunately, the conference, its attendees and media covering it all have done a great job of putting resources online, benefitting conference goers and people who couldn’t make it alike.

From the Investigative Reporters and Editors conference blog, tipsheets and session audio, to Northwestern University’s Knight Lab students’ coverage, to New York Times developer advocate Chrys Wu’s extraordinarily comprehensive wrap-up of NICAR 14 slides, links and tools, there’s enough material to occupy you for days.

Freed of the weight of doing a roundup by their hard work, I’ll highlight five different themes that resonated with me: security, hacks and hackers, making sensors and drones, culture and collaboration, context and history.

Security

One of the positive effects of months of stories catalyzed by the leaks of Edward Snowden, the former NSA contractor, is that many more journalists are not only thinking about security but taking steps to improve their operational practices, encrypting their email, chats and mobile devices. Whether we like it or not, this is the world we live and work in when so much of media and its means of production have gone digital.

 

The Tow Center has hosted workshops, shared lessons on digital security for journalists and will be publishing a white paper on the topic later this year. Two of my colleagues at the Tow Center attended NICAR to give talks on the subject.

Susan McGregor collaborated with Jennifer Valentino-DeVries, Josh Meyer, and Chris Doten on a session focused upon protecting your work from prying eyes. While audio and slides aren’t available online yet, their guidance will be helpful to any journalist. Until then, the Committee to Protect Journalists’ security guide is a fantastic resource.

 

I wasn’t able to watch the presentation on threat modeling from Jonathan Stray in person at the annual conference of the National Institute for Computer-Assisted Reporting (NICAR) in Baltimore. Thankfully, Stray’s talk was recorded and live online within days. I think it’s one of the best presentations on operational security for journalists — or, for that matter, anyone traveling and working abroad in the 21st century — I’ve ever seen. This isn’t just a list of things to buy, install or learn: it’s a way of thinking about staying secure, based upon threat modeling in computer security. I’d recommend it to anyone creating and publishing acts of journalism in the digital age.

Hacks and hackers

Last week, news came in that the U.S. Department of Justice had moved to dismiss charges filed against reporter Barrett Brown based upon him sharing a hyperlink to a forum where stolen personally identifiable information existed. If a court had criminalized that action, it would have had a profound chilling effect upon press freedom online.

That doesn’t mean that journalists are now free to pursue hacking in the newsroom or beyond, as a panel at NICAR reminded a packed room of attendees. In fact, the opposite is true.

As members of the panel explained, journalists need to be thoughtful about how and when they access and publish data, particularly with respect to scraping, a fundamental, useful tool in the data journalist’s toolkit. The very same activity that supports a Pulitzer Prize-winning investigation at a newspaper might land a freelancer in hot water with an overzealous U.S. Attorney. For instance, Wall Street Journal reporter Jen Valentino described how careful her team was in collecting the data that informed this story on differential pricing at Staples stores.

Simply put, if you’re interested in data journalism or civic hacking, you need to know about the Computer Fraud and Abuse Act (CFAA), along with proposals for its reform, and think about when it’s appropriate to access data or not. If you’re unsure and don’t have the legal resources of a major media organization behind you, think twice or thrice before clicking.

Sensors and drones

File homebrew hardware, “mood cubes” and drone journalism under the shiny and new part of NICAR, focused upon “what’s next,” but there was a lot to like about the creativity on display this year and enough substance regarding early usage that these are trends and techniques to watch.

WNYC data news editor John Keefe teamed up with Liza Stark, an adjunct professor at the Parsons School of Design, to demonstrate various DIY open source hardware projects and “Team Blinky,” giving the NICAR attendees ideas for discovering data in the world around us.

It was great, interactive fun, but what made it special (to this observer) was Keefe’s remark that WNYC’s Cicada Tracker project turned up some 8,000 cicada sightings, with 800 people building trackers that collected 1,500 temperature readings. The project made crowdsourcing data collection through sensor journalism and a distributed listening audience a reality, not just a theoretical exercise. For more of a deep dive into the topic, check out the proceedings of the sensor journalism workshop at the Tow Center last year. “Making your own data” came up again in the previous day’s NICAR session, with respect to the use of drones and open source hardware.

 

To be fair, the presenter, University of Nebraska professor Matt Waite, flew a drone at the 2012 NICAR Conference I attended as well, but the context is different.

If you haven’t been paying attention, 2013 was a breakout year for drone journalism, as my colleague Fergus Pitt noted here at Radar, from disasters to protests.

Waite pointed to an upcoming project in Kenya as a new example.

This year, and in the years ahead, drones could be used to do more than film events, Waite suggested: they could collect data.

 

As he has been doing for years, however, Waite cautioned NICAR attendees excited about becoming pilots to learn about the regulations that govern their use and not to be stupid in the absence of clear guidance from the Federal Aviation Administration.

Context and history

If there’s one lightning talk you watch from this year’s NICAR, make it L.A. Times data desk editor Ben Welsh’s reflection on the history of the craft.

At over eighteen minutes, the video is about three times as long as the average lightning talk, but every minute is worth your time. Welsh reminded the audience of all of the history that has gone before, from the muckrakers at the turn of the last century to decades of journalists learning how to apply steady and then rapidly evolving generations of technologies to journalism.

The one thing people always get wrong about data journalism is that it’s new, said Serdar Tumgoren, a data journalist at the Associated Press, in an interview. “Ultimately, we’re trying to marshal technology to bring context to people’s lives,” he related. “The tools and methods and specialties evolve, but the goal remains the same: Keep the public informed.”

Culture and collaboration

Ultimately, what differentiates NICAR from the hundreds of other panels, forums and conferences that I’ve been to over the years — particularly technology conferences — is the community I’ve seen on display, both online and off. I thought Matt Waite captured something important in his post about a five-step NICAR recovery plan. While ostensibly about getting started right away with applying the new skills and techniques gleaned from the experience, his peroration hits upon what has made this annual collection of “journo nerds” special and enduring:

“People have been saying for decades that NICAR feels different — and it is different — because of the culture of the conference. I can talk for hours about what that culture is and why it’s the way it is, but it boils down to this: It’s a giving culture.

Every one of the speakers? Volunteer. All of the hands-on teachers? Same. Did you talk to someone in the hall? Grab a speaker after to ask them a question? They almost certainly stayed right there and answered your question, right? I can remember time after time of Big Name Journalists From Big Name News Organizations dropping everything and showing me how to do something. I can count scores of times where they gave me a business card and said call me if you run into trouble. That generosity amazed me. Inspired me. Made me want to do the same.”

Several journalists from Europe commented to me at NICAR that this kind of code sharing and collaboration across newsrooms would be nigh unthinkable in their countries, given fierce competition between news outlets. To be honest, seeing data journalists from organizations that compete for scoops here in the United States share tips, tools and tricks in person and on NICAR’s busy listserv was a surprise when I first saw it.

Now, it remains inspiring. NICAR is a teaching culture, much like a teaching hospital, except that the patients are stories and the physicians are editors and reporters trying to inject facts and rational discourse into a national conversation that is all too often lacking in either.

Given the demand for data journalists and the scarcity of the skills needed, the culture and willingness to “pay forward” what mentors have taught in years past is heartening. Many thanks to everyone there who took time to talk with me about what they do, how, and why. I’ll be sharing more interviews in posts to come.

Alexander Howard is a Tow Fellow working on the Data Journalism Project at the Tow Center for Digital Journalism. The Data Journalism Project is made possible by generous funding from both The Tow Foundation and the John S. and James L. Knight Foundation. It includes a wide range of academic research, teaching, public engagement and development of best practices in the field of data and computational journalism. Follow Alexander Howard on Twitter @digiphile. To learn more about the Tow Center Fellowship Program, please contact the Tow Center’s Research Director, Taylor Owen: taylor.owen@columbia.edu.

Announcements, Tips & Tutorials

Data Journalism Resources


A collection of online resources about data journalism:

+ The Data Journalism Handbook

+ Jonathan Stray’s Computational Journalism Class, Journalism and Media Studies Center at the University of Hong Kong, Spring 2013

+ Brian Boyer, Hacker Journalism 101

+ Online book: The New Precision Journalism, Philip Meyer

+ The Tech Data Journalism Bootcamp, Civic Media Lab, MIT

+ The School of Data

+ The Data Journalism Developer Studio, on Google Groups and GitHub

+ The Reporters’ Lab: Tools, Techniques and Research for Public Affairs Reporting, founded by Sarah Cohen and now run by Tyler Dukes

+ Facts Are Sacred: The Power of Data, Simon Rogers

+ Journalism in the Age of Data

+ ProPublica’s News App and Data Guides

Announcements, How It's Made, Tips & Tutorials

News App and Data Guides from ProPublica


Coding the news now has a manifesto. ProPublica’s developers launched a series of news application guides, including a coding manifesto, this morning. The guides, which all live on GitHub, are intended to give insight into the programming ethos of the non-profit investigative journalism outfit. As the manifesto says, “We’re not making any general statements about anything beyond the environment we know: Doing journalism on deadline with code.”

Scott Klein, Jeff Larson and Jennifer LaFleur wrote the guides, which include a news app style guide, a data checklist and a design guide. These resources add to the ever-growing body of knowledge shared by the community of news application developers, many of whom are actively blogging about and sharing their working processes.

Read all the guides here.

Tips & Tutorials

Google Fusion Tables Map Manager


A tool for creating, managing, exploring and publishing Google Fusion Tables maps that have one or more layers.


Tips & Tutorials

#stealthis: Security Essentials for Journalists


The following is an abridged version of “Steal this Digital Security Toolbox”, which I presented at the 2012 Online News Association conference. Please note that while this is only a brief overview of security techniques, there are two major takeaways: (1) encrypt your devices, and (2) set yourself up with a Virtual Private Network (VPN) to protect your wireless activity. Get in the habit; these practices are important no matter where or with whom you are working. The CUJ community can learn more on October 18, when Jeff Sieben, Columbia Journalism School’s head of IT, will be leading a Tow Tea on digital security.