How It's Made, Research, Tips & Tutorials

Think about data from the beginning of the story, says Cheryl Phillips


“Stories can be told in many different ways,” said Cheryl Phillips. “A sidebar that may once have been a 12-inch text piece is now a timeline, or a map.”

Phillips, an award-winning investigative journalist, will begin teaching students how to treat data as a source this fall, when she starts a new gig as a lecturer in Stanford’s graduate journalism program, helping to open up Stanford’s new Computational Journalism Lab.

“Cheryl Phillips brings an outstanding mix of experience in data journalism and investigative work to our program. Students and faculty here are eager to start working with her to push forward the evolving field of computational journalism,” said Jay Hamilton, Hearst Professor of Communication and Director of the Stanford Journalism Program, in a statement. “Her emphasis on accountability reporting and interest in using data to lower the costs of discovering stories will help our journalism students learn how to uncover stories that currently go untold in public affairs reporting.”


I interviewed Phillips about her career, which has included important reporting on the nonprofit and philanthropy world; her plans for teaching at Stanford; data journalism; j-schools and teaching digital skills; and the challenges that newsrooms face today and in the future.

What is a day in your life like now?

I’m the data innovation editor at The Seattle Times. Essentially, I work with data for stories and help coordinate data-related efforts, such as working with reporters, graphics folks, and others on news apps and visualizations. I also have looked at some of our systems and processes and suggested new, more time-effective methods for us.

I’ve been at The Seattle Times since 2002. I started as a data-focused reporter on the investigations team, then became deputy investigations editor, then data enterprise editor. I also worked on the metro desk and edited a team of reporters. I currently work in the digital/online department, but really work across all the departments. I also helped train the newsroom when we moved to a new content management system about a year or so ago. I am trying to wrap up a couple of story-related projects, and do some data journalism newsroom training before I start at Stanford in the fall.

How did you get started in data journalism? Did you earn any special degrees or certificates?

I remember taking a class (outside of the journalism department) while in college. The subject purported to be about learning how personal computers worked but, aside from a textbook that showed photos of a personal computer, we really just learned how to write if/then statements and loops on a mainframe.

I got my first taste of data journalism at the Fort Worth Star-Telegram. That’s where I did my first story using any kind of computer for something other than putting words on a screen. I had gotten the ownership agreement for the Texas Rangers, which included a somewhat complex formula. I kept doing the math on my calculator and screwed it up each time. Finally, I called up a friend of mine who was a CPA, and she taught me Lotus 1-2-3.

My real start in computer-assisted reporting came in 1995, when I was on loan to USA TODAY. I was fortunate enough to land in the enterprise department with the data editors, and Phil Meyer was there as a consultant. By the end of five months, I could use spreadsheets, Paradox (for DOS!) and SPSS. What a great education. I followed that up by joining IRE and attending the NICAR conference. I’ve missed very few since then and have also done some of NICAR’s specialized training on stats and maps.

I have no special degrees or certificates, but I have taken some online courses in R, Python, etc.

Did you have any mentors? Who? What were the most important resources they shared with you?

Phil Meyer is amazing, and such a great teacher. He taught me statistics, but also taught me how to think about data. Sarah Cohen and Aron Pilhofer of The New York Times, and Jennifer LaFleur of CIR. Paul Overberg at USA TODAY. They have all helped me over the years.

NICAR is an incredible world, full of data journalists and journalist-programmers who are willing to help others out. It’s a great family.

On the investigative journalism front, Jim Neff and David Boardman are fantastic editors and great at asking vital questions.

What does your personal data journalism “stack” look like? What tools could you not live without?

I’m a firm believer in the power of the spreadsheet. So much of what journalists do on a daily basis can be made easier and more effective by just using a spreadsheet.

I use OpenRefine, CometDocs, Tabula, the AP’s Overview and DocumentCloud. I use MySQL with Navicat. I still use Access. I’m a recent convert to R, but also use SPSS. I use ESRI for mapping, but am interested in exploring other options as well. I use Google Fusion Tables, too.

Most of my work has been on the more traditional CAR front, but I’ve been learning Python for scraping projects.
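Phillips doesn’t share her scraping code, but the kind of task she means can be sketched with nothing more than Python’s standard library. The HTML snippet and county figures below are invented for illustration:

```python
# A minimal scraping sketch using only the standard library.
# The table contents here are hypothetical stand-ins for a real agency page.
from html.parser import HTMLParser

SAMPLE = """
<table>
  <tr><td>King County</td><td>412</td></tr>
  <tr><td>Pierce County</td><td>198</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(SAMPLE)
records = [(county, int(count)) for county, count in scraper.rows]
print(records)  # [('King County', 412), ('Pierce County', 198)]
```

A real project would fetch the page over HTTP and handle messier markup, but the pattern is the same: parse, extract, structure.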

What are the foundational skills that someone needs to practice data journalism?

In many ways, the same foundational skills you need for any kind of journalism.

Curiosity, for one. Journalists need to think about stories from a mindset that includes data from the very beginning, such as when a reporter talks to a source, or a government official. If an official mentions statistics, don’t just ask for a summary report, but ask for the underlying data — and for that same data over time. The editors of those reporters need to do the same thing. Think about the possibilities if you had more information and could analyze and view it in different ways.

Second, be open to learning any skill sets that will help tell the story. I got into data journalism because I discovered stories I would not be able to tell if I didn’t obtain and analyze data. We all know journalists don’t like to take someone’s word for something — data journalism just takes that to the next level.

Third, in terms of technical skills, learn how to use a spreadsheet, at a bare minimum. Really, one tool leads to another. Once you know how a spreadsheet works, you are more open to using OpenRefine to clean and standardize that data, or learning a language for scraping data, or another program that will help with finding connections.
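The cleanup step Phillips mentions (the kind of standardization OpenRefine automates) can be illustrated in a few lines of Python; the agency names below are invented:

```python
# Normalize inconsistent spellings so variant names count as one entity,
# the same idea behind OpenRefine's clustering. Names are hypothetical.
from collections import Counter

raw = ["Dept. of Transportation", "DEPT OF TRANSPORTATION",
       "dept. of transportation", "Parks Department", "PARKS DEPARTMENT"]

def normalize(name):
    """Collapse case, punctuation and extra whitespace."""
    return " ".join(name.lower().replace(".", "").split())

counts = Counter(normalize(n) for n in raw)
print(counts.most_common())
# [('dept of transportation', 3), ('parks department', 2)]
```

In a spreadsheet, the same result comes from a lowercase/trim formula plus a pivot table; the point is that one tool leads naturally to the next.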

What classes will you be teaching at Stanford, and how?

I will be teaching several courses, including a data journalism class focusing on relational data, basic statistics and mapping. I will also be teaching an investigative reporting class focused on the tools of that craft.

In general, I want to make sure the students are telling stories from data that they analyze. They should be not only learning the technical stack, but how to apply the technical knowledge to real-world journalism. I am hoping to create some partnerships with newsrooms as well.

Where do you turn to keep your skills updated or learn new things?

IRE and NICAR and all the folks involved there. I also try to learn from our producers at The Seattle Times, who come in knowing way more than I did when I started in journalism. I try to follow smart people on Twitter and other social media.

I like to reach out to folks about what they are doing. I think reaching out and connecting with folks outside of journalism is a great way to make sure we are aware of other new tools, developments, etc.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience?

Newsrooms are often still structured into silos, so reporters just report and write. They may hand their data off to a graphics desk, but they don’t necessarily analyze or visualize data themselves. Producers produce, but don’t write, even though they may enjoy that and be good at it, too.

Some of this is by necessity, but it makes it harder to learn new skills — and some of these skills are really useful. A reporter who knows how to visualize data may also be able to look at it in a different way when reporting the story out, too. So, building collaborative teams is important, as is providing time for folks to try out other skills.

Are journalism schools training people properly? What will you do differently?

I think it’s no secret that a lot of change is starting to take place in schools.

Cindy Royal had an interesting piece about platforms just the other day. In general, I think my answer here is similar to the biggest challenge for newsrooms: We need to take a more integrated approach. Classrooms and their teachers should collaborate on work.

So, for example, a multimedia class produces the visualizations and videos that go with the stories being written in another class. (Yes, Stanford already does this.)

Data journalism should not be just one class out of a curriculum, but infused throughout a curriculum. Every type of journalist can learn data-related skills that will help them, whether they end up as a copy editor, a reporter, a front-line editor or a graphics artist.

What data journalism project are you the most proud of working on or creating?

I have been asked this question before and can never answer it well. My last story is always the one I’m most proud of, unless it’s the one I’m about to publish.

That said, as an editor at The Seattle Times, I worked with Jennifer LaFleur (then with ProPublica) on a project tracking the reasons behind foreclosures, a deep dive into the driving factors in several cities.

When I was a reporter, I was lucky enough to get to work with Ken Armstrong on our court secrecy project in 2006, which changed state practice. I also led the reporting effort on problems with airport security. Both of those used small data sets, which we built ourselves, but told important stories.

I can think of even more stories that weren’t data projects per se, but which used data in the reporting in critical ways. The recent Oso mudslide coverage is an example of where we used mapping data and landslide data to effectively tell the story of the impact of the slide on the victims and of how the potential disastrous consequences had been ignored over time.

What data journalism project created by someone else do you most admire?

Too many to count. There has been so much great work done. ProPublica’s Dollars for Docs was fantastic not only for its stories, but for the way they shared the data and the way newsrooms from across the country could tap into the work. Last year, the Milwaukee Journal Sentinel’s project, Deadly Delays, was such important work.

How has the environment for doing this kind of work changed in the past five years?

It’s much more integrated into new immersive storytelling platforms. There is a recognition that stories can be told in many different ways. A sidebar that may once have been a 12-inch text piece is now a timeline, or a map.

I think there are many more team collaborations, with the developers, designers and reporters and CAR specialists working together from the outset. We need a lot more of this.

What’s different about practicing data journalism today, versus 10 years ago? What about teaching it?

There are more tools, with more coming every day. A few are great, and a lot aspire to be great and some of those will probably get there.

The really fantastic thing about the change is that it’s relatively easy to contribute to the development of a tool that will help journalism, even just as a beta tester.

There are more tech folk interested in helping make journalism better. We’re becoming a less insular world, and that’s a good thing.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

News apps help tell important stories. It’s the same reason narrative is important.

It always should boil down to that: “does this tool, language, or app help tell a story?” If the answer is “yes,” and you think the story could be worth the effort, then the tool is important too.

What’s the one thing people always get wrong when they talk about data journalism?

I think I’ll have to punt on this one. As you have pointed out, data journalism is a big umbrella term for many different things — precision journalism, computer-assisted reporting, computational journalism, news apps, etc. — so it’s easy to have a different idea as to what it means.

[IMAGE CREDIT: University of Washington]


Treat data as a source and then open it to the public, says Momi Peralta


Long before data journalism entered the mainstream discourse, La Nacion was pushing the boundaries of what was possible in Argentina, a country without a freedom of information law. If you look back at La Nacion’s efforts to go online and start treating data as a source, you’ll find Angélica “Momi” Peralta Ramos (@momiperalta), the multimedia development manager who originally launched LaNacion.com in the 1990s and now manages its data journalism efforts.

Peralta Ramos contends that data-driven innovation is an antidote to budget crises in newsrooms. Her perspective is grounded in experience: her team at La Nacion is using data journalism to challenge a FOIA-free culture in Argentina, opening up data for reporting and reuse in order to hold government accountable. This spring, I interviewed her about her work and perspective. Her answers follow, lightly edited for clarity.

You’re a computer scientist and MBA. How did you end up in journalism?

Years ago, I fell in love with the concept of the Internet. It is the synthesis of what I’d studied: information technology applied to communications. Now, with the opportunity of data journalism, I think there is a new convergence: the extraction and sharing of knowledge through collaboration using technology. I’m curious about everything and love to discover things.

How did your technical and business perspective inform how you approached LaNacion.com and La Nacion Data?

In terms of organization, it helped to consider traditional business areas like sales, marketing, customer service, business intelligence, and of course technology and a newsroom for content.

At first, I believed in the unlimited possibilities of technology applied to publishing online, and the power of the net to distribute content. Content was free to access, and free became the norm. As consumers embraced it, there was a demand and a market, and when there is a market there are business opportunities, although in a much more fragmented competitive environment.

The same model applies now to data journalism. Building content from data or data platforms must evolve to an economy of scale in which the cost of producing [huge amounts of] content in one single effort tends to zero.

What examples of data-driven journalism should the public know about at La Nacion?

Linked below is a selection of 2013 projects. Some of them are finalists in the 2014 Data Journalism Awards! Please watch the videos inside the posts, where we explain how we managed to extract, transform, build and open data in each case.

How do you see digital publishing, the Internet and data journalism in South America and globally? What about your peers?

I can’t speak for everyone else, but I think we all see it the same way: as both a big challenge and a big opportunity.

From then on, it’s a matter of being willing to do things. The technology is there, the talent is everywhere, the people who make a difference are the ones you have to gather.

As the context is different in every country and there are obstacles, you have to become a problem solver and be creative, but never stop. For example, if there are language barriers, translate. If there is no open data, start by doing it yourself. If technology is expensive, check first for free versions. Most are enough to do everything you need.

What are the most common tools applied to data journalism at La Nacion?

Collaborative tools: Google Docs and spreadsheets, OpenRefine, Junar’s open data platform, Tableau Public for interactive graphs, and now JavaScript and D3.js for reusable interactive graphs tied to updated datasets. We love tools that don’t need a developer every time someone wants to create interactive content. These are end users’ tools.

Developers are the best for “build once, use many times” kinds of content, developing tools, news applications and for creative problem solving.

What are the basic tools and foundational skills that data journalists need?

First, searching. Using advanced search techniques in countries like ours, you find there is more on the deep web than on the surface.

Then scraping, converting data from PDFs, structuring datasets, and analyzing data. Then, learning to publish in open data formats.

Last, but not least: socializing and sharing your work.

Data journalists need a tolerance for frustration and ability to reinvent and self motivate. Embrace technology. Don’t be afraid to experiment with tools, and learn to ask for help: teamwork is fun.
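The pipeline Peralta Ramos outlines, from structuring a dataset to publishing it in open formats, might look like this in miniature. The recipients and figures below are invented placeholders, not La Nacion’s data:

```python
# Structure extracted rows, then publish them in two open formats:
# CSV for spreadsheet users and JSON for developers. Data is invented.
import csv
import io
import json

rows = [
    {"recipient": "Media Group A", "year": 2013, "amount": 1200000},
    {"recipient": "Media Group B", "year": 2013, "amount": 450000},
]

# CSV: the lingua franca of newsroom spreadsheets
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["recipient", "year", "amount"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())

# JSON: easy for others to reuse and build another layer of knowledge on
print(json.dumps(rows, indent=2))
```

Publishing both formats costs almost nothing once the data is structured, which is the economy of scale Peralta Ramos describes.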

How do you and your staff keep your skills updated and learn?

We self-teach for free, thanks to the net. We look at best practices and draw inspiration from others’ work. Then, whenever we can, we attend conferences such as NICAR, ISOJ or ONA and follow them online. If there are local trainings, we attend. We went to introductory two-day courses on ArcGIS and QlikView (business intelligence software) just to learn the possibilities of those technologies.

We taught ourselves Tableau: an interactive designer and I took two days off in a Starbucks with the training videos. Then she learned more in an advanced course.

We love webinars and MOOCs, like the Knight Center’s or the EJC’s data journalism MOOC.

We design internal trainings. We have a data journalism training program, now in its fourth edition, with five days of full-time learning for groups of journalists and designers in our newsroom. We also design Excel courses for analyzing and designing data sets (DIY Data!) and, thanks to our Knight-Mozilla OpenNews fellows, we have customized workshops on tools like CartoDB and introductions to D3.js.

We go to hackathons and meetups — nearly every meetup in Buenos Aires. We interact with experts and with journalists and learn a lot there, working in teams.

What are the biggest challenges La Nacion faces in practicing data journalism? What’s changed since 2011, in terms of the environment?

The context. To take just one example, consider the inflation scandal in Argentina. Even The Economist removed our [national] figures from its indicators page. Media that reported private indicators were considered opposition by the government, which took away most official advertising from those media, fined private consultants who calculated consumer price indices that differed from the official one, pressured consumer associations to stop measuring prices and releasing price indexes, and so on.

Regarding official advertising between 2009 and 2013, we managed to build a dataset. We found that 50% went to 10 media groups, the ones closest to the government. In the last period, a hairdresser (stylist) received more advertising money than the largest newspapers in Argentina. Here’s how we built and analyzed this dataset.

Last year, independent media suffered an ad ban, as reported in The Wall Street Journal: “Argentina imposes ad ban, businesses said.”

Argentina is ranked 106th out of 177 in Transparency International’s Corruption Perceptions Index. We are still without a freedom of information law.

Regarding open data from governments, there are some initiatives. The most advanced is the City of Buenos Aires open data portal, but there are also national, provincial and municipal initiatives starting to publish useful information, and even open data.

Perhaps the best change is that we now have a big hacktivist community of transparency activists, NGOs, journalists and academic experts who are ready to share knowledge for data problem solving, as needed or in hackathons.

Our dream is for everyone to understand data as a public service, not only to enhance accountability but to enhance our quality of life.

What’s different about your work today, versus 1995, when LaNacion.com went online?

In 1995, we were alone. Everything was new and hard to sell. There was a small audience. Producing content was static, still in two dimensions, perhaps including a picture in .jpg form, and feedback came through e-mail.

Now there is a huge audience, a crowded competitive environment, and things move faster than ever in terms of formats, technologies, businesses and creative uses by audiences. Every day, there are challenges and opportunities to engage where audiences are, and give them something different or useful to remember us and come back.

Why are data journalism and news apps important?

Both move public information closer to the people and literally put data in citizens’ hands.

News apps are great to tell stories, and localize your data, but we need more efforts to humanize data and explain data. [We should] make datasets famous, put them in the center of a conversation of experts first, and in the general public afterwards.

If we report on data, and we open data while reporting, then others can reuse and build another layer of knowledge on top of it. There are risks, if you have the traditional business mindset, but in an open world there is more to win than to lose by opening up.

This is not only a data revolution. It is an open innovation revolution around knowledge. Media must help open data, especially in countries with difficult access to information.

How do Freedom of Information laws relate to data journalism?

FOI laws are vital for journalism, but even more vital for citizens in general: for the justice system, and for politicians, businesses and investors making decisions. Anyone can republish information, if she can get it, but there are requests for information that receive no response at all.

What about open government in general? How does the open data movement relate to data journalism?

The open government movement is happening. We must be ready to receive and process open data, and then tell all the stories hidden in datasets that now may seem raw or distant.

To begin with, it would be useful to have data on open contracts, statements of assets and salaries of public officials, and ways to follow the money and compare, so people can help monitor government accountability. Although we dream in open data formats, even PDFs are better than print copies.

The open data movement and hacktivism can accelerate the application of technology to ingest large sets of documents, complex documents or large volumes of structured data. This will accelerate and help journalism extract and tell better stories, but also bring tons of information to the light, so everyone can see, process and keep governments accountable.

The way to go for us now is to use data for journalism and then open that data. We are building blocks of knowledge and, at the same time, putting this data closer to the people, the experts and those who can do better work than we can to extract another story or detect pockets of corruption.

It makes lots of sense for us to make the effort of typing, building datasets, cleaning, converting and sharing data in open formats, even organizing our own ‘datafest’ to expose data to experts.

Open data will help in the fight against corruption. That is a real need, as here corruption is killing people.


Tasneem Raja urges newsrooms to adopt pair programming for better data journalism


New, digitally native media enterprises like FiveThirtyEight have gotten a lot of attention — and some grief — as they’ve gone online this year. It was media organizations born in print, however, and the journalists at them, that pioneered the practices of computer-assisted reporting underpinning the news apps and data journalism of today. The New York Times’ Upshot is just the latest addition to these efforts.

Tasneem Raja, the interactive editor at Mother Jones, knows this history well. She’s one of a growing number of journalists who aren’t just reporting the news but building the medium for the message. Before she joined Mother Jones, she was the news apps editor at The Bay Citizen, where her team built a Bike Accident Tracker and a government salary database, among other things; before that, she was a feature writer at The Chicago Reader. Raja’s insights into how to build an interactive news team (more on that below) are well worth reading. You can follow her work on GitHub or her commentary on Twitter. Our interview follows, lightly edited and linked for context.

Where do you work now? What is a day in your life like?

I’m a senior editor at Mother Jones magazine, where I lead an awesome team of data reporters and interactive producers. I’m also a writer and reporter, in print and on the Web.

We live by a few guiding principles on my team. The big one is that it’s our job to make sure everybody in the newsroom can tell a story by any means necessary. That is, reporters should know how to map, the mapmaking pros on my team should know how to factcheck, the fact checkers should know when to use a column chart versus a bar chart, and so on. We don’t believe in siloed skills.

Of course, some folks will always be way better at some skills than others, but you gotta pay it forward, which brings me to our second guiding principle: we are all learners, and we are all teachers.

Put it all together, and you get quite the three-ring circus of hybrid journalism going on here every day. Today, for instance, I’m finishing up edits on a big magazine feature story about the future of programming, while teaching Illustrator charting to a reporter with good data on air pollution. Producer Jaeah Lee is teaching a reporter best practices in structured data for an easy-to-update map of gay marriage laws. Our interactive fellow AJ Vicens just fired off a quick blog post about racism in sports, and is now working with a reporter on abstracting an open-source template we made. We probably look pretty different from most data teams in this way.

How did you get started in data journalism? Did you get any special degrees or certificates?

I was a staff writer at the Chicago Reader in the mid-2000s, which was, of course, a scary time to be in news. When a bunch of my senior mentors there, all writers, got canned in 2007, I decided to re-evaluate my career and went to j-school at Berkeley to learn new skills. I was lucky enough to be there while Josh Williams was teaching web development (he left for the NYT, where he worked on Snow Fall and tons of other big interactive pieces), and essentially attached myself at the hip. It turned into a year-long independent study, and got me a job on the launch team at The Bay Citizen, where I created a news apps team that made some really cool data projects for the Bay Area (RIP, TBC).

What quantitative skills did you start with?

I’ve always appreciated structured ways of looking at information. There’s something about well-formatted tables of information and clean spreadsheets that makes me really happy. That’s the most important skill for a data journalist, in my opinion: a love of working with structured data, and creating whole new systems and worlds atop it. That strange love is what makes you want to put the time in to learn R, command line tools, pivot tables, and so on — all stuff I didn’t pick up ’til halfway through my first job in data journalism.

Did you have any mentors? Who? What were the most important resources they shared with you?

“Mentor” is a funny word. Here are a few people who’ve left deep thumbprints on the way I think about my work (whether they intended to or not).

Josh Williams taught me everything I needed to know to get a really good first job in data journalism and news apps: what’s a text editor, what’s the command line, what’s a Web framework. More than that, he got me thinking in terms of abstraction. For instance, he was always saying, “Never build something you can only use once.” Instead, think both in terms of the specific needs of the project in front of you, and the broader needs of a similar project you might not even know of till next year. Seeing the way he held both of these concepts in his head at once was an incredible lesson in how to be a journalist who is also a pretty decent project manager.
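Williams’ rule of thumb can be expressed in a sketch: rather than hard-coding one dataset’s columns, write a loader any project can reuse. The field names and figures below are hypothetical:

```python
# "Never build something you can only use once": a generic CSV loader
# parameterized by which columns are numeric, reusable across projects.
import csv
import io

def load_records(csv_text, numeric_fields=()):
    """Parse CSV text into dicts, coercing the named columns to numbers."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in numeric_fields:
            row[field] = float(row[field])
        records.append(row)
    return records

# Works for a bike-crash dataset today...
crashes = load_records("street,count\nMarket St,14\nMission St,9\n",
                       numeric_fields=("count",))
# ...and for a salary database next year, with no rewriting.
print(crashes)
```

The abstraction lives in the parameters: the specific project supplies the columns, while the loader stays general.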

Brian Boyer taught me the importance of having a guiding philosophy (or three) to your work. The why of what you do, not just the how. And that your philosophies can sound more like something a chef or a potter would say, than a data nerd. In other words, he got me thinking about this work as craft.

Scott Klein has inspired me to better know my shit. That is, it’s not enough to read a few blog posts about data journalism and crown yourself the next Edward Tufte. There’s a lot of history to what we do, a lot of important choices to be made, and fortunately, there are very old and very new books out there to learn from. You can’t have a conversation with Scott without wanting to go pick up a book.

What does your personal data journalism “stack” look like? What tools could you not live without?

1. A good, simple text editor, with good syntax highlighting
2. A spreadsheet app, with version control and collaborative editing
3. GitHub
4. Google.com
5. The cognitive ability to think in terms of abstraction

Where should people who want to learn start?

A hundred people have said it before me, and better: pick a project you genuinely want to do, and then hack, Google, and plead for help in forums, and read books, until you get it working.

Where do you turn to keep your skills updated or learn new things?

1. The NICAR conference
2. Twitter
3. Increasingly, printed books
4. Dissecting the work of colleagues at other shops

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

There are a ton of challenges, so I’ll pick one: we don’t have a pair programming model on the editorial side of the newsroom, and we need one.

Journalism schools still teach journalism as a very hierarchical, often solitary pursuit. That’s not the way it works in data journalism, and the best learning is still gonna be on the job. That requires cross-pollination between folks with different skill sets. We need a pairing model across newsrooms, not just in the nerd corner.

I’ve had several people tell me they’re surprised to learn how small my team is, given the daily volume of content we put out. That’s because we’re not the only ones who can work with data and visuals in our newsroom. We’ve spent serious time pairing with something like 1 in 3 staffers here, working and training side by side whenever physically possible. (We have offices on different coasts.) We’ve gotten several editors and reporters on GitHub, and while we don’t have them checking in code through the command line (yet), they’re well-versed in the how and why of version control.

What’s the one thing people always get wrong when they talk about data journalism?

That it’s all quantitative. That’s like guiding principle #3 on my team: everything is data. Words are data. Gifs are data. If it can be sorted, tracked, counted, merged, filtered: it’s probably data. I’d say half the projects my team does are more qualitative than quantitative. That is, most people wouldn’t consider it data visualization so much as photo essays, games, quizzes, etc. There’s a lot of power in developing a data skill set — both technical and cognitive — that lets you make cool things with words and pictures, too.
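Raja’s point that words are data can be shown in a couple of lines: any text can be sorted, counted and filtered like a spreadsheet column. The sample string is invented:

```python
# Treating words as data: count and rank them like any other field.
from collections import Counter

text = "data story data map data story"
top = Counter(text.split()).most_common(2)
print(top)  # [('data', 3), ('story', 2)]
```

The same counting-and-filtering mindset applies whether the cells hold numbers, words or gifs.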

You’ve written about Silicon Valley’s “brogrammer problem.” How do data journalists and their communities of practice handle issues of race, sexism or gender?

NICAR is a pretty healthy place to be a non-white, non-male person working in journalism. I can’t speak to issues of class, ability, gender identity, and other types of difference, other than to say we’re almost definitely less good at them, and that needs to change.

I don’t have experience with the way folks in this community handle inclusion issues when they come up, but I have seen evidence of folks working preemptively to create environments that are less exclusionary than the norm in web development, quantitative analysis, the visual arts, or journalism. Maybe it’s because there haven’t been that many of us webby data journos till recently. Data journalists are pragmatic by nature, and maybe it just didn’t make sense to alienate swaths of potential recruits.

That’s not to say everything is rainbows and sunshine, but I’m gonna take a rare moment of optimism here and say that I’m proud to represent this community, because in my experience, it’s genuinely committed to inclusion.

Data skills make you a better journalist, says ProPublica’s Sisi Wei

I’ve found that the best antidote to a decade of discussion about the “future of news” is to talk to the young journalists who are building it. Sisi Wei’s award-winning journalism shows exactly what that looks like, in practice. Just browse her projects or code repositories on GitHub. Listening to her lightning talk at the 2014 NICAR conference on how ProPublica reverse engineered the Sina Weibo API to analyze censorship was one of many high points of the conference for me.

Wei, a news applications developer at ProPublica, was formerly a graphics editor at The Washington Post. She is also the co-founder of “Code with me,” a programming workshop for journalists. Our interview about her work and her view of the industry follows.

Where do you work now? What is a day in your life like?

I currently work at ProPublica, on the News Applications Team. We make interactive graphics and news apps; think of projects like 3D flood maps and Dollars for Docs.

At ProPublica, no one has a specific responsibility like design, backend development, data analysis, etc. Instead, people on the team tend to do the whole stack from beginning to end. When we need help, or don’t understand something, we ask our teammates. And of course, we’re constantly working alongside reporters and editors outside of the team as well. When someone’s app is deploying soon, we all pitch in to help take things off his/her plate.

On a given day, I could be calling sources and doing interviews, searching for a specific dataset, cleaning data, making my own data, analyzing it, coming up with the best way to visualize it, or programming an interactive graphic or news app. And of course, I could also be buried beneath interview notes and writing an article.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you have?

I got started in college when I began making interactive graphics for North by Northwestern. I was a journalism/philosophy/legal studies major, so I can safely say that I had no special degrees or qualifications for data journalism.

The closest formal training I got was an “Introduction to Statistics” course my senior year, which I wish I’d taken earlier. I also had a solid math background for a non-major. The last college math course I took was on advanced linear algebra and multivariable calculus. Not that I’ve used either of those skills in my work just yet.

Did you have any mentors? Who? What were the most important resources they shared with you?

Too many to list. So, here’s just a sample of all the amazing people who I’ve been lucky to consider mentors in the past few years, and one of the many things they’ve all taught me.
Tom Giratikanon showed me that journalists could use programming to tell stories and exposed me to ActionScript and how programming works. Kat Downs taught me not to let the story be overshadowed by design or fancy interaction, and Wilson Andrews showed me how a pro handles making live interactive graphics for election night. Todd Lindeman taught me how to better visualize data and how to really take advantage of Adobe Illustrator. Lakshmi Ketineni and Michelle Chen honed my JavaScript and really taught me SQL and PHP.

Now at ProPublica, my teammates are my mentors. Here is where I learned Ruby on Rails, how news app development really works, and how to handle large databases, first with ActiveRecord and now with Elasticsearch (which I am still working on learning).

What does your personal data journalism “stack” look like? What tools could you not live without?

  • Sublime Text, whose multiple selection feature is the trump card that makes it impossible for me to switch to anything else. If you haven’t used multiple selection, stop what you’re doing and go check it out.
  • The Terminal, for deploying and using Git or just testing out small bits of code in Ruby or Python.
  • Chrome, to debug my code.
  • The Internet, for the answers to all of my questions.

What are the foundational skills that someone needs to practice data journalism?

An insatiable appetite to get to the bottom of something, and the willingness to learn any tool to help you find the answers you’re looking for. In that process, you’ll by necessity learn programming skills, or data analysis skills. Both are important. But without knowing what questions to ask, or what you’re trying to accomplish, neither of those skills will help you.

Where should people who want to learn start?

In terms of programming, just pick a project, make it simple, make it happen and then finish it. Like Jennifer DeWalt did when she made 180 websites in 180 days.

Regarding data analysis, if you’re still in school, take more classes in statistics. If you’re not in school, NICAR offers CAR boot camps, or you can search for materials online, such as this book that teaches statistics to programmers.
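To give a feel for what even introductory statistics buys a reporter, Python’s standard `statistics` module covers the basics: central tendency and spread. A minimal sketch (the response-time numbers are invented for illustration):

```python
import statistics

# Hypothetical 911 response times (minutes) for two neighborhoods.
downtown = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5]
outskirts = [7.8, 9.2, 6.5, 11.0, 8.4, 7.9]

for name, times in (("downtown", downtown), ("outskirts", outskirts)):
    print(name,
          "mean:", round(statistics.mean(times), 2),
          "median:", round(statistics.median(times), 2),
          "stdev:", round(statistics.stdev(times), 2))
```

Comparing mean, median, and standard deviation side by side is often enough to spot the skew or disparity that becomes the story question.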

Where do you turn to keep your skills updated or learn new things?

I don’t have a regular set of websites that I revisit to learn things. I simply figure out what I want to learn, or what problem I’m trying to solve, and use the Internet to find what I need to know.

For example, I’m currently trying to figure out which Javascript library or game engine can best enable me to create newsgames. I started out knowing close to nothing about the subject. Ten minutes of searching later, I had detailed comparisons between game engines, demos and reviews of gaming Javascript libraries, as well as wonderful tips from indie game developers for any rookies looking to get started.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

There are two major pipelines for newsrooms to recruit people with these skills. The first is to recruit journalists who have programming and/or data analysis experience. The second is to recruit programmers or data analysts to come into journalism.

The latter, I think, is much harder than the former, though the Knight-Mozilla OpenNews Fellowship is doing a great job of it. Schools are getting better at teaching students data journalism skills, but not at a high enough rate. I often see open job positions, but I rarely see students or professionals with the right skills and experience unable to find a job.

The lack of students, however, is a problem that starts before college. When high school students are applying for journalism school, they expect to go into print or radio or TV news. They don’t expect to learn how to code, or practice data analysis. I think one of the largest challenges is how to change this expectation at an earlier stage.

All of that said, I do have one wish that I would like journalism schools to fulfill: I wish that no j-school ever reinforces or finds acceptable, actively or passively, the stereotype that journalists are bad at math. All it takes to feed this stereotype is one professor who shrugs off a math error, passing the idea on to his or her students. Let’s be clear: journalists do not come with a math disability.

What data journalism project created by someone else do you most admire?

I actually want to highlight a project called Vax, which was not built by journalists, but deploys the same principles as data journalism and has the same goals of educating the reader.

Vax is a game that teaches students both how epidemics spread, as well as prevention techniques. It was created originally to help students taking a Coursera MOOC on Epidemics really engage with the topic. I think it’s accomplished that in spades. Not only are users hooked right from the beginning, the game allows you to experience for yourself how people are interconnected, and how those who refuse vaccinations affect the process.

How has the environment for doing this kind of work changed in the past five years?

Since I only entered the field three years ago in 2011, all I can say is this: Data journalism is gaining momentum.

Our techniques are becoming more sophisticated and we’re learning from our mistakes. We’re constantly improving, building new tools and making it easier and more accessible to do common tasks. I don’t want to predict anything grand, but I think the environment is only going to get better.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

To me, data journalism has become the umbrella term that includes anyone who works in data, journalism and programming. (And yes, executing functions in Excel or writing SQL queries is both data and programming.)
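That parenthetical is worth taking literally: a GROUP BY query is a small program. A self-contained sketch using Python’s built-in sqlite3 (the contribution records are invented for illustration):

```python
import sqlite3

# Hypothetical campaign-contribution records: (donor, amount).
rows = [("Acme PAC", 5000), ("Acme PAC", 2500), ("Jane Doe", 300)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (donor TEXT, amount INTEGER)")
conn.executemany("INSERT INTO contributions VALUES (?, ?)", rows)

# Aggregate and sort: the donor totals a reporter actually needs.
query = ("SELECT donor, SUM(amount) AS total FROM contributions "
         "GROUP BY donor ORDER BY total DESC")
for donor, total in conn.execute(query):
    print(donor, total)
```

The same query works whether the table holds three rows or three million, which is why SQL sits under the data journalism umbrella alongside Excel.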

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

Philip Meyer, who wrote “Precision Journalism,” answers the first part of this question with his entire book, which I would recommend any aspiring data journalist read immediately. He says:

“Read any of the popular journals of media criticism and you will find a long litany of repeated complaints about modern journalism. It misses important stories, is too dependent on press releases, is easily manipulated by politicians and special interests, and does not communicate what it does know in an effective manner. All of these complaints are justified. Their cause is not so much a lack of energy or talent or dedication to truth, as the critics sometimes imply, but a simple lag in the application of information science — a body of knowledge — to the daunting problems of reporting the news in a time of information overload.”

Data journalism allows journalists to point to the raw data and ask questions, as well as question the very conclusions we are given. It allows us to use social science techniques to illuminate stories that might otherwise be hidden in plain sight.

News apps specifically allow users to search for what’s most relevant to them in a large dataset, and give individual readers the power to discover how a large, national story relates to them. If the story is that doctors have been receiving payments from pharmaceutical companies, news apps let you search to see if your doctor has as well.

What’s the one thing people always get wrong when they talk about data journalism?

That it’s new, or just a phase the journalism industry is going through.

Data journalism has been around since the 1970s (if not earlier), and it is not going to go away, because the skills involved are core to being a better journalist, and to making your story relatable to millions of users online.

Just imagine, if a source told you that 2+2=18, would you believe that statement? The more likely scenario is that you’d question your source about why he or she would say something so blatantly wrong, because you know how to do math, and you know that 2+2=4. Analyzing raw data can result in a similar question to a source, except this time you can ask, “Why does your data say X, but you say Y?”

Isn’t that a core skill every journalist should have?
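Wei’s 2+2=18 scenario translates directly into code. A toy sketch of that kind of cross-check, with the case log and the claimed figure both invented for illustration:

```python
# Hypothetical: an agency claims a 90% case-closure rate,
# but its own raw case log says otherwise.
cases = [
    {"id": 1, "status": "closed"},
    {"id": 2, "status": "open"},
    {"id": 3, "status": "closed"},
    {"id": 4, "status": "open"},
]
claimed_rate = 0.90

closed = sum(1 for case in cases if case["status"] == "closed")
actual_rate = closed / len(cases)

if abs(actual_rate - claimed_rate) > 0.01:
    print(f"Why does your data say {actual_rate:.0%}, "
          f"but you say {claimed_rate:.0%}?")
```

The output is the interview question itself: the data says 50%, the source says 90%.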

Applying data science to all the news that’s fit to print

Over the past year, an important element of my research into data journalism’s past, present and future has been interviews with young journalists like Jeremy Bowers or Dan Hill and (relatively) hoary veteran practitioners like Aron Pilhofer. Their experience has provided invaluable evidence for debugging debates about the topic.

That was certainly the case with Chase Davis, an assistant editor on the Interactive News Desk at the New York Times. I first met Chase in 2011 at the first Newsfoo, in Phoenix, Arizona, where he gave an Ignite talk on three news problems data science can help solve. Davis followed up in 2012 with an Ignite on using data to explore the evolution of data journalism. Both five-minute videos are well worth watching if you’re even remotely interested in journalism and technology. (Davis also open sourced his data journalism Ignite on GitHub, if you’d like to explore that way.)

Today, Davis teaches an advanced data journalism class at Mizzou, where he helps transfer his skills and perspective (treat data as a source). Our interview, lightly edited for clarity and content, with [bracketed] comments and hyperlinks added for context, follows.

What is a day in your life like?

I help supervise the developer/journalists who build many of our cool Web projects. I have a background as a reporter, primarily doing investigations and covering politics, so I try to dabble in that world as well. I also teach a class in advanced data journalism at the Missouri School of Journalism and do some consulting on the side.

How did you get started? Did you get any special degrees or certificates? Quantitative skills?

I got started in data journalism almost by accident. I started learning to program for fun in middle school, then fell in love with journalism and ended up at Mizzou. I lived a typical j-student life for a few years, writing a bunch for the student paper and doing internships, then applied (based on a total misunderstanding) to start working for NICAR. The couple years I spent there really tied those two skillsets together.

Did you have any mentors? Who? What were the most important resources they shared with you?

Too many to list, but I’ll name a few. Jacquee Petchel, Lise Olsen and Mark Katches for schooling me in the ways of capital-J Journalism. Brant Houston and Jeff Porter for taking me in at NICAR and showing me how journalism and data can work together. And, really, the entire IRE and NICAR community, which is outrageously giving of its collective time.

What does your personal data journalism “stack” look like? What tools could you not live without?

I’m pretty minimalist: a terminal window and some type of text editor. The only place I splurge is on a database GUI (I like Navicat). The one tool I couldn’t live without is Python, which is the best Swiss Army knife a data journalist can have.

What are the foundational skills that someone needs to practice data journalism?

The same core skills you need to practice any kind of journalism: curiosity, skepticism, an eye for detail and a sense of a good story. [They] also [need] numeracy, or at least conceptual mathematical literacy, which is still unfortunately too rare. Also important are databases and spreadsheets, statistics, and some kind of programming language — doesn’t matter which one. Being your own worst critic doesn’t hurt. And intellectual courage. You need to be motivated, not intimidated, to learn new and difficult things.

Where do you turn to keep your skills updated or learn new things?

Personal projects. I always have at least one on the backburner, and I make sure it stretches me in a new direction. Working on something I care about is the best way for me to stay motivated. I get bored learning from books.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

The oversimplified explanation is that most journalism students can’t code or do math, while most computer science students don’t know storytelling.

Hybrids on either side are rare, and we’re scooping them up as fast as we can.

Journalism schools could be doing more, but it’s not all their fault. It takes intellectual agility and natural curiosity to effectively develop hybrid skills. I don’t think that’s something we can teach solely through curriculum. That’s why I don’t think every journalism student should “learn how to code.” Being able to write a few lines of JavaScript is great, but if you let your skills dead-end with that, you’re not going to be a great newsroom developer.

Folks on our interactive and graphics teams at the Times have remarkably diverse backgrounds: journalism and computer science, sure, but also cartography, art history, and no college degree at all. What makes them great is that they have an instinct to self-teach and explore.

That’s what journalism schools can encourage: introduce data journalism within the curriculum, then provide a venue for students to tinker and explore. Ideally, someone on faculty should know enough to guide them. The school should show an interest in data journalism work on par with more traditional storytelling.

Oh, and they should require more math classes.

What data journalism project are you the most proud of working on or creating?

Hard question, but I’ll offer up pretty much anything that my old team at the Center for Investigative Reporting has done. That was my first turn at being a boss, and the fact that they haven’t all been fired suggests that I didn’t mess them up too bad.

What data journalism project created by someone else do you most admire?

Look at the Philip Meyer Awards every year and you pretty much have that answer. Anyone who can take a spreadsheet full of rows and columns, or a bunch of code, and turn it into something that changes (or starts) the conversation about an important topic is the whole reason many of us got into this game in the first place.

How has the environment for doing this kind of work changed in the past five years?

It’s night and day. Five years ago, this kind of thing was still seen in a lot of places at best as a curiosity, and at worst as something threatening or frivolous. Some newsrooms got it, but most data journalists I knew still had to beg, borrow and steal for simple things like access to servers.

Solid programming practices were unheard of — version control? What’s that? If newsroom developers today saw Matt Waite’s code when he first launched PolitiFact, their faces would melt like “Raiders of the Lost Ark.”

Now, our team at the Times runs dozens of servers. Being able to code is table stakes. Reporters are talking about machine-frickin’-learning, and newsroom devs are inventing pieces of software that power huge chunks of the web. The game done changed.

What’s different about practicing data journalism today, versus 10 years ago?

It was actually 10 years ago that I first got into data journalism, which makes me feel old even though I’m not.

Back then, data journalism was mostly seen as doing analyses for stories. Great stories, for sure, but interactives and data visualizations were more rare.

Now, data journalism is much more of a big-tent specialty. Data journalists report and write, craft interactives and visualizations, develop storytelling platforms, run predictive models, build open source software, and much, much more. The pace has really picked up, which is why self-teaching is so important.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

I don’t think the semantics are important. Journalism is journalism. It should be defined on its own merits, not by the tools we use to accomplish it. Treating these things as exotic specialties makes it too easy to pigeonhole the people who practice them. And I hate that.

What’s the one thing people always get wrong when they talk about data journalism?

That data journalists are unicorns.

Or wizards. Or that they can somehow pull swords from stones in a way that mere laypeople can’t. That kind of attitude is dangerous — not because it mythologizes tech skills, or demonstrates willful ignorance on the part of technophobes (both of which are sad), but because it drives a cultural wedge between data journalists and the rest of the newsroom.

[Imagine hearing] “I’m a conventional reporter, so my specialty is reporting. You’re a tech person, so you write code.”

I think that’s crap. I know plenty of reporters who can code, and plenty of data journalists who can report the hell out of a good story. By dividing them culturally, we almost let people see the “journalist” in “data journalist” as secondary. We turn them into specialists, rather than letting them bring journalism and technology together in new and creative ways.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

Numeracy is important. A more universal appreciation of technology in our industry is important. A culture of rapid, constant experimentation is important. To the extent that data journalism has encouraged those things in newsrooms, I think it’s been hugely important.

The actual product of data journalism — news apps, visualizations, stories — those will all continue to evolve, but data journalism’s continuing contribution to newsroom culture is something that I hope is permanent.

Oakland Police Beat applies data-driven investigative journalism in California

One of the explicit connections I’ve made over the years lies between data-driven investigative journalism and government or corporate accountability. In debugging the backlash to data journalism, I highlighted the work of The Los Angeles Times Data Desk, which has analyzed government performance data for accountability, among other notable projects. I could also have pointed to the Chicago Sun-Times, which applied data-driven investigative methods to determine that the City of Chicago’s 911 dispatch times vary widely depending on where you live, publishing an interactive map online for context, or to a Pulitzer Prize-winning story on speeding cops in Florida.

This week, there’s a new experiment in applying data journalism  to local government accountability in Oakland, California, where the Oakland Police Beat has gone online. The nonprofit website, which is part of Oakland Local and The Center for Media Change and funded by The Ethics and Excellence in Journalism Foundation and The Fund for Investigative Journalism, was co-founded by Susan Mernit and Abraham Hyatt, the former managing editor of ReadWrite. (Disclosure: Hyatt edited my posts there.)

Oakland Police Beat is squarely aimed at shining sunlight on the practices of Oakland’s law enforcement officers. Their first story out of the gate pulled no punches, finding that Oakland’s most decorated officers were responsible for a high number of brutality lawsuits and shootings.

The site also demonstrated two important practices that deserve to become standard in data journalism: explaining the methodology behind their analysis, including source notes, and (eventually) publishing the data behind the investigation. 

To learn more about why Oakland Police Beat did that, how they’ve approached their work and what the long game is, I contacted Hyatt. Our interview follows, lightly edited and hyperlinked for context. Any [bracketed] comments are my own.

So, what exactly did you launch? What’s the goal?

Hyatt: We launched a news site and a database with 25 years’ worth of data about individual Oakland Police Department (OPD) officers who have been involved in shootings and misconduct lawsuits.

Oakland journalists usually focus (and rightfully so) on the city’s violent crime rate and the latest problems with the OPD. We started this project by asking if we could create a comprehensive picture of the officers with the most violent behavior, the kind of conduct that explains why the OPD is where it is today. We started requesting records and tracking down information. That eventually became the database. It’s the first time anyone in Oakland has created a resource like this.

What makes this “data-driven journalism?”

Hyatt: We started with the data and let it guide the course of the entire project. The stories we’ve written all came from the data.

Why is sharing the data behind the work important?

Hyatt: Sharing is critical. Sharing, not traffic, is the metric I’m using to gauge our success, although traffic certainly is fun to watch, too. That’s the main reason that we’re allowing people to download all of our data. (The settlement database will be available for download next week.)

How will journalists, activists, and data nerds use it over time? That’s going to be the indicator of how important this work was.

[Like ProPublica, Oakland Police Beat is encouraging reuse. The site says that "You’re welcome to republish our stories and use our data for free. We publish our stories under an Attribution-NonCommercial-ShareAlike 4.0 License."]

Where do you get the data?

Hyatt: All of it came from city and court documents. Some of it came as .CSV files, some as PDFs that we had to scrape.

How much time and effort did it take to ingest, clean, structure and present?

Hyatt: Almost all of the court docs had to be human-read. It was a laborious process of digging to find officer names and what the allegations were. Combining city settlement data records and court docs took close to five months. Then, we discovered that the city’s data had flaws and that took another couple of months to resolve.

Some of the data was surprisingly easy to get. I didn’t expect the City Attorney’s office to be so forthcoming with information. Other stuff was surprisingly difficult. The OPD refused to give us awards data before 2007. They claim that they didn’t keep that data on individual officers before then. I know that’s completely false, but we’re a tiny project. We don’t have the resources to take them to court over it. Our tools were very simple.
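The months-long matching Hyatt describes, reconciling officer names across court documents and city settlement records, usually comes down to building a normalization key for each name. A minimal sketch of the idea (the names are invented; real record linkage still needs manual review on top):

```python
from collections import defaultdict

def name_key(name):
    # Strip punctuation, lowercase, and sort the name parts so that
    # "Smith, John" and "john  smith" collapse to the same key.
    parts = name.replace(",", " ").replace(".", " ").lower().split()
    return " ".join(sorted(parts))

court_docs = ["Smith, John", "DOE, JANE", "john  smith"]
settlements = ["John Smith", "Jane Doe"]

# Tally how often each normalized name appears in each source.
merged = defaultdict(lambda: {"court": 0, "settlement": 0})
for name in court_docs:
    merged[name_key(name)]["court"] += 1
for name in settlements:
    merged[name_key(name)]["settlement"] += 1

for key, hits in sorted(merged.items()):
    print(key, hits)
```

Sorting the name parts is a blunt instrument (it conflates people who share the same words in their names), which is part of why the human-reading step Hyatt mentions cannot be skipped.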

Did you pay for it?

Hyatt: We used PACER a ton. The bill was close to $900 by the time we were done. We mainly worked out of spreadsheets. I had a handful of command line tools that I used to clean and process data. I ran a virtual machine so that I could use some Linux-based tools as well. I heart Open Refine. We experimented with using Git for version control on stories we were writing.

A used chemical agent grenade found on the streets in downtown Oakland following Occupy demonstrations in 2011. Photo by Eric K Arnold.

Will you be publishing data and methodology as you go along?

Hyatt: The methodology post covers all of our stories. We’ll continue to publish stories, as well as some data sets that we got along the way that we decided not to put into our main dataset, like several hundred city attorney reports about the settled cases.

What’s the funding or revenue model for the site? Where will this be in one year? Or 5?

Hyatt: Everyone wants grant-funded journalism startups to be sustainable, but, so often, they start strong and then peter out when resources run dry.

Instead of following that model, I knew from the start that this was going to be a phased project. We had some great grants that got us started, but I didn’t know what the funding picture was going to look like once we started running stories. So, I tried to turn that limitation into a strength.

We’re publishing eight weeks worth of stories and data. We’re going to cram as much awesome into those weeks as we can and then, if needed, we can step away and let this project stand on its own.

With that said, we’re already looking for funding for a second phase (which will focus on teens and the OPD). When we get it, we’ll use this current data as a springboard for Phase 2.

Could this approach be extended to other cities?

Hyatt: The OPD and its problems are pretty unique in the USA. This was successful because there was so much stuff to work with in Oakland. I don’t think our mentality for creating and building this project was unique.

Profile of the Data Journalist: Dan Hill

Part of my research into data journalism’s past, present and future has been interviews with veteran practitioners like Aron Pilhofer, given the insight that those talks offer for debugging debates about “what it all means,” and younger journalists like Jeremy Bowers or Dan Hill. Their recent paths to the profession should offer insight and inspiration to others who would follow in their footsteps.

Hill was kind enough to discuss his work with me this spring. Our interview follows, lightly edited for clarity, content and hyperlinked for context.

Where do you work now? What is a day in your life like?

I joined The Texas Tribune as a full-time news apps developer in January. Our team is responsible for both larger-scale “explorer” apps and what I’d call “daily interactives.” My day often involves writing and processing public information requests, designing interactives and working on Django apps, depending on the scale of my project.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you start with?

I’ve always wanted to be a reporter, but the work of Phillip Reese at The Sacramento Bee and The Chicago Tribune’s news apps team inspired me to enhance my storytelling with data. I was a student fellow for the Northwestern University Knight Lab and studied journalism and computer science, but an internship with The Washington Post taught me how to apply what I was learning in a newsroom.

Did you have any mentors? Who? What were the most important resources they shared with you?

I’ve had awesome mentors. Bobby Calvan and Josh Freedom du Lac were the first to treat me like a real reporter. Jon Marshall helped me explore my interests. Phillip Reese showed me how to find untold stories in spreadsheets and Brian Boyer encouraged me to learn Python. Serdar Tumgoren and Jeremy Bowers showed me how a team of news developers operates. Travis Swicegood taught me how to deal with real world data.

My mentors remind me to always be learning and asking questions.

What does your personal data journalism “stack” look like? What tools could you not live without?

I use Excel, OpenOffice, Google Docs, Django and iPython notebooks for data analysis. R is creeping into my workflow for exploring datasets and experimenting with visualizations. We use d3 and chartjs for web graphics and Mapbox for web maps. I could probably survive without Backbone, but we use it a lot.

What are the foundational skills that someone needs to practice data journalism?

I think a data journalist needs news judgment and attention to detail in order to identify the newsworthiness and limitations of datasets.

Statistics can help explain a dataset’s strengths and weaknesses, so I wish I’d paid more attention during my stats classes in school.

In addition to finding the stories, data journalists also need to be able to explain why data is significant to their audience, so visual journalists need design skills — and, of course, reporting and writing.

Where do you turn to keep your skills updated or learn new things?

I check Source, the Northwestern Knight Lab blog and the NICAR listserv for new ideas. Lately, I’ve been teaching myself statistics and R with r-tutor and Machine Learning for Hackers.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

I think the differences between developer and newsroom cultures make it hard for newsrooms to find people with both tech and journalism skills, and to coordinate projects between developers and reporters.
As a student in journalism school, I was inspired to learn more about data when professor Darnell Little showed how it could enhance my reporting and help me find stories hidden in datasets.
I learned more developer-journalist skills like database management and web design from meetups, tutorials and classes outside the j-school, but the journalism school exposed me to what journalists with those skills could do.
I’ll add that I’m impressed with the data literacy of the Texas Tribune newsroom, where reporters request spreadsheets and use data to verify claims on their beats. Even if reporters don’t have the programming chops to make an interactive graphic, for example, they’re great about identifying potential data stories.

What data journalism project are you the most proud of working on or creating?

My summer intern project at The Washington Post, a study of every Washington, D.C. homicide case between 2000 and 2011, was my first experience making a news app in a newsroom. I was honored to get to work with the investigative reporters as a newbie intern and learned a ton from building the database and doing analysis with Serdar. All of my contributions were on the back end, but I was thrilled to work with that dataset as an intern.

What data journalism project created by someone else do you most admire?

ProPublica’s Message Machine was my favorite project from the 2012 presidential election, because it took a unique approach to identifying trends in email metadata.
I’m excited for more stories that collect everyday metadata or use sensors to explore the data around us.

How has the environment for doing this kind of work changed in the past five years?

I’d never heard of a “news apps team” five years ago. I knew I wanted to be an investigative reporter but never thought I would write code every day. I admired reporters like Phillip Reese who were working with data and making interactive graphics, but I didn’t see as many teams of specialized developer-journalists.

What’s different about practicing data journalism today, versus 10 years ago?

I wasn’t even a teenager 10 years ago, but I would venture… THE INTERNET. Online data portals, open government and open Web tools are important to the data journalism I do. I’m not sure they were as common a decade ago.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

I think of “data journalism” as an umbrella term that refers to the use of data in reporting or presentation, whereas I think of CAR and computational journalism as subsets of data journalism that involve analyzing a dataset.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

I’m excited to work with data because of its widespread use in decision making. I think news apps can help people understand meaningful data and uphold accountability for people who create and make decisions with data.
Be A Newsnerd has better answers.

What’s the one thing people always get wrong when they talk about data journalism?

Although the web plays a big role in the growth of data journalism, I don’t think you need to be online to do data journalism.


Profile of the Data Journalist: Jeremy Bowers


As in 2012, when I published a series of profiles of data journalists at Radar, I’ve conducted a series of email interviews after the annual conference of the National Institute for Computer-Assisted Reporting (NICAR). They’re part of my research into data journalism’s past, present and future, helping me to debug debates about “what it all means.”

The following interview is with Jeremy Bowers, a news application developer at NPR. (He also knows a lot about how the Internet works.) It has been lightly edited for clarity and content, and hyperlinked for context.

Where do you work now? What is a day in your life like?

I work on the Visuals team at NPR.

Our team adheres to modified scrum principles. We have a daily scrum at 10:00am for all of the members of our Visuals team. We work on projects in one-week sprints. At the end of each sprint we have an iteration review and a “ticket shuffle” where we decide what tickets each of us should work on in the next sprint. Our typical projects rarely exceed four sprint cycles.

Our projects involve at least four people: one developer, one designer, one project manager and one stakeholder. Some projects add more designers or developers as necessary. And sometimes we have a lot of stakeholders.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you start with?

I started in data journalism at the St. Petersburg Times. I’d been working as the blog administrator for our “online” team and was informally recruited by Matt Waite to help out with a project that would turn into MugShots.

I have no special degrees or certificates. I was a political science major and I had planned to go to law school before a mediocre LSAT performance made me rethink my priorities.

I did have a background in server administration and was really familiar with Linux because of a few semesters spent hacking with a good friend in college, so that’s been pretty helpful.

Did you have any mentors? Who? What were the most important resources they shared with you?

Matt Waite from the St. Petersburg Times got me started in data journalism and has been my mentor for as long as I can remember. I don’t call him as much anymore now that he’s “Professor Waite,” but I still use a lot of our conversations as guidelines even now.

I also owe a debt to Derek Willis, though we’ve never worked together. I’m pretty much daily inspired by Ben Welsh and Ken Schwenke at the Los Angeles Times. They build apps that matter and make me look critically at what I am building. Finally, my co-worker Chris Groskopf is a great source of inspiration about keeping my code and work habits clean and professional.

What does your personal data journalism “stack” look like? What tools could you not live without?

I live and die with three tools.

First, a terminal emulator. Right now, I’m using iTerm2 with a customized Solarized theme.

Second, a text editor. Right now, I’m using Sublime Text 3 with a ton of installed packages.

Finally, I need a Web browser. Right now, I’m using Chrome.

At NPR Visuals, we’ve documented our stack so that anyone can code like we do.

What are the foundational skills that someone needs to practice data journalism? Where do you turn to keep your skills updated or learn new things?

I think the foundational skill for data journalism is curiosity. Rabid, all-consuming curiosity has done more for my career than any particular skill or talent.

I also think that good data journalists are passionate about their projects. Unlike many tech or newsroom jobs, it’s difficult to punch a clock and work 9-to-5 as a data journalist. I’m constantly pulling out my laptop and hacking on something, even when it’s not directly tied to a work project.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

The easy answer would be to say that there aren’t enough data journalists to go around. But that’s not exactly true. With Knight and Google fellowships, recent college graduates, interns, and proto-news-nerds miscast in other roles, media companies are surrounded by possibilities. Our challenge, as I see it, is building an environment where hackers and the hacker ethic can thrive. And that’s a tough thing to do at any large company, let alone a media company. But we’ve got to make that our personal mission and not be confounded by what feels like an impersonal bureaucracy.

What data journalism project are you the most proud of working on or creating?

Without a doubt, PolitiFact is the most exciting project I’ve worked on. I also really enjoyed working on the Arrested Development app for NPR — so much so, that I binge-watched the fourth season and coded up the jokes over 24 hours the day the episodes were released!

What data journalism project created by someone else do you most admire?

I love everything about the Los Angeles Times’s data desk. Their homicide and crime apps are nothing short of miraculous.

How has the environment for doing this kind of work changed in the past five years?

I released my first news app to the wild in April of 2009. At that time, there were only a handful of groups that I knew of writing code in newsrooms — the New York Times, the Washington Post, the Chicago Tribune, and Ben Welsh at the Los Angeles Times. About 300 people attended NICAR that year, and it was focused on print CAR reporting.

This year, NICAR hosted 997 people and had well-attended sessions on JavaScript and D3, tools that basically only work on the Web. There are probably 20 teams writing code in newsrooms today, and there are entire college classes dedicated to producing hacker journalists. So the environment has gotten much richer and larger in the last five years.

What’s different about practicing data journalism today, versus 10 years ago?

I can’t speak generally since I only started doing “data journalism” about five years ago. I hesitate to argue that what I’m currently doing is really data journalism as opposed to newsroom product development.

So as not to cheat you out of an answer, I can say that my job now involves much more rigor. When I first started writing code at the St. Petersburg Times, we didn’t use version control. We didn’t sandbox our code. We didn’t have automated deployment tools. These days, we have GitHub to store our code, tons of command-line tricks to keep our code in separate virtual environments, and we have fantastic deployment tools that make updating our code a snap.

Additionally, my organization is much more aware of what I’m doing. My manager and his managers are much more cognizant of data journalism generally and specifically about how our work fits in with the organization’s strategy. When I started, we basically worked invisibly on products that almost nobody really knew or cared about.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

I’m terrible at semantic differences. I’ll take the broad view on this one: If you’re writing code in a newsroom, you’re probably committing acts of journalism. I don’t feel terribly strongly about what we decide to call this or how we decide to slice up what an investigative journalist, a news librarian or a news apps developer might be doing every day. If they’re writing code and making journalism, I want them to have every opportunity to succeed. I don’t feel any need to give them labels or have their titles prevent them from writing code to get their jobs done.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

I think that Scott Klein said it best, and I’ll paraphrase: If you’re not using algorithmic or computational methods to analyze data, someone is scooping you on your beat.

There’s hardly a beat in journalism anymore that doesn’t involve structured data, which means that there’s hardly a journalist that wouldn’t benefit from automated methods for analyzing that data. Folks who are passing up that aspect of their jobs are just handing that opportunity over to someone else.
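As a hedged illustration of what an “automated method” on a beat might look like (the agencies and numbers below are invented), a short script can flag unusual values in a recurring data release so a reporter knows where to start asking questions:

```python
import statistics

# Hypothetical monthly overtime totals from a recurring public-records
# release on a city-hall beat; the figures are invented for illustration.
overtime_hours = {
    "Police": [410, 395, 402, 388, 905],   # latest month spikes
    "Fire": [310, 305, 298, 320, 312],
    "Parks": [45, 52, 40, 48, 51],
}

def flag_spikes(series, sigmas=2.0):
    """Return True if the latest value sits more than `sigmas`
    standard deviations above the mean of the earlier values."""
    history, latest = series[:-1], series[-1]
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return latest > mean + sigmas * stdev

leads = [dept for dept, series in overtime_hours.items()
         if flag_spikes(series)]
print(leads)  # departments worth a phone call
```

A spike is a tip, not a story; the script only tells you which records request or interview to pursue next.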

Print and radio are so time- and space-limited. If you’re not using the Web to tell all of the stories rather than just one story, you’re probably doing that wrong as well.

What’s the one thing people always get wrong when they talk about data journalism?

I don’t like it when people talk about how “organizations don’t get” data journalism, and I don’t like it for a very specific reason: The inability to create a news hacker culture doesn’t rest on the shoulders of some amorphous “organization.”

We should place that blame where it belongs: Squarely on the shoulders of individuals in your newsroom.

What we’ve got is a people problem. Editors and other newsroom opinion leaders should be making an environment for their reporters or others to participate in hacker journalism.

The same ethics that Eric Raymond elucidated in the Hacker How-To should guide journalists in newsrooms everywhere:

  • The world is full of fascinating problems waiting to be solved.
  • No problem should ever have to be solved twice.
  • Boredom and drudgery are evil.
  • Freedom is good.
  • Attitude is no substitute for competence.

To make your organization a place where hacker ethics are practiced requires positive action — it won’t just spring into being because of a memo. So, don’t blame your company because there’s no room to operate like a hacker. Instead, blame your boss or your boss’s boss. It’s most effective when you discuss this with them personally. But make sure you give those people an opportunity to correct their wrongs. Few people are actually hostile to the hacker ethic; most are just unfamiliar.


Profile of the Data Journalist: Serdar Tumgoren


As in 2012, when I published a series of profiles of data journalists at Radar, I’ve conducted a series of email interviews after the annual conference of the National Institute for Computer-Assisted Reporting (NICAR). They’re part of my research into data journalism’s past, present and future, helping me to debug debates about “what it all means.”

The second interview, with Serdar Tumgoren, co-creator of the Knight News Challenge-funded OpenElections project, follows. It has been lightly edited for clarity and content, and hyperlinked for context.

Where do you work now? What is a day in your life like?

The Associated Press. As a member of the relatively new Data Team, it’s a pretty mixed bag: devops/server work; newsroom tool-building, such as our in-house install of DocumentCloud; data wrangling and analysis for sundry investigative and spot stories.

I also help reporters hunt down and navigate data sources for stories, and help them apply budding technical skills when I don’t have the time to get involved on a project myself.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you start with?

I started as a print reporter but was bitten early on by the investigative journalism bug. The document chase quickly broadened to include data, and led me down a traditional “CAR path” of spreadsheets to databases to programming languages and web development. When I first started programming around 2005, I took a Perl class at a community college. My grade in that class shall remain hidden under lock and key.

…but seriously, you don’t need a computer science degree to master the various skills of “data journalism.” I learned how to apply technology to journalism through lots of late-night hacking, tons of programming books, and the limitless generosity of NICARians who shared technical advice, provided moral support, and taught classes at NICAR conferences.

Did you have any mentors? Who? What were the most important resources they shared with you?

Dave Sheingold at The (Bergen) Record. Derek Willis and Aron Pilhofer at NYT. Troy Thibodeaux at the AP.

Unbiased technical knowledge and advice that always brought the focus back to journalism. It’s easy to get obsessed with the tech side, something Phil Meyer warned us about.

What does your personal data journalism “stack” look like? What tools could you not live without?

Python, Ruby, Linux, spreadsheets and databases, QGIS, and myriad command-line tools for wrangling data.

If I could only keep one tool (and all of its libraries), I’d have to say Python. Nowadays, it can handle most everything you’d need, from general data wrangling to analysis to visualization to Web frameworks.

Ruby is a solid alternative, so I’m not looking to start any flame wars here. In my case, I’m still partial to Python because it’s the first programming language in which I gained a degree of fluency.
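A minimal sketch of that “wrangling to analysis” span in plain Python, using invented records, might look like this (the agency names and grant amounts are made up, but the inconsistency is true to life):

```python
# A minimal sketch of "wrangling to analysis" in plain Python.
# Real data is rarely this small, but it is often this inconsistent.
raw_records = [
    {"agency": " Health ", "grant": "$1,200,000"},
    {"agency": "health",   "grant": "950000"},
    {"agency": "Transit",  "grant": "$2,450,000 "},
]

def clean(record):
    """Normalize agency names and coerce dollar strings to integers."""
    return {
        "agency": record["agency"].strip().title(),
        "grant": int(record["grant"].strip().lstrip("$").replace(",", "")),
    }

cleaned = [clean(r) for r in raw_records]

# Analysis step: total grants per (now-normalized) agency.
totals = {}
for r in cleaned:
    totals[r["agency"]] = totals.get(r["agency"], 0) + r["grant"]

print(totals)
```

From here the same script could feed a chart or a Web framework, which is the “most everything” point: one language carries the data from raw file to publishable output.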

What are the foundational skills that someone needs to practice data journalism?

Same as a good investigative reporter: Curiosity and doggedness.

Mastering a programming language requires the same curiosity and persistence as unravelling a bureaucratic maze. You have to be willing to put in the hours and not give up when you hit a dead end.

Where do you turn to keep your skills updated or learn new things?

IRE/NICAR, PythonJournos, colleagues past and present.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

Sometimes, news organizations see us as an amorphous group of nerds who can be plugged into an org chart, and out will come Data Journalism! Unfortunately, it’s not that simple.

My skill set is wildly different from that of the next “data journalist.” I think schools are making a better effort to train young journalists in many of the skills that fall under the umbrella of data journalism: data wrangling, analysis, visualization; statistics; digital literacy (how does the Web work?); Web development.

But very few journalists are actually masters of all of these skills. (I can’t name one.) The real question is, do news organizations know what they want or need? If you understand the goals of your organization, you can go out and find the right kinds of nerds. Otherwise, you’re hiring in the dark.

What data journalism project are you the most proud of working on or creating?

Rebuilding the elections data system for The Washington Post’s Web operation. It was a monumental — and some would say foolhardy — undertaking, but we ultimately created a system that helped power many Web and mobile interactives/graphics during the 2012 primaries and presidential election.

What data journalism project created by someone else do you most admire?

Congress Votes, an app at The Washington Post created by Derek Willis. This is the first big political news app (that I’m aware of) that tried to provide an interactive experience of such a high-profile government data set, with a high degree of back-end automation. It also tried to have a little fun. (Zodiac signs for Congress critters, anyone?)

It inspired many of us to start thinking about how we could be more creative and engaging on the web with government data. While we’ve seen many advances in the years since, I think Congress Votes stands out as a milestone in the history of news apps development.

How has the environment for doing this kind of work changed in the past five years?

The tools and knowledge have exploded. Powerful open source tools are increasingly available, along with countless free books and tutorials online. Cloud computing platforms are providing cheap or free ways to experiment with data tools. It’s had a massive democratizing effect, and that’s a good thing.

What’s different about practicing data journalism today, versus 10 years ago?

There are way more nerds at NICAR conferences. Seriously, the tent has grown bigger to include programmers, Web developers, data scientists (I’ll leave it to others to debate whether that’s a new name for stats geeks), and sundry other nerds.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

Yes. Ultimately, we’re trying to marshal technology to bring context to people’s lives. The tools and methods and specialties evolve, but the goal remains the same: Keep the public informed.

Why are data journalism and news apps important, in the context of the contemporary digital environment for information?

Data journalism is vital, because we’re confronted with a growing ocean of information in electronic form.

A data journalist is anyone, in my book, who can fluently work with this primary source. It’s the same as a traditional reporter, who should know how to hunt down human sources and interview them.

News apps are harder to pin down, and I often find folks don’t have quite the same definition. If we’re talking about telling data-driven stories in a digital context (mobile, Web), well, then yes, I’d say news apps are an important piece of the puzzle for informing people in a different medium.

What’s the one thing people always get wrong when they talk about data journalism?

That it’s new. Ben Welsh and “KRS One” summed that one up quite nicely at NICAR 2014.

Research, Tips & Tutorials

2014 NICAR conference highlights data journalism’s past, present and future

4

When I first attended the annual conference of the National Institute for Computer-Assisted Reporting (NICAR) in 2012, it was as a speaker. I was there to give a short talk about new data coming from the open government movement. While it went well, and, somewhat remarkably, the presentation I used has over 10,000 views on SlideShare, I feel somewhat embarrassed about it in retrospect, given what I learned about the NICAR community in 2012 and in the years since. This tribe of journalists is the best at using technology to support investigative journalism in the United States. They know, perhaps better than anyone else in the country, exactly how much data is available, where, and of what quality and format. I figured out quite quickly that I’d be learning far more from them than the inverse.

The NICAR community is also generous: in 2012, conference attendees were quite kind about what I had to say. Thankfully, my host, David Herzog, an associate professor at the Missouri School of Journalism and academic advisor to NICAR, was happy about the presentation and discussion that followed. As he told me afterwards, “Gov 2.0, FOSS and agile development are breathing new life into data and journalism.” To the extent that I explained that relationship at NICAR or the international open data conference later in 2012, I’m happy.

The silicon lining from attending the 2012 NICAR conference was that it inspired me to write a feature story about the importance of data journalism to society and to publish a series of profiles of the people practicing it, research that led eventually to my fellowship at the Tow Center. In the years since, I’ve continued investigating data journalism, a focus that brought me back to the 2014 NICAR conference in Baltimore, Maryland.

What I found at the Inner Harbor Marriott was even more breathtaking than the huge crab cake I devoured at lunch at Faidley’s, up in Lexington Market: nearly a thousand data journalists from 20 countries gathered to network, teach and learn from one another. The NICAR conference has more than tripled in size since 2009.

“I released my first news app to the wild in April of 2009,” said NPR developer Jeremy Bowers, in an interview. “At that time, there were only a handful of groups that I knew of writing code in newsrooms — the New York Times, the Washington Post, the Chicago Tribune, and Ben Welsh at the Los Angeles Times. About 300 people attended NICAR that year, and it was focused on print CAR reporting. This year, NICAR hosted 950 people and had well-attended sessions on JavaScript and D3, tools that basically only work on the Web. There are probably 20 teams writing code in newsrooms today, and there are entire college classes dedicated to producing hacker journalists. So the environment has gotten much richer and larger in the last five years.”

As in 2012, I left feeling exhilarated, even though I couldn’t attend all four days of the conference. Everywhere I looked, there were journalists talking about how they had used tools, applied programming, told better stories or built platforms, or were teaching others to do those things. Instead of chewing over the future of news, they were building it. The conference is unquestionably the best forum in the world to tap into the zeitgeist around data journalism and learn from people practicing it at all levels.

“What’s different about practicing data journalism today, versus 10 or 20 years ago,” said Matt Waite, a journalism professor at the University of Nebraska and long-time NICAR attendee, “was that from the early 1990s to mid 2000s, the tools didn’t really change all that much. The big change was we switched from FoxPro to Access for databases. Around 2000, with the Census, more people got into GIS. But really, the tools and techniques were pretty confined to that toolchain: spreadsheet, database, GIS. Now you can do really, really sophisticated data journalism and never leave Python. There’s so many tools now to do the job that it’s really expanding the universe of ideas and possibilities in ways that just didn’t happen in the early days.”

Sadly, there is no way for anyone to attend all of the 150 or so panels or workshops at NICAR 14. Fortunately, the conference, its attendees and media covering it all have done a great job of putting resources online, benefitting conference goers and people who couldn’t make it alike.

From the Investigative Reporters and Editors conference blog, tipsheets and session audio, to Northwestern University’s Knight Lab students’ coverage, to New York Times developer advocate Chrys Wu’s extraordinarily comprehensive wrap-up of NICAR 14 slides, links and tools, there’s enough material to occupy you for days.

Freed of the weight of doing a roundup by their hard work, I’ll highlight five themes that resonated with me: security, hacks and hackers, sensors and drones, culture and collaboration, and context and history.

Security

One of the positive effects of months of stories catalyzed by the leaks of Edward Snowden, the former NSA contractor, is that many more journalists are not only thinking about security but taking steps to improve their operational practices, encrypting their email, chats and mobile devices. Whether we like it or not, this is the world we live and work in when so much of media and its means of production have gone digital.

The Tow Center has hosted workshops, shared lessons on digital security for journalists and will be publishing a white paper on the topic later this year. Two of my colleagues at the Tow Center attended NICAR to give talks on the subject.

Susan McGregor collaborated with Jennifer Valentino-DeVries, Josh Meyer, and Chris Doten on a session focused on protecting your work from prying eyes. While audio and slides aren’t available online yet, their guidance will be helpful to any journalist. Until then, the Committee to Protect Journalists’ security guide is a fantastic resource.

I wasn’t able to watch Jonathan Stray’s presentation on threat modeling in person. Thankfully, Stray’s talk was recorded and live online within days. I think it’s one of the best presentations on operational security for journalists — or, for that matter, anyone traveling and working abroad in the 21st century — I’ve ever seen. This isn’t just a list of things to buy, install or learn: it’s a way of thinking about staying secure, based upon threat modeling in computer security. I’d recommend it to anyone creating and publishing acts of journalism in the digital age.

Hacks and hackers

Last week, news came in that the U.S. Department of Justice had moved to dismiss charges filed against reporter Barrett Brown based upon him sharing a hyperlink to a forum where stolen personally identifiable information existed. If a court had criminalized that action, it would have had a profound chilling effect upon press freedom online.

That doesn’t mean that journalists are now free to pursue hacking in the newsroom or beyond, as a panel at NICAR reminded a packed room of attendees. In fact, the opposite is true.

As members of the panel explained, journalists need to be thoughtful about how and when they access and publish data, particularly with respect to scraping, a fundamental, useful tool in the data journalist’s toolkit. The very same activity that supports a Pulitzer Prize-winning investigation at a newspaper might land a freelancer in hot water with an overzealous U.S. Attorney. For instance, Wall Street Journal reporter Jennifer Valentino-DeVries described how careful her team was in collecting the data that informed this story on differential pricing at Staples stores.

Simply put, if you’re interested in data journalism or civic hacking, you need to know about the Computer Fraud and Abuse Act (CFAA), along with proposals for its reform, and think about when it’s appropriate to access data or not. If you’re unsure and don’t have the legal resources of a major media organization behind you, think twice or thrice before clicking.
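For readers new to scraping, here is a deliberately self-contained sketch using only Python’s standard library. It parses an inline HTML fragment (invented for the example) rather than fetching a live site, since, as the panel stressed, how and when you access data matters as much as how you parse it:

```python
from html.parser import HTMLParser

# An invented fragment of the kind of HTML table a scraper targets.
HTML = """
<table>
  <tr><td>Case 14-001</td><td>Open</td></tr>
  <tr><td>Case 14-002</td><td>Closed</td></tr>
</table>
"""

class CellExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
        elif tag == "tr":
            self._row = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

parser = CellExtractor()
parser.feed(HTML)
print(parser.rows)
```

In a real project the fragment would come from an HTTP request, which is exactly where the legal and ethical questions above come in: check a site’s terms, throttle your requests, and get advice before scraping anything sensitive.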

Sensors and drones

File homebrew hardware, “mood cubes” and drone journalism under the shiny and new part of NICAR, focused on “what’s next,” but there was a lot to like about the creativity on display this year, and enough substance regarding early usage that these are trends and techniques to watch.

WNYC data news editor John Keefe teamed up with Liza Stark, an adjunct professor at the Parsons School of Design, to demonstrate various DIY open source hardware projects and “Team Blinky,” giving the NICAR attendees ideas for discovering data in the world around us.

It was great, interactive fun, but what made it special (to this observer) was Keefe’s remark that WNYC’s Cicada Tracker project turned up some 8,000 cicada sightings, with 800 people making trackers. The project, which gathered 1,500 temperature readings from a distributed listening audience, made crowdsourced data collection through sensor journalism a reality, not just a theory. For more of a deep dive into the topic, check out the proceedings of the sensor journalism workshop at the Tow Center last year. “Making your own data” came up again in the previous day’s NICAR session, with respect to the use of drones and open source hardware.

 

To be fair, the presenter, University of Nebraska professor Matt Waite, flew a drone at the 2012 NICAR conference I attended as well, but the context is different.

If you haven’t been paying attention, 2013 was a breakout year for drone journalism, as my colleague Fergus Pitt noted here at the Tow Center, from disasters to protests.

Waite pointed to an upcoming project in Kenya as a new example.

This year, and in the years ahead, drones could be used to do more than film events, Waite suggested: they could collect data.

 

As he has been doing for years, however, Waite cautioned NICAR attendees excited about becoming pilots to learn about the regulations that govern their use and not to be stupid in the absence of clear guidance from the Federal Aviation Administration.

Context and history

If there’s one lightning talk you watch from this year’s NICAR, make it L.A. Times data desk editor Ben Welsh’s reflection on the history of the craft.

At over eighteen minutes, the video is about three times as long as the average lightning talk, but every minute is worth your time. Welsh reminded the audience of all of the history that has gone before, from the muckrakers at the turn of the last century to decades of journalists learning how to apply steady and then rapidly evolving generations of technologies to journalism.

The one thing people always get wrong about data journalism is that it’s new, said Serdar Tumgoren, a data journalist at the Associated Press, in an interview. “Ultimately, we’re trying to marshal technology to bring context to people’s lives,” he related. “The tools and methods and specialties evolve, but the goal remains the same: Keep the public informed.”

Culture and collaboration

Ultimately, what differentiates NICAR from the hundreds of other panels, forums and conferences that I’ve been to over the years — particularly technology conferences — is the community I’ve seen on display, both online and off. I thought Matt Waite captured something important in his post about a 5-step NICAR recovery plan. While ostensibly about getting started right away with applying the new skills and techniques gleaned from the experience, his peroration hits upon what has made this annual gathering of “journo nerds” special and enduring:

“People have been saying for decades that NICAR feels different — and it is different — because of the culture of the conference. I can talk for hours about what that culture is and why it’s the way it is, but it boils down to this: It’s a giving culture.

Every one of the speakers? Volunteer. All of the hands-on teachers? Same. Did you talk to someone in the hall? Grab a speaker after to ask them a question? They almost certainly stayed right there and answered your question, right? I can remember time after time of Big Name Journalists From Big Name News Organizations dropping everything and showing me how to do something. I can count scores of times where they gave me a business card and said call me if you run into trouble. That generosity amazed me. Inspired me. Made me want to do the same.”

Several journalists from Europe commented to me at NICAR that the kind of code sharing and collaboration across newsrooms on display there would be nigh unthinkable in their countries, given the fierce competition between news outlets. To be honest, seeing data journalists from organizations that compete for scoops here in the United States share tips, tools and tricks in person and on NICAR’s busy listserv was a surprise when I first saw it.

Now, it remains inspiring. NICAR is a teaching culture, much like a teaching hospital, except that the patients are stories and the physicians are editors and reporters trying to inject facts and rational discourse into a national conversation that is all too often lacking in either.

Given the demand for data journalists and the scarcity of the skills needed, the culture and willingness to “pay forward” what mentors have taught in years past is heartening. Many thanks to everyone there who took the time to talk with me about what they do, how, and why. I’ll be sharing more interviews in posts to come.

Alexander Howard is a Tow Fellow working on the Data Journalism Project at the Tow Center for Digital Journalism. The Data Journalism Project is made possible by generous funding from both The Tow Foundation and the John S. and James L. Knight Foundation, and includes a wide range of academic research, teaching, public engagement and development of best practices in the field of data and computational journalism. Follow Alexander Howard on Twitter @digiphile. To learn more about the Tow Center Fellowship Program, please contact the Tow Center’s Research Director Taylor Owen: taylor.owen@columbia.edu.