Announcements, Events, Past Events, Tips & Tutorials

Source Protection: Resources

We are happy to report that many of the attendees of our October 11 workshop on Source Protection in the Information Age left with a good foundation in digital security, and trainers gained a better understanding of the challenges journalists face in becoming more secure. 
This was a collaboratively organized event that brought together organizations and individuals passionate about the safety and security of journalists. We remain committed to supporting this collaboration, and will be planning future workshops.
If you weren’t able to attend the event, we recommend starting with this brief recap. In addition, we would like to share some resources that you may find useful for continuing to develop your skills and understanding in this area.
The organizers
(Lorenzo, Susan, Sandy & George)

Workshop Panel Videos

Panel 1: How technology and the law put your information at risk

Runa Sandvik, James Vasile, Aaron Williamson | Moderated by Jenn Henrichsen

Panel 2: Source protection in the real world – how journalists make it work

Online Resources

Workshop Resources

Online Library

Tactical Tech Collective

Tactical Tech’s Privacy & Expression program builds the digital security awareness and skills of independent journalists, and of anyone else concerned about the security risks and vulnerabilities of digital tools. On their website you can find manuals, short films, interactive exercises and well-designed how-tos.

Upcoming Privacy & Security Events

October 20 | 6:30pm | Tracked Online: How it’s done and how you can protect yourself
Techno-Activism 3rd Mondays (TA3M) is a community-run monthly meetup that happens in 21 cities throughout the world. It is a good place to meet and learn from individuals who work on anti-surveillance and anti-censorship issues. The October edition of NYC TA3M will feature the former product lead of Ghostery, who will explain how third parties track you online, what information they collect, and what you can do to protect yourself. If you would like to be alerted to upcoming TA3M events, contact Sandra Ordonez @

Circumvention Tech Festival

The Circumvention Tech Festival will occur on March 1-6 in Valencia, Spain. The festival gathers the community fighting censorship and surveillance for a week of conferences, workshops, hackathons, and social gatherings, featuring many of the Internet Freedom community’s flagship events. This includes a full day of journo security events, which will be conducted both in English and Spanish. This is a great opportunity to meet the digital security pioneers. 


How It's Made, Research, Tips & Tutorials

Think about data from the beginning of the story, says Cheryl Phillips


“Stories can be told in many different ways,” said Cheryl Phillips. “A sidebar that may once have been a 12-inch text piece is now a timeline, or a map.”

Phillips, an award-winning investigative journalist, will begin teaching students how to treat data as a source this fall, when she starts a new gig as a lecturer at Stanford’s graduate journalism program, helping to launch the school’s new Computational Journalism Lab.

“Cheryl Phillips brings an outstanding mix of experience in data journalism and investigative work to our program. Students and faculty here are eager to start working with her to push forward the evolving field of computational journalism,” said Jay Hamilton, Hearst Professor of Communication and Director of the Stanford Journalism Program, in a statement. “Her emphasis on accountability reporting and interest in using data to lower the costs of discovering stories will help our journalism students learn how to uncover stories that currently go untold in public affairs reporting.”


I interviewed Phillips about her career, which has included important reporting on the nonprofit and philanthropy world; her plans for teaching at Stanford; data journalism; j-schools and teaching digital skills; and the challenges that newsrooms face today and in the future.

What is a day in your life like now?

I’m the data innovation editor at The Seattle Times. Essentially, I work with data for stories and help coordinate data-related efforts, such as working with reporters, graphics folks, and others on news apps and visualizations. I also have looked at some of our systems and processes and suggested new, more time-effective methods for us.

I’ve been at The Seattle Times since 2002. I started as a data-focused reporter on the investigations team, then became deputy investigations editor, then data enterprise editor. I also worked on the metro desk and edited a team of reporters. I currently work in the digital/online department, but really work across all the departments. I also helped train the newsroom when we moved to a new content management system about a year or so ago. I am trying to wrap up a couple of story-related projects, and do some data journalism newsroom training before I start at Stanford in the fall.

How did you get started in data journalism? Did you earn any special degrees or certificates?

I remember taking a class (outside of the journalism department) while in college. The subject purported to be about learning how personal computers worked but, aside from a textbook that showed photos of a personal computer, we really just learned how to write if-then loops on a mainframe.

I got my first taste of data journalism at the Fort Worth Star-Telegram. That’s where I did my first story using any kind of computer for something other than putting words on a screen. I had gotten the ownership agreement for the Texas Rangers, which included a somewhat complex formula. I kept doing the math on my calculator and screwed it up each time. Finally, I called up a friend of mine who was a CPA, and she taught me Lotus 1-2-3.

My real start in computer-assisted reporting came in 1995, when I was on loan to USA TODAY. I was fortunate enough to land in the enterprise department with the data editors, and Phil Meyer was there as a consultant. By the end of five months, I could use spreadsheets, Paradox (for DOS!) and SPSS. What a great education. I followed that up by joining IRE and attending the NICAR conference. I’ve missed very few since then and have also done some of NICAR’s specialized training on stats and maps.

I have no special degrees or certificates, but I have taken some online courses in R, Python, etc.

Did you have any mentors? Who? What were the most important resources they shared with you?

Phil Meyer is amazing, and such a great teacher. He taught me statistics, but also how to think about data. Sarah Cohen and Aron Pilhofer of the New York Times, and Jennifer LaFleur of CIR. Paul Overberg at USA TODAY. They have all helped me over the years.

NICAR is an incredible world, full of data journalists and journalist-programmers who are willing to help others out. It’s a great family.

On the investigative journalism front, Jim Neff and David Boardman are fantastic editors and great at asking vital questions.

What does your personal data journalism “stack” look like? What tools could you not live without?

I’m a firm believer in the power of the spreadsheet. So much of what journalists do on a daily basis can be made easier and more effective by just using a spreadsheet.

I use OpenRefine, CometDocs, Tabula, AP’s Overview and DocumentCloud. I use MySQL with Navicat. I still use Access. I’m a recent convert to R, but also use SPSS. I use ESRI for mapping, but am interested in exploring other options also. I use Google Fusion Tables as well.

Most of my work has been on the more traditional CAR front, but I’ve been learning Python for scraping projects.
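The kind of scraping Phillips describes often starts with pulling rows out of an HTML table on a government site. As a minimal sketch using only Python’s standard library (the page snippet and salary figures here are invented for illustration):

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped into rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

# Stand-in for HTML fetched from an agency website (hypothetical data).
html = """
<table>
  <tr><td>Smith, J.</td><td>58,200</td></tr>
  <tr><td>Lee, A.</td><td>61,450</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # [['Smith, J.', '58,200'], ['Lee, A.', '61,450']]
```

Real projects usually swap the hard-coded string for a fetched page and add error handling, but the parse-into-rows pattern stays the same.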

What are the foundational skills that someone needs to practice data journalism?

In many ways, the same foundational skills you need for any kind of journalism.

Curiosity, for one. Journalists need to think about stories from a mindset that includes data from the very beginning, such as when a reporter talks to a source, or a government official. If an official mentions statistics, don’t just ask for a summary report, but ask for the underlying data — and for that same data over time. The editors of those reporters need to do the same thing. Think about the possibilities if you had more information and could analyze and view it in different ways.

Second, be open to learning any skill sets that will help tell the story. I got into data journalism because I discovered stories I would not be able to tell if I didn’t obtain and analyze data. We all know journalists don’t like to take someone’s word for something — data journalism just takes that to the next level.

Third, in terms of technical skills, learn how to use a spreadsheet, at a bare minimum. Really, one tool leads to another. Once you know how a spreadsheet works, you are more open to using OpenRefine to clean and standardize that data, or learning a language for scraping data, or another program that will help with finding connections.
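The OpenRefine-style cleanup mentioned above, collapsing variant spellings of the same value, can be approximated in a few lines of Python. This sketch mimics OpenRefine’s fingerprint clustering (lowercase, strip punctuation, sort the words); the agency names are invented for illustration:

```python
import re

def fingerprint(name):
    """Cluster key: lowercase, strip punctuation, sort the words,
    so variant spellings of one agency collapse together."""
    return " ".join(sorted(re.sub(r"[^\w\s]", "", name.lower()).split()))

raw = ["Dept. of Transportation", "Transportation, Dept of",
       "DEPT OF TRANSPORTATION", "Parks Department"]

clusters = {}
for name in raw:
    clusters.setdefault(fingerprint(name), []).append(name)

for variants in clusters.values():
    print(variants)
```

Here the three spellings of the transportation department land in one cluster, ready to be standardized to a single canonical name.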

What classes will you be teaching at Stanford, and how?

I will be teaching several courses, including a data journalism class focusing on relational data, basic statistics and mapping. I also will be teaching an investigative reporting class focusing on investigative reporting tools.

In general, I want to make sure the students are telling stories from data that they analyze. They should be not only learning the technical stack, but how to apply the technical knowledge to real-world journalism. I am hoping to create some partnerships with newsrooms as well.

Where do you turn to keep your skills updated or learn new things?

IRE and NICAR and all the folks involved there. I also try to learn from our producers at The Seattle Times, who come in knowing way more than I did when I started in journalism. I try to follow smart people on Twitter and other social media.

I like to reach out to folks about what they are doing. I think reaching out and connecting with folks outside of journalism is a great way to make sure we are aware of other new tools, developments, etc.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience?

Newsrooms are often still structured into silos, so reporters just report and write. They may hand their data off to a graphics desk, but they don’t necessarily analyze or visualize data themselves. Producers produce, but don’t write, even though they may enjoy that and be good at it, too.

Some of this is by necessity, but it makes it harder to learn new skills — and some of these skills are really useful. A reporter who knows how to visualize data may also be able to look at it in a different way when reporting the story out, too. So, building collaborative teams is important, as is providing time for folks to try out other skills.

Are journalism schools training people properly? What will you do differently?

I think it’s no secret that a lot of change is starting to take place in schools.

Cindy Royal had an interesting piece about platforms just the other day. In general, I think my answer here is similar to the biggest challenge for newsrooms: We need to take a more integrated approach. Classrooms and their teachers should collaborate on work.

So, for example, a multimedia class produces the visualizations and videos that go with the stories being written in another class. (Yes, Stanford already does this.)

Data journalism should not be just one class out of a curriculum, but infused throughout a curriculum. Every type of journalist can learn data-related skills that will help them, whether they end up as a copy editor, a reporter, a front-line editor or a graphics artist.

What data journalism project are you the most proud of working on or creating?

I have been asked this question before and can never answer it well. My last story is always the one I’m most proud of, unless it’s the one I’m about to publish.

That said, as an editor at The Seattle Times, I worked with Jennifer LaFleur (then with ProPublica) on a project tracking the reasons behind foreclosures, a deep dive into the driving factors across several cities.

When I was a reporter, I was lucky enough to get to work with Ken Armstrong on our court secrecy project in 2006, which changed state practice. I also led the reporting effort on problems with airport security. Both of those used small data sets, which we built ourselves, but told important stories.

I can think of even more stories that weren’t data projects per se, but which used data in the reporting in critical ways. The recent Oso mudslide coverage is an example of where we used mapping data and landslide data to effectively tell the story of the impact of the slide on the victims and of how the potential disastrous consequences had been ignored over time.

What data journalism project created by someone else do you most admire?

Too many to count. There has been so much great work done. ProPublica’s Dollars for Docs was fantastic not only for its stories, but for the way they shared the data and the way newsrooms from across the country could tap into the work. Last year, the Milwaukee Journal Sentinel’s project, Deadly Delays, was such important work.

How has the environment for doing this kind of work changed in the past five years?

It’s much more integrated into new immersive storytelling platforms. There is a recognition that stories can be told in many different ways. A sidebar that may once have been a 12-inch text piece is now a timeline, or a map.

I think there are many more team collaborations, with the developers, designers and reporters and CAR specialists working together from the outset. We need a lot more of this.

What’s different about practicing data journalism today, versus 10 years ago? What about teaching it?

There are more tools, with more coming every day. A few are great, and a lot aspire to be great and some of those will probably get there.

The really fantastic thing about the change is that it’s relatively easy to contribute to the development of a tool that will help journalism, even just as a beta tester.

There are more tech folk interested in helping make journalism better. We’re becoming a less insular world, and that’s a good thing.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

News apps help tell important stories. It’s the same reason narrative is important.

It always should boil down to that: “does this tool, language, or app help tell a story?” If the answer is “yes,” and you think the story could be worth the effort, then the tool is important too.

What’s the one thing people always get wrong when they talk about data journalism?

I think I’ll have to punt on this one. As you have pointed out, data journalism is a big umbrella term for many different things — precision journalism, computer-assisted reporting, computational journalism, news apps, etc. — so it’s easy to have a different idea as to what it means.

[IMAGE CREDIT: University of Washington]

How It's Made, Research, Tips & Tutorials

Treat data as a source and then open it to the public, says Momi Peralta


Long before data journalism entered the mainstream discourse, La Nacion was pushing the boundaries of what was possible in Argentina, a country without a freedom of information law. If you look back into La Nacion’s efforts to go online and start to treat data as a source, you’ll find Angélica “Momi” Peralta Ramos (@momiperalta), the multimedia development manager who originally launched the paper’s website in the 1990s and now manages its data journalism efforts.

Ramos contends that data-driven innovation is an antidote to budget crises in newsrooms. Her perspective is grounded in experience: Peralta’s team at La Nacion is using data journalism to challenge a FOIA-free culture in Argentina, opening up data for reporting and reuse, and holding government accountable. This spring, I interviewed her about her work and perspective. Her answers follow, lightly edited for clarity.

You’re a computer scientist and MBA. How did you end up in journalism?

Years ago, I fell in love with the concept of the Internet. It is the synthesis of what I’d studied: information technology applied to communications. Now, with the opportunity of data journalism, I think there is a new convergence: the extraction and sharing of knowledge through collaboration using technology. I’m curious about everything and love to discover things.

How did your technical and business perspective inform how you approached the site and La Nacion Data?

In terms of organization, it helped to consider traditional business areas like sales, marketing, customer service, business intelligence, and of course technology and a newsroom for content.

At first, I believed in the unlimited possibilities of technology applied to publishing online, and the power of the net to distribute content. Content was free to access, and free became the norm. As consumers embraced it, there was a demand and a market, and when there is a market there are business opportunities, although with a much more fragmented competitive environment.

The same model applies now to data journalism. Building content from data or data platforms must evolve to an economy of scale in which the cost of producing [huge amounts of] content in one single effort tends to zero.

What examples of data-driven journalism should the public know about at La Nacion?

Linked below is a selection of 2013 projects. Some of them are finalists in the 2014 Data Journalism Awards! Please watch the videos inside the posts, where we explain how we managed to extract, transform, build and open data in each case.

How do you see digital publishing, the Internet and data journalism in South America or globally? What about your peers?

I can’t speak for everyone else, but I think we all see it the same way: as both a big challenge and an opportunity.

From then on, it’s a matter of being willing to do things. The technology is there, the talent is everywhere, the people who make a difference are the ones you have to gather.

As the context is different in every country and there are obstacles, you have to become a problem solver and be creative, but never stop. For example, if there are language barriers, translate. If there is no open data, start by doing it yourself. If technology is expensive, check first for free versions. Most are enough to do everything you need.

What are the most common tools applied to data journalism at La Nacion?

Collaborative tools. Google Docs, spreadsheets, OpenRefine, Junar’s open data platform, Tableau Public for interactive graphs, and now JavaScript or D3.js for reusable interactive graphs tied to updated datasets. We love tools that don’t need a developer every time to create interactive content. These are end-user tools.

Developers are the best for “build once, use many times” kinds of content, developing tools, news applications and for creative problem solving.

What are the basic tools and foundational skills that data journalists need?

First, searching. Using advanced search techniques in countries like ours, you find there is more on the Deep Web than on the surface.

Then scraping, converting data from PDFs, structuring datasets, and analyzing data. Then, learning to publish in open data formats.

Last, but not least: socializing and sharing your work.

Data journalists need a tolerance for frustration and the ability to reinvent themselves and self-motivate. Embrace technology. Don’t be afraid to experiment with tools, and learn to ask for help: teamwork is fun.
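The last technical step Peralta lists above, publishing in open data formats, can be as simple as writing structured rows out as CSV. A minimal sketch using Python’s standard library (the ad-spend records here are invented for illustration):

```python
import csv
import io

# Hypothetical records typed in from a printed report.
records = [
    {"year": 2012, "agency": "Ministry A", "ad_spend": 1250000},
    {"year": 2013, "agency": "Ministry A", "ad_spend": 1480000},
]

# Write to an in-memory buffer; in practice this would be a file
# published alongside the story for others to reuse.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["year", "agency", "ad_spend"])
writer.writeheader()
writer.writerows(records)

print(buf.getvalue())
```

CSV is a deliberately boring choice: any spreadsheet, database, or scripting language can ingest it, which is exactly what makes it an open format.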

How do you and your staff keep your skills updated and learn?

We self-teach for free, thanks to the net. We look at best practices and draw inspiration from others’ cases; then, whenever we can, we attend conferences such as NICAR, ISOJ or ONA and follow them online. If there are local trainings, we attend. We went to introductory two-day courses for ArcGIS and QlikView (business intelligence software) just to learn the possibilities of these technologies.

We taught ourselves Tableau. An interactive designer and I took two days off in a Starbucks with the training videos. Then she learned more in an advanced course.

We love webinars and MOOCs, like the Knight Center’s or the EJR’s data journalism MOOC.

We design internal trainings. We have a data journalism training program, now starting its 4th edition, with five days of full-time learning for groups of journalists and designers in our newsroom. We also design Excel courses for analyzing and designing data sets (DIY Data!) and, thanks to our Knight-Mozilla OpenNews fellows, we have customized workshops on CartoDB and introductions to D3.js.

We go to hackathons and meetups — nearly every meetup in Buenos Aires. We interact with experts and with journalists and learn a lot there, working in teams.

What are the biggest challenges La Nacion faces in practicing data journalism? What’s changed since 2011, in terms of the environment?

The context. To take just one example, consider the inflation scandal in Argentina. Even The Economist removed our [national] figures from its indicators page. Media that reported private indicators were treated as opposition by the government, which withdrew most official advertising from those outlets, fined private consultants who calculated consumer price indices that differed from the official one, pressured private consumer associations to stop measuring and releasing price indexes, and so on.

Regarding official advertising, between 2009 and 2013 we managed to build a dataset. We found that 50% went to 10 media groups, the ones closest to the government. In the last period, a hairdresser (stylist) received more advertising money than the largest newspapers in Argentina. Here’s how we built and analyzed this dataset.

Last year, independent media suffered an ad ban, as reported in The Wall Street Journal: “Argentina imposes ad ban, businesses said.”

Argentina is ranked 106th out of 177 in Transparency International’s Corruption Perceptions Index. We are still without a Freedom of Information law.

Regarding open data from governments, there are some initiatives. The most advanced is the City of Buenos Aires open data portal, but there are also national, provincial and municipal initiatives starting to publish useful information, and even open data.

Perhaps the best change is that we now have a big hacktivist community of transparency activists, NGOs, journalists and academic experts who are ready to share knowledge for data problem-solving, as needed or in hackathons.

Our dream is for everyone to understand data as a public service, not only to enhance accountability but to enhance our quality of life.

What’s different about your work today, versus 1995, when the site went online?

In 1995, we were alone. Everything was new and hard to sell. There was a small audience. Producing content was static, still in two dimensions, perhaps including a picture in .jpg form, and feedback came through e-mail.

Now there is a huge audience, a crowded competitive environment, and things move faster than ever in terms of formats, technologies, businesses and creative uses by audiences. Every day, there are challenges and opportunities to engage where audiences are, and give them something different or useful to remember us and come back.

Why are data journalism and news apps important?

Both move public information closer to the people and literally put data in citizens’ hands.

News apps are great for telling stories and localizing your data, but we need more efforts to humanize and explain data. [We should] make datasets famous, putting them at the center of a conversation among experts first, and the general public afterwards.

If we report on data, and we open data while reporting, then others can reuse and build another layer of knowledge on top of it. There are risks, if you have the traditional business mindset, but in an open world there is more to win than to lose by opening up.

This is not only a data revolution. It is an open innovation revolution around knowledge. Media must help open data, especially in countries with difficult access to information.

How do Freedom of Information laws relate to data journalism?

FOI laws are vital for journalism, but even more vital for citizens in general, for the justice system, and for politicians, businesses or investors making decisions. Anyone can republish information, if she can get it, but there are requests for information that get no response at all.

What about open government in general? How does the open data movement relate to data journalism?

The open government movement is happening. We must be ready to receive and process open data, and then tell all the stories hidden in datasets that now may seem raw or distant.

To begin with, it would be useful to have data on open contracts, statements of assets and salaries of public officials, and ways to follow the money and compare, so people can help monitor government accountability. Although we dream of open data formats, we will take PDFs over print copies.

The open data movement and hacktivism can accelerate the application of technology to ingest large sets of documents, complex documents or large volumes of structured data. This will accelerate and help journalism extract and tell better stories, but also bring tons of information to the light, so everyone can see, process and keep governments accountable.

The way to go for us now is to use data for journalism and then open that data. We are building blocks of knowledge and, at the same time, putting this data closer to the people, the experts, and the ones who can do better work than we can to extract another story or detect spots of corruption.

It makes lots of sense for us to make the effort of typing, building datasets, cleaning, converting and sharing data in open formats, even organizing our own ‘datafest’ to expose data to experts.

Open data will help in the fight against corruption. That is a real need, as here corruption is killing people.

Research, Tips & Tutorials

Tasneem Raja urges newsrooms to adopt pair programming for better data journalism


New, digitally native media enterprises like FiveThirtyEight have gotten a lot of attention — and some grief — as they’ve gone online this year. It’s the media organizations and journalists born in print, however, that pioneered the practices of computer-assisted reporting that underpin the news apps and data journalism of today. The New York Times’ Upshot is just the latest addition to these efforts.

Tasneem Raja, the interactive editor at Mother Jones magazine, knows this reality well. She’s one of the growing number of journalists who aren’t just reporting the news but building the medium through which the message is communicated. Before she joined Mother Jones, she was the news apps editor at The Bay Citizen, where her team built a Bike Accident Tracker and a government salary database, among other things; before that, she was a feature writer at The Chicago Reader. Raja’s insights into how to build an interactive news team (more on that below) are well worth reading. You can follow her work on Github or her commentary on Twitter. Our interview follows, lightly edited and linked for context.

Where do you work now? What is a day in your life like?

I’m a senior editor at Mother Jones magazine, where I lead an awesome team of data reporters and interactive producers. I’m also a writer and reporter, in print and on the Web.

We live by a few guiding principles on my team. The big one is that it’s our job to make sure everybody in the newsroom can tell a story by any means necessary. That is, reporters should know how to map, the mapmaking pros on my team should know how to factcheck, the fact checkers should know when to use a column chart versus a bar chart, and so on. We don’t believe in siloed skills.

Of course, some folks will always be way better at some skills than others, but you gotta pay it forward, which brings me to our second guiding principle: we are all learners, and we are all teachers.

Put it all together, and you get quite the three-ring circus of hybrid journalism going on here every day. Today, for instance, I’m finishing up edits on a big magazine feature story about the future of programming, while teaching Illustrator charting to a reporter with good data on air pollution. Producer Jaeah Lee is teaching a reporter best practices in structured data for an easy-to-update map of gay marriage laws. Our interactive fellow AJ Vicens just fired off a quick blog post about racism in sports, and is now working with a reporter on abstracting an open-source template we made. We probably look pretty different from most data teams in this way.

How did you get started in data journalism? Did you get any special degrees or certificates?

I was a staff writer at the Chicago Reader in the mid-2000s, which was, of course, a scary time to be in news. When a bunch of my senior mentors there, all writers, got canned in 2007, I decided to re-evaluate my career and went to j-school at Berkeley to learn new skills. I was lucky enough to be there while Josh Williams was teaching web development (he left for the NYT, where he worked on Snowfall and tons of other big interactive pieces), and essentially attached myself at the hip. It turned into a year-long independent study, and got me a job on the launch team at The Bay Citizen, where I created a news apps team that made some really cool data projects for the Bay Area (RIP, TBC).

What quantitative skills did you start with?

I’ve always appreciated structured ways of looking at information. There’s something about well-formatted tables of information and clean spreadsheets that makes me really happy. That’s the most important skill for a data journalist, in my opinion: a love of working with structured data, and of creating whole new systems and worlds atop it. That strange love is what makes you want to put the time in to learn R, command line tools, pivot tables, and so on – all stuff I didn’t pick up ’til halfway through my first job in data journalism.
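The pivot-table habit Raja mentions, grouping rows on one field and aggregating another, can be sketched in plain Python (the sections and word counts here are invented for illustration):

```python
from collections import defaultdict

# Hypothetical rows: (section, word_count) for a week of stories.
rows = [("politics", 900), ("sports", 400),
        ("politics", 1200), ("sports", 650)]

# Pivot: total word count per section.
pivot = defaultdict(int)
for section, words in rows:
    pivot[section] += words

print(dict(pivot))  # {'politics': 2100, 'sports': 1050}
```

A spreadsheet pivot table does the same group-and-sum in one dialog box; knowing the underlying operation makes both the spreadsheet and the code version easier to reason about.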

Did you have any mentors? Who? What were the most important resources they shared with you?

“Mentor” is a funny word. Here are a few people who’ve left deep thumbprints on the way I think about my work (whether they intended to or not).

Josh Williams taught me everything I needed to know to get a really good first job in data journalism and news apps: what’s a text editor, what’s the command line, what’s a Web framework. More than that, he got me thinking in terms of abstraction. For instance, he was always saying, “Never build something you can only use once.” Instead, think both in terms of the specific needs of the project in front of you, and the broader needs of a similar project you might not even know of till next year. Seeing the way he held both of these concepts in his head at once was an incredible lesson in how to be a journalist who is also a pretty decent project manager.
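The “never build something you can only use once” idea can be made concrete with a toy sketch: the same logic, first hard-coded to one dataset, then parameterized so next year’s project can reuse it (the city names and figures are invented):

```python
# One-off version: hard-coded to a single dataset and a single question.
def sf_salary_total():
    rows = [("SF", 100), ("SF", 200), ("Oakland", 150)]
    return sum(v for city, v in rows if city == "SF")

# Abstracted version: same logic, but the data and the grouping key
# are parameters, so it works for any similar project later.
def group_total(rows, key):
    return sum(v for k, v in rows if k == key)

rows = [("SF", 100), ("SF", 200), ("Oakland", 150)]
print(group_total(rows, "SF"))       # 300
print(group_total(rows, "Oakland"))  # 150
```

The abstracted version costs almost nothing extra to write, which is the point: thinking about the next project while building this one.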

Brian Boyer taught me the importance of having a guiding philosophy (or three) to your work. The why of what you do, not just the how. And that your philosophies can sound more like something a chef or a potter would say than something a data nerd would. In other words, he got me thinking about this work as craft.

Scott Klein has inspired me to better know my shit. That is, it’s not enough to read a few blog posts about data journalism and crown yourself the next Edward Tufte. There’s a lot of history to what we do, a lot of important choices to be made, and fortunately, there are very old and very new books out there to learn from. You can’t have a conversation with Scott without wanting to go pick up a book.

What does your personal data journalism “stack” look like? What tools could you not live without?

1. A good, simple text editor, with good syntax highlighting
2. A spreadsheet app, with version control and collaborative editing
3. GitHub
4. The cognitive ability to think in terms of abstraction

Where should people who want to learn start?

A hundred people have said it before me, and better: pick a project you genuinely want to do, and then hack, Google, and plead for help in forums, and read books, until you get it working.

Where do you turn to keep your skills updated or learn new things?

1. The NICAR conference
2. Twitter
3. Increasingly, printed books
4. Dissecting the work of colleagues at other shops

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

There are a ton of challenges, so I’ll pick one: we don’t have a pair programming model on the editorial side of the newsroom, and we need one.

Journalism schools still teach journalism as a very hierarchical, often solitary pursuit. That’s not the way it works in data journalism, and the best learning is still gonna be on the job. That requires cross-pollination between folks with different skill sets. We need a pairing model across newsrooms, not just in the nerd corner.

I’ve had several people tell me they’re surprised to learn how small my team is, given the daily volume of content we put out. That’s because we’re not the only ones who can work with data and visuals in our newsroom. We’ve spent serious time pairing with something like 1 in 3 staffers here, working and training side by side whenever physically possible. (We have offices on different coasts.) We’ve gotten several editors and reporters on GitHub, and while we don’t have them checking in code through the command line (yet), they’re well-versed in the how and why of version control.

What’s the one thing people always get wrong when they talk about data journalism?

That it’s all quantitative. That’s like guiding principle #3 on my team: everything is data. Words are data. Gifs are data. If it can be sorted, tracked, counted, merged, filtered: it’s probably data. I’d say half the projects my team does are more qualitative than quantitative. That is, most people wouldn’t consider them data visualization so much as photo essays, games, quizzes, etc. There’s a lot of power in developing a data skill set — both technical and cognitive — that lets you make cool things with words and pictures, too.

You’ve written about Silicon Valley’s “brogrammer problem.” How do data journalists and their communities of practice handle issues of race, sexism or gender?

NICAR is a pretty healthy place to be a non-white, non-male person working in journalism. I can’t speak to issues of class, ability, gender identity, and other types of difference, other than to say we’re almost definitely less good at them, and that needs to change.

I don’t have experience with the way folks in this community handle issues of inclusion when they come up, but I have seen evidence of folks working preemptively to create environments that are less exclusionary than the norm in web development, quantitative analysis, the visual arts, or journalism. Maybe it’s because there haven’t been that many of us webby data journos till recently. Data journalists are pragmatic by nature, and maybe it just didn’t make sense to alienate whole swaths of potential recruits.

That’s not to say everything is rainbows and sunshine, but I’m gonna take a rare moment of optimism here and say that I’m proud to represent this community, because in my experience, it’s genuinely committed to inclusion.

How It's Made, Research, Tips & Tutorials

Data skills make you a better journalist, says ProPublica’s Sisi Wei


I’ve found that the best antidote to a decade of discussion about the “future of news” is to talk to the young journalists who are building it. Sisi Wei’s award-winning journalism shows exactly what that looks like, in practice. Just browse her projects or code repositories on GitHub. Listening to her lightning talk at the 2014 NICAR conference on how ProPublica reverse engineered the Sina Weibo API to analyze censorship was one of many high points of the conference for me.

Wei, a news applications developer at ProPublica, was formerly a graphics editor at The Washington Post. She is also the co-founder of “Code with me,” a programming workshop for journalists. Our interview about her work and her view of the industry follows.

Where do you work now? What is a day in your life like?

I currently work at ProPublica, on the News Applications Team. We make interactive graphics and news apps; think of projects like 3D flood maps and Dollars for Docs.

At ProPublica, no one has a specific responsibility like design, backend development, data analysis, etc. Instead, people on the team tend to do the whole stack from beginning to end. When we need help, or don’t understand something, we ask our teammates. And of course, we’re constantly working alongside reporters and editors outside of the team as well. When someone’s app is deploying soon, we all pitch in to help take things off his/her plate.

On a given day, I could be calling sources and doing interviews, searching for a specific dataset, cleaning data, making my own data, analyzing it, coming up with the best way to visualize it, or programming an interactive graphic or news app. And of course, I could also be buried beneath interview notes and writing an article.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you have?

I got started in college when I began making interactive graphics for North by Northwestern. I was a journalism/philosophy/legal studies major, so I can safely say that I had no special degrees or qualifications for data journalism.

The closest formal training I got was an “Introduction to Statistics” course my senior year, which I wish I’d taken earlier. I also had a solid math background for a non-major. The last college math course I took was on advanced linear algebra and multivariable calculus. Not that I’ve used either of those skills in my work just yet.

Did you have any mentors? Who? What were the most important resources they shared with you?

Too many to list. So, here’s just a sample of all the amazing people who I’ve been lucky to consider mentors in the past few years, and one of the many things they’ve all taught me.
Tom Giratikanon showed me that journalists could use programming to tell stories and exposed me to ActionScript and how programming works. Kat Downs taught me not to let the story be overshadowed by design or fancy interaction, and Wilson Andrews showed me how a pro handles making live interactive graphics for election night. Todd Lindeman taught me how to better visualize data and how to really take advantage of Adobe Illustrator. Lakshmi Ketineni and Michelle Chen honed my javascript and really taught me SQL and PHP.

Now at ProPublica, my teammates are my mentors. Here is where I learned Ruby on Rails, how news app development really works and how to handle large databases, first with ActiveRecord and now Elasticsearch (which I am still working on learning).

What does your personal data journalism “stack” look like? What tools could you not live without?

  • Sublime Text, whose multiple selection feature is the trump card that makes it impossible for me to switch to anything else. If you haven’t used multiple selection, stop what you’re doing and go check it out.
  • The Terminal, for deploying and using Git or just testing out small bits of code in Ruby or Python.
  • Chrome, to debug my code.
  • The Internet, for the answers to all of my questions.

What are the foundational skills that someone needs to practice data journalism?

An insatiable appetite to get to the bottom of something, and the willingness to learn any tool to help you find the answers you’re looking for. In that process, you’ll by necessity learn programming skills, or data analysis skills. Both are important. But without knowing what questions to ask, or what you’re trying to accomplish, neither of those skills will help you.

Where should people who want to learn start?

In terms of programming, just pick a project, make it simple, make it happen and then finish it. Like Jennifer DeWalt did when she made 180 websites in 180 days.

Regarding data analysis, if you’re still in school, take more classes in statistics. If you’re not in school, NICAR offers CAR boot camps, or you can search for materials online, such as this book that teaches statistics to programmers.

Where do you turn to keep your skills updated or learn new things?

I don’t have a frequent cache of websites that I revisit to learn things. I simply figure out what I want to learn, or what problem I’m trying to solve, and use the Internet to find what I need to know.

For example, I’m currently trying to figure out which Javascript library or game engine can best enable me to create newsgames. I started out knowing close to nothing about the subject. Ten minutes of searching later, I had detailed comparisons between game engines, demos and reviews of gaming Javascript libraries, as well as wonderful tips from indie game developers for any rookies looking to get started.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

There are two major pipelines for newsrooms to recruit people with these skills. The first is to recruit journalists who have programming and/or data analysis experience. The second is to recruit programmers or data analysts to come into journalism.

The latter, I think, is much harder than the former, though the Knight-Mozilla OpenNews Fellowship is doing a great job of it. Schools are getting better at teaching students data journalism skills, but not at a high enough rate. I often see open job positions, but I rarely see students or professionals with the right skills and experience unable to find a job.

The lack of students, however, is a problem that starts before college. When high school students are applying for journalism school, they expect to go into print or radio or TV news. They don’t expect to learn how to code, or practice data analysis. I think one of the largest challenges is how to change this expectation at an earlier stage.

All of that said, I do have one wish that I would like journalism schools to fulfill: I wish that no j-school ever reinforces or finds acceptable, actively or passively, the stereotype that journalists are bad at math. All it takes is one professor shrugging off a math error for that stereotype to pass on to his or her students. Let’s be clear: Journalists do not come with a math disability.

What data journalism project created by someone else do you most admire?

I actually want to highlight a project called Vax, which was not built by journalists, but deploys the same principles as data journalism and has the same goals of educating the reader.

Vax is a game that teaches students both how epidemics spread, as well as prevention techniques. It was created originally to help students taking a Coursera MOOC on Epidemics really engage with the topic. I think it’s accomplished that in spades. Not only are users hooked right from the beginning, the game allows you to experience for yourself how people are interconnected, and how those who refuse vaccinations affect the process.

How has the environment for doing this kind of work changed in the past five years?

Since I only entered the field three years ago in 2011, all I can say is this: Data journalism is gaining momentum.

Our techniques are becoming more sophisticated and we’re learning from our mistakes. We’re constantly improving, building new tools and making it easier and more accessible to do common tasks. I don’t want to predict anything grand, but I think the environment is only going to get better.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

To me, data journalism has become the umbrella term that includes anyone who works in data, journalism and programming. (And yes, executing functions in Excel or writing SQL queries is both data and programming.)
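Wei’s parenthetical is worth taking literally: even a single SQL query is data work and programming at once. A minimal sketch of that idea, using Python’s built-in sqlite3 module (the table and figures here are invented for illustration):

```python
import sqlite3

# Build a tiny in-memory table of hypothetical payment records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (doctor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("Dr. Adams", 250.0), ("Dr. Baker", 1200.0), ("Dr. Adams", 75.0)],
)

# A basic journalistic question, phrased as a query:
# which doctors received the most money in total?
rows = conn.execute(
    "SELECT doctor, SUM(amount) AS total FROM payments "
    "GROUP BY doctor ORDER BY total DESC"
).fetchall()

for doctor, total in rows:
    print(doctor, total)
```

The GROUP BY and ORDER BY clauses are doing real analytical work: aggregating records and ranking the results, which is programming whether or not it happens in a “programmer’s” tool.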

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

Philip Meyer, who wrote “Precision Journalism,” answers the first part of this question with his entire book, which I would recommend any aspiring data journalist read immediately. He says:

“Read any of the popular journals of media criticism and you will find a long litany of repeated complaints about modern journalism. It misses important stories, is too dependent on press releases, is easily manipulated by politicians and special interests, and does not communicate what it does know in an effective manner. All of these complaints are justified. Their cause is not so much a lack of energy or talent or dedication to truth, as the critics sometimes imply, but a simple lag in the application of information science — a body of knowledge — to the daunting problems of reporting the news in a time of information overload.”

Data journalism allows journalists to point to the raw data and ask questions, as well as question the very conclusions we are given. It allows us to use social science techniques to illuminate stories that might otherwise be hidden in plain sight.

News apps specifically allow users to search for what’s most relevant to them in a large dataset, and give individual readers the power to discover how a large, national story relates to them. If the story is that doctors have been receiving payments from pharmaceutical companies, news apps let you search to see if your doctor has as well.

What’s the one thing people always get wrong when they talk about data journalism?

That it’s new, or just a phase the journalism industry is going through.

Data journalism has been around since the 1970s (if not earlier), and it is not going to go away, because the skills involved are core to being a better journalist, and to making your story relatable to millions of users online.

Just imagine, if a source told you that 2+2=18, would you believe that statement? The more likely scenario is that you’d question your source about why he or she would say something so blatantly wrong, because you know how to do math, and you know that 2+2=4. Analyzing raw data can result in a similar question to a source, except this time you can ask, “Why does your data say X, but you say Y?”
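That habit of checking a claim against the underlying records can be sketched in a few lines of Python (the figures are invented, mirroring the 2+2 example):

```python
# A toy version of "why does your data say X, but you say Y?":
# compare a source's claimed total against the raw records.
claimed_total = 18          # what the source told you (invented figure)
raw_records = [2, 2]        # the underlying data

actual_total = sum(raw_records)
if actual_total != claimed_total:
    print(f"Source claims {claimed_total}, data shows {actual_total} -- follow up.")
```

The check itself is trivial; the point is that having the raw data turns “trust me” into a question you can actually verify.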

Isn’t that a core skill every journalist should have?

Research, Tips & Tutorials

Applying data science to all the news that’s fit to print


Over the past year, an important element of my research into data journalism’s past, present and future has been interviews with young journalists like Jeremy Bowers or Dan Hill and (relatively) hoary veteran practitioners like Aron Pilhofer. Their experience has provided invaluable evidence for debugging debates about the topic.

That was certainly the case with Chase Davis, an assistant editor on the Interactive News Desk at the New York Times. I first met Chase in 2011 at the first Newsfoo, in Phoenix, Arizona, where he gave an Ignite talk on three news problems data science can help solve. Davis followed up in 2012 with an Ignite on using data to explore the evolution of data journalism. Both five-minute videos are well worth watching if you’re even remotely interested in journalism and technology. (Davis also open sourced his data journalism Ignite on GitHub, if you’d like to explore it that way.)

Today, Davis teaches an advanced data journalism class at Mizzou, where he helps transfer his skills and perspective (treat data as a source). Our interview, lightly edited for clarity and content, [bracketed] and hyperlinked for context, follows.

What is a day in your life like?

I help supervise the developer/journalists who build many of our cool Web projects. I have a background as a reporter, primarily doing investigations and covering politics, so I try to dabble in that world as well. I also teach a class in advanced data journalism at the Missouri School of Journalism and do some consulting on the side.

How did you get started? Did you get any special degrees or certificates? Quantitative skills?

I got started in data journalism almost by accident. I started learning to program for fun in middle school, then fell in love with journalism and ended up at Mizzou. I lived a typical j-student life for a few years, writing a bunch for the student paper and doing internships, then applied (based on a total misunderstanding) to start working for NICAR. The couple years I spent there really tied those two skillsets together.

Did you have any mentors? Who? What were the most important resources they shared with you?

Too many to list, but I’ll name a few. Jacquee Petchel, Lise Olsen and Mark Katches for schooling me in the ways of capital-J Journalism. Brant Houston and Jeff Porter for taking me in at NICAR and showing me how journalism and data can work together. And, really, the entire IRE and NICAR community, which is outrageously giving of its collective time.

What does your personal data journalism “stack” look like? What tools could you not live without?

I’m pretty minimalist: a terminal window and some type of text editor. The only place I splurge is on a database GUI (I like Navicat). The one tool I couldn’t live without is Python, which is the best Swiss Army knife a data journalist can have.

What are the foundational skills that someone needs to practice data journalism?

The same core skills you need to practice any kind of journalism: curiosity, skepticism, an eye for detail and a sense of a good story. [They] also [need] numeracy, or at least conceptual mathematical literacy, which is still unfortunately too rare. Also important are databases and spreadsheets, statistics, and some kind of programming language — doesn’t matter which one. Being your own worst critic doesn’t hurt. And intellectual courage. You need to be motivated, not intimidated, to learn new and difficult things.

Where do you turn to keep your skills updated or learn new things?

Personal projects. I always have at least one on the backburner, and I make sure it stretches me in a new direction. Working on something I care about is the best way for me to stay motivated. I get bored learning from books.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

The oversimplified explanation is that most journalism students can’t code or do math, while most computer science students don’t know storytelling.

Hybrids on either side are rare, and we’re scooping them up as fast as we can.

Journalism schools could be doing more, but it’s not all their fault. It takes intellectual agility and natural curiosity to effectively develop hybrid skills. I don’t think that’s something we can teach solely through curriculum. That’s why I don’t think every journalism student should “learn how to code.” Being able to write a few lines of Javascript is great, but if you let your skills dead-end with that, you’re not going to be a great newsroom developer.

Folks on our interactive and graphics teams at the Times have remarkably diverse backgrounds: journalism and computer science, sure, but also cartography, art history, and no college degree at all. What makes them great is that they have an instinct to self-teach and explore.

That’s what journalism schools can encourage: introduce data journalism with the curriculum, then provide a venue for students to tinker and explore. Ideally, someone on faculty should know enough to guide them. The school should show an interest in data journalism work on par with more traditional storytelling.

Oh, and they should require more math classes.

What data journalism project are you the most proud of working on or creating?

Hard question, but I’ll offer up pretty much anything that my old team at the Center for Investigative Reporting has done. That was my first turn at being a boss, and the fact that they haven’t all been fired suggests that I didn’t mess them up too bad.

What data journalism project created by someone else do you most admire?

Look at the Philip Meyer Awards every year and you pretty much have that answer. Anyone who can take a spreadsheet full of rows and columns, or a bunch of code, and turn it into something that changes (or starts) the conversation about an important topic is the whole reason many of us got into this game in the first place.

How has the environment for doing this kind of work changed in the past five years?

It’s night and day. Five years ago, this kind of thing was still seen in a lot of places at best as a curiosity, and at worst as something threatening or frivolous. Some newsrooms got it, but most data journalists I knew still had to beg, borrow and steal for simple things like access to servers.

Solid programming practices were unheard of — version control? What’s that? If newsroom developers today saw Matt Waite’s code when he first launched PolitiFact, their faces would melt like “Raiders of the Lost Ark.”

Now, our team at the Times runs dozens of servers. Being able to code is table stakes. Reporters are talking about machine-frickin’-learning, and newsroom devs are inventing pieces of software that power huge chunks of the web. The game done changed.

What’s different about practicing data journalism today, versus 10 years ago?

It was actually 10 years ago that I first got into data journalism, which makes me feel old even though I’m not.

Back then, data journalism was mostly seen as doing analyses for stories. Great stories, for sure, but interactives and data visualizations were more rare.

Now, data journalism is much more of a Big Tent speciality. Data journalists report and write, craft interactives and visualizations, develop storytelling platforms, run predictive models, build open source software, and much, much more. The pace has really picked up, which is why self-teaching is so important.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

I don’t think the semantics are important. Journalism is journalism. It should be defined on its own merits, not by the tools we use to accomplish it. Treating these things as exotic specialties makes it too easy to pigeonhole the people who practice them. And I hate that.

What’s the one thing people always get wrong when they talk about data journalism?

That data journalists are unicorns.

Or wizards. Or that they can somehow pull swords from stones in a way that mere laypeople can’t. That kind of attitude is dangerous — not because it mythologizes tech skills, or demonstrates willful ignorance on the part of technophobes (both of which are sad), but because it drives a cultural wedge between data journalists and the rest of the newsroom.

[Imagine hearing] “I’m a conventional reporter, so my specialty is reporting. You’re a tech person, so you write code.”

I think that’s crap. I know plenty of reporters who can code, and plenty of data journalists who can report the hell out of a good story. By dividing them culturally, we almost let people see the “journalist” in “data journalist” as secondary. We turn them into specialists, rather than letting them bring journalism and technology together in new and creative ways.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

Numeracy is important. A more universal appreciation of technology in our industry is important. A culture of rapid, constant experimentation is important. To the extent that data journalism has encouraged those things in newsrooms, I think it’s been hugely important.

The actual product of data journalism — news apps, visualizations, stories — those will all continue to evolve, but data journalism’s continuing contribution to newsroom culture is something that I hope is permanent.

How It's Made, Research, Tips & Tutorials

Oakland Police Beat applies data-driven investigative journalism in California


One of the explicit connections I’ve made over the years lies between data-driven investigative journalism and government or corporate accountability. In debugging the backlash to data journalism, I highlighted the work of The Los Angeles Times Data Desk, which has analyzed government performance data for accountability, among other notable projects. I could also have pointed to the Chicago Sun-Times, which applied data-driven investigative methods to determine that the City of Chicago’s 911 dispatch times vary widely depending on where you live, publishing an interactive map online for context, or to a Pulitzer Prize-winning story on speeding cops in Florida.


This week, there’s a new experiment in applying data journalism to local government accountability in Oakland, California, where the Oakland Police Beat has gone online. The nonprofit website, which is part of Oakland Local and The Center for Media Change and funded by The Ethics and Excellence in Journalism Foundation and The Fund for Investigative Journalism, was co-founded by Susan Mernit and Abraham Hyatt, the former managing editor of ReadWrite. (Disclosure: Hyatt edited my posts there.)

Oakland Police Beat is squarely aimed at shining sunlight on the practices of Oakland’s law enforcement officers. Their first story out of the gate pulled no punches, finding that Oakland’s most decorated officers were responsible for a high number of brutality lawsuits and shootings.

The site also demonstrated two important practices that deserve to become standard in data journalism: explaining the methodology behind their analysis, including source notes, and (eventually) publishing the data behind the investigation. 

To learn more about why Oakland Police Beat did that, how they’ve approached their work and what the long game is, I contacted Hyatt. Our interview follows, lightly edited and hyperlinked for context. Any [bracketed] comments are my own.

So, what exactly did you launch? What’s the goal?

Hyatt: We launched a news site and a database with 25 years worth of data about individual Oakland Police Department (OPD) officers who have been involved in shootings and misconduct lawsuits.

Oakland journalists usually focus (and rightfully so) on the city’s violent crime rate and the latest problems with the OPD. We started this project by asking if we could create a comprehensive picture of the officers with the most violent behavior, which is why the OPD is where it is today. We started requesting records and tracking down information. That eventually became the database. It’s the first time anyone in Oakland has created a resource like this.

What makes this “data-driven journalism?”

Hyatt: We started with the data and let it guide the course of the entire project. The stories we’ve written all came from the data.

Why is sharing the data behind the work important?

Hyatt: Sharing is critical. Sharing, not traffic, is the metric I’m using to gauge our success, although traffic certainly is fun to watch, too. That’s the main reason that we’re allowing people to download all of our data. (The settlement database will be available for download next week.)

How will journalists, activists, and data nerds use it over time? That’s going to be the indicator of how important this work was.

[Like ProPublica, Oakland Police Beat is encouraging reuse. The site says that “You’re welcome to republish our stories and use our data for free. We publish our stories under an Attribution-NonCommercial-ShareAlike 4.0 License.”]

Where do you get the data?

Hyatt: All of it came from city and court documents. Some of it came as .CSV files, some as PDFs that we had to scrape.

How much time and effort did it take to ingest, clean, structure and present?

Hyatt: Almost all of the court docs had to be human-read. It was a laborious process of digging to find officer names and what the allegations were. Combining city settlement data records and court docs took close to five months. Then, we discovered that the city’s data had flaws and that took another couple of months to resolve.

Some of the data was surprisingly easy to get. I didn’t expect the City Attorney’s office to be so forthcoming with information. Other stuff was surprisingly difficult. The OPD refused to give us awards data before 2007. They claim that they didn’t keep that data on individual officers before then. I know that’s completely false, but we’re a tiny project. We don’t have the resources to take them to court over it. Our tools were very simple.

Did you pay for it?

Hyatt: We used PACER a ton. The bill was close to $900 by the time we were done. We mainly worked out of spreadsheets. I had a handful of command line tools that I used to clean and process data. I ran a virtual machine so that I could use some Linux-based tools as well. I heart Open Refine. We experimented with using Git for version control on stories we were writing.
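Hyatt doesn’t document the specific scripts he used, but the kind of cleanup he describes (reconciling inconsistently spelled names across city settlement data and court documents) looks roughly like this hypothetical sketch, with invented records:

```python
# Hypothetical raw records: the same officer spelled inconsistently
# across different source documents, a common problem when merging
# city settlement data with court filings.
raw_records = [
    ("smith,  john", 3),
    ("Smith, John", 2),
    ("DOE, JANE", 1),
]

def normalize(name: str) -> str:
    # Collapse repeated whitespace and unify capitalization so that
    # variants of the same name merge into one key.
    return " ".join(name.split()).title()

# Tally lawsuit counts under the normalized name.
totals = {}
for name, count in raw_records:
    key = normalize(name)
    totals[key] = totals.get(key, 0) + count

print(totals)
```

Real-world name matching is messier than this (middle initials, typos, badge numbers), which is part of why the project’s human-reading of court documents took months.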

A used chemical agent grenade found on the streets in downtown Oakland following Occupy demonstrations in 2011. Photo by Eric K Arnold.

Will you be publishing data and methodology as you go along?

Hyatt: The methodology post covers all of our stories. We’ll continue to publish stories, as well as some data sets that we got along the way that we decided not to put into our main dataset, like several hundred city attorney reports about the settled cases.

What’s the funding or revenue model for the site? Where will this be in one year? Or 5?

Hyatt: Everyone wants grant-funded journalism startups to be sustainable, but, so often, they start strong and then peter out when resources run dry.

Instead of following that model, I knew from the start that this was going to be a phased project. We had some great grants that got us started, but I didn’t know what the funding picture was going to look like once we started running stories. So, I tried to turn that limitation into a strength.

We’re publishing eight weeks worth of stories and data. We’re going to cram as much awesome into those weeks as we can and then, if needed, we can step away and let this project stand on its own.

With that said, we’re already looking for funding for a second phase (which will focus on teens and the OPD). When we get it, we’ll use this current data as a springboard for Phase 2.

Could this approach be extended to other cities?

Hyatt: The OPD and its problems are pretty unique in the USA. This was successful because there was so much stuff to work with in Oakland. I don’t think our mentality for creating and building this project was unique.