Alexander Howard

Alexander B. Howard is a writer and editor based in Washington, DC. Currently, he is a fellow at the Tow Center for Digital Journalism at Columbia Journalism School, a columnist at TechRepublic, and a contributor to TechPresident, among other fine publications. Previously, Howard was a fellow at the Networked Transparency Policy Project in the Ash Center at the Kennedy School of Government at Harvard University, and the Washington Correspondent for O’Reilly Radar at O’Reilly Media. Prior to joining O’Reilly, he was an associate editor at TechTarget, where he wrote about how the laws and regulations that affect information technology are changing, spanning the issues of online identity, data protection, risk management, electronic privacy, and IT security.


The Art and Science of Data-Driven Journalism


Journalists have been using data in their stories for as long as the profession has existed. A revolution in computing in the 20th century created opportunities for data integration into investigations, as journalists began to bring technology into their work. In the 21st century, a revolution in connectivity is leading the media toward new horizons. The Internet, cloud computing, agile development, mobile devices, and open source software have transformed the practice of journalism, leading to the emergence of a new term: data journalism.

Although journalists have been using data in their stories for as long as they have been engaged in reporting, data journalism is more than traditional journalism with more data. Decades after early pioneers successfully applied computer-assisted reporting and social science to investigative journalism, journalists are creating news apps and interactive features that help people understand data, explore it, and act upon the insights derived from it. New business models are emerging in which data is a raw material for profit, impact, and insight, co-created with an audience that was formerly reduced to passive consumption. Journalists around the world are grappling with the excitement and the challenge of telling compelling stories by harnessing the vast quantity of data that our increasingly networked lives, devices, businesses, and governments produce every day.

While the potential of data journalism is immense, the pitfalls and challenges to its adoption throughout the media are similarly significant, from digital literacy to competition for scarce resources in newsrooms. Global threats to press freedom, digital security, and limited access to data create difficult working conditions for journalists in many countries. A combination of peer-to-peer learning, mentorship, online training, open data initiatives, and new programs at journalism schools rising to the challenge, however, offers reasons to be optimistic that more journalists will learn to treat data as a source.

Following is a list of the 14 findings, recommendations and predictions explored in detail in the full report, which can be downloaded here (PDF).

1) Data will become even more of a strategic resource for media.

2) Better tools will emerge that democratize data skills.

3) News apps will explode as a primary way for people to consume data journalism.

4) Being digital first means being data-centric and mobile-friendly.

5) Expect more robojournalism, but know that human relationships and storytelling still matter.

6) More journalists will need to study the social sciences and statistics.

7) There will be higher standards for accuracy and corrections.

8) Competency in security and data protection will become more important.

9) Audiences will demand more transparency on reader data collection and use.

10) Conflicts over public records, data scraping, and ethics will surely arise.

11) Collaborate with libraries and universities as archives, hosts, and educators.

12) Expect data-driven personalization and predictive news in wearable interfaces.

13) More diverse newsrooms will produce better data journalism.

14) Be mindful of data-ism and bad data. Embrace skepticism.

How It's Made, Research, Tips & Tutorials

Think about data from the beginning of the story, says Cheryl Phillips


“Stories can be told in many different ways,” said Cheryl Phillips. “A sidebar that may once have been a 12-inch text piece is now a timeline, or a map.”

Phillips, an award-winning investigative journalist, will begin teaching students how to treat data as a source this fall, when she starts a new gig as a lecturer in Stanford’s graduate journalism program, helping to open up its new Computational Journalism Lab.

“Cheryl Phillips brings an outstanding mix of experience in data journalism and investigative work to our program. Students and faculty here are eager to start working with her to push forward the evolving field of computational journalism,” said Jay Hamilton, Hearst Professor of Communication and Director of the Stanford Journalism Program, in a statement. “Her emphasis on accountability reporting and interest in using data to lower the costs of discovering stories will help our journalism students learn how to uncover stories that currently go untold in public affairs reporting.”


I interviewed Phillips about her career, which has included important reporting on the nonprofit and philanthropy world, her plans for teaching at Stanford, data journalism, j-schools and teaching digital skills, and the challenges that newsrooms face today and in the future.

What is a day in your life like now?

I’m the data innovation editor at The Seattle Times. Essentially, I work with data for stories and help coordinate data-related efforts, such as working with reporters, graphics folks, and others on news apps and visualizations. I also have looked at some of our systems and processes and suggested new, more time-effective methods for us.

I’ve been at The Seattle Times since 2002. I started as a data-focused reporter on the investigations team, then became deputy investigations editor, then data enterprise editor. I also worked on the metro desk and edited a team of reporters. I currently work in the digital/online department, but really work across all the departments. I also helped train the newsroom when we moved to a new content management system about a year or so ago. I am trying to wrap up a couple of story-related projects, and do some data journalism newsroom training before I start at Stanford in the fall.

How did you get started in data journalism? Did you earn any special degrees or certificates?

I remember taking a class (outside of the journalism department) while in college. The subject purported to be about learning how personal computers worked but, aside from a textbook that showed photos of a personal computer, we really just learned how to write if/then statements and loops on a mainframe.

I got my first taste of data journalism at the Fort Worth Star-Telegram. That’s where I did my first story using any kind of computer for something other than putting words on a screen. I had gotten the ownership agreement for the Texas Rangers, which included a somewhat complex formula. I kept doing the math on my calculator and screwed it up each time. Finally, I called up a friend of mine who was a CPA, and she taught me Lotus 1-2-3.

My real start in computer-assisted reporting came in 1995, when I was on loan to USA TODAY. I was fortunate enough to land in the enterprise department with the data editors, and Phil Meyer was there as a consultant. By the end of five months, I could use spreadsheets, Paradox (for DOS!) and SPSS. What a great education. I followed that up by joining IRE and attending the NICAR conference. I’ve missed very few since then and have also done some of NICAR’s specialized training on stats and maps.

I have no special degrees or certificates, but I have taken some online courses in R, Python, etc.

Did you have any mentors? Who? What were the most important resources they shared with you?

Phil Meyer is amazing, and such a great teacher. He taught me statistics, but also how to think about data. Sarah Cohen and Aron Pilhofer of The New York Times, Jennifer LaFleur of CIR, and Paul Overberg at USA TODAY have all helped me over the years.

NICAR is an incredible world, full of data journalists and journalist-programmers who are willing to help others out. It’s a great family.

On the investigative journalism front, Jim Neff and David Boardman are fantastic editors and great at asking vital questions.

What does your personal data journalism “stack” look like? What tools could you not live without?

I’m a firm believer in the power of the spreadsheet. So much of what journalists do on a daily basis can be made easier and more effective by just using a spreadsheet.

I use OpenRefine, CometDocs, Tabula, AP’s Overview, and DocumentCloud. I use MySQL with Navicat, and I still use Access. I’m a recent convert to R, but also use SPSS. I use ESRI tools for mapping, but am interested in exploring other options as well. I also use Google Fusion Tables.

Most of my work has been on the more traditional CAR front, but I’ve been learning Python for scraping projects.

What are the foundational skills that someone needs to practice data journalism?

In many ways, the same foundational skills you need for any kind of journalism.

Curiosity, for one. Journalists need to think about stories from a mindset that includes data from the very beginning, such as when a reporter talks to a source, or a government official. If an official mentions statistics, don’t just ask for a summary report, but ask for the underlying data — and for that same data over time. The editors of those reporters need to do the same thing. Think about the possibilities if you had more information and could analyze and view it in different ways.

Second, be open to learning any skill sets that will help tell the story. I got into data journalism because I discovered stories I would not be able to tell if I didn’t obtain and analyze data. We all know journalists don’t like to take someone’s word for something — data journalism just takes that to the next level.

Third, in terms of technical skills, learn how to use a spreadsheet, at a bare minimum. Really, one tool leads to another. Once you know how a spreadsheet works, you are more open to using OpenRefine to clean and standardize that data, or learning a language for scraping data, or another program that will help with finding connections.
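As a rough, hypothetical sketch of that progression, the same name-standardization task that a spreadsheet or OpenRefine’s clustering feature handles can be expressed in a few lines of Python. The agency names and grant amounts below are invented for illustration, as is the `standardize` helper:

```python
import csv
import io

# A tiny, invented grants table with the kind of inconsistent agency
# names that OpenRefine's clustering is designed to reconcile.
RAW = """agency,amount
Dept. of Transportation,120000
Department of Transportation,95000
DEPT OF TRANSPORTATION,40000
"""

def standardize(name):
    """Normalize an agency name: uppercase, expand 'DEPT.'/'DEPT' to 'DEPARTMENT'."""
    name = name.upper().replace("DEPT.", "DEPT").replace("DEPT ", "DEPARTMENT ")
    return " ".join(name.split())

# Total the amounts per standardized agency name.
totals = {}
for row in csv.DictReader(io.StringIO(RAW)):
    key = standardize(row["agency"])
    totals[key] = totals.get(key, 0) + int(row["amount"])

print(totals)  # one combined total once the three spellings collapse into one name
```

The point is not the code itself but the habit of mind: once messy names are standardized, a sum that was wrong three ways in a raw spreadsheet becomes a single, checkable number.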

What classes will you be teaching at Stanford, and how?

I will be teaching several courses, including a data journalism class focusing on relational data, basic statistics and mapping. I also will be teaching an investigative reporting class focusing on investigative reporting tools.

In general, I want to make sure the students are telling stories from data that they analyze. They should be not only learning the technical stack, but how to apply the technical knowledge to real-world journalism. I am hoping to create some partnerships with newsrooms as well.

Where do you turn to keep your skills updated or learn new things?

IRE and NICAR and all the folks involved there. I also try to learn from our producers at The Seattle Times, who come in knowing way more than I did when I started in journalism. I try to follow smart people on Twitter and other social media.

I like to reach out to folks about what they are doing. I think reaching out and connecting with folks outside of journalism is a great way to make sure we are aware of other new tools, developments, etc.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience?

Newsrooms are often still structured into silos, so reporters just report and write. They may hand their data off to a graphics desk, but they don’t necessarily analyze or visualize data themselves. Producers produce, but don’t write, even though they may enjoy that and be good at it, too.

Some of this is by necessity, but it makes it harder to learn new skills, and some of these skills are really useful. A reporter who knows how to visualize data may also be able to look at it in a different way when reporting the story out. So building collaborative teams is important, as is providing time for folks to try out other skills.

Are journalism schools training people properly? What will you do differently?

I think it’s no secret that a lot of change is starting to take place in schools.

Cindy Royal had an interesting piece about platforms just the other day. In general, I think my answer here is similar to the biggest challenge for newsrooms: We need to take a more integrated approach. Classrooms and their teachers should collaborate on work.

So, for example, a multimedia class produces the visualizations and videos that go with the stories being written in another class. (Yes, Stanford already does this.)

Data journalism should not be just one class out of a curriculum, but infused throughout a curriculum. Every type of journalist can learn data-related skills that will help them, whether they end up as a copy editor, a reporter, a front-line editor or a graphics artist.

What data journalism project are you the most proud of working on or creating?

I have been asked this question before and can never answer it well. My last story is always the one I’m most proud of, unless it’s the one I’m about to publish.

That said, as an editor at The Seattle Times, I worked with Jennifer LaFleur (then with ProPublica) on a project tracking the reasons behind foreclosures, a deep dive into the driving factors in several cities.

When I was a reporter, I was lucky enough to get to work with Ken Armstrong on our court secrecy project in 2006, which changed state practice. I also led the reporting effort on problems with airport security. Both of those used small data sets, which we built ourselves, but told important stories.

I can think of even more stories that weren’t data projects per se, but which used data in the reporting in critical ways. The recent Oso mudslide coverage is an example of where we used mapping data and landslide data to effectively tell the story of the impact of the slide on the victims and of how the potential disastrous consequences had been ignored over time.

What data journalism project created by someone else do you most admire?

Too many to count. There has been so much great work done. ProPublica’s Dollars for Docs was fantastic not only for its stories, but for the way they shared the data and the way newsrooms across the country could tap into the work. Last year, the Milwaukee Journal Sentinel’s project, Deadly Delays, was such important work.

How has the environment for doing this kind of work changed in the past five years?

It’s much more integrated into new immersive storytelling platforms. There is a recognition that stories can be told in many different ways. A sidebar that may once have been a 12-inch text piece is now a timeline, or a map.

I think there are many more team collaborations, with the developers, designers and reporters and CAR specialists working together from the outset. We need a lot more of this.

What’s different about practicing data journalism today, versus 10 years ago? What about teaching it?

There are more tools, with more coming every day. A few are great, a lot aspire to be great, and some of those will probably get there.

The really fantastic thing about the change is that it’s relatively easy to contribute to the development of a tool that will help journalism, even just as a beta tester.

There are more tech folk interested in helping make journalism better. We’re becoming a less insular world, and that’s a good thing.

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

News apps help tell important stories. It’s the same reason narrative is important.

It always should boil down to that: “does this tool, language, or app help tell a story?” If the answer is “yes,” and you think the story could be worth the effort, then the tool is important too.

What’s the one thing people always get wrong when they talk about data journalism?

I think I’ll have to punt on this one. As you have pointed out, data journalism is a big umbrella term for many different things — precision journalism, computer-assisted reporting, computational journalism, news apps, etc. — so it’s easy to have a different idea as to what it means.

[IMAGE CREDIT: University of Washington]


Treat data as a source and then open it to the public, says Momi Peralta


Long before data journalism entered the mainstream discourse, La Nacion was pushing the boundaries of what was possible in Argentina, a country without a freedom of information law. If you look back at La Nacion’s efforts to go online and start to treat data as a source, you’ll find Angélica “Momi” Peralta Ramos (@momiperalta), the multimedia development manager who originally launched La Nacion online in the 1990s and now manages its data journalism efforts.

Ramos contends that data-driven innovation is an antidote to budget crises in newsrooms. Her perspective is grounded in experience: Peralta’s team at La Nacion is using data journalism to challenge a FOIA-free culture in Argentina, opening up data for reporting and reuse and holding government accountable. This spring, I interviewed her about her work and perspective. Her answers follow, lightly edited for clarity.

You’re a computer scientist and MBA. How did you end up in journalism?

Years ago, I fell in love with the concept of the Internet. It is the synthesis of what I’d studied: information technology applied to communications. Now, with the opportunity of data journalism, I think there is a new convergence: the extraction and sharing of knowledge through collaboration using technology. I’m curious about everything and love to discover things.

How did your technical and business perspective inform how you approached La Nacion’s website and La Nacion Data?

In terms of organization, it helped to consider traditional business areas like sales, marketing, customer service, business intelligence, and of course technology and a newsroom for content.

At first, I believed in the unlimited possibilities of technology applied to publishing online, and the power of the net to distribute content. Content was free to access, and free became the norm. As consumers embraced it, there was a demand and a market, and where there is a market there are business opportunities, although within a much more fragmented competitive environment.

The same model applies now to data journalism. Building content from data or data platforms must evolve toward an economy of scale, in which the cost of producing [huge amounts of] content in one single effort tends to zero.

What examples of data-driven journalism should the public know about at La Nacion?

Linked below is a selection of 2013 projects. Some of them are finalists in the 2014 Data Journalism Awards! Please watch the videos inside the posts, where we explain how we managed to extract, transform, build, and open data in each case.

How do you see digital publishing, the Internet, and data journalism in South America and globally? What about your peers?

I can’t speak for everyone else, but I think we all see it the same way: as both a big challenge and a big opportunity.

From then on, it’s a matter of being willing to do things. The technology is there, the talent is everywhere, the people who make a difference are the ones you have to gather.

As the context is different in every country and there are obstacles, you have to become a problem solver and be creative, but never stop. For example, if there are language barriers, translate. If there is no open data, start by doing it yourself. If technology is expensive, check first for free versions. Most are enough to do everything you need.

What are the most common tools applied to data journalism at La Nacion?

Collaborative tools: Google Docs and spreadsheets, OpenRefine, Junar’s open data platform, Tableau Public for interactive graphics, and now JavaScript and D3.js for reusable interactive graphics tied to updated datasets. We love tools that don’t need a developer every time to create interactive content. These are end-user tools.

Developers are best for “build once, use many times” kinds of content, for developing tools and news applications, and for creative problem-solving.

What are the basic tools and foundational skills that data journalists need?

First, searching. Using advanced search techniques in countries like ours, you find there is more on the Deep Web than on the surface.

Then scraping, converting data from PDFs, structuring datasets, and analyzing data. Then, learning to publish in open data formats.

Last, but not least: socializing and sharing your work.

Data journalists need a tolerance for frustration and the ability to reinvent themselves and self-motivate. Embrace technology. Don’t be afraid to experiment with tools, and learn to ask for help: teamwork is fun.
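To sketch what that scrape, structure, and publish-in-open-formats sequence can look like, here is a minimal, hypothetical Python example using only the standard library. The HTML table and salary figures are invented, and a real project would fetch live pages and likely use a dedicated scraping library rather than a hard-coded string:

```python
import csv
import io
from html.parser import HTMLParser

# Invented snippet of the kind of HTML table a government site
# might publish instead of machine-readable data.
HTML = """
<table>
  <tr><th>official</th><th>salary</th></tr>
  <tr><td>A. Gomez</td><td>85000</td></tr>
  <tr><td>B. Diaz</td><td>92000</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect each <tr> as a list of cell strings."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

parser = TableParser()
parser.feed(HTML)

# Re-publish the structured rows as CSV, an open format others can reuse.
out = io.StringIO()
csv.writer(out).writerows(parser.rows)
print(out.getvalue())
```

The last step, writing the cleaned rows back out as CSV, is what turns a one-off scrape into an open dataset that others can build another layer of knowledge on top of.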

How do you and your staff keep your skills updated and learn?

We teach ourselves for free, thanks to the net. We look at best practices and draw inspiration from others’ work; then, whenever we can, we go for training at conferences such as NICAR, ISOJ, or ONA, and follow them online. If there are local trainings, we attend. We went to introductory two-day courses on ArcGIS and QlikView (business intelligence software) just to learn the possibilities of those technologies.

We taught ourselves Tableau. An interactive designer and I took two days off in a Starbucks with the training videos. Then she learned more in an advanced course.

We love webinars and MOOCs, like the Knight Center’s or the EJR’s data journalism MOOC.

We design internal trainings. We have a data journalism training program, now starting its fourth edition, with five days of full-time learning for groups of journalists and designers in our newsroom. We also design Excel courses for analyzing and designing datasets (DIY Data!) and, thanks to our Knight-Mozilla OpenNews fellows, we have customized workshops on tools like CartoDB and introductions to D3.js.

We go to hackathons and meetups — nearly every meetup in Buenos Aires. We interact with experts and with journalists and learn a lot there, working in teams.

What are the biggest challenges La Nacion faces in practicing data journalism? What’s changed since 2011, in terms of the environment?

The context. To take just one example, consider the inflation scandal in Argentina. Even The Economist removed our [national] figures from its indicators page. Media that reported private indicators were treated as opposition by the government, which took away most official advertising from those media, fined private consultants who calculated consumer price indices that differed from the official one, pressured private consumer associations to stop measuring prices and releasing price indexes, and so on.

Regarding official advertising between 2009 and 2013, we managed to build a dataset. We found out that 50% went to 10 media groups, the ones closest to the government. In the last period, a hairdresser (stylist) received more advertising money than the largest newspapers in Argentina. Here’s how we built and analyzed this dataset.

Last year, independent media suffered an ad ban, as reported in The Wall Street Journal: “Argentina imposes ad ban, businesses said.”

Argentina is ranked 106th of 177 in Transparency International’s Corruption Perceptions Index. We are still without a freedom of information law.

Regarding open data from governments, there are some initiatives. The most advanced is the City of Buenos Aires open data portal, but there are also national, provincial, and municipal initiatives starting to publish useful information, and even open data.

Perhaps the best change is that we now have a big hacktivist community of transparency activists, NGOs, journalists, and academic experts who are ready to share knowledge for data problem-solving, as needed or in hackathons.

Our dream is for everyone to understand data as a public service, not only to enhance accountability but to enhance our quality of life.

What’s different about your work today, versus 1995, when La Nacion went online?

In 1995, we were alone. Everything was new and hard to sell. There was a small audience. Producing content was static, still in two dimensions, perhaps including a picture in .jpg form, and feedback came through e-mail.

Now there is a huge audience, a crowded competitive environment, and things move faster than ever in terms of formats, technologies, businesses and creative uses by audiences. Every day, there are challenges and opportunities to engage where audiences are, and give them something different or useful to remember us and come back.

Why are data journalism and news apps important?

Both move public information closer to the people and literally put data in citizens’ hands.

News apps are great for telling stories and localizing your data, but we need more efforts to humanize data and explain data. [We should] make datasets famous, put them at the center of a conversation among experts first, and among the general public afterwards.

If we report on data, and we open data while reporting, then others can reuse and build another layer of knowledge on top of it. There are risks, if you have the traditional business mindset, but in an open world there is more to win than to lose by opening up.

This is not only a data revolution. It is an open innovation revolution around knowledge. Media must help open data, especially in countries with difficult access to information.

How do Freedom of Information laws relate to data journalism?

FOI laws are vital for journalism, but more vital for citizens in general, for the justice system, and for politicians, businesses, and investors making decisions. Anyone can republish information, if she can get it, but there are requests for information that get no response at all.

What about open government in general? How does the open data movement relate to data journalism?

The open government movement is happening. We must be ready to receive and process open data, and then tell all the stories hidden in datasets that now may seem raw or distant.

To begin with, it would be useful to have data on open contracts, statements of assets, and salaries of public officials, plus ways to follow the money and compare, so people can help monitor government accountability. Although we dream in open data formats, we’ll take PDFs over print copies.

The open data movement and hacktivism can accelerate the application of technology to ingest large sets of documents, complex documents or large volumes of structured data. This will accelerate and help journalism extract and tell better stories, but also bring tons of information to the light, so everyone can see, process and keep governments accountable.

The way to go for us now is to use data for journalism, but then open that data. We are building blocks of knowledge and, at the same time, putting this data closer to the people, the experts, and the ones who can do better work than we can to extract another story or detect spots of corruption.

It makes lots of sense for us to make the effort of typing, building datasets, cleaning, converting and sharing data in open formats, even organizing our own ‘datafest’ to expose data to experts.

Open data will help in the fight against corruption. That is a real need, as here corruption is killing people.


Tasneem Raja urges newsrooms to adopt pair programming for better data journalism


New, digitally native media enterprises like FiveThirtyEight have gotten a lot of attention — and some grief — as they’ve gone online this year. It was media organizations born in print, however, and the journalists at them, that pioneered the practices of computer-assisted reporting that underpin the news apps and data journalism of today. The New York Times’ Upshot is just the latest addition to these efforts.

Tasneem Raja, the interactive editor at Mother Jones magazine, knows this reality well. She’s one of a growing number of journalists who aren’t just reporting the news but building the medium for the message to be communicated. Before she joined Mother Jones, she was the news apps editor at The Bay Citizen, where her team built a Bike Accident Tracker and a government salary database, among other things, and before that a feature writer at The Chicago Reader. Raja’s insights into how to build an interactive news team (more on that below) are well worth reading. You can follow her work on GitHub or her commentary on Twitter. Our interview follows, lightly edited and linked for context.

Where do you work now? What is a day in your life like?

I’m a senior editor at Mother Jones magazine, where I lead an awesome team of data reporters and interactive producers. I’m also a writer and reporter, in print and on the Web.

We live by a few guiding principles on my team. The big one is that it’s our job to make sure everybody in the newsroom can tell a story by any means necessary. That is, reporters should know how to map, the mapmaking pros on my team should know how to factcheck, the fact checkers should know when to use a column chart versus a bar chart, and so on. We don’t believe in siloed skills.

Of course, some folks will always be way better at some skills than others, but you gotta pay it forward, which brings me to our second guiding principle: we are all learners, and we are all teachers.

Put it all together, and you get quite the three-ring circus of hybrid journalism going on here every day. Today, for instance, I’m finishing up edits on a big magazine feature story about the future of programming, while teaching Illustrator charting to a reporter with good data on air pollution. Producer Jaeah Lee is teaching a reporter best practices in structured data for an easy-to-update map of gay marriage laws. Our interactive fellow AJ Vicens just fired off a quick blog post about racism in sports, and is now working with a reporter on abstracting an open-source template we made. We probably look pretty different from most data teams in this way.

How did you get started in data journalism? Did you get any special degrees or certificates?

I was a staff writer at the Chicago Reader in the mid-2000s, which was, of course, a scary time to be in news. When a bunch of my senior mentors there, all writers, got canned in 2007, I decided to re-evaluate my career and went to j-school at Berkeley to learn new skills. I was lucky enough to be there while Josh Williams was teaching web development (he left for the NYT, where he worked on Snowfall and tons of other big interactive pieces), and essentially attached myself at the hip. It turned into a year-long independent study, and got me a job on the launch team at The Bay Citizen, where I created a news apps team that made some really cool data projects for the Bay Area (RIP, TBC).

What quantitative skills did you start with?

I’ve always appreciated structured ways of looking at information. There’s something about well-formatted tables of information and clean spreadsheets that makes me really happy. That’s the most important skill for a data journalist, in my opinion: a love of working with structured data, and of creating whole new systems and worlds atop it. That strange love is what makes you want to put in the time to learn R, command line tools, pivot tables, and so on – all stuff I didn’t pick up ’til halfway through my first job in data journalism.

Did you have any mentors? Who? What were the most important resources they shared with you?

“Mentor” is a funny word. Here are a few people who’ve left deep thumbprints on the way I think about my work (whether they intended to or not).

Josh Williams taught me everything I needed to know to get a really good first job in data journalism and news apps: what’s a text editor, what’s the command line, what’s a Web framework. More than that, he got me thinking in terms of abstraction. For instance, he was always saying, “Never build something you can only use once.” Instead, think both in terms of the specific needs of the project in front of you, and the broader needs of a similar project you might not even know of till next year. Seeing the way he held both of these concepts in his head at once was an incredible lesson in how to be a journalist who is also a pretty decent project manager.

Brian Boyer taught me the importance of having a guiding philosophy (or three) to your work. The why of what you do, not just the how. And that your philosophies can sound more like something a chef or a potter would say, than a data nerd. In other words, he got me thinking about this work as craft.

Scott Klein has inspired me to better know my shit. That is, it’s not enough to read a few blog posts about data journalism and crown yourself the next Edward Tufte. There’s a lot of history to what we do, a lot of important choices to be made, and fortunately, there are very old and very new books out there to learn from. You can’t have a conversation with Scott without wanting to go pick up a book.

What does your personal data journalism “stack” look like? What tools could you not live without?

1. A good, simple text editor, with good syntax highlighting
2. A spreadsheet app, with version control and collaborative editing
3. GitHub
4. The cognitive ability to think in terms of abstraction

Where should people who want to learn start?

A hundred people have said it before me, and better: pick a project you genuinely want to do, and then hack, Google, and plead for help in forums, and read books, until you get it working.

Where do you turn to keep your skills updated or learn new things?

1. The NICAR conference
2. Twitter
3. Increasingly, printed books
4. Dissecting the work of colleagues at other shops

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

There are a ton of challenges, so I’ll pick one: we don’t have a pair programming model on the editorial side of the newsroom, and we need one.

Journalism schools still teach journalism as a very hierarchical, often solitary pursuit. That’s not the way it works in data journalism, and the best learning is still gonna be on the job. That requires cross-pollination between folks with different skill sets. We need a pairing model across newsrooms, not just in the nerd corner.

I’ve had several people tell me they’re surprised to learn how small my team is, given the daily volume of content we put out. That’s because we’re not the only ones who can work with data and visuals in our newsroom. We’ve spent serious time pairing with something like 1 in 3 staffers here, working and training side by side whenever physically possible. (We have offices on different coasts.) We’ve gotten several editors and reporters on GitHub, and while we don’t have them checking in code through the command line (yet), they’re well-versed in the how and why of version control.

What’s the one thing people always get wrong when they talk about data journalism?

That it’s all quantitative. That’s like guiding principle #3 on my team: everything is data. Words are data. Gifs are data. If it can be sorted, tracked, counted, merged, filtered: it’s probably data. I’d say half the projects my team does are more qualitative than quantitative. That is, most people wouldn’t consider them data visualization so much as photo essays, games, quizzes, etc. There’s a lot of power in developing a data skill set — both technical and cognitive — that lets you make cool things with words and pictures, too.

You’ve written about Silicon Valley’s “brogrammer problem.” How do data journalists and their communities of practice handle issues of race, sexism or gender?

NICAR is a pretty healthy place to be a non-white, non-male person working in journalism. I can’t speak to issues of class, ability, gender identity, and other types of difference, other than to say we’re almost definitely less good at them, and that needs to change.

I don’t have experience with the way folks in this community handle issues of inclusion when they come up, but I have seen evidence of folks working preemptively to create environments that are less exclusionary than the norm in web development, quantitative analysis, the visual arts, or journalism. Maybe it’s because there haven’t been that many of us webby data journos till recently. Data journalists are pragmatic by nature, and maybe it just didn’t make sense to alienate potential swaths of new recruits.

That’s not to say everything is rainbows and sunshine, but I’m gonna take a rare moment of optimism here and say that I’m proud to represent this community, because in my experience, it’s genuinely committed to inclusion.

How It's Made, Research, Tips & Tutorials

Data skills make you a better journalist, says ProPublica’s Sisi Wei


I’ve found that the best antidote to a decade of discussion about the “future of news” is to talk to the young journalists who are building it. Sisi Wei’s award-winning journalism shows exactly what that looks like, in practice. Just browse her projects or code repositories on GitHub. Listening to her lightning talk at the 2014 NICAR conference on how ProPublica reverse engineered the Sina Weibo API to analyze censorship was one of many high points of the conference for me.

Wei, a news applications developer at ProPublica, was formerly a graphics editor at The Washington Post. She is also the co-founder of “Code with me,” a programming workshop for journalists. Our interview about her work and her view of the industry follows.

Where do you work now? What is a day in your life like?

I currently work at ProPublica, on the News Applications Team. We make interactive graphics and news apps; think of projects like 3D flood maps and Dollars for Docs.

At ProPublica, no one has a specific responsibility like design, backend development, data analysis, etc. Instead, people on the team tend to do the whole stack from beginning to end. When we need help, or don’t understand something, we ask our teammates. And of course, we’re constantly working alongside reporters and editors outside of the team as well. When someone’s app is deploying soon, we all pitch in to help take things off his/her plate.

On a given day, I could be calling sources and doing interviews, searching for a specific dataset, cleaning data, making my own data, analyzing it, coming up with the best way to visualize it, or programming an interactive graphic or news app. And of course, I could also be buried beneath interview notes and writing an article.

How did you get started in data journalism? Did you get any special degrees or certificates? What quantitative skills did you have?

I got started in college when I began making interactive graphics for North by Northwestern. I was a journalism/philosophy/legal studies major, so I can safely say that I had no special degrees or qualifications for data journalism.

The closest formal training I got was an “Introduction to Statistics” course my senior year, which I wish I’d taken earlier. I also had a solid math background for a non-major. The last college math course I took was on advanced linear algebra and multivariable calculus. Not that I’ve used either of those skills in my work just yet.

Did you have any mentors? Who? What were the most important resources they shared with you?

Too many to list. So, here’s just a sample of all the amazing people who I’ve been lucky to consider mentors in the past few years, and one of the many things they’ve all taught me.
Tom Giratikanon showed me that journalists could use programming to tell stories and exposed me to ActionScript and how programming works. Kat Downs taught me not to let the story be overshadowed by design or fancy interaction, and Wilson Andrews showed me how a pro handles making live interactive graphics for election night. Todd Lindeman taught me how to better visualize data and how to really take advantage of Adobe Illustrator. Lakshmi Ketineni and Michelle Chen honed my JavaScript and really taught me SQL and PHP.

Now at ProPublica, my teammates are my mentors. Here is where I learned Ruby on Rails, how news app development really works and how to handle large databases with first ActiveRecord and now ElasticSearch (which I am still working on learning).

What does your personal data journalism “stack” look like? What tools could you not live without?

  • Sublime Text, whose multiple selection feature is the trump card that makes it impossible for me to switch to anything else. If you haven’t used multiple selection, stop what you’re doing and go check it out.
  • The Terminal, for deploying and using Git or just testing out small bits of code in Ruby or Python.
  • Chrome, to debug my code.
  • The Internet, for the answers to all of my questions.

What are the foundational skills that someone needs to practice data journalism?

An insatiable appetite to get to the bottom of something, and the willingness to learn any tool to help you find the answers you’re looking for. In that process, you’ll by necessity learn programming skills, or data analysis skills. Both are important. But without knowing what questions to ask, or what you’re trying to accomplish, neither of those skills will help you.

Where should people who want to learn start?

In terms of programming, just pick a project, make it simple, make it happen and then finish it. Like Jennifer DeWalt did when she made 180 websites in 180 days.

Regarding data analysis, if you’re still in school, take more classes in statistics. If you’re not in school, NICAR offers CAR boot camps, or you can search for materials online, such as this book that teaches statistics to programmers.

Where do you turn to keep your skills updated or learn new things?

I don’t have a frequent cache of websites that I revisit to learn things. I simply figure out what I want to learn, or what problem I’m trying to solve, and use the Internet to find what I need to know.

For example, I’m currently trying to figure out which JavaScript library or game engine can best enable me to create newsgames. I started out knowing close to nothing about the subject. Ten minutes of searching later, I had detailed comparisons between game engines, demos and reviews of gaming JavaScript libraries, as well as wonderful tips from indie game developers for any rookies looking to get started.

What are the biggest challenges that newsrooms face in meeting the demand for people with these skills and experience? Are schools training people properly?

There are two major pipelines for newsrooms to recruit people with these skills. The first is to recruit journalists who have programming and/or data analysis experience. The second is to recruit programmers or data analysts to come into journalism.

The latter, I think, is much harder than the former, though the Knight-Mozilla OpenNews Fellowship is doing a great job of doing this. Schools are getting better at teaching students data journalism skills, but not at a high enough rate. I often see open job positions, but I rarely see students or professionals with the right skills and experience unable to find a job.

The lack of students, however, is a problem that starts before college. When high school students are applying for journalism school, they expect to go into print or radio or TV news. They don’t expect to learn how to code, or practice data analysis. I think one of the largest challenges is how to change this expectation at an earlier stage.

All of that said, I do have one wish that I would like journalism schools to fulfill: I wish that no j-school would ever reinforce or find acceptable, actively or passively, the stereotype that journalists are bad at math. All it takes is one professor who shrugs off a math error to add to this stereotype, to have the idea pass onto one of his or her students. Let’s be clear: Journalists do not come with a math disability.

What data journalism project created by someone else do you most admire?

I actually want to highlight a project called Vax, which was not built by journalists, but deploys the same principles as data journalism and has the same goals of educating the reader.

Vax is a game that teaches students both how epidemics spread, as well as prevention techniques. It was created originally to help students taking a Coursera MOOC on Epidemics really engage with the topic. I think it’s accomplished that in spades. Not only are users hooked right from the beginning, the game allows you to experience for yourself how people are interconnected, and how those who refuse vaccinations affect the process.

How has the environment for doing this kind of work changed in the past five years?

Since I only entered the field three years ago in 2011, all I can say is this: Data journalism is gaining momentum.

Our techniques are becoming more sophisticated and we’re learning from our mistakes. We’re constantly improving, building new tools and making it easier and more accessible to do common tasks. I don’t want to predict anything grand, but I think the environment is only going to get better.

Is data journalism the same thing as computer-assisted reporting or computational journalism? Why or why not?

To me, data journalism has become the umbrella term that includes anyone who works in data, journalism and programming. (And yes, executing functions in Excel or writing SQL queries is both data and programming.)
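Wei’s point that a SQL query is both data and programming is easy to see in miniature: a query is code that operates directly on structured records. Here is a small sketch using Python’s built-in SQLite bindings; the donation table and its figures are invented for illustration.

```python
# A SQL query is both data work and programming: build a tiny table
# in an in-memory SQLite database, then aggregate it with a query.
# All donor names and amounts are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (donor TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO donations VALUES (?, ?)",
    [("Alice", 500), ("Bob", 250), ("Alice", 750)],
)

# Sum each donor's giving -- the kind of question a reporter asks first.
rows = conn.execute(
    "SELECT donor, SUM(amount) FROM donations GROUP BY donor ORDER BY donor"
).fetchall()
print(rows)  # [('Alice', 1250), ('Bob', 250)]
```

The same GROUP BY habit of mind carries over whether the tool is Excel, SQLite, or a full database server.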

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

Philip Meyer, who wrote “Precision Journalism,” answers the first part of this question with his entire book, which I would recommend any aspiring data journalist read immediately. He says:

“Read any of the popular journals of media criticism and you will find a long litany of repeated complaints about modern journalism. It misses important stories, is too dependent on press releases, is easily manipulated by politicians and special interests, and does not communicate what it does know in an effective manner. All of these complaints are justified. Their cause is not so much a lack of energy or talent or dedication to truth, as the critics sometimes imply, but a simple lag in the application of information science — a body of knowledge — to the daunting problems of reporting the news in a time of information overload.”

Data journalism allows journalists to point to the raw data and ask questions, as well as question the very conclusions we are given. It allows us to use social science techniques to illuminate stories that might otherwise be hidden in plain sight.

News apps specifically allow users to search for what’s most relevant to them in a large dataset, and give individual readers the power to discover how a large, national story relates to them. If the story is that doctors have been receiving payments from pharmaceutical companies, news apps let you search to see if your doctor has as well.
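At its core, that kind of lookup is a simple pattern: filter a structured dataset on a reader-supplied field. A minimal sketch of the pattern in Python, with invented records standing in for real payments data:

```python
# Minimal sketch of a news-app-style lookup: filter a structured
# dataset on a reader-supplied search term. All records are invented.
payments = [
    {"doctor": "Jane Smith", "company": "Acme Pharma", "amount": 1200},
    {"doctor": "John Doe", "company": "Beta Biotech", "amount": 450},
    {"doctor": "Jane Smith", "company": "Beta Biotech", "amount": 300},
]

def lookup(records, name):
    """Return every payment whose doctor field matches the query."""
    query = name.strip().lower()
    return [r for r in records if query in r["doctor"].lower()]

matches = lookup(payments, "Jane Smith")
total = sum(r["amount"] for r in matches)
print(f"{len(matches)} payments, ${total} total")  # 2 payments, $1500 total
```

A production news app wraps this same filter in a search form, a database index, and careful presentation, but the reader-facing question is identical: show me the rows about me.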

What’s the one thing people always get wrong when they talk about data journalism?

That it’s new, or just a phase the journalism industry is going through.

Data journalism has been around since the 1970s (if not earlier), and it is not going to go away, because the skills involved are core to being a better journalist, and to making your story relatable to millions of users online.

Just imagine, if a source told you that 2+2=18, would you believe that statement? The more likely scenario is that you’d question your source about why he or she would say something so blatantly wrong, because you know how to do math, and you know that 2+2=4. Analyzing raw data can result in a similar question to a source, except this time you can ask, “Why does your data say X, but you say Y?”

Isn’t that a core skill every journalist should have?

Between the Spreadsheets, Research

Publishers Can Afford Data Journalism, Says ProPublica’s Scott Klein


This winter, Scott Klein made a prediction at the Nieman Lab that drew some attention: “in 2014, you will be scooped by a reporter who knows how to program.” As I noted at this blog, he was proven correct within the month, as enterprising journalists applied their data skills to creating scoops and audiences. Yesterday, The New York Times’ promising new data journalism venture, The Upshot, published its most popular story, confirming that original data and reporting presented with context and a strong narrative remains a powerful, popular combination. It’s still atop the most-emailed, viewed and shared leaderboards this morning.

So, Klein was right. Again.

That’s not a huge surprise to me, nor anyone else in the data journalism world.

Klein, an assistant managing editor at ProPublica, draws on years of hands-on experience working with data, reporters and developers at one of the most important nonprofit news organizations in the world. Recently, his team has published projects like The Opportunity Gap, China’s Memory Hole and Prescriber Checkup. Klein co-founded DocumentCloud with Aron Pilhofer, the New York Times editor whose perspectives on technology and the news we featured here earlier this month. Before he came to ProPublica, Klein directed editorial and business application development, and worked at The New York Times.

This spring, he spoke with me about what he sees in the industry and offered an early read and review of the report that the Tow Center will publish next month. Our conversation, lightly edited for content and clarity, follows.

Is data-driven journalism too expensive?

News organizations are contracting and budgets are going down. Times are still very tough. That said, I suspect that some newsrooms say they can’t afford to hire newsroom developers when they really mean that their budget priorities lie elsewhere – priorities that are set by a senior leadership whose definition of journalism is pretty traditional and often excludes digital-native forms. I also hear a lot from people trying to get data teams started in their own newsrooms that the advice that newsroom leaders get is that newsroom developers are “unicorns” whom they can’t afford. Big IT departments sometimes play a confounding role here.

I suspect many metro papers can actually afford one or two journalist/developers — and there’s a ton of amazing projects a small team can do. For years, the Los Angeles Times ran one of the best news application shops in the country with only two dedicated staffers (they still do great work, of course, and the team has grown). If doing data journalism well is a priority of the organization, making it happen can fit into your budget.

What’s changed today?

Lots, of course, has changed since Philip Meyer’s pioneering days in the 1960s. One is that the amount of data available for us to work with has exploded. Part of this increase is because open government initiatives have caused a ton of great data to be released. Not just through open data portals: getting big data sets via FOIA has become easier, even since ProPublica launched in 2008.

Another big change is that we’ve got the opportunity to present the data itself to readers — that is, not just summarized in a story but as data itself. In the early days of CAR, we gathered and analyzed information to support and guide a narrative story. Data was something to be summarized for the reader in the print story, with, of course, graphics and tables (some quite extensive), but the end goal was typically something recognizable as a words-and-pictures story.

What the Internet added is that it gave us the ability to show to people the actual data and let them look through it for themselves. It’s now possible, through interaction design, to help people navigate their way through a data set just as, through good narrative writing, we’ve always been able to guide people through a complex story.

Is this new state of affairs really different?

It’s a tectonic change both in the sense that it’s slow and gradual, and in the sense that it’s reshaping the entire landscape.

Data was always central to journalism. In the oldest newspapers, from the 17th century, you can find data. Correspondents would write about the prices of commodities in faraway cities (along with court gossip) for the benefit of merchants doing international business. Commodity prices, the contents of arriving cargo ships, and even the names of visiting businessmen were a big part of the daily mission of newspapers as they started to become more common.

As technology got better in the late 18th century and readers started demanding a different kind of information, the data that appeared in newspapers got more sophisticated and was used in new ways. Data became a tool for middle-class people to use to make decisions and not just as facts to deploy in an argument, or information useful to elite business people.

The change we’re experiencing thanks to the web increases the role of presentation of the data itself, both in great data visualization and in great exploratory graphics like news applications. We can show people “the back of the baseball card” on a large scale. We’ve got the tools, and the readers can understand it and make use of it. I feel like that’s as big a change as we’ve ever experienced, but I’m biased.

Do people want to read the data?

If it’s done well, people have a really big appetite to see the data for themselves.

Look how many people understand — and love — incredibly sophisticated and arcane sports statistics. We ought to be able to trust our readers to understand data in other contexts too. If we’ve done our jobs right, most people should be able to go to our “Prescriber Checkup” news application, search for their doctors and see how their prescribing patterns compare to their peers, and understand what’s at play and what to do with the information they find.

There are ways to design data so that more important numbers are bigger and more prominent than less important details. People know to scroll down a Web page for more fine-grained details. At ProPublica, we design things to move readers through levels of abstraction from the most general, national case to the most local example.

Do you recruit programmers to do DDJ? Or teach journalists?

Both. But culture matters a lot, too. People with the right mindset, who feel valued for their editorial judgment and creativity, and who are given real responsibility over their work, will learn whatever they need to learn in order to get a project done. The people on my team focus on telling great journalistic stories and don’t let not knowing how to do something stop them from doing so. They learn whatever skills, techniques and expertise they need to learn.

In terms of journalists learning how to program, I think there are some myths about what “programming” means. It doesn’t have to mean a computer science degree and it doesn’t have to mean what Google does. I know journalists who make incredibly complex scrapers for their reporting work who will tell you they don’t know how to program. Really, making tools to automate tasks is what a programmer does. There’s no magic threshold you have to pass between programmer and not-programmer.
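The kind of task-automating scraper Klein describes can be quite small. As a rough sketch, here is a table scraper built entirely on Python’s standard-library HTMLParser; the inline HTML string stands in for a page a reporter would actually fetch (e.g. with urllib), and the ward figures are invented.

```python
# Minimal sketch of the automation Klein describes: pull rows out of
# an HTML table using only the standard library. The HTML string
# stands in for a fetched page; the figures are invented.
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []        # start a fresh row
        elif tag == "td":
            self._in_cell = True  # start capturing cell text

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

page = ("<table><tr><td>Ward 1</td><td>982</td></tr>"
        "<tr><td>Ward 2</td><td>1,204</td></tr></table>")
scraper = TableScraper()
scraper.feed(page)
print(scraper.rows)  # [['Ward 1', '982'], ['Ward 2', '1,204']]
```

Thirty lines like these, run every morning against an agency’s posted table, are exactly the sort of “tool to automate tasks” that makes a reporter a programmer, whether or not they would claim the title.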

Of course, there is a difference between knowing how to code and being a computer scientist. If you’ve learned about algorithmic efficiency and can express it mathematically, and if you’ve studied how compilers work, all under the guidance of a person who knows the subject very well in an academic environment, you’ve got skills that will help you write better, faster, more efficient code. That’s different than learning how to use a high-level programming language to get a task done.

Much of what we do in newsrooms is on deadline and meant to be put behind a caching system that makes efficient code much less important, so computer science is not a prerequisite for being a great newsroom coder. In newsrooms, most of us rely on frameworks like Rails or Django that already make great low-level programming decisions anyway.

Are there journalists picking those DDJ skills up?

Yes, it’s happening, and the pace is accelerating. A few years ago the NICAR conference was a few hundred people. This year it was almost 1,000 people. Next year, it will be even bigger.

On every desk in the newsroom, reporters are starting to understand that if you don’t know how to understand and manipulate data, someone who can will be faster than you. Can you imagine a sports reporter who doesn’t know what an on-base percentage is? Or doesn’t know how to calculate it himself? You can now ask a version of that question for almost every beat.

There are more and more reporters who want to have their own data and to analyze it themselves. Take for example my colleague, Charlie Ornstein. In addition to being a Pulitzer Prize winner, he’s one of the most sophisticated data reporters anywhere. He pores over new and insanely complex data sets himself. He has hit the edge of Access’s abilities and is switching to SQL Server. His being able to work and find stories inside data independently is hugely important for the work he does.

There will always be a place for great interviewers, or the eagle-eyed reporter who finds an amazing story in a footnote on page 412 of a regulatory disclosure. But here comes another kind of journalist, one with data skills that will sustain whole new branches of reporting.

How It's Made, Research

What’s the Upshot? A promising data-driven approach to the news.


This morning, The New York Times officially launched its long-awaited data-driven news site, “The Upshot.”

David Leonhardt, the site’s managing editor, introduced The Upshot in a long note posted to Facebook and then to NYTimes.com this morning, explaining how the site aspires to help readers navigate the news.

Leonhardt shared two reasons for The Upshot’s launch. First, to help people understand the news better:

“We believe we can help readers get to that level of understanding by writing in a direct, plain-spoken way, the same voice we might use when writing an email to a friend. We’ll be conversational without being dumbed down. We will build on the excellent journalism The New York Times is already producing, by helping readers make connections among different stories and understand how those stories fit together. We will not hesitate to make analytical judgments about why something has happened and what is likely to happen in the future. We’ll tell you how we came to those judgments — and invite you to come to your own conclusions.”

Second, to make the most of the opportunity afforded by the growth of the Internet and the explosion of data creation:

Data-based reporting used to be mostly a tool for investigative journalists who could spend months sorting through reams of statistics to emerge with an exclusive story. But the world now produces so much data, and personal computers can analyze it so quickly, that data-based reporting deserves to be a big part of the daily news cycle.

One of our highest priorities will be unearthing data sets — and analyzing existing ones — in ways that illuminate and explain the news. Our first day of material, both political and economic, should give you a sense of what we hope to do with data. As with our written articles, we aspire to present our data in the clearest, most engaging way possible. A graphic can often accomplish that goal better than prose. Luckily, we work alongside The Times’s graphics department, some of the most talented data-visualization specialists in the country. It’s no accident that the same people who created the interactive dialect quiz, the deficit puzzle and the rent-vs-buy calculator will be working on The Upshot.

The third goal, left unsaid by Leonhardt, is the strategic interest The New York Times has in creating a media entity that generates public interest and draws the massive audience that Nate Silver’s (now departed) FiveThirtyEight blog did, as the 2014 midterm elections draw near. In the fall of 2012, 20% of the visitors to the sixth-most-trafficked website in the world were checking out 538; many came specifically for it.

First impressions

My aesthetic impressions of The Upshot have been overwhelmingly positive: the site looks great on a smartphone, tablet or laptop, and loads quickly. I also like how each columnist’s Twitter handle sits below their headshot, and the smooth integration of social sharing tools.

My impressions of the site’s substance were similarly positive: the site led off with a strong story on the American middle class and income inequality based upon public data, an analysis of affirmative action polling, a data-rich overview of how the environment has changed in the 44 years since the first Earth Day, a look at what good marathons and bad investments have in common, a short item on how some startups are approaching regulated industries, political field notes from Washington and a simple data visualization of Pew Internet data that correlates an appreciation for Internet freedom with Internet use. Whew! The feature that many political junkies will appreciate most, however, is a clever, engaging interactive that forecasts the outcome of the 2014 election in the U.S. Senate.

A commitment to showing their work

What really made me sit up and take notice of The Upshot, however, was the editorial decision to share how they found the income data at LIS, link to the dataset, and share both the methodology behind the forecasting model and the code for it on GitHub. That is precisely the model of open data journalism that embodies the best of the craft, as it is practiced in 2014, and it sets a high standard right out of the gate for future interactives at The Upshot and for other sites that might seek to compete with its predictions. They even include those estimates. Notably, FiveThirtyEight is now practicing a more open form of data journalism as well, “showing their work.”


Early reviews

I’m not alone in my positive first impressions of the content, presentation and strategy of the Times’ new site: over at the Guardian Datablog, James Ball published an interesting analysis of data journalism, as seen through the initial forays of The Upshot, FiveThirtyEight and Vox, the “explanatory journalism” site Ezra Klein, Melissa Bell and Matt Yglesias, among others, launched this spring.

Ball’s whole post is worth reading, particularly with respect to his points about audience, diversity, and personalization, but the part I think is particularly important with respect to data journalism is the one I’ve made above, regarding being open about the difficult, complicated process of reporting on data as a source:

Doing original research on data is hard: it’s the core of scientific analysis, and that’s why academics have to go through peer-review to get their figures, methods and approaches double-checked. Journalism is meant to be about transparency, and so should hold itself to this standard – at the very least.

This standard is especially true for data-driven journalism, but, sadly, it’s not always lived up to: Nate Silver (for understandable reasons) won’t release how his model works, while FiveThirtyEight hasn’t released the figures or work behind some of their most high-profile articles.

That’s a shame, and a missed opportunity: sharing this stuff is good, accountable journalism, and gives the world a chance to find more stories or angles that a writer might have missed.

Counter-intuitively, old media is doing better at this than the startups: The Upshot has released the code driving its forecasting model, as well as the data on its launch inequality article. And the Guardian has at least tried to release the raw data behind its data-driven journalism since our Datablog launched five years ago.

Ball may have contributed to some category confusion by including Vox in his analysis of this new crop of data journalism startups, and he’s not alone: Mathew Ingram also groups Vox together with The Upshot and 538 in his post on “explanatory journalism.”

Both could certainly be forgiven, given Leonhardt’s introduction expressed a goal to help readers understand and Nate Silver’s explicit focus upon explanation as a component of his approach to data-driven journalism. The waters about what to call the product of these startups are considerably muddied at this point.

Hopefully, over time, those semantic waters will clear and reveal accurate, truthful and trustworthy journalism. Whatever we call them, there’s plenty of room for all of these new entrants to thrive, if they inform the public and build audiences.

“I think all of these sites are going to succeed,” said Leonhardt, in an interview with Capital New York. “There is much more demand for this kind of journalism right now than there is supply.”

In an interview with Digiday, Leonhardt further emphasized this view:

“I don’t think this is about a competition between these sites to see which will emerge victorious,” he said. “There is more than enough room for any site that is providing journalism of this kind to succeed. Given there’s a hunger for conversational journalism and database journalism, as long as you’re giving people reporting that’s good, you’re going to succeed.”