Aron Pilhofer on data journalism, culture and going digital

March 27, 2014 by

When it comes to computer-assisted reporting and the ways media companies are using technology, there are few people in the U.S.A. as knowledgeable as Aron Pilhofer. He runs a newsroom team at The New York Times that combines journalism, social media, technology and analytics, co-founded the open source DocumentCloud.org project, is a two-time grantee of the Knight News Challenge, and co-founded Hacks & Hackers, a network of people focused on applying development and digital innovation to the method and practice of journalism. Happily, part of my research into data journalism’s past, present and future has been interviewing people like Pilhofer, given the insight that those talks offer for debugging debates about “what it all means.” He was kind enough to talk to me earlier this month. Our interview follows, lightly edited for clarity and content, with [bracketed] additions for context.

Aron Pilhofer

How do you feel about the term “data journalism” supplanting computer-assisted reporting (CAR)?

When someone can tell me what is meant by “data journalism,” maybe I would start to feel strongly about it.

I think there’s a lack of specificity, with various definitions. It tends to be almost geographically based. In Europe, when you talk about data journalism, you’re almost always talking about data visualization. In the United States, it’s sometimes data visualization, sometimes old school computer-assisted reporting.

While I prefer the term data journalism, because it’s much less goofy [than computer-assisted reporting], I think there’s a lack of precision. You do need to define your terms. The way I see it, it is a continuum where the work that Phil Meyer, Barlett and Steele were doing 30 years ago [continues] all the way to today, with people like Sarah Cohen and John Keefe, all sharing kind of the same elements. You treat data as a source [in your reporting].

What’s happening with the market for data journalists and your ability to hire for these skills?

In some ways it’s easier. In others, it’s harder today. There’s way more competition now. We’re losing people to really good newsrooms. We are not the only game in town, which we used to be. There was a time when there was us and there was the Washington Post, and that was kind of it.

What are you working on now that’s new and potentially important?

We just started a newsroom analytics team. The kinds of projects we’re doing there are entirely editorial. They are not tied to advertising at all.

Right now, many newsrooms are stupid about the way they publish. They’re tied to a legacy model, which means that some of the most impactful journalism will be published online on Saturday afternoon, to go into print on Sunday. You could not pick a time when your audience is less engaged. It will sit on the homepage, and then sit overnight, and then on Sunday a homepage editor will decide it’s been there too long or decide to freshen the page, and move it lower.

I feel strongly, and now there is a growing consensus, that we should make decisions like that based upon data. Who will the audience be for a particular piece of content? Who are they? What do they read? That will lead to a very different approach to being a publishing enterprise.

Knowing our target audience will dictate an entirely different rollout strategy. We will go from a “publish” to a “launch.” It will also lead us in a direction that is inevitable, where we decouple the legacy model from the digital. At what point do you decide that your digital audience is as important as, or more important than, print?

This sounds similar to the approach that many online outlets are pursuing.

There’s not a digital property on the planet that isn’t doing this kind of thing — or a smart one, anyway. Medium has its own metrics. Upworthy has attention minutes that reflect engagement.

[The Interactive News] team can build just about anything now to scale to a ridiculous amount of traffic, tying into every New York Times system. That isn’t the problem anymore. We can make everything work [from a technical standpoint] on David Leonhardt’s project, which is our answer to 538, but it still may not find an audience.

This is a product build, where we take a particular flavor of journalism and find an audience. We find a way for the audience that would want that to find it. It is really hard to think about when you really only know one tune: Your homepage. It is really powerful, but that alone isn’t going to do it. How does that change what we’re building? How can we consistently get that audience to return?

Building one-off interactives isn’t that important. When you’re starting to build persistent features, like what John Keefe has done with his Cicada Project, or Scott Klein has built with Dollars for Docs, you’ve got to think about these things more deeply. Who in the newsroom is better positioned than a data journalist to do that?

How many data journalists do you have on staff at the Times?

It depends on your definition; we could be anywhere from 5 to 50. We have a computer-assisted reporting team, which is 5-6 people. We have a graphics desk, which is probably 15 primarily or largely dedicated to digital. On my team we have 21 developers. Then there’s our research and development department, and design team.

Is there anyone you’d call a computational journalist?

Maybe Chase Davis. Amanda Cox is a statistician by training. Sarah Cohen was a statistician before she went into CAR. We have data scientists on the business side. R&D has a couple, like Mike Dewar, who used to be at Bitly. These are people who are applying data science techniques to actual journalism, stories, infographics and data visualizations.

Would you agree with an estimate of several hundred data journalists currently working in the USA?

Absolutely. NICAR has 850 people registered, with a healthy walkup expected. [The final attendance at NICAR 2014 was 997 people.] Five years ago, the conference was on life support, with maybe 250 people. Now, this number of people showing up has changed it a lot, I think for the better. It has become the “must-go” conference for folks who are doing what my team does, for the John Keefes & Scott Kleins of the world.

Is there a mismatch between the supply and demand for people with the skills you’ve referenced?

It’s true. I have two openings now.

What was your path to the profession?

I was a political reporter, but always used data in my reporting. I just started doing it in college. I just started messing around. I had a history professor who was not well known then. Now, he’s borderline famous from doing quantitative methods in history. He’d do statistical sampling of historical census data that had just been paper records before that. Suddenly, you could do queries on the 1930 Census. You were not just basing a historic analysis on papers or on interviews with people, or what you could glean from anecdotes. You were looking at data. It was incredible.

That’s not that different from what a data journalist does, on the CAR side. Instead of a person, you’re using data as a source. Over time, I shifted from being a reporter who does CAR to being a specialist at the Center for Public Integrity to a CAR editor at the New York Times and then started this team.

How did you start learning to program?

I can thank an IRS story on 527 committees, which were then the campaign finance loophole du jour. They were previously unregulated and Congress, in its wisdom, put the IRS in charge of regulating them. It was idiotic. The IRS is not a disclosure agency. They put together the world’s worst disclosure website. There was basic data there, but you couldn’t aggregate it or access it in a meaningful way. It would have taken thousands of mouse clicks to get all of it.

I talked to a public information officer, after they denied my FOIA request for the database underlying the site. He said it was all on the website.

So, I created the world’s worst Web scraper in PHP. It ran from the browser. I didn’t know the command line well.

Is “Silent Partners” still on the Web?

Parts of it are long gone, though bits remain at the Center for Public Integrity. What you won’t find is the massive searchable database. We did what the IRS should have done. We took all the paper filings and got a grant to do data entry. We sent them to a company in Virginia. We spent $80,000 to create what was then the only searchable database of political donor contributions. It’s completely out of date now. The Center for Responsive Politics has been continuing to do this.

I discovered that I really enjoyed the coding part in addition to reporting. The art of it. That’s how I ended up shifting into my current job.

Have you seen more coders move into data journalism, or journalists learn how to code?

I’ve seen far, far more move from the outside in, from non-journalism roles.

Do you have any sense of why that might be?

Journalism is one of the few professions that not only tolerates general innumeracy but celebrates it. I still hear journalists who are proud of it, even celebrating that they can’t do math, even though programming is about logic. It’s hard to get a journalist to open up a spreadsheet, much less open up a command line. It is just not something that they, in general, hold to be an important skill.

It’s baffling to me. Look at The Sun-Sentinel, which just won another Pulitzer for a story on speeding cops that you could only do with data analysis. You would think you wouldn’t have to make the case that this is core to what journalists should know.

It’s a cultural problem. There is still far too much tolerance for anecdotal evidence as the foundation for news stories.

So this is endemic?

I don’t know how to solve it. Look at NICAR being around as long as it has. Early on, they had the naive belief that if you could train enough people, they could make the organization irrelevant. Now, when you look back, it’s hilarious. Obviously, that’s never going to happen for practical purposes. I don’t think we’re anywhere near the point where you could say, given enough training and time, that you would not need a specialist in the newsroom. We’re so far away from that.

Are there cultures where this is changing? Maybe ProPublica?

It’s as far along as it is because of Scott Klein. It took years before they put Jeff Larson on news stories. That just happened this year. There are newsrooms making this a significant project. Look at the L.A. Times, or WNYC. I think John Keefe is a fricking genius. I wish I were doing the work he is.

What others would you highlight?

Check out Nicolas Kayser-Bril [of Journalism++] and De Correspondent, out of the Netherlands.

Given time, given urgency, we will forge something new from the old models. Given how much time we have had, I would have hoped we’d be further along. Maybe I’m just impatient. When do you treat digital as your primary platform?

We are launching three subscription products this year. If all goes well, we will have more subscriptions on pure digital than in print [at the end of 2014]. We have to think about where the eyeballs are. From the perspective of the newsroom, over time, we have to think primarily digital. That’s the cultural change that isn’t happening fast enough.

There needs to be a strategy, where all the things we considered “nice to have” in a newsroom (from analytics to coders to designers) are, all of a sudden, building our core product. Text only takes you a certain distance in the digital sphere. That’s the part that I’m excited about building.

[Image Credit: Knight Foundation]

Comments

Mindy McAdams Apr 01, 2014
Thanks, Alexander and Aron. This is one of the best articles about data journalism that I have read. I think it would be a great introduction for journalism students. Let's start a conversation about the fact of journalists "celebrating that they can’t do math, even though programming is about logic." How wrong is that? Yet it always gets a jolly laugh in a roomful of journalists when someone says, "I became a journalist because I'm bad at math." Damn it, that is just wrong. It should change.
Daniel Cave Mar 29, 2014
Aron/Alexander, How is the change/improvement in technology changing journalism? There are several tools available today for journalists to mine data without needing to be able to program. Being able to count to 10 is about as much math as you need these days, to grab data from the web.
