NPR news app team experiments with making data-driven public media with the public
News applications, or ‘apps’ as they are commonly known, rely on data. They increasingly give mobile users better ways to understand the world they’re moving through, from general topics like news, weather and traffic down to Little League baseball scores. In the past, journalists obtained the raw data behind these applications by requesting it and gathering it from public sources, in libraries or courthouse basements.
Today, journalists have many more options. Newsroom reporters and developers can download, scrape and digitize data from a wealth of sources. In the future, journalists will create it themselves and look to their distributed audiences of readers, listeners and watchers to help gather it for them.
In the most forward-thinking newsrooms, this is already happening. Earlier this year, WNYC asked its listeners to help it track the emergence of cicadas with inexpensive sensors. This week, the NPR news applications team released a project around accessible playgrounds. The NPR team made a request of its community of listeners and readers: help public media collect the data that drives it and make the resource better for everyone. The NPR web app, which enables parents and children to search for accessible playgrounds, takes a model commonly used for consumer recommendation engines and adds a strong public-service element.
“This is sort of like Yelp, except for playgrounds for kids with special needs,” said Brian Boyer, the head of NPR’s news applications team, in an interview. “It is the first-of-its-kind, nationwide database of playgrounds that are well suited to kids in wheelchairs, kids with autism, or kids with other special needs.”
As Robert Benincasa reported for NPR, changes to federal requirements now define playground accessibility as a civil right, which has resulted in more places to play for kids with special needs. Every playground built or altered after March 14, 2013, has to be wheelchair-friendly and support children with physical challenges. The challenge their parents face, however, lies in knowing where those playgrounds are located in the neighborhoods and cityscapes around them.
Previously, while parents could turn to informal networks of friends, schools, advocacy organizations or government websites for information about accessible playgrounds, finding a place to play wasn’t easy. That’s where NPR’s investigative reporters and news app team saw an opportunity, and they worked together over the course of two months to make it real.
“Robert, the reporter on this, was looking into the issue of playgrounds and ADA [Americans with Disabilities Act] compliance nationwide,” said Boyer.
“He had this chunk of data that he’d been gathering, came to us and said, ‘Hey, I’ve got this list that we’ve been cleaning up. It’s certainly not everything, but it’s something, and it could be useful to folks.’ We said that sounds incredible. He asked us to build a website. There are people in the news who would have said ‘This isn’t complete, we can’t publish it.’ Much to the credit of our editors and the investigative team, they said that’s OK. Let’s build something instead that creates the whole dataset, that is both a guide and a way to create a much better guide.”
Given the range of news bureaus around the United States, NPR is perhaps better placed than any other national news organization to engage its audience in extending its reporting. With a mission and vision focused on “creating a more informed public,” directly involving the public in creating data for the public is a natural next step for an organization that radio defined decades ago, as it goes digital in the 21st century.
This data is “horribly incomplete,” said Boyer. “We know that, and we’re OK with that, because we’re asking our audience, and hopefully everybody, to contribute. We’re hoping they’ll walk down the street, look at their playground, and see if it’s got some of these features that we’re looking for — and if it does, we’re hoping that people will edit the playground or add the playground to the database to help everybody else.”
If NPR can turn its audience into active participants in data collection, much as Audubon’s Christmas Bird Count and eBird crowdsource data on bird species, it will have created both a notable case study in the power of public engagement and an important public database that government itself can consult. There are decades of precedent for a listening or viewing audience collaborating with a media organization to collect images, videos or stories. What remains relatively new in 2013 is the capacity of a networked populace to contribute data, whether from sensors monitoring droughts or Geiger counters near potential sources of radiation. If turning data into stories is now a core element of investigative journalism, NPR’s news applications team is showing how to do it better and serve the public in the process.
Interview with Brian Boyer:
Boyer: You can search for your town and see what playgrounds are near you. If you know about a playground near you that isn’t listed, there’s a very simple button, “Add a Playground,” and it takes just a moment. You can use our little map to pinpoint where it is and tell us about the playground.
There are basically no required fields. There’s a lot of data we could be gathering but people don’t often even know the name of their playground, right down the street, and they certainly don’t know what organization built it. The mapping element is built so that you don’t need to know the street address, because who knows the street address of a playground?
The mapping is built sort of like how you request a car with Uber, where you drag a little dot around and place it. We tried to make adding a playground extremely low-overhead.
How did your team make an intuitive mobile user interface for this data?
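To make the "no required fields, just drop a pin" idea concrete, here is a minimal sketch of a contribution record whose only mandatory data are the coordinates from the dragged pin, plus a "what's near me" lookup using the standard haversine great-circle formula. The `Playground` class, its field names and the `nearby` helper are hypothetical illustrations, not NPR's actual code or data model.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Playground:
    # Hypothetical record: the coordinates come from the pin the contributor
    # drags on the map; everything else is optional, since people rarely know
    # a playground's name, address or builder.
    latitude: float
    longitude: float
    name: Optional[str] = None
    agency: Optional[str] = None
    features: set[str] = field(default_factory=set)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(playgrounds, lat, lon, radius_km=5.0):
    """Return playgrounds within radius_km of (lat, lon), closest first."""
    scored = [(haversine_km(lat, lon, p.latitude, p.longitude), p)
              for p in playgrounds]
    within = [(d, p) for d, p in scored if d <= radius_km]
    within.sort(key=lambda pair: pair[0])  # sort by distance only
    return [p for _, p in within]
```

A linear scan like this is fine for a dataset of a few thousand playgrounds; a much larger dataset would call for a spatial index instead.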
Boyer: We take a user-centric approach to design. We try to put ourselves in our audience’s shoes. We would ask folks, “Do you know the address of your playground?” and they said, “no.” There isn’t address information on playground signage, right? It quickly became obvious that we would need a non-address-oriented location thing.
We’re all avid app users. Someone brought up the Uber idea, and it was perfect. What’s technically interesting about how this website works is that it is [lightweight]. One thing we like to do is build applications without servers. This whole thing is deployed to [Amazon] S3 as flat files. When people add or edit a playground, our servers pick up the updates once a night, apply them, and then go back to sleep. It’s extremely scalable. There’s no way that this website will ever go down.
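The nightly cycle Boyer describes (collect the day's edits, apply them, publish static files, sleep) can be sketched as a pure merge step followed by a flat-file write. This is a hedged illustration under assumed names: `apply_nightly_updates`, `publish_flat_files`, the edit format and the `playgrounds.json` filename are all hypothetical, and the actual upload to S3 is left out.

```python
import json
from pathlib import Path


def apply_nightly_updates(dataset, pending_edits):
    """Merge the day's queued edits into the dataset and return a new copy.

    dataset: dict mapping playground id (a string, so it round-trips through
    JSON) to a dict of fields.
    pending_edits: list of {"id": ..., "fields": {...}} in submission order.
    An id not yet in the dataset is treated as a newly added playground.
    """
    updated = {pid: dict(record) for pid, record in dataset.items()}
    for edit in pending_edits:
        record = updated.setdefault(edit["id"], {})
        record.update(edit["fields"])  # later edits overwrite earlier ones
    return updated


def publish_flat_files(dataset, out_dir):
    """Write the whole dataset as one static JSON file, ready for upload."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "playgrounds.json"
    path.write_text(json.dumps(dataset, indent=2, sort_keys=True))
    return path
```

Because readers only ever fetch prebuilt files, traffic spikes hit the static host rather than an application server, which is the scalability argument made above.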
What did it take to build this, in terms of resources and time?
Boyer: It’s the largest project that we’ve done as a team in the last year, with the exception of the election last November. For two months, it was pretty much the whole team heads down: two people you’d most characterize as software developers, two people you’d most characterize as designers (although our designers are coders as well), and Matt Stiles, our resident reporter, who worked on gathering data, cleaning up data and contacting our sources. He worked with a reporter on the investigative team who’d initially gathered the seed data.
Like all of our projects, this was an iterative process. We worked in one-week cycles, showing our work every week to the investigative team, constantly changing it and adjusting the language, and then doing what I think Joel Spolsky called “hallway usability testing.” We’d grab people by the arm as they walked by and ask them to try it out. A lot of people who work at NPR headquarters have kids, so we had plenty of potential users right in the building with whom we could test the messages and the interactions.
On the technical side, it uses our app template, which is a baseline that we’ve built for all of our projects. It makes the first 90% of a project really easy, so that we can focus on the top 10% and the user experience rather than worrying about the servers and deployment.
If you visit NPRApps on GitHub, can you see the code and adopt it?
Boyer: Yes and no. We’ve been going back and forth with our lawyers about open sourcing our work. Everybody is really into it, but the problem is that there is some stuff in this application that can’t be open sourced. The words we write, NPR is not interested in giving away. The photographs in the app, we can’t open source. It creates a very complicated problem where we would need to completely decouple what you might call the “content” from the code (even though, in my mind, code is content), and decoupling that is such an onerous task that it would add a month of work. We’ve been struggling with this as a team for the last year, trying to figure out what, exactly, is the right thing to do.
So, if you look on our GitHub account, you will find the source code there, with a copyright statement that says “Copyright NPR 2013, All Rights Reserved,” etc., and then a note that says “if you’d like to use this, just shoot us an email.” That’s not my ideal scenario, but it’s the best thing we can do. If we receive that email, we’ll tell them: “Sounds good: grab the code, fork it. You can’t use our name, you can’t use the words, you can’t use the pictures, but we’d be delighted if you use the platform, and we’ll give you permission.” It’s messy. It’s not a library, decoupled from the work that we do.
At the same time, I think it’s important to have the code out there, so that people can see what we’re doing, people can copy what we’re doing, and borrow ideas. One of the big parts of this is about educating our member stations, folks within our community, and educating new journalism students, and other interested parties.
Our work is copyrighted, but it does have a very good README that explains in great detail exactly how to set this thing up and get it running on your own machine. What we hope is that people will grab our stuff and learn from it. If someone wants to create an accessible playground machine in Britain, just shoot us an email and we’ll tell you how.
So what can be adapted and adopted here?
Boyer: We feel that the most reusable asset that we’ve created here is the dataset, which we’ve gone to great lengths to give away and make usable. If you were gonna take this code and build this thing for Britain, it would be really different. Street addresses are just different in the United Kingdom.
If you make a project generic enough to be readily used anywhere, you’ve built Drupal. We’re not trying to do that. We’re trying to build tight, focused code that does exactly what we need it to do, both for our audience’s benefit and because we have deadlines. We build the code specifically for each of our projects. We understand that the code isn’t something that you can just pick up and reuse. You have to make something different for a different story or audience.
NPR’s apps team is taking a daring approach to data collection: they’re trusting their audience. Is that risky?
Boyer: We’re extremely optimistic about edits. We’ve found, when we’ve done projects where we’ve asked our audience to contribute, that no one’s rude. People are generally pretty nice. By default, the site accepts all edits.
What we’ve done is create a daily newsletter that goes out to the editors here who care about the site. They can eyeball the edits to make sure they look reasonable, as opposed to a process of editorial approval. How do you approve an edit for a playground in South Dakota you’ve never been to? The approval doesn’t matter: you don’t know. We can’t know. We’re trusting our audience to be friendly, responsible and helpful. In other efforts, in projects like this, it turns out that people are totally cool about that.
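The "review, don't approve" workflow amounts to a periodic report over edits that are already live. Below is a minimal sketch of such a digest, assuming a hypothetical edit log of dicts with `playground`, `fields` and `at` keys; NPR's actual newsletter format is not described in the source.

```python
from datetime import datetime, timedelta


def daily_digest(edits, now, window=timedelta(days=1)):
    """Build a plain-text summary of edits made within `window` before `now`.

    edits: list of dicts with hypothetical keys "playground" (a label),
    "fields" (dict of changed values) and "at" (datetime of the edit).
    The edits are already published; the digest only lets editors skim them.
    """
    cutoff = now - window
    recent = [e for e in edits if e["at"] >= cutoff]
    lines = [f"{len(recent)} edit(s) since {cutoff:%Y-%m-%d %H:%M}"]
    for e in sorted(recent, key=lambda e: e["at"]):
        changed = ", ".join(f"{k}={v!r}" for k, v in sorted(e["fields"].items()))
        lines.append(f"- {e['playground']}: {changed}")
    return "\n".join(lines)
```

Nothing here blocks an edit from going live; the digest only makes the day's changes easy to scan, which matches the "eyeball, don't approve" approach.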
Still, letting edits go live without approval seems a bit daring. Are NPR listeners a different slice of the online public?
Boyer: Maybe. Maybe it’s just the questions we’re asking. We did do a project about the inauguration, where we asked folks to take a picture of themselves holding a sign with a message to the president. We decided to be optimistic about it and said, “OK, we’re going to publish these signs the second someone submits them,” as opposed to running them through an editorial filter. With maybe two exceptions, even people who had negative messages were still very thoughtful about it. I do think that’s probably a product of the NPR audience, but also the NPR brand. We had a lot of people who aren’t necessarily NPR listeners contributing to the website.
I don’t think people are going to be jerks. It’s for kids with special needs. We don’t need a “jerk filter.” We’re following the Shirky model: build a system where it’s harder to vandalize than it is to clean up, which is what Shirky says about Wikipedia. The idea is that it’s easier to tidy it up than to screw it up, which we’ve found to be a pretty good guiding principle.
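One way to make cleanup cheaper than vandalism, in the spirit of the wiki model Boyer cites, is an append-only revision history: every edit goes live at once, and undoing a bad one is a single call that republishes an earlier revision. The `PlaygroundHistory` class below is a hypothetical sketch of that idea, not a description of how NPR's app stores edits.

```python
from dataclasses import dataclass, field


@dataclass
class PlaygroundHistory:
    """Append-only revision log for one playground record (hypothetical)."""
    revisions: list[dict] = field(default_factory=list)

    def current(self):
        # The newest revision is what the site shows.
        return self.revisions[-1] if self.revisions else {}

    def edit(self, changes):
        # Edits go live immediately: no approval step, just a new revision
        # built from the current state plus the submitted changes.
        new = {**self.current(), **changes}
        self.revisions.append(new)
        return new

    def revert(self, to_index):
        # Cleanup is one call: republish an earlier revision as the newest.
        # The bad edit stays in the log, so nothing is silently lost.
        restored = dict(self.revisions[to_index])
        self.revisions.append(restored)
        return restored
```

A vandal has to type something out; an editor only has to pick a good earlier revision, so tidying up takes less effort than making a mess.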
How did you find all the data?
Boyer: The data was gathered from a handful of different sources, including Mara Kaplan, who gave us the seed. She’s an activist who’s been gathering information about accessible playgrounds and runs a website that has some of this data.
We contacted a big mailing list of parks administrators, and the people who run it let us send a pitch to everybody asking them to help us out. New York City has a really great parks website, so I think we downloaded the data there instead of scraping it. Some of it was piecemeal: some of it we gathered, and some of it was us calling folks. We gathered data from other states and municipalities that had made it available online. And then folks from different parks districts all around the country contributed data.
We didn’t file any FOIA requests, mostly because the turnaround time would have been too long for our schedule. We’re hoping that folks will say, hey, that’s a good idea: we should just give you this stuff without making you jump through the FOIA hoop. Immediately after we launch, we’re going to try to contact all of the people we tried to contact before and say that we built this thing, here it is, do you want to contribute now?
We’re hoping that, immediately after we launch, we’ll see the number of playgrounds in the app double or triple very quickly*. People, when they see the example, when they see it done, will get it. There are people who probably didn’t want to give us data because they thought we were going to do a takedown piece, and this isn’t a takedown piece. I can imagine municipalities, seeing a journalism organization call and ask them for this kind of information without a FOIA in hand, thinking that they didn’t have to respond. I think we’ll see a lot of new data from people contributing and a lot more from agencies who see what we’ve done and think it’s cool.
*Postscript: In the first 48 hours after the app was launched, data for 336 more playgrounds was added to the database, for a total of 1,293 to date.
Where does the data live?
Boyer: It’s on the home page of the app. Just scroll down near the bottom and look for the links. You can download it as CSV or JSON, our two favorite formats. The JSON file includes some summary information, just some basic stats about how many playgrounds are in there and that sort of thing. It’s just downloadable files; there’s nothing fancy about it. There’s no API. It didn’t seem necessary. We didn’t want to build something that people were going to build a live application on; that seemed heavyweight and low-value. The data has latitude and longitude and other geodata. All of the information that is in the app is in the data, from the agency that owns the playground to the features it has.
One feature that we haven’t built yet — that I hope we’ll be adding when we update it later — is a way to search by feature. It seems sort of obvious, but let’s say you’ve got a child with a certain special need, you’d want to not just search for all playgrounds but search for playgrounds with a specific feature. The reason we didn’t build that in is because we don’t have that in the data yet. We are relying on our audience to go and tag these playgrounds. The playgrounds are often in there but they don’t have that sort of coding. Hopefully, as people annotate the website, we’ll have the data to support the feature and then we’ll build it.
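Because the downloads are plain files rather than an API, a reader can work with them using nothing but the standard library. The sketch below loads a CSV export, produces basic counts of the kind the JSON summary is said to contain, and shows how a search-by-feature filter could work once playgrounds are tagged. The column names (`name`, `latitude`, `longitude`, `agency`, `features`) and the semicolon-separated feature encoding are assumptions for illustration; check the header row of the real file before relying on them.

```python
import csv
import io
from collections import Counter


def load_playgrounds(csv_text):
    """Parse a CSV export into dicts with float coordinates and a feature set.

    Assumed (hypothetical) columns: name, latitude, longitude, agency,
    features, with features stored as a semicolon-separated list.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        raw = row.get("features") or ""  # missing or empty column -> no tags
        row["features"] = {f.strip() for f in raw.split(";") if f.strip()}
        rows.append(row)
    return rows


def summarize(rows):
    """Basic stats: total count and playgrounds per owning agency."""
    return {
        "total": len(rows),
        "by_agency": dict(Counter(r.get("agency") or "unknown" for r in rows)),
    }


def with_features(rows, required):
    """Return playgrounds tagged with every feature in `required`.

    Untagged playgrounds never match, which is why this search only becomes
    useful after the audience annotates the data.
    """
    required = set(required)
    return [r for r in rows if required <= r["features"]]
```

To read a downloaded file rather than a string, open it with `open(path, newline="", encoding="utf-8")` and pass its `.read()` result to `load_playgrounds`.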
Can you talk more about how, as a public media organization, you’re creating a public database with the public?
Boyer: We’ve asked all of these different groups to contribute data. We’re going to continue asking them to contribute data after we launch. We’re also giving all the data away. We’ve written the screen scrapers, we’ve made the phone calls, and we’ve requested the data: we’d be jerks if you had to scrape our website to do something different with this data.
Part of the site offers a complete data download, so you can use this information in a different way or build your website out of it. Giving the data away and having the code be public is important. There are reasons you want to show your work and be transparent about your methods.
Our new catchphrase at NPR is that it’s important to “work in public.” We are a public resource. We are public media. We are of the people. Being proprietary about things? That’s not public. This whole public media thing is a collaboration with our audience. This project should be just as collaborative.