Robot Journalism and Relative Size

Photo credit: Buzzworthy

At the recent International Journalism Festival in Perugia, Italy, I spoke on a Tow-sponsored panel titled “Can a Robot Do My Job?” with Emily Bell, Justin Myers, Andreas Graefe, and Nicholas Diakopoulos. We discussed the ways that automated reporting and artificial intelligence can help journalists uncover story ideas (me), increase algorithmic accountability (Diakopoulos), and automatically generate stories based on structured data (Graefe and Myers).

If you’re interested in whether robots will take over journalists’ jobs in the near future, you can watch the video recording of our panel.

The short answer to this question is: no, a robot won’t steal your journalism job.

I do love the image of a room full of robots in fedoras typing away and writing stories. But in reality, when we use automation to write stories in the newsroom, we don’t use anything that looks like WALL-E. We use software that takes structured data (like earnings reports or sports scores) and combines it with a pre-written outline to form a short journalistic work.

The Associated Press (AP) uses natural language generation software to evaluate the data and “decide” which sentences fit it. For example, a 2014 report on the quarterly earnings of the McCormick spice company read: “The results topped Wall Street expectations. The average estimate of analysts surveyed by Zacks Investment Research was for earnings of 81 cents per share.” If the earnings had been down, the report might have read something like, “The results fell below Wall Street expectations.”

After the panel, Graefe raised a problem that has come up a number of times in his work on automation: how do you define precisely what counts as a small change? In earnings reports, there can be significant market implications to writing “results fell below Wall Street expectations” versus “results fell far below Wall Street expectations” or “results fell slightly below Wall Street expectations.”
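Graefe’s question can be made concrete with a small sketch. The Python function below maps an earnings surprise onto “slightly,” unqualified, or “far” phrasing. Everything in it is a hypothetical assumption for illustration: the function name, measuring the surprise as a percentage of the analyst estimate, and the 2% and 10% cutoffs. It is not the AP’s software or its rules.

```python
def describe_vs_expectations(actual: float, estimate: float,
                             slight: float = 0.02, far: float = 0.10) -> str:
    """Choose a sentence comparing actual earnings per share to the analyst estimate.

    Hypothetical rule: a relative miss or beat smaller than `slight` (2%) is
    "slightly", one larger than `far` (10%) is "far", anything between is unqualified.
    """
    if estimate == 0:
        raise ValueError("estimate must be nonzero to compute a relative change")
    # Relative surprise: positive means the company beat the estimate.
    change = (actual - estimate) / abs(estimate)
    if change == 0:
        return "The results matched Wall Street expectations."
    size = abs(change)
    if size < slight:
        qualifier = "slightly "
    elif size > far:
        qualifier = "far "
    else:
        qualifier = ""
    # Pick the template branch by direction, then fill in the size qualifier.
    if change > 0:
        return f"The results came in {qualifier}above Wall Street expectations."
    return f"The results fell {qualifier}below Wall Street expectations."


# Estimate of 81 cents per share, as in the McCormick example above.
for eps in (0.80, 0.85, 0.70):
    print(describe_vs_expectations(eps, 0.81))
```

With these cutoffs, earnings of 80 cents read as “fell slightly below,” 85 cents as “came in above,” and 70 cents as “fell far below.” Moving either cutoff changes the wording of the story, which is exactly the editorial decision that has to be made in advance.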

As news automation editor at the AP, Myers deals with this issue quite a bit. He oversees the software that automatically writes thousands of stories for the AP. The AP primarily uses the software to generate stories about earnings reports, and it plans to start using it for routine sports stories in the near future. I talked with Myers about how reporters can deal with the issue of relative size in automated journalism. Our edited conversation follows below.

Meredith Broussard: Humans rarely think consciously about relative size. We know what is small and what is large because of context. And there can be two different takes on the same story, as when two headlines frame the same economic news differently. But in the world of automated news, if we are writing an algorithm to write the news, we need to decide in advance what counts as “big” and what counts as “small.” How do you at the AP decide whether a change is large or small?

Justin Myers: It depends on the story. When I am working on these projects, my overall process is to talk to reporters a lot and figure out how they make these decisions themselves. They already make them, but they may not be aware of the rules they are unconsciously applying. I try to tease these rules out by having an honest discussion. If there is a lot of existing content to look back at, I pull articles from the past and see whether I can find a pattern. I often do this before I talk to reporters, partly to have a starting point for the discussion.

MB: Is it time to start thinking about standards for automated news writing, so everyone who writes an automated writing program uses the same measure for change?

JM: Yes, and not necessarily. We absolutely should be having standards discussions around algorithmic writing. Our standards editor, Tom Kent, wrote a Medium piece last year with an ethical checklist. I don’t think we need to get as specific as requiring all programmers to use the same standard of change. Part of it, as you mentioned earlier, is context. It depends on the situation, the news organization, and the topic. It should be at the discretion of an individual news organization to determine what makes the most sense for its audience.

We are absolutely having standards discussions at the AP.

MB: Will the standards evolve over time?

JM: Yes, but I don’t think that’s unique to automation. Journalism’s standards and techniques have been evolving for decades, even centuries. As our reporters, readers, and customers get more familiar with automation technology, the standards will evolve. The conversation will continue for a long time.

MB: I like this idea of evolving standards that change along with human constituencies. Often, we imagine that a computer program is written once, and runs everywhere forever. But really, software needs to change as society changes.

JM: Yes. Especially as we think of these systems as extensions of our human newsgathering and reporting capabilities. The humans are certainly changing. Why shouldn’t the computers be changing as well?

MB: At the panel, you spoke briefly about the areas of newsgathering and reporting that are ripe for automation. Can you say more about which kinds of news processes are good targets for automation?

JM: The two things I look for are good-quality structured data that arrives reliably and regularly, and high volume. Is this something that we do a lot, or would want to do a lot? If you write a system that saves two hours a month but takes two weeks to build, it will take a very long time to recoup that effort. I do sometimes build systems like that anyway, because they can set us up for larger projects down the road.
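The trade-off Myers describes is simple break-even arithmetic. The Python sketch below assumes that “two weeks” means two 40-hour work weeks (80 hours); that figure and the function name are illustrative assumptions, not anything the AP uses.

```python
import math


def months_to_break_even(build_hours: float, hours_saved_per_month: float) -> int:
    """Whole months of use before the time saved covers the time spent building."""
    if hours_saved_per_month <= 0:
        raise ValueError("the tool must save some time each month")
    return math.ceil(build_hours / hours_saved_per_month)


# Assumption: "two weeks" of development means two 40-hour work weeks.
BUILD_HOURS = 2 * 40

print(months_to_break_even(BUILD_HOURS, 2))   # low-volume task
print(months_to_break_even(BUILD_HOURS, 20))  # hypothetical high-volume task
```

At two hours saved per month, the tool pays for itself only after 40 months, more than three years. A task done often enough to save a hypothetical 20 hours a month would break even in 4 months, which is why volume weighs so heavily in choosing what to automate.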

Content that we create frequently, or processes that people in newsrooms perform more frequently, are more interesting opportunities.

By and large, when people say “automated writing” or “robot reporting,” they mean automatically generated text. But automation can apply to any repetitive task in a newsroom, including non-text content like video or audio. I would extend it further to other parts of the newsroom, like production processes. One area I’ve been working on in the past year is reporting processes: sifting through the data we get regularly to help reporters identify more quickly what in that data is newsworthy. That saves them time.

Reporting alerts [automated notifications of notable events like earthquakes or stock fluctuations] are also interesting because they help start conversations about what we talked about earlier: which changes are big, small, or newsworthy. This gives us a chance to evaluate in real time: Did this work the way we thought it would? That’s easier when your audience is a small group of reporters and not thousands or millions of people.

MB: The AP is currently using automated writing to produce earnings reports, and is about to use the software to produce sports content. What is the audience like for these stories?

JM: Because of the nature of the AP, we are always looking at the needs of a wide variety of customers: newspapers, broadcasters, websites. We try to come up with content that works for as many organizations as possible. Many of our earnings reports run on Yahoo Finance, but they’re also important for publications that have a more specific focus. The automated system lets us personalize. You have a better chance of getting an article that is specific to your area and your audience, because we have more individual pieces of content for you to pick from in the first place.

We produce more than 4,000 earnings reports quarterly using this software. We’ll see more variety with sports as well. We’re going to cover games that we can’t attend. If you are a customer in a smaller market where we can’t send a reporter to cover games, there will actually be something on the AP sports wire for you to run in your news outlet. You will have more options.