The Anatomy of a Robot Journalist

June 12, 2014 by

Given that an entire afternoon was dedicated to a “Robot Journalism Bootcamp” at the Global Editors Network Summit this week, it’s probably safe to say that automated journalism has finally gone mainstream. (Hey, it’s only taken close to 40 years since the first story-writing algorithm was created at Yale.) But there are still lots of ethical questions and debates that we need to sort out, from source transparency to corrections policies for bots. Part of that hinges on exactly how these auto-writing algorithms work: What are their limitations, and how might we design them to be more sensitive to journalistic values?

Despite the proprietary nature of most robot journalists, the great thing about patents is that they’re public. And patents have already been granted to several major players in the robo-journalism space, including Narrative Science, Automated Insights, and Yseop, making their algorithms just a little bit less opaque. More patents are in the pipeline from both heavyweights like CBS Interactive and start-ups like Fantasy Journalist. So how does a robo-writer from Narrative Science really work?

Every robot journalist first needs to ingest a bunch of data. Data-rich domains like weather were some of the first to have practical natural language generation systems. Now we’re seeing a lot of robot journalism applied to sports and finance, domains where the data can be standardized and made fairly clean. The development of sensor journalism may provide entirely new troves of data for producing automated stories. The key here is having clean and comprehensive data, so if you’re working in a domain that’s still stuck with PDFs or sparse access, the robots haven’t gotten there yet.
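To make “clean and comprehensive” a bit more concrete, here is a minimal sketch of an ingestion step, assuming a hypothetical CSV box score with `player`, `team`, `hits`, and `at_bats` columns. None of these names come from the patents; the point is just that rows with missing or non-numeric values get set aside rather than silently fed to the writer.

```python
import csv
import io

# Hypothetical box-score schema, invented for illustration.
REQUIRED = ("player", "team", "hits", "at_bats")


def ingest(csv_text: str) -> tuple[list[dict], list[int]]:
    """Parse box-score rows; return clean records and the line numbers of rejected rows."""
    clean, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            # Missing trailing fields come back as None from DictReader.
            if any(not (row.get(f) or "").strip() for f in REQUIRED):
                raise ValueError("missing field")
            clean.append({
                "player": row["player"].strip(),
                "team": row["team"].strip(),
                "hits": int(row["hits"]),        # raises ValueError on non-numeric data
                "at_bats": int(row["at_bats"]),
            })
        except ValueError:
            rejected.append(line_no)
    return clean, rejected
```

A row like `Lee,Home,two,3` ends up in the rejected list, which is exactly the kind of messiness that keeps robots out of PDF-bound domains.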

After the data is read in, the next step is to compute interesting or newsworthy features from it. Basically, the algorithm is trying to figure out the most critical aspects of an event, like a sports game. It has newsworthiness criteria built into its statistics. For example, it looks for surprising statistical deviations like minimums, maximums, or outliers; big swings and changes in a value; violations of an expectation; a threshold being crossed; or a substantial change in a predictive model. “Any feature the value of which deviates significantly from prior expectation, whether the source of that expectation is due to a local computation or from an external source, is interesting by virtue of that deviation from expectation,” the Narrative Science patent reads. So for a baseball game the algorithm computes “win probability” after every play. If win probability changes sharply between two plays, it probably means something important just happened, and the algorithm puts that play on a list of events that might be worthy of inclusion in the final story.
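Here’s a rough sketch of two of those deviation checks. It assumes win probabilities have already been computed by some model (the sketch doesn’t build one), and the 0.15 swing threshold and the 2-standard-deviation outlier cutoff are arbitrary choices for illustration, not values from the Narrative Science patent. In the outlier check, the “prior expectation” is just the mean of the same data, i.e. a local computation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Play:
    description: str
    home_win_prob: float  # home team's win probability after this play, 0.0 to 1.0


def flag_big_swings(plays: list[Play], threshold: float = 0.15) -> list[tuple[Play, float]]:
    """Flag plays whose win-probability change since the previous play meets the threshold."""
    flagged = []
    for prev, curr in zip(plays, plays[1:]):
        delta = curr.home_win_prob - prev.home_win_prob
        if abs(delta) >= threshold:
            flagged.append((curr, delta))
    return flagged


def flag_outliers(values: dict[str, float], z: float = 2.0) -> list[str]:
    """Flag keys whose value lies more than z population standard deviations from the mean."""
    vals = list(values.values())
    mu, sigma = mean(vals), pstdev(vals)
    if sigma == 0:
        return []  # every value is identical, so nothing deviates
    return [k for k, v in values.items() if abs(v - mu) / sigma > z]
```

With win probabilities of 0.50, 0.52, 0.48, 0.75, 0.70, and 0.30 across six plays, only the fourth and sixth plays cross the 0.15 threshold, so those are the candidates for the story.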

Once some interesting features have been identified, angles are selected from a pre-authored library. Angles are explanatory or narrative structures that provide coherence to the overall story; basically, they are patterns of events, circumstances, entities, and their features. An angle for a sports story might be “back-and-forth horserace”, “heroic individual performance”, “strong team effort”, or “came out of a slump”. Certain angles are triggered by the presence of certain derived features from the previous step. Each angle is given an importance value from 1 to 10, which is then used to rank it against all of the other proposed angles.
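A tiny angle library might look something like this. The angle names are the examples above; the trigger conditions, feature names (`lead_changes`, `top_player_share`, and so on), and importance scores are all made up for the sketch. Each angle pairs a trigger predicate over the derived features with a 1 to 10 importance, and the triggered angles are then ranked by that importance.

```python
from dataclasses import dataclass
from typing import Callable

Features = dict[str, float]


@dataclass
class Angle:
    name: str
    importance: int                       # 1 (minor) to 10 (most important)
    trigger: Callable[[Features], bool]   # fires when the derived features match the pattern


# Hypothetical library: trigger rules and scores are illustrative only.
ANGLE_LIBRARY = [
    Angle("back-and-forth horserace", 7, lambda f: f.get("lead_changes", 0) >= 4),
    Angle("heroic individual performance", 8, lambda f: f.get("top_player_share", 0) >= 0.4),
    Angle("strong team effort", 5, lambda f: f.get("players_scoring", 0) >= 6),
    Angle("came out of a slump", 6,
          lambda f: f.get("prior_losing_streak", 0) >= 3 and f.get("won", 0) == 1),
]


def select_angles(features: Features, library: list[Angle] = ANGLE_LIBRARY) -> list[Angle]:
    """Return the angles whose triggers fire, most important first."""
    triggered = [a for a in library if a.trigger(features)]
    return sorted(triggered, key=lambda a: a.importance, reverse=True)
```

For a game with five lead changes, seven players scoring, and a win that snapped a four-game losing streak, this returns the horserace, slump, and team-effort angles, in that order.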

Once the angles have been determined and ordered, they are linked to specific story points, which connect back to individual pieces of data like the names of players or specific numeric values like the score. Story points can also be chosen and prioritized to account for personal interests, such as favoring home-team players. These points can then be augmented with additional factual content drawn from internet databases, such as where a player is from, or a quote or picture of them.
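One way to picture this step: each story point carries a reference back to the underlying data, gets a priority boost if it concerns the reader’s home team, and picks up extra facts from an outside lookup. The boost of 2.0 and the `fact_db` dictionary standing in for an internet database are assumptions for the sketch, as are all the class and function names.

```python
from dataclasses import dataclass, field


@dataclass
class StoryPoint:
    subject: str            # e.g., a player's name
    team: str
    stat: str               # e.g., "hits"
    value: float
    base_priority: float    # how important the point is to its angle
    extras: dict[str, str] = field(default_factory=dict)  # facts pulled in from outside sources


@dataclass
class LinkedAngle:
    angle: str
    points: list[StoryPoint]


def prioritize(points: list[StoryPoint], home_team: str, home_boost: float = 2.0) -> list[StoryPoint]:
    """Order points by priority, boosting those about the reader's home team."""
    def score(p: StoryPoint) -> float:
        return p.base_priority + (home_boost if p.team == home_team else 0.0)
    return sorted(points, key=score, reverse=True)


def augment(points: list[StoryPoint], fact_db: dict[str, dict[str, str]]) -> list[StoryPoint]:
    """Attach extra facts (e.g., hometown) from a lookup table keyed by subject."""
    for p in points:
        p.extras.update(fact_db.get(p.subject, {}))
    return points


def link(angle: str, points: list[StoryPoint], home_team: str,
         fact_db: dict[str, dict[str, str]]) -> LinkedAngle:
    """Attach prioritized, augmented story points to an angle."""
    return LinkedAngle(angle, augment(prioritize(points, home_team), fact_db))
```

Swapping `home_team` is all it takes to re-order the same story for a different audience.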

The last step the robot journalist takes is natural language generation, which in the Narrative Science system is done by recursively traversing all of the angle and story point representations and using phrasal generation routines to generate and splice together the actual English text. This is probably by far the most straightforward part of the entire pipeline: it’s pretty much just fancy templates.
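The “fancy templates” idea can be sketched as a recursive walk over a small tree of angle and story-point nodes, where each node contributes a phrase and its children’s phrases follow it. The node kinds and the phrase strings below are hypothetical stand-ins for the patent’s phrasal generation routines.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # "story", "angle", or "point" (hypothetical)
    data: dict = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


# Hypothetical phrasal templates keyed by angle name, plus one for story points.
ANGLE_PHRASES = {
    "back-and-forth horserace": "It was a back-and-forth game that saw {lead_changes} lead changes.",
    "heroic individual performance": "{player} carried the team.",
}
POINT_PHRASE = "{player} finished with {value} {stat}."


def generate(node: Node) -> list[str]:
    """Depth-first traversal: emit this node's phrase, then its children's."""
    sentences = []
    if node.kind == "angle":
        sentences.append(ANGLE_PHRASES[node.data["name"]].format(**node.data))
    elif node.kind == "point":
        sentences.append(POINT_PHRASE.format(**node.data))
    for child in node.children:
        sentences.extend(generate(child))
    return sentences


def render(root: Node) -> str:
    """Splice the generated sentences into a single paragraph."""
    return " ".join(generate(root))
```

A story node holding a horserace angle with one story point for a player with three hits renders as “It was a back-and-forth game that saw 5 lead changes. Jones finished with 3 hits.” when the angle’s data records five lead changes.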

So, there you have it, the pipeline for a robot journalist: (1) ingest data, (2) compute newsworthy aspects of the data, (3) identify relevant angles and prioritize them, (4) link angles to story points, and (5) generate the output text.

Obviously, there can be variations on this basic pipeline as well. Automated Insights, for example, uses randomization to provide variability in output stories, and also makes more sophisticated use of narrative tone when generating text. Depending on the desired tone, the system might generate text that reads as apathetic, confident, pessimistic, or enthusiastic. Yseop, on the other hand, augments templates with metadata so that they’re more flexible. This allows templates to, for instance, conjugate verbs depending on the data being used. A post-generation analyzer from Yseop (you might call it a robot editor) further improves the style of the written text by looking for repeated words and substituting synonyms or alternate words.
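Two of these ideas are easy to sketch: picking a randomized phrase variant for a desired tone, and a “robot editor” pass that swaps repeated words for synonyms. The variant strings, the synonym table, and the function names are invented for illustration and aren’t Automated Insights’ or Yseop’s actual templates. For simplicity, the editor pass doesn’t preserve capitalization of the words it replaces.

```python
import random
import re

# Hypothetical tone-keyed phrase variants.
VARIANTS = {
    "enthusiastic": ["{team} stormed to a {score} win!", "{team} cruised to a thrilling {score} victory!"],
    "apathetic": ["{team} won {score}.", "{team} came away with a {score} win."],
}

# Hypothetical synonym table for the post-generation pass.
SYNONYMS = {"win": ["victory", "triumph"]}


def phrase(tone: str, rng: random.Random, **slots: str) -> str:
    """Pick a random variant for the requested tone and fill in its slots."""
    return rng.choice(VARIANTS[tone]).format(**slots)


def vary_repeats(text: str, synonyms: dict[str, list[str]] = SYNONYMS) -> str:
    """Keep the first occurrence of each word; cycle later repeats through its synonyms."""
    seen: dict[str, int] = {}

    def sub(m: re.Match) -> str:
        word = m.group(0)
        key = word.lower()
        count = seen.get(key, 0)
        seen[key] = count + 1
        if count == 0 or key not in synonyms:
            return word
        options = synonyms[key]
        return options[(count - 1) % len(options)]

    return re.sub(r"[A-Za-z]+", sub, text)
```

For example, `vary_repeats("A big win, then another win, then a third win.")` leaves the first “win” alone and turns the next two into “victory” and “triumph”. Passing a seeded `random.Random` to `phrase` keeps the variation reproducible, which is handy when you need to regenerate or correct a story.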

From my reading, I’d have to say that the Narrative Science patent seems to be the most informed by journalism. It stresses the notions of newsworthiness and editorial judgment in crafting a narrative. But that’s not to say that the stylistic innovations from Automated Insights and the template flexibility of Yseop aren’t important. What still seems to be lacking, though, is a broader sense of newsworthiness beyond “deviance” in these algorithms. Harcup and O’Neill identified 10 modern news values, each of which we might attempt to mimic in code: reference to the power elite, reference to celebrities, entertainment, surprise, bad news, good news, magnitude (i.e., significance to a large number of people), cultural relevance to the audience, follow-up, and newspaper agenda. How might robot journalists evolve when they have a fuller palette of editorial intents available to them?
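As a speculative sketch of what that fuller palette could look like, imagine scoring each candidate event on several news values and combining them with editorial weights, so that deviance (here, “surprise”) is just one term among many. Only the value names come from Harcup and O’Neill; the event fields, scoring rules, and weights are hypothetical placeholders that an editor would need to tune.

```python
from typing import Callable

Event = dict[str, object]
Scorer = Callable[[Event], float]  # maps an event to a score in [0, 1]

# Hypothetical weights and event fields, for illustration only.
NEWS_VALUES: dict[str, tuple[float, Scorer]] = {
    "surprise": (1.0, lambda e: min(abs(float(e.get("win_prob_delta", 0.0))) / 0.5, 1.0)),
    "celebrity": (0.8, lambda e: 1.0 if e.get("involves_star") else 0.0),
    "magnitude": (0.6, lambda e: min(float(e.get("audience_size", 0)) / 1_000_000, 1.0)),
    "follow-up": (0.4, lambda e: 1.0 if e.get("ongoing_story") else 0.0),
}


def newsworthiness(event: Event, values: dict[str, tuple[float, Scorer]] = NEWS_VALUES) -> float:
    """Weighted sum of per-value scores for one event."""
    return sum(weight * score(event) for weight, score in values.values())


def rank_events(events: list[Event]) -> list[Event]:
    """Order candidate events from most to least newsworthy."""
    return sorted(events, key=newsworthiness, reverse=True)
```

Under these made-up weights, a moderately surprising play involving a star outranks a bigger statistical swing with no other news value attached, which is exactly the kind of editorial trade-off a deviance-only system can’t express.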


Lillian Pierson Jun 16, 2014
This story is truly great. It really breaks down the art of journalism into a mechanistic system... almost a type of journalism engineering. I love it!
