How to Predict News Article Traffic

What if you could predict the number of future visits to your news article by looking at recent past reactions on social media? Turns out researchers are getting pretty good at this.

A forthcoming study led by Carlos Castillo in collaboration with Al Jazeera looked at a set of articles generating over 3.6 million visits and 235,000 social media reactions to find predictive patterns. The researchers considered everything from the rate of shares on Twitter and Facebook to the uniqueness of tweets and characteristics of the people sharing them, such as their followers and friends. After looking at just the first 90 minutes of social media activity after a story was published, they could explain 80% of the variance in the total number of article visits for that story.
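To make "explaining the variance" concrete, here is a minimal sketch of the general idea, not the researchers' actual model. It fits an ordinary least-squares regression from early social media counts to total visits and reports R², the share of variance explained, on held-out articles. The data is synthetic and made up purely for illustration, and the three feature columns (tweets, Facebook shares, unique tweets) and all function names are assumptions.

```python
import numpy as np

def fit_linear(features, visits):
    """Ordinary least squares with an intercept; returns the coefficients."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, visits, rcond=None)
    return coef

def predict(coef, features):
    """Apply fitted coefficients (intercept first) to new feature rows."""
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef

def r_squared(actual, predicted):
    """Fraction of variance in `actual` explained by `predicted`."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT from the study): each row is one article,
# columns are hypothetical counts from the first 90 minutes after publication.
rng = np.random.default_rng(0)
n = 200
early = rng.poisson(lam=[30, 20, 15], size=(n, 3)).astype(float)
visits = (500 + 40 * early[:, 0] + 25 * early[:, 1] + 10 * early[:, 2]
          + rng.normal(0, 150, n))

# Fit on the first 150 articles, evaluate on the remaining 50.
coef = fit_linear(early[:150], visits[:150])
r2 = r_squared(visits[150:], predict(coef, early[150:]))
print(f"held-out R^2: {r2:.2f}")
```

An R² of 0.8 would mean the model's predictions account for 80% of the variation in total visits across articles. Scoring on articles the model has not seen, as above, is one way to keep that figure from being inflated by overfitting.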

In the course of the study they also found a marked difference between the life cycles of news articles and in-depth stories. Three hours after publication, most traffic to news stories came from the home page, whereas in-depth content received longer sustained attention on social media. In-depth articles also tended to attract a larger variety of unique tweets, perhaps an indication that they provoke different kinds of comments from readers. Of the 606 articles analyzed, 78% had traffic patterns involving an initial spike followed by a steady decline, whereas the other 22% exhibited patterns of steady, increasing, or rebounding attention.

The researchers have made the prototype system available online with a selection of Al Jazeera English content. In the future, such predictions might help with content planning on news sites. For instance, if an editor knows that an article is starting to rebound, further steps could be taken to provide additional related content, context, or promotion for the article.

On November 14th, the Tow Center for Digital Journalism at Columbia will host an evening lecture with the lead author of the study, Carlos Castillo, where he’ll talk about different aspects of social media news mining. After the talk, Betaworks Chief Data Scientist Gilad Lotan will join in and lead a Q&A with Carlos. RSVP now!