As I mentioned in my last post, I interviewed Jure Leskovec of Stanford University regarding new research methods that are shedding light on how news spreads online. The resulting insights should be of great interest to anyone in the field of communications – and the new methods open up doors to further research and understanding.
In this second post in the series, I explore the team's methods and some of the first conclusions they reached. The researchers looked to nature – more specifically, the field of genetic evolution – for inspiration, and leveraged vast quantities of near real-time information from the Web.
"We developed an algorithm inspired by genetic sequences," said Jure. "It allows us to track the dynamics of info on a large scale."
One of their breakthroughs was to focus on identifying short textual phrases (i.e., quotes) and to explore how these are shared and mutate over time. This gave them discrete units of information, often associated with news stories: snippets that get repeated more or less intact and that can be tracked "in the wild."
Examples included Barack Obama's "lipstick on a pig" remark from the 2008 presidential campaign, widely read as a jab at Sarah Palin, and another Obama quote regarding the stimulus bill: "I will sign this law into legislation shortly…"
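To make the idea of tracking quote variants more concrete, here is a minimal sketch in Python. It is not the team's algorithm: it assumes a deliberately simple, hypothetical rule that two phrases are variants of the same quote if they share a run of at least `k` consecutive words, and it groups variants with a union-find structure. The function names, the value of `k` and the sample phrases are all illustrative.

```python
import re
from itertools import combinations


def tokenize(phrase: str) -> list[str]:
    """Lowercase a phrase and keep only word characters (and apostrophes)."""
    return re.findall(r"[a-z']+", phrase.lower())


def shares_word_run(a: list[str], b: list[str], k: int) -> bool:
    """Hypothetical rule: True if a and b share at least k consecutive words."""
    if len(a) < k or len(b) < k:
        return False
    runs_a = {tuple(a[i:i + k]) for i in range(len(a) - k + 1)}
    return any(tuple(b[i:i + k]) in runs_a for i in range(len(b) - k + 1))


def cluster_quotes(phrases: list[str], k: int = 4) -> list[set[str]]:
    """Group phrase variants: any two phrases linked by the rule end up together."""
    words = [tokenize(p) for p in phrases]
    parent = list(range(len(phrases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(phrases)), 2):
        if shares_word_run(words[i], words[j], k):
            parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, set[str]] = {}
    for i, p in enumerate(phrases):
        groups.setdefault(find(i), set()).add(p)
    return list(groups.values())


# Hypothetical sample of quote variants, as they might appear across sites.
phrases = [
    "You can put lipstick on a pig, but it's still a pig",
    "you can put lipstick on a pig",
    "I will sign this law into legislation shortly",
    "I will sign this legislation into law shortly",
]
groups = cluster_quotes(phrases, k=4)
print(groups)
```

With `k=4`, the two "lipstick" phrases share "you can put lipstick" and the two hypothetical stimulus-bill variants share "i will sign this", so the sketch returns two groups. Real quote data would call for a more tolerant matching rule, since a variant can swap or drop words anywhere in the phrase.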
They used Web data as a vast Petri dish for their experiments. The methods are a significant advance over traditional techniques for studying media, which typically marshaled teams of people to pore over hard copies, annotating and highlighting the topics of interest.
Their test bed is called MemeTracker (see the website and related research paper). By tracking mentions of the quotes across 90 million articles from 1.6 million mainstream media websites and blogs, they learned how various quotes spread and which variables affected that spread.
The team checked their results against intuition and noted the role of obvious factors that influence how "catchy" news is. For example, they explored the roles of recency ("new" is inherently interesting), imitation (some outlets mention a story simply because it is hot and others are already buzzing about it) and novelty. They also found that different types of sites exert varying degrees of influence, and that collective behavior (e.g., herd mentality) and personal networks can shape how news spreads.
By categorizing the websites (as newspaper, news agency, TV station or blog aggregator), they learned about the influence of each.
"MemeTracker showed phrases have different patterns of temporal variation," Jure said.
Six distinct types of popularity curves emerged, and their shapes depended on which types of sites mentioned the quoted phrase first.
These curves are plotted in the paper "Patterns of Temporal Variation in Online Media." As Jure explained in this excerpt from the interview:
Cluster C1 is typical: everyone is sort of there at the same time. With cluster C3, there is a very quick rise, then a slower decay; this is something generated by news agencies. A is on top earliest, B (bloggers) the latest. With C6, there is a spike, then it slowly dies off; this is generated by bloggers. With C4, bloggers are late; with C5, bloggers are early.
Q: So is this a function of who broke the news first, rather than the type of info?
Exactly. We tried to argue that it is a function of who, not what. The theory finds six patterns, and we found the same ones on Twitter, so the results are robust. Depending on when different media types appear in the discourse, it forms a different shape.
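The six clusters come from grouping popularity curves by shape. As we understand the paper, its clustering compares two curves only after allowing one of them to be rescaled and shifted in time, so a big story and a small story with the same rise-and-decay pattern can land in the same cluster. The sketch below computes such a scale- and shift-invariant distance in Python. The zero padding, the `max_shift` limit and the sample counts are our own assumptions, and it shows only the distance, not the full clustering procedure.

```python
import math


def shift(y: list[float], q: int) -> list[float]:
    """Shift y by q steps (positive = later), padding with zeros rather than wrapping."""
    n = len(y)
    if q >= 0:
        return [0.0] * min(q, n) + y[: max(n - q, 0)]
    return y[-q:] + [0.0] * min(-q, n)


def shape_distance(x: list[float], y: list[float], max_shift: int = 3) -> float:
    """min over shifts q and scales alpha of ||x - alpha * y_q|| / ||x||."""
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    best = 1.0  # alpha = 0 always achieves ||x|| / ||x|| = 1
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        yy = sum(v * v for v in yq)
        if yy == 0:
            continue  # y was shifted entirely out of the window
        # Least-squares optimal scale for this particular shift.
        alpha = sum(a * b for a, b in zip(x, yq)) / yy
        resid = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, yq)))
        best = min(best, resid / norm_x)
    return best


# Hypothetical daily mention counts for three phrases.
x = [0, 1, 5, 3, 1, 0, 0, 0]   # quick rise, slower decay
y = [0, 0, 2, 10, 6, 2, 0, 0]  # same shape: twice the volume, one day later
z = [0, 0, 0, 0, 1, 2, 4, 5]   # slow, steady build-up

print(shape_distance(x, y))  # 0.0
print(shape_distance(x, z))
```

The first pair differs only in volume and timing, so its distance is zero; the slow build-up in `z` stays far from the quick rise in `x` under every allowed shift. A k-means-style procedure could then assign each curve to its nearest cluster center under this distance.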
The results raised all kinds of questions and made me wonder how PR people and social media specialists can use this understanding to improve their campaigns. I will take a deeper dive into the latest research in my next and final article in the series.