A study (Kramer et al., 2014) was recently published that showed something astonishing — people altered their emotions and moods based upon the presence or absence of other people’s positive (and negative) moods, as expressed on Facebook status updates. The researchers called this effect an “emotional contagion,” because they purported to show that our friends’ words on our Facebook news feed directly affected our own mood.
Never mind that the researchers never actually measured anyone’s mood.
And never mind that the study has a fatal flaw, one that other research has also overlooked, making all these researchers’ findings a bit suspect.
Putting aside the ridiculous language used in these kinds of studies (really, emotions spread like a “contagion”?), they often arrive at their findings by conducting language analysis on tiny bits of text. On Twitter, they’re really tiny: 140 characters or fewer. Facebook status updates are rarely more than a few sentences. The researchers don’t actually measure anybody’s mood.
So how do you conduct such language analysis, especially on the status updates of 689,003 users? Many researchers turn to an automated tool for this, something called the Linguistic Inquiry and Word Count application (LIWC 2007). This software application is described by its authors as:
The first LIWC application was developed as part of an exploratory study of language and disclosure (Francis, 1993; Pennebaker, 1993). As described below, the second version, LIWC2007, is an updated revision of the original application.
Note those dates. Long before social networks were founded, the LIWC was created to analyze large bodies of text — like a book, article, scientific paper, an essay written in an experimental condition, blog entries, or a transcript of a therapy session. Note the one thing all of these share in common — they are of good length, at minimum 400 words.
Why would researchers use a tool not designed for short snippets of text to, well… analyze short snippets of text? Sadly, it’s because this is one of the few tools available that can process large amounts of text fairly quickly.
Who Cares How Long the Text Is?
You might be sitting there scratching your head, wondering why it matters how long the text is that you’re trying to analyze with this tool. One sentence, 140 characters, 140 pages… Why would length matter?
Length matters because the tool actually isn’t very good at analyzing text in the manner that Twitter and Facebook researchers have tasked it with. When you ask it to analyze positive or negative sentiment of a text, it simply counts negative and positive words within the text under study. For an article, essay or blog entry, this is fine — it’s going to give you a pretty accurate overall summary analysis of the article since most articles are more than 400 or 500 words long.
For a tweet or status update, however, this is a horrible analysis tool to use. That’s because it wasn’t designed to take negation into account, and in fact can’t: it has no way to tell that a negation word in a sentence reverses the meaning of the emotion word next to it. ((This according to an inquiry to the LIWC developers who replied, “LIWC doesn’t currently look at whether there is a negation term near a positive or negative emotion term word in its scoring and it would be difficult to come up with an effective algorithm for this anyway.”))
Let’s look at two hypothetical examples of why this is important. Here are two sample tweets (or status updates) that are not uncommon:
“I am not happy.”
“I am not having a great day.”
An independent rater or judge would rate these two tweets as negative — they’re clearly expressing a negative emotion. That would be +2 on the negative scale, and 0 on the positive scale.
But the LIWC 2007 tool doesn’t see it that way. Instead, it would rate these two tweets as scoring +2 for positive (because of the words “great” and “happy”) and +2 for negative (because of the word “not” in both texts).
That’s a huge difference if you’re interested in unbiased and accurate data collection and analysis.
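To make the tally above concrete, here is a minimal sketch of a context-blind word counter. The word lists are hypothetical and tiny, chosen only to reproduce the scoring described above; they are not the LIWC 2007 dictionary, and the function name is my own.

```python
import re

# Hypothetical toy word lists, chosen only to mirror the tally described above.
# They are NOT the LIWC 2007 dictionary.
POSITIVE_WORDS = {"happy", "great"}
NEGATIVE_WORDS = {"not"}


def count_sentiment_words(text):
    """Count positive and negative words with no regard for context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    return positive, negative


posts = ["I am not happy.", "I am not having a great day."]

total_positive = 0
total_negative = 0
for post in posts:
    positive, negative = count_sentiment_words(post)
    total_positive += positive
    total_negative += negative

print(total_positive, total_negative)
```

A human rater would score both posts as purely negative, but a counter like this one credits each post with one positive word and one negative word, because it never looks at which word the “not” belongs to.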
And since much of human communication includes subtleties such as this — without even delving into sarcasm, short-hand abbreviations that act as negation words, phrases that negate the previous sentence, emojis, etc. — you can’t even tell how accurate or inaccurate the resulting analysis by these researchers is. Since the LIWC 2007 ignores these subtle realities of informal human communication, so do the researchers. ((I could find no mention of the limitations of the use of the LIWC as a language analysis tool for purposes it was never designed or intended for in the present study, or other studies I’ve examined.))
Perhaps it’s because the researchers have no idea how bad the problem actually is. Because they’re simply sending all this “big data” into the language analysis engine, without actually understanding how the analysis engine is flawed. Is it 10 percent of all tweets that include a negation word? Or 50 percent? Researchers couldn’t tell you. ((Well, they could tell you if they actually spent the time validating their method with a pilot study to compare against measuring people’s actual moods. But these researchers failed to do this.))
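Getting a rough answer to that question would not take much. The sketch below, which is only an illustration and not anything the researchers did, flags posts where a negation word appears shortly before an emotion word and reports the share of flagged posts. The word lists, the three-word window, and the four sample posts are all made up for the example; a real check would need a large random sample and human raters.

```python
import re

# Hypothetical word lists and sample posts, for illustration only.
NEGATIONS = {"not", "no", "never", "isn't", "don't", "can't"}
EMOTION_WORDS = {"happy", "great", "sad", "awful"}


def has_negated_emotion(text, window=3):
    """Return True if a negation word appears within `window` words before an emotion word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    for i, token in enumerate(tokens):
        if token in EMOTION_WORDS:
            preceding = tokens[max(0, i - window):i]
            if any(word in NEGATIONS for word in preceding):
                return True
    return False


sample_posts = [
    "I am not happy.",
    "I am not having a great day.",
    "What a great day!",
    "Feeling sad today.",
]

flagged = sum(has_negated_emotion(post) for post in sample_posts)
negation_rate = flagged / len(sample_posts)
print(f"{negation_rate:.0%} of sampled posts contain a negated emotion word")
```

Even a crude count like this would put a number on how often simple word counting is likely to misread a post.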
Even if True, Research Shows Tiny Real World Effects
Which is why I have to say that even if you believe this research at face value despite this huge methodological problem, you’re still left with research showing ridiculously small correlations that have little to no meaning to ordinary users.
For instance, Kramer et al. (2014) found a 0.07% decrease (that’s not 7 percent, that’s roughly 1/14th of one percent!) in negative words in people’s status updates when the number of negative posts on their Facebook news feed decreased. Do you know how many words you’d have to read or write before you’ve written one fewer negative word due to this effect? On the order of 1,400, if the 0.07% is read as a share of all words written.
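The back-of-the-envelope arithmetic can be checked in a few lines. This assumes the 0.07% figure is a share of all words written, which is my reading of the number rather than something stated here; the variable names are my own.

```python
# Assumption: the reported 0.07% decrease is a share of all words written.
decrease_percent = 0.07
decrease_fraction = decrease_percent / 100

# Words a person would need to write before the effect removes one negative word.
words_per_one_fewer_negative = 1 / decrease_fraction

print(round(words_per_one_fewer_negative))  # 1429
```

At a few sentences per status update, that is a great many updates before the effect changes a single word.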
This isn’t an “effect” so much as a statistical blip that has no real-world meaning. The researchers themselves acknowledge as much, noting that their effect sizes were “small (as small as d = 0.001).” They go on to suggest it still matters because “small effects can have large aggregated consequences,” citing a Facebook study on political voting motivation by one of the same researchers, and a 22-year-old argument from a psychological journal. ((There are some serious issues with the Facebook voting study, not the least of which is attributing changes in voting behavior to one correlational variable, with a long list of assumptions the researchers made (and that you would have to agree with).))
But they contradict themselves in the sentence before, suggesting that emotion “is difficult to influence given the range of daily experiences that influence mood.” Which is it? Are Facebook status updates significantly impacting individuals’ emotions, or are emotions not so easily influenced by simply reading other people’s status updates?
Despite all of these problems and limitations, none of it stops the researchers in the end from proclaiming, “These results indicate that emotions expressed by others on Facebook influence our own emotions, constituting experimental evidence for massive-scale contagion via social networks.” ((A request for clarification and comment by the authors was not returned.)) Again, no matter that they didn’t actually measure a single person’s emotions or mood states, but instead relied on a flawed assessment measure to do so.
What the Facebook researchers clearly show, in my opinion, is that they put too much faith in the tools they’re using without understanding — and discussing — the tools’ significant limitations. ((This isn’t a dig at the LIWC 2007, which can be an excellent research tool — when used for the right purposes and in the right hands.))
Kramer, ADI, Guillory, JE, Hancock, JT. (2014). Experimental evidence of massive-scale emotional contagion through social networks. PNAS. www.pnas.org/cgi/doi/10.1073/pnas.1320040111