Comments on
Emotional Contagion on Facebook? More Like Bad Research Methods

A study (Kramer et al., 2014) was recently published that showed something astonishing — people altered their emotions and moods based upon the presence or absence of other people’s positive (and negative) moods, as expressed on Facebook status updates. The researchers called this effect an “emotional contagion,” because they purported to show that our friends’ words on our Facebook news feed directly affected our own mood.

Nevermind that the researchers never actually measured anyone’s mood.

And nevermind that the study has a fatal flaw. One that other research has also overlooked — making all these researchers’ findings a bit suspect.

33 Comments

  1. Very good article.

    It might seem obvious, but I’ve never seen a research paper that says that happy people write status updates or tweets with positive words in, and sad people write status updates or tweets with negative words in.

    So even if we accept that someone writing status updates with positive words in means that their friends will also write status updates with positive words in – it doesn’t necessarily tell us anything about emotional contagion because, as you already say, *no one asked the people how they’re actually feeling*.

    Given the tiny reported effect size, it’s just as likely that I see my friend write “awesome” so I use the same word in my own status update. Does it mean I’m feeling awesome, or does it just mean that people copy each other’s language?

    • Exactly.

      Without a pilot study to examine people’s, you know, actual mood states and emotions, this research can’t even say it found what it claimed to have found.

      Yet the journal reviewers apparently didn’t notice that disconnect in approving the paper for publication.

      It appears to me that someone dropped the ball on this one.

  2. At what level were the results found to be significant? The .05 level? The .01 level? Or were the differences not significant at all? Because if they weren’t statistically significant, it is bad science to report them as differences.

    • They were statistically significant. But you have to understand that significance in context of what their data showed, versus the overly broad claims the researchers later made in their discussion section.

      If you show a small, statistically significant correlation between two variables, that really isn’t something to get excited about. Such findings are made all the time in research, but if they don’t have any real-world meaning, then they generally aren’t published.

      This is an example of the study authors, in my opinion, over-stating the significance of their actual, tiny findings. This kind of over-statement is fairly commonplace nowadays, and journals seem oblivious to reining it in. But it should still be called out whenever one sees it…
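To see how a tiny effect can still come out "statistically significant" at Facebook-scale samples, here is a back-of-the-envelope sketch. The numbers are hypothetical, chosen only to be in the neighborhood of the very small effects reported: a standardized mean difference of d = 0.02 and two groups of 300,000 users each.

```python
import math

def t_from_d(d, m):
    """Approximate t statistic for a standardized mean difference d
    between two independent groups of m subjects each:
    t = d * sqrt(m / 2)."""
    return d * math.sqrt(m / 2)

# Hypothetical numbers: a trivially small effect (d = 0.02 means the
# groups differ by one-fiftieth of a standard deviation), but with
# 300,000 users per condition.
print(round(t_from_d(0.02, 300_000), 1))  # 7.7
```

With that many degrees of freedom, t ≈ 7.7 corresponds to a p value far below .001 — so "statistically significant" here says almost nothing about whether the effect has any real-world meaning.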

  3. What do you think of the apparent sidestepping of standard human subjects ethical review? I don’t see any indication that a Review Board was involved. Why is informed consent not an issue for studies of this type?

    • I think they should have sought out and obtained an IRB’s approval. Even if it wasn’t explicitly necessary, it’s never a bad thing to do to ensure you’ve designed an ethical and appropriate study that respects your study’s subjects and their dignity.

  4. “Emotional Contagion” is a term commonly used amongst psychologists that names an extensively studied phenomenon (NOT, as a matter of fact, the phenomenon studied by the Facebook scientists, but hey). You folks are hacks.

  5. At no time did they say it was for scientific research. My guess is that it’s for marketing. How to drive the masses insane.

  6. I’m curious as to what you think about the ethical implications of this study. Granted, as far as I know, Facebook is not bound by the ethical obligations of the APA, but even still, does this work with the APA’s guidelines on debriefing of a subject? Facebook is certainly capable of sending out messages en masse to its users to explain the study after the fact, but everyone I’ve spoken to has simply heard about it from the news articles.

  7. I believe that people have the choice to read or not to read something.
    We make ourselves upset and then blame the circumstances.
    Facebook can’t make someone sad… people already sad read the sad stories and then blame them for their state of mind.
    It’s the other way round…

  8. A bit ironic that Facebook, a business famed for algorithms and notorious for keeping things in-house, decided to use an old(ish) & questionable research tool. This reads as an intellectually lazy study that probably is intended for talk-show bait rather than an addition to the literature base. Is this a trend for PNAS in psychology publishing to not put submissions through more rigorous review?

  9. “Nevermind that the researchers never actually measured anyone’s mood.”

    You’re implying that we cannot conduct science on anything using indirect measurements. Should we say paleoclimatology is an inherently unscientific endeavor, just because no one ever actually measured the climate in the remote past, relying on patterns of growth of organisms (e.g. dendroclimatology) or patterns of abundance of organisms (e.g. palynology) instead? Probably not.

    “Length matters because the tool actually isn’t very good at analyzing text in the manner that Twitter and Facebook researchers have tasked it with. When you ask it to analyze positive or negative sentiment of a text, it simply counts negative and positive words within the text under study. For an article, essay or blog entry, this is fine — it’s going to give you a pretty accurate overall summary analysis of the article since most articles are more than 400 or 500 words long.”

    You are implying that length is able to correct for misinterpretations. That is, overall negations are infrequent, with correctly interpreted statements dominating large texts. It follows that while the tool is not adequate to classify an individual status update or tweet, it would still be able to detect population-wide patterns.
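The word-counting approach described in the quoted passage can be sketched in a few lines. The mini-lexicons below are hypothetical (LIWC’s actual dictionaries contain thousands of entries), but the sketch shows the core problem under discussion: a negated positive like “not … great” gets tallied in both columns.

```python
# Minimal sketch of a LIWC-style word-count sentiment analysis.
# The lexicons here are made-up stand-ins for illustration only.
POSITIVE = {"happy", "great", "awesome", "good", "love"}
NEGATIVE = {"sad", "miserable", "bad", "hate", "no", "not", "never"}

def word_count_sentiment(text):
    """Count positive and negative lexicon hits, ignoring all context
    (negation, sarcasm, emoji, abbreviations)."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos, neg

# A negated positive is scored as BOTH one positive and one negative hit:
print(word_count_sentiment("i am not having a great day"))  # (1, 1)
```

On a long essay these misfires may wash out; on a ten-word status update, a single miscount is the whole signal — which is exactly the disagreement in this thread.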

    “But they contradict themselves in the sentence before, suggesting that emotion “is difficult to influence given the range of daily experiences that influence mood.” Which is it? Are Facebook status updates significantly impacting individual’s emotions, or are emotions not so easily influenced by simply reading other people’s status updates??”

    I really don’t see how it is self-contradictory to state there is an effect, although a small one, because there are other factors.

    Note: I am not affiliated with the study in any way. I am not even a sociologist. I’m simply a scientist who is utterly unconvinced by your arguments.

    • I am with Matt on this one.

      Quickly checking online would have told you that Pennebaker and Francis (1996) have established the external validity of LIWC, i.e., they have demonstrated that LIWC successfully measures positive and negative mood (correlating LIWC output with the evaluation of judges). Also, research has directly established a link between a LIWC-based analysis of language use and self-reported as well as physiological patterns of mood (for the arousal dimension, see Saxbe, 2009). Moreover, decades of research established meaningful patterns between LIWC output and people’s behavior. I am neither associated with LIWC nor the authors of the study, but as an emotion researcher, I find that your conclusions about measuring mood indirectly do not correctly reflect the state of the research.

      I also feel that negations may be less of a problem at the scale at which the study was conducted. Saying that something is “not good” or that my day was “not great” is not the same thing as saying something is “bad” or that my day was “miserable”. “Not great” communicates a different affective reality in my view, e.g., the possibility that it could have been great but turned out differently. I agree that this is not the same thing as having a “great” day, but the reality may be a little less black and white than the picture you were drawing. So while there may be a methodological problem, it is not “huge”.

      • Everything you cited was done on differently-sized texts.

        It is, in my opinion, inappropriate to take a tool designed for one purpose and repurpose it for something else, and just assume it’s still going to output the same kinds of valid analyses. At least not without even acknowledging that the tool’s reliability and validity are unknown with short texts (like tweets or status updates).

        One way I suggested you do this is to validate the LIWC’s findings with an external mood scale on such short texts. It would have made such criticism as mine moot.

        Your second point is moot though. If you don’t acknowledge you have a problem with that kind of analysis, then you also have no idea the extent or size of the problem. If the problem affects a significant minority of texts — say 10 or 15 percent — then that could be enough to make your data — and conclusions — simply wrong.

        And again, to say nothing of the way the LIWC was not designed to analyze short texts in the first place, simply ignoring the ways we communicate on these different networks with short texts — emojis, short abbreviations that have their own meaning, sarcasm, etc. To pretend that a tool designed to analyze an essay written in an experimental condition will do the exact same job on such informal snippets of conversation is, in my opinion, unwise.

    • Length is needed — in context. You can’t take 600,000 unrelated and unassociated status updates, do an analysis on them, and then say your results can boil down to individuals. Yet this is exactly what the researchers claim.

      Again, when your effect is that small, it’s not really even fair to call it an “effect.” And then to cite your own previous research as evidence of why such a small effect size could potentially still be important is, in my opinion, hubris (self-citations in support of a theory are not exactly “evidence”).

      • You didn’t even address my main point. So I’m gonna be extra clear:

        1. I agree that you can’t apply this tool to individual status updates

        2. I agree that you can’t apply this tool to individuals (who may have posted multiple status updates)

        3. You absolutely can apply it to large populations, in order to observe a population-wide effect (one that doesn’t necessarily apply to individuals belonging to the population).

  10. Good article, thanks. I too have reservations about the length of the texts analysed, but I also wonder about whether the LIWC tool is subtle enough to return useful results from what must be a very diverse range of language registers.

    Is ‘wicked’ a positive or negative term?

    In short messages between long-standing friends, irony, sarcasm and outright inversion of speech can be used frequently, and maybe often between some Facebook users. Even I hardly understand what some of the messages mean between my children and their friends! And what about sentences that are positive but contain negative words: “Yesterday my colleague told me a joke – I nearly died.”

    OK, the LIWC results were significant in a statistical sense, but the interpretation must be very insecure.

  11. Overstatement of results is the rule in psychology and neuroscience. My 15 years in academic research has taught me that most of it falls in one of two categories: Open doors, or B.S.

  12. Thank you so much John for this excellent critique! Your analysis needs to be appended to the work of all journalists who are incorrectly labeling this an “emotional contagion” phenomenon. There is no convergent validity evidence suggesting text, status, or tweet updates (with or without emoticons) actually reflect subjective, facial, cognitive, or physiological indicators of a particular affective state. I had not even considered the fact that many “negative” words in the LIWC include “not,” “isn’t,” or other affectively neutral terms (I was envisioning terms such as angry, upset, sad, tired, etc…).

    Also, the effect sizes are so small as to suggest more of a statistical artifact than a real substantive effect. Other issues include the temporal stability of such effects, transference to others in the social network, and failure to control for personality or prior statuses.

  13. And aside from everything that John has rightfully criticized, if the next-to-nonexistent results mean anything, it is much more likely that they reflect simply a kind of repetition, semantic or structural priming – we tend to use the words, the associations and the syntactic structure of things we read, which are long-known effects. So – too much fuss over nothing.

  14. Thank you all for this insightful discussion. Are there any better methods or tool sets out there for analyzing sentiments in short online texts?

    I know there are some approaches from machine learning, but is there already any consensus on a better method?

  15. First, this isn’t the first study to use LIWC to analyze short messages. It’s a fairly common method of analyzing large sets of SNS data. I noticed the author of this article and those attacking the PNAS paper never tell us what method they would use, other than personally reading millions of tweets. The presence of “nots” doesn’t make the tool invalid, it just makes the effect smaller. It adds more noise to the signal, not invalidity.
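The “noise to the signal, not invalidity” claim in the paragraph above is consistent with the classical attenuation formula from psychometrics: measurement error shrinks an observed correlation toward zero rather than inventing an effect. A minimal sketch, with made-up reliability figures purely for illustration:

```python
import math

def attenuated_r(r_true, rel_x, rel_y):
    """Spearman's attenuation formula: the observed correlation equals
    the true correlation scaled down by the square root of the product
    of the two measures' reliabilities."""
    return r_true * math.sqrt(rel_x * rel_y)

# Hypothetical: if word counting captured mood with reliability 0.5
# (and the other measure were perfectly reliable), a true correlation
# of 0.30 would show up as roughly 0.21.
print(round(attenuated_r(0.30, 0.5, 1.0), 2))  # 0.21
```

Whether that shrinkage is tolerable (a noisier but usable signal) or fatal (given how tiny the reported effects already were) is precisely what the two sides of this thread dispute.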

    Second, the editor for the paper was Susan T. Fiske who is a member of the National Academy of Sciences, a fellow of the American Association for the Advancement of Science, former president of the American Psychological Association, and a list of other awards and honors a yard long. It’s not like she doesn’t know what she’s doing or just thought this study sounded neat or something.

    Third, of course the effect sizes were small. In my Basic Research Methods class, I tell my students that the size of the effect is based on the size of the induction. These researchers intentionally made the induction as small as possible in order to avoid disrupting the lives of their users. We only normally use large inductions in the social sciences because we have tiny samples. They had a large sample so they could afford to use a tiny induction to find a tiny effect. If they had filtered out ALL the posts with negative words or ALL the posts with happy words, they probably would have found a bigger effect. BUT they would have been called monsters for doing it.

    The point is that intentionally chasing a small effect in this context was an ethical choice, not bad science.

    It wasn’t a perfect study. They did not measure emotion in 50 ways in a single study to establish perfect validity. But no study does. They found some evidence consistent with a hypothesis and not its alternative. Are there some alternate explanations? Of course. There always are. But we now know a little bit more than we did, not nothing.

    I am not one of the authors of the paper nor am I personally affiliated with anyone involved.

    • 1. You don’t use a tool just because it’s convenient and available if it hasn’t yet been validated on the kinds of datasets you want to analyze. What researchers typically do in this instance is to validate the tool, first, on their datasets and ensure it’s valid for the kinds of data they want to throw at it. To date, nobody’s done that with the LIWC for these kinds of short, informal social messages (like tweets or status updates). “Everyone else is doing it” is not a substitute for doing good science. (It’s certainly not my job to do the researchers’ job for them in evaluating — or creating — tools they could use to help them with their research.)

      2. Sorry, “appeal to authority” is a great logical fallacy, but doesn’t help much with this issue. It’s not clear whether Fiske was duped by the researchers or what, but this paper should never have passed peer review. It’s a weak scientific paper — at best — and at worst, it’s an example of a famous, large company throwing its weight around to get stuff published that will make them (so they thought) look like they’re doing “serious research.”

      3. Yes, and that would’ve been unnatural and not at all representative of the way people use Facebook. To me, I interpret this to mean that the only effect they found was one that they themselves created. Since our news feeds are a conglomeration of different things, all they showed was that if you manipulate people’s exposure to emotional content, you’ll find a tiny increase in their likelihood to mirror some of that emotional content. Maybe. If you believed their use of the LIWC in the first place was valid (which I don’t).

      The effect size is not due to simply their “small” manipulation (I don’t know how you can call manipulation of up to 90 percent of a person’s exposure to an emotion-laden status update “small”) — it’s also due to the actual findings of their data. (And again, all of this assumes you agree with some of their data analysis assumptions, some of which are very much up for further discussion.)

      It was a horrible experiment because, in the end, it proved really not much of anything, using a tool that probably couldn’t even tell them what they thought it could. And it gave Facebook a black eye for its questionable ethics regarding how it views manipulating its users’ news feeds.

      • 1. The cites provided above by others show that it has been validated to the extent that LIWC can pick up emotions in larger selections of text. Obviously the validity coefficient is going to be smaller for shorter selections of text. I think the researchers, who have used LIWC extensively (especially Hancock), have reason to believe that the validity coefficient is not trivial. You disagree and believe it either is trivial or the risk of it being so warrants studies before this one. Fine.

        2. Appeal to authority is usually only considered a fallacy when the authority is illegitimate or not relevant to the question at hand. It’s not an appeal to authority to say that doctors think we shouldn’t smoke because it’s bad for us. So, we have one of two possibilities: either Susan Fiske was “duped” or there’s more value to this study than you seem to think. I think the latter is more parsimonious. I think it’s safe to assume you disagree.

        3. Our newsfeeds vary in how much positive and negative emotional content there is. Some of us have very negative or very positive friends. I assume, like most things, it is normally distributed. FB was interested in the effects this distribution had when randomly assigned so as to remove the self-selection of positive and negative friends as well as other confounding variables. Yes, they created it, but it doesn’t mean that it also doesn’t happen “in nature.”

        The amount of the subjects’ newsfeed that was omitted varied a fair amount, but for many people it was a small portion of their newsfeed. Additionally, many people barely look at their newsfeed, possibly didn’t log in, and were otherwise not even exposed to the induction. And as the authors noted, reading status updates of their friends is an incredibly small portion of their emotional world. Relative to the other predictors of emotions in people’s lives, this study was what I would call a small induction. Again, you disagree, but I suspect neither of us is planning on changing our minds today.

      • Chris,

        You raise some valid points about the degree of manipulation and whether people actually spent time reading their posts – most people don’t, as you said, or read only a small amount of them. But the plain fact is that the researchers didn’t make those analyses (I read the paper), although they most certainly have the data available. They should have tested a model in which the so-called “emotional contagion” effect was supplemented by the factors you mentioned – the magnitude of change in the posts and the time spent reading them (indirect, since they have logs for when you open and close your feed, but you may not actually be reading them – but it is better than nothing). Had they done such an analysis, it would have been clear whether the effect was small due to small changes in exposure or whether it is generally that small. This is something that peer review should have picked up; it is an easy analysis to make, and it would have told us a lot more about the data without much more work.

      • Correction – they actually tested for the effect with a regression based on percentage of omitted posts, I was mistaken.

  16. This article is the first one I’ve read that comes even close to explaining what others failed to explain. However, I believe some points were missed. First, the study did not even follow the journal’s own policy. From the journal’s policy: “For experiments involving human participants, authors must also include a statement confirming that informed consent was obtained from all participants.” IIRC, this study was done before the particular wording that may or may not be construed as “informed consent” was added to Facebook’s TOS. Second, is it not standard ethical practice at the conclusion of a study to debrief study participants if their emotions were manipulated? And third, this article would be more complete had it mentioned that the “participants'” feeds were manipulated for the study. Without that, it is implied that language analysis was performed on unaltered feeds.

  17. Insightful read. If you don’t mind, I think the research community would be well served if you posted your objections as a comment directly on the publication. PLoS ONE supports such comments in lieu of the sort of letters / notes that traditional journals sport for people to question research.

  18. When I used to do research, the reputation of the PNAS was that Academy members had special privileges to submit articles. People submitted papers to PNAS because they could not get papers into real peer-reviewed journals. Which Academy member referred the paper? Does the Academy member have direct expertise, and what was their motivation? Also, the Cornell academic authors need publications with high impact scores to make tenure and to get grants. What were their motivations for this very poor quality publication that has caused many harms?

  19. No approval, no debriefing, just abuse of users… where is the ethics behind this research?! or at least give the participants (users) a sort of compensation….. absurd!!


