You Should Smile More [UPDATED]

Here’s a study. Can you guess the result?

Cartoons are rated on a scale from 0 to 9 with 0 meaning “not at all funny” and 9 representing “very funny.”

The catch is that participants indicate their rating while holding a pen in their mouths. They need to record their ratings using their teeth to hold the pen, thus creating a forced smile as in the picture on the left. Or, they may need to record their ratings using their lips to hold the pen, thus creating a forced frown as in the picture on the right.

The question is: which participants indicate that they find the cartoons funnier? Does forcing a smile mean you will rate the cartoons as funnier? If so, how much of a difference do you think it would make on the 0 to 9 comedy scale?

smile and frown with a pen

From Quentin Gronau through the Creative Commons License

I’d like to hear a few opinions first before revealing the research results. I’ll add [UPDATED] to the post title when it’s ready. Please no spoilers from those who have read about this particular study before. If you’ve read about related ones though, you’re invited to speculate about this one.

The Results:

Thank you to everyone who played. A special shout-out is due to dragonfrog, who engaged in a bit of self-experimentation.

The full answer ends up being complicated.

Strack, Martin, and Stepper (1988) found a rating difference of 0.82 units on the 0 to 9 comedy scale. People forced to “smile” by holding a pen with their teeth found cartoons funnier than those forced to frown. That’s nearly a full point just due to something that you wouldn’t think would make a difference. This is bizarre and puzzling.

Accordingly, Strack, Martin, and Stepper’s 1988 paper became a widely cited article (1433 times according to Google Scholar). It also made an appearance in the hit bestseller Thinking, Fast and Slow by Daniel Kahneman.

I think it’s fair to guess that most professors don’t hit 1400 citations in their lifetimes across all their papers, let alone one.

But 1988 was a while ago. Recently, the results of 17 independent, pre-registered direct replications of this study were released. And they aren’t nearly as amazing.

Our meta-analysis revealed a rating difference of 0.03 units with a 95% confidence interval ranging from -0.11 to 0.16.

Three hundredths of a point on the 0 to 9 scale. That’s about 0.3%. You couldn’t get closer to it making absolutely no difference if you tried.
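
For anyone who wants to check that arithmetic, here is a minimal Python sketch. The only inputs are the figures quoted above; the variable names are mine, and the standard error is backed out of the reported interval on the assumption that it is a symmetric, normal-based 95% interval, which may not be exactly how the replicators computed it.

```python
# Figures quoted from the replication's meta-analysis
diff = 0.03                    # rating difference, smile minus frown
ci_low, ci_high = -0.11, 0.16  # reported 95% confidence interval
scale_max = 9                  # ratings run from 0 to 9

# The difference expressed as a percentage of the top of the scale
share_of_scale = diff / scale_max * 100

# Does the interval include "no difference at all"?
includes_zero = ci_low <= 0 <= ci_high

# Assumption: a symmetric normal-based interval, so width = 2 * 1.96 * SE
approx_se = (ci_high - ci_low) / (2 * 1.96)

print(f"{share_of_scale:.2f}% of the scale")
print(f"interval includes zero: {includes_zero}")
print(f"approximate standard error: {approx_se:.3f}")
```

The interval straddles zero, which is the formal version of "you couldn't get closer to no difference if you tried."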

This is, frankly, what I would expect. A cartoon is funny or not, and it doesn’t matter what you are doing with your mouth any more than raising your eyebrows during a horror movie will make it scarier. But this was nevertheless accepted for 28 years, and I’m not sure even this replication will soundly trounce the original study. There are people who have built their careers on extensions to the original study. Will they give up so easily on an idea simply because it is demonstrably false?

Strack himself, of course, isn’t willing to give up yet. He wrote a response to the replicators. Here is his strongest criticism:

the authors have pointed out that the original study is “commonly discussed in introductory psychology courses and textbooks”. Thus, a majority of psychology students was assumed to be familiar with the pen study and its findings. Given this state of affairs, it is difficult to understand why participants were overwhelmingly recruited from the psychology subject pools. The prevalent knowledge about the rationale of the pen study may be reflected in the remarkably high overall exclusion rate of 24 percent. Given that there was no funneled debriefing but only a brief open question about the purpose of the study, to be answered in writing, the actual knowledge prevalence may even be underestimated.

That participants’ knowledge of the effect may have influenced the results is reflected in the fact that those 14 (out of 17) studies that used psychology pools gained an effect size of d = – 0.03 with a large variance (SD = 0.14), while the three studies using other pools (Holmes, Lynott, and Wagenmakers) gained an effect size of d = 0.16 with a small variance (SD = 0.06). Tested across the means of these studies, this difference is significant, t(15) = 2.35, p = .033; and the effect for the non-psychology studies significantly deviates from zero, t(2) = 5.09, p = .037, in the direction of the original result.

This criticism has some similarity to what some of our commenters said when speculating on what the results might be. Here was Morat20:

Ah, the problem of research in psychology. People screw with your tests by meta-gaming it, happily hiding the very information you’re after and thinking they’re being helpful.

This behavior, however, seems the opposite of what Strack is now suggesting. Strack seems to claim that psychology students rather than trying to be helpful are somehow inoculated against the effect. This seems…unlikely. (Note that hypothesis-aware dragonfrog in the comments reported results that the cartoons were funnier when smiling.)

Additionally, Strack implies that the psychology students have seen the study results, when the replicators made it clear that the students were tested before they covered the material in class.

Furthermore, even among the set of non-psychology students, the effect size of d = 0.16 is stunningly small. At best, it seems we are talking about an effect so weak that merely enrolling in a psychology course immunizes you to it, and even if you haven’t done that, the effect is still not close to significant.
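
Strack's two t statistics can be roughly reconstructed from the subgroup figures he quotes. This is only a sketch: I am assuming he used an ordinary pooled-variance two-sample t-test across study means and a one-sample t-test against zero (the degrees of freedom he reports, 15 and 2, are consistent with that), and the function names are mine. Because the quoted means and SDs are rounded to two decimals, the results land near his reported 2.35 and 5.09 rather than exactly on them.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic (equal variances assumed) from summary stats."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def one_sample_t(mean, sd, n):
    """One-sample t statistic against zero from summary stats."""
    return mean / (sd / math.sqrt(n)), n - 1

# Quoted subgroup figures: 3 non-psychology pools vs. 14 psychology pools
t_diff, df_diff = pooled_t(0.16, 0.06, 3, -0.03, 0.14, 14)
t_zero, df_zero = one_sample_t(0.16, 0.06, 3)

print(f"difference between pools: t({df_diff}) = {t_diff:.2f}")
print(f"non-psychology pools vs. zero: t({df_zero}) = {t_zero:.2f}")
```

Note how much weight the second test puts on just three studies: it has only two degrees of freedom.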

The other defenses provided are, frankly, borderline cringe-worthy. One was that perhaps Far Side comics aren’t funny anymore. But the replication itself found that they fit nicely in the mid-range of the 0-9 scale as assessed by the pen-mouthing subjects. Some people didn’t get them, but that wouldn’t explain why their oral contortions would therefore be nullified on those cartoons they did understand.

A third critique:

the RRR labs deviated from the original study by directing a camera on the participants. Based on results from research on objective self-awareness, a camera induces a subjective self-focus which may interfere with internal experiences and suppress emotional responses.

It’s entirely possible that this is the case, but it feels an awful lot like the straw-grasping that always occurs after a failed replication.

The final critique:

Finally, there seems to exist a statistical anomaly. In a meta-analysis, when plotting the effect sizes on the x-axis against the sample sizes on the y-axes across studies, one should usually find no systematic correlation between these two parameters. As pointed out by Shanks and his colleagues (e.g., Shanks et al., 2015), a set of unbiased studies would produce a pyramid in the resulting funnel plot such that relatively high-powered studies show a narrower variance around the effect than relatively low-powered studies. In contrast, an asymmetry in this plot is seen to indicate a publication bias and/or p-hacking (Shanks et al., 2015).

He then presents a plot of the effect sizes against the study sizes and fits a line to it and tacks on:

Without insinuating the possibility of a reverse p-hacking, the current anomaly needs to be further explored.

So that we’re all on board the same bus, this is how academics say “she’s got blood coming out of her wherever.” He isn’t outright accusing them of shenanigans, but he totally is.

The problem is, however, that his plot doesn’t show what he desperately wants it to show. First, he calls these studies “high-powered” and “low-powered”. In fact, they are all about the same size. The smallest study had 87 subjects; the largest 163. I’d certainly rather have the bigger number, but we aren’t talking about a large difference when it comes to the power of the studies. This is the reason we don’t see the funnel he is looking for.
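
To put a number on "about the same size," here is a rough sketch. It assumes each study split its participants evenly between the two pen conditions and that the true effect is near zero, in which case the standard error of Cohen's d is approximately sqrt(1/n1 + 1/n2). Both assumptions and the function name are mine, not something taken from the replication report.

```python
import math

def approx_se_of_d(total_n):
    """Approximate standard error of Cohen's d for two equal-sized groups
    when the true effect is near zero: sqrt(1/n1 + 1/n2), n1 = n2 = total_n / 2."""
    per_group = total_n / 2
    return math.sqrt(1 / per_group + 1 / per_group)

se_smallest = approx_se_of_d(87)   # smallest replication
se_largest = approx_se_of_d(163)   # largest replication
ratio = se_smallest / se_largest

print(f"SE of d, N = 87:  {se_smallest:.3f}")
print(f"SE of d, N = 163: {se_largest:.3f}")
print(f"ratio: {ratio:.2f}")
```

Under those assumptions the least precise study is only about 1.4 times noisier than the most precise one. A funnel needs a wide range of precision to take shape, and this set of studies doesn't have one.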

All in all, this effect seems like it should be consigned to the dust bin. Readers of Thinking, Fast and Slow ought to take note.

Senior Editor

Vikram Bath is the pseudonym of a former business school professor living in the United States with his wife, daughter, and dog. (Dog pictured.) His current interests include amateur philosophy of science, business, and economics. Tweet at him at @vikrambath1.


30 thoughts on “You Should Smile More [UPDATED]”

  1. My guess would be that this is not interesting enough to post if the result is obvious, therefore the result is that the non-smiling lipholders will rate the cartoons funnier. Perhaps because they have to fight laughter to keep the pen in place, and thereby notice their reactions more.


  2. I reached a different preliminary conclusion than Steve. I’m guessing that forcing a smile creates a feedback loop and so people gave as many as two extra points to cartoons when they were fake smiling than when they were deliberately holding a vaguely serious face.


    • Ah, the problem of research in psychology. People screw with your tests by meta-gaming it, happily hiding the very information you’re after and thinking they’re being helpful.

      “No, we’d like your response. Not your response after you’ve thought about your response, projected what you think I think of your response, then altered your response to give me a response that isn’t your response, but is actually a fake response designed to make me think better of you or something”.


        • Sadly, true. :)

          Also the internet. Because trying to replicate a study once the results are well known can be…tricky. Because your little test subjects have probably heard of it, and are now modelling your test and trying to spit back the answers they want to give, rather than their actual answers.

          What’s it called? Theory of mind? Sense of self? We’ve got the ability to think about our thoughts, and to model other people’s thoughts, and play the Princess Bride Drinking Game, and that really screws with understanding what’s going on.


        • Jay,
          Dude, you fucking assume we have ethics now.
          Why the hell do you keep on assuming stupid shit?
          MOST psychology research now is evil shit done by corporations to make money off you and me.

          [Redacted by Vik]


  3. I’m going with Jaybird on this one. I’ve read about similar studies, though not this one in particular. I don’t recall the technical name of the effect, something something somatic yadda yadda blahblah probably.


  4. I have not read this study. I have seen other things along this line. Yes, the forced smile makes you rate the cartoons higher. Because feelings are embodied. Mind-body duality is less of a thing than we think it is.


    • A forced smile is not a true smile, and we know a hell of a lot about the difference.
      Mind-body duality is indeed a thing, and we have fifteen billion other studies to prove that.
      But a forced smile is not a true smile, and that’s probably the issue here.


  5. I tried this on the one-person sample size of myself – holding a pen in my teeth, then in my lips, I thought about a Clouseau vs Cato fight scene I watched last night, and found it funnier given the first mouth position.

    So, like Jaybird and Road Scholar, I predict that fake smiling made the comics funnier.


  6. I don’t think it’s out of the realm of possibility that different facial expressions, even if unrelated to the task at hand, can influence the task at hand. And the first two studies (the one in the original paper) were not bad studies, methodologically, but we’re talking about a relatively abstract hypothesis (that facial expressions can influence evaluations, or other behavior, even if the behaviors are unrelated to the cause of the facial expressions), with many, many, many possible confounds, so two studies (it wasn’t actually the first set of studies to show this, but earlier studies were terrible) simply isn’t enough to test the hypothesis.

    For all the problems that social psychology has, this may be the biggest one: once someone has produced an interesting result, he or she has little or no incentive to continue testing it, which, combined with the difficulty in publishing null results, particularly null results in replications of someone else’s work, means that a lot of findings like this will exist in the literature with none of the rather obvious follow-ups.

    And the original study is still pretty heavily cited today, rarely with any acknowledgement of methodological or conceptual issues, or the lack of replication.


    • Profit motive seems to give incentive to keep double-checking your work. Because if it wasn’t a true finding, well, you’re going to be not making the money you thought you were.


  7. I’ve heard that our general understanding that mood dictates behavior is often reversed. Which is to say that behavior can dictate mood. You’re feeling grumpy and acting like a grump? One way to get out of the funk is to make an intentional effort to be more pleasant. It will feel forced, at least to start, but eventually the effect should take hold. My own personal/anecdotal experiences tell me this is largely true. But there may be more to it than just modifying behavior and smiling even though you’re sad, as there is a mental component as well, which I reckon might serve as the bridge between the physical and the emotional. Maybe knows more?

    Quasi-related… what of the phenomenon that things tend to feel funnier when watched in groups? I laugh much more at things when watching with others — even just one other — than I do alone. What’s the deal???


    • Yes, both thoughts and behavior can influence mood. This is of course part of the basic paradigm of Cognitive-Behavioral therapy, but cognitive and behavioral regulation of mood is pretty ubiquitous. In a way, mood disorders (unipolar depression and bipolar disorder, e.g.) involve limiting thoughts and behaviors in ways that make it difficult to regulate mood.

      To your quasi-related question: humor has a social component and social function that is often overlooked. The mutual experience of emotions often heightens them, and at some point you may be laughing less at the joke than at the fact that you’re all laughing.


      • Heh, while I was doing all that digging, Chris left a more useful answer than I could give you. (Another facet of librarians’ disease, trying to get to the basic research asap instead of just waiting to see what someone else can offer ;) )


      • When I learned this, it really changed things for me. I’m generally a happy person, but when I get into a funk, I can spiral downward. It is generally short-lived (as in a matter of hours if that) unless there is something real going on. So if I’m upset with someone, I can get it into my head that I’m upset with them and a feedback loop is created. Suddenly everything they have ever done is cast in a negative light. Monsters! And this shows in my demeanor towards them, consciously or not.

        But now I know I can say, “Hey, yea, you’re upset about X. Address X and then tell the person a joke or a funny story or that you like their shirt. Something to turn the tide.” I regret all the times I let my funks consume me.

        This is also helpful because it allows me to address X. Often times, as I said above, X would turn into A-J, PQRS, X, and sometimes Y. I’d have my little self-indulgent internal temper-tantrum and eventually get distracted by something else and move on… never really addressing X. At least until Z happens at which point X gets added to that list. This oscillation wasn’t healthy.

        As you note, to simply say, “Hey, smile more and you’ll be happy,” is insulting and insensitive to many folks for whom it just isn’t that simple. But at least some of us, people like myself, would do well to learn that we may have more control over our emotional state than we realize.

        Thanks for chiming in, .


    • Well, you’re not imagining it (or if you are, lots of other people are too; the same problems with lack of replication and overly small sample sizes still apply) … but as to why it happens, there’s no real neurological explanation as yet, I think. As far as I can tell from some cursory research, they are doing some rat tests to try and make better guesses before moving up to more expensive subjects.

      If you’re interested in learning more about the study of social groups on laughter, I found this fairly interesting peer reviewed article (this link goes directly to a pdf!) – lots of studies cited in the literature review that you could look into if you want, and the article itself is making a case for the effect of in-group vs out-group status of the other laughers that you hear on social increasing (or not) of laughter.

      You know, you could do all that reading in the copious free time available to a single father. ;) Yeah, right!

      (I’d be more helpful, but I actually don’t know much more than what I said above either, even at the casual gossip level of information sharing. Librarians’ disease is that we love looking stuff up to start a research trail – esp. if related to an area where we have some subject knowledge to help the research – and then would rather do ANYTHING else than take things all the way to the drawing conclusions stage. And I have a copious case of it this week.)


      • Ha! I just always found it interesting that watching a show with someone else will make me laugh out loud and then watching that same show alone will simply lead me to think, “That was a really funny joke. I should tell someone about that later.” And sometimes I laugh during the retelling!

        I do occasionally laugh out loud alone, which always leads to a dueling sense of delight and strangeness.


        • The laughing together effect is an interesting one. A while ago I was sitting in a hippyish sound chanting circle thing and the person next to me started laughing, getting me laughing, getting her laughing, (etc) at essentially the fact that we were laughing. Took us a few minutes to stop.

          To some extent maybe it’s like dancing (?). It feels good to dance alone, but it takes some motivation and the music had better be really good. Dancing together we keep dancing together because we’re dancing together, and the music just has to be good enough.


  8. Sample sizes of the individual studies aside, does it make sense to do a funnel plot with 17 data points? I don’t know how to do the statistical analysis on that, but intuitively that seems too small to reliably generate a decent funnel. As it is, the plot is vaguely triangular; I’m not sure how much more you can expect from 17 points.


  9. Chris,
    How different is the smile one gets from “holding a pen in one’s teeth” from a true smile? How different is it from a social “fake” smile?
    (Sorry to put you on the spot if you aren’t trained in analyzing facial expressions).

