The Beauty of a Theoretical Mess – Center for Mind and Culture

This is the 2nd in our series, “Can We Trust Psychology?”

Part 1: What is the replication crisis and why did it happen?
Part 3: The Social Side of Psychology
Part 4: Psychology has a Theory Problem

The last post was a brief overview of the replication crisis. If you haven’t already, check it out here. The gist is that researchers are trying, and failing, to reproduce the results from classic studies in psychology. Over half of the studies tested don’t replicate, which is shaking the foundations of the discipline.

But the question is: what do we do once a study doesn’t replicate? Do we just abandon the whole theory?

This question has occupied philosophers for a long time, much longer than the current crisis. If you’ve been schooled in the scientific method, then this may seem like a ridiculous question– of course you throw such a theory out! If Karl Popper came to mind when you read the question, then you get extra credit. He was the guy who argued that this process of hypothesis, testing, falsification, and rejection was the key to distinguishing scientific theories from pseudo-scientific theories like those of Freud or Marx.

From that perspective, we should be throwing out a lot of psychological theories right now. Reminding people of old age doesn’t make them walk slower. Toss that theory out!

But here’s the tough part of this process– what theory gets tossed out? It may seem obvious at first glance. Obviously, you reject the theory that reminders of old age will make you walk slower. But, this seemingly specific and narrow hypothesis is part of a whole messy web of other theories. How deep does the falsification go?

Let’s stick with this example for a moment.

This specific study is part of a theory called social priming. The (very brief) gist of social priming theory is that cues in our social or physical environment (e.g., in this case words, such as gray, pension, Florida, or bingo) will evoke meaningful associations for people that then give rise to some fitting behavioral response such as walking slower (for a more thorough explanation, read this).

So, does a failed replication of the [old-age → slow walk] hypothesis mean that social priming theory is also bunk? How many social priming studies would need to not replicate in order for the big theory to be abandoned?

Even if all of them failed to replicate, we’d still have to sort out what part of the social priming theory was wrong. Social priming theory itself depends on theories about networks of meaning within our minds, the perception of salient cues, the translation of meaning into action… There’s a whole mess of theories that come together to create predictions. A big beautiful mess.

If we can get a little more specific here, the point is that you logically don’t have a single theory T generating a prediction P. Instead, you have a set of theories (T₁ + T₂ + T₃ + … + T₄₂) that collectively predict P. So, when P doesn’t pan out, any one of those theories could be the culprit.
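For the logically inclined, here is a rough sketch of that point in symbols. The notation is mine, not Lakatos’s or Duhem’s: T_1 through T_n stand for the bundled theories, and P stands for the prediction.

```latex
\begin{align*}
(T_1 \land T_2 \land \dots \land T_n) &\rightarrow P
  && \text{the bundle of theories predicts } P \\
&\neg P
  && \text{the prediction fails} \\
\therefore\ &\neg (T_1 \land T_2 \land \dots \land T_n)
  && \text{modus tollens} \\
\equiv\ &\neg T_1 \lor \neg T_2 \lor \dots \lor \neg T_n
  && \text{De Morgan's law}
\end{align*}
```

The last line is a disjunction: the failed prediction tells you that at least one of the theories is false, but it doesn’t tell you which one.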

This complexity isn’t a flaw in psychology or in experimental design; it’s just fundamentally how hypothesis generation and observation work.[1]

Popper’s theory of falsification depends on a picture of science that is orderly, precise, and clearly non-existent. This isn’t to say he was wrong about scientific progress– over time the community of scientists does tend to go through theory change in a way that responds to evidence.[2] The whole thing’s just messier than Popper thought.

To understand that messiness and what it means for the replication crisis, let me introduce you to someone who appreciated this whole messy picture– Imre Lakatos. His insight is that it’s necessary to think about whole research programs[3] instead of individual theories. Once you’re looking at the whole program, you’ll notice two different types of theories.

First, some theories act as a hard core of the research program. These are the foundational theories and they’re relatively immune to being refuted. This immunity comes about in two different ways. First, scientists are just practically reluctant to give them up– if they find contradictory evidence, they’ll find some other explanation. Second, the hard core is immune because you need a bunch of more specific theories in order to get actual predictions.

These are the other type of theory in a research program. They’re what Lakatos calls auxiliary hypotheses and they’re necessary for generating any sort of observable prediction from the hard core. These auxiliary hypotheses also create a protective belt around the hard core.

If a replication fails, instead of throwing out the hard core, you look for the delinquent auxiliary hypothesis, toss that one out, and then think through a new one that makes more sense. While that may seem like cheating, Lakatos’ point is that it’s both practical and logically sound to do so.

This may seem like the sort of thing that only philosophers could get worried about, but it makes a huge difference in how you interpret the replication crisis. If you’re a Popperian, then each failed replication is a huge deal. If priming people with ideas about old age doesn’t lead to slower walking, then all of social priming is suspect.

But if you’re a Lakatos-ian, one, you get a cooler name, and two, you don’t have to tear the whole house down when one window doesn’t fit. There are a lot of other explanations for why that specific experiment wouldn’t replicate.[4]

In other words, the hard core of social priming remains untouched (as it should– if we abandoned the idea that information from our environment changes our behavior, then we’d be left with an absurd picture of human nature), and psychologists can go about trying to rework the auxiliary hypotheses.

I’ve been using the example of social priming, but this same dynamic holds for pretty much every replication. Pulling off a replication takes so much energy and time that these efforts have to focus on single experiments. The researchers leading these efforts are good at finding prototypical studies for any given research program.

But they’re still single studies, and they can’t get around the logical structure of hypothesis generation mentioned above. So, whenever you read that any of these broad theories in psychology didn’t replicate, bear in mind that the failed replication is actually striking at a whole set of theories. It’s logically and practically premature to toss out the whole idea.

What’s the point then– how can we tell which theories are trustworthy and which are fatally flawed?

Fortunately, Lakatos can help us out here too. Instead of watching the failed replication, he turns our gaze to the response researchers have to contradicting evidence. Watching the process by which people rework the auxiliary hypotheses is the way to distinguish between healthy and unhealthy science.

When research programs respond to falsifications by producing compelling hypotheses that point out new parts of reality, then they’re what Lakatos called progressive programs. For example, revamping the auxiliary hypotheses of social priming should be exciting. It could involve getting a clearer picture about how people’s dispositions shift between settings; why some nudges from the environment are more salient than others; who is susceptible to these nudges in the first place… If failed replications can spur on this sort of work, then social priming can be a progressive research program.

The new hypotheses of degenerative research programs, on the other hand, don’t actually point out anything new or interesting. Instead they just explain away the contradicting evidence: “the replication just messed up the procedure” or “the associative networks must’ve been different in the new groups.” These sorts of explanations don’t actually open up new terrain to investigate; they just explain away the anomaly.

Over time, they’re like the buildup of plaque on the research program: with too many of them, the whole program starts to go bad.

This process of forming and testing new hypotheses takes a lot of time, so it’s hard within the moment to really judge which research programs are critically weakened by a failed replication and which are being honed. Researchers are justified in defending their theories and coming up with explanations for why the replication didn’t work.

But, over the long run, by watching their responses you can see how these explanations play out and whether they lead to new fertile ground or whether they’re barnacles beginning to grow on a sinking ship.

[1] This idea originally comes from Pierre Duhem, a historian and philosopher who’s also quite cool… but three dead white guys in a single post is one too many.

[2] Sorry Kuhn.

[3] Since he was in Britain at this time, he called them research programmes… but I just can’t type that without feeling silly.

[4] Foremost among them, you can’t possibly control all of the potentially relevant cues within any environment, and even if you could, humans are complex dynamic systems that are going to be different at any given moment. For an in-depth discussion of this issue check out this post.