A Conversation with Daniel Kahneman About “Noise”

Readers of this publication are unlikely to need an introduction to Daniel Kahneman. For more than six decades, the Nobel Prize-winning psychologist has worked to deepen our understanding of human behavior and decision-making, pointing out when we err and how.

Much of that time was spent understanding how different cognitive biases affect our decisions and behavior. His book Thinking, Fast and Slow showcased this work and was, for many outside the research world, their introduction to the science of decision-making.

Ten years on from Thinking, Fast and Slow, Kahneman is back with a new book that will again have you questioning what you thought you knew about making decisions. Noise, coauthored with Olivier Sibony and Cass Sunstein, covers another way we make systematic errors in decision-making: the variability of our aggregated judgments.

For instance, if a group of judges gives vastly different sentences to defendants who committed the same crime (some judges give a one-month sentence, others one year, others seven years, and others somewhere in between), then one could call the system noisy. We’d expect similar punishments for the same crime. In a biased system, judges might consistently give sentences that are too high for certain types of crimes. Systems can be both biased and noisy. That’s what we’d have if judges are too varied in their sentencing and consistently dole out sentences that are too harsh.

Kahneman and co argue that it’s time we pay more attention to noise. And that’s because reducing noise in a system can help reduce error, just like reducing bias does. The field’s recent attention to bias has overshadowed noise; it’s like we’re fighting systemic error with one hand tied behind our back. The case of judicial sentencing is an example that features in the book. And, in that example, it’s not hard to see how noise isn’t simply a decision-making quirk but a feature of the decision-making systems we’ve set up, and one with serious consequences.

Kahneman and I had the chance to discuss noise over a Zoom call. We covered a lot of ground in our hour-long conversation, which I’ve distilled below and organized in three sections: what noise is and how it differs from bias, how we can measure and deal with noise, and some of noise’s nuances.

What noise is and how it differs from bias

Evan Nesterak: At this stage in your career, after all you’ve studied, you could focus on anything you wanted. What is it about noise that it was able to capture and hold your attention?

Daniel Kahneman: In the mathematics of accuracy, there are two types of error which are equivalent. There is the average of error, which is bias, and there is the variability of error, and that’s noise. I’ve been studying bias all my life, but a few years ago encountered an instance of noise, and I was very impressed both by how much noise there was (among underwriters judging exactly the same thing) and mostly I was impressed by how little people knew about it.

There is a chapter where I have that equation, and it’s completely trivial, yet when you think about it it’s extremely important: the mean squared error is equal to bias squared plus noise squared. That sets noise as a big problem, because we know that bias is a big problem. In fact, I suspect that in many situations noise is a significantly more severe source of inaccuracy and error than bias is.

Overall Error (Mean Squared Error) = Bias squared + Noise squared. “[The figure above] shows how MSE (the area of the darker square) equals the sum of the areas of the other two squares. In the left panel, there is more noise than bias; in the right panel, more bias than noise. But MSE is the same, and the error equation holds in both cases.” Source: Noise, Chapter 5.
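The error equation can be checked numerically. The sketch below is only an illustration: the judgments and the true value are made-up numbers, and the function name decompose_error is hypothetical. It assumes that bias is taken as the mean of the errors and noise as their standard deviation (computed over the whole set of judgments, not as a sample estimate), which is the reading under which the identity holds exactly.

```python
import math
import statistics


def decompose_error(judgments, true_value):
    """Split the errors of a set of judgments into bias, noise, and MSE.

    Assumptions for this sketch: bias is the mean error, noise is the
    population standard deviation of the errors.
    """
    errors = [j - true_value for j in judgments]
    bias = statistics.fmean(errors)                 # average error
    noise = statistics.pstdev(errors)               # variability of error
    mse = statistics.fmean([e * e for e in errors])  # mean squared error
    return bias, noise, mse


# Hypothetical judgments of a quantity whose true value is 10.
judgments = [12.0, 9.0, 15.0, 11.0, 13.0]
bias, noise, mse = decompose_error(judgments, true_value=10.0)

# MSE equals bias squared plus noise squared (up to floating-point rounding).
print(bias, noise, mse)
print(math.isclose(mse, bias**2 + noise**2))  # prints True
```

In this made-up example bias and noise contribute equally to the overall error, so halving either one would reduce the mean squared error by the same amount.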

Let’s talk about bias and noise, because our readers will be familiar with cognitive biases. You mentioned how both influence decision-making, but they do so in different ways. Can we dive in more on that distinction?

On the one hand, bias is an average error. On the other hand, it’s a psychological mechanism, and it’s a psychological observation. There are mechanisms that cause systematic errors in people’s judgments and in people’s decisions, and those errors are called biases. And it’s basically a psychological mechanism that explains events inside the individual—why an individual is inclined to make one mistake or another.

The noise that we are mainly interested in is a completely different phenomenon, because it’s a phenomenon of individual differences. It’s not within any one individual, it’s just variability across individuals. It’s a different story altogether, and they’re not two competing sources of error within the individual. There is within subject noise, which is very confusing, but the noise that we’re really interested in is system noise.

I want to bring up a line in the book that stuck out to me. You write that “bias has a kind of explanatory charisma, which noise lacks.” I was wondering if we could explore that quote a bit.

Bias is found, and you can recognize it, in a single decision. If a woman who is supposed to be hired is not hired, say because she’s a woman, we recognize it in a single decision. Furthermore, there is a causal explanation—that’s where the charisma comes from. There’s causal force to the bias, the bias produces that kind of error.

Noise, in contrast, is something you cannot identify in any particular judgment. It doesn’t make any sense to say that the error in this judgment is produced by noise. Noise, by definition, is a statistical phenomenon. And when you say that a judgment is noisy, you mean that judgments of this kind are noisy, that the statistics indicate variability, indicate noise.

Accounting for and remedying noise

For an organization that wants to address noise, how could they begin? In the book, you describe a “noise audit.” Is that where you would start?

This is our first recommendation. If you have a bunch of employees who are performing an interchangeable function, such as different physicians in the E.R. or different federal judges or different underwriters in an insurance company, then you can do a noise audit. And we really recommend strongly that anyone who is concerned with that possibility try to conduct a noise audit.

In a noise audit, people are presented with a problem which is realistic, the kind of problem that they could encounter on their job. A set of those interchangeable employees are all presented with the same problem and are asked a very precise question: to put a dollar number on it or in some other way indicate what they expect to happen in that case. Then you just look at the variability of the judgments. You don’t have to know the correct answer, because what interests you is the variability of judgments. If the judgments are variable, then the errors are variable.
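One way to put a number on that variability is sketched below. This is an assumption for illustration, not a procedure confirmed in this conversation: for every pair of judges, take the absolute difference between their two judgments as a fraction of the pair’s average, then report the median across all pairs. The function names and the premium figures are hypothetical. Note that no correct answer is needed anywhere in the calculation.

```python
from itertools import combinations
from statistics import median


def relative_difference(a, b):
    """Absolute difference between two judgments, as a fraction of their mean."""
    return abs(a - b) / ((a + b) / 2)


def noise_index(judgments):
    """Median relative difference over all pairs of judgments on the same case.

    Hypothetical summary measure for this sketch; it assumes positive,
    numeric judgments such as dollar amounts.
    """
    pairs = list(combinations(judgments, 2))
    return median([relative_difference(a, b) for a, b in pairs])


# Hypothetical premiums quoted by five underwriters for the same case.
quotes = [9500, 16700, 12000, 14000, 11000]
print(f"Typical disagreement between two underwriters: {noise_index(quotes):.0%}")
```

An organization could compare this figure with what its executives say they would expect before seeing the results.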

Okay, so you’ve conducted the noise audit. In the case of the insurance underwriters, you write that executives expected about 10 percent variability, but there was more like 55 percent. So as an executive, you realize there’s more variability than you expected—what do you do next?

There are several possibilities. If the judgment is relatively simple, you may ask yourself if you actually need human judgment at all, or whether you can replace human judgment with some rule or some algorithm. The rules don’t have to be very complicated. Sometimes the rules can be checklists. It doesn’t even have to be a computation. The Apgar score, which is used to decide whether infants are healthy, is a rule. And it eliminates noise almost perfectly among physicians.

In more complex cases, like underwriters or judges, a simple rule will not do. In those cases, you try to discipline judgment in various ways. The idea is that disciplined judgement is likely to be more uniform, and that the interchangeable people who are making judgments for an organization, if they follow the same thought process, are likely to reach similar conclusions and that reduces noise. We call those steps “decision hygiene,” and those are steps that an organization can take, without considering specific biases, to improve the quality of the judgment process.

You list six different components of decision hygiene. Could you pick one and explain why you think it’s important?

In the first place, what we try to do in decision hygiene is a disciplined process. It’s not rule governed, but it is disciplined to some extent.

I think the most important example that we have of decision hygiene is that when you’re facing a decision with multiple options, we have a slogan: treat options like candidates. The reason we want to treat options like candidates is there actually is an answer, research-based, on how you should conduct selection interviews and how you should select people who are candidates for jobs. It doesn’t lead to perfect prediction of performance, because that’s impossible, but it’s the best that can be done, probably. And the answer is to break up the problem.

If you’re going to have to produce an evaluative judgment at the end, it’s breaking up the problem and evaluating various aspects of the option, just like you would break up various traits of the candidates. Do that while keeping the specific judgements a) fact-based as much as possible and b) as independent of each other as possible. You do not want the judgment of one characteristic to be influenced by the judgement of another. So independence and fact-based are two basic processes, and the psychological idea is to delay intuition. Do not eliminate intuition but delay it.

This kind of process that we call decision hygiene is applicable to unique decisions. It’s not only noise reducing. If you have an executive facing a decision, there is no noise because there’s nobody else, but obviously anything that improves repeated decisions would also improve unique decisions. Decision hygiene is intended to reduce noise, designed to reduce noise, but it is applicable to singular decisions where noise is completely invisible.

Let’s talk about the case of sentencing guidelines for judges, which was part of a long, multiyear fight. In some cases, as you write, it was the difference between a defendant receiving months or multiple years for the same crime. But those guidelines, which went into place in 1984, were removed in 2005, and now they are advisory, rather than mandatory. Judges were pushing back on the guidelines, saying this is taking away my ability to do my job. I’m curious about the pushback you might receive after you do implement guidelines to constrain noise. How do you overcome that?

In many situations, you can expect some pushback initially. People perceive what is happening as limiting their options. If people feel constrained by it, if they see that as a bureaucratic infringement on their role, then you’re going to get a lot of pushback.

Part of the way that an organization has to deal with this is that the employees who are affected by this have to see this as something that helps them do their job, rather than something that replaces them or constrains them too much. In some organizations like the justice system, it’s hard to accomplish, because the judges feel that their individual sense of justice is the measuring instrument that should be applied, and they’re likely to resist anything that will make them more uniform.

How you could convince judges that there are methods that will make them more uniform and will actually help them do their job better—that is something that so far hasn’t happened. Once the guidelines were removed, noise came back roughly to the previous level and judges were actually happier. They saw [the removal of the guidelines] as an improvement from their point of view.

It’s perhaps not surprising that if the move to reduce noise is perceived by judges as taking away their ability to do their job, or if they feel they’ve just become automatons, then you’re likely to lose the battle. But perhaps, by reducing noise, judges would have the opportunity to do something more, something new that they don’t get a chance to do currently.

We do not want to eliminate intuition from the process, and by intuition, I really mean that subjective sense that it’s a judgment that you are making. What we’re trying to do is to delay intuition and process information prior to exercising your intuition. If we could convince judges to do that—to engage in a disciplined thought process before they form an intuition or global judgment—that would be a big improvement.

This is really essential. Clearly, if you do not allow people to feel that they are doing an intellectual job, a job that occupies their mind, they will resist you. They will sabotage it, and this is definitely not the way to go.

I’m wondering about relativity here. Say we start with a sentencing discrepancy of three months to 10 years, and we reduce noise and now it’s three months to five years. Without the earlier context, three months to five years still feels very unfair, even if it is fairer than it was before. How do we understand this and continue to work to reduce error?

We have to accept that wherever there is judgment, there is noise. Just as you would want to reduce bias—even if you cannot completely eliminate it—reducing noise is a good thing. It improves accuracy.

What you’re saying is really quite interesting, because there is a rhetoric of solving problems—like we want to eliminate bias. Well, you cannot eliminate bias from judgment, not completely. You can reduce bias and you can reduce noise, but you cannot eliminate noise from judgment. It’s part of what makes it human, that it is noisy, that it’s not perfectly accurate. Unless we want our lives governed by rules and by algorithms, we’re going to have to make our peace with that.

On that point, in what instances should noise reduction not be the goal? In the book, you discuss the distinction between judgment and taste, opinions, or values. When is noise reduction perhaps not the priority?

We really speak of noise, we define noise, as unwanted variability. The situation in which we look at noise is primarily that in which there is an organization that makes judgments and decisions, and it wants to do that with one voice. To the extent that it does that in noisy ways, this is undesirable.

There are many situations in which diversity is actually very interesting and valuable. You don’t want all your film reviewers to be identical and so on. You certainly don’t want any creative process to be identical. Furthermore, there are situations in which you create a team of people with diverse expertise, and they make partial judgments, and you don’t want them to be identical, you want them to reflect the different characteristics of the problem.

The nuances of noise (and other questions about decision-making)

How we feel after decisions can be really subjective, and we can adapt to our decisions over time. I can imagine some people who, if they set up a very disciplined decision-making process, would feel they did what they needed to do to make a decision and they’d feel confident in their choice. I could imagine other people who, if they tried to make their decisions more scientific, it would maybe strip the meaning for them or the story they would tell themselves. I’m curious how you think about these two approaches.

I think you’re absolutely right, and I think there is actually evidence of situations where deliberation doesn’t help you. There is a study that when you’re choosing a poster, then spending too much time analyzing why you like it, why you like it more than another poster, actually may not pay off. That you may be happier when you have a bunch of posters and you pick one. So, I would say for decisions where ultimately the criterion is whether you will like it and it’s simple and relatively small, the evidence suggests that intuitive judgment may be better than analysis.

On a thing that is really complex, like building a house, it really is not the same as living with a poster. Whether it has a pool or doesn’t have a pool, whether there’s a long commute or not a long commute: you don’t have a simple attitude to a house; it’s a complicated object. Where it’s a complicated object with many features, a disciplined process is likely to be worth it.

This reminds me of research you conducted with executives, who felt that going with their gut was how they added value. The idea being that through experience they cultivated their judgement. I’m curious what they thought their gut was doing—what made up their gut, so to speak?

There is an intuitive weighting of information. You’re exposed to a story and then there are certain elements of the story that grab you. We tend to create stories, and the process of creating the stories that guide our decisions, that’s where a lot of this comes from. Your gut speaks to you when you have a simple story. When everything seems to point in the same direction, giving you a lot of confidence, then there is a good chance that your gut is simplifying the situation.

What the gut does is it creates coherent stories. It creates coherent stories in an incoherent world—by stripping away some of the difficulty, some of that complexity, some of the internal contradiction. People are really uncomfortable with internal contradictions. They want to have a simple story where everything is pushing in the same direction. And people who say their gut speaks to them, they have that. They have simple coherent stories. It is what gives them confidence, and other people trust them because they’re very confident.

What are some of the misconceptions about noise or myths about noise that you’d want to dispel?

I think that the first misconception we want to dispel is that noise is not a big problem. Noise is generally neglected, we think, and it should not be neglected; it is worth paying attention to.

The second misconception is one that we discussed, which is that reducing noise means being completely mechanistic or that it leaves no role for human judgment. Our attempt is to maintain human judgment and reduce noise.

I’m calling you from Prague right now, so I would be remiss if I didn’t ask about the differences between cultures or different countries as it relates to noise and decision-making.

This is a very good question to which I do not have an answer, and I’ll explain why I don’t have an answer. I view the book as premature in a way. I started thinking about noise six or seven years ago and now a book is coming out. That, in principle, is too soon. That is, when you have a relatively big idea, you know, 20 years is a better time frame than six. I started when I was in my 80s, so I just didn’t have the luxury. There are fascinating questions, like the one you raised of noise in a cross-cultural context, that I would have loved to explore and in 20 years, I would have gotten to it, I and my collaborators. In six years, this is what we managed to accomplish, but your question is a good one.