Why Write a Book About/Against Superforecasting?
Someone needs to point out the potential problems with superforecasting. For some reason it has fallen to me.
Chapter One - Why Write This Book?
Read this chapter if you can’t imagine why this book might be necessary;
Or if you think there’s no need to point out the limitations of superforecasting, because those limitations are well understood and taken into account by its practitioners;
Or if you don’t think superforecasting is widely used or impactful enough for my criticisms to matter.
I- The Call From the New York Times
In August of 2022, a reporter for the New York Times emailed me and asked if she could talk to me about superforecasting. I had some issues with the NYT, largely because of how they had treated one of my favorite bloggers.1 Consequently, I was somewhat hesitant to talk to the reporter. But this was all secondary to my primary emotion: bafflement. Surely a NYT reporter could find people more qualified than me to discuss superforecasting?
I had written a couple of posts about it on my blog wearenotsaved.com, but with a readership in the hundreds that didn’t seem like the kind of thing that would ever come to the attention of a NYT reporter. Lo and behold, it had. And indeed when I did some searches, my posts were (and still are) the top hit for many “anti-superforecasting” queries.
Eventually I decided that my reluctance was silly and I got on a call with the reporter. My assumptions were correct: she had consulted people much closer to the action than I had. She kept talking about Phil, and it took me a minute to realize that she meant Philip Tetlock. She dropped some other names as well, though I forget exactly who, but it was clear she could talk to whomever she wanted about this.
So why was she talking to me? Apparently I was very nearly the only person making a case against the practice. Now, it could be that my opposition is entirely irrational (though the reporter evidently found my argument compelling enough to seek me out), or it could be that there is a contrary case to be made, and since no one else is making it, I guess it falls to me.
II- A Meta-Case
As the rest of the book represents that case, I’m not going to elaborate on it here. This chapter serves as the meta-case — the case for making the case. Part of that case involves a practice known as “steelmanning” an argument.2 You’ll see this idea come up a lot in the online rationalist community. As it turns out, this is also the space that seems to have the most confidence in the utility of superforecasting. Based on conversations I’ve had (see the next section) I suspect that most of them can’t steelman the argument against superforecasting. In fact I would go farther: most of them are unaware that such an argument might exist. As such it’s the perfect marriage of tactic and target. It might also help explain how a New York Times reporter ends up reaching out to me.
I’d like to bring in one other, meta-level concept: the practice of conducting a pre-mortem. If you’ve read many business books, you’ve probably come across this concept. If not, you’re certainly familiar with the concept of a post-mortem — when you conduct a detailed investigation into how something ended (death being the most common end).
A pre-mortem is an attempt to identify, before a project even starts, how it might end, and in particular how it might end badly. In business they’re generally conducted before the start of new, time-consuming initiatives. These are not initiatives that people think are going to fail; on the contrary, they are undertaken with an enormous amount of optimism and hope. This is precisely why a pre-mortem is needed: optimism blinds people to the potential pitfalls, all the ways an initiative could end badly, and if it does fail, enormous amounts of time and money get wasted in the process. All of this is to say that the optimism and hype around this sort of forecasting is precisely why a more sober take is needed. Someone should be asking: “How could superforecasting be misused? What second-order effects should we worry about?”
III- How Could Anyone Hate Accuracy?
I’ve talked to a lot of knowledgeable people about this subject, and initially they can’t see what the big deal is. Most conversations go something like this:
Me: Superforecasting works best with common, relatively high-probability events. Also, by rewarding accuracy above all, it encourages practitioners to “run up the score” with lots of predictions. Both factors steal attention and preparation from rare, high-impact events.
Imaginary Interlocutor: But you admit that superforecasting is a more accurate and rigorous way of making predictions, correct?
Me: Yes, but…
Imaginary Interlocutor:3 What do you have against accuracy and rigor?
Me: Nothing directly, but accuracy and rigor are only a means to an end, not ends in themselves.
Imaginary Interlocutor: But certainly any end is easier to achieve if you have access to accurate information?
Me: It depends on how people act on that information. Often people use superforecasting to “answer the easier question”.4 They turn a 99% probability of something happening into a 100% probability, and take no precautions for the 1% chance that it won’t happen. As bad as this is, it only covers the predictions people have thought to make. There’s a whole universe of potential events which get entirely ignored by superforecasters. (I’ll put some rough numbers on the first problem in the sketch that follows this exchange.)
If we get to the point where they will admit that superforecasting can be misused, they move on to a second objection:
Imaginary Interlocutor: Fine, perhaps people incorrectly turn predictions into certainties. But how widespread is it? Superforecasting seems to be limited to a small number of policy wonks and certain corners of the internet. I see no evidence that, say, a lack of COVID preparedness came about because policymakers were too dependent on superforecasters.
Me: Its influence is growing. Obviously the time to warn about something is before it becomes too entrenched to dislodge.
Imaginary Interlocutor: I still think your panic is misplaced. Policy decisions are made for all sorts of irrational reasons. I doubt superforecasting is used widely enough to matter, and if it is, it could only lessen the pre-existing irrationality.
Me: This is an excellent point, and for many areas of public policy you’re almost certainly correct, but there is one area where that argument doesn’t apply: Black Swans. Much of the preparation for Black Swans that may never happen already seems somewhat irrational, and superforecasting has the effect of making it look even more misguided. Over a long enough time horizon, Black Swans are the only thing that matters. Consequently, while superforecasting may help us with most decisions, it ends up being harmful with regards to the few decisions that matter the most.
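To make the first problem in that exchange concrete, here is a minimal sketch with entirely made-up numbers: the 1% chance, the size of the loss, the cost of the precaution, and the expected_loss helper are all hypothetical, chosen only to illustrate the arithmetic of rounding a forecast up to certainty.

```python
# A minimal sketch with made-up numbers: what changes when a 99% forecast
# is treated as a certainty and the remaining 1% is simply ignored.

def expected_loss(p_disaster: float, loss_if_disaster: float,
                  precaution_cost: float, take_precaution: bool) -> float:
    """Expected cost of a decision, assuming the precaution fully absorbs the loss."""
    if take_precaution:
        return precaution_cost
    return p_disaster * loss_if_disaster

loss = 10_000_000    # hypothetical cost if the rare event actually happens
precaution = 50_000  # hypothetical cost of preparing anyway

for label, p in [("keeping the 1% tail", 0.01), ("rounding 99% up to 100%", 0.0)]:
    unprepared = expected_loss(p, loss, precaution, take_precaution=False)
    prepared = expected_loss(p, loss, precaution, take_precaution=True)
    choice = "prepare" if prepared < unprepared else "skip the precaution"
    print(f"{label}: expected loss if unprepared = {unprepared:,.0f}; rational choice: {choice}")
```

With the tail included, spending 50,000 on preparation is the rational choice; with the tail rounded away, skipping looks free. Nothing here depends on the particular numbers, only on the potential loss being much larger than the cost of the precaution, which is exactly the structure of the rare, high-impact events this book is about.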
If you remain unconvinced about the importance and impact of Black Swans, I direct you to Chapter Three. As for how widespread superforecasting is…
IV- The Spread of Superforecasting
For examples of the dangerous spread of superforecasting one need look no farther than the Existential Risk Persuasion Tournament (XPT) mentioned in Chapter Zero. This is not the place to discuss the unsuitability of superforecasting for dealing with existential risk, as that point was already driven home in Chapter Zero.5 Here it’s important to note that the tournament happened; it received a great deal of superficial coverage; and the plan is for these tournaments to continue being held — all evidence of superforecasting’s widening influence.
Beyond the foregoing, we need to ask what this exercise was meant to accomplish. As I discussed in Chapter Zero, the tournament asked different teams to jointly arrive at probabilities of four different catastrophes (AI, nuclear weapons, bioterror, and climate change) causing the complete extinction of humanity.6
As originally conceived, superforecasting is useful to the extent that it is accurate. But the XPT seems to throw this standard away. It will never be possible to assess the accuracy of their predictions. Should one of these catastrophes occur, no one will be around to check whether the superforecasters were accurate or not.7
We can only assume the goalposts have been moved. Superforecasting has not only ventured into domains where its methods are poorly suited, but it has also shifted away from its original aim, accuracy, toward a new one spelled out in the tournament’s name: persuasion. Among the things they seek to persuade us of is that there’s no domain beyond the reach of superforecasters. Why not weigh in on existential risk? As I’ve already argued, there are plenty of reasons why not, but these concerns were set aside in order to create a provocative exercise.
And yes, the XPT was an interesting exercise, an intellectual game with a somewhat surprising outcome. But it served to expand the remit of superforecasting from accuracy into persuasion. This would be a dangerous drift in priorities under the best of circumstances, but here the persuasion seemed entirely directed towards arriving at lower probabilities for the various risks. To be fair, the participants’ stated priority remains accuracy, and I’m sure they genuinely believe the lower probabilities are more accurate. But, as has already been pointed out, that accuracy will be impossible to judge. So while the intention is noble, the likely outcome will be to divert resources away from preparing for these Black Swans.
The XPT is definitely the smoking gun, but it is by no means the only example of superforecasting’s widening sphere of influence. Notable pundits like Matt Yglesias and Scott Alexander make predictions each year following the rules set down by Tetlock. In the last three years The Economist has offered superforecaster predictions for the year ahead. And, since 2023, Metaculus has hosted an annual tournament for forecasters.
On top of this, there’s the enormous amount of attention received by prediction markets in the run-up to the 2024 election. To be fair, prediction markets are something of a different beast, but they are nevertheless closely aligned with the discipline of superforecasting.8
All of this just covers the expansion we can see, and it may be only the tip of the iceberg. For obvious reasons, the Good Judgment Project does not advertise which companies, governmental entities, or individuals it may be advising, but I suspect that cohort has been growing as well. In Chapter Zero, I mentioned 3M’s decision to maintain a surge capacity despite the fact that it would lessen shareholder value. Had they solicited the advice of the Good Judgment Project and learned about the low odds of a pandemic, would that still have been their decision? Maybe so, maybe not. But spread that same choice over a thousand companies where the default ideology is protecting shareholder value rather than spending money on something that may never happen, and it seems pretty clear that GJP advice would make such precautionary measures more rare, not more common.
In the final analysis, there are many reasons to believe that the ideas and methodology of superforecasting are spreading, and no evidence that they’re in retreat.9
V- The General Quantification of the World
While I’m trying to keep this book focused on the specific issues with superforecasting,10 as a general matter there is a hunger to reduce the world to numbers, statistics, and probabilities. All of the things I described as proliferating in the last section are evidence of this wider hunger for quantification.
Evidence for how this hunger can go wrong is not hard to find, but the replication crisis serves as perhaps the greatest example. For those unfamiliar, it’s so named because many “findings”, particularly in fields like psychology and medicine, have proven difficult or impossible to reproduce. Numerous elements contributed to this crisis (including a desire for status, academic incentives, and the protean nature of human behavior), but I would argue it all started with a hunger to make the future better, and in particular more certain, which is also the genesis of superforecasting. People loathe ambiguity, and they will pressure “experts” into offering predictions whether it’s reasonable for them to do so or not.
Until the replication crisis came to light, there were lots of things that all the “smart people” knew. To give perhaps the most famous example: everyone knew that priming was a real thing. What’s priming? It’s the contention that ideas can be planted in your subconscious and that these ideas will go on to affect subsequent behavior. One of the more notorious examples was a study which claimed that people primed with words associated with old age subsequently walked more slowly. Daniel Kahneman, a Nobel Prize winner, had this to say about priming in his book Thinking, Fast and Slow:
The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these [priming] studies are true.
Kahneman has since admitted that he was embarrassingly overconfident, as social and behavioral priming failed to replicate.
The challenge for superforecasting is different, but overconfidence is no less a worry, because the stakes are so much larger. I’m not worried that superforecasting will fail to replicate, I’m worried that it will leave us less prepared for the catastrophes that lie in our future.
Perhaps I’m wrong, but given the existential stakes of this gamble, someone needs to steelman the opposition to superforecasting and conduct a pre-mortem. Apparently that someone is me, and that’s why I’m writing this book.
Should you be one of those who think that I’m wrong, remember there was a time when anyone who followed the most recent science knew that priming was a real thing. “Disbelief [was] not an option.” These days all the “smart people” know that superforecasting is the best way to prepare for an uncertain future, and that expert predictions are horribly inaccurate. Could it be that we’ll one day feel as embarrassed by these ideas as Kahneman now feels about his former certainty regarding priming? It’s hard to say, but I’m willing to bet that superforecasting will continue to miss all the really consequential events, and that’s all that matters.
There you have it, another chapter. Long overdue, but hopefully interesting in spite of the delay (or because of it?). Again you have to be a paid subscriber to comment. I’m not sure it’s the best way to do things, but I figured I needed to let the experiment run at least a little while longer.
Scott Alexander, then of Slate Star Codex. The NYT was set to publish an article revealing his real name, a decision he strongly opposed: he felt it would jeopardize his psychiatric practice, potentially harm his patients, and possibly compromise his personal safety, given the sensitive nature of the topics he discussed on his blog.
A common rhetorical tactic is the “straw man” where someone takes the weakest part of a person’s argument and only confronts that. “Steelmanning” is the opposite, and involves making sure you’re dealing with the strongest and best part of the argument.
Credit to my good friend Brandon Hendrikson of Lost Tools of Learning for burning this rhetorical device into my brain.
Described by Daniel Kahneman in Thinking, Fast and Slow: when faced with a difficult question, our brains will often unconsciously substitute a simpler question and answer that instead. For example, instead of asking “Is this startup a good investment opportunity?” we might substitute the question “Do I like the founder?”
If you want an even deeper discussion I’m afraid that argument is scattered throughout several subsequent chapters, but I would look at Chapter Three to start with, and then Chapters Five and Ten.
Which they defined as a situation in which fewer than 5,000 humans remain alive. The narrowness of this condition is one of the things left out of the aforementioned superficial coverage.
For an excellent and deeper examination of the problems with applying this approach to x-risks, see https://aisnakeoil.com/p/ai-existential-risk-probabilities which has a section on the XPT.
See Chapter Seven for a deep dive into which of my criticisms of superforecasting apply equally well to prediction markets. tl;dr: they also do a very poor job of accounting for Black Swans. To do that they’d need the reach and liquidity of the stock market.
Should you be aware of any such evidence, let me know! Contact information can be found in the back of the book.
After all, I have to get all the major points into a single blog post!
"They turn a 99% probability of something happening into 100% probability, and take no precautions for the 1% chance that it won’t happen. As bad as this is, it only covers the predictions people have thought to make. There’s a whole universe of potential events which get entirely ignored by superforecasters."
This seems a bit off. If you know there’s a 1% chance of something bad happening, let’s say it is prudent to take some steps to prevent it or deal with it if it does. If that chance becomes 0.001%, that should rationally alter how much prudence you deploy against it. Likewise, if you discover the odds are actually 25%, that should dramatically increase your prudence. You may have a lightning rod on your house, even though the odds of it ever helping you are really low, but you don’t have a system to protect you from a plane crashing into your roof in a near-90-degree straight-down descent. (Actually, you probably don’t have a lightning rod on your house; they’re not often installed these days because the odds are too low. But I bet you could be more easily talked into a lightning-rod investment than into plane-proofing your roof.)
It doesn't really matter how you know this. If the superforecaster is able to produce a more accurate risk profile, that is no different than if you improved your estimate of risk by building very sensitive engineering models updated with lots of data from various tests you performed.
Is the difference here between a better forecaster and an Oracle? The Oracle would be defined as someone who isn’t refining the odds calculation but actually saying what will happen. If you know for certain that the 1% risk of your home burning down next year is actually 0%, you can cancel your fire insurance even if it only costs you $1.
Oracles are different from mere refiners of probability estimates in the sense that they are probably near-supernatural. If they are just better oddsmakers, well, maybe that’s no big deal. An oracle knows the outcome of the coin flip; she doesn’t actually change the 50-50 odds of it. If oracles are possible, then they are not working with simple probability but with something else.
"To be fair, prediction markets are something of a different beast, but still nevertheless closely aligned with the discipline of superforecasting."
Well, to be fair, prediction markets seem perhaps a bit too open to Black Swans. At this moment, at least one such market gives a 3% chance that Jesus Christ will return in 2025 (https://polymarket.com/event/will-jesus-christ-return-in-2025). I personally think Jordan Peterson, after his latest 'debate' stunt, has a 15% chance of declaring himself Christ in 2025, so I’m going to have to think about how I can hedge these two against each other.
But here's an idea:
Let’s take some AIs and have them back-trade prediction markets. By that I mean have them pretend to buy and sell contracts using only information that was available at past periods of time. This process can be repeated over and over again, even though we have a finite amount of information and a finite amount of past. Still, the idea would be to build wickedly good AI-driven predictors. Note that you don’t need historical prediction-market prices: you can have AIs trade against each other over the outcome of, say, the Civil War just by reading old newspaper accounts day by day.
Now here's the Black Swan angle. Once the agents get very good, let them essentially keep a trader's diary. The purpose is to articulate what is behind their strategy.
Here’s what we’d be looking for. Agents that short the stock market before 9/11 because they are picking up patterns of overbought stocks would be the first type of superforecaster you are concerned about: agents that are just really good at refining predictions based on 'normal' data, when it is probably the Black Swans that matter more. But suppose you got an AI that shorts the market before 9/11 because it has a hunch something major is about to happen. Remember, these trades happen in a 'holodeck' where the AIs are only allowed to process information that was available in past time periods. If you had an AI that picked up on big bad things like 9/11, Trump’s election, or the release of Rise of Skywalker from backdated information, you’d have something interesting, provided it could do it consistently.
If we did this a lot, a few billion times at least, we would end up with AIs that appear to predict Black Swans correctly for the simple reason that, out of billions of runs, a few will have flagged a 9/11 for literally any given day. Just like a lucky managed mutual fund, some will seem to predict things they shouldn’t be able to predict. The question is whether we can reliably find an oracle: an AI that has found a system which predicts known Black Swans more reliably than the laws of chance would allow. If we can, I’d say that raises even more profound questions than the risks of over-trusting superforecasters.
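That "lucky mutual fund" caveat is easy to put numbers on. The sketch below is a hypothetical Monte Carlo illustration, not an implementation of the back-trading harness described above: it simulates a large population of agents that each flag a handful of random days as "something big happens," then counts how many happen to flag the one day the disaster actually occurred. The agent count, the ten-year horizon, and the FLAGS_PER_AGENT parameter are all invented for the illustration.

```python
# A hypothetical illustration of the "lucky fund" problem: with enough purely
# random agents, some will appear to have "called" a rare event by chance alone.
import random

random.seed(0)

N_AGENTS = 100_000      # simulated population of forecasting agents
N_DAYS = 3_650          # ten years of daily history to "predict" over
EVENT_DAY = 2_001       # the single day on which the Black Swan occurs
FLAGS_PER_AGENT = 5     # each agent flags a few days as "something big happens"

lucky = 0
for _ in range(N_AGENTS):
    flagged_days = random.sample(range(N_DAYS), FLAGS_PER_AGENT)
    if EVENT_DAY in flagged_days:
        lucky += 1

expected = N_AGENTS * FLAGS_PER_AGENT / N_DAYS
print(f"Agents that 'predicted' the event purely by chance: {lucky}")
print(f"Expected by chance alone: about {expected:.0f}")
```

On the order of a hundred of these blindly guessing agents will look prescient for this one event, so a genuine oracle would have to beat that base rate consistently across many held-out Black Swans before the result meant anything.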