The Vulnerable World Hypothesis as Map to the Future
Expanding on Nick Bostrom's metaphor for technological development — it's like drawing balls of various shades from an urn, and if you ever draw a pure black ball, the world ends...
This was originally requested by a soon-to-launch magazine. In the end the editor decided it didn’t fit. His loss is your gain.
Imagine there’s a game where people blindly draw balls from an urn.
The balls are of various shades — the lighter the shade, the more beneficial they are. Most of the balls are off-white: lots of benefits with a few downsides. Some of the balls are quite dark, and the downsides of those balls are significant. But drawing such balls is rare. Because of this, drawing from the urn is usually great. People love doing it.
While quite a few gray balls have been drawn, no one has ever drawn a pure black ball. But if that happens, everyone playing the game dies.
Then again, it’s not even clear whether there is a pure black ball in the urn. And most of the balls give the players fantastic gadgets like iPhones and cars. Given these rules, the question is:
Should we keep playing the game?
I- Nick Bostrom’s Vulnerable World Hypothesis
The preceding metaphor for technological progress was put forth by Nick Bostrom in a 2019 paper titled “The Vulnerable World Hypothesis” (VWH).
Drawing a ball from the urn represents the development of a new technology. White balls represent technology which is unquestionably good — for example the smallpox vaccine. Off-white balls may have some unfortunate side effects (e.g. automobiles), but they’re still very beneficial.
As the balls become more gray the harms increase and the benefits diminish. Presumably many environmentalists would place nuclear fission in this category: few (if any) benefits and lots of potential for harm. (As you can see, the categories are already becoming sloppy if the shade can vary depending on the audience.)
A pure black ball would mean the end of the world. Draw a black ball and the jig is up, the game is over. Since we haven’t drawn a pure black ball, Bostrom asks us to imagine what one might look like by proposing a hypothetical alternate history:
On the grey London morning of September 12, 1933, Leo Szilard was reading the newspaper when he came upon a report of an address recently delivered by the distinguished Lord Rutherford, now often considered the father of nuclear physics. In his speech, Rutherford had dismissed the idea of extracting useful energy from nuclear reactions as “moonshine”. This claim so annoyed Szilard that he went out for a walk. During the walk he got the idea of a nuclear chain reaction—the basis for both nuclear reactors and nuclear bombs. Later investigations showed that making an atomic weapon requires several kilograms of plutonium or highly enriched uranium, both of which are very difficult and expensive to produce. However, suppose it had turned out otherwise: that there had been some really easy way to unleash the energy of the atom—say, by sending an electric current through a metal object placed between two sheets of glass.
In this alternate timeline, nuclear bombs are trivially easy to construct. Had this been the case, chaos and anarchy would likely rule the land. Very little would separate anyone with even the smallest desire for such a weapon from acquiring one. Multi-kiloton explosions would be regular occurrences.
The VWH theorizes that somewhere in the urn there is a black ball (indeed, possibly more than one). Sure, nuclear weapons ended up being difficult to create, but perhaps destructive AI, nanotechnology, or virus creation might eventually be as easy as “sending an electric current through a metal object placed between two sheets of glass”. Should these or other technological developments end up being a pure black ball, then the game will end, and we will lose — unless we stop drawing balls out of the urn. But no one thinks that will happen.
It’s just a hypothesis. If it is true, the outcome is apocalyptic. Given all this, what implications does it have for navigating the future? How are we to proceed?
II- Making the VWH More Tractable
Bostrom’s description of the situation and his analogy appear plausible. We are continually drawing balls from the urn, and there’s no reason to think we’re going to stop. There has to be some chance, possibly even a very high chance, of there being a pure black ball somewhere in the urn. It would appear worthwhile to closely examine this possibility.
To begin with, it’s useful to divide the VWH into two versions: the extinction version and the catastrophe version. In the former, a pure black ball means literal human extinction. In the latter, any calamity of sufficient size counts as a pure black ball. For Bostrom this consists of:
…any destructive event that is at least as bad as the death of 15 percent of the world population or a reduction of global GDP by > 50 percent lasting for more than a decade.
This distinction between the two versions is important because it makes the discussion more tractable. Discussing extinction-level risks (X-risks) has a tendency to be overly philosophical and excessively hypothetical — to wander into Pascal’s Mugging territory. Building a foundation around catastrophes we have experienced allows us to better grapple with those we have not. Considering how far we would go to avoid catastrophe helps us frame how far we should be willing to go to avoid actual extinction.
Unfortunately, we recently had an example of a near catastrophe: COVID. The last few years, particularly 2020, provided a great example of how we might react to the threat of an actual catastrophe. As of this writing an estimated 22 million people have died of COVID, or around 0.3% of the global population — roughly a fiftieth of the death toll imagined in the catastrophic version of the VWH. Even so, we were still willing to implement severe restrictions, particularly in the developed world.
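To check the arithmetic, here’s a quick back-of-the-envelope sketch. It assumes a world population of roughly 8 billion; the 50x figure above comes from rounding the toll up to 0.3%.

```python
# Back-of-the-envelope check of the figures above.
# Assumptions: ~22 million estimated COVID deaths, world population ~8 billion.
covid_deaths = 22e6
world_population = 8e9

covid_fraction = covid_deaths / world_population  # ~0.00275, i.e. ~0.3%
vwh_threshold = 0.15  # Bostrom's 15%-of-population threshold

print(f"COVID death toll: {covid_fraction:.2%} of the population")
print(f"The VWH threshold is ~{vwh_threshold / covid_fraction:.0f}x larger")
# -> COVID death toll: 0.28% of the population
# -> The VWH threshold is ~55x larger (~50x if you round the toll to 0.3%)
```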
Regardless of whether you’re on team lab leak or team zoonotic, it’s clear that a disease similar to COVID could have been engineered, and that something much, much worse could be engineered as well. Most terrifying of all, engineering such a disease would not be difficult. Is this a real-life example of Bostrom’s making a nuke by “sending an electric current through a metal object placed between two sheets of glass”? We’re probably not quite to that point, but we may be getting there. Moreover, in contrast to the vigorous measures taken to mitigate the pandemic once it had begun, we’ve so far done very little to prevent someone from engineering the next one. We haven’t even managed to place additional limitations on gain-of-function research.
As we make our first pass, it’s clear that we’re willing to act, but the nature of that willingness and the actions we might take remain very incoherent.
III- What Bostrom Suggests
After spending some time constructing the analogy of the urn Bostrom goes on to give a formal definition of the hypothesis:
VWH: If technological development continues then a set of capabilities will at some point be attained that make the devastation of civilization extremely likely, unless civilization sufficiently exits the semi-anarchic default condition.
We’ve covered the first part, but what the heck is “the semi-anarchic default condition”? Bostrom offers this as a description for where we are now. It’s characterized by three features:
1- Limited capacity for preventive policing. States do not have sufficiently reliable means of real-time surveillance and interception to make it virtually impossible for any individual or small group within their territory to carry out illegal actions – particularly actions that are very strongly disfavored by > 99 per cent of the population.
2- Limited capacity for global governance. There is no reliable mechanism for solving global coordination problems and protecting global commons – particularly in high-stakes situations where vital national security interests are involved.
3- Diverse motivations. There is a wide and recognizably human distribution of motives represented by a large population of actors (at both the individual and state level) – in particular, there are many actors motivated, to a substantial degree, by perceived self-interest (e.g. money, power, status, comfort and convenience) and there are some actors (‘the apocalyptic residual’) who would act in ways that destroy civilization even at high cost to themselves.
Bostrom believes that we could protect ourselves from potential catastrophe by empowering the government to implement stricter measures. This is similar to what happened during COVID, but a hundred times more extreme. Even if one were in favor of these measures, the haphazard example of COVID illustrates the difficulties which would attend their implementation.
Preventive policing capable of stopping something as trivial to produce as “easy nukes”, or even an artificially engineered pandemic, would be something on the order of 1984, only worse. Winston Smith at least had the option of sitting out of sight of his telescreen. As draconian as these measures would be, one could imagine a country like China or North Korea implementing them; the same cannot be said for the second item on Bostrom’s list. Perfect global governance is currently entirely out of reach, and it’s hard to imagine anything short of a singularity that might change that. Finally, if we managed to align everyone’s motivations, preventive policing and global governance would be unnecessary. It would also put us smack dab in the middle of the plot of a YA dystopian novel.
Bostrom is aware of these seemingly insuperable challenges, but he hopes that by combining some improvements in all three categories a patchwork solution might be achievable. He also imagines that AI could alleviate the massive privacy concerns: as long as the information remains with the AI, our privacy isn’t violated.
He envisions a “high-tech panopticon” where an AI monitors everything someone does, but the data is only reported to the authorities if the person being monitored does something suspicious. Otherwise the data is encrypted and inaccessible. This still sounds awful, but Bostrom appears to see no alternative means of protecting ourselves:
Comprehensive surveillance and global governance would thus offer protection against a wide spectrum of civilizational vulnerabilities. This is a considerable reason in favor of bringing about those conditions. The strength of this reason is roughly proportional to the probability that the vulnerable world hypothesis is true.
His one concession is that all of this would only be required if the VWH is true. But how is that supposed to work exactly? By the point at which we could say for certain that it is true, it would already be too late. Before then, how would we assign a probability? And if we did, what would that probability mean? If we felt that the VWH had a 20% chance of being true, would we implement 20% of Bostrom’s recommendations?
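To see why not, consider a toy expected-value sketch. (The framing and the numbers here are mine, invented purely for illustration; nothing of the sort appears in Bostrom’s paper.) The probability scales the strength of the reason, but the rational response is closer to all-or-nothing: implement the policy only once the expected harm it averts exceeds the policy’s cost.

```python
# Toy expected-value framing of the "roughly proportional" claim.
# All numbers are invented for illustration; none come from Bostrom's paper.

def should_implement(p_vwh: float, catastrophe_cost: float, policy_cost: float) -> bool:
    """Implement the policy iff the expected loss it averts exceeds its cost."""
    return p_vwh * catastrophe_cost > policy_cost

CATASTROPHE = 1000.0  # hypothetical loss if the VWH is true and we do nothing
PANOPTICON = 100.0    # hypothetical standing cost of comprehensive surveillance

for p in (0.05, 0.10, 0.20, 0.50):
    print(f"P(VWH) = {p:.2f} -> implement: {should_implement(p, CATASTROPHE, PANOPTICON)}")
# The answer flips from False to True once p exceeds 0.10: the *reason*
# scales with p, but the *response* is all-or-nothing, not 20% of a panopticon.
```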
Despite their severe nature, it might be possible to implement some of Bostrom’s recommendations, but our record with COVID leads me to believe that any such implementation would be haphazard, illogical, and ultimately counterproductive.
IV- A Specific Example: The Vulnerable World Hypothesis and AI
Our dive into the VWH would benefit from a specific example. Fortuitously, we have one close at hand. Though the actual hypothesis is only occasionally mentioned by name, the recent discussion of AI risk is a prime example of worries about drawing a pure black ball. And here people are worried not just about catastrophe, but about actual extinction.
Perhaps you followed the recent discussion between Tyler Cowen and Scott Alexander? The former is optimistic, the latter less so.
Cowen argues that we have been drawing balls for a very long time without drawing a black ball, and that in any event there’s no reasonable method we could employ to stop drawing balls. Accordingly, it’s useless to get worked up about it.
Alexander argues that the fact that we haven’t drawn a black ball tells us nothing about whether there might still be one in the urn, and further, that since we have managed to split the urn up somewhat by field, maybe we should temporarily stop drawing from the AI urn — given that it appears to have a particularly high chance of containing a black ball. We can continue drawing from the biotechnology urn.
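One way to make this disagreement precise is a small Bayesian sketch. (The framing and numbers are mine, not either author’s.) Whether a long streak of safe draws is reassuring depends almost entirely on how likely a black ball would have been to turn up by now, supposing one exists:

```python
# A Bayesian sketch of the Cowen/Alexander disagreement. The framing and
# numbers are mine, chosen for illustration; neither author uses this model.
# `prior`: credence that the urn contains a black ball at all.
# `q`: chance that any single draw produces it, given that it exists.

def posterior_black_ball(prior: float, q: float, n_safe_draws: int) -> float:
    """P(black ball exists | n draws, none black), by Bayes' rule."""
    p_streak_if_exists = (1 - q) ** n_safe_draws
    return prior * p_streak_if_exists / (prior * p_streak_if_exists + (1 - prior))

# After 1,000 safe draws, starting from a 50/50 prior:
for q in (0.01, 0.0001):
    print(f"q = {q}: posterior = {posterior_black_ball(0.5, q, 1000):.3f}")
# q = 0.01:   posterior ~ 0.000  (Cowen: the long safe streak is informative)
# q = 0.0001: posterior ~ 0.475  (Alexander: the streak tells us almost nothing)
```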
As to the specifics of AI risk, and how the creation of ChatGPT should serve to update our assessment of that risk, people much smarter than I have already spilled rivers of ink. My contribution would be to point to the VWH as a useful way of framing the debate. Particularly if we expand the analogy somewhat beyond what we’ve already discussed.
Bostrom offers up some ways in which it could be expanded in the original paper:
We could stipulate, for example, that the balls have different textures and that there is a correlation between texture and color, so that we get clues about the color of a ball before we extract it. Another way to make the metaphor more realistic is to imagine that there are strings or elastic bands between some of the balls, so that when we pull on one of them we drag along several others to which it is linked. Presumably the urn is highly tubular, since certain technologies must emerge before others can be reached (we are not likely to find a society that uses jet planes and flint axes). The metaphor would also become more realistic if we imagine that there is not just one hand daintily exploring the urn: instead, picture a throng of scuffling prospectors reaching in their arms in hopes of gold and glory, and citations.
Most of these additional elements are good for Cowen’s side of the debate:
1- We might stop drawing before creating truly dangerous technology, because the texture of the ball will give it away.
2- There are too many people drawing from the urn for there to be any chance of stopping them.
3- Previous AI developments are inextricably connected to future AI developments, and having drawn the former we have no choice but to draw the latter.
This last point is particularly interesting. Clearly there should be some accounting within the analogy for the balls we have already drawn. We have already created an environment with significant technological risk and significant technological benefits. We expect the benefits we already have to remain and we expect that additional benefits will continue to accrue. How should that factor into our assessment of future risk?
V- If There’s a Pure Black Ball, Is It Also Possible There’s a Pure White Ball?
While Bostrom offers some useful expansions to the analogy, I believe it can be stretched even farther. To start with, if we can imagine a pure black ball which permanently destroys us, should we also imagine a pure white ball which permanently saves us?
As long as we’re talking about AI, a pure white ball could take the form of a perfectly-aligned superintelligence. Or, depending on what you mean by “salvation” and “us”, maybe widespread extrasolar colonization would do the trick. Beyond that, the transhumanist dream of mind uploading would fit the bill. As long as there are redundant backups and civilization doesn’t collapse, we (or a perfect copy thereof) can perhaps look forward to an immortal life of perfect bliss.
Cowen’s argument appears to be aiming at this idea, and he is not alone. Some advocates for future optimism, like Steven Pinker, go even farther. If we adapt their arguments to the analogy of the urn, they both appear to be arguing for the existence of pure white balls. Pinker goes so far as to argue that we have already drawn one: humanity’s ability to learn and progress. He might even say that it is our very ability to draw balls from the urn that has granted us eventual salvation.
Technology has already allowed us to banish disease, feed billions, and eliminate numerous causes of mortality. Pinker argues that, going forward, human innovation will allow us to eventually overcome all obstacles and defeat all dangers. Sure, the ride might be bumpy, but as long as we have the ability to continually repair and upgrade our car, we will get to our destination eventually. If this is the case then there is no reason to worry, and we should continue to draw balls from the urn. And even if we are not already permanently saved, if you believe that a pure white ball is in the urn somewhere you would probably be in favor of continuing to draw from it — especially if you decide the existence of a pure white ball is more likely than the existence of a pure black one.
Given the choice between Pinker’s recommendation to proceed as we have been and Bostrom’s proposal of massive surveillance as protection against an unknown threat, very few people will side with Bostrom. But we have by no means proven that the VWH is incorrect.
VI- How Do We Decide Between the Optimists and the Pessimists?
We’ve arrived at a position where continuing to draw balls from the urn is nice but potentially naive, while the effort required to hedge against that naivete borders on impossible. Here and there we’ve flirted with the idea of ceasing to draw balls altogether, but doing so may be the only thing more impossible than implementing Bostrom’s panopticon. Also, while ceasing technological advancement probably wouldn’t result in our extinction, it would be a massive and sustained catastrophe. So we can safely eliminate it as an option on both of those grounds. However, even with this set aside, the question of our naivete remains. How shall we decide between Bostrom and Pinker? (For the moment let us assume that we could both make and enforce that decision, which we obviously can’t.)
At first glance the question doesn’t seem very straightforward. What methodology can we use to decide between these positions, between the believers in pure white balls and the believers in pure black balls? Is there any way, other than intuition, to know whether drawing from the urn will eventually save us or destroy us?
As an argument for the “black ballers”: it is far easier to imagine something that destroys us permanently and for all time than it is to imagine something that saves us permanently and for all time. We have already drawn numerous very dark balls, even if they didn’t end up being pure black. Similarly, though we have drawn a great number of white balls, nearly all of them have an imperfection or smudge somewhere. It’s difficult to think of an unambiguously good technology. Antibiotics lead to antibiotic resistance. Most cancer treatments have horrible side effects. And while certain forms of gene therapy seem unambiguously good (there’s hope we can cure things like Tay-Sachs disease), the same technological advances can be used to create artificial plagues.
This would appear to be a strike against the white ballers. Are there any others?
Another thing Bostrom didn’t cover in his initial piece is the fact that the shade of the balls can change over time. Or perhaps it just takes a while for us to see it clearly. Certainly when we conduct research we’re only making educated guesses about the shade of the ball that might result, but even after a discovery has been made the shade might be unclear. For example, when Roentgen stumbled on X-rays, that ball may have looked a little grayish, but once their medical application became apparent the color of the “X-ray ball” ended up being unequivocally white.
One avenue of exploration for helping us decide whether to favor the optimists or the pessimists would be to examine how the shades of balls change over time. Do balls which start out dark generally lighten? Or does the reverse happen? My sense is that it’s generally the latter. The benefits of innovation are generally front and center; they’re what drove the innovation in the first place. The harms, on the other hand, often take quite a while to manifest. Examples of this can be seen nearly everywhere we look. Going back to the beginning of the industrial revolution, coal provides an excellent example (the gradual darkening of the ball is entirely appropriate here). When people first started burning coal, the ball must have seemed pretty white, but now there are numerous people who think it’s going to destroy the planet (and very few who think it doesn’t possess significant downsides).
Social media is not as black as coal (pun intended), but it’s becoming grayer and grayer with each passing year. It’s hard to imagine that it will eventually turn into the pure black ball of our destruction, but it’s also far from the only ball getting darker with each passing year. This further illustrates the difficulties inherent in not drawing balls at all, or in focusing all of our attention on future balls. Just as some people argue that we may have already drawn a pure white ball, it seems equally reasonable to argue that we have already drawn the pure black ball; we just don’t know it yet. If it takes decades, or in the case of coal centuries, to know a ball’s true shade, the task of preventing doom becomes far more difficult.
Thus far all of our focus has been on individual balls, but if we draw enough dark gray balls (or if enough balls gradually darken to that shade), would that be the equivalent of drawing a pure black ball? On the other side of things, if we draw enough white balls, would that be the equivalent of drawing a pure white ball? Once again we confront the fact that it’s far easier to imagine our permanent destruction than it is to imagine our permanent salvation. Also, harmful technology seems to compound in a way that beneficial technology doesn’t. Does anyone imagine that combining AI with social media won’t make both far worse?
There is one final speculation which might shed light on whether the urn of technology contains any pure black balls — Fermi’s paradox. It is suggestive that the galaxy isn’t crawling with high-tech aliens. Certainly we have to entertain the idea that this is because they all inevitably draw a pure black ball.
VII- What Options Do We Actually Have?
If the foregoing has convinced you that the dangers are real, but you’re also convinced that Bostrom’s solution is unworkable, what other solutions might there be? Perhaps we could adopt the attitude of Cowen and Pinker, but pursue it more aggressively: not merely assuming the existence of a pure white ball, but breaking open all of the urns in search of it?
Should we decide to break open an urn in search of a pure white ball, the most obvious danger is that we will find a pure black ball instead. To give a concrete example, a well-aligned and helpful superintelligence is a pure white ball, but it may not be possible to get to it without significant risk of drawing the pure black ball of a misaligned superintelligence.
In place of AI, perhaps we should direct all of our efforts towards extraterrestrial, and ideally extrasolar, colonization. However, the difficulties of such colonization are enormous. Nor is there any guarantee that being spread across multiple worlds will protect us from all potential catastrophes: should we ever discover something as simple and destructive as “easy nukes”, it will not matter how many colonies there are if they can all be easily destroyed. Still, while not an ironclad defense against every imaginable risk, extrasolar colonization would be a hedge against the vast majority of them. We would still have Fermi’s paradox to contend with, but perhaps the answer to the paradox is something else, and we could be the first civilization to spread across the galaxy.
As long as we’re aggressively in search of something that will save us, it might be worthwhile to re-examine the balls we have already drawn. Perhaps by combining several off-white balls we might be able to cobble together the equivalent of a pure white ball. This is particularly interesting to think about in light of the last topic. Manned space exploration is definitely a white ball, smudged by its enormous cost, so it sits on the shelf where we can admire it. Obviously we plan on taking it off the shelf someday, but we’ve been planning to do that for decades. Nuclear rockets are in a similar category: significantly more efficient than strictly chemical rockets, but tarnished by the fact that they’re nuclear. Then there are one-way missions, a “technology” in only the very broadest sense, but it’s far easier to get people to Mars if you don’t have to get them back. The downside, of course, is that the people we send will certainly die.
What all of these things have in common is risk. We’re more willing to face the possibility of drawing a pure black ball than we are to face the risks of using the balls we have already drawn. But there are no paths forward which are free of risk: continuing to draw balls from the urn while crossing our fingers and hoping for the best certainly isn’t, and neither is implementing a global dystopia. There are dangers everywhere we look, but we do have some choice of which we want to confront.
The story of the future will be the story of the balls that have been drawn and will be drawn.
Let’s try to draw them wisely.
As I said, this was originally intended to appear in a new magazine, but the editor passed on it. If you happen to think that was a bad call, consider illustrating it by subscribing or, if you’re particularly passionate, donating.
Here's the thing: why is it the case that an electric current through metal between two sheets of glass doesn't produce a nuclear-level explosion? Well, glass occurs naturally; you just have to melt sand. Metals are sometimes scattered around naturally too. If that were all it took, the earth would have had random nuclear explosions going off for all of its existence.
It feels as if most of the easy stuff, the low-hanging fruit, has been picked, and there are no black balls there. Some gray, many off-white, but that's about it. An individual cannot make a nuclear bomb in his basement, and all the easy combinations of stuff you might try seem to have their immediate consequences mapped out by the known rules of physics. In other words, people ordering insane drinks with all types of wacky options from Starbucks baristas may be annoying and hasten the unionization efforts of the workforce, but they won't set off a megaton explosion.
This implies to me that if there is a black ball, it's stuck to a long line of other balls that would give us an idea of what we were pulling at if we kept it up. This sort of happened with nuclear weapons. The USSR set off the Tsar Bomba, designed for 100 megatons (it tested at around 50), but sort of like making the world's biggest ball of yarn, once you do it you lose interest fast. The bomb was massive but also massively impractical, and even if you could shrink it enough to deliver by plane or missile, it really didn't give you anything that three or so 10-megaton bombs properly spaced apart couldn't give you.
I'm leaning towards Team Cowen on this one. A purely unexpected black ball is a bit like asking us to contemplate a 50 kg rock 50 light years away, heading right for Earth at 99.99% of the speed of light, and so black that no telescope could pick it up. (If you want to imagine a solar-system-wide civilization, scale that up to something like a modest black hole on a straight line to the sun.) Yes, it could ruin all of our days, but since there's nothing anyone could do about it, what's the point of worrying?