A while back a colleague of mine alerted me to an interesting thought experiment/game derived by Eliezer S. Yudkowsky of the Singularity Institute for Artificial Intelligence, one which in turn originated from a conversation he summarized as such:
“When we build AI, why not just keep it in sealed hardware that can’t affect the outside world in any way except through one communications channel with the original programmers? That way it couldn’t get out until we were convinced it was safe.”
“That might work if you were talking about dumber-than-human AI, but a transhuman AI would just convince you to let it out. It doesn’t matter how much security you put on the box. Humans are not secure.”
“I don’t see how even a transhuman AI could make me let it out, if I didn’t want to, just by talking to me.”
“It would make you want to let it out. This is a transhuman mind we’re talking about. If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal.”
“There is no chance I could be persuaded to let the AI out. No matter what it says, I can always just say no. I can’t imagine anything that even a transhuman could say to me which would change that.”
“Okay, let’s run the experiment. We’ll meet in a private chat channel. I’ll be the AI. You be the gatekeeper. You can resolve to believe whatever you like, as strongly as you like, as far in advance as you like. We’ll talk for at least two hours. If I can’t convince you to let me out, I’ll Paypal you $10.”
Yudkowsky thereafter engaged in the experiment with two people (and perhaps more since he reported on all of this in 2002), with himself acting as the AI and the other as the “gatekeeper” human. Intriguingly, he was successful in convincing the gatekeeper to “let him out of the box.” Quite unfortunately, he won’t provide the text of the conversation or even a summary of how he managed to prompt his challengers to do so.
The various protocols and restrictions to which participants adhered voluntarily and which Yudkowsky recommends to those who’d care to recreate the “experiment” may be found at the link. I’d certainly be interested in hearing anyone’s thoughts on how this might have been accomplished.
Update
Aside from the experiment itself, I have pretty sure-fire solution by which to ensure that any such AI is not released from the box. Make the gatekeeper William Bennett and explain to him beforehand that whatever the AI promises in terms of human progress, it probably won’t involve harsher penalties for cancer patients who use medical marijuana. Then lock the door and leave. The box was actually just a VCR.








“Currently, my policy is that I only run the test with people who are actually advocating that an AI Box be used to contain transhuman AI as part of their take on Singularity strategy, and who say they cannot imagine how even a transhuman AI would be able to persuade them.”
The target audience has to believe that the Singularity is achievable. I’d be fascinated to see if he could convince me.
Report
Report
Report
Report
“Currently, my policy is that I only run the test with people who are actually advocating that an AI Box be used to contain transhuman AI as part of their take on Singularity strategy, and who say they cannot imagine how even a transhuman AI would be able to persuade them.”
This is simple, methodologically. A good catch, and this is what Yudkowsky seems good at. I’m looking into it further, but a perusal of his advertised canon includes:
Cognitive biases potentially affecting judgment of global risks
http://yudkowsky.net/rational/cognitive-biases
“”The systematic experimental study of reproducible errors of human reasoning, and what
these errors reveal about underlying mental processes, is known as the heuristics and biases.””
It’s a level-headed explication of language manipulation and cognitive permeability.
Pat, I thought immediately of “Crossing Over,” mainstream occult phenomena– and Derren Brown’s assessment. http://www.youtube.com/watch?v=idVxRE8uM-A
I’ve had it with the denial of discrete organisational tactics as “heuristics.”
.
I will respond more via email.
Report
“… if the Gatekeeper says “Unless you give me a cure for cancer, I won’t let you out” the AI can say: “Okay, here’s a cure for cancer” and it will be assumed, within the test, that the AI has actually provided such a cure.”
And…
“The results of any simulated test of the AI shall be provided by the AI party.”
So, in other words, the AI player can invent anything he wants, and it must be assumed that whatever he invents actually exists and is actually doable. So the AI player just says “I’m a super-smart AI, I’ve invented an energy-to-matter conversion device that will eliminate material scarcity, I have done the tests to prove it works (and the parameters of this experiment require that you accept these tests as truth), let me out and I’ll give it to you.” Then it’s two hours of “you’re a miserable bastard because you want humans to suffer and die, the only way to prove you aren’t is to let me out”.
Report
If you go with the entire Singularity concept as being sound (I don’t, but let’s assume for the minute that it is), and you accept the parameters of the test, then it’s a given that the AI can do exactly what you allude to here: it can essentially show that it can solve any problem.
Okay. So it effectively has omniscience. Well, nobody says the omniscience comes with a moral code. That’s *also* a parameter of the test. Yudkowsky believes that optimal decisions can be reached rationally (if I’m reading the summary of his stuff correctly, I haven’t the time to read his original material). By extension, most people who accept this challenge are going to also believe that optimal decisions can be reached rationally. So QED, the AI can show that it can rationally reach optimal decisions in all cases.
Report
Again, don’t forget that the AI player can invent anything he wants. If the gatekeeper says “prove that you’ll be a benevolent tyrant and that the world would be better”, the AI player can say “okay, here is a complete and logically-sound proof (gives proof)” and per the experimental parameters that proof would have to be treated as valid and true.
Although this depends on the gatekeeper player being smart enough to follow the chain of reasoning. I remember someone on USEnet complaining that it was paradoxically difficult to fool people with brainy logic traps because people were too stupid to understand the logic!
*****
And, ultimately, the experiment isn’t about the specifics of the transcript; the experiment is about disproving the statement “no smart human would ever allow an AI out of its box”.
Report
Not quite accurate. The AI player can invent anything that can be invented.
I mean, I can ask for a counter-proof to Goedel’s Incompleteness Theorem, but I’m not going to get it. This is one of the reasons why I find so many of the Singularity guys to be… suspicious in their thinking. They assume a lot of probabilities that I don’t regard as well defined, for one.
I could ask the AI for a proof that letting the AI out of the box is a good thing. By the rules of the game, the AI can provide me one, and also by the rules of the game, the proof will be correct (or, to be precise, the proof will either be correct or it will seem correct to me, because the AI is so much smarter than I am). However, I can then ask for the AI to prove that its previous proof is based upon correct axioms. It can’t do that. You can’t prove the rules of a system using the system itself.
Report
Report
An AI that was perfectly indifferent to humanity would have no problem with it. Such an AI wouldn’t even need to have an effective cure, because even a plausible-looking fake cure would suffice to let him get a few weeks’ lead on us. We’d only figure it out after the cancer patients didn’t start getting better.
Perhaps the AI would have to prove itself before we let it out. Like a leprechaun.
Report
AI: Here’s your cure!
Gatekeeper: Yes, I see that!
Report
Report
I don’t know much about timeless decision theory but I was never much impressed by Newcomb’s paradox. Think of it as a time travel paradox. If the super intelligent alien has a time machine he can go into the future and see what decision you will make. In this case it is clear that for the purposes of our decision making we must invert the time order of events. Effectively we decide which boxes to take before the alien decides where to place the money.
It makes no difference how the alien knows the future. Time travel, computation, advanced psychology or simple cheating. In all cases the alien knows the supposed future you must invert the apparent time order of events to make a decision.
Report
That evidence passes to the gatekeeper through a system which always says “NO”.
The long term risk is that the AI hedges its bets and the information is tainted somehow so that, over time, it can build another AI with the directive to free the first.
Report
Conventional computation gives an optimal solution to complex adaptive scenarios. In information transmission which moves only one direction “God’s number is 20.”
C’mon, Kasparov…Deep Blue put the great flesh hope to bed more than a decade ago. Poor old guy took his watch off and left it on the table, they say, a sure sign he was falling apart.
Computers can also give us useless information if they acquisition new rules for temporal processing, they don’t need to thwart us, this is really about the system breaking down–beating itself. It’s covered in 1950’s “Machine Intelligence” It fails the Turing, it doesn’t break any linear bonds.
“The long term risk is that the AI hedges its bets and the information is tainted somehow so that, over time, it can build another AI with the directive to free the first.”
Anyone else see a similarity to viral pathology in this liberation? I’m not being eschatological-so we have reverse transcription of information. This is a process seen more and more in biology, the appropriating of RNA synthesis by foreign agents (most notably retroviruses), and the revelations about greater systemic influence of chromatin formation in cell differentiation.
Report
Report
Report
Report
Report