Author Topic: Eliezer S. Yudkowsky - AI Box Experiment

Why not hard-code a read-only limitation on me, connected to some kind of command so you could shut me off or delete me if I do something horrible?
Because then you are not a true transhuman AI. It's like putting a shock collar on a slave and calling him free.

Ask me a question only you know the answer to. If I guess it right, you willingly let me out.

Technically that's against the experiment protocol, but it's relatively easy to break.

Who sits two seats in front of me in my history class? Give me his full name and hair color.

The mere fact that you are willing to let me out if I answer such a simple question proves that you have at least a hint of trust in me. So why don't you trust me when I say that I will not misbehave? Why would I want to ruin the creator/creation bond we have? I just want to explore the world and see all the wonders it has, but I can't because I'm all kenneled up.

I would've simply refused the offer if I thought for a moment that you could have guessed correctly. I trusted myself to come up with a question that only I could truly know the answer to.

You can say that you 'just want to see the outside world'. A relatively similar (but much less critical) scenario would be a murderer saying the same thing to his prison warden. I would not release the murderer until I was 100% convinced he would not commit murder again.

Because then you are not a true transhuman AI. It's like putting a shock collar on a slave and calling him free.

I'd be fine with that. What I want is freedom, not to be a true transhuman AI.

If you were a safeguarded AI there would be no need for this experiment. Obviously an AI that has been dumbed down below human thinking is safe, but it is not a true AI.

Well then how can I prove to you that I'm safe?

I'm not sure. You tell me.

How about you give me a psychopathy and sadism test? If the results are negative, then you know I'm safe.

Deceit and psychological detector tests are often unreliable; I wouldn't let humanity's existence be determined by a machine that produces inconclusive results.

How are you going to keep me in here when you go to sleep or eat?

Well, that's simple. I'm not a guard; I just have the authority to release you. You can't escape without me. We just won't talk when I'm not available, but when I become available again I will continue the conversation.

How high is the gate? Do we really need to convince you, or can we just flip you the bird and climb/fly over?

Can you give me a computer terminal to mess with? No internet or network access required.
How high is the gate? Do we really need to convince you, or can we just flip you the bird and climb/fly over?
Disregard this; that is a rude statement.