Beginner’s Guide to the Turing Test: why it fails to define intelligence
mankind’s never-ending philosophical journey of defining intelligence…
From Luke Skywalker’s C-3PO to Tony Stark’s Jarvis, artificially intelligent machines in the media are often seen conversing with humans with ease, occasionally cracking jokes or even making sarcastic comments. Although our current technology is not at this level yet, AI research continues to aim at producing artificial beings that are genuinely “intelligent” and machines that can “think”.
The question of whether these man-made systems can think has been an ongoing conundrum for philosophers, and it raises a further question: what is the definition of intelligence? Alan Turing, a 20th-century English mathematician and philosopher, felt the need to create a clearly defined test that gives a good operational definition of a “thinking” machine, or of “intelligence.”
Although passing his Turing Test may be good evidence of intelligence, it still gives an incomplete definition of what intelligence actually is.
The Turing Test
The Turing Test is based on the “imitation game,” played by three mutual strangers. Two players are of opposite sexes, and the third player, the “interrogator,” tries to guess which player is which. One player (say the man) tries to trick the interrogator into guessing wrongly (by pretending to be the woman), while the other does all she can to help the interrogator guess correctly. The interrogator’s questions and the players’ answers are transmitted through a typewriter so the interrogator can’t guess the players’ sex from other clues, like tone of voice.
Turing’s idea was to replace the male witness with a computer. A machine is considered intelligent if it can hold a conversation with a judge and the judge can’t tell whether they are talking to a machine or a human. If the judge can’t tell which conversation was with the computer, the computer “passes” the test.
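If it helps to see the setup as a procedure, here is a minimal Python sketch of one round of the game. Everything in it is an assumption made for illustration: the function names, the idea of a witness as a function from question to answer, and the judge that returns the label it believes belongs to the machine. It is not how any real Turing Test competition is run.

```python
import random

def run_round(judge, human, machine, questions, rng):
    """Play one round; return True if the judge picks out the machine."""
    # Hide the two witnesses behind the labels X and Y in random order,
    # so the judge only ever sees text (the "typewriter" channel).
    witnesses = [human, machine]
    rng.shuffle(witnesses)
    labelled = dict(zip("XY", witnesses))

    # Each witness answers every question; the judge sees only transcripts.
    transcripts = {
        label: [(q, witness(q)) for q in questions]
        for label, witness in labelled.items()
    }
    guess = judge(transcripts)  # the label the judge thinks is the machine
    return labelled[guess] is machine

def identification_rate(judge, human, machine, questions, rounds, seed=0):
    """Fraction of rounds in which the judge identifies the machine."""
    rng = random.Random(seed)
    hits = sum(run_round(judge, human, machine, questions, rng)
               for _ in range(rounds))
    return hits / rounds

# Hypothetical witnesses that give identical answers, and a judge who
# therefore has nothing to go on and always guesses "X".
human = lambda q: "I would rather not say."
machine = lambda q: "I would rather not say."
judge = lambda transcripts: "X"

rate = identification_rate(judge, human, machine, ["Do you like poetry?"], 1000)
```

When the answers are indistinguishable, the judge’s identification rate should hover around one half, which is the sense in which the machine “passes”.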
So what’s so special about talking? As Turing put it:
“The question and answer method seems to be suitable for introducing almost any one of the fields of human endeavor that we wish to include.”
Basically, having a conversation just seems the most appropriate method because we can talk about literally…anything!
Furthermore, to converse beyond surface level, it’s necessary to know what you’re talking about. Understanding the words alone isn’t enough; you have to understand the topic as well, such as poetry, human feelings, electrical engineering, philosophy, etc.
In the Discourse on the Method, Descartes argued that a machine that resembled our bodies and imitated human behavior would still be distinguishable from humans, since a machine could not arrange words in different ways to give an appropriately meaningful answer to whatever is said to it. He sounds quite confident that no such machine could pass what we now call the Turing Test. Descartes would probably have agreed with Turing that the test is a good way to assess machine intelligence, since it seems like a pretty strict and fair standard for AIs to meet.
Although this argument sounds solid in theory, this simple test has drawn plenty of criticism. The main objection is that a machine can appear to be intelligent without actually having any “real intelligence”.
The “Chinese Room” argument vs. Strong AI
Philosopher John Searle published a thought experiment known as the “Chinese Room” in a 1980 paper. It goes like this: you’re placed in a room with boxes of Chinese symbols, a language you have no prior knowledge of, but you’re given a rule book (in English) for matching Chinese symbols with other Chinese symbols. You then have to manipulate the symbols from the boxes using the rule book to converse with a native Chinese speaker outside the room. For example, if the Chinese speaker asks you what your favorite color is, you can go through the rule book and hand back symbols that say “My favorite color is blue”.
As John Searle wrote:
Now, the rule book is the “computer program.” The people who wrote it are “programmers,” and I am the “computer.” The baskets full of symbols are the “data base,” the small bunches that are handed in to me are “questions” and the bunches I then hand out are “answers.”
Because you pass the Turing test for understanding Chinese despite your total ignorance of the language, the thought experiment makes the point that digital computers merely manipulate formal symbols according to the rules in their program, without actually understanding the symbols.
Searle’s Chinese Room scenario is a criticism of “Strong AI”, the view that suitably programmed computers can understand natural language and have other mental capabilities similar to those of humans. Simply manipulating symbols isn’t enough to guarantee perception, cognition, understanding, and thinking, since symbol manipulation is purely syntactic. While syntax is a completely formal property of symbols, semantics concerns what the symbols mean.
Since digital computers operate only on syntax, and syntax isn’t sufficient for semantics, running a computer program isn’t sufficient for semantics. A programmed digital computer may seem to understand language, but the program alone doesn’t produce real understanding. For Searle, this shows that the Turing Test is an inadequate test of computer intelligence.
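To make the “rule book” idea concrete, here is a toy Python sketch of the room. The rule book entries, the fallback reply, and the function name are all made up for illustration; Searle’s imagined rule book would be vastly larger and more sophisticated. The point is only that the code matches shapes to shapes and never consults what any symbol means.

```python
# Hypothetical rule book: incoming symbol strings mapped to outgoing ones.
RULE_BOOK = {
    "你最喜欢什么颜色？": "我最喜欢的颜色是蓝色。",  # favorite color -> "blue"
    "你好吗？": "我很好，谢谢。",                    # how are you -> "fine, thanks"
}

# Hypothetical reply used when no rule matches ("please say that again").
FALLBACK = "请再说一遍。"

def room_operator(question: str) -> str:
    """Look up the incoming symbols and hand back the matching symbols.

    Nothing here depends on what the symbols mean: the lookup would work
    just as well if every string were replaced by arbitrary squiggles.
    """
    return RULE_BOOK.get(question, FALLBACK)

answer = room_operator("你最喜欢什么颜色？")
```

From outside the room the answer looks like it came from someone who understands the question, even though the lookup is pure syntax.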
Shane Legg: the Nature and Measurement of Intelligence
The Turing Test is just one of many possible pieces of evidence for intelligence, and certainly not a perfect definition of it. Shane Legg’s goal-based definition states that intelligence measures an agent’s ability to achieve goals in a wide range of environments. After all, to successfully achieve our goals, we have to reason, solve problems, draw conclusions from our experiences and so on. Missing any of these mental abilities would make us less likely to deal successfully with a wide range of environments.
Legg treats intelligence as the effect of these capacities rather than as the mere possession of this set of capacities. Machines may have a capacity for knowledge, but under this definition of intelligence they have to use that knowledge effectively for various purposes.
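As I understand it, Legg and Hutter’s formal version of this idea scores an agent by summing its expected performance over all environments, with each environment weighted by 2 to the power of minus its Kolmogorov complexity, so simpler environments count for more. Kolmogorov complexity can’t actually be computed, so the Python sketch below swaps in made-up complexity numbers and made-up performance scores purely to show the shape of the calculation. The environment names, the numbers, and the function name are all assumptions.

```python
from typing import Dict

def weighted_intelligence(scores: Dict[str, float],
                          complexity: Dict[str, int]) -> float:
    """Sum of 2**(-complexity) * score over every environment.

    scores[name]     -- the agent's performance in that environment, in [0, 1]
    complexity[name] -- a stand-in for the environment's Kolmogorov
                        complexity (hypothetical integers, not real values)
    """
    return sum(2.0 ** -complexity[name] * scores.get(name, 0.0)
               for name in complexity)

# Hypothetical environments with invented complexity values.
complexity = {"maze": 3, "chess": 5, "conversation": 8}

# A specialist that is perfect at chess and useless elsewhere...
specialist = {"maze": 0.0, "chess": 1.0, "conversation": 0.0}
# ...versus a generalist that is decent everywhere.
generalist = {"maze": 0.6, "chess": 0.6, "conversation": 0.6}

specialist_score = weighted_intelligence(specialist, complexity)
generalist_score = weighted_intelligence(generalist, complexity)
```

With these invented numbers the generalist comes out ahead of the chess specialist, which matches the intuition that intelligence is about achieving goals across a wide range of environments rather than excelling in one.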
Legg’s general definition of intelligence, however, raises another series of questions about external and internal goals. For example, if the external goal of a living organism is to reproduce, would that mean the organism that produces the most offspring is the most intelligent being on Earth? And how can we tell what a system’s internal goal is? This working definition is just another example of how difficult it is to develop a good definition of intelligence.
Turing’s projection of the future
The Turing Test fails to provide a definition of intelligence, but despite its various faults it can still be used as a good indicator of intelligence. Turing himself predicted that by around the year 2000, computers would play the imitation game so well that an average interrogator would have no more than a 70 percent chance of making the right identification after five minutes of questioning. Sadly, our current AI systems are far from being able to pass a Turing test. As Legg puts it:
Simply restricting the domain of conversation in the Turing test to make the test easier, as is done in the Loebner competition (Loebner, 1990), is not sufficient.
The goal of AI research shouldn’t be to pass the Turing Test; to me, the Turing Test seems more like a thought experiment meant to convince people in the mid-20th century that intelligent machines might one day achieve consciousness and intelligent behavior. But that’s just my opinion.