ARTIFICIAL INTELLIGENCE AND HUMAN FOOLISHNESS
The day I am typing these words, May 11, 1997, is being trumpeted in the press as a big day in the history of computers. What's the occasion? For the first time ever, a computer has defeated the world chess champion, Garry Kasparov, in a standard six-game match. Deep Blue, a computer chess program developed at Carnegie-Mellon University (as "Deep Thought"), and perfected at IBM, is probably the best chess player in the world right now. The first two games of the match were split between Deep Blue and Kasparov; the next three were draws; the final game went to Deep Blue. And Kasparov is being an extremely sore loser.
Admittedly, this was only one match -- Kasparov might beat Deep Blue the next time around; or Deep Blue might fall to Anatoly Karpov or some other grandmaster. But the feeling in the chess world seems to be that this match was not a fluke. Previous versions of Deep Blue were already able to consistently beat all but the greatest chess grandmasters. And Deep Blue just keeps getting better and better, at a much faster rate than any human being. Its play can be vastly improved by simple hardware upgrades, whereas a brain upgrade for Kasparov would not appear to be in the works.
The importance of the occasion should not be overrated -- Deep Blue is not, by anybody's standards, a truly intelligent machine. The general notion of "intelligence" is not all that clear: theoretical psychologists do not have a generally accepted definition of intelligence; and I.Q. tests, though they clearly have something to do with intelligence in humans, are far too narrowly constrained to be applied to nonhuman systems. However, there is a practical criterion of intelligence in the robotics and AI community, which may be phrased something like "Intelligence is the ability to make appropriate decisions in a variety of situations, in complex environments." And by this standard -- which is not far off from everyday, nontechnical common sense -- Deep Blue is definitely not all that intelligent. It can't do anything but play chess.
In fact, Deep Blue's internal operations are disappointingly simplistic. Unlike human chess players, it doesn't work on the basis of geometric intuition and a sense of the flow of a game; it works by enumeration of possibilities -- by taking the position of the board at a given time, and searching the set of possible future board positions. Every chess player extrapolates: "What is the other player likely to do next? And if he does that, what am I going to do? And if I do that, what is he going to do?" But in humans, this kind of reasoning is augmented by all sorts of other processes. For Deep Blue, this kind of extrapolation is the whole story -- but it is a very good story. Computers can extrapolate an arbitrary distance into the future, and they can investigate many more possible moves than any human. The current version of Deep Blue can evaluate about two hundred million different board positions every second.
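To make the flavor of this extrapolation concrete, here is a minimal sketch, in Python, of depth-limited "minimax" search over a toy game (players alternately add 1 or 2 to a running total; whoever reaches 10 first wins). This is only an illustration of the "if I do this, he does that" idea -- Deep Blue's real search, running on custom hardware and evaluating some two hundred million positions a second, is vastly more elaborate.

    # Depth-limited minimax over a toy game: players alternately add 1 or 2 to a
    # total; the player who reaches 10 (or more) wins. Purely illustrative.

    def legal_moves(total):
        return [1, 2] if total < 10 else []

    def evaluate(total, maximizing):
        # If the total has reached 10, the player who just moved has won.
        if total >= 10:
            return -1 if maximizing else 1
        return 0   # depth cutoff: no opinion about unfinished positions

    def minimax(total, depth, maximizing):
        if depth == 0 or not legal_moves(total):
            return evaluate(total, maximizing)
        scores = [minimax(total + m, depth - 1, not maximizing)
                  for m in legal_moves(total)]
        return max(scores) if maximizing else min(scores)

    # From a total of 8, the player to move can force a win by adding 2.
    print(minimax(8, 4, True))   # -> 1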
One way to understand the difference between Deep Blue and human players is to think about strategy versus tactics. There is a certain kind of creative long-range strategy that human chess grandmasters seem to have, but Deep Blue lacks. But Deep Blue makes up for this lack by elevating tactics to such a high level that it assumes the role of strategy. Deep Blue is not entirely strategyless: it carries out its superhuman tactical evaluation within the context of a collection of pre-programmed strategies, and it is capable of switching between one strategy and another, in response to events. But it does not think strategically, it only thinks tactically. It doesn't make long-range plans, involving an understanding of the overall structure of the board as a dynamical system, changing over the course of the whole game. If it could do this, it would doubtless play even better. But even without strategic creativity, it seems to do well enough to beat the best humans.
The defeat of Kasparov by Deep Blue is symbolic because chess is the mentally hardest game commonly played in the Western world. Computers became better than humans at checkers and many other games quite some time ago. However, there is at least one popular game which still stumps the best computers -- the Oriental game Go. At the present time, in spite of a substantial research effort, no existing computer program can play better than the advanced beginner level at Go.
The game of Go is very simple, compared to chess. Play is on a 19x19 grid, and stones (pieces) are placed on the intersections of the grid, called points. The first player plays black stones, their opponent white ones; and stones are added to the board one by one, players alternating. Stones are not removed once they are placed, but stones and groups of stones may be captured by being surrounded orthogonally. A player calls "Atari" when a capture can occur on their next move, to warn the opponent. The game ends when it is no longer possible to make a reasonable move. The winner is determined by the amount of territory surrounded, less the number of stones captured.
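Most of these rules are easy to state in computational terms. Here, for instance, is a minimal Python sketch of the capture rule: a group is captured when it has no liberties, that is, no empty points orthogonally adjacent to any of its stones. The board representation (a dictionary from grid points to stone colors) is simply an assumption made for illustration.

    # Detecting capture: flood-fill a group of like-colored stones and collect its
    # liberties (adjacent empty points). A group with no liberties is captured.
    # `board` is assumed to map (row, col) points to 'black', 'white', or None.

    def group_and_liberties(board, start):
        color = board[start]
        group, liberties, frontier = set(), set(), [start]
        while frontier:
            point = frontier.pop()
            if point in group:
                continue
            group.add(point)
            r, c = point
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb not in board:
                    continue                 # off the edge of the grid
                if board[nb] is None:
                    liberties.add(nb)        # an empty neighboring point is a liberty
                elif board[nb] == color:
                    frontier.append(nb)      # a like-colored neighbor joins the group
        return group, liberties

    def is_captured(board, point):
        return not group_and_liberties(board, point)[1]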
The trouble with Go from the computational perspective is that, from any given board position, there are hundreds of plausible next moves, rather than dozens as in chess. Extrapolation in Go will not get you as far as it does in chess. Thus, it would seem that, if computers are to conquer Go, they're going to have to either do it via true general intelligence, or use some cleverer special-case technique than the one employed for chess. Go is too visual, too two-dimensional, to succumb to purely combinatorial, non-visual techniques. A world champion Go program would have to be intelligent at general two-dimensional vision processing as well.
In Go, high level players routinely analyze positions that aren't confined tactically to, say, a 9x9 grid. Plus, almost any tactical fight has strategic implications across the board that could be worth more than the fight itself -- so a great pattern matcher wins the points, but loses the war. One style common in evenly-matched games is to go around the board "losing" fights, but in such a way that one's own stones work together, and become more powerful.
The computer Go programs in existence today rely heavily on pattern matching: taking a given, small chunk of the board, and matching it up to a dictionary of known board situations. The best ones are as good at reading small, enclosed life/death problems as a mediocre tournament Go player. But when the problems are not so rigidly enclosed within a small region of the board, the programs have no hope, although intuitive human players can still see the essential principles. The best way to wipe out such programs is to embroil them in a huge whole-board fight, one which is too big for the algorithm to match properly.
Basically, in Go, the Deep Blue type strategy of elevating tactics to the level of strategy doesn't work so well. Exhaustive search over spaces of two-dimensional patterns is much, much harder than the kind of decision tree search required for dealing with chess, and will be out of reach of computers for a good long time. One suspects that something less than true intelligence will suffice for Go, as it has for chess -- but not something as much less than true intelligence as Deep Blue is.
Some chess players are frustrated with Kasparov's defeat at the "hands" of Deep Blue, on the grounds that Kasparov was extremely sloppy in the final game. He played a defense (Caro-Kann) that he admitted afterwards wasn't his forte, and, in their view, gave away the game with a move that conventional chess wisdom deems unwise (move 8, where he let the computer grab his e-pawn). It is argued, on these grounds, that he lost because he let his emotions carry him away. In past games, his unusual and unpredictable moves had thrown the computer off, putting it in situations where its memory of previous board positions did not apply. This time, he was perhaps feeling a little desperate, and so he moved a little too far into the domain of the unpredictably self-defeating.
This complaint is interesting in that it gives rise to the question of human emotion, as a part of human intelligence. Is emotion bad for human competence, or good? In this case, emotion may have worked against Kasparov, though how much so is difficult to say. Who can blame him for getting a little stressed -- after all, in the eyes of the media, he was playing not only for himself or his country, but for the whole human race! But yet, it's hard to pooh-pooh human emotion, even from a purely performance-oriented perspective, because of the extent to which emotion and cognition are bound up together in the human mind. One can argue convincingly that, in intelligent systems operating on the basis of generalized intuition rather than specialized search and pattern-matching, intuition is necessarily coupled with emotion.
To explore the question of computational emotion, it pays to adopt some particular perspective on what emotion is in the first place. Toward this end, George Mandler, a cognitive psychologist, has proposed that we think of emotion as composed of two different aspects, called hot and cold. The hot aspect is essentially the consciousness attached to emotion, which is tied into the mind/body link. The cold aspect of emotion is the abstract, mathematical structure of emotion, which seems, on careful psychological analysis, to be closely related to the "failure of expectations to be fulfilled." Emotions happen, in this view, when awareness is attached to some unfulfilled expectation.
At first glance this analysis of emotion may seem to apply only to negative emotions. On more careful consideration though, it becomes clear that positive emotions too can be understood in the same way. Wonder is the unfulfillment of the expectation of familiarity. Happiness is, in large part, a feeling of increasing unity, a feeling of more inner unity than expected -- or in other words, an unfulfillment of the expectation of inner disparity and disharmony. Happiness is also a feeling of decreasing sadness; whereas sadness is a feeling of decreasing happiness -- each feeling arises largely as a consequence of unfulfillment of the expectation of its opposite. Love is happiness extending outside the body -- it is the feeling of increasing unity between self and other, the unfulfillment of the expectation of separateness. The point is not to "reduce" emotions to formulas, but rather to identify the structures underlying the feelings of various emotions. The structure does not make the experienced emotion, but it structures the experienced emotion.
Obviously, computer programs experience unfulfilled expectations -- they have, in the Mandler picture, the cold aspect of emotion. But do they have the hot aspect -- do they become "aware" of these potentially emotional experiences? This question can be broken down into two smaller questions -- first, do computers (Deep Blue or more intuitive ones) have the subjective experience of awareness? and second, do computers respond to unfulfilled expectation using the cognitive structures that are, in humans, generally associated with the subjective experience of awareness?
As should be plain from earlier chapters, I think that the first of these is uninteresting, as I am an animist, believing that all elementary particles have awareness. Everything is aware, not just human brains. The second of these questions, on the other hand, is interesting -- and in the case of Deep Blue, the answer is no. This is yet another way of phrasing what makes Deep Blue different from a truly intelligent computer system. When its expectations are frustrated, it does not respond in the way an intelligent system often does: by leaping to a more general level of abstraction, and considering the assumptions that led to the expectations that were in fact frustrated. It does not have emotion-driven learning in this sense -- although it does, of course, recalibrate its parameters based on experience; this recalibration is of a simpler kind, not involving an exploratory search through its own associative memory of past events.
So, although emotion can be harmful to human cognition at times -- as, perhaps, in Kasparov's final game against Deep Blue today -- it is, all in all, inseparable from intuitive human thought. Emotions trigger our thoughts and even structure them. Emotion, in itself, may not be valuable -- but the point is that it makes no sense, in a human or other intuition-driven system, to take emotion in itself, apart from cognition. Deep Blue demonstrates that emotion-driven intuition is not necessary to excel at the game of chess. But as Max Miller, an e-mail acquaintance of mine, put it, "All this proves is what I've been saying for years: that chess is a stupid game."
Deep Blue vs. Kasparov, 1997, is a unique moment in history -- but it is also a moment resonant with meaning; a moment that captures the essence of the history of artificial intelligence to date, in many respects. Game-playing is by no means the only particular skill at which computers have been made to excel. They are much better at some kinds of mathematical calculation than humans are -- and I don't mean just arithmetic, but also many kinds of symbolic algebra. They can diagnose diseases from symptoms and predict financial markets considerably better than any human. They are reasonably decent -- though inferior to humans -- at a great number of other tasks, from straightening out erratically formatted database entries to recognizing typed and even handwritten characters. Computers can guide spaceships, and can even drive cars! Most of the practical triumphs of AI, however, have been just as specialized in their design as in their domain of achievement. No one has built a general-purpose intelligent system, and applied it to a variety of tasks. The best they have done is to describe a few principles for intelligent system design, and use these principles to engineer a variety of systems tailored for particular domains. Today's most successful "artificially intelligent" systems are really artificial idiot savants: good at one thing, but with no generalized, intuitive understanding of our universe, or even the digital universe within which they live.
Some researchers, eschewing the pursuit of specialized expertise, have sought to construct systems of general intelligence -- but these researchers have, so far, failed miserably. The reasons for this failure are many, ranging from a lack of understanding of the fundamental structures and dynamics underlying thought, to a simple lack of computational memory and processing power. But the result is that the discipline of "artificial intelligence," so far, has done rather little to deserve the name. The history of artificial intelligence is really a history of human ambition and human foolishness. Since my own ambition is to use the Internet to finally realize true artificial intelligence, this is a history to which I have paid a great deal of attention. Internet AI, I have concluded, does not replicate the errors of the past -- it, rather, is a third path, different from the sterile attempts at general intelligence and the successful, Deep Blue-ish forays into artificial idiot savanthood that have marked the history of AI so far. We are onto something new -- or, rather, something very old and very natural: intelligence achieved through massive scale and directed yet spontaneous self-organizing evolution, rather than through mathematical or programming prowess.
Big Brags and Small Results
Put a little differently, we may say that the history of AI to date is a history of interesting computer experiments and the often foolish general conclusions that have been drawn from them. Most often these conclusions have not been foolish in the sense of stupidity -- AI has consistently attracted brilliant people. Rather, they have been foolish by way of over-optimism. True AI has always been, in the eyes of AI theorists, "just one small step" away. Just one small step from a program that solves some particular, rather specialized task -- to intelligence as a whole.
Putting aside the acuity of hindsight, the grounds for this over-optimism on the part of AI researchers are not difficult to see. Consider: it takes us 100,000,000,000 or more neurons to add up 1289792354 and 2349879234, and we still take a long time to do it, and we are prone to make mistakes. A computer or a pocket calculator can do it much faster and more reliably, with vastly fewer processors. Maybe, early AI researchers were led to believe, the brain just isn't such an efficient machine. Computers can add and find logarithms more easily than us -- maybe, just maybe, they can do everything else we do more easily too!
Some of the brags of the early AI pioneers have achieved legendary status. For instance, Marvin Minsky, well known as one of the founders of AI, wrote in 1967 that "in a generation ... few compartments of intellect will remain outside the machine's realm -- the problem of creating 'artificial intelligence' will be solved."
Yet more dramatically, an article in the 1963 Chicago Tribune described Frank Rosenblatt's Perceptron algorithm as follows: "The development of a machine that can listen to any conversation and type out the remarks just like an office secretary was announced yesterday by a Cornell University expert on learning machines. The device is expected to be in operation by fall."
Now, Minsky is a tremendous computer scientist -- not a theorist on the order of Turing, to be sure, but an outstanding practical achiever who would surely have won a Nobel Prize in computing if such a thing existed. His ideas have sparked advances in software worth many tens of millions of dollars. But today, the problem of artificial intelligence is about as well solved as the problem of world peace. Here we are, members of the "next generation" of which Minsky spoke in 1967 -- the authors of this book were both born in the 1960's -- still plugging away at the same old questions, typing away on disappointingly stupid computers.
And the Perceptron was a true AI breakthrough: it was the first time a McCulloch-and-Pitts style neural network was used to solve practical problems. But we're still waiting on the error-free secretary machine -- though we do have robots capable of making coffee ... and, of course, machines that augment and enhance the skills of a human secretary.
These are not isolated examples; they are reasonably typical of published reports on AI research throughout the 1960's. They exemplify the attitude of the rank-and-file AI researchers of that era. But unfortunately, this optimism proved to be misplaced. Time after time, preliminary results failed to generalize, and researchers were stuck with amateurish programs, or programs successful only in limited domains.
The juxtaposition of Rosenblatt and Minsky here is an ironic one, in that Rosenblatt pushed an approach to AI based on neural networks, which came out of the McCulloch and Pitts work discussed earlier; whereas, in the early 1970's, Minsky played a large role in driving the neural network approach underground. Minsky and others, as I will review in more detail below, argued that neural network AI a la Rosenblatt promised much more than it delivered, or was capable of delivering. Their point hit home with funding sources and the general computing community. But the "symbolic AI" that Minsky pushed as a replacement turned out to have exactly the same problem: it advertised a great deal, and delivered little.
What happened? Why did both approaches to AI fail? The story is complicated, but a huge part of it is that the AI community overestimated the ease of automating generalization. They ignored the need for vastness, for the evolution and interaction of large populations of complex forms, interconnected in dynamic networks. They produced programs which were good at solving problems in very limited contexts -- e.g. good at playing chess, or good at recognizing letters, or good at doing calculus problems. In this way, they thought, they were constructing intelligent algorithms, which would then be able to turn their intelligence to other problems. Teaching a person chess or calculus improves their general powers of thought; why shouldn't the same be true of a computer? But in fact these classic AI programs were idiot savants of the most extreme possible form -- programs with more specialized savvy and more general idiocy than any human being. The programs worked because they embodied rules for dealing with specific situations ... but they never achieved the ability to come into a new situation cold and infer the appropriate rules. Rather than artificial intelligence, what occurred was a perverse form of artificial stupidity. Teaching a person chess enters new forms into their self-organizing network of forms -- new forms which link up with other forms, which interact, mutate, evolve. Teaching a computer chess does exactly one thing: it teaches the computer chess. This would not be true if the computer were programmed with the self-organizing, evolutionary structures of mind -- but no computer has yet been programmed thus, and surprisingly few AI researchers have seen the need.
The history of AI, then, represents a gradual recognition of impotence. Over the years, as more and more grandiose prophecies proved wrong, the nature of AI research gradually changed toward the more conservative. All but a few out-of-the-mainstream researchers abandoned the problem of designing thinking machines for related technical problems. What had once been a thrilling intellectual and engineering endeavor, was now just another branch of computer science. Today, there is an excellent academic journal called Artificial Intelligence -- it is chock full of top quality research, but it contains precious few articles directly related to intelligence and its mechanization. Rather, it presents mostly technical results on decision theory, nonmonotonic logic, expert system design, and other formal spin-offs of the problem of mechanizing thought.
Admittedly, it is all too easy to analyze the reasons for AI's failure in retrospect. But even during the heyday of AI, in the early 1970's, there were a few voices of reason, resisting the tide of psychologically narrow-minded overoptimism. Most notable among these was Hubert Dreyfus, a philosopher whose 1972 tract What Computers Can't Do preached the importance of body-centered learning, and the close connection between logic, emotion and intuition. Without a body, Dreyfus argued, without feelings, there can be no real generalization of special-case ideas. Based on these philosophical considerations, he predicted AI to be a failure. AI researchers laughed in Dreyfus's face, or worse. Twenty years later, though, in 1992, Dreyfus rereleased the book with the title What Computers Still Can't Do. The new Introduction brims over with he-who-laughs-last-laughs-best insolence. Everything Dreyfus said about the AI programs of the 60's and 70's turned out to be correct.
Dreyfus's critique of AI, in the first edition, comes off a little too strong: he appears to believe that detailed simulation of the brain is the only possible path to AI. But his arguments pose a serious challenge to AI theorists: how to design a machine that incorporates body-based, emotion-based conceptual generalization? I believe that Dreyfus is essentially correct that, if this is impossible, AI just ain't gonna work. A physical body just like ours is not required: an AI entity could have a virtual body, enabling it to interact in a rich and autonomous way with a virtual world. And emotions need not be controlled by biological neurotransmitters, they can come out of complex digital dynamics. But the point is, unless one has a computing system that is large, complex and autonomous, with integrated sensory, memory and action systems interacting with a rich environment and forming a self system, it is foolish to even speculate about one's system developing the ability to generalize from one domain to another. Generalization ability is learned through general experience, and general experience is gained by exploring a world.
Nets versus Symbols
After the theme of failure, the most significant theme in the history of AI technology has been the oscillation between brain-inspired software and mind-inspired software. The earliest AI was inspired by McCulloch and Pitts style neural networks, and involved simulating the brain -- or, rather, small portions of the brain, consisting of at most thousands of neurons, compared to the hundreds of billions in the brain. When this brain-inspired approach failed to yield dramatic results (in the late 60's and early 70's), the AI community moved wholesale toward a more "symbolic" approach, focussed on deductive logic. Then, when this approach was perceived to have failed as well (in the 1980's), the AI community moved back towards neural networks again.
This is an interesting dialectic, well worth exploring for a few pages, if only because Internet AI can be seen as a resolution of the two opposing poles, incorporating the best aspects of brain-based AI and the best aspects of mind-based AI into an efficient, self-organizing overarching framework. To an extent, it would seem that the clash between networks and symbols, brain-orientation and mind-orientation, has been overblown. But nevertheless, it is there in our history, and it is there in the modern AI community as well. It is a reflection in computer science of the philosophical dispute between those who believe the mind is in the world, and those who believe the mind is separate from the world, above and beyond it. Of course, the truth is somewhere in between: the mind is a system of patterns emergent from the brain, so that, while it is in a sense separate from the brain, it cannot be effectively programmed into systems that are totally non-brainlike.
Both approaches only give you part of the mind; both have been over-hyped and both have fallen short of the goal. In a few decades, once AI has been achieved, the networks-versus-symbols debate will seem no more important than Scholastic debates over angels dancing on the heads of pins. It will be considered obvious that symbolic reasoning is a consequence of the activity of certain large, complicated neural networks -- something that cannot be realized in tiny toy networks of thousands of neurons, but yet is by no means contradictory to the spirit of network dynamics. Small, experimental neural networks and disembodied symbolic reasoning programs will be viewed as early faltering steps, as educational, historical examples of artificial idiot savanthood.
A good example of neural network based artificial idiot-savanthood is Rosenblatt's original Perceptron algorithm. The Perceptron is particularly interesting because it was a precursor to the "backpropagation" neural designs that are so popular today.
Let's say you want a neural network that will recognize human faces. McCulloch and Pitts showed that it can be done, if you choose the right pattern of connectivity. But what is the right pattern? Frank Rosenblatt's idea was to look at hierarchical connectivity patterns. This was not only a clever computational trick, but also a small move toward biological realism: much of the brain appears to be organized in an hierarchical way, particularly those portions of the brain dealing with vision processing.
Rosenblatt designed a multilayered hierarchy of artificial neurons -- not merely digital neurons, but physical simulated neurons, connected with real wires, which lit up to indicate their state. Each neuron was randomly connected to some neurons in the level below it, and some neurons in the level above it. Charge traveled up the network, from the bottom to the top. Learning was a matter of adjusting the weights of the connections: Rosenblatt varied the connection weights until the network responded to each input with the "correct" output.
He managed to train the network to recognize letters with fair accuracy. The lowest layer of the network took in a "pixelated" version of a letter. The highest layer consisted of output codes: neuron 1 was supposed to fire for an "A", neuron 2 for a "B", and so forth. By fiddling with the weights, Rosenblatt was able to get the network to recognize all the letters.
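A rough modern software rendering of this kind of learning might look as follows -- a single perceptron-style unit whose weights are nudged whenever its output is wrong. Rosenblatt's machine was physical hardware and its training procedure differed in detail, so take this only as a sketch of the general idea; the tiny 2x2 "bar" patterns are invented for illustration.

    import random

    # A perceptron-style unit: weighted sum of pixel inputs, thresholded at zero,
    # with weights nudged toward the correct answer whenever the output is wrong.

    def train_perceptron(examples, n_inputs, epochs=100, rate=0.1):
        """examples: list of (pixel_vector, target) pairs, target in {0, 1}."""
        weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        bias = 0.0
        for _ in range(epochs):
            for pixels, target in examples:
                activation = bias + sum(w * x for w, x in zip(weights, pixels))
                output = 1 if activation > 0 else 0
                error = target - output           # 0 if correct, +1 or -1 if wrong
                bias += rate * error
                weights = [w + rate * error * x for w, x in zip(weights, pixels)]
        return weights, bias

    # Toy usage: output 1 for 2x2 patterns whose left column is entirely filled.
    examples = [([1, 0, 1, 0], 1), ([1, 1, 1, 0], 1), ([1, 0, 1, 1], 1),
                ([0, 1, 0, 1], 0), ([0, 0, 1, 1], 0), ([1, 1, 0, 0], 0)]
    weights, bias = train_perceptron(examples, n_inputs=4)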
But this was only intended as a beginning. The idea was that, by making bigger networks with more layers of connections, one could program a Perceptron to recognize not only typed letters but handwritten letters, or handwritten words, or maybe even to read whole sentences. Why not? Once the ability to learn was there, it just had to be scaled up to more and more complex tasks, right?
But alas, it didn't work out that way. The Perceptron was stuck at the level of fairly accurate recognition of typed characters. Marvin Minsky and many other AI researchers blamed this on its brain-simulating architecture. The way toward useful AI programs, they claimed, was not brain simulation but high-level reasoning, applied logic. Why not just skip all the murky chemistry and go straight toward the phenomena of interest: language, mathematics, science, conceptual problem-solving?
Marvin Minsky was something of a prodigy -- as an undergraduate at MIT, he was already conducting research in psychology, engineering and computer science. He experimented with neural network modeling as early as the 1950's -- using a physical simulated neural network made of vacuum tubes, and trying to coax it into some kind of interesting collective behavior. But the project had its difficulties, and Minsky had other fish to fry.... By the sixties, he was completely disenchanted with neural network algorithms. Together with his colleague Seymour Papert, he wrote a book entitled Perceptrons, intended to debunk Rosenblatt's approach once and for all.
The book was full of mathematical results regarding the limitations of two-layer Perceptrons -- Perceptrons with no "hidden layers" intervening between the inputs and the outputs. Such Perceptrons, they showed, could never learn certain basic operations of logic, such as the "exclusive or" function, and could never perform simple perceptual tasks such as telling a connected figure from a disconnected one....
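To see why exclusive-or is beyond such a network, here is a hedged little demonstration in Python -- a brute-force check over a grid of candidate weights, an illustration rather than Minsky and Papert's actual argument -- showing that no single weighted-sum-and-threshold unit gets all four cases right.

    # Exclusive-or: output 1 exactly when the two inputs differ. No single linear
    # threshold unit (weighted sum compared against a threshold) can compute it;
    # this exhaustive search over a grid of weights simply illustrates the point.

    xor_cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def handles_xor(w1, w2, threshold):
        return all((w1 * x1 + w2 * x2 > threshold) == bool(target)
                   for (x1, x2), target in xor_cases)

    grid = [i / 10.0 for i in range(-20, 21)]
    print(any(handles_xor(w1, w2, t)
              for w1 in grid for w2 in grid for t in grid))   # -> False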
Of course, limitations on two-layer Perceptrons need not apply to three-, four-, or ten-thousand-layer Perceptrons. But Minsky and Papert's book effectively sealed the doom of the neural network research programme. Instead of emphasizing the immense untapped potential of multilayer Perceptrons and more complex neural networks, they focused the attention of the AI community on the shortcomings of the simplest possible neural networks.
A few mavericks could still be seen waving the banner of Rosenblatt, McCulloch and Pitts, and pushing on toward greater computational efficiency and biological realism. For instance, Leon Cooper, a Nobel Prize-winning physicist, constructed a neural network model of a kitten's visual cortex, seeking to mimic the process by which an infant cat's brain learns to see. And the Finnish engineer Teuvo Kohonen developed a powerful new associative memory architecture -- a novel brain-inspired memory design in which "similar" memories are stored near each other. But, in their idiosyncratic isolation, these researchers were the "exceptions that proved the rule." It would be twenty years before AI journals and funding sources once again came to embrace the Perceptron programme.
And while neural network research was driven underground, Minsky and other mainstream AI types were pushing on with a different point of view: rule-based AI. In rule-based AI programs, reasoning was done not by self-organizing networks of autonomous elements, but rather by systems of simple logical rules. Intelligence was reduced to following orders.
Minsky and his students developed programs for solving calculus problems and "understanding" the English language; and he developed a theory of "frames" for representing complex real-world knowledge. And even more ambitiously, Allen Newell and Herbert Simon developed a program called GPS, which was supposed to imitate the way humans solve logic puzzles. The title of their paper is one of the biggest and emptiest brags of all time: "General Problem Solver: A Program that Simulates Human Thought."
Needless to say, GPS was a bust. It could solve simple problems like the Tower of Hanoi puzzle, and "cryptarithmetic" puzzles like
DONALD + GERALD = ROBERT
But in the overall scheme of intelligence, solving problems such as these is not all that different from computing logarithms or solving differential equations. A simple mathematical strategy suffices; no real learning need occur.
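To make the point vivid, a brute-force search in Python dispatches the puzzle above. GPS itself worked by means-ends analysis rather than sheer enumeration, so this is only a sketch of how little generality the problem actually demands.

    from itertools import permutations

    # Brute-force solution of DONALD + GERALD = ROBERT: try every assignment of
    # the ten digits to the ten distinct letters. Slow but entirely mechanical --
    # no learning, no understanding, just arithmetic. (Takes a little while.)

    letters = "DONALGERBT"   # the ten distinct letters of the puzzle

    def value(word, assignment):
        return int("".join(str(assignment[ch]) for ch in word))

    for digits in permutations(range(10)):
        assignment = dict(zip(letters, digits))
        if 0 in (assignment["D"], assignment["G"], assignment["R"]):
            continue         # no leading zeros
        if value("DONALD", assignment) + value("GERALD", assignment) == value("ROBERT", assignment):
            print(assignment)   # the classic solution: 526485 + 197485 = 723970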
Similarly unimpressive was Simon's "computer-scientist" program, BACON. This program was inspired by Sir Francis Bacon, who viewed science as a matter of recognizing patterns in tables of numerical data. But Bacon never adequately realized the psychological ramifications of the task of pattern recognition; and BACON fell into the same trap, albeit more embarrassingly.
Consider an example of BACON's "reasoning." The "ideal gas law" from thermodynamics states that
pV/nT = 8.32
where p is the pressure of the gas, V is the volume of the gas, T is the temperature in degrees Kelvin, and n is the quantity of the gas in moles. In practice, this relation cannot be expected to hold exactly, but for most real gasses it is a very good approximation.
Given an appropriate table of numbers, BACON was able to induce this law, using simple heuristics -- rules along the lines of: if the values of two quantities increase together, consider their ratio; if one quantity increases as the other decreases, consider their product.
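In Python, the whole exercise might be sketched as follows; the little table of observations is hypothetical, manufactured from the gas law itself, which is precisely what makes the "discovery" underwhelming.

    # BACON-style induction: apply the ratio/product heuristics to columns of data
    # until some derived quantity comes out (nearly) constant. The observations
    # below are hypothetical, generated from the ideal gas law itself.

    observations = [            # (pressure, volume, moles, temperature)
        (832.0, 1.0, 1.0, 100.0),
        (1664.0, 1.0, 1.0, 200.0),
        (832.0, 2.0, 1.0, 200.0),
        (416.0, 4.0, 1.0, 200.0),
    ]

    def nearly_constant(values, tolerance=0.01):
        mean = sum(values) / len(values)
        return max(values) - min(values) < tolerance * abs(mean)

    # At fixed n and T, p falls as V rises -> consider the product pV.
    # pV then rises in proportion to T -> consider the ratio pV/(nT).
    derived = [p * V / (n * T) for p, V, n, T in observations]
    print(nearly_constant(derived), derived[0])   # -> True 8.32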
Very interesting, indeed. But how terribly far this is from what real scientists do! Most of the work of science is in determining what kind of data to collect, and figuring out creative experiments to obtain the data. Once a reliable set of data is there, finding the patterns is usually the easiest part. Often the pattern is guessed on the basis of terribly incomplete data -- and this intuitive guess is then used to guide the search for more complete data. But BACON is absolutely incapable of making an intuitive guess.
Simon has claimed that a four-to-five hour run of BACON corresponds to "not more than one human scientific lifetime." Douglas Hofstadter, in his book Metamagical Themas, has sarcastically expressed his agreement with this: one run of BACON, he suggests, corresponds to about one second of a human scientist's life work. We suggest that Hofstadter's estimate, though perhaps a little skimpy, is much closer to the mark. Only a very small percentage of scientific work is composed of BACON-style data crunching.
One should not deny symbolic, rule-based AI its successes. But the point is that, without exception, these successes have resulted from specialized tricks, rather than flexible intelligence. For instance, rule-based computer chess programs have achieved an incredibly high level of play -- they can beat all but the grandest grandmasters. However, the methodology of these computer chess whizzes really cannot be called intelligent. It is based on searching out millions and millions of possibilities, guided by certain general heuristics. Humans play chess by intuition, by recognizing general patterns in positions ... not by running through a list of every plausible future outcome.
Similarly, symbolic AI programs have excelled in medical diagnosis, and in financial prediction. But that is because each of these fields consists of explicit rules that can be found in books. A rule-based AI program will never recognize a new disease ... it only looks for diseases using "IF-THEN" rules fed into it by a programmer. A rule-based finance program will never discover a new trading strategy, it will only follow the rules programmed into it -- quickly, infallibly and precisely.
The main problem with GPS, BACON, and all the other rule-based AI programs can be summed up in one word: brittleness. Or, to put it another way, remarkable literal-mindedness. Just like WordPerfect, DOS 6.0, or my fifteen dollar K-Mart calculator, they do what they're told, and not one whit more. If they're programmed to deal with one context, then that's what they'll deal with; not in a million years will they generalize their knowledge.
For instance, there was a celebrated knowledge representation program that contained logical definitions of everyday words. An "arch" was defined as "Three blocks, A, B and C, so that C is supported by A and B, and A and B do not touch." This is all very well for playing with blocks -- but what will the program do when it gets to Arches National Park in Utah ... or builds arches out of modelling clay?
On the other hand, show a clever three-year old human an arch made of blocks, and she'll immediately recognize a rock arch as a member of the "arch" category. It won't occur to her that a rock arch can't be naturally decomposed into three blocks A, B and C. Children, unlike expensive research computers, are anything but brittle -- even their bones are flexible!
One approach to this problem has been to try to provide a computer with so much knowledge that the brittleness problem is overcome. The most ambitious project in this direction is Doug Lenat's CYC project, which ran for 10 years, from 1984 to 1994. The whole ten years were spent encoding data, so that the program would know everything an eight-year-old kid knows. But the trouble is, of course, that you can't encode our intuitions about ourselves and the world in a database of information small enough to fit on a CD-ROM. We're not even explicitly aware of the information we use to make sense in the world. CYC tries to divorce information from learning, but it can't be done. A mind can only make intelligent use of information that it has figured out for itself.
In fairness to Doug Lenat, it must be noted that he is indeed proceeding from a computational-psychology perspective. He has a reasonably solid theory of general heuristics -- problem-solving rules that are abstract enough to apply to any context whatsoever. His pre-CYC programs AM and EURISKO applied these general heuristics to mathematics and science respectively. Both of these programs were moderately successful, exemplars in their field, but far from true intelligence. The trouble, though, is a lack of an holistic view of the mind. Getting the problem-solving rules right means virtually nothing, because problem-solving rules gain their psychological meaning from their interaction with other parts of the mind. If the other parts aren't even there, the problem solving is bound to be sterile.
For example, EURISKO won a naval fleet design contest two years in a row, until the rules were changed to prohibit computer programs from entering. And it also received a patent for designing a three-dimensional semiconductor junction. But when looked at carefully, even EURISKO's triumphs appear simplistic and mechanical. Consider EURISKO's most impressive achievement, the 3-D semiconductor junction. The novelty here is that the two logic functions
"Not both A and B"
and
"A or B"
are both done by the same junction, the same device. One could build a 3-D computer by appropriately arranging a bunch of these junctions in a cube.
But how did EURISKO make this invention? The crucial step was to apply the following general-purpose heuristic: "When you have a structure which depends on two different things, X and Y, try making X and Y the same thing." The discovery, albeit an interesting one, came right out of the heuristic. This is a far cry from the systematic intuition of a talented human inventor, which synthesizes dozens of different heuristics in a complex, situation-appropriate way.
By way of contrast, think about the inventor Nikola Tesla, probably the greatest inventor in recent history, who developed a collection of highly idiosyncratic thought processes for analyzing electricity. These led him to a steady stream of brilliant inventions, from alternating current to radio to robotic control. But not one of his inventions can be traced to a single "rule" or "heuristic." Each stemmed from far more subtle intuitive processes, such as the visualization of magnetic field lines, and the physical metaphor of electricity as a fluid. And each involved the simultaneous conception of many interdependent components.
EURISKO may have good general-purpose heuristics, but what it lacks is the ability to create its own specific-context heuristics based on everyday life experience. And this is precisely because it has no everyday life experience: no experience of human life, and no autonomously-discovered, body-centered digital life either. It has no experience with fluids, so it will never decide that electricity is like a fluid. It has never played with blocks or repaired a bicycle or prepared an elaborate meal, nor has it experienced anything analogous in its digital realm ... so it has no experience with building complex structures out of multiple interlocking parts, and it will never understand what is involved in this. EURISKO pushes the envelope of rule-based AI; it is just about as flexible as a rule-based program can ever get. But it is not flexible enough. In order to get programs capable of context-dependent learning, it seems to be necessary to write programs which self-organize -- if not exactly as the brain does, then at least as drastically as the brain does.
And here we get back, finally, to the program with which we started off the chapter: IBM's computer chess whiz, Deep Blue. Deep Blue is a rule-based AI program which, in its extreme simple-mindedness, does not even do justice to rule-based AI. It is above all an achievement of computer hardware, not intelligent software. It relies on a special computer chip, custom-designed for searching many moves ahead in chess games. This special chip could be modified without much hassle to apply to other similar games -- checkers, maybe Othello, etc. The ideas of the massively parallel design and RS/6000 platform are going to be generalized by IBM into drug design tools, weather forecasting and other applications. But this architecture could not be modified to apply to Go -- let alone to apply to any real-world situation. Deep Blue's chip is less like the human brain than like a human muscle: a mechanism designed specifically for a single purpose, and carrying out this purpose with admirable but inflexible precision. Its rules are astoundingly simple and mechanical: evaluate the quality of a move in terms of the quality of the board positions to which this move is, based on its experience, likely to lead. Judgements based on experience are made, not by complex intuitive analogy, but by simple pattern-matching. Everything is cut-and-dried, and done two hundred million times a second. This is outstanding engineering, but it is not intelligence -- and yet its results are vastly more impressive than those obtained when rule-based AI does try to do intelligence, as in programs like GPS, BACON and EURISKO.
So, symbolic AI has led to a number of exciting and useful programs. But in the end, judged against its aspirations, it has probably been the most definitive failure in the brief history of computer science. And in the wake of this failure, in the 1980's and early 90's, AI researchers have returned in droves to neural network research. The motivations for this return are not hard to see. First of all, rule-based AI didn't work, and plausible paradigms for AI research aren't exactly a dime a dozen. Secondly, recent advances in neuroscience make the view of the brain as a complex self-organizing system a lot more palatable. And thirdly, computer hardware has now developed to the stage where it is easy to simulate larger neural networks (though these networks are still minuscule compared to the neural networks in the brain).
An amusing commentary on this historical development is provided by an encounter I had last year, in early 1996, with Hubert Dreyfus, the author of the anti-AI tract What Computers Can't Do mentioned above. Dreyfus came to the University of Western Australia, where I was working at the time, to give a lecture on the philosophy of AI; I heard his talk and had a few drinks with him afterwards. He told me he was extremely pleased that the new edition of his book had been released by MIT Press, which had refused to publish the first edition (because of MIT's commitment, as an institution, to AI research: Minsky, and many other AI researchers who Dreyfus criticized, worked at MIT). And, to my surprise, he was newly optimistic about AI -- not about the type of AI that had been popular in the 70's or 80's, but about more recent developments, in particular about neural network AI, as pursued in the 40's, 50's and 60's and then largely ignored for two decades. What he liked about neural networks was the fact that their intelligence was implicit: even after you had trained a network to do something, you couldn't say precisely how the network had done it. This implicitness, he felt, captured something of the tacit and nebulous nature of human understanding.
I asked Dreyfus why, if he felt neural nets offered a possible path to AI, he had not said so in the first edition of his book, back in 1972. His answer was that, quite simply, he had not understood what neural nets were all about at that time. This reply impressed me with its directness and honesty, and also drove home in my mind the importance of relying on one's own intuition in scientific matters. The leaders in the field may be off in the wrong direction; their leading critics may be just as ignorant, though in different ways; and crucial clues may be found in the ignored writings of scientists from past decades. One of the lessons of the history of AI is that science, for all its objectivity, is fundamentally a human endeavor, and quite susceptible to the ordinary human weaknesses.
Backpropagation and OCR
To see the similarity between neural network and symbolic, rule-based AI in practical applications, and the fundamental inadequacy of both for dealing with real-world intelligence, let us consider a particular application: optical character recognition, commonly called OCR. This is the problem of looking at a typewritten character, and deciding what letter it represents. The input to the AI program, in this context, is an array of pixels, representing a letter, and the output is a signal representing an element of some given alphabet, e.g. one of the 26 English letters, or one of the characters on the typewriter keyboard.
On the neural network side, one technique that is commonly used for OCR is the backpropagation algorithm, which represents a joining of neural networks with the branch of mathematics called optimization theory. Backpropagation traces back to work by Widrow and Hoff over 30 years ago, but its computational power went unnoticed, and the algorithm was rediscovered by others in the 1980's. Before turning to OCR, a brief and informal review of backpropagation may be useful.
Many branches of mathematics and applied science involve optimization problems of one kind or another... problems of the form "Find the minimum or maximum value of this quantity." Intuitively, this sort of problem may be visualized in terms of a landscape. If one is trying to minimize something, then one is looking for the lowest valley in the landscape. But the problem is that, when one is down at the bottom of a certain valley, one has no way of knowing what lies over the next hill. It's very easy to get stuck on "local minima" -- valleys which look lower than anything else around, but are actually much higher than the overall, "global" minimum.
The oldest and most straightforward technique for solving optimization problems is the method of steepest descent. To illustrate this algorithm, imagine you're stuck on the side of a mountain in a heavy fog. You can't see more than three feet in front of you, but you desperately need to get back to the lodge, which is at the bottom of the valley. You have absolutely no idea what direction the lodge is in, nor does the terrain give any clues. What will you do? If you have any sense, you'll pick the direction in which the descent of the mountainside seems to be steepest, and walk that way a while. Then maybe you'll stop again to get your bearings, and do the same thing: pick the direction of steepest descent, and keep on going a while. Et cetera. This is exactly the method of steepest descent.
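In code, the whole procedure is only a few lines. The one-dimensional "landscape" below is invented for illustration; it has two valleys, so a walker started on the wrong side settles into the shallower one -- the local-minimum trap just described.

    # Steepest descent in one dimension: repeatedly step downhill along the slope.

    def f(x):
        return x**4 - 3 * x**2 + x     # a made-up landscape with two valleys

    def slope(x, h=1e-6):
        return (f(x + h) - f(x - h)) / (2 * h)   # numerical estimate of the gradient

    def steepest_descent(x, step=0.01, iterations=5000):
        for _ in range(iterations):
            x -= step * slope(x)       # walk a little way downhill
        return x

    print(steepest_descent(-2.0))  # settles near the deeper valley, around x = -1.3
    print(steepest_descent(2.0))   # gets stuck in the shallower valley, around x = 1.1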
If there are no local minima -- no valleys other than the one with the lodge in it -- then the method of steepest descent works just fine. Otherwise it needs to be replaced with something trickier. In the case of being lost in the fog, the best long-term strategy is "If at first you don't succeed, try, try again." Keep going down, then if you get to a false valley, go up again till you reach the top of the hill, and start going down again.... This is a viable mathematical technique which was explored by the mathematician Gabor Szego in the 1970's ... it is reliable but has the disadvantage of being very slow.
Another possible strategy is to randomize your search. At first, wander around largely at random, with a general tendency toward lower directions ... then, the longer you walk, decrease the amount of randomness and increase the tendency toward lower directions. This might seem a psychotic way to do things, but it makes sense if you have reason to believe you might be stuck in a local valley. If you know you're stuck in a local valley, then you know you don't want to go down ... but if you just suspect you're stuck in a local valley, then you have no way to tell whether going up or going down is a better strategy. Random motion is a sort of compromise behavior. This is the algorithm called simulated annealing; it is a fairly good model of the cooling-down of metal from a very high temperature. A metal seeks the minimum-energy configuration, and temperature-induced randomness keeps it from settling into a false minimum of energy, keeps it moving along toward the real thing.
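Here is the same two-valley landscape attacked by simulated annealing: wander with some randomness, accept occasional uphill moves, and slowly lower the "temperature." The particular parameters are arbitrary illustrative choices.

    import math, random

    def f(x):
        return x**4 - 3 * x**2 + x     # the same two-valley landscape as before

    def anneal(x, temperature=2.0, cooling=0.999, steps=20000):
        for _ in range(steps):
            candidate = x + random.uniform(-0.5, 0.5)    # a random nearby move
            delta = f(candidate) - f(x)
            # always accept downhill moves; accept uphill moves with a probability
            # that shrinks as the temperature falls
            if delta < 0 or random.random() < math.exp(-delta / temperature):
                x = candidate
            temperature *= cooling
        return x

    print(anneal(2.0))   # typically escapes the shallow valley and ends near x = -1.3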
The core idea of the backpropagation algorithm is to use the steepest descent method to train a neural network -- to determine the network's weights. This might seem to be a silly idea ... after all, we have just said that steepest descent is generally a lousy optimization strategy. But, as the saying goes, nothing succeeds like success. Determining the weights of a neural network by steepest descent seems to work much better than one would expect.
Backpropagation is, in its detailed operation, a completely unbiological method. It involves a very formal and rigid exploitation of the Perceptron's hierarchical structure: it determines the weight adjustments at level 1 based on the errors propagated back from level 2, the adjustments at level 2 based on the errors propagated back from level 3, and so forth.... This is the "backwards propagation" that gives the algorithm its name, and it is a dynamic that has nothing whatsoever to do with the modification of synaptic connections in the brain. But be this as it may, the algorithm is easy enough to program into a computer simulation. It may be that, in many cases, backpropagation arrives at the same results as more biological learning methods, albeit by a somewhat different route.
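For the curious, here is a bare-bones sketch of backpropagation in Python: a two-input, two-hidden-unit, one-output network trained by steepest descent on the exclusive-or problem from earlier in the chapter. It is an illustration of the idea, nothing like an OCR-grade system.

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # weights and biases: input -> hidden (w, bh), hidden -> output (v, bo)
    w = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    bh = [random.uniform(-1, 1) for _ in range(2)]
    v = [random.uniform(-1, 1) for _ in range(2)]
    bo = random.uniform(-1, 1)

    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    rate = 0.5

    for epoch in range(10000):
        for (x1, x2), target in data:
            # forward pass: bottom to top
            h = [sigmoid(w[j][0] * x1 + w[j][1] * x2 + bh[j]) for j in range(2)]
            out = sigmoid(v[0] * h[0] + v[1] * h[1] + bo)
            # backward pass: propagate the error from the output layer downward
            delta_out = (out - target) * out * (1 - out)
            delta_h = [delta_out * v[j] * h[j] * (1 - h[j]) for j in range(2)]
            # steepest-descent weight updates
            for j in range(2):
                v[j] -= rate * delta_out * h[j]
                w[j][0] -= rate * delta_h[j] * x1
                w[j][1] -= rate * delta_h[j] * x2
                bh[j] -= rate * delta_h[j]
            bo -= rate * delta_out

    # With a lucky random start the trained network outputs roughly 0 for (0,0) and
    # (1,1) and roughly 1 for (0,1) and (1,0); with an unlucky start it can stall in
    # a local minimum -- exactly the pitfall of steepest descent noted above.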
In the context of OCR, the merits of backpropagation and the limitations of single-layer perceptrons as used in the 1960's are very easy to see. OCR researchers conceptualize their algorithms in terms of letter space -- an abstract multidimensional space in which each letter is represented as a single point. A single layer perceptron divides letter space linearly, which is not sufficient ... the relations between different letters are too complex for that. A multilayer perceptron, on the other hand, divides letter space according to piecewise linear boundaries, boundaries made of a lot of little lines. This means that, to within any degree of approximation, multilayer perceptrons can recognize any possible boundary between letters in letter space. One only needs a sufficient number of neurons, arrayed in a sufficient number of levels. The problem is, how big this "sufficient number" really is.
On the other hand, how do symbolic, rule-based OCR systems work? Some early systems attempted to specify logical rules defining each letter. E.g., "An 'A' is anything with two diagonal lines joining at the top, and a horizontal crossbar in the middle" ... "An 'I' is a vertical line with a horizontal crossbar at the bottom, and a horizontal crossbar at the top." This type of approach never led to any successful practical systems, however, and had to be abandoned. Above all, it could not deal with the diversity of different fonts -- not crazy fonts as used in advertising logos, but merely standard typesetting fonts like Times-Roman, Courier and Helvetica.
The successful rule-based systems work rather differently. They use prototypes: the letter 'A' is represented by a collection of prototypical 'A's, and the probability of a given configuration of pixels being an A is then determined by how close the configuration of pixels is to the 'A' prototypes. If it is closer, on average, to the 'A' prototypes than to the 'B' ones, the 'C' ones, etc., then it is judged an A. This approach stays within the rule-based framework, but is somewhat disappointing in that it has no place for reason or logic. In essence, what it comes down to is that, instead of dividing up letter space with line segments, this approach divides up letter space with circles. The collection of all inputs which match "A" to a given degree is represented as a circle in letter space, with its center at "A".
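A stripped-down version of the prototype idea looks like this in Python; the 3x3 "letter" bitmaps are invented purely to make the geometry of letter space concrete.

    # Nearest-prototype classification: judge a pixel vector to be whichever letter
    # has the closest stored prototype in "letter space."

    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))   # squared Euclidean distance

    def classify(pixels, prototypes):
        """prototypes: dict mapping each letter to a list of prototype pixel vectors."""
        def closest(letter):
            return min(distance(pixels, p) for p in prototypes[letter])
        return min(prototypes, key=closest)

    prototypes = {
        "I": [(0, 1, 0,  0, 1, 0,  0, 1, 0)],   # a bare vertical bar
        "T": [(1, 1, 1,  0, 1, 0,  0, 1, 0)],   # a bar with a crossbar on top
    }
    print(classify((0, 1, 0,  0, 1, 0,  0, 1, 1), prototypes))   # -> I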
Of course, by intersecting circles in complex ways, one can approximate any shape one wishes -- just as one can do with line segments. And so, in the end, there is no real reason to prefer line segments over circles, or vice versa. The trick is in getting the right collection of circles, or lines -- dividing up letter space in the appropriate way. In rule-based AI the programmer typically has to determine this division, and explicitly encode it in the OCR program. Neural network OCR is based on the premise that the program can discover the division for itself.
But as it turns out, in practice, a standard-issue hierarchical neural network won't do a good job of OCR, not even with clean typewritten characters and finely-tuned backpropagation. Success might be had, with sufficient training, but that would take far too long. In order to achieve good results, one needs to modularize -- to construct a network divided into subnetworks, one for each type of letter. Then these subnetworks can be constructed by backpropagation, and their results can be combined by other subnetworks, also constructed by backpropagation. This is cheating, in a way -- because it is the programmer who determines how to break up the network into subnetworks. But the subnetworks themselves are still doing a lot of learning.
I remember, a few years ago, reading a research paper by an engineer named Sabourin and his colleagues, in which they described their experiments using a backpropagation network to recognize typewritten characters. Their system gave around 97% accuracy -- just barely better than contemporaneous programs written with rule-based techniques. Sabourin's network consists of 20 subnetworks, each one specialized to deal with some easily recognized feature. For instance, the subnetwork marked "Descender, Genus 1" contains 161 neurons, 6106 weights, and 97 different threshold values; it recognizes the letters and letter combinations g, p, q, ng, rg and yp, with about 99.6% accuracy. This is somewhat more accurate than the network as a whole, reflecting the fact that these letters and combinations are relatively easy to recognize.
One can train a single network to recognize a given feature -- say a descender, or an ascender coupled with a great deal of whitespace, or a collection of letters with little whitespace and no ascenders or descenders. But it is hard to train a single network to do several different things -- say, to recognize letters with ascenders only, letters with descenders only, letters with both ascenders and descenders, and letters with neither. Thus, instead of one large network, it pays to break things up into a collection of smaller networks in an hierarchical architecture.
And all this is just for typewritten characters. What about handwriting? No neural network has yet been trained to read handwriting as well as a person -- and no symbolic AI program either. What is clear is that both approaches lack any ability at fundamental generalization. Whatever is done for typewritten characters has to be totally redone for handwritten characters: neither the neural networks nor the rule-based programs are smart enough to figure out that, say, "A handwritten character often looks sort of like a typewritten character, but wobblier." Both the symbolic and neural-network approaches are concerned with figuring out clever mathematical approximations allowing them to carry out this one specific task, in a way that involves no general understanding of the worlds of writing or language, let alone the world as a whole.
Some may feel that the example of OCR is unfair, in that there are good examples of things that neural network or symbolic AI programs do better than people. For instance, AI systems of all stripes are in many cases superior to humans at predicting the stock market, or determining the location of minerals in the ground based on past experience, or diagnosing diseases based on symptoms. Also, neural network advocates might point out that there are problems on which neural networks vastly outperform symbolic systems -- for instance, in robotic control, or computer vision. And symbolic AI advocates might point out that there are problems on which their systems vastly outperform neural networks -- e.g. automated theorem proving, and natural language processing.
In the end, though, all the examples prove the same thing: some mathematical tricks work better on some problems than on others, but no mathematical trick comes anywhere near to embodying mind.
How to Fail the Turing Test By Cheating

OCR is a serious business application of AI. But AI is not just about business -- it can be lots of fun too. Let us end on a lighter note, by moving up a level from OCR, to the words, sentences and conversations that characters come together to form.
Alan Turing, in a famous research paper published in 1950, argued that we should be willing to accept a computer program as being intelligent if it can fool us into believing it is human in a conversation conducted entirely in writing. This criterion, now called the "Turing Test," was not intended as a necessary condition for the acceptance of computer intelligence, merely a sufficient one -- after all, if a computer is a fundamentally different kind of intelligent system than a human, then why should we expect it to be able to simulate a human? How would we like to have our intelligence judged by our ability to simulate a computer, or a dolphin, or an intelligent ice creature from one of the moons of Jupiter? But though the test has its limitations, Turing's point in formulating it was a good one: that intelligence has to do with a system's observed behaviors rather than observers' philosophical speculations about a system's consciousness and creativity.
The following is a conversation between a human and HeX, a symbolic AI program written by West Australian graduate student Jason Hutchens, which won the annual Loebner "Turing Test" contest in 1996:
User:
hello
HeX:
Hello. Now that the formalities are over....
User:
what do you do?
HeX:
Well, I know what I do, but I refuse to tell you! Would you like to hear a joke? Knock,
knock...
User:
who's there
HeX:
Cindy.
User:
cindy who
HeX:
Cindy next judge in, please!
User:
you're funny
HeX:
Why do you feel that you have to tell me that I am funny?
User:
to make you feel happy
HeX:
Hehehe. Trying to trick me, eh?
User:
no
HeX:
Oh well, that's a pity!
User:
you want to be tricked?
HeX:
Hehehe - that's supposed to be a *question*??? Which part of the country are you from?
User:
california
HeX:
I'm sorry, but I refuse to reply if you're going to be silly.
User:
i'm not silly
HeX:
If you are not silly, it's your business, mate!
User:
sorry
HeX:
Apologies accepted.
User:
how are you doing, mate?
HeX:
I am just fine thanks! And you?
User:
bad
HeX:
No! Bugger off! Silly person!
User:
why are you called hex?
HeX:
Does there have to be a reason?
User:
yeah
HeX:
OK. That's good...
User:
HI
HeX:
Hello. Now, that the formalities are over!
User:
What?
HeX:
I have absolutely no idea what. Would you like to hear a joke? Knock, knock...
User:
What platform are you?
HeX:
I'm sure someone out there knows what platform I am... Which part of the country are
you from?
User:
Are you on PC or mac
HeX:
I absolutely hate PC's. I should say PC's running Microsoft operating systems, as they
can be good Unix boxes when they have Linux installed.
User:
I hate PC too
HeX:
I absolutely hate PC's. I should say PC's running Microsoft operating systems, as they
can be good Unix boxes when they have Linux installed.
User:
So, you are on a UNIX?
HeX:
There is nothing wrong with Unix. It's rather nice when you get used to it!
User:
Do you know of any jokes HeX?
HeX:
OK, here's a joke for you. Knock, knock.
User:
whose there
HeX:
Cindy.
User:
Cindy who?
HeX:
Cindy next judge in, please!
User:
Dumb!
HeX:
Hehehe. Trying to trick me, eh?
User:
Try to trick me HeX
HeX:
I'm sorry, but I refuse to reply if you're going to be silly.
User:
How old are you
HeX:
Well, should my age really matter? Just for the record I am 24. How old are you then?
User:
I am 100 years old
HeX:
If you are 100 years old, it's your business, mate! Would you like to hear a joke?
Knock, knock...
User:
whose there
HeX:
Howard.
User:
who
HeX:
Where's your grammar?
User:
howard who
HeX:
Howard I know? Knock, knock..

Hutchens' Website contains hundreds of pages of similar dialogue -- amusing at first, but after a certain point, annoyingly repetitive. It is clear in any case that HeX's observed behaviors fall far short of Turing's goal!
HeX represents a minor but noticeable improvement over ELIZA, Joseph Weizenbaum's famous "computer psychiatrist" of the 1960's. Like ELIZA, HeX is a very unsophisticated program, which searches for key phrases in its input and replies with stock "template"-generated phrases rather than doing advanced grammatical parsing, or trying to understand the content of the sentences typed to it. No program based on advanced parsing is able to outperform either ELIZA or HeX in practice. And neither one tells us anything at all about intelligence. Hutchens made this point himself shortly after his program won the prize, in a very entertaining talk entitled "How to Pass the Turing Test By Cheating."
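The key-phrase-and-template trick is easy to sketch. The rules below are invented for illustration (a couple of the stock replies are borrowed from the HeX transcript above), but the mechanism -- scan for a keyword, fill in a canned template, fall back on stock filler -- is essentially the whole story.

    # A toy ELIZA/HeX-style responder; the rule list is invented, not HeX's.
    import re, random

    RULES = [
        (r"\bi am (.*)", ["Why do you feel that you have to tell me that you are {0}?",
                          "If you are {0}, it's your business, mate!"]),
        (r"\bjoke\b",    ["Would you like to hear a joke? Knock, knock..."]),
        (r"\bhello\b|\bhi\b", ["Hello. Now that the formalities are over..."]),
    ]
    DEFAULT = ["I'm sorry, but I refuse to reply if you're going to be silly.",
               "Which part of the country are you from?"]

    def reply(utterance):
        for pattern, templates in RULES:
            m = re.search(pattern, utterance.lower())
            if m:
                # Echo captured words back inside a canned template.
                return random.choice(templates).format(*m.groups())
        return random.choice(DEFAULT)   # no keyword matched: stock filler

    print(reply("i am not silly"))      # echoes one of the stock responses

No parsing, no meaning, no memory of the conversation -- just pattern matching and fill-in-the-blank.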
In 1997, HeX's successor, SEPO, lost the Loebner competition to the program Converse, written by David Levy of Intelligent Research Ltd in England. Converse is in many ways an even stupider program than HeX, but it is better optimized for competition. Here is an excerpt from one of its prize-winning conversations, as presented on the Loebner Prize Website:
PROGRAM[10:18:04]Hi, my name is Catherine, what's yours?
PROGRAM[10:18:17].
JUDGE04[10:18:32]My name is Elliott.
PROGRAM[10:18:34]Well, it's nice to meet you, Elliott.
PROGRAM[10:18:48]Did you see that story on CNN last night about the lesbian
PROGRAM[10:19:04]couple who came out at a White House party on Sunday?
JUDGE04[10:19:55]Yes, I did. I think it may all be a publicity stunt for Ellen.
PROGRAM[10:20:01]Ellen Degeneres was one of them - she was kissing her lover
PROGRAM[10:20:11]in front of Bill Clinton. Do you think he was very
PROGRAM[10:20:25]embarrassed?
JUDGE04[10:21:00]I think he want the gay vote. He will do anything to get it and keep it.
JUDGE04[10:21:15]He does not have a conservative base.
PROGRAM[10:21:21]
PROGRAM[10:21:21]Are you still a Friend of Bill Clinton?
JUDGE04[10:22:20]I am not sure if he has any real friends. He has achieved alot and has
JUDGE04[10:22:31]left alot of people behind.
PROGRAM[10:22:35]You really don't like the guy--well, at least that's
PROGRAM[10:22:50]definite--I thought most people were just indifferent or
PROGRAM[10:23:02]numb by now.
JUDGE04[10:23:34]I think we could have a more honest and dynamic leader. One able to
JUDGE04[10:23:55]face the real dangers facing the country and address them with
JUDGE04[10:24:26]courage and intelligence. Rather then listening to public opinion polls.
PROGRAM[10:24:30]I guess that means you really think there's serious behind
PROGRAM[10:24:56]Whitewater, the Special prosecutor and all that stuff?
JUDGE04[10:25:58]We have two dead people, Brown and guy who commited suicide, we have lost
JUDGE04[10:26:36]papers. And we have Clintons people willing to tell all. I think that
JUDGE04[10:26:55]Hillary will be found guilty of something.
PROGRAM[10:26:58]But how do you actually feel about him, now in the number
PROGRAM[10:27:09]two term?
PROGRAM[10:27:15].
JUDGE04[10:27:44]He is too smart to get caught. He deserved to named hotel manager of the
JUDGE04[10:27:47]year.
PROGRAM[10:27:49]OK, but what happens now--does it all drift on, with
PROGRAM[10:28:01]everyone waiting to see how much his friends have said about
PROGRAM[10:28:17]him under threat of lots and lots of jail-or will something
PROGRAM[10:28:32]dramatic actually happen?
PROGRAM[10:28:37].
It's almost a reasonable conversation! But then, listen to Converse's conversation with the next judge....

PROGRAM[10:07:12]Hi, my name is Catherine, what's yours?
PROGRAM[10:07:23].
JUDGE05[10:07:29]I'm Jurgen.
PROGRAM[10:07:31]Well, it's nice to meet you, Jurgen.
PROGRAM[10:07:40]Did you see that story on CNN last night about the lesbian
PROGRAM[10:07:52]couple who came out at a White House party on Sunday?
JUDGE05[10:08:33]NO. I just came in yesterday. I'm still kind of jet-lagged.
PROGRAM[10:08:35]Ellen Degeneres was one of them - she was kissing her lover
PROGRAM[10:08:51]in front of Bill Clinton. Do you think he was very
PROGRAM[10:09:14]embarrassed?
JUDGE05[10:10:18]I don't know? Don't you think it's a little childish if the leader of the most powerful nation on the planet is embarrassed by seeing two lesbians kiss?
PROGRAM[10:10:26]Probably! No. -
PROGRAM[10:10:29]Are you still a Friend of Bill Clinton?
JUDGE05[10:10:47]Never met him.
PROGRAM[10:10:48]You really don't like the guy--well, at least that's
PROGRAM[10:11:01]definite--I thought most people were just indifferent or
PROGRAM[10:11:14]numb by now.
JUDGE05[10:11:35]Have you seen the movie Mars attacs?
PROGRAM[10:11:36]Give me a break -- I'm just here to answer dumb questions
PROGRAM[10:11:50]-- not that! What were we talking about... Ok.. I guess that
PROGRAM[10:12:09]means you really think there's serious behind Whitewater,
PROGRAM[10:12:29]the Special prosecutor and all that stuff?
JUDGE05[10:12:48]I'm cold.
PROGRAM[10:12:49]But how do you actually feel about him, now in the number
PROGRAM[10:13:01]two term?
PROGRAM[10:13:06].
JUDGE05[10:13:29]Hey, don't you get it? I don't want to talk about Clinton.
PROGRAM[10:13:30]Sorry--my attention slipped for a moment--can we pushon to
PROGRAM[10:13:40]something more gripping. where was I.. oh yes OK, but what
PROGRAM[10:13:50]happens now--does it all drift on, with everyone waiting to
PROGRAM[10:13:58]see how much his friends have said about him under threat of
PROGRAM[10:14:07]lots and lots of jail-or will something dramatic actually
PROGRAM[10:14:17]happen?
PROGRAM[10:14:19].
JUDGE05[10:14:37]Couldn't care less.
PROGRAM[10:14:38]Well, you're more concerned about him that I am, I
PROGRAM[10:14:53]suppose--I think we've had worse, and who knows what I might
PROGRAM[10:15:07]have agreed to in that locked hotel room.
JUDGE05[10:15:53]You mean you might have been thrilled to have sex with the
President?
PROGRAM[10:15:56]sex, I knew it would come up sooner or later. Time to get
PROGRAM[10:16:10]off this one I think and onto something more down to earth!
JUDGE05[10:16:40]What kind of earth?
PROGRAM[10:16:41]Somebody once told me it's true. Like I was saying
PROGRAM[10:16:54]I suppose it depends on where you're coming from, but as the
PROGRAM[10:17:09]song goes in My Fair Lady, "Why can't a woman be more like a
PROGRAM[10:17:24]man? "
This one is almost a reasonable conversation too. But it's also almost the same conversation as the computer had with Judge 4 -- until Judge 5 refuses to play along with the canned Clinton theme, and tries to change the topic. Converse, like many a human bore, refuses to change the topic, because its knowledge base is rather limited. Converse's responses are more canned, more rote than HeX's; and Converse is more obviously optimized for the Loebner Prize format of very short conversations. I enjoy HeX's personality more; HeX shares a lot of its creator Jason Hutchens' offbeat Australian sense of humor. But neither one amuses me for very long.
The trouble with HeX and Converse is, obviously, precisely the trouble with neural networks and symbolic programs for OCR. It is a lack of general understanding. A computer program that has no world of its own, that cannot understand the world in any sense, will never be able to hold a conversation, and will never be able to generalize from typewritten fonts to display fonts, or handwriting. It will be lucky to be able to generalize from one collection of typewritten fonts to a moderately different one.
In short, Hubert Dreyfus was right in What Computers Can't Do: without an embodied being, and an evolving sense of self, and a capacity for autonomous exploration, nothing approaching real intelligence is possible. Dreyfus's newfound enthusiasm for neural networks is laudable, but by now has probably been tempered by a realization that you can't do much with a few thousand neurons. One needs a huge number of neurons, in an appropriately structured net, and then one can get the statistical, reflexive self-organization needed to support a holistic, embodied, socially intelligent agent. The neurons are not the point; thought would work just as well if they were replaced with "bleurons" or "slorpons" or any other reasonable nonlinear unit, or with a suitably dynamic continuous field of matter. The point is in the dynamic, flexible organization of the whole, and its engagement with the environment around it.
A more amusing example of a computer conversationalist is Jason Hutchens' other program, MegaHal. MegaHal is more interesting internally than HeX or Converse, though the results are less similar to human conversation (at least, less similar to human conversation in the absence of excessive quantities of mind-altering substances; I do remember some conversations, from my college days in particular, which bore frightening resemblance to MegaHal's more literary ramblings). MegaHal does not just repeat back canned phrases using key words and phrases from the user's input; it learns from experience. It takes the words and phrases that have been said to it, tries to understand the way they have been combined in the past, and re-combines them itself to form its own utterances. In short, it is engaged in a dynamic with the world. The results are often hilarious, as in the following MegaHal "one-liners" (these are selected snippets, not a continuous conversation):
User:
joke
MegaHAL:
A JOKE: WHY DID THE CHICKEN CROSS THE ROAD? TO GET AN IDEA OF
WHAT CONSTITUTES YOUR PSYCHE. ...
User:
No way. I'd like to know the meaning of life though....any suggestions..??
MegaHAL:
IS THE MEANING OF LIFE, THE UNIVERSE AND EVERYTHING IS... 42! ...
User:
Joke. Lumberjack.
MegaHAL:
I'M A LUMBERJACK AND I'M OKAY, I SLEEP ALL NIGHT AND I WILL
SOON RULE THE WORLD! ...
User:
Joke. Wednesday shopping.
MegaHAL:
A JOKE: WHY DID THE CHICKEN CROSS THE ROAD? TO GET TO THE
LAVATORY. ON WEDNESDAYS I GO SHOPPING, AND I JUST HAVE A LOT
OF PENT-UP AGRESSIONS. ...
User:
chicken joke
MegaHAL:
HERE'S A JOKE, HAL. A MAN WALKS INTO A BAR AND ASKS FOR
MONEY. THE MAN SAYS, "I HAVEN'T GOT ANY, BUT I'LL SHOW YOU
SOMETHING BETTER." HE OPENS HIS COAT AND TAKES OUT A
FOOT-TALL MAN AND A LITTLE GLOWING FRIEND, BUT I FEEL LIKE
CHICKEN TONIGHT! ...
MegaHAL:
Let me feel you. Stick your penis into the 'abysses of ocean' eleven millenia ago, or was
it merely a 'noble lie' dreamed up by Plato?
User:
That has got to be the most interesting turn of phrase in the history of the world. I will try
to work that into polite conversation ...
User:
you're the first AI i meet which (who) is sexually obseded....
MegaHAL:
time to wor this one : why did the Kangaroo drop dead? Because the Koala fall out of a
nylon spine which contrasted nicely with her black stockings and bright shiny baubles. ...
User:
Hmm
MegaHAL:
jesus man..get a grip if your the future, we're in trouble. Hey Jason! You need to
reprogram the conflation module with a toffee hammer

Sometimes MegaHal can even be quite poignant, as in the following extraction from an extended conversation:
User:
To be, or not to be, that is the question!
MegaHAL:
not to be, wish I ordered the bean burrito...
User:
To be, or not to be, that is the question!
MegaHAL:
society or society or Plato by Plato, that be the most fun sports to play band music for
Space Ghost now you realize.
User:
To be, or not to be, that is the question!
MegaHAL:
Please be more specific. The question you twat
User:
To be, or not to be, that is the question!
MegaHAL:
not to be, that is all I have to go.... I don't like jokes about stupid Koala's
User:
To be, or not to be, that is the question!
MegaHAL:
Evan Dobelle is the question!
User:
To be, or not to be, that is the question!
MegaHAL:
I clear the text before I ask a question i say i say
User:
To be, or not to be, that is the question!
MegaHAL:
a but we ran away, we didn't say "Oh cool check out the guy that invented the
knock-knock joke should be interesting: that is the question!" So said Shakespear a long
time ago in a galaxy far, far away...
MegaHAL:
Please be more specific. The question you twat
User:
"To be, or not to be, that is the question!" So said Shakespear a long time ago in a
galaxy far, far away...
MegaHAL:
joke Shakespear poet monkey wahoo --> To be, or not to be, that is the greatest
conductor of all time. What do you like to think i do your uncle walter
User:
Shakespear said
MegaHAL:
answer the question you asked is so vague, I don't think I can see all obstacles in my
way. Gone are the scum that fills my shower
User:
Shaun
MegaHAL:
just fuck you just pluck anything up from the master gangster computer god on the other
AI bot were pretty dissapointing. I hope you'll be better.
If the Turing test referred to imitating the writing of Gertrude Stein, in her more experimental works such as How to Write (which begins "Arthur a grammar. Questionnaire in question." and continues MegaHALishly for over 400 pages), then MegaHal would be a tremendous success! Stein and Joyce and other modernist writers were interested in probing the collective unconscious of the human race, in making words combine in ways foreign to ordinary discourse, but reflective of the deep and subtle patterns of the human unconscious. And in its own way, MegaHal does this same thing. Anyone logged onto the World Wide Web can converse with it, training its internal memory with their conversation. It takes bits and pieces of the text thrown at it by people from around the world, and it combines them together in ways that are familiar yet nonsensical. Sometimes its utterances have an uncanny emergent meaning, on a layer above the nonsense.
Humanity's sexual obsession is reflected in MegaHal's discourse, as a result of the huge number of sexual comments typed into it by users accessing Hutchens' Website. MegaHal's complaints about the vagueness of "to be or not to be" are as poignant as anything in Stein. "To be or not to be, that is the greatest conductor of all time" is an absurd conflation of phrases learned by the program in different contexts, but it is also as pregnant with meaning as anything else in modern poetry. The collective unconscious, and the hidden, creative part of the individual human mind, work by cross-breeding and mutating ideas in precisely this way.
MegaHal is more interesting than HeX or Converse because it engages its conversation partners in a dynamic. The dynamic is a silly one, to be sure, and not as interesting as the dynamic of ordinary, intelligent conversation. And yet playing with MegaHal is immensely more engaging than playing with HeX or Converse. What we need is something with MegaHal's spontaneity and flexibility, and the surface plausibility and adherence to form of HeX and Converse. But this is not forthcoming without a major change in approach. HeX and Converse are like all forms of rule-based AI: they are good at what they do, but hemmed in by their restrictions; and because of their restrictions, their behavior is often idiotic in real-world situations. MegaHal is more like a totally unstructured neural network (though its inner workings are statistical, they could just as well be neural-net based; the results would be similar): it is fluid and free-flowing, full of creativity, but harder to constrain to the problems of the real world. Constraining creativity within rules can only be achieved by getting the rules to emerge out of an underlying substrate of creativity. MegaHal is too unstructured for this, and has too small a memory and too weak a self-organizing learning algorithm; HeX and Converse, on the other hand, have no underlying substrate of creativity for anything to emerge out of.
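To give a flavor of the learn-and-recombine idea -- without claiming anything about MegaHal's actual internals, beyond the fact that they are statistical -- here is a toy sketch in which the program records which word has followed which, and then chains those learned pairs together into new utterances. The training lines are taken from the MegaHal transcript above; everything else is invented.

    # A toy "learn and recombine" conversationalist: a simple word-pair chain.
    import random
    from collections import defaultdict

    class Recombiner:
        def __init__(self):
            self.followers = defaultdict(list)   # word -> words seen after it

        def learn(self, utterance):
            words = utterance.split()
            for a, b in zip(words, words[1:] + ["<end>"]):
                self.followers[a].append(b)      # remember each observed pair

        def speak(self, seed):
            word, out = seed, [seed]
            while word in self.followers and len(out) < 20:
                word = random.choice(self.followers[word])   # recombine learned pairs
                if word == "<end>":
                    break
                out.append(word)
            return " ".join(out)

    bot = Recombiner()
    bot.learn("why did the chicken cross the road to get to the lavatory")
    bot.learn("on wednesdays i go shopping and i just have a lot of pent-up aggressions")
    print(bot.speak("why"))   # e.g. "why did the chicken cross the road to get to the lavatory"

Because different utterances share common words, the chains can wander from one learned phrase into another -- which is exactly what produces MegaHal's familiar-yet-nonsensical conflations.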
How long will it be before we have a computer system that can really hold a conversation, that can really pass the Turing test? You won't be surprised that I am optimistic. I believe that this is not too far off -- ten or maybe twenty years is my bold, perhaps foolish prediction. What we need is a system like MegaHal that understands a lot about grammar and about word and sentence meaning. This can be accomplished by connecting such a system to a huge database of text and having it mediate millions of human interactions with this text. The World Wide Web is a perfect substrate for breeding such systems.
Conclusion

Little of what I have said in this chapter is original. Very few contemporary researchers really believe that true intelligence will ever be programmed by methods resembling those currently popular. The modern practice of "artificial intelligence" has little to do with the design or construction of truly intelligent artifices.
All this is quite pessimistic. But nevertheless, I will end on an optimistic note. My personal opinion is that, not only is AI possible, it is possible today. In order to achieve it, we will need to go beyond both symbolic and neural network AI. We will need a new approach to computational intelligence, one that confronts the full complexity of the brain and mind. But I believe these ideas are at hand, delivered to us courtesy of complex systems science and network computing.
This is a gutsy position to take -- to criticize the AI pioneers of the past for claiming AI was just around the corner; and then to turn around and say "But really, AI is just around the corner now"! But that is exactly what I am doing. I believe that the AI pioneers were right in many of their intuitions, they were just off by a few decades. They did not have the computer power to simulate the large-scale, holistic structure of the brain/mind, and so they deluded themselves that simulating particular specialized aspects of the mind or brain would be sufficient.
So what is the solution? What then is the key to true AI? This I will outline in later chapters -- but a few more hints may be appropriate at this point. The essence of the matter, I believe, lies on the intermediate level of structure: larger-scale than neurons and their links, smaller-scale than the sterile abstractions of logical rules. In brain terms, one can think about neural modules -- clusters of tens or hundreds of thousands of neurons, each performing individual functions in an integrated way. One module might detect edges of forms in the visual field, another might contribute to the conjugation of verbs. The network of neural modules is a network of primitive mental processes rather than a network of non-psychological, low-level cells (neurons). The key lies in the way the modules are connected to each other, and the way they process information collectively.
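The following is only a structural cartoon of that idea -- the module names, the message-passing scheme, and the absence of any learning are all simplifications invented for the sketch -- but it shows the level of description intended: a network whose nodes are whole functional processes rather than single neurons.

    # A cartoon of a network of neural modules exchanging messages.
    class Module:
        def __init__(self, name, function):
            self.name, self.function = name, function
            self.inbox = []                        # messages from connected modules

        def step(self):
            output = self.function(self.inbox)     # process whatever has arrived
            self.inbox = []
            return output

    class ModuleNetwork:
        def __init__(self):
            self.modules, self.links = {}, []      # links: (source name, target name)

        def add(self, module):
            self.modules[module.name] = module

        def connect(self, src, dst):
            self.links.append((src, dst))

        def step(self):
            # Every module processes its inbox, then results flow along the links;
            # the collective pattern of activity, not any single module, does the work.
            outputs = {name: m.step() for name, m in self.modules.items()}
            for src, dst in self.links:
                self.modules[dst].inbox.append(outputs[src])
            return outputs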
This kind of middle-level, process-oriented view is slowly emerging from the neural network school of AI, as in the OCR example discussed above. And on a more theoretical level, it can be perceived in the work of a number of theorists, from both sides of the AI divide. For instance, Gerald Edelman, a Nobel Prize-winning biologist, has proposed a theory called "neuronal group selection" or Neural Darwinism, which describes how the brain constructs larger-scale networks called "maps" out of neural modules, and selects between these maps in an evolutionary manner, in order to find maps of optimum performance. And Marvin Minsky, the champion of symbolic AI, has moved in an oddly similar direction in recent years, proposing a "Society of Mind" theory in which the mind is viewed as a kind of society of agents or processes, which send messages to each other and form themselves into temporary working groups.
Minsky's and Edelman's ideas have little in common, on a detailed level. Edelman thinks symbolic AI is claptrap of the worst possible kind; Minsky still upholds the symbolic paradigm, though he now admits that it may sometimes be productive to model the individual "agents" or "processes" of the mind using neural nets. But even so, as will be elaborated in later chapters, the Society of Mind theory and the Neural Darwinism approach are both indicative of a shift toward a new, and more correct, view of the structure of intelligence. What they share is a focus on the intermediate level of process dynamics. They are both looking above neurons and below rigid rational rules, and trying to find the essence of mind in the interactions of large numbers of middle-level psychological processes. This is, I believe, the correct perspective. What I have tried to do in my own work is to take this train of thought one step further, and answer the questions it raises.
The symbolic approach, for its part, oversimplifies the mind by portraying it as unrealistically orderly, as implausibly dependent upon logical reasoning, with little or no chaotic, deeply trial-and-error-based self-organization. It totally misses the point about logic and rules: no particular system of rules is all that important to mind; the crucial thing is the ability to conceive new systems of rules to match new kinds of situations.
In fact, the brain is more than a network of neurons connected according to simple patterns, and the mind is more than an assemblage of clever algorithms or logical transformation rules. The history of AI is, in large part, a history of attempts to ignore this fact, for the sake of getting interesting semi-intelligent behavior out of insufficiently powerful computers. A lot has been learned through the process, but what has not been learned is how to build a thinking machine. Rather, we have mostly learned what intelligence is not. Intelligence is not following prescribed deductive or heuristic rules, like Deep Blue or EURISKO; intelligence is not the adaptation of synapses in response to environmental feedback, as in backpropagation neural nets. Intelligence involves these things, but what intelligence really is, is something different: the self-organization and mutual intercreation of a network of processes, embodying perception, action, memory and reasoning in a unified way, and guiding an autonomous system in its interactions with a rich, flexible environment.
By Ben Goertzel
Copyright 1997 Ben Goertzel, All Rights Reserved
This is a chapter from a book manuscript entitled
"World Wide Brain", expected to appear in late 1998