Ernest W Kent, Associate Professor
Dept of Psychology
The University of Illinois at Chicago Circle
Chicago IL 60680
The idea of a machine that thinks like a man has always fascinated us. If we talk about substantial improvements in processor design, it is relegated to a few technical journals, but if we talk about robots, or write stories about computers with personalities, or make movies about HAL, everyone gets interested, and computer buffs stay up all night trying to figure out how to make their machines behave like that. An artificial "intelligence," in the sense in which we apply that term to our own thought processes, has been a recurring theme dating back to antiquity, despite the fact that no such machine has ever existed. I don't know why this is so; perhaps it springs from a desire to understand our own thoughts, or perhaps the human race is just lonely. We keep hoping for dolphins and Martians to start talking to us too. Whatever the reason, the idea is a potent one, and an enduring source of interest.
Why then don't we build computers that think like men? Well, people have certainly tried, and some very elegant software has been created towards that end, but the results have always been frustratingly limited. You have all heard the old adage that a computer to equal the human brain would require a machine the size of the Empire State Building with the electrical output of Niagara Falls to power it, but that was always just an excuse. Aside from the fact that whoever coined that notion didn't have any idea of what a brain-like computer would require, the fact is that we would have built it anyway, if we'd known how. Moreover, when that guess was made, we were making computers out of 12AX7s, and it's a long way from the 12AX7 to the Z-80. We would certainly build that machine today, if we knew how. What then is the problem? Our big fast machines can do elegant and complex mathematics, even proofs, far beyond our own powers, yet we have the greatest difficulty making them display even the slightest degree of common sense. The System 370 can do amazing things with numbers, but it wouldn't have the intelligence to duck if you swung a club at it. A frog could do better at dealing with its environment.
I assure you that the problem has nothing to do with any mysterious properties of the brain. If we have learned anything about the brain, it is that it is a machine. A complex machine to be sure, but a machine nonetheless. This notion frequently upsets people; they are bothered by the suggestion that they might be "only a machine." I think that this is the wrong interpretation. The statement that the brain is a machine does not mean that it is "only" a machine in the sense of the simple machines of limited ability that we have produced. Rather, it is a statement that extends our concept of what a machine can do. It does not denigrate what the brain can do. Of course, it raises the possibility that we can design a machine that thinks like a man, with all of the attendant problems of philosophy and theology that that raises.
One could take refuge in the notion that brain and mind may somehow be different, but the evidence is that when the brain is manipulated experimentally, all of our mental processes, our sensations and perceptions, our feelings and emotions, our intellectual processes, our memories, even our states of consciousness are manipulated too. The evidence suggests that this manipulation is predictable and occurs in the same fashion in all individuals.
Why are we having so much trouble making our computers behave in brain-like fashions if both are machines and both information processing machines at that? I think the answer is that we are trying to make a wrench do a screwdriver's job. What I mean is that while all information processing machines may theoretically be capable of imitating all others, Turing didn't say that all had to be built to handle all problems with equal ease. In point of fact, the brain's architecture is quite different from that of computers as we customarily build them, and different in ways that are very instructive with regard to the problem of building a "thinking machine."
The brain and the computer have both developed in an evolutionary manner, with "survival of the fittest" determining what features were retained and what were discarded. The differences in their designs arise from the fact that nature and computer engineers have different notions of what constitutes "fittest." There are two aspects to this difference: the nature of the problems the machine is required to solve, and the nature of the hardware available to build the machine. The successful brains, the ones whose genes contributed to the next generation, were the ones that had good designs for solving problems like recognizing and avoiding dinosaurs, and recognizing and catching frogs. Ability at higher mathematics was never a very important criterion in determining successful brain design, and our poor brains get quickly strained when they're required to do much of it. Computer design was judged from the beginning in terms of its success at mathematical function, and as a result they are very good at it, but very bad at catching frogs.
The kind of hardware available to computer engineers and to organic evolution was also different, and in part determined the differences in architecture of successful brains and computers. Although the logic gate and the neuron have a great deal in common as we shall see, some of their differences turned out to have far-reaching consequences. The brain never had speed on its side (neurons operate in milliseconds, not nanoseconds), but it never lacked for quantity. (You want million bit bytes and ten thousand legged gates? Sure, how many trillion?) The computer engineers on the other hand were limited in quantity by the expense and difficulty of assembly of their components, which dictated designs that were hardware conservative. This was compensated by using the speed of electronic components to substitute for quantity. Thus, our computers emphasize small bytes and few registers, but achieve high data throughput with iterative reuse of these components at great speeds. Bus oriented design and other hardware conservative adaptations arise from these same considerations.
In contrast, parallel multiprocessor designs with hierarchical organization, which the brain uses with wild abandon, are only seen in the most primitive form in our current computers and computer networks. The brain also has no compunctions about freely mixing digital and analog computing elements, using each to best advantage where needed. Thus, the brain and the computer have each found a design best suited to the problems they are required to solve and the hardware available, albeit big brains can do some computer-like functions, poorly; and big computers can do some brain-like functions, poorly.
Curiously enough, both the brain and the computer seem to have settled on a single basic organization whether the device is large or small. We all know how a computer works in principle, although different machines differ in detail. The processor, the bus, the clock, memory, the IO interface, all are arranged according to the same basic plan in large machines and small. Similarly, the brain of the white rat in my laboratory is of the same design, basically, as the brain reading this page. All the same parts are included in both, and hooked up in the same way. The differences are in capacity and relative development of the parts. Brains are even more similar to one another than computers.
What I am suggesting is that the computer has developed an architecture that is optimized for logical and mathematical problems, but that that is not an optimal architecture if the problem is to display basic common sense, whether it can do Fourier transforms or not. Thus, our machines as presently configured are terribly inefficient at the kinds of problems brains solve easily, and prodigious feats of programming, vast amounts of memory and all the speed that can be mustered give us only the most trivial results.
I am not going to suggest that we try to build a hardware replica of the brain. We don't have the hardware or the knowledge yet. I would like to suggest however that if we are interested in approaches to a science of robotics, it would be very instructive to examine the only model of an intelligent machine that is available to us, and to try to identify principles of operation that could be put to good use with the kinds of hardware that we do have. This is a particularly appropriate time for such an exercise, since the hardware revolution has freed us to a degree from some of the original constraints upon computer design, and the growth of the computer hobby community has provided us with a group of eager experimenters who don't have a large investment in standard approaches, or a requirement to produce commercially useful business machines. Finally, our understanding of the operation of the brain has undergone something of a revolution in the last ten years, and we are in a much better position to discern the outlines of its architecture than we were when the computer was born. The time is right, the means are at hand, all we need are the geniuses in the basement workshops.
I am not going to tell you how to build a machine to work like a brain, because I don't know how. What I propose to do is to tell you, in terms of computer concepts, how a brain works. You supply the ideas from there. It is generally very difficult to explain brain operation to the layman, because he doesn't have the necessary concepts readily available. I have found, however, that it is very easy to explain brain operation to computer people because it can be translated into terms and concepts with which they are already familiar. If you can understand digital and analog electronics, you can understand in principle, if not in detail, how your brain works. You don't have to understand neurophysiology, neuroanatomy, neuropharmacology, and physiological psychology. (Well, maybe a little bit, but I'll try to keep it painless.)
First, you have to get a picture of the basic unit of brain structure. This is called a neuron. It is a cell like all the others in your body, but it is specialized for information processing. You can think of it as performing much the same function as a logical gate in a digital machine, or an operational amplifier in an analog machine. In fact, it is a very versatile device and can do either or both jobs. The brain uses neurons, billions and billions of them, to do everything it does. Let's look at the diagram of a neuron in figure 1. The output of the neuron appears on the long thin part labeled axon. Think of the axon as a wire. The brain uses them for transmitting information over distance. The difference is that this wire transmits only pulse streams, not DC levels. It's a digital wire. All the pulses are always the same height and duration, and can be thought of as binary bits. Only ones and zeros are allowed. Now look at the part labeled axon hillock. That's like a Schmitt trigger. It puts a pulse on the axon whenever the analog voltage in the part labeled cell body crosses a preset threshold value. Whenever this occurs, the voltage in the cell body is reset to the baseline or initial value. This cell body voltage is raised toward the Schmitt trigger's threshold, or dropped away from it, by the action of pulses impinging on the cell body from the axons of other neurons. The place where an axon meets a cell body is called a synapse, and it only transmits in one direction. Think of it as a diode. Now a synapse may be positive or negative. That is, a pulse at a given synapse may either add to or subtract from the cell body voltage. This is a function of which synapse receives the input, not the nature of the pulse (just like the little inversion circles on logic gate inputs). Remember however that the voltage in the cell body is an analog voltage and is performing an algebraic sum of the inputs. Moreover, the inputs may have different weights. 
The ones furthest from the axon hillock have the least effect, the ones nearest to it have the greatest effect. Weighting may be reduced greatly by placing the input far out on an extension of the cell body called a dendrite. The effect of an input outlasts the pulse that produces it, so that inputs which do not arrive synchronously in time may still sum within brief time limits. Think of it as a pulse stretcher at each input coupled with a time constant in the cell body. The effect of the finite time constant is to give each input pulse a temporal weighting (ie: the more recently arrived pulses have a greater weight in the sum).
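The description above can be put in the form of a short program. What follows is only a minimal sketch in Python, not a physiological model: the class name Neuron, the step method, and the numerical values for threshold and decay are assumptions chosen for illustration. Time is divided into discrete steps, and the "pulse stretcher" and cell body time constant are lumped into a single decay factor.

```python
from dataclasses import dataclass


@dataclass
class Neuron:
    weights: list             # positive = excitatory synapse, negative = inhibitory
    threshold: float = 1.0    # firing level at the axon hillock (illustrative value)
    decay: float = 0.5        # fraction of cell body voltage kept per step (illustrative)
    voltage: float = 0.0      # analog cell body voltage; baseline is 0

    def step(self, pulses):
        """Advance one time step; pulses[i] is 1 if input i carries a pulse."""
        # Earlier contributions leak away, so recent pulses weigh more.
        self.voltage *= self.decay
        # Signed, weighted (algebraic) sum of the pulses arriving now.
        self.voltage += sum(w * p for w, p in zip(self.weights, pulses))
        if self.voltage >= self.threshold:
            self.voltage = 0.0    # reset to baseline after firing
            return 1              # one pulse placed on the axon
        return 0


n = Neuron(weights=[0.75, 0.75, -0.75])
print([n.step(p) for p in [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1]]])
# prints [0, 1, 0, 0]
```

In this run, two excitatory pulses arriving one step apart sum past threshold and fire the cell, while a later excitatory pulse that coincides with an inhibitory one is cancelled.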
When the inputs to the cell body have summed past threshold at the axon hillock, and a pulse has been placed on the axon as a result, we say that the neuron has "fired." An equivalent circuit (for our purposes) of the basic neuron is shown in figure 2. If you study figures 1 and 2 for a moment, you will see that the neuron has digital inputs, which are converted to analog values and operated on algebraically in an analog fashion. The result is then converted back to digital form for transmission. Now this is a very powerful tool. It can act as an AND gate (coincidence of several equally weighted inputs required to reach firing threshold), an OR gate (any of several inputs can drive the cell past firing threshold), or any of a variety of other functions (NAND and NOR can be achieved by cells with a resting potential above firing threshold, and inhibitory inputs). When you consider that one or more of the inputs to the cell can be feedback from its own axon, and that an average neuron may have ten thousand inputs, you begin to appreciate the possibilities. It can integrate and differentiate and do other useful functions by virtue of feedback and analog operation.
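The gate behavior just described can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names, the weights, and the resting levels are invented for illustration, and a single instant is considered so that the time constant plays no part.

```python
def fires(inputs, weights, threshold=1.0, resting=0.0):
    """Return 1 if the resting level plus the weighted input sum reaches threshold."""
    total = resting + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and_gate(a, b):
    # Two equally weighted inputs; both are needed to reach threshold.
    return fires([a, b], [0.5, 0.5])


def or_gate(a, b):
    # Either input alone can drive the cell past threshold.
    return fires([a, b], [1.0, 1.0])


def nor_gate(a, b):
    # Resting level above threshold; any inhibitory input silences the cell.
    return fires([a, b], [-1.0, -1.0], resting=1.5)


def nand_gate(a, b):
    # Resting level above threshold; both inhibitory inputs are needed to silence it.
    return fires([a, b], [-0.5, -0.5], resting=1.75)
```

The same summing element yields all four functions; only the weights and the resting level change.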
No one would try to design a very large system out of gates this complex unless someone finds a way to make them in quantity on a chip. Consideration of their operation, however, reveals some important features of brain architecture. First, although neurons can be synchronized by simultaneously driving them with an overriding input, they are normally asynchronous devices. The system can use local "clocks" where they are useful, but it doesn't have a system clock.
This permits an interesting development. In synchronous systems, information can only be coded in terms of which lines are active. If you want to indicate on or off status of some condition, you put a zero or one on the appropriate line. But if you want to indicate a numerical value greater than one, you have to do it by coding it as the activity, or lack of it, simultaneously on several lines. For example, we represent numbers by the bit pattern in a byte on the data bus. Now the brain can, and in several instances does, function in this mode. We refer to it as "place coding" because the information is contained in the location of the incoming signal. However, because it does not have to maintain synchronization, the brain can also encode information in terms of the frequency of arrival of pulses on the axon. We refer to this, obviously enough, as "frequency coding." Examination of the model circuit (figure 2) for the neuron reveals that the greater the positive input drive to the cell body, the more quickly the analog voltage will reach threshold, and the more quickly a new pulse will be placed on the axon after the reset following the preceding pulse. This means that it is very easy to use frequency coding to indicate the magnitude of the summed input activity. Due to the "pulse stretchers" at the inputs to the neuron, and the time constant of the cell body, pulses that are sequential on the axon can sum with one another to produce a greater analog voltage. This voltage is proportional to the frequency of the incoming pulses, and this achieves decoding of frequency coded input back to the analog mode.
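Frequency coding and its decoding can be illustrated with a small simulation. The sketch below is an assumption-laden simplification: the function names encode and decode, the discrete time steps, and the constants are all illustrative, and the leak in the sending cell is ignored so that the arithmetic stays easy to follow.

```python
def encode(drive, steps=100, threshold=1.0):
    """Turn a constant positive input drive into a pulse train (list of 0s and 1s)."""
    voltage, train = 0.0, []
    for _ in range(steps):
        voltage += drive              # a stronger drive reaches threshold sooner
        if voltage >= threshold:
            voltage = 0.0             # reset after each pulse
            train.append(1)
        else:
            train.append(0)
    return train


def decode(train, decay=0.9):
    """Leaky sum of incoming pulses: pulse stretcher plus cell body time constant."""
    voltage = 0.0
    for pulse in train:
        voltage = voltage * decay + pulse
    return voltage


print(sum(encode(0.25)), sum(encode(0.5)))   # prints 25 50
print(decode(encode(0.5)) > decode(encode(0.25)))   # prints True
```

Doubling the drive doubles the number of pulses in the same interval, and the receiving cell's leaky sum settles at a higher analog level for the faster train.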
Interaction of inputs from different axons is termed "spatial summation," and interaction of sequential pulses on the same axon is termed "temporal summation." One of the interesting things the brain can do with this capability is to use both place and frequency coding simultaneously on the same line. Another advantage is the ability to represent very large numerical quantities with what might be called a "temporal byte," or integration over a brief time period, of a single input line. As an example, consider the way the brain encodes sensory information from your skin. The type of sensation (ie: heat, cold, pressure, etc) as well as the location of the sensation is place coded. That is, what you feel is a function of which line is active. The magnitude of the sensation, how much you feel, is frequency coded on the same line. Thus, the brain structure receiving the information can determine the type and location of the stimulation with a "spatial byte," (place code) which determines the set of active lines, and the intensity of the stimulation with a "temporal byte" of frequency code.
The brain's basic "byte," therefore, has two dimensions, a spatial dimension and a temporal dimension. Two independent sets of information may be encoded in these two dimensions, and they may then interact in the receiving structure in a way determined by that structure. Notice that the spatial aspect of the byte is essentially digital information, and that the temporal aspect of the byte is essentially analog information, although it is encoded in the frequency of digital pulses.
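One way to picture this two-dimensional "byte" is as a set of labeled lines, each carrying a pulse train over a brief window. The sketch below is hypothetical throughout: the line labels, the function name read_byte, and the window length are invented for illustration and do not correspond to any actual set of nerve fibers.

```python
# Hypothetical labeled lines: (type of sensation, location on the body).
LINES = [("heat", "hand"), ("cold", "hand"), ("pressure", "hand"), ("pressure", "foot")]


def read_byte(trains):
    """trains[i] is the 0/1 pulse train seen on LINES[i] during a brief window."""
    report = {}
    for label, train in zip(LINES, trains):
        rate = sum(train) / len(train)   # temporal dimension: pulses per time step
        if rate > 0:                     # spatial dimension: which line is active
            report[label] = rate
    return report


window = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
print(read_byte(window))
# prints {('pressure', 'hand'): 0.5}
```

Which key appears in the report is the place code; the number attached to it is the frequency code.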
The mathematical treatment of this is an information theorist's nightmare, although it can be done, but from a practical standpoint there are some clear advantages. Digital information output from a structure which determines the type of action to be taken and analog information output from another structure which determines intensity of action required may "gate" one another in a third location to produce an output stream which simultaneously defines the nature and magnitude of the action taken. One could think of it as specifying the enabling of a set of switches with an appropriately coded digital byte while presenting a set of analog values to the switched lines. We do see this sort of thing of course in some electronic IO applications, but the brain makes use of this, and much more complicated interactions, in its internal processing. It may use the information from one aspect of the byte to determine the nature or extent of the operation to be performed on the information in the other aspect of the byte.
Two additional properties of the neuron need to be mentioned to complete our understanding of the basic gate. The first is a different type of inhibitory input. The "negative synapse" described earlier puts an inhibitory input into the cell body to act on the analog sum there and retard the achievement of firing threshold at the axon hillock. This action of course simply antagonizes (with a specified weight) the action of all the positive inputs. Clearly, it does this without regard to which input it is antagonizing. It is also possible to have a "negative synapse" which antagonizes only a specific synaptic input. This is called "presynaptic inhibition," because it may be thought of as a disable input to one of the input one shots.
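The difference between the two kinds of inhibition is easy to see in a sketch. The function below is illustrative only; its name and parameters are assumptions, and it computes the cell body sum for a single instant.

```python
def cell_body_sum(pulses, weights, postsynaptic_inhibition=0.0, presynaptic_block=()):
    """Weighted sum at the cell body for one instant.

    postsynaptic_inhibition: subtracted from the total, whichever inputs are active.
    presynaptic_block: indices of inputs disabled before they reach the sum.
    """
    total = 0.0
    for i, (p, w) in enumerate(zip(pulses, weights)):
        if i in presynaptic_block:
            continue              # this one input is disabled; the others are untouched
        total += p * w
    return total - postsynaptic_inhibition


weights = [0.5, 0.25]
# Input 1 is silent, so blocking it changes nothing...
print(cell_body_sum([1, 0], weights, presynaptic_block=(1,)))        # prints 0.5
# ...whereas inhibition at the cell body still opposes input 0.
print(cell_body_sum([1, 0], weights, postsynaptic_inhibition=0.25))  # prints 0.25
```

Presynaptic inhibition acts only when its particular target input is active; inhibition at the cell body opposes whatever excitation happens to be present.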
The final desirable property of the neuron as a computer element is that speed of transmission of pulses down the axon may vary over a wide range (although it is always the same in any given axon). This means that we may use high speed axons to move data quickly, but low speed axons may be employed as delay lines. Since axons can have branches coming off at any point, we may have tapped delay lines. We shall see some stunning examples of the utility of this feature.
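A slow axon with branches behaves much like a shift register with taps. The following sketch assumes discrete time steps and a hypothetical class name, TappedDelayLine; the tap positions are given as delays measured in steps.

```python
from collections import deque


class TappedDelayLine:
    def __init__(self, length, taps):
        # The slow axon is modeled as a fixed-length shift register.
        self.line = deque([0] * length, maxlen=length)
        self.taps = taps          # branch points, expressed as delays in steps

    def step(self, pulse):
        """Push one pulse (0 or 1) onto the line; return the value at each tap."""
        self.line.appendleft(pulse)   # newest pulse at index 0, oldest falls off the end
        return [self.line[t] for t in self.taps]


d = TappedDelayLine(length=4, taps=[1, 3])
print([d.step(p) for p in [1, 0, 0, 0]])
# prints [[0, 0], [1, 0], [0, 0], [0, 1]]
```

A single pulse entering the line appears at the first branch one step later and at the second branch three steps later.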
It should be apparent by now that the basic neuron is an enormously powerful tool. In practice, few situations call for all of the complexity of this device, and it is frequently seen acting as simply a switch or AND gate or other very domestic sort of creature. Indeed, in many situations, neurons take on a variety of specialized shapes and connections which optimize them for one or another function, to the exclusion of others. In all cases however, their operation may be understood in terms of the basic design we have discussed.
Now that we have some terms for the basic elements, let us take a leap to the other end of the size spectrum and examine the overall structure of the brain. The exact anatomy is actually of little relevance for our purposes, but it may help to have a visual image of the device as we discuss the features of its parts. Figure 3 shows the general appearance of the human brain, together with some of its internal structure. Figure 4 shows the general organization of the parts as they would appear if the brain were taken out of the body, unfolded, and flattened out in a neat plan view.
The functional structures of the brain may be generally divided into two categories, fiber tracts and nuclei (plural of nucleus). Fiber tracts are simply bundles of axons going from somewhere to somewhere else, the cabling and wiring of the brain. The nuclei are groups of cell bodies. Each nucleus may be thought of as analogous to a central processor with a dedicated function, and (in most cases) a hardwired or ROM program. Most of the nuclei are irregular blobs of cell bodies, but in some cases the cell bodies are arranged in layers and the layers form a folded sheet of cells. In this case it is called a cortex rather than a nucleus, but the idea is the same. (The most famous of course is the cerebral cortex, of which humans are very proud because it is better developed in man than in most species.) The cells in the nuclei may be divided into two types: local neurons, whose function is in the data processing internal to the nucleus, and whose axons do not leave the nucleus, and output neurons which give rise to the axons that make up the fiber tracts and communicate with other nuclei. There are thousands of nuclei, of all levels of size and sophistication. Unfortunately, there is very little system to their names, and the names are either in Latin, or unpronounceable (Nucleus of Darkeschwitz, etc). The fiber tracts are bad too (Habenulointerpeduncular tract).
The only rational thing to do with the names of neuroanatomy is to endure them or ignore them. We shall try to ignore them. It will help however if you will take the time to learn the names of a few of the major divisions of the brain which are shown in figure 4 and to remember their basic relation to one another. The most important items, from bottom to top, are: the spinal cord, the medulla, the pons, the cerebellum, the mesencephalon, the diencephalon (and its two major subdivisions, the thalamus and the hypothalamus), the limbic system, the striatum, and the cerebral cortex.
This bottom to top sequence corresponds in a general way to a sequence of increasingly more global levels of control, from most detailed and specific, to most general and abstract. It also corresponds roughly to the evolutionary sequence from oldest and most primitive to most recent and advanced. All of the apparatus shown here is present by the time the evolutionary level of the mammals is reached.
The basic architecture of the system is hierarchical. Each of the major functions of the system is partially organized at each level of the system, rather than particular structures being devoted to particular major functions. At the lowest levels, there are a multitude of relatively simple processing elements doing similar jobs, and at the higher levels there are a few very complex and powerful processing elements defining system goals and priorities, and organizing the activities of the lower levels to achieve them.
On the input side, the lowest levels gather raw data which is then progressively abstracted, sorted and refined at each stage according to general guidelines which may be hardwired or provided by higher levels. The highest levels then receive abstract symbolic information about the general state of the environment rather than details. ("There is a black cat there" as opposed to "The following points of the visual field are dark.") Similarly, output functions begin at the highest levels, which determine general goals and strategies and transmit these in the form of statements about more limited momentary objectives to lower levels, which in turn send information about desired actions and timing to the lowest levels for execution.
Thus, at each level there are a number of relatively independent processing elements pursuing their own jobs in parallel real time, while trading information with echelons above and below, and laterally with one another. It follows that it doesn't make sense to ask where in the brain any large scale function is processed. Different aspects of it will be handled in different portions of functional subsystems which are represented at all major levels of the physical system. It might sound hopeless to try to follow the operation of such a device, but in practice there is order, not chaos. At the lower levels where semi-independent processors are most numerous, there is least diversity among them. The organization is in many ways like a military command chain, and one doesn't have to study each lieutenant and platoon individually to comprehend the principle. A general plan of this type of organization is shown in figure 5.
Three features of such an organization are immediately relevant to a machine that must deal with the real world environment and do it in real time. These are: hierarchical decision making capability, parallel processing of IO data, and "fail safe" backup function. With regard to the first, the vast majority of the decisions that have to be made are trivial and can be handled at lower levels without taking up time that the higher levels can spend working on more complex problems. In computer science we already see the beginnings of this concept in "intelligent terminals," and in some cases two or three levels of preliminary IO processing buffering a big machine. This sort of operation need not be limited to big installations, however. With the appearance of cheap microprocessors, it should now be possible to have, for example, a devoted processor driving each joint of a robot limb by computing required forces on the basis of positions and velocities requested by higher levels and resistive forces and other local perturbations which constitute its input from below. The brain provides such a processor for each fiber of each muscle, and it will be instructive to examine their approaches to the problems involved.
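To make the idea of a devoted joint processor concrete, here is a minimal sketch of what one might compute on each cycle. The text above does not specify a control law; the proportional-plus-derivative rule used here is an assumption swapped in for illustration, and the function name joint_force and the gain values kp and kd are arbitrary.

```python
def joint_force(target_pos, target_vel, pos, vel, resistance, kp=10.0, kd=2.0):
    """Force command for one joint.

    target_pos, target_vel: requests handed down from a higher level.
    pos, vel, resistance:   local measurements arriving from below.
    kp, kd:                 illustrative gains (assumed, not from the article).
    """
    # Assumed PD rule: push in proportion to the position and velocity errors,
    # then add enough force to cancel the measured resistance.
    return kp * (target_pos - pos) + kd * (target_vel - vel) + resistance


print(joint_force(1.0, 0.0, 0.5, 0.0, 1.5))
# prints 6.5
```

The higher level issues only the request; the local processor handles the arithmetic and the disturbances without further reference upward.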
The second feature of the brain's architecture, parallel IO processing, provides speed sufficient to deal with a complex world in real time, even in the face of slow components. By breaking down the tasks into small parts that can be handled simultaneously by numerous simple processing elements, the time required for the task is not greater than the time required for the execution of one of its components. This is a fairly obvious statement, of course, but its real importance is apparent when we examine the differences between the types of problems typically faced by a brain operating on the real world, and the types of problems traditional computers are designed to handle. In the solution of mathematical and logical problems, it frequently doesn't make sense to attempt simultaneous solutions of parts of the problem. The results of one operation are essential to the beginning of the next. Such problems are inherently serial in nature. The brain, being unable to apply the power of its parallel organization to such problems, becomes terribly slow. On the other hand, the two tasks requiring the greatest feats of processing from the brain in the normal conduct of business, analysis of the flood of information impinging on the sense organs, and the design and execution of movements in space, lend themselves perfectly to parallel processing in small subunits. Here the brain can vastly outperform our typical current computers which have only one, or at most a few, processing units capable of simultaneous operation. For example, all of the data in the visual field is available simultaneously on the surface of the retina. Rather than dealing with it point by point, the brain sucks it all in at once in one enormous byte and sets to work on the analysis of many small areas of the visual field simultaneously. (We shall examine its algorithms in detail later.)
Even with the small cheap processors available to us now, we could obviously never afford to match the brain in quantity. However, we don't have to go to the other extreme and try to do it point by point serially with a single very fast processor as has been typically attempted. The job is just too large for even the fastest machine to do this way, and there are certain advantages as well in terms of the feature extraction process to having a basically parallel system. On the other hand, we do have a speed advantage and it certainly should be possible to simulate the operation of a number of the brain's processors with only one of ours in the same time frame. (There will be some increase in complexity where the results of neighboring units are interactive.) Just how to optimize this sort of tradeoff is, of course, a matter for much study. A first step which we shall take here will be to examine some of the tricks and shortcuts in the feature extraction process that the brain itself uses to save time.
The third system characteristic which results from the brain's hierarchical organization is high survival value. We will learn nothing about Asimov's first two "laws" of robotics (the protection of human beings) by studying the brain. The third law, ensuring the survival of the robot, has always been a major concern of brain architecture. It is annoying, but not usually fatal when the big machine in the computer center develops a fault. When it happens to a brain, or in the situations we will send them into, a robot, the whole device may be destroyed. The redundancy inherent in the brain's basic structure is, of course, valuable in this regard, but there is more to it than that. Recall two facts about the brain: There is an evolutionary order of development to its structure, and the major functions have representation at all levels. These two facts are related. Whereas our computers have never been expected to incorporate pieces of earlier models, the brain in the present form contains most of the parts of its earlier forms. The simplest early brains obviously had to be capable, in their own inelegant way, of getting the organism around in the environment and surviving. During the course of evolution, more complex structures capable of more sophisticated handling of the same basic functions became available. Rather than eliminating the older structures and duplicating their functions, the newer ones simply took control of the older and used them as subprocessors. A fairly general principle of organization evolved in which the higher level structures control the lower not by turning them on when needed, but by inhibiting their actions except as desired. The beauty of this system is that if a higher center is suddenly damaged, the older, more primitive units which it normally holds in inhibition are released to function on their own. Thus, damage tends not to eliminate vital functions, but only to downgrade the complexity with which the job can be performed. 
This is especially true of functions such as defense. The typical result of damage to higher brain centers is a "nasty" animal, ie: one which can adequately fight, but which fails to make fine discriminations about the appropriate stimulus conditions for doing so, and which defaults in the safe direction by attacking any strong stimulus source. Of course this kind of thing has its limits, and this is particularly true of the most highly developed brains where some of this type of organization is sacrificed in order to give the highest centers direct access to the lowest for feedforward in the control of complex operations.
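The principle of control by inhibition, and the release of the lower center when the higher one is damaged, can be sketched in a few lines. Everything in this example is hypothetical: the function name respond, the threshold of 0.5, and the reduction of the higher center's judgment to a single is_threat flag are simplifications for illustration.

```python
def respond(stimulus_strength, is_threat, higher_center_intact=True):
    """Return True if the (hypothetical) defense routine is allowed to run."""
    # Lower center: a crude rule that would attack any strong stimulus.
    lower_wants_attack = stimulus_strength > 0.5
    if higher_center_intact:
        # The higher center holds the lower one in inhibition
        # except when the stimulus really is a threat.
        inhibited = not is_threat
    else:
        # Damage removes the inhibition; the primitive rule runs unchecked.
        inhibited = False
    return lower_wants_attack and not inhibited


print(respond(0.9, is_threat=False))                              # prints False
print(respond(0.9, is_threat=False, higher_center_intact=False))  # prints True
```

With the higher center removed, the defense function is not lost; it simply stops discriminating and defaults to attacking any strong stimulus.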
In the case of damage to lower centers, the multitude of processing elements available allows some of the higher levels to be reprogrammed to take over the functions of lower level systems by simulating their operation. The process takes a little time to organize, but it can be quite effective if the organism can survive for a few weeks while reorganization takes place.
While it is apparent that it is not possible to give a definite answer to the question of where a function of any complexity is performed in the brain, it may be useful (for orientation to the device) to identify some of the anatomical divisions of the brain shown in figures 3 and 4 with some of the functions which have important representation at those levels.
The lowest level of the central nervous system, the spinal cord, is a major route of input and output to the rest of the brain. With the exception of a few special cases, most of the sensory input from the body and most of the output to the muscles passes through this structure. Although it contains immense fiber tracts, it should not be considered merely a cable. The nuclei of the spinal cord perform many important functions as intelligent terminals on both the input and output sides. Moreover, some simple actions are processed entirely at the level of the spinal cord from input to output. Everyone has seen an example of such a "spinal reflex" in the knee jerk produced by their doctor's rubber hammer.
The medulla and pons are also importantly involved in "intelligent terminal" types of IO activities, but at the so-called supra-segmental level of control. That is, these structures are frequently concerned with coordinating the activities of the IO routines in the cord so as to direct activities that involve the entire body rather than individual segments such as a single limb. For example, the pattern of motor activity involved in walking requires the coordination of the whole body to maintain balance as the center of gravity shifts, etc (such things as the decision to walk, and the choice of direction are in the province of higher centers). The medulla and pons also have important relations to a number of the special senses such as hearing and balance which are not represented throughout the body. Some of this sensory information is utilized immediately as input to supra-segmental reflexes, and some is processed for output to higher centers.
A complex physiological organism requires a great deal of regulation of its internal environment; the temperature must be exact, the heart rate must be regulated, inhalation must be controlled. These "housekeeping" routines also have representation at this level of the brain.
The mesencephalon is in many ways similar to the pons and medulla in its functions. In general, there is a transition from higher to lower degrees of abstraction in these IO systems as one progresses from the mesencephalon down to the medulla. The mesencephalon is in addition one end of a system, originating in the limbic system of the forebrain, which is important in regulating the type and intensity of the high level processing performed by the more advanced structures of the forebrain. There are two systems of the brainstem (medulla, pons, and mesencephalon) which may be mentioned in this context. The first is a system known as the reticular formation which has important functions in the brain analogous to the vectored interrupt system of the computer. It continually monitors the input of all the sensory systems, more for quantity of activity than for detailed analysis, and controls the degree of activation of various portions of the higher centers on this basis. Thus, it can immediately arouse the brain when a novel or intense stimulus is encountered, and jump the whole system to a stage of attention to, and analysis of, the important event. It also seems to exercise similar functions on the output side of operations.
The second "system" of the brainstem might be called the "amine" system, because it consists of a set of interacting nuclei which use various compounds of the chemical class known as "amines" in their operation. This system has a great deal to do with the mode of operation of the rest of the brain. Like the reticular formation, with which its activities are integrated, it sends its axons into all parts of the brain to make synaptic contacts with large groups of neurons in the forebrain and the brainstem and cord. Since the axons of these few neurons make millions of synaptic connections with vast numbers of neurons, we might expect that their function was a regulatory bias rather than the transmission of very specific information, and this appears to be the case. These nuclei are involved in such functions as regulation of waking, sleeping, and dreaming states, and apparently other "altered states of consciousness" since the hallucinatory drugs such as LSD are thought to exert their major effects here. Problems in some of these amine systems appear to underlie such abnormal operational modes as schizophrenia. Some portions of the amine system also regulate the intensity and selection of the various detailed patterns of activity generated by higher structures. Thus, in Parkinson's disease, which involves a disorder of part of this system, the ability to execute voluntary activities is impaired, although the conception of them is not. The systems of the brainstem thus exert a very major control over the general types of activity in which the higher levels of the device engage. A very interesting development here is the closing of this loop by return projections from the highest levels of the brain, the cortex, which allows the machine to gain control over its own status. This loop is fundamental to the phenomenon of consciousness.
Lying above the pons is the structure known as the cerebellum. This device is a subprocessor for some types of motor output. It is basically involved in the parallel to serial conversion of output that is not to be continuously modified by feedback control to the higher levels which specify its input. It can accept a parallel byte which defines an action to be undertaken, modify it to incorporate the current status of many variables of limb position, loading, etc, and convert the instruction to a series of sequenced operations with specified durations. Damage to the cerebellum has no effect on conscious processes, but seriously impairs the performance of muscular actions.
Above the mesencephalon, we encounter the diencephalon. The two major subdivisions, thalamus and hypothalamus, are quite different. The hypothalamus is heavily involved in the sort of housekeeping functions mentioned earlier. It controls the secretions of the endocrine system, for example, and is involved in a wide range of functions such as temperature regulation. In its role as chief executive of internal operations it, of course, must continually monitor internal conditions. These internal conditions in turn are frequently the ultimate sources of the whole brain's functional orientation. Thus, if the hypothalamus detects a low level of sugar in the blood, it initiates a state which we experience as hunger. This state represents a reorganization of the brain's systems for the control of goal directed behavior which causes the organism to engage in food seeking behavior. The hypothalamus thus contains part of the brain's analog of a computer's priority interrupt system. On the output side, the hypothalamus is important in changing the state of the internal body functions to correlate with higher priority interrupts. For example, if the limbic system initiates a "danger encountered" state, which we experience as fear, the hypothalamus must see to it that the body is mobilized for action with regard to blood flow, adrenaline levels, etc. The hypothalamus is also an important link in the limbic system-mesencephalon organization that in general specifies goals based on drives and emotional states for use in selection of overt behavioral activity.
The thalamus, the other major component of the diencephalon, functions as a very high level IO processor which prepares information for, and in part organizes the activity of, the cerebral cortex. Arrival of sensory input in the thalamus is sufficient for some rudimentary conscious experience of sensation, at least in some sensory modalities, but any very detailed resolution of the experience requires the enormous digital processing power of the cerebral cortex. In many respects, the various cortical areas act as subprocessors doing detail work for the various nuclei of the thalamus which route the information to and from them. Some thalamic-cortical systems are concerned with simple feature extraction of the input data, others with extrapolation of current events, and yet others with transmission of these extractions and extrapolations to other parts of the brain which, for example, evaluate them for relevance to current drive states or return sets of similar data from memory.
In this activity, the limbic system functions in the analysis of current and extrapolated data for relevance to the organism's needs. Thus, when we are hungry and see food, the limbic system in conjunction with the other structures mentioned earlier initiates a state which enables the sensory signal of food to initiate appropriate behavior, which is generated in detail by the cortex-thalamus-striatum apparatus. We experience the operation of this state of the limbic system as pleasure. If, on the other hand, the limbic system recognizes unfavorable situations, other states are generated (experienced as fear or anger) which cause other sorts of detailed actions to be generated.
The striatum, like the cerebellum, is importantly involved in the organization of motor output. It generates patterns of movement on the basis of input from analytic cortical areas, which are regulated in intensity by inputs from the amine systems under the direction of the limbic system, and it outputs these patterns both to the movement controlling areas of the cortex, and more directly to lower motor mechanisms. Unlike the cerebellum, the movements generated in the striatum are under continuous control of the cortex, and since the cortex is continually receiving and processing sensory information from the environment, a closed loop system is formed. This system is importantly invo
We have already mentioned most of the functions of the cortex since it is of necessity involved in almost all the higher functions of the other brain structures. Its operation is essentially that of a vast decoding and encoding network which gives analytic and synthetic power to the operation of the other systems. Without it, their operation would be the same in type, but would be much reduced in capability due to the loss of capacity for fine distinctions and discriminations on the one hand, and large scale generalizations and synthesis on the other. The cortex first appears in other than rudimentary form with the evolutionary appearance of the mammals. Their behavioral diversity and plasticity as compared with the stereotyped, reflexive, instinctive behavior of the reptiles is probably associated with this structure.
This brings me to the final general point I want to make about the brain before we consider the detailed operation of some of its functions. One of the most important features of the brain is that it learns. Our computers learn in a limited sense during programming, and some programs can learn to improve their performance on the basis of experience. This latter type of learning is characteristic of the way the brain operates, and it applies the process to almost everything it does.
You will notice that when I described some of the general functions of the different regions of the brain, I made no mention of memory. This is because we have no idea where it is. Indeed, the evidence suggests that the brain's memory is incorporated into its structure at whatever point the stored information is to act. Its memory then may be thought of as being distributed throughout its structure. At present, we can offer only speculation as to the physical nature and detailed processes of the brain's memory storage. Fortunately, we know a great deal about the operation of the brain's memory as a "black box" so that we can understand how it enters into the brain's algorithms, and we do not really need to understand its detailed physical nature to effectively use its principles of operation. Our current memory chips are a little inferior to the brain's memory in terms of capacity, but they are superior in speed and accuracy. Some programmable scratch memory and ROM chips associated with each of a robot's processing elements would do nicely, especially if supplemented by a disk for slow mass storage.
Researchers identify at least two types of learning that the brain permits, both very pragmatic. The brain has found that things that have occurred sequentially several times are likely to do so again, so it learns to associate them and act in anticipation. Thus, if the reflexive response of the nervous system to a painful stimulus applied to the foot is to quickly flex the leg, and if such a painful stimulus is repeatedly preceded by a "neutral" stimulus such as a bell sound, the brain will quickly learn to flex the leg whenever the bell sounds. This is the so-called "Pavlovian conditioned reflex." The potential utility of this scheme is obvious. Assuming the natural reaction to the painful stimulus is of use to the organism, then performing the reaction in response to the antecedent neutral stimulus allows the organism to get a jump on the world and perform more efficiently. This kind of action, although not the capacity to learn it, is employed in some computer memory systems which anticipate the next address call. Two limitations of this type of learning should be noted. The first is that the only thing that can be learned is the early performance of the natural response to the second stimulus. Thus, the organism's behavioral repertoire is not expanded, just made more efficient. The other is that all that is necessary for this type of learning is temporal contiguity of the events; it does not matter whether or not the anticipation is successful in improving the results. If you tape an electrode to the foot so that a shock following a bell occurs whether or not the leg flexes, the flexion of the leg still gets conditioned to the bell.
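For the robot builder, the conditioned reflex can be sketched in a few lines of Python. This is a minimal illustration, not a model of the physiology: the saturating update rule, the learning rate of 0.3 and the response threshold of 0.5 are all assumptions made for the example.

```python
def condition(pairings: int, rate: float = 0.3) -> float:
    """Association strength between bell and shock after a number of pairings.

    The update depends only on the two events occurring together. No outcome
    or reward enters the calculation, mirroring the point that temporal
    contiguity is all this type of learning requires.
    """
    strength = 0.0
    for _ in range(pairings):
        # Each pairing closes a fixed fraction of the remaining gap to 1.0
        # (an assumed learning curve).
        strength += rate * (1.0 - strength)
    return strength


def flexes_on_bell(strength: float, threshold: float = 0.5) -> bool:
    """The bell alone triggers the leg flexion once the association is strong enough."""
    return strength >= threshold


if __name__ == "__main__":
    for n in range(6):
        s = condition(n)
        print(n, round(s, 3), flexes_on_bell(s))
```

The only response the bell can ever trigger here is the existing reflex, so the repertoire is made faster but not larger.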
A second type of learning in which the brain engages is called "operant conditioning." This is the type of learning that permits us to expand our behavioral repertoire and base such expansions on the quality of the results. Simply stated, this type of learning is based on the principle that behaviors immediately preceding a reward are increased in future probability of occurrence. "Reward" here refers either to some pleasing event occurring, or some unpleasant state being terminated. Thus, behaviors that lead to good results will tend to recur. If we now add to this the second principle, that the behaviors immediately preceding the reward are the most strongly affected, it follows that more efficient behavioral routes to the reward are more strongly affected than less efficient ones. In this fashion, whole new behavioral patterns are built up out of successful components of more or less random exploratory behavior, and these quickly become welded together into tight and effective behavioral sequences.
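The two principles of operant conditioning can be sketched the same way. In this hedged Python example the behavior names, the starting weight of 1.0 and the halving of credit at each step back from the reward are all hypothetical choices. The source specifies only that preceding behaviors are strengthened and that the most recent are strengthened most.

```python
import random


def reinforce(weights, history, reward=1.0, decay=0.5):
    """Strengthen the behaviors that preceded a reward.

    history lists behaviors in the order performed. The last one receives the
    full reward and each earlier one a geometrically smaller share (assumed rule).
    """
    credit = reward
    for behavior in reversed(history):
        weights[behavior] = weights.get(behavior, 1.0) + credit
        credit *= decay
    return weights


def choose(weights, rng):
    """Pick the next behavior with probability proportional to its weight."""
    behaviors = list(weights)
    return rng.choices(behaviors, weights=[weights[b] for b in behaviors])[0]


if __name__ == "__main__":
    w = {"wander": 1.0, "approach": 1.0, "press_lever": 1.0}
    reinforce(w, ["wander", "approach", "press_lever"])
    print(w)
    print(choose(w, random.Random(0)))
```

Repeating the cycle of exploring, being rewarded and reinforcing shifts the probabilities toward the shortest successful sequences, because behaviors that are far from the reward gain little.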
The more developed brains rely very heavily on learning to produce most of their behavioral patterns. Less developed ones rely most heavily on prewired inflexible behaviors. Thus, evolutionarily primitive brains, such as those of fish, amphibians and reptiles, while capable of limited learning, generally rely on wired in behavior patterns that are available at birth. The advantage is early ability of the immature organism to fend for itself. The disadvantage is inflexible behavior that cannot easily adapt to an environment that differs from that in which the species evolved. Advanced mammals on the other hand, particularly man, are characterized by heavy reliance on learned behavior, which results in a protracted state of infantile helplessness followed by enormous behavioral flexibility and adaptiveness.
The parallel with our current attempts at robots is obvious. The major hurdle is designing a system that can operate in a generalized environment rather than being restricted to a specialized one with which it is preprogrammed to deal. The answer is also obvious. A successful robot must be capable of operant conditioning including the ability to be rewarded for successful attempts and to feel punished otherwise. This device is carried to such an extreme in advanced brains that even our basic ability to see, our perceptual structure, is learned in infancy. A newborn child has only the most rudimentary ability to interpret its visual environment; the relation between movement of the limbs and the result in visual space, the relation between certain output commands and the result on auditory input, all this and much more is painfully learned by trial and error in advanced brains. The result is the ability to modify behavior towards desired ends rather than react to stimuli in a preprogrammed fashion. The apparently random play of infants is in fact a deadly serious matter. Its emulation in our machines will be of the utmost importance to their handling of the generalized environment.
From this overview of the brain and its functional organization, it is apparent that we must now select a limited set of brain functions to discuss in detail. I think that those of most practical interest to robot designers at the present time are:
1. The brain's mechanisms of output control, including coordination and timing, and the use of feedback in the design and execution of movements in space.
2. The brain's mechanisms of sensory perception, including its principles of feature extraction, and the tricks and shortcuts that it employs in pattern recognition.
3. The brain's mechanisms for achieving goal-directed behavior, including mechanisms of emotion and motivation and their control of behavior patterns.
4. The brain's mechanisms of consciousness, intelligence and learning.
I will attempt to cover each of these subjects in some detail in this series of articles. Next month the series continues with a discussion of the output generations of biological computing devices.
Carlson, N, Physiology of Behavior, Allyn and Bacon, NY 1977.
Gardner, E, Fundamentals of Neurology, Sixth Edition, Saunders, Philadelphia 1975.
Shepherd, G, The Synaptic Organization of the Brain, Oxford University Press, NY 1974.
Rakic, P, Local Circuit Neurons, The MIT Press, Cambridge MA 1976.
Ernest W Kent, Associate Professor
Dept of Psychology
The University of Illinois at Chicago Circle
Chicago IL 60680
With this second article on the brain's output control system, we begin a more detailed look at the mechanisms by which the brain accomplishes some of the functions which robot systems will also be called upon to perform. (A number of the terms which are used in this article were defined and discussed in the first part which began on page 11 last month.) As we reach a more concrete level of description of the brain's operation, we will encounter many points which are not yet entirely resolved, and many questions which are subjects of dispute between competing theories. Since it would seem that the present reader is more likely interested in potential applications of brain architecture than in the exact nature of the debate on fine points of physiology, I will simply present the position which seems to me to be most strongly supported at the present time. I will also make some simplifications where they seem warranted by the intended purpose of these articles. (To atone for these sins, I will also offer a list of references for the reader who is interested in pursuing the subject in greater depth.)
It seems likely that any robotics system will require some kind of output controller concerned with the generation and execution of patterns of movement in space, and the required control systems may be expected to range from very simple to very complex. The evolution of the biological brain of course has also had to solve this problem, and it has accomplished it with a set of capabilities for control which are probably as complex as any that we will be likely to encounter for a long time to come. The jointed limb scheme which has been employed as the chief means of locomotion and manipulation in terrestrial animals requires a very complex control system. It is true that a robot, which is free of such restrictions as an uninterrupted blood flow to all of its parts, has other options; wheels and treads for example. These devices might permit simpler control systems, but I would like to suggest that for a system capable of operation in a generalized environment, the jointed limb scheme may be superior. Try to picture a wheeled or treaded robot scaling a cliff or climbing a tree, or even using a stool to dust the bookshelves. Since a motion control system which can handle the jointed limb scheme can also handle simpler systems, it may be most appropriate to plan for the future by starting with this basic scheme in early designs.
With regard to the actual mechanisms which are to be controlled, it is interesting to note that they are of only two basic types. The only two things that you are capable of doing are contracting a muscle and releasing glandular secretions, period. Everything else is only some combination of these two. Muscles and glands are the only devices to which the brain interfaces. In the present discussions we will concern ourselves exclusively with the muscles and the system which controls them, usually called the "motor control system."
There are two fundamental principles employed in the brain's motor control system. The first is to buffer each level of command with subprocessors which interpret the commands from higher levels as objectives and compute appropriate outputs for achieving the objectives, while taking into account local feedback inputs and environmental information. A whole series of such steps is employed, with the "objectives" becoming more concrete at each stage. In this fashion, a pyramid of processors is defined which can accept very general directives and execute them in a reflex fashion with quite considerable flexibility in the face of varying loads, stresses and other perturbations. This system by itself is quite capable of things such as bipedal locomotion with maintenance of balance on uneven terrain. It cannot, however, operate in a goal directed fashion.
The second principle of the motor control system involves the operation of higher level systems which generate output strategies in relation to behavioral goals. This principle is the division of output tasks on the basis of their relation to input information rather than type of motion required. We shall examine some specific examples which illustrate each of these ideas.
The operation of the motor control command chain depends heavily on certain sensory inputs which provide feedback and status information for moment to moment operations, and it is appropriate to begin our investigation of output with a look at these inputs. Perform this small experiment. Close your eyes and put one hand somewhere out in front of you, then touch it with your other hand. Most people have no difficulty doing this quite accurately. The question is how, with your eyes closed, could you guide your hands to the right spatial locations? The answer is that we have a number of special sensory systems of which most of us are not even aware. These senses have the primary purpose of informing the brain's output control processors of things such as the relative positions of the limbs, the tensions of the muscles, the acceleration of the body in different directions, etc. Most people are unaware of these senses because they do not have a conscious content or "experience" associated with them, as do senses such as vision and smell. Nonetheless, they are among the most extensive and intricate sensory systems of the brain, and when they are damaged, the results are immediately apparent. With damage to the systems which report limb position, some people are unable to carry out the small experiment you just performed. In fact, such people are generally unable to execute any muscular action correctly without constantly watching what they are doing.
The sensory system which reports on the status of the limbs is called kinesthetic sense, or kinesthesis, and it handles three sorts of information. These are joint angle, degree of load on a muscle, and degree of stretch or extension of the muscle. These three types of input information are used at various levels of the motor system to control sequencing and provide feedback information. This is another instance where place coding specifies the particular unit and type of quantity in question, and frequency coding carries the intensity information. The transducers which translate these quantities into neural impulse streams need not be discussed in detail since adequate mechanical counterparts are readily available.
The other sensory system which is strongly related to the brain's output control is the vestibular sensory system. This is the system responsible for the "sense of balance" among other things. Specifically, it provides continuous readout of the inclination of the head with respect to gravity, and the acceleration of the head in three perpendicular planes. This sensory system is located in a single set of transducers on either side of the head near the middle ear, rather than a multitude of transducers distributed through the body as is the case with the kinesthetic sense. Although the output therefore only refers to the head, the position of the head with regard to all other parts of the body can be computed from the information provided by kinesthetic inputs. Accordingly, the output of the vestibular transducers is made widely available throughout the system as input to most of the high and low level motor processors. In this case too, the existence of easily available transducers for such quantities makes it unnecessary to discuss them in detail. Any device capable of reading out inclinations and accelerations will do when designing our robots.
In most cases, muscles work in opposing pairs, one to open or extend a joint and one to flex or close it. This is necessitated by the fact that muscles can only exert force in one direction (contraction). Figure 1 demonstrates the arrangement for a typical joint. This diagram also shows some of the neural elements which control the contraction of these muscles. The principal neuron of this system, the one which provides input to most muscle fibers, is called a lower motor neuron, and is labeled L in figure 1. This type of neuron (and the other neurons associated with it) is located in the spinal cord, and is the final processing stage before output to the actuator. This little system is a good place to illustrate some of the principles of the brain's motor organization. We shall refer to the lower motor neuron and its associated elements as an "LMN system." Basically, LMN systems must accept commands from a multitude of other systems which desire access to the muscle in question, attend to them according to their priority, modify them according to inputs from kinesthetic and vestibular systems as well as status information from related LMN systems, provide an appropriate output to the muscle, and make their own status information available to other systems. There are a great many LMN systems in the spinal cord. Every muscle is composed of thousands to millions of fibers, and in the case of muscles used for precise operations, there may be an LMN system for each individual fiber. In other cases, a single LMN system may control many fibers of a muscle.
In a practical robotics application, I see no reason why a single servo actuator and "LMN" processor for each joint would not suffice. There are reasons why a single processor for many joints is less practical, but before addressing this issue, let us examine the LMN system to see what sorts of things it does.
In figure 1, for clarity, we show only a single LMN driving each muscle. The degree of contraction of the muscle is proportional to the output pulse frequency of the LMN; the higher the frequency, the stronger the contraction. The circuit shown on the right illustrates the simplest type of protective spinal reflex; a pain receptor in the skin (P) fires a neuron in the LMN system which fires the LMN driving the flexor muscle. This simple high priority operation quickly removes the limb from danger. Inhibitory cross connections of the LMNs driving the two muscles insure that they do not act antagonistically; one relaxes as the other contracts. This reciprocal circuitry is generally active in all LMN operations unless specifically overridden. Not shown are outputs which inform higher centers of this action to allow for the necessary corrective action of other muscles and limbs which must take up the redistribution of weight, counteract shifts in center of gravity, etc.
Inputs to the LMN system from higher centers may request a variety of actions, such as holding a particular position, moving to a specified position, moving with a particular velocity, etc. The LMN attached to the extensor muscle on the left in figure 1 is shown with some of the associated neurons which are involved in the process of carrying out these instructions while compensating for external loads. Note that there is a special muscle fiber (S) which receives its input from the small motor neuron (G) rather than from the LMN driving the other fibers in the surrounding extensor muscle. This special fiber is part of the transducer system for a kinesthetic monitor of muscle stretch. There is a sensory neuron (I) which has an input attached to the S fiber, and this neuron is fired when the S fiber is subjected to stretch, at a rate proportional to the degree of stretch. Since the S fiber is mechanically attached to the rest of the muscle, it is stretched or relaxed by inputs or forces which extend or contract the main muscle, as well as by its own private input signals from neuron G. The axon of the I neuron makes an excitatory synapse on the LMN, thus increasing its drive when the S fiber is stretched. Since increased output by the LMN tends to contract the main muscle and relieve the stress on fiber S, we have a negative feedback loop.
Suppose that the higher centers in the system wish the LMN system to maintain a particular angle on the joint. This is specified by a set of constant inputs from above (X) to the LMN, and to neuron G. Now suppose that a stress such as increased load in the hand is suddenly applied to the joint. This will tend to flex the joint further, causing the extensor muscle to be stretched beyond the specified degree of contraction. This in turn stretches the S fiber and increases the output of neuron I, and thereby, the output of the LMN. The resulting increase in contractile force of the muscle compensates for the increased load. This allows the system which requested the maintenance of joint angle to remain ignorant of loading conditions and fluctuations.
On the other hand, a new input to neuron G can cause the S fiber to contract independently of the drive to the main extensor muscle, thereby increasing the output of the I fiber for the same degree of extension of the main muscle. This defines a new "set point" for the system. (Hence the need for a separate joint angle kinesthetic system for output to higher systems which don't want to untangle the effects of inputs to G on outputs from I.)
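The stretch feedback loop and its adjustable set point can be sketched as a small simulation in Python. This is an illustration under simplifying assumptions, not a model of real muscle: the linear relations, unit damping and all constants are invented, and the opposing flexor is not modeled, so only loads at or above the base drive are meaningful.

```python
def settle(load, set_point, base_drive=1.0, reflex_gain=20.0, dt=0.01, steps=2000):
    """Return the muscle length reached under a constant load.

    set_point   -- length at which the S fiber is just unstretched (set via neuron G)
    base_drive  -- constant command X from higher centers to the LMN
    reflex_gain -- strength of the excitatory synapse from neuron I onto the LMN
    """
    length = set_point
    for _ in range(steps):
        stretch = max(0.0, length - set_point)       # firing rate of sensory neuron I
        lmn_output = base_drive + reflex_gain * stretch
        # The load lengthens the muscle and the LMN-driven contraction shortens it.
        length += dt * (load - lmn_output)
    return length


if __name__ == "__main__":
    print(round(settle(load=1.0, set_point=0.5), 4))  # load matches the command: no error
    print(round(settle(load=3.0, set_point=0.5), 4))  # tripled load: small residual stretch
    print(round(settle(load=3.0, set_point=0.3), 4))  # new input to G: new set point
```

In this sketch the residual error is (load - base_drive) / reflex_gain, so a higher reflex gain holds the requested position more tightly without the commanding system knowing anything about the load.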
From this point, it is clear that the normal considerations of control theory are applicable, and it does not matter whether the system is neural or electronic. For example, in this system the mechanical response time of the muscle and joint, which are in the feedback loop, may be slow compared to the response time of the neural elements. In this as in any other system, that means that instability and oscillation may result if the system gain does not roll off at higher frequencies. This roll off is accomplished by the small neuron R which produces a fast self-inhibitory action on the LMN with each LMN output pulse. At low input pulse rates from higher systems, the weightings of the synaptic contacts (as described in last month's article) are such that the pull down from firing threshold in the L cell produced by the R cell's input has substantially decayed away before the next positive input arrives, and thus has no effect on it. At higher input frequencies however, the positive input pulse will encounter increasingly greater antagonism from the recurrent negative input produced via R by the preceding output pulse, and will thus be less effective in bringing the axon hillock above threshold. This effectively reduces the gain of the system progressively as higher frequencies are approached.
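The roll off produced by the R neuron can be demonstrated with a toy pulse model in Python. The numbers are assumptions chosen to make the effect visible (unit input pulses, a firing threshold of 0.8, an inhibitory kick of 0.5 decaying with a 20 ms time constant); they are not measured values.

```python
import math


def transmission_ratio(freq_hz, n_pulses=400, threshold=0.8, inhib_amp=0.5, tau=0.02):
    """Fraction of regularly spaced input pulses that produce an LMN output pulse."""
    interval = 1.0 / freq_hz
    inhibition = 0.0
    fired = 0
    for _ in range(n_pulses):
        # Recurrent inhibition left by earlier output decays between input pulses.
        inhibition *= math.exp(-interval / tau)
        if 1.0 - inhibition >= threshold:
            fired += 1
            inhibition = inhib_amp  # the R cell inhibits the L cell after each output pulse
    return fired / n_pulses


if __name__ == "__main__":
    for f in (5, 50, 100, 200):
        print(f, "Hz ->", transmission_ratio(f))
```

With these assumed constants every pulse gets through at low rates, while at 100 and 200 pulses per second only every second and every fourth pulse does. The effective gain therefore falls as the input frequency rises.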
Looking at the LMN system in the context of the whole hierarchical motor output system, it is apparent that the brain is using a "temporal byte" of frequency coded analog information to specify information about degree or quantity of action. In addition, the set of all of the input lines to the numerous LMN systems constitutes a "spatial byte," or place code, which is essentially digital in character, and in which the selected lines (bits) select the set of LMN systems which are addressed and thereby determine the nature of the movement to be performed, but not its speed, force, etc.
At first glance, it would seem reasonable to try to model the behavior of the LMN system with an analog device such as an op amp with a feedback loop. In practice, such an analog device might be quite tricky since the LMN system must integrate inputs from a wide variety of sources with different priorities. A real LMN has about 10,000 synaptic inputs. There is also the difficulty of encoding the analog information from other systems. Given that we will have many fewer LMN type units to worry about, it may be more practical to do both addressing and value transfer with digital techniques. This would suggest a digital processor of some simple type to replace the LMN unit rather than the op amp, and it may be that this would in the long run be the easiest way of dealing with the interactions of the various inputs to the system.
The next question that arises is, why not use one processor at high speed to run all the joints? There are several considerations. One that is immediately obvious is reliability. If one LMN system is lost, the others can take compensatory measures almost automatically. Second, since the output of each LMN system is a factor in the output of each of the others, and since the LMN system is a part of several otherwise distinct feedback loops, a single central processor system would have to be quite complex. Essentially it would face the solution of a number of simultaneous differential equations, or else have to deal with each component motion in sequence. This sort of sequential operation would produce a slow, jerky "movie robot," because each action would have to be completed to obtain the results as input data for computing the next action. A processor with sufficient speed, sophistication and core to handle the differential equations might well be more complex and costly than the multiple simple parallel processor approach. At the other extreme, which the brain has apparently found to be the best approach, programming would be a very simple test-operate-test-exit sequence, in which the actions of other units performing other actions simultaneously are entered as data each time around the loop. The moment we break out of this sequence to handle several "simultaneous" operations with a serial set of such sequences, things get more complex. However, at processor speeds it should certainly be possible to do some of this without doing much more than adding a little scratch pad memory to the simplest robot system's ROM. The best compromise for a robot remains to be demonstrated. Finally, a hierarchical system with interactive parallel units at the bottom frees the upper levels of the system to engage in coordinating the actions of the lower parts into complex actions of the entire organism or device. This function by itself may require substantial processing power and time without the added burden of those jobs which the brain delegates to the LMN systems and their immediate superiors.
This organization of LMN units and their "supervisors" forms a reflex machine capable of quite elaborate motion control and generation (although it does not initiate motion except in response to high level commands, or as a predetermined response to specified sensory inputs). It is essentially an automaton, but a very complex one. The organization of the hierarchy is quite conventional, and similar to a military command chain. The processing elements which have the responsibility for coordinating the movements of different limbs, for example, output control commands to the LMN units at the local level, rather than to the muscles directly, and leave the LMN units to handle the details. They in turn receive orders from, and report to, processing units that are concerned with coordination of whole body actions, the maintenance of posture and balance, and so on. Its major departure from a "command chain" model is the existence of elaborate lateral information transfer between processing elements at the same level in the hierarchy. The operational principles at each level are quite similar to those we have examined in detail in the LMN units which form the lowest rank in the system.
In the brain, this hierarchical system is capable of receiving and executing commands to perform such high level reflex actions as running, carrying, etc, without further attention. Beyond this point, we find several more specialized systems which may issue commands to this "motor automaton," or reach around it and access the LMN systems directly, or enter the automaton at any level. To understand the division of labor among these systems, we need to focus on the way in which the execution of the output is related to the data which directs it. There are basically two systems which can be used, and both have been used in robot systems. The first is the "dead reckoning" approach, in which the details of the required action are computed in advance, and then executed without regard to their results. (An interesting example of this in a robot system is described in Ralph Hollis' article on NEWT in the June 1977 BYTE, page 30). The other approach of course is to continually monitor the results of the movement and apply corrections as required. Both of these systems have their uses, advantages, and weaknesses, and the brain employs both systems, usually cooperatively in the same actions, although "pure" examples of each can be found.
One of these systems is associated with the part of the brain called the cerebellum. The cerebellum is not an instigator of action, nor is any conscious experience associated with its activities. It plays an important role however in the expression of actions, of both reflexive and voluntary types, which are generated elsewhere. Among the functions which the cerebellum performs are the translation of parallel to serial output, and the control of feedforward correction in open loop control circuits.
Before describing these functions further, it will help to examine the circuitry of the cerebellum. This structure, which lies above the pons, consists of two parts, an overlying cortex and a set of nuclei. The neurons of the cerebellar cortex are arranged in a distinctive pattern which is endlessly repeated over the surface of the structure. A few elements from this pattern are shown in figure 2. Simplified to the bare essentials, this consists of an input element (G) which has an axon that runs for some distance, spatially parallel to the axons of all of the other input elements, and which in the course of its passage activates a row of output elements (P). Firing an input element thus selects a particular set of outputs. Since pulses may travel rather slowly in small diameter axons such as those of the input elements, the time of arrival of the select pulse at successive output elements may be long compared to the duration (or transmission time) of their outputs. Thus the cerebellar cortex may act as a tapped delay line, as well as a decoder. If the final output elements are switched to other input elements, elemental sequences may be serially cascaded to form larger patterns. There are a number of auxiliary elements associated with the G and P types, and these are lumped as O elements in our diagram. They are capable of performing such functions as selectively inhibiting individual output elements, and controlling interactions between adjacent parallel row systems. Thus, these elements may impose modifications on output sequences, or call on adjacent systems (which control similar muscle functions) for assistance. Some of these functions have actually been simulated on large digital machines in experimental motion control systems. A schematic of a circuit modeling the essential features is shown in figure 3.
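The decoder and tapped delay line behavior can be sketched in software as follows. This is a minimal illustration, not the circuit of figure 3: the names `fire_row` and `fire_cascade`, the fixed delay per tap, and the element labels are all assumptions.

```python
def fire_row(row, start_ms=0.0, delay_ms=2.0, inhibited=frozenset()):
    """Fire one input element (G) and return (time, output element) events.

    The select pulse reaches successive output elements (P) one assumed
    delay step apart; elements inhibited by the auxiliary (O) elements
    are skipped.
    """
    events = []
    for position, element in enumerate(row, start=1):
        if element in inhibited:
            continue
        events.append((start_ms + position * delay_ms, element))
    return events


def fire_cascade(rows, links, first, delay_ms=2.0):
    """Chain rows: the end of one row selects the input named in links."""
    events, current, t, seen = [], first, 0.0, set()
    while current is not None and current not in seen:
        seen.add(current)                 # guard against looping forever
        events.extend(fire_row(rows[current], t, delay_ms))
        t += len(rows[current]) * delay_ms
        current = links.get(current)
    return events


rows = {"G1": ["P1", "P2"], "G2": ["P3"]}
sequence = fire_cascade(rows, links={"G1": "G2"}, first="G1")
```

A single parallel request (the choice of `first`) thus unrolls into a timed serial sequence of output events.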
The outputs of the cerebellar cortex fall on the neurons of the cerebellar nuclei, which relay them widely throughout the brain. Inputs to the cerebellum likewise originate in many portions of the system. There is evidence in fact that different motor system functions may time-share the device! A major function of the cerebellum however is to allow for interaction between different command systems.
To illustrate this point, let us see how it is applied to feedforward modification of output. In any system which is not amenable to feedback control, such as one involving actions that are more rapid than the loop time that would be required to control them, or ones that would require very extensive processing of feedback input, it is nonetheless possible to achieve considerable correction for moment to moment conditions by passing the basic output command to both the next level of the output system and to a controller which computes the necessary deviations from the basic command and forwards these to the lower echelons of the output system. The concept is diagrammed in figure 4. Thus, a reflex motor loop which performs some function such as walking sequences may need to be modified from its basic pattern by information about head tilt from the vestibular system, while at the same time the reflex vestibular motor systems which keep the head level may require information about what the stepping generator is about to do, in order to allow for impending body tilt. The whole sequence needs to take place before any muscle action occurs which could generate feedback information if we wish to move swiftly and still avoid a fall.
The process is popularly called "coordination," and the quality of yours is dependent on the excellence of your cerebellum. What happens in this process is as follows: sequences of motor actions generated at any level of the hierarchical reflex "automaton" system, or at any high level system which inputs to it, are also sent to the cerebellum, either as inputs to the parallel fiber decoding systems or as inputs to the "other" elements which control interactions across parallel systems and gate individual output elements. Thus, the waves of parallel fiber activity generated by different command systems can interact in the cerebellum and modify one another in predetermined fashions. The resulting modified command is sent forward as a set of corrections to the basic command, and the two interact at lower echelons to produce a corrected action. (Yes, they can get there at the same time. We've got control of transmission speed, remember.) One clear advantage is the provision of a common site of interaction for systems which are functionally related, but do not possess physical elements in common.
Now that we've got it taking care of interactions and corrections, how do we get "dead reckoning" of movement parameters? This process relies on a parallel to serial conversion which uses time as an analog of position. A basic function of the cerebellar nuclei is holding or maintaining positions by appropriate outputs to the biasing elements in systems such as the LMN system. The output elements of the cerebellar cortex however act to inhibit the cerebellar nuclei. Thus, damage to the cerebellar nuclei results in tremor, oscillation, and similar signs of excess activity. Damage to the cerebellar cortex on the other hand results in deficits related to underactivity: motions that fall short of the target or fail to initiate. In the case of a pure example of the "dead reckoning" type of motion (frequently referred to as saccadic motion), such as the motion of the eyes in fixing on a new point of focus, the motion itself is of constant velocity. (More accurately, it is driven by a constant input; it clearly can't accelerate and decelerate instantaneously.) Given this, it follows that the extent of the motion is determined solely by the duration of the driving signal. If the motion generating "automaton" circuits are held in check by the cerebellar nuclei, then action of the cerebellar cortex which inhibits the cerebellar nuclei disinhibits the motion generators and the movement begins. If the outputs of a group of output elements from the cerebellar cortex which are fired in sequence by the same parallel input fiber fall on the same group of neurons in a cerebellar nucleus, they will keep that group inhibited, and the associated motion in progress, for as long as the sequential firing of output elements is maintained in the cerebellar cortex. This will then determine the extent of the motion. The cerebellar nuclear cells are OR gating the output sequence of the cerebellar cortex.
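The use of time as an analog of position can be illustrated with a small sketch. Time is counted in arbitrary ticks, each output element's firing is represented as a half-open window of ticks, and the function name `movement_extent` is hypothetical.

```python
def movement_extent(pulse_windows, velocity_per_tick=1.0):
    """Distance moved while the cerebellar nucleus is held inhibited.

    pulse_windows holds (start_tick, end_tick) pairs, one per cortical
    output element fired in sequence. The nucleus is inhibited whenever
    ANY element is firing (the OR gating), and the motion generator
    runs at an assumed constant velocity for as long as that lasts.
    """
    if not pulse_windows:
        return 0.0
    last_tick = max(end for _, end in pulse_windows)
    position = 0.0
    for tick in range(last_tick):
        cortex_active = any(start <= tick < end
                            for start, end in pulse_windows)  # OR gate
        if cortex_active:            # nucleus inhibited, motion released
            position += velocity_per_tick
    return position


short_move = movement_extent([(0, 3), (2, 5), (4, 7)])
longer_move = movement_extent([(0, 3), (2, 5), (4, 7), (6, 9)])
```

Recruiting one more output element in the sequence lengthens the movement without any feedback from the moving part.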
It follows then that if some high level command system computes the type, direction and extent of a required motion, it can pass this information to the cerebellum in a parallel form as a select request for a particular set of input elements, and perhaps a set of gating and switching elements as well. This request will set in motion a time sequence of activity in the cerebellum, which will be appropriately modified by interaction with other current activity in the cerebellum, and output as a motion in space with a particular duration of action and spatial extent. Meanwhile, the requesting device is free to go about its business.
There are many kinds of activity which rely heavily on this type of control, and many of them are learned activities. A good example is playing the piano. This is clearly a learned sequence of movements, but once learned, the action is too rapid for guidance by feedback from ear or eye. It has been suggested that the learning of such motor sequences may proceed through the formation of new functional connections in the cerebellum, so that the end elements of one sequence become select inputs for the next sequence. In any event, we certainly could do it this way.
The action of the cerebellum involves a large analog component, and although this could be, indeed has been, modeled with a fast processor and an array of digital words to represent the states of output elements, this may not be the best approach. A device which offers great promise for a very close analog to cerebellar operation is the surface acoustic wave (SAW) device which transforms electrical signals into surface waves on a piezoelectric medium, manipulates them in unique ways related to their travel time, and regenerates electrical signals at the outputs. A similar result can be achieved with charge transfer devices. Tapped delay lines are easily made, and many such in parallel on a chip have been used for such tasks as electronic focusing of imaging systems. This technology would seem to offer a splendid opportunity for developing a "cerebellar chip." An excellent review of these devices can be found in Brodersen and White's article in the March 18 1977 issue of Science.
Turning now to the motor structures of the higher brain regions, we find two which stand out as particularly important, the basal ganglia and the motor cortex. These structures operate in an interactive fashion in a supersystem which also involves parts of the thalamus, and which has important inputs from systems whose principal functions are best regarded as cognitive and emotional rather than motor. At this level of motor organization, the distinction between concept, desire, and action begins to blur, and these "motor" systems may also be involved in at least certain motor oriented aspects of other functions. It is somewhat misleading, but probably necessary, to discuss separate functions for the higher motor systems. The fact that they are parts of a functional supersystem should be borne in mind.
The motor cortex, more accurately called the somato-motor portion of the cortex, was once thought to be the highest level of motor integration in the brain because of the late evolutionary development of the cortex. It now appears however that it is more properly viewed as a specialized parallel processor system which has been developed to refine and increase the resolution and processing speed of functions which are directed from older structures. A notable feature of the somato-motor cortex is a massive projection of large fast axons which run all the way down the spinal cord and end directly on the LMNs. Along the way, these axons give off many branches to higher level motor centers of the medulla, pons, cerebellum, etc. It appears that this direct communication from highest to lowest levels of the system allows high level command systems to reach around the motor automaton hierarchy for direct intervention. It is obvious that this type of control must be available to a system which is to have a behavioral repertoire that is not built solely of stereotyped action patterns. This is particularly true if the system is to have the capability of constructing novel behavior patterns, either to meet a particular problem, or to serve as a basis for learning new behavioral repertoire items.
Although systems such as the cerebellum and the basal ganglia have direct communication with the hierarchical motor system to control the many motor stereotypies which it automatically generates and regulates, they also both access the somato-motor cortex, and apparently provide most of its direction and control. The somato-motor cortex then may be viewed largely as an extensive decoder for cerebellar and basal ganglia initiated actions. There is one situation, however, in which the somato-motor cortex is itself the originating device for motor function. That situation is the control of action based on feedback information from the sense of touch. The reason we refer to it as the "somato-motor" cortex is that this region not only contains the neurons which give rise to the axons controlling the lower motor systems, but also the neurons which receive the input from the touch receptors in the skin. The sense of touch is technically referred to as "somesthesis." The special relation of the sense of touch to this system is explained by the fact that a great deal of fine motor control is under feedback control derived from the various transducers for pressure and other sensations which comprise the sense of touch. This is especially true of organisms such as human beings which place so much behavioral emphasis on the control of precise manipulative movements. While a great many movements which are under feedback control may initially be under visual guidance in reaching the general area, the fine control of the later stages is generally under the control of feedback from touch receptors. When you pick up an object with your hand, it is not your eyes which tell you how hard to squeeze, or just how to grasp. (Have you ever used a keyboard that didn't provide tactile feedback?) The somato-sensory function of the somato-motor cortex involves elaborate encoding schemes which are similar to those which we will consider later with the other cortical sensory systems.
For now, suffice it to say that this information may act directly on the motor output aspects of this region to initiate motor activity in those cases where touch information is the appropriate controlling input. In other cases, this information may be used to provide correction to outputs of the somato-motor region which are being initiated and controlled from other structures.
The somato-motor cortex receives its principal control inputs from a group of nuclei in the thalamus, which in turn receive the major share of their input from the cerebellum and the basal ganglia. These thalamic nuclei thus serve as preprocessors which synthesize directives for the somato-motor cortex out of requests from several systems.
The final portion of the higher motor system which we shall consider in detail is the collection of nuclei known as the basal ganglia. I shall use this term to include some nuclei of the mesencephalon and diencephalon as well which function largely in conjunction with the basal ganglia.
Just as the cerebellum is heavily involved in the operation of feedforward and dead reckoning kinds of control, the basal ganglia are primarily involved in graded, feedback controlled movements, particularly those of a learned nature, or those under direct conscious control. They can probably be regarded as the highest level in the command system which has a primarily motor oriented function.
The structure of the basal ganglia at the neuronal level is entirely different from that of the cerebellum. There is no obvious pattern of spatial arrangement to its neurons, although both local and output elements can be identified. The local elements are much more numerous than the output elements, and form an extensively branched system within the basal ganglia. It appears that most of these have an inhibitory action, so that neighboring elements are quickly turned off by any activity. Some of these connections are recurrent, so that input driven elements, too, tend not to remain active beyond an initial response to input. This is in sharp contrast to the situation in the cerebellum where the entire principle of operation is based on a propagated response in a neuronal network, initiated by a single input. The action of the basal ganglia is of a sort called "self-quenching." That is, an input will initiate a burst of activity, but unless the input is maintained, or augmented by another input, it will rapidly inhibit itself. This is true not only because of the local recurrent inhibitory neurons of the basal ganglia, but also because of negative feedback loops from the basal ganglia to its inputs which tend to damp their initial activity. Notice the similarity of basal ganglia action to that of a differentiator. If one could consider the space coded byte of the active input elements to the basal ganglia as encoding some static scheme of output for motor behavior, the temporal output byte of the basal ganglia might be thought of as having properties similar to the first time derivative of the behavior specified. This output would then be decoded into commands to the motor cortex, cerebellum, and reflex motor system. By outputting this time decaying command, it is ensured that the behavior will not continue unless (1) the command is sustained by some other means, or (2) a new command set is tried, producing a new set of self-quenching output pulses.
This feature is essential if the continuation of a behavior is to be made contingent on its consequences. The basal ganglia in fact have sets of inputs which are precisely configured to achieve this contingency.
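The differentiator-like, self-quenching response can be sketched with a single adapting element. The model is an assumption made for illustration (a running inhibition that tracks the input); `self_quenching_response` and the `adaptation` rate are hypothetical names and values.

```python
def self_quenching_response(inputs, adaptation=0.5):
    """Output of one element whose own activity builds up inhibition.

    The inhibition follows the input with a lag, so a steady input
    gives a burst that decays, while a change in input gives a fresh
    burst, much like a leaky first time derivative.
    """
    inhibition = 0.0
    outputs = []
    for x in inputs:
        outputs.append(max(0.0, x - inhibition))
        inhibition += adaptation * (x - inhibition)   # recurrent buildup
    return outputs


steady = self_quenching_response([1, 1, 1, 1])
# steady is [1.0, 0.5, 0.25, 0.125]: the command fades unless renewed
renewed = self_quenching_response([1, 1, 1, 1, 2])
# the last value of renewed rises again because the input changed
```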
The outputs of the basal ganglia run principally to: the thalamus and thence to the motor cortex; to the motor nuclei of the mesencephalon and thence to the subsystems of the reflex motor apparatus; and to the motor nuclei of the pons and thence to the cerebellum. The basal ganglia are thus in a position to transmit information and commands to all aspects of the motor system. There is little here that is not understandable in terms of principles we have already dealt with, and it requires no elaboration.
The inputs to the basal ganglia, on the other hand, are the key to understanding its function. There are three major components of the input. First, the entire cortex projects fibers into these nuclei. These fibers, and those of the second component which arises from the thalamus (a structure which organizes the activity of the cortex, and processes I/O for it), tend to make contact with a few specific neural elements in the basal ganglia. These two input groups may be thought of as specifying discrete patterns of activation which are encoded by an action like a series of cascaded AND gates into a pattern of activity, or potential activity, on the output lines of the basal ganglia. If output continuously, these outputs could be decoded by lower motor structures into specific movements. It appears however that these inputs alone are insufficient to sustain much activity in the face of the strong local inhibition generated by their own action.
The third input component to the basal ganglia arises from a group of nuclei which are related to other brain systems that detect the rewarding or punishing quality of the stimulus pattern being decoded by the sensory systems. This input component has a very different distribution; it branches widely within the basal ganglia, each axon making synapses with tens of thousands of neurons. As a result, it cannot specify any very specific pattern of activation in the basal ganglia. Its action is diffuse, and principally temporally coded. On the other hand, it can exert a widespread gating action on all ongoing basal ganglia activity. Thus, an input containing information about the intensity of the organism's emotional response to the results of ongoing behavior is capable of sustaining or inhibiting the next phase of the behavior. Given the self-quenching nature of activity in the basal ganglia, it is easy to envision a process by which a behavior "suggested" by the cortical and thalamic inputs is only sustained if the initial input results produce a sustaining input which strengthens the initial activation pattern, perhaps by summing with it to overcome the self-inhibition. The third input component is of course ideally situated for such a function.
In its most primitive form, this scheme results in a sort of "homing device" which will cause an organism to follow an increasingly intense stimulus, such as odor, to its source, such as food. That is, as the searching and locomotor patterns generated by the animal result in increases or decreases in the intensity of the pleasurable stimulus, they are appropriately facilitated or eliminated. Out of this simple feedback guidance mechanism, a host of more elaborate behaviors are developed, by evolution and learning, with the aid of the immense processing power of the cortex to provide detailed analysis of the environment and to generate more complex patterns of behavior for trial.
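A toy version of such a homing device might look like the sketch below. The grid world, the odor model (intensity falls off with city block distance), and the fixed order in which headings are tried are all assumptions; `home_on_odor` is a hypothetical name. As a simplification, a trial step that lowers the intensity is simply not taken rather than taken and reversed.

```python
def home_on_odor(start, source, max_steps=100):
    """Follow a rising stimulus to its source by sustain-or-quit control."""
    headings = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # candidate behaviors

    def intensity(p):
        return -(abs(p[0] - source[0]) + abs(p[1] - source[1]))

    position, choice, path = start, 0, [start]
    for _ in range(max_steps):
        if position == source:
            break
        dx, dy = headings[choice]
        trial = (position[0] + dx, position[1] + dy)
        if intensity(trial) > intensity(position):
            position = trial            # stimulus grew: behavior sustained
            path.append(position)
        else:
            choice = (choice + 1) % len(headings)   # quenched: try another
    return path


path = home_on_odor(start=(0, 0), source=(3, -2))
# path ends at (3, -2) after five sustained steps
```

Note that the evaluating signal never tells the behavior generator which way to go; it only says "more" or "less."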
At the present time, we cannot precisely specify the pattern of detailed connections in the basal ganglia which results in these actions. The nature of its operation is inferred indirectly from evidence derived by stimulating its inputs or disabling its outputs. This evidence seems to establish that normal operation of the basal ganglia is essential to orientation and approach to stimuli, and initiation of voluntary behavior and complex learned behavior, particularly that involving anticipatory actions. The convergence in the basal ganglia of processed sensory information from many areas of the cortex provides a source of feedback information which can interact with and modify basic action plans generated by other cortical areas. Damage to the basal ganglia causes a loss of the ability to modify complex actions and judgements on the basis of sensory feedback. (This sort of feedback modification is distinct from the nonspecific sustaining action of feedback from the reward detector circuitry.) Finally, as predicted by the model outlined above, damage to the diffusely connected third input component results in failure to initiate behavior or orient to and approach stimuli, while stimulation of this component results in continuation of the immediately preceding behavior.
There is also a growing body of evidence to indicate that the type of learning called "operant conditioning" (see the preceding article in this series) may depend on, or even occur in the basal ganglia. This type of learning essentially involves an increase in the future ability of a behavior pattern to compete with other potential behaviors if it is followed by activation of the reward system. To achieve this, all that would be required in addition to the basal ganglia model we have described here would be provision for activity in the diffuse input from the reward system to lower the firing threshold of neurons which were active at the time of this input. No such mechanism is presently known, although it is suspected, but in our robots it would be easily contrived.
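One way such a mechanism might be contrived in a robot is sketched below. It is purely speculative, as the text itself notes that no such mechanism is known in the brain; the names `reinforce` and `winner`, the behavior labels, and the learning rate are invented for illustration.

```python
def reinforce(thresholds, active, reward, rate=0.1):
    """Lower the firing threshold of elements active when reward arrives."""
    updated = dict(thresholds)
    for name in active:
        updated[name] = max(0.0, updated[name] - rate * reward)
    return updated


def winner(drives, thresholds):
    """The behavior whose drive most exceeds its threshold competes best."""
    return max(drives, key=lambda name: drives[name] - thresholds[name])


thresholds = {"press_bar": 1.0, "groom": 1.0}
# press_bar was active when the diffuse reward input fired
thresholds = reinforce(thresholds, active={"press_bar"}, reward=1.0)
# with equal drive, the rewarded behavior now wins the competition
chosen = winner({"press_bar": 1.0, "groom": 1.0}, thresholds)
```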
An electronic analog of the model of basal ganglia action described here is shown in figure 5. (This model does not include the learning function just described.) The essential features are: the provision of a set of gates to encode the simultaneous inputs from the many cortical regions which contribute to the design of the behavior; a circuit which shuts off the encoded output after a brief delay; and an enabling bus representing the input from the reward system which inhibits the shut off circuit on active gates. This model is only illustrative, and better ones could be designed to mimic basal ganglia function. For example, the intensity of activity in the enabling bus should be employed to modulate the intensity of the output.
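The three features listed can also be expressed in software. The sketch below is not a transcription of figure 5; the class `GatedOutput`, the tick based timing, and the choice to have the enabling bus reset the shut off timer are assumptions made for illustration.

```python
class GatedOutput:
    """One output line: AND gate, delayed shut off, reward enable."""

    def __init__(self, required_lines, hold_ticks=2):
        self.required = frozenset(required_lines)
        self.hold_ticks = hold_ticks      # ticks before the output quenches
        self.ticks_on = 0

    def step(self, active_lines, reward_enable):
        if not self.required <= set(active_lines):   # AND gate not satisfied
            self.ticks_on = 0
            return False
        if reward_enable:                 # enabling bus inhibits the shut off
            self.ticks_on = 0
            return True
        self.ticks_on += 1
        return self.ticks_on <= self.hold_ticks


def run(gate, reward_sequence, lines=("visual", "touch")):
    """Hold the cortical inputs constant and vary only the reward bus."""
    return [gate.step(lines, reward) for reward in reward_sequence]


unrewarded = run(GatedOutput({"visual", "touch"}), [False] * 6)
rewarded = run(GatedOutput({"visual", "touch"}),
               [False, True, True, True, False, False, False, False])
```

Without reward the output quenches after two ticks; with reward arriving in time it persists, and then quenches once the reward stops. Modulating output intensity by reward intensity, as suggested in the text, is not modeled here.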
In practice, considering the very large number of gates required, and the fact that operation of the system is slow since it requires direction from physical results of actions, it will probably be best to simulate much of the gating and modulation in software on a fast processor. A few relevant principles are worth noting here. The ratio of input to output lines in the basal ganglia is very high. It receives input fibers from the entire cerebral cortex, which is by far the largest structure in the human brain. Output neurons on the other hand comprise less than five percent of the neural complement of the basal ganglia. Clearly a great deal of encoding takes place here; output line permutations are selected by gating an enormous number of inputs. Consistent with this, the outputs undergo an equally enormous decoding and fan out into the entire downstream motor system, ultimately specifying the actions of billions of LMN units. The basal ganglia outputs thus represent a "narrow spot" in the system, through which most of the organism's complex goal directed behavior passes. Similarly, the reward system which provides the gating or modulating input to this information flow represents the ultimate distillation of analysis of the entire sensory world of the organism as it pertains to reward. The amount of processing going on at higher levels to generate behavior patterns, and the amount required to evaluate their effectiveness is awe inspiring. Yet, the closing of this most complex feedback loop of all time is carried out relatively easily thanks to interaction at the "narrow points" of the two systems in a simple decision to keep going or quit doing what you're doing. The need for specific feedback to the behavior generating elements is thus eliminated. They simply try something else which they derive from established hierarchies or generate from similarities with past situations.
If we are to provide the capacity for robot behavioral systems to modify large scale behavioral strategies on the basis of evaluation of their effects, or if we wish to provide an operant conditioning capability, it will be necessary to gate or modify massive amounts of information. The most hardware conservative approach may well be to emulate the basal ganglia system by allowing a simple statement of the evaluative system's reaction to perform a "more or less" modulation of the output of the behavior generators at a highly encoded "narrow spot," and leave the behavior generators to try again according to trial and error algorithms, rather than trying to correct them directly. Specific feedback information of a nonevaluative sort, such as corrections to intended position from visual observation of the limb, become part of the command pattern prior to modulation by the evaluative system, simply by being part of the input pattern to be processed in generating the next attempted output patterns. These inputs could be handled by a software gating system, given processor speed, and the intensity of the evaluative function could be digitally coded and applied by software arithmetic rather than by mimicking the brain's analog system.
Having looked at the detailed operation of some of the important components of the brain's motor output system, let us finish with it by looking at a schematic summary which emphasizes the interactions of the different parts of the system. Figure 6 shows the main routes of information flow in the system, together with the major controlling inputs. Some of the "black boxes" such as "reward system" will be covered in future articles.
One of the outstanding features of the system taken as a whole is that it does function in an organized and integrated way, despite the fact that its parts are in many ways autonomous, and certainly not synchronized in their operation. A key to this capability is the provision of status information to each unit of the system by each of its neighbors, and the ability of each to employ this information in an intelligent way in formulating its own output. A further refinement is the provision of a structure such as the cerebellum where status information from diverse systems can interact to generate correction information which returns into the main line of the relevant systems. Wide scale availability of information from special movement relevant sensory input systems is another unifying feature.
If we leave out the "behavior generating system," which is properly a decision making system to be considered later, not a system for execution, we can discern four major portions of the motor system (although some structures service more than one portion). The first is a system which handles most of the routine traffic according to established rules, and provides automatic elaboration according to established rules when given high level commands. The second is a system which converts parallel statements of action patterns into serially executed instructions to the first system. The third system provides a highly intelligent output terminal which can access the final output elements directly in the service of any of the higher systems on request. The essential feature here is that it is a parallel control for refined special purpose control, and is not necessary for most routines. Finally, a fourth system provides for interaction of the high level decision making systems with elaborately processed feedback information to generate complex instructions to the other systems, after screening them for effectiveness.
In this constellation of functions, we find the capability to deal with rapid emergency movements, automatic compensation for externally imposed deviations, fine graded control under the direction of any sensory input, and the execution of arbitrary novel patterns. The organizing principle which seems to best define the system is its emphasis on successively more abstract command functions at higher levels in the system, and a corresponding increase in "situation free" statements. That is, a high level element can issue a "walk" command without being concerned about the nature of the terrain. It has distinct analogies to high level programming languages. We shall see a similar organization in reverse in the sensory systems, where detailed information at the receptor level is gradually reduced to powerful statements of object recognition, independent of details of the sensation as the information ascends in the system.
Even with all of this elaborate apparatus to direct and coordinate body motion, the problem of movement in the generalized environment remains a challenging one. Despite the massive investment in processing power that the brain has devoted to the problem, we still fall down sometimes. Producing a robot system that even approaches the brain's abilities will be a great challenge.
Llinas, R, "The Cortex of the Cerebellum," Scientific American, January 1975.
Kornhuber, H, "Cerebral Cortex, Cerebellum, and Basal Ganglia: an Introduction to Their Motor Functions," Neurosciences, Third Study Program, Schmitt, F, and Worden, F, eds, MIT Press, Cambridge, 1974.
Perhaps the most remarkable feat performed by organic brains is the resolution of the flood of data flowing from the sensory receptors into conceptually meaningful elements. It is also one of the most difficult tasks faced by the designer of robot systems. Consider the nature of the information which the brain receives about the visual world. Patterns of light, of varying wavelength and intensity, are imaged on the retina of the eye by the lens. This illumination results in a barrage of neural impulses flowing through millions of fibers in the optic nerve and activating neurons in a portion of the cerebral cortex called the primary visual cortex. Obviously, our visual experience is nothing like this barrage of impulses in axons. We "see" objects, colors, groupings of objects, all interpreted in meaningful terms. Our experience of the visual world is thus a far different matter from the visual stimulus which initiates the experience. It is not necessary that a robot have "experience" as we do, but it is necessary for one to resolve the sensory information into behaviorally relevant elements as we do.
To understand the nature of the operations that are being performed, we must carefully discriminate between the terms "stimulus" and "sensation." "Stimulus" refers to the actual physical event that activates a receptor. In the case of vision, this would be a ray of light falling on the retina. Since the intensity and wavelength of this light are determined by properties of the physical object which reflects it, and since there is a fixed relation between the two, we often refer to the reflecting object as the stimulus, with the understanding that its action as a stimulus depends on the properties of the light it reflects. "Sensation" on the other hand refers to a property of our mental experience which results from certain kinds of activation of our receptors. There is a close relation between the sensations we have and the stimuli which produce them. Our senses would be useless if it were not so. However, this close relation often leads us to confuse the two, and this is a great error, because they belong to entirely different worlds. A stimulus is a physical object; a sensation is a mental event.
To clarify the distinction, consider the sensation of the color red. We all know what we mean when we say an object "is red," or that we "see a red object." A moment's reflection though will demonstrate that, strictly speaking, there can be no such thing as a red object. The object can only possess or not possess the properties necessary to reflect light of a particular wavelength. If an object reflected light of a wavelength which gave rise to the sensation "red," and we were to somehow change the wavelength between the object and the eye, the object would appear to be of some other color. Can we then say that "redness" is a property of the light? No, because the only relevant physical property of the light is its wavelength, and wavelength is not a color. Color is a property of your sensation.
Sounds have wavelengths too, and there the sensation is interpreted as pitch. Wavelength is only a piece of information which the brain can interpret as it will. In your computer, you could make an analogy with ASCII code. We use particular bit patterns to represent letters and numbers, but the same bit patterns could just as easily represent something else. There is nothing that inherently requires the binary pattern 01000001 to be interpreted as A. Similarly, "redness" as an interpretation of a particular wavelength is simply a convention that the brain uses.
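The ASCII analogy is easy to try for yourself. In this small sketch the same eight bits are read three ways; the variable names are ours, and the third reading (eight independent on/off lines) is just one more arbitrary convention chosen for illustration.

```python
pattern = 0b01000001

as_letter = chr(pattern)    # read by the ASCII convention
as_number = pattern         # read as an unsigned integer
# Read as eight separate on/off lines, least significant bit first.
as_flags = [bool((pattern >> bit) & 1) for bit in range(8)]

print(as_letter, as_number, as_flags)
```

Nothing in the bit pattern itself favors one reading over another; the interpretation lives entirely in the decoder.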
The situation becomes clearer if one goes into the brain a little further. Sensation is not a result of activating the retina or the optic nerve. If the optic nerve is cut, light falling on the retina produces no sensation, even though the retinal neurons and their axons in the optic nerve are activated. At the same time, however, artificially activating the visual cortex to which the nerve used to project will result in visual sensations. It follows that sensation is, or is dependent upon, the firing of neurons in the visual cortex. Yet, after striking the retina, light never reaches the visual cortex. All that does reach it is a pattern of neural activity in the axons of the retinal cells. Information about the wavelength of the light striking the retina is carried in the optic nerve by place code. That is, the wavelength information is carried to the visual cortex in terms of which lines are active. There is certainly nothing that seems intuitively "red" about which of a set of axons are carrying impulses. Yet, the sensation of "redness" clearly occurs at or beyond the cortex, after that encoding process. If we could somehow change which set of axons were active between the retina and the cortex, the sensation produced by the light would change.
Now if we accept the notion that sensations are mental events that are produced, or at least determined, by the brain decoding stimulus produced activations of receptor lines according to specified conventions, it becomes clear that the process of sensation is basically one of information processing. The nature of the conscious "experience" of the processed data is a topic we shall take up later. For now, our objective will be to examine the kinds of transformations the brain imposes on its input data and to ask why this particular transformation and not some other is useful to the organism in dealing with the environment. The utility of such a pursuit lies in the fact that the most likely system for detailed examination of distant objects in an artificial robotic system will use an image forming system acting on a grid of sensitive transducers. The problems of information processing in such a system will be exactly those that the brain has solved.
There are about a million light receptive elements in the retina, and the brain produces a complete analysis of their patterns of illumination about ten times per second. If this were done in a straightforward manner, say by examining all the possible permutations of a million bits of information and decoding it against a table of known codes, a tenth of a second's worth of vision would be too big a job even for the brain to handle in a reasonable time. In fact, it goes to some extremes to cut corners in this process, and some of its tricks are of quite general utility. The first step in the process is to make a number of decisions about what not to look at.
If an area of uniform illumination is bounded by an area of some other degree of illumination, the information from the center of the area is superfluous. That is, if one had a system that could detect only boundaries between different illumination levels, the center of a uniform bounded area would not produce a signal. Yet, information about its illumination could be accurately reconstructed by simply extrapolating the illumination level on the inside edge of the boundary clear across to the next boundary. If the level at the inside edge of the boundary did not hold clear across the area, that would mean that there had to be a change, and hence a boundary, somewhere in the middle, and that boundary would be detected. Any change in illumination constitutes a boundary between a lighter and a darker region. Thus, if only boundaries can be detected, extrapolation of levels on either edge of a boundary to the next boundary reconstructs the whole field of illumination.
The reduction in the number of points to be considered which is achieved by considering only boundaries is quite large. Think of a square patch of retina 100 receptor cells on an edge, illuminated at level A on the right half, and level B on the left. If we had to examine every element's illumination to arrive at a picture of this pattern, we would have to examine 10,000 elements. If we now examine only the ones near the boundary between area A and area B, and extrapolate the rest, we have to examine only about 100 elements. In general, the number of elements in a uniform area grows approximately as the square of the length of its boundary, so the savings increase rapidly with the size of the area.
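The extrapolation argument can be checked on a single row of the patch just described. The sketch below is a minimal illustration, not a model of the retina: the function names are hypothetical, and we treat the left end of the row as a boundary so that the first level is known.

```python
def encode_boundaries(row):
    """Keep (index, level) only where the level differs from the left neighbor."""
    edges = [(0, row[0])]             # assumption: the row's left end counts as a boundary
    for i in range(1, len(row)):
        if row[i] != row[i - 1]:
            edges.append((i, row[i]))
    return edges

def reconstruct(edges, length):
    """Extrapolate each level across to the next boundary."""
    row = []
    for k, (start, level) in enumerate(edges):
        stop = edges[k + 1][0] if k + 1 < len(edges) else length
        row.extend([level] * (stop - start))
    return row

side = 100
row = ["B"] * (side // 2) + ["A"] * (side // 2)   # level B on the left, A on the right
edges = encode_boundaries(row)
print(edges, reconstruct(edges, side) == row)
print(side * side, "elements in the patch versus about", side, "along the boundary")
```

Two entries stand in for a hundred elements of the row, and the full row comes back exactly.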
We will return in a moment to the matter of how the brain locates spatial boundaries, but first mention should be made of the next shortcut, because their underlying mechanisms are related. Basically, this second trick is to look only at things that change. Aside from the fact that changing patterns of illumination usually imply moving objects, and that these are usually important items in the sensory world, special attention to change also has advantages in terms of processing time. The situation is really very similar to the preceding one, except that here we must think of change as representing a temporal boundary between illumination levels. If we only attend to an element when its illumination changes, and if we always know when it does, we can safely ignore it in the meantime. This is because the illumination during the intervening period of no change must be at whatever level the preceding change brought it to. Thus, it is only necessary to extrapolate the value immediately following a change until the next change is detected.
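The temporal version of the trick looks much the same in software. In this sketch (the names and sample values are made up for illustration) an element reports only when its illumination changes, and the receiver holds the last reported value until the next report arrives.

```python
def change_events(samples):
    """Report (time, value) only when the value differs from the previous sample."""
    events, last = [], None
    for t, value in enumerate(samples):
        if value != last:
            events.append((t, value))
            last = value
    return events

def hold_last(events, duration):
    """Rebuild the full record by extrapolating each value until the next change."""
    pending = dict(events)
    out, value = [], None
    for t in range(duration):
        value = pending.get(t, value)
        out.append(value)
    return out

samples = [3, 3, 3, 7, 7, 2, 2, 2, 2]
events = change_events(samples)
print(events, hold_last(events, len(samples)) == samples)
```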
The eye is sensitive to two dimensions of light, intensity and wavelength, which we perceive as brightness and color. We have discussed the two boundary situations, spatial and temporal, only in terms of brightness so far, but the same arguments apply to boundaries of color. Two areas of equal brightness but different color also must be discriminated. The same mechanisms actually are applicable to both, since the brain handles color by providing some receptor elements with differential sensitivity to different wavelengths. For these elements, a change in wavelength effectively is a change in illumination. It will either be from a wavelength to which the element is sensitive to one to which it is not, or vice versa. The brain handles the color information simply by recognizing the output of these elements as encoding the wavelength information, and interprets it as color. The color boundary problem therefore reduces to the intensity boundary problem.
Now let us examine the mechanisms of boundary detection. The temporal boundaries, that is, changes in illumination with time, are responded to selectively by a process that is similar in its results to AC coupling the receptor elements. In fact, AC coupling of analog to digital converters with an appropriate time constant would be a good way to model the process in a robot. In the brain, it is simply a property of the receptor neurons themselves, and the details need not concern us. The interesting thing is that the brain uses this same AC coupled characteristic of the neural elements to detect both spatial and temporal boundaries. A selective sensitivity to change, or temporal boundaries, is inherent in the AC coupling, but a sensitivity to spatial boundaries requires some additional mechanism.
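For a robot, AC coupling need not even be done in hardware. One way to get the same effect after the analog to digital converter is a first-order digital high-pass filter; the sketch below assumes evenly spaced samples, and the time constant `tau` and step `dt` are illustrative values, not figures taken from the brain.

```python
def ac_couple(samples, tau, dt=1.0):
    """First-order high-pass filter: passes changes, lets steady levels decay to zero."""
    alpha = tau / (tau + dt)
    out = []
    prev_x, prev_y = samples[0], 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

steady_then_step = [1.0] * 5 + [4.0] * 20
response = ac_couple(steady_then_step, tau=2.0)
print([round(y, 3) for y in response])
```

The output is zero while the input is steady, jumps when the illumination changes, and then decays back toward zero even though the new level is maintained.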
We are all aware of course that the eye moves. We observe it all the time when our gaze turns from one point of fixation to another, or when it follows a moving target. In addition to these motions, however, there is another that is not detectable by ordinary means. Even when the eye seems to be at rest, even when you are holding your gaze as intently as possible on a fixed point, there is still a very fine motion with a frequency of about 10 Hz. The amplitude of this motion, which is rather erratic in its direction, is just sufficient to move the retinal image back and forth over the receptors by a distance equal to a few times the average separation between the sensitive elements. Those elements that are near a boundary are thus swept back and forth continuously from the lighter to darker sides of the boundary at about 10 Hz. This produces in them a changing signal of the sort to which the AC coupled property of the system can respond. At the same time, their neighbors further from the boundary in either the lighter or darker regions to either side are not moved into a region of different illumination level. Hence, they "see" an unchanging input, to which they are insensitive. The receptor elements of the eye itself therefore act as intelligent terminals which transmit only information about boundaries and changes to the higher levels, with an enormous savings in amount of input requiring attention from more sophisticated analyzers.
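The effect of the fine eye motion is easy to simulate on a single row of receptors. This is a toy sketch: the image, the set of jitter offsets (here up to two receptor spacings either way), and the rule that a receptor "responds" whenever its input varies at all are assumptions made for illustration.

```python
def responding_receptors(image, offsets):
    """Return indices of receptors whose input changes as the image jitters."""
    active = []
    for i in range(len(image)):
        seen = set()
        for shift in offsets:
            j = min(max(i + shift, 0), len(image) - 1)   # clamp at the ends of the row
            seen.add(image[j])
        if len(seen) > 1:             # a changing input: the AC coupled element responds
            active.append(i)
    return active

image = [0] * 50 + [1] * 50           # dark on the left, light on the right
active = responding_receptors(image, offsets=[-2, -1, 0, 1, 2])
print(active)
```

Only the handful of receptors straddling the boundary report anything; the other ninety-odd see an unchanging input and stay silent.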
Now I hear you say, "Yes, but I can see the insides of uniform areas." True, but remember I said your sensations were an arbitrary decoding of the stimulus information, and that the information from areas distant from boundaries was redundant and could be reconstructed by inference or extrapolation. The experience of "seeing" the inside of the area is simply the experience of receiving the appropriate code from the right set of boundary activated elements. In the first place, it is relatively easy to demonstrate that you cannot see anything if there is no change. By virtue of some clever optics it is possible to stabilize an image on the retina so that it does not move with respect to the receptor elements, despite the fine motions of the eye. When this is done, the image seems to disappear about a tenth of a second after it is presented: poof! It is of course still really there on the retina, but your AC coupled system can't respond. Now, consider a green disk with a smaller red disk in the center. It is possible to stabilize just one portion of this image in the same fashion that we stabilized the whole image a moment ago. If we choose to stabilize just the boundary between the green outer ring and the red inner disk, it should not be possible for the brain to detect that boundary. If this is done, not only do you not see the boundary, you also don't see the inner red disk. What do you see? You see an unbroken green disk all the way across. In other words, if no boundary is detected in the middle, the brain not only doesn't see the red disk, it extrapolates the green all the way across from one outer boundary to the other. Think about it the next time you rely on the evidence of your eyes: such evidence must be interpreted with knowledge of the system's characteristics.
The AC coupling is not perfect; there is a "DC leak" around it, but the "changing signal only" property of the neurons is enhanced at each step in the transmission process, until the cells of the visual cortex are found to have almost no response at all to unchanging uniform illumination of the retina. This means that the sensory experience of the interiors of uniform regions is simply what is coded for at the cortex by the byte of information on the boundary conditions. It is not a result of direct translation of retinal illumination conditions on a point for point basis into activation of some set of "experience neurons." It is important to grasp this idea, because it points up the fundamental similarity between the natural brain and the artificial computer. There is no "inner eye" looking out through neural windows. If the encoding process ultimately produces a single neural line that is activated by, say, the sight of a face, then that line being active is sufficient for the processing of response to the face, in man or robot; and at least in us, it is also sufficient for our correlated mental experience.
That this is the nature of the encoding process for our experiences becomes even clearer at higher levels of the encoding process than those involved in the green and red disk experiment. At some level of the process, referred to as "feature extraction," we arrive at a byte of active lines which encodes for some complex pattern. Take for example the repeated patterns of a wallpaper covered wall. It seems that even at this level, the brain continues its policy of dropping redundant information and carrying forward only information on boundaries. If we look at such a wall, we of course see a continuous pattern repeated all the way to pattern boundaries such as the ceiling, floor, edges of intervening furniture, etc. Now suppose we present this same scene to a person with damage to certain high levels of the visual system, having no vision in a particular small region of the visual field. If his injury is at the right level in the feature extracting process, he will report seeing the unbroken wallpaper pattern just as we do, including the region within which he is "blind." It can be demonstrated, however, that his experience of the pattern in the blind region is due to the fact that both he and we are extrapolating the detected pattern across the intervening space between pattern boundaries. His deficit becomes apparent when we create a boundary in the pattern within his blind region. For example, if we inverted a small patch of the pattern, it would constitute a boundary in the pattern, and we would not extrapolate across it. If it occurred in his blind region, however, he would not react to the pattern boundary and would receive the same encoded byte of visual information as before, and claim that he saw an unbroken wallpaper pattern. In an important sense he is blind, yet he has visual experience. You do the same thing. There is a blind spot in the visual field where the optic nerve leaves the retina. You can make small objects disappear by centering them there, but since you can't see boundaries there either, your brain normally extrapolates across it.
Now we might ask, if the brain is reducing complex features of the visual stimulus to a simple code of one or a few lines, does that mean there are things we might not have feature extractors for, and if so, would we be unable to see them? That is probably exactly the case. Experiments suggest that the visual world of simple creatures like frogs is quite impoverished. They have some kinds of elementary feature extractors, and some complex ones for stimuli (eg: bugs) which are important to their behavior, but nowhere near the complex set of feature extractors that a mammal has. In theory it would be possible to have a unique line or coded set of lines activated by every possible combination of activities on the retina, but this would be beyond even the capacity of the nervous system to generate processing elements. Instead, certain decisions are made as to what things are important to see, and decoding for these is provided. This does not imply that you would not see anything when looking at a novel stimulus for which you have no appropriate high level extractors. At the first level, simple features such as edges, arcs, lines and spots are extracted. More complex features are extracted from combinations of these. You might be aware only of the activity of the low level extractors for lines, edges, etc, and fail to recognize it as an object, or you might fail to discriminate it from objects which were not identical, but differed in ways which did not correspond to features you could extract.
As an example, it is possible to fool high level extractors by giving them marginal data. Look at figure 1. About 95 percent of people seeing this picture for the first time are only able to activate low level extractors for patches of bounded light and dark. It is in fact a photograph of the head and upper forequarters of a black and white cow (facing left) against some trees and a fence. Once you know what to look for, you can nudge the "cow extractors" and get an entirely different experience. Indeed, once you've seen it, it's difficult to not see it. (Don't panic if you can't; about 5 percent of people never see it.)
Actually, there is probably no "cow extractor" per se, but rather some assemblage of feature extractors which together constitute a code for "cow." Let's look however at some of the properties which such high level extractors should have. The most important one is that they should be free of constraints on position, orientation, context, etc. That is, if we had to have a separate extractor for every position the stimulus might assume in the visual field, we would need so many elements that the advantages of the feature extractor approach would be lost. Next, they should be capable of implementation by learning, so that the available processing elements can be best used to fit the organism's normal visual environment. Third, they should not be limited to spatial forms, but should include detectors for properties such as motion, distance, and other aspects of our visual experience. These are difficult problems, and we have no good notion of the real number or nature of the highest order extractors in the human visual system. We can examine some of their properties by fatiguing the extractors through prolonged exposure to different types of stimuli and looking at the effects on our visual abilities. In animals, we can follow the process by recording activities of neurons in the visual system during presentation of stimuli to the eye.
From these latter experiments, we have a fairly clear notion of the operation of the lower order extractors, and the process seems easily extensible to higher order features. To serve as a general example of the algorithm, I will describe in detail the process by which a feature extractor is formed which can detect a line segment only if it is at a particular angle of inclination to the visual field, but which is location independent. That is, it does not matter where the line is located in the visual field, only that it be a line and that it possess a certain angle of inclination. This sort of unit appears to be one of the typical low level feature extractors of mammalian visual systems.
The basic gating action used is very similar to an AND gate. As we have mentioned, this is one possible mode of action of neurons, which can be implemented by having a number of inputs required to achieve firing threshold. In this case, however, we have an AND gate with a safety factor. By this I mean that firing level is achieved if some percentage of the relevant inputs are active: 100 percent is not required. Think of it as an "ALMOST gate." (The brain, unlike our conventional computers, is continually dealing in "best guesses" rather than precise solutions, by just this technique. This is why we make mistakes, but it also provides for inductive leaps of enormous power that are right most of the time.) Connecting a grid of two legged AND gates together, as in figure 2, illustrates the basic logic of the scheme. At the bottom level we have a line of receptors. Above these are several levels of two legged AND gates, culminating in the top level with only two elements. It is clear that the two top level elements will discriminate between two patterns of activation of the bottom row which differ only by one element. Thus, activation of element A encodes the activity of a set of bottom level elements indicated by bracket A, and element B and bracket B represent a different set. If the bottom row were retinal receptor elements, A and B could be feature extractors for illumination conditions (A) and (B), which are quite similar. It should be apparent that with enough gates and elements, this sort of general convergence scheme could be employed to extract any feature. This being impractical, the brain adds two principles which enormously reduce the processing required, at the expense of generality. 
Once the set of retinal activation patterns to be recognized has been selected, specific feature extractors for that pattern are built from the underlying type of logic illustrated in figure 2, but modified by the addition of processes called "selective convergence" and "lateral inhibition." The meaning of these terms will become clear shortly. At the lowest levels, only a few simple types of feature extractors are implemented and higher levels build progressively on these. To begin with, let us examine the first step in this process.
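The "ALMOST gate" and the converging grid of two legged AND gates described above can both be written down directly. This is a sketch of the logic only: the function names and the 75 percent firing fraction are assumptions chosen for illustration, and the grid follows our reading of the scheme in which each gate takes its two inputs from adjacent elements of the level below.

```python
def almost_gate(inputs, fraction=0.75):
    """Fire if at least `fraction` of the inputs are active (1.0 gives a strict AND)."""
    return sum(inputs) >= fraction * len(inputs)

def and_pyramid(receptors):
    """Pass a row up through layers of two-input AND gates until two outputs remain."""
    layer = list(receptors)
    while len(layer) > 2:
        layer = [a and b for a, b in zip(layer, layer[1:])]
    return layer                      # [element A, element B]

pattern_a = [True, True, True, True, False]   # bracket A: all but the last receptor
pattern_b = [False, True, True, True, True]   # bracket B: all but the first receptor
print(and_pyramid(pattern_a), and_pyramid(pattern_b))
print(almost_gate([1, 1, 1, 0]), almost_gate([1, 1, 1, 0], fraction=1.0))
```

The two top elements separate patterns that differ only at the ends of the row, while the ALMOST gate accepts an input with one line missing that a strict AND would reject.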
Within the retina itself, there are several levels of processing resulting in an output neuron, a retinal ganglion cell (RGC), which sends its axon into the optic nerve to enter the brain. If we record from these RGC neurons, we find that they can be classified into a few basic types depending on the kinds of stimulus to which they maximally respond. Figure 3 shows the portions of the visual field which affect the activity of a typical RGC type, and figure 4 shows the connections which result in this type of response. We see that the RGC receives positive synapses from a small group of receptor elements located in a central spot (+ region), and inhibitory synapses from receptor elements in a ring surrounding this spot (- region). Remember that the type of synaptic effect is the choice of the receiving neuron; the receptor elements in the "inhibitory surround" area are free to make facilitatory positive connections with other RGCs. Now, when the central spot receives light, it increases the firing rate of the RGC. When the inhibitory surround is illuminated, it decreases the firing rate of the RGC. (In this and all subsequent descriptions, it is to be taken as understood that we refer to the intermittent presentation of the stimulus, either deliberately or through fine motion of the eye, since the AC coupling properties would tend to eliminate the response to any maintained stimulus.) If the entire retinal area which affects our RGC is illuminated, the excitatory and inhibitory effects tend to cancel. Here as elsewhere in the visual system, there is thus little response to diffuse light. Notice that due to the shape of the inhibitory and excitatory regions, a line of light just the width of the excitatory center spot, and crossing the entire active area, would fire retinal elements in both the inhibitory and excitatory regions. However, such a stimulus would fire the entire excitatory central region, but would only fire a small percentage of the inhibitory elements since it only crosses the inhibitory ring in two spots (see figure 5). The response to a line stimulus crossing the central spot would therefore be strongly positive, although less so than to a stimulus which did not touch the inhibitory region.
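The arithmetic of the center and surround can be tried on a small grid. The sketch below makes several simplifying assumptions of our own: a 7 by 7 patch of receptors, a square 3 by 3 center in place of a circular spot, and excitation and inhibitation each normalized so that fully lighting either region contributes a magnitude of 1.

```python
SIZE, MID = 7, 3

def in_center(r, c):
    return abs(r - MID) <= 1 and abs(c - MID) <= 1

CELLS = [(r, c) for r in range(SIZE) for c in range(SIZE)]
CENTER = [p for p in CELLS if in_center(*p)]
SURROUND = [p for p in CELLS if not in_center(*p)]

def rgc_response(lit):
    """Net drive on the RGC: lit fraction of the center minus lit fraction of the surround."""
    excite = sum(p in lit for p in CENTER) / len(CENTER)
    inhibit = sum(p in lit for p in SURROUND) / len(SURROUND)
    return excite - inhibit

spot = set(CENTER)                                       # light on the center only
diffuse = set(CELLS)                                     # the whole field illuminated
bar = {(r, c) for r in range(SIZE) for c in (2, 3, 4)}   # a line as wide as the center

print(rgc_response(spot), rgc_response(diffuse), round(rgc_response(bar), 3))
```

The spot gives the full response, diffuse light cancels to nothing, and the bar falls in between: strongly positive, but reduced by the part of the surround it crosses.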
There are several other basic types of RGC organization with regard to the sizes and shapes of the retinal areas whose illumination affects them. For example, another common type has the inverse of the type of receptive field just considered, that is, an inhibitory center area surrounded by an excitatory ring. We shall not pursue these in detail, but pass on further into the brain with the development of our abstracted inclined line detector. The next way station, the target of the optic nerve, is a nucleus of the thalamus, called the "lateral geniculate nucleus." The axons of the RGCs make synaptic contact with the cells of this nucleus just as the retinal elements made contact with the RGCs. If we record from these cells while testing for retinal areas that excite or inhibit them, we find that they have response patterns rather similar to the RGCs. That is, central spots and oppositely acting surrounding rings, etc. The active fields tend to be somewhat larger, but it appears that the thalamic cells receive positive inputs from RGCs whose positive centers are close together, so that essentially the same pattern is maintained. This is illustrated for a typical "on-center, off-surround" thalamic cell in figure 6. Actually, at this level there is a great deal of additional processing going on that has to do with modifications on the basis of data returning from the cortical areas to which the thalamic cells project, and also data from other brain regions which have input to the visual analysis system. We shall speak more of these other inputs later, but for now we shall follow our line detector system on to the cortex.
The axons of the thalamic cells make contact with a class of cortical cells known as "simple field cells." These cells typically respond preferentially to more complex stimuli, such as lines of light at particular inclinations, and particular locations. Usually, it is found that a column of cortical simple field cells deals with the analysis of some small area of the retinal image, and contains a large number of such line analyzers. Each is responsive to a line at a slightly different angle, but all are concerned with the same small area. Figure 7 shows how such a line detector can be constructed from the output of the thalamic cells. What is required is that the thalamic cells which have a positive influence on the firing of the cortical simple field cell are selected to be ones that have their own circular excitatory centers in the visual field in a straight line at some angle to the vertical. If the inputs to our cortical cell are so selected (sélective convergence), then the optimal stimulus for firing it will be a line of light which passes through the excitatory central spots of all the lower echelon thalamic cells' receptive fields. Such a stimulus will produce a strong (but sub -maximal) firing in each of the thalamic cells since it will intersect some of the inhibitory territory of each, but such a sub -maximal output from each of them is the maximal input for the cortical level cell. Now look what happens if the line is turned at a different angle, or moved to a different position as in figure 8. If the angle is not aligned with the line of the "oncenters," one thalamic cell will show a positive response; but the others will not, and may have their inhibitory areas activated. There is thus little input to the cortical line detector. If the line is kept at the correct angle, but moved to the side, it falls in all inhibitory territory, or else beyond the active region altogether. 
Thus, while our simple field cell can discriminate angle, it also discriminates location.
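The selective convergence just described can be tried out in a few lines of Python. What follows is only a rough sketch under assumptions of my own, not a model of real cells: each "thalamic cell" is taken to be one image point minus the average of its eight neighbors, with negative results floored at zero, and the names center_surround and simple_cell and the 9 by 9 test images are invented for the illustration.

```python
import numpy as np

def center_surround(image, row, col):
    """Assumed on-center, off-surround unit: the center point minus the
    mean of its 8 neighbors, floored at zero (a cell cannot fire negatively)."""
    patch = image[row - 1:row + 2, col - 1:col + 2]
    surround = (patch.sum() - image[row, col]) / 8.0
    return max(0.0, image[row, col] - surround)

def simple_cell(image, centers):
    """Selective convergence: add up only those thalamic units whose
    on-centers lie along one straight line."""
    return sum(center_surround(image, r, c) for r, c in centers)

SIZE = 9
# Hypothetical wiring: five on-centers stacked vertically in column 4.
CENTERS = [(r, 4) for r in range(2, 7)]

def line_image(kind):
    """Build a test image holding one line of light."""
    img = np.zeros((SIZE, SIZE))
    if kind == "aligned":      # vertical line through every on-center
        img[:, 4] = 1.0
    elif kind == "rotated":    # horizontal line crossing only one on-center
        img[4, :] = 1.0
    elif kind == "shifted":    # correct angle, moved one column sideways
        img[:, 5] = 1.0
    return img

responses = {k: simple_cell(line_image(k), CENTERS)
             for k in ("aligned", "rotated", "shifted")}
print(responses)
```

In this toy version the aligned line drives each thalamic unit to 0.75 of its maximum (it clips two surround points of each) for a total of 3.75, the rotated line excites only the one unit it crosses, and the shifted line falls entirely in inhibitory territory and produces nothing.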
The next level of abstraction is reached with the so-called "complex field cells" of the cortex. A typical example of one of these would be a cell of the type we have been seeking, one which would respond to a line at a specified angle, located anywhere in a large area of the visual field. Such a cell is easily constructed if it can OR gate the outputs of a large number of simple field cells, all of which respond to a line at the same angle, but whose specific sensitive locations with regard to retinal position differ, and are spread over a wide area, as shown in figure 9. Again, certain convergence patterns are selectively implemented. In this case the convergence principle is parallelism.
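The OR gating of simple field cells can be sketched the same way. The example below is an illustration under my own assumptions: the "simple cell" is a crude vertical-line detector tied to one column (light on the column minus half the light on its two flanking columns, compared with an arbitrary threshold of 3), and the function names are hypothetical.

```python
import numpy as np

def simple_cell_fires(image, col, threshold=3.0):
    """Hypothetical vertical-line detector tied to a single column:
    light on the column excites, light on the two flanks inhibits."""
    on_line = image[:, col].sum()
    flanks = image[:, col - 1].sum() + image[:, col + 1].sum()
    return bool(on_line - 0.5 * flanks >= threshold)

def complex_cell_fires(image, cols):
    """OR gate over simple cells that share one angle but differ in position."""
    return any(simple_cell_fires(image, c) for c in cols)

SIZE = 8
COLS = range(1, SIZE - 1)   # positions covered by the complex cell

def vertical_line(col):
    img = np.zeros((SIZE, SIZE))
    img[:, col] = 1.0
    return img

def horizontal_line(row):
    img = np.zeros((SIZE, SIZE))
    img[row, :] = 1.0
    return img

# A vertical line fires the complex cell wherever it falls in the covered area...
print([complex_cell_fires(vertical_line(c), COLS) for c in COLS])
# ...while a line at the wrong angle does not.
print(complex_cell_fires(horizontal_line(3), COLS))
```

Each simple cell in the sketch still cares about position; only the OR gate above them has given that restriction up.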
In actuality, the connections are not so straightforward as I have suggested; there is much up and down traffic from thalamus to cortex and back. There is much up and down traffic between different levels of the cortex as well. The principle however is essentially as illustrated. By continuing this type of operation, and by combining outputs of different types of cells, it is clear that feature extractors of any desired degree of complexity could be built. Arc detectors, edge detectors and numerous other types are already available at the simple field cell level. At the level of more complex feature extractors, which may be in areas of cortex outside the primary visual cortex, it is very difficult to determine the effective stimulus for a cell simply because of the enormous number of stimuli that it might respond to. In a monkey brain, for example, a cell has been reported by one researcher which responded only to the outline of a hand. It is not necessary for us to specify all these types of feature extractors for our purposes. The particular set that was most useful to a human's brain would probably differ from the most useful set for a robot brain. It is sufficient to see the principles by which the feature extractors can be constructed. Let us review these.
First, it is clear that not all possible combinations of retinal receptor activation are encoded by higher level cells. Rather, some types of features are settled upon as useful building blocks, and these are encoded by cells upon which the outputs of certain lower level cells converge. The particular set of lower level cells is selected on the basis of the spatial relationship of their receptive fields on the retina. This is the principle of selective convergence. Look again at figure 2; it is as if we had abandoned such a generalized system in favor of a more limited but more economical one by omitting some branches, and bringing several more from selected places together at each stage. Second, the response of the higher echelon cell is frequently fine tuned by provision of lateral inhibition. That is, lower echelon cells frequently have inhibitory projections both to their neighbors to either side at the same level, and to the neighbors of the higher echelon cell to which they send excitatory projections. The arrangement of these inhibitory projections is often chosen to help the cell discriminate against stimuli which are similar enough to the target feature to potentially generate some responding, if not full responding, in the cell. Thus, in the simple cortical field cell line detector in figure 8, if the angle of the line were only slightly off the desired angle, it still might cross the on-centers of several of the thalamic cells which input to the cortical cell, and cause some considerable response in the cortical cell. It should of course only activate some other angle detector. This difficulty is surmounted by the fact that in order to cross the first cell's line of on-centers at a small angle, the stimulus would also have to cross a large amount of territory which inhibited the thalamic cells' output. (If the angular discrepancy is large, of course, there is little problem.)
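The sharpening effect of lateral inhibition is easy to see with numbers. The sketch below is a simplification of my own: a row of detectors tuned to neighboring angles, each receiving some raw excitatory drive, with every detector's drive reduced by a fixed fraction (the assumed constant k) of the drive reaching its two immediate neighbors. The function name sharpen and the sample drives are invented.

```python
def sharpen(raw, k=0.5):
    """Subtract a fraction k of each immediate neighbor's raw drive,
    flooring the result at zero."""
    out = []
    for i, r in enumerate(raw):
        left = raw[i - 1] if i > 0 else 0.0
        right = raw[i + 1] if i < len(raw) - 1 else 0.0
        out.append(max(0.0, r - k * (left + right)))
    return out

# Raw drive on five detectors tuned to adjacent angles; the line shown
# matches the middle detector, but its neighbors are partly driven too.
raw_drive = [0.0, 2.0, 4.0, 2.0, 0.0]
sharpened = sharpen(raw_drive)
print(sharpened)   # [0.0, 0.0, 2.0, 0.0, 0.0]
```

Before inhibition three detectors would have reported the line; afterward only the one tuned to the correct angle does, at the price of a weaker response.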
The third principle of general relevance is related to the problem of how much input will be required to fire a higher echelon cell. Recall that the neurons are not functioning strictly as AND or OR gates, in that a certain percentage of inputs active is all that is required for firing. This ALMOST gate principle is one of enormous power, and we shall have more to say about its application to intelligence in a later article; but for the moment look at what happens in the sensory system if we let the percentage of inputs required for firing be an adjustable parameter. If we required that all the lines be active as in a conventional AND gate, we would have a perfectly accurate system, like any good conventional computer. We would also have a slow and insensitive system. To get all the inputs properly set up, we would need to wait for perfect alignment of the image, probably close up for good resolution, and have good illumination to avoid any marginal situations. It would be accurate, but your ancestors would never have reached reproductive age if they'd had to wait on that kind of situation before decoding the stimulus as a wolf. On the other hand, if we let the system be sloppy and fire feature extractors when only a small number of relevant input lines are active, we will get quick results, with a lot of errors. In particular, we would be unable to make fine discriminations amongst similar stimuli which would activate many lines in common.
This kind of error is easily demonstrated. Briefly flash a picture of a circle with a small piece missing on a screen, and your subjects will report that they saw a complete circle. Only if they get to examine the image longer will they be able to discriminate the broken circle from a complete one. Now clearly both modes of processing have their uses, and it would be nice to adjust the percentage input requirements of the ALMOST gates to suit the task at hand. This is done in the brain by axons from control regions of the brain outside the sensory system which make diffuse and widespread contact with large numbers of sensory processing elements. These inputs carry no specific visual information, but by excitatory or inhibitory action they can bias the processing elements towards or away from firing threshold, thus increasing or decreasing the amount of input from lines carrying specific information which is required before firing occurs. When this process is driven beyond normal limits, as with various drugs, the feature extractors can be biased so close to firing that little or even no input is required. The result is a variety of visual distortions and hallucinations.
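A minimal sketch of the ALMOST gate with an adjustable requirement follows. It rests on assumptions of mine: the inputs are plain on/off lines, the requirement is a fraction between 0 and 1, and the diffuse control input is modeled as a single number subtracted from that fraction. The names almost_gate and biased_fraction, and the 20-segment "circle," are hypothetical.

```python
def almost_gate(inputs, required_fraction):
    """Fire when at least the required fraction of input lines is active.
    A fraction of 1.0 behaves as an AND gate."""
    active = sum(1 for x in inputs if x)
    return active >= required_fraction * len(inputs)

def biased_fraction(base_fraction, bias):
    """Model of the diffuse control input: an excitatory (positive) bias
    lowers the requirement, an inhibitory (negative) bias raises it."""
    return min(1.0, max(0.0, base_fraction - bias))

# A circle detector fed by 20 arc segments (an invented arrangement).
complete_circle = [1] * 20
broken_circle = [1] * 19 + [0]    # one small piece missing
blank_screen = [0] * 20

strict = biased_fraction(0.8, -0.2)    # 1.0: every line required
relaxed = biased_fraction(0.8, 0.0)    # 0.8: quick but sloppy
overdriven = biased_fraction(0.8, 0.8) # 0.0: fires on nothing at all

print(almost_gate(broken_circle, strict))     # False: the gap is noticed
print(almost_gate(broken_circle, relaxed))    # True: "seen" as a full circle
print(almost_gate(blank_screen, overdriven))  # True: a hallucinated circle
```

The same gate thus gives the careful reading, the hasty misreading of the broken circle, and the drug-like firing with no input, depending only on where the bias is set.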
A fourth point worth noting is that the system resembles a pipelining type of processor. As soon as the cells of any echelon have fired in response to the current state of their inputs, succeeding echelons begin dealing with that fact while the earlier echelon begins to respond to the next state of its inputs. It is not clocked, it just all trickles through as fast as it can, but that only means that some things take longer to recognize than others. There is no need for it all to be processed in lockstepped stages like a real pipelining system. However, information can be siphoned off the line at any stage as well as being passed on to the next. If you need to catch a fast moving object, you can respond to information about its position, which is encoded fairly early in the process, without having to wait for a detailed analysis of its surface markings based on more extensive processing.
We have emphasized the development of a particular feature extractor to clarify the process involved. The emphasis on selective convergence should not obscure the fact that each lower echelon cell's outputs usually go to many higher echelon cells, not just one. Further, these outputs may be involved in the extraction of entirely different features at each of the higher echelon cells to which they project. It is not the case then that we have a grand convergence that starts with a million-bit byte of retinal elements and gates itself down to a few high level cells. Rather, we come out the other end of the process with a "byte" containing even more lines than the input byte. The difference is that the bits in the input byte represent the spatial pattern of illumination on the retina in a simple point for point code. The bits in the output byte of the system each represent the occurrence or nonoccurrence of a complex pattern of features in the visual world, and can be used to directly activate appropriate responses. Thus, the input byte and the output byte of the visual system each contain the same basic information: the content of the visual world. However, in the output byte the information is recoded so that the bits each represent highly useful pieces of information about the patterns occurring among the input bits. Referring again to figure 2, the real situation would be one in which there were as many cells at the top of the figure as at the bottom, with each convergent tree leading to a top level cell containing many elements in common with other convergent trees, just as the two shown do.
We have dealt so far only with the processing of spatial patterns of retinal illumination. There are many other things which are dealt with; motion detection by sequential activation of retinal elements is one example. Depth perception by comparison of the patterns from the two retinas is another. One that deserves special mention here is the handling of intensity information. This is done in the brain by use of the analog information in the cell's "temporal byte." That is, each line carries one bit in the "spatial byte" which encodes the existence of some set of conditions at the retina related to which cells are activated. The rate of firing of the line encodes, in pulse frequency analog form, information about the strength of that activation. For low echelon cells, this is essentially information about the intensity of the light falling on the receptors. At higher echelons in the sensory system, it is information about the "degree of certainty" of the cell in question with regard to its identification of a feature. This information derives from both intensity and spatial information, since both higher pulse rates and more lines active will increase the firing of the cell. This is an example of the way in which the brain may combine digital and analog information in a single decision process. The nature of what is being encoded by intensity at the higher levels of the process may be better understood by applying the "degree of certainty" concept to the lowest levels, where the temporal byte represents light intensity. Obviously the low level element has the greatest degree of certainty that it is being illuminated when it is being illuminated most strongly.
At higher levels, number of inputs and activity of inputs can trade off with regard to drive on the receiving cell, and this is generally appropriate, because the degree of certainty about the existence of the feature to be decoded is increased if there is either a broad agreement among the inputs, or if the inputs are themselves "very certain." In general in the brain, "He who yells the loudest has the most to say." Since cells don't have egos, it works.
In any realistic approach with present day hardware, this would probably have to be modeled using a byte of several bits in place of each single line in the brain. "Which byte" would be equivalent to "which axon," and the bit pattern would carry the information carried by the temporal byte on the axon.
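As a rough illustration of that substitution, the sketch below gives each input line an 8-bit value in place of a firing rate and lets the receiving element simply add them up. The 8-bit width, the plain sum, the threshold of 500, and the names drive and fires are all assumptions made for the example; nothing here is claimed about how real cells weight their inputs.

```python
def drive(input_bytes):
    """Total drive on the receiving element: the sum of the 8-bit values
    (0 to 255) standing in for the firing rate on each input line."""
    return sum(input_bytes)

def fires(input_bytes, threshold=500):
    """The element fires when the summed drive reaches its threshold."""
    return drive(input_bytes) >= threshold

# Eight input lines in each case.
broad_agreement = [64] * 8               # every input mildly "certain"
few_but_loud = [255, 255, 0, 0, 0, 0, 0, 0]  # two inputs "very certain"
weak_and_sparse = [64, 64, 0, 0, 0, 0, 0, 0]

print(drive(broad_agreement), fires(broad_agreement))  # 512 True
print(drive(few_but_loud), fires(few_but_loud))        # 510 True
print(drive(weak_and_sparse), fires(weak_and_sparse))  # 128 False
```

Many quiet inputs and a few loud ones reach the threshold equally well, which is the trade-off between number of inputs and activity of inputs described above.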
Given that the number of conceptual features into which the visual world could be subdivided is virtually unlimited, whereas the number of available bits in the system's output byte is merely enormous, how does the brain decide which features to encode? Some of it, the simplest part, is undoubtedly the result of evolutionary selection, hardwired at birth. Much of it however is probably developed in response to the type of visual environment in which the animal grows up. There is evidence, for example, that if a kitten is exposed to a visual world containing only vertical lines at a certain period of its development, its visual cortex will be rich in line detectors with a near vertical orientation, and poor in detectors for other orientations. Apparently this pattern persists throughout later life. It seems similar to a PROM.
Finally, we should mention some types of nonvisual input to the process that carry very specific correction information. Try this experiment. Look across the room while moving your head from side to side. Notice that the world seems to stay still, even though you are moving its image around on your retina. Now move the image around on the retina in a different way. Place your fingertip against your lower eyelid and lightly jiggle the eyeball while looking across the room (keep the other eye closed). Notice that this time the world seems to jump around as the image is moved about on the retina. Why the difference? In both cases the image is moving around on the retina. The answer is that movement of the image caused by moving the head in the usual manner is naturally an everyday problem for the brain in interpreting the visual world. It solves the problem by using feedforward information from the motor nuclei which control the movement of the head and body to correct the interpretation of the relative motion of image and retina to precisely allow for the motion as it occurs. Since you don't usually go around jiggling your eyeball with your finger (I assume), your brain has never developed a mechanism to pre-correct for doing so, and you see the motion. There are more subtle nonvisual inputs to the processing too, such as your motivations, but these are poorly understood and beyond the scope of these articles.
Now the hard part. How might we model such a visual system with current digital technology? As a start, let's examine what would be required of a "brute force" approach if we didn't care what it cost. It would seem the most straightforward method would be to have a set of microprocessors at each echelon modeling the activity of each of the elements at that level. Since with straight digital techniques we would have to code intensity on a byte of several bits length, each lower echelon input element talking to an upper echelon unit would have to present a byte rather than a line. This means (say) eight lines for each converging step and each lateral inhibition, instead of one. Each processor would then accept a number of bytes from elements at a lower level, which it would process according to a small ROM program, and a number of bytes laterally from its neighbors, which data would also figure in the result. The ROM program would determine the type of response of the "cell," and its output would be a byte on a bus that ran to a number of yet higher echelon processors, and laterally to its own neighbors. If we really wanted to model the brain's operation, this would all be conducted with handshaking logic and the processors would all have their own private clocks. Each processor would simply continually compute the result of whatever inputs it had available at any instant and output the result. When the input from any of its information sources changed, the output would change. With a processor and a ROM to represent each cell, such things as the weighting of percentage input from an ALMOST gate action, and the continuous alteration of output on the basis of the output of lateral neighbors, are simple. Such a system would be fast, powerful, and incredibly expensive. Let's say we opted for a minimal system running off a 64 by 64 grid of photosensitive elements. 
Further let's say we want to keep the ratio of input to output at each echelon approximately unity, so we wind up with about 4,000 highest echelon feature extractors. (Not bad; that means the system can recognize 4,000 different complex stimuli.) Then let's say we want to carry the analysis to a depth of five echelons. (That determines the complexity of the stimuli which can be extracted by the highest level. Remember that the brain only took four echelons to get to the complex field cell. Hypercomplex cells can handle some very advanced extraction problems.) At this point, however, we are talking about 20,000 processors. Even at 8008 prices, that's not exactly cheap.
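The round figures above can be checked with a few lines of arithmetic. The constant names are mine; the numbers are the ones just given (a 64 by 64 grid, unity ratio of input to output at each echelon, five echelons, and eight lines standing in for each single connection).

```python
GRID_SIDE = 64        # photosensitive elements along each side of the grid
ECHELONS = 5          # depth of analysis
LINES_PER_BYTE = 8    # lines needed to carry intensity on one connection

elements_per_echelon = GRID_SIDE * GRID_SIDE        # unity ratio at every echelon
total_processors = elements_per_echelon * ECHELONS  # one processor per element

print(elements_per_echelon)   # 4096, the "about 4,000" feature extractors
print(total_processors)       # 20480, the "about 20,000" processors
print(LINES_PER_BYTE, "lines for every converging or lateral connection")
```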
Now suppose we try to trade speed for cost. The system just described obviously runs much faster than the real brain. A first step might be to have single processors at each echelon doing the work of many, even all, of the 4,000 elements at that echelon. Suppose we could update the output of a single simulated element in 100 µs. That means we could do all 4,000 in each echelon in about half a second. That's not too bad; it's still pipelining the processing from echelon to echelon, so the system would see a picture updated every half second with a (1/2 second x number of echelons) delay between the stimulus event and the final analysis. Even if we pulled some information off the pipeline early for rapid action, however, it's still too slow for real time work. (If you ever have the equipment at hand, try playing catch in a room illuminated with two per second strobe light flashes. Anything below ten per second gets difficult.) Two complexities also appear when we try to update the simulated elements of an echelon serially. One is that the program for each element is different, which makes our ROM a little more complex. The other is that the output of each element in the array depends in part on the current output of its neighbors, including the ones you haven't gotten to yet in the current pass. With only one pass across the array per update, a "lateral lag time" error would be introduced. Correcting this with the simple expedient of iterative passes takes too long. Furthermore, you have to carry some information in scratch pad. How far do you want elements to be able to interact laterally? For most purposes, a few elements away might do, but for some tasks such as motion detectors, the brain converges outputs from widely separated elements. Lateral interaction among these is probably best ignored in our hypothetical simple system. 
Presumably, some optimization could be found in which several processors simulated each echelon, each one handling a number of elements serially.
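To put numbers on that trade, here is a small calculation using the figures already given (4,096 elements per echelon, 100 µs per element update, five echelons) and the ten-per-second rate from the strobe light remark. It is a sketch only: it assumes the work divides evenly among processors and ignores the lateral lag problem and any scratch pad overhead. The function and constant names are invented.

```python
ELEMENTS = 64 * 64     # simulated elements per echelon
UPDATE_US = 100        # microseconds to update one element
ECHELONS = 5

# One processor per echelon, updating its elements serially.
echelon_pass_us = ELEMENTS * UPDATE_US            # 409,600 us per picture
pipeline_delay_us = ECHELONS * echelon_pass_us    # stimulus to final analysis

def processors_per_echelon(frames_per_second):
    """Smallest whole number of processors per echelon that completes a
    full pass often enough, assuming the elements divide evenly."""
    work_us_per_second = echelon_pass_us * frames_per_second
    return -(-work_us_per_second // 1_000_000)    # ceiling division

print(echelon_pass_us / 1e6)       # 0.4096 second per picture
print(pipeline_delay_us / 1e6)     # 2.048 seconds through five echelons
print(processors_per_echelon(10))  # 5 per echelon, 25 in all, for ten per second
```

On these assumptions something like 25 processors, rather than 20,000, would reach ten pictures per second, though each picture would still take five passes to work its way to the top.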
A different approach to trading speed for cost would be to have all your available processors simulate the elements of a single echelon, store the result, switch programs and simulate the next echelon, then the next, etc. This way you get through each echelon faster because a complete update of an echelon is divided among more processors, and fewer elements simulated serially per processor means you finish quicker. However, with this scheme you lose the pipelining feature of the system, since a new input byte has to wait until the last byte gets all the way through before the system can start to deal with it by simulating echelon number one again.
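The two schemes can be compared on paper with the same figures. The sketch below is idealized and the names are hypothetical: it assumes five processors in both cases, an even division of elements, and no cost at all for storing results or switching programs, which the second scheme would certainly incur.

```python
ELEMENTS = 64 * 64
UPDATE_US = 100
ECHELONS = 5

def pipelined(elements, echelons, update_us):
    """One processor per echelon; echelons work on successive pictures."""
    period = elements * update_us
    return {"processors": echelons,
            "picture_period_us": period,
            "delay_us": echelons * period}

def echelon_at_a_time(elements, echelons, update_us, processors):
    """All processors share one echelon, then switch to the next; a new
    picture must wait until the previous one is completely finished."""
    per_processor = -(-elements // processors)   # ceiling division
    full_pass = echelons * per_processor * update_us
    return {"processors": processors,
            "picture_period_us": full_pass,
            "delay_us": full_pass}

a = pipelined(ELEMENTS, ECHELONS, UPDATE_US)
b = echelon_at_a_time(ELEMENTS, ECHELONS, UPDATE_US, processors=5)
print(a)   # period 409600 us, delay 2048000 us
print(b)   # period 410000 us, delay 410000 us
```

With these idealized numbers the rate of new pictures comes out nearly the same either way, and the difference shows up mainly in the delay through the system; how much of that survives the storing and program switching left out here is an open question.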
These notions of course do not exhaust the approaches to the problem, and I didn't promise to solve it for you, but they illustrate some of the kinds of difficulties we can expect to have to overcome. (Actually, I have some more advanced ideas on the subject, but you're not going to hear about those until somebody offers me a vice presidency for Psycho-cybernetic Architecture!) The best approach may well not involve replicating the detailed features of the brain's processing steps in recoding the sensory input. What does seem worth study however is the general logic of the approach. Specifically, this would include such items as: ways of eliminating redundant information, the logic of using selected feature extractors as building blocks at each stage of the perceptual process, the elimination at each level of restrictions such as position on the generality of the feature encoding line, and the use of the ALMOST gate concept to provide continuously variable levels of stringency in the encoding process.
Hubel, D, "The Visual Cortex of the Brain," Scientific American, November 1963.
Karner, N (trans), Current Problems in Neurocybernetics, Wiley, New York, 1975.