James Albus
Project Manager
National Bureau of Standards
United States Dept of Commerce
Washington DC 20234
The ideas presented in this article represent the views of the author and not those of the Department of Commerce or the National Bureau of Standards.
In order to build a computer model of the brain for robot control we must start with a clear understanding of what the brain is for (ie: its primary function). If one examines what most brains do all of the time, and what our own brains do most of the time, it is clear that the brain is not used primarily for thinking.
The brain is first and foremost a control system. All brains, even that of the tiniest insect, control behavior. Some brains can produce very complex behavior, but only the most sophisticated and highly developed brains exhibit the phenomenon of thought. Clearly then, thought is not the central purpose of the brain, but is, rather, an artifact that arises out of the complex computing mechanisms required to generate and control extremely sophisticated behavior.
This implies that would-be brain modelers should first attempt to understand, and if possible, reproduce the control functions and behavior patterns that exist in insects, birds, mammals, and, in particular, primates. Only after these control systems are successfully modeled can we expect to understand the mechanisms that give rise to intelligence and abstract thought in the human brain.
If the brain is primarily a control system, then any brain model we construct should control something. One of the most obvious candidates is a robot manipulator, since it rather closely resembles a limb, the most common type of device controlled by the brain. We shall therefore first develop a computer model of a basic neurological structure which can compute control functions for a robot manipulator.
We shall then attempt to demonstrate how this basic model can be generalized to compute a broad class of analytic, transcendental, or logical functions and production rules of many multivalued variables. We will show how this same model can learn, remember, and recognize patterns and how it can be interconnected into a hierarchical network for generating sensory interactive, goal directed behavior.
We will suggest how such a hierarchy might remember experiences, solve problems, plan tasks, select goals, answer questions, structure knowledge of the world and events, and understand and generate music or natural language. Finally, we will also suggest some possible experiments and lines of research that might be pursued by one or more ambitious personal computer enthusiasts with limited resources.
The brain is, of course, not a single computer, but rather a network of billions of individual computing devices interconnected so as to produce coordinated and unified action. There are millions of photo-detectors in each eye and thousands of audio detectors in each ear. The body is embedded with sensors which detect touch, pressure, heat, cold, and pain; chemical analyzers that detect the smell and taste of things; and sensors that measure the position of joints, the tension in tendons, and the length and velocity of contraction of muscles. Inertial sensors measure roll, pitch, and yaw accelerations, and the position of the head with respect to gravitational attraction; and hormone detectors, thermosensors, and blood chemistry analyzers report on the internal biological condition of the organism.
All of this information is analyzed and processed in innumerable computing centers which detect patterns, compare incoming data with stored expectations, and evaluate the results. In many different ways and at many different levels this sensory data stream interacts with the behavior generating system to select goals, modify habits, and direct the actions of millions of muscles and glands to produce what is observed as behavior.
Perhaps the most obvious feature of the brain is that many computations are going on in many different places simultaneously. The brain does not execute sequential programs of instructions under control of a program counter. There is no fetch/execute cycle. The mathematics of finite state automata and Turing machines are not well-suited for describing the basic operations of the brain. In fact, the fundamental computations performed in the brain are not even digital; they are analog. Each neuron in the brain is essentially an analog computer performing complex additions, integrations, differentiations, and all sorts of nonlinear operations on input variables that can number from one to several hundred thousand.
The brain is a digital device only in that information is encoded for transmission from one neuron to another over long transmission lines (called axons) by pulse-frequency or pulse-phase modulation. When these pulse encoded signals reach their destinations, they are reconverted into analog voltages for the computations which take place in the dendrites and cell bodies of the receiving neurons (see "Designing a Robot from Nature" February 1979 BYTE, page 28).
The brain achieves its incredible precision and reliability through redundancy and statistical techniques. Many axons carry information concerning the value of the same variable, each encoded slightly differently. The statistical summation of these many imprecise and noisy information channels results in the reliable transmission of precise messages over long distances. In a similar way, a multiplicity of neurons may compute on roughly the same input variables. Clusters of such computing devices provide statistical precision and reliability orders of magnitude greater than that achievable by any single neuron. The outputs of such clusters of neurons are transmitted and become inputs to other clusters, which perform additional analog computations. These are the variables we have to deal with and the computations we have to simulate if we are to model the brain in any meaningful way.
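The statistical argument above can be illustrated with a small simulation. This is only a sketch: the Gaussian noise model, the channel counts, and the function name `transmit` are assumptions made for illustration and are not specified in the article.

```python
import random
import statistics

def transmit(value, channels, noise_sd, rng):
    """Send `value` over several noisy channels and average what arrives.

    Each channel adds independent Gaussian noise (an assumed noise model).
    """
    received = [value + rng.gauss(0.0, noise_sd) for _ in range(channels)]
    return statistics.fmean(received)

rng = random.Random(1979)          # fixed seed so the run is repeatable
true_value = 0.7
single = transmit(true_value, channels=1, noise_sd=0.5, rng=rng)
cluster = transmit(true_value, channels=10_000, noise_sd=0.5, rng=rng)

print(f"one channel error:    {abs(single - true_value):.4f}")
print(f"10,000 channel error: {abs(cluster - true_value):.4f}")
```

With independent noise, the spread of the average shrinks roughly as one over the square root of the number of channels, which is why a large cluster can be far more precise than any one of its members.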
To those familiar only with fetch/execute machines, this may seem an extremely difficult structure to model. I hope, in the course of these articles, that some of the difficulties will be cleared away and the prospects for building such structures will seem less dubious.
In order to discuss an engineering design for a robot control system modeled after the brain, we must first devise a mathematical convention and notation to bridge the gap between the structure of the brain and the structure of currently available computers. This is essential if we are to describe behavior precisely and to translate that description into a design for circuits and program statements to generate behavior in a computationally concise manner.
One way to describe many variables and deal with many simultaneous multivariant computations is to use vector notation. A vector is simply an ordered set, or list, of variables. A vector can specify magnitude and direction. The vector V in figure 1b has two components: vx along the X axis and vy along the Y axis. The ordered set, or list, of components defines the vector, so that we can write V = (vx, vy).
The components of a vector can also be considered as the coordinates of a point (vx, vy) which corresponds to the tip of the vector. The locus of all pairs of components which can exist defines a vector space (for two dimensions the vector space is a surface). A vector can have more than two components. A vector with three components defines a volume (figure 1c), and a vector with four or more components defines a hyperspace (figure 1d). A hyperspace is impossible to visualize, but is a very useful concept for our discussion.
A vector in a higher dimensional space can usually be visualized as a projection onto a lower dimensional space. For example, typical mechanical drawings portray front, side, and top views of a three-dimensional form projected onto a two-dimensional sheet of paper. Each projection can either illustrate a cut through the object at a particular plane along the projection axis, or a superposition of all the salient features of the object collapsed into the plane of the illustration. In the collapsed version, the fact that two points or lines intersect in the projected image does not necessarily mean that they coincide or intersect in the higher dimensional space; they may simply lie behind each other along the projection axis. The projection operator ignores variable differences which correspond to distance along the projection axis.
It is not necessary to make the projection axis coincident with any of the coordinate axes. For example, in the oblique projection (perspective drawing) of figure 1c, the projection axis (the normal line to the paper through the origin of the coordinate system) is not aligned with any of the coordinate axes. The lines in the drawing represent the projections of lines in a three-dimensional space onto the two-dimensional surface of the paper. In a similar way we can project higher dimensional vectors and hyperspaces of any dimension onto a two-dimensional drawing. Figure 1d illustrates a four-dimensional vector projected onto a two-dimensional drawing.
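A short sketch may make the projection idea concrete. It assumes each coordinate axis is drawn on the paper in some fixed direction; the particular directions in `DIRECTIONS` and the function name `project` are hypothetical choices, not taken from the figures.

```python
def project(vector, axis_directions):
    """Project an N-dimensional vector onto the 2-D page.

    axis_directions[i] is the (x, y) direction in which coordinate
    axis i is drawn on the paper.
    """
    x = sum(v * d[0] for v, d in zip(vector, axis_directions))
    y = sum(v * d[1] for v, d in zip(vector, axis_directions))
    return (x, y)

# Hypothetical drawing directions for four axes on a sheet of paper.
DIRECTIONS = [(1.0, 0.0), (0.0, 1.0), (-0.5, -0.5), (0.5, -0.5)]

a = (1.0, 1.0, 0.0, 0.0)
b = (2.0, 2.0, 2.0, 0.0)   # a different point in the four-dimensional space

print(project(a, DIRECTIONS))
print(project(b, DIRECTIONS))
```

The two points differ in the four-dimensional space yet land on the same spot in the drawing, which is the "lying behind each other along the projection axis" effect described above.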
A vector can specify a state. This is the primary use we shall make of vectors in this discussion. A state is defined by an ordered set of variables. For example, the state of the weather might be characterized by a state vector W = (w1, w2, w3, w4) where:
w1 = temperature,
w2 = humidity,
w3 = wind speed,
w4 = rate of precipitation.
Now the weather, like many things, is not constant. It varies with time. Each of the state variables (temperature, humidity, wind speed, and rate of precipitation) is time dependent. Thus, as time passes, the point defined by W will move through the four-dimensional space. Figure 2 illustrates the locus of the point traced out by W as it moves to define a trajectory T.
About the Author: Dr James S. Albus worked for NASA from 1957 to 1972 designing optical and electronic subsystems for over 15 spacecraft, and for one year managed the NASA Artificial Intelligence Program. Since 1973 he has been with the National Bureau of Standards where he has received several awards for his work in advanced computer control systems for industrial robots. He has written a survey article on robot systems for Scientific American (February 1976) and his Cerebellar Model Arithmetic Computer won the Industrial Research Magazine IR-100 Award as one of the 100 most significant new products of 1975.
It will often be convenient to represent time explicitly in our notation. We can easily do this by simply adding one more variable, time (t), to our state vector, thus increasing by one the number of dimensions in the space defined by the state vector. For example W = (w1, w2, w3, w4, t).
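As a minimal sketch of a state vector with time made explicit, the weather example can be written as follows. The class name `Weather`, the helper `trajectory`, and the readings themselves are invented for illustration.

```python
from typing import NamedTuple

class Weather(NamedTuple):
    temperature: float      # w1
    humidity: float         # w2
    wind_speed: float       # w3
    precipitation: float    # w4
    t: float                # time, the added dimension

def trajectory(samples, dt):
    """Turn a list of (w1, w2, w3, w4) readings into points W = (w1, ..., w4, t)."""
    return [Weather(*w, t=k * dt) for k, w in enumerate(samples)]

# Made-up hourly readings, purely for illustration.
readings = [(18.0, 0.60, 3.0, 0.0),
            (19.5, 0.55, 4.0, 0.0),
            (17.0, 0.80, 6.5, 1.2)]

T = trajectory(readings, dt=1.0)
for point in T:
    print(tuple(point))
```

Each printed tuple is one point in the five-dimensional space, and the list `T` is the trajectory traced out as time passes.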
As time progresses, any point defined by the state vector moves along the time axis. A state vector whose wi components do not vary with time will now trace out a straight line trajectory, parallel to the time axis as shown in figure 3a. If, however, any of the wi components is time dependent, the state trajectory will contain velocity components that are orthogonal, as well as parallel to the time axis, as shown in figure 3b.
If we project the state space of all the variables except time onto a two-dimensional surface, we can represent the passage of time by the motion of this two-dimensional plane along the time axis normal to it, as in figure 4. The state trajectory TS is the locus of points traced out by the state vector as time passes.
A large variety of things can be represented as vectors. For example, we can represent an ASCII character as a vector (figure 5). The ordered set of binary digits in the ASCII representation corresponds to the components of a binary vector. Each symbol in the ASCII alphabet is uniquely paired with a vector in an eight-dimensional hyperspace. Each symbol thus corresponds to a point in the hyperspace.
This is an important concept, because it allows us to define any set of symbols as vectors or points in hyperspace. Any string of symbols then becomes a trajectory through the hyperspace. For example, the string of symbols, "the cat chased the rat," can be described as a trajectory through a hyperspace defined by any set of variables defining the English alphabet (plus a blank character). This also applies to the string WXYZ when:
W is the command: Reach to Position A;
X is the command: Grasp;
Y is the command: Move to Position C;
Z is the command: Release.
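The symbol-to-vector idea can be sketched in a few lines. The bit ordering (most significant bit first) and the function names are assumptions; the article's figure 5 may order the components differently.

```python
def symbol_vector(ch, bits=8):
    """Return the ASCII code of `ch` as a tuple of binary components,
    most significant bit first (an assumed ordering)."""
    code = ord(ch)
    return tuple((code >> i) & 1 for i in range(bits - 1, -1, -1))

def string_trajectory(text):
    """A string of symbols becomes an ordered list of points in hyperspace."""
    return [symbol_vector(ch) for ch in text]

print(symbol_vector("A"))            # 'A' is ASCII 65, binary 01000001
for point in string_trajectory("the cat"):
    print(point)
```

The same construction applies to command strings such as WXYZ: each command symbol is a point, and the string is the path that visits those points in order.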
We need not restrict ourselves to binary vectors. Symbols may be represented by vectors with continuously variable components as well. This allows us to introduce the concept of fuzzy symbols. If the hyperspace is continuous, then each point which corresponds to a symbol has some neighborhood of points around it which are much closer to it than any other symbol's points. This is illustrated in figure 6. We may view the points in such a neighborhood in one of two ways:
This is a fundamental concept in pattern recognition theory. Hyperspace is partitioned into regions, and the existence of a feature vector in a particular region corresponds to the recognition of a pattern or symbol. By definition, the best set of features is the one that maximizes the separability of pattern vectors. In the design of pattern recognizers it is important to select a set of features which is easily measured and which produces widely separated and compact clusters in feature space.
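One simple way to partition a continuous feature space into regions is to assign every point to the nearest symbol. The sketch below assumes a Euclidean distance measure and three made-up prototype points; neither choice comes from the article.

```python
import math

# Hypothetical prototype points for three symbols in a 2-D feature space.
PROTOTYPES = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (2.0, 3.0)}

def recognize(feature_vector, prototypes=PROTOTYPES):
    """Return the symbol whose prototype point is closest (Euclidean distance)."""
    return min(prototypes,
               key=lambda s: math.dist(feature_vector, prototypes[s]))

print(recognize((0.4, -0.3)))   # falls in the neighborhood of A
print(recognize((3.1, 0.8)))    # falls in the neighborhood of B
```

Widely separated, compact clusters make this kind of rule reliable: a noisy feature vector still falls inside the region belonging to the right symbol.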
In the physical world, functions are usually defined as relationships between physical variables. For example, we could say that climate over a particular geographical region is a function of the heat input, the prevailing wind conditions, and other factors, or that the seasons are a function of the position and orientation of the earth relative to the sun. Similarly, we may say that the level of hunger we experience is a function of the signals on nerve fibers reporting on the state of the stomach, chemistry of the blood, the time of day as indicated by internal biological rhythms, and so on.
In mathematics a function defines (and is defined by) a relationship between symbols that can sometimes be set in one-to-one correspondence to physical variables. As in the physical world, a function usually implies a directional relationship (eg: the relationship between cause and effect has a direction which flows from cause to effect). In traditional terms a function may be expressed as an equation, such as:
$y = f(x)$
which reads: y equals a function f of x. The function:
$y = 2x^2 + 3x + 6$
is a relationship between y and x.
Functions can also be expressed as graphs. Figure 7 is a plot of the equation $y = 2x^2 + 3x + 6$. Functions may sometimes be defined by tables. The table in figure 8a defines the Boolean AND function Z = XY. This function can also be drawn as a circuit element (see figure 8b) which performs the AND function on two inputs.
Tables can also be used to define non-Boolean functions. Tables of logarithms or trigonometric functions are good examples of this. Of course, a table defines a continuous function exactly only at the discrete points represented in the table. Thus, the accuracy of a continuous function represented by a table depends on the number of table entries (ie: the resolution on the input variables). Accuracy can, of course, be increased by interpolation techniques. In general, the number of entries required to compute a function by a table lookup is proportional to $R^N$, where R is the resolution of each input variable, and N is the number of input variables. This exponential increase in size of the table required is the principal reason that multidimensional functions are seldom computed by table lookup.
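Table lookup with interpolation, and the growth of table size with the number of inputs, can be sketched as follows. The function names, the choice of the sine function, and the resolution of 64 are illustrative assumptions; `lookup` also assumes that x lies inside the tabulated range.

```python
import math

def build_table(f, lo, hi, resolution):
    """Sample f at `resolution` evenly spaced points on [lo, hi]."""
    step = (hi - lo) / (resolution - 1)
    return [f(lo + k * step) for k in range(resolution)], step

def lookup(table, step, lo, x):
    """Evaluate the tabulated function at x by linear interpolation."""
    pos = (x - lo) / step
    k = min(int(pos), len(table) - 2)   # index of the entry at or below x
    frac = pos - k
    return table[k] + frac * (table[k + 1] - table[k])

def table_entries(resolution, num_inputs):
    """Entries needed for a full table over N inputs: R ** N."""
    return resolution ** num_inputs

table, step = build_table(math.sin, 0.0, math.pi, resolution=64)
x = 1.0
print(abs(lookup(table, step, 0.0, x) - math.sin(x)))   # interpolation error
print(table_entries(64, 1), table_entries(64, 4))
```

A single input at this resolution needs 64 entries, while four inputs at the same resolution need more than sixteen million, which is the exponential growth the text refers to.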
Modern mathematics often expresses functional relationships in terms of mappings from a set of states defined by independent variables onto a set of states defined by dependent variables. In one notation, this is expressed by the string:
$f: C \rightarrow E$
which reads, "f is a relationship which maps the set of causes C into the set of effects E." It means that for any particular state in the set C, the relationship f will compute a state in the set E. This is shown in figure 9.
We have already shown that states can be denoted by vectors and sets of states by sets of points in vector hyperspaces. Thus, the notion of a function being a mapping from one set of states to another naturally extends to a mapping of points in one vector hyperspace onto points in another.
Suppose, for example, we define an operator h as a function which maps the input S = (s1, s2, s3, ..., sN) onto the output scalar variable p. We can write this as:
p = h(S)
or
p = h(s1, s2, ..., sN)
We can also draw the functional operator as a circuit element or "black box" as in figure 10. (A black box is an engineering concept sometimes used to depict a process with inputs and outputs. The viewer sees the effects on the output of changes to the input, but the internal workings of the process remain hidden in a black box.)
If we assume that we have L such operators, h1, h2, ..., hL, each operating on the input vector S in figure 11, we have a mapping:
H: S -> P or P = H(S)
where the operator H = (h1, h2, ..., hL) maps every input vector S into an output vector P. Now since S is a vector (or point) in input space and P is a vector (or point) in output space, we can represent the function H as a mapping from input space onto output space, as shown in figure 12.
For the purposes of our discussion we require that both the input and output space be bounded and that each S will map into one and only one P. Several different S vectors may map into the same P vector, however. Of course, if any of the variables in S are time dependent, S will trace out a trajectory TS through input space. The operator H will map each point S on TS into a point P on a trajectory Tp in output space.
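The construction of H from its component operators can be sketched directly. The three component functions here are arbitrary stand-ins chosen for illustration, and `make_H` is a hypothetical helper name.

```python
def make_H(*component_functions):
    """Bundle L scalar operators h1..hL into one vector operator H."""
    def H(S):
        return tuple(h(S) for h in component_functions)
    return H

# Three hypothetical operators on a 4-component input vector.
h1 = lambda S: sum(S)
h2 = lambda S: max(S) - min(S)
h3 = lambda S: S[0] * S[3]

H = make_H(h1, h2, h3)

T_S = [(1, 2, 3, 4), (4, 3, 2, 1), (2, 2, 3, 3)]   # an input trajectory
T_P = [H(S) for S in T_S]                          # the output trajectory
print(T_P)
```

Each S maps to exactly one P, but the first two input points map to the same output point, illustrating that several S vectors may share a P.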
We are now ready to consider the structure of control systems for sensory interactive, goal directed behavior. The simplest form of goal seeking device is the servomechanism. The setpoint, or reference input to the servomechanism, is a simple form of command. Feedback from a sensing device, which monitors the state of the output or the results of action produced by the input, is compared with the command. If there is any discrepancy between commanded action and the results, an error signal is generated which acts on the output in the proper direction and by the proper amount to reduce the error. The system thus follows the setpoint, or, put another way, it seeks the goal set by the input command.
Now almost all servomechanism theory deals with a one-dimensional command, a one-dimensional feedback, and a one-dimensional output. Our vector notation will allow us to generalize from this one-dimensional case to the multidimensional case with little difficulty.
Assume we have the multivariable servomechanism shown in figure 13. The function H operates on the input variables in S and computes an output P = H(S). Note that we have partitioned the input vector S into two vectors: C = (s1, s2, ..., si, 0, ..., 0) and F = (0, ..., 0, si+1, ..., sN), such that S = C + F. If i = 1, N = 2, L = 1, and H computes some function of the difference between C and F, we have a classical servomechanism.
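The classical case (i = 1, N = 2, L = 1) can be simulated in a few lines. This is a sketch under stated assumptions: the proportional gain, the time step, and the pure-integrator model of the controlled device are all invented for illustration.

```python
def H(S, gain=2.0):
    """Classical servo: the output is a function of the difference between
    the command component s1 and the feedback component s2."""
    s1, s2 = S
    return gain * (s1 - s2)

def run_servo(setpoint, steps=50, dt=0.1):
    position = 0.0                       # state of the controlled output
    for _ in range(steps):
        C = (setpoint, 0.0)              # command vector
        F = (0.0, position)              # feedback vector
        S = tuple(c + f for c, f in zip(C, F))   # S = C + F
        P = H(S)                         # drive signal
        position += P * dt               # assumed plant: a pure integrator
    return position

print(run_servo(1.0))
```

With these assumed values the error shrinks by a constant factor on every step, so the position settles on the setpoint.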
In our more general case C may be any vector, and in some cases it may be a symbolic command. The feedback vector may contain information of many different types. It may simply report position or velocity of the controlled outputs, but for a complicated system such as a robot manipulator or the limb of an animal, it may also report the resistance to movement by the environment, the inertial configuration of the manipulator structure, and other parameters relevant to the problem of making rapid and precise movements.
Figure 14 illustrates the situation when a stationary command vector C establishes a setpoint, and as time progresses the feedback vector F varies, creating an input trajectory TS. The H operator computes an output vector for each input and so produces an output trajectory Tp. The variation in F may be caused by external forces imposed by the environment, or by actions produced by the output, or both. One or more of the variables in the feedback vector F may even be taken directly from the output vector P. In the latter case the H operator becomes the transition function for a finite state automaton. In any of these cases the result is that a single command vector C produces a sequence of output vectors Tp. The process is driven by the sequence of feedback vectors F1, F2, F3. The superscript k in Fk denotes the vector F at time tk.
The sequence of operations illustrated in figure 14 can also be viewed as a decomposition of a command C into a sequence of subcommands P1, P2, P3. The vector C may be a symbol standing for any number of things such as a task, a goal, or a plan. In such cases the output string P1, P2, P3 represents a sequence of subtasks, subgoals, or subplans, respectively.
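The case in which the feedback is taken directly from the output can be sketched with H stored as a table, so that it acts as the transition function of a finite state automaton. The table contents, the command name FETCH, and the IDLE state are hypothetical.

```python
# Hypothetical H operator stored as a table: (command C, feedback F) -> output P.
# The feedback here is simply the previous output.
H_TABLE = {
    ("FETCH", "IDLE"): "REACH",
    ("FETCH", "REACH"): "GRASP",
    ("FETCH", "GRASP"): "MOVE",
    ("FETCH", "MOVE"): "RELEASE",
    ("FETCH", "RELEASE"): "IDLE",
}

def decompose(command, start="IDLE"):
    """Hold C fixed and let the feedback drive H through its outputs."""
    outputs, feedback = [], start
    while True:
        P = H_TABLE[(command, feedback)]
        if P == start:                # back at rest: decomposition complete
            return outputs
        outputs.append(P)
        feedback = P                  # the output becomes the next feedback

print(decompose("FETCH"))
```

A single unchanging command thus yields a string of subcommands, with each step selected by the feedback that the previous step produced.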
Whether figure 14 is a servomechanism or a task decomposition operator, there are many practical problems concerned with stability, speed, gain, delay, phase shift, etc. In our notation these are all embedded in the H functions. If the H functions are correctly formulated and defined over the entire space traversed by the S input, then the output Tp will drive the physical actuators in such a way that the goal is achieved (ie: the error between the command C and the result P is nulled) and stability is maintained under all conditions.
Servomechanisms are, of course, only the simplest form of sensory interactive, goal seeking devices. By themselves they are certainly not capable of explaining the much more complex forms of goal seeking commonly associated with purposive behavior in biological systems. However, when connected together in a nested (or hierarchical) structure, the complexity of behavior in feedback control systems increases dramatically.
Assume that the command vector C in figure 14 changes such that it steps along the trajectory Tc as shown in figure 15. The result is that the sequence of input commands C1, C2, C3, followed by the sequence C4, C5 produces the sequence of output vectors P1, P2, P3, P4, P5. In this case the subsequence P1, P2, P3 is called by the commands C1, C2, C3 and driven by the feedback F1, F2, F3. The subsequence P4, P5 is called by C4, C5 and driven by F4, F5, etc.
If we now represent time explicitly, the C, F, and P vectors and trajectories of figure 15 appear as shown in figure 16. The fact that C remains constant while the feedback changes from F1 to F2 to F3 means that the trajectory Tc is parallel to the time axis over that interval. The jump from C1, C2, C3 to C4, C5 causes an abrupt shift in the Tc trajectory in the time interval between F3 and F4.
Note that each instant can be represented by a plane (or set of coplanar regions) perpendicular to the time axis. Each plane contains a point from each trajectory and represents a snapshot of all the vectors simultaneously at a specific instant in time.
We are now ready to consider a hierarchy of servomechanisms, or task decomposition operators, as shown in figure 17a. Here the highest level input command C4 is a symbolic vector denoting the complex task (ASSEMBLE AB). Some of the components in C4 may denote modifiers and arguments for the assemble task. The subscript k in Ck denotes the C vector at the kth level in the hierarchy.
Note that in figure 17 vectors are not repeatedly drawn for each instant of time during the trajectory segments, when they are reasonably constant. Thus, C4 is shown only at the beginning and end of the trajectory segment labeled (ASSEMBLE AB). C2 is shown only at the transition points between (REACH TO A), (GRASP), (MOVE TO C), etc. It should be kept in mind, however, that H4 computes P4 continuously and produces an output at every instant of time, just as H1 computes P1.
The feedback F4 may contain highly processed visual scene analysis data which identifies the general layout of the work space, and thereby determines which output vectors P4 (and hence which simple task commands C3) should be selected and in which order. F4 may also contain data from P4 and P3 which indicates the state of completion of the decomposition of C4. F4 combines with C4 to define the complete input vector S4. The H4 operator produces an output vector P4 = H4 (S4).
At least part of the output P4 becomes part of the input command vector C3 to the next lower level. C3 is also a symbolic vector which identifies one of a library of simple task commands together with the necessary modifiers and arguments. As the feedback F4 varies with time, the input vector S4, and hence the output vector P4, move along a trajectory generating a sequence of simple task commands at C3 such as (FETCH A), (FETCH B), (MATE B TO A), (FASTEN B TO A), etc. as shown in figure 17b.
Feedback at F3 may identify the position and orientation of the parts A and B, and also carry state sequencing information from outputs P3 and P2. As F3 varies with time, it drives the input S3 (and hence P3) along a trajectory generating a sequence of elemental movement commands at C2 such as (REACH TO A), (GRASP), (MOVE TO C), (RELEASE), etc.
Feedback at F2 may contain information from proximity sensors indicating the fine positioning error between the fingers and the objects to be manipulated, together with state sequencing information derived from P2 and P1. The operator H2 produces P2, which denotes the proper velocity vectors C1 for the manipulator hand in joint angle coordinates.
Feedback F2 also provides joint angle position data necessary for the coordinate transformations performed by H2. P2 provides reference, or setpoint commands, C1 to the servomechanism operator H1. F1 provides position, velocity, and force information for the traditional servocomputations. The output P1 is a set of drive signals to the actuators.
Feedback enters this hierarchy at every level. At the lowest levels, the feedback is unprocessed (or nearly so) and hence is fast acting with very short delays. At higher levels, feedback data passes through more and more stages of an ascending, sensory processing hierarchy. Feedback thus closes a real time control loop at each level in the hierarchy. The lower level loops are simple and fast acting. The higher level loops are more sophisticated and slower.
At each level the feedback vector F drives the output vector P along its trajectory. Thus, at each level of the hierarchy, the time rate of change of the output vector Pi will be of the same order of magnitude as the feedback vector Fi, and considerably more rapid than the command vector Ci. The result is that each stage of the behavior generating hierarchy effectively decomposes an input task represented by a slowly changing Ci into a string of subtasks represented by a more rapidly changing Pi.
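The way each level turns one slowly changing command into a string of faster-changing subcommands can be sketched with a simple recursive expansion. This is a deliberately reduced sketch: feedback is collapsed to "the previous subtask finished", and the table `LEVELS` is a hypothetical stand-in for the H operators, filled in with the task names used in the text.

```python
# Hypothetical decomposition tables using the task names from the text.
LEVELS = {
    "ASSEMBLE AB": ["FETCH A", "FETCH B", "MATE B TO A", "FASTEN B TO A"],
    "FETCH A": ["REACH TO A", "GRASP", "MOVE TO C", "RELEASE"],
}

def run(command, depth=0, log=None):
    """Decompose `command` recursively. Commands with no table entry are
    treated as primitives handed on to the next lower level."""
    log = [] if log is None else log
    log.append((depth, command))
    for sub in LEVELS.get(command, []):
        run(sub, depth + 1, log)
    return log

for depth, name in run("ASSEMBLE AB"):
    print("  " * depth + name)
```

The top-level command appears once while the lower levels step through several commands in the same span, mirroring the difference in rate of change between Ci and Pi. A fuller model would let real sensory feedback, rather than a fixed list, choose each next subcommand.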
At this point we perhaps should emphasize that the difference in time rate of change of the vectors at various levels in the hierarchy does not imply that the H operators are computing slower at the higher levels than at the lower. We will, in fact, assume that every H operator transforms S into P with the same computational delay $\Delta t$ at all levels of the hierarchy. That is:
$P_i(t) = H_i(S_i(t - \Delta t))$ or $P_i^{k} = H_i(S_i^{k-1})$
at every level. The slower time rate of change of P vectors at the higher levels stems from the fact that the F vectors driving the higher levels convey information about events which occur less frequently. In some cases certain components of higher level F vectors may require the integration of information over long time intervals or the recognition of symbolic messages with long word lengths.
When we represent time explicitly as in figure 17, we can label the relatively straight segments of the Tc trajectories as tasks and subtasks. Transitions between the subtasks in a sequence correspond to abrupt changes in Tc.
If we do not represent time explicitly, the relatively constant C vectors correspond to nodes, as in figure 15. The resulting tree structure represents a classical AND/OR decomposition of a task into sequences of subtasks, where the discrete Ci vectors correspond to OR nodes and the rapidly changing sequences of Pi vectors become sets of AND nodes under those OR nodes.
Figure 17 illustrates the power of a hierarchy of multivariant servomechanisms to generate a lengthy sequence of behavior which is both goal directed and appropriate to the environment. Such behavior appears to an external observer to be intentional, or purposive. The top level input command is a goal, or task, which is successively decomposed into subgoals, or subtasks, at each stage of the control hierarchy until at the lowest level output signals drive the muscles (or other actuators) producing observable behavior.
To the extent that the F vectors at the various levels contain sensory information from the environment, the task decompositions at those levels will be capable of responding to the environment. The type of response to each F vector depends on the H function at that level. If the F vector at any level is made up solely of internal variables, then the decomposition at that level will be stereotyped and insensitive to conditions in the environment.
Whether or not the hierarchy is driven by external or internal variables, or both, the highest level input command commits the entire structure to an organized and coordinated sequence of actions which under normal conditions will achieve the goal or accomplish the task. The selection of a high level input command in a biological organism thus corresponds to an intent or purpose, which, depending on circumstances, may or may not be successfully achieved through the resulting hierarchical decomposition into action.
The success or failure of any particular task performance, or goal seeking action, depends on whether or not the H functions at each level are capable of providing the correct mappings so as to maintain the output trajectory within a region of successful performance, despite perturbations and uncertainties in the environment.
At all levels, variations in the F vectors due to irregularities in the environment cause TS trajectories to vary from one task performance to the next. This implies that while there may exist a set of ideal trajectories through S and P space at each level of the hierarchy corresponding to an ideal task performance, there also must be an envelope of nearly ideal trajectories which correspond to successful, but not perfect, task performance. This is illustrated in figure 18.
The H functions must not only be defined along the TS trajectories corresponding to ideal performance, but also in the regions around the ideal performance. Consequently, any deviation from the ideal is treated as an error signal which generates an action designed to restore the actual trajectory to the ideal, or at least to maintain it within the region of successful performance.
Small perturbations can usually be corrected by low level feedback loops, as shown in figure 19. These involve relatively little sensory data processing, and hence are fast acting. Larger perturbations in the environment may overwhelm the lower level feedback loops, and require strategy changes at higher levels in order to maintain the system within the region of successful performance. This is illustrated in figure 20. Major changes in the environment are detected at higher levels after being processed through several levels of pattern recognizers. This produces differences in the F vector at the higher level which in turn produces different C vectors to lower levels. The result is an alternative higher level strategy to cope with the perturbation.
Of course, if the H functions do not provide stability, or if the environment is so perverse that the system is overwhelmed, then the trajectories diverge from the re- gion of successful performance and failure occurs.
Over-learned tasks correspond to those for which the H functions at the lower levels are sufficiently well defined over a large enough region of input space so as to maintain the terminal trajectory well within regions of stability and success without requiring intervention by the higher levels for strategy modification. Thus, a highly skilled and well-practiced performer, such as a water skier, can execute extremely difficult maneuvers with apparent ease despite large perturbations such as waves. His lower level H functions are well defined over large regions of space corresponding to large perturbations in the environment. He is thus capable of compensating for these perturbations quickly and precisely so as to maintain successful performance without intervention by higher levels. Such a performance is characterized by a minimum amount of physical and mental effort.
We say, "He skis effortlessly without even thinking." What we mean is that his lower level corrections are so quick and precise that his performance never deviates significantly from the ideal. There is never any need for higher level loops to make emergency changes in strategy. On the other hand, a novice skier (whose H functions are poorly defined, even near the ideal trajectory, and completely undefined elsewhere) may have great difficulty maintaining a successful performance at all. He is continually forced to bring higher levels into play to prevent failure, and even the slightest perturbation from the ideal is likely to result in a watery catastrophe. He works very hard, and fails often, because his responses are late and often misdirected. His performance is erratic and hardly ever near the ideal.
However, practice makes perfect, at least in creatures with the capacity to learn. Each time a trajectory is traversed, if there is some way of knowing what mistakes were made, corrections can be made to the H functions in those regions of input spaces which are traversed. The degree and precision of these corrections, and the algorithm by which they are computed, determine the rate of convergence (if any) of the learning process to a stable and efficient success trajectory.
There are many interesting questions about learning, generalization, and the mechanisms by which H functions are created and modified at the various hierarchical levels in biological brains. However, we will defer these issues until part 2 (July 1979 BYTE).
Note that figure 17 illustrates only a single specific performance of a particular task. None of the alternative trajectories which might have occurred under different circumstances with a different set of F vectors are indicated. These alternatives which might have occurred can be illustrated in the plane orthogonal to the time axis.
Figure 21 illustrates the set of alternative C vectors available at various levels in the behavior-generating hierarchy of the male 3 spined stickleback fish. This figure represents a snapshot, or single cut through space orthogonal to the time axis. C4, the highest level goal, is survival. The feedback F4 consists of variables indicating water temperature and depth, blood chemistry, and hormone levels generated by length of day detectors. When the hormone levels indicate the proper time of year and the blood chemistry does not call for feeding behavior, then migratory behavior will be selected until warm, shallow water is detected. The F4 vector will then trigger the reproduction subgoal.
When C3 indicates (REPRODUCTION), an F3 vector indicating a red male in the territory will cause the (FIGHT) command to be selected to C2. When C2 indicates (FIGHT) and the intruder threatens, a C1 will be selected, and so on. At each level, a different feedback vector would select a different lower level subgoal. For example, if F3 indicates a female in the territory, C2 will become (MATE), and the type of mating behavior selected will depend on F2.
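The selection described above can be pictured as a table lookup at each level: the command arriving from above and the feedback at that level together pick the subgoal sent to the level below. The following is only a minimal sketch of that idea. The table name `H3`, the feedback labels, and the function `select_subgoal` are hypothetical names chosen for illustration and do not come from figure 21.

```python
# Hypothetical H function for one level of the stickleback hierarchy:
# (command from the level above, feedback at this level) -> subcommand below.
# The feedback labels are illustrative stand-ins for the F3 vector.
H3 = {
    ("REPRODUCTION", "red_male_in_territory"): "FIGHT",
    ("REPRODUCTION", "female_in_territory"): "MATE",
}


def select_subgoal(h_table, command, feedback, default=None):
    """Return the subcommand selected by (command, feedback), or default."""
    return h_table.get((command, feedback), default)


# The same C3 command yields a different C2 when the feedback changes.
print(select_subgoal(H3, "REPRODUCTION", "red_male_in_territory"))
print(select_subgoal(H3, "REPRODUCTION", "female_in_territory"))
```

A real H function would be defined over continuous regions of input space, not over a handful of discrete labels; the dictionary only shows how one command can branch on feedback.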
In simple creatures like the stickleback fish, the sensory stimuli that produce F2 and F3 vectors which trigger specific behavioral trajectories are called innate releasing mechanisms. Innate releasing mechanisms and their associated behavioral patterns have been studied extensively in a number of insects (eg: the digger wasp and various bee and ant species), several fish, and many birds (eg: the herring gull, turkey, and golden eye drake).
In these relatively simple creatures, behavior is sufficiently stereotyped that it can be described in terms of a small set of behavioral patterns triggered by an equally small set of sensory stimuli. This suggests that insects, fish, and birds have only a few levels in their control hierarchies and a small set of behavior patterns stored as H functions at each level. It further implies that there are few externally driven components in the F vectors at each level. Behavior trajectories are internally driven, with only a few branch points controlled by sensory data processed through simple pattern recognizers. The trajectory segments driven entirely by internal variables are called fixed action patterns, or tropisms. The external variables which control the relatively few branch points are the innate releasing mechanisms.
In higher animals, behavior is more complex and much less stereotyped. This implies more levels in the hierarchy, more external sensory variables in the F vectors at each level, and hence many more possibilities for branching of the resulting trajectories.
Figure 22 illustrates a set of trajectories in which there is opportunity for branching at several different levels at every step along each trajectory. At each instant in time the C vector to any particular level depends upon what the C and F vectors were to the next higher level at the previous instant. Thus, a change in the F vector at any level causes an alternative C vector to be sent to the level below. Behavior is continuously modified at all levels by external variables, and hence does not appear stereotyped at all.
Many degrees of freedom place great demands on the H functions for maintaining stability and precision of control in such a large space of possibilities. Since successful behavior is only a tiny subset of all possible behaviors, it is clear that most of the potential branches will lead to disaster unless the H functions produce actions which steer the S and P vectors back into the narrow regions surrounding success trajectories. For a multilevel hierarchy with sensory interaction at many different levels, this is extremely complex. However, if the H functions are trainable, then performance can improve through practice. Complex tasks can be learned, imitated, and communicated from one individual to another.
We have now completed the first step in our development. We have described a hierarchical computing structure which can execute goals, or intended tasks, in an unpredictable environment. We have also defined a notation by which the behavior of such a hierarchy can be described clearly and concisely. We have asserted that the complexity of behavior resulting from such a control hierarchy depends on four things: the number of levels in the control hierarchy;
In part 2 we will describe a computer model of a neurophysiological structure in the brain which computes multivariant H functions. We will then suggest how the brain might use such structures to learn skills, remember events, select goals, and plan future actions.
In part 1 I described how sensory interactive, goal directed behavior can be generated and controlled by a multilevel hierarchy of computing modules. At each level of the hierarchy, input commands are decomposed into strings of output subcommands which form the input commands to the next lower level. Feedback from the external environment, or from internal sources, drives the decomposition process and steers the selection of subcommands so as to achieve successful performance of the task of reaching the goal. In this article I will address questions of what kind of neurological structures are believed to exist in the brain and what kind of computations, memory storage methods, and associative recall effects these structures seem to be performing.
Unfortunately, definitive experimental evidence about the structure and function of neurological circuitry in the brain is extremely difficult to obtain. Neurons, the brain's computing elements, are very tiny and delicate. It is hard to measure what is happening in them without damaging them or otherwise interfering with the flow of information related to their operation. Techniques do exist for measuring the activity of individual neurons and sometimes even observing the behavior of several neurons at the same time. There are also techniques which make it possible to monitor synchronized changes in the activity of large numbers of neurons.
However, the brain is such a complicated anatomical structure, with such a jumbled interconnection of different kinds of neurons being excited and inhibited by such a broad variety of chemical and electrical stimuli, that it is impossible to infer from these measurements any very sophisticated ideas about what mathematical functions are being computed or what procedures are being executed.
Neurons are as varied in size, shape, and type as trees and bushes in a tropical forest, and often are as closely intertwined and interconnected as a bramble patch overgrown with vines. Many of their most important information processing properties are statistical in nature, and these statistics may apply over ensembles of thousands of neurons.
The situation is further complicated by multiple feedback loops, some of which are confined to small, local clusters of neurons, and others which may thread through several entirely different regions of the brain. The result is that no one has yet been able to construct a clear picture of the overall information processing architecture in the brain. At present there exists no generally accepted theory which bridges the gap between hard neurophysiological measurements and psychological concepts such as perception and cognition.
Nevertheless, there is much that is known with certainty about the structure and function of at least some parts of the brain, particularly in the periphery of the sensory and motor systems. A great deal can be inferred from this knowledge. Furthermore, there is one area, the cerebellar cortex, where the geometry is sufficiently regular to enable researchers to positively identify a number of important neurophysiological relationships.
The cerebellum, which is attached to the midbrain portion of the upper spinal cord and nestles up under the visual cortex, as shown in figure 1, is intimately involved with control of rapid, precise, coordinated movements of limbs, hands, and eyes. Injury to the cerebellum results in motor deficiencies, such as overshoot in reaching for objects, lack of coordination, and the inability to execute delicate tasks or track precisely with the eyes.
During the 1960s, advances in the technology of single cell recordings and electron microscopy made possible an elegant series of experiments by Sir John Eccles and a number of others. These experiments identified the functional interconnections between the principal components in the cerebellar cortex. A brief outline of the structure and function of the cerebellar cortex is shown in figure 2.
The principal input to the cerebellar cortex arrives via mossy fibers (so named because they looked like moss to the early workers who first observed them through a microscope). Mossy fibers carry information from a number of different sources such as the vestibular system (balance), the reticular formation (alerting), the cerebral cortex (sensory-motor activity), as well as from sensor organs which measure such quantities as position of joints, tension in tendons, velocity of contraction of muscles, pressure on skin, etc. It is possible to categorize mossy fibers into at least two classes based on their point of origin: one, those carrying information which may include commands from higher levels in the motor system; and two, those carrying feedback information about the results of motor outputs. Once these two sets of fibers enter the cerebellum, however, they intermingle and become virtually indistinguishable.
The feedback mossy fibers tend to exhibit a systematic regularity in the mapping from point of origin of their information to their termination in the cerebellum. It is thus possible to sketch a map of the body on the surface of the cerebellum corresponding to the origins of feedback mossy fiber information, as shown in figure 3. This map is not sharply defined, however, and has considerable overlap between regions due in part to extensive intermingling and multiple overlapping of terminations of the mossy fibers in the cerebellar granule cell layer. Each mossy fiber branches many times and makes excitatory (+) contact with several hundred granule cells spaced over a region several millimeters in diameter.
Granule cells are the most numerous cells in the brain. It is estimated that there are about $3 \times 10^{10}$ granule cells in the human cerebellum alone. There are 100 to 1000 times as many granule cells as mossy fibers. Each granule cell is contacted by 5 to 12 mossy fibers and gives off a single output axon which rises toward the surface of the cerebellum. When it nears the surface this axon splits into two parts which run about 1.5 mm in opposite directions along the folded ridges of the cerebellum, making contact with a number of different kinds of cells in passage. These axons from the granule cells thus run parallel to each other in a densely packed sheet (hence the name, parallel fibers).
One of the cell types contacted by parallel fibers is the Golgi cell (named for its discoverer). These cells have a widely spread dendritic tree and are excited by parallel fibers over a region about 0.6 mm in diameter. Each Golgi cell puts out an axon which branches extensively, making inhibitory (-) contact with up to 100,000 granule cells in its immediate vicinity, including many of the same granule cells which excited it. The dendritic trees and axons of neighboring Golgi cells intermingle so as to blanket the entire granular layer with negative feedback. The general effect is that of an automatic gain control on the level of activity in the parallel fiber sheet.
It is thought that the Golgi cells operate such that only a small and controlled percentage (perhaps as little as 1 percent or less) of the granule cells are allowed above threshold at any one time, regardless of the level of activity of the mossy fiber input. Any particular pattern of activity on the mossy fiber input will produce a few granule cells which are maximally excited, and a great many others which are less than maximally stimulated. The Golgi cells suppress the outputs of all but the few maximally stimulated granule cells. The result is that every input pattern (or vector) is transformed by the granule layer into a small, and relatively fixed percentage, or subset, of parallel fibers which are active.
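The gain control described above behaves much like a "keep only the strongest few" rule. A minimal sketch follows, assuming the granule layer can be summarized as a table of excitation levels. The function name `golgi_gain_control`, the 1 percent default, and the synthetic excitation values are illustrative assumptions and not measurements.

```python
def golgi_gain_control(excitation, fraction=0.01):
    """Return the names of the granule cells allowed above threshold.

    excitation: dict mapping a granule cell name to its excitation level.
    fraction:   share of cells kept active (about 1 percent in the text).
    """
    n_active = max(1, round(len(excitation) * fraction))
    ranked = sorted(excitation, key=excitation.get, reverse=True)
    return set(ranked[:n_active])


# Synthetic excitation levels for 1000 hypothetical granule cells.
excitation = {f"g{i}": float((i * 37) % 1000) for i in range(1000)}
active = golgi_gain_control(excitation)

# Scaling every input by the same factor leaves the active set unchanged,
# which is the automatic gain control effect.
louder = {name: 5.0 * level for name, level in excitation.items()}
print(len(active), golgi_gain_control(louder) == active)
```

Sorting the whole population is only a convenience here; the point is that the size of the active set stays fixed regardless of overall input level.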
These active parallel fibers not only contact Golgi cells, but make excitatory contact with Purkinje cells (named for their discoverer) and basket and stellate cells (named for their shapes) through weighted connections (synapses). Each Purkinje cell performs a summation over its inputs and produces an output which is the output of the cerebellar cortex. The basket and stellate cells are essentially inverters which provide the Purkinje with negative weights that are summed along with the positive weights from parallel fibers.
A second set of fibers entering the cerebellar cortex are the climbing fibers, so named because they climb all over the Purkinje cells like ivy on a tree. There is typically one climbing fiber for each Purkinje cell. It is believed that these climbing fibers have some role in adjusting the strength of the weighted synaptic connections with the parallel fibers, so as to alter the Purkinje output. Climbing fibers are thus hypothesized to provide the information required for learning.
The availability of such detailed knowledge regarding the structure and function of the various cell and fiber types in the cerebellum has led a number of theoreticians to propose mathematical models to explain the information processing characteristics of the cerebellum. One model was developed independently in Great Britain by David Marr and in the United States by myself. The general outlines of this model are shown in figure 4. My further work has produced the more abstract version illustrated in figure 5, as well as a mathematical formalism called the CMAC (Cerebellar Model Arithmetic Computer).
CMAC is defined by a series of mappings:
$$S \rightarrow M \rightarrow A \rightarrow p$$
where:
S is an input vector;
M is the set of mossy fibers used to encode S;
A is the set of granule cells contacted by M;
p is an output value.
The overall mapping:
$$S \rightarrow p$$
has all of the properties of a function:
$$p = h(S)$$
as described in part 1. A set of L CMACs operating on the same input produces a mapping:
$$S \rightarrow P$$
which has the properties of the function:
$$P = H(S).$$
We may describe the information encoded by mossy fibers as a vector $S = C + F$ where:
$C = (s_1, s_2, \ldots, s_i)$ is a vector, or list, of command variables;
and
$F = (s_{i+1}, \ldots, s_N)$ is a vector, or list, of feedback variables.
$+$ is an operator denoting the combination of two vectors defined by two lists of variables into a single vector or list of variables.
The vector components of S must be transmitted from their various points of origin to their destination in the cerebellar granular layer. Distances may range from a few inches to over a foot. This presents a serious engineering problem because mossy fibers, like all nerve axons, are noisy, unreliable, and imprecise information channels with limited dynamic range. Pulse frequency and pulse phase modulation (which the brain uses for data transmission over long distances) are subject to quantization noise and are bandwidth limited. Nerve axons typically cannot transmit pulse rates above two or three hundred pulses per second. Nevertheless, high resolution high bandwidth data is required for precise control of skilled actions.
The brain solves this problem by encoding each of the high precision variables to be transmitted so that it can be carried on a large number of low precision channels. Many mossy fibers are assigned to each input variable such that any one fiber conveys only a small portion of the information content of a single variable.
The nature of this encoding is that any particular mossy fiber will be maximally active over some limited range of the variable that it encodes, and less than maximally active over the rest of its variable's range. For example, the output of the mossy fiber labeled a in figure 6 is maximally active whenever the elbow joint is between 90° and 120° and is less than maximally active for all other elbow positions. The mossy fiber labeled b in figure 6 is maximally active whenever the elbow angle is greater than 160°. Now if there exists a large number of mossy fibers whose responses have a single maximum but which are maximally active over different intervals, it is then possible to tell the position of the elbow quite precisely by knowing which mossy fibers are maximally active. For example, in figure 7 the fact that mossy fibers a, b, and c are maximally active indicates that the elbow joint is between 118° and 120°.
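The decoding argument above amounts to intersecting the ranges over which the maximally active fibers respond. The sketch below illustrates this with three made-up fibers. The names `f1` to `f3` and their intervals are hypothetical values chosen so that the intersection matches the 118° to 120° example, and they are not read from figure 7.

```python
# Hypothetical fibers: name -> (low, high) range of elbow angle, in degrees,
# over which the fiber is maximally active.
fibers = {"f1": (90, 120), "f2": (105, 135), "f3": (118, 150)}


def active_fibers(angle):
    """Names of the fibers that are maximally active at this angle."""
    return [name for name, (lo, hi) in fibers.items() if lo <= angle <= hi]


def decode_range(intervals):
    """Intersect the maximal-response intervals of the active fibers."""
    low = max(lo for lo, _ in intervals)
    high = min(hi for _, hi in intervals)
    return low, high


names = active_fibers(119)
print(names, decode_range([fibers[n] for n in names]))
```

Each fiber alone localizes the joint only to within 30 degrees or so, yet the three together narrow it to a 2 degree window.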
The CMAC models this encoding scheme in the following way: define $m_i$ to be the set of mossy fibers assigned to convey the value of the variable $s_i$; define $m_i^*$ to be the mossy fibers in $m_i$ which are maximally stimulated by a particular value of $s_i$. If for every value of $s_i$ over its range there exists a unique set $m_i^*$ of maximally active mossy fibers, then there is a mapping $s_i \rightarrow m_i^*$ such that knowing $m_i^*$ (ie: which fibers in $m_i$ are maximally active) tells us what is the value of $s_i$. If such a mapping is defined for every component $s_i$ in the vector S then we have a mapping:
$$S \rightarrow M$$
where M is the set of all mossy fibers which encode the variables in the vector S.
In CMAC each of the $s_i \rightarrow m_i^*$ mappings may be defined by a set of K quantizing functions ${}^{i}C_1, {}^{i}C_2, \ldots, {}^{i}C_K$, each of which is offset by a value of 1/K times the quantizing interval. An example of this is given in figure 8 where K = 4 and N = 2. Component $s_1$ is represented along the horizontal axis, and the range of $s_1$ is covered by four quantizing functions:
${}^{1}C_1 = \{A, B, C, D, E\}$
${}^{1}C_2 = \{F, G, H, J, K\}$
${}^{1}C_3 = \{M, N, P, Q, R\}$
${}^{1}C_4 = \{S, T, V, W, X\}$
Each quantizing function is offset from the previous one by one resolution element. For every possible value of $s_1$ there exists a unique set $m_1^*$ consisting of the set of values produced by the K quantizing functions. For example (in figure 8), the value $s_1 = 7$ maps into the set $m_1^* = \{B, H, P, V\}$.
A similar mapping is also performed on $s_2$ by the set of quantizing functions:
${}^{2}C_1 = \{a, b, c, d, e\}$
${}^{2}C_2 = \{f, g, h, j, k\}$
${}^{2}C_3 = \{m, n, p, q, r\}$
${}^{2}C_4 = \{s, t, v, w, x\}$.
For example, the value $s_2 = 10$ maps into the set $m_2^* = \{c, j, q, v\}$. Now, if the $s_1$ component in figure 8 corresponds to the position of the elbow joint, the mossy fiber labeled B will be maximally active whenever the elbow is between 4 and 7, and less than maximally active whenever the elbow position is outside that region. Similarly, the mossy fiber labeled H is maximally active when the elbow is between 5 and 8, the fiber P maximally active between 6 and 9, and V between 7 and 10, etc. The combination of mossy fibers in the set $m_1^* = \{B, H, P, V\}$ thus indicates that the variable $s_1 = 7$. If $s_1$ changes by one position (from 7 to 8, for example), the mossy fiber labeled B will drop out of the maximally active set $m_1^*$ to be replaced by another, labeled C.
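The quantizing functions of figure 8 can be imitated with integer division. The sketch below is one possible reading and not the original implementation: the offset formula in `encode` is an assumption, chosen because it reproduces the intervals quoted in the text (B for 4 to 7, H for 5 to 8, P for 6 to 9, V for 7 to 10), and the names `LABELS_S1`, `LABELS_S2`, and `encode` are hypothetical.

```python
K = 4  # number of quantizing functions per variable, as in figure 8

# One string of fiber labels per quantizing function.
LABELS_S1 = ["ABCDE", "FGHJK", "MNPQR", "STVWX"]
LABELS_S2 = ["abcde", "fghjk", "mnpqr", "stvwx"]


def encode(s, labels, k=K):
    """Return the maximally active fiber chosen by each quantizing function.

    Assumes s is an integer in 0..16 and that function j is shifted by
    (k - j) % k resolution elements, which matches the text's examples.
    """
    return [labels[j][(s + (k - j) % k) // k] for j in range(k)]


print(encode(7, LABELS_S1))   # ['B', 'H', 'P', 'V']
print(encode(10, LABELS_S2))  # ['c', 'j', 'q', 'v']
print(encode(8, LABELS_S1))   # ['C', 'H', 'P', 'V']
```

Moving $s_1$ from 7 to 8 changes only the first label, from B to C, which is the one-fiber turnover the paragraph above describes.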
This encoding scheme has a number of advantages. The most obvious is that a single precise variable can be transmitted reliably over a multitude of imprecise information channels. The resolution (or information content) of the transmitted variable depends on the number of channels. The more mossy fibers dedicated to a particular variable, the greater the precision with which it is represented.
A second equally important result is that small changes in the value of the input variable $s_i$ have no effect on most of the elements in $m_i^*$. This leads to a property known as generalization, which is crucial for learning and recall in a world where no two situations are ever exactly the same. In CMAC the extent of the neighborhood of generalization along each variable axis depends on the resolution of the CMAC quantizing functions. In the brain this corresponds to the width of the maximally active region of the mossy fibers.
Just as we can identify (or name) mossy fibers by the input variables they encode, so we can identify granule cells by the mossy fibers which provide them with input. Each granule cell receives input from several different mossy fibers, and no two granule cells receive input from the same combination of mossy fibers. This means that we can compute a unique name (or address) for each granule cell by simply listing the mossy fibers which contact it. For example, a granule cell contacted by two mossy fibers B and c can be named (or addressed) Bc.
In the CMAC example in figure 8, 25 granule cells are identified by their contacts with mossy fibers from the quantizing functions ${}^{1}C_1$ and ${}^{2}C_1$. 25 other granule cells are identified by ${}^{1}C_2$ and ${}^{2}C_2$, 25 by ${}^{1}C_3$ and ${}^{2}C_3$, and 25 more by ${}^{1}C_4$ and ${}^{2}C_4$. There are, of course, many other possible combinations of mossy fiber names which might be used to identify a much larger number of granule cells. For this simple example, however, we will limit our selection to the permutation of corresponding quantizing functions along each of the coordinate axes. This provides a large and representative sample which uniformly spans the input space. Furthermore, this particular naming algorithm is simple to implement either in software or hardware.
We can define A to be the set of all granule cells identified by their mossy fiber inputs. Of course, all of the granule cells in A are not active at the same time. As was previously noted, most granule cells are inhibited from firing by Golgi cell gain control feedback. Only the small percentage of granule cells whose input mossy fibers are all maximally active can rise above threshold. We will define the set of active granule cells as A*.
Since we already know which mossy fibers are maximally active (ie: those mossy fibers in the sets $m_i^*$), we can compute names of granule cells in A*. For example, in figures 8 and 10, if $s_1 = 7$ and $s_2 = 10$, then $m_1^* = \{B, H, P, V\}$ and $m_2^* = \{c, j, q, v\}$. The active granule cells in A* can now be computed directly as $A^* = \{Bc, Hj, Pq, Vv\}$. All other granule cell names in the larger set A involve at least one mossy fiber which is not maximally active (ie: not in $m_1^*$ or $m_2^*$).
Note that, as illustrated in figure 9, the granule cell Bc will be active as long as the input vector remains in the region of input space $4 \le s_1 \le 7$ and $8 \le s_2 \le 11$. Thus, the generalizing property introduced by the S → M mapping carries through to the naming of active granule cells. A particular granule cell is active whenever the input vector S lies within some extended region, or neighborhood, of input space. Other granule cells are active over other neighborhoods. These neighborhoods overlap, but each is offset from the others so that for any particular input S, the neighborhoods in A* all overlap at only one point, namely the point defined by the input vector. This is illustrated in figure 10. If the input vector moves one resolution element in any direction, for example, from (7, 10) to (8, 10), one active granule cell (Bc) drops out of A* to be replaced by another (Cc).
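The naming of active granule cells can be sketched by pairing corresponding labels from the two variables. As before, this is an illustrative reading under assumptions: the offset formula inside `encode` is chosen to reproduce the examples in the text, and all function names are hypothetical.

```python
K = 4
LABELS_S1 = ["ABCDE", "FGHJK", "MNPQR", "STVWX"]
LABELS_S2 = ["abcde", "fghjk", "mnpqr", "stvwx"]


def encode(s, labels, k=K):
    """Maximally active fiber from each quantizing function (assumed offsets)."""
    return [labels[j][(s + (k - j) % k) // k] for j in range(k)]


def active_granule_cells(s1, s2):
    """Names of the cells in A*: one per pair of corresponding functions."""
    return [a + b for a, b in zip(encode(s1, LABELS_S1), encode(s2, LABELS_S2))]


def neighborhood(cell):
    """All inputs (s1, s2) in the 17 x 17 grid for which the cell is active."""
    return [(s1, s2) for s1 in range(17) for s2 in range(17)
            if cell in active_granule_cells(s1, s2)]


print(active_granule_cells(7, 10))  # ['Bc', 'Hj', 'Pq', 'Vv']
print(active_granule_cells(8, 10))  # ['Cc', 'Hj', 'Pq', 'Vv']
region = neighborhood("Bc")
print(min(region), max(region))     # (4, 8) (7, 11)
```

The neighborhood of Bc comes out as the 4 by 4 block of inputs with $s_1$ from 4 to 7 and $s_2$ from 8 to 11, in agreement with figure 9 as described in the text.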
Granule cells give rise to parallel fibers which act through weighted connections on the Purkinje output cell, varying its firing rate. Each cell in A is associated with a weight which may be positive or negative. Only the cells in A* have any effect on the Purkinje output cell. Thus, the Purkinje output sums only the weights selected (or addressed) by A*. This sum is the CMAC output scalar variable p. For example, in figure 8, S = (7, 10) maps into $A^* = \{Bc, Hj, Pq, Vv\}$ which selects the weights:
$W_{Bc} = 1.0$
$W_{Hj} = 2.0$
$W_{Pq} = 1.0$
$W_{Vv} = 0.0$.
These weights are summed to produce the output:
$$p = 4.0.$$
Thus the input S = (7, 10) produces the output h(S) = 4.
In figure 8 four weights are selected for every S vector in input space. Their sum is the value of the output p. As the input vector moves from any point in input space to an adjacent point one weight drops out to be replaced by another. The difference in value of the new weight minus the old is the difference in value of the output at the two adjacent points. Thus, the difference in adjacent weights is the partial derivative (or partial difference) of the function at that point.
As the input vector S moves over the input space, a value p is output at each point. We can therefore say that the CMAC computes the function:
$$p = h(S).$$
The particular function h computed depends on the particular set of values stored in the table of weights. For example, the set of weights shown in figure 8 computes the function shown in figure 11.
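The output computation is simply a sum over the weights addressed by A*. The sketch below uses the four weights quoted in the text for S = (7, 10). The weight assigned to the cell Cc is a made-up value, included only to show that the change in output between adjacent inputs equals the difference between the incoming and outgoing weights.

```python
# Weights quoted in the text for A* = {Bc, Hj, Pq, Vv}.
weights = {"Bc": 1.0, "Hj": 2.0, "Pq": 1.0, "Vv": 0.0}
weights["Cc"] = 1.5  # hypothetical value, not taken from figure 8


def cmac_output(active_cells, weights):
    """Sum the weights addressed by the active cells (missing weights count as 0)."""
    return sum(weights.get(name, 0.0) for name in active_cells)


p_at_7_10 = cmac_output(["Bc", "Hj", "Pq", "Vv"], weights)
p_at_8_10 = cmac_output(["Cc", "Hj", "Pq", "Vv"], weights)

print(p_at_7_10)                                            # 4.0
print(p_at_8_10 - p_at_7_10, weights["Cc"] - weights["Bc"])  # 0.5 0.5
```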
In the cerebellum there are many Purkinje
cells which receive input from essentially the same
mossy fibers. Thus, there are many CMACs all computing on the same
input vector S. We can therefore say that a set of L
CMACs computing on the same input vector
produces a vector mapping:
P=H(S).
One of the most fascinating, intensively studied, and least understood features of the brain is memory, and how data is stored in memory. In the cerebellum each Purkinje cell has a unique fiber, a climbing fiber, which is believed to be related to learning. Fibers from an area called the locus coeruleus have recently been discovered which appear to be related to learning. In addition, a number of hormones have been shown to have profound effects on learning and retention of learned experiences.
While the exact mechanism (or mechanisms) for memory storage are as yet unknown, the cerebellar model upon which CMAC is based hypothesizes that climbing fibers carry error correction information which "punishes" synapses that participate in erroneous firing of the Purkinje cell. The amount of error correction that occurs at any one experience may depend on factors such as the state of arousal or emotional importance attached by the brain's evaluation centers to the data being stored during the learning process.
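Cerebellar learning is modeled in CMAC by the following procedure:
For each output $p_i$, the value currently computed by CMAC is compared with the desired value $\hat{p}_i$. If the difference lies within an acceptable error $\xi_i$, the weights are left unchanged.
However, if $|\hat{p}_i - p_i| > \xi_i$ then add $\Delta_i$ to every weight which was summed to produce $p_i$ where:
$$\Delta_i = g \left( \frac{\hat{p}_i - p_i}{|A^*|} \right) \qquad (1)$$
$|A^*|$ is the number of weights in the set A* which contributed to $p_i$, and g is a gain factor which controls the amount of error correction produced by one learning experience.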
If g = 1, then CMAC produces one-shot learning which fully corrects the observed error in one data storage operation. If 0 < g < 1, then each learning experience moves the output $p_i$ only in the direction of the desired value $\hat{p}_i$. More than one memory storage operation is then required to achieve correct performance.
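The effect of the gain factor can be made explicit with a short derivation. It reads formula (1) as $\Delta_i = g(\hat{p}_i - p_i)/|A^*|$ and assumes that the same input is presented repeatedly, that no other data is stored in between, and that the $|A^*|$ selected weights are distinct. The superscript $(n)$, counting storage operations, is notation introduced here for illustration.

```latex
% Adding \Delta_i to each of the |A^*| selected weights changes the output by |A^*| \Delta_i:
p_i^{(n+1)} = p_i^{(n)} + |A^*| \, \Delta_i
            = p_i^{(n)} + g \left( \hat{p}_i - p_i^{(n)} \right)

% so the remaining error shrinks by a factor (1 - g) at every storage operation:
\hat{p}_i - p_i^{(n+1)} = (1 - g) \left( \hat{p}_i - p_i^{(n)} \right)
\quad \Longrightarrow \quad
\hat{p}_i - p_i^{(n)} = (1 - g)^n \left( \hat{p}_i - p_i^{(0)} \right)
```

With g = 1 the error vanishes after a single operation. With g = 0.5 it is halved each time, so about seven operations are needed to bring it below 1 percent of its starting value.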
An example of how an arbitrary function such as:
$$p = (\sin x)(\sin y)$$
where:
$$x = 2\pi s_1 / 360$$
and:
$$y = 2\pi s_2 / 360$$
can be stored in CMAC is shown in figure 12. In this example the input is defined with unity resolution over the space $0 \le s_1 \le 360$ and $0 \le s_2 \le 180$, and the number of weights selected by each input is $|A^*| = 32$.
Initially all the weights were equal to 0. The point $S_1 = (90, 90)$ was chosen for the first data entry. The value of the desired function p = h (90, 90) is 1. By formula (1) (where g = 1) each of the weights selected by S = (90, 90) is set to 1/32, causing the proper value to be stored at S = (90, 90) as shown in figure 12a. After two data storage operations, one at (90, 90), the other at (270, 90), the contents of the CMAC memory are as shown in figure 12b. After 16 storage operations along the $s_2 = 90$ axis the results are as shown in figure 12c. After 175 storage operations scattered over the entire input space, the contents of the CMAC memory are as shown in figure 12d.
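The first two storage operations of this example can be imitated in a few lines. The sketch below is an approximation under stated assumptions and not the program used to produce figure 12: it uses 32 diagonally offset quantizing functions, stores weights in an ordinary dictionary with no hash coding (so there is no hashing noise), and the names `active_cells`, `output`, `store`, and `target` are hypothetical.

```python
import math
from collections import defaultdict

K = 32  # number of weights selected by each input, |A*| = 32


def active_cells(s1, s2, k=K):
    """Addresses of the k active cells for integer input (s1, s2)."""
    cells = []
    for j in range(k):
        offset = (k - j) % k  # assumed offset of quantizing function j
        cells.append((j, (s1 + offset) // k, (s2 + offset) // k))
    return cells


def output(weights, s1, s2):
    """CMAC output: the sum of the weights addressed by A*."""
    return sum(weights[c] for c in active_cells(s1, s2))


def store(weights, s1, s2, desired, g=1.0):
    """One storage operation: add g * (desired - p) / |A*| to each weight."""
    cells = active_cells(s1, s2)
    delta = g * (desired - output(weights, s1, s2)) / len(cells)
    for c in cells:
        weights[c] += delta


def target(s1, s2):
    """The function being stored, (sin x)(sin y) with inputs in degrees."""
    return math.sin(2 * math.pi * s1 / 360) * math.sin(2 * math.pi * s2 / 360)


weights = defaultdict(float)          # all weights start at 0
store(weights, 90, 90, target(90, 90))
store(weights, 270, 90, target(270, 90))

for s1 in (90, 100, 200, 270):
    print(s1, round(output(weights, s1, 90), 4))
```

The stored points read back as the desired values, inputs near a stored point return a reduced value because they share only some of its weights, and inputs far from both points still return 0.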
The CMAC S → A* mapping corresponds to an address decoder wherein S is the input address and the active granule cells in A* are select lines. These access weights whose sum can be interpreted as the contents of the address S. In a conventional memory, each possible input address selects a unique single location wherein is stored the contents of that address, as illustrated in figure 13a. In CMAC each possible input address selects a unique set of memory locations, the sum of whose contents is the contents of the input address, as shown in figure 13b.
This suggests that the Cerebellar Model Arithmetic Computer might require considerably less memory than a conventional lookup table in storing certain functions. The reason is that the number of ways that x elements can be selected from a table of y entries exceeds y whenever x lies between 2 and y - 2 and, in some cases, it does so by orders of magnitude.
A conventional memory requires $R^N$ memory locations to store a function of N variables, where R is the number of resolution elements on each variable. CMAC requires at most $K \times Q^N$ memory locations, where K is the number of quantizing functions and Q the number of resolution elements on each quantizing function.
A modest example of CMAC's reduced memory requirements can be seen in figure 8 where N = 2 and R = 17. Here there are $17^2$, or 289, possible input vectors. The CMAC shown has only 100 weights since K = 4 and Q = 5. Thus $K \times Q^N = 4 \times 5^2 = 100$. This savings in memory size becomes increasingly significant for large N. It allows CMAC to store a large class of low resolution functions of up to 12 variables over the entire input space with computer memory of practical size (less than 100 K bytes), whereas conventional table lookup becomes impractical for similar functions of more than four variables.
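The two memory estimates can be compared directly. The helper names below are hypothetical; the figures for N = 2 are those given in the text, and the larger values of N simply extend the same formulas with R, K, and Q held fixed.

```python
def table_size(r, n):
    """Locations needed by a conventional lookup table: R to the power N."""
    return r ** n


def cmac_size(k, q, n):
    """Upper bound on CMAC weights: K times Q to the power N."""
    return k * q ** n


# Figure 8: N = 2, R = 17, K = 4, Q = 5.
print(table_size(17, 2), cmac_size(4, 5, 2))  # 289 100

# The gap widens quickly as the number of input variables grows.
for n in (2, 4, 6):
    print(n, table_size(17, n), cmac_size(4, 5, n))
```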
An even greater savings in memory requirements can be achieved by the use of hash coding techniques in the selection of addresses for the elements in A*. Hash coding allows CMAC to store functions of many variables, so long as the information content of the portion of the function stored does not exceed the number of bits in the CMAC memory.
Hash coding is a commonly used memory addressing technique for compressing a large but sparsely populated address space into a smaller, more densely populated one. (See "Making Hash with Tables" by Terry Dolhoff in Programming Techniques: Program Design, BYTE Books, 1979.) Many addresses in the larger space are mapped onto each of the addresses in the smaller space. One method is simply to overlay pages. Hashing works when the probability of a collision (ie: more than one filled location in the large memory mapping into the same address in the small memory) is low.
CMAC can tolerate a fairly high incidence of collisions because of its distributed memory (ie: its output is the sum of many locations). Thus a collision (which in a conventional memory would make the output completely incorrect) in CMAC introduces only a small amount of noise into the output. Hash coding noise can be seen in the base plane in figure 12a, b, c.
In CMAC, hashing noise is randomly scattered over the input space each time new data is stored. Thus each new data storage operation degrades previously stored data somewhat. The effect is that the contents of a CMAC memory are most accurately defined in the regions where data was most recently stored. Old data tends to gradually fade, or be "forgotten," due to being hashed over.
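A minimal Python sketch of hash-coded storage follows. Several things here are assumptions rather than details from the article: the cell addressing scheme (32 overlapping quantizing functions, each shifted one unit along every axis), the table size of 4096, the use of CRC32 as the hash function, and the form of the storage rule (the error, scaled by a gain g, is divided equally among the selected weights).

```python
import zlib

K = 32             # weights selected per input (assumed, as in the 1/32 example)
TABLE_SIZE = 4096  # size of the physical weight table (arbitrary choice)


def active_cells(s):
    """Conceptual addresses: quantizing function j is shifted j units per axis."""
    return [(j,) + tuple((x + j) // K for x in s) for j in range(K)]


def hashed_slots(s):
    """Fold each conceptual address into the small physical table.

    CRC32 is a stand-in hash; the article does not name a hash function.
    """
    return [zlib.crc32(repr(a).encode()) % TABLE_SIZE for a in active_cells(s)]


weights = [0.0] * TABLE_SIZE


def recall(s):
    """Output is the sum of the weights in the selected slots."""
    return sum(weights[i] for i in hashed_slots(s))


def store(s, desired, g=0.5):
    """Spread a fraction g of the output error over the selected slots."""
    delta = g * (desired - recall(s)) / K
    for i in hashed_slots(s):
        weights[i] += delta


# Repeated storage at one point converges on the desired value.
for _ in range(60):
    store((90, 90), 1.0)
print(round(recall((90, 90)), 6))  # 1.0
```

Because unrelated inputs can hash into slots already in use, each later `store` call at a distant point may perturb this value slightly, which is the hashing noise and gradual fading described above.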
The fact that each possible CMAC input vector selects a unique set of memory locations rather than a single location implies that any particular location may be selected by more than one input vector. In fact, the S -> A* mapping ensures that any two input vectors which are similar (ie: close together in input space) will activate many of the same granule cells, and hence select many of the same weights. This is the property of CMAC which causes it to generalize.
In figure 14a the input vector S2 selects three out of four of the same memory locations as S1. Thus, the output h(S2) will be similar to h(S1), differing only by the contents of the single location which is not in common. The S -> A* mapping controls the amount of overlap between sets of selected memory locations such that, as the input space distance between two input vectors increases, the amount of overlap decreases. Finally, at some distance the overlap becomes 0 (except for random hashing collisions), as in figure 14b, and the sets of selected memory locations are disjoint. At that point input S2 can be said to be outside the neighborhood of generalization of S1. The value of the output h(S2) is thus independent of h(S1).
The extent of the neighborhood of generalization depends on both the number of elements in the set A* and the resolution of the si -> mi* mappings. It is possible in CMAC to make the neighborhood of generalization broad along some variable axes and limited along others by using different resolution quantizing functions for different input variables. This corresponds to the effect in the cerebellum where some input variables are resolved finely by many mossy fibers and others resolved more coarsely by fewer mossy fibers.
A good example of generalization can be seen in figure 12a. Following a single data storage operation at S1 = (90, 90) we find that an input vector S2 = (91, 90) will produce the output p = 31/32 even though nothing had ever been explicitly stored at (91, 90). This occurs because S2 selects 31 of the same weights as S1. A third vector S3 = (92, 90) or a fourth S4 = (90, 92), will produce p = 30/32 because of sharing 30 weights with S1. Not until two input vectors are more than 32 resolution elements apart do they map into disjoint sets of weights.
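The numbers in this example can be reproduced with a small Python sketch. It rests on assumptions that are not spelled out in this section: 32 overlapping quantizing functions, each 32 resolution elements wide and shifted by one element along every axis relative to the previous one, and a storage rule that divides the output error (scaled by g) equally among the 32 selected weights. No hash coding is used, so there is no hashing noise.

```python
K = 32  # weights selected per input, as implied by the 1/32 figure


def active_cells(s):
    """One cell per quantizing function j; function j is shifted by j units."""
    return [(j,) + tuple((x + j) // K for x in s) for j in range(K)]


class CMAC:
    def __init__(self):
        self.weights = {}  # cell address -> weight, implicitly 0 at the start

    def recall(self, s):
        """Output is the sum of the selected weights."""
        return sum(self.weights.get(a, 0.0) for a in active_cells(s))

    def store(self, s, desired, g=1.0):
        """Divide the scaled output error equally among the selected weights."""
        delta = g * (desired - self.recall(s)) / K
        for a in active_cells(s):
            self.weights[a] = self.weights.get(a, 0.0) + delta


cmac = CMAC()
cmac.store((90, 90), 1.0)       # single storage operation, g = 1
print(cmac.recall((90, 90)))    # 1.0
print(cmac.recall((91, 90)))    # 0.96875  (31/32)
print(cmac.recall((92, 90)))    # 0.9375   (30/32)
print(cmac.recall((90, 92)))    # 0.9375   (30/32)
print(cmac.recall((122, 90)))   # 0.0
```

In this particular sketch the overlap reaches zero at a separation of exactly 32 elements along one axis; the precise boundary depends on how the quantizing functions are laid out.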
As a result of generalization, CMAC memory addresses in the same neighborhood are not independent. Data storage at any point alters the values stored at neighboring points. Pulling one point to a particular value as in figure 12a produces the effect of stretching a rubber sheet.
Generalization has the advantage that data storage (or training) is not required at every point in the input space in order for an approximately correct response to be obtained. This means that a good first approximation to the correct H function can be stored for a sizable envelope around a TS trajectory by training at only a few points along that trajectory. For example, figure 12c demonstrates that training at only 16 points along the trajectory defined by s2 = 90 generalizes to approximately the correct function for all 360 points along that trajectory plus a great many more points in an envelope around that trajectory. Further training at 175 points scattered over the entire space generalizes to approximately the correct response for all 360 by 180 (over 64,000) points in the input space as shown in figure 12d.
Generalization enables CMAC to predict on the basis of a few representative learning experiences what the appropriate behavioral response should be for similar situations. This is essential in order to cope with the complexities of real world environments where identical TS trajectories seldom, if ever, reoccur.
An example of how CMAC uses generalization to learn trajectories in a high-dimensional space is shown in figure 15. A seven degree of freedom manipulator arm was controlled by seven CMACs, one for each joint actuator, such that the output vector P = H(S) had seven components. The input vector S to each CMAC contained 18 variables corresponding to position and velocity feedback from each of the seven joints of the arm, plus four binary bits defining the Elemental Move Command. The resolution on the feedback variables was different for each of the seven CMACs, being highest for the joint driven by that CMAC's output and lower for other joints in inverse proportion to their distance along the arm from the controlled joint.
The desired output trajectory TP is shown as the set of solid curves marked (a) in figure 16. This trajectory corresponds to the Elemental Movement <SLAP> which is a motion an arm might make in swatting a mosquito.
The (i) curve in figure 17 shows the learning performance with no previous learning over twenty complete TPa "slap" motions. At the beginning of each motion the arm was positioned at the correct starting point and driven from there by the P output computed by the CMAC H function. Differences between the computed P and the desired P at 20 points along the slap trajectory were corrected by formula (1) (with g set to 1/20). Each point on the curve in figure 17 represents the sum of all the errors for all the joints during an entire slap motion. Note that learning is rapid despite the high dimensional input space in which no two TS trajectories were ever exactly the same. This is due to CMAC's ability to generalize from a relatively small number of specific teaching experiences to a large number of similar but not identical trajectories.
The (ii) curve in figure 17 shows the learning performance on the same twenty TPa trajectories when preceded by twenty training sessions on the TPb trajectory indicated by the dotted set of curves marked (b) in figure 16. Note that performance on TPa is consistently better following prior learning on a similar trajectory TPb. The learning on TPb generalizes to the similar trajectory TPa.
Needless to say, predictions based on generalization are not always correct and sometimes need to be refined by further learning. The ability of CMAC to discriminate (ie: to produce different outputs for different inputs S1 and S2) depends upon how many weights selected by S1 are not also selected by S2, and how different in value those weights are. If two inputs which are close together in input space are desired to produce significantly different outputs, then repeated training may be required to overcome the (in this case erroneous) tendency of CMAC to generalize by building up large differences in the few weights which are not in common.
In most behavioral control situations, sharp discontinuities requiring radically different outputs for highly similar inputs do not occur. Indeed most servocontrol functions have simple S shaped characteristics along each variable axis. The complexity in control computation in multivariant servosystems typically derives from cross-products which affect the slope of the function, or produce skewness, and nonsymmetrical hills and valleys in various corners of the N dimensional space. As can be seen from figure 11 these are the type of functions CMAC can readily store, and hence compute. Nevertheless, even on smooth functions generalization may sometimes introduce errors by altering values stored at neighboring locations which were already correct. This type of error corresponds to what psychologists call learning interference, or retroactive inhibition.
For example, in the learning of the two similar trajectories in figure 16, training on TPa causes degradation of, or interference with, what was previously learned on TPb. This can be seen in figure 18 where, after 20 training sessions on TPb, the CMAC is trained for 20 sessions on TPa. Following this the performance on TPb is degraded. However, the error rate on TPb quickly improves over another 20 training sessions. Following this another 20 training sessions are conducted on TPa. Again degradation in TPb due to learning interference occurs, but not as severely as before. Another set of 20 training sessions on TPb followed by another 20 on TPa shows that the amount of learning interference is declining due to the buildup of values in the few weights which are not common to both TSa and TSb. Thus, learning interference, or retroactive inhibition, is overcome by repetition of the learning process.
The ability of CMAC to store and recall (and hence compute) a general class of multivariant mathematical functions of the form P = H(S) demonstrates how a relatively small cluster of neurons can calculate the type of mathematical functions required for multivariant servomechanisms, coordinate transformations, conditional branches, task decomposition operators, and IF/THEN production rules. In part 1 we showed that these are the types of functions required for generating goal-directed behavior (ie: the purposive strings of behavior patterns such as running, jumping, flying, hunting, fleeing, fighting, and mating, which are routinely accomplished with apparent ease by the tiniest rodents, birds, and even insects).
In the case of multivariant servomechanisms the S vector corresponds to commands plus feedback (ie: S = C + F). For coordinate transformations the S vector contains the arguments as well as the variables in the transformation matrix.
In the case of conditional branches, one or more of the input variables in S can be used to select different regions in input space where entirely different functions are stored. Assume, for example, that in figure 12 a third variable s3 had been included in the function being stored. Assume that s3 is held constant at s3 = 0 while storing the function p = (sin x)(sin y). Following that, an entirely different function, say p = 3x + 5y^2, could be stored with s3 held constant at s3 = 50. Since every point in the input space for s3 = 0 is outside the neighborhood of generalization of the input space for s3 = 50, there would be no interference except for random hashing collisions. The stored function would then be:
p = (sin x)(sin y) if s3 = 0
p = 3x + 5y^2 if s3 = 50.
In the interval 0 < s3 < 50 the function would change smoothly from p = (sin x)(sin y) to p = 3x + 5y^2. Additional functions could be stored for other values of s3, or other conditional variables s4, s5, and so on might be used for additional branching capabilities. If these conditional variables are part of a command vector, then each different input command can select a different subgoal generator. If they are part of the feedback, then different environmental conditions can trigger different behavioral patterns for accomplishing the subgoals.
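The conditional branch described above can be sketched in Python. The CMAC layout is an assumption (32 quantizing functions, each shifted one element per axis, no hash coding), as is the storage rule, which divides the output error equally among the selected weights. The sketch stores two different functions at s3 = 0 and s3 = 50 and checks that the second round of storage leaves the first untouched.

```python
import math

K = 32  # weights selected per input (assumed)


def active_cells(s):
    """One cell per quantizing function j; function j is shifted by j units."""
    return [(j,) + tuple((x + j) // K for x in s) for j in range(K)]


class CMAC:
    def __init__(self):
        self.weights = {}

    def recall(self, s):
        return sum(self.weights.get(a, 0.0) for a in active_cells(s))

    def store(self, s, desired, g=1.0):
        delta = g * (desired - self.recall(s)) / K
        for a in active_cells(s):
            self.weights[a] = self.weights.get(a, 0.0) + delta


cmac = CMAC()
points = [(x, y) for x in range(0, 360, 45) for y in range(0, 180, 45)]

# Branch 1: p = (sin x)(sin y), stored with s3 held at 0.
for x, y in points:
    p = math.sin(math.radians(x)) * math.sin(math.radians(y))
    cmac.store((x, y, 0), p)
before = [cmac.recall((x, y, 0)) for x, y in points]

# Branch 2: p = 3x + 5y^2, stored with s3 held at 50.
for x, y in points:
    cmac.store((x, y, 50), 3 * x + 5 * y ** 2)
after = [cmac.recall((x, y, 0)) for x, y in points]

print(before == after)            # True: no interference between branches
print(cmac.recall((90, 90, 50)))  # 40770.0
```

The two branches do not interfere because a separation of 50 along s3 exceeds the 32-element width of every quantizing function, so the two sets of selected weights never overlap.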
If some of the variables in the P output vector loop directly back to become part of the S input vector (as frequently happens in the cerebellum as well as in other parts of the brain), then CMAC becomes a type of finite state automaton, string generator, or task decomposition operator. For example, the CMAC in figure 19a behaves like the finite state automaton in 19b. The loop-back inputs s1 and s2 define the state of the machine, and s3 is the input. The H function defines the state transition table. In general it is possible to construct a CMAC equivalent of any finite state automaton. Of course, CMAC can accept inputs and produce outputs which are nonbinary. Furthermore, the outputs generalize. Thus, CMAC is a sort of "fuzzy state automaton."
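The loop-back arrangement can be illustrated with a short Python sketch. The automaton used here is a hypothetical 2-bit counter chosen for illustration; it is not the automaton of figure 19. The H function is written as plain arithmetic standing in for a transition table that a CMAC would hold in its weights.

```python
def H(s1, s2, s3):
    """State transition table for a hypothetical 2-bit counter.

    s1, s2 are the looped-back state bits; s3 is the external input.
    The counter advances when s3 = 1 and holds when s3 = 0.
    """
    state = 2 * s1 + s2
    nxt = (state + s3) % 4
    return nxt // 2, nxt % 2


def run(inputs, state=(0, 0)):
    """Feed each output P back as the next s1, s2 and record the states."""
    trace = [state]
    for s3 in inputs:
        state = H(state[0], state[1], s3)
        trace.append(state)
    return trace


print(run([1, 1, 0, 1, 1]))
# [(0, 0), (0, 1), (1, 0), (1, 0), (1, 1), (0, 0)]
```

Note that `run` only samples the input and applies H once per step; the sequence of states arises entirely from the feedback path.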
A Cerebellar Model Arithmetic Computer with direct feedback from output to input demonstrates how a neural cluster can generate a string of outputs (subgoals) in response to a single input, or unchanging string of inputs. Additional variables added to F from an external source increase the dimensionality of the input space and can thus alter the output string (task decomposition) in response to environmental conditions.
The different possible feedback pathways to a CMAC control module cast light on a long standing controversy in neurophysiology regarding whether behavior patterns are generated by "stimulus-response chaining" (ie: a sequence of actions in which feedback from sensory organs is required to step from one action to the next) or by "central patterning" (ie: a sequence which is generated by internal means alone). A CMAC hierarchy may include tight feedback loops from the output of one level back to its own input to generate central patterns, longer internal loops from one level to another to cycle through a sequence of central patterns, as well as feedback from the environment to select or modify central patterns or their sequence in accordance with environmental conditions.
The above discussion makes it obvious that CMAC can also implement IF/THEN production rules by the simple mechanism of making the S vector (or the TS trajectory) correspond to an IF premise. The P vector output (or TP trajectory) becomes the THEN consequent.
The capability of CMAC to simulate a finite state automaton, to execute the equivalent of a conditional branch, and to compute a broad class of multivariant functions makes it possible to construct the CMAC equivalent of a computer program. Conversely it is possible to construct a hierarchy of computing modules, perhaps implemented on a network of microprocessors, which is the equivalent of a CMAC hierarchy. This has profound implications regarding the type of computing architecture which might be used to build a model of the brain for robot control.
Note in this regard that CMAC produces nothing comparable to a DO loop or an interrupt. Each CMAC is a state machine which samples (or polls) a set of input variables and computes a set of output variables. There is no way that it can be instructed to DO something N times. CMAC can, of course, perform a DO-UNTIL in the sense that if the input is constant, the output will remain constant until the input changes. Thus for a constant input S1, CMAC will DO P1 = H(S1) UNTIL S1 changes to S2. But this is not a DO loop in the customary sense.
Similarly, one or more of the CMAC input variables can be used to "interrupt" an ongoing trajectory by causing a branch to a new trajectory. A hierarchy of CMACs can return to the interrupted trajectory after a deviation, if the higher level goals remain unchanged throughout the lower level trajectory deviation. This, however, is quite a different mechanism from the interrupt circuitry in the normal computer where a program counter is stored so that program execution can continue after the interrupt has been serviced.
The implication here is that a set of robot control programs modeled after a CMAC hierarchy will include no DO loops and will not be interrupt driven. Every computing module will implement a simple state mapping function of the form P = H(S).
Note also that in a CMAC hierarchy, a deviation in a higher level trajectory changes the command string, and hence the program, of all the levels below it. This implies real time modification of program statements and thus makes the use of a compiler based programming language somewhat cumbersome. A robot control system modeled after a CMAC hierarchy should use some form of an interpretive language where program statements are translated into machine code at execution time. A language similar to FORTH seems ideal. (An interpretive language can, of course, be written in a compiler based language. Also, languages can be devised which are partially compiled and partially interpreted.) We will return to these and other practical issues of computing architecture for robot control at a later time.
As was discussed in part 1, any spatial pattern can be represented as a vector. For example, a picture can be represented as an array, or ordered list, of brightness or color values. A symbolic character can be represented as an ordered list of features (or arbitrary numbers, as in the ASCII convention). Any temporal pattern can be represented as a trajectory through an N-dimensional space. For example, an audio pattern is a sequence of pressure or voltage values (ie: a one-dimensional trajectory). A moving picture or television scene corresponds to a sequence of picture vectors (ie: an N-dimensional trajectory where N is the number of picture resolution elements or pixels).
The fundamental problem of pattern recognition is to name the patterns. All the patterns with the same name are in the same class. When a pattern has been given a name we say it has been recognized. For example, when the image of a familiar face falls on my retina and I say to myself "That's George," I have recognized the visual pattern by naming it.
At this point we need to introduce some new notation to clearly distinguish between vectors in the sensory processing hierarchy and those in the behavior-generating hierarchy. Thus we will define the input vector to a CMAC pattern recognizer as:
D = E + R
where:
E = (d1, d2, ..., di)
is a vector, or list, of data variables derived from sensory input from the external environment, and:
R = (di+1, ..., dN)
is a vector of data variables derived from recalled experiences, or internal context. The CMAC mapping operator in the sensory processing hierarchy will be denoted G and the output Q such that:
Q = G(D)
We can now define a CMAC D vector to represent a sensory pattern plus context such that each component di represents a data point or feature of the pattern plus context. The existence of the D vector within a particular region of space therefore corresponds to the occurrence of a particular set of features or a particular pattern in a particular context. The recognition problem is then to find a set of CMAC weights such that the G function computes an output vector:
Q = G(D)
such that Q is the name of the pattern plus context D as shown in figure 20.
In other words G can recognize the existence of a particular pattern and context (ie: the existence of D in a particular region of input space) by outputting the name Q. For example,
Q = Class I whenever D is in Region 1
Q = Class II whenever D is in Region 2
...
etc.
The D -> A* mapping in the sensory processing CMAC can be chosen so as to define the size of the neighborhood of generalization on the input space. This means that, as long as the regions of input space corresponding to pattern classes are reasonably well separated, the G function can reliably distinguish one region of input space from another and hence classify the corresponding sensory patterns correctly.
In the case where the D vector is time dependent, an extended portion of a trajectory TD may map into a single name Q as shown in figure 21. It then is possible by integrating Q over time and thresholding the integral to detect, or recognize, a temporal pattern TD such as a sound or a visual movement.
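The integrate-and-threshold step can be sketched in a few lines of Python. The function name, the sample values, and the threshold are illustrative assumptions; Q is treated here as a scalar that is near 1 while the recognizer is naming the pattern and near 0 otherwise.

```python
def detect(q_samples, threshold, dt=1.0):
    """Return the sample index at which the running integral of Q
    first reaches the threshold, or None if it never does."""
    total = 0.0
    for k, q in enumerate(q_samples):
        total += q * dt
        if total >= threshold:
            return k
    return None


# A sustained response to the pattern crosses the threshold ...
print(detect([0, 0, 1, 1, 1, 1, 0], threshold=3.0))  # 4
# ... while a brief, isolated response does not.
print(detect([0, 1, 0, 0], threshold=3.0))           # None
```

A plain running sum is used here for simplicity; a practical detector would probably let the integral decay over time so that scattered false responses do not eventually accumulate to the threshold.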
Note that the recognition, or naming, of a temporal pattern (as illustrated in figure 21) is the inverse of the decomposition of a task as illustrated in figures 14 thru 17 in the previous article in this series. In task decomposition a slowly varying command C is decomposed into a rapidly changing output P. In pattern recognition a rapidly changing sensory experience E is recognized by a slowly varying name Q.
It frequently occurs in pattern recognition or signal detection that the instantaneous value of the sensory input vector E is ambiguous or misleading. This is particularly true in noisy environments or in situations where data dropouts are likely to occur. In such cases the ambiguity can often be resolved or the missing data filled in if the context can be taken into account, or if the classification decision can make use of some additional knowledge or well founded prediction regarding what patterns are expected.
In CMAC the addition of context or prediction variables R to the sensory input E such that D = E + R increases the dimensionality of the pattern input space. The context variables thus can shift the total input (pattern) vector D to different parts of input space depending on the context. Thus, as shown in figure 22, the ambiguous patterns E1 and E2, which are too similar to be reliably recognized as being in separate classes, can easily be distinguished when accompanied by context R1 and R2.
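The effect of context on separability can be shown by counting shared weights. In the Python sketch below, the cell layout (32 quantizing functions, each shifted one element per axis) and the particular values of E1, E2, R1, and R2 are assumptions chosen for illustration. Tuple concatenation plays the role of D = E + R.

```python
K = 32  # weights selected per input (assumed)


def active_cells(d):
    """One cell per quantizing function j; function j is shifted by j units."""
    return [(j,) + tuple((x + j) // K for x in d) for j in range(K)]


def shared(d1, d2):
    """Number of weights selected by both input vectors."""
    return len(set(active_cells(d1)) & set(active_cells(d2)))


E1, E2 = (10, 10), (11, 10)   # two nearly identical sensory patterns
R1, R2 = (0,), (64,)          # two clearly different contexts

print(shared(E1, E2))            # 31: almost every weight in common
print(shared(E1 + R1, E2 + R2))  # 0: context moves D to disjoint regions
```

With 31 of 32 weights in common the two patterns can hardly be given different names; once the context variable is appended, the two D vectors select entirely separate weights and can be trained independently.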
In the brain, many variables can serve as context variables. In fact, any fiber carrying information about anything occurring simultaneously with the input pattern can be regarded as context. Thus context can be data from other sensory modalities as well as information regarding what is happening in the behavior-generating hierarchy. In many cases, data from this latter source is particularly relevant to the pattern recognition task, because the sensory input at any instant of time depends heavily upon what action is currently being executed. For example, information from the behavior-generating hierarchy provides contextual information necessary for the visual processing hierarchy to distinguish between motion of the eyes and motion of the room about the eyes.
In a classic experiment, von Holst and Mittelstaedt demonstrated that this kind of contextual data pathway actually exists in insects. They observed that a fly placed in a chamber with rotating walls will tend to turn in the direction of rotation so as to null the visual motion. They then rotated the fly's head 180° around its body axis (a procedure which for some reason is not fatal to the fly) and observed that the fly now circled endlessly. By attempting to null the visual motion it was now actually increasing it.
Later experiments with motion perception in humans showed that the perception of a stationary environment despite motion of the retinal image caused by moving the eyes is dependent on contextual information derived from the behavior-generating hierarchy. The fact that the context is actually derived from the behavior-generating hierarchy rather than from sensory feedback can be demonstrated by anesthetizing the eye muscles and observing that the effect depends on the intent to move the eyes, and not the physical act of movement. The perceptual correction occurs even when the eye muscles are paralyzed so that no motion actually results from the conscious intent to move.
Contextual information can also provide predictions of what sensory data to expect. This allows the sensory processing modules to do predictive filtering, to compare incoming data with predicted data, and to "flywheel" through noisy data or data dropouts.
The mechanism by which such predictions, or expectations, can be generated is illustrated in figure 23. Here contextual input for the sensory processing hierarchy is shown as being processed through a CMAC M module before being presented to the sensory pattern recognition G modules at each level. Inputs to the M modules derive from the P vector of the corresponding behavior-generating hierarchy at the same level, as well as an X vector which includes context derived from other areas of the brain, such as other sensory modalities or other behavior-generating hierarchies. These M modules compute R = M(P + X). Their position in the links from the behavior-generating to the sensory processing hierarchies allows them to function as a predictive memory.
They are in a position to store and recall (or remember) sensory experiences (E vector trajectories) which occur simultaneously with P and X vector trajectories in the behavior-generating hierarchy and other locations within the brain. For example, data may be stored in each Mi module by setting the desired output Ri equal to the sensory experience vector Ei. At each instant of time t = k, sensory data represented by Ek will then be stored on the set of weights selected by the Pk + Xk vector. The result will be that the sensory experience represented by the sensory data trajectory TEi will be stored in association with the context trajectory TPi+Xi.
Any time afterwards, t = k + j, a reoccurrence of the same context vector Pk+j + Xk+j = Pk + Xk will produce an output Rk+j equal to the Ek stored at time t = k. Thus a reoccurrence of the same context trajectory TPi+Xi will produce a recall trajectory TRi equal to the earlier sensory experience TEi. These predictive memory modules thus provide the sensory processing hierarchy with a memory trace of what sensory data occurred on previous occasions when the motor generating hierarchy (and other parts of the brain) were in similar states along similar trajectories. This provides the sensory processing system with a prediction of what sensory data to expect. What is expected is whatever was experienced during similar activities in the past.
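A predictive memory module of this kind can be sketched in Python as a CMAC with a vector output. The class name, the cell layout (32 quantizing functions, each shifted one element per axis, no hash coding), the storage rule, and the sample values are all assumptions made for illustration. The context P + X is formed by tuple concatenation and serves as the address; the sensory vector E is the data stored there.

```python
K = 32  # weights selected per input (assumed)


def active_cells(s):
    """One cell per quantizing function j; function j is shifted by j units."""
    return [(j,) + tuple((x + j) // K for x in s) for j in range(K)]


class PredictiveMemory:
    """R = M(P + X): recalls the sensory vector stored under a context."""

    def __init__(self, n_outputs):
        self.n = n_outputs
        self.weights = {}  # cell address -> list of n_outputs weights

    def recall(self, context):
        r = [0.0] * self.n
        for a in active_cells(context):
            w = self.weights.get(a)
            if w is not None:
                for i in range(self.n):
                    r[i] += w[i]
        return r

    def store(self, context, e, g=1.0):
        """Set the desired output R equal to the sensory experience E."""
        r = self.recall(context)
        for a in active_cells(context):
            w = self.weights.setdefault(a, [0.0] * self.n)
            for i in range(self.n):
                w[i] += g * (e[i] - r[i]) / K


m = PredictiveMemory(n_outputs=2)
P_k, X_k = (40, 120), (7,)      # context at time t = k
E_k = [0.5, -0.25]              # sensory data at time t = k
m.store(P_k + X_k, E_k)

print(m.recall(P_k + X_k))      # [0.5, -0.25]: same context recalls E_k
print(m.recall((41, 120, 7)))   # [0.484375, -0.2421875]: similar context
```

The second recall shows generalization at work: a context one element away shares 31 of the 32 weights, so the expectation it produces is close to, but not identical with, the stored experience.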
In the ideal case, the predictive memory modules Mi will generate an expected sensory data stream TRi which exactly duplicates the observed sensory data stream TEi. To the extent that this occurs in practice it enables the Gi modules to apply very powerful mathematical techniques to the sensory data. For example, the Gi modules can use the expected data TRi to:
If we assume, as shown in figure 23, that predictive recall modules exist at all levels of the processing-generating hierarchy, then it is clear that the memory trace itself is multileveled. In order to recall an experience precisely at all levels, it is necessary to generate the same context (ie: Pi + Xi address) at all levels as existed when the experience was recorded.
We can say that the predictive memory modules Mi define the brain's internal model of the external world. They provide answers to the question, "If I do this and that, what will happen?" The answer is that whatever happened before when this and that was done will probably happen again. In short, IF I do Y, THEN Z will happen, where Z is whatever was stored in predictive memory the last time (or some statistical average over N last times) that I did Y, and Y is some action such as performing a task or pursuing a goal in a particular environment or situation, which is represented internally by the P vectors at the various different levels of the behavior-generating hierarchy and the X vectors describing the states of various other sensory-processing and behavior-generating hierarchies.
The Mi modules (as all CMAC modules) can be thought of as storing knowledge in the form of IF/THEN rules. The CMAC property of generalization produces a recall vector Ri (a THEN consequent) which is similar to the stored experience so long as the context vector Pi + Xi (the IF premise) is within some neighborhood of the context vector during storage.
Much of the best and most exciting work now going on in the field of artificial intelligence revolves around IF/THEN production rules, and how to represent knowledge in large computer programs based on production rules. Practically any kind of knowledge, or set of beliefs, or rules of behavior can be represented as a set of production rules. The CMAC hierarchy shown in figure 23 illustrates how such computational mechanisms can arise in the neurological structure of the brain.
We have now completed the second step in our development. I have described a neurological model which can store and recall (and hence compute) a broad class of mathematical functions. I have shown how a hierarchical network of such models can execute tasks, seek goals, recognize patterns, remember experiences, and generate expectations. The final part of this series will include a brief overview of evidence that such networks actually exist in the brain. Also, this part will describe how a CMAC hierarchy can create plans, solve problems, and produce language. Finally I will discuss the design of robot control systems incorporating these properties and offer some suggestions as to how brain-like computing networks might be constructed and trained.
James Albus
Project Manager
National Bureau of Standards
United States Dept of Commerce
Washington DC 20234
In parts 1 and 2 we have shown how a neurological model called the Cerebellar Model Arithmetic Computer (CMAC) can compute functions, recognize patterns, and decompose goals. We have also shown how a cross-coupled hierarchy of CMACs (see figure 1) can memorize trajectories, generate goal-directed purposive behavior, and store an internal model of the external world in the form of predicted sensory data. In this third article we will attempt to show how this structure and its capabilities can give rise to perceptual and cognitive phenomena.
The fact that the mathematical details of the CMAC model were derived from the cerebellum, a portion of the brain particularly regular in structure and hence uniquely suitable for detailed neurophysiological analysis, does not mean that the results are inapplicable to other regions of the brain as well. The basic structure of a large output cell (sometimes called a principal, relay, or projection neuron) served by a cluster of local interneurons is quite typical throughout the brain. Such clusters commonly receive input from a large number of nonspecific neural fibers similar to the mossy fibers in the cerebellum. In many instances they also receive specific inputs which are more or less analogous to climbing fibers. As we might expect, there are many differences in size and shape of the corresponding cell types from one region of the brain to another. These reflect differences in types of computations being performed and information being processed, as well as differences in the evolutionary history of various regions in the brain. Nevertheless, there are clear regularities in organization and similarities in function from one region to another. This suggests that, at least to a first approximation, the basic processes are similar.
The implication is that the general model of information processing defined by CMAC (the concept of a set of principal neurons together with their associated interneurons transforming an input vector S into an output vector P in accordance with a mathematically definable relationship H) may be useful in analyzing the properties of many different cortical regions and subcortical nuclei. This is particularly true since the accuracy, resolution, rate of learning, and degree of generalization of the CMAC H function can be chosen to mimic the neuronal characteristics of different areas in the brain.
Back to ContentsThe idea that the central nervous system, which generates behavior in biological organisms, is hierarchically structured is an old one, dating back considerably more than a century. The analogy is often made to a military command structure, wherein many hundreds of operational units and thousands, even millions of individual soldiers are coordinated in the execution of complex tasks or goals. In this analogy each computing center in the behavior -generating hierarchy is like a military command post, receiving commands from immediate superiors and issuing sequences of subcommands which carry out those commands to subordinates.
Feedback is provided to each level by a sensory-processing hierarchy which ascends parallel to the behavior-generating hierarchy, and which operates on a data stream derived from sensory units which monitor the external environment as well as from lower level command centers which report on the progress being made in carrying out their subcommands. Feedback is processed at many levels in this ascending hierarchy by intelligence analysis centers that extract data relevant to the command and control functions being performed by the behavior-generating module at that level.
Each of these intelligence analysis centers makes predictions based on the results expected (ie: casualties, rewards, sensory data patterns) as a consequence of actions currently being taken. The intelligence centers then interpret the sensory data they receive in the context of these predictions. For example, in military intelligence analysis a loss of 60 men in an operation where losses had been predicted at 600 implies an unexpectedly easy success, and perhaps indicates a weakness in the enemy position which should be further exploited. In the brain, the observation of 60 nerve impulses on an axon where 600 has been anticipated may imply an unexpectedly weak branch in a tree, upon which the placing of any weight will result in a fatal fall from the treetop.
The response of each command post (or data analysis center) in the hierarchy to its input depends on how it has been trained. Basic training teaches each soldier how to do things the "army way" (ie: what each command means and how it should be carried out). Each operational unit in the military has a field manual which defines the proper, or ideal, response of that unit to every foreseeable battlefield situation. Each field manual is essentially a set of IF/THEN production rules or case statements, corresponding to a set of CMAC functions, P = H(S) or Q = G(D). At the lowest level in the military analogy these rules define the proper procedures for maintaining and operating weapons, as well as the proper behavioral patterns for surviving and carrying out assignments under battlefield conditions. At higher levels they define the proper tactics for executing various kinds of maneuvers. At the highest level, they define the proper strategy for deployment of resources and achievement of objectives.
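The correspondence between a field manual and a function P = H(S) can be made concrete with a small sketch. The rule table, the command and feedback names, and the default response below are all hypothetical, chosen only for illustration; a real CMAC stores its function in distributed, overlapping memory cells rather than in an exact-match table.

```python
# Hypothetical "field manual": each IF/THEN rule maps an input vector
# S = (command, feedback) to a trained output P (a subcommand).
FIELD_MANUAL = {
    ("advance", "path_clear"): "move_forward",
    ("advance", "obstacle"): "go_around",
    ("hold", "path_clear"): "stand_guard",
    ("hold", "under_fire"): "take_cover",
}


def H(S, default="await_orders"):
    """Return the trained response P for input S.

    Inputs that were never trained fall back to a default response
    (an assumption of this sketch, not part of the CMAC model).
    """
    return FIELD_MANUAL.get(S, default)


print(H(("advance", "obstacle")))
print(H(("retreat", "obstacle")))
```

Training, in this picture, amounts to writing or correcting entries in the table; the same input then produces the corrected response the next time it occurs.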
In the case where each unit carries out its assignment "according to the book," the overall operation runs smoothly and the goal is achieved on schedule as expected. To the extent that various units do not follow their ideal trajectories, either because of improper training or because of unforeseen difficulties in the environment, the operation will deviate from the expected or planned schedule. Alternate tactics may be required. If a change in tactics still does not produce success, new strategies may be required. Of course, there is always the possibility that failure will occur, despite every effort. The goal will not be achieved or, worse yet, the organism may suffer a catastrophic setback.
There is considerable anatomical, neurophysiological, and behavioral evidence that the analogy between the brain and a military hierarchy is quite accurate. However, in saying this, it is important to keep in mind that the highly schematic hierarchy shown in figure 1 is a grossly oversimplified diagram of the vast interconnected hierarchical network which is the brain. Every motor neuron in the nervous system can be thought of as being controlled by its own hierarchy which interleaves and overlaps extensively with the hierarchies of nearby synergistic motor neurons. Each sensory-motor system has its own set of overlapping hierarchies which become increasingly interrelated and interconnected with each other at the higher levels. Thus, the entire brain may have the topological shape of an inverted paraboloid as shown in figure 2.
There is in fact some evidence to suggest that the human brain is topologically similar to three (or more) concentric paraboloid hierarchies as illustrated in figure 3. Paul MacLean and others have hypothesized a triune brain wherein the inner core is a primitive structure (ie: the reptilian brain) which provides vital functions such as breathing and basic reflexive or instinctive responses such as eating, fighting, fleeing, and reproductive activities. Superimposed on this inner core is a second layer (ie: the mammalian brain) which is capable of more sophisticated sensory analysis and control. This second layer tends to inhibit the simple and direct responses of the first so as to apply them more selectively and to delay responses until opportune moments. This second brain thus provides the patient waiting behavior necessary for effective hunting of prey. On top of this is yet a third layer (ie: the primate brain) which possesses the capacity to manipulate the other two layers in extremely subtle ways; to imagine and plan, to scheme and connive, to generate and recognize signs and symbols, to speak and understand what is spoken.
The outer layers employ much more sophisticated sensory analysis and control algorithms that detect greater subtleties and make more complex decisions than the inner more primitive layers are capable of performing. Under normal conditions the outer layers modify, modulate, and sometimes even reverse the sense of the more primitive responses of the inner layers. However, during periods of stress, the highly sophisticated outer layers may encounter computational overload and become confused or panicked. When this happens, the inner core hierarchy may be released from inhibition and execute one of the primitive survival procedures stored in it (ie: fight, flee, or freeze). A similar takeover by the inner hierarchy may occur if the more delicate circuitry of the outer is disrupted by physical injury or other trauma. Thus the brain uses its redundancy to increase reliability in a hostile environment.
Of course, all three layers of the behavior-generating hierarchy come together at the bottom level in the motor neuron, the final common pathway.
In the military hierarchy analogy, the motor neurons are the foot soldiers. They produce the action. Their firing rates define the output trajectory of the behavior-generating hierarchy. A CMAC representing a spinal motor neuron and its associated interneurons receives feedback F from stretch receptors via the dorsal roots, as well as from other motor neurons reporting ongoing activity in related muscles. The command vector C to this lowest level comes from the vestibular system, which provides inertial reference signals necessary for posture and balance, as well as from the reticular formation and basal ganglia (and in primates, also directly from the motor cortex).
There is nothing analogous to climbing fibers for the motor neurons, but this is not surprising since there is evidence that little or no learning takes place at this first level in the behavior-generating hierarchy.
Evidence for second, third, and fourth levels in the behavior-generating hierarchy comes from experiments with animals and observations of injured humans where the spinal cord is severed at different levels. If, as is shown in figure 4, the cord is severed from the brain along the line A-A, most of the basic motor patterns such as the flexor reflex and the reflexes that control the basic rhythm and patterns of locomotion remain intact. However, coordinated activation of these patterns to stand up and support the body against gravity requires that the regions below B-B be intact.
The stringing together of different postures to permit walking and turning movements requires the regions below C-C to be undamaged. In particular it is known that the rotational movements of the head and eyes are generated in the interstitial nucleus; raising and lowering of the head in the prestitial nucleus; and flexing movements of the head and body in the nucleus precommissuralis. Stimulation of the subthalamic nuclei can cause rhythmic motions including walking. A cat with its brain sectioned along C-C can walk almost normally. However, it cannot vary its walking patterns to avoid obstacles.
Animals whose brains are cut along the line D-D can walk, avoid obstacles, eat, fight, and carry on normal sexual activities. However, they lack purposiveness. They cannot execute lengthy tasks or goals. Humans with brain disease in the basal ganglia may perform an apparently normal pattern of movements for a few seconds and then abruptly switch to a different pattern, and then another. One form of this disease is called St Vitus' dance.
Higher levels of the behavior-generating hierarchy become increasingly difficult to identify and localize, but there is much to indicate that many additional levels exist in the cerebral cortex. For example, the motor cortex appears to be responsible for initiating commands for complex tasks. The ability to organize lengthy sequences of tasks, such as the ability to arrange words into a coherent thought or to recall the memory of a lengthy past experience, seems to reside in the posterior temporal lobe. Interactions between emotions and intentional behavior appear to take place in the mediobasal cortex, and long term plans and goals are believed to derive from activity in the frontal cortex. Hierarchies of different systems (ie: vision, hearing, manipulation, locomotion, etc) merge together in the association areas.
It is a well established fact that hierarchies of sensory-processing modules exist in the brain. In a famous series of experiments, Hubel and Wiesel demonstrated four clearly distinguishable hierarchical levels in the visual system. Similar sensory-processing hierarchies have been extensively studied in the auditory system and also the proprioceptive and kinesthetic pathways. Cross-coupling from these ascending hierarchies of sensory-processing modules to the motor-generating hierarchies provides the many different levels of sensory feedback information required at the various stages of the task or goal decomposition process. At each level, output vectors from the previous level of the sensory-processing hierarchy provide inputs to the next higher level, as well as feedback to the same level of the behavior-generating hierarchy.
In the case of vision, the two-dimensional nature of input from the surface of the retina causes the computational modules in the visual processing system to be organized in sheets. This implies that a CMAC model of a typical level in the visual processing hierarchy would resemble the structure shown in figure 5. In this structure the sensory input D1 might consist of a pattern of sensory variables E1 defining light intensity (perhaps in a particular color band) together with predicted variables R1 which select a particular filter function. The output Q1 = G1(D1) then might define a pattern of edges or line segments. This output forms part of the input E2 to the second level. Output from the second level, Q2 = G2(D2), might define patterns of connected regions or segments.
Recent work by David Marr at the Massachusetts Institute of Technology and Jay Tennenbaum at SRI International suggests that the output vectors Qi at various levels may define more than one type of feature. For example, a single level in the visual processing system might contain a depth image (derived from stereo disparity, light gradients, local edge-interaction cues, etc), a velocity image (derived from motion detectors), and an outline drawing image (derived from edge detectors, line, and corner finders) in addition to brightness, color, and texture images of the visual field. These and many other kinds of information appear to exist in registration at several different levels of the visual information processing hierarchy so as to make possible the extremely sophisticated visual recognition tasks which our brains routinely perform. These different types of images interact, sometimes reinforcing each other so as to confirm a recognition, and sometimes contradicting each other so as to reject one possible interpretation of the visual input in favor of another.
Cross links from the descending hierarchies of motor-generating modules provide the many different levels of contextual and predictive information required at various stages of the pattern recognition or sensory analysis process. In the visual hierarchy, as well as in all other sensory-processing hierarchies, context variables Ri may define expected values of the Ei vectors. This implies that the addresses Pi and Xi have stored data from previous experiences when what is currently recalled as Ri was experienced as Ei. In this case the recalled context Ri is essentially a stored image, or map, which is accessed by an associative address created by the behavior-generating hierarchy being in a state more or less similar to that which existed when the remembered experience (ie: the map) was stored.
This implies that the sensory data processing hierarchy is a multilevel map (or template) matching process, and that in order to generate these maps the behavior-generating side of the cross-coupled hierarchy must be put into a state (or pulled along a trajectory) similar to that which existed when the template was recorded.
When this occurs, the interaction around the loop formed by the Gi, Hi, and Mi modules at each level is similar to a phase-lock loop, or a relaxation process. The data Ei enters the module Gi which recognizes it to be in a certain class Qi with perhaps an error of Fi. The recognition Qi triggers an appropriate goal decomposition (or subgoal selection) function in the Hi+1 (or higher) modules which generates a command (or hypothesis) Ci. This command, modified by the error Fi, generates a subcommand (or subhypothesis) Pi and hence a predicted data vector Ri. The prediction Ri may confirm the preliminary recognition Qi and pull the context Pi into a more exact prediction via the feedback loop involving Fi. Alternatively the prediction Ri may cause Gi to alter or abandon the recognition Qi in favor of another recognition Q'i.
Obviously such looping interactions involve timing and phase relationships which may themselves have information content. Many sensory data patterns, especially in the auditory, visual, and kinesthetic pathways, are time dependent and involve some form of rhythmic or harmonic temporal patterns as well as spatial relationships. For example, activities such as walking, running, dancing, singing, speaking, and gesturing all have a distinctly rhythmic and sometimes strictly periodic character.
As was discussed in part 1 of this series, temporal patterns at various levels correspond to trajectories with different time rates of change, and hence (assuming approximately the same information content stored as trajectories at each level) different periods or complete rhythmical patterns. For example, at the lowest level of the auditory system, brain cells are excited by mechanical and electrical stimuli with frequencies ranging from about 20 Hz to 20,000 Hz. These sensory inputs thus have periodicities from 0.00005 to 0.05 seconds.
The highest frequency a nerve axon can transmit is about 500 Hz, but the brain handles higher frequencies in a manner somewhat reminiscent of the cerebellum's encoding of precise position. It encodes pieces of information about the phase of a wavefront on a number of different fibers. This means that by knowing which fibers are firing in which combinations at which instants, one can compute not only what is the fundamental pitch of the temporal pattern but what are all of its overtones. Thus, the CMAC G function at the lowest level (or really the loop comprised of the lowest level G, H, and M modules) can compute the Fourier transform, or the autocorrelation function, and presumably even the Bessel function describing the modes of vibration of the cochlear membrane.
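One simple way to picture how several slow fibers could jointly carry a fast signal is sketched below. It assumes a round-robin scheme in which each fiber fires on every n-th cycle of the tone; that scheme, the function names, and the 2000 Hz example are assumptions made for illustration, not a description of the actual neural code.

```python
def volley_spike_times(freq_hz, n_fibers, duration_s):
    """Assign successive cycles of a tone to fibers in round-robin order.

    Each fiber fires once per n_fibers cycles, locked to the same phase
    of the wavefront, so no single fiber exceeds freq_hz / n_fibers.
    """
    period = 1.0 / freq_hz
    n_cycles = int(round(duration_s * freq_hz))
    fibers = [[] for _ in range(n_fibers)]
    for cycle in range(n_cycles):
        fibers[cycle % n_fibers].append(cycle * period)
    return fibers


def mean_rate(spike_times):
    """Mean firing rate in Hz over the span of a sorted spike train."""
    span = spike_times[-1] - spike_times[0]
    return (len(spike_times) - 1) / span


# A 2000 Hz tone carried by four fibers, each well within the 500 Hz limit.
fibers = volley_spike_times(freq_hz=2000, n_fibers=4, duration_s=0.01)
per_fiber_rate = mean_rate(fibers[0])

# Pooling the fibers recovers the full periodicity of the stimulus.
pooled = sorted(t for fiber in fibers for t in fiber)
pooled_rate = mean_rate(pooled)

print(round(per_fiber_rate), round(pooled_rate))
```

Knowing which fiber fired at which instant is what allows the pooled pattern, and hence the pitch, to be reconstructed downstream.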
Assume, for example, that the G, H, and M modules in figure 6 constitute a phase-lock loop such that the input PATTERN is a signal f(t) and the PREDICTION is another signal f(t - τ). If the processing module G computes the product of the PATTERN and the PREDICTION, then the output NAME is f(t) f(t - τ). When τ corresponds to 1/4 of the period of the input f(t), a low pass filter applied to the output will produce a phase ERROR signal which, when applied to the H module, can enable the PREDICTION signal f(t - τ) to track and lock on to the input PATTERN f(t). If the loop consists of a multiplicity of pathways with different delays (τi > 0), the output, when processed through low pass filters, will produce an autocorrelation function:

$$Q = (q_1, q_2, \ldots, q_i)$$

such that:

$$q_i = \overline{f(t)\, f(t - \tau_i)}$$

where the overbar denotes the low pass filtering and:

$$0 < \tau_1 < \tau_2 < \cdots < \tau_i$$

It has been shown that such an autocorrelation function produces a perception of pitch which is in good agreement with psychophysical data. In figure 6 the presence of an output on element qi would correspond to the perception of pitch at a frequency 1/τi.
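The pitch computation described here can be tried on a sampled signal. The following is a minimal sketch, assuming discrete delays of whole samples and a plain time average standing in for the low pass filter; the function names, the sample rate, and the 200 Hz test tone are illustrative choices, not taken from the article.

```python
import math


def autocorrelation(signal, max_lag):
    """Return q[k-1] = time average of f(t) * f(t - k) for k = 1..max_lag.

    The averaging over t plays the role of the low pass filter.
    """
    n = len(signal)
    q = []
    for lag in range(1, max_lag + 1):
        total = sum(signal[t] * signal[t - lag] for t in range(lag, n))
        q.append(total / (n - lag))
    return q


def estimate_pitch(signal, sample_rate, min_hz, max_hz):
    """Pick the delay with the largest output and report 1/delay as pitch."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    q = autocorrelation(signal, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: q[lag - 1])
    return sample_rate / best_lag


sample_rate = 8000  # samples per second (assumed)
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(1600)]

# The search range is kept under one octave so that only one peak of the
# periodic autocorrelation function falls inside it.
pitch = estimate_pitch(tone, sample_rate, min_hz=150, max_hz=400)
print(pitch)
```

Because the autocorrelation of a periodic signal peaks at every multiple of the period, a search range wider than an octave would need an extra rule for choosing among the peaks.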
Figure 7 suggests how a hierarchy of phase-lock loops might interact to recognize the variety of periodicities which provide the information content in spoken language and music. The coefficients qi obtained from the lowest level loop form the input (together with other variables) to the second level.
If we assume that the sensory input to the first level consists of a pattern rich in information, such as music or speech, then as time progresses the trajectory of the input vector to the second level will also contain many periodicities. The principal difference from the standpoint of information theory is that the periodicity is now on the order of 0.05 seconds to 0.5 seconds. The trajectory input to the second level can, of course, be subjected to a quite similar mathematical analysis as were the trajectories of hair cell distortions and cochlear electrical stimulation which were input to the first level.
The principal difference is that at the second level and higher, information can be encoded for neural transmission by pulse-frequency rather than pulse-phase modulation. Also, some of the mechanisms by which time integrals are computed may be different. Nevertheless, processing by a CMAC G function can transform sections of the input trajectory into output vectors so as, in effect, to give them names. Characteristic patterns, or periodicities, at the second level are named notes, when the sensory stimulus is music. Where the stimulus is spoken language, they may be called phonemes.
The output of the second level forms part of the input to the third. The G function at the third level computes the names of strings of phonemes which it calls words, or strings of notes which it calls tunes. The G function at the fourth level computes names of strings of words which it calls sentences (or ideas), strings of tunes which it calls musical passages, etc. In music, the pattern in which the different periodicities match up as multiples and sub-multiples (ie: the beat, notes, various voices, melodies, and chord sequences) comprises the inner structure, harmony, or "meaning." The ability of the sensory processing-generating hierarchy of the listener to lock on to the periodicities and harmonies at many different levels (and hence many different periodic intervals) is the ability to "appreciate" or "understand" the music.
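The naming of strings at successive levels can be illustrated with a toy sketch. The two vocabularies and the greedy longest-match rule below are hypothetical stand-ins for trained G functions; they show only how the output names of one level become the input string of the next.

```python
# Hypothetical vocabularies: each level's G function maps a string of
# lower-level names to a single higher-level name.
WORDS = {("k", "ae", "t"): "cat", ("s", "ae", "t"): "sat"}
SENTENCES = {("cat", "sat"): "cat sat"}


def name_strings(tokens, vocabulary):
    """Replace each recognized string of tokens with its name.

    Scans left to right, preferring the longest known string at each
    position; unrecognized tokens are skipped (an assumption of this sketch).
    """
    names = []
    i = 0
    while i < len(tokens):
        for length in range(len(tokens) - i, 0, -1):
            chunk = tuple(tokens[i:i + length])
            if chunk in vocabulary:
                names.append(vocabulary[chunk])
                i += length
                break
        else:
            i += 1  # no known string starts here
    return names


phonemes = ["k", "ae", "t", "s", "ae", "t"]
words = name_strings(phonemes, WORDS)       # third level: phonemes to words
sentences = name_strings(words, SENTENCES)  # fourth level: words to sentences
print(words, sentences)
```

A level that fails to recognize its input passes nothing upward, which corresponds to the hierarchy failing to lock on above that level.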
Similarly in speech the ability of the audio-processing hierarchy to lock on to periodicities at each level, and to detect or recognize and pass on to the next level the information bearing modulations or deviations in those periodicities, constitutes the ability to "understand" what is spoken. If the audio system locks on only at the first level, it detects phonetic sounds but not words. If it locks on the first two levels but no higher, it detects words but not meaningful phrases. If, however, the audio hierarchy locks on at the third, fourth, fifth, and higher levels, there are excited in the mind of the listener many of the same trajectories and sequences of interrelated and harmonious patterns (ie: goals, hypotheses, sensory experiences) as exist in the mind of the speaker.
This gives the speaker the ability to transmit messages and, even more important, to manipulate the mind of the listener to achieve his own goals. He can recruit help, enlist sympathy, give orders, and transmit all forms of sophisticated signals related to dominance, submission, and social interaction. Furthermore, by this mechanism he can induce into the highest levels of the sensory processing hierarchy of the listener recalled memories of his own experience. He can tell tales, relate stories, and thereby provide others with secondhand information as to what strategies and goal decomposition rules he personally has found to be successful.
One of the most basic features of language is that it is a form of behavior. That seems an obvious thing to say, but evidently it is not. Many experts feel that because language is connected with the intellect (ie: a higher function) it is quite divorced from mere motor behavior. However, there is no such thing as mere motor behavior. All behavior is the final output trajectory in the decomposition of high level goals. The intellect is not something distinct from behavior. It is the deep structure of behavior. It is the set of nonterminal trajectories which generate and coordinate what finally results in the phenomena of purposive or intentional action.
Language is certainly like other behavior in that it results from the coordinated contractions of muscles; in the chest, throat, and mouth. Like any other behavior such as walking, dancing, making a tool, or hunting for prey, language is both learned and goal directed.
The infant is born with only the most basic verbal reflexes. At first primitives are learned (coos, gurgles, cries, and phonetic sounds of various types), then strings of primitives (words), and finally strings of strings (phrases), etc. The sensory processing system stores (ie: records) sounds from the environment as Ri trajectories. Later the behavior-generating system learns to produce verbal outputs which mimic or duplicate these stored trajectories.
As with all behavior, the purpose of language is to obtain reward, to avoid punishment, and to achieve success in the social dominance hierarchy. The unique feature of language behavior is that it allows communication between individuals to enlist help, to issue commands, to organize group behavior, and to receive feedback information from the sensory experiences of others.
Certainly written language, at least, had its origins in goal-seeking activities. For example, the earliest writing in China began around 2000 BC as ideograms or symbols, engraved on bones and shells for the purpose of asking questions of heaven. Each stroke or series of strokes asks a certain question or seeks guidance for a particular branch point in the behavioral trajectory of the life of the asker.
The earliest of all known writing is the Uruk tablets discovered in the Mideast and dated about 3100 BC. This writing appears to be almost exclusively a mechanism for recording business transactions and land sales. These written symbols are now thought to be pictorial lists of tokens used for keeping track of merchandise or livestock. The tokens themselves first appeared 5000 years earlier during the beginning of the Neolithic period in Mesopotamia when human behavior patterns related to hunting and gathering were being replaced by others related to animal husbandry, agriculture, and the village market place.
This token method of accounting apparently served its purpose well, for the system remained virtually unchanged for about 5 millennia until the early Bronze Age when cities and city-states became the most advanced social organizations, and commerce grew into a large scale and complex enterprise. Then the requirements for more efficient accounting procedures led to the pictorial listing of tokens by writing on tablets, an early form of double-entry bookkeeping.
Once skill in this form of writing became widespread and commonly practiced, only a few additional symbols and some rules of syntax were required to express decrees, record dates, and relate accounts of significant events.
Thus, the language skill of writing evolved in small increments over many generations from the goal directed manipulation of physical objects; first the objects themselves, then token objects, and finally images or symbols representing the tokens. The meaning of the symbols, as well as the rules of syntax, were obvious to anyone having an everyday familiarity with the manipulation rules for tokens. These in turn mimicked the rules for manipulation of the objects of merchandise. The manipulation of symbols in written language is a form of goal-seeking behavior which evolved from, and remains similar to, the manipulation of physical objects.
Skill in writing, as any other complex goal-seeking activity, is acquired through painstaking training, endless practice, and numerous corrections of mistakes by a teacher. It is learned in stages, the lowest level primitives first (forming letters), then strings of primitives (words), then strings of strings (sentences), and so on. Only when the rules of spelling, grammar, and composition are more or less mastered can the scribe express or encode a thought (ie: a high level trajectory) into a string of written symbols.
The origin of speech is much less certain since it dates from an earlier period. In fact, if we include the sounds of whales, animals, birds, and even insects as a form of speech, spoken language predates the origin of humanity itself. Surely any behavior pattern which communicates a threat, signals submission, expresses fear or acceptance, is a form of language whether it be audible speech or sign language, whether it be expressed by a mouse or a human. By this definition, some speech is very simple: a single facial expression, gesture, chirp, growl, or squeak for each emotional state encoded or intent expressed. Throughout the animal kingdom however, there exists a great variety of modes of expression and many different levels of complexity. Clearly sounds such as the growls, whines, barks, and howls of the wolf express an extremely complex variety of social communications. One can easily feel caught up in a primitive community sing-along when listening to a recording of a wolf-pack chorus.
As we ascend the ladder of behavioral complexity, we find a corresponding increase in the ability to communicate complex messages. In most cases this appears to be not so much an increased vocal capacity as an increased complexity of deep structure underlying overt behavior. This implies that the ability to speak derives, first of all, from having something to say (ie: from having internal trajectories of sufficient complexity that to attach facial expressions, gestures, and audible sounds to them results in complex and subtle messages).
The most ancient forms of human speech that survive today are the tribal dances of the few remaining stone-age peoples. In such rites, information on vital subjects such as hunting (including the habits, ferocity, and vulnerable areas of the prey), the proper techniques of stalking, using weapons, etc, are conveyed by dance, symbolic gestures, pantomime, songs, and shouts, as the hunters relate (indeed reenact) the exploits of the hunt. The storytellers replay the behavioral trajectories of their own actual hunting experience and attach verbal symbols and gestures to the portions which cannot be literally acted out.
Even in modern cultures, the majority of everyday speech consists of relating experiences ("...he did this, and I said that...," etc). This is simply the straightforward encoding of behavioral trajectories, or the recalled sensory experiences addressed by those behavioral trajectories, into a string of language tokens or symbols such as gestures, vocal cord, tongue, and lip manipulations. Thus, in the final analysis, all language is a form of goal-directed manipulation of tokens and symbols. The ultimate result is a manipulation of the minds, and hence the actions, of other members of the society. Language is a tool by which a speaker can arouse or implant in the listener a great variety of behavioral goals, hypotheses, and belief structures. By the use of these means, a speaker can command, instruct, threaten, entertain, or chastise other persons in his group to his own benefit and for his own ends.
The implication for research in language understanding is that there is much to be learned from the relationship between language and other forms of behavior. How, for example, can behavioral goals and trajectories be encoded into strings of language symbols for making requests, issuing commands, and relating sensory experiences? How can patterns of trajectories be encoded and transmitted by one processing-generating hierarchy so as to be received and reconstructed by another?
Clearly, language recognition depends on many of the same mechanisms by which the rhythms, periodicities, and harmonic patterns of music, song, and poetry are recognized, tracked, and predicted at many different levels. Consider that children are fascinated by rhythmical sounds, rhymes, and the repetition of familiar stories. Why do adolescents find it so rewarding to hear the same popular song over and over? Is it not the predictability, the lock-on which can be achieved due to a correspondence between the stored internal model and the observed sensory data stream? And why are the rhythmic movements of dancing and marching to music so compelling? Is it not the correlations and harmonic relationships between trajectories in the behavior-generating and sensory-processing hierarchies?
Music is a relatively simple domain for the study of the time dependent interactions between stored models and input data, and the study of music recognition by computer is an almost completely unexplored field. Thus, it is a fertile area for computer hobbyists and other researchers with limited resources.
Part 4 will discuss some operations of the highest hierarchical level such as will, emotion, and creativity.
The essence of a hierarchy is that control is top-down. The ultimate choices are made at the top, and the goals selected at this level are decomposed into action as they filter down through the various levels of the hierarchy. For the purposes of our discussion, we will define the highest level H function in the behavior-generating hierarchy of the human brain as the will.
For centuries philosophers and theologians have debated the nature of the will, particularly the question of whether humans have "free" will (ie: the freedom to choose goals) or whether all choice is merely a reflexive or predestined response to the environment. We shall not presume to deal with this question here, other than to suggest what types of inputs are available to this highest level goal selection module.
By definition much of the input to the highest level behavior-generating module must come from the highest level sensory-processing module.
This is the level at which the overall result of the entire sensory processing operation is evaluated as being good or bad, rewarding or punishing, satisfying or frustrating. In humans, this function is performed by what are commonly called the emotions. It has long been recognized that emotions play a crucial role in the selection of behavior. We tend to practice that which makes us feel comfortable and avoid what we dislike. Our behavior-generating hierarchy normally seeks to prolong, intensify, or repeat those behaviors which give us pleasure or make us feel happy or contented. We normally seek to terminate, diminish, or avoid those behavior patterns which cause us pain, or arouse fear or disgust.
In the past 25 years it has become known that the emotions are generated in localized areas, or computing centers, in the brain. For example, the posterior hypothalamus produces fear, the amygdala generates anger and rage, the insula computes feelings of contentment, and the septal regions produce joy and elation. The perifornical nucleus of the hypothalamus produces punishing pain, the septum pleasure, the anterior hypothalamus sexual arousal, and the pituitary computes the body's response to danger and stress. These emotional centers, along with many others, make up a complex of about 53 regions linked together by 35 major nerve bundles. This entire network is called the limbic system. Additional functions performed in the limbic system are the regulation of hunger and thirst performed by the medial and lateral hypothalamus, the control of body rhythms such as sleep-awake cycles performed by the pineal gland, and the production of signals which consolidate (ie: make permanent) the storage of sensory experiences in memory performed by the hippocampus. This last function allows the brain to be selective in its use of memory by facilitating the permanent storage of sensory experiences to which the emotional evaluators attach particular significance (eg: close brushes with death, punishing experiences, etc).
Input to the limbic system emotional centers consists of highly processed sensory data such as the names of recognized objects, events, relationships, and situations: for example, the recognition of success in goal achievement, the perception of praise or hostility, or the recognition of gestures of dominance or submission transmitted by social peers. These inputs are accompanied by such modifier variables as confidence factors derived from the degree of correlation between predicted and observed sensory input.
Sensory processing at the level of the emotions is heavily influenced by contextual information derived from internal models and expectations at many different levels in the processing hierarchy. If a painful stimulus is perceived as being associated with a source that does not produce fear, we may attack the pain-causing agent. If, however, the perceived source of pain also induces fear, we may flee.
Similarly if an observed event such as a person talking to a flower is perceived as deviant, then this input to the emotions, along with other recognized qualifier variables such as the person is a) eccentric, b) retarded, or c) dangerously psychotic, will cause the emotions to output a) amusement, b) pity, or c) fear, respectively. Amusement input to the behavioral goal selecting module may lead to laughter, poking fun, or ridicule. Pity input to the will may evoke a behavioral pattern of sympathy. Fear may evoke an attempt to secure medical or psychiatric treatment, or incarceration.
If, however, a person talking to a flower is recognized as perfectly normal, then the emotions will give no indication that the event is particularly worthy of attention, or that there exists any need to deviate from whatever behavior is presently being executed. These relationships are described graphically and symbolically in figure 1.
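The flower example can be written down as a pair of lookup tables, one standing in for the emotional evaluator and one for the will. This is only a minimal sketch under stated assumptions: the table names, the function names, and the shorthand qualifier labels ("eccentric", "impaired", "dangerous" for cases a, b, and c) are hypothetical, and a real evaluator would operate on multidimensional vectors rather than on strings.

```python
from typing import Optional

# Hypothetical G function of the emotions:
# (recognized event, qualifier) -> emotional output.
# The qualifier labels are illustrative shorthand for cases a), b), c).
EMOTION_RULES = {
    ("talking_to_flower", "eccentric"): "amusement",
    ("talking_to_flower", "impaired"): "pity",
    ("talking_to_flower", "dangerous"): "fear",
}

# Hypothetical H function of the will:
# emotional output -> candidate behaviors, in order of preference.
WILL_RULES = {
    "amusement": ["laughter", "poking fun", "ridicule"],
    "pity": ["sympathy"],
    "fear": ["seek treatment", "incarceration"],
}


def emotional_evaluator(event: str, qualifier: str) -> Optional[str]:
    """Return an emotional output, or None if the event is seen as normal."""
    return EMOTION_RULES.get((event, qualifier))


def will(emotion: Optional[str], current_behavior: str) -> str:
    """Select a behavior; with no emotional signal, keep the current one."""
    if emotion is None:
        return current_behavior
    return WILL_RULES[emotion][0]


if __name__ == "__main__":
    feeling = emotional_evaluator("talking_to_flower", "eccentric")
    print(feeling, "->", will(feeling, "reading"))
    # An event recognized as normal produces no emotional output,
    # so the behavior already in progress simply continues.
    feeling = emotional_evaluator("talking_to_flower", "normal")
    print(feeling, "->", will(feeling, "reading"))
```

The point of the sketch is that deviance is not a property of the event itself: the same event key yields different outputs, or none at all, depending on the qualifier supplied by the observer's own expectations.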
In this model the standards of normalcy and deviance are clearly in the eye of the beholder, or at least in the expectations and beliefs stored in the processing-generating hierarchy. In many ways the emotional evaluators are even more dependent on internal beliefs than externally observed facts. This is particularly true in the case where a person's belief structure discounts the reliability or moral worth of the physical senses, as is characteristic of philosophical constructs derived from gnosticism or asceticism.
Thus the emotions, just as any other sensory processing module in the brain, simply compute a G function on the D vector that they input to produce the Q vector that they output. In simple creatures the emotional output vector may be restricted to a few components such as good-bad, pleasure-pain, etc. In higher forms the emotional output is a highly multidimensional vector with many faceted components such as love, hate, jealousy, guilt, pride, disgust, etc. Part of this Q output may simply produce feelings (ie: joy, sadness, excitement, fear, etc). However, most of the Q output directly or indirectly provides F input to the highest level H function, the will.
Output from the emotional centers is known to be of two types: one consists of signals on nerve fibers; the other consists of hormones and chemical transmitters which convey their messages (Q vector values) via fluid transport mechanisms.
What the G and H functions of the emotions and will are, and where they come from, is a matter of hot dispute. One recent theory proposed by sociobiology is that they are genetically determined, derived from information stored in the DNA molecule, as the result of millions of years of natural selection. This theory argues that innate behavior-selecting mechanisms have evolved so as to maximize the Darwinian fitness (the expected number of surviving offspring) of their possessors.
The incidence of behavior in many different species from insects to birds to mammals corresponds closely to mathematical predictions derived from genetics and game-theory analyses of strategies for maximizing the probability of gene propagation. Even cooperative or altruistic behavior such as that of the worker bee, and ritualized behavior in animal contests and courtship, can in many cases be explained by genetic arguments. However, the evidence for this theory is much stronger for insects than for higher forms, and the opinion that human emotions are transmitted genetically is not widely held.
A competing theory put forward by behaviorists is that in higher forms the evaluator functions of the emotion and the selector functions of the will are mostly learned, perhaps even imprinted, during the early years of development. Certainly many of the emotional evaluations and behavior selection rules in the human brain are culturally determined, derived from religious teachings defining good and evil, or from social conventions defining duty, fairness, etiquette, and legality. These fundamental rules of opinion and behavior are instilled in the young by parents, educators, and religious and state authorities. They are reinforced throughout life by peer group pressure, as well as by church and civil sanctions.
There are, of course, many persons who would disagree with both of these theories. Perhaps the most widespread opinion (which until recent years was virtually unchallenged) is that the human will and its emotional evaluator inputs are non-mechanistic in nature and therefore unknowable in some fundamental sense. Many would even claim that emotions and will are subject to, or controlled by, spiritual and supernatural forces. For example, the doctrine of original sin states that the highest level behavior selecting mechanism, the human will, is basically defective because of the disobedience of Adam and Eve, and except for divine intervention is under the power of evil or satanic forces. The literature surrounding the age old controversy over free will versus predestination centers largely on the role of the Divinity (or the stars, or fates) in the determination of human behavior. Most cultures view the conscience (ie: the emotional evaluator for right and wrong or good and evil) as a divine gift or manifestation of the indwelling of the spirit of God.
Clearly the emotions and will are a very basic (some would say primitive) and compelling part of our behavioral mechanism. Carl Sagan calls them the Dragons of Eden. Humans are often driven, sometimes beyond rational justification, to heroic feats of courage or physical endurance by the behavior rules of duty or the emotions of love, pride, guilt, jealousy, and hate.
Whatever their origins, the G functions of our emotions and the H functions of the will can be modeled. They are rule based, and the rules are, for the most part, clearly defined. In many cases these rules are even written down as systems of moral philosophy, ethics, or rules of social behavior such as Emily Post's Book of Etiquette.
Nothing so complex need be modeled for the highest level G and H modules of a robot for many years. Nevertheless, every robot needs some sort of highest level evaluator and goal selector function in order to exhibit any sort of autonomous behavior. At what point in the spectrum of multidimensional sophistication we choose to dignify an evaluator function with the term emotion, or goal selection function with the term will, is not clear. What is clear is that simple approximations to the functions computed by the emotions and the will can be modeled by CMAC G and H functions operating on input vectors and computing output vectors. The degree of sophistication and complexity of the modeling is limited only by the ingenuity and resources of the modeler.
The interdependency of the processing and generating hierarchies suggests at least 3 distinct modes of operation.
In the task execution mode the motor-generating hierarchy is committed to a goal, which it decomposes into subgoals, sub-subgoals, and finally into action primitives. In this mode the sensory-processing hierarchy is primarily engaged in providing feedback: first to aid in selecting the goal, then to steer the goal decomposition process, and finally to direct the output drive signals to the muscles (or actuators) so as to follow a success trajectory.
Consider a simple, everyday goal such as the fixing of a leaking faucet. First, the sensory processing system must recognize the fact that the faucet is leaking. This information is then evaluated by the emotions as something that needs attention. This evaluation is passed on to the will, where the rules of what ought to be done and under what circumstances reside. If there are no higher priority items vying for the attention of the will, then the goal
At each instant of time $t_k$ the sensory-processing module at each hierarchical level extracts the feedback vectors $F_i$ required by the H behavior-generating modules at each level for goal decomposition. At the instant $t_0$ when the goal is selected, the feedback $F^0$ at the various levels causes the selection of the initial subgoal decomposition $P^0$. This determines the initial direction of the trajectories $T_{P_i}$ on their way toward the goal state. As the task proceeds, the recognition of subgoal completions and/or unanticipated obstacles triggers the selection of the proper sequence of actions directed toward the goal achievement.
The entire set of trajectories $T_{P_i}$ describes the sequence of internal states of the brain which underlie and give rise to the observable phenomena of purposive behavior. These are the deep structure of behavior. Only the output trajectory, the terminal or bottom level trajectory, is manifested as overt action. The extent to which the trajectories $T_{P_i}$ are independent of feedback is the extent to which behavior is preprogrammed. The extent to which the feedback pulls the $T_{P_i}$ trajectories along predictable paths to the goal state is the extent to which behavior is adaptive. For some goals, such as hunting for prey or searching for breeding territory, the selection of the goal merely triggers migratory searching behavior which continues until feedback indicates that the goal is near at hand. For such goals, behavior is indefinite and highly feedback dependent. For other goals, such as building a nest, making a tool, courting a mate, or defending a territory, behavior is more inner-directed, requiring only a few sensory cues for triggers.
In either case, while in the acting mode the sensory data flowing in the sensory-processing hierarchy is highly dependent on (if not directly caused by) the action itself. If the action is speech, the sensory-processing hierarchy analyzes what is spoken and provides feedback for control of loudness, pitch, and modulation. If the action is physical motion, data from vision, proprioception, and touch sensors are all highly action dependent, and the sensory analysis is primarily directed toward servo control of the action itself.
In the action mode, the $M_i$ associative memory modules provide context in the form of predicted data to the sensory-processing modules in order to distinguish between sensory data caused by motion of the sensors and that caused by motion of the environment. What is predicted is whatever was stored on previous experiences when the same action was generated under similar circumstances. This allows the sensory-processing hierarchy to anticipate the sensory input and to detect more sophisticated patterns in the sensory data than would otherwise be possible.
A second mode of operation of the cross-coupled hierarchy is the analysis of sensory data from external sources not primarily caused by action of the behavior-generating hierarchy. For example, when listening to a concert, a speech, or a play, there is little action going on in the muscles and motor neurons. The lower levels of behavior-generating hierarchies are quiescent, or set to a constant value, or given a command to execute an overlearned task which can be carried out without any assistance from the upper levels.
The sensory-processing hierarchies, however, are very busy. They are filtering and predicting, recognizing patterns and trajectories, locking on to rhythms and harmonious periodicities, and tracking targets of attention. Predictions generated by the M modules are clearly required for these types of analyses, whether or not the organism is engaged in physical activity. This suggests that the upper levels of the behavior-generating hierarchies (which are not currently required for generating behavior) might be used instead to generate hypotheses and subhypotheses which in turn produce context and predictions to aid the sensory-processing hierarchy in the recognition, analysis, and understanding of incoming sensory data.
At each level, hypotheses which generate $T_R$ predictions that match or track the $T_E$ sensory data trajectories will be confirmed. If the hypothesized $T_R$ trajectories are only close to the $T_E$ observations, they can be pulled by error signal feedback $T_F$ from the processing hierarchies. When a hypothesis is successful in generating predictions which match the sensory data stream, the loop at that level locks onto the sensory data. When lock-on is simultaneously achieved at many different levels, we can say that the processing-generating hierarchy "understands" the incoming data (ie: it can follow and predict it at many different levels). The depth of understanding depends upon how many levels lock onto the sensory data stream. The accuracy of understanding depends upon how precisely the hypotheses track and predict the incoming sensory data.
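The notions of depth and accuracy of understanding can be made concrete with a small calculation. The sketch below is an illustration only, not the model's actual mechanism: it assumes each level's predicted and observed trajectories are short lists of scalar samples, measures tracking error as a mean absolute difference, and declares lock-on when that error falls under a threshold. The function names and the `lock_threshold` parameter are hypothetical.

```python
def level_error(predicted, observed):
    """Mean absolute difference between predicted and observed samples."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def understanding(levels, lock_threshold=0.1):
    """Summarize lock-on across a hierarchy.

    levels: list of (predicted, observed) trajectory pairs, one per level.
    Returns (depth, mean_locked_error):
      depth             = number of levels whose error is within threshold
      mean_locked_error = average error over the locked levels,
                          or None when no level has locked on
    """
    errors = [level_error(pred, obs) for pred, obs in levels]
    locked = [e for e in errors if e <= lock_threshold]
    depth = len(locked)
    mean_locked_error = sum(locked) / depth if depth else None
    return depth, mean_locked_error


if __name__ == "__main__":
    hierarchy = [
        ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # lowest level tracks exactly
        ([0.0, 0.0], [1.0, 1.0]),            # higher level fails to track
    ]
    print(understanding(hierarchy))
```

In this toy measure, depth grows as more levels fall under the threshold, and accuracy improves as the mean error over the locked levels shrinks.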
It is easier to follow a trajectory than to reproduce it. When observing a procedure, the generating hierarchy merely needs to produce hypotheses which are in the right vicinity so that they can be synchronized with the sensory input. Uncertainties at branch points in $T_{P_i}$ do not matter greatly because errors are quickly corrected by comparing $T_{R_i}$ with $T_{E_i}$.
On the other hand, reproducing a procedure requires that the H functions be capable of generating $T_{P_i}$ trajectories which are quite precise over their entire length. They must not wander outside of the success envelope or miss any critical branch points. Needless to say, the latter is a much more exacting computational problem, and offers an explanation for why a student may be able to follow the reasoning of his professor's lecture, but is unable to pass an exam without additional drill and practice.
The directing or focusing of attention is essentially a purposive action whose goal is to optimize the quality of the sensory data. The basic elements of attention are orienting (positioning the body and sensory organs so as to facilitate the gathering of data) and focusing (blocking out extraneous or peripheral information so that the sensory processing system can bring all of its capacities to bear on data that is relevant to the object of attention). The orienting element is simply a behavioral task or goal to acquire and track a target. The focusing element is a filtering problem which can be solved by a hypothesis or goal decomposition which evokes the appropriate masks or filter functions from the $R_i$ modules so as to block out all but the relevant sensory input data.
Thus, attending is a combination of observing and acting. It is primarily a sensory analysis mode activity, with a strong assist from the task execution mode.
A third distinct mode of operation occurs when the upper levels of the processing-generating hierarchy are largely disconnected from both motor output and sensory input. In this mode high-level hypotheses $T_{P_i}$ may be generated, and predicted sensory data $T_{R_i}$ recalled. In the absence of sensory input from the external environment, these recalled trajectories make up all of the information flowing in the sensory-processing hierarchy. The processing modules $G_i$ operate exclusively on the internally recalled $R_i$ trajectories, producing $T_{Q_i}$ experiences and $T_{F_i}$ feedback. The $T_{F_i}$ trajectories act on the generating hierarchy so as to modify and steer the $T_{P_i}$ trajectories, creating new hypotheses $T_{P_i}$. The system is free running, guided only by stored experiences $M_i$, learned interpretations $G_i$, and practiced skills $H_i$ for generating strings of hypotheses and decomposing goals and tasks. The upper levels of the cross-coupled hierarchy are, thus, imagining (ie: generating and analyzing what would be expected if certain hypothesized goals and tasks were to be carried out).
Imagination is based on stored experiences and driven by hypothesized actions. It is constrained in large measure by the knowledge frames, world models, expected values, and belief structures (IF I do this, THEN such and so will happen) embedded in the upper levels of the cross-coupled processing-generating hierarchy.
If we attempt to hypothesize some action X which lies outside of the neighborhood of generalization of prior experience, we get no recalled $R_i$ vectors from memory M. In this case we say "we cannot imagine what X would be like."
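A toy memory shows what it means for a hypothesis to fall outside the neighborhood of generalization. The sketch below is a stand-in built on stated assumptions: it uses nearest-neighbor recall within a fixed radius rather than the overlapping receptive fields of a CMAC, and the class name `AssociativeMemory` and the `radius` parameter are hypothetical.

```python
import math


class AssociativeMemory:
    """Toy M module: stores (hypothesis, predicted data) pairs and recalls
    only when a new hypothesis lies close to a stored one."""

    def __init__(self, radius):
        self.radius = radius  # size of the neighborhood of generalization
        self.entries = []     # list of (hypothesis vector, recalled vector)

    def store(self, hypothesis, recalled):
        self.entries.append((tuple(hypothesis), tuple(recalled)))

    def recall(self, hypothesis):
        """Return the recalled vector of the nearest stored hypothesis,
        or None if nothing lies within the radius ("cannot imagine")."""
        best, best_dist = None, self.radius
        for stored, recalled in self.entries:
            dist = math.dist(stored, hypothesis)
            if dist <= best_dist:
                best, best_dist = recalled, dist
        return best


if __name__ == "__main__":
    memory = AssociativeMemory(radius=1.0)
    memory.store([0.0, 0.0], [5.0])
    print(memory.recall([0.5, 0.0]))  # near prior experience: recalled
    print(memory.recall([3.0, 3.0]))  # far from anything stored: None
```

A return value of `None` corresponds to the case in which no recalled vectors come back from memory and the action cannot be imagined.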
One of the functions of the free-running mode is to remember or recall past experiences by hypothesizing the same goals as when the experience was recorded. Thus, in our imagination we can reach back and relive experiences, recall events, and, hence, remember facts and relationships from our past. Imagination, however, is not limited to duplication of past experiences. We can also rearrange sections of learned trajectories to create experiences in our minds which never occurred. We can string together trajectories in new combinations or insert new modifier variables in various hypothesis vectors. We can watch a bird fly and substitute a "self" variable in place of the bird to imagine ourselves soaring through the sky. We can listen to a story of adventure and imagine ourselves in the place of one of the characters. Imagination allows us to hypothesize untried actions and, on the basis of M functions learned during previous experiences, to predict the outcome.
Imagination gives us the ability to think about what we are going to do before committing ourselves to action. We can try out, or hypothesize, prospective behavior patterns, and predict the probable results. The emotions enable us to evaluate these predicted results as good or bad, desirable or undesirable.
Imagination and emotional evaluators together give us the capability to conduct a search over a space of potential goal decompositions and to find the best course of action. This type of search is called planning.
When we plan, we hypothesize various alternative behavior trajectories and attempt to select the one that takes us from our present state to the goal state by the most desirable route. Imagined scenarios which produce positive emotional outputs are flagged as candidate plans. Favorably evaluated scenarios or plans can be repeatedly rehearsed, reevaluated, and refined prior to initiation of behavior-producing action.
Imagined scenarios which produce negative evaluation outputs will be avoided if possible. In some situations it may not be possible to find a path from our present state to a goal state, or at least not one which produces a net positive evaluation. Repeated unsuccessful attempts to find a satisfactory, nonpunishing plan, particularly in situations recognized as critical to one's wellbeing, correspond to worry.
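Planning as described here amounts to a loop: hypothesize an action, imagine its outcome, evaluate the outcome emotionally, and keep the best. The following is a minimal sketch under stated assumptions, not a definitive implementation. The function `plan`, its parameters, and the faucet outcomes and scores in the usage example are all hypothetical; `imagine` stands in for recall from the M modules and `evaluate` for the emotional evaluators.

```python
def plan(candidates, imagine, evaluate, budget):
    """Search hypothesized actions for the most favorably evaluated one.

    candidates: hypothesized actions, in the order they will be tried
    imagine:    function mapping an action to its predicted outcome
    evaluate:   function mapping a predicted outcome to a score
                (positive = rewarding, negative = punishing)
    budget:     how many hypotheses can be examined before acting

    Returns the best action with a net positive evaluation, or None
    if no examined hypothesis is evaluated positively.
    """
    best_action, best_score = None, 0.0
    for action in candidates[:budget]:
        outcome = imagine(action)
        score = evaluate(outcome)
        if score > best_score:
            best_action, best_score = action, score
    return best_action


if __name__ == "__main__":
    # Hypothetical world model and emotional scores for the leaking faucet.
    outcomes = {
        "call plumber": "fixed, costly",
        "ignore leak": "flooded",
        "replace washer": "fixed",
    }
    scores = {"fixed, costly": 0.3, "flooded": -1.0, "fixed": 0.9}

    print(plan(list(outcomes), outcomes.get, scores.get, budget=3))
    # With only a punishing option available, no plan is found.
    print(plan(["ignore leak"], outcomes.get, scores.get, budget=1))
```

A result of `None` is the situation in which no satisfactory plan can be found; calling `plan` again and again with the same result is the sketch's analogue of worry. When the budget is smaller than the number of candidates, the order in which hypotheses are tried decides the outcome.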
One of the central issues in the study of planning is the search strategy, or procedure, which dictates which of the many possible hypotheses should be evaluated first. In most cases, the search space is much too large to permit an exhaustive search of all possible plans, or even any substantial fraction of them. The rules for deciding which hypotheses to evaluate, and in which order, are called heuristics.
Heuristics are usually derived in an ad hoc way from experience, accident, analogy, or guesswork. Once discovered, they may be passed from one individual to another, and from one generation to another by teaching.
Historically, artificial intelligence researchers have been fascinated by the subject of heuristics. At least a portion of this interest is a result of their recursive nature. A heuristic is a procedure for finding a procedure. When this recursion is embedded in a cross-coupled processing-generating hierarchy with the rich complexity of the human brain, it becomes clear why the thoughts and plans of humans are filled with such exquisite subtleties, and curious, sometimes insidious reasoning. It also provides some insight into the remarkable phenomenon of self-consciousness (ie: a computing structure with the capacity to observe, take note of, analyze, and, to some extent, even understand itself).
Much of the artificial intelligence research in planning and problem solving has its origins and theoretical framework based on simple board games where there are a finite (although sometimes very large) number of possible moves. The discrete character of such games, together with the digital nature of computers, led naturally to the analysis of discrete trees, graphs, and search strategies for such structures.
Planning in a natural environment is much more complex than searching discrete trees and graphs. In the study of planning in the brain it is necessary to deal with the continuous time-dependent nature of real world variables and situations. States are not accurately represented as nodes in a graph or tree; they are more like points in a tensor field. Transitions between states are not lines or edges, but multidimensional trajectories (fuzzy and noisy at that). In a natural environment, the space of possible behaviors is infinite. It is clearly impossible to exhaustively search any significant portion of it. Furthermore, the real world is much too unpredictable and hostile, and wrong guesses are far too dangerous to make exploration practical outside of a few regions in which behavior patterns have had a historical record of success. Thus behavior, and hence imagination and planning, is confined to a relatively small range of possibilities, namely those behavior and thought patterns which have been discovered to be successful through historical accident or painful trial and error. Both the potential behavior patterns and the heuristics for selecting them are passed from one generation to another by parents, educators, and civil and religious customs.
The fact that the imagination can generate hypothetical scenarios with pleasurable emotional evaluations makes it inevitable that such scenarios will, upon occasion, be rehearsed for their pleasure-producing effect alone. This is a procedure that can only be described as daydreaming or fantasizing.
When we daydream we allow our hypothesis generators to drift wherever our emotional evaluators pull them. Our imagination gravitates toward those trajectories which are emotionally most rewarding. Some of the most pleasurable scenarios we can imagine are physically impossible, impractical, or socially taboo. Most of us recognize these as fantasies and never attempt to carry them out. However, once a person adopts the intent to carry out a fantasy, it ceases to be a dream and becomes a plan.
Thus, planning and daydreaming are closely related activities, differing principally in that planning has a serious purpose and involves an intent to execute what is finally selected as the most desirable of the alternative hypotheses.
This model suggests that dreaming while sleeping is similar in many respects to daydreaming. The principal difference in night dreaming seems to be that the trajectories evoked are more spasmodic and random, and are not always under the complete control of the emotions and will.
The notion of planning or discovering procedures for achieving goals leads inevitably to the issue of creativity. If we assume that most of the H, G, and M functions in the processing-generating hierarchy are learned, then where is the creativity? Is creativity merely an illusion generated by the recursion of procedures for discovering procedures?
Certainly we as humans like to think of ourselves as creative. But what are we doing when we create something new? Typically we borrow an idea from here, put it together with another from there, and give it a different name. We take a familiar behavioral trajectory, add a tiny variation, and claim that we have discovered something completely new - a new dance step, dress style, song, or idea. Seldom, however, are any of these more than the slightest deviation from a preexisting procedure or behavioral trajectory. To quote Ecclesiastes: "There is nothing new under the sun."
True creativity, in the sense of the invention of an entirely new behavioral trajectory, is extremely rare, if it ever occurs at all. Furthermore, it is highly doubtful that a truly creative act would be recognized if it ever did occur. Our processing-generating hierarchies cannot lock on to sensory input patterns which are totally different from everything that is stored in them. We reject such inputs as meaningless noise, or as alien and possibly hostile. True creativity would be as incomprehensible as a book written in a foreign language, or a theorem expressed in an unknown mathematical notation.
In one sense we are all creative in everything that we do, since no two behavioral trajectories are ever repeated exactly. However, the day-to-day variations in our ordinary behavior are not what we usually mean when we speak of creativity. We take pride in those moments of inspiration when something clicks, and we produce a great invention or a work of art.
Nonetheless, if we analyze a list of the great creative ideas which have shaped human history, we find that even these have been little more than clever rearrangements of well-known preexisting patterns or procedures.
Consider the fact that it took the human race many millennia to learn to start a fire, to grow a crop, to build a wheel, to write a story, to ride a horse. Even the Greeks did not know how to build an arch. Yet these are all simple procedures which any child can understand and more or less master. Surely our ancestors as adults were as intelligent and creative as today's children. Why did they fail for hundreds of years to discover these simple yet highly useful procedures?
It was because they had no one to teach them. A modern child knows about wheels because he is taught. He plays with toys that have wheels. He rides in vehicles with wheels. If a modern child grew up in a culture where he never saw a wheel, he would never think of one, nor would his children, or his grandchildren, any more than his ancestors did for thousands of years before him.
The reason that we value creativity so highly is that it is so rare and so highly advantageous. Once a new and useful procedure like navigating a ship, making steel, or flying an airplane is discovered, it can easily be taught to others. Entirely new worlds of possible behavior patterns open up for all who possess the secret.
We learn to solve problems, to invent, and to be creative, in much the same way as we learn any other goal-directed behavior pattern such as hunting, dancing, speaking, or behaving in a manner that is acceptable to and approved by our peers. We learn it from a teacher. The beauty, the sense of awe and wonder we experience when confronted by a work of creative genius, derives not so much from its novelty/creativity as from the skill and precision with which it is executed.
There is little need to worry about programming "creativity" into our machines. If we design systems with sufficient skill in executing tasks and seeking goals, and sufficient sophistication in sensory analysis and context sensitive recall, and if we teach these systems procedures for selecting behavior patterns which are appropriate to the situation, then they will appear to be both intelligent and creative. But there will never be any particular part of such a device to which one can point and say "Here is the intelligence," or "Here is the creativity." Skills and knowledge will be distributed as functional operators throughout the entire hierarchy. To the degree that we are successful, intelligence and creativity will be evidenced in the procedures which are generated by such systems.
Above all, we should not expect our robots to be more clever than ourselves, at least not for many decades. In particular we should not expect our machines to program themselves, or to discover for themselves how to do what we do not know how to teach them. We teach our children for years. It will take at least as much effort to teach our machines.
We must show our robots what each task is and how to do it. We must lead them through in explicit detail, and teach them the correct response for almost every situation. This is how industrial robots are programmed today at the very lowest levels, and this is, for the most part, how children are taught in school. It is the way that most of us learned everything we know, and there is no reason to suspect that robots will be programmed very differently. Surely it is as unreasonable to expect a robot to program itself as it is to expect a child to educate himself. We should not expect our robots to discover new solutions to unsolved problems or to do anything that we, in all the thousands of generations we have been on this earth, have not learned how to do ourselves.
This does not mean that, once we have trained our robots to a certain level of competence, they can't learn many things on their own. We can certainly write programs to take the routine and the tedium out of teaching robots. Many different laboratories are developing high-level robot programming languages. We already know something about how to represent knowledge in computers about mathematics, physics, chemistry, geology, and even medical diagnosis. We know how to program complex control systems and to model complicated processes, and we are rapidly learning how to do it better, more quickly, and more reliably. Soon perhaps it will even be possible to translate knowledge from natural language into robot language so that we will be able to teach our robots from textbooks or tape recordings more quickly and easily than humans. We can even imagine robots learning by browsing through libraries or reading scientific papers.
But it is a mistake to attempt to build creative robots. We are not even sure what a creative human is, and we certainly have no idea what makes a person creative, aside from contact with other creative humans - or time alone to think. Is it both? Or neither?
I believe that we should first learn how to build skilled robots - skilled in manipulation, in coping with an uncertain or even hostile environment, in hunting and escaping, in making and using tools, in encoding behavior and knowledge into language, in understanding music and speech, in imaging, and in planning. Once we have accomplished these objectives, then perhaps we will understand how to convert such skills into creativity. Or perhaps we will understand that robots with such skills already possess the creativity and the wisdom which springs naturally from the knowledge of the skills themselves.
Additional Reading