Connectionism is the study of artificial neural networks. These networks are designed to emulate the neural circuits that are found in real nervous systems, although the similarities are sometimes only superficial. As a broad set of concepts and research techniques, connectionism can be divided into two major approaches. First, researchers in psychology, neuroscience, linguistics, and related disciplines use connectionist models to simulate cognitive processes such as perception, memory, learning, and motor skill. Second, researchers in computer science, engineering, and mathematics study the formal properties of connectionist models and also use these models to analyze and solve complex real-world tasks (e.g., pattern recognition, robot planning and control, and non-linear function approximation).
The perceptron, one of the most basic types of artificial neural networks, was proposed by Frank Rosenblatt in 1958. As Figure 1A illustrates, the perceptron includes two layers of simulated neurons: a layer of input units, which is analogous to a simple sensory system (e.g., tactile skin receptors, retinal neurons), and a layer of output units, which is analogous to a simple motor system (e.g., muscle fibers). Each unit in the network can be interpreted as corresponding to a neuron, whose activity level (i.e., simulated firing rate) is influenced by its connections to other units in the network. A given network's architecture is determined by the particular pattern of connections among the units in the network. For example, the perceptron is a two-layer feed-forward network because the activity at the input layer propagates forward and influences activity at the output layer (e.g., from sensory input to motor output). Like real synapses, the connections between units in an artificial neural network can vary in strength, causing each receiving unit either to increase or decrease its activity (i.e., excitation or inhibition).
Rosenblatt's 1958 work helped to highlight an important feature of artificial neural networks: With an appropriate set of weighted connections, a network can be used to transform a given set of input patterns into a desired set of output patterns. For example, imagine that the four input units in Figure 1A are retinal sensors whose activity varies from 0 to 1, depending on the intensity of light stimulating each unit. Similarly, imagine that the units in the output layer either (1) drive an eye movement to the left or right, or (2) maintain the fixation point, depending on which of the three units is most active. With the correct set of connection weights, this network would then be able to perform a simple orienting response, shifting its gaze laterally toward bright objects in its visual field.
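Rosenblatt's forward computation can be made concrete with a short sketch. The weight values below are invented for illustration (nothing in the text fixes them); any weights with the same sign pattern would produce the same qualitative orienting behavior.

```python
# A minimal perceptron forward pass, modeled on the orienting network
# described above: four "retinal" inputs drive three output units
# (look left, hold fixation, look right). All weight values are
# illustrative assumptions, not taken from Rosenblatt's paper.

def forward(inputs, weights):
    """Weighted sum for each output unit; the most active unit 'wins'."""
    activations = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return activations.index(max(activations))

ACTIONS = ["look left", "hold fixation", "look right"]

# weights[o][i] = connection from input unit i to output unit o.
# Leftmost inputs excite "look left", rightmost excite "look right",
# and "hold fixation" responds to uniform illumination.
weights = [
    [1.0, 0.5, -0.5, -1.0],    # look left
    [0.25, 0.25, 0.25, 0.25],  # hold fixation
    [-1.0, -0.5, 0.5, 1.0],    # look right
]

bright_on_left = [1.0, 0.8, 0.1, 0.0]
print(ACTIONS[forward(bright_on_left, weights)])  # -> look left
```

A bright stimulus on the right side of the "retina" would, by symmetry, drive the "look right" unit hardest, and uniform illumination leaves fixation the most active unit.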
The initial success of Rosenblatt's perceptron was challenged by Marvin Minsky and Seymour Papert (1969), who published a critical analysis of two-layer networks. Their critique demonstrated by mathematical proof that the perceptron was severely limited in the kinds of input-output mappings, or functions, that it could compute. As a result, research on artificial neural networks stalled for more than a decade.
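The best-known case in Minsky and Papert's analysis is the exclusive-or (XOR) function, which no two-layer perceptron can compute because its two output classes cannot be separated by a single line. A brute-force search over a small grid of candidate weights (the grid is our illustrative assumption) makes the limitation concrete:

```python
# Demonstration of the limit Minsky and Papert proved: a single
# threshold unit (output = 1 if w1*x1 + w2*x2 > theta) can reproduce
# AND but not XOR, no matter which weights we try. The search grid
# below is an illustrative choice; the impossibility for XOR holds
# for any weights whatsoever.

def computes(target, w1, w2, theta):
    """Does this threshold unit reproduce `target` on all four inputs?"""
    return all(
        (w1 * x1 + w2 * x2 > theta) == target[(x1, x2)]
        for x1 in (0, 1) for x2 in (0, 1)
    )

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0

def solvable(target):
    return any(computes(target, w1, w2, t)
               for w1 in grid for w2 in grid for t in grid)

print(solvable(AND))  # -> True  (e.g., w1 = w2 = 1, theta = 1.5)
print(solvable(XOR))  # -> False (XOR is not linearly separable)
```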
However, in 1986 David Rumelhart and James McClelland published a landmark two-volume text that both revived interest in the study of artificial neural networks and, more importantly, provided a comprehensive response to Minsky and Papert's criticisms. In particular, Rumelhart and McClelland proposed a set of relatively modest changes to the perceptron. One of these changes, as Figure 1B illustrates, was to insert a new layer of units between the input and output layers. This set of units is called the hidden layer, as it is hidden from the external environment (i.e., only the input and output units make direct contact with the environment). Remarkably, Rumelhart and McClelland's multilayer networks are able to compute the input-output functions that Minsky and Papert had shown to be beyond the perceptron's reach, and under the appropriate set of conditions, they can also be broadly interpreted as universal function approximators.
Rumelhart and McClelland also promoted the idea that neural networks can learn. More specifically, the connections in a network can be modified by a set of mathematical rules called a learning algorithm. For example, in a Hebbian network, the connection between two units is strengthened when both are active at the same time. Thus, an artificial neural network can “learn” in the sense that as it is repeatedly presented with one or more input patterns, it adjusts or modifies its connection weights, which gradually changes the pattern of output produced by the network.
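The Hebbian rule just described has a simple mathematical form: the change in a weight is proportional to the product of the two units' activities (Δw = η·x·y, where η is a learning rate). A minimal sketch, with illustrative activity values and learning rate:

```python
# Minimal Hebbian weight update: the connection between two units is
# strengthened in proportion to the product of their activities
# (delta_w = eta * pre * post). All numeric values are illustrative.

def hebbian_update(w, pre, post, eta=0.1):
    """Return the new weight after one presentation."""
    return w + eta * pre * post

w = 0.0
for _ in range(5):  # repeated co-activation of two active units
    w = hebbian_update(w, pre=1.0, post=0.8)
print(round(w, 2))  # -> 0.4  (i.e., 5 * 0.1 * 1.0 * 0.8)

# If either unit is silent, the product is zero and the weight is unchanged.
print(hebbian_update(0.4, pre=0.0, post=0.8))  # -> 0.4
```

This captures the sense in which the network "learns": repeated exposure to input patterns gradually shifts the weights, and with them the outputs the network produces.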
One of the most common learning algorithms is back-propagation-of-error, which belongs to a set of methods called supervised learning algorithms. These methods are supervised in the sense that after an input pattern is presented to the network, the model-builder compares the output produced by the network to a desired pattern. The network's connection weights are then adjusted so that the output is moved closer to the desired pattern. Thus, a fundamental assumption of supervised learning is that a “teacher” not only is available, but also provides highly specific feedback during the learning process. It is important to note, however, that there are alternative methods for simulating learning in artificial neural networks that do not require explicit feedback from a teacher (e.g., reinforcement learning, unsupervised learning; for a recent review, see Schlesinger & Parisi, 2004).
The learning algorithm back-prop, as it is typically called, derives its name from the fact that in a standard network, activation flows forward, from the input layer to the output layer. In contrast, when the connection weights are modified, the training signal propagates in reverse: Changes are first made to the connections at the output layer, followed by changes to the connections at the hidden layer. While this bi-directional flow of information is a mathematical requirement, it has been challenged as biologically implausible. Indeed, a growing number of researchers advocate for designing artificial neural networks that more accurately represent both the structure and function of the brain (e.g., Sejnowski, Koch, & Churchland, 1988).
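The backward order of weight changes can be traced in a toy example. The sketch below performs a single back-prop step on a hypothetical 2-2-1 sigmoid network: the output-layer error term is computed first, then propagated back to the hidden layer, and both sets of weights are nudged so the output moves toward the desired target. The initial weights, learning rate, and training pair are arbitrary illustrations.

```python
# One backpropagation step, written out by hand to show the "backward"
# order described above: output-layer weights are adjusted first, then
# the error signal is propagated back to the hidden-layer weights.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# W1[j][i]: input unit i -> hidden unit j; W2[j]: hidden unit j -> output.
W1 = [[0.5, -0.4], [0.3, 0.8]]
W2 = [0.6, -0.2]
x, target = [1.0, 0.0], 1.0   # one (input, desired output) training pair
eta = 0.5                     # learning rate (illustrative)

def forward():
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)))
    return h, y

h, y = forward()
error_before = 0.5 * (target - y) ** 2

# Backward pass: output-layer error term first, then hidden-layer terms
# (both computed with the pre-update weights).
delta_out = (target - y) * y * (1 - y)
delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]

W2 = [W2[j] + eta * delta_out * h[j] for j in range(2)]
W1 = [[W1[j][i] + eta * delta_hidden[j] * x[i] for i in range(2)]
      for j in range(2)]

_, y_after = forward()
error_after = 0.5 * (target - y_after) ** 2
print(error_after < error_before)  # -> True: output moved toward the target
```

In supervised learning, this cycle of compare-and-adjust is repeated over many presentations of the training patterns, with the "teacher" supplying the target on every trial.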
There are numerous examples of connectionist models that may be relevant to the study of classroom learning. While these models are designed to simulate learning within a particular knowledge domain (e.g., grammar learning, numerical cognition, etc.), they also suggest more general principles that are broadly applicable across domains.
The Importance of Starting Small. Jeff Elman (1993) modified the standard 3-layer architecture to study how a network learns the structure of a small artificial language. As Figure 1C illustrates, the unique feature of this simple recurrent network is that activation from the hidden layer both propagates forward to the output layer and projects back to a set of context units in the input layer. These context units provide a type of short-term memory trace that allows the network to differentiate between two identical input patterns that occur within different contexts (e.g., “I read the book” vs. “They will book a room”).
Elman presented the network with sentences from the artificial language—one word at a time—and trained it to predict the next word in each sentence. Initially, the network was unable to learn the task. Next, he trained a new network that started with limited short-term memory (this was implemented by clearing the memory trace after every few words), which gradually increased during training. In contrast to the first network, the network with limited short-term memory succeeded on the learning task. Elman used these findings to argue that modest limitations on information processing (e.g., memory, attention, etc.) during early learning may be an advantage for novices who are acquiring a new skill.
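The context-unit mechanism at the heart of the simple recurrent network can be sketched directly: after each word, the hidden activations are copied back to serve as context for the next word, so the same token produces different hidden states in different contexts. The weights and two-token "vocabulary" below are invented for illustration; this is not Elman's trained network.

```python
# Sketch of the simple recurrent network's key mechanism: the hidden
# activations at one time step are copied into context units that feed
# back in alongside the next input. All weights are illustrative.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

W_in = [[0.8, -0.3], [-0.5, 0.9]]   # input units  -> hidden units
W_ctx = [[0.7, 0.2], [0.1, -0.6]]   # context units -> hidden units

def step(word, context):
    """One SRN step: new hidden state from current input plus prior context."""
    return [
        sigmoid(sum(W_in[j][i] * word[i] for i in range(2)) +
                sum(W_ctx[j][i] * context[i] for i in range(2)))
        for j in range(2)
    ]  # the returned hidden state becomes the next step's context

BOOK = [1.0, 0.0]   # the same token in two contexts...
READ = [0.0, 1.0]
zero = [0.0, 0.0]   # empty context at the start of a sequence

h_after_read = step(BOOK, step(READ, zero))   # "book" following "read"
h_initial = step(BOOK, zero)                  # "book" with no context
print(h_after_read != h_initial)  # -> True: context disambiguates the token
```

This short-term memory trace is what lets the network treat "book" in "I read the book" differently from "book" in "They will book a room."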
Growing New Connections. Another important innovation in connectionist models was proposed by Tom Shultz (2003), who conducts simulations with a learning algorithm called cascade correlation. What makes the cascade correlation algorithm unique is that, in contrast to standard multilayer networks that have a fixed architecture, cascade correlation networks are able to generate new hidden units as they learn.
One of the tasks that Shultz has studied with the cascade correlation algorithm is the balance-scale task. This task was first investigated by Jean Piaget (1896–1980), who asked children to predict which side of the scale would tip when weights are hung on each side; Piaget discovered that children pass through four stages as they learn to master the balance scale. Shultz first simulated the task with a fixed 3-layer network and found that the network was only able to reach stage three. He then trained a second network with cascade correlation and found that as new hidden units were generated, the network reached stage four on the balance-scale task.
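The growth loop that distinguishes cascade correlation from a fixed architecture can be sketched schematically. Everything quantitative below is a stand-in: in the real algorithm, candidate hidden units are trained to correlate with the network's residual error before being installed, a detail this sketch omits.

```python
# A schematic of cascade correlation's control structure, not the real
# algorithm: train the current network until its error plateaus, and if
# the plateau is still too high, recruit a new hidden unit and continue.
# train_to_capacity is an invented stand-in in which each extra hidden
# unit simply lowers the error floor the architecture can reach.

def train_to_capacity(hidden_units):
    """Stand-in for weight training: the error floor reachable by the
    current architecture (invented: more hidden units -> lower floor)."""
    return 1.0 / (hidden_units + 1)

def cascade_correlation_sketch(target_error=0.125):
    hidden_units = 0
    while True:
        error = train_to_capacity(hidden_units)
        if error <= target_error:
            return hidden_units, error
        hidden_units += 1  # error plateaued too high: recruit a new unit

units, err = cascade_correlation_sketch()
print(units)  # -> 7: this toy network "grew" seven hidden units
```

The point of the sketch is the control structure itself: rather than fixing the architecture in advance, the network expands only when its current capacity proves insufficient for the task.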
Shultz proposes that while networks with a fixed architecture learn by a process comparable to rote memorization, dynamic networks “grow” new units and connections and are thus able to actively reorganize what is being learned. Walter Schneider and David Graham (1992) suggest that traditional classroom learning can benefit from this research by exploiting both styles or modes: conventional memorization-based learning, balanced with self-directed or problem-based learning.
Elman, J. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99.
Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.
Rumelhart, D., & McClelland, J. (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vols. 1–2). Cambridge, MA: MIT Press.
Schlesinger, M., & Parisi, D. (Eds.). (2004). Beyond backprop: Emerging trends in connectionist models of development. [Special section]. Developmental Science, 7, 131–132.
Schneider, W., & Graham, D. (1992). Introduction to connectionist modeling in education. Educational Psychologist, 27, 513–530.
Sejnowski, T., Koch, C., & Churchland, P. (1988). Computational neuroscience. Science, 241, 1299–1306.
Shultz, T. (2003). Computational developmental psychology. Cambridge, MA: MIT Press.