Knowledge representation refers to how knowledge is stored in long-term memory. Researchers have been keenly interested in this topic for over 50 years, and a number of models of knowledge representation have been developed (Miyake & Shah, 1999). Four main families of models are of interest to researchers: network, production, dual coding, and connectionist models. Each model is summarized below, compared to the others, and briefly discussed with regard to its contributions to understanding learning in the classroom.
Network models of knowledge representation became popular in the 1960s. Early models focused on the hierarchical representation of declarative knowledge in memory and the relationships among different knowledge units (Quillian, 1968; Collins & Quillian, 1969). Network models possess three major components: nodes, in which specific units of information are stored; properties of the information within nodes; and relational links among nodes. This can be explained by reference to the domain of animals. Subsumed within this domain are different types of animals such as birds, fish, and mammals. Network models envisioned each of these categories as nodes, each possessing a number of essential properties. The “animal” node included properties such as “breathes, eats, has skin.” The “bird” node included properties such as “has wings, has feathers, and flies,” whereas the “fish” node included different properties such as “has fins, has gills, and swims.” Network models emphasized parsimony in mental representation; thus, properties included in a superordinate node were not replicated at a subordinate node. Because birds and fish are both animals, it was not necessary to include the property “has skin” at those nodes because this property was included already in the “animal” node.
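The node-property-link architecture described above can be sketched in code. The following is a minimal illustration (the class and its methods are hypothetical, not part of any published model): each node stores only the properties not already stored at a superordinate node, and a property check searches upward through superordinate links.

```python
# Minimal sketch of a hierarchical semantic network. Property storage is
# parsimonious: a property appears only at the highest node where it holds.

class Node:
    def __init__(self, name, properties, parent=None):
        self.name = name
        self.properties = set(properties)  # stored only at this level
        self.parent = parent               # superordinate link

    def has_property(self, prop):
        # Search upward: a property holds if this node or any
        # superordinate node stores it.
        node = self
        while node is not None:
            if prop in node.properties:
                return True
            node = node.parent
        return False

animal = Node("animal", {"breathes", "eats", "has skin"})
bird = Node("bird", {"has wings", "has feathers", "flies"}, parent=animal)
canary = Node("canary", {"is yellow", "sings"}, parent=bird)

print(canary.has_property("has skin"))   # True: inherited from "animal"
print(canary.has_property("has gills"))  # False
```

Judging whether a canary has skin requires traversing two superordinate links, which is consistent with the classic finding that such verifications take slightly longer than checking a property stored directly at the node.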
Quillian (1968) proposed five different kinds of relational links between nodes: superordinate and subordinate, modifier, disjunctive, conjunctive, and residual links. These links specified whether properties of one node were shared with another node. For example, the fact that all animals have skin is carried by a superordinate link and is true of all nodes subsumed beneath it unless otherwise noted by a disjunctive link. Network models based on the notion of nodes, properties, and links helped explain how people remember information in an efficient manner and why it is relatively easy to search memory and make simple judgments, such as whether a canary eats and has skin.
The search process of memory was explained by the concept of spreading activation among nodes. Some concepts activated particular nodes, and activation would spread to adjacent nodes. For example, the word “camel” would activate the “mammal” node and all properties of mammals would be activated, whereas properties of distant nodes such as “fish” would not be activated. Thus, activation spread through memory both vertically and horizontally. Activation spread vertically in an upward (i.e., camel to mammal) or downward (i.e., camel to dromedary) direction. Activation also spread horizontally (i.e., camel to horse, camel to mule). Activation typically spreads further in a horizontal than in a vertical direction, although activation is constrained in part by the situational demands of learning.
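Spreading activation can be sketched as a graph traversal in which activation decays with distance from the source node. The decay factor, threshold, and graph below are illustrative assumptions, not values from any published model.

```python
# Hedged sketch of spreading activation: activation injected at one node
# decays by a fixed factor at each link; nodes receiving activation below
# a threshold are not activated at all.

def spread_activation(graph, start, decay=0.5, threshold=0.1):
    """graph: dict mapping each node name to a list of linked node names."""
    activation = {start: 1.0}
    frontier = [start]
    while frontier:
        next_frontier = []
        for node in frontier:
            passed = activation[node] * decay
            if passed < threshold:
                continue  # too weak to spread further
            for neighbor in graph.get(node, []):
                if neighbor not in activation:
                    activation[neighbor] = passed
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation

graph = {
    "camel":  ["mammal", "horse", "mule"],  # vertical and horizontal links
    "mammal": ["animal", "camel"],
    "animal": ["mammal", "fish"],
    "fish":   ["animal"],
}
act = spread_activation(graph, "camel")
# Adjacent nodes ("mammal", "horse", "mule") are strongly activated;
# distant nodes such as "fish" receive much weaker activation.
```

In this toy run, “mammal,” “horse,” and “mule” each receive half the source activation, while “fish,” three links away, receives far less, mirroring the claim that distant nodes remain largely inactive.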
The idea that memory is organized into nodes of specific information that are interrelated with other nodes has been a lasting one. Almost all other models of knowledge representation incorporate the idea of a node, although what a node includes varies from model to model. The assumption that nodes have properties and are linked in a manner that indicates the type of relationship between nodes has not fared as well. Early network models provided a useful description of how declarative knowledge was represented in long-term memory, but they failed to explain the construction and representation of procedural and self-regulatory knowledge. Network models also paved the way for the development of schema theory in the 1970s, which spawned hundreds of experiments on the effect of schemata on learning and memory.
In recent years, a new class of network models has appeared that focuses on higher order processes such as complex problem solving, creativity, and metacognition (Griffiths, Steyvers & Tenenbaum, 2007). These models frequently describe excitatory and inhibitory processes similar to those described in connectionist models. This new breed of semantic network models often provides a better account of complex mental processes, such as understanding the overall gist of text or conversation.
Production models of knowledge representation and learning were first developed in the 1970s. One goal of these models was to explain a broader array of memory phenomena, such as procedural learning, in addition to the representation of declarative knowledge. The most comprehensive production model is the ACT-R model (Adaptive Control of Thought–Rational) developed by John Anderson (1996, 2000). Anderson's model grew out of the human associative memory (HAM) model proposed by Anderson and Bower (1973).
ACT-R proposes three interactive memory systems that support adaptive thinking: declarative knowledge, procedural knowledge, and working memory. The declarative knowledge component consists of schemata and chunks within schemata that encode specific declarative knowledge units. The procedural knowledge component consists of production rules that break down complex action sequences into a number of “if-then” steps, which enable the learner to perform complex actions using a series of simple steps. Declarative and procedural components are connected to each other, as well as to a working memory system in which activated declarative and procedural units are used to solve problems, make decisions, and adapt to environmental conditions.
ACT-R differs from earlier network models in that it proposes production rules, which are combined into production systems that enable the brain to represent complex actions. A production rule specifies the action to be taken to achieve a specific goal and the conditions under which the action is taken. For example, imagine that a person has a ring of five keys and needs to open an office door. This scenario can be represented as a simple production as follows: IF a person must open a door, THEN he or she must insert key one and open the door; IF key one fails to open the door, THEN the person must insert key two, and so on.
This production rule could be subdivided further into finer grained production rules that specify how to use each key until the correct key is identified, or none of the keys open the door. In addition, conditions could be added to each substep in the production sequence to assist the learner. For instance, one might add a condition statement, instructing the person not to attempt to use long, narrow keys with square heads because these keys often open car doors rather than office doors.
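The key-ring scenario, including the added condition about car keys, can be sketched as a simple condition-action loop. The key names, the `shape` attribute, and the success test are hypothetical details added for illustration.

```python
# Illustrative sketch of the key-ring production sequence:
# IF the door is not open and a key remains, THEN insert the next key.
# A condition statement filters out keys that usually open car doors.

def try_keys(keys, correct_key):
    for key in keys:
        # Added condition: skip long, narrow keys with square heads,
        # since these often open car doors rather than office doors.
        if key.get("shape") == "square-head":
            continue
        # Action: insert the key and test whether it opens the door.
        if key["name"] == correct_key:
            return f"door opened with {key['name']}"
    return "no key opened the door"

keys = [
    {"name": "key1"},
    {"name": "key2", "shape": "square-head"},  # car key, skipped
    {"name": "key3"},
]
print(try_keys(keys, "key3"))  # door opened with key3
```

Each pass through the loop corresponds to one fine-grained production, and the `shape` check shows how a condition statement narrows which productions fire.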
Anderson states that complex cognitive activity can be understood and explained in terms of small productions, based on simple units of declarative and procedural knowledge. This suggests that learning is a systematic process of acquiring declarative and procedural knowledge through experience and using this knowledge under specific conditions to execute complex actions, which themselves are composed of many small productions. The theory of ACT-R also discusses how individuals construct and infer new knowledge based on past experiences. Thus, the theory is not entirely experience driven. Nevertheless, ACT-R views learning as a systematic process of acquiring the right knowledge and using that knowledge under the right conditions. Using knowledge repeatedly (i.e., practicing) increases the speed and accuracy of productions. Tuning productions to varying conditions also increases the efficiency of learning and performance.
Like network models, ACT-R postulates a process of spreading activation among declarative and procedural knowledge units during the execution of production sequences. Anderson (1996) provides sophisticated weighting systems, which serve as algorithms for selecting which production rules to apply under particular conditions. Activation spreads among production rules as a function of conditions and weights, which highlight some rules and downplay others. Activation is not necessarily hierarchical from superordinate to subordinate nodes, as is often the case in network models. Thus, production systems tend to be less hierarchical than networks.
Production system models have two clear advantages over earlier network models. First, they incorporate procedural knowledge into the model and explain how procedural and declarative knowledge are interrelated through working memory. Second, they do an excellent job of explaining incremental skill acquisition and the development of expertise. Production systems have been used to create and model intelligent tutoring systems that might take the place of human tutors. One potential criticism is that production systems are highly mechanistic; that is, they postulate that learning and performance are the sum, and nothing more than the sum, of a sequence of discrete productions. Related to this criticism is the fact that production systems highlight the role of experience and direct learning while downplaying rational reflection and the role of discovery and creativity.
Dual coding theory (DCT) was first postulated by Allan Paivio in the early 1970s and continues to be an important model of processing and knowledge representation in long-term memory (Paivio, 2007). DCT postulates two separate modular stores in long-term memory: visual-spatial and verbal representation systems. Both systems are assumed to be functionally separate, yet interconnected. This means that visual-spatial and verbal long-term memories can perform tasks independent of one another, yet are able to pool resources when necessary. A number of researchers have speculated that dual-coding systems may be reflected in neurological differences between the brain's right and left hemispheres (Paivio, 2007). DCT also postulates that some learners may have a visual-spatial or verbal preference for information processing.
DCT hypothesizes different representational systems for each of the two codes. The visual-spatial system uses mental images as the primary representational code, while the verbal system uses speech as the primary code. DCT assumes that every object and concept has a verbal label in verbal memory, whereas not every object or concept has an imaginal label in visual-spatial memory. Specifically, some concepts such as “automobiles” have concrete referents, while some concepts such as “affection” do not. DCT refers to this distinction as concrete versus abstract concepts.
The most important assertion of DCT is that concrete concepts may be easier to process and learn because mental activity can be distributed across the two stores (Reed, 2006; Sadoski, 2005). Thus, a word such as “cat” can be represented separately in each storage system, whereas a word such as “truth” presumably is represented only in the verbal system. Two implications follow from this assumption. One is that information that is concrete in nature or that can be visualized will be better learned (Sadoski, Goetz, & Rodriguez, 2000). This has led to a great deal of research on the use of mnemonic techniques. A second implication is that visual information such as pictures in a book, summary tables, graphs, charts, and other visual aids should facilitate learning (Schnotz, 2002).
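The two-store assumption can be made concrete with a small sketch: every concept carries a verbal code, but only concrete concepts also carry an imaginal code. The `memory` dictionary and its contents are illustrative assumptions, not data from DCT research.

```python
# Hedged sketch of DCT's dual stores: "cat" is concrete and dual coded;
# "truth" is abstract and is represented in the verbal system only.

memory = {
    "cat":   {"verbal": "cat",   "imaginal": "<image of a cat>"},
    "truth": {"verbal": "truth", "imaginal": None},  # abstract concept
}

def codes_available(concept):
    """Return which memory codes hold a representation of the concept."""
    entry = memory[concept]
    return [code for code, value in entry.items() if value is not None]

print(codes_available("cat"))    # ['verbal', 'imaginal']
print(codes_available("truth"))  # ['verbal']
```

On DCT's account, the dual-coded concept benefits from two retrieval routes, which is the proposed explanation for the concreteness advantage in learning.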
Research findings generally support the two-store model proposed by dual coding theory. DCT seems to be especially useful as an explanation of beginning reading processes such as vocabulary learning. It also explains why words are easier to learn in context as well as in the presence of visual aids such as pictures. In contrast, the theory does not explain well how congenitally blind individuals learn or how students create integrated visual-verbal representations in memory.
Connectionist models of knowledge representation and learning became popular in the 1980s and sometimes are referred to as neural networks or parallel distributed processing (PDP) models (Neath & Surprenant, 2003). Connectionist models represent an important paradigm shift from network and production system models because they de-emphasize the intentional role of the learner while emphasizing the role of experience in building neural pathways and connections, as well as assumptions about cognitive architecture (Bechtel & Abrahamsen, 2002). Although a great deal of attention has been devoted to connectionist models over the past 20 years, especially the seminal work of Rumelhart and McClelland (1986), their origin can be traced to earlier researchers such as Selfridge (1959).
Connectionist models differ from network and production models in two ways. The first difference is that previous cognitive models used a computer metaphor to describe human information processing. In this view, information passes through an initial sensory system, is acted upon in working memory, and is represented in a permanent store in long-term memory. Connectionist models replaced the computer metaphor with a neural pathway metaphor modeled on the human brain. In this view, information is represented as patterns of activation across a variety of units, which correspond to neurons in the human brain.
A second difference is that network and production models focus on the representation of discrete units of information within a node in memory (e.g., a fact or a simple production rule), whereas connectionist models view knowledge representation as continuous across a number of interconnected units in memory. Thus, information such as facts, concepts, and production rules is not represented within single nodes, but is distributed across nodes.
Connectionist models propose a rather simple architecture based on units, which maintain elementary information, typically simpler than corresponding nodes in network and production models. Multiple units are connected to create information that one might label as facts or concepts. The connectivity pattern among these units is of utmost importance. Any given unit may be connected to many other units, using a number of different connectivity patterns. Thus, one unit may be part of different knowledge representations, much like a single light in a theatre marquee may be used to spell different words. Connectionist theories have proposed different types of units. The most important of these are input units, output units, and hidden units, which mediate connections between inputs and outputs.
Each unit has an activation value assigned to it under different processing conditions. Activation spreads throughout the system, but depends in part on the connectivity pattern among units, as well as connection weights, which determine whether one unit contributes more activation than another unit. There are a variety of activation algorithms; however, the two most important are forward (i.e., input to output units) and backward propagation (i.e., output to input units). Training (i.e., learning) in a connectionist network occurs as units are activated and deactivated, and connection weights change due to environmental conditions and feedback to the connectionist network through back propagation.
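The unit-and-weight architecture with forward propagation can be sketched as a tiny feedforward network. The weights and the sigmoid activation function below are illustrative assumptions; a trained network would learn its weights through feedback (e.g., back propagation), which this sketch omits.

```python
# Minimal forward-propagation sketch of a connectionist network:
# input units feed hidden units, which feed output units, through
# weighted connections. Weights determine how much activation each
# unit contributes to the units it connects to.

import math

def sigmoid(x):
    """Squash a unit's summed input into an activation value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    # Each hidden unit sums its weighted input activations...
    hidden = [sigmoid(sum(i * w for i, w in zip(inputs, ws)))
              for ws in w_hidden]
    # ...and each output unit sums its weighted hidden activations.
    return [sigmoid(sum(h * w for h, w in zip(hidden, ws)))
            for ws in w_output]

inputs = [1.0, 0.0]                    # 2 input units
w_hidden = [[0.8, -0.2], [0.4, 0.9]]   # weights into 2 hidden units
w_output = [[1.0, -1.0]]               # weights into 1 output unit
print(forward(inputs, w_hidden, w_output))
```

Note that no single unit “stores” a fact here: the output pattern depends on the whole configuration of weights, which is the sense in which connectionist representations are distributed.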
Connectionist models have several strengths and weaknesses. Strengths include their close physiological analogy to the human brain, the fact that their major claims can be tested using computer simulations, and that they provide a general theory of learning that is not unique to humans, but explains how learning may occur in other mammalian and non-mammalian life forms. Possible weaknesses, depending upon one's theoretical point of view, are that connectionist models are too bottom-up (i.e., learning occurs exclusively through experience and data-based feedback) and that the mind is removed from models of learning (i.e., the role of rational reflection and inference construction is downplayed).
Each of the models described above has unique strengths. Table 1 provides a summary of these, as well as the main assumptions of each model. Several points should be considered regarding Table 1. First, each of the models is speculative and incomplete in nature. A large number of studies have supported some, but not all, of the assumptions of each of the models. Currently, there are few cross-model comparisons that definitively support one of the four hypothesized representational architectures. Second, all of the models emphasize the bottom-up nature of learning from experience. Network models are most likely to emphasize the role of higher-order knowledge, whereas connectionist models are least likely to make assumptions about higher-order knowledge or conscious self-regulatory skills. Third, all have useful implications for understanding learning.
Theories and models of knowledge representation all agree on two important implications for learning. One is that knowledge is represented in complex, multi-dimensional ways in memory. The models in Table 1 assume that learners possess higher-order knowledge that develops from simpler knowledge representations. In addition, all the models assume that knowledge is modular in nature (i.e., partitioned in memory into functional units), albeit each model postulates different modules such as concepts embedded in schemata (i.e., networks), separate declarative and procedural representations (i.e., production systems), or imaginal and verbal processing systems (i.e., dual coding).
A second implication is that knowledge is acquired very slowly. Concepts, schemata, and procedural skills are built up slowly over time, automated over hundreds of hours of practice, and often honed under the watchful eye of mentors and master teachers. From an educational perspective, it seems naïve to expect students to become highly knowledgeable within a domain without years of exposure and practice within that domain. Observing and modeling the performance of an expert helps novices develop the knowledge and skills necessary to perform at a high level of expertise.
One important difference among the four perspectives described above is how they address very complex representations such as mental models (Radvansky, 2006). A mental model is a cognitive representation of a complex process (e.g., flying a jet), spatial map (e.g., a mental navigational map of New York City), or explanatory model of some phenomenon (e.g., the Big Bang theory). Many experts would agree that constructing mental models and using them to reason and solve problems is the height of cognition. Nevertheless, it is presently unclear how individuals construct mental models, represent them in memory, or use them to make complex decisions (Dougherty, Franco-Watkins, & Thomas, 2008). Network and production system models seem better suited to explain them, whereas connectionist models often deny the necessity of complex representations like mental models. Understanding the representation of complex mental phenomena such as mental models remains an important goal of cognitive psychology.
Anderson, J. R. (1996). ACT: A simple theory of complex cognition. American Psychologist, 51, 355–365.
Anderson, J. R. (2000). Cognitive psychology and its implications (5th ed.). New York: Worth.
Anderson, J. R., & Bower, G. H. (1973). Human associative memory. Washington, DC: Winston.
Bechtel, W., & Abrahamsen, A. (2002). Connectionism and the mind: Parallel processing, dynamics, and evolution in networks (2nd ed.). London: Blackwell Publishers.
Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8, 240–248.
Dougherty, M. R., Franco-Watkins, A. M., & Thomas, R. (2008). Psychological plausibility of the theory of probabilistic mental models and the fast-and-frugal heuristics. Psychological Review, 115, 199–213.
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114, 211–244.
Miyake, A., & Shah, P. (1999). Toward unified theories of working memory: Emerging general consensus, unresolved theoretical issues, and future research directions. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control. Cambridge, England: Cambridge University Press.
Neath, I., & Surprenant, A. M. (2003). Human memory: An introduction to research, data, and theory (2nd ed.). Pacific Grove, CA: Brooks/Cole.
Paivio, A. (2007). Mind and its evolution: A dual coding theoretical approach. Mahwah, NJ: Erlbaum.
Quillian, M. R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic information processing, (pp. 21–56). Cambridge, MA: MIT Press.
Radvansky, G. A. (2006). Human memory. Boston: Pearson.
Reed, S. K. (2006). Cognitive architectures for multimedia learning. Educational Psychologist, 41, 87–98.
Rumelhart, D. E., McClelland, J. L., and the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations. Cambridge, MA: MIT Press.
Sadoski, M. (2005). A dual coding view of vocabulary learning. Reading & Writing Quarterly, 21, 221–238.
Sadoski, M., Goetz, E. T., & Rodriguez, M. (2000). Engaging texts: Effects of concreteness on comprehensibility, interest, and recall in four text types. Journal of Educational Psychology, 92, 85–95.
Schnotz, W. (2002). Towards an integrated view of learning from text and visual displays. Educational Psychology Review, 14, 101–120.
Selfridge, O. G. (1959). Pandemonium: A paradigm for learning. In The mechanization of thought processes. London: H. M. Stationery Office.