Operant conditioning is defined as the use of consequences to modify the occurrence and form of behavior. “To put it very simply, behavior that is followed by pleasant consequences tends to be repeated and thus learned. Behavior that is followed by unpleasant consequences tends not to be repeated and thus not learned” (Alberto & Troutman, 2006, p. 12). Operant conditioning is specifically limited to voluntary behavior, that is, emitted responses, which distinguishes it from respondent or Pavlovian conditioning, which is limited to reflexive behavior (or elicited responses).

Operant conditioning was developed in 1938 by B. F. Skinner (1904–1990), a psychologist at Harvard University, and it remained a popular approach for influencing behavior into the early 2000s. Although the model was originally applied to animal learning (rats, pigeons; Skinner, 1938; 1963), it was later widely applied in educational settings. The model involves the operations of positive reinforcement, negative reinforcement, extinction, response-cost punishment, and punishment with aversives, each of which is described below in this entry.


Gredler (2005) offers the following assumptions as the foundation of operant conditioning:

Learning is behavioral change (meaning that observers conclude that learning has occurred when behavior changes).

Behavioral change (i.e., learning) is related to changes in environmental events (these events being precursors of and consequences of an action).

One can determine relationships between behavior and the environment only if the characteristics of the behavior and the experimental conditions under which it occurs are defined in physically observable terms and observed under controlled conditions (the process must be systematic, observable, and controlled).

The only acceptable sources of information about the causes of specific behaviors are data from the experimental study of behavior (people must observe both the behavior and its causes).

The appropriate data source is the behavior of the individual (rather than the observers' expectations or inferences).

Of prime importance in the operant conditioning model is the focus on relationships between environmental events and behavior defined in physical terms, with an avoidance of the use of inner states as explanations.


There are four contexts or types of operant conditioning: positive reinforcement, negative reinforcement, negative (or response-cost) punishment, and positive punishment (or punishment with aversives) (Landrum & Kauffman, 2006). The last three of these are all associated with aversiveness or aversive control, while only one, positive reinforcement, is associated with positive control. Thus, researchers can distinguish between two variations of the model, a positive one and a negative one. (There is also extinction, which occurs when reinforcement following a behavior is discontinued, causing the behavior itself to eventually be discontinued.)

In the positive version of the model, a person who emits a desired behavior (e.g., raising her hand and waiting to be called on) receives something good—a positive consequence (referred to as positive reinforcement). This may be a smile or praise or a piece of candy. The result of the reinforcement is that the behavior is strengthened, that is, its likelihood of subsequent occurrence increases. This represents a positive form of control.

In the negative version of the model there are three possible consequences. One is to avoid something bad (negative reinforcement). If a student raises her hand and waits to be called on, rather than speaking out, there is no positive consequence, only the avoidance of a negative one. A second is to receive something bad (punishment with aversives), which may take the form of being yelled at or ridiculed, hence reducing the tendency to speak out (or, perhaps, just suppressing it temporarily). The last negative approach, response-cost punishment, means being deprived of something good: a previously earned reinforcer is removed because of an undesirable behavior such as talking out in class rather than raising one's hand and waiting to be called on (Walker, Shea, & Bauer, 2004). Such punishment might take the form of being placed in time-out or sent to the principal's office. These three approaches all represent aversive control, which may be associated with anxiety and fear (Skinner, 1953), and they may not actually diminish the strength of the undesirable response.
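The four consequences described above differ along two dimensions: whether something is presented or removed after the behavior, and whether that something is pleasant or aversive. The following is a minimal illustrative sketch of that two-by-two scheme; the function names and the string labels "presented", "removed", "pleasant", and "aversive" are assumptions introduced here for illustration and do not come from the cited sources.

```python
def classify_consequence(change, stimulus):
    """Name the operant contingency for a consequence that follows a behavior.

    change:   "presented" or "removed" (hypothetical labels for this sketch)
    stimulus: "pleasant" or "aversive" (hypothetical labels for this sketch)
    """
    table = {
        ("presented", "pleasant"): "positive reinforcement",
        ("removed", "aversive"): "negative reinforcement",
        ("presented", "aversive"): "punishment with aversives",
        ("removed", "pleasant"): "response-cost punishment",
    }
    return table[(change, stimulus)]


def intended_effect(contingency):
    """Reinforcement is meant to strengthen behavior; punishment to suppress it."""
    if contingency.endswith("reinforcement"):
        return "strengthens"
    return "suppresses"


# Praise after hand-raising: something pleasant is presented.
print(classify_consequence("presented", "pleasant"))
# Losing earned points after talking out: something pleasant is removed.
print(classify_consequence("removed", "pleasant"))
```

The sketch records only the intended effect of each contingency; as noted above, aversive control may merely suppress a response temporarily rather than weaken it.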

Another variation of the model is based on who or what precedes or occasions a response. After repeatedly pairing a response with a stimulus that precedes it, called a discriminative stimulus (SD), the response will only occur in the presence of SD, not in its absence. Such a response is said to be under stimulus control. “A behavior under stimulus control will continue to occur in the presence of the SD, even when reinforcement is infrequent” (Alberto & Troutman, 2006, p. 306). Examples of stimulus control are answering telephones only when they ring (the sound of the ring serving as a discriminative stimulus), driving through intersections when the light is green (the SD), not when it is red (although this is an imperfect SD, because drivers often run red lights), and paying attention in class (a response) when being watched by the teacher (an SD).


Various positive reinforcers can be used to increase the likelihood of desired behavior in the classroom. They appear in the form of (a) consumable (e.g., candy), (b) social (e.g., praise), (c) activity (e.g., time on the computer), (d) exchangeable (e.g., points or stickers), and (e) tangible (e.g., getting to sit in one's favorite chair). Activity reinforcers are among the most educationally relevant, since the activity itself can have educational value, such as doing a jigsaw puzzle or watching an instructional video. However, it is of critical importance that the desired behavior immediately precede the activity reinforcer rather than follow it in order for the reinforcer to strengthen the response (this is called the Premack Principle, after David Premack, its discoverer), and in some cases this may be difficult to arrange, as, for example, when the activity reinforcer is a field trip (Kazdin, 2001).

Various reinforcement schedules (Skinner, 1969) have an effect on educational outcomes by affecting the likelihood of a particular response. A continuous reinforcement schedule, wherein every occurrence of a desired operant response is followed by a reinforcement, is desirable when operant conditioning is first taking place. However, once the desired response occurs on a regular basis, it can be maintained by only occasional or intermittent reinforcement, thereby lessening the load on the teacher.

There are four possible intermittent reinforcement schedules: fixed ratio, fixed interval, variable ratio, and variable interval. In an educational setting (as in most settings), the two variable schedules best maintain the desired behavior, primarily because of their unpredictability. For example, if students were given the opportunity to listen to music, a reinforcement, after handing in some number of completed assignments, they would be more motivated to hand in completed assignments if the number required was not always the same (variable ratio) or the time during which they had to be handed in was not always the same (variable interval). By comparison, in the fixed interval schedule, where the reinforcement is provided after the desired behavior has been performed for a fixed amount of time (say 10 minutes), it does not take students long to realize that they can do nothing for nine and a half minutes and then perform the behavior to get the reinforcement. Similarly, if the fixed ratio is 4:1, students will perform the behavior four times in a row, and then relax after receiving the reinforcement.
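The difference between these schedules can be made concrete with a small simulation. The sketch below is illustrative only: the function names are hypothetical, the variable schedule draws its requirement uniformly around a mean (one simple choice among many), and the interval schedule follows the common convention that the first response made after the interval has elapsed is the one reinforced.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Return the 1-based response numbers reinforced when every
    `ratio`-th response earns reinforcement."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(n_responses, mean_ratio, rng):
    """Reinforce after an unpredictable number of responses.

    Each requirement is drawn uniformly from 1 to 2 * mean_ratio - 1,
    so the requirement averages `mean_ratio` (an assumption of this sketch).
    """
    reinforced = []
    next_at = rng.randint(1, 2 * mean_ratio - 1)
    for i in range(1, n_responses + 1):
        if i == next_at:
            reinforced.append(i)
            next_at = i + rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


def fixed_interval(response_times, interval):
    """Reinforce the first response made once `interval` time units
    have passed since the start or since the last reinforcement."""
    reinforced = []
    available_at = interval
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval
    return reinforced


# Every fourth completed assignment earns music time.
print(fixed_ratio(12, 4))  # [4, 8, 12]

# With a 10-minute fixed interval, responses before minute 10 earn nothing,
# so waiting and responding once at the end is enough.
print(fixed_interval([1, 5, 9.5, 10, 12, 20.5], 10))  # [10, 20.5]

# The variable schedule gives no such cue about when to respond.
print(variable_ratio(20, 4, random.Random(0)))
```

Under the fixed schedules the reinforced positions are fully predictable from the schedule parameters, which is what allows students to pause after each reinforcement; under the variable schedule they are not.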

Operant conditioning is a vehicle for teachers to achieve behavior modification in order to improve classroom management and facilitate learning. Three techniques in particular are employed to facilitate learning: prompting, chaining, and shaping. Prompting involves giving students cues (called discriminative stimuli in the lexicon of operant conditioning) to help them perform a particular behavior. When students are learning to read, a teacher may help them by sounding out a word (just as when actors forget their lines, someone prompts them by saying their next line). Prompting helps to make the unfamiliar become more familiar, but, if used too often, students can become dependent on it, so teachers should withdraw prompts as soon as adequate student performance is obtained (a process called fading). Also, teachers should be careful not to begin prompting students until the students have tried performing a task without extra help.

Learning complex behaviors can also be facilitated through an operant conditioning technique called chaining, which connects simple responses in sequence to form a more complex response that would be difficult to learn all at one time. Each cue or discriminative stimulus leads to a response that then cues the subsequent behavior, enabling behaviors to be chained together. Skinner taught pigeons to steer torpedoes toward enemy vessels in World War II by chaining together responses that adjusted the direction of a torpedo relative to the target as it appeared on a screen. Although the technique was not actually used in the war, trial runs suggested that it would work successfully.

The third, and perhaps most generalizable, technique is called shaping, a process of reinforcing each form of the behavior that more closely resembles the final version. It is used when students cannot perform the final version and are not helped by prompting. Shaping involves gradually changing the response criterion for reinforcement in the direction of the target behavior. If the student is given 10 math problems, for example, and gets three of them right, the student gets a reinforcement. On the next set of problems, the student needs to get six right for a reinforcement, then 10. By shifting the criterion for reinforcement through successive approximations, a student's behavior is shaped in the direction of ultimate success.
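The math-problem example can be traced step by step. The sketch below is a simplified illustration under stated assumptions: the function name is hypothetical, and the rule that the criterion rises only after a reinforced set is one plausible reading of the example rather than a prescribed procedure.

```python
def shaping_session(scores, criteria):
    """Trace reinforcement across successive problem sets.

    scores:   number of problems answered correctly on each set
    criteria: rising thresholds for reinforcement, e.g. [3, 6, 10]

    Returns one (score, criterion, reinforced) tuple per set. The criterion
    moves to the next threshold only after the current one has been met
    (an assumption of this sketch).
    """
    step = 0
    log = []
    for score in scores:
        criterion = criteria[step]
        reinforced = score >= criterion
        log.append((score, criterion, reinforced))
        if reinforced and step < len(criteria) - 1:
            step += 1
    return log


# A student who scores 3, 4, 6, and then 10 out of 10 on four sets:
for score, criterion, reinforced in shaping_session([3, 4, 6, 10], [3, 6, 10]):
    print(score, criterion, reinforced)
# 3 3 True
# 4 6 False
# 6 6 True
# 10 10 True
```

In this trace a score of 4 is not reinforced once the criterion has risen to 6, even though it exceeds the performance that earned reinforcement on the first set; this is the sense in which the criterion, not the absolute score, determines reinforcement.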

According to Landrum and Kauffman, “Despite a rich history and extensive empirical underpinnings, the behavioral perspective on teaching and management is not highly regarded in the education community” (2006, p. 47). Its critics contend it is an unfeeling approach more suited to animals than to humans (Landrum & Kauffman, 2006). Nevertheless, operant conditioning is commonly used in classrooms and is viewed by many teachers as an effective approach to improving classroom practice. It provides teachers with a set of tools for improving classroom management and student learning.


Alberto, P. A., & Troutman, A. C. (2006). Applied behavior analysis for teachers (7th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Gredler, M. E. (2005). Learning and instruction: Theory into practice (5th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Kazdin, A. E. (2001). Behavior modification in applied settings (6th ed.). Belmont, CA: Wadsworth.

Landrum, T. J., & Kauffman, J. M. (2006). Behavioral approaches to classroom management. In C. M. Evertson & C. S. Weinstein (Eds.), Handbook of classroom management: Research, practice and contemporary issues. Mahwah, NJ: Erlbaum.

Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1963). Operant behavior. American Psychologist, 18, 503–515.

Skinner, B. F. (1969). Contingencies of reinforcement. New York: Appleton-Century-Crofts.

Walker, J. E., Shea, T. M., & Bauer, A. M. (2004). Behavior management: A practical approach for educators. Upper Saddle River, NJ: Pearson/Prentice Hall.