The term single-case designs refers to a family of research designs that are true experiments. They can be used to infer causal relationships between an intervention program (e.g., in education, therapy, rehabilitation) and change in client functioning and behavior. The unique feature of these designs is the capacity to conduct experimental investigations with the single case, that is, one subject or one group. However, the designs can evaluate the effects of interventions with multiple subjects and groups.


There are key features of the designs that are pivotal for drawing causal inferences. As with designs in the quantitative-group tradition, control and comparison of conditions, prediction, and testing of predictions are all central. These are accomplished in novel ways in single-case designs.

Logic of the Designs. The designs draw causal inferences based on how the assessment information is used over the course of the design. Each design includes phases or periods of time (e.g., week, month) in which a baseline (no intervention) or an intervention condition is presented (e.g., to one or a few individuals or a group). The designs usually begin with a baseline phase, a period of observations before the intervention is implemented. The data in this phase have two purposes: to describe current performance and to predict what performance is likely to be in the immediate future without an intervention. After the baseline pattern is clear, an intervention is implemented in a new phase while data continue to be collected. Data in the intervention phase (and any subsequent phases) have three purposes: to describe performance, as in baseline; to predict what performance would be likely if the intervention were to continue; and to test the prior prediction from baseline. Baseline is used to predict likely performance; if the intervention is having any effect in the next phase, the data ought to depart from that projected level of performance.
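The describe, predict, and test logic can be illustrated with a minimal sketch. It assumes, purely for illustration, that the baseline prediction is a straight line fitted by least squares and extended into the intervention phase; the function names and the data are hypothetical, and in practice the projection is usually judged from the graphed data rather than computed.

```python
from statistics import mean

def fit_line(ys):
    """Least-squares line through sessions 0, 1, 2, ...; returns (intercept, slope).

    Requires at least two observations.
    """
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def departures_from_baseline(baseline, intervention):
    """Observed minus projected value for each intervention session."""
    intercept, slope = fit_line(baseline)      # describe the baseline
    start = len(baseline)
    return [obs - (intercept + slope * (start + i))   # test the prediction
            for i, obs in enumerate(intervention)]

# Hypothetical data: disruptive acts per day.
baseline = [10, 11, 10, 11, 10, 11]
intervention = [7, 6, 5, 5, 4, 4]
print(departures_from_baseline(baseline, intervention))
```

Departures that are consistently far from zero suggest that performance did not follow the level projected from baseline.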

Essentially, single-case designs are based on describing, predicting, and testing predictions. The logic of this is exactly like that of experiments in the quantitative-group tradition, in which describing performance without intervention and testing whether the intervention departs from that is achieved by a control group of subjects (e.g., no treatment, waiting list). In single-case and group research methods, the question is whether the intervention (variable, experimental manipulation) made a difference in a way that departs from the control phase or group, respectively.

Assessment Requirements. The most fundamental design requirement is repeated observations of performance over time. The performance of one or more clients is observed on several occasions, usually before the intervention is applied and continuously over the period while the intervention is in effect. Typically, observations are conducted on a daily basis or at least on multiple occasions each week to provide the information to describe, predict, and test predictions, as noted previously. Continuous assessment provides the observations that allow the comparisons of interest (e.g., intervention versus no intervention) within the individual subject.

Because baseline performance is used to predict how the client will behave in the future, it is important that the data are stable. Data stability refers to minimal fluctuation or variability in the subject's performance over time. Excessive variability in the data during baseline or other phases can interfere with drawing conclusions about treatment. Whether the variability is excessive and interferes with drawing conclusions about the intervention depends on many factors, such as the initial level of behavior during the baseline phase and the magnitude of behavior change when the intervention is implemented.
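One rough way to quantify stability is to ask what share of the observations in a phase fall within a band around the phase median. The sketch below assumes a band of 20% of the median; that width, the function name, and the data are illustrative assumptions rather than rules stated here, and the appropriate tolerance depends on the level of behavior and the size of the expected change.

```python
from statistics import median

def fraction_within_envelope(phase, width=0.20):
    """Share of observations within +/- width * median of the phase median.

    The 20% band is an illustrative assumption, not a fixed rule.
    """
    center = median(phase)
    band = width * abs(center)
    return sum(abs(y - center) <= band for y in phase) / len(phase)

# Hypothetical baseline: daily counts of a target behavior.
baseline = [10, 11, 9, 10, 12, 10]
print(fraction_within_envelope(baseline))
```

A high fraction indicates little fluctuation around the typical level; a low fraction signals variability that may obscure any later change.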


There are many single-case designs that vary in the way the intervention is presented and evaluated over time. Three designs are presented here to illustrate the methodology.

ABAB Design. As with all of the designs, continuous observations of performance are made over time for a given client (or group of clients). In the ABAB design, typically two separate phases are alternated over time: the baseline (A phase), when no intervention is in effect, and the intervention (B phase). The A and B phases are then repeated to complete the four phases. The effects of the intervention are clear if performance improves during the first intervention phase, reverts to or approaches original baseline levels of performance when the intervention is withdrawn, and improves again when the intervention is reinstated in the second intervention phase.
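The expected pattern across the four phases can be expressed as a simple check on phase means. This is only a sketch under stated assumptions: comparing means is a crude stand-in for inspecting the full graphed data, and the function name, the lower_is_better flag, and the data are hypothetical.

```python
from statistics import mean

def abab_pattern(a1, b1, a2, b2, lower_is_better=False):
    """Return the four phase means and whether they follow the ABAB pattern:
    improve in B1, move back toward baseline in A2, improve again in B2."""
    means = [mean(p) for p in (a1, b1, a2, b2)]
    sign = -1 if lower_is_better else 1        # make "improvement" mean "larger"
    m_a1, m_b1, m_a2, m_b2 = (sign * m for m in means)
    follows = m_b1 > m_a1 and m_a2 < m_b1 and m_b2 > m_a2
    return means, follows

# Hypothetical data: assignments completed per day.
means, follows = abab_pattern([2, 3, 2], [7, 8, 8], [3, 4, 3], [8, 9, 9])
print(means, follows)
```

If performance fails to revert in the second A phase, the check returns False, which corresponds to the difficulty of drawing a causal conclusion when no reversal occurs.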

The most commonly used version of the ABAB design has been discussed here as a four-phase design that alternates a single intervention with baseline phases. However, designs are available that include more than one treatment and more than four phases or that end with a new phase in which procedures are included to maintain the gains. For example, suppose that the treatment (B1) does not change behavior after the baseline phase. The investigator would not continue the phase but would try another treatment (B2). This latter treatment would constitute a new phase and would probably be implemented later in the design. The design could be represented as an AB1 B2 AB2 design.

As a general rule, problems related to reversing behavior make the ABAB design and its variations undesirable in educational and other applied settings. If a reversal does occur, that may be problematic if the behavior is important for the clients or for those in contact with them. If a reversal does not occur, this raises obstacles in concluding that the intervention led to the change. Yet the power of the design in demonstrating control of an intervention over behavior is very compelling. If behavior can, in effect, be turned on and off as a function of the intervention, this is a potent demonstration of a causal relation. Other designs can also demonstrate a causal relation without using a reversal of conditions.

Multiple-Baseline Designs. These designs evaluate change across different baselines, that is, across two or more behaviors of a given individual, individuals, settings, or time periods. The intervention is introduced to the different baselines at different points in time, that is, in a staggered fashion. Ideally, change occurs when the intervention is introduced in sequence to each of the baselines. The different baselines might, for example, consist of three children in a classroom (or three entire classrooms). Each child's behavior is observed and graphed separately. After baseline observations, the intervention is introduced to one of the children. The other children continue to be observed and remain in a baseline phase. Later the intervention is introduced to the other children in a staggered fashion so that by the end each is receiving the intervention. The effect of the intervention is demonstrated by showing that the behavior of each child changed when and only when the intervention was introduced.
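The "when and only when" requirement can be sketched as two checks per baseline: performance should change once that baseline's own intervention begins, and it should not change while the baseline is still untreated and the intervention begins elsewhere. The sketch assumes all baselines are observed over the same sessions and uses a hypothetical min_change threshold on phase means; both are illustrative simplifications.

```python
from statistics import mean

def multiple_baseline_check(series, starts, min_change):
    """series: name -> observations over the same sessions.
    starts: name -> index of that baseline's first intervention session.
    Returns name -> (changed_at_own_start, stable_while_untreated)."""
    results = {}
    for name, ys in series.items():
        s = starts[name]
        base, treat = ys[:s], ys[s:]
        changed = abs(mean(treat) - mean(base)) >= min_change
        stable = True
        for other, t in starts.items():
            # Did this baseline shift when an earlier baseline was treated?
            if other != name and 0 < t < s:
                if abs(mean(base[t:]) - mean(base[:t])) >= min_change:
                    stable = False
        results[name] = (changed, stable)
    return results

# Hypothetical data for two children; intervention staggered by three sessions.
series = {"child_1": [2, 2, 2, 8, 8, 8, 8, 8, 8],
          "child_2": [2, 2, 2, 2, 2, 2, 8, 8, 8]}
starts = {"child_1": 3, "child_2": 6}
print(multiple_baseline_check(series, starts, min_change=3))
```

A result of (True, True) for every baseline matches the ideal pattern; a False in the second position indicates that behavior changed before the intervention reached that baseline.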

Multiple-baseline designs are user friendly in educational and clinical applications because the intervention is applied in a gradual or sequential fashion across different responses of the individual (or different individuals, or different situations). If the intervention is effective, then it can be extended to all of the other responses for which change is desired. As important, if the intervention is not effective or not effective enough to achieve important changes, it can be altered or improved before it is extended.

Changing-Criterion Design. This design demonstrates the effect of an intervention by showing that behavior changes in increments to match a performance criterion. The design begins with a baseline phase after which the intervention is introduced. When the intervention is introduced, a specific level of performance is chosen as a criterion for the client. The daily criterion may be used as a basis for providing response consequences or an incentive (e.g., token reinforcement). When the performance meets or surpasses the criterion level on a given day (e.g., certain number of cigarettes smoked, number of calories consumed), the response consequence (e.g., tokens) is provided.

A specific criterion usually is invoked continuously for at least a few days. When performance consistently meets the criterion, the criterion is made more stringent (e.g., fewer cigarettes or calories consumed daily). Consequences are provided only for meeting the new criterion on a given day, and the criterion again is changed if the performance meets the criterion consistently. The criterion is repeatedly changed throughout the intervention phase until the terminal goal of the program is achieved. A causal relation between an intervention and behavior is demonstrated if behavior matches a constantly changing criterion for performance over the course of treatment. By implementing a given criterion for at least a few days (or even longer), the behavior shows a step-like effect that is not likely to result from a general incremental change occurring as a function of extraneous events.
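The correspondence between performance and each successive criterion can be tabulated as in the following sketch. The data layout (a list of criterion and daily-observation pairs), the function name, and the numbers are hypothetical, and the summary is only an aid to inspecting the step-like pattern, not a formal test.

```python
from statistics import mean

def criterion_report(subphases, lower_is_better=True):
    """subphases: list of (criterion, [daily observations]) in order.
    Reports, for each subphase, how often the criterion was met and how far
    performance sat from it on average."""
    report = []
    for criterion, days in subphases:
        if lower_is_better:
            met = [d <= criterion for d in days]
        else:
            met = [d >= criterion for d in days]
        report.append({
            "criterion": criterion,
            "days_met": sum(met),
            "days": len(days),
            "mean_gap": mean([d - criterion for d in days]),
        })
    return report

# Hypothetical data: cigarettes smoked per day under successively stricter criteria.
subphases = [(20, [20, 19, 20]), (15, [15, 14, 16]), (10, [10, 10, 9])]
for row in criterion_report(subphases):
    print(row)
```

Performance that stays close to each criterion (a mean gap near zero) and shifts only when the criterion shifts produces the step-like pattern described above.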

As with the multiple-baseline design, the changing-criterion design can be quite compatible with the demands of applied settings. Many therapeutic regimens focus on gradual development of behavior or skills (e.g., improving reading comprehension or participation in activities) or reduction of problematic function (e.g., overcoming anxiety). Shaping these behaviors or gradually exposing individuals to anxiety-provoking situations may proceed in ways that reflect increasing performance criteria as changes become evident. Thus, progress can be monitored and evaluated in a changing-criterion design.


Data evaluation refers to the way the numbers are examined to infer whether there was a veridical intervention effect. Investigators working with single-case designs as a matter of choice often prefer nonstatistical evaluation of the data, a method referred to as visual inspection. Visual inspection depends primarily on examining four characteristics of the data across phases of the design:

  • Changes in means: Consistent changes in means (average) across phases;
  • Changes in level: A shift or discontinuity in the data from the end of one phase to the beginning of the next phase; an index of the immediacy of change;
  • Changes in slope or trend: Changes in the direction (e.g., accelerating or decelerating slope) of behavior as the intervention is applied or withdrawn; and
  • Latency of the change: The more immediate the change after a phase is altered, the more likely the intervention can be inferred to be responsible for change.
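Although visual inspection is a judgment made from graphed data, the four characteristics can be given rough numerical counterparts, as in the sketch below. The operational definitions are assumptions made for illustration: level is taken as the last point of one phase versus the first point of the next, slope as a least-squares slope, and latency as the number of sessions until the first point falls outside the range of the preceding phase.

```python
from statistics import mean

def slope(ys):
    """Least-squares slope over sessions 0, 1, 2, ... (needs two or more points)."""
    n = len(ys)
    x_bar, y_bar = (n - 1) / 2, mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys)) / sxx

def phase_change_summary(before, after):
    """Numerical counterparts of the four visual-inspection characteristics."""
    lo, hi = min(before), max(before)
    # Assumed definition of latency: sessions elapsed before the first point
    # outside the range of the preceding phase; None if that never happens.
    latency = next((i for i, y in enumerate(after) if y < lo or y > hi), None)
    return {
        "mean_change": mean(after) - mean(before),
        "level_change": after[0] - before[-1],
        "slope_change": slope(after) - slope(before),
        "latency": latency,
    }

# Hypothetical data for a baseline phase followed by an intervention phase.
print(phase_change_summary([10, 11, 10, 11], [6, 5, 4, 3]))
```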

Although visual inspection is the most frequently used method of data evaluation, statistical tests are available and often applied to the data. Statistical tests for single-case designs consist of methods (e.g., time-series analyses, randomization tests) not usually taught in the social and biological sciences. Characteristics of single-case data (e.g., autocorrelation, the dependence among successive observations) often preclude the straightforward application of more familiar statistical tests.
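The two ideas named above, autocorrelation and randomization tests, can be sketched briefly. The first function estimates lag-1 autocorrelation. The second is one simple form of randomization test for a two-phase (AB) comparison; it assumes the investigator chose the intervention start point at random from a set of permissible sessions, and its name, the min_phase parameter, and the data are hypothetical.

```python
from statistics import mean

def lag1_autocorrelation(ys):
    """Correlation between each observation and the one that follows it."""
    m = mean(ys)
    num = sum((a - m) * (b - m) for a, b in zip(ys, ys[1:]))
    den = sum((y - m) ** 2 for y in ys)
    return num / den

def start_point_randomization_test(ys, actual_start, min_phase=3):
    """One-sided p-value for an increase from phase A to phase B.

    Valid only if actual_start was drawn at random from the permissible
    start points (each phase keeps at least min_phase observations).
    """
    def diff(s):
        return mean(ys[s:]) - mean(ys[:s])
    candidates = range(min_phase, len(ys) - min_phase + 1)
    observed = diff(actual_start)
    as_extreme = sum(diff(s) >= observed for s in candidates)
    return as_extreme / len(candidates)

# Hypothetical data: five baseline sessions followed by five intervention sessions.
ys = [2, 3, 2, 3, 2, 8, 9, 8, 9, 8]
print(lag1_autocorrelation(ys))
print(start_point_randomization_test(ys, actual_start=5))
```

Because the smallest attainable p-value equals one divided by the number of permissible start points, a series this short cannot yield a small p-value; longer series or more replications are needed for a sensitive test.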


The designs have several strengths. First, they are well suited to evaluating interventions in diverse settings. Most programs in schools, institutions, the home, and the community at large are not evaluated empirically and have no evidence on their behalf, in part because randomized controlled trials are not feasible. Single-case designs permit careful and rigorous evaluation. Second, the designs allow for changes during a study, a feature that is well suited to applied settings. If the intervention is not working well or optimally, modifications can be made, and the design can continue to provide a rigorous evaluation. Third, the designs can address many questions of interest in intervention research beyond the effects of a particular intervention, such as the components of the intervention that contribute to change and the relative effectiveness of two or more interventions.

There are limitations as well. First, when only one or a few subjects are used, there is difficulty in identifying characteristics (moderators) that might explain why some individuals respond better than others or do not respond. The sample is inherently too small to conduct post hoc analyses of characteristics (e.g., by age, sex, ethnicity) that might influence responsiveness to treatment.

Second, an oft-cited but probably misunderstood concern or limitation is the extent to which the results with one or a few subjects can generalize to others not included in the study. This concern has not proven to be an issue. Also, the concern reflects a misunderstanding of group research; when means are compared one has no idea within a study how many individuals responded. In addition, unless the group was randomly selected from a population, generalizing beyond the sample is a problem. Tests of generality invariably require replication. The limitation of single-case designs is not generality of the effects but identifying the dimensions or categories which may influence the extent to which the intervention exerted impact.

Finally, the use of visual inspection can be a limitation of the designs. When the visual inspection criteria are not clearly met, agreement on the interpretation of the data declines. Studies of how individuals invoke the criteria for visual inspection have shown that judges, even when experts in the field, often disagree about particular data patterns and whether the effects were reliable.


Single-case designs have been used extensively in educational settings, from preschool through college. There is a special role for these designs in schools because school is often a place in which diverse programs are implemented at different levels (e.g., classrooms, schools, districts, and states and provinces). Among the many examples are programs that focus on academic skills (e.g., reading comprehension), study habits (e.g., homework compliance and completion), classroom deportment (e.g., decreases in disruptive behavior), risky behaviors among adolescents (e.g., substance use, unprotected sex), skill acquisition (e.g., voice, musical instrument), and safety (e.g., driving). It is not feasible to even consider randomized trials to evaluate such programs. Single-case designs are quite useful for evaluation in general and in the many situations in which there is not, or cannot be, a control group. Single-case designs and programs in educational (and other institutional) settings are a natural combination because one can develop the program by examining its impact on student functioning, make changes during the evaluation to improve the program, and identify causal relations between interventions and outcomes without the constraints of large samples, random assignment, and control conditions.

Single-case designs are rarely taught in graduate training in the social and biological sciences. This is unfortunate because people in education, psychology, counseling, medicine, and other disciplines frequently are interested in evaluating interventions at the level of individuals, groups, and institutions. The designs can be used in applied settings and hence serve as a way to translate laboratory findings to real-world settings as well as to identify promising interventions that might warrant further research in laboratory settings.

