Standardized Testing

Updated on Dec 23, 2009

Standardized testing involves using testing instruments that are administered and scored in a pre-established, consistent manner. There are two types of standardized testing instruments: norm-referenced tests and criterion-referenced tests (IRA/NCTE Joint Task Force on Assessment, 1994). Norm-referenced instruments yield scores that compare the examinee's performance to that of a representative sample (the normative group) of same-age or same-grade peers. Criterion-referenced instruments compare an examinee's score to a predetermined criterion (such as a school curriculum).

Norm-referenced Tests. Academic achievement tests and cognitive tests, commonly referred to as IQ tests, are well-known examples of norm-referenced, standardized tests given to individuals. Most norm-referenced test batteries include a manual and/or computerized scoring program that (1) provides information regarding the normative, or standardization, sample; (2) provides information on reliability and validity; (3) specifies the language and presentation of items, along with administration and scoring information; and (4) provides guidelines for the interpretation of the test results. Norm-referenced test performance is generally summarized as one or more types of scores, such as age equivalents, grade equivalents, percentile ranks, stanines, scaled scores, indexes, clusters, or quotients (Mercer, 1997). Newer editions of test instruments follow an item-response-theory procedure in their development, which can allow for a new type of score. These scores (variously called W scores, growth scores, change-sensitive scores, or growth score values) allow an examinee's performance to be measured against that same examinee's earlier performance, because the scale is anchored to the difficulty level of the items rather than to the normative group.
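As a rough illustration of how some of these score types relate to a normative sample, the following Python sketch converts a raw score into a percentile rank, a standard score, and a stanine. It is a minimal sketch under stated assumptions, not the procedure of any published test: the normative sample is invented, the function names are hypothetical, the standard score assumes the common mean-100, SD-15 metric, and the stanine uses the usual approximation of rounding 2z + 5 and limiting the result to the range 1 through 9. Published tests derive these scores from large standardization samples and norm tables.

```python
from statistics import mean, stdev

def percentile_rank(score, norm_scores):
    """Percent of the normative sample scoring below `score` (ties count as half)."""
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)

def z_score(score, norm_scores):
    """Distance of `score` from the normative mean, in standard deviation units."""
    return (score - mean(norm_scores)) / stdev(norm_scores)

def standard_score(z, scale_mean=100, scale_sd=15):
    """Rescale z to a standard-score metric (assumed here: mean 100, SD 15)."""
    return round(scale_mean + scale_sd * z)

def stanine(z):
    """Approximate stanine: round 2z + 5, then clamp to the range 1..9."""
    return max(1, min(9, round(2 * z + 5)))

# Hypothetical raw scores from a (very small) normative group of same-age peers.
norm_sample = [8, 10, 11, 12, 12, 13, 14, 15, 16, 19]
raw = 15

z = z_score(raw, norm_sample)
print(percentile_rank(raw, norm_sample))  # 75.0
print(standard_score(z))                  # 109
print(stanine(z))                         # 6
```

All three results describe the same raw score of 15; they differ only in how the examinee's standing relative to the normative group is expressed.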

Criterion-referenced Tests. Criterion-referenced tests are similar to norm-referenced tests in terms of administration, scoring, and format; however, they differ in terms of interpretation. Criterion-referenced test interpretation involves evaluating an examinee's performance in relation to a specific criterion. For instance, if a criterion were “the ability to subtract single digit numbers,” the interpretation would involve indicating simply whether or not the student answered the administered subtraction problem items correctly. A norm-referenced test interpretation, however, would involve whether this student correctly answered more questions compared to others in the normative group. Generally, criterion-referenced performance is summarized as percentage correct or represented as a grade-equivalent score (Weaver, 1990; Witt, Elliot, Daly, Gresham, & Kramer, 1998).

Criterion-referenced tests are sometimes misunderstood. Although these types of tests can involve the use of a cutoff score (e.g., the point at which the examinee passes if the score exceeds this number), the cutoff score is not the criterion. Rather, the criterion refers to the content area domain that the test is intended to assess (Witt et al., 1998).
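The distinction between the criterion and the cutoff score can be made concrete with a short Python sketch. This is an illustrative sketch only: the item responses, the 75 percent cutoff, and the function names are hypothetical. Note that the criterion is the content domain (here, single-digit subtraction), while the cutoff is merely the threshold applied to the percentage correct.

```python
def percent_correct(responses):
    """Percentage of administered items answered correctly.

    `responses` is a list of booleans, True where the item was correct.
    """
    return 100.0 * sum(responses) / len(responses)

def passes(responses, cutoff_percent):
    """True if the percentage correct exceeds the cutoff score."""
    return percent_correct(responses) > cutoff_percent

# The criterion: the content domain the items are drawn from.
criterion = "subtracting single-digit numbers"

# Hypothetical results on ten subtraction items (8 correct, 2 incorrect).
responses = [True, True, False, True, True, True, True, False, True, True]

# Hypothetical cutoff score; it is a decision threshold, not the criterion.
cutoff = 75.0

print(criterion, percent_correct(responses), passes(responses, cutoff))
# subtracting single-digit numbers 80.0 True
```

No normative group appears anywhere in this calculation, which is what separates this interpretation from a norm-referenced one.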


The quality, or adequacy, of any standardized testing instrument, whether norm-referenced or criterion-referenced, is established empirically through both reliability and validity studies. Professional testing associations or organizations often publish standards that practitioners can refer to when evaluating the quality of a testing instrument. For instance, in the field of psychometrics, psychologists and other related practitioners can consult the Standards for Educational and Psychological Testing for guidance on test development or construction, fairness in testing, and testing applications. Additionally, practitioners can learn about the psychometric properties (e.g., reliability, validity) of tests by consulting sources such as the Mental Measurements Yearbooks and Tests in Print, both available from the Buros Institute of Mental Measurements and housed within most major libraries (Mercer, 1997), or Test Critiques, available from Pro-Ed Publishers.


Using standardized tests to conduct assessments is advantageous for several reasons. First, because standardized tests yield quantifiable information (scores, proficiency levels, and so forth), results can be used in screening programs (e.g., identifying those students in need of further assessment). Second, standardized test results provide information regarding an examinee's areas of strength and weakness. Third, standardized test results allow a student to be compared to age or grade peers. Finally, standardized tests can be used to assess students' progress over time (e.g., readministering tests after the application of an intervention or following the institution of a remedial program) (IRA/NCTE Joint Task Force on Assessment, 1994; Witt et al., 1998). The most important advantage of a test administered in a standardized fashion is that its results can be documented and empirically verified. This in turn allows the results to be interpreted and conclusions about an individual's skills to be generalized.


Although standardized testing is beneficial in some situations, its use has been criticized, specifically because such measures fail to inform instruction adequately. Standardized administration may also not be possible for some students with disabilities. Some of these students can take a test in the established standardized way if given accommodations. Some accommodations, however, alter the trait or concept that the test is intended to measure, in which case they become modifications. Other common criticisms or disadvantages of standardized tests are as follows: (1) standardized test items frequently are unrelated to the tasks and behaviors required in the classroom setting; (2) standardized test results reflect behavior or ability measured at a single point in time and, as such, are greatly influenced by noncognitive factors (e.g., fatigue, attention, and so forth); (3) standardized test results do not provide the type of information required for making curricular modifications or instructional change; and (4) standardized administration procedures often prevent the examiner from obtaining useful information regarding the conditions under which the examinee may be able to improve performance (e.g., could a student with a language deficit benefit from clarification of test directions?) (Fuchs & Fuchs, 1986; Haywood & Tzuriel, 1992; Quinto & McKenna, 1977; Tzuriel, 2001; Tzuriel & Samuels, 2000).


Partly due to the criticisms of standardized testing and the need to generate information that can more directly guide instruction, alternatives to standardized testing have arisen. While there are various alternatives, three of the most commonly used alternatives are curriculum-based assessment, dynamic assessment, and alternative, or portfolio-based, assessment approaches.

Curriculum-Based Assessment. Although curriculum-based assessment (CBA) falls under the umbrella of criterion-referenced testing, it is thought of as an alternative to traditional, standardized norm-referenced academic testing. Curriculum-based assessment refers to a measurement method that relies on “direct observation and recording of a student's performance in the local curriculum as a basis for gathering information to make instructional decisions” (Deno, 1987, p. 41). CBA has also been referred to as direct assessment of the mastery of academic skills, and although models of CBA may differ, all share the common foundational assumption that one should assess what is taught, or more simply, “test what one teaches.” Typically, CBA approaches involve repeated assessment of specific academic skills (Lentz, 1988). In each academic area, probes are developed (e.g., short reading passages, samples of math computation items, and brief spelling word lists) and used to collect student performance data. The curricular materials from the examinee's immediate learning environment are used to develop CBA probes. Given this, CBA provides a structured method for evaluating students' performance on curricular assignments used in their actual academic setting.

Generally, a student's responses are evaluated in terms of speed or proficiency, as well as for accuracy. Performance criteria are then developed to determine acceptable levels of student performance or mastery (Witt et al., 1998). Normative sampling is one procedure employed for establishing mastery criteria (Idol, 1993). This procedure involves collecting samples of average or acceptable student performance in the general education setting and using such samples to decide what the absolute mastery criteria should be. In some cases, a referred student may be so far below the level of acceptable performance that a type of changing criterion design has to be implemented. In this type of design, the mastery criteria are still tied to the classroom average, but they are initially lowered for subsequent instruction and then made progressively more stringent until the student reaches the classroom average.
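The two procedures described above, normative sampling and a changing criterion design, can be sketched in Python as follows. This is a simplified illustration under several assumptions that do not come from the sources cited: performance is measured in words read correctly per minute, the mastery criterion is taken as the median of the peer samples, and the interim criteria rise in equal steps. All names and numbers are hypothetical.

```python
from statistics import median

def mastery_criterion(peer_scores):
    """Normative sampling: derive the mastery criterion from peer performance.

    Assumption: the median of the sampled peers stands in for
    'average or acceptable' classroom performance.
    """
    return median(peer_scores)

def changing_criteria(baseline, target, steps):
    """Interim criteria rising in equal steps from the baseline to the target."""
    increment = (target - baseline) / steps
    return [round(baseline + increment * i, 1) for i in range(1, steps + 1)]

# Hypothetical words-correct-per-minute samples from general education peers.
peer_wcpm = [62, 70, 75, 80, 88]
criterion = mastery_criterion(peer_wcpm)

# Hypothetical referred student reading 35 words correct per minute.
interim = changing_criteria(baseline=35, target=criterion, steps=4)

print(criterion)  # 75
print(interim)    # [45.0, 55.0, 65.0, 75.0]
```

Each interim value serves as the temporary mastery criterion for one phase of instruction; the final value equals the classroom-derived criterion.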

Overall, the basic assumption of a CBA approach is that in evaluating students' progress in reading and writing, researchers should observe them reading and writing in their academic environment, and should collect such data often so that they can efficiently ascertain whether a student is progressing adequately or falling behind. The ability to generalize from the results of CBA tests is limited.

Dynamic Assessment. Dynamic assessment refers to a particular type of learning assessment that involves the use of an active teaching process (Lidz, 1987). The goal of this teaching process is to “modify” an individual's cognitive functioning and to observe subsequent changes in the examinee's learning and use of problem-solving strategies. The primary goals of dynamic assessment are to (1) assess a student's ability to understand principles underlying a problem and to use that understanding to generate a solution, (2) assess the type and amount of teaching necessary to teach a student a specific rule or principle, and (3) identify any cognitive deficits and noncognitive factors that assist in explaining performance failures and to determine whether such factors can be modified by teaching alone (Lidz, 1987).

Dynamic assessment directly contrasts with static assessment procedures (i.e., standardized assessment), which involve examiners presenting items to examinees without any guidance, assistance, or any other intervention designed to change or improve the examinee's performance. A static test is usually based on a “question, record, and score” format wherein the examiner presents the question, records the examinee's response, and awards a prescribed number of points, based on the examinee's given response.

The difference between static and dynamic assessment approaches stems from the paradigms from which they emerged. Static assessment generally involves “passive acceptance,” wherein a child's deficits or disabilities are accepted and the environment is modified to help the child work within any identified limitations (Haywood, 1997). In contrast, dynamic assessment is based on “active modification,” wherein a concentrated effort is made to remediate any identified deficit or at least provide the child with compensatory strategies to circumvent the impact of any identified weakness (Haywood & Tzuriel, 1992).

The inherent limitations of standardized tests have motivated, in part, the development of dynamic assessment approaches. Static assessments have been criticized widely. Major criticisms are that (1) static tests do not provide important information about a child's learning processes or the mediational strategies that can facilitate learning, (2) they do not result in clear recommendations for prescriptive teaching or remedial activities, and (3) they do not focus on noncognitive factors that influence an examinee's performance on standardized, cognitive assessments. Compared to static assessment, dynamic assessment is intended to provide information about (1) examinees' overall learning ability and how they learn, (2) specific cognitive factors that can assist in problem solving and can help the examiner understand potential factors related to academic successes and failures, (3) teaching strategies that seem to work for a given examinee, and (4) noncognitive factors that exert beneficial or negative influences on cognitive performance (e.g., heightened anxiety can impact performance on tests of perceptual speed).

Lev Vygotsky's (1896–1934) concept of the zone of proximal development (ZPD) and Reuven Feuerstein's theory of mediated learning experience (MLE) served as the primary foundations for most of the dynamic assessment approaches (Feuerstein, Rand, & Hoffman, 1979; Tzuriel, 1999). It is important to note that dynamic assessment is intended to supplement, not replace, standardized testing. It is a broad assessment approach, rather than a particular test. Some standardized test batteries have features of dynamic assessment (e.g., KABC-II contains teaching items that can be used with examinees before the administration of sample items).

Disadvantages of dynamic assessment include the following: (1) implementing a dynamic assessment approach requires considerable time and skill, (2) the extent to which cognitive modifiability can occur across all cognitive domains is largely unknown, and (3) validating dynamic assessment is far more complex than validating static assessment approaches because dynamic assessment has broad goals (e.g., assess initial performance, assess cognitive functions, identify any deficit functions, determine the nature and amount of remediation needed to address the deficit, identify noncognitive factors and the role they play in cognitive performance, and identify the parameters or goals for future change). In addition, because the examiner is allowed to administer a test instrument in a non-standardized way, the ability to replicate the test results is more limited, owing to the potentially inconsistent nature of test administration. Overall, to validate dynamic assessment approaches, one needs to develop criterion variables that measure changes that are directly relatable to any applied cognitive intervention.

Alternative, or Portfolio-Based Assessment. Another type of assessment is alternative assessment, or portfolio-based assessment. This type of assessment is often longitudinal and very idiosyncratic in nature, as teachers, students, and at times even parents select pieces from a student's work over several years (four years, on average) to demonstrate what learning progress has occurred. Alternative assessments encourage all relevant individuals (teachers, students, parents) to become active participants in the documentation of the learning process (Quinto & McKenna, 1977). Although the terms portfolio assessment and performance assessment sound similar, the latter involves looking at actual student work produced over time and the processes by which the students produced such work, be it individually or collaboratively. In contrast, the former involves focusing on the products and processes of learning as well as other factors, such as the students' interest in learning, their concept of themselves as readers and/or writers, and their ability to evaluate their own work and set learning goals for themselves. Examples of portfolio- or performance-based assessment include tape-recorded samples of students' oral reading, results of reading interviews focused on identifying students' understanding of the reading process (e.g., strategies they used to decode problem words or comprehend text), records of students' reading lists to gain information regarding reading interests, and so forth.

Overall, alternative assessment is derived from students' daily classroom work. Minimally, it involves collecting student work samples, recording observations of learning processes, and having students and/or teachers evaluate students' processes and products. While such information can be summarized quantitatively for grading purposes, the primary goal of such assessment is to improve both teaching methods and students' learning.

Other Forms of Assessment and Testing. While static assessment approaches such as norm-referenced testing and criterion-referenced testing, curriculum-based assessment, and dynamic assessment approaches are used to varying extents in academic settings, other assessment techniques are also used, including interviews, anecdotal records, rating scales, classroom quizzes and tests, observation, and self-report techniques (Mercer, 1997; National Commission on Testing and Public Policy, 1990).

Each form of testing gathers information regarding a student or group of students and allows for a different type of interpretation and use of the resulting data. Like the sides of a cut diamond, each shines in areas in which it is strong but is only a limited facet of the whole.


American Psychological Association, National Council on Measurement in Education, & American Educational Research Association. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Deno, S. L. (1987). Curriculum-based measurement. Teaching Exceptional Children (20), 41.

Feuerstein, R., Rand, Y., & Hoffman, M. B. (1979). The dynamic assessment of retarded performers: The learning potential assessment device: Theory, instruments, and techniques. Baltimore, MD: University Park Press.

Fuchs, L. S., & Fuchs, D. (1986). Linking assessment to instructional intervention: An overview. School Psychology Review (15), 319–322.

Haywood, H. C. (1997). Interactive assessment. In R. L. Taylor (Ed.), Assessment of individuals with mental retardation (pp. 108–129). San Diego, CA: Singular.

Haywood, H. C., & Tzuriel, D. (1992). Interactive assessment. Berlin: Springer.

Idol, L. (1993). Special educator's consultation handbook. Austin, TX: Pro-Ed.

IRA/NCTE Joint Task Force on Assessment (1994). Standards for the assessment of reading and writing. Newark, DE: International Reading Association, and Urbana, IL: National Council of Teachers of English.

Lentz, F. E. (1988). Direct observation and measurement of academic skills: A conceptual review. In E. S. Shapiro & T. R. Kratochwill (Eds.), Behavioral assessment in schools (pp. 76–120). New York: Guilford.

Lidz, C. S. (1987). Dynamic assessment. New York: Guilford.

Mercer, C. D. (1997). Students with learning disabilities (5th ed.). Upper Saddle River, NJ: Prentice-Hall.

National Commission on Testing and Public Policy (1990). From gatekeeper to gateway: Transforming testing in America. Chestnut Hill, MA: Boston College.

Quinto, F., & McKenna, B. (1977). Alternatives to standardized testing. Washington, DC: National Education Association, Division of Instruction and Professional Development.

Tzuriel, D. (1999). Parent-child mediated learning transactions as determinants of cognitive modifiability: Recent research and future directions. Genetic, Social, and General Psychology Monographs, 109–156.

Tzuriel, D. (2001). Dynamic assessment of young children. New York: Kluwer Academic/Plenum.

Tzuriel, D., & Samuels, M. T. (2000). Dynamic assessment of learning potential: Inter-rater reliability of deficient cognitive functions, type of mediation, and non-intellective factors. Journal of Cognitive Education and Psychology, (1), 41–64.

Weaver, C. (1990). Understanding whole language: From principles to practice. Portsmouth, NH: Heinemann.

Witt, J. C., Elliot, S. N., Daly III, E. J., Gresham, F. M., & Kramer, J. J. (1998). Assessment of at-risk and special needs children (2nd ed.). Boston: McGraw-Hill.
