Reliability indicates the consistency or stability of test performance and is one of the most important considerations when selecting tests and other assessment tools. A test must be constructed so that examiners can administer the test with minimal errors and can interpret the performance of students with confidence.

The assessment process is subject to error from many sources. Errors in measurement can stem from the testing environment, the student, the test, and the examiner. Sources of error in the testing environment include:

  • Noise distractions
  • Poor lighting
  • Uncomfortable room temperature

Sources of error associated with the student include:

  • Hunger
  • Fatigue
  • Illness
  • Difficulty in understanding test instructions
  • Difficulty in understanding or interpreting language used

Sources of error stemming from the test include:

  • Ambiguously worded questions
  • Biased questions
  • Different interpretations of the wording of test questions

An examiner who is not prepared or who incorrectly interprets administration or scoring guidelines contributes to measurement errors. Sources of error associated with test administration include:

  • Unclear directions
  • Difficulty in achieving rapport
  • Insensitivity to a student's culture, language, preferences, or other characteristics
  • Ambiguous scoring
  • Errors associated with recording information about the student

Reliability information reported in test manuals should be considered carefully. Although some books and journal articles publish evaluations of tests, tests are not awarded official "seals of approval"; to be useful, a test must meet certain standards. Three professional organizations, the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, have jointly published the Standards for Educational and Psychological Testing (1999), which provide criteria for evaluating tests, testing practices, and the effects of test use on individuals.

The 1999 edition of the Standards departs from more traditional thinking about reliability. It characterizes each test by its "scoring procedure that enables the examiner to quantify, evaluate, and interpret behavior or work samples" and then states: "Reliability refers to the consistency of such measurements when the testing procedure is repeated on a population of individuals or groups" (p. 25).

Test developers are responsible for reporting evidence of reliability, and they convey it in various ways. Test users and consumers must weigh this evidence when deciding whether an assessment instrument is suitable. Although no single approach is preferred, educators should be familiar with all of them in order to judge the usefulness of an instrument. Reliability evidence may be reported as (1) one or more correlation coefficients, (2) variances or standard deviations of measurement errors, or (3) test information functions based on item response theory (IRT).