Validity and Reliability

The instruments selected should be valid and reliable. Information on validity and reliability studies can be found in the instrument’s technical manual, on the publisher’s Web site, or in test review publications.

Validity refers to the degree to which an instrument measures what it purports to measure. Three types of validity are typically reported.

  • Content Validity- the degree to which the questions on the test adequately cover or are representative of the domain (intelligence, creativity, leadership, etc.) under consideration.
  • Construct Validity- the degree to which an instrument measures the domain or construct that it purports to measure.
  • Criterion-Predictive Validity- the degree to which the test can predict performance on another measure that assesses the same area in a different way.

Reliability refers to the degree to which a test is consistent and stable over time in measuring what it is intended to measure.
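
One common way to quantify stability over time is a test-retest correlation: the same students take the same test twice, and the two sets of scores are correlated. The sketch below assumes that approach; the function name and the sample scores are hypothetical and do not come from any particular instrument.

```python
from math import sqrt


def pearson_r(first_scores, second_scores):
    """Pearson correlation between two administrations of the same test."""
    n = len(first_scores)
    if n != len(second_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_1 = sum(first_scores) / n
    mean_2 = sum(second_scores) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_1) * (b - mean_2)
                for a, b in zip(first_scores, second_scores))
    ss_1 = sum((a - mean_1) ** 2 for a in first_scores)
    ss_2 = sum((b - mean_2) ** 2 for b in second_scores)
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical scores for five students tested on two occasions.
first_administration = [85, 90, 78, 92, 88]
second_administration = [87, 91, 75, 94, 86]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.2f}")
```

A coefficient close to 1.0 suggests that students keep roughly the same relative standing across the two administrations. Technical manuals may also report other reliability estimates that this sketch does not cover.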

Norming Samples

Norming samples should be representative of the most recent census data. For this reason, tests that have not been renormed in more than 10 years should be avoided. The demographics of the students being tested should, to the greatest extent possible, match those of the norming sample.

Types of Scores

Various types of scores are provided on assessments. These might include raw scores, standard scores, grade- and age-equivalent scores, percentile ranks, and stanines.

  • Raw Scores- the number of items answered correctly on the test. These scores are not comparable across tests and generally provide little information since they are not placed in any sort of context.
  • Standard Scores- raw scores that have been converted, using a table provided with the test, so that a student’s performance can be compared to that of others of the same age or grade level. Unlike percentile ranks, standard scores are expressed in standard deviation units on a normal curve and are comparable across tests.
  • Grade- and Age-Equivalent Scores- estimates that describe a student’s score in terms of the grade or age level at which the student is functioning. These scores are often misinterpreted. For example, if a 4th grade student receives a grade-equivalent score of 8.1 on the reading portion of a grade-level achievement test, this does not mean the student is reading at the 8th grade level. It means that this student reads 4th grade material as well as the average 8th grader would read it.
  • Percentile Ranks- the percentage of others in the norming sample that the student outperformed on the test. For example, a person scoring at the 88th percentile did better than 88% of those in the norming sample.
  • Stanines- short for standard nine, these scores range from 1 to 9. A stanine of 1, 2, or 3 is considered below average whereas stanine scores of 7, 8, or 9 are above average.
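
The relationships among these score types can be sketched in a few lines of code. This is only an illustration under stated assumptions: published tests supply their own conversion tables, so the linear conversion to a scale with mean 100 and standard deviation 15, the function names, and the small norming sample below are all hypothetical. The stanine cutoffs follow the conventional distribution of 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores per band.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Cumulative percentile boundaries between stanines 1|2, 2|3, ..., 8|9.
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_rank(raw_score, norm_scores):
    """Percentage of the norming sample scoring below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100.0 * below / len(norm_scores)


def standard_score(raw_score, norm_scores, scale_mean=100, scale_sd=15):
    """Assumed linear conversion: z-score rescaled to the chosen scale."""
    z = (raw_score - mean(norm_scores)) / pstdev(norm_scores)
    return scale_mean + scale_sd * z


def stanine(pr):
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect_right(STANINE_CUTOFFS, pr) + 1


# Hypothetical norming sample of raw scores.
norm_sample = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
raw = 24

pr = percentile_rank(raw, norm_sample)
print("percentile rank:", pr)
print("standard score:", round(standard_score(raw, norm_sample), 1))
print("stanine:", stanine(pr))
```

In this sketch the same raw score of 24 is reported three ways, which shows why a raw score alone carries little information until it is placed against a norming sample. A percentile rank of 88, as in the example above, falls in stanine 7 under these cutoffs.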