Validity and Reliability
The instruments selected should be valid and reliable. Information relating to validity and reliability studies can be found in the instrument’s technical manual, on the publisher’s Web site, or in test review publications.
Validity refers to the degree to which an instrument measures what it purports to measure. Three types of validity are typically reported.
- Content Validity- the degree to which the questions on the test adequately cover or are representative of the domain (intelligence, creativity, leadership, etc.) under consideration.
- Construct Validity- the degree to which an instrument measures the domain or construct that it purports to measure.
- Criterion-Predictive Validity- the degree to which the test can predict performance on another measure that assesses the same area in a different way.
Reliability refers to the degree to which a test is consistent and stable over time in measuring what it is intended to measure.
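One common way to put a number on consistency over time is a test-retest coefficient: the correlation between scores from two administrations of the same test to the same students. The sketch below is illustrative only. The function name `pearson_r` and the score lists are hypothetical, and a publisher's technical manual may report other reliability estimates.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    spread_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cross / (spread_first * spread_second)


# Hypothetical raw scores for the same five students, tested twice.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
print(round(pearson_r(first_administration, second_administration), 2))
```

A coefficient close to 1 suggests that students keep roughly the same relative standing from one administration to the next.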
Norming Samples
Norming samples should be representative of the most recent census data; for this reason, tests that have not been renormed in more than 10 years should be avoided. The demographics of the students being tested should, to the greatest extent possible, match those of the norming sample.
Scores
Various types of scores are provided on assessments. These might include raw scores, standard scores, grade- and age-equivalent scores, percentile ranks, and stanines.
- Raw Scores- the number of items answered correctly on the test. These scores are not comparable across tests and generally provide little information since they are not placed in any sort of context.
- Standard Scores- raw scores that have been converted, using a table provided with the test, so that a student’s performance can be compared to that of others of the same age or grade level. Unlike percentile ranks, standard scores are expressed in standard deviation units on a normal curve and are comparable across tests.
- Grade- and Age-Equivalent Scores- estimates that describe a student’s score in terms of the grade or age level at which the student is functioning. These scores are often misinterpreted. For example, if a 4th grade student receives a grade-equivalent score of 8.1 on the reading portion of a grade-level achievement test, this does not mean the student is reading at the 8th grade level. It means that this student reads 4th grade material as well as the average 8th grader would read it.
- Percentile Ranks- indicate the percentage of others in the norming sample whom the student outperformed on the test. For example, a person scoring at the 88th percentile did better than 88% of those in the norming sample.
- Stanines- short for standard nine, these scores range from 1 to 9. A stanine of 1, 2, or 3 is considered below average; 4, 5, or 6 is average; and 7, 8, or 9 is above average.
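The relationships among these score types can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not any publisher's procedure: real tests supply their own conversion tables, the mean-100, SD-15 scale is only one common choice, the stanine cutoffs are the conventional cumulative percentages (4, 11, 23, 40, 60, 77, 89, 96), and all function names and sample data are hypothetical.

```python
from bisect import bisect_right

# Conventional cumulative-percentage boundaries between stanines 1..9 (assumed).
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw - norm_mean) / norm_sd


def standard_score(z, scale_mean=100.0, scale_sd=15.0):
    """Place a z-score on a reporting scale (mean 100, SD 15 assumed here)."""
    return scale_mean + scale_sd * z


def percentile_rank(score, norm_scores):
    """Percentage of the norming sample that scored below the given score."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def stanine_from_percentile(pr):
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    if not 0 <= pr <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    # Each cutoff at or below pr moves the score up one stanine.
    return bisect_right(STANINE_CUTOFFS, pr) + 1


# Hypothetical norming sample of ten raw scores.
norm_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(percentile_rank(5, norm_sample))         # share of the sample scoring below 5
print(standard_score(z_score(115, 100, 15)))   # raw 115 on a mean-100, SD-15 norm
print(stanine_from_percentile(88))             # stanine for the 88th percentile
```

How a score that falls exactly on a cutoff is assigned varies by convention; this sketch places it in the higher stanine.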
Reprinted with the permission of Duke University. © 2008 Duke University Talent Identification Program.