Norm-referenced scoring is the process of comparing one person's score with the scores of a group in order to determine that person's relative standing within the group in the area being tested (Thorndike, 2005). In other words, a score is interpreted relative to the performance of others. Norm-referenced scoring is therefore a system of rank ordering and a way to give meaning to raw scores. On its own, norm-referenced scoring does not indicate mastery of, or competency in, skills. The group against which a person is compared is termed the normative group or normative sample. For the scores to be meaningful, the normative group should be representative of the person being compared against it (e.g., in gender, race, geographical region, age, acculturation, and language).
Several key factors should be considered before using a norm-referenced assessment. One important factor is that the norming is recent, meaning that the data from the normative sample were collected relatively recently. A good rule is to make sure that the norming data are no more than 10 years old. The normative group should also have been fairly large. Other important factors to consider are the reliability and validity of the scores that are given. Reliability refers to the precision of scores. A highly reliable score is one that the tested person would achieve again if the same test were given a second time. Validity refers to the idea that the score truly measures what it purports to measure. Reliability and validity are reported on a scale from zero to one. The closer the reported number is to one, the more reliable or valid the measure.
Norm-referenced scores are derived scores that can be reported in multiple forms. Raw scores (the actual number of items answered correctly) are not reported in norm-referenced scoring but are instead used to derive all other types of scores that can be reported. Norm-referenced scores can be reported in either a developmental format or a relative standing format. Both formats have strengths and weaknesses associated with them.
Developmental scores are ordinal scores, ordered from best to worst or worst to best, in which adjacent values indicate a higher or lower value (Thorndike, 2005). Because they are ordinal, there is not an equal interval between scores. There are two forms of developmental scores: age equivalents and grade equivalents. The norm for each age or grade is the average raw score obtained by that age or grade in the norm sample. For example, an age equivalent of 12 would have been calculated in the norm sample by averaging all raw scores obtained by 12-year-olds. The calculation works in the same manner for grade equivalents. To say that a child is scoring at the third grade level indicates that the child obtained a raw score equivalent to the average raw score of third graders in the normative group. This statement does not, however, indicate how the score was obtained. The questions the person answered correctly may be different from those answered correctly in the norm sample. Age and grade equivalent scores have the advantage of seeming to be easily interpretable. A person's level of performance is compared to familiar milestones of age or education, which makes the scores easy to explain to a person with limited knowledge of score interpretation. For example, it is easy for a teacher to tell parents that their child is reading at a third grade level.
There are several drawbacks, though. Typically, the norm sample cannot represent every month in school or every age. For example, the norm sample may have included students in grades 2.0, 3.0, and 4.0. The average raw score is easily obtained for those grades, but raw scores that fall between these levels have to be translated into grade equivalents. Therefore, if a raw score of 30 indicated a grade equivalent of 2.0 and a raw score of 38 indicated a grade equivalent of 3.0, the raw scores of 31 to 37 have to be interpolated, that is, assigned arithmetically to grades between those tested (Thorndike, 2005). Not every grade or age equivalent may be assigned a raw score, and there may not be equal intervals between scores or equal growth between years. For example, a raw score of 35 may translate into an age equivalent of 7 years, a raw score of 36 into an age equivalent of 7 years, 1 month, and a raw score of 38 into an age equivalent of 7 years, 8 months. Also, some possible raw scores may fall outside the average raw scores obtained in the norm sample, which then requires extrapolation (arithmetically calculating outside the range tested) for those raw scores. For example, Test A has possible raw scores from 0 to 40. Grades 3.0 to 6.0 were included in the norm sample, and a raw score of 34 was the average for sixth graders. A raw score of 40 falls outside the norm sample and would be calculated to be a grade equivalent greater than 6.0. Another important consideration with grade and age equivalents is whether assigning this type of score is meaningful at all. For example, in most cases these types of scores lose meaning in high school subject matter assessments. If a student is able to take biology in any grade of high school, it would not make sense to administer a biology test and then tell the student that he or she performed at, for example, a ninth grade equivalent.
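The interpolation step in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses simple linear interpolation between the two anchor points from the text (raw score 30 at grade 2.0, raw score 38 at grade 3.0), whereas a published test may use a different method, and the function name `grade_equivalent` is hypothetical. The sketch deliberately refuses to extrapolate beyond the normed range.

```python
def grade_equivalent(raw_score, anchors):
    """Linearly interpolate a grade equivalent from (raw score, grade) anchors.

    `anchors` holds the average raw score observed for each grade that was
    actually tested in the norm sample. Linear interpolation is an assumption
    made for illustration, not a method prescribed by any particular test.
    """
    anchors = sorted(anchors)
    if len(anchors) < 2:
        raise ValueError("at least two anchor points are needed")
    lowest_raw, highest_raw = anchors[0][0], anchors[-1][0]
    if raw_score < lowest_raw or raw_score > highest_raw:
        # Scores outside the tested range would need extrapolation instead.
        raise ValueError("raw score outside the normed range")
    # Find the pair of neighbouring anchors that brackets the raw score.
    for (raw_lo, grade_lo), (raw_hi, grade_hi) in zip(anchors, anchors[1:]):
        if raw_lo <= raw_score <= raw_hi:
            fraction = (raw_score - raw_lo) / (raw_hi - raw_lo)
            return grade_lo + fraction * (grade_hi - grade_lo)


# Anchor points taken from the example in the text.
example_anchors = [(30, 2.0), (38, 3.0)]
for raw in range(30, 39):
    print(raw, round(grade_equivalent(raw, example_anchors), 2))
```

Under these assumptions a raw score of 34, halfway between the two anchors, is assigned a grade equivalent of 2.5 even though no students at that point in the school year were actually tested.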
Scores of relative standing can also be broken into several formats: standard scores, scaled scores, and percentiles. These types of scores are useful for interpreting scores when grade or age is not a concern. Standard scores represent deviations (the scatter of individual scores in the norm group) from the mean (Thorndike, 2005). Standard scores are considered to be at equal intervals; the difference between two points is the same throughout the scale. Standard scores are useful when the norm group is distributed among all possible scores. The standard score may be reported as a z-score, where the mean (or average) raw score is transformed to equal zero and the standard deviation to equal one. When the scores are symmetrically distributed, 50% of all scores will then fall above zero and 50% will fall below zero. This type of score can be awkward to interpret because raw scores below the mean yield negative standard scores. To help with this problem, standard scores are typically converted into other scaling systems that are easily comparable and retain the same properties. For example, the average IQ standard score of 100 is equivalent to a z-score of zero. A scaled score is a standard score that has been converted to a mean of 10 with a standard deviation of three.
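The conversions described above can be sketched as follows. The norm sample is invented for illustration, and the function names `z_score` and `rescale` are hypothetical. The scaled-score parameters (mean 10, standard deviation 3) come from the text; the standard deviation of 15 used for the IQ-style scale is a common convention but is an assumption here, since the text gives only the mean of 100.

```python
from statistics import mean, pstdev


def z_score(raw, norm_scores):
    """Express a raw score as standard deviations above or below the norm mean."""
    mu = mean(norm_scores)
    sigma = pstdev(norm_scores)  # treats the norm sample as the whole reference group
    return (raw - mu) / sigma


def rescale(z, new_mean, new_sd):
    """Convert a z-score to another standard-score scale."""
    return new_mean + z * new_sd


# Hypothetical norm sample of raw scores (not real test data).
norm_sample = [20, 25, 30, 35, 40]

z = z_score(35, norm_sample)
print("z-score:", round(z, 2))
print("IQ-style score (mean 100, SD 15 assumed):", round(rescale(z, 100, 15), 1))
print("scaled score (mean 10, SD 3):", round(rescale(z, 10, 3), 1))
```

A raw score equal to the norm mean always maps to a z-score of zero, and therefore to 100 on the IQ-style scale and 10 on the scaled-score scale, so the three scales rank people identically and differ only in their units.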
Percentile equivalents are also directly comparable to standard scores. A percentile is the percentage of people in the norm group that the test taker performed as well as or better than (Thorndike, 2005). For example, a percentile score of 67 would indicate that a person performed as well as or better than 67 percent of the norm sample. Percentile scores differ from standard and scaled scores in one crucial way: percentiles are not in an equal interval format. A given difference in percentile rank corresponds to a smaller difference in raw scores near the mean and a larger difference farther from the mean. For example, there is little difference between the raw scores that equate to percentile rankings of 50 and 55, whereas there is typically a great difference between the raw scores that equate to percentile rankings of 90 and 95.
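A short sketch can make both points concrete. The function `percentile_rank` is a hypothetical helper that follows the definition above (the percentage of the norm group scoring at or below the test taker). The second part assumes that scores in the norm group follow a normal distribution, which is an assumption for illustration and not something every test satisfies; it uses `NormalDist` from Python's standard library to compare the gaps in z-score units.

```python
from statistics import NormalDist


def percentile_rank(raw, norm_scores):
    """Percentage of the norm group that scored at or below `raw`."""
    at_or_below = sum(1 for score in norm_scores if score <= raw)
    return 100 * at_or_below / len(norm_scores)


# Hypothetical norm sample of raw scores (not real test data).
norm_sample = [1, 2, 3, 4, 5]
print("percentile rank of 3:", percentile_rank(3, norm_sample))

# Unequal intervals, assuming normally distributed scores:
# compare the z-score distance covered by two 5-point percentile gaps.
standard_normal = NormalDist()  # mean 0, standard deviation 1
gap_near_mean = standard_normal.inv_cdf(0.55) - standard_normal.inv_cdf(0.50)
gap_in_tail = standard_normal.inv_cdf(0.95) - standard_normal.inv_cdf(0.90)
print("50th to 55th percentile, in SD units:", round(gap_near_mean, 2))
print("90th to 95th percentile, in SD units:", round(gap_in_tail, 2))
```

Under the normality assumption, the step from the 90th to the 95th percentile spans roughly three times as many standard deviation units as the step from the 50th to the 55th, which is why equal differences in percentile rank should not be read as equal differences in performance.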
Overall, normative scoring is useful for comparing a person's performance against that of a similar group, but it is not useful for determining mastery of content.
Thorndike, R. M. (2005). Measurement and evaluation in psychology and education. Upper Saddle River, NJ: Prentice Hall.