Norm-Referenced Achievement Tests
Human beings make tests. They decide what topics to include on the test, what kinds of questions to ask, what the correct answers are, and how the test scores will be used. Tests can be made to compare students to each other (norm-referenced tests) or to see whether students have mastered a body of knowledge (criterion-referenced or standards-referenced tests). This fact sheet explains what norm-referenced tests (NRTs) are, their limitations and flaws, and how they affect schools.
What are norm-referenced tests?
Norm-referenced tests (NRTs) compare a person's score against the
scores of a group of people who have already taken the same exam, called
the "norming group." When a newspaper reports a school's results as a
percentile ("the Lincoln school ranked at the 49th percentile"), or when
your child's score is reported that way ("Jamal scored at the 63rd
percentile"), the test is usually an NRT.
Most achievement NRTs are multiple-choice tests.
Some also include open-ended, short-answer questions. The questions on
these tests mainly reflect the content of nationally used textbooks, not
the local curriculum. This means that students may be tested on things your
local schools or state education department decided were not important
and therefore did not teach.
Commercial, national,
norm-referenced "achievement" tests include the California
Achievement Test (CAT); Comprehensive Test of Basic Skills (CTBS), which
includes the "Terra Nova"; Iowa Test of Basic Skills (ITBS) and Tests of
Academic Proficiency (TAP); Metropolitan Achievement Test (MAT); and
Stanford Achievement Test (SAT, not to be confused with the college
admissions SAT). "IQ," "cognitive ability," "school readiness," and
developmental screening tests are also NRTs.
Creating the bell curve.
NRTs are designed to "rank-order" test takers -- that is, to
compare students' scores. A commercial norm-referenced test does
not compare all the students who take the test in a given year. Instead,
test makers select a sample from the target student population (say, ninth
graders). The test is "normed" on this sample, which is supposed to fairly
represent the entire target population (all ninth graders in the nation).
Students' scores are then reported in relation to the scores of this
"norming" group.
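To make the idea of a fixed norming group concrete, here is a minimal Python sketch. Everything in it is hypothetical: the ten-student norming sample, the 10-point test, and the function name build_norms_table. It also assumes one common convention, percentile rank = percent of the norming group scoring below a given raw score, clipped to the 1 to 99 range. The point is that the table is built once from the sample, and later students are looked up in it rather than ranked against each other.

```python
from bisect import bisect_left

def build_norms_table(norming_scores: list[int], max_score: int) -> dict[int, int]:
    """Map every possible raw score to a percentile rank from 1 to 99.

    Percentile rank here means the percent of the norming group scoring
    below the raw score (one common convention, assumed for this sketch).
    """
    ordered = sorted(norming_scores)
    n = len(ordered)
    table = {}
    for raw in range(max_score + 1):
        below = bisect_left(ordered, raw)  # count of norming scores strictly less than raw
        table[raw] = min(99, max(1, round(100 * below / n)))
    return table

# Hypothetical norming sample on a 10-point test, collected once.
norming_sample = [3, 4, 5, 5, 6, 6, 6, 7, 7, 8]
NORMS_TABLE = build_norms_table(norming_sample, max_score=10)

# Later test takers are looked up in the fixed table,
# not compared with the other students tested the same year.
this_years_scores = {"Student A": 6, "Student B": 9}
for name, raw in this_years_scores.items():
    print(f"{name}: raw score {raw}, percentile rank {NORMS_TABLE[raw]}")
```

In this made-up sample, a raw score of 6 maps to the 40th percentile because four of the ten norming students scored lower, and any score above every norming score is capped at the 99th.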
To make comparisons easier, test makers create exams whose results look
at least somewhat like a bell-shaped curve (the "normal" curve, shown in
the diagram). They design the test so that most students will score near
the middle, and only a few will score low (the left side of the curve) or
high (the right side of the curve).
Scores are usually reported as percentile
ranks. The scores range from the 1st percentile to the 99th percentile,
with the average (median) score of the norming group set at the 50th
percentile. If Jamal scored at the 63rd percentile, it means he scored
higher than about 63% of the test takers in the norming group. Scores can
also be reported as "grade equivalents," "stanines," and "normal curve
equivalents."
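Stanines and normal curve equivalents (NCEs) are both re-expressions of a percentile rank on a normal curve, so they can be sketched in a few lines of Python. This assumes the scores really do follow a normal curve and uses the standard scale definitions: NCEs have a mean of 50 and a standard deviation of 21.06 (chosen so NCEs of 1, 50, and 99 line up with those percentiles), and stanines have a mean of 5 and a standard deviation of 2, rounded to whole numbers from 1 to 9. Grade equivalents are built differently and are not covered here. The function names are illustrative only.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_from_percentile(percentile: float) -> float:
    """z-score with percentile/100 of the normal curve to its left."""
    return STANDARD_NORMAL.inv_cdf(percentile / 100)

def normal_curve_equivalent(percentile: float) -> float:
    """NCE scale: mean 50, standard deviation 21.06."""
    return 50 + 21.06 * z_from_percentile(percentile)

def stanine(percentile: float) -> int:
    """Stanine scale: mean 5, standard deviation 2, whole numbers 1 to 9."""
    value = math.floor(2 * z_from_percentile(percentile) + 5 + 0.5)  # round half up
    return min(9, max(1, value))

for pr in (1, 50, 63, 99):
    print(f"percentile {pr}: NCE {normal_curve_equivalent(pr):.1f}, stanine {stanine(pr)}")
```

Under these assumptions, Jamal's 63rd percentile falls in stanine 6, a band that covers a wide range of percentile ranks, which is why stanines hide small score differences that percentiles exaggerate.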
One more question right or wrong can cause a
big change in the student's score. Because so many students' scores are
bunched near the middle of the curve, one more correct answer can, in some
cases, make a student's reported percentile rank jump more than ten points.
It is very important to know how much the percentile rank would change if
a student got one or two more questions right.
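The arithmetic behind that jump can be checked with a short Python sketch. The test length, mean, and standard deviation below are invented for illustration, and the norming group's results are assumed to follow a smooth bell curve; real score distributions differ. Percentile rank is taken to mean the percent of the norming group scoring below a given raw score.

```python
from statistics import NormalDist

# Hypothetical norming-group results for a 40-question test: roughly
# bell-shaped, with a mean of 24 correct and a standard deviation of 3.5.
NORMING_CURVE = NormalDist(mu=24, sigma=3.5)

def percentile_rank(raw_score: int) -> float:
    """Percent of the norming group scoring below raw_score (unrounded).

    A whole-number score s is treated as covering s - 0.5 to s + 0.5
    (a continuity correction), so "below s" means below s - 0.5.
    """
    return 100 * NORMING_CURVE.cdf(raw_score - 0.5)

def reported_percentile(raw_score: int) -> int:
    """Round and clip to the 1st-99th percentile range used on score reports."""
    return min(99, max(1, round(percentile_rank(raw_score))))

# Compare a one-question gain near the middle with one in the upper tail.
for low, high in [(24, 25), (33, 34)]:
    print(f"{low} -> {high} correct: "
          f"{reported_percentile(low)}th -> {reported_percentile(high)}th percentile")
```

With these made-up numbers, one more correct answer near the middle moves a student from the 44th to the 56th percentile, while the same one-question gain near the top leaves the reported rank at the 99th.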
In making an NRT, it is often more important to
choose questions that sort people along the curve than to make sure the
test covers the content adequately. The tests sometimes emphasize small and
meaningless differences among test takers. Because the tests are made to
sort students, most of the things nearly everyone knows are not tested: a
question that every student answers correctly cannot help rank anyone.
Questions may be obscure or tricky, in order to help rank-order the test
takers.
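Why a question "everyone knows" gets dropped can be shown with a standard idea from classical test theory. An item's difficulty p is the proportion of test takers who answer it correctly, and the spread it adds to scores is p(1 - p), which is largest at p = 0.5 and zero when everyone (or no one) gets it right. The Python sketch below uses invented responses from ten students; the function names are illustrative, not taken from any test publisher's procedure.

```python
def item_difficulty(responses: list[int]) -> float:
    """Proportion of test takers answering correctly (1 = right, 0 = wrong)."""
    return sum(responses) / len(responses)

def item_variance(p: float) -> float:
    """Score spread contributed by a right/wrong item: p * (1 - p)."""
    return p * (1 - p)

# Hypothetical responses from ten students to three questions.
items = {
    "everyone knows": [1] * 10,
    "half know": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "few know": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

for name, responses in items.items():
    p = item_difficulty(responses)
    print(f"{name}: difficulty p = {p:.1f}, variance = {item_variance(p):.2f}")
```

The "everyone knows" item adds one point to every score, so it changes no one's rank and contributes zero variance; a test built to spread students out favors items closer to the "half know" pattern.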
Tests can be biased.
Some questions may favor one kind of student or another for reasons that
have nothing to do with the subject area being tested. Non-school knowledge
that is more commonly learned by middle- or upper-class children is often
included in tests. To help make the bell curve, test makers usually
eliminate questions that students with low overall scores might get right
but those with high overall scores get wrong. Thus, most questions which
favor minority groups are eliminated.
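The screening step described above can be sketched with an upper-lower discrimination index, a common item-analysis statistic: D is the proportion of high scorers who got an item right minus the proportion of low scorers who did. A question that low scorers get right but high scorers miss has a negative D. In this Python sketch the six students, their totals, and the rule "keep only items with D above zero" are all hypothetical; splitting into halves is a simplification (the top and bottom 27% are also commonly used), and real publishers' procedures may differ.

```python
def discrimination_index(item: list[int], totals: list[int]) -> float:
    """Upper-lower discrimination index: D = p(upper) - p(lower).

    Test takers are split into a low-scoring half and a high-scoring half
    by total score (a simplifying assumption for this sketch).
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    half = len(order) // 2
    lower, upper = order[:half], order[-half:]
    p_lower = sum(item[i] for i in lower) / half
    p_upper = sum(item[i] for i in upper) / half
    return p_upper - p_lower

# Hypothetical total scores for six students (same order as item responses).
totals = [10, 12, 15, 20, 25, 30]
items = {
    "A": [0, 0, 1, 1, 1, 1],  # mostly answered by high scorers
    "B": [1, 1, 1, 0, 0, 0],  # answered by low scorers, missed by high scorers
    "C": [1, 1, 1, 1, 1, 1],  # everyone answers correctly
}

for name, responses in items.items():
    d = discrimination_index(responses, totals)
    verdict = "kept" if d > 0 else "dropped"  # hypothetical cut-off
    print(f"item {name}: D = {d:+.2f} -> {verdict}")
```

Item B, the kind of question the fact sheet describes, gets D = -1 and is screened out, and item C is dropped too because it does not separate anyone.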
NRTs usually have to
be completed within a time limit. Some students do not finish, even
if they know the material. This can be particularly unfair to students
whose first language is not English or who have learning disabilities. This
"speededness" is one way test makers sort people out.
Reprinted with the permission of the National Center for Fair and Open Testing.