Percentiles Help

By — McGraw-Hill Professional
Updated on Aug 26, 2011

Percentiles in a Normal Distribution

Percentiles divide a large data set into 100 intervals, each interval containing 1% of the elements in the set. There are 99 possible percentiles, not 100, because the percentiles represent the boundaries where the 100 intervals meet.

Imagine an experiment in which systolic blood pressure readings are taken for a large number of people. The systolic reading is the higher of the two numbers the instrument displays. So if your reading is 110/70, read ''110 over 70,'' the systolic pressure is 110. Suppose the results of this experiment are given to us in graphical form, and the curve looks like a continuous distribution because the population is huge. Suppose it happens to be a normal distribution: bell-shaped and symmetrical (Fig. 4-1).

Let's choose some pressure value on the horizontal axis, and extend a line L straight up from it. The percentile corresponding to this pressure is determined by finding the number n such that at least n% of the area under the curve falls to the left of line L. Then, n is rounded to the nearest whole number between, and including, 1 and 99 to get the percentile p. For example, suppose the region to the left of the line L represents 93.3% of the area under the curve. Therefore, n = 93.3. Rounding to the nearest whole number between 1 and 99 gives p = 93. This means the blood pressure corresponding to the point where line L intersects the horizontal axis is in the 93rd percentile.
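The rounding procedure above can be sketched in a few lines of Python. This is only an illustration: the mean of 120 and standard deviation of 15 are assumed values chosen so that the example lands near 93.3%, and the function name percentile_of is hypothetical. None of these come from the experiment described in the text.

```python
from statistics import NormalDist

# Assumed population parameters, chosen only for illustration.
pressures = NormalDist(mu=120.0, sigma=15.0)

def percentile_of(value, dist):
    """Return (n, p) for a reading.

    n is the percentage of the area under the curve to the left of value;
    p is n rounded to the nearest whole number and kept within 1..99.
    """
    n = 100.0 * dist.cdf(value)
    p = min(99, max(1, round(n)))
    return n, p

n, p = percentile_of(142.5, pressures)
print(f"n = {n:.1f}, p = {p}")
```

Clamping with min and max keeps extreme readings in the 1st or 99th percentile, which matches the rule that p lies between, and including, 1 and 99.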

The location of any particular percentile point (boundary), say the pth, is found by locating the vertical line such that the percentage n of the area beneath the curve to the left of the line is exactly equal to p, and then noting the point where this line crosses the horizontal axis. In Fig. 4-1, imagine that line L can be moved back and forth like a sliding door. When the number n, representing the percentage of the area beneath the curve to the left of L, is exactly equal to 93, the line crosses the horizontal axis at the 93rd percentile boundary point. Although it's tempting to think that there could be a ''0th percentile'' (n = 0) and a ''100th percentile'' (n = 100), neither of these ''percentiles'' represents a boundary where two intervals meet.
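Sliding line L until the area to its left equals p% amounts to evaluating the inverse of the cumulative distribution function. A minimal sketch follows, again assuming a mean of 120 and a standard deviation of 15 purely for illustration; the helper name percentile_point is hypothetical.

```python
from statistics import NormalDist

# Assumed population parameters, chosen only for illustration.
pressures = NormalDist(mu=120.0, sigma=15.0)

def percentile_point(p, dist):
    """Return the value on the horizontal axis at the pth percentile boundary."""
    if not 1 <= p <= 99:
        # There is no 0th or 100th percentile point: neither is a boundary
        # where two intervals meet.
        raise ValueError("p must be a whole number from 1 to 99")
    return dist.inv_cdf(p / 100.0)

boundary = percentile_point(93, pressures)
print(f"93rd percentile point: {boundary:.1f}")
```

Feeding the result back into dist.cdf recovers 0.93, which is a convenient way to check that the line sits where exactly 93% of the area lies to its left.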

Note the difference between saying that a certain pressure ''is in'' the pth percentile, versus saying that a certain pressure ''is at'' the pth percentile. In the first case we're describing a data interval; in the second case we're talking about a boundary point between two intervals.

Percentiles in Tabular Data

Imagine that 1000 students take a 40-question test. There are 41 possible scores: 0 through 40. Suppose that every score is accounted for. There are some people who write perfect papers, and there are a few unfortunates who don't get any of the answers right. Table 4-1 shows the test results, with the scores in ascending order from 0 to 40 in the first column. For each possible score, the number of students getting that score (the absolute frequency) is shown in the second column. The third column shows the cumulative absolute frequency, expressed from lowest to highest scores.
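The relationship between the second and third columns is a running total. The sketch below shows it on a much smaller, made-up data set (a 5-question test taken by 30 students); these frequencies are hypothetical and are not the values in Table 4-1.

```python
from itertools import accumulate

# Hypothetical absolute frequencies for scores 0 through 5.
abs_freq = [1, 3, 6, 10, 7, 3]

# Cumulative absolute frequency: running total from lowest to highest score.
cum_freq = list(accumulate(abs_freq))

print("score  abs_freq  cum_freq")
for score, (f, c) in enumerate(zip(abs_freq, cum_freq)):
    print(f"{score:5d}  {f:8d}  {c:8d}")
```

The last cumulative entry always equals the total number of students, which is a quick sanity check on any table of this kind.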

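[Table 4-1: test results, listing each score from 0 to 40 with its absolute frequency and cumulative absolute frequency]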

Where do we put the 99 percentile points (boundaries) in this data set? How can we put 99 ''fault lines'' into a set that has only 41 possible values? The answer is, obviously, that we can't. What about grouping the students, then? A thousand people have taken the test. Why not break them up into 100 different groups with 99 different boundaries, and then call the 99 boundaries the ''percentile points,'' like this?

  • The ''worst'' 10 papers, and the 1st percentile point at the top of that group.
  • The ''2nd worst'' 10 papers, and the 2nd percentile point at the top of that group.
  • The ''3rd worst'' 10 papers, and the 3rd percentile point at the top of that group.
  • The ''pth worst'' 10 papers, and the pth percentile point at the top of that group.
  • The ''qth best'' 10 papers, and the (100 – q)th percentile point at the bottom of that group.
  • The ''3rd best'' 10 papers, and the 97th percentile point at the bottom of that group.
  • The ''2nd best'' 10 papers, and the 98th percentile point at the bottom of that group.
  • The ''best'' 10 papers, and the 99th percentile point at the bottom of that group.

This looks great at first, but there's a problem. When we check Table 4-1, we can see that 50 people have scored 31 on the test. That's five groups of 10 people, all with the same score. These scores are all ''equally good'' (or ''equally bad''). If we're going to say that any one of these papers is ''in the pth percentile,'' then clearly we must say that they are all ''in the pth percentile.'' We cannot arbitrarily take 10 papers with scores of 31 and put them in the pth percentile, then take 10 more and put them in the (p + 1)st percentile, then 10 more in the (p + 2)nd, then 10 more in the (p + 3)rd, and then the last 10 in the (p + 4)th. That would be unfair.
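The tie problem is easy to reproduce. In the sketch below, only the 50 papers scoring 31 come from the text; the other counts (600 papers at 30 and 350 at 32) are invented to bring the total to 1000, and the function name groups_spanned is hypothetical.

```python
def groups_spanned(sorted_scores, group_size):
    """Map each score to the set of rank-based groups its papers land in.

    Group 1 holds the lowest-ranked group_size papers, group 2 the next
    group_size papers, and so on.
    """
    spans = {}
    for rank, score in enumerate(sorted_scores):
        group = rank // group_size + 1
        spans.setdefault(score, set()).add(group)
    return spans

# Hypothetical results: only the count for score 31 is taken from the text.
scores = sorted([30] * 600 + [31] * 50 + [32] * 350)

spans = groups_spanned(scores, group_size=10)
print("Groups containing a score of 31:", sorted(spans[31]))
```

With these made-up counts, identical scores of 31 are scattered across five consecutive groups, which is exactly the unfairness described above.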

This business of percentiles is starting to get confusing and messy, isn't it? By now you must be wondering, ''Who invented this concept, anyway?'' That doesn't matter; the scheme is commonly used and we are stuck with it. What can we do to clear it all up and find a formula that makes sense in all possible scenarios?
