Elementary Statistics for AP Psychology
Research studies can generate a large amount of data, and psychologists need to make sense of it. Qualitative data are frequently converted to numerical data for ease of handling; quantitative data are already numerical.
Numbers that are used simply to name something are said to be on a nominal scale and can be used to count the number of cases. For example, in a survey, girls can be designated as "1" and boys as "2." These numbers have no intrinsic meaning. Numbers that can be ranked are said to be on an ordinal scale and can be put in order. For example, the highest scorer can be designated as "1," the second highest as "2," the third highest as "3," and so on. These numbers cannot be meaningfully averaged because the gaps between ranks need not be equal: number 1 could have scored 50 points higher than number 2, while number 2 may have scored only 4 points higher than number 3. If there is a meaningful, equal difference between successive numbers, the numbers are said to be on an interval scale. For example, the difference between 32° Fahrenheit (F) and 42°F is 10°F, and the difference between 64°F and 74°F is also 10°F. However, 64°F is not twice as hot as 32°F. When a meaningful ratio can be made with two numbers, the numbers are said to be on a ratio scale. The key difference between an interval scale and a ratio scale is that the ratio scale has a real or absolute zero point. For quantities of weight, volume, and distance, zero is a meaningful concept, whereas the zero point of the Fahrenheit scale is arbitrary.
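The claim that 64°F is not twice as hot as 32°F can be checked with a little arithmetic. The sketch below is only an illustration: it brings in the Kelvin scale, which the text does not mention, as an assumed example of a temperature scale with an absolute zero, and the function name is made up for this example.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit to Kelvin (Kelvin is an assumption added for this example)."""
    return (deg_f - 32) * 5 / 9 + 273.15

# Interval scale: equal differences are meaningful.
first_gap = 42 - 32    # 10 degrees
second_gap = 74 - 64   # also 10 degrees

# Dividing the Fahrenheit readings gives 2, but that ratio has no physical
# meaning because 0°F is an arbitrary zero point.
naive_ratio = 64 / 32

# On a scale with an absolute zero, the ratio is meaningful and is
# nowhere near 2.
kelvin_ratio = fahrenheit_to_kelvin(64) / fahrenheit_to_kelvin(32)

print(first_gap, second_gap)
print(naive_ratio, round(kelvin_ratio, 3))
```

On the scale with a true zero, the two temperatures differ by only about 7 percent, which is why ratios of Fahrenheit readings are not meaningful.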
Statistics is a field that involves the analysis of numerical data about representative samples of populations.
Numbers that summarize a set of research data obtained from a sample are called descriptive statistics. In general, descriptive statistics describe sets of interval or ratio data. After collecting data, psychologists organize the data to create a frequency distribution, an orderly arrangement of scores indicating the frequency of each score or group of scores. The data can be pictured as a histogram (a bar graph drawn from the frequency distribution) or as a frequency polygon (a line graph that replaces the bars with single points and connects the points with a line). With a very large number of data points, the frequency polygon approaches a smooth curve. Frequency polygons are shown in Figure 6.1.
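As a rough illustration of building a frequency distribution, the sketch below tallies a small set of ten quiz scores and prints a text-only histogram. The scores are illustrative data chosen for this example, not results from an actual study, and the variable names are arbitrary.

```python
from collections import Counter

# Illustrative quiz scores (assumed data for this example).
scores = [5, 6, 7, 7, 7, 8, 8, 9, 9, 10]

# Frequency distribution: each score paired with how often it occurs,
# arranged in order of score.
frequency = dict(sorted(Counter(scores).items()))

# A rough text histogram: one '#' per occurrence of each score.
for score, count in frequency.items():
    print(f"{score:>2} | {'#' * count}")
```

Connecting the ends of the bars with a line instead of drawing the bars would give the corresponding frequency polygon.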
Measures of Central Tendency
Measures of central tendency describe the average or most typical scores for a set of research data or distribution. Measures of central tendency include the mode, median, and mean. The mode is the most frequently occurring score in a set of research data. If two scores appear most frequently, the distribution is bimodal; if three or more scores appear most frequently, the distribution is multimodal. The median is the middle score when the set of data is ordered by size. For an odd number of scores, the median is the middle one. For an even number of scores, the median lies halfway between the two middle scores. The mean is the arithmetic average of the set of scores. The mean is determined by adding up all of the scores, then dividing by the number of scores. For the set of quiz scores 5, 6, 7, 7, 7, 8, 8, 9, 9, 10, the mode is 7, the median is 7.5, and the mean is 7.6.
The mode is the least used measure of central tendency, but it can provide a "quick and dirty" estimate, especially when the set of data has not been ordered. The mean is generally the preferred measure of central tendency because it takes into account the information in all of the data points; however, it is very sensitive to extremes. The mean is pulled in the direction of extreme data points. The advantage of the median is that it is less sensitive to extremes, but it doesn't take into account all of the information in the data points.
The mean, mode, and median turn out to be the same score in symmetrical distributions with a single peak. The two sides of the frequency polygon are mirror images, as shown in Figure 6.1a. The normal distribution or normal curve is a symmetric, bell-shaped curve that represents how many human characteristics are distributed in the population. Distributions where most of the scores are squeezed into one end are skewed. A few of the scores stretch out away from the group like a tail. The skew is named for the direction of the tail.
Figure 6.1b pictures a negatively skewed distribution, and Figure 6.1c shows a positively skewed distribution. The mean is pulled in the direction of the tails, so the mean is lower than the median in a negatively skewed distribution, and higher than the median in a positively skewed distribution. In very skewed distributions, the median is a better measure of central tendency than the mean.
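The quiz-score figures above, and the way an extreme score pulls the mean more than the median, can be checked with a short sketch using Python's standard `statistics` module. The extra score of 40 is a hypothetical outlier added purely to create a positive skew; it does not come from the text.

```python
import statistics

scores = [5, 6, 7, 7, 7, 8, 8, 9, 9, 10]

mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # halfway between the two middle scores
mean = statistics.mean(scores)      # sum of the scores divided by their count

print(mode, median, mean)  # 7 7.5 7.6

# Hypothetical extreme high score, added to produce a positive skew.
skewed = scores + [40]
skewed_median = statistics.median(skewed)
skewed_mean = statistics.mean(skewed)

# The mean moves much further toward the tail than the median does.
print(skewed_median, round(skewed_mean, 2))
```

With the outlier included, the median shifts by only half a point while the mean rises by almost three points, which is why the median is the safer summary for very skewed data.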
Measures of Variability
Variability describes the spread or dispersion of scores for a set of research data or distribution. Measures of variability include the range, variance, and standard deviation. The range is the largest score minus the smallest score. It is a rough measure of dispersion. For the same set of quiz scores (5, 6, 7, 7, 7, 8, 8, 9, 9, 10), the range is 5. Variance and standard deviation (SD) indicate the degree to which scores differ from each other and vary around the mean value for the set; in other words, they show how tightly the scores cluster or how widely they are dispersed. Variance is determined by computing the difference between each value and the mean, squaring each difference (to eliminate negative signs), summing the squared differences, then dividing that sum by the number of scores. The standard deviation of the distribution is the square root of the variance. For a different set of quiz scores (6, 7, 8, 8, 8, 8, 8, 8, 9, 10), the mean is 8, the variance is 1, and the SD is 1. Standard deviation must fall between 0 and half the value of the range. If the standard deviation approaches 0, scores are very similar to each other and very close to the mean. If the standard deviation approaches half the value of the range, scores vary greatly from the mean. When frequency polygons with the same mean and the same range, but different standard deviations, are plotted on the same axes, their shapes show the difference in variability. The taller and narrower frequency polygon shows less variability and has a lower standard deviation than the shorter and wider one.
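The steps for variance and standard deviation described above can be sketched as follows. This is a minimal illustration that divides by the number of scores, as the text describes; the function name `population_variance` is a label chosen for this example.

```python
import math

def population_variance(scores):
    """Average of the squared differences between each score and the mean."""
    mean = sum(scores) / len(scores)
    squared_diffs = [(x - mean) ** 2 for x in scores]
    return sum(squared_diffs) / len(scores)

first_set = [5, 6, 7, 7, 7, 8, 8, 9, 9, 10]
second_set = [6, 7, 8, 8, 8, 8, 8, 8, 9, 10]

# Range: largest score minus smallest score.
first_range = max(first_set) - min(first_set)

# Variance and standard deviation for the second set of quiz scores.
variance = population_variance(second_set)
sd = math.sqrt(variance)

# The standard deviation cannot exceed half the range.
half_range = (max(second_set) - min(second_set)) / 2

print(first_range)            # 5
print(variance, sd)           # 1.0 1.0
print(0 <= sd <= half_range)  # True
```

For the second set the squared differences sum to 10 across 10 scores, which gives the variance of 1 and the SD of 1 quoted in the text.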
Since you don't bring a calculator to the exam, you won't be required to figure out variance or standard deviation.