Describing and Displaying Categorical Data Study Guide
Introduction to Describing and Displaying Categorical Data
The population is the set of objects or individuals of interest, and the sample is a subset of the population. When presented with a set of either population or sample values, we need to summarize them in some way if we are to gain insight into the information they provide. In this lesson, we will focus on the presentation of categorical data, both in tables and graphically.
Population Distribution versus Sample Distribution
Suppose that a high school orchestra has 62 members, 35 females and 27 males. The population of interest is the set of all orchestra members. Data are available on the categorical random variable, gender. The collection of genders for the population represents the population distribution of genders. In this example, the population is finite because there are a finite number of units (orchestra members) in the population.
Suppose 15 of the orchestra members are randomly selected. The genders of these 15 represent the sample distribution of gender. A summary measure of a population is called a parameter; a summary measure of a sample distribution, which is a function of sample values and has no unknown parameters, is called a statistic. Frequencies and relative frequencies are important summary measures for categorical variables such as the gender of the orchestra members.
As the sample size increases, the sample distribution tends to be more like the population distribution, as long as the units for the sample have been drawn at random from the population. This is comforting in that, intuitively, we expect the sample to "be better" as the sample size increases. What we mean by "better" is not always clear. However, the fact that the sample distribution tends toward the population distribution is one way in which we have done better. Other measures of "better" will be discussed in later lessons.
Frequency and Relative Frequency
The nature of categorical data leads to counts of the numbers falling within each category: the numbers of females and males; the numbers of red, yellow, or blue items; and the numbers ordering pizza, hamburger, or chicken. Notice that if we have two categories, we have two counts; three categories, three counts; and so on.
The number of times a category appears in a data set is called the frequency of that category. The relative frequency of a category is the proportion of times that category occurs in the data set; that is,
relative frequency = frequency/number of observations in the data set
These frequencies or relative frequencies are best organized in tabular form. The table should display all possible categories and the frequencies or relative frequencies. The frequency distribution (or relative frequency distribution) for categorical data consists of the categories with their associated frequencies (or relative frequencies). It is important to remember that if we have all population values, we can find the population frequencies and population relative frequencies. If we have only sample values, then we can find the sample frequencies and sample relative frequencies. For the orchestra members, we have all population values. The population frequency and relative frequency distributions for the genders of these orchestra members are displayed in Table 3.1.
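As a rough illustration of the definitions above, the following Python sketch builds the frequency and relative frequency distribution for the orchestra members' genders from the counts given in the text (35 females, 27 males). The function name `frequency_table` and the three-decimal display are choices made for this example, not part of the lesson.

```python
from collections import Counter

# Population of 62 orchestra members: 35 females and 27 males (from the text).
genders = ["female"] * 35 + ["male"] * 27


def frequency_table(values):
    """Return {category: (frequency, relative frequency)} for a list of values."""
    counts = Counter(values)
    n = len(values)  # number of observations in the data set
    return {category: (count, count / n) for category, count in counts.items()}


table = frequency_table(genders)
for category, (freq, rel_freq) in table.items():
    # Columns: category, frequency, relative frequency (three decimals)
    print(f"{category:<8}{freq:>4}{rel_freq:>8.3f}")
```

Because all 62 population values are used, these are population frequencies and population relative frequencies; running the same function on a random sample of 15 members would give the sample versions.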
In the orchestra of 62 members, 36 play a string instrument, 12 play a woodwind, ten play brass, and four play percussion. Provide a tabular display of the frequency and relative frequency distributions for the type of instruments for this orchestra.
Note that, although the frequencies sum properly to the total number of orchestra members, the relative frequencies actually sum to 0.99, not to 1 as indicated. The reason for this is round-off error. When dividing the category frequency by the total number of members, the result was not always an exact two-decimal value, so we rounded to two decimal places. It is better to use 1.00 as the total rather than reflecting the rounding error in the total.
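The round-off effect described above can be checked with a short Python sketch. It uses the instrument counts from the text; the variable names are illustrative only.

```python
# Instrument counts for the 62 orchestra members (from the text).
instruments = {"string": 36, "woodwind": 12, "brass": 10, "percussion": 4}
total = sum(instruments.values())

# Relative frequencies rounded to two decimal places, as in the table.
rounded = {name: round(count / total, 2) for name, count in instruments.items()}

# The unrounded relative frequencies sum to 1 (up to floating-point error),
# but the rounded ones fall short because each was rounded separately.
exact_sum = sum(count / total for count in instruments.values())
rounded_sum = round(sum(rounded.values()), 2)

print(rounded)
print("sum of rounded values:", rounded_sum)
```

The rounded values are 0.58, 0.19, 0.16, and 0.06, which add to 0.99 even though the underlying proportions add to exactly 1.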
The orchestra is going to a special awards banquet during which dinner will be served. The hosts of the banquet need to know in advance whether the orchestra members prefer steak, fish, or pasta as their main dish. Thirty-two members choose steak, 12 choose fish, and 18 choose pasta. Provide a tabular display of the frequency and relative frequency distributions of the orchestra members' main dish choices.
Pie Charts - Visual Displays for Categorical Data
Pie charts and bar charts are common graphical approaches to displaying categorical data; either can present frequencies, relative frequencies, or percentages. For each category, percent = relative frequency × 100%.
To make a pie chart, first draw a circle to represent the entire data set. For each category, the "slice" size in degrees is the category's relative frequency times 360 (because there are 360° in a circle). Each slice should be labeled with the category name. The numerical value of the frequency, relative frequency, or percentage associated with each slice should also be shown on the graph. Percentages are presented most commonly in pie charts. As an example, consider the frequency distribution of the high school orchestra members' gender. Thirty-five, or 56.5%, were female, and 27, or 43.5%, were male. For the pie chart, the slice for females is (35/62) × 360° ≈ 203.2° of the pie; the remaining 156.8° is for the males. See Figure 3.1.
For the relative frequency distribution of the types of instruments played by the high school orchestra members discussed earlier in the lesson, create a pie chart.
First, the size of each slice of the pie must be found. For example, for the woodwinds, the slice of the pie is (12/62) × 360° ≈ 69.7°. After performing this calculation for each instrument type, the following pie chart can be created (see Figure 3.2).
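The slice calculation can be repeated for every instrument type with a small Python sketch. The helper `slice_angles` is a hypothetical name introduced here; it simply applies the rule that slice size equals relative frequency times 360.

```python
def slice_angles(frequencies):
    """Map each category to its slice size in degrees: relative frequency * 360."""
    total = sum(frequencies.values())
    return {category: count / total * 360 for category, count in frequencies.items()}


# Instrument counts for the 62 orchestra members (from the text).
instruments = {"string": 36, "woodwind": 12, "brass": 10, "percussion": 4}
angles = slice_angles(instruments)

for category, angle in angles.items():
    percent = instruments[category] / sum(instruments.values()) * 100
    print(f"{category:<12}{percent:>6.1f}%{angle:>8.1f} degrees")
```

The angles are computed from the unrounded relative frequencies, so the slices add up to the full 360 degrees of the circle.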
Create a pie chart of the frequency and relative frequency distribution of the orchestra members' meal choice for the awards banquet.
See Figure 3.3 for an illustration of the orchestra members' meal choices.
Bar Charts - Visual Displays for Categorical Data
Bar charts may be used to display frequencies, relative frequencies, or percentages represented by each category in a data set. If there is only a response variable and frequencies are to be presented, a bar is used for each category and the height of the bar corresponds to the number of times that category occurs in the data set. For a relative frequency bar chart, a bar is used for each category, and the height of the bar corresponds to the proportion of times that response occurs in the data set. As an illustration, consider a relative frequency bar chart of the genders of the orchestra members.
Notice that in Figure 3.4, both the x- and y-axes are labeled. Categories are displayed on the x-axis, and an appropriate scale is used on the y-axis.
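For readers without plotting software, a relative frequency bar chart can be approximated in plain text. The Python sketch below is only an illustration: the function `text_bar_chart` and its `width` parameter are made up for this example, and the bars run horizontally (categories down the side) rather than vertically as in Figure 3.4.

```python
def text_bar_chart(frequencies, width=40):
    """Return lines of a horizontal relative frequency bar chart drawn with '#'."""
    total = sum(frequencies.values())
    lines = []
    for category, count in frequencies.items():
        rel_freq = count / total
        # Bar length is proportional to the relative frequency of the category.
        bar = "#" * round(rel_freq * width)
        lines.append(f"{category:<8}|{bar} {rel_freq:.3f}")
    return lines


# Gender counts for the 62 orchestra members (from the text).
genders = {"female": 35, "male": 27}
chart = text_bar_chart(genders)
print("\n".join(chart))
```

Each bar's length corresponds to the proportion of times that category occurs, which is the same quantity the bar height represents in a conventional relative frequency bar chart.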