Populations, Samples, and Variables Study Guide

Updated on Oct 5, 2011

Types of Variables

There are two primary branches of statistics: descriptive statistics and inferential statistics. Once data have been collected or an appropriate data source identified, the information should be organized and summarized. Tables, graphs, and numerical summaries allow increased understanding and are efficient ways to present the data. Descriptive statistics is the branch of statistics that focuses on summarizing and displaying data.
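
As a minimal sketch of a numerical summary, the snippet below computes a few common descriptive statistics with Python's standard `statistics` module. The list `lengths_mm` is a made-up set of fish lengths used only for illustration; it is not data from this guide.

```python
import statistics

# Hypothetical fish lengths in millimeters (made-up values for illustration).
lengths_mm = [212, 198, 240, 225, 205, 231, 219]

# A small numerical summary: size, center, spread, and range of the data.
summary = {
    "n": len(lengths_mm),
    "mean": statistics.mean(lengths_mm),
    "median": statistics.median(lengths_mm),
    "stdev": statistics.stdev(lengths_mm),  # sample standard deviation
    "min": min(lengths_mm),
    "max": max(lengths_mm),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A summary like this only describes the values in hand; it makes no claim about fish that were not measured.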

Sometimes, description alone is not enough. People want to use data to answer questions or to evaluate decisions that have been made. Inferential statistics is the branch of statistics that uses the information gathered from a sample to make statements about the population from which it was selected. Because we have seen only a portion of the population (the sample), there is a chance that an incorrect conclusion can be made about the population. One role of statistics is to quantify the chance of making an incorrect conclusion.
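
The idea of quantifying the chance of an incorrect conclusion can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is simulated (10,000 heights drawn from a normal distribution with mean 70 and standard deviation 3, both chosen arbitrarily), and "incorrect" is taken to mean that a sample mean lands more than 1 inch from the population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in inches (not real data).
population = [random.gauss(70, 3) for _ in range(10_000)]
pop_mean = statistics.mean(population)


def sample_mean(n):
    """Mean of a simple random sample of n units drawn without replacement."""
    return statistics.mean(random.sample(population, n))


# Repeat the sampling many times and count how often the sample mean
# lands more than 1 inch away from the population mean.
trials = 1_000
misses = sum(abs(sample_mean(25) - pop_mean) > 1 for _ in range(trials))
miss_rate = misses / trials

print(f"population mean: {pop_mean:.2f}")
print(f"share of samples off by more than 1 inch: {miss_rate:.3f}")
```

In practice the population is not available to resample like this, which is why inferential methods estimate this kind of error rate from a single sample.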

If every population unit (person or object) were identical, no need would exist for statistics. For example, if all adult men in the United States were exactly the same number of inches tall, we could measure the height of one adult male in the United States and then know exactly how tall all U.S. adult males are. Obviously, that will not work. The heights of men vary. Some are taller than 70 inches; some are shorter than 70 inches; a small proportion is 70 inches tall. It is this variability in heights that makes determining height characteristics about the population of U.S. adult males a statistical challenge.

Every person or object in a given population typically has several characteristics that might be studied. Suppose we are interested in studying the fish in a lake. Length, weight, age, gender, and methyl mercury level are but a few of the characteristics that could be recorded for each fish. A variable is a characteristic that may be recorded for each unit in the population, and the observed value of the variable is generally not the same for all units. The length, weight, age, gender, and mercury level of the fish are five variables, some or all of which might be of interest in a particular study.

Data consist of observations made on one or more variables for each sampled unit. A univariate data set consists of observations collected regarding only one variable from each unit in the sample or population. A bivariate data set results from observations collected regarding two variables from each unit in the sample or population. When observations are collected on three or more variables, then we have a multivariate data set. (Sometimes, bivariate data sets are called multivariate data sets. Because multi implies more than one, this is an acceptable use of the term.)
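
The distinction among univariate, bivariate, and multivariate data sets comes down to how many variables are recorded per unit. The sketch below makes that concrete; the `fish` records, the field names, and the helpers `select` and `kind` are all hypothetical and exist only for illustration.

```python
# Hypothetical records for three sampled fish (all values are made up).
fish = [
    {"length_mm": 212, "weight_g": 130, "age_yr": 2, "mercury_ppm": 0.21},
    {"length_mm": 240, "weight_g": 175, "age_yr": 3, "mercury_ppm": 0.34},
    {"length_mm": 198, "weight_g": 110, "age_yr": 1, "mercury_ppm": 0.15},
]


def select(records, variables):
    """Keep only the named variables, one tuple of observations per unit."""
    return [tuple(record[v] for v in variables) for record in records]


def kind(variables):
    """Name the data set type by the number of variables observed per unit."""
    count = len(variables)
    if count == 1:
        return "univariate"
    if count == 2:
        return "bivariate"
    return "multivariate"


for variables in (
    ["length_mm"],
    ["length_mm", "mercury_ppm"],
    ["length_mm", "weight_g", "age_yr", "mercury_ppm"],
):
    print(kind(variables), select(fish, variables))
```

The same three fish produce all three kinds of data set; only the number of variables kept for each unit changes.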

When working with bivariate or multivariate data, the variables may have different uses. For the fish data, the goal of the study may be to predict the level of methyl mercury in fish; that is, methyl mercury level is the response variable. A response variable, or outcome variable, is one whose outcome is of primary interest. The methyl mercury level could depend on many factors, including the environment and traits of the fish. Fish length, age, weight, and gender may be potentially useful in explaining the level of methyl mercury and are called explanatory variables. An explanatory variable is one that may explain or cause differences in the response variable.
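
One way to picture these roles is to split each record into the explanatory values and the response value. The sketch below does so for a few hypothetical fish records; the field names and numbers are invented for illustration, and the split only records which role each variable plays. It does not establish that the explanatory variables actually explain or cause anything.

```python
# Hypothetical records for three sampled fish (all values are made up).
fish = [
    {"length_mm": 212, "weight_g": 130, "age_yr": 2, "gender": "F", "mercury_ppm": 0.21},
    {"length_mm": 240, "weight_g": 175, "age_yr": 3, "gender": "M", "mercury_ppm": 0.34},
    {"length_mm": 198, "weight_g": 110, "age_yr": 1, "gender": "F", "mercury_ppm": 0.15},
]

# Roles assigned by the goal of the study, not by the data themselves.
response = "mercury_ppm"
explanatory = ["length_mm", "weight_g", "age_yr", "gender"]

# One row of explanatory values and one response value per fish.
explanatory_rows = [[record[v] for v in explanatory] for record in fish]
response_values = [record[response] for record in fish]

for row, value in zip(explanatory_rows, response_values):
    print(row, "->", value)
```

If the study instead aimed to predict weight, the same records would be split differently, with `weight_g` as the response.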

Notice, in the fish example, that the natures of the variables differ. Length, weight, and age are numerical (or quantitative) variables; that is, each observation for these variables is a number. A numerical variable is said to be continuous if the set of possible values that may be observed for the variable has an uncountable number of points; that is, the set of possible values of the variable includes one or more intervals on the number line. Length and weight represent two continuous variables. Both must be positive. Although the sensitivity of the measuring device may limit us to recording observations to the nearest millimeter or gram, the true values could be any value in an interval.

A numerical variable is said to be discrete if the set of possible values that may be observed for this variable has a countable number of points. The ages of fish are often determined by growth rings on the scales. In the summer, fish grow rapidly, forming a band of widely separated, light rings. During the winter, slower growth is indicated by narrow separations between the rings, resulting in a dark band. Each pair of rings indicates one year. Because fish spawn at a specific time of year, during the spring for many species, age is generally recorded by year. Age 0 fish are less than a year old, age 1 fish are between 1 and 2 years, and so on. Thus, age is a discrete variable with possible values of 0, 1, 2, . . . Although there is undoubtedly an upper limit to age, we have represented the possible ages as being a countably infinite number of values.
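
The recording rule for age can be sketched as rounding the elapsed time down to a whole number of years. The function name `recorded_age` and its input `years_since_hatch` are assumptions made for this illustration; in the field, age is read from growth rings rather than computed from a known hatch date.

```python
import math


def recorded_age(years_since_hatch: float) -> int:
    """Map continuous elapsed time to the discrete age class 0, 1, 2, ..."""
    if years_since_hatch < 0:
        raise ValueError("elapsed time cannot be negative")
    # Age 0 covers [0, 1) years, age 1 covers [1, 2) years, and so on.
    return math.floor(years_since_hatch)


for elapsed in (0.7, 1.0, 1.99, 2.3):
    print(elapsed, "->", recorded_age(elapsed))
```

The input can take any value in an interval, so it behaves like a continuous quantity, while the output takes only whole-number values, which is what makes recorded age discrete.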

Gender is a different type of variable; it is categorical (or qualitative) in nature. A variable is categorical if the possible responses are categories. Each fish is in one of two categories: male or female. We may arbitrarily associate a number with the category, but that does not change the nature of the variable. Car manufacturers, brands of battery, and types of injury are other examples of categorical variables.
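
The point about arbitrary number codes can be shown with a short sketch. The observations and both coding schemes below are hypothetical. Category counts are the same under either coding, while an arithmetic summary such as the mean of the codes changes with the arbitrary choice of numbers and so carries no meaning.

```python
from collections import Counter

# Hypothetical gender observations for five fish.
genders = ["F", "M", "F", "F", "M"]

# Two arbitrary ways to attach numbers to the same categories.
codes_a = {"F": 0, "M": 1}
codes_b = {"F": 2, "M": 1}

coded_a = [codes_a[g] for g in genders]
coded_b = [codes_b[g] for g in genders]

# Counts per category do not depend on the coding scheme.
counts = Counter(genders)
print("counts:", dict(counts))

# The mean of the codes depends entirely on which numbers were chosen.
mean_a = sum(coded_a) / len(coded_a)
mean_b = sum(coded_b) / len(coded_b)
print("mean of codes (scheme A):", mean_a)
print("mean of codes (scheme B):", mean_b)
```

For categorical variables, summaries based on counts or proportions per category are the natural choice.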

Populations, Samples, and Variables In Short

This lesson has provided a brief overview of some of the key ideas in statistics. As with any science, terms have special meaning, and a number of the common statistical terms have been introduced in this lesson. Both the ideas and the terms will recur throughout this text, which should help you become more comfortable with them.

Find practice problems and solutions for these concepts at Populations, Samples and Variables Practice Exercises.
