Source Data and Sampling Frames Help

By McGraw-Hill Professional
Updated on Aug 26, 2011

Introduction to Source Data and Sampling Frames

In the real world, data are gathered by a process called sampling. It's important that the sampling process be carried out correctly, and that errors of all kinds be minimized (unless the intent is to deceive).

A statistical experiment is carried out in four steps, which must be done in order:

  1. Formulate the question. What do we want to know, and what (or who) do we want to know it about?
  2. Gather the data from the required places, from the right people, and over a sufficient period of time.
  3. Organize and analyze the data, so it becomes information.
  4. Interpret the information gathered and organized from the experiment, so it becomes knowledge.

Source Data

Primary Versus Secondary

If no data are available for analysis, we have to collect them ourselves. Data collected by the statistician are called primary source data. If data are already available and all a statistician has to do is organize and analyze them, they are called secondary source data. Certain precautions must be taken when using either kind of data.

In the case of primary source data, we must be careful to follow the proper collection schemes, and then we have to be sure we use the proper methods to organize, evaluate, and interpret the data. That is, we have to ensure that each of Steps 2, 3, and 4 above is done properly. With secondary source data, the collection process has already been done for us, but we still have to organize, evaluate, and interpret the data, so we have to carry out Steps 3 and 4. Either way, there's plenty to be concerned about. There are many ways for things to go wrong with an experiment, but only one way to get it right.

Sampling Frames

The most common data-collection schemes involve obtaining samples that represent a population with minimum (and ideally no) bias. This is easy when the population is small, because then the entire population can be sampled. However, a good sampling scheme can be difficult to organize when a population is large, and especially when it is not only huge but also spread out over a large region or a long period of time.

The term population refers to a particular set of items, objects, phenomena, or people being analyzed. An example of a population is the set of all the insects in the world. A sample of a population is a subset of that population. Consider, as a sample from the foregoing population, the set of all the mosquitoes in the world that carry malaria.
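The subset relationship between a population and a sample can be sketched in Python. This is a toy illustration, not real data: the `Insect` record type and its values are hypothetical, and the sample is defined by a rule (malaria-carrying mosquitoes), just as in the example above, rather than drawn at random.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes instances hashable, so they can be stored in a set
class Insect:
    """Hypothetical record for one insect in a toy population."""
    insect_id: int
    kind: str
    carries_malaria: bool

# Toy population: a handful of made-up insects standing in for "all insects".
population = {
    Insect(1, "mosquito", True),
    Insect(2, "mosquito", False),
    Insect(3, "beetle", False),
    Insect(4, "mosquito", True),
    Insect(5, "ant", False),
}

# The sample: every malaria-carrying mosquito in the population.
sample = {i for i in population if i.kind == "mosquito" and i.carries_malaria}

print(sample <= population)                 # True: the sample is a subset
print(sorted(i.insect_id for i in sample))  # [1, 4]
```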

It can be useful in some situations to define a set that is intermediate between a sample and a population, especially when the population is huge. A sampling frame is a set of items within a population from which a sample is chosen. The idea is to whittle down the number of items that must be examined, while still obtaining a sample that fairly represents the population. In the mosquito experiment, the sampling frame might be the set of all mosquitoes caught by teams of researchers, one team for each 10,000 square kilometers (10⁴ km²) of land surface area in the world, on the first day of each month for one complete calendar year. We could then test all the recovered insects for the presence of the malaria protozoan.
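The scale of this collection effort is simple arithmetic. The sketch below assumes a world land area of roughly 149 million km², an approximate figure that does not appear in the text; the one-team-per-10⁴-km² grid and the 12 monthly collection days come from the example.

```python
import math

# Assumption: Earth's land surface area is roughly 148.9 million km^2.
LAND_AREA_KM2 = 148_900_000
AREA_PER_TEAM_KM2 = 10_000  # one team per 10^4 km^2, as in the text
COLLECTION_DAYS = 12        # first day of each month for one calendar year

# Round up so that a leftover partial cell of land still gets its own team.
teams = math.ceil(LAND_AREA_KM2 / AREA_PER_TEAM_KM2)
collection_events = teams * COLLECTION_DAYS

print(teams)              # 14890
print(collection_events)  # 178680
```

Under that assumption, the frame is the union of the catches from about 180,000 team-days of collecting.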

In the simplest case, the sampling frame coincides with the population (Fig. 5-1A). However, in the mosquito experiment described above, the sampling frame is small in comparison with the population (Fig. 5-1B). Occasionally, a population is so large, diverse, and complicated that two sampling frames might be used, one inside the other (Fig. 5-1C). For example, if the number of mosquitoes caught in the above process is so large that testing each one individually would take too much time, we could select, say, 1% of the caught mosquitoes at random and test each of those.
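Picking 1% of the caught mosquitoes at random is a simple random sample drawn from the outer frame. Here is a minimal sketch, assuming the caught insects are identified by integer IDs; the catch size of 50,000 and the helper name `random_subsample` are hypothetical. It relies on the standard library's `random.Random.sample`, which draws without replacement, so no insect is selected twice.

```python
import random

def random_subsample(frame, fraction, seed=None):
    """Return a simple random sample (without replacement) of about
    `fraction` of the items in the list `frame`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not frame:
        return []
    rng = random.Random(seed)                  # private generator; seed makes it reproducible
    k = max(1, round(len(frame) * fraction))   # always test at least one insect
    return rng.sample(frame, k)

# Hypothetical outer frame: IDs of every mosquito caught by the field teams.
caught = list(range(50_000))

# Inner frame: 1% of the caught mosquitoes, chosen at random for testing.
to_test = random_subsample(caught, 0.01, seed=42)
print(len(to_test))  # 500
```

Passing a fixed `seed` makes the draw reproducible, which helps if someone later needs to audit which insects were tested; omit it to get a fresh draw each time.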

Fig. 5-1. Sampling frames.
