Random Sampling Help

By McGraw-Hill Professional
Updated on Sep 13, 2011

Introduction to Random Sampling

When we want to analyze some characteristic of a large population and need an unbiased cross-section of that population, we can use random sampling. To ensure that a sampling process is random (or as close to random as we can get), we use sets, lists, or tables of so-called random numbers.

A Random Sampling Frame

Think of the set T of all the telephone numbers in the United States of America (USA) in which the last two digits are 8 and 5. In theory, any element t of T can be expressed in a format like this:

t = abc-def-gh85

where each value a through h is some digit from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The dashes are not minus signs; they're included to separate number groups, as is customarily done in USA telephone numbers.

If you're familiar with the format of USA phone numbers, you'll know that the first three digits a, b, and c together form the area code, and the next three digits d, e, and f represent the dialing prefix. Some of the values generated in this way aren't valid USA phone numbers. For example, the format above implies that there can be 1000 area codes, but there aren't that many area codes in the set of valid USA phone numbers. If a randomly chosen phone number isn't a valid USA phone number, we agree to reject it. So all we need to do is generate random digits for each value of a through h in the following generalized number:

a,bcd,efg,h85

where h represents the "100s digit," g represents the "1000s digit," f represents the "10,000s digit," and so on up to a, which represents the "1,000,000,000s digit." We can pick out strings eight digits long from a random-digit list, and plug them in repeatedly as values a through h to get 10-digit phone numbers ending in 85.
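The procedure just described can be sketched in Python. This is only an illustration under stated assumptions: the pseudo-random generator in the standard `random` module stands in for a printed random-digit list, and the function names and the tiny set `EXAMPLE_AREA_CODES` are hypothetical placeholders, not a real list of valid area codes.

```python
import random

# Hypothetical stand-in for a full list of valid area codes (illustration only).
EXAMPLE_AREA_CODES = {"212", "307", "808"}


def phone_from_digits(digits):
    """Format eight digits a..h as a phone number abc-def-gh85."""
    if len(digits) != 8 or any(d not in range(10) for d in digits):
        raise ValueError("expected eight digits, each from 0 to 9")
    a, b, c, d, e, f, g, h = digits
    return f"{a}{b}{c}-{d}{e}{f}-{g}{h}85"


def is_valid(number):
    """Placeholder validity rule: accept only the example area codes."""
    return number[:3] in EXAMPLE_AREA_CODES


def sample_numbers(n, rng):
    """Draw candidates until n of them pass the validity rule."""
    accepted = []
    while len(accepted) < n:
        digits = [rng.randrange(10) for _ in range(8)]  # values for a..h
        candidate = phone_from_digits(digits)
        if is_valid(candidate):  # invalid numbers are rejected, as agreed
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    for number in sample_numbers(5, random.Random()):
        print(number)
```

Rejecting invalid candidates and drawing again keeps every valid number in the frame equally likely, at the cost of some wasted draws.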

Smaller and Smaller

In the above scenario, we're confronted with a gigantic sampling frame. In reality, the number of elements is smaller than the maximum possible, because many of the telephone numbers derived by random-number generation are not assigned to anybody.

Suppose we conduct an experiment that requires us to get a random sampling of telephone numbers in the USA. By generating samples on the above basis, that is, picking out those that end in the digits 85, we're off to a good start. It is reasonable to think that this sampling frame is an unbiased cross-section of all the phone numbers in the USA. But we'll want to use a smaller sampling frame.

What will happen if we go through a list of all the valid area codes in the USA, and throw out sequences abc that don't represent valid area codes? This still leaves us with a sampling frame larger than the set of all assigned numbers in the USA. How about allowing any area code, valid or not, but insisting that the number ab,cde,fgh be divisible by 7? Within the set of all possible such numbers, we would find the set of all numbers that produce a connection when dialed, that is, the set of all "actual numbers" (Fig. 5-2). Once we decided on a sampling frame, we could get busy with our research.
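To get a feel for how much each restriction shrinks the frame, the counts can be worked out directly. The sketch below assumes the eight free digits a through h are read as one integer from 0 to 99,999,999; the function names are hypothetical, and the number of valid area codes is left as a parameter rather than asserted.

```python
import random

TOTAL = 10 ** 8  # all possible settings of the eight digits a..h


def count_divisible_by(k, total=TOTAL):
    """Count the integers in 0 .. total-1 that are divisible by k."""
    return (total - 1) // k + 1


def frame_size_with_area_codes(n_codes):
    """Frame size when abc is limited to n_codes area codes (d..h stay free)."""
    return n_codes * 10 ** 5


def draw_divisible_by(k, rng, total=TOTAL):
    """Draw uniformly from the values of ab,cde,fgh that are divisible by k."""
    return k * rng.randrange(count_divisible_by(k, total))


def to_phone(value):
    """Format an eight-digit value as abc-def-gh85, padding with leading zeros."""
    s = f"{value:08d}"
    return f"{s[0:3]}-{s[3:6]}-{s[6:8]}85"


if __name__ == "__main__":
    print("unrestricted frame:", TOTAL)
    print("divisible by 7:", count_divisible_by(7))
    print("example draw:", to_phone(draw_divisible_by(7, random.Random())))
```

Under these assumptions the divisible-by-7 rule keeps roughly one candidate in seven, while an area-code restriction scales the frame by the fraction of the 1000 three-digit codes that are retained.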

Replacement or Not?
