**Introduction to Random Sampling**

When we want to analyze something in a large population and get an unbiased cross-section of that population, we can use *random sampling*. To ensure that a sampling process is random (or as close to random as we can get), we use sets, lists, or tables of so-called *random numbers*.

**A Random Sampling Frame**

Think of the set *T* of all the telephone numbers in the United States of America (USA) that end in the digits 85, in that order. In theory, any element *t* of *T* can be expressed in a format like this:

*t* = *abc*-*def*-*gh*85

where each value *a* through *h* is some digit from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The dashes are not minus signs; they're included to separate number groups, as is customarily done in USA telephone numbers.

If you're familiar with the format of USA phone numbers, you'll know that the first three digits *a*, *b*, and *c* together form the *area code*, and the next three digits *d*, *e*, and *f* represent the *dialing prefix*. Some of the values generated in this way aren't valid USA phone numbers. For example, the format above implies that there can be 1000 area codes, but there aren't that many area codes in the set of valid USA phone numbers. If a randomly chosen phone number isn't a valid USA phone number, we agree to reject it. So all we need to do is generate random digits for each value of *a* through *h* in the following generalized number:

*a*,*bcd*,*efg*,*h*85

where *h* represents the "100s digit," *g* represents the "1000s digit," *f* represents the "10,000s digit," and so on up to *a*, which represents the "1,000,000,000s digit." We can pick out strings eight digits long from a random-digit list and plug them in repeatedly as the values *a* through *h* to get 10-digit phone numbers ending in 85.
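The draw-and-reject procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions. Python's `random` module stands in for a printed random-digit table. It is a pseudorandom generator, so it only approximates truly random digits. The helper names `draw_eight_digits`, `to_phone_number`, and `sample_number` are hypothetical. The validity rule in the usage line (reject area codes beginning with 0 or 1) is a simplified stand-in for a real check against assigned numbers.

```python
import random
from typing import Callable, Optional

def draw_eight_digits(rng: random.Random) -> list[int]:
    """Draw the eight digits a..h, each uniformly from 0-9 (stand-in for a random-digit table)."""
    return [rng.randrange(10) for _ in range(8)]

def to_phone_number(digits: list[int]) -> str:
    """Place digits a..h into the pattern abc-def-gh85."""
    if len(digits) != 8 or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("expected exactly eight digits 0-9")
    s = "".join(str(d) for d in digits) + "85"  # ten characters: a..h, then 8, 5
    return f"{s[0:3]}-{s[3:6]}-{s[6:10]}"

def sample_number(rng: random.Random,
                  is_valid: Callable[[str], bool],
                  max_tries: int = 10_000) -> Optional[str]:
    """Keep drawing until is_valid accepts a candidate; rejected candidates are discarded."""
    for _ in range(max_tries):
        candidate = to_phone_number(draw_eight_digits(rng))
        if is_valid(candidate):
            return candidate
    return None  # nothing passed within max_tries

if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is reproducible
    # Hypothetical, simplified validity rule: area code must not start with 0 or 1.
    print(sample_number(rng, lambda t: t[0] not in "01"))
```

Because every eight-digit string is equally likely on each draw, discarding the rejects leaves every accepted number equally likely as well.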

**Smaller and Smaller**

In the above scenario, we're confronted with a gigantic sampling frame. In reality, the number of elements that matter is much smaller than the maximum possible, because many of the telephone numbers produced by random-number generation are not assigned to anybody.
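To put a figure on the size of the full frame: each of the eight positions *a* through *h* can hold any of ten digits, and the last two digits are fixed at 8 and 5, so the number of possible elements is

```latex
|T| = \underbrace{10 \times 10 \times \cdots \times 10}_{8\ \text{factors}} = 10^{8} = 100{,}000{,}000
```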

Suppose we conduct an experiment that requires a random sample of telephone numbers in the USA. By generating samples on the above basis, that is, keeping only numbers that end in the digits 85, we're off to a good start. It is reasonable to think that this sampling frame is an unbiased cross-section of all the phone numbers in the USA. But we'll want to use a smaller sampling frame.

What will happen if we go through a list of all the valid area codes in the USA and throw out sequences *abc* that don't represent valid area codes? This still leaves us with a sampling frame larger than the set of all assigned numbers in the USA. How about allowing any area code, valid or not, but insisting that the number *ab*,*cde*,*fgh* be divisible by 7? Within the set of all numbers that satisfy such a rule, we would find a subset of numbers that produce a connection when dialed, that is, "actual numbers" (Fig. 5-2). Once we have decided on a sampling frame, we can get busy with our research.
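The two reductions proposed above can be sketched as filters on the eight random digits *a* through *h*, read as the single number *ab*,*cde*,*fgh*. This is an illustration under assumptions. `VALID_AREA_CODES` is a hypothetical placeholder holding a handful of area codes, where a real study would load the current official list. The helper names are invented for this example. The sketch also counts how many eight-digit strings survive the divisible-by-7 rule.

```python
import random
from typing import Callable

# Hypothetical placeholder sample; a real study would use the full official list.
VALID_AREA_CODES = {"201", "212", "305", "415", "617", "713", "907"}

def in_area_code_frame(digits: str) -> bool:
    """Keep the string only if its first three digits (abc) are a listed area code."""
    return digits[:3] in VALID_AREA_CODES

def in_divisible_by_7_frame(digits: str) -> bool:
    """Keep the string only if ab,cde,fgh (the eight digits as one integer) is divisible by 7."""
    return int(digits) % 7 == 0

def draw_from_frame(rng: random.Random,
                    keep: Callable[[str], bool],
                    max_tries: int = 1_000_000) -> str:
    """Draw eight random digits until `keep` accepts them, then append the fixed 85."""
    for _ in range(max_tries):
        digits = "".join(str(rng.randrange(10)) for _ in range(8))
        if keep(digits):
            return digits + "85"
    raise RuntimeError("no candidate accepted within max_tries")

# Multiples of 7 among 00000000..99999999 are 0, 7, ..., 99,999,998.
FRAME_SIZE_DIV7 = 99_999_999 // 7 + 1  # 14,285,715 candidates

if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the run is reproducible
    print(draw_from_frame(rng, in_area_code_frame))
    print(draw_from_frame(rng, in_divisible_by_7_frame))
    print(FRAME_SIZE_DIV7)
```

The divisible-by-7 rule shrinks the frame from 100,000,000 to 14,285,715 candidates without consulting any outside list. The area-code filter, by contrast, depends entirely on how complete the list of valid codes is.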
