Intelligence Testing

Updated on Dec 23, 2009

Intelligence and intelligence testing are among the most controversial and polemical topics in the field of psychology. Psychologists, educators, and the lay public alike seem to have a love-hate relationship with the concept of intelligence, and even more so with intelligence testing. Intelligence tests, in some form, are among the most widely used of all psychological tests. Tests for special aptitudes are also available and are widely used for specialized diagnostic purposes and for specialized aspects of personnel selection, but these tests all measure some aspect of intellectual function. This entry describes intelligence testing in general terms, provides a brief history of intelligence tests, and presents their fundamental assumptions, their applications, and an introduction to their interpretation.


Achievement tests are those designed to assess students' knowledge or skills in a content domain in which they have received instruction. In contrast, intelligence tests are broader in scope than achievement tests and are designed to measure the cognitive skills, abilities, and knowledge that individuals have accumulated as the result of their overall life experiences, coupled with skill in applying these attributes to problem solving. In other words, while achievement tests are tied to a specific program of instruction, intelligence tests reflect the cumulative impact of life experiences as a whole in concert with an individual's underlying or latent ability to use information. The general intelligence factor, g, is the most reliable component present in any multifactorial view of intelligence (Jensen, 1998). In the Cattell-Horn model of intelligence (Horn & Cattell, 1966; Kamphaus, 2001), g is the dominant factor in the hierarchy of multiple abilities, with the next two dominant facets being crystallized and fluid intelligence.

Crystallized intelligence tends to be related more closely to verbal domains as a practical matter and is defined as the application of knowledge to problem solving. Fluid intelligence tends to be related more closely to nonverbal domains as a practical matter and is defined more strictly as reasoning and problem solving in the absence of any requirement for prior knowledge. It turns out that people do not really know how to assess reasoning and problem solving in the total absence of knowledge and so most tests of fluid intelligence attempt to approximate this perfect state to the extent possible by using principally nonverbal tasks that do not require knowledge of language or language concepts (Reynolds & Kamphaus, 2003).

The inclusion of crystallized intelligence measures as a component of most intelligence tests has led many people to believe, erroneously, that intelligence tests are simply measures of what people have learned. While intelligence and knowledge are certainly correlated, intelligence as measured on modern individually administered tests of intelligence and even many group measures is more directed at the assessment of problem solving and reasoning skill as opposed to static knowledge or learned content. The latter is the domain of achievement testing (Reynolds, Livingston, & Willson, 2006).

This introduction might suggest that there is a clear and universally accepted distinction between achievement and intelligence tests. In actual practice, however, this is not the case, and the distinction is a matter of degree. Many, if not most, testing experts conceptualize both achievement and intelligence tests as tests of developed cognitive abilities that can be ordered along a continuum in terms of how closely linked the assessed abilities are to specific learning experiences. The abilities measured by achievement tests are specifically linked to academic instruction or training. In contrast, the knowledge and abilities measured by intelligence tests are acquired through a broad range of life experiences, including those at school, home, work, and all other settings.

General intelligence tests historically have been the most popular and widely used aptitude tests in school settings. While many people are familiar with the concept of intelligence and use the term in everyday conversations, it is not easy to develop a definition of intelligence on which everyone agrees. While many people, lay or professional, will have their own separate definition of intelligence, most of these definitions will incorporate abilities such as problem solving, abstract reasoning, and the ability to acquire knowledge. Developing a consensus beyond this point has proved quite difficult.


Intelligence tests had their beginning in the schools of France in the early 1900s, when a compulsory education program was initiated. Alfred Binet (1857–1911) and his colleague Theodore Simon (1873–1961) had been attempting to develop a measure of intelligence for some years and were commissioned by the French government to develop a test that could predict academic performance accurately. The result of their efforts was the first Binet-Simon Scale, released in 1905. This test contained problems arranged in order of difficulty that assessed a wide range of abilities. The test contained some sensory-perceptual tests, but the emphasis was on verbal items assessing comprehension, reasoning, and judgment. Subsequent revisions of the Binet-Simon Scale were released in 1908 and 1911. These scales gained wide acceptance in France and were soon translated and standardized in the United States by Lewis Terman (1877–1956) at Stanford University. Terman's work resulted in the Stanford-Binet Intelligence Test (1916), which has been revised numerous times and continues to be a prominent intelligence test used in the early 2000s.

The introduction of the Stanford-Binet intelligence scales in the United States by Terman occurred in close proximity to World War I. Seeing the success of this approach to measuring mental ability, the U.S. Army set out to devise a means of evaluating recruits. A group of psychologists headed by Robert Yerkes (1876–1956) subsequently developed the Army Alpha and Army Beta examinations, which quickly became the most widely used group intelligence tests in the world. This widespread use also had the effect of familiarizing literally millions of individuals with the concept of intelligence testing and made it an acceptable enterprise. Not long afterward, the College Entrance Examination Board began development and employment of what became the SAT, a conglomerated measure of achievement and intelligence.

The development and success of the Binet-Simon Scale, and subsequently the Stanford-Binet Intelligence Test and the U.S. Army testing programs, ushered in the era of widespread intelligence testing in the United States. Following the model of the Stanford-Binet Intelligence Test, other assessment experts developed and released their own intelligence tests. Some of the tests were designed for individual administration (such as the Stanford-Binet Intelligence Test) while others were designed for group administration. Some of these tests placed more emphasis on verbal and quantitative abilities while others placed more emphasis on visual-spatial and/or abstract problem-solving abilities. As a general rule, research has shown with considerable consistency that contemporary intelligence tests are good predictors of academic success. This correlation is to be expected, considering that predicting academic performance was the precise purpose for which the tests were initially developed over 100 years earlier. In addition to showing that the tests are good predictors of school performance, research has shown that IQs are fairly stable over time. Nevertheless, these tests became controversial as a result of the often-emotional debate over the meaning of intelligence. To avoid this association and possible misinterpretations, many test publishers adopted more neutral names such as “academic potential,” “scholastic ability,” “school ability,” “mental ability,” or simply “ability” to designate essentially the same construct to which the term intelligence referred.


Clearly, aptitude and intelligence tests have a long history of use in the schools. Their widespread use continues in the early 2000s, with major applications including the following (Reynolds et al., 2006; Reynolds & Kamphaus, 2003):

Providing alternative measures of cognitive abilities that reflect information not captured by standard achievement tests or school grades,

Providing objective evaluations of ability that do not reflect the subjective judgment of observers or others who may be influenced by irrelevant factors,

Helping teachers tailor instruction to meet a student's unique pattern of cognitive strengths and weaknesses,

Assessing how well students are prepared to profit from school experiences,

Identifying students who are underachieving and may need further assessment to rule out learning disabilities or other cognitive disorders, including mental retardation or intellectual disability,

Identifying students for gifted and talented programs,

Helping guide parents and students with educational and vocational planning.

While this list identifies the most common uses of aptitude/intelligence tests in the schools, the list is not exhaustive. Classroom teachers and school administrators are involved to varying degrees with these applications. For example, teachers are frequently called on to administer and interpret many of the group aptitude tests for their own students. School psychologists or other professionals with specific training in administering and interpreting clinical and diagnostic tests typically administer and interpret the individual intelligence and aptitude tests.


As with achievement tests, group and individual intelligence tests are commonly used in schools. Whereas teachers are often asked to help administer and interpret the group aptitude tests, school psychologists and other professionals with special training in administering and interpreting clinical and diagnostic tests usually administer and interpret the individual tests. The most frequently employed individually administered intelligence tests are reviewed briefly below.

Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV). The WISC-IV is, as of 2007, the most popular individual test of intellectual ability for children. Empirical surveys of school psychologists and other assessment personnel have consistently shown that the Wechsler scales are the most popular individual intelligence tests used in clinical and school settings with children. The WISC-IV, as is true of virtually all individually administered intelligence tests, must be administered by professionals with extensive training in psychological assessment. The WISC-IV is one of the longest of such intellectual assessments and takes approximately 2 to 3 hours to administer and score. Below are brief descriptions of the subtests (Wechsler, 2003):

Arithmetic—the student is presented a set of arithmetic problems that they solve mentally (i.e., no pencil and paper) and answer orally. This subtest involves numerical reasoning ability, mental manipulation, concentration, and auditory memory.

Block Design—the student reproduces a series of geometric patterns using red-and-white blocks. This subtest measures the ability to analyze and synthesize abstract visual stimuli, nonverbal concept formation, and perceptual organization.

Cancellation—the student scans sequences of visual stimuli and marks target forms. This subtest involves processing speed, visual attention, and vigilance.

Coding—the student matches and copies symbols that are associated with either objects (i.e., Coding A) or numbers (Coding B). This subtest is a measure of processing speed, short-term visual memory, mental flexibility, attention, and motivation.

Comprehension—the student responds to questions that are presented orally involving everyday problems or social situations. This subtest is a measure of verbal comprehension and reasoning as well as the ability to apply practical information.

Digit Span—the student is presented orally sequences of numbers that they repeat verbatim (i.e., Digits Forward) or in reverse order (i.e., Digits Backwards). This subtest involves short-term auditory memory, attention, and on Digits Backwards, mental manipulation.

Information—the student responds to questions that are presented orally involving a broad range of knowledge (e.g., science, history, and geography). This subtest measures the student's general fund of knowledge.

Letter-Number Sequencing—the student is read a list of letters and numbers and then recalls the letters in alphabetical order and the numbers in numerical order. This subtest involves short-term memory, sequencing, mental manipulation, and attention.

Matrix Reasoning—the student examines an incomplete matrix and then selects the item that correctly completes the matrix. This subtest is a measure of fluid intelligence and is considered a largely language-free and culture-fair measure of intelligence.

Picture Completion—the student is presented a set of pictures and must identify what important part is missing. This subtest measures visual scanning and organization as well as attention to essential details.

Picture Concepts—the student examines rows of objects and then selects objects that go together based on an underlying concept. This subtest involves nonverbal abstract reasoning and categorization.

Similarities—two words are presented orally to the student and the student must identify how they are similar. This subtest measures verbal comprehension, reasoning, and concept formation.

Symbol Search—the student scans groups of symbols and indicates if a target symbol is present. This subtest is a measure of processing speed, visual scanning, and concentration.

Vocabulary—the student is presented orally a series of words that the student must define. This subtest is primarily a measure of word knowledge and verbal conceptualization.

Word Reasoning—the student must identify the underlying or common concept that is implied by a series of clues. This subtest involves verbal comprehension, abstraction, and reasoning.

Information, Word Reasoning, Picture Completion, Arithmetic, and Cancellation are supplemental subtests while the other subtests are core subtests. The administration of supplemental subtests is not mandatory, but they may be used to substitute for a core subtest if the core subtest is seen as being inappropriate for a particular student (e.g., due to physical limitation). A supplemental subtest may also be used if a core subtest is invalidated for some reason (e.g., its administration is interrupted).

The WISC-IV produces four Index Scores. Below are brief descriptions of the Index Scores (Wechsler, 2003):

Verbal Comprehension Index (VCI) is a composite of Similarities, Vocabulary, and Comprehension. Information and Word Reasoning are supplemental VCI subtests. The VCI reflects verbal reasoning, verbal conceptualization, and knowledge of facts.

Perceptual Reasoning Index (PRI) is a composite of Block Design, Picture Concepts, and Matrix Reasoning. Picture Completion is a supplemental PRI subtest. The PRI reflects perceptual and nonverbal reasoning, spatial processing abilities, and visual-spatial-motor integration.

Working Memory Index (WMI) is a composite of Digit Span and Letter-Number Sequencing. Arithmetic is a supplemental WMI subtest. The WMI reflects the student's working memory capacity that includes attention, concentration, and mental control.

Processing Speed Index (PSI) is a composite of Coding and Symbol Search. Cancellation is a supplemental PSI subtest. The PSI reflects the student's ability to quickly process nonverbal material as well as attention and visual-motor coordination.
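The composition of the four Index Scores can be written down as a small lookup table, which also makes it easy to check that the core and supplemental subtests listed for each index agree with the earlier list of supplemental subtests. The following is a minimal sketch in Python; the dictionary layout and the helper name find_subtest are illustrative choices, not part of the WISC-IV materials, and the sketch does not model the manual's rules for when a substitution is permitted.

```python
# Composition of the WISC-IV Index Scores, transcribed from the
# descriptions in this entry. The data layout is an illustrative choice.
WISC_IV_INDEXES = {
    "VCI": {
        "core": ["Similarities", "Vocabulary", "Comprehension"],
        "supplemental": ["Information", "Word Reasoning"],
    },
    "PRI": {
        "core": ["Block Design", "Picture Concepts", "Matrix Reasoning"],
        "supplemental": ["Picture Completion"],
    },
    "WMI": {
        "core": ["Digit Span", "Letter-Number Sequencing"],
        "supplemental": ["Arithmetic"],
    },
    "PSI": {
        "core": ["Coding", "Symbol Search"],
        "supplemental": ["Cancellation"],
    },
}


def find_subtest(name):
    """Return (index, role) for a subtest; role is 'core' or 'supplemental'."""
    for index, roles in WISC_IV_INDEXES.items():
        for role, subtests in roles.items():
            if name in subtests:
                return index, role
    raise KeyError(f"unknown subtest: {name}")


# Flatten the table to count subtests by role.
core = [s for roles in WISC_IV_INDEXES.values() for s in roles["core"]]
supplemental = [s for roles in WISC_IV_INDEXES.values() for s in roles["supplemental"]]

print(len(core), len(supplemental))   # 10 5
print(find_subtest("Arithmetic"))     # ('WMI', 'supplemental')
```

The counts (10 core and 5 supplemental subtests) match the 15 subtests described earlier in this entry.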

The WISC-IV and its predecessors are designed for use with children between 6 and 16 years of age. For early childhood assessment, the Wechsler Preschool and Primary Scale of Intelligence, Third Edition (WPPSI-III) is available and is appropriate for children from 2 years 6 months to 7 years 3 months of age. The Wechsler Adult Intelligence Scale, Third Edition (WAIS-III) is appropriate for individuals between 16 and 89 years of age.

Stanford-Binet Intelligence Scales, Fifth Edition (SB5). The Stanford-Binet Intelligence Test was the first intelligence test to gain widespread acceptance in the United States. While the Wechsler scales have become the most popular and widely used intelligence tests in schools, the Stanford-Binet scales have continued to have a strong following. As of 2007, the most recent edition of these scales is the SB5, which was released in 2003. The SB5 is designed for use with individuals from 2 to 85 years of age. It contains 10 subtests, which are combined to produce five factor indices (i.e., Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, and Working Memory), two domain scores (i.e., Verbal IQ and Nonverbal IQ), and a Full Scale IQ (FSIQ) reflecting overall intellectual ability. A potentially appealing aspect of the SB5 is the availability of an Extended IQ scale that allows the calculation of FSIQs higher than 160, which can be useful in the assessment of extremely gifted individuals.

Woodcock-Johnson III (WJ III) Tests of Cognitive Ability. The WJ III Tests of Cognitive Ability has gained a loyal following and has some unique qualities that warrant mentioning. The battery is designed for use with individuals 2 to 90 years of age. The WJ III Tests of Cognitive Ability is based on the Cattell-Horn-Carroll (CHC) theory of cognitive abilities, which incorporates Cattell's and Horn's Gf-Gc theory and Carroll's three-stratum theory. The CHC model provides a comprehensive model for assessing a broad range of cognitive abilities, and many clinicians like this battery because it measures such a broad range of abilities.

Reynolds Intellectual Assessment Scales (RIAS). The RIAS is a newcomer to the clinician's collection of intelligence tests. It is designed for use with individuals between 3 and 94 years of age and incorporates a co-normed supplemental memory test. One particularly desirable aspect of the RIAS is the ability to obtain a reliable and valid measure of intellectual ability that incorporates both verbal and nonverbal abilities (crystallized and fluid intelligence) in a relatively brief period (i.e., 20–25 minutes). Most other tests that assess verbal and nonverbal cognitive abilities require considerably more time. The supplemental memory tests require about 10 minutes for administration, so a clinician can assess both memory and intelligence in approximately 30 minutes.


In the early decades of intelligence testing, intelligence test scores were expressed as a true quotient, hence the term IQ, or intelligence quotient. An IQ was defined as the ratio of the examinee's mental age (MA) to the examinee's chronological age (CA), multiplied by 100 to avoid fractional scores: IQ = (MA/CA) × 100. This method of calculating an IQ has serious psychometric and related measurement problems and was abandoned decades ago, although it continues to be presented in many introductory psychology and education textbooks. In the early 2000s, IQs are calculated as age-corrected deviation scaled scores. These are formal transformations of raw scores (i.e., number of points obtained or items answered correctly) into a standard score format that uses the mean and the standard deviation of the raw scores at predetermined age intervals, so that a given IQ has the same percentile ranking at each age level, which is not true of the old ratio-style IQ. Table 1 presents a common system for ascribing a qualitative descriptor to various score ranges found on most common intelligence tests, nearly all of which (including all of those reviewed above) report IQs using a metric where the mean IQ is equal to 100 and the standard deviation is 15. When accompanied by significant deficits in adaptive behavior and occurring during the developmental period, scores below 70 are commonly associated with varying degrees of mental retardation or intellectual disability, while scores above 130 are often used to designate individuals as being intellectually talented or cognitively gifted.
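The difference between the two scoring approaches can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: published tests convert raw scores with norm tables rather than a single linear formula, and the percentile calculation below assumes that scores are normally distributed. The function names and the example raw-score mean and standard deviation are hypothetical.

```python
from statistics import NormalDist


def ratio_iq(mental_age, chronological_age):
    """Historical ratio IQ: (MA / CA) x 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score, age_mean, age_sd, mean=100.0, sd=15.0):
    """Linear deviation score on the mean-100, SD-15 metric.

    age_mean and age_sd are the raw-score mean and standard deviation
    for the examinee's age group (hypothetical values in the example).
    """
    z = (raw_score - age_mean) / age_sd
    return mean + sd * z


def percentile_rank(iq, mean=100.0, sd=15.0):
    """Percentile rank of an IQ, ASSUMING a normal distribution."""
    return NormalDist(mean, sd).cdf(iq) * 100


# A child of 8 with a mental age of 10 under the old ratio definition.
print(ratio_iq(10, 8))                 # 125.0

# A raw score one standard deviation above the age-group mean.
print(deviation_iq(42, 36, 6))         # 115.0

# Under the normality assumption, the cutoffs of 70 and 130 fall
# near the 2nd and 98th percentiles.
print(round(percentile_rank(70), 1))   # 2.3
print(round(percentile_rank(130), 1))  # 97.7
```

Because the deviation score is computed against the mean and standard deviation of the examinee's own age group, a score of 115 sits at the same percentile rank at every age, which is the property the ratio IQ lacks.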

The scores from intelligence tests are derived from large samples of individuals drawn using what is known as population-proportionate stratified random sampling. Because all individuals in the United States cannot be tested, a sample is drawn to represent the entire population. This sample is typically chosen to be representative of the general population of the United States on the basis of gender, ethnicity, socioeconomic status or educational level, region of residence within the United States, and community size, including urban and rural areas.
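As a rough illustration of proportionate stratified random sampling, the Python sketch below allocates a fixed sample size across strata in proportion to their population shares and then draws randomly within each stratum. The stratum labels, the shares, and the function names are hypothetical, and a real standardization sample crosses several stratification variables at once rather than using one.

```python
import random
from collections import defaultdict


def proportionate_allocation(shares, sample_size):
    """Split sample_size across strata in proportion to their shares.

    Uses the largest-remainder method so the counts sum exactly
    to sample_size even when the shares do not divide evenly.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("stratum shares must sum to 1")
    exact = {s: share * sample_size for s, share in shares.items()}
    counts = {s: int(x) for s, x in exact.items()}
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - counts[s], reverse=True)
    for s in by_remainder[:leftover]:
        counts[s] += 1
    return counts


def stratified_sample(people, stratum_of, shares, sample_size, seed=0):
    """Draw a random sample within each stratum, sized proportionately."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for person in people:
        groups[stratum_of(person)].append(person)
    sample = []
    for stratum, n in proportionate_allocation(shares, sample_size).items():
        sample.extend(rng.sample(groups[stratum], n))
    return sample


# Hypothetical population: 400 urban and 100 rural residents.
population = [(i, "rural" if i % 5 == 0 else "urban") for i in range(500)]
shares = {"urban": 0.8, "rural": 0.2}  # hypothetical population shares

sample = stratified_sample(population, lambda p: p[1], shares, 50)
urban = sum(1 for _, stratum in sample if stratum == "urban")
print(urban, len(sample) - urban)  # 40 10
```

The sample mirrors the population on the stratification variable by construction, while the selection of individuals within each stratum remains random.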

Scores from intelligence tests are interpreted properly only when the standardized instructions for administering and scoring the test have been followed rigidly. Deviations from standardized administration and scoring cause the scores to move up or down for an individual examinee inappropriately and in ways that are unpredictable, rendering the scores uninterpretable (Lee, Reynolds, & Willson, 2003). Intelligence test scores are viewed by some as reflecting innate potential but clearly that is not the case. While innate ability contributes to intelligence test performance, many other variables contribute to performance on ability measures as well.

Intelligence as measured on such tests as described here is a summative construct at any given point that is a reflection not only of a person's innate potential but the interaction of this potential with the entire life experiences of the individual as well as factors such as early stimulation, nutrition, prenatal care, and numerous other variables too extensive to list and discuss here. Proper interpretation of intelligence tests requires knowledge of the examinee's history, background, educational exposure, and generally the context of the examinee's life, especially when clinical diagnoses are being considered. Intelligence tests in the schools are very good predictors of academic achievement, but even this prediction is predicated upon averages among the various examinees. This qualification means that intelligence tests' predictions of future attainment are based on various assumptions about individuals taking such tests. Such assumptions, for example, would include the assumption that a particular examinee is no more motivated to achieve than the average person taking the test, that such an examinee would spend no more and no less time studying in any particular academic area, and would have no more or no less opportunity to acquire information in a particular academic domain. To the extent such assumptions are violated, the predictive schema of the intelligence test score interpretation would not hold.


Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory of fluid and crystallized general intelligence. Journal of Educational Psychology, 57, 253–270.

Jensen, A. (1998). The suppressed relationship between IQ and the reaction time slope parameter of the Hick function. Intelligence, 26, 43–52.

Kamphaus, R. W. (2001). Clinical assessment of child and adolescent intelligence (2nd ed.). Boston: Allyn & Bacon.

Lee, D., Reynolds, C. R., & Willson, V. L. (2003). Standardized test administration: Why bother? Journal of Forensic Neuropsychology, 3, 55–81.

Reynolds, C. R., & Kamphaus, R. W. (2003). Reynolds intellectual assessment scales and Reynolds intellectual screening test: Professional manual. Lutz, FL: Psychological Assessment Resources.

Reynolds, C. R., Livingston, R. A., & Willson, V. L. (2006). Measurement and assessment in the classroom. Boston: Allyn & Bacon.

Wechsler, D. (2003). Wechsler intelligence scale for children (4th ed.). San Antonio, TX: The Psychological Corporation.
