Tests can be categorized into two major groups: norm-referenced tests and criterion-referenced tests. These two types of tests differ in their intended purposes, the way in which content is selected, and the scoring process, which defines how the test results must be interpreted. A criterion-referenced assessment has set criteria to be achieved, so the pass-fail outcome is its most important aspect. A driving test is a good example of a criterion-referenced assessment.
In theory, all students could pass the assessment or, alternatively, all students could fail it. Criterion-referenced tests document individual performance in relation to a domain of information or a specific set of skills. Criterion-referenced tests are related directly to instructional objectives, are based on task analysis, and are designed to measure changes in successive performances of an individual. Criterion-referenced tests, therefore, are sensitive to and can be used to measure the effects of instruction.
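The pass-fail logic described above can be sketched in a few lines of Python. This is only an illustration: the cut score of 70, the function name, and both sets of scores are hypothetical and do not come from any real test. It shows that each result depends only on the fixed criterion, so an entire group can pass or an entire group can fail.

```python
# Hypothetical cut score, chosen only for illustration.
CUT_SCORE = 70


def criterion_referenced_result(score, cut_score=CUT_SCORE):
    """Return 'pass' if the score meets the fixed criterion, else 'fail'."""
    return "pass" if score >= cut_score else "fail"


# Two invented groups of scores; neither comes from a real assessment.
strong_class = [82, 75, 91, 70]
weak_class = [40, 55, 69, 62]

# Each student is judged against the criterion, never against classmates.
strong_results = [criterion_referenced_result(s) for s in strong_class]
weak_results = [criterion_referenced_result(s) for s in weak_class]

print(strong_results)  # every student in this group meets the criterion
print(weak_results)    # no student in this group meets the criterion
```

Because no score is compared with another score, adding or removing students never changes anyone's result.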
Criterion-referenced tests (CRTs) determine “… what test takers can do and what they know, not how they compare to others” (Anastasi, 1988). CRTs report how well students are doing relative to a pre-determined performance level on a specified set of educational goals or outcomes included in the school, district, or state curriculum. Educators or policy makers may choose to use a CRT when they wish to see how well students have learned the knowledge and skills which they are expected to have mastered.
This information may be used as one piece of evidence to determine how well the student is learning the desired curriculum and how well the school is teaching that curriculum. The Brigance Diagnostic Inventory of Early Development can be used with children whose developmental ages range from birth to 7 years.10 This criterion-referenced test is based on items from norm-referenced tests, and its purpose is to assess general development as a guide to subsequent instruction.
Developed as a response to Public Law 94-142 for children with developmental handicaps, it is used appropriately for programming, but not as a diagnostic or placement test. A second criterion-referenced test is the Milani-Comparetti Motor Development Screening Test.11 This test is a neurodevelopmental examination of children from 0 to 2 years of age with tasks designed to measure developmental reflexes and motor skill development. Although its purpose is to determine “whether one child’s motor development corresponds to that of a normal child,”11 normative data are unavailable.
The major reason for using a norm-referenced test (NRT) is to classify students. NRTs are designed to highlight achievement differences between and among students to produce a dependable rank order of students across a continuum of achievement from high achievers to low achievers (Stiggins, 1994). School systems might want to classify students in this way so that they can be properly placed in remedial or gifted programs. These types of tests are also used to help teachers select students for different ability-level reading or mathematics instructional groups.
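The idea of ranking students against a reference group can be sketched as follows. This is a minimal illustration under stated assumptions: the norm-group scores, student names, and function names are all invented, and percentile rank is defined here simply as the percentage of the norm group scoring strictly below a given score. Published tests may use other definitions.

```python
from bisect import bisect_left

# Hypothetical norm-group scores, invented for illustration only.
NORM_GROUP = sorted([35, 42, 48, 50, 55, 58, 61, 64, 70, 77])


def percentile_rank(score, norm_scores=NORM_GROUP):
    """Percentage of the norm group scoring strictly below `score`."""
    below = bisect_left(norm_scores, score)  # count of scores < score
    return 100.0 * below / len(norm_scores)


def rank_order(scores_by_student):
    """Return student names ordered from highest to lowest score."""
    return sorted(scores_by_student, key=scores_by_student.get, reverse=True)


# Invented students; their scores only have meaning relative to others.
students = {"Ana": 64, "Ben": 50, "Cy": 77}

ranking = rank_order(students)
ranks = {name: percentile_rank(score) for name, score in students.items()}

print(ranking)  # highest achiever first
print(ranks)    # each student's standing relative to the norm group
```

Unlike a pass-fail result against a fixed criterion, these values change if the comparison group changes, even when a student's own score stays the same.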
A norm-referenced assessment expresses the candidates’ scores in rank order, based on a distribution of scores. It is comparative, telling us that one student is better than another student. Normal distribution curves are often associated with norm-referenced assessment. Norm-referenced tests are designed to examine individual performance in relation to the performance of a representative group. Norm-referenced tests generally are not related to instructional objectives, do not use task analysis, and are designed to delineate differences among individuals.
Norm-referenced tests, therefore, are not sensitive to and should not be used to evaluate the effects of instruction. Two examples of norm-referenced tests for young children are the Bayley Scales of Infant Development6 and the Gesell developmental scales.7 The Bayley scales were standardized on 1,400 children between 1 and 15 months of age and 160 children between 18 and 30 months of age. The main purpose of the Bayley scales is to establish current developmental status (motor and mental scales) to identify problems and the need for intervention.
The Gesell scales were administered to groups of children between 4 weeks and 36 months of age with the purpose of identifying even minor deviations in the areas of adaptive, gross and fine motor, language, and personal and social development. Tests such as the California Achievement Test (CTB/McGraw-Hill), the Iowa Test of Basic Skills (Riverside), and the Metropolitan Achievement Test (Psychological Corporation) are normed using a national sample of students.
Because norming a test is such an elaborate and expensive process, test publishers typically use the same norms for 7 years. All students who take the test during that 7-year period have their scores compared to the original norm group. The following table highlights the differences between norm-referenced and criterion-referenced assessments. It is adapted from Popham, J. W. (1975). Educational evaluation. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.