Validity and Scoring

Measuring the Reliability and Validity of CLEP Exams

CLEP uses “rights only” scoring, which means the exams are scored without a penalty for incorrect guessing: the test-taker’s raw score is simply the number of questions answered correctly. The raw score itself is not reported, however; it is converted into a scaled score by a process that adjusts for differences in the difficulty of the questions on the various forms of the test.
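
As a concrete illustration, the short Python sketch below counts a rights-only raw score; the answer key and responses are invented for the example, and this is not CLEP’s actual scoring code.

    def raw_score(responses, answer_key):
        """Count correct answers; wrong and omitted answers neither
        add to nor subtract from the score ("rights only")."""
        return sum(1 for given, correct in zip(responses, answer_key)
                   if given == correct)

    key = ["B", "D", "A", "C", "B"]
    answers = ["B", "D", "C", None, "B"]   # one wrong, one omitted
    print(raw_score(answers, key))         # 3: guessing carries no penalty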

The scaled scores are reported on a scale of 20 to 80. Because the various forms of the test are not always exactly equal in difficulty, the raw-to-scale conversion can differ from form to form: the easier a form is judged to be, the higher the raw score needed to attain a given scaled score.
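
The hypothetical lookup tables below illustrate how such conversions might differ; the numbers are invented for the example, since actual CLEP conversions are derived through statistical equating of the forms.

    # Invented raw-to-scale lookups for two forms of unequal difficulty.
    conversion = {
        "easier_form": {50: 62, 48: 60, 46: 58},   # raw score -> scaled score
        "harder_form": {47: 62, 45: 60, 43: 58},
    }

    # A scaled score of 60 requires a raw score of 48 on the easier
    # form but only 45 on the harder one.
    print(conversion["easier_form"][48], conversion["harder_form"][45])  # 60 60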

Reliability

The reliability of the test scores of a group of test-takers is commonly described by two statistics: the reliability coefficient and the standard error of measurement.

The reliability coefficient is the correlation between the scores the test-takers get (or would get) on two independent replications of the measurement process; equivalently, it can be interpreted as the correlation between the scores they would earn on two forms of the test that have no questions in common. It therefore indicates how stable candidates’ scores would be if they took different forms of the same exam. For the CLEP exams, the reliability coefficient is estimated with an internal-consistency measure, which uses the statistical relationships among responses to the individual multiple-choice questions to estimate the reliability of the total test score. The formula used is known as “Kuder-Richardson 20,” or “KR-20,” a special case of the more general “coefficient alpha.”
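
In symbols, KR-20 is k/(k − 1) × (1 − Σ p_i·q_i / σ²), where k is the number of items, p_i is the proportion of test-takers answering item i correctly, q_i = 1 − p_i, and σ² is the variance of the total scores. The Python sketch below applies the formula to a small invented matrix of 0/1 item responses; real CLEP forms involve far more items and test-takers.

    def kr20(item_responses):
        """KR-20 from a matrix of dichotomous item scores
        (one row per test-taker, 1 = correct, 0 = incorrect)."""
        n = len(item_responses)           # number of test-takers
        k = len(item_responses[0])        # number of items
        totals = [sum(row) for row in item_responses]
        mean_total = sum(totals) / n
        var_total = sum((t - mean_total) ** 2 for t in totals) / n
        pq = 0.0
        for i in range(k):
            p = sum(row[i] for row in item_responses) / n   # proportion correct on item i
            pq += p * (1 - p)
        return (k / (k - 1)) * (1 - pq / var_total)

    data = [   # invented responses from five test-takers on four items
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(round(kr20(data), 3))   # 0.8 for this made-up response pattern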

The standard error of measurement indicates how much a test-taker’s observed score could be expected to vary across repeated administrations of equivalent forms of the test. It is inversely related to the reliability coefficient: if the reliability of the test were 1.00 (a perfect measure of the candidate’s knowledge), the standard error of measurement would be zero.
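
The usual relationship is SEM = SD × √(1 − reliability), where SD is the standard deviation of the scores. The sketch below uses hypothetical values, not actual CLEP statistics.

    import math

    def sem(sd, reliability):
        """Standard error of measurement from the score standard
        deviation and the reliability coefficient."""
        return sd * math.sqrt(1 - reliability)

    # Hypothetical values: a scaled-score SD of 10 and a KR-20 of 0.91.
    print(round(sem(10, 0.91), 2))   # 3.0
    print(sem(10, 1.00))             # 0.0: perfect reliability, no error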

Validity

Validity is a characteristic of a particular use of the test scores of a group of test-takers. If the scores are used to make inferences about the test-takers’ knowledge of a particular subject, the validity of the scores for that purpose is the extent to which those inferences can be trusted to be accurate.

One type of evidence for the validity of test scores is called content-related evidence of validity. It is usually based upon the judgments of a set of experts who evaluate the extent to which the content of the test is appropriate for the inferences to be made about the examinees' knowledge. The CLEP test development committees select the content of the tests to reflect the content of the corresponding courses at most colleges, as determined by a curriculum survey.

Because colleges differ somewhat in the content of the courses they offer, faculty members are urged to review the content outline and the sample questions to ensure that the test covers core content appropriate to the corresponding courses at their college.

Another type of evidence for test score validity is called criterion-related evidence of validity. It consists of statistical evidence that test-takers who score high on the test also do well on other measures of the knowledge or skills the test is being used to measure. In the past, criterion-related evidence for the validity of CLEP scores has been provided by studies comparing students' CLEP scores to the grades they received in corresponding classes. Although CLEP no longer conducts these studies, individual colleges using the tests can undertake such studies in their own courses. Learn more about CLEP and ACES, a free College Board service that allows institutions to conduct these studies.
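
As an illustration of the kind of analysis such a study involves, the sketch below correlates invented CLEP scores with invented course grades using a plain Pearson correlation (statistics.correlation requires Python 3.10 or later); it is not the ACES methodology.

    from statistics import correlation   # Python 3.10+

    # Invented data: eight students' CLEP scaled scores and the grades
    # (on a 4.0 scale) they earned in the corresponding course.
    clep_scores = [52, 61, 48, 70, 55, 66, 58, 73]
    course_grades = [2.7, 3.3, 2.3, 4.0, 3.0, 3.7, 3.0, 4.0]

    # A strong positive correlation is criterion-related evidence
    # that the exam measures what the course teaches.
    print(round(correlation(clep_scores, course_grades), 2))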