It is not uncommon to receive the results from an examination and see a score that in no way reflects the number of questions you answered correctly (think of the ACT score range of 1-36 or the SAT range of 200-800 per section). The scores that are reported on the ACT or SAT are called “scaled scores.” These are a transformation of the number of questions an individual answered correctly, which is referred to as a “raw score.” For example, the English section of the ACT has 75 questions. However, there are many different versions (or “forms” in psychometric-speak) of the English section. Answering 65 of the 75 questions correctly on a particularly hard form* will earn you a higher scaled score than answering 65 questions correctly on an easier form. Thus, scaled scores are reported to examinees to facilitate interpretation.
In the context of board certification, you might recall the scaled scores from the Written Qualifying Examination (WQE). The WQE reports scores on a scale of 200-1000. The cut score (score needed to pass) is a scaled score of 700. Because each form of the WQE is slightly different in difficulty, the raw score needed to pass is not always the same. However, after transforming the raw scores to scaled scores, the cut score is always 700. Although reporting raw scores might be more straightforward to understand at first, scaled scores are actually better at describing how much mastery was exhibited on an examination.
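To make the raw-to-scaled transformation concrete, here is a minimal sketch in Python. It assumes a simple linear conversion per form; the slope and intercept values are entirely hypothetical (real equating constants come from the statistical linking process described above), but the 200-1000 scale and the cut score of 700 match the WQE.

```python
# Sketch of a linear raw-to-scaled conversion, one set of constants per form.
# The slopes and intercepts below are hypothetical illustrations only.

def scale_score(raw, slope, intercept, lo=200, hi=1000):
    """Map a raw score to the 200-1000 reporting scale, clipped to its bounds."""
    scaled = slope * raw + intercept
    return max(lo, min(hi, round(scaled)))

# Hypothetical equating constants for two forms of different difficulty:
easy_form = dict(slope=9.0, intercept=80)
hard_form = dict(slope=9.5, intercept=120)  # harder form rewards the same raw score more

raw = 65
print(scale_score(raw, **easy_form))  # 665 -> below a 700 cut score
print(scale_score(raw, **hard_form))  # 738 -> above a 700 cut score
```

The point of the sketch: the same raw score of 65 falls on opposite sides of the fixed cut score of 700 depending on the form's difficulty, which is exactly why the raw score needed to pass varies while the scaled cut score does not.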
The Quarterly Questions knowledge-based examination does not report scaled scores, for several reasons. First, equating is not possible because questions are never reused. Every Quarterly Question is new (meaning it has never appeared on a previous form), so there is no overlap in questions from form to form that would allow linking across forms to determine equivalence. Second, because the assessment provides immediate feedback on one’s performance (i.e., whether each item was answered correctly or incorrectly), it is more intuitive for diplomates simply to count how many questions they have answered correctly. When we report raw scores, however, we must account for differences in question difficulty in another way. We do this by setting a cut score annually for Quarterly Questions that reflects the mastery needed to earn a passing score. In 2018, the cut score was 60% correct; in 2019, it is 65% correct. Cut scores vary from form to form like this on virtually every standardized examination you have ever taken, but the variation is usually not transparent to examinees because it is built into the scaled scoring process.
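The annual cut score mechanism above can be sketched in a few lines. The 60% (2018) and 65% (2019) figures come from the article; the exam length and function names are illustrative assumptions.

```python
# Sketch of pass/fail under annual raw-score cut scores for Quarterly Questions.
# Cut score percentages are from the article; everything else is hypothetical.

CUT_SCORES = {2018: 0.60, 2019: 0.65}

def passed(num_correct, num_questions, year):
    """Return True if the raw percent correct meets that year's cut score."""
    return num_correct / num_questions >= CUT_SCORES[year]

# The same raw performance, 31 of 50 correct (62%), passes under the 2018
# cut score but not under the 2019 cut score.
print(passed(31, 50, 2018))  # True
print(passed(31, 50, 2019))  # False
```

This is the raw-score analogue of what scaled scoring does invisibly: the standard of mastery stays put, and the raw threshold moves around it.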
Finally, we are frequently asked whether the ABO examinations are graded “on the curve,” that is, whether a certain percentage of candidates necessarily will pass or fail. The answer is no: because of the methods by which the cut scores are determined, it is theoretically possible for all candidates to pass (or fail) a specific examination. If that were to occur, however, unless the population of candidates was remarkably atypical, it would indicate that the method used to establish the cut score was faulty. Fortunately, but not surprisingly, this has never occurred.
*The process psychometricians use to determine the ease or difficulty of a form is called equating. Equating is a statistical procedure that allows us to see how much harder or easier one form of questions is than another. For more information, browse these links: