The Science of Testing: Why AIG Is No Substitute for SMEs


Artificial intelligence (AI) is one of the hottest topics in technology today. (Just ask Siri or Alexa.) But what if we could ask these virtual assistants not only to order our groceries and tell us the weather, but also to write examination content? It sounds a bit far-fetched, but Automatic Item Generation (AIG), a major innovation in the science of test development, could change the way assessment organizations like certification boards develop multiple-choice questions.

Multiple-choice questions (MCQs) are a favored item type for knowledge assessment, particularly in medical education and board certification examinations. MCQs help organizations assess both the breadth and depth of knowledge within a specialty, such as ophthalmology. Compared to other item types, MCQs deliver a high degree of objectivity and efficiency; however, developing a “good question” requires significant time, care, and expense.

In a typical item writing process, a human subject matter expert (SME) writes one question at a time on a given topic. AIG introduces item templates and computer algorithms into the process, enabling the computer to generate hundreds of question variations at once, as the sketch below illustrates. In addition to saving time, studies suggest AIG may improve content quality, reduce human error, and lower the overall cost of test development.
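To make the mechanics concrete, here is a minimal sketch of template-based generation in Python. The item model, placeholder values, and answer logic are all hypothetical and greatly simplified; real AIG systems rest on cognitive models and expert-built constraints. But the core idea of filling a template's placeholders with every valid combination looks roughly like this:

from itertools import product

# Hypothetical item model: a stem with placeholders that the
# generator fills in to produce many question variants at once.
STEM = ("A {age}-year-old patient presents with {finding}. "
        "Which diagnosis is most likely?")

# Illustrative placeholder values (not real clinical content).
VARIABLES = {
    "age": ["8", "45", "72"],
    "finding": [
        "sudden painless vision loss in one eye",
        "gradual bilateral blurring of vision",
    ],
}

# Simplified answer logic: each finding maps to a keyed answer
# plus distractors. A real item model encodes expert judgment here.
ANSWERS = {
    "sudden painless vision loss in one eye":
        ("retinal artery occlusion", ["cataract", "dry eye disease"]),
    "gradual bilateral blurring of vision":
        ("cataract", ["retinal detachment", "dry eye disease"]),
}

def generate_items():
    """Yield one multiple-choice item per placeholder combination."""
    names = list(VARIABLES)
    for values in product(*(VARIABLES[n] for n in names)):
        fields = dict(zip(names, values))
        key, distractors = ANSWERS[fields["finding"]]
        yield {
            "stem": STEM.format(**fields),
            "options": sorted([key] + distractors),
            "answer": key,
        }

for item in generate_items():
    print(item["stem"], "->", item["answer"])

Three ages crossed with two findings already yield six distinct items from a single template, which is how AIG scales: SMEs invest their effort in the item model once, and the algorithm handles the permutations.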

But even the best algorithm is no substitute for the judgment and expertise of human SMEs like board-certified ophthalmologists. AIG-developed questions still require thorough evaluation by humans who can assess the quality and appropriateness of each computer-generated result. While AIG is not yet in wide use and the ABO does not currently employ it in our item writing program, it presents exciting possibilities for the future of test development.
