A new paper from an assessment expert cautions the state consortia designing common tests that, while computer-based and computer-adaptive testing offer the possibility of improving assessment, getting such tests right presents significant challenges.
The paper by Mark D. Reckase, a professor of measurement and quantitative methods at Michigan State University, is aimed at the two big groups of states that are using Race to the Top money to design tests for the new common standards. It’s the latest of many papers to emerge from the testing and policy worlds since the consortia began their work.
Reckase’s paper is one of three issued today by the Rennie Center for Education Research & Policy, and Policy Analysis for California Education, or PACE. The other two deal with tests for English-language learners and tests for science. You can see all of them at PACE’s website, collected under the title, “The Road Ahead for State Assessments.”
Reckase offers an accessible discussion of computer-based and computer-adaptive testing for regular folk, and places the current assessment work in the context of broader changes in how testing is used. He details the advantages of each approach, including, for computer-adaptive testing, dispensing with the need to measure wide ranges of student ability using only the fixed set of test items printed in a booklet.
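Reckase’s paper is prose, not code, but the adaptive idea he describes is simple enough to sketch. Below is a minimal, purely illustrative Python simulation, assuming the simplest one-parameter (Rasch) item-response model: the next item is always the one whose difficulty best matches the current ability estimate, which is re-estimated after each response. All names and numbers here are our own assumptions, not drawn from the paper.

```python
import math
import random

def p_correct(theta, b):
    """Rasch model: chance of a correct answer given ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def update_theta(theta, responses):
    """One Newton-Raphson step toward the maximum-likelihood ability estimate."""
    probs = [p_correct(theta, b) for b, _ in responses]
    info = sum(p * (1.0 - p) for p in probs)                     # Fisher information
    score = sum(u - p for (_, u), p in zip(responses, probs))    # score function
    if info > 0:
        theta += score / info
    return max(-4.0, min(4.0, theta))  # clamp: all-right/all-wrong patterns diverge

def run_cat(pool, true_theta, test_length=15):
    """Adaptive loop: administer the unused item closest to the current estimate."""
    theta, used, responses = 0.0, set(), []
    for _ in range(test_length):
        i = min((j for j in range(len(pool)) if j not in used),
                key=lambda j: abs(pool[j] - theta))
        used.add(i)
        correct = random.random() < p_correct(true_theta, pool[i])  # simulated examinee
        responses.append((pool[i], 1 if correct else 0))
        theta = update_theta(theta, responses)
    return theta

random.seed(0)
pool = [random.uniform(-3.0, 3.0) for _ in range(200)]  # a low-stakes-sized pool
print(f"estimated ability: {run_cat(pool, true_theta=1.2):.2f}")
```

Operational systems typically select items to maximize statistical information and layer on content-balancing and item-exposure constraints, which is part of why Reckase stresses well-designed pools and purpose-built software.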
But key challenges accompany the move from paper-and-pencil to computer. For one, computer-adaptive testing (in which items are geared to the individual student) requires a larger and better-designed pool of test items than traditional testing does, he writes. High-stakes tests will require a larger pool, likely 1,600 items or more, while low-stakes tests might require closer to 200. And test items in the pool have to be administered to “several hundred to several thousand” students in the design phase to determine exactly how the items function before the test is finalized, a process that could take up to a year, he says.
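To get a feel for why calibration could take that long, here is back-of-the-envelope arithmetic. The pool size and the “several hundred to several thousand” figures come from Reckase’s paper; the assumption that each pilot student answers about 40 items is ours.

```python
# Rough calibration burden for a high-stakes adaptive-test pool.
pool_size = 1_600
items_per_pilot_student = 40   # assumed length of one pilot session

for students_per_item in (500, 2_000):   # "several hundred to several thousand"
    total_responses = pool_size * students_per_item
    pilot_students = total_responses / items_per_pilot_student
    print(f"{students_per_item:>5} students/item -> {total_responses:,} responses "
          f"-> about {pilot_students:,.0f} pilot test-takers")
```

Under these assumptions the design phase needs on the order of 20,000 to 80,000 pilot test-takers, which makes a year-long timeline easy to believe.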
Another difficulty of computer-based testing is getting enough seat time in front of computers for massive numbers of students. In a state with 100,000 students in a given grade level, Reckase notes, a three-hour test will require 400,000 hours of computer seat time once administrative tasks are included (roughly an extra hour per student). Such constraints, combined with limits on how many students can be tested each day and how wide the testing window can be, pose a “difficult design issue” for states, he says. Having students use their own computers raises a whole host of other issues, including test security.
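The seat-time constraint is also simple arithmetic. In the sketch below, the 100,000-student and four-hours-per-student figures follow the paper’s example; the machine counts and the two-sessions-per-day schedule are invented for illustration.

```python
# Seat-time arithmetic from Reckase's example: 100,000 students in a grade,
# three hours of testing plus roughly an hour of administration per student.
students = 100_000
hours_per_student = 4                       # 3 testing + ~1 administrative
print(f"total seat time: {students * hours_per_student:,} hours")

# How long a testing window does that imply? Machine counts and the
# two-sessions-per-day schedule are illustrative assumptions, not the paper's.
sessions_per_computer_per_day = 2
for computers in (2_000, 5_000, 10_000):
    days = students / (computers * sessions_per_computer_per_day)
    print(f"{computers:>6} computers -> a window of about {days:.0f} school days")
```

Even with 10,000 machines statewide, a single grade level fills a week of testing days, and states must test several grades.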
“Resolving the issues of computer access is one of the most challenging parts of implementing [computer-adaptive testing],” Reckase says in the paper.
The complex software needed to manage computer-adaptive testing presents its own challenges, he says. Off-the-shelf or custom-designed, it’s expensive. But he cautions states against taking the cheap route: “Forcing the implementation of a [computer-adaptive assessment] to fit existing software will likely result in serious problems later in the implementation,” he writes.
Reckase also cautions against running parallel computer-based and paper-and-pencil testing systems, saying the comparability of results across the two modes is questionable.
He notes that such cautions can be “very discouraging” to test designers, but tries to perk people up by discussing several large-scale computer-based testing systems that he considers successful, including the Armed Services Vocational Aptitude Battery (ASVAB) and two tests used for college-course placement, the ACT’s COMPASS and the College Board’s Accuplacer.
In the end, Reckase muses about the possibility of doing away with stand-alone testing altogether: “If all of students’ day-to-day classroom work can be collected through computer systems and scored using intelligent evaluation software, the need to have a separate, stand-alone [computer-adaptive assessment] is no longer present,” he writes. “As the development of [such] systems progresses, the goal of many educators—to have instruction and assessment be one and the same—can be met.”