When tens of millions of schoolchildren sit down at computers to take new common assessments in spring 2015, many of their peers will be taking similar tests the old-fashioned way, with paper and pencil, raising questions about the comparability of results—as well as educational equity—on an unprecedented scale.
Both state consortia that are designing tests for the Common Core State Standards are building computer-based assessments, but they will offer paper-and-pencil versions as well, as states transition fully to online testing. The Smarter Balanced Assessment Consortium plans to run the two simultaneous “modes” of testing for three years. The Partnership for Assessment of Readiness for College and Careers, or PARCC, will do so for at least one year.
In order to rely on the results, however, the consortia must show that the paper and computer modes of the tests in English/language arts and mathematics measure the same things.
The task of establishing such comparability between two versions of a test isn’t new. States have long used established statistical and psychometric practices to do so when they update their paper-and-pencil tests, for instance, or when they transition from paper-based tests to computer assessments. But the challenge before the two consortia raises the stakes by hanging the validity of far more children’s test scores on the “linking” or “equating” process conducted by each group.
“In the assessment profession, we need to be able to back up claims we make about students’ and schools’ performance. Any threat to validity is a threat to those interpretations,” said Richard Patz, the chief measurement officer at ACT Inc., which is conducting comparability studies of its own as the Iowa City, Iowa-based company introduces a digital version of its college-entrance exam.
Thorny questions have arisen, too, about whether children who take the paper-and-pencil version of the consortia tests will be at a disadvantage—or perhaps have an edge—compared with their peers who take the computer version.
Could children in high-poverty areas, where technological readiness will likely be lower, lose something valuable by not interacting with the new tests’ technology-enhanced items, which include drawing and drag-and-drop functions? Would they actually benefit by sticking with paper exams if they are more comfortable taking tests in that mode?
Mixed Landscape
Consortia leaders say they are confident that comparability and equity questions will be fully addressed by the time the tests make their debut in 2015.
“It’s something we need to do carefully, and we intend to do it carefully,” said the executive director of the 25-state Smarter Balanced group, Joe Willhoft, who oversaw such studies as the assessment director in Washington state.
Jeffrey Nellhaus, the testing director for PARCC, which includes 18 states and the District of Columbia, said the group’s test designers are “very sensitive” to comparability questions and are planning studies to answer them.
Both of the state testing consortia will include technology-enhanced questions on their computer-based exams, such as this interactive sample item from the Smarter Balanced group.
SOURCE: Smarter Balanced Assessment Consortium
About 40 million students attend school in the states that belong to the two consortia. But much is still unknown about how many will take paper tests in 2015 and how many will use a computer. Even rough feedback, however, shows a strong likelihood that large swaths of students will be picking up their No. 2 pencils.
Survey data collected in July by Smarter Balanced—also more of an approximation than a full accounting—show a wide range of technological readiness.
Oregon, long a leader in online assessment, reported that all its districts were capable of giving tests online, while California reported that only 45 percent of its districts were. For PARCC, Mr. Nellhaus ventured a guess of a 50-50 split, but emphasized that data on districts’ and schools’ readiness are far from complete.
The consortia will not decide who takes the paper-and-pencil version of the test and who takes the computer version, officials said. That will be up to states, and in some cases, individual districts or schools.
Ideally, test results are “indifferent” to the mode in which the test is given, said Henry Braun, a longtime researcher with Princeton, N.J.-based test-maker Educational Testing Service and now an expert in educational evaluation and measurement at Boston College. If the mode of administration helps or hampers some students, the results are distorted, he said.
Differences in Format
Assessment experts say it’s much easier to establish comparability when two tests are similar in format, such as a multiple-choice test on paper that becomes a multiple-choice test on the computer. But even then, comparability issues can arise.
A student who must read a text passage in order to answer a multiple-choice question, for example, might be able to read the entire passage on one page of the paper test, but on the computer, she must scroll up and down to do so. Such shifts can affect the performance of some students, said a longtime assessment expert at a major testing company. (Like most experts interviewed for this story, he agreed to speak only if his name was withheld because of his employer’s contracts with the assessment consortia.)
Comparability challenges deepen when tests differ significantly in format, experts said. The two consortia’s computer-based exams will include technology-enhanced items that feature interactivity and animation, along with longer, more complex performance tasks. Those exams will be able to represent ideas in ways that the paper versions cannot, so establishing comparability between the two will be tougher.
“When an assessment has types of items only available in one mode, it creates a greater challenge for establishing comparability, but it’s a familiar one and it’s generally a manageable one,” said ACT’s Mr. Patz.
The other expert, however, said that while the consortia’s comparability challenge is “not a fatal problem, it needs to be thoughtfully negotiated and represented to anyone who will use those test scores.”
That source said it’s not possible to measure everything in the paper-and-pencil version that can be measured in the computer-based version.
“In the technical sense of ‘comparable,’ the two might not be comparable,” he said. “If you were successful in measuring the same things, which would be a stretch if the computer-based version’s items are truly innovative, it could well be the case that one [test] could be harder or easier than the other because of how the items are presented.”
Writing From Scratch
Assessment specialists outlined various ways to establish comparability between the paper and computer versions of a test. One is to use a set of common items in both, so test designers can compare student performance on those items in the two modes. Another is to randomly assign students to take one or the other mode of the test. Better yet, a study group of students can be selected to take both the paper and computer versions. Consortium officials said such methods are being planned or considered for field tests next spring.
Testing experts also said it’s best to create assessment questions from scratch for the paper-based assessment, rather than building paper versions of test items originally designed for the computer.
“You can’t replicate the interactivity of the computer environment on paper,” said one testing expert. “You need to build alternate forms of the test that measure the same standards [on paper].”
Mr. Willhoft from Smarter Balanced said that his group is adapting items written for the online environment to paper. Mr. Nellhaus from PARCC said its developers are writing paper items from scratch to use in place of technology-enhanced items on the computer, but more traditional item types can be used in both modes.
PARCC’s field test next spring will include paper-based as well as computer-based exams, Mr. Nellhaus said. The Smarter Balanced field test will include paper forms only for a small group of students, to study comparability, Mr. Willhoft said.
“There’s no denying that there will be some items that will be difficult to translate into the paper environment,” said Mr. Willhoft. One of the consortium’s math items, for instance, asks students to click on images of a cylindrical shape and a rectangular one in an exercise about volume. “But there’s nothing inherent in a given standard that requires a certain kind of interactive item,” he said. “You can measure the same standard in different ways.”
Smarter Balanced faces an extra layer of complexity in comparability because its test is computer-adaptive, meaning it adjusts questions to the test-taker’s skill level.
“With an adaptive test, you see right away what questions a kid needs,” said Lauress L. Wise, a principal scientist with the Monterey, Calif.-based Human Resources Research Organization, which has performed quality assurance and evaluation on testing systems such as the National Assessment of Educational Progress. “With paper and pencil, you’d have to offer a lot more questions—a longer test—to make it comparable to that. If you can’t do that, you won’t be measuring the end points [of achievement] as well.”
Mr. Willhoft acknowledged that the paper version of the Smarter Balanced test will be “less precise, with a larger measurement error” at those points in the spectrum.
In seeking comparability, a key consideration is what kinds of conclusions will be drawn from the scores on the two types of tests, said Mr. Wise. The degree of comparability takes on added significance when high-stakes decisions are based on the results, he said.
“If this were a graduation test, and some kids were getting denied diplomas because they took one form or another, you could make a plausible argument why there could be a lawsuit,” Mr. Wise said. “That could get sticky.”
Quality of Tasks
The fact that paper-and-pencil tests might be more widely used in lower-income areas is something that officials at the Education Trust, which advocates school improvement for disadvantaged students, are keeping an eye on. But those potential questions of equity revolve more around the quality of the assessment—and the teaching that goes with it—than about the mode of the test, they say.
Christina Theokas, the organization’s director of research, said she worries that if the paper test is less complex and instructionally rich than the computer version, classroom instruction could mirror that.
But students aren’t necessarily at a disadvantage just by taking a paper-and-pencil test, said Sonja Brookins Santelises, the Education Trust’s vice president of K-12 policy and practice. Top-notch paper tests such as NAEP and Massachusetts’ statewide exams demonstrate that, she said. The important thing to watch is not the mode in which a test is administered, Ms. Santelises said, but “the quality of the task” and how well students are prepared for it.
“You can do a rudimentary task on a computer and have it not be beneficial, and you can have a paper-and-pencil task that’s instructionally rigorous and very beneficial,” she said. “Are students going to have access to the kind of experiences and curriculum that prepare them for those kinds of tasks? Are teachers being prepared and supported to do that?”
Ms. Santelises added: “We need to stay focused on the teaching and learning, rather than on whether we have the right technology to give a test.”