State Test Results Are In. Are They Useless?

Educators have been bracing for them, and now they’re here: the first state test results since COVID-19 interrupted K-12 schooling. Districts, states, and schools are poring over the data from spring 2021 tests, hoping to understand exactly how—and how badly—the pandemic affected children’s learning.

But even though educators are hungry for insight, assessment experts are urging caution. This year, more than any in recent memory, calls for extreme care and restraint when analyzing statewide test scores, drawing conclusions, and taking action, they say.

Like schooling itself, standardized testing was deeply disrupted in many ways last spring, which may have distorted the meaning and utility of the results. In some cases, state test data will be virtually useless, the experts say. In others, with thoughtful analysis, the data can yield insights that could help leaders and educators allocate resources and help children rebuild academic muscle.

Here are some key considerations—and important cautions—for state, district, and school leaders, and teachers, to bear in mind as they review state test scores.

A lot happened with state tests in 2021 that could affect the results

In 2020, the U.S. Department of Education allowed states to skip federally required assessments. In 2021, however, states had to administer those tests. But that doesn’t mean it was business as usual.

In a handful of states, some students took tests remotely, while others took them in person. Massachusetts, for instance, allowed students in grades 3-8 to take remote tests if their schools were in remote learning mode, and more than 15 percent of those students did so.

Some states made other changes to their testing regimens. A few gave shortened versions of their tests. Colorado gave its English/language arts test only in grades 3, 5, and 7 and its math test only in grades 4, 6, and 8. In California, some districts gave the Smarter Balanced test, and others used assessments of their choosing.

Many states saw fewer students take the test than usual, though, and that is the factor poised to exert the most widespread influence on the validity and comparability of state test data. According to the Center for Reinventing Public Education, which has been monitoring states’ responses to COVID-19, of the 30 states that have released test results so far, only 14 reported test-participation rates of 90 percent or more.

Some states reported participation rates as low as 10 percent (New Mexico) and 30 percent (Oregon). Participation also varied markedly within states: Colorado reported regional participation rates ranging from 51 percent to 88 percent.

A number of factors fueled low participation rates, including the decision by many parents not to send their children into school buildings simply to take a test. And schools likely felt less pressure to insist that students show up for testing, since the Education Department waived the accountability rules that normally penalize schools for testing fewer than 95 percent of their students.

“There was a wide variety in the ways testing played out,” said Terra Wallin, who advised the Education Department on assessment and accountability from 2014 to 2017 and now oversees those issues for Education Trust, a civil rights advocacy group. “There are still ways states could look at general patterns [in test-score data], do a higher-level examination, to help them think about how best to use federal funding for recovery, but they need to proceed with caution.”

Ask key questions before deciding how to use the data

Experts say it’s important to ask three crucial questions about your state test data.

Did any of our students take the test remotely? If so, those scores shouldn’t be viewed as comparable to the scores of students who took it in person. That “mode effect” is a key tenet of assessment: Whether a student takes a test online or with paper and pencil can influence the results.

Did we use the same test as in 2019? If you switched tests, or changed the length or frequency of your test, a detailed expert analysis could be needed to confirm the validity of the 2021 results—were there enough questions in each strand of the academic standards, for instance, to generate a valid score?—and to establish that those results can be compared with 2019 results.

How many of our students—and which ones—took the test? This “participation rate,” experts say, is very important in understanding what state tests say—or can’t say—about student learning. They urge educators to dig deeper than the overall state or district participation rate and find out who took the test and who didn’t.

Imagine that an analysis shows that the students who skipped the test were disproportionately those who scored low in previous years. That would skew test results artificially high, and stalled progress might appear less severe than it actually is.
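The arithmetic behind that scenario can be sketched in a few lines of Python. This is only an illustration: the scores, the "low" and "high" labels, and the choice of which students skip the test are all invented for the example and do not come from any state's data.

```python
# Hypothetical 2021 scores, invented for illustration only.
# Each tuple is (prior-year achievement band, 2021 score, took the test?).
students = [
    ("low", 40, False),
    ("low", 45, False),
    ("low", 50, False),
    ("low", 55, True),
    ("high", 70, True),
    ("high", 75, True),
    ("high", 80, True),
    ("high", 85, True),
]


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# What the average would be if every student had been tested.
full_mean = mean([score for _, score, _ in students])

# What the state actually observes: only the students who showed up.
tested_scores = [score for _, score, took_test in students if took_test]
observed_mean = mean(tested_scores)

participation_rate = len(tested_scores) / len(students)

print(f"participation rate: {participation_rate:.1%}")
print(f"mean if everyone had tested: {full_mean}")
print(f"mean among students who tested: {observed_mean}")
```

In this made-up case, three of the four previously low-scoring students are missing, so the observed average (73) sits well above the average for the full group (62.5), even though no individual student's score changed.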

That isn’t just speculation, either. It’s likely that remote learners account for many missed tests, and it became increasingly apparent during the pandemic that low-income, Black, and Latino students were far likelier to be learning remotely than other students. And emerging multistate research on state test results is finding that COVID’s impact on learning isn’t concentrated just in elementary schools, or among traditionally low-performing students, as early analyses of interim tests suggested; it’s broader, affecting students at all grades and achievement levels.

Enrollment declines, widely documented in many grades, can also play havoc with sound interpretations of test scores. Again, it’s important to understand the academic and demographic profiles of who stopped coming to school, experts say.

“If you aren’t paying attention to how the population is changing, you’re misinterpreting your scores,” said Andrew Ho, a Harvard University professor of education who focuses on assessment. He urges state leaders to perform a three-dimensional analysis of their test scores to ensure valid comparisons. This is done by separately comparing each group—the students who took tests and those who didn’t—only to groups who performed similarly in the past.
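A simplified sketch of that idea, for the students who did take the test, might look like the following Python. It is not Ho's actual method; the records, the two achievement bands, and the function names are hypothetical, chosen only to show how a naive year-over-year comparison and a like-with-like comparison can point in opposite directions.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only:
# (prior achievement band, 2019 score, 2021 score or None if not tested).
records = [
    ("low", 45, None),
    ("low", 50, None),
    ("low", 55, 50),
    ("high", 75, 70),
    ("high", 80, 76),
    ("high", 85, 79),
]


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def naive_change(records):
    """Compare everyone tested in 2021 with everyone tested in 2019."""
    scores_2019 = [s19 for _, s19, _ in records]
    scores_2021 = [s21 for _, _, s21 in records if s21 is not None]
    return mean(scores_2021) - mean(scores_2019)


def change_by_band(records):
    """Compare 2021 test-takers only with their own 2019 scores, band by band."""
    pairs_by_band = defaultdict(list)
    for band, s19, s21 in records:
        if s21 is not None:
            pairs_by_band[band].append((s19, s21))
    return {
        band: mean([s21 for _, s21 in pairs]) - mean([s19 for s19, _ in pairs])
        for band, pairs in pairs_by_band.items()
    }


print("naive change:", naive_change(records))
print("change within each band:", change_by_band(records))
```

With these invented numbers, the naive comparison shows a gain of 3.75 points, while the band-by-band comparison shows a 5-point drop in both groups. The apparent gain comes entirely from the change in who was tested.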

“We’ve just got to avoid a naïve analysis” of 2021 test-score data, said Derek Briggs, a University of Colorado professor who leads the National Council on Measurement in Education, whose members design and study K-12 assessments.

“The danger here is that we report 2021 scores as observed in 2021, without doing any other analysis. People want to compare them to 2019, and they’re going to interpret the difference as the effect of COVID.” But the pool of students who took the tests in 2021 changed, and that requires deeper analysis than in other years, he said.

Briggs is worried that districts and states won’t take the shifting test pool into account, and that they’ll take reassurance from a falsely rosy picture. That’s a particular danger in any state or district where fewer than 90 percent of students took the test, he said. The smaller the share of missing students, the less chance that their missing scores affect overall results.

Participation rates below 50 percent would make it tough to draw any meaningful conclusions from test results, said Marianne Perie, the president of Measurement in Practice, which advises states on test design and use.

Sean Reardon, who leads a Stanford University project that analyzes the links between test scores and children’s learning opportunities, said the insight into learning offered by last spring’s test scores is very limited because of all the factors influencing the scores.

“If you had a random sample of kids [in the testing pool], then that would be fine,” he said. “But testing in 2021 wasn’t random. Kids and families chose whether they took the test. Unless you have a lot of information to support a claim of comparability, I think the default assumption for 2021 is that they’re not comparable [to 2019 test scores]. I wouldn’t draw too many conclusions based on them and I’d use a lot of caveats.”

Consider ways to get insight into motivation and learning conditions

Ellen Forte, the chief executive officer and chief scientist at edCount, which advises states and districts on testing, said educators should bear in mind that millions of students, anxiety-riddled during COVID-19, were likely less motivated to do well on tests. Given that distortion, and the fact that state tests are not designed to yield highly detailed pictures of students’ achievement, she wouldn’t want to see students’ test scores used to make instructional decisions.

“Remember, these tests were designed for accountability,” Forte said. “The unit of focus should be the school, district, or state. Not the student.”

It also would behoove educators to understand more about the conditions in which students were learning, said Scott Marion, the executive director of the Center for Assessment, a consultant to states on testing. The organization has helped several states create student surveys that asked about things like their access to livestreamed instruction and how much they’d learned compared with the previous year. Teachers were asked, among other things, whether they’d been adequately supported with good professional development during the pandemic.

In a year like 2021, that kind of context matters. “I think it’s important,” Marion said. If a child tested in 2021 under conditions similar to 2019, educators can probably make sound, if very general, inferences about whether she gained or lost ground in those two years, Marion said. But what’s missing is the “why.” Gathering other data, from surveys, teacher observations, formative strategies, and interim assessments embedded in good curriculum, can shed light on “why my kids did poorly and what I might need to do differently,” he said.

Takeaway message: Multiple sources of data are more important than ever

Most experts consulted for this story agreed that with the right kinds of analyses, states can probably glean valuable information about patterns of low achievement so they can provide appropriate supports. They urged districts to press their states for detailed information and analysis to guide similar decisions at the district level.

In the classroom, though, experts differed on the role state test data should play in guiding instructional decisions for groups or individual students. Perie of Measurement in Practice said she wouldn’t want to see scores used for high-stakes decisions like grade promotion but thinks they could help teachers create flexible groupings in math or reading or dive more deeply into strands where class scores seemed weak.

Even better, Perie and other experts said, would be to blend test-score information with a portfolio of other data from formative or diagnostic tests, reports from students’ previous teachers, and other sources. This year, “you’ve got to triangulate, leveraging other measures like you never have before,” Harvard’s Ho said.

Superintendents understand this, said Dan Domenech, the executive director of AASA, the School Superintendents Association. They know it’s “critical to ascertain how much loss has taken place so they know where to begin,” but they recognize that standardized tests, while valuable, provide only “a general overview.” Accordingly, teachers will rely heavily on quizzes and other formative strategies to understand what their students need, he said.

Catherine Gewertz

Senior Contributing Writer, Education Week

Catherine Gewertz was a writer for Education Week who covered national news and features.