Twenty years ago, high-stakes testing was enjoying a renaissance as a convenient way for schools to comply with the newly passed federal No Child Left Behind Act. Today, only eight states still make passing comprehensive, statewide tests a requirement for high school graduation. At least one, New York, is moving away from the practice, and in November, voters in Massachusetts will decide whether they want to do the same.
The remaining six states—Florida, Louisiana, Maryland, New Jersey, Texas, and Virginia—should follow suit, and not just because of the avalanche of contemporary evidence against the wisdom of high-stakes tests. The little-known origins of such testing add to the case against it: High-stakes tests were never supposed to have the role they do in schools in the first place.
Beginning in the 1890s, California, Pennsylvania, New Jersey, New York, and other states began allowing psychologists to conduct limited standardized intelligence and achievement tests on public school students, but this work was largely experimental. That changed in 1919, when Massachusetts became the first state to mandate a form of high-stakes school testing.
A revolutionary law directed districts to evaluate any students three or more years behind in their coursework. A traveling clinic of psychologists and social workers went from school to school looking at 10 factors, from students’ physical health to the conditions in their homes. They used these to provide individualized assessments and recommendations for each student. The last of the 10 factors was the mother of all high-stakes tests—a new exam called the Stanford-Binet, known today as the IQ test.
The new law upended outmoded ways that schools operated. Its successes and failures also shaped the imperfect reality we see today, in which even states that lead in education (Massachusetts, for example) are failing their most vulnerable students—especially those with disabilities—and high-stakes testing has done little to help them. Why? In part because we use these tests in the wrong ways, ways that the person behind the first testing law never intended.
The 10-point scheme that included the IQ test was developed by Dr. Walter E. Fernald, a founding faculty member of the Harvard Graduate School of Education and the world’s leading figure on intellectual and developmental disabilities at the time. Fernald worried that disabled students were not getting the supports they needed in their community’s schools, a failure that emboldened people to think of state institutions, like the one he ran in Waltham, Mass., as dumping grounds for purportedly uneducable children.
Fernald was an early critic of the IQ test, but he came to believe it had a place alongside the nine other areas for assessing child development. Specifically, he intended it as a quick check on the other nine observations: if something was significantly off base, the test was meant to serve as a corrective, not for the student but for the tester.
Yet within a few short years of his death in 1924, the IQ test became the dominant method for looking at student success. Used alone, it was relatively inexpensive, and a cult of true believers propped it up with false claims of its scientific accuracy.
Over the next half century, testing sprawled far from its original, narrow use, seeping into nearly every way we think about performance and accountability in education. In the 1990s, influential education reform advocates expressed support for high-stakes testing as a requirement for school graduation and much more. With the passage of No Child Left Behind, states had an excuse to do just what they had done 75 years before. Looking for a quick and easy way to implement the law’s requirements for evaluation of student learning, they broadly embraced high-stakes testing.
Thankfully, many states realized the limitations of this approach. But in the places where testing remains a requirement for graduation, we can see its effects rippling back into the classroom in corrosive ways—narrowing the curriculum, limiting teachers’ ability to individualize their teaching, and setting inappropriate goals from on high.
It also reduces learning to an annual spectacle where the release of test scores serves as a reminder that schools are largely failing to address racial, disability, and wealth-based disparities and even to get most students past the benchmarks for “proficient.” Armed with the findings, business interests and billionaire-backed school choice crusaders attack the idea of public education. Teachers—who in Massachusetts, for instance, have been clear that high-stakes testing robs resources from individualized education—go unheeded.
Today, a wealth of rich and exciting educational approaches demonstrates an even bigger point: testing systems are neither the only nor the best way to judge whether students are ready to graduate. For instance, New York’s post-high-stakes-testing vision includes ways for students to demonstrate proficiency in subjects through career and technical education training programs. In Colorado, successful completion of well-designed projects counts toward graduation.
As Fernald recognized a century ago, an array of tools is needed when we evaluate students’ preparedness to advance through their education, and a community of people needs to come together to use those tools with each student. It is high time that states return to that ideal through assessments of student coursework, backed by district-set standards, and evaluated by the teachers who know the children in their classrooms. It is the modest, reasonable, and right thing to do.