Judging by state tests, school accountability systems are a success. In most states, test scores are going up. And such gains confirm, for the proponents of accountability, that the systems are working. Critics, however, point out that there are ways to raise test scores without improving student learning.
Meanwhile, and despite the shifting meanings that test scores may hold, low-performing schools in high-stakes systems continue to feel the stigma of failure, while their high-performing counterparts are held up as exemplary.
But are schools measured as high-performing by their accountability systems actually better schools? And could others learn from them what to do better?
My colleague Tina Trujillo and I wanted to know the answer. We sampled a number of California schools from both the top and bottom of the performance spectrum, controlling for demographics, and ended up with nine urban middle schools for our study. The differences between our high and low groups on the state performance indicator, viewed in the context of the state as a whole, amounted to about five years of growth. Such score differences ought to be tangible in the life and quality of schools, we surmised, if accountability measures are valid and relevant for school improvement.
For our study, we imagined educators from the low-performing schools traveling to the top-performing exemplars. What would they investigate during their visits? They’d check the schools’ orderliness; find out if students felt safe, cared for, engaged with learning, and challenged; observe teachers to gauge time on task, instructional formats, and the cognitive complexity and tone of instruction; and sample student writing for mechanics and content. They also would be interested in faculty cohesion, teachers’ and administrators’ sense of responsibility, innovativeness, strength of leadership, and improvement strategies. And they would want to know whether the system itself was important and meaningful for the teachers. Then we set about translating our imagined travelers’ inquisitiveness into systematic research with robust survey, observation, and evaluation tools—and we controlled for biases.
As accountability systems take hold of educators’ minds and structure our practices, some of us have gotten into the habit of using higher performance scores as shorthand for higher school quality and more-successful improvement. But could we really make this connection if we didn’t know a school’s performance status? If we could do that, we’d feel better about the system’s validity and relevance for school improvement.
To keep biases in check, we determined that our imagined practitioners would travel to schools without knowing their test scores; in other words, the researchers conducted all of their analyses blind. We constructed a school-quality profile with 56 non-test-based measures for each school. Using these measures, two independent raters judged whether a given school was in the top or bottom performance group, and we conducted statistical comparisons, all with concealed test scores.
Our findings were surprising and instructive. To sum up, we were unable to correctly sort enough of the schools into their respective test-based performance groups. Student surveys told us that, across all schools, regardless of the wide gaps in test-based performance, students felt safe, but only mildly challenged and engaged with learning. Schools in both the top and bottom groups were also quite similar in the quality of their observed instruction. Scores on student writing samples were slightly higher in the top group, but the difference was not statistically significant. Alas, if our imagined travelers had expected to encounter visible signs of an overall higher quality in the high-performing schools, they would have looked in vain.
One school in the high-performing group, however, did stand out with high test scores, higher lesson quality, and more-effective adult relationships—yet not with deeper student engagement. This was, in fact, one of the fastest-growing middle schools in the state for its demographic profile. It had strictly aligned its curriculum to state assessments, abandoned nonacademic subjects, and folded social studies and science into language arts and math. Students below grade level were given extra periods in a remedial literacy program. The school embraced the accountability system and used data to carefully track remedial needs. We encountered no direct test drilling there, and the adults were earnest, responsible, and oriented toward social justice.
Contrast this touted high-performing school with one of the lowest-growth schools in the study. Despite a formidable difference in test scores, the school in the bottom group received remarkably similar ratings on instructional quality from the blind raters. Here instruction was lively and complex, but test scores were depressed. The school passionately rejected the accountability system as unworthy of professionals, and largely ignored it.
This school was an exception in that regard. The others in the study, top or bottom, pursued school improvement much as the high-testing school did. Tightening up, aligning curricula, expanding literacy remediation, and de-emphasizing nontest subjects were the most prevalent activities. But in no instance, we found, was better implementation of such strategies reflected in better instruction or more engaged or challenged students.
In the successful schools, teachers did manage to improve on standardized-test scores. Our observations suggest that these schools were committed to a highly focused coverage of standards-aligned materials within highly structured literacy and language arts programs taught in differentiated learning groups. Thus, our travelers seeking ways to improve their schools would have had to settle on a much narrower definition of quality, one that homes in on attitudes and behaviors that are quite proximate to the effective acquisition of standards-aligned and test-relevant knowledge, but that may go beyond mere drilling for the test.
Nine schools, however carefully selected and studied, are not enough cases to support sweeping statements, to be sure. But if the pattern we detected among our nine schools were more widespread, we would have to rethink school accountability. Raising test scores is not a trivial challenge for many schools. Our highest-performing ones showed us how much hard work goes into it. But if we want to encourage educators to think more intensively about students’ joy of learning and teachers’ instructional practices, we have to find ways to move beyond a narrow agenda of alignment and standardization. Given that schools these days are fundamentally driven by external assessments, we would have to start by constructing assessment systems with different incentives and indicators that train the lens on what we value in education beyond test scores. And we would have to legitimize these measures on an equal footing with test scores to give schools room to explore and develop.
The idea behind current accountability systems is one of beautiful simplicity: Select a few key performance measures and enforce them with vigor, and all else will fall into place. It seems logical. If schools want to improve student test performance, will they not, sooner or later, shift their attention to instructional quality, student motivation, and all the other intangibles that make up the quality of the pedagogical relationship?
Our study suggests otherwise. Both high- and low-performing schools get stuck in a mode of school improvement that searches for the most direct connections among content, teaching, and testing. Students’ motivation for learning, as well as instructional quality, fades from view behind “rigorous” alignment and a “razor-sharp” focus on material that needs to be retaught.
Concentrating on a state test for the purpose of system monitoring makes sense, but bringing student engagement and teachers’ instructional practices to center stage requires more-complex performance profiles for schools. How to meld these more intricate profiles with the current architecture of test-based and sanctions-driven accountability will be next on the agenda of urban school reform.
The study on which this essay is based was published in the December 2007 issue of Educational Evaluation and Policy Analysis.