Guest post by John Thompson.
Most states accepted the offer they couldn’t refuse and adopted laws based on the Gates Foundation’s untested “teacher quality” theory. Two recent responses to that top-down mandate are illuminating. The more literary statement was issued by economist Tom Kane, the head of the Gates Measures for Effective Teaching (MET) project. A scholarly analysis, “Instructional Alignment as a Measure of Teaching Quality,” was published by Morgan Polikoff and Andrew Porter.
In contrast to Kane’s eccentric diagnosis of education’s shortcomings, Polikoff and Porter’s peer-reviewed study is one of many social science studies that are giving a fuller picture of the many problems with the Gates Foundation’s and Kane’s policy gamble.
As he often does, Kane begins with a clueless metaphor. He said that previous, incremental school reforms were like “shooting a bottle rocket to the moon.” Because educators have never been empowered in ways that would produce transformational change, Kane believes that he and other corporate reformers should take over and design an educational Race to the Moon.
A better metaphor for the Kane blueprint could be borrowed from the 1950s television show, “The Honeymooners,” as when Ralph Kramden proclaims, “Alice, Straight to the Moon!” Like that of Jackie Gleason’s character, the corporate response to dissent is “BANG, ZOOM!” and a punch to the kisser. Because teachers, armed with bottle rockets, did not produce the educational equivalent of a Moon landing in a decade, Kane demands that high-stakes testing be used as a club to get our attention and to propel our schools into outer space.
Kane ignores the obvious flaw in his logic. When NASA planned a flight to the moon, he argues, they realized that “incremental improvements in rocket design would not provide enough lift” to get astronauts into orbit. But Kane ignores what it would take to lay the educational foundations for such a grand effort. He’s silent on the policies - such as high-quality early education and a well-designed campaign to teach reading for comprehension by 3rd grade - that have the potential for propelling our schools into new levels of excellence. He forgets about what it takes to launch a rocket, as he seeks to be the Ground Control inventor of systems that he pretends are already capable of flight.
Then, Kane’s hypotheses get downright weird. He acknowledges that each of the MET policies that he promotes is under-powered. Firing bad teachers, better professional development, and better standards of instruction each get anemic results. They will only work as a package. So, educators have to accept them all!
In other words, Kane selected the wrong metaphor. He’s like the old-time merchant who loses money on everything he sells - but makes it up on volume!
In contrast, as explained by Education Week’s Holly Yettick, Polikoff and Porter:
Found no association between value-added results and other widely accepted measures of teaching quality, such as the degree to which instruction is aligned with state standards or the contents of assessments. Nor did the study find associations between "multiple measure" ratings, which combine value-added measures with observations and other factors.
The study also raises questions about the cornerstone of reformers’ theories by casting doubt on whether a focus on standards is the proper lever for improving instruction. That issue deserves a full analysis beyond the scope of this post.
To the Gates Foundation’s credit, it supported the Polikoff and Porter study with a $125,000 grant. The research drew on a subset of MET data from 327 fourth- and eighth-grade teachers in six school districts.
Yettick reports that “Mr. Polikoff suggested that the study’s findings could represent something like the worst-case scenario for the correlation between value-added scores and other measures of instructional quality.” Proponents of the Gates-favored evaluations should be alarmed. The study’s correlations are “zero or near-zero.”
The MET, for reasons that still escape me, investigated whether, on average, there are correlations between value-added and other indicators of effective teaching. But the policies spawned by the MET seek to fire individual teachers for so-called “ineffectiveness.” (Emphasis mine) Nowhere does the MET address the policy issue of whether its estimate of ineffectiveness is valid for determining whether the individual is a good teacher or not.
Polikoff and Porter use a MET subsample and methodology to drill down to the policy questions that should have been addressed before risky new teacher evaluation experiments were imposed. They find, “In some places, [value-added measures] and observational scores will be correlated, and in some places they won’t.” Polikoff wrote in an email, “These correlations will depend on things like the content and quality of the test, the type of [measures] used, the properties of the observation system ...”
Yettick then summarizes other peer-refereed studies that reach similar conclusions. Douglas Harris, in the American Educational Research Journal, reports that principals’ ratings and their value-added ratings of teachers were “only weakly correlated.” Harris formulates an insightful metaphor to explain why “it is no surprise” that observations and value-added reach different outcomes: “You can think of principal evaluations and value-added as measuring two different elements of ‘quality instruction’ in the same way that temperature and humidity are two key elements of quality weather.”
Also in the American Educational Research Journal, Noelle A. Paufler and Audrey Amrein-Beardsley explain why value-added models fall short, even when they try to account for nonrandom classroom assignment. In the real world, in contrast to the rarefied experimental conditions of the MET process, supporters of value-added “typically fail to consider behavior, personality, and other factors that profoundly influenced the classroom-assignment decisions.” That weakness can bias value-added results.
Another forthcoming study in the peer-refereed Economics of Education Review, led by Brigham Young’s Scott Condie, draws on reading and math scores from a massive sample of 1.3 million students over six years. Condie et al. “found that between 15 percent and 25 percent of teachers misranked.” Moreover, it sounds like the study documents a dynamic where, metaphorically, MET supporters would evaluate (and fire) teachers based on humidity metrics when practitioners would prefer to use the temperature, or vice versa. Condie explains, “[A] teacher who is fired may generate more social value than one who is retained.”
Yettick also reviews the work of Marianne P. Bitler of the University of California. Bitler et al. discover an even better metaphor. Holding teachers accountable for test score growth, as estimated by value-added models, is like holding educators accountable for how much taller their students grow during the school year. They “found that teachers’ one-year ‘effects’ on student height were nearly as large as their effects on reading and math.”
So it is up to the Gates Foundation to decide whether to defend its quest for a better bottle rocket to drive school reform, or to heed the research of independent scholars. It can keep supporting Tom Kane’s continuing search for a metaphor to justify its overreach, or start to consider evidence-driven school improvement policies.
What do you think? When Gates-funded, peer-reviewed research contradicts the opinions the foundation favors, will it adjust? Will the Gates Foundation explicitly support the efforts of educators to back off from value-added evaluations? And which side has the better metaphors?
John Thompson was an award-winning historian, with a doctorate from Rutgers, and a legislative lobbyist when crack and gangs hit his neighborhood, and he became an inner-city teacher. He blogs for This Week in Education, the Huffington Post and other sites. After 18 years in the classroom, he is writing his book, Getting Schooled: Battles Inside and Outside the Urban Classroom.