Teachers who demonstrated the skills outlined in certain popular teaching frameworks tended to help their students learn more, concludes a new study released by the Bill & Melinda Gates Foundation.
The report is the second major release from the foundation’s Measures of Effective Teaching project, which seeks to identify the best gauges of effective teaching. Ultimately, the Seattle-based foundation plans to devise a prototype teacher-effectiveness measure based on the findings.
Among other implications, the study’s results suggest that observations of teaching practice hold promise as a component of teacher-evaluation systems, provided that observers are carefully trained to apply the frameworks consistently over multiple observations. The study also indicates that the gauges that appear to make the most finely grained distinctions in teacher performance are those that incorporate many different types of information, not those based exclusively on test scores.
“I was surprised at how aligned all the measures were,” said Douglas O. Staiger, a professor of economics at Dartmouth College, in Hanover, N.H., and one of the principal researchers on the study. “They seem to pick out teachers who are good on a range of dimensions, and I think that says there really is something kind of coherent about good teaching.”
Other studies have also linked teaching frameworks to student growth. But the breadth of measures studied, the number of districts and teachers included, and the focus on reliability in the Gates Foundation analysis give a richer picture of how stakeholders might create evaluation systems based on multiple measures, said Douglas N. Harris, an associate professor of education policy and public affairs at the University of Wisconsin-Madison, who was not involved in the study.
Still, he added, “all of this is being done in a low-stakes context. The question going forward is to what extent these conclusions apply in a high-stakes setting.”
The Gates Foundation also provides grant support for Education Week’s coverage of the education industry and for organizational capacity-building by the newspaper’s nonprofit publisher.
Findings Rundown
The study, “Gathering Feedback for Teaching,” released last week, draws on some 7,500 videotaped lessons taught by more than 1,300 grade 4-8 teachers across six school districts in several states. Each lesson was scored by multiple observers trained on one of several teaching frameworks.
The two general frameworks studied are consultant Charlotte Danielson’s Framework for Teaching and the Classroom Assessment Scoring System, a rubric designed by Robert C. Pianta, currently the dean of the education school at the University of Virginia, in Charlottesville. Three subject-specific frameworks, one in English/language arts and two in mathematics, were also tested.
The scores were then analyzed by looking at a variety of outcomes for students taught by those teachers. Among the new findings:
• All five frameworks were shown to bear a positive relationship to student achievement, as measured by growth on both state tests and more cognitively challenging exams, though the correlation was often “modest” in size.
• The error associated with the instruments was quite large when based on one observation by one observer, but it decreased when multiple observers scored the lessons.
• The observation frameworks showed a positive correlation with nonacademic measures of student happiness and effort.
• Teachers tended to score higher on such aspects of teaching as classroom management and pacing, but lower on such skills as connecting lessons to other disciplines, employing problem-based approaches, and using questioning strategies.
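The second finding above, that error shrinks when more observers score the lessons, matches a standard pattern in measurement. The sketch below illustrates that pattern with the Spearman-Brown formula, which estimates the reliability of an average of several equally reliable, independent scores. It is an illustration only, not the method used in the report: the function name and the single-score reliability of 0.35 are hypothetical and are not taken from the study.

```python
# Illustration only: the reliability value used below is made up,
# not a figure from the "Gathering Feedback for Teaching" report.

def averaged_reliability(single_reliability: float, n_scores: int) -> float:
    """Spearman-Brown formula: reliability of the mean of n parallel scores.

    Assumes every score is equally reliable and that scoring errors
    are independent of one another.
    """
    if n_scores < 1:
        raise ValueError("n_scores must be at least 1")
    r = single_reliability
    return n_scores * r / (1 + (n_scores - 1) * r)


if __name__ == "__main__":
    single = 0.35  # assumed reliability of one observer scoring one lesson
    for n in (1, 2, 4):
        # Reliability rises as more independent scores are averaged.
        print(n, round(averaged_reliability(single, n), 2))
```

Under these assumptions, each added score raises reliability, but by a smaller amount each time, which is one reason evaluation systems weigh the cost of extra observations against the gain in precision.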
The MET project’s first release, in December 2010, brought some complaints from teacher groups, which said that the baseline measure of student outcomes—growth in test scores—was too narrow. For this study, the researchers broadened the list of outcomes slightly to include a measure of student effort and emotional engagement. Students taught by the teachers studied reported, for instance, on whether they pushed themselves to understand lessons in the class, and whether they felt happy in class.
The study also found that the observations helped improve the precision and stability of teacher-effectiveness calculations when they were combined with the two measures from the previous study: students’ perceptions of their teachers and “value added” measures of student test-score growth, a controversial element in many states’ new teacher-evaluation systems.
For instance, the study estimated that the difference in learning between students taught by teachers in the top and bottom quartiles of effectiveness, using just the Framework for Teaching as a measure, amounted to about 2½ months of growth in learning. But when the additional factors were included, the spread between top and bottom teachers’ effectiveness grew to 7½ months of learning, or almost a full year of schooling.
The researchers stressed that any decision to unite such measures carries trade-offs. For instance, of the measures studied, value-added was generally the best single predictor of future teacher performance, as measured by student score gains on state tests, but it is not as stable from one year to the next as student perceptions.
“There are trade-offs to what you put weight on, especially if you have goals beyond just maximizing how teachers can have the biggest impact on state standardized tests,” Mr. Staiger said.
In addition to the report’s implications for teacher-quality policy, its findings raise new questions about the state of testing in English/language arts. Researchers have long noted that value-added measures of teacher effectiveness seem to be less responsive to instructional differences in that subject than in mathematics.
The Gates Foundation study, however, found that student achievement on the Stanford 9, a standardized, open-ended literacy test requiring students to explain their thinking in writing, was more sensitive to differences in teacher quality than the state English/language arts tests.
The report’s authors hypothesize that most states’ tests emphasize reading comprehension over the important skill of writing, which is emphasized in later grades.
The Context of Teaching
The findings are unlikely to end what has so far been a volatile national discussion about how to boost teachers’ skills.
The president of the American Federation of Teachers, Randi Weingarten, said the study shows that teacher evaluation needn’t be based primarily on test scores. But she added that the study still puts too little emphasis on the context of teachers’ work.
“They are still regulating teachers, rather than creating the steps, including asking them what they need, to improve their practice,” she said in an email.
Among the MET analyses still under way is a random-assignment study meant to determine whether students assigned to teachers identified as more effective actually learn more relative to their peers.