The number of states that mandate principal evaluations has jumped in recent years, driven by rules tying federal education aid to such policies. But many states are still grappling with the best ways to measure principal effectiveness and the extent to which student performance should be included in evaluating principals.
Since 2010, at least 36 states have adopted laws requiring principals to undergo regular assessments and increasing the rigor of those reviews, according to the National Conference of State Legislatures.
The changes reflect a shift from largely pro forma evaluations to complicated matrices that seek to tie principals’ effectiveness, in part, to student academic growth. The policies typically require that a percentage of a principal’s evaluation include student performance or growth. The amount ranges, for example, from 20 percent in Delaware to 50 percent of the overall score in states such as Georgia and Ohio.
But according to new, yet-to-be-published research, the growth of principal-evaluation policies has not been matched with corresponding study of their implementation, reliability, and effectiveness. Most of the attention and studies are geared toward similar systems for teachers.
Ellen Goldring, a department chairwoman at Peabody College of Education and Human Development at Vanderbilt University in Nashville, Tenn., did her own review of principal-evaluation legislation passed between 2009 and 2013. She said there was limited information about how the policies are used; a lack of clarity on the consequences for principals and how feedback is to be presented; and a lack of alignment with principals’ evolving roles in talent management, data analytics, and building-level autonomy.
“Principal evaluation is ... the stepchild of teacher evaluation,” Ms. Goldring said, drawing a contrast to the huge body of research on teacher quality, including the influential Measures of Effective Teaching, or MET, project funded by the Bill & Melinda Gates Foundation.
“There has been a ton of work around teacher evaluations and measures of teacher quality,” she said, but “much less work on principal evaluations.”
Wave of Policies
The push for greater accountability for principals and teachers is due to a number of factors, according to the NCSL review of laws and policies adopted between 2010 and 2014.
The federal Race to the Top grant competition, launched in 2009, included such evaluation systems as a requirement for participants. More recently, in order for states to qualify for waivers from provisions of the No Child Left Behind Act, student growth had to be considered as a “significant” factor in evaluating principals, although federal guidelines left the details up to the states.
But the NCSL also found a dearth of valid and reliable evaluation methods, and little emphasis on training for the evaluators.
While details vary, many states have settled on variations of three main models, according to Ms. Goldring.
Under one approach, which Ms. Goldring calls a 50-50 percentage model, half the principal’s score is derived from student performance or growth, using such measures as standardized tests. The other half is culled from a rubric that is generally aligned with the Interstate School Leaders Licensure Consortium standards, developed in conjunction with the Council of Chief State School Officers. Under those standards, principals are typically rated by their supervisors on a four-point scale, from ineffective to highly effective.
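As a rough illustration of the arithmetic behind a 50-50 model, the sketch below gives equal weight to the two halves of the evaluation. It is not any state's actual formula: placing both components on a 1-to-4 scale, the function name, and the sample values are all assumptions made for the example, and states differ in how they convert student-growth measures into a score.

```python
def fifty_fifty_score(student_outcome: float, rubric: float) -> float:
    """Equal-weight composite of two component scores.

    Assumption for illustration: both components are already expressed
    on the same 1-4 scale before they are combined.
    """
    for value in (student_outcome, rubric):
        if not 1.0 <= value <= 4.0:
            raise ValueError("component scores are assumed to lie between 1 and 4")
    # Each half contributes 50 percent of the overall score.
    return 0.5 * student_outcome + 0.5 * rubric


# Hypothetical principal: weak student-outcome score, top rubric score.
print(fifty_fifty_score(student_outcome=2.0, rubric=4.0))
```

Under these assumptions, a strong rubric rating can offset a weak student-outcome score point for point, which is the defining feature of a straight percentage formula.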
A second model uses a matrix rather than a percentage formula. In Connecticut, student learning accounts for 45 percent and teacher effectiveness for 5 percent of the evaluation; together they make up a principal’s student-outcome rating. The other half of a principal’s evaluation is composed of leadership practice (40 percent) and stakeholder feedback (10 percent).
In a third approach, student data counts for a smaller percentage of the overall evaluation, as low as 20 percent in some states; however, a principal cannot receive an “effective” score if student growth/achievement or performance is low.
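The override in this third approach can be sketched as a weighted score followed by a cap. Everything specific here is an assumption for illustration: the 80-20 split, the two middle rating labels, the rounding rule, and the cutoff for "low" growth are hypothetical and do not reproduce any state's rules.

```python
# Hypothetical four-level scale; the two middle labels are assumed.
RATINGS = ["ineffective", "needs improvement", "effective", "highly effective"]


def capped_rating(practice: int, student_growth: int, low_growth_cutoff: int = 1) -> str:
    """Return an overall rating where low student growth caps the result.

    practice and student_growth are indices 0-3 into RATINGS (an assumption).
    """
    if not (0 <= practice <= 3 and 0 <= student_growth <= 3):
        raise ValueError("ratings are assumed to be indices from 0 to 3")
    # Student data carries only 20 percent of the weight in this sketch.
    weighted = 0.8 * practice + 0.2 * student_growth
    overall = int(weighted + 0.5)  # round half up to the nearest level
    # The override: low growth blocks an "effective" rating or better.
    if student_growth <= low_growth_cutoff:
        overall = min(overall, RATINGS.index("needs improvement"))
    return RATINGS[overall]


print(capped_rating(practice=3, student_growth=1))
print(capped_rating(practice=3, student_growth=2))
```

In this sketch, the first principal's weighted score alone would round to the top level, but the low growth rating holds the result below "effective." That is how a component with a small weight can still decide the outcome.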
Critiquing the Models
Ms. Goldring offered several critiques of the policies.
States are trying out a variety of ways to make student achievement a formal part of principal evaluations. New research sorts those approaches into a few common baskets:
“50-50” Percentage Model:
50 percent of the evaluation score is derived from student-outcome measures, usually student achievement or academic growth. This can include indicators such as graduation and attendance rates. The other 50 percent of the score often comes from a performance rubric, aligned with standards developed by the Council of Chief State School Officers.
Example: Georgia
Matrix Model:
In most cases, 50 percent of the evaluation is based on student outcome or growth measures; the other 50 percent of the score comes from a performance rubric. However, the overall score is derived from a matrix table, rather than a percentage formula.
Example: Ohio
Student ‘Data Trump’ Model:
Student growth/performance may account for less than half of the principal’s overall score; however, a principal cannot earn the highest rating or be deemed “highly effective” with low student performance/outcome data. In other words, student data “trumps” everything else.
Example: Delaware
SOURCES: Ellen Goldring, Peabody College of Education and Human Development, Vanderbilt University; Kelly Jones, Vanderbilt University
She said the rubrics were being rolled out with little psychometric research on reliability or validity. They were based on professional judgment, she said, rarely relying on a “360-degree” approach, which could incorporate teachers’ feedback on principals’ quality and effectiveness.
Some systems, she said, were burdensome and complex; others were vague on the consequences. Some called for termination of low-scoring principals; others were silent on the matter.
With the changing roles of principals, states may want to consider alternative measures, Ms. Goldring said, including whether principals were retaining their most-effective teachers and whether professional development was meeting the needs of instructors.
New research published in March in the journal Educational Evaluation and Policy Analysis seems to buttress Ms. Goldring’s call for caution and additional studies.
In looking at the use of value-added measures to evaluate principals’ performance in the 340,000-student Miami-Dade County school district, Jason Grissom, a professor at Vanderbilt University, found that different statistical models can yield different results, even when the principals, teachers, and students are the same.
Mr. Grissom recommends a system that, in addition to considering student performance and growth, would include a qualitative aspect with very specific descriptions of what constitutes “good” performance and require evidence collection, school visits, discussions with people who work with the principals, and surveys that could include parents, teachers, students, and those in the community.
A high reliance on test scores gives only a “narrow view” of principal and teacher performance, said Dick Flanary, a deputy executive director of the National Association of Secondary School Principals. Factors such as graduation rates, dropout rates, literacy rates, and teacher turnover may be more appropriate measures to use, in his view.
His group and the National Association of Elementary School Principals identify a number of other factors they say should be taken into account in evaluations. They include professional growth and learning; student growth and achievement; school planning and progress; school culture; professional qualities and instructional leadership; and stakeholder support and engagement.
Many states are revisiting their evaluation policies for administrators. Just last week, the Michigan legislature voted on new statewide assessment standards for teachers and principals; however, the implementation will be delayed while disputes are settled over what tests will be used in the evaluations and the extent to which test scores will be considered.
Delaware, which has a long history of evaluating educators, including principals, added the student-performance portion to evaluations as part of its successful application for $119 million in federal Race to the Top money. In the revised appraisal system, rolled out in the 2012-13 school year, principals were evaluated in five areas: vision and goals; creation of a culture of learning; management; professional responsibilities; and student improvement.
Student data has the “governing effect” on the final analysis, meaning that a principal cannot receive an overall score of “effective” or “highly effective” if student growth is deemed unsatisfactory.
A state review found that during the 2012-13 school year, 97 percent of Delaware administrators received summative evaluations that rated them as “satisfactory,” yet there was little correlation between those high ratings and student growth.
Retooling the Evaluation
That was among the reasons the state again revised the appraisal system and added a more detailed principal-practice rubric, said Christopher Ruszkowski, the chief officer of the Delaware Department of Education’s Teacher and Leader Effectiveness Unit. The revisions were piloted in nine school districts in the 2013-14 school year.
If approved by the state board of education next month, they will be put in place statewide in the coming school year.
Jacqueline O. Wilson, an assistant professor of education at the University of Delaware, in Newark, said there was some pushback when the state decided to change the student-growth portion. But Delaware educators were already accustomed to being assessed, in part, on student performance, she said.
The main concern, Ms. Wilson said, was whether the state was using the right tools to measure effectiveness.
“I think having the preponderance of their evaluation based on student achievement is a little scary,” said Ms. Wilson, who is the director of the Delaware Academy for School Leadership, based at the university.
“But as other people in our state have said, if I really do components one through four well, ... it should result in higher student achievement.”
Positive Experience So Far
T.J. Vari, the principal of Everett Meredith Middle School, in Middletown, Del., who is taking part in the pilot program, said he has had a positive experience so far.
Mr. Vari meets regularly with his supervisor—the district superintendent—to discuss his progress and monitor the goals they had set for the year. If the superintendent can help him improve his performance, Mr. Vari said, he can do the same for teachers, which in turn will translate into better practice in the classroom and ultimately improved student outcomes.
“The four components that we have outlined for administrators are the right things to look at,” he said, “and I think that the rubric is very helpful in terms of seeing a continuum. The most important piece is that the principal-supervisor and the principal are meeting often to have both formal and informal discussions of school leadership.”