A new study delivers a strong indictment of the performance-based licensing test edTPA: The test's results are not always reliable or precise and can, in fact, be “misleading.”
And the researchers argue that until these issues are further analyzed and resolved, there should be a moratorium on using edTPA scores for high-stakes decisions for individual teachers.
The assessment, which teacher-preparation programs must use in 18 states, requires teacher-candidates to submit a portfolio of materials for review, including a series of lesson plans, a video of themselves teaching, and a written analysis of their instructional practice. The study, published in the American Educational Research Journal, analyzed the technical properties of edTPA scores, using data from 2013 to 2018.
EdTPA, the first standards-based assessment for teacher-candidates to be offered nationally, is meant to ensure that new teachers are effective from day one. It was developed by prominent teacher-educator Linda Darling-Hammond and other researchers at the Stanford Center for Assessment, Learning, and Equity (SCALE) starting in 2009, and the exam rolled out in 2013-14 after two years of pilot testing.
EdTPA covers 27 different subjects and grade levels. The assessments are structured around three main teaching tasks (planning, instruction, and assessment), and each task has about five associated rubrics. In most cases, one person scores the entire portfolio for any given teacher-candidate.
When humans score assessments, there's always going to be error, said Drew Gitomer, an education professor at Rutgers University and an author of the report. To mitigate that error, many performance assessments use multiple raters, who often score different parts of an exam, so that individual raters' errors tend to balance out.
With edTPA, a random subset of exams—about 10 percent—is scored twice to determine how reliable rater judgments are. And edTPA reports a high measure of reliability. But Gitomer argues that's because—and here's where it gets technical—edTPA uses its own version of the kappa statistic.
Kappa is a standard measure of inter-rater reliability. It is typically calculated from how often two or more raters assign exactly the same score, adjusted for the agreement that would occur by chance. A kappa of 1 indicates perfect agreement among raters, whereas a kappa of 0 indicates agreement that's equivalent to chance.
But Gitomer says that edTPA uses an “invented” version of kappa that counts not only exact agreement but also adjacent agreement, when ratings fall within one performance level of each other. The rubrics are graded on a five-point scale, so if one rater gives a score of 3 and another gives a 2, edTPA would count that as agreement.
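To see how much that definitional choice can matter, here is a minimal sketch in Python. The ratings below are invented for illustration, and the adjacent-agreement version is one plausible generalization of Cohen's kappa (counting scores within one level as agreement in both the observed and the chance-expected terms); the article does not spell out edTPA's exact formula, so this is an assumption about the general approach, not edTPA's actual computation.

```python
from collections import Counter

def kappa(scores_a, scores_b, agree):
    """Chance-corrected agreement: (p_obs - p_chance) / (1 - p_chance).
    The agree(x, y) predicate defines what counts as agreement."""
    n = len(scores_a)
    # Observed agreement across the paired ratings.
    p_obs = sum(agree(a, b) for a, b in zip(scores_a, scores_b)) / n
    marg_a, marg_b = Counter(scores_a), Counter(scores_b)
    # Chance term: probability that two independent raters with these
    # marginal score distributions would "agree" under the same predicate.
    p_chance = sum(
        (ca / n) * (cb / n)
        for x, ca in marg_a.items()
        for y, cb in marg_b.items()
        if agree(x, y)
    )
    return (p_obs - p_chance) / (1 - p_chance)

# Invented ratings on a five-point rubric scale: the two raters never
# differ by more than one level, but disagree exactly almost half the time.
rater_a = [3, 4, 2, 3, 3, 4, 3, 2, 3, 3, 4, 3, 2, 3, 3]
rater_b = [2, 3, 2, 3, 2, 4, 3, 2, 2, 3, 3, 3, 1, 3, 2]

print(kappa(rater_a, rater_b, lambda x, y: x == y))           # ~0.26 (exact)
print(kappa(rater_a, rater_b, lambda x, y: abs(x - y) <= 1))  # 1.0 (adjacent)
```

With these toy scores, counting adjacent ratings as agreement turns a weak exact-agreement kappa of about 0.26 into a perfect 1.0, the kind of inflation the researchers describe.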
And that matters, Gitomer said, because rubric scores are summed into a total that determines passing: if one rater gives every rubric of an assessment a 3, and a second rater gives half of those rubrics a 2, the second rater's scores could fail a teacher-candidate who would have passed with all 3s. Yet edTPA would report that the raters were in agreement.
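The arithmetic is easy to check. In the sketch below, the 15 rubrics follow the structure described above (three tasks, about five rubrics each), and the cut score of 40 is hypothetical; actual edTPA passing scores vary by state, but any cut score falling between the two totals produces the same pass/fail flip.

```python
# Illustrative only: 15 rubrics (three tasks x ~5 rubrics each) on a
# five-point scale, with a hypothetical cut score of 40 (real cut
# scores vary by state).
RUBRICS, CUT_SCORE = 15, 40

rater_1 = [3] * RUBRICS       # every rubric scored a 3 -> total 45
rater_2 = [2] * 8 + [3] * 7   # about half scored one level lower -> total 37

for name, scores in [("rater 1", rater_1), ("rater 2", rater_2)]:
    total = sum(scores)
    verdict = "pass" if total >= CUT_SCORE else "fail"
    print(f"{name}: total = {total} -> {verdict}")
# rater 1: total = 45 -> pass
# rater 2: total = 37 -> fail, yet every pair counts as "adjacent agreement"
```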
“They’ve created their own statistic here,” Gitomer said. “It’s really deceptive. It creates an illusion for something that’s clearly not true. That’s a huge, huge issue.”
Conclusions Questioned
The developers of the exam, which is administered by Pearson, have disputed the study's conclusions.
“SCALE and Pearson had seven years in implementing this program nationally, and we stand behind the growing body of technical evidence that supports the intended use of the edTPA,” said Raymond Pecheone, the executive director of SCALE.
He added in an email that SCALE has published an annual technical report that focuses on the reliability and validity of edTPA, and has been transparent about its findings.
Even so, Gitomer argues that it’s possible many teacher-candidates who were close to the cut-off passing score were misclassified.
“If you’re going to make different judgments, there has to be some justification for that,” he said. “It shouldn’t be the luck of the draw.”
The paper argues for serious consideration to be given to a moratorium on using edTPA scores for high-stakes individual decisions until more evidence regarding the reliability, precision, and validity of the scores is gathered and analyzed.
Still, edTPA’s developers have pushed back on that recommendation.
“EdTPA is a widely used and rigorously field-tested performance assessment of new teachers and meets technical standards,” Pecheone said in a statement. “While technical experts at SCALE and Pearson are closely reviewing the AERJ article, the substantial and ongoing collection of evidence of edTPA’s technical qualities, based on its nationwide use over the past several years, firmly supports its intended use to support licensing decisions.”
A Controversial Exam
Since its inception in 2009, edTPA has generated controversy. The assessment was developed to set a high, nationally recognized standard for the competencies teacher-candidates must demonstrate to earn a teaching certificate.
But critics take issue with the high-stakes nature of the exam and argue that it forces colleges of education to teach to the test. Teacher-candidates of color also tend to score lower on the edTPA than their white peers, and some argue that the exam, which costs $300 to take, is keeping talented teachers out of the profession. Others worry that the tests are vulnerable to cheating.
A study published last year asked teacher-candidates how they perceived the assessment process. Some said they thought the test helped them reflect on their practice, develop effective assessments, and analyze student data, while 40 percent said the edTPA didn’t help them grow at all as educators.
Research has been mixed on whether passing the exam actually correlates with being an effective teacher. A 2016 study found that teacher-candidates who passed the test on the first try tended to boost their students’ reading scores more in their first year of teaching than those who didn’t. But passing the exam didn’t bear any relationship to students’ math scores.