In September of last year, my students at the New York City public high school where I teach sat for a test called the Measures of Student Learning, or MOSL. The test was given for both math and English, each in a one-and-a-half-hour session which, for the sanity of both teachers and students, took place a week apart.
For English, my discipline, the students were given two reading passages, which they were told to use as the basis for an argumentative essay—in New York City Department of Education parlance, that means an essay in which students state a thesis and use articles to provide both claims and a counter-claim. And here’s the rub: Per the Department’s “Advance” teacher-evaluation initiative, 20 percent of a teacher’s yearly “rating” (Ineffective, Developing, Effective, or Highly Effective) will be based on the students’ collective growth on the MOSL assessment, as judged through seven different matrices. Thus, the purpose of the MOSL is not to learn about the students’ strengths or weaknesses so much as to learn about their teachers’.
Unsurprisingly, my students did not find their teachers’ evaluation to be a sufficiently compelling motivator to sit and take this test. The first question they asked was whether, if they took the MOSL, they’d still have to take the Regents—the New York State standardized exams—at the end of the year. When they were told that they would, they asked, “So why do we have to take this, too?” and “Does this test actually count for our grade?” When told that it would not, they either wrote one-paragraph responses to the essay question (a guaranteed failure, as far as the test score goes) or refused to take it altogether. In the room where I was proctoring, one of the kids called me over and politely inquired, “Why are we doing this? This test is a waste of class time.”
She couldn’t have articulated my own feelings more precisely. But it wasn’t only that I was irked by the fact that the four class periods required for administration of the math and English portions of the test could have been spent reading poetry, talking about novels, practicing writing conventions, or having discussions about world issues that would actually matter in the kids’ lives. Nor was it simply the fact that this test was now going to be used to evaluate my colleagues and me as teachers. I also felt quite strongly that, separate from its dubious applications, the MOSL was, quite simply, a badly designed test.
Missing the Point
What do I mean by this? For starters, the articles chosen for the argumentative essay didn’t pair well. The prompt asked the students to argue whether genius is innate or developed. One article, an excerpt from a Malcolm Gladwell book, examined a case study of violin students, which found that greater prestige in the profession corresponded to the number of hours per week the students had spent practicing. The students who logged the most hours practicing, the study found, went on to play in world-renowned orchestras; the least consistently practicing students went on to become (what else?) violin teachers. (It was hard for us teachers not to feel a bit affronted by the choice of this particular passage.) But the companion passage, which was presumably supposed to show that genius was innate (and not developed through hours of practice), was only tangentially connected to the prompt. It recounted noted animal scientist Temple Grandin’s experience of living and working with autism, and contained only a seemingly throwaway line that one had to think a little bit “outside of the box” in order to accomplish great things. This, to me, didn’t especially convey the supposed counterpoint to the Gladwell article, and I don’t think my students got the connection either, since a number of the kids who bothered to respond at all wrote some very earnest essays on the subject of “why autism is a hard disease to have.”
The problems stemming from the MOSL and other assessments of its ilk really break down into two categories of questions: Are these tests good for evaluating teachers? And what uses, if any, do standardized tests have as far as students are concerned? (The tertiary issue of whether all these tests are simply a means of lining the pockets of test-corporation bigwigs is also a valid consideration, but not one I’ll go into here.)
Evaluation Dysfunction
From my standpoint, there are a great many reasons why assessments like MOSL are not good for evaluating teachers. One that I don’t hear articulated enough is how little they can control for outside factors, irrespective of what proponents of the infamous “value-added” models might assert. Whether students do brilliantly or poorly, it’s nearly impossible to attribute that performance solely to the one teacher they’ve had in a particular subject that year. Perhaps one English teacher’s students did well on a test because they had a history teacher who consistently drilled them on essay writing. Or perhaps it was a writing teacher in an earlier grade who trained them particularly well. Whose pedagogical effect is the test measuring, exactly? Any group of scores is the result of a cumulative effect, not one single teacher’s.
By the same token, what if a whole cohort of students does poorly on a given assessment because they are late enrollers routed into “failing” public schools, as they all too often are, according to a recent study by the Annenberg Institute? Such students often come from families in extremely stressful situations, including recent immigration from non-English-speaking countries and bouncing between relatives’ homes and homeless shelters, among other family crises. As a result, they tend to be the poorest performers on assessments.
I recently spoke with a group of researchers from the Human Resources Research Organization, a nonprofit that specializes in personnel management. They acknowledged the faulty link between a teacher’s performance and students’ scores on tests. They likened attempts to evaluate the performance of teachers of students in high-needs schools through their test scores to measuring the performance of an umbrella salesman in a desert: Even great “performance” (deep product knowledge, great salesmanship, and a great personality) would not be “effective” in yielding umbrella sales in a dry zone, just as even great pedagogical performance (interesting lessons, rigorous assignments, a way with kids) in no way guarantees effectiveness in terms of a group of students’ test scores hitting a certain mark.
In fact, what proponents of test-based teacher evaluation claim tests can show us about a teacher’s effectiveness can probably be determined better in other ways. These include principal and peer-to-peer observations, student questionnaires, and examining a teacher’s curriculum of lessons with an eye for rigor, creativity, and variety. Using students’ test scores as a means of evaluating teachers attempts to put a number on something that is inherently too unquantifiable, nuanced, and broadly impacted to be identified through an exam alone.
Failing the Students
The student benefit of these state-mandated assessments is also questionable. The premise of the particular exam my students took, that the argumentative essay is somehow the basis of the critical thinking the students will need to reach that ever-elusive “college-readiness” benchmark, is in itself faulty. In my own preparation for college, a premium was placed on skills in expository writing, research, clear explanation of sources, and perhaps most importantly, my ability to come up with my own unique interpretation of any given source or set of sources. The argumentative essay as given by the MOSL requires students simply to choose the one reading that seems more “correct” as far as answering the prompt, summarize it, and then mention the points of the other one. It’s laughable to assume that from this single exercise, one could distill students’ critical-thinking skills better than from any of the other tasks they perform over the course of their school year, or to believe that this test presents some “pure” example of critical thinking.
Apologists for “teaching to the test” (such as the author of a recent op-ed in the Wall Street Journal) might argue that tests like the MOSL promote critical thinking in that they require students to analyze passages of text for meaning and tone, and that teachers are simply too lazy to teach the difficult reading passages or the critical thinking that these tests require. This argument simplifies the truth, which is that assessments such as MOSL represent only one particular type of critical thinking, one that is neither universally relevant nor indicative of all the ways one might deduce, synthesize, and re-convey information. It is, however, painfully boring for the kids. And in schools in which truancy is an issue, in which teachers are making every effort to make learning exciting, engaging, and relevant for kids, simply so that they’ll show up consistently, forcing students to take assessment on top of assessment in the hopes that something new will be shown is downright detrimental to educational outcomes.
The best and only useful application of assessments is a limited one, both in scope and frequency: They may be used diagnostically, in class, in the beginning of the school year or new curricular units, in order to help teachers (and parents) determine what strengths students already have and what weaknesses teachers need to address. They should be skill-specific, and not time-consuming.
The idea that an assessment can measure something so broad and nebulous as “critical thinking” or “college readiness” or “teacher efficacy” must be given up entirely, in favor of small-scale mini-assessments, oriented towards discrete skills or topics, that can help teachers to target instruction effectively throughout the school year. Only when viewed this way can assessment through tests, state-mandated or otherwise, actually serve a useful purpose, both in guiding teachers’ instruction in the classroom and in enabling students to set and attain achievable educational goals.