CORRECTION (Sept. 25, 2020): This post has been updated to clarify that the reference to 2019 NAEP data referred to a nationally representative 4th grade class, and that the conclusion drawn by the author was extrapolated beyond the grades tested by NAEP using nationally normed standardized tests.
This week Dylan Wiliam, eclectic Wales native and emeritus professor at University College London, takes over the blog. Dylan began his career as a math teacher in London (having followed his “jazz-folk” band to the capital city) before eventually stumbling into academe. His books include Creating the Schools Our Children Need and Leadership for Teacher Learning. Across a varied career, he has taught in urban public schools, directed a large-scale testing program, spent three years as senior research director at the Educational Testing Service (ETS), and served a number of roles in university administration, including dean of a school of education at King’s College London. Dylan, whose advice usually costs a pretty penny, will spend the week offering pro bono thoughts and tips to educators struggling to get school going this fall.
— Rick
As schools and districts around the country grapple with the issue of how to start the new school year, we’ve been inundated with alarming cries of how far behind students heading back this fall will be. Plenty of predictions of just how far behind we’re talking about have been tossed around—most of which are, in effect, extrapolations from studies of what is sometimes called “summer learning loss”—the decline in student achievement over the long summer vacation. I’m here to remind you that while these predictions can be helpful, there’s huge variation in the extent of such loss—from 40 percent to 10 percent—and as Paul von Hippel explains here, there may, in fact, be little or no learning loss over the summer.
One reason for the vast differences in these estimates may be an underappreciated feature of learning. Psychologists who research memory, like Elizabeth and Robert Bjork, point out that how easy it is to retrieve something from memory is different from how well something has been learned—what they call retrieval strength and storage strength, respectively. When we test students at the beginning of a school year, we are testing retrieval strength, which, if the students have not been reviewing the material they learned in the previous year, will have declined over the summer. But how well something is learned—storage strength—does not decline, and restudying the material increases both retrieval strength and storage strength. In other words, what students can do on their first day back in school—whether face to face or online—is a poor gauge of what they have actually learned. More importantly, restudying material increases storage strength more when retrieval strength is low, so an hour spent restudying material after the summer break will have more impact on long-term learning than the same time spent studying it before the summer break.
It is also important to put whatever learning loss there has been in perspective. According to the latest (2019) NAEP data, in a nationally representative 4th grade class of 25 students (and extrapolating beyond the grades tested by NAEP using nationally normed standardized tests), there were five students whose math achievement was no higher than that of the average 1st grader and two students whose achievement would match that of the average 9th grader. In other words, in a nationally representative class of 4th graders, there is at least an eight-year spread of achievement. Even if we assume that students learned nothing after schools began closing down in March (roughly half a school year), a returning 5th grade class would therefore have an eight-and-a-half-year spread of achievement rather than an eight-year one. This does, of course, present teachers with huge challenges, but these are challenges that teachers have been dealing with for years.
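As a back-of-the-envelope sketch of that arithmetic: the grade levels below simply restate the numbers in the paragraph above, and the half-year of lost instruction is the illustrative assumption, not a measured figure.

```python
# Back-of-the-envelope check of the spread argument above.
# Grade levels restate the post's numbers; the half-year of lost
# instruction is an illustrative assumption, not a measured figure.
lowest = 1.0    # lowest achievers perform like an average 1st grader
highest = 9.0   # highest achievers perform like an average 9th grader
spread = highest - lowest      # 8 years of spread in a 4th grade class

lost_instruction = 0.5         # assume some students learned nothing for ~half a year
new_spread = spread + lost_instruction

print(f"spread before closures: {spread} years")       # 8.0
print(f"spread after closures:  {new_spread} years")   # 8.5
```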
Of course, teachers will need support in figuring out where their students are as the new school year starts, but it is important to note that one widely proposed approach—the use of standardized tests—is unlikely to be of much help. Standardized tests can tell us how far along a continuum of achievement a student is, but knowing that a student is at the 30th percentile of achievement for his grade or that a 4th grader has a grade equivalent of 3.6 tells us nothing about what kinds of instruction we should provide. Worse, because many such standardized tests adjust the items a student is asked to answer according to how well the student answered previous items (sometimes called “adaptive tests”), we don’t even know which items the student answered correctly. All we can do is place the student somewhere along a line of increasing achievement.
Acknowledging this, some test vendors have tried to increase the instructional usefulness of their tests by offering diagnostic score reports. For example, if a 40-item arithmetic test includes 10 items on each of addition, subtraction, multiplication, and division, then, as well as the overall score on the test, a student might get four subscores indicating a profile of achievement across the four aspects of arithmetic.
At first sight, such an approach seems like a good idea. Knowing that a student scored 90 percent on addition, subtraction, and division but only 50 percent on multiplication seems much more useful than being told that the student scored 80 percent overall. However, because there are only 10 questions on multiplication, the score a student gets on those 10 questions is not particularly reliable—with a different set of 10 multiplication questions, the student might do much better. Moreover, students who are good at one aspect of arithmetic tend to be good at the others. As a result, somewhat paradoxically, even if we just want to know how good a student is at multiplication, the score on the whole 40-item test is a better guide than the score on the 10 multiplication questions. Unless students score very differently on different aspects of what is being tested, diagnostic scores tell us little about a student’s strengths and weaknesses (although they might tell us which aspects have been less well understood across a class).
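To make that reliability point concrete, here is a minimal simulation sketch (not from the post, and not real test data): it assumes 1,000 students, a multiplication-specific skill that correlates 0.9 with general arithmetic ability, and a simple one-parameter response model, then compares how well the 10-item multiplication subscore and the 40-item total each track the “true” multiplication skill.

```python
# Minimal simulation sketch of the subscore-reliability point above.
# All settings (1,000 students, a 0.9 correlation between multiplication
# skill and general arithmetic ability, a one-parameter response model)
# are illustrative assumptions, not real test data.
import numpy as np

rng = np.random.default_rng(0)
n_students = 1_000

# General arithmetic ability, plus a multiplication-specific skill that is
# highly (but not perfectly) correlated with it.
general = rng.normal(size=n_students)
rho = 0.9
multiplication = rho * general + np.sqrt(1 - rho**2) * rng.normal(size=n_students)

def simulate_items(ability, n_items):
    """Each item is answered correctly with probability that rises with ability."""
    difficulty = rng.normal(size=n_items)            # one difficulty per item
    logits = ability[:, None] - difficulty[None, :]  # simple one-parameter model
    p_correct = 1 / (1 + np.exp(-logits))
    return (rng.random((len(ability), n_items)) < p_correct).astype(int)

mult_items = simulate_items(multiplication, 10)   # 10 multiplication items
other_items = simulate_items(general, 30)         # 30 items on the other skills

subscore = mult_items.sum(axis=1)            # 10-item multiplication subscore
total = subscore + other_items.sum(axis=1)   # 40-item total score

print("corr(10-item subscore, true multiplication skill):",
      round(float(np.corrcoef(subscore, multiplication)[0, 1]), 2))
print("corr(40-item total,    true multiplication skill):",
      round(float(np.corrcoef(total, multiplication)[0, 1]), 2))
```

Under these assumptions the 40-item total typically correlates more strongly with the multiplication skill than the short subscore does, which is the pattern described above; how large the gap is depends on how highly the skills correlate and how many items the subscore contains.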
So what does all this mean for schools and districts?
Well, first, unless you want to be able to put next year’s test results in context by having data that show how little students remembered from the previous year, standardized tests aren’t going to be of much help.
Second, teachers and administrators must decide whether material that was not covered last year needs to be covered now. While some authors have argued forcefully for starting students on this year’s curriculum, that aspiration must be tempered in the case of more hierarchical subjects like math. After all, if students cannot generate sequences of equivalent fractions, then they are unlikely to be able to master addition of fractions with any understanding.
Third, rather than relying on commercially produced tests, teachers would be better advised to use quick surveys of student achievement. These sorts of assessments could take various forms, from using single, well-designed multiple-choice questions to gauge a class’s recall of the prerequisites for the next lesson, to getting students to use finger-voting (one finger for A, two for B, and so on), to using the chat facility when teaching online. This will provide teachers with useful information about where to pitch their instruction (and will also give students retrieval practice!).
Ultimately, it comes down to the advice offered by David Ausubel over 50 years ago: “The most important single factor influencing learning is what the learner already knows. Ascertain this and teach [...] accordingly.”
— Dylan