Six states are seeing precipitous drop-offs in their students’ reading scores on a popular norm-referenced test once the students hit high school, leading officials in some of those states to question whether the test is a valid measure of achievement.
High school students in California, Florida, and four other states are posting reading scores on the Stanford Achievement Test-9th Edition that are dramatically lower, relative to national norms, than those earned by students in earlier grades in the same states.
“The issue doesn’t seem to be the students,” said Scott Hill, the chief deputy superintendent of public instruction in California. “It does seem to be one based on the test.”
California’s 8th graders ranked at the 47th percentile on the Stanford-9 reading test this year, using what is called “Form T” of the battery of exams. The state’s 9th graders scored at only the 36th percentile, and 10th graders came in 3 percentile points lower, at the 33rd percentile.
In Florida, which administers the same form of the Stanford-9, 8th graders scored at the 54th percentile this year. Ninth graders, however, ranked at just the 38th percentile, while 10th graders there, like those in California, scored at the 33rd percentile.
California has seen similar drop-offs each year since it began using the Stanford-9 in 1998. Florida used the test for the first time this year.
The four other states that give the Form T statewide—Alabama, Arizona, Delaware, and South Dakota—are experiencing similar falls in their high school scores in reading. Like California and Florida, those other states are not seeing comparable drop-offs in other subjects.
States that give the other form of the Stanford-9, called “Form S,” don’t see such sharp drops in reading achievement, according to Thomas E. Brooks, the director of applied research for Harcourt Educational Measurement, the publisher of the test.
“There’s something very fishy going on,” asserted Walter M. Haney, a Boston College education professor who specializes in testing. “It suggests that we ought to use considerable caution in interpreting these results.”
Harcourt Educational Measurement has investigated the fall-off, which seems to begin in the 9th grade, and says it hasn’t found anything that suggests a problem with the quality of the test.
“We’ve checked everything we can,” said Mr. Brooks of the San Antonio-based company, a division of publishing conglomerate Harcourt General Inc. “As far as we’ve been able to determine, the procedures have been followed correctly.”
Defining Normal
The phenomenon appears confined to states that use Form T, a version of the test designed to be given under strict conditions to safeguard against cheating.
Two of the states that use Form T, however, don’t question the reading results.
In Delaware, which gives the Stanford-9 in the 3rd, 8th, and 10th grades, high school students fare considerably worse in reading against national norms than those in the earlier grades. But those results are similar to those found over the past 20 years on other tests, including the exams currently given, according to Robin R. Taylor, the state’s acting associate education secretary for assessment and accountability.
Likewise, Alabama officials say their state’s reading drop-off parallels the pattern seen on other standardized tests given in recent years.
If there is an error with Form T, it may stem from the way the test was created and scored, according to testing experts and state officials.
Testing companies are having an increasingly difficult time recruiting schools to participate in the process called norming, Mr. Haney said. In that process, an exam is administered to a national sample of students designed to mirror the country’s demographic makeup. The group’s scores are then used to calculate a national average.
Finding volunteers for the norming process is difficult for all grade levels, but it’s especially hard in high schools, where school leaders are reluctant to take time out of schedules that are packed with academic and extracurricular pursuits.
If the norming participants for the Stanford-9’s reading test included a large proportion of high achievers, that could create a national average that is artificially high, experts say.
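The effect experts describe can be illustrated with a small simulation. The sketch below is hypothetical throughout: the score distribution, the sample sizes, and the share of high achievers among volunteers are invented for illustration and are not drawn from Harcourt's actual norming data. It shows only that a student with a typical score ranks much lower when the norm group is tilted toward high achievers.

```python
import random
from bisect import bisect_left


def percentile_rank(score, norm_sample):
    """Percent of the norm sample scoring strictly below `score`."""
    ordered = sorted(norm_sample)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical national population of raw reading scores (assumed values).
population = [rng.gauss(50, 10) for _ in range(20000)]

# A representative norm group: a simple random sample of the population.
representative = rng.sample(population, 2000)

# A skewed norm group (assumption): 70 percent of volunteers come from
# the top half of the population, the rest from the population at large.
top_half = [s for s in population if s >= 50]
skewed = rng.sample(top_half, 1400) + rng.sample(population, 600)

# A student with a typical score, ranked against each norm group.
typical_score = 50.0
rank_rep = percentile_rank(typical_score, representative)
rank_skew = percentile_rank(typical_score, skewed)

print(f"Rank against representative norms: {rank_rep:.0f}th percentile")
print(f"Rank against skewed norms: {rank_skew:.0f}th percentile")
```

Under these assumed numbers, the same score lands near the middle of the representative group but far lower against the skewed one, which is the kind of artificial drop the experts say a flawed norming sample could produce.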
But Mr. Brooks of Harcourt said the test-maker had uncovered no problems with the manner in which the test was normed.
Earlier this year, Florida delayed the release of Stanford-9 scores for 2000 because they differed so greatly from the results on the Florida Comprehensive Assessment Test, the state’s own exam. The Stanford-9 scores suggested that 10th graders’ reading skills were poorer than those of the state’s 8th graders. But the FCAT results indicated the reverse.
After Florida’s inspector general studied the results and reached no conclusion about their validity, the state released the Stanford-9 scores this past summer with a disclaimer saying that they should be interpreted “cautiously.”
South Dakota officials also question whether their state’s 11th graders read at just slightly above the national average, as this year’s Stanford-9 results show, when those same students rank at the 71st percentile on the social science exam, which also requires reading skills, said Gary Skoglund, the state’s director of assessment.
“There shouldn’t be that big of a difference,” Mr. Skoglund said. The discrepancy, he added, “is consistent year after year after year.”
California officials are asking Harcourt to hire an independent evaluator to review the test for any errors that may be causing the apparent drop in high school reading achievement.
The company has suggested that the scores may fall because the number of 9th graders in California generally exceeds that of 8th graders, due to a higher 9th grade retention rate. Mr. Brooks said that pattern tends to increase the ranks of low-achieving students in the 9th grade testing pool.
But Mr. Hill, the deputy California superintendent, said state officials are not convinced. “It’s not a very satisfactory answer,” he said.