For the second year in a row, a controversial $14.4 million federal study testing the effectiveness of reading and math software programs has found few significant learning differences between students who used the technology and those taught using other methods.
Of the 10 commercial software programs tested at various grade levels, only one—LeapTrack, a supplemental-reading program for 4th graders that is published by LeapFrog Schoolhouse, of Emeryville, Calif.—produced significant improvements in students’ test scores across both years of the study.
Although not large, the test-score boost that the program provides is considered enough to move a typical student from the 50th percentile to the 54th percentile on a national standardized reading test, according to the report.
The two Algebra 1 products tested—Carnegie Learning’s Cognitive Tutor Algebra 1 and Houghton Mifflin Harcourt’s Larson Learning Algebra 1—led to similar-size test-score gains, but only among students taught by a subset of teachers who had used the same products for two years in a row.
Publishers, researchers, and federal officials called the findings disappointing, but also raised cautions about relying too heavily on the results to compare effectiveness among products and choose which ones to buy.
“If you already have the hardware in the classroom and you want one of these products, this would not dissuade you,” said Mark Dynarski, the lead researcher on the project for Mathematica Policy Research Inc., the Princeton, N.J.-based company that conducted the study.
“If you’re quite skeptical of the software and very budget-pinched, I think you would feel this is evidence in favor of your position,” he added. “And if you’re really right in the middle, I think it comes down to how much you want to move test scores, because you’re really not going to see that happen with these products.”
Study Draws Criticism
Despite a quiet release in January, the study met with criticism from independent researchers and software publishers.
“There’s nothing really here that superintendents or state policymakers or corporations could use that would be a strong basis for decisionmaking,” said Christopher J. Dede, a professor of learning technologies, innovation, and education at the Harvard Graduate School of Education and a critic of the study. “I feel the methods used were more flawed in the second year than the first.”
Ten computer-based reading and math products were evaluated in the 2005-06 school year as part of a major federal research project.
Grade 1 Early Reading
• Destination Reading, Riverdeep Inc.
• Headsprout, Headsprout Inc.
• The Waterford Early Reading Program, Waterford Institute Inc.
• Plato Focus, Plato Learning Inc.
Grade 4 Reading Comprehension
• Academy of Reading, AutoSkill International Inc.
• LeapTrack, LeapFrog Schoolhouse
Grade 6 Prealgebra
• Plato Achieve Now, Plato Learning Inc.
• Larson Prealgebra, Houghton Mifflin Harcourt
Grade 9 Algebra
• Cognitive Tutor, Carnegie Learning Inc.
• Larson Algebra, Houghton Mifflin Harcourt
Source: Mathematica Policy Research Inc.
Note: Some of the developers and companies have since sold their product lines or been involved in corporate acquisitions.
The findings don’t mean that a product that appears ineffective in one school or district won’t work better in another, the report concludes, nor should educators and policymakers use the results to make head-to-head comparisons between products. In some cases, Mr. Dynarski said, too few schools were using the individual products studied to make those kinds of comparisons.
Involving roughly 13,000 students, the study was ordered by Congress in the No Child Left Behind Act. The report on the first round of findings, which looked at 16 products, came out in 2007. (“Major Study on Software Stirs Debate,” April 11, 2007.)
The new report, the last one for the project, evaluates 10 commercial software programs that are widely used in the 1st, 4th, and 6th grades, as well as in Algebra 1 classes, which can be taught at several grade levels.
Unlike its predecessor, the final report gives product-by-product results for all 10 programs studied. Over the 2005-06 school year, researchers tested the programs in 77 schools spread across 23 districts around the country; most of the districts served high numbers of low-income students. In each school, and for each product used in those schools, researchers included at least one control classroom and one experimental classroom.
“The control classrooms are generally using only products for Internet browsing or practicing on state assessments,” Mr. Dynarski said. “They weren’t using the other software products.”
A subset of 115 teachers stuck with the same products for a second year, allowing researchers to see whether the programs became more effective as teachers grew more familiar with them. The additional experience seemed to matter only for the Algebra 1 software, though; for the other programs, students fared about the same in both study years.
The study also found that the average amount of time that students spent using the programs fluctuated from year to year. Yet the researchers could find no correlations between programs’ effectiveness and the amount of time that students spent using them.
Questions on Method
Some experts said the study may raise more questions about the usefulness of experimental research designs in education than about the findings themselves. The software study was among the first to reflect the then-new Institute of Education Sciences’ early emphasis on large-scale randomized studies.
“These studies are intended to wash out all the variation in school environments, teacher quality, resources—all the things that we, in fact, know make a difference when it comes to student learning,” said Margaret A. Honey, a technology expert who is the president of the New York Hall of Science.
Mr. Dynarski said such concerns stem from the belief that the study had failed to pick up actual learning gains. “I’m not sure that the right answer isn’t zero,” he said.