ChatGPT feels like it’s everything, everywhere, all at once (repurposing a great movie title but inserting punctuation). How generative artificial intelligence—AI that creates new text or images (as you can see in ChatGPT, Bing, or DALL-E)—shakes out is unclear: Will we create an artificial superintelligence that displaces humans? Or will we harness its power to improve learning processes and outcomes?
Nobody can predict that future with certainty, but one thing we do know is that generative AI requires large quantities of high-quality, relevant data to be of any value. In the education sciences, we also know that such large-scale, high-quality data are neither everywhere nor all at once. However, the National Assessment of Educational Progress, often known as the Nation’s Report Card, provides carefully collected, valid, and reliable data with rich contextual information about learners while protecting student privacy. In short, NAEP can begin to fulfill the data needs of modern education research. And the National Assessment Governing Board—which sets policy for NAEP and meets this week—should prioritize the release of these data.
As is so often the case, the science is moving faster than the speed of government, but this is one area where we have everything we need to catch up. Given the potential these taxpayer-funded data have to improve support for educators and outcomes for students, there is a clear obligation to make the information available to researchers. As advocates for high-quality, high-impact research, we urge that step.
Since 1969, NAEP has measured student achievement in mathematics, reading, science, writing, arts, history, and civics. NAEP uses a mix of conventional forced-choice items; student essays; short, open-ended responses; and simulations. NAEP also collects “process data” about how students interact with items using the digital-based assessment platform. Further, NAEP collects detailed demographic and self-reported information, which includes the basics (for example, race/ethnicity, gender) and deeper information (for example, English-language-learner status, IEP status, disability accommodations). NAEP’s data mine holds hundreds of thousands of examples of student work coupled with detailed contextual information about students, their school, and their community. We need to use those data to improve AI algorithms that can in turn improve student outcomes.
Automated scoring is among the most widely researched and deployed uses of AI in education. But replicating human scoring is the floor, not the ceiling. Researchers could use NAEP data to explore complex constructs that have more far-reaching implications than scoring—such as categorizing math misconceptions, identifying ways to improve student writing, or understanding the key themes present in student writings about civic engagement.
With NAEP’s large samples and detailed contextual variables about the test-takers, their schools, and their families, we can also learn about the impact of many factors on student achievement.
Protecting student privacy is, of course, essential, but it is not a reason to delay the release of the data, as some argue. Many safeguards are already in place. Because NAEP reports results at the group level, protecting privacy is easier than it is for individual assessments: every result is a summary across many individuals. Further, NAEP’s long history and its procedures minimize risk. For example, the information that could identify a particular test-taker is removed even before the data leave the school. There are known solutions to ensure that individual student identities will not be revealed when only a small number of students fall into a given subgroup. Open-ended responses are a bit trickier; NAEP doesn’t control what students put into these fields, and sometimes they write a bit off-topic, revealing personal data that need to be scrubbed (perhaps noting that “My uncle, Frank Johnson, who lives in Auburn, was once busted for DUI”).
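One of the known solutions for small subgroups is to withhold any group-level result that rests on too few students. The sketch below illustrates that idea only; it is not NAEP's actual procedure. The function name, the record fields, and the minimum group size of 10 are all assumptions made for illustration.

```python
from collections import defaultdict


def subgroup_means(records, group_key, score_key, min_size=10):
    """Average scores per subgroup, suppressing groups that are too small.

    `min_size` is a hypothetical threshold chosen for illustration,
    not a documented NAEP reporting rule.
    """
    # Collect every score under its subgroup label.
    scores_by_group = defaultdict(list)
    for record in records:
        scores_by_group[record[group_key]].append(record[score_key])

    report = {}
    for group, scores in scores_by_group.items():
        if len(scores) < min_size:
            # Too few students: publish nothing rather than a value
            # that could point back to identifiable individuals.
            report[group] = None
        else:
            report[group] = sum(scores) / len(scores)
    return report


# Made-up records: 12 students in subgroup "A", 3 in subgroup "B".
records = [{"group": "A", "score": 250}] * 12 + [{"group": "B", "score": 300}] * 3
report = subgroup_means(records, "group", "score")
```

In this made-up example, subgroup "A" is reported as an average while subgroup "B", with only three students, is withheld.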
The Institute of Education Sciences, where we work, is scrupulously addressing privacy concerns in NAEP data. Our recently announced competition (with $100,000 in prizes) asks researchers to solve the difficult problem of using AI to replicate human-assigned scores for open-ended math items. Before NAEP math-assessment data were released to participants, the information was scrubbed for personally identifiable information and sensitive language using automated and human-based reviews. The reviews ensured that neither student identities nor other types of sensitive information such as a social media handle were disclosed. The dataset is being further processed through our internal controls to ensure it is sufficiently safe to release.
Decisions about data privacy should weigh the relative risk against the reward. The value of tapping NAEP’s data gold mine is high, and, given its history and design, the risk to student privacy is low. In short, privacy concerns should not inhibit the release of NAEP data to qualified researchers.
Research using NAEP data could improve NAEP itself but, more importantly, answer questions about how students learn. For NAEP as an assessment, modern research methods could be used to help review and revise the questions, identifying items that specific groups of students find difficult due to wording or issues not related to the underlying construct. This would move beyond standard psychometric analyses through the incorporation of rich contextual data.
NAEP data could have much broader applicability, especially in the context of large language models, the underlying approach used by generative AI. Most existing large language models are based on data scraped from all over the web. While OpenAI, the company that created ChatGPT, does not disclose the specific data sources used for model training, ChatGPT is reportedly trained using information from web texts, books, news articles, social media posts, code snippets, and more. There are more than a few examples of ChatGPT providing questionable or toxic responses depending on the prompt it is given. An equally serious (and related) problem is that large language models do not have access to enough student academic work, leaving them severely anemic just where we need them most. NAEP data could help with fine-tuning these models, making them more accurate and more useful.
We are only beginning to see how the future of education research will be transformed by generative AI—but one thing is crystal clear: NAEP data must be part of that future. Opening up NAEP’s gold mine of data is an easy call. Doing so will allow us to tap into the creativity of the research community to explore what insights we can derive from NAEP data that will be useful to education stakeholders.
NAEP is approaching a $200-million-a-year operation. While it produces invaluable insights into student achievement, it has not yet delivered on its full promise.