• Mon. Jun 15th, 2026

Are We Facing a New Dark Age for Public Health Data?

About a decade ago, I (Frank W. Putnam) read an editorial by the president of the Chinese Academy of Sciences lamenting the high rates of fraudulent research published in Chinese journals. Could Chinese science meaningfully advance over the long run given that an estimated one-third of papers were thought to include fabricated or falsified data?

Smugly, I thought that this was a uniquely Chinese problem related to a politicized academic system that excessively emphasized the number and prestige of publications, even paying scientists and their institutions large cash bonuses for papers in esteemed journals. While lesser incentives also existed in the U.S., I believed that our peer review system limited their influence.

At the time, one of my duties was serving as the chief research integrity officer (RIO) for a large children’s medical center. Trained by the NIH Office of Research Integrity, I was involved in all of the medical center’s alleged research misconduct investigations for over a decade. Although some of the scientific fraud cases I investigated were eye-opening to be sure, I never once felt as if the integrity of U.S. science as a whole was in jeopardy.

Now, I believe that the credibility of our entire scientific order is at risk from twin attacks on the integrity and the availability of data. The first danger is the easy fabrication of fraudulent data by artificial intelligence (AI). The second peril is the systematic removal of essential, long-running, widely trusted datasets from public access.

AI and the Fabrication of Scientific Data

Multiple studies show that AI can generate fraudulent datasets that pass close forensic scrutiny (Hua, 2025). Not only can AI fabricate statistically convincing raw data, but it can also generate sophisticated fraudulent visualizations that are extremely difficult to detect (Kim et al., 2024). Primed by remarkably simple queries, AI can produce compelling “scientific” papers featuring plausible data and convincing figures. And, as true as ever, “seeing is believing.”

The magnitude of such AI-generated fraudulent science is exceedingly difficult to quantify, given the many ways in which it may manifest and the many areas of science that may be involved. Authorities largely agree, however, that bogus scientific papers are becoming far more difficult for both experts and AI screening tools to detect. Scientific fraud has always been a problem, but given the free availability of AI tools and mounting pressure to “publish or perish,” it is likely increasing, if only because AI provides ease and opportunity while faked science grows harder to detect.

At best, these fraudulent studies dilute the facts, blur our efforts to delineate underlying principles and processes, and misdirect researchers, clinicians, and funders down false pathways, wasting their time and resources. At worst, they will cost human lives and increase needless suffering.

The Loss of Irreplaceable Public Health Datasets

There is no science without scientific data. The better the data, the better the science, the deeper the knowledge gleaned. In one way or another everyone is touched by and benefits from the existing, rich array of taxpayer-funded scientific, medical, and public health data collected by local, state, and federal government agencies.

For many years, taxpayer-funded datasets were primarily available only to the scientists and institutions that collected them. In the past decade or two, the open science movement, which promotes the free exchange of scientific data and information, has resulted in dramatic increases in the productivity and reproducibility of scientific and medical research as a whole. Publicly available public health datasets have dramatically increased the access of independent investigators, especially more junior scientists and students, to high-quality, high-value data far beyond what is typically available through mentors and institutions. Public datasets also provide an objective check on the misrepresentation of findings derived from them.

The National Institute of Mental Health Data Archive

The National Institute of Mental Health Data Archive (NDA) is a sophisticated informatics system that maintains high-quality, deidentified human subjects data related to mental health (Flores et al., 2025). The datasets, which share common data elements such as demographics, DSM diagnoses, psychiatric and medical symptoms, and behavior inventories, range from prospective imaging studies of child brain development to postmortem collections of genotyped brains.

Recently the NDA website pages were stamped across the top with the ominous warning: “This repository is under review for potential modification in compliance with Administration directives.”

Among mental health researchers, there are widespread concerns that the NDA could meet the same fate as the CDC’s Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS, one of the most widely used national health surveys, was taken offline (and later reposted without necessary questionnaires and code books). For 40 years the BRFSS has informed policy makers, the media, and the public about “. . . health risk behaviors, preventive health practices, and health care access primarily related to chronic disease and injury” (Cox et al., 2025). Another serious recent loss is the CDC’s Youth Risk Behavior Survey (YRBS), which has tracked high school students’ health and social outcomes like drug, alcohol, and tobacco use, diet, and exercise since 1991. (The YRBS was also later reposted, but again without the essential code books and questionnaires necessary to interpret the data.)

Synergistic Impacts of Fraudulent Data and the Loss of Trusted Data

Together, an AI-driven deluge of undetectable fraudulent data and the disappearance of trusted, public domain datasets pose an enormous threat to the quality and credibility of all of the scientific disciplines that undergird mental health care and policy. Not only will it become ever more difficult to establish “the facts” scientifically and to test the theories that depend on those facts, but it will also become increasingly difficult to set research agendas, make informed policy decisions, or trust clinical decisions. Science as a method for building, validating, and organizing knowledge will become paralyzed, unable to sort the wheat from the chaff.

Is this the beginning of a new dark age?

By admin
