AI Deepfake X-Rays Fool Doctors and Machines Alike

A New Threat in Medical Imaging

Artificial intelligence can now generate fake X-ray images so convincing that even trained radiologists cannot reliably detect them. Moreover, the AI tools designed to analyse medical data face the same problem. A landmark study published in Radiology — the journal of the Radiological Society of North America — confirms that deepfake medical imaging has moved from a theoretical concern to a measurable clinical reality.

The study draws on input from 17 radiologists across 12 medical centres in six countries. Together, their findings expose a significant and growing vulnerability at the heart of diagnostic medicine. As AI tools become more accessible, the barrier to fabricating convincing medical images continues to fall.


How the Study Was Conducted

Researchers evaluated two distinct datasets to test both radiologist and AI performance. The first dataset included 77 real radiographs and 77 synthetic images generated by ChatGPT-4o, covering chest, spine, and extremity X-rays. The second dataset focused exclusively on chest X-rays — split evenly between authentic clinical scans and images produced by RoentGen, an open-source generative AI diffusion model developed by Stanford Medicine researchers.

Who Participated

The 17 radiologists came from the United States, France, Germany, Turkey, the United Kingdom, and the United Arab Emirates. Their professional experience ranged from zero to 40 years. Participants included trainee residents, early-career staff, musculoskeletal specialists, thoracic imagers, and general radiologists — a broad cross-section of the field. Crucially, the two image sets had no overlap, ensuring each participant faced fresh images in each evaluation.


What Radiologists Got Right — and Wrong

The results are striking. When radiologists reviewed ChatGPT-generated images without knowing the study’s true purpose, only 41% spontaneously flagged the AI-generated scans as suspicious. The majority simply did not notice. After researchers informed participants that synthetic images were present, average detection accuracy rose to 75%. However, individual performance varied widely — some radiologists correctly identified as few as 58% of fake images, while the best performers reached 92%.

Experience Does Not Help

One of the most significant findings was that years of professional experience offered no advantage: a radiologist with four decades of practice performed no better than a recent trainee at spotting deepfakes. This undermines the assumption that clinical expertise alone can serve as a reliable safeguard against synthetic image fraud. Musculoskeletal radiologists did, however, outperform other subspecialties, suggesting that familiarity with fine anatomical detail may matter more than general experience.


AI Models Struggle Just as Much

The research team also tested four leading multimodal large language models: GPT-4o and GPT-5 from OpenAI, Gemini 2.5 Pro from Google, and Llama 4 Maverick from Meta. Their accuracy in distinguishing real from ChatGPT-generated X-rays ranged from 57% to 85% — a range that closely mirrors human radiologist performance.

Even the Creator Could Not Catch All Its Own Fakes

Notably, ChatGPT-4o — the model used to generate the deepfake images — could not detect all of them. It did outperform the other models by a considerable margin, but its failure to achieve full accuracy reveals how sophisticated these synthetic images have become. For RoentGen-generated chest X-rays specifically, AI model accuracy ranged from 52% to 89%, with the lower end barely surpassing random chance.


What Makes Deepfake X-Rays Hard to Spot

Lead study author Dr. Mickael Tordjman of the Icahn School of Medicine at Mount Sinai described a paradox at the heart of deepfake detection: synthetic images often appear too perfect. Bones look unnaturally smooth, spines appear straighter than any real patient’s, lungs show excessive symmetry, blood vessel patterns look excessively uniform, and fractures present with unusual cleanliness — often limited to a single side of the bone.

This artificial perfection is both the deepfake’s most obvious flaw and, paradoxically, one of the hardest things for human eyes to register. Clinicians are trained to look for pathology, not for the subtle over-tidiness that AI generation produces. Consequently, the very quality that makes these images suspicious is also what makes them easy to overlook in a busy diagnostic workflow.
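
As a toy illustration, not drawn from the study, the "excessive symmetry" cue could in principle be turned into an automated screening heuristic. The sketch below (in Python with NumPy, both assumptions on our part) scores left-right mirror symmetry in a greyscale image; a score near zero would indicate an unusually perfect, and therefore suspicious, scan.

```python
# Toy heuristic for one "too perfect" cue: left-right symmetry.
# Real chest X-rays are slightly asymmetric; a mirrored-difference
# score near zero would be a weak red flag worth escalating.
import numpy as np

def symmetry_score(img: np.ndarray) -> float:
    """Mean absolute difference between an image and its mirror.

    Lower values mean more left-right symmetry; 0.0 is a perfect
    mirror. Normalised by the image's own intensity range.
    """
    mirrored = img[:, ::-1]
    span = img.max() - img.min() or 1.0
    return float(np.abs(img - mirrored).mean() / span)

rng = np.random.default_rng(0)
natural = rng.normal(0.5, 0.1, (256, 256))   # noisy, asymmetric stand-in
half = rng.normal(0.5, 0.1, (256, 128))
too_perfect = np.concatenate([half, half[:, ::-1]], axis=1)  # exact mirror

print(f"natural scan:    {symmetry_score(natural):.3f}")
print(f"mirrored 'fake': {symmetry_score(too_perfect):.3f}")  # 0.000
```

In practice a single statistic like this would never be decisive; the point is only that the over-tidiness radiologists struggle to see consciously is, at least in principle, measurable.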


The Real-World Risks of Synthetic Medical Images

The implications of this research extend well beyond academic interest. Deepfake X-rays pose concrete risks across multiple areas of healthcare and law.

Fraudulent Litigation

A fabricated fracture that radiologists cannot distinguish from a real one creates a direct pathway for fraudulent personal injury or medical malpractice claims. As the study notes, this is a high-stakes legal vulnerability that existing courtroom evidence standards are not equipped to address.

Contaminated AI Training Data

Synthetic images that infiltrate medical databases could corrupt the training datasets used to build future diagnostic AI tools. If AI systems learn from fake images, their diagnostic outputs become less reliable over time. This is a systemic risk that compounds with every new model trained on contaminated data.

Compromised Clinical Diagnoses

Beyond fraud, deepfake X-rays could mislead treating physicians into making incorrect diagnoses or treatment decisions. In emergency or high-volume settings, where rapid image review is standard, the risk of a missed or misidentified synthetic image is especially acute.


How Experts Propose to Fight Back

Researchers are not without solutions. However, implementing them at scale requires coordinated action across healthcare institutions, technology developers, and regulatory bodies.

Digital Watermarking

One of the most promising safeguards involves embedding invisible watermarks directly into medical images at the point of capture. These watermarks would carry ownership or identity data, making it possible to verify image authenticity without relying on visual inspection alone. Additionally, attaching cryptographic signatures linked to individual imaging technologists at the moment of capture would create a verifiable chain of custody for every scan.
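
As a concrete illustration of the signing step, here is a minimal sketch using Ed25519 signatures from the third-party Python cryptography package. Key enrolment, watermark embedding, and DICOM integration are assumptions simplified away; it shows only how a per-technologist signature could bind an image to its moment of capture.

```python
# Minimal sketch: sign an X-ray's raw bytes at acquisition, verify later.
# Any post-capture modification (including AI synthesis) breaks the chain.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)

# Hypothetical per-technologist key pair, generated once and enrolled
# with the hospital's imaging archive.
technologist_key = Ed25519PrivateKey.generate()
public_key = technologist_key.public_key()

def sign_capture(image_bytes: bytes) -> bytes:
    """Sign the raw image bytes at acquisition time."""
    return technologist_key.sign(image_bytes)

def verify_capture(image_bytes: bytes, signature: bytes) -> bool:
    """Check the image is unmodified since capture."""
    try:
        public_key.verify(signature, image_bytes)
        return True
    except InvalidSignature:
        return False

scan = b"...raw pixel data from the detector..."
sig = sign_capture(scan)
assert verify_capture(scan, sig)                    # authentic
assert not verify_capture(scan + b"tamper", sig)    # altered or synthetic
```

A signature of this kind proves provenance rather than authenticity of content, which is why the researchers pair it with watermarking: together they establish that an image came from a real machine and a named operator, and has not been touched since.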

Detection Training for Clinicians

The study itself includes a training component designed to sharpen radiologists’ ability to identify synthetic images. More broadly, deepfake detection must become part of radiology education — both in residency programmes and in continuing medical education for practising specialists.

Automated Detection Systems

Beyond human training, the field needs automated tools built specifically to identify AI-generated medical images. Given that even the models that create deepfakes cannot reliably catch all of their own outputs, the absence of purpose-built detection systems, distinct from general-purpose LLMs, is a critical gap that must be filled.
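
To make the idea concrete, here is a minimal sketch of what one training step for such a detector could look like. PyTorch and the tiny CNN are assumptions on our part; the study does not describe or endorse any particular architecture, and the random tensors stand in for a labelled dataset of real and synthetic radiographs.

```python
# Sketch of a purpose-built real-vs-synthetic classifier: a small CNN
# over single-channel (greyscale) radiographs, trained with a binary
# cross-entropy objective. It would flag images, not diagnose them.
import torch
import torch.nn as nn

class DeepfakeDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # one logit: synthetic vs real

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = DeepfakeDetector()
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in batch: eight 256x256 "radiographs", half labelled synthetic.
images = torch.randn(8, 1, 256, 256)
labels = torch.tensor([0., 0., 0., 0., 1., 1., 1., 1.]).unsqueeze(1)

loss = loss_fn(model(images), labels)
loss.backward()
opt.step()
print(f"training loss: {loss.item():.3f}")
```

The hard part, of course, is not the loop above but the dataset: a detector is only as good as the variety of generators it has seen, which is exactly why contaminated or unlabelled training data is such a systemic concern.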

Dr. Tordjman concluded with a clear warning: “We are potentially only seeing the tip of the iceberg. The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI.” As those technologies mature, the detection challenge will only intensify — making early investment in safeguards more urgent than ever.
