Synthetic data is acceptable for AI medical device development in specific roles: filling gaps, augmenting rare classes, stress-testing on edge cases, and enabling early verification. Synthetic data is not a substitute for clinical evidence under MDR Article 61 and Annex XIV. The generator itself is part of your medical device software pipeline and must be validated. There is no formal EU guidance yet, so expectations follow state of the art and notified body judgement.
By Tibor Zechmeister and Felix Lenhard.
TL;DR
- Synthetic data has legitimate roles: augmentation of under-represented classes, edge-case generation, verification of system behaviour, and privacy-safe development environments.
- Synthetic data does not replace real clinical data for MDR Article 61 clinical evaluation. Clinical evidence means data on the actual device in representative conditions.
- The synthetic data generator is part of your development pipeline and, depending on its influence on the released device, is subject to validation and documentation.
- There is currently no formal MDCG guidance specifically on synthetic data. Notified bodies apply state of the art and risk-based judgement.
- Notified bodies will ask three questions: what is the generator, how do you know the synthetic data is realistic enough for its intended use, and where does real data carry the evidence load?
- Document the synthetic data role explicitly in your technical file. Do not let it leak into clinical evaluation by accident.
Why this matters
Synthetic data is seductive for exactly the reasons you would expect. Real medical data is expensive, slow to access, legally constrained, and often imbalanced. The tumour you care about shows up in 2% of cases. The failure mode you need to cover appears twice in your entire dataset. A generative model offers an answer: synthesise more.
The problem is that the regulator does not care whether your data was easy or hard to get. The regulator cares whether the device is safe and performs as claimed. If you train on synthetic images of a rare lesion and then claim your device detects it reliably in real patients, your notified body will ask how you know. If your answer is "our synthetic generator produces very realistic examples", the conversation is going to get long.
This post separates the legitimate uses of synthetic data from the illegitimate ones, walks through what a notified body is actually looking for, and gives a Subtract to Ship playbook for documenting synthetic data in your technical file.
What MDR actually says
MDR is silent on synthetic data as a specific topic. There is no article that says "synthetic data is allowed" or "synthetic data is forbidden". What MDR does say is much more demanding: clinical evidence must be sufficient to demonstrate conformity with relevant general safety and performance requirements.
MDR Article 61(1) requires the manufacturer to "specify and justify the level of clinical evidence necessary to demonstrate conformity with the relevant general safety and performance requirements". Article 61(3) requires that the clinical evaluation follow a defined and methodologically sound procedure. Annex XIV Part A sets out the clinical evaluation process: planning, identification of available clinical data, appraisal, analysis, and conclusion on benefit-risk.
The key phrase is "clinical data". MDR Article 2(48) defines clinical data as information concerning safety or performance generated from the use of a device. Synthetic data, by definition, is not data from use of a device. Therefore synthetic data does not count as clinical data for Article 61 purposes.
This is not a technicality. It is the central point. Synthetic data can support many activities in your development pipeline, but it cannot fill the clinical evidence box.
For the software lifecycle, Annex I §17.2 requires development in accordance with the state of the art including risk management, verification, and validation. EN 62304:2006+A1:2015 defines the lifecycle processes. EN ISO 14971:2019+A11:2021 governs risk management. These standards do not prohibit synthetic data either. What they require is that you know what you used, why, and that you validated it for its role.
There is no MDCG document specifically on synthetic data as of this writing. Notified bodies rely on the state of the art principle and their own assessment. Expect requirements to harden as the field matures.
A worked example
A Class IIb SaMD startup is building AI software to detect pulmonary embolism on CT pulmonary angiography. Intended purpose: "To provide information to radiologists to support the detection of pulmonary embolism in adult patients undergoing CT pulmonary angiography." Under Rule 11, the device is Class IIb because a missed PE could contribute to a serious deterioration in a patient's state of health.
The team has access to 8,000 real CT studies from two hospital partners. Of these, 600 are positive for PE, heavily skewed toward proximal emboli. Sub-segmental PEs are rare and under-represented. The team considers using a generative model to synthesise additional sub-segmental PE cases.
Legitimate uses of synthetic data in this scenario:
- Augmentation during training. The team uses synthetic sub-segmental PEs as part of the training set to help the model learn to recognise under-represented patterns. This is documented in the training data section of the technical file with a clear audit trail: generator version, number of synthetic cases, how they were injected into training.
- Stress testing. The team generates synthetic edge cases (motion artefacts, low contrast) to probe where the model fails. This feeds risk management and drives mitigations.
- Early system verification. Before real data is available, the team runs unit and integration tests on synthetic inputs to verify that the software pipeline behaves correctly under EN 62304.
- Development environment. Developers work on synthetic data on their laptops without touching patient records, which simplifies data governance.
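The audit trail mentioned above can be as simple as a structured record kept under version control in the technical file. A minimal sketch, assuming a hypothetical schema and generator name (nothing here is a prescribed MDR format):

```python
# Hypothetical audit-trail record for synthetic augmentation of a training
# set. Field names and the generator identity are illustrative only.
import json
from dataclasses import dataclass, asdict

@dataclass
class SyntheticAugmentationRecord:
    generator_name: str      # identity of the generative model
    generator_version: str   # pinned version used for this run
    n_synthetic_cases: int   # how many synthetic studies were produced
    target_class: str        # which under-represented class they cover
    injection: str           # how they entered training (sets, ratios)
    validation_ref: str      # pointer to the generator's validation file

record = SyntheticAugmentationRecord(
    generator_name="pe-lesion-generator",    # hypothetical name
    generator_version="2.3.1",
    n_synthetic_cases=1500,
    target_class="sub-segmental PE",
    injection="training set only, 1:4 synthetic-to-real ratio",
    validation_ref="TF-SW-017 generator validation report",
)

# Serialised next to the training-data section of the technical file.
print(json.dumps(asdict(record), indent=2))
```

The point is not the exact schema but that a reviewer can answer "what synthetic data went into training, and from which generator" without interviewing the team.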
Illegitimate uses in this scenario:
- Counting synthetic cases in clinical validation. The final performance study must use real, representative CT studies in the intended clinical setting. Adding 2,000 synthetic sub-segmental PEs to the validation set inflates apparent performance on a class the device has not actually been proven to detect in real patients.
- Substituting for missing clinical data. If sub-segmental PEs are under-represented in real data, the honest answer is to either narrow the intended purpose to exclude them, acquire more real cases, or accept lower evidence for that sub-group. Synthesising the evidence does not fill the gap.
- Leaving the generator undocumented. The generative model used to produce training data is part of the development pipeline. Its version, training data, known failure modes, and validation status must be recorded. If a notified body cannot tell whether the synthetic PEs look like real PEs, the synthetic cases might be teaching the model the wrong thing.
The team's technical file ends up with a clear synthetic data section: what was used, in what role, how the generator was validated, and an explicit statement that clinical performance claims rest on real data only. The clinical evaluation report references the synthetic data role in training but bases its conclusions on real-data performance measured on a held-out set that mirrors the intended population.
The Subtract to Ship playbook
1. Separate roles explicitly. In your technical file, state which datasets are used for training, verification, internal testing, and clinical validation. For each, state whether it is real, synthetic, or a mix, and in what proportion. Reviewers should not have to reverse-engineer this.
2. Never let synthetic data into clinical validation. The final performance claim must rest on real data in the intended population. If you do not have enough real data, narrow the intended purpose, raise more money, or wait. Do not fake evidence.
3. Validate the generator. If your generator matters, it needs a validation file. What was it trained on? How was it evaluated? What are its known failure modes? A generative model that produces medically implausible lesions is a training data poison, not an augmentation tool. Radiologist review of random synthetic samples, ideally blinded, is a reasonable minimum.
4. Characterise realism and diversity. Run quantitative checks: distribution of image statistics, domain classifier performance, clinical plausibility review. Document the results. "We visually inspected some samples" is not enough.
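One quantitative check worth sketching is the domain classifier: train a classifier to distinguish real from synthetic samples on extracted features. If its held-out ROC AUC sits near 0.5, the two distributions are hard to separate on those features; an AUC well above 0.5 flags a gap to investigate. The sketch below assumes a feature-extraction step that is not shown and uses random placeholder features in its place:

```python
# Domain-classifier realism check (sketch). Real feature extraction from
# CT studies is assumed upstream; random vectors stand in for it here.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(200, 16))   # placeholder features
synth_feats = rng.normal(0.1, 1.0, size=(200, 16))  # slightly shifted

X = np.vstack([real_feats, synth_feats])
y = np.array([0] * 200 + [1] * 200)  # 0 = real, 1 = synthetic

# Cross-validated AUC of a real-vs-synthetic classifier. Near 0.5 means
# the classifier cannot separate the domains on these features.
auc = cross_val_score(
    GradientBoostingClassifier(random_state=0), X, y,
    cv=5, scoring="roc_auc",
).mean()

print(f"domain-classifier ROC AUC: {auc:.2f}")
```

Record the number and the feature set in the technical file; the check is only as meaningful as the features it runs on, so pair it with a clinician plausibility review rather than relying on it alone.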
5. Tie synthetic data to specific risks. In your risk management file under EN ISO 14971, identify the hazards that synthetic data introduces (distribution drift between synthetic and real, over-fitting to synthetic artefacts) and the mitigations. Reviewers will specifically look for this.
6. Keep version control. Generator version, prompt or conditioning setup, random seeds where possible, and output dataset versions must be traceable. If a reviewer asks "which synthetic dataset was used to train the model that you submitted", you should be able to answer in under a minute.
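The "answer in under a minute" standard is easiest to meet with a machine-readable manifest per synthetic dataset. A minimal sketch, with hypothetical file names and configuration values:

```python
# Traceability manifest for a synthetic dataset (sketch). Pins generator
# version, conditioning, seed, and a content hash of the output files so
# a training set can be tied to a submitted model. Names are illustrative.
import hashlib
import json
from pathlib import Path

def dataset_hash(files: list[Path]) -> str:
    """Stable SHA-256 over a sorted list of dataset files."""
    h = hashlib.sha256()
    for f in sorted(files):
        h.update(f.read_bytes())
    return h.hexdigest()

manifest = {
    "generator_version": "2.3.1",   # pinned release of the generator
    "conditioning": "sub-segmental PE, pulmonary arterial phase",
    "random_seed": 42,              # recorded where the stack allows it
    "output_dataset": "synth-pe-v5",
}

# In a real pipeline these would be the generated studies on disk;
# stand-in content is written here so the sketch is self-contained.
files = [Path("case_001.npy"), Path("case_002.npy")]
for i, f in enumerate(files):
    f.write_bytes(bytes([i] * 8))

manifest["dataset_sha256"] = dataset_hash(files)
Path("synth-pe-v5.manifest.json").write_text(json.dumps(manifest, indent=2))
```

With the manifest under version control, "which synthetic dataset trained the submitted model" reduces to reading one JSON file and comparing one hash.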
7. State the limits in your clinical evaluation report. If synthetic data was used for training only, say so. If certain sub-groups are supported by synthetic augmentation in training but validated on limited real data, state that limitation. Honesty in the CER is a credibility signal.
8. Monitor for real-world drift post-market. Under MDR Article 83, post-market surveillance must feed back into the clinical evaluation. For devices trained partly on synthetic data, pay particular attention to whether real-world performance matches the validation claims. This is where sloppy use of synthetic data tends to catch up.
The core discipline: synthetic data is a development tool. It is not evidence. Treat it that way in your documentation and the conversation with your notified body becomes tractable.
Reality Check
- Can you state, for each dataset in your pipeline, whether it is real, synthetic, or mixed?
- Is the synthetic data generator a controlled item with version, training record, and validation file?
- Does your clinical evaluation report rest exclusively on real-data performance in the intended population?
- Have you identified synthetic-data-specific hazards in your risk management file?
- Can you reproduce a specific synthetic training set from recorded generator version and configuration?
- Has a clinician reviewed a random sample of synthetic data for clinical plausibility, with results recorded?
- Do your intended purpose claims match the sub-groups where real data provides the evidence?
- Does your post-market plan include drift monitoring that could surface unrealistic synthetic training artefacts?
Any "no" is a gap to close before submission.
Frequently Asked Questions
Is synthetic data explicitly allowed under MDR? MDR neither allows nor forbids it. What MDR requires is sufficient clinical evidence and a state-of-the-art development process. Synthetic data can support development; it cannot substitute for clinical data under Article 61.
Can I use synthetic data to cover rare classes in my training set? Yes, as augmentation, if you document the generator, validate it, and do not rely on synthetic data for clinical validation of those classes. If real data for a class is truly unavailable, consider narrowing the intended purpose.
My generator is a large foundation model I did not train. Do I still need to validate it? Yes. You are responsible for the outputs you use in your pipeline. Document the model, its version, how it was prompted or configured, and validation evidence for the outputs you relied on.
Can I count synthetic data toward sample size in my clinical evaluation? No. Clinical evaluation under Article 61 requires clinical data as defined in Article 2(48), which is data from use of a device. Synthetic data does not meet that definition.
Is there MDCG guidance on synthetic data? Not specifically as of this writing. Notified bodies apply state of the art and risk-based judgement. Expect formal guidance to develop over time.
How do I explain synthetic data use in my technical file? A dedicated subsection in the software documentation covering role, generator identity and validation, datasets produced, and how synthetic data does and does not contribute to each claim. Cross-reference from risk management and clinical evaluation.
Related reading
- Training data requirements for AI medical devices — expectations for dataset documentation under MDR.
- Data quality and bias in AI medical devices — why representativeness matters more than sheer volume.
- Clinical evaluation for AI/ML medical devices — how clinical evidence works for software devices.
- Rule 11 classification for AI/ML software — how intended purpose drives class.
- Sufficient clinical evidence: an auditor's perspective — what "sufficient" actually means in practice.
Sources
- Regulation (EU) 2017/745 on medical devices, consolidated text. Article 2(48), Article 61, Annex I §17.2, Annex XIV Part A.
- EN ISO 14971:2019+A11:2021 — Medical devices — Application of risk management to medical devices.
- EN 62304:2006+A1:2015 — Medical device software — Software lifecycle processes.
- MDCG 2020-5 (April 2020) — Clinical evaluation equivalence.
- MDCG 2019-11 Rev.1 (June 2025) — Qualification and classification of software.