AI-powered imaging analysis for radiology is medical device software under MDR whenever its intended purpose matches Article 2(1). Classification sits under Annex VIII Rule 11, and most radiology AI products land at Class IIa or IIb depending on the clinical severity of the decisions the output supports. MDCG 2019-11 Rev.1 (June 2025) provides the authoritative software qualification and classification guidance. The Notified Body expects a full EN 62304:2006+A1:2015 software lifecycle, an EN ISO 14971:2019+A11:2021 risk file tailored to AI-specific failure modes, a representative independent test set with subgroup performance, a reader study that matches the deployed workflow, and a PMCF plan with active drift monitoring. None of this is optional and none of it can be faked.
By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.
TL;DR
- AI radiology software is the single most common Software as a Medical Device category in the EU in 2026, and also the one where the Notified Body has the clearest expectations.
- Qualification is decided by intended purpose under MDR Article 2(1). If the software outputs information used for diagnosis, monitoring, prediction, or treatment, it is a medical device, regardless of whether a radiologist reviews every output.
- Classification runs through Annex VIII Rule 11. Routine imaging review support typically lands at Class IIa; severe-condition contexts (stroke, cancer, intracranial haemorrhage, pulmonary embolism) push it to Class IIb.
- Evidence expectations include sensitivity and specificity at the shipped operating point on a representative independent test set, subgroup performance across scanner vendors and demographics, and a reader study that matches the actual clinical workflow.
- Bias in training data is the single most common reason radiology AI products fail in the field and the single most scrutinised topic in Notified Body reviews from 2024 onward.
- PMCF is not paperwork. For radiology AI, the PMCF plan has to detect input-distribution drift and performance degradation before patients are harmed, not after.
The radiology AI landscape in 2026
Radiology is where AI in medicine reached commercial traction first, and it is still where the largest number of CE-marked AI medical devices sit. Chest X-ray triage, head CT haemorrhage detection, pulmonary embolism flagging on chest CT, breast density assessment on mammography, bone-age estimation on paediatric wrist X-rays, lung nodule detection and tracking, prostate MRI lesion segmentation. The category is deep and the clinical literature is thick.
For a founder building radiology AI in 2026, that maturity is good news and bad news at the same time. The good news is that the regulatory expectations are relatively well understood. A Notified Body reviewing a radiology AI file is not working out the framework for the first time. The bad news is that the bar has been set by the products already on the market, and a new entrant is compared against that bar on sensitivity, specificity, reader study design, subgroup performance, and post-market behaviour. "We are an early-stage startup" does not move the bar.
This post is the practical walk-through of what the MDR and its guidance actually require for radiology AI, where most startups trip, and how to organise the work so a Notified Body review is a defensible conversation rather than an expensive rewrite.
Qualification under Article 2(1)
MDR Article 2(1) defines a medical device by intended purpose. Software qualifies as a medical device when the manufacturer intends it to be used for a medical purpose listed in Article 2(1): diagnosis, prevention, monitoring, prediction, prognosis, treatment, or alleviation of disease, among other categories.
Radiology AI almost always qualifies. The product exists because it analyses clinical images to extract information a radiologist will use in a diagnostic decision. The fact that a radiologist reviews the output does not remove the device from the regulatory scope. MDCG 2019-11 Rev.1 (June 2025) is clear that software providing information used to take decisions for diagnostic or therapeutic purposes is medical device software, regardless of the human review step downstream.
The rare cases where a radiology AI product might sit outside the medical device scope are narrow. A research-only tool with no clinical claim, a purely administrative workflow tool that routes images without interpreting them, an image compression utility with no diagnostic output. These can sit outside MDR if the intended purpose genuinely matches. Most commercial radiology AI products cannot credibly describe themselves that way because the marketing, the clinical literature, and the reimbursement story all point back to clinical use, and MDCG 2019-11 treats promotional materials as evidence of intended purpose.
The practical move for a founder is to write the intended purpose down, read it carefully, and ask whether every sentence holds up when a Notified Body reviewer reads the website alongside it. If those two sources diverge, the Notified Body will go with whichever places the device in the higher-stakes category. For the broader foundations, see our posts on what Software as a Medical Device means under MDR and on image-based SaMD qualification.
Rule 11 classification
Once a radiology AI product qualifies as a medical device, Annex VIII Rule 11 drives the class.
Rule 11 classifies software intended to provide information used to take decisions with diagnosis or therapeutic purposes. The default is Class IIa. The class moves up to IIb when those decisions can cause serious deterioration of health or a surgical intervention. The class moves up to III when those decisions can cause death or irreversible deterioration of health. Software intended to monitor physiological processes sits at IIa or IIb depending on criticality. The residual "other" category is Class I.
For radiology AI, the class follows the clinical severity of what the output feeds into. A product that supports routine diagnostic read-outs (body composition measurement from CT, bone-age estimation from a wrist X-ray, non-urgent finding highlighting) typically lands at Class IIa. A product that supports decisions where a missed finding changes acute management (large vessel occlusion detection in suspected stroke, pulmonary embolism flagging, intracranial haemorrhage detection, cancer screening and tracking) lands at Class IIb. Class III for radiology AI is rare, because a radiologist is almost always in the loop, but it is not impossible in fully autonomous triage designs where the AI output drives immediate intervention with no human backstop.
MDCG 2019-11 Rev.1 is the authoritative interpretation and worked-example document. Read it alongside the Regulation, not instead of it. Notified Body involvement is required from Class IIa upward under MDR Article 52, and no realistic radiology AI product self-certifies as Class I. Our deeper walk-throughs on MDR classification Rule 11 for software and on radiology AI workflows under MDR cover the boundary cases.
Clinical performance evidence
The Notified Body expects clinical performance evidence layered in a specific way for radiology AI.
Analytical performance on a representative independent test set. The test set has to be genuinely independent of the training data. Different institutions, different scanners where possible, different acquisition protocols. It has to match the intended use population on age, sex, disease prevalence, and clinically relevant subgroups. It has to be large enough to produce sensitivity and specificity estimates with confidence intervals that mean something. A test set drawn from the same hospital and the same scanner fleet as the training data is not an independent test set, and a Notified Body reviewer will say so.
Pre-specified operating point. The product ships at a specific threshold, and the clinical evidence has to be reported at that threshold. ROC curves that show "the model could operate anywhere" do not answer the question of how the shipped device performs. The technical file has to lock the operating point and report sensitivity, specificity, positive predictive value, and negative predictive value at that point, with confidence intervals.
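As an illustrative sketch of what "report at the locked operating point" means in practice, the fragment below computes sensitivity, specificity, PPV, and NPV at a single fixed threshold, with Wilson score intervals (one common choice for proportion confidence intervals; the function names here are hypothetical, not from any specific toolkit):

```python
import math

def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion."""
    if total == 0:
        return (0.0, 0.0)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return (centre - half, centre + half)

def operating_point_metrics(scores, labels, threshold):
    """Headline metrics at one locked threshold, each with a Wilson CI."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
    }
```

The point of the sketch is that the threshold is an input fixed in advance, not something tuned on the test set; the same locked value has to appear in the technical file and the shipped product.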
Subgroup performance. Performance broken down by age, sex, scanner vendor, acquisition protocol, disease severity, and any other subgroup where clinical importance exists. An aggregate 95% sensitivity that hides a 70% sensitivity on small lesions is not a 95% sensitive product for a radiologist looking for early disease. The technical file has to surface this, and the labelling has to account for it.
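A subgroup breakdown is mechanically simple; the hard part is choosing the subgroups and reporting them honestly. A minimal sketch, assuming a hypothetical record format of (score, label, subgroup) tuples where subgroup keys would in practice come from DICOM metadata (scanner vendor, age band, lesion size category):

```python
from collections import defaultdict

def subgroup_sensitivity(records, threshold):
    """Per-subgroup sensitivity at a fixed threshold.

    `records` is an iterable of (score, label, subgroup) tuples; in a real
    pipeline the subgroup key would be derived from study metadata.
    """
    hits = defaultdict(int)       # true positives per subgroup
    positives = defaultdict(int)  # disease-positive cases per subgroup
    for score, label, group in records:
        if label == 1:
            positives[group] += 1
            if score >= threshold:
                hits[group] += 1
    return {g: hits[g] / n for g, n in positives.items() if n > 0}
```

The same loop applies to specificity over the negatives; the labelling then has to reflect the weakest clinically relevant cell, not the aggregate.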
Reader study matching the deployed workflow. For products used by a radiologist reviewing the output, the expected evidence is a reader study showing that the radiologist-plus-AI workflow performs at least as well as, and preferably better than, the unaided radiologist baseline. The design has to match deployment. A sequential-read study does not justify a concurrent-read deployment, and a single-reader study does not justify a double-read workflow.
Clinical evaluation under MDR Article 61 and Annex XIV. The layers above have to be structured into a clinical evaluation that argues conformity with the relevant GSPRs in Annex I. Literature, equivalence, and own investigation are the three sources. For radiology AI, own investigation is almost always the centre of the evidence because the specific model is new and equivalence to a different AI product is difficult to demonstrate in a way a Notified Body will accept.
Bias in imaging datasets
Bias is where radiology AI products fail most often in the field, and it is where Notified Body scrutiny has sharpened most visibly from 2024 onward.
The mechanism is simple. A model learns from the training data distribution. If the training data under-represents a patient subgroup, an imaging protocol, a scanner vendor, or a disease presentation, the model's behaviour on that subgroup in the field is not predictable from the aggregate validation numbers. A chest X-ray model trained predominantly on one vendor's scanner can degrade measurably on another vendor's images. A skin lesion model trained predominantly on lighter skin tones can miss findings on darker skin tones. A paediatric model validated on older children can fail on infants. These are not edge cases. They are the normal failure mode for supervised learning on real clinical data.
The MDR obligation to manage this risk already exists. Annex I Section 17 requires software to be developed in accordance with the state of the art, taking into account risk management and the software development lifecycle. EN ISO 14971:2019+A11:2021 is the harmonised risk management standard, and it expects the manufacturer to identify hazards, estimate and evaluate risks, and control risks. For a radiology AI product, the hazards explicitly include dataset bias, distribution shift between training and deployment populations, adversarial robustness gaps, and the complacency failure mode where clinicians stop scrutinising AI output after repeated correct results.
The practical file content a Notified Body expects on this topic includes: a documented training dataset composition (size, sources, demographic breakdown, scanner and protocol distribution); a representativeness analysis comparing the training distribution to the intended use population; an independent test set drawn from different institutions where possible; subgroup performance analysis with honest reporting of where the model is weaker; and labelling or indications that exclude subgroups the evidence does not support. A product that markets broadly and validates narrowly is not certifiable. A product that markets narrowly to match its validation is.
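The representativeness analysis mentioned above can be made quantitative. One simple and defensible approach (sketched below with hypothetical function names; a chi-square goodness-of-fit test is one reasonable choice, not a regulatory mandate) compares the training set's categorical composition, such as scanner vendor mix, against the intended-use population:

```python
def chi_square_stat(observed_counts: dict, expected_props: dict) -> float:
    """Chi-square goodness-of-fit statistic: observed category counts
    (e.g. scanner vendors in the training set) vs expected proportions
    (e.g. vendor mix in the intended use population)."""
    total = sum(observed_counts.values())
    stat = 0.0
    for category, prop in expected_props.items():
        expected = total * prop
        observed = observed_counts.get(category, 0)
        stat += (observed - expected) ** 2 / expected
    return stat

# 95th-percentile chi-square critical values by degrees of freedom.
CHI2_CRIT_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def representativeness_flag(observed_counts, expected_props):
    """True when the training mix deviates significantly (p < 0.05)."""
    df = len(expected_props) - 1
    return chi_square_stat(observed_counts, expected_props) > CHI2_CRIT_95[df]
```

A flagged deviation does not automatically sink the file; it obliges the manufacturer to either rebalance the data, narrow the indication, or argue in the risk file why the skew is clinically immaterial.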
For the companion post, see our deep dive on training data governance for AI medical devices.
PMCF expectations
Post-Market Clinical Follow-up is not a line item. For radiology AI, it is the mechanism that detects degradation in the field before it harms patients, and the Notified Body reads the PMCF plan accordingly.
A classical radiology device deployed in the field does not change its behaviour because the field changed. An AI radiology product can effectively change its behaviour even when the model weights are frozen, because the distribution of inputs drifts. A hospital replaces a scanner. A protocol parameter changes. The patient mix shifts seasonally or demographically. The model has not moved, but its effective accuracy in that deployment has.
A defensible PMCF plan for a radiology AI product has to include active drift detection on input distributions (image statistics, metadata, scanner vendor mix), monitoring of model outputs (prediction distributions, confidence histograms), monitoring of clinical outcomes where accessible (concordance with final radiology reports, pathology follow-up, incident reports), and defined thresholds that trigger investigation and, where needed, a field action. Passive complaint handling is not a PMCF plan for radiology AI. It is a way to find out about failures after they have happened.
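One widely used drift metric for the input-distribution monitoring described above is the Population Stability Index (PSI), computed over binned image statistics or metadata categories. A minimal sketch follows; the thresholds in the comment are conventional rules of thumb that a manufacturer would still have to justify per device, not regulatory values:

```python
import math

def psi(baseline_counts, live_counts, eps=1e-6):
    """Population Stability Index between two binned distributions,
    e.g. baseline vs live histograms of mean image intensity or
    scanner vendor counts. Bins must align between the two inputs."""
    b_total = sum(baseline_counts)
    l_total = sum(live_counts)
    score = 0.0
    for b, l in zip(baseline_counts, live_counts):
        bp = max(b / b_total, eps)  # clamp to avoid log(0) on empty bins
        lp = max(l / l_total, eps)
        score += (lp - bp) * math.log(lp / bp)
    return score

# Conventional rule-of-thumb thresholds (to be justified per device):
#   PSI < 0.1        stable
#   0.1 <= PSI < 0.25  investigate
#   PSI >= 0.25       trigger documented investigation / field action path
```

The same computation runs equally well on output distributions (prediction score histograms), which is how a frozen model's silent degradation shows up before outcome data does.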
MDR Articles 83-86 require every manufacturer to have a PMS system proportionate to the risk class and appropriate for the device. For radiology AI, "appropriate" translates into the drift detection and monitoring architecture described above. Our deeper post on post-market surveillance for AI medical devices walks through the operational patterns.
Common mistakes startups make
- Validating on the same institution's data the training set came from. Generalisation is the whole point of an independent test set, and a Notified Body reviewer will ask for scanner vendor, acquisition protocol, and institutional diversity in the test set the first time they open the file.
- Reporting aggregate numbers without subgroup analysis. Aggregate 95% sensitivity is a marketing number. The Notified Body cares about where the model is weakest, not where it is strongest.
- Designing a reader study that does not match the deployed workflow. A sequential-read MRMC study is not evidence for a concurrent-read deployment. Pick the design that matches how the product will actually be used and size it for the workflow question.
- Under-specifying the intended use population to keep the product broadly sellable. The Notified Body will either demand evidence that covers the broader population or label the product down to the narrower population the evidence actually supports. There is no third option.
- Treating EN 62304 as paperwork. Software safety classification, traceability from requirements to tests, and configuration management are what allow a Notified Body to trust the validation evidence. A weak lifecycle file undermines strong clinical data.
- Writing a PMCF plan that is passive complaint handling. Drift detection is not optional for radiology AI. It is the mechanism that catches silent degradation, and a PMCF plan without it fails the "appropriate for the type of device" test under MDR Article 83.
- Assuming a CE marked predecessor in an adjacent modality transfers evidence automatically. A head CT haemorrhage model does not inherit the validation of a chest CT embolism model, even from the same manufacturer. Each intended purpose needs its own evidence.
The Subtract to Ship angle
The Subtract to Ship framework for MDR runs through radiology AI with the same four passes.
The Purpose Pass writes the intended purpose narrowly enough to match the validation evidence and no broader. A product that is honestly a "screening-assist tool for pulmonary nodules on low-dose chest CT in adult lung cancer screening populations" has a narrower and cheaper evidence burden than a general "chest imaging analysis platform." Narrowing the intended purpose is legitimate subtraction and it works.
The Classification Pass walks Rule 11 carefully. A routine radiology support tool can defensibly sit at Class IIa. Pushing it to IIb because "radiology sounds serious" is subtraction failure in the other direction. Class follows clinical severity and workflow role, not vibes.
The Evidence Pass asks what the minimum defensible clinical evidence looks like. For radiology AI, that is typically one well-designed reader study on an independent test set, with subgroup analysis and a pre-specified operating point, rather than three half-finished studies running in parallel. One defensible chain of argument beats three weak ones.
The Operations Pass asks what the QMS, PMS, and drift detection stack looks like specifically for a radiology AI product. The answer is a QMS that includes dataset governance, model versioning, drift monitoring, and revalidation triggers on top of the EN ISO 13485 backbone. A startup that bolts these onto a QMS template written for hardware devices will find the seams the first time a Notified Body asks a pointed question.
Reality check: where do you stand?
- Can you state the intended purpose of your radiology AI product in one sentence that a Notified Body reviewer and your marketing page both agree on?
- Have you classified the product under Annex VIII Rule 11 with the specific sub-clause documented and the severity rationale written down?
- Is your test set genuinely independent of the training set (different institutions, different scanners, different protocols where possible)?
- Do you report sensitivity and specificity at a pre-specified operating point, with confidence intervals, on an intended-use-population prevalence?
- Have you broken performance down by the subgroups where clinical importance is highest (small findings, edge demographics, underrepresented scanners and protocols)?
- Does your reader study design match the deployed clinical workflow, not the cheapest design you could run?
- Does your EN ISO 14971 risk file explicitly cover dataset bias, distribution drift, adversarial robustness, and clinician complacency as hazards?
- Does your PMCF plan include active drift detection with defined thresholds, not passive complaint handling?
- Is every activity in your regulatory plan traceable to a specific MDR article, annex, or harmonised standard requirement?
Frequently Asked Questions
Is AI radiology software a medical device under MDR? Almost always yes. When the intended purpose is to analyse clinical images to support diagnostic or therapeutic decisions, the software meets the Article 2(1) definition, and MDCG 2019-11 Rev.1 confirms that the presence of a radiologist reviewing the output does not remove the device from the regulatory scope.
What class is radiology AI under MDR? Most radiology AI products fall under Annex VIII Rule 11 at Class IIa or Class IIb. Routine diagnostic support sits at IIa; severe-condition contexts (stroke, cancer, intracranial haemorrhage, pulmonary embolism) push to IIb. Class III is rare because a radiologist is almost always in the loop. Class I is effectively non-existent for diagnostic radiology AI.
Do I need a Notified Body for a radiology AI product? Yes. Class IIa and above require Notified Body involvement under MDR Article 52, and radiology AI almost never sits at Class I. Self-certification is not a realistic path for this category.
Do I need a prospective clinical investigation, or is a retrospective reader study enough? It depends on the class, the intended purpose, and the novelty of the clinical claim. Many radiology AI products ship with a retrospective reader study on an independent dataset, combined with literature and the manufacturer's own analytical performance evidence. Prospective investigation is required when the retrospective evidence does not adequately address the clinical question. The Notified Body is the referee on sufficiency.
How does the Notified Body assess training data quality? By reading the dataset documentation, the representativeness analysis, the independent test set composition, and the subgroup performance breakdown. The Notified Body looks for honest reporting of where the model is weaker and for labelling that matches what the evidence actually supports. A training data file that cannot answer the question "who is this model worse for?" is a file with a gap.
Can I update my radiology AI model after CE marking? Under a defined change control envelope, yes. The technical documentation has to specify in advance what kinds of updates fall inside the envelope (for example, retraining on additional data from the same distribution without changing the architecture or the intended purpose) and what kinds trigger a new conformity assessment. Silent continuous learning without a defined envelope is not supported by a clean pathway under MDR today.
What is the single most common reason radiology AI files fail Notified Body review? Validation on a dataset that does not represent the intended use population, combined with no subgroup analysis. The reviewer cannot tell where the model is weak, and so cannot accept the claim that the model is safe for the claimed population. The fix is on the evidence side, not the rhetoric side.
Related reading
- Image-Based SaMD Qualification Under MDR – the qualification foundation specific to imaging products.
- Radiology Workflow AI Under MDR – the companion post on workflow integration and reader study design.
- AI in Medical Devices Under MDR: The Regulatory Landscape – the pillar post for the AI category this post sits inside.
- Machine Learning Medical Devices Under MDR – the ML development view that pairs with this imaging-specific post.
- Classification of AI and ML Software Under Rule 11 – the practical walk-through of Rule 11 for AI products.
- Computer-Aided Detection (CADe) Under MDR – the detection-specific category that overlaps heavily with radiology AI.
- Computer-Aided Diagnosis (CADx) Under MDR – the characterisation-specific companion to CADe.
- Autonomous Diagnostic AI Under MDR – the neighbouring category for products that remove the human from the loop.
- The Subtract to Ship Framework for MDR Compliance – the methodology that runs through every post in this blog.
Sources
- Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, Article 2(1) (definition of medical device) and Annex VIII Rule 11 (classification of software). Official Journal L 117, 5.5.2017.
- MDCG 2019-11 Rev.1, Guidance on Qualification and Classification of Software in Regulation (EU) 2017/745 (MDR) and Regulation (EU) 2017/746 (IVDR), October 2019; Revision 1, June 2025.
- EN 62304:2006 + A1:2015. Medical device software. Software life-cycle processes.
- EN ISO 14971:2019 + A11:2021. Medical devices. Application of risk management to medical devices.
This post is part of the AI, Machine Learning and Algorithmic Devices category in the Subtract to Ship: MDR blog. Authored by Felix Lenhard and Tibor Zechmeister. Radiology AI is the most mature corner of AI in medicine, and the Notified Body bar reflects that maturity. If your radiology AI product sits at a boundary the general framing here does not resolve, that is exactly the point where a sparring partner who has walked other radiology AI founders through the same Notified Body conversation earns their keep.