The MDR was written for static products. Machine learning models are not static. The way founders bridge that gap is by fixing the intended purpose tightly, freezing the model for each release, and treating every meaningful retraining as a potential significant change under Article 52. The regulation does not forbid learning. It forbids unmonitored learning on the market.
By Tibor Zechmeister and Felix Lenhard.
TL;DR
- MDR classifies software by Annex VIII Rule 11; most diagnostic or therapy-driving ML is Class IIa or higher.
- The regulation assumes a device is the same today as when it was certified. Adaptive models break that assumption.
- The practical pattern that works today is a locked model per release, with retraining handled as a controlled change.
- "Significant change" (MDR Article 52, Annex IX/X) decides whether retraining triggers a new conformity assessment.
- Intended purpose under Article 2(12) is the lever: narrow it, and you define what the model is allowed to do.
- MDCG 2019-11 Rev.1 (June 2025) is the authoritative reading for software qualification and Rule 11 application.
Why this matters
A founder walked into a review last year with a beautiful diagnostic model. It got better every week because it retrained nightly on new hospital data. The demo was impressive. The problem: the device that had been placed on the market on Monday was, by technical definition, a different device on Friday. No one had described that drift in the technical documentation. No one had defined the envelope inside which the model was allowed to change. The notified body did not accept the submission.
This is the core tension. MDR treats a medical device as a fixed artifact with a fixed intended purpose, placed on the market in a defined state. Machine learning is a moving target. The gap between those two realities is where most AI MedTech startups stumble.
The good news is that the MDR framework has enough flex to accommodate machine learning, provided you understand which levers the regulation actually gives you.
What MDR actually says
Article 2(12). Intended purpose. "Intended purpose means the use for which a device is intended according to the data supplied by the manufacturer on the label, in the instructions for use or in promotional or sales materials or statements and as specified by the manufacturer in the clinical evaluation."
This matters because intended purpose is not a marketing statement. It is a regulatory boundary. Whatever the model learns to do outside that boundary is, legally, a new device.
Annex VIII Rule 11. Software intended to provide information which is used to take decisions with diagnostic or therapeutic purposes is Class IIa; if those decisions may cause death or irreversible deterioration of a person's state of health, it is Class III; if they may cause serious deterioration or a surgical intervention, Class IIb. Software intended to monitor physiological processes is Class IIa (Class IIb if vital physiological parameters where variations could result in immediate danger). All other software is Class I.
For ML devices this means the honest question is: what decisions does the output drive, and how bad is it if the output is wrong? That is a classification question before it is an engineering question.
Article 52 and Annex II/IX. Any change to a certified device that could affect compliance with the general safety and performance requirements, or affect the conditions under which the device was authorised, is a change that must be assessed and may require a new conformity assessment. The technical documentation (Annex II) must describe the device, including its software, completely enough that a reviewer can understand what it does and what has changed.
MDCG 2019-11 Rev.1. The guidance on qualification and classification of software under MDR. The revision (June 2025) is the current reference for Rule 11 interpretation, including examples closer to how modern ML software actually behaves.
Nothing in the MDR says "machine learning is forbidden." Nothing says "the model must never change." What the MDR says is: the device you place on the market must be the device you documented, and changes that affect safety or performance must be controlled.
A worked example
A startup builds a chest X-ray triage tool. The model flags suspected pneumothorax so that the radiologist reviews those studies first. The model is Class IIa under Rule 11. It provides information used for diagnostic purposes, and a wrong output does not, by itself, cause irreversible harm because the radiologist still reads every image.
The founder wants continuous improvement. The team retrains weekly on anonymised hospital data, measures performance, and pushes a new model if it beats the previous one on a held-out benchmark.
Here is what passes a notified body review and what does not.
What passes: a locked model per release. Release v1.3 is a specific weights file with a specific training cut-off, specific training data characteristics, specific performance envelope, and specific intended purpose ("prioritisation of adult chest radiographs for suspected pneumothorax in emergency department workflows"). The technical documentation describes that release. Change control treats the retraining cycle as a defined process: data curation, validation against a frozen test set, performance thresholds, risk review. When the team decides to ship v1.4, they run a significant-change assessment before it goes live.
What does not pass: the production system silently retraining against live data, with no frozen version, no predefined performance envelope, and no change log that a reviewer can audit. Under MDR, that is an uncontrolled device.
The lever the founder has is intended purpose. If they write the intended purpose as "AI-powered radiology assistant," they have given themselves no boundary. If they write it as the narrow sentence above, they know exactly when a change has broken out of scope. Any retraining that expands the intended population, the clinical question, or the performance claim is no longer a routine update; it is a new device.
The significant-change question under Article 52 then becomes concrete. A retraining that keeps the intended purpose, stays inside the validated performance envelope, uses data consistent with the original training distribution, and is verified against the frozen test set is a controlled change within the existing certification. A retraining that pushes beyond any of those boundaries is a significant change and needs the notified body.
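The decision logic above can be made explicit as a pre-release gate. The sketch below is illustrative only: the field names and the structure are assumptions, not regulatory text, and a real assessment also involves the risk file and the QMS procedure behind each flag.

```python
from dataclasses import dataclass

@dataclass
class RetrainingChange:
    """Facts the team records about a candidate model release."""
    same_intended_purpose: bool        # population, clinical question, output unchanged
    within_performance_envelope: bool  # frozen test set metrics meet predefined thresholds
    in_distribution_data: bool         # training data consistent with original distribution

def classify_change(change: RetrainingChange) -> str:
    """Return 'controlled' if the retraining stays inside every boundary,
    otherwise 'significant' (needs the notified body before release)."""
    if (change.same_intended_purpose
            and change.within_performance_envelope
            and change.in_distribution_data):
        return "controlled"
    return "significant"

# A retraining that expands the intended population is a new device:
print(classify_change(RetrainingChange(False, True, True)))  # significant
```

The point of writing it this way is that every boolean corresponds to a documented check, so the change log a reviewer audits is the input to the function, not a narrative reconstruction after the fact.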
The Subtract to Ship playbook
Most AI MedTech founders try to protect the dream of a fully adaptive model and end up with neither compliance nor a shippable product. Subtract to Ship says: pick the smallest thing that works, and make the regulatory story straightforward.
Step 1. Narrow the intended purpose until it bites. One population, one clinical question, one output, one deployment context. If it feels uncomfortably specific, you are probably close. The test: can a reader tell from your intended purpose whether a specific retraining is inside scope or outside scope? If yes, you have a usable boundary. If no, rewrite.
Step 2. Lock the model for the release that ships. "Locked" here means the weights and the inference pipeline are fixed, versioned, and stored. The device that is CE-marked is that specific artefact. Retraining happens in your development environment, not in production.
Step 3. Classify honestly under Rule 11. MDCG 2019-11 Rev.1 has examples. Read them against your actual use case. Most ML diagnostic tools land Class IIa. Triage and workflow tools can sometimes stay Class I if the output genuinely drives no diagnostic or therapeutic decision, but this is a narrow path and notified bodies treat it with scepticism.
Step 4. Build the change-control process before you need it. Define your performance envelope (sensitivity, specificity, subgroup metrics, drift thresholds). Define the frozen validation set. Define the criteria that trigger a significant-change review versus a routine update. Write it down in the QMS before you ship v1.0, not after.
Step 5. Wire MDR to the standards that cover software. EN 62304:2006+A1:2015 for the software lifecycle. EN ISO 14971:2019+A11:2021 for risk management including model-specific risks (data drift, population shift, adversarial inputs). EN IEC 81001-5-1:2022 for cybersecurity of health software. These are the standards the MDR expects you to use for SaMD.
Step 6. Treat the clinical evaluation as the honest part of the file. Your model performs well on the data you trained on. The clinical evaluation has to argue that this performance holds on the population the device is intended for, in the clinical workflow the device is intended for. If the answer is "we do not know yet," that is what PMCF is for.
The point of this playbook is not to slow the team down. It is to prevent the situation where a shipping product has to be pulled because retraining quietly pushed it outside its certified envelope.
Reality Check
- Can you write your intended purpose in one sentence that names the population, the clinical question, and the output?
- For your current production model, can you retrieve the exact weights file that corresponds to release v1.0?
- Do you have a frozen validation set that never sees training data?
- Is there a written, pre-agreed performance envelope below which a new release cannot ship?
- If you retrained tomorrow on a new data source, would your QMS tell you whether that is a significant change under Article 52?
- Have you classified under Rule 11 with reference to MDCG 2019-11 Rev.1, not from memory?
- Does your risk file (EN ISO 14971) include ML-specific risks: drift, population shift, training data bias?
- Can a notified body reviewer trace any output of the device back to a specific model version, on a specific date, against a specific input?
If more than two of these are uncertain, the device is not yet ready to submit.
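The traceability item in the checklist (any output back to a specific model version, date, and input) is cheap to build in from day one. A minimal sketch of one such audit record; the field names are assumptions, and a production system would append these lines to a write-once log.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(model_version: str, input_bytes: bytes, output: dict) -> str:
    """One JSON line per inference: model version, UTC timestamp,
    digest of the input, and the device output."""
    return json.dumps({
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "output": output,
    })

line = audit_record("v1.3", b"<dicom study bytes>",
                    {"flag": "suspected_pneumothorax", "score": 0.91})
print(line)
```

Hashing the input rather than storing it keeps the log free of patient data while still letting a reviewer confirm which input produced which output under which model version.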
Frequently Asked Questions
Can a machine learning medical device be certified under MDR at all? Yes. Many already are. The path that works today is a locked model per release, with clear change control. Fully autonomous continuous learning in production is not something notified bodies currently accept without a very specific and narrow justification.
Is my triage tool Class I or Class IIa? If the output is used to drive a diagnostic or therapeutic decision, Rule 11 pushes it to Class IIa at minimum. MDCG 2019-11 Rev.1 has the current examples. Do not rely on memory here. The revision changed several interpretations.
Does retraining always require notifying the notified body? No. Retraining that stays inside your validated performance envelope, uses in-distribution data, and does not expand the intended purpose is a controlled change under your QMS. Retraining that changes any of those is a significant change under Article 52 and must be assessed.
What about the AI Act? The AI Act applies in parallel to the MDR for medical-device AI. It adds obligations rather than replacing MDR obligations. See the related post on AI Act classification overlap for the current picture.
Do we need clinical investigations for an ML device? Often not, if a strong clinical evaluation with real-world validation data and PMCF will do the job. Class IIb and Class III devices, and devices without adequate existing evidence, are a different conversation.
How specific does the intended purpose need to be? Specific enough that a reasonable reader can tell whether a particular retraining or use case falls inside or outside scope. Vague intended purposes are the single biggest unforced error in AI MedTech submissions.
Related reading
- Locked vs adaptive AI algorithms under MDR – the foundational distinction behind every ML regulatory decision.
- Continuous learning AI under MDR in 2026 – where the frontier sits and what notified bodies currently accept.
- MDR classification Rule 11 for software – the classification rule that governs almost every SaMD decision.
- Significant change for software under MDR – when an update crosses the line into a new conformity assessment.
- Clinical evaluation for AI/ML continuous learning – how to build a clinical argument for a device that learns.
Sources
- Regulation (EU) 2017/745 on medical devices, consolidated text. Article 2(12), Article 52, Annex II, Annex VIII Rule 11, Annex XIV.
- MDCG 2019-11 Rev.1. Guidance on Qualification and Classification of Software in Regulation (EU) 2017/745 – MDR. June 2025.
- EN 62304:2006+A1:2015. Medical device software. Software life cycle processes.
- EN ISO 14971:2019+A11:2021. Medical devices. Application of risk management to medical devices.
- EN IEC 81001-5-1:2022. Health software and health IT systems safety, effectiveness and security. Part 5-1: Security. Activities in the product life cycle.