AI and ML medical software is classified under MDR Annex VIII Rule 11 in exactly the same way as any other software as a medical device. The underlying method — neural network, gradient-boosted tree, classical statistics, or rule-based code — does not change the class. What sets the class is the intended purpose and the ceiling of clinical harm if the output is wrong. Most AI medical devices land in Class IIa, IIb, or III because they exist to drive or support diagnostic and therapeutic decisions. Class I is available only through Rule 11's narrow catch-all and almost never fits an AI product that processes patient data to produce a clinical output.
By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.
TL;DR
- Classification of AI/ML software under Rule 11 follows the same decision-making and monitoring branches as any other SaMD. The algorithmic method is not a factor.
- The class is set by the intended purpose and the ceiling of clinical harm if the AI output is wrong — not by model accuracy, explainability, or risk controls.
- Decision-support AI is the dominant case: Class IIa by default, IIb where wrong output could cause serious deterioration or surgical intervention, III where it could cause death or irreversible deterioration.
- Monitoring AI — continuous physiological signal interpretation — sits in the monitoring branch: IIa by default, IIb when the parameter is vital and the variations could cause immediate danger.
- Locked algorithms fit the MDR framework cleanly. Adaptive algorithms do not change the class under Rule 11, but they force a predefined change control plan into the technical documentation.
- Human-in-the-loop does not drop an AI product out of the decision-making branch. Clinician oversight is inside the scope of Rule 11, not an exit from it.
- MDCG 2019-11 Rev.1 (June 2025) is the definitive interpretation. Its worked examples calibrate borderline AI cases better than the rule text alone.
Rule 11 applied to AI — the rule does not care about the method
The first thing to say about classifying an AI/ML medical device under Rule 11 is that Rule 11 does not mention AI. It classifies software by intended purpose and by the ceiling of clinical harm if the software is wrong. Whether the software produces its output through a deep neural network, a gradient-boosted tree, a hand-coded rule system, or a linear regression trained in 1998 — the rule asks the same two questions and gives the same answer.
This is a feature, not a gap. A classification framework that shifted whenever a new model architecture appeared would be unusable. The MDR was drafted to survive technology change, and Rule 11 is one of the places where that shows. An AI diagnostic tool and a deterministic diagnostic tool with the same intended purpose and the same ceiling of clinical consequence classify identically. The conformity assessment route, the Notified Body involvement, the technical documentation depth, and the clinical evidence burden follow the class, not the method.
What does change for AI is what the technical file has to contain at that class — dataset governance, bias testing, drift monitoring, AI-specific failure mode analysis. Those are downstream of Rule 11, not part of it. This post stays on the classification question itself. Post 449 covers the technical documentation specific to AI.
For the entry-level reading of Rule 11 across all SaMD, post 081 is the prerequisite. For the branch-by-branch deep dive, post 085 is the companion. This post assumes both and narrows the lens to the AI-specific angles.
Decision-making AI versus monitoring AI — which branch
Rule 11 has two primary branches and a catch-all. For AI products, the split between the decision-making branch and the monitoring branch is usually the first question to answer.
The decision-making branch covers software intended to provide information used for diagnostic or therapeutic decisions. An AI model that reads a chest X-ray and flags a suspected finding is in this branch. A model that scores a patient's risk of sepsis from structured vitals is in this branch. A model that interprets a dermatology image for suspected malignancy is in this branch. The output is information; a clinician or a patient uses the information to make a diagnostic or therapeutic decision; the branch applies. Most AI MedTech products we see are here.
The monitoring branch covers software intended to monitor physiological processes. An AI model that processes a continuous ECG stream to detect arrhythmias is in this branch. A model that watches continuous glucose data and produces clinical output over time is in this branch. A model that interprets continuous respiratory or haemodynamic signals in an acute care context is in this branch. The defining feature is that the software watches a physiological signal over time and produces clinical output from that stream, rather than a single point-in-time interpretation of a single input.
Some AI products touch both branches. A wearable-linked model that continuously monitors physiological signals and also produces a diagnostic recommendation on demand covers both the monitoring and the decision-making function. Rule 11 in such cases applies through whichever branch captures the highest ceiling of clinical consequence across the full intended purpose. You do not pick the friendlier branch — you apply both and the higher class wins.
The catch-all — "all other software" at Class I — is theoretically available but almost never fits an AI product. If the model takes patient data as input and produces any output that informs a care decision or tracks a physiological signal, one of the first two branches applies. An AI that generates administrative text, summarises meeting notes without clinical interpretation, or provides general reference information without patient-specific output may land in Class I. An AI that scores, classifies, flags, alerts, or recommends does not.
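The branch logic above can be sketched as a small decision function. This is a minimal illustration of the reasoning, not a regulatory tool — the enum values and the `informs_clinical_decision` / `monitors_physiological_process` flags are assumptions invented for this sketch, and a real classification argument still has to be written out against MDCG 2019-11 Rev.1.

```python
from dataclasses import dataclass
from enum import Enum

class Branch(Enum):
    DECISION_MAKING = "decision-making"
    MONITORING = "monitoring"
    CATCH_ALL = "catch-all (Class I)"

@dataclass
class IntendedPurpose:
    informs_clinical_decision: bool       # output used for diagnostic/therapeutic decisions
    monitors_physiological_process: bool  # continuous signal interpretation over time

def rule_11_branches(p: IntendedPurpose) -> list[Branch]:
    """Return every Rule 11 branch the intended purpose touches.
    If both primary branches apply, both must be assessed and the
    higher resulting class wins -- you do not pick the friendlier branch.
    The catch-all applies only when neither primary branch does."""
    branches = []
    if p.informs_clinical_decision:
        branches.append(Branch.DECISION_MAKING)
    if p.monitors_physiological_process:
        branches.append(Branch.MONITORING)
    return branches or [Branch.CATCH_ALL]

# A wearable-linked model that continuously monitors signals and also
# produces a diagnostic recommendation on demand touches both branches:
wearable = IntendedPurpose(informs_clinical_decision=True,
                           monitors_physiological_process=True)
print([b.value for b in rule_11_branches(wearable)])
# → ['decision-making', 'monitoring']
```

Note that the function returns every applicable branch rather than a single answer: the point of the dual-branch rule is that both analyses must be run before the class is known.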
The consequence-of-error analysis — the AI twist
Rule 11 sets the class by the ceiling of clinical harm if the output is wrong. For AI, this analysis needs a sharper lens than it does for classical software, because the failure modes are different.
A classical decision-support tool fails when a coded rule is wrong or a calculation is off. The failure is usually visible, reproducible, and bounded to specific inputs. An AI model fails in more varied ways. It fails on out-of-distribution inputs that look normal. It fails on subgroups under-represented in the training data. It fails when the deployment environment drifts from the training environment. It fails silently, producing a confident output that is simply wrong. The ceiling of harm is set by what happens when one of these failures slips past the clinician and reaches a patient.
The question to ask for classification is always the same one from post 081: if the AI output is wrong and a clinician acts on it in good faith, what is the worst plausible clinical consequence? The answer is what puts you at IIa, IIb, or III in the decision-making branch.
- IIa is the default. Wrong output could cause bounded harm — a wasted follow-up visit, a delayed but recoverable intervention, a correctable error in a low-acuity decision.
- IIb applies when wrong output could cause serious deterioration of a person's state of health or a surgical intervention. An AI that flags (or misses) a finding that drives referral to surgery. A triage model whose misclassification could delay time-sensitive treatment to the point of serious harm. A dosing-support tool for medications where errors cause significant injury.
- III applies when wrong output could cause death or irreversible deterioration. An AI that interprets imaging for conditions where a missed finding is fatal within the clinical window. A model that drives dosing for a high-risk therapy where overdose is fatal. A model embedded in a therapy-control pathway where wrong output is directly lethal.
The analysis is about the ceiling, not the probability. "Our model has 97% sensitivity" is not a Rule 11 argument. It is a performance claim that belongs in the clinical evaluation, not the classification. Rule 11 assumes a failure and asks what happens next. Risk controls under EN ISO 14971 reduce residual risk inside the class — they do not move the class. This is the single most common Rule 11 mistake among founders arriving from an ML background, where the instinct is to defend accuracy rather than bound harm.
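The ceiling-of-harm mapping for the decision-making branch can be written down in a few lines — and what is absent from the sketch is the point. A minimal illustration under assumed names (the `Harm` enum is invented for this sketch): model accuracy, explainability, and risk controls appear nowhere in the function, because Rule 11 assumes a failure and asks only what happens next.

```python
from enum import IntEnum

class Harm(IntEnum):
    # Ceiling of clinical harm if the output is wrong
    # and a clinician acts on it in good faith.
    BOUNDED = 1       # recoverable, low-acuity consequence
    SERIOUS = 2       # serious deterioration or surgical intervention
    IRREVERSIBLE = 3  # death or irreversible deterioration

def decision_branch_class(worst_plausible_harm: Harm) -> str:
    """Rule 11 decision-making branch: the class follows the ceiling,
    not the probability. No accuracy term, no sensitivity threshold,
    no risk-control discount -- those live elsewhere in the file."""
    return {Harm.BOUNDED: "IIa",
            Harm.SERIOUS: "IIb",
            Harm.IRREVERSIBLE: "III"}[worst_plausible_harm]

# A sepsis-prediction tool whose wrong output could delay
# time-sensitive treatment into serious deterioration:
print(decision_branch_class(Harm.SERIOUS))  # → IIb
```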
Locked versus adaptive algorithms — the impact on classification
A locked algorithm is one whose behaviour does not change after placing on the market, or only changes through controlled release events that pass through formal change management. An adaptive (or continuously learning) algorithm updates its weights or its decision behaviour from data it sees in the field, without a discrete release event.
The first thing to know is that the locked-versus-adaptive distinction does not change the Rule 11 class. A locked decision-support tool and an adaptive decision-support tool with the same intended purpose and the same ceiling of clinical consequence classify the same way. Rule 11 is not sensitive to how the software updates; it is sensitive to what the software does.
What does change is what the technical file has to address at the time of conformity assessment. A locked algorithm presents a stable artefact to the Notified Body. Each change in the future goes through change management and, where significant, re-assessment. An adaptive algorithm presents a moving target, and the MDR framework expects a predefined change control plan that bounds the envelope of possible updates in advance. The Notified Body assesses the envelope, not each future state. Each update inside the envelope can proceed without re-assessment; any update outside the envelope is a significant change and triggers a new conformity assessment.
In 2026 the practical consensus in the EU is that fully autonomous continuously-learning algorithms without a defined change envelope do not have a clean CE marking pathway. Most AI medical devices ship locked or with a predefined change control plan. Post 433 walks through the locked-versus-adaptive topic in full. For the classification question, the point is this: the adaptive nature is a technical documentation and change control problem, not a class problem. Your Rule 11 branch and level do not move because you chose adaptive.
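The envelope logic above can be sketched as a boundary check. This is an illustration of the concept only — the field names (`max_sensitivity_drop`, `allowed_input_modalities`) and thresholds are assumptions invented for this sketch, and a real predefined change control plan is a documented, Notified-Body-assessed artefact, not three `if` statements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeEnvelope:
    """The predefined change control plan as assessed at conformity
    assessment: the bounds within which field updates may proceed
    through change management without re-assessment."""
    max_sensitivity_drop: float             # tolerated drop vs. the assessed baseline
    allowed_input_modalities: frozenset[str]
    intended_purpose_fixed: bool = True     # the envelope never covers a new purpose

def update_requires_reassessment(envelope: ChangeEnvelope,
                                 sensitivity_drop: float,
                                 input_modalities: set[str],
                                 purpose_changed: bool) -> bool:
    """Inside the envelope: proceed under change management.
    Outside it: significant change, new conformity assessment."""
    if purpose_changed and envelope.intended_purpose_fixed:
        return True
    if sensitivity_drop > envelope.max_sensitivity_drop:
        return True
    if not input_modalities <= envelope.allowed_input_modalities:
        return True
    return False

env = ChangeEnvelope(max_sensitivity_drop=0.02,
                     allowed_input_modalities=frozenset({"chest_xray"}))
# A retrain that stays on chest X-rays within the performance bound:
print(update_requires_reassessment(env, 0.01, {"chest_xray"},
                                   purpose_changed=False))  # → False
```

The design point the sketch makes: the Notified Body assesses `env` once, not each future model state — which is exactly why the envelope has to be written before conformity assessment, not after the first field update.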
The human-in-the-loop nuance
A recurring argument we hear is that an AI product with a clinician reviewing every output is not really decision-support and should fall to Class I. That argument does not survive Rule 11 or MDCG 2019-11 Rev.1. The word "decisions" in the rule covers decisions made by clinicians using the software's output as one input among several. Clinician-in-the-loop is inside the scope of Rule 11, not an exit from it.
There is a nuance worth stating precisely. Having a clinician in the loop does not change the class, but the way the clinician interacts with the AI affects the risk management file and the clinical evaluation. A system where the clinician sees the raw evidence and the AI's interpretation side by side, and can override freely, is a different product from one where the clinician sees only the AI's output. Both are in the decision-making branch. Both are at the same Rule 11 level for the same intended purpose. But the failure modes, the usability considerations, and the clinical evidence look different, and those differences belong in the Annex II technical documentation, not in the classification.
The complacency failure mode — clinicians drifting from careful review to reflexive acceptance after the AI is right many times in a row — is real and has to be addressed in the usability engineering file under EN 62366-1 and in the PMS plan. It does not change the Rule 11 class. It changes how the class is defended in the rest of the file.
Examples — where typical AI products land
These are indicative placements based on common intended purposes and the ceiling-of-harm logic. Any specific product needs its own classification argument anchored to MDCG 2019-11 Rev.1.
- AI-based radiology triage for non-critical findings. Decision-making branch. Typically IIa. Moves to IIb if the triage drives time-sensitive interventions whose delay would cause serious deterioration.
- AI-based cancer detection in imaging. Decision-making branch. Typically IIb or III depending on whether the missed-finding consequence is serious-but-recoverable or fatal-or-irreversible within the clinical window.
- AI sepsis prediction from vitals in an acute setting. Decision-making branch, often IIb because wrong output can delay treatment into serious deterioration. Can reach III where the clinical context makes delay directly fatal.
- AI arrhythmia detection from continuous ECG. Monitoring branch. IIa by default, IIb where the arrhythmias monitored are life-threatening on a minutes-scale (classic vital-plus-immediate-danger case).
- AI continuous glucose interpretation with dosing suggestions. Combines monitoring and decision-making. Typically IIb because the decisions informed (insulin dosing) can cause serious deterioration. The classification runs through the decision-making branch.
- AI dermatology triage for lesion assessment. Decision-making branch, typically IIa or IIb depending on whether the intended purpose includes driving biopsy decisions for suspected malignancy.
- AI chatbot providing general health information without patient-specific output. Possibly outside the MDR entirely if the intended purpose is genuinely non-medical. If it qualifies as MDSW, the catch-all Class I is theoretically available but rare. Any patient-specific output pulls it back into the first two branches.
These are starting points. The real classification argument for any product is anchored to the worked examples in MDCG 2019-11 Rev.1 that most closely match the intended purpose.
Common errors in AI Rule 11 classification
"The AI is only a recommendation, not a decision." Rule 11 explicitly covers information used to take decisions. The word "recommendation" lands in the middle of the rule, not outside it.
"We will reduce the class by improving model accuracy." Rule 11 is indifferent to accuracy. Accuracy belongs in the clinical evaluation. The class is set by the ceiling of harm, not the probability of failure.
"Our explainability layer drops us to Class I." Explainability is a risk control and a usability consideration. It does not change the class.
"The adaptive nature makes us non-classifiable." The adaptive nature is a change control problem. The class is still set by Rule 11 based on the intended purpose and ceiling of harm.
"A clinician always reviews the output, so we are not decision-support." Clinician-in-the-loop is inside Rule 11, not an exit from it.
"Our US counterpart is FDA Class II, so we are MDR IIa." FDA and MDR classifications diverge. A US Class II AI product can be MDR IIa, IIb, or III depending on intended purpose and clinical consequence.
"We will argue the catch-all because our output is informational only." The catch-all is narrow. If the information is used for a diagnostic or therapeutic decision, the first branch applies regardless of how the output is framed.
The Subtract to Ship angle for AI classification
The lever for AI classification under Rule 11 is the same lever as for any other SaMD: the intended purpose. Subtract to Ship, applied here, means writing the narrowest honest intended purpose that still describes the AI product you are actually building. If the model genuinely does not need to drive a surgical-intervention claim, do not make one. If the monitoring function is not genuinely for vital parameters on an immediate-danger timescale, do not describe it that way. If the output does not need to support a decision that could cause irreversible harm, scope it so it does not.
The opposite trap is where AI MedTech startups lose the most. An expansive pitch-deck intended purpose — "AI that diagnoses dozens of conditions across the full imaging workflow" — written into the technical documentation will classify the product into the highest bracket of every condition it touches. Every broad claim is absorbed into the class. The engineering team then builds against a Class IIb or III conformity assessment when a narrower intended purpose — one condition, one decision context, one intended-use population — would have produced a defensible Class IIa and a year of saved runway.
The move is not to hide what the model can do. It is to state precisely what the model is intended for at the point of placing on the market, ship that, and expand the intended purpose deliberately through change management when the evidence supports it. Post 065 covers the framework in full.
Reality Check — Can you defend your AI product's Rule 11 classification?
- Have you qualified your AI software as MDSW under MDR Article 2(1) through the MDCG 2019-11 Rev.1 decision tree, in writing, before touching Rule 11?
- Have you written one paragraph that states which Rule 11 branch applies (decision-making, monitoring, or catch-all) and why?
- In the decision-making branch, what is the worst plausible clinical consequence if the AI output is wrong and a clinician acts on it in good faith — bounded harm, serious deterioration or surgical intervention, or death or irreversible deterioration?
- If you are in the monitoring branch, are the parameters genuinely vital AND can variations result in immediate danger on a minutes-scale? Both conditions must hold to reach IIb, not just one.
- If you believe your product fits the Class I catch-all, have you tried to falsify that against MDCG 2019-11 Rev.1's worked examples?
- Is your classification argument anchored to a specific MDCG 2019-11 Rev.1 example that resembles your intended purpose?
- Is your algorithm locked, or do you have a predefined change control plan documented before conformity assessment?
- Does every public claim — website, app store, marketing, clinical evaluation — match the intended purpose you are relying on for Rule 11?
- Is your EN 62304:2006+A1:2015 software safety class (A, B, C) consistent with the MDR device class you have reached under Rule 11?
Frequently Asked Questions
Does Rule 11 treat AI/ML software differently from classical SaMD? No. Rule 11 classifies by intended purpose and the ceiling of clinical harm, not by the algorithmic method. An AI decision-support tool and a classical decision-support tool with the same intended purpose and the same consequence-of-error profile reach the same class. What AI changes is the content of the technical documentation — dataset governance, bias analysis, drift monitoring — not the class itself.
Can I classify an AI diagnostic-support product as Class I? Almost never. The Class I catch-all in Rule 11 applies only to software that qualifies as a medical device but neither provides information used for diagnostic or therapeutic decisions nor monitors physiological processes. An AI that produces patient-specific output used in a clinical decision falls into the decision-making branch, which starts at Class IIa.
Does an adaptive (continuously learning) algorithm change the Rule 11 class? No. The adaptive nature is a change control problem, not a classification problem. The class is set by the intended purpose and the ceiling of clinical harm. Adaptive algorithms do force a predefined change control plan into the technical documentation, and in 2026 fully autonomous continuously-learning algorithms without a defined envelope do not have a clean CE marking pathway in the EU.
If a clinician reviews every AI output, does that drop the class? No. Rule 11 covers information used by clinicians to take decisions. Clinician-in-the-loop is inside the scope of the rule. The class does not drop because of oversight. Oversight affects the risk management file, the usability engineering file, and the clinical evaluation — not the Rule 11 class.
How do I decide between Class IIb and Class III for an AI product? Ask whether the worst plausible harm from a wrong output acted on in good faith is recoverable or permanent. Serious but recoverable — extended hospitalisation, significant but reversible functional loss, a surgical intervention the patient recovers from — is IIb. Permanent, non-recoverable harm — death, permanent organ damage, permanent functional loss — is III. The worked examples in MDCG 2019-11 Rev.1 are the calibration tool for borderline cases.
Does the EU AI Act change the MDR Rule 11 class? No. The AI Act layers additional obligations on top of MDR for AI systems in safety-critical use, including many medical devices, but it does not change the MDR classification framework. Your Rule 11 class is set by MDR. Your AI Act obligations sit alongside it. Post 426 walks through the two-layer landscape.
Related reading
- How to Apply MDR Classification Rule 11: Software as a Medical Device — the entry-level reading of Rule 11 across all SaMD.
- MDR Classification of Software: Rule 11 Deep Dive for SaMD Startups — the branch-by-branch companion deep dive.
- Rule 11 Monitoring Branch — The Deep Dive — the vital-parameter escalation in detail.
- What Is Software as a Medical Device (SaMD) Under MDR? — qualification before classification.
- SaMD vs SiMD — Software as a Medical Device vs Software in a Medical Device — the distinction that determines whether Rule 11 applies stand-alone.
- MDCG 2019-11 Rev.1 — What the Software Guidance Actually Says — the full reading of the definitive guidance document.
- AI in Medical Devices Under MDR: The Regulatory Landscape in 2026 — the pillar post for the AI cluster.
- Machine Learning Medical Devices Under MDR — the companion ML development post.
- Locked Versus Adaptive AI Algorithms Under MDR — the change control landscape for adaptive systems.
- Technical Documentation for AI Medical Devices Under MDR — what the Annex II file looks like for an AI product.
- The Subtract to Ship Framework for MDR Compliance — the methodology that drives how we scope intended purpose for Rule 11.
Sources
- Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, Article 2(1) (definition of medical device), Article 51 (Classification of devices), and Annex VIII (Classification rules), Rule 11. Official Journal L 117, 5.5.2017.
- MDCG 2019-11 — Guidance on Qualification and Classification of Software in Regulation (EU) 2017/745 — MDR and Regulation (EU) 2017/746 — IVDR. First published October 2019; Revision 1, June 2025. Published by the Medical Device Coordination Group, European Commission.
- EN 62304:2006+A1:2015 — Medical device software — Software life-cycle processes (IEC 62304:2006 + IEC 62304:2006/A1:2015).
This post is a spoke in the AI, Machine Learning and Algorithmic Devices category of the Subtract to Ship: MDR blog, and a cross-cluster reference for the Device Classification and Conformity Assessment category. Authored by Felix Lenhard and Tibor Zechmeister. The MDR is the North Star for every claim in this post — MDCG 2019-11 Rev.1 is the authoritative interpretation of Annex VIII Rule 11, and EN 62304:2006+A1:2015 is the software lifecycle tool that sits alongside the Rule 11 device class rather than replacing it. For startup-specific regulatory support on AI/ML classification under Rule 11, Zechmeister Strategic Solutions is where this work is done in practice.