Under MDR, every change to a certified medical device has to be classified by significance. For an AI/ML device, retraining the model is a change — and when that change affects the intended purpose, the risk profile, the performance characteristics, or the validated operating envelope, it counts as a significant change and has to be notified to the Notified Body before the updated model reaches patients. Under Annex IX Sections 2.4 and 4.10, changes to an approved QMS or to an approved design trigger a review by the Notified Body where the change could affect conformity with the Regulation. The practical route for teams that need faster or more frequent model updates is a predetermined change control plan: a pre-authorised envelope of model updates, agreed with the Notified Body during the initial assessment, that lets bounded retraining events proceed without a fresh conformity assessment for each one.
By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.
TL;DR
- Retraining an AI/ML medical device model is a change to the device. Whether it is a significant change is a regulated determination, not an engineering judgement call.
- MDR Article 10(9) requires manufacturers to operate a QMS that governs all changes to the device. EN ISO 13485:2016+A11:2021 Clause 7.3.9 is the harmonised standard expectation for design change control.
- Under Annex IX Sections 2.4 and 4.10, changes to the approved QMS or to an approved design have to be assessed; changes that could affect conformity are notified to the Notified Body and approved before they take effect.
- MDCG 2019-11 Rev.1 (June 2025) is the guidance document for qualification and classification of medical device software and frames how software changes are handled in practice, including updates to AI components.
- A retraining event that materially affects intended purpose, risk, clinical performance, target population, input data characteristics, or the validated operating envelope is significant. A retraining event that stays strictly inside the originally validated envelope can be handled as a minor change — but only with evidence.
- The predetermined change control plan (PCCP) concept lets manufacturers pre-authorise a bounded set of future model updates. The envelope is assessed once by the Notified Body as part of the initial conformity assessment, and updates inside the envelope do not trigger a fresh assessment each time.
- The most common mistake is assuming that because a retraining used "the same architecture" or "more of the same data," it is automatically minor. It is not. The assessment has to be made against a reference test battery and documented.
Why this question decides your release cadence
Almost every AI MedTech team we coach underestimates how much the change management process shapes the release cadence of their product. Engineering teams plan quarterly or monthly model updates. Investors ask about compounding data advantages. Clinical partners expect the model to get better as more data comes in. And then the first retraining lands on Tibor's desk for a significance assessment, and the team discovers that the update they were planning to push next week needs a Notified Body notification, documentation, and possibly a partial re-assessment before it can reach patients.
The question is not whether retraining is allowed. Retraining is allowed. The question is which retraining events are significant changes under MDR, how you decide, who signs off, and how to build a process that does not bottleneck on the answer. For the broader treatment of locked versus adaptive algorithms, see the post on locked vs. adaptive AI algorithms under MDR. For the pillar framing see AI medical devices under MDR.
Change classification in the MDR framework
MDR does not contain a chapter called "change management." The obligation is stitched together from several places, and an AI/ML team needs to know all of them.
Article 10(9) requires the manufacturer to establish, document, implement, and maintain a quality management system that ensures compliance with the Regulation in the most effective manner, proportionate to the risk class and type of device. Change control is one of the explicit QMS subsystems — the manufacturer has to have a process that governs how changes to the device are assessed, approved, implemented, and recorded. The harmonised standard that discharges this obligation is EN ISO 13485:2016+A11:2021. Clause 7.3.9 of that standard is the design change control clause, and it is where the auditor will look when assessing whether your AI change management process is adequate.
Annex IX is the conformity assessment annex for the quality management system route. Section 2.4 deals with changes to the approved QMS itself — if you change how you run your change control process, that is a QMS change and the Notified Body has to be informed. Section 4.10 deals with changes to an approved design — if you change the certified device in a way that could affect conformity with the Regulation, the change has to be assessed and, where it could affect safety or performance, approved by the Notified Body before it is implemented. The two sections work together: the process for classifying and handling changes lives in the QMS; the individual change is assessed against the approved design.
The significance of a change is the regulated decision that determines which path it follows. A minor change is documented in the technical file and proceeds under the manufacturer's responsibility. A significant change is notified to the Notified Body and, where relevant, approved before it reaches patients. Getting this decision wrong in either direction has consequences. Classifying a significant change as minor is a compliance failure and a patient safety risk. Classifying a minor change as significant is not a compliance failure, but it inflates cost, slows cadence, and burns Notified Body goodwill.
What counts as significant for an AI/ML device
The significance criteria are familiar from any software medical device, but AI/ML makes several of them harder to assess honestly.
Intended purpose. Any change that alters, extends, or narrows the intended purpose of the device is significant. If a retraining event makes the model usable on a new patient population, on a new indication, on a new body site, or with a new imaging modality, the intended purpose has moved and the change is significant. This is the cleanest test and the one AI teams most often stumble over, because "adding more diverse training data" can quietly expand the population the model actually works on.
Risk profile. Any change that introduces new hazards or changes the severity or probability of known hazards is significant. For AI, retraining can introduce bias hazards that were not present before, change the failure mode distribution, or shift the performance on edge cases in a way that changes the clinical risk picture. The EN ISO 14971:2019+A11:2021 risk file has to be re-evaluated after every material retraining, not updated once a year.
Performance characteristics. Any change that materially alters the claimed performance — sensitivity, specificity, AUC, agreement with reference standard, calibration — is significant. The honest test is whether the updated model produces the same or better performance on the same reference test battery used for the initial assessment. If the team cannot run the updated model against the reference battery and produce the comparison, it does not know whether the change is significant.
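The bounds comparison lends itself to automation. The sketch below is a minimal, hypothetical check in Python; the metric names and thresholds are placeholders, and the real bounds come from your own technical documentation, not from any standard.

```python
# Minimal sketch of the bounds check described above. Metric names and
# thresholds are illustrative placeholders; the real bounds are the ones
# claimed in the technical documentation.

REFERENCE_BOUNDS = {
    "sensitivity": 0.92,  # minimum claimed sensitivity
    "specificity": 0.88,  # minimum claimed specificity
    "auc": 0.95,          # minimum claimed AUC on the reference battery
}

def metrics_below_bounds(battery_metrics: dict) -> list:
    """Return the metrics where the updated model falls below the
    performance bounds claimed in the technical documentation."""
    return [
        name
        for name, minimum in REFERENCE_BOUNDS.items()
        if battery_metrics.get(name, 0.0) < minimum
    ]

# A retraining that regresses any claimed metric cannot be handled as
# minor on performance grounds.
regressions = metrics_below_bounds(
    {"sensitivity": 0.94, "specificity": 0.86, "auc": 0.96}
)
# regressions == ["specificity"]: the specificity claim is no longer met.
```

A non-empty result does not by itself make the change significant in the regulatory sense, but it does mean the minor-change route is closed until the claim, the model, or the documentation is reconciled.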
Input data characteristics. Any change in the input data distribution the model expects — resolution, acquisition protocol, preprocessing pipeline, data source — is a change to the device. Models are sensitive to input pipeline details, and a preprocessing change is often as impactful as a weight update.
Validated operating envelope. The original conformity assessment approved the device against a defined envelope: population, setting, data characteristics, performance bounds. A change that pushes behaviour outside that envelope is significant. A change that stays strictly inside it, and can be shown to stay inside it with documented evidence, can be handled as minor.
The point that often gets missed: "we used the same architecture and added more data" is not, on its own, a significance argument. Architecture stability says nothing about what the model actually does. Significance is determined by the behaviour of the updated model against the reference battery, not by the provenance of the change.
Retraining vs. recalibration vs. fine-tuning
The words matter, because the regulatory treatment depends on what actually changed.
Retraining means running the full training procedure, usually on an updated dataset, producing a new set of model weights. Retraining is the broadest kind of change and the one most likely to be significant. The updated model has to be validated end-to-end against the reference battery, and the outcome of that validation determines whether the change is significant or minor.
Recalibration is narrower. It is an adjustment of output calibration — for example, adjusting a probability-to-decision threshold or updating a calibration curve — without changing the underlying model weights. Recalibration can still be significant if it changes the operating point materially, but the assessment is more contained because the model's internal behaviour has not moved.
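To make the recalibration case concrete, here is a small, hypothetical Python sketch that compares the operating point (sensitivity, specificity) at the old and new decision thresholds on a frozen reference set. The scores and labels are made up for illustration.

```python
# Hypothetical sketch: a recalibration moves only the decision threshold,
# so the significance question reduces to how far the operating point
# moves on the frozen reference battery. Scores and labels are invented.

def operating_point(scores, labels, threshold):
    """Sensitivity and specificity at a given probability threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]  # model outputs on the reference set
labels = [1, 1, 1, 0, 1, 0]              # reference-standard labels

old_sens, old_spec = operating_point(scores, labels, 0.5)   # (0.75, 1.0)
new_sens, new_spec = operating_point(scores, labels, 0.25)  # (1.0, 0.5)
# An operating-point shift of this size changes the clinical risk picture
# and would point towards a significant change despite unchanged weights.
```

The comparison is cheap to run precisely because the weights have not moved; the discipline is in running it on the frozen battery and recording the result, not in the computation itself.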
Fine-tuning is somewhere in between. It is a continuation of training from an existing checkpoint on new data, usually on a smaller scale than full retraining. Fine-tuning can drift the model behaviour subtly in ways that do not show up in aggregate metrics but change subgroup performance. It needs the same reference battery treatment as retraining.
Per-site adaptation is a special case. Some products fine-tune at installation time at each customer site. From an MDR perspective, every per-site model is a different configuration of the device, and the technical documentation has to explain how that set of configurations stays inside the validated envelope. This is exactly the kind of scenario a predetermined change control plan is designed for.
The distinction engineers use internally is not the distinction the Regulation cares about. The Regulation cares about whether the model running tomorrow does materially different things from the model that was assessed. The name of the training procedure is downstream of that question.
The predetermined change control plan approach
For teams that need a release cadence faster than the standard change notification process allows, the workable pattern in 2026 is a predetermined change control plan (PCCP). The concept has been discussed in international regulatory forums and explored in other jurisdictions; in the EU, the practical contours are being worked out between manufacturers, Notified Bodies, and the Medical Device Coordination Group. The MDR text itself does not define a PCCP, so founders should treat it as a pattern to negotiate with their Notified Body rather than as a standard clause.
The idea is straightforward. During the initial conformity assessment, the manufacturer submits a document that pre-specifies a bounded envelope of future model updates the manufacturer is authorised to make without triggering a fresh assessment. The envelope has to specify, at minimum: which model parameters or components can change, what the trigger conditions are, what performance bounds the updated model has to meet, what reference test battery has to be passed, what documentation has to be produced, and how the fleet is updated. The Notified Body assesses the envelope itself as part of the initial assessment. Once approved, updates that fall inside the envelope are treated as already covered by the original conformity assessment; updates that fall outside the envelope are significant changes and go through the normal process.
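One way to keep the envelope discipline enforceable is to make the envelope machine-readable and gate every release on an automated inside/outside check. The sketch below is an assumed, simplified shape; the real envelope content and field names are whatever you negotiate with your Notified Body, not this schema.

```python
# Assumed, simplified sketch of a machine-readable PCCP envelope and the
# release gate that classifies an update against it. All field names are
# illustrative; the actual envelope is agreed with the Notified Body.

from dataclasses import dataclass

@dataclass(frozen=True)
class PccpEnvelope:
    mutable_components: frozenset   # e.g. {"weights", "calibration"}
    triggers: frozenset             # e.g. {"drift_alert", "scheduled"}
    performance_bounds: dict        # minimum metrics on the reference battery
    required_artifacts: frozenset   # documentation each update must produce

@dataclass(frozen=True)
class ModelUpdate:
    changed_components: frozenset
    trigger: str
    battery_metrics: dict
    produced_artifacts: frozenset

def inside_envelope(update: ModelUpdate, env: PccpEnvelope) -> bool:
    """True only if the update is fully inside the pre-authorised envelope;
    anything else follows the normal significant-change process."""
    return (
        update.changed_components <= env.mutable_components
        and update.trigger in env.triggers
        and env.required_artifacts <= update.produced_artifacts
        and all(
            update.battery_metrics.get(m, 0.0) >= bound
            for m, bound in env.performance_bounds.items()
        )
    )
```

The design point is that the check is conjunctive: an update that satisfies the performance bounds but changes a component outside the pre-authorised set is still outside the envelope, which mirrors how a cold reviewer would read the document.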
The PCCP is only as useful as the discipline of staying inside the envelope and the rigour of the pre-authorisation. Vague envelopes — "the model may improve as new data arrives" — do not survive Notified Body review. Specific envelopes that define concrete triggers, concrete metrics, concrete bounds, and concrete revalidation steps can. For the deeper treatment of this pattern see the post on locked vs. adaptive AI algorithms under MDR. For the related product-lifecycle view see continuous learning AI and MDR in 2026.
Documentation expectations for every retraining event
Whether a retraining is handled as minor or significant, the documentation obligation is the same: the decision has to be recorded and defensible. A Notified Body auditor reading your technical file two years from now has to be able to reconstruct what changed, why it was classified the way it was, what evidence supported the classification, and who signed off.
A workable change record for an AI retraining event includes the following, at minimum. A description of what was retrained — weights, architecture, preprocessing, training data, calibration. The reason for the change — bug fix, dataset expansion, performance improvement, drift response. The training data snapshot identifier and the diff from the previous snapshot. The validation results against the reference test battery, including aggregate metrics and subgroup metrics. A comparison against the performance bounds stated in the technical documentation or the PCCP envelope. A risk file review noting any new or changed hazards. A significance determination with justification. Sign-off by the roles defined in the QMS change control procedure.
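The minimum record above can be enforced as a typed structure in the release pipeline, so a retraining cannot ship with a field left blank. The schema below is a hypothetical mapping of that list into Python; the field names are illustrative, not a prescribed format.

```python
# Hypothetical schema mirroring the minimum change record described in the
# text. Field names are illustrative, not a prescribed or standard format.

from dataclasses import dataclass

@dataclass(frozen=True)
class RetrainingChangeRecord:
    what_changed: str        # weights, architecture, preprocessing, calibration
    reason: str              # bug fix, dataset expansion, drift response
    dataset_snapshot_id: str # training data snapshot identifier
    dataset_diff: str        # diff from the previous snapshot
    battery_metrics: dict    # aggregate metrics on the reference battery
    subgroup_metrics: dict   # per-subgroup metrics
    bounds_comparison: str   # vs. documented bounds / PCCP envelope
    risk_review_ref: str     # pointer to the risk file review record
    significance: str        # "minor" or "significant"
    justification: str       # rationale for the significance determination
    signoffs: tuple = ()     # roles defined in the QMS change control procedure

    def is_complete(self) -> bool:
        """A record missing any field should block release."""
        return all([
            self.what_changed, self.reason, self.dataset_snapshot_id,
            self.dataset_diff, self.battery_metrics, self.subgroup_metrics,
            self.bounds_comparison, self.risk_review_ref,
            self.significance in ("minor", "significant"),
            self.justification, self.signoffs,
        ])
```

Wiring `is_complete()` into CI as a release gate is one concrete way to make "documented and defensible" a property the pipeline checks rather than a habit the team remembers.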
This is not extra overhead invented for the Regulation. It is the discipline that lets the team move fast with confidence. Teams that skip the reference battery run on every update are not going fast; they are accumulating undocumented risk. The Annex I Section 17 software lifecycle obligations, the EN 62304:2006+A1:2015 software change control requirements, and the EN ISO 14971:2019+A11:2021 risk management obligations all converge here.
When to notify the Notified Body
The rule of thumb: if the change is significant, notify. If the change is minor, document internally. If you are not sure, the honest answer is that you do not yet have enough evidence, and the right next step is to run the reference battery and the risk review before making the classification decision.
Practically, notification timing matters. Significant changes under Annex IX Section 4.10 go through the Notified Body before implementation — not after. A team that pushes an update to production and notifies afterwards has already placed a non-conforming device on the market. The notification is not a rubber stamp; the Notified Body may ask for additional evidence, may require a partial re-assessment, and may take weeks to months to sign off depending on the scope. This is why the release cadence has to be planned against the real Notified Body turnaround, not against the engineering team's wishful pace.
Early and informal engagement with the Notified Body on change management approach is one of the highest-leverage moves a team can make. Before you lock the technical documentation for the initial assessment, have the conversation about what your expected update pattern looks like. That is the moment to negotiate a PCCP if you need one. After certification, it is too late to add one without a substantial variation.
Common mistakes teams make
Treating retraining as an internal engineering event. A retraining that reaches patients is a change to a certified medical device. It is not a GitHub merge. It has to flow through the QMS change control process with the same discipline as any other design change.
Assuming architecture stability means minor change. "Same architecture, more data" is not a significance argument. The significance depends on the behaviour of the updated model against the reference battery, not on how the update was produced.
Running no reference battery. Teams that do not maintain a frozen reference test battery cannot make significance determinations at all. Every significance call they make is a guess. The reference battery is the single most important piece of infrastructure in an AI change management process.
Waiting to talk to the Notified Body. Discovering at audit that the Notified Body does not accept your change control approach is the most expensive way to find out. Engage early, get the expected pattern on the table during the initial assessment, and negotiate a PCCP envelope if your business case requires one.
Confusing PMS with change control. Drift detection under MDR Article 83 is a different obligation from change control. Detecting drift tells you the environment has shifted; responding with a retraining is a change control event. Both have to happen. Having a good PMS process does not substitute for a good change control process, and vice versa.
Vague PCCP envelopes. Envelopes that do not specify concrete parameters, triggers, bounds, and revalidation do not survive Notified Body review. The envelope has to be specific enough that an auditor reading it cold can tell whether a given update is inside or outside.
The Subtract to Ship angle
The Subtract to Ship framework applied to AI change management produces a concrete subtraction: cut every retraining event that does not have a documented clinical reason. Teams burn enormous amounts of engineering and regulatory effort on retraining events that chase a fractional metric improvement on the validation set and produce no measurable change in clinical benefit. Each of those events consumes change control cycles, Notified Body attention, and internal sign-off time.
The Subtract to Ship move is to define, up front, what "a retraining worth doing" means for your product. Tie it to a concrete performance threshold, a concrete drift signal, a concrete clinical outcome, or a concrete bug class. Every retraining that does not clear that threshold does not happen. The result is fewer change events, each one better justified, each one moving through the QMS faster, and each one more defensible at audit. For the broader subtraction logic see minimum viable regulatory strategy for MDR and lean QMS for MDR startups.
Reality Check — Where do you stand?
- Do you have a written procedure in your QMS that tells an engineer, today, how to classify an AI retraining event as minor or significant — with criteria, evidence requirements, and sign-off roles?
- Do you have a frozen reference test battery that every retrained model has to pass before it reaches patients, and is the battery comprehensive enough to detect subgroup regressions?
- For every retraining event in the last twelve months, can you reconstruct what changed, why, and what evidence supported the significance decision — from the technical file alone?
- Is the validated operating envelope for your device written down in enough detail that someone could tell whether a proposed update stays inside it or falls outside?
- Have you had a conversation with your Notified Body about your expected retraining cadence, and do they know what to expect before the first change notification arrives?
- If your answer to the cadence question involves a PCCP, is the envelope specific enough — parameters, triggers, bounds, revalidation — that a cold reviewer could classify updates against it?
- Does your risk file get re-reviewed after every material retraining, or only at annual review?
Frequently Asked Questions
Does every retraining of an AI medical device trigger a new conformity assessment? No. Retraining is a change to the device, and the manufacturer has to classify it by significance under the QMS change control process. Changes that affect intended purpose, risk, performance, input characteristics, or the validated operating envelope are significant and have to be notified to the Notified Body before implementation. Changes that stay strictly inside the validated envelope, with documented evidence, can be handled as minor.
What is a significant change for an AI/ML device? A significant change is one that materially affects intended purpose, the risk profile, the performance characteristics, the input data expectations, or the validated operating envelope of the certified device. Under Annex IX Sections 2.4 and 4.10, significant changes to the approved QMS or to an approved design have to be assessed by the Notified Body where they could affect conformity with the Regulation. The significance determination is a regulated decision, documented with evidence, not an engineering judgement call.
Can I handle retraining inside the certified device without notifying the Notified Body? Only if the retraining stays strictly inside the validated envelope documented in the technical file, if you have evidence from the reference test battery that the updated model does not change the performance, risk, or behaviour beyond the bounds stated in the documentation, and if your QMS change control procedure classifies the change as minor with sign-off recorded. If any of those conditions is not met, the change is either significant or undetermined, and undetermined means you do not yet have enough evidence to ship.
What is a predetermined change control plan and do I need one? A PCCP is a document submitted with the initial conformity assessment that pre-specifies a bounded envelope of future model updates the manufacturer is authorised to make without a fresh assessment for each one. The envelope defines which parameters can change, on what triggers, within what performance bounds, with what revalidation. You need one if your business case depends on a release cadence faster than the standard change notification process can support. You do not need one if a periodic release cadence through normal change control meets your needs.
What evidence does a Notified Body expect for an AI change management decision? A description of what changed, the reason, the training data snapshot identifier, validation results against the reference test battery including subgroup metrics, comparison against the performance bounds stated in the technical documentation, a risk file review noting any new or changed hazards, the significance determination with justification, and sign-off by the roles defined in the QMS. EN ISO 13485:2016+A11:2021 Clause 7.3.9 is the design change control expectation the auditor will measure your process against.
Is bug-fix retraining treated differently from performance-improvement retraining? Not structurally. The significance decision depends on the impact of the updated model, not on the intention behind the update. A bug fix that changes behaviour outside the validated envelope is significant; a performance improvement that stays inside the envelope is minor. Document both with the same rigour.
How fast can a significant change actually go through the Notified Body? Variable, and the honest planning answer is weeks to months depending on scope, the current Notified Body workload, and whether additional evidence is requested. This is why the release cadence has to be planned against realistic Notified Body turnaround, not the engineering team's wishful pace, and why early negotiation of a PCCP is the highest-leverage move for teams that expect frequent updates.
Related reading
- AI Medical Devices Under MDR: The Regulatory Landscape in 2026 — the pillar post framing the full AI MedTech regulatory picture.
- Locked vs. Adaptive AI Algorithms Under MDR — the companion piece on why locked is the default and how the envelope concept works.
- Continuous Learning AI and MDR in 2026 — the deeper dive on post-market learning pathways.
- Software Updates and MDR Change Management — the general software change control framing this post specialises.
- QMS Change Control Procedures for MedTech Startups — the QMS clause-level treatment of design change control.
- Post-Market Surveillance for AI Medical Devices — drift detection and its interaction with change control.
- Model Validation and Verification for AI Medical Devices — the reference test battery that underpins every significance decision.
- Notified Body Communication Strategy for AI MedTech — how to structure the early conversation about your change control approach.
- The Subtract to Ship Framework for MDR Compliance — the methodology that runs through every post in this blog.
Sources
- Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices — Article 10(9) (quality management system obligation), Annex I Section 17 (electronic programmable systems and software requirements), Annex IX Section 2.4 (changes to the approved QMS), Annex IX Section 4.10 (changes to an approved design). Official Journal L 117, 5.5.2017.
- MDCG 2019-11 Rev.1 — Guidance on Qualification and Classification of Software in Regulation (EU) 2017/745 — MDR and Regulation (EU) 2017/746 — IVDR, October 2019, Revision 1 June 2025.
- EN ISO 13485:2016 + A11:2021 — Medical devices — Quality management systems — Requirements for regulatory purposes. Clause 7.3.9 on design and development changes.
- EN 62304:2006 + A1:2015 — Medical device software — Software life-cycle processes.
- EN ISO 14971:2019 + A11:2021 — Medical devices — Application of risk management to medical devices.
This post is part of the AI, Machine Learning and Algorithmic Devices category in the Subtract to Ship: MDR blog. Authored by Felix Lenhard and Tibor Zechmeister. Change management is the place where most AI MedTech teams discover that the Regulation is not the obstacle — their own lack of a disciplined process is. If the shape of the right significance procedure and the right PCCP envelope for your product is not obvious after reading this post, that is expected: both are bespoke work, and both are the kind of decisions where a sparring partner who has walked other AI MedTech teams through the same Notified Body conversation earns their keep.