An AI/ML medical device in 2027 has to satisfy two Regulations at once. MDR (Regulation (EU) 2017/745) governs the device side: intended purpose under Article 2(1), classification under Annex VIII Rule 11, software lifecycle under Annex I Section 17, clinical evaluation under Article 61, and post-market surveillance under Articles 83-86. The EU AI Act (Regulation (EU) 2024/1689) layers obligations around training data, transparency, human oversight, and documentation on top of that. This checklist walks through the ten areas a startup has to cover before conformity assessment — each one anchored to a specific MDR provision, harmonised standard, or MDCG guidance document, with the AI Act layer called out where it adds something the MDR alone does not.
By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.
TL;DR
- AI/ML medical devices in 2027 are regulated by MDR in full. The AI Act adds a second, horizontal layer of obligations that must be integrated into the existing conformity assessment rather than run in parallel.
- Ten checklist areas cover the practical work: intended purpose and qualification, Rule 11 classification, training data governance, algorithm lifecycle management, risk management for AI-specific failure modes, clinical evaluation, cybersecurity, post-market surveillance with drift detection, the AI Act layer, and technical documentation alignment with Annex II.
- Each item on this checklist traces to a specific MDR article, annex, harmonised standard, or MDCG guidance. Anything that does not trace is not on the checklist.
- The single biggest mistake startups make is treating AI compliance as an add-on. The AI-specific work has to be built into the QMS and the technical file from the start, not bolted on at audit.
- After this checklist, the honest next step is a gap review against your current state. Most AI MedTech startups have three to five items that look covered but are not.
A note before the checklist
This is not a substitute for reading the Regulation. It is a structured walk through the areas where AI/ML medical device startups most often either miss something required or do work that is not required. Every item names the provision it traces to. If an item does not apply to your device because your classification or intended purpose excludes it, take it off your list — that is the Subtract to Ship discipline at work. If an item applies and is missing from your file, it is not optional.
Tibor tells a story from a customer of Flinn.ai — his second company — that frames the mindset this checklist needs. The customer had two people categorising incoming vigilance reports by hand, line by line in Excel. Both of them quit because the work did not match their qualifications. The team replaced the manual layer with an AI pre-categoriser that saved roughly eighty percent of the hours. Useful. Also dangerous: after ten correct pre-categorisations in a row, humans stop reading carefully, and the eleventh one slips through. That pattern is the AI failure mode that runs through every item on this checklist. Wherever a model sits in the workflow — inside the device, or inside the regulatory operations around it — the human who signs still owns the outcome. Design accordingly.
Section 1 — Intended purpose and qualification (MDR Article 2(1), MDCG 2019-11 Rev.1)
The entry point for every medical device is the intended purpose. MDR Article 2(1) defines a medical device by what it is intended to do, not by how it is built. A neural network and a deterministic algorithm answer the same qualification question. MDCG 2019-11 Rev.1 (June 2025) is the guidance document that governs qualification and classification of software in the MDR context and does not carve AI out as a separate category.
Checklist items:
- [ ] Intended purpose is written in a single precise sentence that names who uses the device, on which patient population, for which medical purpose, and in which clinical context.
- [ ] The intended purpose maps to one of the medical purposes listed in Article 2(1) — diagnosis, prevention, monitoring, prediction, prognosis, treatment, alleviation of disease, or the other categories in the definition.
- [ ] Every external artefact that describes the product — website, investor deck, scientific publications, sales materials — is consistent with the intended purpose. No medical claim lives outside the regulated scope.
- [ ] You have applied the qualification decision tree from MDCG 2019-11 Rev.1 explicitly and recorded the result.
- [ ] If the product has non-medical AI features bundled with medical AI features, the boundary between them is documented and the non-medical features are excluded from the regulated scope where possible.
This is the single highest-leverage section of the checklist. A precise intended purpose can drop a device one class, remove a feature from scope entirely, or — in rare cases — move a product out of MDR altogether.
Section 2 — Classification under Rule 11 (MDR Article 51, Annex VIII Rule 11)
MDR Article 51 and Annex VIII govern classification. For AI/ML software that drives or supports clinical decisions, Rule 11 is the rule that applies in almost every case. The default under Rule 11 is Class IIa. The class moves up to IIb when decisions can cause serious deterioration of a person's state of health or a surgical intervention, and up to III when decisions can cause death or irreversible deterioration. Software intended to monitor physiological processes sits at IIa or IIb depending on the criticality of the parameters. Very few AI medical devices are Class I.
Checklist items:
- [ ] The device class is assigned with explicit reference to the specific sub-clause of Rule 11 that applies.
- [ ] The severity of the clinical decision the AI informs has been characterised and documented — routine, serious harm, life-threatening.
- [ ] If the device monitors physiological parameters, the criticality of those parameters is documented with a clinical rationale.
- [ ] The classification rationale has been sanity-checked against examples in MDCG 2019-11 Rev.1.
- [ ] The conformity assessment route has been selected from the annexes referenced by Article 52 and is consistent with the assigned class.
- [ ] A Notified Body has been contacted (Class IIa and above always require Notified Body involvement).
Section 3 — Training data governance and bias
The training data is, in effect, part of the product. A model that is accurate on the training population and inaccurate on a subgroup has a bias that is a hazard under risk management. Under MDR, the obligation to manage this risk sits at the intersection of Annex I Section 17 (software shall be developed according to the state of the art, taking into account the principles of development life cycle and risk management) and the risk management process defined in EN ISO 14971:2019+A11:2021. The AI Act adds explicit obligations around training data quality, representativeness, relevance to the intended use population, and documentation of provenance for high-risk AI systems.
Checklist items:
- [ ] Dataset provenance is documented — where the data came from, under what legal basis, with what consent, and with what rights to use it for model training.
- [ ] The training, validation, and test sets are defined, versioned, and isolated from each other. Test set contamination is ruled out.
- [ ] Representativeness of the training data against the intended use population is analysed and documented — demographics, clinical subgroups, disease severity distribution, device settings, care setting.
- [ ] Known gaps in representativeness are listed and either closed (more data) or declared as limitations in the intended use and instructions for use.
- [ ] Bias testing is run across clinically meaningful subgroups. Results are documented with acceptance criteria defined in advance.
- [ ] A data governance file exists as a distinct section of the technical documentation and is ready for Notified Body review.
- [ ] Data protection compliance (GDPR) is confirmed separately and does not replace the data quality work above.
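The subgroup bias-testing item above reduces to a small, auditable computation: per-subgroup sensitivity and specificity, checked against acceptance criteria that were fixed before the analysis ran. A minimal sketch follows; the record format, subgroup names, and the threshold numbers are illustrative assumptions, not a prescribed method.

```python
from collections import defaultdict

# Acceptance criteria must be defined in advance of the analysis.
# These numbers are hypothetical, not a regulatory requirement.
ACCEPTANCE = {"sensitivity": 0.90, "specificity": 0.85}

def subgroup_metrics(records):
    """Per-subgroup sensitivity/specificity from (subgroup, y_true, y_pred)
    triples, where 1 = condition present. Returns a pass/fail per subgroup."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 1:
            c["tp" if y_pred == 1 else "fn"] += 1
        else:
            c["tn" if y_pred == 0 else "fp"] += 1
    report = {}
    for group, c in counts.items():
        sens = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else None
        spec = c["tn"] / (c["tn"] + c["fp"]) if (c["tn"] + c["fp"]) else None
        report[group] = {
            "sensitivity": sens,
            "specificity": spec,
            "passes": (sens is not None and sens >= ACCEPTANCE["sensitivity"]
                       and spec is not None and spec >= ACCEPTANCE["specificity"]),
        }
    return report
```

The point of the sketch is the shape of the output: one row per clinically meaningful subgroup, each with an explicit pass/fail against the pre-declared criteria, ready to be filed in the data governance section rather than buried in a notebook.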
Section 4 — Algorithm lifecycle management: locked versus adaptive (MDR Annex I Section 17, EN 62304:2006+A1:2015)
Annex I Section 17 requires software to be developed in accordance with the state of the art and the principles of the development lifecycle. EN 62304:2006+A1:2015 is the harmonised software lifecycle standard referenced in the medical device context. It was written before the AI era, but its discipline — requirements, architecture, implementation, verification, release — still applies. The AI-specific question is whether the algorithm is locked or adaptive, and how changes are controlled.
Checklist items:
- [ ] The algorithm's update model is explicitly declared: locked, controlled release updates, or predefined change control plan with pre-approved change envelopes.
- [ ] If updates are allowed, the change envelope is documented before first certification — what can change (weights, thresholds, input features), under what conditions, with what re-validation gates.
- [ ] Model versioning is implemented end-to-end. Every deployed model instance is traceable back to its training dataset version, code version, and hyperparameters.
- [ ] A model registry exists with immutable records of every model that has been released or considered for release.
- [ ] The development lifecycle documented in the technical file matches the lifecycle actually used by the team — no theatre documentation.
- [ ] Continuously learning behaviour without a defined change envelope is explicitly ruled out, because no clean CE marking pathway currently exists for it in the EU.
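The model-registry item above asks for immutable records tying every model to its training dataset version, code version, and hyperparameters. One lightweight way to sketch that, assuming nothing about your tooling: a frozen dataclass whose fields cannot be mutated after creation, plus a stable fingerprint the technical file can cite. All field names here are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass(frozen=True)  # frozen = the record cannot be mutated after creation
class ModelRecord:
    """One registry entry per model released or considered for release.
    Field names are illustrative assumptions, not a mandated structure."""
    model_version: str
    training_dataset_version: str
    code_commit: str
    hyperparameters: tuple  # e.g. sorted (name, value) pairs
    released: bool

    def fingerprint(self) -> str:
        """Deterministic hash of the full record, so the technical
        documentation can reference the exact released configuration."""
        payload = json.dumps(asdict(self), sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]
```

The design choice that matters is `frozen=True`: an entry in the registry is evidence, and evidence that can be silently edited is not evidence a Notified Body will accept.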
Section 5 — Risk management for algorithmic failure modes (EN ISO 14971:2019+A11:2021)
EN ISO 14971 is the harmonised risk management standard. For AI/ML devices, the risk file has to identify and control failure modes that classical software does not have. A risk file copy-pasted from a non-AI template will miss them.
Checklist items:
- [ ] Bias failure modes are identified (systematic under-performance on a subgroup) and controlled — through data work, through intended use restrictions, through labelling, or through a combination.
- [ ] Distribution shift is identified as a hazard and monitored in post-market surveillance with defined thresholds.
- [ ] Adversarial robustness is assessed at a level proportionate to the clinical risk and the deployment context.
- [ ] Explainability gaps are identified as a hazard where the clinician's ability to appropriately trust or distrust the output depends on interpretability.
- [ ] Automation complacency — clinicians rubber-stamping the AI after repeated correct outputs — is identified as a use-error hazard with controls in the instructions for use and training materials.
- [ ] The risk-benefit determination in the risk file aligns with the clinical evaluation conclusions.
- [ ] Residual risks are communicated in the instructions for use.
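The checklist above implies a structural property of the risk file: every AI-specific hazard needs at least one control and a residual-risk disposition. As a rough sketch, that property can even be machine-checked before an audit. The rows and control names below are illustrative examples drawn from this section, not an exhaustive hazard analysis.

```python
# Illustrative AI-specific hazard rows in roughly the shape an
# EN ISO 14971 risk file might hold them. Not exhaustive, not a schema.
AI_HAZARDS = [
    {"hazard": "subgroup bias",
     "controls": ["bias testing", "intended use restriction"],
     "residual_risk_disclosed": True},
    {"hazard": "distribution shift",
     "controls": ["PMS drift monitoring with thresholds"],
     "residual_risk_disclosed": True},
    {"hazard": "automation complacency",
     "controls": ["mandatory spot-check rate", "training materials"],
     "residual_risk_disclosed": True},
]

def incomplete_rows(hazards):
    """Flag hazards with no control or no residual-risk disposition -- the
    two gaps a template built for hardware devices most often leaves open."""
    return [h["hazard"] for h in hazards
            if not h["controls"] or not h.get("residual_risk_disclosed")]
```

A check like this does not replace the risk analysis; it only catches the bookkeeping failure mode where a hazard was named and then never closed out.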
Section 6 — Clinical evaluation for AI performance (MDR Article 61, Annex XIV)
Clinical evaluation under Article 61 and Annex XIV does not change conceptually for AI, but the evidence content has to address AI-specific questions. Literature and equivalence are rarely sufficient on their own because the specific model is new; a retrospective performance study on an independent dataset or a prospective clinical investigation is usually required.
Checklist items:
- [ ] Clinical performance is measured on an independent test set that was not used for model development in any form.
- [ ] Performance metrics match the clinical question — sensitivity, specificity, positive and negative predictive values for the prevalence in the intended use population, not for the training set prevalence.
- [ ] Subgroup performance is reported with the same metrics, for every clinically meaningful subgroup identified in the risk management file.
- [ ] Failure modes are characterised — where the model fails, whether it fails silently or loudly, and what the downstream clinical consequences look like.
- [ ] The clinician-in-the-loop effect is addressed where the device is decision-support: how do clinicians use the output, and does the net clinical outcome with clinician plus model exceed clinician alone?
- [ ] The clinical evaluation report (CER) is aligned with the risk management file and the intended use statement.
- [ ] The state of the art is documented with current literature and the device is positioned against it.
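The predictive-values item above is worth making concrete, because it is where clinical evaluations most often overstate performance: PPV and NPV depend on prevalence, and a balanced test set says little about a low-prevalence intended use population. Bayes' theorem makes the correction mechanical. The numbers below are hypothetical, chosen only to show the size of the effect.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV at a given prevalence, via Bayes' theorem on the
    expected fractions of true/false positives and negatives."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# The same hypothetical model at two prevalences: a balanced test set
# versus a 2% intended-use population.
ppv_test, _ = predictive_values(0.95, 0.90, prevalence=0.50)   # PPV ~0.90
ppv_field, _ = predictive_values(0.95, 0.90, prevalence=0.02)  # PPV ~0.16
```

A model that looks excellent on a 50/50 test set flags mostly false positives at 2% prevalence. That is exactly why the checklist asks for metrics at the intended-use prevalence, not the training-set prevalence.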
Section 7 — Cybersecurity (EN IEC 81001-5-1:2022, MDCG 2019-16)
Cybersecurity for AI medical devices sits inside the general cybersecurity obligation for health software. EN IEC 81001-5-1:2022 is the harmonised standard for security activities across the health software lifecycle. MDCG 2019-16 is the guidance document on cybersecurity for medical devices. AI systems introduce threats that classical software does not — model theft, adversarial inputs, data poisoning during updates — and those threats belong in the same security lifecycle as every other software threat.
Checklist items:
- [ ] A security risk assessment exists and covers the full lifecycle, consistent with EN IEC 81001-5-1:2022.
- [ ] AI-specific threats are listed — model extraction, adversarial input, data poisoning on update pipelines, prompt injection for generative components, supply-chain risk in pre-trained models.
- [ ] Controls against each threat are implemented and documented.
- [ ] Secure development practices (code review, dependency management, vulnerability scanning) are part of the software lifecycle and match what EN 62304 and EN IEC 81001-5-1 expect.
- [ ] A vulnerability disclosure and response process exists and feeds into the PMS system.
- [ ] Cybersecurity aspects are reflected in the technical documentation and in the instructions for use where the user needs to act.
- [ ] The cybersecurity assessment is consistent with MDCG 2019-16 expectations.
Section 8 — Post-market surveillance with drift detection (MDR Articles 83-86)
MDR Articles 83 through 86 require every manufacturer to operate a post-market surveillance system proportionate to the risk class and appropriate for the device. For AI, "appropriate" means drift detection is part of the PMS system. A classical device does not change behaviour because the field changed. An AI device effectively does — not because the model moved, but because the input distribution shifted.
Checklist items:
- [ ] A PMS plan exists and names the AI-specific monitoring activities, not only generic complaint handling.
- [ ] Input distribution monitoring is instrumented — the system tracks whether real-world inputs stay within the training distribution and flags drift.
- [ ] Output distribution monitoring is instrumented — the system tracks whether model outputs stay within expected ranges over time.
- [ ] Clinical outcome monitoring is implemented where feasible, closing the loop between model output and real-world consequence.
- [ ] Drift thresholds are defined in advance with escalation actions — investigation, retraining, field safety corrective action — tied to each threshold.
- [ ] The PMS system is actually running (not only documented). A PMS system that exists on paper and nowhere else is a finding waiting to happen.
- [ ] Mandatory spot-check rates are defined for any AI-assisted review step inside the PMS workflow, to counter automation complacency.
- [ ] Feedback from the PMS system into the risk management file and the technical documentation is a defined loop, not an ad hoc action.
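To make the input-distribution monitoring item concrete: one common, simple drift statistic is the Population Stability Index (PSI), computed between a training-time baseline histogram and the field histogram over the same bins. The sketch below uses the conventional PSI rule-of-thumb bands; those bands and the escalation wording are illustrative assumptions — your PMS plan must define its own thresholds with escalation actions tied to each.

```python
import math

# Conventional PSI bands (illustrative only); highest matching band wins.
THRESHOLDS = [
    (0.25, "investigate / consider retraining"),
    (0.10, "monitor closely"),
    (0.00, "no action"),
]

def psi(expected_fracs, observed_fracs, eps=1e-6):
    """Population Stability Index between a baseline (training-time)
    distribution and a field distribution over the same bins."""
    total = 0.0
    for e, o in zip(expected_fracs, observed_fracs):
        e, o = max(e, eps), max(o, eps)  # guard against empty bins
        total += (o - e) * math.log(o / e)
    return total

def escalation(psi_value):
    """Map a PSI value to the pre-defined escalation action."""
    for threshold, action in THRESHOLDS:
        if psi_value >= threshold:
            return action
```

The instrumentation point is the loop around this function: it has to run on live field data on a schedule, and its output has to land somewhere the PMS process actually reads, or the plan is paper only.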
Section 9 — The AI Act layer (Regulation (EU) 2024/1689)
The EU AI Act is a horizontal Regulation that adds a second layer on top of MDR for AI systems used in safety-critical contexts, including many medical devices. The AI Act expects its obligations to be integrated into the existing sectoral conformity assessment rather than duplicated as a parallel process. The detailed operational interface between MDR and the AI Act is still being clarified by the Commission, the Medical Device Coordination Group, Notified Bodies, and AI Act governance bodies, and we are deliberately careful not to cite specific AI Act article numbers where the mapping into MDR practice is not yet settled. The checklist items below reflect the areas of obligation named by the AI Act at the general level and how they map into the existing MDR technical file.
Checklist items:
- [ ] The status of the AI system under the AI Act is assessed — whether it falls within the scope of the Regulation and, if so, in which risk tier.
- [ ] Transparency to users — the fact that they are interacting with an AI system — is addressed in the instructions for use and in the user interface where relevant.
- [ ] Human oversight requirements are designed into the workflow appropriate to the clinical use context. Who oversees, with what information, with what ability to intervene.
- [ ] Training data documentation is at a level that would satisfy the AI Act's data governance expectations as well as the MDR risk-based reading of Annex I Section 17.
- [ ] Technical documentation specific to the AI system is ready as a distinct section or annex that an auditor can locate without reading the entire file.
- [ ] The integration of AI Act obligations into the MDR conformity assessment has been discussed with the Notified Body in advance, because the operational mechanics are still being defined in 2026.
- [ ] The project tracks AI Act obligations as a live list that will be updated as official guidance solidifies. Do not freeze the AI Act picture as of one point in time.
Section 10 — Technical documentation alignment with Annex II
The MDR technical documentation is set out in Annex II. For an AI/ML device, every item in the checklist above has to land somewhere in the Annex II structure, in a way an auditor can navigate.
Checklist items:
- [ ] Device description and specification (Annex II section on device description) names the AI components, their intended purpose, and their role in the overall device function.
- [ ] Information to be supplied by the manufacturer (labels, IFU) reflects the AI-specific content — intended use population, limitations, human oversight requirements, known failure modes, transparency statements.
- [ ] Design and manufacturing information covers the software lifecycle, the model development process, and the data governance file.
- [ ] GSPR checklist (Annex I) is filled out with AI-relevant references where applicable, in particular Section 17.
- [ ] Benefit-risk analysis and risk management file align with the clinical evaluation and include AI-specific failure modes.
- [ ] Product verification and validation include AI-specific verification (model performance on the test set) and validation (clinical evaluation).
- [ ] The clinical evaluation report is indexed in the technical file and is current.
- [ ] The PMS plan and PSUR/PMS report structure are in place and reference the drift detection work.
The Subtract to Ship angle
The Subtract to Ship framework runs the same four passes on AI/ML medical devices as on any other device: Purpose, Classification, Evidence, Operations. For AI, the passes cut in places where founders are most likely to over-build.
The Purpose Pass cuts AI features that do not need to be medical devices. An AI feature that generates internal documentation is not a medical device. An AI feature that summarises clinical notes for non-medical display is usually not. Only the AI features that actually perform a medical function under Article 2(1) belong inside the regulated envelope.
The Classification Pass cuts the reflex assumption that any AI decision-support tool is IIb. Rule 11 has gradations. A precise intended purpose and a careful reading of the severity of the supported decision can legitimately place a device at IIa where the founder assumed IIb, and that difference changes the conformity assessment cost materially.
The Evidence Pass cuts duplicated clinical evidence. A solid retrospective performance study on an independent dataset plus targeted literature plus subgroup analysis can satisfy the clinical evaluation for many Class IIa AI devices. Adding a prospective clinical investigation on top, when the risk class does not require it, is added work that does not trace to an obligation.
The Operations Pass cuts the overbuilt QMS. A small team with an AI device does not need the QMS of a hospital group. It needs a QMS that covers the AI-specific processes — data governance, model versioning, drift monitoring, revalidation triggers — sitting on top of an EN ISO 13485-aligned process backbone and nothing more.
Everything on this checklist traces to a specific obligation. Everything not on this checklist is either not required or is optional additional rigour that the founder has chosen deliberately with their eyes open.
Reality Check — Where do you stand?
- Can you, right now, name every item on this checklist that is fully covered in your current technical file, and every item that is not?
- Do you have a single precise sentence for the intended purpose of your AI feature, mapped to a specific medical purpose in Article 2(1)?
- Is your Rule 11 classification documented with the specific sub-clause, or is it assumed?
- Do you have a data governance file as a distinct section of your technical documentation, or is the data work scattered across team notebooks?
- Have you declared, in writing, whether your algorithm is locked or has a predefined change envelope?
- Does your risk management file list AI-specific failure modes — bias, drift, adversarial robustness, explainability, automation complacency — or was it built from a template written for hardware devices?
- Does your clinical evaluation report include subgroup performance on an independent test set?
- Is your PMS plan instrumented for drift detection with thresholds and escalation paths, or is it passive complaint handling?
- Have you had a first conversation with your Notified Body about the integration of AI Act obligations into the MDR conformity assessment?
- If your Notified Body asked "why is this AI safe and effective in its intended use population," do you have an answer that does not reduce to "because we tested it"?
Frequently Asked Questions
Is this checklist official? No. It is a practitioner checklist compiled by Tibor Zechmeister and Felix Lenhard for the Subtract to Ship: MDR blog. Every item is anchored to a specific MDR article, annex, harmonised standard, or MDCG guidance document — the items are real obligations, but the checklist format and the selection of ten areas is our editorial choice, not a regulator-issued document.
Does an AI/ML medical device need a separate AI Act conformity assessment on top of MDR? Not as a duplicated parallel process. The AI Act expects its obligations to be integrated into the existing sectoral conformity assessment for products already covered by harmonisation legislation — medical devices under MDR included. The practical mechanics of that integration are still being clarified between the Commission, the Medical Device Coordination Group, Notified Bodies, and AI Act governance bodies in 2026. Startups should discuss the specific approach with their Notified Body.
Can a startup skip the data governance file if the model was trained on public datasets? No. The data governance obligation does not depend on where the data came from. Public datasets still need provenance documentation, representativeness analysis against the intended use population, test set isolation, and bias testing. The fact that a dataset is public does not mean it is representative of the patients your device will see.
Can a continuously learning AI medical device be certified in the EU? Not cleanly in 2026. The MDR framework assumes a defined device configuration at the point of placing on the market and triggers re-assessment for significant changes. A fully autonomous continuously-learning model without a defined change envelope does not fit that framework. The practical pathway today is a locked algorithm or a predefined change control plan that specifies in advance how and when updates can occur.
How long does it take to cover this checklist end to end for a first-time AI MedTech startup? That depends on the starting state, the class, and the team. A Class IIa SaMD with a capable small team, starting from a clear intended purpose and a reasonable QMS, can run the full checklist over several months of focused work alongside product development. A Class IIb device with complex clinical subgroups and extensive bias testing takes longer. Any consultant who gives you a confident number without seeing your specifics is selling, not estimating.
What is the single most common gap on this checklist? Post-market drift detection. Startups document it in the PMS plan and then fail to instrument it in the running system. An auditor who asks to see the drift monitoring output often gets an answer that starts with "we're working on that," which is a finding.
Related reading
- AI in Medical Devices Under MDR: The Regulatory Landscape in 2026 — the pillar post that frames the AI MedTech field.
- What Is Software as a Medical Device (SaMD) Under MDR? — the broader SaMD context.
- Classification of AI and ML Software Under Rule 11 — the practical walk-through of Annex VIII Rule 11 for AI products.
- Machine Learning Medical Devices Under MDR — ML development under MDR discipline.
- Locked Versus Adaptive AI Algorithms Under MDR — the open question on continuous learning.
- Clinical Evaluation for AI and ML Medical Devices — the evidence expectations specific to AI products.
- The EU AI Act and MDR: How the Two Regulations Interact — the dedicated post on the two-Regulation overlap.
- Post-Market Surveillance for AI Medical Devices — drift detection and operational PMS patterns.
- AI in Post-Market Surveillance: Complaint Analysis — deeper dive on the complaint workflow.
- Training Data Governance for AI Medical Devices — the data governance file in detail.
- Bias Testing for AI Medical Devices — subgroup analysis in practice.
- How Flinn.ai and AI Tools Are Transforming Regulatory Work for Startups — the other side of the equation: AI inside the regulatory process.
- The Subtract to Ship Framework for MDR Compliance — the methodology that runs through every post in this blog.
Sources
- Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, Article 2(1) (definition of medical device), Article 51 (classification), Article 61 (clinical evaluation), Articles 83-86 (post-market surveillance), Annex I (GSPR, in particular Section 17 on electronic programmable systems and software), Annex II (technical documentation), Annex VIII (classification rules, in particular Rule 11). Official Journal L 117, 5.5.2017, consolidated text.
- Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Referenced by name for the general AI Act layer. Specific article references are intentionally not cited where the operational mapping into MDR practice is not yet settled; founders should consult the official text on EUR-Lex.
- MDCG 2019-11 Rev.1 — Guidance on Qualification and Classification of Software in Regulation (EU) 2017/745 — MDR and Regulation (EU) 2017/746 — IVDR, October 2019, Revision 1 June 2025.
- MDCG 2019-16 — Guidance on Cybersecurity for medical devices, December 2019 and subsequent revisions.
- EN 62304:2006 + A1:2015 — Medical device software — Software life-cycle processes.
- EN ISO 14971:2019 + A11:2021 — Medical devices — Application of risk management to medical devices.
- EN IEC 81001-5-1:2022 — Health software and health IT systems safety, effectiveness and security — Part 5-1: Security — Activities in the product life cycle.
This post is part of the AI, Machine Learning & Algorithmic Devices series in the Subtract to Ship: MDR blog. Authored by Felix Lenhard and Tibor Zechmeister. This checklist will be updated as the operational interface between MDR and the EU AI Act is clarified by the Commission and the Medical Device Coordination Group. If the general framing here does not resolve your specific device — and for a novel AI product, it often will not — that is expected: the domain is complex, every device is different, and that is exactly where a sparring partner who has walked other AI MedTech founders through the same decisions earns their keep.