Software as a medical device sometimes requires a pre-market clinical investigation and sometimes does not. Under MDR Article 61, the clinical evaluation of SaMD must generate sufficient clinical evidence for the intended purpose — but the evidence can come from scientific literature, equivalence under MDCG 2019-11 Rev.1, a performance study on retrospective data, or a prospective clinical investigation under MDR Article 62 and Annex XV. A prospective investigation is mandatory in practice for Class III SaMD, for novel Class IIb SaMD whose clinical claims cannot be supported any other way, and for any SaMD where the intended purpose depends on a clinical outcome that has never been measured on the target population with the specific algorithm. For most Class IIa SaMD, a well-designed retrospective performance study against a locked test set, combined with a structured literature review and a PMCF plan, is the disciplined path. The wrong answer is not the expensive answer — it is the one that does not match the scientific question the Notified Body is going to ask.

By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.


TL;DR

  • SaMD clinical evidence sits under MDR Article 61 and Annex XIV exactly as any other device. There is no software carve-out.
  • A prospective clinical investigation under MDR Article 62 and Annex XV is one source of evidence, not the default. For most Class IIa SaMD, it is not the first move.
  • Retrospective performance studies on locked test sets are a legitimate evidence source for diagnostic and prognostic SaMD, and usually cheaper and faster than a prospective trial.
  • Any study that deploys the software on patients to generate new safety or performance data is a clinical investigation under Article 2(45) and triggers the full framework — ethics approval, sponsor obligations, Article 80 reporting, EN ISO 14155:2020+A11:2024 compliance.
  • Class III SaMD classified under Annex VIII Rule 11 almost always requires a prospective clinical investigation under MDR Article 61(4).
  • MDCG 2019-11 Rev.1 (June 2025) is the authoritative guidance on SaMD qualification and classification, and sets the frame for what clinical evidence a Notified Body will expect.
  • EN 62304:2006+A1:2015 governs the software lifecycle under MDR Annex I Section 17. It is not a substitute for clinical evidence — it is the floor on which the clinical argument stands.

A Salzburg SaaS story: the Phase 1 scope cut

A company we worked with in Salzburg arrived with a SaMD built around a machine learning model that scored risk on routine clinical data. The founders had already been told by a consultant that the path to CE marking would involve a prospective multi-site clinical investigation with several hundred patients, two years of recruitment, and a budget that did not exist on the runway they had. They were three weeks away from a board meeting where the company would either pivot out of MedTech entirely or raise a bridge they did not believe they could close.

The question we asked before accepting the plan was the same question we always ask. What is the single scientific claim the clinical evaluation has to support, and what is the cheapest legitimate way to support it under MDR Article 61? The claim, written honestly, turned out to be narrower than the intended purpose the consultant had written. The algorithm did not need to prove it changed patient outcomes in Phase 1. It needed to prove that its risk score, on an independent and locked test set drawn from the target population, matched the clinical reference standard with a pre-specified performance metric, within a pre-specified confidence interval.

That is a retrospective performance study, not a prospective clinical investigation. It can be built from existing annotated data, run in months instead of years, and the results feed directly into a clinical evaluation under MDR Article 61 and Annex XIV Part A. The prospective outcome-level evidence — the part that costs millions and takes years — becomes a post-market clinical follow-up commitment under Annex XIV Part B, not a pre-market blocker. The Salzburg company shipped Phase 1 on a fraction of the original budget. The PMCF commitment was real and they are executing it now. Nothing was skipped. Everything was sequenced correctly.

That sequencing is the subject of this post.

When a SaMD actually needs a clinical investigation

A SaMD needs a prospective clinical investigation under MDR Article 62 and Annex XV in three situations. The rest of the time, a different evidence route is probably the right answer.

Situation 1 — The device is Class III under Annex VIII Rule 11. Software that provides information used to make decisions for diagnostic or therapeutic purposes lands in Class IIa by default under Rule 11 and escalates to IIb or III depending on the seriousness of the decision. A Class III SaMD falls under the general rule of MDR Article 61(4): clinical investigations shall be performed, unless one of the narrow exemptions applies. For Class III SaMD, a prospective investigation is the starting assumption, not an optional addition.

Situation 2 — The clinical claim cannot be supported by existing data. If the intended purpose involves a patient population that has not been studied, a clinical outcome that has not been measured, or an interaction between the algorithm and clinical workflow that no published study addresses, literature and retrospective data will not close the evidence gap. A targeted prospective investigation may be the only honest path to the evidence the clinical evaluation actually needs.

Situation 3 — The Notified Body will not accept the alternatives. Even when a literature-plus-retrospective route is technically defensible, some Notified Bodies — based on the device category and their own experience — will require prospective evidence. This conversation happens early with the Notified Body, not after the clinical evaluation report is written. Finding out at the review stage that your evidence base is not acceptable is the most expensive way to learn the lesson.

For most Class IIa SaMD, and for a meaningful share of Class IIb SaMD, none of the three situations apply. The evidence can be assembled from literature, retrospective performance studies, and recognised standards. The disciplined question is not "do we run a clinical investigation." It is "what is the minimum evidence base that supports the intended purpose, and what is the cheapest legitimate way to assemble it."

Retrospective versus prospective evidence for software

The distinction between retrospective and prospective evidence matters more for SaMD than for almost any other device category, because software — especially diagnostic and prognostic algorithms — can often be validated against data that already exists.

Retrospective performance study. The algorithm is run on a locked, independent test set drawn from the target population. The test set was collected before the study started, was never seen by the model during training or tuning, and is representative of the patient population in the intended purpose. The reference standard — ground truth — is established independently of the algorithm's output. Performance metrics (sensitivity, specificity, AUC, calibration, and whatever the clinical question requires) are pre-specified with confidence intervals. The study protocol is written and signed before the data is touched.
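The pre-specified metrics and intervals need nothing more than the confusion-matrix counts from the locked test set. A minimal sketch using Wilson score intervals — one common choice for binomial proportions; the protocol, not the code, decides the interval method:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - half, centre + half)

def diagnostic_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Sensitivity and specificity with 95% Wilson CIs.

    y_true: reference standard (ground truth), established independently
    of the algorithm; y_pred: output of the locked model.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }
```

The point of the sketch is the discipline, not the arithmetic: the metrics, the interval method, and the acceptance criteria are all fixed in the signed protocol before this code ever sees the test set.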

This is not a clinical investigation under MDR Article 2(45). No human subject is enrolled. No new data is generated on patients. The study operates on data that already exists. It does not itself trigger ethics committee approval, although the underlying data must have been ethically collected, with appropriate consent and data protection under the applicable national and EU law.

Prospective clinical investigation. The algorithm is deployed in a clinical setting on subjects enrolled specifically for the study, new data is generated, and the study measures either algorithm performance, clinical outcomes, or clinical workflow effects in real time. This is a clinical investigation under MDR Article 2(45) the moment the first enrolled subject interacts with the device for the purpose of generating safety or performance data. The full framework applies — ethics committee approval, competent authority notification under MDR Article 70, sponsor obligations under Article 72, adverse event reporting under Article 80, end-of-investigation reporting under Article 77, and execution under EN ISO 14155:2020+A11:2024.

For many SaMD, a staged approach works. Retrospective performance on locked test sets for pre-market evidence; prospective outcome-level evidence committed as PMCF under Annex XIV Part B. This is the pattern the Salzburg story illustrates and the pattern MDCG 2019-11 Rev.1 recognises as a legitimate way to generate sufficient clinical evidence when it is rigorously executed.

Performance studies as an alternative under MDCG 2019-11 Rev.1

MDCG 2019-11 Rev.1 (October 2019, revised June 2025) is the authoritative guidance on qualification and classification of software as a medical device under the MDR and IVDR. The guidance recognises that the clinical evaluation for Medical Device Software has three distinct components: the valid clinical association between the software output and the targeted clinical condition, the technical performance of the algorithm, and the clinical performance of the software in the intended use context.

A retrospective performance study, rigorously designed, addresses the first two components directly and contributes partially to the third. Combined with a structured literature review that establishes the valid clinical association — the scientific evidence that the input data and the clinical condition are meaningfully linked — the performance study can carry a substantial share of the clinical evaluation without a new prospective investigation.

What a disciplined performance study requires: a pre-specified protocol, a locked and independent test set, a clearly defined reference standard, pre-specified performance metrics with acceptance criteria, a statistical analysis plan, subgroup analysis for relevant patient populations, and honest reporting of limitations. The study is documented, archived, and integrated into the clinical evaluation report under MDR Article 61 and Annex XIV Part A.

This is not a cheaper version of a clinical investigation. It is a different kind of study, suited to a specific kind of evidence question. Done well, it answers the technical and analytical performance question better than a prospective clinical investigation would. Done poorly — cherry-picked test sets, post-hoc metric selection, reference standards that leak into the training data — it answers nothing and will be rejected by any competent Notified Body.

Ethics and sponsor obligations for software studies

The mistake we see most often is the assumption that, because the device is software, the ethics and sponsor framework does not apply. It does. The framework attaches to the activity, not the technology.

A retrospective performance study on already-collected, anonymised or pseudonymised data typically does not require a new ethics committee approval for the study itself, but the underlying data must have been collected under valid ethical approval and consent, and the use of the data for secondary research must comply with the applicable data protection law (GDPR and national implementations) and any conditions attached to the original consent. A data protection impact assessment is usually required. Institutional review may be required by the data provider even if no new ethics submission is needed.

A prospective clinical investigation on SaMD — the moment a subject is enrolled and the software is run on their data for the purpose of generating new safety or performance evidence — requires full ethics committee approval under the national procedure, competent authority notification under MDR Article 70 where the Regulation requires it, and execution under EN ISO 14155:2020+A11:2024. The sponsor obligations under MDR Article 72 apply identically to a SaMD sponsor as to a hardware sponsor. Adverse event reporting under Article 80 applies. End-of-investigation reporting under Article 77 applies. The fact that the device is code rather than metal does not reduce any obligation.

The red line is the same as for hardware. No deployment of a SaMD on patients for the purpose of generating safety or performance data without the full approved framework. "Internal pilot," "friendly hospital test," "silent mode on live data to see how it performs" — none of these are legal workarounds. A silent-mode evaluation that compares algorithm output against clinician decisions in a live clinical setting is a clinical investigation the moment its purpose is to generate evidence about the device. Dress it up however you like; the MDR reads it the same way.

Lean investigation design for SaMD

When a prospective investigation is genuinely needed, the lean design principles from our operational post How to Run a Lean Clinical Investigation as a Startup with Limited Budget apply directly, with software-specific adjustments.

Define one scientific question and one primary endpoint. For SaMD, the endpoint is usually an algorithm performance metric measured in the clinical workflow, or a clinical decision accuracy compared to a reference standard or to standard-of-care decisions. Every extra endpoint must defend itself against the protocol, the ethics committee, and the budget.

Size the study to the statistical minimum. A biostatistician familiar with diagnostic accuracy studies calculates sample size from the expected performance, the required confidence interval, and the prevalence of the clinical condition. The number from the calculation, plus a dropout buffer, is the protocol number. Not double.
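To illustrate the shape of that calculation — not a substitute for the biostatistician — here is a minimal Buderer-style sketch for a diagnostic accuracy study. The target sensitivity, specificity, prevalence, CI half-width, and dropout figures in the example are hypothetical:

```python
import math

def sample_size_diagnostic(sens: float, spec: float, prevalence: float,
                           ci_half_width: float, dropout: float = 0.1,
                           z: float = 1.96) -> int:
    """Total subjects needed so that both sensitivity and specificity are
    estimated within +/- ci_half_width at 95% confidence (Buderer-style).
    Cases drive the sensitivity estimate, controls the specificity one;
    the total is set by whichever requirement is harder to satisfy."""
    n_cases = (z**2 * sens * (1 - sens)) / ci_half_width**2
    n_controls = (z**2 * spec * (1 - spec)) / ci_half_width**2
    total = max(n_cases / prevalence, n_controls / (1 - prevalence))
    return math.ceil(total / (1 - dropout))  # inflate for expected dropout

# Hypothetical inputs: 90% sensitivity, 85% specificity, 30% prevalence,
# +/- 5 percentage points, 10% dropout.
n = sample_size_diagnostic(0.9, 0.85, 0.3, 0.05)  # -> 513 subjects
```

Note how the prevalence term dominates: a rare condition inflates the total far faster than the accuracy targets do, which is exactly why the number comes from the calculation and not from a round figure in a slide deck.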

Choose one or two clinical partner sites, recruited on day one. The principal investigator should understand the algorithm and the intended workflow, not just the clinical area. A site whose data the model has already seen during development is disqualified — the test data must be independent.

Lock the model before the study starts. A clinical investigation on a moving target is not a clinical investigation. The model version under investigation is fixed, version-controlled under EN 62304:2006+A1:2015 software configuration management, and unchanged for the duration of the study. Any update to the model during the study terminates the investigation and starts a new one.
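One lightweight way to make the lock verifiable — a sketch of a practice that fits under configuration management, not a mechanism EN 62304 itself prescribes — is to fingerprint the released artifacts and record the hash in the protocol. The file names below are hypothetical:

```python
import hashlib
from pathlib import Path

def artifact_fingerprint(paths: list[Path]) -> str:
    """SHA-256 over the sorted model artifacts (e.g. weights, preprocessing
    config, decision thresholds). Recorded in the investigation protocol;
    any mismatch during the study means the device under investigation
    has changed and the investigation is no longer valid."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.name.encode())   # bind file identity, not just bytes
        digest.update(path.read_bytes())
    return digest.hexdigest()

# Hypothetical release bundle:
# fingerprint = artifact_fingerprint([Path("weights.bin"), Path("thresholds.json")])
```

Checking the fingerprint at every site, on every study day, is cheap. Explaining to an ethics committee why the model changed mid-study is not.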

Plan PMCF from day one. Whatever the prospective pre-market investigation does not address is a PMCF commitment under Annex XIV Part B. For SaMD, PMCF typically includes drift monitoring, ongoing performance tracking on representative data, subgroup performance surveillance, and a clear trigger for corrective action if performance degrades. The pre-market investigation and the PMCF plan are designed together, not sequentially.
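A PMCF performance trigger can be as simple as a pre-specified rule over rolling post-market results. A minimal sketch for a sensitivity criterion — the acceptance threshold and window size below are illustrative, not normative; the real values come from the clinical evaluation:

```python
import math

def wilson_lower(successes: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half

def pmcf_trigger(tp: int, fn: int, acceptance: float = 0.85,
                 min_window: int = 50) -> bool:
    """Fire corrective action when the lower confidence bound of rolling
    post-market sensitivity drops below the acceptance criterion set in
    the clinical evaluation. acceptance=0.85 and min_window=50 are
    placeholder values for illustration."""
    n = tp + fn
    if n < min_window:
        return False  # too few post-market cases to judge yet
    return wilson_lower(tp, n) < acceptance
```

Using the lower confidence bound rather than the point estimate makes the trigger conservative: it fires when the evidence no longer supports the claimed performance, not merely when one bad week dents the average.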

Common founder errors

Assuming the software is exempt from clinical evaluation. No medical device is exempt. MDR Article 61 applies to every device on the EU market. A SaMD without a clinical evaluation is a SaMD without a legal basis for CE marking.

Confusing EN 62304:2006+A1:2015 compliance with clinical evidence. EN 62304:2006+A1:2015 governs the software development lifecycle. It is the floor for demonstrating that the software was built under a controlled process. It is not clinical evidence. A fully EN 62304:2006+A1:2015-compliant software lifecycle with zero clinical evidence is not a certifiable device.

Using training data as test data. The test set for any performance study must be genuinely independent of the data used for training, tuning, feature selection, and model selection. Data leakage is the single most common technical error in SaMD clinical evaluations and the one Notified Bodies catch fastest.
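A patient-level split check is cheap insurance against exactly this error. A minimal sketch — it assumes every sample carries a patient identifier, which is the precondition for a defensible split in the first place:

```python
def assert_patient_level_split(train_ids: set[str], test_ids: set[str]) -> None:
    """Fail loudly if any patient contributes data to both training and
    test sets. The split must be made at patient level, not sample level:
    two scans from the same patient on opposite sides of the split is
    still leakage, even though no individual sample is duplicated."""
    overlap = train_ids & test_ids
    if overlap:
        raise ValueError(
            f"{len(overlap)} patient(s) appear in both sets, e.g. "
            f"{sorted(overlap)[:5]} - the test set is not independent."
        )
```

Run it as part of the study protocol's data-preparation step and archive the result; "we checked" is a claim, a logged assertion is evidence.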

Running "silent mode" evaluations on live patient data. If the purpose is to generate evidence about the device, it is a clinical investigation. No exceptions. Silent-mode deployments without ethics approval and proper authorisation are a red line, not a shortcut.

Designing the investigation before defining the intended purpose. The clinical evaluation, including any investigation, is built to support the specific claims in the intended purpose. An investigation designed before the intended purpose is locked will either over-scope (expensive) or miss the target (useless).

Treating the clinical investigation as the end of the clinical evaluation. The clinical evaluation is a lifecycle process under Annex XIV Part A and Part B. The pre-market investigation is one input. PMCF under Annex XIV Part B is the ongoing input. A software company that treats the investigation as a one-time milestone will rebuild the clinical evaluation under audit pressure later.

The Subtract to Ship angle

The Evidence Pass for SaMD runs in the same order as for any device, with software-specific content at each step. Define the intended purpose honestly. Identify the specific GSPR claims and classification implications under Annex VIII Rule 11. For each clinical claim, evaluate literature, retrospective performance data on locked test sets, equivalence under MDCG 2019-11 Rev.1 where legitimately available, and only then a new prospective investigation. Assemble the clinical evaluation from the cheapest combination of sources that genuinely answers the questions the GSPRs pose. For the gaps the first three sources cannot close, design the prospective investigation — small, one or two sites, one primary endpoint, locked model, PMCF commitment attached.

The Salzburg story at the start of this post is that pass executed correctly. A consultant had written a multi-year prospective investigation as the default. The pass reversed the order. Literature and locked retrospective performance on independent data for the pre-market evidence. Prospective outcome evidence deferred to PMCF. The result satisfied MDR Article 61 and Annex XIV without burning runway on a trial that was not required for the specific claims in the intended purpose.

This is not permission to skip work. When the three situations for prospective investigation apply, the investigation is the right answer and it must be done properly under MDR Article 62, Annex XV, and EN ISO 14155:2020+A11:2024. The Evidence Pass is the discipline of not defaulting to the most expensive pathway before the cheaper ones have been honestly evaluated. Our methodology pillar The Subtract to Ship Framework for MDR Compliance covers the full framework.

Reality Check — Where do you stand?

  1. Do you know your SaMD classification under Annex VIII Rule 11, and have you confirmed it against MDCG 2019-11 Rev.1 rather than your own best guess?
  2. Can you state the single clinical claim your clinical evaluation has to support, in one sentence, without hedging?
  3. Have you honestly evaluated whether a locked-test-set retrospective performance study answers the scientific question, before committing to a prospective investigation?
  4. If you are planning a retrospective study, is your test set genuinely independent of training, tuning, and model selection data?
  5. If you are planning a prospective investigation, is the model version under investigation locked and under EN 62304:2006+A1:2015 configuration management for the full duration of the study?
  6. Have you had the conversation with the Notified Body about what evidence base they will accept, before you commit to a study design?
  7. Is your PMCF plan under Annex XIV Part B designed alongside the pre-market evidence strategy, or are you planning to "figure it out after CE marking"?
  8. Have any evaluations of your software on live patient data happened outside an ethics-approved framework? If yes, stop and call counsel before proceeding.

Frequently Asked Questions

Does every SaMD need a clinical investigation under MDR? No. Every SaMD needs a clinical evaluation under MDR Article 61, but a prospective clinical investigation under MDR Article 62 and Annex XV is only one possible source of clinical evidence. For most Class IIa SaMD and a share of Class IIb SaMD, the clinical evaluation can be built from scientific literature, retrospective performance studies on locked test sets, and recognised standards, without a new prospective investigation. Class III SaMD under Annex VIII Rule 11 almost always requires a prospective investigation under the general rule of MDR Article 61(4).

Is a retrospective performance study a clinical investigation under MDR? Usually no. A retrospective performance study that runs the algorithm on a locked, independent test set of already-collected data does not enrol new human subjects and does not generate new clinical data. It is not a clinical investigation under MDR Article 2(45) and does not trigger the Article 62 to 82 framework. The underlying data must have been ethically obtained and the study must comply with applicable data protection law, but the study itself is a different kind of evidence generation. MDCG 2019-11 Rev.1 recognises performance studies as a legitimate component of SaMD clinical evaluation.

What is a "silent mode" deployment and why is it a problem? A silent-mode deployment is when the software runs on live patient data in a clinical setting, its outputs are recorded, and they are compared against clinician decisions or a reference standard — without the software actually influencing care. If the purpose of the silent-mode deployment is to generate safety or performance evidence about the device, it meets the definition of a clinical investigation under MDR Article 2(45) and requires full ethics committee approval and the applicable competent authority notification under MDR Article 70. A silent-mode evaluation run outside that framework is an unauthorised clinical investigation, not a clever workaround.

Does EN 62304:2006+A1:2015 compliance replace clinical evidence? No. EN 62304:2006+A1:2015 is the harmonised standard for medical device software lifecycle processes. It governs how the software is developed, verified, maintained, and retired. It is the floor on which the clinical argument stands, not a substitute for the argument. A SaMD can be fully EN 62304:2006+A1:2015 compliant and still fail a clinical evaluation if the clinical evidence for the intended purpose is insufficient under MDR Article 61 and Annex XIV.

How does PMCF work for a SaMD that did not run a prospective pre-market investigation? PMCF under MDR Annex XIV Part B is the ongoing generation of clinical evidence after CE marking. For a SaMD whose pre-market evidence was built from literature and retrospective performance studies, PMCF typically commits to prospective performance monitoring, drift detection, subgroup performance surveillance, and pre-specified triggers for corrective action if performance degrades below acceptance criteria. The PMCF plan is part of the technical documentation at the time of CE marking and is reviewed by the Notified Body. It is not "future work" — it is a current, committed, executable plan.

Can AI tools be used to design or run SaMD clinical investigations? AI tools can help with literature screening, protocol drafting, statistical analysis plan drafting, adverse event coding, and monitoring dashboards. Our direct experience with Flinn.ai and the state of tooling in 2026 is that the core regulatory and clinical judgements — scientific question, endpoint selection, sample size justification, risk-benefit analysis, the decision about whether an investigation is legally required at all — are not delegable to AI. Use AI for the mechanical work and keep humans on the judgement work. For a longer treatment, see our post on Clinical Evaluation of AI/ML Medical Devices.

Sources

  1. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, Article 61 (clinical evaluation), Article 62 (general requirements regarding clinical investigations conducted to demonstrate conformity of devices), Annex XV (clinical investigations), Annex VIII Rule 11 (classification of software), Annex XIV Part A and Part B (clinical evaluation and post-market clinical follow-up). Official Journal L 117, 5.5.2017.
  2. MDCG 2019-11 Rev.1 — Guidance on Qualification and Classification of Software in Regulation (EU) 2017/745 — MDR and Regulation (EU) 2017/746 — IVDR, October 2019, Revision 1 June 2025.
  3. EN ISO 14155:2020+A11:2024 — Clinical investigation of medical devices for human subjects — Good clinical practice.
  4. EN 62304:2006+A1:2015 — Medical device software — Software life-cycle processes.

This post sits in the Clinical Evaluation & Clinical Investigations cluster of the Subtract to Ship: MDR blog. Authored by Felix Lenhard and Tibor Zechmeister. If you are trying to decide whether your SaMD genuinely needs a prospective clinical investigation or whether a disciplined retrospective performance study will carry the clinical evaluation, Zechmeister Strategic Solutions works with founders on exactly that decision — the one where the algorithm, the Regulation, and the runway have to agree on the same study design.