AI tools can legitimately compress the mechanical work in an MDR clinical evaluation literature review — search execution, deduplication, initial screening, citation management — but they cannot make the appraisal or weight-of-evidence calls that MDR Article 61 and Annex XIV Part A require. The tool itself becomes a QMS object that must be validated under EN ISO 13485:2016+A11:2021 clause 4.1.6 before its output enters your Clinical Evaluation Report.
By Tibor Zechmeister and Felix Lenhard.
TL;DR
- MDR Article 61 and Annex XIV Part A require a documented, systematic, methodologically sound clinical evaluation — MDCG 2020-5 and MEDDEV 2.7/1 rev.4 still define how that looks in practice.
- AI tools reliably accelerate the upstream phases: query design support, database retrieval, deduplication, abstract screening, citation export, and traceability logs.
- AI cannot perform critical appraisal, weight-of-evidence determination, or the benefit-risk conclusion — these remain the evaluator's responsibility and must carry a named human author in the Clinical Evaluation Report.
- If an AI tool influences any CER content, it is QMS software used in a regulated process and must be validated under EN ISO 13485:2016+A11:2021 clause 4.1.6.
- Audit-ready evidence means logging prompts, model versions, search strings, inclusion/exclusion decisions, and the human override trail — not just the final CER.
Why literature reviews are the bottleneck (Hook)
A Clinical Evaluation Report under MDR is where most startup regulatory programs stall. The Notified Body expects a systematic literature review that covers the device itself, the state of the art, and comparable devices — executed against the protocol, reproducible, and justified line by line. For a first-time founder team, that usually means two to four months of PubMed, Embase, Cochrane, screening spreadsheets, and manual deduplication before the actual appraisal even begins.
Tibor has reviewed literature reviews from more than fifty startup Clinical Evaluation Reports as a Notified Body lead auditor. The most common pattern is not wrong methodology — it is exhaustion. Teams run out of patience during screening and start making inclusion decisions by convenience rather than protocol. AI tools, used honestly, can prevent that failure mode. Used carelessly, they create a new one: a literature review that looks complete but has no traceable reasoning behind its inclusions.
This post is the honest breakdown. Where AI genuinely helps. Where it does not. And what you need to put in place so that a Notified Body auditor can open your CER, follow the AI-assisted workflow back to a validated tool, and sign off.
What MDR actually says (Surface)
MDR Article 61(1) requires the manufacturer to "plan, conduct and document a clinical evaluation in accordance with this Article and with Part A of Annex XIV." Annex XIV Part A then specifies the steps: establish a Clinical Evaluation Plan, identify available clinical data through a systematic scientific literature search, appraise that data, generate any missing data, and analyse all relevant clinical data to reach conclusions about the safety and clinical performance of the device.
The operative words for AI work are "systematic," "documented," and "appraise." The MDR does not prescribe which tools you use. It prescribes that the process is systematic, that every decision is documented, and that a qualified evaluator is accountable for the appraisal.
MDCG 2020-5 on equivalence and MEDDEV 2.7/1 rev.4 on clinical evaluation remain the operational references that Notified Bodies apply. MEDDEV 2.7/1 rev.4 is pre-MDR but is still used as the methodological backbone for the literature review phase because the MDR itself does not provide that level of methodological detail. Your CER must name the evaluator, justify their qualification, and show that the evaluator personally assessed the clinical data — not merely signed off on a machine output.
Then there is the tool itself. EN ISO 13485:2016+A11:2021 clause 4.1.6 requires the organisation to apply a documented approach to the validation of software used in the quality management system. An AI tool that screens papers or extracts data for your CER is, by definition, QMS software in a regulated process. Validation is not optional, and "the vendor validated it" is not a valid answer — the validation must be proportionate to the risk your use introduces.
A worked example (Test)
Consider a Class IIa software-based device for post-operative wound assessment via smartphone imaging. The Clinical Evaluation Plan calls for a literature search covering: the clinical condition, the state of the art in wound assessment, comparable imaging-based devices, and any equivalence candidates.
The founder has two clinical affairs people and eight weeks before the planned Notified Body submission. Without tooling, a realistic estimate is: one week building the search strategy, one week running searches and exporting, two weeks deduplication and title/abstract screening, two weeks full-text screening, two weeks appraisal and writing. Eight weeks with zero slack.
With an AI-assisted workflow the allocation shifts. Search strategy still takes a week — the evaluator designs the PICO framework, selects databases, defines inclusion and exclusion criteria, and records the protocol. The AI tool then executes searches across PubMed, Embase, and Cochrane, deduplicates the output across databases, and performs a first-pass abstract screening against the documented inclusion criteria. That phase drops from roughly three weeks to roughly five days — but only because a human evaluator reviews every AI inclusion and, more importantly, every AI exclusion. Exclusions are where risk concentrates. A paper wrongly excluded by the AI can become a gap the Notified Body finds.
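The deduplication step is mechanical enough to sketch in a few lines. The Python fragment below is illustrative only: the field names ("doi", "title") and the normalisation rules are assumptions, not the behaviour of any specific tool. What it shows is the property that matters for the audit trail: dropped records are kept, not discarded silently.

```python
# Minimal cross-database deduplication sketch. Field names and the
# normalisation rules are illustrative assumptions, not any tool's API.
import re

def normalise_title(title: str) -> str:
    """Lowercase and strip punctuation so near-identical titles collapse."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Keep the first record per DOI, falling back to normalised title.
    Returns both unique records and dropped duplicates, because the
    dropped list belongs in the run log, not the bin."""
    seen, unique, dropped = set(), [], []
    for rec in records:
        key = (rec.get("doi") or "").lower() or normalise_title(rec["title"])
        if key in seen:
            dropped.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, dropped

# Hypothetical exports from two databases describing the same paper:
pubmed = [{"doi": "10.1000/xyz123", "title": "Smartphone imaging for wound assessment"}]
embase = [{"doi": "10.1000/XYZ123", "title": "Smartphone Imaging for Wound Assessment."}]
unique, dropped = deduplicate(pubmed + embase)  # 1 unique, 1 dropped
```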
The full-text appraisal phase does not compress in any meaningful way. A qualified evaluator must read each included paper, apply the appraisal criteria from MEDDEV 2.7/1 rev.4 (suitability and contribution to demonstration of performance and safety), and record a structured judgement. AI-generated summaries can help orient the reader but cannot be the appraisal record. The CER explicitly names the human evaluator and their reasoning.
The saving for this team is roughly two weeks across the literature review. That is the realistic number. Not ninety percent. Not "autonomous CER generation." A meaningful, auditable, defensible acceleration of the mechanical phases.
The Subtract to Ship playbook (Ship)
The Subtract to Ship principle for AI-assisted CERs is: remove the work that does not require judgement, keep every decision that does. Here is the concrete playbook.
Step one — Write the Clinical Evaluation Plan first, tool-agnostic. Your CEP specifies the databases, search strings, inclusion and exclusion criteria, and appraisal method, and it names the evaluator. The CEP must be defensible even if you later remove all AI tooling. The AI layer is an execution accelerator for a plan you already own.
Step two — Validate the tool before it touches clinical data. Under EN ISO 13485:2016+A11:2021 clause 4.1.6, any software used in a QMS-controlled process must be validated for its specific intended use. For a literature review assistant, write a short validation protocol: what inputs, what outputs, what you are relying on the tool for, what could go wrong, and how you test it. Run a test case with a known-good reference set — a small corpus where you already know which papers should be included and excluded. Document the result, the tool version, and the model version. Repeat whenever the model version changes. This validation file lives in your QMS and is one of the first things an auditor will ask for.
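As a sketch of what proportionate validation can look like in practice, the following Python compares the tool's include/exclude decisions against a known-good reference set and produces a dated record for the QMS file. Everything here is an assumption for illustration: the record identifiers, version strings, and the shape of the decision data are invented, and your tool will export something different.

```python
# Proportionate validation sketch: compare tool screening decisions
# against a known-good reference set and record the result.
import json
from datetime import date

def compare(reference: dict[str, bool], tool: dict[str, bool]) -> dict:
    """Count agreement and, critically, false exclusions: papers the
    reference set includes but the tool screened out."""
    agree = sum(1 for k in reference if tool.get(k, False) == reference[k])
    false_exclusions = [k for k, inc in reference.items() if inc and not tool.get(k, False)]
    false_inclusions = [k for k, inc in reference.items() if not inc and tool.get(k, False)]
    return {
        "n_reference": len(reference),
        "agreement": round(agree / len(reference), 3),
        "false_exclusions": false_exclusions,  # the dangerous direction
        "false_inclusions": false_inclusions,  # cheap: caught at full text
    }

# Hypothetical reference decisions (True = include) and tool output:
reference_decisions = {"PMID-1": True, "PMID-2": False, "PMID-3": True}
tool_decisions = {"PMID-1": True, "PMID-2": True, "PMID-3": False}

record = {
    "date": date.today().isoformat(),
    "tool_version": "x.y.z",               # placeholder
    "model_version": "model-identifier",   # placeholder
    "result": compare(reference_decisions, tool_decisions),
}
print(json.dumps(record, indent=2))
```

The false-exclusion list is the number to watch; an agreement percentage on its own hides the direction of the errors.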
Step three — Keep prompts and runs under version control. Every prompt you send, every response, every search string, every deduplication run, every screening decision: logged, timestamped, and retrievable. This is how you satisfy "systematic and documented" in Annex XIV Part A. If your tool does not natively log this, log it externally. No logs means no audit trail means no CER.
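If you need to build that logging yourself, an append-only JSON Lines file, one event per line, is a low-effort starting point. The schema below is an assumption for illustration, not a required format; any structure works as long as prompts, search strings, screening decisions, model versions, and timestamps are retrievable.

```python
# Append-only run log sketch: one JSON object per event, timestamped.
# Field names and example values are illustrative placeholders.
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("cer_literature_review_log.jsonl")  # illustrative path

def log_event(event_type: str, **fields) -> None:
    """Append a timestamped event so every run step stays retrievable."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event_type,
        **fields,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

# Example events from one screening run (values are placeholders):
log_event("search_executed", database="PubMed",
          search_string="(wound assessment) AND (smartphone imaging)",
          tool_version="x.y.z", model_version="model-identifier")
log_event("screening_decision", record_id="PMID-000000",
          ai_decision="exclude", criterion="population out of scope",
          prompt_ref="prompt-templates/abstract-screening-v3")
log_event("human_review", record_id="PMID-000000",
          reviewer="evaluator name", decision="exclusion confirmed",
          reason="no human subjects")
```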
Step four — Human review every exclusion. Inclusions are cheap: a wrongly included paper just gets dropped at full-text review. Exclusions are expensive: a wrongly excluded paper is a silent gap. Build the workflow so that a qualified human looks at every exclusion decision the AI makes and either confirms or reverses it with a documented reason.
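One way to make that rule enforceable rather than aspirational is to model it in the data. The sketch below is illustrative (the record structure and field names are assumptions): an AI exclusion that lacks a named reviewer, a confirm-or-reverse decision, and a documented reason simply cannot reach the final screening record.

```python
# Exclusion-gate sketch: no AI exclusion is final until a named human
# has confirmed or reversed it with a reason. Structure is illustrative.
from dataclasses import dataclass

@dataclass
class Exclusion:
    record_id: str
    ai_reason: str
    reviewer: str | None = None
    human_decision: str | None = None  # "confirmed" or "reversed"
    human_reason: str | None = None

    @property
    def finalised(self) -> bool:
        return all((self.reviewer, self.human_decision, self.human_reason))

def pending_review(exclusions: list[Exclusion]) -> list[Exclusion]:
    """Exclusions that cannot yet enter the screening record."""
    return [e for e in exclusions if not e.finalised]
```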
Step five — The appraisal is fully human. The evaluator reads every full-text paper, applies the appraisal criteria, and writes the justification. AI summarisation is fine as an orientation aid but cannot be the record. The CER names the evaluator, cites their CV, and the evaluator owns every inclusion and every weight-of-evidence statement.
Step six — Disclose the tooling in the CER. In the methodology section, describe the AI tools used, their intended use in your workflow, the validation status, and the human review steps. Notified Bodies are increasingly asking this question directly. A manufacturer who describes it proactively and with a validation file attached is in a much stronger position than one who tries to hide it.
Step seven — Refresh on schedule. Your CER is a living document. The literature search protocol, including the AI-assisted execution, must be rerun on the cadence defined in your Clinical Evaluation Plan and PMS Plan — typically annually for higher-risk devices. Each rerun is its own validated execution with its own logs.
Reality Check
- Can you hand your Notified Body auditor a validation file for every AI tool that influenced content in your CER?
- Is the named evaluator in your CER a real person with a CV on file, who has personally read every included paper?
- Are your AI-generated exclusions reviewed by a human and documented with a reason?
- Do you have logs of every prompt, search string, and model version used in the literature review?
- Does your Clinical Evaluation Plan describe the methodology independently of any specific tool, so a new evaluator could reproduce the work?
- When the model version of your AI tool changes, does something in your QMS trigger revalidation?
- Is the AI tool's role in your CER disclosed in the methodology section of the report?
- Can you defend every inclusion and every exclusion against a Notified Body reviewer asking "why this paper and not that one"?
Frequently Asked Questions
Can an AI tool write my Clinical Evaluation Report for me? No. The CER must be written and signed by a named, qualified evaluator who personally appraised the clinical data. AI tools can draft sections for the evaluator to review and rewrite, but the evaluator owns every conclusion. A CER authored primarily by AI and rubber-stamped by a human will not survive Notified Body review.
Does the AI tool count as a medical device? Usually not, because it is used internally for regulatory documentation rather than for clinical decisions about specific patients. But it is QMS software under EN ISO 13485:2016+A11:2021 clause 4.1.6 and must be validated for your intended use.
What happens if my AI tool misses a relevant paper? You are accountable. The CER is your document. This is why human review of exclusions, a documented search protocol, and validation against a known-good test corpus matter — they reduce the risk that a silent false negative becomes a CER gap.
Do Notified Bodies accept AI-assisted literature reviews? Yes, provided the process is systematic, documented, the tool is validated, and the human evaluator retains clear authorship of appraisal and conclusions. Auditors ask targeted questions about which tools were used and how.
How do I validate an AI screening tool on a startup budget? Build a small reference set of papers — twenty to fifty — where you already know the correct inclusion decision. Run the tool on that set. Measure agreement. Document the protocol, the result, and any discrepancies. Rerun whenever the model version changes. This is proportionate validation, not a full ML validation study.
Does MDCG 2020-5 say anything about AI tools? MDCG 2020-5 addresses clinical evaluation equivalence. It does not mention AI tooling specifically. The obligations flow from MDR Article 61, Annex XIV Part A, the EN ISO 13485 clause on QMS software, and the evaluator's professional responsibility.
Related reading
- Systematic literature review for clinical evaluation — the non-AI methodological foundation every CER needs.
- Literature search protocols for clinical evaluation — how to write the protocol before any tool runs.
- AI to automate regulatory documentation — broader view of where AI fits across your regulatory stack.
- Validating QMS software tools under MDR — the EN ISO 13485 clause 4.1.6 validation approach applied concretely.
- Clinical evaluation of AI/ML medical devices — related but distinct: evaluating an AI device, not using AI to evaluate.
Sources
- Regulation (EU) 2017/745 on medical devices, consolidated text. Article 61, Annex XIV Part A.
- MDCG 2020-5 (April 2020) — Clinical evaluation equivalence.
- MEDDEV 2.7/1 rev.4 (June 2016) — Clinical evaluation: a guide for manufacturers and notified bodies under Directives 93/42/EEC and 90/385/EEC.
- EN ISO 13485:2016+A11:2021 — Medical devices — Quality management systems — Requirements for regulatory purposes, clause 4.1.6.