---
title: How to Conduct a Systematic Literature Review for Clinical Evaluation
description: A systematic literature review is the backbone of most MedTech clinical evaluations. Here is how to scope, execute, and document it so the Notified Body accepts it.
authors: Tibor Zechmeister, Felix Lenhard
category: Clinical Evaluation & Investigations
primary_keyword: systematic literature review clinical evaluation MDR
canonical_url: https://zechmeister-solutions.com/en/blog/systematic-literature-review-clinical-evaluation
source: zechmeister-solutions.com
license: All rights reserved. Content may be cited with attribution and a link to the canonical URL.
---

# How to Conduct a Systematic Literature Review for Clinical Evaluation

*By Tibor Zechmeister (EU MDR Expert, Notified Body Lead Auditor) and Felix Lenhard.*

> **A systematic literature review under MDR is a pre-specified, reproducible search of the published scientific evidence that feeds the clinical evaluation under Article 61 and Annex XIV Part A. It is built in eight steps: scope the clinical questions, write a literature search protocol, select databases, construct search terms, define inclusion and exclusion criteria, appraise each retrieved record, synthesise the body of evidence, and document everything so another competent reviewer can re-run the search and reach a comparable result. Done properly, it is the backbone of most MedTech clinical evaluations. Done improperly, it is the fastest way to generate a Notified Body finding.**

**Last updated 10 April 2026.**

---

## TL;DR

- A systematic literature review for clinical evaluation under MDR must be pre-specified in a written protocol before the first search is run. Reverse-engineering a protocol from a completed search is one of the most common and most damaging findings at Notified Body review.
- The review is governed by MDR Article 61 and Annex XIV Part A Section 1, which requires both a clinical evaluation plan (Section 1(a)) and the identification, appraisal, and analysis of clinical data (Sections 1(b) to 1(e)). MEDDEV 2.7/1 Rev 4 (June 2016) remains a useful structural reference; the MDR text takes precedence where they diverge.
- The search must cover both the device (or the equivalent device under MDCG 2020-5) and the clinical condition, underlying technology, and state of the art. Two parallel searches are standard practice.
- Databases typically include at least PubMed / MEDLINE and Embase. Cochrane Library, CENTRAL, and clinical trial registries are added where relevant to the question.
- Inclusion and exclusion criteria are written before the search, applied consistently, and documented in a PRISMA-style flow diagram showing how many records were identified, screened, excluded, and included.
- Each included record is appraised for methodological quality and relevance using criteria pre-specified in the protocol. Favourable and unfavourable data are both retained, as Annex XIV Part A Section 2 requires.
- The review output is a structured synthesis that feeds the clinical evaluation report against the acceptance criteria from the clinical evaluation plan, not a narrative summary of interesting papers.

---

## Why the literature review is the backbone of most MedTech CERs

For the majority of medical devices placed on the EU market, the bulk of the clinical evidence in the clinical evaluation report comes from published literature, not from new clinical investigations. The pillar post on clinical evaluation spells out why: the MDR definition of clinical data in Article 2(48) treats published literature, data from an equivalent device, and clinical investigations as legitimate sources of clinical data, and for well-established technologies the literature pathway can carry most of the weight.

That makes the literature review the highest-leverage activity in most clinical evaluation projects. It is also the activity where Notified Body reviewers — Tibor included, in his lead auditor role — find the most defects. Searches that are not reproducible. Inclusion criteria that were clearly chosen after the results were known. PRISMA diagrams that do not add up. Appraisal scores that suspiciously cluster around "include" for favourable studies and "exclude" for unfavourable ones. Every one of these is a finding. Every finding is rework. Rework on a clinical evaluation report is measured in months.

The fix is procedural, not heroic. A systematic literature review that is planned before it is run, executed consistently with the plan, and documented so another competent reviewer can re-run it, is the review that survives audit. The eight steps that follow are how that review is actually built.

## Step 1 — Scope the clinical questions

A systematic literature review starts with the clinical questions the evaluation must answer, not with a search engine. Those questions come directly from the clinical evaluation plan (CEP) required under Annex XIV Part A Section 1(a): the intended purpose as defined in Article 2(12), the target population, the intended clinical benefits with specified outcome parameters, the specific general safety and performance requirements that need clinical data to support them, and the state of the art against which the benefit-risk ratio will be assessed.

For each of those, write a clinical question in structured form. The PICO format (Population, Intervention, Comparator, Outcome) works well for therapeutic and diagnostic device questions; for safety questions a simple "what adverse events have been reported for this technology in this population" is often enough. The questions determine the search, and the search determines what can be concluded. Skipping the scoping step produces a review that finds many papers and answers no question.
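If it helps to make the scoping step concrete, here is a minimal sketch of a structured clinical question captured in a traceable form. Every device, population, and outcome below is an invented placeholder, not a recommendation.

```python
# Minimal sketch: capturing Step 1 clinical questions in a structured, traceable
# form that the search protocol and the CEP can reference by ID.
# All details are invented examples.
from dataclasses import dataclass

@dataclass
class PicoQuestion:
    question_id: str   # traceable ID referenced by the CEP and the search protocol
    population: str    # intended target group from the CEP
    intervention: str  # the device (or equivalent device) under evaluation
    comparator: str    # the state-of-the-art alternative
    outcome: str       # the clinical outcome parameter named in the CEP

questions = [
    PicoQuestion(
        question_id="CQ-01",
        population="adults with chronic venous leg ulcers",
        intervention="hypothetical foam wound dressing",
        comparator="standard non-adhesive dressings",
        outcome="complete wound closure at 12 weeks",
    ),
    # Safety strand: a simpler "what adverse events are reported" question.
    PicoQuestion(
        question_id="CQ-02",
        population="adults with chronic venous leg ulcers",
        intervention="foam wound dressings (technology group)",
        comparator="not applicable",
        outcome="device-related adverse events reported in the literature",
    ),
]

for q in questions:
    print(q.question_id, "->", q.outcome)
```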

Two parallel search strands are standard practice under the MDR. The first targets the device itself or an equivalent device (if equivalence under MDCG 2020-5 is being claimed). The second targets the clinical condition, the underlying technology, and the state of the art — the context against which the device's safety and performance will be judged. Both strands feed the same clinical evaluation report but answer different questions.

## Step 2 — Write the literature search protocol

The MDR literature search protocol is the written document that specifies, before any searching is done, exactly how the review will be conducted. It is a subsection of the clinical evaluation plan, not a separate deliverable. Without it, the review cannot be considered systematic.

The protocol specifies at minimum: the clinical questions (from Step 1), the databases to be searched, the search strings for each database, the date range, the languages accepted, the inclusion and exclusion criteria, the appraisal criteria and scoring method, the process for handling duplicates, the process for screening (title/abstract then full text), the number of reviewers and the disagreement resolution process, the data extraction fields, and the synthesis method. It also specifies the update cadence — the review will be repeated on a defined schedule and every time a new CER revision is due.
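Teams that keep the protocol metadata in a structured record can check that minimum list mechanically before the first search is run. The sketch below is one way to do it; the field names are our own invention, not a prescribed template.

```python
# Sketch: a completeness check over the minimum protocol elements listed above.
# Field names are illustrative only.
REQUIRED_PROTOCOL_FIELDS = [
    "clinical_questions", "databases", "search_strings", "date_range",
    "languages", "inclusion_criteria", "exclusion_criteria",
    "appraisal_criteria", "deduplication_process", "screening_process",
    "reviewers_and_disagreement_resolution", "data_extraction_fields",
    "synthesis_method", "update_cadence",
]

def check_protocol(protocol: dict) -> list[str]:
    """Return the list of mandatory elements missing from the protocol record."""
    return [f for f in REQUIRED_PROTOCOL_FIELDS if not protocol.get(f)]

draft = {"databases": ["PubMed/MEDLINE", "Embase"], "date_range": "2015-2026"}
missing = check_protocol(draft)
if missing:
    print("Protocol not ready to execute; missing:", ", ".join(missing))
```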

A startup that writes a one-page protocol and runs a disciplined search from it produces a defensible review. A startup that writes no protocol and runs "a PubMed search" produces a finding. The difference in effort is small. The difference at audit is large.

## Step 3 — Select the databases

No single database covers the full published medical literature. A systematic review under MDR typically searches at least two. PubMed / MEDLINE is always included because it is the reference biomedical database and is freely accessible. Embase is almost always included because it has broader European coverage and indexes some device-relevant journals that PubMed does not. The Cochrane Library and CENTRAL are added when the clinical question involves therapeutic effectiveness or systematic reviews of interventions.

Clinical trial registries (ClinicalTrials.gov, the EU Clinical Trials Register, and for devices specifically the relevant sections of Eudamed as they become available) are searched when ongoing or unpublished clinical investigations could affect the conclusions. Grey literature — conference proceedings, regulatory databases like the FDA MAUDE database for adverse event data, standards development organisation publications — is searched when the clinical question cannot be answered by the indexed databases alone.

The protocol names the specific databases, the specific interfaces used to access them, and the date of access. "We searched PubMed" is not enough; "We searched PubMed / MEDLINE via the NCBI interface on 12 March 2026" is. Reproducibility depends on that level of specificity.
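For teams that script their searches, that level of specificity can be captured automatically. The sketch below queries PubMed through the public NCBI E-utilities esearch endpoint and writes out the audit record the protocol calls for. The endpoint and parameter names reflect the NCBI documentation at the time of writing, and the search string itself is a placeholder, not a validated strategy.

```python
# Sketch: recording a reproducible PubMed / MEDLINE search via the NCBI
# E-utilities "esearch" endpoint. Requires the third-party 'requests' package.
# Verify endpoint details against the current E-utilities documentation.
from datetime import date
import requests

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def record_pubmed_search(term: str) -> dict:
    """Run the query and return the audit record the protocol asks for:
    database, interface, date of access, verbatim string, and hit count."""
    resp = requests.get(
        EUTILS_ESEARCH,
        params={"db": "pubmed", "term": term, "retmode": "json", "retmax": 0},
        timeout=30,
    )
    resp.raise_for_status()
    count = int(resp.json()["esearchresult"]["count"])
    return {
        "database": "PubMed / MEDLINE",
        "interface": "NCBI E-utilities (esearch.fcgi)",
        "date_of_access": date.today().isoformat(),
        "search_string_verbatim": term,
        "records_identified": count,
    }

# Placeholder query for illustration only, not a validated search strategy.
print(record_pubmed_search('"Bandages"[MeSH Terms] AND "Leg Ulcer"[MeSH Terms]'))
```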

## Step 4 — Construct the search terms

Search terms are the most technical part of the review and the part most likely to be done poorly by non-specialists. A good search is neither too narrow (missing relevant evidence) nor too broad (burying relevant evidence in thousands of irrelevant hits). The balance is built with three ingredients.

The first is controlled vocabulary — MeSH terms in PubMed, Emtree terms in Embase. Controlled vocabulary captures records that have been indexed under a concept regardless of the specific words used in the title or abstract, and it is the backbone of a reproducible search.

The second is free-text terms, including synonyms, abbreviations, trade names, and alternative spellings (British and American English both). Free-text terms catch the recent records that have not yet been indexed with controlled vocabulary.

The third is Boolean logic — the AND / OR / NOT operators that combine concepts. A typical structure is (device OR technology OR equivalent device terms) AND (clinical condition OR population terms) AND (outcome OR safety terms), with filters for language and publication type applied after the main string.
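As an illustration only, here is one way the three ingredients can be assembled into a single recordable string. The concepts and terms are invented placeholders, not a validated strategy for any real device, and a real strategy is built per database, ideally with an information specialist.

```python
# Sketch: assembling the typical (device) AND (condition/population) AND (outcome/safety)
# structure from controlled vocabulary plus free-text synonyms. All terms are placeholders.
device_block = ['"Bandages"[MeSH Terms]', "foam dressing*[Title/Abstract]",
                "wound dressing*[Title/Abstract]"]
condition_block = ['"Leg Ulcer"[MeSH Terms]', "venous leg ulcer*[Title/Abstract]",
                   "chronic wound*[Title/Abstract]"]
outcome_block = ['"Wound Healing"[MeSH Terms]', "wound healing[Title/Abstract]",
                 "adverse event*[Title/Abstract]"]

def or_group(terms):
    """Combine synonyms for one concept with OR."""
    return "(" + " OR ".join(terms) + ")"

# Concepts are combined with AND; language and publication-type filters are applied after.
search_string = " AND ".join(or_group(b) for b in (device_block, condition_block, outcome_block))
print(search_string)  # record this output verbatim in the protocol, with database and date
```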

The search string is recorded verbatim in the protocol and in the final review documentation. Any change to the string — even a small one — produces a different search and must be recorded as a new iteration.

## Step 5 — Define inclusion and exclusion criteria

Inclusion and exclusion criteria are the filter that decides which retrieved records enter the appraisal stage and which do not. They are written before the search is run, applied consistently by every reviewer, and never modified mid-review without a documented amendment to the protocol.

Standard inclusion criteria cover: the population (does the study involve the intended target group?), the intervention (does it involve the device, an equivalent device under MDCG 2020-5, or the state-of-the-art comparator?), the outcomes (does it report on the clinical outcomes specified in the CEP?), the study design (is it the type of study that can answer the clinical question?), and the language and date range defined in the protocol.

Exclusion criteria cover the mirror image plus specific reasons to reject: case reports without sufficient methodological detail, duplicate publications, letters and editorials without original data, animal studies (unless the question is specifically about pre-clinical evidence), and studies on devices that are clearly not comparable.

The most important discipline here is the Annex XIV Part A Section 2 requirement that "favourable and unfavourable data" both be taken into account. Inclusion criteria that exclude unfavourable findings under the guise of methodological quality are a direct Notified Body finding. Quality-based exclusion is legitimate — methodologically weak studies can be excluded — but only if the same criteria are applied to favourable and unfavourable studies alike.
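One way to keep that discipline auditable is to log every screening decision with its pre-specified exclusion reason and the direction of the study's findings, so that the counts reconcile with the PRISMA-style diagram and quality-based exclusions can be checked for even-handedness. A minimal sketch, with invented record IDs and field names:

```python
# Sketch: logging every screening decision against pre-specified exclusion reasons.
# Record IDs, reasons, and fields are illustrative only.
from collections import Counter

EXCLUSION_REASONS = {
    "wrong_population", "wrong_intervention", "no_relevant_outcome",
    "ineligible_study_design", "duplicate_publication", "language_out_of_scope",
}

screening_log = [
    {"record_id": "PMID-0000001", "decision": "include", "reason": None, "direction": "favourable"},
    {"record_id": "PMID-0000002", "decision": "exclude", "reason": "wrong_population", "direction": "unfavourable"},
    {"record_id": "PMID-0000003", "decision": "exclude", "reason": "duplicate_publication", "direction": "favourable"},
]

# Every exclusion must carry a reason that was pre-specified in the protocol.
for entry in screening_log:
    if entry["decision"] == "exclude":
        assert entry["reason"] in EXCLUSION_REASONS, f"undocumented reason: {entry['reason']}"

# Rough even-handedness check: exclusion reasons broken down by direction of findings.
by_direction = Counter(
    (e["direction"], e["reason"]) for e in screening_log if e["decision"] == "exclude"
)
print(by_direction)
```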

## Step 6 — Appraise each retrieved record

Records that pass the inclusion filter enter the appraisal stage. Appraisal is a pre-specified scoring of each record on two dimensions: methodological quality and relevance to the clinical question.

Methodological quality is judged using criteria appropriate to the study design — for a randomised controlled trial, criteria like randomisation adequacy, blinding, completeness of follow-up, and intention-to-treat analysis; for an observational study, criteria like sample size, confounder control, and length of follow-up; for a case series, criteria like consecutive enrolment and clear outcome definition. Standard appraisal frameworks exist and can be referenced directly in the protocol; the important point is that the same framework is applied to every record of a given design.

Relevance is judged against the clinical questions from Step 1. A methodologically excellent study on a different population or a different indication may be less relevant than a methodologically weaker study on exactly the target population. Both scores are recorded, and the combined weight of each record in the synthesis reflects both.
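Here is a hedged sketch of what that double scoring can look like in a structured appraisal log. The scales and the combined weighting rule are invented for illustration; the real framework and cut-offs belong in the protocol.

```python
# Sketch: recording the two pre-specified appraisal dimensions per included record.
# Scales and the weighting rule are illustrative only.
from dataclasses import dataclass

@dataclass
class Appraisal:
    record_id: str
    study_design: str
    quality_score: int    # e.g. 1 (weak) to 4 (strong), per the design-specific criteria
    relevance_score: int  # e.g. 1 (indirect) to 4 (target population and indication)

    @property
    def weight(self) -> float:
        # Illustrative combination only; a real protocol pre-specifies this rule.
        return (self.quality_score * self.relevance_score) / 16

appraisals = [
    Appraisal("PMID-0000001", "RCT", quality_score=4, relevance_score=3),
    Appraisal("PMID-0000004", "case series", quality_score=2, relevance_score=4),
]
for a in appraisals:
    print(a.record_id, a.study_design, f"weight={a.weight:.2f}")
```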

MEDDEV 2.7/1 Rev 4 (June 2016) contains a detailed appraisal structure that many manufacturers still reference. It remains a useful structural guide. Where its specific scoring approach conflicts with MDR Annex XIV Part A, the MDR text wins; in practice, the conflicts are rare at the appraisal level. EN ISO 14155:2020+A11:2024 applies where the record being appraised is a clinical investigation and the appraisal includes judgment of good clinical practice adherence.

## Step 7 — Synthesise the body of evidence

Synthesis is where the individual appraised records become a body of evidence that speaks to the clinical questions. The synthesis is not a narrative summary of each paper. It is a structured analysis that, for each clinical question from the CEP, states what the aggregated evidence shows, how strong the evidence is, and whether it meets the pre-specified acceptance criteria for the benefit-risk ratio.

For some questions — typically those with substantial homogeneous evidence — a quantitative synthesis (meta-analysis or pooled estimate) is possible. For most device clinical evaluations, the evidence is too heterogeneous for meaningful pooling, and the synthesis is qualitative but structured. Each clinical question is addressed in turn, the supporting and contradicting evidence is presented, and a conclusion is drawn against the CEP criteria.

The synthesis explicitly addresses unfavourable findings. Adverse events reported in the literature for the device, the equivalent device, or the underlying technology are analysed against the risk file maintained under EN ISO 14971:2019+A11:2021. Residual risks identified in the literature that are not already captured in the risk management file trigger an update to that file. This is one of the most important links in the clinical evaluation — literature review findings flow directly into risk management, and a review that does not produce any risk file updates is almost certainly a review that did not take unfavourable data seriously.
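That traceability link can be checked mechanically: every adverse event extracted from the literature either maps to an existing hazard in the risk management file or is flagged as a trigger for a risk file update. A minimal sketch, with invented IDs and event terms:

```python
# Sketch: the traceability check described above. Adverse events found in the
# literature must map to a risk file entry or trigger an update. IDs and terms
# are invented examples.
risk_file_hazards = {
    "RMF-012": "skin maceration",
    "RMF-027": "allergic contact dermatitis",
}

literature_adverse_events = [
    {"record_id": "PMID-0000002", "event": "skin maceration"},
    {"record_id": "PMID-0000007", "event": "periwound infection"},  # not yet in the risk file
]

known_hazards = set(risk_file_hazards.values())
unmapped = [ae for ae in literature_adverse_events if ae["event"] not in known_hazards]

for ae in unmapped:
    print(f"Risk file update required: '{ae['event']}' reported in {ae['record_id']} "
          "has no corresponding hazard entry.")
```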

## Step 8 — Document everything for reproducibility

The final step is the one most often skimped. The systematic literature review section of the clinical evaluation report must contain enough detail for another competent reviewer to re-run the search and arrive at a comparable result. That means the search strings verbatim, the databases and dates, the inclusion and exclusion criteria exactly as applied, the number of records at each stage (identified, screened, excluded with reasons, full-text assessed, included), the appraisal framework and scores, the synthesis, and the conclusions against the CEP criteria.

A PRISMA-style flow diagram — the standard Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart adapted for medical device reviews — is the expected documentation format for the records flow. It is not legally mandatory under the MDR, but it is the documentation format reviewers expect, and it makes the numbers auditable at a glance. If the numbers in the diagram do not add up, the whole review is called into question. If they add up and map to the protocol, the whole review is credible before a single paper is read.
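The reconciliation a reviewer performs on that diagram is simple arithmetic, and it can be checked before submission. A sketch with invented numbers; the only point is that every stage must reconcile exactly with the one before.

```python
# Sketch: the arithmetic behind a PRISMA-style flow diagram. Numbers are invented.
flow = {
    "identified": 1482,          # total hits across databases, before deduplication
    "duplicates_removed": 317,
    "screened_title_abstract": 1165,
    "excluded_title_abstract": 1058,
    "assessed_full_text": 107,
    "excluded_full_text": 74,    # each with a documented reason
    "included": 33,
}

assert flow["identified"] - flow["duplicates_removed"] == flow["screened_title_abstract"]
assert flow["screened_title_abstract"] - flow["excluded_title_abstract"] == flow["assessed_full_text"]
assert flow["assessed_full_text"] - flow["excluded_full_text"] == flow["included"]
print("PRISMA-style flow reconciles at every stage.")
```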

## Common pitfalls in the MDR literature search protocol

Across the many systematic literature reviews we have assessed in CERs, the same pitfalls repeat:

- **Protocol written after the fact.** The search was run first, the protocol was written second to match what was done, and the appraisal criteria suspiciously fit the data. Reviewers can tell.
- **Only one database searched.** A PubMed-only review misses evidence indexed in Embase and Cochrane and is not considered systematic under current practice.
- **Non-reproducible search string.** The string is summarised in prose ("we searched for terms related to the device and the condition") rather than recorded verbatim. Another reviewer cannot reproduce it.
- **Cherry-picked inclusion criteria.** Exclusion reasons are applied inconsistently — stricter to unfavourable studies, looser to favourable ones.
- **No PRISMA-style flow.** The numbers at each stage are not documented, and the final included set cannot be traced back to the initial hits.
- **No link to the risk file.** Adverse events in the literature are noted in the synthesis but never flow into the risk management process under EN ISO 14971:2019+A11:2021. The review and the risk file live in separate universes.
- **No update cadence.** The review is treated as a one-shot artifact for CE marking and is never re-run, so by the time the first PMS review cycle arrives the literature base is out of date.

Each of these is cheap to prevent at the protocol stage and expensive to repair after a Notified Body finding.

## The Subtract to Ship angle on the literature review

Subtract to Ship applied to the systematic literature review does not mean a smaller search. It means a smaller scope of claim that needs to be supported, so that the search is focused and the evidence load is proportionate. The Evidence Pass of the framework runs in this order: define the intended purpose tightly, identify the specific GSPRs and clinical claims that need literature support, then run the review against those claims only.

The subtraction happens in two places. First, in the clinical questions: do not search for evidence on claims the device does not actually make, because every extra claim multiplies the evidence burden. Second, in the synthesis: do not extend conclusions beyond what the evidence supports, because overreach at synthesis stage is what forces additional clinical investigations later to fill the gap. A review that is honestly scoped and honestly synthesised is smaller and more defensible than one that tries to cover every possible future claim.

What cannot be subtracted is rigour. The protocol, the search reproducibility, the consistent application of inclusion criteria, the PRISMA-style documentation, the link into the risk file — all of these stay. They are the requirements that Annex XIV Part A Sections 1 and 2 put in place, and they are the reasons a literature-dominated clinical evaluation can legitimately replace expensive clinical investigations for established technologies.

## Reality Check — Where do you stand on your literature review?

1. Do you have a written literature search protocol that was finalised before the first search was run, and is it embedded in your clinical evaluation plan?
2. Does your protocol name the specific databases, the specific interfaces, the search strings verbatim, the date range, and the languages accepted?
3. Are your inclusion and exclusion criteria written down and applied consistently to favourable and unfavourable studies alike?
4. Can another competent reviewer read your protocol, re-run your search, and arrive at a comparable set of included records?
5. Do you have a PRISMA-style flow diagram showing records identified, screened, excluded with reasons, full-text assessed, and included?
6. Is your appraisal framework pre-specified, and are the scores for methodological quality and relevance recorded for every included record?
7. Does your synthesis address both favourable and unfavourable findings, and does every adverse event identified in the literature map to an entry in your risk management file under EN ISO 14971:2019+A11:2021?
8. Is there a defined update cadence for the review that matches your PMS and PMCF plan, so the literature base does not go stale after CE marking?
9. Are you claiming equivalence under MDCG 2020-5, and if so, is the equivalent device explicitly included in the device-strand search?
10. When was the last time you dropped a clinical claim from the CEP because the literature could not support it, rather than stretching the synthesis to fit?

## Frequently Asked Questions

**Is a systematic literature review legally required for every MDR clinical evaluation?**
MDR Article 61(3) requires the clinical evaluation to follow a "defined and methodologically sound procedure" based on a critical evaluation of the relevant scientific literature, the results of all available clinical investigations, and consideration of currently available alternative treatment options. Annex XIV Part A Section 1 expands this into the staged process of planning, identification, appraisal, and analysis. In practice, for any device where literature is a data source, a systematic literature review is the only way to meet these requirements credibly.

**How many databases do I need to search for an MDR literature review?**
The MDR does not specify a number. Current practice expects at least two independent biomedical databases, typically PubMed / MEDLINE and Embase, with additional sources (Cochrane, trial registries, grey literature) added where the clinical question requires them. A single-database review is not considered systematic and will attract a Notified Body finding.

**Do I need a PRISMA flow diagram for my clinical evaluation literature review?**
PRISMA is not legally mandated by the MDR, but it is the documentation format Notified Body reviewers expect, and it is the clearest way to show that the review was conducted as the protocol specifies. Most current clinical evaluation reports that survive audit include a PRISMA-style flow diagram for each search strand.

**How often should the literature review be updated?**
Article 61(11) and Annex XIV require the clinical evaluation to be updated throughout the life cycle of the device with data from post-market surveillance and PMCF. For Class III and implantable devices, Article 61(11) requires the PMCF evaluation report (and, where indicated, the summary of safety and clinical performance) to be updated at least annually. For lower-risk devices, the update cadence is defined in the PMS and PMCF plans, proportionate to the risk and the novelty of the device. A literature review that is not re-run on that cadence becomes the weakest link in the next CER update.

**Can I use MEDDEV 2.7/1 Rev 4 as my methodology?**
MEDDEV 2.7/1 Rev 4 (June 2016) remains a useful structural reference for the four-stage clinical evaluation process and for appraisal methodology. Where it diverges from the MDR text or from MDCG 2020-5 — particularly on equivalence — the MDR text and the MDCG guidance take precedence. Treat MEDDEV 2.7/1 Rev 4 as a source of structure, not as current binding interpretation.

**What PRISMA version should I use for a medical device literature review?**
The MDR does not name a specific PRISMA version. The 2020 update of the PRISMA reporting guideline is the current standard for systematic reviews in the biomedical literature and is the version most current reviews reference. The point is consistent, reproducible documentation of the records flow, not formal adherence to a specific checklist.

## Related reading

- [What Is Clinical Evaluation Under MDR?](/blog/what-is-clinical-evaluation-under-mdr) — the pillar post for the Clinical Evaluation cluster and the starting point for the whole topic.
- [MDR Article 61 Clinical Evaluation Requirements](/blog/mdr-article-61-clinical-evaluation-requirements) — the article-by-article walkthrough of what Article 61 obliges you to do.
- [MDR Annex XIV Part A: Clinical Evaluation Requirements](/blog/mdr-annex-xiv-part-a-clinical-evaluation) — the annex that sits directly underneath this methodology.
- [The Clinical Evaluation Plan Under MDR](/blog/clinical-evaluation-plan-mdr) — how the CEP defines the clinical questions that drive the literature search.
- [Clinical Evaluation Report Structure Under MDR](/blog/clinical-evaluation-report-structure-mdr) — how the literature review section fits inside the full CER.
- [Appraisal of Clinical Data Under MDR](/blog/appraisal-clinical-data-mdr) — the pre-specified appraisal criteria that make Step 6 defensible.
- [Equivalence Under MDR](/blog/equivalence-under-mdr) — how an equivalence claim under MDCG 2020-5 changes the device-strand search.
- [Literature Search Strategy Templates for MDR](/blog/literature-search-strategy-templates-mdr) — reusable search-string building blocks for common device categories.
- [PRISMA Flow Diagrams for Medical Device Reviews](/blog/prisma-flow-diagram-medical-device) — the dedicated walk-through of PRISMA-style documentation for MDR reviews.
- [The Subtract to Ship Framework for MDR Compliance](/blog/subtract-to-ship-framework-mdr) — the methodology pillar behind the Evidence Pass that governs how the literature review is scoped.

## Sources

1. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, Article 61 (clinical evaluation) and Annex XIV Part A (clinical evaluation plan; identification, appraisal, and analysis of clinical data; clinical evaluation report). Official Journal L 117, 5.5.2017.
2. MDCG 2020-5 — Clinical Evaluation — Equivalence: A guide for manufacturers and notified bodies, April 2020.
3. MEDDEV 2.7/1 revision 4 — Clinical Evaluation: A Guide for Manufacturers and Notified Bodies under Directives 93/42/EEC and 90/385/EEC, June 2016 (legacy guidance, still referenced for structural approach; MDR text and MDCG 2020-5 take precedence where they diverge).
4. EN ISO 14155:2020 + A11:2024 — Clinical investigation of medical devices for human subjects — Good clinical practice (referenced where records appraised are clinical investigations).
5. EN ISO 14971:2019 + A11:2021 — Medical devices — Application of risk management to medical devices (the destination for adverse-event findings from the literature review).

---

*The systematic literature review is where a lean clinical evaluation either earns its defensibility or quietly fails — and the hour spent on the protocol is worth more than any number of hours spent on the search itself.*

---

*This post is part of the [Clinical Evaluation & Investigations](https://zechmeister-solutions.com/en/blog/category/clinical-evaluation) cluster in the [Subtract to Ship: MDR Blog](https://zechmeister-solutions.com/en/blog). For EU MDR certification consulting, see [zechmeister-solutions.com](https://zechmeister-solutions.com).*
