Appraisal of clinical data under MDR is the pre-specified, documented grading of every identified data record on two dimensions: relevance to the clinical questions in the clinical evaluation plan, and methodological quality of the record itself. It is the third stage of the four-stage clinical evaluation process defined by MDR Article 61 and Annex XIV Part A Section 2, sitting between the identification of clinical data and the analysis that produces the clinical evaluation report. Done with honest, consistent criteria applied equally to favourable and unfavourable data, appraisal is what makes a clinical evaluation defensible. Done as an afterthought, it is where Notified Body findings concentrate.

By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.


TL;DR

  • Appraisal of clinical data under MDR scores every identified record on two independent dimensions: relevance to the clinical questions from the clinical evaluation plan, and methodological quality of the record itself.
  • It is stage three of the four-stage clinical evaluation process described in MDR Annex XIV Part A Section 2: scope, identification, appraisal, analysis. The four stages are a direct inheritance from MEDDEV 2.7/1 Rev 4 (June 2016) and remain the practical structure under MDR.
  • Appraisal criteria must be pre-specified in the clinical evaluation plan under Annex XIV Part A Section 1. Criteria invented after the data is known are a direct Notified Body finding.
  • Annex XIV Part A Section 1(a) requires favourable and unfavourable data to be retained and appraised with the same criteria. Inclusion rules that filter out unfavourable findings under the guise of quality are the single most common and most serious appraisal failure.
  • MEDDEV 2.7/1 Rev 4 remains the structural reference for a detailed appraisal scoring scheme. MDCG 2020-5 governs how equivalence claims interact with appraisal. Where the legacy document conflicts with the MDR text, the MDR text wins.
  • The appraisal output is a structured data set that feeds stage four (analysis) against the acceptance criteria from the clinical evaluation plan, not a narrative commentary on interesting papers.

Why appraisal is where clinical evaluations win or lose

A clinical evaluation report can have a large body of identified literature, a clean PRISMA-style flow diagram, and a polished narrative. And still fail at the Notified Body review. The failure almost always sits in the appraisal stage. The records are listed. The scores are vague. The criteria are not traceable to the clinical evaluation plan. The methodologically weaker studies that happen to report favourable outcomes are retained; the methodologically weaker studies that happen to report adverse events are cut. Read ten clinical evaluation reports from the wrong side of the auditor's desk and the pattern repeats.

Appraisal is the stage where the manufacturer's judgment becomes visible. The identification stage is mechanical. A search string, a database, a date. The analysis stage is synthetic. Once the appraisal scores exist, the analysis follows. The appraisal stage is where a human being decides, record by record, how much weight each piece of evidence should carry in the conclusions about safety, performance, and clinical benefits. That is exactly why pre-specification matters, why consistency matters, and why the Annex XIV Part A Section 1(a) requirement to retain favourable and unfavourable data with the same rigour is the single line that separates a credible clinical evaluation from a fragile one.

This post walks through what appraisal under MDR actually is, how the two scoring dimensions work, where the four-stage clinical evaluation process puts appraisal, the mistakes that repeat across manufacturers, and what a Notified Body reviewer looks for when they open the appraisal tables.

What appraisal of clinical data actually means

Appraisal is the stage at which every record identified by the systematic literature review, every data set from an equivalent device claim, and every clinical investigation result is graded against pre-specified criteria. The grading produces structured scores that carry forward into the analysis and become part of the evidence base for every conclusion in the clinical evaluation report.

Two things make appraisal specifically an MDR activity rather than a generic research review. First, the criteria are anchored in the clinical evaluation plan required under MDR Annex XIV Part A Section 1(a). The plan states the clinical questions, the acceptance criteria for conformity with the relevant general safety and performance requirements, and the methodology for appraisal. The appraisal then executes that methodology. A record is not judged against an abstract notion of "good science" but against whether it answers the specific clinical questions the plan committed to.

Second, Annex XIV Part A Section 1(a) explicitly requires favourable and unfavourable data both to be retained and evaluated. This is not a suggestion. It is a mandatory provision of the Annex, and Notified Body reviewers treat selective retention as one of the most serious possible findings. A record that reports an adverse event cannot be dropped because it is inconvenient. It can only be dropped because the pre-specified criteria, applied equally to all records, produce a legitimate reason to exclude it. The criteria must be visible. The scoring must be traceable. The decision must be defensible to a reviewer who suspects the worst.

Relevance scoring. Does this record answer the clinical question?

The first of the two appraisal dimensions is relevance. Relevance is the judgment of how directly a record addresses the clinical questions defined in the clinical evaluation plan. A methodologically excellent study on a different device, a different population, or a different indication can be less relevant than a methodologically weaker study on exactly the target population, the target condition, and the intended clinical outcomes.

Relevance scoring is built from a small set of axes that the clinical evaluation plan pre-specifies:

  • Device axis. Does the record concern the subject device, the equivalent device under MDCG 2020-5 (April 2020), a device in the same generic group, or a broader technology class? Records on the exact device carry more relevance than records on similar devices, which carry more than records on the underlying technology in general.
  • Population axis. Does the record study the intended target population as defined in the clinical evaluation plan, a subset, a broader group, or a different population? A record on elderly patients does not automatically generalise to a paediatric indication and should not be scored as if it did.
  • Clinical condition and indication axis. Does the record address the exact clinical condition and indication of the device, or a related one? A study on one surgical indication does not carry the same relevance weight for a different indication even with identical technology.
  • Outcome axis. Does the record report on the clinical outcomes and benefit parameters specified in the clinical evaluation plan, or on different outcomes? The clinical benefits defined under Annex XIV Part A Section 1(a) are the outcomes that count for appraisal.
  • Use-condition axis. Was the device used as intended per Article 2(12), or was the use off-label, investigational, or otherwise outside the intended purpose?

The clinical evaluation plan specifies how these axes combine into a relevance score. Some manufacturers use a numeric scale, some use a three-band high/medium/low rating, some use a hybrid. What matters is not the specific scale but that the scale is declared before scoring starts and is applied consistently to every record, favourable and unfavourable alike.
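The axis-combination logic described above can be sketched as a small scoring function. This is an illustrative sketch only: the axis names, the 0-2 grading, and the band thresholds are assumptions invented for the example, not values from the MDR or any guidance. A real scheme must be declared in the clinical evaluation plan before scoring starts.

```python
# Illustrative sketch: axis names, 0-2 grades, and band thresholds are
# assumptions, not from the MDR text. A real scheme is pre-specified in
# the clinical evaluation plan before any record is scored.

# Each axis is graded 0-2: 2 = direct match, 1 = partial, 0 = different.
AXES = ("device", "population", "condition", "outcome", "use_condition")

def relevance_band(scores: dict) -> str:
    """Combine per-axis grades into a high/medium/low relevance band."""
    total = sum(scores[axis] for axis in AXES)  # maximum 10
    if total >= 8:
        return "high"
    if total >= 5:
        return "medium"
    return "low"

record = {"device": 2, "population": 2, "condition": 2,
          "outcome": 1, "use_condition": 2}
print(relevance_band(record))  # high (total 9)
```

Whether the plan uses a numeric total like this, a three-band rating per axis, or a hybrid matters less than that the same declared function is applied to every record.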

Methodological quality scoring. Can this record be trusted?

The second dimension is methodological quality. Methodological quality is independent of relevance. A record can be highly relevant and methodologically weak, or methodologically excellent but marginally relevant. Both dimensions must be scored, and both must carry forward into the analysis.

Methodological quality is judged against criteria appropriate to the study design of each record. The criteria change by design; the appraisal framework in the clinical evaluation plan lists the relevant design types and the criteria for each:

  • Randomised controlled trials. Randomisation adequacy and concealment, blinding where possible, completeness of follow-up, intention-to-treat analysis, sample size justification, handling of missing data.
  • Non-randomised controlled studies and comparative observational studies. Comparability of groups, control for confounders, length of follow-up, sample size, outcome definition and measurement, attrition and its handling.
  • Single-arm studies and case series. Consecutive enrolment, clear inclusion and exclusion criteria, standardised outcome measurement, length of follow-up, sample size relative to the claim being supported.
  • Registry and real-world data. Data quality controls in the source registry, representativeness of the population, completeness of outcome capture, and the relationship between the subject device and the registry data points.
  • Clinical investigations of the subject device. Adherence to the investigational plan, adherence to good clinical practice under EN ISO 14155:2020+A11:2024, monitoring quality, adverse event capture, and the statistical analysis plan.

MEDDEV 2.7/1 Rev 4 (June 2016) contains a detailed scoring structure for methodological quality, organised by study design, that many manufacturers still use directly or as the basis for their own plan-specific schemes. MEDDEV 2.7/1 Rev 4 remains a legitimate structural reference for this work. Where its specific wording conflicts with the MDR text or with MDCG 2020-5 (particularly on equivalence and on the treatment of data from similar but non-equivalent devices), the MDR text and the MDCG guidance win. In practice, at the appraisal scoring level, the conflicts are rare, and the MEDDEV 2.7/1 Rev 4 scoring approach can be adopted with minor plan-specific adjustments.
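A plan-specific scheme of the kind described above can be sketched as a design-keyed criteria table. The criterion names paraphrase the bullets above, and the 80%/50% thresholds are invented for illustration; a real clinical evaluation plan would pre-specify the exact wording and the threshold for each design type.

```python
# Hypothetical sketch: criterion names paraphrase the design-specific
# items discussed above; thresholds are invented for illustration.
QUALITY_CRITERIA = {
    "rct": ["randomisation_adequate", "blinding", "follow_up_complete",
            "itt_analysis", "sample_size_justified", "missing_data_handled"],
    "case_series": ["consecutive_enrolment", "clear_criteria",
                    "standardised_outcomes", "follow_up_length",
                    "sample_size_vs_claim"],
}

def quality_grade(design: str, criteria_met: set) -> str:
    """Grade a record against the checklist for its own study design."""
    criteria = QUALITY_CRITERIA[design]
    fraction = sum(c in criteria_met for c in criteria) / len(criteria)
    if fraction >= 0.8:
        return "good"
    return "adequate" if fraction >= 0.5 else "poor"

print(quality_grade("rct", {"randomisation_adequate", "blinding",
                            "follow_up_complete", "itt_analysis",
                            "sample_size_justified"}))  # good (5/6 met)
```

The essential design choice is the lookup by study design: an RCT and a case series are never graded against the same checklist.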

Where appraisal sits in the four-stage clinical evaluation process

MDR Annex XIV Part A Section 2 structures the clinical evaluation into four stages, inherited from MEDDEV 2.7/1 Rev 4 and retained in the current MDR-era practice:

  • Stage 1. Scope. The clinical evaluation plan under Annex XIV Part A Section 1 defines the intended purpose, the clinical questions, the clinical benefits and claims to be substantiated, the acceptance criteria, the data sources, and the methodology for appraisal and analysis. Everything downstream traces to this plan.
  • Stage 2. Identification. The systematic literature review (covered in the companion post at /blog/systematic-literature-review-clinical-evaluation), the equivalence documentation, and the clinical investigation results identify the records that enter the evaluation. The output is a set of records, documented in a PRISMA-style flow where literature is the source.
  • Stage 3. Appraisal. Every identified record is scored on relevance and methodological quality using the criteria declared in the plan. The output is a structured appraisal table that assigns each record a pair of scores and a defensible position in the evidence base.
  • Stage 4. Analysis. The appraised records are synthesised against the clinical questions and the acceptance criteria. The analysis produces the conclusions about safety, performance, benefit-risk, and the adequacy of the clinical evidence for each of the relevant general safety and performance requirements. Unfavourable findings feed into the risk management file under EN ISO 14971:2019+A11:2021.

Appraisal is the hinge. The identification stage decides what is on the table; the analysis stage decides what the evidence says; the appraisal stage decides how much weight each record carries when the analysis is written. Skipping or skimping the appraisal stage is the fastest way to produce an analysis that cannot be defended record by record under questioning.
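The structured appraisal output that stage three hands to stage four can be sketched as a minimal record type. Field names and values here are illustrative assumptions; the point is that each record carries both scores, a reference to its clinical question, and its direction, and that nothing is filtered out by direction.

```python
# Minimal sketch of the structured appraisal table described above.
# Field names and IDs are illustrative assumptions, not a prescribed format.
from dataclasses import dataclass

@dataclass
class AppraisedRecord:
    record_id: str
    clinical_question: str  # traces back to the clinical evaluation plan
    relevance: str          # e.g. "high" / "medium" / "low"
    quality: str            # e.g. "good" / "adequate" / "poor"
    favourable: bool        # direction of the reported findings

table = [
    AppraisedRecord("PMID-001", "CQ1-safety", "high", "good", True),
    AppraisedRecord("PMID-002", "CQ1-safety", "high", "adequate", False),
]

# Stage four weights by score, never by direction: the unfavourable
# record stays in the weight-bearing evidence base.
weight_bearing = [r for r in table if r.relevance != "low"]
print(len(weight_bearing))  # 2
```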

Common appraisal mistakes Notified Body reviewers find

The pattern of appraisal failures repeats across manufacturers and clinical evaluation projects. The same defects show up, and they are all preventable at the plan stage:

  • Criteria invented after the data is known. The records were read first, the "appraisal criteria" were written second to match the records the manufacturer wanted to keep. The scores have a suspicious internal consistency and no documented link to a plan that pre-dates the search. A reviewer can usually tell within an hour.
  • One dimension collapsed into the other. Relevance and methodological quality are fused into a single "include / exclude" judgment. The record-by-record reasoning disappears, and the reviewer cannot reconstruct why specific records ended up where they did.
  • Quality criteria applied selectively. Methodologically weak studies with favourable results are retained. Methodologically weak studies with adverse events are excluded. The Annex XIV Part A Section 1(a) requirement to treat favourable and unfavourable data with the same rigour is violated. This is the most serious finding in the catalogue and the one most likely to result in major non-conformities.
  • Scoring inconsistent across reviewers. Two appraisers score the same record differently, no disagreement resolution process is documented, and the final scores reflect whichever appraiser happened to enter them last. A documented second-reviewer check and disagreement protocol prevents this.
  • No connection to the clinical evaluation plan. The appraisal table does not reference which clinical question each record is addressing. The synthesis in stage four therefore cannot cleanly trace which evidence supports which conclusion.
  • Unfavourable findings never reach the risk file. Adverse events are scored in appraisal, mentioned in analysis, and then disappear. They never trigger an update to the risk management file under EN ISO 14971:2019+A11:2021, and the next risk file review cycle cannot show a trace from literature to risk. This is a direct link that reviewers now actively check.
  • Appraisal frozen at a moment in time. The appraisal was done once, before CE marking, and never re-run. When the clinical evaluation is updated under Article 61(3), new records are added but the existing scores are not re-examined against evolving acceptance criteria.

Each of these is cheap to prevent at the clinical evaluation plan stage and expensive to repair after a finding. The fix in every case is the same: pre-specify, apply consistently, document exhaustively.
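The second-reviewer check mentioned above can be as simple as a diff of the two appraisers' score tables, with every divergence routed into the documented disagreement-resolution process instead of silently taking whichever score was entered last. Record IDs and score values below are invented for illustration.

```python
# Sketch of a second-reviewer consistency check. IDs and scores are
# invented; the mechanism is the point, not the data.
def disagreements(reviewer_a: dict, reviewer_b: dict) -> list:
    """Return record IDs where the two appraisers' scores differ."""
    return sorted(rid for rid in reviewer_a
                  if reviewer_a[rid] != reviewer_b.get(rid))

a = {"PMID-001": "high", "PMID-002": "medium", "PMID-003": "low"}
b = {"PMID-001": "high", "PMID-002": "low",    "PMID-003": "low"}

# Each flagged record goes through the documented resolution process.
print(disagreements(a, b))  # ['PMID-002']
```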

What Notified Body reviewers actually look for

A Notified Body reviewer opening the appraisal section of a clinical evaluation report works from a small set of questions. They will not always state them out loud, but the answers determine whether the section earns credibility or earns a finding.

The reviewer asks whether the appraisal criteria in use match the criteria declared in the clinical evaluation plan. They look for the version and date of the plan and compare it with the dates on the appraisal table. They spot-check a handful of records, typically the ones that report unfavourable findings, and walk through the scoring to see whether the same rigour was applied as to the favourable records. They look at the distribution of scores: an appraisal in which every favourable record scores high and every unfavourable record scores low is a distribution that invites closer inspection. They check whether a second reviewer signed off on the scores and whether a disagreement resolution process was used. They trace at least one adverse event from an appraised record through to the risk management file under EN ISO 14971:2019+A11:2021 and see whether the risk file was updated.

A clinical evaluation that survives this inspection is almost always one where the appraisal plan was written in advance, applied mechanically, and documented in a way that makes the scoring auditable at a glance. A clinical evaluation that fails it is almost always one where the appraisal was done after the fact, under time pressure, with no second reviewer and no link to the risk file.
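The distribution check described above is mechanical enough to sketch: tally the quality-score mix separately for favourable and unfavourable records and flag the pattern that invites closer inspection. The data and the flagging rule are illustrative assumptions, not a reviewer's actual procedure.

```python
# Sketch of the score-distribution check a reviewer performs. The data
# and the flagging rule are illustrative assumptions.
from collections import Counter

def score_mix(records):
    """Tally quality scores separately by direction of findings."""
    mix = {True: Counter(), False: Counter()}
    for favourable, quality in records:
        mix[favourable][quality] += 1
    return mix

records = [(True, "good"), (True, "good"), (True, "good"),
           (False, "poor"), (False, "poor")]
mix = score_mix(records)

# Every favourable record scores high and every unfavourable record
# scores low: exactly the distribution that invites closer inspection.
suspicious = set(mix[True]) == {"good"} and set(mix[False]) == {"poor"}
print(suspicious)  # True
```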

The Subtract to Ship angle on appraisal

Subtract to Ship applied to appraisal does not mean a lighter appraisal. The Evidence Pass of the framework is explicit that rigour cannot be subtracted from clinical evaluation. What can be subtracted is the appraisal work that does not trace to a specific clinical question in the clinical evaluation plan. If the plan does not need a record to answer a question, the record does not need to be in the appraisal set at all.

The subtraction happens at scope, not at rigour. A tightly scoped clinical evaluation plan, built around the honestly defined intended purpose under Article 2(12), produces a narrower set of clinical questions. A narrower set of clinical questions produces a narrower search. A narrower search produces fewer records to appraise. The appraisal work that remains is the work that actually matters, and it can be done to the full standard because fewer records absorb the effort.

What cannot be subtracted is the pre-specification of criteria, the equal treatment of favourable and unfavourable data, the second-reviewer check, the documentation, or the connection into the risk file. These are the things that make appraisal defensible. Cut any of them and the whole clinical evaluation loses its foundation.

Reality Check. Where do you stand on your appraisal?

  1. Is your appraisal methodology pre-specified in the clinical evaluation plan under Annex XIV Part A Section 1, with a version and a date that precedes the first appraisal score?
  2. Do you score every record on two independent dimensions: relevance to the clinical questions, and methodological quality against design-appropriate criteria?
  3. Are the methodological quality criteria differentiated by study design (RCT, observational, case series, registry, clinical investigation) rather than a single one-size-fits-all scale?
  4. Can you demonstrate that favourable and unfavourable findings were scored against the same criteria with the same rigour, as Annex XIV Part A Section 1(a) requires?
  5. Is a second reviewer involved in the appraisal, and is there a documented process for resolving disagreements between reviewers?
  6. For every unfavourable finding retained in the appraisal, can you trace it forward into the analysis in stage four and into the risk management file under EN ISO 14971:2019+A11:2021?
  7. Does your appraisal table reference the specific clinical question from the clinical evaluation plan that each record is addressing, so the analysis can be traced back record by record?
  8. When the clinical evaluation is updated, do you re-examine existing appraisal scores against the current plan, or do you only add new records to an unchanged legacy table?
  9. If you are claiming equivalence under MDCG 2020-5, is the equivalent device data appraised with the same criteria as the literature on the subject device?
  10. When was the last time you removed a clinical claim from the evaluation because the appraised evidence could not support it, rather than stretching the synthesis to fit?

Frequently Asked Questions

What is appraisal of clinical data under MDR? Appraisal is the stage of clinical evaluation in which every identified data record is scored against pre-specified criteria for relevance to the clinical questions and methodological quality of the record itself. It is stage three of the four-stage clinical evaluation process under MDR Annex XIV Part A Section 2, sitting between identification and analysis, and it is governed by criteria declared in the clinical evaluation plan under Annex XIV Part A Section 1.

Is appraisal required for every MDR clinical evaluation? Yes. MDR Article 61(3) requires a defined and methodologically sound procedure based on a critical evaluation of the relevant scientific literature and the results of all available clinical investigations. Annex XIV Part A Section 2 operationalises this as the four-stage process in which appraisal is stage three. A clinical evaluation that identifies records without appraising them has skipped a mandatory step.

Can I use MEDDEV 2.7/1 Rev 4 as my appraisal methodology? MEDDEV 2.7/1 Rev 4 (June 2016) remains a useful structural reference for detailed appraisal scoring organised by study design, and many manufacturers still use it as the basis for their clinical evaluation plan appraisal scheme. Where its wording conflicts with the MDR text or with MDCG 2020-5, the MDR text and the MDCG guidance take precedence. Treat MEDDEV 2.7/1 Rev 4 as a legacy structural guide, not as binding current interpretation.

Can I exclude unfavourable studies on quality grounds? Only if the same quality criteria are applied equally to favourable studies and the exclusion is documented against the pre-specified rules. Annex XIV Part A Section 1(a) requires favourable and unfavourable data to be retained and evaluated with the same rigour. Selective application of quality criteria (stricter to adverse-event reports, looser to benefit reports) is one of the most serious possible Notified Body findings and one that reviewers actively look for.

How many reviewers should be involved in appraisal? The MDR does not specify a number. Current practice expects at least a second-reviewer check on the appraisal scores, with a documented process for resolving disagreements. A single-reviewer appraisal with no second-check is harder to defend and gives reviewers a reason to scrutinise the scoring more closely.

How does appraisal connect to the risk management file? Every unfavourable finding retained in appraisal, particularly adverse events, residual risks, and known failure modes identified in the literature, must be traceable into the risk management file maintained under EN ISO 14971:2019+A11:2021. If a literature-reported adverse event is not reflected as a hazard or a residual risk in the risk file, either the risk file needs updating or the appraisal did not take the finding seriously. Reviewers now actively trace this connection, and a missing link is a finding.
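The trace check described in this answer amounts to a set difference between the unfavourable appraised findings and the hazards recorded in the risk file. The identifiers below are invented for illustration.

```python
# Sketch of the appraisal-to-risk-file trace check. Identifiers are
# invented for illustration.
appraised_unfavourable = {"AE-fracture", "AE-infection"}
risk_file_hazards = {"AE-fracture", "AE-migration"}

# Any appraised adverse event absent from the risk file breaks the trace.
missing = appraised_unfavourable - risk_file_hazards
print(sorted(missing))  # ['AE-infection']: the risk file needs updating
```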

Sources

  1. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, Article 61 (clinical evaluation), Annex XIV Part A Section 1 (clinical evaluation plan, including 1(a) on favourable and unfavourable data), Annex XIV Part A Section 2 (identification, appraisal, and analysis of clinical data). Official Journal L 117, 5.5.2017.
  2. MDCG 2020-5. Clinical Evaluation. Equivalence: A guide for manufacturers and notified bodies, April 2020.
  3. MEDDEV 2.7/1 revision 4. Clinical Evaluation: A Guide for Manufacturers and Notified Bodies under Directives 93/42/EEC and 90/385/EEC, June 2016 (legacy guidance, still referenced for appraisal scoring structure; MDR text and MDCG 2020-5 take precedence where they diverge).
  4. EN ISO 14155:2020 + A11:2024. Clinical investigation of medical devices for human subjects. Good clinical practice (applied where the appraised record is a clinical investigation of the subject device).
  5. EN ISO 14971:2019 + A11:2021. Medical devices. Application of risk management to medical devices (destination for unfavourable appraisal findings).

This post is part of the Clinical Evaluation & Clinical Investigations cluster in the Subtract to Ship: MDR blog. Authored by Felix Lenhard and Tibor Zechmeister. Appraisal is the quiet stage of clinical evaluation where the credibility of the whole report is decided. And the hour spent pre-specifying the criteria is worth more than any number of hours spent defending scores after a Notified Body finding.