IVDR performance evaluation rests on three pillars. Scientific validity establishes that the measured analyte is meaningfully linked to the clinical condition. Analytical performance proves the device can detect or measure the analyte accurately. Clinical performance proves the device's output delivers clinical benefit in the intended population. Each pillar builds on the previous, and omitting any one of them breaks the chain of evidence.

By Tibor Zechmeister and Felix Lenhard.

TL;DR

  • Scientific validity is the biological and clinical association between an analyte and a condition. It is mostly a literature and state-of-the-art exercise.
  • Analytical performance is the technical ability of the device to detect or measure the analyte. It is demonstrated through laboratory studies using recognised protocols.
  • Clinical performance is the ability of the device's output to deliver a clinical benefit, evidenced through literature, routine data, retrospective specimens, or prospective studies.
  • The three pillars are sequential. A device with strong analytical performance but weak scientific validity is not safe to use. A device with strong scientific validity but weak analytical performance produces meaningless numbers.
  • The Performance Evaluation Report consolidates all three pillars. Notified bodies review the report as a single chain of evidence, not three separate sections.

Why this matters

Tibor's experience reviewing IVDR technical documentation surfaces the same failure mode repeatedly. The manufacturer has strong analytical performance data, adequate clinical performance data, and almost nothing on scientific validity. The Performance Evaluation Report jumps straight to "the device measures the analyte with X sensitivity and Y specificity" without ever explaining why the analyte is worth measuring in the first place.

That gap is not a paperwork formality. It is the question that protects patients. If the analyte is not causally or correlatively linked to the clinical condition, then a perfectly accurate measurement of the analyte tells the clinician nothing useful. The device could be flawlessly engineered and clinically useless at the same time.

The three pillars exist to enforce a chain of reasoning. Without scientific validity, analytical performance is engineering vanity. Without analytical performance, clinical performance is coincidence. Without clinical performance, the device has no proven benefit and therefore no favourable risk-benefit profile. IVDR encodes this logic into Article 56 and Annex XIII.

Felix's Subtract to Ship framing: the three pillars are the minimum viable evidence base. Nothing less will pass notified body review. Anything more is investment in confidence, not regulatory compliance.

What IVDR actually says

IVDR defines performance evaluation as the continuous process of assessing and analysing data to demonstrate scientific validity, analytical performance, and clinical performance of a device for its intended purpose. Annex XIII Part A details the structure and content of the Performance Evaluation Plan and Performance Evaluation Report.

The regulation defines each pillar.

Scientific validity of an analyte is the association of an analyte with a clinical condition or a physiological state. This is a biological and clinical claim, not a device claim. A manufacturer does not own scientific validity. The manufacturer documents that the association is established in the scientific and clinical literature, in consensus guidelines, in pathophysiology references, and in any existing regulatory acknowledgements.

Analytical performance is the ability of a device to correctly detect or measure a particular analyte. Analytical performance is a device claim, verified by laboratory study. It includes parameters such as analytical sensitivity, analytical specificity, trueness, precision, accuracy, limit of detection, limit of quantification, measuring range, linearity, cut-off, and robustness against interferences and matrix effects.

Clinical performance is the ability of a device to yield results that are correlated with a clinical condition or a physiological or pathological process or state in accordance with the target population and intended user. Clinical performance is a use-in-context claim. It is evidenced through diagnostic sensitivity, diagnostic specificity, positive and negative predictive values, likelihood ratios, and, where relevant, expected values in relevant populations.
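
For concreteness, here is a minimal sketch in Python of how those metrics fall out of a two-by-two table against a reference standard. The counts are invented for illustration and not drawn from any real study.

  # Illustrative only: clinical performance metrics derived from a 2x2
  # table of test results against an adjudicated reference standard.
  def clinical_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
      sensitivity = tp / (tp + fn)                  # diagnostic sensitivity
      specificity = tn / (tn + fp)                  # diagnostic specificity
      return {
          "sensitivity": sensitivity,
          "specificity": specificity,
          "ppv": tp / (tp + fp),                    # positive predictive value
          "npv": tn / (tn + fn),                    # negative predictive value
          "lr+": sensitivity / (1 - specificity),   # positive likelihood ratio
          "lr-": (1 - sensitivity) / specificity,   # negative likelihood ratio
      }

  # Invented counts: 180 true positives, 40 false positives,
  # 5 false negatives, 775 true negatives.
  print(clinical_metrics(tp=180, fp=40, fn=5, tn=775))

Note that predictive values computed this way carry the prevalence of the study population. If the intended population has a different prevalence, they must be re-derived, which is exactly why the definition insists on the target population and intended user.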

The three pillars are not interchangeable. Strong data in one does not compensate for weak data in another. The notified body reviews the Performance Evaluation Report as a linked argument: is the analyte meaningful, can the device measure it, and does the measurement change clinical outcomes?

A worked example

Consider a startup developing a rapid test for a cardiac marker intended to rule out acute coronary syndrome in emergency department triage.

Scientific validity: is the cardiac marker linked to acute coronary syndrome? The team performs a structured literature review. The biological pathway is well understood. Guideline documents from relevant professional societies recommend measurement of the marker as part of rule-out algorithms. Decades of epidemiological and clinical studies document the association. Scientific validity is unambiguous and can be documented with a focused bibliography and a written rationale of a few pages.

Analytical performance: can the device measure the marker accurately? The team runs in-house studies following recognised protocols. Precision across operators, days, instruments, and lots. Trueness against a reference method calibrated to an internationally recognised reference material. Limit of detection calculated from replicate measurements of blank and low-concentration samples. Interfering substances tested systematically: haemoglobin, bilirubin, triglycerides, common drugs. Matrix tested: venous blood, plasma, serum, capillary blood. Stability tested across storage conditions. The analytical performance dataset occupies hundreds of pages of raw data and dozens of study reports, all summarised in the Performance Evaluation Report.
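
To make one of those calculations concrete, below is a minimal sketch of the classical parametric limit-of-blank and limit-of-detection estimate, in the spirit of CLSI EP17. It is deliberately simplified: the full protocol prescribes multiple reagent lots, instruments, and days, plus a non-parametric option when blank results are not normally distributed. All replicate values are invented.

  from statistics import mean, stdev

  def lob_lod(blank_replicates, low_level_replicates, z=1.645):
      """Simplified parametric LoB/LoD estimate (CLSI EP17-style)."""
      # Limit of blank: highest result plausibly produced by a blank sample
      lob = mean(blank_replicates) + z * stdev(blank_replicates)
      # Limit of detection: lowest level reliably distinguishable from LoB
      lod = lob + z * stdev(low_level_replicates)
      return lob, lod

  # Invented replicates, arbitrary units
  blanks = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.9, 1.0, 1.1]
  lows = [2.1, 2.6, 2.3, 2.8, 2.2, 2.5, 2.4, 2.7, 2.3, 2.6]
  print(lob_lod(blanks, lows))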

Clinical performance: does the measurement change clinical decisions? The team uses retrospective specimens from a biobank of emergency department patients with adjudicated diagnoses. The test is run blinded. Diagnostic sensitivity, specificity, positive predictive value, and negative predictive value are calculated against the adjudicated clinical gold standard. Confidence intervals are tight because the sample size is large. The negative predictive value is high enough to justify the rule-out claim in the intended population.
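
The remark about tight confidence intervals is a calculation, not a rhetorical flourish. A minimal sketch using the Wilson score interval, one common choice for proportions such as negative predictive value, shows how sample size drives interval width; the counts are invented.

  from math import sqrt

  def wilson_ci(successes: int, n: int, z: float = 1.96):
      """Wilson score confidence interval for a proportion."""
      p = successes / n
      denom = 1 + z**2 / n
      centre = (p + z**2 / (2 * n)) / denom
      half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
      return centre - half, centre + half

  # NPV as a proportion: true negatives / all negative test results
  print(wilson_ci(775, 780))  # large n: interval roughly 0.985 to 0.997
  print(wilson_ci(97, 98))    # similar point estimate, far wider interval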

Each pillar builds on the previous. Scientific validity justifies the product. Analytical performance justifies trust in the numbers. Clinical performance justifies the intended use claim. The chain is complete, and the notified body review focuses on the content of each link, not on whether the links exist.

Contrast with a failure mode Tibor has seen. A startup develops a novel biomarker assay with beautiful analytical performance. Precision, linearity, limit of detection, all excellent. Clinical performance data is generated from retrospective specimens showing apparent correlation with a specific disease. Scientific validity is asserted in two paragraphs referencing a small number of exploratory publications. The notified body asks: is the biomarker recognised? Is there a validated causal or correlative pathway? Are there consensus guidelines? The answers are no, not yet, not yet. The entire submission is returned with a major finding. The real fix is not more data, it is time. The scientific literature has to catch up, or the manufacturer has to sponsor or contribute to the research that builds the validity base. That is a multi-year setback.

The Subtract to Ship playbook

Step 1. Build the scientific validity file first. Before the first assay prototype. Before the first lab study. A literature-based scientific validity review is cheap, fast, and can save years. If the literature cannot support the analyte-condition link, that is a business-critical finding.

Step 2. Write scientific validity as a narrative, not a table. A Performance Evaluation Report reviewer wants to read a coherent argument: analyte X is biologically linked to condition Y through mechanism Z, supported by evidence A, B, and C, and recognised by guideline D. A table of citations alone does not demonstrate understanding.

Step 3. Lock analytical performance protocols to recognised standards. Use published CLSI documents, ISO standards, or pharmacopoeia methods wherever possible. Do not invent a protocol if a recognised one exists. Notified bodies trust recognised methodology and scrutinise ad-hoc methodology.

Step 4. Separate clinical performance endpoints by intended use claim. If the intended use claims rule-out capability, negative predictive value is the primary endpoint. If the claim is detection, sensitivity is primary. If the claim is confirmation, specificity is primary. Mismatched endpoints and claims are one of the most common notified body findings in this area.
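
A trivial sketch of that mapping, with claim labels that are this example's shorthand rather than regulatory terms of art:

  # Hypothetical claim-to-endpoint mapping, as described in Step 4
  PRIMARY_ENDPOINT = {
      "rule-out": "negative predictive value",
      "detection": "diagnostic sensitivity",
      "confirmation": "diagnostic specificity",
  }

  def primary_endpoint(claim: str) -> str:
      if claim not in PRIMARY_ENDPOINT:
          raise ValueError(f"no primary endpoint defined for claim '{claim}'")
      return PRIMARY_ENDPOINT[claim]

  print(primary_endpoint("rule-out"))  # negative predictive value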

Step 5. Treat the Performance Evaluation Report as a single argument. Felix's coaching rule: write the conclusion first, then the evidence. If the conclusion does not follow from the three pillars in that order, the evidence base is incomplete.

Step 6. Build continuous performance follow-up into the QMS. IVDR treats performance evaluation as continuous. Post-market performance data must feed back into the Performance Evaluation Report on a defined cadence. Set up the data pipeline before certification, not after.

Step 7. Map the three pillars to your QMS records. Every statement in the Performance Evaluation Report should trace back to a controlled document: a literature review report, an analytical study report, a clinical performance data set. Traceability is what distinguishes a submission that passes from one that triggers a major finding.
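
A minimal sketch of what such a traceability check can look like. The record identifiers and statements are invented; any real QMS will have its own numbering scheme.

  # Hypothetical traceability map: PER statement -> controlled QMS record
  per_statements = {
      "Analyte X is associated with condition Y": "LIT-REV-001",
      "Limit of detection is 2.4 ng/mL": "ASR-014",
      "NPV is 99.4% in the intended population": "CPR-003",
      "Device output is robust to haemolysis": None,  # orphaned claim
  }
  controlled_records = {"LIT-REV-001", "ASR-014", "CPR-003"}

  orphans = [stmt for stmt, rec in per_statements.items()
             if rec not in controlled_records]
  if orphans:
      print("Statements without a controlled record:", orphans)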

Reality Check

  1. Can you explain, in three sentences, why your analyte is clinically meaningful?
  2. Have you documented scientific validity as a narrative with traceable references, not a bibliography dump?
  3. Does every analytical performance claim trace back to a recognised protocol or a justified deviation?
  4. Do your clinical performance endpoints match your intended use claims exactly?
  5. Is the Performance Evaluation Report structured so a reader can follow the three pillars as a linked argument?
  6. Do you have a post-market performance follow-up plan written down and resourced?
  7. Can every statement in the Performance Evaluation Report be traced to a controlled QMS record?
  8. If the notified body asked for a major gap to be closed, do you know which pillar is weakest?

If you cannot answer question 8 immediately, the three pillars have not been internalised yet.

Frequently Asked Questions

Can strong analytical performance compensate for weak scientific validity? No. They answer different questions. Analytical performance tells you the number is correct. Scientific validity tells you the number matters. Neither can substitute for the other.

Do all three pillars require new primary data? Not necessarily. Scientific validity often relies entirely on existing literature. Analytical performance almost always requires in-house study data. Clinical performance can draw from literature, routine data, retrospective specimens, or prospective studies, depending on novelty and class.

Where does scientific validity data come from for a novel biomarker? From exploratory and validation studies in the peer-reviewed literature, from conference proceedings, from consensus statements, and, when these are insufficient, from sponsor-funded research. Novel biomarkers carry the highest risk of insufficient scientific validity at the time of submission.

How does this map to the Performance Evaluation Report structure? The Performance Evaluation Report has sections for each pillar, and a summary section that integrates them. Annex XIII Part A sets the structural expectations.

What happens if one pillar is weak at submission? The notified body issues a finding. Depending on severity, the finding may block certification until additional evidence is provided. Weak scientific validity is usually the hardest to fix quickly, because it depends on the state of published literature, not on the manufacturer's effort alone.

Sources

  1. Regulation (EU) 2017/746 on in vitro diagnostic medical devices, Article 56 and Annex XIII Part A.
  2. Regulation (EU) 2017/745 on medical devices, Article 61 and Annex XIV, for comparison.
  3. EN ISO 13485:2016+A11:2021 and EN ISO 14971:2019+A11:2021.