When a Notified Body reviews technical documentation under MDR Annex IX Section 4, it does not read every page. The reviewer samples specific sections of Annex II, picks individual claims, and follows the evidence chain from the GSPR checklist into the risk file, the verification and validation evidence, the clinical evaluation, and the labelling. What gets scrutinized is traceability: can a sampled claim be followed end to end without a broken link? The file that survives is the file where every trace holds. The file that fails is the file where the auditor asks one question and the chain breaks.

By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.


TL;DR

  • Under MDR Article 52 and Annex IX Section 4, the Notified Body assesses the technical documentation for conformity with Regulation (EU) 2017/745. The assessment is sample-based, not exhaustive.
  • The sample typically includes the device description, the GSPR checklist, a handful of specific Annex I requirements traced end to end, the risk management file, the clinical evaluation, and the labelling against the substantiated claims.
  • Every sampled claim is traced through the evidence chain. If the chain breaks at any point, a nonconformity is written against the specific provision that the missing link supports.
  • The most common reviewer flags are broken cross-references, GSPR items silently marked not applicable, risk controls that are not implemented in the design evidence, and clinical claims that exceed what the clinical evaluation substantiates.
  • Structure beats volume. A small Annex II-shaped file that traces cleanly outperforms a large unstructured file where the content exists but cannot be followed.

The view from the reviewer's chair

The Notified Body review of technical documentation does not happen the way many founders imagine it. The reviewer does not start at page one and read through to the end. There is not enough time, and the Regulation does not ask for it. What happens instead is a structured sampling walk, guided by MDR Annex IX Section 4, where the reviewer picks entry points, pulls threads, and watches what happens when the thread is followed.

The mental model is closer to an investigator than an editor. The reviewer has a checklist derived from Annex II and Annex I, a risk class that determines the depth of sampling, and a limited amount of time to form a defensible judgment on whether this file supports the conformity of this device. Every sampled point has to produce an answer, and every answer has to be traceable. Points that produce silence, confusion, or contradiction become findings.

Tibor has sat in that chair. The file he later described as a treasure hunt was substantial in page count. The content was mostly there. But every sampled point produced a pause while the team went looking, and every pause lowered the reviewer's confidence in the file as a whole. By contrast, the three-person company in Lower Austria handed over a file where every sampled point resolved in minutes, and the audit closed with zero nonconformities. The variable was not size. The variable was whether the trace held.

The sampling approach

The reviewer does not audit every document. Under Annex IX Section 4, the Notified Body assesses the technical documentation on a representative basis, with sampling depth driven by device class, novelty, and risk. For a Class IIa device, the sample can be modest. For a Class III implantable, the sample can be deep and broad. Either way, the sample is structured around the same entry points.

Entry point one is the device description and intended purpose in Annex II Section 1. The reviewer reads this to form the mental model for everything that follows. If this section is unclear, every later sample becomes harder because there is no stable reference against which to judge evidence.

Entry point two is the GSPR checklist in Annex II Section 4. This is the map into the rest of the file. The reviewer picks a handful of Annex I requirements, usually the ones most relevant to the device's risk profile, and follows each one into the evidence the checklist points to.

Entry point three is the risk management file under Annex II Section 5 and EN ISO 14971:2019+A11:2021. The reviewer picks risks and traces the control measures into the design evidence and the labelling.

Entry point four is the clinical evaluation report in Annex II Section 6. The reviewer compares the clinical claims to the intended purpose in Section 1 and checks that the evidence actually substantiates what the device is claimed to do.

Entry point five is the labelling and instructions for use in Annex II Section 2, cross-checked against the claims the rest of the file supports.

The sample size is small relative to the file. The consequences of a broken trace are large. That asymmetry is the point of the method.
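
A manufacturer can rehearse the same walk before the reviewer does. The sketch below, in Python, is one minimal way to write the five entry points down as a self-check plan. The Annex II section labels come from the text above; the document identifiers are hypothetical placeholders, not a prescribed naming scheme.

    # Minimal self-check plan mirroring the reviewer's five entry points.
    # Document identifiers such as "TD-001" are hypothetical placeholders.
    ENTRY_POINTS = [
        ("Annex II Section 1", "Device description and intended purpose", "TD-001"),
        ("Annex II Section 4", "GSPR checklist", "TD-004"),
        ("Annex II Section 5", "Risk management file (EN ISO 14971)", "TD-005"),
        ("Annex II Section 6", "Clinical evaluation report", "TD-006"),
        ("Annex II Section 2", "Labelling and instructions for use", "TD-002"),
    ]

    def rehearse(file_index):
        """file_index: the set (or dict) of document identifiers actually in the file.
        Confirm each entry point resolves to a real, findable document."""
        for section, name, doc_id in ENTRY_POINTS:
            status = "resolves" if doc_id in file_index else "MISSING"
            print(f"{section}: {name} -> {doc_id}: {status}")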

The GSPR trace

The GSPR checklist is the single most sampled artefact in the review. The reviewer picks three, five, or more Annex I requirements and follows each one through the chain. The chain looks like this: the Annex I requirement, the method of demonstration stated in the checklist, the specific evidence referenced, the document that holds the evidence, the version of that document, and the conclusion that the evidence supports conformity.

Every step has to resolve. If the method of demonstration is "harmonised standard," the referenced standard has to be a current version and the test report has to exist. If the method is "in-house test," the test report has to be identifiable, version-controlled, and aligned with the claim. If the method is "literature," the literature has to be cited with sufficient specificity that the reviewer can find it and judge its relevance.

The common failure mode is a GSPR row marked "not applicable" without a real justification. Silent non-applicability is the cheapest way to produce a finding. Every non-applicable row needs a reasoned explanation, and the reasoning has to stand up to an informed reader. "Not applicable because the device does not have this feature" is sometimes correct and sometimes lazy. The reviewer can tell the difference.

The second failure mode is a GSPR row that points to a document identifier that no longer exists, or to a version that has been superseded. This is what broken cross-references look like in practice. The content was once there. A later revision renamed the document, moved it, or updated the version, and the GSPR checklist did not follow. The trace breaks at the reference layer.
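
Both failure modes are mechanically checkable before the audit. The following sketch assumes a simplified row structure for the GSPR checklist and a registry mapping document identifiers to their current versions; the field names and identifiers are illustrative, not a mandated format.

    from dataclasses import dataclass

    @dataclass
    class GsprRow:
        requirement: str        # e.g. "Annex I, 10.4" (illustrative)
        applicable: bool
        justification: str      # required whenever applicable is False
        method: str             # "harmonised standard", "in-house test", "literature", ...
        evidence_ref: str       # document identifier the row points to
        evidence_version: str   # version cited in the checklist

    def lint_gspr(rows, registry):
        """registry maps document id -> current version. Returns a list of findings."""
        findings = []
        for row in rows:
            # Failure mode one: silent non-applicability.
            if not row.applicable:
                if not row.justification.strip():
                    findings.append(f"{row.requirement}: 'not applicable' without justification")
                continue
            # Failure mode two: the reference layer is broken or superseded.
            current = registry.get(row.evidence_ref)
            if current is None:
                findings.append(f"{row.requirement}: evidence {row.evidence_ref} does not exist")
            elif current != row.evidence_version:
                findings.append(
                    f"{row.requirement}: cites {row.evidence_ref} v{row.evidence_version}, "
                    f"current is v{current}"
                )
        return findings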

The risk-design-V&V chain

The second deep trace the reviewer runs is the risk-design-verification chain. The reviewer picks a risk from the ISO 14971 file, reads the control measure, and looks for that control measure implemented in the design evidence and verified in the test reports.

A working chain looks like this. Risk identified: device delivers incorrect dose. Control measure: hardware dose-limiting circuit combined with software cross-check. Design evidence: circuit schematic, software requirement specification, integration test protocol. Verification evidence: circuit test report showing the limit is enforced, software verification report showing the cross-check fires on injected faults. Residual risk evaluation: acceptable given the combined controls. Benefit-risk conclusion: benefits outweigh the residual risk for the intended population.

Every link is a document. Every document is version-controlled. Every reference in the chain resolves to the correct version. The reviewer can walk the chain in minutes because the chain was built as a chain, not reconstructed from parallel binders.

The broken version of this chain is the risk file that sits beside the design file with no connective tissue. The risk has a control measure on paper. The design has no corresponding element. When the reviewer asks where the control is implemented, the team has to invent the trace on the spot. Invention under audit pressure rarely produces the answer the reviewer is willing to write down.
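
One way to keep the chain built as a chain is to store it as one. The sketch below models the worked dose example above as a single record and names every missing link; the document identifiers are hypothetical, and the structure is a minimal illustration, not an ISO 14971 template.

    from dataclasses import dataclass, field

    @dataclass
    class RiskTrace:
        risk: str
        control_measure: str
        design_evidence: list = field(default_factory=list)        # document ids
        verification_evidence: list = field(default_factory=list)  # document ids
        residual_risk_acceptable: bool = False

    def broken_links(trace):
        """Name every missing link so the chain can be repaired before the audit."""
        missing = []
        if not trace.design_evidence:
            missing.append("control measure has no corresponding design evidence")
        if not trace.verification_evidence:
            missing.append("control measure is not verified by any test report")
        if not trace.residual_risk_acceptable:
            missing.append("residual risk evaluation is absent or not concluded")
        return missing

    # The worked example from the text, with hypothetical document identifiers:
    dose_risk = RiskTrace(
        risk="device delivers incorrect dose",
        control_measure="hardware dose-limiting circuit plus software cross-check",
        design_evidence=["SCH-014", "SRS-007", "ITP-003"],
        verification_evidence=["VER-021", "SWV-009"],
        residual_risk_acceptable=True,
    )
    assert broken_links(dose_risk) == []  # a chain built as a chain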

The clinical evaluation cross-check

The third deep trace compares the clinical evaluation report in Annex II Section 6 to the intended purpose in Annex II Section 1 and to the clinical claims in the labelling in Annex II Section 2. The three have to agree.

The reviewer reads the intended purpose. Then the reviewer reads the list of clinical claims the device makes: on the label, in the IFU, on the website, wherever the manufacturer has placed them. Then the reviewer opens the clinical evaluation report and checks that each claim is substantiated by evidence at a level appropriate for the class of the device, under MDR Article 61 and Annex XIV.

Scope drift is the common finding. The intended purpose in Section 1 says one thing. The clinical evaluation evaluates something slightly broader or slightly different. The labelling claims something that neither Section 1 nor the CER quite covers. The three drift apart because they were written by different people at different times without a single reference version.

The fix is to lock the intended purpose early and reference it literally, by document identifier, from the clinical evaluation and from the labelling. When the intended purpose changes, the references propagate. When the intended purpose stays still, the references stay true. This is document control applied to a single sentence, and it prevents the most expensive category of finding in the file.
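
A minimal sketch of what referencing by document identifier buys, assuming each downstream document records the identifier and version of the intended purpose it was written against. The identifier "IP-001" and the versions are hypothetical.

    # The single locked reference every downstream document points at.
    INTENDED_PURPOSE = {"id": "IP-001", "version": "4"}

    downstream = {
        "clinical evaluation report": {"references": "IP-001", "version": "4"},
        "labelling": {"references": "IP-001", "version": "3"},  # stale reference: drift
    }

    for name, ref in downstream.items():
        if (ref["references"] != INTENDED_PURPOSE["id"]
                or ref["version"] != INTENDED_PURPOSE["version"]):
            print(f"scope drift: {name} references {ref['references']} v{ref['version']}, "
                  f"intended purpose is {INTENDED_PURPOSE['id']} v{INTENDED_PURPOSE['version']}")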

The labelling claims check

The fourth deep trace is a claims audit on the labelling and on external materials. The reviewer reads the label, reads the IFU, reads the warnings and contraindications, and checks that everything on the printed material traces into the technical file.

A warning on the label should trace to a hazard in the risk file and a control measure that assigns user information as the mitigation. An indication on the label should trace to the clinical evaluation. A performance claim should trace to a verification test report. A compatibility statement should trace to interoperability evidence. Every line on the label is a claim, and every claim is in scope.

External materials are in scope too. Under the MDR definition of intended purpose, promotional and sales materials are part of the intended purpose whether the manufacturer treated them that way or not. The reviewer can and does open the company website during the review. A claim on the website that does not appear in the technical file is a finding as surely as a claim on the label itself.
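
The claims audit can be rehearsed the same way. The sketch below assumes a simple claims register in which every claim, wherever it appears, points at an evidence document in the technical file; the sources, claim texts, and identifiers are invented for illustration.

    # A minimal claims register. Every line is a claim; every claim needs evidence.
    claims = [
        {"text": "reduces measurement time by 40%", "source": "website",
         "evidence": "VER-018"},   # performance claim -> verification report
        {"text": "indicated for adult patients",    "source": "IFU",
         "evidence": "CER-001"},   # indication -> clinical evaluation
        {"text": "do not use in MRI environment",   "source": "label",
         "evidence": None},        # warning traced to nothing: a finding
    ]

    technical_file = {"VER-018", "CER-001", "RMF-002"}  # documents that exist

    for claim in claims:
        ref = claim["evidence"]
        if ref is None or ref not in technical_file:
            print(f"unsubstantiated {claim['source']} claim: {claim['text']!r}")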

What raises a flag

Beyond the trace failures, certain surface-level features of the file catch the reviewer's attention immediately and raise the scrutiny level for every subsequent sample. These are the early signals that the file needs a careful read rather than a confirmatory one.

A table of contents that does not match Annex II is the first signal. If the file uses internal project naming, reorders the sections, or merges Annex II sections to "simplify" the structure, the reviewer has to translate every sample back into Annex II terms. Translation lowers trust and raises the sampling depth.

A risk file that lives as a standalone binder with no cross-references into the design and manufacturing information in Annex II Section 3 or the verification and validation evidence in Section 6 is the second signal. The reviewer expects the risk file to be woven into the design evidence, not parked next to it.

A GSPR checklist with many "not applicable" rows and few justifications is the third signal. Silent non-applicability tells the reviewer that either the team did not understand Annex I or the team is hoping the reviewer will not look closely.

A file without a version, a last-updated date, or a master index is the fourth signal. If the file itself has no version, document control is not in place, and the rest of the file becomes suspect.

Copy-paste content with placeholder company names or stock procedures that do not describe the company is the fifth signal. The reviewer notices within a few pages, and every subsequent document in the file is read with the question "is this real or is this template?"

Common findings

The common findings that result from the trace failures above are the same set in almost every first-audit technical documentation review.

  • GSPR items marked not applicable without justification, or marked applicable with broken evidence references.
  • Risk control measures that exist in the risk file but cannot be located in the design evidence.
  • Clinical evaluation scope that does not match the intended purpose in Section 1.
  • Label claims that exceed what the clinical evaluation and the verification evidence substantiate.
  • Verification and validation test reports that are not referenced by any GSPR row and serve no identifiable compliance purpose.
  • Post-market surveillance documentation under Annex III that is missing or reduced to a placeholder.
  • Documents in the file that are not under the document control procedure required by EN ISO 13485:2016+A11:2021.

The pattern behind all of these is the same pattern. The content was built as parallel streams by different contributors, the streams were never integrated into a single traceable file, and the reviewer's sampling method finds exactly the seams where the streams fail to meet.

The Subtract to Ship angle

The review is a trace test. The file either traces or it does not. Every document in the file that does not contribute to a trace is weight that makes the tracing harder without making it more reliable.

Subtract to Ship applied to the file before the review is the discipline of asking, for every document: what trace does this document support? If a document supports a trace from a GSPR row to evidence, from a risk to a design control, from a clinical claim to a study, it stays. If it supports no trace, it comes out. The Subtract to Ship framework for MDR provides the broader methodology; applied to technical documentation before a Notified Body review, the principle is identical.
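
Expressed as a set operation, the discipline is an orphan check, sketched below: collect every document identifier the trace map references, subtract that from the full file inventory, and whatever remains supports no trace. All identifiers here are hypothetical.

    # Every document either supports a trace or it comes out of the file.
    all_documents = {"TD-001", "VER-018", "CER-001", "RMF-002", "RPT-OLD-7"}

    traces = [
        ("GSPR Annex I 10.4", "VER-018"),
        ("risk: incorrect dose", "RMF-002"),
        ("claim: adult indication", "CER-001"),
        ("device description", "TD-001"),
    ]

    referenced = {doc for _, doc in traces}
    orphans = all_documents - referenced
    print("documents supporting no trace:", sorted(orphans))  # ['RPT-OLD-7']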

The file that wins the review is not the largest file. It is the file where every document earns its place on the trace map, and where the reviewer's samples resolve without reconstruction.

Reality Check. Where do you stand?

  1. If a reviewer picked five rows of your GSPR checklist right now, would every one of them resolve to a real document at the version cited?
  2. Can you trace a single risk from the ISO 14971 file through the design evidence into a verification report and into the residual risk statement, in under five minutes?
  3. Does the intended purpose in Annex II Section 1 match, word for word where possible, the scope of your clinical evaluation and the claims on your label?
  4. For every "not applicable" row in your GSPR checklist, is the justification reasoned and specific to this device?
  5. Is every document in the file version-controlled under EN ISO 13485:2016+A11:2021, with a clear owner and review cycle?
  6. Do the claims on your website match the claims your technical file substantiates? When was the last direct comparison?
  7. Does your file have a master index and a version number at the file level, or only at the individual document level?

Frequently Asked Questions

What does the Notified Body actually check during technical documentation review? Under MDR Article 52 and Annex IX Section 4, the Notified Body samples the technical documentation and traces specific claims from the GSPR checklist, the risk management file, the clinical evaluation, and the labelling into the underlying evidence. The review is not exhaustive. It is a structured sampling walk designed to judge whether the trace holds across the file.

How deep does the sampling go? Sampling depth scales with device class, novelty, and risk. A Class IIa device with a well-established technology gets a modest sample. A Class III implantable or a first-of-its-kind device gets a deep and broad sample. In every case the sample is chosen to test the parts of the file most likely to reveal integrity problems.

What is the single most sampled artefact in the file? The GSPR checklist in Annex II Section 4. It is the map into the rest of the file, and the reviewer uses it as the launching point for most trace exercises. A GSPR checklist that points cleanly into real evidence in the rest of the file protects almost every other section of the review.

Can the reviewer write a finding based on the website? Yes. Promotional and sales materials are part of the intended purpose under the MDR definition, which means a claim on the website that is not substantiated by the technical file can produce a finding as surely as a claim on the physical label. Reviewers do open the website during the review.

What makes a file "treasure hunt" bad? A file where the content exists but cannot be located in the expected Annex II section within a reasonable time. The reviewer treats findability as evidence. If the information cannot be found, the default position is that it is missing, and the burden shifts to the manufacturer to prove otherwise during the audit. That shift almost always worsens the final outcome of the review.

Sources

  1. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, Article 52 (conformity assessment procedures), Annex IX Section 4 (assessment of the technical documentation), Annex II (technical documentation, sections 1 to 6), Annex I (general safety and performance requirements), Annex III (technical documentation on post-market surveillance). Official Journal L 117, 5.5.2017.
  2. EN ISO 13485:2016 + A11:2021. Medical devices. Quality management systems. Requirements for regulatory purposes.
  3. EN ISO 14971:2019 + A11:2021. Medical devices. Application of risk management to medical devices.

This post is part of the Technical Documentation & Labeling series in the Subtract to Ship: MDR blog. Authored by Felix Lenhard and Tibor Zechmeister. Tibor has reviewed technical documentation from the Notified Body side of the table and built his own from the founder side. The sampling approach described here is the approach that has produced both zero-nonconformity outcomes and the treasure hunt outcomes. The difference between the two is always the trace.