---
title: Risk Estimation: Probability and Severity Scales for MedTech Startups
description: How to build defensible probability and severity scales for medical device risk estimation under EN ISO 14971 clause 5.5 and MDR Annex I GSPR 2.
authors: Tibor Zechmeister, Felix Lenhard
category: Risk Management Under MDR
primary_keyword: risk estimation probability severity medical device
canonical_url: https://zechmeister-solutions.com/en/blog/risk-estimation-probability-severity
source: zechmeister-solutions.com
license: All rights reserved. Content may be cited with attribution and a link to the canonical URL.
---

# Risk Estimation: Probability and Severity Scales for MedTech Startups

*By Tibor Zechmeister (EU MDR Expert, Notified Body Lead Auditor) and Felix Lenhard.*

> **EN ISO 14971:2019+A11:2021 clause 5.5 requires the manufacturer to estimate the risk for each identified hazardous situation using available information or data. Risk is the combination of probability of occurrence of harm and severity of that harm. MDR startups do not need novel scales. They need defensible ones.**


## TL;DR
- EN ISO 14971:2019+A11:2021 clause 5.5 requires a risk estimate for each identified hazardous situation. The standard defines risk as the combination of probability of occurrence of harm and severity of that harm.
- Qualitative scales with four to five levels are acceptable and common. Semi-quantitative scales with numeric ranges are also acceptable if data exists to support them.
- Fully quantitative scales are rarely practical for startups and rarely necessary.
- Inventing a novel scale is the fastest way to trigger notified body questions. Use the standard's own examples as a starting point.
- Severity levels must map to clinically meaningful outcomes: negligible, minor, serious, critical, catastrophic.
- Probability levels must be justified against data, state of the art, or expert reasoning, and the justification has to be written down.

## Why this matters

Risk estimation is the step where most startups make a choice that defines the rest of their risk file. The choice is whether to build a scale that the team can actually apply consistently, or to copy a scale from a template without understanding what the columns mean.

The second path is the more common one, and it produces risk files where the same hazard is rated one way in January and a different way in March by a different team member, with no change in the underlying data. Notified body desk reviewers catch this in minutes. They compare the rating language against the actual probability descriptions and find inconsistencies that no founder can defend.

In Tibor's experience, the failure pattern is predictable. A startup downloads a template with probability levels labelled "improbable, remote, occasional, probable, frequent" and severity levels labelled "negligible, minor, serious, critical, catastrophic". The team applies these labels without defining what any of them mean in their specific context. The audit arrives. The auditor asks: what does "occasional" mean for your device? No one has an answer.

A defensible risk estimation system is not a fancier template. It is a simple template whose every level is defined in terms that the team can apply without ambiguity.

## What MDR actually says

MDR Annex I GSPR 1 requires that devices achieve the performance intended by the manufacturer and that any risks are acceptable when weighed against the benefits to the patient. Acceptability depends on estimation.

MDR Annex I GSPR 2 requires a risk management system. Estimation is a named part of that system.

MDR Annex I GSPR 3 requires risks to be reduced as far as possible through safe design and manufacture. "As far as possible" is measured against an estimated risk level, which presupposes a method of estimation.

MDR Annex I GSPR 4 requires protective measures for risks that remain after design controls. The adequacy of protective measures is judged against the residual risk estimate.

EN ISO 14971:2019+A11:2021 clause 5.5 is the operative clause. The standard states that for each identified hazardous situation, the manufacturer shall estimate the associated risks using available information or data. The standard explicitly permits qualitative or quantitative analysis, and the choice depends on the nature of the device and the data available.

Clause 5.5 also requires that the procedure used for risk estimation, including any system of categorisation of probability and severity, shall be recorded in the risk management file. In other words, the scales themselves are auditable documents.

Annex C.1 of EN ISO 14971 provides example severity levels. Annex C.2 provides example probability categories. These examples are not mandatory, but they represent the consensus starting point that notified bodies are familiar with.

Important note on MDR interaction. Clause 6 of EN ISO 14971 says that if a risk is judged acceptable at the initial evaluation, no further risk control is needed. This is not compliant with MDR. MDR Annex I GSPR 3 requires risks to be reduced "as far as possible" regardless of initial acceptability. EN ISO 14971 alone follows "as low as reasonably practicable" logic. The MDR ratchet is not baked into ISO 14971. Founders who copy the standard into their QMS silently miss this, and the mismatch shows up at the notified body. The estimation step must feed a reduction step that applies the MDR logic, not the bare ISO 14971 logic.

## A worked example

A startup is estimating risk for a software-only diagnostic product that flags potential skin lesions in primary-care photographs. One of the identified hazardous situations is "false negative in a malignant lesion, leading to delayed referral and delayed diagnosis of skin cancer".

The team drafts the first version of a five-level severity scale:

- 1 Negligible. Inconvenience only, no injury.
- 2 Minor. Reversible harm, no medical intervention required.
- 3 Serious. Reversible harm requiring medical intervention.
- 4 Critical. Irreversible harm, or serious injury requiring major intervention.
- 5 Catastrophic. Death or permanent disabling injury.

For the false-negative hazard, the outcome is delayed diagnosis of a malignancy. The team rates this severity 5, catastrophic. The rating is defensible because the clinical literature supports that delayed melanoma diagnosis can be fatal.

The probability scale is qualitative with five levels:

- A Improbable. Has never occurred in similar devices and is not expected.
- B Remote. Possible under specific conditions but not expected in normal use.
- C Occasional. Expected to occur a few times over the device lifetime.
- D Probable. Expected to occur during normal use of the device.
- E Frequent. Expected to occur repeatedly in normal use.

For the false-negative hazard, the team has to justify the probability. Published literature on similar AI dermatology tools reports false-negative rates between two and eight percent depending on lesion type. The team rates probability C, occasional, and documents the literature reference.

The risk level is the combination of the two ratings: severity 5, probability C. In the team's risk matrix, this cell falls into the "high" zone, requiring risk control actions and residual risk re-evaluation. The team documents the clinical confidence threshold in the algorithm, the mandatory human-in-the-loop confirmation, and the IFU warning that the device is an aid to diagnosis and not a substitute for dermatologist review. After controls, the residual risk is re-estimated at severity 5, probability B, which the matrix places in the "medium" zone. Further control is applied through mandatory referral prompts, and the residual risk is re-estimated again.
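
The combination step can be pictured as a table lookup. The sketch below is a minimal Python illustration, not the team's actual matrix. Only two cells are taken from the example above: severity 5 with probability C is "high", and severity 5 with probability B is "medium". Every other cell, the "low" zone, and the names `RISK_MATRIX`, `PROBABILITY_LEVELS`, and `risk_zone` are assumptions made for illustration.

```python
# Illustrative 5x5 risk matrix. Only the cells (5, "C") -> "high" and
# (5, "B") -> "medium" come from the worked example; all other cells are
# assumed placeholders that a real risk management plan would have to define.
PROBABILITY_LEVELS = ["A", "B", "C", "D", "E"]  # Improbable .. Frequent

# Rows: severity 1 (negligible) to 5 (catastrophic).
# Columns: probability A..E, in the order of PROBABILITY_LEVELS.
RISK_MATRIX = {
    1: ["low", "low", "low", "low", "medium"],
    2: ["low", "low", "low", "medium", "medium"],
    3: ["low", "low", "medium", "medium", "high"],
    4: ["low", "medium", "medium", "high", "high"],
    5: ["medium", "medium", "high", "high", "high"],
}


def risk_zone(severity: int, probability: str) -> str:
    """Return the matrix zone for one severity/probability pair."""
    if severity not in RISK_MATRIX:
        raise ValueError(f"unknown severity level: {severity}")
    if probability not in PROBABILITY_LEVELS:
        raise ValueError(f"unknown probability level: {probability}")
    column = PROBABILITY_LEVELS.index(probability)
    return RISK_MATRIX[severity][column]


initial = risk_zone(5, "C")   # rating before risk controls
residual = risk_zone(5, "B")  # rating after risk controls
print(initial, residual)
```

Keeping the matrix as plain data rather than a formula means each cell can be reviewed and approved one by one, which fits the article's advice to avoid bespoke scoring formulas.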

The team never invented a novel scale. They used the Annex C examples, tightened the definitions, and justified every rating with evidence. A notified body reviewer reading the file can follow the reasoning step by step. That is the test of a defensible estimation method.

## The Subtract to Ship playbook

Felix coaches founders through a short, opinionated playbook for building the scales.

**One. Start from Annex C of EN ISO 14971.** Do not draft a novel scheme. The Annex C examples are the consensus baseline. Every deviation from them has to be defended. Every match to them saves an argument with the notified body.

**Two. Use five levels for severity and five levels for probability.** Four is too coarse to distinguish meaningful differences. Six or more is too granular for startups to apply consistently. Five is the sweet spot and it matches the most common notified body expectations.

**Three. Write the definitions of each level in device-specific terms.** Generic labels are useless. "Occasional" means nothing without a frequency range. "Serious" means nothing without a medical-intervention criterion. Every cell in the scale document needs one or two sentences of device-specific text.

**Four. Choose qualitative for new device categories and semi-quantitative where data exists.** If you have real occurrence data from predicate devices, pilot studies, or published literature, convert the probability scale into numeric ranges. If you do not, stay qualitative and say so explicitly. Do not pretend to quantitative precision you do not have.

**Five. Approve the scales at top management level and version-control them.** The scales are the measuring instrument for the entire risk file. Changing them mid-project without a formal change record invalidates every prior rating.

**Six. Do not invent a novel scheme to look sophisticated.** Tibor has audited dozens of startups that built their own scoring matrices with weighted factors, custom categories, and hybrid qualitative-quantitative formulas. Every one of them failed to apply the system consistently across their own hazard list. The notified body reviewer caught the inconsistency within an hour. The finding cost months of rework. Use the standard's own examples.

Subtract complexity. A clean five-by-five matrix with well-defined levels and evidence-backed ratings beats any bespoke formula.
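
As an illustration of step Four, a semi-quantitative probability scale ties each label to a numeric range. The sketch below assumes the estimate is expressed as a probability of harm per use. The thresholds in `UPPER_BOUNDS` are hypothetical placeholders, not values from EN ISO 14971 or from any device file, and `probability_level` is an invented helper name.

```python
# Hypothetical semi-quantitative probability scale: upper bound of the
# probability of harm per use for each level. The numbers are illustrative
# assumptions, not values from EN ISO 14971 or from any real device file.
UPPER_BOUNDS = [
    ("A", 1e-6),  # Improbable
    ("B", 1e-5),  # Remote
    ("C", 1e-4),  # Occasional
    ("D", 1e-3),  # Probable
]
TOP_LEVEL = "E"  # Frequent: anything at or above the last bound


def probability_level(p_harm_per_use: float) -> str:
    """Map an estimated probability of harm per use to a scale level."""
    if not 0.0 <= p_harm_per_use <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for level, upper in UPPER_BOUNDS:
        if p_harm_per_use < upper:
            return level
    return TOP_LEVEL


print(probability_level(5e-5))
```

Whatever ranges a team chooses, the source of each number still has to be written down in the risk management file. Without that source, the ranges are the "fake semi-quantitative" case the article warns against.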

## Reality Check

1. Are your severity and probability scales traceable to Annex C of EN ISO 14971?
2. Is every level of every scale defined in device-specific language, not only as a label?
3. Can every member of the risk management team apply the scales and reach the same rating for the same hazardous situation?
4. Have you documented the source of evidence for every probability rating in the risk file?
5. Does your risk management procedure apply the MDR "as far as possible" logic, not only the ISO 14971 acceptability logic?
6. Are the scales approved at top management level and under version control?
7. When the scales change, is there a documented change record and an impact assessment on prior ratings?
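
Items 6 and 7 can be backed by a simple consistency check. The sketch below assumes that each rating record stores the version of the scale document it was made under. The `Rating` structure, the hazard IDs, and the version strings are hypothetical and not taken from any real risk file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    hazard_id: str       # hypothetical identifier from the hazard list
    severity: int        # 1 (negligible) to 5 (catastrophic)
    probability: str     # "A" (improbable) to "E" (frequent)
    scale_version: str   # version of the scale document used for the rating


def ratings_needing_review(ratings: list[Rating], current_version: str) -> list[str]:
    """Return hazard IDs rated under a scale version other than the current one."""
    return [r.hazard_id for r in ratings if r.scale_version != current_version]


ratings = [
    Rating("HZ-001", 5, "C", "1.0"),
    Rating("HZ-002", 3, "B", "1.1"),
]
print(ratings_needing_review(ratings, current_version="1.1"))  # ['HZ-001']
```

The returned list is only a starting point for the impact assessment. Each flagged rating still needs a documented decision on whether it stands under the new scale.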

## Frequently Asked Questions

**Can we use a three-level scale to keep things simple?**
You can, but notified bodies generally expect five levels because three levels cannot distinguish between "minor" and "serious" in a way that drives different risk control actions. Tibor's recommendation is five levels with clear device-specific definitions.

**Is semi-quantitative always better than qualitative?**
No. Semi-quantitative is better when real probability data exists. When the data does not exist, semi-quantitative creates the illusion of precision and invites the notified body to ask for the source of the numbers. Qualitative with clear definitions is defensible. Fake semi-quantitative is not.

**What is the difference between probability of harm and probability of the hazardous situation?**
EN ISO 14971 uses a chain model. The probability of harm is the probability that a hazardous situation occurs, multiplied by the probability that the hazardous situation leads to harm. Some scales rate the two factors separately; others combine them into a single "probability of harm" estimate. Both approaches are acceptable if the method is documented in the risk management plan.
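
The arithmetic of the chain model is a single multiplication. In the sketch below, `p1` and `p2` are shorthand for the two factors, the function name is invented, and the input values are illustrative numbers, not data from any device.

```python
def probability_of_harm(p1: float, p2: float) -> float:
    """Combine the two factors of the chain model.

    p1: probability that the hazardous situation occurs.
    p2: probability that the hazardous situation leads to harm.
    """
    for name, value in (("p1", p1), ("p2", p2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return p1 * p2


# Illustrative numbers only: a hazardous situation in 5% of uses,
# leading to harm in 10% of those situations.
print(probability_of_harm(0.05, 0.1))
```

Because the product is never larger than either factor, rating only the probability of the hazardous situation tends to overstate the probability of harm. This is one reason the chosen method has to be stated explicitly.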

**Do we have to update the risk estimates after first certification?**
Yes. Post-market surveillance data feeds back into the risk file under MDR Articles 83 and 84. Probability estimates get updated when new data arrives. Severity rarely changes, but new clinical evidence can shift it. Tibor's recommendation is to review all probability estimates at least annually and after any significant vigilance signal.

**Can we copy the scales from a template we found online?**
You can use a template as a starting point. You cannot use it unchanged. Every level has to be defined in your device's context, and every definition has to be approved by your own top management. Templates are scaffolding, not content.

**What is the single most common mistake at this step?**
In Tibor's experience, applying the scales inconsistently across the hazard list. Same hazard, different ratings from different team members, no documented reason for the difference. The fix is a short calibration workshop where the team rates ten sample hazards together and compares results before the real rating exercise starts.
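
The comparison step of such a calibration workshop can be as simple as listing the hazards on which raters disagree. The sketch below uses invented rater names, hazard IDs, and ratings. It is one possible way to tabulate the results, not a prescribed method.

```python
from collections import Counter

# Hypothetical workshop data: each rater's (severity, probability) rating
# per sample hazard. Names and values are invented for illustration.
workshop = {
    "HZ-001": {"alice": (5, "C"), "bob": (5, "C"), "chen": (5, "B")},
    "HZ-002": {"alice": (3, "B"), "bob": (3, "B"), "chen": (3, "B")},
}


def disagreements(ratings_by_hazard):
    """Return hazards where raters did not all give the same rating,
    together with the count of each distinct rating."""
    result = {}
    for hazard_id, by_rater in ratings_by_hazard.items():
        counts = Counter(by_rater.values())
        if len(counts) > 1:
            result[hazard_id] = dict(counts)
    return result


print(disagreements(workshop))  # {'HZ-001': {(5, 'C'): 2, (5, 'B'): 1}}
```

Each hazard in the output points to a level definition the team reads differently. That definition is the text to tighten before the real rating exercise starts.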

## Related reading

- [Hazard identification methods](/blog/hazard-identification-methods) on the clause 5.4 step that feeds clause 5.5 risk estimation.
- [Intended use and reasonably foreseeable misuse in risk analysis](/blog/intended-use-foreseeable-misuse-risk-analysis) on the inputs that bound the hazard space before estimation begins.
- [The ISO 14971 Annex Z trap](/blog/iso-14971-annex-z-trap) on the MDR-specific overrides of the base standard's acceptability logic.
- [Benefit-risk analysis in the technical documentation](/blog/benefit-risk-analysis-technical-documentation) on how the estimated risks connect to the benefit-risk determination in the technical file.

## Sources

1. Regulation (EU) 2017/745 on medical devices, consolidated text. Annex I GSPR 1, 2, 3, and 4.
2. EN ISO 14971:2019+A11:2021, Medical devices, Application of risk management to medical devices, clause 5.5 and Annex C.

---

*This post is part of the [Risk Management Under MDR](https://zechmeister-solutions.com/en/blog/category/risk-management) cluster in the [Subtract to Ship: MDR Blog](https://zechmeister-solutions.com/en/blog). For EU MDR certification consulting, see [zechmeister-solutions.com](https://zechmeister-solutions.com).*
