Formative usability evaluation is the iterative, exploratory testing a startup runs while the device is still changing. Its job is to find use errors, confusion, and workflow friction so the design can be fixed before EN 62366-1:2015+A1:2020 summative validation. It is cheap, repeated, and informal in logistics, but formal in intent.

By Tibor Zechmeister and Felix Lenhard.

TL;DR

  • Formative evaluation under EN 62366-1:2015+A1:2020 clause 5.8 is iterative and informs the design while change is still possible.
  • Summative evaluation (clause 5.9) is the final validation and happens when the design is frozen.
  • A group of engineers reviewing the device around a table is not formative evaluation. Real formative testing recruits representative users and captures real behaviour.
  • Startups that begin formative testing with paper prototypes and early software builds find use errors that would otherwise surface only in summative, where fixing them is five to ten times more expensive.
  • Formative evaluation feeds directly into the risk management file and the use specification under MDR Annex I §5.

Why formative usability matters before a single line of code freezes

In Tibor's experience auditing MedTech companies, the most common usability failure is not a dramatic disaster. It is a slow accumulation of small use errors that nobody caught in time. By the time the device reaches summative validation, those errors are structural. Fixing them means redesigning the device, updating the risk file, changing the instructions for use, and running summative again.

Formative evaluation is the opposite pattern. It is cheap. It is fast. It happens while everything is still fluid. And it is the single most effective way a MedTech startup can protect its summative budget and its certification timeline.

Felix has coached founders who thought formative evaluation was a nice-to-have. Their reasoning sounded sensible. They were short on runway. They had a small development team. They would do proper testing at the end. Every one of those founders later admitted that skipping formative was the most expensive decision of their development cycle. The use errors they found in summative were the same ones a ten-euro paper prototype test would have surfaced eighteen months earlier.

The regulatory reason formative matters is Annex I §5 of MDR (EU) 2017/745. The General Safety and Performance Requirements demand that devices be designed to reduce use errors to the lowest possible level, taking into account the technical knowledge, experience, education, training, and environment of the intended user. That language is not a suggestion. It is a hard requirement, and EN 62366-1:2015+A1:2020 is the harmonised standard that provides the presumption of conformity. Formative evaluation is how the standard expects that reduction to happen.

What EN 62366-1:2015+A1:2020 actually says about formative evaluation

Clause 5.8 of EN 62366-1:2015+A1:2020 describes formative evaluation as the activity conducted to explore user interface design strengths, weaknesses, and unexpected use errors. The standard explicitly positions formative evaluation as iterative. It happens throughout the design process, and its outputs feed back into the user interface specification, the use specification (clause 5.1), the hazard-related use scenarios (clause 5.3), and the risk analysis.

Clause 5.8 also makes clear that formative evaluation is not a single event. It is a category of activity. A paper prototype walkthrough with three nurses is formative evaluation. A tabletop simulation with a clickable mock-up is formative evaluation. A cognitive walkthrough with a home-care patient on a near-final prototype is formative evaluation. What ties these together is that the design is still changing, the goal is learning, and the output feeds back into the next iteration.

The standard draws a hard line between formative and summative. Summative evaluation, covered in clause 5.9, validates the final user interface against the use specification. Summative evaluation runs once the design is frozen, with representative users in realistic conditions. Mixing the two is a common startup mistake, and it is a mistake notified bodies catch immediately.

Tibor describes the pattern he sees in audits: a startup team sits around a table, reviews the device for an afternoon, writes a short memo saying "no significant usability issues identified", and files it as summative evaluation. That is not summative. It is not formative either. It is a group design review, which is valuable but does not fulfil any clause of EN 62366-1:2015+A1:2020. When a notified body sees this, it pushes back and demands real evaluation evidence. The startup now has to run both formative and summative from scratch, under time pressure, with the certification clock ticking.

A worked example: the handheld device with a flipped display

Consider a scenario drawn from Tibor's casebook. A small team developed a handheld diagnostic device with a touchscreen display. The engineering team happened to be mostly left-handed. Without noticing, they designed the display orientation, the button placement, and the grip ergonomics around left-handed use. Everybody on the team found the device intuitive. Internal design reviews raised no issues. The device proceeded to summative validation.

Summative evaluation with recruited right-handed users surfaced the problem within the first two sessions. Users held the device naturally and found the display upside down from their perspective. The fix was a software iteration that let the user flip display orientation on first use. It was not a catastrophic redesign, but it triggered change control, a new round of verification, a risk file update, and a repeat summative run for the affected use scenarios.

The cost of the fix was measurable. The cost of not catching it was higher. Had the team run even one formative session with a mixed-handedness user group on an early prototype, the issue would have been visible in week six of development instead of week ninety. The hardware grip could have been adjusted before tooling was cut. The software flip would have been a day-one design requirement rather than a field patch.

This is the pattern formative evaluation is designed to catch. It is not about finding the catastrophic safety issue, although it sometimes does. It is about finding the assumptions the development team cannot see in its own work. Formative evaluation is a mirror; the team cannot hold one up from the inside.

The Subtract to Ship playbook for formative evaluation

Felix's Subtract to Ship methodology takes a specific stance on formative evaluation: run it cheaply, run it often, and never pretend it is something else. The playbook has six steps.

Step 1: Write the use specification first. Before the first formative session, document the intended users, use environments, frequency of use, and operational scenarios under EN 62366-1:2015+A1:2020 clause 5.1. Tibor is explicit that the use specification is the most-skipped, most-important document in the whole process. Decompose every real-world procedure: cleaning, transport, installation, normal operation, edge cases, failure recovery. Granular procedures make hazards visible.
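The decomposition in Step 1 can be sketched as a simple data structure. This is an illustrative scaffold, not a template from EN 62366-1; every field name here is an assumption, and a real use specification would live in the usability engineering file, not in code.

```python
# Hypothetical scaffold for a clause 5.1 use specification.
# Field names are illustrative assumptions, not terms from the standard.
use_specification = {
    "intended_users": [
        {"profile": "home-care patient", "age_range": "60-85",
         "training": "none", "impairments": ["reduced dexterity", "presbyopia"]},
    ],
    "use_environments": ["home", "ambulance", "outpatient clinic"],
    "frequency_of_use": "twice daily",
    # Decompose every real-world procedure, not just normal operation.
    "procedures": {
        "cleaning": ["wipe housing", "disinfect sensor"],
        "transport": ["pack device", "carry in bag"],
        "installation": ["unbox", "charge", "pair with app"],
        "normal_operation": ["power on", "apply sensor", "read result"],
        "failure_recovery": ["low-battery restart", "sensor re-pairing"],
    },
}

# A quick completeness check: flag any lifecycle phase with no steps yet.
REQUIRED_PHASES = {"cleaning", "transport", "installation",
                   "normal_operation", "failure_recovery"}
missing = REQUIRED_PHASES - set(use_specification["procedures"])
print("missing phases:", sorted(missing))
```

The point of the structure is the completeness check: granular procedures make hazards visible precisely because an empty or missing phase is immediately obvious.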

Step 2: Test with paper prototypes in week one. Before the first line of UI code is written, print the screens, sit a representative user in front of them, and ask them to complete a task. Observe what they touch, where they hesitate, what they misread. This is formative evaluation. It costs nothing, it takes an afternoon, and it catches assumptions that would otherwise ship to summative.

Step 3: Recruit real users, not friends. Tibor has seen startups run formative evaluation with engineers, sales staff, friendly customers, and key opinion leaders. Engineers are too skilled. Sales staff know the pitch. KOLs are too familiar with the clinical domain. None of them represent the real seventy-year-old home user who will actually hold the device. Budget for participant recruitment from day one. It is the single highest-leverage line item in the usability budget.

Step 4: Capture observations, not opinions. During formative sessions, capture what the user did, not what they said they liked. Opinions are noise. Behaviour is signal. Video or structured note-taking is sufficient. Formative evaluation under clause 5.8 does not require the same level of formality as summative, but it does require evidence.

Step 5: Feed findings into the hazard analysis. Every formative finding should be evaluated for risk management impact under EN ISO 14971:2019+A11:2021. If a user made an error, the hazard analysis must ask whether that error could lead to harm. If yes, the risk control strategy changes. If no, the finding still informs the user interface specification.
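Steps 4 and 5 together imply a simple routing rule: record the behaviour, then ask whether the observed error could lead to harm. A minimal sketch of that rule, with all names and fields as illustrative assumptions rather than terms from the standards:

```python
from dataclasses import dataclass

@dataclass
class FormativeFinding:
    """One observed behaviour from a formative session (Step 4: behaviour, not opinion)."""
    session: str
    task: str
    observation: str          # what the user actually did
    use_error: bool           # did the behaviour deviate from intended use?
    could_lead_to_harm: bool  # answered by the ISO 14971 hazard analysis, not by the observer

def route_finding(finding: FormativeFinding) -> str:
    """Step 5: decide which document the finding updates."""
    if finding.use_error and finding.could_lead_to_harm:
        return "risk management file: revisit risk controls"
    if finding.use_error:
        return "user interface specification: design input"
    return "session log: no action"

finding = FormativeFinding(
    session="F2", task="apply sensor",
    observation="participant attached sensor upside down",
    use_error=True, could_lead_to_harm=True,
)
print(route_finding(finding))
```

Note that `could_lead_to_harm` is deliberately not something the session observer decides on the spot; it is the output of the hazard analysis the finding is fed into.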

Step 6: Iterate until the design is stable, then freeze. Formative evaluation continues until the design team is confident the remaining user interface issues are minor. At that point, the design is frozen, and summative validation runs under clause 5.9. Summative is a snapshot of a finished design. Formative is the process that made the design worth snapshotting.

The Subtract to Ship discipline is to resist the temptation to compress these steps. A startup that tries to save money by skipping formative will spend ten times as much fixing issues in summative or, worse, in post-market corrective action.

Reality Check

  1. Does your startup have a written use specification under EN 62366-1:2015+A1:2020 clause 5.1 that covers every real-world procedure, including cleaning, transport, installation, and edge cases?
  2. Have you run at least one formative session with representative users who are not employees, founders, or friends of the team?
  3. Are your formative findings documented in a way that feeds into the risk management file and the user interface specification?
  4. Can you show a notified body auditor evidence that formative evaluation happened iteratively throughout development, not as a single pre-summative event?
  5. Do you have a clear rule for when formative ends and the design is frozen for summative validation?
  6. If you removed every team member from the room and replaced them with representative users, would your device still look intuitive?
  7. Is your formative budget proportional to your summative budget? Most startups under-fund formative by a factor of ten.

Frequently Asked Questions

How many formative sessions does EN 62366-1:2015+A1:2020 require? The standard does not specify a minimum number. Clause 5.8 requires that formative evaluation be sufficient to inform the design. In practice, three to six sessions across the development lifecycle, with five to eight participants each, is a defensible baseline for a low-to-moderate complexity device.

Can formative evaluation replace summative validation? No. Formative and summative serve different purposes under clauses 5.8 and 5.9. Formative explores and informs. Summative validates. A notified body will reject a submission that uses formative results to substitute for summative validation.

Is a design review meeting formative evaluation? No. A group of engineers discussing the device around a table is a design review, not a formative evaluation. Tibor is explicit on this point. Formative evaluation requires representative users interacting with the device or prototype in conditions that approximate intended use.

How early should formative evaluation start? As early as there is something to test. Paper prototypes, wireframes, and clickable mock-ups are all valid artefacts for formative evaluation. Starting in week one of development is normal and recommended.

Do notified bodies review formative evaluation records? Yes. Notified bodies expect to see formative evaluation evidence as part of the usability engineering file under EN 62366-1:2015+A1:2020 clause 5.7. Absence of formative evidence, combined with a thin summative report, is a common audit finding.

Does formative evaluation apply to software-only devices and apps? Yes. EN 62366-1:2015+A1:2020 applies to the user interface of the medical device, regardless of whether the device is hardware, software, or a combination. App-based devices are in scope, and combinations of connected devices must be evaluated across the full user journey.

Sources

  1. Regulation (EU) 2017/745 on medical devices, consolidated text. Annex I §5, §22. Annex II.
  2. EN 62366-1:2015+A1:2020, Medical devices, Application of usability engineering to medical devices. Clauses 5.1, 5.3, 5.7, 5.8, 5.9.
  3. EN ISO 14971:2019+A11:2021, Medical devices, Application of risk management to medical devices.