A usability test plan for a medical device covers six components under EN 62366-1:2015+A1:2020 clauses 5.7 to 5.9: the test objectives, the participant profile, the use environment, the scenarios to be tested, the data capture method, and the analysis approach. Each component must trace back to the use specification and the hazard-related use scenarios in the usability engineering file.
By Tibor Zechmeister and Felix Lenhard.
TL;DR
- A usability test plan under EN 62366-1:2015+A1:2020 must specify objectives, participants, environment, scenarios, data capture, and analysis.
- Participant recruitment is the highest-leverage decision in the entire plan. Testing with friendly users invalidates the result.
- The use environment must be realistic enough to elicit representative user behaviour.
- Data capture should prioritise observed behaviour over user opinions.
- Findings feed directly into the risk management file under EN ISO 14971:2019+A11:2021.
- The same protocol structure works for both formative (clause 5.8) and summative (clause 5.9) evaluation, with different levels of formality.
Why a written test plan matters
Felix has coached founders who treated usability testing as something to be done when the device was ready. No plan, no protocol, no written objectives. Just a session or two with whoever was available. Every one of these founders later had to repeat the work when the notified body asked for the usability engineering file and found nothing to review.
A written usability test plan is not bureaucracy. It is the artefact that proves the testing was structured, the participants were representative, the scenarios were hazard-relevant, and the findings were analysed. Without the plan, the test has no evidentiary value under EN 62366-1:2015+A1:2020 clause 5.7, which requires a documented user interface evaluation plan as part of the usability engineering file.
Tibor sees the same pattern from the auditor side. A startup produces a short memo saying usability testing was completed. The memo has no protocol, no participant profile, no scenario list, and no analysis. The auditor asks for the underlying plan. There is none. The submission stalls. The startup has to run the testing again, this time properly, under schedule pressure.
The fix is to write the plan first. The plan is cheap. It takes a few days. It forces the team to think through who, where, what, and how before spending money on participants. It becomes the scaffold the test report is built on. And it is the first thing the notified body will ask to see.
The six components of a usability test plan
A complete usability test plan under EN 62366-1:2015+A1:2020 has six components. Each one maps to a specific clause of the standard.
Component 1: Test objectives. The objectives state what the test is trying to learn or validate. For formative evaluation under clause 5.8, objectives are exploratory: identify use errors, confusion, and friction in specific parts of the user interface. For summative evaluation under clause 5.9, objectives are confirmatory: demonstrate that the intended users can safely and effectively complete the hazard-related use scenarios. The objectives must trace back to the use specification under clause 5.1.
Component 2: Participant profile. The participant profile defines the intended user groups the test will cover, the number of participants per group, and the inclusion and exclusion criteria. For a home-care device, the profile might include age range, clinical condition, prior experience with similar devices, language, and relevant disabilities. The profile is drawn from the user groups identified in the use specification. Testing outside the profile is not valid.
Component 3: Use environment. The environment describes where and under what conditions the test will run. A realistic home environment for a home-care device. A simulated clinic for a hospital device. A controlled outdoor environment for a device with outdoor use scenarios. The environment must be realistic enough to elicit representative behaviour. A conference room with a table and a whiteboard is not a home.
Component 4: Test scenarios. The scenarios are the tasks the participant will complete during the session. They must cover every hazard-related use scenario identified under clause 5.4. For a summative test, every hazard-related scenario selected under clause 5.5 is in scope. For a formative test, the team can focus on the specific parts of the user interface being iterated on. Each scenario has a starting state, a goal, and a success criterion.
Component 5: Data capture method. The method defines what will be recorded during the session. Video is standard. Audio think-aloud protocols are common. Structured observer notes capture use errors, close calls, hesitations, and questions. For summative evaluation, the data capture must be thorough enough to support the analysis. For formative evaluation, the capture can be lighter but must still produce evidence.
Component 6: Analysis approach. The analysis defines how findings will be evaluated. Every observed use error is assessed for risk impact under EN ISO 14971:2019+A11:2021. New hazards trigger new risk controls. Confirmed hazards with adequate controls are documented as validated. Unresolved hazards block the submission. The analysis approach also defines how the findings will be presented in the usability engineering file.
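For teams that track the plan in a structured form, the six components map naturally onto a simple data structure, and the "every hazard-related scenario is covered" requirement becomes a mechanical check. The sketch below is illustrative only: the class and field names are ours, not terminology from the standard, and a real plan carries far more detail per field.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One test scenario: a task with a defined start, goal, and pass criterion."""
    scenario_id: str
    hazard_ref: str        # the hazard-related use scenario this task traces to
    starting_state: str
    goal: str
    success_criterion: str


@dataclass
class UsabilityTestPlan:
    """The six components of a usability test plan, one field each."""
    objectives: list[str]
    participant_profile: dict        # inclusion/exclusion criteria, group sizes
    use_environment: str
    scenarios: list[Scenario]
    data_capture: list[str]
    analysis_approach: str

    def uncovered_hazard_scenarios(self, hazard_refs: set[str]) -> set[str]:
        """Hazard-related use scenarios that no test scenario traces to."""
        covered = {s.hazard_ref for s in self.scenarios}
        return hazard_refs - covered
```

With this shape, the pre-test traceability review reduces to one question: if `uncovered_hazard_scenarios` returns a non-empty set for the hazard-related use scenarios in scope, the summative plan is incomplete.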
A worked example: planning a summative test for a handheld device
Consider a startup preparing summative evaluation for a handheld diagnostic device intended for home use by patients aged sixty and older. The device has a touchscreen display, a battery indicator, a test strip insertion port, and a result screen. The use specification identifies one primary user group: home users aged sixty plus with limited prior experience of medical devices.
The test plan is drafted as follows.
Objectives. Validate that the intended user can safely complete the six hazard-related use scenarios identified in the risk file: unpacking and initial setup, inserting a test strip, reading a result, interpreting an error message, cleaning the device, and replacing the battery. Confirm that no new hazards surface in the intended use environment.
Participant profile. Fifteen participants aged sixty to eighty, recruited through a professional clinical recruiter. Inclusion criteria: no prior use of the device, native speakers of the language the IFU is written in, corrected vision adequate for reading the display, dexterity adequate for holding the device. Exclusion criteria: current employees of the manufacturer, family members of employees, participants from prior formative sessions.
Use environment. A simulated home living room with natural lighting, a coffee table, and a chair. Participants bring nothing to the session. The device, the IFU, and the test strips are presented on the table in a commercial-style package.
Test scenarios. Six scenarios, one per hazard-related use scenario, each with a starting state, a goal, and a success criterion. For example, scenario three: reading a result. Starting state: device has completed a test cycle and is displaying a result. Goal: the participant correctly identifies the result value and the reference range. Success criterion: the participant states the correct value and correctly classifies it as within or outside the reference range without coaching.
Data capture. Video of the participant's hands, face, and the device screen. Audio think-aloud protocol, prompted by a neutral facilitator. Structured observer notes capturing every use error, hesitation, and question. Post-session structured interview covering any difficulties the participant reports.
Analysis. Each observed use error is logged against the hazard-related use scenario it occurred in. Use errors are classified as close calls, recoverable errors, or potential harm events. Each classification feeds into the risk file and triggers a risk management decision. The summative evaluation report presents the findings, the analysis, and the conclusions for the notified body.
This plan is not hypothetical. It is the skeleton Tibor uses when reviewing startup summative evaluation plans for compliance with EN 62366-1:2015+A1:2020.
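The analysis step in the plan above can be expressed as a small decision rule. This is a minimal sketch under the three-way classification from the worked example; the function name and return values are ours, and a real risk file decision involves far more context than a single flag.

```python
from enum import Enum


class UseErrorClass(Enum):
    """Classification of an observed use error, as in the worked plan."""
    CLOSE_CALL = "close call"
    RECOVERABLE = "recoverable error"
    POTENTIAL_HARM = "potential harm event"


def risk_decision(classification: UseErrorClass, controls_adequate: bool) -> str:
    """Map a classified use error to the risk management decision it triggers."""
    if controls_adequate:
        # Confirmed hazard with adequate controls: document as validated.
        return "validated"
    if classification is UseErrorClass.POTENTIAL_HARM:
        # Unresolved hazard with harm potential: blocks the submission.
        return "blocked"
    # Anything else without adequate controls triggers a new or updated control.
    return "new control required"
```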
The Subtract to Ship playbook for executing the test
Planning is half the battle. Execution is the other half. Felix's Subtract to Ship approach to usability test execution has six disciplines.
Discipline 1: Recruit through a professional, not through the team. Tibor is explicit that testing with friendly users is one of the most common startup mistakes. Engineers are too familiar. Sales staff know the pitch. KOLs are not representative. Family and friends are not representative. Budget for a professional recruiter or a clinical research network. This line item is not optional.
Discipline 2: Run a pilot session first. Before the main test, run one pilot session to shake out the protocol. The pilot catches timing issues, facilitator script problems, data capture gaps, and environment friction. The pilot participant is excluded from the main analysis.
Discipline 3: Use a neutral facilitator. The facilitator should not be a member of the development team. Developers cannot resist prompting participants. A neutral facilitator asks neutral questions, avoids coaching, and lets the participant reveal real behaviour. If the startup cannot afford a professional facilitator, the founder is usually the least-compromised option, provided the founder is disciplined about silence.
Discipline 4: Capture behaviour, not opinions. During the session, capture what the participant does, not what they say they like. After the session, opinions can be collected through a structured interview, but the primary data is observed behaviour. This discipline is drawn from human factors methodology and is backed by decades of research showing that users' stated preferences often contradict their observed performance.
Discipline 5: Debrief after every session. After each session, the facilitator and observer debrief for fifteen minutes to capture impressions, flag emerging patterns, and fix anything in the protocol that is clearly broken. For formative testing, protocol updates mid-test are fine. For summative testing, protocol changes invalidate the run and must be avoided.
Discipline 6: Feed findings into the risk file in real time. Do not wait until the end of the test cycle to update the risk management file. Every significant finding is logged against the hazard analysis the same week it is observed. This discipline keeps the risk file current and prevents end-of-test analysis backlog.
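Discipline 6 amounts to an append-only log keyed to the hazard analysis, with new hazards flagged the moment they surface. A hedged sketch: the dictionary keys and the `new_hazard_candidate` flag are illustrative, not a prescribed format.

```python
def log_finding(risk_log: list, hazard_analysis_ids: set, finding: dict) -> bool:
    """Append a finding to the risk log the week it is observed.

    Returns True when the finding references a hazard not yet in the
    hazard analysis, i.e. a candidate new hazard needing a new entry.
    """
    is_new = finding["hazard_ref"] not in hazard_analysis_ids
    risk_log.append({**finding, "new_hazard_candidate": is_new})
    return is_new
```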
Reality Check
- Do you have a written usability test plan under EN 62366-1:2015+A1:2020 clause 5.7 that covers objectives, participants, environment, scenarios, data capture, and analysis?
- Are your test objectives traceable to the use specification and the hazard-related use scenarios?
- Is your participant recruitment handled by a professional recruiter or a clinical network, or are you testing with friendly users?
- Is your use environment realistic enough to elicit representative behaviour, or are you running the test in a conference room?
- Do your test scenarios cover every hazard-related use scenario from clause 5.4?
- Is your data capture thorough enough to support the analysis a notified body will review?
- Is your analysis approach integrated with the risk management file under EN ISO 14971:2019+A11:2021?
- Have you run a pilot session to shake out the protocol before the main test?
- Is your facilitator neutral, or are they a member of the development team who cannot resist coaching?
Frequently Asked Questions
How long should a usability test session last? For a typical handheld or home-use device, ninety minutes per participant is a reasonable target. Complex clinical devices may require longer sessions. Shorter sessions risk missing use errors that only surface after the participant has had time to interact with the full workflow.
How many pilot sessions should I run before the main test? One pilot is usually sufficient to shake out the protocol. If the pilot reveals significant problems, run a second pilot after fixing them. Pilot participants are excluded from the main analysis.
Can I use remote usability testing for a medical device? For software-only devices and apps, remote testing can be valid if the remote environment is representative and the data capture is thorough. For hardware devices, remote testing is generally not suitable for summative evaluation because the physical interaction cannot be observed adequately.
Do I need ethics committee approval for usability testing? Usability testing that does not collect clinical data, does not expose participants to clinical risk, and does not involve patients using the device on themselves in a way that differs from normal use typically does not require ethics committee approval. Edge cases exist. A regulatory professional should review the study design before recruitment begins.
What data capture format do notified bodies prefer? Notified bodies expect a summative evaluation report that includes the protocol, the participant demographics, the observed use errors, the analysis, and the conclusions. Raw video is usually not required in the submission but must be available on request. Structured observer logs and the analysis spreadsheet are standard.
How do I handle a participant who cannot complete a scenario? Record the failure, note the point at which the participant gave up, capture their stated reason, and analyse the finding against the hazard analysis. A failed scenario is valuable data. It reveals a use error that might otherwise have reached the market.
How much should a summative evaluation cost? For a straightforward handheld or home-use device, a well-run summative evaluation with fifteen participants, professional recruitment, a realistic simulated environment, and structured analysis typically costs between twenty and fifty thousand euros. Complex clinical devices cost more. This is rarely the place to cut corners.
Related reading
- Formative Usability Evaluation: How to Test Early and Often as a Startup covers the iterative testing that precedes summative.
- MDR Summative Usability Evaluation: The Final Validation Test is the anchor post for the final validation step.
- IEC 60601-1-6 Usability Cross-Reference connects the general usability standard to medical electrical equipment.
- Risk Management and Usability Engineering: How They Link shows how test findings feed the ISO 14971 risk file.
Sources
- Regulation (EU) 2017/745 on medical devices, consolidated text. Annex I §5, §22. Annex II.
- EN 62366-1:2015+A1:2020, Medical devices, Part 1: Application of usability engineering to medical devices. Clauses 5.1, 5.3, 5.4, 5.5, 5.7, 5.8, 5.9.
- EN ISO 14971:2019+A11:2021, Medical devices, Application of risk management to medical devices.