For summative usability evaluation of a medical device under EN 62366-1:2015+A1:2020, the defensible floor is typically 15 representative users per distinct user group. The users must match the documented intended user profiles: not engineers, not sales staff, not friendly key opinion leaders (KOLs). Recruitment is where most startups silently fail their usability file.
By Tibor Zechmeister and Felix Lenhard.
TL;DR
- EN 62366-1:2015+A1:2020 clauses 5.7 to 5.9 require summative usability evaluation with representative users who match the documented user profiles in the use specification.
- The industry convention for summative testing is around 15 participants per distinct user group. A cardiologist and a home-care patient are two groups, so two sets of 15.
- Engineers, sales staff, and friendly key opinion leaders are not intended users. They are too familiar with the device to produce valid observations.
- Formative evaluation can use smaller groups to iterate design. Summative evaluation proves that residual use-related risks are acceptable and demands more rigor.
- Recruitment is the cost driver. Startups that plan recruitment early can combine usability testing with early customer discovery and save both money and time.
- Wrong-user testing produces a usability report that looks complete and then collapses during notified body review, triggering change control and delaying market entry.
Why this matters
Tibor has reviewed usability files where the participant list was five engineers from the same building as the development team. The device was a handheld point-of-care diagnostic intended for community nurses visiting elderly patients at home. The engineers completed every task in under two minutes. They rated the device highly intuitive. The file looked clean. It was not.
A real community nurse, recruited later under change control pressure, took eleven minutes and made three use errors that the engineers never triggered. One of those errors was a misread result. That is a patient safety problem, not a user interface annoyance. The cost of fixing it after the notified body flagged the wrong-user recruitment was several times what a correct summative round would have cost in the first place.
The pattern Tibor sees again and again: startups save money on recruitment and pay for it with change control, delayed market entry, and a damaged relationship with the notified body. EN 62366-1 is not prescriptive about exact numbers, but notified bodies read the same industry conventions founders read, and the convention for summative evaluation is clear.
What MDR actually says
The MDR does not give sample sizes. It tells manufacturers what the usability file must achieve. Annex I §5 of Regulation (EU) 2017/745 requires manufacturers, in eliminating or reducing risks related to use error, to reduce as far as possible the risks related to the ergonomic features of the device and the environment in which it is intended to be used, and to give consideration to the technical knowledge, experience, education, training and use environment of intended users. Annex I §22 adds that devices intended for lay persons must perform appropriately given the skills and means available to lay users.
The operational standard is EN 62366-1:2015+A1:2020, which is the harmonized usability engineering standard referenced by the MDR. Clause 5.1 requires a documented use specification. Clause 5.7 requires a user interface evaluation plan covering both formative and summative evaluation. Clause 5.8 covers formative evaluation during design. Clause 5.9 covers summative evaluation of the final user interface.
Summative evaluation under clause 5.9 must be conducted with representative users performing the critical tasks in a realistic or simulated use environment. The output is evidence that residual use-related risks have been reduced as far as possible. That evidence depends entirely on the validity of the participant pool.
Representative means the participant population matches the documented intended user profiles in the use specification. If the use specification says the device is used by home-care nurses aged 25 to 60 with a vocational nursing qualification and no prior experience with this device family, that is who must sit in the summative sessions. An R&D engineer who built the device is disqualified by definition.
A worked example
A Vienna-based startup was building a connected wearable for post-surgical recovery monitoring. The intended users in the use specification were two groups. Group A: patients aged 55 to 80, recovering at home, with limited smartphone experience. Group B: home-care nurses performing check-in visits.
The founders initially planned summative testing with eight participants total, all recruited from a local co-working space. Average age was 32. All were fluent smartphone users. None had undergone recent surgery. The notified body pre-audit rejected this plan with a short written comment: the participants do not match the intended user profile.
The corrected plan recruited 15 patients in the 55 to 80 range through a local outpatient clinic and 15 home-care nurses through a regional nursing agency. Recruitment took four weeks and cost about 9,000 euros including participant compensation, room rental, and observer time. The testing revealed seven distinct use errors in the patient group, three of which required software changes. One was a mislabeled button in the patient app that would have caused missed medication reminders.
The notified body accepted the revised summative round. The company shipped five months later than originally planned but with a clean usability file. Felix has watched this pattern repeat across several coached startups: the time cost of doing recruitment correctly the first time is always smaller than the time cost of doing it wrong and doing it again.
The Subtract to Ship playbook
Recruitment is a process problem, not a research problem. The goal is to get the right users into the right sessions at the lowest defensible cost. The playbook below traces back to clauses 5.7 to 5.9 of EN 62366-1 and to Annex I §22 of the MDR.
Step 1. Lock the use specification first. Before writing any recruitment brief, the use specification must name every distinct intended user group. Cleaners count if they handle the device. Service technicians count if they maintain it. Patients count separately from caregivers. The use specification is the contract that defines who is in and who is out of the participant pool.
Step 2. Set the sample size per group. The working default for summative evaluation in medical device practice is 15 participants per distinct user group. Some notified bodies accept lower numbers with written justification for low-risk devices. Some demand more for Class IIb and Class III. Tibor has never seen a notified body accept fewer than five participants per group for a summative round and has seen several reject eight.
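The sizing rule in Step 2 can be sketched in a few lines of Python. This is a minimal planning aid, not part of any standard: the `UserGroup` class, the `agreed_sample_size` override, and the function name are hypothetical, and the only number taken from the text is the working default of 15 per group.

```python
from dataclasses import dataclass
from typing import Optional

# Working default from the text: 15 participants per distinct user group.
DEFAULT_PER_GROUP = 15


@dataclass(frozen=True)
class UserGroup:
    name: str
    # Hypothetical override: a different number agreed with the notified
    # body in writing (lower with justification, or higher on request).
    agreed_sample_size: Optional[int] = None


def summative_sample_plan(groups):
    """Return the planned number of summative participants per user group."""
    plan = {}
    for group in groups:
        if group.agreed_sample_size is not None:
            plan[group.name] = group.agreed_sample_size
        else:
            plan[group.name] = DEFAULT_PER_GROUP
    return plan


# The two groups from the worked example above.
groups = [UserGroup("patients aged 55 to 80"), UserGroup("home-care nurses")]
plan = summative_sample_plan(groups)
total = sum(plan.values())
print(plan, "total:", total)
```

For the two groups in the worked example this gives 15 each, 30 in total. Any override should point to the written justification in the usability plan.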
Step 3. Disqualify the forbidden pool. Engineers working on the device, anyone who has seen the user interface more than twice, sales staff, marketing staff, company friends, and coached KOLs are all disqualified for summative work. They may participate in formative rounds where the goal is to explore and iterate, but they cannot be counted toward the summative sample.
Step 4. Recruit through external channels. For home-care users, partner with outpatient clinics, patient advocacy groups, or community health programs. For clinician users, partner with professional associations or staffing agencies. For specialized users such as radiologists, partner with the relevant specialist society. Pay participants fair compensation. Document the recruitment channel and the compensation in the usability report. Notified bodies read this section.
Step 5. Screen for representativeness. Build a short screening questionnaire that maps directly to the use specification user profile. Typical criteria are age range, profession, training level, prior device experience, language, and any assistive needs. Reject candidates who fail the screen. Document the screening criteria and the rejection rate. This is the audit trail that proves the sample was representative.
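One way to keep the screening audit trail from Step 5 is to encode the profile as data and record why each candidate passed or failed. The sketch below is illustrative only: the field names, the `screen` function, and the three candidates are invented, and the profile values reuse the home-care nurse example given earlier (aged 25 to 60, no prior experience with the device family). A real questionnaire would also cover training level, language, and assistive needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserProfile:
    """Screening criteria copied from the use specification (illustrative fields)."""
    min_age: int
    max_age: int
    profession: str
    allows_prior_device_experience: bool


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    age: int
    profession: str
    has_prior_device_experience: bool
    # True for engineers, sales, marketing, company friends, coached KOLs.
    is_internal_or_affiliated: bool


def screen(candidate, profile):
    """Return the reasons a candidate fails the screen; an empty list means pass."""
    reasons = []
    if candidate.is_internal_or_affiliated:
        reasons.append("internal or affiliated with the manufacturer")
    if not profile.min_age <= candidate.age <= profile.max_age:
        reasons.append("age outside profile range")
    if candidate.profession != profile.profession:
        reasons.append("profession does not match profile")
    if candidate.has_prior_device_experience and not profile.allows_prior_device_experience:
        reasons.append("prior experience with this device family")
    return reasons


profile = UserProfile(min_age=25, max_age=60, profession="home-care nurse",
                      allows_prior_device_experience=False)

# Invented candidates for illustration.
candidates = [
    Candidate("C01", 41, "home-care nurse", False, False),
    Candidate("C02", 32, "R&D engineer", True, True),
    Candidate("C03", 63, "home-care nurse", False, False),
]

# The log doubles as the documented screening record.
screening_log = {c.candidate_id: screen(c, profile) for c in candidates}
accepted = [cid for cid, reasons in screening_log.items() if not reasons]
rejection_rate = 1 - len(accepted) / len(candidates)

for cid, reasons in screening_log.items():
    print(cid, "PASS" if not reasons else "FAIL: " + "; ".join(reasons))
print(f"rejection rate: {rejection_rate:.0%}")
```

Keeping the reasons as a list rather than a single pass/fail flag means the same log yields both the rejection rate and the per-candidate justification.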
Step 6. Plan the environment. Summative testing in a simulated environment is acceptable under clause 5.9 if the simulation is realistic. For software-only or mobile applications, a quiet room with the actual device and realistic distractors can be sufficient. For devices used in clinical settings, a simulated clinic or operating room is expected. Employees testing the device in their own office is not summative evaluation.
Step 7. Combine purposes where ethical. Felix has coached startups that recruited summative participants through early customer discovery channels. Done transparently, with informed consent and no sales pressure, some participants become early customers after the test. This is not a trick. It is a practical outcome when you are recruiting real users of a real product.
Reality Check
Answer these honestly before you lock your summative evaluation plan. If you cannot answer yes to all of them, your usability file has a hole.
- Does your use specification name every distinct intended user group, including cleaners, service technicians, and caregivers where applicable?
- Is your summative sample size at least 15 per distinct user group, or do you have a written justification for a smaller number that your notified body has accepted?
- Have you excluded every engineer, marketer, investor, and friendly KOL from the summative participant pool?
- Are your participants recruited through external channels with a documented screening questionnaire that maps to the use specification?
- Is your test environment either the real use environment or a documented realistic simulation?
- Are participant compensation, recruitment channel, and screening rejection rate recorded in the usability report?
- Can you trace every observed use error from the summative sessions back to a line item in the risk management file under EN ISO 14971:2019+A11:2021?
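The last check in the list, tracing every observed use error to the risk management file, lends itself to a simple automated cross-check. The sketch below assumes use errors and risk file entries are linked by an ID string. The `UseError` class, the `RISK-...` identifiers, and the sample observations are all hypothetical and would need adapting to whatever the risk file actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UseError:
    session_id: str
    description: str
    risk_item_id: Optional[str]  # hypothetical ID of the linked risk file entry


def untraced_use_errors(errors, risk_file_ids):
    """Return use errors with no link, or with a link to an ID not in the risk file."""
    return [
        e for e in errors
        if e.risk_item_id is None or e.risk_item_id not in risk_file_ids
    ]


# Illustrative data only; IDs and descriptions are invented for this sketch.
risk_file_ids = {"RISK-012", "RISK-027"}
observed = [
    UseError("S03", "misread result on display", "RISK-012"),
    UseError("S07", "pressed wrong button in patient app", None),
    UseError("S11", "skipped confirmation step", "RISK-099"),
]

gaps = untraced_use_errors(observed, risk_file_ids)
for e in gaps:
    print(f"{e.session_id}: '{e.description}' has no valid risk file entry")
```

An empty `gaps` list does not show that the risk analysis is adequate, only that no observation has been left unlinked.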
Frequently Asked Questions
Is 15 participants a hard legal requirement for summative testing? No. Neither the MDR nor EN 62366-1 specifies an exact number. The figure of 15 per user group is an industry convention that notified bodies recognize as defensible. Lower numbers require written justification tied to device risk class and the critical task count.
Can formative evaluation use internal staff? Formative evaluation under clause 5.8 can use a wider range of participants because its purpose is to iterate the design, not to prove residual risk acceptability. Internal staff can contribute to formative work, but the results cannot be carried over into the summative file.
We have one user group but two use environments. How many participants? Two use environments usually imply different hazard-related use scenarios. Tibor typically recommends at least 15 participants per environment unless the notified body accepts combined testing with justification. The use specification should document both environments and the usability plan should reflect them.
Do we need to retest after a minor UI change? If the change affects a task that was part of summative evaluation, retesting that task with representative users is usually required. This is why engaging the notified body early on the summative plan is worth the effort. Retesting a single task costs less than a late redesign followed by another round of testing.
How long does recruitment take in practice? For typical home-care or clinician groups, four to eight weeks is realistic. For rare specialist groups, such as interventional radiologists working on a specific anatomy, recruitment can take months. Plan recruitment before you finish the final user interface, not after.
Can we use contract research organizations for recruitment? Yes. Reputable human factors CROs handle recruitment, screening, session moderation, and reporting. The cost is higher than self-run recruitment but the documentation is usually cleaner. For startups with no in-house human factors expertise, this trade-off is often worth it.
Related reading
- Risk management and usability engineering link. How EN 62366-1 outputs feed the EN ISO 14971 risk file.
- IEC 60601-1-6 usability cross reference. The bridge between electrical safety and usability for active devices.
- Design validation under MDR and ISO 13485. Why summative usability evaluation is part of design validation, not a separate exercise.
Sources
- Regulation (EU) 2017/745 on medical devices, consolidated text. Annex I §5, Annex I §22, Annex II.
- EN 62366-1:2015+A1:2020, Medical devices, Part 1: Application of usability engineering to medical devices. Clauses 5.1, 5.7, 5.8, 5.9.
- EN ISO 14971:2019+A11:2021, Medical devices, Application of risk management to medical devices.