Automated testing is fully compatible with EN 62304 and MDR. What changes in the regulated setting is that the CI pipeline, the frameworks, and any tool that decides whether a test passed must be qualified under EN ISO 13485:2016+A11:2021 clause 4.1.6, and every run must leave behind retrievable records. Get those two things right and you can run a modern DevOps workflow without compromising compliance.
By Tibor Zechmeister and Felix Lenhard.
TL;DR
- EN 62304:2006+A1:2015 does not prohibit or penalise automation. It requires that verification activities are planned, executed, and recorded — automation just makes all three cheaper.
- EN ISO 13485:2016+A11:2021 clause 4.1.6 requires validation of software used in the QMS, in production, or for monitoring and measurement, including test automation tools when their results determine acceptance.
- Tool qualification does not mean writing a 50-page IQ/OQ/PQ package. It means documenting the tool's intended use, assessing risk, and gathering proportionate evidence that it does what you rely on it to do.
- Evidence capture is the part startups forget. A passing test without a retained log is not verification — it is a rumour.
- Decide deliberately what to automate and what to keep manual. Usability-critical workflows and exploratory safety checks often belong outside the pipeline.
- The cost of a compliant CI pipeline for a small team is measured in days, not months, if you start from the right mental model.
Why this matters
The cultural collision is real. A typical medical software startup has engineers who came from fintech or SaaS and a regulatory consultant who came from Class III implants. The engineers want GitHub Actions, preview environments, feature flags, and deploys on merge. The consultant has seen too many MDR audits go wrong and wants waterfall-shaped documentation wrapped around every change.
Both sides are partly right. The engineers are right that EN 62304 never forbids automation and that notified bodies are comfortable with modern pipelines when the evidence trail is intact. The consultant is right that running tests without thinking about tool qualification and record retention is a direct path to a major non-conformity at the next audit.
The resolution is not a compromise that makes both sides slightly unhappy. It is a setup where the automation does more than the manual workflow ever did — and generates better evidence while doing it.
What MDR and the standards actually require
The MDR route runs through Annex I §17.2, which requires software to be developed "in accordance with the state of the art" with proper verification and validation. That points at EN 62304:2006+A1:2015 for the lifecycle activities.
EN 62304 clause 5.1.11 requires the manufacturer to document the methods, tools, and configuration management to be used. Clauses 5.5 (unit verification), 5.6 (integration testing), and 5.7 (system testing) require plans, execution, and records. Clause 8 covers configuration management — every build, test, and release must be reconstructible.
None of those clauses say "you may not use CI". What they say is: whatever you use, plan it, run it, and record it.
The second pillar is EN ISO 13485:2016+A11:2021. Clause 4.1.6 states that the organisation shall apply "a risk-based approach to the validation of computer software used in the quality management system" and that such validation shall be "proportionate to the risk associated with the use of the software". Clause 7.6 extends this to software used for monitoring and measurement. A test automation tool, when its output is the basis for releasing software, falls squarely inside 4.1.6.
The phrase that matters is risk-based and proportionate. This is the clause that lets you avoid writing a full validation protocol for every npm package you depend on. You validate what matters, in proportion to its role.
Tool qualification without theatre
Founders often imagine tool qualification as a massive IQ/OQ/PQ exercise inherited from pharmaceutical manufacturing. For test automation in a small SaMD team, it is much lighter.
A pragmatic qualification record for a single tool contains:
- Intended use. One paragraph. "We use pytest version X.Y to execute unit verification activities under EN 62304 clause 5.5. Test outcomes (pass/fail) contribute to the release decision for our Class B SaMD."
- Risk assessment. What happens if the tool misbehaves? For pytest, a false pass could let a defect through. For a linter, a false pass is lower-risk because the output is not directly tied to release.
- Qualification approach. Based on the risk, describe how you gain confidence. For pytest, relying on a well-known version, running a smoke test suite that intentionally includes known-failing tests, and pinning the version in a lockfile is often sufficient.
- Evidence. Record the version, the lockfile entry, and the results of the smoke check. Date and sign.
- Change control. When the tool version changes, repeat the check and update the record.
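The five items above can also be captured in a machine-readable form alongside the signed document, which makes the change-control step easy to diff. A minimal sketch, assuming illustrative field names (nothing in EN ISO 13485 mandates this structure, and the pytest version shown is hypothetical):

```python
# Illustrative machine-readable tool qualification record.
# Field names mirror the five-point list above; none are mandated
# by any standard, and the concrete values are examples only.
import json
from dataclasses import asdict, dataclass


@dataclass
class ToolQualificationRecord:
    tool: str
    version: str
    intended_use: str
    risk_assessment: str
    qualification_approach: str
    evidence: list[str]
    qualified_on: str
    qualified_by: str


record = ToolQualificationRecord(
    tool="pytest",
    version="8.2.0",  # hypothetical pinned version
    intended_use=(
        "Executes unit verification activities under EN 62304 clause 5.5. "
        "Pass/fail outcomes contribute to the release decision."
    ),
    risk_assessment="A false pass could let a defect into a release.",
    qualification_approach=(
        "Well-known version pinned in the lockfile; smoke suite with a "
        "deliberately failing test confirms failures are reported."
    ),
    evidence=["lockfile entry pytest==8.2.0", "smoke-check-2025-06-01.log"],
    qualified_on="2025-06-01",
    qualified_by="QA lead",
)

# Serialise for archiving next to the signed PDF.
print(json.dumps(asdict(record), indent=2))
```

When the tool version bumps, the change-control step is then a one-field update plus a re-run of the smoke check, and the diff in version control documents itself.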
A startup with a modern pipeline typically ends up with qualification records for: the test runner (pytest, jest, or equivalent), the integration test framework, the end-to-end test tool (Playwright or Cypress), the CI system itself, and any custom reporting or gating scripts. That is five to ten short documents, not fifty.
Crucially, you do not qualify the underlying libraries your product code imports. Those are SOUP under EN 62304 clause 8 and are handled through the SOUP evaluation process, not through 4.1.6 tool validation. The distinction matters — mixing them up leads to unnecessary paperwork.
Evidence capture: the part that trips teams up
Evidence is the deliverable. A test that passed last Tuesday without a retained record does not exist for audit purposes. The rules for evidence are simple and unforgiving.
Records must be retrievable. If a reviewer asks for the system test results from release 2.4.1 shipped eight months ago, you must be able to produce them. This is why relying purely on ephemeral CI logs is dangerous. CI providers rotate logs, change pricing tiers, and occasionally lose data. Export critical test outputs to durable storage.
Records must be linked to the release. A test log that you cannot tie to a specific version of the software is not useful evidence. The release record in the DHF should reference the test artefacts by identifier, hash, or URL, and those artefacts should include the version under test.
Records must be protected from unauthorised change. EN ISO 13485 clause 4.2.5 requires records to remain legible, readily identifiable, and retrievable. Storing them in an append-only bucket with access controls is sufficient. Storing them in a wiki anyone can edit is not.
Records must be retained for the required period. For medical devices, the usual retention period is the device lifetime plus a defined number of years (commonly five or longer, depending on the QMS and national requirements). The retention rule applies to test evidence too.
The simplest compliant setup: on every release-tagged CI run, export test reports (JUnit XML, coverage reports, Playwright traces) to object storage with write-once semantics, name them by release tag, and reference them from the release record in your document control system. Done.
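That export step can be sketched in a few lines. The version below assumes the artefacts already sit in a local `reports/` directory and leaves the actual upload call as a comment, since the cloud SDK specifics vary; what matters for the audit trail is the release-tagged key naming and the recorded hash:

```python
# Sketch of a release-evidence export step. Directory layout and key
# naming are illustrative assumptions, not a mandated convention.
import hashlib
import json
from pathlib import Path


def build_evidence_manifest(release_tag: str, reports_dir: Path) -> dict:
    """Name each artefact by release tag and record its sha256 hash.

    The manifest is what the release record references; the hashes let
    a reviewer confirm the archived files are unmodified.
    """
    manifest = {"release": release_tag, "artefacts": []}
    for artefact in sorted(reports_dir.glob("*")):
        digest = hashlib.sha256(artefact.read_bytes()).hexdigest()
        manifest["artefacts"].append({
            "object_key": f"evidence/{release_tag}/{artefact.name}",
            "sha256": digest,
        })
        # Upload would happen here with your cloud SDK, targeting a
        # bucket with write-once (object lock) semantics.
    return manifest


# Demo with a fabricated JUnit XML report.
reports = Path("reports")
reports.mkdir(exist_ok=True)
(reports / "junit.xml").write_text("<testsuite tests='42' failures='0'/>")
print(json.dumps(build_evidence_manifest("v2.4.1", reports), indent=2))
```

The manifest itself, once written into the release record, is the link a reviewer follows from the DHF to the archived evidence.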
A worked example
A six-person SaMD team is building a web-based Class IIa triage decision support tool. They use GitHub Actions for CI, pytest for unit tests, Playwright for end-to-end tests, and Dependabot for dependency updates.
Their compliance setup:
- Tool qualification records for GitHub Actions (as the CI orchestrator), pytest, Playwright, and a small internal script that generates the release evidence bundle. Four documents, two pages each.
- On every commit, unit tests run. Results stay in GitHub Actions for seven days — sufficient for development but not for audit evidence.
- On every release tag, a dedicated workflow runs the full test suite, generates JUnit XML and Playwright traces, and uploads them to an S3 bucket with object lock enabled. The bucket name, object key, and sha256 hash are written to the release record.
- Traceability matrix in the DHF links each software requirement to test case IDs. Test case IDs are the names of the test functions in the codebase. A reviewer can go from requirement to test function to archived test result in under a minute.
- What is not automated: usability validation under EN 62366-1:2015+A1:2020 (done with real users, recorded separately), and two test scenarios involving PDF rendering that were unreliable in automation. Those run from a signed manual test script before each release.
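Because the test case IDs are simply the test function names, the requirement-to-test link can be kept machine-checkable with a small script. A sketch, assuming a hypothetical `test_srsNNN_...` naming convention; the real convention is whatever the team's SOP defines:

```python
# Hypothetical traceability extractor. Assumes test functions embed the
# requirement ID in their name (e.g. test_srs012_rejects_invalid_age).
import re
from collections import defaultdict

REQ_PATTERN = re.compile(r"def (test_(srs\d+)_\w+)")


def extract_trace_links(source: str) -> dict:
    """Map each requirement ID found in test names to its test functions."""
    links = defaultdict(list)
    for test_name, req_id in REQ_PATTERN.findall(source):
        links[req_id.upper()].append(test_name)
    return dict(links)


sample = """
def test_srs012_rejects_invalid_age():
    ...

def test_srs012_accepts_boundary_age():
    ...

def test_srs045_flags_high_risk_triage():
    ...
"""

print(extract_trace_links(sample))
```

Running the extractor over the test suite and diffing the result against the traceability matrix catches requirements that have silently lost their tests.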
Total compliance overhead beyond what the team would have done anyway: roughly two weeks of documentation work, then ongoing maintenance of perhaps an hour per release. The pipeline produces better records than any manual process the team had before.
The Subtract to Ship playbook
1. Automate what deserves automation. Unit tests, API contract tests, regression checks, and most end-to-end flows. Automation here pays for itself within weeks.
2. Keep manual what deserves manual. Clinician-facing usability workflows, tests that require real hardware interaction, and exploratory safety checks. Manual does not mean undocumented — it means a signed test script with a recorded result.
3. Qualify tools proportionately. Two pages per tool is usually enough. Focus effort on tools whose output gates a release decision.
4. Treat evidence as a first-class output. Every release-gated CI run must produce durable, retrievable, version-linked artefacts. If your CI provider disappeared tomorrow, you should still have the evidence.
5. Wire traceability through test IDs. If your test function names or test case IDs appear in both the traceability matrix and the archived test results, the audit trail is already in place.
6. Review and update the tool qualification records with version bumps. Dependabot can bump the library version automatically. A human still has to update the qualification record when a new major version is adopted.
Reality check
- Can you list every tool in your CI pipeline whose output contributes to a release decision, and point to a qualification record for each?
- Are your release-gated test results stored somewhere that will outlive your current CI provider contract?
- Does each release record in your DHF contain a link or hash to the test evidence for that specific version?
- Can you reconstruct the test results for a release from twelve months ago in under thirty minutes?
- Does your SOP distinguish clearly between SOUP components (clause 8 of EN 62304) and validated tools (clause 4.1.6 of EN ISO 13485)?
- When a tool version changes, does someone actually update the qualification record, or does it drift silently?
- Are manual test scripts signed and retained with the same rigour as automated test evidence?
- If a reviewer asked "how do you know pytest reported the right result?", do you have a two-sentence answer backed by a document?
Frequently Asked Questions
Does EN 62304 require IQ/OQ/PQ for test tools? No. EN 62304 defers to ISO 13485 clause 4.1.6 for QMS software validation. That clause mandates a risk-based, proportionate approach. IQ/OQ/PQ is one way to do validation but is not required for test automation tools in a typical SaMD setting.
Can we use cloud CI services like GitHub Actions for regulated software? Yes. There is no regulatory rule against cloud CI. You must qualify the service for its intended use and ensure that evidence is retained outside the service if the service itself does not provide long-term retention that meets your record-keeping requirements.
Does every dependency in our lockfile need to be qualified? No. Runtime dependencies of the product are SOUP under EN 62304 clause 8, handled through SOUP evaluation. Development dependencies and test tools are handled under clause 4.1.6 only when their output contributes to quality decisions.
What about AI-based test generation tools? Same rule: if the tool's output contributes to release decisions, qualify it. The risk assessment should consider that AI-generated tests may be less predictable than hand-written ones and that human review of generated tests is usually part of the qualification approach.
How does this interact with EN 62366-1 usability testing? Usability validation under EN 62366-1:2015+A1:2020 generally cannot be automated. It must be conducted with representative users in representative use environments. Automated UI tests verify functional behaviour, not usability.
Do we need to version-control the test infrastructure code itself? Yes. CI configuration files, test scripts, and any custom tooling belong under configuration management per EN 62304 clause 8. Git is sufficient if access controls and change history are preserved.
Related reading
- Agile development under MDR and IEC 62304 — how modern iterative development fits inside the standard.
- DevOps for medical software under MDR — the broader pipeline view, including deployment and monitoring.
- Validating QMS software tools under MDR — the clause 4.1.6 workflow applied beyond just test tools.
- MDR software test strategy with IEC 62304 — the strategy document that sits above your automation.
Sources
- Regulation (EU) 2017/745 on medical devices, consolidated text. Annex I §17.2.
- EN 62304:2006+A1:2015 — Medical device software — Software life cycle processes. Clauses 5.1.11, 5.5, 5.6, 5.7, 8.
- EN ISO 13485:2016+A11:2021 — Quality management systems for medical devices. Clauses 4.1.6, 4.2.5, 7.6.
- EN 62366-1:2015+A1:2020 — Application of usability engineering to medical devices.