AI can take on the structured, high-volume layer of post-market surveillance — categorising incoming complaints, clustering similar events, flagging duplicates, surfacing early signals that a human reviewer would miss in the noise — and it can do this at a speed a small regulatory team cannot match by hand. What it cannot do is decide whether a cluster is a real safety signal, whether an event is a serious incident under MDR Article 87, or whether a field safety corrective action is warranted. The operating model that works under MDR Articles 83 to 86 is AI as first-pass triage and signal surfacing, qualified human as final adjudicator, with an audit trail that shows who decided what and when. This post walks through the data volume problem, what AI does well in complaint triage, the signal detection methods that hold up at audit, the human-in-the-loop discipline that keeps the system honest, the audit trail expectations Notified Bodies are asking about in 2026, and the mistakes small teams keep making.

By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.


TL;DR

  • MDR Articles 83 to 86 require a proactive PMS system proportionate to the risk class. Nothing in the Regulation prohibits AI-assisted complaint analysis, and for higher-volume products it is becoming the only way a small team can stay on top of the data.
  • AI does well at the structured triage layer: categorising complaints by event type, clustering similar complaints, flagging likely duplicates, tagging relevance to specific device components, and producing a ranked queue for human review.
  • Signal detection with AI uses clustering, rate-change monitoring against a baseline, and early-warning thresholds. The methods are well-understood; what matters is that they run on a defined cadence with defined escalation rules.
  • The hard rule is that no safety signal and no vigilance reporting decision is made by the tool. MDCG 2023-3 Rev.2 (January 2025) and MDR Articles 87 to 89 put those decisions on a qualified human at the manufacturer.
  • Audit trail expectations are concrete. Who drafted, who reviewed, who approved, with what inputs, on what date, using which version of the tool — the QMS under EN ISO 13485:2016+A11:2021 has to cover all of it.
  • The most common mistake is treating a ranked queue as a sorted answer and letting the bottom of the queue go unread. The second is having no locked baseline to measure signal rates against.

Why this matters — the PMS data volume problem

Classical PMS in a small MedTech team was a process the regulatory lead could hold in their head. Maybe a handful of complaints per month. A service log that fit on one screen. A literature search every quarter. The data volume was low enough that reading every item was realistic, and the team's job was mostly disciplined record-keeping and escalation.

That model does not survive contact with a product that is actually in the field at scale. A diagnostic device with a few hundred installed sites starts producing complaint intake through multiple channels — a customer portal, a field service system, distributor reports, clinician emails, support tickets that sometimes contain a safety-relevant sentence buried in a paragraph about something else. Add MDCG 2025-10 (December 2025) expectations for proactive data collection from a wider range of sources, and the volume of material a PMS system has to ingest grows faster than any small team's reading capacity.

The two people in the Flinn customer case Tibor describes elsewhere in this blog — the ones who quit because their job had become Excel copy-paste — were drowning in exactly this problem. Hundreds of reports a week. Every one had to be read, categorised, and logged. The work was necessary, and the people doing it were overqualified for the mechanics and underpowered for the volume. Neither of those is a sustainable position.

AI-assisted PMS is the response to that arithmetic. Not because the Regulation changed, but because the data volume in a real product at real scale changed, and the human-only workflow stopped clearing the queue.

For the foundational framework, see what is post-market surveillance under MDR and MDR Articles 83 to 86 explained. For the drift-oriented PMS pattern for AI devices themselves, see post-market surveillance for AI devices.

What AI does well in complaint triage

The band where AI tools are reliable for complaint triage is narrower than the marketing suggests, but it is real, and it maps cleanly to the things a small team spends the most hours on.

Event-type categorisation. Tagging each incoming complaint with a draft category — serious incident candidate, non-serious incident, malfunction, user error, no device relationship, off-topic — is a structured classification task current models handle well. In the Flinn customer case, pre-categorisation of the safety database saved more than 80% of the manual time compared with the previous Excel-based workflow. That is not a universal number, but it is a real one for a specific workflow under specific conditions.

Clustering similar complaints. Grouping complaints that describe the same underlying event pattern — same component, same failure mode, same clinical context — lets a reviewer see twenty related reports as a single cluster rather than as twenty unrelated items in a queue. Clusters are the raw material of signal detection. A tool that clusters reliably cuts the hours spent looking for patterns by eye.

Duplicate detection. The same event reported through two channels, the same clinician filing twice after a follow-up, the same distributor forwarding a complaint that was already logged. Duplicates inflate apparent signal rates and waste reviewer time. A simple similarity check over the text and metadata removes them before the queue reaches the human.
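The similarity check described above can be sketched in a few lines. This is a minimal illustration, not any real tool's logic: the record fields (`device_sn`, `event_date`, `narrative`) and the 0.9 similarity threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher

def is_duplicate(a: dict, b: dict, text_threshold: float = 0.9) -> bool:
    """Flag two complaint records as likely duplicates.

    Hypothetical record shape: {"device_sn", "event_date", "narrative"}.
    A pair is a duplicate candidate when the metadata matches exactly and
    the free-text narratives are highly similar. 0.9 is illustrative.
    """
    same_metadata = (a["device_sn"] == b["device_sn"]
                     and a["event_date"] == b["event_date"])
    text_similarity = SequenceMatcher(None, a["narrative"].lower(),
                                      b["narrative"].lower()).ratio()
    return same_metadata and text_similarity >= text_threshold

def deduplicate(complaints: list[dict]) -> list[dict]:
    """Keep the first record of each duplicate group; queue order is preserved."""
    kept: list[dict] = []
    for c in complaints:
        if not any(is_duplicate(c, k) for k in kept):
            kept.append(c)
    return kept
```

In practice the kept record would link back to the records it absorbed, so the reviewer can see that a "single" complaint arrived through two channels.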

Component and subsystem tagging. Tagging complaints to the specific component, subsystem, or use context they describe turns an unstructured text pile into a structured dataset the reviewer can filter, count, and query. This is the layer that makes rate-change signal detection possible.

Relevance ranking. Producing a ranked queue where the most likely safety-relevant items are at the top and the clearly off-topic items are at the bottom is the single highest-leverage use of AI in PMS triage. The reviewer still reads the queue, but the order matches the risk.

First-draft narratives. For each complaint, a short structured draft — what happened, to whom, with which device, with what outcome — saves the reviewer from writing the same structured sentences repeatedly. The reviewer edits the draft against the raw input. The draft is never the final record.

Felix's summary from the interviews applies here cleanly. AI maintains documentation, flags discrepancies, runs questionnaires, increases speed, maintains quality, reduces costs — and in the PMS context, the "flagging discrepancies" part is the core of the value. The tool reads faster than a human can, and it raises its hand when something looks off.

Signal detection methods that hold up at audit

Signal detection in PMS is the process of recognising, in the flow of incoming data, that something meaningful is happening — a new failure mode, a rising rate, a subgroup of users disproportionately affected, an off-label use pattern that changes the risk-benefit picture. Doing this well is the difference between a PMS system that is a filing cabinet and a PMS system that protects patients.

The methods that hold up in a Notified Body review are the ones that are specified in the PMS plan, run on a defined cadence, and produce traceable outputs.

Rate-change monitoring against a baseline. For each category of event — per component, per failure mode, per clinical context — the PMS plan defines a baseline rate from an agreed reference period. An AI-assisted pipeline computes the current rate on a defined cadence (monthly is common, weekly for higher-volume products) and compares it to the baseline. A pre-defined threshold — for example, a rate increase exceeding a specified statistical bound — triggers formal review. The threshold is set before the signal happens, not after.
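The comparison step can be sketched minimally, assuming a Poisson model for complaint counts and device-months as an illustrative exposure unit; the alpha level stands in for whatever pre-defined statistical bound the PMS plan fixes.

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), from the cumulative sum of P(0)..P(k-1)."""
    if k <= 0:
        return 1.0
    p = cum = math.exp(-lam)      # P(X = 0)
    for i in range(1, k):
        p *= lam / i              # P(X = i) from P(X = i - 1)
        cum += p
    return max(0.0, 1.0 - cum)

def rate_signal(observed: int, exposure: float,
                baseline_rate: float, alpha: float = 0.01) -> bool:
    """True when the period's complaint count is improbably high under the
    locked baseline rate (events per unit of exposure, e.g. per device-month).
    alpha is the escalation threshold, set in the PMS plan before the signal
    happens; 0.01 here is an illustrative default, not a recommendation."""
    expected = baseline_rate * exposure
    return poisson_tail(observed, expected) < alpha
```

With a baseline of 2 events per 1,000 device-months, 10 events in a 1,000-device-month period triggers formal review, while 3 events does not; the point is that both outcomes were decided by a number written down in advance.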

Cluster emergence detection. New clusters that were not present in previous periods are a signal worth looking at regardless of absolute rate. A tool that surfaces "this cluster of seven complaints did not exist last month" lets the reviewer investigate a pattern that a rate-based method would miss because the absolute number is still small.
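The period-over-period comparison behind this is simple to sketch once clusters carry labels; the minimum-size floor of three is an illustrative assumption.

```python
from collections import Counter

def emergent_clusters(current: list[str], previous: list[str],
                      min_size: int = 3) -> dict[str, int]:
    """Cluster labels present this period but absent last period, once they
    reach a minimum size. Inputs are the per-complaint cluster labels for
    each period; min_size=3 is an illustrative floor, not a recommendation."""
    now, before = Counter(current), set(previous)
    return {label: n for label, n in now.items()
            if label not in before and n >= min_size}
```

A rate-based method would ignore a seven-complaint cluster as statistically unremarkable; this check surfaces it precisely because it did not exist before.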

Subgroup drift. Breakdowns by user type, geography, device configuration, or clinical context can reveal a signal concentrated in one subgroup that disappears in the aggregate. The PMS plan specifies which subgroups are monitored and why, based on the hazards identified in the risk file.
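A sketch of the subgroup comparison, with a hypothetical three-times-the-aggregate flagging factor standing in for whatever threshold the PMS plan actually defines.

```python
def drifting_subgroups(events: dict, exposures: dict,
                       factor: float = 3.0) -> list[str]:
    """Flag subgroups whose complaint rate exceeds the aggregate rate by a
    pre-defined factor. events and exposures are keyed by subgroup (e.g.
    clinical context), with exposure in device-months; factor=3.0 is an
    illustrative threshold, not a recommendation."""
    aggregate = sum(events.values()) / sum(exposures.values())
    return sorted(g for g in exposures
                  if events.get(g, 0) / exposures[g] > factor * aggregate)
```

In the example below, the aggregate rate is 12 events per 1,000 device-months and looks stable, while the ICU subgroup runs at 90 per 1,000: the signal the aggregate hides.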

Free-text topic surfacing. Running topic extraction over the narrative fields of complaints surfaces phrases and themes that are rising in frequency even when the structured categories have not moved. This is where early weak signals often live — in the sentences people write before there is a clean category for what they are describing.
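One minimal way to sketch this uses plain term frequencies rather than full topic modelling; the stopword list and both thresholds are illustrative assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "with", "on"}

def rising_terms(current_texts: list[str], previous_texts: list[str],
                 min_count: int = 3, min_ratio: float = 2.0) -> dict[str, int]:
    """Terms whose frequency this period is at least min_ratio times their
    frequency last period, with add-one smoothing so genuinely new terms
    register. Thresholds are illustrative, not a recommendation."""
    def counts(texts: list[str]) -> Counter:
        words = re.findall(r"[a-z]+", " ".join(texts).lower())
        return Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    now, before = counts(current_texts), counts(previous_texts)
    return {w: n for w, n in now.items()
            if n >= min_count and n / (before[w] + 1) >= min_ratio}
```

A real pipeline would work on phrases and embeddings rather than single words, but the principle is the same: count what people are writing before there is a category for it.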

Cross-reference with literature and regulator data. Tying the internal signal stream to external signals — new relevant publications, Eudamed incident notifications for similar devices, recall databases — is another layer where AI pre-processing helps a small team keep up with sources they could not read exhaustively by hand.

None of these methods replace a human deciding what to do with the signal. They surface material a reviewer can then adjudicate. The adjudication is the regulated activity; the surfacing is the productivity layer underneath it.

For the article-by-article view of the Article 83 to 86 obligations these methods support, see MDR Articles 83 to 86 explained. For how the PMS plan under Annex III structures all of this, see the PMS plan under Annex III.

The human-in-the-loop discipline

The single most important operating rule for AI in PMS is the same one that applies to AI in regulatory documentation and AI in vigilance triage more broadly: AI as first-pass, qualified human as final adjudicator, never the reverse. In a PMS context this has specific meaning.

Named adjudicator per decision. Every signal that enters formal review has a named human who is responsible for the conclusion. Every vigilance reporting decision — whether an event is a serious incident under MDR Article 87, whether an FSCA is warranted under Article 89 — is made by a qualified person at the manufacturer. MDCG 2023-3 Rev.2 (January 2025) is the current operational guidance on the vigilance terminology these decisions turn on, and it assumes a human is applying it.

Full review of the top of the queue. The reviewer reads the items the tool ranked as most likely safety-relevant from the raw input, not from the tool's summary. A draft narrative is a starting point; the record is built from the raw data.

Mandatory spot-check rate on the bottom of the queue. The failure mode to fight is not that the top of the queue is wrong — it is that the bottom of the queue is wrong and nobody reads the bottom. A fixed percentage of items the tool ranked as low-relevance are re-reviewed from scratch by a human on a defined cadence. This is the only way to catch the signal the tool missed before it compounds.
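The sampling step itself is a few lines; the 10% rate and the bottom-half cut-off below are illustrative, and the real figures belong in the PMS plan.

```python
import random

def spot_check_sample(ranked_queue: list, rate: float = 0.10, seed=None) -> list:
    """Draw a random sample from the bottom half of a relevance-ranked queue
    for full human re-review. rate=0.10 and the bottom-half cut-off are
    illustrative figures; fix the real ones in the PMS plan. Pass a seed
    only to reproduce a recorded sample, never to make the draw predictable."""
    bottom = ranked_queue[len(ranked_queue) // 2:]
    n = max(1, round(len(bottom) * rate))
    return random.Random(seed).sample(bottom, n)
```

The sample itself, its draw date, and the reviewer's findings go into the record; a spot-check that leaves no evidence of having happened does not exist at audit.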

Override logging. Every time the reviewer disagrees with the tool's categorisation, cluster assignment, or ranking, the disagreement is logged with a reason. The log is reviewed periodically for patterns — both places where the tool is systematically weak and places where the reviewer is drifting toward rubber-stamping.

Rotation. The same reviewer does not supervise the same AI-assisted workflow for months on end. Fresh eyes re-establish critical distance. In a small team this is logistically difficult; in a small team it is also where complacency hits fastest.

The eleventh-result problem. The pattern Tibor watches for in every AI-assisted PMS workflow is the same pattern that shows up in every human-automation system. The tool gets ten results right in a row, the reviewer stops really reviewing, and the eleventh result is the one that matters. The countermeasures above exist because this pattern is real and documented, not because it is theoretical.

Audit trail expectations

Under MDR Article 10 the manufacturer is responsible for the PMS system, the technical documentation, and everything the two produce. Under EN ISO 13485:2016+A11:2021 the QMS has to describe how records are created, reviewed, approved, and controlled — including how any software used in the process is qualified. For an AI-assisted PMS pipeline in 2026, the audit trail Notified Bodies are asking about is concrete.

  • Tool identity and version. Which tool produced the triage and ranking, which version was in effect on the date the records were generated. A versioned tool change is itself a change control event in the QMS.
  • Inputs used. Which fields from the source systems were fed into the tool, and which were not. An auditor can ask and the answer has to be in the record.
  • Draft outputs. The tool's raw categorisation, cluster assignments, ranking, and draft narratives — stored as-is, not overwritten by the reviewer's edits.
  • Human review evidence. Who reviewed, on what date, what they changed, what they kept, why. The override log is part of this.
  • Approval and signature. The named human whose signature is on the final record, with role and date.
  • Qualification of the tool. A record in the QMS of the tool's intended use, validation evidence, limits, and the controls that apply when it is used. MDCG 2025-10 (December 2025) expects PMS processes to be described in real operational terms, not abstracted.

A PMS record that shows only the final approved entry, with no evidence of the triage process underneath it, looks fine on first reading and fails a serious Notified Body review. The auditor will ask how the item got to the top of the queue and what happened to the items that did not. The answer has to exist.
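One way to sketch what such a record might carry, as a frozen (immutable) structure so the tool's draft outputs cannot be silently overwritten by later edits. Every field name here is illustrative; the real schema lives in the QMS.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class TriageAuditRecord:
    """One audit-trail entry per complaint, mirroring the bullets above.
    Field names are illustrative assumptions, not any tool's real schema."""
    complaint_id: str
    tool_name: str
    tool_version: str             # a version change is a change control event
    inputs_used: tuple[str, ...]  # source fields fed into the tool
    draft_category: str           # tool output, stored as-is
    draft_rank: int
    reviewer: str
    review_date: date
    changes_made: str             # what the reviewer changed, and why
    approver: str
    approval_date: date
```

Freezing the record is a design choice worth copying: the reviewer's edits live in `changes_made` and in the final approved entry, while the tool's draft stays untouched as evidence of what the human was actually reviewing.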

Common mistakes

  • Treating the ranked queue as the answer. The queue is an ordering, not a conclusion. Reviewers who read only the top and never sample the bottom are running the tool as a filter, not a triage aid.

  • No baseline to measure against. Without a defined reference period and agreed baseline rates per category, rate-change signal detection has nothing to compare to. "We noticed an increase" is not an auditable statement.
  • Threshold-free monitoring. Dashboards with no pre-defined escalation thresholds create the illusion of oversight. When everything is monitored and nothing triggers action, the system is decorative.
  • Skipping the QMS qualification of the tool. Using a tool that is not described anywhere in the QMS creates an invisible dependency. Describe it, control it, and keep the description up to date.
  • Copying the tool's draft narrative into the record without checking it against the raw complaint. The draft is a starting point. A record that is the draft, unedited, is a record the reviewer did not make.
  • Merging triage and adjudication into one step. The tool categorises, the human adjudicates. A workflow where the human clicks "approve" on the tool's category, on the same screen where it is shown, is the approval step that drifts toward rubber-stamping fastest.
  • No fall-back plan. If the tool disappears tomorrow, the PMS process still has to run. A team that cannot operate without the tool is not using the tool well.

The Subtract to Ship angle

From a Subtract to Ship perspective, AI-assisted PMS fits the framework cleanly when it subtracts hours from the structured triage layer of work that MDR Articles 83 to 86 already require. The obligations do not move. The time spent on the triage layer does.

The test remains the same as everywhere else in the framework. Every activity in the PMS plan has to trace to a specific MDR article, annex, or harmonised standard. AI changes how fast each activity runs; it does not change whether the activity is required. A team that uses the tool to clear structured triage faster, and then spends the recovered hours on the adjudication and the risk-file integration and the PMCF analysis — that is subtraction done right. A team that uses the tool to skip the review step once the outputs look reliable is cutting compliance, not cutting waste, and the framework does not allow it.

The lean PMS pipeline for a product with meaningful complaint volume looks like this. One ingestion pathway per source, normalised into a common schema. One AI-assisted triage step that categorises, clusters, deduplicates, and ranks. One human review step with a named adjudicator, a mandatory spot-check rate on the bottom of the queue, and override logging. One signal detection layer with pre-defined baselines and thresholds. One escalation pathway into vigilance assessment under MDCG 2023-3 Rev.2. One feedback loop into the risk management file and the clinical evaluation. One PMS Report under Article 85 for Class I devices or PSUR under Article 86 for Class IIa, IIb, and III that reflects the real data from the period. Nothing in that list is optional. Nothing beyond it earns its place.
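The shape of that pipeline can be sketched with each stage injected as a callable, so nothing here pretends to implement any real tool or stage; it is the structure of the paragraph above, nothing more.

```python
from typing import Any, Callable, Iterable

def run_pms_cycle(records: Iterable[dict],
                  triage: Callable[[list], list],
                  review: Callable[[list], list],
                  detect_signals: Callable[[list], list],
                  escalate: Callable[[Any], None]) -> list:
    """One PMS cycle over already-normalised records. Each stage is a
    hypothetical injected callable; this is the shape of the lean pipeline,
    not an implementation of any stage."""
    queue = triage(list(records))    # categorise, cluster, dedupe, rank
    adjudicated = review(queue)      # named human adjudicator + spot-check
    for signal in detect_signals(adjudicated):
        escalate(signal)             # into vigilance assessment
    return adjudicated               # feeds risk file, PMCF, PMSR/PSUR
```

Keeping the stages separate in code mirrors the separation the process needs: a triage function cannot approve, and a review function cannot be skipped without the call site showing it.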

Reality Check — Where do you stand?

  1. For your product's actual complaint volume, can a human-only workflow read every incoming item end to end this month? If not, what is the current coping strategy?
  2. If you use an AI tool in triage, is it described in your QMS with an intended use, a validation record, a scope, and a named owner?
  3. Do you have defined baseline rates per event category, and pre-defined thresholds that trigger formal signal review?
  4. Is there a mandatory spot-check rate on the bottom of the ranked queue, and can you prove from the records it is actually being done?
  5. For every vigilance reporting decision in the last quarter, is there a named human adjudicator with a documented rationale, independent of the tool's draft?
  6. If a Notified Body auditor asked how a specific complaint ended up where it did in your process — and what happened to the fifty complaints around it — could you answer from the records without improvising?
  7. If the AI tool disappeared tomorrow, how long before your PMS process fell behind, and is there a documented fall-back workflow?

Frequently Asked Questions

Does MDR allow AI-assisted complaint analysis in the PMS system? Yes, within limits. The MDR does not prohibit AI tools in PMS. What Articles 83 to 86 require is a PMS system that is proactive, proportionate to the risk class, and appropriate for the device, with the manufacturer remaining fully responsible for the outputs. EN ISO 13485:2016+A11:2021 requires the QMS to describe how any software used in the process is qualified and controlled. If those conditions are met, AI-assisted triage is a legitimate productivity layer.

Can an AI tool decide whether an event is a serious incident? No. The determination of whether an event is a serious incident under MDR Article 87, and any subsequent reporting to the competent authority under Articles 87 to 89, is a manufacturer decision made by a qualified human. MDCG 2023-3 Rev.2 (January 2025) is the current operational guidance on the terminology and criteria for that decision. An AI tool can triage, draft, and flag. The decision and the signature stay with the human.

What is the single most important control for AI-assisted PMS? A mandatory spot-check rate on the bottom of the ranked queue, combined with override logging. The failure mode to fight is not that the top of the queue is wrong. It is that the bottom of the queue is wrong and nobody reads the bottom. Without the spot-check, the tool silently becomes a filter that hides the signals that matter most.

How do I set signal detection thresholds without over-fitting to noise? Set them from a defined baseline period where the data is well-understood, document the statistical reasoning in the PMS plan, and re-evaluate the thresholds on a fixed cadence as more data accumulates. Pre-defined thresholds prevent both over-reaction to noise and under-reaction to real signals. The reasoning is written down so a Notified Body reviewer can follow it.

Does the Notified Body need to know that I use AI in my PMS process? Transparency is the right posture. Notified Bodies in 2026 are increasingly familiar with AI-assisted workflows in QMS processes and will look for the tool's qualification in the QMS, the validation evidence, the human review controls, and the audit trail. Describing the tool's role honestly is stronger than hiding it.

What happens if the tool is wrong and a signal is missed? The responsibility is the manufacturer's regardless of the tool. The PMS system has to be designed so that tool errors do not translate directly into missed signals — the spot-check rate, the override logging, the rotation, and the independent human adjudication exist precisely for this. If a signal is missed, the root-cause analysis includes the process controls that should have caught it, and the process is corrected accordingly.

Sources

  1. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices — Article 83 (post-market surveillance system), Article 84 (post-market surveillance plan), Article 85 (PMS Report for Class I), Article 86 (PSUR for Class IIa, IIb, III). Official Journal L 117, 5.5.2017, consolidated text.
  2. MDCG 2025-10 — Guidance on post-market surveillance of medical devices and in vitro diagnostic medical devices. Medical Device Coordination Group, December 2025.
  3. MDCG 2023-3 Rev.2 — Questions and Answers on vigilance terms and concepts as outlined in Regulation (EU) 2017/745 and Regulation (EU) 2017/746. Medical Device Coordination Group, first publication February 2023, Revision 2 January 2025.
  4. EN ISO 13485:2016 + A11:2021 — Medical devices — Quality management systems — Requirements for regulatory purposes.

This post is part of the AI, Machine Learning and Algorithmic Devices series in the Subtract to Ship: MDR blog. Authored by Felix Lenhard and Tibor Zechmeister. AI-assisted PMS is a productivity layer on top of the Article 83 to 86 obligations, not a way around them — the qualified human at the manufacturer still owns the decisions, and the tool exists to make the human's attention land in the right place.