AI tools are taking over the repetitive, high-volume, low-judgement layer of MedTech regulatory work — pre-categorising safety database reports, drafting technical documentation, running literature searches, flagging discrepancies across files. In the best cases they save 80% or more of the manual time on structured tasks. They do not take over the parts that require judgement, clinical reasoning, or final adjudication of a safety signal, and they introduce a new failure mode: after the tool gets ten results right in a row, humans stop double-checking, and the eleventh one slips through. For a startup, the right question is not whether to use AI in regulatory work, but how to integrate it so the time savings are real and the safety net still holds.

By Tibor Zechmeister and Felix Lenhard. Last updated 10 April 2026.


TL;DR

  • AI tools are now doing the repetitive layer of MedTech regulatory operations — vigilance triage, documentation drafting, literature review, gap analysis, consistency checks — at a speed and cost that no startup can match with human-only workflows.
  • In one real case, an AI-assisted pre-categorisation of a safety database saved more than 80% of the manual time compared with the previous Excel-based process.
  • The obligations under MDR Articles 10 and 83–92 do not change when AI enters the workflow. The manufacturer remains fully responsible for every vigilance decision, every PMS output, and every file that leaves the QMS.
  • The main risk is complacency. Once a tool gets ten results right in a row, humans start rubber-stamping the eleventh. A regulatory workflow that assumes the tool is always right is a workflow without a safety net.
  • The correct operating model is AI as first draft, expert as final adjudicator — never the reverse. The human-in-the-loop rule is not ceremonial; it is the safety net the Regulation expects.

Disclosure. Tibor Zechmeister is the founder of Flinn.ai. This post discusses Flinn.ai as one example within a growing category of AI tools for regulatory work. The aim is a fair description of what the category does well and where it fails, not a product pitch. Where claims are specific to Flinn, they are labelled as such. Where the points apply to the category as a whole, they apply to any comparable tool.


The people who quit over Excel sheets

One of the MedTech companies that eventually became a Flinn.ai customer lost two team members in quick succession. Both of them were technically capable, both were well-paid, and both walked out for the same reason. Their job was safety database monitoring: pulling reports from the vigilance database, categorising each one as death, injury, or malfunction, copying the result into an Excel sheet, and moving to the next. For hundreds of reports. Every week.

Both of them said some version of the same sentence on the way out: "I did not go to university to copy-paste in Excel sheets."

The company had a problem that is becoming universal in MedTech. The regulatory workload under MDR has grown in every direction — more post-market surveillance, more vigilance reporting, more technical documentation, more clinical evaluation updates, more literature monitoring. The people qualified to do that work did not train for years to spend their days on structured copy-paste. And the people willing to do structured copy-paste for long stretches do not have the qualifications the Regulation requires.

This is the gap AI tools are filling. Not the judgement parts. The grinding parts.

The problem AI is solving: the grinding layer

If you look at what regulatory operations actually consist of in a small MedTech company, a large fraction of the hours is spent on work that is structured but not simple. Reading incoming safety reports and bucketing them. Checking whether a given document in the technical file references the current version of a standard. Running a literature search on a clinical topic and tagging each hit for relevance. Comparing two versions of a risk file to spot changes. Drafting the first version of a PMS report that will then be reviewed by a qualified person.

None of that work is creative. All of it is required. Under MDR Articles 83 through 92, a manufacturer must maintain a post-market surveillance system, evaluate incoming data, report serious incidents and field safety corrective actions, produce PSURs or PMS reports depending on class, and feed everything back into the QMS and the technical documentation. MDCG 2025-10 (December 2025) describes the PMS system in practical terms, and MDCG 2023-3 Rev.2 (January 2025) covers the vigilance Q&A that shapes how incidents are classified and reported.

The grinding layer is not optional. It is where the Regulation meets the calendar. And it is the layer that eats most of the qualified regulatory hours in a small company — hours that could be going into the work only humans can do.

What AI actually does well

Across the category of AI tools for regulatory work, the things that work reliably today sit in a narrow but valuable band.

Pre-categorisation of structured inputs. Feeding a stream of vigilance reports through a model that tags each one with a draft category — serious incident, non-serious, malfunction, no device relationship — is the kind of task current models do well. In the Flinn customer case, this single step saved more than 80% of the manual time the team had been spending on the same work in Excel.
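
To make the shape concrete, here is a minimal sketch of the pattern in Python. The category names, the confidence values, and the suggest_category stand-in are illustrative assumptions, not a description of how Flinn or any comparable tool is implemented; the point is that the model only ever produces a draft, and every draft lands in a human queue.

```python
# Minimal sketch: vigilance pre-categorisation feeding a human review queue.
from dataclasses import dataclass

CATEGORIES = {"serious_incident", "non_serious",
              "malfunction", "no_device_relationship"}

def suggest_category(text: str) -> tuple[str, float]:
    # Stand-in for the real model call, so the sketch runs end to end.
    # A production tool would use a validated classifier here.
    lowered = text.lower()
    if "death" in lowered or "hospitalisation" in lowered:
        return "serious_incident", 0.95
    if "malfunction" in lowered or "failure" in lowered:
        return "malfunction", 0.80
    return "non_serious", 0.60

@dataclass
class Draft:
    report_id: str
    suggested: str     # the model's draft category -- never the final one
    confidence: float  # a routing hint, not a substitute for review

def triage(reports: list[tuple[str, str]]) -> list[Draft]:
    """Pre-categorise every report. Every draft still goes to the human
    queue; a qualified person adjudicates each item and signs."""
    queue = []
    for report_id, text in reports:
        category, confidence = suggest_category(text)
        assert category in CATEGORIES
        queue.append(Draft(report_id, category, confidence))
    return queue
```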

Literature review triage. Pulling a set of results from a PubMed or equivalent query and ranking them for relevance to a specific clinical question is structured enough that a model can produce a useful first pass. A human still reads the top results; the model saves the hours that went into reading the bottom ones.

Gap analysis across documents. Checking whether a technical file cites the current version of a harmonised standard, whether a risk control is mirrored in the instructions for use, whether a PMS plan matches the PMS report — these cross-document consistency checks are the kind of work humans do badly because it is tedious, and models do reasonably well because it is pattern-matching.
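
As one concrete illustration, here is a minimal sketch of a standard-version check built on nothing but a regular expression. Commercial tools use models rather than regexes, and the "current" versions are hardcoded here purely for the example; a real check would read them from the QMS. The shape is what matters: scan, flag, route to a human.

```python
# Minimal sketch: flag technical-file references to outdated standard versions.
import re

# Illustrative; a real check would pull current versions from the QMS.
CURRENT = {
    "EN ISO 13485": "EN ISO 13485:2016+A11:2021",
    "EN ISO 14971": "EN ISO 14971:2019+A11:2021",
}
PATTERN = re.compile(r"(EN ISO \d{4,5}):(\d{4})(\+A\d+:\d{4})?")

def flag_outdated(filename: str, text: str) -> list[str]:
    """Return a human-readable flag for every outdated reference found."""
    flags = []
    for match in PATTERN.finditer(text):
        base, cited = match.group(1), match.group(0)
        expected = CURRENT.get(base)
        if expected and cited != expected:
            flags.append(f"{filename}: cites '{cited}', current is '{expected}'")
    return flags  # flags go to a human for adjudication
```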

First-draft documentation. A model can produce the first draft of a PMS report, a CER update, or a deviation narrative from structured inputs. The draft is never the final version. It is a starting point that replaces a blank page, which in documentation work is often half the battle.

Discrepancy flagging. The most useful mode in a regulatory workflow is AI as a flagger. The tool scans a file, marks anything that looks inconsistent or out of date, and routes the flag to a human for adjudication. The human decides. The tool only raises a hand.

Felix's summary of the current state, from the interviews: AI maintains documentation, flags discrepancies, and runs questionnaires; it increases speed, maintains quality, and reduces costs. Used this way it is a genuine shift in what a two-person regulatory team can accomplish. The biggest single waste in MDR consulting is documentation work that is repetitive and structured, and that is exactly the waste AI removes.

What AI cannot do (and must not be allowed to do)

The things AI cannot do today, and should not be allowed to do even when it gets better, sit in a different band entirely.

Final adjudication of a safety signal. Deciding whether a cluster of vigilance reports constitutes a safety signal that triggers a field safety corrective action is a judgement call with patient-safety consequences. The Regulation puts that decision on the manufacturer — specifically, on the person responsible for regulatory compliance under Article 15 and the quality management system under Article 10(9). No tool takes that seat.

Clinical reasoning. Deciding whether a clinical evaluation meets the state of the art under Article 61, whether an equivalence claim is defensible, whether a residual risk is acceptable under EN ISO 14971:2019+A11:2021 — these are interpretive decisions that require a qualified human. A model can draft the text; it cannot hold the responsibility.

Risk-benefit determination. The risk-benefit analysis required throughout the MDR, from GSPR conformity to clinical evaluation to post-market surveillance, is the kind of judgement that regulators expect a qualified person to make with their name attached to it. A tool can surface the inputs. The decision is human.

Regulatory strategy. Classification decisions, conformity assessment route selection, Notified Body interaction strategy — none of this is automatable in any meaningful sense, because the right answer depends on context the tool does not have and trade-offs the tool cannot weigh.

The rule Felix keeps coming back to is simple. Current good practice is AI flags issues, expert adjudicates. It must never be the other way around. An expert who rubber-stamps AI output is not adjudicating; they are laundering a machine decision through a human signature.

The complacency risk: the eleventh result

Here is the failure mode Tibor watches for in every AI-assisted workflow, including at Flinn customers.

The tool gets the first result right. Then the second. Then the third. By the tenth correct pre-categorisation, the human reviewer is no longer really reviewing. They are clicking "approve" because every previous one was fine. The eleventh result is wrong — a serious incident mis-tagged as a non-serious malfunction, a safety-relevant literature hit marked as not relevant — and nobody catches it, because the review stopped being a review several results ago.

This is a well-documented pattern in human-automation interaction research, where it is called automation complacency, and it shows up in every domain where humans supervise reliable automated systems. In aviation it is trained against. In clinical AI it is studied. In regulatory operations the countermeasures are, at the moment, largely informal.

The complacency risk is what makes AI tools in regulatory work different from AI tools in most other business contexts. In sales or marketing or customer support, the cost of the eleventh wrong result is a bad email. In vigilance it can be a missed safety signal that affects patients. The calibration of how much attention the reviewer gives the tool's output has to account for this — explicitly, in the SOP, not just in people's heads.

Concrete countermeasures a startup can build in:

  • Mandatory spot-check rate. Even when the tool is right ninety-nine times in a row, a fixed percentage of outputs is re-checked from scratch by a human who has not seen the tool's suggestion.
  • Structured override logging. Every time the human disagrees with the tool is logged with the reason. The log is reviewed monthly for patterns — both where the tool is systematically wrong and where humans are systematically rubber-stamping.
  • Rotation. The same reviewer does not supervise the same tool workflow for long stretches. Fresh eyes re-establish critical distance.
  • Pre-defined escalation triggers. Anything that matches a serious-incident pattern gets flagged to a second reviewer regardless of what the tool says.

None of these are novel. All of them are the basic mechanics of keeping a human-in-the-loop system actually in the loop.
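
Here is a minimal sketch of the first two countermeasures in Python. The 5% rate and the log fields are illustrative assumptions; the real rate belongs in the SOP, not in code.

```python
# Minimal sketch: mandatory spot-check sampling and structured override logging.
import csv
import random
from datetime import date

SPOT_CHECK_RATE = 0.05  # illustrative; the actual rate is fixed in the SOP

def needs_blind_recheck() -> bool:
    """True when this output must be re-done from scratch by a reviewer who
    has not seen the tool's suggestion -- regardless of the recent streak."""
    return random.random() < SPOT_CHECK_RATE

def log_override(path: str, report_id: str,
                 tool_said: str, human_said: str, reason: str) -> None:
    """Append every human-tool disagreement with its reason. The log is
    reviewed monthly for patterns in both directions: the tool
    systematically wrong, or humans systematically rubber-stamping."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [date.today().isoformat(), report_id,
             tool_said, human_said, reason])
```

The key design point is that the spot-check decision is made before the reviewer sees the tool's suggestion, so the re-check is genuinely blind rather than an anchored second look.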

Where humans must stay in the loop

The way to think about this for a small MedTech company is to draw a line between the work that can be AI-assisted and the work that must be human-final.

AI-assisted (first draft, triage, flagging, consistency checks):

  • Vigilance report pre-categorisation
  • Literature review triage
  • Cross-document consistency checks
  • Regulatory change monitoring (new MDCG, new standards, new Commission guidance)
  • First drafts of PMS reports, CER updates, deviation narratives
  • QMS document gap analysis

Human-final (judgement, adjudication, sign-off):

  • Safety signal determination and FSCA decisions
  • Clinical evaluation conclusions
  • Risk-benefit determinations
  • Classification and conformity assessment strategy
  • Any document that goes to the Notified Body with a signature on it
  • Any vigilance report that goes to the competent authority

MDR Article 10 puts the manufacturer's obligations on the manufacturer. Articles 83 to 92 put the post-market obligations on the manufacturer. No AI tool moves that responsibility anywhere else. The signature at the bottom of the page is still a human one, and the person signing owns what is above it.

How to integrate AI tools responsibly

A startup introducing an AI tool into regulatory operations should do it the same way it would introduce any other critical supplier or software. The specifics depend on the tool, but the scaffolding is the same.

  1. Scope the use case narrowly at first. Pick one workflow — vigilance triage, literature review, document gap analysis — and run the tool only there until you understand its failure modes.
  2. Qualify the tool in the QMS. Treat it as software used in the QMS. Document what it does, what it does not do, how outputs are reviewed, and how failures are handled. This is not optional under EN ISO 13485:2016+A11:2021.
  3. Validate against a known dataset. Before going live, run the tool on a batch of previously processed data where the correct answers are known. Measure the agreement rate and understand where it disagrees and why (a minimal sketch of this check appears after the list).
  4. Write the SOP around the tool, not the other way around. The SOP says what the human does before, during, and after the tool runs — including the mandatory spot-check rate and the override logging.
  5. Review quarterly. The tool changes. The inputs change. The Regulation changes. The integration needs to be re-reviewed on a fixed cadence, not only when something breaks.
  6. Keep the exit ready. If the tool stops being available, or stops being good enough, the workflow must be runnable without it. Never let a tool become the single point of failure for a regulatory process.

This is not exotic. It is the same discipline any other critical software-as-a-service inside a QMS demands.
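
For step 3, here is a minimal sketch of the agreement check. The data shapes are assumptions for illustration; the output, whatever form it takes, belongs in the QMS validation record.

```python
# Minimal sketch: measure tool agreement against previously adjudicated data.
from collections import Counter

def agreement_report(known: dict[str, str], tool: dict[str, str]) -> None:
    """known: report_id -> category decided by a qualified human in the past.
    tool: report_id -> the tool's draft category for the same report."""
    disagreements = Counter()
    matched = 0
    for report_id, truth in known.items():
        suggestion = tool.get(report_id)
        if suggestion == truth:
            matched += 1
        else:
            # Record *which way* the tool is wrong: a serious incident tagged
            # non-serious is a different problem from the reverse.
            disagreements[(truth, suggestion)] += 1
    rate = matched / len(known)
    print(f"Agreement: {matched}/{len(known)} ({rate:.1%})")
    for (truth, suggestion), n in disagreements.most_common():
        print(f"  human said {truth!r}, tool said {suggestion!r}: {n}x")
```

A raw agreement percentage is not enough on its own; the direction of the disagreements is what tells you whether the tool's errors are conservative or dangerous.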

The broader category beyond Flinn

Flinn.ai is one example, with the specific history that Tibor founded it partly in response to the gap this post describes. The broader category includes other specialised tools for literature surveillance, eQMS platforms that have added AI features for document review, generic large language model tools configured for regulatory use, and in-house tools that MedTech teams are building themselves against model APIs.

All of them face the same constraints. All of them are most useful in the same narrow band — structured, repetitive, reviewable tasks — and all of them introduce the same complacency risk if the human-in-the-loop discipline is not built into the workflow. The question a startup should ask is not which brand name to buy. It is which specific bottleneck in its own regulatory operations costs the most hours and whether any tool — Flinn, a competitor, or an in-house build — reliably removes those hours while leaving the human adjudication intact.

Fit matters more than feature lists. A tool that saves 80% of the time on a workflow you do not currently have is worth nothing. A tool that saves 50% on the workflow that is eating your two most expensive people is worth everything.

The Subtract to Ship angle

From a Subtract to Ship perspective, AI tools in regulatory work are exactly the kind of move the framework approves of. They subtract the repetitive, low-judgement layer of work from the critical-path regulatory hours. They let a two-person regulatory team do the work of a four-person team, without hiring two more people whose first reaction to Excel copy-paste is to quit.

But Subtract to Ship is not about subtracting required work. The four passes — Purpose, Classification, Evidence, Operations — cut the work that does not trace back to a specific MDR obligation. AI tools cut the time spent on work that does. Both kinds of subtraction are legitimate. Neither kind touches the obligations themselves.

If the AI tool is used to skip a required review step, that is not subtraction. That is cutting compliance, and the framework does not allow it. The test stays the same: every activity that remains must trace to a specific MDR article or annex. AI changes how fast you do the activity. It does not change whether you have to do it.

Reality Check — Where do you stand?

  1. What is the single workflow in your regulatory operations that consumes the most qualified-person hours per week, and is it a structured task an AI tool could pre-process?
  2. Have you lost anyone — or come close to losing anyone — because the work you are asking them to do does not match their qualifications?
  3. If you already use an AI tool in regulatory work, what is the mandatory spot-check rate written into your SOP, and can you prove it is actually being done?
  4. If the AI tool disappeared tomorrow, could the workflow still run? How long would it take to fall back to the manual process?
  5. For every AI-assisted decision in your workflow, is there a named human who adjudicates and signs?
  6. Have you validated the tool against a known dataset before going live, and do you have the validation record in your QMS?
  7. When was the last time you reviewed the override log to look for drift in either direction — tool getting worse or humans getting lazier?

Frequently Asked Questions

Does the MDR allow AI tools in regulatory operations? The MDR does not prohibit the use of AI tools by manufacturers in their own regulatory operations. What the Regulation requires is that the manufacturer meets its obligations — including the obligations under Articles 10 and 83 through 92 — and that the QMS under EN ISO 13485:2016+A11:2021 covers how any software used in regulatory processes is qualified and controlled. If those conditions are met, AI tools are a legitimate productivity layer. If they are not, the tool is not the problem; the QMS is.

Can an AI tool make vigilance reporting decisions? No. Vigilance reporting decisions — whether an event is a serious incident, whether it triggers reporting to the competent authority, whether it requires a field safety corrective action — are manufacturer decisions under MDR Articles 87 through 89 and must be made by a qualified person. An AI tool can triage, draft, and flag. The decision and the signature stay with the human.

How much time can an AI tool realistically save? It depends on the workflow. For highly structured tasks like vigilance pre-categorisation, real-world savings of 80% or more have been observed. For tasks that are less structured, the savings are smaller. The honest answer is that the savings are large where the work is repetitive and shrink quickly as the work requires more judgement. Pilot the tool on your specific workflow before promising any number to the board.

Is Flinn.ai the only tool in this category? No. Flinn.ai is one example and the one this post disclosed. There are other specialised tools, eQMS vendors adding AI features, and in-house builds. The post is about the category, not the brand. The right tool is the one that fits the specific bottleneck in your operations and integrates cleanly with your QMS.

What is the biggest mistake startups make when adopting AI in regulatory work? Skipping the human review step once the tool looks reliable. Automation complacency is the biggest documented failure mode in human-AI systems across every domain, and regulatory operations are not an exception. The mandatory spot-check rate, the override logging, and the rotation are not bureaucracy — they are the safety net that keeps the eleventh result from being the one that hurts someone.

Sources

  1. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, Article 10 (general obligations of manufacturers), Articles 83 to 92 (post-market surveillance, vigilance, and market surveillance). Official Journal L 117, 5.5.2017, consolidated text.
  2. MDCG 2023-3 Rev.2 — Questions and Answers on vigilance terms and concepts as outlined in Regulation (EU) 2017/745, revision 2, January 2025.
  3. MDCG 2025-10 — Guidance on post-market surveillance of medical devices and in vitro diagnostic medical devices, December 2025.
  4. EN ISO 13485:2016 + A11:2021 — Medical devices — Quality management systems — Requirements for regulatory purposes.
  5. EN ISO 14971:2019 + A11:2021 — Medical devices — Application of risk management to medical devices.

This post is part of the AI, Machine Learning & Algorithmic Devices series in the Subtract to Ship: MDR blog. Authored by Felix Lenhard and Tibor Zechmeister. Tibor is the founder of Flinn.ai; the post discusses the tool as one example within a category and applies the same subtraction discipline to AI tooling that the rest of the blog applies to the Regulation itself.