Jun 22, 2026

How the FDA Regulates Clinical AI: From Approval to Real-World Safety Monitoring

AUTHOR

James A. Wondrasek

For years the conversation around clinical AI regulation was simple: how do I get my device cleared? The January 2026 Clinical Decision Support guidance from the FDA shifted that conversation from how to get cleared to what happens after clearance, and the answer is not what most people assume.

Over 1,400 AI-enabled medical devices had been authorised by the FDA as of March 2026. The vast majority reached market through a pathway that requires demonstrating similarity to an existing device rather than standalone clinical evidence. Fewer than 15 percent have published real-world outcomes data. Algorithmic drift, the gradual degradation of model performance when deployment conditions change, remains undetectable at scale under the current regulatory infrastructure. The gap between what clinicians believe FDA clearance represents and what the system actually verifies is where patient harm can occur. If you are responsible for evaluating or deploying these tools, your work starts with understanding that gap and where it sits within the wider governance picture.

What Is the FDA’s Total Product Lifecycle Approach, and How Did the January 2026 CDS Guidance Change the Regulatory Landscape?

The FDA’s Total Product Lifecycle approach is the organising philosophy that regulatory oversight must span a device’s entire lifespan, from pre-market design and clearance through post-market surveillance to eventual decommissioning. It treats 510(k) clearance or De Novo authorisation as an early checkpoint, not a finish line.

The framework rests on two practical mechanisms. Good Machine Learning Practice principles, developed with Health Canada and the UK’s MHRA, provide cross-stakeholder guidelines for data management, model validation, and performance monitoring. The Predetermined Change Control Plan lets manufacturers pre-specify planned updates so they do not need a new submission for every model retraining.

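The change came with the January 2026 CDS guidance, which revised how the FDA interprets the four statutory criteria added by Section 3060 of the 21st Century Cures Act. The criteria, codified at FD&C Act §520(o)(1)(E), determine whether clinical decision-support software is excluded from the definition of a medical device; software must meet all four to fall outside it. First, it must not acquire, process, or analyse medical images or signals from a diagnostic device or signal-acquisition system. Second, it must display, analyse, or print medical information about a patient or other medical information. Third, it must be intended to support or provide recommendations to a healthcare professional. Fourth, it must enable that professional to independently review the basis for its recommendations, so that they do not rely primarily on the software; in practice, that basis must be transparent and understandable.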

The 2026 guidance made four changes to how those criteria are applied. Time-critical CDS is no longer automatically classified as a device; it is treated as a risk factor instead. Tools that offer only one clinically appropriate option now receive enforcement discretion, provided other criteria are met and the use is not time-critical. The definition of “medical information about a patient” was broadened to include lab results, genetic tests, and peer-reviewed studies rather than only data commonly discussed in practice. And Criterion 4 now requires accessible documentation about the software’s logic and data sources: the more opaque the tool, the more likely the FDA will regulate it.
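To make that screening logic concrete, the sketch below encodes the four criteria and the 2026 adjustments described above as a simple Python triage function. The field names, the example tool, and the result labels are hypothetical, and this illustrates the article’s reading of the guidance rather than making a regulatory determination: real classification depends on intended use and labelling that a boolean checklist cannot capture.

```python
from dataclasses import dataclass


@dataclass
class CdsProfile:
    # Hypothetical fields summarising a CDS software function.
    analyses_images_or_device_signals: bool  # criterion 1 fails if True
    uses_medical_information: bool           # criterion 2
    recommends_to_clinician: bool            # criterion 3: output goes to a healthcare professional
    basis_independently_reviewable: bool     # criterion 4: logic and data sources documented
    single_recommendation: bool              # only one clinically appropriate option
    time_critical: bool                      # a risk factor, not an automatic device finding


def screen_cds(p: CdsProfile) -> str:
    """Rough triage label under the article's reading of the 2026 guidance."""
    core_criteria_met = (
        not p.analyses_images_or_device_signals
        and p.uses_medical_information
        and p.recommends_to_clinician
        and p.basis_independently_reviewable
    )
    if not core_criteria_met:
        return "likely device"
    if p.single_recommendation:
        # Enforcement discretion applies only when the use is not time-critical.
        return "needs review" if p.time_critical else "enforcement discretion"
    if p.time_critical:
        return "non-device candidate, time-criticality flagged as risk"
    return "non-device candidate"


if __name__ == "__main__":
    dosing_aid = CdsProfile(  # hypothetical tool
        analyses_images_or_device_signals=False,
        uses_medical_information=True,
        recommends_to_clinician=True,
        basis_independently_reviewable=True,
        single_recommendation=True,
        time_critical=False,
    )
    print(screen_cds(dosing_aid))  # enforcement discretion
```

Ordering the checks this way mirrors the text: the core criteria gate everything, and time-criticality then changes the outcome for single-option tools while only being flagged for the rest.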

STAT News characterised the guidance as a deregulatory pivot. But as one horizon-scan analysis notes, the CDS pullback should be read alongside the FDA’s January 2025 draft guidance proposing a lifecycle framework for AI-enabled devices: the agency is deregulating one category while building more structured controls for another. We explored where clinical AI sits in the broader regulatory picture in “AI in Clinical Settings: From Helpful Tool to Regulated System”. And when those controls fail, the question of who is accountable for patient harm becomes a practical problem that regulation has not yet answered.

What Is the Difference Between Locked and Continuously Learning AI Systems, and Which Model Is Safer for Clinical Use?

Locked AI models have fixed parameters after deployment. Updates happen only through batch retraining and new regulatory submissions. In International Medical Device Regulators Forum (IMDRF) terminology, a model is “in a locked state when changes are not permitted.” Continuously learning systems, by the same IMDRF definition, undergo “training that leads to change of an MLMD [machine learning-enabled medical device] with each exposure to data that takes place on an ongoing basis during the operation phase.” In plain terms, the model keeps learning from every new patient case it sees.

There is no universally safer model. Locked AI guarantees reproducibility and regulatory clarity but can degrade silently under dataset shift, when the deployment population drifts away from training conditions. Continuous AI can adapt to changing environments but introduces the risk of catastrophic forgetting, where retraining on new data causes the loss of previously learned capabilities, and it creates validation problems because performance becomes a moving target.
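A toy Python sketch of the architectural difference, assuming a simple linear scorer with hypothetical weights. The locked model ignores new cases; the continuously learning variant takes one online gradient step (squared-error loss) on every labelled case. The `fingerprint` helper, a hash of the parameters, is a hypothetical stand-in for “the version that was validated”.

```python
import hashlib
import struct


class LockedModel:
    """Linear scorer whose parameters are frozen after deployment."""

    def __init__(self, weights):
        self.weights = tuple(float(w) for w in weights)

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def observe(self, x, y):
        """Locked: new cases never change the parameters."""

    def fingerprint(self):
        """Short hash of the parameters, standing in for 'the validated version'."""
        packed = struct.pack(f"{len(self.weights)}d", *self.weights)
        return hashlib.sha256(packed).hexdigest()[:12]


class ContinuousModel(LockedModel):
    """Same scorer, but takes one online gradient step on every labelled case."""

    def __init__(self, weights, learning_rate=0.01):
        super().__init__(weights)
        self.learning_rate = learning_rate

    def observe(self, x, y):
        # Squared-error gradient step: w <- w - lr * (prediction - y) * x
        error = self.predict(x) - y
        self.weights = tuple(
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        )


if __name__ == "__main__":
    locked, learning = LockedModel([0.5, 0.2]), ContinuousModel([0.5, 0.2])
    validated = locked.fingerprint()
    for x, y in [((1.0, 2.0), 1.0), ((0.5, 1.0), 0.0)]:
        locked.observe(x, y)
        learning.observe(x, y)
    print(locked.fingerprint() == validated)    # True: still the validated model
    print(learning.fingerprint() == validated)  # False: no longer the validated model
```

The locked model still matches its validated fingerprint after deployment; the learning model stops matching after its first update, which is the moving-target validation problem in miniature.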

The real-world evidence bears this out. IDx-DR, the first FDA-authorised autonomous AI diagnostic system, is a locked model that was granted marketing authorisation through the De Novo pathway in 2018. SkinVision, a CE-marked skin cancer detection app, is also a locked model. Both showed significant performance divergence from their controlled testing conditions once deployed across varied clinical settings. We return to the specific numbers in the section on algorithmic drift, because they are not anomalies; they are what happens when locked models encounter real-world variation. HeartFlow FFRCT, an AI-based coronary artery disease assessment, shows the opposite problem: its continuously updated algorithm faced regulatory friction because its clinical performance could not be determined at a single point in time, the standard that static regulatory frameworks demand.

The Predetermined Change Control Plan (PCCP) is the FDA’s attempt to create a middle path between these two architectures. The EU AI Act, mandatory from 2027, classifies medical AI as high-risk but currently lacks a pathway for continuous learning. Neither jurisdiction has solved this. The locked-versus-continuous choice also has direct implications for liability when AI causes harm, and financial services has governed adaptive models for decades in ways that healthcare has not yet adopted.

How Does Post-Market Surveillance Work for AI Medical Devices, and Why Have Fewer Than 15% of FDA-Cleared Devices Published Real-World Outcomes Data?

Post-market surveillance is the ongoing monitoring obligation after clearance. For AI devices this extends to drift detection, bias monitoring, cybersecurity vigilance, and performance degradation tracking. It is where the Total Product Lifecycle philosophy gets operationalised, and where the gap between philosophy and reality is most visible.

The fewer-than-15-percent statistic reflects a structural problem rather than manufacturer negligence. Most devices reach market through the 510(k) pathway, which requires substantial equivalence to a predicate device rather than new clinical evidence. Post-market publication is not a regulatory requirement. The system does not demand what it does not fund or provide infrastructure for.

The primary surveillance mechanism, mandatory adverse event reporting through MedWatch, is reactive. It captures harm after it occurs rather than detecting degradation before it causes patient injury. As one comprehensive review notes, such reports may miss many AI-specific issues because misdiagnoses or errors may not be recognised as device-related.

The emerging solution, proposed in academic literature and by organisations including the Royal College of Radiologists, is a Green/Amber/Red tiered monitoring framework. Green represents routine operation within validated bounds. Amber means drift indicators have been triggered, requiring internal root cause analysis. Red signals a material safety concern requiring suspension and regulatory notification. For this to work, regulators and manufacturers need to agree on three things: what to measure, how to detect drift, and where the data comes from. None of this infrastructure exists at scale yet.
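A minimal sketch of how a Green/Amber/Red rule might be encoded, assuming a single monitored metric such as rolling sensitivity and two locally agreed floors. The threshold values and the `drift_flagged` input are hypothetical; as the text notes, there is no agreed standard yet for what to measure or where the limits sit.

```python
from enum import Enum


class Tier(Enum):
    GREEN = "routine operation within validated bounds"
    AMBER = "drift indicators triggered: internal root cause analysis"
    RED = "material safety concern: suspend and notify the regulator"


def classify(metric, amber_floor, red_floor, drift_flagged=False):
    """Map a monitored metric (higher is better) to a tier.

    amber_floor and red_floor are locally agreed thresholds, red_floor < amber_floor.
    drift_flagged lets a separate drift test (for example on input distributions)
    raise Amber even while the headline metric still looks healthy.
    """
    if red_floor >= amber_floor:
        raise ValueError("red_floor must be below amber_floor")
    if metric < red_floor:
        return Tier.RED
    if metric < amber_floor or drift_flagged:
        return Tier.AMBER
    return Tier.GREEN


if __name__ == "__main__":
    # Hypothetical floors for rolling 30-day sensitivity
    for observed in (0.91, 0.82, 0.70):
        print(observed, classify(observed, amber_floor=0.85, red_floor=0.75).name)
```

Keeping the drift flag separate from the metric floors reflects the framework’s intent: Amber should fire on early indicators, before the headline metric has fallen far enough to harm patients.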

The Royal College of Radiologists usefully distinguishes post-deployment monitoring, the hospital-side activity, from post-market surveillance, the manufacturer and regulator activity. Both are necessary. Neither is resourced at scale. This surveillance gap is the regulatory dimension of the benchmark-to-bedside accuracy gap. It is also where liability questions crystallise when undetected drift causes harm, and it is the kind of gap that accountability and liability frameworks must eventually close.

What Is Algorithmic Drift and Why Does It Threaten Clinical AI Safety?

Algorithmic drift, also called model drift or dataset shift, is the degradation of model performance when deployment data diverges from training data. The causes are mundane: changing patient demographics, new imaging equipment, evolving clinical practices, emerging disease presentations. The effects are not: a model that was accurate at clearance becomes progressively less so, and the degradation may go undetected.

Drift comes in two forms worth distinguishing. Data drift, or covariate shift, happens when input feature distributions change while the underlying clinical relationships remain stable. If a model was trained on patients averaging 55 years old and the deployment population averages 62, performance can degrade even though the clinical relationships have not changed. Concept drift is more insidious: the relationship between inputs and outcome itself changes, as when symptoms and features that reliably indicated pneumonia before the pandemic began to indicate COVID-19 instead.
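One common way to quantify data drift is the Population Stability Index (PSI), a metric borrowed from credit-risk model monitoring; it is an illustrative choice here, not a method the article prescribes. The sketch compares the training-time age distribution with the deployment population, using the 55-versus-62 example above with synthetic, evenly spread ages. The 0.1 and 0.25 bands are a finance rule of thumb, not a clinical standard. PSI only sees inputs, so it can flag data drift but not concept drift, which needs labelled outcomes.

```python
import bisect
import math


def bin_proportions(values, cut_points, eps=1e-6):
    """Share of values in each bin; bins are open-ended below the first and above the last cut."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect.bisect_right(cut_points, v)] += 1
    n = len(values)
    # eps avoids log(0) when a bin is empty in one sample
    return [max(c / n, eps) for c in counts]


def psi(expected, actual, cut_points):
    """Population Stability Index of `actual` relative to `expected`."""
    e = bin_proportions(expected, cut_points)
    a = bin_proportions(actual, cut_points)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def psi_band(score, moderate=0.1, significant=0.25):
    # Thresholds are a credit-risk rule of thumb, used here as an assumption.
    if score >= significant:
        return "significant shift"
    if score >= moderate:
        return "moderate shift"
    return "stable"


if __name__ == "__main__":
    training_ages = list(range(40, 71))    # synthetic, mean age 55
    deployment_ages = list(range(47, 78))  # synthetic, mean age 62
    cut_points = [50, 60, 70]
    score = psi(training_ages, deployment_ages, cut_points)
    print(f"PSI = {score:.2f} ({psi_band(score)})")  # PSI = 0.74 (significant shift)
```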

Catastrophic forgetting is the extreme case for continuously learning systems: new data overrides previously acquired knowledge.

Now the case studies. IDx-DR’s pivotal trial reported 87.2 percent sensitivity and 90.7 percent specificity, but a large German study of 875 patients found that in 26.1 percent of cases the system could not analyse the image at all. The high failure rate came from miotic (small, constricted) pupils, a deployment condition the original authorisation did not anticipate. When the system could analyse images, it matched ophthalmologist grading in only about 54.2 percent of cases. SkinVision, the skin cancer detection app, showed real-world sensitivity ranging from 41 to 83 percent depending on the clinical setting, with specificity of 60 to 83 percent. At 60 percent specificity, up to 40 of every 100 benign lesions would be flagged as false alarms. Both are locked models. Both drifted. Neither had systematic monitoring in place.
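The headline figures compound. A short worked calculation, assuming (as the text states) that the 54.2 percent agreement applies only to images the system could grade, and counting false alarms among benign lesions:

```python
def effective_agreement(ungradable_rate, agreement_when_gradable):
    """Share of ALL screened patients who received a result matching ophthalmologist grading."""
    return (1 - ungradable_rate) * agreement_when_gradable


def false_alarms_per_100_benign(specificity):
    """Benign lesions flagged as suspicious, per 100 benign lesions examined."""
    return 100 * (1 - specificity)


if __name__ == "__main__":
    # IDx-DR figures from the German study cited above
    print(f"{effective_agreement(0.261, 0.542):.1%}")  # 40.1%
    # SkinVision at the low end of its reported specificity range
    print(round(false_alarms_per_100_benign(0.60)))     # 40
```

Roughly four in ten screened patients received a result that agreed with an ophthalmologist, a figure the pivotal sensitivity and specificity alone do not reveal.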

Epic’s sepsis model provides a different cautionary tale. The model was deployed at more than 100 hospitals, yet an external validation found it achieved only 33 percent sensitivity, missing two-thirds of sepsis cases. This was one documented evaluation at one point in time, not a universal outcome, but it illustrates the scale of what can go wrong without adequate monitoring.

Causal inference for post-market surveillance is an emerging approach that attempts to disentangle true model decay from the effect of successful clinical interventions: a model that prompted better treatment might appear to fail at predicting an outcome precisely because that outcome was prevented. Current surveillance systems cannot make this distinction. Drift is the mechanism by which the benchmark-to-bedside accuracy gap widens over time, and financial services already runs the kind of operational drift detection that healthcare largely lacks.
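A worked example of why naive monitoring can mistake success for decay. All numbers here are hypothetical: suppose the model flags 100 patients, 60 of whom would deteriorate untreated, every flagged patient is treated, and treatment prevents half of those deteriorations. The function name and parameters are illustrative.

```python
def apparent_ppv(flagged, true_risk, treated_share, treatment_efficacy):
    """Observed positive predictive value after interventions prevent some flagged events.

    flagged: patients the model flags
    true_risk: fraction of flagged patients who would have the event untreated
    treated_share: fraction of flagged patients who receive the intervention
    treatment_efficacy: fraction of events the intervention prevents in treated patients
    """
    would_have_event = flagged * true_risk
    prevented = would_have_event * treated_share * treatment_efficacy
    return (would_have_event - prevented) / flagged


if __name__ == "__main__":
    print(f"{apparent_ppv(100, 0.60, 1.0, 0.0):.2f}")  # 0.60: treatment has no effect
    print(f"{apparent_ppv(100, 0.60, 1.0, 0.5):.2f}")  # 0.30: same model, looks twice as bad
```

A surveillance system that only counts observed outcomes would read the drop from 0.60 to 0.30 as model decay, when the model’s discrimination has not changed at all; separating the two requires knowing who was treated and why.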

What Is a Predetermined Change Control Plan, and How Does It Let AI Developers Update Models Without Resubmitting to the FDA?

A Predetermined Change Control Plan is a regulatory mechanism, finalised by FDA guidance in August 2025, that allows manufacturers to describe planned future algorithm modifications, validation methods, and impact assessments in their initial marketing submission. When approved, bounded updates within the PCCP scope proceed without a new 510(k) or PMA submission.

The PCCP has three components: a Description of Modifications covering what changes are planned, a Modification Protocol detailing how changes are developed and validated, and an Impact Assessment analysing how changes affect safety and effectiveness. The Algorithm Change Protocol is the technical sub-component specifying methods for retraining, adaptation, and validation.
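To show how the three components fit together, here is a hypothetical data-structure sketch in Python. A real PCCP is a document reviewed by the FDA, not code; the class and field names are illustrative, and collapsing the Modification Protocol’s acceptance criteria to a single sensitivity floor is a deliberate simplification. What the sketch does capture is the key boundary: a change that was not described in advance is out of scope, whatever its validation results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedModification:
    """One entry in the Description of Modifications (hypothetical fields)."""
    description: str
    min_sensitivity: float  # simplified acceptance criterion from the Modification Protocol


@dataclass
class Pccp:
    description_of_modifications: dict[str, PlannedModification]  # change id -> planned change
    modification_protocol: str  # how changes are developed and validated
    impact_assessment: str      # how changes affect safety and effectiveness

    def within_scope(self, change_id: str, validated_sensitivity: float) -> bool:
        planned = self.description_of_modifications.get(change_id)
        if planned is None:
            # Unanticipated change: outside the PCCP, so a new submission is needed.
            return False
        # Anticipated change: in scope only if it met its pre-agreed acceptance criterion.
        return validated_sensitivity >= planned.min_sensitivity


if __name__ == "__main__":
    plan = Pccp(
        description_of_modifications={
            "retrain-new-site-data": PlannedModification(
                "Retrain on data from additional sites", 0.85
            ),
        },
        modification_protocol="Retrain, then validate on a held-out multi-site test set",
        impact_assessment="Same intended use and inputs; risk profile unchanged",
    )
    print(plan.within_scope("retrain-new-site-data", 0.88))  # True
    print(plan.within_scope("retrain-new-site-data", 0.80))  # False: failed acceptance
    print(plan.within_scope("add-ct-input", 0.95))           # False: never pre-specified
```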

The PCCP is the FDA’s bridge between static-device regulation and the reality of iterative AI development. Without it, every model retraining would trigger re-authorisation, making adaptive AI commercially unworkable.

Only about 10 percent of 2025 AI clearances included an authorised PCCP. The mechanism is new, untested at scale, and does not cover unanticipated changes that fall outside the pre-approved scope. The FDA’s review goal for 510(k) submissions is 90 review days, and it is not clear whether PCCP reviews by technical experts will keep to the same timeline. About half of AI/ML software as a medical device (SaMD) submissions receive an Additional Information request, which stops the review clock for 30 to 60 days. Building the PCCP into the original submission is the recommended approach; bolting it on later tends to cause back-and-forth with FDA reviewers.

PCCP-regulated updates aim to close the accuracy gap without full resubmission, but because the PCCP is not a complete answer, healthcare organisations cannot wait for regulators to finish the job. Financial services already has mature model change-management governance that healthcare has not yet adopted.

How Should a Healthcare Organisation Evaluate and Monitor a Clinical AI Tool Before and After Deployment?

Healthcare organisations deploying clinical AI are effectively self-insuring against drift risk, with minimal regulatory guidance on what to monitor or how. You need your own governance infrastructure, and you need it before the first AI tool goes live.

Before deployment, run shadow testing, where the AI tool runs silently alongside current clinical practice and its outputs are compared against current clinical standards without influencing care. This establishes baseline performance, error rates, and subgroup accuracy in your actual patient population. Ask vendors for four things: published real-world outcomes data, not just pivotal trial results; documented performance across the demographics relevant to your patients; a Software Bill of Materials (SBOM) for cybersecurity assessment; and a clear post-market surveillance plan.
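Shadow-mode results are most useful when broken down by subgroup. A minimal sketch, assuming each record pairs a hypothetical subgroup label (for example an age band) with the AI’s silent output and the reference-standard outcome; the record format and labels are illustrative.

```python
from collections import defaultdict


def subgroup_metrics(records):
    """Sensitivity and specificity per subgroup from shadow-mode records.

    Each record is (subgroup, ai_positive, reference_positive), where the
    reference is the outcome or clinical decision the AI is compared against.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for subgroup, ai_positive, reference_positive in records:
        c = counts[subgroup]
        if reference_positive:
            c["tp" if ai_positive else "fn"] += 1
        else:
            c["fp" if ai_positive else "tn"] += 1

    results = {}
    for subgroup, c in counts.items():
        positives, negatives = c["tp"] + c["fn"], c["tn"] + c["fp"]
        results[subgroup] = {
            "n": positives + negatives,
            # None when a subgroup has no cases of that kind, rather than dividing by zero
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


if __name__ == "__main__":
    shadow_log = [  # hypothetical records
        ("under_65", True, True), ("under_65", False, False), ("under_65", True, True),
        ("65_plus", False, True), ("65_plus", True, True), ("65_plus", False, False),
    ]
    for group, m in subgroup_metrics(shadow_log).items():
        print(group, m)
```

A gap like the one between the two hypothetical age bands here is exactly what a pooled accuracy figure would hide. In practice you would also want confidence intervals, since small subgroups give noisy estimates.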

After deployment, implement tiered monitoring with pre-defined performance thresholds. As discussed in the post-market surveillance section, the Green/Amber/Red framework gives you a template: Green for routine operation, Amber when drift indicators trigger investigation, Red when safety concerns require suspension. Continuous monitoring should also detect emerging biases by analysing outputs across diverse patient populations.

The HAIRA Maturity Model provides a five-level framework spanning seven governance domains, from organisational structure through monitoring and maintenance. It uses a weakest-link rule: your overall level is the highest level for which every domain meets the minimum standard. To qualify for Level 3, production deployment, you need a named governance body with decision rights, documented internal validation including subgroup and fairness checks, and live monitoring with an incident-reporting pathway.

Most hospitals use predictive models, but only half assess them for bias and only two-thirds for accuracy. The OPTICA Tool offers the most comprehensive single evaluation framework covering the full AI lifecycle, and the NIST AI Risk Management Framework provides a complementary governance structure built around four functions: govern, map, measure, and manage.
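The HAIRA weakest-link rule reduces to taking the minimum across domains, assuming each domain is scored as the highest level whose minimum standard it meets and that levels are cumulative. Only three of the seven domains are shown, the scores are hypothetical, and `internal_validation` is an illustrative domain name.

```python
def overall_level(domain_levels):
    """Weakest-link rule: the highest level that every domain has reached."""
    if not domain_levels:
        raise ValueError("at least one domain must be assessed")
    return min(domain_levels.values())


def blockers(domain_levels, target_level):
    """Domains that must improve before the organisation can claim target_level."""
    return sorted(d for d, level in domain_levels.items() if level < target_level)


if __name__ == "__main__":
    scores = {
        "organisational_structure": 4,
        "internal_validation": 3,
        "monitoring_and_maintenance": 2,
    }
    print(overall_level(scores))  # 2
    print(blockers(scores, 3))    # ['monitoring_and_maintenance']
```

The `blockers` helper makes the practical consequence explicit: strength in one domain cannot compensate for a gap in another.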

These frameworks are the operational response to the accuracy gap between benchmarks and bedside performance. The evaluation and monitoring you put in place are what shape liability outcomes when AI causes harm, and financial services model risk management provides a mature template that healthcare has not yet adopted.

How Does the EU AI Act and MDR Compare to the FDA’s Approach to Clinical AI Regulation?

The EU regulates medical AI through two overlapping frameworks: the Medical Device Regulation (MDR 2017/745), requiring CE marking through notified-body conformity assessment with clinical benefit demonstration, and the AI Act (2024/1689), mandatory from 2027, classifying medical AI as high-risk.

The structural differences are fundamental. The FDA uses a centralised agency review. The EU uses a distributed notified-body system. The FDA’s 510(k) pathway relies on substantial equivalence to an existing predicate. The EU MDR requires direct proof of clinical benefit through a Clinical Evaluation Report. The FDA’s PCCP provides a pathway for iterative AI updates. The EU MDR currently treats AI as static software with no equivalent mechanism.

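The practical implications for manufacturers targeting both markets are significant. Bringing an FDA-cleared software application into the EU as a Class IIb device under the MDR typically takes 12 to 18 months and costs between €170,000 and €382,000. The AI Act carries fines of up to €15 million or 3 percent of worldwide annual turnover, whichever is higher, for breaching obligations such as those for high-risk systems, rising to €35 million or 7 percent for prohibited practices; that is far more punitive than FDA enforcement. And the AI Act requires training, validation, and testing data sets to be relevant and sufficiently representative of the intended patient population, which in practice points toward data from multiple clinical sites reflecting variation across ages, genders, and ethnicities.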

HeartFlow FFRCT, introduced earlier, is the paradigmatic case of adaptive AI confronting static EU regulation. Because its algorithm was constantly updating, its clinical performance could not be determined at a single point in time, as the MDR requires. The FDA’s PCCP mechanism was designed specifically to accommodate this kind of system. The EU has no answer for it yet, though the AI Act may eventually pave the way for specialised pathways for continuously learning AI.

The burden on healthcare organisations is not unique to the US regulatory model. The EU’s approach creates the same gap through different mechanisms, and neither system has operationalised lifecycle surveillance for adaptive AI. That is the wider global governance picture in which clinical AI regulation sits, and here again, financial services governance models operate across jurisdictional boundaries in ways healthcare governance has not yet matched.

The FDA’s Total Product Lifecycle philosophy is the right idea: regulation must span a device’s entire lifespan, not end at clearance. The operational infrastructure to deliver on that philosophy does not exist. Algorithmic drift cannot be systematically detected under the current regulatory architecture in any jurisdiction. The PCCP is an innovative partial answer, but with only about 10 percent of 2025 AI clearances including one, and no capacity to handle unanticipated changes, it cannot close the gap alone.

What this means for you: if you are deploying clinical AI, you are the de facto safety net. The governance frameworks available across healthcare give you what you need to build that safety net. The question is no longer whether a device has been cleared. It is who is watching it now, and with what tools.

Frequently Asked Questions

Is FDA-cleared the same thing as FDA-approved?

No, and the distinction matters. FDA clearance refers to the 510(k) pathway, where a device is found substantially equivalent to an existing predicate without the need for new clinical evidence. FDA approval refers to the PMA pathway for high-risk devices, which requires valid scientific evidence, typically from clinical studies, demonstrating a reasonable assurance of safety and effectiveness. About 95 to 97 percent of AI medical devices are cleared rather than approved, meaning most have reached market by demonstrating similarity rather than standalone clinical benefit.

How long does it take to get an AI medical device cleared by the FDA?

Most AI medical devices use the 510(k) pathway, which typically takes 90 to 180 days from submission to FDA decision. However, the preparatory phase (documentation, testing, predicate device analysis) often spans 12 to 18 months beforehand. De Novo classification for novel devices takes longer, and PMA approval for high-risk AI can exceed two years. Including a Predetermined Change Control Plan in the initial submission can add review time.

Does an AI diagnostic tool need to prove it works better than a doctor to get cleared?

No. Under the 510(k) pathway used by 95 to 97 percent of AI medical devices, the manufacturer only needs to demonstrate substantial equivalence to an existing predicate device. The FDA does not require a head-to-head comparison against physician performance. Assuming otherwise is a common misconception: clearance confirms the tool is similar enough to something already on the market, not that it outperforms standard care or even matches it in real-world conditions.

What should I ask my doctor if they are using AI in my care?

Ask three questions: what is the AI tool doing with my data, was it trained on people like me, and who is checking that it works correctly over time. It is reasonable to ask whether an AI system is contributing to clinical decisions about your treatment. If your doctor cannot explain the tool’s intended use, validated population, and monitoring process, that is a reasonable basis to request care without AI involvement.

Can a medical AI device be pulled from the market?

Yes. The FDA can request or order a recall, withdraw clearance, or mandate a safety notification if post-market surveillance identifies material risks, including algorithmic drift that compromises diagnostic accuracy. The Total Product Lifecycle framework explicitly includes decommissioning as a regulatory endpoint. In practice, mandatory recalls of AI devices remain rare, and because the adverse event reporting system is reactive, problems are often detected only after patient harm has already occurred.

Are large language models like ChatGPT regulated by the FDA when used clinically?

It depends on the use case, and this is one of the most contested regulatory questions in 2026. General-purpose LLMs used for administrative tasks like scheduling fall outside FDA oversight. However, if an LLM analyses patient data and generates diagnostic or treatment recommendations, it may fail one or more of the four Cures Act criteria examined in the 2026 CDS guidance and therefore be regulated as a medical device. The FDA has not yet issued LLM-specific regulatory guidance.

If a hospital has been using an AI tool for years, how do they know it is still safe?

Without active monitoring, they do not, and this is the core of the algorithmic drift problem. A hospital should demand that the vendor provide real-world performance data for the specific patient population being treated, not just pivotal trial results from years earlier. Internally, the hospital should implement tiered Green/Amber/Red monitoring with defined performance thresholds and run periodic validation against current clinical outcomes to detect degradation before it causes harm.

Do other countries recognise FDA clearance?

No, there is no automatic international recognition of FDA clearance. The EU requires separate CE marking under the Medical Device Regulation, and high-risk AI must also comply with the EU AI Act from 2027. The UK operates its own UKCA regime post-Brexit. The IMDRF promotes international harmonisation, but manufacturers targeting multiple markets face separate regulatory submissions, divergent evidence requirements, and additional costs; an FDA-to-EU transition alone is estimated at roughly €170,000 to €382,000.

What actually triggers an FDA investigation of an AI medical device?

The most common trigger is an adverse event report: manufacturers and healthcare facilities file mandatory reports under the FDA’s Medical Device Reporting regulation, and clinicians can also report voluntarily through MedWatch. Reports of diagnostic errors, unexplained performance degradation, or patient harm linked to an AI tool can prompt an FDA review. The weakness of this system for AI is that it is reactive: it captures harm after it occurs rather than detecting algorithmic degradation before it causes patient injury.

What is a Software Bill of Materials and why does it matter for clinical AI?

A Software Bill of Materials is a formal inventory of every software component, library, and dependency inside an AI medical device. It matters because clinical AI systems incorporate numerous third-party and open-source components, each a potential cybersecurity vulnerability. When a critical flaw is discovered in a widely used library, the SBOM lets a hospital immediately determine whether its deployed AI tools are affected, rather than waiting for vendor notification.
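A minimal sketch of that lookup, assuming the vendor supplies an SBOM in CycloneDX JSON format (a `components` array whose entries carry `name` and `version` fields) and that advisories arrive as a simple mapping from component name to affected versions. The component names and the advisory feed are hypothetical. Production tools match on package URLs and version ranges rather than exact strings, and also walk nested components, which this sketch skips.

```python
import json


def affected_components(sbom_json, advisories):
    """List (name, version) pairs in a CycloneDX-style SBOM that appear in an advisory feed.

    advisories: mapping of component name -> set of affected version strings.
    Only top-level components are checked, using exact version matches.
    """
    bom = json.loads(sbom_json)
    hits = []
    for component in bom.get("components", []):
        name, version = component.get("name"), component.get("version")
        if version in advisories.get(name, set()):
            hits.append((name, version))
    return hits


if __name__ == "__main__":
    vendor_sbom = json.dumps({
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "components": [
            {"type": "library", "name": "example-imaging-lib", "version": "2.4.0"},
            {"type": "library", "name": "example-http-lib", "version": "1.9.3"},
        ],
    })
    advisory_feed = {"example-imaging-lib": {"2.4.0", "2.4.1"}}  # hypothetical advisory
    print(affected_components(vendor_sbom, advisory_feed))  # [('example-imaging-lib', '2.4.0')]
```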
