Ontario auditors find doctors' AI note-takers routinely get basic facts wrong

TL;DR

An audit by Ontario’s Office of the Auditor General found that nine of 20 AI medical note-taking systems regularly fabricated information and missed key details, raising serious concerns about AI accuracy in healthcare settings and potential risks to patient safety.

The Ontario Office of the Auditor General has reported that nine of the 20 AI note-taking systems approved for healthcare providers routinely produce inaccurate or fabricated information in patient records. The finding raises concerns about the reliability of AI tools used in critical medical documentation, with direct implications for patient safety and clinical decision-making.

The audit evaluated 20 AI-based medical note systems approved for use across Ontario’s healthcare sector. During simulated assessments, evaluators discovered that nine systems fabricated details, such as suggesting treatments or symptoms not discussed in the original recordings. Additionally, 12 systems inserted incorrect medication information, and 17 missed key mental health details discussed during consultations. The evaluation process involved comparing AI-generated notes with original doctor-patient recordings, revealing significant inaccuracies and omissions.
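As a toy illustration of the kind of comparison the evaluators performed (not the audit's actual methodology, and with entirely hypothetical data), checking a note against its source transcript amounts to finding terms the note adds that were never said, and terms that were said but dropped:

```python
# Hypothetical sketch: compare key terms in an AI-generated note against
# the source transcript. Function name and example data are illustrative,
# not taken from the audit.

def note_errors(transcript_terms, note_terms):
    """Return (fabrications, omissions) between a note and its transcript."""
    transcript = set(t.lower() for t in transcript_terms)
    note = set(t.lower() for t in note_terms)
    fabrications = sorted(note - transcript)   # in the note, never discussed
    omissions = sorted(transcript - note)      # discussed, missing from note
    return fabrications, omissions

# Example: the note adds a medication never discussed and drops a
# mental-health detail mentioned in the consultation.
fabs, missed = note_errors(
    transcript_terms=["ibuprofen", "anxiety", "insomnia"],
    note_terms=["ibuprofen", "amoxicillin", "insomnia"],
)
print(fabs)    # ['amoxicillin']
print(missed)  # ['anxiety']
```

Real evaluations would of course need clinical term normalization and human review; the point is only that both error classes the audit found (fabrication and omission) fall out of the same comparison.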

OntarioMD, an organization that supports physicians in adopting new technologies, recommends that doctors manually review AI-generated notes, but the audit notes that none of the approved systems requires a mandatory accuracy attestation. The report also criticizes the evaluation process itself: only 4 percent of the scoring addressed medical accuracy, while criteria such as vendor presence and security measures were weighted more heavily. Such imbalanced metrics, the report warns, could lead to selecting vendors whose AI tools produce biased or unreliable records, posing risks to patient safety and data security.
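A simple weighted-score calculation shows why the report flags the 4 percent accuracy weight. The weights below use the audit's 4 percent figure, but the remaining weights, system names, and scores are hypothetical, chosen only to illustrate how a less accurate system can outrank a more accurate one:

```python
# Illustrative vendor-scoring sketch. Only the 0.04 accuracy weight comes
# from the audit; all other weights and scores are hypothetical.

def weighted_score(scores, weights):
    """Combine per-criterion scores (0-1) using weights that sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(scores[k] * weights[k] for k in weights)

weights = {"accuracy": 0.04, "vendor_presence": 0.48, "security": 0.48}

# A highly accurate system with a weaker pitch vs. a polished vendor
# whose notes are frequently wrong.
accurate_system = {"accuracy": 0.95, "vendor_presence": 0.60, "security": 0.70}
polished_system = {"accuracy": 0.40, "vendor_presence": 0.85, "security": 0.85}

print(round(weighted_score(accurate_system, weights), 3))  # 0.662
print(round(weighted_score(polished_system, weights), 3))  # 0.832
```

Under these illustrative numbers the far less accurate system wins comfortably, which is exactly the selection risk the audit describes.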

Why It Matters

This report underscores potential risks associated with deploying AI tools in healthcare, especially when accuracy and safety are paramount. Erroneous or fabricated medical records can lead to misdiagnoses, inappropriate treatments, or overlooked health issues, directly affecting patient outcomes. The findings raise questions about the oversight and regulation of AI in critical health applications and highlight the need for stricter evaluation standards to ensure patient safety and data integrity.

Background

The use of AI for medical documentation has increased in recent years, with several jurisdictions adopting AI scribe systems to reduce administrative burdens on healthcare professionals. Ontario approved multiple vendors for its AI Scribe program, which was evaluated through simulated doctor-patient recordings. Previous studies and reports have raised concerns about AI accuracy in medical contexts, but this is among the first comprehensive audits to assess the performance of such systems in a public healthcare setting.

Prior to this report, there was limited formal oversight of AI accuracy in Ontario’s health sector, with some officials emphasizing potential benefits but few scrutinizing actual performance. The audit’s findings suggest a disconnect between the perceived utility of AI tools and their real-world reliability, especially in high-stakes environments like healthcare.

“Nine out of 20 evaluated AI systems fabricated information and missed key clinical details, raising concerns about their reliability in medical documentation.”

— Office of the Auditor General of Ontario

“Doctors are advised to review AI-generated notes manually; however, no system currently mandates accuracy verification.”

— OntarioMD representative

“Inaccurate weightings in the evaluation criteria could result in selecting vendors whose AI tools may produce biased or inaccurate medical records.”

— Audit report authors

What Remains Unclear

It is still unclear how widespread the use of these AI systems is beyond the evaluated sample, or whether remedial actions are being implemented by Ontario health authorities. The long-term impact of these inaccuracies on patient safety and clinical outcomes remains to be studied. Additionally, the government’s response to the report and whether stricter oversight or new evaluation standards will be adopted are not yet known.

What’s Next

The Ontario Ministry of Health is expected to review the audit findings and may revise evaluation criteria for AI systems. Further investigations or audits could assess real-world performance and safety. Healthcare providers and AI vendors may also be required to enhance accuracy verification features, including mandatory attestations and improved oversight mechanisms, before broader deployment.

Key Questions

What specific errors did the AI systems make?

The audit found that nine systems fabricated information, such as symptoms and treatment suggestions not discussed in recordings, while others inserted incorrect medication data or missed key mental health details.

How many physicians are using these AI note-taking tools in Ontario?

More than 5,000 physicians in Ontario are participating in the AI Scribe program, according to the Ministry of Health, with no reported patient harms so far.

Are there regulations to ensure AI accuracy in healthcare in Ontario?

Current regulations are limited, and the audit suggests that evaluation criteria may not adequately prioritize accuracy or safety, prompting calls for stricter oversight.

Will the government change its approach based on this report?

The government has not yet announced specific policy changes, but the audit is likely to prompt review and potential revisions to evaluation standards for AI tools in healthcare.
