
AI hallucinations in therapy notes — patterns, examples, and how to catch them

Practical patterns of AI-scribe hallucination specific to therapy documentation, real before/after examples, and a review workflow that catches them.

TherapyScribes Editorial · 9 min · 770 words
Reviewed by TherapyScribes Editorial

Why hallucination is a bigger problem in therapy than primary care

Primary-care visits have anchors: vitals, labs, exam findings, a structured chief complaint. The model has objective facts to align with. Therapy sessions are loose, language-heavy, and built on subjective experience. A fluent-sounding fabrication has very little to disconfirm it. That is why ambient scribes built for medicine often hallucinate more in therapy than therapy-first tools do, and why even the therapy-first tools still get it wrong sometimes.

The clinician remains accountable for the final record. AI removes typing; it does not remove judgment.

The five hallucination patterns specific to therapy

1. Plausible-but-wrong direct quotes

The model assigns a quotation to the client that paraphrases the gist but uses words the client did not say. Often the wording is more articulate or more clinical than the client's actual speech.

❌ Client stated, "I feel a profound sense of inadequacy in my role as a parent."
✅ Client described feeling like a "bad mom" when her son refused dinner.

Rule: if you do not specifically recognize a quoted sentence, delete the quotes (keep the paraphrase) or rewrite it.

2. Fabricated MSE elements

The model fills mental status exam (MSE) fields it should leave blank. The most dangerous case is a recorded denial of suicidal ideation (SI) when SI was never assessed.

❌ "Denies SI/HI. Mood euthymic. No psychotic symptoms reported." When the session never touched on risk or psychosis.

Rule: if you did not ask, the note cannot say the client denied. Configure your scribe to leave unaddressed MSE fields blank, not auto-populate them.
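One way to picture "blank, not auto-populated" is a record where every MSE field starts as "not assessed" and only assessed fields are ever rendered. The sketch below is illustrative only: the class, field names, and rendering function are hypothetical and do not reflect how any particular scribe stores or configures its notes.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MentalStatusExam:
    # Hypothetical fields. None means "not assessed this session",
    # which is deliberately distinct from a negative finding.
    mood: Optional[str] = None
    suicidal_ideation: Optional[str] = None
    homicidal_ideation: Optional[str] = None
    psychotic_symptoms: Optional[str] = None


def render_mse(mse: MentalStatusExam) -> str:
    """Render only the fields the clinician actually assessed."""
    lines = []
    for f in fields(mse):
        value = getattr(mse, f.name)
        if value is not None:
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


# Only mood was discussed, so only mood appears in the note.
print(render_mse(MentalStatusExam(mood="anxious, per client report")))
```

The design point is that a denial can only appear in the output if someone explicitly entered it; an untouched field produces no text at all.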

3. Confabulated history

Names, dates, family members, jobs, or diagnoses the client did not mention this session, pulled either from no source at all or, worse, from the model's memory of a *different* client's note if your tool uses cross-session context.

Rule: anything specific (name, date, number) that you do not remember saying or hearing needs to be verified or removed.

4. Risk-assessment over-confidence

The most dangerous pattern. The model writes "no risk identified" or "low risk" in a session where risk was discussed but not resolved, or writes a richer risk paragraph than the actual conversation supports.

❌ "Risk assessment: low. Client denies SI, HI, plan or intent. Protective factors include family support and treatment engagement." When you and the client had a brief, ambiguous conversation about hopelessness.

Rule: read every risk sentence against your own memory. If the AI's risk paragraph is more confident than your own, rewrite it. Document ambiguity as ambiguity.

5. Treatment-plan invention

Interventions described as "applied" or "delivered" that were discussed but not done, or that the model assumed because the goal mentions them.

❌ "Therapist delivered EMDR Phase 4 desensitization on target memory." When you actually did Phase 3 assessment and resource installation.

Rule: verify any intervention named with a specific phase, protocol or technique. Generic process language ("explored," "validated," "reflected") is harder to falsify and lower-risk.

A 90-second review workflow that catches most of this

Before signing any note:

1. Skim the risk section first. Is every risk statement something you remember discussing and concluding? If not, rewrite.
2. Scan direct quotes. Anything you do not specifically recognize → unquote or delete.
3. Check MSE for auto-populated fields. Anything denied that was not asked → delete.
4. Check intervention names against what you actually did. Specific protocols / phases / techniques are the high-risk items.
5. Check the plan for invented homework or referrals.

This takes about 90 seconds on a normal note and 3 minutes on a complex one. It is the price of using ambient scribes responsibly.
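The five checks can also be approximated as a pre-read that highlights where to look. The sketch below is a rough heuristic under stated assumptions: the regular expressions, the check names, and the `flag_for_review` function are all hypothetical, the keyword lists are far from complete, and nothing here replaces reading the note. It only surfaces sentences; deciding whether they are true stays with the clinician.

```python
import re

# Ordered to mirror the review steps: risk, quotes, MSE denials,
# interventions, plan. Keyword lists are illustrative, not exhaustive.
CHECKS = [
    ("risk", re.compile(r"\b(?:SI|HI)\b|(?i:\b(?:risk|suicid\w*)\b)")),
    ("quote", re.compile(r'["\u201c][^"\u201d]+["\u201d]')),
    ("denial", re.compile(r"(?i:\bden(?:ies|ied)\b)")),
    ("intervention", re.compile(r"\b(?:EMDR|Phase \d+)\b|(?i:\bprotocol\b)")),
    ("plan", re.compile(r"(?i:\b(?:homework|referral|referred)\b)")),
]

# Split after sentence-ending punctuation, optionally followed by a closing quote.
SENTENCE_BREAK = re.compile(r'(?<=[.!?])\s+|(?<=[.!?]["\u201d])\s+')


def flag_for_review(note: str) -> list[tuple[str, str]]:
    """Return (check_name, sentence) pairs that deserve a second look."""
    sentences = [s.strip() for s in SENTENCE_BREAK.split(note) if s.strip()]
    flags = []
    for name, pattern in CHECKS:
        for sentence in sentences:
            if pattern.search(sentence):
                flags.append((name, sentence))
    return flags


draft = (
    'Client stated, "I feel like a bad mom." '
    "Denies SI/HI. "
    "Therapist delivered EMDR Phase 4 desensitization. "
    "Homework: daily thought record."
)
for name, sentence in flag_for_review(draft):
    print(f"[{name}] {sentence}")
```

A sentence can be flagged by more than one check (here "Denies SI/HI." appears under both risk and denial), which matches how the manual review treats it.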

What the better tools do

- Limit themselves to summary rather than direct quotation when uncertain (Upheal default behavior).
- Flag low-confidence segments for clinician review (Eleos, some Mentalyc templates).
- Leave unaddressed sections blank rather than confabulating (configurable in Upheal, Clinical Notes AI).
- Surface "added on regeneration" so you can see when re-running the note introduced new facts.

If your current tool does none of these, that is itself a signal.
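If a tool does not surface "added on regeneration" itself, the idea can be approximated by comparing two drafts sentence by sentence. This is a minimal sketch using Python's standard `difflib`; the function names and sample notes are made up for illustration, and it is not a description of how any vendor implements the feature.

```python
import difflib
import re


def split_sentences(note: str) -> list[str]:
    """Naive sentence splitter; good enough for a side-by-side comparison."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def added_on_regeneration(previous: str, regenerated: str) -> list[str]:
    """Sentences present in the regenerated draft but not in the previous one."""
    diff = difflib.ndiff(split_sentences(previous), split_sentences(regenerated))
    # ndiff prefixes lines unique to the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


first = "Client reported poor sleep. Explored work stress."
second = (
    "Client reported poor sleep. Explored work stress. "
    "Client's sister Anna visited last week."
)
print(added_on_regeneration(first, second))
```

Any sentence this returns is new since the last draft, so it is a candidate for the "verify or remove" rule for specifics such as names and dates.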

Documentation hygiene for the AI era

- Note in your client record (once, in the intake or treatment plan) that AI-assisted documentation is in use and that the clinician reviews and signs every note.
- Keep your audit logs. Most business associate agreements (BAAs) make them available on request.
- If a hallucination ever reaches a chart, correct it through your standard amendment process; do not edit silently.

The goal is not zero AI involvement. The goal is a record where every signed sentence is one the clinician would have written.
