AI hallucinations in therapy notes — patterns, examples, and how to catch them

Reviewed by TherapyScribes Editorial. Updated June 15, 2026. Facts verified June 15, 2026.

Why hallucination is a bigger problem in therapy than primary care

Primary-care visits have anchors: vitals, labs, exam findings, a structured chief complaint. The model has objective facts to align with. Therapy sessions are loose, language-heavy, and built on subjective experience. A fluent-sounding fabrication has very little to disconfirm it. That is why ambient scribes built for medicine often hallucinate more in therapy than therapy-first tools do, and why even the therapy-first tools still get it wrong sometimes.

The clinician remains accountable for the final record. AI removes typing; it does not remove judgment.

The five hallucination patterns specific to therapy

1. Plausible-but-wrong direct quotes

The model assigns a quotation to the client that paraphrases the gist but uses words the client did not say. Often the wording is more articulate or more clinical than the client's actual speech.

❌ Client stated, "I feel a profound sense of inadequacy in my role as a parent."

✅ Client described feeling like a "bad mom" when her son refused dinner.

Rule: if you do not specifically recognize a quoted sentence, delete the quotes (keep the paraphrase) or rewrite it.
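If your tool lets you export the session transcript, the quote rule can be partly automated. The following is a minimal sketch, not any vendor's feature: the function name `unverified_quotes` is hypothetical, and it assumes you have the note and the transcript as plain text. It lists every quoted passage in the note that does not appear word for word in the transcript.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, and keep only words separated by single spaces."""
    text = text.lower().replace("\u2019", "'")
    return " ".join(re.findall(r"[a-z0-9']+", text))


def unverified_quotes(note: str, transcript: str) -> list[str]:
    """Return quoted passages in the note that are not found verbatim in the transcript.

    Hypothetical helper: assumes plain-text inputs and straight or curly double quotes.
    """
    quotes = re.findall(r'["\u201c]([^"\u201c\u201d]+)["\u201d]', note)
    # Pad with spaces so "bad mom" does not match inside "bad moment".
    haystack = f" {_normalize(transcript)} "
    return [q for q in quotes if f" {_normalize(q)} " not in haystack]
```

A passage that comes back from this check is only a candidate for review. Transcription errors can make a real quote fail to match, so the clinician's memory stays the final test.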

2. Fabricated MSE elements

The model fills mental status exam (MSE) fields it should leave blank. The most dangerous case is a note stating that the client denied suicidal ideation (SI) when SI was never assessed.

❌ "Denies SI/HI. Mood euthymic. No psychotic symptoms reported." when the session never touched on risk or psychosis.

Rule: if you did not ask, the note cannot say the client denied. Configure your scribe to leave unaddressed MSE fields blank, not auto-populate them.
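As a rough illustration of the "if you did not ask" rule, the sketch below flags "denies" sentences in a note when the transcript contains no language on that topic. Everything here is an assumption for illustration: the function name `unsupported_denials`, the `TOPIC_TERMS` keyword lists, and the availability of a plain-text transcript. A keyword hit only shows the topic came up, not that risk was properly assessed, so this can narrow what you read but cannot clear a note.

```python
import re

# Hypothetical keyword stems per topic; a real practice would need to tune these.
TOPIC_TERMS = {
    "SI": ("suicid", "kill myself", "end my life", "hurt yourself", "harm yourself"),
    "HI": ("homicid", "hurt someone", "harm someone", "hurting anyone"),
}


def unsupported_denials(note: str, transcript: str) -> list[str]:
    """Return 'TOPIC: sentence' for each denial in the note with no matching topic language in the transcript."""
    spoken = transcript.lower()
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        # Only look at sentences that claim the client denied something.
        if not re.search(r"\bden(?:ies|ied)\b", sentence, re.IGNORECASE):
            continue
        for topic, terms in TOPIC_TERMS.items():
            mentioned_in_note = re.search(rf"\b{topic}\b", sentence)
            if mentioned_in_note and not any(term in spoken for term in terms):
                flagged.append(f"{topic}: {sentence.strip()}")
    return flagged
```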

3. Confabulated history

Names, dates, family members, jobs, or diagnoses the client did not mention this session. They are pulled either from no source at all or, worse, from the model's memory of a different client's note if your tool uses cross-session context.

Rule: anything specific (name, date, number) that you do not remember saying or hearing needs to be verified or removed.

4. Risk-assessment over-confidence

The most dangerous pattern. The model writes "no risk identified" or "low risk" in a session where risk was discussed but not resolved, or writes a richer risk paragraph than the actual conversation supports.

❌ "Risk assessment: low. Client denies SI, HI, plan or intent. Protective factors include family support and treatment engagement." when you and the client had only a brief, ambiguous conversation about hopelessness.

Rule: read every risk sentence against your own memory. If the AI's risk paragraph is more confident than your own, rewrite it. Document ambiguity as ambiguity.

5. Treatment-plan invention

Interventions described as "applied" or "delivered" that were discussed but not done, or that the model assumed because the goal mentions them.

❌ "Therapist delivered EMDR Phase 4 desensitization on target memory." when you actually did Phase 3 assessment and resource installation.

Rule: verify any intervention named with a specific phase, protocol or technique. Generic process language ("explored," "validated," "reflected") is harder to falsify and lower-risk.

A 90-second review workflow that catches most of this

Before signing any note:

1. Skim the risk section first. Is every risk statement something you remember discussing and concluding? If not, rewrite.
2. Scan direct quotes. Anything you do not specifically recognize → unquote or delete.
3. Check MSE for auto-populated fields. Anything denied that was not asked → delete.
4. Check intervention names against what you actually did. Specific protocols / phases / techniques are the high-risk items.
5. Check the plan for invented homework or referrals.

This takes about 90 seconds on a normal note and 3 minutes on a complex one. It is the price of using ambient scribes responsibly.
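For practices that want the five-step review to be explicit rather than habitual, the workflow above can be modeled as a pre-sign checklist. This is a minimal sketch under stated assumptions: the class name `PreSignReview` and its field names are hypothetical and not part of any scribe product. The clinician ticks each item by hand; the code only reports which checks are still outstanding.

```python
from dataclasses import dataclass, fields


@dataclass
class PreSignReview:
    """One flag per review step, in the order the steps are performed (hypothetical names)."""

    risk_statements_match_memory: bool = False
    quotes_recognized_or_unquoted: bool = False
    mse_has_no_unasked_denials: bool = False
    interventions_match_session: bool = False
    plan_has_no_invented_items: bool = False

    def outstanding(self) -> list[str]:
        """Names of the checks that have not been completed yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready_to_sign(self) -> bool:
        """True only when every check has been completed."""
        return not self.outstanding()
```

Keeping the risk check as the first field mirrors the workflow's ordering, so an interrupted review still covers the highest-stakes section first.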

What the better tools do

- Limit themselves to summary rather than direct quotation when uncertain (Upheal default behavior).
- Flag low-confidence segments for clinician review (Eleos, some Mentalyc templates).
- Leave unaddressed sections blank rather than confabulating (configurable in Upheal, Clinical Notes AI).
- Surface "added on regeneration" so you can see when re-running the note introduced new facts.

If your current tool does none of these, that is itself a signal.
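If your tool does not surface what changed when a note is regenerated, a rough version of that check is easy to approximate yourself. The sketch below assumes you can copy both versions of the note out as plain text. The function name `added_on_regeneration` is hypothetical and does not describe how any vendor implements the feature. It returns the sentences that appear in the regenerated note but not in the previous one.

```python
import re


def _sentences(text: str) -> list[str]:
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def added_on_regeneration(previous: str, regenerated: str) -> list[str]:
    """Return sentences present in the regenerated note but absent from the previous version.

    Comparison is case-insensitive and exact, so a reworded sentence counts as new.
    """
    seen = {s.lower() for s in _sentences(previous)}
    return [s for s in _sentences(regenerated) if s.lower() not in seen]
```

Because a reworded sentence is reported as new, this errs toward over-flagging, which is the safer direction for a review aid.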

Documentation hygiene for the AI era

- Note in your client record (once, in the intake or treatment plan) that AI-assisted documentation is in use and that the clinician reviews and signs every note.
- Keep your audit logs. Most business associate agreements (BAAs) make them available on request.
- If a hallucination ever reaches a chart, correct it through your standard amendment process; do not edit silently.

The goal is not zero AI involvement. The goal is a record where every signed sentence is one the clinician would have written.
