Automating collateral histories in dementia: Development and proof‑of‑concept evaluation of the LUMEN conversational AI

Harrison, JR; Robertson, A; Tang, SL; Kaur, L; Poole, M; Mullin, D; Robertson, E; De Silva, P; Collis, T; Huang, L; Blackburn, D; Meinert, E; Liang, H; Taylor, JP

doi:10.1016/j.inpsyc.2026.100221

NU author(s): Dr Judith Harrison, Alex Robertson, Dr Marie Poole, Tom Collis, Liting Huang, Professor Edward Meinert, Dr Huizhi Liang, Professor John-Paul Taylor

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Abstract

© 2026 The Author(s). Published by Elsevier Inc. on behalf of International Psychogeriatric Association. This is an open access article under the CC BY license. http://creativecommons.org/licenses/by/4.0/

Background: Collateral histories from carers are central to dementia diagnosis but are often collected inconsistently and variably documented. With rising demand on memory services and the emergence of disease-modifying therapies requiring timely diagnosis, there is increasing need for structured and efficient assessment approaches. Conversational AI powered by large language models (LLMs) may support standardised collateral history acquisition while maintaining clinician oversight. We developed LUMEN, a stakeholder-informed prototype designed to generate structured collateral summaries for clinical review.

Methods: A five-stage patient, public and professional involvement programme (approximately 232 participants) co-designed the question set, interface and outputs. Seven open-source LLMs were benchmarked; Qwen3-30B-A3B was selected to generate structured summaries from interview transcripts. Six clinician-authored vignettes representing Alzheimer’s disease, dementia with Lewy bodies, vascular dementia, frontotemporal dementia, mild cognitive impairment and normal cognition were used to generate 54 synthetic dialogues (27 clinician role-played, 27 GPT-4 generated). Diagnostic categories were assigned using a deterministic rule-based rubric applied to structured summaries. Two clinicians independently rated each dialogue. Outcomes included exploratory evaluation of alignment with diagnostic categories measured by area under the receiver operating characteristic curve (AUROC) and Cohen’s κ, and System Usability Scale (SUS) scores.

Results: In this small synthetic vignette-based dataset, macro-average AUROC was 0.95; these values reflect performance under closed-loop proof-of-concept conditions rather than real-world diagnostic accuracy. Discrimination was highest for Alzheimer’s disease and vascular dementia (AUROC = 1.00 in this synthetic dataset) and lowest for mild cognitive impairment (AUROC = 0.77). Agreement between categories assigned by the rule-based rubric and averaged clinician ratings was κ = 0.88 (95% CI 0.83–0.93). Mean SUS score was 78.1/100.

Conclusions: In a small, closed-loop synthetic proof-of-concept dataset, this LLM-assisted, rubric-based pipeline showed that structured summaries could be processed reproducibly by the rubric and separated diagnostic categories under controlled conditions. These findings do not show real-world diagnostic performance. Further evaluation is required to determine clinical usefulness, robustness and workflow impact.
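
For readers unfamiliar with the three outcome measures named in the abstract, the following is a minimal sketch of how each is conventionally computed. It is not the authors' analysis code. The function names are hypothetical, and the specific choices (unweighted κ, one-vs-rest AUROC averaged equally across classes, the standard ten-item SUS scoring rule) are assumptions, since the abstract does not state which variants were used or how category scores were derived from the rubric.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Assumption: the abstract does not say whether kappa was weighted.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1.0 - expected)


def binary_auroc(scores, is_positive):
    """AUROC as the probability that a positive item outscores a negative
    one, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(score_rows, true_labels, classes):
    """One-vs-rest AUROC per class, then the unweighted mean.

    score_rows[i][j] is a hypothetical score for item i and classes[j].
    """
    per_class = {}
    for j, cls in enumerate(classes):
        column = [row[j] for row in score_rows]
        per_class[cls] = binary_auroc(column, [y == cls for y in true_labels])
    return sum(per_class.values()) / len(classes), per_class


def sus_score(responses):
    """Standard SUS scoring for ten responses on a 1-5 scale.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5
```

Macro-averaging gives each diagnostic category equal weight, so a weaker category such as mild cognitive impairment lowers the average even when other categories are perfectly separated.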

Publication metadata

Author(s): Harrison JR, Robertson A, Tang SL, Kaur L, Poole M, Mullin D, Robertson E, De Silva P, Collis T, Huang L, Blackburn D, Meinert E, Liang H, Taylor JP

Publication type: Article

Publication status: Published

Journal: International Psychogeriatrics

Year: 2026

Pages: Epub ahead of print

Online publication date: 25/05/2026

Acceptance date: 08/05/2026

Date deposited: 01/06/2026

ISSN (print): 1041-6102

ISSN (electronic): 1741-203X

Publisher: Elsevier

URL: https://doi.org/10.1016/j.inpsyc.2026.100221

DOI: 10.1016/j.inpsyc.2026.100221

Data Access Statement: The synthetic carer–patient dialogues and the corresponding structured summaries that underpin this analysis are available on reasonable request from the corresponding author. These data do not contain real patient information.

Funding

Funder name
Alzheimer’s Research UK
iCure Programme
NIHR Newcastle Biomedical Research Centre
NIHR Academic Clinical Lectureship
Royal College of Psychiatrists
UKRI Engineering and Physical Sciences Research Council (EPSRC)
