
Stratification of diabetes in the context of comorbidities, using representation learning and topological data analysis

Lookup NU author(s): Dr Dexter Canoy


Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2023, The Author(s). Diabetes is a heterogeneous, multimorbid disorder with large variation in manifestations, trajectories, and outcomes. The aim of this study is to validate a novel machine learning method for the phenotyping of diabetes in the context of comorbidities. Data from 9967 multimorbid patients with a new diagnosis of diabetes were extracted from the Clinical Practice Research Datalink. First, using BEHRT (a transformer-based deep learning architecture), the embeddings corresponding to diabetes were learned. Next, topological data analysis (TDA) was carried out to test how different areas in the high-dimensional manifold correspond to different risk profiles. The following endpoints were considered when profiling risk trajectories: major adverse cardiovascular events (MACE), coronary artery disease (CAD), stroke (CVA), heart failure (HF), renal failure (RF), diabetic neuropathy, peripheral arterial disease, reduced visual acuity and all-cause mortality. Kaplan-Meier curves were plotted for each derived phenotype. Finally, we tested the performance of an established risk prediction model (QRISK) by adding TDA-derived features. We identified four subgroups of patients with diabetes and divergent comorbidity patterns differing in their risk of future cardiovascular, renal, and other microvascular outcomes. Phenotype 1 (young with chronic inflammatory conditions) and phenotype 2 (young with CAD) included relatively younger patients with diabetes compared to phenotypes 3 (older with hypertension and renal disease) and 4 (older with previous CVA), and those subgroups had a higher frequency of pre-existing cardio-renal diseases. Within ten years of follow-up, 2592 patients (26%) experienced MACE, 2515 patients (25%) died, and 2020 patients (20%) suffered RF. The QRISK3 model’s AUC increased from 67.26% (CI 67.25–67.28%) to 67.67% (CI 67.66–67.69%) when the specific TDA-derived phenotype and the distances to both extremities of the TDA graph were added, improving its performance in the prediction of CV outcomes. We confirmed the importance of accounting for multimorbidity when risk stratifying a heterogeneous cohort of patients with a new diagnosis of diabetes. Our unsupervised machine learning method improved the prediction of clinical outcomes.
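
As an illustration of the Kaplan-Meier step mentioned in the abstract, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the authors' code: the function name kaplan_meier and the toy follow-up data are hypothetical and synthetic, chosen only to show how a survival curve for one phenotype could be computed from follow-up times and event flags.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events: 1 if the endpoint occurred at that time, 0 if the patient was censored
    """
    # Number of observed events at each time point
    observed = Counter(t for t, e in zip(durations, events) if e)
    # Every patient leaves the risk set at their own follow-up time
    leaving = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Synthetic toy data (hypothetical, NOT from the study):
# years of follow-up and whether the endpoint was observed
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 0]

curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.3f}")
```

In practice one such curve would be computed per derived phenotype and per endpoint, and a survival analysis library would normally be used instead of a hand-written estimator.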


Publication metadata

Author(s): Wamil M, Hassaine A, Rao S, Li Y, Mamouei M, Canoy D, Nazarzadeh M, Bidel Z, Copland E, Rahimi K, Salimi-Khorshidi G

Publication type: Article

Publication status: Published

Journal: Scientific Reports

Year: 2023

Volume: 13

Online publication date: 16/07/2023

Acceptance date: 05/07/2023

Date deposited: 01/03/2024

ISSN (electronic): 2045-2322

Publisher: Springer Nature

URL: https://doi.org/10.1038/s41598-023-38251-1

DOI: 10.1038/s41598-023-38251-1

Data Access Statement: The CPRD database used for this study has been approved by an Independent Scientific Advisory Committee (ISAC). The ISAC protocol number for this study is 16_049. To obtain access to CPRD data, researchers are advised to follow the required procedure on the CPRD website Data Access page (https://www.cprd.com/Data-access). The data supporting this study's findings are available from Clinical Practice Research Datalink (CPRD). The page https://www.cprd.com/Data explains the nature and accessibility of the data in more depth. Furthermore, regarding accessibility, https://www.cprd.com/primary-care explains: “Access to data from CPRD is subject to a full licence agreement containing detailed terms and conditions of use. Patient-level datasets can be extracted for researchers against specific study specifications, following protocol approval from the Independent Scientific Advisory Committee (ISAC).” Thus, restrictions apply to the availability of these data.

PubMed id: 37455284



Funding

Funder reference / Funder name
British Heart Foundation
ES/P011055/1
FS/19/36/34346
National Institute of Health Research (NIHR) Oxford Biomedical Research Centre
Oxford Martin School
PG/18/65/33872
UKRI Global Challenge Research Fund
