Open Access ePrints

Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts

Lookup NU author(s): Dr Dexter Canoy


Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2022 The Author(s). Published by Oxford University Press on behalf of the European Society of Cardiology.

Aims: Deep learning has dominated predictive modelling across different fields, but in medicine it has met with a mixed reception. In clinical practice, simple statistical models and risk scores continue to inform cardiovascular disease risk prediction. This is partly due to a knowledge gap about how deep learning models perform in practice when subjected to dynamic data shifts, a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several machine learning (ML)-based and established risk models.

Methods and results: Using linked electronic health records of 1.1 million patients across England aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting the 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests) and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration. To this end, we tested the models on cohorts from (i) distinct geographical regions and (ii) different time periods. Using internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve (AUROC).

Conclusion: The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination, but if the prior distribution changes, the model may remain miscalibrated.
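The abstract contrasts internal validation with testing on cohorts from unseen regions and later periods, scored by discrimination (AUROC) and calibration. The Python sketch here illustrates that kind of evaluation design under stated assumptions: the `Record` fields, the `split_by_region` and `split_by_period` helpers, and the use of an observed/expected (O/E) ratio as the calibration summary are hypothetical illustrative choices, not the authors' pipeline. Predicted risks are assumed to come from a model already fitted on the development split; model fitting is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Record:
    # Hypothetical per-patient fields for illustration only (not the study's schema).
    region: str   # geographical region of the registering GP practice
    year: int     # calendar year of the baseline (index) date
    risk: float   # predicted 5-year risk of the outcome, in [0, 1]
    event: int    # 1 if the outcome occurred within 5 years, else 0


def auroc(risks: Sequence[float], events: Sequence[int]) -> float:
    """Discrimination: AUROC via the Mann-Whitney U statistic, with midranks for ties."""
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one event and one non-event")
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    ranks = [0.0] * len(risks)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and risks[order[j + 1]] == risks[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    rank_sum_pos = sum(r for r, e in zip(ranks, events) if e == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def observed_expected_ratio(risks: Sequence[float], events: Sequence[int]) -> float:
    """Calibration-in-the-large: observed events divided by expected events (sum of risks).
    Above 1 the model under-predicts risk; below 1 it over-predicts."""
    return sum(events) / sum(risks)


def split_by_region(records: List[Record], held_out: str) -> Tuple[List[Record], List[Record]]:
    """Hypothetical geographical shift: develop on other regions, test on one unseen region."""
    dev = [r for r in records if r.region != held_out]
    ext = [r for r in records if r.region == held_out]
    return dev, ext


def split_by_period(records: List[Record], cutoff_year: int) -> Tuple[List[Record], List[Record]]:
    """Hypothetical temporal shift: develop on earlier years, test on later years."""
    dev = [r for r in records if r.year < cutoff_year]
    ext = [r for r in records if r.year >= cutoff_year]
    return dev, ext


def evaluate(records: List[Record]) -> dict:
    """Report discrimination and calibration-in-the-large for one cohort."""
    risks = [r.risk for r in records]
    events = [r.event for r in records]
    return {"auroc": auroc(risks, events), "o_e": observed_expected_ratio(risks, events)}


if __name__ == "__main__":
    # Toy cohort with made-up values; not data from the study.
    cohort = [
        Record("North", 1998, 0.10, 0),
        Record("North", 2001, 0.40, 0),
        Record("South", 2006, 0.35, 1),
        Record("South", 2010, 0.80, 1),
        Record("South", 2012, 0.20, 0),
        Record("North", 2009, 0.60, 1),
    ]
    _, external = split_by_region(cohort, held_out="South")
    print(evaluate(external))
```

Calling `evaluate` on an internal hold-out and again on each external cohort exposes the kind of drop in discrimination and calibration the abstract describes. Because AUROC does not depend on outcome prevalence, a change in the base rate (the prior) can leave AUROC largely intact while moving the O/E ratio away from 1, which is consistent with the abstract's point that improved discrimination need not restore calibration.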


Publication metadata

Author(s): Li Y, Salimi-Khorshidi G, Rao S, Canoy D, Hassaine A, Lukasiewicz T, Rahimi K, Mamouei M

Publication type: Article

Publication status: Published

Journal: European Heart Journal - Digital Health

Year: 2022

Volume: 3

Issue: 4

Pages: 535-547

Print publication date: 01/12/2022

Online publication date: 21/10/2022

Acceptance date: 22/09/2022

Date deposited: 01/03/2024

ISSN (electronic): 2634-3916

Publisher: Oxford University Press

URL: https://doi.org/10.1093/ehjdh/ztac061

DOI: 10.1093/ehjdh/ztac061

Data Access Statement: Scientific approval for this study was given by the Clinical Practice Research Datalink (CPRD) Independent Scientific Advisory Committee (ISAC) of the UK (protocol number 16_049R). Data shared by consenting GP practices are de-identified and do not require individual patient consent for approved research (although individual patients can opt out of data sharing). Data access arrangements are explained on the CPRD website (https://www.cprd.com): ‘Access to data from Clinical Practice Research Datalink is subject to a full licence agreement containing detailed terms and conditions of use. Patient level datasets can be extracted for researchers against specific study specifications, following protocol approval from the Independent Scientific Advisory Committee (ISAC) of UK.’


Funding

British Heart Foundation
Oxford Martin School
Oxford National Institute for Health and Care Research (NIHR) Biomedical Research Centre
UKRI Global Challenges Research Fund