Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach

NU author(s): Matt McTeer, Professor Quentin Anstee, Professor Paolo Missier

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2024 by the authors. Aims: Overlapping asymmetric data sets arise when a large cohort of observations has only a small amount of information recorded and, within it, a smaller cohort has extensive further information available. Missing data imputation is unwise if the cohort sizes differ substantially; therefore, we aim to develop a way of modelling the smaller cohort whilst taking the larger cohort into account. Methods: Starting from traditional once penalized P-Spline approximations, we create a second penalty term by observing discrepancies in the marginal value of covariates that exist in both cohorts. The resulting twice penalized P-Spline is designed firstly to prevent over- or under-fitting of the smaller cohort and secondly to take the larger cohort into account. Results: Through a series of data simulations, penalty parameter tunings, and model adaptations, our twice penalized model offers up to a 58% and 46% improvement in model fit for a continuous and a binary response, respectively, over existing B-Spline and once penalized P-Spline methods. Applying our model to an individual’s risk of developing steatohepatitis, we report an improvement of over 65% relative to existing methods. Conclusions: We propose a twice penalized P-Spline method that can vastly improve the model fit of overlapping asymmetric data sets for a common predictive endpoint, without the need for missing data imputation.
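
The idea of a second penalty can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an additive model with one shared covariate (`x1`) and one covariate recorded only in the smaller cohort (`x2`), a standard second-order difference penalty as the first penalty, and, as a stand-in for the paper's marginal discrepancy term, a quadratic penalty that pulls the shared covariate's spline coefficients towards those of a once penalized fit on the larger cohort. All function names, the simulated data, and the values of `lam1` and `lam2` are hypothetical.

```python
import numpy as np

def bspline_basis(x, n_segments=10, degree=3, lo=0.0, hi=1.0):
    """Equally spaced B-spline basis on [lo, hi] via the Cox-de Boor recursion."""
    h = (hi - lo) / n_segments
    knots = lo + h * np.arange(-degree, n_segments + degree + 1)
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator of the knot interval containing each x.
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-d - 1]) / (d * h)
        right = (knots[None, d + 1:] - x[:, None]) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), n_segments + degree)

def difference_penalty(n_coef, order=2):
    """P-spline roughness penalty D'D built from finite differences of coefficients."""
    D = np.diff(np.eye(n_coef), n=order, axis=0)
    return D.T @ D

def fit_once_penalized(x, y, lam):
    """Once penalized P-spline of y on a single covariate."""
    B = bspline_basis(x)
    P = difference_penalty(B.shape[1])
    return np.linalg.solve(B.T @ B + lam * P, B.T @ y)

def fit_twice_penalized(x1, x2, y, target_coef, lam1, lam2):
    """Additive P-spline on the smaller cohort with two penalties.

    lam1 weights the roughness penalty on both smooth terms; lam2 weights an
    assumed discrepancy penalty ||a1 - target_coef||^2 on the shared covariate.
    """
    B1, B2 = bspline_basis(x1), bspline_basis(x2)
    B = np.hstack([B1, B2])
    k = B1.shape[1]
    P = difference_penalty(k)
    zero = np.zeros((k, k))
    rough = np.block([[P, zero], [zero, P]])
    S = np.zeros((2 * k, 2 * k))
    S[:k, :k] = np.eye(k)  # second penalty acts on the shared term only
    t = np.concatenate([target_coef, np.zeros(k)])
    A = B.T @ B + lam1 * rough + lam2 * S
    rhs = B.T @ y + lam2 * (S @ t)
    # lstsq copes with the constant shared by the two bases when lam2 == 0.
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

# Simulated overlapping asymmetric data: x1 is known for everyone,
# x2 is treated as recorded only for the first n_small observations.
rng = np.random.default_rng(0)
n_large, n_small = 2000, 40
x1 = rng.uniform(0, 1, n_large)
x2 = rng.uniform(0, 1, n_large)
y = np.sin(2 * np.pi * x1) + (x2 - 0.5) ** 2 + rng.normal(0, 0.3, n_large)
small = np.arange(n_small)

target = fit_once_penalized(x1, y, lam=1.0)  # marginal fit on the larger cohort
coef = fit_twice_penalized(x1[small], x2[small], y[small], target, lam1=1.0, lam2=5.0)

fitted = np.hstack([bspline_basis(x1[small]), bspline_basis(x2[small])]) @ coef
print("in-sample RMSE on the smaller cohort:", np.sqrt(np.mean((y[small] - fitted) ** 2)))
```

Setting `lam2 = 0` recovers an ordinary once penalized additive fit on the smaller cohort alone, while a very large `lam2` forces the shared term to match the larger cohort's marginal fit. In practice both penalty parameters would need tuning, for example by cross-validation.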


Publication metadata

Author(s): McTeer M, Henderson R, Anstee QM, Missier P

Publication type: Article

Publication status: Published

Journal: Mathematics

Year: 2024

Volume: 12

Issue: 5

Online publication date: 05/03/2024

Acceptance date: 03/03/2024

Date deposited: 26/03/2024

ISSN (electronic): 2227-7390

Publisher: Multidisciplinary Digital Publishing Institute (MDPI)

URL: https://doi.org/10.3390/math12050777

DOI: 10.3390/math12050777



Funding

Funder name: European Union’s Horizon 2020
