Lookup NU author(s): Emeritus Professor Paul Burton, Dr Patricia Ryser-Welch
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
© The Author(s) 2022. Published by Oxford University Press.
MOTIVATION: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data often cannot be combined into a single storage solution, an MTL application for geographically distributed data sources would be of substantial utility.
RESULTS: Here, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to real expression data of moderate size (n < 500), given the actual network latency.
AVAILABILITY AND IMPLEMENTATION: dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Author(s): Cao H, Zhang Y, Baumbach J, Burton PR, Dwyer D, Koutsouleris N, Matschinske J, Marcon Y, Rajan S, Rieg T, Ryser-Welch P, Spath J, Herrmann C, Schwarz E
Publication type: Article
Publication status: Published
Journal: Bioinformatics
Year: 2022
Volume: 38
Issue: 21
Pages: 4919-4926
Print publication date: 01/11/2022
Online publication date: 08/09/2022
Acceptance date: 07/09/2022
Date deposited: 14/11/2022
ISSN (print): 1367-4803
ISSN (electronic): 1367-4811
Publisher: Oxford University Press
URL: https://doi.org/10.1093/bioinformatics/btac616
DOI: 10.1093/bioinformatics/btac616
PubMed id: 36073911