
dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning

NU author(s): Emeritus Professor Paul Burton, Dr Patricia Ryser-Welch

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© The Author(s) 2022. Published by Oxford University Press.

MOTIVATION: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data often cannot be combined into a single storage solution, an MTL application for geographically distributed data sources would have substantial utility.

RESULTS: Here, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data under actual network latency.

AVAILABILITY AND IMPLEMENTATION: dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Publication metadata

Author(s): Cao H, Zhang Y, Baumbach J, Burton PR, Dwyer D, Koutsouleris N, Matschinske J, Marcon Y, Rajan S, Rieg T, Ryser-Welch P, Spath J, Herrmann C, Schwarz E

Publication type: Article

Publication status: Published

Journal: Bioinformatics

Year: 2022

Volume: 38

Issue: 21

Pages: 4919-4926

Print publication date: 01/11/2022

Online publication date: 08/09/2022

Acceptance date: 07/09/2022

Date deposited: 14/11/2022

ISSN (print): 1367-4803

ISSN (electronic): 1367-4811

Publisher: Oxford University Press

URL: https://doi.org/10.1093/bioinformatics/btac616

DOI: 10.1093/bioinformatics/btac616

PubMed id: 36073911


Funding

Funder reference       Funder name
01KU1905A
01ZX1904A
777111                 Commission of the European Communities
826078
NCT00001260, 900142
SCHW 1768/1-1
