Automatic Diverse Subset Selection From Enzyme Families by Solving the Maximum Diversity Problem

Atallah, C; James, K; Ou, Z; Skelton, J; Markham, D; Finnigan, J; Charnock, S; Wipat, A

doi:10.1109/CIBCB55180.2022.9863021

NU author(s): Christian Atallah, Dr Katherine James, Zhen Ou, Dr David Markham, Professor Anil Wipat


Abstract

Enzymes are increasingly exploited in various industries for their potential as biocatalysts. Expanding the portfolio of available and useful biocatalysts depends on the reliable annotation of enzyme catalytic function. However, the required quality of such annotation can only be confidently guaranteed through experimental characterisation in the laboratory. The selection of catalytically diverse enzyme panels for experimental characterisation is therefore an important step in shedding light on the currently unannotated proteins in enzyme families. As current selection methods are non-systematic and lack efficiency and scalability, we present a novel approach for the automatic selection of subsets from enzyme families. A tabu search algorithm that solves the maximum diversity problem for sequence identity was designed and implemented, and applied to three diverse enzyme families. We show that this approach automatically selects panels of enzymes that contain high richness and relative abundance of the known catalytic functions, and that it outperforms other methods such as k-medoids.
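To make the idea in the abstract concrete, the following is a minimal sketch of a swap-based tabu search for the maximum diversity problem: choose m items out of n so that the sum of their pairwise distances is as large as possible. It is an illustration under stated assumptions, not the authors' implementation. The function names, the tabu tenure, the iteration count and the use of a precomputed symmetric distance matrix (for example, 1 minus pairwise sequence identity) are all hypothetical choices made for this example.

```python
import random


def subset_diversity(dist, subset):
    """Sum of pairwise distances among the selected items (the MDP objective)."""
    items = list(subset)
    return sum(dist[a][b] for i, a in enumerate(items) for b in items[i + 1:])


def tabu_search_mdp(dist, m, iterations=200, tenure=3, seed=0):
    """Pick m of len(dist) items with a large sum of pairwise distances.

    dist is assumed to be a symmetric matrix, e.g. 1 - sequence identity.
    iterations and tenure are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    n = len(dist)
    current = set(rng.sample(range(n), m))
    current_score = subset_diversity(dist, current)
    best, best_score = set(current), current_score
    tabu_until = [0] * n  # last iteration at which each item is still tabu

    for it in range(1, iterations + 1):
        best_move = None
        for out in current:
            # Diversity lost by removing `out` from the current subset.
            loss = sum(dist[out][k] for k in current if k != out)
            for inn in range(n):
                if inn in current:
                    continue
                # Diversity gained by adding `inn` once `out` is gone.
                gain = sum(dist[inn][k] for k in current if k != out)
                score = current_score - loss + gain
                is_tabu = tabu_until[out] >= it or tabu_until[inn] >= it
                # Aspiration: a tabu move is allowed only if it beats the best.
                if is_tabu and score <= best_score:
                    continue
                if best_move is None or score > best_move[0]:
                    best_move = (score, out, inn)
        if best_move is None:
            continue  # every swap is tabu; wait for tenures to expire
        score, out, inn = best_move
        current.remove(out)
        current.add(inn)
        current_score = score
        tabu_until[out] = tabu_until[inn] = it + tenure
        if score > best_score:
            best, best_score = set(current), score

    return sorted(best), best_score
```

The search always takes the best non-tabu swap, even when it lowers the score, which is what lets it leave local optima; the tabu list prevents recently moved items from being swapped straight back. In practice the distance matrix would be derived from pairwise alignments of the family's sequences, and the returned indices would identify the enzymes to include in the panel.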