

Automatic Diverse Subset Selection From Enzyme Families by Solving the Maximum Diversity Problem

NU author(s): Christian Atallah, Dr Katherine James, Zhen Ou, David Markham, Professor Anil Wipat


Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Enzymes are being increasingly exploited in various industries for their potential as biocatalysts. Increasing the portfolio of available and useful biocatalysts depends on the reliable annotation of enzyme catalytic function. However, the required quality of such annotation can only be confidently guaranteed through experimental characterisation in the laboratory. The selection of catalytically diverse enzyme panels for experimental characterisation is therefore an important step for shedding light on the currently unannotated proteins in enzyme families. As current selection methods lack efficiency and scalability, and are non-systematic, we present a novel approach for the automatic selection of subsets from enzyme families. A tabu search algorithm solving the maximum diversity problem for sequence identity was designed and implemented, and applied to three diverse enzyme families. We show that this approach automatically selects panels of enzymes that contain high richness and relative abundance of the known catalytic functions, and outperforms other methods such as k-medoids.
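To illustrate the general idea of a tabu search applied to the maximum diversity problem, the following is a minimal sketch and not the authors' implementation. It assumes that diversity is measured as the sum of pairwise distances within the selected subset, and that distance is taken as 1 minus fractional sequence identity; both are assumptions made here for illustration. The function names (`identity_to_distance`, `subset_diversity`, `tabu_search_mdp`) and the parameters (`iterations`, `tenure`, `seed`) are hypothetical.

```python
import itertools
import random


def identity_to_distance(identity):
    """Convert a matrix of fractional identities (0..1) to distances.

    Assumption for this sketch: distance = 1 - identity.
    """
    return [[1.0 - value for value in row] for row in identity]


def subset_diversity(dist, subset):
    """Sum of pairwise distances among the selected items."""
    return sum(dist[i][j] for i, j in itertools.combinations(subset, 2))


def tabu_search_mdp(dist, k, iterations=200, tenure=3, seed=0):
    """Pick k items with a large sum of pairwise distances.

    Swap-based tabu search: each step exchanges one selected item for one
    unselected item, choosing the best admissible swap even if it lowers
    the score. Recently moved items are tabu for a few iterations unless
    the swap beats the best score seen so far (aspiration).
    """
    rng = random.Random(seed)
    n = len(dist)
    current = set(rng.sample(range(n), k))
    best = set(current)
    best_score = subset_diversity(dist, current)
    tabu_until = {}  # item index -> first iteration at which it is free again

    for it in range(iterations):
        current_score = subset_diversity(dist, current)
        move, move_score = None, float("-inf")
        for out in sorted(current):
            rest = [j for j in current if j != out]
            loss = sum(dist[out][j] for j in rest)
            for inc in range(n):
                if inc in current:
                    continue
                gain = sum(dist[inc][j] for j in rest)
                score = current_score - loss + gain
                is_tabu = (tabu_until.get(out, 0) > it
                           or tabu_until.get(inc, 0) > it)
                if is_tabu and score <= best_score:
                    continue  # tabu and does not satisfy aspiration
                if score > move_score:
                    move, move_score = (out, inc), score
        if move is None:
            continue  # every swap is tabu; wait for tenures to expire
        out, inc = move
        current.remove(out)
        current.add(inc)
        tabu_until[out] = it + 1 + tenure
        tabu_until[inc] = it + 1 + tenure
        if move_score > best_score:
            best, best_score = set(current), move_score

    return sorted(best), best_score
```

In this sketch a panel of `k` enzymes would be chosen by passing a precomputed all-against-all identity matrix through `identity_to_distance` and then calling `tabu_search_mdp(dist, k)`. The result is a heuristic solution, so it is not guaranteed to be the global optimum on larger instances.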

Publication metadata

Author(s): Atallah C, James K, Ou Z, Skelton J, Markham D, Finnigan J, Charnock S, Wipat A

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)

Year of Conference: 2022

Pages: 1-9

Online publication date: 26/08/2022

Acceptance date: 15/07/2022

Publisher: IEEE


DOI: 10.1109/CIBCB55180.2022.9863021


ISBN: 9781665484633