Interpretable Yield Prediction of Supercritical CO2 Extraction from Various Essential Oil Sources Using Optimized Machine Learning and PCA-Based Descriptors

NU author(s): Dr Jie Zhang

Downloads

Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Abstract

© 2025 American Chemical Society

Predicting essential oil yield in supercritical CO2 (SC–CO2) extraction remains difficult due to variations in plant composition and process conditions. Conventional models often assume uniform feedstock behavior, which limits their applicability across diverse species. This study develops machine learning models that integrate extraction parameters with principal component analysis (PCA)-based molecular descriptors representing the seven major compounds of each essential oil source. A data set of 1313 experimental records from 42 plant species was compiled to train three algorithms: LightGBM (LGBMR), HistGradientBoosting (HGBR), and Extra Trees (ETR). The models were optimized using four metaheuristic algorithms to improve their predictive accuracy. All models achieved high predictive performance (R² > 0.97). The ETR model optimized by a genetic algorithm (ETR-3PCs-GA) attained the highest performance (R² = 0.9808, root-mean-square error (RMSE) = 0.7802), while the HGBR model with two principal components and GA optimization (HGBR-2PCs-GA) demonstrated superior ability to predict dynamic extraction profiles (RMSE = 0.408). SHapley Additive exPlanations (SHAP) analysis identified pressure and selected PCA coordinates as the most influential features, revealing that both process parameters and molecular composition jointly determine extraction efficiency. The model successfully generalized yield prediction across species and reproduced known process trends, such as the positive effects of pressure and flow rate on yield. The findings also indicate a synergistic effect, whereby the entire molecular profile, not just the most abundant compounds, governs the final yield. This approach demonstrates that integrating molecular-level information with process data can provide transferable, interpretable models for optimizing SC–CO2 extraction of essential oils.
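
The abstract reports model quality as a coefficient of determination and a root-mean-square error. As a reading aid, the sketch below computes both with their standard textbook definitions. It is not the authors' code: the abstract does not say how the metrics were implemented or on which data split they were evaluated, and the function names and the four sample values are invented purely for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted yields."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields, NOT taken from the paper's data set.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"RMSE = {rmse(measured, predicted):.4f}")
print(f"R2   = {r2_score(measured, predicted):.4f}")
```

RMSE carries the units of the predicted quantity, so the two RMSE values quoted in the abstract are only comparable if they refer to the same yield scale, which the abstract does not state.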


Publication metadata

Author(s): Amar MK, Hentabli M, Touzout N, Bouaziz I, Rahal S, Laidi M, Amrane A, Hanini S, Zhang J, Saeed MF, Jamal A

Publication type: Article

Publication status: Published

Journal: Journal of Chemical Information and Modeling

Year: 2026

Volume: 66

Issue: 1

Pages: 194-215

Print publication date: 12/01/2026

Online publication date: 15/12/2025

Acceptance date: 02/12/2025

ISSN (print): 1549-9596

ISSN (electronic): 1549-960X

Publisher: American Chemical Society

URL: https://doi.org/10.1021/acs.jcim.5c02171

DOI: 10.1021/acs.jcim.5c02171

PubMed id: 41392587

