Citation: HAN Hao, HU Mengya, WANG Zhenzhen, et al. Metabolomics Combined with Machine Learning for Geographical Origin Tracing of Garlic[J]. Science and Technology of Food Industry, 2024, 45(24): 1−8. (in Chinese with English abstract). doi: 10.13386/j.issn1002-0306.2024020220.

Metabolomics Combined with Machine Learning for Geographical Origin Tracing of Garlic

Abstract: Two hundred samples of purple-skinned garlic from five producing regions (Yunnan, Shandong, Henan, Anhui and Jiangsu) were profiled by gas chromatography-mass spectrometry (GC-MS), and the metabolite data were subjected to metabolomics analysis using principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA). The data were preprocessed in three ways: min-max scaling (MMS), standard scaling (SS) and the standard normal variate transform (SNV). For each preprocessing method, five machine learning models were built to classify the garlic by geographical origin: random forest (RF), support vector machine (SVM), XGBoost, a convolutional neural network (CNN) and a long short-term memory network (LSTM). A total of 66 metabolites were screened across the garlic samples from the different origins, and PLS-DA identified 12 differential metabolites associated with six metabolic pathways: valine, leucine and isoleucine biosynthesis; galactose metabolism; cyanoamino acid metabolism; glyoxylate and dicarboxylic acid metabolism; glycine, serine and threonine metabolism; and D-amino acid metabolism. Among the five models, LSTM performed best, reaching 100% accuracy on the test set under all three preprocessing methods. Combining metabolomics with an LSTM model therefore identified the origin of garlic with high accuracy and reliability, providing a dependable technical means for tracing garlic products to their source.
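The three preprocessing methods differ mainly in the direction of scaling: MMS and SS rescale each metabolite (column) across samples, whereas SNV centres and scales each sample (row) by its own mean and standard deviation. Below is a minimal Python sketch of the three transforms under stated assumptions; it is not the authors' code. It assumes a samples-by-metabolites matrix stored as nested lists, uses the population standard deviation for SS (as scikit-learn's StandardScaler does) and the sample standard deviation for SNV, and maps zero-variance columns or rows to 0. The function names are hypothetical.

```python
import math
from typing import List

# Hypothetical layout: rows are garlic samples, columns are metabolite intensities.
Matrix = List[List[float]]


def min_max_scale(X: Matrix) -> Matrix:
    """MMS: rescale each metabolite (column) to [0, 1] with its min and max."""
    cols = list(zip(*X))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        # A constant column has no range, so it is mapped to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in X
    ]


def standard_scale(X: Matrix) -> Matrix:
    """SS: z-score each metabolite (column) using its mean and population SD."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in X
    ]


def snv(X: Matrix) -> Matrix:
    """SNV: centre and scale each sample (row) by its own mean and sample SD."""
    out = []
    for row in X:
        p = len(row)
        m = sum(row) / p
        s = math.sqrt(sum((v - m) ** 2 for v in row) / max(p - 1, 1))
        out.append([(v - m) / s if s > 0 else 0.0 for v in row])
    return out


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    X = [[1.0, 10.0, 4.0], [3.0, 30.0, 8.0], [5.0, 20.0, 6.0]]
    print(min_max_scale(X))
```

With the toy matrix above, `min_max_scale(X)` returns `[[0.0, 0.0, 0.0], [0.5, 1.0, 1.0], [1.0, 0.5, 0.5]]`. When a train/test split is used, the MMS and SS column statistics should be computed on the training samples only and reused for the test samples to avoid information leakage; SNV needs no fitted parameters because each sample is normalized independently.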
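The abstract states that the 12 differential metabolites were screened with PLS-DA but does not give the selection criterion. A common choice, assumed here rather than confirmed by the source, is the variable importance in projection (VIP) score, usually with a cutoff of VIP > 1 and often combined with a univariate test. For a PLS-DA model with A components fitted to J metabolites (J = 66 in this study), with X-weight vectors w_a, X-score vectors t_a and Y-loading vectors q_a for the one-hot matrix of the five origins, the VIP of metabolite j is:

```latex
\mathrm{VIP}_j = \sqrt{\frac{J \sum_{a=1}^{A} \mathrm{SS}_a \left( \frac{w_{ja}}{\lVert w_a \rVert} \right)^{2}}{\sum_{a=1}^{A} \mathrm{SS}_a}},
\qquad
\mathrm{SS}_a = \left( t_a^{\top} t_a \right) \left( q_a^{\top} q_a \right)
```

Because the squared normalized weights (w_ja / ||w_a||)^2 sum to 1 over all metabolites for every component, the squared VIP scores average exactly 1 over the J metabolites, which is why 1 serves as the conventional threshold.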
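The abstract does not describe the LSTM architecture. For reference, a standard LSTM cell updates its gates and states at each step t as below, where x_t is the input, h_t the hidden state, c_t the cell state, σ the logistic sigmoid and ⊙ the element-wise product. Applying it to a metabolite profile requires arranging the profile as a sequence; one possible arrangement, which is an assumption and not stated in the source, feeds the 66 metabolite values of a sample as 66 steps with one feature each and maps the final hidden state h_T to probabilities over the five origins with a softmax layer (the last line, also an assumed classification head).

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right) \\
\hat{y} &= \operatorname{softmax}\left(W_y h_T + b_y\right)
\end{aligned}
```

The forget gate f_t lets the cell retain or discard information accumulated over earlier metabolites, which is the property that distinguishes an LSTM from a plain recurrent network.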
