DENG Zhiyang, LIAO Qiang, SHAO Shujuan, et al. Nondestructive Near-infrared Identification of Hawthorn Fruit Cultivars Based on Natural Language Processing[J]. Science and Technology of Food Industry, 2023, 44(22): 249−256. (in Chinese with English abstract). doi: 10.13386/j.issn1002-0306.2023010132.

Nondestructive Near-infrared Identification of Hawthorn Fruit Cultivars Based on Natural Language Processing

More Information
  • Received Date: January 31, 2023
  • Available Online: September 17, 2023
  • Hawthorn fruits of different varieties differ in nutritional composition, sensory properties, and other traits, and therefore require different processing during product development. Traditional analytical methods are time-consuming, costly, and rely on destructive sample preparation, so non-destructive techniques for variety identification are needed to support large-scale production of hawthorn-based foods. In this study, a total of 240 hawthorn fruit samples from four varieties were analyzed by near-infrared spectroscopy, and the collected spectral data were pre-processed with several algorithms. To achieve non-destructive identification of hawthorn varieties, natural language processing (NLP) models were applied for data analysis, including long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks, logistic regression, naive Bayes, decision trees, and k-nearest neighbors. The two deep learning models both discriminated best on spectra pre-processed by principal component analysis (PCA), with validation-set and test-set accuracies of 99.46%±0.00% and 100%±0.00%, respectively. The logistic regression model showed excellent discrimination ability for hawthorn fruit spectra but poorer discrimination ability for spectra pretreated with the second-order difference (D2), with an accuracy of 96.65% on the validation set and 89.58% on the test set. The naive Bayes model also discriminated the PCA-processed spectra well, with an accuracy of 95.65% on the validation set and 95.83% on the test set. These results confirm the feasibility of applying NLP to the near-infrared non-destructive identification of hawthorn fruits.
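
The general shape of such a pipeline (pretreat each spectrum, then classify it) can be illustrated with a minimal sketch. It is not the authors' implementation: the spectra are synthetic, the cultivar labels `A` to `D`, the band positions, the noise level, and the choice of `k=3` are all assumptions made for illustration, and only the second-order difference (D2) pretreatment and a k-nearest-neighbors classifier named in the abstract are shown. The accuracy it prints says nothing about the accuracies reported in the study.

```python
import math
import random
from collections import Counter


def second_difference(spectrum):
    """Second-order difference (D2): s[i+2] - 2*s[i+1] + s[i]."""
    return [spectrum[i + 2] - 2 * spectrum[i + 1] + spectrum[i]
            for i in range(len(spectrum) - 2)]


def make_spectrum(center, rng, n_points=60):
    """Synthetic spectrum (assumption): one Gaussian band, a random
    baseline offset, and a small amount of Gaussian noise."""
    offset = rng.uniform(-0.5, 0.5)
    return [math.exp(-((i - center) ** 2) / 30.0) + offset + rng.gauss(0.0, 0.002)
            for i in range(n_points)]


def knn_predict(train_x, train_y, sample, k=3):
    """Majority vote among the k nearest training spectra (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Four hypothetical cultivars, each with its absorption band at a different position.
band_centers = {"A": 15, "B": 25, "C": 35, "D": 45}

train_x, train_y, test_x, test_y = [], [], [], []
for label, center in band_centers.items():
    for n in range(20):
        features = second_difference(make_spectrum(center, rng))
        if n < 15:  # first 15 samples per cultivar for training
            train_x.append(features)
            train_y.append(label)
        else:       # remaining 5 samples per cultivar for testing
            test_x.append(features)
            test_y.append(label)

predictions = [knn_predict(train_x, train_y, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"test accuracy on synthetic data: {accuracy:.2%}")
```

In this toy setup the D2 step cancels the constant baseline offset added to each synthetic spectrum, so the classifier compares band shape and position rather than overall signal level. Real near-infrared spectra are noisier and more strongly overlapped, and differencing amplifies noise, which may be one reason a pretreatment helps one model and hurts another.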