Zero-Shot Food Image Detection Based on Transformer

SONG Jingru; MIN Weiqing; ZHOU Pengfei; RAO Quanrui; SHENG Guorui; YANG Yancun; WANG Lili; JIANG Shuqiang

doi:10.13386/j.issn1002-0306.2024030027

SONG Jingru, MIN Weiqing, ZHOU Pengfei, et al. Zero-Shot Food Image Detection Based on Transformer[J]. Science and Technology of Food Industry, 2024, 45(22): 18−26. (in Chinese with English abstract). doi: 10.13386/j.issn1002-0306.2024030027.


Abstract


As a fundamental task in food computing, food detection plays a crucial role in locating and identifying food items in input images, particularly in applications such as intelligent canteen checkout and dietary health management. However, food categories are constantly updated in practical scenarios, so food detectors trained on a fixed set of categories struggle to accurately detect previously unseen food categories. To address this issue, this paper proposes a zero-shot food image detection method. First, a Transformer-based food primitive generator is constructed, in which each primitive encodes fine-grained attributes relevant to food categories. These primitives can be selectively assembled according to food characteristics to synthesize features for new food categories. Second, a visual feature disentanglement enhancement component is proposed to impose additional constraints on the visual features of unseen food categories. It decomposes the visual features of food images into semantically related and semantically unrelated features, so that the semantic knowledge of food categories transfers better to their visual features. The proposed method is evaluated through extensive experiments and ablation studies on the ZSFooD and UEC-FOOD256 datasets. Under the zero-shot detection (ZSD) setting, the best average precision on unseen classes reaches 4.9% and 24.1% on the two datasets, respectively. Under the generalized zero-shot detection (GZSD) setting, the harmonic mean of the seen-class and unseen-class results reaches 5.8% and 22.0%, respectively. These results demonstrate the effectiveness of the proposed method.
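The GZSD results are reported as a harmonic mean of seen-class and unseen-class performance. The abstract does not spell out the formula, so the sketch below assumes the standard definition used in generalized zero-shot detection, HM = 2·S·U / (S + U), where S and U are the seen-class and unseen-class scores (for example, mAP in percent). The function name `harmonic_mean` and the numbers in the usage lines are hypothetical and illustrative only; they are not taken from the paper.

```python
def harmonic_mean(seen_score: float, unseen_score: float) -> float:
    """Harmonic mean of seen-class and unseen-class scores (assumed standard GZSD definition).

    Both scores must use the same scale (e.g. both mAP in percent).
    """
    if seen_score < 0 or unseen_score < 0:
        raise ValueError("scores must be non-negative")
    total = seen_score + unseen_score
    if total == 0:
        # Both scores are zero: define HM as 0 to avoid division by zero.
        return 0.0
    return 2.0 * seen_score * unseen_score / total


# Hypothetical values, not results from the paper.
print(harmonic_mean(40.0, 10.0))  # 16.0
print(harmonic_mean(25.0, 25.0))  # 25.0
```

Both hypothetical pairs have the same arithmetic mean (25.0), but the imbalanced pair scores only 16.0. This is why the harmonic mean is commonly preferred for GZSD: a detector cannot reach a high score by doing well on seen classes while neglecting unseen ones.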
