Enhancing Large Language Models with Multimodality and Knowledge Graphs for Hallucination-free Open-set Object Recognition

Abstract

Open-set object recognition plays a significant role in today’s production and daily life, with applications such as surface defect detection, biometric identification, and autonomous driving. However, due to the diversity of unknown categories and the complexity of scenarios, existing methods often perform poorly, so open-set object recognition remains an important and active research topic. Recently, the collaborative use of multiple pre-trained Large Language Models (LLMs) has emerged rapidly and become a research hotspot for open-set object recognition tasks. A core challenge in this setting lies in amplifying the strengths of individual LLMs while mitigating their weaknesses. In this paper, we propose a novel joint framework tailored for open-set object recognition tasks, aiming to more efficiently harness the capabilities of diverse LLMs and Knowledge Graphs (KGs). First, we use Wikipedia to correct and complete the text data generated by textual LLMs. Then, we design a text-image multi-modal fusion method that further corrects and completes the text information by utilizing the implicit semantic information in the image. Additionally, we propose several novel designs to alleviate the hallucination issue of LLMs and reduce their instability. Extensive experiments demonstrate that our approach outperforms all the comparison methods.

Publication
VLDB Workshops
刘新富
PhD student, 2023 cohort
巫义锐
Young Professor, CCF Senior Member

My research interests include Computer Vision, Artificial Intelligence, Multimedia Computing, and Intelligent Water Conservancy.

周玉婷
Master's student (academic track), 2022 cohort