Multi-scale Relational Reasoning with Regional Attention for Visual Question Answering

Abstract

One of the main challenges in visual question answering (VQA) is properly reasoning about the relations among the visual regions relevant to the question. In this paper, we propose a novel neural network for question-guided relational reasoning at multiple scales in VQA, where each image region is enhanced through regional attention. Specifically, we introduce a regional attention module consisting of both soft and hard attention mechanisms to select informative regions of the image based on question-guided evaluations. Different combinations of informative regions are then concatenated with question embeddings across scales to capture relational information. The relational reasoning module extracts question-based relationships among regions, while the multi-scale mechanism enhances the model's sensitivity to numbers and its ability to model diverse relationships. Experimental results demonstrate that our approach achieves state-of-the-art performance on the VQA v2 dataset.
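
The abstract does not include an implementation, but the two described components map naturally onto a small neural module. Below is a minimal PyTorch-style sketch of (1) question-guided regional attention combining soft scoring with hard top-k selection, and (2) multi-scale relational reasoning that concatenates combinations of selected regions with the question embedding. All module names, dimensions, the value of k, and the scale set (pairs and triplets) are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch only: hyperparameters and structure are assumptions,
# not the configuration reported in the paper.
import torch
import torch.nn as nn
from itertools import combinations


class RegionalAttention(nn.Module):
    """Scores image regions against the question (soft attention) and
    keeps only the top-k most informative ones (hard attention)."""

    def __init__(self, region_dim, question_dim, hidden_dim=512, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.score = nn.Sequential(
            nn.Linear(region_dim + question_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, regions, question):
        # regions: (B, N, region_dim), question: (B, question_dim)
        q = question.unsqueeze(1).expand(-1, regions.size(1), -1)
        logits = self.score(torch.cat([regions, q], dim=-1)).squeeze(-1)  # (B, N)
        weights = torch.softmax(logits, dim=-1)                           # soft attention
        weighted = regions * weights.unsqueeze(-1)
        idx = weights.topk(self.top_k, dim=-1).indices                    # hard selection
        selected = torch.gather(
            weighted, 1, idx.unsqueeze(-1).expand(-1, -1, regions.size(-1))
        )
        return selected                                                   # (B, top_k, region_dim)


class MultiScaleRelation(nn.Module):
    """Concatenates combinations of selected regions with the question
    embedding at several scales (e.g. pairs and triplets) and pools them."""

    def __init__(self, region_dim, question_dim, hidden_dim=512, scales=(2, 3)):
        super().__init__()
        self.scales = scales
        self.mlps = nn.ModuleList([
            nn.Sequential(
                nn.Linear(s * region_dim + question_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for s in scales
        ])

    def forward(self, selected, question):
        # selected: (B, K, region_dim), question: (B, question_dim)
        outputs = []
        for scale, mlp in zip(self.scales, self.mlps):
            feats = []
            for combo in combinations(range(selected.size(1)), scale):
                group = torch.cat([selected[:, i] for i in combo], dim=-1)
                feats.append(mlp(torch.cat([group, question], dim=-1)))
            outputs.append(torch.stack(feats, dim=1).sum(dim=1))  # pool relations per scale
        return torch.cat(outputs, dim=-1)                          # (B, len(scales)*hidden_dim)
```

In this sketch, the pooled multi-scale relation vector would feed a standard answer classifier; reasoning over pairs and triplets of regions is one plausible way to realize the "different combinations across scales" described above.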

Publication
2020 25th International Conference on Pattern Recognition (ICPR)
巫义锐
Young Professor, CCF Senior Member

My research interests include Computer Vision, Artificial Intelligence, Multimedia Computing, and Intelligent Water Conservancy.