One of the main challenges in visual question answering (VQA) is reasoning correctly about the relations among the visual regions involved in a question. In this paper, we propose a novel neural network for question-guided relational reasoning at multiple scales in VQA, in which each image region is enhanced through regional attention. Specifically, we introduce a regional attention module that combines soft and hard attention mechanisms to select informative image regions according to question-guided evaluations. Different combinations of the informative regions are then concatenated with question embeddings across scales to capture relational information. The relational reasoning module extracts question-based relationships among regions, and the multi-scale mechanism improves the model’s sensitivity to number-type questions and its ability to model diverse relationships. Experimental results demonstrate that our approach achieves state-of-the-art performance on the VQA v2 dataset.
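To make the described pipeline concrete, the following is a minimal PyTorch sketch of the two core steps: question-guided regional attention (soft scoring followed by hard top-k selection) and relation-network-style reasoning over combinations of the selected regions concatenated with the question embedding. All names (`RegionalAttention`, `PairwiseRelation`), layer sizes, the top-k selection rule, and the sum aggregation are illustrative assumptions, not the paper's exact implementation; only one scale (region pairs) is shown, whereas the full model forms combinations at multiple scales.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionalAttention(nn.Module):
    """Question-guided regional attention: soft attention re-weights all
    regions, then hard attention keeps the top-k most informative ones.
    Dimensions and k are illustrative assumptions."""
    def __init__(self, region_dim=2048, question_dim=1024, hidden_dim=512, k=8):
        super().__init__()
        self.proj_v = nn.Linear(region_dim, hidden_dim)
        self.proj_q = nn.Linear(question_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)
        self.k = k

    def forward(self, regions, question):
        # regions: (B, N, region_dim); question: (B, question_dim)
        joint = torch.tanh(self.proj_v(regions) + self.proj_q(question).unsqueeze(1))
        logits = self.score(joint).squeeze(-1)        # (B, N) question-guided scores
        soft = F.softmax(logits, dim=-1)              # soft attention weights
        weighted = regions * soft.unsqueeze(-1)       # re-weight every region
        topk = soft.topk(self.k, dim=-1).indices      # hard selection of k regions
        idx = topk.unsqueeze(-1).expand(-1, -1, regions.size(-1))
        return weighted.gather(1, idx)                # (B, k, region_dim)

class PairwiseRelation(nn.Module):
    """One reasoning scale: every pair of selected regions is concatenated
    with the question embedding and scored by a shared MLP g."""
    def __init__(self, region_dim=2048, question_dim=1024, hidden_dim=512):
        super().__init__()
        self.g = nn.Sequential(
            nn.Linear(2 * region_dim + question_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, selected, question):
        B, k, d = selected.shape
        a = selected.unsqueeze(2).expand(B, k, k, d)  # region i
        b = selected.unsqueeze(1).expand(B, k, k, d)  # region j
        q = question.unsqueeze(1).unsqueeze(1).expand(B, k, k, -1)
        pairs = torch.cat([a, b, q], dim=-1)          # (B, k, k, 2*d + q_dim)
        return self.g(pairs).sum(dim=(1, 2))          # aggregate relation features
```

In this sketch, a forward pass would chain the two modules, e.g. `PairwiseRelation()(RegionalAttention()(regions, question), question)`, and the resulting relation feature would feed an answer classifier; higher scales would analogously form triples or larger combinations of the selected regions before concatenation with the question embedding.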