Description of field of research:

Research in visual question answering (VQA) was stimulated by connecting computer vision with natural language processing. Early techniques for VQA were based on semantic segmentation and graph parsing. To achieve good performance, most existing methods rely on input feature maps from object detection models that are pretrained on the relevant object classes. To overcome this reliance on pretrained detectors, this work focuses on developing a weakly supervised framework and a convolution-based transformer for visual question answering on pathology images.


Computer Science and Engineering

Research areas


The main contact will be the supervisor (Imran Razzak), but day-to-day supervision will also be provided by Ph.D. students. Students are expected to be skilled in natural language processing.

The expected outcome of this work is a weakly supervised framework that uses a convolution-based transformer for medical visual question answering on pathology images.

  • Urooj, Aisha, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, and Mubarak Shah. "Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8465-8474. 2021.