Multi-label learning, which aims to recognise all the relevant labels in an image, is a fundamental task in computer vision applications such as scene understanding, surveillance systems and self-driving cars. In real-world applications, multi-label recognition systems should learn tens of thousands of labels, locate them in images, and even handle many unseen labels. To date, classic multi-label classification methods, trained and tested only with seen labels, are far from meeting the requirements of real applications, where many unseen labels exist.

To identify the unseen labels in an image, many multi-label zero-shot learning methods have recently been developed that transfer knowledge between seen and unseen labels. However, most existing methods suffer from two main issues. First, they rely solely on single-modal knowledge from pre-trained textual label embeddings such as GloVe, while ignoring the visual-semantic information carried by image-text pairs. Second, although such textual label embeddings (e.g., GloVe) handle single-word labels well, they cannot easily be extended to multi-word text labels, which limits the flexibility of the models.

To tackle the above issues, in this project we will introduce two novel contrastive objectives to regularise the training of each modality. An inter-modal contrastive objective is designed to mitigate the modality gap by performing cross-modal contrasts using centralised data in the local training phase, compensating for the information of the absent modality in uni-modal clients. To bridge the task gap, an intra-modal contrastive objective is proposed that contrasts local representations against their corresponding global ones within each modality, regularising models to move towards the global consensus.
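The two objectives above can both be instantiated as InfoNCE-style losses, where each anchor is pulled towards its paired (same-index) representation and pushed away from all others in the batch. The sketch below is a minimal illustration of that idea, not the project's actual formulation; all function names, the temperature value, and the pairing convention are assumptions for illustration.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE loss: row i of `positives` is the positive for row i of
    `anchors`; every other row in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # cross-entropy on the diagonal

def inter_modal_loss(img_emb, txt_emb):
    # Symmetric cross-modal contrast: image->text and text->image directions.
    return 0.5 * (info_nce(img_emb, txt_emb) + info_nce(txt_emb, img_emb))

def intra_modal_loss(local_emb, global_emb):
    # Contrast local representations against their corresponding global ones.
    return info_nce(local_emb, global_emb)

# Toy check: text embeddings constructed as noisy copies of image embeddings
# (i.e., well-aligned pairs) should yield a small inter-modal loss.
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
txt = img + 0.1 * rng.normal(size=(8, 16))
print(inter_modal_loss(img, txt))
```

In this formulation, shrinking the inter-modal loss aligns paired image and text embeddings in a shared space, while the intra-modal loss keeps each modality's local representations close to the global consensus representations.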


Computer Science and Engineering

Research Area

Computational algorithms | Operational research

The research team will provide two GPU workstations for the student to conduct experiments and will assist in collecting real-world datasets.

An advanced and innovative mathematical model for multi-view multi-label learning.