Live machine translation using LLMs (formally known as "simultaneous machine translation") is a challenging task because it involves translating sentences while they are still being uttered. Imagine a live translator who hears a person say "I went to the bank". Should the translator assume that 'bank' refers to a financial institution? What if the person completes the sentence with "...to catch fish"? Current LLMs 'anticipate' what is coming next and translate upcoming words before they are spoken, but this 'proactive' approach often leads to hallucinations: the model commits to incorrect translations based on flawed guesses. The project aims to create new methodologies for live machine translation utilising existing neural and statistical models. The team is co-led by Dr. Aditya Joshi, an expert in natural language processing (NLP), and Dipankar Srirag, a PhD student in the NLP research group.

The ideal student will have strong programming skills with NLP libraries (such as PyTorch and Hugging Face) and a solid grounding in NLP and deep learning techniques. Completion of an academic course in NLP (similar to COMP6713 at UNSW) would be highly regarded.

School

Computer Science and Engineering

Research Area

Natural language processing | Artificial intelligence

Suitable for recognition of Work Integrated Learning (industrial training)?

No

The student will be part of the natural language processing (NLP) research group, which consists of postdocs, software engineers and PhD students. The student will have access to typical computing facilities at UNSW.

  1. A methodology that uses mechanisms such as uncertainty gating to balance the trade-off between speed and accuracy in simultaneous machine translation.
  2. Benchmarking on standardised datasets (such as MuST-C).
  3. Well-documented code and programmer manuals.
  4. A report in a form suitable for a research paper.
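
To make the idea of uncertainty gating in item 1 concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the project's method: the gate uses normalised Shannon entropy of the next-token distribution, and the names `normalised_entropy`, `gated_translate` and `toy_predict`, as well as the `threshold` parameter, are hypothetical. The toy predictor stands in for a real translation model and simply upper-cases source words, staying undecided about "bank" until it has seen "fish".

```python
import math
from typing import Callable, Dict, List, Sequence

Distribution = Dict[str, float]
EOS = "</s>"


def normalised_entropy(probs: Distribution) -> float:
    """Shannon entropy of a distribution, scaled to the range [0, 1]."""
    if len(probs) <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0.0)
    return entropy / math.log(len(probs))


def gated_translate(
    source: Sequence[str],
    predict: Callable[[Sequence[str], Sequence[str]], Distribution],
    threshold: float = 0.5,
    max_len: int = 50,
) -> List[str]:
    """Emit a target token only when the model is confident enough.

    Assumption: `predict(source_prefix, target_prefix)` returns a
    next-token distribution. If its normalised entropy exceeds
    `threshold`, the policy READs one more source token instead of
    committing to a guess (higher latency, fewer wrong commitments).
    """
    target: List[str] = []
    read = 1  # number of source tokens consumed so far
    while len(target) < max_len:
        probs = predict(source[:read], target)
        source_exhausted = read >= len(source)
        if normalised_entropy(probs) > threshold and not source_exhausted:
            read += 1  # READ: too uncertain, wait for more context
            continue
        token = max(probs, key=probs.get)
        if token == EOS:
            if source_exhausted:
                break
            read += 1  # nothing to write yet, so keep listening
            continue
        target.append(token)  # WRITE: commit to the most likely token
    return target


def toy_predict(prefix: Sequence[str], target: Sequence[str]) -> Distribution:
    """Hypothetical stand-in for a translation model (illustration only)."""
    i = len(target)
    if i >= len(prefix):
        return {EOS: 1.0}
    word = prefix[i]
    if word == "bank":
        if "fish" in prefix:
            return {"RIVERBANK": 0.9, "BANK": 0.1}
        return {"BANK": 0.5, "RIVERBANK": 0.5}
    return {word.upper(): 1.0}


sentence = "I went to the bank to catch fish".split()
cautious = gated_translate(sentence, toy_predict, threshold=0.5)
eager = gated_translate(sentence, toy_predict, threshold=1.5)
```

With the gate active (`threshold=0.5`), the policy waits at "bank" until "fish" has been read and then writes `RIVERBANK`. With the gate effectively disabled (`threshold=1.5`, above the maximum normalised entropy of 1), it commits to `BANK` immediately. Lower thresholds trade speed for accuracy, which is the balance item 1 refers to.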