Deep learning models and large language models (LLMs) are increasingly deployed across critical domains such as transport, supply chain, defence, finance, space, and communications. As their decisions directly affect safety and trust, ensuring responsible and reliable AI has become a top priority for both industry and research.
This project focuses on the testing and verification of neural networks and LLMs to evaluate and strengthen their robustness, a core correctness property for trustworthy AI systems. Robustness requires that small perturbations to an input do not cause significant or unexpected changes in the output. However, existing models often exhibit vulnerabilities such as unstable behaviour, biased outcomes, and privacy leakage, raising concerns about their reliability and fairness.
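To make the property concrete: a classifier is locally robust at an input x if every perturbed input x' within an ε-ball around x receives the same predicted label. The sketch below illustrates this check for a toy numpy classifier; the model, weights, ε, and sample count are placeholder assumptions for illustration, not artefacts of this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained classifier: one linear layer + argmax.
# The weights are random placeholders, purely for illustration.
W = rng.normal(size=(3, 4))   # 3 classes, 4 input features
b = rng.normal(size=3)

def predict(x: np.ndarray) -> int:
    """Return the predicted class label for input x."""
    return int(np.argmax(W @ x + b))

def locally_robust(x: np.ndarray, eps: float, n_samples: int = 10_000) -> bool:
    """Empirically check local robustness: sample perturbations x' with
    ||x' - x||_inf <= eps and report whether the label ever changes.
    This is a fuzz-style under-approximation: it can find counterexamples
    but cannot prove robustness (that is what verification adds)."""
    label = predict(x)
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        if predict(x + delta) != label:
            return False  # counterexample found
    return True  # no violation found among the samples

x0 = rng.normal(size=4)
print("robust on sampled perturbations:", locally_robust(x0, eps=0.05))
```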
To address these challenges, the project will combine fuzz testing, which systematically explores diverse input scenarios to uncover hidden failure cases, with formal verification techniques that provide stronger guarantees. We will investigate abstract interpretation methods (e.g., ACT) and optimisation-based solvers (e.g., Gurobi) to certify robustness and quantify accuracy variance in deep models. Integrating fuzzing and verification will enable scalable, automated approaches for assessing and improving the trustworthiness of DNNs and LLMs in real-world applications.
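As a rough sketch of the optimisation-based side, the example below encodes a single ReLU unit with the standard big-M mixed-integer formulation in Gurobi (via gurobipy) and minimises a toy score over an ε-box of inputs: a non-negative minimum certifies the property, while a negative one corresponds to a concrete counterexample. The network, weights, and ε are hypothetical placeholders; a full network encoding (or bounds from an abstract-interpretation tool such as ACT) would take their place in practice.

```python
import gurobipy as gp
from gurobipy import GRB

# Hypothetical toy network with one ReLU unit and 2 inputs:
#   score(x) = v * relu(w . x + c) + d
w, c = [1.0, -2.0], 0.5
v, d = 1.5, -0.25

x0 = [0.2, 0.1]     # nominal input (placeholder)
eps = 0.3           # L_inf perturbation radius (placeholder)
lo = [xi - eps for xi in x0]
hi = [xi + eps for xi in x0]

# Interval bounds on the pre-activation z = w . x + c.
L = sum(min(wi * l, wi * u) for wi, l, u in zip(w, lo, hi)) + c
U = sum(max(wi * l, wi * u) for wi, l, u in zip(w, lo, hi)) + c

m = gp.Model("relu_certify")
x = [m.addVar(lb=lo[i], ub=hi[i], name=f"x{i}") for i in range(2)]
z = m.addVar(lb=L, ub=U, name="z")               # pre-activation
h = m.addVar(lb=0.0, ub=max(U, 0.0), name="h")   # post-activation
a = m.addVar(vtype=GRB.BINARY, name="a")         # ReLU phase indicator

m.addConstr(z == gp.quicksum(w[i] * x[i] for i in range(2)) + c)
# Standard big-M encoding of h = max(0, z), assuming L <= 0 <= U:
m.addConstr(h >= z)
m.addConstr(h <= z - L * (1 - a))
m.addConstr(h <= U * a)

# Robustness query: does the score stay non-negative over the whole box?
m.setObjective(v * h + d, GRB.MINIMIZE)
m.optimize()
if m.Status == GRB.OPTIMAL:
    print("worst-case score:", m.ObjVal)
    print("certified" if m.ObjVal >= 0 else "counterexample exists")
```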
Computer Science and Engineering
Software testing and analysis
No
- Research environment
- Expected outcomes
- Supervisory team
- Reference material/links
Based on the open-source tool ACT: https://github.com/SVF-tools/ACT
- Analysing and verifying modern AI models