Mr Xiangyu Zhang

Casual Academic
Engineering
Computer Science and Engineering

Xiangyu Zhang is a PhD student at UNSW Sydney, supervised by Julien Epps and Beena Ahmed. His research interests include Speech and Language Processing, Foundation Models, Machine Learning, and Digital Health. Before starting his PhD at UNSW, he completed his Master's degree at Johns Hopkins University under the supervision of Leibny Paola Garcia. Before that, he earned his Bachelor's degree at the University of Western Australia.

  • Journal articles | 2025
    Chen M; Zhang Q; Wang M; Zhang X; Liu H; Ambikairajah E; Chen D, 2025, 'Selective State Space Model for Monaural Speech Enhancement', IEEE Transactions on Consumer Electronics, http://dx.doi.org/10.1109/TCE.2024.3523297
    Journal articles | 2025
    Zhang X; Zhang Q; Liu H; Xiao T; Qian X; Ahmed B; Ambikairajah E; Li H; Epps J, 2025, 'Mamba in Speech: Towards an Alternative to Self-Attention', IEEE Transactions on Audio, Speech and Language Processing, 33, pp. 1933 - 1948, http://dx.doi.org/10.1109/taslpro.2025.3566210
    Journal articles | 2023
    Shu H; Liang R; Li Z; Goodridge A; Zhang X; Ding H; Nagururu N; Sahu M; Creighton FX; Taylor RH; Munawar A; Unberath M, 2023, 'Twin-S: a digital twin for skull base surgery.', Int J Comput Assist Radiol Surg, 18, pp. 1077 - 1084, http://dx.doi.org/10.1007/s11548-023-02863-9
  • Preprints | 2025
    Zhang X; Ahmed B; Epps J, 2025, Why Pre-trained Models Fail: Feature Entanglement in Multi-modal Depression Detection, http://arxiv.org/abs/2503.06620v1
    Preprints | 2025
    Zhang X; Fang F; Gao P; Qin B; Ahmed B; Epps J, 2025, Distinctive Feature Codec: Adaptive Segmentation for Efficient Speech Representation, http://arxiv.org/abs/2505.18516v1
    Preprints | 2025
    Zhang X; Liu H; Zhang Q; Ahmed B; Epps J, 2025, SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information, http://arxiv.org/abs/2502.10950v2
    Conference Papers | 2025
    Zhang X; Ma J; Shahin M; Ahmed B; Epps J, 2025, 'Rethinking Mamba in Speech Processing by Self-Supervised Models', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49660.2025.10889111
    Conference Papers | 2024
    Joshi A; Renzella J; Bhattacharyya P; Jha S; Zhang X, 2024, 'Striking a Balance between Classical and Deep Learning Approaches in Natural Language Processing Pedagogy', in TeachNLP 2024: 6th Workshop on Teaching NLP, Proceedings of the Workshop, pp. 23 - 32
    Preprints | 2024
    Joshi A; Renzella J; Bhattacharyya P; Jha S; Zhang X, 2024, Striking a Balance between Classical and Deep Learning Approaches in Natural Language Processing Pedagogy, http://arxiv.org/abs/2405.09854v2
    Conference Papers | 2024
    Liang R; Zhang X; Li Q; Wei L; Liu H; Kumar A; Kempski Leadingham KM; Punnoose J; Garcia LP; Manbachi A, 2024, 'Unidirectional Brain-Computer Interface: Artificial Neural Network Encoding Natural Images to FMRI Response in the Visual Cortex', in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1851 - 1855, presented at ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 14 April 2024 - 19 April 2024, http://dx.doi.org/10.1109/icassp48485.2024.10446366
    Conference Papers | 2024
    Liu H; Garcia LP; Zhang X; Khong AWH; Khudanpur S, 2024, 'Enhancing Code-Switching Speech Recognition with Interactive Language Biases', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, pp. 10886 - 10890, http://dx.doi.org/10.1109/ICASSP48485.2024.10448335
    Conference Papers | 2024
    Meng H; Zhang Q; Zhang X; Sethu V; Ambikairajah E, 2024, 'Binaural Selective Attention Model for Target Speaker Extraction', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4323 - 4327, http://dx.doi.org/10.21437/Interspeech.2024-683
    Conference Papers | 2024
    Zhang X; Liu D; Liu H; Zhang Q; Meng H; Garcia LP; Chng ES; Yao L, 2024, 'Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model', in EMNLP 2024: 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 159 - 171, http://dx.doi.org/10.18653/v1/2024.emnlp-main.9
    Preprints | 2024
    Zhang X; Liu D; Liu H; Zhang Q; Meng H; Garcia LP; Chng ES; Yao L, 2024, Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model, http://arxiv.org/abs/2402.10642v2
    Preprints | 2024
    Zhang X; Liu D; Xiao T; Xiao C; Szalay T; Shahin M; Ahmed B; Epps J, 2024, Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction, http://arxiv.org/abs/2409.07969v2
    Conference Papers | 2024
    Zhang X; Liu H; Xu K; Zhang Q; Liu D; Ahmed B; Epps J, 2024, 'When LLMs Meet Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection', in EMNLP 2024: 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 146 - 158, http://dx.doi.org/10.18653/v1/2024.emnlp-main.8
    Preprints | 2024
    Zhang X; Liu H; Xu K; Zhang Q; Liu D; Ahmed B; Epps J, 2024, When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection, http://arxiv.org/abs/2402.13276v2
    Preprints | 2024
    Zhang X; Ma J; Shahin M; Ahmed B; Epps J, 2024, Rethinking Mamba in Speech Processing by Self-Supervised Models, http://arxiv.org/abs/2409.07273v1
    Preprints | 2024
    Zhang X; Zhang Q; Liu H; Xiao T; Qian X; Ahmed B; Ambikairajah E; Li H; Epps J, 2024, Mamba in Speech: Towards an Alternative to Self-Attention, http://arxiv.org/abs/2405.12609v6
    Conference Papers | 2023
    Chua VYH; Liu H; Garcia LP; Woon FT; Wong J; Zhang X; Khudanpur S; Khong AWH; Dauwels J; Styles SJ, 2023, 'MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization', in INTERSPEECH 2023, ISCA, pp. 4109 - 4113, presented at INTERSPEECH 2023, http://dx.doi.org/10.21437/interspeech.2023-1446
    Conference Papers | 2023
    Li SS; Zhang X; Zhou S; Shu H; Liang R; Liu H; Garcia LP, 2023, 'PQLM - Multilingual Decentralized Portable Quantum Language Model', in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1 - 5, presented at ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04 June 2023 - 10 June 2023, http://dx.doi.org/10.1109/icassp49357.2023.10095215
    Conference Papers | 2023
    Xuan Y; Zhang X; Li SS; Shen Z; Xie X; Garcia LP; Togneri R, 2023, 'A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters', in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1 - 5, presented at ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04 June 2023 - 10 June 2023, http://dx.doi.org/10.1109/icassp49357.2023.10095885
    Preprints | 2022
    Li SS; Zhang X; Zhou S; Shu H; Liang R; Liu H; Garcia LP, 2022, PQLM -- Multilingual Decentralized Portable Quantum Language Model for Privacy Protection, http://arxiv.org/abs/2210.03221v5
    Preprints | 2022
    Zhang X; Li SS; He Z; Togneri R; Garcia LP, 2022, End-to-End Lyrics Recognition with Self-supervised Learning, http://arxiv.org/abs/2209.12702v4