A persistent challenge in online hate speech detection is linguistic drift. To evade moderation, hate actors continually invent euphemisms, deploy irony, and adopt coded "dog-whistle" expressions. Static models trained on fixed corpora therefore lose accuracy soon after deployment as the underlying language distribution shifts, a problem known as concept drift.

Recent work on Hierarchical Reasoning Models (HRMs) demonstrates that models can spontaneously form multi-level planning and execution structures, enabling robustness in dynamic reasoning tasks. This inspires a new direction: can hierarchical reasoning mechanisms, when combined with continual learning, help hate speech detectors adapt to newly emerging implicit forms of abuse?

This project proposes the Self-Enhancing Hierarchical Social Reasoning Network (SE-HSRN): a two-layer adaptive framework that (1) discovers emergent expressions (e.g., euphemisms and coded language), and (2) adapts its detector in real time using continual/test-time learning. The system aims to bridge the gap between interpretability (reasoning layers) and adaptability (concept drift resilience), producing a model that not only classifies hate speech but also proactively updates itself as online language evolves.
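The two layers described above can be sketched in miniature: an upper "discovery" layer flags terms whose frequency spikes relative to a reference corpus (a simple stand-in for emergent-expression discovery), while a lower "adaptation" layer nudges a classifier toward confident predictions on unlabeled streaming text via entropy minimization, a standard test-time adaptation technique. All function names and thresholds below are illustrative assumptions, not the project's actual design; SE-HSRN would use neural encoders rather than a hand-rolled logistic model.

```python
# Illustrative sketch only: frequency-spike discovery + entropy-minimization
# adaptation on a toy logistic classifier. Names and thresholds are hypothetical.
import math
from collections import Counter

def discover_emergent_terms(reference_counts, new_tokens, min_count=3, ratio=5.0):
    """Flag tokens far more frequent in a new stream than in the reference corpus."""
    new_counts = Counter(new_tokens)
    total_ref = sum(reference_counts.values()) or 1
    total_new = len(new_tokens) or 1
    flagged = []
    for tok, count in new_counts.items():
        if count < min_count:
            continue
        ref_rate = reference_counts.get(tok, 0) / total_ref
        new_rate = count / total_new
        # Smooth unseen reference terms with a one-count floor.
        if new_rate > ratio * max(ref_rate, 1 / total_ref):
            flagged.append(tok)
    return sorted(flagged)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def entropy_min_step(w, b, feats, lr=0.5):
    """One unsupervised gradient step that lowers binary prediction entropy.

    For H(p) = -p log p - (1-p) log(1-p) with p = sigmoid(z),
    dH/dz = log((1-p)/p) * p * (1-p), so descending it pushes
    predictions away from the uncertain 0.5 boundary.
    """
    for x in feats:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = sigmoid(z)
        grad = math.log((1 - p) / p) * p * (1 - p) if 0.0 < p < 1.0 else 0.0
        w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
        b -= lr * grad
    return w, b
```

In a full system, the discovery layer's flagged terms would seed targeted data collection or lexicon expansion, while the adaptation step would update only lightweight parameters (e.g., normalization layers) to avoid catastrophic forgetting.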

School

Computer Science and Engineering

Research Area

Natural language processing (NLP) | Computational sociolinguistics | Deep learning & concept drift adaptation | Implicit hate & adversarial language understanding

Suitable for recognition of Work Integrated Learning (industrial training)? 

No

The student will work under regular supervision, gain hands-on experience with NLP toolkits, and interact with ongoing projects in adversarial language understanding. Weekly meetings will ensure feedback, direction, and milestone tracking.

System: Prototype of SE-HSRN integrating discovery + adaptation for implicit hate.

Experience: Training in continual learning, evaluation of NLP models under drift, and interpretability analysis.

Report: A structured research report detailing methods, experiments, results, and future extensions.

Poster: Visual results (performance curves, discovered terms, architecture diagram) for showcase presentation.

Hierarchical reasoning:

Jin, Jiajie, et al. "Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search." arXiv preprint arXiv:2507.02652 (2025).

Wang, Guan, et al. "Hierarchical Reasoning Model." arXiv preprint arXiv:2506.21734 (2025).

Implicit hate:

Sasse, Kuleen, et al. "Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats." arXiv preprint arXiv:2412.12072 (2024).

Huang, Fan, Haewoon Kwak, and Jisun An. "Is ChatGPT Better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech." Companion Proceedings of the ACM Web Conference 2023, 2023.

Concept drift & continual learning:

Ashrafee, Alif, et al. "Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment." arXiv preprint arXiv:2507.02310 (2025).

Bidaki, Seyed Amir, et al. "Online continual learning: A systematic literature review of approaches, challenges, and benchmarks." arXiv preprint arXiv:2501.04897 (2025).

Zhang, Qiyuan, et al. "A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?" arXiv preprint arXiv:2503.24235 (2025).

Robustness / adversarial & benchmarking (useful for evaluation section):

Shen, Xinyue, et al. "HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns." (2025).