Zhuosheng Zhang

Tenure-Track Assistant Professor
School of Computer Science
Shanghai Jiao Tong University
Email: zhangzs@sjtu.edu.cn
Office: School of Software 5213
800 Dongchuan Road, Shanghai

Profile

I am a tenure-track assistant professor at Shanghai Jiao Tong University. I received my Ph.D. degree and my M.S. degree from Shanghai Jiao Tong University in 2023 and 2020, respectively. I was an intern at Amazon Web Services, Microsoft Research Redmond, Langboat Tech, NICT (Japan), and IBM. I have served as an action editor for ACL Rolling Review, and a (senior) area chair for ACL, NeurIPS, and EMNLP.

My research interests include natural language processing, LLM reasoning, LLM agents, and LLM safety. I have published over 100 papers in top-tier conferences and journals, including Nature Communications, TPAMI, ICML, ICLR, ACL, AAAI, EMNLP, TNNLS, TASLP, and COLING. I have won 1st place on various language understanding and reasoning leaderboards, such as SQuAD2.0, MuTual, RACE, ShARC, and CMRC. I was named an Academic Star at Shanghai Jiao Tong University and was selected as one of the Global Top 100 Chinese Rising Stars in Artificial Intelligence. I have received the Excellent Doctoral Thesis of Chinese Information Processing Society (CIPS), the WAIC 2024 Youth Outstanding Paper Award, the WAIC 2024 YunFan Award: Bright Star, and the Baidu Scholarship.

Recent Projects

Agent Continual Learning: Generalization, Personalization, Socialization
Agent Interaction: GUI Agents, Human-Agent Collaboration, Multi-Agent Interaction, Generative UI
Agent OS: Parallel Scheduling for Massive Agents, Context Management
Agent Safety & Robustness: Environmental Injection, Over-Competition

Prospective students: We are actively looking for undergraduate interns at SJTU. We expect applicants to have some prior experience in AI/NLP/ML (prior research experience is not required) and to commit at least 10 hours per week to research. Please email me with your CV if you are interested.

Teaching

Courses:

NIS3353: Artificial Intelligence Security
Undergraduate, Shanghai Jiao Tong University, 2024-
NIS8021: Frontier Technology in Natural Language Processing
Graduate, Shanghai Jiao Tong University, 2024-

Tutorials:

For Beginners: Dive into LLMs (《动手学大模型》), a hands-on programming tutorial series. New updates! (May 2025)
dive-into-llms
CVPR 2024: From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning and Beyond
Hao Fei, Yuan Yao, Ao Zhang, Haotian Liu, Fuxiao Liu, Zhuosheng Zhang, Shuicheng Yan.
Seattle WA, USA
[Website]
LREC-COLING 2024: From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning, Efficiency and Beyond
Hao Fei, Yuan Yao, Zhuosheng Zhang, Fuxiao Liu, Ao Zhang, Tat-Seng Chua.
Torino, Italia
[Website]
IJCNLP-AACL 2023: Learning WHO Saying WHAT to WHOM in Multi-Party Conversations
Jia-Chen Gu, Zhuosheng Zhang, and Zhen-Hua Ling.
Bali, Indonesia.
[Website]
IJCAI 2021: Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
Zhuosheng Zhang and Hai Zhao.
Montreal, Canada (Virtual)
[Website]

Recent Talks

2026/04: Keynote at ICLR 2026 MemAgents Workshop. [slides]
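2026/02: Talk "Personalized Modeling and Circulation for AI Digital Avatars" at the CCF Xiuhu Conference
2025/11: Talk "Analyzing the Reasoning Mechanisms of LLM Agents: From Reasoning Generalization to Consistency Between Words and Actions" at the LMG 2025 Forum on Deep Reasoning with Large Models [slides]
2025/10: Talk "Technical Architecture, Capability Evolution, and Panoramic Evaluation of Agent Systems" at the CNCC 2025 Forum on Key Technologies and Applications of AI Agents [slides]
2025/10: Talk "From Passive Tools to Proactive Partners: Exploring Mind-Driven OS Agents" at the CNCC 2025 Forum on Agentic AI for the Mobile Ecosystem [slides]
2025/09: Talk "Towards Trustworthy AI Agents: From Implicit Intent Understanding to Human-like Behavior Analysis" at the CIPS Frontier Technology Talks on Large Models [slides]
2025/07: Talk "Intelligent Interaction in the Era of Large Models: OS Agent Technologies and Challenges" at the Shanghai Jiao Tong University Summer Research Camp on LLM Agents [slides]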
2024/09: Keynote "Caution for the environment: Multimodal Agents are Susceptible to Environmental Distractions" at CJNLP 2024. [slides]
2024/08: Keynote "Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities" at Knowledge-Augmented NLP Workshop @ ACL 2024. [slides]

Selected Publications

[2026]

Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning
Zheng Wu, Xingyu Lou, Xinbei Ma, Yansi Li, Weiwen Liu, Weinan Zhang, Jun Wang*, Zhuosheng Zhang*.
ACL, 2026
[PDF] [Abstract]

Agent-Dice

ParaCook: On Time-Efficient Planning for Multi-Agent Systems
Shiqi Zhang, Xinbei Ma, Yunqing Xu, Zouying Cao, Pengrui Lu, Haobo Yuan, Tiancheng Shen, Zhuosheng Zhang*, Hai Zhao*, Ming-Hsuan Yang*
ACL, 2026
[PDF] [Abstract]

ParaCook

The Confidence Paradox: Unveiling the Latent Discriminative Power of Diffusion Large Language Models in Mathematical Reasoning
Yansi Li, Gongshen Liu, Zhuosheng Zhang*
ACL, 2026
[PDF] [Abstract]

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
Qiushi Sun, Mukai Li, Zhoumianze Liu, Zhihui Xie, Fangzhi Xu, Zhangyue Yin, Kanzhi Cheng, Zehao Li, Zichen Ding, Qi Liu, Zhiyong Wu, Zhuosheng Zhang, Ben Kao, Lingpeng Kong
ACL, 2026
Best Paper Award at AIWILD @ ICLR 2026
[PDF] [Abstract]

OS-Sentinel

EVA: Evolving Semantic Adversaries for Red-Teaming GUI Agents Against Environmental Injection Attacks
Yijie Lu, Manman Zhao, Tianjie Ju, Zihe Yan, Xinbei Ma, Yuan Guo, Daizong Ding, Gongshen Liu*, Zhuosheng Zhang*
ACL, 2026
[PDF] [Abstract]

As multimodal agents are increasingly trained to operate graphical user interfaces (GUIs) to complete user tasks, they face a growing threat from indirect prompt injection: attacks in which misleading instructions are embedded into the agent's visual environment, such as popups or chat messages, and misinterpreted as part of the intended task. A typical example is environmental injection, in which GUI elements are manipulated to influence agent behavior without directly modifying the user prompt. To address these emerging attacks, we propose EVA, a red-teaming framework for indirect prompt injection that transforms the attack into a closed-loop optimization by continuously monitoring an agent's attention distribution over the GUI and updating adversarial cues (keywords, phrasing, and layout) in response. Compared with prior one-shot methods that generate fixed prompts without regard for how the model allocates visual attention, EVA dynamically adapts to emerging attention hotspots, yielding substantially higher attack success rates and far greater transferability across diverse GUI scenarios. We evaluate EVA on six widely used generalist and specialist GUI agents in realistic settings such as popup manipulation, chat-based phishing, payments, and email composition. Experimental results show that EVA substantially improves success rates over static baselines. Under goal-agnostic constraints, where the attacker does not know the agent's task intent, EVA still discovers effective patterns. Notably, we find that injection styles transfer well across models, revealing shared behavioral biases in GUI agents. These results suggest that evolving indirect prompt injection is a powerful tool not only for red-teaming agents, but also for uncovering common vulnerabilities in their multimodal decision-making.

See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles
Zongru Wu, Rui Mao, Zhiyuan Tian, Pengzhou Cheng, Tianjie Ju, Zheng Wu, Lingzhong Dong, Haiyue Sheng, Zhuosheng Zhang*, Gongshen Liu*.
CVPR, 2026
[PDF] [Abstract]

StaR

Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation
Zehao Deng, Tianjie Ju, Zheng Wu, Zhuosheng Zhang*, Gongshen Liu.
CVPR, 2026
[PDF] [Abstract]

CES

LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents
Zihe Yan, Zhuosheng Zhang*, Jiaping Gui, Gongshen Liu.
CVPR, 2026
[PDF] [Abstract]

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
Zhiwei He, Tian Liang, Jiahao Xu, Qiuzhi Liu, Xingyu Chen, Yue Wang, Linfeng Song, Dian Yu, Zhenwen Liang, Wenxuan Wang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu.
ICLR, 2026
[PDF] [Abstract]

Reinforcement learning (RL) with large language models shows promise in complex reasoning. However, its progress is hindered by the lack of large-scale training data that is sufficiently challenging, contamination-free and verifiable. To this end, we introduce DeepMath-103K, a large-scale mathematical dataset designed with high difficulty (primarily levels 5-9), rigorous decontamination against numerous benchmarks, and verifiable answers for rule-based RL reward. It further includes three distinct R1 solutions adaptable for diverse training paradigms such as supervised fine-tuning (SFT). Spanning a wide range of mathematical topics, DeepMath-103K fosters the development of generalizable and advancing reasoning. Notably, models trained on DeepMath-103K achieve leading results on challenging mathematical benchmarks and demonstrate generalization beyond math such as biology, physics and chemistry, underscoring its broad efficacy.

Auditing Partial Dataset Usage in Large Language Models Via Fuzzy Membership Aggregation
Hongyu Zhu, Sichu Liang, Bofan Chen, Shilin Wang, Zhuosheng Zhang, Weiping Ding.
IEEE Transactions on Fuzzy Systems, 2026
[PDF] [Abstract]

The remarkable capabilities of Large Language Models (LLMs) are fueled by massive internet-scale corpora. However, scraped data owners often do not consent to its use for training, raising significant legal and ethical concerns over copyright and privacy. Data auditing techniques seek to verify whether a protected dataset was used in training a target LLM, typically framing the task as membership inference: estimating binary sample-level membership and aggregating to a dataset-level decision. In this paper, we identify a fundamental limitation of this crisp binary paradigm: in realistic training pipelines, datasets are rarely used in full. Instead, models are trained on mixtures of partial subsets drawn from multiple sources. Existing auditing techniques, built upon an all-or-none assumption—declaring a dataset either entirely present or absent from training—collapse in partial dataset usage scenarios. Their predictions fluctuate unpredictably with the member ratio, causing unstable performance and high false-negative rates. Inspired by fuzzy set theory, we relax the crisp notion of binary membership to a continuous fuzzy membership in [0,1], quantifying each sample's degree of inclusion in the model's training set. We establish a theoretical bridge between sample-level fuzzy memberships and the dataset-level usage ratio, facilitating inference of the proportion of a protected dataset used during training. A neural network fuzzifier first estimates sample-level fuzzy memberships from binary labels in a reference set, then refines them using dataset-level member ratios as higher-order supervision. Finally, a defuzzification stage aggregates calibrated memberships to determine partial usage. Across LLMs of varying scales and multiple auditing datasets, our Fuzzy Auditor substantially outperforms state-of-the-art crisp binary techniques in detecting partial usage, estimating member proportions, and identifying individual member samples.

Generalizable and Adaptive Continual Learning Framework for AI-generated Image Detection
Hanyi Wang, Jun Lan, Yaoyu Kang, Huijia Zhu, Weiqiang Wang, Zhuosheng Zhang, Shilin Wang.
IEEE Transactions on Multimedia, 2026
[PDF] [Abstract]

The malicious misuse and widespread dissemination of AI-generated images pose a significant threat to the authenticity of online information. Current detection methods often struggle to generalize to unseen generative models, and the rapid evolution of generative techniques continuously exacerbates this challenge. Without adaptability, detection models risk becoming ineffective in real-world applications. To address this critical issue, we propose a novel three-stage domain continual learning framework designed for continuous adaptation to evolving generative models. In the first stage, we employ a strategic parameter-efficient fine-tuning approach to develop a transferable offline detection model with strong generalization capabilities. Building upon this foundation, the second stage integrates unseen data streams into a continual learning process. To efficiently learn from limited samples of novel generated models and mitigate overfitting, we design a data augmentation chain with progressively increasing complexity. Furthermore, we leverage the Kronecker-Factored Approximate Curvature (K-FAC) method to approximate the Hessian and alleviate catastrophic forgetting. Finally, the third stage utilizes a linear interpolation strategy based on Linear Mode Connectivity, effectively capturing commonalities across diverse generative models and further enhancing overall performance. We establish a comprehensive benchmark of 27 generative models, including GANs, deepfakes, and diffusion models, chronologically structured up to August 2024 to simulate real-world scenarios. Extensive experiments demonstrate that our initial offline detectors surpass the leading baseline by +5.51% in terms of mean average precision. Our continual learning strategy achieves an average accuracy of 92.20%, outperforming state-of-the-art methods.

GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents
Zheng Wu, Pengzhou Cheng, Zongru Wu, Lingzhong Dong, Zhuosheng Zhang*
AAAI, 2026
[PDF] [Abstract]

GEM-OODforGUIagents

An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains
Zihe Yan, Kai Luo, Haoyu Yang, Yang Yu, Zhuosheng Zhang*, Guancheng Li*
AAAI, 2026
[PDF] [Abstract]

HSBRiskEvaluator

[2025 & Before]

Multimodal Chain-of-Thought Reasoning in Language Models
Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola.
TMLR, 2024
"Imagine learning a textbook with no figures: Multimodal-CoT surpasses humans on ScienceQA."
Featured in Dive into Deep Learning (Adopted at 500 universities from 70 countries)
[Top Trending Research on paperswithcode] [Idea Inspiration] [PDF] [Abstract]

MM-CoT

Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola.
ICLR, 2023
"Let's think not just step by step, but also one by one."
Featured in Dive into Deep Learning (Adopted at 400 universities from 60 countries)
[PDF] [Abstract] [bilibili] [slides]

Auto-CoT

You Only Look at Screens: Multimodal Chain-of-Action Agents
Zhuosheng Zhang, Aston Zhang.
ACL, 2024
"Perform a task on smart phones? Train an agent using screenshots."
[PDF] [Abstract] [slides]

Auto-GUI

Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy
Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein.
Nature Communications, 2025
[PDF] [Abstract]

Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang#, Yao Yao#, Aston Zhang, Xiangru Tang, Xinbei Ma, Zhiwei He, Yiming Wang, Mark Gerstein, Rui Wang, Gongshen Liu, Hai Zhao.
ACM Computing Surveys, 2025
"Join us on an exciting journey from chain-of-thought reasoning to language agent!"
[PDF] [Abstract]

CoT-Igniting-Agent

Large language models (LLMs) have dramatically enhanced the field of language intelligence, as demonstrably evidenced by their formidable empirical performance across a spectrum of complex reasoning tasks. Additionally, theoretical proofs have illuminated their emergent reasoning capabilities, providing a compelling showcase of their advanced cognitive abilities in linguistic contexts. Critical to their remarkable efficacy in handling complex reasoning tasks, LLMs leverage the intriguing chain-of-thought (CoT) reasoning techniques, obliging them to formulate intermediate steps en route to deriving an answer. The CoT reasoning approach has not only exhibited proficiency in amplifying reasoning performance but also in enhancing interpretability, controllability, and flexibility. In light of these merits, recent research endeavors have extended CoT reasoning methodologies to nurture the development of autonomous language agents, which adeptly adhere to language instructions and execute actions within varied environments. This survey paper orchestrates a thorough discourse, penetrating vital research dimensions, encompassing: (i) the foundational mechanics of CoT techniques, with a focus on elucidating the circumstances and justification behind its efficacy; (ii) the paradigm shift in CoT; and (iii) the burgeoning of language agents fortified by CoT approaches. Prospective research avenues envelop explorations into generalization, efficiency, customization, scaling, and safety. We hope to offer readers a comprehensive understanding of prevalent research areas such as CoT reasoning and language agents and illuminate the interconnections weaving through these areas. This paper caters to a wide audience, including beginners seeking comprehensive knowledge of CoT reasoning and language agents, as well as experienced researchers interested in foundational mechanics and engaging in cutting-edge discussions on these topics. 
A repository for the related papers is available at https://github.com/Zoeyyao27/CoT-Igniting-Agent.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu.
ICML, 2025
[PDF] [Abstract]

Shared Tasks

[May 2022] HellaSwag Leaderboard on Commonsense Reasoning

The best among all submissions.
[Leaderboard] [Paper]

[January 2021] ShARC Leaderboard on Conversational Question Answering

The best among all submissions.
[Leaderboard] [Paper]

[September 2020] MuTual Leaderboard on Dialogue Reasoning Challenge

The best among all submissions.
[Leaderboard] [Paper]

[July 2019] SQuAD2.0 Leaderboard on Machine Reading Comprehension

The best models for both single and ensemble settings among all submissions (2020.01).
The first to surpass human benchmark on both EM and F1 scores with a single model (from 2019.07-09).
The first to exceed 90% F1 score with ensemble models.
[Leaderboard] [Paper] [Report]

[March 2019] RACE Leaderboard on Machine Reading Comprehension

The best among all submissions.
The best among all academic submissions.
[Leaderboard] [Paper] [Report]

[April 2019] SNLI Leaderboard on Language Inference

The best among all submissions.
[Leaderboard] [Paper]

[March 2019] GLUE Leaderboard on Language Understanding

The 3rd best among all submissions.
The best among all academic submissions.
[Leaderboard] [Paper]

[August 2017] Chinese Machine Reading Comprehension (CCL-CMRC 2017)

The best single system and the second-best ensemble system (Silver Medal).
[Leaderboard] [Paper] [Source]

Awards & Honors

2024: WAIC Youth Outstanding Paper Award, World Artificial Intelligence Conference.
2024: WAIC YunFan Award: Bright Star, World Artificial Intelligence Conference.
2023: Excellent Doctoral Thesis of Chinese Information Processing Society (CIPS).
2023: Shanghai Outstanding Doctoral Graduate.
2022: Academic Stars of Graduate Students (10 recipients), Shanghai Jiao Tong University.
2021: Global Top 100 Chinese Rising Stars in Artificial Intelligence (Top 10 recommended), Baidu Research.
2021: Baidu Scholarship (10 recipients, worldwide), Baidu.
2020: National Scholarship of China, Ministry of Education of the P.R. China.
2019: Yang Yuanqing Education Fund, The foundation of Class 1988 in CS @ Shanghai Jiao Tong University.
2018: Academic Stars of Graduate Students (The only master student awardee), Shanghai Jiao Tong University.
2016: National Figures Nomination of College Students (20 total recipients), Ministry of Education of the P.R. China.
2015: CCF Elite Collegiate Award, China Computer Federation.

Academic Service

Organization:

Organizing Committee Member of CJNLP 2025
Co-chair of Hot Paper Session at CCL 2025
Session Chair at RL China 2024
Session Chair at CJNLP 2024
Session Chair at IJCNLP-AACL 2023
Co-chair of Student Seminar at CCL 2022
President of IBM Tech Club at Wuhan University, 2014-2015

(Senior) Area Chair / Action Editor / SPC:

AAAI 2026
ACL Rolling Review
NeurIPS 2025
EMNLP 2025
ACL 2025
LREC-COLING 2024
IJCAI 2024
ICLR 2023 TinyPapers

Program Committee Member:

ML/AI conferences: ICLR, ICML, NeurIPS, AAAI, IJCAI, etc.
CL/NLP conferences: ARR, ACL, EMNLP, COLING, NAACL, AACL, NLPCC, CCL, etc.

Journal Reviewer:

Artificial Intelligence, IEEE/ACM TASLP, IEEE TNNLS, IEEE TETCI, IEEE Communications Magazine, ACM TALLIP, ACM TOIS, TMLR, Neurocomputing, Multimedia Systems, Neural Computing and Applications, Expert Systems With Applications.

Experience

Jul. 2022 - Aug. 2023, Amazon Web Services AI, CA, USA.
Applied Scientist Intern, advised by Dr. Aston Zhang, Mu Li, and Alex Smola.
Feb. 2022 - Jun. 2022, Microsoft Cognitive Services Research Group, WA, USA.
Research Intern, advised by Dr. Shuohang Wang.
Mar. 2021 - Dec. 2021, Langboat Tech, Beijing, China.
Research Intern, advised by Prof. Ming Zhou.
Jun. 2019 - Jul. 2020, NICT, Kyoto, Japan.
Internship Research Fellow, advised by Prof. Rui Wang, Kehai Chen, Masao Utiyama, and Eiichiro Sumita.

Education

Sept. 2020 - Sept. 2023
Ph.D., Dept. of Computer Science and Engineering, Shanghai Jiao Tong University, advised by Prof. Hai Zhao.
Sept. 2016 - Mar. 2020
M.S., Dept. of Computer Science and Engineering, Shanghai Jiao Tong University, advised by Prof. Hai Zhao.
Sept. 2012 - Jun. 2016
B.S., Dept. of Computer Science and Engineering, Wuhan University, advised by Prof. Haojun Ai.

Research Team

PhD Students:

Yijie Lu (incoming)
Yansi Li (2025-)
Zihe Yan (2024-)
Yiming Wang, co-advising with Prof. Rui Wang (2023-)
Pengzhou Cheng, co-advising with Prof. Gongshen Liu (2022-)
Zongru Wu, co-advising with Prof. Gongshen Liu (2022-)
Haodong Zhao, co-advising with Prof. Gongshen Liu (2021-)
Tianjie Ju, co-advising with Prof. Gongshen Liu (2021-)
Zhiwei He, co-advising with Prof. Rui Wang (2021-)
Xinbei Ma, co-advising with Prof. Hai Zhao (2021-)

Master Students:

2026: Haowen Hu (incoming)
2025: Zheng Wu, Lingxiao Diao, Zhuzuoken Xuan
2024: Lingzhong Dong
2023: Tongxin Yuan, co-advising with Prof. Gongshen Liu ( → Tencent Xuanwu Lab)
2022: Anni Zou, co-advising with Prof. Hai Zhao ( → Alibaba Tongyi Lab)

Alumni:

2026: Yuan Guo
2025: Yiting Wang ( → MS at UCSD), Xuanchang Zhang ( → RA at UIUC)
2024: Yexin Wu ( → MS at UIUC)
2023: Sizhe Zhou ( → MS at UIUC)
2022: Siru Ouyang ( → PhD at UIUC), Jialin Chen ( → PhD at Yale University), Junlong Li ( → MS at SJTU), Dongjie Yang ( → PhD at SJTU), Yuchen He ( → MS at SJTU)
2021: Longxiang Liu ( → MS at ICT/CAS)
2020: Yuwei Wu ( → MS at CMU)