Zhuosheng Zhang is a tenure-track assistant professor at Shanghai Jiao Tong University. He received his Ph.D. and M.S. degrees from Shanghai Jiao Tong University in 2023 and 2020, respectively, advised by Prof. Hai Zhao. He was an intern at Amazon Web Services, Microsoft Research Redmond, Langboat Tech, NICT (Japan), and IBM. He serves as a PC member for ARR, ICML, NeurIPS, ICLR, ACL, AAAI, and other venues. He served as an area chair for LREC-COLING 2024 and ICLR 2023 Tiny Papers, and as a co-chair at CCL 2022.
His primary research interests include natural language processing, large language models, and autonomous agents. He has published over 50 papers in top-tier conferences and journals, including TPAMI, ICLR, ACL, AAAI, EMNLP, TNNLS, TASLP, and COLING. He has won first place on various language understanding and reasoning leaderboards, such as SQuAD 2.0, MuTual, RACE, ShARC, and CMRC. He was named an Academic Star at Shanghai Jiao Tong University and was selected as one of the Global Top 100 Chinese Rising Stars in Artificial Intelligence. He received the Baidu Scholarship and the WAIC YunFan Award (Rising Star).
Large language models (LLMs) have dramatically enhanced the field of language intelligence, as evidenced by their strong empirical performance across a spectrum of complex reasoning tasks. Theoretical proofs have further illuminated their emergent reasoning capabilities, showcasing advanced cognitive abilities in linguistic contexts. Critical to their efficacy in handling complex reasoning tasks, LLMs leverage chain-of-thought (CoT) reasoning techniques, which oblige them to formulate intermediate steps en route to deriving an answer. CoT reasoning has been shown not only to amplify reasoning performance but also to enhance interpretability, controllability, and flexibility. In light of these merits, recent research efforts have extended CoT reasoning methodologies to nurture the development of autonomous language agents, which adeptly follow language instructions and execute actions within varied environments. This survey provides a thorough discussion of vital research dimensions, encompassing: (i) the foundational mechanics of CoT techniques, with a focus on elucidating the circumstances and justification behind their efficacy; (ii) the paradigm shift in CoT; and (iii) the rise of language agents fortified by CoT approaches. Prospective research avenues include explorations into generalization, efficiency, customization, scaling, and safety. We hope to offer readers a comprehensive understanding of prevalent research areas such as CoT reasoning and language agents and to illuminate the interconnections among these areas. This paper caters to a wide audience, including beginners seeking comprehensive knowledge of CoT reasoning and language agents, as well as experienced researchers interested in foundational mechanics and cutting-edge discussions on these topics. A repository of related papers is available at https://github.com/Zoeyyao27/CoT-Igniting-Agent.
@article{zhang2023igniting, title={Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents}, author={Zhang, Zhuosheng and Yao, Yao and Zhang, Aston and Tang, Xiangru and Ma, Xinbei and He, Zhiwei and Wang, Yiming and Gerstein, Mark and Wang, Rui and Liu, Gongshen and others}, journal={arXiv preprint arXiv:2311.11797}, year={2023} }
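To make the two CoT prompting paradigms discussed in this survey concrete, here is a minimal sketch in Python. The `call_llm` helper is a hypothetical stand-in for any text-completion API and is not part of the survey or its repository.

```python
# Minimal sketch of the two chain-of-thought (CoT) prompting paradigms.
# `call_llm` is a hypothetical stand-in for a text-completion LLM API.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's completion API."""
    raise NotImplementedError

def zero_shot_cot(question: str) -> str:
    # Paradigm 1: a single trigger phrase elicits step-by-step reasoning.
    prompt = f"Q: {question}\nA: Let's think step by step."
    return call_llm(prompt)

def few_shot_cot(question: str, demos: list[tuple[str, str]]) -> str:
    # Paradigm 2: (question, reasoning chain + answer) demonstrations are
    # prepended before the target question.
    blocks = [f"Q: {q}\nA: {rationale}" for q, rationale in demos]
    blocks.append(f"Q: {question}\nA:")
    return call_llm("\n\n".join(blocks))
```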
Autonomous user interface (UI) agents aim to facilitate task automation by interacting with the user interface without manual intervention. Recent studies have investigated eliciting the capabilities of large language models (LLMs) for effective engagement in diverse environments. To align with the input-output requirements of LLMs, existing approaches are developed under a sandbox setting, where they rely on external tools and application-specific APIs to parse the environment into textual elements and to interpret the predicted actions. Consequently, these approaches often grapple with inference inefficiency and error propagation risks. To mitigate these challenges, we introduce Auto-UI, a multimodal solution that directly interacts with the interface, bypassing the need for environment parsing or reliance on application-dependent APIs. Moreover, we propose a chain-of-action technique -- leveraging a series of intermediate previous action histories and future action plans -- to help the agent decide what action to execute. We evaluate our approach on a new device-control benchmark, AITW, with 30K unique instructions, spanning multi-step tasks such as application operation, web searching, and web shopping. Experimental results show that Auto-UI achieves state-of-the-art performance with an action type prediction accuracy of 90% and an overall action success rate of 74%. Code is publicly available at https://github.com/cooelf/Auto-UI.
@article{zhang2023autoui, title={You Only Look at Screens: Multimodal Chain-of-Action Agents}, author={Zhang, Zhuosheng and Zhang, Aston}, journal={arXiv preprint arXiv:2309.11436}, year={2023} }
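A minimal illustration of the chain-of-action idea follows: the agent conditions on the history of executed actions and a running plan of future actions when deciding the next step. The `Action` dataclass and `format_chain` helper are hypothetical and do not mirror the released Auto-UI code.

```python
# Illustrative sketch of a chain-of-action context: previous action history
# plus planned future actions, serialized into the model's input.

from dataclasses import dataclass

@dataclass
class Action:
    action_type: str          # e.g. "click", "type", "scroll"
    target: str = ""          # e.g. a screen coordinate or text to type

def format_chain(history: list[Action], plan: list[str], goal: str) -> str:
    past = "; ".join(f"{a.action_type}({a.target})" for a in history) or "none"
    future = "; ".join(plan) or "none"
    return (f"Goal: {goal}\n"
            f"Previous actions: {past}\n"
            f"Planned future actions: {future}\n"
            f"Next action:")

# The formatted chain, together with encoded screen features, would be fed to
# the multimodal model, which outputs the next action and an updated plan.
```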
Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have mostly been confined to the language modality with LLMs, which are hard to deploy. To elicit CoT reasoning in multimodality, a possible solution is to fine-tune small language models by fusing the vision and language features to perform CoT reasoning. The key challenge is that those language models tend to generate hallucinated reasoning chains that mislead the answer inference. To mitigate the effect of such mistakes, we propose Multimodal-CoT, which incorporates vision features in a decoupled training framework. The framework separates rationale generation and answer inference into two stages. By incorporating the vision features in both stages, the model is able to generate effective rationales that contribute to answer inference. With Multimodal-CoT, our model under 1 billion parameters outperforms the previous state-of-the-art LLM (GPT-3.5) by 16% (75.17% -> 91.68%) on the ScienceQA benchmark and even surpasses human performance. Code is publicly available at https://github.com/amazon-science/mm-cot.
@article{zhang2023multicot, title={Multimodal Chain-of-Thought Reasoning in Language Models}, author={Zhang, Zhuosheng and Zhang, Aston and Li, Mu and Zhao, Hai and Karypis, George and Smola, Alex}, journal={arXiv preprint arXiv:2302.00923}, year={2023} }
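The decoupled two-stage framework can be sketched as below: stage one generates a rationale from the text and vision features, and stage two infers the answer with the generated rationale appended to the input. `MultimodalModel` is a hypothetical wrapper, not the released mm-cot implementation.

```python
# Sketch of the two-stage Multimodal-CoT framework: rationale generation,
# then answer inference, with vision features injected in both stages.

class MultimodalModel:
    """Hypothetical vision-fused text generator (e.g. a fine-tuned encoder-
    decoder whose encoder also attends over image patch features)."""
    def generate(self, text: str, image_features) -> str:
        raise NotImplementedError

def multimodal_cot(question: str, context: str, image_features,
                   rationale_model: MultimodalModel,
                   answer_model: MultimodalModel) -> str:
    # Stage 1: rationale generation conditioned on vision features.
    rationale = rationale_model.generate(
        f"{context}\nQuestion: {question}", image_features)
    # Stage 2: answer inference, with the generated rationale appended to the
    # textual input and the vision features injected again.
    answer = answer_model.generate(
        f"{context}\nQuestion: {question}\nRationale: {rationale}\nAnswer:",
        image_features)
    return answer
```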
Large language models (LLMs) can perform complex reasoning by generating intermediate reasoning steps. Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has two major paradigms. One leverages a simple prompt like "Let's think step by step" to facilitate step-by-step thinking before answering a question. The other uses a few manual demonstrations one by one, each composed of a question and a reasoning chain that leads to an answer. The superior performance of the second paradigm hinges on the hand-crafting of task-specific demonstrations one by one. We show that such manual efforts may be eliminated by leveraging LLMs with the "Let's think step by step" prompt to generate reasoning chains for demonstrations one by one, i.e., let's think not just step by step, but also one by one. However, these generated chains often come with mistakes. To mitigate the effect of such mistakes, we find that diversity matters for automatically constructing demonstrations. We propose an automatic CoT prompting method: Auto-CoT. It samples questions with diversity and generates reasoning chains to construct demonstrations. On ten public benchmark reasoning tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of the CoT paradigm that requires manual designs of demonstrations. Code is available at https://github.com/amazon-research/auto-cot.
@inproceedings{zhang2023automatic, title={Automatic Chain of Thought Prompting in Large Language Models}, author={Zhang, Zhuosheng and Zhang, Aston and Li, Mu and Smola, Alex}, booktitle={The Eleventh International Conference on Learning Representations (ICLR 2023)}, year={2023} }
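The Auto-CoT recipe can be sketched as follows: embed the question pool, cluster for diversity, pick a representative question per cluster, and let the LLM write its own reasoning chain for each representative with the zero-shot trigger. `embed` and `call_llm` are hypothetical stand-ins; see the repository above for the authors' implementation.

```python
# Illustrative sketch of Auto-CoT demonstration construction:
# diversity-based sampling via clustering + zero-shot CoT chain generation.

import numpy as np
from sklearn.cluster import KMeans

def embed(texts: list[str]) -> np.ndarray:
    """Hypothetical sentence encoder (e.g. a Sentence-BERT model)."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def build_auto_cot_demos(questions: list[str], k: int = 8) -> list[str]:
    vecs = embed(questions)
    km = KMeans(n_clusters=k, random_state=0, n_init=10).fit(vecs)
    demos = []
    for c in range(k):
        members = [i for i, lab in enumerate(km.labels_) if lab == c]
        # Choose the question closest to the cluster centroid as representative.
        centroid = km.cluster_centers_[c]
        best = min(members, key=lambda i: np.linalg.norm(vecs[i] - centroid))
        chain = call_llm(f"Q: {questions[best]}\nA: Let's think step by step.")
        demos.append(f"Q: {questions[best]}\nA: Let's think step by step. {chain}")
    return demos
```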
Automatic summarization generates concise summaries that contain key ideas of source documents. As the most mainstream datasets for the news sub-domain, CNN/DailyMail and BBC XSum have been widely used for performance benchmarking. However, the reference summaries of those datasets turn out to be noisy, mainly in terms of factual hallucination and information redundancy. To address this challenge, we first annotate new expert-written Element-aware test sets following the "Lasswell Communication Model" proposed by Lasswell (1948), allowing reference summaries to focus on more fine-grained news elements objectively and comprehensively. Utilizing the new test sets, we observe the surprising zero-shot summarization ability of LLMs, which addresses the issue of the inconsistent results between human preference and automatic evaluation metrics of LLMs' zero-shot summaries in prior work. Further, we propose a Summary Chain-of-Thought (SumCoT) technique to elicit LLMs to generate summaries step by step, which helps them integrate more fine-grained details of source documents into the final summaries that correlate with the human writing mindset. Experimental results show our method outperforms state-of-the-art fine-tuned PLMs and zero-shot LLMs by +4.33/+4.77 in ROUGE-L on the two datasets, respectively. Dataset and code are publicly available at https://github.com/Alsace08/SumCoT.
@inproceedings{wang2023element, title={Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method}, author={Wang, Yiming and Zhang, Zhuosheng and Wang, Rui}, booktitle={The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)}, year={2023} }
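A sketch of the two-step SumCoT prompting scheme: first elicit fine-grained news elements following the Lasswell model (who, what, when, where, why), then ask for a summary that integrates them. The prompt wording and the `call_llm` helper are illustrative, not the exact templates from the paper.

```python
# Sketch of Summary Chain-of-Thought (SumCoT): element extraction first,
# then element-grounded summary generation.

ELEMENT_QUESTIONS = [
    "Who is the main subject of the article?",
    "What happened?",
    "When did it happen?",
    "Where did it happen?",
    "Why did it happen?",
]

def call_llm(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def sum_cot(document: str) -> str:
    # Step 1: extract element-level details from the source document.
    questions = "\n".join(f"- {q}" for q in ELEMENT_QUESTIONS)
    elements = call_llm(f"{document}\n\nAnswer the following questions "
                        f"based on the article:\n{questions}")
    # Step 2: integrate the extracted elements into a concise summary.
    return call_llm(f"{document}\n\nExtracted details:\n{elements}\n\n"
                    f"Using these details, write a one-sentence summary "
                    f"of the article.")
```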
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios, exhibiting a level of aptitude that approaches, and in some aspects even surpasses, human-level intelligence. Among their numerous skills, the translation abilities of LLMs have received considerable attention. In contrast to traditional machine translation that focuses solely on source-target mapping, LLM-based translation can potentially mimic the human translation process, which takes many preparatory steps to ensure high-quality translation. This work aims to explore this possibility by proposing the MAPS framework, which stands for Multi-Aspect Prompting and Selection. Specifically, we enable LLMs to first analyze the given source text and extract three aspects of translation-related knowledge: keywords, topics, and relevant demonstrations to guide the translation process. To filter out the noisy and unhelpful knowledge, we employ a selection mechanism based on quality estimation. Experiments suggest that MAPS brings significant and consistent improvements over text-davinci-003 and Alpaca on eight translation directions from the latest WMT22 test sets. Our further analysis shows that the extracted knowledge is critical in resolving up to 59% of hallucination mistakes in translation. Code is available at https://github.com/zwhe99/MAPS-mt.
@article{he2023exploring, title={Exploring Human-Like Translation Strategy with Large Language Models}, author={He, Zhiwei and Liang, Tian and Jiao, Wenxiang and Zhang, Zhuosheng and Yang, Yujiu and Wang, Rui and Tu, Zhaopeng and Shi, Shuming and Wang, Xing}, journal={arXiv preprint arXiv:2305.04118}, year={2023} }
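The MAPS pipeline can be sketched as follows: elicit three kinds of translation knowledge (keywords, topics, demonstrations), produce one candidate translation per knowledge type plus a baseline, and keep the candidate preferred by a quality-estimation scorer. `call_llm` and `qe_score` are hypothetical stand-ins for an LLM API and a reference-free QE model, and the prompts paraphrase the method rather than reproduce the paper's templates.

```python
# Sketch of Multi-Aspect Prompting and Selection (MAPS): knowledge-guided
# candidate generation followed by quality-estimation-based selection.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def qe_score(source: str, hypothesis: str) -> float:
    """Hypothetical reference-free quality-estimation scorer."""
    raise NotImplementedError

def maps_translate(source: str, src_lang: str, tgt_lang: str) -> str:
    aspects = {
        "Keywords": f"Extract keyword pairs useful for translating:\n{source}",
        "Topics": f"Describe the topics of the text:\n{source}",
        "Demonstrations": f"Write a related example sentence pair for:\n{source}",
    }
    # Baseline candidate without extra knowledge.
    candidates = [call_llm(f"Translate from {src_lang} to {tgt_lang}:\n{source}")]
    for name, prompt in aspects.items():
        knowledge = call_llm(prompt)
        candidates.append(call_llm(
            f"{name}: {knowledge}\n"
            f"Translate from {src_lang} to {tgt_lang}:\n{source}"))
    # Selection: keep the candidate the QE model scores highest.
    return max(candidates, key=lambda hyp: qe_score(source, hyp))
```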
Open-Domain Question Answering (ODQA) requires models to answer factoid questions with no context given. The common way to approach this task is to train models on a large-scale annotated dataset to retrieve related documents and generate answers based on these documents. In this paper, we show that the ODQA architecture can be dramatically simplified by treating Large Language Models (LLMs) as a knowledge corpus, and we propose a Self-Prompting framework for LLMs to perform ODQA, eliminating the need for training data and an external knowledge corpus. Concretely, we first generate multiple pseudo QA pairs with background passages and one-sentence explanations for these QAs by prompting LLMs step by step, and then leverage the generated QA pairs for in-context learning. Experimental results show our method surpasses previous state-of-the-art methods by an average of +8.8 EM on three widely used ODQA datasets, and even achieves comparable performance with several retrieval-augmented fine-tuned models.
@article{li2022self, title={Self-Prompting Large Language Models for Open-Domain QA}, author={Li, Junlong and Zhang, Zhuosheng and Zhao, Hai}, journal={arXiv preprint arXiv:2212.08635}, year={2022} }
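A sketch of the Self-Prompting idea: the LLM is prompted step by step to write background passages, QA pairs, and short explanations, which are then reused as in-context demonstrations at answering time. `call_llm` and the prompt wording are hypothetical, not the paper's exact templates.

```python
# Sketch of Self-Prompting for ODQA: the LLM generates its own pseudo
# demonstrations, which are reused for in-context learning.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def build_pseudo_demos(topic: str, n: int = 4) -> list[str]:
    demos = []
    for _ in range(n):
        passage = call_llm(f"Write a short factual passage about {topic}.")
        qa = call_llm(f"Passage: {passage}\nWrite a factoid question about "
                      f"the passage and its short answer, as 'Q: ... A: ...'.")
        explanation = call_llm(f"Passage: {passage}\n{qa}\n"
                               f"Explain the answer in one sentence.")
        demos.append(f"{passage}\n{qa}\nExplanation: {explanation}")
    return demos

def answer(question: str, demos: list[str]) -> str:
    # Use the self-generated pseudo QA pairs as in-context demonstrations.
    prompt = "\n\n".join(demos) + f"\n\nQ: {question}\nA:"
    return call_llm(prompt)
```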
Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot -- i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community because it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. With extensive empirical studies, we demonstrate both the effectiveness and limitations of the current version of ChatGPT. We find that ChatGPT performs well on many tasks favoring reasoning capabilities (e.g., arithmetic reasoning), while it still faces challenges when solving specific tasks such as sequence tagging. We additionally provide in-depth analysis through qualitative case studies.
@article{qin2023chatgpt, title={Is ChatGPT a General-Purpose Natural Language Processing Task Solver?}, author={Qin, Chengwei and Zhang, Aston and Zhang, Zhuosheng and Chen, Jiaao and Yasunaga, Michihiro and Yang, Diyi}, journal={arXiv preprint arXiv:2302.06476}, year={2023} }
Masked Language Modeling (MLM) has been widely used as the denoising objective in pre-training language models (PrLMs). Existing PrLMs commonly adopt a Random-Token Masking strategy where a fixed masking ratio is applied and different contents are masked with equal probability throughout the entire training. However, the model may be affected in complicated ways by the pre-training status, which changes as training time goes on. In this paper, we show that such time-invariant MLM settings on masking ratio and masked content are unlikely to deliver an optimal outcome, which motivates us to explore the influence of time-variant MLM settings. We propose two scheduled masking approaches that adaptively tune the masking ratio and masked content in different training stages, which improves the pre-training efficiency and effectiveness, as verified on downstream tasks. Our work is a pioneering study of time-variant masking strategies on ratio and content, and it gives a better understanding of how masking ratio and masked content influence MLM pre-training.
@inproceedings{yang2023learning, title={Learning Better Masking for Better Language Model Pre-training}, author={Yang, Dongjie and Zhang, Zhuosheng and Zhao, Hai}, booktitle={The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)}, year={2023} }
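To illustrate the mechanics of a time-variant masking ratio, here is a self-contained sketch that linearly anneals the ratio over training and masks tokens accordingly. The linear schedule and the specific numbers are only an example; the paper's actual schedules and content-aware masking differ in detail.

```python
# Illustrative sketch of time-variant masking: the masking ratio is a
# function of training progress instead of a fixed constant.

import random

def masking_ratio(step: int, total_steps: int,
                  start: float = 0.30, end: float = 0.15) -> float:
    """Linearly anneal the masking ratio from `start` to `end`."""
    progress = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * progress

def mask_tokens(token_ids: list[int], step: int, total_steps: int,
                mask_id: int) -> tuple[list[int], list[int]]:
    """Randomly replace a scheduled fraction of tokens with the mask id."""
    ratio = masking_ratio(step, total_steps)
    n_mask = max(1, int(len(token_ids) * ratio))
    positions = random.sample(range(len(token_ids)), n_mask)
    masked = list(token_ids)
    for pos in positions:
        masked[pos] = mask_id
    return masked, positions
```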
Representation learning is the foundation of natural language processing (NLP). This work presents new methods to employ visual information as assistant signals for general NLP tasks. For each sentence, we first retrieve a flexible number of images either from a light topic-image lookup table extracted over the existing sentence-image pairs or from a shared cross-modal embedding space that is pre-trained on off-the-shelf text-image pairs. Then, the text and images are encoded by a Transformer encoder and a convolutional neural network, respectively. The two sequences of representations are further fused by an attention layer for the interaction of the two modalities. In this study, the retrieval process is controllable and flexible. The universal visual representation overcomes the lack of large-scale bilingual sentence-image pairs. Our method can be easily applied to text-only tasks without manually annotated multimodal parallel corpora. We apply the proposed method to a wide range of natural language generation and understanding tasks, including neural machine translation, natural language inference, and semantic similarity. Experimental results show that our method is generally effective for different tasks and languages. Analysis indicates that the visual signals enrich textual representations of content words, provide fine-grained grounding information about the relationship between concepts and events, and potentially contribute to disambiguation.
@ARTICLE{zhang2023universal, author={Zhang, Zhuosheng and Chen, Kehai and Wang, Rui and Utiyama, Masao and Sumita, Eiichiro and Li, Zuchao and Zhao, Hai}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, title={Universal Multimodal Representation for Language Understanding}, year={2023}, volume={}, number={}, pages={1-18}, doi={10.1109/TPAMI.2023.3234170}}
Understanding human language is one of the key themes of artificial intelligence. For language representation, the capacity to effectively model linguistic knowledge from detail-riddled and lengthy texts while filtering out noise is essential to improving performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide text modeling by incorporating explicit syntactic constraints into attention mechanisms for better linguistically motivated word representations. In detail, for the Transformer-based encoder built on the self-attention network (SAN), we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. The syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistically inspired representation. The proposed SG-Net is applied to typical Transformer encoders. Extensive experiments on popular benchmark tasks, including machine reading comprehension, natural language inference, and neural machine translation, show the effectiveness of the proposed SG-Net design.
@article{zhang2022sg, title={SG-Net: Syntax Guided Transformer for Language Representation}, author={Zhang, Zhuosheng and Wu, Yuwei and Zhou, Junru and Duan, Sufeng and Zhao, Hai and Wang, Rui}, journal={IEEE Transactions on Pattern Analysis \& Machine Intelligence}, volume={44}, number={06}, pages={3285--3299}, year={2022}, publisher={IEEE Computer Society} }
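A simplified, single-head sketch of syntax-guided self-attention follows: a dependency-derived 0/1 mask (the SDOI) restricts which tokens each position may attend to. This illustrates the mechanism only and is not the SG-Net implementation.

```python
# Minimal sketch of syntax-guided self-attention with an SDOI mask.

import torch

def sdoi_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                   sdoi_mask: torch.Tensor) -> torch.Tensor:
    """
    q, k, v: (seq_len, d) projections of the token representations.
    sdoi_mask: (seq_len, seq_len) with 1 where token i may attend to token j
               (e.g. j is i itself or one of i's dependency ancestors), so
               every row has at least one allowed position.
    """
    d = q.size(-1)
    scores = q @ k.transpose(0, 1) / d ** 0.5           # (seq_len, seq_len)
    scores = scores.masked_fill(sdoi_mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                   # syntax-guided context
```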
Leveraging task-aware annotated data as supervised signals to assist with self-supervised learning on large-scale unlabeled data has become a new trend in pre-training language models. Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks. To tackle the challenge, we propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks. We conduct extensive experiments on 40 datasets, which show that our model can not only serve as a strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships. The task relationships reflected by the prefixes align with transfer learning performance between tasks. They also suggest directions for data augmentation with complementary tasks, which help our model achieve human-parity results on commonsense reasoning leaderboards.
@article{zhang2022task, title={Task Compass: Scaling Multi-task Pre-training with Task Prefix}, author={Zhang, Zhuosheng and Wang, Shuohang and Xu, Yichong and Fang, Yuwei and Yu, Wenhao and Liu, Yang and Zhao, Hai and Zhu, Chenguang and Zeng, Michael}, journal={arXiv preprint arXiv:2210.06277}, year={2022} }
Pre-trained language models (PrLMs) have demonstrated superior performance due to their strong ability to learn universal language representations from self-supervised pre-training. However, even with the help of the powerful PrLMs, it is still challenging to effectively capture task-related knowledge from dialogue texts, which are enriched by correlations among speaker-aware utterances. In this work, we present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue-exclusive features. To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives: 1) utterance order restoration, which predicts the order of the permuted utterances in dialogue context; 2) sentence backbone regularization, which regularizes the model to improve the factual correctness of summarized subject-verb-object triplets. Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
@inproceedings{zhang2021structural, title={Structural Pre-training for Dialogue Comprehension}, author={Zhang, Zhuosheng and Zhao, Hai}, booktitle={The 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021)}, year={2021} }
Although pre-trained models (PLMs) have achieved remarkable improvements in a wide range of NLP tasks, they are expensive in terms of time and resources. This calls for the study of training more efficient models with less computation that still ensure impressive performance. Instead of pursuing a larger scale, we are committed to developing lightweight yet more powerful models trained with equal or less computation and friendly to rapid deployment. This technical report releases our pre-trained model called Mengzi, which stands for a family of discriminative, generative, domain-specific, and multimodal pre-trained model variants capable of a wide range of language and vision tasks. Compared with public Chinese PLMs, Mengzi is simple but more powerful. Our lightweight model has achieved new state-of-the-art results on the widely used CLUE benchmark with our optimized pre-training and fine-tuning techniques. Without modifying the model architecture, our model can be easily employed as an alternative to existing PLMs. Our sources are available at https://github.com/Langboat/Mengzi.
@misc{zhang2021mengzi, title={Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese}, author={Zhuosheng Zhang and Hanqing Zhang and Keming Chen and Yuhang Guo and Jingyun Hua and Yulong Wang and Ming Zhou}, year={2021}, eprint={2110.06696}, archivePrefix={arXiv}, primaryClass={cs.CL} }
Multi-party multi-turn dialogue comprehension brings unprecedented challenges in handling complicated scenarios, as the co-occurrence of multiple speakers causes complexity and inconsistency. As a result of the participation of multiple speakers, the shift of speaker roles and crisscrossed discourse relations among utterances hinder reading comprehension. Motivated by this, we further integrate enhancements of speaker-related features to improve dialogue comprehension performance. This work proposes a novel model with enhancement from both sides of speaker roles and speaker-aware relations. At the token level, we apply a speaker mask for attention, while at the discourse level, we utilize heterogeneous graph networks for comprehensive speaker-aware discourse clues. Experimental results show that our Enhanced Speaker-Aware method (ESA) helps achieve state-of-the-art performance on the Molweni dataset, as well as significant improvements on the FriendsQA dataset. We find that our method makes steady improvements on stronger backbones. Analysis shows that our model enhances the connections between utterances and their own speakers and captures the speaker-aware discourse relations. Discussions on data features and error cases are presented, and a visualized case is displayed. The findings reveal the importance of speaker-aware signals in dialogue comprehension.
@ARTICLE{10147329, author={Ma, Xinbei and Zhang, Zhuosheng and Zhao, Hai}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, title={Enhanced Speaker-aware Multi-party Multi-turn Dialogue Comprehension}, year={2023}, volume={}, number={}, pages={1-16}, doi={10.1109/TASLP.2023.3284516} }
Commonsense fact verification, as a challenging branch of commonsense question answering (QA), aims to verify through facts whether a given commonsense claim is correct or not. Answering commonsense questions necessitates a combination of knowledge from various levels. However, existing studies primarily rest on grasping either unstructured evidence or potential reasoning paths from structured knowledge bases, failing to exploit the benefits of heterogeneous knowledge simultaneously. In light of this, we propose Decker, a commonsense fact verification model that is capable of bridging heterogeneous knowledge by uncovering latent relationships between structured and unstructured knowledge. Experimental results on two commonsense fact verification benchmark datasets, CSQA2.0 and CREAK, demonstrate the effectiveness of Decker, and further analysis verifies its capability to capture more valuable information through reasoning.
@inproceedings{zou2023decker, title={Decker: Double Check with Heterogeneous Knowledge for Commonsense Fact Verification}, author={Zou, Anni and Zhang, Zhuosheng and Zhao, Hai}, booktitle={The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)}, year={2023} }
Machine reading comprehension (MRC) is an AI challenge that requires machines to determine the correct answers to questions based on a given passage. MRC systems must not only answer questions when necessary but also distinguish when no answer is available according to the given passage and then tactfully abstain from answering. When unanswerable questions are involved in the MRC task, an essential verification module, called the verifier, is required in addition to the encoder, though the latest practice on MRC modeling still benefits most from adopting well pre-trained language models as the encoder block while focusing only on the "reading". This paper devotes itself to exploring better verifier design for the MRC task with unanswerable questions. Inspired by how humans solve reading comprehension questions, we propose a retrospective reader (Retro-Reader) that integrates two stages of reading and verification strategies: 1) sketchy reading that briefly investigates the overall interactions of passage and question and yields an initial judgment; 2) intensive reading that verifies the answer and gives the final prediction. The proposed reader is evaluated on two benchmark MRC challenge datasets, SQuAD2.0 and NewsQA, achieving new state-of-the-art results. Significance tests show that our model is significantly better than the strong ELECTRA and ALBERT baselines. A series of analyses is also conducted to interpret the effectiveness of the proposed reader.
@inproceedings{zhang2021retro, title={Retrospective Reader for Machine Reading Comprehension}, author={Zhang, Zhuosheng and Yang, Junjie and Zhao, Hai}, booktitle={The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021)}, year={2021} }
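A simplified sketch of the retrospective reading flow follows: a sketchy pass yields a preliminary no-answer score, an intensive pass yields a span answer plus its own no-answer score, and the two are combined against a threshold to decide whether to abstain. The function names and the combination rule are illustrative, not the exact Retro-Reader formulation.

```python
# Sketch of a two-stage read-then-verify flow for unanswerable questions.

from typing import Optional

def retro_read(question: str, passage: str,
               sketchy_na_score, intensive_reader,
               threshold: float = 0.0) -> Optional[str]:
    # Stage 1 (sketchy reading): coarse judgment of unanswerability.
    ext_na = sketchy_na_score(question, passage)   # higher => likely no answer

    # Stage 2 (intensive reading): span prediction with an internal
    # no-answer score (e.g. null-span logit minus best-span logit).
    span, int_na = intensive_reader(question, passage)

    # Rear verification: combine the two no-answer evidences.
    if ext_na + int_na > threshold:
        return None        # abstain: the question is judged unanswerable
    return span
```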
Machine reading comprehension (MRC) aims to teach machines to read and comprehend human languages, which is a long-standing goal of natural language processing (NLP). With the burst of deep neural networks and the evolution of contextualized language models (CLMs), the research of MRC has experienced two significant breakthroughs. MRC and CLM, as phenomena, have had a great impact on the NLP community. In this survey, we provide a comprehensive and comparative review of MRC covering overall research topics about 1) the origin and development of MRC and CLM, with a particular focus on the role of CLMs; 2) the impact of MRC and CLM on the NLP community; 3) the definition, datasets, and evaluation of MRC; 4) general MRC architecture and technical methods in the view of the two-stage Encoder-Decoder solving architecture from the insights of the cognitive process of humans; 5) previous highlights, emerging topics, and our empirical analysis, among which we especially focus on what works in different periods of MRC research. We propose a full-view categorization and new taxonomies on these topics. The primary views we have arrived at are that 1) MRC boosts the progress from language processing to understanding; 2) the rapid improvement of MRC systems greatly benefits from the development of CLMs; 3) the theme of MRC is gradually moving from shallow text matching to cognitive reasoning.
@article{zhang2020mrc, title={Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond}, author={Zhang, Zhuosheng and Zhao, Hai and Wang, Rui}, journal={arXiv preprint arXiv:2005.06249}, year={2020} }
The latest work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, existing language representation models, including ELMo, GPT, and BERT, only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor in a light fine-tuning way without substantial task-specific modifications. Compared with BERT, SemBERT is as simple in concept but more powerful. It obtains new state-of-the-art results or substantially improves results on ten reading comprehension and language inference tasks.
@inproceedings{zhang2020semantics, title={Semantics-aware BERT for Language Understanding}, author={Zhang, Zhuosheng and Wu, Yuwei and Zhao, Hai and Li, Zuchao and Zhang, Shuailiang and Zhou, Xi and Zhou, Xiang}, booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)}, volume={34}, number={05}, pages={9628--9635}, year={2020} }
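A simplified sketch of the SemBERT idea: token representations from a BERT backbone are concatenated with embeddings of predicted semantic role labels and fused by a linear layer. The dimensions and the fusion choice are illustrative, not the exact SemBERT architecture.

```python
# Sketch of fusing BERT hidden states with semantic role label embeddings.

import torch
import torch.nn as nn

class SemanticsAwareFusion(nn.Module):
    def __init__(self, hidden_dim: int = 768, num_srl_labels: int = 30,
                 srl_dim: int = 10):
        super().__init__()
        self.srl_embed = nn.Embedding(num_srl_labels, srl_dim)
        self.fuse = nn.Linear(hidden_dim + srl_dim, hidden_dim)

    def forward(self, bert_hidden: torch.Tensor,
                srl_label_ids: torch.Tensor) -> torch.Tensor:
        # bert_hidden: (batch, seq_len, hidden_dim) from the BERT encoder.
        # srl_label_ids: (batch, seq_len) predicted semantic role label ids.
        srl = self.srl_embed(srl_label_ids)              # (batch, seq_len, srl_dim)
        joint = torch.cat([bert_hidden, srl], dim=-1)    # concatenate features
        return self.fuse(joint)                          # semantics-aware states
```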
Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation, where prior work simply concatenates the conversation utterances, ignoring the interactions among previous utterances for context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. Then the model matches a response with each refined utterance, and the final matching score is obtained after attentive aggregation over turns. Experimental results show our model outperforms the state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.
@inproceedings{zhang2018dua, title={Modeling Multi-turn Conversation with Deep Utterance Aggregation}, author={Zhang, Zhuosheng and Li, Jiangtong and Zhu, Pengfei and Zhao, Hai}, booktitle={Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)}, pages={3740--3752}, year={2018} }
2023: WAIC YunFan Award, Rising Star, World Artificial Intelligence Conference.
2023: Shanghai Outstanding Doctoral Graduate.
2022: Academic Stars of Graduate Students (10 recipients), Shanghai Jiao Tong University.
2021: Global Top 100 Chinese Rising Stars in Artificial Intelligence (Top 10 recommended), Baidu Research.
2021: Baidu Scholarship (Top-10, worldwide), Baidu.
2020: National Scholarship of China, Ministry of Education of the P.R. China.
2019: Yang Yuanqing Education Fund, the foundation of the CS Class of 1988 at Shanghai Jiao Tong University.
2018: Academic Stars of Graduate Students (The only master student awardee), Shanghai Jiao Tong University.
2016: National Figures Nomination of College Students (20 total recipients), Ministry of Education of the P.R. China.
2015: CCF Elite Collegiate Award, China Computer Federation.