Zhuosheng Zhang is a Ph.D. student in the Department of Computer Science and Engineering at Shanghai Jiao Tong University, advised by Prof. Hai Zhao. He was a research intern at NICT from 2019 to 2020, working with Prof. Rui Wang, Kehai Chen, Masao Utiyama, and Eiichiro Sumita. He received his M.S. degree from Shanghai Jiao Tong University in 2020 and his B.S. degree from Wuhan University in 2016.
His research focuses on Question Answering and Dialogue Systems. In particular, he is interested in Machine Reading Comprehension (MRC) and Language Modeling (Survey). He has published more than 20 papers in top-tier NLP/ML/AI conferences and journals, including ACL, ICLR, AAAI, EMNLP, COLING, TPAMI, TKDE, and TASLP. He has won first place in various MRC shared tasks and leaderboards, such as SQuAD2.0, MuTual, RACE, SNLI, and CCL-CMRC2017. He was the recipient of the Academic Star Award of SJTU (2018), the National College Students of the Year Nomination Award (2016), and the CCF Elite Collegiate Award (2015).
Machine reading comprehension (MRC) aims to teach machines to read and comprehend human languages, which is a long-standing goal of natural language processing (NLP). With the burst of deep neural networks and the evolution of contextualized language models (CLMs), MRC research has experienced two significant breakthroughs. MRC and CLM, as a phenomenon, have a great impact on the NLP community. In this survey, we provide a comprehensive and comparative review of MRC covering overall research topics about 1) the origin and development of MRC and CLM, with a particular focus on the role of CLMs; 2) the impact of MRC and CLM on the NLP community; 3) the definition, datasets, and evaluation of MRC; 4) general MRC architecture and technical methods, viewed as a two-stage encoder-decoder solving architecture inspired by the human cognitive process; 5) previous highlights, emerging topics, and our empirical analysis, in which we especially focus on what works in different periods of MRC research. We propose a full-view categorization and new taxonomies on these topics. The primary views we have arrived at are that 1) MRC boosts the progress from language processing to understanding; 2) the rapid improvement of MRC systems greatly benefits from the development of CLMs; 3) the theme of MRC is gradually moving from shallow text matching to cognitive reasoning. |
@article{zhang2020mrc, title={Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond}, author={Zhang, Zhuosheng and Zhao, Hai and Wang, Rui}, journal={arXiv preprint arXiv:2005.06249}, year={2020} }
Training machines to understand natural language and interact with humans is an elusive and essential task in the field of artificial intelligence. In recent years, a diversity of dialogue systems has been designed with the rapid development of deep learning research, especially the recent pre-trained language models (PrLMs). Among these studies, the fundamental yet challenging part is dialogue comprehension, whose role is to teach the machines to read and comprehend the dialogue context before responding. In this paper, we review the previous methods from the perspective of dialogue modeling. We summarize the characteristics and challenges of dialogue comprehension in contrast to plain-text reading comprehension. Then, we discuss three typical patterns of dialogue modeling that are widely used in dialogue comprehension tasks such as response selection and conversation question answering, as well as dialogue-related language modeling techniques to enhance PrLMs in dialogue scenarios. Finally, we highlight the technical advances in recent years and point out the lessons we can learn from the empirical analysis and the prospects towards a new frontier of research. |
@article{zhang2021advances, title={Advances in Multi-turn Dialogue Comprehension: A Survey}, author={Zhang, Zhuosheng and Zhao, Hai}, journal={arXiv preprint arXiv:2103.03125}, year={2021} }
Multi-turn dialogue reading comprehension aims to teach machines to read dialogue contexts and solve tasks such as response selection and answering questions. The major challenges involve noisy history contexts and the special prerequisite of commonsense knowledge that is unseen in the given material. Existing works mainly focus on context and response matching approaches. This work thus makes the first attempt to tackle the above two challenges by extracting substantially important turns as pivot utterances and utilizing external knowledge to enhance the representation of context. We propose a pivot-oriented deep selection model (PoDS) on top of Transformer-based language models for dialogue comprehension. In detail, our model first picks out the pivot utterances from the conversation history according to the semantic matching with the candidate response or question, if any. Besides, knowledge items related to the dialogue context are extracted from a knowledge graph as external knowledge. Then, the pivot utterances and the external knowledge are combined with a well-designed mechanism for refining predictions. Experimental results on four dialogue comprehension benchmark tasks show that our proposed model achieves great improvements over baselines. A series of empirical comparisons are conducted to show how our selection strategies and the extra knowledge injection influence the results. |
@article{zhang2021kkt, title={Multi-turn Dialogue Reading Comprehension with Pivot Turns and Knowledge}, author={Zhang, Zhuosheng and Li, Junlong and Zhao, Hai}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, year={2021}, }
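The pivot-selection step can be pictured with a small sketch: history utterances are ranked by semantic similarity to the candidate response and the top ones are kept in dialogue order. The function names, the cosine-similarity scorer, and the toy vectors below are assumptions for exposition, not the PoDS implementation.

```python
import numpy as np

def select_pivots(utterance_vecs, response_vec, top_k=2):
    """Rank history utterances by cosine similarity to the candidate response."""
    sims = []
    for u in utterance_vecs:
        sims.append(float(np.dot(u, response_vec) /
                          (np.linalg.norm(u) * np.linalg.norm(response_vec) + 1e-8)))
    order = np.argsort(sims)[::-1]          # most similar first
    return sorted(order[:top_k].tolist())   # keep original dialogue order

history = [np.random.rand(4) for _ in range(3)]   # toy utterance embeddings
response = np.random.rand(4)                      # toy candidate-response embedding
print(select_pivots(history, response, top_k=2))  # e.g. [0, 2]
```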
Understanding human language is one of the key themes of artificial intelligence. For language representation, the capacity of effectively modeling the linguistic knowledge from the detail-riddled and lengthy texts and getting rid of the noise is essential to improve its performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide the text modeling by incorporating explicit syntactic constraints into attention mechanisms for better linguistically motivated word representations. In detail, for the self-attention network (SAN) powering the Transformer-based encoder, we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. The syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistically inspired representation. The proposed SG-Net is applied to typical Transformer encoders. Extensive experiments on popular benchmark tasks, including machine reading comprehension, natural language inference, and neural machine translation, show the effectiveness of the proposed SG-Net design. |
@article{zhang2020sgnet, title={{SG-Net}: Syntax Guided Transformer for Language Representation}, author={Zhang, Zhuosheng and Wu, Yuwei and Zhou, Junru and Duan, Sufeng and Zhao, Hai and Wang, Rui}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, year={2020}, doi={10.1109/TPAMI.2020.3046683}, publisher={IEEE} }
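To make the SDOI idea concrete, here is a minimal sketch of masked self-attention where a dependency-of-interest matrix restricts which words each token may attend to. The `softmax` helper, `syntax_guided_attention`, and the toy `dep_mask` are illustrative stand-ins; SG-Net derives the mask from a dependency parser and combines this head with the vanilla self-attention, which is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def syntax_guided_attention(H, dep_mask):
    """H: (n, d) token states; dep_mask[i, j] = 1 if token i may attend to token j."""
    d = H.shape[-1]
    scores = H @ H.T / np.sqrt(d)
    scores = np.where(dep_mask > 0, scores, -1e9)  # block syntactically irrelevant words
    return softmax(scores, axis=-1) @ H

n, d = 5, 8
H = np.random.rand(n, d)
dep_mask = np.eye(n)                   # every token can attend to itself
dep_mask[3, 1] = dep_mask[4, 1] = 1.0  # toy ancestor links from a dependency parse
print(syntax_guided_attention(H, dep_mask).shape)  # (5, 8)
```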
In this paper, we report our discovery on named entity distribution in general word embedding space, which supports an open definition of multilingual named entities rather than the previous closed and constrained definition based on a named entity dictionary, which is usually derived from human labor and relies on scheduled updates. Our initial visualization of monolingual word embeddings indicates that named entities tend to gather together regardless of entity type and language difference, which enables us to model all named entities using a specific geometric structure inside the embedding space, namely, the named entity hypersphere. For the monolingual case, the proposed named entity model gives an open description of diverse named entity types and different languages. For the cross-lingual case, mapping the proposed named entity model provides a novel way to build named entity datasets for resource-poor languages. Finally, the proposed named entity model serves as a very useful clue to significantly enhance state-of-the-art named entity recognition systems in general. |
@article{luo2021open, title={Open Named Entity Modeling from Embedding Distribution}, author={Luo, Ying and Zhao, Hai and Zhang, Zhuosheng and Tang, Bingjie}, journal={IEEE Transactions on Knowledge and Data Engineering}, year={2021}, doi={10.1109/TKDE.2021.3049654}, publisher={IEEE} }
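A minimal sketch of the hypersphere idea, under the assumption that entity embeddings can be summarized by a centroid and a radius covering most of them; the function names, the 95% coverage quantile, and the toy data are illustrative, not the paper's exact fitting procedure.

```python
import numpy as np

def fit_hypersphere(entity_vecs, coverage=0.95):
    """Return a (center, radius) pair covering most entity embeddings."""
    center = entity_vecs.mean(axis=0)
    dists = np.linalg.norm(entity_vecs - center, axis=1)
    return center, float(np.quantile(dists, coverage))

def looks_like_entity(vec, center, radius):
    return np.linalg.norm(vec - center) <= radius

entities = np.random.rand(200, 16) + 2.0          # toy cluster of entity embeddings
center, radius = fit_hypersphere(entities)
print(looks_like_entity(np.random.rand(16) + 2.0, center, radius))  # likely True
print(looks_like_entity(np.zeros(16), center, radius))              # False
```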
Machine reading comprehension (MRC) is an AI challenge that requires machines to determine the correct answers to questions based on a given passage. MRC systems must not only answer questions when necessary but also distinguish when no answer is available according to the given passage and then tactfully abstain from answering. When unanswerable questions are involved in the MRC task, an essential verification module called a verifier is especially required in addition to the encoder, though the latest practice on MRC modeling still mostly benefits from adopting well pre-trained language models as the encoder block by only focusing on the "reading". This paper devotes itself to exploring better verifier design for the MRC task with unanswerable questions. Inspired by how humans solve reading comprehension questions, we propose a retrospective reader (Retro-Reader) that integrates two stages of reading and verification strategies: 1) sketchy reading that briefly investigates the overall interactions of passage and question, and yields an initial judgment; 2) intensive reading that verifies the answer and gives the final prediction. The proposed reader is evaluated on two benchmark MRC challenge datasets, SQuAD2.0 and NewsQA, achieving new state-of-the-art results. Significance tests show that our model is significantly better than the strong ELECTRA and ALBERT baselines. A series of analyses is also conducted to interpret the effectiveness of the proposed reader. |
@inproceedings{zhang2021retro, title={Retrospective Reader for Machine Reading Comprehension}, author={Zhang, Zhuosheng and Yang, Junjie and Zhao, Hai}, booktitle={The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)}, year={2021} }
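The answer/no-answer decision can be illustrated with a tiny sketch that combines an external (sketchy-reading) score and an internal (intensive-reading) score against a threshold; the function name, weights, and threshold below are hypothetical placeholders rather than the tuned values used in the paper.

```python
def rear_verification(sketchy_score, intensive_score, span_answer,
                      weight=0.5, threshold=0.0):
    """Scores > 0 lean toward 'no answer'; otherwise keep the extracted span."""
    verdict = weight * sketchy_score + (1.0 - weight) * intensive_score
    return "" if verdict > threshold else span_answer  # "" = abstain from answering

print(rear_verification(0.8, 0.4, "in 1998"))    # -> "" (abstain)
print(rear_verification(-0.6, -0.2, "in 1998"))  # -> "in 1998"
```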
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles. Thus, utterance- and speaker-aware clues are supposed to be well captured in models. However, in the existing retrieval-based multi-turn dialogue modeling, the pre-trained language models (PrLMs) as encoders represent the dialogues coarsely by taking the pairwise dialogue history and candidate response as a whole, so the hierarchical information on either utterance interrelation or speaker roles coupled in such representations is not well addressed. In this work, we propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history. In detail, we decouple the contextualized word representations by masking mechanisms in the Transformer-based PrLM, making each word focus only on the words in the current utterance, other utterances, and the two speaker roles (i.e., the utterances of the sender and the utterances of the receiver), respectively. Experimental results show that our method boosts the strong ELECTRA baseline substantially on four public benchmark datasets and achieves new state-of-the-art performance over previous methods. A series of ablation studies is conducted to demonstrate the effectiveness of our method. |
@inproceedings{liu2021filling, title={Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue}, author={Liu, Longxiang and Zhang, Zhuosheng and Zhao, Hai and Zhou, Xi and Zhou, Xiang}, booktitle={The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)}, year={2021} }
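A minimal sketch of the decoupling masks described above: given per-token utterance ids and speaker ids, boolean attention masks restrict each token to its own utterance, to other utterances, or to one speaker role. How the masked views are recombined inside the PrLM is omitted, and all names and the toy dialogue are illustrative.

```python
import numpy as np

def build_masks(utt_ids, spk_ids):
    """utt_ids/spk_ids: per-token utterance and speaker ids (same length n)."""
    utt_ids, spk_ids = np.asarray(utt_ids), np.asarray(spk_ids)
    same_utt = utt_ids[:, None] == utt_ids[None, :]      # attend within current utterance
    other_utt = ~same_utt                                 # attend to other utterances only
    sender = np.tile(spk_ids == 0, (len(spk_ids), 1))    # attend to sender tokens only
    receiver = np.tile(spk_ids == 1, (len(spk_ids), 1))  # attend to receiver tokens only
    return same_utt, other_utt, sender, receiver

# Toy dialogue: tokens 0-2 form utterance 0 (sender), tokens 3-5 form utterance 1 (receiver).
same_utt, other_utt, sender, receiver = build_masks([0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1])
print(same_utt.astype(int))
```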
Though visual information has been introduced for enhancing neural machine translation (NMT), its effectiveness strongly relies on the availability of large amounts of bilingual parallel sentence pairs with manual image annotations. In this paper, we present a universal visual representation learned over monolingual corpora with image annotations, which overcomes the lack of large-scale bilingual sentence-image pairs, thereby extending image applicability in NMT. In detail, a group of images with topics similar to the source sentence is retrieved from a light topic-image lookup table learned over the existing sentence-image pairs, and then encoded as image representations by a pre-trained ResNet. An attention layer with a gated weighting fuses the visual information and text information as input to the decoder for predicting target translations. In particular, the proposed method enables the visual information to be integrated into large-scale text-only NMT in addition to multimodal NMT. Experiments on four widely used translation datasets, including the WMT'16 English-to-Romanian, WMT'14 English-to-German, WMT'14 English-to-French, and Multi30K, show that the proposed approach achieves significant improvements over strong baselines. |
@inproceedings{zhang2020neural, title={Neural Machine Translation with Universal Visual Representation}, author={Zhuosheng Zhang and Kehai Chen and Rui Wang and Masao Utiyama and Eiichiro Sumita and Zuchao Li and Hai Zhao}, booktitle={International Conference on Learning Representations}, year={2020}, url={https://openreview.net/forum?id=Byl8hhNYPS} }
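The gated weighting can be sketched as a sigmoid gate that mixes a pooled text vector with a pooled visual vector; the parameter names and shapes below are assumptions for illustration, and the retrieval and ResNet encoding steps are not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(text_vec, image_vec, W_t, W_i):
    """gate = sigmoid(W_t @ text + W_i @ image); output mixes text and visual features."""
    gate = sigmoid(W_t @ text_vec + W_i @ image_vec)
    return (1.0 - gate) * text_vec + gate * image_vec

d = 8
text_vec = np.random.rand(d)    # pooled source-sentence representation
image_vec = np.random.rand(d)   # pooled representation of retrieved topic images
W_t, W_i = np.random.rand(d, d), np.random.rand(d, d)
print(gated_fusion(text_vec, image_vec, W_t, W_i).shape)  # (8,)
```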
The latest work on language representations carefully integrates contextualized features into language model training, which enables a series of successes especially in various machine reading comprehension and natural language inference tasks. However, the existing language representation models, including ELMo, GPT, and BERT, only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor in a light fine-tuning way without substantial task-specific modifications. Compared with BERT, semantics-aware BERT is as simple in concept but more powerful. It obtains new state-of-the-art results or substantially improves results on ten reading comprehension and language inference tasks. |
@inproceedings{zhang2020semantics, title={Semantics-aware {BERT} for Language Understanding}, author={Zhang, Zhuosheng and Wu, Yuwei and Zhao, Hai and Li, Zuchao and Zhang, Shuailiang and Zhou, Xi and Zhou, Xiang}, booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2020)}, volume={34}, number={05}, pages={9628--9635}, year={2020} }
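A minimal sketch of the SemBERT-style fusion, assuming contextual token states are simply concatenated with embeddings of their semantic role labels (one label sequence per predicate, aggregated here by averaging); the label vocabulary, dimensions, and aggregation are toy stand-ins for the learned components.

```python
import numpy as np

label_vocab = {"ARG0": 0, "ARG1": 1, "V": 2, "O": 3}
label_emb = np.random.rand(len(label_vocab), 4)   # toy SRL label embedding table

def fuse(token_reps, label_seqs):
    """token_reps: (n, d) contextual states; label_seqs: per-predicate label lists of length n."""
    ids = np.array([[label_vocab[l] for l in seq] for seq in label_seqs])
    label_feat = label_emb[ids].mean(axis=0)       # aggregate over predicates -> (n, 4)
    return np.concatenate([token_reps, label_feat], axis=-1)

tokens = np.random.rand(3, 8)                      # e.g. "the cat sleeps"
print(fuse(tokens, [["ARG0", "ARG0", "V"]]).shape) # (3, 12)
```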
For machine reading comprehension, the capacity of effectively modeling the linguistic knowledge from the detail-riddled and lengthy passages and getting rid of the noise is essential to improve its performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide the text modeling by incorporating explicit syntactic constraints into the attention mechanism for better linguistically motivated word representations. In detail, for the self-attention network (SAN) powering the Transformer-based encoder, we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. The syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistically inspired representation. To verify its effectiveness, the proposed SG-Net is applied to the typical pre-trained language model BERT, which is based on a Transformer encoder. Extensive experiments on popular benchmarks, including SQuAD 2.0 and RACE, show that the proposed SG-Net design helps achieve substantial performance improvement over strong baselines. |
@inproceedings{zhang2020sg, title={{SG-Net}: Syntax-Guided Machine Reading Comprehension}, author={Zhang, Zhuosheng and Wu, Yuwei and Zhou, Junru and Duan, Sufeng and Zhao, Hai and Wang, Rui}, booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2020)}, pages={9636--9643}, year={2020} }
Multi-choice reading comprehension is a challenging task in which an answer is selected from a set of candidate options given a passage and a question. Previous approaches usually only calculate the question-aware passage representation and ignore the passage-aware question representation when modeling the relationship between passage and question, which obviously cannot make the best use of the information between passage and question. In this work, we propose the dual co-matching network (DCMN), which models the relationship among passage, question, and answer options bidirectionally. Besides, inspired by how humans solve multi-choice questions, we integrate two reading strategies into our model: (i) passage sentence selection, which finds the most salient supporting sentences to answer the question, and (ii) answer option interaction, which encodes the comparison information between answer options. DCMN integrated with the two strategies (DCMN+) obtains state-of-the-art results on five multi-choice reading comprehension datasets from different domains: RACE, SemEval-2018 Task 11, ROCStories, COIN, and MCTest. |
@inproceedings{zhang2020dcmn+, title={{DCMN+}: Dual co-matching network for multi-choice reading comprehension}, author={Zhang, Shuailiang and Zhao, Hai and Wu, Yuwei and Zhang, Zhuosheng and Zhou, Xi and Zhou, Xiang}, booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2020)}, volume={34}, number={05}, pages={9563--9570}, year={2020} }
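The bidirectional matching can be sketched as cross-attention computed in both directions from a single affinity matrix, yielding a question-aware passage view and a passage-aware question view; pooling, option interaction, and sentence selection are omitted, and the names below are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_match(P, Q):
    """P: (m, d) passage states; Q: (n, d) question states."""
    A = P @ Q.T                              # (m, n) affinity matrix
    q_aware_p = softmax(A, axis=1) @ Q       # passage tokens attending over the question
    p_aware_q = softmax(A.T, axis=1) @ P     # question tokens attending over the passage
    return q_aware_p, p_aware_q

P, Q = np.random.rand(6, 8), np.random.rand(4, 8)
print([x.shape for x in dual_match(P, Q)])   # [(6, 8), (4, 8)]
```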
In this paper, we present a Linguistics Informed Multi-Task BERT (LIMIT-BERT) for learning language representations across multiple linguistic tasks by Multi-Task Learning (MTL). LIMIT-BERT includes five key linguistic syntax and semantics tasks: Part-Of-Speech (POS) tagging, constituent and dependency syntactic parsing, and span and dependency semantic role labeling (SRL). Besides, LIMIT-BERT adopts a linguistics-motivated masking strategy, Syntactic and Semantic Phrase Masking, which masks all of the tokens corresponding to a syntactic/semantic phrase. Different from recent Multi-Task Deep Neural Networks (MT-DNN) (Liu et al., 2019), our LIMIT-BERT is linguistically motivated and learned in a semi-supervised manner, which provides large amounts of linguistic-task data on the same scale as the BERT training corpus. As a result, LIMIT-BERT not only improves linguistic task performance but also benefits from a regularization effect and linguistic information that leads to more general representations to help adapt to new tasks and domains. LIMIT-BERT obtains new state-of-the-art or competitive results on both span and dependency semantic parsing on the Propbank benchmarks and both dependency and constituent syntactic parsing on the Penn Treebank. |
@inproceedings{zhou2020limit, title={{LIMIT-BERT}: Linguistics Informed Multi-Task {BERT}}, author={Zhou, Junru and Zhang, Zhuosheng and Zhao, Hai and Zhang, Shuailiang}, booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020", year = "2020", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.findings-emnlp.399", doi = "10.18653/v1/2020.findings-emnlp.399", pages = "4450--4461", }
Pinyin-to-character (P2C) conversion is the core component of the pinyin-based Chinese input method engine (IME). However, the conversion is seriously compromised by the ambiguities of Chinese characters corresponding to pinyin as well as the predefined fixed vocabularies. To alleviate such inconveniences, we propose a neural P2C conversion model augmented by a large online-updating vocabulary with a target vocabulary sampling mechanism to support open vocabulary learning while the IME is working. Our experiments show that the proposed approach reduces the decoding time on CPUs by up to 50% on P2C tasks with the same or only a negligible change in conversion accuracy, and the online-updated vocabulary indeed helps our IME effectively follow user input behavior. |
@inproceedings{zhang2019acl, title = "Open Vocabulary Learning for Neural {Chinese} Pinyin {IME}", author = "Zhang, Zhuosheng and Huang, Yafang and Zhao, Hai", booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)", url = "https://www.aclweb.org/anthology/P19-1154", pages = "1584--1594", year = "2019", }
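A minimal sketch of target-vocabulary sampling, under the assumption that the decoder softmax is restricted to the candidate characters of the current pinyin plus a small random sample of the online-updated vocabulary; the table and ids are toy placeholders, not the IME's data structures.

```python
import numpy as np

def sampled_target_ids(pinyin, candidate_table, vocab_size, sample_size=8, seed=0):
    """Union of the pinyin's candidate character ids and a small random vocabulary sample."""
    rng = np.random.default_rng(seed)
    must_have = set(candidate_table.get(pinyin, []))
    sampled = set(rng.choice(vocab_size, size=sample_size, replace=False).tolist())
    return sorted(must_have | sampled)   # only these ids enter the output softmax

candidate_table = {"ma": [101, 102, 103]}   # toy pinyin-to-character-id mapping
print(sampled_target_ids("ma", candidate_table, vocab_size=5000))
```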
Representation learning is the foundation of machine reading comprehension and inference. In state-of-the-art models, character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, the character itself is not a natural minimal linguistic unit for representation or word embedding composition, as it ignores the linguistic coherence of consecutive characters inside a word. This paper presents a general subword-augmented embedding framework for learning and composing computationally derived subword-level representations. We survey a series of unsupervised segmentation methods for subword acquisition and different subword-augmented strategies for text understanding, showing that subword-augmented embedding significantly improves our baselines in various types of text understanding tasks on both English and Chinese benchmarks. |
@article{Zhang2019subword, title={Effective Subword Segmentation for Text Comprehension}, author={Zhang, Zhuosheng and Zhao, Hai and Ling, Kangwei and Li, Jiangtong and He, Shexia and Fu, Guohong}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)}, year={2019}, volume={27}, number={11}, pages={1664-1674}, doi={10.1109/TASLP.2019.2922537} }
Who did what to whom is a major focus in natural language understanding, which is precisely the aim of the semantic role labeling (SRL) task. Despite sharing a lot of processing characteristics and even task purpose, it is surprising that jointly considering these two related tasks was never formally reported in previous work. Thus this paper makes the first attempt to let SRL enhance text comprehension and inference through specifying verbal predicates and their corresponding semantic roles. In terms of deep learning models, our embeddings are enhanced by explicit contextual semantic role labels for more fine-grained semantics. We show that the salient labels can be conveniently added to existing models and significantly improve deep learning models in challenging text comprehension tasks. Extensive experiments on benchmark machine reading comprehension and inference datasets verify that the proposed semantic learning helps our system reach a new state of the art over strong baselines that have been enhanced by well pre-trained language models from the latest progress. |
@inproceedings{zhang2019explicit, title = "Explicit Contextual Semantics for Text Comprehension", author = "Zhang, Zhuosheng and Wu, Yuwei and Li, Zuchao and Zhao, Hai", booktitle = "Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33)", year = "2019", }
Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation, where related work simply concatenates the conversation utterances, ignoring the interactions among previous utterances in context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. Then the model matches a response with each refined utterance, and the final matching score is obtained after attentive turns aggregation. Experimental results show our model outperforms the state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus. |
@inproceedings{zhang2018dua, title = {Modeling Multi-turn Conversation with Deep Utterance Aggregation}, author = {Zhang, Zhuosheng and Li, Jiangtong and Zhu, Pengfei and Zhao, Hai}, booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)}, pages = {3740--3752}, year = {2018}, }
Representation learning is the foundation of machine reading comprehension. In state-of-the-art models, deep learning methods broadly use word- and character-level representations. However, the character is not naturally the minimal linguistic unit. In addition, with a simple concatenation of character and word embeddings, previous models actually give a suboptimal solution. In this paper, we propose to use subwords rather than characters for word embedding enhancement. We also empirically explore different augmentation strategies on subword-augmented embedding to enhance the cloze-style reading comprehension model. In detail, we present a reader that uses subword-level representation to augment word embedding with a short list to handle rare words effectively. A thorough examination is conducted to evaluate the comprehensive performance and generalization ability of the proposed reader. Experimental results show that the proposed approach helps the reader significantly outperform the state-of-the-art baselines on various public datasets. |
@inproceedings{zhang2018mrc, title = {Subword-augmented Embedding for Cloze Reading Comprehension}, author = {Zhang, Zhuosheng and Huang, Yafang and Zhao, Hai}, booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)}, pages = {1802--1814}, year = {2018}, }
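A minimal sketch of the short-list idea, assuming frequent words keep their word embeddings while rare words fall back to a composition (here, the mean) of subword-piece embeddings; the segmenter, tables, and vectors are toy stand-ins.

```python
import numpy as np

word_emb = {"the": np.ones(4), "cat": np.ones(4) * 2}            # short-list (frequent) words
subword_emb = {"anti": np.ones(4) * 3, "##dote": np.ones(4) * 5} # subword pieces

def embed(word, segment):
    """segment: callable mapping a rare word to its subword pieces."""
    if word in word_emb:                      # frequent word: direct lookup
        return word_emb[word]
    pieces = [subword_emb[p] for p in segment(word) if p in subword_emb]
    return np.mean(pieces, axis=0)            # rare word: compose from subword pieces

print(embed("cat", lambda w: []))                        # -> [2. 2. 2. 2.]
print(embed("antidote", lambda w: ["anti", "##dote"]))   # -> [4. 4. 4. 4.]
```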
Answering questions from university admission exams (Gaokao in Chinese) is a challenging AI task since it requires effective representation to capture complicated semantic relations between questions and answers. In this work, we propose a hybrid neural model for the deep question-answering task on history examinations. Our model employs a cooperative gated neural network to retrieve answers with the assistance of extra labels given by a neural Turing machine labeler. Empirical study shows that the labeler works well with only a small training dataset and the gated mechanism is good at fetching the semantic representation of lengthy answers. Experiments on question answering demonstrate that the proposed model obtains substantial performance gains over various neural model baselines in terms of multiple evaluation metrics. |
@inproceedings{zhang2018gaokao, title = {One-shot Learning for Question-Answering in Gaokao History Challenge}, author = {Zhang, Zhuosheng and Zhao, Hai}, booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)}, pages = {449--461}, year = {2018}, }
Program Committee Member: AAAI (2020, 2021), AACL (2020), ACL (2020, 2021), CCL (2019), COLING (2020), ICLR (2021), IJCAI (2020), NAACL (2019, 2021).
Journal Reviewer: ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
Teaching Assistant for "Natural language understanding (F033569)", Shanghai Jiao Tong University, Spring 2018 and Spring 2019.
National Scholarship for Graduate Students (2018 & 2020), Ministry of Education of P.R. China
Excellent M.S. Student Scholarship of Yang Yuanqing Education Fund, Shanghai Jiao Tong University
Academic Star of Graduate Students (上海交通大学研究生学术之星), Shanghai Jiao Tong University
National Annual Figures Nomination of College Students (中国大学生年度人物提名奖), Ministry of Education of P.R. China
The CCF Elite Collegiate Award (CCF优秀大学生), China Computer Federation
First Prize at 2018 IBM Watson Build Chatbot Competition, IBM China
First Prize at 2018 Jiaxing E-commerce Innovation Competition, Jiaxing Municipal Bureau of Commerce
First Prize at 2017 IBM Hackathon, IBM China
Outstanding Bachelor Thesis Award, Hubei Provincial Department of Education
First Prize in the 2014 & 2015 TI Cup National Internet of Things Competition, CS Committee of the Ministry of Education of P.R. China
Now, a few words on looking for things. When you go looking for something specific, your chances of finding it are very bad. Because of all the things in the world, you're only looking for one of them. When you go looking for anything at all, your chances of finding it are very good. Because of all the things in the world, you're sure to find some of them. -- The Zero Effect
Never give up the faith, pass on the torch, and keep the light burning. 念念不忘，必有回响。 -- The Grandmaster《一代宗师》