I am a Ph.D. candidate in the Department of Computer Science and Engineering at Shanghai Jiao Tong University, advised by Prof. Hai Zhao. Before that, I received my M.S. degree from Shanghai Jiao Tong University in 2020 and my B.S. degree from Wuhan University in 2016.
My primary research interests are natural language processing and machine learning, with the long-term goal of building intelligent machine learning systems with human-level language comprehension ability that assist humans in an effective, interpretable, and robust way (Survey). In pursuit of this goal, I develop principled methodologies for powering deep neural networks with massive linguistic, commonsense, and multimodal knowledge, in support of real-world application scenarios such as question answering and multi-turn dialogue.
Machine reading comprehension (MRC) aims to teach machines to read and comprehend human languages, which is a long-standing goal of natural language processing (NLP). With the rise of deep neural networks and the evolution of contextualized language models (CLMs), MRC research has experienced two significant breakthroughs. As a phenomenon, MRC and CLMs have had a great impact on the NLP community. In this survey, we provide a comprehensive and comparative review of MRC covering the overall research topics: 1) the origin and development of MRC and CLMs, with a particular focus on the role of CLMs; 2) the impact of MRC and CLMs on the NLP community; 3) the definition, datasets, and evaluation of MRC; 4) the general MRC architecture and technical methods, viewed as a two-stage encoder-decoder solving architecture informed by the human cognitive process; 5) previous highlights, emerging topics, and our empirical analysis, with a special focus on what works in different periods of MRC research. We propose a full-view categorization and new taxonomies on these topics. The primary views we arrive at are that 1) MRC boosts the progress from language processing to understanding; 2) the rapid improvement of MRC systems greatly benefits from the development of CLMs; 3) the theme of MRC is gradually moving from shallow text matching to cognitive reasoning.
@article{zhang2020mrc, title={Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond}, author={Zhang, Zhuosheng and Zhao, Hai and Wang, Rui}, journal={arXiv preprint arXiv:2005.06249}, year={2020} }
Training machines to understand natural language and interact with humans is an elusive and essential task in the field of artificial intelligence. In recent years, a variety of dialogue systems have been designed, driven by the rapid development of deep learning research, especially recent pre-trained language models (PrLMs). Among these studies, the fundamental yet challenging part is dialogue comprehension, whose role is to teach machines to read and comprehend the dialogue context before responding. In this paper, we review previous methods from the perspective of dialogue modeling. We summarize the characteristics and challenges of dialogue comprehension in contrast to plain-text reading comprehension. Then, we discuss three typical patterns of dialogue modeling that are widely used in dialogue comprehension tasks such as response selection and conversational question answering, as well as dialogue-related language modeling techniques to enhance PrLMs in dialogue scenarios. Finally, we highlight the technical advances in recent years, point out the lessons we can learn from the empirical analysis, and discuss the prospects for a new frontier of research.
@article{zhang2021advances, title={Advances in Multi-turn Dialogue Comprehension: A Survey}, author={Zhang, Zhuosheng and Zhao, Hai}, journal={arXiv preprint arXiv:2103.03125}, year={2021} }
Although pre-trained language models (PLMs) have achieved remarkable improvements in a wide range of NLP tasks, they are expensive in terms of time and resources. This calls for the study of training more efficient models with less computation that still ensure impressive performance. Instead of pursuing a larger scale, we are committed to developing lightweight yet more powerful models trained with equal or less computation and friendly to rapid deployment. This technical report releases our pre-trained model called Mengzi, which stands for a family of discriminative, generative, domain-specific, and multimodal pre-trained model variants capable of a wide range of language and vision tasks. Compared with public Chinese PLMs, Mengzi is simple but more powerful. Our lightweight model has achieved new state-of-the-art results on the widely used CLUE benchmark with our optimized pre-training and fine-tuning techniques. Without modifying the model architecture, our model can be easily employed as an alternative to existing PLMs. Our sources are available at https://github.com/Langboat/Mengzi.
@misc{zhang2021mengzi, title={Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese}, author={Zhuosheng Zhang and Hanqing Zhang and Keming Chen and Yuhang Guo and Jingyun Hua and Yulong Wang and Ming Zhou}, year={2021}, eprint={2110.06696}, archivePrefix={arXiv}, primaryClass={cs.CL} }
Machine reading comprehension is a heavily studied research and test field for evaluating new pre-trained language models and fine-tuning strategies, and recent studies have enriched pre-trained models with syntactic, semantic, and other linguistic information to improve model performance. In this paper, we imitate the human reading process of connecting anaphoric expressions and explicitly leverage coreference information to enhance the word embeddings from the pre-trained model, in order to highlight the coreference mentions that must be identified for coreference-intensive question answering in QUOREF, a relatively new dataset that is specifically designed to evaluate the coreference-related performance of a model. We use an additional BERT layer to focus on the coreference mentions and a Relational Graph Convolutional Network (R-GCN) to model the coreference relations. We demonstrate that explicitly incorporating coreference information in the fine-tuning stage performs better than incorporating it when training a pre-trained language model.
@inproceedings{huang2021tracing, title={Tracing Origins: Coref-aware Machine Reading Comprehension}, author={Huang, Baorong and Zhang, Zhuosheng and Zhao, Hai}, booktitle={The 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)}, year={2022} }
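A minimal sketch of the coreference-relation modeling idea above: a single relational graph convolution layer that propagates information along coreference edges between token representations from a pre-trained encoder. The class name, the toy adjacency, and the shapes are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch (not the paper's code): one relational graph convolution
# layer that propagates information along coreference edges between token
# representations produced by a pre-trained encoder.
import torch
import torch.nn as nn


class RelGraphConvLayer(nn.Module):
    """Single R-GCN layer: h_i' = ReLU(W0 h_i + sum_r sum_{j in N_r(i)} A_r[i,j] W_r h_j)."""

    def __init__(self, hidden_size: int, num_relations: int):
        super().__init__()
        self.self_loop = nn.Linear(hidden_size, hidden_size)
        self.rel_weights = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size, bias=False) for _ in range(num_relations)
        )

    def forward(self, hidden, rel_adj):
        # hidden:  (batch, seq_len, hidden_size) token representations
        # rel_adj: (batch, num_relations, seq_len, seq_len) row-normalized adjacency
        out = self.self_loop(hidden)
        for r, w in enumerate(self.rel_weights):
            out = out + torch.bmm(rel_adj[:, r], w(hidden))
        return torch.relu(out)


if __name__ == "__main__":
    batch, seq_len, hidden_size = 2, 6, 16
    hidden = torch.randn(batch, seq_len, hidden_size)
    # Toy coreference graph: tokens 1 and 4 corefer (relation 0), symmetric edges.
    rel_adj = torch.zeros(batch, 1, seq_len, seq_len)
    rel_adj[:, 0, 1, 4] = 1.0
    rel_adj[:, 0, 4, 1] = 1.0
    layer = RelGraphConvLayer(hidden_size, num_relations=1)
    print(layer(hidden, rel_adj).shape)  # torch.Size([2, 6, 16])
```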
A tangled multi-party dialogue context leads to challenges for dialogue reading comprehension: multiple dialogue threads flow simultaneously within the same dialogue history, increasing the difficulty of understanding that history for both humans and machines. Dialogue disentanglement aims to clarify conversation threads in a multi-party dialogue history, thus reducing the difficulty of comprehending the long, disordered dialogue passage. Existing studies commonly focus on utterance encoding with carefully designed feature-engineering-based methods but pay inadequate attention to dialogue structure. This work designs a novel model to disentangle a multi-party history into threads by taking dialogue structure features into account. Specifically, based on the fact that dialogues are constructed through successive participation of speakers and interactions between users of interest, we extract clues of speaker properties and user references to model the structure of a long dialogue record. The novel method is evaluated on the Ubuntu IRC dataset and shows state-of-the-art experimental results on dialogue disentanglement.
@inproceedings{ma2022structural, title={Structural Modeling for Dialogue Disentanglement}, author={Ma, Xinbei and Zhang, Zhuosheng and Zhao, Hai}, booktitle={The 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)}, year={2022} }
Training dense passage representations via contrastive learning has been shown to be effective for Open-Domain Passage Retrieval (ODPR). Existing studies focus on further optimization by improving the negative sampling strategy or adding extra pre-training. However, these studies leave unresolved the internal representation conflicts within a passage that stem from improper modeling granularity. This work thus presents a refined model based on a smaller granularity, contextual sentences, to alleviate these conflicts. In detail, we introduce an in-passage negative sampling strategy to encourage diverse sentence representations within the same passage. Experiments on three benchmark datasets verify the efficacy of our method, especially on datasets where conflicts are severe. Extensive experiments further show good transferability of our method across datasets.
@inproceedings{wu2022sentence, title={Sentence-aware Contrastive Learning for Open-Domain Passage Retrieval}, author={Wu, Bohong and Zhang, Zhuosheng and Wang, Jinyuan and Zhao, Hai}, booktitle={The 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)}, year={2022} }
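A hedged sketch of the in-passage negative sampling idea described above, written as an InfoNCE-style objective: for each query, one sentence vector from its gold passage is the positive, while the remaining sentences of the same passage (and the sentences of other passages in the batch) act as negatives. Shapes, names, and the temperature value are illustrative assumptions.

```python
# Hedged sketch of an InfoNCE-style objective with in-passage negatives.
import torch
import torch.nn.functional as F


def in_passage_contrastive_loss(query, sentences, positive_idx, temperature=0.05):
    # query:        (batch, dim) query embeddings
    # sentences:    (batch, num_sents, dim) sentence embeddings of gold passages
    # positive_idx: (batch,) index of the positive sentence inside each passage
    batch, num_sents, dim = sentences.shape
    q = F.normalize(query, dim=-1)
    s = F.normalize(sentences, dim=-1).reshape(batch * num_sents, dim)
    # Similarity of every query against every sentence in the batch.
    logits = q @ s.t() / temperature                  # (batch, batch * num_sents)
    targets = torch.arange(batch) * num_sents + positive_idx
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    q = torch.randn(4, 32)
    sents = torch.randn(4, 3, 32)
    pos = torch.tensor([0, 2, 1, 0])
    print(in_passage_contrastive_loss(q, sents, pos).item())
```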
Recently, the robustness of pre-trained language models (PrLMs) has received increasing research interest. Recent studies on adversarial attacks achieve high attack success rates against PrLMs, claiming that PrLMs are not robust. However, we find that the adversarial samples on which PrLMs fail are mostly non-natural and do not appear in reality. We question the validity of the current evaluation of PrLM robustness based on these non-natural adversarial samples and propose an anomaly detector to evaluate the robustness of PrLMs with more natural adversarial samples. We also investigate two applications of the anomaly detector: (1) in data augmentation, we employ the anomaly detector to force generating augmented data that are distinguished as non-natural, which brings larger gains to the accuracy of PrLMs; (2) we apply the anomaly detector to a defense framework to enhance the robustness of PrLMs. It can be used to defend against all types of attacks and achieves higher accuracy on both adversarial samples and compliant samples than other defense frameworks.
@inproceedings{wang2022distinguishing, title={Distinguishing Non-natural from Natural Adversarial Samples for More Robust Pre-trained Language Model}, author={Wang, Jiayi and Bao, Rongzhou and Zhang, Zhuosheng and Zhao, Hai}, booktitle={Findings of the Association for Computational Linguistics: ACL 2022}, year={2022} }
Recent pre-trained language models (PrLMs) offer a new, performant method of contextualized word representation by leveraging sequence-level context for modeling. Although PrLMs generally provide more effective contextualized word representations than non-contextualized models, they are still limited to a sequence of text contexts without diverse hints from multimodality. This paper thus proposes a visual representation method to explicitly enhance conventional word embeddings with multiple-aspect senses from visual guidance. In detail, we build a small-scale word-image dictionary from a multimodal seed dataset in which each word corresponds to diverse related images. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and generalization capability of the proposed approach. Analysis shows that our method with visual guidance pays more attention to content words, improves representation diversity, and is potentially beneficial for enhancing disambiguation accuracy.
@ARTICLE{9627795, author={Zhang, Zhuosheng and Yu, Haojie and Zhao, Hai and Utiyama, Masao}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, title={Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles}, year={2022}, volume={30}, number={}, pages={49-59}, doi={10.1109/TASLP.2021.3130972}}
This paper presents a novel method to generate answers for non-extractive machine reading comprehension (MRC) tasks, whose answers cannot simply be extracted as one span from the given passages. Using a pointer-network-style extractive decoder for this type of MRC may result in unsatisfactory performance when the ground-truth answers are given by human annotators or heavily paraphrased from parts of the passages. On the other hand, using a generative decoder cannot guarantee that the resulting answers have well-formed syntax and semantics when long sentences are involved. Therefore, to alleviate the obvious drawbacks of both sides, we propose a method that composes answers from extracted multi-spans, which are learned by our model as highly confident n-gram candidates in the given passage. That is, the returned answers are composed of discontinuous multi-spans rather than just one consecutive span in the given passages. The proposed method is simple but effective: empirical experiments on MS MARCO show that it performs better at accurately generating long answers and substantially outperforms two typical competitive one-span and Seq2Seq baseline decoders.
@ARTICLE{9664340, author={Zhang, Zhuosheng and Zhang, Yiqing and Zhao, Hai}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, title={Syntax-Aware Multi-Spans Generation for Reading Comprehension}, year={2022}, volume={30}, number={}, pages={260-268}, doi={10.1109/TASLP.2021.3138679}}
Multi-choice Machine Reading Comprehension (MRC) requires models to decide the correct answer from a set of answer options when given a passage and a question. Thus, in addition to a powerful Pre-trained Language Model (PrLM) as an encoder, multi-choice MRC especially relies on a matching network design that is supposed to effectively capture the relationships among the triplet of passage, question, and answers. While newer and more powerful PrLMs have shown their strength even without the support of a matching network, we propose a new DUal Multi-head Co-Attention (DUMA) model. It is inspired by the human transposition thinking process for solving multi-choice MRC: considering each other's focus from the standpoints of the passage and the question. The proposed DUMA is shown to be effective and capable of generally promoting PrLMs. Our method is evaluated on two benchmark multi-choice MRC tasks, DREAM and RACE. The results show that, even on top of powerful PrLMs, DUMA can further boost the models to obtain higher performance.
@ARTICLE{9664302, author={Zhu, Pengfei and Zhang, Zhuosheng and Zhao, Hai and Li, Xiaoguang}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, title={DUMA: Reading Comprehension With Transposition Thinking}, year={2022}, volume={30}, number={}, pages={269-279}, doi={10.1109/TASLP.2021.3138683}}
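An illustrative sketch of a dual co-attention block in the spirit of DUMA (not the released implementation): the passage representation attends to the question-plus-option representation and vice versa, and the two pooled views are fused for answer scoring. Module names, pooling, and sizes are assumptions.

```python
# Illustrative dual co-attention block: passage attends to question and
# question attends to passage; the two pooled views are fused for scoring.
import torch
import torch.nn as nn


class DualCoAttention(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int = 8):
        super().__init__()
        self.p2q = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.q2p = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.scorer = nn.Linear(2 * hidden_size, 1)

    def forward(self, passage, question):
        # passage:  (batch, p_len, hidden) encoder outputs for the passage
        # question: (batch, q_len, hidden) encoder outputs for question + option
        p_view, _ = self.p2q(passage, question, question)   # passage attends to question
        q_view, _ = self.q2p(question, passage, passage)    # question attends to passage
        fused = torch.cat([p_view.mean(dim=1), q_view.mean(dim=1)], dim=-1)
        return self.scorer(fused).squeeze(-1)               # one score per option


if __name__ == "__main__":
    block = DualCoAttention(hidden_size=64)
    score = block(torch.randn(2, 50, 64), torch.randn(2, 20, 64))
    print(score.shape)  # torch.Size([2])
```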
Multi-choice Machine Reading Comprehension (MRC) is a challenging task that requires a model to select the most appropriate answer from a set of candidates given a passage and a question. Most existing research focuses on modeling the task datasets without explicitly referring to external fine-grained knowledge sources, which could greatly make up for the deficiency of the given passage. Thus we propose a novel reference-based knowledge enhancement model called Reference Knowledgeable Network (RekNet), which refines critical information from the passage and quotes explicit knowledge when necessary. In detail, RekNet refines fine-grained critical information, defined as the Reference Span, and then quotes explicit knowledge quadruples using the co-occurrence information of the Reference Span and the candidates. The proposed RekNet is evaluated on three multi-choice MRC benchmarks, RACE, DREAM, and Cosmos QA, and shows consistent and remarkable performance improvements over strong baselines at an observable statistical significance level.
@ARTICLE{9748021, author={Zhao, Yilin and Zhang, Zhuosheng and Zhao, Hai}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, title={Reference Knowledgeable Network for Machine Reading Comprehension}, year={2022}, volume={30}, number={}, pages={1461-1473}, doi={10.1109/TASLP.2022.3164219}}
Conversational machine reading (CMR) requires machines to communicate with humans through multi-turn interactions between two salient dialogue states: decision making and question generation. In open CMR settings, the more realistic scenario, the retrieved background knowledge can be noisy, which results in severe challenges for information transmission. Existing studies commonly train independent or pipelined systems for the two subtasks. However, those methods rely on hard-label decisions to activate question generation, which eventually hinders model performance. In this work, we propose an effective gating strategy that smooths the two dialogue states in a single decoder and bridges decision making and question generation, providing a richer dialogue state reference. Experiments on the OR-ShARC dataset show the effectiveness of our method, which achieves new state-of-the-art results.
@inproceedings{zhang2021oscar, title={Smoothing Dialogue States for Open Conversational Machine Reading}, author={Zhang, Zhuosheng and Ouyang, Siru and Zhao, Hai and Utiyama, Masao and Sumita, Eiichiro}, booktitle={The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)}, year={2021} }
Pre-trained language models (PrLMs) have to carefully manage input units when training on very large texts with vocabularies consisting of millions of words. Previous works have shown that incorporating span-level information over consecutive words in pre-training can further improve the performance of PrLMs. However, given that span-level clues are introduced and fixed in pre-training, previous methods are time-consuming and lack flexibility. To alleviate this inconvenience, this paper presents a novel span fine-tuning method for PrLMs, which allows the span setting to be adaptively determined by the specific downstream task during the fine-tuning phase. In detail, any sentence processed by the PrLM is segmented into multiple spans according to a pre-sampled dictionary. The segmentation information is then passed through a hierarchical CNN module together with the representation outputs of the PrLM to ultimately generate a span-enhanced representation. Experiments on the GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM and, at the same time, offers more flexibility in an efficient way.
@inproceedings{bao2021spanft, title={Span Fine-tuning for Pre-trained Language Models}, author={Bao, Rongzhou and Zhang, Zhuosheng and Zhao, Hai}, booktitle={Findings of the Association for Computational Linguistics: EMNLP 2021}, year={2021} }
Pre-trained language models (PrLMs) have demonstrated superior performance due to their strong ability to learn universal language representations from self-supervised pre-training. However, even with the help of powerful PrLMs, it is still challenging to effectively capture task-related knowledge from dialogue texts, which are enriched by correlations among speaker-aware utterances. In this work, we present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue-exclusive features. To simulate dialogue-like features, we propose two training objectives in addition to the original LM objectives: 1) utterance order restoration, which predicts the order of permuted utterances in the dialogue context; 2) sentence backbone regularization, which regularizes the model to improve the factual correctness of summarized subject-verb-object triplets. Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
@inproceedings{zhang2021structural, title={Structural Pre-training for Dialogue Comprehension}, author={Zhang, Zhuosheng and Zhao, Hai}, booktitle={The 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021)}, year={2021} }
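A toy sketch of the utterance-order-restoration objective described above: utterances are randomly permuted and the model is trained to predict each utterance's original position from its pooled representation. Everything here (including the tiny classification head) is an illustrative assumption rather than the paper's training code.

```python
# Toy sketch of the utterance order restoration objective.
import torch
import torch.nn as nn
import torch.nn.functional as F


def order_restoration_loss(utt_reprs: torch.Tensor, order_head: nn.Linear):
    # utt_reprs: (num_utts, dim) pooled representations of the utterances in order.
    num_utts = utt_reprs.size(0)
    perm = torch.randperm(num_utts)                      # shuffle the utterances
    logits = order_head(utt_reprs[perm])                 # (num_utts, max_positions)
    return F.cross_entropy(logits[:, :num_utts], perm)   # recover original indices


if __name__ == "__main__":
    dim, max_positions = 32, 16
    head = nn.Linear(dim, max_positions)
    print(order_restoration_loss(torch.randn(6, dim), head).item())
```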
Conversational Machine Reading (CMR) aims at answering questions in complicated interactive scenarios. The machine needs to answer questions through interactions with users based on a given rule document, user scenario, and dialogue history, and may even proactively ask questions for clarification if necessary. Namely, the machine needs to respond with Yes, No, or Irrelevant, or to raise a follow-up question for further clarification. To effectively capture the multiple objects in such a challenging task, graph modeling should naturally be adopted; surprisingly, this had not been done before this work, which proposes a dialogue graph modeling framework incorporating two complementary graph models, i.e., an explicit discourse graph and an implicit discourse graph, that respectively capture the explicit and implicit interactions hidden in the rule documents. The proposed model is evaluated on the ShARC benchmark and achieves a new state of the art by being the first to exceed the milestone accuracy score of 80%.
@inproceedings{ouyang2021dialogue, title={Dialogue Graph Modeling for Conversational Machine Reading}, author={Ouyang, Siru and Zhang, Zhuosheng and Zhao, Hai}, booktitle={Findings of the Association for Computational Linguistics: ACL 2021}, year={2021} }
Understanding human language is one of the key themes of artificial intelligence. For language representation, the capacity to effectively model linguistic knowledge from detail-riddled and lengthy texts while getting rid of the noise is essential for improving performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide text modeling by incorporating explicit syntactic constraints into attention mechanisms for better linguistically motivated word representations. In detail, for a Transformer-based encoder built on the self-attention network (SAN), we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. The syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistically inspired representation. The proposed SG-Net is applied to typical Transformer encoders. Extensive experiments on popular benchmark tasks, including machine reading comprehension, natural language inference, and neural machine translation, show the effectiveness of the proposed SG-Net design.
@article{zhang2020sgnet, title={{SG-Net}: Syntax Guided Transformer for Language Representation}, author={Zhang, Zhuosheng and Wu, Yuwei and Zhou, Junru and Duan, Sufeng and Zhao, Hai and Wang, Rui}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, year={2020}, doi={10.1109/TPAMI.2020.3046683}, publisher={IEEE} }
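A minimal sketch of syntax-guided self-attention under one simplified reading of the SDOI constraint: attention logits are masked so that each token attends only to itself and its dependency ancestors. The exact SDOI mask definition follows the paper; the function names, toy parse, and single-head formulation here are assumptions for illustration.

```python
# Simplified syntax-guided attention: mask attention to dependency ancestors + self.
import torch
import torch.nn.functional as F


def sdoi_mask(heads):
    # heads: list where heads[i] is the dependency head of token i (-1 for the root).
    n = len(heads)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        mask[i, i] = True
        j = heads[i]
        while j != -1:          # walk up to the root, marking ancestors
            mask[i, j] = True
            j = heads[j]
    return mask


def syntax_guided_attention(hidden, heads):
    # hidden: (seq_len, dim) token representations from a Transformer encoder.
    scores = hidden @ hidden.t() / hidden.size(-1) ** 0.5
    scores = scores.masked_fill(~sdoi_mask(heads), float("-inf"))
    return F.softmax(scores, dim=-1) @ hidden


if __name__ == "__main__":
    hidden = torch.randn(5, 16)
    heads = [1, -1, 1, 1, 3]     # toy dependency parse: token 1 is the root
    print(syntax_guided_attention(hidden, heads).shape)  # torch.Size([5, 16])
```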
Text encoding is one of the most important steps in Natural Language Processing (NLP). It has been done well by the self-attention mechanism in the current state-of-the-art Transformer encoder, which has brought significant improvements in the performance of many NLP tasks. Though the Transformer encoder may effectively capture general information in its resulting representations, the backbone information, meaning the gist of the input text, is not specifically focused on. In this paper, we propose explicit and implicit text compression approaches to enhance Transformer encoding and evaluate models using these approaches on several typical downstream tasks that rely heavily on the encoding. Our explicit text compression approaches use dedicated models to compress the text, while our implicit text compression approach simply adds an additional module to the main model to handle text compression. We propose three ways of integration, namely backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the backbone information into Transformer-based models for various downstream tasks. Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results over strong baselines. We therefore conclude that, compared with the baseline encodings, text compression helps the encoders learn better language representations.
@article{li2021text, title={Text Compression-aided Transformer Encoding}, author={Li, Zuchao and Zhang, Zhuosheng and Zhao, Hai and Wang, Rui and Chen, Kehai and Utiyama, Masao and Sumita, Eiichiro}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, year={2021}, publisher={IEEE} }
Multi-turn dialogue reading comprehension aims to teach machines to read dialogue contexts and solve tasks such as response selection and question answering. The major challenges involve noisy history contexts and the special prerequisite of commonsense knowledge that is unseen in the given material. Existing works mainly focus on context-response matching approaches. This work thus makes the first attempt to tackle the above two challenges by extracting substantially important turns as pivot utterances and utilizing external knowledge to enhance the representation of the context. We propose a pivot-oriented deep selection model (PoDS) on top of Transformer-based language models for dialogue comprehension. In detail, our model first picks out the pivot utterances from the conversation history according to their semantic matching with the candidate response or question, if any. Besides, knowledge items related to the dialogue context are extracted from a knowledge graph as external knowledge. Then, the pivot utterances and the external knowledge are combined with a well-designed mechanism for refining predictions. Experimental results on four dialogue comprehension benchmark tasks show that our proposed model achieves substantial improvements over baselines. A series of empirical comparisons is conducted to show how our selection strategies and the extra knowledge injection influence the results.
@article{zhang2021kkt, title={Multi-turn Dialogue Reading Comprehension with Pivot Turns and Knowledge}, author={Zhang, Zhuosheng and Li, Junlong and Zhao, Hai}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, year={2021}, }
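An illustrative sketch of the pivot-utterance selection step described above: each history utterance is scored against the candidate response by cosine similarity, and the top-k are kept as pivots. The encoder producing the vectors is assumed to exist elsewhere; all names here are illustrative, not from the released code.

```python
# Pivot-utterance selection by semantic matching (cosine similarity, top-k).
import torch
import torch.nn.functional as F


def select_pivot_utterances(utt_vecs: torch.Tensor, response_vec: torch.Tensor, k: int = 2):
    # utt_vecs: (num_utts, dim) pooled utterance vectors; response_vec: (dim,)
    sims = F.cosine_similarity(utt_vecs, response_vec.unsqueeze(0), dim=-1)
    top = torch.topk(sims, k=min(k, utt_vecs.size(0)))
    return top.indices, top.values


if __name__ == "__main__":
    utts = torch.randn(5, 32)
    resp = torch.randn(32)
    idx, scores = select_pivot_utterances(utts, resp, k=2)
    print(idx.tolist(), [round(s, 3) for s in scores.tolist()])
```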
Machine reading comprehension (MRC) is an AI challenge that requires machines to determine the correct answers to questions based on a given passage. MRC systems must not only answer questions when necessary but also distinguish when no answer is available according to the given passage and then tactfully abstain from answering. When unanswerable questions are involved in the MRC task, an essential verification module called a verifier is especially required in addition to the encoder, though the latest practice on MRC modeling still mostly benefits from adopting well pre-trained language models as the encoder block and focusing only on the "reading". This paper is devoted to exploring better verifier design for the MRC task with unanswerable questions. Inspired by how humans solve reading comprehension questions, we propose a retrospective reader (Retro-Reader) that integrates two stages of reading and verification strategies: 1) sketchy reading, which briefly investigates the overall interactions of passage and question and yields an initial judgment; 2) intensive reading, which verifies the answer and gives the final prediction. The proposed reader is evaluated on two benchmark MRC challenge datasets, SQuAD2.0 and NewsQA, achieving new state-of-the-art results. Significance tests show that our model is significantly better than the strong ELECTRA and ALBERT baselines. A series of analyses is also conducted to interpret the effectiveness of the proposed reader.
@inproceedings{zhang2021retro, title={Retrospective Reader for Machine Reading Comprehension}, author={Zhang, Zhuosheng and Yang, Junjie and Zhao, Hai}, booktitle={The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021)}, year={2021} }
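A hedged sketch of the read-then-verify decision idea: a sketchy-reading score for "unanswerable" is combined with the intensive reader's null-vs-span score margin, and the model abstains when the combined score exceeds a threshold. The specific weights, threshold, and function names are illustrative assumptions, not the paper's exact verification formula.

```python
# Hedged sketch of combining front and rear verification signals for
# answerability; weights and threshold are illustrative only.
def rear_verification(front_null_logit: float,
                      null_span_score: float,
                      best_span_score: float,
                      beta1: float = 0.5,
                      beta2: float = 0.5,
                      threshold: float = 0.0) -> bool:
    """Return True if the model should abstain (predict 'no answer')."""
    internal_margin = null_span_score - best_span_score   # intensive reading signal
    combined = beta1 * front_null_logit + beta2 * internal_margin
    return combined > threshold


if __name__ == "__main__":
    # Example: the sketchy reader slightly favors "answerable" (-0.2) but the
    # intensive reader's null score clearly beats its best span (+1.3).
    print(rear_verification(-0.2, 2.0, 0.7))  # True -> abstain
```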
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles. Thus, utterance-aware and speaker-aware clues are supposed to be well captured in models. However, in existing retrieval-based multi-turn dialogue modeling, the pre-trained language models (PrLMs) used as the encoder represent the dialogues coarsely by taking the pairwise dialogue history and candidate response as a whole, so the hierarchical information on either utterance interrelation or speaker roles coupled in such representations is not well addressed. In this work, we propose a novel model to fill this gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history. In detail, we decouple the contextualized word representations via masking mechanisms in a Transformer-based PrLM, making each word focus only on the words in the current utterance, other utterances, or the two speaker roles (i.e., utterances of the sender and utterances of the receiver), respectively. Experimental results show that our method substantially boosts the strong ELECTRA baseline on four public benchmark datasets and achieves new state-of-the-art performance over previous methods. A series of ablation studies is conducted to demonstrate the effectiveness of our method.
@inproceedings{liu2021filling, title={Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue}, author={Liu, Longxiang and Zhang, Zhuosheng and Zhao, Hai and Zhou, Xi and Zhou, Xiang}, booktitle={The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021)}, year={2021} }
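An illustrative sketch of the mask-construction idea described above: from per-token utterance and speaker ids, build boolean self-attention masks that restrict each token to (a) its own utterance, (b) other utterances, or (c) utterances of the same speaker, yielding decoupled views of the dialogue. Function and key names are assumptions.

```python
# Build utterance-aware and speaker-aware self-attention masks from token ids.
import torch


def dialogue_masks(utterance_ids: torch.Tensor, speaker_ids: torch.Tensor):
    # utterance_ids, speaker_ids: (seq_len,) integer ids per token.
    same_utt = utterance_ids.unsqueeze(0) == utterance_ids.unsqueeze(1)
    same_spk = speaker_ids.unsqueeze(0) == speaker_ids.unsqueeze(1)
    return {
        "current_utterance": same_utt,       # attend within the current utterance
        "other_utterances": ~same_utt,       # attend only across utterances
        "same_speaker": same_spk,            # attend to the same speaker's tokens
    }


if __name__ == "__main__":
    utt = torch.tensor([0, 0, 1, 1, 2, 2])   # three utterances, two tokens each
    spk = torch.tensor([0, 0, 1, 1, 0, 0])   # speakers alternate
    masks = dialogue_masks(utt, spk)
    print(masks["same_speaker"].int())
```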
Though visual information has been introduced for enhancing neural machine translation (NMT), its effectiveness strongly relies on the availability of large amounts of bilingual parallel sentence pairs with manual image annotations. In this paper, we present a universal visual representation learned over monolingual corpora with image annotations, which overcomes the lack of large-scale bilingual sentence-image pairs, thereby extending image applicability in NMT. In detail, a group of images with topics similar to the source sentence is retrieved from a light topic-image lookup table learned over the existing sentence-image pairs and then encoded into image representations by a pre-trained ResNet. An attention layer with gated weighting fuses the visual information and text information as input to the decoder for predicting target translations. In particular, the proposed method enables the visual information to be integrated into large-scale text-only NMT in addition to multimodal NMT. Experiments on four widely used translation datasets, including WMT'16 English-to-Romanian, WMT'14 English-to-German, WMT'14 English-to-French, and Multi30K, show that the proposed approach achieves significant improvements over strong baselines.
@inproceedings{zhang2020neural, title={Neural Machine Translation with Universal Visual Representation}, author={Zhuosheng Zhang and Kehai Chen and Rui Wang and Masao Utiyama and Eiichiro Sumita and Zuchao Li and Hai Zhao}, booktitle={International Conference on Learning Representations}, year={2020}, url={https://openreview.net/forum?id=Byl8hhNYPS} }
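A minimal sketch of attention-plus-gated fusion between text states and retrieved image features (assumed to be precomputed, e.g. by a ResNet); the lookup-table retrieval itself is omitted. All class and dimension names are illustrative, not the paper's released code.

```python
# Gated fusion of attended visual context into source-side text representations.
import torch
import torch.nn as nn


class GatedVisualFusion(nn.Module):
    def __init__(self, text_dim: int, image_dim: int):
        super().__init__()
        self.img_proj = nn.Linear(image_dim, text_dim)
        self.gate = nn.Linear(2 * text_dim, text_dim)

    def forward(self, text_states, image_feats):
        # text_states: (batch, src_len, text_dim); image_feats: (batch, k, image_dim)
        img = self.img_proj(image_feats)                       # (batch, k, text_dim)
        attn = torch.softmax(text_states @ img.transpose(1, 2), dim=-1)
        visual_ctx = attn @ img                                # attended visual context
        gate = torch.sigmoid(self.gate(torch.cat([text_states, visual_ctx], dim=-1)))
        return text_states + gate * visual_ctx                 # gated residual fusion


if __name__ == "__main__":
    fusion = GatedVisualFusion(text_dim=64, image_dim=128)
    out = fusion(torch.randn(2, 10, 64), torch.randn(2, 5, 128))
    print(out.shape)  # torch.Size([2, 10, 64])
```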
The latest work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, existing language representation models, including ELMo, GPT, and BERT, only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor with light fine-tuning and without substantial task-specific modifications. Compared with BERT, SemBERT is as simple in concept but more powerful. It obtains new state-of-the-art results or substantially improves results on ten reading comprehension and language inference tasks.
@inproceedings{zhang2020semantics, title={Semantics-aware {BERT} for Language Understanding}, author={Zhang, Zhuosheng and Wu, Yuwei and Zhao, Hai and Li, Zuchao and Zhang, Shuailiang and Zhou, Xi and Zhou, Xiang}, booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)}, volume={34}, number={05}, pages={9628--9635}, year={2020} }
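A hedged sketch of the semantics-aware fusion idea: embeddings of SRL label sequences (one sequence per predicate view) are aggregated and concatenated with contextual token representations, then projected to a joint dimension. Layer names, the mean aggregation, and the sizes are assumptions for illustration rather than the released SemBERT code.

```python
# Fuse contextual token states with aggregated SRL label embeddings.
import torch
import torch.nn as nn


class SemanticFusion(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int, label_dim: int = 32):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, label_dim)
        self.fuse = nn.Linear(hidden_size + label_dim, hidden_size)

    def forward(self, token_states, srl_labels):
        # token_states: (batch, seq_len, hidden_size) from a BERT-style encoder
        # srl_labels:   (batch, num_views, seq_len) label ids, one row per predicate
        label_repr = self.label_emb(srl_labels).mean(dim=1)   # aggregate the views
        return self.fuse(torch.cat([token_states, label_repr], dim=-1))


if __name__ == "__main__":
    fusion = SemanticFusion(hidden_size=64, num_labels=20)
    out = fusion(torch.randn(2, 12, 64), torch.randint(0, 20, (2, 3, 12)))
    print(out.shape)  # torch.Size([2, 12, 64])
```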
For machine reading comprehension, the capacity to effectively model linguistic knowledge from detail-riddled and lengthy passages while getting rid of the noise is essential for improving performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide text modeling by incorporating explicit syntactic constraints into the attention mechanism for better linguistically motivated word representations. In detail, for a Transformer-based encoder built on the self-attention network (SAN), we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. The syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistically inspired representation. To verify its effectiveness, the proposed SG-Net is applied to the typical pre-trained language model BERT, which is itself based on a Transformer encoder. Extensive experiments on popular benchmarks, including SQuAD 2.0 and RACE, show that the proposed SG-Net design helps achieve substantial performance improvement over strong baselines.
@inproceedings{zhang2020sg, title={{SG-Net}: Syntax-Guided Machine Reading Comprehension}, author={Zhang, Zhuosheng and Wu, Yuwei and Zhou, Junru and Duan, Sufeng and Zhao, Hai and Wang, Rui}, booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)}, pages={9636--9643}, year={2020} }
Multi-choice reading comprehension is a challenging task in which an answer must be selected from a set of candidate options given a passage and a question. Previous approaches usually only calculate a question-aware passage representation and ignore the passage-aware question representation when modeling the relationship between passage and question, which obviously cannot make the best use of the information between passage and question. In this work, we propose a dual co-matching network (DCMN) which models the relationships among passage, question, and answer options bidirectionally. Besides, inspired by how humans solve multi-choice questions, we integrate two reading strategies into our model: (i) passage sentence selection, which finds the most salient supporting sentences for answering the question, and (ii) answer option interaction, which encodes the comparison information between answer options. DCMN integrated with the two strategies (DCMN+) obtains state-of-the-art results on five multi-choice reading comprehension datasets from different domains: RACE, SemEval-2018 Task 11, ROCStories, COIN, and MCTest.
@inproceedings{zhang2020dcmn+, title={{DCMN+}: Dual co-matching network for multi-choice reading comprehension}, author={Zhang, Shuailiang and Zhao, Hai and Wu, Yuwei and Zhang, Zhuosheng and Zhou, Xi and Zhou, Xiang}, booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)}, volume={34}, number={05}, pages={9563--9570}, year={2020} }
In this paper, we present a Linguistics Informed Multi-Task BERT (LIMIT-BERT) for learning language representations across multiple linguistic tasks through Multi-Task Learning (MTL). LIMIT-BERT includes five key linguistic syntax and semantics tasks: Part-Of-Speech (POS) tagging, constituent and dependency syntactic parsing, and span and dependency semantic role labeling (SRL). Besides, LIMIT-BERT adopts a linguistics-informed masking strategy, Syntactic and Semantic Phrase Masking, which masks all of the tokens corresponding to a syntactic/semantic phrase. Different from recent Multi-Task Deep Neural Networks (MT-DNN) (Liu et al., 2019), our LIMIT-BERT is linguistically motivated and learns in a semi-supervised manner, which provides large amounts of linguistic-task data on the same scale as the BERT learning corpus. As a result, LIMIT-BERT not only improves performance on linguistic tasks but also benefits from a regularization effect and linguistic information that leads to more general representations, helping the model adapt to new tasks and domains. LIMIT-BERT obtains new state-of-the-art or competitive results on both span and dependency semantic parsing on PropBank benchmarks and on both dependency and constituent syntactic parsing on the Penn Treebank.
@inproceedings{zhou2020limit, title={{LIMIT-BERT}: Linguistics Informed Multi-Task {BERT}}, author={Zhou, Junru and Zhang, Zhuosheng and Zhao, Hai and Zhang, Shuailiang}, booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020", year = "2020", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.findings-emnlp.399", doi = "10.18653/v1/2020.findings-emnlp.399", pages = "4450--4461", }
Pinyin-to-character (P2C) conversion is the core component of a pinyin-based Chinese input method engine (IME). However, the conversion is seriously compromised by the ambiguities of Chinese characters corresponding to pinyin as well as by the predefined fixed vocabularies. To alleviate these inconveniences, we propose a neural P2C conversion model augmented by a large online-updating vocabulary with a target vocabulary sampling mechanism to support open vocabulary learning while the IME is in use. Our experiments show that the proposed approach reduces the decoding time on CPUs by up to 50% on P2C tasks with the same or only a negligible change in conversion accuracy, and the online-updated vocabulary indeed helps our IME effectively follow user input behavior.
@inproceedings{zhang2019acl, title = "Open Vocabulary Learning for Neural {Chinese} Pinyin {IME}", author = "Zhang, Zhuosheng and Huang, Yafang and Zhao, Hai", booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)", url = "https://www.aclweb.org/anthology/P19-1154", pages = "1584--1594", year = "2019", }
Representation learning is the foundation of machine reading comprehension and inference. In state-of-the-art models, character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, the character itself is not a natural minimal linguistic unit for representation or word embedding composition, as it ignores the linguistic coherence of consecutive characters inside a word. This paper presents a general subword-augmented embedding framework for learning and composing computationally derived subword-level representations. We survey a series of unsupervised segmentation methods for subword acquisition and different subword-augmented strategies for text understanding, showing that subword-augmented embedding significantly improves our baselines on various types of text understanding tasks on both English and Chinese benchmarks.
@article{Zhang2019subword, title={Effective Subword Segmentation for Text Comprehension}, author={Zhang, Zhuosheng and Zhao, Hai and Ling, Kangwei and Li, Jiangtong and He, Shexia and Fu, Guohong}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)}, year={2019}, volume={27}, number={11}, pages={1664-1674}, doi={10.1109/TASLP.2019.2922537} }
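A hedged illustration of subword segmentation via greedy longest-match against a given subword vocabulary, a simple stand-in for the unsupervised segmentation schemes (e.g. BPE-style) surveyed above; the toy vocabulary and function name are assumptions.

```python
# Greedy longest-match segmentation of a word into subwords from a fixed vocabulary.
def segment(word: str, vocab: set, unk: str = "<unk>") -> list:
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:                 # no known subword matches here
            pieces.append(unk)
            start += 1
        else:
            pieces.append(word[start:end])
            start = end
    return pieces


if __name__ == "__main__":
    toy_vocab = {"compre", "hens", "ion", "read", "ing"}
    print(segment("comprehension", toy_vocab))  # ['compre', 'hens', 'ion']
    print(segment("reading", toy_vocab))        # ['read', 'ing']
```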
Who did what to whom is a major focus of natural language understanding, and it is precisely the aim of the semantic role labeling (SRL) task. Despite sharing many processing characteristics and even a common task purpose, jointly considering these two related tasks has, surprisingly, never been formally reported in previous work. Thus this paper makes the first attempt to let SRL enhance text comprehension and inference by specifying verbal predicates and their corresponding semantic roles. In terms of deep learning models, our embeddings are enhanced by explicit contextual semantic role labels for more fine-grained semantics. We show that the salient labels can be conveniently added to existing models and significantly improve deep learning models on challenging text comprehension tasks. Extensive experiments on benchmark machine reading comprehension and inference datasets verify that the proposed semantic learning helps our system reach a new state of the art over strong baselines that have been enhanced by well pre-trained language models from the latest progress.
@inproceedings{zhang2019explicit, title = "Explicit Contextual Semantics for Text Comprehension", author = "Zhang, Zhuosheng and Wu, Yuwei and Li, Zuchao and Zhao, Hai", booktitle = "Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33)", year = "2019", }
Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation, where related work simply concatenates the conversation utterances, ignoring the interactions among previous utterances in context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. Then the model matches a response with each refined utterance, and the final matching score is obtained after attentive turn aggregation. Experimental results show our model outperforms the state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.
@inproceedings{zhang2018dua, title = {Modeling Multi-turn Conversation with Deep Utterance Aggregation}, author = {Zhang, Zhuosheng and Li, Jiangtong and Zhu, Pengfei and Zhao, Hai}, booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)}, pages= {3740--3752}, year = {2018}, }
Representation learning is the foundation of machine reading comprehension. In state-of-the-art models, deep learning methods broadly use word- and character-level representations. However, the character is not naturally the minimal linguistic unit. In addition, with a simple concatenation of character and word embeddings, previous models actually give a suboptimal solution. In this paper, we propose to use subwords rather than characters for word embedding enhancement. We also empirically explore different augmentation strategies on subword-augmented embedding to enhance the cloze-style reading comprehension model (the reader). In detail, we present a reader that uses subword-level representation to augment word embeddings with a short list to handle rare words effectively. A thorough examination is conducted to evaluate the comprehensive performance and generalization ability of the proposed reader. Experimental results show that the proposed approach helps the reader significantly outperform the state-of-the-art baselines on various public datasets.
@inproceedings{zhang2018mrc, title = {Subword-augmented Embedding for Cloze Reading Comprehension}, author = {Zhang, Zhuosheng and Huang, Yafang and Zhao, Hai}, booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)}, pages = {1802--1814}, year = {2018}, }
Answering questions from university admission exams (Gaokao in Chinese) is a challenging AI task, since it requires effective representation to capture the complicated semantic relations between questions and answers. In this work, we propose a hybrid neural model for the deep question-answering task on history examinations. Our model employs a cooperative gated neural network to retrieve answers with the assistance of extra labels given by a neural Turing machine labeler. Empirical study shows that the labeler works well with only a small training dataset, and the gated mechanism is good at fetching the semantic representation of lengthy answers. Experiments on question answering demonstrate that the proposed model obtains substantial performance gains over various neural baselines in terms of multiple evaluation metrics.
@inproceedings{zhang2018gaokao, title = {One-shot Learning for Question-Answering in Gaokao History Challenge}, author = {Zhang, Zhuosheng and Zhao, Hai}, booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)}, pages = {449--461}, year = {2018}, }
Baidu Scholarship (Top-10, worldwide)
Global Top 100 Chinese Rising Stars in Artificial Intelligence
National Scholarship (2018 & 2020)
Excellent M.S. Student Scholarship of Yang Yuanqing Education Fund
Ten Academic Stars in SJTU
Organization: co-chair of CCL 2022 Student Seminar.
Journal Reviewer: ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Neurocomputing, Multimedia Systems, Transactions on Machine Learning Research (TMLR), Neural Computing and Applications, Expert Systems With Applications.
Teaching Assistant for "Natural language understanding", Shanghai Jiao Tong University, Spring 2018, Spring 2019, and Spring 2021.
Polishing work takes much time and energy. Though the process is painful, stepping out of the comfort zone makes one stronger, time after time. As the world around us grows impatient and utilitarian, how long has it been since we last settled down to focus? Maybe sit down, have a coffee, pick up the passion, and broaden the scope. -- Reflection, 2021/09
Now, a few words on looking for things. When you go looking for something specific, your chances of finding it are very bad. Because of all the things in the world, you're only looking for one of them. When you go looking for anything at all, your chances of finding it are very good. Because of all the things in the world, you're sure to find some of them. -- The Zero Effect
Never give up the faith. Pass on the torch, and keep the light burning. Keep something in mind long enough, and there will be an echo. (念念不忘，必有回响。) -- The Grandmaster《一代宗师》