
Hai Zhao

Professor

Department of Computer Science and Engineering,

Shanghai Jiao Tong University

Address: 800 Dongchuan Road, Shanghai

E-mail: zhaohai at cs.sjtu.edu.cn


ACL-2018 Area Chair on Morphology and word segmentation,

ACL-2017 Area Chair on Parsing,

NLPCC 2017 Area Chair on Fundamental NLP,

ACL-2016 Publication Chair


News and information for PhD/Master's applicants

(Photo: Shanghai Jiao Tong University, taken on March 16th, 2010)
 



    

Research Interest

Natural Language Processing, Machine Learning, Data Mining, Bioinformatics and Artificial Intelligence
    

Teaching

Natural Language Processing
    

Shared Tasks

[2010] NEWS-2010 (with Song Yan)

The NEWS-2010 shared task on transliteration
  •           First rank in both the Chinese-English and English-Chinese tasks.
  • The official results are here.

    [2009] CoNLL-2009 (with Chen Wenliang)

    The shared task at the 13th Conference on Computational Natural Language Learning (CoNLL-2009), Syntactic and Semantic Dependencies in Multiple Languages
  •          First in the semantic-only track out of all 20 submitted systems, by average score over all seven languages.
  •          Second in the syntactic-semantic joint track out of 13 submitted systems, by average score over all seven languages;
  •          first in the semantic part of the joint track over all languages,
             and first in the joint track for each of English, Catalan and Spanish.
    The official results of CoNLL-2009 are here, and our system reports are here and here.

    [2008] CoNLL-2008

    The shared task at the 12th Conference on Computational Natural Language Learning (CoNLL-2008), Joint Parsing of Syntactic and Semantic Dependencies
  •           Fourth in the syntactic-semantic joint track out of 20 submitted systems.
  • The official results of CoNLL-2008 are here, and our system report is here.

    [2007] Bakeoff-4

    The Fourth International Chinese Language Processing Bakeoff & the First CIPS Chinese Language Processing Evaluation (Bakeoff-4, Bakeoff-2007, 2008)
  •           All five top results in the closed challenge of the word segmentation task, out of 166 submitted runs from 28 participating teams;
  •          three second-best results and one third-best result in the NER tasks, out of 33 submitted runs.
  • The official results are here, and our system report is here.

    [2006] Bakeoff-3

    The Third International Chinese Word Segmentation Bakeoff (Bakeoff-3, Bakeoff-2006)
  •           Four first ranks and two third ranks in the word segmentation task, out of 101 submitted runs from 29 participating teams.
  • The official results are here, and our system report is here.


        

    Publications

    [2018]

    • Yafang Huang and Hai Zhao*,
      Chinese Pinyin Aided IME, Input What You Have Not Keystroked Yet,
      Proceedings of EMNLP 2018, October 31 - November 4, 2018, Brussels, Belgium.

    • Zuchao Li, Shexia He, Jiaxun Cai, Zhuosheng Zhang, Hai Zhao*, Gongshen Liu, Linlin Li and Luo Si
      A Unified Syntax-aware Framework for Semantic Role Labeling,
      Proceedings of EMNLP 2018, October 31 - November 4, 2018, Brussels, Belgium.

    • Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita and Hai Zhao,
      Exploring Recombination for Efficient Decoding of Neural Machine Translation,
      Proceedings of EMNLP 2018, October 31 - November 4, 2018, Brussels, Belgium.

    • Yingting Wu, Hai Zhao*, Jia-Jun Tong,
      Multilingual Universal Dependency Parsing from Raw Text with Low Resource Language Enhancement,
      Proceedings of CoNLL 2018, October 31 - November 1, 2018, Brussels, Belgium.

    • Zuchao Li, Shexia He, Zhuosheng Zhang, Hai Zhao*,
      Joint Learning for Universal Dependency Parsing,
      Proceedings of CoNLL 2018, October 31 - November 1, 2018, Brussels, Belgium.

    • Yingting Wu, Hai Zhao*,
      Finding Better Subword Segmentation for Neural Machine Translation,
      The Seventeenth China National Conference on Computational Linguistics, CCL 2018, October 19-21, 2018, Changsha, China.
      [PDF]

    • Zhuosheng Zhang, Yafang Huang, Pengfei Zhu, Hai Zhao*,
      Effective Character-augmented Word Embedding for Machine Reading Comprehension,
      Proceedings of The Seventh CCF Conference on Natural Language Processing and Chinese Computing (NLPCC 2018), August 26-30, 2018, Hohhot, China.

    • Pengfei Zhu, Zhuosheng Zhang, Jiangtong Li, Yafang Huang, Hai Zhao*,
      Lingke: A Fine-grained Multi-turn Chatbot for Customer Service,
      Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), System Demonstrations, pp.108-112, August 20-26, 2018, Santa Fe, New Mexico, USA.
      [PDF]

    • Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu, Hai Zhao*, Gongshen Liu
      Modeling Multi-turn Conversation with Deep Utterance Aggregation,
      Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pp.3740-3752, August 20-26, 2018, Santa Fe, New Mexico, USA.
      [PDF]

    • Zhuosheng Zhang, Yafang Huang and Hai Zhao*
      Subword-augmented Embedding for Cloze Reading Comprehension,
      Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pp.1802-1814, August 20-26, 2018, Santa Fe, New Mexico, USA.
      [PDF]

    • Zhuosheng Zhang and Hai Zhao*
      One-shot Learning for Question-Answering in Gaokao History Challenge,
      Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pp.449-461, August 20-26, 2018, Santa Fe, New Mexico, USA.
      [PDF]

    • Hongxiao Bai and Hai Zhao*
      Deep Enhanced Representation for Implicit Discourse Relation Recognition,
      Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pp.571-583, August 20-26, 2018, Santa Fe, New Mexico, USA.
      [PDF]

    • Jiaxun Cai, Shexia He, Zuchao Li and Hai Zhao*
      A Full End-to-End Semantic Role Labeler, Syntax-agnostic or Syntax-aware?
      Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pp.2753-2765, August 20-26, 2018, Santa Fe, New Mexico, USA.
      [PDF]

    • Zuchao Li, Jiaxun Cai, Shexia He and Hai Zhao*
      Seq2seq Dependency Parsing,
      Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pp.3203-3214, August 20-26, 2018, Santa Fe, New Mexico, USA.
      [PDF]

    • Yafang Huang, Zuchao Li, Zhuosheng Zhang, Hai Zhao*
      Neural-based Chinese Pinyin Aided Input Method with Customizable Association,
      Proceedings of ACL 2018, System Demonstrations, pp.140-145, Melbourne, Australia, July 15-20, 2018
      [PDF]

    • Lianhui Qin, Lemao Liu, Victoria Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao and Shuming Shi
      Automatic Article Commenting: the Task and Dataset,
      Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018, Volume 2: Short Papers), pp.151-156, Melbourne, Australia, July 15-20, 2018
      [PDF]

    • Shexia He, Zuchao Li, Hai Zhao*, Hongxiao Bai, Gongshen Liu
      Syntax for Semantic Role Labeling, to Be, or Not to Be,
      Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018, Volume 1: Long Papers), pp.2061-2071, Melbourne, Australia, July 15-20, 2018
      [PDF]

    • Zhuosheng Zhang, Jiangtong Li, Hai Zhao*, Bingjie Tang
      SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings,
      Proceedings of The 12th International Workshop on Semantic Evaluation, pp.903-908, New Orleans, Louisiana, June 1-6, 2018
      [PDF]

    • Rui Wang, Hai Zhao*, Sabine Ploux, Bao-Liang Lu, Masao Utiyama, Eiichiro Sumita
      Graph-based Bilingual Word Embedding for Statistical Machine Translation,
      ACM Transactions on Asian and Low-Resource Language Information Processing, Vol.17(4): 1-24, 2018
      [PDF]

    • Haonan Li, Zhisong Zhang, Yuqi Ju, Hai Zhao*
      Neural Character-level Dependency Parsing for Chinese,
      Proceedings of The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp.5205-5212, New Orleans, Louisiana, USA, February 2-7, 2018
      [PDF]

    [2017]

    • Hai Zhao, Deng Cai, Changning Huang, Chunyu Kit
      Chinese Word Segmentation, a decade review (2007-2017)
      The Frontier of Empirical and Corpus Linguistics, Chunyu Kit and Meijun Liu ed., China Social Sciences Press, Beijing, China, July 2017
      [PDF]

    • Hao Wang, Hai Zhao*, Zhisong Zhang
      A Transition-based System for Universal Dependency Parsing,
      CoNLL 2017, pp.191-197, Vancouver, Canada, July 2017
      [PDF]

    • Deng Cai, Hai Zhao*, Zhisong Zhang, Yuan Xin, Yongjian Wu, Feiyue Huang
      Fast and Accurate Neural Word Segmentation for Chinese,
      ACL 2017, pp.608-615, Vancouver, Canada, July 2017
      [PDF]

    • Lianhui Qin, Zhisong Zhang, Hai Zhao*, Zhiting Hu, Eric P. Xing
      Adversarial Connective-exploiting Network for Implicit Discourse Relation Classification,
      ACL 2017, pp.1006-1017, Vancouver, Canada, July 2017
      [PDF]

    • Deng Cai, Hai Zhao*
      Pair-Aware Neural Sentence Modeling for Implicit Discourse Relation Classification,
      IEA/AIE (2) 2017, LNCS, volume 10351: 458-466
      [PDF]

    • Deng Cai, Hai Zhao*, Yang Xin, Yuzhu Wang, Zhongye Jia
      A Hybrid Model for Chinese Spelling Check,
      ACM Transactions on Asian and Low-Resource Language Information Processing, 2017

    [2016]

    • Rui Wang, Hai Zhao*, Bao-Liang Lu, Masao Utiyama and Eiichiro Sumita,
      Connecting Phrase based Statistical Machine Translation Adaptation,
      COLING-2016, pp.3135-3145, Osaka, Japan, December, 2016
      [PDF]

    • Lianhui Qin, Zhisong Zhang, and Hai Zhao*
      Implicit Discourse Relation Recognition with Context-aware Character-enhanced Embeddings,
      COLING-2016, pp.1914-1924, Osaka, Japan, December, 2016
      [PDF]

    • Lianhui Qin, Zhisong Zhang, and Hai Zhao*
      A Stacking Gated Neural Architecture for Implicit Discourse Relation Classification,
      Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.2263-2270, Austin, USA, November, 2016
      [PDF]

    • Chenxi Pang, Hai Zhao*, Zhongyi Li,
      I Can Guess What You Mean: A Monolingual Query Enhancement for Machine Translation,
      LNCS Vol.10035, CCL-2016, Yantai, China, Oct 15-16, 2016
      [PDF]

    • Zhongyi Li, Hai Zhao*, Chenxi Pang, Lili Wang, Huan Wang
      A Constituent Syntactic Parse Tree based Discourse Parser,
      CoNLL-2016 Shared Task, pp.60-64, Berlin, Germany, August 7-12, 2016

    • Lianhui Qin, Zhisong Zhang, Hai Zhao*
      Shallow Discourse Parsing using Convolutional Neural Network,
      CoNLL-2016 Shared Task, pp.70-77, Berlin, Germany, August 7-12, 2016

    • Deng Cai, Hai Zhao*
      Neural Word Segmentation Learning for Chinese,
      ACL-2016, pp.409-420, Berlin, Germany, August 7-12, 2016
      [PDF]

    • Zhisong Zhang, Hai Zhao*, Lianhui Qin
      Probabilistic Graph-based Dependency Parsing with Convolutional Neural Network,
      ACL-2016, pp. 1382-1392, Berlin, Germany, August 7-12, 2016
      [PDF]

    • Rui Wang, Hai Zhao*, Sabine Ploux, Bao-Liang Lu, Masao Utiyama
      A Bilingual Graph-based Semantic Model for Statistical Machine Translation,
      IJCAI-2016, pp.2950-2956, New York, USA, July 9-15, 2016
      [PDF]

    • Peilu Wang, Yao Qian, Hai Zhao*, Frank K. Soong, Lei He, Ke Wu
      Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network,
      NAACL-2016, pp.527-533, San Diego, USA, June 12-15, 2016
      [PDF]

    • Rui Wang, Masao Utiyama, Isao Goto, Eiichiro Sumita, Hai Zhao*, Bao-Liang Lu,
      Converting Continuous-Space Language Models into N-gram Language Models with Efficient Bilingual Pruning for Statistical Machine Translation,
      ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 15(3), Article 11, pp.1-26, January, 2016
      [PDF]

    • Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Hai Zhao*, Graham Neubig, Satoshi Nakamura,
      Learning local word reorderings for hierarchical phrase-based statistical machine translation,
      Machine Translation, Springer, 2016
      [PDF]

    [2015]

    • Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao
      Word Embedding for Recurrent Neural Network based TTS Synthesis,
      Proc. of Acoustics, Speech and Signal Processing (ICASSP), pp. 4879-4883, Brisbane, Australia, 2015

    • Ge Yan, Zhao Hai, Qin Yulin et al.
      Mining National and Regional Images from Newspaper Reports (in Chinese)
      Academic Monthly, Vol.47(7): 163-170, July, 2015

    • Changge Chen, Hai Zhao*, Yang Yang
      Deceptive Opinion Spam Detection using Deep Level Linguistic Features,
      The 4th CCF Conference on Natural Language Processing & Chinese Computing (NLPCC 2015), October 9-13, 2015, Nanchang, China

    • Shuo Zang, Hai Zhao*, Chunyang Wu, Rui Wang,
      A Novel Word Reordering Method for Statistical Machine Translation,
      The 2015 11th International Conference on Natural Computation (ICNC'15) and the 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD'15),
      August 15-17, 2015, Zhangjiajie, China

    • Changge Chen, Peilu Wang, Hai Zhao*,
      Shallow Discourse Parsing Using Constituent Parsing Tree,
      CoNLL 2015, July 30, 2015, Beijing, China

    • Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Hai Zhao*,
      Learning Word Reorderings for Hierarchical Phrase-based Statistical Machine Translation,
      ACL-IJCNLP 2015, pp.542-548, July 26-31, 2015, Beijing, China
      [PDF]

    • Rui Wang, Hai Zhao*, Bao-Liang Lu, Masao Utiyama and Eiichiro Sumita,
      Bilingual Continuous-Space Language Model Growing for Statistical Machine Translation,
      IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol.23(7): 1209-1220, 2015
      [PDF]

    [2014]

    • Rui Wang, Hai Zhao, Bao-Liang Lu, Masao Utiyama and Eiichiro Sumita
      Neural Network Based Bilingual Language Model Growing for Statistical Machine Translation
      EMNLP 2014: 189-195, Doha, Qatar, October, 2014

    • Jingyi Zhang, Masao Utiyama, Eiichiro Sumita and Hai Zhao
      Learning Hierarchical Translation Spans
      EMNLP 2014: 183-188, Doha, Qatar, October, 2014

    • Yang Xin, Hai Zhao, Yuzhu Wang and Zhongye Jia
      An Improved Graph Model for Chinese Spell Checking
      SIGHAN-2014, Wuhan, China, October, 2014

    • Xiaolin Wang, Hai Zhao, Bao-Liang Lu
      A Meta-Top-down Method for Large-scale Hierarchical Classification
      IEEE Transactions on Knowledge and Data Engineering, Vol.26(3): 500-513, March 2014

    • Xiaolin Wang, Yangyang Chen, Hai Zhao, Bao-Liang Lu
      Parallelized Extreme Learning Machine Ensemble Based on Min-Max Modular Network
      Neurocomputing, Vol.128:31-41, March 2014

    • Zhongye Jia, Hai Zhao
      A Joint Graph Model for Pinyin-to-Chinese Conversion with Typo Correction
      In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Vol.1: 1512-1523, Baltimore, Maryland
      [PDF]

    • Peilu Wang, Zhongye Jia and Hai Zhao
      Grammatical Error Detection and Correction using a Single Maximum Entropy Model
      Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL-2014), pp.74-82, Baltimore, Maryland, June 2014

    [2013]

    • Rui Wang, Masao Utiyama, Isao Goto, Eiichiro Sumita, Hai Zhao, and Bao-Liang Lu
      Converting Continuous-Space Language Models into N-gram Language Models for Statistical Machine Translation
      EMNLP-2013: 845-850, Seattle, USA, October, 2013

    • Xiao-Lin Wang, Hai Zhao, and Bao-Liang Lu
      Labeled Alignment for Recognizing Textual Entailment
      IJCNLP-2013: 605-613, Nagoya, Japan, October, 2013

    • Zhongye Jia, Hai Zhao
      Kyss 1.0: a Framework for Automatic Evaluation of Chinese Input Method Engines
      IJCNLP-2013: 1195-1201, Nagoya, Japan, October, 2013

    • Zhongye Jia, Peilu Wang, Hai Zhao
      Graph Model for Chinese Spell Checking
      SIGHAN-7: 88-92, Nagoya, Japan, October, 2013

    • Zhongye Jia, Peilu Wang, Hai Zhao
      Grammatical Error Correction as Multiclass Classification with Single Model
      CoNLL-2013: 74-81, Sofia, Bulgaria, August, 2013

    • Jingyi Zhang, Hai Zhao
      Improving Function Word Alignment with Frequency and Syntactic Information
      IJCAI-2013: 2211-2217, Beijing, China, August, 2013
      [PDF]

    • Xiaolin Wang, Hai Zhao, Bao-Liang Lu
      BCMI-NLP Labeled-Alignment-Based Entailment System for NTCIR-10 RITE-2 Task
      NTCIR-10: 474-478, Tokyo, Japan, June, 2013

    • Hai Zhao, Jingyi Zhang, Masao Utiyama and Eiichiro Sumita
      An Improved Patent Machine Translation System Using Adaptive Enhancement for NTCIR-10 PatentMT Task
      NTCIR-10: 376-379, Tokyo, Japan, June, 2013

    • Hai Zhao, Xiaotian Zhang, and Chunyu Kit
      Integrative Semantic Dependency Parsing via Efficient Large-scale Feature Selection
      Journal of Artificial Intelligence Research, Volume 46:203-233, 2013
      [PDF]

    • Hai Zhao, Masao Utiyama, Eiichiro Sumita, and Bao-Liang Lu
      An Empirical Study on Word Segmentation for Chinese Machine Translation
      A. Gelbukh (Ed.): CICLing 2013, Part II, LNCS 7817, pp. 248-263, 2013
      [PDF]

    [2012]

    • Shaohua Yang, Hai Zhao, Xiaolin Wang and Bao-liang Lu
      Spell Checking for Chinese
      Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 730-736, Istanbul, Turkey, May, 2012

    • Chunyang Wu and Hai Zhao
      Regression with Phrase Indicators for Estimating MT Quality
      Proceedings of the 7th Workshop on Statistical Machine Translation of NAACL-2012, pages 152-156, Montreal, Quebec, Canada, June 7 - 8, 2012

    • Heming Shou and Hai Zhao
      Hybrid Rule-based Algorithm for Coreference Resolution
      Proceedings of the Joint Conference on EMNLP and CoNLL, pages 118-121, Jeju Island, Korea, July, 2012

    • Xiaotian Zhang, Chunyang Wu and Hai Zhao
      Chinese Coreference Resolution via Ordered Filtering
      Proceedings of the Joint Conference on EMNLP and CoNLL, pages 95-99, Jeju Island, Korea, July, 2012

    • Shaohua Yang, Hai Zhao and Bao-Liang Lu
      A Machine Translation Approach for Chinese Whole-Sentence Pinyin-to-Character Conversion
      PACLIC-26, Bali, Indonesia, November, 2012

    • Xiaotian Zhang, Yao Qian, Hai Zhao, Frank Soong
      Break index labeling of Mandarin text via syntactic-to-prosodic tree mapping
      The 8th International Symposium on Chinese Spoken Language Processing (ISCSLP-2012), Hong Kong, December 5-8, 2012

    • Xiaotian Zhang, Hai Zhao and Cong Hui
      A Machine Learning Approach to Convert CCGbank to Penn Treebank
      the 24th International Conference on Computational Linguistics (COLING 2012), pp.535-542, Mumbai, India, 8-15 December 2012

    • Qiongkai Xu and Hai Zhao
      Using Deep Linguistic Features for Finding Deceptive Opinion Spam
      the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, 8-15 December 2012

    • Xuezhe Ma and Hai Zhao
      Fourth-Order Dependency Parsing
      the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, 8-15 December 2012
      [PDF]

    [2011]

    • Jian Zhang, Hai Zhao, Liqing Zhang, Bao-Liang Lu
      An Empirical Comparative Study on Two Large-Scale Hierarchical Text Classification Approaches
      International Journal of Computer Processing of Oriental Languages (IJCPOL) 23(4): 309-326 (2011)

    • Xiaolin Wang, Hai Zhao and Bao-Liang Lu
      Enhance Top-down method with Meta-Classification for Very Large-scale Hierarchical Classification
      IJCNLP-2011, Chiang Mai, Thailand, November 9-11, 2011

    • Xiaotian Zhang and Hai Zhao
      Unsupervised Chinese Phrase Parsing Based on Tree Pattern Mining
      The 11th Conference of China Computational Linguistics, Luoyang, China, August 20-22, 2011

    • Hai Zhao and Chunyu Kit
      Integrating unsupervised and supervised word segmentation: The role of goodness measures
      Information Sciences, Vol.181(1): 163-183, 2011, Elsevier
      [PDF]

    [2010]

    • ZHAO Hai
      Natural Language Processing as a Branch of Artificial Intelligence: The Stagnant Tech
      The Seventh Young Scholar Symposium on Natural Language Processing, Shenyang, China, September 18-19, 2010
      [PPT] [MP3(29M)]

    • Xuezhe Ma, Xiaotian Zhang, Hai Zhao, Bao-Liang Lu
      Dependency Parser for Chinese Constituent Parsing
      CIPS-SIGHAN-2010, August, 2010, Beijing, China

    • Yan Song, Chunyu Kit and Hai Zhao
      Reranking with Multiple Features for Better Transliteration
      NEWS-2010, pp.62-65, July, 2010, Uppsala, Sweden

    • Cong Hui, Hai Zhao, Yan Song, Bao-Liang Lu
      An Empirical Study on Development Set Selection Strategy for Machine Translation Learning
      WMT-2010, pp.67-71, July, 2010, Uppsala, Sweden

    • Shaodian Zhang, Hai Zhao, Guodong Zhou and Bao-liang Lu
      Hedge Detection and Scope Finding by Sequence Labeling with Procedural Feature Selection
      CoNLL-2010, pp.92-99, July, 2010, Uppsala, Sweden

    • Jian Zhang, Hai Zhao, and Bao-Liang Lu
      A Comparative Study on Two Large-Scale Hierarchical Text Categorization Tasks' Solutions
      IWWIP-2010, July, 2010, Qingdao, China

    • Hai Zhao, Chang-Ning Huang, Mu Li, Bao-Liang Lu
      A Unified Character-Based Tagging Framework for Chinese Word Segmentation
      ACM Trans. Asian Lang. Inf. Process. 9(2): 2010
      [PDF]

    • Gang Jin, Qi Kong, Jian Zhang, Xiaolin Wang, Cong Hui, Hai Zhao, and Bao-Liang Lu
      Multiple Strategies for NTCIR-08 Patent Mining at BCMI
      NTCIR-8, June, 2010, Tokyo, Japan

    • Minzhang Huang, Hai Zhao, Bao-Liang Lu
      Pruning Training Samples Using a Supervised Clustering Algorithm
      ISNN (2) 2010: 250-257, June, 2010, Shanghai, China

    • Hai Zhao, Yan Song, Chunyu Kit
      How Large a Corpus Do We Need: Statistical Method Versus Rule-based Method.
      LREC 2010, May, 2010, Malta
      [PDF]

    [2009]

    • SONG Yan, CAI Dong-Feng, ZHANG Gui-Ping, ZHAO Hai
      An Approach to Chinese Word Segmentation based on Character-Word Joint Decoding
      Journal of Software, Vol.20, No.9, pp.2366-2375, 2009

    • Hai Zhao, Wenliang Chen, Chunyu Kit
      Semantic Dependency Parsing of NomBank and PropBank: An Efficient Integrated Approach via a Large-scale Feature Selection
      EMNLP 2009: conference on Empirical Methods in Natural Language Processing, pp.30-30, Singapore, August 6-7, 2009

    • Junhui Li, Guodong Zhou, Hai Zhao, Qiaoming Zhu, Peide Qian
      Improving Nominal SRL in Chinese Language with Verbal SRL Information and Automatic Predicate Recognition
      EMNLP 2009: conference on Empirical Methods in Natural Language Processing, pp.1280-1288, Singapore, August 6-7, 2009

    • Hai Zhao, Yan Song, Chunyu Kit, and Guodong Zhou
      Cross Language Dependency Parsing using a Bilingual Lexicon
      Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2009), pp.55-63, Singapore, August 2-5, 2009

    • Hai Zhao, Chunyu Kit, and Yan Song
      Character Dependency Tree based Lexical and Syntactic All-in-one Parsing for Chinese
      The 10th Chinese National Conference on Computational Linguistics (CNCCL-2009), pp.82-88, Yantai, China, July 24-26, 2009
      [PDF]

    • Hai Zhao, Wenliang Chen, Jun'ichi Kazama, Kiyotaka Uchimoto, and Kentaro Torisawa
      Multilingual Dependency Learning: Exploiting Rich Features for Tagging Syntactic and Semantic Dependencies
      Thirteenth Conference on Computational Natural Language Learning, (CoNLL-09), pp. 61-66, Boulder, CO, USA, June 4-5, 2009
      [PDF]

    • Hai Zhao, Wenliang Chen, Chunyu Kit, and Guodong Zhou
      Multilingual Dependency Learning: A Huge Feature Engineering Method to Semantic Dependency Parsing
      Thirteenth Conference on Computational Natural Language Learning, (CoNLL-09), pp. 55-60, Boulder, CO, USA, June 4-5, 2009
      [PDF]

    • Hai Zhao
      Character-Level Dependencies in Chinese: Usefulness and Learning
      The 12th Conference of the European Chapter of the Association for Computational Linguistics, (EACL-09), pp.879-887, Athens, Greece, March 30 - April 3, 2009
      [PDF]

    • Hai Zhao and Chunyu Kit
      A Simple and Efficient Model Pruning Method for Conditional Random Fields
      The 22nd International Conference on the Computer Processing of Oriental Languages (ICCPOL 2009), LNCS, Vol.5459, pp.149-159, Hong Kong, March 26-27, 2009
      [PDF]

    [2008]

    • Hai Zhao and Chunyu Kit
      Parsing Syntactic and Semantic Dependencies with Two Single-Stage Maximum Entropy Models
      Twelfth Conference on Computational Natural Language Learning, (CoNLL-2008), pp.203-207, Manchester, UK, August 16-17, 2008
      [PDF]

    • Hai Zhao and Chunyu Kit
      Scaling Conditional Random Fields by One-Against-the-Other Decomposition
      Journal of Computer Science and Technology, Vol. 23(4): 612-619, July, 2008

    • Hai Zhao and Chunyu Kit
      Exploiting Unlabeled Text with Different Unsupervised Segmentation Criteria for Chinese Word Segmentation
      The 9th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2008), Haifa, Israel, February 17-23, 2008
      Also in Research in Computing Science, Vol. 33: 93-104, 2008
      [PDF][Photos]

    • Hai Zhao and Chunyu Kit
      Unsupervised Segmentation Helps Supervised Learning of Character Tagging for Word Segmentation and Named Entity Recognition
      The Sixth SIGHAN Workshop on Chinese Language Processing (SIGHAN-6), pp.106-111, Hyderabad, India, January 11-12, 2008
      [PDF]

    • Hai Zhao and Chunyu Kit
      An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework
      The Third International Joint Conference on Natural Language Processing (IJCNLP-2008), Vol. 1: 9-16, Hyderabad, India, January 8-10, 2008
      [PDF][Photos]

    [2007]

    • Hai Zhao and Chunyu Kit
      Effective Subsequence-based Tagging for Chinese Word Segmentation (in Chinese)
      Journal of Chinese Information Processing, Vol. 21(5): 8-13, 2007
      [PDF]

    • Hai Zhao and Chunyu Kit
      Incorporating Global Information into Supervised Learning for Chinese Word Segmentation
      The 10th Conference of the Pacific Association for Computational Linguistics (PACLING-2007), pp.66-74, Melbourne, Australia, September 19-21, 2007
      [PDF][Photos]

    • Hai Zhao and Chunyu Kit
      Scaling Conditional Random Field with Application to Chinese Word Segmentation
      The Third International Conference on Natural Computation (ICNC'07), Vol. 5: 95-99, Haikou, China, August 24-27, 2007

    • Hai Zhao and Chunyu Kit
      Subsequence-based Tagging for Chinese Word Segmentation: Find a Better Tagging Unit (in Chinese)
      In: Frontiers of Content Computing: Research and Application, Sun Maosong and Chen Qunxiu (Eds.), pp.45-51, Tsinghua University Press,
      The 9th Chinese National Conference on Computational Linguistics (CNCCL-2007, formerly JSCL-2007), Dalian, China, August 6-8, 2007
      [Photos]

    • Chang-Ning Huang and Hai Zhao
      Chinese Word Segmentation: A Decade Review (in Chinese, Invited paper)
      Journal of Chinese Information Processing, Vol. 21(3): 8-20, 2007

    [2006]

    • Chang-Ning Huang and Hai Zhao
      Character-based Tagging: A New Method for Chinese Word Segmentation (in Chinese, Invited paper)
      In Frontier and Progress of Chinese Information Processing, Tsinghua University Press, November 21-22, 2006
      [PPT]

    • Hai Zhao, Chang-Ning Huang, Mu Li, and Bao-Liang Lu
      Effective Tag Set Selection in Chinese Word Segmentation via Conditional Random Field Modeling
      The 20th Pacific Asia Conference on Language, Information and Computation (PACLIC-20), pp.87-94, Wuhan, China, November 1-3, 2006
      [PDF]

    • Chang-Ning Huang and Hai Zhao
      Which Is Essential for Chinese Word Segmentation: Character versus Word (Invited paper)
      The 20th Pacific Asia Conference on Language, Information and Computation (PACLIC-20), pp.1-12, Wuhan, China, November 1-3, 2006

    • Hai Zhao, Chang-Ning Huang, and Mu Li
      An Improved Chinese Word Segmentation System with Conditional Random Field
      Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing (SIGHAN-5), pp.162-165, Sydney, Australia, July 22-23, 2006
      [PDF]

    • Hai Zhao and Bao-Liang Lu
      A Modular Reduction Method for k-NN Algorithm with Self-Recombination Learning
      The Third International Symposium on Neural Networks (ISNN-2006), LNCS Vol. 3971: 530-536, Chengdu, China, May 30 - June 1, 2006
      [PDF]

    More

        

    Software

    Here we release software for basic natural language processing tasks. Most of it consists of simplified versions of our state-of-the-art systems, which have been validated in various benchmark evaluations. This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY. It may be used freely for non-commercial research and educational purposes. Comments, bug reports, and suggestions for improvements are welcome.

    Chinese Word Segmentation Leaderboard on SIGHAN Bakeoff 2005

    For a long time, the four benchmark corpora released by SIGHAN Bakeoff 2005 have been the standard for Chinese word segmentation (CWS) evaluation.
    We maintain a CWS leaderboard that collects as many published results as possible to track progress in this area.

    It is also open to user submissions!

    Note that this board only accepts results from sufficiently rigorous publications or publicly available online systems.

    BaseSeg: Multi-standard Chinese Word Segmenter with Unknown Words Identification

    [Download (53.5M)] (Its C++ source code is available upon email request.)
    Function: BaseSeg (current version 1.5) is a Chinese word segmenter supporting the four segmentation standards of Bakeoff-3, with built-in unknown word (OOV) identification.

    Techniques: BaseSeg is built on CRF++ and trained with the n-gram feature settings described in our SIGHAN-5 paper.
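    Assuming BaseSeg follows the character-based tagging approach of the segmentation papers listed above, the CRF is trained to label each character rather than to find words directly. The Python sketch below illustrates that reformulation under stated assumptions: it uses the common 4-tag B/M/E/S set (the SIGHAN-5 paper's actual tag set and n-gram feature templates are not reproduced here), emits the two-column character/tag layout CRF++ reads for training, and rebuilds words from predicted tags. All function names are hypothetical.

```python
# Sketch only: assumes the 4-tag B/M/E/S scheme, not necessarily BaseSeg's tag set.

def word_to_tags(word: str) -> list[str]:
    """Map one word to per-character tags: S = single, B/M/E = begin/middle/end."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def to_crfpp_lines(segmented: str) -> list[str]:
    """Convert one whitespace-segmented sentence into CRF++ training lines.

    Each line is "<character>\t<tag>"; CRF++ treats the last column as the label,
    and a blank line marks the end of a sentence.
    """
    lines = []
    for word in segmented.split():
        for char, tag in zip(word, word_to_tags(word)):
            lines.append(f"{char}\t{tag}")
    lines.append("")  # sentence separator
    return lines


def tags_to_words(chars: str, tags: list[str]) -> list[str]:
    """Inverse mapping: rebuild words from a predicted tag sequence."""
    words, current = [], ""
    for char, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # previous word was left unterminated
            current = ""
        current += char
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate an unterminated word at sentence end
        words.append(current)
    return words


if __name__ == "__main__":
    for line in to_crfpp_lines("我 爱 北京 天安门"):
        print(line)
```

    Tagging turns segmentation into a sequence labeling problem, so the same CRF machinery and n-gram character features can be reused for both known and unknown words.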

    Performance: Its performance remains competitive to date. Its F-scores are 0.954, 0.969, 0.932, and 0.961 on the four Bakeoff-3 corpora, AS, CityU, CTB and MSRA, respectively, and it also achieves state-of-the-art performance in unknown word identification.
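    The F-scores above are word-level: a predicted word counts as correct only if both of its boundaries match a gold-standard word. The Python sketch below illustrates that calculation by matching words on character offsets; it is an illustration with hypothetical function names, not the official Bakeoff scoring script.

```python
# Sketch only: word-level P/R/F for one sentence, matching words by character offsets.

def word_spans(words: list[str]) -> set[tuple[int, int]]:
    """Return the (start, end) character offsets of each word in a segmentation."""
    spans: set[tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_prf(gold: list[str], pred: list[str]) -> tuple[float, float, float]:
    """Word-level precision, recall and F-score for one segmented sentence."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations cover different text")
    gold_spans, pred_spans = word_spans(gold), word_spans(pred)
    correct = len(gold_spans & pred_spans)  # words whose boundaries both match
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    f_score = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    p, r, f = segmentation_prf(["北京", "天安门"], ["北京", "天", "安门"])
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

    In the example, only 北京 matches, giving precision 1/3, recall 1/2 and F-score 0.4. A corpus-level score would sum the correct, predicted and gold word counts over all sentences before dividing.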

    BaseNER: Named Entity Recognizer from Unsegmented Chinese Text

    [Download (23.7M)]
    Function: BaseNER (current version 1.0) is an efficient, high-performance named entity recognizer for plain (unsegmented) Chinese text ONLY. It supports the two NE standards of Bakeoff-3.

    Techniques: BaseNER is built on CRF++ and trained with the n-gram feature settings described in our SIGHAN-6 paper.

    Performance: Its performance remains competitive to date. Its NER F-scores are 0.8815 and 0.8524 on the CityU and MSRA NE test corpora of Bakeoff-3, respectively.

    BasePoS: Part-of-Speech tagger for Chinese and English

    [Download (8.5M)]
    Function: BasePoS (current version 1.0) is a part-of-speech tagger for English and for segmented Chinese text ONLY.

    Techniques: This is a maximum entropy tagger. The Chinese model is trained on the CTB PoS training corpus of Bakeoff-4, and the English model on PTB sections 02-21. Training details will be made available later.

    Performance: The evaluation results are 0.941 (tagging precision) on the CTB test corpus of Bakeoff-4 for Chinese and 0.966 on PTB section 24 for English.

    Character-level Dependency Annotations for Chinese Penn Treebank

    [Download Request]
    Function: Character-level dependency annotations, with a guideline document, for building full Chinese character dependency trees that incorporate word-level dependencies.

    Techniques: This is based on my EACL-2009 paper (Zhao, 2009) and later studies.


    (Last update: August 18th, 2018)