Title | Unsupervised Word Segmentation with BERT Oriented Probing and Transformation |
Authors | Li, Wei; Song, Yuhan; Su, Qi; Shao, Yanqiu
Affiliation | Beijing Language and Culture University, School of Information Science, Beijing, China; Peking University, School of EECS, Beijing, China; Peking University, School of Foreign Languages, Beijing, China
Issue Date | 2022 |
Publisher | Findings of the Association for Computational Linguistics (ACL 2022)
Abstract | Word segmentation is a fundamental step for understanding many languages. Previous neural approaches for unsupervised Chinese Word Segmentation (CWS) exploit only shallow semantic information, which can miss important context. Large-scale pre-trained language models (PLMs) have achieved great success in many areas. In this paper, we propose to take advantage of the deep semantic information embedded in PLMs (e.g., BERT) in a self-training manner, which iteratively probes and transforms the semantic information in the PLM into explicit word segmentation ability. Extensive experimental results show that our proposed approach achieves state-of-the-art F1 scores on two CWS benchmark datasets. The proposed method can also help understand low-resource languages and protect language diversity. |
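The abstract describes transforming PLM probing scores into explicit segmentation decisions. The paper's actual algorithm is not given here, so the following is only a toy sketch of the general idea: assume a hypothetical `pair_scores` list giving a PLM-derived association score for each pair of adjacent characters, and insert a word boundary wherever the score drops below a threshold.

```python
# Toy illustration (NOT the paper's method): turn hypothetical adjacent-character
# association scores into a word segmentation by thresholding.

def segment(chars, pair_scores, threshold=0.5):
    """Split a character sequence into words.

    pair_scores[i] is a (hypothetical) PLM association score between
    chars[i] and chars[i + 1]; a low score suggests a word boundary.
    """
    words, current = [], [chars[0]]
    for i, score in enumerate(pair_scores):
        if score < threshold:
            # Weak association: close the current word, start a new one.
            words.append("".join(current))
            current = [chars[i + 1]]
        else:
            # Strong association: extend the current word.
            current.append(chars[i + 1])
    words.append("".join(current))
    return words

# Example with made-up scores for the 4-character string "ABCD":
print(segment(list("ABCD"), [0.9, 0.2, 0.8]))  # → ['AB', 'CD']
```

In the paper's self-training setting, such decisions would presumably be fed back to refine the scorer iteratively; here the scores are simply given.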
URI | http://hdl.handle.net/20.500.11897/654044 |
ISBN | 978-1-955917-25-4 |
Indexed | CPCI-SSH(ISSHP) CPCI-S(ISTP) |
Appears in Collections: | School of Information Science and Technology; School of Foreign Languages