Title | Unsupervised Word Segmentation with BERT Oriented Probing and Transformation |
Authors | Li, Wei; Song, Yuhan; Su, Qi; Shao, Yanqiu
Affiliation | Beijing Language and Culture University, School of Information Science, Beijing, China; Peking University, School of EECS, Beijing, China; Peking University, School of Foreign Languages, Beijing, China
Issue Date | 2022 |
Publisher | Findings of the Association for Computational Linguistics (ACL 2022)
Abstract | Word segmentation is a fundamental step for understanding many languages. Previous neural approaches for unsupervised Chinese Word Segmentation (CWS) exploit only shallow semantic information, which can miss important context. Large-scale pre-trained language models (PLMs) have achieved great success in many areas. In this paper, we propose to take advantage of the deep semantic information embedded in PLMs (e.g., BERT) in a self-training manner, which iteratively probes and transforms the semantic information in the PLM into explicit word segmentation ability. Extensive experimental results show that our proposed approach achieves state-of-the-art F1 scores on two CWS benchmark datasets. The proposed method can also help understand low-resource languages and protect language diversity. |
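The abstract describes transforming PLM probing scores into explicit segmentation decisions. The paper's actual algorithm is not given here, so the following is only a toy sketch of the general idea: assume a hypothetical `pair_scores` list giving a PLM-derived association score for each pair of adjacent characters, and insert a word boundary wherever the score drops below a threshold.

```python
# Toy illustration (NOT the paper's method): turn hypothetical adjacent-character
# association scores into a word segmentation by thresholding.

def segment(chars, pair_scores, threshold=0.5):
    """Split a character sequence into words.

    pair_scores[i] is a (hypothetical) PLM association score between
    chars[i] and chars[i + 1]; a low score suggests a word boundary.
    """
    words, current = [], [chars[0]]
    for i, score in enumerate(pair_scores):
        if score < threshold:
            # Weak association: close the current word, start a new one.
            words.append("".join(current))
            current = [chars[i + 1]]
        else:
            # Strong association: extend the current word.
            current.append(chars[i + 1])
    words.append("".join(current))
    return words

# Example with made-up scores for the 4-character string "ABCD":
print(segment(list("ABCD"), [0.9, 0.2, 0.8]))  # → ['AB', 'CD']
```

In the paper's self-training setting, such decisions would presumably be fed back to refine the scorer iteratively; here the scores are simply given.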
URI | http://hdl.handle.net/20.500.11897/654044 |
ISBN | 978-1-955917-25-4 |
Indexed | CPCI-SSH(ISSHP) CPCI-S(ISTP) |
Appears in Collections: | School of Information Science and Technology; School of Foreign Languages