Title: Unsupervised Word Segmentation with BERT Oriented Probing and Transformation
Authors: Li, Wei
Song, Yuhan
Su, Qi
Shao, Yanqiu
Affiliation: Beijing Language & Culture Univ, Sch Informat Sci, Beijing, Peoples R China
Peking Univ, Sch EECS, Beijing, Peoples R China
Peking Univ, Sch Foreign Languages, Beijing, Peoples R China
Issue Date: 2022
Published in: Findings of the Association for Computational Linguistics (ACL 2022)
Abstract: Word segmentation is a fundamental step in understanding many languages. Previous neural approaches to unsupervised Chinese word segmentation (CWS) exploit only shallow semantic information and can therefore miss important context. Large-scale pre-trained language models (PLMs) have achieved great success in many areas. In this paper, we propose to take advantage of the deep semantic information embedded in a PLM (e.g., BERT) in a self-training manner, iteratively probing the semantic information in the PLM and transforming it into explicit word segmentation ability. Extensive experimental results show that our proposed approach achieves state-of-the-art F1 scores on two CWS benchmark datasets. The proposed method can also help in understanding low-resource languages and protecting language diversity.
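The abstract's iterative "probe and transform" loop can be illustrated with a minimal, heavily simplified sketch. The paper probes BERT for deep semantic signal; the sketch below is NOT the authors' method — it substitutes a toy substring-frequency score as a stand-in for the probing step, keeping only the overall shape of the loop: score candidate words, segment with dynamic programming, then re-estimate the scores from the model's own output. All function names (`substring_counts`, `segment`, `self_train`) are hypothetical.

```python
from collections import Counter

def substring_counts(corpus, max_len=4):
    """Stand-in for the probing step (an assumption): the real method
    would extract this signal from BERT; here we just count substrings."""
    counts = Counter()
    for sent in corpus:
        for i in range(len(sent)):
            for j in range(i + 1, min(i + max_len, len(sent)) + 1):
                counts[sent[i:j]] += 1
    return counts

def segment(sent, score, max_len=4):
    """Viterbi-style DP: choose the split maximizing the summed word scores."""
    n = len(sent)
    best = [(0.0, 0)] + [(float("-inf"), 0)] * n  # (best score, split point)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            s = best[j][0] + score(sent[j:i])
            if s > best[i][0]:
                best[i] = (s, j)
    words, i = [], n
    while i > 0:                 # backtrack through the split points
        j = best[i][1]
        words.append(sent[j:i])
        i = j
    return words[::-1]

def self_train(corpus, rounds=3, max_len=4):
    """Self-training loop: segment the corpus with the current scores,
    then re-estimate the scores from the resulting segmentation."""
    counts = substring_counts(corpus, max_len)
    segs = []
    for _ in range(rounds):
        score = lambda w, c=counts: c[w] * (len(w) - 1)  # reward longer, frequent units
        segs = [segment(s, score, max_len) for s in corpus]
        counts = Counter(w for seg in segs for w in seg)
    return segs
```

On a toy corpus such as `["abcabc", "abc"]`, the repeated unit `abc` accumulates a higher score than its parts, so the loop converges to segmenting it as one word — mirroring, in miniature, how iterated self-training can turn implicit co-occurrence signal into explicit segmentation decisions.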
URI: http://hdl.handle.net/20.500.11897/654044
ISBN: 978-1-955917-25-4
Indexed: CPCI-SSH (ISSHP); CPCI-S (ISTP)
Appears in Collections: School of Information Science and Technology (信息科学技术学院)
School of Foreign Languages (外国语学院)

Files in This Work
There are no files associated with this item.

License: See PKU IR operational policies.