Title The building of a comprehensive toponym corpus for Chinese information processing
Authors Liu, Qiang
Yu, Jingsong
Wu, Shenglan
Wang, Huilin
Affiliation Department of Language Information Engineering, Peking University, No. 5, Yiheyuan Road, Haidian District, Beijing 100871, China
Institute of Scientific and Technical Information of China, No. 15, Fuxing Road, Beijing 100038, China
Issue Date 2013
Publisher ICIC Express Letters, Part B: Applications
Citation ICIC Express Letters, Part B: Applications. 2013, 4(5), 1409-1415.
Abstract This paper describes the creation of a comprehensive, large-scale Chinese Toponym Corpus that includes the names (and aliases) of every administrative division and of as many roads, streets, and buildings as we could find in mainland China, along with the geographical relationships among them. As raw data for corpus building, we use government standard files, a GPS points-of-interest database, and address information crawled and extracted from house and office rental web sites. N-gram counts, improved mutual information, and other parameters are computed, together with a bootstrapping method, to obtain statistical models for Chinese address chunk segmentation and attribute annotation, using a tag set specially designed for Chinese natural language processing. We performed structural analysis and in-depth statistical analysis of Chinese toponyms and geographical entities to obtain a categorized toponym dictionary. Finally, based on semantic analysis of the Chinese Toponym Corpus and the results of all previous work, a Chinese toponym ontology with probabilistic information was built using the Neo4j graph database system. © 2013 ISSN 2185-2766.
URI http://hdl.handle.net/20.500.11897/410426
ISSN 2185-2766
Indexed EI
Appears in Collections: Unclaimed


License: See PKU IR operational policies.