Title Extracting and measuring uncertain biomedical knowledge from scientific statements
Authors Guo, Xin
Chen, Yuming
Du, Jian
Dong, Erdan
Affiliation Department of Cardiology, Institute of Vascular Medicine, Peking University Third Hospital, Beijing, China
NHC Key Laboratory of Cardiovascular Molecular Biology and Regulatory Peptides, Beijing, China
Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Beijing, China
Beijing Key Laboratory of Cardiovascular Receptors Research, Beijing, China
Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
Medical Informatics Center, Peking University, Beijing, China
National Institute of Health Data Science, Peking University, Beijing, China
Institute of Cardiovascular Sciences, Peking University, Beijing, China
Keywords Cardiology
Data mining - Diagnosis - Disease control - Diseases - Natural language processing systems - Semantics
Issue Date 5-Dec-2021
Publisher arXiv
Abstract
Purpose: There is an increasing need for computable biomedical knowledge because of the information overload in the scientific literature, which is generally expressed in unstructured natural language. This study aims to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements. Design/methodology/approach: Taking cardiovascular research publications in China as a sample, we extracted subject-predicate-object (SPO) triples as knowledge units and hedging/conflicting uncertainties as the knowledge context. We introduced Information Entropy and Uncertainty Rate as potential metrics to quantify the uncertainty of biomedical knowledge claims represented at different levels, such as SPO triples (micro level) and semantic type pairs (macro level). Findings: The results indicated that while the number of scientific publications and the total number of SPO triples grew linearly, the number of novel SPO triples occurring per year remained stable. After examining the frequency of uncertainty cue words in different parts of scientific statements, we found that hedging words tend to appear in conclusion and purpose sentences, whereas conflicting terms often appear in background sentences and act as the premise (e.g., unsettled scientific issues) of the work to be investigated. Research limitations: Using cue words to represent the textual uncertainty of biomedical knowledge may introduce a small amount of noise. Practical implications: Our approach identified major uncertain knowledge areas, such as diagnostic biomarkers, genetic characteristics, and pharmacologic therapies surrounding cardiovascular diseases in China. These areas should be prioritized, because new hypotheses in them need to be verified and disputes, conflicts, and contradictions need to be settled.
Originality/value: We provide a novel approach that combines natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.
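The abstract names cue-word detection, Information Entropy, and Uncertainty Rate but does not give their exact definitions. The following is a minimal sketch under stated assumptions, not the authors' implementation: the cue lexicons HEDGE_CUES and CONFLICT_CUES are hypothetical, Uncertainty Rate is assumed to be the share of statements about one SPO triple that carry any uncertainty cue, and Information Entropy is assumed to be the Shannon entropy (in bits) of the certain/hedging/conflicting label distribution for that triple. The example triple and its statements are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical cue lexicons (assumption); the paper's actual cue lists are not given here.
HEDGE_CUES = {"may", "might", "suggest", "suggests", "possibly", "likely", "potential"}
CONFLICT_CUES = {"controversial", "conflicting", "inconsistent", "unclear", "debated"}


def classify(sentence: str) -> str:
    """Label a sentence 'conflicting', 'hedging', or 'certain' by cue-word lookup.

    Conflicting cues are checked first, so a sentence with both kinds of cue
    is labeled 'conflicting' (a design choice made for this sketch).
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if tokens & CONFLICT_CUES:
        return "conflicting"
    if tokens & HEDGE_CUES:
        return "hedging"
    return "certain"


def uncertainty_rate(labels: list[str]) -> float:
    """Assumed definition: fraction of statements that carry any uncertainty cue."""
    if not labels:
        return 0.0
    return sum(label != "certain" for label in labels) / len(labels)


def information_entropy(labels: list[str]) -> float:
    """Assumed definition: Shannon entropy (bits) of the label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # Sum of p * log2(1/p) over observed labels; equals 0.0 when all labels agree.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical statements supporting one hypothetical SPO triple:
# (troponin, DIAGNOSES, myocardial infarction)
EXAMPLE_STATEMENTS = [
    "Troponin may serve as a diagnostic biomarker for myocardial infarction.",
    "The diagnostic value of troponin remains controversial.",
    "Troponin diagnoses myocardial infarction.",
    "Elevated troponin is used to diagnose myocardial infarction.",
]

if __name__ == "__main__":
    labels = [classify(s) for s in EXAMPLE_STATEMENTS]
    print(labels)                       # ['hedging', 'conflicting', 'certain', 'certain']
    print(uncertainty_rate(labels))     # 0.5
    print(information_entropy(labels))  # 1.5
```

Under these assumptions, a triple asserted uniformly without cues scores 0 on both metrics, while a triple whose statements mix certain, hedged, and disputed claims scores higher on both. The same functions could be applied at the macro level by pooling the labels of all triples that share a semantic type pair.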
© 2021, CC BY-NC-ND.
URI http://hdl.handle.net/20.500.11897/663220
Indexed EI
Appears in Collections: Peking University Third Hospital
School of Public Health
Medical Informatics Center
School of Basic Medical Sciences

License: See PKU IR operational policies.