Title Cardiovascular Risk Prediction Method Based on CFS Subset Evaluation and Random Forest Classification Framework
Authors Xu, Shan
Zhang, Zhen
Wang, Daoxian
Hu, Junfeng
Duan, Xiaohui
Zhu, Tiangang
Affiliation China Acad Informat Commun Technol, Beijing, Peoples R China.
Peking Univ, Sch Elect Engn & Comp Sci, Beijing, Peoples R China.
Peking Univ, Peoples Hosp, Beijing, Peoples R China.
Keywords Cardiovascular disease (CVD)
risk prediction
data mining
feature selection
random forest
HEART-DISEASE
SYSTEM
Issue Date 2017
Publisher 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA)
Citation 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA). 2017, 233-237.
Abstract Cardiovascular disease (CVD) is a major contributor to loss of quality and quantity of life worldwide. Early detection and risk prediction are very important for patients' treatment and doctors' diagnosis. This paper focuses on establishing a more accurate and practical risk prediction system based on data mining techniques to provide auxiliary medical service. So that it can be used in practice for collecting and analyzing patients' data in the healthcare industry, the system consists of four parts: data interface, data preprocessing, feature selection, and classification. The data interface is responsible for obtaining raw data from hospitals; data preprocessing handles data integration, data cleaning, rating mapping, etc. Key features are then selected by CFS Subset Evaluation combined with the Best-First Search method to reduce dimensionality. Random forest is introduced as the base classifier to identify the risk level, which is an early attempt in the CVD risk prediction field. The Cleveland Heart-Disease Database (CHDD) and the cardiology inpatient dataset of PKU People's Hospital were both tested to confirm accuracy as well as practicality. In the CHDD test, our system achieves an accuracy of 91.6%, significantly higher than other methods. In the People's Hospital dataset test, it achieves an accuracy of 97%, which is better than most other classifiers except SVM (98.9%); however, random forest takes only half the time of SVM. Taken together, the risk prediction system shows great value in accuracy and practical use for patients' treatment and doctors' diagnosis.
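The feature selection step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the merit score is the standard CFS heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff); absolute Pearson correlation stands in for whatever correlation measure the paper uses; a simple greedy forward search replaces Best-First Search; and the toy columns and the names `pearson`, `cfs_merit`, and `greedy_cfs` are hypothetical.

```python
import math

def pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov / math.sqrt(vx * vy))

def cfs_merit(columns, labels, subset):
    """CFS merit: high feature-class correlation, low feature-feature correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(pearson(columns[i], labels) for i in subset) / k
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = (sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(columns, labels):
    """Greedy forward search (a simplification of best-first search, assumed here)."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        cand, cand_merit = None, best
        for i in sorted(remaining):
            m = cfs_merit(columns, labels, selected + [i])
            if m > cand_merit + 1e-12:  # require a real improvement
                cand, cand_merit = i, m
        if cand is None:
            break  # no remaining feature improves the merit
        selected.append(cand)
        remaining.remove(cand)
        best = cand_merit
    return selected, best

# Hypothetical toy data: column 0 is informative, column 1 duplicates it
# (redundant), column 2 is uncorrelated with the labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 1, 2, 5, 6, 5, 6],
    [2, 4, 2, 4, 10, 12, 10, 12],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
selected, merit = greedy_cfs(columns, labels)
print(selected, round(merit, 3))
```

In a full pipeline, the columns picked this way would then be passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`); that step is omitted here to keep the sketch dependency-free.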
URI http://hdl.handle.net/20.500.11897/504986
DOI 10.1109/ICBDA.2017.8078813
Indexed EI
CPCI-S(ISTP)
Appears in Collections: School of Electronics Engineering and Computer Science
People's Hospital

License: See PKU IR operational policies.