Title | Cardiovascular Risk Prediction Method Based on CFS Subset Evaluation and Random Forest Classification Framework |
Authors | Xu, Shan; Zhang, Zhen; Wang, Daoxian; Hu, Junfeng; Duan, Xiaohui; Zhu, Tiangang |
Affiliation | China Acad Informat Commun Technol, Beijing, Peoples R China. Peking Univ, Sch Elect Engn & Comp Sci, Beijing, Peoples R China. Peking Univ, Peoples Hosp, Beijing, Peoples R China. |
Keywords | Cardiovascular disease (CVD); risk prediction; data mining; feature selection; random forest; HEART-DISEASE; SYSTEM |
Issue Date | 2017 |
Publisher | 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA) |
Citation | 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA). 2017, 233-237. |
Abstract | Cardiovascular disease (CVD) is a major contributor to loss of quality and quantity of life worldwide. Early detection and risk prediction are very important for patients' treatment and doctors' diagnosis. This paper focuses on establishing a more accurate and practical risk prediction system based on data mining techniques to provide auxiliary medical service. To be practical for collecting and analyzing patients' data in the healthcare industry, the system consists of four parts: data interface, data preparation, feature selection, and classification. The data interface is responsible for obtaining raw data from hospitals; data preprocessing handles data integration, data cleaning, rating mapping, etc. Key features are then selected by CFS Subset Evaluation combined with the Best-First Search method to reduce dimensionality. Random forest is introduced as the base classifier to identify the risk level, which is an early attempt in the CVD risk prediction field. The Cleveland Heart-Disease Database (CHDD) and a cardiology inpatient dataset from PKU People's Hospital were both tested to confirm accuracy as well as practicality. In the CHDD test, our system achieves an accuracy of 91.6%, significantly higher than other methods. In the People's Hospital dataset test, it achieves an accuracy of 97%, which is better than most other classifiers except SVM (98.9%); however, random forest takes only half the time of SVM. Considered comprehensively, the risk prediction system shows great significance in accuracy and practical use for patients' treatment and doctors' diagnosis. |
URI | http://hdl.handle.net/20.500.11897/504986 |
DOI | 10.1109/ICBDA.2017.8078813 |
Indexed | EI CPCI-S(ISTP) |
Appears in Collections: | School of Electronics Engineering and Computer Science; People's Hospital |
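
The feature selection step named in the abstract (CFS Subset Evaluation with Best-First Search) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), and substitutes absolute Pearson correlation for whatever correlation measure the paper used. The function names (`pearson`, `cfs_merit`, `best_first_cfs`), the `max_stale` stopping rule, and the toy data are all hypothetical. The random forest classification step is left out to keep the sketch standard-library only.

```python
import heapq
import math
from itertools import count


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, target):
    """CFS merit k*r_cf / sqrt(k + k*(k-1)*r_ff), with |Pearson| as the correlation (assumption)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation.
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    # Mean feature-feature correlation over all unordered pairs.
    pairs = [(i, j) for i in subset for j in subset if i < j]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def best_first_cfs(columns, target, max_stale=5):
    """Forward best-first search; stops after max_stale consecutive non-improving expansions."""
    tie = count()  # unique tie-breaker so the heap never compares frozensets
    start = frozenset()
    heap = [(-0.0, next(tie), start)]
    seen = {start}
    best, best_merit, stale = start, -1.0, 0
    while heap and stale < max_stale:
        neg_merit, _, subset = heapq.heappop(heap)
        if -neg_merit > best_merit + 1e-12:
            best, best_merit, stale = subset, -neg_merit, 0
        else:
            stale += 1
        # Expand the popped subset by adding one feature at a time.
        for i in range(len(columns)):
            child = subset | {i}
            if child not in seen:
                seen.add(child)
                heapq.heappush(heap, (-cfs_merit(child, columns, target), next(tie), child))
    return sorted(best), best_merit


# Toy data (hypothetical): feature 0 is predictive, feature 1 duplicates it, feature 2 is noise.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 1, 2, 5, 6, 5, 6],
    [1, 2, 1, 2, 5, 6, 5, 6],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
selected, merit = best_first_cfs(columns, target)
print(selected, round(merit, 4))
```

On this toy data the redundant copy and the noise column do not raise the merit, so neither is added. In a fuller pipeline the selected columns would then be passed to a random forest classifier.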