Title Predicting kinase functional sites using hierarchical stochastic language modelling
Authors Yu, Huan
Pei, Guojun
Ge, Peng
Fang, Xiangzhong
Sun, Fengzhu
Lai, Luhua
Qian, Minping
Deng, Minghua
Affiliation Peking Univ, Sch Math Sci, LMAM, Beijing 100871, Peoples R China.
Peking Univ, Ctr Theoret Biol, Beijing 100871, Peoples R China.
Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA.
Peking Univ, Ctr Stat Sci, Beijing 100871, Peoples R China.
Keywords Kinase
Functional sites
Hierarchical stochastic language (HSL)
ADENYLATE KINASE
PROTEIN
DATABASE
ALIGNMENT
RESOURCE
BLOCKS
Issue Date 2010
Citation Statistics and Its Interface, 3: 523-531.
Abstract Motivation: Predicting functional sites in kinases is an important problem in biology. Both the functional sites and the relationships among the amino acids within those sites need to be understood. We develop an algorithm, based on hierarchical stochastic language (HSL) modelling, that predicts kinase functional sites from amino acid sequence data. Results: The method was validated using two complementary approaches. First, the functional sites predicted by the HSL were compared with experimentally verified functional sites, including the patterns in PROSITE, the contacting sites in the Protein Data Bank (PDB), and the domains in Pfam. Against the PROSITE patterns, the overall average recall/precision of the HSL model was 83.5%/23.0%; against the PDB contacting sites, it was 66.1%/79.9%. Against Pfam, 90% of the predicted functional sites were parts of domains with names containing the substring "kinase". Second, 10-fold cross-validation was used to assess the accuracy of the HSL in predicting kinase function. The HSL achieved both high sensitivity (94.7%) and high specificity (94.0%), compared with 94.5% and 85.8%, respectively, for MEME. The HSL model also detected kinase sub-families automatically, and the identified sub-families were consistent with known phylogenetic trees of the kinase sequences. The HSL is therefore applicable to kinase sequences that form heterogeneous subsets sharing the same catalytic function.
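The evaluation metrics quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the residue positions, and the counts are all hypothetical, and it assumes that predicted and reference functional sites are both represented as sets of residue positions in one sequence.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[int], reference: Set[int]) -> Tuple[float, float]:
    """Site-level precision and recall for one sequence.

    precision = fraction of predicted positions that are in the reference set
    recall    = fraction of reference positions that were predicted
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sequence-level sensitivity and specificity from confusion-matrix counts.

    sensitivity = kinases correctly called kinase / all kinases
    specificity = non-kinases correctly rejected / all non-kinases
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical example: a short reference pattern and a broader prediction.
# A prediction that covers most of a short pattern but also flags other
# regions has high recall and low precision.
predicted_sites = {10, 11, 12, 13, 40, 41, 42, 43}
reference_sites = {11, 12, 13, 14}
precision, recall = precision_recall(predicted_sites, reference_sites)

# Hypothetical counts from one held-out cross-validation fold.
sensitivity, specificity = sensitivity_specificity(tp=18, fn=2, tn=47, fp=3)
```

In this toy case three of the eight predicted positions fall in the reference set, so precision is 3/8 while recall is 3/4. The same pattern of high recall with low precision appears in the PROSITE comparison reported above.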
URI http://hdl.handle.net/20.500.11897/314502
ISSN 1938-7989
Indexed SCI(E)
CPCI-S(ISTP)
Appears in Collections: School of Mathematical Sciences
Key Laboratory of Mathematics and Its Applications, Ministry of Education

Files in This Work
There are no files associated with this item.

License: See PKU IR operational policies.