Title A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes
Authors Zhou, Yizhuang
Zheng, Jifang
Wu, Yepeng
Zhang, Wenting
Jin, Junfei
Affiliation Guilin Med Univ, Affiliated Hosp, Lab Hepatobiliary & Pancreat Surg, Guilin 541001, Guangxi, Peoples R China
Peking Univ, Acad Adv Interdisciplinary Studies, Peking Tsinghua Ctr Life Sci, Beijing 100871, Peoples R China
Guilin Med Univ, Guangxi Key Lab Tumor Immunol & Microenvironm Reg, Guilin 541001, Guangxi, Peoples R China
Guilin Med Univ, China USA Lipids Hlth & Dis Res Ctr, Guilin 541001, Guangxi, Peoples R China
Guilin Med Univ, Guangxi Key Lab Mol Med Liver Injury & Repair, Guilin 541001, Guangxi, Peoples R China
Keywords IDENTIFICATION
CLASSIFICATION
PHYLOGENY
ACCURATE
TAXONOMY
Issue Date 26-Feb-2020
Publisher BMC GENOMICS
Abstract Background Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and are therefore computationally intensive. To address this problem, a strategy consisting of sieving (pre-selecting closely related genomes) followed by alignment and calculation has been proposed. Results Here, we first test a published approach called "genome-wide tetranucleotide frequency correlation coefficient" (TETRA), which is specifically tailored for sieving. Our results show that sieving by TETRA requires >40% completeness for both genomes of a pair to yield >95% sensitivity, indicating that TETRA is completeness-dependent. Accordingly, we develop a novel algorithm called "fragment tetranucleotide frequency correlation coefficient" (FRAGTE), which uses fragments rather than whole genomes for sieving. Our results show that FRAGTE achieves approximately 100% sensitivity and high specificity on simulated genomes, real genomes and metagenome-assembled genomes, demonstrating that FRAGTE is completeness-independent. Additionally, FRAGTE passes fewer genomes on to subsequent alignment and calculation, which greatly improves the computational efficiency of the post-sieving step. FRAGTE also reduces the computational cost of the sieving step itself. Consequently, FRAGTE substantially improves runtime for both sieving and the post-sieving step (subsequent alignment and calculation), and together these gains accelerate genome-wide species delineation. Conclusions FRAGTE is a completeness-independent algorithm for sieving. Owing to its high sensitivity, high specificity, greatly reduced number of sieved genomes and much improved runtime, FRAGTE will help whole-genome approaches facilitate taxonomic studies in prokaryotes.
URI http://hdl.handle.net/20.500.11897/586794
ISSN 1471-2164
DOI 10.1186/s12864-020-6597-x
Indexed SCI(E)
Scopus
Appears in Collections: Academy for Advanced Interdisciplinary Studies
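
The abstract refers to a tetranucleotide frequency correlation coefficient between two genomes. As a rough illustration of that general idea only, the sketch below counts all 256 tetranucleotides on both strands of each sequence and takes the Pearson correlation of the two frequency vectors. It is an assumption-laden simplification: it uses raw frequencies, which may differ from the statistic that TETRA actually computes, and it does not reproduce FRAGTE's fragment selection, which this record does not describe. The function names (`tetra_freqs`, `tetra_correlation`) are hypothetical and chosen here for illustration.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement; characters other than A/C/G/T are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def tetra_freqs(seq):
    """Relative tetranucleotide frequencies counted on both strands."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - 3):
            kmer = strand[i:i + 4]
            if kmer in counts:  # skip windows containing N or other codes
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [counts[k] / total for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)


def tetra_correlation(seq_a, seq_b):
    """Simplified (raw-frequency) tetranucleotide correlation of two sequences."""
    return pearson(tetra_freqs(seq_a), tetra_freqs(seq_b))
```

In a sieving step of the kind the abstract describes, a value like this would be compared against a cutoff to decide which genome pairs go on to alignment; the cutoff itself is not given in this record and is left out of the sketch.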

Files in This Work
There are no files associated with this item.

License: See PKU IR operational policies.