Institutional Repository of Peking University: Extreme-scale realistic stencil computations on Sunway TaihuLight with ten million cores

Title	Extreme-scale realistic stencil computations on Sunway TaihuLight with ten million cores
Authors	Cai, Ying; Yang, Chao; Ma, Wenjing; Ao, Yulong
Affiliation	Institute of Software, Chinese Academy of Sciences, Beijing, 100190, China; CAPT, CCSE, School of Mathematical Sciences and NELVT, Peking University, Beijing, 100871, China; University of Chinese Academy of Sciences, Beijing, 100049, China; State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing, 100190, China
Issue Date	2018
Publisher	18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
Citation	18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018. 2018, 566-571.
Abstract	Stencil computation arises from a large variety of scientific and engineering applications and often plays a critical role in the performance of extreme-scale simulations. Because stencil kernels are memory-bound, optimizing them is challenging on many leadership supercomputers, such as Sunway TaihuLight, which has relatively high computing throughput but relatively low data-moving capability. In this white paper, we present our efforts over the past two years to develop end-to-end implementation and optimization techniques for extreme-scale stencil computations on Sunway TaihuLight. We started by optimizing the 3-D 2nd-order 13-point stencil for nonhydrostatic atmospheric dynamics simulation, which is an important part of the 2016 ACM Gordon Bell Prize winning work, and extended that work to handle a broader range of realistic and challenging problems, such as the HPGMG benchmark, which consists of memory-hungry stencils, and the gaseous wave detonation simulation, which relies on complex high-order stencils. The presented stencil computation paradigm on Sunway TaihuLight includes not only multilevel parallelization to exploit the parallelism on different hardware levels, but also systematic performance optimization techniques for communication, memory access, and computation. We show by extreme-scale tests that the proposed systematic stencil computation paradigm can successfully deliver remarkable performance on Sunway TaihuLight with ten million heterogeneous cores. In particular, we achieve an aggregate performance of 23.12 Pflops for the 3-D 5th-order WENO stencil computation in gaseous wave detonation simulation, which is, as far as we know, the highest performance reported for high-order stencil computations, and an aggregate performance of solving over one trillion unknowns per second in the HPGMG benchmark, which ranks first on the HPGMG list of November 2017. © 2018 IEEE.
URI	http://hdl.handle.net/20.500.11897/530837
ISBN	9781538658154
DOI	10.1109/CCGRID.2018.00086
Indexed	EI
Appears in Collections:	College of Engineering



License: See PKU IR operational policies.