An ultra-scalable fully implicit solver is developed for stiff time-dependent problems that arise frequently in atmospheric dynamics. The solver uses a hybrid multigrid domain decomposition preconditioner to greatly accelerate convergence and to exploit coarse-grained parallelism. A physics-based multi-block asynchronized incomplete LU factorization method is customized to solve the subproblems on each overlapped subdomain, which provides additional fine-grained concurrency. We perform systematic optimizations at different hardware levels to make the best use of the heterogeneous computing units and to substantially reduce the cost of data movement. The solver enables fast and accurate atmospheric simulations on the emerging heterogeneous Sunway supercomputer in China, scaling to over 8.5 million heterogeneous cores and achieving a sustained performance of over 1.5 PFlops.
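To make the idea of a domain decomposition preconditioner with incomplete LU subdomain solves concrete, the following is a minimal serial sketch and not the solver described here. It assumes a 1D backward-Euler diffusion operator as a stand-in for the stiff implicit system, uses a one-level additive Schwarz method with no multigrid hierarchy, and substitutes SciPy's general-purpose `spilu` for the physics-based multi-block asynchronized ILU. The function names `build_implicit_matrix`, `overlapping_subdomains`, and `additive_schwarz_ilu` are hypothetical and chosen only for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spilu


def build_implicit_matrix(n, dt_over_h2):
    """Stand-in stiff system: I + (dt/h^2) * tridiag(-1, 2, -1) (assumed model problem)."""
    lap = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    return (sp.identity(n) + dt_over_h2 * lap).tocsc()


def overlapping_subdomains(n, n_sub, overlap):
    """Split indices 0..n-1 into n_sub contiguous blocks, each widened by `overlap` points."""
    bounds = np.linspace(0, n, n_sub + 1).astype(int)
    return [np.arange(max(lo - overlap, 0), min(hi + overlap, n))
            for lo, hi in zip(bounds[:-1], bounds[1:])]


def additive_schwarz_ilu(A, subdomains):
    """One-level additive Schwarz preconditioner with one ILU factorization per subdomain."""
    # Factor each overlapped diagonal block once; the solves are independent of each other.
    local_solvers = [spilu(A[idx, :][:, idx].tocsc()) for idx in subdomains]

    def apply(r):
        r = np.asarray(r).ravel()
        z = np.zeros_like(r)
        for idx, ilu in zip(subdomains, local_solvers):
            # Restrict the residual, solve locally, and sum the corrections.
            z[idx] += ilu.solve(r[idx])
        return z

    return LinearOperator(A.shape, matvec=apply, dtype=A.dtype)


if __name__ == "__main__":
    n = 200
    A = build_implicit_matrix(n, dt_over_h2=50.0)
    M = additive_schwarz_ilu(A, overlapping_subdomains(n, n_sub=4, overlap=4))
    b = np.ones(n)
    x, info = gmres(A, b, M=M)  # preconditioned Krylov solve
    print("converged:", info == 0)
    print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

In this sketch the loop over subdomains is where coarse-grained parallelism would come from, since each local solve touches only its own overlapped block. A multigrid hierarchy and a parallel ILU variant would be needed to approach the method summarized above.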
The data partitioning and task scheduling of different kernels in our atmospheric model.
The 500 m level profile of the atmosphere at day 10 for the baroclinic instability test in a β-plane 3D channel. Shown here are the horizontal velocities (left, top and bottom) and the temperature and pressure (right, top and bottom).
Strong scaling results on the new Sunway supercomputer.