An Incompressible Navier-Stokes Equations Solver on the GPU Using CUDA

N. Karlsson, Master's thesis, Chalmers University of Technology, supervisor A. Mark, examiner U. Assarsson, August 2013.

Abstract

Graphics Processing Units (GPUs) have emerged as highly capable computational accelerators for scientific and engineering applications. Many reports claim speedups of orders of magnitude compared to traditional Central Processing Units (CPUs), and interest in GPU computation is high in the scientific computing community. In this thesis, the capability of GPUs to accelerate the full computational chain of a 3D incompressible Navier-Stokes solver has been evaluated. This chain includes solvers and preconditioners for sparse linear systems as well as assembly routines for a finite volume discretization. The CG, GMRES and BiCGStab iterative solvers have been implemented on the CUDA GPGPU platform and evaluated together with the Jacobi and Least Squares Polynomial preconditioners. A double precision Navier-Stokes solver has been implemented using CUDA, adopting a collocated Cartesian grid, the SIMPLEC pressure-velocity coupling scheme, and implicit time discretization.
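To make the combination of a Krylov solver and a preconditioner concrete, the following is a minimal serial C++ sketch of Jacobi-preconditioned CG. It is not the thesis code. It assumes a Compressed Sparse Row (CSR) matrix layout and a symmetric positive definite system; all names (`CsrMatrix`, `spmv`, `jacobi_pcg`, `make_poisson_1d`) are hypothetical. The sparse matrix-vector product, dot products, and vector updates in the loop are the kinds of operations a GPU version would typically map to CUDA kernels or library calls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CSR storage (assumed layout; the thesis's storage format is not stated here).
struct CsrMatrix {
    std::size_t n;                     // number of rows of the square matrix
    std::vector<std::size_t> row_ptr;  // size n + 1, offsets into col_idx/val
    std::vector<std::size_t> col_idx;  // column index of each nonzero
    std::vector<double> val;           // value of each nonzero
};

// Sparse matrix-vector product y = A * x.
void spmv(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.val[k] * x[A.col_idx[k]];
        y[i] = sum;
    }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Jacobi-preconditioned CG. On entry x holds the initial guess; on return it
// holds the approximate solution. Stops when ||r|| <= tol * ||b||.
// Returns the number of iterations performed.
int jacobi_pcg(const CsrMatrix& A, const std::vector<double>& b,
               std::vector<double>& x, double tol, int max_iter) {
    assert(b.size() == A.n && x.size() == A.n);
    const std::size_t n = A.n;
    // Jacobi preconditioner: M^{-1} = diag(A)^{-1}.
    std::vector<double> inv_diag(n, 1.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            if (A.col_idx[k] == i) inv_diag[i] = 1.0 / A.val[k];

    std::vector<double> r(n), z(n), Ap(n);
    spmv(A, x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];  // r = b - A x
    for (std::size_t i = 0; i < n; ++i) z[i] = inv_diag[i] * r[i];  // z = M^{-1} r
    std::vector<double> p = z;
    double rz = dot(r, z);
    const double b_norm = std::sqrt(dot(b, b));

    for (int it = 0; it < max_iter; ++it) {
        if (std::sqrt(dot(r, r)) <= tol * b_norm) return it;
        spmv(A, p, Ap);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        for (std::size_t i = 0; i < n; ++i) z[i] = inv_diag[i] * r[i];
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        rz = rz_new;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    return max_iter;
}

// Hypothetical test matrix: 1D Poisson stencil tridiag(-1, 2, -1), which is SPD.
CsrMatrix make_poisson_1d(std::size_t n) {
    CsrMatrix A{n, {0}, {}, {}};
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) { A.col_idx.push_back(i - 1); A.val.push_back(-1.0); }
        A.col_idx.push_back(i); A.val.push_back(2.0);
        if (i + 1 < n) { A.col_idx.push_back(i + 1); A.val.push_back(-1.0); }
        A.row_ptr.push_back(A.col_idx.size());
    }
    return A;
}
```

GMRES and BiCGStab, which handle non-symmetric systems such as the discretized momentum equations, are built from the same primitives. The preconditioner application is the only step that changes when the Jacobi preconditioner is replaced by a polynomial one.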

The CUDA GPU implementations of the iterative solvers, the preconditioners, and the Navier-Stokes solver were validated and evaluated against serial and parallel CPU implementations. For the iterative solvers, speedups of between six and thirteen were achieved against the MKL CPU library, and the implemented methods outperform existing open-source GPU implementations of equivalent methods. For the full Navier-Stokes solver, speedups of up to a factor of twelve were achieved compared to an equivalent commercial CPU code when equivalent iterative solvers were used. When the commercial CPU code instead used a commercial Algebraic MultiGrid method to solve the pressure Poisson equation, the speedup was a factor of two.

The bottleneck of the resulting implementation was found to be the solution of the pressure Poisson equation, which accounted for a significant part of the total execution time for large problems. The assembly routines implemented on the GPU were highly efficient: their combined execution time was negligible compared to the total execution time.

The GPU has been assessed as a highly capable accelerator for the implemented methods. Speedups of about an order of magnitude have been achieved for algorithms that can be implemented efficiently on the GPU.

Keywords: GPU, GPGPU, CUDA, Iterative Solver, Preconditioner, Navier-Stokes Equations, Fluid Solver, CFD, Finite Volume Methods




Photo credits: Nic McPhee