Ifpack2 - BlockTridiagContainer improvement and fix typo for intel compact blas

James Willenbring requested to merge kyungjoo-kim:ifpack2-develop into develop

Created by: kyungjoo-kim

Description

To improve the line solver performance of SPARC, I made several changes that target the problem sizes of interest (approximately 4 GB of device memory).

  • In general, the kernels now use a larger team size. Because the target problem sizes are relatively small, most line kernels cannot fill the whole GPU; increasing the team size exposes more concurrency.
  • ExtractAndFactorize: the routine that extracts blocks from the Tpetra BlockCrs matrix now uses a different loop order to reduce memory transactions.
  • ComputeResidual: because we use the Tpetra BlockCrs format, we cannot expect a high degree of interleaved memory access. So I moved the parallel loop one level up and coarsened the parallelism, using atomic adds.
  • BlockJacobi: block Jacobi is used when a line has unit length. Previously, the numeric phase factorized and the solve phase applied forward/backward solves. Now, when a line has unit length, the numeric phase inverts the diagonals and the solve phase just applies gemv.
  • KokkosBatched files: I also included the batched header files that are required for this update.
  • @kliegeois This also includes the fix for the compact MKL typo.
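
The BlockJacobi change listed above can be illustrated with a minimal, host-only sketch. This is not the Ifpack2 or KokkosBatched code: the block size `N`, the types `Block` and `Vec`, and the functions `invert_block` and `apply_gemv` are hypothetical names chosen for illustration, and plain Gauss-Jordan elimination with partial pivoting stands in for whatever batched inversion the real kernels use. The point is only that the inversion cost is paid once in the numeric phase, so each solve reduces to a dense matrix-vector product.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical fixed block size, chosen only for this illustration.
constexpr std::size_t N = 3;
using Block = std::array<std::array<double, N>, N>;
using Vec = std::array<double, N>;

// "Numeric phase" (sketch): invert one dense diagonal block using
// Gauss-Jordan elimination with partial pivoting. The same row
// operations are applied to an identity matrix, which becomes the inverse.
Block invert_block(Block a) {
  Block inv{};
  for (std::size_t i = 0; i < N; ++i) inv[i][i] = 1.0;

  for (std::size_t col = 0; col < N; ++col) {
    // Pick the row with the largest magnitude entry in this column.
    std::size_t piv = col;
    for (std::size_t r = col + 1; r < N; ++r)
      if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
    assert(a[piv][col] != 0.0 && "singular block");
    std::swap(a[piv], a[col]);
    std::swap(inv[piv], inv[col]);

    // Scale the pivot row so the pivot entry becomes 1.
    const double d = a[col][col];
    for (std::size_t j = 0; j < N; ++j) {
      a[col][j] /= d;
      inv[col][j] /= d;
    }

    // Eliminate this column from every other row.
    for (std::size_t r = 0; r < N; ++r) {
      if (r == col) continue;
      const double f = a[r][col];
      for (std::size_t j = 0; j < N; ++j) {
        a[r][j] -= f * a[col][j];
        inv[r][j] -= f * inv[col][j];
      }
    }
  }
  return inv;
}

// "Solve phase" (sketch): with the inverse precomputed, applying the
// preconditioner to a unit-length line is a single gemv, y = Dinv * x.
Vec apply_gemv(const Block& dinv, const Vec& x) {
  Vec y{};
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j) y[i] += dinv[i][j] * x[j];
  return y;
}
```

In this sketch, a caller would run `invert_block` once per diagonal block whenever the matrix values change, then call `apply_gemv` on every application of the preconditioner. Compared with forward/backward substitution, the gemv has no sequential dependence between rows, which is presumably why it suits the GPU better for unit-length lines.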

Related Issues

#4388 #4584 (closed)

How Has This Been Tested?

Tested on Kokkos-dev-2 and bowman.