Tpetra::CrsMatrix::getLocalDiagCopy: Kokkos-parallelize diagonal extraction
Created by: mhoemmen
@trilinos/tpetra Performance tests with Nalu discovered that Ifpack2::Relaxation Jacobi setup had a sequential section in CrsMatrix::getLocalDiagCopy (the two-argument version, though it's easy enough to optimize the one-argument version too). Nalu developers experimented with putting a parallel_for loop around diagonal extraction in getLocalDiagCopy, and it worked great.
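
For illustration, here is a minimal sketch of why the extraction parallelizes so easily. It is not Tpetra's implementation: `CsrMatrix`, `extractDiagEntry`, and `getLocalDiagCopySketch` are hypothetical names, the storage is plain `std::vector` rather than Kokkos Views, and it assumes local row and column indices coincide (in Tpetra the row and column Maps can differ). The point is that each row writes only its own output entry, so the per-row body could be handed to a `parallel_for` over rows without any synchronization.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR storage; NOT Tpetra's local matrix type.
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;  // size numRows + 1; row i owns [rowPtr[i], rowPtr[i+1])
  std::vector<int> colInd;          // column index of each stored entry
  std::vector<double> values;       // value of each stored entry
};

// Per-row body. It reads only row i of A and writes only diag[i],
// so different rows never touch the same output location.
// Assumption: row index i and column index i refer to the same unknown.
inline void extractDiagEntry(const CsrMatrix& A, std::size_t i,
                             std::vector<double>& diag) {
  diag[i] = 0.0;  // rows with no stored diagonal entry yield zero
  for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
    if (A.colInd[k] == static_cast<int>(i)) {
      diag[i] = A.values[k];
      break;
    }
  }
}

// Serial driver. With Kokkos, this loop over rows is what one would
// replace by a parallel_for whose functor runs the per-row body above.
inline std::vector<double> getLocalDiagCopySketch(const CsrMatrix& A) {
  const std::size_t numRows = A.rowPtr.empty() ? 0 : A.rowPtr.size() - 1;
  std::vector<double> diag(numRows);
  for (std::size_t i = 0; i < numRows; ++i) {
    extractDiagEntry(A, i, diag);
  }
  return diag;
}
```

Calling `getLocalDiagCopySketch(A)` returns one value per local row. If the position of each diagonal entry were precomputed, the inner search would disappear and the per-row body would reduce to a single indexed load.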