KokkosKernels, Tpetra: Finish & optimize threaded GEMV & GEMM
Created by: mhoemmen
Tpetra::MultiVector::multiply currently invokes the BLAS for GEMM and GEMV operations. Belos in turn invokes Tpetra::MultiVector::multiply for the projection and basis vector update operations in classical Gram-Schmidt, an important kernel in the GMRES iterative linear solver.
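For reference, here is a minimal serial sketch of the two classical Gram-Schmidt operations mentioned above, written as plain loops over a column-major basis. This is not the Tpetra or Belos code. The function names `project` and `update` and the flat `std::vector` storage are hypothetical, chosen only to show the loops that a threaded GEMV (or GEMM, for a block of vectors) would have to cover.

```cpp
#include <cstddef>
#include <vector>

// Q is an n-by-k basis stored column-major: entry (i, j) is Q[i + j*n].
// (Storage layout and function names are assumptions for illustration.)

// Projection step: h = Q^T * w, i.e. a transposed GEMV.
// Each h[j] is a dot product of column j of Q with w.
std::vector<double> project(const std::vector<double>& Q,
                            const std::vector<double>& w,
                            std::size_t n, std::size_t k) {
  std::vector<double> h(k, 0.0);
  for (std::size_t j = 0; j < k; ++j) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      sum += Q[i + j * n] * w[i];
    }
    h[j] = sum;
  }
  return h;
}

// Update step: w = w - Q * h, i.e. a non-transposed GEMV
// with alpha = -1 and beta = 1.
void update(const std::vector<double>& Q,
            const std::vector<double>& h,
            std::vector<double>& w,
            std::size_t n, std::size_t k) {
  for (std::size_t j = 0; j < k; ++j) {
    for (std::size_t i = 0; i < n; ++i) {
      w[i] -= Q[i + j * n] * h[j];
    }
  }
}
```

In the projection, the rows (index `i`) are the long dimension, so a threaded version would need a parallel reduction over `i`. In the update, each `w[i]` can be computed independently, so the loop over `i` can be split across threads without a reduction.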
Trilinos' configuration process finds the system BLAS by default. That BLAS implementation is usually not threaded. As a result, a large part of each linear solve does not run threaded.
My work-around for #243 (closed; see that issue for details) takes the first step toward solving this. However, the kernels are not complete and have not been optimized. I'll write more here about how to do that.