Tpetra BCRS: Improve vectorization of small dense linear algebra operations
Created by: mhoemmen
@trilinos/tpetra @trilinos/ifpack2 @crtrott @kyungjoo-kim @amklinv
Tpetra::Experimental::BlockCrsMatrix uses the small dense linear algebra operations currently implemented in Tpetra_Experimental_BlockView.hpp. These operations take Kokkos::View or LittleVector / LittleBlock. (From the perspective of these operations, the two interfaces are similar enough that we need only consider Kokkos::View in what follows, without loss of generality.) For example, Tpetra::Experimental::GEMV (small dense matrix times small dense vector) takes a rank-2 View (the matrix) and two rank-1 Views (the input and output vectors).
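To make the current access pattern concrete, here is a minimal sketch of a one-block GEMV in the spirit of Tpetra::Experimental::GEMV. It assumes y := y + alpha * A * x semantics and uses hypothetical `SmallBlock` / `SmallVector` stand-ins instead of Kokkos::View so that it compiles on its own; `gemvOneBlock` is likewise a hypothetical name, not the Tpetra function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a rank-2 View (one m x n block, row-major) and
// a rank-1 View (one small vector). These are NOT the Kokkos::View API.
struct SmallBlock {
  const double* data;
  std::size_t m, n;
  double operator()(std::size_t i, std::size_t j) const { return data[i * n + j]; }
};

struct SmallVector {
  double* data;
  std::size_t n;
  double& operator()(std::size_t k) const { return data[k]; }
};

// Hypothetical: y := y + alpha * A * x for a single block. Both loops run
// only over the (small) block dimension, so they are too short for the
// compiler to vectorize well.
void gemvOneBlock(double alpha, const SmallBlock& A, const SmallVector& x,
                  const SmallVector& y) {
  assert(A.n == x.n && A.m == y.n);
  for (std::size_t i = 0; i < A.m; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < A.n; ++j) {
      sum += A(i, j) * x(j);
    }
    y(i) += alpha * sum;
  }
}
```

BlockCrsMatrix then calls a routine like this once per block, so the only long loop (over blocks) sits outside the routine, where the compiler cannot vectorize it.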
A couple of weeks ago, discussions with @nmhamster suggested that we could get outer-loop vectorization by doing the following:
- Change the storage layout so that the (i,j) entries of consecutive blocks (or the (i) entries of consecutive vectors) are stored contiguously
- Linear algebra operations on those small dense blocks would then need to take a whichBlock / whichVector index argument that tells them which block / vector to use
The routines themselves wouldn't change, except that instead of writing A(i,j) or x(k), for example, we would write A(i,j,whichBlock) or x(k,whichBlock). This relies on the compiler inlining Kokkos::View::operator(), but it is a much easier approach than explicit SIMD.
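A minimal sketch of the proposed layout and the resulting batched GEMV, under the same assumptions as a plain y := y + alpha * A * x kernel. `InterleavedBlocks`, `InterleavedVectors`, and `gemvBatched` are hypothetical names used only for illustration. Entry (i,j) of block whichBlock is stored at ((i*n + j) * numBlocks + whichBlock), which should match the index map of a rank-3 Kokkos::View with LayoutRight and extents (m, n, numBlocks).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interleaved storage (not a Kokkos type): the (i,j) entries
// of consecutive blocks are adjacent in memory.
struct InterleavedBlocks {
  std::size_t m, n, numBlocks;
  std::vector<double> data;
  InterleavedBlocks(std::size_t m_, std::size_t n_, std::size_t nb)
      : m(m_), n(n_), numBlocks(nb), data(m_ * n_ * nb, 0.0) {}
  double& operator()(std::size_t i, std::size_t j, std::size_t whichBlock) {
    return data[(i * n + j) * numBlocks + whichBlock];
  }
  const double& operator()(std::size_t i, std::size_t j, std::size_t whichBlock) const {
    return data[(i * n + j) * numBlocks + whichBlock];
  }
};

// Same idea for vectors: entry k of vector whichVector is at
// (k * numVectors + whichVector).
struct InterleavedVectors {
  std::size_t n, numVectors;
  std::vector<double> data;
  InterleavedVectors(std::size_t n_, std::size_t nv)
      : n(n_), numVectors(nv), data(n_ * nv, 0.0) {}
  double& operator()(std::size_t k, std::size_t whichVector) {
    return data[k * numVectors + whichVector];
  }
  const double& operator()(std::size_t k, std::size_t whichVector) const {
    return data[k * numVectors + whichVector];
  }
};

// Hypothetical: for every block b, y(:,b) := y(:,b) + alpha * A(:,:,b) * x(:,b).
// The loop body is the single-block GEMV with an extra whichBlock index.
// Putting the whichBlock loop innermost makes every access unit-stride
// across blocks, which gives the compiler a long loop it can vectorize
// (assuming operator() inlines).
void gemvBatched(double alpha, const InterleavedBlocks& A,
                 const InterleavedVectors& x, InterleavedVectors& y) {
  assert(A.numBlocks == x.numVectors && A.numBlocks == y.numVectors);
  assert(A.n == x.n && A.m == y.n);
  const std::size_t nb = A.numBlocks;
  for (std::size_t i = 0; i < A.m; ++i) {
    for (std::size_t j = 0; j < A.n; ++j) {
      for (std::size_t whichBlock = 0; whichBlock < nb; ++whichBlock) {
        y(i, whichBlock) += alpha * A(i, j, whichBlock) * x(j, whichBlock);
      }
    }
  }
}
```

The design choice is that vectorization comes from the loop over blocks, whose length is the number of blocks processed together, rather than from the short loops inside one block.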
This depends on #177 (closed) and #179 (closed).