Tpetra BlockCrsMatrix (BCRS): Overlap communication & computation in apply()
Created by: mhoemmen
@trilinos/tpetra
Epic: #767.
`Tpetra::Experimental::BlockCrsMatrix`'s sparse matrix-vector multiply lives in its `apply` method (which implements `Tpetra::Operator::apply`). Its implementation does not currently overlap communication and computation. Doing so could improve performance, and could also make that performance more robust to small random variations in speed between MPI processes.
The fix would take the same form as the proposed fix for #385; see that issue for details.
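The general overlap pattern for a distributed sparse matrix-vector multiply is: start the exchange of off-process ("ghost") vector entries, multiply by the blocks whose columns are locally owned while that exchange is in flight, then wait for the ghosts and finish with the remaining blocks. The following is a minimal, self-contained C++ sketch of that pattern under stated assumptions. It is not Tpetra code and does not reflect how #385 is actually fixed. `BlockCrs`, `overlappedApply`, and `exampleApply` are hypothetical names, and a `std::async` task stands in for the MPI ghost exchange (in a real distributed code, nonblocking sends and receives).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical block CRS storage: each stored entry is a dense b x b block
// (row-major). Block column indices < numOwnedCols refer to locally owned
// vector entries; indices >= numOwnedCols refer to ghost (off-process) ones.
struct BlockCrs {
  int blockSize;
  int numRows;       // number of block rows
  int numOwnedCols;  // number of owned block columns
  std::vector<int> rowPtr;
  std::vector<int> colInd;
  std::vector<double> vals;  // blockSize * blockSize values per stored block
};

// y += A * x for one dense b x b block.
static void applyBlock(const double* A, const double* x, double* y, int b) {
  for (int i = 0; i < b; ++i) {
    double sum = 0.0;
    for (int j = 0; j < b; ++j) sum += A[i * b + j] * x[j];
    y[i] += sum;
  }
}

// Accumulate contributions of stored blocks whose block column lies in
// [colBegin, colEnd); xCols holds the vector entries for that column range.
static void applyColumnRange(const BlockCrs& A, const std::vector<double>& xCols,
                             int colBegin, int colEnd, std::vector<double>& y) {
  const int b = A.blockSize;
  assert(xCols.size() >= static_cast<std::size_t>(colEnd - colBegin) * b);
  for (int r = 0; r < A.numRows; ++r) {
    for (int k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k) {
      const int c = A.colInd[k];
      if (c < colBegin || c >= colEnd) continue;
      applyBlock(&A.vals[static_cast<std::size_t>(k) * b * b],
                 &xCols[static_cast<std::size_t>(c - colBegin) * b],
                 &y[static_cast<std::size_t>(r) * b], b);
    }
  }
}

// Overlapped y = A * x. fetchGhosts is a stand-in (assumption) for the
// ghost exchange; it runs asynchronously while owned columns are processed.
template <class FetchGhosts>
std::vector<double> overlappedApply(const BlockCrs& A, const std::vector<double>& xOwned,
                                    FetchGhosts fetchGhosts, int numGhostCols) {
  // 1. Start "communication".
  std::future<std::vector<double>> ghosts = std::async(std::launch::async, fetchGhosts);
  std::vector<double> y(static_cast<std::size_t>(A.numRows) * A.blockSize, 0.0);
  // 2. Compute with owned columns while the exchange is in flight.
  applyColumnRange(A, xOwned, 0, A.numOwnedCols, y);
  // 3. Wait for the ghost entries, then finish with the ghost columns.
  const std::vector<double> xGhost = ghosts.get();
  applyColumnRange(A, xGhost, A.numOwnedCols, A.numOwnedCols + numGhostCols, y);
  return y;
}

// Usage: 2 block rows, block size 2, two owned block columns, one ghost column.
inline std::vector<double> exampleApply() {
  BlockCrs A{2, 2, 2, {0, 2, 4}, {0, 2, 1, 2},
             {1, 0, 0, 1,    // row 0, col 0: identity
              2, 0, 0, 2,    // row 0, ghost col 2: 2I
              1, 1, 0, 1,    // row 1, col 1
              0, 1, 1, 0}};  // row 1, ghost col 2: swap
  const std::vector<double> xOwned = {1, 2, 3, 4};
  auto fetch = [] {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // simulated latency
    return std::vector<double>{5, 6};
  };
  return overlappedApply(A, xOwned, fetch, 1);
}
```

Filtering by column range means the sketch scans every stored block in both passes; a production implementation would more likely store the owned-column and ghost-column blocks separately so that each pass touches only its own entries.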