Tpetra BlockCrsMatrix (BCRS): Overlap communication & computation in apply()
Created by: mhoemmen
Tpetra::Experimental::BlockCrsMatrix's sparse matrix-vector multiply lives in its apply method (which implements Tpetra::Operator::apply). Its implementation does not currently overlap communication and computation. Doing so could improve performance, and could also make that performance more robust to small random timing variations between MPI processes.
The fix would take the same form as the fix proposed for #385 (see that issue).
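For readers unfamiliar with the general technique, here is a minimal sketch of one common way to overlap communication and computation in a distributed sparse matrix-vector multiply. It is not Tpetra code and is not necessarily what #385 proposes. Everything in it is hypothetical and for illustration only: the LocalCsr struct (scalar entries rather than blocks), the FakeExchange class that stands in for nonblocking MPI calls, and the overlappedApply function. The idea is to start the halo exchange, multiply the rows that need only locally owned entries of x while messages are in flight, then wait and finish the rows that need remote entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical local CSR matrix (scalar entries, i.e. block size 1).
// Column indices < numOwned refer to locally owned entries of x;
// indices >= numOwned refer to ghost entries received from other processes.
struct LocalCsr {
  std::size_t numOwned;
  std::vector<std::size_t> rowPtr, colInd;
  std::vector<double> val;
};

// Hypothetical stand-in for a nonblocking halo exchange. With MPI, start()
// would post MPI_Irecv/MPI_Isend and wait() would call MPI_Waitall.
struct FakeExchange {
  std::vector<double> ghosts;
  bool started = false;
  void start() { started = true; }
  const std::vector<double>& wait() const {
    assert(started);
    return ghosts;
  }
};

// Multiply only the listed rows: y[i] = sum over k of A(i, col_k) * x[col_k].
void applyRows(const LocalCsr& A, const std::vector<std::size_t>& rows,
               const std::vector<double>& x, std::vector<double>& y) {
  for (std::size_t i : rows) {
    double sum = 0.0;
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      sum += A.val[k] * x[A.colInd[k]];
    }
    y[i] = sum;
  }
}

template <class Exchange>
std::vector<double> overlappedApply(const LocalCsr& A,
                                    const std::vector<double>& xOwned,
                                    Exchange& exchange) {
  assert(xOwned.size() == A.numOwned);
  const std::size_t numRows = A.rowPtr.size() - 1;

  // Split rows into interior (owned columns only) and boundary (at least one
  // ghost column). A real code would precompute this split once, not on
  // every apply.
  std::vector<std::size_t> interior, boundary;
  for (std::size_t i = 0; i < numRows; ++i) {
    bool needsGhost = false;
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      if (A.colInd[k] >= A.numOwned) needsGhost = true;
    }
    (needsGhost ? boundary : interior).push_back(i);
  }

  std::vector<double> y(numRows, 0.0);
  exchange.start();                   // 1. begin communication
  applyRows(A, interior, xOwned, y);  // 2. compute while messages are in flight

  // 3. Complete communication and append ghost values after the owned ones.
  std::vector<double> xCol = xOwned;
  const std::vector<double>& ghosts = exchange.wait();
  xCol.insert(xCol.end(), ghosts.begin(), ghosts.end());

  applyRows(A, boundary, xCol, y);    // 4. rows that need remote data
  return y;
}
```

In this sketch the benefit comes from step 2: a process whose messages arrive late still has useful work to do, which is why overlap can reduce sensitivity to timing variation between processes. How much it helps depends on the ratio of interior to boundary rows.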