Tpetra::BlockCrsMatrix::apply: Don't copy entire source (multi)vector
Created by: mhoemmen
If a Tpetra::Experimental::BlockCrsMatrix's communicator has more than one MPI process, and if sparse matrix-vector multiply with that matrix would normally require communication, then apply() copies the entire source (multi)vector, including the local entries.
The usual case is that the domain and column Maps have all their local entries first on every participating process, and that the remote entries follow in the column Map. This case does not require copying the local entries. Instead, the remote entries could be Imported into a separate data structure, and the remote part of the mat-vec done separately.
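To illustrate the idea, here is a minimal sketch in plain C++. It is not Tpetra code and does not use any Tpetra API. It assumes a scalar (non-block) CSR matrix on one process, with column indices ordered as described above: local entries first, remote entries after. The names `LocalCsr`, `numLocalCols`, `applySplit`, `xLocal`, and `xRemote` are all hypothetical; `xRemote` stands in for whatever separate buffer the Import would fill.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one process's part of a sparse matrix.
// Column indices < numLocalCols refer to locally owned source entries;
// indices >= numLocalCols refer to remote entries, in column Map order.
struct LocalCsr {
  std::size_t numLocalCols;
  std::vector<std::size_t> rowPtr;
  std::vector<std::size_t> colInd;
  std::vector<double> val;
};

// Compute y = A * [xLocal; xRemote] without building the concatenated
// vector. xLocal is read in place; only xRemote is a separate buffer.
std::vector<double> applySplit(const LocalCsr& A,
                               const std::vector<double>& xLocal,
                               const std::vector<double>& xRemote) {
  assert(!A.rowPtr.empty());
  assert(xLocal.size() == A.numLocalCols);
  const std::size_t numRows = A.rowPtr.size() - 1;
  std::vector<double> y(numRows, 0.0);

  // Pass 1: local part. Touches only xLocal, so no copy is needed.
  for (std::size_t r = 0; r < numRows; ++r) {
    for (std::size_t k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k) {
      const std::size_t c = A.colInd[k];
      if (c < A.numLocalCols) {
        y[r] += A.val[k] * xLocal[c];
      }
    }
  }

  // Pass 2: remote part. Accumulates contributions from imported entries.
  for (std::size_t r = 0; r < numRows; ++r) {
    for (std::size_t k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k) {
      const std::size_t c = A.colInd[k];
      if (c >= A.numLocalCols) {
        assert(c - A.numLocalCols < xRemote.size());
        y[r] += A.val[k] * xRemote[c - A.numLocalCols];
      }
    }
  }
  return y;
}
```

Since the first pass never reads `xRemote`, it could in principle run while the remote entries are still being communicated. A real implementation would likely store the local and remote parts of the matrix separately rather than test every column index twice; the two filtered loops here are only meant to keep the sketch short.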