Tpetra::CrsMatrix::apply: Don't copy entire source (multi)vector
Created by: mhoemmen
@trilinos/tpetra
Epic: #767.
If the number of MPI processes in a Tpetra::CrsMatrix's communicator is greater than 1, and if sparse matrix-vector multiply with that matrix would normally require communication, then apply() copies the entire source (multi)vector, including the local entries. This only affects performance for unpreconditioned or weakly preconditioned iterative solves, and even then, not very much.
The usual case is that the domain and column Maps have all their local entries first on every participating process, and that the remote entries follow in the column Map. This case does not require copying the local entries. Instead, the remote entries could be Imported into a separate data structure, and the remote part of the mat-vec done separately. See also #439 for discussion of a more general fix.
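To make the idea concrete, here is a rough sketch in plain C++ of what "do the remote part separately" could look like on one process. This is not Tpetra code: `LocalCsr`, `applyLocalPart`, `applyRemotePart`, and `splitApply` are hypothetical names, and the sketch assumes the usual case above, where local column indices `[0, numLocalCols)` are the owned entries and everything after that is remote, in column Map order. The remote buffer stands in for whatever the Import would fill.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR storage for one process's rows (not a Tpetra type).
// Column indices are local column Map indices: [0, numLocalCols) are owned
// entries of x; indices >= numLocalCols are remote entries.
struct LocalCsr {
  std::size_t numLocalCols;
  std::vector<std::size_t> rowPtr;
  std::vector<std::size_t> colInd;
  std::vector<double> val;
};

// Pass 1: accumulate only the entries whose column is owned locally.
// Reads the source vector in place; needs no communication and no copy.
void applyLocalPart(const LocalCsr& A, const std::vector<double>& xLocal,
                    std::vector<double>& y) {
  for (std::size_t row = 0; row + 1 < A.rowPtr.size(); ++row) {
    for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k) {
      const std::size_t col = A.colInd[k];
      if (col < A.numLocalCols) {
        y[row] += A.val[k] * xLocal[col];
      }
    }
  }
}

// Pass 2: accumulate only the entries whose column is remote.
// xRemote holds just the imported entries, offset by numLocalCols.
void applyRemotePart(const LocalCsr& A, const std::vector<double>& xRemote,
                     std::vector<double>& y) {
  for (std::size_t row = 0; row + 1 < A.rowPtr.size(); ++row) {
    for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k) {
      const std::size_t col = A.colInd[k];
      if (col >= A.numLocalCols) {
        y[row] += A.val[k] * xRemote[col - A.numLocalCols];
      }
    }
  }
}

// y = A * [xLocal; xRemote], without ever concatenating the two pieces.
std::vector<double> splitApply(const LocalCsr& A,
                               const std::vector<double>& xLocal,
                               const std::vector<double>& xRemote) {
  assert(!A.rowPtr.empty());
  assert(xLocal.size() == A.numLocalCols);
  std::vector<double> y(A.rowPtr.size() - 1, 0.0);
  applyLocalPart(A, xLocal, y);    // could overlap with the Import
  applyRemotePart(A, xRemote, y);  // runs once remote entries have arrived
  return y;
}
```

Only the remote entries ever land in a separate buffer; the local entries are read straight from the source vector. A real implementation would presumably store the local and remote blocks as separate matrices rather than branching on every entry, but the data flow would be the same.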
This depends on #437 (closed) and #439.
This is related to #385, in that the same technique that fixes #385 would also fix this issue. See the discussion there.