## Tpetra::DistObject: Add "locals only" and "remotes only" options to doImport and doExport

*Created by: mhoemmen*

@trilinos/tpetra

Epic: #767.

To the doImport and doExport methods of Tpetra::DistObject, add an option to import or export

- locals only,
- remotes only, or
- everything (what these methods currently do).
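As a rough illustration of what such an option could look like, here is a minimal single-process sketch. Everything in it is hypothetical: `TransferPart`, `ToyPlan`, and `toyImport` are made-up names and not Tpetra API, and `recvBuf` stands in for data that other processes would have sent. It only shows how one plan could serve all three modes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical option; the name and enumerators are assumptions, not Tpetra API.
enum class TransferPart { LocalsOnly, RemotesOnly, Everything };

// Toy stand-in for the information one process's Import plan holds.
struct ToyPlan {
  std::size_t numSame;                   // leading entries copied in place
  std::vector<std::size_t> permuteFrom;  // on-process source local indices
  std::vector<std::size_t> permuteTo;    // on-process target local indices
  std::vector<std::size_t> remoteTo;     // target local indices filled by other processes
};

// Copy the requested part of the transfer into tgt.
// recvBuf models data already received from other processes.
void toyImport(const ToyPlan& plan, const std::vector<double>& src,
               const std::vector<double>& recvBuf, std::vector<double>& tgt,
               TransferPart part) {
  assert(plan.permuteFrom.size() == plan.permuteTo.size());
  assert(plan.remoteTo.size() == recvBuf.size());
  if (part != TransferPart::RemotesOnly) {
    // Locals: the "same" entries, then the on-process permutations.
    for (std::size_t i = 0; i < plan.numSame; ++i) {
      tgt[i] = src[i];
    }
    for (std::size_t k = 0; k < plan.permuteTo.size(); ++k) {
      tgt[plan.permuteTo[k]] = src[plan.permuteFrom[k]];
    }
  }
  if (part != TransferPart::LocalsOnly) {
    // Remotes: unpack what other processes sent.
    for (std::size_t k = 0; k < plan.remoteTo.size(); ++k) {
      tgt[plan.remoteTo[k]] = recvBuf[k];
    }
  }
}
```

In this sketch, calling `toyImport` with `LocalsOnly` and then `RemotesOnly` on the same target gives the same result as a single call with `Everything`, which is the property the mat-vec overlap relies on.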

This would let us overlap communication and computation in CrsMatrix or BlockCrsMatrix matrix-vector multiply without changing CrsGraph semantics. In particular, CrsGraph could still compute its Import from the domain Map to the entire column Map (locals included), and code that relies on this Import would work unchanged. Matrix-vector multiply implementations could then do coarse-grained overlap as follows:

1. Start a nonblocking Import of the remotes
2. Import the locals (if necessary)
3. Do the local part of the mat-vec
4. Finish the nonblocking Import of the remotes
5. Do the remote part of the mat-vec (in place, in the row Map vector -- hence coarse-grained overlap)
6. Do an analogous procedure to overlap the Export, if an Export is needed (if row Map != range Map)
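The ordering of the Import steps above can be sketched on a single process as follows. This is a toy model under stated assumptions, not Tpetra code: `ToyCsr`, `ToyRemoteRequest`, `applyColumnRange`, and `overlappedApply` are hypothetical names, the column Map is assumed to list on-process entries first and remotes after them, and the "nonblocking" receive is simulated by an object whose `wait` fills in the remote entries. No real communication or concurrency happens here; the sketch only shows where the local and remote parts of the mat-vec fit around the start and finish of the transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CSR matrix on one process. Columns [0, numLocalCols) are on-process
// entries of the column Map; columns >= numLocalCols are remotes (assumption).
struct ToyCsr {
  std::size_t numLocalCols;
  std::vector<std::size_t> rowPtr;
  std::vector<std::size_t> colInd;
  std::vector<double> val;
};

// Hypothetical stand-in for a nonblocking Import of the remotes.
struct ToyRemoteRequest {
  std::vector<double> incoming;  // what other processes would send
  // "Finish" the transfer: write remote entries into xCol starting at offset.
  void wait(std::vector<double>& xCol, std::size_t offset) const {
    for (std::size_t k = 0; k < incoming.size(); ++k) {
      xCol[offset + k] = incoming[k];
    }
  }
};

// y += A(:, [colBegin, colEnd)) * xCol, touching only columns in that range.
void applyColumnRange(const ToyCsr& A, const std::vector<double>& xCol,
                      std::vector<double>& y, std::size_t colBegin,
                      std::size_t colEnd) {
  for (std::size_t r = 0; r + 1 < A.rowPtr.size(); ++r) {
    for (std::size_t k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k) {
      const std::size_t c = A.colInd[k];
      if (c >= colBegin && c < colEnd) {
        y[r] += A.val[k] * xCol[c];
      }
    }
  }
}

std::vector<double> overlappedApply(const ToyCsr& A,
                                    const std::vector<double>& xDomain,
                                    const ToyRemoteRequest& req) {
  assert(xDomain.size() >= A.numLocalCols);
  const std::size_t numRows = A.rowPtr.empty() ? 0 : A.rowPtr.size() - 1;
  const std::size_t numCols = A.numLocalCols + req.incoming.size();
  std::vector<double> xCol(numCols, 0.0);
  std::vector<double> y(numRows, 0.0);

  // Start: the remote transfer is considered "in flight" (req).
  // Import the locals (plain copy; assumes no permutation is needed).
  for (std::size_t i = 0; i < A.numLocalCols; ++i) {
    xCol[i] = xDomain[i];
  }
  // Local part of the mat-vec.
  applyColumnRange(A, xCol, y, 0, A.numLocalCols);
  // Finish the transfer of the remotes.
  req.wait(xCol, A.numLocalCols);
  // Remote part of the mat-vec, accumulated in place into y.
  applyColumnRange(A, xCol, y, A.numLocalCols, numCols);
  return y;
}
```

The sketch scans every row twice for simplicity. A real implementation would more likely store or traverse the local and remote column blocks separately.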

The "if necessary" remark on the second step (importing the locals) relates to whether the domain Map is "fitted" to the column Map (see #437 (closed) for a definition of "fitted"). If it is, the local entries of the input vector would not need to be copied (see #435 and #436). If it is not, they would need to be copied, but each process could decide this independently: processes that don't need permutations wouldn't need to make a copy.
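To make the per-process nature of that check concrete, here is a small sketch. It assumes that "fitted" means the domain Map's on-process global indices are exactly the leading entries of the column Map's on-process global indices, in the same order; #437 has the authoritative definition. `GlobalIndex` and `toyIsLocallyFitted` are hypothetical names for illustration, operating on plain index lists rather than Tpetra Maps.

```cpp
#include <algorithm>
#include <vector>

using GlobalIndex = long long;

// Returns true if, on this process, the domain Map's global indices are the
// leading entries of the column Map's global indices in the same order.
// Purely local: no communication with other processes is involved.
bool toyIsLocallyFitted(const std::vector<GlobalIndex>& domainGids,
                        const std::vector<GlobalIndex>& columnGids) {
  if (domainGids.size() > columnGids.size()) {
    return false;
  }
  return std::equal(domainGids.begin(), domainGids.end(), columnGids.begin());
}
```

A process for which this returns true could skip the local copy, while a process for which it returns false would copy (and permute) its local entries.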

This approach has the following benefits over one that uses a different Import than the CrsGraph's domain -> column Map Import:

- CrsGraph's Import retains its current meaning
- Neither the graph nor the matrix would need to compute a new Import object just for the remotes
- This approach would work for any domain and column Maps, and in fact for any range and row Maps. In particular, it would work regardless of whether the domain Map is fitted to the column Map. The graph or matrix would not need to do any extra all-reduces to figure out if the Maps are fitted on all processes.