Tpetra::CrsGraph::makeColMap: Thread-parallelize it & remove (unstated) UVM assumption
Created by: mhoemmen
@trilinos/tpetra Story: #829 (closed), #832
Thread-parallelize Tpetra::CrsGraph::makeColMap and remove the unstated UVM assumption.
We can only thread-parallelize this if the graph is StaticProfile. The current storage for DynamicProfile (Teuchos::ArrayRCP<Teuchos::Array< LO | GO > >) is host-only and not thread safe.
The case where sortGhostsAssociatedWithEachProcessor_ is true is easier:
- Estimate number of entries in column Map on calling process
- Create Kokkos::UnorderedMap to store those entries
- Use local version of domain Map to figure out what's local & what's remote; stash remotes in the UnorderedMap
- Convert UnorderedMap into array, and sort (the current version uses std::set, which sorts on insert)
- Continue with the rest of the method, more or less as it stands
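
The steps above can be sketched roughly as follows. This is a sequential illustration only, not Tpetra code: std::unordered_set stands in for Kokkos::UnorderedMap, and the names LocalDomainMap, isLocal, and collectSortedRemotes are hypothetical. It also assumes, purely for illustration, that the calling process owns a contiguous range of global indices. The later grouping of remotes by owning process ("the rest of the method") is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using GO = std::int64_t;  // global ordinal type (assumed for this sketch)

// Hypothetical stand-in for the "local version of the domain Map".
// Assumption: the calling process owns the contiguous range [minGid, maxGid].
struct LocalDomainMap {
  GO minGid;
  GO maxGid;
  bool isLocal(GO gid) const { return gid >= minGid && gid <= maxGid; }
};

// Collect the distinct remote column indices of a graph, sorted ascending.
// graphColInds holds the global column indices of all local rows, flattened.
std::vector<GO> collectSortedRemotes(const std::vector<GO>& graphColInds,
                                     const LocalDomainMap& domainMap,
                                     std::size_t estimatedNumRemotes) {
  // Steps 1-2: size the hash set from an estimate of the number of remotes.
  std::unordered_set<GO> remoteSet;
  remoteSet.reserve(estimatedNumRemotes);

  // Step 3: classify each index as local or remote; stash the remotes.
  // Duplicates are absorbed by the set.
  for (GO gid : graphColInds) {
    if (!domainMap.isLocal(gid)) {
      remoteSet.insert(gid);
    }
  }

  // Step 4: convert the set into an array and sort it explicitly,
  // instead of relying on std::set's sort-on-insert.
  std::vector<GO> remotes(remoteSet.begin(), remoteSet.end());
  std::sort(remotes.begin(), remotes.end());
  assert(std::is_sorted(remotes.begin(), remotes.end()));
  return remotes;
}
```

In a thread-parallel version, the loop in Step 3 would be the parallel region, which is why the container that receives the remotes needs to support concurrent insertion.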
If sortGhostsAssociatedWithEachProcessor_ is false, we can use the result of Step 4 above (converting the UnorderedMap into an array) to allocate an array (corresponding to RemoteGIDUnorderedVector in the current version of the code), then make a second pass in order to fill it.
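
A rough sequential sketch of this two-pass idea follows. It is not Tpetra code: std::unordered_set again stands in for Kokkos::UnorderedMap, and the names OwnedRange, countDistinctRemotes, and fillRemotesUnsorted are hypothetical. It assumes that the unsorted case should keep remotes in the order in which they are first encountered, and that the calling process owns a contiguous range of global indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using GO = std::int64_t;  // global ordinal type (assumed for this sketch)

// Hypothetical ownership test.
// Assumption: the calling process owns the contiguous range [minGid, maxGid].
struct OwnedRange {
  GO minGid;
  GO maxGid;
  bool owns(GO gid) const { return gid >= minGid && gid <= maxGid; }
};

// Pass 1: count the distinct remote indices so the output can be allocated.
std::size_t countDistinctRemotes(const std::vector<GO>& colInds,
                                 const OwnedRange& owned) {
  std::unordered_set<GO> seen;
  for (GO gid : colInds) {
    if (!owned.owns(gid)) {
      seen.insert(gid);
    }
  }
  return seen.size();
}

// Pass 2: fill a preallocated array with the remotes, in first-encounter
// order (an assumption of this sketch), skipping duplicates.
std::vector<GO> fillRemotesUnsorted(const std::vector<GO>& colInds,
                                    const OwnedRange& owned,
                                    std::size_t numRemotes) {
  std::vector<GO> remotes(numRemotes);  // allocated up front from pass 1
  std::unordered_set<GO> seen;
  seen.reserve(numRemotes);
  std::size_t next = 0;
  for (GO gid : colInds) {
    // insert(...).second is true only the first time a gid is seen.
    if (!owned.owns(gid) && seen.insert(gid).second) {
      assert(next < numRemotes);
      remotes[next++] = gid;
    }
  }
  assert(next == numRemotes);
  return remotes;
}
```

Note that a thread-parallel fill would not reproduce first-encounter order unless the output positions are fixed deterministically, so this sketch only illustrates the count-then-fill structure.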
Making everything thread parallel in this method depends on a fix for #659.
I marked this as "results impacting," because if sortGhostsAssociatedWithEachProcessor_ is true, the changes proposed above may change the order of remotes in the column Map. This may, in turn, change the order of entries in some rows of the sparse matrix, which may change results (even in exact arithmetic) for some preconditioners / smoothers (e.g., Gauss-Seidel or SOR, implemented in Ifpack2::Relaxation).