Tpetra::Distributor: Fix "slow path" so we can use MPI_Isend
Created by: mhoemmen
@trilinos/tpetra @jjellio @csiefer2
Fix the "slow path" of Distributor::doPosts so that it can use nonblocking sends (MPI_Isend). The slow path kicks in when the data to send are not already grouped in contiguous chunks per process. It permutes the data into chunks that are contiguous by target process rank before sending. Currently, the slow path reuses the same send buffer for all the messages. A nonblocking send requires that its buffer stay unmodified until the send completes, so reusing one buffer for every message rules out MPI_Isend.
We must fix both the "three-argument" (all messages have the same size) and "four-argument" (different messages may have different sizes) overloads of doPosts, and both the Teuchos::ArrayRCP and Kokkos::View versions of each.
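One possible shape for the fix, as a minimal sketch rather than the actual Tpetra code: pack every message into its own disjoint region of a single large buffer, so that no region is overwritten while a send is still in flight. The names SendPlan, Chunk, and packForNonblockingSends are hypothetical and exist only for this illustration. The sketch uses std::vector instead of Teuchos::ArrayRCP or Kokkos::View and leaves out the MPI calls themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one outgoing message on the slow path:
// the entries of the export array that go to a single target process.
struct SendPlan {
  int targetRank;                    // destination process rank
  std::vector<std::size_t> indices;  // positions in the export array
};

// One contiguous region of the packed buffer, one per message.
struct Chunk {
  int targetRank;
  std::size_t offset;  // start of this message in the packed buffer
  std::size_t count;   // number of entries in this message
};

// Permute `exports` into `sendBuf` so that each message occupies its own
// disjoint, contiguous region. Message sizes may differ (the
// "four-argument" case); equal sizes (the "three-argument" case) are just
// a special case of the same layout.
template <class T>
std::vector<Chunk> packForNonblockingSends(const std::vector<T>& exports,
                                           const std::vector<SendPlan>& plans,
                                           std::vector<T>& sendBuf) {
  // Size the buffer for all messages at once, so that it is never
  // reallocated or reused while sends are pending.
  std::size_t total = 0;
  for (const SendPlan& p : plans) {
    total += p.indices.size();
  }
  sendBuf.resize(total);

  std::vector<Chunk> chunks;
  chunks.reserve(plans.size());
  std::size_t offset = 0;
  for (const SendPlan& p : plans) {
    for (std::size_t k = 0; k < p.indices.size(); ++k) {
      assert(p.indices[k] < exports.size());
      sendBuf[offset + k] = exports[p.indices[k]];
    }
    // A caller could now post a nonblocking send of `count` entries
    // starting at sendBuf.data() + offset, and must leave sendBuf
    // untouched until every such send has completed.
    chunks.push_back({p.targetRank, offset, p.indices.size()});
    offset += p.indices.size();
  }
  return chunks;
}
```

The trade-off in this sketch is memory: the packed buffer holds all outgoing data at once instead of one message's worth, in exchange for letting all the sends be posted before any of them is waited on.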
Motivation and Context
This is part of the overall effort to improve MPI+CUDA performance and make Tpetra's boundary exchange and sparse matrix-vector multiply communication nonblocking.
Definition of Done
- Fix 3-argument Teuchos::ArrayRCP overload of doPosts
- Fix 3-argument Kokkos::View overload of doPosts
- Fix 4-argument Teuchos::ArrayRCP overload of doPosts
- Fix 4-argument Kokkos::View overload of doPosts
Related Issues
- Part of #383