Tpetra: Implement interface to nonblocking dot product
Created by: mhoemmen
@trilinos/tpetra
Story: #748 (closed) Blocked by: #752 (closed), #944 (closed)
Implement Tpetra interface to nonblocking dot product, for Tpetra::Vector and Tpetra::MultiVector.
"Nonblocking" could cover both the local (Kokkos) part of the computation and the global (MPI) part. The Kokkos kernels that Tpetra uses for norms and dot products do not block explicitly, so the local part is already nonblocking in practice. It would be better for those kernels to take an execution space instance argument (corresponding to, e.g., a CUDA stream), which would give nonblocking operations correct semantics. The global part depends on #752 (closed).
I would recommend putting this interface in the Tpetra::Details namespace for now. We don't need to make it available in Tpetra::MultiVector or Tpetra::Vector yet.
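To make the intended usage pattern concrete, here is a minimal sketch in plain standard C++ of the "start now, wait later" shape described above. It is not Tpetra, Kokkos, or MPI code: `sketch::DotRequest`, `sketch::idot`, and `sketch::example_usage` are hypothetical names invented for illustration, `std::vector<double>` stands in for a vector's local data, and `std::async` stands in for both the local kernel launch and the global reduction.

```cpp
#include <future>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical request handle (not a Tpetra class). It owns the pending
// result of a dot product that was started earlier.
class DotRequest {
public:
  explicit DotRequest(std::future<double> f) : result_(std::move(f)) {}

  // Block until the result is ready, then return it. Call at most once.
  double wait() { return result_.get(); }

private:
  std::future<double> result_;
};

// Hypothetical nonblocking dot product: starts computing x . y on another
// thread and returns immediately. The lambda captures x and y by reference,
// so the caller must keep both alive and unmodified until wait() returns.
inline DotRequest idot(const std::vector<double>& x,
                       const std::vector<double>& y) {
  if (x.size() != y.size()) {
    throw std::invalid_argument("idot: vectors must have the same length");
  }
  return DotRequest(std::async(std::launch::async, [&x, &y]() {
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
  }));
}

// Usage: start the dot product, overlap other work, then wait.
inline double example_usage() {
  const std::vector<double> x{1.0, 2.0, 3.0};
  const std::vector<double> y{4.0, 5.0, 6.0};
  DotRequest req = idot(x, y);
  // ... independent work could overlap with the dot product here ...
  return req.wait();
}

}  // namespace sketch
```

The point of the handle is that the caller, not the library, decides where to block. A real implementation would presumably attach the request to whatever nonblocking all-reduce #752 provides, which this sketch does not attempt to model.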