Tpetra::Details::FixedHashTable: copyOffsets does incorrect host access from CUDA
Created by: mhoemmen
Tpetra::Details::FixedHashTable: Fix CUDA crash
@trilinos/tpetra
FixedHashTable implements, among other things, Tpetra::Map's conversion from global to local indices for the noncontiguous Map case. It has a "copy constructor" that converts between instances on different Kokkos devices. This constructor uses the internal function copyOffsets, which lives in Tpetra_Details_FixedHashTable_decl.hpp. copyOffsets copies FixedHashTable::ptr_ (the offsets array, which corresponds to the 'ptr' array in CSR) from one device to another. Different devices may use different offset types, so copyOffsets can't just call Kokkos::deep_copy. It also has to check for overflow when converting between offset types. Thus, it uses a custom parallel_reduce functor, CopyOffsets (note the initial capital).
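For readers unfamiliar with the pattern, here is a minimal serial sketch of what a type-converting offsets copy with an overflow check might look like. It is not the Tpetra implementation: the names `overflows` and `copyOffsetsSketch` are hypothetical, `std::vector` stands in for Kokkos::View, and a plain loop stands in for the parallel_reduce (the reduction result is the logical OR of the per-entry overflow flags).

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of Tpetra): returns true if `src` cannot be
// represented exactly in the integer type Out.
template <class Out, class In>
bool overflows(const In src) {
  static_assert(std::is_integral<In>::value && std::is_integral<Out>::value,
                "This sketch only handles integer offset types.");
  const Out dst = static_cast<Out>(src);
  // The value must survive a round trip, and its sign must not change.
  const bool roundTripFails = static_cast<In>(dst) != src;
  const bool signChanged = (src < In(0)) != (dst < Out(0));
  return roundTripFails || signChanged;
}

// Hypothetical serial stand-in for the CopyOffsets reduction: copy each
// offset with a type conversion, and OR together the overflow flags.
// Returns true if every entry was copied without overflow.
template <class Out, class In>
bool copyOffsetsSketch(std::vector<Out>& dst, const std::vector<In>& src) {
  dst.resize(src.size());
  bool overflowed = false;  // the "reduction result"
  for (std::size_t i = 0; i < src.size(); ++i) {
    dst[i] = static_cast<Out>(src[i]);
    overflowed = overflowed || overflows<Out>(src[i]);
  }
  return !overflowed;
}
```

Calling `copyOffsetsSketch(dst, src)` with a `std::vector<long long>` source and a `std::vector<int>` destination returns false if any offset does not fit in `int`. In the real functor, the loop body would be the `operator()` of the reduction, which is why the execution space that runs it must be able to access both Views.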
The CopyOffsets functor is templated on the input and output View types. It assumes that the output View's execution space can access the input View's memory space. That assumption is the bug. For example, if the output View's execution space is Kokkos::Cuda, it cannot access host memory (Kokkos::HostSpace). (CUDA UVM doesn't help here, because it works in the other direction: it lets host code access CUDA UVM allocations, not CUDA kernels access ordinary host allocations.) This is why the Dashboard was showing a failed test: Kokkos caught the incorrect access at run time and the test threw.