I know how to get rid of the CUDA_LAUNCH_BLOCKING=1 requirement
Created by: mhoemmen
@trilinos/tpetra @trilinos/ifpack2 @trilinos/muelu @trilinos/amesos2 @trilinos/zoltan2 @crtrott @nmhamster
The issue is that the mirror View of a CudaUVMSpace View is just the same View. That's not unreasonable, since it is accessible from host. However, it means that if you write to the View on one side (host or device) and then want to access it from the other side, you need to fence in between. Otherwise you'll get bus errors, invalid data, etc. Setting CUDA_LAUNCH_BLOCKING=1 hides the problem because it makes every kernel launch synchronous.
Here is what I tried, and it worked. First, if you need to create mirror Views, insist on them being host Views:
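```cpp
Kokkos::View<double*> x ("x", 100);
// Ask for HostSpace explicitly, so x_h is a separate host allocation
// even if x lives in CudaUVMSpace.
auto x_h = Kokkos::create_mirror_view (Kokkos::HostSpace (), x);
Kokkos::deep_copy (x_h, x);
fill_on_host (x_h); // stands in for your host-side code
Kokkos::deep_copy (x, x_h);
```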
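No need for fences in the above code, since the host code only touches the host copy and the two-argument deep_copy fences for you. Second, insert fences only if you plan to access the UVM Views directly on host, without copying.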
I've CC'd everyone because this is not just some Tpetra thing. Everything downstream of Tpetra needs to pay attention to how it accesses Kokkos::View data, or data that ultimately live in Kokkos Views.
The best way to fix this would be to turn off Kokkos' use of UVM by default, then fix the resulting build and test errors. However, I don't see that happening any time soon.