Tpetra: BCRS blockWiseMultiply runs in wrong execution space
Created by: mhoemmen
@trilinos/tpetra Tpetra::Experimental::BlockCrsMatrix::blockWiseMultiply currently runs in Kokkos' default execution space, rather than in the execution space corresponding to its Node template parameter. For example, if CUDA is enabled, Kokkos::Cuda is, unless configured otherwise, Kokkos' default execution space. If Node::execution_space is Kokkos::Serial, BlockCrsMatrix::blockWiseMultiply will nevertheless attempt to run in CUDA. That causes run-time errors, because CUDA kernels cannot (currently) access host memory. Even if they could, this would be inefficient.
This happens because the code calls Kokkos::parallel_for with an integer as the range, which makes Kokkos use the default execution space. Use Kokkos::RangePolicy instead, and explicitly give it the right execution space (Node::execution_space) as its first template parameter.
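The difference can be illustrated with a small toy model in standard C++ (no Kokkos required). Everything in it is hypothetical: `ToySerial`, `ToyCuda`, `ToySerialNode`, `ToyRangePolicy`, `toy_parallel_for`, and the `block_wise_scale_*` kernels are stand-ins invented for illustration, not Kokkos or Tpetra API. The model assumes, as described above, that an integer range falls back to the default execution space (here `ToyCuda`), while a range policy runs in the space named by its template parameter. In real Kokkos code, the corresponding launch would look something like `Kokkos::parallel_for(Kokkos::RangePolicy<typename Node::execution_space>(0, n), functor)` instead of `Kokkos::parallel_for(n, functor)`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for execution spaces (not Kokkos types).
struct ToySerial { static const char* name() { return "Serial"; } };
struct ToyCuda   { static const char* name() { return "Cuda"; } };

// Toy analogue of the default execution space when CUDA is enabled.
using ToyDefaultExecutionSpace = ToyCuda;

// Hypothetical Node whose execution space is Serial.
struct ToySerialNode { using execution_space = ToySerial; };

// Records the space each launch ran in, so the behavior can be checked.
std::vector<std::string> g_launch_log;

// Toy range policy: the execution space is a template parameter.
template <class ExecSpace>
struct ToyRangePolicy {
  std::size_t begin;
  std::size_t end;
};

// Explicit policy: the launch runs in ExecSpace.
template <class ExecSpace, class Functor>
void toy_parallel_for(const ToyRangePolicy<ExecSpace>& policy, const Functor& f) {
  assert(policy.begin <= policy.end);
  g_launch_log.push_back(ExecSpace::name());
  for (std::size_t i = policy.begin; i < policy.end; ++i) f(i);
}

// Plain integer range: silently falls back to the default space.
template <class Functor>
void toy_parallel_for(std::size_t n, const Functor& f) {
  toy_parallel_for(ToyRangePolicy<ToyDefaultExecutionSpace>{0, n}, f);
}

// Buggy pattern: Node is ignored, so the kernel runs in the default space.
template <class Node>
void block_wise_scale_buggy(std::vector<double>& x, double alpha) {
  toy_parallel_for(x.size(), [&x, alpha](std::size_t i) { x[i] *= alpha; });
}

// Fixed pattern: the execution space comes from Node.
template <class Node>
void block_wise_scale_fixed(std::vector<double>& x, double alpha) {
  using exec_space = typename Node::execution_space;
  toy_parallel_for(ToyRangePolicy<exec_space>{0, x.size()},
                   [&x, alpha](std::size_t i) { x[i] *= alpha; });
}
```

In this model, calling `block_wise_scale_buggy<ToySerialNode>` logs a launch in `Cuda` even though the Node asks for `Serial`, while `block_wise_scale_fixed<ToySerialNode>` logs `Serial`. Because the execution space is carried as a type, the fix is purely a compile-time change to how the launch is spelled.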
I never saw this error with CUDA before, because my CUDA tests only enabled the Cuda Node. The error manifested only when both the Cuda and Serial Nodes were enabled.