KokkosKernels: Don't need both CudaSpace & CudaUVMSpace instantiations

Created by: mhoemmen

@trilinos/tpetra

See #226 (closed). CUDA instantiations of the sparse matrix-vector multiply kernel take a long time to build. I found that the kernels Tpetra uses for iterative solvers (sparse mat-vec and (multi)vector operations) are getting instantiated for both Device<Cuda, CudaSpace> and Device<Cuda, CudaUVMSpace>. Tpetra and downstream packages only need the latter, so the build compiles about twice as much code for these kernels as it needs to.

KokkosKernels instantiates these kernels differently than Trilinos' ETI (explicit template instantiation) does. With KokkosKernels, one can still use template parameter combinations that haven't been instantiated ahead of time. Thus, getting rid of the instantiations that Tpetra and downstream packages don't use won't break any tests, examples, or downstream code.
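
To illustrate why dropping an instantiation is safe in this kind of scheme, here is a minimal sketch of the general C++ mechanism. It is not KokkosKernels' actual code: the names PlainSpace, UvmLikeSpace, dot, and run_example are hypothetical stand-ins, and it assumes the generic definition stays visible in the header. One combination is instantiated explicitly ahead of time; the other is instantiated implicitly at the point of use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for memory-space tags; NOT the real Kokkos types.
struct PlainSpace {};
struct UvmLikeSpace {};

// The generic definition is visible to every caller (as if in a header),
// so any template argument combination can be instantiated on demand.
template <class Scalar, class Space>
Scalar dot(const std::vector<Scalar>& x, const std::vector<Scalar>& y) {
  Scalar sum{};
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    sum += x[i] * y[i];
  }
  return sum;
}

// Explicit instantiation declaration: callers are told that this
// combination is provided elsewhere (would normally sit in the header).
extern template double dot<double, UvmLikeSpace>(const std::vector<double>&,
                                                 const std::vector<double>&);

// Explicit instantiation definition: compiled once, ahead of time
// (would normally sit in its own .cpp file).
template double dot<double, UvmLikeSpace>(const std::vector<double>&,
                                          const std::vector<double>&);

// Exercises both the pre-built and the on-demand path.
void run_example() {
  const std::vector<double> x{1.0, 2.0, 3.0};
  const std::vector<double> y{4.0, 5.0, 6.0};

  // Uses the explicitly instantiated combination.
  const double prebuilt = dot<double, UvmLikeSpace>(x, y);

  // No explicit instantiation exists for PlainSpace; the compiler
  // instantiates it implicitly here, so the call still compiles and links.
  const double on_demand = dot<double, PlainSpace>(x, y);

  assert(prebuilt == 32.0);
  assert(on_demand == 32.0);
}
```

In this sketch, deleting the two explicit-instantiation lines for UvmLikeSpace would not break run_example either; that combination would simply be compiled wherever it is used. The trade-off is compile time in the calling code rather than correctness.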