Belos::MultiVecTraits: Tpetra specialization is slow on GPU, because it creates local MV & thus does extra CUDA allocations
Created by: mhoemmen
@trilinos/belos @trilinos/tpetra @cgcgcg
The Tpetra specialization of Belos::MultiVecTraits has two methods that need to create temporary "local" MultiVector instances. This creation does a new DualView allocation each time. This makes running on the GPU slow, especially for classical Gram-Schmidt in GMRES, or in general whenever doing X^T * Y or X^H * Y for either X or Y having multiple columns.
I am working on a fix. The idea is to maintain a static DualView "pool" that the local MultiVector can view. One must be careful not to let any DualView instance persist past Kokkos::finalize.
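To make the pooling idea concrete, here is a minimal sketch in plain C++, not the actual fix. It assumes a std::vector as a stand-in for the DualView so that it builds without Kokkos or Tpetra. The names ScratchPool, getScratchPool, and transposeTimes are hypothetical and do not exist in Belos or Tpetra. The pool allocates only when a request exceeds its current size, and it exposes an explicit release() so the storage can be freed before the runtime shuts down.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pooled device allocation (e.g., a DualView).
// It counts allocations so that the effect of reuse is visible.
class ScratchPool {
public:
  // Return storage for at least numRows * numCols doubles.
  // A new allocation happens only if the request exceeds the current size.
  double* acquire(std::size_t numRows, std::size_t numCols) {
    const std::size_t needed = numRows * numCols;
    if (needed > storage_.size()) {
      storage_.assign(needed, 0.0);
      ++numAllocations_;
    }
    return storage_.data();
  }

  // Free the storage. In the real setting this would have to run
  // before Kokkos::finalize.
  void release() { std::vector<double>().swap(storage_); }

  std::size_t numAllocations() const { return numAllocations_; }
  std::size_t capacity() const { return storage_.size(); }

private:
  std::vector<double> storage_;
  std::size_t numAllocations_ = 0;
};

// Function-local static: one pool shared by every caller.
ScratchPool& getScratchPool() {
  static ScratchPool pool;
  return pool;
}

// Compute C = X^T * Y for column-major X (n x p) and Y (n x q).
// The p x q result lives in pooled scratch rather than a fresh buffer,
// so it is only valid until the next call that uses the pool.
const double* transposeTimes(const std::vector<double>& X, std::size_t p,
                             const std::vector<double>& Y, std::size_t q,
                             std::size_t n) {
  assert(X.size() == n * p && Y.size() == n * q);
  double* C = getScratchPool().acquire(p, q);
  for (std::size_t j = 0; j < q; ++j) {
    for (std::size_t i = 0; i < p; ++i) {
      double sum = 0.0;
      for (std::size_t k = 0; k < n; ++k) {
        sum += X[k + i * n] * Y[k + j * n];
      }
      C[i + j * p] = sum;
    }
  }
  return C;
}
```

Repeated calls with the same or smaller result size reuse the same storage, which is the behavior that matters for Gram-Schmidt inside GMRES. In a Kokkos build, one plausible way to guarantee cleanup is to register the pool's release() with Kokkos::push_finalize_hook, though that choice is an assumption here rather than something this issue specifies.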