MPI_Finalize slow with CUDA + OpenMPI 2.x (known OpenMPI issue; fixed in 3.1)
Created by: mhoemmen
Tpetra::CrsMatrix UnitTests2 takes more than 560 s in a CUDA 8 release build on a K80. Seriously, what's going on? Do I need different KOKKOS_ARCH settings? It sure would have been nice to have some performance tracking so we could have caught this. I don't think this is anything we did; we've only been fixing CUDA issues over time.
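For reference, a K80 is a Kepler-generation GPU with compute capability 3.7, so a configure line along these lines would pin the architecture. This is a sketch only; the paths and surrounding options are assumptions, not the actual build configuration used here.

```shell
# Sketch (assumed paths/options): pinning KOKKOS_ARCH for a K80
# (Kepler, compute capability 3.7) in a Trilinos CUDA build.
cmake \
  -D TPL_ENABLE_CUDA=ON \
  -D KOKKOS_ARCH="Kepler37" \
  -D CMAKE_BUILD_TYPE=Release \
  /path/to/Trilinos
```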
36/145 Test #36: TpetraCore_CrsMatrix_UnitTests2_MPI_4 ....................................................... Passed 561.02 sec
@trilinos/tpetra
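Since the title notes the fix landed in OpenMPI 3.1, a quick sanity check is to compare the installed OpenMPI version against 3.1. A minimal sketch; the version string is stubbed here, but in a real run it would come from `mpirun --version`:

```shell
# Hypothetical check: OpenMPI releases before 3.1 show the slow
# MPI_Finalize-with-CUDA behavior. The version is stubbed for
# illustration; in practice extract it from `mpirun --version`.
version="2.1.1"   # e.g. mpirun --version | head -n1 | awk '{print $NF}'
major=${version%%.*}
rest=${version#*.}
minor=${rest%%.*}
if [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 1 ]; }; then
  echo "affected"    # prints "affected" for 2.1.1
else
  echo "not affected"
fi
```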