Tpetra: Clarify operating procedures for using MPI + multiple GPUs on a node
Created by: mhoemmen
@trilinos/tpetra
Blocked by: #1673 (closed); without it, we can't build on the relevant testbeds.
Tpetra's documentation needs to clarify how to use MPI with multiple GPUs on a node. It looks like Kokkos knows how to deal with this, as long as MPI_Init is called before Kokkos::initialize, and as long as one uses the --kokkos-ndevices argument correctly. I copied the documentation below out of Kokkos' --help output:
--kokkos-ndevices=INT[,INT] : used when running MPI jobs. Specify number of
devices per node to be used. Process to device
mapping happens by obtaining the local MPI rank
and assigning devices round-robin. The optional
second argument allows for an existing device
to be ignored. This is most useful on workstations
with multiple GPUs of which one is used to drive
screen output.
Just to clarify, it looks like all the MPI processes can take the same --kokkos-ndevices argument. Kokkos uses each process' local MPI rank (its rank among the processes on the same node) to map processes to devices. See also https://github.com/kokkos/kokkos/issues/50 and https://github.com/kokkos/kokkos/issues/544.