Thyra::get_Epetra_MultiVector hangs in debug build if some (not all) procs have 0 rows
Created by: mhoemmen
@trilinos/thyra @trilinos/epetra
We constructed a test that runs on 4 MPI processes. It builds an Epetra_Map such that Processes 0 and 2 each have zero local rows, and Processes 1 and 3 each have 1 local row. The code invokes the following function on that Map:
Teuchos::RCP<const Epetra_MultiVector>
Thyra::get_Epetra_MultiVector(
const Epetra_Map &map,
const MultiVectorBase<double> &mv
);
(The definition is in thyra/adapters/epetra/src/Thyra_EpetraThyraWrappers.cpp.) In that function, the processes with a nonzero number of rows return a new Epetra_MultiVector. The other processes call an overload of Thyra::get_Epetra_MultiVector in the same file that takes a const Epetra_Map& and an RCP<const MultiVectorBase<double> >. That overload has a block of code, enabled by TEUCHOS_DEBUG, that creates a Thyra vector space and checks it.
The problem is that Thyra::create_VectorSpace creates a Teuchos::Comm, which does a collective (at least in a debug build; hopefully ONLY in a debug build, else Thyra is not being as efficient as it could be). Only the processes with zero local rows reach that collective, so it can never complete. (If I insert a barrier after the Comm creation, the code hangs.)
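To make the failure mode concrete, here is a minimal sketch in plain C++ that models which ranks reach the collective. It is not Thyra or MPI code. The function names `ranks_entering_collective` and `would_hang` are hypothetical, and the model assumes, as described above, that only ranks with zero local rows reach the debug-mode check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not Thyra code: rank r owns local_rows[r] rows.
// Assumption taken from the description above: only ranks with zero
// local rows reach the debug-mode check that performs a collective.
std::vector<bool> ranks_entering_collective(const std::vector<int>& local_rows) {
  std::vector<bool> enters;
  enters.reserve(local_rows.size());
  for (int rows : local_rows) {
    enters.push_back(rows == 0);
  }
  return enters;
}

// A collective completes only if every rank in the communicator takes part.
// If some ranks enter and others do not, the ranks that entered wait forever.
bool would_hang(const std::vector<int>& local_rows) {
  assert(!local_rows.empty());
  std::size_t num_entering = 0;
  for (bool e : ranks_entering_collective(local_rows)) {
    if (e) {
      ++num_entering;
    }
  }
  return num_entering != 0 && num_entering != local_rows.size();
}
```

In this model, the layout from the test (`{0, 1, 0, 1}` rows on ranks 0 through 3) is reported as hanging, while a layout where all ranks have rows, or where no rank has rows, is not. That matches the "some (not all)" condition in the title.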
The right fix would be to change the first get_Epetra_MultiVector overload so that it does not defer to the second overload. Each of these functions should only ever be called collectively on all processes in the input Map's communicator.
We already have a test that demonstrates this. I will add it to Thyra, along with the fix.