Thyra::get_Epetra_MultiVector hangs in debug build if some (not all) procs have 0 rows
Created by: mhoemmen
We constructed a test that runs on 4 MPI processes. It builds an Epetra_Map such that Processes 0 and 2 each have zero local rows, and Processes 1 and 3 each have 1 local row. The code invokes the following function on that Map:
Teuchos::RCP<const Epetra_MultiVector> Thyra::get_Epetra_MultiVector( const Epetra_Map &map, const MultiVectorBase<double> &mv );
(definition in thyra/adapters/epetra/src/Thyra_EpetraThyraWrappers.cpp). In that function, the processes with a nonzero number of rows return a new Epetra_MultiVector. The other processes call an overload of Thyra::get_Epetra_MultiVector in the same file, which takes a const Epetra_Map& and an RCP<const MultiVectorBase<double> >. That overload has a block of TEUCHOS_DEBUG code that creates a Thyra vector space and checks it.
The problem is that Thyra::create_VectorSpace creates a Teuchos::Comm, which does a collective (at least in a debug build; hopefully ONLY in a debug build, else Thyra is not being as efficient as it could be). Only the processes with zero rows reach that collective, so they wait for processes that never call it. (If I insert a barrier after the Comm creation, the code hangs.)
The right fix would be to change the first get_Epetra_MultiVector overload so that it does not defer to the second overload. Each of these functions should only ever be called collectively on all processes in the input Map's communicator.
We already have a test that demonstrates the hang. I will add it to Thyra, along with the fix.
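For anyone who wants to reason about the hang condition without building Trilinos, here is a minimal stand-alone model of it. This is not Thyra or Epetra code, and it uses no MPI. The names CollectiveOutcome, collectiveOutcome, and ranksEnteringDebugCollective are hypothetical and exist only for this sketch. It assumes the control flow described above: processes with zero local rows defer to the second overload and reach its debug-only collective, while processes with rows return early.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model only: what happens to a collective operation,
// given which ranks of the communicator actually enter it.
enum class CollectiveOutcome { NotCalled, Completes, Hangs };

// A collective completes only if every rank enters it. If a nonempty
// proper subset enters, those ranks wait forever for the rest.
CollectiveOutcome collectiveOutcome(const std::vector<bool>& entered) {
  assert(!entered.empty());  // a communicator has at least one rank
  std::size_t count = 0;
  for (bool e : entered) {
    if (e) ++count;
  }
  if (count == 0) return CollectiveOutcome::NotCalled;
  return count == entered.size() ? CollectiveOutcome::Completes
                                 : CollectiveOutcome::Hangs;
}

// Assumed control flow from this report: a rank with zero local rows
// defers to the second overload, whose debug check does a collective;
// a rank with rows returns before reaching it.
std::vector<bool> ranksEnteringDebugCollective(
    const std::vector<int>& localRows) {
  std::vector<bool> entered;
  entered.reserve(localRows.size());
  for (int rows : localRows) {
    entered.push_back(rows == 0);
  }
  return entered;
}
```

With the row counts from the test ({0, 1, 0, 1} for Processes 0 through 3), the model reports Hangs. If every process has zero rows, or none does, the mismatch does not arise, which matches the "some (not all)" condition in the title.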