Trilinos issue #1941

Closed
Open
Created Nov 02, 2017 by James Willenbring (@jmwille, Maintainer)

Thyra::get_Epetra_MultiVector hangs in debug build if some (not all) procs have 0 rows

Created by: mhoemmen

@trilinos/thyra @trilinos/epetra

We constructed a test that runs on 4 MPI processes. It builds an Epetra_Map such that Processes 0 and 2 each have zero local rows, and Processes 1 and 3 each have 1 local row. The code invokes the following function on that Map:

Teuchos::RCP<const Epetra_MultiVector>
Thyra::get_Epetra_MultiVector(
  const Epetra_Map &map,
  const MultiVectorBase<double> &mv
  );

(defined in thyra/adapters/epetra/src/Thyra_EpetraThyraWrappers.cpp). In that function, the processes with a nonzero number of rows return a new Epetra_MultiVector. The other processes call a second overload of Thyra::get_Epetra_MultiVector in the same file, which takes a const Epetra_Map& and an RCP<const MultiVectorBase<double> >. That overload contains a block of code, guarded by TEUCHOS_DEBUG, that creates a Thyra vector space and checks it.

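The problem is that Thyra::create_VectorSpace creates a Teuchos::Comm, which does a collective (at least in a debug build -- hopefully ONLY in a debug build, else Thyra is not being as efficient as it could be). Because only the processes with zero local rows reach that call, the processes with rows never join the collective, and the debug build hangs. (If I insert a barrier after the Comm creation, the code hangs.)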
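
To make the failure mode concrete, here is a toy model in plain C++ (no MPI, no Trilinos). It is only a sketch of the reasoning above, not Thyra code: the names localRowCounts, entersCollective, and collectiveCanComplete are hypothetical, and the model assumes that, in a debug build, a process reaches the collective exactly when it owns zero local rows.

```cpp
#include <cstddef>
#include <vector>

// Local row count per rank for the 4-process layout in this report:
// ranks 0 and 2 own zero rows, ranks 1 and 3 own one row each.
std::vector<int> localRowCounts() { return {0, 1, 0, 1}; }

// Assumption (hypothetical predicate): in a debug build, only ranks with
// zero local rows take the path that creates the Teuchos::Comm.
bool entersCollective(int numLocalRows) { return numLocalRows == 0; }

// A collective can complete only if every rank in the communicator enters
// it. If no rank enters it, nothing blocks either. Any proper, nonempty
// subset of ranks entering it models the hang.
bool collectiveCanComplete(const std::vector<int>& rowsPerRank) {
  std::size_t entered = 0;
  for (int rows : rowsPerRank) {
    if (entersCollective(rows)) {
      ++entered;
    }
  }
  return entered == 0 || entered == rowsPerRank.size();
}
```

Under this model, collectiveCanComplete(localRowCounts()) is false, while a layout in which every rank has rows, or every rank has zero rows, gives true. That matches the "some (not all)" condition in the title.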

The right fix would be to change the first get_Epetra_MultiVector overload so that it does not defer to the second overload. Each of these functions should only ever be called collectively on all processes in the input Map's communicator.

We already have a test that demonstrates this. I will add it to Thyra, along with the fix.
