Trilinos issue #383
Issue created May 23, 2016 by James Willenbring (@jmwille, Owner)

Tpetra::Distributor: Make doPosts nonblocking

Created by: mhoemmen

@trilinos/tpetra

Epic: #767.

Tpetra::Distributor implements the MPI communication that happens in an Export or Import. It uses MPI 2-sided point-to-point communication. Its doPosts method starts the receives and sends, and its doWaits method waits on them (MPI_Waitall). Receives are nonblocking (MPI_Irecv) and sends may be either blocking (various options, but only MPI_Send is used in practice) or nonblocking (MPI_Isend). However, sends default to blocking, and this is the only completely correct path. This is because of the so-called "slow path" of doPosts.

The "slow path" comes about when the indices in a send to a particular process aren't contiguous (i.e., are interrupted by data from [meant for]* other process(es)). The current implementation thus requires an intermediate pack buffer in that case. It allocates the extra buffer on the spot. In order to avoid holding on to that memory, the implementation forces blocking sends in that case (it throws std::logic_error otherwise).
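To make the contiguity condition concrete, here is a minimal sketch of how one might test whether a send plan would hit the slow path. The function name `sendsAreContiguous` and the `exportProcs` layout (one destination process ID per exported index) are assumptions for illustration only; this is not Tpetra's actual check.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper, not part of Tpetra.
// exportProcs[i] is the destination process of the i-th exported index.
// Returns true when each destination's entries form a single contiguous
// run, so data could be sent directly from the source buffer (fast path).
// Returns false when some destination's run is interrupted by entries
// meant for other processes (the "slow path" described above).
bool sendsAreContiguous(const std::vector<int>& exportProcs) {
  std::unordered_set<int> finished;  // destinations whose run has ended
  for (std::size_t i = 0; i < exportProcs.size(); ++i) {
    if (i > 0 && exportProcs[i] != exportProcs[i - 1]) {
      finished.insert(exportProcs[i - 1]);
    }
    if (finished.count(exportProcs[i]) != 0) {
      return false;  // this destination reappears after its run ended
    }
  }
  return true;
}
```

For example, destinations `{0, 0, 1, 1, 2}` are contiguous, while `{0, 1, 0}` are not, because the data for process 0 is interrupted by data for process 1.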

Two fixes come to mind:

  1. Keep the extra buffer. Keep it in the returned CommRequest so it doesn't get deallocated.
  2. Pre-permute the data during packing (DistObject::packAndPrepare) so the slow path never gets invoked in practice.

The first fix is easier, but may be ultimately less performant.
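Both fixes can be sketched without MPI. In the sketch below, `PendingSend`, `packForSend`, and `contiguousOrder` are hypothetical names, not Tpetra or Teuchos APIs, and the `double` payload is an assumption. For fix 1, the returned object shares ownership of the pack buffer, so the buffer lives until the caller has waited on the send. For fix 2, a stable sort by destination process yields an ordering in which each process's data is contiguous, so no intermediate buffer would be needed at send time.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Fix 1 (sketch): a request-like object that owns the pack buffer.
// A real implementation would also carry the MPI request handle.
struct PendingSend {
  std::shared_ptr<std::vector<double>> packBuffer;
};

// Gathers data[indices[k]] into a freshly allocated buffer and returns
// an object that keeps that buffer alive for as long as it exists.
PendingSend packForSend(const std::vector<double>& data,
                        const std::vector<std::size_t>& indices) {
  auto buffer = std::make_shared<std::vector<double>>();
  buffer->reserve(indices.size());
  for (std::size_t idx : indices) {
    buffer->push_back(data[idx]);
  }
  return PendingSend{buffer};
}

// Fix 2 (sketch): compute an ordering of the exported indices in which
// entries for the same destination process are adjacent. The stable
// sort preserves the original relative order within each destination.
std::vector<std::size_t> contiguousOrder(const std::vector<int>& exportProcs) {
  std::vector<std::size_t> order(exportProcs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&exportProcs](std::size_t a, std::size_t b) {
                     return exportProcs[a] < exportProcs[b];
                   });
  return order;
}
```

Applying `contiguousOrder` once during packing moves the cost of the permutation out of the communication step, which is the motivation for the second fix; the first fix only changes who owns the buffer.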

The "slow path" occurs in both the 3-argument (fixed # packets per index, used by Vector and MultiVector) and 4-argument (variable # packets per index, used by CrsGraph, CrsMatrix, etc.) versions of doPosts. The 3-argument version matters most for solver performance, but it's easy to do both at the same time.
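The difference between the two cases shows up in how a buffer offset is found for a given index. The following is a rough sketch under assumed names (`fixedOffset`, `variableOffsets`), not the actual doPosts code: with a fixed packet count the offset is a single multiplication, while with variable counts it requires a running sum over the per-index counts.

```cpp
#include <cstddef>
#include <vector>

// Fixed number of packets per index (the 3-argument case):
// the offset of index i is a simple product.
std::size_t fixedOffset(std::size_t i, std::size_t numPackets) {
  return i * numPackets;
}

// Variable number of packets per index (the 4-argument case):
// offsets are an exclusive prefix sum of the per-index counts.
// The returned vector has one extra entry holding the total size.
std::vector<std::size_t> variableOffsets(
    const std::vector<std::size_t>& numPacketsPerIndex) {
  std::vector<std::size_t> offsets(numPacketsPerIndex.size() + 1, 0);
  for (std::size_t i = 0; i < numPacketsPerIndex.size(); ++i) {
    offsets[i + 1] = offsets[i] + numPacketsPerIndex[i];
  }
  return offsets;
}
```

Since both cases reduce to "find the range of packets for each index", a fix for one can likely share its structure with the other.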

[*edit by jhux2 23-Jan-2019]
