Trilinos issue #3580

Closed
Open
Created Oct 09, 2018 by James Willenbring (@jmwille, Maintainer). 4 of 4 tasks completed.

Tpetra::Distributor: Fix "slow path" so we can use MPI_Isend

Created by: mhoemmen

@trilinos/tpetra @jjellio @csiefer2

Fix the "slow path" of Distributor::doPosts, so we can use nonblocking sends (MPI_Isend). The "slow path" kicks in when the data to send are not neatly grouped in contiguous chunks per process. It permutes the data into contiguous-by-target-process-rank chunks for sending. Currently, the slow path reuses the same send buffer for all the messages. A nonblocking send requires that its buffer not be modified until the send completes, so a buffer that gets repacked for the next message rules out nonblocking sends.
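One way to picture the fix is to pack all outgoing data once, into a single buffer laid out contiguously by destination, so that each destination's chunk stays untouched while its send is in flight. The following is only a minimal sketch of that packing step under stated assumptions, not Tpetra's actual implementation: `PackedSends`, `packByDestination`, `destOfEntry`, and `numDests` are hypothetical names, and the MPI calls are left out so the example stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: one packed buffer plus per-destination offsets.
struct PackedSends {
  std::vector<double> buffer;        // all outgoing data, contiguous by destination
  std::vector<std::size_t> offsets;  // chunk for destination p is [offsets[p], offsets[p+1])
};

// Permute `exports` so that entries bound for the same destination are contiguous.
// destOfEntry[i] is the (hypothetical) destination index of exports[i].
PackedSends packByDestination(const std::vector<double>& exports,
                              const std::vector<int>& destOfEntry,
                              int numDests) {
  assert(exports.size() == destOfEntry.size());
  PackedSends out;
  out.offsets.assign(static_cast<std::size_t>(numDests) + 1, std::size_t{0});

  // Count how many entries go to each destination.
  for (int d : destOfEntry) {
    assert(0 <= d && d < numDests);
    ++out.offsets[static_cast<std::size_t>(d) + 1];
  }
  // Turn the counts into starting offsets (prefix sum).
  for (int p = 0; p < numDests; ++p) {
    out.offsets[p + 1] += out.offsets[p];
  }
  // Scatter each entry into its destination's chunk, preserving input order.
  std::vector<std::size_t> cursor(out.offsets.begin(), out.offsets.end() - 1);
  out.buffer.resize(exports.size());
  for (std::size_t i = 0; i < exports.size(); ++i) {
    out.buffer[cursor[destOfEntry[i]]++] = exports[i];
  }
  return out;
}
```

With this layout, the chunk starting at `buffer.data() + offsets[p]` with length `offsets[p + 1] - offsets[p]` could be handed to its own MPI_Isend, and no chunk would be overwritten before a later MPI_Waitall. The cost is a send buffer as large as all outgoing data combined, rather than as large as the biggest single message.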

We must fix both the "three-argument" (all messages have the same size) and "four-argument" (different messages may have different sizes) overloads of doPosts, and both the Teuchos::ArrayRCP and Kokkos::View versions of each.
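The difference between the two cases shows up mainly in how each message's position within a packed send buffer would be computed. The sketch below illustrates that under assumptions: `uniformOffsets` and `variableOffsets` are hypothetical helpers, not part of the Distributor interface, and sizes are counted in buffer entries.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Same-size case: message m starts at m * entriesPerMessage.
std::vector<std::size_t> uniformOffsets(std::size_t numMessages,
                                        std::size_t entriesPerMessage) {
  std::vector<std::size_t> offsets(numMessages + 1);
  for (std::size_t m = 0; m <= numMessages; ++m) {
    offsets[m] = m * entriesPerMessage;
  }
  return offsets;
}

// Different-sizes case: message m starts at the sum of all earlier sizes.
std::vector<std::size_t> variableOffsets(
    const std::vector<std::size_t>& entriesPerMessage) {
  std::vector<std::size_t> offsets(entriesPerMessage.size() + 1, 0);
  std::partial_sum(entriesPerMessage.begin(), entriesPerMessage.end(),
                   offsets.begin() + 1);
  assert(offsets.size() == entriesPerMessage.size() + 1);
  return offsets;
}
```

In both helpers the last offset equals the total number of entries to send, which is the size a single packed buffer would need.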

Motivation and Context

This is part of the overall effort to improve MPI+CUDA performance and make Tpetra's boundary exchange and sparse matrix-vector multiply communication nonblocking.

Definition of Done

  • Fix 3-argument Teuchos::ArrayRCP overload of doPosts
  • Fix 3-argument Kokkos::View overload of doPosts
  • Fix 4-argument Teuchos::ArrayRCP overload of doPosts
  • Fix 4-argument Kokkos::View overload of doPosts

Related Issues

  • Part of #383