Trilinos issue #1658
Created Aug 30, 2017 by James Willenbring (@jmwille, Owner)

Tpetra::MultiVector: Unpack kernels not using correct execution space

Created by: mhoemmen

@trilinos/tpetra @trilinos/stokhos Blocks: #1088 (closed)

Tpetra::MultiVector::unpackAndCombineNew uses Kokkos parallel kernels to unpack remote data into the target MultiVector. The kernels are supposed to run on either device or host, depending on whether the target MultiVector is sync'd to device or host. However, the kernels only ever ran on device! This caused trouble with #1088 (closed; see that issue for details), and it may hurt performance for small MultiVectors or in other cases where users prefer to work on host.

The issue is that the kernels used the output Kokkos::View (of the target MultiVector's data) to determine the execution space on which to run. The problem is that for a Kokkos::DualView of CudaUVMSpace, the host and device Views are the same, so the output View alone cannot tell the kernel whether to run on host or device. Furthermore, CudaUVMSpace::execution_space == Cuda, so the kernel will always run on device, even given a HostSpace buffer to unpack. This blocks #1088 (closed), whose fix requires that the input View (the buffer to unpack) be either a CudaSpace or a HostSpace View.
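
To make the type-level problem concrete, here is a minimal sketch in plain C++ that models it with stand-in types. None of the names below (`ToyView`, `ToyDualView`, `ToyHostMirror`, `runsOnDevice`, or the tag structs) are real Kokkos classes; they are hypothetical and only mimic the relationships described above, namely that a UVM DualView's host and device Views coincide and that the UVM memory space's execution space is the device one.

```cpp
#include <type_traits>

// Stand-in tag types for illustration only; these are NOT the real Kokkos classes.
struct HostExec {};
struct CudaExec {};
struct HostSpace    { using execution_space = HostExec; };
struct CudaSpace    { using execution_space = CudaExec; };
struct CudaUVMSpace { using execution_space = CudaExec; };  // models CudaUVMSpace::execution_space == Cuda

// Toy View: its execution space is derived from its memory space.
template <class MemSpace>
struct ToyView {
  using memory_space = MemSpace;
  using execution_space = typename MemSpace::execution_space;
};

// Toy host-mirror rule (an assumption of this model): UVM memory is
// host-accessible, so it mirrors to itself; everything else mirrors to HostSpace.
template <class MemSpace> struct ToyHostMirror { using type = HostSpace; };
template <> struct ToyHostMirror<CudaUVMSpace> { using type = CudaUVMSpace; };

// Toy DualView: a device View type plus a host View type.
template <class MemSpace>
struct ToyDualView {
  using t_dev  = ToyView<MemSpace>;
  using t_host = ToyView<typename ToyHostMirror<MemSpace>::type>;
};

// What the old kernels effectively did: deduce where to run from the output View.
template <class OutView>
constexpr bool runsOnDevice() {
  return std::is_same<typename OutView::execution_space, CudaExec>::value;
}

// With CudaSpace, the host View correctly selects the host.
static_assert(!runsOnDevice<ToyDualView<CudaSpace>::t_host>(),
              "host View of a CudaSpace DualView should select host");
// With CudaUVMSpace, host and device Views are the same type ...
static_assert(std::is_same<ToyDualView<CudaUVMSpace>::t_dev,
                           ToyDualView<CudaUVMSpace>::t_host>::value,
              "UVM host and device Views coincide in this model");
// ... so even the "host" View selects the device.
static_assert(runsOnDevice<ToyDualView<CudaUVMSpace>::t_host>(),
              "UVM host View still selects the device");
```

In this model, no information in the output View's type can distinguish a host-sync'd UVM MultiVector from a device-sync'd one, which is why the choice has to come from somewhere else.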

The fix is to change these kernels to take a Kokkos execution space argument. This argument lets the user specify the execution space on which to run. It's an execution space instance, so this could give us a future option to run in a separate CUDA stream (e.g., for overlap of communication and computation), or to run on a subset of threads.
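
The shape of that fix can be sketched in plain C++ as follows. This is a toy model under stated assumptions, not the Tpetra or Stokhos code: `ToyHostExec`, `ToyDeviceExec`, its `stream` member, and `unpackAndCombineToy` are all hypothetical names, and the loop runs serially on the CPU. In real Kokkos code the instance would typically be handed to the execution policy (for example, a `Kokkos::RangePolicy` constructed from it) rather than inspected by name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for execution space instances; not Kokkos types.
struct ToyHostExec {
  std::string name() const { return "host"; }
};

struct ToyDeviceExec {
  int stream;  // imagined per-instance state, e.g. a stream id
  explicit ToyDeviceExec(int s = 0) : stream(s) {}
  std::string name() const {
    return "device(stream " + std::to_string(stream) + ")";
  }
};

// Toy unpack kernel. The caller passes the execution space instance
// explicitly, so the output container's type no longer decides where the
// kernel runs. Returns a label for the space it was asked to run on.
template <class ExecSpace>
std::string unpackAndCombineToy(const ExecSpace& space,
                                std::vector<double>& target,
                                const std::vector<double>& buffer) {
  assert(target.size() == buffer.size());
  for (std::size_t i = 0; i < buffer.size(); ++i) {
    target[i] += buffer[i];  // "add" combine mode, chosen only for illustration
  }
  return space.name();
}
```

Calling `unpackAndCombineToy(ToyHostExec{}, target, buffer)` and `unpackAndCombineToy(ToyDeviceExec(1), target, buffer)` on the same `target` selects different spaces without changing the data types involved. Because the argument is an instance rather than just a type, it can also carry per-call state such as the stream id modeled here.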

This fix also requires fixing Stokhos' specializations of the unpack kernels.
