Trilinos issue #4639

Closed
Created Mar 18, 2019 by James Willenbring (@jmwille, Maintainer)

Tpetra::MultiVector::reduce broken if getStride() > getLocalLength()

Created by: mhoemmen

@trilinos/tpetra @trilinos/belos @cgcgcg

Current Behavior

If a Tpetra::MultiVector has getStride() > getLocalLength(), then reduce() gives incorrect results.

Motivation and Context

I discovered this while working on a fix for #4626 (closed), a Belos performance issue on GPUs. My first attempt at a fix created MultiVectors from DualViews with stride(1) > extent(0), and some Belos tests failed as a result. Evidently no Tpetra test exercises reduce() on a MultiVector with this property.

Possible Solution

I have a fix ready.

Steps to Reproduce

  1. Create Kokkos::DualView dv_orig with M + S rows and N columns, where M, S, and N are positive integers.
  2. auto dv = Kokkos::subview (dv_orig, std::pair<size_t, size_t> (0, M), Kokkos::ALL ());
  3. Create Tpetra::MultiVector with a locally replicated Map (M rows per process, over MPI_COMM_WORLD) and dv.
  4. Call reduce() on the MultiVector. The results are wrong, even in a non-CUDA build.
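
The issue does not say where inside reduce() things go wrong. As a minimal sketch in plain C++ (no Kokkos or Tpetra), the code below shows why the layout produced by steps 1 and 2 is easy to mishandle. It assumes column-major storage, so taking the first M rows of an (M + S)-row allocation leaves a column stride of M + S with only M local rows. StridedBlock, packAssumingContiguous, packWithStride, and makeParent are hypothetical names made up for this illustration. That reduce() failed in exactly this way is an assumption, not something the issue confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major local block: `rows` logical rows,
// `cols` columns, consecutive columns `stride` entries apart (stride >= rows).
struct StridedBlock {
  const double* data;
  std::size_t rows;
  std::size_t cols;
  std::size_t stride;
};

// Stride-unaware packing: treats the block as rows*cols contiguous entries.
// This is only correct when stride == rows.
std::vector<double> packAssumingContiguous(const StridedBlock& b) {
  return std::vector<double>(b.data, b.data + b.rows * b.cols);
}

// Stride-aware packing: copies each column starting at its true offset
// j*stride, so padding rows between columns are skipped.
std::vector<double> packWithStride(const StridedBlock& b) {
  assert(b.stride >= b.rows);
  std::vector<double> out;
  out.reserve(b.rows * b.cols);
  for (std::size_t j = 0; j < b.cols; ++j) {
    const double* col = b.data + j * b.stride;
    out.insert(out.end(), col, col + b.rows);
  }
  return out;
}

// Builds the parent allocation from step 1: (M + S) rows by N columns,
// column-major, with entry (i, j) = 10*j + i so rows and columns are
// easy to tell apart.
std::vector<double> makeParent(std::size_t M, std::size_t S, std::size_t N) {
  std::vector<double> parent((M + S) * N);
  for (std::size_t j = 0; j < N; ++j) {
    for (std::size_t i = 0; i < M + S; ++i) {
      parent[j * (M + S) + i] = 10.0 * j + i;
    }
  }
  return parent;
}
```

With M = 2, S = 1, N = 2, the parent holds the columns (0, 1, 2) and (10, 11, 12). A block with rows = 2 and stride = 3 packs to 0, 1, 10, 11 when the stride is honored, but to 0, 1, 2, 10 when the data is assumed contiguous: the second column picks up a padding row. Any communication buffer filled the second way would carry wrong values into the sum, regardless of whether the build uses CUDA.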

Related Issues

  • Related to #4626 (closed), #4633 (closed)