Tpetra::MultiVector::reduce broken if getStride() > getLocalLength()
Created by: mhoemmen
@trilinos/tpetra @trilinos/belos @cgcgcg
If a Tpetra::MultiVector has getStride() > getLocalLength(), then
reduce() gives incorrect results.
Motivation and Context
I discovered this while working on a fix for #4626 (closed), a Belos performance issue on GPUs. My original attempted fix created MultiVectors from DualViews with
stride(1) > extent(0). The issue manifested as some Belos tests failing. Evidently, no Tpetra tests exercise
reduce() on MultiVectors with this property.
I have a fix ready.
Steps to Reproduce
Create a Kokkos::DualView dv_orig with M + S rows and N columns, where M, S, and N are positive integers.
auto dv = Kokkos::subview (dv_orig, std::pair<size_t, size_t> (0, M), Kokkos::ALL ());
Create a Tpetra::MultiVector with a locally replicated Map (M rows per process, over
Call reduce() on the MultiVector. The results are wrong, even in a non-CUDA build.
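
The report does not state the root cause, so the following is only an illustrative sketch of how this class of bug can arise. It uses plain C++ with no Tpetra, Kokkos, or MPI. It models a column-major block with M valid rows, S padding rows, and N columns (so the column stride is M + S), and compares reading the first M*N entries as if the storage were contiguous against packing each column explicitly. The helper names (make_padded, take_contiguous, pack_columns) are hypothetical and are not part of any Trilinos API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build column-major storage with M valid rows and
// S padding rows per column (column stride = M + S). Valid entry (i, j)
// holds 10*j + i; padding entries hold -1 so they are easy to spot.
std::vector<double> make_padded(std::size_t M, std::size_t S, std::size_t N) {
  const std::size_t stride = M + S;
  std::vector<double> data(stride * N, -1.0);
  for (std::size_t j = 0; j < N; ++j) {
    for (std::size_t i = 0; i < M; ++i) {
      data[i + j * stride] = 10.0 * static_cast<double>(j) + static_cast<double>(i);
    }
  }
  return data;
}

// Hypothetical helper: what a buffer-based operation sees if it wrongly
// assumes stride == M and simply takes the first M*N entries.
std::vector<double> take_contiguous(const std::vector<double>& data,
                                    std::size_t M, std::size_t N) {
  assert(data.size() >= M * N);
  return std::vector<double>(data.begin(),
                             data.begin() + static_cast<std::ptrdiff_t>(M * N));
}

// Hypothetical helper: copy the M x N valid block, column by column, into a
// contiguous buffer, honoring the actual column stride.
std::vector<double> pack_columns(const std::vector<double>& data,
                                 std::size_t M, std::size_t N,
                                 std::size_t stride) {
  assert(stride >= M);
  assert(data.size() >= stride * N);
  std::vector<double> packed(M * N);
  for (std::size_t j = 0; j < N; ++j) {
    for (std::size_t i = 0; i < M; ++i) {
      packed[i + j * M] = data[i + j * stride];
    }
  }
  return packed;
}
```

With M = 2, S = 1, N = 2, take_contiguous picks up a padding entry where the first entry of the second column should be, while pack_columns returns only the valid entries. Whether reduce() fails for this exact reason is an assumption; the sketch only shows why stride > row count needs explicit handling in any code that communicates the data as one flat buffer.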