Tpetra::CrsMatrix: Optimize sumIntoLocalValues for common case where input is shorter than # entries in matrix row
Created by: mhoemmen
@trilinos/tpetra CC: @crtrott, @nmhamster
- Strip-mine input indices and values by some small compile-time constant K (e.g., 8)
- Copy those K indices and values into a temporary buffer of length K
- Sort the indices and values jointly by indices (Tpetra::sort2 or equivalent algorithm); see #887 (closed)
- Use a (sorted, sorted) search (different from findRelOffset), so that for each block of K inputs you pass over the entries in the matrix's row at most once
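
For discussion, here is a rough standalone sketch of the steps above. It is not Tpetra code: `sumIntoSortedRow` is a hypothetical name, plain `std::vector` stands in for the row views, and a small joint insertion sort stands in for Tpetra::sort2. It assumes the column indices in the matrix row are already sorted in ascending order with no duplicates.

```cpp
#include <cstddef>
#include <vector>

// Chunk length: the "small compile-time constant K" from the list above.
constexpr std::size_t K = 8;

// Hypothetical helper (not part of Tpetra). Sums inVals into rowVals at the
// positions where inInds match rowInds. Assumes rowInds is sorted ascending
// and has no duplicates. Returns the number of input entries that were summed.
template <class LO, class Scalar>
std::size_t sumIntoSortedRow(const std::vector<LO>& rowInds,
                             std::vector<Scalar>& rowVals,
                             const LO* inInds, const Scalar* inVals,
                             std::size_t numIn)
{
  std::size_t numSummed = 0;
  // Strip-mine the input into chunks of at most K entries.
  for (std::size_t start = 0; start < numIn; start += K) {
    const std::size_t len = (numIn - start < K) ? (numIn - start) : K;

    // Copy the chunk into temporary buffers of length K.
    LO inds[K];
    Scalar vals[K];
    for (std::size_t i = 0; i < len; ++i) {
      inds[i] = inInds[start + i];
      vals[i] = inVals[start + i];
    }

    // Sort indices and values jointly by index. Insertion sort is used here
    // as a stand-in for Tpetra::sort2; it is cheap for such short arrays.
    for (std::size_t i = 1; i < len; ++i) {
      const LO curInd = inds[i];
      const Scalar curVal = vals[i];
      std::size_t j = i;
      while (j > 0 && inds[j - 1] > curInd) {
        inds[j] = inds[j - 1];
        vals[j] = vals[j - 1];
        --j;
      }
      inds[j] = curInd;
      vals[j] = curVal;
    }

    // (sorted, sorted) search: the row cursor r only moves forward, so the
    // row is traversed at most once per chunk.
    std::size_t r = 0;
    for (std::size_t i = 0; i < len; ++i) {
      while (r < rowInds.size() && rowInds[r] < inds[i]) {
        ++r;
      }
      if (r == rowInds.size()) {
        break; // all remaining inputs in this chunk exceed every row index
      }
      if (rowInds[r] == inds[i]) {
        rowVals[r] += vals[i]; // repeated input indices accumulate here
        ++numSummed;
      }
    }
  }
  return numSummed;
}
```

Input indices that are not in the row are skipped, and repeated input indices accumulate into the same entry. Whether K = 8 and insertion sort are the right choices would need benchmarking against the current findRelOffset-based path.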