Tpetra: Graph / matrix insert doesn't merge, taking extra space & hindering thread parallelism
Created by: mhoemmen
@trilinos/tpetra
CrsGraph::insert{Local,Global}Indices and CrsMatrix::insert{Local,Global}Values currently do something nonintuitive: multiple inserts to the same row and column index are stored separately, and are not merged until fillComplete. For example, inserting (1,1) into a CrsGraph 10 times requires storing 10 entries until fillComplete, which then merges them into a single entry. This is especially bad for StaticProfile: if the user reasonably gave CrsGraph an upper bound of 1 entry per row, 9 of those 10 inserts would counterintuitively fail. We don't want users to have to rely on DynamicProfile, which is both slow and (especially due to this issue) memory-intensive.
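To make the StaticProfile failure concrete, here is a minimal standalone sketch. It does not use Tpetra at all: `FixedRow`, `insertNoMerge`, `insertMerge`, and `countFailures` are hypothetical names invented for illustration, and the row is just a vector with a fixed upper bound on its size. It only shows why append-only insertion runs out of space on repeated inserts while merge-on-insert does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one graph row with a fixed upper bound on the
// number of entries (the StaticProfile situation). Not a Tpetra class.
struct FixedRow {
  std::size_t capacity;   // upper bound on entries, supplied by the user
  std::vector<int> cols;  // column indices stored so far

  // Append-only insert: every call takes a new slot, even for a duplicate.
  bool insertNoMerge(int col) {
    if (cols.size() >= capacity) return false;  // out of space
    cols.push_back(col);
    return true;
  }

  // Merging insert: a column that is already present needs no new slot.
  bool insertMerge(int col) {
    if (std::find(cols.begin(), cols.end(), col) != cols.end()) return true;
    if (cols.size() >= capacity) return false;  // out of space
    cols.push_back(col);
    return true;
  }
};

// Insert column 1 into a fresh row n times; return how many inserts fail.
inline int countFailures(bool merge, std::size_t capacity, int n) {
  assert(n >= 0);
  FixedRow row{capacity, {}};
  int failures = 0;
  for (int i = 0; i < n; ++i) {
    const bool ok = merge ? row.insertMerge(1) : row.insertNoMerge(1);
    if (!ok) ++failures;
  }
  return failures;
}
```

With a capacity of 1 and 10 inserts of the same column, the append-only variant rejects 9 inserts and the merging variant rejects none. The linear search in `insertMerge` is only for brevity; a real implementation would have to weigh that cost against keeping rows sorted.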
Commit 68e77d53 begins the process of fixing this. It does not yet change the behavior of CrsGraph or CrsMatrix. For now, Tpetra has new internal utility functions for merging indices (for CrsGraph) or indices and values together (for CrsMatrix). I also added some unit tests for the new functions. However, the new functions still need to be integrated into CrsGraph and CrsMatrix. My initial attempts at that broke a lot of invariants and made a lot of tests fail, so I'll have to do this VERY cautiously.
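For readers unfamiliar with what "merging" means here, the following is a rough standalone sketch of the two operations. It is not the code from the commit: the names `mergeIndices` and `mergeIndicesAndValues` are hypothetical, the containers are plain `std::vector`, and it assumes that merging matrix entries means summing the values that share a column index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Graph case (hypothetical helper): sort one row's column indices and drop
// duplicates. Returns the number of entries that remain.
inline std::size_t mergeIndices(std::vector<int>& inds) {
  std::sort(inds.begin(), inds.end());
  inds.erase(std::unique(inds.begin(), inds.end()), inds.end());
  return inds.size();
}

// Matrix case (hypothetical helper): sort one row's entries by column index
// and sum the values that share an index (assumed merge rule).
// Returns the number of entries that remain.
inline std::size_t mergeIndicesAndValues(std::vector<int>& inds,
                                         std::vector<double>& vals) {
  assert(inds.size() == vals.size());
  const std::size_t n = inds.size();

  // Sort a permutation so that indices and values stay paired.
  std::vector<std::size_t> perm(n);
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  std::stable_sort(perm.begin(), perm.end(),
                   [&](std::size_t a, std::size_t b) { return inds[a] < inds[b]; });

  std::vector<int> outInds;
  std::vector<double> outVals;
  for (std::size_t k : perm) {
    if (!outInds.empty() && outInds.back() == inds[k]) {
      outVals.back() += vals[k];  // duplicate column: accumulate
    } else {
      outInds.push_back(inds[k]);  // first occurrence of this column
      outVals.push_back(vals[k]);
    }
  }
  inds.swap(outInds);
  vals.swap(outVals);
  return inds.size();
}
```

For example, indices {3, 1, 3, 1, 2} with values {1, 2, 4, 8, 16} merge to indices {1, 2, 3} with values {10, 16, 5}. This version allocates temporaries and sorts the whole row on every call, which is fine for showing the semantics but is likely not what an in-place, thread-friendly implementation inside CrsGraph or CrsMatrix would do.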