TpetraExt::MatrixMatrix::add thread scaling needs improvement
Created by: pwxy
Trilinos was built with the Kokkos node type OpenMP (Intel 17.0.2 compiler). I looked at thread scaling on a single KNL, running with 1 MPI process and increasing the number of threads from 1 to 64 (so each thread has its own core).
Threads | TpetraExt::MatrixMatrix::add (seconds) |
---|---|
1 | 1154.0 |
2 | 799.8 |
4 | 620.5 |
8 | 527.3 |
16 | 483.9 |
32 | 467.4 |
64 | 461.7 |
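
To put the table in terms of speedup and parallel efficiency, here is a small sketch that only does arithmetic on the timings above. The function names are mine, and the serial-fraction estimate assumes Amdahl's law describes this kernel, which is a modeling assumption and not something measured here.

```python
# Timings copied from the table above: thread count -> seconds.
timings = {1: 1154.0, 2: 799.8, 4: 620.5, 8: 527.3, 16: 483.9, 32: 467.4, 64: 461.7}


def speedup(timings, threads):
    """Speedup relative to the 1-thread run."""
    return timings[1] / timings[threads]


def efficiency(timings, threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(timings, threads) / threads


def amdahl_serial_fraction(timings, threads):
    """Serial fraction f solving S = 1 / (f + (1 - f) / n).

    Assumes Amdahl's law applies; treat the result as a rough indicator only.
    """
    inverse_speedup = timings[threads] / timings[1]
    return (inverse_speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


for n in sorted(timings):
    print(f"{n:3d} threads: speedup {speedup(timings, n):.2f}x, "
          f"efficiency {efficiency(timings, n):.1%}")

print(f"implied serial fraction at 64 threads: "
      f"{amdahl_serial_fraction(timings, 64):.2f}")
```

With these numbers the speedup at 64 threads is only about 2.5x, and under the Amdahl assumption that corresponds to roughly 39% of the 1-thread time behaving as if it were serial.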
Most of the TpetraExt::MatrixMatrix::add time is spent in Tpetra::CrsMatrix::fillComplete, so the suboptimal scaling of TpetraExt::MatrixMatrix::add is due to the suboptimal scaling of Tpetra::CrsMatrix::fillComplete. The fillComplete time splits very roughly into thirds: 1/3 in Tpetra::CrsGraph::makeColMap, 1/3 in Tpetra::CrsMatrix::sortAndMergeIndicesAndValues, and 1/3 in Tpetra::CrsGraph::makeIndicesLocal.
The thread scalability of CrsMatrix::fillComplete has already been reported in: Tpetra: Improve thread scalability of CrsMatrix::fillComplete #829 (closed).