Tpetra: Improve thread scalability of Import/Export
Created by: mhoemmen
@trilinos/tpetra Epic: #796
[mfh edit 13 Jul 2017: promote the transferAndFillComplete task, #802 (closed), into its own story]
This involves several tasks, not all fully identified:
- Change Tpetra::Details::PackTraits to support thread-parallel pack & unpack (#798 (closed))
- Make CrsGraph use the new PackTraits interface to do thread-parallel pack & unpack (#799)
- Ditto for CrsMatrix (#800 (closed))
- Ditto for BlockCrsMatrix (#801)
- Ditto for CrsMatrix::transferAndFillComplete (#802 (closed))
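
For illustration only, here is one common way to make packing of variable-length rows thread-parallel: count each row's size, take an exclusive prefix sum to get write offsets, then let each thread fill a disjoint slice of the buffer. This is a sketch under assumptions, not the PackTraits interface: `Row`, `computeOffsets`, and `packRows` are hypothetical names, and plain `std::thread` stands in for Kokkos (where the two passes would plausibly map to `Kokkos::parallel_scan` and `Kokkos::parallel_for`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one sparse row: a variable-length list of indices.
using Row = std::vector<int>;

// Pass 1: exclusive prefix sum of the row lengths.
// offsets[i] is where row i starts in the packed buffer;
// offsets.back() is the total packed length.
std::vector<std::size_t> computeOffsets(const std::vector<Row>& rows) {
  std::vector<std::size_t> offsets(rows.size() + 1, 0);
  for (std::size_t i = 0; i < rows.size(); ++i) {
    offsets[i + 1] = offsets[i] + rows[i].size();
  }
  return offsets;
}

// Pass 2: each thread packs a contiguous chunk of rows.
// Because the offsets are known up front, threads write to disjoint
// slices of the buffer and need no locks or atomics.
std::vector<int> packRows(const std::vector<Row>& rows, unsigned numThreads) {
  assert(numThreads >= 1);
  const std::vector<std::size_t> offsets = computeOffsets(rows);
  std::vector<int> buffer(offsets.back());

  auto packRange = [&](std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
      std::copy(rows[i].begin(), rows[i].end(),
                buffer.begin() + static_cast<std::ptrdiff_t>(offsets[i]));
    }
  };

  const std::size_t n = rows.size();
  std::vector<std::thread> workers;
  workers.reserve(numThreads);
  for (unsigned t = 0; t < numThreads; ++t) {
    // Split [0, n) into numThreads nearly equal chunks; some may be empty.
    const std::size_t begin = n * t / numThreads;
    const std::size_t end = n * (t + 1) / numThreads;
    workers.emplace_back(packRange, begin, end);
  }
  for (auto& w : workers) {
    w.join();
  }
  return buffer;
}
```

Unpack can reuse the same offsets in the other direction: each thread reads its own slice of the buffer and writes its own rows, so the same independence argument applies.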
For #800 (closed), #801, and #802 (closed), we need to do performance tests to make sure that the changes thread-scale without sacrificing performance in the MPI-only case. It may make sense to have a non-threaded implementation if the number of threads is 1. (Some users may use Tpetra's OpenMP back-end without realizing it, but run with 1 thread per MPI process. That's why this should be a run-time decision rather than a decision based on the back-end type.)
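
A minimal sketch of what "run-time decision" could look like, assuming the thread count is available as an ordinary run-time value. `forEachIndex` is a hypothetical helper, not a Tpetra or Kokkos function, and `std::thread` again stands in for the real back-end. With one thread it runs a plain loop on the calling thread and pays no launch or synchronization cost; otherwise it splits the index range across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: apply body(i) for every i in [0, n).
// numThreads is a run-time value (for example, queried once at startup),
// not a compile-time property of the back-end.
void forEachIndex(std::size_t n, unsigned numThreads,
                  const std::function<void(std::size_t)>& body) {
  if (numThreads <= 1 || n < 2) {
    // Non-threaded path: same work, no thread launch overhead.
    for (std::size_t i = 0; i < n; ++i) {
      body(i);
    }
    return;
  }

  // Threaded path: each worker handles a contiguous chunk of indices.
  std::vector<std::thread> workers;
  workers.reserve(numThreads);
  for (unsigned t = 0; t < numThreads; ++t) {
    const std::size_t begin = n * t / numThreads;
    const std::size_t end = n * (t + 1) / numThreads;
    workers.emplace_back([&body, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        body(i);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
}
```

Both paths call the same `body`, so the performance tests mentioned above can compare them directly by varying only `numThreads`.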