Tpetra: Measure thread scaling of solver kernels, without MPI
Created by: mhoemmen
@trilinos/tpetra Epic: #820 (@jjellio, I can't assign you yet because you're not part of the Trilinos organization. See #819 (closed). After that's done, I'll assign you too :-) .)
Measure thread scaling of solver kernels (mainly OpenMP, on Haswell and KNL, though other platforms are welcome), without MPI. Comparing performance with and without MPI is important, because the run without MPI removes the cost of MPI communication and the associated Tpetra pack and unpack work. (The latter isn't all threaded yet. See #799, #800 (closed), and #801, though those don't concern the MultiVector communication that takes place in CrsMatrix::apply.)
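For anyone who wants a starting point for the timing loop, here is a minimal sketch of a thread-scaling measurement. It does not use Tpetra at all: `axpy` is a hypothetical stand-in kernel and `timeAxpy` is a hypothetical helper, both invented for illustration. The real measurement would put the solver kernel of interest where `axpy` is called. The OpenMP calls are guarded so that the sketch still compiles (and runs with one thread) when OpenMP is disabled.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in kernel (NOT a Tpetra routine): y := y + alpha*x.
void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) {
  const long n = static_cast<long>(x.size());
#pragma omp parallel for schedule(static)
  for (long i = 0; i < n; ++i) {
    y[i] += alpha * x[i];
  }
}

// Run the kernel numTrials times with numThreads OpenMP threads and return
// the fastest time in seconds (or -1.0 if numTrials <= 0).
double timeAxpy(int numThreads, std::size_t n, int numTrials) {
#ifdef _OPENMP
  omp_set_num_threads(numThreads);
#else
  (void)numThreads;  // built without OpenMP: always one thread
#endif
  // The vectors are filled by the calling thread, so this sketch assumes
  // the whole run is restricted to a single NUMA domain.
  std::vector<double> x(n, 1.0), y(n, 2.0);
  axpy(0.0, x, y);  // warm-up call, excluded from the timings

  double best = -1.0;
  for (int t = 0; t < numTrials; ++t) {
    const auto start = std::chrono::steady_clock::now();
    axpy(1.0e-3, x, y);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    if (best < 0.0 || elapsed.count() < best) {
      best = elapsed.count();
    }
  }
  // Read a result so the compiler cannot discard the timed work.
  volatile double sink = y.empty() ? 0.0 : y[0];
  (void)sink;
  return best;
}
```

Calling `timeAxpy` for thread counts 1, 2, 4, ... and dividing the one-thread time by each result gives the speedup per thread count. Taking the minimum over trials is just one choice for this sketch; a median or mean may suit the real study better.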
It's OK to use an MPI-enabled build of Trilinos for this. Just run with 1 MPI process. Tpetra will skip MPI communication and pack / unpack in that case. For threads, it's OK to restrict the run to a single NUMA domain, though it's important to make sure that the threads actually stay within that NUMA domain (@jjellio knows all about this fun ;-) ).
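One way to check that threads stay where you expect is to have each OpenMP thread report the CPU it is currently running on. The sketch below assumes Linux with glibc, since it relies on `sched_getcpu`. The helper names `cpusInUse` and `allWithin` are hypothetical, and the CPU range that makes up a NUMA domain is machine-specific, so the caller has to supply it.

```cpp
#include <sched.h>  // sched_getcpu (Linux / glibc specific)
#include <vector>

// Return the CPU id that each thread of an OpenMP parallel region is running
// on at the moment it is sampled. Without OpenMP the pragmas are ignored and
// the result has a single entry for the calling thread.
std::vector<int> cpusInUse() {
  std::vector<int> cpus;
#pragma omp parallel
  {
    const int cpu = sched_getcpu();  // -1 on error
#pragma omp critical
    cpus.push_back(cpu);
  }
  return cpus;
}

// True if every sampled CPU id lies in [firstCpu, lastCpu]. The range is an
// assumption supplied by the caller, for example the CPU ids of the one NUMA
// domain the run is supposed to stay in.
bool allWithin(const std::vector<int>& cpus, int firstCpu, int lastCpu) {
  for (const int cpu : cpus) {
    if (cpu < firstCpu || cpu > lastCpu) {
      return false;
    }
  }
  return true;
}
```

A single sample is only a snapshot, because unbound threads can migrate afterwards. Sampling before and after the timed kernel, together with binding threads through the standard OpenMP environment variables `OMP_PROC_BIND` and `OMP_PLACES`, gives more confidence that the threads stayed put.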