Tpetra::Details::PackTraits: Change interface to support thread-parallel pack & unpack
Created by: mhoemmen
@trilinos/tpetra @trilinos/stokhos
Story: #797
Blocks: #802 (and #800, if we implement #802 right)
PackTraits currently packs one row of a sparse matrix at a time, and is callable only from the host. We want to change it so that all rows can be packed at once. This will likely involve a parallel scan to compute each row's offset into the output buffer, followed by (or fused with) a parallel_for that writes each row's data at its offset. We can avoid aliasing issues by always casting to char* before copying.
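A minimal serial sketch of this two-phase scheme follows. It assumes a hypothetical CSR layout and a hypothetical per-row wire format of [entry count][column indices][values]; CsrMatrix, rowPackSize, and packAllRows are illustrative names, not Tpetra's API. std::inclusive_scan stands in for the parallel scan, and the second loop, whose iterations write to disjoint byte ranges, is the part that would become a Kokkos::parallel_for. Every write goes through std::memcpy into the char buffer, which sidesteps both strict-aliasing problems and misaligned access to the packed doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <numeric>
#include <vector>

// Hypothetical CSR matrix; not Tpetra's actual data structure.
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;  // numRows + 1 entries
  std::vector<int> colInd;
  std::vector<double> values;
};

// Bytes for one packed row: [numEnt : int][colInd : numEnt ints][values : numEnt doubles].
inline std::size_t rowPackSize(std::size_t numEnt) {
  return sizeof(int) + numEnt * (sizeof(int) + sizeof(double));
}

// Packs every row into one buffer; on return, row i occupies
// [offsets[i], offsets[i + 1]) in the returned buffer.
inline std::vector<char> packAllRows(const CsrMatrix& A, std::vector<std::size_t>& offsets) {
  assert(!A.rowPtr.empty());
  const std::size_t numRows = A.rowPtr.size() - 1;

  // Phase 1: per-row byte counts, then a prefix sum gives each row's offset.
  std::vector<std::size_t> sizes(numRows);
  for (std::size_t i = 0; i < numRows; ++i) {
    sizes[i] = rowPackSize(A.rowPtr[i + 1] - A.rowPtr[i]);
  }
  offsets.assign(numRows + 1, 0);
  std::inclusive_scan(sizes.begin(), sizes.end(), offsets.begin() + 1);

  // Phase 2: rows write to disjoint ranges, so iterations are independent;
  // this serial loop stands in for a Kokkos::parallel_for.
  std::vector<char> buf(offsets[numRows]);
  for (std::size_t i = 0; i < numRows; ++i) {
    char* out = buf.data() + offsets[i];
    const std::size_t beg = A.rowPtr[i];
    const std::size_t count = A.rowPtr[i + 1] - beg;
    const int numEnt = static_cast<int>(count);
    // memcpy into char*: no aliasing violation and no alignment requirement.
    std::memcpy(out, &numEnt, sizeof(int));
    out += sizeof(int);
    if (count > 0) {
      std::memcpy(out, A.colInd.data() + beg, count * sizeof(int));
      out += count * sizeof(int);
      std::memcpy(out, A.values.data() + beg, count * sizeof(double));
    }
  }
  return buf;
}
```

Computing all offsets up front is what lets the writes proceed in any order; fusing the two phases would need a scan that also performs the writes once each row's offset is known.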
The best approach may be for PackTraits still to pack one row at a time, but as a KOKKOS_FUNCTION (e.g., __device__ __host__ under CUDA, and thus callable from inside a CUDA kernel). We can then build the parallel pack described above on top of that.
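One way a device-callable per-row pack could look is sketched below, assuming a hypothetical [count][indices][values] row format. packRowSize, packRow, and unpackRow are illustrative names, not the actual PackTraits interface. The functions take only raw pointers, allocate nothing, and throw nothing, which is what allows them to carry KOKKOS_FUNCTION. The fallback #define exists only so the sketch compiles without Kokkos; in a real build the macro comes from Kokkos. A parallel pack would scan packRowSize over all rows to get offsets, then have iteration i call packRow at buf + offsets[i].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fallback so this sketch builds without Kokkos. With Kokkos and CUDA,
// KOKKOS_FUNCTION expands to __host__ __device__.
#ifndef KOKKOS_FUNCTION
#define KOKKOS_FUNCTION
#endif

// Bytes one row occupies in the hypothetical format.
template <class LO, class Scalar>
KOKKOS_FUNCTION std::size_t packRowSize(LO numEnt) {
  return sizeof(LO) + static_cast<std::size_t>(numEnt) * (sizeof(LO) + sizeof(Scalar));
}

// Writes one row starting at out; returns the number of bytes written.
template <class LO, class Scalar>
KOKKOS_FUNCTION std::size_t packRow(char* out, const LO* ind, const Scalar* val, LO numEnt) {
  assert(numEnt >= 0);
  const std::size_t count = static_cast<std::size_t>(numEnt);
  std::size_t n = 0;
  std::memcpy(out + n, &numEnt, sizeof(LO));
  n += sizeof(LO);
  if (count > 0) {
    std::memcpy(out + n, ind, count * sizeof(LO));
    n += count * sizeof(LO);
    std::memcpy(out + n, val, count * sizeof(Scalar));
    n += count * sizeof(Scalar);
  }
  return n;
}

// Reads one row starting at in; ind and val must have room for the row.
// Returns the number of bytes read.
template <class LO, class Scalar>
KOKKOS_FUNCTION std::size_t unpackRow(const char* in, LO* ind, Scalar* val, LO& numEnt) {
  std::size_t n = 0;
  std::memcpy(&numEnt, in + n, sizeof(LO));
  n += sizeof(LO);
  assert(numEnt >= 0);
  const std::size_t count = static_cast<std::size_t>(numEnt);
  if (count > 0) {
    std::memcpy(ind, in + n, count * sizeof(LO));
    n += count * sizeof(LO);
    std::memcpy(val, in + n, count * sizeof(Scalar));
    n += count * sizeof(Scalar);
  }
  return n;
}
```

This assumes memcpy is usable in device code, as it is under CUDA.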
tpetra/core/src/Tpetra_Import_Util2.hpp and tpetra/core/src/Tpetra_Experimental_BlockCrsMatrix_def.hpp are currently the only Tpetra files that use the PackArray interface. CrsGraph and CrsMatrix should use it too, but they currently do not.
This affects Stokhos because Stokhos specializes PackTraits for some of the Scalar types that it defines.
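The sketch below shows why an interface change reaches Stokhos: a traits class plus a specialization for a Scalar whose packed size depends on run-time data. PackTraitsSketch and CoeffScalar are hypothetical stand-ins, not the real Tpetra or Stokhos types. If the primary template's methods become KOKKOS_FUNCTION (or change signature), every specialization has to change the same way; otherwise that Scalar's pack and unpack cannot be called from device code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Fallback so this sketch builds without Kokkos.
#ifndef KOKKOS_FUNCTION
#define KOKKOS_FUNCTION
#endif

// Hypothetical primary template: bitwise copy for trivially copyable T.
template <class T>
struct PackTraitsSketch {
  static_assert(std::is_trivially_copyable<T>::value,
                "primary template assumes trivially copyable T");
  KOKKOS_FUNCTION static std::size_t packValueCount(const T&) { return sizeof(T); }
  KOKKOS_FUNCTION static std::size_t packValue(char* out, const T& x) {
    std::memcpy(out, &x, sizeof(T));
    return sizeof(T);
  }
  KOKKOS_FUNCTION static std::size_t unpackValue(T& x, const char* in) {
    std::memcpy(&x, in, sizeof(T));
    return sizeof(T);
  }
};

// Hypothetical Scalar with a run-time number of live coefficients
// (capped at MaxSize so the struct owns no heap storage).
struct CoeffScalar {
  static constexpr int MaxSize = 8;
  int size = 0;
  double coeffs[MaxSize] = {};
};

// Specialization: only the `size` live coefficients go over the wire.
// Its methods must match the primary template's (here, KOKKOS_FUNCTION).
template <>
struct PackTraitsSketch<CoeffScalar> {
  KOKKOS_FUNCTION static std::size_t packValueCount(const CoeffScalar& x) {
    return sizeof(int) + static_cast<std::size_t>(x.size) * sizeof(double);
  }
  KOKKOS_FUNCTION static std::size_t packValue(char* out, const CoeffScalar& x) {
    assert(x.size >= 0 && x.size <= CoeffScalar::MaxSize);
    std::memcpy(out, &x.size, sizeof(int));
    std::memcpy(out + sizeof(int), x.coeffs, static_cast<std::size_t>(x.size) * sizeof(double));
    return packValueCount(x);
  }
  KOKKOS_FUNCTION static std::size_t unpackValue(CoeffScalar& x, const char* in) {
    std::memcpy(&x.size, in, sizeof(int));
    assert(x.size >= 0 && x.size <= CoeffScalar::MaxSize);
    std::memcpy(x.coeffs, in + sizeof(int), static_cast<std::size_t>(x.size) * sizeof(double));
    return packValueCount(x);
  }
};
```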