MueLu: catch-all for KokkosRefactor branch tasks
Created by: aprokop
- Introduce a performance monitoring framework to monitor performance results. Could be similar to interface test. Could use performance file structure.
- Implement new smoothed prolongator construction [#1743]
- Kokkosify AmalgamationInfo. It should be the only place to do node <-> dof transformations.
- Optimize block CoalesceDrop
- Get rid of `ArrayRCP` in UncoupledAggregation
- Replace LWGraph construction by a wrapper?
- Add a single value set function specialized on the device. This should allow us to skip initializing even rows in crs graph.
- Add `RandomAccessTraits` when applicable
- Check Chebyshev smoother setup difference between Serial and OMP_NUM_THREADS=1
- Use `KOKKOS_FORCEINLINE_FUNCTION`? I'm not sure when to use it instead of regular `KOKKOS_INLINE_FUNCTION`.
- Optimize block Tentative P. Even after the rewrite, the Tentative P is dog slow compared to regular. Is it because of shared memory?
- Try optimizing `GetOverlappedDiagonal`. Would Tpetra's `getLocalDiagOffsets` with the new `StaticCrsGraph::rowConst()` be faster? It would not need to create a subview...
- Distance Laplacian functor
- Parallelize loops in `Aggregates_kokkos::GetGraph()`
- Use `Aggregates::GetGraph()` in CoordinatesTransferFactory
- Bypass amalgamation for `blkSize=1` in CoordinatesTransferFactory
- Get rid of `ArrayRCP` in Aggregates
- Get rid of subview creation in `LWGraph_kokkos::getNeighborVertices`?
- Use `ViewAllocateWithoutInitializing`? See kokkos/kokkos#1073
- Add `FIXME_KOKKOS` comments to code to indicate things to look at.
- Fix unit tests. See #1686 (closed)
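
For the AmalgamationInfo item, here is a rough sketch of what "node <-> dof transformations" could look like in the simplest case. This is plain C++, not MueLu code: `BlockMap` and its member names are hypothetical, and it assumes a constant block size with contiguous dof numbering and no index offset. The real AmalgamationInfo may have to handle more than that.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only; not MueLu's AmalgamationInfo API.
// Assumes a constant block size and contiguous, zero-based dof numbering.
struct BlockMap {
  std::size_t blkSize;  // number of dofs per node

  // dof index -> index of the node that owns it
  std::size_t dofToNode(std::size_t dof) const { return dof / blkSize; }

  // dof index -> position of the dof inside its node's block
  std::size_t dofToLocal(std::size_t dof) const { return dof % blkSize; }

  // (node, position inside block) -> dof index
  std::size_t nodeToDof(std::size_t node, std::size_t k) const {
    return node * blkSize + k;
  }
};
```

With `blkSize == 1` both directions reduce to the identity, which is why bypassing amalgamation in that case loses nothing under these assumptions.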