Issues with TentativePFactory for "use kokkos refactor"=true on KNL?
Created by: pwxy
I built Drekar against the Trilinos develop branch (repo cloned early on the morning of Mon Aug 14).
I built on ellis for KNL with the Intel 17 compiler, using the following CMake options:
-D Tpetra_ENABLE_MMM_Timings=ON \
-D MueLu_ENABLE_Experimental:BOOL=ON \
-D MueLu_ENABLE_Kokkos_Refactor:BOOL=ON \
-D Xpetra_ENABLE_Experimental:BOOL=ON \
-D Xpetra_ENABLE_Kokkos_Refactor:BOOL=ON \
With this build, the MueLu setup time for the Drekar run was 14.6 s with 1 MPI rank and 16 OpenMP threads.
I then added:
<Parameter name="use kokkos refactor" type="bool" value="true"/>
to the MueLu parameter list,
and the MueLu setup time increased to 3557 s, roughly 244x slower.
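As a quick sanity check on the slowdown figure, the ratio of the two setup times can be recomputed. This is only a sketch; the variable names are mine, and the two timings are the ones reported for the runs described here.

```python
# Setup times in seconds, taken from the two runs described in this report.
baseline_s = 14.6    # "use kokkos refactor" not set
refactor_s = 3557.0  # "use kokkos refactor" = true

slowdown = refactor_s / baseline_s
print(f"slowdown: {slowdown:.0f}x")
```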
Here are all the MueLu setup timers that exceed 100 s (times in seconds):
3557.00000 MueLu: N5MueLu9HierarchyIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Setup (total)
3557.00000 MueLu: N5MueLu9HierarchyIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Setup (total, level=1)
3557.00000 MueLu: N5MueLu10RAPFactoryIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Computing Ac (total)
3557.00000 MueLu: N5MueLu18RepartitionFactoryIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total)
3557.00000 MueLu: N5MueLu24RebalanceTransferFactoryIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total, level=1)
3557.00000 MueLu: N5MueLu18RepartitionFactoryIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total, level=1)
3557.00000 MueLu: N5MueLu24RebalanceTransferFactoryIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total)
3556.00000 MueLu: N5MueLu10RAPFactoryIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Computing Ac (total, level=1)
3554.00000 MueLu: N5MueLu24TentativePFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total)
3554.00000 MueLu: N5MueLu24TentativePFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total, level=1)
3391.00000 MueLu: N5MueLu24TentativePFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Stage 1 (LocalQR) (sub, total)
3391.00000 MueLu: N5MueLu24TentativePFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (level=1)
3391.00000 MueLu: N5MueLu24TentativePFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build
3391.00000 MueLu: N5MueLu24TentativePFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Stage 1 (LocalQR) (sub, total, level=1)
163.10000 MueLu: N5MueLu34UncoupledAggregationFactory_kokkosIixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total)
163.00000 MueLu: N5MueLu34UncoupledAggregationFactory_kokkosIixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total, level=0)
161.10000 MueLu: N5MueLu26CoalesceDropFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total)
161.10000 MueLu: N5MueLu26CoalesceDropFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (total, level=0)
160.90000 MueLu: N5MueLu26CoalesceDropFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build (level=0)
160.90000 MueLu: N5MueLu26CoalesceDropFactory_kokkosIdixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: Build
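The timer labels above are mangled C++ type names, which makes the list hard to scan. A minimal sketch for shortening them, assuming the names follow the Itanium ABI layout seen here (an `N` followed by length-prefixed name components, then the template arguments): it decodes only the leading class name and is not a full demangler. The helper names `short_timer_name` and `parse_timer` are hypothetical, and the sample line is copied from the list above.

```python
import re

def short_timer_name(mangled: str) -> str:
    """Decode only the leading nested name of a mangled type.

    Example: 'N5MueLu9HierarchyIdix...' -> 'MueLu::Hierarchy'.
    Template arguments (everything from the first 'I') are dropped.
    """
    if not mangled.startswith("N"):
        return mangled
    pos, parts = 1, []
    while True:
        m = re.match(r"\d+", mangled[pos:])
        if m is None:  # no further length prefix: template args begin here
            break
        length = int(m.group())
        start = pos + m.end()
        parts.append(mangled[start:start + length])
        pos = start + length
    return "::".join(parts) if parts else mangled

def parse_timer(line: str):
    """Split one timer line into (seconds, short class name, label)."""
    seconds, _, rest = line.partition(" MueLu: ")
    mangled, _, label = rest.partition(": ")
    return float(seconds), short_timer_name(mangled), label

sample = (
    "3391.00000 MueLu: N5MueLu24TentativePFactory_kokkosIdixN6Kokkos6Compat"
    "23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE: "
    "Stage 1 (LocalQR) (sub, total)"
)
seconds, name, label = parse_timer(sample)
share = seconds / 3557.0  # fraction of the total setup time reported above
print(f"{name}: {label}: {seconds:.0f} s ({share:.0%} of setup)")
```

Read this way, the Stage 1 (LocalQR) timer in TentativePFactory_kokkos accounts for about 95% of the total setup time.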
Edit (@aprokop): added quoting