Improve MueLu setup scaling for repartitioning by reducing comm_split calls
Created by: pwxy
This was originally Bug 6346 in the Sandia Bugzilla (https://software.sandia.gov//bugzilla), filed over 2.5 years ago (June 4, 2015), but it failed to get transferred to GitHub: "Bug 6346 - Improve MueLu scaling for repartitioning by reducing comm_split calls."
Unfortunately, I had completely forgotten about it, and had to spend a lot of time tracking the problem down again.
Currently, when repartitioning occurs on coarser levels, there are three MPI_Comm_split calls: one for Ac, one for the coordinates, and one for the null space. It seems that two of these Comm_split calls could be removed, which would improve scaling for large numbers of MPI processes (> 100,000). The Comm_split calls account for the majority of the time spent rebalancing the coordinates and the null space.
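To illustrate the idea only, here is a minimal sketch that uses a mock communicator rather than MPI, Teuchos, or any MueLu class. All names in it (`MockComm`, `split_nonempty`, `RebalancedLevel`, `rebalance_three_splits`, `rebalance_one_split`) are hypothetical and do not exist in Trilinos. It assumes that Ac, the coordinates, and the null space are all rebalanced onto the same set of non-empty processes, so the communicator produced by the first split can simply be shared by the other two.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a communicator: just the list of member ranks.
struct MockComm {
    std::vector<int> ranks;
};

// Stand-in for the expensive collective split that drops ranks owning no rows.
// `split_calls` counts how often the split is performed.
std::shared_ptr<MockComm> split_nonempty(const MockComm& parent,
                                         const std::vector<int>& rows_per_rank,
                                         int& split_calls) {
    assert(parent.ranks.size() == rows_per_rank.size());
    ++split_calls;
    auto sub = std::make_shared<MockComm>();
    for (std::size_t i = 0; i < parent.ranks.size(); ++i) {
        if (rows_per_rank[i] > 0) {
            sub->ranks.push_back(parent.ranks[i]);
        }
    }
    return sub;
}

// Communicators attached to the three rebalanced objects on a coarse level.
struct RebalancedLevel {
    std::shared_ptr<MockComm> comm_Ac;
    std::shared_ptr<MockComm> comm_coords;
    std::shared_ptr<MockComm> comm_nullspace;
};

// Behaviour described in this issue: each object triggers its own split.
RebalancedLevel rebalance_three_splits(const MockComm& parent,
                                       const std::vector<int>& rows_per_rank,
                                       int& split_calls) {
    RebalancedLevel level;
    level.comm_Ac = split_nonempty(parent, rows_per_rank, split_calls);
    level.comm_coords = split_nonempty(parent, rows_per_rank, split_calls);
    level.comm_nullspace = split_nonempty(parent, rows_per_rank, split_calls);
    return level;
}

// Proposed behaviour: split once for Ac, then reuse the result.
RebalancedLevel rebalance_one_split(const MockComm& parent,
                                    const std::vector<int>& rows_per_rank,
                                    int& split_calls) {
    RebalancedLevel level;
    level.comm_Ac = split_nonempty(parent, rows_per_rank, split_calls);
    level.comm_coords = level.comm_Ac;     // shared, no second split
    level.comm_nullspace = level.comm_Ac;  // shared, no third split
    return level;
}
```

In this toy model, both variants end up with the same set of ranks on every object, but the second performs one split instead of three. Whether the real factories can share the reduced communicator this directly depends on details of the Tpetra and MueLu code paths that this sketch does not capture.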
For the Drekar steady-state Poisson solve with a 4.1 billion row matrix on 524,288 MPI processes on BG/Q, rebalancing the coordinates and rebalancing the null space are the second and third most expensive items in MueLu setup (since Chebyshev smoother setup is cheap; it is a different story for RILUK setup for MHD problems).
Stack trace for Comm_split call to rebalance coordinates (line numbers are from trilinos source over 2.5 years ago):
Teuchos_DefaultMpiComm.hpp 1729 (current dev Trilinos line 1663 Teuchos::MpiComm::split)
Tpetra_Map_def.hpp 1151 (current dev Trilinos line 1787 Tpetra::Map::removeEmptyProcesses)
Xpetra_TpetraMap.hpp 226 (current dev Trilinos line 226 Xpetra::TpetraMap::removeEmptyProcesses)
MueLu_RebalanceTransferFactory_def.hpp 256 (current dev Trilinos line 260 MueLu::RebalanceTransferFactory::Build)
MueLu_TwoLevelFactoryBase.hpp 153
MueLu_Level.hpp 203
MueLu_HierarchyHelpers_def.hpp 89
MueLu_Hierarchy_def.hpp 305
MueLu_HierarchyManager.hpp 197
MueLu_ParameterListInterpreter_def.hpp 1203
MueLu_CreateTpetraPreconditioner.hpp 146
Thyra_MueLuTpetraPreconditionerFactory_def.hpp 194
NOX_Thyra_Group.C 776
NOX_Thyra_Group.C 647
NOX_Thyra_Group.C 544
Stack trace for Comm_split call to rebalance null space (line numbers are from trilinos source over 2.5 years ago):
Teuchos_DefaultMpiComm.hpp 1729
Tpetra_Map_def.hpp 1151
Xpetra_TpetraMap.hpp 226
MueLu_RebalanceTransferFactory_def.hpp 275
MueLu_TwoLevelFactoryBase.hpp 153
MueLu_Level.hpp 203
MueLu_HierarchyHelpers_def.hpp 89
MueLu_Hierarchy_def.hpp 305
MueLu_HierarchyManager.hpp 197
MueLu_ParameterListInterpreter_def.hpp 1203
MueLu_CreateTpetraPreconditioner.hpp 146
Thyra_MueLuTpetraPreconditionerFactory_def.hpp 194
NOX_Thyra_Group.C 776
NOX_Thyra_Group.C 647
NOX_Thyra_Group.C 544
Stack trace for Comm_split call to rebalance Ac (line numbers are from trilinos source over 2.5 years ago):
Teuchos_DefaultMpiComm.hpp 1729
Tpetra_Map_def.hpp 1151
Tpetra_KokkosRefactor_CrsMatrix_def.hpp 7232
Tpetra_KokkosRefactor_CrsMatrix_def.hpp 7632
Tpetra_CrsMatrix_decl.hpp 2665
Xpetra_CrsMatrixFactory.hpp 447
Xpetra_MatrixFactory.hpp 125
MueLu_RebalanceAcFactory_def.hpp 105
MueLu_TwoLevelFactoryBase.hpp 153
MueLu_Level.hpp 203
MueLu_HierarchyHelpers_def.hpp 108
MueLu_Hierarchy_def.hpp 305
MueLu_HierarchyManager.hpp 197
MueLu_ParameterListInterpreter_def.hpp 1203
MueLu_CreateTpetraPreconditioner.hpp 146