Improve cee-rhel6 intel-18.0.2 OpenMP test suite runtimes (#4251, #4260, #4262)

Created by: bartlettroscoe

This uses MPI and OpenMP settings discovered by Brad King (Kitware) that massively improve the runtime of the ATDM Trilinos test suite for the build Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt, bringing the test time down from over 2 hours (as shown here) to less than 9 minutes (see below).

This also seems to have fixed the test timeouts reported in #4251 (closed), #4260 (closed), and #4262 (closed).

NOTE: The following tests FAILED:

1477 - STKBalance_stk_balance_MPI_4 (Failed)
1478 - STKBalance_stk_balance_m2n_MPI_4 (Failed)

But these tests also fail in the promoted cee-rhel6-intel-17.0.1 build, so there is no reason to block the promotion of this build.

How this was tested

I tested this on 'ceerws1113', the machine where the nightly build Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt runs, using:

$ cd /scratch/rabartl/Trilinos.base/BUILDS/ATDM/CEE-RHEL6/CHECKIN/

$ ./checkin-test-atdm-cee-rhel6.sh \
    cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt \
    --enable-all-packages=on --local-do-al

which gave the result:

  FAILED: Trilinos/cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt: passed=2207,notpassed=2
  
  Fri Mar 22 13:29:21 MDT 2019
  
  Enabled Packages: 
  Enabled all Packages
  Hostname: ceerws1113
  Source Dir: /scratch/rabartl/Trilinos.base/Trilinos/cmake/tribits/ci_support/../../..
  Build Dir: /scratch/rabartl/Trilinos.base/BUILDS/ATDM/CEE-RHEL6/CHECKIN/cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt
  
  CMake Cache Varibles: -GNinja -DTrilinos_TRIBITS_DIR:PATH=/scratch/rabartl/Trilinos.base/Trilinos/cmake/tribits -DTrilinos_ENABLE_TESTS:BOOL=ON -DTrilinos_TEST_CATEGORIES:STRING=NIGHTLY -DTrilinos_ALLOW_NO_PACKAGES:BOOL=OFF -DDART_TESTING_TIMEOUT:STRING=600.0 -GNinja -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake -DTrilinos_TRACE_ADD_TEST=ON -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON -DTrilinos_ENABLE_ALL_PACKAGES:BOOL=ON -DTrilinos_ENABLE_ALL_FORWARD_DEP_PACKAGES:BOOL=OFF
  Make Options: -j 16
  CTest Options: -j 8
  
  Pull: Not Performed
  Configure: Passed (2.93 min)
  Build: Passed (85.43 min)
  Test: FAILED (8.33 min)
  
  99% tests passed, 2 tests failed out of 2209
  
  Subproject Time Summary:
  Amesos2          =   3.42 sec*proc (8 tests)
  Anasazi          =  62.98 sec*proc (74 tests)
  Belos            = 127.14 sec*proc (100 tests)
  Ifpack2          =  27.99 sec*proc (47 tests)
  Intrepid2        = 117.01 sec*proc (260 tests)
  Kokkos           =  93.56 sec*proc (27 tests)
  KokkosKernels    =  94.87 sec*proc (8 tests)
  MueLu            = 602.93 sec*proc (110 tests)
  NOX              =  69.72 sec*proc (105 tests)
  Panzer           = 1623.80 sec*proc (168 tests)
  Phalanx          =   3.44 sec*proc (27 tests)
  Piro             =   9.74 sec*proc (13 tests)
  ROL              = 267.78 sec*proc (145 tests)
  Rythmos          =  19.24 sec*proc (83 tests)
  SEACAS           =  52.82 sec*proc (42 tests)
  STK              =   1.49 sec*proc (5 tests)
  Sacado           =  20.37 sec*proc (297 tests)
  ShyLU_Node       =   9.98 sec*proc (6 tests)
  Stratimikos      =  13.07 sec*proc (40 tests)
  Teko             =  27.99 sec*proc (18 tests)
  Tempus           = 202.00 sec*proc (80 tests)
  Teuchos          =  51.32 sec*proc (137 tests)
  Thyra            =  27.70 sec*proc (82 tests)
  Tpetra           = 187.14 sec*proc (198 tests)
  Xpetra           =  41.97 sec*proc (18 tests)
  Zoltan2          = 161.95 sec*proc (111 tests)
  
  Total Test time (real) = 499.92 sec
  
  The following tests FAILED:
  	1477 - STKBalance_stk_balance_MPI_4 (Failed)
  	1478 - STKBalance_stk_balance_m2n_MPI_4 (Failed)
  Errors while running CTest
  
  Total time for cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt = 96.70 min

NOTE: This was a rebuild, so don't expect the 85 min build time shown above for the from-scratch nightly builds.

NOTE: The two failing tests shown above also fail in the already promoted cee-rhel6-intel-17.0.1 build, so they should not block the promotion of this build.
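
As a rough cross-check of the timing numbers in the log above, the sketch below sums the per-subproject sec*proc values and compares them with the reported wall-clock test time. It is only an illustration: the helper name parse_summary and the regular expression are assumptions made for this example and are not part of the checkin-test or CTest tooling, and the "over 2 hours" baseline is treated as a lower bound of exactly 2 hours.

```python
import re

# Subproject lines copied verbatim from the CTest summary in the log above.
SUMMARY = """\
Amesos2          =   3.42 sec*proc (8 tests)
Anasazi          =  62.98 sec*proc (74 tests)
Belos            = 127.14 sec*proc (100 tests)
Ifpack2          =  27.99 sec*proc (47 tests)
Intrepid2        = 117.01 sec*proc (260 tests)
Kokkos           =  93.56 sec*proc (27 tests)
KokkosKernels    =  94.87 sec*proc (8 tests)
MueLu            = 602.93 sec*proc (110 tests)
NOX              =  69.72 sec*proc (105 tests)
Panzer           = 1623.80 sec*proc (168 tests)
Phalanx          =   3.44 sec*proc (27 tests)
Piro             =   9.74 sec*proc (13 tests)
ROL              = 267.78 sec*proc (145 tests)
Rythmos          =  19.24 sec*proc (83 tests)
SEACAS           =  52.82 sec*proc (42 tests)
STK              =   1.49 sec*proc (5 tests)
Sacado           =  20.37 sec*proc (297 tests)
ShyLU_Node       =   9.98 sec*proc (6 tests)
Stratimikos      =  13.07 sec*proc (40 tests)
Teko             =  27.99 sec*proc (18 tests)
Tempus           = 202.00 sec*proc (80 tests)
Teuchos          =  51.32 sec*proc (137 tests)
Thyra            =  27.70 sec*proc (82 tests)
Tpetra           = 187.14 sec*proc (198 tests)
Xpetra           =  41.97 sec*proc (18 tests)
Zoltan2          = 161.95 sec*proc (111 tests)
"""

# Hypothetical pattern for one "Name = X sec*proc (N tests)" line.
LINE_RE = re.compile(r"^\s*(\S+)\s*=\s*([\d.]+) sec\*proc \((\d+) tests?\)\s*$")


def parse_summary(text):
    """Return a list of (subproject, sec_proc, num_tests) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append((match.group(1), float(match.group(2)), int(match.group(3))))
    return rows


rows = parse_summary(SUMMARY)
total_sec_proc = sum(sec_proc for _, sec_proc, _ in rows)
total_tests = sum(num_tests for _, _, num_tests in rows)

wall_sec = 499.92          # "Total Test time (real)" from the log
old_wall_sec = 2 * 3600.0  # assumption: "over 2 hours" taken as exactly 2 hours

# Average number of processor slots kept busy during the test run.
avg_busy_procs = total_sec_proc / wall_sec
# Lower bound on the speedup, since the old runtime was more than 2 hours.
min_speedup = old_wall_sec / wall_sec

print(f"tests counted:        {total_tests}")
print(f"total sec*proc:       {total_sec_proc:.2f}")
print(f"wall-clock test time: {wall_sec / 60:.2f} min")
print(f"avg busy processors:  {avg_busy_procs:.2f}")
print(f"speedup (at least):   {min_speedup:.1f}x")
```

The per-subproject test counts add up to the 2209 tests reported, and the average of just under 8 busy processors is consistent with the CTest option -j 8, which suggests the test run kept its processor budget almost fully occupied.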
