Reduce ctest from 16 to 8 for serial GCC builds and fix path to cmake and ninja on hansen/shiller (#2976)

Created by: bartlettroscoe

CC: @fryeguy52

Description

With the current setup, tests run on top of each other on the same cores with the GCC 4.9.3 serial executables. I tried passing different sets of arguments into 'mpiexec', but that just resulted in "There are not enough slots available in the system" errors for a bunch of tests. See #2976 (closed) for more details.
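As a rough illustration of the change in the title, the sketch below picks a lower ctest parallel level for the serial GCC builds. It assumes that "16 to 8" refers to the value passed to `ctest -j` and that the other builds keep 16. The helper name `ctest_parallel_level` is hypothetical and is not taken from the Trilinos scripts.

```shell
# Hypothetical helper (not from the Trilinos scripts): choose how many tests
# ctest may run at once, based on the build name.
# Assumption: serial GCC builds drop to 8, every other build stays at 16.
ctest_parallel_level() {
  case "$1" in
    gnu-*-serial) echo 8 ;;
    *) echo 16 ;;
  esac
}

# Show the ctest command that would be used for one build (printed, not run).
build_name=gnu-opt-serial
echo "ctest -j$(ctest_parallel_level "$build_name")"
```

Halving the parallel level leaves fewer tests competing for the same cores, at the cost of a longer overall test run.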

The only successful solution to this problem will likely be an extension to ctest to control process and thread affinity, in close collaboration with MPI, as described in #2422.

I also fixed the path for cmake and ninja in my home directory to point to /home/rabartl/. On the login node this can be /ascldap/users/rabartl/, but on the compute nodes that directory does not exist. This makes it possible to configure and build on the compute nodes on hansen/shiller using just 'srun'.
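A quick way to confirm which form of the home directory is visible on a given node is sketched below. The function name `first_existing_dir` is hypothetical, and the two paths are the ones mentioned above. Running the script through 'srun' would show what the compute node sees.

```shell
# Hypothetical helper: print the first directory in the argument list that
# exists on the node where this runs; return non-zero if none exists.
first_existing_dir() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Check the two home directory forms mentioned in this description.
if tools_home=$(first_existing_dir /home/rabartl /ascldap/users/rabartl); then
  echo "home directory visible on $(uname -n): $tools_home"
else
  echo "neither home directory is visible on $(uname -n)" >&2
fi
```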

I also removed the warning about using 'srun' over 'salloc'. I misunderstood how 'salloc' and 'srun' work.

Motivation and Context

This change was to resolve the test Tempus_DIRK_Combined_FSA_MPI_1 timeout (#2976 (closed)) but it will help other tests as well.

How Has This Been Tested?

I tested this by running the checkin-test-atdm.sh script for all of the gnu and intel builds on 'shiller' for the Tempus and Panzer test suites together. This avoided all timeouts, and the longest-running test was a Panzer test that took under 500s (see details below).

DETAILED TESTING:

To test this I ran:

$ srun ./checkin-test-atdm.sh gnu-debug-serial gnu-opt-serial \
  gnu-debug-openmp gnu-opt-openmp intel-debug-serial \
  intel-opt-serial intel-debug-openmp intel-opt-openmp \
  --enable-packages=Tempus,Panzer --local-do-all

which returned:

PASSED (NOT READY TO PUSH): Trilinos: shiller02

Wed Jun 20 06:58:34 MDT 2018

Enabled Packages: Tempus, Panzer

Build test results:
-------------------
0) MPI_RELEASE_DEBUG_SHARED_PT_OPENMP => Test case MPI_RELEASE_DEBUG_SHARED_PT_OPENMP was not run! => Does not affect push readiness! (-1.00 min)
1) gnu-debug-serial => passed: passed=191,notpassed=0 (39.86 min)
2) gnu-opt-serial => passed: passed=191,notpassed=0 (28.14 min)
3) gnu-debug-openmp => passed: passed=191,notpassed=0 (27.26 min)
4) gnu-opt-openmp => passed: passed=191,notpassed=0 (18.07 min)
5) intel-debug-serial => passed: passed=190,notpassed=0 (46.27 min)
6) intel-opt-serial => passed: passed=191,notpassed=0 (39.63 min)
7) intel-debug-openmp => passed: passed=191,notpassed=0 (51.48 min)
8) intel-opt-openmp => passed: passed=191,notpassed=0 (42.71 min)

The most expensive tests for these builds were:

$ for build_dir in gnu-debug-serial gnu-opt-serial gnu-debug-openmp gnu-opt-openmp intel-debug-serial intel-opt-serial intel-debug-openmp intel-opt-openmp ; do ./print_expensive_tests.sh $build_dir/ctest.out ; done
 
***
*** gnu-debug-serial/ctest.out: 10 most expensive tests
***

191/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 .......   Passed  487.86 sec
162/191 Test  #20: Tempus_DIRK_Combined_FSA_MPI_1 ...................................   Passed  365.03 sec
160/191 Test  #28: Tempus_IMEX_RK_Combined_FSA_MPI_1 ................................   Passed  283.76 sec
129/191 Test  #21: Tempus_DIRK_Staggered_FSA_MPI_1 ..................................   Passed  266.60 sec
142/191 Test  #23: Tempus_DIRK_ASA_MPI_1 ............................................   Passed  252.24 sec
158/191 Test  #31: Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 ....................   Passed  241.93 sec
175/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .....   Passed  231.65 sec
176/191 Test #168: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 .......   Passed  178.27 sec
 16/191 Test   #4: Tempus_BackwardEuler_Combined_FSA_MPI_1 ..........................   Passed  120.78 sec
 17/191 Test   #9: Tempus_BDF2_Combined_FSA_MPI_1 ...................................   Passed  114.67 sec

***
*** gnu-opt-serial/ctest.out: 10 most expensive tests
***

162/191 Test  #20: Tempus_DIRK_Combined_FSA_MPI_1 ...................................   Passed  313.27 sec
160/191 Test  #28: Tempus_IMEX_RK_Combined_FSA_MPI_1 ................................   Passed  293.16 sec
129/191 Test  #21: Tempus_DIRK_Staggered_FSA_MPI_1 ..................................   Passed  266.48 sec
131/191 Test  #23: Tempus_DIRK_ASA_MPI_1 ............................................   Passed  266.43 sec
142/191 Test  #31: Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 ....................   Passed  248.29 sec
191/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .....   Passed  218.58 sec
 28/191 Test   #4: Tempus_BackwardEuler_Combined_FSA_MPI_1 ..........................   Passed  140.25 sec
 27/191 Test   #9: Tempus_BDF2_Combined_FSA_MPI_1 ...................................   Passed  131.86 sec
171/191 Test #160: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ............   Passed   87.50 sec
172/191 Test #164: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3 .....   Passed   86.35 sec

***
*** gnu-debug-openmp/ctest.out: 10 most expensive tests
***

191/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 .......   Passed  376.21 sec
190/191 Test #168: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 .......   Passed   95.15 sec
151/191 Test  #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................   Passed   79.01 sec
177/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .....   Passed   63.11 sec
 94/191 Test  #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 .............................   Passed   60.12 sec
 90/191 Test   #3: Tempus_BackwardEuler_MPI_1 .......................................   Passed   58.20 sec
107/191 Test  #20: Tempus_DIRK_Combined_FSA_MPI_1 ...................................   Passed   53.13 sec
148/191 Test  #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ...................   Passed   50.30 sec
 86/191 Test  #21: Tempus_DIRK_Staggered_FSA_MPI_1 ..................................   Passed   46.50 sec
 34/191 Test   #8: Tempus_BDF2_MPI_1 ................................................   Passed   45.16 sec

***
*** gnu-opt-openmp/ctest.out: 10 most expensive tests
***

191/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .....   Passed   59.29 sec
190/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 .......   Passed   33.59 sec
108/191 Test  #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................   Passed   29.95 sec
 64/191 Test   #3: Tempus_BackwardEuler_MPI_1 .......................................   Passed   25.23 sec
 56/191 Test  #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 .............................   Passed   24.64 sec
 40/191 Test   #8: Tempus_BDF2_MPI_1 ................................................   Passed   22.73 sec
 57/191 Test  #20: Tempus_DIRK_Combined_FSA_MPI_1 ...................................   Passed   15.85 sec
 63/191 Test  #21: Tempus_DIRK_Staggered_FSA_MPI_1 ..................................   Passed   15.52 sec
172/191 Test #160: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ............   Passed   15.22 sec
105/191 Test  #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ...................   Passed   14.88 sec

***
*** intel-debug-serial/ctest.out: 10 most expensive tests
***

190/190 Test #168: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 .......   Passed  162.56 sec
154/190 Test  #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................   Passed   86.03 sec
187/190 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .....   Passed   86.02 sec
103/190 Test  #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 .............................   Passed   64.28 sec
 98/190 Test   #3: Tempus_BackwardEuler_MPI_1 .......................................   Passed   62.80 sec
150/190 Test  #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ...................   Passed   60.35 sec
 95/190 Test  #20: Tempus_DIRK_Combined_FSA_MPI_1 ...................................   Passed   58.11 sec
 90/190 Test  #21: Tempus_DIRK_Staggered_FSA_MPI_1 ..................................   Passed   55.28 sec
107/190 Test  #24: Tempus_HHTAlpha_MPI_1 ............................................   Passed   51.83 sec
173/190 Test #160: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ............   Passed   49.14 sec

***
*** intel-opt-serial/ctest.out: 10 most expensive tests
***

191/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .....   Passed   77.14 sec
190/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 .......   Passed   59.31 sec
125/191 Test  #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................   Passed   26.64 sec
172/191 Test #160: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ............   Passed   21.24 sec
 70/191 Test  #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 .............................   Passed   19.30 sec
 54/191 Test   #3: Tempus_BackwardEuler_MPI_1 .......................................   Passed   17.86 sec
173/191 Test #164: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3 .....   Passed   16.70 sec
 69/191 Test  #20: Tempus_DIRK_Combined_FSA_MPI_1 ...................................   Passed   16.14 sec
109/191 Test  #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ...................   Passed   16.01 sec
 55/191 Test  #21: Tempus_DIRK_Staggered_FSA_MPI_1 ..................................   Passed   14.68 sec


***
*** intel-debug-openmp/ctest.out: 10 most expensive tests
***

191/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 .......   Passed  464.57 sec
190/191 Test #168: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 .......   Passed  126.60 sec
155/191 Test  #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................   Passed   87.61 sec
107/191 Test   #3: Tempus_BackwardEuler_MPI_1 .......................................   Passed   66.78 sec
176/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .....   Passed   66.29 sec
111/191 Test  #20: Tempus_DIRK_Combined_FSA_MPI_1 ...................................   Passed   65.51 sec
152/191 Test  #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ...................   Passed   64.48 sec
 97/191 Test  #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 .............................   Passed   63.29 sec
 85/191 Test  #21: Tempus_DIRK_Staggered_FSA_MPI_1 ..................................   Passed   53.22 sec
 72/191 Test   #8: Tempus_BDF2_MPI_1 ................................................   Passed   53.19 sec

***
*** intel-opt-openmp/ctest.out: 10 most expensive tests
***

191/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 .....   Passed   58.28 sec
190/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 .......   Passed   35.93 sec
114/191 Test  #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................   Passed   24.27 sec
 63/191 Test  #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 .............................   Passed   18.35 sec
 62/191 Test   #3: Tempus_BackwardEuler_MPI_1 .......................................   Passed   18.29 sec
 73/191 Test  #20: Tempus_DIRK_Combined_FSA_MPI_1 ...................................   Passed   16.48 sec
 72/191 Test  #21: Tempus_DIRK_Staggered_FSA_MPI_1 ..................................   Passed   16.07 sec
113/191 Test  #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ...................   Passed   15.96 sec
 38/191 Test   #8: Tempus_BDF2_MPI_1 ................................................   Passed   15.84 sec
 92/191 Test  #31: Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 ....................   Passed   13.38 sec

All tests ran in under 500 sec, so hopefully that will take care of these timeouts on 'hansen' once and for all.
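The listings above came from print_expensive_tests.sh, whose contents are not shown in this description. A minimal sketch of an equivalent filter follows, assuming the ctest.out files contain standard ctest result lines ending in "<time> sec". The function name `print_expensive_tests` and its optional count argument are assumptions for illustration, not the actual script.

```shell
# Hypothetical stand-in for print_expensive_tests.sh (the real script is not
# shown here): print the N slowest tests from a saved ctest output file.
# Usage: print_expensive_tests <ctest.out> [count]   (count defaults to 10)
print_expensive_tests() {
  file=$1
  count=${2:-10}
  # Keep only result lines ("... Test #N: name ... Passed 12.34 sec"),
  # prefix each with its run time and a tab, sort by that time in
  # descending order, then strip the prefix again.
  awk '/Test +#[0-9]+:/ && $NF == "sec" { print $(NF-1) "\t" $0 }' "$file" |
    sort -t "$(printf '\t')" -k1,1nr |
    head -n "$count" |
    cut -f2-
}
```

Calling `print_expensive_tests gnu-debug-serial/ctest.out` in a loop over the build directories would give one listing per build, similar in form to those above.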

Checklist

  • My commit messages mention the appropriate GitHub issue numbers.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • All new and existing tests passed.
  • No new compiler warnings were introduced.