Reduce ctest from 16 to 8 for serial GCC builds and fix path to cmake and ninja on hansen/shiller (#2976)
Created by: bartlettroscoe
CC: @fryeguy52
Description
This setup has a problem of having tests run on top of each other using the same cores with the GCC 4.9.3 serial executables. I tried passing different sets of arguments into 'mpiexec' but that just resulted in "There are not enough slots available in the system" errors for a bunch of tests. See #2976 (closed) for more details.
The only successful solution to this problem will likely be an extension to ctest to control process and thread affinity and a close collaboration with MPI as described in #2422 and:
I also fixed the path for cmake and ninja in my home directory to point to /home/rabartl/. On the login nodes this can be /ascldap/users/rabartl/, but that directory does not exist on the compute nodes. This makes it possible to configure and build on the compute nodes on hansen/shiller using just 'srun'.
I also removed the warning about using 'srun' over 'salloc'. I misunderstood how 'salloc' and 'srun' work.
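As a rough illustration of the parallel-level change named in the title, the selection logic could look something like the sketch below. The function name `ctest_parallel_level` is hypothetical and is not taken from the actual environment scripts. The sketch also assumes that only the GCC serial builds drop to 8 and that every other build keeps 16.

```shell
# Hypothetical helper (not from the actual scripts): pick the ctest
# parallel level from a build name.
# Assumption: only GCC serial builds drop to 8; all other builds keep 16.
ctest_parallel_level() {
  case "$1" in
    gnu-*-serial) echo 8 ;;
    *)            echo 16 ;;
  esac
}

# Example use (ctest's -j option sets how many tests run in parallel):
#   ctest -j "$(ctest_parallel_level gnu-opt-serial)"
```

Lowering the level only reduces how many tests compete for the same cores. It does not pin processes, so the affinity control discussed above would still be the more complete fix.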
Motivation and Context
This change was made to resolve the Tempus_DIRK_Combined_FSA_MPI_1 test timeout (#2976 (closed)), but it will help other tests as well.
How Has This Been Tested?
I tested this by running the checkin-test-atdm.sh script for all of the gnu and intel builds on 'shiller' for the Tempus and Panzer test suites together. This avoided all timeouts, and the longest-running test was a Panzer test at under 500s (see details below).
DETAILED TESTING:
To test this I ran:
$ srun ./checkin-test-atdm.sh gnu-debug-serial gnu-opt-serial \
gnu-debug-openmp gnu-opt-openmp intel-debug-serial \
intel-opt-serial intel-debug-openmp intel-opt-openmp \
--enable-packages=Tempus,Panzer --local-do-all
which returned:
PASSED (NOT READY TO PUSH): Trilinos: shiller02
Wed Jun 20 06:58:34 MDT 2018
Enabled Packages: Tempus, Panzer
Build test results:
-------------------
0) MPI_RELEASE_DEBUG_SHARED_PT_OPENMP => Test case MPI_RELEASE_DEBUG_SHARED_PT_OPENMP was not run! => Does not affect push readiness! (-1.00 min)
1) gnu-debug-serial => passed: passed=191,notpassed=0 (39.86 min)
2) gnu-opt-serial => passed: passed=191,notpassed=0 (28.14 min)
3) gnu-debug-openmp => passed: passed=191,notpassed=0 (27.26 min)
4) gnu-opt-openmp => passed: passed=191,notpassed=0 (18.07 min)
5) intel-debug-serial => passed: passed=190,notpassed=0 (46.27 min)
6) intel-opt-serial => passed: passed=191,notpassed=0 (39.63 min)
7) intel-debug-openmp => passed: passed=191,notpassed=0 (51.48 min)
8) intel-opt-openmp => passed: passed=191,notpassed=0 (42.71 min)
The most expensive tests for these builds were:
$ for build_dir in gnu-debug-serial gnu-opt-serial gnu-debug-openmp gnu-opt-openmp intel-debug-serial intel-opt-serial intel-debug-openmp intel-opt-openmp ; do ./print_expensive_tests.sh $build_dir/ctest.out ; done
***
*** gnu-debug-serial/ctest.out: 10 most expensive tests
***
191/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 ....... Passed 487.86 sec
162/191 Test #20: Tempus_DIRK_Combined_FSA_MPI_1 ................................... Passed 365.03 sec
160/191 Test #28: Tempus_IMEX_RK_Combined_FSA_MPI_1 ................................ Passed 283.76 sec
129/191 Test #21: Tempus_DIRK_Staggered_FSA_MPI_1 .................................. Passed 266.60 sec
142/191 Test #23: Tempus_DIRK_ASA_MPI_1 ............................................ Passed 252.24 sec
158/191 Test #31: Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 .................... Passed 241.93 sec
175/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ..... Passed 231.65 sec
176/191 Test #168: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 ....... Passed 178.27 sec
16/191 Test #4: Tempus_BackwardEuler_Combined_FSA_MPI_1 .......................... Passed 120.78 sec
17/191 Test #9: Tempus_BDF2_Combined_FSA_MPI_1 ................................... Passed 114.67 sec
***
*** gnu-opt-serial/ctest.out: 10 most expensive tests
***
162/191 Test #20: Tempus_DIRK_Combined_FSA_MPI_1 ................................... Passed 313.27 sec
160/191 Test #28: Tempus_IMEX_RK_Combined_FSA_MPI_1 ................................ Passed 293.16 sec
129/191 Test #21: Tempus_DIRK_Staggered_FSA_MPI_1 .................................. Passed 266.48 sec
131/191 Test #23: Tempus_DIRK_ASA_MPI_1 ............................................ Passed 266.43 sec
142/191 Test #31: Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 .................... Passed 248.29 sec
191/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ..... Passed 218.58 sec
28/191 Test #4: Tempus_BackwardEuler_Combined_FSA_MPI_1 .......................... Passed 140.25 sec
27/191 Test #9: Tempus_BDF2_Combined_FSA_MPI_1 ................................... Passed 131.86 sec
171/191 Test #160: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ............ Passed 87.50 sec
172/191 Test #164: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3 ..... Passed 86.35 sec
***
*** gnu-debug-openmp/ctest.out: 10 most expensive tests
***
191/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 ....... Passed 376.21 sec
190/191 Test #168: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 ....... Passed 95.15 sec
151/191 Test #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................ Passed 79.01 sec
177/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ..... Passed 63.11 sec
94/191 Test #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 ............................. Passed 60.12 sec
90/191 Test #3: Tempus_BackwardEuler_MPI_1 ....................................... Passed 58.20 sec
107/191 Test #20: Tempus_DIRK_Combined_FSA_MPI_1 ................................... Passed 53.13 sec
148/191 Test #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ................... Passed 50.30 sec
86/191 Test #21: Tempus_DIRK_Staggered_FSA_MPI_1 .................................. Passed 46.50 sec
34/191 Test #8: Tempus_BDF2_MPI_1 ................................................ Passed 45.16 sec
***
*** gnu-opt-openmp/ctest.out: 10 most expensive tests
***
191/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ..... Passed 59.29 sec
190/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 ....... Passed 33.59 sec
108/191 Test #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................ Passed 29.95 sec
64/191 Test #3: Tempus_BackwardEuler_MPI_1 ....................................... Passed 25.23 sec
56/191 Test #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 ............................. Passed 24.64 sec
40/191 Test #8: Tempus_BDF2_MPI_1 ................................................ Passed 22.73 sec
57/191 Test #20: Tempus_DIRK_Combined_FSA_MPI_1 ................................... Passed 15.85 sec
63/191 Test #21: Tempus_DIRK_Staggered_FSA_MPI_1 .................................. Passed 15.52 sec
172/191 Test #160: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ............ Passed 15.22 sec
105/191 Test #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ................... Passed 14.88 sec
***
*** intel-debug-serial/ctest.out: 10 most expensive tests
***
190/190 Test #168: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 ....... Passed 162.56 sec
154/190 Test #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................ Passed 86.03 sec
187/190 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ..... Passed 86.02 sec
103/190 Test #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 ............................. Passed 64.28 sec
98/190 Test #3: Tempus_BackwardEuler_MPI_1 ....................................... Passed 62.80 sec
150/190 Test #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ................... Passed 60.35 sec
95/190 Test #20: Tempus_DIRK_Combined_FSA_MPI_1 ................................... Passed 58.11 sec
90/190 Test #21: Tempus_DIRK_Staggered_FSA_MPI_1 .................................. Passed 55.28 sec
107/190 Test #24: Tempus_HHTAlpha_MPI_1 ............................................ Passed 51.83 sec
173/190 Test #160: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ............ Passed 49.14 sec
***
*** intel-opt-serial/ctest.out: 10 most expensive tests
***
191/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ..... Passed 77.14 sec
190/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 ....... Passed 59.31 sec
125/191 Test #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................ Passed 26.64 sec
172/191 Test #160: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ............ Passed 21.24 sec
70/191 Test #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 ............................. Passed 19.30 sec
54/191 Test #3: Tempus_BackwardEuler_MPI_1 ....................................... Passed 17.86 sec
173/191 Test #164: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3 ..... Passed 16.70 sec
69/191 Test #20: Tempus_DIRK_Combined_FSA_MPI_1 ................................... Passed 16.14 sec
109/191 Test #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ................... Passed 16.01 sec
55/191 Test #21: Tempus_DIRK_Staggered_FSA_MPI_1 .................................. Passed 14.68 sec
***
*** intel-debug-openmp/ctest.out: 10 most expensive tests
***
191/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 ....... Passed 464.57 sec
190/191 Test #168: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 ....... Passed 126.60 sec
155/191 Test #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................ Passed 87.61 sec
107/191 Test #3: Tempus_BackwardEuler_MPI_1 ....................................... Passed 66.78 sec
176/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ..... Passed 66.29 sec
111/191 Test #20: Tempus_DIRK_Combined_FSA_MPI_1 ................................... Passed 65.51 sec
152/191 Test #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ................... Passed 64.48 sec
97/191 Test #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 ............................. Passed 63.29 sec
85/191 Test #21: Tempus_DIRK_Staggered_FSA_MPI_1 .................................. Passed 53.22 sec
72/191 Test #8: Tempus_BDF2_MPI_1 ................................................ Passed 53.19 sec
***
*** intel-opt-openmp/ctest.out: 10 most expensive tests
***
191/191 Test #165: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ..... Passed 58.28 sec
190/191 Test #169: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 ....... Passed 35.93 sec
114/191 Test #15: Tempus_ExplicitRK_Staggered_FSA_MPI_1 ............................ Passed 24.27 sec
63/191 Test #14: Tempus_ExplicitRK_Combined_FSA_MPI_1 ............................. Passed 18.35 sec
62/191 Test #3: Tempus_BackwardEuler_MPI_1 ....................................... Passed 18.29 sec
73/191 Test #20: Tempus_DIRK_Combined_FSA_MPI_1 ................................... Passed 16.48 sec
72/191 Test #21: Tempus_DIRK_Staggered_FSA_MPI_1 .................................. Passed 16.07 sec
113/191 Test #32: Tempus_IMEX_RK_Partitioned_Staggered_FSA_MPI_1 ................... Passed 15.96 sec
38/191 Test #8: Tempus_BDF2_MPI_1 ................................................ Passed 15.84 sec
92/191 Test #31: Tempus_IMEX_RK_Partitioned_Combined_FSA_MPI_1 .................... Passed 13.38 sec
All tests were under 500 sec so hopefully that will take care of these timeouts on 'hansen' once and for all.
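The print_expensive_tests.sh script itself is not shown here. For readers who want to reproduce listings like the ones above, the following is a minimal sketch of how such a report could be generated from a saved ctest log. The function name and its argument handling are assumptions, and it only considers lines that end in "Passed <seconds> sec".

```shell
# Hypothetical stand-in for print_expensive_tests.sh (the real script is
# not shown in this PR).
# Usage: print_expensive_tests <ctest.out> [count]
print_expensive_tests() {
  file="$1"
  count="${2:-10}"
  echo "***"
  echo "*** $file: $count most expensive tests"
  echo "***"
  # Keep lines ending in "Passed <seconds> sec", prefix each with its time
  # as a sort key, sort numerically in descending order, keep the top
  # entries, then drop the sort key again.
  awk '$NF == "sec" && $(NF-2) == "Passed" { print $(NF-1) "\t" $0 }' "$file" \
    | sort -rn \
    | head -n "$count" \
    | cut -f2-
}
```

Calling `print_expensive_tests gnu-debug-serial/ctest.out` would then print the header followed by the ten slowest passing tests. Failed or timed-out tests are skipped by this sketch, so it is only suitable for runs where everything passed.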
Checklist
- My commit messages mention the appropriate GitHub issue numbers.
- My change requires a change to the documentation.
- I have updated the documentation accordingly.
- All new and existing tests passed.
- No new compiler warnings were introduced.