Disable some individual Kokkos and KokkosKernels tests on a few more full debug builds and disable the Belos_pseudo_stochastic_pcg_hb_[0,1]_MPI_4 tests on a few more platforms
Created by: bartlettroscoe
CC: @fryeguy52
Description
This PR branch contains commits to disable a few of the individual Kokkos and KokkosKernels unit tests for some full debug builds. See the careful analysis and identification of these individual unit tests in https://github.com/trilinos/Trilinos/issues/2827#issuecomment-397688803 (see #2827 (closed)).
This PR branch also contains commits to disable the tests Belos_pseudo_stochastic_pcg_hb_[0,1]_MPI_4 in a few other builds on white/ride where they have been seen to fail randomly by hitting the 100 iteration max (see https://github.com/trilinos/Trilinos/issues/2920#issuecomment-397834016 and https://github.com/trilinos/Trilinos/issues/2920#issuecomment-398058965).
I also updated some documentation in the cmake/std/atdm/README.md file about using checkin-test-atdm.sh on hansen/shiller.
Motivation and Context
These tests fail fairly regularly in automated ATDM testing. Going forward, we can't tolerate known randomly failing tests. Such tests will destroy the automated processes that update Trilinos for the ATDM APP codes.
How Has This Been Tested?
I ran the builds and tests manually on 'shiller' and 'ride' and verified that with these changes all of the Kokkos and KokkosKernels tests in the debug builds run far under the 600 sec timeout (although one test took as long as 447 sec).
See details of the testing below.
DETAILED TEST RESULTS:
A) Testing on 'shiller'
$ ./checkin-test-atdm.sh \
gnu-debug-serial gnu-debug-openmp intel-debug-serial intel-debug-openmp \
cuda-8.0-debug cuda-9.0-debug \
--enable-packages=Kokkos,KokkosKernels --configure
$ /usr/bin/srun ./checkin-test-atdm.sh \
gnu-debug-serial gnu-debug-openmp intel-debug-serial intel-debug-openmp \
cuda-8.0-debug cuda-9.0-debug \
--enable-packages=Kokkos,KokkosKernels --build --test
returned
PASSED (NOT READY TO PUSH): Trilinos: shiller02
Sat Jun 16 14:00:58 MDT 2018
Enabled Packages: Kokkos, KokkosKernels
Build test results:
-------------------
0) MPI_RELEASE_DEBUG_SHARED_PT_OPENMP => Test case MPI_RELEASE_DEBUG_SHARED_PT_OPENMP was not run! => Does not affect push readiness! (-1.00 min)
1) gnu-debug-serial => passed: passed=28,notpassed=0 (3.41 min)
2) gnu-debug-openmp => passed: passed=35,notpassed=0 (5.22 min)
3) intel-debug-serial => passed: passed=28,notpassed=0 (4.18 min)
4) intel-debug-openmp => passed: passed=35,notpassed=0 (4.75 min)
5) cuda-8.0-debug => passed: passed=35,notpassed=0 (5.39 min)
6) cuda-9.0-debug => passed: passed=35,notpassed=0 (4.58 min)
Using the script:
$ cat print_expensive_tests.sh
#!/bin/bash
ctest_out_file=$1
n_most_expensive_tests=10
echo
echo "***"
echo "*** $ctest_out_file: $n_most_expensive_tests most expensive tests"
echo "***"
echo
cat "$ctest_out_file" | grep "Test" | grep "Passed" | sort -rn -k 7 | head -n "$n_most_expensive_tests"
Looking for expensive tests with:
$ for build_name in gnu-debug-serial gnu-debug-openmp intel-debug-serial intel-debug-openm cuda-8.0-debug cuda-9.0-debug ; do ./print_expensive_tests.sh $build_name/ctest.out ; done | less
***
*** gnu-debug-serial/ctest.out: 10 most expensive tests
***
28/28 Test #26: KokkosKernels_sparse_serial_MPI_1 ................ Passed 204.28 sec
27/28 Test #23: KokkosContainers_UnitTest_Serial_MPI_1 ........... Passed 203.75 sec
26/28 Test #27: KokkosKernels_graph_serial_MPI_1 ................. Passed 177.73 sec
25/28 Test #25: KokkosKernels_blas_serial_MPI_1 .................. Passed 105.65 sec
24/28 Test #1: KokkosCore_UnitTest_Serial_MPI_1 ................. Passed 77.93 sec
23/28 Test #24: KokkosAlgorithms_UnitTest_MPI_1 .................. Passed 51.48 sec
16/28 Test #4: KokkosCore_UnitTest_PushFinalizeHook_terminate ... Passed 11.30 sec
7/28 Test #11: KokkosCore_UnitTest_DefaultInit_7_MPI_1 .......... Passed 10.11 sec
8/28 Test #10: KokkosCore_UnitTest_DefaultInit_6_MPI_1 .......... Passed 10.10 sec
9/28 Test #12: KokkosCore_UnitTest_DefaultInit_8_MPI_1 .......... Passed 10.09 sec
***
*** gnu-debug-openmp/ctest.out: 10 most expensive tests
***
35/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................ Passed 313.07 sec
34/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ........... Passed 236.03 sec
33/35 Test #29: KokkosKernels_sparse_openmp_MPI_1 ................ Passed 225.57 sec
32/35 Test #34: KokkosKernels_graph_serial_MPI_1 ................. Passed 199.32 sec
31/35 Test #26: KokkosContainers_UnitTest_OpenMP_MPI_1 ........... Passed 139.10 sec
30/35 Test #30: KokkosKernels_graph_openmp_MPI_1 ................. Passed 121.48 sec
29/35 Test #32: KokkosKernels_blas_serial_MPI_1 .................. Passed 114.06 sec
28/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 .................. Passed 98.52 sec
27/35 Test #1: KokkosCore_UnitTest_Serial_MPI_1 ................. Passed 83.38 sec
26/35 Test #2: KokkosCore_UnitTest_OpenMP_MPI_1 ................. Passed 73.56 sec
***
*** intel-debug-serial/ctest.out: 10 most expensive tests
***
28/28 Test #26: KokkosKernels_sparse_serial_MPI_1 ................ Passed 250.78 sec
27/28 Test #23: KokkosContainers_UnitTest_Serial_MPI_1 ........... Passed 226.06 sec
26/28 Test #27: KokkosKernels_graph_serial_MPI_1 ................. Passed 212.57 sec
25/28 Test #25: KokkosKernels_blas_serial_MPI_1 .................. Passed 114.90 sec
24/28 Test #1: KokkosCore_UnitTest_Serial_MPI_1 ................. Passed 90.55 sec
23/28 Test #24: KokkosAlgorithms_UnitTest_MPI_1 .................. Passed 44.57 sec
20/28 Test #4: KokkosCore_UnitTest_PushFinalizeHook_terminate ... Passed 4.39 sec
9/28 Test #3: KokkosCore_UnitTest_PushFinalizeHook_MPI_1 ....... Passed 2.98 sec
4/28 Test #15: KokkosCore_UnitTest_DefaultInit_11_MPI_1 ......... Passed 2.43 sec
3/28 Test #11: KokkosCore_UnitTest_DefaultInit_7_MPI_1 .......... Passed 2.43 sec
***
*** intel-debug-openm/ctest.out: 10 most expensive tests
***
cat: intel-debug-openm/ctest.out: No such file or directory
***
*** cuda-8.0-debug/ctest.out: 10 most expensive tests
***
35/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................ Passed 323.10 sec
34/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ........... Passed 264.12 sec
33/35 Test #34: KokkosKernels_graph_serial_MPI_1 ................. Passed 214.00 sec
32/35 Test #29: KokkosKernels_sparse_cuda_MPI_1 .................. Passed 192.78 sec
5/35 Test #2: KokkosCore_UnitTest_Cuda_MPI_1 ................... Passed 118.32 sec
4/35 Test #32: KokkosKernels_blas_serial_MPI_1 .................. Passed 116.83 sec
2/35 Test #1: KokkosCore_UnitTest_Serial_MPI_1 ................. Passed 84.91 sec
9/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 .................. Passed 61.74 sec
1/35 Test #30: KokkosKernels_graph_cuda_MPI_1 ................... Passed 61.74 sec
14/35 Test #28: KokkosKernels_blas_cuda_MPI_1 .................... Passed 31.59 sec
***
*** cuda-9.0-debug/ctest.out: 10 most expensive tests
***
35/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................ Passed 274.38 sec
34/35 Test #34: KokkosKernels_graph_serial_MPI_1 ................. Passed 268.87 sec
33/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ........... Passed 238.93 sec
32/35 Test #2: KokkosCore_UnitTest_Cuda_MPI_1 ................... Passed 148.07 sec
31/35 Test #29: KokkosKernels_sparse_cuda_MPI_1 .................. Passed 141.35 sec
7/35 Test #32: KokkosKernels_blas_serial_MPI_1 .................. Passed 115.60 sec
2/35 Test #1: KokkosCore_UnitTest_Serial_MPI_1 ................. Passed 84.65 sec
1/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 .................. Passed 58.47 sec
3/35 Test #28: KokkosKernels_blas_cuda_MPI_1 .................... Passed 44.03 sec
4/35 Test #30: KokkosKernels_graph_cuda_MPI_1 ................... Passed 24.78 sec
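All of those look well under 600 sec. (The intel-debug-openmp timings are missing because the build name was mistyped as intel-debug-openm in the loop above.)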
B) Testing on 'ride':
$ bsub -x -Is -q rhel7F -n 16 ctest -j16 \
./checkin-test-atdm.sh gnu-debug-openmp cuda-debug \
--enable-packages=Kokkos,KokkosKernels --local-do-all
returned:
PASSED (NOT READY TO PUSH): Trilinos: ride11
Sat Jun 16 13:43:17 MDT 2018
Enabled Packages: Kokkos, KokkosKernels
Build test results:
-------------------
0) MPI_RELEASE_DEBUG_SHARED_PT_OPENMP => Test case MPI_RELEASE_DEBUG_SHARED_PT_OPENMP was not run! => Does not affect push readiness! (-1.00 min)
1) gnu-debug-openmp => passed: passed=35,notpassed=0 (7.66 min)
2) cuda-debug => passed: passed=35,notpassed=0 (5.94 min)
Looking for expensive tests with:
$ for build_name in gnu-debug-openmp cuda-debug ; do ./print_expensive_tests.sh $build_name/ctest.out ; done
***
*** gnu-debug-openmp/ctest.out: 10 most expensive tests
***
35/35 Test #29: KokkosKernels_sparse_openmp_MPI_1 ................ Passed 447.45 sec
34/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................ Passed 366.10 sec
33/35 Test #26: KokkosContainers_UnitTest_OpenMP_MPI_1 ........... Passed 354.53 sec
32/35 Test #30: KokkosKernels_graph_openmp_MPI_1 ................. Passed 347.01 sec
31/35 Test #34: KokkosKernels_graph_serial_MPI_1 ................. Passed 323.47 sec
30/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ........... Passed 286.63 sec
29/35 Test #32: KokkosKernels_blas_serial_MPI_1 .................. Passed 274.02 sec
28/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 .................. Passed 236.46 sec
27/35 Test #1: KokkosCore_UnitTest_Serial_MPI_1 ................. Passed 200.46 sec
26/35 Test #2: KokkosCore_UnitTest_OpenMP_MPI_1 ................. Passed 196.63 sec
***
*** cuda-debug/ctest.out: 10 most expensive tests
***
35/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................ Passed 322.17 sec
34/35 Test #34: KokkosKernels_graph_serial_MPI_1 ................. Passed 275.03 sec
33/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ........... Passed 267.83 sec
32/35 Test #2: KokkosCore_UnitTest_Cuda_MPI_1 ................... Passed 259.72 sec
31/35 Test #29: KokkosKernels_sparse_cuda_MPI_1 .................. Passed 240.50 sec
3/35 Test #1: KokkosCore_UnitTest_Serial_MPI_1 ................. Passed 156.26 sec
2/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 .................. Passed 136.87 sec
1/35 Test #32: KokkosKernels_blas_serial_MPI_1 .................. Passed 136.85 sec
30/35 Test #28: KokkosKernels_blas_cuda_MPI_1 .................... Passed 49.27 sec
14/35 Test #30: KokkosKernels_graph_cuda_MPI_1 ................... Passed 39.16 sec
The most expensive test was 447 sec but that is pretty far below 600 sec so hopefully we will be okay.
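The headroom check above was done by eye. As a rough sketch only (not part of this PR), the same check could be scripted. The function names `max_passed_time` and `check_headroom` are hypothetical, and the sketch assumes ctest output lines of the form shown above, with the elapsed time directly after the word `Passed`.

```bash
# Hypothetical helpers (not part of this PR) for checking timeout headroom.

# Print the largest "Passed <time> sec" value found in a ctest output file.
max_passed_time() {
  awk '
    {
      # The elapsed time is the field right after the word "Passed".
      for (i = 1; i < NF; i++) {
        if ($i == "Passed" && $(i + 1) + 0 > max) max = $(i + 1) + 0
      }
    }
    END { printf "%.2f\n", max }
  ' "$1"
}

# Return 0 if the slowest passed test is at or below limit_sec, else 1.
check_headroom() {
  local ctest_out_file=$1 limit_sec=$2
  local slowest
  slowest=$(max_passed_time "$ctest_out_file")
  echo "$ctest_out_file: slowest passed test took $slowest sec (limit $limit_sec sec)"
  awk -v t="$slowest" -v lim="$limit_sec" 'BEGIN { exit !(t <= lim) }'
}
```

For example, `check_headroom gnu-debug-openmp/ctest.out 600` would succeed for the 'ride' results above, while a stricter limit such as 400 would flag the 447.45 sec test.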
Checklist
- My commit messages mention the appropriate GitHub issue numbers.
- All new and existing tests passed.