Disable some individual Kokkos and KokkosKernels tests on a few more full debug builds and disable the Belos_pseudo_stochastic_pcg_hb_[0,1]_MPI_4 tests on a few more platforms

Created by: bartlettroscoe

CC: @fryeguy52

Description

This PR branch contains commits to disable a few of the individual Kokkos and KokkosKernels unit tests for some full debug builds (see #2827 and the careful analysis and identification of these individual unit tests in https://github.com/trilinos/Trilinos/issues/2827#issuecomment-397688803).

This PR branch also contains commits to disable the tests Belos_pseudo_stochastic_pcg_hb_[0,1]_MPI_4 in a few other builds on white/ride where they have been seen to fail randomly by hitting the 100 iteration max (see https://github.com/trilinos/Trilinos/issues/2920#issuecomment-397834016 and https://github.com/trilinos/Trilinos/issues/2920#issuecomment-398058965).

I also updated some documentation in the cmake/std/atdm/README.md file about using checkin-test-atdm.sh on hansen/shiller.

Motivation and Context

These tests fail fairly regularly in automated ATDM testing. Going forward, we can't tolerate known randomly failing tests. Such tests will destroy automated processes to update Trilinos for the ATDM APP codes.

How Has This Been Tested?

I ran the builds and tests manually on 'shiller' and 'ride' and verified that, with these changes, all of the Kokkos and KokkosKernels tests in the debug builds ran well under the 600 sec timeout (although one test took as long as 447 sec).

See details of the testing below.

DETAILED TEST RESULTS:

A) Testing on 'shiller'

$ ./checkin-test-atdm.sh \
  gnu-debug-serial gnu-debug-openmp intel-debug-serial intel-debug-openmp \
  cuda-8.0-debug cuda-9.0-debug \
  --enable-packages=Kokkos,KokkosKernels --configure
$ /usr/bin/srun ./checkin-test-atdm.sh \
  gnu-debug-serial gnu-debug-openmp intel-debug-serial intel-debug-openmp \
  cuda-8.0-debug cuda-9.0-debug \
  --enable-packages=Kokkos,KokkosKernels --build --test

returned

PASSED (NOT READY TO PUSH): Trilinos: shiller02

Sat Jun 16 14:00:58 MDT 2018

Enabled Packages: Kokkos, KokkosKernels

Build test results:
-------------------
0) MPI_RELEASE_DEBUG_SHARED_PT_OPENMP => Test case MPI_RELEASE_DEBUG_SHARED_PT_OPENMP was not run! => Does not affect push readiness! (-1.00 min)
1) gnu-debug-serial => passed: passed=28,notpassed=0 (3.41 min)
2) gnu-debug-openmp => passed: passed=35,notpassed=0 (5.22 min)
3) intel-debug-serial => passed: passed=28,notpassed=0 (4.18 min)
4) intel-debug-openmp => passed: passed=35,notpassed=0 (4.75 min)
5) cuda-8.0-debug => passed: passed=35,notpassed=0 (5.39 min)
6) cuda-9.0-debug => passed: passed=35,notpassed=0 (4.58 min)

Using the script:

$ cat print_expensive_tests.sh
#!/bin/bash

ctest_out_file=$1
n_most_expensive_tests=10

echo
echo "***"
echo "*** $ctest_out_file: $n_most_expensive_tests most expensive tests"
echo "***"
echo

cat $ctest_out_file | grep "Test" | grep "Passed" | sort -rn -k 7 | head -n $n_most_expensive_tests

Looking for expensive tests with:

$ for build_name in gnu-debug-serial gnu-debug-openmp intel-debug-serial intel-debug-openm cuda-8.0-debug cuda-9.0-debug ; do ./print_expensive_tests.sh $build_name/ctest.out ; done | less

***
*** gnu-debug-serial/ctest.out: 10 most expensive tests
***

28/28 Test #26: KokkosKernels_sparse_serial_MPI_1 ................   Passed  204.28 sec
27/28 Test #23: KokkosContainers_UnitTest_Serial_MPI_1 ...........   Passed  203.75 sec
26/28 Test #27: KokkosKernels_graph_serial_MPI_1 .................   Passed  177.73 sec
25/28 Test #25: KokkosKernels_blas_serial_MPI_1 ..................   Passed  105.65 sec
24/28 Test  #1: KokkosCore_UnitTest_Serial_MPI_1 .................   Passed   77.93 sec
23/28 Test #24: KokkosAlgorithms_UnitTest_MPI_1 ..................   Passed   51.48 sec
16/28 Test  #4: KokkosCore_UnitTest_PushFinalizeHook_terminate ...   Passed   11.30 sec
 7/28 Test #11: KokkosCore_UnitTest_DefaultInit_7_MPI_1 ..........   Passed   10.11 sec
 8/28 Test #10: KokkosCore_UnitTest_DefaultInit_6_MPI_1 ..........   Passed   10.10 sec
 9/28 Test #12: KokkosCore_UnitTest_DefaultInit_8_MPI_1 ..........   Passed   10.09 sec

***
*** gnu-debug-openmp/ctest.out: 10 most expensive tests
***

35/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................   Passed  313.07 sec
34/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ...........   Passed  236.03 sec
33/35 Test #29: KokkosKernels_sparse_openmp_MPI_1 ................   Passed  225.57 sec
32/35 Test #34: KokkosKernels_graph_serial_MPI_1 .................   Passed  199.32 sec
31/35 Test #26: KokkosContainers_UnitTest_OpenMP_MPI_1 ...........   Passed  139.10 sec
30/35 Test #30: KokkosKernels_graph_openmp_MPI_1 .................   Passed  121.48 sec
29/35 Test #32: KokkosKernels_blas_serial_MPI_1 ..................   Passed  114.06 sec
28/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 ..................   Passed   98.52 sec
27/35 Test  #1: KokkosCore_UnitTest_Serial_MPI_1 .................   Passed   83.38 sec
26/35 Test  #2: KokkosCore_UnitTest_OpenMP_MPI_1 .................   Passed   73.56 sec

***
*** intel-debug-serial/ctest.out: 10 most expensive tests
***

28/28 Test #26: KokkosKernels_sparse_serial_MPI_1 ................   Passed  250.78 sec
27/28 Test #23: KokkosContainers_UnitTest_Serial_MPI_1 ...........   Passed  226.06 sec
26/28 Test #27: KokkosKernels_graph_serial_MPI_1 .................   Passed  212.57 sec
25/28 Test #25: KokkosKernels_blas_serial_MPI_1 ..................   Passed  114.90 sec
24/28 Test  #1: KokkosCore_UnitTest_Serial_MPI_1 .................   Passed   90.55 sec
23/28 Test #24: KokkosAlgorithms_UnitTest_MPI_1 ..................   Passed   44.57 sec
20/28 Test  #4: KokkosCore_UnitTest_PushFinalizeHook_terminate ...   Passed    4.39 sec
 9/28 Test  #3: KokkosCore_UnitTest_PushFinalizeHook_MPI_1 .......   Passed    2.98 sec
 4/28 Test #15: KokkosCore_UnitTest_DefaultInit_11_MPI_1 .........   Passed    2.43 sec
 3/28 Test #11: KokkosCore_UnitTest_DefaultInit_7_MPI_1 ..........   Passed    2.43 sec

***
*** intel-debug-openm/ctest.out: 10 most expensive tests
***

cat: intel-debug-openm/ctest.out: No such file or directory

***
*** cuda-8.0-debug/ctest.out: 10 most expensive tests
***

35/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................   Passed  323.10 sec
34/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ...........   Passed  264.12 sec
33/35 Test #34: KokkosKernels_graph_serial_MPI_1 .................   Passed  214.00 sec
32/35 Test #29: KokkosKernels_sparse_cuda_MPI_1 ..................   Passed  192.78 sec
 5/35 Test  #2: KokkosCore_UnitTest_Cuda_MPI_1 ...................   Passed  118.32 sec
 4/35 Test #32: KokkosKernels_blas_serial_MPI_1 ..................   Passed  116.83 sec
 2/35 Test  #1: KokkosCore_UnitTest_Serial_MPI_1 .................   Passed   84.91 sec
 9/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 ..................   Passed   61.74 sec
 1/35 Test #30: KokkosKernels_graph_cuda_MPI_1 ...................   Passed   61.74 sec
14/35 Test #28: KokkosKernels_blas_cuda_MPI_1 ....................   Passed   31.59 sec

***
*** cuda-9.0-debug/ctest.out: 10 most expensive tests
***

35/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................   Passed  274.38 sec
34/35 Test #34: KokkosKernels_graph_serial_MPI_1 .................   Passed  268.87 sec
33/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ...........   Passed  238.93 sec
32/35 Test  #2: KokkosCore_UnitTest_Cuda_MPI_1 ...................   Passed  148.07 sec
31/35 Test #29: KokkosKernels_sparse_cuda_MPI_1 ..................   Passed  141.35 sec
 7/35 Test #32: KokkosKernels_blas_serial_MPI_1 ..................   Passed  115.60 sec
 2/35 Test  #1: KokkosCore_UnitTest_Serial_MPI_1 .................   Passed   84.65 sec
 1/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 ..................   Passed   58.47 sec
 3/35 Test #28: KokkosKernels_blas_cuda_MPI_1 ....................   Passed   44.03 sec
 4/35 Test #30: KokkosKernels_graph_cuda_MPI_1 ...................   Passed   24.78 sec

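All of those look well under 600 sec. (The intel-debug-openmp results are missing above because the build name was mistyped as intel-debug-openm in the loop.)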

B) Testing on 'ride'

$ bsub -x -Is -q rhel7F -n 16 \
  ./checkin-test-atdm.sh gnu-debug-openmp cuda-debug \
  --enable-packages=Kokkos,KokkosKernels --local-do-all

returned:

PASSED (NOT READY TO PUSH): Trilinos: ride11

Sat Jun 16 13:43:17 MDT 2018

Enabled Packages: Kokkos, KokkosKernels

Build test results:
-------------------
0) MPI_RELEASE_DEBUG_SHARED_PT_OPENMP => Test case MPI_RELEASE_DEBUG_SHARED_PT_OPENMP was not run! => Does not affect push readiness! (-1.00 min)
1) gnu-debug-openmp => passed: passed=35,notpassed=0 (7.66 min)
2) cuda-debug => passed: passed=35,notpassed=0 (5.94 min)

Looking for expensive tests with:

$ for build_name in gnu-debug-openmp cuda-debug ; do ./print_expensive_tests.sh $build_name/ctest.out ; done

***
*** gnu-debug-openmp/ctest.out: 10 most expensive tests
***

35/35 Test #29: KokkosKernels_sparse_openmp_MPI_1 ................   Passed  447.45 sec
34/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................   Passed  366.10 sec
33/35 Test #26: KokkosContainers_UnitTest_OpenMP_MPI_1 ...........   Passed  354.53 sec
32/35 Test #30: KokkosKernels_graph_openmp_MPI_1 .................   Passed  347.01 sec
31/35 Test #34: KokkosKernels_graph_serial_MPI_1 .................   Passed  323.47 sec
30/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ...........   Passed  286.63 sec
29/35 Test #32: KokkosKernels_blas_serial_MPI_1 ..................   Passed  274.02 sec
28/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 ..................   Passed  236.46 sec
27/35 Test  #1: KokkosCore_UnitTest_Serial_MPI_1 .................   Passed  200.46 sec
26/35 Test  #2: KokkosCore_UnitTest_OpenMP_MPI_1 .................   Passed  196.63 sec

***
*** cuda-debug/ctest.out: 10 most expensive tests
***

35/35 Test #33: KokkosKernels_sparse_serial_MPI_1 ................   Passed  322.17 sec
34/35 Test #34: KokkosKernels_graph_serial_MPI_1 .................   Passed  275.03 sec
33/35 Test #25: KokkosContainers_UnitTest_Serial_MPI_1 ...........   Passed  267.83 sec
32/35 Test  #2: KokkosCore_UnitTest_Cuda_MPI_1 ...................   Passed  259.72 sec
31/35 Test #29: KokkosKernels_sparse_cuda_MPI_1 ..................   Passed  240.50 sec
 3/35 Test  #1: KokkosCore_UnitTest_Serial_MPI_1 .................   Passed  156.26 sec
 2/35 Test #27: KokkosAlgorithms_UnitTest_MPI_1 ..................   Passed  136.87 sec
 1/35 Test #32: KokkosKernels_blas_serial_MPI_1 ..................   Passed  136.85 sec
30/35 Test #28: KokkosKernels_blas_cuda_MPI_1 ....................   Passed   49.27 sec
14/35 Test #30: KokkosKernels_graph_cuda_MPI_1 ...................   Passed   39.16 sec

The most expensive test was 447 sec but that is pretty far below 600 sec so hopefully we will be okay.
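The headroom check above was done by eye. As a minimal sketch of how it could be scripted, the function below lists passed tests whose runtime exceeds a chosen percentage of the timeout. The function name `flag_slow_tests` and the 70% default threshold are hypothetical and not part of this PR; the 600 sec default is the timeout mentioned above, and the parsing assumes `ctest.out` lines have the same layout as the listings above.

```shell
#!/bin/bash
# Sketch only: flag passed tests that use more than a given percentage
# of the test timeout. Assumes lines of the form:
#   35/35 Test #29: <test_name> ........   Passed  447.45 sec
#
# Usage (hypothetical helper, not part of checkin-test-atdm.sh):
#   flag_slow_tests gnu-debug-openmp/ctest.out 600 70

flag_slow_tests() {
  local ctest_out_file=$1
  local timeout_sec=${2:-600}   # timeout quoted in this PR
  local warn_percent=${3:-70}   # assumed threshold, adjust as needed

  awk -v timeout="$timeout_sec" -v pct="$warn_percent" '
    # Only consider lines that end in "Passed <time> sec".
    $(NF) == "sec" && $(NF-2) == "Passed" {
      t = $(NF-1) + 0           # runtime in seconds
      if (t > timeout * pct / 100) {
        # $4 is the test name in the layout shown above.
        printf "%s %.2f sec (%.0f%% of timeout)\n", $4, t, 100 * t / timeout
      }
    }
  ' "$ctest_out_file"
}
```

An empty result means every passed test ran below the chosen fraction of the timeout. Tests that timed out or failed are not matched by this filter and would still need to be checked in the ctest summary.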

Checklist

  • My commit messages mention the appropriate GitHub issue numbers.
  • All new and existing tests passed.
