Random test failure for KokkosAlgorithms_UnitTest_MPI_1 in ATDM build?
Created by: bartlettroscoe
CC: @trilinos/kokkos, @fryeguy52, @kddevin (Trilinos Data Services Product Lead)
Next Action Status
The test KokkosAlgorithms_UnitTest_MPI_1 is expected to fail randomly on occasion, as a compromise between test runtime and failing randomly more often. Next: Watch for further failures of this test and decide how to handle it longer-term (such as treating it as "expected may fail" as part of #2933) ...
Description
The test KokkosAlgorithms_UnitTest_MPI_1 appears to have failed randomly in the ATDM Trilinos build Trilinos-atdm-hansen-shiller-intel-opt-serial, shown here, with the following output:
[ RUN ] serial.Random_XorShift1024
Test Seed:1533901314858176575
Test Scalar=int
-- Testing randomness properties
Pass: 1 1 -2.75867e-05 -4.42521e-05 0.000158617 || 0.000502704
-- Testing 1-D histogram
Density 1D: 7.26597e-05 0.0178458 0.00132233 || 0.051031 2035 2407 || 2159.68 2198.22 || 18.2798 -0.159026
-- Testing 3-D histogram
Density 3D: 7.26597e-05 -0.00698725 -3.92889e-05 || 0.051031 1e+64 -1e+64
Test Scalar=unsigned int
-- Testing randomness properties
Pass: 1 1 1.68802e-05 9.3101e-05 -8.45415e-05 || 0.000502704
-- Testing 1-D histogram
Density 1D: 7.26597e-05 -0.00149644 -0.000557499 || 0.051031 2025 2373 || 2201.52 2198.22 || -7.70686 -0.159026
-- Testing 3-D histogram
Density 3D: 7.26597e-05 -0.00456695 -0.00055943 || 0.051031 1e+64 -1e+64
Test Scalar=int64_t
-- Testing randomness properties
Pass: 1 0 1.89669e-05 0.000837141 -0.000265228 || 0.000502704
-- Testing 1-D histogram
Density 1D: 7.26597e-05 -0.0166374 0.00138102 || 0.051031 2005 2386 || 2235.41 2198.22 || 19.0912 -0.159026
-- Testing 3-D histogram
Density 3D: 7.26597e-05 0.000533505 -0.000342069 || 0.051031 1e+64 -1e+64
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-intel-opt-serial/SRC_AND_BUILD/Trilinos/packages/kokkos/algorithms/unit_tests/TestRandom.hpp:426: Failure
Value of: 1
Expected: test_int64.pass_var
Which is: 0
[ FAILED ] serial.Random_XorShift1024 (1840 ms)
[ RUN ] serial.SortUnsigned
[ OK ] serial.SortUnsigned (2699 ms)
[----------] 3 tests from serial (8593 ms total)
[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (8593 ms total)
[ PASSED ] 2 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] serial.Random_XorShift1024
1 FAILED TEST
There were no updates to Kokkos from the previous day for this build, as shown here. Since nothing changed in the code under test, this looks like a random failure of some kind.
Steps to reproduce
One should be able to reproduce this build and run this test on either 'hansen' or 'shiller' as described at:
More specifically, one can follow the instructions at:
and use the build name intel-opt-serial and enable the package Kokkos as:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh intel-opt-serial
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Kokkos=ON \
$TRILINOS_DIR
$ make NP=16
$ srun ctest -j16
But given that this test looks to have randomly failed, it might be hard to reproduce.