Ifpack2_unit_tests_MPI_4 randomly failing on ATDM waterman build
Created by: fryeguy52
CC: @trilinos/ifpack2, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
Next Action Status
Description
As shown in this query, the test:
- Ifpack2_unit_tests_MPI_4
is randomly failing in the build:
- Trilinos-atdm-waterman-cuda-9.2-opt
It has failed roughly 6 times in the last month. Here are some examples of the output when it fails:
Error, relErr(Y.get1dView ()[9932],Z.get1dView ()[9932]) = relErr(29832,0) = 1 <= tol = 2.22045e-12: failed!
p=0 | The following tests FAILED:
p=0 | 48. Ifpack2OverlappingRowMatrix_default_scalar_type_default_local_ordinal_type_default_global_ordinal_type_Test0_UnitTest ...
p=0 |
p=0 | Total Time: 6.49 sec
p=0 |
p=1 | Summary: total = 82, run = 82, passed = 81, failed = 1
p=1 |
p=1 | End Result: TEST FAILED
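For reading the first line of that output: the sketch below recomputes the reported value under the assumption that the relative error is |a-b| / max(|a|,|b|). This definition and the helper name `rel_err` are assumptions for illustration and may not match the exact formula used by the test harness (which may, for instance, add a small number to the denominator).

```shell
# Hypothetical helper (not part of Trilinos): relative error defined here as
# |a - b| / max(|a|, |b|), with 0 returned when both inputs are 0.
rel_err() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    d  = a - b; if (d < 0) d = -d     # |a - b|
    ma = (a < 0) ? -a : a             # |a|
    mb = (b < 0) ? -b : b             # |b|
    m  = (ma > mb) ? ma : mb          # max(|a|, |b|)
    print (m == 0) ? 0 : d / m
  }'
}

# Values taken from the failing comparison at index 9932.
rel_err 29832 0
```

Under this definition, any nonzero value compared against 0 gives a relative error of 1, so the check fails no matter how loose the tolerance is. The failure here is a zero entry on one side, not a small floating-point discrepancy.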
Current Status on CDash
Steps to Reproduce
One should be able to reproduce the build on waterman where this test is randomly failing as described in:
More specifically, the commands given for waterman are provided at:
The exact commands to reproduce this issue should be:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-waterman-cuda-9.2-opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Ifpack2=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -n 20 ctest -j20
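Because the failure is intermittent, a single `ctest` run may well pass. A minimal sketch of a retry wrapper follows; the name `run_until_fail` is hypothetical and not part of the ATDM scripts. It reruns a command until the first failure or until a maximum number of attempts is reached.

```shell
# Hypothetical helper: run a command up to MAX times and stop at the first failure.
# Returns 0 if every attempt passed, 1 as soon as one attempt fails.
run_until_fail() {
  max="$1"; shift
  i=1
  while [ "$i" -le "$max" ]; do
    if ! "$@"; then
      echo "failed on attempt $i of $max"
      return 1
    fi
    i=$((i + 1))
  done
  echo "passed $max of $max attempts"
  return 0
}
```

Assuming a shell on a node where `ctest` can launch the MPI tests, it could be used as `run_until_fail 20 ctest -R Ifpack2_unit_tests_MPI_4 --output-on-failure`. CTest also documents a `--repeat-until-fail <n>` option that serves the same purpose; whether it is available depends on the CMake version in the loaded environment.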