Anasazi_Tpetra_MVOPTester_MPI_4 failing in ATDM cuda 9 builds on waterman
Created by: fryeguy52
CC: @trilinos/anasazi, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe
Next Action Status
Downgrading from OpenMPI 3.1.0 to OpenMPI 2.1.2 fixed the problem, as it also fixed failing tests in other packages.
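To confirm which OpenMPI version a loaded environment actually picks up, something like the following could be run after sourcing the ATDM environment. This is only a sketch: `ompi_version` is a hypothetical helper (not part of Trilinos), and it assumes that `mpirun --version` prints a first line of the form `mpirun (Open MPI) 2.1.2`.

```shell
# Hypothetical helper: extract the version number from the first line of
# `mpirun --version` output, assumed to look like "mpirun (Open MPI) 2.1.2".
ompi_version() {
  sed -n '1s/.*(Open MPI) \([0-9][0-9.]*\).*/\1/p'
}

# Only query mpirun if it is actually on the PATH.
if command -v mpirun >/dev/null 2>&1; then
  mpirun --version 2>&1 | ompi_version
fi
```

If this prints 3.1.0 rather than 2.1.2, the environment is presumably still using the OpenMPI version associated with the failures.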
Description
As shown in this query, the test:
- Anasazi_Tpetra_MVOPTester_MPI_4
is failing in the builds:
- Trilinos-atdm-waterman-cuda-9.2-opt
- Trilinos-atdm-waterman-cuda-9.2-debug
Test output
The following tests FAILED:
7. MultiVector_int_longlong_double_OPTestLocal_UnitTest ...
Total Time: 7.62 sec
Summary: total = 8, run = 8, passed = 7, failed = 1
End Result: TEST FAILED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[50974,1],2]
Exit code: 1
Steps to Reproduce
One should be able to reproduce this failure on the machine waterman as described in:
More specifically, the commands given for the system waterman are provided at:
The exact commands to reproduce this issue should be:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-debug
$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Anasazi=ON \
  $TRILINOS_DIR
$ make NP=20
$ bsub -x -Is -n 20 ctest -j20
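The last command runs every enabled Anasazi test. To iterate on just the failing one, the run can likely be narrowed with ctest's `-R` regex filter and `--output-on-failure`. The sketch below only prints the resulting command. The variable names are illustrative, and the `bsub` options are simply copied from the command above.

```shell
# Name of the failing test, taken from this report.
test_name="Anasazi_Tpetra_MVOPTester_MPI_4"

# Anchor the regex with ^ and $ so that only this exact test is selected,
# and ask ctest to show the test's output when it fails.
ctest_args="-R ^${test_name}\$ --output-on-failure"

# Print the command first; drop the 'echo' to actually submit it on waterman.
echo bsub -x -Is -n 20 ctest $ctest_args
```

Printing the command before running it makes it easy to check the regex before taking up an allocation.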