KokkosCore test failing on new cuda 9.2 ATDM build on white/ride
Created by: fryeguy52
CC: @trilinos/kokkos, @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe
Next Action Status
Caused by the upgraded SLURM/MPI correctly implementing --mca orte_abort_on_non_zero_status 0 for srun on white/ride. PR #3292, merged on 8/13/2018, removes --mca orte_abort_on_non_zero_status 0, and the test passed on 8/14/2018.
Description
As shown in this query, the test:
- KokkosCore_UnitTest_PushFinalizeHook_terminate
is failing in the builds:
- Trilinos-atdm-white-ride-cuda-9.2-opt
- Trilinos-atdm-white-ride-cuda-9.2-debug
In both builds the test fails because it times out.
Steps to Reproduce
One should be able to reproduce this failure on the machine white as described in:
More specifically, the commands given for the system white are provided at:
The exact commands to reproduce this issue should be:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-opt
$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Kokkos=ON \
  $TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
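To iterate faster, it may help to rerun only the failing test rather than the full suite. The sketch below just assembles and prints such a command; it does not submit anything. It assumes the same bsub options as the command above and uses the standard ctest options -R (select tests by regex) and --output-on-failure. The variable names failing_test and rerun_cmd are illustrative only and not part of the ATDM scripts.

```shell
#!/bin/sh
# Sketch: build (but do not run) a ctest command restricted to the failing test.
# The bsub options are copied from the reproduction command in this issue;
# adjust the queue and core count to match your allocation.
failing_test="KokkosCore_UnitTest_PushFinalizeHook_terminate"

# -R selects tests whose names match the regex;
# --output-on-failure prints the test's output if it fails or times out.
rerun_cmd="bsub -x -Is -q rhel7F -n 16 ctest -R ${failing_test} --output-on-failure"

# Print the command so it can be inspected and pasted into the build directory.
echo "${rerun_cmd}"
```

Running the printed command from the build directory, after sourcing load-env.sh as shown above, should execute just this one test.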