Zoltan2 tests failing on ATDM waterman builds
Created by: fryeguy52
CC: @trilinos/zoltan2 , @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe
Next Action Status
PR #3363, merged to 'develop' on 8/27/2018, fixed these Zoltan2 tests on 'waterman' as of 8/28/2018.
Description
As shown in this query, the tests:
- Zoltan2_teuchosSubcommTest_MPI_4
- Zoltan2_TaskMappingProblemTest_MPI_4
are failing in the builds:
- Trilinos-atdm-waterman-gnu-opt-openmp
- Trilinos-atdm-waterman-gnu-debug-openmp
- Trilinos-atdm-waterman-cuda-9.2-opt
- Trilinos-atdm-waterman-cuda-9.2-debug
Zoltan2_teuchosSubcommTest_MPI_4 is timing out. Here is some of the output:
[1534830586.894337] [waterman1:47105:0] mxm.c:196 MXM WARN The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
Zoltan2_TaskMappingProblemTest_MPI_4 output:
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 3 with PID 0 on node waterman8 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Steps to Reproduce
One should be able to reproduce this failure on the machine as described in:
More specifically, the commands given for the system are provided at:
The exact commands to reproduce this issue should be:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-debug
$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON \
  -DTrilinos_ENABLE_Zoltan2=ON \
  $TRILINOS_DIR
$ make NP=20
$ bsub -x -Is -n 20 ctest -j20
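To rerun only the two failing tests instead of the whole suite, it may help to pass ctest a name filter. The sketch below is not part of the documented ATDM workflow. It builds an anchored regex for `ctest -R` from the two test names in the Description and prints a candidate command. The `bsub` options are copied from the command above, and adding `--output-on-failure` is an assumption about what is useful here.

```shell
# Names of the two failing tests, taken from the Description.
failing_tests="Zoltan2_teuchosSubcommTest_MPI_4 Zoltan2_TaskMappingProblemTest_MPI_4"

# Join the names with '|' and anchor the pattern so that ctest -R
# matches exactly these tests and nothing else.
regex="^($(printf '%s' "$failing_tests" | tr ' ' '|'))\$"

# Print the command instead of running it, so it can be checked first.
# The bsub options are copied from the full-suite command; --output-on-failure
# is an assumed addition that shows test output when a test fails.
echo "bsub -x -Is -n 20 ctest -j20 -R '$regex' --output-on-failure"
```

Run the printed command from the same build directory after the build step has finished.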