KokkosCore_UnitTest_[OpenMP|Serial]_MPI_1 tests fail in most of the GNU and Intel builds on hansen/shiller
Created by: bartlettroscoe
CC: @trilinos/kokkos, @fryeguy52
Next Action Status
Changed from KOKKOS_ARCH=BDW to HSW, which fixed all of the tests on all ATDM builds of Trilinos on shiller.
Description
The tests KokkosCore_UnitTest_OpenMP_MPI_1 and KokkosCore_UnitTest_Serial_MPI_1 failed in most of the GNU and Intel ATDM Trilinos builds on hansen today as shown at:
which shows:
Site | Build Name | Test Name | Status | Time | Details | Build Time |
---|---|---|---|---|---|---|
hansen/shiller | Trilinos-atdm-hansen-shiller-intel-debug-openmp | KokkosCore_UnitTest_OpenMP_MPI_1 | Failed | 3.72 | Completed (Failed) | 2018-03-01T10:29:09 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-intel-opt-openmp | KokkosCore_UnitTest_OpenMP_MPI_1 | Failed | 4.24 | Completed (Failed) | 2018-03-01T10:41:40 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-debug-openmp | KokkosCore_UnitTest_OpenMP_MPI_1 | Failed | 10.42 | Completed (Failed) | 2018-03-01T08:49:48 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-opt-openmp | KokkosCore_UnitTest_OpenMP_MPI_1 | Failed | 9.96 | Completed (Failed) | 2018-03-01T11:49:31 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-intel-debug-openmp | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 3.72 | Completed (Failed) | 2018-03-01T10:29:09 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-intel-opt-openmp | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 3.74 | Completed (Failed) | 2018-03-01T10:41:40 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-debug-openmp | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 9.71 | Completed (Failed) | 2018-03-01T08:49:48 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-intel-debug-serial | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 3.82 | Completed (Failed) | 2018-03-01T07:36:18 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-opt-serial | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 9.77 | Completed (Failed) | 2018-03-01T07:15:18 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-intel-opt-serial | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 3.19 | Completed (Failed) | 2018-03-01T08:50:33 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-debug-serial | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 9.9 | Completed (Failed) | 2018-03-01T07:10:58 UTC |
hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-opt-openmp | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 9.16 | Completed (Failed) | 2018-03-01T11:49:31 UTC |
The only builds that the
All of the test failures show:
Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
For unit testing set OMP_PROC_BIND=false
[==========] Running 84 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 84 tests from openmp
[ RUN ] openmp.atomic_operations
[ OK ] openmp.atomic_operations (3 ms)
[ RUN ] openmp.atomic_views_integral
[ OK ] openmp.atomic_views_integral (295 ms)
[ RUN ] openmp.atomic_views_nonintegral
[ OK ] openmp.atomic_views_nonintegral (176 ms)
[ RUN ] openmp.atomic_view_api
[ OK ] openmp.atomic_view_api (0 ms)
[ RUN ] openmp.atomics
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 58263 on node hansen02 exited on signal 4 (Illegal instruction).
--------------------------------------------------------------------------
This is for the new configuration of ATDM Trilinos that uses KOKKOS_ARCH=BDW. I am not sure why this is an issue, but I think that is all that really changed from the previous ATDM configuration, which set compiler options manually.
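One way to narrow this down is to check which instruction-set features the CPU on the failing node actually advertises. This is a minimal sketch, assuming a Linux node with /proc/cpuinfo. The helper name has_cpu_flag and the list of flags checked are illustrative assumptions (rtm, adx and rdseed are features that newer Intel parts may have and older ones may lack). Nothing in the failure output above identifies which instruction actually triggered the signal.

```shell
# Hypothetical helper: return 0 if the named CPU feature flag appears in
# the "flags" line of a cpuinfo file (defaults to /proc/cpuinfo).
has_cpu_flag() {
  flag="$1"
  cpuinfo="${2:-/proc/cpuinfo}"
  # Take the first "flags" line, split it into one token per line,
  # and look for an exact match of the requested flag.
  grep -m1 '^flags' "$cpuinfo" 2>/dev/null | tr ' \t' '\n\n' | grep -qx "$flag"
}

# The flag list below is an assumption chosen for illustration.
for f in avx2 rtm adx rdseed; do
  if has_cpu_flag "$f"; then
    echo "$f: present"
  else
    echo "$f: MISSING"
  fi
done
```

If a flag implied by the selected architecture is reported as missing on the node where the test runs, that would be consistent with an "Illegal instruction" crash, though it would not prove the cause.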
We are seeing a lot of other test failures with various types of segmentation faults or other non-clean crashes at:
But it seems prudent to address these failures first, since they could be related to many of the other tests that fail in bad ways like this.
Steps to Reproduce:
The instructions to reproduce these build failures can be found starting at:
and clicking "Reproducing ATDM builds locally" which takes you to:
Basically, on hansen or shiller, you just clone the Trilinos repo (with location depicted as $TRILINOS_DIR below) and get on the develop branch. Then create a build directory and do the configure and build as:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh gnu-debug-openmp
$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Kokkos=ON \
  $TRILINOS_DIR
$ make NP=16
$ ctest -j16
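To rerun only the two tests named in this issue rather than the whole suite, something like the following may help. It is a sketch: the variable name failing_tests_regex is made up for illustration, and the guard only keeps the snippet harmless outside a configured build directory. Setting OMP_PROC_BIND=false follows the unit-testing hint printed in the test output above.

```shell
# Regex matching both failing tests named in this issue.
failing_tests_regex='KokkosCore_UnitTest_(OpenMP|Serial)_MPI_1'

# Only run ctest when it is installed and we are inside a configured
# build directory (CTestTestfile.cmake exists there after configure).
if command -v ctest >/dev/null 2>&1 && [ -f CTestTestfile.cmake ]; then
  OMP_PROC_BIND=false ctest -R "$failing_tests_regex" --output-on-failure
fi
```

Here -R selects tests by regular expression and --output-on-failure prints the full test output for any test that fails.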