Trilinos issue #2320

Closed
Created Mar 02, 2018 by James Willenbring (@jmwille, Owner)

KokkosCore_UnitTest_[OpenMP|Serial]_MPI_1 tests fail in most of the GNU and Intel builds on hansen/shiller

Created by: bartlettroscoe

CC: @trilinos/kokkos, @fryeguy52

Next Action Status

Changed from KOKKOS_ARCH=BDW to KOKKOS_ARCH=HSW, which fixed all of the tests in all ATDM builds of Trilinos on shiller.

Description

The tests KokkosCore_UnitTest_OpenMP_MPI_1 and KokkosCore_UnitTest_Serial_MPI_1 failed in most of the GNU and Intel ATDM Trilinos builds on hansen today as shown at:

  • https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2018-03-01&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=status&compare2=62&value2=passed&field3=status&compare3=62&value3=notrun&field4=testname&compare4=65&value4=KokkosCore_UnitTest

which shows:

| Site | Build Name | Test Name | Status | Time | Details | Build Time |
|---|---|---|---|---|---|---|
| hansen/shiller | Trilinos-atdm-hansen-shiller-intel-debug-openmp | KokkosCore_UnitTest_OpenMP_MPI_1 | Failed | 3.72 | Completed (Failed) | 2018-03-01T10:29:09 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-intel-opt-openmp | KokkosCore_UnitTest_OpenMP_MPI_1 | Failed | 4.24 | Completed (Failed) | 2018-03-01T10:41:40 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-debug-openmp | KokkosCore_UnitTest_OpenMP_MPI_1 | Failed | 10.42 | Completed (Failed) | 2018-03-01T08:49:48 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-opt-openmp | KokkosCore_UnitTest_OpenMP_MPI_1 | Failed | 9.96 | Completed (Failed) | 2018-03-01T11:49:31 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-intel-debug-openmp | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 3.72 | Completed (Failed) | 2018-03-01T10:29:09 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-intel-opt-openmp | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 3.74 | Completed (Failed) | 2018-03-01T10:41:40 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-debug-openmp | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 9.71 | Completed (Failed) | 2018-03-01T08:49:48 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-intel-debug-serial | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 3.82 | Completed (Failed) | 2018-03-01T07:36:18 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-opt-serial | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 9.77 | Completed (Failed) | 2018-03-01T07:15:18 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-intel-opt-serial | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 3.19 | Completed (Failed) | 2018-03-01T08:50:33 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-debug-serial | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 9.9 | Completed (Failed) | 2018-03-01T07:10:58 UTC |
| hansen/shiller | Trilinos-atdm-hansen-shiller-gnu-opt-openmp | KokkosCore_UnitTest_Serial_MPI_1 | Failed | 9.16 | Completed (Failed) | 2018-03-01T11:49:31 UTC |

The only builds that the

All of the test failures show:

Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
  In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
  For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
  For unit testing set OMP_PROC_BIND=false
[==========] Running 84 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 84 tests from openmp
[ RUN      ] openmp.atomic_operations
[       OK ] openmp.atomic_operations (3 ms)
[ RUN      ] openmp.atomic_views_integral
[       OK ] openmp.atomic_views_integral (295 ms)
[ RUN      ] openmp.atomic_views_nonintegral
[       OK ] openmp.atomic_views_nonintegral (176 ms)
[ RUN      ] openmp.atomic_view_api
[       OK ] openmp.atomic_view_api (0 ms)
[ RUN      ] openmp.atomics
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 58263 on node hansen02 exited on signal 4 (Illegal instruction).
--------------------------------------------------------------------------
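Because the process dies on signal 4 (SIGILL) before GoogleTest can print a summary, the test that crashed is the last `[ RUN ]` line with no matching result line, here `openmp.atomics`. For longer logs, a minimal sketch of pulling that name out automatically follows. The helper name `last_unfinished_gtest` is hypothetical, and the input is assumed to be GoogleTest output in the format shown above.

```shell
# Print the name of the GoogleTest test that was still running when the
# output ended (that is, a "[ RUN ]" line with no later result line).
# Hypothetical helper; reads the test output on stdin.
last_unfinished_gtest() {
  awk '
    /\[ RUN +\]/            { running = $NF }   # a test started
    /\[ +(OK|FAILED) +\]/   { running = "" }    # a test reported a result
    END { if (running != "") print running }
  '
}
```

With the output saved to a file (the file name here is only an example), this would be run as `last_unfinished_gtest < test_output.log`.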

This is for the new ATDM Trilinos configuration that uses KOKKOS_ARCH=BDW. I am not sure why this is an issue, but I think that is all that really changed from the previous ATDM configuration, which set compiler options manually.
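An "Illegal instruction" crash usually means the binary contains an instruction the host CPU does not implement, so one way to investigate is to compare the CPU's advertised features against the architecture the build targets. The sketch below checks `/proc/cpuinfo` on a Linux node. The helper name `cpu_has_flag` is hypothetical, and the flag list is an assumption: `adx` and `rdseed` were introduced with Broadwell, and `rtm` is included because transactional-memory support varies between parts. Which of these the KOKKOS_ARCH=BDW compiler flags actually emit is not established in this issue.

```shell
# Return success if the CPU advertises the given feature flag.
# Hypothetical helper. Usage: cpu_has_flag FLAG [CPUINFO_FILE]
cpu_has_flag() {
  flag=$1
  cpuinfo=${2:-/proc/cpuinfo}
  # Take the first "flags" line, split it into one word per line,
  # and look for an exact whole-word match.
  grep -m1 '^flags' "$cpuinfo" 2>/dev/null | tr ' \t' '\n\n' | grep -qx "$flag"
}

# Report a few features on the current node (Linux only).
# The flag list is an assumption, see the note above.
if [ -r /proc/cpuinfo ]; then
  for f in avx2 adx rdseed rtm; do
    if cpu_has_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
  done
fi
```

If a flag reports "no" on the node that runs the tests, a build that uses that feature would be expected to fail in the way shown above.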

We are seeing a lot of other test failures with various types of segmentation faults or other non-clean crashes at:

  • https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2018-03-01&filtercount=3&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=status&compare2=62&value2=passed&field3=status&compare3=62&value3=notrun

But it seems prudent to address these failures first, since they could be related to many of the other tests that fail in similarly bad ways.

Steps to Reproduce:

The instructions to reproduce these failures can be found starting at:

  • https://snl-wiki.sandia.gov/display/CoodinatedDevOpsATDM/ATDM+Builds+of+Trilinos

and clicking "Reproducing ATDM builds locally" which takes you to:

  • https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md

Basically, on hansen or shiller, clone the Trilinos repo (its location is denoted $TRILINOS_DIR below) and check out the develop branch. Then create a build directory and configure, build, and test as follows:

$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh gnu-debug-openmp

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Kokkos=ON \
  $TRILINOS_DIR

$ make NP=16

$ ctest -j16
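
To iterate on just the two tests named in this issue rather than the full suite, something like the following may help. It is a sketch, not part of the ATDM instructions: the variable name `KOKKOS_FAILING_TESTS` is hypothetical, `-R` and `--output-on-failure` are standard ctest options, and `OMP_PROC_BIND=false` is the setting the Kokkos warning in the log above recommends for unit testing.

```shell
# Regex matching exactly the two failing tests named in this issue.
KOKKOS_FAILING_TESTS='^KokkosCore_UnitTest_(OpenMP|Serial)_MPI_1$'

# Setting recommended for unit testing by the Kokkos warning in the log.
export OMP_PROC_BIND=false

# Run only inside a configured build directory with ctest available.
if command -v ctest >/dev/null 2>&1 && [ -f CTestTestfile.cmake ]; then
  ctest -R "$KOKKOS_FAILING_TESTS" --output-on-failure
fi
```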