Build and test failures in ATDM RDC builds on white and waterman
Created by: fryeguy52
CC: Trilinos Product areas leads: @jwillenbring, @rppawlo, @kddevin, @mperego, @srajama1
Other CC: @bartlettroscoe @fryeguy52
Next Action Status
Next: Waiting for PR #4761 to get tested, approved, and merged ...
Description
As shown here, there are several failing tests and build errors in the following builds:
- Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug
- Trilinos-atdm-waterman-cuda-9.2-rdc-shared-release-debug
- Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-release-debug
- Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-release-debug-pt
- Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-shared-release-debug
- Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-shared-release-debug-pt
These builds enable CUDA relocatable device code (RDC). Most of the errors look something like:
nvlink error : Undefined reference to '_ZN6Sacado4Impl40global_sacado_cuda_memory_pool_on_deviceE' in 'packages/intrepid2/unit-test/Discretization/Basis/HDIV_HEX_In_FEM/Serial/CMakeFiles/Intrepid2_unit-test_Discretization_Basis_HDIV_HEX_In_FEM_Serial_Test_01_SLFadDouble.dir/test_01_SLFadDouble.cpp.o'
or
nvlink warning : Stack size for entry function '_ZN6Kokkos4Impl75_GLOBAL__N__51_tmpxft_000160f8_00000000_6_Kokkos_Cuda_Task_cpp1_ii_b2872e7123cuda_task_queue_executeEPNS0_9TaskQueueINS_4CudaEEEi' cannot be statically determined
collect2: fatal error: ld terminated with signal 9 [Killed]
compilation terminated.
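To see which C++ symbol nvlink cannot resolve, the mangled name from the first error can be demangled. This is a minimal sketch and assumes `c++filt` from GNU binutils is on the PATH; it only decodes the name and does not diagnose the cause of the failure.

```shell
# Mangled symbol copied from the nvlink error above.
sym='_ZN6Sacado4Impl40global_sacado_cuda_memory_pool_on_deviceE'

# c++filt reads mangled names on stdin and prints them demangled.
demangled="$(printf '%s\n' "$sym" | c++filt)"
echo "$demangled"
```

This should print `Sacado::Impl::global_sacado_cuda_memory_pool_on_device`, i.e. the undefined reference is to a device-side global in the Sacado package.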
Current Status on CDash
- Current status of the rdc builds (NOTE: Click "Previous" to see the full set of builds for the previous day)
Steps to Reproduce
One should be able to reproduce this failure on ride or white as described in:
More specifically, the commands given for ride or white are provided at:
For the "-pt" builds with <build-name>:
- Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-release-debug-pt
- Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-shared-release-debug-pt
and for <Package> = Kokkos, KokkosKernels, Belos, etc., the commands to reproduce the build and test failures should be:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh <build-name>
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_<Package>=ON \
$TRILINOS_DIR
$ ninja -j16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
For the other builds with <build-name>:
- Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-release-debug
- Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-shared-release-debug
and for <Package> = Kokkos, KokkosKernels, Belos, etc., the commands to reproduce the build and test failures on 'white' or 'ride' should be:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh <build-name>
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_<Package>=ON \
$TRILINOS_DIR
$ ninja -j16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
To reproduce the build and test failures for the 'waterman' builds with <build-name>:
- Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug
- Trilinos-atdm-waterman-cuda-9.2-rdc-shared-release-debug
one uses:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh <build-name>
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_<Package>=ON \
$TRILINOS_DIR
$ ninja -j16
$ bsub -x -Is -n 20 ctest -j16
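The command sequences above differ only in the build name, the package, and the `bsub` arguments. As a convenience, they could be generated by a small helper like the sketch below. The function `print_repro_cmds` is hypothetical (it is not part of Trilinos), it assumes the environment script is named `load-env.sh`, and it only prints the commands rather than running them.

```shell
# Hypothetical helper (not part of Trilinos): print the reproduction
# commands from this issue for one build name and one package.
print_repro_cmds() {
  local build_name="$1"
  local package="$2"
  local bsub_args

  # Pick the bsub arguments used in this issue for each machine.
  case "$build_name" in
    *waterman*)   bsub_args="-x -Is -n 20" ;;
    *white-ride*) bsub_args="-x -Is -q rhel7F -n 16" ;;
    *) echo "unknown build name: $build_name" >&2; return 1 ;;
  esac

  # \$ keeps TRILINOS_DIR unexpanded so it is resolved when the
  # printed commands are run; \\ prints a line-continuation backslash.
  cat <<EOF
source \$TRILINOS_DIR/cmake/std/atdm/load-env.sh $build_name
cmake \\
  -GNinja \\
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \\
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_$package=ON \\
  \$TRILINOS_DIR
ninja -j16
bsub $bsub_args ctest -j16
EOF
}

# Example: show the commands for one waterman RDC build with Kokkos enabled.
print_repro_cmds Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug Kokkos
```

Run the printed commands from `<some_build_dir>/` after checking them against the instructions for the target machine.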