New Kokkos, KokkosKernels, and Panzer test failures on CUDA 8.0 and CUDA 9.0 builds after Kokkos and KokkosKernels update
Created by: bartlettroscoe
CC: @trilinos/kokkos, @trilinos/kokkos-kernels, @trilinos/panzer, @ndellingwood
Next Action Status
Kokkos, KokkosKernels, and Panzer failing and timing-out tests have been fixed by PRs #2863, #2874, #2927, and #2964. No Panzer, Kokkos, or KokkosKernels failures were observed on 6/19 or 6/20/2018.
Description
The Kokkos and KokkosKernels updates in the recent commits 51cb7c5a and 816e703b:
51cb7c5: Merge branch 'develop' into kokkos-promotion
Author: ndellingwood <ndellin@sandia.gov>
Date: Thu May 24 23:55:26 2018 -0600
816e703: Snapshot of kokkos-kernels.git from commit 1a7b524ba38fdfab6c1058065af06cbcb4a2ce6f
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date: Thu May 24 23:30:27 2018 -0600
seem to have triggered several new test failures and timeouts in the Kokkos, KokkosKernels, and Panzer packages, as shown in:
The new failing and timing-out tests are:
Test | Status | Details |
---|---|---|
KokkosContainers_UnitTest_Serial_MPI_1 | Failed | Completed (Timeout) |
KokkosCore_UnitTest_Cuda_MPI_1 | Failed | Completed (Failed) |
KokkosKernels_sparse_serial_MPI_1 | Failed | Completed (Timeout) |
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-2 | Failed | Completed (Failed) |
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3 | Failed | Completed (Failed) |
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 | Failed | Completed (Failed) |
PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 | Failed | Completed (Failed) |
PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-3 | Failed | Completed (Failed) |
PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-4 | Failed | Completed (Failed) |
PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-3 | Failed | Completed (Failed) |
PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 | Failed | Completed (Failed) |
which failed in one or more of the unique builds:
- Trilinos-atdm-hansen-shiller-cuda-8.0-debug
- Trilinos-atdm-hansen-shiller-cuda-8.0-opt
- Trilinos-atdm-white-ride-cuda-debug
- Trilinos-atdm-white-ride-cuda-opt
These are all basically CUDA 8.0 builds.
These commits are shown as pulled in on this testing day at:
Steps to Reproduce
The most failures are produced by the Trilinos-atdm-white-ride-cuda-debug build on 'white' and 'ride', so that is likely the best bet for reproducing these failures. Therefore, as described in:
after logging into 'white' or 'ride', cloning the Trilinos Git repo (pointed to by TRILINOS_DIR), and getting on the 'develop' branch, one would do:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON \
-DTrilinos_ENABLE_Kokkos=ON \
-DTrilinos_ENABLE_KokkosKernels=ON \
-DTrilinos_ENABLE_Panzer=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
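
To rerun only the tests listed in the table above rather than the full suite, one option is to pass a regex to `ctest -R`. The sketch below builds such a regex from the test names and prints the resulting command. It assumes the build directory and bsub options from the steps above; the variable names `failing_tests` and `test_regex` are illustrative only and not part of any Trilinos script.

```shell
# Failing and timing-out tests, copied from the table in the Description.
failing_tests='KokkosContainers_UnitTest_Serial_MPI_1
KokkosCore_UnitTest_Cuda_MPI_1
KokkosKernels_sparse_serial_MPI_1
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-2
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4
PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2
PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-3
PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-4
PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-3
PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4'

# Join the names with '|' and anchor the pattern so only exact names match.
test_regex="^($(printf '%s\n' "$failing_tests" | paste -sd '|' -))\$"

# Print the command instead of running it, so the sketch works on any machine.
# The bsub options are the ones from the steps above (assumed unchanged).
echo "bsub -x -Is -q rhel7F -n 16 ctest -j16 -R '$test_regex'"
```

Running the printed command from the build directory on 'white' or 'ride' should restrict ctest to these 11 tests. Tests that are disabled or not defined in a given build are simply not matched.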