Closed
Open
Created May 26, 2018 by James Willenbring (@jmwille, Maintainer)

New Kokkos, KokkosKernels, and Panzer test failures on CUDA 8.0 and CUDA 9.0 builds after Kokkos and KokkosKernels update

Created by: bartlettroscoe

CC: @trilinos/kokkos, @trilinos/kokkos-kernels, @trilinos/panzer, @ndellingwood

Next Action Status

The failing and timing-out Kokkos, KokkosKernels, and Panzer tests have been fixed by PRs #2863, #2874, #2927, and #2964. No Panzer, Kokkos, or KokkosKernels failures were observed on 6/19 or 6/20/2018.

Description

The Kokkos and KokkosKernels updates in the recent commits 51cb7c5a and 816e703b:

51cb7c5:  Merge branch 'develop' into kokkos-promotion
Author: ndellingwood <ndellin@sandia.gov>
Date:   Thu May 24 23:55:26 2018 -0600

816e703:  Snapshot of kokkos-kernels.git from commit 1a7b524ba38fdfab6c1058065af06cbcb4a2ce6f
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date:   Thu May 24 23:30:27 2018 -0600

seem to have triggered several new test failures and timeouts in the Kokkos, KokkosKernels, and Panzer packages, as shown in:

  • https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&date=2018-05-26&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=63&value1=Trilinos-atdm-&field2=status&compare2=62&value2=passed&field3=status&compare3=62&value3=notrun&field4=buildname&compare4=63&value4=cuda&field5=buildname&compare5=62&value5=Trilinos-atdm-white-ride-cuda-debug-pt-all-at-once

The new failing and timing-out tests are:

Test                                                          Status  Details
KokkosContainers_UnitTest_Serial_MPI_1                        Failed  Completed (Timeout)
KokkosCore_UnitTest_Cuda_MPI_1                                Failed  Completed (Failed)
KokkosKernels_sparse_serial_MPI_1                             Failed  Completed (Timeout)
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-2  Failed  Completed (Failed)
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3  Failed  Completed (Failed)
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4  Failed  Completed (Failed)
PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2    Failed  Completed (Failed)
PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-3        Failed  Completed (Failed)
PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-4        Failed  Completed (Failed)
PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-3         Failed  Completed (Failed)
PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4         Failed  Completed (Failed)

which failed in one or more of the unique builds:

  • Trilinos-atdm-hansen-shiller-cuda-8.0-debug
  • Trilinos-atdm-hansen-shiller-cuda-8.0-opt
  • Trilinos-atdm-white-ride-cuda-debug
  • Trilinos-atdm-white-ride-cuda-opt

These are all basically CUDA 8.0 builds.

These commits are shown as having been pulled in on this testing day at:

  • https://testing-vm.sandia.gov/cdash/viewNotes.php?buildid=3547898#!#note0

Steps to Reproduce

The most failures occur in the Trilinos-atdm-white-ride-cuda-debug build on 'white' and 'ride', so that build is likely the best bet for reproducing these failures. Therefore, as described in:

  • https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#ridewhite

after logging into 'white' or 'ride' and cloning the Trilinos Git repo (pointed to by TRILINOS_DIR) and getting on the 'develop' branch, one would do:

$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-debug

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON \
  -DTrilinos_ENABLE_Kokkos=ON \
  -DTrilinos_ENABLE_KokkosKernels=ON \
  -DTrilinos_ENABLE_Panzer=ON \
  $TRILINOS_DIR

$ make NP=16

$ bsub -x -Is -q rhel7F -n 16 ctest -j16
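
To rerun only the tests listed in this issue rather than the full suite, the ctest invocation above could be narrowed with a name filter. The following is a minimal sketch, not part of the original report: the variable name failing_regex is hypothetical, the regex is built from the test names in the table above, and the added -R and --output-on-failure options are standard ctest options. The snippet only prints the command so it can be inspected before it is run on 'white' or 'ride'.

```shell
# Hypothetical helper variable: an anchored regex matching exactly the
# failing/timing-out test names reported in this issue.
failing_regex='^(KokkosContainers_UnitTest_Serial_MPI_1|KokkosCore_UnitTest_Cuda_MPI_1|KokkosKernels_sparse_serial_MPI_1|PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-[234]|PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2|PanzerAdaptersSTK_PoissonExample-ConvTest-(Quad|Tri)-Order-[34])$'

# Print (rather than execute) the narrowed command; paste it into the
# build directory on 'white' or 'ride' once it looks right.
printf '%s\n' "bsub -x -Is -q rhel7F -n 16 ctest -j16 -R '$failing_regex' --output-on-failure"
```

The regex is anchored with ^ and $ so that tests with similar names that are not listed in this issue (other ConvTest orders, for example) are not selected.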