TpetraCore_BlockCrsMatrix_MPI_4 failing in ATDM cuda builds

Trilinos issue #4257 (Closed). Created Jan 24, 2019 by James Willenbring (@jmwille, Maintainer).

Created by: fryeguy52

CC: @trilinos/tpetra, @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe, @fryeguy52

Next Action Status

With the merge of PR #4307 onto 'develop' on 2/4/2019, the test TpetraCore_BlockCrsMatrix_MPI_4 appears to be passing in all of the ATDM Trilinos builds on 2/5/2019. Next: get PR #4326 merged, which re-enables this test in the Trilinos CUDA PR build ...

Description

As shown in this query, the test:

  • TpetraCore_BlockCrsMatrix_MPI_4

is failing in the builds:

  • Trilinos-atdm-waterman-cuda-9.2-debug
  • Trilinos-atdm-waterman-cuda-9.2-opt
  • Trilinos-atdm-waterman-cuda-9.2-release-debug
  • Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug
  • Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release
  • Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug

It is failing with the following output:

p=0: *** Caught standard std::exception of type 'std::logic_error' :
 
  /home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug/SRC_AND_BUILD/Trilinos/packages/tpetra/core/src/Tpetra_Experimental_BlockCrsMatrix_def.hpp:2825:
  
  Throw number = 1
  
  Throw test that evaluated to true: numBytesOut != numBytes
  
  unpackRow: numBytesOut = 4 != numBytes = 156.
 [FAILED]  (0.0877 sec) BlockCrsMatrix_double_int_int_Kokkos_Compat_KokkosCudaWrapperNode_write_UnitTest
 Location: /home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug/SRC_AND_BUILD/Trilinos/packages/tpetra/core/test/Block/BlockCrsMatrix.cpp:859
 
[white23:102556] *** An error occurred in MPI_Allreduce
[white23:102556] *** reported by process [231079937,0]
[white23:102556] *** on communicator MPI_COMM_WORLD
[white23:102556] *** MPI_ERR_OTHER: known error not in list
[white23:102556] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[white23:102556] ***    and potentially your MPI job)
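The failing check compares two byte counts for one packed row: `numBytes`, the size recorded for the packed row, and `numBytesOut`, the number of bytes `unpackRow` actually consumed while decoding it. The sketch below illustrates that invariant under stated assumptions: the row layout (an `int` entry count, then `int` column indices, then `double` block values) and the helpers `packRowSketch`, `unpackRowSketch`, and `checkUnpack` are hypothetical and are not Tpetra's actual packing format or API.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical packed-row layout (NOT Tpetra's actual format):
//   [int numEnt][numEnt x int colInd][numEnt*blockSize*blockSize x double vals]
std::vector<char> packRowSketch(const std::vector<int>& colInd,
                                const std::vector<double>& vals) {
  const int numEnt = static_cast<int>(colInd.size());
  std::vector<char> buf(sizeof(int) + colInd.size() * sizeof(int) +
                        vals.size() * sizeof(double));
  std::size_t offset = 0;
  std::memcpy(buf.data() + offset, &numEnt, sizeof(int));
  offset += sizeof(int);
  if (!colInd.empty()) {
    std::memcpy(buf.data() + offset, colInd.data(), colInd.size() * sizeof(int));
    offset += colInd.size() * sizeof(int);
  }
  if (!vals.empty()) {
    std::memcpy(buf.data() + offset, vals.data(), vals.size() * sizeof(double));
  }
  return buf;
}

// Decodes one row from buf and returns the number of bytes consumed.
std::size_t unpackRowSketch(const char* buf, int blockSize,
                            std::vector<int>& colInd, std::vector<double>& vals) {
  int numEnt = 0;
  std::memcpy(&numEnt, buf, sizeof(int));
  if (numEnt < 0) {
    throw std::logic_error("unpackRowSketch: negative entry count");
  }
  std::size_t offset = sizeof(int);
  colInd.resize(static_cast<std::size_t>(numEnt));
  vals.resize(static_cast<std::size_t>(numEnt) * blockSize * blockSize);
  if (numEnt > 0) {
    std::memcpy(colInd.data(), buf + offset, colInd.size() * sizeof(int));
    offset += colInd.size() * sizeof(int);
    std::memcpy(vals.data(), buf + offset, vals.size() * sizeof(double));
    offset += vals.size() * sizeof(double);
  }
  return offset;
}

// The invariant: bytes consumed by the receiver must equal bytes packed.
void checkUnpack(std::size_t numBytesOut, std::size_t numBytes) {
  if (numBytesOut != numBytes) {
    throw std::logic_error("unpackRow: numBytesOut = " + std::to_string(numBytesOut) +
                           " != numBytes = " + std::to_string(numBytes) + ".");
  }
}
```

In this sketch, a receiver that decodes an entry count of 0, or that assumes a different block size than the sender, consumes a different number of bytes than were packed, and `checkUnpack` throws a `std::logic_error` of the same shape as the one in the log.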

@kyungjoo-kim, can you check whether one of these commits caused this?

47f9cbe:  Tpetra - fix failing test
Author: Kyungjoo Kim (-EXP) <kyukim@bread.sandia.gov>
Date:   Tue Jan 22 11:24:43 2019 -0700

M	packages/tpetra/core/src/Tpetra_Experimental_BlockCrsMatrix_def.hpp

3e26a55:  Tpetra - fix warning error from mismatched virtual functions
Author: Kyungjoo Kim (-EXP) <kyukim@bread.sandia.gov>
Date:   Mon Jan 21 11:48:32 2019 -0700

M	packages/tpetra/core/src/Tpetra_Experimental_BlockCrsMatrix_decl.hpp
M	packages/tpetra/core/src/Tpetra_Experimental_BlockCrsMatrix_def.hpp
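Commit 3e26a55 above is described as fixing a warning about mismatched virtual functions. As general C++ background (the classes below are hypothetical and not taken from the Tpetra sources): when a derived-class function's signature differs from the base-class virtual function it was meant to override, it hides the base function instead of overriding it, so calls through a base reference still reach the base version. Aligning the signatures, and marking the function `override`, changes which function runs at runtime, which is one way a warning fix can also change behavior.

```cpp
#include <cstddef>
#include <string>

// Hypothetical base class with a virtual hook (names are illustrative only).
struct Base {
  virtual ~Base() = default;
  virtual std::string unpack(const char* buf, std::size_t numBytes) const {
    (void)buf;
    (void)numBytes;
    return "Base::unpack";
  }
};

// Mismatched: numBytes is int instead of std::size_t, so this does NOT
// override Base::unpack; it only hides it. GCC and Clang can diagnose
// this with -Woverloaded-virtual.
struct Mismatched : Base {
  virtual std::string unpack(const char* buf, int numBytes) const {
    (void)buf;
    (void)numBytes;
    return "Mismatched::unpack";
  }
};

// Fixed: the signature matches, and `override` makes the compiler verify it.
struct Fixed : Base {
  std::string unpack(const char* buf, std::size_t numBytes) const override {
    (void)buf;
    (void)numBytes;
    return "Fixed::unpack";
  }
};

// Calls through a base reference, as generic framework code would.
std::string callThroughBase(const Base& obj) {
  return obj.unpack(nullptr, 0);
}
```

With the mismatch, `callThroughBase` dispatches to `Base::unpack`; once the signatures match, the same call reaches the derived implementation.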

Current Status on CDash

The status of these tests/builds for the current testing day can be found here.

Steps to Reproduce

One should be able to reproduce this failure on ride or white as described in:

  • https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md

More specifically, the commands for ride and white are given at:

  • https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#ridewhite

The exact commands to reproduce this issue should be:

$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release
$ cmake \
 -GNinja \
 -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
 -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Tpetra=ON \
 $TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16