Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4 test failing on new cuda 9.2 ATDM build on white/ride
Created by: fryeguy52
CC: @trilinos/ifpack2, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe
Next Action Status
PR #3549, merged on 10/2/2018, changed the environment from a hybrid OpenMPI 2.1.2/3.1.0 env to a consistent GCC 7.2.0 + OpenMPI 2.1.2 + CUDA 9.2 env and TPLs. This appears to fix the failing Ifpack2 test on 'white' and 'ride' and did not seem to break other tests. As of 10/9/2018, this test has not failed on 'white' or 'ride' since 10/1/2018, and there are no additional test failures.
Description
As shown in this query, the test:
- Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4
is failing in the builds:
- Trilinos-atdm-white-ride-cuda-9.2-debug
- Trilinos-atdm-white-ride-cuda-9.2-opt
The test output shows:
[white25:104781] mca_base_component_repository_open: unable to open mca_coll_hcoll: libsharp_coll.so.2: cannot open shared object file: No such file or directory (ignored)
[white25:104783] mca_base_component_repository_open: unable to open mca_coll_hcoll: libsharp_coll.so.2: cannot open shared object file: No such file or directory (ignored)
[white25:104785] mca_base_component_repository_open: unable to open mca_coll_hcoll: libsharp_coll.so.2: cannot open shared object file: No such file or directory (ignored)
[white25:104784] mca_base_component_repository_open: unable to open mca_coll_hcoll: libsharp_coll.so.2: cannot open shared object file: No such file or directory (ignored)
<I> nranks 4 ni 10 nj 10 nk 10 bs 5 nrhs 1 isplit 4 jsplit 1
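When the output is saved to a file, the repeated Open MPI component-load warnings can be collapsed into one line per distinct missing library, which makes it easier to see whether anything other than `libsharp_coll.so.2` fails to load. The following is a minimal sketch. The function name `summarize_mca_warnings` and the log file name are hypothetical and not part of Trilinos. Note that Open MPI marks these warnings as "(ignored)", so they may not be the cause of the test failure.

```shell
# Hypothetical helper (not part of Trilinos): count distinct Open MPI
# "unable to open" component warnings in a saved test log.
summarize_mca_warnings() {
  # $1: path to a file holding the captured test output
  grep -o 'unable to open [^:]*: [^:]*' "$1" | sort | uniq -c
}

# Example call; test_output.log is an assumed file name.
# summarize_mca_warnings test_output.log
```

Each output line gives a count followed by the component and the library it could not load.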
Steps to Reproduce
One should be able to reproduce this failure on the machine white as described in:
More specifically, the commands given for the system white are provided at:
The exact commands to reproduce this issue should be:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Ifpack2=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
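To avoid rerunning the whole Ifpack2 suite, the last step can be narrowed to the single failing test. The sketch below only prints a candidate command line for review and does not submit a job. The `bsub` options are copied from the step above, and `-R` (regex filter) and `-VV` (extra verbose output) are standard `ctest` options. The function name `rerun_failing_test_cmd` is hypothetical.

```shell
# Name of the failing test, as reported in this issue.
TEST_NAME='Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4'

# Hypothetical helper: print (do not run) a command that reruns one test.
# bsub options are taken from the reproduction steps; -R and -VV are
# standard ctest options.
rerun_failing_test_cmd() {
  # $1: exact test name, anchored so that only this test matches
  printf 'bsub -x -Is -q rhel7F -n 16 ctest -R ^%s$ -VV\n' "$1"
}

rerun_failing_test_cmd "$TEST_NAME"
```

The printed line can then be pasted into the shell on white or ride from the build directory.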