Zoltan2 test fails on POWER8 with GCC 4.9.2, CUDA 7.5.7, and OpenMPI 1.10.2
Created by: jjellio
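On POWER8 with GCC 4.9.2, CUDA 7.5.7, and OpenMPI 1.10.2, the Zoltan2_findUniqueGids unit test fails: every rank reports an "Invalid input argument" error from Zoltan_DD_Create and Zoltan_DD_Update, then segfaults. This may be related to bug #705.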
617: Test command: /home/projects/pwr8-rhel72/openmpi/1.10.2/gcc/4.9.2/cuda/7.5.7/bin/mpirun "--map-by" "ppr:2:NUMA:PE=4" "-np" "4" "/tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe"
617: Test timeout computed to be: 600
617: --------------------------------------------------------------------------
617: WARNING: There is at least non-excluded one OpenFabrics device found,
617: but there are no active ports detected (or Open MPI was unable to use
617: them). This is most certainly not what you wanted. Check your
617: cables, subnet manager configuration, etc. The openib BTL will be
617: ignored for this job.
617:
617: Local host: host7
617: --------------------------------------------------------------------------
617: Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name host7.sandia.gov and rank 1!
617: Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name host7.sandia.gov and rank 2!
617: Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name host7.sandia.gov and rank 3!
617: Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name host7.sandia.gov and rank 0!
617: --------
617: Starting test1: int
617: [0] Zoltan ERROR in Zoltan_DD_Create (line 114 of /home/jjellio/src/Trilinos/packages/zoltan/src/Utilities/DDirectory/DD_Create.c): Invalid input argument
617: [-1] Zoltan ERROR in Zoltan_DD_Update (line 93 of /home/jjellio/src/Trilinos/packages/zoltan/src/Utilities/DDirectory/DD_Update.c): Invalid input argument
617: [host7:13213] *** Process received signal ***
617: [host7:13213] Signal: Segmentation fault (11)
617: [host7:13213] Signal code: Address not mapped (1)
617: [host7:13213] Failing at address: 0x70
617: [host7:13213] [ 0] [0x3fffa6a30478]
617: [host7:13213] [ 1] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_ZN7Zoltan220findUniqueGidsCommonIiEEmmiPyPcP19ompi_communicator_t+0x98)[0x10020ea8]
617: [host7:13213] [ 2] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_ZN7Zoltan214findUniqueGidsISt5arrayIiLm1EEiEEmRSt6vectorIT_SaIS4_EERS3_IT0_SaIS8_EERKN7Teuchos4CommIiEE+0xc8)[0x100210d8]
617: [host7:13213] [ 3] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_Z5test1IiEvRN7Teuchos3RCPIKNS0_4CommIiEEEE+0x250)[0x10024a20]
617: [host7:13213] [ 4] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(main+0x50)[0x10011180]
617: [host7:13213] [ 5] /lib64/power8/libc.so.6(+0x24580)[0x3fff9bf64580]
617: [host7:13213] [ 6] /lib64/power8/libc.so.6(__libc_start_main+0xc4)[0x3fff9bf64774]
617: [host7:13213] *** End of error message ***
617: [1] Zoltan ERROR in Zoltan_DD_Create (line 114 of /home/jjellio/src/Trilinos/packages/zoltan/src/Utilities/DDirectory/DD_Create.c): Invalid input argument
617: [-1] Zoltan ERROR in Zoltan_DD_Update (line 93 of /home/jjellio/src/Trilinos/packages/zoltan/src/Utilities/DDirectory/DD_Update.c): Invalid input argument
617: [host7:13214] *** Process received signal ***
617: [host7:13214] Signal: Segmentation fault (11)
617: [host7:13214] Signal code: Address not mapped (1)
617: [host7:13214] Failing at address: 0x70
617: [host7:13214] [ 0] [0x3fff7dbc0478]
617: [host7:13214] [ 1] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_ZN7Zoltan220findUniqueGidsCommonIiEEmmiPyPcP19ompi_communicator_t+0x98)[0x10020ea8]
617: [host7:13214] [ 2] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_ZN7Zoltan214findUniqueGidsISt5arrayIiLm1EEiEEmRSt6vectorIT_SaIS4_EERS3_IT0_SaIS8_EERKN7Teuchos4CommIiEE+0xc8)[0x100210d8]
617: [host7:13214] [ 3] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_Z5test1IiEvRN7Teuchos3RCPIKNS0_4CommIiEEEE+0x250)[0x10024a20]
617: [host7:13214] [ 4] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(main+0x50)[0x10011180]
617: [host7:13214] [ 5] /lib64/power8/libc.so.6(+0x24580)[0x3fff730f4580]
617: [host7:13214] [ 6] /lib64/power8/libc.so.6(__libc_start_main+0xc4)[0x3fff730f4774]
617: [host7:13214] *** End of error message ***
617: [2] Zoltan ERROR in Zoltan_DD_Create (line 114 of /home/jjellio/src/Trilinos/packages/zoltan/src/Utilities/DDirectory/DD_Create.c): Invalid input argument
617: [-1] Zoltan ERROR in Zoltan_DD_Update (line 93 of /home/jjellio/src/Trilinos/packages/zoltan/src/Utilities/DDirectory/DD_Update.c): Invalid input argument
617: [host7:13215] *** Process received signal ***
617: [host7:13215] Signal: Segmentation fault (11)
617: [host7:13215] Signal code: Address not mapped (1)
617: [host7:13215] Failing at address: 0x70
617: [host7:13215] [ 0] [0x3fffb5730478]
617: [host7:13215] [ 1] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_ZN7Zoltan220findUniqueGidsCommonIiEEmmiPyPcP19ompi_communicator_t+0x98)[0x10020ea8]
617: [host7:13215] [ 2] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_ZN7Zoltan214findUniqueGidsISt5arrayIiLm1EEiEEmRSt6vectorIT_SaIS4_EERS3_IT0_SaIS8_EERKN7Teuchos4CommIiEE+0xc8)[0x100210d8]
617: [host7:13215] [ 3] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_Z5test1IiEvRN7Teuchos3RCPIKNS0_4CommIiEEEE+0x250)[0x10024a20]
617: [host7:13215] [ 4] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(main+0x50)[0x10011180]
617: [host7:13215] [ 5] /lib64/power8/libc.so.6(+0x24580)[0x3fffaac64580]
617: [host7:13215] [ 6] /lib64/power8/libc.so.6(__libc_start_main+0xc4)[0x3fffaac64774]
617: [host7:13215] *** End of error message ***
617: [3] Zoltan ERROR in Zoltan_DD_Create (line 114 of /home/jjellio/src/Trilinos/packages/zoltan/src/Utilities/DDirectory/DD_Create.c): Invalid input argument
617: [-1] Zoltan ERROR in Zoltan_DD_Update (line 93 of /home/jjellio/src/Trilinos/packages/zoltan/src/Utilities/DDirectory/DD_Update.c): Invalid input argument
617: [host7:13216] *** Process received signal ***
617: [host7:13216] Signal: Segmentation fault (11)
617: [host7:13216] Signal code: Address not mapped (1)
617: [host7:13216] Failing at address: 0x70
617: [host7:13216] [ 0] [0x3fff7bc70478]
617: [host7:13216] [ 1] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_ZN7Zoltan220findUniqueGidsCommonIiEEmmiPyPcP19ompi_communicator_t+0x98)[0x10020ea8]
617: [host7:13216] [ 2] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_ZN7Zoltan214findUniqueGidsISt5arrayIiLm1EEiEEmRSt6vectorIT_SaIS4_EERS3_IT0_SaIS8_EERKN7Teuchos4CommIiEE+0xc8)[0x100210d8]
617: [host7:13216] [ 3] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(_Z5test1IiEvRN7Teuchos3RCPIKNS0_4CommIiEEEE+0x250)[0x10024a20]
617: [host7:13216] [ 4] /tmp/trilinos-cuda-gcc/packages/zoltan2/test/unit/Zoltan2_findUniqueGids.exe(main+0x50)[0x10011180]
617: [host7:13216] [ 5] /lib64/power8/libc.so.6(+0x24580)[0x3fff711a4580]
617: [host7:13216] [ 6] /lib64/power8/libc.so.6(__libc_start_main+0xc4)[0x3fff711a4774]
617: [host7:13216] *** End of error message ***
617: --------------------------------------------------------------------------
617: mpirun noticed that process rank 3 with PID 13216 on node host7 exited on signal 11 (Segmentation fault).
617: --------------------------------------------------------------------------
617: [host7.sandia.gov:13210] 3 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
617: [host7.sandia.gov:13210] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
7/9 Test #617: Zoltan2_findUniqueGids_MPI_4 .....................***Failed 10.77 sec
test 783
Start 783: Belos_Tpetra_PseudoBlockCG_hb_test_MPI_4
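To make the pattern in the log above easier to see, here is a small sketch of a filter that counts Zoltan errors per function. The helper name summarize_zoltan_errors is hypothetical (it is not part of Trilinos), and it only assumes the "Zoltan ERROR in <function>" message format shown in the output above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Trilinos): read ctest output on stdin
# and print how many times each Zoltan function reported an error.
summarize_zoltan_errors() {
  # Keep only the function name that follows "Zoltan ERROR in",
  # then count identical names and print them as "name: count".
  sed -n 's/.*Zoltan ERROR in \([A-Za-z0-9_]*\).*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{print $2 ": " $1}'
}
```

Possible usage from the build directory: `ctest -R Zoltan2_findUniqueGids -VV 2>&1 | summarize_zoltan_errors`. For the log above this would report four errors each from Zoltan_DD_Create and Zoltan_DD_Update, one pair per rank. The Zoltan_DD_Update messages are tagged with rank -1 and the fault address is small (0x70), which would be consistent with the directory handle being unusable after the failed create; that is only a reading of the log, not a confirmed diagnosis.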
Configure script:
#!/bin/bash
EXTRA_ARGS=$@
COMPILER=gcc-4.9.2_cuda-7.5.7
MPI=openmpi-1.10.2
COMPILER_DIR=
MPI_DIR=${MPI_ROOT}
BLAS_DIR=${BLAS_ROOT}
LAPACK_DIR=${LAPACK_ROOT}
HDF5_DIR=${HDF5_ROOT}
NETCDF_DIR=${NETCDF_ROOT}
ZLIB_DIR=${ZLIB_ROOT}
BOOST_DIR=${BOOST_ROOT}
METIS_DIR=${METIS_ROOT}
PARMETIS_DIR=${PARMETIS_ROOT}
SUPERLUDIST_DIR=${SUPERLUDIST_ROOT}
EXTRA_C_FLAGS=""
EXTRA_CXX_FLAGS="-arch=sm_37 -lineinfo \
-Xcudafe --diag_suppress=conversion_function_not_usable \
-Xcudafe --diag_suppress=cc_clobber_ignored \
-Xcudafe --diag_suppress=code_is_unreachable"
EXTRA_F_FLAGS=""
#LINK_FLAGS="-fuse-ld=gold"
LINK_FLAGS=""
# Shouldn't need to change anything below this line
BUILD=${COMPILER}_${MPI}
if [[ ${1} == 'static' || ${2} == 'static' ]]
then
LINK_DYNAMIC=OFF
LINK_SUFFIX=static
elif [[ ${1} == 'dynamic' || ${2} == 'dynamic' ]]
then
LINK_DYNAMIC=ON
LINK_SUFFIX=dynamic
else
echo " *** Warning: 'static' or 'dynamic' LINK_TYPE is an optional argument to this script. Defaulting to 'dynamic'."
LINK_DYNAMIC=ON
LINK_SUFFIX=dynamic
fi
if [[ ${1} == 'opt' || ${2} == 'opt' ]]
then
BUILD_TYPE=RELEASE
BUILD_SUFFIX=opt
elif [[ ${1} == 'dbg' || ${2} == 'dbg' ]]
then
BUILD_TYPE=DEBUG
BUILD_SUFFIX=dbg
else
echo " *** Warning: 'opt' or 'dbg' BUILD_TYPE is an optional argument to this script. Defaulting to 'opt'."
BUILD_TYPE=RELEASE
BUILD_SUFFIX=opt
fi
TRILINOS_INSTALL=${HOME}/install/Trilinos/ride-gpu_${BUILD}_${LINK_SUFFIX}_${BUILD_SUFFIX}-timers
echo BUILD=${BUILD}
echo TRILINOS_INSTALL=${TRILINOS_INSTALL}
echo TRILINOS_HOME=${TRILINOS_HOME}
echo COMPILER_DIR=${COMPILER_DIR}
echo MPI_DIR=${MPI_DIR}
echo BLAS_DIR=${BLAS_DIR}
echo LAPACK_DIR=${LAPACK_DIR}
echo HDF5_DIR=${HDF5_DIR}
echo NETCDF_DIR=${NETCDF_DIR}
echo ZLIB_DIR=${ZLIB_DIR}
echo BOOST_DIR=${BOOST_DIR}
echo METIS_DIR=${METIS_DIR}
echo PARMETIS_DIR=${PARMETIS_DIR}
echo SUPERLUDIST_DIR=${SUPERLUDIST_DIR}
rm -f CMakeCache.txt; rm -rf CMakeFiles
cmake \
-D CMAKE_VERBOSE_MAKEFILE=FALSE \
-D CMAKE_INSTALL_PREFIX:PATH=${TRILINOS_INSTALL} \
-D CMAKE_BUILD_TYPE:STRING=${BUILD_TYPE} \
-D BUILD_SHARED_LIBS=${LINK_DYNAMIC} \
\
-D CMAKE_C_COMPILER="mpicc" \
-D CMAKE_CXX_COMPILER="mpicxx" \
-D CMAKE_Fortran_COMPILER="mpif90" \
-D CMAKE_C_FLAGS="$EXTRA_C_FLAGS" \
-D CMAKE_CXX_FLAGS="$EXTRA_CXX_FLAGS" \
-D CMAKE_Fortran_FLAGS="$EXTRA_F_FLAGS" \
-D CMAKE_EXE_LINKER_FLAGS="$LINK_FLAGS" \
\
-D CMAKE_NO_BUILTIN_CHRPATH=TRUE \
\
-D Trilinos_VERBOSE_CONFIGURE=OFF \
-D Trilinos_ENABLE_ALL_PACKAGES=OFF \
-D Trilinos_ENABLE_SECONDARY_TESTED_CODE=OFF \
\
-D Trilinos_ENABLE_TESTS=ON \
-D Trilinos_ENABLE_EXAMPLES=OFF \
-D Kokkos_ENABLE_TESTS=OFF \
-D KokkosCore_ENABLE_TESTS=OFF \
-D KokkosAlgorithms_ENABLE_TESTS=OFF \
-D KokkosContainers_ENABLE_TESTS=OFF \
-D DART_TESTING_TIMEOUT:STRING="600" \
\
-D Trilinos_ENABLE_EXPLICIT_INSTANTIATION=ON \
-D Tpetra_INST_FLOAT=OFF \
-D Tpetra_INST_DOUBLE=ON \
-D Tpetra_INST_COMPLEX_FLOAT=OFF \
-D Tpetra_INST_COMPLEX_DOUBLE=OFF \
-D Tpetra_INST_INT_INT=ON \
-D Tpetra_INST_INT_LONG=OFF \
-D Tpetra_INST_INT_UNSIGNED=OFF \
-D Tpetra_INST_INT_LONG_LONG=ON \
-D Teuchos_ENABLE_LONG_LONG_INT=ON \
-D Teuchos_ENABLE_COMPLEX=OFF \
-D Zoltan_ENABLE_ULLONG_IDS=ON \
\
-D Trilinos_ENABLE_OpenMP=OFF \
-D TPL_ENABLE_Pthread=OFF \
\
-D Trilinos_ENABLE_Teuchos=ON \
-D Trilinos_ENABLE_Epetra=ON \
-D Trilinos_ENABLE_EpetraExt=ON \
-D Trilinos_ENABLE_AztecOO=ON \
-D Trilinos_ENABLE_Amesos=ON \
-D Trilinos_ENABLE_Stratimikos=OFF \
-D Trilinos_ENABLE_Anasazi=ON \
-D Anasazi_ENABLE_RBGen=ON \
-D Anasazi_ENABLE_TESTS=OFF \
-D Trilinos_ENABLE_Ifpack=ON \
-D Trilinos_ENABLE_ML=ON \
-D Trilinos_ENABLE_Teko=OFF \
-D Trilinos_ENABLE_NOX=OFF \
-D Trilinos_ENABLE_Thyra=OFF \
-D Trilinos_ENABLE_Rythmos=OFF \
-D Trilinos_ENABLE_Sacado=ON \
-D Trilinos_ENABLE_Stokhos=OFF \
-D Trilinos_ENABLE_Panzer=OFF \
-D Trilinos_ENABLE_Tpetra=ON \
-D Tpetra_INST_SERIAL=ON \
-D Tpetra_INST_OPENMP=OFF \
-D Tpetra_BCRS_Point_Import=ON \
\
-D Trilinos_ENABLE_Belos=ON \
-D Belos_ENABLE_TEUCHOS_TIME_MONITOR:BOOL=ON \
-D Belos_Tpetra_Timers:BOOL=ON \
-D Belos_ENABLE_TSQR=ON \
-D Belos_ENABLE_TriUtils=ON \
\
-D Trilinos_ENABLE_Amesos2=ON \
-D Amesos2_ENABLE_KLU2=ON \
-D Trilinos_ENABLE_Ifpack2=ON \
-D Trilinos_ENABLE_MueLu=OFF \
-D Trilinos_ENABLE_Zoltan2=ON \
-D Trilinos_ENABLE_STKMesh=OFF \
-D Trilinos_ENABLE_STKIO=OFF \
-D Trilinos_ENABLE_STKTransfer=OFF \
-D Trilinos_ENABLE_STKSearch=OFF \
-D Trilinos_ENABLE_STKUtil=OFF \
-D Trilinos_ENABLE_STKTopology=OFF \
\
-D Trilinos_ENABLE_Kokkos=ON \
-D Trilinos_ENABLE_KokkosCore=ON \
-D Kokkos_ENABLE_Serial=ON \
-D Kokkos_ENABLE_OpenMP=OFF \
-D Kokkos_ENABLE_Pthread=OFF \
-D Kokkos_ENABLE_Cuda=ON \
-D Kokkos_ENABLE_Cuda_UVM=ON \
\
-D Trilinos_ENABLE_SEACAS=ON \
-D TPL_ENABLE_X11=OFF \
-D TPL_ENABLE_Matio=OFF \
\
-D Trilinos_ENABLE_Gtest=ON \
\
-D TPL_ENABLE_MPI=ON \
-D MPI_USE_COMPILER_WRAPPERS=ON \
-D MPI_BASE_DIR:PATH=${MPI_DIR} \
-D MPI_EXEC:PATH="mpirun" \
-D MPI_EXEC_MAX_NUMPROCS:STRING="4" \
-D MPI_EXEC_NUMPROCS_FLAG:STRING="--map-by;ppr:2:NUMA:PE=4;-x;OMP_NUM_THREADS=32;-x;OMP_THREAD_PLACES=core;-x;OMP_DISPLAY_ENV=true;-np" \
\
-D TPL_ENABLE_BLAS=ON \
-D BLAS_LIBRARY_DIRS:PATH="${BLAS_DIR}/lib" \
-D BLAS_LIBRARY_NAMES:STRING="blas" \
\
-D TPL_ENABLE_LAPACK=ON \
-D LAPACK_LIBRARY_DIRS:PATH="${LAPACK_DIR}/lib" \
-D LAPACK_LIBRARY_NAMES:STRING="lapack" \
\
-D TPL_ENABLE_Boost=ON \
-D Boost_INCLUDE_DIRS:PATH=${BOOST_DIR}/include \
\
-D TPL_ENABLE_BoostLib=ON \
-D BoostLib_INCLUDE_DIRS:PATH=${BOOST_DIR}/include \
-D BoostLib_LIBRARY_DIRS:PATH=${BOOST_DIR}/lib \
\
-D TPL_ENABLE_Netcdf=ON \
-D Netcdf_INCLUDE_DIRS:PATH="${NETCDF_DIR}/include;${HDF5_DIR}/include" \
-D Netcdf_LIBRARY_DIRS:PATH="${NETCDF_DIR}/lib;${PNETCDF_ROOT}/lib;${HDF5_DIR}/lib;${ZLIB_DIR}/lib" \
-D Netcdf_LIBRARY_NAMES:STRING="netcdf;pnetcdf;hdf5_hl;hdf5;z" \
\
-D TPL_ENABLE_METIS=ON \
-D METIS_INCLUDE_DIRS:PATH=${METIS_DIR}/include \
-D METIS_LIBRARY_DIRS:PATH=${METIS_DIR}/lib \
\
-D TPL_ENABLE_ParMETIS=ON \
-D ParMETIS_INCLUDE_DIRS:PATH=${PARMETIS_DIR}/include \
-D ParMETIS_LIBRARY_DIRS:PATH=${PARMETIS_DIR}/lib \
\
-D TPL_ENABLE_SuperLUDist=ON \
-D SuperLUDist_INCLUDE_DIRS:PATH=${SUPERLUDIST_DIR}/include \
-D SuperLUDist_LIBRARY_DIRS:PATH=${SUPERLUDIST_DIR}/lib \
-D SuperLUDist_LIBRARY_NAMES:STRING="superlu_dist_4.3" \
\
-D Trilinos_EXTRA_LINK_FLAGS:STRING="-lmpi -ldl -lutil -lm -ldl -lpthread" \
\
${EXTRA_ARGS} \
${TRILINOS_HOME}
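A side note on the script above, unrelated to the segfault: it looks for static/dynamic and opt/dbg only in ${1} and ${2}, and because EXTRA_ARGS=$@ captures every argument, those keywords are also forwarded to cmake. A minimal sketch of one way to separate the two follows. The function name parse_build_args is hypothetical, the defaults mirror the script above, and the default-value warnings are left out for brevity.

```shell
#!/bin/bash
# Sketch only: split the build keywords from pass-through cmake arguments.
# parse_build_args is a hypothetical name; defaults match the script above.
parse_build_args() {
  LINK_DYNAMIC=ON;    LINK_SUFFIX=dynamic
  BUILD_TYPE=RELEASE; BUILD_SUFFIX=opt
  EXTRA_ARGS=()       # array, so arguments containing spaces survive intact
  local arg
  for arg in "$@"; do
    case "$arg" in
      static)  LINK_DYNAMIC=OFF;   LINK_SUFFIX=static ;;
      dynamic) LINK_DYNAMIC=ON;    LINK_SUFFIX=dynamic ;;
      opt)     BUILD_TYPE=RELEASE; BUILD_SUFFIX=opt ;;
      dbg)     BUILD_TYPE=DEBUG;   BUILD_SUFFIX=dbg ;;
      *)       EXTRA_ARGS+=("$arg") ;;  # anything else goes to cmake
    esac
  done
}
```

With this, the script would call `parse_build_args "$@"` near the top and pass `"${EXTRA_ARGS[@]}"` to cmake in place of `${EXTRA_ARGS}`. Unlike the original, the keywords are recognized in any position, not just the first two.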
/cc @trilinos-zoltan @trilinos-belos @trilinos-tpetra