Zoltan Test Failures on Knights Landing with OpenMPI 1.10.4 and Intel 17.0.098
Created by: nmhamster
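Zoltan team, I am seeing test failures with the latest builds on Knights Landing using the Intel 17.0.098 compilers and OpenMPI 1.10.4. Every failing case reports "fatal: insufficient memory". In the run below, all six drake subtests (rcb, rcb-ts, rib, rib-ts, hsfc, phg) fail at the same point, ch_dist_graph.c line 407, while reading the Chaco mesh. The node has 16GB + 96GB of memory, so I would expect that to be sufficient.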
$ ctest -V -R Zoltan_ch_drake_zoltan_parallel
UpdateCTestConfiguration from :/home/sdhammo/git/trilinos-github-repo/build-knl-170098/DartConfiguration.tcl
Parse Config file:/home/sdhammo/git/trilinos-github-repo/build-knl-170098/DartConfiguration.tcl
Add coverage exclude regular expressions.
SetCTestConfiguration:CMakeCommand:/home/projects/x86-64-knl/cmake/3.5.2/bin/cmake
UpdateCTestConfiguration from :/home/sdhammo/git/trilinos-github-repo/build-knl-170098/DartConfiguration.tcl
Parse Config file:/home/sdhammo/git/trilinos-github-repo/build-knl-170098/DartConfiguration.tcl
Test project /home/sdhammo/git/trilinos-github-repo/build-knl-170098
Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph...
Checking test dependency graph end
test 199
Start 199: Zoltan_ch_drake_zoltan_parallel
199: Test command: /home/projects/x86-64-knl/cmake/3.5.2/bin/cmake "-DTEST_CONFIG=" "-P" "/home/sdhammo/git/trilinos-github-repo/build-knl-170098/packages/zoltan/test/ch_drake/Zoltan_ch_drake_zoltan_parallel.cmake"
199: Test timeout computed to be: 1500
199:
199: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
199:
199: Advanced Test: Zoltan_ch_drake_zoltan_parallel
199:
199: Selected Test/CTest Propeties:
199: CATEGORIES = NIGHTLYPERFORMANCE
199: PROCESSORS = 3
199: TIMEOUT = DEFAULT
199:
199: Running test commands: TEST_0
199:
199: ================================================================================
199:
199: TEST_0
199:
199: Running: "/usr/bin/perl" "../ctest_zoltan.pl" "--np" "3" "--debug" "--mpiexec" "/home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec" "--mpiexecarg" "-np" "--pkg" "Zoltan"
199:
199: --------------------------------------------------------------------------------
199:
199: CTEST_FULL_OUTPUT
199: --np3--debug--mpiexec/home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec--mpiexecarg-np--pkgZoltan
199: DEBUG HOSTNAME node02.bowman.sandia.gov node0
199: DEBUG: package Zoltan
199: 08:21:00 up 58 days, 22:32, 0 users, load average: 3.69, 2.34, 1.08
199: DEBUG: mpiexec /home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec --mca mpi_yield_when_idle 1 -np
199: DEBUG Dir /home/sdhammo/git/trilinos-github-repo/build-knl-170098/packages/zoltan/test/ch_drake dirname drake
199: DEBUG Outfilebase: ; Dropbase:
199: DEBUG Running test 0 on zdrive.inp.rcb
199: DEBUG Test name: rcb
199: DEBUG Archfilebase: drake.rcb.3.; Dropbase: drake.rcb.drops.3.
199: DEBUG Executing now: /home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec --mca mpi_yield_when_idle 1 -np 3 ../zdrive.exe zdrive.inp.rcb 2>&1 | tee drake.rcb.3.outerr
199:
199:
199:
199: Reading the command file, zdrive.inp.rcb
199: Input values:
199: Zoltan version 3.83
199: zdrive version 1.0
199: Total number of Processors = 3
199:
199: Performing load balance using rcb.
199: Parameters:
199: remap 0
199: obj_weight_dim 1
199: keep_cuts 1
199: debug_level 3
199: timer user
199:
199: Initially distribute input objects according to assignments in file.
199: ##########################################################
199: ZOLTAN Load balancing method = 3 (RCB)
199: Starting iteration 1
199: =========================messages from Proc 0=========================
199: Proc 0: fatal: insufficient memory
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/ch/ch_dist_graph.c
199: Proc 0: at line 407
199: Proc 0: fatal: Error returned from chaco_dist_graph
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_chaco_io.c
199: Proc 0: at line 248
199: Proc 0: fatal: Error returned from read_chaco_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 571
199: Proc 0: fatal: Error returned from read_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 334
199: --------------------------------------------------------------------------
199: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
199: with errorcode -1.
199:
199: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
199: You may or may not see output from other processes, depending on
199: exactly when Open MPI kills them.
199: --------------------------------------------------------------------------
199: DEBUG system results 0
199: Using default indextype
199: DEBUG moving files: drake.out.3.0 output/drake.rcb.3.0
199: Test drake:rcb FAILED (Missing output files)
199: DEBUG Running test 1 on zdrive.inp.rcb-ts
199: DEBUG Test name: rcb-ts
199: DEBUG Archfilebase: drake.rcb-ts.3.; Dropbase: drake.rcb-ts.drops.3.
199: DEBUG Executing now: /home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec --mca mpi_yield_when_idle 1 -np 3 ../zdrive.exe zdrive.inp.rcb-ts 2>&1 | tee drake.rcb-ts.3.outerr
199:
199:
199:
199: Reading the command file, zdrive.inp.rcb-ts
199: Input values:
199: Zoltan version 3.83
199: zdrive version 1.0
199: Total number of Processors = 3
199:
199: Performing load balance using rcb.
199: Parameters:
199: remap 0
199: obj_weight_dim 1
199: tflops_special 1
199: debug_level 3
199: timer user
199:
199: Initially distribute input objects according to assignments in file.
199: ##########################################################
199: ZOLTAN Load balancing method = 3 (RCB)
199: Starting iteration 1
199: =========================messages from Proc 0=========================
199: Proc 0: fatal: insufficient memory
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/ch/ch_dist_graph.c
199: Proc 0: at line 407
199: Proc 0: fatal: Error returned from chaco_dist_graph
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_chaco_io.c
199: Proc 0: at line 248
199: Proc 0: fatal: Error returned from read_chaco_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 571
199: Proc 0: fatal: Error returned from read_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 334
199: --------------------------------------------------------------------------
199: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
199: with errorcode -1.
199:
199: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
199: You may or may not see output from other processes, depending on
199: exactly when Open MPI kills them.
199: --------------------------------------------------------------------------
199: DEBUG system results 0
199: Using default indextype
199: DEBUG moving files: drake.out.3.0 output/drake.rcb-ts.3.0
199: Test drake:rcb-ts FAILED (Missing output files)
199: DEBUG Running test 2 on zdrive.inp.rib
199: DEBUG Test name: rib
199: DEBUG Archfilebase: drake.rib.3.; Dropbase: drake.rib.drops.3.
199: DEBUG Executing now: /home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec --mca mpi_yield_when_idle 1 -np 3 ../zdrive.exe zdrive.inp.rib 2>&1 | tee drake.rib.3.outerr
199:
199:
199:
199: Reading the command file, zdrive.inp.rib
199: Input values:
199: Zoltan version 3.83
199: zdrive version 1.0
199: Total number of Processors = 3
199:
199: Performing load balance using rib.
199: Parameters:
199: remap 0
199: obj_weight_dim 1
199: keep_cuts 1
199: debug_level 3
199: timer user
199:
199: Initially distribute input objects according to assignments in file.
199: ##########################################################
199: ZOLTAN Load balancing method = 7 (RIB)
199: Starting iteration 1
199: =========================messages from Proc 0=========================
199: Proc 0: fatal: insufficient memory
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/ch/ch_dist_graph.c
199: Proc 0: at line 407
199: Proc 0: fatal: Error returned from chaco_dist_graph
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_chaco_io.c
199: Proc 0: at line 248
199: Proc 0: fatal: Error returned from read_chaco_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 571
199: Proc 0: fatal: Error returned from read_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 334
199: --------------------------------------------------------------------------
199: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
199: with errorcode -1.
199:
199: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
199: You may or may not see output from other processes, depending on
199: exactly when Open MPI kills them.
199: --------------------------------------------------------------------------
199: DEBUG system results 0
199: Using default indextype
199: DEBUG moving files: drake.out.3.0 output/drake.rib.3.0
199: Test drake:rib FAILED (Missing output files)
199: DEBUG Running test 3 on zdrive.inp.rib-ts
199: DEBUG Test name: rib-ts
199: DEBUG Archfilebase: drake.rib-ts.3.; Dropbase: drake.rib-ts.drops.3.
199: DEBUG Executing now: /home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec --mca mpi_yield_when_idle 1 -np 3 ../zdrive.exe zdrive.inp.rib-ts 2>&1 | tee drake.rib-ts.3.outerr
199:
199:
199:
199: Reading the command file, zdrive.inp.rib-ts
199: Input values:
199: Zoltan version 3.83
199: zdrive version 1.0
199: Total number of Processors = 3
199:
199: Performing load balance using rib.
199: Parameters:
199: remap 0
199: obj_weight_dim 1
199: tflops_special 1
199: debug_level 3
199: timer user
199:
199: Initially distribute input objects according to assignments in file.
199: ##########################################################
199: ZOLTAN Load balancing method = 7 (RIB)
199: Starting iteration 1
199: =========================messages from Proc 0=========================
199: Proc 0: fatal: insufficient memory
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/ch/ch_dist_graph.c
199: Proc 0: at line 407
199: Proc 0: fatal: Error returned from chaco_dist_graph
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_chaco_io.c
199: Proc 0: at line 248
199: Proc 0: fatal: Error returned from read_chaco_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 571
199: Proc 0: fatal: Error returned from read_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 334
199: --------------------------------------------------------------------------
199: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
199: with errorcode -1.
199:
199: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
199: You may or may not see output from other processes, depending on
199: exactly when Open MPI kills them.
199: --------------------------------------------------------------------------
199: DEBUG system results 0
199: Using default indextype
199: DEBUG moving files: drake.out.3.0 output/drake.rib-ts.3.0
199: Test drake:rib-ts FAILED (Missing output files)
199: DEBUG Running test 4 on zdrive.inp.hsfc
199: DEBUG Test name: hsfc
199: DEBUG Archfilebase: drake.hsfc.3.; Dropbase: drake.hsfc.drops.3.
199: DEBUG Executing now: /home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec --mca mpi_yield_when_idle 1 -np 3 ../zdrive.exe zdrive.inp.hsfc 2>&1 | tee drake.hsfc.3.outerr
199:
199:
199:
199: Reading the command file, zdrive.inp.hsfc
199: Input values:
199: Zoltan version 3.83
199: zdrive version 1.0
199: Total number of Processors = 3
199:
199: Performing load balance using hsfc.
199: Parameters:
199: remap 0
199: obj_weight_dim 1
199: keep_cuts 1
199: debug_level 3
199: timer user
199:
199: Initially distribute input objects according to assignments in file.
199: ##########################################################
199: ZOLTAN Load balancing method = 8 (HSFC)
199: Starting iteration 1
199: =========================messages from Proc 0=========================
199: Proc 0: fatal: insufficient memory
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/ch/ch_dist_graph.c
199: Proc 0: at line 407
199: Proc 0: fatal: Error returned from chaco_dist_graph
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_chaco_io.c
199: Proc 0: at line 248
199: Proc 0: fatal: Error returned from read_chaco_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 571
199: Proc 0: fatal: Error returned from read_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 334
199: --------------------------------------------------------------------------
199: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
199: with errorcode -1.
199:
199: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
199: You may or may not see output from other processes, depending on
199: exactly when Open MPI kills them.
199: --------------------------------------------------------------------------
199: DEBUG system results 0
199: Using default indextype
199: DEBUG moving files: drake.out.3.0 output/drake.hsfc.3.0
199: Test drake:hsfc FAILED (Missing output files)
199: DEBUG Running test 5 on zdrive.inp.phg
199: DEBUG Test name: phg
199: DEBUG Archfilebase: drake.phg.3.; Dropbase: drake.phg.drops.3.
199: DEBUG Executing now: /home/projects/x86-64-knl/openmpi/1.10.4/intel/17.0.098/bin/mpiexec --mca mpi_yield_when_idle 1 -np 3 ../zdrive.exe zdrive.inp.phg 2>&1 | tee drake.phg.3.outerr
199:
199:
199:
199: Reading the command file, zdrive.inp.phg
199: Input values:
199: Zoltan version 3.83
199: zdrive version 1.0
199: Total number of Processors = 3
199:
199: Performing load balance using hypergraph.
199: Parameters:
199: remap 0
199: obj_weight_dim 1
199: phg_edge_size_threshold 1.0
199:
199: Initially distribute input objects according to assignments in file.
199: ##########################################################
199: ZOLTAN Load balancing method = 10 (HYPERGRAPH)
199: Starting iteration 1
199: =========================messages from Proc 0=========================
199: Proc 0: fatal: insufficient memory
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/ch/ch_dist_graph.c
199: Proc 0: at line 407
199: Proc 0: fatal: Error returned from chaco_dist_graph
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_chaco_io.c
199: Proc 0: at line 248
199: Proc 0: fatal: Error returned from read_chaco_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 571
199: Proc 0: fatal: Error returned from read_mesh
199:
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_main.c
199: Proc 0: at line 334
199: --------------------------------------------------------------------------
199: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
199: with errorcode -1.
199:
199: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
199: You may or may not see output from other processes, depending on
199: exactly when Open MPI kills them.
199: --------------------------------------------------------------------------
199: DEBUG system results 0
199: Using default indextype
199: DEBUG moving files: drake.out.3.0 output/drake.phg.3.0
199: Test drake:phg FAILED (Missing output files)
199: Test drake: 0 out of 6 tests PASSED.
199: Test drake: 6 out of 6 tests FAILED.
199:
199: --------------------------------------------------------------------------------
199:
199: TEST_0: Return code = 0
199: TEST_0: Pass criteria = Return code
199: TEST_0: Result = PASSED
199:
199: ================================================================================
199:
199: OVERALL FINAL RESULT: TEST PASSED (Zoltan_ch_drake_zoltan_parallel)
199:
199: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
199:
1/1 Test #199: Zoltan_ch_drake_zoltan_parallel ...***Failed Error regular expression found in output. Regex=[FAILED] 9.20 sec
0% tests passed, 1 tests failed out of 1
Label Time Summary:
Zoltan = 9.20 sec (1 test)
Total Test time (real) = 10.39 sec
The following tests FAILED:
199 - Zoltan_ch_drake_zoltan_parallel (Failed)
Errors while running CTest
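Since the log above is long and repetitive, here is a rough sketch of a Python helper that condenses it into the list of failed subtests and a count of each distinct fatal-error site. It is not part of Zoltan or CTest. The function name `summarize` and the regular expressions are my own assumptions, written only against the line shapes visible in this log ("Test drake:rcb FAILED (...)", "Proc 0: fatal: ...", "in file ...", "at line ..."), so other Zoltan output may need different patterns.

```python
import posixpath
import re
from collections import Counter

# Patterns are assumptions based only on the line shapes in the log above.
PREFIX = re.compile(r"^\d+:\s?")                       # ctest -V test-number prefix, e.g. "199: "
FAILED = re.compile(r"^Test (\S+?):(\S+) FAILED \((.*)\)$")
FATAL = re.compile(r"^Proc \d+: fatal: (.*)$")
IN_FILE = re.compile(r"^Proc \d+: in file (.*)$")
AT_LINE = re.compile(r"^Proc \d+: at line (\d+)$")


def summarize(log_text):
    """Return (failed_subtests, fatal_sites) for a ctest -V log.

    failed_subtests: list of (subtest_name, reason) in order of appearance.
    fatal_sites: Counter keyed by (message, source_file_basename, line_number).
    """
    failed = []
    sites = Counter()
    message = path = None
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw.strip()).strip()
        m = FAILED.match(line)
        if m:
            failed.append((m.group(2), m.group(3)))
            continue
        m = FATAL.match(line)
        if m:
            # Start of a new fatal record; file and line follow on later lines.
            message, path = m.group(1), None
            continue
        m = IN_FILE.match(line)
        if m and message is not None:
            path = m.group(1)
            continue
        m = AT_LINE.match(line)
        if m and message is not None and path is not None:
            sites[(message, posixpath.basename(path), int(m.group(1)))] += 1
            message = path = None
    return failed, sites


# Short excerpt copied from the log above, used only to exercise the helper.
SAMPLE = """\
199: Proc 0: fatal: insufficient memory
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/ch/ch_dist_graph.c
199: Proc 0: at line 407
199: Proc 0: fatal: Error returned from chaco_dist_graph
199: Proc 0: in file /home/sdhammo/git/trilinos-github-repo/packages/zoltan/src/driver/dr_chaco_io.c
199: Proc 0: at line 248
199: Test drake:rcb FAILED (Missing output files)
199: Test drake: 0 out of 6 tests PASSED.
"""

if __name__ == "__main__":
    failed_subtests, fatal_sites = summarize(SAMPLE)
    for name, reason in failed_subtests:
        print(f"FAILED {name}: {reason}")
    for (msg, src, lineno), count in fatal_sites.most_common():
        print(f"{count}x {msg} ({src}:{lineno})")
```

To run it on the full output, save the log to a file and pass its contents to `summarize`. Grouping by message, file, and line makes it easy to confirm whether every subtest dies at the same allocation or whether there are several distinct failure sites.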