Test PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 randomly failing and timing out in some GCC and Intel serial builds
Created by: bartlettroscoe
CC: @fryeguy52
Next Action Status
Test has not failed or timed out in any promoted "ATDM" CDash Group build since 5/11/2018 (and then only on the one 'sems-rhel6' machine). See below. Therefore, we can assume this is not really a problem now.
Description
The test PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3 appears to be randomly failing in some ATDM builds of Trilinos and timing out in others. You can see all of the builds on hansen where this test either failed or timed out in the last 2 weeks at:
That shows the test failing with the builds Trilinos-atdm-hansen-shiller-gnu-debug-serial and Trilinos-atdm-hansen-shiller-gnu-opt-serial, and timing out with the build Trilinos-atdm-hansen-shiller-intel-debug-serial.
For example, one error for the build Trilinos-atdm-hansen-shiller-gnu-debug-serial at:
shows:
libgomp: Invalid value for environment variable OMP_NUM_THREADS
libgomp: Invalid value for environment variable OMP_NUM_THREADS
libgomp: Invalid value for environment variable OMP_NUM_THREADS
libgomp: Invalid value for environment variable OMP_NUM_THREADS
Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name hansen02 and rank 0!
Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name hansen02 and rank 1!
Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name hansen02 and rank 2!
Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name hansen02 and rank 3!
p=0 | CubeHexMesh: Building sub cells
libgomp: Thread creation failed: Resource temporarily unavailable
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[54941,1],1]
Exit code: 1
--------------------------------------------------------------------------
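It looks like we are setting OMP_NUM_THREADS incorrectly: libgomp rejects its value as invalid at startup, and one process later aborts with "Thread creation failed: Resource temporarily unavailable".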
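One way to catch a bad setting before mpiexec launches the test is a small pre-flight check on the environment. The sketch below is hypothetical: the function name and the idea of running it from the test driver are assumptions, not part of Trilinos or the ATDM build scripts. It only approximates the rule from the OpenMP specification that OMP_NUM_THREADS must be a comma-separated list of positive integers (one entry per nesting level). libgomp's own parser may differ in edge cases.

```python
import os
import re

# Hypothetical pre-flight check (not part of Trilinos or the ATDM scripts).
# Approximates the OpenMP rule: OMP_NUM_THREADS is a comma-separated list of
# positive integers, e.g. "4" or "4,2". Surrounding whitespace is tolerated.
_ENTRY = re.compile(r"\s*0*[1-9][0-9]*\s*")


def omp_num_threads_is_valid(value):
    """Return True if `value` looks like a well-formed OMP_NUM_THREADS setting."""
    if value is None:
        return True  # unset: the OpenMP runtime falls back to its default
    # Every comma-separated entry must be a positive integer; an empty string,
    # "0", a trailing comma, or non-numeric text all fail.
    return all(_ENTRY.fullmatch(entry) for entry in value.split(","))


if __name__ == "__main__":
    value = os.environ.get("OMP_NUM_THREADS")
    if not omp_num_threads_is_valid(value):
        raise SystemExit(f"Invalid OMP_NUM_THREADS={value!r}")
```

Running this before the test would fail fast with the offending value, rather than letting each MPI rank print the libgomp warning and continue with an unexpected thread count.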