Weird MPI run-time error in Dashboard build using ancient OpenMPI version
Created by: mhoemmen
@trilinos/tpetra @trilinos/framework
I noticed that Tpetra's "ReadTriples" test showed a failure on one Dashboard build:
https://testing-vm.sandia.gov/cdash/testDetails.php?test=50492359&build=3725219
The failure looks spurious, possibly due to a full /tmp filesystem:
[ascic114:11890] opal_os_dirpath_create: Error: Unable to create the sub-directory (/tmp/openmpi-sessions-trilinos@ascic114_0/49948) of (/tmp/openmpi-sessions-trilinos@ascic114_0/49948/0/0), mkdir failed [1]
[ascic114:11890] [[49948,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 107
[ascic114:11890] [[49948,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 402
[ascic114:11890] [[49948,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module.c at line 638
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_session_dir failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
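One way to confirm or rule out the full-/tmp guess on the build machine is a quick probe like the sketch below. It is only an illustration, not part of the Trilinos test infrastructure: the function name `diagnose_tmp` and the probe directory prefix are made up for this example. It reports free bytes and free inodes, then tries to create and remove a scratch directory, recording the errno so that a full filesystem (ENOSPC) can be told apart from a permissions problem (EACCES or EPERM).

```python
import os
import shutil
import tempfile


def diagnose_tmp(path="/tmp"):
    """Report free space, free inodes, and whether a directory can be created in path.

    Hypothetical helper for illustration; not an Open MPI or Trilinos tool.
    """
    usage = shutil.disk_usage(path)   # total, used, free (in bytes)
    vfs = os.statvfs(path)            # POSIX only; f_favail = inodes free for unprivileged users
    report = {
        "free_bytes": usage.free,
        "free_inodes": vfs.f_favail,
        "can_mkdir": True,
        "mkdir_error": None,
    }
    try:
        # Mimic what the MPI runtime needs: creating a session directory.
        probe = tempfile.mkdtemp(prefix="mpi-session-probe-", dir=path)
        os.rmdir(probe)
    except OSError as exc:
        report["can_mkdir"] = False
        report["mkdir_error"] = f"errno {exc.errno}: {exc.strerror}"
    return report


if __name__ == "__main__":
    for key, value in diagnose_tmp().items():
        print(f"{key}: {value}")
```

Running this as the same user that runs the Dashboard build (here apparently `trilinos` on `ascic114`) would matter, since the session directory in the log is per-user.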
I also noticed that the build uses OpenMPI 1.8.7. @prwolfe and/or @rrdrake reported to me that Sierra skipped the 1.8.y series in favor of 1.10.x, because 1.8.y caused tests to fail. 1.8 is also a retired version of OpenMPI. (So is 1.10, but at least it's a bit newer.) We should get rid of that build or update the OpenMPI version.
Expectations
Nobody tests with OpenMPI versions that old. The OpenMPI project no longer supports them, and neither Sierra nor ATDM apps use them.
Possible Solution
- Eliminate that build, or
- update its OpenMPI version.
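If the build is kept and updated, a small guard could flag outdated versions before tests run. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the banner format is assumed to match what `mpirun --version` typically prints on its first line (for example `mpirun (Open MPI) 1.8.7`), and the 1.10.0 threshold is simply taken from the Sierra remark above rather than from any Trilinos policy.

```python
import re


def parse_open_mpi_version(banner):
    """Extract (major, minor, patch) from an 'mpirun --version' style banner.

    Returns None if the banner does not look like Open MPI output.
    Hypothetical helper; the banner format is an assumption.
    """
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_older_than(version, minimum=(1, 10, 0)):
    """True if version sorts before minimum (assumed threshold: 1.10.0)."""
    return version < minimum


if __name__ == "__main__":
    version = parse_open_mpi_version("mpirun (Open MPI) 1.8.7")
    print(version, "too old" if is_older_than(version) else "ok")
```

In practice the banner string would come from running `mpirun --version` in the build environment; it is hard-coded here so the sketch runs without MPI installed.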