Trilinos issue #3120

Closed
Open
Created Jul 15, 2018 by James Willenbring (@jmwille, Maintainer)

Weird MPI run-time error in Dashboard build using ancient OpenMPI version

Created by: mhoemmen

@trilinos/tpetra @trilinos/framework

I noticed that Tpetra's "ReadTriples" test showed a failure on one Dashboard build:

https://testing-vm.sandia.gov/cdash/testDetails.php?test=50492359&build=3725219

The failure looks spurious, possibly due to a full /tmp filesystem:

[ascic114:11890] opal_os_dirpath_create: Error: Unable to create the sub-directory (/tmp/openmpi-sessions-trilinos@ascic114_0/49948) of (/tmp/openmpi-sessions-trilinos@ascic114_0/49948/0/0), mkdir failed [1]
[ascic114:11890] [[49948,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 107
[ascic114:11890] [[49948,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 402
[ascic114:11890] [[49948,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module.c at line 638
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_session_dir failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
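
To check the "full /tmp" guess on the build machine, something like the following could be run as the same user that runs the tests. This is only a rough sketch: `probe_session_base` is a made-up helper name, and it does not reproduce Open MPI's own session-directory logic. It just reports free space under a base directory and whether a subdirectory can be created there, which should help distinguish a full filesystem from a permissions problem.

```python
import os
import shutil
import tempfile


def probe_session_base(base=None):
    """Hypothetical helper: report free space under `base` and whether
    a subdirectory can be created (and removed) there."""
    base = base or tempfile.gettempdir()
    result = {"base": base, "free_bytes": None, "mkdir_ok": False, "error": None}
    try:
        # Free space on the filesystem that holds `base`.
        result["free_bytes"] = shutil.disk_usage(base).free
        # Try to create and then remove a scratch subdirectory.
        probe = tempfile.mkdtemp(prefix="session-probe-", dir=base)
        os.rmdir(probe)
        result["mkdir_ok"] = True
    except OSError as exc:
        # Keep errno and message so ENOSPC, EACCES, EPERM, etc. can be told apart.
        result["error"] = f"{exc.errno}: {exc.strerror}"
    return result


if __name__ == "__main__":
    print(probe_session_base())
```

If `mkdir_ok` comes back `False`, the `error` field should say whether the cause is lack of space or something else, such as a stale session directory owned by another user.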

I also noticed that the build uses OpenMPI 1.8.7. @prwolfe and/or @rrdrake reported to me that Sierra skipped over that version of OpenMPI in favor of 1.10.x, because 1.8.y caused tests to fail. 1.8 is also a retired version of OpenMPI. (So is 1.10, but at least it's a bit newer.) We should get rid of that build or update the OpenMPI version.
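
If we wanted a Dashboard build to flag an old OpenMPI automatically, a version check along these lines might work. It is a sketch under assumptions: it expects a banner of the form `mpirun (Open MPI) 1.8.7`, as printed by `mpirun --version`, and the function names and the 1.10 threshold are only illustrative (1.10 is taken from the Sierra comment above, not from any official policy).

```python
import re


def parse_openmpi_version(banner):
    """Extract (major, minor, patch) from text like 'mpirun (Open MPI) 1.8.7'.

    Returns None if no Open MPI version string is found.
    """
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_older_than(version, minimum=(1, 10, 0)):
    """Compare as integer tuples, so that 1.8 sorts before 1.10."""
    return version < minimum


if __name__ == "__main__":
    version = parse_openmpi_version("mpirun (Open MPI) 1.8.7")
    print(version, is_older_than(version))
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would put "1.10" before "1.8".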

Expectations

Nobody tests with OpenMPI versions that old. OpenMPI doesn't support them, and neither Sierra nor ATDM apps use them.

Possible Solution

  1. Eliminate that build, or
  2. update its OpenMPI version.