Address initial ATDM Trilinos failing tests on 'serrano' due to 'found no available cpus' and other system issues
Created by: bartlettroscoe
CC: @fryeguy52, @jmgate
Next Action Status
PR #2703 was merged on 5/9/2018, which should greatly reduce test failures due to this issue. None of these failures were observed in any 'serrano' test from 6/15/2018 to 6/20/2018.
Description
With the ATDM Trilinos tests now submitting from 'serrano' to CDash (see TRIL-204), we are seeing a lot of test failures, for example today at:
with the following numbers of failing and passing tests:
| Build Name | Fail | Pass |
|---|---|---|
| Trilinos-atdm-toss3-intel-debug-openmp | 402 | 1377 |
| Trilinos-atdm-toss3-intel-debug-openmp-panzer | 29 | 127 |
| Trilinos-atdm-toss3-intel-opt-openmp | 402 | 1379 |
| Trilinos-atdm-toss3-intel-opt-openmp-panzer | 29 | 127 |
Many of these failing tests show failures like:
--------------------------------------------------------------------------
While computing bindings, we found no available cpus on
the following node:
Node: ser58
Please check your allocation.
--------------------------------------------------------------------------
The test executables with MPI are being run with:
mpiexec -np <NP> -map-by socket:PE=16 --oversubscribe <exec-name>
and the level of ctest parallelism being used is only CTEST_PARALLEL_LEVEL=8.
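To put rough numbers on the settings above, here is a minimal sketch of the core-count arithmetic. It is not a diagnosis of the failures. It assumes that ctest fills all 8 slots with MPI ranks at once and that each rank is bound to 16 processing elements, as `PE=16` requests. The node size `CORES_PER_NODE = 36` and the helper names `peak_core_demand` and `exceeds_node` are hypothetical and do not come from this issue.

```python
# Illustrative arithmetic only, not a diagnosis of the 'serrano' failures.
# ASSUMPTION: CORES_PER_NODE is a hypothetical value, not stated in the issue.
CORES_PER_NODE = 36


def peak_core_demand(ctest_parallel_level: int, pes_per_rank: int) -> int:
    """Cores requested if every ctest slot holds one MPI rank and each
    rank is bound to pes_per_rank processing elements (PE=<n>)."""
    return ctest_parallel_level * pes_per_rank


def exceeds_node(demand: int, cores_per_node: int = CORES_PER_NODE) -> bool:
    """True when the requested cores cannot all be distinct on one node."""
    return demand > cores_per_node


if __name__ == "__main__":
    # Values taken from the issue: CTEST_PARALLEL_LEVEL=8 and PE=16.
    demand = peak_core_demand(ctest_parallel_level=8, pes_per_rank=16)
    print(f"peak demand: {demand} cores; node has {CORES_PER_NODE} (assumed)")
    print("exceeds node:", exceeds_node(demand))
```

Under these assumptions, concurrently running tests can ask for far more bound cores than a single node provides, which is at least consistent with a binding step that reports no available cpus.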