New SEACAS tests failing in ATDM Trilinos builds starting on 7/19/2018 and 7/23/2018
Created by: bartlettroscoe
CC: @trilinos/seacas, @gsjaardema (pushed breaking commits?), @kddevin (Trilinos Data Services Product Lead)
Next Action Status
PR #3213, merged on 8/1/2018 and later fixed in PR #3251 (merged 8/8/2018), disabled most of these tests in the 'mutrino' builds as of 8/2/2018. As of 8/29/2018, there have been no test failures since 8/8/2018.
Description
As shown in this query for the builds today, the tests:
- SEACASAprepro_aprepro_array_test
- SEACASAprepro_aprepro_command_line_include_test
- SEACASAprepro_aprepro_command_line_vars_test
- SEACASAprepro_aprepro_unit_test
- SEACASAprepro_lib_aprepro_lib_array_test
- SEACASAprepro_lib_aprepro_lib_unit_test
- SEACASExodus_exodus_unit_tests_nc5_env
are failing in the builds:
- Trilinos-atdm-mutrino-intel-debug-openmp
- Trilinos-atdm-mutrino-intel-opt-openmp
and the tests:
- SEACASIoss_exodus32_to_exodus32
- SEACASIoss_exodus32_to_exodus32_pnetcdf
- SEACASIoss_exodus32_to_exodus64
are failing in the builds:
- Trilinos-atdm-hansen-shiller-cuda-8.0-debug
- Trilinos-atdm-hansen-shiller-cuda-8.0-opt
As shown in this query for failing SEACAS tests going back to 7/10/2018, the test SEACASExodus_exodus_unit_tests_nc5_env started failing on 7/19/2018 and the other tests started failing on 7/23/2018. Several PRs by @gsjaardema were merged in the days before these dates, so it is not clear which changes caused the new failures, but it seems likely that one or more of the commits in these merged PRs triggered them.
Also, the test SEACASAprepro_aprepro_test_dump_reread, added in one of these PRs, appeared on 7/23/2018 and then started randomly failing, as shown in this query. When the test passes, as shown here, it shows:
================================================================================
TEST_3
Running: "diff" "-w" "test-filter.dump" "test-reread.dump"
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
TEST_3: Return code = 0
TEST_3: Pass criteria = Zero return code [PASSED]
TEST_3: Result = PASSED
================================================================================
When it fails, as shown here, it shows:
================================================================================
TEST_3
Running: "diff" "-w" "test-filter.dump" "test-reread.dump"
--------------------------------------------------------------------------------
1,2c1,2
< Thu Aug 2 11:20:16 2018: [unset]:_pmi_alps_init:alps_get_placement_info returned with error -1
< Thu Aug 2 11:20:16 2018: [unset]:_pmi_init:_pmi_alps_init returned -1
---
> Thu Aug 2 11:20:17 2018: [unset]:_pmi_alps_init:alps_get_placement_info returned with error -1
> Thu Aug 2 11:20:17 2018: [unset]:_pmi_init:_pmi_alps_init returned -1
--------------------------------------------------------------------------------
TEST_3: Return code = 1
TEST_3: Pass criteria = Zero return code [FAILED]
TEST_3: Result = FAILED
================================================================================
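In the failing output above, the only differing lines are the PMI/ALPS warnings, whose timestamps are one second apart (11:20:16 vs. 11:20:17). One possible way to make the comparison insensitive to them is to filter those lines out of both files before running the diff. The following is a minimal sketch under that assumption, not the fix that was actually applied: the helper name strip_pmi_noise and the stand-in dump files are hypothetical, and the real test would operate on test-filter.dump and test-reread.dump.

```shell
#!/bin/sh
# Sketch only: strip_pmi_noise is a hypothetical helper, not part of SEACAS.
# It drops the timestamped PMI/ALPS warning lines seen in the failing diff.
strip_pmi_noise() {
  grep -v -e '_pmi_alps_init' -e '_pmi_init' "$1"
}

# Stand-in dump files: identical content, warnings stamped one second apart.
workdir=$(mktemp -d)
for sec in 16 17; do
  {
    echo "Thu Aug 2 11:20:$sec 2018: [unset]:_pmi_alps_init:alps_get_placement_info returned with error -1"
    echo "Thu Aug 2 11:20:$sec 2018: [unset]:_pmi_init:_pmi_alps_init returned -1"
    echo "(stand-in for the real dump contents)"
  } > "$workdir/dump-$sec.txt"
done

# Raw comparison, as in TEST_3: the files differ only in the timestamps.
if diff -w "$workdir/dump-16.txt" "$workdir/dump-17.txt" >/dev/null; then
  raw_result=PASSED
else
  raw_result=FAILED
fi

# Filtered comparison: the warning lines are removed from both sides first.
strip_pmi_noise "$workdir/dump-16.txt" > "$workdir/filter.clean"
strip_pmi_noise "$workdir/dump-17.txt" > "$workdir/reread.clean"
if diff -w "$workdir/filter.clean" "$workdir/reread.clean" >/dev/null; then
  filtered_result=PASSED
else
  filtered_result=FAILED
fi

echo "raw: $raw_result, filtered: $filtered_result"
rm -rf "$workdir"
```

With the stand-in files, the raw diff fails and the filtered diff passes. Filtering by message text rather than by line position keeps the check valid even if the warnings appear in only one of the two files.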
Steps to reproduce
These failures should be reproducible on the machines 'hansen' or 'shiller' and 'mutrino' using the instructions in:
For example, for the failures on 'hansen'/'shiller', the specific instructions are given at:
After cloning Trilinos, the following commands should reproduce the test failures on 'hansen' or 'shiller':
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-8.0-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnvSettings.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_SEACAS=ON \
$TRILINOS_DIR
$ make NP=16
$ srun ctest -j16
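To rerun only the three SEACASIoss tests reported as failing on 'hansen'/'shiller' rather than the full suite, the ctest call above can be narrowed with a name regex. This is a sketch, not part of the documented ATDM instructions: the variable name failing_regex is illustrative, and the guard simply skips the run when srun, ctest, or a configured build directory is not available.

```shell
#!/bin/sh
# Regex matching exactly the three Ioss tests listed in this issue.
failing_regex='^SEACASIoss_exodus32_to_(exodus32|exodus32_pnetcdf|exodus64)$'

# Only attempt the run from a configured build directory on a machine
# that provides srun and ctest; otherwise just print the command.
if command -v srun >/dev/null 2>&1 && command -v ctest >/dev/null 2>&1 \
   && [ -f CTestTestfile.cmake ]; then
  # -R selects tests by name regex; --output-on-failure prints failing logs.
  srun ctest -j16 -R "$failing_regex" --output-on-failure
else
  echo "Would run: srun ctest -j16 -R '$failing_regex' --output-on-failure"
fi
```

Anchoring the regex with ^ and $ keeps it from also selecting other tests whose names merely contain one of these strings.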