Trilinos issues: https://gitlab.osti.gov/jmwille/Trilinos/-/issues

# Issue #4929: Link problems with libmuelu breaking most ATDM Trilinos builds starting 4/17/2019
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4929 (James Willenbring, updated 2020-07-22)

*Created by: bartlettroscoe*
CC: @trilinos/muelu , @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
<Checklist>
<???: Add label "client: ATDM">
<???: Add label "ATDM Sev: Blocker" (by default but could be other "ATDM Sev: XXX")>
<???: Add label "type: bug"?>
<???: Add label for affected packages (e.g. "pkg: MueLu", "pkg: Tpetra", "pkg: Kokkos", etc.)>
<???: Add label "PA: ???Project Area???" (e.g. "PA: Linear Solvers", "PA: Data Services")>
<???: Add milestone "Initial cleanup of new ATDM ..." or "Keep promoted ATDM ...">
<???: Once GitHub Issue is created, add entries for tests to TrilinosATDMStatus/*.csv files>
## Next Action Status
<status-and-or-first-action>
## Description
As shown in [this query](https://testing.sandia.gov/cdash-dev-view/index.php?project=Trilinos&date=2019-04-17&filtercount=1&showfilters=1&field1=buildname&compare1=65&value1=Trilinos-atdm-), most ATDM Trilinos builds are failing with link errors related to the muelu library. For example, [this build](https://testing.sandia.gov/cdash-dev-view/viewBuildError.php?buildid=4904861) shows link errors like:
```
packages/muelu/src/libmuelu.a(MueLu_CoalesceDropFactory.cpp.o):(.rodata._ZTVN5MueLu7LWGraphIixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE[_ZTVN5MueLu7LWGraphIixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE]+0xa8): undefined reference to `MueLu::LWGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::print(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&, int) const'
packages/muelu/src/libmuelu.a(MueLu_CoalesceDropFactory.cpp.o):(.rodata._ZTVN5MueLu5GraphIixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE[_ZTVN5MueLu5GraphIixN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE]+0xa8): undefined reference to `MueLu::Graph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::print(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&, int) const'
packages/muelu/src/libmuelu.a(MueLu_CoalesceDropFactory.cpp.o):(.rodata._ZTVN5MueLu7LWGraphIiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE[_ZTVN5MueLu7LWGraphIiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE]+0xa8): undefined reference to `MueLu::LWGraph<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::print(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&, int) const'
packages/muelu/src/libmuelu.a(MueLu_CoalesceDropFactory.cpp.o):(.rodata._ZTVN5MueLu5GraphIiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE[_ZTVN5MueLu5GraphIiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEE]+0xa8): undefined reference to `MueLu::Graph<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::print(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&, int) const'
collect2: error: ld returned 1 exit status
```
## Steps to Reproduce
One should be able to reproduce this failure on many of the systems as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
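This issue does not list exact commands; following the reproduction pattern used in the sibling ATDM issues, a configure that enables only MueLu should be enough to hit the link error (the build name below is a placeholder for any of the failing ATDM builds, not a name verified here):
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh <failing-atdm-build-name>
$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
  $TRILINOS_DIR
$ make NP=16
```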
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #4982: MueLu: MueLu_Helmholtz2DParallel_MPI_4 failing on ATDM complex builds
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4982 (James Willenbring, updated 2019-06-08)

*Created by: fryeguy52*
## Bug Report
CC: @trilinos/muelu, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=buildname&compare2=63&value2=-complex-&field3=testname&compare3=65&value3=MueLu_Helmholtz2DParallel_MPI_4&field4=buildstarttime&compare4=84&value4=2019-04-22T00%3A00%3A00&field5=buildstarttime&compare5=83&value5=2019-03-23T00%3A00%3A00) the test:
* MueLu_Helmholtz2DParallel_MPI_4
has been failing since 2019-01-11 in the builds:
* Trilinos-atdm-sems-rhel7-intel-17.0.1-openmp-complex-shared-debug
* Trilinos-atdm-sems-rhel7-gnu-7.2.0-openmp-complex-shared-release-debug
* Trilinos-atdm-sems-rhel7-intel-17.0.1-openmp-complex-shared-release-debug
* Trilinos-atdm-sems-rhel7-clang-3.9.0-openmp-complex-shared-release-debug
New commits on 2019-04-11 can be found [here](https://testing.sandia.gov/cdash/viewNotes.php?buildid=4867040#!#note2).
## Current Status on CDash
[Results for the current testing day](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=buildname&compare2=63&value2=complex&field3=testname&compare3=65&value3=MueLu_Helmholtz2DParallel_MPI_4&field4=buildstarttime&compare4=84&value4=today&field5=buildstarttime&compare5=83&value5=yesterday)
## Steps to Reproduce
One should be able to reproduce this failure with a sems rhel6 environment as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for a sems rhel6 environment are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#sems-rhel6-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel6-gnu-7.2.0-openmp-complex-shared-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j8
```
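After the build, the single failing test can be run directly with standard ctest options (this command is illustrative, not taken from the original report):
```
$ ctest -R MueLu_Helmholtz2DParallel_MPI_4 --output-on-failure
```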
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #4989: MueLu: MueLu_Maxwell3D-Tpetra_MPI_4 failing on ATDM complex build
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4989 (James Willenbring, updated 2019-06-08)

*Created by: fryeguy52*
## Bug Report
CC: @trilinos/muelu, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
<Checklist>
<???: Add label "ATDM">
<???: Add label "bug"?>
<???: Add label for affected packages (e.g. "MueLu", "Tpetra", "Kokkos", etc.)>
<???: Add milestone "Initial cleanup of new ATDM builds of Trilinos" or "Keep promoted ATDM builds of Trilinos clean">
<???: Once GitHub Issue is created, add entries for tests to TrilinosATDMStatus/*.csv files>
<???: Add label "PA: ???Project Area???" (e.g. "PA: Linear Solvers", "PA: Data Services")>
## Next Action Status
<status-and-or-first-action>
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-sems-rhel7-intel-17.0.1-openmp-complex-shared-release-debug&field2=testname&compare2=61&value2=MueLu_Maxwell3D-Tpetra_MPI_4&field3=site&compare3=61&value3=sems-rhel7&field4=buildstarttime&compare4=84&value4=2019-04-22T00%3A00%3A00&field5=buildstarttime&compare5=83&value5=2019-03-23T00%3A00%3A00) the test:
* MueLu_Maxwell3D-Tpetra_MPI_4
is failing in the build:
* Trilinos-atdm-sems-rhel7-intel-17.0.1-openmp-complex-shared-release-debug
## Current Status on CDash
[Test results last 5 days](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-sems-rhel7-intel-17.0.1-openmp-complex-shared-release-debug&field2=testname&compare2=61&value2=MueLu_Maxwell3D-Tpetra_MPI_4&field3=site&compare3=61&value3=sems-rhel7&field4=buildstarttime&compare4=83&value4=5%20days%20ago)
## Steps to Reproduce
One should be able to reproduce this failure with a sems rhel6 environment as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for a sems rhel6 environment are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#sems-rhel6-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel7-intel-17.0.1-openmp-complex-shared-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j8
```
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #4646: Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4 timing out on waterman cuda builds
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4646 (James Willenbring, updated 2019-04-21)

*Created by: fryeguy52*
CC: @trilinos/ifpack2, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
<status-and-or-first-action>
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2019-03-17&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=6&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=testname&compare2=61&value2=Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4&field3=site&compare3=61&value3=waterman&field4=buildname&compare4=63&value4=cuda&field5=buildstarttime&compare5=83&value5=2019-03-15&field6=buildstarttime&compare6=84&value6=today) the test:
* Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4
is timing out in the builds:
* Trilinos-atdm-waterman-cuda-9.2-opt
* Trilinos-atdm-waterman-cuda-9.2-debug
* Trilinos-atdm-waterman-cuda-9.2-release-debug
The same test in the white cuda builds finishes in about 30 seconds, as shown [here](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2019-03-17&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=6&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=testname&compare2=61&value2=Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4&field3=site&compare3=61&value3=white&field4=buildname&compare4=63&value4=cuda&field5=buildstarttime&compare5=83&value5=2019-03-15&field6=buildstarttime&compare6=84&value6=today).
## Current Status on CDash
[Current status](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2019-03-17&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=6&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=testname&compare2=61&value2=Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4&field3=site&compare3=61&value3=waterman&field4=buildname&compare4=63&value4=cuda&field5=buildstarttime&compare5=83&value5=yesterday&field6=buildstarttime&compare6=84&value6=today)
## Steps to Reproduce
One should be able to reproduce this failure on waterman as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for waterman are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#waterman
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-waterman-cuda-9.2-opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Ifpack2=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -n 20 ctest -j20
```
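To investigate the timeout itself, it may help to rerun just this test with a longer timeout and verbose output; the 600-second limit here is an arbitrary choice, not a value from the report:
```
$ bsub -x -Is -n 20 ctest -R Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4 --timeout 600 -V
```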
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #4353: Ifpack2_unit_tests_MPI_4 randomly failing on ATDM waterman build
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4353 (James Willenbring, updated 2019-04-21)

*Created by: fryeguy52*
CC: @trilinos/ifpack2, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
<Checklist>
<???: Add label "ATDM">
<???: Add label "bug"?>
<???: Add label for affected packages (e.g. "MueLu", "Tpetra", "Kokkos", etc.)>
<???: Add milestone "Initial cleanup of new ATDM builds of Trilinos" or "Keep promoted ATDM builds of Trilinos clean">
<???: Once GitHub Issue is created, add entries for tests to TrilinosATDMStatus/*.csv files>
<???: Add label "PA: ???Project Area???" (e.g. "PA: Linear Solvers", "PA: Data Services")>
## Next Action Status
<status-and-or-first-action>
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-waterman-cuda-9.2-opt&field2=testname&compare2=61&value2=Ifpack2_unit_tests_MPI_4&field3=site&compare3=61&value3=waterman&field4=buildstarttime&compare4=84&value4=2019-02-08T00%3A00%3A00&field5=buildstarttime&compare5=83&value5=2018-12-27T00%3A00%3A00) the test:
* Ifpack2_unit_tests_MPI_4
is randomly failing in the build:
* Trilinos-atdm-waterman-cuda-9.2-opt
It has failed roughly 6 times in the last month. Here are some examples of the output when it fails:
```
Error, relErr(Y.get1dView ()[9932],Z.get1dView ()[9932]) = relErr(29832,0) = 1 <= tol = 2.22045e-12: failed!
```
```
p=0 | The following tests FAILED:
p=0 | 48. Ifpack2OverlappingRowMatrix_default_scalar_type_default_local_ordinal_type_default_global_ordinal_type_Test0_UnitTest ...
p=0 |
p=0 | Total Time: 6.49 sec
p=0 |
p=1 | Summary: total = 82, run = 82, passed = 81, failed = 1
p=1 |
p=1 | End Result: TEST FAILED
```
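The relErr value in the first failure message can be checked by hand. Assuming the usual relative-error definition relErr(a, b) = |a - b| / max(|a|, |b|) (the exact Teuchos formula may differ in its guard terms), relErr(29832, 0) works out to exactly 1, which is why it fails the 2.22045e-12 tolerance so badly:
```
$ awk 'BEGIN { a = 29832; b = 0; d = (a > b ? a - b : b - a); m = (a > b ? a : b); print d / m }'
1
```
In other words, one entry of Y is 29832 where the corresponding entry of Z is 0, so this looks like a wrong answer rather than a rounding-level discrepancy.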
## Current Status on CDash
[2 Week history of this test](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-waterman-cuda-9.2-opt&field2=testname&compare2=61&value2=Ifpack2_unit_tests_MPI_4&field3=site&compare3=61&value3=waterman&field4=buildstarttime&compare4=84&value4=tomorrow&field5=buildstarttime&compare5=83&value5=2%20weeks%20ago)
## Steps to Reproduce
One should be able to reproduce the build on waterman where this test is randomly failing, as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for waterman are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#waterman
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-waterman-cuda-9.2-opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Ifpack2=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -n 20 ctest -j20
```
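Because the failure is intermittent (roughly 6 times in a month of nightly runs), reproducing it likely requires looping; a plain shell loop such as the following (illustrative, not from the report) reruns the test until it fails:
```
$ for i in $(seq 1 20); do bsub -x -Is -n 20 ctest -R Ifpack2_unit_tests_MPI_4 || break; done
```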
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #4260: Belos tests timing out on ATDM intel-18 mpich build
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4260 (James Willenbring, updated 2019-03-27)

*Created by: fryeguy52*
CC: @trilinos/belos, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
<status-and-or-first-action>
## Description
As shown in the links below, the tests:
* [Belos_rcg_hb_MPI_4](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt&field2=buildstarttime&compare2=84&value2=2019-01-25&field3=buildstarttime&compare3=83&value3=2019-01-01&field4=testname&compare4=61&value4=Belos_rcg_hb_MPI_4)
* [Belos_gcrodr_hb_MPI_4](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt&field2=buildstarttime&compare2=84&value2=2019-01-25&field3=buildstarttime&compare3=83&value3=2019-01-01&field4=testname&compare4=61&value4=Belos_gcrodr_hb_MPI_4)
are randomly timing out in the build:
* Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt
## Current Status on CDash
The current status of the Belos tests on this build for the current testing day can be found [here](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt&field2=buildstarttime&compare2=84&value2=today&field3=buildstarttime&compare3=83&value3=yesterday&field4=testname&compare4=65&value4=Belos_)
## Steps to Reproduce
One should be able to reproduce a build where this is randomly failing on a machine with a cee rhel6 environment as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for a machine with a cee rhel6 environment are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#cee-rhel6-environment
The exact commands to reproduce the build where this issue is randomly occurring should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Belos=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j16
```
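Since the timeouts are random, one option is to repeat just the two affected tests until one hangs, using ctest's `--repeat-until-fail` option (the regex, count, and timeout here are arbitrary choices, not from the report):
```
$ ctest -R 'Belos_(rcg|gcrodr)_hb_MPI_4' --timeout 600 --repeat-until-fail 10
```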
Milestone: Initial cleanup of new ATDM builds of Trilinos

# Issue #4678: Stratimikos and Rythmos tests failing on many ATDM builds
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4678 (James Willenbring, updated 2019-03-26)

*Created by: fryeguy52*
CC: @trilinos/stratimikos, @srajama1 (Trilinos Linear Solvers Product Lead), @rppawlo (Trilinos Nonlinear Solvers Product Lead), @bartlettroscoe, @fryeguy52
<Checklist>
<???: Add label "ATDM">
<???: Add label "bug"?>
<???: Add label for affected packages (e.g. "MueLu", "Tpetra", "Kokkos", etc.)>
<???: Add milestone "Initial cleanup of new ATDM builds of Trilinos" or "Keep promoted ATDM builds of Trilinos clean">
<???: Once GitHub Issue is created, add entries for tests to TrilinosATDMStatus/*.csv files>
<???: Add label "PA: ???Project Area???" (e.g. "PA: Linear Solvers", "PA: Data Services")>
## Next Action Status
<status-and-or-first-action>
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2019-03-20&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=status&compare2=61&value2=Failed&field3=testname&compare3=62&value3=Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4&field4=buildstarttime&compare4=83&value4=2019-03-20&field5=buildstarttime&compare5=84&value5=2019-03-21) the tests:
* Stratimikos_test_single_stratimikos_solver_driver_belos_np_MPI_1
* Stratimikos_test_single_stratimikos_solver_driver_belos_ml_MPI_1
* Stratimikos_test_single_stratimikos_solver_driver_belos_ifpack_MPI_1
* Rythmos_timeDiscretizedBackwardEuler_amesos_MPI_1
are failing in many ATDM builds.
New commits from when these tests started failing can be seen [here](https://testing.sandia.gov/cdash/viewNotes.php?buildid=4754139#!#note4).
## Current Status on CDash
Currently failing tests in ATDM builds can be seen [here](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2019-03-20&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=status&compare2=61&value2=Failed&field3=buildstarttime&compare3=83&value3=today&field4=buildstarttime&compare4=84&value4=tomorrow).
## Steps to Reproduce
One should be able to reproduce this failure with a sems rhel6 environment as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for a sems rhel6 environment are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#sems-rhel6-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel6-gnu-7.2.0-openmp-release
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON \
-DTrilinos_ENABLE_Stratimikos=ON \
-DTrilinos_ENABLE_Rythmos=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j8
```
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #994: Belos: Change GMRES default orthogonalizer from DGKS to ICGS 2-pass
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/994 (James Willenbring, updated 2019-03-22)

*Created by: mhoemmen*
@trilinos/belos @jjellio @hkthorn

Milestone: Tpetra-backlog

# Issue #4622: Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4 failing in ATDM cuda builds
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4622 (James Willenbring, updated 2019-03-18)

*Created by: fryeguy52*
CC: @trilinos/ifpack2, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2019-03-14&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=6&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=buildname&compare2=64&value2=-rdc-&field3=testname&compare3=61&value3=Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4&field4=status&compare4=62&value4=Passed&field5=buildstarttime&compare5=83&value5=2019-03-14&field6=buildstarttime&compare6=84&value6=2019-03-15) the test:
* Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4
is failing in the builds:
* Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug
* Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-shared-release-debug
* Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-release-debug
* Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug
* Trilinos-atdm-waterman-cuda-9.2-release-debug
* Trilinos-atdm-waterman-cuda-9.2-rdc-shared-release-debug
* Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug
* Trilinos-atdm-waterman-cuda-9.2-opt
* Trilinos-atdm-waterman-cuda-9.2-debug
* Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug
* Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug
## Current Status on CDash
[Currently Status on CDash for all ATDM builds](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4&field3=buildstarttime&compare3=84&value3=today&field4=buildstarttime&compare4=83&value4=yesterday)
## Steps to Reproduce on waterman
One should be able to reproduce this failure on waterman as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for waterman are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#waterman
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-waterman-cuda-9.2-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Ifpack2=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -n 20 ctest -j20
```
## Steps to Reproduce on white/ride
One should be able to reproduce this failure on ride or white as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for ride or white are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#ridewhite
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Ifpack2=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
```
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #3897: MueLu_UnitTests[Blocked][Epetra|Tpetra]_MPI_4 failing randomly on several ATDM builds
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3897 (James Willenbring, updated 2019-02-11)

*Created by: fryeguy52*
CC: @trilinos/MueLu, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe
## Next Action Status
PR #4046, merged to 'develop' on 12/18/2018, may fix these random failures. Next: watch over the coming days and weeks to see if any more failures occur ...
## Description
As shown in the links below, the tests:
* [MueLu_UnitTestsBlockedEpetra_MPI_4](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=buildname&compare2=62&value2=Trilinos-atdm-cee-rhel6-gnu-7.2.0-opt-serial&field3=testname&compare3=61&value3=MueLu_UnitTestsBlockedEpetra_MPI_4&field4=status&compare4=61&value4=failed&field5=buildstarttime&compare5=83&value5=2018-10-17)
* [MueLu_UnitTestsEpetra_MPI_4](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=MueLu_UnitTestsEpetra_MPI_4&field3=status&compare3=61&value3=failed&field4=details&compare4=64&value4=Timeout&field5=buildstarttime&compare5=83&value5=2018-10-16)
* [MueLu_UnitTestsEpetra_MPI_1](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=MueLu_UnitTestsEpetra_MPI_1&field3=status&compare3=61&value3=failed&field4=details&compare4=64&value4=Timeout&field5=buildstarttime&compare5=83&value5=2018-10-16)
* [MueLu_UnitTestsTpetra_MPI_1](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=MueLu_UnitTestsTpetra_MPI_1&field3=status&compare3=61&value3=failed&field4=details&compare4=64&value4=Timeout&field5=buildstarttime&compare5=83&value5=2018-10-16)
* [MueLu_UnitTestsTpetra_MPI_1](https://testing.sandia.gov/cdash/testDetails.php?test=61339618&build=4287170)
are randomly failing across several builds. They have failed several times in the last month on different builds. The builds where we have seen failures are:
* Trilinos-atdm-cee-rhel6-gnu-4.9.3-opt-serial
* Trilinos-atdm-cee-rhel6-gnu-opt-serial
* Trilinos-atdm-cee-rhel6-intel-opt-serial
* Trilinos-atdm-hansen-shiller-gnu-opt-openmp
* Trilinos-atdm-hansen-shiller-gnu-opt-serial
* Trilinos-atdm-hansen-shiller-intel-debug-openmp
* Trilinos-atdm-hansen-shiller-intel-debug-serial
* Trilinos-atdm-mutrino-intel-opt-openmp-HSW
* Trilinos-atdm-mutrino-intel-opt-openmp-KNL
* Trilinos-atdm-sems-rhel6-gnu-debug-openmp
* Trilinos-atdm-sems-rhel6-intel-opt-openmp
* Trilinos-atdm-serrano-intel-opt-openmp
* Trilinos-atdm-waterman-gnu-opt-openmp
* Trilinos-atdm-waterman-gnu-release-debug-openmp
* Trilinos-atdm-white-ride-cuda-9.2-opt
* Trilinos-atdm-white-ride-gnu-opt-openmp
It looks like in each case something similar to the following appears in the 'openmp' builds:
```
...
p=0: *** Caught standard std::exception of type 'Xpetra::Exceptions::RuntimeError' :
EpetraExt::MatrixMarketFileToCrsMatrix return value of -1
[FAILED] (0.0902 sec) Hierarchy_double_int_int_Kokkos_Compat_KokkosOpenMPWrapperNode_Write_UnitTest
Location: /home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-intel-debug-openmp/SRC_AND_BUILD/Trilinos/packages/muelu/test/unit_tests/Hierarchy.cpp:889
...
The following tests FAILED:
116. Hierarchy_double_int_int_Kokkos_Compat_KokkosOpenMPWrapperNode_Write_UnitTest ...
...
```
and the 'serial' builds show:
```
...
p=0: *** Caught standard std::exception of type 'Xpetra::Exceptions::RuntimeError' :
EpetraExt::MatrixMarketFileToCrsMatrix return value of -1
[FAILED] (0.00618 sec) Hierarchy_double_int_int_Kokkos_Compat_KokkosSerialWrapperNode_Write_UnitTest
Location: /jenkins/slave/workspace/Trilinos-atdm-sems-rhel6-gnu-debug-serial/SRC_AND_BUILD/Trilinos/packages/muelu/test/unit_tests/Hierarchy.cpp:889
...
The following tests FAILED:
116. Hierarchy_double_int_int_Kokkos_Compat_KokkosSerialWrapperNode_Write_UnitTest ...
...
```
It is just the one failing unit test (number 116), called `Hierarchy_double_int_int_Kokkos_Compat_KokkosSerialWrapperNode_Write_UnitTest` in the 'serial' builds and `Hierarchy_double_int_int_Kokkos_Compat_KokkosOpenMPWrapperNode_Write_UnitTest` in the 'openmp' builds.
The first failure showed up on 2018-10-21.
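Since the failure is random, re-running just this unit test repeatedly in a configured build directory may help to trigger it locally. A sketch (one `ctest` regex covers both the 'serial' and 'openmp' variants; `--repeat until-fail` needs CTest 3.17 or newer):

```shell
# One regex matches the failing test under both node types; the grep below
# just confirms the pattern against the two reported test names (prints 2).
REGEX='Hierarchy_double_int_int_Kokkos_Compat_Kokkos(Serial|OpenMP)WrapperNode_Write_UnitTest'
printf '%s\n' \
  Hierarchy_double_int_int_Kokkos_Compat_KokkosSerialWrapperNode_Write_UnitTest \
  Hierarchy_double_int_int_Kokkos_Compat_KokkosOpenMPWrapperNode_Write_UnitTest \
  | grep -Ec "$REGEX"
# In a configured build directory one could then run, for example:
#   ctest -R "$REGEX" --repeat until-fail:50 --output-on-failure
```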
## Current Status on CDash
To see failures for these tests in the last month click [here](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=65&value2=MueLu_UnitTests&field3=status&compare3=61&value3=failed&field4=details&compare4=64&value4=Timeout&field5=buildstarttime&compare5=83&value5=30%20days%20ago).
## Steps to Reproduce
This may be very difficult to reproduce because it is failing infrequently on any single build but nearly every other day across all the builds. Instructions for reproducing ATDM builds can be found at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for ride or white are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#ridewhite
The exact commands to reproduce one build where this has failed on white or ride are:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-white-ride-gnu-opt-openmp
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
```

Keep promoted "ATDM" builds of Trilinos clean

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3541
ShyLU_DD tests build failure in targeted CUDA PR build Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt starting 10/2/2018
2019-01-28T18:04:16Z, James Willenbring

*Created by: bartlettroscoe*
CC: @trilinos/shylu, @srajama1 (Trilinos Linear Solvers Product Area Lead), @fryeguy52, @roeverf, @searhein
## Next Action Status
After the merge of PR #4248 to 'develop' on 1/23/2019, the ShyLU_DD build and tests in build `Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt` on 'ride' were 100% clean on 1/24/2019.
## Description
Starting today, there are two build errors for the ShyLU_DD package in the build `Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt` on 'white' and 'ride', as shown [here](https://testing.sandia.gov/cdash-dev-view/viewBuildError.php?buildid=3998259), with build errors starting with:
```
/home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt/SRC_AND_BUILD/Trilinos/packages/shylu/shylu_dd/frosch/test/Thyra_Tpetra/main.cpp(104): error: "EpetraNode" is ambiguous
```
ShyLU_DD was building just fine yesterday in this build as shown [here](https://testing.sandia.gov/cdash-dev-view/viewBuildError.php?type=0&buildid=3995410).
Looking at the commits pulled today, shown [here](https://testing.sandia.gov/cdash-dev-view/viewNotes.php?buildid=3998221#!#note6), it seems likely this was caused by one of the commits from @roeverf to the ShyLU_DD package in PR #3472, merged to 'develop' by @searhein on 10/1/2018 as shown [here](https://github.com/trilinos/Trilinos/pull/3472#event-1876847991).
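To narrow down the suspect commit, a pathspec-filtered `git log` over the pull window can list just the commits that touched ShyLU_DD (a sketch; run inside a Trilinos 'develop' checkout, with the dates taken from the report above):

```shell
# List commits from the suspect window that touched the ShyLU_DD sources
# (dates come from the report; run inside a Trilinos 'develop' clone).
git log --oneline --since=2018-09-30 --until=2018-10-03 \
    -- packages/shylu/shylu_dd
```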
This is an important build because we are targeting this build on 'white' and 'ride' as a Trilinos PR testing build (see #2464 ).
## Current Status on CDash
The current status of `ShyLU_DD` in this build and tests over the last few days can be seen in [this CDash query](https://testing.sandia.gov/cdash-dev-view/index.php?project=Trilinos&date=2019-01-22&filtercount=4&showfilters=1&filtercombine=and&field1=subprojects&compare1=93&value1=ShyLU_DD&field2=buildname&compare2=61&value2=Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt&field3=site&compare3=61&value3=ride&field4=buildstarttime&compare4=83&value4=1%20week%20ago).
## Steps to reproduce
One should be able to reproduce these build errors on either 'white' or 'ride' by cloning the Trilinos git repo, checking out the 'develop' branch, creating a build directory, and then doing:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnvAllPtPackages.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_ShyLU_DD=ON \
$TRILINOS_DIR
$ make NP=16
```
Initial cleanup of new ATDM builds of Trilinos

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3994
MueLu_Maxwell3D- tests not run due to build failure in ATDM build
2018-12-21T02:48:28Z, James Willenbring

*Created by: fryeguy52*
CC: @trilinos/muelu, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
Merge of PR #3993 on 12/4/2018 resulted in passing build on [12/5/2018](https://testing.sandia.gov/cdash-dev-view/index.php?project=Trilinos&parentid=4253654).
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercount=6&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt&field2=testname&compare2=65&value2=MueLu_Maxwell3D-&field3=testname&compare3=66&value3=_MPI_4&field4=site&compare4=61&value4=cee-rhel6&field5=buildstarttime&compare5=84&value5=2018-12-04T00%3A00%3A00&field6=buildstarttime&compare6=83&value6=2018-11-04T00%3A00%3A00) the following tests are not being run due to a [build failure](https://testing.sandia.gov/cdash/viewBuildError.php?buildid=4245202) that started on 12/01/2018:
* MueLu_Maxwell3D-Epetra_MPI_4
* MueLu_Maxwell3D-Tpetra-Stratimikos_MPI_4
* MueLu_Maxwell3D-Tpetra_MPI_4
in the build:
* Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt
The error occurs when building `packages/muelu/test/maxwell/CMakeFiles/MueLu_Maxwell3D.dir/Maxwell3D.cpp.o`
Standard error:
```
/scratch/rabartl/Trilinos.base/NightlyBuilds/Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt/SRC_AND_BUILD/Trilinos/packages/muelu/test/maxwell/Maxwell3D.cpp:262:11: error: no viable overloaded '='
tm2 = Teuchos::null;
~~~ ^ ~~~~~~~~~~~~~
/scratch/rabartl/Trilinos.base/NightlyBuilds/Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt/SRC_AND_BUILD/Trilinos/packages/teuchos/comm/src/Teuchos_TimeMonitor.hpp:178:34: note: candidate function (the implicit copy assignment operator) not viable: no known conversion from 'Teuchos::ENull' to 'const Teuchos::TimeMonitor' for 1st argument
class TEUCHOSCOMM_LIB_DLL_EXPORT TimeMonitor :
^
/scratch/rabartl/Trilinos.base/NightlyBuilds/Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt/SRC_AND_BUILD/Trilinos/packages/muelu/test/maxwell/Maxwell3D.cpp:274:11: error: no viable overloaded '='
tm3 = Teuchos::null;
~~~ ^ ~~~~~~~~~~~~~
/scratch/rabartl/Trilinos.base/NightlyBuilds/Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt/SRC_AND_BUILD/Trilinos/packages/teuchos/comm/src/Teuchos_TimeMonitor.hpp:178:34: note: candidate function (the implicit copy assignment operator) not viable: no known conversion from 'Teuchos::ENull' to 'const Teuchos::TimeMonitor' for 1st argument
class TEUCHOSCOMM_LIB_DLL_EXPORT TimeMonitor :
^
2 errors generated.
```
## Current Status on CDash
The current status of these tests/builds for the current testing day can be found [here](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt&field2=testname&compare2=65&value2=MueLu_Maxwell3D-&field3=testname&compare3=66&value3=_MPI_4&field4=site&compare4=61&value4=cee-rhel6)
## Steps to Reproduce
One should be able to reproduce this failure on a machine with a cee rhel6 environment as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for a machine with a cee rhel6 environment are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#cee-rhel6-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j16
```
Keep promoted "ATDM" builds of Trilinos clean

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3405
Tpetra_ASSUME_CUDA_AWARE_MPI affecting test performance on white
2018-12-20T21:49:06Z, James Willenbring

*Created by: kddevin*
@trilinos/tpetra @trilinos/zoltan2 @ndellingwood @kyungjoo-kim @nmhamster @crtrott
## Next Action Status
PR #3500, merged to 'develop' on 12/6/2018, enables `Tpetra_ASSUME_CUDA_AWARE_MPI=ON` by default for ATDM (and all) CUDA Trilinos builds. Next, watch the ATDM Trilinos builds and the EMPIRE builds over the next few days to see what happens ...
## Description
Using devpack/20180521/openmpi/3.1.0/gcc/7.2.0/cuda/9.2.88 on white, the behavior of tests differs depending on the setting of Tpetra_ASSUME_CUDA_AWARE_MPI.
With Tpetra_ASSUME_CUDA_AWARE_MPI=ON, there are segfaults in MPI_Send called from Tpetra::Distributor::doPosts.
With Tpetra_ASSUME_CUDA_AWARE_MPI=OFF, the tests run fine. (Note that ATDM testing uses this setting.)
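One quick sanity check when the ON setting segfaults is whether the loaded OpenMPI was actually built CUDA-aware; OpenMPI reports this through `ompi_info` (this is the standard OpenMPI parameter name, though whether it applies to the devpack MPI on 'white' is an assumption):

```shell
# Ask OpenMPI whether it was built with CUDA support; prints a line ending
# in ":value:true" when CUDA-aware (":value:false" or the fallback otherwise).
ompi_info --parsable --all 2>/dev/null \
  | grep -F 'mpi_built_with_cuda_support:value' \
  || echo "ompi_info not found (is an OpenMPI module loaded?)"
```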
I don't know whether this problem is caused by OpenMPI 3 (see #3356), or a problem with Tpetra, or something that I'm doing wrong with my configure/build. @crtrott wrote the Tpetra test script that I am using (see info below), and it worked without errors in March with now obsolete devpack/openmpi/1.10.4/gcc/5.4.0/cuda/8.0.44.
I've had difficulty getting Trilinos to configure and build with devpacks using OpenMPI 2.
I will continue to try to get an OpenMPI 2 version to work. I welcome an OpenMPI 2 build script that works on white; please share if you have one. Thanks.
## Motivation and Context
We need Tpetra to work for Zoltan2 testing.
The segfaults from the Tpetra::Distributor's MPI_Send occur in Zoltan2 testing as well; setting Tpetra_ASSUME_CUDA_AWARE_MPI=OFF allows Zoltan2 tests to run.
## Steps to Reproduce
I am running the Tpetra test script described at https://github.com/trilinos/Trilinos/wiki/Tpetra-test-script. I changed the devpack in the script to devpack/20180521/openmpi/3.1.0/gcc/7.2.0/cuda/9.2.88, because the original devpack in the script (devpack/openmpi/1.10.4/gcc/5.4.0/cuda/8.0.44) is no longer available on white. I compared tests with -DTpetra_ASSUME_CUDA_AWARE_MPI=ON and OFF.
On white, in my home directory,
Trilinos/Obj_white/Test_white_2018_09_06_22.34.48 has tests with Tpetra_ASSUME_CUDA_AWARE_MPI=OFF, and
Trilinos/Obj_white/Test_white_2018_09_06_15.35.24 has tests with Tpetra_ASSUME_CUDA_AWARE_MPI=ON.
The usual Testing/Temporary/LastTest.log shows the segfaults for the case with Tpetra_ASSUME_CUDA_AWARE_MPI=ON, and the passing tests with it OFF.
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3640
MueLu_UnitTestsBlockedEpetra_MPI_1 failing on ATDM cee-rhel6-clang-opt-serial build
2018-12-20T18:23:36Z, James Willenbring

*Created by: fryeguy52*
CC: @trilinos/muelu , @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe
## Next Action Status
PR #4072 merged to 'develop' on 12/19/2018 for test `MueLu_UnitTestsBlockedEpetra_MPI_1`, which was failing every day. The test passed on 12/20/2018.
## Description
As shown in [this query](https://testing.sandia.gov/cdash-dev-view/queryTests.php?project=Trilinos&date=2018-10-15&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-cee-rhel6-&field2=status&compare2=62&value2=passed) the test:
* MueLu_UnitTestsBlockedEpetra_MPI_1
is failing in the build:
* Trilinos-atdm-cee-rhel6-clang-opt-serial
due to a segfault:
```
[ceerws1113:37972] *** Process received signal ***
[ceerws1113:37972] Signal: Segmentation fault (11)
[ceerws1113:37972] Signal code: Address not mapped (1)
[ceerws1113:37972] Failing at address: (nil)
```
## Steps to Reproduce
One should be able to reproduce this failure on any CEE LAN RHEL6 SRN machine as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for the CEE LAN RHEL6 SRN machines are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#cee-rhel6-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cee-rhel6-clang-opt-serial
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j16
```

Initial cleanup of new ATDM builds of Trilinos

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3992
Anasazi_Epetra_BKS_norestart_test_MPI_4 failing in several ATDM builds
2018-12-20T18:04:13Z, James Willenbring

*Created by: fryeguy52*
CC: @trilinos/anasazi, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
Triggered by the PR #3951 merged to 'develop' on 10/28/2018 that worked around the Intel 18.0.2 MKL GEEV defect. Next: Try the updated Intel MKL 18.0.5 on 'mutrino' (with a local revert of #3951) and see whether all of these failures go away (@fryeguy52) ...
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=Anasazi_Epetra_BKS_norestart_test_MPI_4&field3=buildstarttime&compare3=83&value3=2018-11-04T00%3A00%3A00&field4=status&compare4=61&value4=Failed) the test:
* Anasazi_Epetra_BKS_norestart_test_MPI_4
is failing in the builds:
* Trilinos-atdm-mutrino-intel-opt-openmp-HSW (since ???)
* Trilinos-atdm-mutrino-intel-opt-openmp-KNL (since ???)
* Trilinos-atdm-cee-rhel6-intel-17.0.1-intelmpi-5.1.2-serial-static-opt (since 11/30/2018)
* Trilinos-atdm-cee-rhel6-gnu-7.2.0-openmpi-1.10.2-serial-static-opt (11/29/2018 & 12/1/2018)
* Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt (on 12/2/2018)
* Trilinos-atdm-cee-rhel6-gnu-4.9.3-openmpi-1.10.2-serial-static-opt (on 12/10/2018)
It looks like some of these failures are random, as shown for the build [Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt&field2=testname&compare2=61&value2=Anasazi_Epetra_BKS_norestart_test_MPI_4&field3=site&compare3=61&value3=cee-rhel6&field4=buildstarttime&compare4=83&value4=2018-11-11T00%3A00%3A00) and the build [Trilinos-atdm-cee-rhel6-gnu-7.2.0-openmpi-1.10.2-serial-static-opt](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6-gnu-7.2.0-openmpi-1.10.2-serial-static-opt&field2=testname&compare2=61&value2=Anasazi_Epetra_BKS_norestart_test_MPI_4&field3=site&compare3=61&value3=cee-rhel6&field4=buildstarttime&compare4=84&value4=2018-12-11T00%3A00%3A00&field5=buildstarttime&compare5=83&value5=2018-11-11T00%3A00%3A00).
The errors look like [here](https://testing.sandia.gov/cdash/testDetails.php?test=61150478&build=4276066) for example:
```
Number of iterations performed in BlockKrylovSchur_test.exe: 30
Direct residual norms computed in BlockKrylovSchur_test.exe
Eigenvalue Residual
----------------------------------------
1.199112e+05 1.296543e-07
1.196455e+05 1.185550e-07
1.192047e+05 4.530562e-04
1.185918e+05 1.497329e-04
1.178109e+05 4.552932e-04
End Result: TEST FAILED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[25128,1],1]
Exit code: 255
--------------------------------------------------------------------------
...
```
## Current Status on CDash
The current status of these tests/builds for the current testing day can be found [here](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=6&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=buildname&compare2=62&value2=Trilinos-atdm-cee-rhel6-intel-18.0.2-mpich2-3.2-serial-static-opt&field3=testname&compare3=61&value3=Anasazi_Epetra_BKS_norestart_test_MPI_4&field4=buildstarttime&compare4=83&value4=1%20day%20ago&field5=status&compare5=61&value5=Failed&field6=site&compare6=62&value6=mutrino)
## Steps to Reproduce
One should be able to reproduce this failure on a machine with a cee rhel6 environment as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for a machine with a cee rhel6 environment are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#cee-rhel6-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-cee-rhel6-intel-17.0.1-intelmpi-5.1.2-serial-static-opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Anasazi=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j16
```Keep promoted "ATDM" builds of Trilinos cleanhttps://gitlab.osti.gov/jmwille/Trilinos/-/issues/3989Anasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4 in many ATDM builds2018-12-20T17:28:41ZJames WillenbringAnasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4 in many ATDM builds*Created by: fryeguy52*
CC: @trilinos/anasazi, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
The merge of PR #4031 to 'develop' on 12/13/2018 seems to have resulted in the ...*Created by: fryeguy52*
CC: @trilinos/anasazi, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
The merge of PR #4031 to 'develop' on 12/13/2018 seems to have resulted in the test `Anasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4` passing in all ATDM Trilinos builds. It passed in all 41 ATDM Trilinos builds on 2018-12-19 as shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&date=2018-12-19&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=Anasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4) (and there were no missing builds for testing day 2018-12-19 so this should be complete test results).
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=Anasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4&field3=buildstarttime&compare3=84&value3=2018-12-04T00%3A00%3A00&field4=buildstarttime&compare4=83&value4=2018-11-04T00%3A00%3A00&field5=status&compare5=61&value5=Failed) the test `Anasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4` has failed in many ATDM builds since 11/24/2018. The builds where it has failed in that time are:
* Trilinos-atdm-sems-rhel6-intel-opt-openmp
* Trilinos-atdm-mutrino-intel-opt-openmp-KNL
* Trilinos-atdm-mutrino-intel-opt-openmp-HSW
* Trilinos-atdm-chama-intel-opt-openmp
* Trilinos-atdm-chama-intel-debug-openmp
* Trilinos-atdm-cee-rhel6-intel-17.0.1-intelmpi-5.1.2-serial-static-opt
* Trilinos-atdm-cee-rhel6-gnu-7.2.0-openmpi-1.10.2-serial-static-opt
* Trilinos-atdm-cee-rhel6-gnu-4.9.3-openmpi-1.10.2-serial-static-opt
* Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt
The test has been failing every day since 11/29/2018 in the builds:
* Trilinos-atdm-cee-rhel6-clang-5.0.1-openmpi-1.10.2-serial-static-opt
* Trilinos-atdm-cee-rhel6-gnu-4.9.3-openmpi-1.10.2-serial-static-opt
* Trilinos-atdm-cee-rhel6-gnu-7.2.0-openmpi-1.10.2-serial-static-opt
The test output looks like this in these cases:
```
Building Map
Setting up info for filling matrix
Creating matrix
Filling matrix
Calling FillComplete on matrix
Setting Anasazi parameters
Creating initial vector for solver
Creating eigenproblem
Creating eigensolver (GeneralizedDavidsonSolMgr)
Solving eigenproblem
[ceerws1113:51638] *** An error occurred in MPI_Allreduce
[ceerws1113:51638] *** reported by process [999489537,2]
[ceerws1113:51638] *** on communicator MPI_COMM_WORLD
[ceerws1113:51638] *** MPI_ERR_IN_STATUS: error code in status
[ceerws1113:51638] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ceerws1113:51638] *** and potentially your MPI job)
[ceerws1113:51629] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[ceerws1113:51629] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
```
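The OpenMPI output above itself names the next diagnostic step: the `orte_base_help_aggregate` MCA parameter can be set through the environment so every rank's error is shown rather than aggregated (the parameter name is taken straight from the log; the `OMPI_MCA_` prefix is OpenMPI's standard environment-variable form):

```shell
# Disable OpenMPI help-message aggregation, as the log output suggests, so
# each failing rank reports its own error on the next run.
export OMPI_MCA_orte_base_help_aggregate=0
echo "OMPI_MCA_orte_base_help_aggregate=$OMPI_MCA_orte_base_help_aggregate"
```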
## Current Status on CDash
The current status of this test on all ATDM builds can be found [here](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=Anasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4)
History for the last week on ATDM builds can be seen [here](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=Anasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4&field3=buildstarttime&compare3=83&value3=7%20days%20ago)
## Steps to Reproduce on CEE RHEL6
One should be able to reproduce this failure on a machine with a cee rhel6 environment because it has been failing there every day. The process is described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for a machine with a cee rhel6 environment are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#cee-rhel6-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-cee-rhel6-gnu-4.9.3-openmpi-1.10.2-serial-static-opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Anasazi=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j16
```

Keep promoted "ATDM" builds of Trilinos clean

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3989
Anasazi_Epetra_GeneralizedDavidson_nh_test_MPI_4 in many ATDM builds
2018-12-20T17:28:41Z, James Willenbring

*Created by: fryeguy52*
CC: @trilinos/belos , @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe
## Next Action Status
PR #3951 merged to 'develop' on 11/28/2018 resulted in this test passing in the Intel 18.0.2 builds on 'mutrino' and the 'cee-rhel6' builds on 12/1/2018 and in all builds for several days as of 12/3/2018.
## Description
As shown in [this query](https://testing.sandia.gov/cdash-dev-view/queryTests.php?project=Trilinos&date=2018-09-24&filtercount=5&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=site&compare2=61&value2=mutrino&field3=status&compare3=62&value3=passed&field4=buildstarttime&compare4=83&value4=2018-09-01&field5=testname&compare5=63&value5=Belos) the test:
* Belos_gcrodr_hb_MPI_4
is failing in the builds:
* Trilinos-atdm-mutrino-intel-opt-openmp-HSW
* Trilinos-atdm-mutrino-intel-opt-openmp-KNL
Some test output:
```
*** Error in `/lscratch1/jenkins/mutrino-slave/workspace/Trilinos-atdm-mutrino-intel-opt-openmp-HSW/SRC_AND_BUILD/BUILD/packages/belos/epetra/test/GCRODR/Belos_gcrodr_hb.exe': free(): invalid pointer: 0x000001000011bba0 ***
*** Error in `/lscratch1/jenkins/mutrino-slave/workspace/Trilinos-atdm-mutrino-intel-opt-openmp-HSW/SRC_AND_BUILD/BUILD/packages/belos/epetra/test/GCRODR/Belos_gcrodr_hb.exe': free(): invalid pointer: 0x00000100004b4980 ***
*** Error in `/lscratch1/jenkins/mutrino-slave/workspace/Trilinos-atdm-mutrino-intel-opt-openmp-HSW/SRC_AND_BUILD/BUILD/packages/belos/epetra/test/GCRODR/Belos_gcrodr_hb.exe': free(): invalid pointer: 0x00000100004b4980 ***
*** Error in `/lscratch1/jenkins/mutrino-slave/workspace/Trilinos-atdm-mutrino-intel-opt-openmp-HSW/SRC_AND_BUILD/BUILD/packages/belos/epetra/test/GCRODR/Belos_gcrodr_hb.exe': free(): invalid pointer: 0x00000100004b4980 ***
```
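glibc's "free(): invalid pointer" abort can be made more verbose by turning on heap-consistency checking before re-running the test; `MALLOC_CHECK_` is the standard glibc switch (applying it to this particular test, e.g. via `ctest -R Belos_gcrodr_hb_MPI_4`, is a sketch and has not been verified on 'mutrino'):

```shell
# Enable glibc heap checking (3 = print a diagnostic and abort on corruption)
# before re-running the failing Belos test in the build directory.
export MALLOC_CHECK_=3
echo "MALLOC_CHECK_=$MALLOC_CHECK_"
```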
## Steps to Reproduce
One should be able to reproduce this failure on the machine mutrino as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for the system mutrino are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#mutrino
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh intel-opt-openmp-HSW
$ cmake \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
$TRILINOS_DIR
$ make -j16
$ salloc -N 1 -p standard -J $JOB_NAME ctest -j16
```

Keep promoted "ATDM" builds of Trilinos clean

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2633
Test Anasazi_Epetra_LOBPCG_solvertest_MPI_4 randomly failing in Trilinos-atdm-white-ride-gnu-debug-openmp build
2018-12-07T15:13:03Z, James Willenbring

*Created by: bartlettroscoe*
Test Anasazi_Epetra_LOBPCG_solvertest_MPI_4 randomly failing in some ATDM builds
**CC:** @trilinos/anasazi, @fryeguy52
## Next Action Status:
No errors observed in any promoted ATDM Trilinos builds since 4/26/2018.
## Description
As shown in the query:
* https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=7&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=status&compare2=62&value2=Passed&field3=testname&compare3=61&value3=Anasazi_Epetra_LOBPCG_solvertest_MPI_4&field4=buildstarttime&compare4=84&value4=2018-04-25&field5=buildname&compare5=64&value5=Trilinos-atdm-white-ride-cuda-opt&field6=buildname&compare6=64&value6=Trilinos-atdm-white-ride-gnu-opt-openmp&field7=buildstarttime&compare7=84&value7=2018-04-01
the test `Anasazi_Epetra_LOBPCG_solvertest_MPI_4` looks to be randomly failing in the following builds:
* `Trilinos-atdm-hansen-shiller-cuda-opt`
* `Trilinos-atdm-hansen-shiller-gnu-debug-serial`
* `Trilinos-atdm-hansen-shiller-gnu-opt-serial`
* `Trilinos-atdm-white-ride-gnu-debug-openmp`
The most recent failure was on 2018-04-23. The only failures since `2018-03-17` were with the build `Trilinos-atdm-white-ride-gnu-debug-openmp` run on 'white' and 'ride'. The failures on `2018-03-17` and before all look like:
```
Anasazi in Trilinos 12.13 (Dev)
Testing solver(default,default) with standard eigenproblem...
libgomp: Thread creation failed: Resource temporarily unavailable
libgomp: Thread creation failed: Resource temporarily unavailable
libgomp: Thread creation failed: Resource temporarily unavailable
libgomp: Thread creation failed: Resource temporarily unavailable
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[40799,1],2]
Exit code: 1
--------------------------------------------------------------------------
```
Since we have not seen any failures like that since 2018-03-17, I think those issues got solved by adjusting the way tests are run on that system (likely the commit 114ca53c76c9e090f9a4dbcf10d8cb81bf6f0ca6 and/or the commit d852fa33fefbf74a15bdd04370e7b0b6ce55fd6c).
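The `libgomp: Thread creation failed: Resource temporarily unavailable` errors above are characteristic of OpenMP oversubscription, where each of the 4 MPI ranks tries to spawn a full complement of threads on a shared node. A minimal sketch of the kind of environment adjustment such commits typically make (the values and launch line here are illustrative assumptions, not the actual contents of those commits):

```shell
# Hypothetical sketch, not the contents of the referenced commits:
# cap the OpenMP threads each MPI rank may spawn so that 4 ranks
# together do not oversubscribe the node's cores.
export OMP_NUM_THREADS=2      # assumed per-rank thread count
export OMP_PROC_BIND=false    # avoid rank-vs-rank binding conflicts
# The test would then be launched as usual, e.g.:
#   mpiexec -np 4 ./Anasazi_Epetra_LOBPCG_solvertest.exe
```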
The more recent three failures for the build `Trilinos-atdm-white-ride-gnu-debug-openmp` which occurred on 2018-04-03, 2018-04-17, and 2018-04-23 show the same output:
```
Anasazi in Trilinos 12.13 (Dev)
Testing solver(default,default) with standard eigenproblem...
Testing solver(default,default) with generalized eigenproblem...
Testing solver(nev,false) with standard eigenproblem...
Testing solver(nev,true) with standard eigenproblem...
Testing solver(nev,false) with generalized eigenproblem...
Testing solver(nev,true) with generalized eigenproblem...
Testing solver(2*nev,false) with standard eigenproblem...
Testing solver(2*nev,true) with standard eigenproblem...
[ride13:114533] *** Process received signal ***
[ride13:114533] Signal: Segmentation fault (11)
[ride13:114533] Signal code: Address not mapped (1)
[ride13:114533] Failing at address: 0x10036020010
[ride13:114533] [ 0] [0x100000050478]
[ride13:114533] [ 1] [0x3ff0000000000000]
[ride13:114533] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 114533 on node ride13 exited on signal 11 (Segmentation fault).
```
Therefore, I think that the problem with this test is the segfaults on this build.
## Steps to reproduce
One may be able to reproduce this failure on 'white' (SON) or 'ride' (SRN) as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
The exact comamnds should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh gnu-debug-openmp
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Anasazi=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -R Anasazi_Epetra_LOBPCG_solvertest_MPI_4
```
NOTE: Since this is not a CUDA build, running on a compute node should not be necessary, but it may be required in order to reproduce the failure. Also, since this test appears to fail randomly, one may not be able to reproduce the failure at all.
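For a randomly failing test like this, one practical approach is to re-run the single test in a loop until it trips. A sketch of that loop (`run_test` is a stand-in for the real invocation; reasonably recent `ctest` versions can also do this natively with `--repeat-until-fail <n>`):

```shell
# Re-run a flaky test until it fails, recording which run tripped.
# `run_test` is a stand-in; replace it with the real invocation, e.g.
#   ctest -R Anasazi_Epetra_LOBPCG_solvertest_MPI_4
run_test() { true; }
for i in $(seq 1 50); do
  if ! run_test; then
    echo "failed on run $i"
    break
  fi
done
echo "completed $i runs"
```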
Keep promoted "ATDM" builds of Trilinos clean

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3344
Anasazi_Tpetra_MVOPTester_MPI_4 failing in ATDM cuda 9 builds on waterman
2018-12-07T15:10:59Z
James Willenbring
*Created by: fryeguy52*
CC: @trilinos/anasazi, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe
## Next Action Status
Downgrade from OpenMPI 3.1.0 to OpenMPI 2.1.2 fixed the problem (as it fixed failing tests in other packages as well).
## Description
As shown in [this query](https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2018-08-21&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-waterman-cuda&field2=testname&compare2=61&value2=Anasazi_Tpetra_MVOPTester_MPI_4&field3=buildstarttime&compare3=84&value3=now) the test:
* Anasazi_Tpetra_MVOPTester_MPI_4
is failing in the builds:
* Trilinos-atdm-waterman-cuda-9.2-opt
* Trilinos-atdm-waterman-cuda-9.2-debug
test output
```
The following tests FAILED:
7. MultiVector_int_longlong_double_OPTestLocal_UnitTest ...
Total Time: 7.62 sec
Summary: total = 8, run = 8, passed = 7, failed = 1
End Result: TEST FAILED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[50974,1],2]
Exit code: 1
```
## Steps to Reproduce
One should be able to reproduce this failure on the machine waterman as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for the system waterman are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#waterman
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Anasazi=ON \
$TRILINOS_DIR
$ make NP=20
$ bsub -x -Is -n 20 ctest -j20
```
Initial cleanup of new ATDM builds of Trilinos

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2920
Belos_pseudo_stochastic_pcg_hb_[0,1]_MPI_4 tests failing due to max iterations limit seemingly randomly in the `Trilinos-atdm-white-ride-cuda-debug` build on 'white'
2018-12-03T20:39:41Z
James Willenbring
*Created by: bartlettroscoe*
CC: @trilinos/belos, @fryeguy52, @srajama1 (Linear Solvers Product Lead)
## Next Action Status
Disabled in build `Trilinos-atdm-white-ride-cuda-debug` in commit cc7fff2 pushed on 6/12/2018 and shown as disabled and missing on CDash on 6/13/2018. PR #3546, merged on 10/2/2018, re-enables tests that should have been fixed by the earlier PR #3050. No new failures as of 12/3/2018!
## Description
As shown in [this rather complex query showing all failing Belos tests other than Belos_rcg_hb_MPI_4 in all promoted ATDM builds since 5/10/2018](https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=17&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=buildname&compare2=62&value2=Trilinos-atdm-mutrino-intel-debug-openmp&field3=buildname&compare3=62&value3=Trilinos-atdm-mutrino-intel-opt-openmp&field4=buildname&compare4=62&value4=Trilinos-atdm-white-ride-cuda-debug-pt-all-at-once&field5=site&compare5=62&value5=ride&field6=testname&compare6=62&value6=Belos_rcg_hb_MPI_4&field7=buildstarttime&compare7=84&value7=2018-06-08&field8=buildstarttime&compare8=83&value8=2018-05-10&field9=buildname&compare9=62&value9=Trilinos-atdm-white-ride-cuda-opt&field10=buildname&compare10=62&value10=Trilinos-atdm-white-ride-gnu-opt-openmp&field11=site&compare11=62&value11=serrano&field12=site&compare12=62&value12=shiller&field13=buildname&compare13=62&value13=Trilinos-atdm-white-ride-cuda-debug-all-at-once&field14=site&compare14=62&value14=chama&field15=testname&compare15=65&value15=Belos&field16=status&compare16=62&value16=passed&field17=status&compare17=62&value17=notrun) the tests:
* Belos_pseudo_stochastic_pcg_hb_0_MPI_4
* Belos_pseudo_stochastic_pcg_hb_1_MPI_4
failed 5 times in total and appear to be randomly failing in the `Trilinos-atdm-white-ride-cuda-debug` build. (The other failing test shown was `Belos_pseudo_pcg_hb_1_MPI_4`, also for the `Trilinos-atdm-white-ride-cuda-debug` build, but that only failed once yesterday, so we will ignore it for now.) (The test `Belos_rcg_hb_MPI_4` was excluded from the above query because it is addressed in #2919.)
Looking at the testing history for these tests `Belos_pseudo_stochastic_pcg_hb_[0,1]_MPI_4` from 5/10/2018 through today 6/8/2018 in [this less complex query](https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=6&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=site&compare2=62&value2=ride&field3=testname&compare3=65&value3=Belos_pseudo_stochastic_pcg_hb_&field4=buildstarttime&compare4=84&value4=2018-06-08&field5=buildstarttime&compare5=83&value5=2018-05-10&field6=buildname&compare6=61&value6=Trilinos-atdm-white-ride-cuda-debug) one can see that these tests complete in about the same time in under 2 seconds when they pass or fail.
The output when these tests fail (such as shown for the test `Belos_pseudo_stochastic_pcg_hb_1_MPI_4` yesterday on 6/7/2018 [here](https://testing-vm.sandia.gov/cdash/testDetails.php?test=48082702&build=3589607)) looks like:
```
Belos::StatusTestGeneralOutput: Passed
(Num calls,Mod test,State test): (104, 1, Passed)
Passed.......OR Combination ->
Failed.......Number of Iterations = 100 == 100
Unconverged..(2-Norm Imp Res Vec) / (2-Norm Res0)
residual [ 0 ] = 8.95881e-09 < 1e-08
residual [ 1 ] = 1.21989e-08 > 1e-08
residual [ 2 ] = 6.84374e-09 < 1e-08
residual [ 3 ] = 9.15804e-09 < 1e-08
residual [ 4 ] = 7.2567e-09 < 1e-08
Passed.......OR Combination ->
Failed.......Number of Iterations = 100 == 100
Unconverged..(2-Norm Imp Res Vec) / (2-Norm Res0)
residual [ 0 ] = 8.95881e-09 < 1e-08
residual [ 1 ] = 1.21989e-08 > 1e-08
residual [ 2 ] = 6.84374e-09 < 1e-08
residual [ 3 ] = 9.15804e-09 < 1e-08
residual [ 4 ] = 7.2567e-09 < 1e-08
==================================================================================================================================
TimeMonitor results over 4 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
----------------------------------------------------------------------------------------------------------------------------------
Belos: Operation Op*x 0.06571 (101) 0.07122 (101) 0.07694 (101) 0.0007051 (101)
Belos: Operation Prec*x 0.1014 (104) 0.108 (104) 0.1151 (104) 0.001039 (104)
Belos: PseudoBlockStochasticCGSolMgr total solve time 0.2159 (1) 0.216 (1) 0.2162 (1) 0.216 (1)
Epetra_CrsMatrix::Multiply(TransA,X,Y) 0.0665 (102) 0.07206 (102) 0.07777 (102) 0.0007065 (102)
Epetra_CrsMatrix::Solve(Upper,Trans,UnitDiag,X,Y) 0.101 (210) 0.1076 (210) 0.1147 (210) 0.0005122 (210)
==================================================================================================================================
---------- Actual Residuals (normalized) ----------
Problem 0 : 8.95881e-09
Problem 1 : 1.21989e-08
Problem 2 : 6.84374e-09
Problem 3 : 9.15804e-09
Problem 4 : 7.2567e-09
End Result: TEST FAILED
```
So this shows that the test fails because the max iteration limit of 100 is reached before the desired residual tolerance is achieved. The other failures for the tests `Belos_pseudo_stochastic_pcg_hb_0_MPI_4` and `Belos_pseudo_stochastic_pcg_hb_1_MPI_4` all look to be maxing out the number of iterations at 100.
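The `OR Combination` lines in the output reflect Belos's composite stopping rule: the solve stops when either the iteration cap or the residual test is satisfied, and only the latter counts as convergence. A small sketch of that logic (illustrative only, not the actual Belos `StatusTest` API), using the residual values quoted in this issue:

```python
# Sketch of an OR-combination stopping test: stop when EITHER the max
# iteration count OR the residual tolerance is reached; only the latter
# counts as converged. (Illustrative; not the Belos StatusTest API.)
def or_combination(iters, max_iters, residuals, tol):
    if all(r < tol for r in residuals):
        return "Converged"
    if iters >= max_iters:
        return "Failed: max iterations reached"
    return "Unconverged"

# Residuals from the failing run: one of the five right-hand sides is
# still just above the 1e-8 tolerance when the 100-iteration cap hits.
failing = [8.95881e-09, 1.21989e-08, 6.84374e-09, 9.15804e-09, 7.2567e-09]
# Residuals from a passing run, which converged in 87 iterations.
passing = [5.02551e-09, 5.92159e-09, 6.61897e-09, 8.2598e-09, 3.67011e-09]

print(or_combination(100, 100, failing, 1e-08))  # Failed: max iterations reached
print(or_combination(87, 100, passing, 1e-08))   # Converged
```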
When the test `Belos_pseudo_stochastic_pcg_hb_1_MPI_4` passed the day before on 6/6/2018, as shown [here](https://testing-vm.sandia.gov/cdash/testDetails.php?test=48012272&build=3584608), it produced output like:
```
Belos::StatusTestGeneralOutput: Passed
(Num calls,Mod test,State test): (89, 1, Passed)
Passed.......OR Combination ->
OK...........Number of Iterations = 87 < 100
Converged....(2-Norm Imp Res Vec) / (2-Norm Res0)
residual [ 0 ] = 5.02551e-09 < 1e-08
residual [ 1 ] = 5.92159e-09 < 1e-08
residual [ 2 ] = 6.61897e-09 < 1e-08
residual [ 3 ] = 8.2598e-09 < 1e-08
residual [ 4 ] = 3.67011e-09 < 1e-08
Passed.......OR Combination ->
OK...........Number of Iterations = 87 < 100
Converged....(2-Norm Imp Res Vec) / (2-Norm Res0)
residual [ 0 ] = 5.02551e-09 < 1e-08
residual [ 1 ] = 5.92159e-09 < 1e-08
residual [ 2 ] = 6.61897e-09 < 1e-08
residual [ 3 ] = 8.2598e-09 < 1e-08
residual [ 4 ] = 3.67011e-09 < 1e-08
=================================================================================================================================
TimeMonitor results over 4 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
---------------------------------------------------------------------------------------------------------------------------------
Belos: Operation Op*x 0.0652 (88) 0.06892 (88) 0.07251 (88) 0.0007831 (88)
Belos: Operation Prec*x 0.09675 (89) 0.1009 (89) 0.1101 (89) 0.001134 (89)
Belos: PseudoBlockStochasticCGSolMgr total solve time 0.195 (1) 0.195 (1) 0.195 (1) 0.195 (1)
Epetra_CrsMatrix::Multiply(TransA,X,Y) 0.06596 (89) 0.06969 (89) 0.07333 (89) 0.0007831 (89)
Epetra_CrsMatrix::Solve(Upper,Trans,UnitDiag,X,Y) 0.09635 (180) 0.1006 (180) 0.1098 (180) 0.0005587 (180)
=================================================================================================================================
---------- Actual Residuals (normalized) ----------
Problem 0 : 5.02551e-09
Problem 1 : 5.92159e-09
Problem 2 : 6.61897e-09
Problem 3 : 8.2598e-09
Problem 4 : 3.67011e-09
End Result: TEST PASSED
```
which shows it converged in 87 iterations. I looked at several other instances when these tests passed and they all look to be converging in 87 iterations.
Is this non-deterministic behavior due to the fact that this is "stochastic" code and the behavior is therefore truly random? Or is it because the random seed is not set consistently, or because of non-deterministic accumulations in the CUDA 8.0 threaded Kokkos implementation on this machine? The fact that the test converges in 87 iterations whenever it passes suggests that this is not purposeful random behavior but rather the result of some other undesired and unintended non-determinism.
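The threaded-accumulation hypothesis is easy to demonstrate in isolation: floating-point addition is not associative, so a parallel reduction that groups partial sums differently from run to run can change the last digits of a residual norm, which is all it takes to push a value like 1.21989e-08 across a 1e-08 tolerance. A self-contained illustration (unrelated to the actual Belos/Kokkos kernels):

```python
# Floating-point addition is not associative: the same three values summed
# with different groupings (as a threaded reduction may do run-to-run)
# give different results in the last bits.
left = (0.1 + 0.2) + 0.3   # groups like one reduction order
right = 0.1 + (0.2 + 0.3)  # groups like another

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```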
## Steps to reproduce
Following the instructions at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
one might be able to reproduce this behavior on 'white' or 'ride' by cloning the Trilinos GitHub repo, checking out the 'develop' branch, and then doing:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Belos=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
```
But given that this test looks to be randomly failing, it may be hard to reproduce this behavior locally.
Keep promoted "ATDM" builds of Trilinos clean