Tests Anasazi_Epetra_ModalSolversTester_MPI_4 and Anasazi_Epetra_OrthoManagerGenTester_[0,1]_MPI_4 failing in 'debug' builds on white/ride
Created by: bartlettroscoe
CC: @trilinos/anasazi, @mhoemmen
Next Action Status
PR #2621, merged on 4/24/2018, re-enables the tests Anasazi_Epetra_ModalSolversTester_MPI_4 and Anasazi_Epetra_OrthoManagerGenTester_[0,1]_MPI_4. The tests ran and passed in all promoted ATDM Trilinos builds between 5/20/2018 and 6/7/2018.
Description
The tests:
Anasazi_Epetra_ModalSolversTester_MPI_4
Anasazi_Epetra_OrthoManagerGenTester_0_MPI_4
Anasazi_Epetra_OrthoManagerGenTester_1_MPI_4
failed in the Trilinos-atdm-hansen-shiller-cuda-debug build on 'ride' as shown at:
- https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&parentid=3398198
- https://testing-vm.sandia.gov/cdash/viewTest.php?onlyfailed&buildid=3398699
This build is targeted to become an auto PR build for Trilinos (see #2464 (closed)), so we want to clean up this build quickly.
Interestingly, these tests did not fail in what should be the identical Trilinos-atdm-hansen-shiller-cuda-debug build on the identical machine 'white' as shown at:
- https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&parentid=3398098
- https://testing-vm.sandia.gov/cdash/viewTest.php?buildid=3398630
Strangely, those tests did fail in the Trilinos-atdm-hansen-shiller-cuda-debug build on 'white' yesterday, as shown at:
A) Anasazi_Epetra_ModalSolversTester_MPI_4:
The failing test Anasazi_Epetra_ModalSolversTester_MPI_4 today, with details shown at:
showed the failure:
************* Householder Apply Test *************
orthonorm error of V: 7.08978e-16
orthonorm error of VQ: 0.375867
ERROR: V*Q failed.
orthonorm error of applyHouse: 0.375867
ERROR: applyHouse failed.
error(VQ - house(V,H,tau): 2.64481e-16
************* DirectSolver Test *************
Looking at all of the builds today that ran that test shown at:
this test fails in the same way (i.e. a numerical problem) in the builds Linux-gcc-4.8.4-MPI_RELEASE_12.12.1 and Linux-gcc-4.8.4-MPI_RELEASE_12.12.1_SHARED on the machine hansel.sandia.gov, so this problem is not isolated to ATDM builds of Trilinos.
Also note that this test failed for the ATDM builds Trilinos-atdm-white-ride-gnu-opt-openmp and Trilinos-atdm-white-ride-gnu-opt-openmp with segfaults, but that is already being addressed by #2454 (closed) and is likely unrelated.
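To put the numbers in the Householder Apply Test output in perspective, here is a minimal Python sketch of an orthonormality check. It assumes the error is measured as the Frobenius norm of V^T V - I, which may differ from what the Anasazi test actually computes. The helper names (`matmul`, `orthonorm_error`, `householder`) and the sample matrices are hypothetical and are not taken from Trilinos.

```python
import math


def matmul(A, B):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthonorm_error(V):
    """Frobenius norm of V^T V - I (assumed error measure, see lead-in)."""
    G = matmul(transpose(V), V)
    k = len(G)
    return math.sqrt(sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(k) for j in range(k)))


def householder(v, tau=None):
    """Dense matrix I - tau * v v^T; orthogonal when tau = 2 / (v^T v)."""
    if tau is None:
        tau = 2.0 / sum(x * x for x in v)
    n = len(v)
    return [[(1.0 if i == j else 0.0) - tau * v[i] * v[j] for j in range(n)]
            for i in range(n)]


# Hypothetical 4x2 matrix with orthonormal columns.
V = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]

err_V = orthonorm_error(V)
# A correct reflector is orthogonal, so V*Q keeps orthonormal columns.
err_good = orthonorm_error(matmul(V, householder([1.0, 2.0])))
# A deliberately wrong tau gives a non-orthogonal Q and a large error.
err_bad = orthonorm_error(matmul(V, householder([1.0, 2.0], tau=0.2)))
```

In this sketch `err_V` and `err_good` stay at rounding level while `err_bad` is of order one. By the same reasoning, the 0.375867 reported for VQ points to a wrong result rather than accumulated roundoff, given that V itself passed with 7.08978e-16.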
B) Anasazi_Epetra_OrthoManagerGenTester_0_MPI_4:
The failing test Anasazi_Epetra_OrthoManagerGenTester_0_MPI_4 today, with details shown at:
showed:
Anasazi in Trilinos 12.13 (Dev)
Generating Y1,Y2 for project() : testing...
|| <Y1,Y1> - I || : 6.47718e-16
|| <Y2,Y2> - I || : 7.20309e-16
|| <X1,Y2> || : 1.64775e-16
|| <X1b,Y2> || : 6.9984e-15
p=3: *** Caught standard std::exception of type 'std::runtime_error' :
/home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-debug/SRC_AND_BUILD/Trilinos/packages/anasazi/epetra/test/OrthoManager/cxx_gentest.cpp:274:
Throw number = 1
Throw test that evaluated to true: err > TOL
New X1 did not meet tolerance: orthog(X1,Y2) == 0.547032
Looking at all of the builds today that ran that test shown at:
you can see that this test also failed in a similar (numerical) way in the builds Linux-gcc-4.9.3-Sierra_MPI_release_DEV_ETI_SERIAL-ON_OPENMP-ON_PTHREAD-OFF_CUDA-OFF_COMPLEX-ON and Linux-GCC-4.9.3-openmpi-1.8.7_Debug_DEV_Werror, so it looks like this problem is not isolated to ATDM builds of Trilinos. Note that one of those is a 'Sierra' build of Trilinos.
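The `err > TOL` throw in the OrthoManagerGenTester output can be illustrated with a small Python sketch of a project-then-verify step. This is not the code in cxx_gentest.cpp: the function names, the value of `TOL`, and the sample vectors are all assumptions made for illustration, and the error is assumed to be the Frobenius norm of X^T Y.

```python
import math

TOL = 1e-12  # hypothetical tolerance; the value used by the real test is not shown here


def inner(X, Y):
    """Return X^T Y for matrices stored as lists of rows."""
    return [[sum(xr[i] * yr[j] for xr, yr in zip(X, Y))
             for j in range(len(Y[0]))]
            for i in range(len(X[0]))]


def frob(A):
    """Frobenius norm of a dense matrix."""
    return math.sqrt(sum(a * a for row in A for a in row))


def project(X, Y):
    """Return X - Y (Y^T X), assuming Y has orthonormal columns."""
    C = inner(Y, X)
    return [[X[r][j] - sum(Y[r][i] * C[i][j] for i in range(len(C)))
             for j in range(len(X[0]))]
            for r in range(len(X))]


def check_orthog(X, Y, tol=TOL):
    """Raise if X is not orthogonal to Y within tol, mirroring the err > TOL check."""
    err = frob(inner(X, Y))
    if err > tol:
        raise RuntimeError(
            "New X1 did not meet tolerance: orthog(X1,Y2) == {:g}".format(err))
    return err


# Hypothetical data: Y2 is a unit vector, X1 has a component along it.
Y2 = [[1.0], [0.0], [0.0]]
X1 = [[0.6], [0.8], [0.0]]

X1_new = project(X1, Y2)          # remove the component along Y2
err_after = check_orthog(X1_new, Y2)  # passes after a correct projection
```

A correct projection leaves an error near machine precision, so a value such as the reported 0.547032 suggests the projection itself produced a wrong result, consistent with the failure also appearing in non-ATDM builds.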