Test Anasazi_Epetra_OrthoManagerGenTester_0_MPI_4 appears to be randomly failing in many builds including CI, PR, and ATDM builds
Created by: bartlettroscoe
CC: @trilinos/framework, @trilinos/anasazi, @srajama1 (Trilinos Linear Solver Product Area Lead)
Next Action Status
PR #4052 was merged to 'develop' on 12/18/2018, but the test is still failing after that. Next: Try to fix again?
The test Anasazi_Epetra_OrthoManagerGenTester_0_MPI_4 appears to fail randomly, though only very occasionally, in various builds. As shown in this query, the test has failed 10 times since 7/1/2018 in the following builds:
Linux-GCC-4.8.4-MPI_RELEASE_DEBUG_SHARED_PT_OPENMP_CI (post-push CI build): 1 time (today)
PR-XXXX-test-Trilinos_pullrequest_gcc_4.9.3-YYYY (standard PR build): 4 times
PR-XXXX-test-Trilinos_pullrequest_gcc_4.8.4-YYYY (standard PR build): 1 time
Trilinos-atdm-chama-intel-debug-openmp (standard ATDM build): 1 time
Trilinos-atdm-rhel6-gnu-opt-openmp (standard ATDM build): 2 times
Trilinos-atdm-waterman-cuda-9.2-debug (standard ATDM build): 1 time
Each of these 10 failures in the last 3 months, such as today's CI failure shown here, shows output like:
    projectAndNormalizeGen() returned rank 5
    || <S,S> - I || after : 2.65912e-11
    1|| S_in - X1*C1 - X2*C2 - S_out*B || : 1.70776e-09
    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv tolerance exceeded! test failed!
The location of the failure within the test seems to vary from run to run, but every failure appears to be "tolerance exceeded! test failed!"
Is there some type of non-deterministic behavior in this test or in the underlying Anasazi code that allows for these types of random failures?
Steps to Reproduce
Because this test fails randomly and only very occasionally, it might be hard to reproduce locally. However, since it has failed in both the post-push GCC 4.8.4 CI build and the GCC 4.9.3 PR build, one might be able to reproduce it using one of those builds.
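Since the failure is so rare, one way to try to catch it locally is to rerun the test many times and record which runs fail. The following is only a minimal sketch: the helper `find_failures` and the constant `CTEST_CMD` are hypothetical names introduced for illustration, and the command assumes `ctest` is on the PATH and is run from a configured Trilinos build directory with Anasazi tests enabled.

```python
import subprocess

# Assumption: invoked from the Trilinos build directory; -R selects tests by regex.
CTEST_CMD = ["ctest", "-R", "Anasazi_Epetra_OrthoManagerGenTester_0_MPI_4"]


def find_failures(cmd, runs):
    """Run `cmd` `runs` times and return the indices of runs that exited non-zero."""
    failed = []
    for i in range(runs):
        # Capture output so that hundreds of passing runs do not flood the terminal.
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            failed.append(i)
    return failed
```

For example, `find_failures(CTEST_CMD, 500)` would return the list of failing run indices (empty if every run passed). With only about 10 observed failures across many builds in 3 months, a large repeat count may be needed, and an empty result does not rule out the problem. CTest's own `--repeat-until-fail <n>` option may serve the same purpose without a script.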