Address expensive Panzer tests that time out at 10 minutes in ATDM builds
Created by: bartlettroscoe
CC: @trilinos/panzer, @bathmatt, @fryeguy52
Next Action Status
Pushed the commits 245e01d9 and d852fa33 to 'develop' to address the timeouts, which removed the timing-out tests on 3/25/2018. Addressing memory issues and re-enabling these tests will be done in follow-on issues.
Description
This story is to analyze and then address some expensive Panzer tests that routinely time out in the ATDM Trilinos builds, as shown, for example, in the following query, which lists all of the timing-out tests over the last week:
This query shows the following 6 timing out tests:
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4
PanzerAdaptersSTK_main_driver_energy-ss-loca-eigenvalue
PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2
PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3
PanzerAdaptersSTK_PoissonInterfaceExample_2d_diffsideids_MPI_1
PanzerAdaptersSTK_PoissonInterfaceExample_2d_MPI_4
These tests timed out in the following builds:
Trilinos-atdm-hansen-shiller-cuda-debug
Trilinos-atdm-hansen-shiller-cuda-opt
Trilinos-atdm-hansen-shiller-intel-debug-serial
Trilinos-atdm-white-ride-cuda-debug
Trilinos-atdm-white-ride-cuda-opt
Trilinos-atdm-white-ride-gnu-debug-openmp
As was discovered in https://github.com/trilinos/Trilinos/issues/2318#issuecomment-375494367, many of these tests will actually complete if you increase the timeouts. In particular, for the CUDA builds on hansen/shiller, the following set of 5 tests all passed once the timeouts were increased to over 40 minutes:
PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4
PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2
PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-3
PanzerAdaptersSTK_PoissonInterfaceExample_2d_diffsideids_MPI_1
PanzerAdaptersSTK_PoissonInterfaceExample_2d_MPI_4
The only test missing from the above list for CUDA builds on hansen/shiller was PanzerAdaptersSTK_main_driver_energy-ss-loca-eigenvalue, and that test only timed out on the Trilinos-atdm-white-ride-cuda-opt build.
This issue is to investigate these tests further and then decide how to address them.
Tasks:
- Inspect the timing-out tests in the last week on all builds of Trilinos ... All can be addressed by increasing timeouts and one disable (see below) [DONE]
- Increase timeouts on all of the timing out Panzer tests in the last week to 45 minutes and set CATEGORIES NIGHTLY ...
- See if these tests pass with longer timeouts in automated builds and see what their runtimes are when they are displayed on CDash ...
- Decrease the timeouts for some of the tests that are not taking 45 minutes to complete ...
- ???
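For the task of decreasing timeouts on tests that do not need the full 45 minutes, one possible way to pick a tighter value from the runtimes reported on CDash is sketched below. This is only an illustration under stated assumptions: the function name `suggest_timeout`, the 1.5x safety margin, the 5-minute rounding step, and the sample test names and runtimes are all hypothetical and not taken from the actual builds. Only the 10-minute default and the 45-minute ceiling come from this issue.

```python
import math

DEFAULT_TIMEOUT_S = 10 * 60  # 10-minute default timeout named in the issue title
MAX_TIMEOUT_S = 45 * 60      # 45-minute ceiling from the task list


def suggest_timeout(runtime_s, margin=1.5, step_s=300):
    """Suggest a per-test timeout in seconds (hypothetical helper).

    Pads the observed runtime by `margin`, rounds up to a multiple of
    `step_s`, and clamps the result to [DEFAULT_TIMEOUT_S, MAX_TIMEOUT_S].
    The margin and step are assumptions, not project policy.
    """
    padded = runtime_s * margin
    rounded = math.ceil(padded / step_s) * step_s
    return max(DEFAULT_TIMEOUT_S, min(MAX_TIMEOUT_S, rounded))


if __name__ == "__main__":
    # Placeholder runtimes in seconds, NOT measurements from CDash.
    observed = {
        "hypothetical_fast_test": 100,
        "hypothetical_medium_test": 1000,
        "hypothetical_slow_test": 2400,
    }
    for name, runtime_s in observed.items():
        timeout_s = suggest_timeout(runtime_s)
        print(f"{name}: runtime {runtime_s}s, suggested timeout {timeout_s // 60} min")
```

A test that finishes well within the default keeps the 10-minute timeout, while anything whose padded runtime exceeds 45 minutes stays at the ceiling and is a candidate for separate investigation.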
Related Issues
- Related to #2318 (closed)