Allow enable of TpetraTSQR in ATDM Trilinos builds (#4526)

Created by: bartlettroscoe

CC: @trilinos/tpetra, @mhoemmen, @fryeguy52

The ATDM customer SPARC needs TSQR enabled (see #4526 (closed)). Here, I just allow TpetraTSQR to be enabled so that the ATDM Trilinos builds can test and install it.

I tested this in all of the builds on 'ride' shown below and everything passed (apart from one test timeout that passed when rerun, as described below). This was tested with 'gnu-7.2.0' and 'cuda-9.2-gnu-7.2.0'. It was not tested with 'clang' or 'intel', but I think that is a pretty small risk. At this point, I think it is okay to go ahead and turn on TpetraTSQR and see what happens across all of the ATDM Trilinos builds. If this triggers any new failures, we can create new ATDM Trilinos GitHub issues for them.

NOTE: I also cleaned up the set of "supported" builds to only be the "Promoted" ATDM Trilinos builds we are currently running and submitting to CDash.

How this was tested

I tested this on 'ride' with:

$ cd ~/Trilinos.base/BUILDS/RIDE/CHECKIN/

$ bsub -x -Is -q rhel7F -n 16 \
  ./checkin-test-atdm.sh all --enable-packages=TpetraTSQR --enable-fwd-packages \
  --local-do-all

That returned:

FAILED (NOT READY TO PUSH): Trilinos: ride15

Mon Mar  4 19:12:06 MST 2019

Enabled Packages: TpetraTSQR

Build test results:
-------------------
1) cuda-9.2-gnu-7.2.0-debug => passed: passed=1220,notpassed=0 (38.68 min)
2) cuda-9.2-gnu-7.2.0-release => passed: passed=1248,notpassed=0 (33.61 min)
3) cuda-9.2-gnu-7.2.0-release-debug => FAILED: passed=1248,notpassed=1 => Not ready to push! (48.66 min)
4) gnu-7.2.0-openmp-debug => passed: passed=1252,notpassed=0 (31.90 min)
5) gnu-7.2.0-openmp-release => passed: passed=1254,notpassed=0 (28.76 min)
6) gnu-7.2.0-openmp-release-debug => passed: passed=1253,notpassed=0 (38.55 min)
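
As an aside, the non-passing builds can be pulled out of a summary like the one above with a short filter. This is only a sketch: `failed_builds` is a hypothetical helper, not part of `checkin-test-atdm.sh`, and it assumes each result line has the form `N) <build-name> => <status>: ...` as shown above.

```shell
# Hypothetical helper (not part of the ATDM scripts): read a
# "Build test results" summary on stdin and print the name of every
# build whose status field is anything other than "passed:".
failed_builds() {
  awk '/^[0-9]+\) / && $3 == "=>" && $4 != "passed:" { print $2 }'
}

# Example using two lines copied from the summary above.
failed_builds <<'EOF'
2) cuda-9.2-gnu-7.2.0-release => passed: passed=1248,notpassed=0 (33.61 min)
3) cuda-9.2-gnu-7.2.0-release-debug => FAILED: passed=1248,notpassed=1 => Not ready to push! (48.66 min)
EOF
```

For the two sample lines, this prints only `cuda-9.2-gnu-7.2.0-release-debug`.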

The one test failure in the cuda-9.2-gnu-7.2.0-release-debug build was:

The following tests FAILED:
        1229 - PanzerAdaptersSTK_PoissonInterfaceExample_2d_diffsideids_MPI_1 (Timeout)

To make sure this is not a real error, I ran just this one test with:

$ cd cuda-9.2-gnu-7.2.0-release-debug/

$ . load-env.sh 

Hostname 'ride6' matches known ATDM host 'ride' and system 'ride'
Setting compiler and build options for buld name 'cuda-9.2-gnu-7.2.0-release-debug'
Using white/ride compiler stack CUDA-9.2_GNU-7.2.0 to build RELEASE-DEBUG code with Kokkos node type CUDA and KOKKOS_ARCH=Power8,Kepler37

$ cd packages/panzer/

$ bsub -x -Is -q rhel7F -n 16 \
  ctest -R PanzerAdaptersSTK_PoissonInterfaceExample_2d_diffsideids_MPI_1

...

***Forced exclusive execution
Job <854317> is submitted to queue <rhel7F>.
<<Waiting for dispatch ...>>
<<Starting on ride8>>
Test project /ascldap/users/rabartl/Trilinos.base/BUILDS/RIDE/CHECKIN/cuda-9.2-gnu-7.2.0-release-debug/packages/panzer
    Start 145: PanzerAdaptersSTK_PoissonInterfaceExample_2d_diffsideids_MPI_1
1/1 Test #145: PanzerAdaptersSTK_PoissonInterfaceExample_2d_diffsideids_MPI_1 ...   Passed  132.65 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary:
Panzer    = 132.65 sec*proc (1 test)

Total Test time (real) = 132.81 sec

We still struggle with random timeouts due to test processes and threads running on the same hardware (likely the GPU in this case) (see #2422).

Therefore, I think this is okay.
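
One way to gain more confidence that a timeout like this is load-related rather than a real regression would be to rerun the single test several times and count the passes. A minimal sketch, assuming a hypothetical helper `count_passes` (not part of the ATDM scripts) and POSIX `sh`:

```shell
# Hypothetical helper: run a command N times, discarding its output,
# and print "<passes>/<N>" based on the command's exit status.
count_passes() {
  n=$1; shift
  passes=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if "$@" >/dev/null 2>&1; then
      passes=$((passes + 1))
    fi
    i=$((i + 1))
  done
  echo "$passes/$n"
}

# Example (assumes the build directory and loaded environment shown above):
# count_passes 5 ctest -R PanzerAdaptersSTK_PoissonInterfaceExample_2d_diffsideids_MPI_1
```

ctest's own `--repeat-until-fail <n>` option serves a similar purpose, stopping at the first failing run instead of counting passes.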
