Fix ROL CUDA build failure (#3072)
Created by: bartlettroscoe
CC: @trilinos/rol, @dridzal (ROL package lead)
Description
Fixes the ROL CUDA build failure described in #3072 (closed). The fix was trivial (I am not sure why other compilers did not catch this or at least provide a warning).
I also included a commit to add debug print info for nvcc_wrapper
(see kokkos/nvcc_wrapper#19 and kokkos/nvcc_wrapper#20).
Motivation and Context
ROL was not building for a CUDA build (see #3072 (closed)). We would like an auto PR CUDA build that includes all Primary Tested (PT) packages, and ROL is a PT package (see #2464 (closed)). Also, SPARC uses ROL, and supporting SPARC means testing ROL on all of the platforms where SPARC uses it. CUDA is an important build on many of those platforms.
How Has This Been Tested?
I tested this on 'white' with:
$ cd ~/Trilinos.base/BUILD/WHITE/CUDA/CUDA-DEBUG/
$ source ~/Trilinos.base/Trilinos/cmake/std/atdm/load-env.sh cuda-debug
Hostname 'white11' matches known ATDM host 'white' and system 'ride'
ATDM_CONFIG_TRILNOS_DIR = /home/rabartl/Trilinos.base/Trilinos
Setting default compiler and build options for JOB_NAME='cuda-debug'
Using white/ride compiler stack CUDA to build DEBUG code with Kokkos node type CUDA
$ time cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnvAllPtPackages.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_ROL=ON \
~/Trilinos.base/Trilinos \
&> configure.out
real 1m43.759s
user 0m58.268s
sys 0m17.081s
$ time make NP=16 &> make.out
real 54m28.573s
user 696m12.668s
sys 80m53.877s
$ time bsub -x -Is -q rhel7F -n 16 ctest -j16 --timeout 600 &> ctest.out
real 14m51.969s
user 0m0.032s
sys 0m0.035s
The build passed and the test results were:
90% tests passed, 16 tests failed out of 156
Subproject Time Summary:
ROL = 11219.28 sec*proc (156 tests)
Total Test time (real) = 890.82 sec
The following tests FAILED:
32 - ROL_test_elementwise_TpetraMultiVector_MPI_4 (Failed)
130 - ROL_example_PDE-OPT_0ld_poisson_example_01_MPI_4 (Failed)
131 - ROL_example_PDE-OPT_0ld_stefan-boltzmann_example_03_MPI_4 (Failed)
134 - ROL_example_PDE-OPT_0ld_adv-diff-react_example_01_MPI_4 (Failed)
135 - ROL_example_PDE-OPT_0ld_adv-diff-react_example_02_MPI_4 (Timeout)
136 - ROL_example_PDE-OPT_0ld_stoch-adv-diff_example_01_MPI_4 (Timeout)
137 - ROL_example_PDE-OPT_poisson_example_01_MPI_4 (Failed)
139 - ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 (Failed)
141 - ROL_example_PDE-OPT_stefan-boltzmann_example_03_MPI_4 (Failed)
142 - ROL_example_PDE-OPT_adv-diff-react_example_02_MPI_4 (Failed)
143 - ROL_example_PDE-OPT_navier-stokes_example_01_MPI_4 (Timeout)
144 - ROL_example_PDE-OPT_navier-stokes_example_02_MPI_4 (Failed)
145 - ROL_example_PDE-OPT_obstacle_example_01_MPI_4 (Failed)
150 - ROL_example_PDE-OPT_nonlinear-elliptic_example_01_MPI_4 (Failed)
151 - ROL_example_PDE-OPT_nonlinear-elliptic_example_02_MPI_4 (Failed)
152 - ROL_example_PDE-OPT_topo-opt_poisson_example_01_MPI_4 (Failed)
Errors while running CTest
Those are the same 16 tests already shown failing in the build Trilinos-atdm-white-ride-cuda-debug-pt-all-at-once
(for example, as shown here). (I will create a new GitHub issue for those failing tests once this PR is merged.)
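One way to double-check that claim is to extract the failing test names from both ctest logs and compare them. The following is only a sketch: `failed_tests` is a hypothetical helper, not part of Trilinos or the ATDM scripts, and `baseline.out` is an assumed saved copy of the ctest output from the baseline build. It relies on the summary lines having the form `<number> - <name> (Failed)` or `(Timeout)`, as in the output above.

```shell
# Hypothetical helper: print the names of the tests listed as Failed or
# Timeout in a ctest log, one per line, sorted.
failed_tests() {
  sed -n -E 's/^[[:space:]]*[0-9]+ - ([^ ]+) \((Failed|Timeout)\).*$/\1/p' "$1" | sort
}

# Usage sketch (baseline.out is an assumed file name):
#   failed_tests baseline.out > baseline.txt
#   failed_tests ctest.out    > current.txt
#   diff baseline.txt current.txt && echo "same set of failing tests"
```

Sorting by name rather than by test number keeps the comparison stable if the two builds number their tests differently.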
Checklist
- My commit messages mention the appropriate GitHub issue numbers.