
Finish robust install of Trilinos when there are individual package build or install failures

James Willenbring requested to merge tribits_github_snapshot into develop

Created by: bartlettroscoe

This should finally make installs of Trilinos robust when there are build failures (as part of #2689). For example, a build failure in Panzer would not break the installs of packages used by SPARC. I tested a real-life use case with Trilinos and SPARC and verified that this really works (see details below).

Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'

At commit:

commit 3b02ce896ad948e7505804caa86e18835b677d3e
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Thu Apr 18 17:36:32 2019 -0600
Summary: Make usage of <Package>Config.cmake robust when there are broken packages (trilinos/Trilinos#2689)

How was this tested?

There are strong automated tests in TriBITS for this, but I also ran a real-life use case where I broke Phalanx (and therefore also Panzer) and verified that SPARC built correctly and passed its tests against the remaining packages (because SPARC does not use Phalanx or Panzer).

Detailed manual test details:

(4/19/2019)

Testing SPARC against an install of Trilinos where Phalanx is broken. To do that, I need to build and install Trilinos manually and then build and test SPARC against that Trilinos install. I will use the build: cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt.

First, get Trilinos and TriBITS repos into the right state:

$ cd /scratch/rabartl/Trilinos.base/Trilinos/

$ git checkout atdm-nightly

$ git pull

$ cd TriBITS/

$ git fetch github

$ git checkout --track github/trilinos-2689-robust-proj-config-install

Now break the Phalanx build so that Phalanx will not produce libphalanx.a:

$ cd /scratch/rabartl/Trilinos.base/Trilinos/

$ echo "This file is broken" >> packages/phalanx/cmake/Phalanx_config.hpp.in 

$ git diff
diff --git a/packages/phalanx/cmake/Phalanx_config.hpp.in b/packages/phalanx/cmake/Phalanx_config.hpp.in
index 31af6f4..8f690c0 100644
--- a/packages/phalanx/cmake/Phalanx_config.hpp.in
+++ b/packages/phalanx/cmake/Phalanx_config.hpp.in
@@ -27,3 +27,4 @@
 @PHALANX_DEPRECATED_DECLARATIONS@
 
 #endif
+This file is broken

Now to build and install Trilinos using this version of TriBITS:

$ cd /scratch/rabartl/Trilinos.base/BUILDS/ATDM/CEE-RHEL6/CHECKIN/

$ ./checkin-test-atdm-cee-rhel6.sh \
  cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt \
  --enable-all-packages=on \
  --configure

$ cd cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt/

$ . load-env.sh 
Hostname 'ceerws1113' matches known ATDM host 'cee-rhel6' and system 'cee-rhel6'
Setting compiler and build options for buld name 'cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt'
Using CEE RHEL6 compiler stack CLANG-5.0.1_OPENMPI-1.10.2 to build RELEASE code with Kokkos node type SERIAL

$ rm -r CMake*

$ time ./do-configure \
  -DCMAKE_INSTALL_PREFIX=install \
  -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
  -DTrilinos_TRIBITS_DIR:STRING=TriBITS/tribits \
  -DTrilinos_ENABLE_ALL_PACKAGES=ON \
  -DTrilinos_ENABLE_TESTS=OFF \
  &> configure.out

real    0m22.984s
user    0m16.525s
sys     0m17.262s

$ time ninja -j16 -k 999999 &> make.out

real    14m52.077s
user    223m8.497s
sys     8m12.935s

$ time ninja install_package_by_package &> make.install.out

real    0m12.278s
user    0m9.684s
sys     0m1.507s

This created the installation with a lot of libraries:

$ ls install/lib/ | wc -l
132

This did not create the Phalanx library, but it did create a few Panzer libraries:

$ ls install/lib/ | grep phalanx
[empty]

$ ls install/lib/ | grep panzer
libpanzer-core.a
libpanzer-dof-mgr.a

We see a lot of build errors in Phalanx and Panzer:

$ grep FAILED make.out | grep /phalanx/  | wc -l
4

$ grep FAILED make.out | grep /panzer/  | wc -l
155
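
The two counts above can be gathered in one pass. The following is a minimal sketch, assuming the same make.out log and the same "FAILED" marker that ninja prints for a failed build step; the function name count_failed_by_package is hypothetical and not part of TriBITS or the ATDM scripts.

```shell
# Count "FAILED" lines per package directory in a saved build log.
# Usage: count_failed_by_package <log-file> <package-dir>...
# (Hypothetical helper; it only wraps the grep pipeline shown above.)
count_failed_by_package() {
  log_file="$1"
  shift
  for pkg in "$@"; do
    # grep -c prints the number of matching lines (0 if there are none).
    # "|| true" keeps the zero-match exit status from aborting a caller
    # that runs with "set -e".
    count=$(grep FAILED "$log_file" | grep -c "/$pkg/" || true)
    printf '%s %s\n' "$pkg" "$count"
  done
}
```

Called as `count_failed_by_package make.out phalanx panzer`, this prints one "package count" line per package, which should match the numbers from the individual grep commands.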

Set up for the standard install format:

$ cd /scratch/rabartl/Trilinos.base/BUILDS/ATDM/CEE-RHEL6/CHECKIN/cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt/

$ ln -s install cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt

Now to test SPARC 'master' against this:

$ env \
    ATDM_TRIL_SPARC_BUILDS_LIST=cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt \
    ATDM_TRIL_SPARC_SKIP_NATIVE_BUILD=1 \
    ATDM_TRIL_SPARC_ATDM_USE_INSTALL_DIR=/scratch/rabartl/Trilinos.base/BUILDS/ATDM/CEE-RHEL6/CHECKIN/cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt \
  ./sparc-tril-dev-scripts/run_builds_and_tests.sh 

Shoot, this gave the same build error:

/scratch/rabartl/Trilinos.base/BUILDS/ATDM/CEE-RHEL6/CHECKIN/cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt/cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt/include/Ifpack2_Relaxation_def.hpp:147:6: error: variable templates are a C++14 extension [-Werror,-Wc++14-extensions]
void Relaxation<MatrixType>::updateCachedMultiVector(const Teuchos::RCP<const Tpetra::Map<local_ordinal_type,global_ordinal_type,node_type> > & map, size_t numVecs) const{
     ^

reported in:

I will turn off -Werror and see what happens. I added the SPARC_CONFIG_NO_WERROR=1 setting to the command:

$ env \
    ATDM_TRIL_SPARC_BUILDS_LIST=cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt \
    SPARC_CONFIG_NO_WERROR=1 \
    ATDM_TRIL_SPARC_SKIP_NATIVE_BUILD=1 \
    ATDM_TRIL_SPARC_ATDM_USE_INSTALL_DIR=/scratch/rabartl/Trilinos.base/BUILDS/ATDM/CEE-RHEL6/CHECKIN/cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt \
  ./sparc-tril-dev-scripts/run_builds_and_tests.sh 

I also had to fix other problems with SPARC using Trilinos (with Ifpack2 breaking backward compatibility). That returned the test result:

100% tests passed, 0 tests failed out of 280

...

Total Test time (real) = 665.91 sec

So that actually passed!
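
To repeat this check without reading the output by eye, the failed-test count can be pulled out of the CTest summary line. This is only a sketch: it assumes the test output was saved to a file (the name ctest.out below is hypothetical, since the script above prints to the terminal) and that the summary line has the same form as the one shown above. The function name ctest_failed_count is also hypothetical.

```shell
# Print the number of failed tests from a saved CTest summary.
# Usage: ctest_failed_count <ctest-output-file>
# (Hypothetical helper; the saved-output file is an assumption.)
ctest_failed_count() {
  # Match the summary line "N% tests passed, F tests failed out of T"
  # and print only F. Nothing is printed if no summary line is present.
  sed -n 's/.*tests passed, \([0-9][0-9]*\) tests failed out of [0-9][0-9]*.*/\1/p' "$1"
}
```

With that in place, `[ "$(ctest_failed_count ctest.out)" = "0" ]` succeeds only when the summary reports zero failed tests, and fails if the summary line is missing.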

That means that I have verified that the TriBITS 'install_package_by_package' target is robust to package build errors for clients that don't use the broken packages!
