Trilinos issues
https://gitlab.osti.gov/jmwille/Trilinos/-/issues
2019-04-11T18:23:38Z

Tpetra: Deprecate MultiVectorFiller
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4840
2019-04-11T18:23:38Z
James Willenbring
*Created by: csiefer2*
Because Tpetra::FEMultiVector is so much better...

ifpack2: build error with scalar=FLOAT and COMPLEX_DOUBLE enabled
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4858
2019-04-11T16:10:32Z
James Willenbring
*Created by: ajpowel*
@trilinos/ifpack2
## Current Behavior
```
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_Vector_SIMD_Arith.hpp:619:5: note: template argument deduction/substitution failed:
In file included from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_Gemm_Serial_Internal.hpp:12:0,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_Gemm_Serial_Impl.hpp:8,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_BlockTriDiContainer_def.hpp:57,
from /scratch/ajpowel/code_032119/packages/ifpack2/src/Ifpack2_BlockTriDiContainer.hpp:2,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_ContainerFactory_def.hpp:51,
from /scratch/ajpowel/code_032119/packages/ifpack2/src/Ifpack2_ContainerFactory.hpp:2,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_BlockRelaxation_decl.hpp:52,
from /scratch/ajpowel/code_032119/packages/ifpack2/src/Ifpack2_BlockRelaxation.hpp:1,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_Details_OneLevelFactory_def.hpp:54,
from /scratch/ajpowel/code_032119/packages/ifpack2/src/Ifpack2_Details_OneLevelFactory.hpp:2,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_Details_Factory_def.hpp:46,
from /scratch/ajpowel/code_032119/packages/ifpack2/src/Ifpack2_Details_Factory.hpp:2,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_Factory_decl.hpp:48,
from /scratch/ajpowel/code_032119/packages/ifpack2/src/Ifpack2_Factory.hpp:1,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_Details_LinearSolverFactory_def.hpp:54,
from /scratch/ajpowel/code_032119/packages/ifpack2/src/Ifpack2_Details_LinearSolverFactory.hpp:2,
from /scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_Details_registerLinearSolverFactory.cpp:45:
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_InnerGemmFixC_Serial_Impl.hpp:1101:67: note: mismatched types 'Kokkos::complex<RealType1>' and 'double'
C[0*_cs0+0*_cs1] += alpha * c_00; C[0*_cs0+1*_cs1] += alpha * c_01;
~~~~~~^~~~~~
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_InnerGemmFixC_Serial_Impl.hpp: In instantiation of 'int KokkosBatched::Experimental::InnerGemmFixC<mb, nb>::serial_invoke(ScalarType, const ValueType*, const ValueType*, int, ValueType*) [with ScalarType = double; ValueType = KokkosBatched::Experimental::Vector<KokkosBatched::Experimental::SIMD<float>, 16>; int mb = 1; int nb = 1]':
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_InnerGemmFixC_Serial_Impl.hpp:1285:80: required from 'int KokkosBatched::Experimental::InnerGemmFixC<mb, nb>::serial_invoke(ScalarType, const ValueType*, const ValueType*, int, int, int, ValueType*) [with ScalarType = double; ValueType = KokkosBatched::Experimental::Vector<KokkosBatched::Experimental::SIMD<float>, 16>; int mb = 2; int nb = 2]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_InnerGemmFixC_Serial_Impl.hpp:1259:71: required from 'int KokkosBatched::Experimental::InnerGemmFixC<mb, nb>::serial_invoke(ScalarType, const ValueType*, const ValueType*, int, int, int, ValueType*) [with ScalarType = double; ValueType = KokkosBatched::Experimental::Vector<KokkosBatched::Experimental::SIMD<float>, 16>; int mb = 3; int nb = 3]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_InnerGemmFixC_Serial_Impl.hpp:1230:71: required from 'int KokkosBatched::Experimental::InnerGemmFixC<mb, nb>::serial_invoke(ScalarType, const ValueType*, const ValueType*, int, int, int, ValueType*) [with ScalarType = double; ValueType = KokkosBatched::Experimental::Vector<KokkosBatched::Experimental::SIMD<float>, 16>; int mb = 4; int nb = 4]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_Gemm_Team_Internal.hpp:139:13: required from 'KokkosBatched::Experimental::TeamGemmInternal<ArgAlgo>::invoke(const MemberType&, int, int, int, ScalarType, const ValueType*, int, int, const ValueType*, int, int, ScalarType, ValueType*, int, int) [with MemberType = MemberType; ScalarType = ScalarType; ValueType = ValueType; ArgAlgo = KokkosBatched::Experimental::Algo::Level3::Blocked]::<lambda(int, int, int, const ValueType*, const ValueType*, ValueType*)>::<lambda(const int&)> [with MemberType = Kokkos::Impl::HostThreadTeamMember<Kokkos::Serial>; ScalarType = double; ValueType = KokkosBatched::Experimental::Vector<KokkosBatched::Experimental::SIMD<float>, 16>]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_Gemm_Team_Internal.hpp:144:44: required from 'struct KokkosBatched::Experimental::TeamGemmInternal<ArgAlgo>::invoke(const MemberType&, int, int, int, ScalarType, const ValueType*, int, int, const ValueType*, int, int, ScalarType, ValueType*, int, int) [with MemberType = MemberType; ScalarType = ScalarType; ValueType = ValueType; ArgAlgo = KokkosBatched::Experimental::Algo::Level3::Blocked]::<lambda(int, int, int, const ValueType*, const ValueType*, ValueType*)> [with MemberType = Kokkos::Impl::HostThreadTeamMember<Kokkos::Serial>; ScalarType = double; ValueType = KokkosBatched::Experimental::Vector<KokkosBatched::Experimental::SIMD<float>, 16>]::<lambda(const int&)>'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_Gemm_Team_Internal.hpp:130:11: [ skipping 12 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos/core/src/Kokkos_Parallel.hpp:191:4: required from 'void Kokkos::parallel_for(const ExecPolicy&, const FunctorType&, const string&, typename Kokkos::Impl::enable_if<Kokkos::is_execution_policy<ExecPolicy>::value>::type*) [with ExecPolicy = Kokkos::TeamPolicy<Kokkos::Serial, Ifpack2::BlockTriDiContainerDetails::ExtractAndFactorizeTridiags<Tpetra::RowMatrix<float, int, long long int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> > >::ExtractAndFactorizeTag>; FunctorType = Ifpack2::BlockTriDiContainerDetails::ExtractAndFactorizeTridiags<Tpetra::RowMatrix<float, int, long long int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> > >; std::__cxx11::string = std::__cxx11::basic_string<char>; typename Kokkos::Impl::enable_if<Kokkos::is_execution_policy<ExecPolicy>::value>::type = void]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos/core/src/Kokkos_Parallel.hpp:244:25: required from 'void Kokkos::parallel_for(const string&, const ExecPolicy&, const FunctorType&) [with ExecPolicy = Kokkos::TeamPolicy<Kokkos::Serial, Ifpack2::BlockTriDiContainerDetails::ExtractAndFactorizeTridiags<Tpetra::RowMatrix<float, int, long long int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> > >::ExtractAndFactorizeTag>; FunctorType = Ifpack2::BlockTriDiContainerDetails::ExtractAndFactorizeTridiags<Tpetra::RowMatrix<float, int, long long int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> > >; std::__cxx11::string = std::__cxx11::basic_string<char>]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_BlockTriDiContainer_impl.hpp:1826:29: required from 'void Ifpack2::BlockTriDiContainerDetails::ExtractAndFactorizeTridiags<MatrixType>::run() [with MatrixType = Tpetra::RowMatrix<float, int, long long int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> >]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_BlockTriDiContainer_impl.hpp:1849:7: required from 'void Ifpack2::BlockTriDiContainerDetails::performNumericPhase(const Teuchos::RCP<const typename Ifpack2::BlockTriDiContainerDetails::ImplType<MatrixType>::tpetra_block_crs_matrix_type>&, const Ifpack2::BlockTriDiContainerDetails::PartInterface<MatrixType>&, Ifpack2::BlockTriDiContainerDetails::BlockTridiags<MatrixType>&, typename Ifpack2::BlockTriDiContainerDetails::ImplType<MatrixType>::magnitude_type) [with MatrixType = Tpetra::RowMatrix<float, int, long long int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> >; typename Ifpack2::BlockTriDiContainerDetails::ImplType<MatrixType>::tpetra_block_crs_matrix_type = Tpetra::Experimental::BlockCrsMatrix<float, int, long long int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> >; typename Ifpack2::BlockTriDiContainerDetails::ImplType<MatrixType>::magnitude_type = float]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_BlockTriDiContainer_def.hpp:235:9: required from 'void Ifpack2::BlockTriDiContainer<MatrixType, Ifpack2::BlockTriDiContainerDetails::ImplSimdTag>::compute() [with MatrixType = Tpetra::RowMatrix<float, int, long long int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> >]'
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/ifpack2/src/Ifpack2_Details_registerLinearSolverFactory.cpp:76:1: required from here
/scratch/ajpowel/code_032119/TPLs_src/Trilinos/packages/kokkos-kernels/src/batched/KokkosBatched_InnerGemmFixC_Serial_Impl.hpp:1139:33: error: no match for 'operator*' (operand types are 'const double' and 'KokkosBatched::Experimental::Vector<KokkosBatched::Experimental::SIMD<float>, 16>')
C[0*_cs0+0*_cs1] += alpha * c_00;
~~~~~~^~~~~~
```
## Steps to Reproduce
0) Comment out line 986 of $PROJECT/packages/tpetra/CMakeLists.txt (suppressing fail message to prevent possible Thyra build failure)
1) Configure Trilinos packages:
```
cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_Fortran_COMPILER=mpifort -DTrilinos_ENABLE_Tpetra=ON -DTrilinos_ENABLE_COMPLEX_DOUBLE=ON -DTrilinos_ENABLE_FLOAT=ON -DTrilinos_ENABLE_Teuchos=ON -DTrilinos_ENABLE_Teko=ON /scratch/ajpowel/code_032119/TPLs_src/Trilinos
```
2) Attempt to build ifpack2:
```
cd $PROJECT/packages/ifpack2
make -j 64
```

MueLu: update LTG Matrix kernels in TpetraExt
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4850
2019-04-10T20:53:18Z
James Willenbring
*Created by: jjellio*
Update the TpetraExt (LTG) kernels to use improved copies and memory management.
These changes were not propagated from the work done last fall. This issue mirrors the PR being submitted.
@trilinos/muelu
@csiefer2
## Expectations
- [x] The kernels will use bulk threaded copies for copy-out
- [x] The kernels will compute the rowptr in-place (reduced memory overhead)

MueLu: deprecate code in preparation for next release
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4728
2019-04-10T18:07:58Z
James Willenbring
*Created by: jhux2*
@trilinos/muelu
If there are features that we want to deprecate in MueLu, we should do so before the next minor release, which is around April 15.

MueLu build failures in new ATDM Trilinos sems-rhel7+cuda+complex builds
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4599
2019-04-10T17:41:55Z
James Willenbring
*Created by: bartlettroscoe*
CC: @trilinos/muelu, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
<status-and-or-first-action>
## Description
As shown in [this query](https://testing.sandia.gov/cdash-dev-view/index.php?project=Trilinos&date=2019-03-11&filtercount=2&showfilters=1&filtercombine=and&field1=subprojects&compare1=93&value1=MueLu&field2=buildname&compare2=65&value2=Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-), MueLu has build errors in library code in the new cuda+complex builds:
* `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug`
* `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug`
using the 'sems-rhel7' env.
The build errors shown [here](https://testing.sandia.gov/cdash-dev-view/viewBuildError.php?buildid=4695056) and [here](https://testing.sandia.gov/cdash-dev-view/viewBuildError.php?buildid=4695082) occur when building the source files **`ExplicitInstantiation/MueLu_TentativePFactory_kokkos.cpp`**, with errors like:
* `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug/SRC_AND_BUILD/Trilinos/packages/kokkos/core/src/Kokkos_View.hpp(816): error: calling a constexpr __host__ function("std::real<double> ") from a __device__ function("Kokkos::Impl::ParallelFor< ::, ::Kokkos::RangePolicy<int, ::Kokkos::Cuda > , ::Kokkos::Cuda> ::operator () const") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.`
and **`ExplicitInstantiation/MueLu_TentativePFactory_kokkos.cpp`** showing errors like:
* `Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug/SRC_AND_BUILD/Trilinos/packages/kokkos/core/src/Kokkos_View.hpp(971): error: calling a constexpr __host__ function("std::complex<double> ::complex") from a __device__ function("Kokkos::Impl::ParallelFor< ::, ::Kokkos::RangePolicy<int, ::Kokkos::Cuda > , ::Kokkos::Cuda> ::operator () const") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.`
## Current Status on CDash
The current status of these builds over the last 7 days can be seen in [this query](https://testing.sandia.gov/cdash/index.php?project=Trilinos&date=2019-03-11&filtercount=3&showfilters=1&filtercombine=and&field1=subprojects&compare1=93&value1=MueLu&field2=buildname&compare2=65&value2=Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-&field3=buildstarttime&compare3=83&value3=7%20days%20ago).
## Steps to Reproduce
These builds are from the CEE LAN machine 'ascicgpu14', and someone with access to the CEE LAN should be able to log onto 'ascicgpu15' and reproduce these failures as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for the system 'sems-rhel7' are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#sems-rhel7-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh \
sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
$TRILINOS_DIR
$ ninja -j16
```
Since some developers do not have access to the SRN CEE LAN, it is likely that these build errors can also be reproduced on other machines that have a CUDA build. For example, one can likely reproduce these build errors on the SON machine 'white' as described at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#ridewhite
using the commands:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-complex-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
$TRILINOS_DIR
$ ninja -j16
```
Initial cleanup of new ATDM builds of Trilinos

KokkosKernels broke Albany build on Ride
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4829
2019-04-10T14:34:42Z
James Willenbring
*Created by: ikalash*
It appears some changes to KokkosKernels broke Albany's Ride build over the weekend. Here is the error:
```
[ 58%] Building C object packages/aztecoo/src/CMakeFiles/aztecoo.dir/az_gsumd_puma.c.o
/.../repos/Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_impl_color.hpp(516): error: namespace "KokkosGraph::Experimental" has no member "d2_graph_color"
--
```
http://cdash.sandia.gov/CDash-2-3-0/viewBuildError.php?buildid=83384
Could someone please have a look?
@trilinos/kokkos-kernels

ROL test timing out on ATDM intel-18 mpich build
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4251
2019-04-10T14:22:11Z
James Willenbring
*Created by: fryeguy52*
CC: @trilinos/rol, @rppawlo (Trilinos Nonlinear Solvers Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
## Description
As shown in [this query](https://testing.sandia.gov/cdash-dev-view/queryTests.php?project=Trilinos&date=2018-10-09&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt&field2=testname&compare2=61&value2=ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4&field3=buildstarttime&compare3=84&value3=2019-01-23&field4=buildstarttime&compare4=83&value4=2019-01-01) the test:
* ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4
is timing out in the build:
* Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt
## Current Status on CDash
The current status of this test in this build for the current testing day can be found [here](https://testing.sandia.gov/cdash-dev-view/queryTests.php?project=Trilinos&date=2018-10-09&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt&field2=testname&compare2=61&value2=ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4&field3=buildstarttime&compare3=84&value3=today&field4=buildstarttime&compare4=83&value4=yesterday)
## Steps to Reproduce
One should be able to reproduce this failure on a machine with a cee rhel6 environment as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for a machine with a cee rhel6 environment are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#cee-rhel6-environment
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-cee-rhel6_intel-18.0.2_mpich2-3.2_openmp_static_opt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Rol=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -j16
```
Initial cleanup of new ATDM builds of Trilinos

ATDM_CONFIG_MPI_EXEC Causes Problems on Chama
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3844
2019-04-10T14:10:28Z
James Willenbring
*Created by: jmgate*
Hey @bartlettroscoe, I'm having trouble getting a build of EMPIRE up and running on `chama`. It'll build, but I'm seeing all of EMPIRE-PIC's tests fail (see [EMPIRE_EM-Plasma-Trilinos-chama-test-all-of-EMPIRE-intel-opt-openmp](https://jenkins-srn.sandia.gov/user/jmgate/my-views/view/EMPIRE/job/EMPIRE_EM-Plasma-Trilinos-chama-test-all-of-EMPIRE-intel-opt-openmp/9/consoleFull)). I think what's happening is on `chama` the `ATDM_CONFIG_MPI_EXEC` variable gets set to `srun` in [cmake/std/atdm/common/toss3/environment.sh](https://github.com/trilinos/Trilinos/blob/7e0342c8d2ab8855fe5112b483284a8837911d20/cmake/std/atdm/common/toss3/environment.sh#L74). When we test EMPIRE, we wind up submitting the `ctest` command via `srun`. Unfortunately each EMPIRE-PIC regression test winds up running
```
$Trilinos_MPI_EXEC -n $NUM_PROCS $EXE --i=$INPUT
```
where `$Trilinos_MPI_EXEC` is `$ATDM_CONFIG_MPI_EXEC`, which is `srun`, so we're effectively invoking `srun` within something that's been submitted via `srun`. I don't know slurm inside and out, but that sounds like a problem.
The question, then, is what should we do about it? In the CMake file that's used for all of EMPIRE-PIC's regression tests, do we need to replace `$Trilinos_MPI_EXEC` with `$(which mpiexec)` or something to that effect? Or do we need to do something more clever with the `$ATDM_CONFIG_MPI_EXEC` variable in `cmake/std/atdm/common/toss3/environment.sh` such that it works for whatever you need it to do, but then it doesn't also mess up EMPIRE's testing on `toss3` systems?
@bathmatt, did you set up our CMake stuff to use `$Trilinos_MPI_EXEC`? Do you know the rationale behind it?

Change from individual CDash error emails to daily summary emails for the ATDM Trilinos builds (and perhaps other efforts)
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2933
2019-04-10T03:32:57Z
James Willenbring
*Created by: bartlettroscoe*
CC: @fryeguy52, @trilinos/framework, @dridzal
## Description
After having to triage the promoted ATDM Trilinos builds for a couple of months now, and from extensive experience on other projects like CASL VERA, I have come to the realization that relying on CDash error emails is not a very effective notification and monitoring scheme in many of these situations. The reasons that CDash error emails are not effective for keeping on top of a lot of builds are that:
1. It is hard to tell if a failing test is new that day or has been failing for multiple days or if that same test is failing across several builds. (All you get is a single email telling you that there is a failure for that one build.)
2. When a failure does occur that results in a CDash error email, there is an urgency to address the problem ASAP (by either fixing, disabling, or reverting commits) in order to make the CDash error email go away. Otherwise, repeated CDash error emails day after day make people accustomed to seeing CDash error emails, and therefore new failures are ignored (and many people will create email filters and just ignore them from that point on).
3. Catastrophic failures due to system issues can occur that result in a huge number of CDash error emails that can spam people (sometimes a Trilinos developer can get a dozen or more emails since they are on several different package regression lists). This can occur for many reasons like the disk filling up, or when the Intel license server goes down, or when a module does not load correctly. The huge glut of CDash error emails that can occur in these cases can obscure new real failures and can cause some people to add email filters (which then makes the CDash error emails worthless).
Instead of relying on individual CDash error emails, we could move to a notification scheme that created a single email each day that summarized the builds and tests and gave some information about the history of failing tests. Such a system could solve all of the problems listed above and make top-level triaging and monitoring of a bunch of related builds much easier.
(NOTE: Really CDash error notification emails are the best solution for a small number of post-push CI builds that you expect to fail only very rarely and you need a notification ASAP. For nightly builds, they are not effective for the reasons described above.)
## Possible Solution
It seems that a straightforward solution would be to write a Python script that extracted data off of the CDash site using multiple queries using the API interface that provides data as JSON data-structures. The Python script would analyze the data and create an HTML-formatted email with useful summary information and CDash URL links.
The full specification is given at:
* https://docs.google.com/document/d/13A6tIXCS5EnL0a3ramu-4TvCMwFeiIKEKjOlP-z1Qvo
The input that one would provide to the Python script would be:
* Name of the set of builds being analyzed (e.g. "ATDM Trilinos Builds")
* Base CDash site (e.g. "https://testing-vm.sandia.gov/cdash/")
* CDash project name (e.g. "project=Trilinos")
* Current testing day (e.g. "YYYY-MM-DD")
* CDash query URL fields (minus `date`, `project`, etc.) for queryTests.php to determine tests to be examined
* CDash query URL fields (minus `date`, `project`, etc.) for index.php for list of builds to be examined
* List of expected builds in the triplet ('site', 'build-name', 'group')
Given this data, the Python script would run queries and extract data off of the queryTests.php page for the current day and the previous two testing days (using the `date=YYYY-MM-DD` URL field) and then display that data as described below (sorted into various lists).
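As a rough sketch of the query step, the three per-day URLs could be built as below. The `api/v1/queryTests.php` path is an assumption about where CDash serves the JSON form of this page, and `query_tests_urls` and its parameters are hypothetical names used only for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def query_tests_urls(cdash_base, project, testing_day, filter_fields, num_days=3):
    """Return one queryTests.php URL per testing day, newest day first.

    filter_fields is the raw CDash filter string (everything except the
    `project` and `date` fields), passed through unchanged.
    """
    urls = []
    for offset in range(num_days):
        day = testing_day - timedelta(days=offset)
        query = urlencode([("project", project), ("date", day.isoformat())])
        if filter_fields:
            query += "&" + filter_fields
        # Assumed location of the JSON API endpoint; adjust for the real site.
        urls.append(f"{cdash_base.rstrip('/')}/api/v1/queryTests.php?{query}")
    return urls
```

Each URL would then be fetched and its JSON decoded into the per-day list of failing tests.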
The Python script would then run the query on the index.php page and would note the builds that had any configure, build or test failures (including "not run" tests) and it would compare the list of builds extracted to the input list of expected builds and then note the expected builds that did not show up.
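A minimal sketch of that comparison, assuming each build extracted from index.php has already been reduced to a dict. The key names (`site`, `buildname`, `group`, and the four counters) are hypothetical and would need to be mapped from whatever the real CDash JSON provides.

```python
def find_missing_expected_builds(expected_builds, cdash_builds):
    """Return the expected ('site', 'build-name', 'group') triplets not found on CDash."""
    found = {(b["site"], b["buildname"], b["group"]) for b in cdash_builds}
    return [triplet for triplet in expected_builds if triplet not in found]


def builds_with_failures(cdash_builds):
    """Return builds with any configure, build, or test failures, or "not run" tests."""
    counters = ("configure_errors", "build_errors", "tests_failed", "tests_not_run")
    return [b for b in cdash_builds if any(b.get(name, 0) > 0 for name in counters)]
```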
Then the Python script would construct an HTML-formatted email with the body having the following data:
* (limited) List of tests that failed today but not the previous day (`t1=???` in summary line)
* (limited) List of tests that failed today and the previous day but not the day before that (`t2=???` in summary line)
* (limited) List of tests that failed today and the previous two consecutive days (`t3+=???` in summary line)
* Total number of "not-run" (non-disabled) tests for current testing day and CDash URL to that list (`tnr=???` in summary line)
* List of current-day builds that had any configure, build, or test failures (including "not run" tests) (`b=???` is the sum of the build failures in those builds shown in summary line)
* List of missing expected builds or builds that exist and pass the configure but don't have test results (`meb=???` in summary) (NOTE: The current CDash implementation will only alert about missing expected builds but it will not alert about builds with missing tests.)
* Total number of builds run and URL to the list of builds.
* Total number of failing tests for the current testing day and the CDash URL
* URL(s) to the list of all failing tests for the current day (but excluding "not run" tests)
The summary line for the email could be something like:
```
FAILED (t1=2, t2=1, t3+=5, tnr=18, b=3, meb=1): ATDM Trilinos Builds
```
That email summary message would look similar to the ones that CDash sends out, and one could see just from the summary line how many tests newly failed in the current testing day (i.e. `t1=2`), how many tests have failed for exactly the last two consecutive days (i.e. `t2=1`), and how many tests have failed for the last three or more consecutive days (i.e. `t3+=5`). It would also show if there were any build failures (i.e. `b=3`) and how many tests were not run (`tnr=18`). Lastly, it would show if there were any missing expected builds (`meb=1`).
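The `t1`/`t2`/`t3+` split and the summary line could be computed roughly as follows. This is only a sketch: it assumes the failing tests for each day are available as sets of identifiers, and it assumes that any nonzero counter makes the overall status `FAILED`. The function names are hypothetical.

```python
def classify_failures(today, yesterday, two_days_ago):
    """Split today's failing tests by how many consecutive days they have failed.

    Each argument is a set of test identifiers that failed on that testing day.
    """
    t1 = today - yesterday                      # failed today only
    t2 = (today & yesterday) - two_days_ago     # failed today and yesterday only
    t3_plus = today & yesterday & two_days_ago  # failed all three days
    return t1, t2, t3_plus


def summary_line(build_set_name, t1, t2, t3_plus, tnr, b, meb):
    """Format the email summary line from the six counters."""
    counters = [("t1", t1), ("t2", t2), ("t3+", t3_plus),
                ("tnr", tnr), ("b", b), ("meb", meb)]
    # Assumption: any nonzero counter means the build set as a whole failed.
    status = "FAILED" if any(count > 0 for _, count in counters) else "PASSED"
    details = ", ".join(f"{name}={count}" for name, count in counters)
    return f"{status} ({details}): {build_set_name}"
```

Calling `summary_line("ATDM Trilinos Builds", 2, 1, 5, 18, 3, 1)` reproduces the example line shown above.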
For the ATDM Trilinos builds, we could run this script on a cron job or a Jenkins job after 12 midnight MT or so (or wait until 5 am to allow all of the jobs to finish).
Other data we might consider reporting on and showing are:
* Number of, URL to, and (limited) list of newly passing tests for the current testing day that failed the previous day (or the last day that the matching builds had any test results) (`tnp=???` in summary line)
* Number of, URL to, and (limited) list of newly missing tests compared to yesterday (but only if the build ran the current day and the tests ran for that build and likewise for the previous day) (`tnm=???` in summary line)
The above two bits of data would really help in determining that failing tests got resolved (either by fixing them or temporarily disabling them).
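A sketch of how `tnp` and `tnm` might be derived, assuming each day's results are held in a dict keyed by `(build_name, test_name)` with a status string as the value. The status strings and the function name are assumptions for illustration, not actual CDash field values.

```python
def newly_passing_and_missing(today_results, yesterday_results):
    """Return (newly passing tests, newly missing tests) as sorted lists of keys.

    Both arguments map (build_name, test_name) -> status string.
    """
    newly_passing = sorted(
        key for key, status in today_results.items()
        if status == "Passed" and yesterday_results.get(key) == "Failed"
    )
    # Only report a test as missing if its build produced test results today;
    # a build that did not run at all is a "missing expected build" instead.
    builds_today = {build for build, _ in today_results}
    newly_missing = sorted(
        key for key in yesterday_results
        if key not in today_results and key[0] in builds_today
    )
    return newly_passing, newly_missing
```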
And since you would only get one email, then I think it would be good to send out the email with the summary line:
```
PASSED (tnp=2, tnm=1): ATDM Trilinos Builds
```
and that email would contain links to the set of 100% passing builds!
That is an email that even a manager might want to get :-)
This script could also allow you to specify a set of "expected may fail" tests, provided as an array of entries with the four fields `[<test-name>, <build-name>, <site-name>, <github-issue-link>]`. Any failing tests that matched these criteria would be listed in their own sublist in the email and could be given `tef=???` in the summary line. These failing tests would not be counted against global pass/fail when they fail, but if they go from failing to passing, that would be listed along with the other "newly passing tests" (e.g. `tnp`). However, a better way to handle this would be to have CTest/CDash mark such tests as EXPECTED_MAY_FAIL as described in [this CTest/CDash backlog item](https://docs.google.com/document/d/1TLHRp8eTNKw7udOhwIxrOYShXQUbxAzsXeOq5cwWnKM/edit#bookmark=id.4w8ld6727hpw), and then this script would automatically handle these tests differently without having to provide a separate list to this script. On the other hand, allowing someone to label a certain test as "expected may fail" specifically in this script would allow different customers to handle the same test differently. For example, one customer might consider a failing MueLu test a show stopper that affects global pass/fail, while another may not and would therefore want to handle it as an "Expected may fail" test that does not affect global pass/fail. You can't do that with a single CTest/CDash property for each test. But without direct CTest/CDash support, the email body would list out the failing test along with the `<github-issue-link>` so one could immediately go to that issue to see how that failing test is being addressed.
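The matching against the "expected may fail" list could look something like this sketch. It assumes failing tests are identified by `(test_name, build_name, site_name)` tuples; `split_expected_may_fail` is a hypothetical helper name.

```python
def split_expected_may_fail(failing_tests, expected_may_fail):
    """Separate failing tests into expected-may-fail and unexpected failures.

    failing_tests: list of (test_name, build_name, site_name) tuples.
    expected_may_fail: list of [test_name, build_name, site_name, issue_link].
    Returns (expected, unexpected); each expected entry carries its issue link.
    """
    issue_for = {(test, build, site): link
                 for test, build, site, link in expected_may_fail}
    expected, unexpected = [], []
    for key in failing_tests:
        if key in issue_for:
            expected.append((key, issue_for[key]))
        else:
            unexpected.append(key)
    return expected, unexpected
```

With this split, `tef` would be `len(expected)`, and only `unexpected` would feed into global pass/fail.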
Even for known failing tests that we did **not** want to mark as "Expected may fail" (and that therefore still impact global pass/fail), it would be useful to mark them with the GitHub issue ID if the failure is known and is being tracked. This could be done by passing in an array of "Known failing" tests with entries `[<test-name>, <build-name>, <site-name>, <github-issue-link>]`. This would be useful when looking at the summary email to know whether we needed to triage those tests or not. (That is, if one sees failing tests that have failed for more than one consecutive day and that don't have a GitHub issue associated with them, then that would be a trigger to triage the failure, create a new Trilinos GitHub issue, and then add the tests to the "Expected may fail" or "Known failing" lists.)
The script could also allow you to specify some "flaky" or "unstable" builds as an array of `[<build-name>, <site-name>]` entries where we expect random test failures. If a test failed in one of these "flaky" or "unstable" builds, then it would be reported in a separate section of the email and would not count toward the global pass/fail. Currently (as of 7/14/2018) we would categorize all of the ATDM Trilinos builds on 'ride' (see #2511) and the builds on 'mutrino' (see [TRIL-214](https://software-sandbox.sandia.gov/jira/browse/TRIL-214)) in this category. That way, we could keep track of these builds in case something big went wrong, but they would not count toward global pass/fail (and therefore would not disrupt automated processes that update Trilinos between branches and application customers). But if more than a small number of test failures occurred (e.g. 4 tests per build), then this could impact global pass/fail. This would keep a new catastrophic failure on one of these platforms from slipping through and allowing an update of Trilinos to an ATDM APP, for example.
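The categories described above can be sketched as follows. This is only a minimal illustration, not the actual script: the function name `categorize`, the `FailingTest` record, the `flaky_limit` parameter, and all test, build, and site names are hypothetical, and the precedence between the lists is an assumption.

```python
from collections import namedtuple

# One failing test as it might be pulled from CDash (field names are assumptions).
FailingTest = namedtuple("FailingTest", ["test", "build", "site"])


def categorize(failing, expected_may_fail, known_failing, flaky_builds,
               flaky_limit=4):
    """Sort failing tests into sublists and compute a global pass/fail flag.

    expected_may_fail, known_failing: lists of
        [<test-name>, <build-name>, <site-name>, <github-issue-link>]
    flaky_builds: list of [<build-name>, <site-name>]
    flaky_limit: failures tolerated per flaky build before it counts
        against global pass/fail (assumed threshold).
    """
    emf = {(t, b, s): link for t, b, s, link in expected_may_fail}
    known = {(t, b, s): link for t, b, s, link in known_failing}
    flaky = {(b, s) for b, s in flaky_builds}

    report = {"expected_may_fail": [], "known_failing": [],
              "flaky": [], "needs_triage": []}
    flaky_counts = {}
    for ft in failing:
        key = (ft.test, ft.build, ft.site)
        build_key = (ft.build, ft.site)
        if key in emf:
            # Listed with its issue link; never affects global pass/fail.
            report["expected_may_fail"].append((ft, emf[key]))
        elif build_key in flaky:
            # Reported separately; only counts if there are too many.
            report["flaky"].append(ft)
            flaky_counts[build_key] = flaky_counts.get(build_key, 0) + 1
        elif key in known:
            # Tracked by an issue, but still affects global pass/fail.
            report["known_failing"].append((ft, known[key]))
        else:
            # No issue associated yet: candidate for triage.
            report["needs_triage"].append(ft)

    too_many_flaky = any(n > flaky_limit for n in flaky_counts.values())
    global_pass = not (report["known_failing"] or report["needs_triage"]
                       or too_many_flaky)
    return report, global_pass


# Hypothetical inputs for illustration only.
failing = [
    FailingTest("TestA", "build-1", "site-1"),
    FailingTest("TestB", "build-2", "site-2"),
    FailingTest("TestC", "build-3", "site-3"),
]
report, global_pass = categorize(
    failing,
    expected_may_fail=[["TestA", "build-1", "site-1", "<github-issue-link>"]],
    known_failing=[],
    flaky_builds=[["build-3", "site-3"]],
)
print(global_pass, [ft.test for ft in report["needs_triage"]])
```

Keeping the lists as plain arrays passed to the script, rather than as per-test CTest/CDash properties, is what would let two customers classify the same test differently.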
## Tasks:
1. Get an initial script working that keeps track of failing tests with existing GitHub issue trackers and can detect new failing tests that need to be triaged, and get basic unit tests in place (see "TODO.txt" file in 'atdm-email' branch of 'TrilinosATDMStatus' repo and 'atdm-email' in TriBITS branch) ... PROGRESS ...
1. Set up a mailman list and a Jenkins job to run the script and post emails to the mailman list (and we can sign up for the mail list). (The mail list will also provide an archive of past results.) (There should be a different mail list for each type of result; e.g. one for the main "Promoted ATDM Trilinos Builds", a different one for "Specialized ATDM Trilinos Builds", etc.)
1. Create documentation about the script somewhere and put in links to this documentation in the generated HTML emails somehow.
1. Flesh out the script to cover all of the types of failures we need to keep track of.
1. ???
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4809Framework: Non-framework machines no longer able to report to Cdash2019-04-09T23:16:48ZJames WillenbringFramework: Non-framework machines no longer able to report to Cdash*Created by: csiefer2*
And for that matter, the entire Nightly track is gone. SAD!
(Originally noticed by @lucbv)
@trilinos/framework
@trilinos/tpetra This will impact deprecation work after the release if it is not resolved by then.https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4842MueLu: update CUDA drivers on geminga2019-04-09T17:37:38ZJames WillenbringMueLu: update CUDA drivers on geminga*Created by: jhux2*
@trilinos/muelu
So we don't forget about this.https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4551New STKUnit test failing in ATDM CUDA PT build on white2019-04-09T12:50:24ZJames WillenbringNew STKUnit test failing in ATDM CUDA PT build on white*Created by: fryeguy52*
CC: @trilinos/stk, @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
<status-and-or-first-action>
## Description
As shown in [this query](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt&field2=testname&compare2=61&value2=STKUnit_tests_stk_ngp_test_utest_MPI_4&field3=site&compare3=61&value3=white&field4=buildstarttime&compare4=84&value4=2019-03-05T00%3A00%3A00&field5=buildstarttime&compare5=83&value5=2019-02-03T00%3A00%3A00) the test:
* STKUnit_tests_stk_ngp_test_utest_MPI_4
is failing in the builds:
* Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt
[Test Output on CDash](https://testing.sandia.gov/cdash/testDetails.php?test=69483834&build=4654832)
This looks like a new test that was added on 2019-02-22 and has been failing since then.
## Current Status on CDash
[The current status and recent history of this test](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2019-03-06&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt&field2=testname&compare2=61&value2=STKUnit_tests_stk_ngp_test_utest_MPI_4&field3=site&compare3=61&value3=white&field4=buildstarttime&compare4=84&value4=today&field5=buildstarttime&compare5=83&value5=14%20days%20ago)
## Steps to Reproduce
One should be able to reproduce this failure on ride or white as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for ride or white are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#ridewhite
The exact commands to reproduce this issue should be:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_STK=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
```Keep promoted "ATDM" builds of Trilinos cleanhttps://gitlab.osti.gov/jmwille/Trilinos/-/issues/4813Zoltan2: example passes extra arguments to Kokkos::View constructor2019-04-06T09:36:26ZJames WillenbringZoltan2: example passes extra arguments to Kokkos::View constructor*Created by: mhoemmen*
@trilinos/zoltan2 @kddevin
`Trilinos/packages/zoltan2/example/block/kokkosBlock.cpp`, line 147, creates a 1-D `Kokkos::View`, but passes it two run-time dimensions. The second dimension is zero, which is why it still builds, but it throws if you disable deprecated Kokkos code.
I am working on a fix.https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4814Tpetra: Hide all currently deprecated code with TPETRA_ENABLE_DEPRECATED_CODE...2019-04-06T09:36:26ZJames WillenbringTpetra: Hide all currently deprecated code with TPETRA_ENABLE_DEPRECATED_CODE macro*Created by: mhoemmen*
@trilinos/tpetra
- [x] For anything in Tpetra marked with `TPETRA_DEPRECATED`, wrap it in `#ifdef TPETRA_ENABLE_DEPRECATED_CODE ... #endif`.
- [x] Make sure that all downstream code builds with `TPETRA_ENABLE_DEPRECATED_CODE` not defined (it is currently defined by default; set the CMake option `Tpetra_ENABLE_DEPRECATED_CODE` (case significant) to `OFF`, in order to build with the macro not defined).
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/4785Zoltan2: MJ giving unbalanced parts for large structured data2019-04-06T09:36:26ZJames WillenbringZoltan2: MJ giving unbalanced parts for large structured data*Created by: kddevin*
@trilinos/zoltan2
## Expectations
For uniformly weighted input, MJ should return balanced parts when not using rectilinear blocks.
## Current Behavior
For large structured data, MJ can return unbalanced parts.
For example, running with 9M points selected as integers within a 1000x1000x1000 cube and dividing into 64 parts, MJ returns imbalance of 3, with half of the parts empty.
```
myGlobalId_t = i 4; localCount = 9000000; globalCount = 9000000
Test: no weights, scalar = double
Imbalance Metrics: (64 existing parts) (32 of which are non-empty)
min max avg imbalance
object count 0 4.219e+05 1.406e+05 3
```
This result is consistent regardless of whether the coordinates are given to MJ as integers or doubles.
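The imbalance figure in the output above can be reproduced with a short calculation. This sketch assumes the reported imbalance is the maximum part size divided by the average part size, which matches the printed numbers (4.219e+05 / 1.406e+05 ≈ 3). The per-part counts below are a hypothetical distribution chosen only to agree with the printed summary (64 parts, 32 non-empty, min 0, max 421875, 9M objects in total); they are not MJ's actual output.

```python
def imbalance(counts):
    """Max part size over average part size (assumed definition)."""
    avg = sum(counts) / len(counts)
    return max(counts) / avg


# Hypothetical per-part object counts consistent with the summary above:
# 32 empty parts and 32 non-empty parts holding 9,000,000 objects in total.
counts = [421875] * 16 + [140625] * 16 + [0] * 32

total = sum(counts)
nonempty = sum(1 for c in counts if c > 0)
avg = total / len(counts)

print(total, nonempty, min(counts), max(counts), avg, imbalance(counts))
```

With perfectly balanced parts every count would equal the average (140625), and the same function would return 1.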
## Motivation and Context
I'd like to partition the nonzeros of a tensor with MJ, but for large-enough tensors, the resulting partition is not balanced.
## Definition of Done
MJ returns balanced parts for this use case.
## Possible Solution
<!---
Not obligatory, but suggest a fix for the bug or documentation, or suggest
ideas on how to implement the addition or change.
-->
## Steps to Reproduce
New test problem zoltan2/test/partition/mj_imbalanced.cpp demonstrates the problem.
Running on one processor:
Zoltan2_mj_imbalanced.exe
## Your Environment
- **Relevant repo SHA1s:**
- **Relevant configure flags or configure script:**
- **Operating system and version:**
- **Compiler and TPL versions:**
77f266c4c5f8c2135b922c299ea531d221a2df5c
All platforms
## Related Issues
<!---
If applicable, let us know how this bug is related to any other open issues:
-->
* Blocks
* Is blocked by
* Follows
* Precedes
* Related to
* Part of
* Composed of
## Additional Information
<!---
Anything else that might be helpful for us to know in addressing this issue:
* Configure log file:
* Build log file:
* Test log file:
* When was the last time everything worked (date/time; SHA1s; etc.)?
* What did you do that made the bug rear its ugly head?
* Have you tried turning it off and on again?
-->
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3543ROL tests failing in targeted CUDA PR build Trilinos-atdm-white-ride-cuda-9.2...2019-04-06T00:16:37ZJames WillenbringROL tests failing in targeted CUDA PR build Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt*Created by: bartlettroscoe*
CC: @trilinos/rol , @rppawlo (Trilinos Nonlinear Solvers Product Area Lead)
## Next Action Status
## Description
The ROL package has 66 failing tests in the build `Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt ` on 'white' and 'ride' as shown [here](https://testing.sandia.gov/cdash-dev-view/viewTest.php?onlyfailed&buildid=3998251) which shows the failing tests:
* ROL_example_PDE-OPT_0ld_adv-diff-react_example_01_MPI_4
* ROL_example_PDE-OPT_0ld_adv-diff-react_example_02_MPI_4
* ROL_example_PDE-OPT_0ld_poisson_example_01_MPI_4
* ROL_example_PDE-OPT_0ld_stefan-boltzmann_example_03_MPI_4
* ROL_example_PDE-OPT_navier-stokes_example_01_MPI_4
* ROL_example_PDE-OPT_navier-stokes_example_02_MPI_4
* ROL_example_PDE-OPT_nonlinear-elliptic_example_01_MPI_4
* ROL_example_PDE-OPT_nonlinear-elliptic_example_02_MPI_4
* ROL_example_PDE-OPT_obstacle_example_01_MPI_4
* ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4
* ROL_example_PDE-OPT_stefan-boltzmann_example_03_MPI_4
* ROL_example_PDE-OPT_topo-opt_poisson_example_01_MPI_4
* ROL_example_tempus_example_parabolic_modeleval_MPI_1
* ROL_example_tempus_example_parabolic_thyravec_MPI_1
* ROL_test_elementwise_TpetraMultiVector_MPI_4
The first failing test `ROL_example_PDE-OPT_0ld_adv-diff-react_example_01_MPI_4` with detailed output shown [here](https://testing.sandia.gov/cdash-dev-view/testDetails.php?test=55725358&build=3998251) shows:
```
Total number of processors: 4
Number of nodes = 1089
Number of cells = 1024
Number of edges = 2112
Cell offsets across processors: {0, 256, 512, 768}
terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaGetLastError() error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt/SRC_AND_BUILD/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp:401
Traceback functionality not available
what(): cudaGetLastError() error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt/SRC_AND_BUILD/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp:401
Traceback functionality not available
[white27:11203] *** Process received signal ***
[white27:11204] *** Process received signal ***
[white27:11203] Signal: Aborted (6)
[white27:11203] Signal code: (-6)
[white27:11204] Signal: Aborted (6)
[white27:11204] Signal code: (-6)
[white27:11203] [ 0] [white27:11204] [ 0] [0x3fff90070478]
[white27:11203] [ 1] [0x3fffa3f00478]
...
```
The output of several of the other failing tests, sampled at random, shows errors like the one above.
This is an important build because we are targeting this build on 'white' and 'ride' as a Trilinos PR testing build (see #2464 ). Also, SPARC uses ROL and as part of https://software-sandbox.sandia.gov/jira/browse/TRIL-212 we are about to update the ATDM Trilinos configuration to test ROL on many platforms (including CUDA builds) so it is critical to get these tests cleaned up for ATDM.
## Steps to reproduce
One should be able to reproduce these build errors on either 'white' or 'ride' by cloning the Trilinos git repo, checking out the 'develop' branch, creating a build directory, and then doing:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_ROL=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
```Initial cleanup of new ATDM builds of Trilinoshttps://gitlab.osti.gov/jmwille/Trilinos/-/issues/3749TrilinosCouplings build and test failures in the build Trilinos-atdm-white-ri...2019-04-06T00:15:10ZJames WillenbringTrilinosCouplings build and test failures in the build Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt starting 2018-10-24*Created by: bartlettroscoe*
@trilinos/trilinoscouplings, @jwillenbring (Trilinos Framework Product Area Lead), @trilinos/muelu, @lucbv
## Next Action Status
Build and test errors were caused by the merge of PR #3723 on 10/23/2018, which enabled this code by allowing `MueLu_ENABLE_Epetra=ON` to be set. Next: Fix or disable these tests?
## Description
As shown [here](https://testing.sandia.gov/cdash-dev-view/viewBuildError.php?buildid=4103503), the TrilinosCouplings example files `IntrepidPoisson_Pamgen_EpetraAztecOO_main.cpp` and `IntrepidPoisson_Pamgen_Epetra_main.cpp` started failing to compile in the build `Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt` on 'ride' and 'white' on 2018-10-27, showing the build errors:
```
/home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt/SRC_AND_BUILD/Trilinos/teuchos/core/src/Teuchos_RCP.hpp(288): error: a value of type "MueLu::Hierarchy<TrilinosCouplings::EpetraIntrepidPoissonExample::ST, int, int, KokkosClassic::DefaultNode::DefaultNodeType> *" cannot be used to initialize an entity of type "MueLu::EpetraOperator::Hierarchy *"
detected during instantiation of "Teuchos::RCP<T>::RCP(const Teuchos::RCP<T2> &) [with T=MueLu::EpetraOperator::Hierarchy, T2=MueLu::Hierarchy<TrilinosCouplings::EpetraIntrepidPoissonExample::ST, int, int, KokkosClassic::DefaultNode::DefaultNodeType>]"
/home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt/SRC_AND_BUILD/Trilinos/trilinoscouplings/examples/scaling/IntrepidPoisson_Pamgen_EpetraAztecOO_main.cpp(259): here
1 error detected in the compilation of "/tmp/tmpxft_00005fee_00000000-6_IntrepidPoisson_Pamgen_EpetraAztecOO_main.cpp1.ii".
```
and
```
/home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-debug-pt/SRC_AND_BUILD/Trilinos/teuchos/core/src/Teuchos_RCP.hpp(288): error: a value of type "MueLu::Hierarchy<TrilinosCouplings::EpetraIntrepidPoissonExample::ST, int, int, KokkosClassic::DefaultNode::DefaultNodeType> *" cannot be used to initialize an entity of type "MueLu::EpetraOperator::Hierarchy *"
detected during instantiation of "Teuchos::RCP<T>::RCP(const Teuchos::RCP<T2> &) [with T=MueLu::EpetraOperator::Hierarchy, T2=MueLu::Hierarchy<TrilinosCouplings::EpetraIntrepidPoissonExample::ST, int, int, KokkosClassic::DefaultNode::DefaultNodeType>]"
/home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-debug-pt/SRC_AND_BUILD/Trilinos/trilinoscouplings/examples/scaling/IntrepidPoisson_Pamgen_Epetra_main.cpp(286): here
1 error detected in the compilation of "/tmp/tmpxft_00006693_00000000-6_IntrepidPoisson_Pamgen_Epetra_main.cpp1.ii".
```
This results in the test failures shown, for example, [here](https://testing.sandia.gov/cdash-dev-view/viewTest.php?onlyfailed&buildid=4103503):
* TrilinosCouplings_Example_Maxwell_MueLu_MPI_1
* TrilinosCouplings_Example_Maxwell_MueLu_MPI_4
Looking at the history of the TrilinosCouplings build on 'ride' [here](https://testing.sandia.gov/cdash-dev-view/index.php?project=Trilinos&date=2018-10-27&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-white-ride-cuda-9.2-debug-pt&field2=site&compare2=61&value2=ride&field3=subprojects&compare3=93&value3=TrilinosCouplings&field4=buildstarttime&compare4=83&value4=2018-09-26), we can see that these build failures started on 2018-10-24.
Looking at the git commits pulled that day shown [here](https://testing.sandia.gov/cdash-dev-view/viewNotes.php?buildid=4088183#!#note6), we see that the failures were likely caused by PR #3722 with commits from @lucbv.
## Current Status on CDash
To see the current status of the build and tests for the current testing day and the previous few days, click the link below:
* [TrilinosCouplings build and test results for 'Trilinos-atdm-white-ride-cuda-9.2-debug-pt' on 'ride'](https://testing.sandia.gov/cdash-dev-view/index.php?project=Trilinos&filtercount=4&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-white-ride-cuda-9.2-debug-pt&field2=site&compare2=61&value2=ride&field3=subprojects&compare3=93&value3=TrilinosCouplings&field4=buildstarttime&compare4=83&value4=10%20days%20ago)
NOTE: On the above page, click on the "Start Time" column header to see the build results sorted by date.
## Steps to Reproduce
One should be able to reproduce these build errors on either 'white' or 'ride' by cloning the Trilinos git repo, checking out the 'develop' branch, creating a build directory, and then doing:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_TrilinosCouplings=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
```Initial cleanup of new ATDM builds of Trilinoshttps://gitlab.osti.gov/jmwille/Trilinos/-/issues/4042Zoltan test diff failures in targeted CUDA PR build Trilinos-atdm-white-ride-...2019-04-06T00:14:47ZJames WillenbringZoltan test diff failures in targeted CUDA PR build Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt*Created by: bartlettroscoe*
CC: @trilinos/zoltan, @kddevin (Trilinos <product-area-name> Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
<status-and-or-first-action>
## Description
As shown in [this query](https://testing.sandia.gov/cdash-dev-view/viewTest.php?onlyfailed&buildid=4287837) the tests:
* `Zoltan_ch_simple_zoltan_parallel`
* `Zoltan_ch_grid20x19_zoltan_parallel`
* `Zoltan_ch_ewgt_zoltan_parallel`
* `Zoltan_ch_nograph_zoltan_parallel`
fail in the build `Trilinos-atdm-white-ride-cuda-9.2-release-debug-pt` which is the current candidate CUDA PR build described in #2464. They have failed since we switched from a `debug` build to a `release-debug` build for the reasons described in https://github.com/trilinos/Trilinos/issues/2464#issuecomment-444637454. These are the only new tests that are failing since we switched from a `debug` to a `release-debug` build.
These all look to be "diff" failures like [here](https://testing.sandia.gov/cdash-dev-view/testDetails.php?test=61372846&build=4287837) showing:
```
DEBUG moving files: simple.out.4.3 output/simple.rib-partlocal4.4.3
DEBUG comparing files: answers/simple.rib-partlocal4.4.3 output/simple.rib-partlocal4.4.3
DEBUG comparing files: answers/simple.rib-partlocal4.drops.4.3 output/simple.rib-partlocal4.drops.4.3
DEBUG COMPARISON 1 1
Test simple:rib-partlocal4 FAILED (Diff failed on 1 files)
```
## Current Status on CDash
The current status of these tests/builds for the current testing day can be found at:
* [Zoltan tests in Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt build over last two days](https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt&field2=testname&compare2=65&value2=Zoltan_&field3=buildstarttime&compare3=83&value3=2%20days%20ago)
NOTE: Click "previous" to see the previous day's test results in case this build did not run today or add the filter ["Build Start Time", "is after", "2 weeks ago"] to see history of tests in previous days. (Or create any filters you want from there.)
## Steps to Reproduce
One should be able to reproduce these build errors on either 'white' or 'ride' by cloning the Trilinos git repo, checking out the 'develop' branch, creating a build directory, and then doing:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Zoltan=ON \
$TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
```Initial cleanup of new ATDM builds of Trilinoshttps://gitlab.osti.gov/jmwille/Trilinos/-/issues/4797Tpetra::MultiVector: Deprecate getDualView & remove its use from Trilinos2019-04-05T20:34:29ZJames WillenbringTpetra::MultiVector: Deprecate getDualView & remove its use from Trilinos*Created by: mhoemmen*
@trilinos/tpetra
- [x] Deprecate `Tpetra::MultiVector::getDualView`.
- [x] Remove its use from Trilinos.
- [x] Explain (in the method's Doxygen documentation) how to get the same functionality.
## Motivation and Context
The `getDualView` method exposes an implementation detail of `Tpetra::MultiVector`. This hinders fixing #364 and #333.https://gitlab.osti.gov/jmwille/Trilinos/-/issues/3715Tempus: Add Stepper Initial Conditions2019-04-05T20:00:01ZJames WillenbringTempus: Add Stepper Initial Conditions*Created by: ccober6*
Adding Stepper initial conditions has several benefits:
* Ensure that the initial SolutionState is consistent, i.e., x, xDot, and xDotDot satisfy the governing equation at the initial time.
* Currently explicit Steppers are not consistent until the next time step fills in the currentState. This change will make the SolutionState consistent within the time step.
* For explicit Steppers with a consistent SolutionState, the residual can be determined and compared to residuals from implicit Steppers. This is SPARC's main driver for this change.
* For DAEs, making the initial condition consistent is critical to obtain the correct solution.
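The notion of a consistent initial state in the list above can be illustrated with a small sketch. It assumes an explicit ODE of the form xDot = f(x, t) and does not use the Tempus API: the model `f`, the helpers `residual` and `make_consistent`, and all values are hypothetical.

```python
def f(x, t):
    """Hypothetical right-hand side of the explicit ODE xDot = f(x, t)."""
    return -2.0 * x


def residual(x, x_dot, t):
    """Residual of the governing equation; zero for a consistent state."""
    return x_dot - f(x, t)


def make_consistent(x0, t0):
    """Fill in xDot at the initial time so that (x, xDot) satisfies the ODE."""
    return x0, f(x0, t0)


t0, dt = 0.0, 0.1

# An initial state with xDot left unset (here 0.0) is inconsistent.
r_unset = residual(1.0, 0.0, t0)

# After applying the initial condition, the residual vanishes.
x, x_dot = make_consistent(1.0, t0)
r0 = residual(x, x_dot, t0)

# One Forward Euler step starting from the consistent state.
x1 = x + dt * x_dot

print(r_unset, r0, x1)
```

Because the residual is defined the same way for any state, a value computed from an explicit Stepper with a consistent state could be compared directly with one from an implicit Stepper.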
@trilinos/tempus
## Expectations
* Changes for this issue should not affect solutions.
## Definition of Done
- [x] First convert Forward Euler.
- [x] Convert all other explicit Steppers, ERK, Leapfrog, ...
- [x] Convert all implicit Steppers
- [x] Passes all tests.