Trilinos issues (https://gitlab.osti.gov/jmwille/Trilinos/-/issues)

# Issue #2919: Belos_rcg_hb_MPI_4 timing out in several ATDM Trilinos builds on 'hansen' since 5/29/2018
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2919
2018-11-30 | James Willenbring

*Created by: bartlettroscoe*
CC: @trilinos/belos, @fryeguy52, @srajama1 (Linear Solves Project Lead)
## Next Action Status
Test was disabled in these builds on 'hansen' in the commit 8850c64 pushed on 6/12/2018 and was shown to be disabled in the builds on CDash 6/13/2018
## Description
As shown in [this large query](https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=19&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=buildname&compare2=62&value2=Trilinos-atdm-mutrino-intel-debug-openmp&field3=buildname&compare3=62&value3=Trilinos-atdm-mutrino-intel-opt-openmp&field4=buildname&compare4=62&value4=Trilinos-atdm-white-ride-cuda-debug-pt-all-at-once&field5=buildname&compare5=62&value5=Trilinos-atdm-serrano-intel-debug-openmp&field6=buildname&compare6=62&value6=Trilinos-atdm-serrano-intel-opt-openmp&field7=buildname&compare7=62&value7=Trilinos-atdm-chama-intel-opt-openmp&field8=buildname&compare8=62&value8=Trilinos-atdm-chama-intel-debug-openmp-panzer&field9=buildname&compare9=62&value9=Trilinos-atdm-chama-intel-debug-openmp&field10=buildname&compare10=62&value10=Trilinos-atdm-chama-intel-opt-openmp-panzer&field11=site&compare11=62&value11=ride&field12=testname&compare12=61&value12=Belos_rcg_hb_MPI_4&field13=buildstarttime&compare13=84&value13=2018-06-08&field14=buildstarttime&compare14=83&value14=2018-05-10&field15=buildname&compare15=62&value15=Trilinos-atdm-white-ride-cuda-opt&field16=buildname&compare16=62&value16=Trilinos-atdm-white-ride-gnu-opt-openmp&field17=site&compare17=62&value17=serrano&field18=site&compare18=62&value18=shiller&field19=buildname&compare19=62&value19=Trilinos-atdm-white-ride-cuda-debug-all-at-once) the test `Belos_rcg_hb_MPI_4` looks to be consistently timing out in the builds:
* Trilinos-atdm-hansen-shiller-cuda-8.0-debug
* Trilinos-atdm-hansen-shiller-cuda-8.0-opt
* Trilinos-atdm-hansen-shiller-cuda-9.0-debug
* Trilinos-atdm-hansen-shiller-cuda-9.0-opt
* Trilinos-atdm-hansen-shiller-gnu-debug-serial
* Trilinos-atdm-hansen-shiller-gnu-opt-serial
all on 'hansen' starting on 5/29/2018 or 5/30/2018. (Since these builds pull directly from the 'develop' branch, they may be testing different versions on the same day, and CDash reports UTC time, so these may fall on the same testing day in Mountain time.)
That same query shows that this test has been consistently passing in every other promoted build on every other ATDM Trilinos testing machine.
What that query also shows is that, in those same builds that are now timing out, the test was already taking upwards of 6+ minutes to complete before it started timing out at 10 minutes on 5/29/2018 or 5/30/2018, as shown in the last non-timing-out builds:
* Trilinos-atdm-hansen-shiller-cuda-8.0-debug: 6m 26s 280ms
* Trilinos-atdm-hansen-shiller-cuda-8.0-opt: 6m 25s 680ms
* Trilinos-atdm-hansen-shiller-cuda-9.0-debug: 6m 22s 810ms
* Trilinos-atdm-hansen-shiller-cuda-9.0-opt: 6m 22s 440ms
* Trilinos-atdm-hansen-shiller-gnu-debug-serial: 6m 13s 150ms
* Trilinos-atdm-hansen-shiller-gnu-opt-serial: 5m 58s 960ms
But in the other builds that are not showing any timeouts, that test completes very quickly (in under 30 seconds in nearly every case). Some of the recent test times shown in that query for the various builds that don't currently have timeouts are:
* Trilinos-atdm-hansen-shiller-gnu-debug-openmp: 23s 850ms
* Trilinos-atdm-hansen-shiller-gnu-opt-openmp: 8s 650ms
* Trilinos-atdm-hansen-shiller-intel-debug-openmp: 7s 720ms
* Trilinos-atdm-hansen-shiller-intel-debug-serial: 7s 950ms
* Trilinos-atdm-hansen-shiller-intel-opt-openmp: 6s 150ms
* Trilinos-atdm-hansen-shiller-intel-opt-serial: 5s 910ms
* Trilinos-atdm-rhel6-gnu-debug-openmp: 6s 840ms
* Trilinos-atdm-rhel6-gnu-debug-serial: 5s 340ms
* Trilinos-atdm-rhel6-gnu-opt-openmp: 5s 180ms
* Trilinos-atdm-rhel6-gnu-opt-serial: 4s 250ms
* Trilinos-atdm-rhel6-intel-opt-openmp: 3s 740ms
* Trilinos-atdm-sems-gcc-7-2-0: 5s 290ms
* Trilinos-atdm-white-ride-cuda-debug: 9s 430ms
* Trilinos-atdm-white-ride-gnu-debug-openmp: 9s 90ms
So this seems pretty crazy. How can the same test take over 6 minutes to complete for a CUDA 8.0 or 9.0 optimized build on 'hansen' but only 9 seconds for a CUDA debug build on 'white'? And this test takes a very long time (and is now timing out) for the `gnu-debug-serial` and `gnu-opt-serial` builds on 'hansen' but is fast for the `intel-debug-serial` and `intel-opt-serial` builds on the same machine. How can that be the case?
To try to get more insight about this test we can look at the test output for a case where it takes a long time to run (and is timing out currently) and compare that to the test output for a case that completes very quickly.
First, let's look at the last time this test passed for the `Trilinos-atdm-hansen-shiller-gnu-debug-serial` build on 'hansen', which took 6m 13s 150ms to complete and pass on 2018-05-29T06:41:19 UTC, with output shown at:
* https://testing-vm.sandia.gov/cdash/testDetails.php?test=47454651&build=3555977
which shows:
```
Passed.......OR Combination ->
OK...........Number of Iterations = 2206 < 4000
Converged....(2-Norm Imp Res Vec) / (2-Norm Res0)
residual [ 0 ] = 9.56537e-07 < 1e-06
residual [ 1 ] = 9.4486e-07 < 1e-06
residual [ 2 ] = 9.24543e-07 < 1e-06
residual [ 3 ] = 9.44363e-07 < 1e-06
residual [ 4 ] = 9.64382e-07 < 1e-06
residual [ 5 ] = 9.14533e-07 < 1e-06
residual [ 6 ] = 9.50517e-07 < 1e-06
residual [ 7 ] = 8.31671e-07 < 1e-06
residual [ 8 ] = 9.59686e-07 < 1e-06
residual [ 9 ] = 9.74218e-07 < 1e-06
==================================================================================================================================
TimeMonitor results over 4 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
----------------------------------------------------------------------------------------------------------------------------------
Belos: Operation Op*x 1.489 (2.114e+04) 1.582 (2.114e+04) 1.668 (2.114e+04) 7.483e-05 (2.114e+04)
Belos: Operation Prec*x 0 (0) 0 (0) 0 (0) 0 (0)
Belos: RCGSolMgr total solve time 365.4 (1) 365.4 (1) 365.4 (1) 365.4 (1)
Epetra_CrsMatrix::Multiply(TransA,X,Y) 1.45 (2.114e+04) 1.542 (2.114e+04) 1.629 (2.114e+04) 7.295e-05 (2.114e+04)
==================================================================================================================================
```
And let's compare this to the test output for the build `Trilinos-atdm-hansen-shiller-intel-debug-serial` on 'hansen' which took 6s 740ms to complete and pass on 2018-05-29T14:52:35 UTC shown at:
* https://testing-vm.sandia.gov/cdash/testDetails.php?test=47482010&build=3557186
which shows:
```
Passed.......OR Combination ->
OK...........Number of Iterations = 2131 < 4000
Converged....(2-Norm Imp Res Vec) / (2-Norm Res0)
residual [ 0 ] = 9.5909e-07 < 1e-06
residual [ 1 ] = 9.65321e-07 < 1e-06
residual [ 2 ] = 8.59334e-07 < 1e-06
residual [ 3 ] = 9.55053e-07 < 1e-06
residual [ 4 ] = 9.97094e-07 < 1e-06
residual [ 5 ] = 7.53902e-07 < 1e-06
residual [ 6 ] = 8.46489e-07 < 1e-06
residual [ 7 ] = 9.64082e-07 < 1e-06
residual [ 8 ] = 9.92318e-07 < 1e-06
residual [ 9 ] = 9.92263e-07 < 1e-06
==================================================================================================================================
TimeMonitor results over 4 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
----------------------------------------------------------------------------------------------------------------------------------
Belos: Operation Op*x 2.026 (2.109e+04) 2.179 (2.109e+04) 2.403 (2.109e+04) 0.0001033 (2.109e+04)
Belos: Operation Prec*x 0 (0) 0 (0) 0 (0) 0 (0)
Belos: RCGSolMgr total solve time 5.945 (1) 5.946 (1) 5.946 (1) 5.946 (1)
Epetra_CrsMatrix::Multiply(TransA,X,Y) 1.975 (2.109e+04) 2.116 (2.109e+04) 2.316 (2.109e+04) 0.0001003 (2.109e+04)
==================================================================================================================================
```
The times for the individual operations are not that different, but "Belos: RCGSolMgr total solve time" at 365.4 vs. 5.946 seconds (roughly a 60x difference) is the real problem. The final results show that the test is doing slightly different computations in these two builds, but the total number of operations is not radically different (e.g., 2.114e+04 vs. 2.109e+04 mat-vecs). So what is going on here to cause the huge increase in wall-clock time for a serial Kokkos threading test?
Looking at the new commits pulled in when this started to fail for the build `Trilinos-atdm-hansen-shiller-gnu-opt-serial` on 2018-05-29 14:05:09 shown at:
* https://testing-vm.sandia.gov/cdash/viewNotes.php?buildid=3560199#!#note0
it is hard to tell what might have caused these tests to start timing out. I would guess that the most likely trigger was:
```
c840658: Switch to CMake 3.11.2, Ninja 1.8.2 and all-at-once mode on hansen/shiller (TRIL-209)
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Tue May 29 08:12:42 2018 -0600
M cmake/ctest/drivers/atdm/shiller/local-driver.sh
M cmake/std/atdm/shiller/environment.sh
```
That will increase the number of tests running on the machine and could result in single tests taking longer to run.
But the fact that the same test takes 6 minutes with GCC but only 7 seconds with Intel is, in my opinion, a major problem that has to be investigated.
Someone is going to need to add some more timers to account for where the time is going.
## Steps to reproduce
One should be able to follow the instructions at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
to reproduce this behavior on 'hansen' or 'shiller'. To avoid needing to run on a compute node, one could use the `gnu-debug-serial` build and do:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh gnu-debug-serial
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Belos=ON \
$TRILINOS_DIR
$ make NP=16
$ ctest -VV -R Belos_rcg_hb_MPI_4
```
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #2925: Test Stratimikos_test_aztecoo_thyra_driver_MPI_1 timing out in Trilinos-atdm-hansen-shiller-gnu-debug-serial build since 5/30/2018
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2925
2018-11-30 | James Willenbring

*Created by: bartlettroscoe*
CC: @trilinos/stratimikos, @fryeguy52
## Next Action Status
Test was disabled for these two builds on 'hansen' in commit 73ae19c pushed on 6/12/2018 and this test disappeared in these builds on 6/13/2018.
## Description
As shown in [this query](https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2018-06-11&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=11&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=status&compare2=62&value2=passed&field3=status&compare3=62&value3=notrun&field4=buildname&compare4=62&value4=Trilinos-atdm-white-ride-cuda-debug-pt-all-at-once&field5=site&compare5=62&value5=mutrino&field6=site&compare6=62&value6=serrano&field7=site&compare7=62&value7=chama&field8=site&compare8=62&value8=ride&field9=buildstarttime&compare9=84&value9=2018-06-11&field10=buildstarttime&compare10=83&value10=2018-05-20&field11=testname&compare11=65&value11=Stratimikos), the test `Stratimikos_test_aztecoo_thyra_driver_MPI_1` has been timing out in the builds `Trilinos-atdm-hansen-shiller-gnu-debug-serial` and `Trilinos-atdm-hansen-shiller-gnu-opt-serial` since 5/30/2018. (That query also shows this is the only Stratimikos test that has failed in any of the promoted "ATDM" builds since 5/20/2018.)
[This query](https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2018-06-11&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=5&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=buildname&compare2=61&value2=Trilinos-atdm-hansen-shiller-gnu-debug-serial&field3=buildstarttime&compare3=84&value3=2018-06-11&field4=buildstarttime&compare4=83&value4=2018-05-20&field5=testname&compare5=61&value5=Stratimikos_test_aztecoo_thyra_driver_MPI_1) shows that the test `Stratimikos_test_aztecoo_thyra_driver_MPI_1` went from passing at under 21s every day to timing out at 10 minutes every day since 5/29/2018 (but it did pass once taking 9m 56s 930ms on 6/8/2018, the only time it did not time-out since 5/29/2018).
What changed from 5/29/2018 to 5/30/2018? Looking at the updates pulled in for the build `Trilinos-atdm-hansen-shiller-gnu-debug-serial` with build stamp `20180530-0400-ATDM` shown at:
* https://testing-vm.sandia.gov/cdash/viewNotes.php?buildid=3558860#!#note0
it seems like the only commits that could have impacted this were:
```
c9ccf7d: Switch from srun to salloc on hansen/shiller (TRIL-209)
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Tue May 29 08:35:16 2018 -0600
M cmake/ctest/drivers/atdm/shiller/local-driver.sh
M cmake/std/atdm/README.md
c840658: Switch to CMake 3.11.2, Ninja 1.8.2 and all-at-once mode on hansen/shiller (TRIL-209)
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Tue May 29 08:12:42 2018 -0600
M cmake/ctest/drivers/atdm/shiller/local-driver.sh
M cmake/std/atdm/shiller/environment.sh
```
There are no other commits that I could see that could impact this AztecOO test. So it looks like moving to CMake/CTest 3.11.2 and to the all-at-once approach triggered this large increase in runtime for the test `Stratimikos_test_aztecoo_thyra_driver_MPI_1` for the build `Trilinos-atdm-hansen-shiller-gnu-debug-serial`. This may have been a result of having more tests running while this Stratimikos test is running.
Looking in [this query](https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2018-06-10&filtercombine=and&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testname&compare2=61&value2=Stratimikos_test_aztecoo_thyra_driver_MPI_1), we can see that the test `Stratimikos_test_aztecoo_thyra_driver_MPI_1` timed out in the build `Trilinos-atdm-hansen-shiller-gnu-debug-serial` yesterday, 6/10/2018, but it took upwards of 2.5 to 3.5 minutes to run in the CUDA builds. Otherwise, this test did not take any longer than 22s to run in all of the other ATDM builds of Trilinos. What is also interesting is that the query showed that this test passed in 4s 460ms for the build `Trilinos-atdm-hansen-shiller-intel-debug-serial`, also run on 'hansen'. How can the same test pass on an `intel-debug-serial` build in under 5 seconds but then time out at 10 minutes for a `gnu-debug-serial` build on the same hardware with the same MPI implementation and settings?
For that matter, [this query](https://testing-vm.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2018-06-10&filtercombine=and&filtercount=1&showfilters=1&field1=testname&compare1=61&value1=Stratimikos_test_aztecoo_thyra_driver_MPI_1) shows that other than the CUDA builds of Trilinos and the yet-to-be-cleaned-up 'mutrino' build `Trilinos-atdm-mutrino-intel-debug-openmp`, this test did not take any longer than 22s to run in any of the 46 Trilinos builds where this test ran yesterday. On some platforms, this test completed in less than 2s!
This is very strange behavior for a test. There must be some type of machine or system usage issue going on here. But why would it impact a `gnu-debug-serial` build but not an `intel-debug-serial` build on the same machine?
## Steps to reproduce
Following the instructions at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#shillerhansen
one can log on to 'hansen' or 'shiller', clone Trilinos and get on to the 'develop' branch, and then do:
```
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh intel-opt-openmp
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Stratimikos=ON \
$TRILINOS_DIR
$ make NP=16
$ salloc ctest -j16
```
I did this on 'shiller' but unfortunately all of the Stratimikos tests passed:
```
100% tests passed, 0 tests failed out of 40
Subproject Time Summary:
Stratimikos = 256.50 sec*proc (40 tests)
Total Test time (real) = 20.84 sec
```
I was therefore not able to reproduce this behavior on 'shiller', which suggests this is some type of system issue.
Milestone: Keep promoted "ATDM" builds of Trilinos clean

# Issue #2894: MueLu: types mismatch in Driver.cpp equilibration
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2894
2018-06-06 | James Willenbring

*Created by: lucbv*
@trilinos/muelu
## Expectations
The vector and matrix Scalar types forming a linear system should be consistent.
## Current Behavior
It seems that a call to KokkosBlas::abs is done on a `Scalar` type vector and a `magnitude` type vector.
## Motivation and Context
The code is not compiling properly when `std::complex<>` is used
## Definition of Done
- [ ] MueLu Driver compiles
## Possible Solution
My guess is that in MueLu_Driver.cpp, on line 167 where the call to KokkosBlas::abs() is made, the two vectors should use the same `Scalar` type. Most likely the magnitude type needs to be replaced by a Scalar type, even if this means that in the case of complex numbers only the real part is non-zero.
## Steps to Reproduce
Build MueLu with tests and examples on and with `Trilinos_ENABLE_Complex=ON`.
## Your Environment
See builds on CDash.

# Issue #2893: Teuchos::Time::wallTime() might not return a wall time but a CPU time
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2893
2018-06-06 | James Willenbring

*Created by: finkandreas*
Teuchos::Time::wallTime() may not actually return a wall time, but rather the CPU time. Looking into the implementation, I've seen this piece of code:
```
#ifdef HAVE_MPI
  int mpiInitialized;
  MPI_Initialized(&mpiInitialized);
  if( mpiInitialized ) {
    return(MPI_Wtime());
  }
  else {
    clock_t start;
    start = clock();
    return( (double)( start ) / CLOCKS_PER_SEC );
  }
```
I realized this when I did not initialize MPI, so I ended up with the `clock()` call.
On Linux the value returned is the CPU time, NOT a wall-clock time; i.e., in an OpenMP program this will give wrong results.

# Issue #2880: Multiple definition of HAVE_MPI in configure files.
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2880
2018-06-04 | James Willenbring

*Created by: kyungjoo-kim*
I am wondering why we have multiple definitions of HAVE_MPI. For example, with TPL_ENABLE_MPI=ON, I can find the following multiple definitions in my build directory:
```
[kyukim @bread] packages > grep -r "HAVE_MPI" *
belos/src/Belos_config.h:#define HAVE_MPI
sacado/src/Sacado_config.h:#define HAVE_MPI
stokhos/src/Stokhos_config.h:#define HAVE_MPI
teuchos/core/src/Teuchos_config.h:#define HAVE_MPI
```
Do we do this because it has not caused a problem so far? Are there any potential issues with this? Also, this macro is not guarded by a package name.
# Issue #2877: Tpetra: Norm2 (dot) is substantially slower than direct calls to TPL BLAS
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2877
2018-06-04 | James Willenbring

*Created by: jjellio*
When strong scaling an app, we noticed that norm2 (among other things) is 'blowing up'. In this case, the customer has a challenging problem that they want to strong scale while using 2 processes per node and 16 threads per process (on Haswell nodes). The problem is that this results in relatively small work per process for the dense linear algebra, and KokkosKernels::BLAS does not impose a reasonable minimum chunk size for its operations.
Considering that the vendor BLAS is Intel's MKL, it isn't surprising that it performs very well. This leaves a reasonable question: why doesn't Tpetra prefer the vendor BLAS?
To support my argument, I modified Belos' MultiVectorTrait::norm2 to call TPL BLAS ddot rather than Tpetra::norm2 (which if you follow the rabbit hole, eventually calls KokkosKernels::BLAS:dot).
I wrote a simple test code that calls MVT::Norm2 a few thousand times in a loop. I profiled this code linked against my modified MVT and the vanilla Trilinos one. (I.e., TPL ddot vs Kokkos dot). For the experiment, I fixed the data per MPI process to 1000 elements (i.e., a very small work size). I then weak scale this perfectly, incrementally filling nodes with 2 processes.
I also profiled the cost of Teuchos::reduceAll, with a single scalar. I ran this with OMP_NUM_THREADS=1 and 16.
## Regular MVT::norm2 and All Reduce (1 thread)
![baseline1-1](https://user-images.githubusercontent.com/21248657/40936039-351eb936-67f7-11e8-9d0d-18496b9b4b3d.png)
## Regular MVT::norm2 and All Reduce (16 thread)
![baseline16-1](https://user-images.githubusercontent.com/21248657/40936049-3cffa28c-67f7-11e8-8103-dd71b81a0545.png)
## All Reduce unaffected by threads
![baselinethreadcomp2-1](https://user-images.githubusercontent.com/21248657/40936072-54554c66-67f7-11e8-98fc-873a66d82fd0.png)
## Using TPL dot with 1 or 16 threads
![modbelos-1](https://user-images.githubusercontent.com/21248657/40936110-7755e310-67f7-11e8-8596-5e21dc86dcd6.png)
## TPL ddot vs Kokkos
![modbelos-2](https://user-images.githubusercontent.com/21248657/40936122-858f7c0c-67f7-11e8-948a-3585142f5505.png)
![modbelos-3](https://user-images.githubusercontent.com/21248657/40936126-899cff22-67f7-11e8-89ed-dd5f65d9d921.png)
While I profiled norm2, the issue is really the underlying call to ddot. In this case, MKL is doing a much better job of throttling back its thread count. Still, calling the threaded BLAS is an overall loss, and in this case calling a purely serial MKL would have been better.
One option to mitigate the lack of thread scaling we see is to ensure that BLAS operations are called with a meaningful minimum chunk size. A simpler solution, which would also reduce the Trilinos code base, would be to call the TPL BLAS for the Serial, Threads, and OpenMP execution spaces.
This contradicts the recommendation in #2850
@trilinos/tpetra
@trilinos/kokkos-kernels

# Issue #2867: Intrepid2 - Curved element test and examples
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2867
2018-06-01 | James Willenbring

*Created by: kyungjoo-kim*
DTK @Rombur requests an example code for curved elements.
## Expectations
A complete set of examples for curved elements of HGRAD elements.
## Current Behavior
There is no example for curved elements.
## Definition of Done
- [ ] Create a wiki page.
- [ ] Provide examples of HGRAD elements.
- [ ] Pass unit tests.
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2829NOX_Playa_Group.cpp Make Error2018-05-28T19:46:54ZJames WillenbringNOX_Playa_Group.cpp Make Error*Created by: xiaozhaolook*
The Trilinos code has run cmake successfully, but when make is at about 87% I get the following messages and the make process aborts! I can't fix this problem; can anyone help me?
```
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp: In constructor ‘NOX::NOXPlaya::Group::Group(const Playa::Vector<double>&, const Playa::NonlinearOperator<double>&, const Playa::LinearSolver<double>&)’:
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:71:70: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
xVector(rcp(new NOX::NOXPlaya::Vector(initcond, precision, DeepCopy))),
^
In file included from /home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.hpp:57:0,
from /home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:51:
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Vector.hpp:75:7: note: because the following virtual functions are pure within ‘NOX::NOXPlaya::Vector’:
class Vector : public NOX::Abstract::Vector
^~~~~~
In file included from /home/zhaoliang/trilinos/packages/nox/src/NOX_Abstract_Group.H:54:0,
from /home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.hpp:53,
from /home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:51:
/home/zhaoliang/trilinos/packages/nox/src/NOX_Abstract_Vector.H:137:34: note: virtual NOX::Abstract::Vector& NOX::Abstract::Vector::random(bool, int)
virtual NOX::Abstract::Vector& random(bool useSeed = false, int seed = 1) = 0;
^~~~~~
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:72:71: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
fVector(rcp(new NOX::NOXPlaya::Vector(initcond, precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:73:76: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
newtonVector(rcp(new NOX::NOXPlaya::Vector(initcond, precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:74:78: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
gradientVector(rcp(new NOX::NOXPlaya::Vector(initcond, precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp: In constructor ‘NOX::NOXPlaya::Group::Group(const Playa::NonlinearOperator<double>&, const Playa::LinearSolver<double>&)’:
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:88:88: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
xVector(rcp(new NOX::NOXPlaya::Vector(nonlinOp.getInitialGuess(), precision, DeepCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:89:89: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
fVector(rcp(new NOX::NOXPlaya::Vector(nonlinOp.getInitialGuess(), precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:90:94: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
newtonVector(rcp(new NOX::NOXPlaya::Vector(nonlinOp.getInitialGuess(), precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:91:96: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
gradientVector(rcp(new NOX::NOXPlaya::Vector(nonlinOp.getInitialGuess(), precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp: In constructor ‘NOX::NOXPlaya::Group::Group(const Playa::Vector<double>&, const Playa::NonlinearOperator<double>&, const Playa::LinearSolver<double>&, int)’:
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:107:70: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
xVector(rcp(new NOX::NOXPlaya::Vector(initcond, precision, DeepCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:108:71: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
fVector(rcp(new NOX::NOXPlaya::Vector(initcond, precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:109:76: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
newtonVector(rcp(new NOX::NOXPlaya::Vector(initcond, precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:110:78: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
gradientVector(rcp(new NOX::NOXPlaya::Vector(initcond, precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp: In constructor ‘NOX::NOXPlaya::Group::Group(const Playa::NonlinearOperator<double>&, const Playa::LinearSolver<double>&, int)’:
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:125:88: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
xVector(rcp(new NOX::NOXPlaya::Vector(nonlinOp.getInitialGuess(), precision, DeepCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:126:89: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
fVector(rcp(new NOX::NOXPlaya::Vector(nonlinOp.getInitialGuess(), precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:127:94: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
newtonVector(rcp(new NOX::NOXPlaya::Vector(nonlinOp.getInitialGuess(), precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:128:96: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
gradientVector(rcp(new NOX::NOXPlaya::Vector(nonlinOp.getInitialGuess(), precision, ShapeCopy))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp: In copy constructor ‘NOX::NOXPlaya::Group::Group(const NOX::NOXPlaya::Group&, NOX::CopyType)’:
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:141:75: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
xVector(rcp(new NOX::NOXPlaya::Vector(*(source.xVector), precision, type))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:142:75: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
fVector(rcp(new NOX::NOXPlaya::Vector(*(source.fVector), precision, type))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:143:85: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
newtonVector(rcp(new NOX::NOXPlaya::Vector(*(source.newtonVector), precision, type))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:144:89: error: invalid new-expression of abstract class type ‘NOX::NOXPlaya::Vector’
gradientVector(rcp(new NOX::NOXPlaya::Vector(*(source.gradientVector), precision, type))),
^
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp: In member function ‘virtual NOX::Abstract::Group::ReturnType NOX::NOXPlaya::Group::computeF()’:
/home/zhaoliang/trilinos/packages/Sundance/Playa/src/NOX_Playa_Group.cpp:290:45: error: cannot allocate an object of abstract type ‘NOX::NOXPlaya::Vector’
*fVector = nonlinearOp.getFunctionValue();
^
packages/Sundance/Playa/src/CMakeFiles/sundancePlaya.dir/build.make:75: recipe for target 'packages/Sundance/Playa/src/CMakeFiles/sundancePlaya.dir/NOX_Playa_Group.cpp.o' failed
make[2]: *** [packages/Sundance/Playa/src/CMakeFiles/sundancePlaya.dir/NOX_Playa_Group.cpp.o] Error 1
CMakeFiles/Makefile2:20780: recipe for target 'packages/Sundance/Playa/src/CMakeFiles/sundancePlaya.dir/all' failed
make[1]: *** [packages/Sundance/Playa/src/CMakeFiles/sundancePlaya.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make: *** [all] Error 2
```
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2864
Auto PR build failures due to Intel License server problems
2018-06-01T18:11:28Z, James Willenbring
*Created by: bartlettroscoe*
CC: @trilinos/framework
## Expectations
Auto PR builds should be robust and not fail unless there is a failure in code itself.
## Current Behavior
The new Intel auto PR build fails randomly due to Intel license server problems such as shown at:
* https://github.com/trilinos/Trilinos/pull/2860#issuecomment-393742327
which showed the build failure:
* https://testing-vm.sandia.gov/cdash/viewBuildError.php?buildid=3564874
which showed:
```
Error: A license for Comp-CL is not available now (-15,570,115).
A connection to the license server could not be made. You should
make sure that your license daemon process is running: both an
lmgrd process and an INTEL process should be running
if your license limits you to a specified number of licenses in use
at a time. Also, check to see if the wrong port@host or the wrong
license file is being used, or if the port or hostname in the license
file has changed.
License file(s) used were (in this order):
1. Trusted Storage
** 2. /projects/sems/install/rhel6-x86_64/sems/compiler/intel/17.0.1/base/Licenses/intel-Linux-SRN.lic
** 3. /projects/sems/install/rhel6-x86_64/sems/compiler/intel/17.0.1/base/compilers_and_libraries_2017.1.132/linux/bin/intel64/../../Licenses
** 4. /ascldap/users/trilinos/Licenses
```
I noticed this because it happened in my PR #2860.
Looking at the current PR build history at:
* https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=buildname&compare1=63&value1=-test-Trilinos_pullrequest_intel_17.0.1&field2=buildstarttime&compare2=84&value2=2018-06-02&field3=buildstarttime&compare3=83&value3=2018-05-14
out of 10 builds, it failed twice with these Intel license server problems. That is only an 80% success rate so far, which is not robust enough for an auto PR build.
## Motivation and Context
Auto PR builds block what goes into the 'develop' branch, and long delays make things harder.
## Definition of Done
Auto PR builds should only fail due to non-code issues very infrequently.
## Possible Solution
Don't know.
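One possible mitigation (an assumption on my part, not an existing Trilinos mechanism) would be to wrap compiler invocations in a retry loop that only retries when the failure looks license-related:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: retry a command a few times when its output looks like
# a transient license failure; give up immediately on any other failure.
retry_on_license_error() {
  local max_attempts=3 attempt status out
  for attempt in $(seq 1 "$max_attempts"); do
    out=$("$@" 2>&1); status=$?
    # Stop on success, or on any failure that is not license-related.
    if [ "$status" -eq 0 ] || ! printf '%s' "$out" | grep -qi 'license'; then
      break
    fi
    sleep "$attempt"  # back off before hitting the license server again
  done
  printf '%s\n' "$out"
  return "$status"
}

# Example: an ordinary (non-license) compile error is reported immediately.
retry_on_license_error sh -c 'echo "error: something unrelated"; exit 1' || true
```

The compiler wrapper would have to be installed in the PR build scripts themselves; whether that is feasible for the auto PR infrastructure is an open question.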
## Steps to Reproduce
Don't know.
## Your Environment
N/A. The auto PR builds define their own environment.
Improve productivity, stability, and quality of Trilinos

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2846
Re-Enable Kokkos test in Intel PR Build
2018-06-20T15:34:39Z, James Willenbring
*Created by: prwolfe*
Kokkos has a failing unit test on the Intel 17.0.1 MPICH 3.2 build. This issue is to ensure that when that issue is fixed, we turn off the disable line for it.
@trilinos/framework
## Current Behavior
The test is turned off in PullRequestLinuxIntelTestingSettings.cmake via the line
```
set (KokkosCore_UnitTest_Serial_MPI_1_DISABLE ON CACHE BOOL "Temporarily disabled in PR testing")
```
## Definition of Done
After the current tracking issues (#2793, kokkos/kokkos#1632) have been resolved, remove the line and test that the PR build is still clean.
* Is blocked by:
  * #2793
  * kokkos/kokkos#1632
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2852
Ifpack: inefficient and inconsistent implementation of Ifpack_Hypre
2018-06-13T23:28:06Z, James Willenbring
*Created by: ecoon*
Ifpack_Hypre is inconsistent with documentation and design of Ifpack in general, and has several bugs that make it unusable with Hypre's `Hypre_BoomerAMGSetDofFunc()` option.
@trilinos/Ifpack
## Current Behavior
Ifpack_Hypre is inconsistent with the documentation and design of Ifpack in general. Specifically, Ifpack's user guide states (in regards to the general interface design and usage) that `Initialize()` is work done on a symbolically assembled matrix and uses only the matrix structure (not values), while `Compute()` is work done on a fully assembled matrix and relies on values. Then, `ApplyInverse()` does exactly that. Logically (and I assume this was the intended design), it should be possible to both:
1. call `ApplyInverse()` multiple times for the same values without re-calling `Compute()` (this is currently true)
2. call `Compute()` multiple times with different values but the same sparsity structure (this is not currently true)
Currently `Compute()` completely destroys the Hypre solver context, which is unnecessary/inefficient. Furthermore only `Initialize()` copies values from the Epetra Matrix to the Hypre Matrix, meaning that calling `Compute()` is not sufficient after values change -- instead `Initialize()` must also be called again (I would call this a correctness issue, at least within the constraints of the manual).
Additionally, there are bugs related to the use of `Hypre_SetDofFunc()`, which is necessary for solving block systems that are not strided (e.g. hybrid / mixed formulation discretizations). There are also several gotchas in Hypre's memory management in this use case that make it impossible to use through Ifpack currently (see https://github.com/LLNL/COGENT/blob/master/hypre-2.9.0b/src/parcsr_ls/HYPRE_parcsr_amg.c#L1140, which is still a "surprise" in the current code, even though this comment was deleted!)
## Expectations
I'm nearly done re-implementing this, and will provide a pull request for the code. I would like to get some help with running whatever existing tests are available to ensure correctness of my implementation if that's possible (hopefully your CI explores the Ifpack_Hypre path?). I'm happy to provide tests which show the current code's shortcomings, but these tests are currently embedded within a larger codebase, and I would need some help adapting them for Trilinos's testing harness.
Note this is also an issue in xSDK's Ifpack2_Hypre, which is effectively a copy/paste of the current Ifpack_Hypre implementation with matrix/vector interfaces updated to the Tpetra stack. It would be fairly easy to redo both of these at the same time if anyone from IDEAS-xSDK reads this.https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2847Re-Enable MueLu tests in Intel PR Build2018-06-20T15:34:40ZJames WillenbringRe-Enable MueLu tests in Intel PR Build*Created by: prwolfe*
@trilinos/framework
## Expectations
All related tests should be functional for all PR builds
## Current Behavior
<!---
Tell us how the current behavior fails to meet your expectations in some way.
-->
...*Created by: prwolfe*
@trilinos/framework
## Expectations
All related tests should be functional for all PR builds
## Current Behavior
4 tests are disabled in PullRequestLinuxIntelTestingSettings.cmake via

```
set (MueLu_UnitTestsEpetra_MPI_1_DISABLE ON CACHE BOOL "Temporarily disabled in PR testing")
set (MueLu_UnitTestsEpetra_MPI_4_DISABLE ON CACHE BOOL "Temporarily disabled in PR testing")
set (MueLu_UnitTestsTpetra_MPI_1_DISABLE ON CACHE BOOL "Temporarily disabled in PR testing")
set (MueLu_UnitTestsTpetra_MPI_4_DISABLE ON CACHE BOOL "Temporarily disabled in PR testing")
```
## Definition of Done
Verify that issue #2799 is closed, then remove the disables and confirm that the PR build functions properly.
* Is blocked by #2799

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2836
Tpetra: Eliminate CUDA as direct TPL
2018-06-12T23:36:56Z, James Willenbring
*Created by: mhoemmen*
It should only come in through Kokkos.
@trilinos/tpetra

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2835
Using Teuchos to send data stored in Kokkos::View
2018-05-29T20:52:52Z, James Willenbring
*Created by: keitat*
I am looking for an example of message passing using Teuchos and Kokkos::View. Is there an example available in the Trilinos source or documentation?

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2807
Amesos-Pardiso: How do I change the number of threads?
2018-05-24T16:43:07Z, James Willenbring
*Created by: freaklovesmango*
I am working with Pardiso and the examples provided by Amesos: https://github.com/trilinos/Trilinos/tree/master/packages/amesos/example
It works properly, but when I execute the compare file, the time measured for Pardiso is worse than for Klu. Since it should work in parallel, I expected better values. I think the problem is that in Amesos_Pardiso.cpp in https://github.com/trilinos/Trilinos/tree/master/packages/amesos/src the number of processes is only 1, and it seems I cannot change the `num_procs` variable via `omp_set_num_threads()` or the like.
Does anybody know how to change the number of processes, or how to properly apply OpenMP to the program?
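For what it's worth, MKL's PARDISO normally takes its thread count from the standard OpenMP/MKL environment variables set before the program launches, so that is worth checking before modifying any code (a sketch; the example binary name below is a placeholder, not a real Amesos target):

```shell
# Thread count for MKL PARDISO is usually controlled by environment variables
# that must be set before the program starts:
export OMP_NUM_THREADS=4   # process-wide OpenMP thread count
export MKL_NUM_THREADS=4   # MKL-specific setting (takes precedence for MKL)
# ./amesos_compare_example  # placeholder: run the Amesos compare example here
```

If the timings still do not improve with these set, then the hard-coded process count inside Amesos_Pardiso.cpp is the more likely culprit.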
Thanks in advance :)

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2804
MueLu/Ifpack2/Tpetra: Fix issues with RCPs associated with Teuchos::TimeMonitor and Kokkos profiling tool integration
2018-10-12T22:26:50Z, James Willenbring
*Created by: csiefer2*
Fix issues with RCPs associated with Teuchos::TimeMonitor and Kokkos profiling tool integration.
As per @mhoemmen:
FYI, @ibaned pointed out a difficulty over the phone today: Kokkos Profiling expects that regions always form a tree -- any two regions are either disjoint, or one fully encloses the other. However, MueLu's use of RCP to handle TimeMonitor breaks this assumption. For example:
```
RCP<TimeMonitor> mon = rcp (new TimeMonitor (*TimeMonitor::getNewCounter ("A")));
code_A_to_time ();
mon = rcp (new TimeMonitor (*TimeMonitor::getNewCounter ("B")));
code_B_to_time ();
// ... etc. ...
```
The problem is that the right-hand side of `mon = rcp (new TimeMonitor (...));` gets evaluated first, and creates the second TimeMonitor, before the first TimeMonitor's destructor gets called. This means that the second TimeMonitor instance overlaps in time with the first one. @ibaned noticed that this can result in timing differences in practice -- just a few nanoseconds, but it breaks the tree invariant and thus causes some regions to have artificially small timings.

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2812
Trilinos Framework: Enable testing against installation builds
2018-05-23T20:17:40Z, James Willenbring
*Created by: dridzal*
@trilinos/framework
## Expectations
There should be a capability in Trilinos to run tests and examples against an **installation build** (generated using `make install`). Additionally, nightly builds should include at least one installation build.
## Current Behavior
There is no active installation build testing. It appears that some infrastructure exists in Tribits, but it is neither used nor tested.
## Motivation and Context
We find out from _customers_ that our builds are broken. This is unacceptable.
## Definition of Done
(1) Installation build testing enabled and (2) performed nightly.
## Proposed Solution
(1) Develop infrastructure to enable marking of tests and examples with “INSTALLATION_TEST” or a similar label.
(2) Such tests should be able to use additional test-specific headers that are not part of the main installation.
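The marking in (1) could plausibly build on CTest's existing `LABELS` test property rather than new infrastructure. A sketch, where `MyPkg_install_test` is a placeholder test name, not an existing target:

```cmake
# Hypothetical sketch of (1): tag a test as runnable against an installed tree.
set_tests_properties(MyPkg_install_test PROPERTIES LABELS "INSTALLATION_TEST")
# A nightly job pointed at the installation could then select by label:
#   ctest -L INSTALLATION_TEST
```

Whether Tribits' existing (unused) installation-testing support already provides an equivalent hook would need to be checked first.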
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2801
Amesos: reference guide is out-of-date
2018-05-22T17:49:56Z, James Willenbring
*Created by: jhux2*
@trilinos/amesos
The Amesos 2.0 Reference Guide in `amesos/doc/AmesosReferenceGuide` has out-of-date configure and build instructions. This may confuse users - see #2770. Other parts may be out of date as well.
Possible resolutions:
1. Delete the guide.
1. Put a warning at the beginning that parts of the guide are invalid.
1. Update the incorrect parts of the guide.

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2796
Ifpack2: Add a "please tell me if I messed up the parameters" flag
2018-05-22T22:02:43Z, James Willenbring
*Created by: mhoemmen*
@vbrunini requests the following Ifpack2 feature: Add an option to Ifpack2, which, if enabled, catches unknown parameters. That is, if the option is enabled, and if a user gives a parameter to a solver that the solver does not understand (e.g., because the user misspelled it), then the solver should report an error.
@trilinos/ifpack2 @trilinos/muelu
Ifpack2 solvers tend to ignore parameters they don't understand. This is a legacy Ifpack compatibility thing. The history is that ParameterList originally didn't have sublists, so people just had to shove everything into a single list. Hence Ifpack2's "namespaced" parameter names: "schwarz: ...", "ilut: ...", etc.
This behavior creates work for users. See e.g. Sierra Ticket 19152. Currently, the only way to diagnose failure to set some parameters is to notice that linear solves aren't converging, compared with AztecOO solves. This particularly hinders use of more complicated but effective Ifpack2 solvers. For example, ticket 19152 concerns domain decomposition with ILUT preconditioning, with nondefault ILUT settings that are required for effective convergence.
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/2799
MueLu unit tests fail on exit under MPICH 3.2
2018-06-20T15:34:39Z, James Willenbring
*Created by: prwolfe*
The unit tests MueLu_UnitTestsEpetra_MPI_1, MueLu_UnitTestsEpetra_MPI_4, MueLu_UnitTestsTpetra_MPI_1, and MueLu_UnitTestsTpetra_MPI_4 all run to completion and at exit fail with
> WARNING: Tpetra::Map destructor (~Map()) is being called after Kokkos::finalize() has been called. This is user error! There are two likely causes:
> 1. You have a static Tpetra::Map (or RCP or shared_ptr of a Map)
> 2. You declare and construct a Tpetra::Map (or RCP or shared_ptr of a Tpetra::Map) at the same scope in main() as Kokkos::finalize() or Tpetra::finalize().
> Don't do either of these! Please refer to GitHub Issue #2372.
I do not see these under openmpi builds.
@trilinos/muelu
## Steps to Reproduce
This can be seen from my test builds of the new PR template at https://testing-vm.sandia.gov/cdash/testDetails.php?test=45860155&build=3464065 and can be re-created by doing
```
source ${TrilinosSource}/cmake/std/sems/PullRequestIntel17.0.1TestingEnv.sh
cmake -C ${TrilinosSource}/cmake/std/PullRequestLinuxIntelTestingSettings.cmake -DTrilinos_ENABLE_MueLu=ON ${TrilinosSource}
cd packages/muelu/test/unit_tests
make
```
Note that I am also seeing this in Zoltan2. I am not sure whether this is a shared issue. I am building in debug now to try to find the cause.