Trilinos issues
https://gitlab.osti.gov/jmwille/Trilinos/-/issues

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/908
Small dense matrix inversion (updated 2016-12-14T20:33:53Z; posted by James Willenbring)
*Created by: bathmatt*
@mhoemmen I can see this being somewhere for your block crs matrix inversion.
Is there a class where I can do a small dense matrix inversion? For example, can I grab a sub-view out of a Kokkos 2D view, call it a matrix, and compute its inverse?
Thanks
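Inside Trilinos, Teuchos::SerialDenseSolver is one existing option for inverting a small dense matrix. As a self-contained illustration only (not the Trilinos or KokkosKernels API), a tiny Gauss-Jordan inverse with partial pivoting on a row-major array might look like:

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Invert an n-by-n row-major matrix in place via Gauss-Jordan with
// partial pivoting.  Returns false if a zero pivot is encountered.
// Intended only for tiny matrices, e.g. the blocks of a block CRS matrix.
bool invertSmallDense(std::vector<double>& A, std::size_t n) {
  std::vector<double> I(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) I[i * n + i] = 1.0;

  for (std::size_t col = 0; col < n; ++col) {
    // Partial pivoting: pick the largest entry in this column.
    std::size_t piv = col;
    for (std::size_t r = col + 1; r < n; ++r)
      if (std::fabs(A[r * n + col]) > std::fabs(A[piv * n + col])) piv = r;
    if (A[piv * n + col] == 0.0) return false;
    for (std::size_t c = 0; c < n; ++c) {
      std::swap(A[col * n + c], A[piv * n + c]);
      std::swap(I[col * n + c], I[piv * n + c]);
    }
    // Scale the pivot row, then eliminate the column everywhere else.
    const double d = A[col * n + col];
    for (std::size_t c = 0; c < n; ++c) { A[col * n + c] /= d; I[col * n + c] /= d; }
    for (std::size_t r = 0; r < n; ++r) {
      if (r == col) continue;
      const double f = A[r * n + col];
      for (std::size_t c = 0; c < n; ++c) {
        A[r * n + c] -= f * A[col * n + c];
        I[r * n + c] -= f * I[col * n + c];
      }
    }
  }
  A = I;  // A now holds the inverse.
  return true;
}
```

A device-side version for subviews of a Kokkos 2D view would follow the same arithmetic, operating on the view's extents instead of a std::vector.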
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/786
Ifpack2: Linking error, duplicate symbols, with TpetraKernels_ENABLE_Experimental=ON (updated 2016-12-09T05:00:29Z; posted by James Willenbring)
*Created by: bathmatt*
Is anyone else getting this??
[4062/5246] Linking CXX executable packages/stratimikos/test/Stratimikos_issue_535.exe
FAILED: packages/stratimikos/test/Stratimikos_issue_535.exe
: && /projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/openmpi/1.10.1/bin/mpicxx -std=c++11 -fopenmp -g -O0 packages/stratimikos/test/CMakeFiles/Stratimikos_issue_535.dir/test_issue_535.cpp.o -o packages/stratimikos/test/Stratimikos_issue_535.exe -rdynamic packages/stratimikos/src/libstratimikos.a packages/stratimikos/adapters/belos/src/libstratimikosbelos.a packages/stratimikos/adapters/aztecoo/src/libstratimikosaztecoo.a packages/stratimikos/adapters/amesos/src/libstratimikosamesos.a packages/stratimikos/adapters/ml/src/libstratimikosml.a packages/stratimikos/adapters/ifpack/src/libstratimikosifpack.a packages/ifpack2/adapters/libifpack2-adapters.a packages/ifpack2/src/libifpack2.a packages/thyra/adapters/tpetra/src/libthyratpetra.a packages/triutils/src/libtriutils.a packages/ml/src/libml.a packages/zoltan/src/libzoltan.a -lm packages/ifpack/src/libifpack.a packages/amesos/src/libamesos.a packages/belos/tpetra/src/libbelostpetra.a packages/belos/epetra/src/libbelosepetra.a packages/belos/src/libbelos.a packages/thyra/adapters/tpetra/src/libthyratpetra.a packages/aztecoo/src/libaztecoo.a packages/thyra/adapters/epetraext/src/libthyraepetraext.a packages/thyra/adapters/epetra/src/libthyraepetra.a packages/epetraext/src/libepetraext.a packages/triutils/src/libtriutils.a packages/thyra/core/src/libthyracore.a packages/rtop/src/librtop.a packages/tpetra/core/ext/libtpetraext.a packages/tpetra/core/inout/libtpetrainout.a packages/tpetra/core/src/libtpetra.a packages/epetra/src/libepetra.a packages/tpetra/kernels/src/libtpetrakernels.a packages/kokkos/algorithms/src/libkokkosalgorithms.a packages/kokkos/containers/src/libkokkoscontainers.a packages/tpetra/classic/LinAlg/libtpetraclassiclinalg.a packages/tpetra/classic/NodeAPI/libtpetraclassicnodeapi.a packages/tpetra/classic/src/libtpetraclassic.a packages/teuchos/kokkoscomm/src/libteuchoskokkoscomm.a packages/teuchos/kokkoscompat/src/libteuchoskokkoscompat.a 
packages/teuchos/remainder/src/libteuchosremainder.a packages/teuchos/numerics/src/libteuchosnumerics.a /usr/lib64/liblapack.so.3 /usr/lib64/libblas.so.3 packages/teuchos/comm/src/libteuchoscomm.a packages/teuchos/parameterlist/src/libteuchosparameterlist.a packages/teuchos/core/src/libteuchoscore.a /projects/sems/install/rhel6-x86_64/sems/tpl/boost/1.59.0/gcc/5.3.0/base/lib/libboost_program_options.so /projects/sems/install/rhel6-x86_64/sems/tpl/boost/1.59.0/gcc/5.3.0/base/lib/libboost_system.so packages/kokkos/core/src/libkokkoscore.a /usr/lib64/libdl.so -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lquadmath -Wl,-rpath,/projects/sems/install/rhel6-x86_64/sems/tpl/boost/1.59.0/gcc/5.3.0/base/lib && :
packages/ifpack2/src/libifpack2.a(Ifpack2_Relaxation_OpenMP.cpp.o): In function `KokkosKernels::Experimental::Util::endswith(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
/net/fs02eppic/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/base/include/c++/5.3.0/bits/char_traits.h:258: multiple definition of `KokkosKernels::Experimental::Util::endswith(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
packages/ifpack2/src/libifpack2.a(Ifpack2_Details_OneLevelFactory_OpenMP.cpp.o):/net/fs02eppic/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/base/include/c++/5.3.0/bits/char_traits.h:258: first defined here
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/759
KokkosKernels unit tests' dependency to kokkos gtest build (updated 2017-04-13T18:45:04Z; posted by James Willenbring)
*Created by: mndevec*
Kokkos is using gtest for the unit tests. I think we talked briefly that we will use the same unit test harness for KokkosKernels, rather than introducing a dependency on Teuchos.
I am trying to add unit tests that use gtest, assuming that Kokkos builds gtest and I can access the prebuilt kokkos_gtest library. However, it is only built when Kokkos tests are enabled.
I don't know an elegant way to handle this; I was wondering how it should be handled.
@srajama1 @mhoemmen @crtrott @ambrad @kyungjoo-kim
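One possible workaround, sketched here as a hypothetical CMake fragment (KOKKOS_GTEST_SOURCE_DIR and the target names are illustrative, not actual Kokkos or KokkosKernels variables), is to build a private copy of gtest from the same bundled sources whenever Kokkos has not already produced one:

```cmake
# If Kokkos tests are enabled, Kokkos already builds kokkos_gtest and we
# can reuse it; otherwise compile our own copy from the bundled sources.
if (NOT TARGET kokkos_gtest)
  add_library(kokkos_gtest ${KOKKOS_GTEST_SOURCE_DIR}/gtest/gtest-all.cc)
  target_include_directories(kokkos_gtest PUBLIC ${KOKKOS_GTEST_SOURCE_DIR})
endif()
target_link_libraries(kokkoskernels_unit_tests PRIVATE kokkos_gtest)
```

Guarding on the target rather than on the Kokkos test option keeps the unit tests decoupled from how Kokkos happened to be configured.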
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/701
Why does TriBITS' ETI add Nodes to list of manglings / typedefs, even when I don't want them there? (updated 2016-10-12T17:32:39Z; posted by James Willenbring)
*Created by: mhoemmen*
I'm using the ETI system for KokkosKernels. I don't want the manglings and typedefs to include Node types, because those don't exist in KokkosKernels. I just want to strip those out. How do I do that? I even cleared out list_of_manglings and eti_typedefs at the top of tpetra/kernels/cmake/ExplicitInstantiationSupport.cmake, and the Node types came back!!!
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/700
KokkosKernels: Add simplified kernels for integer Scalar types (updated 2016-11-04T05:30:13Z; posted by James Willenbring)
*Created by: mhoemmen*
@trilinos/zoltan2 @trilinos/tpetra
Key words: build size, build time
Many Trilinos users have complained about long build times and large library and executable sizes. One of the biggest sources of this in the Tpetra solver stack is the large number of template parameter combinations for which Tpetra classes get instantiated. For example, we build all of Tpetra for Scalar = int and Scalar = GlobalOrdinal for EVERY enabled GlobalOrdinal type, as well as for the usual Scalar types like double and std::complex<double>.
Use of integer Scalar types seems a little weird. In most cases where Tpetra or downstream Trilinos packages use integer Scalar types, they use them for communication (as the source or target of an Export or Import), not for computation. This could justify refactoring Tpetra's class hierarchy into integer and non-integer "branches." However, I had a conversation with Michael Wolf about Zoltan2's needs. He explained that for some computations of metrics, Zoltan2 does sparse matrix-vector multiplies with integer Scalar types. This means that we really do need to compute with integer Scalar types. However, we don't need highly optimized kernels for integer Scalar types, as far as I know.
This suggests that we could address the problem at the KokkosKernels level, by falling back to simple kernels for integer Scalar types. This issue proposes to do just that. The kernels still need to be thread parallel, and must use CUDA appropriately. However, they don't need such heavy optimization. We can write simple one-level parallelism, for example.
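A minimal sketch of such a fallback kernel, written here as plain sequential C++ for illustration only (the real version would wrap the row loop in a Kokkos::parallel_for, giving the simple one-level parallelism described above):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Minimal CSR sparse matrix-vector multiply y = A*x for an integer Scalar.
// A "simple" fallback kernel like this needs no vectorization or fancy
// scheduling; a Kokkos version would parallelize the outer row loop,
// one thread per row.
void spmvCsrInt(const std::vector<std::size_t>& rowPtr,
                const std::vector<int>& colInd,
                const std::vector<std::int64_t>& val,
                const std::vector<std::int64_t>& x,
                std::vector<std::int64_t>& y) {
  const std::size_t numRows = rowPtr.size() - 1;
  for (std::size_t i = 0; i < numRows; ++i) {   // parallelize over rows
    std::int64_t sum = 0;
    for (std::size_t k = rowPtr[i]; k < rowPtr[i + 1]; ++k)
      sum += val[k] * x[colInd[k]];
    y[i] = sum;
  }
}
```

Since Zoltan2's use case is correctness-only metric computation, exact integer arithmetic like this is all that is required; none of the optimized floating-point code paths need to be instantiated.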
Here are some build directory size statistics for the Trilinos/packages/tpetra build directory after `make clean` and `make`, with no examples or tests enabled. I used GCC 4.7.2 on Linux, and enabled Scalar = std::complex<double>. Otherwise, I only use default settings for enabled types. (The default enabled LocalOrdinal type is int. The default enabled GlobalOrdinal types are int and long long.) _STATIC builds use static libraries; otherwise, I use dynamic libraries. _DEBUG builds have Kokkos and Teuchos debugging features (e.g., bounds checking) turned on; _RELEASE builds have these debugging features turned off. I enabled only the Kokkos::OpenMP version of Tpetra (this should generate more code than the Kokkos::Serial version).
- MPI_DEBUG: 2.1 G
- MPI_DEBUG_STATIC: 11 G
- MPI_RELEASE: 187 M
- MPI_RELEASE_STATIC: 2.3 G
Do you see why we recommend dynamic libraries? ;-)
Correction: My MPI_RELEASE build is Kokkos::Serial only.
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/691
KokkosKernels: One kernel instantiation for both CudaSpace and CudaUVMSpace (updated 2016-10-05T19:11:59Z; posted by James Willenbring)
*Created by: mhoemmen*
This depends on https://github.com/kokkos/kokkos/issues/290. (See also https://github.com/kokkos/kokkos/issues/290, marked as redundant.)
Once the above Kokkos issue is fixed, we'll be able to assign from a CudaUVMSpace View to a CudaSpace View. This will let us just instantiate kernels for CudaSpace.
@trilinos/tpetra
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/667
KokkosKernels: Use existing macros for TPLs like MKL (updated 2017-09-01T19:25:43Z; posted by James Willenbring)
*Created by: mhoemmen*
@trilinos/tpetra
I'm looking in tpetra/kernels/src/stage/graph/impl/KokkosKernels_SPGEMM_mkl_impl.hpp. I notice a macro KERNELS_HAVE_MKL. This macro gets defined by hand somewhere in the code; it is not plugged into the CMake build system at all. KokkosKernels already has a macro, HAVE_TPETRAKERNELS_MKL. Please use that instead. "KERNELS_HAVE_MKL" is too general a name; it is likely to collide with another software library's macros.
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/666
KokkosKernels: Fix sparse matrix-matrix multiply error handling (updated 2016-11-02T17:34:54Z; posted by James Willenbring)
*Created by: mhoemmen*
@trilinos/tpetra
For example, if the user requests a TPL (such as CUSP), but that TPL is not installed, the implementation should throw an exception or return an error code, not just print some error message to stderr. I see a similar approach to error handling when the memory space is wrong (e.g., lines 67-80 of KokkosKernels_SPGEMM_mkl_impl.hpp).
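A minimal sketch of the requested behavior (the function name and signature here are hypothetical, not the actual KokkosKernels interface):

```cpp
#include <sstream>
#include <stdexcept>

// Sketch of the requested error handling: if the caller asks for a TPL
// backend that was not enabled at configure time, fail loudly with an
// exception instead of printing to stderr and silently continuing.
void spgemmWithTpl(bool tplEnabled, const char* tplName) {
  if (!tplEnabled) {
    std::ostringstream os;
    os << "Sparse matrix-matrix multiply: requested TPL \"" << tplName
       << "\" was not enabled at configure time.";
    throw std::runtime_error(os.str());
  }
  // ... dispatch to the TPL implementation ...
}
```

Throwing (or returning an error code in exception-free builds) lets the caller detect and recover from the misconfiguration, which a message on stderr does not.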
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/662
KokkosKernels: Add segmented sort / sort-and-merge (updated 2016-10-26T04:10:08Z; posted by James Willenbring)
*Created by: mhoemmen*
See #660 for a use case. Tpetra::Crs{Graph,Matrix}::fillComplete currently needs segmented sort-and-merge, though a fix for #119 would remove the "-and-merge" requirement.
Thrust doesn't have anything like this. stable_sort_by_key() just does what Tpetra::sort2 currently does: it applies the implicit permutation resulting from sorting the keys to a corresponding array of values.
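A sketch of what sort-and-merge does to a single row segment, in plain C++ for illustration (the actual kernel would operate in place on Kokkos views, on device, over all rows at once):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sort-and-merge one row segment of a CRS structure: sort the
// (column index, value) pairs by column, then sum values that share a
// column index.  fillComplete would apply this to every row's segment;
// a fix for the duplicate-entries issue would drop the merge step.
std::vector<std::pair<int, double>>
sortAndMergeRow(std::vector<std::pair<int, double>> row) {
  std::stable_sort(row.begin(), row.end(),
                   [](const std::pair<int, double>& a,
                      const std::pair<int, double>& b) {
                     return a.first < b.first;
                   });
  std::vector<std::pair<int, double>> merged;
  for (const auto& e : row) {
    if (!merged.empty() && merged.back().first == e.first)
      merged.back().second += e.second;   // duplicate column: merge
    else
      merged.push_back(e);
  }
  return merged;
}
```

The "segmented" part is that each row is an independent segment, so a device implementation can assign segments to threads or teams without cross-row coordination.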
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/568
Tpetra: proposed changes to handle processes with zero rows (updated 2016-09-14T17:19:30Z; posted by James Willenbring)
*Created by: allaffa*
@allaffa @jhux2 @trilinos/tpetra @rstumin
I am currently working on a project at Sandia about multigrid preconditioners where some tasks have no matrix rows. This has revealed two places where KokkosKernels and Tpetra throw an exception.
I would like to propose changing the conditions under which the exception is thrown, so that they also check that the number of rows is nonzero.
See attached .txt files.
[0005-Tpetra-added-check-on-rows-in-exception.txt](https://github.com/trilinos/Trilinos/files/425981/0005-Tpetra-added-check-on-rows-in-exception.txt)
[0007-Kokkos-added-check-on-rows-in-exception.txt](https://github.com/trilinos/Trilinos/files/425980/0007-Kokkos-added-check-on-rows-in-exception.txt)
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/583
Ifpack2: Plug in HTS for thread-parallel sparse triangular solve (updated 2017-09-28T01:43:23Z; posted by James Willenbring)
*Created by: mhoemmen*
@trilinos/ifpack2 @trilinos/tpetra
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/447
KokkosKernels: CrsMatrix sumIntoValuesSorted minor questions (updated 2016-06-19T20:49:20Z; posted by James Willenbring)
*Created by: mhoemmen*
@bathmatt 's commit https://github.com/trilinos/Trilinos/commit/1e65cffac05ae95bbc4e6f0d6d4d428886704834 added a sumIntoValuesSorted method to KokkosSparse::CrsMatrix. I have a few comments and questions:
1. We should introduce the "hint" that Epetra and Tpetra use for optimizing search for multiple column indices. It introduces an extra branch per input index, but avoids search for common cases. @etphipp first implemented it in Tpetra and found it to be useful, and `findRelOffset` (in tpetra/core/src/Tpetra_Util.hpp) does it too.
2. It's legit to use `ordinal_type` (32-bit) instead of `size_type` (64-bit on everything but CUDA) for the difference between two consecutive row offsets, as long as the row doesn't have too many duplicate entries. SparseRowView(Const) already uses `ordinal_type` for the row length, for this reason.
3. Was there a particular reason for the `hi - low > 10` cut-off, or is that just a good guess?
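A sketch of the hint optimization from item 1, modeled loosely on Tpetra's findRelOffset (the name and signature here are illustrative, not the Tpetra API):

```cpp
#include <algorithm>
#include <cstddef>

// Find the offset of colToFind within the sorted column indices
// [inds, inds+len), checking the caller-provided hint first.  The hint
// costs one extra branch per input index, but skips the search entirely
// when the caller visits columns in the same order they are stored.
// Returns len if the column is not present.
std::size_t findOffsetWithHint(const int* inds, std::size_t len,
                               int colToFind, std::size_t hint) {
  if (hint < len && inds[hint] == colToFind) return hint;  // hit: O(1)
  const int* p = std::lower_bound(inds, inds + len, colToFind);
  return (p != inds + len && *p == colToFind)
             ? static_cast<std::size_t>(p - inds) : len;
}
```

In assembly loops that sum into consecutive entries of a sorted row, the previous offset plus one is a natural hint, which is the common case the optimization targets.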
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/226
cuda build of Kokkos_Sparse_MV_impl_spmv takes 25 minutes for 1 file. (updated 2016-05-18T19:44:14Z; posted by James Willenbring)
*Created by: bathmatt*
Can we split up the build of this file? This is all on hansen with
-bash-4.1$ module load devpack/openmpi/1.10.0/gcc/4.8.4/cuda/7.5.18
The build of
[ 66%] Building CXX object packages/tpetra/kernels/src/CMakeFiles/tpetrakernels.dir/impl/Kokkos_Sparse_MV_impl_spmv_Cuda.cpp.o
[ 66%] Building CXX object packages/tpetra/kernels/src/CMakeFiles/tpetrakernels.dir/impl/Kokkos_Sparse_MV_impl_spmv_Serial.cpp.o
is very slow under cuda/debug (not -G, though). By very slow, I mean 25 minutes.
-bash-4.1$ time make -j
[ 0%] Built target kokkoscore
[ 0%] Built target kokkosalgorithms
[ 0%] Built target kokkoscontainers
[ 33%] Built target teuchoscore
[ 66%] Built target teuchosparameterlist
[ 66%] Built target teuchoscomm
[ 66%] Building CXX object packages/tpetra/kernels/src/CMakeFiles/tpetrakernels.dir/impl/Kokkos_Sparse_MV_impl_spmv_Cuda.cpp.o
[ 66%] Building CXX object packages/tpetra/kernels/src/CMakeFiles/tpetrakernels.dir/impl/Kokkos_Sparse_MV_impl_spmv_Serial.cpp.o
[ 66%] Linking CXX static library libtpetrakernels.a
[100%] Built target tpetrakernels
real 24m16.402s
user 29m35.614s
sys 0m58.005s
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/208
build of Kokkos_Sparse_MV_impl_spmv_Serial.cpp.o fails if you use nvcc and have cuda disabled (updated 2016-03-19T08:19:49Z; posted by James Willenbring)
*Created by: bathmatt*
If I don't configure with cuda but still have OMPI_CXX=nvcc_wrapper I get the following. This is on hansen
[ 66%] Building CXX object packages/tpetra/kernels/src/CMakeFiles/tpetrakernels.dir/impl/Kokkos_Sparse_MV_impl_spmv_Serial.cpp.o
/home/mbetten/Trilinos/Trilinos/packages/tpetra/kernels/src/impl/Kokkos_Sparse_impl_spmv.hpp(885): error: namespace "Kokkos" has no member "shfl_down"
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/178
Tpetra BCRS: Thread-parallelize sparse matrix-vector multiply (updated 2016-06-02T15:57:50Z; posted by James Willenbring)
*Created by: mhoemmen*
@trilinos/tpetra @trilinos/ifpack2 @crtrott @kyungjoo-kim @amklinv
Thread-parallelize the sparse matrix-vector multiply in the apply() method of Tpetra::Experimental::BlockCrsMatrix. Please interact with Ryan Eberhardt, who has an excellent CUDA implementation for column-major blocks.
It would be wise to do this in two passes. First, add a simple host execution space parallelization using a lambda. Then, implement an optimized kernel, using Ryan's as a start.
This affects Ifpack2 as well as Tpetra, because for Jacobi with > 1 sweep, Ifpack2 uses sparse mat-vec.
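A sequential sketch of the block CRS apply with column-major blocks, for illustration only (this is not the Tpetra implementation; the outer block-row loop is where the simple first-pass Kokkos::parallel_for would go):

```cpp
#include <cstddef>
#include <vector>

// y = A*x for a block CRS matrix with dense bs-by-bs blocks stored
// column-major.  Each entry of colInd names a block column; the k-th
// block occupies blocks[k*bs*bs .. (k+1)*bs*bs).
void bcrsApply(std::size_t bs,
               const std::vector<std::size_t>& rowPtr,  // block row offsets
               const std::vector<int>& colInd,          // block column indices
               const std::vector<double>& blocks,       // bs*bs values per entry
               const std::vector<double>& x,
               std::vector<double>& y) {
  const std::size_t numBlockRows = rowPtr.size() - 1;
  for (std::size_t br = 0; br < numBlockRows; ++br) {   // parallelize here
    for (std::size_t i = 0; i < bs; ++i) y[br * bs + i] = 0.0;
    for (std::size_t k = rowPtr[br]; k < rowPtr[br + 1]; ++k) {
      const double* B = &blocks[k * bs * bs];
      const std::size_t xo = static_cast<std::size_t>(colInd[k]) * bs;
      for (std::size_t j = 0; j < bs; ++j)     // loop columns first so the
        for (std::size_t i = 0; i < bs; ++i)   // column-major block is
          y[br * bs + i] += B[j * bs + i] * x[xo + j];  // read contiguously
    }
  }
}
```

Iterating columns in the outer block loop keeps the column-major block reads contiguous, which matches the layout the optimized CUDA kernel assumes.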