Commit 56a7f9b7 authored by Mark Hoemmen

Moved KokkosLinAlg subpackage from Kokkos into Tpetra as TpetraKernels.

I moved the KokkosLinAlg subpackage of Kokkos into Tpetra, and renamed
it "TpetraKernels".  This move will facilitate moving Kokkos from the
Trilinos repository into its own GitHub repository.  I renamed it from
"LinAlg" because all of Tpetra implements "linear algebra."  The name
"Kernels" better suggests the contents: single-MPI-process,
thread-parallel computational kernels.

I also fixed downstream packages that depended on KokkosLinAlg.  This
includes both the package name (KokkosLinAlg -> TpetraKernels) and
various library names.

This commit mainly affects Kokkos and Tpetra.  I also had to make
minor changes to Isorropia, ShyLU, Stokhos, TrilinosCouplings, and
Xpetra.  In particular, Xpetra's "FakeKokkos" (KokkosClassic
replacement headers for when building with Tpetra disabled) caused
some bizarre build errors until I finally figured out that Xpetra was
using "Xpetra_ENABLE_Kokkos" to key on whether to use FakeKokkos'
headers.  This manifested as Tpetra files not getting the contents of
TpetraClassic files (like Kokkos_ConfigDefs.hpp), but only when
building in Xpetra.  I fixed this by making Xpetra key on
Xpetra_ENABLE_Tpetra, since the former KokkosClassic now lives in
Tpetra.

NOTE: The Kokkos refactor version of MueLu does not currently build
with ETI enabled, because of some missing explicit instantiations.
Here is the error message I get:

Linking CXX executable MueLu_Challenge_XML.exe
../../../src/libmuelu.so.11.13: undefined reference to `Ifpack2::AdditiveSchwarz<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> >, Ifpack2::Preconditioner<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > >::AdditiveSchwarz(Teuchos::RCP<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > const> const&, int)'
../../../src/libmuelu.so.11.13: undefined reference to `Ifpack2::Details::OneLevelFactory<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > >::create(std::string const&, Teuchos::RCP<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > const> const&) const'
../../../src/libmuelu.so.11.13: undefined reference to `Ifpack2::Chebyshev<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > >::getLambdaMaxForApply() const'
../../../src/libmuelu.so.11.13: undefined reference to `Ifpack2::Hiptmair<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > >::Hiptmair(Teuchos::RCP<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > const> const&, Teuchos::RCP<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > const> const&, Teuchos::RCP<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > const> const&)'
../../../src/libmuelu.so.11.13: undefined reference to `Ifpack2::Krylov<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > >::Krylov(Teuchos::RCP<Tpetra::RowMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP> > const> const&)'
collect2: error: ld returned 1 exit status

Note that KokkosDeviceWrapperNode<Kokkos::OpenMP> is NOT the default
Node type in this build; the default Node type is
KokkosDeviceWrapperNode<Kokkos::Serial>.  As far as I can tell,
Ifpack2 only does ETI for the default Node type.  It could be that
MueLu is explicitly requesting ETI for a NON-default Node type, or it
could be that I messed up somewhere with the macros.  I'll have to
work a little bit harder at this, but I think it's OK to finish the
subpackage migration process first.
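
For readers unfamiliar with ETI (explicit template instantiation), here is a minimal, self-contained C++ sketch of the mechanism behind link errors like the one above. The names Solver, SerialNode, OpenMPNode, and describeDefaultNode are hypothetical stand-ins, not Ifpack2 or Kokkos types, and the sketch does not reproduce Ifpack2's actual ETI macros.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for Node types (NOT the real Kokkos classes).
struct SerialNode { static std::string name() { return "Serial"; } };
struct OpenMPNode { static std::string name() { return "OpenMP"; } };

// What a "_decl" header would expose: the class template, with member
// functions declared but not defined.
template <class Node>
class Solver {
public:
  std::string describe() const;
};

// What a "_def" file would contain: the member definitions.  Under ETI,
// downstream code typically does not see these definitions.
template <class Node>
std::string Solver<Node>::describe() const {
  return "Solver<" + Node::name() + ">";
}

// Explicit instantiation definition for the "default" Node type only.
// This is the one place where Solver<SerialNode>'s members are compiled
// into the library.
template class Solver<SerialNode>;

// Downstream code that uses the instantiated Node type links fine.
std::string describeDefaultNode() {
  Solver<SerialNode> solver;
  return solver.describe();
}

// If a downstream translation unit that only sees the declaration used
// Solver<OpenMPNode>, nothing would have compiled
// Solver<OpenMPNode>::describe(), and the linker would report an
// undefined reference, much like the Ifpack2 symbols listed above.
```

Calling describeDefaultNode() works because Solver<SerialNode> was explicitly instantiated. Under the assumptions of this sketch, the fix for a non-default Node type would be to add a matching explicit instantiation (here, `template class Solver<OpenMPNode>;`) or to stop requesting that type downstream.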
parent e17acde0
......@@ -47,6 +47,7 @@
#include "Tpetra_Import.hpp"
#include "Tpetra_Export.hpp"
#include "Tpetra_SrcDistObject.hpp"
#include <Kokkos_NodeAPIConfigDefs.hpp> // enum KokkosClassic::ReadWriteOption
#if TPETRA_USE_KOKKOS_DISTOBJECT
......
......@@ -47,6 +47,7 @@
#include "Tpetra_Import.hpp"
#include "Tpetra_Export.hpp"
#include "Tpetra_SrcDistObject.hpp"
#include <Kokkos_NodeAPIConfigDefs.hpp> // enum KokkosClassic::ReadWriteOption
// #ifndef HAVE_TPETRA_TRANSFER_TIMERS
// # define HAVE_TPETRA_TRANSFER_TIMERS 1
......
......@@ -42,12 +42,14 @@
#ifndef TPETRA_MAP_DECL_HPP
#define TPETRA_MAP_DECL_HPP
/// \file Tpetra_Map_decl.hpp
/// \brief Declarations for the Tpetra::Map class and related
/// nonmember constructors.
#include <Tpetra_ConfigDefs.hpp>
#include <Kokkos_DefaultNode.hpp>
#include <Teuchos_Describable.hpp>
// enums and defines
#include "Tpetra_ConfigDefs.hpp"
// mfh 27 Apr 2013: If HAVE_TPETRA_FIXED_HASH_TABLE is defined (which
// it is by default), then Map will used the fixed-structure hash
// table variant for global-to-local index lookups. Otherwise, it
......@@ -62,9 +64,6 @@
# define HAVE_TPETRA_FIXED_HASH_TABLE 1
#endif // HAVE_TPETRA_FIXED_HASH_TABLE
/// \file Tpetra_Map_decl.hpp
/// \brief Declarations for the Tpetra::Map class and related nonmember constructors.
///
namespace Tpetra {
#ifndef DOXYGEN_SHOULD_SKIP_THIS
......@@ -95,15 +94,15 @@ namespace Tpetra {
};
} // namespace Details
-template<class Node>
-Teuchos::RCP<Node> defaultArgNode() {
-// Workaround function for a deferred visual studio bug
-// http://connect.microsoft.com/VisualStudio/feedback/details/719847/erroneous-error-c2783-could-not-deduce-template-argument
-// Use this function for default arguments rather than calling
-// what is the return value below. Also helps in reducing
-// duplication in various constructors.
-return KokkosClassic::Details::getNode<Node>();
-}
+template<class Node>
+Teuchos::RCP<Node> defaultArgNode() {
+// Workaround function for a deferred visual studio bug
+// http://connect.microsoft.com/VisualStudio/feedback/details/719847/erroneous-error-c2783-could-not-deduce-template-argument
+// Use this function for default arguments rather than calling
+// what is the return value below. Also helps in reducing
+// duplication in various constructors.
+return KokkosClassic::Details::getNode<Node> ();
+}
/// \class Map
/// \brief Describes a parallel distribution of objects over processes.
......
......@@ -41,9 +41,10 @@
//@HEADER
*/
-//Note this code lives only temporarily in Tpetra
-//As soon as GEMM kernels exist in KokkosLinAlg and thus a depnedency on Teuchos
-//can be eliminated the code will move to KokkosLinAlg.
+// Note this code lives only temporarily in TpetraCore. As soon as
+// GEMM kernels exist in the TpetraKernels subpackage, and thus a
+// depnedency on Teuchos can be eliminated, the code will move to
+// TpetraKernels.
#if defined(KOKKOS_MULTIVECTOR_H_) && defined(TPETRA_KOKKOS_REFACTOR_MULTIVECTOR_DEF_HPP)
......
-nvcc -O3 -Xcompiler -fopenmp -x cu $1.cpp -o $1.cuda -I$2/include -I./ -L$2/lib -ltpetrainout -ltpetra -lkokkosnodeapi -lteuchoscomm -lteuchosnumerics -lteuchosparameterlist -lteuchosremainder -lteuchoscore -lblas -lkokkoslinalg -lkokkoscompat -lkokkoscore -lkokkoscontainers -lhwloc -ltpi -lcuda -L$3/lib64/ -lcudart -I$4/include -L$4/lib -lmpi -lmpi_cxx -arch sm_35 --compiler-bindir icpc -DCOMPILE_CUDA -DKERNEL_PREFIX="__host__ __device__" -lineinfo -DKOKKOS_USE_CUDA_UVM
+nvcc -O3 -Xcompiler -fopenmp -x cu $1.cpp -o $1.cuda -I$2/include -I./ -L$2/lib -ltpetrainout -ltpetra -ltpetraclassicnodeapi -lteuchoscomm -lteuchosnumerics -lteuchosparameterlist -lteuchosremainder -lteuchoscore -lblas -ltpetrakernels -lkokkoscompat -lkokkoscore -lkokkoscontainers -lhwloc -ltpi -lcuda -L$3/lib64/ -lcudart -I$4/include -L$4/lib -lmpi -lmpi_cxx -arch sm_35 --compiler-bindir icpc -DCOMPILE_CUDA -DKERNEL_PREFIX="__host__ __device__" -lineinfo -DKOKKOS_USE_CUDA_UVM
-mpicxx -O3 -mavx -fopenmp $1.cpp -o $1.omp -I$2/include -I./ -L$2/lib -ltpetrainout -ltpetra -lkokkosnodeapi -lteuchoscomm -lteuchosnumerics -lteuchosparameterlist -lteuchosremainder -lteuchoscore -lblas -lkokkoslinalg -lkokkoscompat -lkokkoscore -lkokkoscontainers -lhwloc -ltpi -lcuda -I$3/include -L$3/lib64/ -lcudart -lcusparse
+mpicxx -O3 -mavx -fopenmp $1.cpp -o $1.omp -I$2/include -I./ -L$2/lib -ltpetrainout -ltpetra -ltpetraclassicnodeapi -lteuchoscomm -lteuchosnumerics -lteuchosparameterlist -lteuchosremainder -lteuchoscore -lblas -ltpetrakernels -lkokkoscompat -lkokkoscore -lkokkoscontainers -lhwloc -ltpi -lcuda -I$3/include -L$3/lib64/ -lcudart -lcusparse
-mpicxx -O3 -mavx $1.cpp -o $1.threads -I$2/include -I./ -L$2/lib -ltpetrainout -ltpetra -lkokkosnodeapi -lteuchoscomm -lteuchosnumerics -lteuchosparameterlist -lteuchosremainder -lteuchoscore -lblas -lkokkoscore -lkokkoscompat -lkokkoscontainers -lkokkoslinalg -lhwloc -ltpi
+mpicxx -O3 -mavx $1.cpp -o $1.threads -I$2/include -I./ -L$2/lib -ltpetrainout -ltpetra -ltpetraclassicnodeapi -lteuchoscomm -lteuchosnumerics -lteuchosparameterlist -lteuchosremainder -lteuchoscore -lblas -lkokkoscore -lkokkoscompat -lkokkoscontainers -ltpetrakernels -lhwloc -ltpi
......@@ -9,7 +9,6 @@ TRIBITS_ADD_EXECUTABLE_AND_TEST(
MultiVectorFiller_SerialNodeTest
COMM serial
STANDARD_PASS_OUTPUT
-DEPLIBS kokkos kokkoslinalg kokkosnodeapi
)
IF (KokkosClassic_ENABLE_TBB AND Tpetra_ENABLE_MPI)
......@@ -18,7 +17,6 @@ IF (KokkosClassic_ENABLE_TBB AND Tpetra_ENABLE_MPI)
# SOURCES MultiVectorFiller_TbbTest
# COMM mpi
# STANDARD_PASS_OUTPUT
-# DEPLIBS kokkos kokkoslinalg kokkosnodeapi
# )
ENDIF()
......@@ -28,7 +26,6 @@ IF (KokkosClassic_ENABLE_ThreadPool AND Tpetra_ENABLE_MPI)
# SOURCES MultiVectorFiller_ThreadPoolTest
# COMM mpi
# STANDARD_PASS_OUTPUT
-# DEPLIBS kokkos kokkoslinalg kokkosnodeapi
# )
ENDIF()
......@@ -38,7 +35,6 @@ IF (KokkosClassic_ENABLE_Thrust AND KokkosClassic_ENABLE_CUDA_DOUBLE AND Tpetra_
# SOURCES MultiVectorFiller_ThrustTest
# COMM mpi
# STANDARD_PASS_OUTPUT
-# DEPLIBS kokkos kokkoslinalg kokkosnodeapi
# )
ENDIF()
#
# Define the subpackage
#
-TRIBITS_SUBPACKAGE(LinAlg)
+TRIBITS_SUBPACKAGE(Kernels)
#
# Set up subpackage-specific configuration options
#
#
-# "Optimization level" for KokkosLinAlg computational kernels. The
+# "Optimization level" for TpetraKernels computational kernels. The
# higher the level, the more code variants get generated, and thus the
# longer the compile times. However, more code variants mean both
# better performance overall, and more uniform performance for corner
......@@ -17,7 +16,7 @@ TRIBITS_SUBPACKAGE(LinAlg)
#
TRIBITS_ADD_OPTION_AND_DEFINE( KokkosLinAlg_Opt_Level
KOKKOSLINALG_OPT_LEVEL
-"Optimization level for KokkosLinAlg computational kernels: a nonnegative integer. Higher levels result in better performance that is more uniform for corner cases, but increase build time and library size. The default value is 1, which should give performance within ten percent of optimal on most platforms, for most problems."
+"Optimization level for TpetraKernels computational kernels: a nonnegative integer. Higher levels result in better performance that is more uniform for corner cases, but increase build time and library size. The default value is 1, which should give performance within ten percent of optimal on most platforms, for most problems."
"1"
)
......
# FIXME (mfh 18 Dec 2014) MKL and CUSPARSE are optional TPLs, in the
# sense that one need not include the MKL resp. CUSPARSE header files
# if those TPLs aren't enabled. However, we should still list them as
# optional TPLs.
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
LIB_REQUIRED_PACKAGES KokkosCore KokkosContainers
TEST_REQUIRED_PACKAGES Gtest
)
#ifndef TPETRAKERNELS_CONFIG_H
#define TPETRAKERNELS_CONFIG_H
/*
* "Optimization level" for computational kernels in this subpackage.
* The higher the level, the more code variants get generated, and
* thus the longer the compile times. However, more code variants
* mean both better performance overall, and more uniform performance
* for corner cases.
*/
#define KOKKOSLINALG_OPT_LEVEL @KokkosLinAlg_Opt_Level@
#endif // TPETRAKERNELS_CONFIG_H