MiniTensor: issues blocking efficient GPU usage
Created by: ibaned
@lxmota This issue collects the problems that currently prevent MiniTensor from being used in a runtime- and memory-efficient way on GPUs for the Alexa application. Solving them is not urgent, since we have an alternative to MiniTensor that we can use for a while. Feel free to use this list as the basis for improvements to MiniTensor at your discretion. If most of these are resolved, MiniTensor will become useful to Alexa.
- Unnecessary variables in objects. `sizeof(Tensor<double,3>)==96`, while `sizeof(double)*3*3==72`. The extra 24 bytes are for three different things:
  - The `size_` integer in `Storage`, which supports changing the size at runtime. We would like a variant of `Storage` similar to `DYNAMIC`, except that it means something like `REALLY_STATIC`: the size is fixed at compile time and no `size_` is stored.
  - The `dimension_` integer in `TensorBase`. One possible fix is to move `dimension_` into `Storage`, so that the `REALLY_STATIC` variant above would eliminate it as well. [DONE]
  - The vtable pointer for `TensorBase`. Because the `TensorBase` destructor was declared `virtual`, every `TensorBase` object carries a vtable pointer. This should only be needed if something like this is being done: `TensorBase<...>* ptr = new Tensor<double,2>; delete ptr;`
- All constructors fill components with `NaN`. While this is useful for debugging, we really need an option to construct a `Tensor` that does not set any values. Setting these values can be quite expensive if the lifetimes of `Tensor`s are short.
- Calling functions that are not marked `KOKKOS_INLINE_FUNCTION` from `KOKKOS_INLINE_FUNCTION`s.
  - The biggest issue here is the use of `Teuchos::ScalarTraits`. `Kokkos::ArithTraits` was created (I think) to replace `Teuchos::ScalarTraits` in GPU code. Please consider switching to `Kokkos::ArithTraits`.
  - C++ standard library objects should not be used in GPU-callable code. This includes `std::pair` and definitely includes `std::vector`, both of which were involved in some MiniTensor `KOKKOS_INLINE_FUNCTION` calls.
  - Likewise, `boost::tie`, `boost::tuple`, and in fact anything from Boost should not be called from `KOKKOS_INLINE_FUNCTION` code.