MueLu: Proposal: use special form of the tentative matrix for smoothed prolongator construction
Created by: aprokop
@trilinos/muelu @jhux2
Summary
The tentative prolongator factory (TentativePFactory) produces a matrix of a special form, and this form can be used to speed up the matrix-matrix multiply in the smoothed prolongator construction. For example, when the nullspace is a constant vector, multiplying by the tentative matrix amounts to forming scaled sums (averages, up to normalization) of the columns of A within each aggregate. In addition, the tentative matrix can be stored in a "multivector" form, with its columns compacted into a multivector, since the aggregates do not overlap.
Description
Let us start with the case where we do construct the QR factorization.
In this case, let us store the constructed Q factors in a single multivector. The multivector is based on the same map as the current Ptent matrix, i.e. the row map of A. Multiplying A by P is then similar to a matrix-vector multiplication, with the main difference that instead of summing the products into a single value, we put the products into bins corresponding to aggregates. To do that, we need to be able to translate local column ids to global aggregate ids (ids in the coarseMap). This may require an import of an integer vector.
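The binning idea above can be sketched on a purely local level. This is a minimal illustration, not MueLu code: it assumes a single nullspace vector (so the Q factors form one column `q`), a plain CSR matrix, and that the column-to-aggregate translation has already been done and is available as `aggOfCol`. The `Csr` struct and the `multiplyByTentative` function are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal local CSR matrix (hypothetical type, not a Tpetra/Xpetra class).
struct Csr {
  std::size_t numRows = 0;
  std::size_t numCols = 0;
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<int> colInd;          // local column indices
  std::vector<double> values;
};

// Compute AP = A * Ptent, where Ptent has one nonzero per row:
// Ptent(j, aggOfCol[j]) = q[j]. A matvec would sum a_ij * q_j into a single
// value; here each product is added to the bin of the aggregate owning column j.
Csr multiplyByTentative(const Csr& A, const std::vector<double>& q,
                        const std::vector<int>& aggOfCol, int numAggs) {
  assert(q.size() == A.numCols && aggOfCol.size() == A.numCols);

  Csr AP;
  AP.numRows = A.numRows;
  AP.numCols = static_cast<std::size_t>(numAggs);
  AP.rowPtr.assign(A.numRows + 1, 0);

  // slot[agg] is the position of aggregate `agg` in the output arrays.
  // It is valid for the current row only if it is >= the row start.
  std::vector<std::ptrdiff_t> slot(static_cast<std::size_t>(numAggs), -1);

  for (std::size_t i = 0; i < A.numRows; ++i) {
    const std::ptrdiff_t rowStart =
        static_cast<std::ptrdiff_t>(AP.colInd.size());
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      const std::size_t j = static_cast<std::size_t>(A.colInd[k]);
      const std::size_t agg = static_cast<std::size_t>(aggOfCol[j]);
      const double prod = A.values[k] * q[j];
      if (slot[agg] < rowStart) {
        // First contribution to this aggregate in row i: open a new bin.
        slot[agg] = static_cast<std::ptrdiff_t>(AP.colInd.size());
        AP.colInd.push_back(static_cast<int>(agg));
        AP.values.push_back(prod);
      } else {
        // Aggregate already seen in this row: accumulate into its bin.
        AP.values[static_cast<std::size_t>(slot[agg])] += prod;
      }
    }
    AP.rowPtr[i + 1] = AP.colInd.size();
  }
  return AP;
}
```

In this sketch the entries of each output row appear in order of first occurrence rather than sorted by aggregate id, and the marker array `slot` would need to be per-thread if rows were processed in parallel.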
Q&A
What changes if we do not want to do QR decomposition?
In this case, the TentativePFactory will set the "P" multivector to be the nullspace.
What is the expected performance improvement?
Ideally, we would skip the global tentative matrix construction, and instead just construct a local multivector and then do an import (and, if the domain map of A is the same as its row map, we already have the Import object for it). In the SaPFactory, the hope is that the matrix-vector-multiply-like procedure is about 2-3 times as expensive as a regular matvec and significantly cheaper than the full matrix-matrix multiplication. We should be able to do it using just local indices and parallelize over threads. There will be an additional cost of compression, as we don't know the number of nonzeros per row in the final matrix in advance.
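One way to picture the compression cost is the following sketch. It assumes the product was first written into padded storage, where each row reserves an upper bound of slots (for instance the number of nonzeros in the corresponding row of A) and records how many it actually used. The `PaddedRows` and `Compressed` types and the `compress` function are hypothetical and only illustrate the extra pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Padded storage: row i owns slots [offset[i], offset[i+1]),
// of which only the first count[i] hold real entries.
struct PaddedRows {
  std::vector<std::size_t> offset;  // size numRows + 1 (upper-bound layout)
  std::vector<std::size_t> count;   // size numRows (entries actually used)
  std::vector<int> colInd;
  std::vector<double> values;
};

// Final CSR arrays without padding.
struct Compressed {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
  std::vector<double> values;
};

Compressed compress(const PaddedRows& p) {
  const std::size_t numRows = p.count.size();
  assert(p.offset.size() == numRows + 1);

  Compressed c;
  c.rowPtr.assign(numRows + 1, 0);

  // Pass 1: prefix sum over the used counts gives the final row pointers.
  for (std::size_t i = 0; i < numRows; ++i) {
    assert(p.count[i] <= p.offset[i + 1] - p.offset[i]);
    c.rowPtr[i + 1] = c.rowPtr[i] + p.count[i];
  }

  // Pass 2: copy the used part of each row; rows are independent,
  // so this loop could be parallelized over threads.
  c.colInd.resize(c.rowPtr[numRows]);
  c.values.resize(c.rowPtr[numRows]);
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = 0; k < p.count[i]; ++k) {
      c.colInd[c.rowPtr[i] + k] = p.colInd[p.offset[i] + k];
      c.values[c.rowPtr[i] + k] = p.values[p.offset[i] + k];
    }
  }
  return c;
}
```

The extra cost here is one prefix sum over the rows plus one copy of the used entries.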
How do we optimally perform the matrix-vector-like procedure?
An open question. There are some similarities with the sortAndMerge procedure in Tpetra, so it may be possible to learn from that.
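As a point of comparison, a generic sort-then-merge pass over one row could look like the sketch below. It is not Tpetra's implementation; it only illustrates the general pattern of collecting (aggregate id, value) pairs for a row, sorting them by id, and summing duplicates. The function name `sortAndMergeRow` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Entry = std::pair<int, double>;  // (aggregate id, product a_ij * q_j)

// Sort the entries of one row by aggregate id, then sum runs of equal ids
// in place. Returns the number of distinct aggregates left in the row.
std::size_t sortAndMergeRow(std::vector<Entry>& entries) {
  std::sort(entries.begin(), entries.end(),
            [](const Entry& a, const Entry& b) { return a.first < b.first; });

  std::size_t out = 0;  // number of merged entries written so far
  for (std::size_t k = 0; k < entries.size(); ++k) {
    if (out > 0 && entries[out - 1].first == entries[k].first) {
      // Same aggregate as the previous merged entry: accumulate.
      entries[out - 1].second += entries[k].second;
    } else {
      // New aggregate id: start a new merged entry.
      entries[out++] = entries[k];
    }
  }
  assert(out <= entries.size());
  entries.resize(out);
  return out;
}
```

Compared with a marker-array approach, this variant needs no workspace proportional to the number of aggregates, at the price of a sort per row.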