MueLu: Proposal: merge CoalesceDrop and FilteredA factories

Created by: aprokop

@trilinos/muelu @jhux2 @tawiesn

Summary

Right now both factories do very similar things: each has a loop that goes through the matrix and filters some entries. The proposal combines those loops into one, thus speeding up execution.

Description

The current CoalesceDropFactory is responsible for a single thing: constructing an (amalgamated) filtered graph that is later used in aggregation. For the filtered scenario (when the drop tolerance is not 0) it has a loop that goes through rows one by one and, for each row entry, determines whether to drop it (based on the original matrix or the distance Laplacian). It simultaneously constructs the compressed graph (LWGraph).

The construction of the filtered matrix in the FilteredAFactory has two variants: 1) reusing the graph of the original matrix and zeroing out entries; and 2) constructing a brand new matrix with the compressed graph. In both cases, the loop goes through rows and uses a filter array that helps determine which matrix values to drop or zero out.
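To make the two variants concrete, here is a minimal sketch in plain C++. It is not the FilteredAFactory code: `CsrMatrix`, `filterReuseGraph`, `filterCompress` and the per-entry `keep` array are hypothetical names used only for illustration, and the real factory works on Xpetra matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR matrix; not a MueLu or Xpetra type.
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<int> colInd;          // column index of each stored entry
  std::vector<double> values;       // value of each stored entry
};

// Variant 1: reuse the graph of the original matrix and zero out dropped entries.
// keep[k] is true if stored entry k survives filtering (assumed filter array layout).
CsrMatrix filterReuseGraph(const CsrMatrix& A, const std::vector<bool>& keep) {
  assert(keep.size() == A.values.size());
  CsrMatrix F = A;  // same rowPtr and colInd as A
  for (std::size_t k = 0; k < F.values.size(); ++k)
    if (!keep[k]) F.values[k] = 0.0;
  return F;
}

// Variant 2: build a brand new matrix whose graph holds only the kept entries.
CsrMatrix filterCompress(const CsrMatrix& A, const std::vector<bool>& keep) {
  assert(!A.rowPtr.empty());
  assert(keep.size() == A.values.size());
  const std::size_t numRows = A.rowPtr.size() - 1;
  CsrMatrix F;
  F.rowPtr.push_back(0);
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      if (keep[k]) {
        F.colInd.push_back(A.colInd[k]);
        F.values.push_back(A.values[k]);
      }
    }
    F.rowPtr.push_back(F.colInd.size());  // close row i
  }
  return F;
}
```

Both functions walk the same entries and consult the same filter array; they differ only in whether dropped entries stay in the graph as explicit zeros.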

The loop in the FilteredAFactory is remarkably similar to the loop in the CoalesceDropFactory. In fact, the only difference is that the latter (CoalesceDropFactory) constructs only rows and columns, while the former (FilteredAFactory) constructs values. The work is duplicated, and going through the matrix a second time is even more expensive because we have to determine again which entries to filter.

Proposed solution

In the proposed solution, the CoalesceDropFactory (or its renamed version) will construct both the LWGraph that is used in aggregation and, if desired, the filtered matrix. The FilteredAFactory goes away. The filtering loop in the factory will construct rows, columns, and values. For the block variant it will also construct coalesced rows and columns.
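A rough sketch of such a fused loop for the scalar case follows. It is an illustration under stated assumptions, not the proposed implementation: `CsrMatrix` and `dropAndFilter` are hypothetical names, the matrix is assumed square with local column indices, and the drop test (keep the diagonal, and keep a_ij when a_ij^2 > tol^2 * |a_ii * a_jj|) is assumed here as a classical scaled criterion.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR container. In the result, (rowPtr, colInd) plays the role of
// the graph used in aggregation and values holds the filtered matrix entries.
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
  std::vector<double> values;
};

CsrMatrix dropAndFilter(const CsrMatrix& A, double tol) {
  assert(!A.rowPtr.empty());
  const std::size_t numRows = A.rowPtr.size() - 1;

  // Pre-pass: collect the diagonal, which the assumed drop criterion needs.
  std::vector<double> diag(numRows, 0.0);
  for (std::size_t i = 0; i < numRows; ++i)
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
      if (static_cast<std::size_t>(A.colInd[k]) == i) diag[i] = A.values[k];

  // Single filtering pass: rows, columns, and values are built together.
  CsrMatrix F;
  F.rowPtr.push_back(0);
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      const std::size_t j = static_cast<std::size_t>(A.colInd[k]);
      assert(j < numRows);  // square matrix with local indices (assumption)
      const double aij = A.values[k];
      const bool keep =
          (j == i) || (aij * aij > tol * tol * std::fabs(diag[i] * diag[j]));
      if (keep) {
        F.colInd.push_back(A.colInd[k]);  // graph entry
        F.values.push_back(aij);          // filtered matrix entry
      }
    }
    F.rowPtr.push_back(F.colInd.size());
  }
  return F;
}
```

The decision `keep` is evaluated once per entry and feeds both outputs, which is the duplicated work the merge is meant to remove.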

Benefits

Looping through the level matrix is done once, potentially achieving a significant speedup, especially when reusing the matrix graph.

This would benefit applications that use MueLu with filtering.

Q&A

Are there issues with block systems?

So far, I don't see any issues. The block systems are treated by first filtering a block row, and then coalescing it. Constructing the filtered matrix is independent of coalescing, though they both depend on the filtering.
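The filter-then-coalesce order can be sketched as below. This is a simplified illustration, assuming a constant block size, point rows and columns numbered so that `index / blockSize` gives the node, and a per-entry `keep` array produced by the filtering step; `CsrGraph` and `coalesceFiltered` are hypothetical names, not MueLu classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR graph; not the MueLu LWGraph class.
struct CsrGraph {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
};

// Coalesce a filtered point graph into a node graph.
// keep[k] is true if point entry k survived filtering.
CsrGraph coalesceFiltered(const CsrGraph& G, const std::vector<bool>& keep,
                          std::size_t blockSize) {
  assert(blockSize > 0 && !G.rowPtr.empty());
  assert(keep.size() == G.colInd.size());
  const std::size_t numRows = G.rowPtr.size() - 1;
  assert(numRows % blockSize == 0);
  const std::size_t numNodes = numRows / blockSize;

  CsrGraph C;
  C.rowPtr.push_back(0);
  std::vector<int> nodeCols;
  for (std::size_t node = 0; node < numNodes; ++node) {
    nodeCols.clear();
    // Gather the kept entries of every point row in this block row.
    for (std::size_t r = node * blockSize; r < (node + 1) * blockSize; ++r)
      for (std::size_t k = G.rowPtr[r]; k < G.rowPtr[r + 1]; ++k)
        if (keep[k])
          nodeCols.push_back(G.colInd[k] / static_cast<int>(blockSize));
    // Coalesce: each neighboring node appears once.
    std::sort(nodeCols.begin(), nodeCols.end());
    nodeCols.erase(std::unique(nodeCols.begin(), nodeCols.end()), nodeCols.end());
    C.colInd.insert(C.colInd.end(), nodeCols.begin(), nodeCols.end());
    C.rowPtr.push_back(C.colInd.size());
  }
  return C;
}
```

The filtered matrix values would be taken from the same `keep` decisions on the point rows, so emitting them does not interfere with the coalescing step.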

What about a special non-lightweight graph branch in CoalesceDropFactory?

Tricky question. I don't have a good answer as I don't know what that does and what is special about it. @tawiesn ?

Does it benefit the Kokkos version?

It should benefit the Kokkos version in the same way. In fact, it could be merged with another idea where we do not even construct the compressed graph for LWGraph_kokkos but rather provide a wrapper around the graph of the original matrix. This would eliminate the 2nd loop in the current CoalesceDropFactory_kokkos.
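One possible shape for such a wrapper, sketched in plain C++ rather than Kokkos: a view that keeps references to the original graph arrays plus a per-entry mask, and skips dropped entries while iterating. `FilteredGraphView` and `forEachNeighbor` are hypothetical names, not part of LWGraph_kokkos.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical wrapper: no compressed arrays are built; dropped entries are
// skipped at traversal time. The referenced arrays must outlive the view.
class FilteredGraphView {
 public:
  FilteredGraphView(const std::vector<std::size_t>& rowPtr,
                    const std::vector<int>& colInd, std::vector<bool> keep)
      : rowPtr_(rowPtr), colInd_(colInd), keep_(std::move(keep)) {
    assert(!rowPtr_.empty());
    assert(keep_.size() == colInd_.size());
  }

  std::size_t numRows() const { return rowPtr_.size() - 1; }

  // Call f(col) for every kept neighbor of the given row.
  template <class F>
  void forEachNeighbor(std::size_t row, F&& f) const {
    assert(row < numRows());
    for (std::size_t k = rowPtr_[row]; k < rowPtr_[row + 1]; ++k)
      if (keep_[k]) f(colInd_[k]);
  }

 private:
  const std::vector<std::size_t>& rowPtr_;  // graph of the original matrix
  const std::vector<int>& colInd_;
  std::vector<bool> keep_;  // per-entry filter mask
};
```

The trade-off is that every traversal pays for the mask test and still touches the dropped entries, in exchange for not building and storing a second graph.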

What are the steps to implement this?

I will start implementing it in the kokkos-refactor branch of MueLu for the scalar case. This can be done independently of the default branch by modifying the dependency tree in the ParameterListInterpreter. If the results demonstrate the feasibility and speedup of this approach, we could discuss backporting it to the non-Kokkos version.