## MueLu: Proposal: merge CoalesceDrop and FilteredA factories

*Created by: aprokop*

@trilinos/muelu @jhux2 @tawiesn

**Summary**

Right now both factories do very similar things: each has a loop that goes through a matrix and filters some entries. The proposal combines those loops into one, thus speeding up execution.

**Description**

The current `CoalesceDropFactory` is responsible for a single thing: constructing an (amalgamated) filtered graph that is later used in the aggregation. For the filtered scenario (when the drop tolerance is not 0), it has a loop that goes through the rows one by one and, for each row entry, determines whether to drop it (based on the original matrix or the distance Laplacian). It simultaneously constructs the compressed graph (`LWGraph`).
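To make the loop described above concrete, here is a minimal standalone sketch. It does not use MueLu or Xpetra types: `CsrMatrix`, `CsrGraph`, and `buildFilteredGraph` are hypothetical names, the matrix is assumed to be square and local, and the dropping test is assumed to be the classical criterion on the original matrix (keep `a_ij` when `a_ij^2 > tol^2 * |a_ii * a_jj|`). The distance Laplacian variant and amalgamation are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR containers; these are NOT MueLu/Xpetra types.
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<int> colInd;          // column index of each stored entry
  std::vector<double> values;       // value of each stored entry
};

struct CsrGraph {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
};

// Sweep the rows of A once and build the compressed (filtered) graph.
// Assumed criterion: always keep the diagonal, and keep a_ij when
// a_ij^2 > tol^2 * |a_ii * a_jj|. Assumes A is square.
CsrGraph buildFilteredGraph(const CsrMatrix& A, double tol) {
  assert(!A.rowPtr.empty());
  const std::size_t numRows = A.rowPtr.size() - 1;

  // The criterion needs a_ii and a_jj, so extract the diagonal first.
  std::vector<double> diag(numRows, 0.0);
  for (std::size_t i = 0; i < numRows; ++i)
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
      if (static_cast<std::size_t>(A.colInd[k]) == i) diag[i] = A.values[k];

  CsrGraph G;
  G.rowPtr.reserve(numRows + 1);
  G.rowPtr.push_back(0);
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      const std::size_t j = static_cast<std::size_t>(A.colInd[k]);
      const double aij = A.values[k];
      const bool keep =
          (j == i) || aij * aij > tol * tol * std::abs(diag[i] * diag[j]);
      if (keep) G.colInd.push_back(A.colInd[k]);  // only the pattern is stored
    }
    G.rowPtr.push_back(G.colInd.size());
  }
  return G;
}
```

Note that the keep/drop decision is computed here and then thrown away: only the surviving column indices are recorded.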

The construction of the filtered matrix in the `FilteredAFactory` has two variants: 1) reusing the graph of the original matrix and zeroing out entries; and 2) constructing a brand new matrix with the compressed graph. In both cases, the looping is done through rows and uses a `filter` array that helps to determine which matrix values to drop/zero out.
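The two variants can be sketched as follows. This is an illustration under assumptions, not the actual `FilteredAFactory` code: the CSR structs and both function names are hypothetical, the matrix is assumed to be square and local, and the `filter` array is assumed to work as a per-row scratch mask that is set from the filtered graph, consulted while walking the matrix row, and then reset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR containers; these are NOT MueLu/Xpetra types.
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
  std::vector<double> values;
};

struct CsrGraph {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
};

// Variant 1: keep the sparsity pattern of A and zero out dropped values.
// Returns the new values array, aligned with A.colInd.
std::vector<double> zeroDroppedValues(const CsrMatrix& A, const CsrGraph& G) {
  assert(A.rowPtr.size() == G.rowPtr.size());
  const std::size_t numRows = A.rowPtr.size() - 1;
  std::vector<double> vals(A.values);
  std::vector<char> filter(numRows, 0);  // 1 = column is kept in this row
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = G.rowPtr[i]; k < G.rowPtr[i + 1]; ++k)
      filter[G.colInd[k]] = 1;
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
      if (!filter[A.colInd[k]]) vals[k] = 0.0;
    for (std::size_t k = G.rowPtr[i]; k < G.rowPtr[i + 1]; ++k)
      filter[G.colInd[k]] = 0;  // reset the scratch mask for the next row
  }
  return vals;
}

// Variant 2: build a brand new matrix that stores only the kept entries.
CsrMatrix buildCompressedMatrix(const CsrMatrix& A, const CsrGraph& G) {
  assert(A.rowPtr.size() == G.rowPtr.size());
  const std::size_t numRows = A.rowPtr.size() - 1;
  CsrMatrix F;
  F.rowPtr.push_back(0);
  std::vector<char> filter(numRows, 0);
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = G.rowPtr[i]; k < G.rowPtr[i + 1]; ++k)
      filter[G.colInd[k]] = 1;
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      if (filter[A.colInd[k]]) {
        F.colInd.push_back(A.colInd[k]);
        F.values.push_back(A.values[k]);
      }
    }
    for (std::size_t k = G.rowPtr[i]; k < G.rowPtr[i + 1]; ++k)
      filter[G.colInd[k]] = 0;
    F.rowPtr.push_back(F.colInd.size());
  }
  return F;
}
```

In both variants every row of `A` is traversed again, and the mask has to be rebuilt from the graph just to recover a decision that was already made when the graph was constructed.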

The loop in the `FilteredAFactory` is remarkably similar to the loop in the `CoalesceDropFactory`. In fact, the only difference is that the latter constructs only rows and columns, while the former constructs values. The work is duplicated, and it is even more expensive to go through the matrix the second time, as we have to determine *again* which entries to filter.

**Proposed solution**

In the proposed solution, the `CoalesceDropFactory` (or its renamed version) will construct both the `LWGraph` that is used in aggregation **and** the filtered matrix, if desired. The `FilteredAFactory` goes away. The filtering loop in the factory will construct rows, columns, and values. For the block variant, it will also construct coalesced rows and columns.
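A rough sketch of what such a fused loop could look like for the scalar case is given below. Everything in it is an assumption made for illustration: the CSR structs, `FilterResult`, and `filterOnce` are hypothetical names rather than MueLu API, the diagonal is assumed to be available up front, the matrix is assumed to be square and local, and the dropping test is the classical `a_ij^2 > tol^2 * |a_ii * a_jj|` criterion. The `reuseGraph` flag switches between keeping the original pattern with zeroed values and building a compressed matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR containers; these are NOT MueLu/Xpetra types.
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
  std::vector<double> values;
};

struct CsrGraph {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
};

struct FilterResult {
  CsrGraph graph;       // filtered graph for aggregation
  CsrMatrix filteredA;  // filtered matrix
};

// One sweep over A produces both outputs. Each keep/drop decision is made
// exactly once and used for the graph and for the matrix values.
// `diag` holds the (assumed precomputed) diagonal of A.
FilterResult filterOnce(const CsrMatrix& A, const std::vector<double>& diag,
                        double tol, bool reuseGraph) {
  assert(!A.rowPtr.empty());
  const std::size_t numRows = A.rowPtr.size() - 1;
  assert(diag.size() == numRows);

  FilterResult out;
  out.graph.rowPtr.push_back(0);
  out.filteredA.rowPtr.push_back(0);
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      const std::size_t j = static_cast<std::size_t>(A.colInd[k]);
      const double aij = A.values[k];
      const bool keep =
          (j == i) || aij * aij > tol * tol * std::abs(diag[i] * diag[j]);
      if (keep) out.graph.colInd.push_back(A.colInd[k]);
      if (keep || reuseGraph) {
        // With reuseGraph, dropped entries stay in the pattern as zeros.
        out.filteredA.colInd.push_back(A.colInd[k]);
        out.filteredA.values.push_back(keep ? aij : 0.0);
      }
    }
    out.graph.rowPtr.push_back(out.graph.colInd.size());
    out.filteredA.rowPtr.push_back(out.filteredA.colInd.size());
  }
  return out;
}
```

Compared with running two separate loops, no `filter` mask is needed here, because the decision is still at hand when the value is written.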

**Benefits**

Looping through the level matrix is done once, potentially achieving a significant speedup, especially when reusing the matrix graph.

This would benefit applications that use MueLu with filtering.

**Q&A**

*Are there issues with block systems?*

So far, I don't see any issues. Block systems are treated by first filtering a block row and then coalescing it. Constructing the filtered matrix is independent of coalescing, though they both depend on the filtering.

*What about the special non-lightweight graph branch in CoalesceDropFactory?*

Tricky question. I don't have a good answer as I don't know what that does and what is special about it. @tawiesn ?

*Does it benefit the Kokkos version?*

It should benefit the Kokkos version in the same way. In fact, it could be merged with another idea, where we do not even construct the compressed graph for `LWGraph_kokkos` but rather provide a wrapper around the graph of the original matrix. This would eliminate the second loop in the current `CoalesceDropFactory_kokkos`.

*What are the steps to implement this?*

I will start implementing it in the kokkos-refactor branch of MueLu for the scalar case. This can be done independently from the default branch by modifying the dependency tree in the `ParameterListInterpreter`. If the results demonstrate the feasibility and speedup of this approach, we could discuss backporting to the non-Kokkos version.