Trilinos issues
https://gitlab.osti.gov/jmwille/Trilinos/-/issues

---

Issue #660: Tpetra::CrsGraph: 2- or 3-level thread parallelization of sortAllIndices & mergeAllIndices
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/660
Updated 2017-10-26T20:30:14Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra Do a 2-level or 3-level thread parallelization of Tpetra::CrsGraph methods sortAllIndices and mergeAllIndices.
This is a "story" because this may call for a thread-parallel segmented sort, or segmented sort-and-merge.
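The sequential baseline these methods implement is a segmented sort: sort the slice of the column-index array belonging to each row. A minimal sequential sketch, with hypothetical `rowPtr`/`colInd` names rather than Tpetra's actual internals; the first-step 1-level parallelization would turn the outer loop into a parallel_for over rows, while the 2- or 3-level versions would also parallelize within each segment:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Segmented sort: sort each row's column indices in place.
// rowPtr has numRows+1 entries; row r occupies [rowPtr[r], rowPtr[r+1]).
// A thread-parallel segmented sort (see #662) would also sort within
// each segment in parallel.
void sortAllIndices(const std::vector<std::size_t>& rowPtr,
                    std::vector<int>& colInd) {
  const std::size_t numRows = rowPtr.size() - 1;
  for (std::size_t r = 0; r < numRows; ++r) {
    std::sort(colInd.begin() + rowPtr[r], colInd.begin() + rowPtr[r + 1]);
  }
}
```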
Update (12 Nov 2016): I rewrote this issue to reflect a multiple-step process. See #832. The first step will be a single-level thread parallelization. The second step (likely done at the same time) will be to remove any implicit UVM assumptions that the methods may make. The third step would be this issue, a 2- or 3-level parallelization that relies on a segmented sort (which does not exist yet; see #662).

Label: Tpetra-backlog

---

Issue #628: Tpetra::CrsGraph: Fuse column Map and Import construction
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/628
Updated 2018-05-15T23:30:08Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra Tpetra::CrsGraph constructs its column Map, if it doesn't already have one, in makeColMap(). This method needs information that could be reused to build the Import more efficiently, but CrsGraph currently throws away this information. Here is a comment from the inside of makeColMap() that explains:
```
// FIXME (mfh 03 Apr 2013) Now would be a good time to use the
// information we collected above to construct the Import. In
// particular, building an Import requires:
//
// 1. numSameIDs (length of initial contiguous sequence of GIDs
// on this process that are the same in both Maps; this
// equals the number of domain Map elements on this process)
//
// 2. permuteToLIDs and permuteFromLIDs (both empty in this
// case, since there's no permutation going on; the column
// Map starts with the domain Map's GIDs, and immediately
// after them come the remote GIDs)
//
// 3. remoteGIDs (exactly those GIDs that we found out above
// were not in the domain Map) and remoteLIDs (which we could
// have gotten above by using the three-argument version of
// getRemoteIndexList() that computes local indices as well
// as process ranks, instead of the two-argument version that
// was used above)
//
// 4. remotePIDs (which we have from the getRemoteIndexList()
// call above)
//
// 5. Sorting remotePIDs, and applying that permutation to
// remoteGIDs and remoteLIDs (by calling sort3 above instead
// of sort2)
//
// 6. Everything after the sort3 call in Import::setupExport():
// a. Create the Distributor via createFromRecvs(), which
// computes exportGIDs and exportPIDs
// b. Compute exportLIDs from exportGIDs (by asking the
// source Map, in this case the domain Map, to convert
// global to local)
//
// Steps 1-5 come for free, since we must do that work anyway in
// order to compute the column Map. In particular, Step 3 is
// even more expensive than Step 6a, since it involves both
// creating and using a new Distributor object.
```
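Step 5 above calls for a `sort3`: sort one array (remotePIDs) and apply the resulting permutation to two companion arrays (remoteGIDs, remoteLIDs). A generic stand-in to illustrate the idea, not Tpetra's actual sort3 implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sort keys (e.g., remotePIDs) and apply the same permutation to two
// companion arrays (e.g., remoteGIDs and remoteLIDs).
template <class T1, class T2, class T3>
void sort3(std::vector<T1>& keys, std::vector<T2>& a, std::vector<T3>& b) {
  const std::size_t n = keys.size();
  std::vector<std::size_t> perm(n);
  std::iota(perm.begin(), perm.end(), 0);
  std::stable_sort(perm.begin(), perm.end(),
                   [&](std::size_t i, std::size_t j) { return keys[i] < keys[j]; });
  std::vector<T1> k2(n);
  std::vector<T2> a2(n);
  std::vector<T3> b2(n);
  for (std::size_t i = 0; i < n; ++i) {
    k2[i] = keys[perm[i]];
    a2[i] = a[perm[i]];
    b2[i] = b[perm[i]];
  }
  keys = std::move(k2);
  a = std::move(a2);
  b = std::move(b2);
}
```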
Label: Tpetra-backlog

---

Issue #213: Tpetra::CrsGraph::getLocalDiagOffsets: Use LocalOrdinal instead of size_t for offsets
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/213
Updated 2017-10-26T19:43:25Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
This doesn't block #212, but is related. In particular, #205 makes it easier to use whatever offset type we want in Tpetra, since the proposed search code is agnostic of the offset type.
`Tpetra::CrsGraph::getLocalDiagOffsets` computes the offset of each row's diagonal entry. Tpetra uses this to speed up extracting the diagonal entries of a CrsMatrix, and the block diagonal entries of a BlockCrsMatrix.
`getLocalDiagOffsets` computes offsets _relative_ to each row. Thus, the type used to store offsets need only be able to represent the number of entries in a single row. If there are no duplicates in the row, `LocalOrdinal` thus suffices. If `LocalOrdinal` is 32 bits, this saves space and speeds up the computation.
There are no duplicates if the graph or matrix is fillComplete. Epetra never has duplicates in the graph, because insertion always merges; it never stores duplicates. #119 means that in Tpetra, it is technically possible to insert more duplicate entries in a row than the number of columns in the matrix. However, that is a weird edge case. Furthermore, there is no advantage to having a single type (currently `size_t`) for offsets, independent of other template parameters, because the main customer (Ifpack2) of `getLocalDiagOffsets` takes the same template parameters as Tpetra classes anyway.
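A sketch of the proposed change, assuming for simplicity that a diagonal entry is one whose local column index equals the local row index (names are illustrative, not Tpetra's internals):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

using LocalOrdinal = std::int32_t;  // assumed 32-bit local ordinal

constexpr LocalOrdinal INVALID = -1;  // diagonal entry absent from the row

// Because the offset is *relative* to the start of each row, it only
// needs to count entries within one row, so LocalOrdinal suffices.
std::vector<LocalOrdinal>
getLocalDiagOffsets(const std::vector<std::size_t>& rowPtr,
                    const std::vector<LocalOrdinal>& colInd) {
  const std::size_t numRows = rowPtr.size() - 1;
  std::vector<LocalOrdinal> offsets(numRows, INVALID);
  for (std::size_t r = 0; r < numRows; ++r) {
    for (std::size_t k = rowPtr[r]; k < rowPtr[r + 1]; ++k) {
      if (colInd[k] == static_cast<LocalOrdinal>(r)) {
        offsets[r] = static_cast<LocalOrdinal>(k - rowPtr[r]);  // row-relative
        break;
      }
    }
  }
  return offsets;
}
```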
Label: Tpetra-backlog

---

Issue #833: Tpetra::CrsGraph::globalAssemble only needs to be called at first fillComplete, if at all
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/833
Updated 2017-10-26T20:42:13Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
CrsGraph does not (currently) allow structure changes after first fillComplete, so there is no point in calling globalAssemble (with default all-reduce check) at subsequent fillComplete calls.

Label: Tpetra-backlog

---

Issue #682: Tpetra::Crs{Graph,Matrix}: Add local_offset_type typedef
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/682
Updated 2016-11-02T20:11:36Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra Per request by @kddevin (see #674 discussion), add a `local_offset_type` typedef to Tpetra::CrsGraph and Tpetra::CrsMatrix. This type tells users the type that Tpetra uses to store row offsets, in the local sparse graph / matrix. The `local_` prefix makes it clear that this refers to the _local_ data structure.
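A sketch of what the typedef might look like, on a mock class rather than the real one; in practice the type would come from the row-offsets array of the Kokkos local matrix:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Illustrative mock: the local_ prefix signals that this is the offset
// type of the *local* sparse graph / matrix data structure.
template <class LO, class GO>
class MockCrsGraph {
public:
  using local_offset_type = std::size_t;  // whatever the local graph uses
  std::vector<local_offset_type> rowPtr;  // row offsets stored in this type
};
```

User code could then declare offset variables as `MockCrsGraph<...>::local_offset_type` instead of hard-coding `size_t`.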
Label: Tpetra-backlog

---

Issue #606: Tpetra::Crs{Graph,Matrix}: Either remove getGlobalNumEntries & getGlobalMaxNumRowEntries, or make them global collectives
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/606
Updated 2018-05-08T18:52:15Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
@trilinos/amesos2 @trilinos/ifpack2 @trilinos/muelu @trilinos/xpetra @trilinos/zoltan2
Tpetra::CrsGraph and Tpetra::CrsMatrix provide three methods, getGlobalNumDiags, getGlobalNumEntries, and getGlobalMaxNumRowEntries. These methods do _not_ currently have collective semantics. Thus, it must be correct to call them at any time (when the graph / matrix is fillComplete), on any process. This implies that the graph / matrix must compute them via all-reduce at first fillComplete. This increases set-up cost.
Most users don't need to know the global (over all MPI processes in the communicator) number of entries, or diagonal entries. Some users might; for example, Amesos2 might want to know the global number of entries in order to prepare enough space to gather in the matrix for a direct solve. However, those users can do the all-reduce themselves, and save and reuse its result as part of the "symbolic factorization" set-up phase.
Thus, I think it would be good to deprecate and remove these methods. @csiefer2 talked about another option, namely to change the methods to have collective semantics that cache the value on first call and clear the cache at resumeFill (if the graph's structure can change). Please explain which option you would prefer here.
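The caching option might look like the following sketch, with the all-reduce abstracted as a callback (illustrative names; in practice it would be an MPI_Allreduce over the communicator):

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Collective-with-caching sketch: compute the all-reduce on first call,
// cache the result, and invalidate the cache at resumeFill.
class MockGraph {
public:
  using global_size_t = std::uint64_t;

  MockGraph(global_size_t localNumEntries,
            std::function<global_size_t(global_size_t)> allReduceSum)
    : localNumEntries_(localNumEntries), allReduceSum_(std::move(allReduceSum)) {}

  // Collective on first call after (re)fill; cached afterwards.
  global_size_t getGlobalNumEntries() {
    if (!cacheValid_) {
      globalNumEntries_ = allReduceSum_(localNumEntries_);  // MPI_Allreduce in practice
      cacheValid_ = true;
    }
    return globalNumEntries_;
  }

  void resumeFill() { cacheValid_ = false; }  // structure may change; drop cache

private:
  global_size_t localNumEntries_;
  std::function<global_size_t(global_size_t)> allReduceSum_;
  global_size_t globalNumEntries_ = 0;
  bool cacheValid_ = false;
};
```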
Edit (21 Dec 2016): I changed the issue title, to make clear the consequences of a fix, and edited the text a bit to give affected packages a chance to offer feedback for their preferred solution.

Label: Tpetra-backlog

---

Issue #435: Tpetra::CrsMatrix::apply: Don't copy entire source (multi)vector
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/435
Updated 2016-11-02T21:02:37Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
Epic: #767.
If the number of MPI processes in a Tpetra::CrsMatrix's communicator is greater than 1, and if sparse matrix-vector multiply with that matrix would normally require communication, then apply() copies the entire source (multi)vector, including the local entries. This only affects performance for unpreconditioned or weakly preconditioned iterative solves, and even then, not very much.
The usual case is that the domain and column Maps have all their local entries first on every participating process, and that the remote entries follow in the column Map. This case does not require copying the local entries. Instead, the remote entries could be Imported into a separate data structure, and the remote part of the mat-vec done separately. See also #439 for discussion of a more general fix.
This depends on #437 and #439.
This is related to #385, in that the same tech that fixes #385 would fix this issue. See discussion there.
Label: Tpetra-backlog

---

Issue #381: Tpetra::CrsMatrix: Implement "DualView" (better: "dual view") semantics
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/381
Updated 2017-10-30T19:55:07Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra Tpetra::MultiVector already implements "dual view" semantics. We need the other modifiable Tpetra classes to implement this as well.
Tpetra::CrsMatrix needs to implement the following interface:
- `sync<MemorySpace>()`: Sync _to_ the given memory space
- `modify<MemorySpace>()`: Mark data in the given memory space as modified
- `bool need_sync<MemorySpace>() const`: Do we need a sync _to_ the given memory space?
- `local_matrix_type getLocalMatrix<MemorySpace>() const`: Get the KokkosSparse::CrsMatrix living in the given memory space
Currently, `getLocalMatrix` is not templated, and returns the version of the data living in `Tpetra::CrsMatrix::device_type::memory_space`. Tpetra::CrsMatrix does not yet provide any of the other methods.
It does not make sense for Tpetra::CrsGraph to implement dual view semantics until that class' interface allows thread-parallel graph construction (other than by constructing a Kokkos::StaticCrsGraph in a thread-parallel way, and handing it to Tpetra::CrsGraph's constructor). Thus, this Tpetra::CrsMatrix issue only refers to the fixed graph case -- either when the matrix was created using a const graph, or after the matrix's first `fillComplete` call.
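A mock of the proposed interface, tracking two "memory spaces" with modified flags; this is a sketch of DualView-style semantics only, not the real implementation, which would use actual Kokkos memory spaces:

```cpp
#include <array>
#include <cstddef>

// Two stand-in "memory spaces", identified by an index.
struct HostSpace   { static constexpr std::size_t index = 0; };
struct DeviceSpace { static constexpr std::size_t index = 1; };

class MockDualViewMatrix {
public:
  template <class Space>
  void modify() { modified_[Space::index] = true; }

  // Do we need a sync *to* Space?  Yes, iff the *other* side was
  // modified and this side was not.
  template <class Space>
  bool need_sync() const {
    return modified_[1 - Space::index] && !modified_[Space::index];
  }

  // Sync *to* Space: copy from the other side, then clear both flags.
  template <class Space>
  void sync() {
    if (need_sync<Space>()) {
      // ...copy data between memory spaces here...
      modified_[0] = modified_[1] = false;
    }
  }

private:
  std::array<bool, 2> modified_{{false, false}};
};
```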
Label: Tpetra-backlog

---

Issue #385: Tpetra::CrsMatrix: Overlap communication & computation in apply()
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/385
Updated 2017-10-26T19:54:39Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
Epic: #767.
Overlap communication & computation in the apply() method of Tpetra::CrsMatrix, which implements sparse matrix-vector multiply.
This depends on #384 working for Tpetra::MultiVector, which in turn depends on #383.
If we fix #439, then we can fix this issue without needing to change CrsGraph semantics. In particular, CrsGraph could still compute its Import from the domain Map to the (entire, with locals too) column Map, and code that relies on this Import could work unchanged. Sparse matrix-vector multiply implementations, such as those in CrsMatrix and BlockCrsMatrix (see #424), could then do coarse-grained overlap as follows:
1. Start a nonblocking Import of the remotes
2. Import the locals (if necessary)
3. Do the local part of the mat-vec
4. Finish the nonblocking Import of the remotes
5. Do the remote part of the mat-vec (in place, in the row Map vector -- hence coarse-grained overlap)
6. Do an analogous procedure to overlap the Export, if an Export is needed (if row Map != range Map)
The "if necessary" remark on Step 2 relates to whether the domain Map is "fitted" to the column Map (see #437 for a definition of "fitted"). If so, the local entries of the input vector would not need to be copied (see #435 and #436). If not, they would need to be copied (and/or permuted), but this copy could be per process. For example, processes with the same number of local entries and no entries that need permutation would not need to make a copy: the local part of the mat-vec could just take the original input (multi)vector pointer as input. (In fact, the domain Map need not even be fitted; that's sufficient but not necessary.)
This approach has the following benefits over one that uses a different Import than the CrsGraph's domain -> column Map Import:
1. CrsGraph's Import retains its current meaning
2. Neither the graph nor the matrix would need to compute a new Import object just for the remotes
3. This approach would work for any domain and column Maps, and in fact for any range and row Maps
In particular, this approach would work regardless of whether the domain Map is fitted to the column Map. The graph or matrix would not need to do any extra all-reduces to figure out if the Maps are fitted on all processes.
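The Import-side steps above can be sketched as follows, with the nonblocking Import simulated by std::async and dense rows standing in for the sparse local and remote matrix parts (illustrative only; in Tpetra the import would be a nonblocking doImport backed by MPI):

```cpp
#include <cstddef>
#include <future>
#include <vector>

// y = A_local * x_local + A_remote * x_remote, with the local part
// computed while the remote entries are "in flight".
std::vector<double> applyOverlapped(
    const std::vector<double>& xLocal,
    const std::vector<std::vector<double>>& ALocal,
    const std::vector<std::vector<double>>& ARemote,
    const std::vector<double>& remoteSource) {
  // 1. Start the nonblocking "Import" of the remote entries.
  auto remoteFuture = std::async(std::launch::async,
                                 [&remoteSource] { return remoteSource; });
  // 3. Local part of the mat-vec (dense rows here, for brevity).
  std::vector<double> y(ALocal.size(), 0.0);
  for (std::size_t i = 0; i < ALocal.size(); ++i)
    for (std::size_t j = 0; j < xLocal.size(); ++j)
      y[i] += ALocal[i][j] * xLocal[j];
  // 4. Finish the import; 5. add the remote contribution in place.
  const std::vector<double> xRemote = remoteFuture.get();
  for (std::size_t i = 0; i < ARemote.size(); ++i)
    for (std::size_t j = 0; j < xRemote.size(); ++j)
      y[i] += ARemote[i][j] * xRemote[j];
  return y;
}
```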
Label: Tpetra-backlog

---

Issue #958: Tpetra::CrsMatrix: Possible bug in transposed apply
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/958
Updated 2018-01-23T17:11:16Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
@ikalash reported the following:
> I am trying to use the Tpetra::CrsMatrix apply method with the Teuchos::TRANS mode, and the method does not appear to be working correctly. I end up with a vector of zeros even though neither the operator nor the input vector is zero. If I set the mode to Teuchos::NO_TRANS, things work correctly. I assume Teuchos::TRANS has been tested, so perhaps I am doing something wrong, although I am not sure what it could be. Is there some caveat about the method's usage with the TRANS mode? I printed the Boolean returned when calling hasTransposeApply() and it prints true.
@ikalash also sent data, which I'll post here.

Label: Tpetra-backlog

---

Issue #627: Tpetra::CrsMatrix: Store matrix in such a way as to allow overlap of communication & computation
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/627
Updated 2017-10-26T20:25:13Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
Epic: #767.
This blocks #385.
For example, we could keep an extra row offsets array (where remotes start in each row) in CrsGraph.
Label: Tpetra-backlog

---

Issue #448: Tpetra::CrsMatrix: Use new KokkosKernels sumInto where possible
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/448
Updated 2017-10-26T20:06:54Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra See #369 and #447. When @bathmatt added sumIntoValuesSorted to KokkosSparse::CrsMatrix, there was some controversy about whether it was adequately tested. The best way to test it (and optimize it) would be to make Tpetra::CrsMatrix use it, where that is possible.
This requires implementing the search "hint" (see #369 discussion). I'm doing this by changing those methods to call the existing and tested findRelOffset function. Tpetra::Crs{Graph, Matrix} already test this, and the implementation works even if the graph or matrix has never been fill complete.
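The hint idea can be illustrated generically: check the hinted position first, and fall back to a search only on a miss. This is a stand-in to show the technique, not Tpetra's actual findRelOffset:

```cpp
#include <cstddef>
#include <vector>

// Return the offset of col within rowInds, or rowInds.size() if absent.
// If the hint is right (the common case when filling in a regular
// pattern), the search is O(1).
std::size_t findRelOffset(const std::vector<int>& rowInds, int col,
                          std::size_t hint) {
  const std::size_t n = rowInds.size();
  if (hint < n && rowInds[hint] == col) {
    return hint;  // hint hit: no search needed
  }
  for (std::size_t k = 0; k < n; ++k) {  // fallback: linear scan
    if (rowInds[k] == col) return k;
  }
  return n;  // not found
}
```

A sorted row would permit a binary-search fallback instead; the linear scan works even if the graph or matrix has never been fill complete.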
Label: Tpetra-backlog

---

Issue #188: Tpetra: Deprecate & remove nonmember constructors
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/188
Updated 2016-11-02T19:47:48Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
Moved from Bugzilla Bug 6363: https://software.sandia.gov/bugzilla/show_bug.cgi?id=6363
I plan to get rid of the nonmember constructors for Tpetra objects.
Tpetra objects already have perfectly workable constructors. Multiple ways of doing the same thing increase the testing burden and confuse users. Some of the nonmember constructors invite build errors. For example, users will invoke the nonmember Map constructor that assumes the default Node type, but assign the result to a Map with a nondefault Node type.
This bug relates to [Bugzilla] Bug 5863 [since closed as WONTFIX].
Label: Tpetra-backlog

---

Issue #439: Tpetra::DistObject: Add "locals only" and "remotes only" options to doImport and doExport
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/439
Updated 2017-10-26T20:06:27Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
Epic: #767.
To the doImport and doExport methods of Tpetra::DistObject, add an option to {ex,im}port
- locals only,
- remotes only, or
- everything (what these methods currently do).
This would let us overlap communication and computation in CrsMatrix or BlockCrsMatrix matrix-vector multiply, without needing to change CrsGraph semantics. In particular, CrsGraph could still compute its Import from the domain Map to the (entire, with locals too) column Map, and code that relies on this Import could work unchanged. Matrix-vector multiply implementations could then do coarse-grained overlap as follows:
1. Start a nonblocking Import of the remotes
2. Import the locals (if necessary)
3. Do the local part of the mat-vec
4. Finish the nonblocking Import of the remotes
5. Do the remote part of the mat-vec (in place, in the row Map vector -- hence coarse-grained overlap)
6. Do an analogous procedure to overlap the Export, if an Export is needed (if row Map != range Map)
The "if necessary" remark on Step 2 relates to whether the domain Map is "fitted" to the column Map (see #437 for a definition of "fitted"). If so, the local entries of the input vector would not need to be copied (see #435 and #436). If not, they would need to be copied, but this copy could be per process -- processes that don't need permutations wouldn't need to make a copy.
This approach has the following benefits over one that uses a different Import than the CrsGraph's domain -> column Map Import:
1. CrsGraph's Import retains its current meaning
2. Neither the graph nor the matrix would need to compute a new Import object just for the remotes
3. This approach would work for any domain and column Maps, and in fact for any range and row Maps. In particular, it would work regardless of whether the domain Map is fitted to the column Map. The graph or matrix would not need to do any extra all-reduces to figure out if the Maps are fitted on all processes.
Label: Tpetra-backlog

---

Issue #193: Tpetra::DistObject::copyAndPermute implementations should respect CombineMode
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/193
Updated 2016-11-02T19:48:25Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra This was originally Bugzilla Bug 6141 ("DistObject::copyAndPermute does not use CombineMode"): https://software.sandia.gov/bugzilla/show_bug.cgi?id=6141
Label: Tpetra-backlog

---

Issue #384: Tpetra::DistObject: Expose nonblocking versions of doExport & doImport
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/384
Updated 2017-10-26T19:54:13Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
Epic: #767.
Expose nonblocking versions of doExport and doImport in Tpetra::DistObject. These comprise Tpetra's public interface for data redistribution, so this would expose nonblocking redistribution to users.
Tpetra uses the DistObject methods doImport and doExport, called on MultiVector objects, to handle communication for sparse matrix-vector multiplication. Some preconditioners, such as those in Ifpack2, may also do so in their apply() methods. Thus, if we want to overlap communication and computation, the best place to start is by making doImport and doExport nonblocking.
In an interface sense, this is independent of #383. However, best performance benefits would come from fixing #383 first.
Label: Tpetra-backlog

---

Issue #383: Tpetra::Distributor: Make doPosts nonblocking
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/383
Updated 2019-01-24T01:21:09Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
Epic: #767.
Tpetra::Distributor implements the MPI communication that happens in an Export or Import. It uses MPI 2-sided point-to-point communication. Its `doPosts` method starts the receives and sends, and its `doWaits` method waits on them (`MPI_Waitall`). Receives are nonblocking (`MPI_Irecv`) and sends may be either blocking (various options, but only `MPI_Send` is used in practice) or nonblocking (`MPI_Isend`). However, sends default to blocking, and this is the only completely correct path. This is because of the so-called "slow path" of doPosts.
The "slow path" comes about when the indices in a send to a particular process aren't contiguous (i.e., are interrupted by data ~~from~~ [meant for]* other process(es)). The current implementation thus requires an intermediate pack buffer in that case. It allocates the extra buffer on the spot. In order to avoid holding on to that memory, the implementation forces blocking sends in that case (it throws `std::logic_error` otherwise).
Two fixes come to mind:
1. Keep the extra buffer. Keep it in the returned CommRequest so it doesn't get deallocated.
2. Pre-permute the data during packing (`DistObject::packAndPrepare`) so the slow path never gets invoked in practice.
The first fix is easier, but may be ultimately less performant.
The "slow path" occurs in both the 3-argument (fixed # packets per index, used by Vector and MultiVector) and 4-argument (variable # packets per index, used by CrsGraph, CrsMatrix, etc.) versions of doPosts. The 3-argument version matters most for solver performance, but it's easy to do both at the same time.
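The slow-path pack itself is a gather into a contiguous intermediate buffer, sketched here with hypothetical names. Fix (1) above would keep this buffer alive in the returned CommRequest so a nonblocking send could use it safely; fix (2) would pre-permute the data so the gather never runs:

```cpp
#include <cstddef>
#include <vector>

// Gather the (noncontiguous) entries destined for one receiving process
// into a contiguous send buffer.  exports holds packetsPerIndex packets
// per export index; indsForProc lists the export indices for this process.
template <class Packet>
std::vector<Packet> packForProc(const std::vector<Packet>& exports,
                                const std::vector<std::size_t>& indsForProc,
                                std::size_t packetsPerIndex) {
  std::vector<Packet> buf;
  buf.reserve(indsForProc.size() * packetsPerIndex);
  for (std::size_t idx : indsForProc)
    for (std::size_t p = 0; p < packetsPerIndex; ++p)
      buf.push_back(exports[idx * packetsPerIndex + p]);
  return buf;
}
```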
[*edit by jhux2 23-Jan-2019]

Label: Tpetra-backlog

---

Issue #608: Tpetra::Distributor: Split out execution of comm pattern into separate class
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/608
Updated 2017-10-26T20:19:36Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra
Tpetra::Distributor currently combines setting up a communication pattern, with executing that pattern. Methods for executing the communication pattern are templated on Packet type.
We might like to have different Distributor back-ends, in order to support communication protocols other than MPI 2-sided. (For example, we could have an MPI 1-sided implementation, or a PGAS implementation, or a Kokkos-wrapping-PGAS implementation.) This implies a base class with subclasses for the different implementations.
In order to make execution of a communication plan happen through virtual methods, we would need to template the class on Packet, not the methods. (Templated methods can't be virtual.) However, the setup code does _not_ depend on the Packet type. We would not want to build all that setup code redundantly for all Packet types (there are a lot of them!).
This suggests splitting the setup code into a separate class from the execution code.
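The split can be sketched as a non-templated plan class plus an executor class template whose methods are virtual; the class, not the methods, carries the Packet template parameter, which is what makes virtual dispatch possible. Names are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Setup only: built once, Packet-independent.
class CommPlan {
public:
  std::vector<int> sendProcs, recvProcs;
  // ...message lengths, contiguity info, etc...
};

// Execution: templated class with virtual methods, so back-ends
// (MPI 2-sided, MPI 1-sided, PGAS, ...) can be subclasses.
template <class Packet>
class PlanExecutor {
public:
  virtual ~PlanExecutor() = default;
  virtual void execute(const CommPlan& plan,
                       const std::vector<Packet>& exports,
                       std::vector<Packet>& imports) = 0;
};

// Trivial back-end: single-process "communication" (self-message).
template <class Packet>
class SelfMessageExecutor : public PlanExecutor<Packet> {
public:
  void execute(const CommPlan&, const std::vector<Packet>& exports,
               std::vector<Packet>& imports) override {
    imports = exports;
  }
};
```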
Label: Tpetra-backlog

---

Issue #174: Tpetra: Document 3-Map finite-element global assembly use pattern
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/174
Updated 2016-11-02T19:46:18Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra @amklinv The three Maps in question refer to mesh points. This is for a finite-element code where elements are uniquely owned by processes, but mesh points or other discretization goodies associated with elements may be shared by multiple processes. In the text below, I'll assume that degrees of freedom live on mesh points, but the same considerations apply for degrees of freedom that live on edges.
1. Uniquely owned (nonoverlapping Map), with mesh points that my MPI process owns
2. Overlapping Map, with mesh points belonging to elements that my MPI process owns
3. Overlapping Map, with mesh points connected to points in (1) or (2)
Map (3) is the column Map of the sparse graph / matrix. Use replaceColumnMap if necessary. We will fill out this pattern in more detail in discussion of this issue. It has already proven useful for at least three different applications, two of which use BlockCrsMatrix, and two of which use CrsMatrix.
Label: Tpetra-backlog

---

Issue #61: Tpetra::Experimental::GEMM: Fix for mode = "C"(onjugate Transpose)
https://gitlab.osti.gov/jmwille/Trilinos/-/issues/61
Updated 2017-10-26T19:34:46Z by James Willenbring

*Created by: mhoemmen*
@trilinos/tpetra @amklinv
GEMM currently only implements the Non-Transpose ("N") and Transpose ("T") modes, not the Conjugate Transpose ("C") mode. The GEMM interface has no way to return an error for "not implemented," and can't throw an exception, so we unfortunately do have to implement all the options.
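For reference, mode "C" differs from mode "T" only by conjugating each element of A. A dense sketch with alpha = 1 and beta = 0, using an illustrative matrix type rather than the Tpetra::Experimental::GEMM signature:

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Matrix = std::vector<std::vector<cplx>>;  // row-major dense, illustrative

// C := conj(A)^T * B.  The only change from the "T" mode is the
// std::conj applied to each element of A.
Matrix gemmConjTrans(const Matrix& A, const Matrix& B) {
  const std::size_t m = A[0].size();  // rows of conj(A)^T = columns of A
  const std::size_t k = A.size();
  const std::size_t n = B[0].size();
  Matrix C(m, std::vector<cplx>(n, cplx(0.0, 0.0)));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t l = 0; l < k; ++l)
        C[i][j] += std::conj(A[l][i]) * B[l][j];
  return C;
}
```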
Label: Tpetra-backlog