Trilinos issues
https://gitlab.osti.gov/jmwille/Trilinos/-/issues
2017-08-07T02:09:35Z

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/229
Make Teuchos Memory Management Classes thread-safe
2017-08-07T02:09:35Z
James Willenbring
*Created by: bartlettroscoe*
This story is to address the long-standing problem that the Teuchos Memory management Classes which use reference-counting are not thread safe.
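As a minimal sketch of the kind of change this story implies, assuming the reference count lives in a shared control block, the counter can be held in a `std::atomic` so that concurrent increments and decrements do not race. `RefCountSketch` and its method names are hypothetical and are not the Teuchos implementation. `tryUpgrade` shows one common way to make a weak-to-strong conversion thread safe: a compare-and-swap loop that refuses to revive a count that has already reached zero.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical control block for illustration only; not the Teuchos RCP node.
class RefCountSketch {
public:
  RefCountSketch() : strong_(1) {}

  // Called by an owner that already holds a strong reference.
  void incrStrong() { strong_.fetch_add(1, std::memory_order_relaxed); }

  // Returns true when the caller just released the last strong reference,
  // i.e. the caller is the one thread responsible for destroying the object.
  bool decrStrong() {
    return strong_.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }

  // Weak-to-strong upgrade: take a strong reference only if at least one
  // strong reference still exists. Never moves the count from 0 back to 1.
  bool tryUpgrade() {
    int count = strong_.load(std::memory_order_relaxed);
    while (count != 0) {
      // On failure, compare_exchange_weak reloads the current value into count.
      if (strong_.compare_exchange_weak(count, count + 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
        return true;
      }
    }
    return false;
  }

  int strongCount() const { return strong_.load(std::memory_order_acquire); }

private:
  std::atomic<int> strong_;
};
```

Because `decrStrong` returns true for exactly one caller, only one thread ever runs the cleanup path, even if several threads release their references at the same time.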
CC: @MicheldeMessieres, @jwillenbring,
**Next Action Status:** See tasks ...
**Tasks:**
1. Initial development and testing for multi-thread correctness [Done]
2. **Add configure-time switch for thread safety:** Define configure-time options `Trilinos_ENABLE_THREAD_SAFE` and `Teuchos_ENABLE_THREAD_SAFE` (the latter defaults to the value of the former).
3. **Turn off for Trilinos_ENABLE_CXX11=OFF**: That is, set `Teuchos_ENABLE_THREAD_SAFE=OFF` in this case. Run full Trilinos test suite with `-DTrilinos_ENABLE_CXX11=OFF`.
4. **Update the Teuchos test suite:**
   - **Inform CTest of number of threads for thread-safe tests:** Figure this out at configure time and then set `NUM_TOTAL_CORES_USED` (see [TRIBITS_ADD_TEST()](https://tribits.org/doc/TribitsDevelopersGuide.html#formal-arguments-tribits-add-test)).
   - **Make pre-push `BASIC` test suite fast:** Make the longer running threading tests `NIGHTLY`.
5. **Performance testing:**
   - For builds:
     - `-DCMAKE_BUILD_TYPE=RELEASE -DTrilinos_ENABLE_DEBUG=ON` (`Trilinos_ENABLE_THREAD_SAFE` on and off)
     - `-DCMAKE_BUILD_TYPE=RELEASE -DTrilinos_ENABLE_DEBUG=OFF` (`Trilinos_ENABLE_THREAD_SAFE` on and off)
   - For compilers:
     - GCC version 4.8.x
     - Intel version 15.x
     - Clang X
   - Run Trilinos (nearly full) test suite with and without thread safety turned on.
   - Run Nalu, Albany, and Drekar test suites with thread safety on and off and see the performance impact with debug-mode checking turned on.
   - Request report from Cedric about usage and performance.
   - If performance is okay, continue. Otherwise, decide what to do.
6. **Disallow throwing exceptions from destructors:** We just need to disallow exceptions and make Teuchos MM classes abort in destructors when errors occur. Update unit tests for the case of circular references and exceptions. Need to provide `TEUCHOS_ABORT_IF(<condition>)` that will print and then call abort.
7. **Merge into develop branch with Trilinos_ENABLE_THREAD_SAFE=OFF by default**:
   - Update teuchos/ReleaseNotes.txt to describe the change in how destructors handle errors (abort instead of throwing an exception).
   - Announce the schedule for turning this on by default.
8. **Update documentation / Code review:**
   - Update unit test documentation: with the final tests in place, create a uniformly formatted summary in the code for each test that describes its purpose.
   - Update RCP documentation to reflect these changes.
   - Ross reviews code, tests, and updated documentation.
9. **Set Trilinos_ENABLE_THREAD_SAFE=ON by default:**
   - Update teuchos/ReleaseNotes.txt
   - Send out announcement
10. **Other considerations and improvements:** (move to new stories?)
    1. **Review Array.h mutex implementation:** This was new code I added after our last review to make Array.h thread safe. I have implemented the suggested tests we discussed on GitHub.
    2. **Discuss plan for debug detection of dangling weak ptr:** Debug builds have checks to validate weak ptrs, but those checks can fail if another thread kills the data. I’ve got tests in place which detect and demonstrate this issue but need to discuss further how we would like to address this.
    3. **Consider additional changes for ArrayView, ArrayRCP, Tuple, Ptr:** Implemented fairly limited sanity checks on these.
    4. **Weak to strong conversion:** Have code in place which implements thread-safe upgrade of a weak ptr to a strong ptr, along with a unit test, but the role of this is unclear at the moment.
    5. **Make tests have inverted case:** Tests should demonstrate they can detect thread problems when the fix is not applied (the inverted case). I’ve got some #defines set up to do this but wanted to discuss how to best organize those. Many of the inverted tests will need separate main functions.

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/353
Tpetra: Make CrsMatrix MatrixMarket I/O memory scalable
2017-02-02T06:30:16Z
James Willenbring
*Created by: mhoemmen*
@trilinos/tpetra
Epic: #769.
MatrixMarket input and output (I/O) for Tpetra::CrsMatrix currently gathers the whole matrix to Process 0 before writing it. This is not memory scalable. "Memory scalable" means that no single process has to store more than a small constant factor times the maximum per-process memory usage of the distributed data structure. See also #352 and #1017.
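To make the definition concrete, here is a small single-process simulation of the card-dealer distribution named in this issue's tasks, assuming triples are dealt in fixed-size chunks. `Triple`, `dealTriples`, and `maxPerProc` are hypothetical names and do not come from Tpetra. The sketch only illustrates the storage bound: after dealing, no "process" holds much more than its fair share of the entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One matrix entry (i, j, A(i,j)). Hypothetical type for illustration.
struct Triple {
  long row;
  long col;
  double val;
};

// Deal triples to numProcs simulated processes in chunks of chunkSize:
// chunk 0 goes to process 0, chunk 1 to process 1, and so on, wrapping around.
// NOTE: this simulation keeps `all` in memory for simplicity. A memory-scalable
// reader would read and send one chunk at a time instead of holding everything.
std::vector<std::vector<Triple>> dealTriples(const std::vector<Triple>& all,
                                             std::size_t numProcs,
                                             std::size_t chunkSize) {
  assert(numProcs > 0 && chunkSize > 0);
  std::vector<std::vector<Triple>> perProc(numProcs);
  for (std::size_t k = 0; k < all.size(); ++k) {
    const std::size_t chunk = k / chunkSize;
    perProc[chunk % numProcs].push_back(all[k]);
  }
  return perProc;
}

// Largest number of triples held by any one simulated process.
std::size_t maxPerProc(const std::vector<std::vector<Triple>>& perProc) {
  std::size_t result = 0;
  for (const auto& triples : perProc) {
    if (triples.size() > result) result = triples.size();
  }
  return result;
}
```

With N triples, P processes, and chunk size c, each process ends up with at most about N/P + c triples, which is the kind of per-process bound the definition asks for.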
Tasks:
- [x] #1018: Write functions for packing and unpacking matrix triples (i, j, A(i,j)).
- [x] #1025: Implement DistObject subclass that communicates by triples, using the above functions.
- [ ] Rewrite CrsMatrix reader in the following way:
  - [x] #1031: Deal out chunks of triples to all the processes, in card-dealer fashion.
  - [ ] Push all the triples into the aforementioned DistObject subclass.
  - [ ] Do an Export of the subclass to the desired 1-to-1 Map. (If we wanted to be fancy, we could let CrsMatrix be the direct target of that Export.)
  - [ ] As the Matrix Market reader already does, redistribute again if the original desired row Map is not 1-to-1.

Tpetra-backlog

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/1017
Tpetra: Make CrsGraph MatrixMarket I/O memory scalable
2017-01-20T18:18:36Z
James Willenbring
*Created by: mhoemmen*
@trilinos/tpetra
Epic: #769
MatrixMarket input and output (I/O) for Tpetra::CrsGraph currently gathers the whole graph to Process 0 before writing it. This is not memory scalable. "Memory scalable" means that no single process has to store more than a small constant factor times the maximum per-process memory usage of the distributed data structure. See also #352 and #353.

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/901
Automated testing with docker containers
2016-12-06T21:33:39Z
James Willenbring
*Created by: tawiesn*
This story is to collect all ideas for setting up a check-in/testing environment based on Docker containers. The idea is to allow developing Trilinos on platforms other than Linux while making sure that all tests run on selected Linux platforms before checking in the changes to the Trilinos repository.
@jwillenbring @maherou

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/352
Tpetra: Make MatrixMarket (Multi)Vector input memory scalable
2016-11-02T21:05:25Z
James Willenbring
*Created by: mhoemmen*
@trilinos/tpetra
Epic: #769.
MatrixMarket input for `Tpetra::MultiVector` and `Tpetra::Vector` currently gathers the whole (multi)vector to Process 0 before reading it. This is not memory scalable. "Memory scalable" means that no single process has to store more than a small constant factor times the maximum per-process memory usage of the distributed data structure.
MatrixMarket _output_ for Tpetra::MultiVector and Tpetra::Vector _is_ memory scalable. In fact, it uses a clever pipelining scheme to overlap MPI communication and file writes.
Tpetra-backlog

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/435
Tpetra::CrsMatrix::apply: Don't copy entire source (multi)vector
2016-11-02T21:02:37Z
James Willenbring
*Created by: mhoemmen*
@trilinos/tpetra
Epic: #767.
If the number of MPI processes in a Tpetra::CrsMatrix's communicator is greater than 1, and if sparse matrix-vector multiply with that matrix would normally require communication, then apply() copies the entire source (multi)vector, including the local entries. This only affects performance for unpreconditioned or weakly preconditioned iterative solves, and even then, not very much.
The usual case is that the domain and column Maps have all their local entries first on every participating process, and that the remote entries follow in the column Map. This case does not require copying the local entries. Instead, the remote entries could be Imported into a separate data structure, and the remote part of the mat-vec done separately. See also #439 for discussion of a more general fix.
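The idea in the paragraph above can be sketched on plain CSR arrays, assuming the column indices are ordered so that locally owned columns come first. `CsrSketch` and `applySplit` are hypothetical names and are not Tpetra code. The local part reads the source vector in place, and only the remote entries live in a separate (imported) buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local CSR storage in the "usual case": column indices [0, numLocalCols)
// are locally owned, and indices >= numLocalCols refer to remote entries.
struct CsrSketch {
  std::size_t numLocalCols;
  std::vector<std::size_t> rowPtr;
  std::vector<std::size_t> colInd;
  std::vector<double> val;
};

// y = A * [xLocal; xRemote], computed in two passes so that xLocal is used
// directly and never copied into a combined column-Map vector.
std::vector<double> applySplit(const CsrSketch& A,
                               const std::vector<double>& xLocal,
                               const std::vector<double>& xRemote) {
  assert(!A.rowPtr.empty());
  assert(xLocal.size() == A.numLocalCols);
  const std::size_t numRows = A.rowPtr.size() - 1;
  std::vector<double> y(numRows, 0.0);

  // Pass 1: local columns only. In a distributed code this pass could run
  // while the remote entries are still being communicated.
  for (std::size_t r = 0; r < numRows; ++r) {
    for (std::size_t k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k) {
      const std::size_t c = A.colInd[k];
      if (c < A.numLocalCols) y[r] += A.val[k] * xLocal[c];
    }
  }

  // Pass 2: remote columns only, read from the separately imported buffer.
  for (std::size_t r = 0; r < numRows; ++r) {
    for (std::size_t k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k) {
      const std::size_t c = A.colInd[k];
      if (c >= A.numLocalCols) y[r] += A.val[k] * xRemote[c - A.numLocalCols];
    }
  }
  return y;
}
```

A production version would more likely store the local and remote parts as two separate sparse structures so that neither pass has to test column indices. The branch here keeps the sketch short.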
This depends on #437 and #439.
This is related to #385, in that the same tech that fixes #385 would fix this issue. See discussion there.
Tpetra-backlog

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/436
Tpetra::BlockCrsMatrix::apply: Don't copy entire source (multi)vector
2016-11-02T21:02:29Z
James Willenbring
*Created by: mhoemmen*
@trilinos/tpetra
Epic: #767.
If the number of MPI processes in a Tpetra::Experimental::BlockCrsMatrix's communicator is greater than 1, and if sparse matrix-vector multiply with that matrix would normally require communication, then apply() copies the entire source (multi)vector, including the local entries.
The usual case is that the domain and column Maps have all their local entries first on every participating process, and that the remote entries follow in the column Map. This case does not require copying the local entries. Instead, the remote entries could be Imported into a separate data structure, and the remote part of the mat-vec done separately.
Fixing this relates to #385. See discussion there. See also #435.
Tpetra-backlog

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/317
Amesos2 support for Tpetra Block CRS matrices
2016-11-02T19:49:30Z
James Willenbring
*Created by: cihanuq*
I'd like to flag interest for direct solver support for (experimental) block matrices in Tpetra. This would be very useful for geophysical inversion applications where iterative solvers do not work (well), and where each (FEM) mesh node has multiple degrees of freedom.
There was a comment in the mailing list which suggested that there is a way to 'unroll' the block matrix and pass the resulting regular matrix to Amesos2. I could not find a way to do so, however.
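For reference, "unrolling" a block matrix can be sketched on raw arrays as follows. This is not a Tpetra or Amesos2 API: `BlockCrsSketch`, `PointCrsSketch`, and `unrollBlocks` are hypothetical names, and the row-major layout of values within each block is an assumption made for this example. Each block at block position (I, J) with block size b expands to the point entries in rows I*b..I*b+b-1 and columns J*b..J*b+b-1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Block CRS arrays. Each stored block holds blockSize*blockSize values,
// assumed here to be laid out row-major within the block.
struct BlockCrsSketch {
  std::size_t blockSize;            // degrees of freedom per mesh node
  std::vector<std::size_t> rowPtr;  // offsets into colInd, one per block row
  std::vector<std::size_t> colInd;  // block-column index of each block
  std::vector<double> val;          // blockSize*blockSize values per block
};

// Ordinary (point) CRS arrays.
struct PointCrsSketch {
  std::vector<std::size_t> rowPtr;
  std::vector<std::size_t> colInd;
  std::vector<double> val;
};

PointCrsSketch unrollBlocks(const BlockCrsSketch& B) {
  assert(B.blockSize > 0 && !B.rowPtr.empty());
  const std::size_t b = B.blockSize;
  const std::size_t numBlockRows = B.rowPtr.size() - 1;
  PointCrsSketch P;
  P.rowPtr.push_back(0);
  for (std::size_t I = 0; I < numBlockRows; ++I) {
    for (std::size_t i = 0; i < b; ++i) {  // builds point row I*b + i
      for (std::size_t k = B.rowPtr[I]; k < B.rowPtr[I + 1]; ++k) {
        for (std::size_t j = 0; j < b; ++j) {
          P.colInd.push_back(B.colInd[k] * b + j);
          P.val.push_back(B.val[(k * b + i) * b + j]);
        }
      }
      P.rowPtr.push_back(P.colInd.size());
    }
  }
  return P;
}
```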

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/193
Tpetra::DistObject::copyAndPermute implementations should respect CombineMode
2016-11-02T19:48:25Z
James Willenbring
*Created by: mhoemmen*
@trilinos/tpetra This was originally Bugzilla Bug 6141 ("DistObject::copyAndPermute does not use CombineMode"): https://software.sandia.gov/bugzilla/show_bug.cgi?id=6141
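A rough illustration of what "respecting CombineMode" could mean for the copy and permute steps, written against plain vectors rather than Tpetra types. `CombineModeSketch`, `combine`, and `copyAndPermuteSketch` are hypothetical names, and the three modes shown are only an assumed subset chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical subset of combine modes for illustration.
enum class CombineModeSketch { Insert, Add, AbsMax };

// Combine an incoming source value with the existing destination value.
inline double combine(double dst, double src, CombineModeSketch mode) {
  switch (mode) {
    case CombineModeSketch::Add:
      return dst + src;
    case CombineModeSketch::AbsMax:
      return std::max(std::fabs(dst), std::fabs(src));
    case CombineModeSketch::Insert:
      break;
  }
  return src;  // Insert: overwrite the destination
}

// The first numSameIDs entries share local indices between src and dst.
// The rest are moved from permuteFromLIDs[k] in src to permuteToLIDs[k] in dst.
// Both steps apply the requested combine mode instead of always overwriting.
void copyAndPermuteSketch(const std::vector<double>& src,
                          std::vector<double>& dst,
                          std::size_t numSameIDs,
                          const std::vector<std::size_t>& permuteToLIDs,
                          const std::vector<std::size_t>& permuteFromLIDs,
                          CombineModeSketch mode) {
  assert(permuteToLIDs.size() == permuteFromLIDs.size());
  assert(numSameIDs <= src.size() && numSameIDs <= dst.size());
  for (std::size_t i = 0; i < numSameIDs; ++i) {
    dst[i] = combine(dst[i], src[i], mode);
  }
  for (std::size_t k = 0; k < permuteToLIDs.size(); ++k) {
    const std::size_t to = permuteToLIDs[k];
    const std::size_t from = permuteFromLIDs[k];
    dst[to] = combine(dst[to], src[from], mode);
  }
}
```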
Tpetra-backlog

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/354
Selectively use the pImpl idiom to reduce Trilinos build and rebuild times
2016-09-19T17:50:53Z
James Willenbring
*Created by: bartlettroscoe*
The selective usage of the [pImpl idiom](http://c2.com/cgi/wiki?PimplIdiom) can dramatically reduce build times for C++. We could use this in any class where inline functions that see the implementation are not needed. The overhead for most classes/objects is very low.
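A minimal sketch of the idiom, using a hypothetical `Widget` class (not a Trilinos class) and C++11 `std::unique_ptr`. The first part is what would go in the public header, and the second part is what would go in the `.cpp` file. Client code that includes only the header does not need to be recompiled when `Impl` changes, which is where the build-time savings come from.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- Would live in the public header (hypothetical Widget.hpp) ----
class Widget {
public:
  explicit Widget(const std::string& name);
  ~Widget();  // declared here, defined where Impl is a complete type
  Widget(Widget&& other) noexcept;
  Widget& operator=(Widget&& other) noexcept;

  const std::string& name() const;
  void rename(const std::string& newName);

private:
  struct Impl;                  // forward declaration only; clients never see the members
  std::unique_ptr<Impl> impl_;  // the one extra pointer indirection is the main runtime cost
};

// ---- Would live in the implementation file (hypothetical Widget.cpp) ----
struct Widget::Impl {
  std::string name;
};

Widget::Widget(const std::string& name) : impl_(new Impl{name}) {}
Widget::~Widget() = default;
Widget::Widget(Widget&& other) noexcept = default;
Widget& Widget::operator=(Widget&& other) noexcept = default;

const std::string& Widget::name() const { return impl_->name; }
void Widget::rename(const std::string& newName) { impl_->name = newName; }
```

The destructor and move operations are defined after `Impl` is complete because `std::unique_ptr` needs the complete type at the point where it deletes the object.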
CC: @mhoemmen
Reduce build times for Trilinos

https://gitlab.osti.gov/jmwille/Trilinos/-/issues/421
Add support for C++14, C++17, etc.
2016-06-07T16:19:15Z
James Willenbring
*Created by: bartlettroscoe*
**CC:** @hcedwar, @etphipp, @maherou
**Relates To:** TriBITSPub/TriBITS#127
**Description:**
This Story is to add support for C++14 and C++17 (and support future C++ standards). Interest for doing this was expressed at a recent Trilinos customer meeting at Sandia labs.
What interested Trilinos developers need to provide are:
1. Trial programs for C++14 and C++17 that should only work if that given C++ standard is supported
2. The list of compiler options to try on various compilers to try to turn on C++14 and C++17
The details for doing this are given in TriBITSPub/TriBITS#127. Detailed discussion should take place in that TriBITS Issue ticket, not in this Trilinos Issue ticket.
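As an illustration of what a trial program for item 1 might look like, here is a small C++14 probe. The choice of features (return type deduction, a generic lambda, and a digit separator) and the names `addOne` and `trialCxx14` are assumptions made for this sketch. The actual trial programs are to be settled in TriBITSPub/TriBITS#127.

```cpp
#include <cassert>

// Fail early with a clear message if the compiler is not in C++14 mode.
#if __cplusplus < 201402L
#error "C++14 support is required for this trial program"
#endif

// Function return type deduction (C++14).
auto addOne(int x) { return x + 1; }

// Generic lambda and digit separator (both C++14).
inline long trialCxx14() {
  auto twice = [](auto v) { return v + v; };
  return twice(1'000L) + addOne(0);
}
```

A C++17 probe could follow the same pattern with a C++17-only feature such as structured bindings or `if constexpr`. To use either as a configure-time check, the function would be called from a `main` and the build system would test whether the file compiles with each candidate compiler option.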