Make Teuchos Memory Management Classes thread-safe
Created by: bartlettroscoe
This story is to address the long-standing problem that the Teuchos Memory management Classes which use reference-counting are not thread safe.
CC: @MicheldeMessieres, @jwillenbring,
Next Action Status: See tasks ...
Tasks:
- Initial development and testing for multi-thread correctness [Done]
-
Add configure time switch for thread safety: Define configure-time options
Trilinos_ENABLE_THREAD_SAFE
andTeuchos_ENABLE_THREAD_SAFE
(latter is given the given the default of the former value). -
Turn off for Trilinos_ENABLE_CXX11=OFF: That is, set
Teuchos_ENABLE_THREAD_SAFE=OFF
in this case. Run full Trilinos test suite with-DTrilinos_ENABLE_CXX11=OFF
. -
Update the Teuchos test suite:
-
Inform CTest of number of threads for thread-safe tests: Figure this out at configure time and then set
NUM_TOTAL_CORES_USED
(see [TRIBITS_ADD_TEST())(https://tribits.org/doc/TribitsDevelopersGuide.html#formal-arguments-tribits-add-test)) -
Make pre-push
BASIC
test suite fast: Make the longer running threading testsNIGHTLY
.
-
Inform CTest of number of threads for thread-safe tests: Figure this out at configure time and then set
-
Performance testing:
- For builds:
-
-DCMAKE_BUILD_TYPE=RELEASE -DTrilinos_ENABLE_DEBUG=ON
(Trilinos_ENABLE_THREAD_SAFE
on and off) -
-DCMAKE_BUILD_TYPE=RELEASE -DTrilinos_ENABLE_DEBUG=OFF
(Trilinos_ENABLE_THREAD_SAFE
on and off)
-
- For compilers:
- GCC version 4.8.x .
- Intel version 15.x
- Clang X
- Run Trilinos (nearly full) test suite with and without thread-safety turned on.
- Run Nalu, Albany, and Drekar test suites with thread safety on and off and see the performance impact with debug-mode checking turned on.
- Request report from Cedric about usage and performance.
- If performance okay, continue. Otherwise, decide what to do.
- For builds:
-
Disallow throwing exceptions from destructors: We just need to disallow exceptions and make Teuchos MM classes abort in destructors when errors occur. Update unit tests for the case of circular references and exceptions. Need to provide
TEUCHOS_ABORT_IF(<condition>)
that will print and then call abort. -
Merge into develop branch with Trilinos_ENABLE_THREAD_SAFE=OFF by default:
- Update teuchos/ReleaseNotes.txt to discuss exception destructor difference.
- Announce time schedule for turning this on by default.
-
Update documentation / Code review:
- Update unit test documentation: With final tests in place, will create a uniformly formatted summary for each in code to describe it’s purpose.
- Update RCP documentation: Need to update RCP documents to reflect these changes
- Ross reviews code, tests, and updated documentation.
-
Turn on Trilinos_ENABLE_THREAD_SAFE=OFF by default:
- Update teuchos/ReleaseNotes.txt
- Send out announcement
-
Other considerations and improvements: (move to new stories?)
- Review Array.h mutex implementation: This was new code I added after our last review to make Array.h thread safe - I have implemented suggested tests we discussed on Github.
- Discuss plan for debug detection of dangling weak ptr. Debug builds have checks to validate weak ptrs but those checks can fail if another thread kills the data. I’ve got tests in place which detect and demonstrate this issue but need to discuss further how we would like to address this.
- Consider additional changes for ArrayView, ArrayRCP, Tuple, Ptr: Implemented fairly limited sanity checks on these.
- Weak to strong conversion: Have code in place which implements thread safe upgrade of a weak ptr to a strong ptr, along with a unit test, but the role of this is unclear at the moment.
- Make tests have inverted case: Tests should demonstrate they can detect thread problems when the fix is not applied - the inverted case. I’ve got some #defines set up to do this but wanted to discuss how to best organize those. Many of the inverted tests will need separate main functions.