Tpetra: Fix BlockCrs unit test on CUDA.
Created by: kyungjoo-kim
Description
The new comm interface in Tpetra::BlockCrs fails unit tests on CUDA architectures. This PR addresses the problem. With DualView, it is very tricky to track where the data was last modified or synced.
This is especially true when DualViews are passed through a virtual interface, where the derived classes may behave polymorphically (some work on the host side, some on the device side). In that case, the input DualView should be synced beforehand so that both the device and host buffers are available to the function.
/// The callee can safely assume that its inputs are available on both host and device.
/// The callee is responsible for setting the modify flags of its output views.
/// - When this function is virtual, some overrides may modify the host side and others the device side.
void CalleeFunction(const DualView<value_type*>& in_a,
                    DualView<value_type*>& out_b,
                    ArrayView<value_type>& out_c) {
  // The callee does not sync its inputs; it only checks that the data is available.
  assert(!in_a.need_sync_device());
  // This function can choose between a host algorithm and a device algorithm, e.g.:
  out_b.modify_device();
  parallel_for(device_range_policy, [=]() { doSomethingDevice(in_a.view_device(), out_b.view_device()); });
  // The input can also be used by a host algorithm.
  assert(!in_a.need_sync_host());
  parallel_for(host_range_policy, [&]() { doSomethingHost(in_a.view_host(), out_c); });
}
void CallerFunction() {
  DualView<value_type*> in_a;
  DualView<value_type*> out_b;
  DualView<value_type*> out_c;
  // Make sure in_a is synced on both host and device.
  in_a.sync_host(); in_a.sync_device();
  // out_b is an output; its modify flag is set inside the callee.
  // out_c is passed as a host view extracted from the DualView, so its modify flag must be set by the caller.
  auto out_c_av = getArrayViewFromHost(out_c);
  out_c.modify_host();
  CalleeFunction(in_a, out_b, out_c_av);
}
Perhaps everyone who uses DualViews already follows the above practice, but it was new to me.
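For readers less familiar with the flag bookkeeping this practice relies on, here is a minimal sketch in plain C++. It is a toy model only: `ToyDualView`, `scaleOnDevice`, and `callerExample` are hypothetical names, the "device" buffer is just a second `std::vector`, and the counters only approximate how `Kokkos::DualView` tracks modifications (for instance, the real class reports an error when both sides are modified without a sync in between, which this model does not).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a dual view: two buffers plus one modify counter per side.
// Hypothetical type for illustration; this is not the Kokkos implementation.
template <typename T>
class ToyDualView {
 public:
  explicit ToyDualView(std::size_t n) : host_(n), device_(n) {}

  std::vector<T>& view_host() { return host_; }
  std::vector<T>& view_device() { return device_; }
  const std::vector<T>& view_host() const { return host_; }
  const std::vector<T>& view_device() const { return device_; }

  // Mark one side as holding the most recent data.
  void modify_host() { mod_host_ = std::max(mod_host_, mod_device_) + 1; }
  void modify_device() { mod_device_ = std::max(mod_host_, mod_device_) + 1; }

  // A side needs a sync when the other side was modified more recently.
  bool need_sync_host() const { return mod_device_ > mod_host_; }
  bool need_sync_device() const { return mod_host_ > mod_device_; }

  // Copy from the newer side if needed, then clear both counters.
  void sync_host() {
    if (need_sync_host()) { host_ = device_; mod_host_ = mod_device_ = 0; }
  }
  void sync_device() {
    if (need_sync_device()) { device_ = host_; mod_host_ = mod_device_ = 0; }
  }

 private:
  std::vector<T> host_, device_;
  unsigned mod_host_ = 0, mod_device_ = 0;
};

// Callee: checks (but does not perform) the sync of its input,
// and sets the modify flag of its own output.
void scaleOnDevice(const ToyDualView<double>& in, ToyDualView<double>& out) {
  assert(!in.need_sync_device());
  out.modify_device();
  for (std::size_t i = 0; i < in.view_device().size(); ++i)
    out.view_device()[i] = 2.0 * in.view_device()[i];
}

// Caller: fills the input on the host, syncs it, calls the callee,
// then syncs the output back before reading it on the host.
std::vector<double> callerExample() {
  ToyDualView<double> in(3), out(3);
  in.view_host() = {1.0, 2.0, 3.0};
  in.modify_host();
  in.sync_device();  // after this, both sides of `in` hold the same data
  scaleOnDevice(in, out);
  out.sync_host();
  return out.view_host();
}
```

In this sketch, forgetting `in.sync_device()` in the caller would trip the assertion in the callee, which mirrors the kind of check the callee in the description above performs.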
Related Issues
#4257 (closed) #4162
How Has This Been Tested?
Following the instructions in #4257 (closed), I reproduced the error and confirmed that all unit tests pass with this PR.