Tpetra: Fix BlockCrs unit test on CUDA.

James Willenbring requested to merge kyungjoo-kim:tpetra-develop into develop

Created by: kyungjoo-kim

Description

The new comm interface in Tpetra::BlockCrs fails unit tests on CUDA architectures. This PR addresses the problem. With DualView, it is very tricky to track where the data is modified and where it is synced.

This is especially true when dual views are passed through a virtual interface, because the behavior is then polymorphic: one derived class may work on the host and another on the device. In such a case, the input dual view should be synced before the call, so that both the device and host buffers are valid inside the function.

/// The callee can safely assume that its inputs are valid on both host and device.
/// The callee is responsible for setting the modify flags of its output views.
///   - When the function is virtual, one override may modify the host side and
///     another the device side, so only the callee knows which flag to set.
void CalleeFunction(const DualView<value_type*>& in_a,
                    DualView<value_type*>& out_b,
                    ArrayView<value_type>& out_c) {
  // The callee does not sync its inputs; it only checks that the data is valid.
  assert(!in_a.need_sync_device());
  // It may choose a device algorithm, e.g.:
  auto a_d = in_a.view_device();
  auto b_d = out_b.view_device();
  out_b.modify_device();
  Kokkos::parallel_for(device_range_policy, KOKKOS_LAMBDA(const int i) {
    doSomethingDevice(a_d, b_d, i);
  });
  // The same input can also be used by a host algorithm.
  assert(!in_a.need_sync_host());
  auto a_h = in_a.view_host();
  Kokkos::parallel_for(host_range_policy, [=](const int i) {
    doSomethingHost(a_h, out_c, i);
  });
}

void CallerFunction() {
  DualView<value_type*> in_a;
  DualView<value_type*> out_b;
  DualView<value_type*> out_c;
  // Make sure in_a is synced on both host and device.
  in_a.sync_host();
  in_a.sync_device();
  // out_b is an output DualView; the callee adjusts its modify flag.
  // out_c is passed only as a host ArrayView extracted from the DualView, so the
  // callee cannot see its flags; the caller marks the host side as modified.
  auto out_c_av = getArrayViewFromHost(out_c);
  out_c.modify_host();
  CalleeFunction(in_a, out_b, out_c_av);
}

Perhaps everyone who uses DualView already follows this practice, but it was new to me.
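To make the convention concrete without a GPU, here is a small host-only sketch. It assumes a hypothetical `ToyDualView` that models only DualView's modify/sync flags; it is not the Kokkos API and copies nothing to a device. `Scale`, `ScaleOnHost`, `ScaleOnDevice`, `ScaleOnDeviceForgotFlag`, and `caller` are likewise illustrative names. The point it shows is why the callee, not the caller, must set the output flag: behind a virtual call, the caller cannot know which side an override wrote.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for Kokkos::DualView: two copies of the
// data ("host" and "device") plus modify flags. Only the flag logic is modelled.
struct ToyDualView {
  std::vector<double> host, device;
  bool host_modified = false;
  bool device_modified = false;

  explicit ToyDualView(std::size_t n) : host(n, 0.0), device(n, 0.0) {}

  // Marking one side as modified makes the other side stale.
  void modify_host() { host_modified = true; device_modified = false; }
  void modify_device() { device_modified = true; host_modified = false; }
  bool need_sync_host() const { return device_modified; }
  bool need_sync_device() const { return host_modified; }
  // A sync copies data only if the other side is flagged as modified.
  void sync_host() {
    if (device_modified) { host = device; device_modified = false; }
  }
  void sync_device() {
    if (host_modified) { device = host; host_modified = false; }
  }
};

// Virtual interface: each override picks host or device work and is
// responsible for setting the output's modify flag.
struct Scale {
  virtual ~Scale() = default;
  virtual void apply(const ToyDualView& in, ToyDualView& out) const = 0;
};

struct ScaleOnHost : Scale {
  void apply(const ToyDualView& in, ToyDualView& out) const override {
    assert(!in.need_sync_host());  // check the input, never sync it here
    out.modify_host();
    for (std::size_t i = 0; i < in.host.size(); ++i) out.host[i] = 2.0 * in.host[i];
  }
};

struct ScaleOnDevice : Scale {
  void apply(const ToyDualView& in, ToyDualView& out) const override {
    assert(!in.need_sync_device());
    out.modify_device();
    for (std::size_t i = 0; i < in.device.size(); ++i) out.device[i] = 2.0 * in.device[i];
  }
};

// Buggy override: writes the device copy but never calls modify_device().
struct ScaleOnDeviceForgotFlag : Scale {
  void apply(const ToyDualView& in, ToyDualView& out) const override {
    for (std::size_t i = 0; i < in.device.size(); ++i) out.device[i] = 2.0 * in.device[i];
  }
};

// Caller: sync the input on both sides, call through the base class, then
// sync the output to where it is needed next (here, the host).
double caller(const Scale& op) {
  ToyDualView in(3), out(3);
  in.modify_host();
  in.host = {1.0, 2.0, 3.0};
  in.sync_host();
  in.sync_device();  // input is now valid on both sides
  op.apply(in, out);
  out.sync_host();   // correct only if the callee set the right flag
  return out.host[2];
}
```

With the flags set by the callee, `caller(ScaleOnHost{})` and `caller(ScaleOnDevice{})` both return 6.0. With the forgotten flag, `sync_host()` copies nothing and `caller(ScaleOnDeviceForgotFlag{})` returns the stale 0.0, which is the kind of silent failure that only shows up when host and device memory are actually separate, as on CUDA.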

Related Issues

#4257 (closed) #4162

How Has This Been Tested?

Following the instructions described in #4257, I reproduced the error and confirmed that all unit tests pass with this PR.
