Belos::PseudoBlockCGSolMgr (CG) does >= 3 all-reduces per iteration
Created by: mhoemmen
@trilinos/belos @hkthorn @amklinv @rstumin @jhux2
Belos' implementation of CG, PseudoBlockCGSolMgr, does at least 3 all-reduces per iteration. The algorithm itself, implemented in Belos::PseudoBlockCGIter::iterate, does only 2. The third comes from the status test: StatusTestGenResNorm calls MvNorm, which is an extra all-reduce, even when it is set to Implicit mode (see lines 497-506 of BelosStatusTestGenResNorm.hpp). It does so because PseudoBlockCGIter::getNativeResiduals returns a nonnull residual vector (line 190 of BelosPseudoBlockCGIter.hpp), so the status test computes that vector's norm itself. This need not happen, because CG, like GMRES, already computes an implicit estimate of the residual norm. (Belos::PseudoBlockGmresIter::getNativeResiduals correctly returns null; see line 521 of BelosPseudoBlockGmresIter.hpp.)
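To make the count concrete, here is a minimal serial sketch. It is not Belos code: runCG, ReduceCounter, applyLaplacian, and the explicitNormCheck flag are all hypothetical names made up for illustration, and it assumes unpreconditioned CG with a zero initial guess. Each call to dot stands in for one all-reduce in a distributed run. The point is that sqrt(r^T r) is already available from the second dot product, so a status test that recomputes the norm adds a third, redundant reduction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical counter: each dot product would be one all-reduce in parallel.
struct ReduceCounter { int count = 0; };

double dot(const Vec& x, const Vec& y, ReduceCounter& rc) {
  assert(x.size() == y.size());
  ++rc.count;
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
  return s;
}

// y = A*x for the 1-D Laplacian tridiag(-1, 2, -1). No global reduction here.
void applyLaplacian(const Vec& x, Vec& y) {
  const std::size_t n = x.size();
  for (std::size_t i = 0; i < n; ++i) {
    double v = 2.0 * x[i];
    if (i > 0) v -= x[i - 1];
    if (i + 1 < n) v -= x[i + 1];
    y[i] = v;
  }
}

struct CGResult {
  int iterations;      // CG iterations performed
  int reductions;      // reductions counted inside the iteration loop only
  double residualNorm; // last residual norm seen by the convergence check
};

// Unpreconditioned CG with x0 = 0. If explicitNormCheck is true, the
// convergence check recomputes ||r|| with its own reduction, mimicking a
// status test that ignores the norm CG already has.
CGResult runCG(const Vec& b, double tol, int maxIters, bool explicitNormCheck) {
  const std::size_t n = b.size();
  Vec x(n, 0.0), r = b, p = b, Ap(n, 0.0);
  ReduceCounter setup, loop;
  double rr = dot(r, r, setup);  // one-time setup, not counted per iteration
  double resNorm = std::sqrt(rr);
  int it = 0;
  while (it < maxIters && resNorm > tol) {
    applyLaplacian(p, Ap);
    const double pAp = dot(p, Ap, loop);    // reduction 1
    const double alpha = rr / pAp;
    for (std::size_t i = 0; i < n; ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
    }
    const double rrNew = dot(r, r, loop);   // reduction 2: gives ||r||^2 for free
    resNorm = explicitNormCheck
                  ? std::sqrt(dot(r, r, loop))  // reduction 3: redundant
                  : std::sqrt(rrNew);           // implicit norm, no extra reduction
    const double beta = rrNew / rr;
    for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
    rr = rrNew;
    ++it;
  }
  return CGResult{it, loop.count, resNorm};
}
```

Calling runCG(Vec(16, 1.0), 1e-10, 100, false) and then the same with true should give identical iteration counts, with 2 reductions per iteration in the first case and 3 in the second. That mirrors the difference between a status test that trusts the solver's implicit norm and one that calls MvNorm on the returned residual.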
I'm cc'ing @rstumin and @jhux2 because they said tonight that they had noticed this performance issue before.