unexpected (and silent) sumIntoGlobalValues failure
Created by: nschloe
@trilinos/tpetra
I have a larger application code in which I build up a matrix by looping over the edges of a mesh and sum values into the matrix for each of the two end points. I found that some indices aren't correctly inserted when running on more than one processes.
I could boil it down to
#include <Teuchos_DefaultComm.hpp>
#include <Tpetra_CrsMatrix.hpp>
int main ( int argc, char *argv[] )
{
Teuchos::GlobalMPISession session(&argc, &argv, NULL);
// Initialize MPI
auto comm = Teuchos::DefaultComm<int>::getComm();
// build map in which process K owns row K
std::vector<int> ids = {comm->getRank()};
auto map = Teuchos::rcp(new Tpetra::Map<int>(-1, ids, 0, comm));
const int globalSize = map->getGlobalNumElements();
// insert indices from all processes
auto graph = Teuchos::rcp(new Tpetra::CrsGraph<int, int>(map, map, 0));
for (int i = 0; i < globalSize; i++) {
for (int j = 0; j < globalSize; j++) {
graph->insertGlobalIndices(i, Teuchos::tuple(j));
}
}
graph->fillComplete();
// fill matrix
auto A = Teuchos::rcp(new Tpetra::CrsMatrix<double,int,int>(graph));
A->setAllToScalar(0.0);
for (size_t k = 0; k < globalSize; k++) {
const int num = A->sumIntoGlobalValues(
k,
Teuchos::tuple(0, 1),
Teuchos::tuple(0.33, 1.55)
);
std::cout << "added from @" << comm->getRank() << " in row " << k << ": "
<< num
<< std::endl;
}
A->fillComplete();
return EXIT_SUCCESS;
}
When run on two procs, I'm getting
$ mpiexec -n 2 ./test
added from @0 in row 0: 1
added from @0 in row 1: 2
added from @1 in row 0: 2
added from @1 in row 1: 1
I would have expected that two entries are inserted for each sumIntoGlobalValues
. The insertGlobalIndices
I thought makes sure that the corresponding "slots" are present in the respective processes. Apparently, that's not the case.
When checking out the actual values after A->fillComplete()
with
Teuchos::Array<int> cols(100);
Teuchos::Array<double> vals(100);
size_t numss;
for (size_t k = 0; k < globalSize; k++) {
A->getGlobalRowCopy(k, cols, vals, numss);
std::cout << "@" << comm->getRank()
<< ", row " << k
<< ", num columns " << numss
<< ", columns ";
for (size_t kk = 0; kk < numss; kk++) {
std::cout << cols[kk] << " ";
}
std::cout << ", values ";
for (size_t kk = 0; kk < numss; kk++) {
std::cout << vals[kk] << " ";
}
std::cout << std::endl;
}
I'm getting
@0, row 0, num columns 2, columns -1 0 , values 0 0.66
@0, row 1, num columns 0, columns , values
@1, row 0, num columns 0, columns , values
@1, row 1, num columns 2, columns -1 1 , values 0 3.1
On a side node, is there a way to make Tpetra throw when this happens? Having to check the return value to assert correct operation seems a little Epetry.