Teuchos: Teuchos::send/receive not handling a large message
Created by: seheracer
Bug Report
@trilinos/teuchos
Description
Teuchos::send and Teuchos::receive fail when communicating a very large message (400 million long long values). When they are replaced by MPI_Send and MPI_Recv, the same message is communicated successfully, although with a warning.
Steps to Reproduce
Code to reproduce the bug (run with 2 MPI ranks):
#include <iostream>
#include <mpi.h>
#include <Teuchos_DefaultMpiComm.hpp>
#include <Teuchos_CommHelpers.hpp>

int main (int argc, char *argv[])
{
  typedef long long count_type;
  typedef long long packet_type;

  MPI_Init(&argc, &argv);
  Teuchos::MpiComm<count_type> comm (MPI_COMM_WORLD);

  // 400 million elements; the MPI_Recv output below reports this as 3200000000 bytes.
  count_type length = 400000000;

  if(comm.getRank() == 0) {
    // Rank 0 sends the array to rank 1.
    packet_type val = -1;
    Teuchos::ArrayRCP<packet_type> array_to_send(length, val);
    Teuchos::send<count_type, packet_type>(comm, length, array_to_send.getRawPtr(), 1);
    // Raw MPI alternative:
    // MPI_Send(array_to_send.getRawPtr(), length, MPI_LONG_LONG, 1, 0, MPI_COMM_WORLD);
  }
  else {
    // Rank 1 receives the array from rank 0.
    Teuchos::ArrayRCP<packet_type> array_to_recv(length);
    Teuchos::receive<count_type, packet_type>(comm, 0, length, array_to_recv.getRawPtr());
    // Raw MPI alternative (_cancelled and _ucount are Open MPI-specific fields):
    // MPI_Status status;
    // int result = MPI_Recv(array_to_recv.getRawPtr(), length, MPI_LONG_LONG, 0, 0, MPI_COMM_WORLD, &status);
    // if(result == MPI_SUCCESS)
    //   std::cout << "Successfully received!" << std::endl
    //             << "MPI_SOURCE: " << status.MPI_SOURCE << std::endl
    //             << "MPI_TAG: " << status.MPI_TAG << std::endl
    //             << "MPI_ERROR: " << status.MPI_ERROR << std::endl
    //             << "_cancelled: " << status._cancelled << std::endl
    //             << "_ucount: " << status._ucount << std::endl;
  }

  MPI_Finalize();
  return 0;
}
The output:
[blake:192730] *** An error occurred in MPI_Send
[blake:192730] *** reported by process [1952841729,0]
[blake:192730] *** on communicator MPI_COMM_WORLD
[blake:192730] *** MPI_ERR_COUNT: invalid count argument
[blake:192730] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[blake:192730] *** and potentially your MPI job)
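One possible reading of MPI_ERR_COUNT: the count argument of MPI_Send is an int. The element count used here (400000000) fits in an int, but the same message expressed in bytes (3200000000, the size that the raw MPI_Recv run in this report prints) does not. The sketch below only checks that arithmetic. That Teuchos::send ends up passing a byte count for this instantiation is an assumption, not something confirmed in this report, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: true if `count` can be passed as the `int count`
// argument of MPI_Send / MPI_Recv without overflowing.
constexpr bool fits_in_mpi_count(std::int64_t count) {
    return count >= 0 && count <= INT_MAX;
}

// Hypothetical helper: size of a message in bytes.
constexpr std::int64_t byte_count(std::int64_t num_elements,
                                  std::int64_t element_size) {
    return num_elements * element_size;
}

// Values taken from the reproducer: 400 million long long elements.
constexpr std::int64_t kLength = 400000000;
constexpr std::int64_t kBytes = byte_count(kLength, sizeof(long long));

// As an element count, the message is a valid MPI count ...
static_assert(fits_in_mpi_count(kLength), "element count fits in int");
// ... but as a byte count it exceeds INT_MAX (2147483647 with a 32-bit int).
static_assert(!fits_in_mpi_count(kBytes), "byte count does not fit in int");
```

If the assumption holds, it would explain why the raw MPI_Send call with MPI_LONG_LONG and the element count is accepted while the Teuchos path is rejected.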
When the Teuchos::send/Teuchos::receive calls are replaced by MPI_Send/MPI_Recv (see the commented-out lines in the code), the output is:
[blake:192800] Read 2147479552, expected 3200000000, errno = 2
Successfully received!
MPI_SOURCE: 0
MPI_TAG: 0
MPI_ERROR: 0
_cancelled: 0
_ucount: 3200000000
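A note on the numbers in the warning: 2147479552 is INT_MAX rounded down to a 4096-byte boundary (0x7ffff000), which is the most that a single Linux read(2)-style system call transfers. Whether that cap is what produced the short read here is an assumption. The sketch below only verifies the arithmetic, with a 4096-byte page size assumed and hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Assumed page size; 4096 bytes is typical on x86-64 Linux.
constexpr std::int64_t kPageSize = 4096;

// Hypothetical helper: round n down to a multiple of the page size.
constexpr std::int64_t round_down_to_page(std::int64_t n) {
    return n & ~(kPageSize - 1);
}

// Numbers copied from the warning line in the output above.
constexpr std::int64_t kRead = 2147479552;
constexpr std::int64_t kExpected = 3200000000;

// The amount read equals INT_MAX rounded down to a page boundary.
static_assert(round_down_to_page(INT_MAX) == kRead, "matches 0x7ffff000");

// Bytes that the first read did not cover.
constexpr std::int64_t kMissing = kExpected - kRead;
static_assert(kMissing == 1052520448, "remaining bytes after the first read");
```

The _ucount field still reports the full 3200000000 bytes, consistent with the message having been received in full despite the warning.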
Notes
mpicc: icc (ICC) 18.0.1 20171018
mpirun: mpirun (Open MPI) 2.1.2
Open MPI issue about the warning printed when MPI_Send/MPI_Recv is used: https://github.com/open-mpi/ompi/issues/4829.