MueLu: Multithreaded performance of MueLu and Ifpack2 SGS
Created by: mhoemmen
@trilinos/ifpack2 @trilinos/muelu
[Edited to include contributors, who agreed to be public]
I just got the following e-mail from @wppowers and @eamarttila:
Hi Mark,
I have been profiling (using Intel's VTune) simulations that use the experimental multi-threaded Gauss-Seidel smoother, and comparing the results to the same simulation using only MPI. A run using 1 MPI process and 4 OpenMP threads is roughly 20% slower than solving the same problem with 4 MPI processes and 1 OpenMP thread per process.
- Would you expect this type of a performance gap between multi-threaded and multi-process runs?
I have attached a small sample program that illustrates our typical usage of Trilinos and also exhibits the performance issue. I can also send the inputs we are using if that would help answer the question.
I also noticed during profiling that the computation of the preconditioner does not appear to be threaded.
- Is this true?
- If so, are there plans to thread the preconditioner computation?
VTune also highlighted a load imbalance during the multi-threaded runs. I was curious whether telling OpenMP to use dynamic scheduling would help with the load imbalance, but I don't know how to accomplish that through Trilinos. In fact, the profiling results suggest that some of the loops use dynamic scheduling and others use static scheduling.
- Is there a way to tell Trilinos to default to a specific type of thread scheduling, particularly from higher-level interfaces such as MueLu and Tpetra?
Thanks for your assistance!
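For context on the scheduling question, here is a minimal, generic OpenMP sketch. It is not Trilinos code, and it does not show how Tpetra, MueLu, or Kokkos choose schedules for their own loops. It runs one deliberately imbalanced loop, where row i costs about i units of work (roughly like a matrix whose rows have very different nonzero counts), under `schedule(static)`, `schedule(dynamic, 64)`, and `schedule(runtime)`. Only loops written with `schedule(runtime)` read the `OMP_SCHEDULE` environment variable. The function names (`row_work`, `sum_static`, and so on) and the chunk size 64 are hypothetical choices made for illustration.

```cpp
#include <cmath>

// Hypothetical imbalanced workload: row i performs i units of work.
double row_work(int i) {
  double s = 0.0;
  for (int k = 0; k < i; ++k) s += std::sin(static_cast<double>(k));
  return s;
}

// Static schedule: iterations are split into equal contiguous blocks up front,
// so the thread that owns the expensive high-i rows finishes last.
double sum_static(int n) {
  double total = 0.0;
#pragma omp parallel for schedule(static) reduction(+ : total)
  for (int i = 0; i < n; ++i) total += row_work(i);
  return total;
}

// Dynamic schedule: threads grab chunks of 64 iterations as they become idle,
// which evens out imbalance at the cost of some scheduling overhead.
double sum_dynamic(int n) {
  double total = 0.0;
#pragma omp parallel for schedule(dynamic, 64) reduction(+ : total)
  for (int i = 0; i < n; ++i) total += row_work(i);
  return total;
}

// Runtime schedule: the kind and chunk size come from OMP_SCHEDULE
// (e.g. OMP_SCHEDULE="dynamic,64"), or the implementation default if unset.
double sum_runtime_schedule(int n) {
  double total = 0.0;
#pragma omp parallel for schedule(runtime) reduction(+ : total)
  for (int i = 0; i < n; ++i) total += row_work(i);
  return total;
}
```

Compile with `-fopenmp` (without it, the pragmas are ignored and the loops run serially). You can then compare timings with, for example, `OMP_NUM_THREADS=4 OMP_SCHEDULE=dynamic,64`. All three variants compute the same sum up to floating-point reduction order. The schedule changes only which thread runs which iterations.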
I got their permission to post this e-mail and the sample code they wrote. Attached is a .tgz archive containing an example .cpp file, a Makefile, and a project file for an IDE: multi-threaded-belos-solve.tgz.txt
One issue with the example code is that it uses the new multithreaded Gauss-Seidel as a smoother through MueLu. I'm not sure whether MueLu has plugged in that option yet, which is why I'm CC'ing the MueLu developers.
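To make the threading question concrete, here is a sketch of one common way to multithread symmetric Gauss-Seidel (SGS): multicolor ordering. This is an assumption for illustration, not a description of Ifpack2's actual implementation. Rows are colored so that no two rows of the same color share a nonzero. All rows of one color can then be relaxed in parallel, with a forward sweep over the colors followed by a backward sweep. The result is SGS in a permuted (color-by-color) row order, so convergence can differ from natural-order SGS. The `CsrMatrix` type, `greedy_color`, `colored_sgs`, and `laplacian_1d` are hypothetical helpers. The coloring assumes a structurally symmetric matrix.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Minimal CSR matrix (hypothetical helper type for this sketch).
struct CsrMatrix {
  int n;
  std::vector<int> row_ptr, col;
  std::vector<double> val;
};

// Greedy distance-1 coloring; assumes a structurally symmetric matrix.
std::vector<int> greedy_color(const CsrMatrix& A) {
  std::vector<int> color(A.n, -1), mark;  // mark[c] == i: color c taken near row i
  for (int i = 0; i < A.n; ++i) {
    for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      int j = A.col[k];
      if (j == i || color[j] < 0) continue;
      if (color[j] >= static_cast<int>(mark.size())) mark.resize(color[j] + 1, -1);
      mark[color[j]] = i;
    }
    int c = 0;
    while (c < static_cast<int>(mark.size()) && mark[c] == i) ++c;
    color[i] = c;  // smallest color not used by an already-colored neighbor
  }
  return color;
}

// One SGS sweep in multicolor order: colors 0..C-1, then C-1..0.
void colored_sgs(const CsrMatrix& A, const std::vector<int>& color,
                 const std::vector<double>& b, std::vector<double>& x) {
  int num_colors = 0;
  for (int c : color) num_colors = std::max(num_colors, c + 1);
  std::vector<std::vector<int>> rows_of(num_colors);
  for (int i = 0; i < A.n; ++i) rows_of[color[i]].push_back(i);

  auto relax = [&](const std::vector<int>& rows) {
    // Same-color rows never read each other's x entries, so no data race.
#pragma omp parallel for schedule(static)
    for (std::size_t r = 0; r < rows.size(); ++r) {
      int i = rows[r];
      double diag = 0.0, sum = b[i];
      for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
        if (A.col[k] == i) diag = A.val[k];
        else sum -= A.val[k] * x[A.col[k]];
      }
      x[i] = sum / diag;
    }
  };
  for (int c = 0; c < num_colors; ++c) relax(rows_of[c]);       // forward
  for (int c = num_colors - 1; c >= 0; --c) relax(rows_of[c]);  // backward
}

// Usage helper: 1D Laplacian tridiag(-1, 2, -1), which colors red-black.
CsrMatrix laplacian_1d(int n) {
  CsrMatrix A{n, {0}, {}, {}};
  for (int i = 0; i < n; ++i) {
    if (i > 0) { A.col.push_back(i - 1); A.val.push_back(-1.0); }
    A.col.push_back(i); A.val.push_back(2.0);
    if (i + 1 < n) { A.col.push_back(i + 1); A.val.push_back(-1.0); }
    A.row_ptr.push_back(static_cast<int>(A.col.size()));
  }
  return A;
}
```

The parallelism available per color is only the number of rows of that color. A threaded sweep therefore also pays a synchronization at each color boundary, which an MPI-only run with one sequential sweep per process does not. That is one plausible contributor to a gap like the one reported above, but this sketch does not establish it.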