Trilinos issue #1622
Issue created Aug 18, 2017 by James Willenbring (@jmwille, Owner)

KokkosKernels: Gauss-Seidel threaded setup performance issues with Ifpack2 and MueLu

Created by: pwxy

I am trying to use the KokkosKernels threaded Gauss-Seidel. I call it through Ifpack2 ("MT Gauss-Seidel") as a smoother for MueLu, so the problem could be a bad interaction between KokkosKernels and Ifpack2 or MueLu.

I'm running drekar on a single KNL node of mutrino with 1 MPI process, increasing the number of OMP threads from 1 to 64 (1 OMP thread per core):

setup smoother (ifpack2 "MT Gauss-Seidel")

OMP threads   solve time (s)   GS setup time (s)
          1            33.27              493.10
          2            24.67              286.50
          4            12.26              157.80
          8             6.97               79.82
         16             3.97               36.61
         32             3.50               24.06
         64             3.16               16.01
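
To make the scaling in the "MT Gauss-Seidel" table easier to read, here is a small sketch that turns the reported timings into speedups relative to the 1-thread run. The numbers are copied from the table; the helper name `scaling` and the definition of efficiency as speedup divided by thread count are my own choices for illustration.

```python
# Timings copied from the "MT Gauss-Seidel" table: threads -> (solve s, setup s)
mt_gs = {
    1: (33.27, 493.10),
    2: (24.67, 286.50),
    4: (12.26, 157.80),
    8: (6.97, 79.82),
    16: (3.97, 36.61),
    32: (3.50, 24.06),
    64: (3.16, 16.01),
}


def scaling(timings):
    """Return {threads: (solve_speedup, setup_speedup, setup_efficiency)}.

    Speedups are relative to the 1-thread run; efficiency is
    setup_speedup / threads (hypothetical helper, not part of Trilinos).
    """
    base_solve, base_setup = timings[1]
    out = {}
    for threads, (solve, setup) in sorted(timings.items()):
        setup_speedup = base_setup / setup
        out[threads] = (base_solve / solve, setup_speedup, setup_speedup / threads)
    return out


if __name__ == "__main__":
    for threads, (s_solve, s_setup, eff) in scaling(mt_gs).items():
        print(f"{threads:3d} threads: solve x{s_solve:5.2f}, "
              f"setup x{s_setup:5.2f}, setup efficiency {eff:4.2f}")
```

The setup does scale with threads, but even at 64 threads it still takes about five times as long as the solve (16.01 s vs 3.16 s).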

For reference, here are the times with the standard, non-threaded Gauss-Seidel. (If it really is non-threaded, why does the setup time go down as the number of OMP threads increases?)

setup smoother (ifpack2 "Gauss-Seidel")

OMP threads   solve time (s)   GS setup time (s)
          1            27.04                0.36
          2            25.13                0.21
          4            24.09                0.13
          8            23.58                0.09
         16            23.38                0.06
         32            23.32                0.05
         64            23.33                0.05
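
One way to compare the two smoothers is to add setup and solve time for each thread count. The sketch below does that with the numbers from the two tables. It assumes the two timers do not overlap and that there is one solve per setup, neither of which is established here; the helper names are hypothetical.

```python
# threads -> (solve s, setup s), copied from the two tables
mt_gs = {1: (33.27, 493.10), 2: (24.67, 286.50), 4: (12.26, 157.80),
         8: (6.97, 79.82), 16: (3.97, 36.61), 32: (3.50, 24.06),
         64: (3.16, 16.01)}
gs = {1: (27.04, 0.36), 2: (25.13, 0.21), 4: (24.09, 0.13),
      8: (23.58, 0.09), 16: (23.38, 0.06), 32: (23.32, 0.05),
      64: (23.33, 0.05)}


def setup_plus_solve(timings):
    """Sum solve and setup time per thread count (assumes non-overlapping timers)."""
    return {t: solve + setup for t, (solve, setup) in sorted(timings.items())}


def threads_where_mt_wins(mt, plain):
    """Thread counts where MT Gauss-Seidel has the smaller setup + solve time."""
    mt_total, plain_total = setup_plus_solve(mt), setup_plus_solve(plain)
    return [t for t in mt_total if mt_total[t] < plain_total[t]]


if __name__ == "__main__":
    mt_total, gs_total = setup_plus_solve(mt_gs), setup_plus_solve(gs)
    for t in mt_total:
        print(f"{t:3d} threads: MT GS {mt_total[t]:7.2f} s, GS {gs_total[t]:6.2f} s")
    print("MT GS faster overall at:", threads_where_mt_wins(mt_gs, gs))
```

Under those assumptions, the threaded smoother only comes out ahead at 64 threads, because the setup cost cancels the faster solve everywhere else.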

drekar/Trilinos was built with Intel 17.0.2 and GNU 6.1.0 (Trilinos repo as of August 16, 2017).

I ran VTune on ellis for the 1 OMP thread case (the 493.10 s case above). According to VTune, all of the time is spent in the two Kokkos::parallel_for calls in KokkosKernels::Experimental::Util::symmetrize_graph_symbolic_hashmap (lines 1097 and 1139 of KokkosKernels_Utils.hpp), and the time is split roughly equally between the two calls.

The following is the stack trace from Ifpack2:

Ifpack2::Relaxation::initialize()
  KokkosKernels::Experimental::Graph::gauss_seidel_symbolic
    KokkosKernels::Experimental::Graph::Impl::GaussSeidel
      KokkosKernels::Experimental::Util::symmetrize_graph_symbolic_hashmap  (Kokkos::parallel_for on line 1097 and 1139)
        Kokkos::parallel_for
        Kokkos::parallel_for
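
For readers unfamiliar with the hot spot: the function name suggests it builds the symmetrized sparsity pattern of the matrix graph using hash maps. The following is only a serial sketch of that general idea (the pattern of A + A^T for a square graph in CRS form, with per-row hash sets for deduplication). It is not the KokkosKernels implementation, and the function name `symmetrize_graph` and its signature are made up for illustration.

```python
def symmetrize_graph(row_ptr, col_idx):
    """Return (row_ptr, col_idx) for the pattern of A + A^T.

    Input is a square graph in CRS form; every column index is assumed
    to be a valid row index. Hypothetical helper, not a KokkosKernels API.
    """
    n = len(row_ptr) - 1
    # One hash set per row collects that row's distinct column indices.
    rows = [set() for _ in range(n)]
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            rows[i].add(j)  # original edge (i, j)
            rows[j].add(i)  # mirrored edge (j, i)

    # Rebuild CRS arrays with sorted column indices in each row.
    new_row_ptr = [0]
    new_col_idx = []
    for cols in rows:
        new_col_idx.extend(sorted(cols))
        new_row_ptr.append(len(new_col_idx))
    return new_row_ptr, new_col_idx


if __name__ == "__main__":
    # 3x3 pattern: row 0 -> {0, 1}, row 1 -> {1}, row 2 -> {0, 2}
    print(symmetrize_graph([0, 2, 3, 5], [0, 1, 1, 0, 2]))
```

In this sketch the work is proportional to the number of nonzeros plus a per-row sort, so a setup of several hundred seconds would point at something other than the basic algorithm, for example hash table sizing or contention, but that is speculation on my part.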

Edit (@aprokop): formatting
