MTIP / spinifel / Merge requests / !36

Start working on updating how we do device management

Open Johannes Paul Blaschke requested to merge jpb/device-management into development Mar 03, 2022

Much of the C++ code (in spinifel/device) will likely move to a separate PyPI package soon. This MR makes two changes:

  1. Get the device count from the CUDA API. I verified that this respects CUDA_VISIBLE_DEVICES on Cori GPU, Perlmutter, and Summit.
  2. Remove the gpu.devices_per_node setting; the number of devices per node is now controlled by srun or CUDA_VISIBLE_DEVICES.
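
For readers who want to check what the CUDA runtime reports on their own system, here is a minimal sketch in Python. It is not the code from this MR (which lives in the C++ layer); it only assumes that the CUDA runtime library can be loaded through ctypes and calls the runtime function cudaGetDeviceCount. The helper name cuda_device_count is hypothetical, and treating every runtime error as "zero devices" is a simplification.

```python
import ctypes
import ctypes.util


def cuda_device_count():
    """Return the number of CUDA devices visible to this process.

    Returns None if the CUDA runtime library cannot be loaded (for example
    on a machine without CUDA installed). Hypothetical helper, for
    illustration only.
    """
    # Assumption: the runtime is discoverable as "cudart"; fall back to the
    # conventional Linux shared-object name otherwise.
    name = ctypes.util.find_library("cudart") or "libcudart.so"
    try:
        cudart = ctypes.CDLL(name)
        get_count = cudart.cudaGetDeviceCount
    except (OSError, AttributeError):
        return None

    # C signature: cudaError_t cudaGetDeviceCount(int* count)
    get_count.argtypes = [ctypes.POINTER(ctypes.c_int)]
    get_count.restype = ctypes.c_int

    count = ctypes.c_int(0)
    status = get_count(ctypes.byref(count))
    # Simplification: any non-success status (no device, driver problem)
    # is reported as zero devices.
    return count.value if status == 0 else 0


if __name__ == "__main__":
    print("visible CUDA devices:", cuda_device_count())
```

Running this once with and once without CUDA_VISIBLE_DEVICES set (for example to a single index) is a quick way to see whether the reported count follows the environment variable on a given machine.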

Regarding point 2 above: you need to remove the following section from your own TOML files:

[gpu]
devices_per_node = ...

This also fixes a bug where the orientation matching code used the gpu.devices_per_node setting rather than context.dev_id.

Edited Mar 03, 2022 by Johannes Paul Blaschke
Source branch: jpb/device-management