MTIP / spinifel / Issues / #45

Closed
Created Feb 24, 2022 by Johannes Paul Blaschke (@jpblaschke, Owner)

Error running at high numbers on Perlmutter

@cahrens has seen the following when running at 64 nodes on Perlmutter:

cahrens@perlmutter:login38:/pscratch/sd/c/cahrens/spinifel_output/spinifel_pm_2022-02-19T2311-0800_TRIAL_clean_3iyf_ensemble_dev> more output_1450602_nodes_64_nimages_1500_norient_10000_nbinning_0_nbatchsize_100.log 
BATCHSIZE: 100
root_dir: /global/homes/c/cahrens/Projects/exafel/pm/spinifel-dev
NTASKS_PER_NODE: 4
NCPUS_PER_NODE: 4
RUN_MODE: mpi
LAUNCH_SCRIPT: spinifel
SRUN_COMMAND: srun -n 256 --ntasks-per-node=4 -c 10 --gpus-per-task=1 python -m spinifel --default-settings=pm_gpu_mpi.toml --mode=mpi runtime.N_images_per_rank=1500 algorithm.N_binning=0 algorithm.N_orientations=10000 algorithm.N_batch_size=100 data.out_dir=/pscratch/sd/c/cahrens/spinifel_output/spinifel_pm_2022-02-19T2311-0800_TRIAL_clean_3iyf_ensemble_dev/nodes_64_nimages_1500_norient_10000_nbinning_0_nbatchsize_100 data.name=3iyf_sim_400k.h5 data.in_dir=/global/cfs/cdirs/m2859/data/3iyf/clean
WARNING! The environment variable VERBOSE supersedes all other inputs for this setting. If this is unintensional unset VERBOSE.
WARNING! The environment variable DATA_DIR supersedes all other inputs for this setting. If this is unintensional unset DATA_DIR.
WARNING! The environment variable DATA_FILENAME supersedes all other inputs for this setting. If this is unintensional unset DATA_FILENAME.
WARNING! The environment variable OUT_DIR supersedes all other inputs for this setting. If this is unintensional unset OUT_DIR.
SpinifelSettings:
 + M = 81
  (derived data)
 + M_ups = 162
  (derived data)
 + Mquat = 20
  (derived data)
 + N_batch_size = 100
  source: algorithm.N_batch_size
  description: N_batch_size parameter for slicing in batches
 + N_binning = 0
  source: algorithm.N_binning
  description: N_binning parameter for dataset preprocessing
 + N_binning_tot = 0
  (derived data)
 + N_clipping = 0
  source: algorithm.N_clipping
  description: N_clipping parameter for dataset preprocessing
 + N_generations = 10
  source: algorithm.N_generations
  description: max generations
 + N_images_max = 10000
  source: algorithm.N_images_max
  description: max images
 + N_images_per_rank = 1500
  source: runtime.N_images_per_rank
  description: no. of images per rank
 + N_orientations = 10000
  source: algorithm.N_orientations
  description: N_orientations parameter for orientation matching
 + N_phase_loops = 10
  source: algorithm.N_phase_loops
  description: number of loops for phasing
 + beta = 0.3
  source: algorithm.beta
  description: negative feedback in HIO
 + chk_convergence = False
  source: runtime.chk_convergence
  description: if false, no check if output density converges
 + cutoff = 0.05
  source: algorithm.cutoff
  description: cutoff in shrinkwrap
 + data_dir = /global/cfs/cdirs/m2859/data/3iyf/clean
  source: data.in_dir
  description: data dir
 + data_field_name = intensities
  source: detector.data_field_name
  description: name of data field in the detector output files
 + data_filename = 3iyf_sim_400k.h5
  source: data.name
  description: data file name
 + data_path = /global/cfs/cdirs/m2859/data/3iyf/clean/3iyf_sim_400k.h5
  (derived data)
 + data_type_str = float32
  source: detector.data_type_str
  description: type string (numpy) for the detector output
 + det_shape = (1, 128, 128)
  source: detector.shape
  description: detector shape
 + load_generation = 0
  source: algorithm.load_generation
  description: start from output of this generation
 + nER = 50
  source: algorithm.nER
  description: number of iterations in ER
 + nHIO = 25
  source: algorithm.nHIO
  description: number of iterations in HIO
 + orientation_type_str = float32
  source: algorithm.orientation_type_str
  description: type string (numpy) for the orientation array
 + out_dir = /pscratch/sd/c/cahrens/spinifel_output/spinifel_pm_2022-02-19T2311-0800_TRIAL_clean_3iyf_ensemble_dev/nodes_64_nimages_1500_norient_10000_nbinning_0_nbatchsize_100
  source: data.out_dir
  description: output dir
 + oversampling = 1
  source: algorithm.oversampling
  description: oversampling rate
 + pixel_index_shape = (2, 1, 128, 128)
  (derived data)
 + pixel_index_shape_0 = (2,)
  source: algorithm.pixel_index_shape_0
  description: pixel_index_shape = pixel_index_shape_0 + det_shape
 + pixel_index_type_str = int32
  source: algorithm.pixel_index_type_str
  description: type string (numpy) for the pixel_index array
 + pixel_position_shape = (3, 1, 128, 128)
  (derived data)
 + pixel_position_shape_0 = (3,)
  source: algorithm.pixel_position_shape_0
  description: pixel_position_shape = pixel_position_shape_0 + det_shape
 + pixel_position_type_str = float32
  source: algorithm.pixel_position_type_str
  description: type string (numpy) for the pixel_position array
 + ps_eb_nodes = 1
  source: psana.ps_eb_nodes
  description: no. of eventbuilder cores
 + ps_exp = xpptut1
  source: psana.exp
  description: PSANA experiment name
 + ps_runnum = 1
  source: psana.runnum
  description: PSANA experiment number
 + ps_smd_n_events = 10000
  source: psana.ps_smd_n_events
  description: no. of events to be sent to an EventBuilder core
 + ps_srv_nodes = 0
  source: psana.ps_srv_nodes
  description: no. of server cores
 + reduced_det_shape = (1, 128, 128)
  (derived data)
 + reduced_pixel_index_shape = (2, 1, 128, 128)
  (derived data)
 + reduced_pixel_position_shape = (3, 1, 128, 128)
  (derived data)
 + solve_ac_maxiter = 100
  source: algorithm.solve_ac_maxiter
  description: max number of iterations in the CG solver
 + test = Quickstart settings for Perlmutter
  source: debug.test
  description: test field used for debugging
 + use_callmonitor = False
  source: debug.use_callmonitor
  description: enable call-monitor
 + use_cuda = True
  source: runtime.use_cuda
  description: use cuda wherever possible
 + use_cufinufft = True
  source: runtime.use_cufinufft
  description: use cufinufft for nufft support
 + use_cupy = True
  source: runtime.use_cupy
  description: use cupy wherever possible
 + use_psana = False
  source: psana.enable
  description: enable PSANA
 + use_single_prec = False
  source: runtime.use_single_prec
  description: if true, spinifel will use single-precision floating point
 + verbose = True
  source: debug.verbose
  description: is verbosity > 0
 + verbosity = 1
  source: debug.verbosity
  description: reporting verbosity
 + volume_shape = (151, 151, 151)
  source: algorithm.volume_shape
  description: shape of volume array
 + volume_type_str = complex64
  source: algorithm.volume_type_str
  description: type string (numpy) for the volume array
…
Traceback (most recent call last):
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/runpy.py", line 194, in _run_module_as_main
  return _run_code(code, main_globals, None,
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/runpy.py", line 87, in _run_code
  exec(code, run_globals)
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/__main__.py", line 24, in <module>
  main()
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/PyNVTX/__init__.py", line 33, in wrapper
  ret = func(*args, **kwargs)
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/mpi/main.py", line 137, in main
  ac = solve_ac(
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/PyNVTX/__init__.py", line 33, in wrapper
  ret = func(*args, **kwargs)
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/mpi/autocorrelation.py", line 209, in solve_ac
  ret, info = cg(W, d, x0=x0, maxiter=maxiter, callback=callback)
 File "<decorator-gen-7>", line 2, in cg
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/scipy/_lib/_threadsafety.py", line 44, in caller
  return func(*a, **kw)
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 329, in cg
  work[slice2] += sclr1*matvec(work[slice1])
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/scipy/sparse/linalg/interface.py", line 232, in matvec
  y = self._matvec(x)
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/scipy/sparse/linalg/interface.py", line 530, in _matvec
  return self.__matvec_impl(x)
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/mpi/autocorrelation.py", line 119, in W_matvec
  uvect_ADA = autocorrelation.core_problem_convolution(
 File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/autocorrelation.py", line 167, in core_problem_convolution
  assert np.all(np.isreal(uvect))
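
The traceback ends at the check `np.all(np.isreal(uvect))`, which passes only if every element of `uvect` has an imaginary part exactly equal to zero. The log does not show what `uvect` contained at that point, so the following is only a minimal sketch of how the check behaves, assuming `uvect` is a complex NumPy array. The helper `describe_non_real` is hypothetical (it is not part of spinifel) and shows one way to report the offending entries before asserting. It separates two cases that both trip the assertion: a small imaginary residue and non-finite values such as NaN.

```python
import numpy as np


def describe_non_real(uvect):
    """Hypothetical diagnostic helper, not part of spinifel.

    Summarizes the entries that make np.all(np.isreal(uvect)) fail.
    """
    uvect = np.asarray(uvect)
    bad = ~np.isreal(uvect)      # per-element version of the asserted test
    imag = np.imag(uvect)
    return {
        # number of entries whose imaginary part is not exactly zero
        "n_bad": int(bad.sum()),
        # largest |imag| among those entries (NaN if any of them is NaN)
        "max_abs_imag": float(np.max(np.abs(imag[bad]))) if bad.any() else 0.0,
        # True if any real or imaginary component is NaN or infinite
        "has_nonfinite": bool(not np.all(np.isfinite(uvect))),
    }


# Illustrative inputs only; these are not values taken from the failing run.
real_as_complex = np.array([1.0 + 0.0j, 2.0 + 0.0j])
with_residue = np.array([1.0 + 0.0j, 2.0 + 1e-12j])
with_nan = np.array([1.0 + 0.0j, complex(np.nan, np.nan)])

for name, arr in [("real_as_complex", real_as_complex),
                  ("with_residue", with_residue),
                  ("with_nan", with_nan)]:
    print(name, bool(np.all(np.isreal(arr))), describe_non_real(arr))
```

Note that `np.isreal` compares the imaginary part to zero exactly, so a NaN imaginary part also fails the check. Printing a summary like this just before the assertion would show whether the failure comes from round-off residue or from non-finite values propagating through the CG iteration.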