Error running at high node counts on Perlmutter
@cahrens has seen the following error when running on 64 nodes on Perlmutter:
cahrens@perlmutter:login38:/pscratch/sd/c/cahrens/spinifel_output/spinifel_pm_2022-02-19T2311-0800_TRIAL_clean_3iyf_ensemble_dev> more output_1450602_nodes_64_nimages_1500_norient_10000_nbinning_0_nbatchsize_100.log
BATCHSIZE: 100
root_dir: /global/homes/c/cahrens/Projects/exafel/pm/spinifel-dev
NTASKS_PER_NODE: 4
NCPUS_PER_NODE: 4
RUN_MODE: mpi
LAUNCH_SCRIPT: spinifel
SRUN_COMMAND: srun -n 256 --ntasks-per-node=4 -c 10 --gpus-per-task=1 python -m spinifel --default-settings=pm_gpu_mpi.toml --mode=mpi runtime.N_images_per_rank=1500 algorithm.N_binning=0 algorithm.N_orientations=10000 algorithm.N_batch_size=100 data.out_dir=/pscratch/sd/c/cahrens/spinifel_output/spinifel_pm_2022-02-19T2311-0800_TRIAL_clean_3iyf_ensemble_dev/nodes_64_nimages_1500_norient_10000_nbinning_0_nbatchsize_100 data.name=3iyf_sim_400k.h5 data.in_dir=/global/cfs/cdirs/m2859/data/3iyf/clean
WARNING! The environment variable VERBOSE supersedes all other inputs for this setting. If this is unintensional unset VERBOSE.
WARNING! The environment variable DATA_DIR supersedes all other inputs for this setting. If this is unintensional unset DATA_DIR.
WARNING! The environment variable DATA_FILENAME supersedes all other inputs for this setting. If this is unintensional unset DATA_FILENAME.
WARNING! The environment variable OUT_DIR supersedes all other inputs for this setting. If this is unintensional unset OUT_DIR.
SpinifelSettings:
+ M = 81
(derived data)
+ M_ups = 162
(derived data)
+ Mquat = 20
(derived data)
+ N_batch_size = 100
source: algorithm.N_batch_size
description: N_batch_size parameter for slicing in batches
+ N_binning = 0
source: algorithm.N_binning
description: N_binning parameter for dataset preprocessing
+ N_binning_tot = 0
(derived data)
+ N_clipping = 0
source: algorithm.N_clipping
description: N_clipping parameter for dataset preprocessing
+ N_generations = 10
source: algorithm.N_generations
description: max generations
+ N_images_max = 10000
source: algorithm.N_images_max
description: max images
+ N_images_per_rank = 1500
source: runtime.N_images_per_rank
description: no. of images per rank
+ N_orientations = 10000
source: algorithm.N_orientations
description: N_orientations parameter for orientation matching
+ N_phase_loops = 10
source: algorithm.N_phase_loops
description: number of loops for phasing
+ beta = 0.3
source: algorithm.beta
description: negative feedback in HIO
+ chk_convergence = False
source: runtime.chk_convergence
description: if false, no check if output density converges
+ cutoff = 0.05
source: algorithm.cutoff
description: cutoff in shrinkwrap
+ data_dir = /global/cfs/cdirs/m2859/data/3iyf/clean
source: data.in_dir
description: data dir
+ data_field_name = intensities
source: detector.data_field_name
description: name of data field in the detector output files
+ data_filename = 3iyf_sim_400k.h5
source: data.name
description: data file name
+ data_path = /global/cfs/cdirs/m2859/data/3iyf/clean/3iyf_sim_400k.h5
(derived data)
+ data_type_str = float32
source: detector.data_type_str
description: type string (numpy) for the detector output
+ det_shape = (1, 128, 128)
source: detector.shape
description: detector shape
+ load_generation = 0
source: algorithm.load_generation
description: start from output of this generation
+ nER = 50
source: algorithm.nER
description: number of iterations in ER
+ nHIO = 25
source: algorithm.nHIO
description: number of iterations in HIO
+ orientation_type_str = float32
source: algorithm.orientation_type_str
description: type string (numpy) for the orientation array
+ out_dir = /pscratch/sd/c/cahrens/spinifel_output/spinifel_pm_2022-02-19T2311-0800_TRIAL_clean_3iyf_ensemble_dev/nodes_64_nimages_1500_norient_10000_nbinning_0_nbatchsize_100
source: data.out_dir
description: output dir
+ oversampling = 1
source: algorithm.oversampling
description: oversampling rate
+ pixel_index_shape = (2, 1, 128, 128)
(derived data)
+ pixel_index_shape_0 = (2,)
source: algorithm.pixel_index_shape_0
description: pixel_index_shape = pixel_index_shape_0 + det_shape
+ pixel_index_type_str = int32
source: algorithm.pixel_index_type_str
description: type string (numpy) for the pixel_index array
+ pixel_position_shape = (3, 1, 128, 128)
(derived data)
+ pixel_position_shape_0 = (3,)
source: algorithm.pixel_position_shape_0
description: pixel_position_shape = pixel_position_shape_0 + det_shape
+ pixel_position_type_str = float32
source: algorithm.pixel_position_type_str
description: type string (numpy) for the pixel_position array
+ ps_eb_nodes = 1
source: psana.ps_eb_nodes
description: no. of eventbuilder cores
+ ps_exp = xpptut1
source: psana.exp
description: PSANA experiment name
+ ps_runnum = 1
source: psana.runnum
description: PSANA experiment number
+ ps_smd_n_events = 10000
source: psana.ps_smd_n_events
description: no. of events to be sent to an EventBuilder core
+ ps_srv_nodes = 0
source: psana.ps_srv_nodes
description: no. of server cores
+ reduced_det_shape = (1, 128, 128)
(derived data)
+ reduced_pixel_index_shape = (2, 1, 128, 128)
(derived data)
+ reduced_pixel_position_shape = (3, 1, 128, 128)
(derived data)
+ solve_ac_maxiter = 100
source: algorithm.solve_ac_maxiter
description: max number of iterations in the CG solver
+ test = Quickstart settings for Perlmutter
source: debug.test
description: test field used for debugging
+ use_callmonitor = False
source: debug.use_callmonitor
description: enable call-monitor
+ use_cuda = True
source: runtime.use_cuda
description: use cuda wherever possible
+ use_cufinufft = True
source: runtime.use_cufinufft
description: use cufinufft for nufft support
+ use_cupy = True
source: runtime.use_cupy
description: use cupy wherever possible
+ use_psana = False
source: psana.enable
description: enable PSANA
+ use_single_prec = False
source: runtime.use_single_prec
description: if true, spinifel will use single-precision floating point
+ verbose = True
source: debug.verbose
description: is verbosity > 0
+ verbosity = 1
source: debug.verbosity
description: reporting verbosity
+ volume_shape = (151, 151, 151)
source: algorithm.volume_shape
description: shape of volume array
+ volume_type_str = complex64
source: algorithm.volume_type_str
description: type string (numpy) for the volume array
…
Traceback (most recent call last):
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/__main__.py", line 24, in <module>
    main()
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/PyNVTX/__init__.py", line 33, in wrapper
    ret = func(*args, **kwargs)
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/mpi/main.py", line 137, in main
    ac = solve_ac(
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/PyNVTX/__init__.py", line 33, in wrapper
    ret = func(*args, **kwargs)
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/mpi/autocorrelation.py", line 209, in solve_ac
    ret, info = cg(W, d, x0=x0, maxiter=maxiter, callback=callback)
  File "<decorator-gen-7>", line 2, in cg
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/scipy/_lib/_threadsafety.py", line 44, in caller
    return func(*a, **kw)
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 329, in cg
    work[slice2] += sclr1*matvec(work[slice1])
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/scipy/sparse/linalg/interface.py", line 232, in matvec
    y = self._matvec(x)
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/setup/conda/envs/myenv/lib/python3.8/site-packages/scipy/sparse/linalg/interface.py", line 530, in _matvec
    return self.__matvec_impl(x)
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/mpi/autocorrelation.py", line 119, in W_matvec
    uvect_ADA = autocorrelation.core_problem_convolution(
  File "/global/u2/c/cahrens/Projects/exafel/pm/spinifel-dev/spinifel/autocorrelation.py", line 167, in core_problem_convolution
    assert np.all(np.isreal(uvect))
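The traceback ends at `assert np.all(np.isreal(uvect))` inside `core_problem_convolution`, reached from `W_matvec` while SciPy's `cg` applies the linear operator. `np.isreal` returns True only where the imaginary part is exactly zero, so even a round-off-sized imaginary residue makes this assertion fail. The snippet below is a hypothetical diagnostic, not spinifel code: the helper `check_nearly_real` and its `rtol` default are illustrative assumptions. It reports the largest imaginary and real magnitudes so that numerical noise can be told apart from a genuinely complex `uvect`; it does not by itself establish which of the two occurs at 64 nodes.

```python
import numpy as np


def check_nearly_real(uvect, rtol=1e-6):
    """Hypothetical helper: report how far an array is from purely real.

    Returns (is_nearly_real, max_abs_imag, max_abs_real). The imaginary
    residue is compared against rtol times the largest real magnitude;
    rtol=1e-6 is an arbitrary illustrative choice, not a spinifel value.
    """
    uvect = np.asarray(uvect)
    if uvect.size == 0:
        return True, 0.0, 0.0
    max_imag = float(np.max(np.abs(np.imag(uvect))))
    max_real = float(np.max(np.abs(np.real(uvect))))
    return max_imag <= rtol * max_real, max_imag, max_real


# A strictly real array (stored as complex) passes the strict check.
a = np.array([1.0, 2.0, 3.0], dtype=np.complex128)
print(np.all(np.isreal(a)))  # True

# A round-off-sized imaginary part makes np.isreal fail...
b = a + 1e-12j
print(np.all(np.isreal(b)))  # False
# ...while the tolerance-based check still accepts it.
print(check_nearly_real(b)[0])  # True

# A substantial imaginary part is rejected by both checks.
print(check_nearly_real(a + 0.5j)[0])  # False
```

Logging `max_abs_imag` and `max_abs_real` on each rank just before the assertion would show whether the failure comes from round-off-sized residue or from a substantially complex `uvect`.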