Introduce batching in orientation matching to avoid running out of device memory

Feature Detail. Changing the parameters N_binning and N_orientations can increase memory usage in the orientation matching module. On Summit, each device has 16GB of memory, so tasks in this module should be batched so that we do not exceed this capacity.

Description of Current Limitation
Test Data /gpfs/alpine/proj-shared/chm137/data/spi.
Data dimensions 4 x 512 x 512.
No. of images per rank/gpu 1000.

With the default settings N_orientations=2000 and N_binning=3 or 2, the module works:
Match tot:11.56s. slice=5.00s. match=0.50s. slice_oh=5.04s. match_oh=1.03s.
Orientations matched in 11.60s.
Total: 960.01s.

With N_binning=1, the module runs out of memory. Debug Info:
autocorrelation.py init gpu_free=16.13GB gpu_total=16.91GB
Using CUDA to solve the forward transform on device 0
autocorrelation.py copy H,K,L gpu_free=16.13GB gpu_total=16.91GB
autocorrelation.py H=4.19GB L_[:3]=[-0.43815921 -0.43581707 -0.433481 ] dtype=float64 float64 copied gpu_free=11.93GB gpu_total=16.91GB
autocorrelation.py K=4.19GB copied gpu_free=7.74GB gpu_total=16.91GB
autocorrelation.py L=4.19GB copied gpu_free=3.54GB gpu_total=16.91GB
autocorrelation.py nuvect N=524288000 type=complex_dtype gpu_free=3.53GB gpu_total=16.91GB
Traceback (most recent call last):
  File "mpi_main.py", line 16, in <module>
    main()
  File "/autofs/nccs-svm1_home1/monarin/sw/spinifel/spinifel/mpi/main.py", line 56, in main
    orientations = match(
  File "/autofs/nccs-svm1_home1/monarin/sw/spinifel/spinifel/mpi/orientation_matching.py", line 9, in match
    return sequential_match(
  File "/autofs/nccs-svm1_home1/monarin/sw/spinifel/spinifel/sequential/orientation_matching.py", line 50, in match
    model_slices = autocorrelation.forward(
  File "/autofs/nccs-svm1_home1/monarin/sw/spinifel/spinifel/context.py", line 117, in _noop
    return f(*args, **kwargs)
  File "/autofs/nccs-svm1_home1/monarin/sw/spinifel/setup/conda/envs/myenv/lib/python3.8/site-packages/PyNVTX/__init__.py", line 27, in wrapper
    ret = func(*args, **kwargs)
  File "/autofs/nccs-svm1_home1/monarin/sw/spinifel/spinifel/autocorrelation.py", line 157, in forward_gpu
    nuvect_gpu = GPUArray(shape=(N,), dtype=complex_dtype)
  File "/autofs/nccs-svm1_home1/monarin/sw/spinifel/setup/conda/envs/myenv/lib/python3.8/site-packages/pycuda/gpuarray.py", line 210, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
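The numbers in the debug log can be checked with a few lines of arithmetic. The sketch below only reuses values printed in the log (N and the gpu_free readings). The width of complex_dtype is not shown in the log, so both complex64 and complex128 are tried as an assumption.

```python
# Reproduce the array sizes reported in the debug log.
N = 524_288_000        # nuvect length printed in the log
GB = 1e9               # the log's figures are consistent with decimal gigabytes

# H, K and L are float64 (8 bytes per element) according to the log.
hkl_each_gb = N * 8 / GB
hkl_total_gb = 3 * hkl_each_gb

# complex_dtype is not printed in the log; check both common widths (assumption).
nuvect_complex64_gb = N * 8 / GB
nuvect_complex128_gb = N * 16 / GB

# gpu_free reported just before the failed GPUArray allocation.
free_before_nuvect_gb = 3.53

print(f"H, K, L: {hkl_each_gb:.2f} GB each, {hkl_total_gb:.2f} GB total")
print(f"nuvect: {nuvect_complex64_gb:.2f} GB (complex64) "
      f"or {nuvect_complex128_gb:.2f} GB (complex128)")
print(f"free before nuvect: {free_before_nuvect_gb:.2f} GB")

# Either width is larger than the remaining free memory, hence the MemoryError.
fits = nuvect_complex64_gb <= free_before_nuvect_gb
print("nuvect fits:", fits)
```

So H, K and L alone take about three quarters of the device, and the nuvect allocation cannot fit whichever complex width is used.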

Proposed Solution. Add batching to orientation matching (new parameter N_batch_size) to avoid running out of memory when the number of reference orientations increases or when N_binning decreases.
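One way to pick a value for N_batch_size is to derive it from the free device memory. The sketch below is only an illustration: the helper names, the safety factor, and the per-orientation cost model (H, K, L as float64 plus one complex nuvect entry per pixel) are assumptions, and it ignores any workspace the transform itself needs. It also assumes N in the log equals N_orientations times the number of pixels per orientation.

```python
def max_orientations_per_batch(free_bytes, n_pixels, complex_itemsize=16, safety=0.8):
    """Hypothetical helper: largest batch that fits in a fraction of free memory."""
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")
    # Per pixel: H, K, L as float64 (3 * 8 bytes) plus one complex nuvect entry.
    bytes_per_orientation = n_pixels * (3 * 8 + complex_itemsize)
    return max(1, int(free_bytes * safety) // bytes_per_orientation)


def batch_ranges(n_orientations, n_batch_size):
    """Split range(n_orientations) into consecutive (start, stop) pairs."""
    return [(start, min(start + n_batch_size, n_orientations))
            for start in range(0, n_orientations, n_batch_size)]


# Values taken from the debug log: N=524288000 for 2000 orientations,
# and gpu_free=16.13GB before anything is copied to the device.
n_orientations = 2000
n_pixels = 524_288_000 // n_orientations
free_bytes = 16.13e9

n_batch_size = max_orientations_per_batch(free_bytes, n_pixels)
ranges = batch_ranges(n_orientations, n_batch_size)
print(n_batch_size, ranges)
```

On a real device the free-memory figure could come from `pycuda.driver.mem_get_info()`, which returns free and total bytes, rather than from a hard-coded value.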

Note that N_pixels also grows when N_binning is small, but dividing an image up is harder at the moment.

Some pseudocode in def match of orientation_matching.py:
For each batch in batched ref. orientations
    get H, K, L for this batch
    generate model_slices for this H, K, L
    get indices of the best matched
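The pseudocode can be sketched in NumPy as follows. This is not the spinifel implementation: `batched_match`, `slice_fn` and the toy data are hypothetical stand-ins, and squared Euclidean distance is assumed as the matching criterion. The point it shows is that the best match has to be tracked across batches, so each batch only updates the images for which it found a closer slice.

```python
import numpy as np


def batched_match(images, ref_orientations, slice_fn, n_batch_size):
    """Return, per image, the index of the closest reference orientation.

    slice_fn is a stand-in for "get H, K, L and generate model_slices":
    it maps a batch of orientations to an array of shape (n_batch, n_pixels).
    """
    n_images = images.shape[0]
    best_dist = np.full(n_images, np.inf)
    best_index = np.zeros(n_images, dtype=np.int64)
    rows = np.arange(n_images)

    for start in range(0, len(ref_orientations), n_batch_size):
        batch = ref_orientations[start:start + n_batch_size]
        model_slices = slice_fn(batch)
        # Squared Euclidean distance between every image and every slice.
        diff = images[:, None, :] - model_slices[None, :, :]
        dist = (diff ** 2).sum(axis=2)
        local = dist.argmin(axis=1)
        local_dist = dist[rows, local]
        # Keep the batch result only where it beats earlier batches.
        improved = local_dist < best_dist
        best_dist[improved] = local_dist[improved]
        best_index[improved] = start + local[improved]
    return best_index


# Toy usage: a fixed linear map plays the role of the slicing step.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))


def toy_slice_fn(orientations):
    return orientations @ W


refs = rng.normal(size=(10, 4))
truth = np.array([7, 2, 9])
images = toy_slice_fn(refs[truth])

matched = batched_match(images, refs, toy_slice_fn, n_batch_size=3)
unbatched = batched_match(images, refs, toy_slice_fn, n_batch_size=len(refs))
print(matched, unbatched)
```

Only one batch of model slices is alive at a time, so peak memory scales with N_batch_size rather than with the total number of reference orientations.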

Edited Mar 17, 2021 by Monarin Uervirojnangkoorn