Tutorial 8. Batch Processing
If the number of frames (files) to process exceeds the defined batch_size (default: 32),
the Batch() mode is activated automatically.
At the initialization step (analysis = pygid.Conversion(...)), images are not loaded into memory.
When conversion functions (det2q_gid, det2pol_gid, etc.) are called, the raw data paths are
split into batches and processed sequentially.
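The splitting step can be pictured with a short sketch. This is not pygid's internal code: split_into_batches is a hypothetical helper, used only to illustrate how 13 frames with batch_size=2 give seven batches that are handled one after another.

```python
def split_into_batches(items, batch_size):
    """Split a sequence into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# 13 frame indices, matching the frame count of the example in this tutorial
frame_indices = list(range(13))
batches = split_into_batches(frame_indices, batch_size=2)

for number, batch in enumerate(batches, start=1):
    # In a real run each batch would be loaded, converted and saved
    # before the next one is touched
    print(f"batch {number}: frames {batch}")
```

Only one batch needs to be in memory at a time, which is the reason images are not loaded at initialization.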
In batch mode:
Converted images are not plotted.
Results cannot be returned directly to the workspace; only saving to disk is supported.
The exception is the averaged or summed result when average_all=True or sum_all=True.
Parameters
batch_size – maximum number of frames or files processed per batch. Default: 32.
multiprocessing – enables multiprocessing for faster batch execution.
plot_result – must be False (plotting is disabled in batch mode).
save_result – must be True to store converted data in HDF5/NXsas format.
path_to_save – path where converted data will be saved.
overwrite_file – whether to overwrite an existing result file.
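As a rough illustration of how these options fit together, the sketch below collects them into one dictionary of keyword arguments. make_batch_kwargs is a hypothetical helper and not part of pygid; it is also stricter than the library, which, judging by the output later in this tutorial, only warns when plot_result=True is passed in batch mode. Note that batch_size itself is given to pygid.Conversion, not to the conversion function.

```python
def make_batch_kwargs(path_to_save, overwrite_file=False, multiprocessing=False,
                      plot_result=False, save_result=True):
    """Collect keyword arguments consistent with batch mode (hypothetical helper)."""
    if plot_result:
        raise ValueError("plot_result must be False in batch mode")
    if not save_result:
        raise ValueError("save_result must be True in batch mode")
    return {
        "multiprocessing": multiprocessing,
        "plot_result": plot_result,
        "save_result": save_result,
        "path_to_save": path_to_save,
        "overwrite_file": overwrite_file,
    }


kwargs = make_batch_kwargs("result.h5", overwrite_file=True, multiprocessing=True)
print(kwargs)

# With an initialized Conversion object this could then be passed on as:
# analysis.det2q_gid(**kwargs)
```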
Example
In the example below, batch_size is set to 2 to explicitly activate batch processing. Total number of frames in the raw data file: 13.
from pygid.datasets import get_dataset

# Download example dataset from Zenodo
try:
    files = get_dataset("tutorial_07")
    poni_path = files["poni"]
    mask_path = files["mask"]
    # several files for batch processing
    data_path = files["data"]
except Exception:
    print("Dataset download skipped on Read the Docs.")
import pygid
params = pygid.ExpParams(
    poni_path=poni_path,
    mask_path=mask_path,
    ai=0.004,
    fliplr=True,
    flipud=True,
)
matrix = pygid.CoordMaps(
    params,
    vert_positive=True,
    hor_positive=True,
)
analysis = pygid.Conversion(
    matrix=matrix,
    path=data_path,
    dataset='/entry_0000/ESRF-ID10/eiger4m/data',
    frame_num=None,  # all images
    batch_size=2,  # limit of the batch size
)
print(f"loaded images: {analysis.img_raw}")
INFO - Number of frames (13) is more than 2. The batch processing has been activated.
loaded images: None
NOTE: images are not loaded
analysis.det2q_gid(save_result=True,
                   path_to_save='result.h5')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 analysis.det2q_gid(save_result=True,
2 path_to_save='result.h5')
File ~/checkouts/readthedocs.org/user_builds/pygid/envs/stable/lib/python3.11/site-packages/pygid/conversion.py:1412, in Conversion.det2q_gid(self, frame_num, interp_type, multiprocessing, return_result, q_xy_range, q_z_range, dq, plot_result, clims, xlim, ylim, save_fig, path_to_save_fig, save_result, path_to_save, h5_group, overwrite_file, overwrite_group, exp_metadata, smpl_metadata)
1410 # If batch mode is active, delegate the task to the batch processor
1411 if self.batch_activated:
-> 1412 res = self.Batch(path_to_save, "det2q_gid", h5_group, exp_metadata, smpl_metadata, overwrite_file,
1413 overwrite_group, save_result, plot_result, return_result)
1414 self.batch_activated = True
1415 return res
File ~/checkouts/readthedocs.org/user_builds/pygid/envs/stable/lib/python3.11/site-packages/pygid/conversion.py:425, in Conversion.Batch(self, path_to_save, remap_func, h5_group, exp_metadata, smpl_metadata, overwrite_file, overwrite_group, save_result, plot_result, return_result)
386 """
387 Divides raw images into batches and processes them separately.
388
(...) 422 Result from aggregated batch processing if return_result is True, otherwise None.
423 """
424 # Adjust batch size for compatibility
--> 425 self._adjust_batch_size()
427 # Determine if path-based or frame-based batching
428 is_path_batch = isinstance(self.path, list)
File ~/checkouts/readthedocs.org/user_builds/pygid/envs/stable/lib/python3.11/site-packages/pygid/conversion.py:163, in Conversion._adjust_batch_size(self)
156 def _adjust_batch_size(self):
157 """
158 Adjusts batch size to be compatible with number_to_combine.
159
160 If number_to_combine is set, ensures batch_size is a multiple of it
161 to avoid processing incomplete groups.
162 """
--> 163 if self.number_to_combine>self.batch_size:
164 raise ValueError("number_to_combine cannot be greater than batch size")
165 if self.number_to_combine is not None:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
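The traceback points at the comparison on line 163, which runs before the "is not None" check on line 165, so it fails when number_to_combine is None, as it is in this run. A minimal sketch of a None-safe ordering follows. It is a standalone function, not the pygid method, and the rounding rule is an assumption based on the docstring ("ensures batch_size is a multiple of it").

```python
def adjust_batch_size(batch_size, number_to_combine=None):
    """Return a batch size compatible with number_to_combine (illustrative only)."""
    # Check for None first, so the comparison below never sees NoneType
    if number_to_combine is None:
        return batch_size
    if number_to_combine > batch_size:
        raise ValueError("number_to_combine cannot be greater than batch size")
    # Assumption: round down to the nearest multiple of number_to_combine
    return (batch_size // number_to_combine) * number_to_combine


print(adjust_batch_size(2))      # number_to_combine not set
print(adjust_batch_size(32, 5))  # batch size reduced to a multiple of 5
```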
analysis.det2q_gid(save_result=True,
                   path_to_save='result.h5',
                   plot_result=True)
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
D:\PhD\mlgid\pygid\pygid\conversion.py:304: UserWarning: Plotting and returning of the result are not supported in batch analysis mode.
warnings.warn("Plotting and returning of the result are not supported in batch analysis mode.",
NOTE: plotting is not supported
Batch processing works for all two-dimensional conversions and for radial profiles (see Tutorials 4-5):
analysis.radial_profile_gid(save_result=True,
                            path_to_save='result.h5',
                            plot_result=True)
In combination with averaging:
analysis = pygid.Conversion(
    matrix=matrix,
    path=data_path,
    dataset='/entry_0000/ESRF-ID10/eiger4m/data',
    frame_num=None,  # all images
    batch_size=2,  # limit of the batch size
    average_all=True,  # average all frames
)
analysis.det2q_gid(save_result=True,
                   path_to_save='result.h5',
                   plot_result=True)
INFO - Number of frames (13) is more than 2. The batch processing has been activated.
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
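With average_all=True the log above shows a single "Saved" message instead of one per batch. One common way to average over batches without holding every frame in memory is to keep a running sum. The sketch below illustrates that idea with plain Python lists standing in for images; it is an assumption about the general technique, not a description of pygid's implementation.

```python
def average_in_batches(frames, batch_size):
    """Average equally sized frames batch by batch using a running sum."""
    if not frames:
        raise ValueError("no frames to average")
    running_sum = [0.0] * len(frames[0])
    count = 0
    for start in range(0, len(frames), batch_size):
        batch = frames[start:start + batch_size]  # only this batch is "loaded"
        for frame in batch:
            running_sum = [s + v for s, v in zip(running_sum, frame)]
            count += 1
    return [s / count for s in running_sum]


# 13 toy "frames" with 3 pixels each; frame k holds the value k in every pixel
frames = [[float(k)] * 3 for k in range(13)]
average = average_in_batches(frames, batch_size=2)
print(average)
```

The result does not depend on the batch size, since every frame contributes once to the sum regardless of how the frames are grouped.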