Tutorial 8. Batch Processing

If the number of frames (or files) to process exceeds batch_size (default: 32), batch mode, handled by the Batch() method, is activated automatically.

In batch mode, images are not loaded into memory at the initialization step (analysis = pygid.Conversion(...)). Instead, when a conversion function (det2q_gid, det2pol_gid, etc.) is called, the raw data (frames or file paths) is split into batches, which are loaded and processed sequentially.
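The activation rule and the splitting step can be illustrated with a minimal pure-Python sketch. It is not pygid's implementation: the helper names batch_mode_active and split_into_batches are hypothetical, and the sketch only assumes the rule stated above (batch mode starts when there are more frames than batch_size) and that batches are consecutive chunks of at most batch_size items.

```python
import math


def batch_mode_active(n_frames: int, batch_size: int = 32) -> bool:
    """Hypothetical helper: batch mode starts when the frames do not fit in one batch."""
    return n_frames > batch_size


def split_into_batches(items, batch_size=32):
    """Hypothetical helper: split frame indices (or file paths) into consecutive chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


frames = list(range(13))  # 13 frames, as in the example below
batches = split_into_batches(frames, batch_size=2)

print(batch_mode_active(len(frames), batch_size=2))  # True
print(len(batches), math.ceil(len(frames) / 2))      # 7 7
print(batches[-1])                                   # [12]
```

With the 13 frames and batch_size=2 used in the example below, this gives seven batches, the last holding a single frame, which matches the seven "Saved" messages printed by det2q_gid in batch mode.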

In batch mode:

  • Converted images are not plotted, except for the averaged or summed result when average_all=True or sum_all=True.

  • Results are only saved to disk and cannot be returned to the workspace, except for the averaged or summed result when average_all=True or sum_all=True.
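A plausible reason for this exception is that an aggregate over all frames needs only one image-sized accumulator, whereas per-frame results of every batch would all have to stay in memory. The minimal NumPy sketch below illustrates this idea; aggregate_batches is a hypothetical helper, not part of pygid, and pygid may aggregate differently (for example, on converted rather than raw images). It shows that a sum and an average accumulated batch by batch equal the result computed over all frames at once.

```python
import numpy as np


def aggregate_batches(batches):
    """Hypothetical helper: running sum and frame count over batches.

    Each batch is an array of shape (n_frames_in_batch, ny, nx); only one
    image-sized accumulator is kept between batches.
    """
    total = None
    count = 0
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        batch_sum = batch.sum(axis=0)  # collapse the frame axis of this batch
        total = batch_sum if total is None else total + batch_sum
        count += batch.shape[0]
    if total is None:
        raise ValueError("no batches to aggregate")
    return total, total / count  # (sum over all frames, average over all frames)


rng = np.random.default_rng(0)
frames = rng.random((13, 4, 5))  # 13 small synthetic "images"
batches = [frames[i:i + 2] for i in range(0, 13, 2)]
summed, averaged = aggregate_batches(batches)

print(np.allclose(summed, frames.sum(axis=0)))     # True
print(np.allclose(averaged, frames.mean(axis=0)))  # True
```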

Parameters

  • batch_size – maximum number of frames or files processed per batch. Default: 32.

  • multiprocessing – enables multiprocessing for faster batch execution.

  • plot_result – should be False: plotting is disabled in batch mode, and plot_result=True only triggers a UserWarning.

  • save_result – must be True to store converted data in HDF5/NXsas format.

  • path_to_save – path where converted data will be saved.

  • overwrite_file – whether to overwrite an existing result file.
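The parameters above can be collected once and reused for every conversion call in batch mode. This is a sketch: the keyword names are those accepted by det2q_gid, but the values are illustrative choices rather than pygid defaults, and the name batch_kwargs is hypothetical.

```python
# Keyword arguments for a conversion call in batch mode.
# Values are illustrative choices, not pygid defaults.
batch_kwargs = dict(
    save_result=True,          # required: batch results are only written to disk
    path_to_save="result.h5",  # output file (HDF5/NXsas)
    overwrite_file=False,      # do not overwrite an existing result file
    plot_result=False,         # plotting is disabled in batch mode
    multiprocessing=True,      # enable multiprocessing for batch execution
)

# With a Conversion object `analysis` in batch mode (see the example below):
# analysis.det2q_gid(**batch_kwargs)
```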

Example

In the example below, batch_size is set to 2 so that batch processing is activated explicitly. The raw data file contains 13 frames.

from pygid.datasets import get_dataset

# Download example dataset from Zenodo
try:
    files = get_dataset("tutorial_07")
    poni_path = files["poni"]
    mask_path = files["mask"]
    # several files for batch processing
    data_path = files["data"]
except Exception:
    print("Dataset download skipped on Read the Docs.")

import pygid

params = pygid.ExpParams(
    poni_path=poni_path,
    mask_path=mask_path,
    ai=0.004,
    fliplr=True,
    flipud=True
)

matrix = pygid.CoordMaps(
    params,
    vert_positive=True,
    hor_positive=True,
)

analysis = pygid.Conversion(
    matrix=matrix,
    path=data_path,
    dataset='/entry_0000/ESRF-ID10/eiger4m/data',
    frame_num=None,               # process all frames
    batch_size=2,                 # at most 2 frames per batch
)
print(f"loaded images: {analysis.img_raw}")
INFO - Number of frames (13) is more than 2. The batch processing has been activated.
loaded images: None

NOTE: the images are not loaded into memory at initialization (img_raw is None).

analysis.det2q_gid(save_result=True,
                   path_to_save='result.h5')
analysis.det2q_gid(save_result=True,
                   path_to_save='result.h5',
                   plot_result=True)
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
D:\PhD\mlgid\pygid\pygid\conversion.py:304: UserWarning: Plotting and returning of the result are not supported in batch analysis mode.
  warnings.warn("Plotting and returning of the result are not supported in batch analysis mode.",

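NOTE: plotting is not supported in batch mode. With plot_result=True, the results are still saved, but a warning is issued instead of a plot.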

Batch processing works for all two-dimensional conversions and for radial profiles (see Tutorials 4-5):

analysis.radial_profile_gid(save_result=True,
                            path_to_save='result.h5',
                            plot_result=True)

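Batch processing can also be combined with averaging. With average_all=True, a single averaged result is saved, and it can be plotted: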

analysis = pygid.Conversion(
    matrix=matrix,
    path=data_path,
    dataset='/entry_0000/ESRF-ID10/eiger4m/data',
    frame_num=None,               # process all frames
    batch_size=2,                 # at most 2 frames per batch
    average_all=True,             # average all frames
)
analysis.det2q_gid(save_result=True,
                   path_to_save='result.h5',
                   plot_result=True)
INFO - Number of frames (13) is more than 2. The batch processing has been activated.
INFO - Saved in D:\PhD\mlgid\pygid\docs\tutorials\result.h5 in group entry_0000
[Figure: plot of the averaged det2q_gid result]