Batching and Buffered Logging

Overview

The FAST spectrogram batch utilities reduce disk I/O and improve resiliency through configurable batching of:

  • Global extrema JSON persistence

  • Plot progress JSON persistence

  • Buffered info logging

Key Parameters

flush_batch_size (default: 10)

Controls how many orbits' worth of successful extrema and progress updates are accumulated in memory before the JSON changes are written to disk. A final flush always occurs at normal program termination to avoid data loss.

log_flush_batch_size (default: flush_batch_size)

Controls how many log records are buffered before being written. If unspecified, it defaults to flush_batch_size. A forced flush occurs at shutdown.
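The batching contract described by these two parameters can be sketched as a small writer that merges per-orbit updates in memory, writes the JSON once flush_batch_size orbits are pending, skips writes when nothing new happened, and forces a final write at normal interpreter exit via atexit. BatchedJsonWriter and resolve_log_batch_size are hypothetical names for illustration; this is not the FAST module's actual code.

```python
import atexit
import json
from typing import Any, Dict, Optional


def resolve_log_batch_size(flush_batch_size: int,
                           log_flush_batch_size: Optional[int] = None) -> int:
    # log_flush_batch_size falls back to flush_batch_size when unspecified.
    return flush_batch_size if log_flush_batch_size is None else log_flush_batch_size


class BatchedJsonWriter:
    """Hypothetical helper: buffer per-orbit updates, write JSON every N orbits."""

    def __init__(self, path: str, flush_batch_size: int = 10) -> None:
        if flush_batch_size < 1:
            raise ValueError("flush_batch_size must be >= 1")
        self.path = path
        self.flush_batch_size = flush_batch_size
        self.state: Dict[str, Any] = {}
        self.pending = 0  # orbits recorded since the last write
        # Forced flush at normal interpreter exit; this cannot run on SIGKILL or power loss.
        atexit.register(self.flush, force=True)

    def record_orbit(self, updates: Dict[str, Any]) -> None:
        """Merge one orbit's successful updates and flush when the batch is full."""
        self.state.update(updates)
        self.pending += 1
        if self.pending >= self.flush_batch_size:
            self.flush()

    def flush(self, force: bool = False) -> None:
        """Write state to disk if there is unwritten orbit work, or if forced."""
        if self.pending == 0 and not force:
            return  # nothing new: avoid unnecessary disk churn
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.state, fh, indent=2)
        self.pending = 0
```

Because a write happens as soon as pending reaches flush_batch_size, at most flush_batch_size - 1 orbits are ever held unwritten, which matches the loss bound stated under Design Guarantees.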

Design Guarantees

  • At most flush_batch_size - 1 orbits' worth of extrema updates can be lost if the process is interrupted unexpectedly (e.g., SIGKILL, power loss).

  • To minimize unnecessary disk churn, the progress JSON is written only when new orbit work has completed or a flush is forced.

  • Buffered logging reduces filesystem write frequency; logs are still flushed on completion or when the buffer reaches its threshold.
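For comparison, the Python standard library provides comparable buffered logging through logging.handlers.MemoryHandler, which holds records until `capacity` are buffered (or a record at or above `flushLevel` arrives) and then forwards them to a target handler; with flushOnClose=True it also flushes when logging shuts down at exit. This is a sketch of one possible approach, not necessarily how the FAST utilities buffer logs. make_buffered_logger is a hypothetical name, and the ERROR-level early flush is an extra behavior of this sketch that the source does not describe.

```python
import io
import logging
import logging.handlers


def make_buffered_logger(name: str, stream: io.TextIOBase,
                         log_flush_batch_size: int) -> logging.Logger:
    """Hypothetical helper: a logger that emits INFO records in batches."""
    target = logging.StreamHandler(stream)
    target.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    # Records are held until `capacity` are buffered; an ERROR (or worse)
    # record triggers an immediate flush so failures are never delayed.
    buffered = logging.handlers.MemoryHandler(
        capacity=log_flush_batch_size,
        flushLevel=logging.ERROR,
        target=target,
        flushOnClose=True,  # logging.shutdown() at exit flushes the remainder
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(buffered)
    logger.propagate = False  # keep records out of the root logger
    return logger
```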

Operational Notes

  • Setting flush_batch_size=1 restores per-orbit persistence (maximum safety, higher I/O volume).

  • Very large batch sizes increase the amount of in-memory work that can be lost if the process terminates abnormally.

  • In long-running jobs where orbits complete very infrequently, unflushed work can sit in memory for a long time. Time-based flushes would address this but are not implemented yet.
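Since time-based flushing is not implemented, the following is only a hypothetical sketch of what a combined count-or-time trigger could look like. FlushTrigger and max_interval_s are invented names; the clock is injectable so the logic can be tested without sleeping.

```python
import time
from typing import Callable


class FlushTrigger:
    """Hypothetical: flush after `batch_size` items OR `max_interval_s` seconds."""

    def __init__(self, batch_size: int, max_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size
        self.max_interval_s = max_interval_s
        self.clock = clock
        self.pending = 0
        self.last_flush = clock()

    def note_item(self) -> bool:
        """Record one completed orbit; return True if a flush is due now."""
        self.pending += 1
        return (self.pending >= self.batch_size
                or self.clock() - self.last_flush >= self.max_interval_s)

    def mark_flushed(self) -> None:
        """Call after writing to disk to reset both the count and the timer."""
        self.pending = 0
        self.last_flush = self.clock()
```

This trigger is only evaluated when an orbit completes, so after a long gap the next orbit flushes immediately instead of waiting for a full batch. Flushing with no orbit arrivals at all would additionally need a background timer.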

Error Categorization in Progress JSON

Per-instrument keys record error categories using the pattern: {instrument}_{y_scale}_{z_scale}_error-{reason}

Common reasons include:

  • timeout - worker exceeded its allotted time

  • invalid-cdf - structural or content issues in the input CDF

  • divide-by-zero - numerical domain error

  • plotting - matplotlib/rendering failure

  • generic - any uncategorized exception
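The key pattern above can be reproduced directly; how exceptions map to reasons is not specified here, so the mapping below is a hypothetical sketch. It assumes callers pass the processing stage ("read" or "plot") and that the progress JSON stores a per-key counter, neither of which is confirmed by the source. error_key, categorize, and record_error are illustrative names.

```python
from typing import Dict


def error_key(instrument: str, y_scale: str, z_scale: str, reason: str) -> str:
    """Build a progress-JSON error key: {instrument}_{y_scale}_{z_scale}_error-{reason}."""
    return f"{instrument}_{y_scale}_{z_scale}_error-{reason}"


def categorize(exc: BaseException, stage: str = "") -> str:
    """Hypothetical mapping from an exception (and its stage) to a documented reason."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, (ZeroDivisionError, FloatingPointError)):
        return "divide-by-zero"  # numerical domain errors
    if stage == "read":
        return "invalid-cdf"  # assumed: any other failure while reading the CDF
    if stage == "plot":
        return "plotting"  # assumed: any other failure while rendering
    return "generic"


def record_error(progress: Dict[str, int], instrument: str, y_scale: str,
                 z_scale: str, exc: BaseException, stage: str = "") -> str:
    """Increment a per-key error counter (counter semantics are an assumption)."""
    key = error_key(instrument, y_scale, z_scale, categorize(exc, stage))
    progress[key] = progress.get(key, 0) + 1
    return key
```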

Example Usage Snippet

from batch_multi_plot_FAST_spectrograms import FAST_plot_spectrograms_directory

FAST_plot_spectrograms_directory(
    input_dir="FAST_data",
    output_dir="FAST_plots",
    flush_batch_size=10,          # write extrema/progress every 10 orbits
    log_flush_batch_size=20,      # log buffer larger than JSON flush
)

Tuning Recommendations

  • SSD / local FS: batch size 20-50 often reduces overhead further.

  • Network / cloud drive: keep batch smaller (5-15) to limit at-risk data.

  • Debug sessions: use flush_batch_size=1 plus a small log_flush_batch_size so results and log lines appear immediately.

  • Embedded / IDE environments: if keyboard shortcuts appear impaired, run the generic pipeline with install_signal_handlers=False. The FAST pipeline needs no change because it restores the original signal handlers automatically after completion.
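Restoring handlers after completion can be sketched with a context manager around signal.signal, which returns the previously installed handler. This is an illustrative pattern, not the pipelines' actual code; sigint_handler is a hypothetical name, and only the install_signal_handlers flag comes from the text above. signal.signal must be called from the main thread.

```python
import signal
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def sigint_handler(handler: Callable, install_signal_handlers: bool = True) -> Iterator[None]:
    """Temporarily install a SIGINT handler and restore the previous one afterwards."""
    if not install_signal_handlers:
        yield  # leave the host environment's (e.g., an IDE's) handlers untouched
        return
    # signal.signal() returns the previous handler; main thread only.
    previous = signal.signal(signal.SIGINT, handler)
    try:
        yield
    finally:
        # None means the old handler was not installed from Python and cannot
        # be reinstated exactly; fall back to the OS default in that case.
        signal.signal(signal.SIGINT, previous if previous is not None else signal.SIG_DFL)
```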

Per-Instrument / Per-Row Overrides (FAST & Generic)

Both the FAST-specific and generic batch plotting pipelines accept row-level axis overrides via dataset dictionaries. For FAST multi-instrument grids this enables distinct energy (y_min/y_max) and intensity (z_min/z_max) ranges for ees, eeb, ies, and ieb so that one instrument with a high dynamic range does not wash out contrast for another.

Example applying precomputed extrema per instrument:

from batch_multi_plot_FAST_spectrograms import (
  compute_global_extrema, FAST_plot_spectrograms_directory
)

extrema = compute_global_extrema(
  directory_path="FAST_data",
  y_scale="linear",
  z_scale="log",
  flush_batch_size=20,
)

# Each row in multi-instrument figures now receives its own limits
FAST_plot_spectrograms_directory(
  directory_path="FAST_data",
  y_scale="linear",
  z_scale="log",
  flush_batch_size=20,
  log_flush_batch_size=20,
)

Pitch-angle category grids likewise attach per-row energy and color-scale bounds (when provided), so categories with narrower relevant ranges retain visual detail.
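A sketch of how precomputed extrema could be turned into per-row overrides follows. The row keys y_min, y_max, z_min, and z_max come from the text above. The extrema key scheme "{instrument}_{y_scale}_{z_scale}_{bound}" is an assumption, loosely suggested by the "_z_max" suffix filter used in the extrema example, and build_rows is a hypothetical helper.

```python
from typing import Any, Dict, List, Mapping, Sequence


def build_rows(extrema: Mapping[str, float], y_scale: str, z_scale: str,
               instruments: Sequence[str] = ("ees", "eeb", "ies", "ieb")) -> List[Dict[str, Any]]:
    """Hypothetical: one dataset row per instrument, carrying only the bounds provided."""
    rows: List[Dict[str, Any]] = []
    for inst in instruments:
        prefix = f"{inst}_{y_scale}_{z_scale}"  # assumed extrema key scheme
        row: Dict[str, Any] = {"instrument": inst}
        for bound in ("y_min", "y_max", "z_min", "z_max"):
            value = extrema.get(f"{prefix}_{bound}")
            if value is not None:  # omitted bounds fall back to renderer defaults
                row[bound] = value
        rows.append(row)
    return rows
```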

API Examples

Minimal Plotting Run

from batch_multi_plot_FAST_spectrograms import FAST_plot_spectrograms_directory

# Process all orbits with default batching (10) and logging
FAST_plot_spectrograms_directory(
  input_dir="FAST_data",      # path containing CDF files
  output_dir="FAST_plots",    # destination for generated figures
)

Custom Batching and Logging

FAST_plot_spectrograms_directory(
  input_dir="FAST_data",
  output_dir="FAST_plots",
  y_scale="linear",
  z_scale="log",
  flush_batch_size=25,          # write extrema/progress every 25 orbits
  log_flush_batch_size=50,      # less frequent log writes
  max_workers=6,                # increase parallelism
  orbit_timeout_seconds=90,     # orbit-level timeout
  instrument_timeout_seconds=45,
)

Resuming After Interruption

If execution is interrupted, re-running with the same parameters resumes from the last recorded orbit per scale combination (unless ignore_progress_json=True):

FAST_plot_spectrograms_directory(
  input_dir="FAST_data",
  output_dir="FAST_plots",
  flush_batch_size=10,
)  # continues where previous run left off

Forcing a Fresh Pass (ignoring prior progress):

FAST_plot_spectrograms_directory(
  input_dir="FAST_data",
  output_dir="FAST_plots",
  ignore_progress_json=True,  # disregard existing progress JSON
)
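Resume-versus-fresh selection can be sketched as filtering the orbit list against the last recorded orbit for a given scale combination. The progress key name "{y_scale}_{z_scale}_last_orbit" and the helper orbits_to_process are hypothetical, and the sketch assumes orbits are processed in ascending order.

```python
from typing import Iterable, List, Mapping


def orbits_to_process(all_orbits: Iterable[int], progress: Mapping[str, int],
                      y_scale: str, z_scale: str,
                      ignore_progress_json: bool = False) -> List[int]:
    """Hypothetical: return the orbits still to plot for one scale combination."""
    ordered = sorted(all_orbits)
    if ignore_progress_json:
        return ordered  # fresh pass: disregard any recorded progress
    last = progress.get(f"{y_scale}_{z_scale}_last_orbit")  # assumed key name
    if last is None:
        return ordered  # no progress yet for this scale combination
    return [orbit for orbit in ordered if orbit > last]
```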

Computing / Refreshing Extrema Only

You can call the internal extrema routine directly to precompute ranges without plotting:

from batch_multi_plot_FAST_spectrograms import compute_global_extrema  # if re-exported
# (If not publicly exported, call through the plotting module or duplicate logic.)
# When calling directly you can still specify the intensity percentile as 'max_percentile'.
# The energy (Y) coverage threshold remains fixed at 99% of cumulative positive samples.
state = compute_global_extrema(
  directory_path="FAST_data",
  y_scale="linear",
  z_scale="log",
  instrument_order=("ees", "eeb", "ies", "ieb"),
  extrema_json_path="FAST_calculated_extrema.json",
  max_percentile=95.0,      # matches FAST_plot_spectrograms_directory default
  flush_batch_size=20,
)
print("Updated extrema keys:", [k for k in state.keys() if k.endswith("_z_max")])

Tight Loop / Immediate Persistence

Set flush_batch_size=1 for maximum safety (higher I/O volume):

FAST_plot_spectrograms_directory(
  input_dir="FAST_data",
  output_dir="FAST_plots",
  flush_batch_size=1,       # per-orbit JSON writes
  log_flush_batch_size=1,   # immediate log emission
)

High-Throughput Bulk Mode

On fast local SSDs you can increase both batch sizes to reduce writes further:

FAST_plot_spectrograms_directory(
  input_dir="FAST_data",
  output_dir="FAST_plots",
  flush_batch_size=50,
  log_flush_batch_size=50,
  max_workers=8,
  max_processing_percentile=97.5,  # use a different intensity percentile
)

Axis Label Conventions

Both FAST-specific and generic plotting pipelines default to labeling the vertical (energy) axis as Energy (eV) and the colorbar as Counts. These defaults are injected into each dataset row (y_label / z_label) and can be overridden at dataset construction time to reflect alternative units (e.g., 'Differential Energy Flux'). Centralizing units in the dataset dictionaries maintains separation between physical semantics and the generic rendering code.
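Injecting the default labels while respecting per-dataset overrides can be sketched with dict.setdefault. The default strings come from the text above; with_default_labels is a hypothetical helper name.

```python
from typing import Any, Dict, List

DEFAULT_Y_LABEL = "Energy (eV)"
DEFAULT_Z_LABEL = "Counts"


def with_default_labels(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical: fill y_label/z_label only where a row did not set them."""
    out: List[Dict[str, Any]] = []
    for row in rows:
        row = dict(row)  # copy so the caller's dictionaries are not mutated
        row.setdefault("y_label", DEFAULT_Y_LABEL)
        row.setdefault("z_label", DEFAULT_Z_LABEL)
        out.append(row)
    return out
```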

Intensity Percentile vs Energy Coverage

max_processing_percentile (FAST) controls the intensity percentile used for color scale maxima during global extrema computation. Energy (Y-axis) coverage remains fixed at 99% of cumulative positive samples and is not currently user configurable. The generic module defers to per-row vmin/vmax or on-the-fly percentile selection inside make_spectrogram when bounds are omitted.
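The two thresholds can be illustrated with NumPy as a sketch, not the module's implementation. It assumes "99% of cumulative positive samples" means the count of positive, finite samples per energy bin, accumulated from low to high energy, and that the energy maximum is the first bin where that running count reaches 99% of the total. intensity_max and energy_max_for_coverage are hypothetical names; the 95.0 default mirrors the max_percentile example above.

```python
import numpy as np


def intensity_max(z: np.ndarray, max_percentile: float = 95.0) -> float:
    """Color-scale maximum: a percentile of the positive, finite intensities."""
    z = np.asarray(z, dtype=float)
    vals = z[np.isfinite(z) & (z > 0)]
    return float(np.percentile(vals, max_percentile)) if vals.size else float("nan")


def energy_max_for_coverage(energies: np.ndarray, z: np.ndarray,
                            coverage: float = 0.99) -> float:
    """Lowest energy at which the cumulative positive-sample count reaches `coverage`.

    z has shape (n_times, n_energies); energies has shape (n_energies,).
    """
    energies = np.asarray(energies, dtype=float)
    z = np.asarray(z, dtype=float)
    order = np.argsort(energies)  # accumulate from low to high energy
    positive_per_bin = (np.isfinite(z) & (z > 0)).sum(axis=0)[order]
    cumulative = np.cumsum(positive_per_bin)
    total = cumulative[-1]
    if total == 0:
        return float("nan")
    # First bin whose running total meets the threshold (clipped for safety).
    idx = min(int(np.searchsorted(cumulative, coverage * total)), len(cumulative) - 1)
    return float(energies[order][idx])
```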