Batching and Buffered Logging¶
Overview¶
The FAST spectrogram batch utilities implement disk I/O reduction and resiliency through configurable batching for:
Global extrema JSON persistence
Plot progress JSON persistence
Buffered info logging
Key Parameters¶
flush_batch_size(default: 10)Controls how many orbits worth of successful extrema / progress updates are accumulated in-memory before writing JSON changes to disk. A final flush always occurs at normal program termination to avoid data loss.
log_flush_batch_size(default:flush_batch_size)Controls how many log records are buffered before being written. If unspecified, logging reuses
flush_batch_size. A forced flush occurs at shutdown.
Design Guarantees¶
At most
flush_batch_size - 1orbit extrema updates are lost if the process is interrupted unexpectedly (e.g., SIGKILL, power loss).Progress JSON only writes when new orbit work was completed or on forced flush to minimize unnecessary disk churn.
Buffered logging reduces filesystem sync frequency; logs are still flushed on completion or when the buffer reaches threshold.
Operational Notes¶
Setting
flush_batch_size=1restores per-orbit persistence (maximum safety, higher I/O volume).Very large batch sizes increase risk of lost in-memory work if the process terminates abnormally.
For long-running jobs, consider future enhancements like time-based flushes (not implemented yet) if extremely sparse orbit completion rates occur.
Error Categorization in Progress JSON¶
Per-instrument keys record error categories using the pattern:
{instrument}_{y_scale}_{z_scale}_error-{reason}
Common reasons include:
* timeout - worker exceeded allotted time
* invalid-cdf - structural or content issues in input CDF
* divide-by-zero - numerical domain error
* plotting - matplotlib/rendering failure
* generic - any uncategorized exception
Example Usage Snippet¶
from batch_multi_plot_FAST_spectrograms import FAST_plot_spectrograms_directory
FAST_plot_spectrograms_directory(
input_dir="FAST_data",
output_dir="FAST_plots",
flush_batch_size=10, # write extrema/progress every 10 orbits
log_flush_batch_size=20, # log buffer larger than JSON flush
)
Tuning Recommendations¶
SSD / local FS: batch size 20-50 often reduces overhead further.
Network / cloud drive: keep batch smaller (5-15) to limit at-risk data.
Debug sessions: use
flush_batch_size=1plus lower log batch for immediacy.Embedded / IDE environments: if keyboard shortcuts appear impaired, run with
install_signal_handlers=False(generic) or rely on default behavior in FAST (handlers are now restored automatically after completion).
Per-Instrument / Per-Row Overrides (FAST & Generic)¶
Both the FAST-specific and generic batch plotting pipelines accept row-level
axis overrides via dataset dictionaries. For FAST multi-instrument grids this
enables distinct energy (y_min/y_max) and intensity (z_min/z_max)
ranges for ees, eeb, ies, and ieb so that one instrument with a
high dynamic range does not wash out contrast for another.
Example applying precomputed extrema per instrument:
from batch_multi_plot_FAST_spectrograms import (
compute_global_extrema, FAST_plot_spectrograms_directory
)
extrema = compute_global_extrema(
directory_path="FAST_data",
y_scale="linear",
z_scale="log",
flush_batch_size=20,
)
# Each row in multi-instrument figures now receives its own limits
FAST_plot_spectrograms_directory(
directory_path="FAST_data",
y_scale="linear",
z_scale="log",
flush_batch_size=20,
log_flush_batch_size=20,
)
In addition, pitch-angle category grids now attach per-row energy and color scale bounds (when provided) so categories with narrower relevant ranges retain visual detail.
API Examples¶
Minimal Plotting Run¶
from batch_multi_plot_FAST_spectrograms import FAST_plot_spectrograms_directory
# Process all orbits with default batching (10) and logging
FAST_plot_spectrograms_directory(
input_dir="FAST_data", # path containing CDF files
output_dir="FAST_plots", # destination for generated figures
)
Custom Batching and Logging¶
FAST_plot_spectrograms_directory(
input_dir="FAST_data",
output_dir="FAST_plots",
y_scale="linear",
z_scale="log",
flush_batch_size=25, # write extrema/progress every 25 orbits
log_flush_batch_size=50, # less frequent log writes
max_workers=6, # increase parallelism
orbit_timeout_seconds=90, # orbit-level timeout
instrument_timeout_seconds=45,
)
Resuming After Interruption¶
If execution is interrupted, re-running with the same parameters resumes from the
last recorded orbit per scale combination (unless ignore_progress_json=True):
FAST_plot_spectrograms_directory(
input_dir="FAST_data",
output_dir="FAST_plots",
flush_batch_size=10,
) # continues where previous run left off
Forcing a Fresh Pass (ignoring prior progress):
FAST_plot_spectrograms_directory(
input_dir="FAST_data",
output_dir="FAST_plots",
ignore_progress_json=True, # disregard existing progress JSON
)
Computing / Refreshing Extrema Only¶
You can call the internal extrema routine directly to precompute ranges without plotting:
from batch_multi_plot_FAST_spectrograms import compute_global_extrema # if re-exported
# (If not publicly exported, call through the plotting module or duplicate logic.)
# When calling directly you can still specify the intensity percentile as 'max_percentile'.
# The energy (Y) coverage threshold remains fixed at 99% of cumulative positive samples.
state = compute_global_extrema(
directory_path="FAST_data",
y_scale="linear",
z_scale="log",
instrument_order=("ees", "eeb", "ies", "ieb"),
extrema_json_path="FAST_calculated_extrema.json",
max_percentile=95.0, # matches FAST_plot_spectrograms_directory default
flush_batch_size=20,
)
print("Updated extrema keys:", [k for k in state.keys() if k.endswith("_z_max")])
Tight Loop / Immediate Persistence¶
Set flush_batch_size=1 for maximum safety (higher I/O volume):
FAST_plot_spectrograms_directory(
input_dir="FAST_data",
output_dir="FAST_plots",
flush_batch_size=1, # per-orbit JSON writes
log_flush_batch_size=1, # immediate log emission
)
High-Throughput Bulk Mode¶
On fast local SSDs you can increase both batch sizes to reduce writes further:
FAST_plot_spectrograms_directory(
input_dir="FAST_data",
output_dir="FAST_plots",
flush_batch_size=50,
log_flush_batch_size=50,
max_workers=8,
max_processing_percentile=97.5, # use a different intensity percentile
)
Axis Label Conventions¶
Both FAST-specific and generic plotting pipelines default to labeling the
vertical (energy) axis as Energy (eV) and the colorbar as Counts. These
defaults are injected into each dataset row (y_label / z_label) and can
be overridden at dataset construction time to reflect alternative units (e.g.,
'Differential Energy Flux'). Centralizing units in the dataset dictionaries
maintains separation between physical semantics and the generic rendering code.
Intensity Percentile vs Energy Coverage¶
max_processing_percentile (FAST) controls the intensity percentile used for
color scale maxima during global extrema computation. Energy (Y-axis) coverage
remains fixed at 99% of cumulative positive samples and is not currently user
configurable. The generic module defers to per-row vmin/vmax or on-the-fly
percentile selection inside make_spectrogram when bounds are omitted.