Batching and Buffered Logging
=============================

Overview
--------
The FAST spectrogram batch utilities reduce disk I/O and improve resiliency
through configurable batching of:

* Global extrema JSON persistence
* Plot progress JSON persistence
* Buffered info logging

Key Parameters
--------------
``flush_batch_size`` (default: 10)
    Controls how many orbits' worth of successful extrema / progress updates
    are accumulated in memory before the JSON changes are written to disk. A
    final flush always occurs at normal program termination to avoid data loss.

``log_flush_batch_size`` (default: ``flush_batch_size``)
    Controls how many log records are buffered before being written. If
    unspecified, logging reuses ``flush_batch_size``. A forced flush occurs at
    shutdown.
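
The ``flush_batch_size`` rule can be pictured with a minimal sketch. ``BatchedJsonWriter`` is a hypothetical helper and is not part of the module. It merges one orbit's updates at a time and writes the JSON file once ``flush_batch_size`` orbits have accumulated. It skips the write when nothing new was recorded, and it performs a final flush when the ``with`` block exits. The temp-file-plus-``os.replace`` write is an assumed safety measure. It is not a documented behavior of the FAST utilities.

.. code-block:: python

    import json
    import os
    import tempfile


    class BatchedJsonWriter:
        """Hypothetical helper: accumulate per-orbit updates, persist every N orbits."""

        def __init__(self, path, flush_batch_size=10):
            if flush_batch_size < 1:
                raise ValueError("flush_batch_size must be >= 1")
            self.path = path
            self.flush_batch_size = flush_batch_size
            self.state = {}
            self.pending = 0  # orbits recorded since the last write

        def record_orbit(self, updates):
            """Merge one orbit's successful updates; write when the batch is full."""
            self.state.update(updates)
            self.pending += 1
            if self.pending >= self.flush_batch_size:
                self.flush()

        def flush(self, force=False):
            """Write state if there is unsaved work, or unconditionally if forced."""
            if self.pending == 0 and not force:
                return  # no new orbit work: skip the disk write
            directory = os.path.dirname(os.path.abspath(self.path))
            # Assumed safety measure: write a temp file, then swap it in atomically
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            with os.fdopen(fd, "w") as fh:
                json.dump(self.state, fh, indent=2)
            os.replace(tmp_path, self.path)
            self.pending = 0

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc, tb):
            self.flush()  # final flush when the with-block exits
            return False  # never swallow exceptions

With ``flush_batch_size=1`` this sketch degenerates to one write per orbit. That matches the per-orbit persistence mode described under Operational Notes.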

Design Guarantees
-----------------
* At most ``flush_batch_size - 1`` orbits' worth of extrema updates are lost if
  the process is interrupted unexpectedly (e.g., SIGKILL, power loss).
* The progress JSON is written only when new orbit work has completed or on a
  forced flush, minimizing unnecessary disk churn.
* Buffered logging reduces filesystem sync frequency; logs are still flushed on
  completion or when the buffer reaches its threshold.
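
Python's standard library already provides count-based log buffering through ``logging.handlers.MemoryHandler``. The sketch below uses it to approximate the ``log_flush_batch_size`` behavior. It is an assumption that the FAST utilities use this handler internally. ``ListHandler`` and ``make_buffered_logger`` are hypothetical names. ``ListHandler`` stands in for a file handler so that the batching is visible in memory.

.. code-block:: python

    import logging
    import logging.handlers


    class ListHandler(logging.Handler):
        """Stand-in for a file handler: keeps formatted messages in a list."""

        def __init__(self):
            super().__init__()
            self.messages = []

        def emit(self, record):
            self.messages.append(self.format(record))


    def make_buffered_logger(name, log_flush_batch_size, target):
        """Attach a MemoryHandler that forwards records to `target` in batches."""
        logger = logging.getLogger(name)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep records out of the root logger
        buffer = logging.handlers.MemoryHandler(
            capacity=log_flush_batch_size,  # flush once this many records are held
            flushLevel=logging.ERROR,       # ERROR and above flush immediately
            target=target,
            flushOnClose=True,              # forced flush at shutdown
        )
        logger.addHandler(buffer)
        return logger, buffer

``MemoryHandler`` does not rely on the count threshold alone. It also flushes immediately when a record at or above ``flushLevel`` arrives, so errors are not held back. Calling ``close()`` on the handler flushes whatever remains, and so does the ``logging.shutdown()`` call that runs at interpreter exit.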

Operational Notes
-----------------
* Setting ``flush_batch_size=1`` restores per-orbit persistence (maximum safety,
  higher I/O volume).
* Very large batch sizes increase the amount of in-memory work that can be lost
  if the process terminates abnormally.
* Flushing is count-based only; time-based flushes are not implemented yet. For
  long-running jobs where orbits complete very sparsely, unflushed updates can
  therefore remain in memory for a long time.
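
The loss bound from Design Guarantees follows from modular arithmetic. This sketch assumes that flushes happen exactly after every ``flush_batch_size`` completed orbits, counted from the start of the run. Under that assumption, the orbits not yet persisted are the remainder of the completed count. For example, with ``flush_batch_size=10`` a crash after orbit 27 loses orbits 21-27. ``orbits_at_risk`` is a hypothetical name.

.. code-block:: python

    def orbits_at_risk(completed_orbits, flush_batch_size):
        """Orbits finished since the last batch write (lost on SIGKILL or power loss)."""
        if flush_batch_size < 1:
            raise ValueError("flush_batch_size must be >= 1")
        return completed_orbits % flush_batch_size


    # The worst case over a run equals flush_batch_size - 1, the documented bound
    for batch in (1, 10, 50):
        worst = max(orbits_at_risk(n, batch) for n in range(1000))
        print(f"flush_batch_size={batch:>2}: at most {worst} orbits unsaved")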

Error Categorization in Progress JSON
-------------------------------------
Per-instrument keys record error categories using the pattern:
``{instrument}_{y_scale}_{z_scale}_error-{reason}``

Common reasons include:

* ``timeout`` - worker exceeded allotted time
* ``invalid-cdf`` - structural or content issues in input CDF
* ``divide-by-zero`` - numerical domain error
* ``plotting`` - matplotlib/rendering failure
* ``generic`` - any uncategorized exception
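
The key pattern can be reproduced with a short sketch that maps an exception to one of the reason strings above. The exception-to-reason mapping is an assumption. ``InvalidCDFError`` and ``PlottingError`` are hypothetical exception classes, because this document does not say how the real pipeline detects CDF or rendering failures.

.. code-block:: python

    class InvalidCDFError(Exception):
        """Hypothetical: structural or content problem in an input CDF."""


    class PlottingError(Exception):
        """Hypothetical: matplotlib/rendering failure."""


    def error_reason(exc):
        """Map an exception to a documented reason string (assumed mapping)."""
        if isinstance(exc, TimeoutError):
            return "timeout"
        if isinstance(exc, InvalidCDFError):
            return "invalid-cdf"
        if isinstance(exc, ZeroDivisionError):
            return "divide-by-zero"
        if isinstance(exc, PlottingError):
            return "plotting"
        return "generic"  # anything uncategorized


    def error_key(instrument, y_scale, z_scale, exc):
        """Build the progress-JSON key {instrument}_{y_scale}_{z_scale}_error-{reason}."""
        return f"{instrument}_{y_scale}_{z_scale}_error-{error_reason(exc)}"


    print(error_key("ees", "linear", "log", TimeoutError()))  # ees_linear_log_error-timeout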

Example Usage Snippet
---------------------
.. code-block:: python

    from batch_multi_plot_FAST_spectrograms import FAST_plot_spectrograms_directory

    FAST_plot_spectrograms_directory(
        input_dir="FAST_data",
        output_dir="FAST_plots",
        flush_batch_size=10,          # write extrema/progress every 10 orbits
        log_flush_batch_size=20,      # log buffer larger than JSON flush
    )

Tuning Recommendations
----------------------
* SSD / local FS: batch size 20-50 often reduces overhead further.
* Network / cloud drive: keep batch smaller (5-15) to limit at-risk data.
* Debug sessions: use ``flush_batch_size=1`` together with a small
  ``log_flush_batch_size`` so progress and log output appear immediately.
* Embedded / IDE environments: if keyboard shortcuts appear impaired, pass
  ``install_signal_handlers=False`` to the generic pipeline. The FAST pipeline
  needs no change, because it restores the original signal handlers
  automatically after completion.
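
The recommendations above can be kept as named presets and unpacked into the call. ``TUNING_PRESETS`` and ``batching_kwargs`` are hypothetical names. The exact numbers are illustrative picks from inside the recommended ranges and are not tested optima.

.. code-block:: python

    # Hypothetical presets; each value sits inside the ranges recommended above
    TUNING_PRESETS = {
        "local_ssd": {"flush_batch_size": 30, "log_flush_batch_size": 30},
        "network": {"flush_batch_size": 10, "log_flush_batch_size": 10},
        "debug": {"flush_batch_size": 1, "log_flush_batch_size": 1},
    }


    def batching_kwargs(profile):
        """Return a fresh copy of the batching keyword arguments for a profile."""
        if profile not in TUNING_PRESETS:
            choices = ", ".join(sorted(TUNING_PRESETS))
            raise ValueError(f"unknown profile {profile!r}; expected one of: {choices}")
        return dict(TUNING_PRESETS[profile])


    # Usage (requires the FAST module):
    # FAST_plot_spectrograms_directory(
    #     input_dir="FAST_data", output_dir="FAST_plots", **batching_kwargs("network")
    # )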

Related API Docstrings
----------------------
Full parameter and behavior details are in the docstrings for:

* ``compute_global_extrema``
* ``FAST_plot_spectrograms_directory``
* ``FAST_plot_instrument_grid`` (per-instrument overrides)

Per-Instrument / Per-Row Overrides (FAST & Generic)
---------------------------------------------------
Both the FAST-specific and generic batch plotting pipelines accept **row-level
axis overrides** via dataset dictionaries. For FAST multi-instrument grids this
enables distinct energy (``y_min``/``y_max``) and intensity (``z_min``/``z_max``)
ranges for ``ees``, ``eeb``, ``ies``, and ``ieb`` so that one instrument with a
high dynamic range does not wash out contrast for another.
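
Conceptually, a row-level override is a merge of per-instrument limits into each row's dictionary. The sketch below assumes each row dict carries an ``"instrument"`` key. The real dataset layout may differ. ``apply_row_overrides`` is a hypothetical helper, and the numeric limits are placeholders.

.. code-block:: python

    ALLOWED_OVERRIDES = {"y_min", "y_max", "z_min", "z_max"}


    def apply_row_overrides(rows, overrides):
        """Return copies of dataset rows with per-instrument axis limits merged in.

        rows: list of dicts, each assumed to carry an "instrument" key.
        overrides: {instrument: {"y_min": ..., "z_max": ...}}
        """
        merged = []
        for row in rows:
            new_row = dict(row)  # never mutate the caller's dataset
            limits = overrides.get(row["instrument"], {})
            unknown = set(limits) - ALLOWED_OVERRIDES
            if unknown:
                raise KeyError(f"unsupported override keys: {sorted(unknown)}")
            new_row.update(limits)  # only this instrument's row changes
            merged.append(new_row)
        return merged


    rows = [{"instrument": "ees"}, {"instrument": "ies"}]
    overrides = {"ees": {"z_min": 1e3, "z_max": 1e9}, "ies": {"y_max": 2.5e4}}
    for row in apply_row_overrides(rows, overrides):
        print(row)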

Example applying precomputed extrema per instrument:

.. code-block:: python

  from batch_multi_plot_FAST_spectrograms import (
    compute_global_extrema, FAST_plot_spectrograms_directory
  )

  extrema = compute_global_extrema(
    directory_path="FAST_data",
    y_scale="linear",
    z_scale="log",
    flush_batch_size=20,
  )

  # Each row in multi-instrument figures now receives its own limits
  FAST_plot_spectrograms_directory(
    input_dir="FAST_data",
    output_dir="FAST_plots",
    y_scale="linear",
    z_scale="log",
    flush_batch_size=20,
    log_flush_batch_size=20,
  )

In addition, pitch-angle category grids now attach per-row energy and color
scale bounds (when provided) so categories with narrower relevant ranges retain
visual detail.

API Examples
------------

Minimal Plotting Run
~~~~~~~~~~~~~~~~~~~~
.. code-block:: python

  from batch_multi_plot_FAST_spectrograms import FAST_plot_spectrograms_directory

  # Process all orbits with default batching (10) and logging
  FAST_plot_spectrograms_directory(
    input_dir="FAST_data",      # path containing CDF files
    output_dir="FAST_plots",    # destination for generated figures
  )

Custom Batching and Logging
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python

  FAST_plot_spectrograms_directory(
    input_dir="FAST_data",
    output_dir="FAST_plots",
    y_scale="linear",
    z_scale="log",
    flush_batch_size=25,          # write extrema/progress every 25 orbits
    log_flush_batch_size=50,      # less frequent log writes
    max_workers=6,                # increase parallelism
    orbit_timeout_seconds=90,     # orbit-level timeout
    instrument_timeout_seconds=45,
  )

Resuming After Interruption
~~~~~~~~~~~~~~~~~~~~~~~~~~~
If execution is interrupted, re-running with the same parameters resumes from the
last recorded orbit per scale combination (unless ``ignore_progress_json=True``):

.. code-block:: python

  FAST_plot_spectrograms_directory(
    input_dir="FAST_data",
    output_dir="FAST_plots",
    flush_batch_size=10,
  )  # continues where previous run left off

To force a fresh pass that ignores prior progress:

.. code-block:: python

  FAST_plot_spectrograms_directory(
    input_dir="FAST_data",
    output_dir="FAST_plots",
    ignore_progress_json=True,  # disregard existing progress JSON
  )
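
The resume rule can be illustrated with a simple filter over orbit numbers. The sketch rests on several assumptions. Orbits are integers processed in increasing order. The progress JSON stores a last-completed orbit per scale combination. The key name ``{y_scale}_{z_scale}_last_orbit`` is hypothetical, and the real schema may differ. ``orbits_to_process`` is a hypothetical helper.

.. code-block:: python

    def orbits_to_process(all_orbits, progress, y_scale, z_scale,
                          ignore_progress_json=False):
        """Hypothetical resume filter for one (y_scale, z_scale) combination."""
        if ignore_progress_json:
            return sorted(all_orbits)  # fresh pass: process everything
        # Assumed key layout; the real progress JSON schema may differ
        last_done = progress.get(f"{y_scale}_{z_scale}_last_orbit")
        if last_done is None:
            return sorted(all_orbits)  # no record yet for this scale combination
        return sorted(orbit for orbit in all_orbits if orbit > last_done)


    progress = {"linear_log_last_orbit": 3}
    print(orbits_to_process([1, 2, 3, 4, 5], progress, "linear", "log"))  # [4, 5]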

Computing / Refreshing Extrema Only
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Call ``compute_global_extrema`` directly to precompute or refresh ranges without plotting:

.. code-block:: python

  from batch_multi_plot_FAST_spectrograms import compute_global_extrema

  # The intensity percentile is set via 'max_percentile'; the energy (Y)
  # coverage threshold stays fixed at 99% of cumulative positive samples.
  state = compute_global_extrema(
    directory_path="FAST_data",
    y_scale="linear",
    z_scale="log",
    instrument_order=("ees", "eeb", "ies", "ieb"),
    extrema_json_path="FAST_calculated_extrema.json",
    max_percentile=95.0,      # matches FAST_plot_spectrograms_directory default
    flush_batch_size=20,
  )
  print("Updated extrema keys:", [k for k in state.keys() if k.endswith("_z_max")])

Tight Loop / Immediate Persistence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Set ``flush_batch_size=1`` for maximum safety (higher I/O volume):

.. code-block:: python

  FAST_plot_spectrograms_directory(
    input_dir="FAST_data",
    output_dir="FAST_plots",
    flush_batch_size=1,       # per-orbit JSON writes
    log_flush_batch_size=1,   # immediate log emission
  )

High-Throughput Bulk Mode
~~~~~~~~~~~~~~~~~~~~~~~~~
On fast local SSDs you can increase both batch sizes to reduce writes further:

.. code-block:: python

  FAST_plot_spectrograms_directory(
    input_dir="FAST_data",
    output_dir="FAST_plots",
    flush_batch_size=50,
    log_flush_batch_size=50,
    max_workers=8,
    max_processing_percentile=97.5,  # use a different intensity percentile
  )

Axis Label Conventions
----------------------
Both FAST-specific and generic plotting pipelines default to labeling the
vertical (energy) axis as ``Energy (eV)`` and the colorbar as ``Counts``. These
defaults are injected into each dataset row (``y_label`` / ``z_label``) and can
be overridden at dataset construction time to reflect alternative units (e.g.,
``'Differential Energy Flux'``). Centralizing units in the dataset dictionaries
maintains separation between physical semantics and the generic rendering code.
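
Injecting defaults without overriding caller-supplied labels maps naturally onto ``dict.setdefault``. This is a minimal sketch. ``with_default_labels`` is a hypothetical helper, and the pipelines may inject labels differently.

.. code-block:: python

    DEFAULT_Y_LABEL = "Energy (eV)"
    DEFAULT_Z_LABEL = "Counts"


    def with_default_labels(row):
        """Copy a dataset row, filling y_label/z_label only where absent."""
        labeled = dict(row)
        labeled.setdefault("y_label", DEFAULT_Y_LABEL)
        labeled.setdefault("z_label", DEFAULT_Z_LABEL)
        return labeled


    plain = with_default_labels({"instrument": "ees"})
    flux = with_default_labels({"instrument": "ees", "z_label": "Differential Energy Flux"})
    print(plain["z_label"], "|", flux["z_label"])  # prints: Counts | Differential Energy Flux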

Intensity Percentile vs Energy Coverage
---------------------------------------
``max_processing_percentile`` (FAST) controls the intensity percentile used for
color scale maxima during global extrema computation. Energy (Y-axis) coverage
remains fixed at 99% of cumulative positive samples and is not currently user
configurable. The generic module uses per-row ``vmin``/``vmax`` when provided
and otherwise falls back to on-the-fly percentile selection inside
``make_spectrogram``.
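
The sketch below shows one plausible reading of the two rules. Several details are assumptions, not the module's documented internals. Only positive intensities are considered. The percentile uses linear interpolation, which is NumPy's default method. Each positive sample counts once toward energy coverage. The energy maximum is the smallest energy at which the cumulative positive-sample count reaches 99%. ``intensity_percentile`` and ``energy_coverage_max`` are hypothetical names.

.. code-block:: python

    import math


    def intensity_percentile(values, q):
        """Linear-interpolation percentile over positive intensities only."""
        data = sorted(v for v in values if v > 0)  # log color scales ignore <= 0
        if not data:
            raise ValueError("no positive intensities")
        pos = (len(data) - 1) * q / 100.0
        lower, upper = math.floor(pos), math.ceil(pos)
        return data[lower] + (data[upper] - data[lower]) * (pos - lower)


    def energy_coverage_max(energies, intensities, coverage=0.99):
        """Smallest energy whose cumulative positive-sample count reaches coverage."""
        counts = {}
        for energy, value in zip(energies, intensities):
            if value > 0:
                counts[energy] = counts.get(energy, 0) + 1
        total = sum(counts.values())
        if total == 0:
            return None
        running = 0
        for energy in sorted(counts):
            running += counts[energy]
            if running >= coverage * total:
                return energy
        return max(counts)  # only reached if coverage > 1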