ise.data

Data loading, processing, and feature engineering for ice sheet emulation.

This package covers the full data pipeline:

  • ForcingFile / GridFile — load and sector-average climate forcing and grid NetCDF files.

  • ISEFlowAISInputs / ISEFlowGrISInputs — validated input dataclasses for running pretrained ISEFlow emulators.

  • FeatureEngineer — train/val/test splitting, scaling, lag variables, outlier handling, and ISM characteristic merging.

  • ProjectionProcessor / DatasetMerger — IVAF calculation from raw ISMIP6 NetCDF outputs and sector-level forcing/projection merging.

  • EmulatorDataset / PyTorchDataset / TSDataset / ScenarioDataset — PyTorch Dataset subclasses for LSTM and normalizing-flow training.

  • StandardScaler / RobustScaler / LogScaler — GPU-compatible nn.Module scalers for use in training loops.

Submodules

ise.data.anomaly

Climatology-based anomaly conversion for ISEFlow inputs.

The ISEFlow models are trained on forcing anomalies (departures from a historical baseline), not raw absolute values. This module provides AnomalyConverter — a lightweight class that looks up the pre-extracted ISMIP6 climatological baselines (stored in data_files/) and subtracts them from user-supplied raw time-series arrays to produce the anomaly arrays expected by ISEFlowAISInputs and ISEFlowGrISInputs.

Supported ice sheets

AIS:

Atmospheric variables pr, evspsbl, smb, ts → anomalies. All inputs are in kg m⁻² s⁻¹ (pr / evspsbl / smb / mrro) or K (ts), matching the ISMIP6 atmospheric forcing file convention. The baseline is the 1995-2014 spatial mean over each AIS sector (AIS_atmos_climatologies.csv). Anomaly outputs retain the same units as the inputs.

GrIS:

Atmospheric variables smb, st → anomalies. Raw inputs are expected in mm w.e. yr⁻¹ (smb) and °C (st), matching the MAR 3.9 Reference file convention (1960-1989 long-term mean, GrIS_atmos_climatologies.csv). The output aSMB anomaly is automatically converted to kg m⁻² s⁻¹ — the units used in the ISMIP6 aSMB forcing files and in the ISEFlow training data. aST is returned in °C.
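The unit conversion described above is plain arithmetic, because 1 mm of water equivalent spread over 1 m² has a mass of 1 kg, so only the time unit changes. The sketch below illustrates it under the assumption of a 365-day year; the constant the package actually uses is not stated here, and the helper name and sample values are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year (31,536,000 s)


def mm_we_per_year_to_kg_m2_s(value_mm_we_yr: float) -> float:
    """Convert mm w.e. yr^-1 to kg m^-2 s^-1 (hypothetical helper).

    1 mm w.e. over 1 m^2 equals 1 kg m^-2, so we only divide by seconds per year.
    """
    return value_mm_we_yr / SECONDS_PER_YEAR


raw_smb = 150.0    # hypothetical raw SMB, mm w.e. yr^-1
baseline = 250.0   # hypothetical baseline mean, mm w.e. yr^-1
anomaly_mm = raw_smb - baseline                      # anomaly in mm w.e. yr^-1
anomaly_si = mm_we_per_year_to_kg_m2_s(anomaly_mm)   # anomaly in kg m^-2 s^-1
```

Subtracting first and converting afterwards keeps the baseline in the same units as the bundled climatology CSV.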

Variables that are not anomalies (passed through unchanged):

AIS: ocean_thermal_forcing (°C), ocean_salinity (PSU), ocean_temperature (°C)

GrIS: ocean_thermal_forcing (°C), basin_runoff (m yr⁻¹)

Usage — AIS

With a bundled ISMIP6 climatology:

converter = AnomalyConverter("AIS")
anomalies = converter.compute_ais(
    aogcm="noresm1-m_rcp85",
    sector=10,
    pr=pr_array,           # kg m⁻² s⁻¹
    evspsbl=evspsbl_array, # kg m⁻² s⁻¹
    smb=smb_array,         # kg m⁻² s⁻¹
    ts=ts_array,           # K
)
# anomalies = {"pr_anomaly":      ...,   # kg m⁻² s⁻¹
#              "evspsbl_anomaly":  ...,   # kg m⁻² s⁻¹
#              "smb_anomaly":      ...,   # kg m⁻² s⁻¹
#              "ts_anomaly":       ...}   # K

With a user-supplied climatology (e.g. a new CMIP model not in ISMIP6):

converter = AnomalyConverter("AIS")
anomalies = converter.compute_ais(
    sector=10,
    pr=pr_array,
    evspsbl=evspsbl_array,
    smb=smb_array,
    ts=ts_array,
    custom_climatology={       # 1995-2014 absolute means, same units as inputs
        "pr":      1.3e-5,     # kg m⁻² s⁻¹
        "evspsbl": 4e-6,       # kg m⁻² s⁻¹
        "smb":     9e-6,       # kg m⁻² s⁻¹
        "ts":      253.7,      # K
    },
)

Usage — GrIS

With a bundled ISMIP6 climatology:

converter = AnomalyConverter("GrIS")
anomalies = converter.compute_gris(
    aogcm="hadgem2-es_rcp85",
    sector=1,
    smb=smb_array,  # absolute SMB in mm w.e. yr⁻¹  (MAR Reference units)
    st=st_array,    # absolute surface temperature in °C  (MAR Reference units)
)
# anomalies = {"aSMB": ...,   # SMB anomaly in kg m⁻² s⁻¹  (model training units)
#              "aST":  ...}   # surface temperature anomaly in °C

With a user-supplied climatology:

converter = AnomalyConverter("GrIS")
anomalies = converter.compute_gris(
    sector=1,
    smb=smb_array,
    st=st_array,
    custom_climatology={   # 1960-1989 MAR absolute baseline means
        "smb": -241.2,     # mm w.e. yr⁻¹
        "st":  -22.8,      # °C
    },
)
class ise.data.anomaly.AnomalyConverter(ice_sheet: str)[source]

Bases: object

Convert raw absolute forcing arrays to anomalies using ISMIP6 climatologies.

Parameters:

ice_sheet (str) – 'AIS' or 'GrIS'.

ice_sheet

Ice sheet identifier ('AIS' or 'GrIS').

Type:

str

climatology

The loaded climatology table for the selected ice sheet.

Type:

pd.DataFrame

property climatology: DataFrame

Return the climatology DataFrame, loading it on first access.

compute_ais(sector: int, pr: ndarray, evspsbl: ndarray, smb: ndarray, ts: ndarray, aogcm: str | None = None, custom_climatology: dict | None = None, mrro: ndarray | None = None) dict[source]

Compute AIS atmospheric anomalies from raw annual time-series arrays.

Subtracts the 1995-2014 ISMIP6 climatological baseline for the given AOGCM and sector from each raw input array. All anomaly outputs retain the same units as the corresponding inputs.

Exactly one of aogcm (use bundled ISMIP6 climatology) or custom_climatology (user-supplied baseline scalars) must be provided.

Parameters:
  • sector (int) – AIS drainage sector number (1-18).

  • pr (np.ndarray) – Raw precipitation time series (86 values, kg m⁻² s⁻¹).

  • evspsbl (np.ndarray) – Raw evaporation/sublimation time series (86 values, kg m⁻² s⁻¹).

  • smb (np.ndarray) – Raw surface mass balance time series (86 values, kg m⁻² s⁻¹).

  • ts (np.ndarray) – Raw surface temperature time series (86 values, K).

  • aogcm (str, optional) – AOGCM name to look up in the bundled climatology. Common alternate spellings are normalised automatically (e.g. 'NorESM1-M_rcp8.5' → 'noresm1-m_rcp85').

  • custom_climatology (dict, optional) – User-supplied 1995-2014 absolute baseline means for a CMIP model not in ISMIP6. Must contain keys 'pr' (kg m⁻² s⁻¹), 'evspsbl' (kg m⁻² s⁻¹), 'smb' (kg m⁻² s⁻¹), 'ts' (K), and optionally 'mrro' (kg m⁻² s⁻¹) if mrro is provided.

  • mrro (np.ndarray, optional) – Raw runoff time series (86 values, kg m⁻² s⁻¹). Required only for ISEFlow v1.0.0; not used by v1.1.0.

Returns:

Keys 'pr_anomaly', 'evspsbl_anomaly', 'smb_anomaly', 'ts_anomaly' as 86-element numpy arrays. Units match the inputs: kg m⁻² s⁻¹ for pr / evspsbl / smb, K for ts. 'mrro_anomaly' (kg m⁻² s⁻¹) is included when mrro is provided and a baseline is available for the requested AOGCM.

Return type:

dict

Raises:

ValueError – If neither or both of aogcm / custom_climatology are given, or if array lengths are not 86.
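The two checks and the subtraction described for compute_ais can be illustrated with a small sketch. It is not the package's implementation: the helper names are hypothetical, and only the documented rules (exactly one baseline source, 86 values per series, scalar baseline subtracted) are taken from the text above.

```python
import numpy as np

N_YEARS = 86  # annual values for 2015-2100 inclusive


def check_baseline_source(aogcm=None, custom_climatology=None):
    """Raise unless exactly one baseline source is given (hypothetical helper)."""
    if (aogcm is None) == (custom_climatology is None):
        raise ValueError("Provide exactly one of aogcm or custom_climatology.")


def to_anomaly(raw, baseline_mean):
    """Subtract a scalar baseline mean from an 86-value series (hypothetical helper)."""
    raw = np.asarray(raw, dtype=float)
    if raw.shape != (N_YEARS,):
        raise ValueError(f"Expected {N_YEARS} values, got shape {raw.shape}.")
    return raw - baseline_mean


custom = {"ts": 253.7}                      # K, illustrative baseline mean
check_baseline_source(custom_climatology=custom)
ts = np.full(N_YEARS, 255.0)                # hypothetical raw series, K
ts_anomaly = to_anomaly(ts, custom["ts"])   # about +1.3 K in every year
```

Because the baseline is a scalar per variable, the anomaly keeps the units and the shape of the input series.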

compute_gris(sector: int, smb: ndarray, st: ndarray, aogcm: str | None = None, custom_climatology: dict | None = None) dict[source]

Compute GrIS atmospheric anomalies from raw annual time-series arrays.

Subtracts the 1960-1989 MAR long-term mean for the given AOGCM and sector from each raw input array, then converts the SMB anomaly from mm w.e. yr⁻¹ to kg m⁻² s⁻¹ to match the units used in the ISMIP6 aSMB forcing files and in the ISEFlow training data.

Exactly one of aogcm (use bundled ISMIP6 climatology) or custom_climatology (user-supplied baseline scalars) must be provided.

Parameters:
  • sector (int) – GrIS drainage basin number (1-6).

  • smb (np.ndarray) – Raw (absolute) surface mass balance time series (86 values, mm w.e. yr⁻¹, matching the MAR 3.9 Reference file convention). Typical range: −2000 to +200 mm w.e. yr⁻¹ depending on sector. The output aSMB is automatically converted to kg m⁻² s⁻¹.

  • st (np.ndarray) – Raw (absolute) surface temperature time series (86 values, °C, matching the MAR 3.9 Reference file convention).

  • aogcm (str, optional) – AOGCM name to look up in the bundled climatology. Common alternate spellings are normalised automatically.

  • custom_climatology (dict, optional) – User-supplied 1960-1989 MAR absolute baseline means for a CMIP model not in ISMIP6. Must contain keys 'smb' (mm w.e. yr⁻¹) and 'st' (°C).

Returns:

{'aSMB': ..., 'aST': ...} as 86-element numpy arrays.

  • aSMB: SMB anomaly in kg m⁻² s⁻¹, matching the units of the ISMIP6 aSMB forcing files and the ISEFlow training data.

  • aST: surface temperature anomaly in °C.

Variable names match ISEFlowGrISInputs field names.

Return type:

dict

Raises:

ValueError – If neither or both of aogcm / custom_climatology are given, or if array lengths are not 86.

get_climatology(aogcm: str, sector: int) dict[source]

Return the climatological mean values for a given AOGCM and sector.

Parameters:
  • aogcm (str) – Canonical AOGCM name (see list_aogcms()). Common alternate spellings are normalised automatically.

  • sector (int) – Sector / drainage basin number.

Returns:

Variable name → scalar climatological mean for the baseline period. AIS units: kg m⁻² s⁻¹ (pr / evspsbl / smb / mrro), K (ts). GrIS units: mm w.e. yr⁻¹ (smb), °C (st).

Return type:

dict

Raises:

KeyError – If aogcm is not found in the bundled climatology.
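As an illustration of what name normalisation followed by a lookup might look like, here is a minimal sketch. The actual normalisation rules are not documented here; the rule below (lowercase, then drop dots from the scenario suffix) is an assumption chosen to reproduce the 'NorESM1-M_rcp8.5' example, and the in-memory table is hypothetical.

```python
def normalise_aogcm(name: str) -> str:
    """Lowercase the name and drop dots from the scenario suffix (assumed rule)."""
    name = name.strip().lower()
    model, sep, scenario = name.rpartition("_")
    if not sep:  # no scenario suffix to clean up
        return name
    return f"{model}_{scenario.replace('.', '')}"


# Hypothetical table keyed by (aogcm, sector); the value is illustrative only.
CLIMATOLOGY = {("noresm1-m_rcp85", 10): {"ts": 253.7}}


def get_climatology(aogcm: str, sector: int) -> dict:
    """Look up baseline means after normalising the AOGCM name."""
    key = (normalise_aogcm(aogcm), sector)
    if key not in CLIMATOLOGY:
        raise KeyError(f"No climatology for {key}.")
    return CLIMATOLOGY[key]


baseline = get_climatology("NorESM1-M_rcp8.5", 10)
```

Normalising before the lookup means the KeyError is raised only for names that are truly absent, not for spelling variants.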

list_aogcms() list[str][source]

Return the list of AOGCM names available in the bundled climatology.

ise.data.forcings

NetCDF climate forcing file loading and sector aggregation.

ForcingFile wraps a single ISMIP6 atmospheric or oceanic forcing NetCDF and provides a chainable API for loading, cleaning, depth-aggregating, sector assigning, and spatially averaging the data into the per-sector time series required by the ISEFlow training pipeline.

Supported ice sheets

AIS:

Atmospheric variables pr, evspsbl, smb, ts and oceanic variables thermal_forcing, salinity, temperature.

GrIS:

Atmospheric variables aSMB, aST and oceanic variables thermal_forcing, basin_runoff.

Typical workflow

from ise.data.grids import GridFile
from ise.data.forcings import ForcingFile

gridfile = GridFile("AIS", "AIS_sectors_8km.nc")
gridfile.format_grids()

forcing = ForcingFile("AIS", realm="atmos", filepath="pr_AIS_noresm1-m_rcp85.nc")
forcing.load(decode_times=False)
forcing.format_timestamps()
forcing.drop_vars(["lat", "lon", "mapping"])
forcing.assign_sectors(gridfile)
sector_df = forcing.average_over_sector(sector_number=10).to_dataframe()

Ocean realm requires depth aggregation before sector assignment:

ocean = ForcingFile("AIS", realm="ocean", filepath="thermal_forcing.nc",
                    varname="thermal_forcing")
ocean.load(decode_times=False)
ocean.format_timestamps()
ocean.aggregate_depth(method="mean")
ocean.assign_sectors(gridfile)
tf_df = ocean.average_over_sector(sector_number=10).to_dataframe()

These steps are orchestrated automatically by process_AIS_atmospheric_sectors(), process_AIS_oceanic_sectors(), and their GrIS counterparts in ise.data.process.

class ise.data.forcings.ForcingFile(ice_sheet: str, realm: str, filepath: str, varname: str | None = None)[source]

Bases: object

Wrapper for loading and processing climate forcing NetCDF files.

Supports atmospheric and oceanic realms, sector assignment, depth aggregation (ocean), and sector-averaged time series.

Parameters:
  • ice_sheet (str) – Ice sheet identifier (‘AIS’ or ‘GrIS’).

  • realm (str) – Forcing realm (‘atmos’ or ‘ocean’).

  • filepath (str) – Path to the NetCDF forcing file.

  • varname (str, optional) – Name of the data variable. Defaults to None (first data var).

ice_sheet

Ice sheet identifier.

Type:

str

realm

Forcing realm.

Type:

str

filepath

Path to the file.

Type:

str

data

Loaded dataset after load().

Type:

xarray.Dataset or None

sector_averages

Sector-averaged data after average_over_sector().

Type:

xarray.Dataset or None

sectors

Sector IDs after assign_sectors().

Type:

numpy.ndarray or None

varname

Data variable name.

Type:

str or None

aggregate_depth(method='mean')[source]

Aggregate over the depth dimension (ocean realm only).

Parameters:

method (str) – ‘mean’ or ‘sum’. Defaults to ‘mean’.

Returns:

The dataset with depth aggregated.

Return type:

xarray.Dataset

Raises:

ValueError – If realm is not ‘ocean’, data not loaded, or no ‘z’ dimension.
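Depth aggregation collapses the 'z' dimension into one value per time step and grid cell. The NumPy sketch below shows the idea on a bare array; the function name, the axis argument, and the choice to skip NaN values are assumptions for illustration, not the method's documented behaviour.

```python
import numpy as np


def aggregate_depth(values, method="mean", depth_axis=1):
    """Collapse the depth axis of an array, skipping NaNs (hypothetical helper)."""
    if method == "mean":
        return np.nanmean(values, axis=depth_axis)
    if method == "sum":
        return np.nansum(values, axis=depth_axis)
    raise ValueError("method must be 'mean' or 'sum'")


# Two time steps, three depth levels; one level is missing in the second step.
thermal_forcing = np.array([[1.0, 2.0, 3.0],
                            [2.0, np.nan, 4.0]])
depth_mean = aggregate_depth(thermal_forcing, method="mean")  # one value per time step
depth_sum = aggregate_depth(thermal_forcing, method="sum")
```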

assign_sectors(sectors: ndarray | GridFile) Dataset[source]

Assign sector IDs to the dataset (e.g. from a GridFile).

Parameters:

sectors (numpy.ndarray or GridFile) – Sector IDs or GridFile to get sectors from.

Returns:

The dataset with sector coordinate.

Return type:

xarray.Dataset

Raises:

ValueError – If data is not loaded.

average_over_sector(sector_number: int | None = None) Dataset[source]

Average data over the grid cells within a single sector. Averaging over all sectors in one call is not implemented.

Parameters:

sector_number (int, optional) – Sector ID to average over. Defaults to None, which currently raises NotImplementedError.

Returns:

Sector-averaged data.

Return type:

xarray.Dataset

Raises:
  • ValueError – If data not loaded or sectors not assigned.

  • NotImplementedError – If sector_number is None (averaging all sectors at once).
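Conceptually, sector averaging reduces each time slice to the mean over the grid cells whose sector ID matches. The sketch below shows this with plain NumPy arrays rather than an xarray.Dataset; the function name and the tiny 2×2 grid are hypothetical, and an unweighted mean is assumed.

```python
import numpy as np


def average_over_sector(data, sectors, sector_number):
    """Mean over cells in one sector for each time step (hypothetical helper).

    data:    array of shape (time, x, y)
    sectors: integer sector IDs of shape (x, y)
    """
    mask = sectors == sector_number
    if not mask.any():
        raise ValueError(f"No grid cells belong to sector {sector_number}.")
    # Boolean indexing over the two spatial axes yields shape (time, n_cells).
    return data[:, mask].mean(axis=1)


data = np.arange(8, dtype=float).reshape(2, 2, 2)  # two time steps on a 2x2 grid
sectors = np.array([[10, 10],
                    [11, 10]])
sector_10 = average_over_sector(data, sectors, 10)  # one value per time step
```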

drop_vars(vars: list[str]) Dataset[source]

Drop dimensions or variables from the loaded dataset.

Parameters:

vars (List[str]) – Names of dimensions or variables to drop.

Returns:

The dataset (modified in place).

Return type:

xarray.Dataset

format_timestamps() Dataset[source]

Convert and subset time coordinate to 2015-2100 (86 years).

Returns:

The dataset with formatted time.

Return type:

xarray.Dataset

get_data() Dataset[source]

Return the loaded dataset.

load(filepath: str | None = None, validate=True, **kwargs) Dataset[source]

Load the forcing dataset from the NetCDF file.

Parameters:
  • filepath (str, optional) – Override path. Defaults to self.filepath.

  • validate (bool, optional) – Whether to validate (non-NaN data). Defaults to True.

  • **kwargs – Passed to xarray.open_dataset.

Returns:

The loaded dataset.

Return type:

xarray.Dataset

ise.data.grids

NetCDF sector-definition grid file loading and formatting.

GridFile wraps the ice-sheet sector boundary grids used to assign each spatial grid cell to a drainage sector (AIS: 18 sectors; GrIS: 6 drainage basins). The sector array it exposes is consumed by ForcingFile.assign_sectors() during the data processing pipeline.

Grid files expected

AIS:

AIS_sectors_8km.nc — sector variable named 'sectors'.

GrIS:

GrIS_Basins_Rignot_sectors_5km.nc — sector variable named 'ID'.

Typical workflow

Sector grids need a time dimension that matches the forcing data (86 years) before they can be broadcast alongside a forcing xarray.Dataset. The format_grids() convenience method handles the three required steps:

from ise.data.grids import GridFile

gridfile = GridFile("AIS", filepath="AIS_sectors_8km.nc")
gridfile.format_grids()           # load → expand time to 86 → align dims
sectors = gridfile.get_sectors()  # xr.DataArray of shape (time, x, y)

To perform steps individually (e.g. for a custom time length):

gridfile = GridFile("GrIS", filepath="GrIS_Basins_Rignot_sectors_5km.nc")
gridfile.load()
gridfile.expand_dims(dim="time", size=86)
gridfile.align_dims(dims=["time", "x", "y"])
sectors = gridfile.get_sectors()

In both cases the GridFile can then be passed directly to ForcingFile.assign_sectors(gridfile), and the sector array returned by get_sectors() can be used as a mask in the sector-level aggregation functions in ise.data.process.
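The reason for adding a time dimension is that a static 2-D sector map has to line up with forcing data of shape (time, x, y). A NumPy analogue of that step, using a hypothetical 2×2 grid, looks like this; it illustrates the shape change only and does not reproduce what expand_dims() and align_dims() do internally.

```python
import numpy as np

N_YEARS = 86  # matches the 2015-2100 forcing time axis

sectors_2d = np.array([[1, 1],
                       [2, 3]])  # hypothetical static sector map, shape (x, y)

# Repeat the same map for every year without copying the data.
sectors_3d = np.broadcast_to(sectors_2d, (N_YEARS, *sectors_2d.shape))
```

np.broadcast_to returns a read-only view, which is sufficient when the result is only used as a mask.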

class ise.data.grids.GridFile(ice_sheet: str, filepath: str)[source]

Bases: object

Wrapper for loading and formatting sector grid NetCDF files.

Used to load sector IDs and optionally expand/align dimensions for compatibility with forcing data (e.g. time dimension of length 86).

Parameters:
  • ice_sheet (str) – Ice sheet identifier (‘AIS’ or ‘GrIS’).

  • filepath (str) – Path to the grid NetCDF file.

ice_sheet

Ice sheet identifier.

Type:

str

filepath

Path to the file.

Type:

str

data

Loaded dataset after load().

Type:

xarray.Dataset or None

sector_variable_name

Name of the sector variable (‘sectors’ for AIS, ‘ID’ for GrIS).

Type:

str

align_dims(dims: list | None = None) Dataset[source]

Transpose dimensions to a standard order.

Parameters:

dims (list, optional) – Dimension order. If None, uses (‘time’, ‘x’, ‘y’, …).

Returns:

The dataset with reordered dimensions.

Return type:

xarray.Dataset

expand_dims(dim: str = 'time', size: int | None = None) Dataset[source]

Expand dimensions (e.g. add time dimension of given size).

Parameters:
  • dim (str, optional) – Dimension name. Defaults to ‘time’.

  • size (int, optional) – Size of the new dimension. Defaults to None.

Returns:

The dataset with expanded dimension.

Return type:

xarray.Dataset

format_grids() Dataset[source]

Load (if needed), expand time to 86, and align dimensions.

Returns:

The formatted grid dataset.

Return type:

xarray.Dataset

get_sectors() DataArray[source]

Return the sector ID array from the grid dataset.

load(filepath: str | None = None, **kwargs) Dataset[source]

Load the grid dataset from the NetCDF file.

Parameters:
  • filepath (str, optional) – Override path. Defaults to self.filepath.

  • **kwargs – Passed to xarray.open_dataset.

Returns:

The loaded dataset.

Return type:

xarray.Dataset

ise.data.inputs

Input dataclasses for ISEFlow-AIS and ISEFlow-GrIS predictions.

This module defines ISEFlowAISInputs and ISEFlowGrISInputs, which validate, encode, and package the climate forcing arrays and ice sheet model (ISM) configuration required by the pretrained ISEFlow emulators.

Both dataclasses perform the following on construction:

  1. Validation — all parameter values are checked against the enumerated sets of allowed options (numerics, stress balance, resolution, etc.).

  2. Encoding — human-readable strings (e.g. 'fd', 'hybrid') are mapped to the internal categorical encodings expected by the model weights (e.g. 'FD', 'Hybrid').

  3. Array coercion — all forcing arrays are cast to numpy.ndarray.

  4. Year encoding — calendar years 2015-2100 are converted to the model-internal 1-86 encoding.
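Steps 2 and 4 above are simple mappings, sketched below. The function names are hypothetical, the option table contains only the two pairs quoted in the list, and lowercasing the input before lookup is an assumption.

```python
import numpy as np

FIRST_YEAR, LAST_YEAR = 2015, 2100

# Only the two mappings quoted in the list; the full tables live in the package.
STRING_ENCODING = {"fd": "FD", "hybrid": "Hybrid"}


def encode_years(years):
    """Map calendar years 2015-2100 to 1-86 (hypothetical helper)."""
    years = np.asarray(years)
    if years.min() < FIRST_YEAR or years.max() > LAST_YEAR:
        raise ValueError(f"Years must lie in {FIRST_YEAR}-{LAST_YEAR}.")
    return years - FIRST_YEAR + 1


def encode_option(value: str) -> str:
    """Map a human-readable option to its internal label (hypothetical helper)."""
    try:
        return STRING_ENCODING[value.lower()]
    except KeyError:
        raise ValueError(f"Unsupported option: {value!r}") from None


encoded = encode_years(np.arange(2015, 2101))  # 86 values running from 1 to 86
numerics = encode_option("fd")
```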

Alternative constructor — raw absolute forcings

If you have raw (non-anomaly) atmospheric forcing values, use from_absolute_forcings(). It calls AnomalyConverter internally to subtract the ISMIP6 climatological baseline before building the dataclass:

from ise.data.inputs import ISEFlowAISInputs
import numpy as np

inputs = ISEFlowAISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=10,
    pr=pr_array,           # kg m⁻² s⁻¹, raw absolute values
    evspsbl=evspsbl_array,
    smb=smb_array,
    ts=ts_array,           # K
    ocean_thermal_forcing=otf_array,
    ocean_salinity=sal_array,
    ocean_temperature=temp_array,
    aogcm="noresm1-m_rcp85",   # or custom_climatology={...} for new CMIP models
    # ISM configuration:
    numerics="fd",
    stress_balance="hybrid",
    resolution="8",
    init_method="eq",
    initial_year=2005,
    melt_in_floating_cells="sub-grid",
    icefront_migration="str",
    ocean_forcing_type="open",
    ocean_sensitivity="medium",
    ice_shelf_fracture=False,
    open_melt_type="quad",
    standard_melt_type=None,
)

If the ISM configuration matches one of the bundled ISMIP6 models, you can pass model_configs="BISICLES_UBC" (or whichever model key appears in ismip6_model_configs.json) instead of specifying all parameters individually.

Output

Call inputs.to_df() to obtain a pandas.DataFrame (86 rows × features) that can be passed directly to ISEFlow_AIS.process() or ISEFlow_GrIS.process(). The pretrained wrappers call process() internally when you invoke model.predict(inputs).

See also: ise.data.anomaly.AnomalyConverter

class ise.data.inputs.ISEFlowAISInputs(year: ndarray, sector: ndarray | int, pr_anomaly: ndarray, evspsbl_anomaly: ndarray, smb_anomaly: ndarray, ts_anomaly: ndarray, ocean_thermal_forcing: ndarray, ocean_salinity: ndarray, ocean_temperature: ndarray, ice_shelf_fracture: bool, ocean_sensitivity: str, mrro_anomaly: ndarray | None = None, initial_year: int | None = None, numerics: str | None = None, stress_balance: str | None = None, resolution: str | None = None, init_method: str | None = None, melt_in_floating_cells: str | None = None, icefront_migration: str | None = None, ocean_forcing_type: str | None = None, open_melt_type: str | None = None, standard_melt_type: str | None = None, model_configs: str | None = None, version: str = 'v1.1.0', override_params: dict | None = None)[source]

Bases: object

Inputs for an ISEFlow-AIS prediction.

Expects pre-computed anomaly arrays (pr_anomaly, evspsbl_anomaly, smb_anomaly, ts_anomaly). If you have raw absolute forcing values instead, use the alternative constructor:

inputs = ISEFlowAISInputs.from_absolute_forcings(
    year=..., sector=..., pr=..., evspsbl=..., smb=..., ts=...,
    ocean_thermal_forcing=..., ocean_salinity=..., ocean_temperature=...,
    aogcm="noresm1-m_rcp85",   # or custom_climatology={...}
    **ism_config_kwargs,
)

from_absolute_forcings() subtracts the ISMIP6 1995-2014 climatological baseline automatically. Pass aogcm for a bundled ISMIP6 model or custom_climatology (dict with keys 'pr', 'evspsbl', 'smb', 'ts') for a CMIP model not in the bundled climatology.

evspsbl_anomaly: ndarray
classmethod from_absolute_forcings(year: ndarray, sector: int, pr: ndarray, evspsbl: ndarray, smb: ndarray, ts: ndarray, ocean_thermal_forcing: ndarray, ocean_salinity: ndarray, ocean_temperature: ndarray, aogcm: str | None = None, custom_climatology: dict | None = None, mrro: ndarray | None = None, **kwargs) ISEFlowAISInputs[source]

Construct ISEFlowAISInputs from raw (non-anomaly) atmospheric forcings.

Subtracts the ISMIP6 1995-2014 climatological baseline from each atmospheric variable to produce the anomaly arrays required by the model. Ocean variables (ocean_thermal_forcing, ocean_salinity, ocean_temperature) are absolute values and are passed through unchanged.

Exactly one of aogcm or custom_climatology must be provided.

Parameters:
  • year (np.ndarray) – Years corresponding to the time series (86 values, 2015-2100).

  • sector (int) – AIS drainage sector (1-18).

  • pr (np.ndarray) – Raw precipitation (86 values, kg m⁻² s⁻¹).

  • evspsbl (np.ndarray) – Raw evaporation / sublimation (86 values, kg m⁻² s⁻¹).

  • smb (np.ndarray) – Raw surface mass balance (86 values, kg m⁻² s⁻¹).

  • ts (np.ndarray) – Raw surface temperature (86 values, K).

  • ocean_thermal_forcing (np.ndarray) – Ocean thermal forcing (86 values, °C). Passed through unchanged.

  • ocean_salinity (np.ndarray) – Ocean salinity (86 values, PSU). Passed through unchanged.

  • ocean_temperature (np.ndarray) – Ocean temperature (86 values, °C). Passed through unchanged.

  • aogcm (str, optional) – AOGCM name to look up in the bundled ISMIP6 climatology (e.g. 'noresm1-m_rcp85'). Common alternate spellings are normalised automatically.

  • custom_climatology (dict, optional) – Baseline means for a CMIP model not in the bundled climatology. Must contain keys 'pr', 'evspsbl', 'smb', 'ts' (and 'mrro' if mrro is also provided). Values should be in the same units as the raw input arrays.

  • mrro (np.ndarray, optional) – Raw runoff (86 values). Only needed for ISEFlow v1.0.0.

  • **kwargs – All remaining keyword arguments are forwarded to ISEFlowAISInputs.__init__ (e.g. ISM config fields such as numerics, stress_balance, model_configs, etc.).

Returns:

Fully validated inputs object ready for model.predict().

Return type:

ISEFlowAISInputs

Examples

Using a bundled ISMIP6 climatology:

inputs = ISEFlowAISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=10,
    pr=pr_array,
    evspsbl=evspsbl_array,
    smb=smb_array,
    ts=ts_array,
    ocean_thermal_forcing=otf_array,
    ocean_salinity=sal_array,
    ocean_temperature=temp_array,
    aogcm="noresm1-m_rcp85",
    numerics="fd",
    stress_balance="hybrid",
    resolution="8",
    init_method="eq",
    initial_year=2005,
    melt_in_floating_cells="sub-grid",
    icefront_migration="str",
    ocean_forcing_type="open",
    ocean_sensitivity="medium",
    ice_shelf_fracture=False,
    open_melt_type="quad",
    standard_melt_type="nonlocal",
)

Using a custom climatology for a new CMIP model:

inputs = ISEFlowAISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=10,
    pr=pr_array, evspsbl=evspsbl_array,
    smb=smb_array, ts=ts_array,
    ocean_thermal_forcing=otf_array,
    ocean_salinity=sal_array,
    ocean_temperature=temp_array,
    custom_climatology={
        "pr": 1.3e-5, "evspsbl": 3.8e-6,
        "smb": 9.0e-6, "ts": 253.7,
    },
    numerics="fd",  # ... plus the remaining ISM configuration fields
)
classmethod from_raw_values(*args, **kwargs)[source]

Deprecated — use from_absolute_forcings instead.

ice_shelf_fracture: bool
icefront_migration: str | None = None
init_method: str | None = None
initial_year: int | None = None
melt_in_floating_cells: str | None = None
model_configs: str | None = None
mrro_anomaly: ndarray | None = None
numerics: str | None = None
ocean_forcing_type: str | None = None
ocean_salinity: ndarray
ocean_sensitivity: str
ocean_temperature: ndarray
ocean_thermal_forcing: ndarray
open_melt_type: str | None = None
override_params: dict | None = None
pr_anomaly: ndarray
resolution: str | None = None
sector: ndarray | int
smb_anomaly: ndarray
standard_melt_type: str | None = None
stress_balance: str | None = None
to_df()[source]

Convert the dataclass fields to a pandas DataFrame.

Returns:

One row per timestep (86 rows) with all forcing and configuration columns needed by ISEFlow_AIS.process().

Return type:

pandas.DataFrame

ts_anomaly: ndarray
version: str = 'v1.1.0'
year: ndarray
class ise.data.inputs.ISEFlowGrISInputs(year: ndarray, sector: ndarray | int, aST: ndarray, aSMB: ndarray, ocean_thermal_forcing: ndarray, basin_runoff: ndarray, ice_shelf_fracture: bool, ocean_sensitivity: str, standard_ocean_forcing: bool, initial_year: int | None = None, numerics: str | None = None, ice_flow_model: str | None = None, initialization: str | None = None, initial_smb: str | None = None, velocity: str | None = None, bedrock_topography: str | None = None, surface_thickness: str | None = None, geothermal_heat_flux: str | None = None, res_min: str | None = None, res_max: str | None = None, model_configs: str | None = None, version: str = 'v1.1.0')[source]

Bases: object

Inputs for an ISEFlow-GrIS prediction.

Expects pre-computed anomaly arrays (aSMB, aST). If you have raw absolute forcing values instead, use the alternative constructor:

inputs = ISEFlowGrISInputs.from_absolute_forcings(
    year=..., sector=..., smb=..., st=...,
    ocean_thermal_forcing=..., basin_runoff=...,
    aogcm="hadgem2-es_rcp85",  # or custom_climatology={...}
    **ism_config_kwargs,
)

from_absolute_forcings() subtracts the ISMIP6 1960-1989 MAR climatological baseline automatically. Pass aogcm for a bundled ISMIP6 model or custom_climatology (dict with keys 'smb', 'st') for a CMIP model not in the bundled climatology.

aSMB: ndarray
aST: ndarray
basin_runoff: ndarray
bedrock_topography: str | None = None
classmethod from_absolute_forcings(year: ndarray, sector: int, smb: ndarray, st: ndarray, ocean_thermal_forcing: ndarray, basin_runoff: ndarray, aogcm: str | None = None, custom_climatology: dict | None = None, **kwargs) ISEFlowGrISInputs[source]

Construct ISEFlowGrISInputs from raw (non-anomaly) atmospheric forcings.

Subtracts the ISMIP6 1960-1989 MAR climatological baseline from each atmospheric variable to produce the anomaly arrays (aSMB, aST) required by the model. Ocean variables (ocean_thermal_forcing, basin_runoff) are absolute values and are passed through unchanged.

Exactly one of aogcm or custom_climatology must be provided.

Parameters:
  • year (np.ndarray) – Years (86 values, 2015-2100).

  • sector (int) – GrIS drainage basin number (1-6).

  • smb (np.ndarray) – Raw surface mass balance (86 values, mm w.e. yr⁻¹, matching the MAR Reference file units used in the bundled climatology CSV). The anomaly conversion automatically converts to kg m⁻² s⁻¹.

  • st (np.ndarray) – Raw surface temperature (86 values, °C, matching the MAR Reference file units used in the bundled climatology CSV).

  • ocean_thermal_forcing (np.ndarray) – Ocean thermal forcing (86 values). Passed through unchanged.

  • basin_runoff (np.ndarray) – Basin-integrated runoff (86 values). Passed through unchanged.

  • aogcm (str, optional) – AOGCM name to look up in the bundled ISMIP6 climatology (e.g. 'hadgem2-es_rcp85'). Common alternate spellings are normalised automatically.

  • custom_climatology (dict, optional) – Baseline means for a CMIP model not in the bundled climatology. Must contain keys 'smb' and 'st' in MAR units.

  • **kwargs – All remaining keyword arguments are forwarded to ISEFlowGrISInputs.__init__ (e.g. ISM config fields such as numerics, ice_flow_model, model_configs, etc.).

Returns:

Fully validated inputs object ready for model.predict().

Return type:

ISEFlowGrISInputs

Examples

Using a bundled ISMIP6 climatology:

inputs = ISEFlowGrISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=1,
    smb=smb_array,
    st=st_array,
    ocean_thermal_forcing=otf_array,
    basin_runoff=runoff_array,
    aogcm="hadgem2-es_rcp85",
    initial_year=1990,
    numerics="fe",
    ice_flow_model="ho",
    initialization="dav",
    initial_smb="ra3",
    velocity="joughin",
    bedrock_topography="morlighem",
    surface_thickness="None",
    geothermal_heat_flux="g",
    res_min=1.0,
    res_max=7.5,
    standard_ocean_forcing=True,
    ocean_sensitivity="medium",
    ice_shelf_fracture=False,
)

Using a custom climatology for a new CMIP model:

inputs = ISEFlowGrISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=1,
    smb=smb_array,
    st=st_array,
    ocean_thermal_forcing=otf_array,
    basin_runoff=runoff_array,
    custom_climatology={"smb": -241.2, "st": -22.8},
    initial_year=1990,  # ... plus the remaining ISM configuration fields
)
classmethod from_raw_values(*args, **kwargs)[source]

Deprecated — use from_absolute_forcings instead.

geothermal_heat_flux: str | None = None
ice_flow_model: str | None = None
ice_shelf_fracture: bool
initial_smb: str | None = None
initial_year: int | None = None
initialization: str | None = None
model_configs: str | None = None
numerics: str | None = None
ocean_sensitivity: str
ocean_thermal_forcing: ndarray
res_max: str | None = None
res_min: str | None = None
sector: ndarray | int
standard_ocean_forcing: bool
surface_thickness: str | None = None
to_df()[source]

Convert the dataclass fields to a pandas DataFrame.

Returns:

One row per timestep (86 rows) with all forcing and configuration columns needed by ISEFlow_GrIS.process().

Return type:

pandas.DataFrame

velocity: str | None = None
version: str = 'v1.1.0'
year: ndarray

ise.data.feature_engineer

Feature engineering for ISMIP6 emulator training datasets.

This module transforms the raw merged dataset (output of ise.data.process) into the scaled, lagged, train/val/test-split arrays consumed by ISEFlow.fit(). The primary interface is the FeatureEngineer class, backed by a set of standalone functions that can also be called independently.

Pipeline stages

The typical preprocessing sequence is:

from ise.data.feature_engineer import FeatureEngineer

fe = FeatureEngineer("AIS", data=df)
fe.add_model_characteristics()        # merge ISM config one-hot columns
fe.drop_outliers(                      # remove SLE < -26.3 mm (physics bound)
    method="explicit",
    column="sle",
    expression=[("sle", "<", -26.3)],
)
fe.backfill_outliers()                 # replace extreme spikes with prev value
fe.add_lag_variables(lag=5)            # add t-1 … t-5 copies of forcing vars
fe.split_data(output_directory="splits/")  # 70/15/15 by simulation id
X_scaled, y_scaled = fe.scale_data(method="standard", save_dir="splits/")

Key design choices

  • Split granularity: the train/val/test split is done by simulation id, not by individual rows, so rows from a single simulation never end up in more than one split. The default split is 70/15/15 with random_state=1.

  • Outlier threshold: drop_outliers with expression=[("sle", "<", -26.3)] removes entire runs that contain physically implausible timesteps (an SLE below -26.3 mm is treated as a physical bound for individual sectors).

  • Lag variables: add_lag_variables(lag=5) adds t-1 through t-5 copies of each atmospheric and oceanic forcing column within each 86-year segment, respecting projection boundaries so lag values do not cross between runs.

  • Model characteristics: add_model_characteristics() merges the ISM configuration CSV (e.g. AIS_model_characteristics.csv) and one-hot encodes categorical columns such as numerics, stress balance, etc.
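The lag-variable bullet above can be illustrated with a short pandas sketch. This is not the library's implementation: the helper add_lags, the column names (id, year, pr_anomaly) and the ".lagK" suffix are hypothetical, and it assumes each run is identified by an id column. Grouping by run before shifting is what keeps lag values from crossing projection boundaries.

```python
import pandas as pd


def add_lags(df: pd.DataFrame, columns: list, lag: int) -> pd.DataFrame:
    """Add t-1 ... t-lag copies of `columns`, shifted within each run (hypothetical helper)."""
    out = df.sort_values(["id", "year"]).copy()
    for col in columns:
        for k in range(1, lag + 1):
            # shift() inside a groupby never pulls values from a different run
            out[f"{col}.lag{k}"] = out.groupby("id")[col].shift(k)
    return out


# Two toy runs of three years each (real projections have 86 steps)
toy = pd.DataFrame({
    "id": ["run_a"] * 3 + ["run_b"] * 3,
    "year": [2015, 2016, 2017] * 2,
    "pr_anomaly": [0.1, 0.2, 0.3, 1.0, 2.0, 3.0],
})
lagged = add_lags(toy, ["pr_anomaly"], lag=2)
```

In this sketch the first k rows of each run hold NaN for the lag-k column; how FeatureEngineer fills those leading values is not specified here.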

Standalone functions (also usable without FeatureEngineer)

  • split_training_data – train/val/test split by simulation id.

  • add_lag_variables – add t-k lag columns within each 86-step segment.

  • backfill_outliers – replace extreme y-values with the next row's value (bfill).

  • drop_outliers – remove entire runs containing outlier timesteps.

  • add_model_characteristics – merge and encode ISM config metadata.

  • scale_data – apply a pre-fitted sklearn scaler from disk.

  • fill_mrro_nans – impute missing mrro_anomaly values.

class ise.data.feature_engineer.FeatureEngineer(ice_sheet, data: DataFrame, fill_mrro_nans: bool = False, split_dataset: bool = False, train_size: float = 0.7, val_size: float = 0.15, test_size: float = 0.15, output_directory: str | None = None)[source]

Bases: object

A class for performing feature engineering on a given dataset, including preprocessing, scaling, dataset splitting, and outlier handling.

Parameters:
  • ice_sheet (str) – The name of the ice sheet being analyzed.

  • data (pd.DataFrame) – The input dataset.

  • fill_mrro_nans (bool, optional) – Whether to fill missing values in the ‘mrro’ column. Defaults to False.

  • split_dataset (bool, optional) – Whether to split the dataset into training, validation, and test sets. Defaults to False.

  • train_size (float, optional) – Proportion of data to use for training. Defaults to 0.7.

  • val_size (float, optional) – Proportion of data to use for validation. Defaults to 0.15.

  • test_size (float, optional) – Proportion of data to use for testing. Defaults to 0.15.

  • output_directory (str, optional) – Directory to save the split datasets. Defaults to None.

data

The input dataset.

Type:

pd.DataFrame

train_size

Proportion of training data.

Type:

float

val_size

Proportion of validation data.

Type:

float

test_size

Proportion of testing data.

Type:

float

output_directory

Directory to save datasets.

Type:

str

scaler_X_path

Path to the saved input feature scaler.

Type:

str

scaler_y_path

Path to the saved target variable scaler.

Type:

str

scaler_X

Scaler for input features.

Type:

scaler object

scaler_y

Scaler for target variables.

Type:

scaler object

train

Training dataset.

Type:

pd.DataFrame

val

Validation dataset.

Type:

pd.DataFrame

test

Test dataset.

Type:

pd.DataFrame

_including_model_characteristics

Whether model characteristics have been included.

Type:

bool

split_data()[source]

Splits dataset into train, validation, and test sets.

fill_mrro_nans()[source]

Fills missing values in the ‘mrro’ column.

scale_data()[source]

Scales input and target variables using a specified method.

unscale_data()[source]

Reverses the scaling transformation.

add_lag_variables()[source]

Adds lag features to the dataset.

backfill_outliers()[source]

Replaces extreme values in target variables.

drop_outliers()[source]

Removes outliers based on specified criteria.

add_model_characteristics()[source]

Merges model characteristics into the dataset.

add_lag_variables(lag, data=None)[source]

Adds lagged versions of predictor variables to the dataset.

Parameters:
  • lag (int) – Number of time steps to lag the variables.

  • data (pd.DataFrame, optional) – The dataset. If not provided, the class attribute ‘data’ is used.

Returns:

The modified instance with lag variables added.

Return type:

FeatureEngineer

add_model_characteristics(data=None, model_char_path=None, encode=True, ids_path=None)[source]

Merges model characteristic data with the dataset.

Parameters:
  • data (pd.DataFrame, optional) – The dataset. If not provided, the class attribute ‘data’ is used.

  • model_char_path (str, optional) – Path to the model characteristics file. Defaults to the internal path.

  • encode (bool, optional) – Whether to one-hot encode categorical characteristics. Defaults to True.

  • ids_path (str, optional) – Path to an additional ID mapping file. Defaults to None.

Returns:

The modified instance with model characteristics added.

Return type:

FeatureEngineer

backfill_outliers(percentile=99.999, data=None)[source]

Replaces extreme values in target variables with the next row’s value (backfill).

Parameters:
  • percentile (float, optional) – Percentile threshold for identifying outliers. Defaults to 99.999.

  • data (pd.DataFrame, optional) – The dataset. If not provided, the class attribute ‘data’ is used.

Returns:

The modified instance with outliers handled.

Return type:

FeatureEngineer

drop_outliers(method, column, expression=None, quantiles=[0.01, 0.99], data=None)[source]

Drops simulations that are outliers based on the provided method.

Parameters:
  • method (str) – Method of outlier deletion (‘quantile’ or ‘explicit’).

  • column (str) – Column used for detecting outliers.

  • expression (list[tuple], optional) – List of filtering expressions in the form [(column, operator, value)]. Defaults to None.

  • quantiles (list[float], optional) – Quantiles for ‘quantile’ method. Defaults to [0.01, 0.99].

  • data (pd.DataFrame, optional) – The dataset. If not provided, the class attribute ‘data’ is used.

Returns:

The modified instance with outliers removed.

Return type:

FeatureEngineer

exclude_fetish_models(data=None, exclude='both')[source]

Excludes specific models from the dataset.

Parameters:

data (pd.DataFrame, optional) – The dataset. If not provided, the class attribute ‘data’ is used.

Returns:

The modified instance with specific models excluded.

Return type:

FeatureEngineer

fill_mrro_nans(method, data=None)[source]

Fills missing values in the ‘mrro’ column.

Parameters:
  • method (str) – The method used to fill missing values.

  • data (pd.DataFrame, optional) – The dataset. Defaults to None.

Returns:

The dataset with missing values filled.

Return type:

pd.DataFrame

scale_data(X=None, y=None, method='standard', save_dir=None)[source]

Scales input (X) and target (y) variables using a specified scaling method.

Parameters:
  • X (pd.DataFrame or np.ndarray, optional) – Input data. Defaults to None.

  • y (pd.DataFrame or np.ndarray, optional) – Target data. Defaults to None.

  • method (str, optional) – Scaling method (‘standard’, ‘minmax’, ‘robust’). Defaults to ‘standard’.

  • save_dir (str, optional) – Directory to save scalers. Defaults to None.

Returns:

Scaled X and y values.

Return type:

tuple

split_data(data=None, train_size=None, val_size=None, test_size=None, output_directory=None, random_state=1)[source]

Splits the dataset into training, validation, and test sets.

Parameters:
  • data (pd.DataFrame, optional) – The input dataset. Defaults to None.

  • train_size (float, optional) – Proportion of training data. Defaults to None.

  • val_size (float, optional) – Proportion of validation data. Defaults to None.

  • test_size (float, optional) – Proportion of testing data. Defaults to None.

  • output_directory (str, optional) – Directory to save split datasets. Defaults to None.

  • random_state (int, optional) – Random seed for reproducibility. Defaults to 1.

Returns:

Training, validation, and test datasets as pandas DataFrames.

Return type:

tuple

unscale_data(X=None, y=None, scaler_X_path=None, scaler_y_path=None)[source]

Reverses the scaling transformation for input (X) and target (y) variables.

Parameters:
  • X (pd.DataFrame or np.ndarray, optional) – The input data to be unscaled. Defaults to None.

  • y (pd.DataFrame, np.ndarray, or torch.Tensor, optional) – The target data to be unscaled. Defaults to None.

  • scaler_X_path (str, optional) – Path to the stored input scaler. Defaults to None.

  • scaler_y_path (str, optional) – Path to the stored target scaler. Defaults to None.

Returns:

Unscaled X and y data.

Return type:

tuple
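As a rough illustration of what scale_data and unscale_data do for method='standard', the following NumPy sketch standardises a toy feature matrix and then inverts the transform. It is an assumption-laden sketch, not the class's code: the real methods work with saved scaler objects, and the array values here are made up.

```python
import numpy as np

# Toy training features: 3 rows, 2 columns (values are illustrative only)
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])

# "Fit": per-column statistics, taken from the training data
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "Scale": zero mean, unit variance per column
X_scaled = (X_train - mean) / std

# "Unscale": the inverse transform recovers the original values
X_restored = X_scaled * std + mean
```

The same stored statistics must be used for both directions, which is why the methods accept scaler paths.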

ise.data.feature_engineer.add_lag_variables(data: DataFrame, lag: int, verbose=True) DataFrame[source]

Adds lagged variables to the input dataset, creating time-shifted versions of the predictor variables.

Parameters:
  • data (pd.DataFrame) – The dataset containing time series data.

  • lag (int) – The number of time steps to lag the variables.

  • verbose (bool, optional) – Whether to display a progress bar. Defaults to True.

Returns:

The dataset with lagged variables added.

Return type:

pd.DataFrame

ise.data.feature_engineer.add_model_characteristics(data, model_char_path=None, encode=True, ids_path=None) DataFrame[source]

Adds model characteristics to the dataset.

Parameters:
  • data (pd.DataFrame) – The input dataset.

  • model_char_path (str, optional) – Path to the model characteristics file. Defaults to internal path.

  • encode (bool, optional) – Whether to one-hot encode categorical characteristics. Defaults to True.

  • ids_path (str, optional) – Path to an additional ID mapping file. Defaults to None.

Returns:

The dataset with model characteristics added.

Return type:

pd.DataFrame

ise.data.feature_engineer.backfill_outliers(data, percentile=99.999)[source]

Replaces extreme y-values (those above the specified percentile or below the complementary lower percentile, 100 − percentile, computed across all y-values) with the value from the next row (bfill). Trailing outliers at the end of the series remain NaN.

Parameters:
  • data (pd.DataFrame) – The dataset containing y-values.

  • percentile (float, optional) – The percentile threshold to define upper extreme values. Defaults to 99.999.

Returns:

The dataset with extreme values replaced using backfill.

Return type:

pd.DataFrame
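The backfill behaviour described above, including the trailing-NaN caveat, can be seen in a minimal pandas sketch. The fixed threshold is a stand-in chosen for the toy data; the function itself derives its thresholds from percentile.

```python
import numpy as np
import pandas as pd

y = pd.Series([0.1, 0.2, 50.0, 0.3, 0.4, 60.0])

upper = 1.0                 # stand-in threshold for this toy series (assumption)
masked = y.mask(y > upper)  # outliers become NaN
filled = masked.bfill()     # each NaN takes the next valid value

# The spike at index 2 is replaced by 0.3 (the next row); the spike at the
# end of the series has no later value to copy, so it stays NaN.
```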

ise.data.feature_engineer.drop_outliers(data: DataFrame, column: str, method: str, expression: list[tuple] | None = None, quantiles: list[float] = [0.01, 0.99])[source]

Removes outliers from the dataset based on a specified method.

Parameters:
  • data (pd.DataFrame) – The dataset containing the column with potential outliers.

  • column (str) – The column to assess for outliers.

  • method (str) – The method of outlier detection (‘quantile’ or ‘explicit’).

  • expression (list of tuples, optional) – A list of conditions in the format [(column, operator, value)] for explicit filtering. Defaults to None.

  • quantiles (list of float, optional) – Quantiles for filtering when using the ‘quantile’ method. Defaults to [0.01, 0.99].

Returns:

The dataset with outliers removed.

Return type:

pd.DataFrame

Raises:
  • AttributeError – If the method is ‘quantile’ but no quantiles are provided.

  • AttributeError – If the method is ‘explicit’ but no expression is provided.

  • ValueError – If the operator in the expression is not recognized.
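A minimal sketch of the 'explicit' method follows, assuming runs are identified by an id column and that a run is dropped when any of its rows matches any condition. The helper drop_runs and the operator table are hypothetical, not the library's code.

```python
import operator

import pandas as pd

# Map operator strings to functions; anything else is rejected with ValueError
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}


def drop_runs(df, expression):
    """Drop every run (grouped by `id`) with at least one row matching a condition."""
    flagged = pd.Series(False, index=df.index)
    for column, op, value in expression:
        if op not in OPS:
            raise ValueError(f"Unrecognized operator: {op}")
        flagged |= OPS[op](df[column], value)
    bad_ids = df.loc[flagged, "id"].unique()
    return df[~df["id"].isin(bad_ids)]


toy = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "sle": [-1.0, -30.0, 0.5, 1.5],
})
kept = drop_runs(toy, [("sle", "<", -26.3)])  # run "a" is removed entirely
```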

ise.data.feature_engineer.exclude_fetish_models(data: DataFrame, exclude: str = 'both') DataFrame[source]

Excludes specific models from the dataset.

Parameters:

data (pd.DataFrame) – The input DataFrame.

Returns:

The filtered DataFrame.

Return type:

pd.DataFrame

ise.data.feature_engineer.fill_mrro_nans(data: DataFrame, method) DataFrame[source]

Fills the NaN values in the specified columns with the given method.

Parameters:
  • data (pd.DataFrame) – The input DataFrame.

  • method (str or int) – The method to fill NaN values. Must be one of ‘zero’, ‘mean’, ‘median’, or ‘drop’.

Returns:

The DataFrame with NaN values filled according to the specified method.

Return type:

pd.DataFrame

Raises:

ValueError – If the method is not one of ‘zero’, ‘mean’, ‘median’, or ‘drop’.
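The four method values can be sketched with plain pandas operations. The helper fill_column is hypothetical and the statistics are computed over the whole column for simplicity; whether the library computes them globally or per group is not stated here.

```python
import numpy as np
import pandas as pd


def fill_column(df, column, method):
    """Fill or drop NaNs in one column (hypothetical helper)."""
    if method == "zero":
        return df.fillna({column: 0.0})
    if method == "mean":
        return df.fillna({column: df[column].mean()})
    if method == "median":
        return df.fillna({column: df[column].median()})
    if method == "drop":
        return df.dropna(subset=[column])
    raise ValueError("method must be one of 'zero', 'mean', 'median', or 'drop'")


toy = pd.DataFrame({"mrro_anomaly": [1.0, np.nan, 3.0]})
by_mean = fill_column(toy, "mrro_anomaly", "mean")
dropped = fill_column(toy, "mrro_anomaly", "drop")
```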

ise.data.feature_engineer.scale_data(data, scaler_path)[source]

Scales the provided dataset using a pre-trained scaler.

Parameters:
  • data (pd.DataFrame) – The dataset to be scaled.

  • scaler_path (str) – Path to the saved scaler.

Returns:

The scaled dataset.

Return type:

pd.DataFrame

ise.data.feature_engineer.split_training_data(data, train_size, val_size, test_size=None, output_directory=None, random_state=1)[source]

Splits the dataset into training, validation, and test sets.

Parameters:
  • data (str or pd.DataFrame) – The dataset or path to the dataset to be split.

  • train_size (float) – Proportion of data to use for training.

  • val_size (float) – Proportion of data to use for validation.

  • test_size (float, optional) – Proportion of data to use for testing. Defaults to the remainder.

  • output_directory (str, optional) – Directory to save the split datasets as CSV files. Defaults to None.

  • random_state (int, optional) – Seed for reproducibility. Defaults to 1.

Returns:

Training, validation, and test datasets as pandas DataFrames.

Return type:

tuple

Raises:
  • ValueError – If the dataset length is not divisible by 86, indicating incomplete projections.

  • ValueError – If the dataset does not contain an ‘id’ column.
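Splitting by simulation id, rather than by row, can be sketched as below. This is an illustration under assumptions, not the library's algorithm: the helper split_by_id is hypothetical and the way ids are shuffled and rounded is a guess.

```python
import numpy as np
import pandas as pd


def split_by_id(df, train_size=0.7, val_size=0.15, random_state=1):
    """Assign whole runs to train/val/test; the test set takes the remainder."""
    ids = df["id"].unique()
    rng = np.random.default_rng(random_state)
    shuffled = rng.permutation(ids)
    n_train = int(round(train_size * len(ids)))
    n_val = int(round(val_size * len(ids)))
    train_ids = shuffled[:n_train]
    val_ids = shuffled[n_train:n_train + n_val]
    test_ids = shuffled[n_train + n_val:]

    def pick(chosen):
        return df[df["id"].isin(chosen)]

    return pick(train_ids), pick(val_ids), pick(test_ids)


# 20 toy runs of 86 timesteps each
toy = pd.DataFrame({"id": np.repeat(np.arange(20), 86), "sle": 0.0})
train, val, test = split_by_id(toy)
```

Because whole runs are assigned, every split keeps a row count divisible by 86, which is the property the first ValueError above guards.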

ise.data.process

End-to-end ISMIP6 data processing pipeline for ISEFlow training data.

This module converts raw ISMIP6 forcing and projection files into the sector-level, analysis-ready dataset.csv consumed by FeatureEngineer and ultimately by ISEFlow.fit().

Public entry points

process_sectors (main entry point):

End-to-end pipeline. Reads raw ISMIP6 forcing NetCDFs from the GHub directory layout and pre-computed IVAF scalar projection files from Zenodo, aggregates both to sector-level annual time series (86 years, 2015-2100), joins them on (aogcm, year, sector), and returns a single pandas.DataFrame:

from ise.data.process import process_sectors

dataset = process_sectors(
    ice_sheet="AIS",
    forcing_directory="/path/to/GHub/AIS/",
    grid_file="/path/to/AIS_sectors_8km.nc",
    zenodo_directory="/path/to/zenodo_download/",
    export_directory="outputs/",
)

Intermediate CSVs (AIS_atmospheric.csv, AIS_oceanic.csv, forcings.csv, projections.csv, dataset.csv) are written to export_directory so individual stages are skipped on re-runs (controlled by overwrite=False).

ProjectionProcessor:

Only needed when starting from raw 3-D ISMIP6 NetCDF output files rather than the pre-computed Zenodo scalar files. Computes Ice Volume Above Flotation (IVAF) from bed topography, ice thickness, and ice/grounded fraction at each grid cell, subtracts the matched control-run IVAF, and writes ivaf_<ice_sheet>_<group>_<model>_<exp>.nc files:

from ise.data.process import ProjectionProcessor

processor = ProjectionProcessor(
    ice_sheet="AIS",
    forcings_directory="/path/to/forcing/",
    projections_directory="/path/to/projections/",
    scalefac_path="af2_scalefac.nc",
    densities_path="AIS_densities.csv",
)
processor.process()

DatasetMerger:

Lower-level alternative to process_sectors() for when intermediate per-run CSV files already exist on disk. Performs only the join step (forcing ↔ projection matched by CMIP model and pathway).

Supporting functions

process_AIS_atmospheric_sectors / process_GrIS_atmospheric_sectors

Aggregate atmospheric forcing NetCDFs to sector-level annual means.

process_AIS_oceanic_sectors / process_GrIS_oceanic_sectors

Aggregate oceanic forcing NetCDFs to sector-level annual means.

process_AIS_outputs / process_GrIS_outputs

Load pre-computed IVAF scalar projections from Zenodo and convert to SLE.

merge_datasets

Join sector-level forcings and projections DataFrames on (aogcm, year, sector).

get_model_densities

Extract ice/water density values (rhoi, rhow) from raw ISMIP6 NetCDFs.

combine_gris_forcings

Concatenate annual GrIS atmospheric NetCDF files into per-AOGCM combined files.

class ise.data.process.DatasetMerger(ice_sheet, forcings, projections, experiment_file, output_dir)[source]

Bases: object

Merges pre-processed CSV forcing and projection files into a single dataset.

This is a lower-level alternative to process_sectors(). Use it when the intermediate per-run CSV files already exist on disk and you only need the join step (forcing ↔ projection matched by CMIP model and pathway).

Parameters:
  • ice_sheet (str) – The ice sheet name (‘AIS’ or ‘GrIS’).

  • forcings (str) – Directory containing forcing CSV files.

  • projections (str) – Directory containing projection CSV files.

  • experiment_file (str) – Path to the experiment metadata file (CSV or JSON).

  • output_dir (str) – Directory to save the merged dataset.csv.

experiments

Experiment metadata loaded from experiment_file.

Type:

pd.DataFrame

forcing_paths

File paths for all forcing CSVs found under forcings.

Type:

list

projection_paths

File paths for all projection CSVs found under projections.

Type:

list

forcing_metadata

Extracted CMIP model and pathway for each forcing file.

Type:

pd.DataFrame

merge_dataset()[source]

Merges forcing and projection datasets based on CMIP model and pathway metadata.

Returns:

Returns 0 upon successful merging and saving of the dataset.

Return type:

int

merge_sectors(_forcings_file=None, _projections_file=None, _save_dir=None)[source]

class ise.data.process.ProjectionProcessor(ice_sheet, forcings_directory, projections_directory, scalefac_path=None, densities_path=None)[source]

Bases: object

A class for processing ISMIP6 projections (outputs) for ice sheet models, specifically for calculating Ice Volume Above Flotation (IVAF), handling control projections, and processing experimental projections.

Parameters:
  • ice_sheet (str) – The ice sheet being analyzed (‘AIS’ or ‘GIS’).

  • forcings_directory (str) – Path to the directory containing forcing datasets.

  • projections_directory (str) – Path to the directory containing projection datasets.

  • scalefac_path (str, optional) – Path to the NetCDF file containing scaling factors for each grid cell. Defaults to None.

  • densities_path (str, optional) – Path to the CSV file containing density data for models. Defaults to None.

forcings_directory

Path to forcing data.

Type:

str

projections_directory

Path to projection data.

Type:

str

densities_path

Path to density dataset.

Type:

str

scalefac_path

Path to scaling factor dataset.

Type:

str

ice_sheet

Ice sheet identifier (‘AIS’ or ‘GIS’).

Type:

str

resolution

Resolution of the dataset (5 for GIS, 8 for AIS).

Type:

int

process()[source]

Processes ISMIP6 projections by calculating IVAF and subtracting control projections.

Note

This class is only needed when starting from raw 3-D ISMIP6 NetCDF output files. If you are using the pre-computed scalar files from Zenodo (ComputedScalarsPaper/ for AIS, v7_CMIP5_pub/ for GrIS), call process_sectors() directly instead.

process()[source]

Process ISMIP6 projections by calculating IVAF for control and experiment projections, subtracting out control IVAF from experiments, and exporting IVAF files.

For each model run the method:
  1. Loads bed topography, ice thickness, ice fraction, and grounded fraction.

  2. Computes IVAF at every grid cell and time step.

  3. Subtracts the matched control-run IVAF to isolate the forced signal.

  4. Writes ivaf_<ice_sheet>_<group>_<model>_<exp>.nc next to the input files.

Returns:

1 if processing is successful.

Return type:

int

Raises:

ValueError – If projections_directory is not specified.
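Steps 2 and 3 of process() can be sketched with the standard height-above-flotation formulation. This is an assumption about the calculation, not a copy of ProjectionProcessor: the density values, the function name ivaf and the two-cell grid are illustrative only, and the real class reads per-model densities and a scale-factor file.

```python
import numpy as np

RHO_ICE = 910.0     # kg m^-3 (assumed value)
RHO_WATER = 1028.0  # kg m^-3 (assumed value)


def ivaf(thickness, bed, ice_frac, grounded_frac, cell_area, scalefac=1.0):
    """Ice volume above flotation (m^3) summed over grid cells."""
    # Where the bed lies below sea level, part of the column only displaces water
    haf = thickness + (RHO_WATER / RHO_ICE) * np.minimum(bed, 0.0)
    haf = np.maximum(haf, 0.0)  # floating ice contributes nothing
    return float(np.sum(haf * ice_frac * grounded_frac * cell_area * scalefac))


bed = np.array([100.0, -910.0])   # m; second cell is below sea level
ice = np.array([1.0, 1.0])
grounded = np.array([1.0, 0.0])   # second cell is floating
area = 8000.0 ** 2                # 8 km cells, in m^2

ivaf_exp = ivaf(np.array([1000.0, 500.0]), bed, ice, grounded, area)
ivaf_ctrl = ivaf(np.array([1010.0, 500.0]), bed, ice, grounded, area)

# Subtracting the control run isolates the forced signal
ivaf_anomaly = ivaf_exp - ivaf_ctrl
```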

ise.data.process.combine_gris_forcings(forcing_dir)[source]

Combines GrIS forcings from multiple CMIP model directories into consolidated NetCDF files.

Parameters:

forcing_dir (str) – Directory containing the GrIS forcing files.

Returns:

0 upon successful processing.

Return type:

int

ise.data.process.get_model_densities(zenodo_directory: str, output_path: str | None = None)[source]

Extracts density values (rhoi and rhow) from NetCDF files in the specified directory and returns them in a pandas DataFrame.

Parameters:
  • zenodo_directory (str) – Path to the directory containing the NetCDF files.

  • output_path (str, optional) – Path to save the extracted density values as a CSV file. Defaults to None.

Returns:

A DataFrame containing the group, model, rhoi, and rhow values for each model run.

Return type:

pandas.DataFrame

ise.data.process.get_xarray_data(dataset_fp, var_name=None, ice_sheet='AIS', convert_and_subset=False)[source]

Retrieves and processes data from an xarray dataset.

Parameters:
  • dataset_fp (str) – The file path to the xarray dataset.

  • var_name (str, optional) – The name of the variable to retrieve from the dataset. Defaults to None.

  • ice_sheet (str, optional) – The ice sheet type (‘AIS’ or ‘GrIS’). Defaults to ‘AIS’.

  • convert_and_subset (bool, optional) – If True, converts and subsets the dataset for the target time range. Defaults to False.

Returns:

The extracted variable as a NumPy array or the entire processed dataset.

Return type:

np.ndarray or xarray.Dataset

ise.data.process.interpolate_values(data)[source]

Interpolates missing values in the x and y dimensions of the input dataset using linear interpolation. Ensures that first and last values are properly adjusted to maintain consistency.

Parameters:

data (xarray.Dataset) – A dataset containing x and y dimensions with potential missing values.

Returns:

A tuple containing the interpolated x and y arrays.

Return type:

tuple

ise.data.process.merge_datasets(forcings, projections, experiments_file, ice_sheet='AIS')[source]

Join sector-level forcings and projections into a single analysis-ready DataFrame.

Uses the experiment metadata to add the AOGCM name to the projections table, normalises AOGCM name formatting so the two tables join cleanly on (aogcm, year, sector), then performs an inner merge.

Parameters:
  • forcings (pd.DataFrame) – Sector-level forcing DataFrame as produced by process_AIS/GrIS_atmospheric_sectors() + process_AIS/GrIS_oceanic_sectors() (or read from forcings.csv).

  • projections (pd.DataFrame) – Sector-level projection DataFrame as produced by process_AIS/GrIS_outputs() (or read from projections.csv).

  • experiments_file (str or pd.DataFrame) – Path to the experiment-metadata CSV (maps experiment IDs → AOGCM names) or a pre-loaded DataFrame.

  • ice_sheet (str, optional) – 'AIS' or 'GrIS'. Defaults to 'AIS'.

Returns:

Merged dataset with one row per (model, experiment, sector, year), containing all forcing columns and the target SLE projection.

Return type:

pandas.DataFrame
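The join step can be pictured with a small pandas example. The AOGCM strings, the lower-casing rule and the ts_anomaly column are assumptions made for illustration; the function's actual name normalisation is not documented here.

```python
import pandas as pd

forcings = pd.DataFrame({
    "aogcm": ["noresm1-m_rcp8.5"] * 2,
    "year": [2015, 2016],
    "sector": [1, 1],
    "ts_anomaly": [0.1, 0.2],
})
projections = pd.DataFrame({
    "aogcm": ["NorESM1-M_RCP8.5"] * 3,
    "year": [2015, 2016, 2017],
    "sector": [1, 1, 1],
    "sle": [0.0, 0.01, 0.02],
})

# Normalise the key formatting (here simply lower-casing) so both tables agree
projections["aogcm"] = projections["aogcm"].str.lower()

# Inner merge: rows without a partner in the other table are dropped
merged = forcings.merge(projections, on=["aogcm", "year", "sector"], how="inner")
```

The 2017 projection row has no matching forcing row, so it does not appear in merged.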

ise.data.process.process_AIS_atmospheric_sectors(forcing_directory, grid_file)[source]

Aggregate AIS atmospheric forcing to sector-level annual means.

Searches Atmosphere_Forcing/ for 8 km, 1995-2100 NetCDF files, loads each via ForcingFile, and averages spatially over each of the 18 AIS sectors defined by the grid file.

Parameters:
  • forcing_directory (str) – Root forcing directory (GHub layout expected). The function navigates to the Atmosphere_Forcing/ sub-directory automatically.

  • grid_file (str) – Path to the AIS sector-definition NetCDF (e.g. AIS_sectors_8km.nc).

Returns:

Rows indexed by (aogcm, sector, year) with one column per atmospheric forcing variable, plus aogcm, year, and sector.

Return type:

pandas.DataFrame
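Sector averaging amounts to a masked spatial mean per time step. The sketch below uses NumPy on a tiny made-up grid with two sectors; the real function reads the sector map from the grid file and works through ForcingFile, so the array names and shapes are assumptions.

```python
import numpy as np

# forcing: (time, y, x); sectors: (y, x) with an integer sector id per cell
forcing = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
sectors = np.array([[1, 1, 2],
                    [1, 2, 2]])

sector_means = {}
for sector_id in np.unique(sectors):
    mask = sectors == sector_id
    # Boolean-index the two spatial axes, then average over the selected cells
    sector_means[int(sector_id)] = np.nanmean(forcing[:, mask], axis=1)
```

Each entry of sector_means is a time series with one value per time step.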

ise.data.process.process_AIS_oceanic_sectors(forcing_directory, grid_file)[source]

Aggregate AIS oceanic forcing to sector-level annual means.

Loads thermal forcing, salinity, and ocean temperature NetCDFs from Ocean_Forcing/ (8 km, 1995-2100 files), depth-averages each variable, and then spatially averages over each of the 18 AIS sectors.

Parameters:
  • forcing_directory (str) – Root forcing directory (GHub layout expected). The function navigates to the Ocean_Forcing/ sub-directory automatically.

  • grid_file (str) – Path to the AIS sector-definition NetCDF (e.g. AIS_sectors_8km.nc).

Returns:

Rows indexed by (aogcm, sector, year) with columns thermal_forcing, salinity, temperature, aogcm, year, and sector.

Return type:

pandas.DataFrame

ise.data.process.process_AIS_outputs(zenodo_directory, with_ctrl=False)[source]

Load AIS IVAF scalar projections from Zenodo and convert to sea-level equivalent.

Reads per-experiment NetCDF files from the ComputedScalarsPaper/ sub-directory. Each file contains sector-level IVAF time series (ivaf_sector_1 … ivaf_sector_18). Files with only 85 time steps have their first year duplicated to reach the required 86.

SLE is computed as:

sle = -ivaf / 362.5 * 910 / (1e9 * 1000)

following the sign convention and ice density (910 kg m⁻³) used in Seroussi et al. (2020) ISMIP6 scripts.

Parameters:
  • zenodo_directory (str) – Path to the Zenodo download directory. The function looks inside ComputedScalarsPaper/ automatically.

  • with_ctrl (bool, optional) – If True, includes files that contain control projections (ivaf_AIS_* files, excluding hist/ctrl filenames). Defaults to False, which selects only ivaf_minus_ctrl_proj files.

Returns:

One row per (model, experiment, sector, year) with columns ivaf, sle, sector, year, id, exp, and model.

Return type:

pandas.DataFrame
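The AIS SLE formula and the 85-to-86 padding described for process_AIS_outputs can be tried on a synthetic series. The IVAF values below are invented, and the unit reading (IVAF in m³, SLE in mm) is an interpretation of the constants rather than something stated in the source.

```python
import numpy as np


def ivaf_to_sle_ais(ivaf):
    """Same arithmetic as the documented AIS formula."""
    return -ivaf / 362.5 * 910 / (1e9 * 1000)


# A synthetic 85-step IVAF series that loses ice over time
ivaf_85 = np.linspace(0.0, -8.4e12, 85)

# Pad to 86 steps by repeating the first value
ivaf_86 = np.concatenate([ivaf_85[:1], ivaf_85])

sle = ivaf_to_sle_ais(ivaf_86)
```

With the leading minus sign, a loss of ice above flotation (negative IVAF change) gives a positive SLE.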

ise.data.process.process_GrIS_atmospheric_sectors(forcing_directory, grid_file)[source]

Aggregate GrIS atmospheric forcing (aSMB and aST) to sector-level annual means.

Reads annual NetCDF files from Atmosphere_Forcing/aSMB_observed/v1/ and combines them per AOGCM via combine_gris_forcings() if combined files do not yet exist. Then averages Surface Mass Balance anomaly (aSMB) and surface temperature anomaly (aST) spatially over each of the 6 GrIS drainage basins.

Parameters:
  • forcing_directory (str) – Root forcing directory (GHub layout expected). The function navigates to the Atmosphere_Forcing/aSMB_observed/v1/ sub-directory automatically.

  • grid_file (str or xarray.Dataset) – Path to (or loaded) sector-definition NetCDF defining the 6 GrIS drainage-basin sectors.

Returns:

Rows indexed by (aogcm, sector, year) with columns aSMB, aST, aogcm, year, and sector.

Return type:

pandas.DataFrame

ise.data.process.process_GrIS_oceanic_sectors(forcing_directory, grid_file)[source]

Aggregate GrIS oceanic forcing to sector-level annual means.

Reads thermal forcing and basin runoff NetCDFs from Ocean_Forcing/Melt_Implementation/v4/ and spatially averages each over the 6 GrIS drainage-basin sectors.

Parameters:
  • forcing_directory (str) – Root forcing directory (GHub layout expected). The function navigates to the Ocean_Forcing/Melt_Implementation/v4/ sub-directory automatically.

  • grid_file (str or xarray.Dataset) – Path to (or loaded) sector-definition NetCDF defining the 6 GrIS drainage-basin sectors.

Returns:

Rows indexed by (aogcm, sector, year) with columns thermal_forcing, basin_runoff, aogcm, year, and sector.

Return type:

pandas.DataFrame

ise.data.process.process_GrIS_outputs(zenodo_directory)[source]

Load GrIS IVAF scalar projections from Zenodo and convert to sea-level equivalent.

Reads per-experiment NetCDF files from the v7_CMIP5_pub/ sub-directory. Each file contains basin-level IVAF time series for the 6 GrIS drainage basins (ivaf_no, ivaf_ne, ivaf_se, ivaf_sw, ivaf_cw, ivaf_nw). Files with only 85 time steps have their first year duplicated to reach the required 86.

SLE is computed as:

sle = ivaf / 362.5 / 1e9

Parameters:

zenodo_directory (str) – Path to the Zenodo download directory. The function looks inside v7_CMIP5_pub/ automatically.

Returns:

One row per (model, experiment, sector, year) with columns ivaf, sle, sector, year, id, exp, and model.

Return type:

pandas.DataFrame

ise.data.process.process_sectors(ice_sheet, forcing_directory, grid_file, zenodo_directory, experiments_file='<ise package>/data/data_files/ismip6_experiments_updated.csv', export_directory=None, overwrite=False, with_ctrl=False)[source]

End-to-end pipeline that builds the sector-level training dataset from raw ISMIP6 files.

This is the main entry point for data preparation. It reads raw climate forcing NetCDFs and pre-computed IVAF scalar projection files, aggregates both to sector-level annual time series (86 years, 2015-2100), joins them on (aogcm, year, sector), and returns a single analysis-ready DataFrame.

Intermediate files are written to export_directory so individual stages can be skipped on re-runs (controlled by overwrite):

  • <ice_sheet>_atmospheric.csv - sector-averaged atmospheric forcings

  • <ice_sheet>_oceanic.csv - sector-averaged oceanic forcings

  • forcings.csv - atmospheric + oceanic merged

  • projections.csv - IVAF projections by sector

  • dataset.csv - final merged dataset (also returned)

Parameters:
  • ice_sheet (str) – Ice sheet to process: 'AIS' (18 sectors) or 'GrIS' (6 sectors).

  • forcing_directory (str) – Root directory of the ISMIP6 forcing data. Expected sub-structure mirrors the GHub layout (Atmosphere_Forcing/, Ocean_Forcing/, etc.).

  • grid_file (str) – Path to the sector-definition NetCDF (e.g. AIS_sectors_8km.nc or GrIS_Basins_Rignot_sectors_5km.nc).

  • zenodo_directory (str) – Directory containing the pre-computed IVAF scalar files from Zenodo (ComputedScalarsPaper/ for AIS, v7_CMIP5_pub/ for GrIS).

  • experiments_file (str) – Path to the experiment-metadata CSV that maps experiment IDs to AOGCM names. Defaults to the bundled ismip6_experiments_updated.csv.

  • export_directory (str, optional) – Directory to write intermediate and final CSVs. If None, nothing is saved to disk.

  • overwrite (bool, optional) – If True, re-process and overwrite any existing intermediate files. Defaults to False.

  • with_ctrl (bool, optional) – AIS only — if True, includes control projections in the output. Defaults to False.

Returns:

Merged dataset with one row per (model, experiment, sector, year), containing both forcing variables and the target SLE projection.

Return type:

pandas.DataFrame

ise.data.dataclasses

PyTorch Dataset classes for ISEFlow training and inference.

This module provides four torch.utils.data.Dataset subclasses for loading ice-sheet emulator data. The default for ISEFlow is EmulatorDataset, which handles the 86-timestep projection structure and the sequence padding needed by the LSTM members of DeepEnsemble.

Dataset classes

EmulatorDataset (default for ISEFlow):

Wraps a flat (N_projections * 86, features) or batched (N_projections, 86, features) feature matrix. __getitem__ returns a zero-padded sliding window of sequence_length timesteps so that the LSTM always receives a fixed-length context window even at the start of a projection. Used by both LSTM.fit() and NormalizingFlow.fit():

from ise.data.dataclasses import EmulatorDataset
from torch.utils.data import DataLoader

ds = EmulatorDataset(X, y, sequence_length=5, projection_length=86)
loader = DataLoader(ds, batch_size=64, shuffle=True)

PyTorchDataset:

Minimal (X[i], y[i]) pair dataset with no sequence logic. Used when data is already structured as individual feature vectors (e.g. for the normalizing flow, which uses sequence_length=1).

TSDataset:

Similar to EmulatorDataset but expects pre-batched 3-D tensors (N, T, F). Kept for backward compatibility.

ScenarioDataset:

Simple (features[idx], labels[idx]) pair dataset used in the experimental scenario-classification models.

Padding convention

All sequence-aware datasets pad at the beginning of each projection with the zero vector so that the most recent timestep is always at index -1 of the returned sequence. This means the LSTM sees a causal context that grows from zero padding at t=1 to a full sequence_length window by t=sequence_length.
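The padding convention can be sketched in NumPy for a single projection. The helper padded_window is hypothetical and uses a 0-based index t; the dataset classes operate on torch tensors and handle the indexing across many projections themselves.

```python
import numpy as np


def padded_window(projection, t, sequence_length):
    """Return the `sequence_length` steps ending at index t, zero-padded at the front."""
    start = max(0, t - sequence_length + 1)
    window = projection[start:t + 1]
    n_pad = sequence_length - len(window)
    padding = np.zeros((n_pad, projection.shape[1]), dtype=projection.dtype)
    return np.concatenate([padding, window], axis=0)


# One projection: 86 timesteps, 1 feature, holding the values 1..86
projection = np.arange(1, 87, dtype=float).reshape(86, 1)

first = padded_window(projection, t=0, sequence_length=5)   # four zero rows, then step 0
later = padded_window(projection, t=10, sequence_length=5)  # steps 6..10, no padding
```

In both cases the last row of the window is the current timestep, matching the index -1 convention described above.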

class ise.data.dataclasses.EmulatorDataset(X, y, sequence_length=5, projection_length=86)[source]

Bases: Dataset

A PyTorch dataset for loading emulator data, designed to handle sequence-based inputs and projections.

Parameters:
  • X (pandas.DataFrame, numpy.ndarray, or torch.Tensor) – The input data.

  • y (pandas.DataFrame, numpy.ndarray, or torch.Tensor) – The target data.

  • sequence_length (int, optional) – The length of the input sequence. Default is 5.

  • projection_length (int or tuple, optional) – The length of the projection period. Default is 86.

X

The input data converted to a PyTorch tensor.

Type:

torch.Tensor

y

The target data converted to a PyTorch tensor.

Type:

torch.Tensor

sequence_length

The length of the input sequence.

Type:

int

xdim

The number of dimensions in X.

Type:

int

num_projections

The number of projections in the dataset.

Type:

int

num_timesteps

The number of timesteps per projection.

Type:

int

num_features

The number of features in the dataset.

Type:

int

_to_tensor(x)[source]

Converts input data to a PyTorch tensor.

__len__()[source]

Returns the total number of samples.

__getitem__(i)[source]

Retrieves the i-th sample from the dataset, including proper padding.

class ise.data.dataclasses.PyTorchDataset(X, y)[source]

Bases: Dataset

A PyTorch dataset for general-purpose data loading.

Parameters:
  • X (torch.Tensor) – The input data.

  • y (torch.Tensor) – The target data.

__getitem__(index)[source]

Retrieves the sample at the specified index.

__len__()[source]

Returns the total dataset length.

class ise.data.dataclasses.ScenarioDataset(features, labels)[source]

Bases: Dataset

A PyTorch dataset designed for scenario-based data loading.

Parameters:
  • features (torch.Tensor) – The input features.

  • labels (torch.Tensor) – The target labels.

features

The input features.

Type:

torch.Tensor

labels

The target labels.

Type:

torch.Tensor

__len__()[source]

Returns the dataset length.

__getitem__(idx)[source]

Retrieves the sample at the given index.

class ise.data.dataclasses.TSDataset(X, y, sequence_length=5)[source]

Bases: Dataset

A PyTorch dataset for handling time series data with sequence-based input.

Parameters:
  • X (torch.Tensor) – The input data.

  • y (torch.Tensor) – The target data.

  • sequence_length (int, optional) – The length of the input sequence. Default is 5.

X

The input data.

Type:

torch.Tensor

y

The target data.

Type:

torch.Tensor

sequence_length

The sequence length.

Type:

int

__len__()[source]

Returns the dataset length.

__getitem__(i)[source]

Retrieves the i-th time series sample.

ise.data.scaler

GPU-compatible PyTorch scalers for ISEFlow inputs and outputs.

This module provides StandardScaler, RobustScaler, and LogScaler as torch.nn.Module subclasses. They mirror the scikit-learn scaler API (fit / transform / inverse_transform / save / load) but operate on torch.Tensor objects and can be kept on GPU throughout the forward pass.

Why not use sklearn?

Scikit-learn scalers require a CPU round-trip and cannot participate in the autograd graph. These subclasses keep scaling arithmetic on whichever device the model is running on (CUDA or CPU), avoiding expensive device transfers during inference.

Scalers in the ISEFlow pipeline

The pretrained ISEFlow models ship two scikit-learn scalers: scaler_X.pkl for the input features and scaler_y.pkl for the SLE output target. Both are used inside ise.data.feature_engineer.scale_data and ISEFlow.predict().

The PyTorch scalers in this module are used during model training when GPU-resident tensors must be transformed inside the training loop without leaving the GPU:

from ise.data.scaler import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train_tensor)                 # computes mean/std on GPU
X_scaled = scaler.transform(X_train_tensor)
X_orig   = scaler.inverse_transform(X_scaled)

scaler.save("scaler.pt")
scaler_loaded = StandardScaler.load("scaler.pt")

Scaler summary

StandardScaler:

(x - mean) / std. For zero-variance columns, the standard deviation is replaced with a small epsilon to prevent division by zero.

RobustScaler:

(x - median) / IQR. More resistant to outliers than StandardScaler.

LogScaler:

log(x - min + epsilon). Useful for strictly positive, right-skewed targets. A shift is computed from the training-set minimum so that all values remain positive before taking the log.
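The three formulas above can be sketched in plain NumPy. This is a stand-in for the torch-based classes, not their implementation: the function names are hypothetical, the population standard deviation is used (the library may use the sample estimate), and the IQR is assumed to be the 75th minus the 25th percentile.

```python
import numpy as np

EPS = 1e-8  # same default as LogScaler(epsilon=1e-08)

def standard_scale(X):
    mean, std = X.mean(axis=0), X.std(axis=0)
    std = np.where(std == 0, EPS, std)        # guard zero-variance columns
    return (X - mean) / std

def robust_scale(X):
    median = np.median(X, axis=0)
    q25, q75 = np.percentile(X, [25, 75], axis=0)   # assumption: IQR = 75th - 25th percentile
    return (X - median) / (q75 - q25)

def log_scale(X, min_value, eps=EPS):
    return np.log(X - min_value + eps)        # shift so every argument is positive

def log_unscale(Z, min_value, eps=EPS):
    return np.exp(Z) + min_value - eps        # exact inverse of log_scale

X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]])  # second column is constant
Z_std = standard_scale(X)
Z_rob = robust_scale(X[:, :1])               # first column only: its IQR is non-zero
Z_log = log_scale(X, X.min())
X_back = log_unscale(Z_log, X.min())
```

The constant second column shows why the epsilon guard matters: without it, standard_scale would divide zero by zero and return NaN.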

class ise.data.scaler.LogScaler(epsilon=1e-08)[source]

Bases: Module

A class for scaling input data using a logarithmic transformation, ensuring all values are positive by applying a shift.

Parameters:

epsilon (float, optional) – A small constant to avoid log(0) errors. Defaults to 1e-8.

epsilon

A small constant to avoid log(0) errors.

Type:

float

min_value

The minimum value in the dataset used for shifting.

Type:

float

device

The device (CPU or GPU) on which calculations are performed.

Type:

torch.device

fit(X)[source]

Computes the minimum value of the input data for shifting.

transform(X)[source]

Applies the logarithmic transformation.

inverse_transform(X)[source]

Reverses the log transformation.

save(path)[source]

Saves the scaler parameters to a file.

load(path)[source]

Loads the scaler parameters from a file.

fit(X)[source]

Computes the minimum value in the dataset to ensure all values remain positive during transformation.

Parameters:

X (torch.Tensor) – The input data to be scaled.

inverse_transform(X)[source]

Reverses the log transformation to recover the original scale of the data.

Parameters:

X (torch.Tensor) – The log-transformed input data.

Returns:

The transformed input data in its original scale.

Return type:

torch.Tensor

static load(path)[source]

Load a LogScaler from disk.

Parameters:

path (str) – Path to a checkpoint produced by LogScaler.save().

Returns:

A scaler with epsilon and min_value restored.

Return type:

LogScaler

save(path)[source]

Save the fitted epsilon and min_value to path via torch.save.

Parameters:

path (str) – Destination file path.

transform(X)[source]

Applies the logarithmic transformation to the input data.

Parameters:

X (torch.Tensor) – The input data to be transformed.

Returns:

The log-transformed input data.

Return type:

torch.Tensor

class ise.data.scaler.RobustScaler[source]

Bases: Module

A class for scaling input data using the median and interquartile range (IQR), making it robust to outliers.

This class takes no constructor arguments.

median_

The median values of the input data.

Type:

torch.Tensor

iqr_

The interquartile range (IQR) values of the input data.

Type:

torch.Tensor

device

The device (CPU or GPU) on which the calculations are performed.

Type:

torch.device

fit(X)[source]

Computes the median and IQR of the input data.

transform(X)[source]

Scales the input data using the computed median and IQR.

inverse_transform(X)[source]

Reverses the scaling operation on the input data.

save(path)[source]

Saves the median and IQR to a file.

load(path)[source]

Loads the median and IQR from a file.

fit(X)[source]

Computes the median and interquartile range (IQR) of the input data.

Parameters:

X (torch.Tensor) – The input data to be scaled.

inverse_transform(X)[source]

Reverses the scaling operation on the input data.

Parameters:

X (torch.Tensor) – The scaled input data to be transformed back.

Returns:

The transformed input data.

Return type:

torch.Tensor

Raises:

RuntimeError – If the RobustScaler instance is not fitted yet.

static load(path)[source]

Load a RobustScaler from disk.

Parameters:

path (str) – Path to a checkpoint produced by RobustScaler.save().

Returns:

A scaler with median_ and iqr_ restored.

Return type:

RobustScaler

save(path)[source]

Save the fitted median and IQR tensors to path via torch.save.

Parameters:

path (str) – Destination file path.

transform(X)[source]

Scales the input data using the computed median and IQR.

Parameters:

X (torch.Tensor) – The input data to be scaled.

Returns:

The scaled input data.

Return type:

torch.Tensor

Raises:

RuntimeError – If the RobustScaler instance is not fitted yet.

class ise.data.scaler.StandardScaler[source]

Bases: Module

A class for scaling input data using mean and standard deviation.

This class takes no constructor arguments.

mean_

The mean values of the input data.

Type:

torch.Tensor

scale_

The standard deviation values of the input data.

Type:

torch.Tensor

device

The device (CPU or GPU) on which the calculations are performed.

Type:

torch.device

fit(X)[source]

Computes the mean and standard deviation of the input data.

transform(X)[source]

Scales the input data using the computed mean and standard deviation.

inverse_transform(X)[source]

Reverses the scaling operation on the input data.

save(path)[source]

Saves the mean and standard deviation to a file.

load(path)[source]

Loads the mean and standard deviation from a file.

fit(X)[source]

Computes the mean and standard deviation of the input data.

Parameters:

X (torch.Tensor) – The input data to be scaled.

inverse_transform(X)[source]

Reverses the scaling operation on the input data.

Parameters:

X (torch.Tensor) – The scaled input data to be transformed back.

Returns:

The transformed input data.

Return type:

torch.Tensor

Raises:

RuntimeError – If the StandardScaler instance is not fitted yet.

static load(path)[source]

Loads the mean and standard deviation from a file.

Parameters:

path (str) – The path to load the file from.

Returns:

A StandardScaler instance with the loaded mean and standard deviation.

Return type:

StandardScaler

save(path)[source]

Saves the mean and standard deviation to a file.

Parameters:

path (str) – The path to save the file.

transform(X)[source]

Scales the input data using the computed mean and standard deviation.

Parameters:

X (torch.Tensor) – The input data to be scaled.

Returns:

The scaled input data.

Return type:

torch.Tensor

Raises:

RuntimeError – If the StandardScaler instance is not fitted yet.

ise.data.utils

Time coordinate normalisation for ISMIP6 xarray datasets.

ISMIP6 models encode the time dimension in a wide variety of formats: cftime.DatetimeNoLeap, cftime.Datetime360Day, “days since” numeric offsets, plain numpy.datetime64, or integer year labels. Before any spatial or sector-level processing can be performed, all datasets must share a uniform numpy.datetime64 time axis covering 2015-2100 (86 years).

This module exposes a single function, convert_and_subset_times, that handles all known ISMIP6 time encodings and edge cases encountered in the GHub dataset collection, including:

  • cftime calendar types (NoLeap, 360-day) → pandas.DatetimeIndex

  • Numeric “days since X” offsets → numpy.datetime64

  • VUW PISM “seconds since 0001-01-01” offsets

  • UAF every-5-years datasets (assume 2015-2100)

  • Datasets with duplicate time stamps (de-duplicated by unique index)

  • Datasets shorter than 86 years (padded with forward-fill)

  • Datasets longer than 86 years (trimmed to the last 86 steps)

Usage

import xarray as xr
from ise.data.utils import convert_and_subset_times

ds = xr.open_dataset("lithk_AIS_NCAR_CISM_exp01.nc", decode_times=False)
ds = convert_and_subset_times(ds)
# ds.time is now numpy.datetime64 with 86 annual steps from 2015 to 2100

This function is called internally by ForcingFile.format_timestamps(), ProjectionProcessor._calculate_ivaf_single_file(), and the sector aggregation functions in ise.data.process.
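The last two edge cases in the list above (series shorter or longer than 86 years) can be sketched as follows. This is a simplified NumPy illustration, not the xarray-based implementation: fit_to_n_steps is a hypothetical helper, and forward-fill is assumed to mean repeating the final available step at the end of the series.

```python
import numpy as np

N_STEPS = 86  # annual steps for 2015-2100 inclusive

def fit_to_n_steps(values, n=N_STEPS):
    """Trim or forward-fill along axis 0 so the result has exactly n steps."""
    values = np.asarray(values)
    if len(values) >= n:
        return values[-n:]                        # keep the last n steps
    pad = np.repeat(values[-1:], n - len(values), axis=0)
    return np.concatenate([values, pad], axis=0)  # repeat the final step forward

years = np.arange(2015, 2101)                     # the target time axis
long_series = fit_to_n_steps(np.arange(100))      # 100 steps trimmed to the last 86
short_series = fit_to_n_steps(np.arange(80))      # 80 steps padded to 86
```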

ise.data.utils.convert_and_subset_times(dataset)[source]

Converts time variables in an xarray dataset to a uniform format and subsets time to the range 2015-2100.

Parameters:

dataset (xarray.Dataset) – The dataset with time values to be converted and subset.

Returns:

The dataset with standardized time format and subset to the correct time range.

Return type:

xarray.Dataset

Raises:

ValueError – If time values are not in a recognizable format.

Module contents

Data loading, processing, and utilities for ice sheet emulation.

This package provides:

  • ForcingFile: load and process climate forcing NetCDF data.

  • GridFile: load and format sector grid definitions.

  • ISEFlowAISInputs, ISEFlowGrISInputs: input dataclasses for ISEFlow predictions.

  • AnomalyConverter: convert raw absolute forcing arrays to anomalies using bundled ISMIP6 climatologies; used internally by from_absolute_forcings() on the input dataclasses.

  • feature_engineer: FeatureEngineer and helpers for scaling, splitting, and lag variables.

  • dataclasses: EmulatorDataset, PyTorchDataset, TSDataset, ScenarioDataset.

  • process: ProjectionProcessor and sector-level forcing/projection processing.

  • scaler: PyTorch-based StandardScaler, RobustScaler, LogScaler.

  • utils: time conversion and subsetting for xarray datasets.

class ise.data.AnomalyConverter(ice_sheet: str)[source]

Bases: object

Convert raw absolute forcing arrays to anomalies using ISMIP6 climatologies.

Parameters:

ice_sheet (str) – 'AIS' or 'GrIS'.

ice_sheet

Ice sheet identifier, 'AIS' or 'GrIS'.

Type:

str

climatology

The loaded climatology table for the selected ice sheet.

Type:

pd.DataFrame

property climatology: DataFrame

Return the climatology DataFrame, loading it on first access.

compute_ais(sector: int, pr: ndarray, evspsbl: ndarray, smb: ndarray, ts: ndarray, aogcm: str | None = None, custom_climatology: dict | None = None, mrro: ndarray | None = None) dict[source]

Compute AIS atmospheric anomalies from raw annual time-series arrays.

Subtracts the 1995-2014 ISMIP6 climatological baseline for the given AOGCM and sector from each raw input array. All anomaly outputs retain the same units as the corresponding inputs.

Exactly one of aogcm (use bundled ISMIP6 climatology) or custom_climatology (user-supplied baseline scalars) must be provided.

Parameters:
  • sector (int) – AIS drainage sector number (1-18).

  • pr (np.ndarray) – Raw precipitation time series (86 values, kg m⁻² s⁻¹).

  • evspsbl (np.ndarray) – Raw evaporation/sublimation time series (86 values, kg m⁻² s⁻¹).

  • smb (np.ndarray) – Raw surface mass balance time series (86 values, kg m⁻² s⁻¹).

  • ts (np.ndarray) – Raw surface temperature time series (86 values, K).

  • aogcm (str, optional) – AOGCM name to look up in the bundled climatology. Common alternate spellings are normalised automatically (e.g. 'NorESM1-M_rcp8.5' → 'noresm1-m_rcp85').

  • custom_climatology (dict, optional) – User-supplied 1995-2014 absolute baseline means for a CMIP model not in ISMIP6. Must contain keys 'pr' (kg m⁻² s⁻¹), 'evspsbl' (kg m⁻² s⁻¹), 'smb' (kg m⁻² s⁻¹), 'ts' (K), and optionally 'mrro' (kg m⁻² s⁻¹) if mrro is provided.

  • mrro (np.ndarray, optional) – Raw runoff time series (86 values, kg m⁻² s⁻¹). Required only for ISEFlow v1.0.0; not used by v1.1.0.

Returns:

Keys 'pr_anomaly', 'evspsbl_anomaly', 'smb_anomaly', 'ts_anomaly' as 86-element numpy arrays. Units match the inputs: kg m⁻² s⁻¹ for pr / evspsbl / smb, K for ts. 'mrro_anomaly' (kg m⁻² s⁻¹) is included when mrro is provided and a baseline is available for the requested AOGCM.

Return type:

dict

Raises:

ValueError – If neither or both of aogcm / custom_climatology are given, or if array lengths are not 86.
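The subtraction itself is simple enough to sketch. The snippet below is an illustration under assumptions, not the compute_ais source: ais_anomalies is a hypothetical helper that covers only the custom_climatology path, the baseline scalars are reused from the custom_climatology example for ISEFlowAISInputs.from_absolute_forcings, and the raw series are constant placeholders.

```python
import numpy as np

N_STEPS = 86  # annual values for 2015-2100

def ais_anomalies(raw, baseline):
    """Subtract a scalar baseline from each 86-step series; units are unchanged."""
    anomalies = {}
    for name, series in raw.items():
        series = np.asarray(series, dtype=float)
        if series.shape != (N_STEPS,):
            raise ValueError(f"{name} must have {N_STEPS} values, got shape {series.shape}")
        anomalies[f"{name}_anomaly"] = series - baseline[name]
    return anomalies

# Baseline scalars (kg m^-2 s^-1 for pr / evspsbl / smb, K for ts).
baseline = {"pr": 1.3e-5, "evspsbl": 3.8e-6, "smb": 9.0e-6, "ts": 253.7}
# Placeholder inputs: constant series at or slightly above the baseline.
raw = {
    "pr": np.full(N_STEPS, 1.4e-5),
    "evspsbl": np.full(N_STEPS, 3.8e-6),
    "smb": np.full(N_STEPS, 9.0e-6),
    "ts": np.full(N_STEPS, 255.7),
}
result = ais_anomalies(raw, baseline)
```

The output keys follow the 'pr_anomaly', 'evspsbl_anomaly', 'smb_anomaly', 'ts_anomaly' naming documented above, so the dict can be unpacked directly into the matching ISEFlowAISInputs fields.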

compute_gris(sector: int, smb: ndarray, st: ndarray, aogcm: str | None = None, custom_climatology: dict | None = None) dict[source]

Compute GrIS atmospheric anomalies from raw annual time-series arrays.

Subtracts the 1960-1989 MAR long-term mean for the given AOGCM and sector from each raw input array, then converts the SMB anomaly from mm w.e. yr⁻¹ to kg m⁻² s⁻¹ to match the units used in the ISMIP6 aSMB forcing files and in the ISEFlow training data.

Exactly one of aogcm (use bundled ISMIP6 climatology) or custom_climatology (user-supplied baseline scalars) must be provided.

Parameters:
  • sector (int) – GrIS drainage basin number (1-6).

  • smb (np.ndarray) – Raw (absolute) surface mass balance time series (86 values, mm w.e. yr⁻¹, matching the MAR 3.9 Reference file convention). Typical range: −2000 to +200 mm w.e. yr⁻¹ depending on sector. The output aSMB is automatically converted to kg m⁻² s⁻¹.

  • st (np.ndarray) – Raw (absolute) surface temperature time series (86 values, °C, matching the MAR 3.9 Reference file convention).

  • aogcm (str, optional) – AOGCM name to look up in the bundled climatology. Common alternate spellings are normalised automatically.

  • custom_climatology (dict, optional) – User-supplied 1960-1989 MAR absolute baseline means for a CMIP model not in ISMIP6. Must contain keys 'smb' (mm w.e. yr⁻¹) and 'st' (°C).

Returns:

{'aSMB': ..., 'aST': ...} as 86-element numpy arrays.

  • aSMB: SMB anomaly in kg m⁻² s⁻¹, matching the units of the ISMIP6 aSMB forcing files and the ISEFlow training data.

  • aST: surface temperature anomaly in °C.

Variable names match ISEFlowGrISInputs field names.

Return type:

dict

Raises:

ValueError – If neither or both of aogcm / custom_climatology are given, or if array lengths are not 86.
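The unit conversion relies on 1 mm w.e. being equal to 1 kg m⁻², so converting the SMB anomaly only requires dividing by the number of seconds in a year. The sketch below is an assumption-laden illustration rather than the compute_gris source: gris_anomalies is a hypothetical helper, a 365-day year is assumed (the library may use a different year length), the baseline scalars are reused from the custom_climatology example for ISEFlowGrISInputs.from_absolute_forcings, and the input series are constant placeholders.

```python
import numpy as np

SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: 365-day year

def gris_anomalies(smb, st, baseline):
    """Return aSMB in kg m^-2 s^-1 and aST in degrees C from raw MAR-convention inputs."""
    a_smb_mm_per_yr = np.asarray(smb, dtype=float) - baseline["smb"]
    return {
        "aSMB": a_smb_mm_per_yr / SECONDS_PER_YEAR,   # 1 mm w.e. = 1 kg m^-2
        "aST": np.asarray(st, dtype=float) - baseline["st"],
    }

baseline = {"smb": -241.2, "st": -22.8}         # mm w.e. yr^-1 and degrees C
smb = np.full(86, -241.2 + 31.536)              # 31.536 mm w.e. yr^-1 above the baseline
st = np.full(86, -20.8)                         # 2 degrees C above the baseline
out = gris_anomalies(smb, st, baseline)
```

With these placeholder inputs the SMB anomaly of 31.536 mm w.e. yr⁻¹ comes out at about 1e-6 kg m⁻² s⁻¹, which gives a feel for the magnitude difference between the two unit systems.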

get_climatology(aogcm: str, sector: int) dict[source]

Return the climatological mean values for a given AOGCM and sector.

Parameters:
  • aogcm (str) – Canonical AOGCM name (see list_aogcms()). Common alternate spellings are normalised automatically.

  • sector (int) – Sector / drainage basin number.

Returns:

Variable name → scalar climatological mean for the baseline period. AIS units: kg m⁻² s⁻¹ (pr / evspsbl / smb / mrro), K (ts). GrIS units: mm w.e. yr⁻¹ (smb), °C (st).

Return type:

dict

Raises:

KeyError – If aogcm is not found in the bundled climatology.
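A lookup with spelling normalisation might look like the sketch below. The normalisation rule is an assumption inferred from the single documented example ('NorESM1-M_rcp8.5' → 'noresm1-m_rcp85'); the real rules may cover more cases. normalise_aogcm, lookup_climatology, and the one-entry CLIMATOLOGY table are hypothetical, and the numbers in the table are placeholders rather than bundled ISMIP6 values.

```python
import re

def normalise_aogcm(name):
    """Lower-case the name and drop the dot in the RCP suffix (assumed rule)."""
    name = name.strip().lower()
    return re.sub(r"rcp(\d)\.(\d)", r"rcp\1\2", name)

# Hypothetical table keyed by (aogcm, sector); values are placeholders.
CLIMATOLOGY = {
    ("noresm1-m_rcp85", 10): {"pr": 1.3e-5, "evspsbl": 3.8e-6, "smb": 9.0e-6, "ts": 253.7},
}

def lookup_climatology(aogcm, sector):
    key = (normalise_aogcm(aogcm), sector)
    if key not in CLIMATOLOGY:
        raise KeyError(f"no climatology for AOGCM {key[0]!r}, sector {sector}")
    return CLIMATOLOGY[key]

canonical = normalise_aogcm("NorESM1-M_rcp8.5")
entry = lookup_climatology("NorESM1-M_rcp8.5", 10)
```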

list_aogcms() list[str][source]

Return the list of AOGCM names available in the bundled climatology.

class ise.data.ForcingFile(ice_sheet: str, realm: str, filepath: str, varname: str | None = None)[source]

Bases: object

Wrapper for loading and processing climate forcing NetCDF files.

Supports atmospheric and oceanic realms, sector assignment, depth aggregation (ocean), and sector-averaged time series.

Parameters:
  • ice_sheet (str) – Ice sheet identifier (‘AIS’ or ‘GrIS’).

  • realm (str) – Forcing realm (‘atmos’ or ‘ocean’).

  • filepath (str) – Path to the NetCDF forcing file.

  • varname (str, optional) – Name of the data variable. Defaults to None (first data var).

ice_sheet

Ice sheet identifier.

Type:

str

realm

Forcing realm.

Type:

str

filepath

Path to the file.

Type:

str

data

Loaded dataset after load().

Type:

xarray.Dataset or None

sector_averages

Sector-averaged data after average_over_sector().

Type:

xarray.Dataset or None

sectors

Sector IDs after assign_sectors().

Type:

numpy.ndarray or None

varname

Data variable name.

Type:

str or None

aggregate_depth(method='mean')[source]

Aggregate over the depth dimension (ocean realm only).

Parameters:

method (str) – ‘mean’ or ‘sum’. Defaults to ‘mean’.

Returns:

The dataset with depth aggregated.

Return type:

xarray.Dataset

Raises:

ValueError – If realm is not ‘ocean’, data not loaded, or no ‘z’ dimension.
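As a rough illustration of what depth aggregation does to the array shape, the NumPy sketch below collapses a depth axis by mean or sum. It is not the ForcingFile implementation: aggregate_over_depth is a hypothetical helper, the (time, z, y, x) layout is an assumption, and the field values are placeholders.

```python
import numpy as np

def aggregate_over_depth(values, depth_axis=1, method="mean"):
    """Collapse the depth axis with the requested reduction."""
    if method == "mean":
        return values.mean(axis=depth_axis)
    if method == "sum":
        return values.sum(axis=depth_axis)
    raise ValueError(f"method must be 'mean' or 'sum', got {method!r}")

# Placeholder field: 86 timesteps, 3 depth levels (values 1, 2, 3), 2 x 2 horizontal grid.
field = np.ones((86, 3, 2, 2)) * np.array([1.0, 2.0, 3.0]).reshape(1, 3, 1, 1)
depth_mean = aggregate_over_depth(field, method="mean")
depth_sum = aggregate_over_depth(field, method="sum")
```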

assign_sectors(sectors: ndarray | GridFile) Dataset[source]

Assign sector IDs to the dataset (e.g. from a GridFile).

Parameters:

sectors (numpy.ndarray or GridFile) – Sector IDs or GridFile to get sectors from.

Returns:

The dataset with sector coordinate.

Return type:

xarray.Dataset

Raises:

ValueError – If data is not loaded.

average_over_sector(sector_number: int | None = None) Dataset[source]

Average data over grid cells within a sector (or all sectors).

Parameters:

sector_number (int, optional) – Sector ID to average over. Defaults to None, which is not currently supported and raises NotImplementedError (see Raises).

Returns:

Sector-averaged data.

Return type:

xarray.Dataset

Raises:
  • ValueError – If data not loaded or sectors not assigned.

  • NotImplementedError – If sector_number is None (averaging all sectors at once).
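Sector averaging amounts to masking the grid by sector ID and taking the mean over the selected cells at each timestep. The sketch below shows the idea in NumPy; it is not the xarray-based implementation, sector_mean is a hypothetical helper, the (time, y, x) layout is an assumption, and NaN handling is left out.

```python
import numpy as np

def sector_mean(data, sectors, sector_number):
    """Mean over all grid cells whose sector ID equals sector_number, per timestep."""
    mask = sectors == sector_number            # boolean (y, x) mask
    if not mask.any():
        raise ValueError(f"no grid cells belong to sector {sector_number}")
    return data[:, mask].mean(axis=1)          # (time, n_cells) -> (time,)

# Placeholder data: 2 timesteps on a 2 x 2 grid, with two sectors of two cells each.
data = np.arange(8, dtype=float).reshape(2, 2, 2)
sectors = np.array([[1, 1], [2, 2]])
sector1 = sector_mean(data, sectors, 1)
sector2 = sector_mean(data, sectors, 2)
```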

drop_vars(vars: list[str]) Dataset[source]

Drop dimensions or variables from the loaded dataset.

Parameters:

vars (List[str]) – Names of dimensions or variables to drop.

Returns:

The dataset (modified in place).

Return type:

xarray.Dataset

format_timestamps() Dataset[source]

Convert and subset time coordinate to 2015-2100 (86 years).

Returns:

The dataset with formatted time.

Return type:

xarray.Dataset

get_data() Dataset[source]

Return the loaded dataset.

load(filepath: str | None = None, validate=True, **kwargs) Dataset[source]

Load the forcing dataset from the NetCDF file.

Parameters:
  • filepath (str, optional) – Override path. Defaults to self.filepath.

  • validate (bool, optional) – Whether to validate (non-NaN data). Defaults to True.

  • **kwargs – Passed to xarray.open_dataset.

Returns:

The loaded dataset.

Return type:

xarray.Dataset

class ise.data.GridFile(ice_sheet: str, filepath: str)[source]

Bases: object

Wrapper for loading and formatting sector grid NetCDF files.

Used to load sector IDs and optionally expand/align dimensions for compatibility with forcing data (e.g. time dimension of length 86).

Parameters:
  • ice_sheet (str) – Ice sheet identifier (‘AIS’ or ‘GrIS’).

  • filepath (str) – Path to the grid NetCDF file.

ice_sheet

Ice sheet identifier.

Type:

str

filepath

Path to the file.

Type:

str

data

Loaded dataset after load().

Type:

xarray.Dataset or None

sector_variable_name

Name of the sector variable (‘sectors’ for AIS, ‘ID’ for GrIS).

Type:

str

align_dims(dims: list | None = None) Dataset[source]

Transpose dimensions to a standard order.

Parameters:

dims (list, optional) – Dimension order. If None, uses (‘time’, ‘x’, ‘y’, …).

Returns:

The dataset with reordered dimensions.

Return type:

xarray.Dataset

expand_dims(dim: str = 'time', size: int | None = None) Dataset[source]

Expand dimensions (e.g. add time dimension of given size).

Parameters:
  • dim (str, optional) – Dimension name. Defaults to ‘time’.

  • size (int, optional) – Size of the new dimension. Defaults to None.

Returns:

The dataset with expanded dimension.

Return type:

xarray.Dataset

format_grids() Dataset[source]

Load (if needed), expand time to 86, and align dimensions.

Returns:

The formatted grid dataset.

Return type:

xarray.Dataset
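Expanding the sector grid along a time dimension of length 86 lets it line up cell-for-cell with a forcing array. A NumPy analogue of that step is sketched below; it is only an illustration of the shape change, not the xarray code in GridFile, and the 2 x 2 sector grid is a placeholder.

```python
import numpy as np

sectors = np.array([[1, 1], [2, 3]])                         # placeholder (y, x) sector IDs
expanded = np.broadcast_to(sectors, (86,) + sectors.shape)   # read-only (time, y, x) view

# The expanded grid can now mask a forcing array of the same shape directly.
forcing = np.ones((86, 2, 2))
sector1_total = forcing[expanded == 1].sum()                 # 2 cells x 86 timesteps
```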

get_sectors() DataArray[source]

Return the sector ID array from the grid dataset.

load(filepath: str | None = None, **kwargs) Dataset[source]

Load the grid dataset from the NetCDF file.

Parameters:
  • filepath (str, optional) – Override path. Defaults to self.filepath.

  • **kwargs – Passed to xarray.open_dataset.

Returns:

The loaded dataset.

Return type:

xarray.Dataset

class ise.data.ISEFlowAISInputs(year: ndarray, sector: ndarray | int, pr_anomaly: ndarray, evspsbl_anomaly: ndarray, smb_anomaly: ndarray, ts_anomaly: ndarray, ocean_thermal_forcing: ndarray, ocean_salinity: ndarray, ocean_temperature: ndarray, ice_shelf_fracture: bool, ocean_sensitivity: str, mrro_anomaly: ndarray | None = None, initial_year: int | None = None, numerics: str | None = None, stress_balance: str | None = None, resolution: str | None = None, init_method: str | None = None, melt_in_floating_cells: str | None = None, icefront_migration: str | None = None, ocean_forcing_type: str | None = None, open_melt_type: str | None = None, standard_melt_type: str | None = None, model_configs: str | None = None, version: str = 'v1.1.0', override_params: dict | None = None)[source]

Bases: object

Inputs for an ISEFlow-AIS prediction.

Expects pre-computed anomaly arrays (pr_anomaly, evspsbl_anomaly, smb_anomaly, ts_anomaly). If you have raw absolute forcing values instead, use the alternative constructor:

inputs = ISEFlowAISInputs.from_absolute_forcings(
    year=..., sector=..., pr=..., evspsbl=..., smb=..., ts=...,
    ocean_thermal_forcing=..., ocean_salinity=..., ocean_temperature=...,
    aogcm="noresm1-m_rcp85",   # or custom_climatology={...}
    **ism_config_kwargs,
)

from_absolute_forcings() subtracts the ISMIP6 1995-2014 climatological baseline automatically. Pass aogcm for a bundled ISMIP6 model or custom_climatology (dict with keys 'pr', 'evspsbl', 'smb', 'ts') for a CMIP model not in the bundled climatology.

evspsbl_anomaly: ndarray
classmethod from_absolute_forcings(year: ndarray, sector: int, pr: ndarray, evspsbl: ndarray, smb: ndarray, ts: ndarray, ocean_thermal_forcing: ndarray, ocean_salinity: ndarray, ocean_temperature: ndarray, aogcm: str | None = None, custom_climatology: dict | None = None, mrro: ndarray | None = None, **kwargs) ISEFlowAISInputs[source]

Construct ISEFlowAISInputs from raw (non-anomaly) atmospheric forcings.

Subtracts the ISMIP6 1995-2014 climatological baseline from each atmospheric variable to produce the anomaly arrays required by the model. Ocean variables (ocean_thermal_forcing, ocean_salinity, ocean_temperature) are absolute values and are passed through unchanged.

Exactly one of aogcm or custom_climatology must be provided.

Parameters:
  • year (np.ndarray) – Years corresponding to the time series (86 values, 2015-2100).

  • sector (int) – AIS drainage sector (1-18).

  • pr (np.ndarray) – Raw precipitation (86 values, kg m⁻² s⁻¹).

  • evspsbl (np.ndarray) – Raw evaporation / sublimation (86 values, kg m⁻² s⁻¹).

  • smb (np.ndarray) – Raw surface mass balance (86 values, kg m⁻² s⁻¹).

  • ts (np.ndarray) – Raw surface temperature (86 values, K).

  • ocean_thermal_forcing (np.ndarray) – Ocean thermal forcing (86 values, °C). Passed through unchanged.

  • ocean_salinity (np.ndarray) – Ocean salinity (86 values, PSU). Passed through unchanged.

  • ocean_temperature (np.ndarray) – Ocean temperature (86 values, °C). Passed through unchanged.

  • aogcm (str, optional) – AOGCM name to look up in the bundled ISMIP6 climatology (e.g. 'noresm1-m_rcp85'). Common alternate spellings are normalised automatically.

  • custom_climatology (dict, optional) – Baseline means for a CMIP model not in the bundled climatology. Must contain keys 'pr', 'evspsbl', 'smb', 'ts' (and 'mrro' if mrro is also provided). Values should be in the same units as the raw input arrays.

  • mrro (np.ndarray, optional) – Raw runoff (86 values, kg m⁻² s⁻¹). Only needed for ISEFlow v1.0.0.

  • **kwargs – All remaining keyword arguments are forwarded to ISEFlowAISInputs.__init__ (e.g. ISM config fields such as numerics, stress_balance, model_configs, etc.).

Returns:

Fully validated inputs object ready for model.predict().

Return type:

ISEFlowAISInputs

Examples

Using a bundled ISMIP6 climatology:

inputs = ISEFlowAISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=10,
    pr=pr_array,
    evspsbl=evspsbl_array,
    smb=smb_array,
    ts=ts_array,
    ocean_thermal_forcing=otf_array,
    ocean_salinity=sal_array,
    ocean_temperature=temp_array,
    aogcm="noresm1-m_rcp85",
    numerics="fd",
    stress_balance="hybrid",
    resolution="8",
    init_method="eq",
    initial_year=2005,
    melt_in_floating_cells="sub-grid",
    icefront_migration="str",
    ocean_forcing_type="open",
    ocean_sensitivity="medium",
    ice_shelf_fracture=False,
    open_melt_type="quad",
    standard_melt_type="nonlocal",
)

Using a custom climatology for a new CMIP model:

inputs = ISEFlowAISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=10,
    pr=pr_array, evspsbl=evspsbl_array,
    smb=smb_array, ts=ts_array,
    ocean_thermal_forcing=otf_array,
    ocean_salinity=sal_array,
    ocean_temperature=temp_array,
    custom_climatology={
        "pr": 1.3e-5, "evspsbl": 3.8e-6,
        "smb": 9.0e-6, "ts": 253.7,
    },
    numerics="fd", ...
)
classmethod from_raw_values(*args, **kwargs)[source]

Deprecated — use from_absolute_forcings instead.

ice_shelf_fracture: bool
icefront_migration: str | None = None
init_method: str | None = None
initial_year: int | None = None
melt_in_floating_cells: str | None = None
model_configs: str | None = None
mrro_anomaly: ndarray | None = None
numerics: str | None = None
ocean_forcing_type: str | None = None
ocean_salinity: ndarray
ocean_sensitivity: str
ocean_temperature: ndarray
ocean_thermal_forcing: ndarray
open_melt_type: str | None = None
override_params: dict | None = None
pr_anomaly: ndarray
resolution: str | None = None
sector: ndarray | int
smb_anomaly: ndarray
standard_melt_type: str | None = None
stress_balance: str | None = None
to_df()[source]

Convert the dataclass fields to a pandas DataFrame.

Returns:

One row per timestep (86 rows) with all forcing and configuration columns needed by ISEFlow_AIS.process().

Return type:

pandas.DataFrame

ts_anomaly: ndarray
version: str = 'v1.1.0'
year: ndarray
class ise.data.ISEFlowGrISInputs(year: ndarray, sector: ndarray | int, aST: ndarray, aSMB: ndarray, ocean_thermal_forcing: ndarray, basin_runoff: ndarray, ice_shelf_fracture: bool, ocean_sensitivity: str, standard_ocean_forcing: bool, initial_year: int | None = None, numerics: str | None = None, ice_flow_model: str | None = None, initialization: str | None = None, initial_smb: str | None = None, velocity: str | None = None, bedrock_topography: str | None = None, surface_thickness: str | None = None, geothermal_heat_flux: str | None = None, res_min: str | None = None, res_max: str | None = None, model_configs: str | None = None, version: str = 'v1.1.0')[source]

Bases: object

Inputs for an ISEFlow-GrIS prediction.

Expects pre-computed anomaly arrays (aSMB, aST). If you have raw absolute forcing values instead, use the alternative constructor:

inputs = ISEFlowGrISInputs.from_absolute_forcings(
    year=..., sector=..., smb=..., st=...,
    ocean_thermal_forcing=..., basin_runoff=...,
    aogcm="hadgem2-es_rcp85",  # or custom_climatology={...}
    **ism_config_kwargs,
)

from_absolute_forcings() subtracts the ISMIP6 1960-1989 MAR climatological baseline automatically. Pass aogcm for a bundled ISMIP6 model or custom_climatology (dict with keys 'smb', 'st') for a CMIP model not in the bundled climatology.

aSMB: ndarray
aST: ndarray
basin_runoff: ndarray
bedrock_topography: str | None = None
classmethod from_absolute_forcings(year: ndarray, sector: int, smb: ndarray, st: ndarray, ocean_thermal_forcing: ndarray, basin_runoff: ndarray, aogcm: str | None = None, custom_climatology: dict | None = None, **kwargs) ISEFlowGrISInputs[source]

Construct ISEFlowGrISInputs from raw (non-anomaly) atmospheric forcings.

Subtracts the ISMIP6 1960-1989 MAR climatological baseline from each atmospheric variable to produce the anomaly arrays (aSMB, aST) required by the model. Ocean variables (ocean_thermal_forcing, basin_runoff) are absolute values and are passed through unchanged.

Exactly one of aogcm or custom_climatology must be provided.

Parameters:
  • year (np.ndarray) – Years (86 values, 2015-2100).

  • sector (int) – GrIS drainage basin number (1-6).

  • smb (np.ndarray) – Raw surface mass balance (86 values, mm w.e. yr⁻¹, matching the MAR Reference file units used in the bundled climatology CSV). The anomaly conversion automatically converts to kg m⁻² s⁻¹.

  • st (np.ndarray) – Raw surface temperature (86 values, °C, matching the MAR 3.9 Reference file convention).

  • ocean_thermal_forcing (np.ndarray) – Ocean thermal forcing (86 values). Passed through unchanged.

  • basin_runoff (np.ndarray) – Basin-integrated runoff (86 values). Passed through unchanged.

  • aogcm (str, optional) – AOGCM name to look up in the bundled ISMIP6 climatology (e.g. 'hadgem2-es_rcp85'). Common alternate spellings are normalised automatically.

  • custom_climatology (dict, optional) – Baseline means for a CMIP model not in the bundled climatology. Must contain keys 'smb' and 'st' in MAR units.

  • **kwargs – All remaining keyword arguments are forwarded to ISEFlowGrISInputs.__init__ (e.g. ISM config fields such as numerics, ice_flow_model, model_configs, etc.).

Returns:

Fully validated inputs object ready for model.predict().

Return type:

ISEFlowGrISInputs

Examples

Using a bundled ISMIP6 climatology:

inputs = ISEFlowGrISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=1,
    smb=smb_array,
    st=st_array,
    ocean_thermal_forcing=otf_array,
    basin_runoff=runoff_array,
    aogcm="hadgem2-es_rcp85",
    initial_year=1990,
    numerics="fe",
    ice_flow_model="ho",
    initialization="dav",
    initial_smb="ra3",
    velocity="joughin",
    bedrock_topography="morlighem",
    surface_thickness="None",
    geothermal_heat_flux="g",
    res_min=1.0,
    res_max=7.5,
    standard_ocean_forcing=True,
    ocean_sensitivity="medium",
    ice_shelf_fracture=False,
)

Using a custom climatology for a new CMIP model:

inputs = ISEFlowGrISInputs.from_absolute_forcings(
    year=np.arange(2015, 2101),
    sector=1,
    smb=smb_array,
    st=st_array,
    ocean_thermal_forcing=otf_array,
    basin_runoff=runoff_array,
    custom_climatology={"smb": -241.2, "st": -22.8},
    initial_year=1990, ...
)
classmethod from_raw_values(*args, **kwargs)[source]

Deprecated — use from_absolute_forcings instead.

geothermal_heat_flux: str | None = None
ice_flow_model: str | None = None
ice_shelf_fracture: bool
initial_smb: str | None = None
initial_year: int | None = None
initialization: str | None = None
model_configs: str | None = None
numerics: str | None = None
ocean_sensitivity: str
ocean_thermal_forcing: ndarray
res_max: str | None = None
res_min: str | None = None
sector: ndarray | int
standard_ocean_forcing: bool
surface_thickness: str | None = None
to_df()[source]

Convert the dataclass fields to a pandas DataFrame.

Returns:

One row per timestep (86 rows) with all forcing and configuration columns needed by ISEFlow_GrIS.process().

Return type:

pandas.DataFrame

velocity: str | None = None
version: str = 'v1.1.0'
year: ndarray