ise.evaluation
Evaluation metrics for ice sheet emulator predictions.
Covers four categories of metrics:
Point metrics — R², MSE, MAPE, relative squared error, sector-wise MSE.
Probabilistic / uncertainty metrics — CRPS (Gaussian), Winkler score, Expected Calibration Error (ECE), prediction interval width.
Distribution metrics — KL divergence, JS divergence, KS test, two-sample t-test.
Spatial aggregation — sum_by_sector for basin-level aggregation.
Submodules
ise.evaluation.metrics
Evaluation metrics for ISEFlow sea-level projections.
This module collects the full set of metrics used to assess ISEFlow prediction quality across three dimensions: point accuracy, distributional fidelity, and uncertainty calibration.
Point accuracy metrics
r2_score — coefficient of determination (R²).
mean_squared_error — MSE between predicted and true SLE.
mean_absolute_error — MAE.
mape — mean absolute percentage error (skips zero targets).
relative_squared_error — RSE = SS_res / SS_tot (0 = perfect, 1 = baseline mean).
crps — continuous ranked probability score for Gaussian forecasts (lower is better; uses properscoring library).
Distribution metrics
These compare the distribution of projected SLE values at 2100 across many runs, rather than individual timestep accuracy:
kl_divergence — Kullback-Leibler divergence between predicted and true PDFs.
js_divergence — Jensen-Shannon divergence (symmetric; bounded in [0, 1] when base-2 logarithms are used).
kolmogorov_smirnov — KS two-sample test statistic and p-value.
t_test — two-sample t-test statistic and p-value.
Uncertainty calibration metrics
calculate_ece — Expected Calibration Error (ECE). Measures how well the predicted std aligns with actual errors by binning predictions by uncertainty level and checking what fraction of true values fall within ±2σ (expected ≈ 95.4 % for a Gaussian). Lower ECE = better calibrated.
mean_prediction_interval_width — average width of prediction intervals (sharpness proxy; should be small while ECE stays low).
winkler_score — proper scoring rule for interval forecasts at significance level α. Penalises both wide intervals and violations.
Sector aggregation
sum_by_sector — given a full 2-D (x, y) grid array and a sector-definition NetCDF file, sums values within each sector mask to produce an (N_timesteps, N_sectors) matrix:
from ise.evaluation.metrics import sum_by_sector
sector_sums = sum_by_sector(predicted_grid, "AIS_sectors_8km.nc")
Usage example
from ise.evaluation.metrics import r2_score, crps, calculate_ece
r2 = r2_score(y_true, predictions)
score = crps(y_true, predictions, uncertainties["total"])
ece = calculate_ece(predictions, uncertainties["total"], y_true)
- ise.evaluation.metrics.calculate_ece(predictions, uncertainties, true_values, bins=10)
Computes the Expected Calibration Error (ECE) for a regression model.
- Parameters:
predictions (numpy.ndarray) – The predicted mean values.
uncertainties (numpy.ndarray) – The predicted standard deviations.
true_values (numpy.ndarray) – The true values.
bins (int, optional) – The number of bins for uncertainty grouping. Defaults to 10.
- Returns:
The Expected Calibration Error (ECE).
- Return type:
float
Notes
ECE measures how well predicted uncertainties align with actual errors.
A lower ECE indicates better-calibrated uncertainty estimates.
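The module overview describes ECE as binning predictions by predicted uncertainty and checking what fraction of true values fall within ±2σ, against the Gaussian expectation of about 95.4 %. A minimal NumPy sketch of that idea follows. The name ece_sketch, the equal-width binning, and the weighting of each bin by its share of samples are assumptions made for illustration, not the library's actual implementation.

```python
import numpy as np


def ece_sketch(predictions, uncertainties, true_values, bins=10,
               k=2.0, expected=0.954):
    """Illustrative regression ECE (hypothetical helper, not the ise API)."""
    predictions = np.asarray(predictions, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    true_values = np.asarray(true_values, dtype=float)

    # Assumption: equal-width bins spanning the range of predicted std.
    edges = np.linspace(uncertainties.min(), uncertainties.max(), bins + 1)
    # Interior edges give bin indices 0 .. bins-1.
    idx = np.clip(np.digitize(uncertainties, edges[1:-1]), 0, bins - 1)

    # True where the observation lies inside the +/- k*sigma interval.
    within = np.abs(true_values - predictions) <= k * uncertainties

    ece = 0.0
    for b in range(bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty bins
        observed = within[mask].mean()
        # Weight each bin's coverage gap by its share of the samples.
        ece += mask.mean() * abs(observed - expected)
    return float(ece)
```

Under these assumptions, a model whose intervals always contain the truth is still penalised slightly (coverage 1.0 versus the expected 0.954), which reflects overly wide uncertainty estimates.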
- ise.evaluation.metrics.crps(y_true, y_pred, y_std)
Computes the Continuous Ranked Probability Score (CRPS) for a Gaussian distribution.
- Parameters:
y_true (numpy.ndarray) – The true values.
y_pred (numpy.ndarray) – The predicted mean values.
y_std (numpy.ndarray) – The predicted standard deviations.
- Returns:
The computed CRPS values for each prediction.
- Return type:
numpy.ndarray
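For a Gaussian forecast the CRPS has a well-known closed form, CRPS = σ [ z (2Φ(z) − 1) + 2φ(z) − 1/√π ] with z = (y − μ)/σ. The sketch below evaluates that formula directly with NumPy. The overview states that the library relies on properscoring, so crps_gaussian_sketch is a hypothetical stand-in for illustration rather than the code the module runs.

```python
import math

import numpy as np


def crps_gaussian_sketch(y_true, y_pred, y_std):
    """Closed-form CRPS for Gaussian forecasts (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_std = np.asarray(y_std, dtype=float)

    z = (y_true - y_pred) / y_std
    # Standard normal pdf and cdf evaluated at z.
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

    # One CRPS value per prediction; lower is better.
    return y_std * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

Because the result is per prediction, a single summary number is usually obtained by averaging, for example `crps_gaussian_sketch(y, mu, sigma).mean()`.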
- ise.evaluation.metrics.js_divergence(p: ndarray, q: ndarray)
Computes the Jensen-Shannon Divergence (JSD) between two probability distributions.
- Parameters:
p (numpy.ndarray) – The first probability distribution.
q (numpy.ndarray) – The second probability distribution.
- Returns:
The Jensen-Shannon divergence value.
- Return type:
float
Notes
JSD is a smoothed and symmetric version of KL divergence.
The function normalizes the distributions before computation.
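To make the definition concrete, the sketch below normalizes both inputs and averages the two KL terms against the mixture m = (p + q)/2. The function name, the base argument, and the handling of zero-probability bins are assumptions for illustration. The source does not say which logarithm base the library uses; with natural logarithms the upper bound is ln 2 rather than 1.

```python
import numpy as np


def js_divergence_sketch(p, q, base=2.0):
    """Jensen-Shannon divergence of two histograms (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so each input sums to 1.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def _kl(a, b):
        # Terms with a == 0 contribute nothing, so mask them out.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    # Dividing by log(base) converts from nats to the requested base.
    return float((0.5 * _kl(p, m) + 0.5 * _kl(q, m)) / np.log(base))
```

With base 2, identical distributions score 0 and distributions with disjoint support score 1.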
- ise.evaluation.metrics.kl_divergence(p: ndarray, q: ndarray)
Computes the Kullback-Leibler (KL) Divergence between two probability distributions.
- Parameters:
p (numpy.ndarray) – The first probability distribution.
q (numpy.ndarray) – The second probability distribution.
- Returns:
The KL divergence value.
- Return type:
float
Notes
The distributions p and q must be normalized (i.e., sum to 1).
Small epsilon values are used to avoid numerical instability.
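The notes above can be illustrated with a short sketch of KL(p ‖ q) = Σ pᵢ log(pᵢ / qᵢ). How the library applies its epsilon is not documented here, so adding a fixed eps to both inputs is an assumption, as are the name kl_divergence_sketch and the default eps value.

```python
import numpy as np


def kl_divergence_sketch(p, q, eps=1e-10):
    """KL(p || q) in nats for normalized inputs (hypothetical helper)."""
    # Assumption: a small eps keeps log() and the ratio finite at zero bins.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))
```

Unlike the Jensen-Shannon divergence, KL is asymmetric, so swapping the arguments generally changes the result.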
- ise.evaluation.metrics.kolmogorov_smirnov(x1, x2)
Computes the Kolmogorov-Smirnov (KS) statistic to compare two distributions.
- Parameters:
x1 (numpy.ndarray or list) – The first dataset.
x2 (numpy.ndarray or list) – The second dataset.
- Returns:
(KS statistic, p-value).
- Return type:
tuple
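The KS statistic is the largest vertical gap between the two empirical CDFs. The NumPy sketch below computes only that statistic, to show what the first element of the returned tuple measures; it does not compute the p-value. The name ks_statistic_sketch is hypothetical, and nothing here is taken from the library's source. In practice, scipy.stats.ks_2samp returns both the statistic and the p-value.

```python
import numpy as np


def ks_statistic_sketch(x1, x2):
    """Two-sample KS statistic only (hypothetical helper, no p-value)."""
    x1 = np.sort(np.asarray(x1, dtype=float).ravel())
    x2 = np.sort(np.asarray(x2, dtype=float).ravel())

    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([x1, x2])
    cdf1 = np.searchsorted(x1, grid, side="right") / x1.size
    cdf2 = np.searchsorted(x2, grid, side="right") / x2.size

    # The statistic is the maximum absolute difference between the CDFs.
    return float(np.max(np.abs(cdf1 - cdf2)))
```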
- ise.evaluation.metrics.mape(y_true, y_pred)
Computes the Mean Absolute Percentage Error (MAPE).
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The MAPE value, expressed as a percentage.
- Return type:
float
Notes
MAPE ignores zero values in y_true to prevent division by zero.
If all true values are zero, returns infinity.
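The zero-handling described in the notes can be sketched as follows. This is an illustration of the documented behaviour under the hypothetical name mape_sketch, not a copy of the library's code.

```python
import numpy as np


def mape_sketch(y_true, y_pred):
    """MAPE in percent, skipping zero targets (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Drop entries whose target is zero to avoid division by zero.
    nonzero = y_true != 0
    if not nonzero.any():
        return float("inf")  # every target is zero

    rel_err = np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])
    return float(rel_err.mean() * 100.0)
```

Note that skipped entries do not count toward the mean, so a prediction error at a zero-valued target has no effect on the score.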
- ise.evaluation.metrics.mean_absolute_error(y_true, y_pred)
Computes the Mean Absolute Error (MAE).
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The Mean Absolute Error (MAE).
- Return type:
float
- ise.evaluation.metrics.mean_prediction_interval_width(upper_bound, lower_bound)
Computes the Mean Prediction Interval Width (MPIW).
- Parameters:
upper_bound (numpy.ndarray or list) – The upper bounds of the prediction intervals.
lower_bound (numpy.ndarray or list) – The lower bounds of the prediction intervals.
- Returns:
The Mean Prediction Interval Width (MPIW).
- Return type:
float
- ise.evaluation.metrics.mean_squared_error(y_true, y_pred)
Computes the Mean Squared Error (MSE).
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The Mean Squared Error (MSE).
- Return type:
float
- ise.evaluation.metrics.mean_squared_error_sector(sum_sectors_true, sum_sectors_pred)
Computes the mean squared error (MSE) between true and predicted sector-wise sums.
- Parameters:
sum_sectors_true (numpy.ndarray) – The true summed sector values.
sum_sectors_pred (numpy.ndarray) – The predicted summed sector values.
- Returns:
The mean squared error (MSE).
- Return type:
float
- ise.evaluation.metrics.r2_score(y_true, y_pred)
Computes the coefficient of determination (R² score).
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The R² score, where 1 indicates perfect predictions.
- Return type:
float
- ise.evaluation.metrics.relative_squared_error(y_true, y_pred)
Computes the Relative Squared Error (RSE), measuring the error relative to the variance in y_true.
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The computed RSE value.
- Return type:
float
Notes
A lower RSE indicates better performance, with RSE=0 indicating perfect predictions.
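The overview defines RSE as SS_res / SS_tot, which makes it the complement of R² (R² = 1 − RSE under the usual definition). The short sketch below, using the hypothetical name rse_sketch, shows the two reference points: 0 for perfect predictions and 1 for a model that always predicts the mean of y_true.

```python
import numpy as np


def rse_sketch(y_true, y_pred):
    """Relative squared error SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(ss_res / ss_tot)
```

Values above 1 mean the predictions are worse than the constant-mean baseline. The ratio is undefined when y_true is constant, since SS_tot is then zero.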
- ise.evaluation.metrics.sum_by_sector(array, grid_file)
Computes the sum of values in a given array by predefined sectors using a grid file.
- Parameters:
array (numpy.ndarray or torch.Tensor) – A 2D or 3D array containing values to be summed by sector.
grid_file (str or xarray.Dataset) – Path to the grid file defining sector boundaries or an xarray dataset.
- Returns:
A 2D array where each row represents a timestep and each column represents a sector.
- Return type:
numpy.ndarray
- Raises:
ValueError – If grid_file is not a valid string or xarray dataset.
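The aggregation step can be illustrated without a grid file. The sketch below assumes the sector definition has already been loaded into an integer array named sectors with the same (x, y) shape as the grid. That array, the function name, and the choice to treat every distinct ID as a sector are all assumptions for illustration; the variable names inside the real NetCDF file are not documented here.

```python
import numpy as np


def sum_by_sector_sketch(array, sectors):
    """Sum grid values per sector ID (hypothetical helper, in-memory mask)."""
    array = np.asarray(array, dtype=float)
    sectors = np.asarray(sectors)

    # Treat a single 2D grid as one timestep.
    if array.ndim == 2:
        array = array[np.newaxis]

    sector_ids = np.unique(sectors)
    out = np.zeros((array.shape[0], sector_ids.size))
    for j, sid in enumerate(sector_ids):
        # Boolean (x, y) mask selects this sector's cells at every timestep.
        out[:, j] = array[:, sectors == sid].sum(axis=1)
    return out  # shape (N_timesteps, N_sectors)
```

Whether background cells (for example, an ID of 0 for ocean) should be excluded depends on the grid file's conventions, which this sketch does not attempt to guess.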
- ise.evaluation.metrics.t_test(x1, x2)
Performs an independent two-sample t-test to compare the means of two distributions.
- Parameters:
x1 (numpy.ndarray or list) – The first dataset.
x2 (numpy.ndarray or list) – The second dataset.
- Returns:
(t-statistic, p-value).
- Return type:
tuple
- ise.evaluation.metrics.winkler_score(y_true, y_pred, lower_bound, upper_bound, alpha=0.05)
Computes the Winkler Score for prediction intervals.
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted mean values.
lower_bound (numpy.ndarray or list) – The lower bounds of the prediction intervals.
upper_bound (numpy.ndarray or list) – The upper bounds of the prediction intervals.
alpha (float, optional) – The significance level for the prediction intervals. Defaults to 0.05.
- Returns:
The Winkler Score.
- Return type:
float
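The overview notes that the Winkler score penalises both wide intervals and violations. Under the standard definition, each sample contributes the interval width plus 2/α times the distance by which the true value falls outside the interval. The sketch below implements that standard formula and averages over samples. The name winkler_sketch is hypothetical, it omits the y_pred argument because the standard formula does not need it, and averaging (rather than summing) is an assumption about how the library reduces to a single float.

```python
import numpy as np


def winkler_sketch(y_true, lower_bound, upper_bound, alpha=0.05):
    """Mean Winkler interval score (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    lower = np.asarray(lower_bound, dtype=float)
    upper = np.asarray(upper_bound, dtype=float)

    width = upper - lower
    # Distance below the lower bound or above the upper bound (0 if inside).
    below = np.clip(lower - y_true, 0.0, None)
    above = np.clip(y_true - upper, 0.0, None)

    score = width + (2.0 / alpha) * (below + above)
    return float(score.mean())
```

With alpha = 0.05 the miss penalty is 40 times the overshoot, so a single violation can outweigh many narrow, well-placed intervals.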
Module contents
Evaluation metrics for ice sheet emulator predictions.
This package provides metrics (e.g. R², MSE, CRPS, ECE, sector-wise sums) for assessing point predictions and uncertainty quantification.
- ise.evaluation.calculate_ece(predictions, uncertainties, true_values, bins=10)
Computes the Expected Calibration Error (ECE) for a regression model.
- Parameters:
predictions (numpy.ndarray) – The predicted mean values.
uncertainties (numpy.ndarray) – The predicted standard deviations.
true_values (numpy.ndarray) – The true values.
bins (int, optional) – The number of bins for uncertainty grouping. Defaults to 10.
- Returns:
The Expected Calibration Error (ECE).
- Return type:
float
Notes
ECE measures how well predicted uncertainties align with actual errors.
A lower ECE indicates better-calibrated uncertainty estimates.
- ise.evaluation.crps(y_true, y_pred, y_std)
Computes the Continuous Ranked Probability Score (CRPS) for a Gaussian distribution.
- Parameters:
y_true (numpy.ndarray) – The true values.
y_pred (numpy.ndarray) – The predicted mean values.
y_std (numpy.ndarray) – The predicted standard deviations.
- Returns:
The computed CRPS values for each prediction.
- Return type:
numpy.ndarray
- ise.evaluation.js_divergence(p: ndarray, q: ndarray)
Computes the Jensen-Shannon Divergence (JSD) between two probability distributions.
- Parameters:
p (numpy.ndarray) – The first probability distribution.
q (numpy.ndarray) – The second probability distribution.
- Returns:
The Jensen-Shannon divergence value.
- Return type:
float
Notes
JSD is a smoothed and symmetric version of KL divergence.
The function normalizes the distributions before computation.
- ise.evaluation.kl_divergence(p: ndarray, q: ndarray)
Computes the Kullback-Leibler (KL) Divergence between two probability distributions.
- Parameters:
p (numpy.ndarray) – The first probability distribution.
q (numpy.ndarray) – The second probability distribution.
- Returns:
The KL divergence value.
- Return type:
float
Notes
The distributions p and q must be normalized (i.e., sum to 1).
Small epsilon values are used to avoid numerical instability.
- ise.evaluation.kolmogorov_smirnov(x1, x2)
Computes the Kolmogorov-Smirnov (KS) statistic to compare two distributions.
- Parameters:
x1 (numpy.ndarray or list) – The first dataset.
x2 (numpy.ndarray or list) – The second dataset.
- Returns:
(KS statistic, p-value).
- Return type:
tuple
- ise.evaluation.mape(y_true, y_pred)
Computes the Mean Absolute Percentage Error (MAPE).
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The MAPE value, expressed as a percentage.
- Return type:
float
Notes
MAPE ignores zero values in y_true to prevent division by zero.
If all true values are zero, returns infinity.
- ise.evaluation.mean_absolute_error(y_true, y_pred)
Computes the Mean Absolute Error (MAE).
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The Mean Absolute Error (MAE).
- Return type:
float
- ise.evaluation.mean_prediction_interval_width(upper_bound, lower_bound)
Computes the Mean Prediction Interval Width (MPIW).
- Parameters:
upper_bound (numpy.ndarray or list) – The upper bounds of the prediction intervals.
lower_bound (numpy.ndarray or list) – The lower bounds of the prediction intervals.
- Returns:
The Mean Prediction Interval Width (MPIW).
- Return type:
float
- ise.evaluation.mean_squared_error(y_true, y_pred)
Computes the Mean Squared Error (MSE).
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The Mean Squared Error (MSE).
- Return type:
float
- ise.evaluation.mean_squared_error_sector(sum_sectors_true, sum_sectors_pred)
Computes the mean squared error (MSE) between true and predicted sector-wise sums.
- Parameters:
sum_sectors_true (numpy.ndarray) – The true summed sector values.
sum_sectors_pred (numpy.ndarray) – The predicted summed sector values.
- Returns:
The mean squared error (MSE).
- Return type:
float
- ise.evaluation.r2_score(y_true, y_pred)
Computes the coefficient of determination (R² score).
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The R² score, where 1 indicates perfect predictions.
- Return type:
float
- ise.evaluation.relative_squared_error(y_true, y_pred)
Computes the Relative Squared Error (RSE), measuring the error relative to the variance in y_true.
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted values.
- Returns:
The computed RSE value.
- Return type:
float
Notes
A lower RSE indicates better performance, with RSE=0 indicating perfect predictions.
- ise.evaluation.sum_by_sector(array, grid_file)
Computes the sum of values in a given array by predefined sectors using a grid file.
- Parameters:
array (numpy.ndarray or torch.Tensor) – A 2D or 3D array containing values to be summed by sector.
grid_file (str or xarray.Dataset) – Path to the grid file defining sector boundaries or an xarray dataset.
- Returns:
A 2D array where each row represents a timestep and each column represents a sector.
- Return type:
numpy.ndarray
- Raises:
ValueError – If grid_file is not a valid string or xarray dataset.
- ise.evaluation.t_test(x1, x2)
Performs an independent two-sample t-test to compare the means of two distributions.
- Parameters:
x1 (numpy.ndarray or list) – The first dataset.
x2 (numpy.ndarray or list) – The second dataset.
- Returns:
(t-statistic, p-value).
- Return type:
tuple
- ise.evaluation.winkler_score(y_true, y_pred, lower_bound, upper_bound, alpha=0.05)
Computes the Winkler Score for prediction intervals.
- Parameters:
y_true (numpy.ndarray or list) – The true values.
y_pred (numpy.ndarray or list) – The predicted mean values.
lower_bound (numpy.ndarray or list) – The lower bounds of the prediction intervals.
upper_bound (numpy.ndarray or list) – The upper bounds of the prediction intervals.
alpha (float, optional) – The significance level for the prediction intervals. Defaults to 0.05.
- Returns:
The Winkler Score.
- Return type:
float