19 Module 2: Modelling Foundations
Tools, Machinery, and the Logic of Exploration
19.1 §2.1 Representation: Boundaries, Scale, and Aggregation
A model does not represent reality; it represents selected features of reality in a form that serves a purpose. The selection is never neutral. Every model boundary excludes something. Every scale choice aggregates something. Every aggregation hides variation that might be consequential. The question is not whether these choices involve trade-offs but whether they are made consciously and in service of the decision the model is meant to support. The decision-first boundary principle of §0.3 applies to representation directly: boundaries, scale, and aggregation level are set by what must remain visible for the comparison to be valid.
Boundaries determine what is inside the model and what is treated as external or given. A spatial boundary defines the geographic or physical extent. A temporal boundary defines the period and resolution. A sectoral boundary determines which interacting systems are included. An institutional boundary determines which actors and their decisions are represented. Under the decision-first principle, each boundary choice should be settled by a single question: is this inclusion or exclusion required to preserve a consequence that is relevant to the comparison? If a consequence that would change the decision ranking lies outside the boundary, the boundary is wrong. If a consequence that does not change the decision ranking lies inside the boundary at significant computational cost, the boundary is over-specified. The principle is discriminating rather than permissive: it includes what the decision requires and excludes what the decision does not require.
Scale determines the spatial, temporal, and organisational level at which model variables are defined and dynamics are represented. A national energy model operates at the scale of aggregated sectors and annual time periods. A site-level process simulation operates at the scale of individual equipment and sub-hourly time steps. Each scale is appropriate for the consequences it is designed to reveal; neither is appropriate for the consequences the other was designed to reveal. In decision-centred modelling, the appropriate scale for any component of the analytical environment is the scale at which the decision-relevant consequences become visible. This is a practical criterion rather than a formal one: it requires knowing what the consequences are before knowing what scale is needed to reveal them, which is why the decision must be articulated before the representation is designed.
Aggregation is unavoidable in modelling but carries costs that the decision-first principle makes explicit. Two industrial facilities with the same annual energy demand may have very different temporal demand profiles, peak demand characteristics, and grid interface consequences. An aggregation that combines them into one load profile may preserve the annual total while destroying the temporal structure that determines whether the GXP hosting capacity is exceeded during winter peak periods. This is not a deficiency of aggregation in general; it is a deficiency of aggregation that removes detail that is decision-relevant. The test is always the same: does this aggregation change the decision-relevant comparison? If yes, it is destructive simplification. If no, it is appropriate abstraction.
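The test lends itself to a direct numerical check. A minimal sketch, using hypothetical load profiles, of how an aggregation can preserve the annual total while destroying the peak structure that a hosting-capacity assessment depends on:

```python
import numpy as np

hours = 8760
flat = np.full(hours, 10.0)    # facility A: steady 10 MW load
peaky = np.full(hours, 5.0)    # facility B: 5 MW baseline...
peaky[:876] = 55.0             # ...with 876 peak hours at 55 MW

# Annual aggregation makes the two facilities indistinguishable.
assert flat.sum() == peaky.sum() == 87_600   # identical annual MWh

# The hourly structure does not survive the aggregation: a hosting-capacity
# threshold of, say, 40 MW is never approached by A and exceeded 876 times by B.
print(flat.max(), peaky.max())               # 10.0 vs 55.0 MW peak
print((peaky > 40.0).sum(), "hours exceed the 40 MW threshold")
```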
The Edendale case illustrates this test concretely. Aggregating the facility’s heat demand to an annual total would make the GXP adequacy assessment in Module 6 meaningless. The adequacy assessment depends critically on the temporal coincidence of peak electrical demand with regional grid stress conditions: the question of whether the grid’s hosting capacity is exceeded in the hours of highest simultaneous demand in a dry hydro year, during winter peak periods. That question cannot be answered from an annual total. The DemandPack artefact preserves the hourly resolution precisely because the decision-relevant consequence requires it, and the GXP assessment confirms, through the 23-of-64 finding, that the temporal structure is not a decorative detail but a material determinant of pathway feasibility.
The artefact schema decisions that govern exactly which fields must be preserved at which resolution are developed in Module 3 §3.5, where the canonical artefact families are specified. §2.2 now turns to the analytical machinery that operates on representations to generate consequences.
19.2 §2.2 Optimisation as Consequence Generation
Optimisation is the most powerful analytical machinery available for a wide class of planning problems, and it has an important role throughout the framework’s analytical chain. What it cannot do is make the decisions the framework is designed to support. The distinction is not pedantic; it has practical consequences for how optimisation is deployed and interpreted.
Optimisation answers a conditional question: given this representation, these constraints, these costs, and these future conditions, what allocation or configuration minimises total cost or maximises some specified objective? That conditional answer is analytically valuable. It tells the decision-maker what is efficient within the declared formulation. What it does not tell the decision-maker is whether the formulation is the right one, whether the declared future conditions are the ones that matter, or whether the optimal solution within this boundary remains optimal when consequences outside the boundary are considered. Optimisation as consequence generation is the framing that resolves this tension. Optimisation generates the consequences associated with each alternative under each future; the DMDU outer layer evaluates those consequences through regret, robustness, and satisficing metrics. Optimisation is a module-internal tool; the decision process is the outer layer. Keeping these roles distinct is what allows optimisation to be applied rigorously without the illusion of closure that §1.2 identified as one of its characteristic failure modes.
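A schematic sketch of this division of labour may help. The consequence table below is illustrative; in practice each cell would be produced by a module-internal optimisation run for one (alternative, future) pair, and the outer layer would apply the DMDU metrics to the assembled table:

```python
import pandas as pd

# Hypothetical pathway costs per future; each cell stands in for the output
# of one module-internal optimisation run.
costs = pd.DataFrame(
    {"electrify": [100, 140, 95], "biomass": [110, 120, 130]},
    index=["f01", "f02", "f03"],   # future identifiers
)

# Regret: an alternative's cost minus the best achievable cost in that future.
regret = costs.sub(costs.min(axis=1), axis=0)

# One possible outer-layer rule: minimax regret across the ensemble.
preferred = regret.max(axis=0).idxmin()
print(regret)
print("minimax-regret choice:", preferred)
```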
Different optimisation classes serve different analytical roles within this framework. Linear programming is especially suited to dispatch and screening problems where the relationships between decision variables and objectives are approximately linear and where transparency and marginal interpretation matter. LP dual variables provide shadow prices that are directly interpretable as value indicators: the shadow price of a capacity constraint is the marginal cost of relaxing that constraint, which is precisely the kind of adequacy signal that a regional electricity module should pass to the evaluation layer through a SignalsPack. PyPSA, which uses Linopy as its algebraic modelling layer, produces these dual variables as standard outputs and is the natural implementation tool for the regional electricity module specified in SM-6.6-E.
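A minimal sketch of how these shadow prices surface in practice, assuming a recent PyPSA version with the Linopy-based `optimize` interface; the one-bus network and all numbers are illustrative, not the SM-6.6-E module:

```python
import pypsa

n = pypsa.Network()
n.set_snapshots(range(3))
n.add("Bus", "region")
n.add("Generator", "hydro", bus="region", p_nom=80, marginal_cost=5)
n.add("Generator", "peaker", bus="region", p_nom=50, marginal_cost=120)
n.add("Load", "demand", bus="region", p_set=[60, 90, 110])

n.optimize()   # builds and solves the LP through Linopy

# Dual of each snapshot's nodal energy balance: the marginal price jumps
# when the cheap unit saturates, which is the kind of adequacy signal a
# SignalsPack would carry to the evaluation layer.
print(n.buses_t.marginal_price)
```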
Mixed-integer programming handles discrete choices that LP cannot represent: whether a technology is committed or idle in a given period, whether a staging decision is taken in year one or year five, whether a particular upgrade option is installed or not. MILP is appropriate for the unit-commitment-lite dispatch logic already implemented in the proof of concept’s optimal-subset mode, and for the upgrade option selection in the grid RDM evaluation.
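A single-period sketch of that commit-or-idle structure in Linopy; unit names, capacities, and costs are illustrative placeholders, not the proof of concept's data:

```python
import linopy

demand_mw = 120.0   # heat demand in one period (illustrative)

m = linopy.Model()
u_boiler = m.add_variables(binary=True, name="commit_boiler")
u_hp = m.add_variables(binary=True, name="commit_heat_pump")
q_boiler = m.add_variables(lower=0, name="q_boiler")
q_hp = m.add_variables(lower=0, name="q_heat_pump")

# A unit can produce only when committed, up to its capacity in MW.
m.add_constraints(q_boiler <= 100 * u_boiler, name="cap_boiler")
m.add_constraints(q_hp <= 80 * u_hp, name="cap_heat_pump")
m.add_constraints(q_boiler + q_hp >= demand_mw, name="demand")

# Fixed commitment cost plus variable dispatch cost (illustrative units).
m.add_objective(500 * u_boiler + 300 * u_hp + 8 * q_boiler + 3 * q_hp)
m.solve(solver_name="highs")
print(q_boiler.solution, q_hp.solution)
```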
Stochastic programming extends LP and MILP to represent uncertainty explicitly through scenarios with probabilities, making it appropriate when the uncertainty within a module has a defensible probabilistic structure and when the cost of recourse decisions is part of what is being optimised. This is the right tool when uncertainty is parametric, well-characterised, and specific to a module’s internal operation. It is not the right tool for the outer DMDU ensemble, where the uncertainty is structural rather than parametric and probabilities are contested. The outer ensemble is handled by the DMDU orchestration layer, not by stochastic programming.
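The two-stage structure can be sketched in the same algebraic layer. A minimal recourse example, again in Linopy with illustrative numbers: the capacity x is chosen before the scenario is revealed, and the recourse purchase y_s absorbs whatever each scenario demands:

```python
import linopy

# Two scenarios with declared probabilities: the defensible probabilistic
# structure that distinguishes this from the outer DMDU ensemble.
scenarios = {"low": dict(p=0.5, demand=80), "high": dict(p=0.5, demand=130)}

m = linopy.Model()
x = m.add_variables(lower=0, name="capacity")   # first-stage, here-and-now
y = {s: m.add_variables(lower=0, name=f"recourse_{s}") for s in scenarios}

# Capacity plus recourse must cover each scenario's demand.
for s, d in scenarios.items():
    m.add_constraints(x + y[s] >= d["demand"], name=f"demand_{s}")

# Capital cost plus probability-weighted expected recourse cost.
m.add_objective(10 * x + 0.5 * 50 * y["low"] + 0.5 * 50 * y["high"])
m.solve(solver_name="highs")
```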
The open-source modelling ecosystem provides mature implementations of all three optimisation classes that are architecturally compatible with the framework. PyPSA and Linopy provide network-aware LP and MILP for the regional electricity module. Calliope provides flexible multi-scale energy system optimisation with strong transparency commitments. OSeMOSYS provides a lean, open-source long-range energy system modelling framework that has been used in national energy planning contexts across multiple countries and is compatible with the FutureArtefact ensemble design approach. Sub-Module SM-2.2-A provides the formal problem statements and the framework role summary for each optimisation class.
LP, MILP, and Stochastic Optimisation: Role Summary — formal problem statements, role table, and PyPSA/Linopy integration note — is in SM-2.2-A. Skip if the optimisation classes and their appropriate uses are already known.
19.3 §2.3 Simulation and Surrogate Emulation
Optimisation is the right machinery when the question is what is best under a stated formulation. When the question is what happens, how a system evolves, how it responds to different sequences of conditions, or whether a declared operating logic remains viable across different futures, simulation is the appropriate tool. Simulation is not a fallback for cases where optimisation is intractable. It is a different kind of analytical machinery for a different kind of question.
Simulation generates time-resolved trajectories of a system’s behaviour under specified rules, initial conditions, and external forcing. Its strengths are temporal realism, the representation of chronology and sequence, the ability to capture feedback and threshold effects, and its natural compatibility with uncertainty exploration: once a system can be run repeatedly under different conditions, examining distributions of consequences across those conditions becomes straightforward. In the framework’s Facility Module, the proportional dispatch logic already implemented in the proof of concept is a simulation: at each hourly timestep, heat demand is allocated across available units by capacity fraction, producing a time-resolved dispatch record that feeds the incremental electricity calculation. The OpenModelica-based thermal network simulation specified in SM-6.4-C is a higher-fidelity simulation of the same facility using equation-based physical modelling of connected thermal components.
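The proportional dispatch rule is simple enough to state in a few lines. A sketch with illustrative unit names and capacities:

```python
import pandas as pd

capacities = pd.Series({"boiler_1": 60.0, "boiler_2": 40.0})   # MW
heat_demand = pd.Series([70.0, 95.0, 50.0])                    # MW per hour

# Each unit takes a fixed share of demand equal to its capacity fraction.
shares = capacities / capacities.sum()
dispatch = pd.DataFrame(
    {unit: heat_demand * share for unit, share in shares.items()}
)
print(dispatch)   # time-resolved record feeding the incremental electricity calc
```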
OpenModelica is particularly suited to the Facility Module because it can represent the physical connectivity of steam headers, heat exchangers, boilers, and heat pumps as a network of components governed by conservation equations. This level of physical fidelity is warranted when the dispatch pattern, peak demand timing, or thermal storage behaviour materially affects the GXP adequacy assessment. It is not warranted in a first-generation proof of concept where the primary analytical question concerns the architectural validity of the site-to-region coupling rather than the precision of the dispatch schedule. The progressive-refinement philosophy determines when OpenModelica is introduced: when the regret sensitivity diagnostics indicate that the proportional dispatch simplification is materially changing the regional adequacy finding.
Surrogate emulation addresses the computational constraint that becomes binding when an expensive module must be evaluated many times inside a large DMDU ensemble. A full PyPSA regional electricity optimisation for the Southland transmission context under one future takes several minutes of computation. Running it for 500 or 5,000 futures in the ensemble is impractical at that cost. A trained ML surrogate of the PyPSA module, which learns to approximate the module’s input-output mapping from a sample of full model runs, reduces the per-evaluation cost by several orders of magnitude while retaining the analytical content that matters for the decision.
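A minimal sketch of the training step, with synthetic stand-ins for the real data: X would hold the uncertain driver values of the sampled futures, y the SignalsPack fields produced by the corresponding full PyPSA runs, and the random forest is one reasonable choice of learner among several:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 5))     # 5 driver dimensions, 200 full model runs
y = X @ rng.uniform(size=(5, 2))   # synthetic stand-in for 2 SignalsPack fields

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X, y)

# Evaluating a new future now costs milliseconds rather than minutes.
signals = surrogate.predict(rng.uniform(size=(1, 5)))
```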
The key design principle for surrogate emulation under the decision-first boundary principle is that the surrogate emulates the module at its output boundary, not at its internal variable level. A surrogate for the regional electricity module needs to produce a valid SignalsPack, the schema-conforming governed output that the Facility Module and evaluation layer consume. It does not need to reproduce every nodal price, every line loading fraction, or every dispatch decision internal to the PyPSA model. This reduction in output dimensionality is not a compromise on fidelity; it is the decision-first principle applied to the surrogate specification: what must be emulated is what crosses the thin-waist boundary, and those are the SignalsPack fields.
The appropriate validation criterion for a DMDU surrogate follows from the same principle. Standard ML validation metrics (RMSE, R², mean absolute error) measure average prediction accuracy across all outputs and all inputs. What matters for a DMDU application is whether the surrogate produces the same decision-relevant outputs: the same preferred alternative, the same regret ranking, and the same threshold-violation pattern as the full model, across the futures where the decision is most sensitive. This is the decision-ranking preservation criterion, and it is the acceptance standard against which every surrogate in this framework is validated. Sub-Module SM-2.3-A develops the full validation protocol.
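A sketch of the criterion as a check, with illustrative cost tables: the surrogate passes on a future when it reproduces the full model's preferred alternative and its regret ordering:

```python
import pandas as pd

def regret(costs: pd.DataFrame) -> pd.DataFrame:
    return costs.sub(costs.min(axis=1), axis=0)

def ranking_preserved(full: pd.DataFrame, surr: pd.DataFrame) -> pd.Series:
    # Same preferred alternative per future...
    same_choice = full.idxmin(axis=1) == surr.idxmin(axis=1)
    # ...and the same ordering of alternatives by regret.
    same_order = (regret(full).rank(axis=1) == regret(surr).rank(axis=1)).all(axis=1)
    return same_choice & same_order

full = pd.DataFrame({"electrify": [100, 140], "biomass": [110, 120]},
                    index=["f01", "f02"])
surr = pd.DataFrame({"electrify": [102, 138], "biomass": [108, 121]},
                    index=["f01", "f02"])
print(ranking_preserved(full, surr))  # acceptance requires True on decision-critical futures
```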
Decision-Ranking Preservation: Validation Protocol — formal definition, validation workflow, confidence scoring, and active learning protocol — is in SM-2.3-A. Skip if the surrogate validation approach is already known.
19.4 §2.4 Constructing and Selecting Futures
Representation and analytical machinery together determine how the model generates consequences for a given set of conditions. Futures determine which conditions are tested. A poorly designed future set can narrow the analytical problem as completely as a poorly designed model; a future set that explores only familiar territory will not reveal the vulnerabilities that a long-horizon commitment may harbour.
Uncertainties relevant to a decision problem fall into four classes. Parametric uncertainty concerns the values of quantities already present in the representation: technology costs, fuel prices, demand growth rates, resource availability indices. These are naturally varied through multipliers applied to reference values and are the most straightforward class to represent. Structural uncertainty concerns the possibility that the model itself may be incomplete or that the causal pathways it represents may be incorrect: a regional electricity model that does not represent multi-site demand aggregation may systematically underestimate hosting capacity stress regardless of parameter values. Policy uncertainty concerns future regulatory settings, carbon price trajectories, infrastructure investment programmes, and institutional arrangements. Policy uncertainty often manifests as discrete regime shifts rather than continuous parameter variation, making scenario classes a more appropriate representation than probability distributions. Behavioural uncertainty concerns how actors respond when conditions change: whether competing industrial users electrify simultaneously, whether regulators enforce compliance, whether Transpower prioritises a specific reinforcement project. The five uncertain driver dimensions in the Edendale ensemble (GXP headroom, regional demand growth, hydro year class, biomass availability, and ETS carbon price) span all four classes.
The paired-futures requirement, introduced in §1.4 as a consequence of Requirement 5 in §1.7, has a direct implication for future construction. All pathway alternatives must be evaluated under identical external conditions in every future. This means the FutureArtefact for a given future carries the same uncertain driver values regardless of which pathway is being evaluated under it. Differences in performance outcomes across pathways are then attributable to pathway characteristics, not to asymmetric experimental conditions. The futures.csv file in the Edendale proof of concept enforces this requirement: the same file is used as the source of futures for both the 2035_EB and 2035_BB grid RDM evaluations, and the overlay module verifies that both evaluations reference identical future identifiers before computing the comparison.
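The verification itself is a small gate. A sketch of the overlay check, assuming hypothetical result file names and a `future_id` column; the point is that the comparison refuses to run on asymmetric future sets:

```python
import pandas as pd

eb = pd.read_csv("grid_rdm_2035_EB.csv")   # hypothetical result files
bb = pd.read_csv("grid_rdm_2035_BB.csv")

eb_ids, bb_ids = set(eb["future_id"]), set(bb["future_id"])
if eb_ids != bb_ids:
    raise ValueError(
        f"Paired-futures violation: {sorted(eb_ids ^ bb_ids)} "
        "appear in only one evaluation."
    )
```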
The selection of which futures to include in the ensemble is the most analytically consequential design choice, and it is where the decision-first principle most directly displaces the instinct to explore input-space extremes. The futures that earn a place in the ensemble are not those with the most extreme individual input values but those that are most decision-critical in the outcome space: the futures that produce rank reversals between competing pathways, threshold violations that would not be visible at central-case conditions, or regret concentrations that distinguish the strategies most sharply. This is the inversion from input-extremity to consequence-centrality that defines decision-critical future selection.
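One simple screen for consequence-centrality, sketched with illustrative numbers: flag the futures whose preferred pathway differs from the central case, since those are the rank-reversal futures that earn a place in the ensemble:

```python
import pandas as pd

costs = pd.DataFrame(
    {"electrify": [100, 140, 95], "biomass": [110, 120, 130]},
    index=["central", "f_dry_hydro", "f_low_ets"],   # hypothetical futures
)

central_choice = costs.loc["central"].idxmin()
reversals = costs.idxmin(axis=1) != central_choice
print(costs.index[reversals].tolist())   # -> ['f_dry_hydro']
```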
In practice, ensemble construction proceeds through a staged expansion. The first stage produces a small set of hand-designed anchor futures (21 in the Edendale ensemble) that span the most analytically important combinations identified by expert knowledge and preliminary screening. These anchor futures are preserved exactly in all subsequent ensemble versions, providing a stable regression test baseline. The second stage expands the ensemble to 100 futures using Latin hypercube sampling across continuous driver dimensions, preserving the 21 anchors and adding 79 futures that improve coverage of the uncertainty space. The third stage, which requires a trained and validated surrogate of the regional module, expands the ensemble to several hundred or thousand futures, making statistically meaningful scenario discovery and vulnerability mapping feasible. The three-stage progression is governed at each step by the decision-first principle: expansion is warranted when regret sensitivity diagnostics indicate that the current ensemble is missing futures that would change the decision-relevant comparison.
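A sketch of the stage-two expansion, with synthetic anchors and assumed driver bounds standing in for the real ensemble inputs:

```python
import numpy as np
from scipy.stats import qmc

lower = [0.6, 0.8, 0.0, 0.5, 30.0]    # assumed per-driver lower bounds
upper = [1.4, 1.6, 1.0, 1.5, 250.0]   # assumed per-driver upper bounds

# Stand-in for the 21 hand-designed anchors, preserved exactly.
rng = np.random.default_rng(0)
anchors = qmc.scale(rng.uniform(size=(21, 5)), lower, upper)

# 79 Latin hypercube samples improve coverage of the 5-dimensional space.
sampler = qmc.LatinHypercube(d=5, seed=42)
new_futures = qmc.scale(sampler.random(n=79), lower, upper)

ensemble_v2 = np.vstack([anchors, new_futures])
assert ensemble_v2.shape == (100, 5)   # 21 anchors + 79 new futures
```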
The diagnostic methods that identify decision-critical futures and vulnerability regions (PRIM and CART) are developed in Sub-Module SM-2.4-A.
Scenario Discovery Methods: PRIM and CART — algorithms, worked Edendale application, and comparison table — is in SM-2.4-A. Skip if PRIM and CART are already known.
19.5 §2.5 From Tools to Architecture: What the Toolkit Cannot Do Alone
The four sections of this module have developed the analytical toolkit available to the framework: the representation choices governed by decision-relevance, the optimisation machinery that generates conditional consequences, the simulation and surrogate tools that extend the range of questions that can be asked and the scale at which they can be asked, and the future construction and selection logic that determines which conditions are tested. This toolkit is powerful. It is not sufficient.
The insufficiency has a specific character. Tools cannot draw their own appropriate boundaries; the decision-first principle must be applied before any tool is configured. Tools cannot manage their own comparability across versions; a second-generation surrogate that produces different consequences from a first-generation full model represents progress only if the comparison between the two is itself tractable, which requires that both operated within a shared schema and that their outputs were stored with shared provenance. Tools cannot preserve their own analytical history; a result produced under a first-generation proportional dispatch assumption is only comparable with one produced under a second-generation LP-based scheduling assumption if both are stored with their generating conditions intact and linked through explicit lineage. Tools cannot validate their own outputs; a surrogate that satisfies decision-ranking preservation on a random held-out set but fails it on the decision-critical futures it encounters in deployment provides misleading confidence unless the validation was designed to test the right conditions. None of these properties can be provided by the tools themselves, regardless of how sophisticated those tools become.
What the toolkit requires that it cannot provide for itself is a governance architecture: the infrastructure that enforces schema conformance at every module output, maintains explicit lineage from every result back to the inputs and conditions that produced it, gates admission to the comparison chain through declared acceptance criteria, and preserves the append-only analytical history that makes progressive refinement cumulative rather than destructive.
The architecture of Module 3 does not introduce new tools. It creates the conditions under which the tools described in this module satisfy the seven requirements of §1.7. Requirement 5, comparability across alternatives and futures, is satisfied by the paired-futures design and schema-governed artefact exchange that Module 3 specifies. Requirement 6, traceability and revisability, is satisfied by the governed backbone that Module 3 designs. The other requirements follow from the combination of the tools of Module 2 operating within the boundaries of the architecture of Module 3. The governance architecture is not overhead; it is what transforms a collection of sophisticated tools into a decision-centred analytical environment.