Stochastic Weather Generators
MSTWeatherGen, the Multivariate and Space-Time Weather Generator package, is designed for the simulation of multivariate spatio-temporal meteorological variables. It provides tools for estimating the model parameters and for generating synthetic weather data that can be used in a variety of applications, including climate research and agricultural or hydrological modeling.
https://sobakrim.github.io/MSTWeatherGen/index.html
Obakrim S., Benoit L., Allard D. (2025) A multivariate and space-time stochastic weather generator using a latent Gaussian framework. (Accepted for publication in SERRA) https://hal.science/hal-04715860/
WACSgen is a single-site, stationary multivariate weather generator for daily climate variables based on weather states. It uses a Markov chain to model the succession of (an unlimited number of) weather states. Conditionally on the weather states, the multivariate variables are modeled using the family of closed skew-normal distributions. It is described in Flecher et al. (2010).
WACS is available as a package on CRAN. Follow this link.
A selection of research outcomes
Modeling and simulating spatio-temporal, multivariate and nonstationary Gaussian Random Fields: a Gaussian mixtures perspective (with Lionel Benoit and Said Obakrim)
This research was financially supported by the Chair Geolearning, funded by Andra, BNP-Paribas, CCR and the SCOR Foundation for science.
Gaussian Random Fields (GRFs) play a critical role in modeling and simulating environmental and climate-driven processes. The simulation of GRFs enables the representation of the variability of the process under study through the generation of multiple equally plausible realizations. Gaussian vectors corresponding to a sample of a GRF can easily be simulated using the Cholesky decomposition of the associated covariance matrix, but this approach is limited to vectors of moderate size. To overcome this limitation, an interesting alternative is to rely on spectral methods, which are based on the decomposition of the target GRF into spectral waves. This approach has recently been extended in various directions in order to make it more versatile, including in spatial, multivariate and spatio-temporal settings. To further increase the versatility of spectral simulation methods, we propose to revisit them by adopting a Gaussian mixture perspective. This work leverages the Gaussian mixture perspective to propose extensions covering new classes of covariance functions for nonstationary (univariate or multivariate) spatio-temporal GRFs, as well as simulation algorithms for those that are currently missing from the framework of spectral simulation. All simulation methods are translated into pseudo-code algorithms, and an illustration is provided for a bivariate nonstationary spatio-temporal example.
Assessing multivariate bias corrections of climate simulations on various impact models under climate change (with Mathieu Vrac, Bastien François, Iñaki García de Cortázar-Atauri)
This research project was funded by the AAFCC (Adaptation of Agriculture and Forests to Climate Change) Metaprogram at INRAE.
Atmospheric variables simulated by climate models often present biases relative to the same variables calculated from reanalyses of the past. In order to use these models to assess the impact of climate change on processes of interest, it is necessary to correct these biases. Currently, the bias correction methods used operationally correct one-dimensional time series and are therefore applied separately, physical variable by physical variable and site by site. Multivariate bias correction (MBC) methods have been developed to better take into account dependencies between variables and in space. Although the performance of MBC methods for adjusting the statistical properties of simulated climate variables has already been evaluated, their effectiveness for different impact models has been little investigated. In this work, we propose a comparison between two MBC methods (R2D2 and dOTC) in three different configurations (intervariable, spatial and spatial-intervariable) and a univariate correction (CDF-t) through several highly multivariate impact models (phenological stage, reference evapotranspiration, soil water content, fire weather index) integrating the weather conditions over a whole season. Our results show that CDF-t does a fair job in most situations and that there is no single best MBC method. The performance of MBC methods depends both on some characteristics of the studied process and on the configuration of the chosen bias correction method. When all characteristics are important (multivariate, time-cumulative and spatial), dOTC in its spatial-intervariable configuration is found to be the most appropriate.
A submitted manuscript is available on Copernicus
The full report is available on the hal preprint repository
The dataset is available on the INRAE data repository
Distribution-based pooling for combination and multi-model bias correction of climate simulations
(with Mathieu Vrac, Grégoire Mariéthoz, Soulivanh Thao, and Lucas Schmutz)
For investigating, assessing, and anticipating climate change, tens of global climate models (GCMs) have been designed, each modelling the Earth system slightly differently. To extract a robust signal from the diverse simulations and outputs, models are typically gathered into multi-model ensembles (MMEs). Those are then summarized in various ways, including (possibly weighted) multi-model means, medians, or quantiles.
In this work, we introduce a new probability aggregation method termed “α pooling”, which builds an aggregated cumulative distribution function (CDF) designed to be closer to a reference CDF over the calibration (historical) period. The aggregated CDFs can then be used to perform bias adjustment of the raw climate simulations, hence performing a “multi-model bias correction”. In practice, each CDF is first transformed according to a non-linear transformation that depends on a parameter α. Then, a weight is assigned to each transformed CDF. This weight is an increasing function of the closeness of the CDF to the reference transformed CDF. Key to α pooling is the parameter α, which describes the type of transformation and hence the type of aggregation, generalizing both linear and log-linear pooling methods. We first establish that α pooling is a proper aggregation method by verifying some optimality properties. Then, focusing on climate model simulations of temperature and precipitation over western Europe, several experiments are run in order to assess the performance of α pooling against currently available methods, including multi-model means and weighted variants. A reanalysis-based evaluation, a perfect-model experiment and a sensitivity analysis to the set of climate models are run. Our findings demonstrate the superiority of the proposed pooling method, indicating that α pooling presents a potent way to combine GCM CDFs. The results of this study also show that our concept of CDF pooling for multi-model bias correction is a credible alternative to the usual GCM-by-GCM bias correction methods, as it handles several climate models at once.
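The two limit cases that α pooling generalizes can be sketched numerically. This is not the paper's α pooling itself: the Gaussian CDFs and the weights below are invented stand-ins, used only to show linear vs. log-linear pooling of CDFs:

```python
import numpy as np
from math import erf

# Linear and log-linear pooling of two model CDFs on a common grid of values.
def norm_cdf(x, mu, sd):
    # Gaussian CDF via the error function
    return np.array([0.5 * (1.0 + erf((v - mu) / (sd * np.sqrt(2.0)))) for v in x])

t = np.linspace(-6.0, 6.0, 601)
F1 = norm_cdf(t, 0.0, 1.0)       # CDF from model 1 (assumed)
F2 = norm_cdf(t, 1.0, 1.5)       # CDF from model 2 (assumed)
w = np.array([0.7, 0.3])         # weights, e.g. reflecting closeness to a reference

F_lin = w[0] * F1 + w[1] * F2                                      # linear pooling
F_log = np.exp(w[0] * np.log(F1 + 1e-300) + w[1] * np.log(F2 + 1e-300))  # log-linear
```

Both pooled curves are again non-decreasing functions rising from 0 to 1, i.e. valid CDFs, which is the kind of property a proper aggregation method must preserve.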
The SPDE approach for spatio-temporal datasets with advection and diffusion
(with Lucia Clarotto, Thomas Romary and Nicolas Desassis)
In the context of spatio-temporal field prediction in environmental science, physics-inspired models of the underlying phenomena that are numerically efficient are of growing interest in spatial statistics. The size of these datasets requires new numerical methods to process them efficiently. The SPDE (Stochastic Partial Differential Equation) approach has proven effective for estimation and prediction in a spatial context. We present here the advection-diffusion SPDE with a time derivative in order to extend the SPDE family to the spatio-temporal context. By varying the coefficients of the differential operators, this approach makes it possible to define a large class of non-separable spatio-temporal models. An approximation of the solution of the SPDE by a Gaussian Markov random field is constructed by discretizing the time derivative with a finite difference method (implicit Euler) and by solving the purely spatial SPDE with a finite element method (continuous Galerkin) at each time step. The “Streamline Diffusion” stabilization technique is introduced when the advection term dominates the diffusion term. Efficient computational methods are proposed to estimate the parameters of the SPDE and to predict the spatio-temporal field by kriging. The approach is applied to a solar radiation dataset. Its advantages and limitations are discussed.
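The implicit-Euler time discretization described above can be sketched on a toy 1-D problem. Centered finite differences stand in for the finite elements, and all parameter values are invented:

```python
import numpy as np

# One implicit-Euler step of the 1-D advection-diffusion equation
#   u_t + a u_x - D u_xx = f
# on a periodic grid, with centered finite differences in space.
n, dx, dt = 100, 0.1, 0.05
a, D = 1.0, 0.2                           # advection speed, diffusion coefficient

I = np.eye(n)
up = np.roll(I, -1, axis=0)               # (up @ u)[i] = u[i+1]  (periodic)
um = np.roll(I, 1, axis=0)                # (um @ u)[i] = u[i-1]
Adv = (up - um) / (2.0 * dx)              # centered first derivative
Dif = (up - 2.0 * I + um) / dx**2         # centered second derivative

# Implicit Euler: (I + dt * (a * Adv - D * Dif)) u_new = u_old + dt * f
M = I + dt * (a * Adv - D * Dif)
rng = np.random.default_rng(1)
u_old = rng.standard_normal(n)
f = rng.standard_normal(n)                # source term (white noise in the SPDE view)
u_new = np.linalg.solve(M, u_old + dt * f)
```

Because the scheme is implicit, each time step amounts to solving one linear system; in the actual SPDE approach this system is sparse thanks to the finite element discretization.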
Spatial statistics and stochastic partial differential equations: a mechanistic viewpoint (Spatial Statistics) [with Lionel Roques and Samuel Soubeyrand]
The Stochastic Partial Differential Equation (SPDE) approach is revisited from a mechanistic perspective based on the movement of microscopic particles, thereby relating pseudo-differential operators to dispersal kernels. A connection between Lévy flights and PDEs involving the Fractional Laplacian (FL) operator is established. The corresponding Fokker-Planck PDEs serve as a basis to propose new generalisations by considering a general form of SPDE with terms accounting for dispersal, drift and reaction. The difference between the FL operator (with or without linear reaction term), associated with a fat-tailed dispersal kernel and therefore describing long-distance dependencies, and the damped FL operator, associated with a thin-tailed kernel and thus corresponding to short-distance dependencies, is detailed. Then, SPDE-based random fields with a non-stationary, spatially and temporally varying external force are illustrated, and nonlinear bistable reaction terms are introduced. The physical meaning of the latter and possible applications are discussed. Returning to the particulate interpretation of the above-mentioned equations, their links with point processes are described in a particular case. We unravel the nature of the point processes they generate and show how such mechanistic models, associated with a probabilistic observation model, can be used in a hierarchical setting to estimate the parameters of the particle dynamics.
A general framework for SPDE-based stationary random fields (Bernoulli) [with R. Carizo Vergara and N. Desassis]
The paper presents theoretical advances in the application of the Stochastic Partial Differential Equation (SPDE) approach in geostatistics. We present a general approach to construct stationary models related to a wide class of SPDEs, with applications to spatio-temporal models having non-trivial properties. Within the framework of Generalized Random Fields, a criterion for the existence and uniqueness of stationary solutions for a wide class of linear SPDEs is proposed and proven. Their covariances are then obtained through their associated spectral measures. We also present a result relating the covariance in the case of a White Noise source term to that of a generic case through convolution. Using these results, we obtain a variety of SPDE-based stationary random fields. In particular, well-known results regarding the Matérn model and models with Markovian behavior are recovered. A new relationship between the Stein model and a particular SPDE is obtained. New spatio-temporal models obtained from evolution SPDEs of arbitrary temporal derivative order are then derived, for which properties of separability and symmetry can easily be controlled. We also obtain results concerning stationary solutions for physically inspired models, such as solutions of the heat equation, the advection-diffusion equation, some Langevin equations and the wave equation. arXiv:1806.04999
Semi-parametric resampling with extremes (Spatial Statistics) [with Thomas Opitz and Grégoire Mariethoz]
Nonparametric resampling methods such as Direct Sampling are powerful tools to simulate new datasets preserving important data features such as spatial patterns from observed datasets while using only minimal assumptions. However, such methods cannot generate extreme events beyond the observed range of data values. We here propose using tools from extreme value theory for stochastic processes to extrapolate observed data towards yet unobserved high quantiles. Original data are first enriched with new values in the tail region, and then classical resampling algorithms are applied to enriched data. In a first approach to enrichment that we label “naive resampling”, we generate an independent sample of the marginal distribution while keeping the rank order of the observed data. We point out inaccuracies of this approach around the most extreme values, and therefore develop a second approach that works for datasets with many replicates. It is based on the asymptotic representation of extreme events through two stochastically independent components: a magnitude variable, and a profile field describing spatial variation. To generate enriched data, we fix a target range of return levels of the magnitude variable, and we resample magnitudes constrained to this range. We then use the second approach to generate heatwave scenarios of yet unobserved magnitude over France, based on daily temperature reanalysis training data for the years 2010 to 2016.
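The first, “naive resampling” enrichment step can be sketched as follows. The two marginals below are invented stand-ins; the point is only the rank-preserving substitution of a fresh, heavier-tailed marginal sample for the observed values:

```python
import numpy as np

# Naive enrichment: draw a fresh sample from a heavier-tailed marginal and
# reorder it so that it follows the rank order of the observed data.
rng = np.random.default_rng(6)
obs = rng.gamma(2.0, 1.0, 500)              # stand-in observed data
new = rng.pareto(3.0, 500) + 1.0            # fresh sample with a heavier tail

enriched = np.empty_like(obs)
enriched[np.argsort(obs)] = np.sort(new)    # smallest new value goes where the
                                            # smallest observation was, etc.
```

The enriched series keeps the spatial/temporal rank pattern of the original data while extending its range of values, which is exactly the behavior the text criticizes near the most extreme values and improves upon with the second, magnitude/profile approach.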
Simulating space-time random fields with nonseparable Gneiting-type covariance functions (Statistics and Computing) [with Xavier Emery, Céline Lacaux and Christian Lantuéjoul]
We propose two algorithms to simulate space-time Gaussian random fields with a covariance function belonging to an extended Gneiting class, the definition of which depends on a completely monotone function associated with the spatial structure and a conditionally negative definite function associated with the temporal structure. In both cases, the simulated random field is constructed as a weighted sum of cosine waves, with a Gaussian spatial frequency vector and a uniform phase. The difference lies in the way to handle the temporal component. The first algorithm relies on a spectral decomposition in order to simulate a temporal frequency conditional upon the spatial one, while in the second algorithm the temporal frequency is replaced by an intrinsic random field whose variogram is proportional to the conditionally negative definite function associated with the temporal structure. Both algorithms are scalable as their computational cost is proportional to the number of space-time locations, which may be unevenly spaced in space and/or in time. They are illustrated and validated through synthetic examples. arXiv: 1912.02026
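The common backbone of both algorithms, a sum of cosine waves with random frequencies and uniform phases, can be sketched in a purely spatial 1-D case. This targets the Gaussian covariance exp(-s²h²/2) rather than the extended Gneiting class itself, and all sizes and parameters are invented:

```python
import numpy as np

# Z(x) = sqrt(2/N) * sum_k cos(w_k * x + phi_k), with Gaussian frequencies w_k
# ~ N(0, s^2) and uniform phases phi_k; target covariance C(h) = exp(-s^2 h^2 / 2).
rng = np.random.default_rng(2)
N, s, h = 200, 1.0, 1.0

x = np.linspace(0.0, 5.0, 101)
w = rng.normal(0.0, s, N)
phi = rng.uniform(0.0, 2.0 * np.pi, N)
Z = np.sqrt(2.0 / N) * np.cos(np.outer(x, w) + phi).sum(axis=1)   # one realization

# Monte-Carlo check of the covariance at lag h over independent realizations
M = 4000
W = rng.normal(0.0, s, (M, N))
PHI = rng.uniform(0.0, 2.0 * np.pi, (M, N))
Z0 = np.sqrt(2.0 / N) * np.cos(PHI).sum(axis=1)         # field at location 0
Zh = np.sqrt(2.0 / N) * np.cos(W * h + PHI).sum(axis=1) # field at location h
cov_hat = np.mean(Z0 * Zh)
target = np.exp(-s**2 * h**2 / 2.0)
```

The cost of one realization is proportional to (number of locations) × N, with no requirement that the locations form a grid, which is the scalability property noted above.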
A new class of α-transformations for the spatial analysis of Compositional Data (Spatial Statistics) [with Lucia Clarotto and Alessandra Menafoglio]
Georeferenced compositional data are prominent in many scientific fields and in spatial statistics. This work addresses the problem of proposing models and methods to analyze and predict, through kriging, this type of data. To this purpose, a novel class of α-transformations, named the Isometric α-transformation (α-IT), is proposed, which encompasses the traditional Isometric Log-Ratio (ILR) transformation. Similarly to other α-transformations existing in the literature, it is shown that the ILR is the limit case of the α-IT as α tends to 0, and that α = 1 corresponds to a linear transformation of the data. Unlike the ILR, the proposed transformation accepts 0s in the compositions when α > 0. Maximum likelihood estimation of the parameter α is established. Prediction using kriging on α-IT transformed data is validated on synthetic spatial compositional data, using prediction scores computed either in the geometry induced by the α-IT, or in the simplex. Application to land cover data shows that the relative superiority of the various approaches w.r.t. a prediction objective depends on whether the compositions contain any zero component. When all components are positive, the limit cases (ILR or linear transformations) are optimal for none of the considered metrics. An intermediate geometry, corresponding to the α-IT with the maximum likelihood estimate of α, better describes the dataset in a geostatistical setting. When the amount of compositions with 0s is not negligible, some side-effects of the transformation get amplified as α decreases, entailing poor kriging performances both within the α-IT geometry and for metrics in the simplex.
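A toy version of the underlying idea, a power transformation on the simplex whose α → 0 limit is a log-ratio transform, can be sketched as follows. This is not the isometric α-IT of the paper, just the plain power/centred-log-ratio pair:

```python
import numpy as np

# Power transformation on the simplex and its log-ratio limit (illustrative
# stand-in, not the paper's isometric alpha-transformation).
def power_transform(x, alpha):
    if alpha == 0.0:
        y = np.log(x)          # log-ratio limit (requires strictly positive parts)
        return y - y.mean()    # centred log-ratio (CLR) coordinates
    y = x**alpha
    return y / y.sum()         # closed (re-normalized) powered composition

x = np.array([0.5, 0.3, 0.2])
u1 = power_transform(x, 1.0)   # alpha = 1: the composition is unchanged
u0 = power_transform(x, 0.0)   # alpha = 0: CLR-type coordinates, which sum to 0
```

Unlike the log-ratio limit, any α > 0 tolerates zero components, which mirrors the advantage of the α-IT over the ILR discussed above.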
Means and covariance functions for spatial compositional data: an axiomatic approach (Mathematical Geosciences) [with T. Marchant]
Our work focuses on the characterization of the central tendency of a sample of compositional data. It provides new results about theoretical properties of means and covariance functions for compositional data, from an axiomatic perspective. Original results that shed new light on the geostatistical modeling of compositional data are presented. As a first result, it is shown that the weighted arithmetic mean is the only central tendency characteristic verifying a small set of axioms, namely reflexivity and marginal stability. Moreover, the weights must be identical for all components of the compositional vector. This result has deep consequences for the spatial multivariate covariance modeling of compositional data. In a geostatistical setting, it is shown as a second result that the proportional model of covariance functions (i.e. the product of a covariance matrix and a single correlation function) is the only model that provides identical kriging for all components of the compositional data. As a consequence of these two results, the proportional model of covariance functions is the only covariance model compatible with reflexivity and marginal stability. A preprint can be found here. Publication in Mathematical Geosciences is here.
Half-tapering strategy for conditional simulation with large datasets [with D. Marcotte]
Gaussian conditional realizations are routinely used for risk assessment and planning in a variety of Earth sciences applications. Conditional realizations can be obtained by first creating unconditional realizations that are then post-conditioned by kriging. Many efficient algorithms are available for the first step, so the bottleneck resides in the second step. Instead of doing the conditional simulations with the desired covariance (F approach) or with a tapered covariance (T approach), we propose to use the tapered covariance only in the conditioning step (Half-Taper, or HT, approach). This speeds up the computations and reduces the memory requirements of the conditioning step, while keeping the right short-scale variations in the realizations. A criterion based on the mean square error of the simulation is derived to help anticipate the similarity of HT to F. Moreover, an index is used to predict the sparsity of the kriging matrix in the conditioning step. Some guidelines for the choice of the taper function are discussed. The distributions of a series of 1D, 2D and 3D scalar response functions are compared for the F, T and HT approaches. The distributions obtained indicate a much better similarity to F with HT than with T. A preprint is available here. Publication in SERRA is here.
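The two-step construction (unconditional realization, then post-conditioning by kriging) can be sketched in 1-D. This sketch uses the full covariance throughout, i.e. the F approach; the HT variant would simply use a tapered covariance in the kriging step. The covariance model, grid and data are invented:

```python
import numpy as np

# Conditional simulation by post-conditioning: z_cond = z_uncond + kriging of
# (data - z_uncond at data sites).
rng = np.random.default_rng(3)

def cov(h, scale=1.5):
    return np.exp(-np.abs(h) / scale)        # exponential covariance (assumed)

x = np.linspace(0.0, 10.0, 121)              # simulation grid
xd = np.array([1.0, 4.0, 7.5])               # data locations (lie on the grid)
zd = np.array([0.8, -0.5, 1.2])              # conditioning data

# Unconditional realization jointly on grid and data locations (Cholesky)
xa = np.concatenate([x, xd])
C = cov(xa[:, None] - xa[None, :])
L = np.linalg.cholesky(C + 1e-8 * np.eye(len(xa)))  # jitter: duplicated locations
z = L @ rng.standard_normal(len(xa))
zg, zs = z[:len(x)], z[len(x):]              # values on the grid / at data sites

# Simple kriging of the residuals zd - zs onto the grid
C_dd = cov(xd[:, None] - xd[None, :])
C_dg = cov(xd[:, None] - x[None, :])
lam = np.linalg.solve(C_dd, C_dg)            # kriging weights, one column per node
z_cond = zg + lam.T @ (zd - zs)              # conditioned realization
```

In the HT approach only C_dd and C_dg would be tapered, making the kriging system sparse, while the unconditional realization keeps the full covariance and hence the right short-scale variations.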
Multivariate space-time models [with M. Bourotte and E. Porcu]
Multivariate space-time data are increasingly recorded in various scientific disciplines. When analyzing these data, one of the key issues is to describe the multivariate space-time dependencies. In a Gaussian framework, this requires proposing relevant models for multivariate space-time covariance functions, mathematically described as matrix-valued covariance functions for which non-negative definiteness must be ensured. A new flexible parametric class of cross-covariance functions for multivariate space-time Gaussian random fields has been proposed, in which the space-time components belong to the (univariate) Gneiting class of space-time covariance functions, with Matérn or Cauchy covariance functions in the spatial dimensions. In this class, the smoothness and scale parameters can differ for each variable. Sufficient conditions are provided, ensuring that this model is a valid matrix-valued covariance function for multivariate space-time random fields. Through a simulation study, it is shown that the parameters of this model can be efficiently estimated using weighted pairwise likelihood, which belongs to the class of composite likelihood methods. A preprint is available here. Publication in Spatial Statistics is here.
Variograms for anisotropic random fields [with R. Senoussi and E. Porcu]
The question of building useful and valid models of anisotropic variograms for spatial data that go beyond classical anisotropy models (geometric and zonal anisotropy) is rarely addressed. In Allard, Senoussi and Porcu (Math. Geosciences), it is shown that if the regularity parameter is a continuous function of the direction, it must necessarily be constant. Instead, the scale parameter can vary with the direction in a continuous or discontinuous fashion according to a directional mixture representation, which makes it possible to build a very large class of anisotropy models. A turning bands algorithm for the simulation of Gaussian anisotropic processes, obtained from the mixture representation, is also presented.
Stochastic Weather Generators [with P. Naveau, P. Ailliot, V. Monbet, M. Bourotte]
A recurrent issue encountered in impact studies is to provide fast and realistic (in a distributional sense) simulations of atmospheric variables such as temperature, precipitation and wind at a few specific locations and at daily or hourly temporal scales. This need has led to a large variety of so-called Stochastic Weather Generators (SWGs) in the hydrological and weather literature. A concise and up-to-date review of weather-state stochastic weather generators is available in Ailliot, Allard, Monbet and Naveau (2015).
To simulate multivariate daily time series (minimum and maximum temperatures, global radiation, wind speed and precipitation intensity) at a single site, WACS-Gen, a Weather-state Approach with conditionally multivariate Closed Skew-normal distributions, is proposed in Flecher, Naveau, Allard, Brisson (WRR, 2010). WACS-Gen is able to accurately reproduce the statistical properties of these five variables, including time dependence. It takes advantage of two elements. First, the classical wet/dry-day dichotomy used in most past weather generators is generalized to multiple weather states using clustering techniques. The transitions among weather states are modeled by a first-order Markov chain. Second, the vector of the five daily variables of interest is sampled, conditionally on these weather states, from a closed skew-normal distribution, thus allowing non-symmetric behaviors to be handled. WACS-Gen is coded in R, and is available upon request by emailing me.
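The two-layer structure can be sketched as follows. The transition matrix, state-conditional means and standard deviations are invented, and a Gaussian stands in for the closed skew-normal:

```python
import numpy as np

# Weather-state layer: a first-order Markov chain over three states, followed
# by state-conditional sampling of a daily variable.
rng = np.random.default_rng(4)

P = np.array([[0.7, 0.2, 0.1],    # transition probabilities (assumed)
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
n_days = 1000
states = np.zeros(n_days, dtype=int)
for t in range(1, n_days):
    states[t] = rng.choice(3, p=P[states[t - 1]])

# Conditionally on the state, draw e.g. a daily temperature from a
# state-specific Gaussian (stand-in for the closed skew-normal of WACS-Gen)
mu = np.array([15.0, 10.0, 5.0])
sd = np.array([2.0, 3.0, 4.0])
temps = rng.normal(mu[states], sd[states])
```

The Markov chain carries the day-to-day persistence of the weather, while the state-conditional distribution carries the multivariate, possibly skewed behavior within each weather type.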
Allard and Bourotte (2014) consider the problem of disaggregating daily precipitation into hourly values with a transformed censored latent Gaussian process. Current research aims at proposing relevant models for multisite, multivariate Stochastic Weather Generators.
Combining indicator probabilities [with D. D'Or, R. Froidevaux, A. Comunian, P. Renard]
The need to combine different sources of information in a probabilistic framework is a frequent task in geoscience. For example, the probability of occurrence of a certain lithofacies at a given location can easily be computed conditionally on the values observed at each other source of information (sample observations, geophysics, remote sensing, training images). The problem of aggregating these different conditional probability distributions into a single conditional distribution arises as an approximation to the inaccessible genuine conditional probability given all information. Allard, Comunian and Renard (2012) provide a formal review of most aggregation methods, with a particular focus on their mathematical properties. Exact relationships between the different methods are emphasized. The case of events with more than two possible outcomes is treated in detail. It is shown that in this case, equivalence between the different aggregation formulas is lost. It is proved that log-linear pooling formulas with parameters estimated by maximum likelihood are calibrated. These results are illustrated on simulations from two common stochastic models for earth science: the truncated Gaussian model and the Boolean model.
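Plain log-linear pooling, a weighted geometric mean of the source probabilities followed by renormalization, can be sketched for a 3-outcome event. The probabilities and weights below are invented; the calibrated variant discussed above additionally fits the weights by maximum likelihood:

```python
import numpy as np

# Log-linear pooling of two conditional probability vectors for a 3-outcome event.
p_src = np.array([[0.6, 0.3, 0.1],   # P(outcome | source 1)  (assumed)
                  [0.4, 0.4, 0.2]])  # P(outcome | source 2)  (assumed)
w = np.array([0.8, 0.6])             # pooling weights (assumed)

pooled = np.exp(w @ np.log(p_src))   # product over sources of p_i^(w_i)
pooled /= pooled.sum()               # renormalize to a probability vector
```

Because the sources multiply rather than average, an outcome judged unlikely by any one source is strongly penalized in the pooled distribution, a well-known feature of log-linear as opposed to linear pooling.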
When considering the problem of the spatial prediction of a categorical variable given a set of observations at surrounding locations, a useful approximation of the conditional probability of observing a category at a location is obtained with a particular maximum entropy principle. It leads to a simple combination of sums and products of univariate and bivariate probabilities. This prediction equation can be used for categorical estimation or categorical simulation. In Allard, D'Or and Froidevaux (2011), connections are made to earlier work on the prediction of categorical variables. In particular, it is a parameter-free, suboptimal special case of log-linear pooling.
Skew normal random fields [with P. Naveau]
Skewness is often present in a wide range of environmental problems, and modelling it in the spatial context remains a challenging problem. In Allard and Naveau (2007), a new family of skewed random fields based on the multivariate closed skew-normal distribution is proposed. Such fields can be written as the sum of two independent fields: one Gaussian and the other truncated Gaussian. This model contains very few parameters while still incorporating the classical spatial structures used in geostatistics. Crucially, a high degree of skewness can be induced through the use of a single skewness parameter. It is possible to compute the first- and second-order moments of these skewed fields, as well as to derive the properties of the sample variogram and covariance. This leads to a method-of-moments algorithm to estimate the parameters.
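The additive construction can be sketched in the purely marginal case, using the standard Gaussian-plus-truncated-Gaussian representation; the skewness parameter value is invented:

```python
import numpy as np

# A skewed variable as the sum of a Gaussian component and an independent
# truncated-Gaussian component, following the additive construction above.
rng = np.random.default_rng(5)
n = 200_000
g = rng.standard_normal(n)                   # Gaussian component
t = np.abs(rng.standard_normal(n))           # N(0,1) truncated below at 0
delta = 0.9                                  # skewness parameter (assumed)
z = np.sqrt(1.0 - delta**2) * g + delta * t  # skew-normal-type variable
skew = np.mean((z - z.mean())**3) / z.std()**3
```

A single parameter delta drives the whole asymmetry: delta = 0 recovers the Gaussian case, while delta close to 1 yields strongly skewed marginals.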
Zones of Abrupt Changes [with Edith Gabriel and J.N. Bacro]
Estimating the zones where a variable under study changes abruptly is a problem encountered in many biological, ecological, agricultural or environmental applications. In Gabriel, Allard and Bacro (2011), a method is proposed for detecting the zones where a spatially correlated variable, irregularly sampled in the plane, changes abruptly. The general model is that, under the null hypothesis, the variable is the realization of a stationary Gaussian process with constant expectation. The alternative is that the mean function is discontinuous along some curves in the plane. The general approach is a global aggregation of local tests of the hypothesis of a locally constant mean vs. the alternative of the existence of a discontinuity. The theory that links the local and global levels is based on asymptotic distributions of excursion sets of non-stationary chi-square fields. It is thus possible to control the global type I error and to simultaneously estimate the covariance function and the zones of abrupt change (ZACs) in the case of an unknown mean. This method is easy to use, to visualise and to interpret. An R set of functions, detecZAC, can be downloaded from Edith Gabriel's homepage.
CART algorithm for spatial data [with Liliane Bel and Avner Bar-Hen]
Classification And Regression Trees (CART) assume independent samples to compute classification rules. This assumption is very practical for estimating quantities involved in the algorithm and for assessing asymptotic properties of estimators. Unfortunately, in most environmental or ecological applications, the data under study present some amount of spatial correlation. When the sampling scheme is very irregular, a direct application of supervised classification algorithms leads to biased discriminant rules due, for example, to the possible oversampling of some areas. In Bel, Allard, Laurent, Cheddadi and Bar-Hen (2009), two approaches for taking this spatial dependence into account are considered. The first one takes into account the irregularity of the sampling by weighting the data according to their spatial pattern using two existing methods based on Voronoï tessellation and regular grid, and one original method based on kriging. The second one uses spatial estimates of the quantities involved in the construction of the discriminant rule at each step of the algorithm.