Tile low-rank approximations of non-Gaussian space and space-time Tukey g-and-h random field likelihoods and predictions on large-scale systems

Research output: Contribution to journal › Article › peer-review

Abstract

Large-scale statistical modeling has become necessary with the vast flood of geospatial data coming from various sources. In spatial statistics, Maximum Likelihood Estimation (MLE) is widely used to model geospatial data by estimating a set of statistical parameters of a predefined covariance function. This covariance function describes the correlation between a set of geospatial locations, where the main goal is to model given data samples and impute missing data. Climate/weather modeling is a prevalent application of MLE, where data interpolation and forecasting are highly required. In the literature, the Gaussian random field is one of the most popular models used with MLE to describe geospatial data. However, real-life datasets are often skewed and/or have extreme values, and non-Gaussian random field models are more appropriate for capturing such features. In this work, we provide exact and approximate parallel implementations of the well-known Tukey g-and-h (TGH) non-Gaussian random field in the context of climate/weather applications. The proposed implementation alleviates the computational complexity of the log-likelihood function, which requires O(n^2) storage and O(n^3) operations, where n = N × M, N is the number of geospatial locations, and M is the number of time slots. Based on tile low-rank (TLR) approximations, our implementation of the TGH model can tackle large-scale problems. Furthermore, we rely on task-based programming models and dynamic runtime systems to provide fast execution for the MLE operation in space and space-time cases. We assess the performance and accuracy of the proposed implementations using synthetic space and space-time datasets up to 800K. We also consider a 12-month precipitation dataset in Germany to demonstrate the advantage of using non-Gaussian over Gaussian random field models.
We evaluate the prediction accuracy of the TGH model on the precipitation dataset using the Probability Integral Transformation (PIT) tool, showing that the TGH model outperforms the Gaussian model on the real dataset. Moreover, our performance assessment indicates that TLR computations allow solving larger matrix sizes while preserving the required accuracy for prediction. The TLR-based approximation shows a speedup up to 7.29X and 2.96X over the exact solution.
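
The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the exponential covariance kernel, the 1-D sorted locations, the tile size, and the SVD-based compression with a relative threshold `tol` are all assumptions chosen for illustration, and the helper names `tukey_gh` and `compress_tile` are hypothetical. The first function applies the standard Tukey g-and-h transform to Gaussian values; the second compresses one off-diagonal covariance tile to low rank.

```python
import numpy as np

def tukey_gh(z, g, h):
    """Tukey g-and-h transform of standard-normal values z (assumes h >= 0).

    g controls skewness, h controls tail heaviness; g = 0 and h = 0 give z back.
    """
    z = np.asarray(z, dtype=float)
    skew = z if g == 0 else np.expm1(g * z) / g
    return skew * np.exp(h * z**2 / 2.0)

def compress_tile(tile, tol):
    """Truncated SVD of one tile, keeping singular values above tol * largest.

    Returns factors (U, V) with tile ~= U @ V. This is an illustrative
    compression choice, not necessarily the one used in the paper.
    """
    u, s, vt = np.linalg.svd(tile, full_matrices=False)
    k = max(1, int(np.sum(s > tol * s[0])))
    return u[:, :k] * s[:k], vt[:k, :]

rng = np.random.default_rng(0)

# Hypothetical 1-D locations, sorted so that distant tiles are well separated.
x = np.sort(rng.uniform(0.0, 1.0, 512))

# Exponential covariance kernel with range 0.1 (an assumed choice).
cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)

# One 128 x 128 off-diagonal tile, stored as two thin factors instead of densely.
tile = cov[:128, 384:]
U, V = compress_tile(tile, tol=1e-8)
rank = U.shape[1]
rel_err = np.linalg.norm(tile - U @ V) / np.linalg.norm(tile)

# A non-Gaussian sample: Gaussian draws pushed through the TGH transform.
z = rng.standard_normal(512)
y = tukey_gh(z, g=0.5, h=0.1)
```

Storing each off-diagonal tile as `U` and `V` costs about `2 * 128 * rank` values instead of `128 * 128`, which is the source of the memory and time savings when `rank` is small; diagonal tiles are typically kept dense.
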
Original language: English (US)
Pages (from-to): 104715
Journal: Journal of Parallel and Distributed Computing
Volume: 180
State: Published - Jun 2 2023

Bibliographical note

KAUST Repository Item: Exported on 2023-06-14
Acknowledgements: This work is funded and supported by King Abdullah University of Science and Technology (KAUST) through the Office of Sponsored Research (OSR). This research used the resources of the Extreme Computing Research Center (ECRC) and the KAUST Supercomputing Laboratory, including the Cray XC40 Shaheen II supercomputer.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Hardware and Architecture
  • Theoretical Computer Science
  • Software
  • Computer Networks and Communications
