Assessing Bayesian Semi-Parametric Log-Linear Models: An Application to Disclosure Risk Estimation

Cinzia Carota*, Maurizio Filippone, Silvia Polettini

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

We propose a method for identifying models with good predictive performance in the family of Bayesian log-linear mixed models with Dirichlet process random effects for count data. Their wide applicability makes the assessment of model performance crucial in many fields, including disclosure risk estimation, which is the focus of the present work. Rather than assessing models on the whole contingency table, we target the specific objective of the analysis and propose a two-stage model selection procedure aimed at limiting a form of bias arising in the process of model selection. Our proposal combines two different criteria: at the first stage, a path in the model search space is identified through a strongly penalized log-likelihood; at the second, a small number of semi-parametric models is evaluated through a context-dependent score-based information criterion. Tested on a variety of contingency tables, our method proves to be able to identify models with good predictive performance in a few steps, even in the presence of large tables with many sampling and structural zeros. We carefully discuss the proposed method in the context of the literature on model assessment and contextualize the illustrative application in the recent debate on statistical disclosure limitation. Finally, we provide examples of further applications in different research areas.

Original language: English (US)
Pages (from-to): 165-183
Number of pages: 19
Journal: International Statistical Review
Volume: 90
Issue number: 1
State: Published - Apr 2022

Bibliographical note

Publisher Copyright:
© 2021 The Authors. International Statistical Review published by John Wiley & Sons Ltd on behalf of International Statistical Institute.

Keywords

  • Bayesian model selection
  • Dirichlet process random effects
  • Disclosure risk
  • Log-linear mixed models
  • Model's predictive performance
  • Selection-induced bias
  • Statistical disclosure limitation

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
