Absent Data Generating Classifier for Imbalanced Class Sizes

Arash Pourhabib, Bani K. Mallick, Yu Ding

Research output: Contribution to journal › Article › peer-review

22 Scopus citations


We propose an algorithm for two-class classification problems when the training data are imbalanced. This means the number of training instances in one of the classes is so low that the conventional classification algorithms become ineffective in detecting the minority class. We present a modification of the kernel Fisher discriminant analysis such that the imbalanced nature of the problem is explicitly addressed in the new algorithm formulation. The new algorithm exploits the properties of the existing minority points to learn the effects of other minority data points, had they actually existed. The algorithm proceeds iteratively by employing the learned properties and conditional sampling in such a way that it generates sufficient artificial data points for the minority set, thus enhancing the detection probability of the minority class. Implementing the proposed method on a number of simulated and real data sets, we show that our proposed method performs competitively compared to a set of alternative state-of-the-art imbalanced classification algorithms.
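As a point of reference for the abstract above, the following is a minimal sketch of the standard two-class kernel Fisher discriminant that the abstract says the proposed algorithm modifies. It is not the authors' algorithm and does not generate any artificial minority points. The function names, the RBF kernel, the `gamma` and `reg` parameters, the midpoint threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel (assumed kernel choice).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kfd(X, y, gamma=1.0, reg=1e-3):
    # Standard kernel Fisher discriminant for labels 0 and 1.
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    means = []
    N = np.zeros((n, n))  # within-class scatter in kernel space
    for label in (0, 1):
        Kj = K[:, y == label]
        nj = Kj.shape[1]
        means.append(Kj.mean(axis=1))
        center = np.eye(nj) - np.full((nj, nj), 1.0 / nj)
        N += Kj @ center @ Kj.T
    # Ridge term keeps the linear system well conditioned (assumed regularizer).
    alpha = np.linalg.solve(N + reg * np.eye(n), means[1] - means[0])
    # Assumed decision rule: midpoint between the two projected class means.
    threshold = 0.5 * (alpha @ means[0] + alpha @ means[1])
    return alpha, threshold

def predict_kfd(X_train, alpha, threshold, X_new, gamma=1.0):
    # Project new points onto the discriminant direction and compare to the threshold.
    scores = rbf_kernel(X_new, X_train, gamma) @ alpha
    return (scores > threshold).astype(int)

# Synthetic imbalanced data: 200 majority points (label 0), 10 minority points (label 1).
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(4.0, 0.5, size=(10, 2))
X = np.vstack([X_maj, X_min])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])

alpha, threshold = fit_kfd(X, y, gamma=0.5)
pred = predict_kfd(X, alpha, threshold, np.array([[0.0, 0.0], [4.0, 4.0]]), gamma=0.5)
print(pred)
```

The two classes here are well separated, so this baseline is only a starting point. The iterative generation of artificial minority points described in the abstract is not reproduced, because the abstract does not specify how it is done.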
Original language: English (US)
Pages (from-to): 2695-2724
Number of pages: 30
Journal: Journal of Machine Learning Research
State: Published - Dec 2015
Externally published: Yes

Bibliographical note

KAUST Repository Item: Exported on 2022-05-31
Acknowledged KAUST grant number(s): KUS-CI-016-04
Acknowledgements: Arash Pourhabib, Yu Ding and Bani K. Mallick were partially supported by grants from NSF (DMS-0914951, CMMI-0926803, and CMMI-1000088) and King Abdullah University of Science and Technology (KUS-CI-016-04). The authors are also grateful for the valuable suggestions made by the editor and referees that greatly improved the paper.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Statistics and Probability
  • Control and Systems Engineering

