Reliability and computational efficiency of classification error estimators are critical factors in classifier design. In high-dimensional settings where data is scarce, cross-validation, the conventional method of error estimation, can be computationally very expensive. In this thesis, we consider a particular discriminant-analysis-type classifier, the Randomly-Projected RLDA (regularized linear discriminant analysis) ensemble classifier, which is designed for such a ‘small sample’ regime. We conduct an asymptotic study of the generalization error of this classifier in this regime, which requires tools from random matrix theory. The main outcome of this study is a deterministic function of the true statistics of the data and the problem dimension that closely approximates the generalization error for sufficiently large dimensions, as we demonstrate by simulation on synthetic data. The main advantage of this approach is its computational efficiency. It also constitutes a major step towards a consistent estimator of the error that depends on the training data rather than on the true statistics, and can therefore be applied to real data. We also derive an analogous quantity for the Randomly-Projected LDA ensemble classifier, which appears in the literature and is a special case of the former, and we use simulations on synthetic data to motivate its use for tuning this classifier's parameter.
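To make the setting concrete, the sketch below trains a randomly projected RLDA ensemble on synthetic small-sample Gaussian data and computes its generalization error from the true statistics. It should be read as illustrative, not as the thesis's method, and rests on several assumptions. The ensemble form (averaging the lifted matrices R^T (R S R^T + γI)^{-1} R over Gaussian projections R), equal class priors, and a shared class covariance are all assumed. The function names `train_rp_rlda`, `true_error`, and `monte_carlo_error` are hypothetical. The closed-form error here is the exact error of one trained classifier given the true Gaussian statistics. It is not the deterministic asymptotic approximation derived in the thesis; it only shows what such an approximation targets.

```python
import math
import numpy as np


def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def train_rp_rlda(X0, X1, k, n_proj, gamma, rng):
    """Hypothetical RP-RLDA ensemble: average RLDA discriminants over random projections.

    Each member projects to k dimensions with a Gaussian matrix R (k x p) and uses
    the regularized inverse (R S R^T + gamma I)^{-1}; the ensemble averages the
    lifted matrices R^T (R S R^T + gamma I)^{-1} R. Equal class priors are assumed.
    """
    p = X0.shape[1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled sample covariance; singular when n0 + n1 - 2 < p.
    C0, C1 = X0 - mu0, X1 - mu1
    S = (C0.T @ C0 + C1.T @ C1) / (len(X0) + len(X1) - 2)
    A = np.zeros((p, p))
    for _ in range(n_proj):
        R = rng.standard_normal((k, p))
        A += R.T @ np.linalg.solve(R @ S @ R.T + gamma * np.eye(k), R)
    A /= n_proj
    w = A @ (mu0 - mu1)
    b = -0.5 * (mu0 + mu1) @ w
    return w, b  # predict class 0 when w @ x + b > 0


def true_error(w, b, m0, m1, Sigma):
    """Exact error of the linear rule for N(m0, Sigma) vs N(m1, Sigma), equal priors."""
    s = math.sqrt(w @ Sigma @ w)
    eps0 = phi(-(w @ m0 + b) / s)  # class-0 point lands on the class-1 side
    eps1 = phi((w @ m1 + b) / s)   # class-1 point lands on the class-0 side
    return 0.5 * (eps0 + eps1)


def monte_carlo_error(w, b, m0, m1, Sigma, n_test, rng):
    """Empirical error of the same rule on fresh samples from the true distributions."""
    L = np.linalg.cholesky(Sigma)
    p = len(m0)
    T0 = m0 + rng.standard_normal((n_test, p)) @ L.T
    T1 = m1 + rng.standard_normal((n_test, p)) @ L.T
    return 0.5 * (np.mean(T0 @ w + b <= 0) + np.mean(T1 @ w + b > 0))


# Small-sample synthetic setting: 20 training points per class in 50 dimensions.
rng = np.random.default_rng(0)
p, n = 50, 20
m0, m1, Sigma = np.zeros(p), np.full(p, 0.3), np.eye(p)
L = np.linalg.cholesky(Sigma)
X0 = m0 + rng.standard_normal((n, p)) @ L.T
X1 = m1 + rng.standard_normal((n, p)) @ L.T

w, b = train_rp_rlda(X0, X1, k=10, n_proj=100, gamma=1.0, rng=rng)
err = true_error(w, b, m0, m1, Sigma)
mc = monte_carlo_error(w, b, m0, m1, Sigma, n_test=100_000, rng=rng)
delta = math.sqrt((m1 - m0) @ np.linalg.solve(Sigma, m1 - m0))
bayes = phi(-delta / 2)  # lower bound: the Bayes error for this problem
print(f"closed-form error {err:.4f}, Monte Carlo {mc:.4f}, Bayes {bayes:.4f}")
```

The closed-form value costs one quadratic form per evaluation. The Monte Carlo check, like cross-validation on real data, needs many fresh evaluations. This cost gap is what motivates a cheap, deterministic approximation of the error.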
| Field | Value |
|---|---|
| Date of Award | Jul 2019 |
| Original language | English (US) |
| Awarding Institution | Computer, Electrical and Mathematical Sciences and Engineering |
| Supervisor | Mohamed-Slim Alouini |
- classification
- random matrix theory
- machine learning
- error estimation
- discriminant analysis
- random projections