TY - GEN

T1 - The discriminating power of random features

AU - Rovetta, Stefano

AU - Masulli, Francesco

AU - Filippone, Maurizio

PY - 2009

Y1 - 2009

N2 - Input selection is part of many machine learning tasks, either as a means of improving performance or as the main goal. For instance, gene selection in bioinformatics is an input selection problem. However, as we prove in this paper, the reliability of input selection on high-dimensional data is affected by a small-sample problem. As a consequence, even completely random inputs have a chance of being selected as very useful, although they are not relevant to the underlying model. We express the probability of this event as a function of data cardinality and dimensionality, discuss the applicability of this analysis, and compute the probability for some data sets. As an illustration, we also report experimental results obtained with a specific input selection algorithm previously presented by the authors, which show that inputs known to be random are consistently selected by the method.

UR - http://www.scopus.com/inward/record.url?scp=74349083498&partnerID=8YFLogxK

U2 - 10.3233/978-1-60750-072-8-3

DO - 10.3233/978-1-60750-072-8-3

M3 - Conference contribution

AN - SCOPUS:74349083498

SN - 9781607500728

T3 - Frontiers in Artificial Intelligence and Applications

SP - 3

EP - 10

BT - Neural Nets WIRN09 - Proceedings of the 19th Italian Workshop on Neural Nets

PB - IOS Press

ER -