TY - GEN
T1 - Discovering highly informative feature set over high dimensions
AU - Zhang, Chongsheng
AU - Masseglia, Florent
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2012/11
Y1 - 2012/11
AB - For many textual collections, the number of features is often overly large, and these features can be highly redundant. It is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory provides one tool for obtaining such a feature collection. In this paper, our main contribution is to improve the efficiency of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data. Moreover, we design data structures that enable us to compute the entropies of the candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates the hopeless candidates at each forward selection step. We test our method through experiments on real-world data sets, showing that our proposal is very efficient. © 2012 IEEE.
UR - http://hdl.handle.net/10754/564625
UR - http://ieeexplore.ieee.org/document/6495166/
UR - http://www.scopus.com/inward/record.url?scp=84876838868&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2012.149
DO - 10.1109/ICTAI.2012.149
M3 - Conference contribution
SN - 9780769549156
SP - 1059
EP - 1064
BT - 2012 IEEE 24th International Conference on Tools with Artificial Intelligence
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -