TY - JOUR

T1 - Active learning and basis selection for kernel-based linear models: A Bayesian perspective

AU - Paisley, John

AU - Liao, Xuejun

AU - Carin, Lawrence

N1 - Generated from Scopus record by KAUST IRTS on 2021-02-09

PY - 2010/5/1

Y1 - 2010/5/1

N2 - We develop an active learning algorithm for kernel-based linear regression and classification. The proposed greedy algorithm employs a minimum-entropy criterion derived using a Bayesian interpretation of ridge regression. We assume access to a matrix, $\Phi \in \mathbb{R}^{N \times N}$, for which the $(i,j)$th element is defined by the kernel function $K(\gamma_i, \gamma_j) \in \mathbb{R}$, with the observed data $\gamma_i \in \mathbb{R}^d$. We seek a model, $M: \gamma_i \to y_i$, where $y_i$ is a real-valued response or integer-valued label, which we do not have access to a priori. To achieve this goal, a submatrix, $\Phi_{I_l, I_b} \in \mathbb{R}^{n \times m}$, is sought that corresponds to the intersection of $n$ rows and $m$ columns of $\Phi$, indexed by the sets $I_l$ and $I_b$, respectively. Typically $m \ll N$ and $n \ll N$. We have two objectives: (i) determine the $m$ columns of $\Phi$, indexed by the set $I_b$, that are the most informative for building a linear model, $M: [1\ \Phi_{i, I_b}]^T \to y_i$, without any knowledge of $\{y_i\}_{i=1}^N$, and (ii) using active learning, sequentially determine which subset of $n$ elements of $\{y_i\}_{i=1}^N$ should be acquired; both stopping values, $|I_b| = m$ and $|I_l| = n$, are also to be inferred from the data. These steps are taken with the goal of minimizing the uncertainty of the model parameters, $x$, as measured by the differential entropy of its posterior distribution. The parameter vector $x \in \mathbb{R}^m$, as well as the model bias $\eta \in \mathbb{R}$, is then learned from the resulting problem, $Y_{I_l} = \Phi_{I_l, I_b} x + \eta \mathbf{1} + \epsilon$. The remaining $N-n$ responses/labels not included in $Y_{I_l}$ can be inferred by applying $x$ to the remaining $N-n$ rows of $\Phi_{I_b}$. We show experimental results for several regression and classification problems, and compare to other active learning methods. © 2006 IEEE.

AB - We develop an active learning algorithm for kernel-based linear regression and classification. The proposed greedy algorithm employs a minimum-entropy criterion derived using a Bayesian interpretation of ridge regression. We assume access to a matrix, $\Phi \in \mathbb{R}^{N \times N}$, for which the $(i,j)$th element is defined by the kernel function $K(\gamma_i, \gamma_j) \in \mathbb{R}$, with the observed data $\gamma_i \in \mathbb{R}^d$. We seek a model, $M: \gamma_i \to y_i$, where $y_i$ is a real-valued response or integer-valued label, which we do not have access to a priori. To achieve this goal, a submatrix, $\Phi_{I_l, I_b} \in \mathbb{R}^{n \times m}$, is sought that corresponds to the intersection of $n$ rows and $m$ columns of $\Phi$, indexed by the sets $I_l$ and $I_b$, respectively. Typically $m \ll N$ and $n \ll N$. We have two objectives: (i) determine the $m$ columns of $\Phi$, indexed by the set $I_b$, that are the most informative for building a linear model, $M: [1\ \Phi_{i, I_b}]^T \to y_i$, without any knowledge of $\{y_i\}_{i=1}^N$, and (ii) using active learning, sequentially determine which subset of $n$ elements of $\{y_i\}_{i=1}^N$ should be acquired; both stopping values, $|I_b| = m$ and $|I_l| = n$, are also to be inferred from the data. These steps are taken with the goal of minimizing the uncertainty of the model parameters, $x$, as measured by the differential entropy of its posterior distribution. The parameter vector $x \in \mathbb{R}^m$, as well as the model bias $\eta \in \mathbb{R}$, is then learned from the resulting problem, $Y_{I_l} = \Phi_{I_l, I_b} x + \eta \mathbf{1} + \epsilon$. The remaining $N-n$ responses/labels not included in $Y_{I_l}$ can be inferred by applying $x$ to the remaining $N-n$ rows of $\Phi_{I_b}$. We show experimental results for several regression and classification problems, and compare to other active learning methods. © 2006 IEEE.

UR - http://ieeexplore.ieee.org/document/5406147/

UR - http://www.scopus.com/inward/record.url?scp=77951218308&partnerID=8YFLogxK

U2 - 10.1109/TSP.2010.2042491

DO - 10.1109/TSP.2010.2042491

M3 - Article

SN - 1053-587X

VL - 58

SP - 2686

EP - 2700

JO - IEEE Transactions on Signal Processing

JF - IEEE Transactions on Signal Processing

IS - 5

ER -