TY - JOUR
T1 - Stochastic Spectral Descent for Discrete Graphical Models
AU - Carlson, David
AU - Hsieh, Ya-Ping
AU - Collins, Edo
AU - Carin, Lawrence
AU - Cevher, Volkan
N1 - Generated from Scopus record by KAUST IRTS on 2021-02-09
PY - 2016/3/1
Y1 - 2016/3/1
N2 - Interest in deep probabilistic graphical models has increased in recent years, due to their state-of-the-art performance in many machine learning applications. Such models are typically trained with the stochastic gradient method, which can take a significant number of iterations to converge. Since the computational cost of gradient estimation is prohibitive even for modestly sized models, training becomes slow and practically usable models are kept small. In this paper we propose a new, largely tuning-free algorithm to address this problem. Our approach derives novel majorization bounds based on the Schatten-∞ norm. Intriguingly, the minimizers of these bounds can be interpreted as gradient methods in a non-Euclidean space. We thus propose a stochastic gradient method in this non-Euclidean space. We provide simple conditions under which our algorithm is guaranteed to converge, and we demonstrate empirically that it leads to dramatically faster training and improved predictive ability compared to stochastic gradient descent for both directed and undirected graphical models.
AB - Interest in deep probabilistic graphical models has increased in recent years, due to their state-of-the-art performance in many machine learning applications. Such models are typically trained with the stochastic gradient method, which can take a significant number of iterations to converge. Since the computational cost of gradient estimation is prohibitive even for modestly sized models, training becomes slow and practically usable models are kept small. In this paper we propose a new, largely tuning-free algorithm to address this problem. Our approach derives novel majorization bounds based on the Schatten-∞ norm. Intriguingly, the minimizers of these bounds can be interpreted as gradient methods in a non-Euclidean space. We thus propose a stochastic gradient method in this non-Euclidean space. We provide simple conditions under which our algorithm is guaranteed to converge, and we demonstrate empirically that it leads to dramatically faster training and improved predictive ability compared to stochastic gradient descent for both directed and undirected graphical models.
UR - http://ieeexplore.ieee.org/document/7347351/
UR - http://www.scopus.com/inward/record.url?scp=84962815645&partnerID=8YFLogxK
U2 - 10.1109/JSTSP.2015.2505684
DO - 10.1109/JSTSP.2015.2505684
M3 - Article
SN - 1932-4553
VL - 10
SP - 296
EP - 311
JO - IEEE Journal on Selected Topics in Signal Processing
JF - IEEE Journal on Selected Topics in Signal Processing
IS - 2
ER -