Waypoint averaging and step size control in learning by gradient descent

Date: 2011

Abstract: We introduce a modification of batch gradient descent which aims at better convergence properties and more robust minimization. In the course of the descent, the procedure compares the performance of the current configuration with that of a gliding average over the most recent positions. If the latter corresponds to a lower value of the optimization objective, minimization proceeds from there and the step size of the descent is decreased. Here we present the prescription from a practitioner's point of view and refrain from a detailed mathematical analysis. First, the method is illustrated in terms of a low-dimensional example. Moreover, we discuss its application in the context of machine learning, with examples corresponding to multilayered neural networks and a recent extension of Learning Vector Quantization (LVQ) termed Matrix Relevance LVQ.

Bib:
@techreport{papari_TR2011,
  author      = {Giuseppe Papari and Kerstin Bunte and Michael Biehl},
  title       = {Waypoint averaging and step size control in learning by gradient descent},
  editor      = {Frank-Michael Schleif and Thomas Villmann},
  booktitle   = {MIWOCI 2011, Mittweida Workshop on Computational Intelligence},
  type        = {Machine Learning Reports},
  number      = {MLR-2011-06},
  pages       = {16--26},
  year        = {2011},
  institution = {Leipzig University},
}
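
The abstract describes the procedure only in words; the short Python sketch below gives one plausible reading of it, not the report's own implementation. The function name waypoint_gd, the window length n_avg, the shrink factor, the step counts, and the choice to reset the waypoint history after jumping to the average are all illustrative assumptions.

import numpy as np

def waypoint_gd(f, grad, w0, eta=0.1, n_avg=5, shrink=0.5, n_steps=200):
    # Batch gradient descent with waypoint averaging, following the
    # description in the abstract: keep a gliding average over the most
    # recent positions; whenever that average achieves a lower objective
    # value than the current position, continue from the average and
    # decrease the step size.
    w = np.asarray(w0, dtype=float)
    history = [w.copy()]                  # most recent waypoints
    for _ in range(n_steps):
        w = w - eta * grad(w)             # ordinary batch gradient step
        history.append(w.copy())
        history = history[-n_avg:]        # gliding window of recent positions
        w_avg = np.mean(history, axis=0)  # waypoint average
        if f(w_avg) < f(w):               # does the average do better?
            w = w_avg                     # proceed from the average ...
            eta *= shrink                 # ... with a decreased step size
            history = [w.copy()]          # assumption: restart the window here
    return w

# Illustrative use on a simple quadratic objective.
f = lambda w: float(np.sum(w ** 2))
grad = lambda w: 2.0 * w
w_min = waypoint_gd(f, grad, w0=[3.0, -2.0])

Under this reading, the step size only ever shrinks, so the descent becomes increasingly conservative near a minimum; the report itself should be consulted for the exact update and step-size schedule.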