Michael Biehl, earlier research in machine learning

Earlier research (1989-2003) in the area of machine learning

A key issue in my scientific activities is the analysis, modelling, and application of learning in adaptive information processing systems. This includes, for instance, neural networks for classification and regression tasks as well as problems of unsupervised learning and data analysis.
Frequently, training can be formulated as an optimization problem which is guided by an appropriate cost function. One particularly successful approach is suitable for the investigation of systems with many adaptive parameters, the choice of which is based on high-dimensional randomized example data.

Monte Carlo computer simulations of training processes play an important role in this context. Furthermore, analytical methods borrowed from statistical physics allow for the computation of typical learning curves in the framework of model scenarios. In classification, for instance, one is interested in the generalization error, i.e. the expected error rate when classifying novel data, as a function of the number of examples used in training.
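A learning curve of this kind can be estimated with a small Monte Carlo experiment. The following is a minimal sketch under assumptions that are not part of the text above: a teacher-student perceptron scenario, isotropic Gaussian inputs, and simple Hebbian training. All function and parameter names are hypothetical. For isotropic inputs the generalization error of a perceptron is arccos(rho)/pi, where rho is the normalized overlap between student and teacher weight vectors.

```python
import numpy as np


def generalization_error(student, teacher):
    """Error rate on novel isotropic inputs: arccos(rho) / pi."""
    rho = student @ teacher / (np.linalg.norm(student) * np.linalg.norm(teacher))
    return float(np.arccos(np.clip(rho, -1.0, 1.0)) / np.pi)


def hebb_theory(alpha):
    """Large-N result for Hebbian learning in this specific toy scenario."""
    return float(np.arccos(np.sqrt(2.0 * alpha / (2.0 * alpha + np.pi))) / np.pi)


def hebbian_learning_curve(n_dim=200, alphas=(0.5, 1.0, 2.0, 4.0, 8.0),
                           n_runs=20, seed=0):
    """Average generalization error versus alpha = (number of examples) / n_dim."""
    rng = np.random.default_rng(seed)
    curve = []
    for alpha in alphas:
        n_examples = int(alpha * n_dim)
        errors = []
        for _ in range(n_runs):
            teacher = rng.standard_normal(n_dim)              # random rule to be learned
            inputs = rng.standard_normal((n_examples, n_dim))  # randomized example data
            labels = np.sign(inputs @ teacher)                 # teacher classifications
            student = labels @ inputs                          # Hebbian sum over examples
            errors.append(generalization_error(student, teacher))
        curve.append(float(np.mean(errors)))
    return curve
```

Calling `hebbian_learning_curve()` returns one averaged error per value of alpha; comparing it with `hebb_theory(alpha)` gives a rough impression of how simulation and analytical prediction relate in this particular example.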
Key ingredients of the approach are:
* the consideration of very large systems in the so-called thermodynamic limit, which allows for
* a description in terms of a few macroscopic quantities or order parameters, and
* averages over randomized examples and the stochastic training process.
Formally, the thermodynamic limit corresponds to infinite-dimensional data. However, in most cases the results are in excellent quantitative agreement with observations in systems of, say, a few hundred degrees of freedom. The analysis of systematic finite-size effects is therefore an important aspect of the approach.
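The role of system size can be illustrated numerically. The sketch below assumes, purely for illustration, a teacher-student perceptron trained by Hebbian learning on Gaussian inputs; the scenario and all names are hypothetical and not taken from the text. It measures the student-teacher overlap, one example of an order parameter, over many independent runs and reports its mean and standard deviation for several dimensions.

```python
import numpy as np


def overlap_after_hebb(n_dim, alpha, rng):
    """Normalized student-teacher overlap after Hebbian training on alpha * n_dim examples."""
    teacher = rng.standard_normal(n_dim)
    inputs = rng.standard_normal((int(alpha * n_dim), n_dim))
    labels = np.sign(inputs @ teacher)
    student = labels @ inputs
    return student @ teacher / (np.linalg.norm(student) * np.linalg.norm(teacher))


def overlap_statistics(sizes=(25, 100, 400), alpha=2.0, n_runs=100, seed=1):
    """Mean and standard deviation of the overlap for each system size."""
    rng = np.random.default_rng(seed)
    stats = {}
    for n_dim in sizes:
        samples = [overlap_after_hebb(n_dim, alpha, rng) for _ in range(n_runs)]
        stats[n_dim] = (float(np.mean(samples)), float(np.std(samples)))
    return stats
```

In this toy setting the run-to-run fluctuations of the overlap shrink as the dimension grows, which is the property that makes a description in terms of a few macroscopic quantities meaningful.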
Two cases of training, which are extreme in a sense, have been considered with particular success:

• Off-line or batch learning
If a given set of example data is used to define a cost function for training, the latter can be interpreted as an energy in the language of statistical physics. A formal temperature controls the minimization, and typical properties of the system can be evaluated from the corresponding free energy. Technical subtleties arise in the evaluation of the corresponding quenched average over random data. They require, for instance, a saddle-point integration in terms of macroscopic order parameters and the application of the replica formalism known from the statistical physics of disordered systems.
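In commonly used notation, the quantities mentioned above can be written as follows. The symbols are assumptions introduced here for illustration only: w denotes the adaptive parameters, D the set of random examples, E the cost function, and beta the inverse of the formal temperature.

```latex
% Partition function for a fixed example set D, with prior measure d\mu(w)
Z(\mathcal{D}) = \int d\mu(\mathbf{w}) \,
    \exp\!\left[ -\beta \, E(\mathbf{w}; \mathcal{D}) \right]

% Quenched free energy: average of ln Z over the random examples
F = -\frac{1}{\beta} \left\langle \ln Z(\mathcal{D}) \right\rangle_{\mathcal{D}}

% Replica identity used to evaluate the quenched average
\left\langle \ln Z \right\rangle_{\mathcal{D}}
    = \lim_{n \to 0} \frac{\left\langle Z^{n} \right\rangle_{\mathcal{D}} - 1}{n}
```

The average of Z^n is computed for integer n and then continued to n → 0; in the thermodynamic limit the remaining integrals over order parameters are evaluated by saddle-point integration.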

• On-line training
If examples are presented one by one in a temporal sequence, it is possible to analyse the learning dynamics exactly. Under simplifying assumptions, a system of non-linear ordinary differential equations (ODE) describes the temporal evolution of the order parameters, and hence the learning curve. The formalism can also be applied in the case of purely heuristic training schemes with no obvious relation to a cost function.
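As a concrete illustration, consider on-line Hebbian learning in a teacher-student perceptron. This choice is an assumption made here because the resulting equations are particularly simple (in fact linear); it is not a scheme singled out by the text, and all names below are hypothetical. With student weights J, teacher weights B of squared norm N, learning rate eta, and the update J ← J + (eta/√N) · sign(B·x) · x for Gaussian inputs x, the order parameters R = J·B/N and Q = J·J/N obey dR/dα = eta·√(2/π) and dQ/dα = 2·eta·√(2/π)·R + eta² in the limit of large N, where α is the number of examples per degree of freedom.

```python
import numpy as np

C = np.sqrt(2.0 / np.pi)  # average of |b| for a standard normal variable b


def hebb_odes(state, eta):
    """Right-hand sides dR/dalpha and dQ/dalpha for on-line Hebbian learning."""
    r, _q = state
    return np.array([eta * C, 2.0 * eta * C * r + eta ** 2])


def integrate_order_parameters(alpha_max, eta=1.0, n_steps=1000):
    """Fourth-order Runge-Kutta integration starting from R(0) = Q(0) = 0."""
    state = np.zeros(2)
    h = alpha_max / n_steps
    for _ in range(n_steps):
        k1 = hebb_odes(state, eta)
        k2 = hebb_odes(state + 0.5 * h * k1, eta)
        k3 = hebb_odes(state + 0.5 * h * k2, eta)
        k4 = hebb_odes(state + h * k3, eta)
        state = state + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return float(state[0]), float(state[1])


def simulate_online_hebb(n_dim, alpha_max, eta=1.0, seed=0):
    """Single Monte Carlo run of the same dynamics in a finite system."""
    rng = np.random.default_rng(seed)
    teacher = rng.standard_normal(n_dim)
    teacher *= np.sqrt(n_dim) / np.linalg.norm(teacher)  # enforce |B|^2 = N
    student = np.zeros(n_dim)
    for _ in range(int(alpha_max * n_dim)):
        x = rng.standard_normal(n_dim)       # one novel example per time step
        label = np.sign(teacher @ x)
        student += eta / np.sqrt(n_dim) * label * x
    return float(student @ teacher / n_dim), float(student @ student / n_dim)


def error_from_order_parameters(r, q):
    """Generalization error arccos(R / sqrt(Q)) / pi for isotropic inputs."""
    return float(np.arccos(np.clip(r / np.sqrt(q), -1.0, 1.0)) / np.pi)
```

Comparing `integrate_order_parameters(4.0)` with `simulate_online_hebb(500, 4.0)` shows how closely a single finite-size run follows the ODE solution in this example; `error_from_order_parameters` converts either result into a point on the learning curve.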

All these investigations aim at an understanding of phenomena and problems which occur in practical applications of learning systems. Ultimately, the insights obtained from the model scenarios make it possible to improve and optimize existing algorithms and, moreover, to develop novel and efficient training schemes. Examples of this strategy and related key publications are given below.

• Statistical mechanics of on-line learning and generalization
M. Biehl and N. Caticha
in: Handbook of Brain Theory and Neural Networks (2nd ed.)
M.A. Arbib (ed.), MIT Press (2003)

• Learning in feed-forward neural networks: dynamics and phase transitions
M. Biehl
in: Adaptivity and Learning, R. Kühn et al. (eds.), Springer (2003)

• The statistical physics of learning: phase transitions in neural networks
M. Biehl, M. Ahr, and E. Schlösser
in: Advances in Solid State Physics 40, B. Kramer (ed.), Vieweg (2000)

• The statistical mechanics of learning a rule
T.L.H. Watkin, A. Rau, and M. Biehl
Reviews of Modern Physics 65 (1993) 499.