A central topic of my scientific work is the analysis, modelling, and application
of learning in adaptive information processing systems. This includes, for instance,
neural networks for classification and regression tasks as well as problems of
unsupervised learning and data analysis.

Frequently, training can be formulated as an
optimization problem which is guided by an appropriate cost function.
One particularly successful approach addresses systems with many adaptive
parameters, which are chosen on the basis of high-dimensional randomized
example data.

Monte Carlo computer simulations of training processes play an important role in this
context. Furthermore, analytical methods borrowed from statistical physics allow for
the computation of typical learning curves in the framework of model scenarios.
In classification, for instance, one is interested in the generalization error, i.e.
the expected error rate when classifying novel data, as a function of the number of
examples used in training.
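As a minimal illustration of such a learning curve, the following sketch estimates the generalization error by Monte Carlo sampling for an increasing number of examples. It assumes a simple teacher-student perceptron scenario with Hebbian training; the setup and all names are illustrative, not taken from the publications cited below.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                   # input dimension

# random teacher vector, normalized to unit length
teacher = rng.standard_normal(N)
teacher /= np.linalg.norm(teacher)

def generalization_error(student, n_test=20000):
    """Fraction of novel random inputs on which student and teacher
    perceptron disagree (Monte Carlo estimate)."""
    X = rng.standard_normal((n_test, N))
    return np.mean(np.sign(X @ student) != np.sign(X @ teacher))

errors = {}
for P in (50, 200, 800):                  # number of training examples
    X = rng.standard_normal((P, N))       # high-dimensional random data
    y = np.sign(X @ teacher)              # labels provided by the teacher
    student = (y[:, None] * X).mean(axis=0)   # Hebbian learning
    errors[P] = generalization_error(student)

print(errors)
```

The estimated error rate decreases with the number of examples P, tracing out the learning curve discussed above.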

Key ingredients of the approach are:

***** the consideration of very large systems in the so-called
*thermodynamic limit*, which allows for

***** the description in terms of a few *macroscopic* quantities
or *order parameters*,

***** performing averages over the randomized examples and the stochastic
training process.

Formally, the *thermodynamic limit* corresponds to infinite-dimensional data.
However, in most cases results are in excellent quantitative agreement with observations
in systems of, say, a few hundred degrees of freedom. The analysis of such systematic
*finite-size effects* is of course one important aspect of the approach.
Two, in a sense extreme, cases of training have been considered with particular success:

** Off-line or batch learning **
If a given set of example data is used to define a cost function for training, the
latter can be interpreted as an energy in statistical physics language. A formal
temperature controls the minimization and typical properties of the system can be
evaluated from the corresponding free energy. Technical subtleties arise in the
evaluation of the corresponding *quenched average* over the random data. They require,
for instance, a saddle-point integration in terms of macroscopic order parameters
and the application of the replica formalism known from the statistical physics of
disordered systems.
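Schematically, and in generic notation not tied to a particular model, this treatment can be summarized as follows: the adaptive parameters W are assigned a Gibbs weight at formal inverse temperature beta, and the quenched average of the log-partition function over the example data D is handled by the replica trick.

```latex
% Gibbs measure over adaptive parameters W for a training energy E(W|D):
P(W) \propto \exp\left[ -\beta\, E(W \mid D) \right]

% Typical properties follow from the quenched free energy:
-\beta F = \left\langle \ln Z \right\rangle_{D},
\qquad
Z = \int \! dW \; e^{-\beta\, E(W \mid D)}

% The replica trick circumvents averaging the logarithm directly:
\left\langle \ln Z \right\rangle_{D}
 = \lim_{n \to 0} \frac{\left\langle Z^{n} \right\rangle_{D} - 1}{n}
```

The average of Z^n is then evaluated by saddle-point integration over the macroscopic order parameters mentioned above.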

** On-line training **
If examples are presented one-by-one in a temporal sequence, it is possible to analyse
the learning dynamics exactly. Under simplifying assumptions, a system of non-linear
ordinary differential equations (ODEs) describes the temporal evolution of the order
parameters, and hence the learning curve. The formalism can also be applied to purely
heuristic training schemes with no obvious relation to a cost function.
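The on-line scenario is also easy to simulate directly. The following sketch, an illustrative teacher-student perceptron trained with the heuristic Rosenblatt perceptron rule (an example of a scheme with no explicit cost function; the setup is not taken from the references below), tracks the order parameters R = J·B and Q = J·J along the training sequence. For isotropic Gaussian inputs and a normalized teacher, the generalization error follows exactly from these two quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500                                  # input dimension (large but finite)

B = rng.standard_normal(N)
B /= np.linalg.norm(B)                   # teacher vector, |B| = 1
J = np.zeros(N)                          # student vector

learning_curve = []
for mu in range(1, 4 * N + 1):           # alpha = mu / N plays the role of time
    xi = rng.standard_normal(N)          # a new random example at each step
    label = np.sign(B @ xi)              # teacher output
    if np.sign(J @ xi) != label:         # heuristic perceptron rule:
        J += label * xi / np.sqrt(N)     # update only on misclassification
    if mu % N == 0:
        R, Q = J @ B, J @ J              # macroscopic order parameters
        # generalization error for isotropic Gaussian inputs:
        eps_g = np.arccos(R / np.sqrt(Q)) / np.pi
        learning_curve.append((mu / N, eps_g))

print(learning_curve)
```

Averaged over the random sequence of examples, the evolution of R and Q in the limit of large N is described by the deterministic ODEs mentioned above; a single simulation run at moderate N already follows that prediction closely, up to finite-size fluctuations.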

All these investigations aim at an
understanding of phenomena and problems which occur in
practical applications of learning systems. The insights obtained from the model scenarios
ultimately make it possible to improve and optimize existing algorithms and, moreover, to
develop novel and efficient training schemes. Examples of this strategy and related
key publications are given below.

*Statistical mechanics of on-line learning and generalization*

M. Biehl and N. Caticha

in: *Handbook of Brain Theory and Neural Networks* (2nd ed.),
M.A. Arbib (ed.), MIT Press (2003)

*Learning in feed-forward neural networks: dynamics and phase transitions*

M. Biehl

in: *Adaptivity and Learning*, R. Kühn et al. (eds.), Springer (2003)

*The statistical physics of learning: phase transitions in neural networks*

M. Biehl, M. Ahr, and E. Schlösser

in: *Advances in Solid State Physics* 40, B. Kramer (ed.), Vieweg (2000)

*The statistical mechanics of learning a rule*

T.L.H. Watkin, A. Rau, and M. Biehl

Reviews of Modern Physics 65 (1993) 499