M. Biehl's homepage
Overview
A key issue in my scientific activities is the analysis, modelling, and application of learning in adaptive information processing systems. This includes, for instance, neural networks for classification and regression tasks as well as problems of unsupervised learning and data analysis.
Frequently, training can be formulated as an optimization problem guided by an appropriate cost function. One particularly successful approach addresses systems with many adaptive parameters whose values are determined from high-dimensional randomized example data.
Key ingredients of the approach are:
* the consideration of very large systems in the so-called thermodynamic limit which allows for
* the description in terms of a few macroscopic quantities or order parameters
* performing averages over randomized examples and the stochastic training process.
Formally, the thermodynamic limit corresponds to infinite-dimensional data. In most cases, however, results are in excellent quantitative agreement with observations in systems of, say, a few hundred degrees of freedom. The analysis of such systematic finite-size effects is of course one important aspect of the approach.
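As a toy illustration of these ingredients (a sketch under assumed conventions, not taken from the publications below): the snippet trains a perceptron with the simple Hebb rule on randomized examples labelled by a teacher vector and measures the teacher-student overlap R, a typical order parameter. Repeating the experiment shows that R fluctuates only weakly for large N, i.e. the macroscopic description self-averages.

```python
import numpy as np

rng = np.random.default_rng(0)

def hebb_overlap(N, alpha, rng):
    """Train a perceptron with the Hebb rule on P = alpha*N random
    examples labelled by a fixed teacher vector B; return the overlap
    R = w.B / (|w||B|), the natural macroscopic order parameter."""
    B = rng.standard_normal(N)                # teacher weight vector
    P = int(alpha * N)
    xi = rng.standard_normal((P, N))          # randomized input examples
    sigma = np.sign(xi @ B)                   # teacher labels
    w = sigma @ xi                            # Hebbian student weights
    return w @ B / (np.linalg.norm(w) * np.linalg.norm(B))

# self-averaging: for large N the overlap concentrates around its mean
Rs = [hebb_overlap(N=1000, alpha=2.0, rng=rng) for _ in range(10)]
print(np.mean(Rs), np.std(Rs))
```

The run-to-run standard deviation of R shrinks as N grows, which is what justifies describing the system by a few order parameters in the thermodynamic limit.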
M. Biehl and N. Caticha
in: Handbook of Brain Theory and Neural Networks (2nd ed.)
M.A. Arbib (ed.), MIT Press (2003)
M. Biehl, in: Adaptivity and Learning, R. Kühn et al. (eds.), Springer (2003)
M. Biehl, M. Ahr, E. Schlösser,
in: Advances in Solid State Physics 40, B. Kramer (ed.), Vieweg (2000)
T.L.H. Watkin, A. Rau, and M. Biehl, Reviews of Modern Physics 65 (1993) 499.
For a more complete overview of my activities, see also my related, electronically
available articles and the complete list of publications.
A statistical physics analysis of perceptron training, due to M. Opper, led to the development of the AdaTron training prescription. It maintains the favorable convergence properties of the Adaline algorithm (Widrow and Hoff) and, at the same time, yields the perceptron of optimal stability in linearly separable cases.
Recently, the algorithm has regained popularity in the context of Support Vector Machines; see, for instance, the book by N. Cristianini and J. Shawe-Taylor. Besides typical properties of the AdaTron, we have addressed numerous other problems in off-line perceptron training within the statistical physics framework. This includes, among others, learning from clustered examples and the typical ROC curve of the perceptron as discussed in the context of medical diagnosis.
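A minimal sketch of the AdaTron idea (assuming teacher-generated, hence linearly separable, data; dimensions and the learning rate are illustrative choices): non-negative embedding strengths x_mu are adjusted so that every example reaches unit stability, which realizes the perceptron of optimal stability.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 50, 100                        # input dimension and number of examples
teacher = rng.standard_normal(N)
xi = rng.standard_normal((P, N))
sigma = np.sign(xi @ teacher)         # linearly separable labels

x = np.zeros(P)                       # non-negative embedding strengths
w = np.zeros(N)                       # student: w = (1/N) sum_mu x_mu sigma_mu xi_mu
gamma = 1.0                           # learning rate; convergence for 0 < gamma < 2
for sweep in range(300):
    for mu in range(P):
        h = sigma[mu] * (w @ xi[mu])              # stability of example mu
        dx = max(-x[mu], gamma * (1.0 - h))       # constrained update keeps x_mu >= 0
        x[mu] += dx
        w += dx * sigma[mu] * xi[mu] / N          # incremental weight update

print((sigma * (xi @ w)).min())       # min stability; positive once converged
```

At the fixed point, examples with x_mu > 0 sit exactly at stability 1 (the support of the solution) and all others lie above it, which is the constrained-optimization view of maximal-stability learning.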
Receiver operating characteristics of perceptrons:
the influence of sample size and prevalence
A. Freking, M. Biehl, C. Braun, W. Kinzel, M. Meesmann, Phys. Rev. E 60 (1999) 5926.
Supervised learning from clustered input examples
C. Marangi, M. Biehl, and S.A. Solla, Europhysics Letters 30 (1995) 117.
Perceptron learning by constrained optimization: the AdaTron algorithm
M. Biehl, J.K. Anlauf, and W. Kinzel
in: Neurodynamics, F. Pasemann and H.D. Doebner (eds.), World Scientific (1991)
J.K. Anlauf and M. Biehl, Europhysics Letters 10 (1989) 687.
The single-layer perceptron can be interpreted as the basic building block from which to construct more complex devices, including dynamical networks of the Little-Hopfield type and multilayered feed-forward architectures. Consequently, it has served as a prototype system for the investigation of learning processes in general. This applies in particular to the framework of on-line learning. Results are directly applicable to multilayer networks with non-overlapping receptive fields. The following are just a few major questions addressed in this context:
Learning from noisy data: an exactly solvable model
M. Biehl, P. Riegler, and M. Stechert, Physical Review E 52 Rapid Comm. (1995) R4624.
M. Biehl and P. Riegler, Europhysics Letters 28 (1994) 525.
On-line learning of a time-dependent rule
M. Biehl and H. Schwarze, Europhysics Letters 20 (1992) 733.
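The on-line scenario itself is easy to sketch (a toy version with illustrative parameters, not the specific models of the papers above): the student receives one new random example per time step, updates only on errors, and the teacher-student overlap R grows with the rescaled learning time alpha = t/N.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
B = rng.standard_normal(N)                   # teacher perceptron
w = np.zeros(N)                              # student, tabula rasa start

R_track = []
for t in range(1, 20 * N + 1):
    xi = rng.standard_normal(N)              # one fresh example per step
    sigma = np.sign(B @ xi)                  # teacher label
    if np.sign(w @ xi) != sigma:             # perceptron rule: learn on errors only
        w += sigma * xi / np.sqrt(N)
    if t % N == 0:                           # record once per unit of alpha = t/N
        R_track.append(w @ B / (np.linalg.norm(w) * np.linalg.norm(B)))

print(R_track[0], R_track[-1])               # overlap approaches 1 as alpha grows
```

Because each example is used once and discarded, the dynamics of R (and of the student norm) close in the thermodynamic limit, which is what makes the analytical treatment of on-line learning possible.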
As opposed to problems of classification and regression, example data are unlabelled in unsupervised learning. Here the aim is to extract relevant information from the input data only. Potential applications of unsupervised learning include feature detection, low-dimensional representation and data compression, and pre-processing for supervised learning. Both on-line and off-line unsupervised learning can be modelled and analysed along the same lines as supervised training. Example topics include
Optimisation of on-line principal component analysis
E. Schlösser, D. Saad, M. Biehl, Journal of Physics A 32 (1999) 4061.
Dynamics of on-line competitive learning
M. Biehl, A. Freking, G. Reents, Europhysics Letters 38 (1997) 427.
Statistical mechanics of unsupervised learning
M. Biehl and A. Mietzner, Europhysics Letters 24 (1993) 421.
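On-line principal component analysis, the first topic above, can be sketched in a few lines (using Oja's rule on an assumed toy data model with one dominant direction; all parameters are illustrative): a single weight vector, updated one example at a time, aligns with the leading principal direction of the input distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 20
u = rng.standard_normal(N)
u /= np.linalg.norm(u)                 # leading principal direction of the data

def sample(rng):
    # inputs with enhanced variance (6 vs. 1) along the direction u
    return np.sqrt(5.0) * rng.standard_normal() * u + rng.standard_normal(N)

w = rng.standard_normal(N)
w /= np.linalg.norm(w)                 # random initial student direction
eta = 0.01                             # small learning rate
for _ in range(20000):
    x = sample(rng)
    y = w @ x                          # projection of the current input
    w += eta * y * (x - y * w)         # Oja's rule: Hebbian term plus decay

print(abs(w @ u))                      # close to 1: w aligns with u
```

The decay term -eta*y*y*w keeps the weight vector normalized without an explicit constraint, so the stationary state is the unit principal eigenvector of the input covariance.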
In regression problems it is possible to base learning on a continuous error measure and to use gradient-based methods for the choice of, for instance, the weights in a neural network. Our studies of on-line gradient descent provided the first analytic treatment of the learning dynamics in multilayered neural networks. A topic of major importance in this context is the discussion of quasistationary states in the learning dynamics. These so-called plateaus are observed in a variety of learning scenarios and are well known from practical applications of machine learning. Insights into the nature of the plateau problem were the basis for the later development of novel, more efficient training prescriptions for both regression and classification.
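The following sketch illustrates the setting (with stand-in choices: tanh units instead of the erf activation common in the analytical treatment, and illustrative sizes and learning rate): on-line gradient descent in a two-hidden-unit soft committee machine learning a matched teacher. Monitoring an estimate of the generalization error over the rescaled time alpha = t/N shows its decay; with nearly symmetric initial conditions the curve exhibits the quasistationary plateaus discussed above.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 100, 2
g = np.tanh                                  # sigmoidal activation (stand-in for erf)

B = rng.standard_normal((K, N))              # teacher weight vectors
w = rng.standard_normal((K, N))              # random student initialization
eta = 0.5                                    # learning rate (illustrative)

errors = []
for t in range(300 * N):
    xi = rng.standard_normal(N)
    h_t = B @ xi / np.sqrt(N)                # teacher fields
    h_s = w @ xi / np.sqrt(N)                # student fields
    delta = g(h_t).sum() - g(h_s).sum()      # output error on this example
    # one on-line gradient (backpropagation) step on the squared error
    w += (eta / np.sqrt(N)) * delta * (1.0 - g(h_s) ** 2)[:, None] * xi
    if t % (10 * N) == 0:                    # estimate the error every 10 units of alpha
        X = rng.standard_normal((500, N))
        e = 0.5 * np.mean((g(B @ X.T / np.sqrt(N)).sum(axis=0)
                           - g(w @ X.T / np.sqrt(N)).sum(axis=0)) ** 2)
        errors.append(e)

print(errors[0], errors[-1])                 # the error decreases with alpha = t/N
```

Each example is presented once, so the evolution of the overlaps between student and teacher vectors closes in the large-N limit, exactly as in the perceptron case but with a richer order-parameter dynamics.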
Transient dynamics of online-learning in two-layered neural networks
M. Biehl, P. Riegler, and C. Wöhler, Journal of Physics A 29 (1996) 4769.
Learning by online gradient descent
M. Biehl and H. Schwarze, Journal of Physics A 28 (1995) 643.
On-line backpropagation in two-layered neural networks
P. Riegler and M. Biehl, Journal of Physics A 28 (Letters) (1995) L507.
The above-mentioned plateaus in on-line training are just one example of specialization processes, which reflect inevitable symmetries of the learning scenario. Often, elements of the trained device can be exchanged with no effect on the performance. A permutation symmetry holds, for instance, for the branches of a multilayered neural network or for the prototypes in unsupervised competitive learning. Successful learning requires the breaking of such symmetries or, in other words, the specialization of sub-systems, e.g. the network branches.
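The permutation symmetry can be made explicit in a few lines (a toy illustration with arbitrary sizes): exchanging the branches of a committee machine leaves its output unchanged, so nothing in the cost function distinguishes them before specialization sets in.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 30, 3
w = rng.standard_normal((K, N))                 # K exchangeable branches (hidden units)
xi = rng.standard_normal(N)

out = np.tanh(w @ xi / np.sqrt(N)).sum()        # committee output
perm = np.roll(np.arange(K), 1)                 # cyclically exchange the branches
out_perm = np.tanh(w[perm] @ xi / np.sqrt(N)).sum()
print(np.isclose(out, out_perm))                # True: the output is invariant
```

Because the output is a sum over branches, any permutation of the branch weight vectors yields the same function; successful training must break this degeneracy.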
The counterpart of on-line plateaus in off-line or batch training are phase transitions in the learning curve: frequently, the success of learning depends on the number of examples in a discontinuous way. Only above a critical size of the training set do specialization and good generalization performance become possible.
Efficiently learning multilayer perceptrons
C. Bunzmann, M. Biehl, and R. Urbanczik, Physical Review Letters 86 (2001) 2166.
Statistical physics and practical training of soft-committee machines
M. Ahr, M. Biehl, R. Urbanczik, European Physical Journal B 10 (1999) 583.
Specialization processes in on-line unsupervised learning
M. Biehl, A. Freking, G. Reents, and E. Schlösser, Philosophical Magazine B 77 (1998) 1487.
last update February 2005