**Clustering and Visualization of Large Dissimilarity Datasets**
Barbara Hammer
TU Clausthal, Germany

Abstract:

Clustering and visualization are key tasks in computer-supported data
inspection, and a variety of promising tools exist for them, such as the
self-organizing map and variations thereof. Real-life data, however, pose
severe problems to standard tools: on the one hand, data are given by
complex objects such as sequences of possibly different length, temporal
signals, images, text data, graph structures, etc., so that standard methods
proposed for finite-dimensional vectors in Euclidean space cannot be applied.
On the other hand, massive data have to be dealt with: the data no longer
fit into main memory, and more than one pass over the data is not
affordable, i.e. standard methods simply cannot be applied due to the sheer
amount of data. We present two recent extensions of topographic mappings
which can deal with more general proximity data given by pairwise distances,
and which can process streaming data of arbitrary size in patches, thus resulting
in an efficient linear-time data visualization method for quite general data structures.
We present the theoretical background as well as large-scale applications in the
areas of text and multimedia processing based on the generalized compression distance.
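
To make the notion of proximity data given by pairwise distances concrete, the following is a minimal sketch of a compression-based dissimilarity for byte strings. The abstract does not define the generalized compression distance, so the sketch assumes the closely related normalized compression distance with zlib as the compressor; the function names `compressed_size`, `ncd` and `dissimilarity_matrix` are hypothetical and not taken from the talk.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, used here as an assumed stand-in
    for the compression distance named in the abstract."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def dissimilarity_matrix(items):
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(items)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ncd(items[i], items[j])
    return d
```

The full matrix needs a number of distance evaluations that grows quadratically with the number of objects, which is why it cannot be computed or stored for massive data and why processing in patches is of interest.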