Visualizing Data using t-SNE
Laurens van der Maaten
Delft University of Technology

Abstract:

Over the last decade, many new techniques have been developed that visualize high-dimensional data by giving each data point a location in a two-dimensional map. The aim of these techniques is to represent the pairwise distances between the data points by similar pairwise distances between the corresponding points in the map. As the pairwise distances cannot be represented perfectly in the map, the emphasis of these techniques is on representing small pairwise distances accurately.
In this talk, I will present a new technique for data visualization, called t-SNE, that converts the pairwise distances between the data points into probabilities of selecting pairs of data points. The selection probability of a pair of data points is proportional to a Gaussian function of their pairwise distance, as a result of which the probabilities measure the local structure of the data. If the distances between points in the map are converted into pairwise probabilities in the same way, any given arrangement of map-points can be evaluated by measuring the divergence between the probability distribution obtained from the data points and the probability distribution obtained from the map-points. A good arrangement of map-points can then be found by performing gradient descent to minimize this divergence.
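The evaluation step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the abstract does not name the divergence, so the Kullback-Leibler divergence is assumed here; a single fixed bandwidth `sigma` is used for all points, which is a simplification; and the function names are hypothetical, not taken from any library.

```python
import numpy as np

def squared_distances(points):
    """Matrix of squared Euclidean distances between all rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def gaussian_pair_probabilities(points, sigma=1.0):
    """Probability of selecting each pair (i, j), proportional to a
    Gaussian function of the distance between points i and j.
    Assumption: one shared bandwidth `sigma` for every point."""
    affinities = np.exp(-squared_distances(points) / (2.0 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)   # a point is never paired with itself
    return affinities / affinities.sum()  # normalize over all pairs

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) over all pairs (assumed choice of divergence).
    Pairs with p = 0 contribute nothing; q is clamped to avoid log(0)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

# Score an arbitrary (here: random) arrangement of map-points.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 10))        # 50 points in 10 dimensions
map_points = rng.normal(size=(50, 2))   # candidate 2-D arrangement
P = gaussian_pair_probabilities(data, sigma=2.0)
Q = gaussian_pair_probabilities(map_points, sigma=1.0)
cost = kl_divergence(P, Q)              # lower means a better arrangement
```

Because both `P` and `Q` are normalized over all pairs, `cost` is zero only when the map reproduces the pair probabilities of the data exactly, and it grows as the two distributions diverge.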
Unfortunately, if the probabilities of pairs of map-points are computed using a Gaussian function of their pairwise distance, the difference between the distributions of pairwise distances in high-dimensional and low-dimensional spaces causes the map-points to be crowded together in the center of the map. This crowding problem can be overcome by using a heavy-tailed Student t-distribution in the computation of the selection probabilities of pairs of map-points. The resulting technique, called t-SNE, constructs maps that reveal much more of the structure of the data than maps that are constructed by other recent visualization techniques. In particular, t-SNE is very good at preserving clusters in the data at many different scales simultaneously.
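A minimal sketch of the heavy-tailed variant follows, again under assumptions the abstract does not state: the Student t-distribution is taken to have one degree of freedom, so the map kernel is 1 / (1 + d^2); the divergence is taken to be the Kullback-Leibler divergence; the data-space bandwidth `sigma` is fixed; and plain gradient descent with a small, conservative step size is used. All function names are hypothetical.

```python
import numpy as np

def squared_distances(points):
    """Matrix of squared Euclidean distances between all rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def gaussian_pair_probabilities(data, sigma=1.0):
    """Pair probabilities in data space (assumption: one shared sigma)."""
    affinities = np.exp(-squared_distances(data) / (2.0 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)
    return affinities / affinities.sum()

def student_t_pair_probabilities(map_points):
    """Pair probabilities in the map from a heavy-tailed kernel
    (1 + d^2)^-1, i.e. Student t with one degree of freedom (assumed).
    Returns the normalized probabilities and the unnormalized kernel."""
    kernel = 1.0 / (1.0 + squared_distances(map_points))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum(), kernel

def kl_divergence(p, q):
    """KL(P || Q) over all pairs; pairs with p = 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def kl_gradient(p, map_points):
    """Gradient of KL(P || Q) with respect to each map-point y_i:
    4 * sum_j (p_ij - q_ij) * (1 + |y_i - y_j|^2)^-1 * (y_i - y_j)."""
    q, kernel = student_t_pair_probabilities(map_points)
    weights = (p - q) * kernel
    return 4.0 * (weights.sum(axis=1)[:, None] * map_points - weights @ map_points)

def embed(data, n_steps=200, learning_rate=0.1, sigma=1.0, seed=0):
    """Plain gradient descent from a random 2-D start (illustrative only)."""
    rng = np.random.default_rng(seed)
    p = gaussian_pair_probabilities(data, sigma)
    y = rng.normal(size=(data.shape[0], 2))
    for _ in range(n_steps):
        y -= learning_rate * kl_gradient(p, y)
    return y

# Toy data: two well-separated groups of 10 points in 5 dimensions.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(size=(10, 5)), rng.normal(size=(10, 5)) + 6.0])
map_points = embed(data, sigma=2.0)
```

The step size here is deliberately small so that each update lowers the divergence; practical implementations typically add refinements such as momentum and a per-point choice of bandwidth, none of which are shown.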
I will also present an extension of t-SNE that can be used to visualize data that has a non-metric similarity structure, such as semantic similarities or similarities between authors of scientific papers.
The talk describes joint work with Geoffrey Hinton (University of Toronto).