Abstract:

In this talk, I will present a new technique for data visualization, called t-SNE, that converts the pairwise distances between data points into probabilities of selecting pairs of data points. The selection probability of a pair of data points is proportional to a Gaussian function of their pairwise distance, so the probabilities capture the local structure of the data. If the distances between points in the map are converted into pairwise probabilities in the same way, any given arrangement of map-points can be evaluated by measuring the divergence between the probability distributions obtained from the data points and the probability distributions obtained from the map-points. A good arrangement of map-points can then be found by performing gradient descent to minimize this divergence.
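
As a rough illustration of the evaluation step described above, the following minimal Python sketch turns pairwise distances into selection probabilities and scores a candidate map. It rests on several assumptions that the abstract does not state: a single fixed Gaussian bandwidth `sigma` shared by all points, one joint distribution over all pairs, and the Kullback-Leibler divergence as the divergence measure. The function names are hypothetical, and the gradient descent step is omitted.

```python
import math

def pairwise_probabilities(points, sigma=1.0):
    """Probability of selecting each ordered pair (i, j), i != j.

    Each pair is weighted by a Gaussian function of its distance, then the
    weights are normalised to sum to one. A single shared sigma is an
    assumption made for this sketch.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
                total += weights[i][j]
    return [[w / total for w in row] for row in weights]

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(P || Q) over all pairs (assumed measure)."""
    return sum(
        pij * math.log(pij / max(qij, eps))
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0.0
    )

# Three data points in 3-D: two close neighbours and one distant point.
data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
# A 2-D map that keeps the neighbours together, and one that separates them.
good_map = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
bad_map = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0)]

p = pairwise_probabilities(data)
cost_good = kl_divergence(p, pairwise_probabilities(good_map))
cost_bad = kl_divergence(p, pairwise_probabilities(bad_map))
print(cost_good < cost_bad)
```

The map that keeps the two neighbouring data points close together receives the lower divergence, which is the quantity a gradient-based optimiser would then reduce by moving the map-points.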

Unfortunately, if the probabilities of pairs of map-points are computed using a Gaussian function of their pairwise distance, the difference between the distributions of pairwise distances in high-dimensional and low-dimensional spaces causes the map-points to be crowded together in the center of the map. This crowding problem can be overcome by using a heavy-tailed Student t-distribution in the computation of the selection probabilities of pairs of map-points. The resulting technique, t-SNE, constructs maps that reveal much more of the structure of the data than maps that are constructed by other recent visualization techniques. In particular, t-SNE is very good at preserving clusters in the data at many different scales simultaneously.
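
The effect of the heavy-tailed kernel can be seen in a small sketch. The Python code below compares the selection probability that a distant pair of map-points receives under a Gaussian kernel and under a Student t kernel. The specific choices are assumptions for illustration only: a unit-variance Gaussian, a Student t-distribution with one degree of freedom, and hypothetical function names.

```python
import math

def gaussian_weight(sq_dist):
    # Unit-variance Gaussian kernel (assumed bandwidth for this sketch).
    return math.exp(-sq_dist / 2.0)

def student_t_weight(sq_dist):
    # Student t kernel with one degree of freedom (assumed), which decays
    # polynomially rather than exponentially with distance.
    return 1.0 / (1.0 + sq_dist)

def map_probabilities(map_points, kernel):
    """Selection probability of each ordered pair of map-points under `kernel`."""
    n = len(map_points)
    weights = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(map_points[i], map_points[j]))
                weights[i][j] = kernel(sq_dist)
                total += weights[i][j]
    return [[w / total for w in row] for row in weights]

# Three map-points on a line: two close together and one far away.
map_points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]

q_gauss = map_probabilities(map_points, gaussian_weight)
q_student = map_probabilities(map_points, student_t_weight)

# Probability assigned to the most distant pair (points 0 and 2).
far_gauss = q_gauss[0][2]
far_student = q_student[0][2]
print(far_student > far_gauss)
```

Because the Student t kernel assigns distant pairs far more probability than the Gaussian does, moderately dissimilar points can sit further apart in the map without incurring a large penalty, which leaves room that would otherwise be lost to crowding.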

I will also present an extension of t-SNE that can be used to visualize data that has a non-metric similarity structure, such as semantic similarities or similarities between authors of scientific papers.

The talk describes joint work with Geoffrey Hinton (University of Toronto).