Explaining Multidimensional Projections
Multidimensional projections, also known as embeddings or dimensionality reduction methods, are techniques that take a high-dimensional dataset as input (say, tens to hundreds of dimensions) and produce a low-dimensional dataset (typically 2D or 3D) with the same number of data points. The key idea is that a low-dimensional dataset is easy to visualize, e.g. with a scatterplot. No equivalent method exists for a high-dimensional dataset.
We can readily create projections, but what do the patterns we see in them mean? The aim of this project is to provide interactive explanatory tools that tell us what point clusters in a projection mean.
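As a minimal sketch of what a projection does, the snippet below uses PCA, one of the simplest projection techniques, to map a synthetic 50-dimensional dataset to 2D. PCA is chosen purely for illustration. The function name `pca_project` and the random data are assumptions, not part of this project's implementation.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X (n_points x n_dims) onto the top principal axes."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each dimension
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Hypothetical input: 200 points in 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

P = pca_project(X)  # one 2D point per input point
print(P.shape)
```

The output has one row per input point, so it can be drawn directly as a scatterplot.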
Correlation and dimensionality
We can explain point clusters in a projection in different ways:
- we can explain which dimension makes points cluster together. We explored this in this paper;
- we can explain which dimensions are strongly correlated within a cluster of points;
- we can explain what the local intrinsic dimensionality of a cluster of points is in the high-dimensional space.
In this project, we explore the latter two aspects. We propose visual techniques that show, on a projection, which dimensions are most strongly correlated locally, and what the local intrinsic dimensionality of a cluster is.
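To make the latter two aspects concrete, here is a minimal sketch of how they could be estimated for the neighborhood of a single point. It is not the method from the thesis. It assumes a k-nearest-neighbor neighborhood in the original space, Pearson correlation between dimensions, and a PCA-based estimate of intrinsic dimensionality (the number of principal components needed to explain a chosen fraction of the local variance). All function names, the threshold of 0.95, and the synthetic data are hypothetical.

```python
import numpy as np

def neighborhood(X, i, k):
    """Indices of the k points closest to point i (itself included), in the original space."""
    d = np.linalg.norm(X - X[i], axis=1)
    return np.argsort(d)[:k]

def strongest_correlated_pair(Xn):
    """Return (a, b, r): the pair of dimensions with the largest |Pearson r| in Xn."""
    C = np.corrcoef(Xn, rowvar=False)
    C = np.nan_to_num(C)      # constant dimensions give NaN; treat them as uncorrelated
    np.fill_diagonal(C, 0.0)  # ignore each dimension's correlation with itself
    a, b = sorted(np.unravel_index(np.argmax(np.abs(C)), C.shape))
    return int(a), int(b), float(C[a, b])

def intrinsic_dim(Xn, variance_kept=0.95):
    """Number of principal components needed to explain variance_kept of the variance of Xn."""
    Xc = Xn - Xn.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(ratio, variance_kept) + 1)

# Hypothetical data: a 2D sheet embedded in 5D. Dimension 1 is a multiple of
# dimension 0; dimensions 3 and 4 contain only very small noise.
rng = np.random.default_rng(0)
t, u = rng.uniform(size=400), rng.uniform(size=400)
X = np.column_stack([t, 2 * t, u,
                     1e-3 * rng.normal(size=400),
                     1e-3 * rng.normal(size=400)])

idx = neighborhood(X, i=0, k=50)
a, b, r = strongest_correlated_pair(X[idx])
dim = intrinsic_dim(X[idx])
print(f"strongest correlated dimensions: {a} and {b} (r = {r:.3f})")
print(f"estimated local intrinsic dimensionality: {dim}")
```

In a visual tool, values like these would be computed per point or per cluster and mapped to color on the projection. The neighborhood size `k` and the variance threshold both affect the result and would need tuning per dataset.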
A full description of the proposed techniques is given in the BSc thesis of D. van Driel (2017).
An implementation of the above techniques, including test datasets, is provided here.