Detection of extragalactic Ultra-Compact Dwarfs and Globular Clusters using Explainable AI techniques

M. Mohammadi, J. Mutatiinaa, T. Saifollahi and K. Bunte

Astronomy and Computing, vol. 39, article 100555

Date

2022

Abstract

Compact stellar systems such as Ultra-compact dwarfs (UCDs) and Globular Clusters (GCs) around galaxies are known to be tracers of the merger events that have formed these galaxies. Identifying such systems therefore allows us to study the mass assembly, formation and evolution of galaxies. However, in the absence of spectroscopic information, detecting UCDs/GCs using imaging data is very uncertain. Here, we aim to train a machine learning model to separate these objects from foreground stars and background galaxies using multi-wavelength imaging data of the Fornax galaxy cluster in 6 filters, namely u, g, r, i, J and Ks. The classes of objects are highly imbalanced, which is problematic for many automatic classification techniques. Hence, we employ Synthetic Minority Over-sampling to handle the imbalance of the training data. Then, we compare two classifiers, namely Localized Generalized Matrix Learning Vector Quantization (LGMLVQ) and Random Forest (RF). Both methods are able to identify UCDs/GCs with a precision and a recall of >93% and provide relevances that reflect the importance of each feature dimension for the classification. Both methods detect angular sizes as important markers for this classification problem. While the astronomical expectation is that the color indices u−i and i−Ks are the most important colors, our analysis shows that colors such as g−r are more informative, potentially because of their higher signal-to-noise ratio. Besides its excellent performance, the LGMLVQ method allows further interpretability by providing the feature importance for each individual class, class-wise representative samples and the possibility of non-linear visualization of the data, as demonstrated in this contribution. We conclude that employing machine learning techniques to identify UCDs/GCs can lead to promising results. Transparent methods in particular allow further investigation and analysis of the importance of the measurements for the detection problem and provide tools for non-linear visualization of the data.

Links

Link

Bib

@article{Mutatina2022,
author = {Mohammad Mohammadi and Jarvin Mutatiinaa and Teymoor Saifollahi and Kerstin Bunte},
title = {Detection of extragalactic Ultra-Compact Dwarfs and Globular Clusters using Explainable AI techniques},
keywords = {Galaxies, Clusters, Individual (Fornax), Photometric, Machine learning, Explainable AI},
journal = {Astronomy and Computing},
volume = {39},
pages = {100555},
year = {2022},
issn = {2213-1337},
url = {https://www.sciencedirect.com/science/article/pii/S2213133722000063},
doi = {10.1016/j.ascom.2022.100555},
language = {English},
abstract = {Compact stellar systems such as Ultra-compact dwarfs (UCDs) and Globular Clusters (GCs) around galaxies are known to be tracers of the merger events that have formed these galaxies. Identifying such systems therefore allows us to study the mass assembly, formation and evolution of galaxies. However, in the absence of spectroscopic information, detecting UCDs/GCs using imaging data is very uncertain. Here, we aim to train a machine learning model to separate these objects from foreground stars and background galaxies using multi-wavelength imaging data of the Fornax galaxy cluster in 6 filters, namely u, g, r, i, J and Ks. The classes of objects are highly imbalanced, which is problematic for many automatic classification techniques. Hence, we employ Synthetic Minority Over-sampling to handle the imbalance of the training data. Then, we compare two classifiers, namely Localized Generalized Matrix Learning Vector Quantization (LGMLVQ) and Random Forest (RF). Both methods are able to identify UCDs/GCs with a precision and a recall of >93\% and provide relevances that reflect the importance of each feature dimension for the classification. Both methods detect angular sizes as important markers for this classification problem. While the astronomical expectation is that the color indices u−i and i−Ks are the most important colors, our analysis shows that colors such as g−r are more informative, potentially because of their higher signal-to-noise ratio. Besides its excellent performance, the LGMLVQ method allows further interpretability by providing the feature importance for each individual class, class-wise representative samples and the possibility of non-linear visualization of the data, as demonstrated in this contribution. We conclude that employing machine learning techniques to identify UCDs/GCs can lead to promising results. Transparent methods in particular allow further investigation and analysis of the importance of the measurements for the detection problem and provide tools for non-linear visualization of the data.},
}