1. fseval - A Benchmarking Framework for Feature Selection and Ranking Algorithms | |
---|---|
Authors | Overschie, Jeroen; Alsahaf, Ahmad; Azzopardi, George |
Year | 2022 |
Download | https://doi.org/10.34894/KDEPR0 |
Publication | Under review |
Description | The fseval Python package allows benchmarking Feature Selection and Feature Ranking algorithms on a large scale, and facilitates the comparison of multiple algorithms in a systematic way. In particular, fseval enables users to run experiments in parallel and distributed over multiple machines, and export the results to an SQL database. The execution of an experiment can be fully determined by a configuration file, which means the experiment results can be reproduced easily, given only the configuration file. fseval has high test coverage, continuous integration, and rich documentation. The package is open source and can be installed through PyPI. |
Notes | Jeroen Overschie was responsible for the implementation. Ahmad Alsahaf and George Azzopardi supervised the project. The software is published under the GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.en.html |
2. Recognition of Holstein Cattle with Thermal and RGB images | |
---|---|
Authors | Bhole, Amey; S. Udmale, Sandeep; Falzon, Owen; Azzopardi, George |
Year | 2021 |
Download | https://doi.org/10.34894/7M108F |
Publication | |
Description | This data set was collected at the Dairy Campus in Leeuwarden (The Netherlands) with a FLIR E6 thermal camera over a period of 9 days. It consists of 3694 images of 383 cows, with each cow represented by an average of 9 images. Each snapshot produced two images: i) RGB and ii) temperature. The image filenames are in the format [cow_id-4 digits]_[day no-1 digit]_[counter-1 digit]. The timestamp.xlsx file indicates the day number (day 1 to day 9) on which each image in the data set was collected. This makes it possible to design and run leave-one-day-out cross-validation, as we did in our paper. The scripts that reproduce the results reported in the paper are available, along with a GitHub repository that contains all the scripts. |
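
The filename convention and the leave-one-day-out protocol described above can be sketched as follows. This is a minimal illustration, not the authors' scripts: the function names, the example filenames, and the `.png` extension are assumptions, and the sketch does not model how the RGB and temperature images of one snapshot are told apart.

```python
import re
from collections import defaultdict

# Assumed pattern: 4-digit cow id, 1-digit day number, 1-digit counter,
# separated by underscores, with an optional file extension.
FILENAME_RE = re.compile(
    r"^(?P<cow_id>\d{4})_(?P<day>\d)_(?P<counter>\d)(?:\.\w+)?$"
)


def parse_filename(name):
    """Split a filename into (cow_id, day, counter)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename format: {name!r}")
    return match["cow_id"], int(match["day"]), int(match["counter"])


def leave_one_day_out(filenames):
    """Yield (held_out_day, train_files, test_files), one fold per day."""
    by_day = defaultdict(list)
    for name in filenames:
        _, day, _ = parse_filename(name)
        by_day[day].append(name)
    for held_out in sorted(by_day):
        test = by_day[held_out]
        train = [
            name
            for day, names in by_day.items()
            if day != held_out
            for name in names
        ]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    files = ["0012_1_1.png", "0012_2_1.png", "0345_1_2.png"]
    for day, train, test in leave_one_day_out(files):
        print(day, len(train), len(test))
```

Holding out a whole day at a time keeps all images captured on that day out of the training set, which is the property the leave-one-day-out design relies on.
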
3. Injury Prediction In Competitive Runners With Machine Learning | |
---|---|
Authors | Lovdal, Sofie; den Hartigh, Ruud; Azzopardi, George |
Year | 2021 |
Download | https://doi.org/10.34894/UWU9PV |
Publication | Injury Prediction in Competitive Runners With Machine Learning (journal article) |
Description | The data set consists of a detailed training log from a Dutch high-level running team over a period of seven years (2012-2019). We included the middle- and long-distance runners of the team, that is, those competing at distances between 800 meters and the marathon. This design decision is motivated by the fact that these groups have strong endurance-based components in their training, making their training regimes comparable. The head coach of the team did not change during the years of data collection. The data set contains samples from 74 runners, of whom 27 are women and 47 are men. At the moment of data collection, they had been in the team for an average of 3.7 years. Most athletes competed at a national level, and some also at an international level. The study was conducted according to the requirements of the Declaration of Helsinki, and was approved by the ethics committee of the second author’s institution (research code: PSY-1920-S-0007). (2020-11-20) |
4. Detection of illicit accounts over the Ethereum blockchain | |
---|---|
Authors | Farrugia, Steven; Ellul, Joshua; Azzopardi, George |
Year | 2021 |
Download | https://doi.org/10.34894/GKAQYN |
Publication | Detection of illicit accounts over the Ethereum blockchain (journal article) |
Description | The recent technological advent of cryptocurrencies and their respective benefits has been shrouded by a number of illegal activities operating over the network, such as money laundering, bribery, phishing, and fraud, among others. In this work we focus on the Ethereum network, which has seen over 400 million transactions since its inception. Using 2179 accounts flagged by the Ethereum community for their illegal activity, coupled with 2502 normal accounts, we seek to detect illicit accounts based on their transaction history using the XGBoost classifier. Using 10-fold cross-validation, XGBoost achieved an average accuracy of 0.963 (± 0.006) with an average AUC of 0.994 (± 0.0007). The top three features with the largest impact on the final model output were established to be ‘Time diff between first and last (Mins)’, ‘Total Ether balance’ and ‘Min value received’. Based on the results we conclude that the proposed approach is highly effective in detecting illicit accounts over the Ethereum network. Our contribution is multi-faceted: firstly, we propose an effective method to detect illicit accounts over the Ethereum network; secondly, we provide insights about the most important features; and thirdly, we publish the compiled data set as a benchmark for future related works. |
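
The 10-fold cross-validation protocol mentioned above can be sketched with the standard library alone. This is an illustrative outline, not the authors' pipeline: `fit` and `predict` are hypothetical callables standing in for any classifier (the paper uses XGBoost), the folds are not stratified, and reading the reported "±" as a standard deviation over folds is an assumption.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th index goes to the same fold, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return (mean accuracy, standard deviation of accuracy) over k folds."""
    scores = []
    for train, test in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        predictions = predict(model, [features[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        scores.append(correct / len(test))
    return mean(scores), stdev(scores)
```

Each account appears in exactly one test fold, so every sample contributes once to the averaged score.
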
5. Labelled Dataset of Retinal Images for Glaucoma detection | |
---|---|
Authors | Guo, Jiapan; Azzopardi, George; Shi, Chenyu; Jansonius, Nomdo; Petkov, Nicolai |
Year | 2021 |
Download | https://doi.org/10.34894/H2SZSO |
Publication | |
Description | Fundus photography is a viable option for glaucoma population screening. In order to facilitate the development of computer-aided glaucoma detection systems, we publish this annotation dataset that contains manual annotations of glaucoma features for seven public fundus image data sets. All manual annotations are made by a specialised ophthalmologist. For each of the fundus images in the seven fundus datasets, the upper, the bottom, the left and the right boundary coordinates of the optic disc and the cup are stored in a .mat file with the corresponding fundus image name. |
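
As one possible use of the four boundary coordinates, the sketch below derives vertical and horizontal cup-to-disc ratios, a commonly used glaucoma indicator. The `Box` class, its field names, and the example values are hypothetical; the actual variable names inside the .mat files are not stated here, and computing this ratio is an illustration rather than something the data set description prescribes.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Boundary coordinates of a structure, in pixels (hypothetical layout)."""
    upper: float
    bottom: float
    left: float
    right: float

    @property
    def height(self):
        return abs(self.bottom - self.upper)

    @property
    def width(self):
        return abs(self.right - self.left)


def cup_to_disc_ratios(disc, cup):
    """Return (vertical, horizontal) cup-to-disc ratios."""
    if disc.height == 0 or disc.width == 0:
        raise ValueError("optic disc has zero extent")
    return cup.height / disc.height, cup.width / disc.width


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    disc = Box(upper=100, bottom=300, left=200, right=400)
    cup = Box(upper=150, bottom=250, left=260, right=340)
    print(cup_to_disc_ratios(disc, cup))
```
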
6. Fall detection and recognition from egocentric visual data: A case study | |
---|---|
Authors | Wang, Xueyi; Talavera, Estefania; Karastoyanova, Dimka; Azzopardi, George |
Year | 2020 |
Download | https://doi.org/10.34894/3DV8BF |
Publication | |
Description | This data set contains egocentric videos from two cameras attached to the waist and chest of one volunteer. The videos show indoor and outdoor scenes and do not contain people. The data set was used for the evaluation of a novel fall detection system based on egocentric visual data. |