Distributed Systems

Short Programming Projects

Parallel Stream Processing with In-Network Computing.

Supervisor: Bochra Boughzala.
Status: Available.
Date: 21/01/2024.

Sensor Data Classification Using Deep Learning Methods

Supervisor: Majid Lotfian Delouee.
Status: available.
Date: 26/10/2023.
Classifying sensor data extracts knowledge about the environment that can be used in different applications. Machine learning methods, such as deep learning, can determine the classes present in the data and support intelligent decisions based on the extracted knowledge. The focus of this project is to design and implement an architecture that classifies received sensor data about the bicycle lane surface and places the results on a city map. The first stage is to create a dataset using an Android app called “sensor logger”, which exports sensor data in two formats (.json and .csv). The next stage is to detect different entities in the road (e.g., potholes, tree roots) by examining the sensor data. Finally, the detected elements should be placed on a map. There is no restriction on the programming language, but Python or Java is preferred.
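As a starting point before any deep learning model is trained, a simple baseline can flag rough stretches of road and attach a position to them. The sketch below is only an illustration under assumptions: the record keys "z", "lat", and "lon", the window size, and the threshold are hypothetical and do not reflect the actual export format of the app.

```python
from statistics import pstdev


def flag_rough_windows(records, size=5, threshold=2.0):
    """Flag windows whose vertical-acceleration spread exceeds a threshold.

    records: list of dicts with hypothetical keys "z" (vertical acceleration),
    "lat" and "lon". Returns one (lat, lon, score) tuple per flagged window.
    """
    flagged = []
    # Non-overlapping windows keep the sketch short.
    for start in range(0, len(records) - size + 1, size):
        window = records[start:start + size]
        # Population standard deviation as a crude roughness score.
        score = pstdev([r["z"] for r in window])
        if score > threshold:
            # Use the mean position of the window as the map marker.
            lat = sum(r["lat"] for r in window) / size
            lon = sum(r["lon"] for r in window) / size
            flagged.append((lat, lon, score))
    return flagged
```

The flagged positions could serve as candidate map markers or as a sanity check for the labels produced by a learned classifier.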

Make Mixer Great Again in the Graph World

Supervisor: Huy Truong.
Status: available.
Date: 24/10/2023.
MLP-Mixer (Tolstikhin et al., 2021) is a deep learning architecture that uses simple Multi-Layer Perceptron blocks to solve tasks in computer vision and Natural Language Processing. However, its potential in the graph domain remains unexplored. In this project, the student will pioneer integrating a Mixer-based model into a graph-related task: pressure estimation. In particular, the model should estimate pressure values at unknown nodes given known sensors and a static topology. Its results will then be compared with Graph Neural Network baselines. To implement Mixer for the pressure estimation task, the student should have experience with Python and one of the deep learning frameworks (such as PyTorch or TensorFlow). The final deliverables include the source code and a two-page report explaining what makes Mixer great again.

Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). MLP-Mixer: An all-MLP architecture for vision. Advances in Neural Information Processing Systems, 34, 24261-24272.

Efficient Scenario Generation Tool for Water Distribution Networks

Supervisor: Huy Truong.
Status: available.
Date: 24/10/2023.
This project focuses on monitoring water distribution networks that deliver daily drinking water to the population. This supervision is challenging because complex and diverse hydraulic parameters, such as customer demand, pump speed, and reservoir total head, govern the behavior of water networks. To address this, the student will use WNTR (pronounced "winter"), a hydraulic simulation tool available as a Python package, to sample, manage, and operate these parameters. In addition, they should consider temporal information and seasonality to replicate close-to-reality scenarios. Along with WNTR, a distributed computing framework (Ray) and a compressed array storage library (Zarr) will be used to construct a massive dataset efficiently. The final deliverables should include the source code of the temporal scenario generation tool and the resulting dataset, which will later help hydraulic experts analyze water networks precisely.
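To illustrate what sampling demand with temporal information and seasonality could look like, here is a minimal sketch in plain Python. It does not use WNTR, Ray, or Zarr; the function names, the sinusoidal daily and seasonal patterns, and all amplitudes and phases are arbitrary placeholders chosen for illustration.

```python
import math
import random


def demand_multiplier(hour_of_year, rng, noise=0.05):
    """Return a non-negative multiplier combining daily and seasonal cycles."""
    hour_of_day = hour_of_year % 24
    day_of_year = hour_of_year // 24
    # Placeholder daily cycle with a 24-hour period.
    daily = 1.0 + 0.3 * math.sin(2 * math.pi * (hour_of_day - 6) / 24)
    # Placeholder seasonal cycle with a 365-day period.
    seasonal = 1.0 + 0.2 * math.sin(2 * math.pi * (day_of_year - 80) / 365)
    # Gaussian jitter so that scenarios with different seeds differ.
    jitter = rng.gauss(0.0, noise)
    return max(0.0, daily * seasonal + jitter)


def sample_scenario(base_demands, start_hour, duration_hours, seed):
    """Return one demand dict per hourly step, scaled from base_demands."""
    rng = random.Random(seed)  # a seeded generator keeps scenarios reproducible
    scenario = []
    for step in range(duration_hours):
        m = demand_multiplier(start_hour + step, rng)
        scenario.append({node: base * m for node, base in base_demands.items()})
    return scenario
```

Each generated scenario could then be passed to the hydraulic simulator, with one seed per scenario so that runs distributed over many workers remain reproducible.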

Klise, K.A., Bynum, M., Moriarty, D., Murray, R. (2017). A software framework for assessing the resilience of drinking water systems to disasters with an example earthquake case study. Environmental Modelling and Software, 95, 420-431. doi: 10.1016/j.envsoft.2017.06.022. (WNTR)

Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., ... & Stoica, I. (2017). Ray: A distributed framework for emerging AI applications. arXiv preprint arXiv:1712.05889.

Alistair Miles, John Kirkham, Martin Durant, James Bourbeau, Tarik Onalan, Joe Hamman, Zain Patel, shikharsg, Matthew Rocklin, raphael dussin, Vincent Schut, Elliott Sales de Andrade, Ryan Abernathey, Charles Noyes, sbalmer, pyup.io bot, Tommy Tran, Stephan Saalfeld, Justin Swaney, … Anderson Banihirwe. (2020). zarr-developers/zarr-python: v2.4.0 (v2.4.0). Zenodo. https://doi.org/10.5281/zenodo.3773450

Measuring bias in Human Activity Recognition due to data contamination.

Supervisor: Andrés Tello.
Status: available.
Date: 23/10/2023.
Data contamination occurs when part of the training data finds its way into the test set. As a consequence, the performance of classification models is highly overestimated because the models are evaluated on data that was already seen during training. Although undesirable, the problem happens inadvertently due to well-established methods used within the Human Activity Recognition (HAR) pipeline. Sliding-window data segmentation followed by random training/test splits is a recurrent practice in HAR that biases the evaluation of classification models, because windows from the same recording, which often overlap, can end up in both sets. One of the most common practices to overcome this problem is to use Leave-One-Subject-Out Cross Validation (LOSO-CV) for model evaluation. With this method, the folds used for CV are split by User-Id, which guarantees independence between the training and test sets and avoids data contamination. The tasks of the student are: (1) implement and train a model for HAR based on CNNs and LSTMs; (2) evaluate the performance of the model on 5 benchmark HAR datasets following both the standard Random k-Fold Cross Validation and LOSO-CV.

Datasets:
UCI-HAR
PAMAP2 Physical Activity Monitoring
Human Activity Recognition Trondheim (HARTH)
MHEALTH
OPPORTUNITY
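The difference between the two evaluation protocols can be made concrete with a small sketch. It assumes overlapping sliding windows (50% overlap here) and toy recordings keyed by a subject id; all function names are hypothetical, and in practice a library utility for grouped cross-validation would probably be used instead.

```python
import random


def make_windows(recordings, size=4, step=2):
    """Segment each subject's samples into (subject, start, values) windows."""
    windows = []
    for subject, samples in recordings.items():
        for start in range(0, len(samples) - size + 1, step):
            windows.append((subject, start, samples[start:start + size]))
    return windows


def sample_ids(windows):
    """Set of raw samples, identified by (subject, index), covered by windows."""
    return {(subject, start + k)
            for subject, start, values in windows
            for k in range(len(values))}


def overlap(train, test):
    """Number of raw samples that appear in both the training and test sets."""
    return len(sample_ids(train) & sample_ids(test))


def random_split(windows, test_fraction=0.25, seed=0):
    """Shuffle windows regardless of subject, then cut into train and test."""
    shuffled = windows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def loso_folds(windows):
    """Yield (held_out_subject, train, test), holding out one subject per fold."""
    for held_out in sorted({w[0] for w in windows}):
        train = [w for w in windows if w[0] != held_out]
        test = [w for w in windows if w[0] == held_out]
        yield held_out, train, test
```

With overlapping windows, `overlap(*random_split(windows))` is typically positive, whereas every fold produced by `loso_folds` has an overlap of zero because no subject contributes to both sets.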

Spiking Neural Networks for Network Functions

Supervisor: Saad Saleh.
Status: available.
Date: 20/10/2023.
Current network components rely on the traditional von Neumann architecture for implementing network functions at switches and routers. The von Neumann architecture consumes a large amount of energy due to continuous data movement between memory and computational units. With the emergence of neuromorphic computing, network functions (like congestion control) can be deployed at network switches with cognitive capability using spiking neural networks (SNNs). This project focuses on programming a spiking neural network in Python to provide cognitive network functions like congestion control in the Internet. As a deliverable, the student will submit SNN code in Python with a performance comparison and analysis for various network traffic flows.
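As an illustration of the basic building block of a spiking neural network, the sketch below simulates a single discrete-time leaky integrate-and-fire neuron. It is not the project's model: the parameter values are arbitrary, and treating the input as a normalized load signal (for example, queue occupancy) is an assumption made only for this example.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Simulate a leaky integrate-and-fire neuron; return a list of 0/1 spikes."""
    v = reset  # membrane potential
    spikes = []
    for current in input_current:
        # The potential decays by the leak factor, then integrates the input.
        v = leak * v + current
        if v >= threshold:
            spikes.append(1)
            v = reset  # fire and reset
        else:
            spikes.append(0)
    return spikes
```

A stronger input drives the neuron to fire more often, so the spike rate could act as a simple congestion indicator: `sum(simulate_lif([0.6] * 20))` is larger than `sum(simulate_lif([0.2] * 20))`.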

Programming projects on ECiDA Project

Contact: Mostafa Hadadian.
Status: available.
Date: 28/11/2022.
ECiDA aims to narrow the gap between the data scientists who experiment with data and try new models, and the production environment wherein data science models should eventually run and exhibit consistent behavior. Our goal is to provide a solution that can also be applied to existing data processing platforms.

The infrastructure of ECiDA revolves around containerized components, in which each computational component is responsible for a single step in the data science pipeline. For this, we rely on technologies such as Docker, Kubernetes, and Kafka to containerize, orchestrate, and enable communication between the computational components.

The topics you can apply for include, but are not limited to:

  • Monitoring systems
  • Web development
  • Container Orchestration
  • CI/CD Pipelines
  • Network Service Mesh

Research packages for the formal specification and verification of process compositions

Supervisor: Heerko Groefsema.
Status: Available.
Date: 20/11/2023.
For our research, we implemented and use a number of Java packages that allow us to specify, unfold, and verify process compositions such as business process models and service compositions. These packages require some work, including new functionality, replacing old dependencies, adding different output formats, replacing log functionality, refactoring to use certain programming patterns, and more. In this project, we would like a number of students to improve, refactor, and add functionality. This project is available for up to 5 students, who will work on separate sub-projects.

GitHub Theme and Website

Supervisor: Heerko Groefsema.
Status: Available.
Date: 20/11/2023.
The Distributed Systems group would like to redesign and move its webpages (this webpage) to two GitHub repositories. One repository should hold a generalized remote GitHub Jekyll Theme, and the other the GitHub webpage and its content. In addition, the webpages should include a GitHub Actions script that automatically imports the publications of all members of the group from their ORCID webpages and filters duplicates. The results should then be displayed on the GitHub webpage using several different filters.
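The duplicate filtering step could be sketched as follows. The record fields ("doi", "title", "year") are hypothetical and do not reflect the ORCID data format; the sketch only shows one plausible strategy, keying on the DOI when present and on a normalized title plus year otherwise.

```python
import re


def normalize_title(title):
    """Lowercase the title and collapse punctuation and whitespace runs."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def publication_key(pub):
    """Prefer the DOI as identity; fall back to normalized title and year."""
    doi = (pub.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalize_title(pub.get("title", "")), pub.get("year"))


def deduplicate(publications):
    """Keep the first record seen for each key, preserving input order."""
    seen = {}
    for pub in publications:
        seen.setdefault(publication_key(pub), pub)
    return list(seen.values())
```

A known limitation of this sketch is that a record with a DOI and a record of the same paper without one are not merged; handling that case would need a second pass on titles.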

Encoding common sense knowledge with Graph Neural Networks

Supervisors: Andrés Tello and Alexander Lazovik.
Status: UNAVAILABLE.
Date: 14/10/2022.
ConceptNet is a knowledge graph (KG) that connects words and phrases of natural language (terms) with labeled, weighted edges (assertions). ConceptNet comes equipped with embeddings for its terms: vector representations that can be used directly as features for different downstream tasks. What makes ConceptNet a powerful tool for many natural-language problems is that its embeddings were calculated by combining the vector representations obtained with GloVe and node2vec, two common methods for vector space representations of words. In this programming assignment, you have to implement a model that learns new vector representations for ConceptNet based on Graph Convolutional Neural Networks. At the end, the effectiveness of these new embeddings will be evaluated on Word Relatedness tasks. For example, given the candidate pairs "wallet-moon" and "car-automobile", the score given by the algorithm should be higher for the latter. Example source code will be provided as the starting point for your work.
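One common way to turn embeddings into a relatedness score is cosine similarity; whether the provided starting code uses it is an assumption. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings would come from the trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def relatedness(embeddings, term_a, term_b):
    """Relatedness score of two terms under the given embedding table."""
    return cosine_similarity(embeddings[term_a], embeddings[term_b])


# Toy vectors, invented for this example only.
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "wallet": [0.1, 0.9, 0.0],
    "moon": [0.0, 0.1, 0.9],
}
```

With these toy vectors, `relatedness(toy_embeddings, "car", "automobile")` is higher than `relatedness(toy_embeddings, "wallet", "moon")`, which is the ordering the evaluation expects.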

Optimization of existing leak detection models for water network

Supervisors: Dilek Dustegor and Mostafa Hadadian.
Status: available.
Date: 14/10/2022.
Leaks in water distribution networks (WDNs) are one of the main reasons for water loss during transportation. Given water scarcity and a growing population worldwide, minimizing water losses is an urgent humanitarian need. Lately, some attempts have been made to use data-driven and machine learning techniques for leakage localization, but the capabilities and limitations of these methods are not clearly understood. In this short programming project, the student will optimize three existing models (namely a random forest classifier, an LSTM neural network, and Facebook Prophet models) that have been developed for leak detection in a water network. As a deliverable, the student will submit well-documented Python source code to a GitHub repository, with a clear explanation.

Web User Interface for Data Processing Platform

Supervisors: Mostafa Hadadian and Alexander Lazovik.
Status: available.
Date: 14/10/2022.
This short programming project is a part of the ECiDA project. ECiDA is a data processing platform specially designed for running data processing applications. An application consists of several smaller fragments that work together to serve the purpose of the application. These smaller fragments are called data processing pipelines. Each pipeline is in turn formed from smaller components that are connected so that the output of one component is the input of another. Your task is to develop a single-page web user interface for ECiDA. The user should be able to perform CRUD (Create, Read, Update, Delete) actions on applications, pipelines, and components through the UI. The programming language and frameworks are open to discussion. Although the project should contain both a frontend and a backend, the focus is on the frontend, to make a user-friendly and practical Web-UI. Please check the links below to get an insight into the Web-UI of similar projects.
Useful Links: Pipeline (Wikipedia), Kubeflow UI, Airflow UI, Dagster UI (Dagit)
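The structure described above (pipelines made of components, where the output of one component feeds another) and the CRUD actions on it could be modeled roughly as follows. This is an in-memory sketch with hypothetical class and method names; it says nothing about how ECiDA itself stores or exposes these objects.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    inputs: list = field(default_factory=list)  # names of upstream components


@dataclass
class Pipeline:
    name: str
    components: dict = field(default_factory=dict)

    def _check_inputs(self, inputs):
        missing = [i for i in inputs if i not in self.components]
        if missing:
            raise KeyError(f"unknown upstream components: {missing}")

    def create(self, name, inputs=()):
        if name in self.components:
            raise ValueError(f"component {name!r} already exists")
        self._check_inputs(inputs)
        self.components[name] = Component(name, list(inputs))
        return self.components[name]

    def read(self, name):
        return self.components[name]

    def update(self, name, inputs):
        self._check_inputs(inputs)
        self.components[name].inputs = list(inputs)

    def delete(self, name):
        # Refuse to delete a component that still feeds another one.
        dependents = [c.name for c in self.components.values() if name in c.inputs]
        if dependents:
            raise ValueError(f"{name!r} still feeds {dependents}")
        del self.components[name]
```

A backend along these lines would let the UI reject invalid edits early, for example deleting a component whose output is still consumed elsewhere in the pipeline.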