Distributed Systems

Master Projects

Auto-Tuning of Management Configuration Parameters using AI Algorithms

Supervisor: Mahmoud Alasmar.
Status: available.
Date: 01/03/2024.
Management algorithms used by orchestration frameworks, such as Kubernetes, rely on a number of configuration parameters. Setting these parameters manually, based on prior experience and post-deployment monitoring, leads to suboptimal configurations, which in turn affect the throughput and latency of the system. As system scale, workload diversity, and the number of configuration parameters increase, more robust techniques for tuning such parameters have become essential. Bayesian Optimization [1], Reinforcement Learning [2], and decision trees [3] are examples of recent proposals aimed at auto-tuning system configuration parameters; however, these solutions may fall short in efficiency during online deployment or in scalability as the system grows. In this project, you will explore, develop, and evaluate different auto-tuning algorithms for system parameters, considering the complexity, optimality, and scalability of the solution. (A small code sketch follows the references below.)

References:

  1. Zhao Lucis Li, Chieh-Jan Mike Liang, Wenjia He, Lianjie Zhu, Wenjun Dai, Jin Jiang, and Guangzhong Sun. "Metis: Robustly Tuning Tail Latencies of Cloud Systems." 2018 USENIX Annual Technical Conference (ATC '18), pp. 981–992. https://www.usenix.org/conference/atc18/presentation/li-zhao
  2. Ajaykrishna Karthikeyan, Nagarajan Natarajan, Gagan Somashekar, Lei Zhao, Ranjita Bhagwan, Rodrigo Fonseca, Tatiana Racheva, and Yogesh Bansal. "SelfTune: Tuning Cluster Managers." 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23), pp. 1097–1114. https://www.usenix.org/conference/nsdi23/presentation/karthikeyan
  3. Gagan Somashekar, Karan Tandon, Anush Kini, Chieh-Chun Chang, Petr Husak, Ranjita Bhagwan, Mayukh Das, Anshul Gandhi, and Nagarajan Natarajan. "OPPerTune: Post-Deployment Configuration Tuning of Services Made Easy." USENIX NSDI 2024. https://www.microsoft.com/en-us/research/publication/oppertune/
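
As a concrete illustration, the following minimal Bayesian optimization loop tunes a single parameter against a simulated latency measurement. The objective, the parameter name, and all constants are illustrative assumptions and are not taken from any of the cited systems; in a real deployment the probe would measure the live system.

    import numpy as np
    from math import erf, sqrt, pi

    # Hypothetical stand-in for a real probe, e.g. deploying one value of an
    # orchestrator parameter and observing p99 latency.
    def measure_latency(batch_size):
        return (batch_size - 70.0) ** 2 / 100.0 + np.random.normal(0.0, 0.5)

    def rbf(a, b, length_scale=15.0):
        # Squared-exponential kernel on 1-D inputs.
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

    def gp_posterior(X, y, Xs, noise=0.25):
        # Standard Gaussian-process regression posterior (zero prior mean).
        K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
        Ks = rbf(X, Xs)
        mu = Ks.T @ K_inv @ y
        var = np.clip(1.0 - np.einsum('ij,jk,ki->i', Ks.T, K_inv, Ks), 1e-12, None)
        return mu, np.sqrt(var)

    def expected_improvement(mu, sigma, best):
        # Closed-form expected improvement for minimisation.
        z = (best - mu) / sigma
        pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
        cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
        return (best - mu) * cdf + sigma * pdf

    candidates = np.arange(1.0, 129.0)            # the parameter's search space
    X = np.array([8.0, 64.0, 128.0])              # initial probes
    y = np.array([measure_latency(x) for x in X])

    for _ in range(20):                           # probe budget
        mu, sigma = gp_posterior(X, y, candidates)
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X, y = np.append(X, x_next), np.append(y, measure_latency(x_next))

    print("best parameter value found:", X[np.argmin(y)])

The efficiency and scalability questions of the project start exactly where this sketch stops: each probe is expensive on a live system, and real orchestrators expose many interacting parameters rather than one.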

In-Network Computing supporting In-Order Guarantees in Parallel Stream Processing

Supervisor: Bochra Boughzala.
Status: available.
Date: 21/01/2024.

Federated Rule Mining in Complex Event Processing

Supervisor: Majid Lotfian Delouee.
Status: available.
Date: 01/12/2023.
Complex Event Processing (CEP) is a crucial paradigm for the real-time analysis of dynamic input streams. Typically, domain experts define CEP rules during the design phase that enable the detection of relevant situations. The challenge arises from the highly dynamic nature of the environment: parameters such as thresholds or window sizes for complex event detection often require real-time adjustment. Additionally, the diversity of data sources necessitates the continuous definition of new rules to leverage the various data streams for more confident situation detection. Federated Learning (FL), on the other hand, allows data owners to train learning models locally and transmit only the models instead of the raw data, which both preserves privacy and enhances result quality. In this research project, students will comprehensively explore the main components, strengths, and weaknesses of both CEP and FL, focusing on innovative ideas for adaptively generating new rules while concurrently updating existing ones. A minimal example of such a tunable rule is sketched below.
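
The sketch below shows the kind of rule the project targets: a sliding-window detection whose window size and threshold are exactly the parameters that would need to be adapted, or federatedly re-learned, at runtime. Rule semantics, names, and values are illustrative assumptions.

    from collections import deque

    class ThresholdRule:
        """Minimal sliding-window CEP rule: fire when the window average
        exceeds a threshold. Both parameters are the kind of values this
        project would tune or re-learn at runtime."""
        def __init__(self, window_size=10, threshold=30.0):
            self.window = deque(maxlen=window_size)
            self.threshold = threshold

        def on_event(self, value):
            self.window.append(value)
            if len(self.window) == self.window.maxlen and \
               sum(self.window) / len(self.window) > self.threshold:
                return "complex event: sustained high reading"
            return None

    # Example: a temperature stream from one (hypothetical) data source.
    rule = ThresholdRule(window_size=5, threshold=28.0)
    for reading in [25, 27, 29, 30, 31, 32, 33]:
        match = rule.on_event(reading)
        if match:
            print(match)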

Reinforcement Learning-Based Approach for Household Appliance Management

Supervisor: Kawsar Haghshenas, Brian Setz.
Status: available.
Date: 05/09/2023.
The Internet of Things (IoT) enables household appliances and other devices to be controlled remotely, including changing their power states (on, off, standby) as needed. Combining signals from the smart grid with the concept of IoT results in interesting optimization problems, where appliance usage is controlled to achieve a predefined objective such as minimizing total electricity cost. Key concepts within these optimization problems are local energy generation, energy storage, and economic efficiency. In addition, the price of energy is a dynamic signal, provided by the smart grid to end users in real time and influenced by weather conditions, fossil fuel prices, and current demand.

In this project, you will work on a reinforcement-learning-based algorithm to find the best power-state plan and schedule for appliances. The overall goal is to determine the optimal schedule that minimizes cost given the energy price signals, the forecasted renewable energy generation, the available energy storage, and the list of appliances and their scheduling constraints. A minimal sketch of the learning formulation follows below.
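
The following tabular Q-learning sketch shows the formulation in its simplest form, under strong simplifying assumptions: one appliance that must run for exactly one hour per day, and a fixed, made-up day-ahead price signal. The real project would add storage, renewable generation forecasts, and multi-appliance constraints.

    import random

    # Illustrative day-ahead price signal (EUR/kWh), one value per hour.
    prices = [0.30]*7 + [0.15]*6 + [0.20]*4 + [0.35]*7
    RUN, WAIT = 0, 1
    Q = [[0.0, 0.0] for _ in range(24)]           # Q[hour][action]
    alpha, gamma, eps = 0.1, 1.0, 0.2

    for _ in range(5000):                         # training episodes (days)
        hour = 0
        while True:
            a = (random.randrange(2) if random.random() < eps
                 else max((RUN, WAIT), key=lambda x: Q[hour][x]))
            if a == RUN:                          # run the appliance now
                reward, nxt = -prices[hour], None
            elif hour == 23:                      # never ran today: heavy penalty
                reward, nxt = -10.0, None
            else:                                 # defer by one hour
                reward, nxt = 0.0, hour + 1
            target = reward + (gamma * max(Q[nxt]) if nxt is not None else 0.0)
            Q[hour][a] += alpha * (target - Q[hour][a])
            if nxt is None:
                break
            hour = nxt

    best_hour = next(h for h in range(24)
                     if max((RUN, WAIT), key=lambda x: Q[h][x]) == RUN)
    print("learned start hour:", best_hour)      # one of the cheap hours (7-12 here)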

Computational Cost Prediction for Deep Learning Workloads

Supervisor: Kawsar Haghshenas.
Status: Taken (unavailable).
Date: 01/06/2023.
Recent advances in Machine Learning (ML), coupled with the utilization of accelerators such as GPUs, have led to significant advances in various domains such as machine translation and speech recognition. In addition, many businesses are now integrating ML models into their products. ML practitioners repeatedly retrain models to validate features, fine-tune hyperparameters, and adjust structures. Hence, ML workloads are expected to be significant and growing, especially in data centers. Within a cloud data center, shared, multi-tenant, accelerator-equipped clusters run ML training jobs that compete for resources. Efficient scheduling and allocation algorithms are crucial for optimizing cluster resource utilization and ensuring high-quality services. A key factor for successful scheduling is the ability to accurately predict the computational costs of the jobs. DNNAbacus [1] and DNNPerf [2] are two state-of-the-art approaches for predicting the runtime of Deep Learning (DL) training jobs. Some recent studies have used the characteristics of completed mini-batches as feedback to the scheduler to prioritize or kill a subset of jobs [3]. In this project, we aim to investigate the feasibility of incorporating runtime feedback in cluster scheduling and to compare this feedback with the predictions of the DNNAbacus and DNNPerf approaches. (A sketch of this feedback signal follows the references.)

  1. Lu Bai, et al. "DNNAbacus: Toward Accurate Computational Cost Prediction for Deep Neural Networks," arXiv preprint arXiv:2205.12095, 2022.
  2. Yanjie Gao, et al. "Runtime Performance Prediction for Deep Learning Models with Graph Neural Network," Microsoft, Tech. Rep. MSR-TR-2021-3, 2021.
  3. Wencong Xiao, et al. "Gandiva: Introspective Cluster Scheduling for Deep Learning," 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018.
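
The mini-batch feedback signal itself is simple; the research question is how a scheduler should combine it with up-front predictions. A sketch, under the simplifying assumption that batch times are roughly stationary (function and variable names are hypothetical):

    import time

    def estimate_remaining(job_start, batches_done, batches_total):
        """Estimate remaining runtime of a training job from completed
        mini-batches, the feedback a scheduler could compare against
        DNNAbacus/DNNPerf-style up-front predictions."""
        elapsed = time.time() - job_start
        avg_batch = elapsed / max(batches_done, 1)
        return avg_batch * (batches_total - batches_done)

    # e.g., 30 s into a job with 100 of 10,000 batches done:
    start = time.time() - 30.0
    print(estimate_remaining(start, 100, 10_000))   # ~2970 s remaining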

Deep learning in network switches

Supervisor: Saad Saleh.
Status: available.
Date: 07/02/2023.
Current network architectures require cognitive network functions at switching devices in order to minimize delay, maximize throughput, and decrease network load. Owing to the huge data rates in current network switches, storing an enormous number of match rules inside the match-action units of these devices is not feasible. In this project, we aim to develop and analyze the performance of deep learning in network switches for network functions such as a network traffic firewall. Existing deep learning approaches will be tested on network traffic, and new approaches will be developed for implementing network functions at switching nodes.

Software Verification for Data Processing Pipeline

Supervisor: Mostafa Hadadian.
Status: available.
Date: 10/01/2022.
Software verification ensures that specific software components or subsystems meet their design requirements; it is done at design time. Data processing pipelines, on the other hand, are composed of several data processing elements connected together, i.e., the output of one element is the input of another, to produce the expected result. This project aims to use software verification techniques to verify the design of a data processing pipeline.

Automated Planning of Data Processing Pipeline

Supervisor: Mostafa Hadadian.
Status: available.
Date: 10/01/2022.
Planning is the reasoning side of acting. It is an abstract, explicit deliberation process that chooses and organizes actions by anticipating their expected outcomes, aiming to achieve some pre-stated objectives. Automated planning is the area of artificial intelligence (AI) that studies this deliberation process computationally. This project aims to use automated planning techniques to create a system that automatically designs data processing pipelines, which consist of several building blocks working together to produce the expected result. A minimal sketch of the idea follows.
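
As a sketch of the planning view of pipeline design, assume each building block is an action whose preconditions are the data artefacts it consumes and whose effects are the artefacts it produces; a forward search then composes the pipeline. All block names below are hypothetical.

    from collections import deque

    # Hypothetical building blocks: (consumed artefacts, produced artefacts).
    BLOCKS = {
        "ingest":   (set(),              {"raw"}),
        "clean":    ({"raw"},            {"clean"}),
        "features": ({"clean"},          {"features"}),
        "train":    ({"features"},       {"model"}),
        "report":   ({"model", "clean"}, {"report"}),
    }

    def plan(goal, start=frozenset()):
        """Breadth-first forward search: shortest sequence of blocks whose
        combined effects cover the goal artefact set."""
        queue, seen = deque([(start, [])]), {start}
        while queue:
            state, steps = queue.popleft()
            if goal <= state:
                return steps
            for name, (pre, eff) in BLOCKS.items():
                if pre <= state:
                    nxt = frozenset(state | eff)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, steps + [name]))
        return None

    print(plan(frozenset({"report"})))
    # ['ingest', 'clean', 'features', 'train', 'report']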

Verification and variability in cloud-based workflow engines

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Verification entails proving or disproving the correctness of a system model with respect to its specification. When variability is defined as allowing change depending on adherence to a specification, we can use existing model checkers to check for possible deviations from workflow executions. Zeebe is the workflow engine of the Camunda Cloud platform. In this project, the student will investigate and implement support for verification and variability based on model checking using the Zeebe workflow engine.

Variability and versioning of data pipelines

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Pipelines describe sequences of operations on buffered streams of data. To allow for different outcomes of pipelines depending on execution contexts, we propose versioning-based variability. For example, consider similar pipelines with Celsius and Fahrenheit inputs/outputs. Instead of modeling two separate pipelines, we could model one variant (e.g., the one using Celsius) and base the other variant on the first. Streams of data may then select variants based on their execution context. In this project, the student will investigate modeling variability in pipelines; a minimal sketch follows.
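
A minimal sketch of the versioning idea in plain Python, with illustrative stage names: the Fahrenheit variant is defined on top of the Celsius base and overrides only the stages that differ.

    # Base variant: a three-stage Celsius pipeline.
    celsius_pipeline = {
        "parse":   lambda s: float(s),
        "convert": lambda c: c,               # already Celsius
        "format":  lambda c: f"{c:.1f} °C",
    }

    # Derived variant, versioned on top of the base.
    fahrenheit_pipeline = {
        **celsius_pipeline,                   # inherit unchanged stages
        "convert": lambda c: c * 9 / 5 + 32,  # override the differing stage
        "format":  lambda f: f"{f:.1f} °F",
    }

    def run(pipeline, raw):
        value = pipeline["parse"](raw)
        value = pipeline["convert"](value)
        return pipeline["format"](value)

    # A stream selects its variant based on its execution context.
    context = {"unit": "F"}
    variant = fahrenheit_pipeline if context["unit"] == "F" else celsius_pipeline
    print(run(variant, "21.5"))               # 70.7 °F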

Design of Declarative Process Specifications

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Verification entails proving or disproving the correctness of a system model with respect to its specification. Such specifications are often expressed using formal methods of mathematics such as temporal logics. Although such logics are perfect for formal verification, they are unintuitive and difficult to understand for the average user. In this project, the student is asked to research and implement a tool that visualizes the design of specifications over process models.

Automated reasoning using automated planning

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Verification entails proving or disproving the correctness of a system model with respect to its specification. Such specifications are often expressed using formal methods of mathematics such as temporal logics. To obtain information on successor states in system models, it is possible to rewrite temporal logic expressions using semantic equivalences and expansion laws. Automated planning is an artificial intelligence technique that aims to find an optimal set of actions which together accomplish a predetermined goal. The question for the student then is, can we use automated planning to obtain the possible expanded logic expressions and can we obtain the optimal expanded expression?
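
For instance, the standard CTL expansion laws rewrite each temporal operator in terms of the current state and its one-step successors, which is exactly the kind of rewriting a planner would search over:

    AG φ ≡ φ ∧ AX AG φ
    EG φ ≡ φ ∧ EX EG φ
    A[φ U ψ] ≡ ψ ∨ (φ ∧ AX A[φ U ψ])
    E[φ U ψ] ≡ ψ ∨ (φ ∧ EX E[φ U ψ])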

Automatically obtain declarative specifications from logs

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
In a previously published research paper [1], we showed how it is possible to obtain temporal logic specifications from sets of similar business processes. In this project, we would like to explore the possibilities of obtaining similar information from business process logs. That is, is it possible to describe a business process using temporal logic by efficiently converting information contained in log files (XES format) into a compound prime event structure (PES), and can we obtain useful temporal logic specifications from it?

  1. N.R.T.P. van Beest, H. Groefsema, L. García-Bañuelos, and M. Aiello. "Variability in business processes: Automatically obtaining a generic specification." Information Systems, volume 80, 2019. https://doi.org/10.1016/j.is.2018.09.005

Verification of Security and Privacy concepts in BPMN Choreography diagrams

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Where process models define the flow of activities of participants, choreographies describe interactions between participants. Within such interactions, the security- and privacy-related concepts of separation of duties and division of knowledge are important. The former specifies that no single person has the privileges to misuse the system, either by error or by fraudulent behavior, while the latter requires the absence of total knowledge within a single person, such that the knowledge cannot be abused. The problem is: how do we specify such concepts, and what kind of model is required to verify them? In this project we ask the student to devise an approach to formally specify and verify these concepts given a BPMN Choreography Diagram; one possible starting point is sketched below.
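
One possible first-order reading of the two concepts, as an illustrative starting point rather than the required formalisation:

    Separation of duties:   ∀ conflicting tasks t₁, t₂ in one case:  performer(t₁) ≠ performer(t₂)
    Division of knowledge:  ∀ participants p:  knows(p) ⊉ D
                            (no single participant holds all of the sensitive knowledge D)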

Allowing Variability while Aligning Business Process executions

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
The practice of checking conformance of business process models has revolutionized the industry through the amount of insight it creates into the actual process flows of businesses. Conformance checking entails matching an event log (which details events of past executions) against a business process model (which details the prescribed process flow) through a so-called alignment; a simplified example is sketched below. Any deviation from the prescribed process flow is detected and reported. The problem, however, is that deviations more often than not entail simple practical solutions to day-to-day business problems and are not malicious. Detailing each such deviation in the prescribed process flow is not practical, as it would create an immensely complicated process model full of redundant duplicate tasks. In such cases, we can employ so-called variability to define a family of processes, i.e., a set of processes that describe different but related process models. The challenge is then to obtain alignments from process families. In this project we ask the student to solve this challenge by adapting and combining existing approaches.
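
As a simplified illustration of what an alignment computes, the sketch below aligns an observed trace against a single model run using edit-distance dynamic programming: synchronous moves are free, log-only and model-only moves each cost 1. Real conformance checking searches over all runs of the model, and process families enlarge that search space further.

    def align_cost(trace, model_run):
        """Cost of an optimal alignment between one observed trace and
        one fixed model run (a deliberate simplification)."""
        n, m = len(trace), len(model_run)
        cost = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0:
                    cost[i][j] = j                      # model-only moves
                elif j == 0:
                    cost[i][j] = i                      # log-only moves
                else:
                    sync = (cost[i-1][j-1] if trace[i-1] == model_run[j-1]
                            else float("inf"))          # sync moves are free
                    cost[i][j] = min(sync, cost[i-1][j] + 1, cost[i][j-1] + 1)
        return cost[n][m]

    # One log-only move ('x') and one model-only move ('c'):
    print(align_cost(["a", "x", "b"], ["a", "b", "c"]))   # 2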


Make a difference in Energy Transition with Machine Learning

Contact: Frank Blaauw.
Status: available.
Location: eWEning star.
Date: 01/06/2021.
eWEning star is a “fresh from the oven” start-up that is currently developing a discovery tool serving stakeholders in the renewable energy sector with relevant scientific information on renewable energy. Currently, people in this sector use keyword-based search queries to find scientific papers and reports; with eWEning star’s concept, these papers are smartly categorized, saving users a lot of time and nerves. By making the search process more efficient, we can speed up the energy transition towards renewables! We currently have around 900 documents that are manually categorized in three different ways: (i) perspective, (ii) position in the value chain, and (iii) geographical location. Combined, these make up 15 categories. Depending on the length of your internship, it is possible to work on all of these or to choose one of the three options. While this manual approach is feasible for a small number of papers, it does not scale well. Our aim is to apply machine learning to improve this process; we expect that machine learning can provide a fast solution for categorizing already published papers according to the eWEning star concept. You are given the freedom to design, develop, and test a process that leads to automated categorization (a minimal baseline is sketched below). You have a background in data science and/or computer science and a natural curiosity for solving problems. You aren’t afraid to ask questions when you seem to “hit the wall”, but you are capable of working independently. Some entrepreneurial mentality is a benefit, as eWEning star is a start-up, and good communication skills are needed when working with the non-technical founder.
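
A minimal supervised baseline for one of the category axes, assuming scikit-learn is available; the documents and labels below are toy stand-ins for the ~900 manually categorized papers, and the axis shown is illustrative.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for the manually categorized documents.
    texts = [
        "offshore wind turbine siting and grid connection",
        "policy incentives for rooftop solar adoption",
        "wind farm maintenance logistics",
        "feed-in tariffs and solar market growth",
    ]
    labels = ["wind", "solar", "wind", "solar"]   # one illustrative category axis

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    print(model.predict(["grid integration of solar parks"]))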

Large scale data quality monitoring IT-solution

Contact: Bram van der Waaij.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
TNO is performing research on how to increase the trustworthiness of operational data-driven services. Key to this is continuously guarding the quality of the incoming data and checking whether the data still fits the requirements of the analysis model. Each incoming data stream must be evaluated using many different quality metrics at the same time. Quality metrics can be simple min/max evaluations, more complex distribution matching (e.g., against a normal distribution), or advanced fitting against the original training data set of the guarded model (a minimal sketch of two such metrics follows below). The challenge of this assignment is to a) design and implement a scalable data quality monitoring tool that can continuously be adapted to (sensor) data input changes, model requirement changes, and/or quality metric updates, and b) design and implement specific quality metrics for the TNO project related to your assignment. Prerequisites: understanding of AI and scalable event processing. An internship compensation is provided.
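
A minimal sketch of two per-stream metrics: a range check and a rolling z-score check, the latter as a crude stand-in for matching against the model's training distribution. Thresholds and names are illustrative; the scalability challenge of the assignment begins where this single-stream sketch ends.

    class QualityMonitor:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
            self.n, self.mean, self.m2 = 0, 0.0, 0.0   # Welford accumulators

        def check(self, x):
            issues = []
            if not (self.lo <= x <= self.hi):
                issues.append("out of range")
            if self.n > 30:                            # enough history for a z-score
                std = (self.m2 / (self.n - 1)) ** 0.5
                if std > 0 and abs(x - self.mean) / std > 4:
                    issues.append("distribution outlier")
            # Update running statistics (Welford's online algorithm).
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
            return issues

    monitor = QualityMonitor(lo=-20.0, hi=50.0)
    for value in [20.1, 20.3, 19.8] * 20 + [55.0]:
        problems = monitor.check(value)
        if problems:
            print(value, problems)   # 55.0 ['out of range', 'distribution outlier']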

Large scale indexing for image hashes

Contact: Alexander Lazovik or Mathijs Homminga.
Status: available.
Location: Web-IQ.
Date: 01/01/2021.
Facebook has open-sourced its TMK+PDQF video hashing and PDQ image hashing algorithms (article, pdf). Web-IQ supports Law Enforcement Agencies and NGOs in their fight against online child exploitation. We want to add the PDQ hashing algorithm to our image matching services. Computing PDQ hashes from images is straightforward, but indexing these hashes to support efficient search (by Hamming distance) over millions of hashes is a different story. During this research project you will investigate, design, build, evaluate, and compare different methods to index bit vectors at large scale; one classic indexing idea is sketched below. You will have the opportunity to work with (anonymised) real-world data, and via our partners your results will directly contribute to the fight against online child exploitation worldwide.
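
One classic approach is multi-index hashing, sketched here for 256-bit PDQ-style hashes: split each hash into 16 chunks of 16 bits, so that by the pigeonhole principle any two hashes within Hamming distance 15 agree exactly on at least one chunk. Exact-match lookups on the chunks then yield a small candidate set to verify. This is a sketch of the idea, not an evaluation-ready index.

    from collections import defaultdict

    CHUNKS, BITS = 16, 16
    MASK = (1 << BITS) - 1
    index = [defaultdict(list) for _ in range(CHUNKS)]

    def chunks(h):
        return [(h >> (i * BITS)) & MASK for i in range(CHUNKS)]

    def insert(h):
        for i, c in enumerate(chunks(h)):
            index[i][c].append(h)

    def query(h, radius=15):
        # Candidates share at least one exact chunk; verify the full distance.
        candidates = set()
        for i, c in enumerate(chunks(h)):
            candidates.update(index[i][c])
        return [x for x in candidates if bin(x ^ h).count("1") <= radius]

    a = (1 << 256) - 1           # an all-ones 256-bit hash
    insert(a)
    print(query(a ^ 0b111))      # 3 bits flipped: still found

Note that the pigeonhole guarantee only holds while the search radius stays below the number of chunks; larger radii require querying near-neighbours of each chunk as well.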

Model development when the data may not be shared

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and AI have a growing influence on our daily lives. People and companies are becoming increasingly aware of the potential of their data and of the impact of losing control over who uses it. As a result, companies are no longer willing to share their (private, business-critical) data. Traditionally, a company with data would send it to another company that develops an analysis model (for example, a machine learning model). TNO is investigating the possibilities of developing models in an environment where data may not be freely transported. One of the solutions is to no longer bring the data to the analysis model (D2A), but to bring the analysis model to the data (A2D). This master student assignment is about investigating and building a prototype of an approach for developing analysis models in an A2D manner.

Dynamic on-the-fly switching between data sources for distributed Big Data analysis

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even with an always-online streaming process. In both cases the data is retrieved from one or more data sources, analyzed or transformed, and the results are written to a data "sink", such as another database or a message queue. At TNO we are looking into ways to update such long-running analysis processes at runtime, and part of that is updating the data sources: the longer a data analysis process runs, the more likely it is that new sources of data are introduced (think of user behavior data from a newly created part of a website, or sensor data from a new data provider) or that outdated data sources must be switched to newly created ones (think of switching from SQL to NoSQL). Your challenge is to develop a technical library that supports switching both streaming and historical data sources at runtime for distributed analysis platforms (for example, Apache Spark); a minimal sketch of the core hand-over idea follows below. Knowledge of distributed systems (through the Distributed Systems, Scalable Computing, and Web & Cloud Computing courses) is key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. An internship compensation is also provided.
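
A minimal sketch of the hand-over mechanism at the heart of such a library, deliberately ignoring offsets, draining, and delivery guarantees that a real implementation would need; all class names are hypothetical.

    import threading

    class SwitchableSource:
        """Facade a long-running analysis reads from; the underlying
        source can be swapped at runtime without stopping the consumer."""
        def __init__(self, source):
            self._source = source
            self._lock = threading.Lock()

        def switch(self, new_source):
            with self._lock:
                self._source = new_source   # takes effect on the next read

        def read(self):
            with self._lock:
                return self._source.read()

    class ListSource:
        def __init__(self, items):
            self._items = iter(items)
        def read(self):
            return next(self._items, None)

    stream = SwitchableSource(ListSource(["sql:1", "sql:2"]))
    print(stream.read())                     # sql:1
    stream.switch(ListSource(["nosql:1"]))   # e.g. migrating from SQL to NoSQL
    print(stream.read())                     # nosql:1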

Runtime validation of software against constraints from context

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even with an always-online streaming process. This data analysis is almost always done within different types of limitations: from users, from a business perspective, from hardware, and from the platforms on which the data analysis runs. At TNO we are looking into ways of verifying whether a running distributed analysis meets these limitations and requirements. We have some experience in working with constraints for IT systems. Your challenge is to investigate and experiment with capturing the different kinds of constraints that can be defined on a system, and to develop a solution that validates a running data analysis against these constraints. The validation against given constraints should happen at runtime when needed (for example, when new constraints are added). Knowledge of distributed systems (through the Scalable Computing course) and a good understanding of mathematics/logic are key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. An internship compensation is also provided.

Measures of Social Behaviour

Contact: Niels Jongs.
Status: available.
Location: UMCG.
Date: 01/01/2021.
Questionnaires are sensitive to several sources of noise, and above all, the moment-by-moment quantification of behaviour is impossible with questionnaires. To move away from these deficiencies we have developed a passive monitoring system based on ubiquitous smartphone technology. Due to the advances in technology, the World Economic Forum announced in February 2016 that the world is entering its Fourth Industrial Revolution, based on hyper-connectivity, data-driven solutions, and artificial intelligence (World Economic Forum, 2016). Hyper-connectivity is characterised by a state of being constantly connected to individuals and machines through devices such as smartphones. Hyper-connectivity and large-scale data collection through smartphones are the fundamental elements of new technological initiatives in healthcare and biomedical research. These smartphone-based initiatives are largely due to the fact that the number of sensors embedded in smartphones has exploded over the past few years. Nowadays the majority of smartphones are equipped with sensors such as GPS, accelerometer, gyroscope, Wi-Fi, Bluetooth, camera, and microphone. These smartphones aggregate a large amount of user-related data which, in the context of research, remains largely untouched. Our ambition is to develop several objective measures of social behaviour using the data collected through our passive monitoring application. The objective quantification of social behaviour is important since the majority of psychiatric disorders affect social behaviour. In the context of a master thesis, we are looking for a master student with good knowledge of R to develop several of these measures related to social behaviour and to test them on data of psychiatric patients.

Passive Behavioural Monitoring

Contact: Martrien Kas.
Status: available.
Location: University of Groningen: Behavioral Neuroscience.
Date: 01/01/2021.
Advances in low-power communication technologies and large-scale data processing continue to give rise to the concept of mobile healthcare systems as an integral part of clinical care and research processes. This project will focus on the data collected by a passive behavioural monitoring system in which personal mobile devices are used as a measuring instrument. The data mainly consists of sensor and activity data, which might allow us to differentiate between healthy and non-healthy individuals. In this project, our aim is to establish behavioural profiles related to neuropsychiatric disorders using advanced data analysis and data mining techniques. These behavioural profiles are derived from the sensor and activity data collected by the passive behavioural monitoring system and are used to predict the onset or relapse of neuropsychiatric disorders. Additionally, we aim to translate these behavioural profiles to animal behavioural models for which the data is collected in a controlled lab environment.

Flexible computing infrastructures

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.

Privacy-friendly context-aware services

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.

Interaction with devices in a household for the purpose of enabling smart grid services

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.


Have your own project suggestions?

We are available to supervise projects on various aspects of distributed systems, in particular involving

  • Service-Oriented and Cloud Computing
  • Pervasive Computing and Smart Environments
  • Network Centric Real-time Analytics
  • Energy Distribution Infrastructures
  • Adaptive Communication Middleware

If you have an idea of a specific project or would like to work generally in a specific area, please let us know about it and we can then narrow the project down.

Please feel free to contact us to discuss specific topics and options.