Distributed Systems

Master Projects

Software Verification for Data Processing Pipelines

Supervisor: Mostafa Hadadian.
Status: available.
Date: 10/01/2022.
Software verification ensures that specific software components or subsystems meet their design requirements, and it is performed at design time. Data processing pipelines, on the other hand, are composed of several data processing elements connected together, i.e., the output of one element is the input of another, to produce the expected result. This project aims to use software verification techniques to verify the design of a data processing pipeline.
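As one concrete direction, a design-time check could validate that every connection in the pipeline is type-compatible before anything is deployed. Below is a minimal sketch of such a check, assuming each element declares the data type it consumes and produces; the element names and types are illustrative, not part of the project.

    # Each element: (name, input type, output type); None marks a source.
    PIPELINE = [
        ("source",   None,      "raw"),
        ("parse",    "raw",     "records"),
        ("classify", "records", "labels"),
    ]

    def verify_pipeline(pipeline):
        """Check that every element's input type matches its predecessor's output."""
        for (_, _, out_type), (name, in_type, _) in zip(pipeline, pipeline[1:]):
            if in_type != out_type:
                raise TypeError(f"element '{name}' expects {in_type}, got {out_type}")

    verify_pipeline(PIPELINE)  # passes silently when the design is consistent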

Automated Planning of Data Processing Pipelines

Supervisor: Mostafa Hadadian.
Status: available.
Date: 10/01/2022.
Planning is the reasoning side of acting. It is an abstract, explicit deliberation process that chooses and organizes actions by anticipating their expected outcomes. This deliberation aims at achieving some pre-stated objectives. Automated planning is an area of artificial intelligence (AI) that studies this deliberation process computationally. This project aims to use these automated planning techniques to create a system that automatically designs data processing pipelines, which consist of several building blocks working together to produce the expected result.
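To give a flavour of the idea, pipeline synthesis can be phrased as state-space search: the state is the set of data types produced so far, and each building block is an action with required inputs and produced outputs. The sketch below is a minimal breadth-first planner under that assumption; the block names are illustrative.

    from collections import deque

    # Each block: required input types -> produced output types (illustrative).
    BLOCKS = {
        "parse":     ({"raw"}, {"records"}),
        "aggregate": ({"records"}, {"stats"}),
        "report":    ({"stats"}, {"report"}),
    }

    def plan(available, goal):
        """Breadth-first search for a shortest block sequence producing `goal`."""
        queue = deque([(frozenset(available), [])])
        seen = {frozenset(available)}
        while queue:
            state, steps = queue.popleft()
            if goal <= state:
                return steps
            for name, (needs, makes) in BLOCKS.items():
                if needs <= state and not makes <= state:
                    nxt = state | makes
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, steps + [name]))
        return None  # no pipeline can produce the goal

    print(plan({"raw"}, {"report"}))  # ['parse', 'aggregate', 'report']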

Privacy-Aware Sensor Data Classification Using Federated Learning

Supervisor: Majid Lotfian Delouee.
Status: available.
Date: 07/01/2022.
Classification of sensor data can be utilized to extract knowledge about the environment for use in different applications. Various machine learning methods, such as deep learning mechanisms, can be employed to determine different classes and make intelligent decisions based on the extracted knowledge. The focus of this project is to design and implement an architecture that classifies received sensor data about the bicycle lane surface and places the results on a city map. The first stage of this project is to create a dataset by using an Android app called “Sensor Logger”, which produces two different types of sensor data files (.json and .csv). The next stage is to detect different entities in the road (e.g., potholes, tree roots) by examining the sensor data. Since we want to preserve the privacy of people who participate in data collection, a Federated Learning approach will be employed. Hence, the detected elements need to be placed on a map for each recorded route by the application as a local model. Finally, all of these local models will be integrated to create the final map as a global model.
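The aggregation step can be as simple as federated averaging: each participant shares only model weights, and the server combines them in proportion to local data size. A minimal sketch, with numpy arrays standing in for real model state:

    import numpy as np

    def federated_average(local_weights, sample_counts):
        """Combine local models into a global model, weighted by data size."""
        total = sum(sample_counts)
        return sum(w * (n / total) for w, n in zip(local_weights, sample_counts))

    # Three participants' local models (weight vectors) and their sample counts.
    local_models = [np.array([0.2, 0.9]), np.array([0.4, 0.7]), np.array([0.3, 0.8])]
    counts = [120, 80, 200]
    print(federated_average(local_models, counts))  # the global model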

Modeling and analysis of network traffic flows

Supervisor: Saad Saleh.
Status: available.
Date: 13/12/2021.
The current network architecture is dominated by a range of traffic categories with different requirements on delay, throughput and jitter at the application layer of end devices. The underlying network mechanisms, like congestion control and flow control, make it quite challenging to guarantee the performance of various flows. In this project, we aim to research the requirements of various network traffic flows and to measure the lower performance thresholds for the various traffic categories. The project will model and analyze the traffic flows and propose novel techniques for enhancing network performance.
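As a starting point, the basic per-flow metrics can be computed directly from packet timestamps and sizes. The sketch below assumes a flow is given as (timestamp in seconds, size in bytes) tuples; real traces (e.g., pcap files) would first be parsed into this form.

    def flow_metrics(packets):
        """Throughput (bit/s) and mean jitter (s) for one flow."""
        times = [t for t, _ in packets]
        sizes = [s for _, s in packets]
        gaps = [b - a for a, b in zip(times, times[1:])]
        mean_gap = sum(gaps) / len(gaps)
        jitter = sum(abs(g - mean_gap) for g in gaps) / len(gaps)
        throughput = 8 * sum(sizes) / (times[-1] - times[0])
        return throughput, jitter

    print(flow_metrics([(0.00, 1500), (0.02, 1500), (0.05, 1500), (0.06, 1500)]))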

Deep learning in network switches

Supervisor: Saad Saleh.
Status: available.
Date: 13/12/2021.
The current network architecture requires cognitive network functions at switching devices in order to minimize delay, maximize throughput and decrease network load. Owing to the huge data rates in current network switches, storing an enormous number of match rules inside the match-action units of these devices is not feasible. In this project, we aim to develop and analyze the performance of deep learning in network switches for network functions such as a network traffic firewall. Already developed deep learning approaches will be tested on network traffic, and new approaches will be developed for implementing network functions at switching nodes.
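The core idea is to replace an explicit rule lookup with a small learned model evaluated per packet. Below is a minimal sketch of such a learned match-action stage, assuming header fields are already encoded as a feature vector; the weights are illustrative, not a trained model.

    import numpy as np

    W = np.array([0.8, -1.2, 0.5])  # per-feature weights (e.g., port, size, flags)
    B = -0.1                        # bias term

    def firewall_action(features):
        """Return 'drop' or 'forward' from a logistic score over header features."""
        score = 1.0 / (1.0 + np.exp(-(features @ W + B)))
        return "drop" if score > 0.5 else "forward"

    print(firewall_action(np.array([1.0, 0.2, 0.0])))  # 'drop' for this toy input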

Verification and variability in cloud-based workflow engines

Supervisor: Heerko Groefsema.
Status: available.
Date: 26/10/2021.
Verification entails proving or disproving the correctness of a system model with respect to its specification. When variability is defined as allowing change depending on adherence to a specification, we can use existing model checkers to check whether deviations of workflow executions remain within the specification. Zeebe is the workflow engine of the Camunda Cloud platform. In this project, the student will investigate and implement support for verification and variability based on model checking using the Zeebe workflow engine.

Variability and versioning of micro-service pipelines

Supervisor: Heerko Groefsema.
Status: available.
Date: 26/10/2021.
Pipelines describe sequences of operations on buffered streams of data. To allow for different outcomes of pipelines depending on execution contexts, we propose versioning-based variability. For example, consider two similar pipelines with Celsius and Fahrenheit inputs/outputs. Instead of modeling two separate pipelines, we could model one variant (e.g., the one using Celsius) and base the other variant on the first. Streams of data may then select variants based on their execution context. In this project, the student will investigate modeling variability in pipelines.
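A minimal sketch of versioning-based variability, in which a derived variant overrides only the steps where it differs from the base, and the stream's execution context selects which variant runs; the step names are illustrative:

    # Base variant works in Celsius; the derived variant overrides one step.
    BASE = {"read": lambda x: x, "convert": lambda c: c, "write": lambda x: x}
    FAHRENHEIT = {**BASE, "convert": lambda c: c * 9 / 5 + 32}

    VARIANTS = {"celsius": BASE, "fahrenheit": FAHRENHEIT}

    def run(value, context):
        """Select a variant from the stream's execution context and run it."""
        variant = VARIANTS[context["unit"]]
        for step in ("read", "convert", "write"):
            value = variant[step](value)
        return value

    print(run(20, {"unit": "fahrenheit"}))  # 68.0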

Design of Specifications

Supervisor: Heerko Groefsema.
Status: available.
Date: 26/10/2021.
Verification entails proving or disproving the correctness of a system model with respect to its specification. Such specifications are often expressed using formal methods of mathematics such as temporal logics. Although such logics are perfect for formal verification, they are unintuitive and difficult to understand for the average user. In this project, the student is asked to research and implement a tool that visualizes the design of specifications over process models.

Expansion of temporal logics using automated planning

Supervisor: Heerko Groefsema.
Status: available.
Date: 26/10/2021.
Verification entails proving or disproving the correctness of a system model with respect to its specification. Such specifications are often expressed using formal methods of mathematics such as temporal logics. To obtain information on successor states in system models, it is possible to rewrite temporal logic expressions using semantic equivalences and expansion laws. Automated planning is an artificial intelligence technique that aims to find an optimal set of actions which together accomplish a predetermined goal. The question for the student, then, is: can we use automated planning to obtain the possible expanded logic expressions, and can we obtain the optimal expanded expression?
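For instance, the standard expansion laws of linear temporal logic rewrite each temporal operator in terms of its value now and in the next state:

    \begin{align*}
      \varphi\,\mathsf{U}\,\psi &\equiv \psi \lor \bigl(\varphi \land \mathsf{X}(\varphi\,\mathsf{U}\,\psi)\bigr)\\
      \mathsf{F}\,\varphi &\equiv \varphi \lor \mathsf{X}\,\mathsf{F}\,\varphi\\
      \mathsf{G}\,\varphi &\equiv \varphi \land \mathsf{X}\,\mathsf{G}\,\varphi
    \end{align*}

Applying one of these laws is itself an action with a well-defined outcome, which is what makes the planning formulation plausible.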

Automatically obtain specifications from logs

Supervisor: Heerko Groefsema.
Status: available.
Date: 26/10/2021.
In a previously published research paper [1], we showed how it is possible to obtain temporal logic specifications from sets of similar business processes. In this project, we would like to explore the possibilities of obtaining similar information from business process logs. That is, is it possible to describe a business process using temporal logic by efficiently converting information contained in log files (XES format) into a compound prime event structure (PES), and can we obtain useful temporal logic specifications from it?

[1] N.R.T.P. van Beest, H. Groefsema, L. García-Bañuelos and M. Aiello, "Variability in business processes: Automatically obtaining a generic specification," Information Systems, volume 80, 2019. https://doi.org/10.1016/j.is.2018.09.005
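A plausible first step is extracting ordering relations directly from the XES log, since XES is plain XML. A minimal sketch, assuming a standard log where each event carries its activity name in the 'concept:name' attribute:

    import xml.etree.ElementTree as ET

    def directly_follows(xes_path):
        """Collect pairs (a, b) where activity b directly follows a in some trace."""
        local = lambda tag: tag.rsplit("}", 1)[-1]  # ignore XML namespaces
        pairs = set()
        for trace in ET.parse(xes_path).getroot():
            if local(trace.tag) != "trace":
                continue
            names = [attr.get("value")
                     for event in trace if local(event.tag) == "event"
                     for attr in event if attr.get("key") == "concept:name"]
            pairs.update(zip(names, names[1:]))
        return pairs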


Make a difference in Energy Transition with Machine Learning

Contact: Viktoriya Degeler or Frank Blaauw.
Status: available.
Location: eWEning star.
Date: 01/06/2021.
eWEning star is a “fresh from the oven” start-up that is currently developing a discovery tool serving stakeholders in the renewable energy sector with relevant scientific information regarding renewable energy. Currently, people in this sector use keyword-based search queries to find scientific papers and reports, but with eWEning star’s concept these papers are smartly categorized, saving users a lot of time and nerves. By making the search process more efficient, we can make the energy transition towards renewables faster! Currently we have around 900 documents that are manually categorized in three different ways: (i) perspective, (ii) position in the value chain, and (iii) geographical location. Combined, we have created 15 categories. Depending on the length of your internship, it is possible to work on all of these, or to choose one of the three options. While this manual approach is feasible for a small number of papers, it does not scale well. Our aim is to apply machine learning to improve this process. We expect that machine learning can provide us with a fast solution for categorizing already published papers according to the eWEning star concept. You are given the freedom to design, develop and test a process which leads to the automated categorization. You have a background in data science and/or computer science, and a natural curiosity for solving issues. You aren’t afraid to ask questions when you hit a wall, but you are capable of working independently. Some entrepreneurial mentality is a benefit, as eWEning star is a start-up. Good communication skills are needed for working with the non-technical founder.
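Since labelled examples already exist, a natural baseline is supervised text classification, one model per categorization. A minimal sketch with scikit-learn, assuming the ~900 documents are available as (text, category) pairs; the example texts and labels are illustrative:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["offshore wind permit procedures ...", "solar panel supply chains ..."]
    labels = ["policy", "manufacturing"]  # e.g., position in the value chain

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    print(model.predict(["new wind farm licensing rules"]))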


Caching with Limited Memory Consumption for Backtracking Search

Supervisor: Michel Medema.
Status: available.
Date: 25/11/2021.
When solving Constraint Satisfaction Problems, standard backtracking search algorithms often encounter the same subproblem more than once. To avoid repeatedly solving a subproblem only to discover that it still has no solution, solvers can employ caching techniques to record unsatisfiable subproblems. Unfortunately, the size of the cache, and therefore the amount of memory that it consumes, is, in the worst case, exponential in the size of the problem, making full caching generally infeasible in practice. Instead of storing all subproblems, it may be possible to store a fixed number of them. The aim of this project is to analyse how the entries in the cache are used during the search process and to use these insights to reduce the memory consumption of caching techniques. Some subproblems may, for example, not contribute much to the overall speed-up and could easily be discarded.
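A minimal sketch of a bounded cache of unsatisfiable subproblems, assuming subproblems can be keyed by a hashable encoding of the remaining variables and their domains; LRU eviction is just one policy this project could compare against usage-based alternatives:

    from collections import OrderedDict

    class NogoodCache:
        """Fixed-capacity cache of subproblems known to be unsatisfiable."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()

        def add(self, key):
            self.entries[key] = True
            self.entries.move_to_end(key)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used

        def known_unsat(self, key):
            if key in self.entries:
                self.entries.move_to_end(key)  # refresh recency on a hit
                return True
            return False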

Reducing Redundant Exploration of Parallel Search Algorithms

Supervisor: Michel Medema.
Status: available.
Date: 25/11/2021.
Finding the optimal solution to a Constraint Satisfaction Problem is an NP-complete problem, meaning that, in the worst case, the time complexity of a search algorithm grows exponentially with the size of the problem. Many techniques have been developed to reduce the search space, including constraint propagation and pruning based on a bound on the cost. Besides that, parallel search algorithms can be used to explore multiple solutions in parallel, thereby potentially reducing the overall search time. However, parallel search algorithms may perform some redundant search, as it is not always possible to make the most effective use of pruning techniques. This redundant search may occur because, unlike sequential algorithms that can use the upper or lower bound on the cost to disregard subsequent solutions, a parallel algorithm may have already started to explore such solutions before the bound is known. The impact of this problem becomes considerably worse when the search is distributed and involves many local decisions without any global coordination, such as when the algorithm uses decomposition techniques. This project aims to explore the influence that this redundant search has on the overall execution time of the algorithm, and possible techniques that can be used to avoid it, at least partially.
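The coordination gap is easy to state in code. In the sketch below, workers prune subtrees against a shared incumbent bound; any worker that read the bound before a better one was published keeps expanding a subtree that is already hopeless. All names are illustrative.

    from multiprocessing import Value

    best_cost = Value("d", float("inf"))  # shared incumbent (upper bound)

    def can_prune(subtree_lower_bound):
        """Skip a subtree whose lower bound cannot beat the incumbent."""
        return subtree_lower_bound >= best_cost.value

    def publish(cost):
        """Tighten the incumbent; until this runs, other workers may still be
        expanding subtrees that the new bound would have pruned."""
        with best_cost.get_lock():
            if cost < best_cost.value:
                best_cost.value = cost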

Decomposition Techniques for Optimisation Problems

Supervisor: Michel Medema.
Status: available.
Date: 25/11/2021.
Decomposition techniques try to divide a Constraint Satisfaction Problem into independent subproblems based on the dependencies that exist between the variables. The decomposed problem often has a lower worst-case complexity than the original problem, and finding a solution to it is generally faster. One such algorithm is Backtracking with Tree-Decomposition, which applies standard backtracking search to the decomposed version of a problem. However, this algorithm was originally designed to solve satisfaction problems rather than optimisation problems, meaning that performance results are not available for optimisation problems. The focus of this project is the evaluation of Backtracking with Tree-Decomposition on optimisation problems, as well as a comparison of these results with other constraint solvers and decomposition techniques.
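The decomposition itself can be prototyped quickly. The sketch below builds the primal graph of a toy problem (variables as nodes, edges between variables sharing a constraint) and uses a networkx heuristic to obtain a tree decomposition; Backtracking with Tree-Decomposition would then solve the resulting bags with backtracking. This assumes networkx is available; it is not part of the project description.

    import networkx as nx
    from networkx.algorithms.approximation import treewidth_min_degree

    # Primal graph of a toy problem: a triangle x1-x2-x3 plus a pendant x4.
    primal = nx.Graph([("x1", "x2"), ("x2", "x3"), ("x3", "x1"), ("x3", "x4")])

    width, tree = treewidth_min_degree(primal)  # heuristic decomposition
    print(width)                                # 2: the triangle forces a bag of 3
    print([sorted(bag) for bag in tree.nodes])  # bags of the decomposition tree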


Large scale data quality monitoring IT-solution

Contact: Bram van der Waaij.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
TNO is performing research on how to increase the trustworthiness of operational data-driven services. Key to this is continuously guarding the quality of the incoming data and checking whether the data still fits the requirements of the analysis model. Each incoming data stream must be evaluated using many different quality metrics at the same time. Quality metrics can be simple min/max evaluations, more complex distribution matching (e.g., against a normal distribution), or advanced fitting against the original training data set of the guarded model. The challenge of this assignment is to a) design and implement a scalable data quality monitoring tool which can continuously be adapted to (sensor) data input changes, model requirement changes and/or quality metric updates, and b) design and implement specific quality metrics for the TNO project related to your assignment. Prerequisites: understanding of AI and scalable event processing. An internship compensation is provided.
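Two of the metric kinds mentioned above fit in a few lines each. A minimal sketch, assuming readings arrive in batches; the range limits and reference sample are illustrative stand-ins for model-specific requirements:

    from scipy import stats

    def range_metric(batch, lo, hi):
        """Simple min/max evaluation: are all values within the allowed range?"""
        return all(lo <= x <= hi for x in batch)

    def distribution_metric(batch, reference, alpha=0.05):
        """Distribution matching: two-sample Kolmogorov-Smirnov test against a
        reference sample (e.g., drawn from the guarded model's training data)."""
        return stats.ks_2samp(batch, reference).pvalue >= alpha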

Large scale indexing for image hashes

Contact: Alexander Lazovik or Mathijs Homminga.
Status: available.
Location: Web-IQ.
Date: 01/01/2021.
Facebook has open-sourced its TMK+PDQF video hashing and PDQ image hashing algorithms (article, pdf). Web-IQ is supporting Law Enforcement Agencies and NGOs in their fight against online child exploitation. We want to add the PDQ hashing algorithm to our image matching services. Computing PDQ hashes from images is straightforward, but indexing these hashes to support efficient search (by Hamming distance) in millions of hashes is a different story. During this research project you will investigate, design, build, evaluate and compare different methods to index bit vectors at large scale. You will have the opportunity to work with (anonymised) real-world data, and via our partners your results will directly contribute to the fight against online child exploitation world-wide.
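One classical candidate worth evaluating is multi-index hashing: split each 256-bit PDQ hash into chunks and index the chunks exactly. By the pigeonhole principle, two hashes within Hamming distance 31 must agree on at least one of 32 byte-sized chunks, so exact lookups yield a small candidate set to verify. A minimal sketch, with hashes as Python integers:

    from collections import defaultdict

    BITS, CHUNKS = 256, 32  # 32 chunks of 8 bits; supports query radius <= 31

    def chunks(h):
        """Split a 256-bit integer hash into 32 byte-sized chunks."""
        return [(h >> (8 * i)) & 0xFF for i in range(CHUNKS)]

    class MultiIndex:
        def __init__(self):
            self.tables = [defaultdict(list) for _ in range(CHUNKS)]

        def add(self, h):
            for i, c in enumerate(chunks(h)):
                self.tables[i][c].append(h)

        def query(self, h, radius=31):
            """Exact chunk lookups give candidates; verify the true distance."""
            cand = {x for i, c in enumerate(chunks(h)) for x in self.tables[i][c]}
            return [x for x in cand if bin(x ^ h).count("1") <= radius]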

Model development when the data may not be shared

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and AI are becoming a bigger and bigger influence on our daily life. People and companies are becoming increasingly aware of the potential of their data and of the impact of losing control over who is using it. Therefore, companies are no longer willing to share their (private, business-critical) data. Traditionally, a company with data would send it to another company that is developing an analysis model (for example, a machine learning model). TNO is investigating the possibilities of developing models in an environment where data is not allowed to be freely transported. One of the solutions is to no longer bring the data to the analysis model (D2A), but to bring the analysis model to the data (A2D). This master student assignment is about investigating and building a prototype of an approach for developing analysis models in an A2D manner.
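A minimal sketch of the A2D pattern, assuming the data owner exposes a single endpoint that runs submitted training code locally and returns only model parameters, never raw records; all names are illustrative:

    import numpy as np

    class DataOwnerSite:
        def __init__(self, private_data):
            self._data = private_data  # never leaves this site

        def fit(self, model_init, train_step, epochs=5):
            """Run the visiting analysis; only parameters travel back."""
            params = model_init()
            for _ in range(epochs):
                params = train_step(params, self._data)
            return params

    # The analyst ships code (model_init, train_step); the data stays put.
    site = DataOwnerSite(np.random.rand(100, 3))
    params = site.fit(lambda: np.zeros(3),
                      lambda p, d: p + 0.1 * (d.mean(axis=0) - p))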

Dynamic on-the-fly switching between data sources for distributed Big Data analysis

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even with an always-online streaming process. In both of these cases the data is retrieved from one or more data sources and analyzed or transformed, which results in output to a data "sink", such as another database or a message queue. At TNO we are looking into ways to update such long-running analysis processes at runtime, and part of that is updating the data sources: the longer a data analysis process is running, the more likely it is that new sources of data are introduced (think of user behavior data from a newly created part of a website, or sensor data from a new data provider) or that outdated data sources must be switched to newly created sources (think of switching from SQL to NoSQL). Your challenge is to develop a technical library that supports switching both streaming and historical data sources at runtime for distributed analysis platforms (for example, Apache Spark). Knowledge of distributed systems (through the Distributed Systems, Scalable Computing and Web & Cloud Computing courses) is key, and we are looking for people that enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. An internship compensation is also provided.
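A minimal sketch of the abstraction such a library could provide, assuming all sources implement a common read interface so that the running job never holds a direct reference to a concrete source; the adapter can then be swapped between batches at runtime:

    import threading

    class SwitchableSource:
        """Wraps the current data source and allows hot-swapping it."""

        def __init__(self, source):
            self._source = source
            self._lock = threading.Lock()

        def read_batch(self):
            with self._lock:  # readers never observe a half-finished switch
                return self._source.read_batch()

        def switch_to(self, new_source):
            with self._lock:  # e.g., swap the SQL source for a NoSQL one
                self._source = new_source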

Runtime validation of software against constraints from context

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even with an always-online streaming process. This data analysis is almost always done within different types of limitations: from users, from a business perspective, from hardware and from the platforms on which the data analysis is running. At TNO we are looking into ways of verifying whether a running distributed analysis meets these limitations and requirements. We have some experience in working with constraints for IT systems. Your challenge would be to investigate and experiment with capturing the different kinds of constraints that can be defined on a system, and to develop a solution that can validate a running data analysis against these constraints. The validation against given constraints should happen at runtime whenever it is needed (for example, when new constraints are added). Knowledge of distributed systems (through the Scalable Computing course) and a good understanding of mathematics/logic are key, and we are looking for people that enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. An internship compensation is also provided.
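A minimal sketch of the validation side, assuming constraints are declared as named predicates over a snapshot of runtime metrics, so that new constraints can be registered while the analysis keeps running; the metric names are illustrative:

    CONSTRAINTS = {
        "latency_under_2s":  lambda m: m["p99_latency_s"] < 2.0,
        "memory_within_cap": lambda m: m["memory_gb"] <= m["memory_cap_gb"],
    }

    def validate(metrics):
        """Return the names of all constraints the running analysis violates."""
        return [name for name, check in CONSTRAINTS.items() if not check(metrics)]

    print(validate({"p99_latency_s": 2.4, "memory_gb": 10, "memory_cap_gb": 16}))
    # ['latency_under_2s']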

Measures of Social Behaviour

Contact: Niels Jongs.
Status: available.
Location: UMCG.
Date: 01/01/2021.
Questionnaires are sensitive to several sources of noise, and above all, the moment-by-moment quantification of behaviour is impossible while using questionnaires. To manoeuvre away from these deficiencies we have developed a passive monitoring system that is based on ubiquitous smartphone technology. Due to the advances in technology, the World Economic Forum announced in February 2016 that the world is entering its Fourth Industrial Revolution, based on hyper-connectivity, data-driven solutions and artificial intelligence (World Economic Forum, 2016). Hyper-connectivity is characterised by a state of being constantly connected to individuals and machines through devices such as smartphones. Hyper-connectivity and large-scale data collection through smartphones are the fundamental elements of new technological initiatives in healthcare and biomedical research. These smartphone-based technological initiatives are largely due to the fact that the number of sensors embedded in smartphones has exploded over the past few years. Nowadays the majority of smartphones are equipped with sensors such as GPS, accelerometer, gyroscope, WiFi, Bluetooth, camera and microphone. These smartphones aggregate a large amount of user-related data which, in the context of research, remains largely untouched. Our ambition is to develop several objective measures of social behaviour by using the data collected through our passive monitoring application. The objective quantification of social behaviour is important since the majority of psychiatric disorders affect social behaviour. In the context of a master thesis, we would like a master student with good knowledge of R to develop several of these measures related to social behaviour and to test them on data of psychiatric patients.

Passive Behavioural Monitoring

Contact: Martien Kas.
Status: available.
Location: University of Groningen: Behavioral Neuroscience.
Date: 01/01/2021.
Advances in low-power communication technologies and large-scale data processing continue to give rise to the concept of mobile healthcare systems as an integral part of clinical care and research processes. This project will focus on the data that is collected by a passive behavioural monitoring system in which personal mobile devices are used as a measuring instrument. The data mainly consists of sensor and activity data, which might allow us to differentiate between healthy and non-healthy individuals. In this project, our aim is to establish behavioural profiles which are related to neuropsychiatric disorders by using advanced data analysis and data mining techniques. These behavioural profiles are derived from the sensor and activity data collected by the passive behavioural monitoring system and are used to predict the onset or relapse of neuropsychiatric disorders. Additionally, our aim is to translate these behavioural profiles to animal behavioural models, for which the data is collected in a controlled lab environment.

Flexible computing infrastructures

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.

Privacy-friendly context-aware services

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.

Interaction with devices in a household for the purpose of enabling smart grid services

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.