Distributed Systems

Master Projects

E-Mobility data analysis and short and long term prediction

Supervisor: Viktoriya Degeler.
Status: available.
Date: 31/01/2022.
Shell, as a leading energy provider worldwide, has been growing its E-Mobility charging business rapidly during the ongoing energy transition. Currently, Shell operates over 80,000 charge points for electric cars at homes, businesses, Shell retail sites and destinations. In addition, Shell currently offers access to over 300,000 additional charge points through its roaming networks. The research work of this thesis project will be conducted with Shell’s dedicated E-Mobility analytics group. The team has been working in this field for the past few years and has built a solid foundation for the data infrastructure. For the research work, you are expected to make the most of the E-Mobility-related data to develop advanced analytics and models for improving business operations. For example, one direction could be to develop (short-term) charging demand prediction models using real charging data from various markets, such as NL, UK and DE.
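As a minimal illustration of what a short-term demand baseline could look like (the data and function below are hypothetical, not Shell's models), a moving-average forecast over hourly charging-session counts:

```python
def moving_average_forecast(hourly_sessions, window=3):
    """Naive baseline: forecast the next hour's number of charging
    sessions as the mean of the last `window` observed hours."""
    if len(hourly_sessions) < window:
        raise ValueError("need at least `window` observations")
    return sum(hourly_sessions[-window:]) / window

# Hypothetical hourly session counts for one charge point
history = [4, 6, 5, 7, 9, 8]
print(moving_average_forecast(history))  # 8.0, the mean of [7, 9, 8]
```

A thesis model would of course go well beyond this, e.g. capturing daily and weekly seasonality per market.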

Software Verification for Data Processing Pipeline

Supervisor: Mostafa Hadadian.
Status: available.
Date: 10/01/2022.
Software verification ensures that specific software components or subsystems meet their design requirements; it is performed at design time. Data processing pipelines, in turn, are composed of several data processing elements connected together, i.e., the output of one element is the input of another, to produce the expected result. This project aims to use software verification techniques to verify the design of a data processing pipeline.
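One very simple design-time check of this kind is a type-compatibility pass over the pipeline graph. The sketch below is only an illustration of the idea (element names and the type scheme are invented); real verification would check richer properties with a model checker:

```python
def verify_pipeline(elements, connections):
    """Design-time check: every connection must feed an output type into a
    matching input type. Returns a list of type mismatches (empty = ok)."""
    errors = []
    for src, dst in connections:
        out_t = elements[src]["output"]
        in_t = elements[dst]["input"]
        if out_t != in_t:
            errors.append(f"{src} -> {dst}: {out_t} != {in_t}")
    return errors

# Hypothetical pipeline elements with declared input/output types
elements = {
    "reader":  {"input": "file",    "output": "records"},
    "cleaner": {"input": "records", "output": "records"},
    "plotter": {"input": "table",   "output": "chart"},
}
print(verify_pipeline(elements, [("reader", "cleaner"), ("cleaner", "plotter")]))
# ['cleaner -> plotter: records != table']
```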

Automated Planning of Data Processing Pipeline

Supervisor: Mostafa Hadadian.
Status: available.
Date: 10/01/2022.
Planning is the reasoning side of acting. It is an abstract, explicit deliberation process that chooses and organizes actions by anticipating their expected outcomes. This deliberation aims at achieving some pre-stated objectives. Automated planning is an area of artificial intelligence (AI) that studies this deliberation process computationally. This project aims to use automated planning techniques to create a system that automatically designs data processing pipelines, which consist of several building blocks working together to produce the expected result.
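One minimal way to phrase pipeline design as planning (a sketch with hypothetical block names and a simple type-based state space; real planners work on far richer action models) is a breadth-first search for a chain of blocks whose input/output types connect the start data to the goal result:

```python
from collections import deque

def plan_pipeline(blocks, start, goal):
    """Breadth-first search: each block transforms one data type into
    another; find a shortest chain of blocks turning `start` into `goal`."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for name, (src, dst) in blocks.items():
            if src == state and dst not in seen:
                seen.add(dst)
                frontier.append((dst, path + [name]))
    return None  # no pipeline reaches the goal

# Hypothetical building blocks: name -> (input type, output type)
blocks = {
    "parse":     ("raw", "records"),
    "clean":     ("records", "clean_records"),
    "aggregate": ("clean_records", "summary"),
}
print(plan_pipeline(blocks, "raw", "summary"))  # ['parse', 'clean', 'aggregate']
```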

Privacy-Aware Sensor Data Classification Using Federated Learning

Supervisor: Majid Lotfian Delouee.
Status: available.
Date: 07/01/2022.
Classification of sensor data can be used to extract knowledge about the environment for use in different applications. Various machine learning methods, such as deep learning, can be applied to distinguish different classes and make intelligent decisions based on the extracted knowledge. The focus of this project is to design and implement an architecture that classifies received sensor data about the bicycle lane surface and places the results on a city map. The first stage of this project is to create a dataset using an Android app called “Sensor Logger”, which exports sensor data in two formats (.json and .csv). The next stage is to detect different features of the road (e.g., potholes, tree roots) by examining the sensor data. Since we want to preserve the privacy of the people who participate in data collection, a federated learning approach will be employed. Hence, the detected elements are placed on a map for each recorded route as a local model, and finally all of these local models are integrated to create the final map as a global model.
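The aggregation step could follow a FedAvg-style scheme, sketched below for plain parameter vectors (the clients, weights and dimensions are hypothetical; real training would fit a model per participant and average its parameters):

```python
def federated_average(local_models, weights=None):
    """FedAvg-style aggregation: combine per-client model parameters into
    a global model by (weighted) coordinate-wise averaging, so raw sensor
    data never leaves the client device."""
    n = len(local_models)
    if weights is None:
        weights = [1.0 / n] * n  # equal weight per client
    dim = len(local_models[0])
    return [sum(w * m[i] for m, w in zip(local_models, weights))
            for i in range(dim)]

# Hypothetical parameter vectors trained on two cyclists' local data
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]])
print(global_model)  # [2.0, 3.0]
```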

Modeling and analysis of Network traffic flows

Supervisor: Saad Saleh.
Status: available.
Date: 13/12/2021.
The current network architecture is dominated by a range of traffic categories with different requirements on delay, throughput and jitter at the application layer of end devices. The underlying network mechanisms, like congestion control and flow control, make it quite challenging to guarantee the performance of various flows. In this project, we aim to investigate the requirements of various network traffic flows and measure the lower performance thresholds for the various traffic categories. The research will model and analyze the traffic flows and propose novel techniques for enhancing network performance.

Deep learning in network switches

Supervisor: Saad Saleh.
Status: available.
Date: 13/12/2021.
The current network architecture requires cognitive network functions at switching devices in order to minimize delay, maximize throughput and decrease network load. Owing to the huge data rates in current network switches, storing an enormous number of match rules inside the match-action units of these devices is not feasible. In this project, we aim to develop and analyze the performance of deep learning in network switches for network functions such as a network traffic firewall. Existing deep learning approaches will be tested on network traffic, and new approaches will be developed for implementing network functions at switching nodes.

Verification and variability in cloud-based workflow engines

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Verification entails proving or disproving the correctness of a system model with respect to its specification. When variability is defined as allowing change depending on adherence to a specification, we can use existing model checkers to check for possible deviations from workflow executions. Zeebe is the workflow engine of the Camunda Cloud platform. In this project, the student will investigate and implement support for verification and variability based on model checking using the Zeebe workflow engine.

Variability and versioning of data pipelines

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Pipelines describe sequences of operations on buffered streams of data. To allow for different outcomes of pipelines depending on execution contexts, we propose versioning-based variability. For example, consider two similar pipelines with Celsius and Fahrenheit inputs/outputs. Instead of modeling two separate pipelines, we could model one variant (e.g., the one using Celsius) and base the other variant on the first. Streams of data may then select variants based on their execution context. In this project, the student will investigate modeling variability in pipelines.
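A toy sketch of such variant derivation, following the temperature example above (the function names and design are illustrative assumptions, not a prescribed solution): the Fahrenheit variant reuses the Celsius pipeline unchanged and only converts at the edges.

```python
def celsius_pipeline(readings):
    """Base variant: smooth Celsius readings and report their mean."""
    smoothed = [round(r, 1) for r in readings]
    return sum(smoothed) / len(smoothed)

def fahrenheit_pipeline(readings):
    """Derived variant: reuse the base pipeline unchanged, converting
    only at the inputs and outputs."""
    as_celsius = [(f - 32) * 5 / 9 for f in readings]
    return celsius_pipeline(as_celsius) * 9 / 5 + 32

print(celsius_pipeline([0.0, 100.0]))      # 50.0
print(fahrenheit_pipeline([32.0, 212.0]))  # 122.0
```

The research question is then how to express such derivations as versions of one pipeline model, rather than as hand-written wrapper code.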

Design of Declarative Process Specifications

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Verification entails proving or disproving the correctness of a system model with respect to its specification. Such specifications are often expressed using formal methods of mathematics such as temporal logics. Although such logics are well suited to formal verification, they are unintuitive and difficult for the average user to understand. In this project, the student is asked to research and implement a tool that visualizes the design of specifications over process models.

Automated reasoning using automated planning

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Verification entails proving or disproving the correctness of a system model with respect to its specification. Such specifications are often expressed using formal methods of mathematics such as temporal logics. To obtain information on successor states in system models, it is possible to rewrite temporal logic expressions using semantic equivalences and expansion laws. Automated planning is an artificial intelligence technique that aims to find an optimal set of actions which together accomplish a predetermined goal. The question for the student is: can we use automated planning to obtain the possible expanded logic expressions, and can we obtain the optimal expanded expression?
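The expansion laws in question are equivalences such as G p ≡ p ∧ X(G p) and F p ≡ p ∨ X(F p), which separate what must hold now from what is deferred to the successor state. A one-step rewriter over a toy tuple representation of formulas (the representation is an assumption made for illustration):

```python
def expand(formula):
    """Apply one LTL expansion law:
         G p  ->  p and X(G p)
         F p  ->  p or  X(F p)
    Formulas are nested tuples, e.g. ('G', 'p')."""
    op = formula[0]
    if op == 'G':
        return ('and', formula[1], ('X', formula))
    if op == 'F':
        return ('or', formula[1], ('X', formula))
    return formula  # nothing to expand

print(expand(('G', 'p')))  # ('and', 'p', ('X', ('G', 'p')))
```

Repeated application yields the space of expanded expressions that a planner could search through.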

Automatically obtain declarative specifications from logs

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
In a previously published research paper [1], we showed how it is possible to obtain temporal logic specifications from sets of similar business processes. In this project, we would like to explore the possibilities of obtaining similar information from business process logs. That is, is it possible to describe a business process using temporal logic by efficiently converting information contained in log files (XES format) into a compound prime event structure (PES), and can we obtain useful temporal logic specifications from it?
[1] N.R.T.P. van Beest, H. Groefsema, L. García-Bañuelos and M. Aiello. Variability in business processes: Automatically obtaining a generic specification. Information Systems, volume 80, 2019. https://doi.org/10.1016/j.is.2018.09.005

Verification of Security and Privacy concepts in BPMN Choreography diagrams

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
Where process models define the flow of activities of participants, choreographies describe interactions between participants. Within such interactions, the security and privacy related concepts of separation of duties and division of knowledge are important. The former specifies that no one person has the privileges to misuse the system, either by error or fraudulent behavior, while the latter defines the absence of total knowledge within a single person, such that the knowledge cannot be abused. The problem is: how do we specify such concepts, and what kind of model is required to verify them? In this project, we ask the student to devise an approach to formally specify and verify these concepts given a BPMN Choreography Diagram.

Allowing Variability while Aligning Business Process executions

Supervisor: Heerko Groefsema.
Status: available.
Date: 31/10/2022.
The practice of checking conformance of business process models has revolutionized the industry through the amount of insight it creates into the actual process flows of businesses. Conformance checking entails matching an event log (which details events of past executions) against a business process model (which details the prescribed process flow) through a so-called alignment. Any deviation from the prescribed process flow is detected and reported. The problem, however, is that deviations more often than not are simple practical solutions to day-to-day business problems and are not malicious. Detailing each such deviation in the prescribed process flow is not practical, as it would create an immensely complicated process model with redundant duplicate tasks. In such cases, we can employ so-called variability to define a family of processes, i.e., a set of processes that describe different but related process models. The challenge is then to obtain alignments from process families. In this project, we ask the student to solve this challenge by adapting and combining existing approaches.
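As a simplified illustration, the cost of aligning a logged trace against a single model run can be computed as an edit distance in which synchronous moves are free and each move-on-log or move-on-model costs one (a sketch only; real conformance checking aligns against the whole model, and here label substitution is made too expensive to ever be chosen):

```python
def alignment_cost(trace, model_run):
    """Edit-distance alignment between an observed trace and one model
    run: synchronous moves cost 0, log-only/model-only moves cost 1."""
    n, m = len(trace), len(model_run)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i          # i moves on log only
    for j in range(m + 1):
        d[0][j] = j          # j moves on model only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sync = 0 if trace[i - 1] == model_run[j - 1] else 2
            d[i][j] = min(d[i - 1][j] + 1,         # move on log only
                          d[i][j - 1] + 1,         # move on model only
                          d[i - 1][j - 1] + sync)  # synchronous move
    return d[n][m]

print(alignment_cost(["a", "b", "c"], ["a", "c"]))  # 1: 'b' is a move on log
```

For a process family, the open question is how to compute such alignments without enumerating every member process.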


Make a difference in Energy Transition with Machine Learning

Contact: Viktoriya Degeler or Frank Blaauw.
Status: available.
Location: eWEning star.
Date: 01/06/2021.
eWEning star is a “fresh from the oven” start-up that is currently developing a discovery tool serving stakeholders in the renewable energy sector with relevant scientific information on renewable energy. Currently, people in this sector use keyword-based search queries to find scientific papers and reports, but with eWEning star’s concept these papers are smartly categorized, saving users a great deal of time and frustration. By making the search process more efficient, we can speed up the energy transition towards renewables! Currently, we have around 900 documents that are manually categorized in three different ways: (i) perspective, (ii) position in the value chain, and (iii) geographical location. Combined, these make up 15 categories. Depending on the length of your internship, it is possible to work on all of these, or to choose one of the three. While this manual approach is feasible for a small number of papers, it does not scale well. Our aim is to apply machine learning to improve this process. We expect that machine learning can provide a fast solution for categorizing already published papers according to the eWEning star concept. You are given the freedom to design, develop and test a process that leads to automated categorization. You have a background in data science and/or computer science, and a natural curiosity for solving problems. You are not afraid to ask questions when you hit a wall, but are capable of working independently. Some entrepreneurial mentality is a benefit, as eWEning star is a start-up, and good communication skills are needed when working with the non-technical founder.
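A minimal baseline for such document categorization could be a bag-of-words nearest-centroid classifier; the sketch below uses invented documents and category names purely for illustration (eWEning star's actual categories and data are different):

```python
from collections import Counter

def train_centroids(labelled_docs):
    """Represent each category by the pooled word counts of its
    training documents."""
    centroids = {}
    for text, label in labelled_docs:
        centroids.setdefault(label, Counter()).update(text.lower().split())
    return centroids

def classify(text, centroids):
    """Assign the category whose centroid shares the most word occurrences
    with the document."""
    words = Counter(text.lower().split())
    def overlap(label):
        return sum(min(n, centroids[label][w]) for w, n in words.items())
    return max(centroids, key=overlap)

# Hypothetical labelled documents
docs = [("offshore wind turbine capacity", "technology"),
        ("feed-in tariff subsidy policy", "policy")]
centroids = train_centroids(docs)
print(classify("wind turbine report", centroids))  # technology
```

A real solution would likely use TF-IDF features or a pretrained language model, but even a baseline like this gives a yardstick to beat.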

Large scale data quality monitoring IT-solution

Contact: Bram van der Waaij.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
TNO is performing research on how to increase the trustworthiness of operational data-driven services. Key to this is continuously guarding the quality of the incoming data and checking whether the data still fits the requirements of the analysis model. Each incoming data stream must be evaluated using many different quality metrics at the same time. Quality metrics can be simple min/max evaluations, more complex distribution matching (e.g. against a normal distribution), or advanced fitting against the original training data set of the guarded model. The challenge of this assignment is to a) design and implement a scalable data quality monitoring tool which can continuously be adapted to (sensor) data input changes, model requirement changes and/or quality metric updates, and b) design and implement specific quality metrics for the TNO project related to your assignment. Prerequisites: understanding of AI and scalable event processing. An internship compensation is provided.
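Two of the simpler metric families mentioned above can be sketched as follows (the thresholds, window and sensor data are hypothetical, and real distribution matching would use a proper statistical test):

```python
import statistics

def range_check(values, lo, hi):
    """Simple min/max quality metric: fraction of readings inside [lo, hi]."""
    inside = sum(1 for v in values if lo <= v <= hi)
    return inside / len(values)

def drift_check(values, train_mean, train_stdev, z=3.0):
    """Crude distribution-matching metric: flag drift when the stream mean
    moves more than z training standard deviations from the training mean."""
    return abs(statistics.mean(values) - train_mean) > z * train_stdev

stream = [20.1, 19.8, 20.5, 21.0, 55.0]  # hypothetical sensor window
print(range_check(stream, 0, 50))         # 0.8: one outlier above 50
print(drift_check(stream, train_mean=20.0, train_stdev=0.5))  # True
```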

Large-scale indexing for image hashes

Contact: Alexander Lazovik or Mathijs Homminga.
Status: available.
Location: Web-IQ.
Date: 01/01/2021.
Facebook has open-sourced its TMK+PDQF video hashing and PDQ image hashing algorithms (article, pdf). Web-IQ is supporting Law Enforcement Agencies and NGOs in their fight against online child exploitation. We want to add the PDQ hashing algorithm to our image matching services. Computing PDQ hashes from images is straightforward, but indexing these hashes to support efficient search (by Hamming distance) in millions of hashes is a different story. During this research project you will investigate, design, build, evaluate and compare different methods to index bit vectors at large scale. You will have the opportunity to work with (anonymised) real-world data, and via our partners your results will directly contribute to the fight against online child exploitation worldwide.
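One standard approach worth evaluating is multi-index hashing: split each 256-bit PDQ hash into bands and use exact per-band lookups to narrow the candidate set before computing exact Hamming distances. A toy in-memory sketch of the idea (not Web-IQ's implementation, and a real index would shard these tables):

```python
def bands(h, nbits=256, nbands=8):
    """Split an nbits-bit hash (as an int) into nbands equal substrings."""
    w = nbits // nbands
    mask = (1 << w) - 1
    return [(h >> (i * w)) & mask for i in range(nbands)]

class HammingIndex:
    """Multi-index hashing: by the pigeonhole principle, two hashes within
    Hamming distance d < nbands agree exactly on at least one band, so
    exact per-band lookups yield a small candidate set to verify."""

    def __init__(self, nbits=256, nbands=8):
        self.nbits, self.nbands = nbits, nbands
        self.tables = [{} for _ in range(nbands)]
        self.hashes = {}

    def add(self, key, h):
        self.hashes[key] = h
        for table, band in zip(self.tables, bands(h, self.nbits, self.nbands)):
            table.setdefault(band, set()).add(key)

    def query(self, h, max_dist):
        candidates = set()
        for table, band in zip(self.tables, bands(h, self.nbits, self.nbands)):
            candidates |= table.get(band, set())
        # verify candidates with an exact Hamming distance check
        return [k for k in candidates
                if bin(self.hashes[k] ^ h).count("1") <= max_dist]

idx = HammingIndex()
idx.add("cat.jpg", 0b1011)            # toy value; real PDQ hashes are 256-bit
print(idx.query(0b1010, max_dist=1))  # ['cat.jpg']
```

The research would compare such schemes against alternatives (e.g. BK-trees or vantage-point trees) on millions of real hashes.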

Model development when the data may not be shared

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and AI are having an ever-greater influence on our daily lives. People and companies are becoming increasingly aware of the potential of their data and of the impact of losing control over who is using it. Therefore, companies are no longer willing to share their (private, business-critical) data. Traditionally, a company with data would send it to another company that develops an analysis model (for example, a machine learning model). TNO is investigating the possibilities of developing models in an environment where data is not allowed to be freely transported. One of the solutions is to no longer bring the data to the analysis model (D2A), but to bring the analysis model to the data (A2D). This master student assignment is about investigating and building a prototype of an approach for developing analysis models in an A2D manner.

Dynamic on-the-fly switching between data sources for distributed Big Data analysis

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even in an always-online streaming process. In both cases the data is retrieved from one or more data sources, analyzed or transformed, and the result is written to a data "sink", such as another database or a message queue. At TNO we are looking into ways to update such long-running analysis processes at runtime, and part of that is updating the data sources: the longer a data analysis process runs, the more likely it is that new sources of data are introduced (think of user behavior data from a newly created part of a website, or sensor data from a new data provider) or that outdated data sources must be switched to newly created ones (think of switching from SQL to NoSQL). Your challenge is to develop a technical library that supports switching both streaming and historical data sources at runtime for distributed analysis platforms (for example Apache Spark). Knowledge of distributed systems (through the Distributed Systems, Scalable Computing and Web & Cloud Computing courses) is key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. An internship compensation is also provided.
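At its simplest, runtime switching needs an indirection layer between the analysis and its sources. A minimal single-process sketch (the class and API are invented for illustration; making this work for distributed, stateful sources on a platform like Spark is the actual challenge):

```python
import threading

class SwitchableSource:
    """Source wrapper that can be swapped at runtime: readers always go
    through read(), so replacing the underlying source takes effect on
    the next record without restarting the analysis."""

    def __init__(self, source):
        self._source = source
        self._lock = threading.Lock()

    def switch(self, new_source):
        with self._lock:
            self._source = new_source

    def read(self):
        with self._lock:
            return self._source()

src = SwitchableSource(lambda: "row-from-sql")
print(src.read())                      # row-from-sql
src.switch(lambda: "doc-from-nosql")   # e.g. migrate from SQL to NoSQL
print(src.read())                      # doc-from-nosql
```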

Runtime validation of software against constraints from context

Contact: Elena Lazovik or Toon Albers.
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even in an always-online streaming process. This data analysis is almost always done within different types of limitations: from users, from a business perspective, from hardware, and from the platforms on which the data analysis is running. At TNO we are looking into ways of verifying whether a running distributed analysis meets these limitations and requirements. We have some experience in working with constraints for IT systems. Your challenge would be to investigate and experiment with capturing the different kinds of constraints that can be defined on a system, and to develop a solution that can validate a running data analysis against these constraints. The validation against given constraints should happen at runtime whenever needed (for example, when new constraints are added). Knowledge of distributed systems (through the Scalable Computing course) and a good understanding of mathematics/logic are key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. An internship compensation is also provided.
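A minimal sketch of the idea (constraint names and metrics are invented): constraints as named predicates that are re-evaluated against the running system's current metrics whenever the metrics or the constraint set change:

```python
# Hypothetical constraints on a running analysis, as named predicates
constraints = {
    "latency_under_200ms": lambda m: m["latency_ms"] < 200,
    "memory_under_4gb":    lambda m: m["memory_gb"] < 4,
}

def validate(metrics, constraints):
    """Return the names of all constraints violated by the current
    metrics; call again whenever metrics change or constraints are added."""
    return [name for name, check in constraints.items()
            if not check(metrics)]

print(validate({"latency_ms": 250, "memory_gb": 2.1}, constraints))
# ['latency_under_200ms']
```

A real solution would need a richer constraint language (e.g. temporal constraints over metric histories) and distributed metric collection.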

Measures of Social Behaviour

Contact: Niels Jongs.
Status: available.
Location: UMCG.
Date: 01/01/2021.
Questionnaires are sensitive to several sources of noise and, above all, the moment-by-moment quantification of behaviour is impossible with questionnaires. To move away from these deficiencies we have developed a passive monitoring system based on ubiquitous smartphone technology. Due to the advances in technology, the World Economic Forum announced in February 2016 that the world is entering its Fourth Industrial Revolution, based on hyper-connectivity, data-driven solutions and artificial intelligence (World Economic Forum, 2016). Hyper-connectivity is characterised by a state of being constantly connected to individuals and machines through devices such as smartphones. Hyper-connectivity and large-scale data collection through smartphones are the fundamental elements of new technological initiatives in healthcare and biomedical research. These smartphone-based initiatives are largely driven by the fact that the number of sensors embedded in smartphones has exploded over the past few years. Nowadays the majority of smartphones are equipped with sensors such as GPS, accelerometer, gyroscope, Wi-Fi, Bluetooth, camera and microphone. These smartphones aggregate a large amount of user-related data which, in the context of research, remain largely untouched. Our ambition is to develop several objective measures of social behaviour using the data collected through our passive monitoring application. The objective quantification of social behaviour is important since the majority of psychiatric disorders affect social behaviour. In the context of a master thesis, we are looking for a master student with good knowledge of R to develop several of these measures related to social behaviour and to test them on data of psychiatric patients.

Passive Behavioural Monitoring

Contact: Martien Kas.
Status: available.
Location: University of Groningen: Behavioral Neuroscience.
Date: 01/01/2021.
Advances in low-power communication technologies and large-scale data processing continue to give rise to the concept of mobile healthcare systems as an integral part of clinical care and research processes. This project will focus on the data collected by a passive behavioural monitoring system in which personal mobile devices are used as a measuring instrument. The data mainly consists of sensor and activity data which might allow us to differentiate between healthy and non-healthy individuals. In this project, our aim is to establish behavioural profiles related to neuropsychiatric disorders using advanced data analysis and data mining techniques. These behavioural profiles are derived from the sensor and activity data collected by the passive behavioural monitoring system and are used to predict the onset or relapse of neuropsychiatric disorders. Additionally, we aim to translate these behavioural profiles to animal behavioural models, for which the data is collected in a controlled lab environment.

Flexible computing infrastructures

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.

Privacy-friendly context-aware services

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.

Interaction with devices in a household for the purpose of enabling smart grid services

Contact: Alexander or TNO directly (contact details in the PDF).
Status: available.
Location: TNO Groningen.
Date: 01/01/2021.
More information: pdf.


Have your own project suggestions?

We are available to supervise projects on various aspects of distributed systems, in particular involving

  • Service-Oriented and Cloud Computing
  • Pervasive Computing and Smart Environments
  • Network Centric Real-time Analytics
  • Energy Distribution Infrastructures
  • Adaptive Communication Middleware

If you have an idea of a specific project or would like to work generally in a specific area, please let us know about it and we can then narrow the project down.

Please feel free to contact us to discuss specific topics and options.