Distributed Systems

Available Projects

Many projects can be adjusted so that they fit the constraints of a Master's term, a Bachelor project, or an internship. Although some projects may have been available for some time, this does not mean they have become less relevant.

Internal Projects

  • Data Processing Pipeline in Water Management (Short Programming Project). In this programming project, part of the ECiDA (Evolutionary Changes in Data Analysis) project, we want to design and implement a pipeline with different versions for a water management use case. This could be, but is not limited to, finding leakage, monitoring water quality, and usage prediction. You don't have to design it yourself; you can also find an already working example and adjust it where needed. The use case should be a micro-service application containing different services that work together to produce the end result. Moreover, the application should be a streaming data processing pipeline. The data can be simulated or streamed from a batch dataset. You can develop the pipeline from scratch or use predefined libraries and implementations to build it. However, you should be able to apply changes to the pipeline. There is no programming language restriction for development. For containerization and orchestrating the cluster, we will use Kubernetes, which means that in the end the application should be deployed on a Kubernetes cluster. Contact: Viktoriya Degeler or Mostafa Hadadian.
  • Evaluating the Cost of Compensating Failures for In-Network Analytics (BSc). Many IoT applications depend on the fast detection of situational changes in order to adapt physical processes, e.g., a robot interacting with its environment, changing a manufacturing process, or planning the route of a vehicle. To detect and react to situational changes with low latency, modern communication networks offer limited support for executing data analytics inside the network, i.e., on the network path between the producers and consumers of data. For example, in-network processing can be accomplished by executing data analytics functions on programmable switch hardware with the P4 language or in data centers at the edge. A key problem addressed in this thesis proposal is that the computational models for in-network processing (INP) are limited. They usually comprise small buffers and processing at the packet header level so that received network packets can be forwarded at line rate. It is therefore particularly hard for INP analytics to compensate for the loss or out-of-order reception of data. This thesis will study methods for detecting inconsistencies that occur in data-parallel processing models due to out-of-order arrival, and methods to compensate for such inconsistencies. The thesis will propose and evaluate detection and compensation approaches that depend on important system aspects, e.g., the buffer length of networking functions, the time to compensate for failures, and the time to detect failures. Contact: Boris Koldehofe.
  • Reducing Redundant Exploration of Parallel Search Algorithms (BSc/Int/MSc). Finding the optimal solution to a Constraint Satisfaction Problem is an NP-hard problem, meaning that, in the worst case, the time complexity of a search algorithm grows exponentially with respect to the size of the problem. Many techniques have been developed to reduce the search space, including constraint propagation and pruning based on a bound on the cost. Besides that, parallel search algorithms can be used to explore multiple solutions in parallel, thereby potentially reducing the overall search time. However, parallel search algorithms may perform some redundant search, as it is not always possible to make the most effective use of pruning techniques. This redundant search may occur because, unlike sequential algorithms that can use the upper or lower bound on the cost to disregard subsequent solutions, a parallel algorithm may have already started to explore such solutions before the bound is known. The impact of this problem becomes considerably worse when the search is distributed and involves many local decisions without any global coordination, such as when the algorithm uses decomposition techniques. This project aims to explore the influence that this redundant search has on the overall execution time of the algorithm and possible techniques that can be used to avoid it, at least partially. Contact: Michel Medema.
  • Decomposition Techniques for Optimisation Problems (BSc/Int/MSc). Decomposition techniques try to divide a Constraint Satisfaction Problem into independent subproblems based on the dependencies that exist between the variables. The decomposed problem often has a lower worst-case complexity than the original problem, and finding a solution to the problem is generally faster. One such algorithm is Backtracking with Tree-Decomposition, which applies standard backtracking search on the decomposed version of a problem. However, this algorithm was originally designed to solve satisfaction problems rather than optimisation problems, meaning performance results are not available for optimisation problems. The evaluation of Backtracking with Tree-Decomposition on optimisation problems is the focus of this project, as well as comparing these results to other constraint solvers and decomposition techniques. Contact: Michel Medema.
  • Optimal Deployment of Actors in a Distributed Setting (BSc/Int/MSc). The actor model defines actors as the main primitives of concurrent computation, where actors can only communicate by exchanging messages, and each actor processes the received messages sequentially. This model of concurrent computation abstracts away low-level constructs that are normally used in concurrent applications, such as locks. Actors can also be deployed in a distributed environment, making it possible to utilise the resources of a number of machines. An important question that arises is how the actors should be distributed across the machines to optimise the performance of the application. Using a constraint solver as an example, this project aims to find the best strategy to distribute the actors over a cluster of machines such that the search time is minimised. Contact: Michel Medema.
  • Prediction of Energy Consumption and Usage of Appliances in Smart Environments (BSc/Int/MSc). Residential and office buildings are responsible for 30% of global energy consumption. Traditionally, space heating and hot water demand have been considered the main domestic energy loads. However, the need for electricity has grown significantly due to the increasing ownership of appliances, and, as a consequence, so has its environmental impact. Demand-side management programs aim to control the residential power demand in response to signals or incentive schemes; future peer-to-peer energy markets will involve small-scale producers and consumers in energy trading. Accurate energy predictions are required for optimal decision making. The goal of this project is to apply Artificial Intelligence and Machine Learning techniques to predict both the short-term and long-term energy consumption of households and individual appliances, as well as to extract user profiles from historical data. Contact: Michel Medema.
  • [GDBC] Smart Buildings and Digital Twins (BSc/Int/MSc). The Internet of Things (IoT) is a concept in which connected devices, or things, are able to communicate with each other without the need for human involvement. While this infrastructure is currently mainly used for collecting data and using these data as is, one can also imagine a world in which the combination of multiple devices is used to create so-called digital twins: digital representations of real-world objects (e.g., buildings). With these twins, one would essentially ask targeted questions to the twin regarding the state of its physical counterpart, aggregated from the different sensors, instead of inspecting the raw sensor values. This project will investigate the use of digital twins and algorithms to create them, and involves working on applications that deal with sensor data. References: [1] [2]. Contact: Viktoriya Degeler or Frank Blaauw.
  • Data science in water management (BSc/Int/MSc). The current water infrastructure in the northern part of the Netherlands generates a large amount of data. In this project, the student is asked to work on one (or more) data science projects related to water management. This includes, but is not limited to, leakage detection, usage prediction, anomaly detection, repair of missing data, and GIS analysis. Findings from this research might find their way back into the production systems of the water management company we work with. Contact: Viktoriya Degeler or Mostafa Hadadian.
  • Containers for analysis (BSc/Int/MSc). Containerization is currently a hot topic in software engineering. Many companies rewrite their traditional applications to make use of this new and promising technology. In data science, however, not much is currently being done with containerization. In this project the student will compare different container types (e.g., Docker, RKT, Singularity) and analyse which one works best for the purpose of data analysis. Furthermore, in an ongoing project in which containers will be used for performing data science, the student can help with implementing the optimal container platform. A direction for this research might be Serverless. Serverless computing is a new trend in which tiny applications are spawned whenever they are needed. Such architectures tend to scale well, and potentially scale to zero running instances, whenever a good scheduling service is available. With a large number of services available, composing higher-level applications (consisting of serverless functions) can be a challenge. In this project the student will look into the use of reinforcement learning for creating planned pipelines in serverless architectures. The project consists of a literature study and an implementation component. Scientific references: [1][2][3] Software references: [4][5][6] Contact: Frank Blaauw.
  • Verification of process centric software (BSc/Int). Verification entails proving or disproving the correctness of a system model with respect to its specification. One approach towards verification is model checking, which has been used to verify complex systems ranging from electronic circuits to communication protocols. In our research, we use model checking to verify possible executions of process models against sets of formal specifications using a custom package based on Petri nets. The package is included in the widely used Apromore process analytics platform as a plugin. In this project, the student is asked to contribute to this package by researching and adding support for additional types of process definitions through either a translation to Petri nets or a direct verification. Contact: Heerko Groefsema.
  • Model checking process centric software (MSc). When model checking, a system model is automatically, systematically, and exhaustively explored while each explored state is verified for compliance with a formal specification. In our research, we use model checking to verify possible executions of process models using a custom package based on Petri nets. The package is included in the widely used Apromore process analytics platform as a plugin. Current versions of the package use an external model checker for the verification process. In this project, the student is asked to contribute to this package by researching and adding support for select model checking or satisfiability techniques. Contact: Heerko Groefsema.
  • Specification design for verification (MSc). Verification entails proving or disproving the correctness of a system model with respect to its specification. Such specifications are often expressed using formal methods of mathematics such as temporal logics. Although such logics are perfect for formal verification, they are unintuitive and difficult to understand for the average user. In our research, we use formal verification processes to check possible executions of process models using a custom package. The package is included in the widely used Apromore process analytics platform as a plugin. In this project, the student is asked to contribute to this package by researching and implementing an Apromore plugin that visualizes the design of specifications. Contact: Heerko Groefsema.
  • Energy-efficient Data Center Models (BSc/Int/MSc). Decreasing energy consumption in data centers is a very important topic nowadays. This project will focus on translating key aspects of data center operation to workable data center models. The project features a collaboration with Target Holding/CIT, who manage the university data center. In this project you will talk with data center operators to identify operational processes and key parameters, and then translate those into tools that can be used for predicting and modeling data center behavior. As such, this is a unique opportunity to get a look behind the scenes of data center operation. For this project you will cooperate with the SMS-ENTEG group. More info: (pdf) Contact: Alexander or Tobias van Damme.
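To make the cost-bound pruning mentioned in the parallel search project above more concrete, here is a minimal Python sketch. It is not the solver used in the project: the function name branch_and_bound, the layout of the costs table, and the use_bound flag are all assumptions made for this illustration. Switching the bound off mimics a worker that starts exploring before any bound is known, which is one way redundant search can arise.

```python
from typing import List, Tuple


def branch_and_bound(costs: List[List[int]], use_bound: bool = True) -> Tuple[float, int]:
    """Pick one value per variable so that the total cost is minimal.

    costs[i][v] is the non-negative cost of giving variable i value v
    (a hypothetical toy problem, chosen only for illustration).
    Returns (best_cost, number_of_search_nodes_visited).
    """
    n = len(costs)
    best = float("inf")   # best complete solution found so far (the bound)
    visited = 0

    def search(i: int, partial: int) -> None:
        nonlocal best, visited
        visited += 1
        # Prune: costs are non-negative, so this branch cannot beat the bound.
        if use_bound and partial >= best:
            return
        if i == n:
            best = min(best, partial)
            return
        for c in costs[i]:
            search(i + 1, partial + c)

    search(0, 0)
    return best, visited


if __name__ == "__main__":
    toy = [[1, 5], [2, 6], [3, 7]]
    print(branch_and_bound(toy, use_bound=True))
    print(branch_and_bound(toy, use_bound=False))
```

Both calls find the same optimum, but the run without the bound visits every node of the search tree. The gap between the two node counts is a rough proxy for the redundant work the project description refers to.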

External Projects

  • Make a difference in Energy Transition with Machine Learning (intern needed). eWEning star is a “fresh from the oven” Start-Up, which is currently developing a discovery tool that serves stakeholders in the renewable energy sector with relevant scientific information regarding renewable energy. Currently people in this sector use keyword-based search queries in order to find scientific papers and reports, but with eWEning star’s concept, these papers are smartly categorized, saving users a lot of time and frustration. By making the search process more efficient we can make the energy transition towards renewables faster! Currently we have around 900 documents that are manually categorized in three different ways: (i) perspective, (ii) position in value chain, and (iii) geographical location. Combined, we have created 15 categories. Depending on the length of your internship, it is possible to work on all of these, or to choose one of the three options. While this manual approach is feasible for a small number of papers, it does not scale well. Our aim is to apply Machine Learning to improve this process. We expect that machine learning can provide us with a fast solution for categorizing already published papers according to the eWEning star concept. You are given the freedom to design, develop and test a process which leads to the automated categorization. You have a background in Data Science and/or computer science, and you have a natural curiosity for solving issues. You aren’t afraid to ask questions if you seem to “hit the wall”, but are capable of working independently. Some entrepreneurial mentality is a benefit, as eWEning star is a Start-Up. Good communication skills are needed for working with the non-technical founder. Contact: Viktoriya Degeler or Frank Blaauw.

  • Large scale data quality monitoring IT-solution (MSc). TNO is performing research on how to increase the trustworthiness of operational data-driven services. Key to this is continuously guarding the quality of the incoming data and checking whether the data still fits the requirements of the analysis model. Each incoming data stream must be evaluated using many different quality metrics at the same time. Quality metrics can be simple min/max evaluations, more complex distribution matching (e.g., against a normal distribution), or advanced fitting against the original training data set of the guarded model. The challenge of this assignment is to a) design and implement a scalable data quality monitoring tool which can continuously be adapted to (sensor) data input changes, model requirement changes and/or quality metric updates, and b) design and implement specific quality metrics for the TNO project related to your assignment. Prerequisites: understanding of AI and scalable event processing. You will receive some money as an intern in the company. Location: TNO Groningen. Contact: Bram van der Waaij.
  • In-company internship (Int/Short programming). Researchable B.V. is a small startup located in Groningen. They aim to improve science by developing software in the early phases of research projects (e.g. developing software to collect data, or automate other parts of research) and at the final phase of research projects (i.e., the valorisation of research). During this internship, the student will be part of the Researchable team, and work on various projects that they are currently running. Their office is located on Zernike. Contact: Frank Blaauw.
  • [GDBC] Large scale indexing for image hashes (BSc/Int/MSc). Facebook has open sourced its TMK+PDQF video hashing and PDQ image hashing algorithms (article, pdf). Web-IQ is supporting Law Enforcement Agencies and NGOs in their fight against online child exploitation. We want to add the PDQ hashing algorithm to our image matching services. Computing PDQ hashes from images is straightforward, but indexing these hashes to support efficient search (by Hamming distance) in millions of hashes is a different story. During this research project you will investigate, design, build, evaluate and compare different methods to index bit vectors at large scale. You will have the opportunity to work with (anonymised) real-world data, and via our partners your results will directly contribute to the fight against online child exploitation worldwide. Contact: Alexander Lazovik or Mathijs Homminga.
  • Model development when the data may not be shared (MSc). Big Data and AI are becoming a bigger and bigger influence on our daily life. People and companies are becoming increasingly aware of the potential of their data and of the impact of losing control over who uses their data. Therefore, companies are no longer willing to share their (private, business-critical) data. Traditionally, a company with data would send their data to another company that is developing an analysis model (for example, a Machine Learning model). TNO is investigating the possibilities of developing models in an environment where data is not allowed to be freely transported. One of the solutions is to no longer bring the data to the analysis model (D2A), but to bring the analysis model to the data (A2D). This master student assignment is about investigating and building a prototype of an approach to develop analysis models in an A2D manner. Contact: Elena Lazovik or Toon Albers.
  • Dynamic on-the-fly switching between data sources for distributed Big Data analysis (MSc). Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even with an always-online streaming process. In both of these cases the data is retrieved from one or more data sources and analyzed or transformed, which results in output to a data "sink", such as another database or a message queue. At TNO we are looking into ways to update such long-running analysis processes at runtime, and part of that is updating the data sources: the longer a data analysis process is running, the more likely it is that new sources of data are introduced (think of user behavior data from a newly created part of a website, or sensor data from a new data provider) or that outdated data sources must be switched to newly created sources (think of switching from SQL to NoSQL). Your challenge is to develop a technical library that supports the switching of both streaming and historical data sources for distributed analysis platforms (for example Apache Spark) at runtime. Knowledge of distributed systems (through the Distributed Systems, Scalable Computing and Web & Cloud Computing courses) is key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. Internship compensation is also provided. Contact: Elena Lazovik or Toon Albers.
  • Runtime validation of software against constraints from context (MSc). Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even with an always-online streaming process. This data analysis is almost always done within different types of limitations: from users, from a business perspective, from hardware, and from the platforms on which the data analysis is running. At TNO we are looking into ways of verifying whether a running distributed analysis meets these limitations and requirements. We have some experience in working with constraints for IT systems. Your challenge would be to investigate and experiment with capturing the different kinds of constraints that can be defined on a system, and to develop a solution that can validate a running data analysis against these constraints. The validation against given constraints should happen at runtime when it is needed (for example, when new constraints are added). Knowledge of distributed systems (through the Scalable Computing course) and a good understanding of mathematics/logic are key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. Internship compensation is also provided. Contact: Elena Lazovik or Toon Albers.
  • Measures of Social Behaviour (BSc/Int/MSc). Questionnaires are sensitive to several sources of noise and, above all, make the moment-by-moment quantification of behaviour impossible. To manoeuvre away from these deficiencies we have developed a passive monitoring system that is based on the ubiquity of smartphone technology. Due to the advances in technology, the World Economic Forum announced in February 2016 that the world is entering its Fourth Industrial Revolution, based on hyper-connectivity, data-driven solutions and artificial intelligence (World Economic Forum, 2016). Hyper-connectivity is characterised by a state of being constantly connected to individuals and machines through devices such as smartphones. Hyper-connectivity and large-scale data collection through smartphones are the fundamental elements of new technological initiatives in healthcare and biomedical research. These smartphone-based technological initiatives are largely due to the fact that the number of sensors embedded in smartphones has exploded over the past few years. Nowadays the majority of smartphones are equipped with sensors such as GPS, accelerometer, gyroscope, Wi-Fi, Bluetooth, camera and microphone. These smartphones aggregate a large amount of user-related data which remain largely untouched in the context of research. Our ambition is to develop several objective measures of social behaviour by using the data collected through our passive monitoring application. The objective quantification of social behaviour is important since the majority of psychiatric disorders affect social behaviour. In the context of a master thesis, we would like a master student with good knowledge of R to develop several of these measures that are related to social behaviour and test these measures on data of psychiatric patients. Contact: Niels Jongs.
  • Passive Behavioural Monitoring (MSc). Advances in low-power communication technologies and large-scale data processing continue to give rise to the concept of mobile healthcare systems as an integral part of clinical care/research processes. This project will focus on the data that is collected by a passive behavioural monitoring system in which personal mobile devices are used as a measuring instrument. The data mainly consists of sensor and activity data which might allow us to differentiate between healthy and non-healthy individuals. In this project, our aim is to establish behavioural profiles which are related to neuropsychiatric disorders by using advanced data analysis and data mining techniques. These behavioural profiles are derived from the sensor and activity data collected from a passive behavioural monitoring system and are used to predict the onset or relapse of neuropsychiatric disorders. Additionally, our aim is to translate these behavioural profiles to animal behavioural models of which the data is collected in a controlled lab environment. Contact: Martrien Kas.
  • Flexible computing infrastructures (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
  • Privacy-friendly context-aware services (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
  • Interaction with devices in a household for the purpose of enabling smart grid services (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
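For the image-hash indexing project listed above, the following Python sketch shows what a Hamming-distance search means in its simplest form: a linear scan over all stored hashes. It is only a baseline that an index would have to beat, not Web-IQ's implementation. The names hamming and linear_search and the representation of hashes as Python integers are assumptions made for this example; PDQ hashes are 256-bit vectors, but the code works for any bit width.

```python
from typing import Iterable, List


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def linear_search(index: Iterable[int], query: int, max_distance: int) -> List[int]:
    """Return every stored hash within max_distance bits of the query.

    This visits all stored hashes, so the cost grows linearly with the
    size of the collection. A real index would avoid the full scan.
    """
    return [h for h in index if hamming(h, query) <= max_distance]


if __name__ == "__main__":
    stored = [0b0000, 0b0011, 0b1111]   # toy 4-bit hashes, for illustration only
    print(linear_search(stored, query=0b0001, max_distance=1))
```

With millions of hashes, a full scan per query becomes the bottleneck, which is why the project asks for indexing methods over bit vectors.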