Distributed Systems

Available Projects

Many projects can be adjusted so that they fit the constraints of a Master's term, a Bachelor project, or an internship. Although some projects may have been available for some time, this does not mean they have become less relevant.

Internal Projects

  • Evaluating the Cost of Compensating Failures for In-Network Analytics (BSc). Many IoT applications depend on the fast detection of situational changes in order to adapt physical processes, e.g., a robot interacting with its environment, an adjustment to a manufacturing process, or the route planning of a vehicle. To detect and react to situational changes with low latency, modern communication networks offer limited support for executing data analytics inside the network, i.e., on the network path between the producers and the consumers of data. For example, in-network processing can be realised by executing data analytics functions on programmable switch hardware using the P4 language, or in data centers at the edge. A key problem addressed by this thesis proposal is that the computational models for in-network processing (INP) of data are limited: they usually comprise small buffers and operate at the level of packet headers in order to forward received network packets at line rate. It is therefore particularly hard for INP analytics to compensate for the loss or out-of-order reception of data. This thesis will study methods for detecting inconsistencies in data-parallel processing models that arise from out-of-order arrival, as well as methods to compensate for such inconsistencies. The thesis will propose and evaluate detection and compensation approaches with respect to important system aspects, e.g., the buffer length of networking functions, the time to compensate for failures, and the time to detect failures. Contact Boris Koldehofe
  • Indoor movement tracking (BSc/Int/MSc). Technology for tracking the indoor movement of people is important and applicable in diverse fields. It can improve navigation in airports, train stations, museums, malls, etc. It can also help facility managers analyse how people move through their facilities and identify related problems and potential improvements. The retail industry, in particular, can improve the positioning and merchandising of goods in supermarkets based on the paths that people take inside the shop. More importantly, a retail setting allows for a relatively easy IoT setup for path tracking, because transmitting devices can be incorporated into shopping baskets and trolleys, which preserves the full anonymity of the people who use them. This project aims to construct an IoT system (e.g., using Bluetooth Low Energy beacons and a smartphone as a receiver) that can track the movement of objects within an indoor environment, map it onto a floor plan, and provide insights based on the collected data. Such insights include: the most common paths between two points; “points of interest” (e.g., places where tracked objects, and thus the people holding them, spend extra time), contrasted with “transit locations” (e.g., hallways through which people move without stopping much); and related heat maps, both for movements and for intermediate stopping locations. Contact Viktoriya Degeler
  • Energy consumption pattern profiling and similarity inference (BSc/Int/MSc). When analysing the energy consumption of households, among the most important questions are the prediction of consumption and the profiling of consumption patterns. What types of consumption profiles exist, and can we explain them through the typical habits and behavior of the people in the household? Are there repeated consumption patterns that we can extract from historical data? Can we find similarities between those patterns and connect them to external factors, such as the calendar (e.g., work days vs. weekends) or the weather? Can we find periodicity in these patterns? This project aims to answer these questions by analysing a dataset of historical household energy consumption. Contact Viktoriya Degeler
  • Detection of used devices from energy consumption patterns (BSc/Int/MSc). In smart automated buildings, activity recognition helps to describe the contextual situation of the environment, which the smart system uses to decide which automated actions to perform. The operational status of household devices (e.g., on/off or standby mode for most devices, the light level for dimmable lights, the air-conditioning mode, etc.) is a heavily used feature for activity recognition. Each type of household device has its own characteristic energy consumption signature (its consumption pattern over time), and a trained classifier can easily recognize a device from its signature. In many cases, however, a house only has power meters at the level of a room or a floor, not at the level of individual devices. Extracting the status of particular devices from an aggregated energy consumption pattern is therefore a very interesting problem. This project aims to use labeled historical energy consumption data to build a classification system that can recognize individual devices from an aggregated consumption pattern. Contact Viktoriya Degeler
  • Mobility traces analysis (BSc/Int/MSc). Studying the mobility patterns of pedestrians or cars can provide considerable insight into the operation of a smart city. Typical questions include building a heat map of the most commonly visited places, finding commonly travelled routes, and detecting how patterns change over time, between day types (weekend/weekday), or with the time of day. This project aims to construct a system that can answer these types of questions and to investigate the patterns in a historical GPS dataset. Contact Viktoriya Degeler
  • Reducing Redundant Exploration of Parallel Search Algorithms (BSc/Int/MSc). Deciding whether a constraint satisfaction problem has a solution is NP-complete, and finding an optimal solution is at least as hard, so in the worst case the running time of known search algorithms grows exponentially with the size of the problem. Many techniques have been developed to reduce the search space, including constraint propagation and pruning based on a bound on the cost. In addition, parallel search algorithms can explore multiple solutions simultaneously, potentially reducing the overall search time. However, parallel search algorithms may perform redundant search, because they cannot always make the most effective use of pruning techniques. Unlike sequential algorithms, which can use the upper or lower bound on the cost to disregard subsequent solutions, a parallel algorithm may already have started exploring such solutions before the bound is known. This problem becomes considerably worse when the search is distributed and involves many local decisions without global coordination, for example when the algorithm uses decomposition techniques. This project aims to explore the influence of this redundant search on the overall execution time of the algorithm, as well as techniques that can be used to avoid it, at least partially. Contact Michel Medema
  • Decomposition Techniques for Optimisation Problems (BSc/Int/MSc). Decomposition techniques try to divide a constraint satisfaction problem into independent subproblems based on the dependencies that exist between the variables. The decomposed problem often has a lower worst-case complexity than the original problem and can therefore generally be solved faster. One such algorithm is Backtracking with Tree-Decomposition, which applies standard backtracking search to the decomposed version of a problem. However, this algorithm was originally designed for satisfaction problems rather than optimisation problems, so no performance results are available for optimisation problems. This project focuses on evaluating Backtracking with Tree-Decomposition on optimisation problems and comparing the results with other constraint solvers and decomposition techniques. Contact Michel Medema
  • Dynamic Programming with Limited Memory Consumption for Backtracking Search (BSc/Int/MSc). When solving constraint satisfaction problems, standard backtracking search algorithms often solve the same subproblem more than once, which unnecessarily increases the overall computation time. Dynamic programming techniques can record the solutions to all subproblems encountered during the search, so that a recorded solution can be returned whenever the same subproblem is encountered again, without solving it anew. Unfortunately, the number of solutions that have to be recorded, and therefore the amount of memory required to store them, is in the worst case exponential in the size of the problem, which generally makes this infeasible in practice. Nonetheless, it is possible to record a fixed number of solutions during the search, which places an upper bound on the required memory. Several criteria can be used to decide which subproblems' solutions to record, for example, keeping those that are encountered most frequently. The goal of this project is to try different strategies, analyse their impact on performance, and explore the influence of the number of recorded solutions. Contact Michel Medema
  • Optimal Deployment of Actors in a Distributed Setting (BSc/Int/MSc). The actor model defines actors as the main primitives of concurrent computation: actors communicate only by exchanging messages, and each actor processes its received messages sequentially. This model abstracts away low-level constructs, such as locks, that are normally used in concurrent applications. Actors can also be deployed in a distributed environment, making it possible to use the resources of multiple machines. An important question is then how the actors should be distributed across the machines to optimise the performance of the application. Using a constraint solver as an example, this project aims to find the best strategy for distributing actors over a cluster of machines such that the search time is minimised. Contact Michel Medema
  • Prediction of Energy Consumption and Usage of Appliances in Smart Environments (BSc/Int/MSc). Residential and office buildings are responsible for 30% of global energy consumption. Traditionally, space heating and hot water have been considered the main domestic energy loads. However, electricity demand, and with it its environmental impact, has grown significantly due to the increasing ownership of appliances. Demand-side management programs aim to control residential power demand in response to signals or incentive schemes, and future peer-to-peer energy markets will involve small-scale producers and consumers in energy trading. Both require accurate energy predictions for optimal decision making. The goal of this project is to apply Artificial Intelligence and Machine Learning techniques to predict both the short-term and long-term energy consumption of households and individual appliances, and to extract user profiles from historical data. Contact Michel Medema
  • [GDBC] Smart Buildings and Digital Twins (BSc/Int/MSc). The Internet of Things (IoT) is a concept in which a system of connected devices, or things, can communicate with each other without human involvement. While this infrastructure is currently used mainly to collect data and use those data as is, one can also imagine combining multiple devices to create so-called digital twins: digital representations of real-world objects (e.g., buildings). With such a twin, one could ask targeted questions about the state of its physical counterpart, aggregated from the different sensors, instead of inspecting the raw sensor values. This project will investigate the use of digital twins and algorithms to create them, and work on applications that deal with sensor data. References: [1] [2]. Contact Viktoriya Degeler or Frank Blaauw.
  • Data science in water management (BSc/Int/MSc). The water infrastructure in the northern part of the Netherlands generates a large amount of data. In this project, the student works on one or more data science projects related to water management, including, but not limited to, leakage detection, usage prediction, anomaly detection, repair of missing data, and GIS analysis. Findings from this research might find their way back into the production systems of the water management company we work with. Please contact Frank if you would like to hear more about the available use cases. Contact: Frank Blaauw.
  • Containers for analysis (BSc/Int/MSc). Containerization is currently a hot topic in software engineering, and many companies are rewriting their traditional applications to make use of this promising technology. In data science, however, containerization is not yet widely used. In this project, the student will compare different container types (e.g., Docker, rkt, Singularity) and analyse which one works best for data analysis. Furthermore, the student can help implement the best-suited container platform in an ongoing project in which containers will be used for performing data science. One possible direction for this research is serverless computing, a recent trend in which tiny applications are spawned whenever they are needed. Such architectures tend to scale well and can potentially scale down to zero running instances, provided a good scheduling service is available. With a large number of services available, composing higher-level applications out of serverless functions can be a challenge. In this project, the student will look into the use of reinforcement learning for planning pipelines in serverless architectures. The project consists of a literature study and an implementation component. Scientific references: [1][2][3] Software references: [4][5][6] Contact: Frank Blaauw.
  • Verification of process centric software (BSc/Int). Verification entails proving or disproving the correctness of a system model with respect to its specification. One approach towards verification is model checking, which has been used to verify complex systems ranging from electronic circuits to communication protocols. In our research, we use model checking to verify possible executions of process models against sets of formal specifications using a custom package based on Petri nets. The package is included in the widely used Apromore process analytics platform as a plugin. In this project, the student is asked to contribute to this package by researching and adding support for additional types of process definitions through either a translation to Petri nets or a direct verification. Contact: Heerko Groefsema.
  • Model checking process centric software (MSc). In model checking, a system model is explored automatically, systematically, and exhaustively, and each explored state is verified for compliance with a formal specification. In our research, we use model checking to verify possible executions of process models using a custom package based on Petri nets. The package is included in the widely used Apromore process analytics platform as a plugin. Current versions of the package use an external model checker for the verification process. In this project, the student is asked to contribute to this package by researching and adding support for selected model checking or satisfiability techniques. Contact: Heerko Groefsema.
  • Specification design for verification (MSc). Verification entails proving or disproving the correctness of a system model with respect to its specification. Such specifications are often expressed in formal mathematical languages such as temporal logics. Although such logics are well suited to formal verification, they are unintuitive and difficult to understand for the average user. In our research, we use formal verification to check possible executions of process models using a custom package. The package is included in the widely used Apromore process analytics platform as a plugin. In this project, the student is asked to contribute to this package by researching and implementing an Apromore plugin that visualizes the design of specifications. Contact: Heerko Groefsema.
  • Energy-efficient Data Center Models (BSc/Int/MSc). Decreasing energy consumption in data centers is an important topic today. This project will focus on translating key aspects of data center operation into workable data center models. The project features a collaboration with Target Holding/CIT, who manage the university data center. You will talk with data center operators to identify operational processes and key parameters, and then translate those into tools that can be used for predicting and modeling data center behavior. This is a unique opportunity to look behind the scenes of data center operation. For this project, you will cooperate with the SMS-ENTEG group. More info: (pdf) Contact: Alexander or Tobias van Damme.
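To make the bounded-memory idea from the Dynamic Programming with Limited Memory Consumption project more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the group's solver: the toy problem (assign 0 or 1 to n variables so that no two adjacent variables are both 1), the subproblem key (position, previous value), and the names `BoundedMemo` and `count_solutions` are all hypothetical. The memo table records at most `capacity` subproblem solutions and, when full, evicts the entry that has been looked up least often, which is one of the possible retention criteria mentioned in the project description.

```python
from collections import Counter


class BoundedMemo:
    """Hypothetical memo table that records at most `capacity` subproblem results.

    Eviction policy (an assumption for illustration): when full, drop the
    recorded entry whose key has been looked up least often.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}           # subproblem key -> recorded solution count
        self.lookups = Counter()  # how often each key has been requested
        self.solved = 0           # number of subproblems actually searched

    def lookup(self, key):
        self.lookups[key] += 1
        if key in self.store:
            return True, self.store[key]
        return False, None

    def record(self, key, value):
        if self.capacity <= 0:
            return  # no memory budget: behaves like plain backtracking
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda k: self.lookups[k])
            del self.store[victim]
        self.store[key] = value


def count_solutions(n, memo, pos=0, prev=0):
    """Count 0/1 assignments to n variables with no two adjacent 1s."""
    if pos == n:
        return 1  # a complete, consistent assignment
    key = (pos, prev)  # the remaining subproblem depends only on these two values
    found, value = memo.lookup(key)
    if found:
        return value  # reuse a recorded solution instead of searching again
    memo.solved += 1
    total = 0
    for v in (0, 1):
        if prev == 1 and v == 1:
            continue  # constraint violated: prune this branch
        total += count_solutions(n, memo, pos + 1, v)
    memo.record(key, total)
    return total


if __name__ == "__main__":
    for capacity in (0, 4, 64):
        memo = BoundedMemo(capacity)
        result = count_solutions(20, memo)
        print(f"capacity={capacity} solutions={result} subproblems searched={memo.solved}")
```

With `capacity` set to 0 the search degenerates to plain backtracking, while a capacity at least as large as the number of distinct subproblems (39 for n = 20 in this toy problem) solves every subproblem exactly once; capacities in between trade memory for repeated work, which is the kind of trade-off the project would measure on real constraint problems.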

External Projects

  • [GDBC] HackerOne. HackerOne, the global leader in Hacker-Powered security, is now looking for two interns to join us in Autumn 2020 (September 2020 - January 2021) in our Groningen Office (or remotely, depending on COVID-19). As a software engineer intern, you will spend between 3 and 6 months working on one of our special projects. You will work closely with an assigned software engineer mentor, who will be your main source of technical help and support for the duration of your internship. In addition to your mentor, you will work under the supervision of an engineering manager, who will support your journey at HackerOne, from engineering topics to project management to personal development. You will also be able to work with an assigned product manager, who can help you with prioritization, feasibility, deadlines, etc. And should you need it, we have designers to help polish your frontend work and get it ready to be productized (more information). Contact: Frank Blaauw or Bas Baalmans.
  • [GDBC - CURRENTLY NOT AVAILABLE] BuildinG - Transferring learned skills in construction (Int/MSc). Nowadays, one of the big problems in the construction industry is the transfer of experience from one project to the next. The common modus operandi in construction is to treat each project as new and unique, which in most cases leads to reinventing the wheel. A second reason is that construction projects follow a strongly linear, hierarchical, and highly juridicized procedure, which does not further the transmission of experience. A third is that, due to the long lead time of construction projects, it may take years before gained experience becomes common knowledge. As a solution to this problem, the company BuildinG has developed the Scholenbouwmaatje application, which aims to bring the right knowledge to the right people at the right time. This application serves as a tool to support the traditional development process (the way the construction industry currently works). However, the construction sector is currently shifting towards digitalization, which will entail a paradigm shift in the way the industry operates. In a more digital sector the process will change, but the issue of retrieving the right knowledge at the right time remains. The experience people possess consists of different groups of rules. Some of these rules have a logical character (as in traditional expert systems: if, then, else), while others are more stochastic in nature. Some rules could be automated (e.g., don't put sunshades in front of a ventilation grille), while others should be decided by the users themselves (e.g., can the windows be opened?). Some rules are simple and can easily be automated, while others span multiple parties and objects and require elaborate estimations of reality.
In these cases it might actually be easier to propose the rules to a human expert (e.g., the rule that no drains should be placed below a leaf-dropping tree). Contact: Frank Blaauw or Hanneke van Brakel.
  • [GDBC - CURRENTLY NOT AVAILABLE] BuildinG - The current state of the art in digitalisation of the construction sector in the SMEs in the Groningen area, and which actions should be taken to improve the digitalisation? (Int/MSc). In the coming months, three account managers of the Economic Board Groningen (EBG) will administer a questionnaire to approximately fifty companies to determine their position with respect to digitalisation in the construction industry. This initial questionnaire should be followed by a more thorough research project to determine each company's exact position with respect to digitalisation and its needs. Based on these research projects, a masterplan will be developed to promote and implement lifelong learning: which informative and educational activities are needed in the different layers of the organisation (on site, tactical, strategic) to make the transition towards digitalisation as easy as possible. We should also determine how SMEs can best be supported in this process (company visits, an SME office, intervision groups). In addition to this support, discussions are being held with industry leaders to get an idea of how large companies plan for the future, and to connect this to the plans of the government (e.g., the agenda for construction, natural gas free in 2030, circular building in 2050). Discussions will also be held with the top of the research sector. All these aspects combined should form the basis of an AI white paper: a yearly report that should provide guidance for making strategic policy decisions. In the proposed research, one or more students could participate in creating part of a plan for a strategic masterclass for SMEs. This masterclass can then be given at BuildinG. Contact: Frank Blaauw or Hanneke van Brakel.
  • Large scale data quality monitoring IT-solution (MSc). TNO is researching how to increase the trustworthiness of operational data-driven services. Key to this is continuously guarding the quality of the incoming data and checking whether the data still fits the requirements of the analysis model. Each incoming data stream must be evaluated using many different quality metrics at the same time. Quality metrics can be simple min/max evaluations, more complex distribution matching (e.g., against a normal distribution), or advanced fitting against the original training data set of the guarded model. The challenge of this assignment is to a) design and implement a scalable data quality monitoring tool that can continuously be adapted to changes in (sensor) data input, changes in model requirements, and/or updates to quality metrics, and b) design and implement specific quality metrics for the TNO project related to your assignment. Prerequisites: understanding of AI and scalable event processing. As an intern, you will receive an internship compensation from the company. Location: TNO Groningen. Contact: Bram van der Waaij.
  • In-company internship (Int/Short programming). Researchable B.V. is a small startup located in Groningen. They aim to improve science by developing software in the early phases of research projects (e.g., developing software to collect data, or automating other parts of research) and in the final phase of research projects (i.e., the valorisation of research). During this internship, the student will be part of the Researchable team and work on various projects that they are currently running. Their office is located on Zernike. Contact: Frank Blaauw.
  • [GDBC] Large scale indexing for image hashes (BSc/Int/MSc). Facebook has open-sourced its TMK+PDQF video hashing and PDQ image hashing algorithms (article, pdf). Web-IQ supports Law Enforcement Agencies and NGOs in their fight against online child exploitation. We want to add the PDQ hashing algorithm to our image matching services. Computing PDQ hashes from images is straightforward, but indexing these hashes to support efficient search by Hamming distance among millions of hashes is a different story. During this research project, you will investigate, design, build, evaluate, and compare different methods to index bit vectors at large scale. You will have the opportunity to work with (anonymised) real-world data, and via our partners your results will directly contribute to the fight against online child exploitation worldwide. Contact: Alexander Lazovik or Mathijs Homminga.
  • Model development when the data may not be shared (MSc). Big Data and AI have an increasingly large influence on our daily lives. People and companies are becoming increasingly aware of the potential of their data and of the impact of losing control over who uses it. As a result, companies are often no longer willing to share their (private, business-critical) data. Traditionally, a company with data would send its data to another company that develops an analysis model (for example, a Machine Learning model). TNO is investigating the possibilities of developing models in an environment where data may not be freely transported. One solution is to no longer bring the data to the analysis model (D2A), but to bring the analysis model to the data (A2D). This master's assignment is about investigating an approach for developing analysis models in an A2D manner and building a prototype of it. Contact: Elena Lazovik or Toon Albers
  • Dynamic on-the-fly switching between data sources for distributed Big Data analysis (MSc). Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even in an always-online streaming process. In both cases, data is retrieved from one or more data sources and analyzed or transformed, and the output goes to a data "sink", such as another database or a message queue. At TNO we are looking into ways to update such long-running analysis processes at runtime, and part of that is updating the data sources: the longer a data analysis process runs, the more likely it is that new sources of data are introduced (think of user behavior data from a newly created part of a website, or sensor data from a new data provider) or that outdated data sources must be replaced by newly created ones (think of switching from SQL to NoSQL). Your challenge is to develop a technical library that supports switching both streaming and historical data sources at runtime for distributed analysis platforms (for example, Apache Spark). Knowledge of distributed systems (through the Distributed Systems, Scalable Computing, and Web & Cloud Computing courses) is key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. Internship compensation is also provided. Contact: Elena Lazovik or Toon Albers
  • Runtime validation of software against constraints from context (MSc). Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even in an always-online streaming process. This analysis is almost always subject to different types of limitations: those imposed by users, by the business, by the hardware, and by the platforms on which the analysis runs. At TNO we are looking into ways of verifying whether a running distributed analysis meets these limitations and requirements. We have some experience in working with constraints for IT systems. Your challenge would be to investigate and experiment with capturing the different kinds of constraints that can be defined on a system, and to develop a solution that can validate a running data analysis against these constraints. The validation should happen at runtime whenever it is needed (for example, when new constraints are added). Knowledge of distributed systems (through the Scalable Computing course) and a good understanding of mathematics/logic are key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. An internship compensation is also provided. Contact: Elena Lazovik or Toon Albers
  • Measures of Social Behaviour (BSc/Int/MSc). Questionnaires are sensitive to several sources of noise and, above all, cannot quantify behaviour moment by moment. To move away from these deficiencies, we have developed a passive monitoring system based on ubiquitous smartphone technology. Due to advances in technology, the World Economic Forum announced in February 2016 that the world is entering its Fourth Industrial Revolution, based on hyper-connectivity, data-driven solutions, and artificial intelligence (World Economic Forum, 2016). Hyper-connectivity is characterised by a state of being constantly connected to individuals and machines through devices such as smartphones. Hyper-connectivity and large-scale data collection through smartphones are the fundamental elements of new technological initiatives in healthcare and biomedical research. These smartphone-based initiatives are largely due to the rapid growth, over the past few years, in the number of sensors embedded in smartphones. Nowadays, the majority of smartphones are equipped with sensors such as GPS, an accelerometer, a gyroscope, Wi-Fi, Bluetooth, a camera, and a microphone. These smartphones aggregate a large amount of user-related data that remains largely untouched in research. Our ambition is to develop several objective measures of social behaviour using the data collected through our passive monitoring application. The objective quantification of social behaviour is important, since the majority of psychiatric disorders affect social behaviour. In the context of a master's thesis, we would like a master's student with good knowledge of R to develop several of these social behaviour measures and test them on data from psychiatric patients. Contact: Niels Jongs
  • Passive Behavioural Monitoring (MSc). Advances in low-power communication technologies and large-scale data processing continue to give rise to the concept of mobile healthcare systems as an integral part of clinical care and research processes. This project will focus on the data collected by a passive behavioural monitoring system in which personal mobile devices are used as measuring instruments. The data mainly consists of sensor and activity data, which might allow us to differentiate between healthy and non-healthy individuals. In this project, our aim is to establish behavioural profiles related to neuropsychiatric disorders by using advanced data analysis and data mining techniques. These behavioural profiles are derived from the sensor and activity data collected by the passive behavioural monitoring system and are used to predict the onset or relapse of neuropsychiatric disorders. Additionally, we aim to translate these behavioural profiles to animal behavioural models, for which data is collected in a controlled lab environment. Contact: Martrien Kas.
  • Flexible computing infrastructures (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
  • Privacy-friendly context-aware services (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
  • Interaction with devices in a household for the purpose of enabling smart grid services (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
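For the Large scale indexing for image hashes project, one established starting point for Hamming-distance search over bit vectors is multi-index hashing, which rests on the pigeonhole principle: if a 256-bit PDQ hash is split into m disjoint chunks and two hashes differ in at most r bits with r < m, then at least one chunk must be identical. The Python sketch below uses exact chunk lookups to collect candidates and then verifies the full distance. It is a minimal in-memory illustration assuming r < m, not Web-IQ's service, and the class name `ChunkIndex` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


class ChunkIndex:
    """Hypothetical multi-index hashing sketch for Hamming search over fixed-size hashes."""

    def __init__(self, bits=256, chunks=8):
        if bits % chunks != 0:
            raise ValueError("bits must be divisible by chunks")
        self.chunks = chunks
        self.width = bits // chunks
        self.mask = (1 << self.width) - 1
        # One hash table per chunk position: chunk value -> list of item ids.
        self.tables = [defaultdict(list) for _ in range(chunks)]
        self.items = []

    def _split(self, h):
        # Cut the hash into `chunks` disjoint slices of `width` bits each.
        return [(h >> (i * self.width)) & self.mask for i in range(self.chunks)]

    def add(self, h):
        item_id = len(self.items)
        self.items.append(h)
        for table, part in zip(self.tables, self._split(h)):
            table[part].append(item_id)
        return item_id

    def search(self, query, radius):
        """Return sorted (item_id, distance) pairs within `radius` bits of `query`."""
        if radius >= self.chunks:
            # Exact chunk lookups are only guaranteed complete when radius < chunks.
            raise ValueError("radius must be smaller than the number of chunks")
        candidates = set()
        for table, part in zip(self.tables, self._split(query)):
            candidates.update(table.get(part, ()))  # .get avoids creating empty buckets
        results = []
        for item_id in candidates:
            # Verify the candidate with the full Hamming distance.
            distance = bin(self.items[item_id] ^ query).count("1")
            if distance <= radius:
                results.append((item_id, distance))
        return sorted(results)


if __name__ == "__main__":
    rng = random.Random(7)
    index = ChunkIndex(bits=256, chunks=8)
    hashes = [rng.getrandbits(256) for _ in range(10_000)]
    for h in hashes:
        index.add(h)
    query = hashes[42] ^ (1 << 5) ^ (1 << 200)  # flip two bits of item 42
    print(index.search(query, radius=7))
```

Larger radii would require probing chunk values within a small Hamming distance of each query chunk, or using more, narrower chunks; how such choices behave on millions of real hashes is the kind of question the project would evaluate.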