Available Projects

Many projects can be adjusted so that they fit the constraints of a Master's term, a Bachelor project, or an internship. Although some projects may have been available for some time, this does not mean they have become less relevant.

Internal Projects

  • [GDBC] Smart Buildings and Digital Twins. The Internet of Things (IoT) is a concept in which connected devices, or things, communicate with each other without the need for human involvement. While this infrastructure is currently used mainly for collecting data and using these data as is, one can also imagine a world in which multiple devices are combined to create so-called digital twins: digital representations of real-world objects (e.g., buildings). With these twins, one would ask targeted questions to the twin about the state of its physical counterpart, aggregated from the different sensors, instead of inspecting the raw sensor values. This project will investigate the use of digital twins, algorithms to create digital twins, and applications that deal with sensor data. If this project sounds interesting to you, please contact Viktoriya Degeler or Frank Blaauw.
  • Data science in water management. The current water infrastructure in the northern part of the Netherlands generates a large amount of data. In this project, the student is asked to work on one or more data science projects related to water management. Possible topics include, but are not limited to, leakage detection, usage prediction, anomaly detection, repair of missing data, and GIS analysis. Findings from this research might find their way back into the production systems of the water management company we work with. Please contact Frank if you would like to hear more about the available use cases. Contact: Frank Blaauw.
  • Containers for analysis. Containerization is currently a hot topic in software engineering. Many companies rewrite their traditional applications to make use of this new and promising technology. In data science, however, not much is currently being done with containerization. In this project the student will compare different container types (e.g., Docker, rkt, Singularity) and analyse which one works best for the purpose of data analysis. Furthermore, in an ongoing project in which containers will be used for performing data science, the student can help with implementing the optimal container platform. A possible direction for this research is serverless computing, a new trend in which tiny applications are spawned whenever they are needed. Such architectures tend to scale well, and can potentially scale to zero running instances, provided a good scheduling service is available. With a large number of services available, composing higher-level applications (consisting of serverless functions) can be a challenge. In this direction the student will look into the use of reinforcement learning for creating planned pipelines in serverless architectures. The project consists of a literature study and an implementation component. Contact: Frank Blaauw.
  • Verification of Service Compositions. Originally designed to support rigid, repetitive units of work, business processes are now required to support flexible and variable processes implemented as service compositions. These flexible compositions, however, must remain true to their initial process requirements and business rules. We developed a Java package that uses a model checking approach to verify the compliance of compositions against sets of formal specifications. The package takes a Petri net and a set of specifications as input, internally converts and optimizes the composition to a verifiable model, verifies the set of specifications against the model, and returns the results of verification. The package is included in the Apromore process analytics platform as a plugin. In this context, the following assignments are available:
    1. Investigate BPMN file format to Petri net PNML file format conversion (BSc).
    2. Investigate WS-BPEL file format to Petri net PNML file format conversion (BSc).
    3. Investigate UML activity diagram to Petri net PNML file format conversion (BSc).
    4. Investigate EPC file format to Petri net PNML file format conversion (BSc).
    5. Investigate direct support for BPMN file format verification (BSc/Int).
    6. Investigate direct support for WS-BPEL file format verification (BSc/Int).
    7. Investigate direct support for UML activity diagram verification (BSc/Int).
    8. Investigate direct support for EPC file format verification (BSc/Int).
    9. Investigate the use of CSP/integer programming to identify satisfiability of conditions on different execution branches (Int/MSc).
    10. Research verification algorithms for internal use in our tools (MSc).
    11. Identify and develop a visual specification design plugin for our tools (MSc).
    Contact: Heerko.
  • Energy-efficient Data Center Models. Decreasing the energy consumption of data centers is a very important topic nowadays. This MSc project will focus on translating key aspects of data center operation into workable data center models. The project features a collaboration with Target Holding/CIT, who manage the university data center. In this project you will talk with data center operators to identify operational processes and key parameters, and then translate those into tools that can be used for predicting and modeling data center behavior. As such, this is a unique opportunity to get a look behind the scenes of data center operation. For this project you will cooperate with the SMS-ENTEG group. More info: (pdf) Contact: Alexander or Tobias van Damme.
  • Sustainable Data Centers. In the context of a regional project in collaboration with KPN and an international project with Cognizant India, this research aims to study techniques for saving energy in modern data centers. The Internet of Things and machine learning are central to the approach. In particular, the project will involve one or more of the following items: (1) an environmental model of a data center for steering/controlling energy consumption (preferably generalisable); (2) an energy consumption model of a data center and its components; (3) a report containing recommendations for reducing the carbon dioxide footprint of a data center; (4) adaptive planning and scheduling techniques to save energy in data centers. Contact: Alexander or Wico Mulder.
  • Distributed Discrete Optimisation. Constraint satisfaction problems (CSPs) are a type of search problem with a broad range of applications, including planning, scheduling and resource allocation. Solving these problems with respect to a certain objective function allows optimisation of that particular problem, for example, optimising the energy consumption of a building. Unfortunately, this problem is NP-hard, meaning that all known algorithms that are guaranteed to find the optimal solution to a constraint satisfaction problem require exponential time in the worst case. Consequently, the problem size that algorithms can handle is limited (e.g. constructing a CSP to model an entire building would be infeasible). When dealing with dynamic environments, the problem also has to be solved continuously and possibly in real time, requiring a solution to be available within a limited amount of time. Constraint networks of real-world problems are often sparse, however, and if the problem domain exhibits inherent locality, large-scale problems can be solved more efficiently by exploiting these structures (e.g. processes within a building are often mostly localised within a single room or area). This relative independence also facilitates parallelism in the search process, allowing a distributed cluster of machines to solve the problem faster and enabling scaling with respect to the problem size. Many projects related to this topic are available, such as realising a more efficient distributed search algorithm, dealing with dynamicity within the environment by continuously solving the problem, and increasing the level of parallelism of the algorithm. Contact: Michel Medema.
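
To illustrate the locality argument in the Distributed Discrete Optimisation project, the following minimal Python sketch splits a sparse constraint network into independent components and optimises each one separately. It is an illustration under stated assumptions, not the group's algorithm: all names (the room variables, `connected_components`, `solve_component`, `solve`) are hypothetical, the objective is assumed to be a sum of per-variable costs, and each component is solved by plain exhaustive search.

```python
from itertools import product


def connected_components(variables, constraints):
    """Group variables that are linked, directly or indirectly, by a constraint."""
    neighbours = {v: set() for v in variables}
    for scope, _ in constraints:
        for v in scope:
            neighbours[v].update(u for u in scope if u != v)
    seen, components = set(), []
    for start in variables:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            v = stack.pop()
            component.append(v)
            for u in neighbours[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        components.append(sorted(component))
    return components


def solve_component(component, domains, constraints, cost):
    """Exhaustively find the cheapest consistent assignment of one component."""
    members = set(component)
    local = [(scope, pred) for scope, pred in constraints if set(scope) <= members]
    best = None
    for values in product(*(domains[v] for v in component)):
        assignment = dict(zip(component, values))
        if all(pred(*(assignment[v] for v in scope)) for scope, pred in local):
            total = sum(cost(v, assignment[v]) for v in component)
            if best is None or total < best[0]:
                best = (total, assignment)
    return best


def solve(variables, domains, constraints, cost):
    """Solve every component on its own; the components could run in parallel."""
    total, solution = 0, {}
    for component in connected_components(variables, constraints):
        result = solve_component(component, domains, constraints, cost)
        if result is None:
            return None  # one unsatisfiable component makes the whole CSP unsatisfiable
        total += result[0]
        solution.update(result[1])
    return total, solution


# Hypothetical two-room building: power levels 0..2 per device.
variables = ["r1_heat", "r1_vent", "r2_heat", "r2_vent"]
domains = {v: [0, 1, 2] for v in variables}
constraints = [
    (("r1_heat", "r1_vent"), lambda h, v: h + v >= 2),
    (("r2_heat", "r2_vent"), lambda h, v: h >= 1 and v >= 1),
]


def energy(var, level):
    """Assumed cost model: heating costs twice as much per level as ventilation."""
    return level * (2 if var.endswith("heat") else 1)


components = connected_components(variables, constraints)
result = solve(variables, domains, constraints, energy)
```

Because no constraint links the two rooms, the search visits 3² assignments per room instead of 3⁴ for the whole building, and each room could be handed to a different machine. The decomposition is only valid when the objective does not couple variables from different components.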

External Projects

  • In-company internship (Int/Short programming). Researchable B.V. is a small startup located in Groningen. They aim to improve science by developing software in the early phases of research projects (e.g. developing software to collect data, or automate other parts of research) and at the final phase of research projects (i.e., the valorisation of research). During this internship, the student will be part of the Researchable team, and work on various projects that they are currently running. Their office is located on Zernike. Contact: Frank Blaauw.
  • Large-scale indexing for image hashes (MSc/Int/BSc). Facebook has open-sourced its TMK+PDQF video hashing and PDQ image hashing algorithms (article, pdf). Web-IQ supports law enforcement agencies and NGOs in their fight against online child exploitation. We want to add the PDQ hashing algorithm to our image matching services. Computing PDQ hashes from images is straightforward, but indexing these hashes to support efficient search (by Hamming distance) in millions of hashes is a different story. During this research project you will investigate, design, build, evaluate and compare different methods to index bit vectors at large scale. You will have the opportunity to work with (anonymised) real-world data, and via our partners your results will directly contribute to the fight against online child exploitation worldwide. Contact: Alexander Lazovik or Mathijs Homminga.
  • Model development when the data may not be shared (MSc). Big Data and AI have a growing influence on our daily lives. People and companies are becoming increasingly aware of the potential of their data and of the impact of losing control over who is using their data. As a result, companies are no longer willing to share their (private, business-critical) data. Traditionally, a company with data would send their data to another company that is developing an analysis model (for example, a Machine Learning model). TNO is investigating the possibilities of developing models in an environment where data is not allowed to be freely transported. One of the solutions is to no longer bring the data to the analysis model (D2A), but to bring the analysis model to the data (A2D). This Master's assignment is about investigating and building a prototype of an approach for developing analysis models in an A2D manner. Contact: Elena Lazovik or Toon Albers.
  • Dynamic on-the-fly switching between data sources for distributed Big Data analysis (MSc). Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even with an always-online streaming process. In both of these cases the data is retrieved from one or more data sources and analyzed or transformed, which results in output to a data "sink", such as another database or a message queue. At TNO we are looking into ways to update such long-running analysis processes at runtime, and part of that is updating the data sources: the longer a data analysis process is running, the more likely it is that new sources of data are introduced (think of user behavior data from a newly created part of a website, or sensor data from a new data provider) or that outdated data sources must be switched to newly created sources (think of switching from SQL to NoSQL). Your challenge is to develop a technical library that supports the switching of both streaming and historical data sources at runtime for distributed analysis platforms (for example, Apache Spark). Knowledge of distributed systems (through the Distributed Systems, Scalable Computing and Web & Cloud Computing courses) is key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. Internship compensation is also provided. Contact: Elena Lazovik or Toon Albers.
  • Runtime validation of software against constraints from context (MSc). Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. The data analysis is often done in long-running processes or even with an always-online streaming process. This data analysis is almost always subject to different types of limitations: from users, from the business perspective, from hardware, and from the platforms on which the data analysis is running. At TNO we are looking into ways of verifying whether a running distributed analysis meets these limitations and requirements. We have some experience in working with constraints for IT systems. Your challenge would be to investigate and experiment with capturing the different kinds of constraints that can be defined on a system, and to develop a solution that can validate a running data analysis against these constraints. The validation against given constraints should happen at runtime when it is needed (for example, when new constraints are added). Knowledge of distributed systems (through the Scalable Computing course) and a good understanding of mathematics/logic are key, and we are looking for people who enjoy both research and the actual development of software. TNO provides a physical cluster to run your experiments on. Internship compensation is also provided. Contact: Elena Lazovik or Toon Albers.
  • Measures of Social Behaviour. Questionnaires are sensitive to several sources of noise and, above all, cannot quantify behaviour from moment to moment. To move away from these deficiencies, we have developed a passive monitoring system that builds on the ubiquity of smartphone technology. In February 2016, the World Economic Forum announced that, owing to advances in technology, the world is entering its Fourth Industrial Revolution, based on hyper-connectivity, data-driven solutions and artificial intelligence (World Economic Forum, 2016). Hyper-connectivity is characterised by a state of being constantly connected to individuals and machines through devices such as smartphones. Hyper-connectivity and large-scale data collection through smartphones are the fundamental elements of new technological initiatives in healthcare and biomedical research. These smartphone-based initiatives are largely driven by the fact that the number of sensors embedded in smartphones has exploded over the past few years. Nowadays the majority of smartphones are equipped with sensors such as GPS, accelerometer, gyroscope, Wi-Fi, Bluetooth, camera and microphone. These smartphones aggregate a large amount of user-related data, which remain largely untouched in research. Our ambition is to develop several objective measures of social behaviour by using the data collected through our passive monitoring application. The objective quantification of social behaviour is important since the majority of psychiatric disorders affect social behaviour. In the context of a Master's thesis, we would like a Master's student with good knowledge of R to develop several of these measures related to social behaviour and test them on data of psychiatric patients. Contact: Niels Jongs.
  • Passive Behavioural Monitoring (MSc). Advances in low-power communication technologies and large-scale data processing continue to give rise to the concept of mobile healthcare systems as an integral part of clinical care and research processes. This project will focus on the data that is collected by a passive behavioural monitoring system in which personal mobile devices are used as a measuring instrument. The data mainly consists of sensor and activity data, which might allow us to differentiate between healthy and non-healthy individuals. In this project, our aim is to establish behavioural profiles related to neuropsychiatric disorders by using advanced data analysis and data mining techniques. These behavioural profiles are derived from the sensor and activity data collected by the passive behavioural monitoring system and are used to predict the onset or relapse of neuropsychiatric disorders. Additionally, our aim is to translate these behavioural profiles to animal behavioural models, for which the data is collected in a controlled lab environment. Contact: Martrien Kas.
  • Flexible computing infrastructures (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
  • Privacy-friendly context-aware services (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
  • Interaction with devices in a household for the purpose of enabling smart grid services (proposed by TNO Groningen). More information: pdf. Contact: Alexander or TNO directly (contact details in the PDF).
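
As an illustration of the indexing problem in the image-hash project listed above, the following Python sketch shows one well-known baseline for Hamming-distance search: splitting each hash into chunks and using the pigeonhole principle to find candidates. It is a sketch under stated assumptions, not Web-IQ's method. The class `ChunkIndex`, its methods and the sample hashes are hypothetical, and the 256-bit hash length is an assumption about the hash format.

```python
from collections import defaultdict

HASH_BITS = 256  # assumed hash length in bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


class ChunkIndex:
    """Exact search for all hashes within `max_distance` bits of a query.

    Each hash is cut into max_distance + 1 disjoint chunks. If two hashes
    differ in at most max_distance bits, at least one chunk must be identical
    (pigeonhole principle), so exact chunk lookups yield every true match.
    """

    def __init__(self, max_distance: int, bits: int = HASH_BITS):
        self.max_distance = max_distance
        chunks = max_distance + 1
        # Chunk boundaries; widths may differ by one bit when bits % chunks != 0.
        self.bounds = [(bits * i // chunks, bits * (i + 1) // chunks)
                       for i in range(chunks)]
        self.tables = [defaultdict(list) for _ in self.bounds]
        self.hashes = []

    def _chunk(self, value: int, i: int) -> int:
        lo, hi = self.bounds[i]
        return (value >> lo) & ((1 << (hi - lo)) - 1)

    def add(self, value: int) -> int:
        """Store a hash and return its position in the index."""
        idx = len(self.hashes)
        self.hashes.append(value)
        for i, table in enumerate(self.tables):
            table[self._chunk(value, i)].append(idx)
        return idx

    def search(self, query: int) -> list:
        """Return sorted positions of all stored hashes within max_distance."""
        candidates = set()
        for i, table in enumerate(self.tables):
            candidates.update(table.get(self._chunk(query, i), ()))
        # Candidates share one chunk with the query; verify the full distance.
        return sorted(idx for idx in candidates
                      if hamming(self.hashes[idx], query) <= self.max_distance)


# Hypothetical sample data: a base hash, a near-duplicate and two distant hashes.
base = int("a5" * 32, 16)
near = base ^ 0b111              # differs from base in 3 bits
far = base ^ ((1 << 9) - 1)      # differs from base in 9 bits
opposite = base ^ ((1 << HASH_BITS) - 1)  # differs in all 256 bits

index = ChunkIndex(max_distance=7)
for h in (base, near, far, opposite):
    index.add(h)
matches = index.search(base)
```

The candidate set shrinks as chunks get wider, so this scheme works best for small search radii; comparing it against alternatives such as tree-based or bit-sampling indexes is the kind of evaluation the project describes.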