InfoSys

Information Systems Group > BI > FSE > RUG * Print * Search

Crafting User-Centric Experiences: A Novel Taxonomy for Web and Mobile Application Research The project begins by conducting a thorough review of the latest user studies conducted in the web and mobile domains to identify key elements and research gaps. Subsequently, we design and execute an extensive user study, incorporating crucial elements identified during the literature review. One potential focus for the user study is the examination of user behaviors related to Cookie Consent.

Unmasking Mobile Espionage: A Deep Dive into Contemporary Spyware Threats Delving into the realm of cybersecurity, this research explores and exposes modern spyware threats affecting mobile devices. The study aims to uncover the tactics, techniques, and procedures employed by contemporary spyware, contributing valuable insights into mobile security and countermeasures. (e.g. AndroidRAT).

Quantum-Resilient Data Preservation Strategies The objective of this project is to shed light on the current landscape of quantum-resilient cryptographic technologies and their implementations. The findings of this study hold significant importance for sensitive-mission agencies and organizations that could face substantial risks from potential attackers currently intercepting encrypted data, especially in anticipation of the advent of quantum computing.

Securing Mobile Payments: Assessing the Safety of Android Payment Libraries In the era of digital transactions, this research evaluates the security of Android payment libraries. It seeks to assess vulnerabilities and weaknesses in current mobile payment systems, proposing improvements and safeguards to ensure the secure processing of financial transactions on Android platforms.

Academic Sentry: Evaluating Cyber Resilience Against Phishing Attacks Focusing on the educational sector, this research involves the development and evaluation of an "Academic Sentry" system. The goal is to assess the resilience of academic institutions against phishing attacks, providing insights into effective cybersecurity measures to safeguard sensitive information.

Maritime Cyber Incident Repository The goal of the project is to research and deploy a small Industrial Honeynet. A honeynet is a network of honeypots hosted on a server to attract bad actors. The primary purpose of the honeynet is to gather information about the bad actors' tactics, techniques, and procedures (TTP). There are already some examples of research into Industrial Honeynet. For example HoneyICS [1] and HoneyPLC[2]. The so-called honeynet needs to simulate industrial control systems (ICS). When a bad actor scans the honeynet it should have the impression that he is connecting to a “real” ICS network, for example, a power plant. Results of the attack information in the honeynet can be for example presented in Elasticsearch Stack (ELK)[3]. The ELK Stack is a set of open-source tools for log and data analytics.

  • [1] M. Lucchese, F. Lupia, M. Merro, F. Paci, N. Zannone, and A. Furfaro, “HoneyICS: A High-interaction Physics-aware Honeynet for Industrial Control Systems,” Jun. 2023. doi: 10.1145/3600160.3604984.
  • [2] “HoneyPLC: A Next-Generation Honeypot for Industrial Control Systems | Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.” Accessed: Dec. 12, 2023. [Online]. Available: https://dl.acm.org/doi/abs/10.1145/3372297.3423356
  • [3] H. Almohannadi, I. Awan, J. Al Hamar, A. Cullen, J. P. Disso, and L. Armitage, “Cyber Threat Intelligence from Honeypot Data Using Elasticsearch,” in 2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA), Krakow: IEEE, May 2018, pp. 900–906. doi: 10.1109/AINA.2018.00132.

Exploring and Implementing an Industrial Honeynet Solution The Global Maritime Transportation System (GMTS) is vulnerable to cyberattacks[1], [2]. The Maritime Research Group of NHL Stenden University of Applied Sciences developed the Maritime Cyber Attack Database (MCAD)[3]. MCAD is a database of incidents involving the worldwide maritime sector. The database contains over 160 incidents, dating back to 2001. The incidents the location spoofing of NATO ships visiting Ukraine in the Black Sea in 2021. The database, which was developed in collaboration with students, is now online and publicly available. Utilizing open-source information. The database not only covers incidents impacting vessels but also ports and other maritime facilities worldwide. The GMTS is a system of systems and includes not just vessels but also waterways, ports, and land-side connections, moving people and goods to and from the water. The database utilizes Structured Threat Information Expression (STIX) from Mitre Corporation, which is a language and serialization format used to exchange cyber threat intelligence (CTI). The project consists of researching open-source information about maritime cyber-attacks. The goal of the project is to research maritime cyber incidents and add these to the database. Analyzing the database for trends about the attacks.

Projects with external collaborators

Web-IQ

Web-IQ provides products and expertise in the area of web intelligence. Our customers are national and international companies, public services and law enforcement agencies. We help them use the wealth of information from the open internet, Dark web and social media to gain new insights into their business, solve crimes, help victims and improve their overall effectiveness and efficiency. Research topics for Computing science / informatica

Input query compilation for graph analysis pipelines For: Master or Bachelor student In short: The Web-IQ data platform is pivoting to the use of RDF as it’s core data model. The aim of this project is to create a graph analysis framework in which plugins define their inputs using SPARQL and emit results as RDF statements. Our current pipelines are based on forward-propagation without explicit input or output interface. While flexible, it’s impossible to see upfront how information flows through a pipeline. Also, it requires all data to be pushed through a pipeline. Certainly for incremental analysis this can be wasteful. This new framework should significantly improve our batch analysis throughput, but it should also be usable in in low latency streaming oriented pipelines.

Graph query object mapping API endpoints For: Master or Bachelor student In short: The Web-IQ data-platform exposes a SPARQL graph query interface to access OSINT data. While versatile, integrations of external tools with our products may benefit from a simpler data model (e.g. object/entity oriented JSON or XML) for consumptions. Secondly, we would like to provide use-case specific API endpoints through configuration for easy access to our datasets. The aim of this project is to design and implement a solution that transpiles object oriented query templates into SPARQL queries (or directly as query algebra) and executes them against the Web-IQ data-platform. The project involves analysing known query patterns, exploring the state of the art of API interface (e.g., REST and GraphQL) and how the map to database query languages, and developing a functional prototype.

On-demand analysis in graph queries For: Master or Bachelor student In short: The Web-IQ data-platform exposes a SPARQL graph query interface to access OSINT data. Next to querying the underlying data-stores, for particular use cases on-demand analysis is required / preferred over pre-computation. Example extensions text translation or performing lookups with (“expensive”) remote services. The aim of this project is to design and implement a mechanism to integrate analysis capabilities into a SPARQL graph query service. Key design challenges are maintainability, extensibility and scalability of the system, handling latency of extensions with a large I/O component (e.g. calling a remote service).

Research topics for Artificial Intelligence

Capture the flag For: University Thesis AI Bachelor Description: Flags in images are no rare sight. Identifying them can useful in both identifying affiliation of the individual or building the flag is attached to, but may also help in geolocating an image. While most flags are easily distinguished with the human eye, scanning all available images for them is very time consuming. However, entrusting this to an algorithm comes with its own challenges. Flags can be encountered in a multitude of noisy situations, such as mirrored, rotated, folded and partially obscured. And correctly labeling a flag requires solving all these issues. Goal: Detect flags in an image and label them. Required Skills: • Python programming • Basic scientific reading Skills you will develop: • Reasoning about data quality and detectable signals • Image data preprocessing • Application of deep learning framework (Pytorch) • Object detection and image classification in Computer Vision Steps: 1. Define project scope: which set of flags to consider in which contexts 2. Read up on existing methods 3. Acquire training data 4. Use object detection + classification 5. Error analysis

Model Drift Detection For: University Thesis AI Bachelor/Master Description: Whenever a model is trained on a set of data and used in production, the question arises whether the performance of that model will remain consistent. The same is true for our drug advertisement classification model. This model is based on a BERT encoder and finetuned on internal data, for which a cross-section of our database was selected and labeled. The easiest way to track this issue is to periodically annotate a new set of data, run it through the model and see if the performance has changed. However, this requires a large investment of time from human annotators. Techniques exist to monitor model performance over time, based on a wide range of metrics. In this project, these techniques will be explored. Further reading/options:

 Detecting Concept Drift With Neural Network Model Uncertainty 
 Machine Learning Model Drift Detection Via Weak Data Slices 

• Using frozen encoder layers, train a decoder that can recreate the original training data. Use that same encoder-decoder to score more recent data samples, using model recreation performance as an indicator of shifting structure in the data. Goal: Develop a way to track model performance deviation without regular intensive human annotation efforts. Required Skills: • Python programming • Basic scientific reading • Understanding of deep learning training procedure and model validation Skills you will develop: • Bringing ML models to production level (MLOps) • Understanding of the ML model lifecycle • Data analytics

Timeline interpretation For: University Thesis AI Bachelor/Master Description: When gathering open source intelligence, it is often important to detect upcoming events before/as they happen. Messages, posts and news items can be important clues to detect new events. However, when searching for such items, the same search terms will also yield many reports, news items, and posts, about past events. The topic of this assignment is to investigate how we can distinguish items about current or upcoming events (related to the present time of the search) from items about past events. An example would be the January 6th insurrection. Calls for action and descriptions of the crowd at the Capitol building will have gone out before and during the event. How can we distinguish such items from others describing different incidents in the past? Further reading/options: TBA Goal: Research and train a model that can classify items (social media posts, blogs, news items) as being about a current of upcoming event, versus an event in the past. Required Skills: • Python programming • Scientific reading • Understanding of basic ML models and their capabilities Skills you will develop: • Data analytics • Researching and implementing a model for a new proposition • Data preparation tools, including preprocessing and labeling tools

Improved focused crawling using page-relevance features For: University Thesis AI Bachelor/Master Description: Web-IQ has worked on a focused web crawler that can crawl for specific topics. It determines its priority url queue based on features related to topic relevance. There are many avenues for research for improving the performance of this crawler. For example, using a relevance score instead of a binary on/off topic classification for past and current pages. Or strategies for getting out of initial good/bad areas, when there is little information yet about path and domain relevance for the current state. This project is relatively open. It involves selecting one or more propositions that could help improve the focused crawler, and working them out. Two optional avenues are: 1. Improved feature representation 2. Reinforcement learning approach No. 1 focuses on improving the current prioritization approach by improving the features, and learning the optimal weighting for combining them. No. 2 explores the option of implementing focused crawling as a reinforcement learning problem, where the crawler is rewarded for choosing promising directions. Further reading/options: https://arxiv.org/abs/2112.07620 Goal: Investigate one or more initiatives that can improve the focused crawler. Required Skills: • Python programming • Scientific reading • Understanding of basic ML models and their capabilities Skills you will develop: • Data acquisition tools such as scraping • Data analytics • Data preparation tools such as preprocessing • Researching new ideas and implementing one or more of them as practical applications • Feature engineering • Writing production-level code • A deeper understanding of using large language models in practice

Preventing Adversarial Attacks on LLMs For: University Thesis AI Bachelor/Master Description: In generative artificial intelligence for language tasks, LLMs are the state-of-the-art and current standard. These models are (usually) extensively red-teamed to instil the model with guardrails to prevent use of the model for illicit or offensive uses, such as give tips to perform various crimes, generate hate speech, generate sexual content, etc. This has given rise to a new brand of bad actors trying to break these guardrails in order to use the model to do exactly what is was intended not to be used for. These attempts to break a models alignment, without having to fine-tune the model, are done through specific adversarial prompts. For our uses, these prompts could crop up in dark-web content in order to make the content uninterpretable for large language models by giving the model new instructions when we use such a piece of text as input.

Further reading/options: • Goal: In this project, the student is asked to research the state of these adversarial prompts, try them against some well-known LLMs (possibly those we use in production), and see whether it is indeed possible to derail these models. If so the student is asked to devise a solution through either prompt engineering (bachelor and master level), or fine-tuning of the model (master level). Required Skills: • Python programming • Scientific reading • Basic understanding of generative AI, its uses and limitations Skills you will develop: • Deeper understanding of LLMs, how they learn and how they’re alligned to prevent mis-use • Prompt engineering • (Master) LLM finetuning