Distributed Systems

Dr. Kawsar Haghshenas

Assistant Professor

  • Room number: 582
  • E-mail: k.haghshenas [at] rug.nl

Research

  • Networked Systems
  • Cloud Computing

Recent publications

  1. Enough Hot Air: The Role of Immersion Cooling, In Energy Informatics.

    Abstract

    Air cooling is the traditional solution to chill servers in data centers. However, the continuous increase in global data center energy consumption, combined with the increase in the racks’ power dissipation, calls for the use of more efficient alternatives. Immersion cooling is one such alternative. In this paper, we quantitatively examine and compare air cooling and immersion cooling solutions. The examined characteristics include power usage efficiency (PUE), computing and power density, cost, and maintenance overheads. A direct comparison shows a reduction of about 50% in energy consumption and a reduction of about two-thirds in occupied space when using immersion cooling. In addition, the higher heat capacity of the liquids used in immersion cooling, compared to air, allows for much higher rack power densities. Moreover, immersion cooling requires lower capital and operational expenditures. However, challenging maintenance procedures, together with an increased number of IT failures, are the main downsides. By selecting immersion cooling, cloud providers must trade off the decrease in energy and cost and the increase in power density against the higher maintenance and reliability concerns. Finally, we argue that retrofitting an air-cooled data center with immersion cooling will result in high costs and is generally not recommended.
    (A short back-of-the-envelope PUE calculation illustrating this comparison is sketched at the end of this page.)


  2. Carbon Emission-Aware Job Scheduling for Kubernetes Deployments, In The Journal of Supercomputing.

    Abstract

    Decreasing the carbon emissions of data centers while guaranteeing Quality of Service (QoS) is one of the major challenges for efficient resource management of large-scale cloud infrastructures and for societal sustainability. Previous works in the area of carbon reduction mostly focus on decreasing overall energy consumption, replacing energy sources with renewable ones, and migrating workloads to locations where lower emissions are expected. These measures do not consider the energy mix of the power used by the data center. In other words, every kWh of energy is considered the same from the point of view of emissions, which is rarely the case in practice. In this paper, we overcome this deficit by proposing a novel, practical CO2-aware workload scheduling algorithm, implemented in the Kubernetes orchestrator, to shift non-critical jobs in time. The proposed algorithm predicts future CO2 emissions by using historical data of energy generation, selects time-shiftable jobs, and creates job schedules utilizing greedy sub-optimal CO2 decisions. The algorithm is implemented using Kubernetes’ scheduler extender solution due to its ease of deployment and little overhead. It is evaluated with real-world workload traces and compared to the default Kubernetes scheduling implementation in several real scenarios.
    (A minimal sketch of this greedy time-shifting idea appears at the end of this page.)


  3. Optimal Joint Operation of Coupled Transportation and Power Distribution Urban Networks, In Energy Informatics.

    Abstract

    The number of Electric Vehicles (EVs), and consequently their penetration into urban society, is increasing, which reinforces the need for joint stochastic operational planning of the Transportation Network (TN) and the Power Distribution Network (PDN). This paper solves a stochastic multi-agent simulation-based model with the objective of minimizing the total cost of the interdependent TN and PDN systems. By capturing the temporally dynamic interdependencies between the coupled networks, an equilibrium solution yields an optimized system cost. In addition, the impact of large-scale EV integration into the PDN is assessed through the mutual coupling of both networks by solving the associated optimization problems, i.e., optimal EV routing using the traffic assignment problem and optimal power flow using the branch flow model. Previous works in the area of joint operation of TN and PDN fall short of considering the time-varying and dynamic nature of all effective parameters in the coupled TN and PDN system. In this paper, a Dynamic User Equilibrium (DUE) network model is proposed to capture the optimal traffic distribution in the TN as well as the optimal power flow in the PDN. A modified IEEE 30-bus system is adapted to a low-voltage power network to examine the impact of EV charging on the power grid. Our case study demonstrates the enhanced operation of the joint networks, incorporating heterogeneous EV characteristics such as battery State of Charge (SoC) and charging requests, as well as the PDN’s marginal prices. The results of our simulations show how solving the defined coupled optimization problem reduces the total cost of the case study by 36% compared to the baseline scenario. The results also show a 45% improvement in the maximum EV penetration level with only minimal voltage deviation (less than 0.3%).
    (A schematic form of this coupled cost-minimization problem is given at the end of this page.)


  4. CO2 Emission Aware Scheduling for Deep Neural Network Training Workloads, In 2022 IEEE International Conference on Big Data (Big Data), IEEE.

    Abstract

    Machine Learning (ML) training is a growing workload in high-performance computing clusters and data centers; furthermore, it is computationally intensive and requires substantial amounts of energy, with associated emissions. To the best of our knowledge, previous works in the area of load management have never focused on decreasing the carbon emissions of ML training workloads. In this paper, we explore the potential emission reduction achievable by leveraging the iterative nature of the training process as well as the variability of the CO2 intensity of the power grid. To this end, we introduce two emission-aware mechanisms that shift training jobs in time and migrate them between geographical locations. We present experimental results on the power and carbon emissions of the training process, together with the delay overheads associated with the emission reduction mechanisms, for various representative deep neural network models. The results show that, by following emission signals, one can effectively reduce emissions by an amount that varies from 13% to 57% relative to the baseline cases. Moreover, the experimental results show that the total delay overhead of applying the emission-aware mechanisms multiple times is negligible compared to the jobs’ completion time.
    (A simplified shift-or-migrate decision rule is sketched at the end of this page.)


  5. Prediction-Based Underutilized and Destination Host Selection Approaches for Energy-Efficient Dynamic VM Consolidation in Data Centers, In The Journal of Supercomputing.

    Abstract

    Improving energy efficiency while guaranteeing quality of service (QoS) is one of the main challenges of efficient resource management in large-scale data centers. Dynamic virtual machine (VM) consolidation is a promising approach that aims to reduce energy consumption by reallocating VMs to hosts dynamically. Previous works have mostly considered only the current utilization of resources in the dynamic VM consolidation procedure, which imposes unnecessary migrations and host power-mode transitions. Moreover, they select the destinations of VM migrations with conservative approaches to keep the service-level agreements, which is not in line with packing VMs onto fewer physical hosts. In this paper, we propose a regression-based approach that predicts the resource utilization of the VMs and hosts based on their historical data and uses the predictions in the different sub-problems of the whole process. Predicting future utilization provides the opportunity to select a host with higher utilization as the destination of a VM migration, which leads to a better VM placement from the viewpoint of consolidation. Results show that our proposed approach reduces the energy consumption of the modeled data center by up to 38% compared to other works in the area, while guaranteeing the same QoS. Moreover, the results show better scalability than all other approaches: our proposed approach improves the energy efficiency even for the largest simulated benchmarks and takes less than 5% time overhead to execute for a data center with 7600 physical hosts.
    (A small sketch of prediction-based destination selection appears at the end of this page.)



(For more publications go to Kawsar's publication page)
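
Illustrative sketches

The short sketches below illustrate, in simplified form, ideas from the papers listed above; they are editorial examples under stated assumptions, not code or formulations taken from the papers.

For the immersion cooling paper, the effect of a lower PUE on facility energy can be shown with a small back-of-the-envelope calculation. The PUE values (1.5 for air cooling, 1.05 for immersion cooling) and the IT load below are assumptions for illustration; the paper's reported ~50% saving is based on its own measured characteristics.

    # Back-of-the-envelope facility energy comparison using PUE.
    # PUE = total facility energy / IT equipment energy.
    # The PUE values and the IT load below are illustrative assumptions.

    def facility_energy_kwh(it_energy_kwh, pue):
        """Total facility energy implied by an IT load and a PUE value."""
        return it_energy_kwh * pue

    it_load_kwh = 1_000_000      # yearly IT energy of a hypothetical data center
    pue_air = 1.5                # assumed PUE of an air-cooled facility
    pue_immersion = 1.05         # assumed PUE of an immersion-cooled facility

    air_total = facility_energy_kwh(it_load_kwh, pue_air)
    imm_total = facility_energy_kwh(it_load_kwh, pue_immersion)
    print(f"air-cooled:       {air_total:,.0f} kWh")
    print(f"immersion-cooled: {imm_total:,.0f} kWh")
    print(f"facility-level saving: {1 - imm_total / air_total:.0%}")  # ~30% with these assumed PUEs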
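
For the Kubernetes scheduling paper, the core of the greedy time-shifting idea can be sketched as follows. The job model, the forecast format, and the names Job and schedule_jobs are hypothetical; the actual implementation in the paper is a Kubernetes scheduler extender and handles much more than this.

    # Greedy CO2-aware time shifting (simplified sketch, not the paper's code).
    # Each delay-tolerant job is started in the forecast window with the lowest
    # average carbon intensity that still lets it finish before its deadline.
    # Assumes the forecast covers every job's deadline hour.

    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        runtime_h: int     # expected runtime in whole hours (simplification)
        deadline_h: int    # hour index by which the job must have finished

    def schedule_jobs(jobs, co2_forecast):
        """Assign each job a start hour; co2_forecast is gCO2/kWh per future hour."""
        plan = {}
        for job in sorted(jobs, key=lambda j: j.deadline_h):   # earliest deadline first
            latest_start = max(job.deadline_h - job.runtime_h, 0)
            def window_avg(start):
                return sum(co2_forecast[start:start + job.runtime_h]) / job.runtime_h
            plan[job.name] = min(range(latest_start + 1), key=window_avg)
        return plan

    if __name__ == "__main__":
        forecast = [420, 390, 300, 180, 150, 200, 350, 400]   # gCO2/kWh, next 8 hours
        jobs = [Job("batch-report", 2, 8), Job("nightly-backup", 1, 6)]
        print(schedule_jobs(jobs, forecast))   # -> {'nightly-backup': 4, 'batch-report': 3}

In the paper, this kind of decision is wired into Kubernetes through the scheduler extender mechanism, chosen for its ease of deployment and low overhead.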
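
For the coupled transportation and power distribution paper, the joint operation problem can be summarized, very schematically, as a single cost minimization. The notation below is generic shorthand, not the paper's exact stochastic multi-agent formulation:

    \min_{x,\,g}\ \underbrace{\sum_{a} t_a(x_a)\,x_a}_{\text{TN travel cost}}
                 + \underbrace{\sum_{i} c_i(g_i)}_{\text{PDN generation cost}}
    \quad \text{s.t.} \quad
    x \in \Omega_{\text{DUE}}, \qquad
    g \in \Omega_{\text{BFM}}, \qquad
    d_i^{\text{EV}} = f_i(x) \quad \forall i

Here x collects the time-varying traffic flows, constrained by the Dynamic User Equilibrium (DUE) of the TN; g the power injections, constrained by the branch flow model (BFM) of the PDN; and d_i^{EV} the EV charging demand at bus i induced by the traffic assignment, which is the term that couples the two networks.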
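
For the Big Data paper on DNN training, the two mechanisms (shifting a checkpointable training job in time and migrating it between locations) reduce to comparing forecast carbon intensities. The decision rule below is a deliberate simplification with made-up region names, forecasts, and migration penalty; it is not the paper's experimental setup.

    # Emission-aware placement of a resumable training job (simplified sketch).
    # The job can be paused at checkpoint boundaries, so it may either wait for a
    # lower-carbon period in its current region or migrate to another region.
    # Region names, forecasts, and the migration penalty are made-up assumptions.

    def choose_action(current_region, forecasts, remaining_h, migration_penalty_h=1):
        """Return (avg gCO2/kWh, action, region, start hour) with the lowest exposure.

        forecasts maps region -> list of forecast gCO2/kWh values per future hour;
        a migration is modeled only as a delay of migration_penalty_h hours.
        """
        best = None
        for region, series in forecasts.items():
            earliest = 0 if region == current_region else migration_penalty_h
            for start in range(earliest, len(series) - remaining_h + 1):
                avg = sum(series[start:start + remaining_h]) / remaining_h
                if best is None or avg < best[0]:
                    if region != current_region:
                        action = "migrate"
                    elif start == 0:
                        action = "run now"
                    else:
                        action = "shift in time"
                    best = (avg, action, region, start)
        return best

    if __name__ == "__main__":
        forecasts = {
            "region-low-carbon":  [120, 110, 100, 130, 150, 160],
            "region-high-carbon": [300, 280, 260, 240, 220, 200],
        }
        print(choose_action("region-high-carbon", forecasts, remaining_h=3))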
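
For the VM consolidation paper, destination selection based on predicted rather than current utilization can be sketched as follows: extrapolate each host's CPU utilization one step ahead with a least-squares line over its recent history, then place the migrating VM on the most utilized host that is still predicted to stay under an upper threshold. The linear model, the 80% threshold, and the traces are assumptions for illustration, not the paper's exact regression approach.

    # Prediction-based destination selection for a VM migration (illustrative sketch).
    # Each host's CPU utilization is extrapolated one step ahead with an ordinary
    # least-squares line over its recent history; the VM goes to the host with the
    # highest predicted utilization that still stays below an upper threshold.
    # The 80% threshold and the traces below are assumptions for illustration.

    def predict_next(history):
        """One-step-ahead prediction via a least-squares line over the history."""
        n = len(history)
        xs = range(n)
        mean_x = sum(xs) / n
        mean_y = sum(history) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        if var_x == 0:                      # constant or single-sample history
            return history[-1]
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / var_x
        intercept = mean_y - slope * mean_x
        return slope * n + intercept        # extrapolate to the next time step

    def pick_destination(host_histories, vm_util, upper_threshold=0.80):
        """Most utilized host whose predicted load plus the VM stays under the threshold."""
        predicted = {host: predict_next(hist) for host, hist in host_histories.items()}
        feasible = {h: p for h, p in predicted.items() if p + vm_util <= upper_threshold}
        return max(feasible, key=feasible.get) if feasible else None

    if __name__ == "__main__":
        histories = {
            "host-a": [0.20, 0.25, 0.30, 0.35],   # rising load
            "host-b": [0.60, 0.58, 0.55, 0.52],   # falling load
            "host-c": [0.70, 0.72, 0.75, 0.78],   # predicted to exceed the threshold
        }
        print(pick_destination(histories, vm_util=0.10))   # -> host-b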