Distributed Systems

Kawsar Haghshenas

  1. Enough Hot Air: The Role of Immersion Cooling. In Energy Informatics.

    Abstract

    Air cooling is the traditional solution to chill servers in data centers. However, the continuous increase in global data center energy consumption, combined with the increase in the racks’ power dissipation, calls for more efficient alternatives. Immersion cooling is one such alternative. In this paper, we quantitatively examine and compare air cooling and immersion cooling solutions. The examined characteristics include power usage effectiveness (PUE), computing and power density, cost, and maintenance overheads. A direct comparison shows that immersion cooling reduces energy consumption by about 50% and occupied space by about two-thirds. In addition, the higher heat capacity of the liquids used in immersion cooling, compared to air, allows for much higher rack power densities. Moreover, immersion cooling requires lower capital and operational expenditures. However, challenging maintenance procedures together with an increased number of IT failures are the main downsides. By selecting immersion cooling, cloud providers must trade off the decrease in energy and cost and the increase in power density against higher maintenance and reliability concerns. Finally, we argue that retrofitting an air-cooled data center with immersion cooling will result in high costs and is generally not recommended.


  2. Carbon Emission-Aware Job Scheduling for Kubernetes Deployments. In The Journal of Supercomputing.

    Abstract

    Decreasing carbon emissions of data centers while guaranteeing Quality of Service (QoS) is one of the major challenges for efficient resource management of large-scale cloud infrastructures and for societal sustainability. Previous works in the area of carbon reduction mostly focus on decreasing overall energy consumption, replacing energy sources with renewable ones, and migrating workloads to locations where lower emissions are expected. These measures do not consider the energy mix of the power used for the data center. In other words, every kWh of energy is considered the same from the point of view of emissions, which is rarely the case in practice. In this paper, we address this shortcoming by proposing a novel, practical CO2-aware workload scheduling algorithm, implemented in the Kubernetes orchestrator, that shifts non-critical jobs in time. The proposed algorithm predicts future CO2 emissions from historical energy generation data, selects time-shiftable jobs, and creates job schedules using greedy, sub-optimal CO2 decisions. The algorithm is implemented using Kubernetes’ scheduler extender solution because of its ease of deployment and low overhead. It is evaluated with real-world workload traces and compared to the default Kubernetes scheduling implementation on several actual scenarios.


  3. Optimal Joint Operation of Coupled Transportation and Power Distribution Urban Networks. In Energy Informatics.

    Abstract

    The number of Electric Vehicles (EVs), and consequently their penetration level into urban society, is increasing, which has reinforced the need for joint stochastic operational planning of the Transportation Network (TN) and the Power Distribution Network (PDN). This paper solves a stochastic multi-agent simulation-based model with the objective of minimizing the total cost of the interdependent TN and PDN systems. By capturing the temporally dynamic interdependencies between the coupled networks, an equilibrium solution results in an optimized system cost. In addition, the impact of large-scale EV integration into the PDN is assessed through the mutual coupling of both networks by solving two optimization problems: optimal EV routing using the traffic assignment problem, and optimal power flow using the branch flow model. Previous works on the joint operation of TN and PDN fall short in considering the time-varying and dynamic nature of all effective parameters in the coupled system. In this paper, a Dynamic User Equilibrium (DUE) network model is proposed to capture the optimal traffic distribution in the TN as well as the optimal power flow in the PDN. A modified IEEE 30-bus system is adapted to a low-voltage power network to examine the impact of EV charging on the power grid. Our case study demonstrates the enhanced operation of the joint networks, incorporating heterogeneous EV characteristics such as battery State of Charge (SoC) and charging requests, as well as the PDN’s marginal prices. Our simulation results show that solving the coupled optimization problem reduces the total cost of the case study by 36% compared to the baseline scenario. The results also show a 45% improvement in the maximum EV penetration level with only minimal voltage deviation (less than 0.3%).


  4. CO2 Emission Aware Scheduling for Deep Neural Network Training Workloads. In 2022 IEEE International Conference on Big Data (Big Data), IEEE.

    Abstract

    Machine Learning (ML) training is a growing workload in high-performance computing clusters and data centers; furthermore, it is computationally intensive and requires substantial amounts of energy with associated emissions. To the best of our knowledge, previous works in the area of load management have never focused on decreasing the carbon emissions of ML training workloads. In this paper, we explore the potential emission reduction achievable by leveraging the iterative nature of the training process as well as the variability of the CO2 intensity signal coming from the power grid. To this end, we introduce two emission-aware mechanisms that shift training jobs in time and migrate them between geographical locations. We present experimental results on the power and carbon emissions of the training process, together with the delay overheads associated with the emission reduction mechanisms, for various representative deep neural network models. The results show that, by following emission signals, one can effectively reduce emissions by 13% to 57% relative to the baseline cases. Moreover, the experimental results show that the total delay overhead of applying the emission-aware mechanisms multiple times is negligible compared to the jobs’ completion time.


  5. Prediction-Based Underutilized and Destination Host Selection Approaches for Energy-Efficient Dynamic VM Consolidation in Data Centers. In The Journal of Supercomputing.

    Abstract

    Improving energy efficiency while guaranteeing quality of service (QoS) is one of the main challenges of efficient resource management in large-scale data centers. Dynamic virtual machine (VM) consolidation is a promising approach that aims to reduce energy consumption by reallocating VMs to hosts dynamically. Previous works have mostly considered only the current utilization of resources in the dynamic VM consolidation procedure, which imposes unnecessary migrations and host power mode transitions. Moreover, they select the destinations of VM migrations conservatively in order to keep the service-level agreements, which is not in line with packing VMs onto fewer physical hosts. In this paper, we propose a regression-based approach that predicts the resource utilization of the VMs and hosts based on their historical data and uses the predictions in different problems of the whole process. Predicting future utilization provides the opportunity to select the host with higher utilization as the destination of a VM migration, which leads to a better VM placement from the viewpoint of VM consolidation. Results show that our proposed approach reduces the energy consumption of the modeled data center by up to 38% compared to other works in the area, while guaranteeing the same QoS. Moreover, the results show better scalability than all other approaches. Our proposed approach improves energy efficiency even for the largest simulated benchmarks and incurs less than 5% time overhead to execute for a data center with 7600 physical hosts.


  6. Infrastructure Aware Heterogeneous-Workloads Scheduling for Data Center Energy Cost Minimization. In IEEE Transactions on Cloud Computing, volume 10.

    Abstract

    High energy consumption, its cost, and its environmental effects have become serious issues for commercial cloud providers. Solar energy is a promising clean energy source for supplying part of an Internet data center's (IDC's) energy usage, which can reduce both environmental effects and total energy costs. Moreover, due to the high energy consumption of the cooling system, considering cooling power in job scheduling can provide efficient solutions to reduce total energy consumption. In this article, we investigate the problem of minimizing the energy cost of an IDC and propose an algorithm that schedules heterogeneous IDC workloads by considering the available renewable energy, the cooling subsystem, and the electricity rate structure. We evaluate the effectiveness and feasibility of our algorithm using real and synthetic workload traces. The simulation results illustrate how our proposed solution reduces the data center's energy cost by up to 46 percent compared to previous solutions. Moreover, results show that our solution is capable of reducing the energy cost of data centers under different weather conditions and rate structures.


  7. MAGNETIC: Multi-Agent Machine Learning-Based Approach for Energy Efficient Dynamic Consolidation in Data Centers. In IEEE Transactions on Services Computing, volume 15.

    Abstract

    Improving the energy efficiency of data centers while guaranteeing Quality of Service (QoS), together with detecting performance variability of servers caused by either hardware or software failures, are two of the major challenges for efficient resource management of large-scale cloud infrastructures. Previous works in the area of dynamic Virtual Machine (VM) consolidation mostly focus on the energy challenge, but fall short in proposing comprehensive, scalable, and low-overhead approaches that jointly tackle energy efficiency and performance variability. Moreover, they usually assume over-simplistic power models, and fail to accurately consider all the delay and power costs associated with VM migration and host power mode transition. These assumptions are no longer valid in modern servers executing heterogeneous workloads and lead to unrealistic or inefficient results. In this paper, we propose a centralized-distributed, low-overhead, failure-aware dynamic VM consolidation strategy to minimize energy consumption in large-scale data centers. Our approach selects the most suitable power mode and frequency of each host at runtime using a distributed multi-agent Machine Learning (ML) based strategy, and migrates the VMs accordingly using a centralized heuristic. Our Multi-AGent machine learNing-based approach for Energy efficienT dynamIc Consolidation (MAGNETIC) is implemented in a modified version of the CloudSim simulator and considers the energy and delay overheads associated with host power mode transition and VM migration. It is evaluated using power traces collected from various workloads running on real servers and resource utilization logs from cloud data center infrastructures. Results show how our strategy reduces data center energy consumption by up to 15 percent compared to other works in the state-of-the-art (SoA), guaranteeing the same QoS and reducing the number of VM migrations and host power mode transitions by up to 86 and 90 percent, respectively. Moreover, it shows better scalability than all other approaches, taking less than 0.7 percent time overhead to execute for a data center with 1,500 VMs. Finally, our solution is capable of detecting host performance variability due to failures, automatically migrating VMs away from failing hosts and draining those hosts of workload.


  8. Fast and Energy-Efficient CNFET Adders with CDM and Sensitivity-Based Device-Circuit Co-Optimization. In IEEE Transactions on Nanotechnology, volume 17.

    Abstract

    Since integrated circuit technology entered the nanoscale regime, energy efficiency has become one of the most significant challenges. The carbon nanotube field effect transistor (CNFET) is one of the most highly regarded nanoscale replacement devices because its fabrication process is similar to that of current CMOS technology. The central question of this paper is which other controllable parameters specific to CNFET technology designers can use to build high-performance, energy-efficient circuits, and how strongly these parameters affect circuit characteristics. In this regard, two energy-efficient full adders, crucial building blocks of digital systems, are designed in 32 nm CNFET technology. Cell design methodology, an efficient logic style, is used for the new designs, and CNFET-SEA is used for the optimization. CNFET-SEA, a modification of the simple exact algorithm (SEA), is proposed as an appropriate sizing algorithm for circuits in CNFET technology. Sensitivity analysis is used as a new approach in the CNFET-SEA algorithm to obtain better sizing results in a shorter runtime. The number of tubes, the diameter of the tubes, and the pitch are considered as the three specific device parameters in CNFET technology for device-circuit co-optimization, and their effect on the circuit characteristics is investigated. The simulation results show a 15-97% delay, 8-87% power-delay product (PDP), and 22-99% energy-delay product improvement for the proposed full adders compared with the referenced ones. PDP optimization with CNFET-SEA shows an 11-20% improvement over SEA, with a significant runtime reduction for the selected adders.


  9. CNTFET Full-Adders for Energy-Efficient Arithmetic Applications. In 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT), IEEE.

    Abstract

    In this paper, we present two energy-efficient full adders (FAs), a crucial building block of nano arithmetic logic units (nano-ALUs), designed with the Cell Design Methodology (CDM). Since the most suitable design configuration for CNT-based ICs is the pass transistor configuration (PTL), we use CDM, which properly benefits from the advantages of PTL. The designs therefore take full advantage of the simplicity, lower transistor count, and better immunity against threshold voltage fluctuations of PTL compared with the CCMOS configuration. CDM also employs elegant mechanisms to resolve two problems of PTL: threshold voltage drop and loss of gain. Using these corrective mechanisms and the SEA sizing algorithm for CNTFETs, the proposed circuits enjoy full swing at all outputs and internal nodes, structural symmetry, reduced power-delay product (PDP) and energy-delay product (EDP), fairly balanced outputs, and high driving capability. The state of the art considered includes both bulk CMOS and CNTFET technologies. The simulation results exhibit average PDP and EDP improvements of 9-98% and 55-99%, respectively, compared with the referenced FAs. All HSPICE simulations were performed on 32 nm CNTFET and CMOS process technologies.


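Illustrative sketch for entry 2 (Carbon Emission-Aware Job Scheduling for Kubernetes Deployments): the abstract describes predicting future CO2 emissions from historical data and then greedily shifting non-critical jobs in time. The Python sketch below shows one simple way such a greedy time shift could look. It is not the authors' implementation and does not use the Kubernetes scheduler extender API. The `Job` fields, the function names, the per-slot mean forecast, and all numbers are hypothetical assumptions chosen for illustration; cluster capacity is ignored.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    name: str
    duration: int        # consecutive time slots the job needs (assumed)
    deadline: int        # job must finish before this slot index (exclusive)
    kwh_per_slot: float  # assumed constant energy draw per slot


def forecast_from_history(history):
    """Naive forecast (an assumption, not the paper's predictor):
    per-slot mean carbon intensity (gCO2/kWh) over past days."""
    return [mean(slot_values) for slot_values in zip(*history)]


def pick_start(job, forecast):
    """Greedy choice for one job: the start slot whose window has the
    lowest forecast emissions while still meeting the deadline."""
    horizon = min(job.deadline, len(forecast))
    latest_start = horizon - job.duration
    if latest_start < 0:
        raise ValueError(f"{job.name} cannot finish before its deadline")

    def emissions(start):
        window = forecast[start:start + job.duration]
        return job.kwh_per_slot * sum(window)

    # min() returns the earliest slot among equally good candidates.
    return min(range(latest_start + 1), key=emissions)


def schedule(jobs, forecast):
    """Place each job independently; no capacity constraints are modeled."""
    return {job.name: pick_start(job, forecast) for job in jobs}


if __name__ == "__main__":
    # Two hypothetical past days, six time slots each (gCO2/kWh).
    history = [
        [520, 410, 90, 130, 460, 310],
        [480, 390, 110, 110, 440, 290],
    ]
    forecast = forecast_from_history(history)
    jobs = [
        Job("batch-a", duration=2, deadline=6, kwh_per_slot=1.5),
        Job("batch-b", duration=1, deadline=2, kwh_per_slot=0.5),
    ]
    print(schedule(jobs, forecast))
```

Each job is placed on its own, so the result is greedy and generally sub-optimal, which matches the spirit of the abstract's wording but not necessarily its details. A production scheduler would also have to respect node capacity and QoS constraints.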