JOURNAL ARTICLE 1: STATISTICAL BEHAVIOUR GUIDED BLOCK ALLOCATION IN HYBRID CACHE-BASED EDGE COMPUTING FOR CYBER-PHYSICAL-SOCIAL SYSTEMS
With the proliferation of the Internet of Things (IoT), the cyber, physical, and social worlds are becoming integrated into what are referred to as Cyber-Physical-Social Systems (CPSS). In CPSS, large-scale data are continually generated by edge computing devices in our daily lives. These heterogeneous data urgently need to be processed efficiently and with low power consumption. To exploit the benefits of both SRAM and STT-RAM, hybrid cache architectures have been explored to mitigate the inefficient write operations of STT-RAM.
Recent work proposes a trace-based prediction hybrid cache that predicts write-burst blocks dynamically, but this design introduces significant overhead that cannot be ignored. The effectiveness of SBOA was evaluated along several dimensions: prediction accuracy, energy consumption, execution time, and overall overhead. One of the key findings and contributions of this research is a theoretical analysis of the energy consumption of a hybrid cache with data allocation.
Since the cost of write operations in SRAM is much smaller than in STT-RAM, these policies reduce the number of write operations in STT-RAM, thereby improving write performance and reducing write energy. This research proposed a statistics-based SBOA scheme to improve hybrid cache efficiency in CPSS, together with a hybrid cache architecture based on reuse distance prediction, where the reuse distance of a block is the distance from its first reference to its last reference.
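Using the definition above (distance from a block's first reference to its last reference), the reuse distance of each block can be computed from an access trace. The following sketch is illustrative only; the function name and trace format are assumptions, not the paper's implementation.

```python
def reuse_distances(trace):
    """Reuse distance per block: number of accesses between the first
    and the last reference to that block (per the definition above).
    `trace` is a list of block identifiers (a hypothetical format)."""
    first, last = {}, {}
    for i, block in enumerate(trace):
        first.setdefault(block, i)  # remember the first reference
        last[block] = i             # keep updating the last reference
    return {b: last[b] - first[b] for b in first}

# Blocks with short reuse distances are candidates for the small,
# write-cheap SRAM partition; the rest can stay in dense STT-RAM.
trace = ["A", "B", "A", "C", "B", "A"]
print(reuse_distances(trace))  # {'A': 5, 'B': 3, 'C': 0}
```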
This research proposed a novel approach called the Statistical Behaviour guided block Allocation (SBOA) scheme to process CPSS data. Based on these designs, a set of block placement and migration policies is used to map write-intensive blocks from STT-RAM to SRAM. In CPSS, large-scale data are generated every day by edge computing devices, so edge computing methods become more attractive and can improve performance for edge devices.
Many researchers have explored hybrid cache architectures with various memory technologies. The simulation results showed that, compared to the state-of-the-art approach, the proposed technique improves energy efficiency as well as performance with acceptable overhead in CPSS. The model identified cache block characteristics from the read/write statistical behavior of historical data, which was recorded in the sampler.
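The idea of a sampler that records per-block read/write statistics and steers write-intensive blocks toward SRAM can be sketched as follows. This is a minimal illustration under assumed names and a simple write-ratio threshold, not the exact SBOA design.

```python
from collections import defaultdict

class BlockSampler:
    """Records per-block read/write counts and classifies blocks for
    placement. The 0.5 write-ratio threshold is an illustrative
    assumption, not a value from the paper."""
    def __init__(self, write_ratio_threshold=0.5):
        self.stats = defaultdict(lambda: {"reads": 0, "writes": 0})
        self.threshold = write_ratio_threshold

    def record(self, block, is_write):
        self.stats[block]["writes" if is_write else "reads"] += 1

    def target(self, block):
        s = self.stats[block]
        total = s["reads"] + s["writes"]
        if total and s["writes"] / total >= self.threshold:
            return "SRAM"    # write-intensive: avoid costly STT-RAM writes
        return "STT-RAM"     # read-dominated: benefit from density, low leakage

sampler = BlockSampler()
for block, is_write in [("X", True), ("X", True), ("X", False),
                        ("Y", False), ("Y", False)]:
    sampler.record(block, is_write)
print(sampler.target("X"), sampler.target("Y"))  # SRAM STT-RAM
```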
JOURNAL ARTICLE 2: HARDWARE IMPLEMENTATION AND ANALYSIS OF GEN-Z PROTOCOL FOR MEMORY CENTRIC ARCHITECTURE
With the increase in memory-intensive applications, a memory-centric architecture has been proposed in which the central processing units (CPUs) access a pool of fabric-attached memory. Resource disaggregation technology has been studied to solve the problem of system performance deterioration, which results from low resource utilization. This technology connects the CPU and the resource through a network rather than a system bus.
However, with advancements in network hardware, the feasibility of disaggregated memory has been evaluated using commercial network equipment. Developing a disaggregated memory architecture requires new support in the operating system (OS), hardware, and protocol. The main obstacle to memory disaggregation has been network limitations: the bandwidth required between the CPU and memory is large, and the latency must be extremely small. In the proposed architecture, the CPU nodes are connected directly to the shared memory pool via the Gen-Z fabric. This allows the memory controller (MC) to be decoupled from the CPU, eliminating CPU and memory dependencies, which solves the problem of inefficient remote memory access, improves resource utilization efficiency, and provides operators with fine-grained resource control.
There are two types of disaggregated memory architecture: partially disaggregated memory, which treats local memory as a cache, and fully disaggregated memory, which does not use local memory as a cache. Lim introduced a prototype of a disaggregated memory blade and investigated the feasibility of partially disaggregated memory in which local physical addresses and remote physical addresses were mapped sequentially.
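The partially disaggregated model described above can be illustrated with a small sketch in which local memory acts as a cache in front of a remote pool, with remote addresses mapped sequentially onto local slots. The direct-mapped policy, sizes, and class names are assumptions for illustration, not details of Lim's prototype.

```python
class PartiallyDisaggregatedMemory:
    """Sketch: local DRAM as a direct-mapped cache over a remote
    memory pool, with sequential local/remote address mapping.
    All parameters are illustrative assumptions."""
    def __init__(self, local_pages, remote_pages):
        self.local_pages = local_pages
        self.remote = [0] * remote_pages   # backing remote pool
        self.cache = {}                    # local slot -> (remote page, data)

    def read(self, remote_page):
        slot = remote_page % self.local_pages  # sequential, direct-mapped
        tag, data = self.cache.get(slot, (None, None))
        if tag == remote_page:
            return data, "local hit"           # served from local memory
        data = self.remote[remote_page]        # fetched over the fabric
        self.cache[slot] = (remote_page, data)
        return data, "remote fetch"

mem = PartiallyDisaggregatedMemory(local_pages=4, remote_pages=16)
print(mem.read(5)[1])  # remote fetch
print(mem.read(5)[1])  # local hit
```

In the fully disaggregated case, every access would take the "remote fetch" path, which is why leveraging local memory as a cache matters for latency.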
The disaggregated recursive data-center-in-a-box project proposed a fully disaggregated memory architecture based on software-defined networking and constructed a prototype. The feasibility of using commercial networks based on the disaggregated memory blade model was evaluated. The main contributions of the aforementioned work were the disaggregated MC (DMC) and a remote memory device driver. They reported that fully disaggregated memory is feasible and argued that performance can be improved by developing the software architecture.
Although the performance of the Gen-Z prototype was better than that of local memory and SSD in some situations, the remote memory performance was inadequate overall, owing to the inability to leverage the CPU cache and the overhead of packet communication. In a future study, this research intends to improve the Gen-Z prototype accordingly and construct a 4-TB disaggregated memory pool by physically separating the Gen-Z host adapter and the Gen-Z media controller.
The results indicated that the performance of the Gen-Z media controller had some benefits for write requests compared with local memory, and was better than that of SSD. This research measured the performance of remote memory access using the Gen-Z media controller and compared it with the performance of local memory and SSD.
JOURNAL ARTICLE 3: OPTIMIZING HEAP MEMORY OBJECT PLACEMENT IN THE HYBRID MEMORY SYSTEM WITH ENERGY CONSTRAINTS
Main memory significantly impacts the power and energy utilization of the overall server system. Non-Volatile Memory (NVM) devices play an integral role in reducing the static energy consumption of main memory. NVM devices such as STT-RAM and 3D-XPoint are becoming a more convenient and safe alternative to DRAM as main memory.
This is due to vital properties such as byte-addressability, persistence, high density, and low energy consumption. Furthermore, memory devices such as DRAM, PCM, and STT-RAM exhibit different performance and energy-consumption characteristics. Application energy consumption varies across workloads because an application's memory object access patterns, such as lifetime, size, accessed volume, read/write ratio, and spatial and temporal locality, change gradually with the workload.
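How per-object energy depends on both the access pattern and the device can be made concrete with a simple cost model: dynamic energy per read/write plus static energy over the object's lifetime. The device parameters below are made-up placeholders to show the trade-off, not measured values from the paper.

```python
# Illustrative per-object energy model for a hybrid memory system.
# All numbers are hypothetical placeholders, not measurements.
DEVICES = {
    #           energy/read  energy/write  static power per second
    "DRAM":    {"e_read": 1.0, "e_write": 1.0, "p_static": 0.50},
    "STT-RAM": {"e_read": 1.2, "e_write": 4.0, "p_static": 0.05},
}

def object_energy(device, reads, writes, lifetime_s):
    """Dynamic energy of all accesses plus static energy over the
    object's lifetime on the given device."""
    d = DEVICES[device]
    return reads * d["e_read"] + writes * d["e_write"] + lifetime_s * d["p_static"]

# A long-lived, read-mostly object favours NVM (low leakage),
# even though each NVM write is far more expensive.
print(object_energy("DRAM", reads=100, writes=10, lifetime_s=600))     # 410.0
print(object_energy("STT-RAM", reads=100, writes=10, lifetime_s=600))  # 190.0
```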
CPU-based energy reduction methodologies have also been studied for DRAM, such as powering down memory ranks and controlling the base memory voltage and frequency.
In a hybrid main memory system (HMMS), optimally placing memory objects on a specific memory module leads to optimized performance and high energy efficiency. Emerging memory devices, such as Spin-Transfer Torque RAM (STT-RAM), Phase Change Memory (PCM), Magnetic RAM (MRAM), and 3D-XPoint, are being studied for use either as main memory or alongside traditional DRAM. eMap considers the fine-grained memory object access patterns and per-object energy consumption of an application to provide optimal placement policies for memory objects that meet the energy constraint while optimizing performance in the HMMS.
While eMPlan decides the placement of objects to optimize performance and meet the given energy constraints, the runtime memory allocator of eMDyn can re-allocate objects to follow the decided placement during application execution. This research provided the following specific contributions: eMPlan employs a memory object profiler, an Integer Linear Programming (ILP) based energy estimator, a placement planner, and a runtime memory allocator. This research proposed an optimal memory object placement solution that considers both memory access patterns and the characteristics of the memory devices in the HMMS.
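The flavour of the placement planner, choosing a device for each object to minimize access time subject to an energy budget, can be sketched as a toy exhaustive search over a few objects. eMPlan formulates this as an ILP; the brute force below, with hypothetical per-object costs and budget, is only a stand-in to show the optimization structure.

```python
from itertools import product

# Toy stand-in for an ILP placement planner: pick DRAM or NVM per
# object to minimize total access time within an energy budget.
# All costs and the budget are hypothetical illustrative numbers.
objects = [
    # name,   (time, energy) on DRAM,  (time, energy) on NVM
    ("buf_a", (10, 8), (14, 3)),
    ("buf_b", (6, 5),  (7, 2)),
    ("buf_c", (4, 4),  (9, 1)),
]
ENERGY_BUDGET = 9

best = None
for choice in product(("DRAM", "NVM"), repeat=len(objects)):
    time = sum(o[1][0] if c == "DRAM" else o[2][0]
               for o, c in zip(objects, choice))
    energy = sum(o[1][1] if c == "DRAM" else o[2][1]
                 for o, c in zip(objects, choice))
    if energy <= ENERGY_BUDGET and (best is None or time < best[0]):
        best = (time, choice)

print(best)  # (25, ('NVM', 'NVM', 'DRAM'))
```

An ILP solver scales this to many objects by encoding the same device-choice variables and budget constraint; exhaustive search is only feasible for this tiny example.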