Scalability and Performance Evaluation of Edge Cloud Systems
This project aims to analyze the scalability and performance of an edge cloud system for latency-constrained applications. Capacity impacts are analyzed for key parameters such as the distribution of resources between the core cloud and edge clouds, application performance requirements, edge-cloud-to-core-cloud bandwidth, and inter-edge-cloud bandwidth, under varying system loads.
Edge clouds promise to meet the stringent latency requirements of emerging classes of real-time applications such as augmented reality (AR) and virtual reality (VR) by bringing compute, storage, and networking resources closer to user devices. Edge compute resources, strategically placed near users in the access network, do not incur the irreducible propagation delays associated with offloading compute-intensive tasks to a distant data center. In addition, edge computing can lower the wide-area backhaul costs of carrying user data back and forth from the central cloud. AR and VR applications enable users to view and interact with virtual objects in real time, and hence require fast end-to-end delivery of compute services such as image analytics and video rendering.
While edge clouds have significant potential for improved system-level performance, there are important trade-offs between edge and core clouds that need to be considered. Specifically, core clouds implemented as large-scale data centers have the important advantage of aggregating service demand from large numbers of users, making the traffic volume predictable. Further, service requests entering a large data center can be handled in a close-to-optimal manner via centralized routing and load-balancing algorithms. In contrast, edge clouds are intrinsically local and smaller in scale, and are thus subject to significantly larger fluctuations in offered traffic due to factors such as correlated events and user mobility. In addition, edge computing systems are by definition distributed across multiple edge networks and hence exhibit considerable heterogeneity in bandwidth and compute resources. Moreover, the data center model of centralized resource control is not applicable to a distributed system implemented across multiple edge network domains, possibly involving multiple service providers.
A general technology solution for edge clouds will thus require suitable distributed control algorithms and the associated control-plane protocols needed to realize them. The unique nature of a distributed edge cloud system poses key design challenges: specification of a control plane for the distributed edge, distributed versus centralized resource-assignment strategies, traffic load balancing, orchestration of computing functions and the related network routing of data, mobility management techniques, and so on. To address these challenges, a simulation-based system model provides the foundation for understanding performance and evaluating alternative strategies for each of these design issues.
Low-latency application performance is the primary metric for this study. We deployed two sample applications at WINLAB using the Microsoft HoloLens head-mounted display (HMD). The first is a smart navigation and meeting application in which AR cubes appear along the user's path to guide them to a destination. The second is annotation-based assistance, in which a user looks at a device or object and its real-time status appears on the HMD. Beyond low-latency and high-compute requirements, these applications also represent a paradigm shift in traffic patterns: the user sends a continuous stream of video/audio to the server, which, after complex processing, returns a small output to the user. This inverts the conventional HTTP model, in which the server is a data provider and the client a consumer; here the server consumes the data, and upload traffic is much higher than download traffic. It also differs from the long-tail nature of HTTP sessions, where clients slow down as they age; here, older clients maintain consistent upload rates.
We observe that despite providing features such as massive aggregation, a cloud-only system, i.e., one without any edge cloud deployments, is not sufficient to support AR. The situation worsens when we limit the bandwidth to the cloud: latency shoots up as high as 300 ms. We therefore use a hybrid edge cloud system (shown below), assuming that the edge clouds are placed close to the access points in the city of Chicago and that the core cloud is at a single central location in Salem, Oregon. We assume that the total compute capacity of this edge cloud system is constant: as we disaggregate core cloud resources to the edge, the total remains the same. The system load is varied from 1 to 10, where 10 corresponds to 100%. The application is decomposed using a four-tuple consisting of tasks per unit time, geographical block, availability of edge compute, and latency threshold. Compute is modeled using an M/M/c queue, while the edge and cloud latencies are linear combinations of transmission latency, propagation latency, switching delay, and processing latency.
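The compute and latency model above can be sketched in a few lines of Python. The M/M/c queueing delay follows the standard Erlang C formula, and the end-to-end latency is composed as the linear combination of delay terms described above; all function names and parameter values here are illustrative, not taken from the actual simulator.

```python
import math

def mmc_wait_time(lam, mu, c):
    """Mean queueing delay of an M/M/c queue via the Erlang C formula.

    lam: task arrival rate (tasks/s), mu: per-server service rate (tasks/s),
    c: number of servers. Requires lam < c * mu for stability.
    """
    rho = lam / (c * mu)
    assert rho < 1, "queue is unstable (offered load exceeds capacity)"
    a = lam / mu  # offered load in Erlangs
    # Erlang C: probability that an arriving task must wait
    num = a**c / (math.factorial(c) * (1 - rho))
    den = sum(a**k / math.factorial(k) for k in range(c)) + num
    p_wait = num / den
    return p_wait / (c * mu - lam)  # mean waiting time in queue (s)

def response_latency(lam, mu, c, t_tx, t_prop, t_switch):
    """End-to-end latency as a linear combination of transmission,
    propagation, and switching delays plus processing latency
    (queueing delay + mean service time 1/mu)."""
    return t_tx + t_prop + t_switch + mmc_wait_time(lam, mu, c) + 1.0 / mu
```

This captures the edge/core trade-off directly: an edge cloud has a small `t_prop` but a small `c`, so its queueing delay grows quickly with load, while the core cloud has a large `c` but a large `t_prop`.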
Next, we limit the bandwidth and/or the compute capacity. The Baseline is designed as follows: each application task can choose its closest AP edge cloud or one of the p closest neighbors, i.e., a task can offload to a local edge cloud or go to the core cloud. This is a local optimization of compute-resource usage. Resources are distributed between the edge and the core cloud; CE 80-20, for example, means that 80% of the resources are at the core cloud and 20% are at the edge. With this scheme, we observe that the application response time is tolerable if the inter-edge bandwidth is high. From the edge cloud owner's perspective, however, this is not enough, as resources at the edge are left unused while tasks are offloaded to the cloud. Finally, we design an optimization framework to find a usable server, one whose application latency is below the latency threshold. The delay constraint is defined as the number of requests out of 100 served within the latency threshold. The objective of finding usable servers is a maximum-cardinality bin-packing problem (ECON) and is therefore NP-hard. The relaxation is thus to find the highest-compute node anywhere in the edge layer.
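The Baseline's distributed selection rule can be sketched as a simple greedy procedure: a task tries its p closest edge clouds in distance order and falls back to the core cloud when none can meet the latency threshold. This is a minimal illustration under assumed interfaces; `task_latency_fn` and the node identifiers are hypothetical, not part of the actual simulator.

```python
def baseline_select(task_latency_fn, edges, core, p, threshold):
    """Distributed Baseline selection (greedy, local).

    task_latency_fn: callable mapping a node ID to its predicted
        application response latency for this task.
    edges: edge cloud IDs sorted by distance from the task's access point.
    core: ID of the core cloud (fallback destination).
    p: number of closest edge neighbors a task may consider.
    threshold: latency threshold for the application task.
    Returns the chosen node ID.
    """
    for edge in edges[:p]:
        # Accept the first (i.e., closest) edge that meets the threshold
        if task_latency_fn(edge) <= threshold:
            return edge
    # No nearby edge qualifies: offload to the core cloud
    return core
```

In contrast, ECON makes this choice centrally across all tasks, which is why it can keep tasks at the edge even when the closest p neighbors of a given access point are saturated.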
The figure below presents a sample result comparing ECON with the Baseline for different resource distributions and system loads. With higher bandwidth and more edge resources, ECON and the Baseline are comparable, but ECON does much better when edge resources are scarce and bandwidth is sufficient.
This project provides a framework for modeling and analyzing the capacity of a city-scale hybrid edge cloud system intended to serve augmented reality applications with service time constraints. A baseline distributed decision scheme is compared with a centralized decision approach (ECON) across varying system loads, edge-cloud resource distributions, inter-edge bandwidths, and edge-core bandwidths. The results show that a core-cloud-only system outperforms the edge-only system when inter-edge fronthaul bandwidth is low. The system analysis results provide guidance for selecting the right balance between edge and core cloud resources given a specified application delay constraint. We have shown that with higher inter-edge bandwidth and more edge computing resources, distributed edge selection achieves performance close to centralized optimization, whereas with ample core cloud resources and no bandwidth constraints, ECON provides a lower average response time. Our study also shows that adding capacity to an existing edge resource without increasing inter-network bandwidth may actually increase network-wide congestion and can reduce system capacity. Future work includes evaluating alternative application profiles and analyzing the impact of mobility on system capacity and edge placement.