SHF: Small: A General Framework for Accelerating AI on Resource-Constrained Edge Devices

Dates: 07/01/22 – 06/30/25
Award Amount: $600,000.00 (FY2022), $16,000.00 (FY2023)
Award #: 2211163

PI: Yingying Chen
Co-PIs: Narayan Mandayam, Waheed Bajwa


The pervasive adoption of edge devices creates excellent opportunities for on-device intelligence in future mobile and IoT applications, including mobile augmented reality (AR)/virtual reality (VR), smart manufacturing, mobile healthcare, and autonomous vehicles. While these edge devices have complete software/hardware stacks for executing machine-learning models, their computing resources are typically too constrained to run such models directly. To keep pace with the fast-growing deployment of mobile and IoT applications, new neural-network architectures are urgently needed to accelerate artificial intelligence (AI) on resource-constrained edge devices. This project aims to develop a novel framework that can efficiently design neural-network architectures suitable for execution on edge devices. The proposed framework produces network architectures that simultaneously balance memory cost, computing efficiency, and prediction accuracy, advancing on-device AI applications with low-latency and high-efficiency requirements. The new deployment-optimization methods can broadly benefit neural-network implementation and deployment on heterogeneous commodity computing platforms without customized hardware. The project will lay a solid foundation for a broad range of research topics related to computing-architecture design and edge-computing systems. The research results can enrich interdisciplinary curricula with new research topics and tasks for undergraduate/graduate and minority students.

This project develops a holistic framework for designing efficient and effective neural-network architectures under edge devices' hardware constraints. It first develops an automated hardware-aware neural-architecture-design method to efficiently generate optimal neural-network architectures that balance the trade-offs between required accuracy and computational performance. A further investigation develops novel neural-network optimization methods that reduce memory footprints and computational costs in a fine-grained way while satisfying the accuracy requirement. New pruning methods are designed to accurately track the importance of network parameters and effectively reduce pruning iterations and floating-point operations. Moreover, novel implementation mechanisms, such as weight-sharing-aware fine-tuning, dynamic partitioning, and on-demand loading schemes, are developed to minimize loading-time overhead and enable efficient deployment and evaluation of the designed architectures on edge devices. Commodity edge devices and FPGAs are employed to implement and evaluate the designed neural-network architectures.
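To make the pruning idea concrete, the sketch below illustrates generic one-shot magnitude-based pruning, in which parameters with the smallest absolute values are treated as unimportant and zeroed out. This is only an illustrative example of the general technique; the project's actual importance-tracking and iteration-reduction methods are not specified here, and the function name and layer shape are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    A generic magnitude-based pruning step: parameters whose absolute
    value falls below the chosen percentile are set to zero, reducing
    the effective number of parameters (and, with sparse kernels, the
    floating-point operations needed at inference time).
    """
    threshold = np.percentile(np.abs(weights), sparsity * 100)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))              # stand-in for one layer's weights
pruned, mask = magnitude_prune(w, 0.75)  # target 75% sparsity
print(f"nonzero fraction: {mask.mean():.2f}")
```

In practice such a step is typically followed by fine-tuning to recover accuracy, and iterated until the desired sparsity/accuracy trade-off is reached.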

This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.