As artificial intelligence (AI) transforms various industries, state-of-the-art models have exploded in size and capability. The growth in AI model complexity is rapidly outstripping hardware evolution, making the deployment of these models on edge devices challenging. To enable advanced AI locally, models must be optimized to fit within hardware constraints.
In this presentation, we will first discuss how computing hardware designs impact the effectiveness of commonly used AI model efficiency optimizations, such as quantization and pruning. We will then present several methods, including hardware-aware quantization and structured pruning, to demonstrate the significance of software/hardware co-design. We will also show how these methods can be understood via a straightforward theoretical framework, which facilitates their integration into practical applications and their extension to distributed edge computing. After our presentation, we will share our insights and vision for achieving efficient and robust AI at the edge.
Yiran Chen is the John Cocke Distinguished Professor of Electrical and Computer Engineering at Duke University. He serves as the director of the NSF AI Institute for Edge Computing Leveraging Next-generation Networks (Athena), the NSF Industry-University Cooperative Research Center (IUCRC) for Alternative Sustainable and Intelligent Computing (ASIC), and as the co-director of the Duke Center for Computational Evolutionary Intelligence (DCEI).
An author of over 600 publications and 96 US patents, Dr. Chen has been recognized with numerous awards and honors, including the IEEE Circuits and Systems Society’s Charles A. Desoer Technical Achievement Award and the IEEE Computer Society’s Edward J. McCluskey Technical Achievement Award. He is a Fellow of the AAAS, ACM, IEEE, and NAI, and currently serves as the chair of ACM SIGDA. He is a founding member of the steering committee of the Academic Alliance on AI Policy (AAAIP).