Designing RISC-V Instruction Set Extensions for Artificial Neural Networks: An LLVM Compiler-Driven Perspective

By: Karthikeyan Kalyanasundaram Balasubramanian, Mirco Di Salvo, Walter Rocchia, Sergio Decherchi, Marco Crepaldi

Format: Article
Published: IEEE, 2024-01-01

Description

The demand for Artificial Intelligence (AI) based solutions is increasing exponentially across all application fields, including low-power devices on the edge. However, due to their limited computational capabilities, these devices, which run Central Processing Units (CPUs) tailored to embedded applications, are typically not optimized to run complex neural networks. Providing ad-hoc extensions to the instruction set architecture of a RISC-V processor can be a viable solution to address this issue. In this work, we propose using the PyTorch Graph Lowering (Glow)–LLVM toolchain to understand the impact of the compiled code of AI models on a RISC-V machine and to extend its instruction set to improve runtime performance. This approach allows code profiling, detection of computational bottlenecks, and provisioning of the necessary CPU enhancements in the LLVM backend before hardware implementation. After profiling known Artificial Neural Networks (ANNs) quantized to int8 (in particular, a single perceptron, RESNET18, VGG11, and LENET5), we identified and devised three additional instructions, named LWM, LWA, and LWS, indicating Load Word-and-Multiply, -Add, and -Subtract, respectively. As a result, we obtained an edge-AI-oriented processor description, significantly improved in terms of inference time and program density and ready for hardware design. For 128×128 RGB images, the custom extensions enabled up to a 13× speed-up compared to RV32I and a 5× speed-up compared to RV32IM, with up to 11.7% smaller code.
Together with these findings, the paper systematically highlights the main methodological steps required to include new instructions in an LLVM backend.