Neural network inference based on CMSIS-NN kernels delivers significantly improved runtime/throughput and energy efficiency

Neural networks are becoming increasingly popular in "always-on" IoT edge devices that need to analyze data locally, largely because local inference avoids the latency and power cost of transmitting data off the device. When deploying neural networks on such devices, Arm Cortex-M processor cores are a natural choice, and CMSIS-NN is an excellent way to raise performance while reducing memory usage: neural network inference using the CMSIS-NN kernels achieves roughly a 4.6X improvement in runtime/throughput and a 4.9X improvement in energy efficiency compared with a baseline implementation.

The CMSIS-NN library is divided into two main components: NNFunctions and NNSupportFunctions. The NNFunctions module contains the functions that implement common neural network layer types such as convolution, depthwise separable convolution, fully connected (inner-product) layers, pooling, and activations. Application code calls these functions to build and run neural network inference. The API is deliberately kept simple, making it easy to target from any machine learning framework. The NNSupportFunctions provide utility routines, including data conversion and activation tables, that are used by the NNFunctions; developers can also use them to construct more complex modules such as Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) cells.

For certain kernels, such as the fully connected and convolution layers, multiple versions of the functions are available. Arm provides a basic version that can be used "as-is" for any layer configuration, plus optimized versions that run faster but impose constraints on input dimensions or layer parameters. Ideally, a script analyzes the network topology and automatically selects the most suitable function for each layer (a minimal usage sketch of these q7 kernels appears at the end of this section).

We tested the CMSIS-NN kernels on a convolutional neural network (CNN) trained on the CIFAR-10 dataset, which consists of 60,000 32x32 color images across 10 classes. The network architecture follows the built-in CIFAR-10 example in Caffe, with three convolutional layers and one fully connected layer. The table below lists the layer parameters and the detailed per-layer runtimes measured with the CMSIS-NN kernels. The test ran on an Arm Cortex-M7 core, specifically the STM32F746ZG on a NUCLEO-F746ZG mbed development board clocked at 216 MHz. Classifying one image takes approximately 99.1 milliseconds, equivalent to about 10.1 images per second, and the achieved compute throughput is roughly 249 MOps per second. The pre-quantized model reaches 80.3% accuracy on the CIFAR-10 test set, while the 8-bit quantized model reaches 79.9%.

The maximum memory footprint when using the CMSIS-NN kernels is around 133 KB, thanks to the use of a local (partial) im2col for convolution, which saves memory. Without local im2col, memory usage would be approximately 332 KB, which would not fit on this board.

To quantify the benefit of the CMSIS-NN kernels over existing solutions, we also implemented a baseline using the one-dimensional convolution function arm_conv from CMSIS-DSP together with Caffe-like pooling and ReLU implementations. The table below summarizes the comparison between this baseline and the CMSIS-NN kernels for the CNN workload: the CMSIS-NN kernels deliver a 2.6X to 5.4X improvement in runtime/throughput, and the energy-efficiency gains track these figures closely.
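To make the structure described above concrete, here is a minimal sketch of how the legacy q7 (8-bit) CMSIS-NN kernels can be chained into a small CIFAR-10-style pipeline. It assumes the CMSIS-NN 1.x function names (arm_convolve_HWC_q7_basic, arm_relu_q7, arm_maxpool_q7_HWC, arm_fully_connected_q7, arm_softmax_q7); all layer dimensions, fixed-point shifts, and the weight/bias arrays are placeholders for illustration, not the parameters of the network benchmarked above.

```c
#include "arm_nnfunctions.h"   /* CMSIS-NN NNFunctions (legacy q7 API) */

/* Placeholder layer dimensions for a tiny CIFAR-10-style first stage.   */
/* These are illustrative values, not the benchmarked network's layout.  */
#define IN_DIM        32        /* 32x32 input image                     */
#define IN_CH         3         /* RGB channels                          */
#define CONV_OUT_CH   32        /* output feature maps                   */
#define KERNEL_DIM    5
#define PADDING       2
#define STRIDE        1
#define CONV_OUT_DIM  32        /* (32 + 2*2 - 5)/1 + 1 = 32             */
#define POOL_DIM      2
#define POOL_STRIDE   2
#define POOL_OUT_DIM  16        /* (32 - 2)/2 + 1 = 16                   */
#define NUM_CLASSES   10

/* Quantized (q7) weights and biases would normally come from an offline */
/* quantization step; they are declared extern here as placeholders.     */
extern const q7_t conv_wt[CONV_OUT_CH * IN_CH * KERNEL_DIM * KERNEL_DIM];
extern const q7_t conv_bias[CONV_OUT_CH];
extern const q7_t fc_wt[NUM_CLASSES * CONV_OUT_CH * POOL_OUT_DIM * POOL_OUT_DIM];
extern const q7_t fc_bias[NUM_CLASSES];

/* Shared scratch buffer: used for the local im2col expansion inside the */
/* convolution and the q15 vector expansion inside the fully connected   */
/* layer; sized for the larger of the two requirements.                  */
static q15_t bufferA[CONV_OUT_CH * POOL_OUT_DIM * POOL_OUT_DIM];
static q7_t  conv_out[CONV_OUT_CH * CONV_OUT_DIM * CONV_OUT_DIM];
static q7_t  pool_out[CONV_OUT_CH * POOL_OUT_DIM * POOL_OUT_DIM];
static q7_t  scores[NUM_CLASSES];

void classify_q7(const q7_t *image /* HWC layout, 32x32x3, q7 */)
{
    /* Convolution: the "basic" kernel accepts any layer configuration;  */
    /* the optimized *_fast variants are faster but constrain, e.g., the */
    /* channel counts. The shift values are placeholder fixed-point      */
    /* scaling parameters.                                               */
    arm_convolve_HWC_q7_basic(image, IN_DIM, IN_CH,
                              conv_wt, CONV_OUT_CH, KERNEL_DIM,
                              PADDING, STRIDE,
                              conv_bias, 0 /* bias_shift */, 9 /* out_shift */,
                              conv_out, CONV_OUT_DIM, bufferA, NULL);

    /* In-place ReLU activation (an NNFunction; no extra buffer needed). */
    arm_relu_q7(conv_out, CONV_OUT_CH * CONV_OUT_DIM * CONV_OUT_DIM);

    /* Max pooling over the activated feature maps.                      */
    arm_maxpool_q7_HWC(conv_out, CONV_OUT_DIM, CONV_OUT_CH,
                       POOL_DIM, 0 /* padding */, POOL_STRIDE,
                       POOL_OUT_DIM, NULL, pool_out);

    /* Fully connected (inner-product) layer producing the class scores. */
    arm_fully_connected_q7(pool_out, fc_wt,
                           CONV_OUT_CH * POOL_OUT_DIM * POOL_OUT_DIM,
                           NUM_CLASSES, 0 /* bias_shift */, 7 /* out_shift */,
                           fc_bias, scores, bufferA);

    /* Softmax to turn the q7 scores into a probability-like output.     */
    arm_softmax_q7(scores, NUM_CLASSES, scores);
}
```

Swapping arm_convolve_HWC_q7_basic for one of the constrained optimized variants when a layer's dimensions permit is exactly the kind of per-layer substitution that the topology-analysis script mentioned above would automate.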
An efficient neural network kernel library is essential for extracting the full potential of Arm Cortex-M CPUs. CMSIS-NN provides optimized functions that accelerate key neural network operations such as convolution, pooling, and activation, and it also plays a crucial role in reducing the memory footprint, which is critical on microcontrollers with limited resources. By combining high performance with low memory usage, CMSIS-NN enables powerful AI capabilities on resource-constrained IoT edge devices.
