Alleviating Memory Usage Bottleneck on IoT Devices

Machine learning provides powerful tools to researchers to identify and predict patterns and behaviors, as well as learn, optimize, and perform tasks. This ranges from applications like vision systems on autonomous vehicles or social robots to smart thermostats to wearable and mobile devices like smartwatches and apps that can monitor health changes. While these algorithms and their architectures are becoming more powerful and efficient, they typically require tremendous amounts of memory, computation, and data to train and make inferences.

At the same time, researchers are working to reduce the size and complexity of the devices that these algorithms can run on, all the way down to a microcontroller unit (MCU) that’s found in billions of internet-of-things (IoT) devices. An MCU is memory-limited minicomputer housed in compact integrated circuit that lacks an operating system and runs simple commands. These relatively cheap edge devices require low power, computing, and bandwidth, and offer many opportunities to inject AI technology to expand their utility, increase privacy, and democratize their use — a field called TinyML.

Now, an MIT team working in TinyML in the MIT-IBM Watson AI Lab and the research group of Song Han, assistant professor in the Department of Electrical Engineering and Computer Science (EECS), has designed a technique to shrink the amount of memory needed even smaller, while improving its performance on image recognition in live videos. “Our new technique can do a lot more and paves the way for tiny machine learning on edge devices,” says Han, who designs TinyML software and hardware. To increase TinyML efficiency, Han and his colleagues from EECS and the MIT-IBM Watson AI Lab analyzed how memory is used on microcontrollers running various convolutional neural networks (CNNs). CNNs are biologically-inspired models after neurons in the brain and are often applied to evaluate and identify visual features within imagery, like a person walking through a video frame. In their study, they discovered an imbalance in memory utilization, causing front-loading on the computer chip and creating a bottleneck. By developing a new inference technique and neural architecture, the team alleviated the problem and reduced peak memory usage by four-to-eight times. Further, the team deployed it on their own tinyML vision system, equipped with a camera and capable of human and object detection, creating its next generation, dubbed MCUNetV2. When compared to other machine learning methods running on microcontrollers, MCUNetV2 outperformed them with high accuracy on detection, opening the doors to additional vision applications not before possible.

The results will be presented in a paper at the conference on Neural Information Processing Systems (NeurIPS). The team includes Han, lead author and graduate student Ji Lin, postdoc Wei-Ming Chen, graduate student Han Cai, and MIT-IBM Watson AI Lab Research Scientist Chuang Gan.

For more, please click here.