It might seem that all the excitement around machine learning right now is around generative AI, and the new generation of large language models (LLMs). But sometimes you should take a step back, and remember what we were all excited about last time. Because it could be time to get excited about it again.
Before LLMs made it big, we were all excited by embedded machine learning — what’s called TinyML — and a Raspberry Pi is probably the most affordable way to get started. The inferencing performance we see with Raspberry Pi 4 is comparable to, or better than, some of the new accelerator hardware, while the overall hardware cost is much lower.
But what if you want to go really tiny — microcontroller tiny? We have seen some really impressive projects making use of our RP2040 chip, Raspberry Pi Pico, and the RP2040 port of TensorFlow Lite for Microcontrollers. Unfortunately, in the three years since that port was created by Pete Warden, who at the time headed up the TensorFlow mobile team at Google, it has been languishing. But a couple of weeks ago, that all changed.
A pull request the size of Wales
Now at Useful Sensors, Pete has been doing some interesting things with RP2040. He has just upstreamed the last three years of changes — after all, as he puts it, “We all love pull requests with 1,129 changed files, right?” — and he’s taking on maintenance of the port on a best-effort basis.
Dual-core support on RP2040
But there’s more: Pete has also updated the port, speeding up the default CMSIS-NN implementation of Conv2D by splitting the work across both cores on the RP2040, and adding the same dual-core optimisation to depthwise convolutions. For the first time, we have dual-core support for TensorFlow on RP2040! If you’re interested in some of the details behind the updates, Pete has put together a fascinating write-up of the memory layout issues he ran into while debugging the optimisations.
The upshot? These changes cut the runtime of the person detection benchmark from 824ms to 588ms. That’s a 1.4× speedup!
If you’re thinking about doing some tiny machine learning, RP2040 just became your go-to platform.