A big bang update for TensorFlow Lite for Microcontrollers

It might seem that all the excitement in machine learning right now is about generative AI and the new generation of large language models (LLMs). But sometimes you should take a step back and remember what we were all excited about last time, because it could be time to get excited about it again.

Before LLMs made it big, we were all excited by embedded machine learning (what’s called TinyML), and a Raspberry Pi is probably the most affordable way to get started. The inferencing performance we saw with Raspberry Pi 4 was comparable to or better than some of the new accelerator hardware, and your overall hardware cost is that much lower.

But what if you want to go really tiny, as in microcontroller tiny? We have seen some really crazy projects making use of our RP2040 chip, Raspberry Pi Pico, and the RP2040 port of TensorFlow Lite for Microcontrollers. That port was done three years ago by Pete Warden, who at that point headed up the TensorFlow mobile team at Google. Unfortunately, it has been languishing ever since. But a couple of weeks ago, that all changed.

A pull request the size of Wales

Now at Useful Sensors, Pete has been doing some interesting things with RP2040. He has just upstreamed the last three years of changes — after all, as he puts it, “We all love pull requests with 1,129 changed files, right?” — and he’s taking on maintenance of the port on a best-effort basis.

Dual-core support on RP2040

But there’s more. Pete has also updated the port, speeding up the default CMSIS-NN implementation for Conv2D by splitting it across both cores on the RP2040, and adding dual-core optimisations to depthwise convolutions. For the first time, we have dual-core support for TensorFlow on RP2040! If you’re interested in some of the detail behind the updates, Pete has put together a fascinating write-up of the memory layout issues he ran into while debugging the optimisations.
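To get a feel for what splitting a convolution across two cores means, here is a minimal desktop Python sketch. It is not the CMSIS-NN kernel from the port, and the article doesn't say how Pete divides the work; this version simply assumes the output rows are cut in half, with one worker per half. The function names `conv2d_rows` and `conv2d_two_workers` are made up for illustration, and Python threads stand in for the two RP2040 cores, so it shows the partitioning rather than any real speed-up.

```python
import threading


def conv2d_rows(image, kernel, out, row_begin, row_end):
    """Fill output rows [row_begin, row_end) of a 'valid' 2D convolution.

    As in most ML frameworks, the kernel is applied without flipping.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_w = len(image[0]) - k_w + 1
    for y in range(row_begin, row_end):
        for x in range(out_w):
            acc = 0
            for ky in range(k_h):
                for kx in range(k_w):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc


def conv2d_two_workers(image, kernel):
    """Assumed split: first half of the output rows here, second half on a worker."""
    out_h = len(image) - len(kernel) + 1
    out_w = len(image[0]) - len(kernel[0]) + 1
    out = [[0] * out_w for _ in range(out_h)]
    mid = out_h // 2

    # The two halves write to disjoint rows of `out`, so no locking is needed.
    second = threading.Thread(
        target=conv2d_rows, args=(image, kernel, out, mid, out_h)
    )
    second.start()
    conv2d_rows(image, kernel, out, 0, mid)
    second.join()  # wait for the other half before using the result
    return out


# A 5x5 ramp image and a 2x2 kernel give a 4x4 output.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 0], [0, -1]]
result = conv2d_two_workers(image, kernel)
```

Because each half only reads the shared input and writes its own rows, the result is the same as running `conv2d_rows` once over every row. On real hardware the second half would be handed to the other core rather than to a Python thread.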

The upshot? These updates reduce the time for the person detection benchmark code from 824ms to 588ms. That’s a ×1.4 speed increase (824 ÷ 588 ≈ 1.4)!

If you’re thinking about doing some tiny machine learning, RP2040 just became your go-to platform.

Nicko

This is great, but TBH what we really need is an update to the now venerable RP2040. Three years is a long time in the microcontroller world!

I know you were working on the Pi5 and finishing up your RP1 chip, but now that that’s done, can we please have an update to the RP2040 with some more RAM, a faster clock, and maybe even a more modern Cortex core? Now _that_ would really help for doing AI at the edge!

Bindeshwar S. Kushwaha

I have a Nano BLE Sense, and I am trying to deploy TinyML on it.
