One of our engineers, David Plowman, describes machine learning and shares news of a Raspberry Pi depth estimation challenge run by ETH Zürich (Swiss Federal Institute of Technology).
Spoiler alert – it’s all happening virtually, so you can definitely make the trip and attend, or maybe even enter yourself.
What is Machine Learning?
Machine Learning (ML) and Artificial Intelligence (AI) are some of the top engineering-related buzzwords of the moment, and foremost among current ML paradigms is probably the Artificial Neural Network (ANN).
They involve millions of tiny calculations, merged together in a giant biologically inspired network – hence the name. These networks typically have millions of parameters that control each calculation, and they must be optimised for every different task at hand.
This process of optimising the parameters so that a given set of inputs correctly produces a known set of outputs is known as training, and is what gives rise to the sense that the network is “learning”.
Machine Learning frameworks
A number of well known companies produce free ML frameworks that you can download and use on your own computer. The network training procedure runs best on machines with powerful CPUs and GPUs, but even using one of these pre-trained networks (known as inference) can be quite expensive.
One of the most popular frameworks is Google’s TensorFlow (TF), and since this is rather resource intensive, they also produce a cut-down version optimised for less powerful platforms. This is TensorFlow Lite (TFLite), which can be run effectively on Raspberry Pi.
ANNs have proven very adept at a wide variety of image processing tasks, most notably object classification and detection, but also depth estimation. This is the process of taking one or more images and working out how far away every part of the scene is from the camera, producing a depth map.
Here’s an example:
The image on the right shows, by the brightness of each pixel, how far away the objects in the original (left-hand) image are from the camera (darker = nearer).
We distinguish between stereo depth estimation, which starts with a stereo pair of images (taken from marginally different viewpoints; here, parallax can be used to inform the algorithm), and monocular depth estimation, working from just a single image.
The applications of such techniques should be clear, ranging from robots that need to understand and navigate their environments, to the fake bokeh effects beloved of many modern smartphone cameras.
Depth Estimation Challenge
We were very interested then to learn that, as part of the CVPR (Computer Vision and Pattern Recognition) 2021 conference, Andrey Ignatov and Radu Timofte of ETH Zürich were planning to run a Monocular Depth Estimation Challenge. They are specifically targeting the Raspberry Pi 4 platform running TFLite, and we are delighted to support this effort.
For more information, or indeed if any technically minded readers are interested in entering the challenge, please visit:
The conference and workshops are all taking place virtually in June, and we’ll be sure to update our blog with some of the results and models produced for Raspberry Pi 4 by the competing teams. We wish them all good luck!