Vectors from coarse motion estimation

Liz: Gordon Hollingworth, our Director of Software, has been pointing the camera board at things, looking at dots on a screen, and cackling a lot over the last couple of weeks. We asked him what he was doing, so he wrote this for me. Thanks Gordon!

The Raspberry Pi is based on a BCM2835 System on a Chip (SoC), which was originally developed to do lots of media acceleration for mobile phones. Mobile phone media systems tend to follow behind desktop systems, but are far more energy efficient. You can see this efficiency at work in your Raspberry Pi: to decode H264 video on a standard Intel desktop processor requires GHz of processing capability, and many (30-40) Watts of power; whereas the BCM2835 on your Raspberry Pi can decode full 1080p30 video at a clock rate of 250MHz, and only burn 200mW.

Grodon

Because we have this amazing hardware, we can do things like video encode and decode in real time without actually doing much work at all on the processor (all the work is done on the GPU, leaving the ARM free to shuffle bits around!). This also means we have access to very interesting bits of the encode pipeline that you’d otherwise not be able to look at.

One of the most interesting of these parts is the motion estimation block in the H264 encoder. To encode video, one of the things the hardware does is to compare the current frame with the previous (or a fixed) reference frame, and work out where the current macroblock (16×16 pixels) best matches the reference frame. It then outputs a set of vectors which tell you where the block came from – i.e. a measure of the motion in the image.
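To make the idea concrete, here is a minimal sketch of block matching in Python. It is an exhaustive search scored by the sum of absolute differences (SAD), which is an assumption chosen for illustration; the search strategy the hardware really uses is not described here. The names `sad`, `best_match` and `make_frame`, the tiny 4×4 block and the ±2 pixel search window are all hypothetical, picked to keep the example small.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry). Frames are lists of pixel rows."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def best_match(cur, ref, bx, by, size=4, search=2):
    """Try every offset within +/-`search` pixels of (bx, by) in `ref`.

    Returns (x_vector, y_vector, sad): the offset from the current block
    to the place in the reference frame it matches best, plus the score.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            rx, ry = bx + vx, by + vy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            score = sad(cur, ref, bx, by, rx, ry, size)
            if best is None or score < best[2]:
                best = (vx, vy, score)
    return best


def make_frame(width, height, x0, y0, size=4):
    """Black frame containing one textured square at (x0, y0)."""
    frame = [[0] * width for _ in range(height)]
    for dy in range(size):
        for dx in range(size):
            frame[y0 + dy][x0 + dx] = 50 + 10 * dy + dx
    return frame


ref = make_frame(12, 12, 3, 4)   # object at (3, 4) in the reference frame
cur = make_frame(12, 12, 4, 4)   # object has moved one pixel to the right
print(best_match(cur, ref, 4, 4))   # (-1, 0, 0)
```

The vector points back to where the block came from, so an object moving right gives a negative x component, and a SAD of zero means a perfect match.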

In general, this is the mechanism used within the application ‘motion’. It compares the image on the screen with the previous image (or a long-term reference), and uses the information to trigger events, like recording the video, writing an image to disk, or triggering an alarm. Unfortunately, at this resolution it takes a huge amount of processing to achieve this in the pixel domain, which is silly if the hardware has already done all the hard work for you!

So over the last few weeks I’ve been trying to get the vectors out of the video encoder for you, and the attached animated gif shows you the results of that work. What you are seeing is the magnitude of the vector for each 16×16 macroblock, which is equivalent to the speed at which it is moving! The information comes out of the encoder as side information (it can be enabled in raspivid with the -x flag). It is one 32-bit integer per macroblock and is ((mb_width+1) × mb_height) × 4 bytes per frame, so for 1080p30 that is 121 × 68 × 4 = 32,912 bytes, or about 32KByte per frame. And here are the results. (If you think you can guess what the movement you’re looking at here represents, let us know in the comments.)
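The arithmetic is easy to check in a few lines of Python. This is only a sketch of the formula quoted above: the function names are made up for the example, and it assumes macroblock counts are rounded up when the frame size is not a multiple of 16.

```python
import math


def motion_grid(width, height, mb_size=16):
    """Columns and rows of the side-information grid for one frame."""
    cols = (width + mb_size - 1) // mb_size + 1   # mb_width + 1
    rows = (height + mb_size - 1) // mb_size      # mb_height, rounded up
    return cols, rows


def frame_bytes(width, height, bytes_per_entry=4):
    """Size of the side information for one frame, in bytes."""
    cols, rows = motion_grid(width, height)
    return cols * rows * bytes_per_entry


def magnitude(x_vector, y_vector):
    """Length of a motion vector, the quantity plotted in the gif."""
    return math.hypot(x_vector, y_vector)


print(motion_grid(1920, 1080))   # (121, 68)
print(frame_bytes(1920, 1080))   # 32912
print(frame_bytes(640, 480))     # 4920
print(magnitude(3, -4))          # 5.0
```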

blamenuttall

Since this represents such a small amount of data, it can be processed very easily, which should lead to 30fps motion identification and object tracking with very little actual work!

Go forth and track your motion!

63 comments

Allen Heard avatar

Juggling!!

RaTTuS avatar

Juggling looks ace like that

Ken MacIver avatar

Juggling; whilst wearing kitchen gloves..
HOT X Easter buns maybe..

Brian avatar

So, essentially, you’ve replicated functionality like that of the Kinect or PS Move technology with a Raspberry Pi??

Pretty cool.

jdb avatar

Yes.

But using any arbitrary HD video source (max 1080p30). The most obvious one to use is the Pi CSI camera, since that has the lowest overheads for processing (most of the work is done GPU side).

Alternatively you could write a custom MMAL program to take e.g. a V4L2 video source from a webcam and push it across to the h264 encoder, then extract the motion vectors from that.

The Raspberry Pi Guy avatar

Juggling! Awesome stuff…

You forgot to mention, along with Gordon’s job title, that he is also a soldering perfectionist.

Paul F avatar

That’s someone throwing a ball and a dog jumping up to catch it… (Freud will have something to say about that, I’m sure…)

And I love where this is going: Kinect-style motion detection for $35 (plus the camera, so $70)

Aaron avatar

So we should run motion on the output of this, instead of the full video feed?

Or is there a better way to do this altogether, other than using motion?

Cheers

PiGraham avatar

Looks good. Is it easy to get the direction as well?

Tom West avatar

Given a vector has both magnitude and direction, is there any potential to expose the direction aspect?

(NB: after the phrase “… magnitude and direction”, I really wanted to add “Oh yeah!”, as per ‘Despicable Me’)

Gordon avatar

Yes each integer in the output stream represents a single 16×16 block….

It is encoded as follows

struct motion_vector {

Gordon avatar

That’s strange!! My computer just decided to post that in the middle… must hurry with my responses!!!

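struct motion_vector {
    short sad;             /* Sum of Absolute Differences for the block */
    signed char y_vector;  /* vertical component (plain char can be unsigned on ARM) */
    signed char x_vector;  /* horizontal component */
};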

So it encodes not just the vector but also the SAD (Sum of Absolute Differences) for the block. You can look at this value to get a feel for how well the vector represents the match to the reference frame (I’ve ignored it in creating the gif).
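For anyone who wants to pick one of these integers apart, here is a rough sketch using Python’s struct module. The byte order is an assumption: reading the struct above as a description of the 32-bit integer from the top bits down puts x_vector in the lowest byte, so on a little-endian machine the bytes arrive as x, y, then the 16-bit SAD. Check that against your own data before relying on it. The names `ENTRY`, `decode_entry` and `decode_word` are invented for the example.

```python
import struct

# Assumed layout of one entry in memory (little-endian):
# signed 8-bit x, signed 8-bit y, unsigned 16-bit SAD.
ENTRY = struct.Struct("<bbH")


def decode_entry(raw):
    """Decode the 4 raw bytes of one macroblock entry."""
    x_vector, y_vector, sad = ENTRY.unpack(raw)
    return {"x": x_vector, "y": y_vector, "sad": sad}


def decode_word(word):
    """Same decode, starting from the 32-bit integer form."""
    return decode_entry(struct.pack("<I", word))


raw = ENTRY.pack(-3, 4, 1200)    # a made-up entry for the demo
print(decode_entry(raw))         # {'x': -3, 'y': 4, 'sad': 1200}
print(decode_word(0x04B004FD))   # the same entry as one integer
```

A low SAD suggests the vector is a good match to the reference frame; a high one suggests the encoder could not find anything convincing.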

Jon Colt avatar

Juggling a running chain-saw, two hedgehogs, and a working pressure-cooker just removed from the stove-top.

paddy gaunt avatar

Presumably the proximity of the post on ‘motion tracking’ to the post on ‘toilet bugging’ was not a coincidence. I hate to think what’s being juggled.

Bill Stephenson avatar

That is cool stuff.

That’s someone (probably you) juggling tennis balls.

Great work!

ZeRaler avatar

Nice feature, maybe a start towards Camera Motion Estimation software. Useful for robotics!
I had hoped that someone would make a plugin for OpenCV that is able to use the GPU, but this is another way to do it.

Thanks!

Rob McPeak avatar

Looks like a lava lamp to me…

Simon Walters avatar

Master Nuttall playing with his balls

AndrewS avatar

Interesting. How come there appears to be “static” across the bottom of the frame?

Gordon avatar

I’m not sure, it’s probably just a remnant of the way I’ve processed the image (or maybe a bug…)

Gordon

6by9 avatar

More likely that 1080 is not a multiple of 16, so the last row of macroblocks also encodes some rubbish that hasn’t been written by the camera or otherwise initialised.
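A quick way to see how much of the last macroblock row is padding, assuming (as suggested above) that the frame height is simply rounded up to the next multiple of 16. The function name is made up for the example.

```python
def padded_rows(height, mb_size=16):
    """Return (macroblock rows, pixel rows of padding in the last one)."""
    mb_rows = (height + mb_size - 1) // mb_size
    padding = mb_rows * mb_size - height
    return mb_rows, padding


print(padded_rows(1080))   # (68, 8)
print(padded_rows(720))    # (45, 0)
```

So at 1080p half of the bottom row of macroblocks lies outside the picture, and one cautious option is to ignore that row when analysing the vectors.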
(Having never had the motion vectors visualised like this, I’m guessing no one here has thought about that bit, and that the CME motion vectors are going to be compromised there. Hmm).

nathanael avatar

I wonder if there are applications for using such an image alongside one image generated by a camera module without an Infra Red filter? That could be good for security; Heat + Motion = Intruder (Rustling Leaves = Not Intruder / Warm Brick Wall In Summer = Not Intruder).

jbeale avatar

Please note that the Pi No-IR camera is not a thermal imager (for anything cooler than a hot soldering iron). For body temperature, thermal IR wavelengths are around 4 microns (mid-IR), but silicon chips stop detecting around 1 micron (near-IR).

nathanael avatar

bummer

Jonathan Chetwynd avatar

it can be enabled in raspivid with the -x flag

sudo rpi-update
raspivid -v
Camera App v1.3.11
raspivid -x
Invalid command line option (-x)

how does one access the side information?
ie once -x is valid…

love the gif

Gordon avatar

Sorry, I didn’t get it into raspivid before leaving to pick up the kids… Will get it pushed in tomorrow, promise!

Bantammenace avatar

Is that you juggling the work/life balance in the video ?

jbeale avatar

Great work, it is very neat to see motion-vector juggling! The paper I read on this technique suggested that there is a lot of noise in the H.264 motion vectors in real-world images, but if you compare data across the frame (contiguous groups of motion vectors all moving the same way) and temporal coherence (moving the same way over many frames) then you’ve really got something useful. Maybe I can finally stop ‘motion’ from taking so many pictures of my yard on partly cloudy days when tree shadows appear and disappear.
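A rough sketch of that kind of filtering, in plain Python. It is not taken from the paper: the thresholds are guesses, the function names are invented, and the temporal test is a simplification that only asks for a block to stay spatially coherent over the last few frames. Each frame is a grid (list of rows) of (x, y) vectors.

```python
def coherent_mask(frame, min_mag=2, min_neighbours=3):
    """Mark blocks that move and have enough neighbours moving the same way."""
    rows, cols = len(frame), len(frame[0])
    limit = min_mag * min_mag
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            x, y = frame[r][c]
            if x * x + y * y < limit:
                continue                      # too slow to count as motion
            agree = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    nx, ny = frame[nr][nc]
                    # Positive dot product: neighbour moves roughly the same way.
                    if nx * nx + ny * ny >= limit and x * nx + y * ny > 0:
                        agree += 1
            mask[r][c] = agree >= min_neighbours
    return mask


def persistent_mask(frames, min_frames=3, min_mag=2, min_neighbours=3):
    """Blocks that pass coherent_mask in each of the last `min_frames` frames."""
    if len(frames) < min_frames:
        return None
    recent = [coherent_mask(f, min_mag, min_neighbours)
              for f in frames[-min_frames:]]
    rows, cols = len(recent[0]), len(recent[0][0])
    return [[all(m[r][c] for m in recent) for c in range(cols)]
            for r in range(rows)]


frame = [[(0, 0)] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        frame[r][c] = (4, 0)     # a 3x3 patch moving right
frame[0][4] = (-5, 0)            # one isolated, contradictory vector

mask = persistent_mask([frame, frame, frame])
print(mask[2][2], mask[0][4])    # True False
```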

jbeale avatar

My limited understanding of the existing ‘motion’ code is that it does not detect motion at all, but simply a localized difference from the long-term average pixel values. That’s a computationally easy problem, and why it will trigger on scene lighting changes (cloud moving over sun) as well as real motion. Actually detecting if something has moved is a big advance.
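As a sketch of that idea (and not the actual ‘motion’ source code), here is a running-average background model in plain Python. The smoothing factor, the threshold and the function names are all assumptions for the example.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the long-term average, pixel by pixel."""
    return [[(1 - alpha) * b + alpha * p for b, p in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def changed_pixels(background, frame, threshold=25):
    """Count pixels that differ from the average by more than `threshold`."""
    return sum(1
               for brow, frow in zip(background, frame)
               for b, p in zip(brow, frow)
               if abs(p - b) > threshold)


background = [[100] * 4 for _ in range(4)]
brighter = [[140] * 4 for _ in range(4)]   # the sun comes out, nothing moves

print(changed_pixels(background, background))   # 0
print(changed_pixels(background, brighter))     # 16
```

Every pixel in the brighter frame counts as changed even though nothing in the scene moved, which is exactly the false trigger described above.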

Gordon avatar

Ah OK, fair enough, it still requires you to touch lots more data though… Like you say, putting this together with some interesting tracking stuff would be cool.

One of the things I wanted to do was to use it for triggering my digital camera to take photos of sports (mountain bikers coming into the picture very quickly and then triggering the camera when the motion is tracked into a target area)

Gordon

Gordon avatar

Well, it would be interesting to see how it works out, there are some stats that you can adjust to change the output of the CME although this may give you worse H264 encode performance…

Currently shorter vectors are preferred over longer ones (they take fewer bits to encode), so by manipulating those numbers it may be possible to make it output less ‘thresholded’ vectors.

I did want to put together a bit of image segmentation software on top of this and then some object tracking (something I did a little of when I worked for the Ministry of Defence a long time ago, not sure this’ll be used in the same way though!)

simon avatar

I am told a bee’s flight-control uses visual velocity vectors to avoid hitting things.

Velocity vectors are roughly the raw output of our eyes anyway. It is done by comparing the rise/fall of intensity between a pair of cells. In this way we can see a moving line.

For landing it squares the velocities left/right+up/down in its central vision; so correcting for drift. Then slows by keeping the vectors even speed for final approach. It should see an even rosette. As the LZ gets nearer it will apparently rush towards it, so the bee’s control loop reduces groundspeed to keep it happy. Et voila a perfect landing.

Presumably this is within the ability of a RPi? Work better for automated flight control?

Lee avatar

so would work wonderfully as an additional PID input for a quadcopter?

Will avatar

Club juggling, clearly. Good work

Bantammenace avatar

Thinking of this in reverse. If I am in a room and as I walk around I video the walls and other stationary objects within the room, would this functionality allow me to compute the location and orientation of the camera wrt. the stationary objects?
Alternatively, might the Pi NoIR and a suitable IR light source be used like the Kinect IR sensor to create a moving depth camera?

tzj avatar

Mmm… could this theoretically work with a ‘fibre-optic setup’ to detect changes and calculate the position of an object (or objects) in 3D space?

By ‘fibre-optic setup’ I mean a matrix of fibres against the camera/sensor, with the opposite ends dotted around a 3D space.

ColinD avatar

I watched that video for an hour and he never dropped those balls once…wait… you’re telling me the animgif is on a loop?

It’s very impressive – I can see I’m going to try to implement this in CatSec, my in progress cat detecting camera.

Haggishunter avatar

ColinD, you should know by now: Beware of geeks bearing gifs!

Ravelo avatar

This must be Liz doing a seventies dance.

Andy Crofts avatar

Everyone has the video wrong.
Simples.
It’s some worker at the Swansea Raspberry Pi production line, taking some freshly-baked Pi’s out of the reflow oven..

Gavin Greig avatar

Assuming those are balls being juggled, interesting to see a bit of an effect in the image. They look most compact at the crest of their path, where their velocity’s low, but as they rise or fall at greater speed they develop a tail. Could be an issue for some applications. (Not knocking the impressive achievement at all, which is a way bigger deal than what I’m pointing out! Just thought it would be worth noting.)

Niall avatar

Hmmmm . . . . .

Thinks, “What if this could be used to track a star that exists in the field of an astronomy image? The main image is being exposed for, say, ten minutes. The RasPi and PiCam is looking through the same optical train (common enough in astro-imaging), but the RasPi sees the monitored star ‘shimmering’ – due to atmospheric distortion, maybe even due to mount and/or tracking defects. The RasPi then sends the information obtained from the motion detection to a pair of piezo-electric transducers that are attached to the main imaging chip, causing the chip to compensate for the atmospheric turbulence. Result? A much clearer image!”
Can you let me know when I can pre-order this unit?

Cheers,
Niall Saunders
Clinterty Observatories
Aberdeen, SCOTLAND

Halsenhauser avatar

As many people said before: you’re juggling. But I say: you’re juggling with Raspberry Pi’s ;)

Linda avatar

Anyone tried gesture recognition on raspberry pi yet? It is an extension of the coarse motion detection. Be fun. Will try one day.

David Palmer avatar

(Apologies if a dupe. WordPress blah blah blah)

This will be great for optical flow algorithms.

When you have a moving platform, e.g., a quadcopter, you can tell a lot of things. Attitude changes cause a shift of the whole image. Take that out and your direction of motion is towards the null of the optical flow. How fast you are approaching what’s in front of you is given by how fast it is looming (optical flow directly away from the null point). How far away obstacles are to the side is given by how fast they slide by (power poles zip past the car faster than mountains.)
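The first step, taking out the whole-image shift, could be sketched like this. Using the median vector as the estimate of the shift is an assumption made for the example (it shrugs off a few outliers); a real flight controller would more likely use its gyro data. The function name is hypothetical.

```python
from statistics import median


def remove_global_shift(vectors):
    """vectors: list of (x, y) motion vectors for one frame.

    Returns the estimated whole-image shift and the residual vectors
    left once that shift has been subtracted from every block.
    """
    shift_x = median(x for x, _ in vectors)
    shift_y = median(y for _, y in vectors)
    residuals = [(x - shift_x, y - shift_y) for x, y in vectors]
    return (shift_x, shift_y), residuals


# Seven blocks share the same shift; two blocks move differently.
vectors = [(3, 1)] * 7 + [(8, 1), (3, -4)]
shift, residuals = remove_global_shift(vectors)
print(shift)            # (3, 1)
print(residuals[-2:])   # [(5, 0), (0, -5)]
```

What is left after the subtraction is the part of the flow caused by the platform translating or by things moving in the scene.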

This is how biologicals do it.

For astronomy, looking at stars, moving points can probably be tracked more quickly and accurately with specialized algorithms. Most 16×16 blocks will not have a bright enough star in them, and many stars will be split across block boundaries. But it may be a good place to start.

Robert M avatar

The video appears to be what you do with hamsters when they’re too tired to do the dancing by themselves.

dratcliff avatar

I think in exploring my new Pi this last week I have run across this video played in different speeds and bowling pins are being tossed.

pk avatar

So this can be used for a home security camera with the picam, only sending a notification if it detects motion in an otherwise empty room? Has anyone done that, or is there a GitHub project for it?

Thank you for all the whole team does at RPI.

pk

McPeppr avatar

Unfortunately it’s a disappointing 404 by Nigel Whyte

Yggdrasil avatar

Was the official raspivid already updated? I’ve checked https://github.com/raspberrypi/userland but found no changes.

Steve Gulick avatar

Hi can’t locate the raspivid with a -x option for the motion vectors. Please help. Thank you

McPeppr avatar

raspivid --help
slowmotion with motion vectors
raspivid -w 640 -h 480 -fps 90 -t 10000 -o test90fps.h264 -x test90fps.vec
Don’t forget to update
uname -r
>> 3.12.25+
sudo rpi-update

Steve Gulick avatar

The code please? Thanks!

Quique avatar

What I find remarkable is the human ability to detect juggling from so little information

Jonathan Chetwynd avatar

yesterday seems so far away,
jam, jam, jam.

~:”

Clive Harris avatar

Please consider providing the ability to link this hardware accelerator with existing python opencv.
For ideas on where such a link could prove useful, see the optical flow example.
http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_video/py_lucas_kanade/py_lucas_kanade.html#lucas-kanade

McPeppr avatar

Very nice link! I worked in the area of optical flow for my master’s thesis (optical flow was part of it). I was investigating car-following mechanisms to detect driver distraction.
This link and the information it contains would have made a great addition.
An interesting piece of information on optical flow is the derivation used to detect acceleration. One of the important scientists in this area is David Lee.

Steve Gulick avatar

Looking forward to trying the motion vector extraction code. If it can’t be pushed yet to raspivid, maybe a “raspividX”?
Providing access to the motion vectors via gpu will be so useful in many applications and I am sure a feature unique to the Raspberry Pi platform. Thank you for your work, Gordon!

David Allen avatar

Could the extracted motion vectors be used to create statistical models of human movement over time? If so, would they be detailed enough to identify distinct patterns? I’m a cognitive scientist studying motivation and have wondered for some time whether motion could be linked to motivation (in some situations), but so far I haven’t found anything that could record the necessary quantitative data without being intrusive or too expensive.

McPeppr avatar

Hi, in my opinion you will need to be intrusive to get the training set for one kind of algorithm. Once you have a set of movement patterns and the matching motivations, you might be able to extrapolate to new movement patterns (with an algorithm I cannot provide here).
Don’t forget: you shall not mix records of training patterns and test patterns, because then the test patterns will of course prove a match – your professor will know the trick.

McPeppr avatar

Hey there, is anyone here to provide a way to read the “raspivid -x” output file? I am confronted with a file of pure bytes – not human-readable.
There was Gordon to tell about the data:

struct motion_vector {
    short sad;
    signed char y_vector;
    signed char x_vector;
};

And one step further…
can we read out and process a continuing stream of vector data?
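One possible way to do both, sketched in Python. It is untested against real raspivid output and rests on assumptions: that the file is nothing but fixed-size frames with no header, that each frame holds (mb_width+1) × mb_height four-byte entries as described in the post, and that the bytes of each entry run x, y, SAD on a little-endian machine. The function name is made up.

```python
import io
import struct

# Assumed entry layout: signed 8-bit x, signed 8-bit y, unsigned 16-bit SAD.
ENTRY = struct.Struct("<bbH")


def read_frames(stream, width, height, mb_size=16):
    """Yield one grid of (x, y, sad) tuples per frame from a binary stream."""
    cols = (width + mb_size - 1) // mb_size + 1
    rows = (height + mb_size - 1) // mb_size
    frame_size = cols * rows * ENTRY.size
    while True:
        raw = stream.read(frame_size)
        if len(raw) < frame_size:
            return                     # end of data, or a truncated last frame
        flat = list(ENTRY.iter_unpack(raw))
        yield [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Demo with made-up data: a 32x16 "video" has a 3x1 grid, so 12 bytes a frame.
fake = b"".join(ENTRY.pack(x, 0, 10) for x in (1, 2, 3, 4, 5, 6))
frames = list(read_frames(io.BytesIO(fake), width=32, height=16))
print(len(frames))     # 2
print(frames[1][0])    # [(4, 0, 10), (5, 0, 10), (6, 0, 10)]
```

Because it is a generator working on any binary file-like object, the same loop should cope with a file opened in "rb" mode (such as the test90fps.vec recorded above, with width=640 and height=480) or with a buffered pipe that keeps delivering data.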

James Hughes avatar

This sort of question is better asked on the forums…and may already be answered there.

Comments are closed