Jeff Geerling’s Petabyte Pi

Jeff Geerling, YouTube human man, has a special place in our hearts for his enthusiasm for doing extremely difficult stuff to absolutely no practical purpose. In today’s example of the genre, he is setting up 60 hard drives – 1.2 petabytes altogether – as a RAID 0 array, all driven by one Raspberry Pi Compute Module 4 (CM4). For reasons.

Living on a Prayer

(RAID 0 is the weird one: it stripes data across the drives with no redundancy at all, even though redundancy is what the “R” is supposed to stand for. Tortured discussion on what constitutes a Redundant Array of Independent Disks below, please.)
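For anyone wondering why RAID 0 earns its “Living on a Prayer” billing, here’s a minimal Python sketch, assuming a simple round-robin striping scheme with a fixed chunk size (the 64 KiB value is our own assumption, not something from Jeff’s build). Consecutive chunks land on consecutive drives, which is where the speed comes from. But because no drive holds a copy of any other drive’s data, the array only survives if every single drive does. The per-drive failure probability used below is purely hypothetical.

```python
# Minimal sketch of RAID 0 striping and why it is fragile.
# CHUNK_SIZE and the per-drive failure probability are illustrative assumptions,
# not figures from Jeff's build.

CHUNK_SIZE = 64 * 1024  # bytes per stripe chunk (assumed)


def locate(logical_offset: int, num_disks: int, chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Map a logical byte offset on the array to (disk index, byte offset on that disk)."""
    chunk = logical_offset // chunk_size   # which chunk of the whole array
    disk = chunk % num_disks               # chunks rotate round-robin across disks
    stripe = chunk // num_disks            # full stripes that come before this chunk
    return disk, stripe * chunk_size + logical_offset % chunk_size


def array_survival(per_disk_failure: float, num_disks: int) -> float:
    """RAID 0 has no redundancy: the array survives only if every disk survives."""
    return (1.0 - per_disk_failure) ** num_disks


if __name__ == "__main__":
    # Consecutive chunks land on consecutive disks; chunk 60 wraps back to disk 0.
    for off in (0, CHUNK_SIZE, 2 * CHUNK_SIZE, 60 * CHUNK_SIZE):
        print(off, locate(off, num_disks=60))
    # Hypothetical 1% chance that any one drive fails over some period.
    print(f"{array_survival(0.01, 60):.3f}")
```

With that hypothetical 1% chance of losing any one drive, a 60-drive RAID 0 array has only about a 55% chance of coming through intact: roughly a coin toss.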

For this proof-of-concept, the enterprise-grade kit in the rack that Jeff is replacing with a humble CM4 comprises a 26-core Xeon CPU, a Supermicro server motherboard with 7 (count ’em) PCI Express slots, a dual 10 gigabit Ethernet card, and 256 GB of RAM.

As Jeff says, the chip in the Raspberry Pi was never meant for this kind of thing. Our PCIe bus has one lane. It’s supposed to interface with one hard drive, and very definitely not 60 of them. Many other features of the Compute Module were really, really never designed to do anything like this either: it tops out at 8 GB of RAM! The Ethernet only serves up 1 Gbps – more than enough for all of you who aren’t trying to talk to 60 hard drives, but not enough for grandiose showing-off of this sort.
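To put rough numbers on those bottlenecks, here’s a back-of-the-envelope sketch in Python. It assumes the CM4’s single lane runs at PCIe Gen 2 speed (5 GT/s with 8b/10b encoding) and ignores all protocol overhead beyond that encoding, so real-world throughput will be lower still. Splitting the bandwidth evenly across all 60 drives is also a simplification.

```python
# Back-of-the-envelope bandwidth budget for one CM4 driving 60 drives.
# Assumptions: PCIe Gen 2 x1 link (5 GT/s, 8b/10b encoding), gigabit Ethernet,
# and no protocol overhead beyond line encoding, so real-world figures will be lower.

NUM_DRIVES = 60


def pcie_gen2_x1_mb_per_s() -> float:
    """Usable bandwidth of one PCIe Gen 2 lane after 8b/10b encoding, in MB/s."""
    transfers_per_s = 5e9                      # 5 GT/s per lane
    payload_bits = transfers_per_s * 8 / 10    # 8 data bits per 10 line bits
    return payload_bits / 8 / 1e6              # bits -> bytes -> MB/s


def ethernet_mb_per_s(gbps: float) -> float:
    """Raw Ethernet line rate converted from Gbit/s to MB/s."""
    return gbps * 1e9 / 8 / 1e6


if __name__ == "__main__":
    pcie = pcie_gen2_x1_mb_per_s()
    eth = ethernet_mb_per_s(1.0)
    print(f"PCIe lane: {pcie:.0f} MB/s, {pcie / NUM_DRIVES:.1f} MB/s per drive")
    print(f"Ethernet:  {eth:.0f} MB/s, {eth / NUM_DRIVES:.1f} MB/s per drive")
```

Split evenly, that works out to roughly 8 MB/s per drive over PCIe and only about 2 MB/s per drive over the network, if all 60 were busy at once.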

We do not make a Raspberry Pi with 26 cores.

Humans are not really equipped to grok very large numbers, so here’s a real-life example. A petabyte is really, really big, and very few people need that much storage. One organisation that does is Facebook, which stores 10 billion user photos adding up to about 1.8 PB. So 1.2 PB is…a lot. It’s especially a lot if you’re hosting it on one CM4.
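Dividing those figures out makes them a little easier to picture. Here’s a quick Python sketch, assuming decimal (SI) units and identically sized drives: 1.2 PB spread over 60 drives comes to 20 TB a drive, and 1.8 PB over 10 billion photos averages about 180 kB a photo.

```python
# Dividing out the storage figures quoted in the post.
# Assumes decimal (SI) units: 1 kB = 10**3, 1 TB = 10**12, 1 PB = 10**15 bytes.

KB, TB, PB = 10**3, 10**12, 10**15


def per_drive_tb(total_bytes: int, drives: int) -> int:
    """Size of each drive in a RAID 0 array of identical drives: total / drives."""
    return total_bytes // drives // TB


def avg_photo_kb(total_bytes: int, photos: int) -> int:
    """Average stored size per photo, in kilobytes."""
    return total_bytes // photos // KB


if __name__ == "__main__":
    array_bytes = 1_200 * TB        # 1.2 PB in Jeff's array
    facebook_bytes = 1_800 * TB     # 1.8 PB quoted for Facebook
    print(per_drive_tb(array_bytes, 60), "TB per drive")
    print(avg_photo_kb(facebook_bytes, 10_000_000_000), "kB per photo")
```

Integer arithmetic keeps the results exact here, since both totals divide evenly.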

Jeff did, of course, run into bottlenecks and some really weird bugs and hiccups. But he actually got this thing up and running, which, frankly, the laws of space and time should not permit. “I took the Pi to the bleeding edge, and it started bleeding out.” Darn straight, this is an awful idea – don’t try this at home, folks. This is so very not the thing we designed a Raspberry Pi to do. Still cool, though.


JJSploit

I’ve always admired Jeff Geerling for his contribution to the community. Totally loved this write-up.

Henrik

“Facebook, which stores 10 billion user photos, adding up to about 1.8 PB”
I think this is from 2008. I can’t find newer data, but I’m pretty sure it’s somewhat outdated, given that Facebook has 2–3 billion monthly users… I imagine they store closer to an exabyte or so of data.

Soeren

Interesting project. But RAID 0 is for those who are quite indifferent to data safety, so I would prefer, for example, RAID 5 or similar.

Jeff Geerling

This particular build is more about the “can it” rather than the “should it”.

I’ve been working on other builds that use 4-6 HDDs in RAID 5 using either Btrfs, ZFS, or mdadm RAID, and that actually works well enough on a Pi (at least for gigabit networking), if you have a decent SATA controller with it. But beyond a few drives, it starts to get slower… I’ll have more in a follow-up video soon!

Paul

This reminds me of when I compiled the network block device server for Windows NT, used it to export the floppy drives from about 20 desktops, and combined them into a RAID 5 array on a Linux box. It worked surprisingly well until someone ejected one of the floppies! Oddly enough, the error handling didn’t cope and it fell over in a heap. Good fun though. Doing things because “you can” is the best way to learn stuff.

Shishu

Finally someone reached the Pi-tabyte capacity.

Some Guy

Clearly doing XCH.

Jacob D Carroll

Beautiful! I’d love to set up an RPi gaming server that can only be accessed by devices with approved MAC addresses. I have two RPi 4Bs and loads of old computers. I just cannibalized an SSD out of a laptop with a shot GPU. Oh, the fun!

Cody A Terry

The more important question here… can it run Crysis? 😋

Budiarno

It’s for fun for sure. 👍
