Why would you build a physical cluster? Today you can go to Amazon, or Digital Ocean, or any of the other cloud providers, and spin up a virtual machine in seconds. But the cloud is just someone else’s computers: a Raspberry Pi cluster is a low-cost, versatile system you can use to explore all kinds of clustered-computing technologies, and you have total control over the machines that make it up. Building something from the ground up can teach you lessons you can’t learn elsewhere.
We’re going to put together an eight-node cluster connected to a single managed switch. One of the nodes will be the so-called “head” node: this node will have a second Gigabit Ethernet connection out to the LAN/WAN via a USB3 Ethernet dongle, and an external 1TB SSD mounted via a USB3-to-SATA connector. While the head node will boot from an SD card as normal, the other seven nodes — the “compute” nodes — will be configured to network boot, with the head node acting as the boot server and the OS images being stored on the external disk. As well as serving as the network boot volume, the 1TB disk will also host a scratch partition that is shared to all the compute nodes in the cluster.
All eight of our Raspberry Pi boards will have a Raspberry Pi PoE+ HAT attached. This means that, since we’re using a PoE+ enabled switch, we only need to run a single Ethernet cable to each of our nodes and don’t need a separate USB hub to power them.
The list of parts you’ll need to put together a Raspberry Pi cluster — sometimes known as a “bramble” — can be short, or it can be quite long, depending on what size and type of cluster you intend to build. So it’s important to think about what you want the cluster to do before you start ordering the parts to put it together. The list above is what we used for our eight-Pi cluster, but your requirements might well be different.
What you will need is a full bramble of Raspberry Pi computers, and if you’re intending to power them over PoE as we are, you’ll need a corresponding number of Raspberry Pi PoE+ HAT boards and an appropriate PoE+ switch. Beyond that, however, you’ll need a micro SD card, some Ethernet cables, a USB to Ethernet adapter, a USB to SATA adapter cable along with an appropriately sized SSD drive, and some sort of case to put all the components into after you’ve bought them. The case can either be a custom-designed “cluster case” or, perhaps, something rack-mountable depending on what you’re thinking of doing with the cluster after you’ve built it.
There is however a lot of leeway in choosing your components, depending on exactly what you’re setting up your cluster to do. For instance, depending on the sorts of jobs you’re anticipating running across the cluster, you might be able to get away with using cheaper 2GB or 1GB boards rather than the 4GB model I used. Alternatively, having a local disk present on each node might be important, so you might need to think about attaching a disk to each board to provide local storage.
However, perhaps the biggest choice when you’re thinking about building a cluster is how you’re going to power the nodes. We used PoE for this cluster, which involved adding a PoE+ HAT board to each node and purchasing a more expensive switch capable of powering our Raspberry Pi boards: for larger clusters, this is probably the best approach. For smaller clusters, you could instead think about powering the nodes from a USB hub, or for the smallest clusters — perhaps four nodes or fewer — powering each node directly from an individual power supply.
If you decide to power your cluster using PoE, you may find you have to make up some franken-cables. For instance, the fans at the back of the case I’m using were intended to connect to the GPIO header block on the Raspberry Pi, but since we’re using the Raspberry Pi PoE+ HAT to power our nodes, we don’t have access to the GPIO headers.
Therefore, for me at least, it’s time to grab some donor USB cables and make up some new ones. If you snip the end from a USB cable and peel back the plastic you’ll find four wires, often wrapped in a metal shielding braid or foil. The wires inside the cable are small and delicate, so carefully strip back the shielding if present. You’re looking for the red (+5V) and black (GND) wires. The other two, normally coloured white and green, carry data. You can just cut these data wires off; you won’t need them.
Solder the red and black wires from the fan to the red and black wires in the USB cable. The best thing to do here is to use a bit of heat-shrink tubing over each of the individual solder connections, and then use a bigger bit of heat-shrink over both of the soldered connectors. This will give an electrically insulated, and mechanically secure, connection between the fan and the USB plug end of the new cable.
The cluster case I’m using has four fans, mounted at the rear. I’m going to be powering the left-hand two from the head node, or potentially from the first compute node on the left if I need more USB sockets on the head node, and the right-hand two from the right-most compute node.
The most common case where you’ll need franken-cables is probably this one: powering a fan over USB due to lack of access to the GPIO header. But there are other reasons you might need them. For instance, for a cluster I built a few years back, I needed to put together a cable to power an Ethernet switch from a USB hub, rather than from a +5V power supply unit.
We’re going to bring up the head node from an SD card. The easiest, and recommended, way to install Raspberry Pi OS is to use Raspberry Pi Imager. So go ahead and install Imager on your laptop, and then grab a microSD card (minimum 16GB) and an adapter if you need one, and start the installation process.
Click on the “CHOOSE OS” button and select “Raspberry Pi OS (other)” and then “Raspberry Pi OS Lite (32-bit)”. Then click on “CHOOSE STORAGE” and select your SD card from the drop-down list.
Next hit Ctrl-Shift-X, or click on the cog wheel that appears after you select your OS, to open the “Advanced” menu. This will let you set the hostname (I went with “cluster”), enable the SSH server, and set up the default user (I went with “pi” for simplicity), along with configuring the wireless interface so your head node will pop up on your home LAN.
Afterwards, click on the “SAVE” button and then the “WRITE” button to write your operating system to the card.
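If you’d rather not use Imager’s Advanced menu, you can (at least on the Raspberry Pi OS releases that were current when this was written) do the same headless setup by hand after writing the card: an empty file called ssh on the boot partition enables the SSH server, and a wpa_supplicant.conf configures the wireless network. A minimal sketch, with the mount point, network name, and password as placeholders you’d replace with your own:
$ cd /media/$USER/boot
$ touch ssh
$ cat > wpa_supplicant.conf << 'EOF'
country=GB
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="YourNetworkName"
    psk="YourNetworkPassword"
}
EOF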
The exact way you plug things together is going to depend on your cluster components and whether you picked up a case, or more likely what sort of case you have. I’m going to slot my head node into the far left-hand side of my case. This lets me mount the SSD drive against one wall of the case using a mounting screw to secure it in place.
We configured the head node to know about our local wireless network during setup, so we should just be able to ssh directly into the head node using the name we gave it during setup:
$ ssh pi@cluster.local
pi@cluster.local's password:
$
If we take a look at the network configuration
$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 169.254.253.7 netmask 255.255.0.0 broadcast 169.254.255.255
inet6 fe80::6aae:4be3:322b:33ce prefixlen 64 scopeid 0x20<link>
ether dc:a6:32:6a:16:90 txqueuelen 1000 (Ethernet)
RX packets 15 bytes 2150 (2.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 29 bytes 4880 (4.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 14 bytes 1776 (1.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14 bytes 1776 (1.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.120 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::acae:64b:43ea:8b4f prefixlen 64 scopeid 0x20<link>
ether dc:a6:32:6a:16:91 txqueuelen 1000 (Ethernet)
RX packets 81 bytes 12704 (12.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 66 bytes 11840 (11.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$
you can see that wlan0 is connected to our local network with a 192.168.* address, while eth0, which we’ve plugged into our switch, has a self-assigned 169.254.* address. We get this self-assigned (link-local) address because nothing on the cluster network is handing out IP addresses yet; the PoE switch I’m using here is a managed switch rather than a dumb one, but it isn’t acting as a DHCP server. We’ll resolve this later in the project by turning our head node into a DHCP server that will assign an IP address to each of the compute nodes, as well as to our managed switch.
We’ve been able to reach our head node over the network because we configured our wireless interface wlan0 when we set up our SD card. However, it would be good to hardwire our cluster to the network rather than rely on wireless, because we might want to transfer large files back and forth, and wired interfaces are a lot more stable. To do that we’re going to need an additional Ethernet connection, so I’m going to add a USB 3-to-Gigabit Ethernet adaptor to the head node. We’ll leave the onboard Ethernet socket (eth0) connected to our PoE switch to serve as the internal connection to the cluster, while we use the second Ethernet connection (eth1) to talk to the outside world.
We’ll therefore configure eth1 to pick up an IP address from our LAN’s DHCP server. Go ahead and create a new file called /etc/network/interfaces.d/eth1 which should look like this:
auto eth1
allow-hotplug eth1
iface eth1 inet dhcp
We’ll leave eth0, the onboard Ethernet socket, connected to the Ethernet switch to serve as the internal connection to the cluster. Internally we’ll allocate 192.168.50.* addresses to the cluster, with our head node having the IP address 192.168.50.1. Create a new file called /etc/network/interfaces.d/eth0 which, this time, should look like this:
auto eth0
allow-hotplug eth0
iface eth0 inet static
address 192.168.50.1
netmask 255.255.255.0
network 192.168.50.0
broadcast 192.168.50.255
Afterwards, reboot. Then, if everything has gone to plan, you should see something like this:
$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.50.1 netmask 255.255.255.0 broadcast 192.168.50.255
inet6 fe80::6aae:4be3:322b:33ce prefixlen 64 scopeid 0x20<link>
ether dc:a6:32:6a:16:90 txqueuelen 1000 (Ethernet)
RX packets 14 bytes 840 (840.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 37 bytes 5360 (5.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.166 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::9350:f7d2:8ccd:151f prefixlen 64 scopeid 0x20<link>
ether 00:e0:4c:68:1d:da txqueuelen 1000 (Ethernet)
RX packets 164 bytes 26413 (25.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 95 bytes 15073 (14.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 14 bytes 1776 (1.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14 bytes 1776 (1.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.120 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::acae:64b:43ea:8b4f prefixlen 64 scopeid 0x20<link>
ether dc:a6:32:6a:16:91 txqueuelen 1000 (Ethernet)
RX packets 120 bytes 22780 (22.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 38 bytes 5329 (5.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$
Now that we have a second Gigabit Ethernet connection out to the world via eth1, and our onboard Ethernet is configured with a static IP address, it’s time to make our Raspberry Pi into a DHCP server for our cluster on eth0.
Start by installing the DHCP server itself
$ sudo apt install isc-dhcp-server
and then edit the /etc/dhcp/dhcpd.conf file as follows:
ddns-update-style none;
authoritative;
log-facility local7;
# No service will be given on this subnet
subnet 192.168.1.0 netmask 255.255.255.0 {
}
# The internal cluster network
group {
option broadcast-address 192.168.50.255;
option routers 192.168.50.1;
default-lease-time 600;
max-lease-time 7200;
option domain-name "cluster";
option domain-name-servers 8.8.8.8, 8.8.4.4;
subnet 192.168.50.0 netmask 255.255.255.0 {
range 192.168.50.20 192.168.50.250;
# Head Node
host cluster {
hardware ethernet dc:a6:32:6a:16:90;
fixed-address 192.168.50.1;
}
}
}
Then edit the /etc/default/isc-dhcp-server file to reflect our new server setup:
DHCPDv4_CONF=/etc/dhcp/dhcpd.conf
DHCPDv4_PID=/var/run/dhcpd.pid
INTERFACESv4="eth0"
as well as the /etc/hosts file:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.1.1 cluster
192.168.50.1 cluster
and then you can reboot the head node to start the DHCP service.
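If you’d like to check your work first, dhcpd has a test mode that parses the configuration without starting the server, and you can restart just the DHCP service rather than rebooting the whole node:
$ sudo dhcpd -t -cf /etc/dhcp/dhcpd.conf
$ sudo systemctl restart isc-dhcp-server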
We’ve set things up so that hosts we don’t know about are allocated an IP address starting from 192.168.50.20. Once we know the MAC addresses of our compute nodes we can add them to the /etc/dhcp/dhcpd.conf file so they get static IP addresses going forward, rather than a random one as they come up.
Log back into your head node after the reboot. If you have a managed switch for your cluster, like the NETGEAR switch I’m using, it will grab an IP address of its own, so you can check that your DHCP service is working.
$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC IP hostname valid until manufacturer
==================================================================================
80:cc:9c:94:53:35 192.168.50.20 GS308EPP 2021-12-06 14:19:52 NETGEAR
$
Otherwise, you’ll have to wait until you add your first node as unmanaged switches won’t request their own address.
However, if you do have a managed switch, you might well want to give it a static IP address inside the cluster by adding one to the /etc/dhcp/dhcpd.conf and /etc/hosts files, in a similar fashion to the head node. I went with switch as the hostname,
192.168.50.1 cluster
192.168.50.254 switch
and 192.168.50.254 as the allocated IP address.
subnet 192.168.50.0 netmask 255.255.255.0 {
range 192.168.50.20 192.168.50.250;
# Head Node
host cluster {
hardware ethernet dc:a6:32:6a:16:90;
fixed-address 192.168.50.1;
}
# NETGEAR Switch
host switch {
hardware ethernet 80:cc:9c:94:53:35;
fixed-address 192.168.50.254;
}
}
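Remember that changes to dhcpd.conf only take effect once the DHCP server has been restarted, and the switch will hang on to its old lease until it renews, so after editing the file restart the service (and, if you’re impatient, power-cycle the switch so it asks again straight away):
$ sudo systemctl restart isc-dhcp-server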
If we’re going to network boot our compute nodes, we’re going to need a bit more space. You could do this by plugging a flash stick into one of the USB ports on the head node, but I’m going to use a USB 3 to SATA Adaptor Cable to attach a 1TB SSD that I had on the shelf in the lab to give the cluster plenty of space for data.
Plugging the disk into one of the USB 3 sockets on the head node, I’m going to format it with a GUID partition table and create a single ext4 partition on the disk.
$ sudo parted -s /dev/sda mklabel gpt
$ sudo parted -a optimal /dev/sda mkpart primary ext4 0% 100%
$ sudo mkfs -t ext4 /dev/sda1
mke2fs 1.46.2 (28-Feb-2021)
Creating filesystem with 244175218 4k blocks and 61046784 inodes
Filesystem UUID: 1a312035-ffdb-4c2b-9149-c975461de8f2
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
$
We can then mount the disk manually to check everything is okay,
$ sudo mkdir /mnt/usb
$ sudo mount /dev/sda1 /mnt/usb
and then make sure it will automatically mount on boot by adding the following to the /etc/fstab file.
/dev/sda1 /mnt/usb auto defaults,user 0 1
You should ensure that you can mount the disk manually before rebooting, as adding it as an entry in the /etc/fstab file might cause the Raspberry Pi to hang during boot if the disk isn’t available.
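If you’re worried about the disk not always showing up as /dev/sda (for instance, if you later attach a second USB disk), you can refer to the partition by its filesystem UUID instead, and the nofail option stops a missing disk from hanging the boot. A sketch of an alternative entry, using the UUID that mkfs printed above; you can confirm the UUID with sudo blkid /dev/sda1.
UUID=1a312035-ffdb-4c2b-9149-c975461de8f2 /mnt/usb ext4 defaults,user,nofail 0 1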
We’re going to want to make the disk available across the cluster. You’ll need to install the NFS server software,
$ sudo apt install nfs-kernel-server
create a mount point which we can share,
$ sudo mkdir /mnt/usb/scratch
$ sudo chown pi:pi /mnt/usb/scratch
$ sudo ln -s /mnt/usb/scratch /scratch
and then edit the /etc/exports file to add a list of IP addresses from which you want to be able to mount your disk.
/mnt/usb/scratch 192.168.50.0/24(rw,sync)
Here we’re exporting it to 192.168.50.0/24, which is shorthand for “all the IP addresses between 192.168.50.0 and 192.168.50.255.”
After doing this you should enable, and then start, both the rpcbind and nfs-server services,
$ sudo systemctl enable rpcbind.service
$ sudo systemctl start rpcbind.service
$ sudo systemctl enable nfs-server.service
$ sudo systemctl start nfs-server.service
and then reboot.
$ sudo reboot
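Once the head node is back up, you can confirm the export is being advertised without needing a client yet, by querying the NFS server from the head node itself:
$ sudo exportfs -v
$ showmount -e localhost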
We’re going to set up our compute nodes to network boot from our head node. To do that, we first have to enable network booting on each node. How you do this differs between Raspberry Pi models; for the Raspberry Pi 4, the board needs to be booted once from an SD card and the boot order configured using the raspi-config command-line tool.
The easiest way to proceed is to use the Raspberry Pi Imager software to burn a second SD card with Raspberry Pi OS Lite (32-bit). There’s no need to configure this installation as fully as we did for the head node before booting the board; just enable SSH.
Next boot the board attached to the cluster switch.
The board should come up and be visible on the cluster subnet after it is given an IP address by the head node’s DHCP server, and we can look at the cluster network from the head node using dhcp-lease-list.
$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC IP hostname valid until manufacturer
===============================================================================================
dc:a6:32:6a:16:87 192.168.50.21 raspberrypi 2021-12-07 11:54:29 Raspberry Pi Ltd
$
We can now go ahead and SSH into the new board and enable network booting using raspi-config from the command line.
$ ssh pi@192.168.50.21
$ sudo raspi-config
Choose “Advanced Options,” then “Boot Order,” then “Network Boot.” You’ll then need to reboot the device for the change to the boot order to be programmed into the bootloader EEPROM.
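Under the hood, raspi-config is just changing the BOOT_ORDER value in the bootloader EEPROM configuration. If you prefer, you can make the same change directly with the rpi-eeprom-config tool; a sketch, not the route taken in this article:
$ sudo -E rpi-eeprom-config --edit
then set BOOT_ORDER=0xf21 in the editor, save, and reboot for the new configuration to be written to the EEPROM.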
If you get an error when trying to enable network boot complaining that “No EEPROM bin file found” then you need to update the firmware on your Raspberry Pi before proceeding. You should do this,
$ sudo apt install rpi-eeprom
$ sudo rpi-eeprom-update -d -a
$ sudo reboot
and then after the node comes back up from its reboot, try to set up network boot once again.
Once the Raspberry Pi has rebooted, check the boot order using vcgencmd:
$ vcgencmd bootloader_config
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0
[all]
BOOT_ORDER=0xf21
$
which should now show that BOOT_ORDER is 0xf21, indicating that the Raspberry Pi will try to boot from an SD card first, followed by the network. Before proceeding any further, we need to make a note of both the Ethernet MAC address and the serial number of the Raspberry Pi.
$ ethtool -P eth0
Permanent address: dc:a6:32:6a:16:87
$ grep Serial /proc/cpuinfo | cut -d ' ' -f 2 | cut -c 9-16
6a5ef8b0
$
Afterwards, you can shut down the board, at least for now, and remove the SD card.
We now need to configure our head node to act as a boot server. There are several options here, but we’re going to use our existing DHCP server along with a standalone TFTP server. Install the TFTP server (along with kpartx, which we’ll need shortly to work with the OS image), and create a directory on the external disk for it to serve:
$ sudo apt install tftpd-hpa
$ sudo apt install kpartx
$ sudo mkdir /mnt/usb/tftpboot
$ sudo chown tftp:tftp /mnt/usb/tftpboot
edit the /etc/default/tftpd-hpa file:
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/mnt/usb/tftpboot"
TFTP_ADDRESS=":69"
TFTP_OPTIONS="--secure --create"
and restart the service.
$ sudo systemctl restart tftpd-hpa
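If you want to check the TFTP server is answering before involving a Raspberry Pi, you can install the matching client on the head node and fetch a throwaway test file from the boot directory (test.txt here is just an example; remember to delete it afterwards):
$ echo "hello" | sudo tee /mnt/usb/tftpboot/test.txt
$ sudo apt install tftp-hpa
$ tftp 192.168.50.1 -c get test.txt
$ cat test.txt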
We then need to set up our boot image; we’ll need one image per client. The first step is to grab the latest image from the web, mount the partitions inside it, and copy their contents to our external disk so we can make some changes.
$ sudo su
# mkdir /tmp/image
# cd /tmp/image
# wget -O raspbian_lite_latest.zip https://downloads.raspberrypi.org/raspbian_lite_latest
# unzip raspbian_lite_latest.zip
# rm raspbian_lite_latest.zip
# kpartx -a -v *.img
# mkdir bootmnt
# mkdir rootmnt
# mount /dev/mapper/loop0p1 bootmnt/
# mount /dev/mapper/loop0p2 rootmnt/
# mkdir -p /mnt/usb/rpi1
# mkdir -p /mnt/usb/tftpboot/6a5ef8b0
# cp -a rootmnt/* /mnt/usb/rpi1
# cp -a bootmnt/* /mnt/usb/rpi1/boot
Afterwards, we can customise the root file system:
# touch /mnt/usb/rpi1/boot/ssh
# sed -i /UUID/d /mnt/usb/rpi1/etc/fstab
# echo "192.168.50.1:/mnt/usb/tftpboot /boot nfs defaults,vers=4.1,proto=tcp 0 0" >> /mnt/usb/rpi1/etc/fstab
# echo "console=serial0,115200 console=tty root=/dev/nfs nfsroot=192.168.50.1:/mnt/usb/rpi1,vers=4.1,proto=tcp rw ip=dhcp rootwait" > /mnt/usb/rpi1/boot/cmdline.txt
add it to the /etc/fstab and /etc/exports files on the head node:
# echo "/mnt/usb/rpi1/boot /mnt/usb/tftpboot/6a5ef8b0 none defaults,bind 0 0" >> /etc/fstab
# echo "/mnt/usb/rpi1 192.168.50.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
and then clean up after ourselves.
# systemctl restart rpcbind
# systemctl restart nfs-server
# umount bootmnt/
# umount rootmnt/
# cd /tmp; rm -rf image
# exit
$
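Note that the bind mount we just added to /etc/fstab won’t be live until it’s mounted, so before moving on you can bring it up by hand and check that the boot files are where the TFTP server expects to find them:
$ sudo mount /mnt/usb/tftpboot/6a5ef8b0
$ ls /mnt/usb/tftpboot/6a5ef8b0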
Finally, we need to edit the /etc/dhcp/dhcpd.conf file as follows:
ddns-update-style none;
authoritative;
log-facility local7;
option option-43 code 43 = text;
option option-66 code 66 = text;
# No service will be given on this subnet
subnet 192.168.1.0 netmask 255.255.255.0 {
}
# The internal cluster network
group {
option broadcast-address 192.168.50.255;
option routers 192.168.50.1;
default-lease-time 600;
max-lease-time 7200;
option domain-name "cluster";
option domain-name-servers 8.8.8.8, 8.8.4.4;
subnet 192.168.50.0 netmask 255.255.255.0 {
range 192.168.50.20 192.168.50.250;
# Head Node
host cluster {
hardware ethernet dc:a6:32:6a:16:90;
fixed-address 192.168.50.1;
}
# NETGEAR Switch
host switch {
hardware ethernet 80:cc:9c:94:53:35;
fixed-address 192.168.50.254;
}
host rpi1 {
option root-path "/mnt/usb/tftpboot/";
hardware ethernet dc:a6:32:6a:16:87;
option option-43 "Raspberry Pi Boot";
option option-66 "192.168.50.1";
next-server 192.168.50.1;
fixed-address 192.168.50.11;
option host-name "rpi1";
}
}
}
and reboot our Raspberry Pi.
$ sudo reboot
Make sure you’ve removed the SD card from the compute node, and plug the Raspberry Pi back into your switch. If you’ve got a spare monitor handy it might be a good idea to plug it into the HDMI port so you can watch the diagnostics screen as the node boots.
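If you don’t have a monitor to hand, you can also follow progress from the head node: assuming the default syslog setup, both the DHCP lease being handed out and the bootloader fetching files over TFTP should show up in the system log.
$ sudo tail -f /var/log/syslog | grep -E 'dhcpd|tftpd'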
If all goes to plan the board should boot up without incident. Although there are a few things we will need to tidy up, you should now be able to SSH directly into the compute node.
$ ssh 192.168.50.11
pi@192.168.50.11's password:
$
If you were watching the boot messages on a monitor, or if you check in the logs, you can see that our image didn’t come up entirely cleanly. If you log back into the compute node you can make sure that doesn’t happen in future by turning off the feature where the Raspberry Pi tries to resize its filesystem on the first boot, and also by uninstalling the swap daemon.
$ sudo systemctl disable resize2fs_once.service
$ sudo apt remove dphys-swapfile
Next, we can make things slightly easier on ourselves, so that we don’t have to use the IP addresses of our compute and head nodes every time, by adding our current and future compute nodes to the /etc/hosts file on both our head and compute nodes.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.1.1 cluster
192.168.50.1 cluster
192.168.50.254 switch
192.168.50.11 rpi1
192.168.50.12 rpi2
192.168.50.13 rpi3
192.168.50.14 rpi4
192.168.50.15 rpi5
192.168.50.16 rpi6
192.168.50.17 rpi7
Finally, we should change the hostname from the default raspberrypi to rpi1 using the raspi-config command-line tool.
$ sudo raspi-config
Select “Network Options,” then “Hostname” to change the hostname of the compute node, and select “Yes” to reboot.
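If you’d rather not step through the menus, raspi-config can also be driven non-interactively. The function names are internal to the tool and could change between releases, but at the time of writing something like this should work:
$ sudo raspi-config nonint do_hostname rpi1
$ sudo reboot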
Normally, if we were mounting a network disk, we’d make use of autofs rather than adding it as an entry directly into the /etc/fstab file. However, here, with our entire root filesystem mounted via the network, that seems like unnecessary effort.
After it reboots, log back into your compute node and add a mount point:
$ sudo mkdir /scratch
$ sudo chown pi:pi /scratch
and edit the /etc/fstab file there to add the scratch disk.
192.168.50.1:/mnt/usb/scratch /scratch nfs defaults 0 0
Then reboot the compute node.
$ sudo reboot
It’s going to get pretty tiresome secure-shelling between the cluster head node and the compute nodes and having to type your password each time. So let’s enable secure shell without a password by generating a public/private key pair.
On the compute node you should edit the /etc/ssh/sshd_config file to enable public key login:
PubkeyAuthentication yes
PasswordAuthentication yes
PermitEmptyPasswords no
and then restart the sshd server.
$ sudo systemctl restart ssh
Then going back to the head node we need to generate our public/private key pair and distribute the public key to the compute node. Use a blank passphrase when asked.
$ ssh-keygen -t rsa -b 4096 -C "pi@cluster"
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pi/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pi/.ssh/id_rsa
Your public key has been saved in /home/pi/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:XdaHog/sAf1QbFiZj7sS9kkFhCJU9tLN0yt8OvZ52gA pi@cluster
The key's randomart image is:
+---[RSA 4096]----+
| ...o *+o |
| ...+o+*o . |
| .o.=.B++ .|
| = B.ooo |
| S * Eoo |
| .o+o= |
| ..+=o. |
| ..+o +.|
| . +o.|
+----[SHA256]-----+
$ ssh-copy-id -i /home/pi/.ssh/id_rsa.pub pi@rpi1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/pi/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
pi@rpi1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'pi@rpi1'"
and check to make sure that only the key(s) you wanted were added.
$
Afterwards, you should be able to log in to the compute node without having to type your password.
One thing our compute node doesn’t have right now is access to the LAN: it can only see the head node and, once we add them, the rest of the compute nodes. But we can fix that! On the head node, edit the /etc/sysctl.conf file and uncomment the line saying
net.ipv4.ip_forward=1
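You can apply the change straight away, without waiting for the reboot at the end of this section, by asking sysctl to re-read the file:
$ sudo sysctl -p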
After activating forwarding we’ll need to configure iptables:
$ sudo apt install iptables
$ sudo iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
$ sudo iptables -A FORWARD -i eth1 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
$ sudo iptables -A FORWARD -i eth0 -o eth1 -j ACCEPT
$ sudo sh -c "iptables-save > /etc/iptables.ipv4.nat"
and then add a line to the /etc/rc.local file, just above the exit 0 line, to load the tables on boot:
_IP=$(hostname -I) || true
if [ "$_IP" ]; then
printf "My IP address is %s\n" "$_IP"
fi
iptables-restore < /etc/iptables.ipv4.nat
exit 0
and reboot.
$ sudo reboot
Note that if you still have the compute node running, you should log on to it first and shut it down, as its root filesystem lives on a disk attached to our head node.
Adding the rest of the compute nodes is going to be much more straightforward than adding our first node as we can now use our customised image and avoid some of the heavy lifting we did for the first compute node.
Go ahead and grab your SD card again and boot your next Raspberry Pi attached to the cluster switch.
The board should come up and be visible on the cluster subnet after it is given an IP address by the head node’s DHCP server, and we can look at the cluster network from the head node using dhcp-lease-list.
$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC IP hostname valid until manufacturer
===============================================================================================
dc:a6:32:6a:15:e2 192.168.50.21 raspberrypi 2021-12-08 21:15:00 Raspberry Pi Ltd
$
We can now go ahead and SSH into the new board and again enable network booting for this board using raspi-config from the command line:
$ rm /home/pi/.ssh/known_hosts
$ ssh pi@192.168.50.21
$ sudo raspi-config
Choose “Advanced Options,” then “Boot Order,” then “Network Boot.” You’ll then need to reboot the device for the change to the boot order to be programmed into the bootloader EEPROM.
Once the Raspberry Pi has rebooted, check the boot order using vcgencmd:
$ vcgencmd bootloader_config
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0
[all]
BOOT_ORDER=0xf21
$
which should now show that BOOT_ORDER is 0xf21, indicating that the Raspberry Pi will try to boot from an SD card first, followed by the network. Before proceeding any further, we need to make a note of both the Ethernet MAC address and the serial number of the Raspberry Pi.
$ ethtool -P eth0
Permanent address: dc:a6:32:6a:15:e2
$ grep Serial /proc/cpuinfo | cut -d ' ' -f 2 | cut -c 9-16
54e91338
$
Afterwards, you can shut down the board, at least for now, and remove the SD card.
Moving back to our head node we can use our already configured image as the basis of the operating system for the next compute node.
$ sudo su
# mkdir -p /mnt/usb/rpi2
# cp -a /mnt/usb/rpi1/* /mnt/usb/rpi2
# mkdir -p /mnt/usb/tftpboot/54e91338
# echo "/mnt/usb/rpi2/boot /mnt/usb/tftpboot/54e91338 none defaults,bind 0 0" >> /etc/fstab
# echo "/mnt/usb/rpi2 192.168.50.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
# exit
$
Then we need to edit /mnt/usb/rpi2/boot/cmdline.txt, replacing “rpi1” with “rpi2”:
console=serial0,115200 console=tty root=/dev/nfs nfsroot=192.168.50.1:/mnt/usb/rpi2,vers=4.1,proto=tcp rw ip=dhcp rootwait
and similarly for /mnt/usb/rpi2/etc/hostname.
rpi2
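If you prefer, both of those edits can be made from the head node in one go with sed and tee rather than opening an editor:
$ sudo sed -i 's/rpi1/rpi2/' /mnt/usb/rpi2/boot/cmdline.txt
$ echo rpi2 | sudo tee /mnt/usb/rpi2/etc/hostname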
Finally, we need to edit the /etc/dhcp/dhcpd.conf file on the head node:
host rpi2 {
option root-path "/mnt/usb/tftpboot/";
hardware ethernet dc:a6:32:6a:15:e2;
option option-43 "Raspberry Pi Boot";
option option-66 "192.168.50.1";
next-server 192.168.50.1;
fixed-address 192.168.50.12;
option host-name "rpi2";
}
and reboot our head node.
$ sudo reboot
Afterwards, you should see that both rpi1 and rpi2 are up and running. If you’re interested, we can get a better look at our cluster network by installing nmap on the head node.
$ sudo apt install nmap
$ nmap 192.168.50.0/24
Starting Nmap 7.80 ( https://nmap.org ) at 2021-12-09 11:40 GMT
Nmap scan report for cluster (192.168.50.1)
Host is up (0.0018s latency).
Not shown: 997 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
2049/tcp open nfs
Nmap scan report for rpi1 (192.168.50.11)
Host is up (0.0017s latency).
Not shown: 999 closed ports
PORT STATE SERVICE
22/tcp open ssh
Nmap scan report for rpi2 (192.168.50.12)
Host is up (0.00047s latency).
Not shown: 999 closed ports
PORT STATE SERVICE
22/tcp open ssh
Nmap scan report for switch (192.168.50.254)
Host is up (0.014s latency).
Not shown: 999 filtered ports
PORT STATE SERVICE
80/tcp open http
Nmap done: 256 IP addresses (4 hosts up) scanned in 6.91 seconds
$
Adding the remaining five compute nodes is now more or less a mechanical process: follow the process we went through for rpi2 for rpi3, rpi4, rpi5, rpi6, and rpi7, substituting the appropriate MAC address, serial number, and hostname for each of the new compute nodes.
Hostname | MAC Address | Serial Number |
rpi1 | dc:a6:32:6a:16:87 | 6a5ef8b0 |
rpi2 | dc:a6:32:6a:15:e2 | 54e91338 |
rpi3 | dc:a6:32:6a:15:16 | 6124b5e4 |
rpi4 | dc:a6:32:6a:15:55 | 52cddb85 |
rpi5 | dc:a6:32:6a:16:1b | a0f55410 |
rpi6 | dc:a6:32:6a:15:bb | c5fb02d3 |
rpi7 | dc:a6:32:6a:15:4f | f57fbb98 |
When bringing the last compute node up I also went ahead and plugged the two remaining franken-cables into the final node to power the right-most fans in my case.
Now that we have all our nodes up and running, we need some cluster control tools. One of my favourites is the parallel-ssh toolkit. You can install this on the head node from the command line,
$ sudo apt install pssh
and, along with the excellent Python library that lets you build your own cluster automation, this will install a number of command-line tools: parallel-ssh, parallel-scp, parallel-rsync, parallel-slurp, and parallel-nuke. These tools help you run and control jobs, and move and copy files, between the head node and the compute nodes.
To use the command-line tools you’ll need to create a hosts file listing all the compute nodes. I saved mine as .pssh_hosts in my home directory.
$ cat .pssh_hosts
rpi1
rpi2
rpi3
rpi4
rpi5
rpi6
rpi7
$
After creating the file we can use the command line tools to, amongst other things, execute a command on all seven of our compute nodes.
$ parallel-ssh -i -h .pssh_hosts free -h
[1] 12:10:15 [SUCCESS] rpi4
total used free shared buff/cache available
Mem: 3.8Gi 56Mi 3.7Gi 8.0Mi 64Mi 3.7Gi
Swap: 0B 0B 0B
[2] 12:10:15 [SUCCESS] rpi1
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 8.0Mi 64Mi 3.7Gi
Swap: 0B 0B 0B
[3] 12:10:15 [SUCCESS] rpi2
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 8.0Mi 64Mi 3.7Gi
Swap: 0B 0B 0B
[4] 12:10:15 [SUCCESS] rpi7
total used free shared buff/cache available
Mem: 3.8Gi 56Mi 3.7Gi 8.0Mi 97Mi 3.6Gi
Swap: 0B 0B 0B
[5] 12:10:15 [SUCCESS] rpi3
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 16Mi 104Mi 3.6Gi
Swap: 0B 0B 0B
[6] 12:10:15 [SUCCESS] rpi5
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 16Mi 72Mi 3.6Gi
Swap: 0B 0B 0B
[7] 12:10:15 [SUCCESS] rpi6
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 8.0Mi 64Mi 3.7Gi
Swap: 0B 0B 0B
$
Note that the results will come back in an arbitrary order, depending on how quickly the command completed on each of the compute nodes.
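The file-copying tools work the same way. For example, to push a file from the head node out to the home directory on every compute node (results.txt here is just a hypothetical example file):
$ parallel-scp -h .pssh_hosts /home/pi/results.txt /home/pi/results.txt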
While parallel-ssh is a great tool for deploying software and doing other tasks across your cluster, sometimes you just want to shut the whole cluster down cleanly with a single command. There are a bunch of ways you can approach this; the simplest is to write a shell script that logs in to each of the compute nodes and shuts it down before shutting down the head node itself, as in the sketch below. Alternatively, you could deploy something like the rshutdown service, editing the command appropriately.
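Here’s a minimal sketch of such a script, assuming the .pssh_hosts file we created earlier and passwordless SSH to every compute node; save it on the head node as something like shutdown-cluster.sh and run it with bash:
#!/bin/bash
# Shut down every compute node listed in ~/.pssh_hosts, then the head node.
# Assumes passwordless SSH and the default passwordless sudo for the pi user.
while read -r node; do
    echo "Shutting down ${node}..."
    ssh -n "pi@${node}" sudo shutdown -h now
done < /home/pi/.pssh_hosts

# Give the compute nodes a moment to halt before taking down the head node,
# since their root filesystems live on the head node's disk.
sleep 30
sudo shutdown -h now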
Up until this point, the cluster we’ve built is pretty flexible, and now that we have a firm base we can start installing software depending on exactly what we’re looking to do with it. For instance, if we’re building a compute cluster for modelling, we’d probably look to install MPI and OpenMP to do parallel processing across our cluster. Alternatively, you might be looking to build out a cluster to host Kubernetes.