Diskless DRBD

I've been using DRBD for quite some time now. When I started as a Linux system administrator at my first real job, DRBD was this RAID-1-over-the-network, highly available storage thing that was magic to me. In combination with PiranHA, which was retired ten years ago, I built a high availability setup that I proudly demonstrated to my colleagues.

Although that was ten years ago, the people at LINBIT haven't been sitting still. DRBD 9 came to life in 2015, but I never had any experience with its new features beyond just using it. It wasn't until I started looking into Kubernetes that I started looking deeper into DRBD.

If you've used DRBD 9, you probably know that, contrary to DRBD 8, it can replicate to more than two nodes. This means you can use DRBD in a cluster of three nodes without deciding up front which two nodes can run which workload. The downside is the increased disk usage, and an even more significant problem is scalability and flexibility once you reach ten or even 100 nodes: you're not going to replicate all the data over all 100 nodes.

Flexibility is where diskless DRBD comes in. Once you've installed DRBD on every node of your cluster, your disks are no longer bound to the metal casing they reside in: you can expose a single disk on one node to another node without replicating the data. Let's dive into the technical stuff now!

Installing virtual machines manually is boring. Use Hyper-V on your Windows workstation to quickly get some virtual machines up and running. Soon I'll talk about doing the same thing with Terraform on AWS, Azure and GCP!

Manual labour! :(

So I set up three virtual machines with Ubuntu 20.04 and did the first things first: I placed my SSH keys on the servers and updated them to the latest packages. Make a snapshot at this point; it saves you time later.

Something I wanted to do differently in this post is compile DRBD myself. Usually, I would use the LINBIT PPA repo, and of course, you still can. However, time will pass, this post will get old, versions will change, and its accuracy will rot.

Let's prepare the servers for DRBD! The commands used to compile are so simple.

# DRBD Kernel Module

sudo apt install build-essential flex linux-headers-$(uname -r)
wget https://pkg.linbit.com/downloads/drbd/9/drbd-9.2.0-rc.4.tar.gz
tar zxvf drbd-9.2.0-rc.4.tar.gz
cd drbd-9.2.0-rc.4/
make -j 8
sudo make install
cd - # return to previous directory

# DRBD Utils

wget https://pkg.linbit.com/downloads/drbd/utils/drbd-utils-9.20.2.tar.gz
tar zxvf drbd-utils-9.20.2.tar.gz
cd drbd-utils-9.20.2
./configure --with-manual=no --with-pacemaker=no --with-xen=no --without-83support --without-84support --with-heartbeat=no --prefix=/opt/drbd
make -j 8
sudo make install

# Copy multipathd file to prevent it from locking the drbd disk once it's open

sudo mkdir -p /etc/multipath/conf.d
sudo cp /opt/drbd/etc/multipath/conf.d/drbd.conf /etc/multipath/conf.d/drbd.conf
sudo systemctl restart multipathd

# Verify it's working

sudo modprobe drbd
cat /proc/drbd

# version: 9.2.0-rc.4 (api:2/proto:110-121)
# GIT-hash: 5828124e330af6238cec2bf396145b4e04487c5f build by feax@drdb1, 2022-02-22 20:55:11
# Transports (api:18): tcp (9.2.0-rc.4)
Compiling and loading DRBD

Do this on all three nodes. DRBD should now be running, but nothing has been created yet. First, let's create two persistent disks. You could go diskless with even a single diskful node, but with two we can test the availability when one of them fails in the next blog post.

feax@drbd1:~$ sudo lvcreate -L 5G -n test-disk ubuntu-vg
[sudo] password for feax:
  Logical volume "test-disk" created.
Creating LV device

Create a DRBD resource file on all three nodes.

root@drbd1:/opt/drbd/etc/drbd.d# cat test-disk.res
resource test-disk {
  device      minor 1;
  disk        /dev/ubuntu-vg/test-disk;
  meta-disk   internal;

  on drbd1 {
    address   192.168.178.199:7100;
    node-id   1;
  }
  on drbd2 {
    address   192.168.178.103:7100;
    node-id   2;
  }
  on drbd3 {
    disk      none;
    address   192.168.178.119:7100;
    node-id   3;
  }

  connection-mesh {
        hosts drbd1 drbd2 drbd3;
  }
}
DRBD resource config

Prepare the persistent disks on the two nodes that have a backing disk.

feax@drbd1:~$ sudo drbdadm create-md test-disk
  --==  Thank you for participating in the global usage survey  ==--
The server's response is:
    you are the 17th user to install this version
initializing activity log
initializing bitmap (160 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
success
feax@drbd1:~$ sudo drbdsetup new-current-uuid --clear-bitmap test-disk
Prepare DRBD disks

Now bring the DRBD resource up on all nodes and check its status.
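The bring-up command itself isn't shown in my transcript; assuming the resource file from the previous step, it would look something like this on every node (`drbdadm up` for the first activation, `drbdadm adjust` for later config changes):

```shell
# Enable the resource: attach the backing disk (where one exists)
# and establish the network connections to the peers.
sudo drbdadm up test-disk

# Watch the initial sync between the two diskful nodes.
sudo drbdadm status test-disk
```

This is a sketch against a running DRBD 9 cluster; the output you see will depend on how far the sync has progressed.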

feax@drbd1:~$ sudo drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:14.59
  drbd3 role:Secondary
    peer-disk:Diskless
DRBD status after set up

You'll see that one node reports Secondary/Diskless. Let's make the disk primary on that node, create a filesystem, and mount it.

feax@drbd3:~$ sudo drbdadm primary test-disk
feax@drbd3:~$ sudo drbdadm status
test-disk role:Primary
  disk:Diskless
  drbd1 role:Secondary
    peer-disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
feax@drbd3:~$ sudo mkfs.ext4 /dev/drbd1
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 1310671 4k blocks and 327680 inodes
Filesystem UUID: ece9ddae-a57c-498e-bccf-251adecf85d2
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

feax@drbd3:~$ sudo mount /dev/drbd/by-res/test-disk/0 /mnt
feax@drbd3:~$ ls /mnt
lost+found
Preparing a filesystem

Now we can write files to this filesystem without the disk being physically present in this node. In the next blog post, let's dive into how reliable this setup is when a node fails.
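As a quick sanity check (a hypothetical session on drbd3; the filename is just an example), you can write through the diskless node and the blocks land on the backing disks of drbd1 and drbd2:

```shell
# Write a file via the diskless mount; the data is stored on the
# peers' disks, not on drbd3 itself.
echo "hello from the diskless node" | sudo tee /mnt/hello.txt

# Read it back to confirm the round trip over the network.
cat /mnt/hello.txt
```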