DRBD Reliability

In my last post, I showed that DRBD can be used diskless, which effectively does the same job as exposing a disk over iSCSI. However, DRBD can do more than act as an iSCSI-style target; its best-known feature is replicating disks over a network.

This post looks into mounting a DRBD device diskless and testing its reliability when one of the two backing nodes fails, among other scenarios.

I started by mounting the DRBD disk on node 3, the diskless node. Running drbdadm status on node one then shows the following:

root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
  drbd3 role:Primary
    peer-disk:Diskless
drbdadm status
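
For reference, getting to this state boils down to promoting the resource on node three and mounting the DRBD device there. A minimal sketch, assuming the resource is named test-disk, shows up as /dev/drbd1, and already carries an ext4 filesystem:

# On node 3, the diskless node: promote the resource and mount it.
drbdadm primary test-disk
mount /dev/drbd1 /mnt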

After mounting it, I created a small test file and installed pv. I then started writing the test file to the disk slowly; for now, we don't want to overload the disk or fill it up too quickly during the reliability tests.

To test reliability under normal circumstances, I gave node one a reboot command. After it came back, I did the same with node two.

# Diskless DRBD node 3

root@drbd3:~# dd if=/dev/urandom bs=1M count=1 > /testfile
root@drbd3:~# cat /testfile | pv -L 40000 -r -p -e -s 1M > /mnt/testfile
[39.3KiB/s] [=====>                          ] 11% ETA 0:00:23

# DRBD node 1

root@drbd1:~# reboot
Connection to 192.168.178.199 closed by remote host.
Connection to 192.168.178.199 closed.

# DRBD node 2

root@drbd2:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd1 connection:Connecting
  drbd3 role:Primary
    peer-disk:Diskless

# DRBD node 1

root@drbd1:~# drbdadm adjust all
Marked additional 4096 KB as out-of-sync based on AL.
root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
  drbd3 role:Primary
    peer-disk:Diskless

# Diskless DRBD node 3

root@drbd3:~# md5sum /testfile && md5sum /mnt/testfile
553118a49cea22b739c2cf43fa53ae86  /testfile
553118a49cea22b739c2cf43fa53ae86  /mnt/testfile
Testing reliability with graceful reboots
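
Note that after the reboot, node one came back without its DRBD resource and needed a manual drbdadm adjust all. To have a rebooted node rejoin on its own, enabling the DRBD service at boot should be enough; a sketch, assuming your distribution ships a drbd systemd unit:

# Bring all configured DRBD resources up at boot
# (assumption: the drbd-utils package provides drbd.service).
systemctl enable drbd.service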

During the reboot of DRBD node one, the writes on DRBD node three halted briefly but resumed very soon after.

When applying more pressure on the disks, using a 3 GB test file at unrestricted speed, the disk of the rebooted server became inconsistent and needed a resync.

root@drbd2:~# reboot
Connection to 192.168.178.103 closed by remote host.
Connection to 192.168.178.103 closed.
root@workstation:~# ssh 192.168.178.103
root@drbd2:~# drbdadm status
# No currently configured DRBD found.
root@drbd2:~# drbdadm adjust all
root@drbd2:~# drbdadm status
test-disk role:Secondary
  disk:Inconsistent
  drbd1 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:7.28
  drbd3 role:Primary
    peer-disk:Diskless resync-suspended:dependency
    
# Diskless DRBD node 3

root@drbd3:~# md5sum /testfile && md5sum /mnt/testfile
d67f12594b8f29c77fc37a1d81f6f981  /testfile
d67f12594b8f29c77fc37a1d81f6f981  /mnt/testfile
root@drbd3:~# md5sum /testfile && md5sum /mnt/testfile
d67f12594b8f29c77fc37a1d81f6f981  /testfile
d67f12594b8f29c77fc37a1d81f6f981  /mnt/testfile
root@drbd3:~# md5sum /testfile && md5sum /mnt/testfile
d67f12594b8f29c77fc37a1d81f6f981  /testfile
d67f12594b8f29c77fc37a1d81f6f981  /mnt/testfile
The same test, but with a 3 GB file at 500 MBps write speed.
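
The done: field in the status output is the resync progress in percent. To follow a resync live, polling the status is enough; a small sketch:

# Refresh the resource state every second while the resync runs.
watch -n1 drbdadm status test-disk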

So DRBD seems to be very stable when the servers are rebooted gracefully. But what happens if we reboot both backing nodes at the same time?

root@drbd3:~# cat /testfile | pv -r -p -e -s 3000M > /mnt/testfile; md5sum /testfile && md5sum /mnt/testfile
[3.67MiB/s] [=======================================>                                                                         ] 36% ETA 0:00:22
pv: write failed: Read-only file system

Message from syslogd@drbd3 at Feb 28 13:50:38 ...
 kernel:[ 4498.824570] EXT4-fs (drbd1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 12, error -30)

(the same kernel message repeated six more times)
d67f12594b8f29c77fc37a1d81f6f981  /testfile
md5sum: /mnt/testfile: Input/output error
root@drbd3:~# md5sum /testfile && md5sum /mnt/testfile
d67f12594b8f29c77fc37a1d81f6f981  /testfile
2f80ddfb7fe21b9294b2e3663c0a0644  /mnt/testfile
root@drbd3:~# mount | grep mnt
/dev/drbd1 on /mnt type ext4 (ro,relatime)
Rebooting both nodes with persistent disks at the same time

It doesn't like that. The filesystem was remounted read-only, but the data written up to the point of failure appears to be intact. Of course, you don't want this to happen, but at least the disk is still mountable and readable.
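Getting back to a healthy state is then mostly a filesystem exercise: once the backing disks are UpToDate again, unmount, check, and remount. A sketch of what I'd try on node three:

# Unmount the now read-only filesystem, check it, and mount it again.
umount /mnt
fsck.ext4 -f /dev/drbd1
mount /dev/drbd1 /mnt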

What if the network starts flapping?

root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
  drbd3 role:Primary
    peer-disk:Diskless

... Connectivity failure caused by tagging the VM with the wrong VLAN in Hyper-V

... Restoring VLAN settings

root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 connection:Connecting
  drbd3 connection:Connecting

root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:Inconsistent
  drbd2 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:5.39
  drbd3 role:Primary
    peer-disk:Diskless resync-suspended:dependency
Interrupting network connectivity
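
I broke the link by tagging the VM with the wrong VLAN in Hyper-V; if you're not on a hypervisor, iptables can simulate the same flap. A sketch, assuming the resource replicates over TCP port 7789 (the port is set per resource in your DRBD config):

# Drop DRBD replication traffic to simulate a dead link.
iptables -A INPUT -p tcp --dport 7789 -j DROP
# ... watch the peers go into Connecting, then restore the link:
iptables -D INPUT -p tcp --dport 7789 -j DROP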

Writing to the disk while node one was cut off was just as fast as when both nodes were available.

What if we have a degraded network connection that only allows 10 Mbps?

(Screenshots: the normal speed, the Hyper-V bandwidth setting, and the new, limited speed.)
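
The 10 Mbps cap came from a Hyper-V bandwidth setting; outside Hyper-V, tc can impose the same limit. A sketch, assuming the replication NIC is eth0:

# Cap outgoing traffic on eth0 at 10 Mbit/s.
tc qdisc add dev eth0 root tbf rate 10mbit burst 32kbit latency 400ms
# Remove the cap again:
tc qdisc del dev eth0 root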

However, the Hyper-V limit only seems to apply to outgoing traffic, not incoming. And while reading from the disk, both nodes are limited to 10 Mbps if one of them is.

Above: DRBD node 1, limited to 10 Mbps. Below: DRBD node 2, unlimited.

When taking the DRBD "test-disk" resource down on node 1, the read speed of node 2 became unlimited again.

After "drbdadm down test-disk."

It's interesting to see that DRBD balances the reads across both nodes.
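
If I read the documentation right, this behaviour is tunable: DRBD has a read-balancing option in the disk section of the resource configuration. A sketch (the policy names are from the docs; I haven't tested them here):

resource test-disk {
  disk {
    # Spread reads across peers; other policies include prefer-local
    # (the default, as far as I know) and least-pending. There is even
    # a when-congested-remote, though that keys off disk congestion
    # rather than network congestion.
    read-balancing round-robin;
  }
}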

What if a node panics during writes?

Let's reset DRBD node two while writing at full speed.
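
I did the reset through Hyper-V. On bare metal, or from inside the guest, the magic SysRq trigger gives the same instant crash; a sketch, and fair warning that it kills the machine on the spot:

# Force an immediate kernel crash, simulating a hard failure
# (writing to the trigger file works as root even when kernel.sysrq is 0).
echo c > /proc/sysrq-trigger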

root@drbd2:~# packet_write_wait: Connection to 192.168.178.103 port 22: Broken pipe
root@workstation:~# ssh 192.168.178.103
Last login: Mon Feb 28 14:32:18 2022 from 192.168.178.47
root@drbd2:~# drbdadm status
# No currently configured DRBD found.
root@drbd2:~# drbdadm adjust all
Marked additional 4948 MB as out-of-sync based on AL.
root@drbd2:~# drbdadm status
test-disk role:Secondary
  disk:Inconsistent
  drbd1 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:0.21
  drbd3 role:Primary
    peer-disk:Diskless resync-suspended:dependency
Reset node two

Besides a short hiccup until DRBD declared node two unavailable, we didn't notice anything on the diskless node.

Conclusion

DRBD has shown itself to be very stable. Rebooting or resetting DRBD nodes results in a short hiccup, after which everything continues to work just fine. I couldn't yet figure out why limiting one node's network bandwidth limits the read speed of both nodes, and I'd like to see reads balanced based on the congestion of the network. In the next DRBD post, I hope to look at LINSTOR.