In my last post, I showed that DRBD can be used diskless, which effectively does the same as exposing a disk over iSCSI. However, DRBD can do more than just act as an iSCSI-like target; its best-known feature is replicating disks over a network.
This post looks into mounting a DRBD device diskless and testing its reliability when one of the two backing nodes fails, among other scenarios.
I started by mounting the DRBD disk on node 3, the diskless node. If you run drbdadm status on node 3, it should show the resource as Diskless locally, with both peers connected and UpToDate.
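On the diskless node that comes down to something like the following; the device path /dev/drbd0 and the mount point are assumptions, and the resource is the same "test-disk" used later in this post:

    # On node 3, the diskless node.
    mkdir -p /mnt/test-disk
    mount /dev/drbd0 /mnt/test-disk     # a DRBD device mounts like any other block device
    drbdadm status test-disk            # local disk shows Diskless, both peers UpToDate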
After mounting it, I created a small test file and installed pv. I then started writing the test file to the disk slowly; for now, we don't want to overload the disk or fill it up too quickly while performing the reliability tests.
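Something along these lines, where the file size, paths, and the 1 MB/s limit are arbitrary choices:

    # Create a small test file and write it to the DRBD mount at a throttled rate.
    dd if=/dev/urandom of=/root/testfile bs=1M count=512    # 512 MB of random data
    pv -L 1M /root/testfile > /mnt/test-disk/testfile       # cap the write at about 1 MB/s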
To test reliability under normal circumstances, I gave node one a shutdown command. After it came back, I did the same with node two.
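Something like this, with the slow pv write still running on node 3:

    # On node 1 (and later node 2): a graceful reboot.
    systemctl reboot

    # Meanwhile, on node 3, keep an eye on the resource:
    watch drbdadm status test-disk      # the rebooting peer shows Connecting until it returns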
During the reboot of DRBD node one, the writes on DRBD node three stalled briefly but resumed very soon after.
When applying more pressure on the disks, using a 3 GB test file and no speed limit, the disk of the rebooted server became inconsistent and needed a resync.
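To reproduce that, something like this will do; the paths are arbitrary:

    # Heavier load: a 3 GB test file, written without a rate limit.
    dd if=/dev/urandom of=/root/bigfile bs=1M count=3072
    pv /root/bigfile > /mnt/test-disk/bigfile

    # After rebooting a backing node under this load, check it on that node:
    drbdadm status test-disk            # an Inconsistent disk resynchronizes automatically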
So DRBD seems to be very stable when the servers are rebooted gracefully. But what happens if we reboot both of them at the same time?
DRBD doesn't like that. Still, the disk seems to be OK to the extent that it could still be written to. Of course, you don't want this to happen, but at least the disks remain mountable and readable.
What if the network starts flapping?
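One way to simulate a flapping link is to toggle the replication interface on one of the backing nodes in a loop; eth1 and the timings here are arbitrary:

    # On a backing node: take the replication interface down and up repeatedly.
    while true; do
        ip link set eth1 down; sleep 5
        ip link set eth1 up;   sleep 25
    done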
Writing to the disk was just as fast as when both nodes were available.
What if we have a degraded network connection that only allows 10 Mbps?
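A degraded link like that can be approximated with tc on one of the backing nodes; the interface name and the tbf parameters are assumptions:

    # Throttle the replication interface on node 1 to roughly 10 Mbps.
    tc qdisc add dev eth1 root tbf rate 10mbit burst 32kbit latency 400ms

    # Remove the limit again when done:
    tc qdisc del dev eth1 root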
However, this only seems to apply to outgoing traffic, not incoming traffic. While reading from the disk, both nodes end up limited to 10 Mbps if one of them is.
After setting the DRBD resource "test-disk" down on node 1, the speed of node 2 became unlimited again.
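Taking the resource down on the throttled node is a single command:

    # On node 1: stop participating in the "test-disk" resource.
    drbdadm down test-disk

    # Bring it back later; DRBD resynchronizes whatever changed in the meantime.
    drbdadm up test-disk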
Interesting to see that it balances the reads across both nodes.
What if a node panics during writes?
Let's reset DRBD node two while writing at full speed.
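A reset like that can be simulated with the magic SysRq interface, which skips any clean shutdown, so only do this on a test machine:

    # On node 2, while node 3 keeps writing at full speed.
    echo 1 > /proc/sys/kernel/sysrq     # make sure SysRq is enabled
    echo c > /proc/sysrq-trigger        # trigger an immediate kernel panic
    # (echo b > /proc/sysrq-trigger does a hard reboot instead)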
Apart from a short hiccup, we don't notice anything once DRBD declares node two unavailable.
Conclusion
DRBD has shown itself to be very stable. Rebooting or resetting DRBD nodes results in a short hiccup, after which everything continues to work just fine. I couldn't yet figure out why limiting one node's network bandwidth results in both nodes being limited in read speed; I'd like to see that balanced based on the congestion of the network. In the next DRBD post, I hope to look at LINSTOR.