DRBD node failure recovery.
Step 1 - Set all DRBD volumes to secondary to avoid incidents, then invalidate all local data
@BROKEN BOX:
drbdadm secondary all
drbdadm invalidate all
(Instead of 'all', you can of course name individual resources as defined in your DRBD configuration, e.g. drbd0, drbd1 etc.)
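If only some resources are affected, the two commands can be run per resource. The following is a minimal sketch, assuming resource names such as drbd0 and drbd1 exist in your DRBD configuration. The function name and the DRBDADM variable are made up for this example; setting DRBDADM=echo gives a dry run that only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: demote and invalidate only the named resources.
# DRBDADM defaults to the real tool; set DRBDADM=echo for a dry run.
DRBDADM="${DRBDADM:-drbdadm}"

demote_and_invalidate() {
    for res in "$@"; do
        # Demote first so the resource cannot stay primary on the broken box.
        $DRBDADM secondary "$res" || return 1
        # Mark the local copy as out of sync so it is resynced from the peer.
        $DRBDADM invalidate "$res" || return 1
    done
}

# Example (dry run), assuming resources named drbd0 and drbd1:
#   DRBDADM=echo; demote_and_invalidate drbd0 drbd1
```

The function stops at the first failing command, so a resource that cannot be demoted is never invalidated.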
Step 2 - Overwrite the data of the other peer (the broken box)
@GOOD BOX:
drbdadm -- --overwrite-data-of-peer primary all
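Because this command makes the good box the source of truth, it may be worth confirming beforehand that its local disks really are UpToDate. The sketch below assumes that `drbdadm dstate all` prints one "local/peer" disk-state pair per resource; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check: succeed only if every "local/peer" disk-state line
# read from stdin reports the local side as UpToDate.
all_local_uptodate() {
    seen=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        seen=1
        # Everything before the first "/" is the local disk state.
        [ "${line%%/*}" = "UpToDate" ] || return 1
    done
    # No input at all is treated as a failure rather than as "all fine".
    [ "$seen" -eq 1 ]
}

# Intended use on the good box:
#   drbdadm dstate all | all_local_uptodate && echo "local data is UpToDate"
```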
Step 3 - Connect the volumes to start the transfer
@BOTH BOXES:
drbdadm connect all
Step 4 - Verify that your boxes are syncing, and wait for the sync to finish
@EITHER BOX:
while true; do cat /proc/drbd; echo "------------------"; sleep 1; done
You can expect something like this:
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r----
    ns:0 nr:220496 dw:220108 dr:200 al:0 bm:23 lo:97 pe:15483 ua:97 ap:0 ep:1 wo:b oos:5546848
        [>....................] sync'ed:  3.9% (5416/5628)M
        finish: 0:04:03 speed: 22,672 (15,720) K/sec
The sync is finished once the device line reads cs:Connected with ds:UpToDate/UpToDate and the progress bar is gone.
Step 5 - Set all DRBD volumes to primary again ONCE THE SYNC IS COMPLETE!
@(FORMER) BROKEN BOX:
drbdadm primary all
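To avoid promoting too early, the wait for the sync can be scripted instead of watched by hand. The following is a minimal sketch with hypothetical function names. It assumes that every device listed in /proc/drbd is configured and that a finished sync shows up as cs:Connected with ds:UpToDate/UpToDate on each device line. An unconfigured device would keep it waiting forever.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every device line in a /proc/drbd-style
# status file is Connected with both disks UpToDate.
sync_complete() {
    status_file="${1:-/proc/drbd}"
    # Device lines are the ones carrying a "cs:" (connection state) field.
    devices=$(grep -c 'cs:' "$status_file")
    [ "$devices" -gt 0 ] || return 1
    healthy=$(grep -c 'cs:Connected .*ds:UpToDate/UpToDate' "$status_file")
    [ "$healthy" -eq "$devices" ]
}

wait_for_sync() {
    # Poll once per second until sync_complete succeeds.
    until sync_complete "$1"; do
        sleep 1
    done
}

# Intended use on the (former) broken box:
#   wait_for_sync && drbdadm primary all
```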
Done.