BiggiWiki

DRBD node failure recovery.

Step 1 - Set all DRBD volumes to secondary to avoid accidents, then invalidate their data
@BROKEN BOX

drbdadm secondary all
drbdadm invalidate all

(Instead of 'all', you can name individual resources as defined in your DRBD configuration, e.g. drbd0, drbd1.)
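
If only some resources are affected, Step 1 can be wrapped in a small loop. This is a minimal sketch and not part of the original procedure: the function name demote_and_invalidate, the DRBDADM override variable and the resource names r0 and r1 are hypothetical, so substitute the names from your own configuration.

```shell
# Hypothetical helper: run Step 1 for the named resources only.
# DRBDADM can be overridden (e.g. DRBDADM=echo) to preview the commands.
DRBDADM="${DRBDADM:-drbdadm}"

demote_and_invalidate() {
    for res in "$@"; do
        # Demote first so nothing can write to the volume ...
        $DRBDADM secondary "$res" || return 1
        # ... then mark the local data as out of sync.
        $DRBDADM invalidate "$res" || return 1
    done
}

# Example (run on the BROKEN box only):
# demote_and_invalidate r0 r1
```

Stopping at the first failing command keeps a resource that could not be demoted from being invalidated anyway.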

Step 2 - Make this box primary and declare its data the copy that overwrites the peer (the “bad box”)
@GOOD BOX:

drbdadm -- --overwrite-data-of-peer primary all
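
Because this step discards whatever is on the peer, it can be worth confirming first that the local disks really are UpToDate. A minimal sketch, assuming the local/peer format that 'drbdadm dstate' prints on DRBD 8.x (e.g. UpToDate/DUnknown); the function name all_local_uptodate is hypothetical.

```shell
# Hypothetical guard: read "local/peer" disk states, one per line, on stdin
# and succeed only if at least one line was read and every local side
# (the part before the slash) is UpToDate.
all_local_uptodate() {
    seen=0
    while read -r line; do
        [ -n "$line" ] || continue
        seen=1
        case "${line%%/*}" in
            UpToDate) ;;
            *) return 1 ;;
        esac
    done
    [ "$seen" -eq 1 ]
}

# Example (run on the GOOD box):
# drbdadm dstate all | all_local_uptodate \
#     && drbdadm -- --overwrite-data-of-peer primary all
```

Treating empty input as a failure means a missing or failing drbdadm does not accidentally pass the check.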

Step 3 - Connect the volumes to start the transfer
@BOTH BOXES:

drbdadm connect all

Step 4 - Verify that your boxes are syncing, and wait for the sync to finish
@EITHER BOX:

while true; do cat /proc/drbd; echo "------------------"; sleep 1; done

You can expect something like this (here seen from the box that is being overwritten):

 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r----
    ns:0 nr:220496 dw:220108 dr:200 al:0 bm:23 lo:97 pe:15483 ua:97 ap:0 ep:1 wo:b oos:5546848
        [>....................] sync'ed:  3.9% (5416/5628)M
        finish: 0:04:03 speed: 22,672 (15,720) K/sec

Step 5 - Set all DRBD volumes as primary again ONCE THE SYNC IS COMPLETE!
@(FORMER) BROKEN BOX:

drbdadm primary all
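
To avoid promoting too early, the wait for the sync can be scripted. A minimal sketch, assuming the /proc/drbd layout shown in the sample output above; the function names all_synced and wait_for_sync and the 5 second polling interval are hypothetical choices.

```shell
# Hypothetical helper: read /proc/drbd-style text on stdin and succeed only
# if there is at least one configured device and every configured device is
# Connected with both disks UpToDate.
all_synced() {
    awk '
        /^ *[0-9]+: cs:/ {
            if ($0 ~ /cs:Unconfigured/) next      # ignore unused minors
            devices++
            if ($0 !~ /cs:Connected / || $0 !~ /ds:UpToDate\/UpToDate/) pending++
        }
        END { exit ((devices > 0 && pending == 0) ? 0 : 1) }
    '
}

# Poll /proc/drbd until every device reports a finished sync.
wait_for_sync() {
    until all_synced < /proc/drbd; do
        sleep 5
    done
}

# Example (run on the FORMER BROKEN box):
# wait_for_sync && drbdadm primary all
```

The check is deliberately strict: anything other than Connected and UpToDate/UpToDate, including a dropped connection, keeps the loop waiting instead of promoting.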

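(If the good box is still primary from Step 2, the promotion only succeeds when the resource is configured with allow-two-primaries. In a single-primary setup, demote the good box first, or simply leave it as the primary.)

Done.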