    drbd_uuid_compare(): Handle loss of last P_WRITE_ACK packet of a resync right.... · bc9ef10b
    Philipp Reisner authored
    drbd_uuid_compare(): Handle loss of the last P_WRITE_ACK packet of a resync correctly. (Caused missing resyncs) Bugz 246
    
    Connection drops while the last ack is being transmitted:
    SyncSource loses the connection, SyncTarget sees the end of resync.
    
    Aug 18 08:39:42 uml1 drbd0: Handshake successful: Agreed network protocol version 90
    Aug 18 08:39:42 uml1 drbd0: conn( WFConnection -> WFReportParams )
    Aug 18 08:39:42 uml1 drbd0: drbd_sync_handshake:
    Aug 18 08:39:42 uml1 drbd0: self 81DAF2FF6134FC1E:16EF5753AD5FA994:95B9E9AD329C137B:A4B1B25AC5927436 bits:4255 flags:0
    Aug 18 08:39:42 uml1 drbd0: peer 16EF5753AD5FA994:0000000000000000:95B9E9AD329C137A:A4B1B25AC5927436 bits:0 flags:0
    Aug 18 08:39:42 uml1 drbd0: uuid_compare()=1 by rule 70
    Aug 18 08:39:42 uml1 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
    Aug 18 08:39:42 uml1 drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
    Aug 18 08:39:42 uml1 drbd0: Began resync as SyncSource (will sync 17020 KB [4255 bits set]).
    Aug 18 08:39:43 uml1 drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )
    
    Aug 18 08:39:42 uml2 drbd0: Handshake successful: Agreed network protocol version 90
    Aug 18 08:39:42 uml2 drbd0: conn( WFConnection -> WFReportParams )
    Aug 18 08:39:42 uml2 drbd0: drbd_sync_handshake:
    Aug 18 08:39:42 uml2 drbd0: self 16EF5753AD5FA994:0000000000000000:95B9E9AD329C137A:A4B1B25AC5927436 bits:0 flags:0
    Aug 18 08:39:42 uml2 drbd0: peer 81DAF2FF6134FC1E:16EF5753AD5FA994:95B9E9AD329C137B:A4B1B25AC5927436 bits:4255 flags:0
    Aug 18 08:39:42 uml2 drbd0: uuid_compare()=-1 by rule 50
    Aug 18 08:39:42 uml2 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
    Aug 18 08:39:42 uml2 drbd0: conn( WFBitMapT -> WFSyncUUID )
    Aug 18 08:39:42 uml2 drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
    Aug 18 08:39:43 uml2 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
    
    Only uml2 recognised the end of resync.
    
    Aug 18 09:49:51 uml1 drbd0: Handshake successful: Agreed network protocol version 90
    Aug 18 09:49:51 uml1 drbd0: conn( WFConnection -> WFReportParams )
    Aug 18 09:49:51 uml1 drbd0: drbd_sync_handshake:
    Aug 18 09:49:51 uml1 drbd0: self 81DAF2FF6134FC1E:CB7A2BEB83B25C28:16EF5753AD5FA994:95B9E9AD329C137B bits:3 flags:0
    Aug 18 09:49:51 uml1 drbd0: peer 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994 bits:0 flags:0
    Aug 18 09:49:51 uml1 drbd0: uuid_compare()=0 by rule 40
    Aug 18 09:49:51 uml1 drbd0: No resync, but 3 bits in bitmap!
    Aug 18 09:49:51 uml1 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( Inconsistent -> UpToDate )
    
    Aug 18 09:49:51 uml2 drbd0: Handshake successful: Agreed network protocol version 90
    Aug 18 09:49:51 uml2 drbd0: conn( WFConnection -> WFReportParams )
    Aug 18 09:49:51 uml2 drbd0: drbd_sync_handshake:
    Aug 18 09:49:51 uml2 drbd0: self 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994 bits:0 flags:0
    Aug 18 09:49:51 uml2 drbd0: peer 81DAF2FF6134FC1E:CB7A2BEB83B25C28:16EF5753AD5FA994:95B9E9AD329C137B bits:3 flags:0
    Aug 18 09:49:51 uml2 drbd0: uuid_compare()=0 by rule 40
    Aug 18 09:49:51 uml2 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
    
    => On the next connect, uml1 logs the "No resync, but 3 bits in bitmap!" message.
    
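    Notation: C = current UUID, B = bitmap UUID, H1/H2 = history UUIDs;
    suffix s = self, p = peer.

    rule 3.4:
      If Cs = Cp & Bs != 0 & Bp = 0 & Bs = H1p & H1s = H2p
      => I have not realized the end of resync. I was SyncSource, the target saw the end of resync.

        Correct my UUIDs: Bs = 0 (with rotate)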
    
    rule 3.5:
      If Cs = Cp & Bs = 0 & Bp != 0 & H1s = Bp & H2s = H1p
      => Peer has not realized the end of resync. I was SyncTarget, resync is actually done.

        Correct peer's UUIDs: Bp = 0 (with rotate)
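
The two rules above can be sketched in C as follows. This is only an illustration, not the code from this commit: `struct uuid_set`, `rule_3_4_matches()`, `rule_3_5_matches()` and `clear_bitmap_uuid_with_rotate()` are hypothetical names, UUIDs are compared exactly as the rules are written, and "with rotate" is assumed to mean that the bitmap UUID moves into H1 while the old H1 moves into H2.

```c
#include <stdint.h>

/* Hypothetical container for the four UUIDs printed in the handshake log. */
struct uuid_set {
    uint64_t current;   /* C  */
    uint64_t bitmap;    /* B  */
    uint64_t history1;  /* H1 */
    uint64_t history2;  /* H2 */
};

/* Assumed meaning of "B = 0 (with rotate)": shift B into the history slots. */
static void clear_bitmap_uuid_with_rotate(struct uuid_set *u)
{
    u->history2 = u->history1;
    u->history1 = u->bitmap;
    u->bitmap = 0;
}

/* rule 3.4: I was SyncSource and missed the end of resync. */
static int rule_3_4_matches(const struct uuid_set *self,
                            const struct uuid_set *peer)
{
    return self->current == peer->current &&
           self->bitmap != 0 && peer->bitmap == 0 &&
           self->bitmap == peer->history1 &&
           self->history1 == peer->history2;
}

/* rule 3.5: I was SyncTarget and the peer missed the end of resync. */
static int rule_3_5_matches(const struct uuid_set *self,
                            const struct uuid_set *peer)
{
    return self->current == peer->current &&
           self->bitmap == 0 && peer->bitmap != 0 &&
           self->history1 == peer->bitmap &&
           self->history2 == peer->history1;
}
```

With the UUIDs from the 09:49:51 handshake, uml1's view satisfies rule 3.4 and uml2's view satisfies rule 3.5. After the rotate is applied to the SyncSource's set, both nodes hold identical UUID sets, and neither rule matches any more.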
    Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
    Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
drbd_receiver.c 121 KB