drbd_uuid_compare(): Handle loss of the last P_WRITE_ACK packet of a resync correctly.

(Caused missing resyncs) Bugz 246

Connection drop while transmitting the last ack: SyncSource loses the
connection, SyncTarget sees the end of resync.

Aug 18 08:39:42 uml1 drbd0: Handshake successful: Agreed network protocol version 90
Aug 18 08:39:42 uml1 drbd0: conn( WFConnection -> WFReportParams )
Aug 18 08:39:42 uml1 drbd0: drbd_sync_handshake:
Aug 18 08:39:42 uml1 drbd0: self 81DAF2FF6134FC1E:16EF5753AD5FA994:95B9E9AD329C137B:A4B1B25AC5927436 bits:4255 flags:0
Aug 18 08:39:42 uml1 drbd0: peer 16EF5753AD5FA994:0000000000000000:95B9E9AD329C137A:A4B1B25AC5927436 bits:0 flags:0
Aug 18 08:39:42 uml1 drbd0: uuid_compare()=1 by rule 70
Aug 18 08:39:42 uml1 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
Aug 18 08:39:42 uml1 drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Aug 18 08:39:42 uml1 drbd0: Began resync as SyncSource (will sync 17020 KB [4255 bits set]).
Aug 18 08:39:43 uml1 drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )

Aug 18 08:39:42 uml2 drbd0: Handshake successful: Agreed network protocol version 90
Aug 18 08:39:42 uml2 drbd0: conn( WFConnection -> WFReportParams )
Aug 18 08:39:42 uml2 drbd0: drbd_sync_handshake:
Aug 18 08:39:42 uml2 drbd0: self 16EF5753AD5FA994:0000000000000000:95B9E9AD329C137A:A4B1B25AC5927436 bits:0 flags:0
Aug 18 08:39:42 uml2 drbd0: peer 81DAF2FF6134FC1E:16EF5753AD5FA994:95B9E9AD329C137B:A4B1B25AC5927436 bits:4255 flags:0
Aug 18 08:39:42 uml2 drbd0: uuid_compare()=-1 by rule 50
Aug 18 08:39:42 uml2 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Aug 18 08:39:42 uml2 drbd0: conn( WFBitMapT -> WFSyncUUID )
Aug 18 08:39:42 uml2 drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Aug 18 08:39:43 uml2 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )

Only uml2 recognised the end of resync. At the next handshake:

Aug 18 09:49:51 uml1 drbd0: Handshake successful: Agreed network protocol version 90
Aug 18 09:49:51 uml1 drbd0: conn( WFConnection -> WFReportParams )
Aug 18 09:49:51 uml1 drbd0: drbd_sync_handshake:
Aug 18 09:49:51 uml1 drbd0: self 81DAF2FF6134FC1E:CB7A2BEB83B25C28:16EF5753AD5FA994:95B9E9AD329C137B bits:3 flags:0
Aug 18 09:49:51 uml1 drbd0: peer 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994 bits:0 flags:0
Aug 18 09:49:51 uml1 drbd0: uuid_compare()=0 by rule 40
Aug 18 09:49:51 uml1 drbd0: No resync, but 3 bits in bitmap!
Aug 18 09:49:51 uml1 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( Inconsistent -> UpToDate )

Aug 18 09:49:51 uml2 drbd0: Handshake successful: Agreed network protocol version 90
Aug 18 09:49:51 uml2 drbd0: conn( WFConnection -> WFReportParams )
Aug 18 09:49:51 uml2 drbd0: drbd_sync_handshake:
Aug 18 09:49:51 uml2 drbd0: self 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994 bits:0 flags:0
Aug 18 09:49:51 uml2 drbd0: peer 81DAF2FF6134FC1E:CB7A2BEB83B25C28:16EF5753AD5FA994:95B9E9AD329C137B bits:3 flags:0
Aug 18 09:49:51 uml2 drbd0: uuid_compare()=0 by rule 40
Aug 18 09:49:51 uml2 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )

=> "No resync, but 3 bits in bitmap!" message on uml1.

rule 3.4: If Cs = Cp & Bs != 0 & Bp = 0 & Bs = H1p & H1s = H2p
  => I have not realized the end of resync. I was SyncSource, the target
     saw the end of resync. Correct my UUIDs: Bs = 0 (with rotate)

rule 3.5: If Cs = Cp & Bs = 0 & Bp != 0 & H1s = Bp & H2s = H1p
  => Peer has not realized the end of resync. I was SyncTarget, the resync
     is actually done. Correct peer's UUIDs: Bp = 0 (with rotate)

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
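
Rules 3.4 and 3.5 can be sketched in C roughly as follows. This is a minimal illustration and not the code from this patch. The struct uuid_set layout and the names clear_bitmap_with_rotate(), apply_rule_3_4() and apply_rule_3_5() are hypothetical. Comparing the UUIDs with exact 64-bit equality, and reading "with rotate" as shifting the bitmap UUID into the history slots, are both assumptions.

```c
#include <stdint.h>

/* Hypothetical layout: C (current), B (bitmap), H1 and H2 (history). */
struct uuid_set {
    uint64_t current;
    uint64_t bitmap;
    uint64_t history1;
    uint64_t history2;
};

/* Assumed meaning of "with rotate": push B into the history, then clear B. */
static void clear_bitmap_with_rotate(struct uuid_set *u)
{
    u->history2 = u->history1;
    u->history1 = u->bitmap;
    u->bitmap = 0;
}

/*
 * Rule 3.4: Cs = Cp & Bs != 0 & Bp = 0 & Bs = H1p & H1s = H2p.
 * We were SyncSource and missed the end of resync; fix our own UUIDs.
 * Returns 1 if the rule matched and self was corrected, else 0.
 */
static int apply_rule_3_4(struct uuid_set *self, const struct uuid_set *peer)
{
    if (self->current == peer->current &&
        self->bitmap != 0 && peer->bitmap == 0 &&
        self->bitmap == peer->history1 &&
        self->history1 == peer->history2) {
        clear_bitmap_with_rotate(self);
        return 1;
    }
    return 0;
}

/*
 * Rule 3.5: Cs = Cp & Bs = 0 & Bp != 0 & H1s = Bp & H2s = H1p.
 * We were SyncTarget and the peer missed the end of resync; fix our
 * local copy of the peer's UUIDs.
 * Returns 1 if the rule matched and peer was corrected, else 0.
 */
static int apply_rule_3_5(const struct uuid_set *self, struct uuid_set *peer)
{
    if (self->current == peer->current &&
        self->bitmap == 0 && peer->bitmap != 0 &&
        self->history1 == peer->bitmap &&
        self->history2 == peer->history1) {
        clear_bitmap_with_rotate(peer);
        return 1;
    }
    return 0;
}
```

With the UUIDs from the 09:49:51 handshake above, rule 3.4 matches on uml1 and rule 3.5 matches on uml2. After the correction both sides hold the same set, 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994.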