1. 14 Sep, 2009 6 commits
  2. 09 Sep, 2009 10 commits
  3. 28 Aug, 2009 9 commits
  4. 19 Aug, 2009 15 commits
    • linux-next: drbd tree build failure · 83f2029c
      Stephen Rothwell authored
      Today's linux-next build (x86_64 allmodconfig) failed like this:
      
      drivers/block/drbd/drbd_nl.c: In function 'drbd_setup_queue_param':
      drivers/block/drbd/drbd_nl.c:707: error: implicit declaration of function 'blk_queue_stack_limits'
      
      Caused by commit 6dc986e736ca1e76a45d025a920f3a66855fc2aa ("block:
      Deprecate blk_queue_stack_limits") from the block tree.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
    • drbd_uuid_compare(): Handle loss of last P_WRITE_ACK packet of a resync right. (Caused missing resyncs) Bugz 246 · bc9ef10b
      Philipp Reisner authored
      
      Connection drop while transmitting last ack:
      SyncSource loses connection, SyncTarget sees the end of resync.
      
      Aug 18 08:39:42 uml1 drbd0: Handshake successful: Agreed network protocol version 90
      Aug 18 08:39:42 uml1 drbd0: conn( WFConnection -> WFReportParams )
      Aug 18 08:39:42 uml1 drbd0: drbd_sync_handshake:
      Aug 18 08:39:42 uml1 drbd0: self 81DAF2FF6134FC1E:16EF5753AD5FA994:95B9E9AD329C137B:A4B1B25AC5927436 bits:4255 flags:0
      Aug 18 08:39:42 uml1 drbd0: peer 16EF5753AD5FA994:0000000000000000:95B9E9AD329C137A:A4B1B25AC5927436 bits:0 flags:0
      Aug 18 08:39:42 uml1 drbd0: uuid_compare()=1 by rule 70
      Aug 18 08:39:42 uml1 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
      Aug 18 08:39:42 uml1 drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
      Aug 18 08:39:42 uml1 drbd0: Began resync as SyncSource (will sync 17020 KB [4255 bits set]).
      Aug 18 08:39:43 uml1 drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )
      
      Aug 18 08:39:42 uml2 drbd0: Handshake successful: Agreed network protocol version 90
      Aug 18 08:39:42 uml2 drbd0: conn( WFConnection -> WFReportParams )
      Aug 18 08:39:42 uml2 drbd0: drbd_sync_handshake:
      Aug 18 08:39:42 uml2 drbd0: self 16EF5753AD5FA994:0000000000000000:95B9E9AD329C137A:A4B1B25AC5927436 bits:0 flags:0
      Aug 18 08:39:42 uml2 drbd0: peer 81DAF2FF6134FC1E:16EF5753AD5FA994:95B9E9AD329C137B:A4B1B25AC5927436 bits:4255 flags:0
      Aug 18 08:39:42 uml2 drbd0: uuid_compare()=-1 by rule 50
      Aug 18 08:39:42 uml2 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
      Aug 18 08:39:42 uml2 drbd0: conn( WFBitMapT -> WFSyncUUID )
      Aug 18 08:39:42 uml2 drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
      Aug 18 08:39:43 uml2 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
      
      Only uml2 recognised the end of resync.
      
      Aug 18 09:49:51 uml1 drbd0: Handshake successful: Agreed network protocol version 90
      Aug 18 09:49:51 uml1 drbd0: conn( WFConnection -> WFReportParams )
      Aug 18 09:49:51 uml1 drbd0: drbd_sync_handshake:
      Aug 18 09:49:51 uml1 drbd0: self 81DAF2FF6134FC1E:CB7A2BEB83B25C28:16EF5753AD5FA994:95B9E9AD329C137B bits:3 flags:0
      Aug 18 09:49:51 uml1 drbd0: peer 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994 bits:0 flags:0
      Aug 18 09:49:51 uml1 drbd0: uuid_compare()=0 by rule 40
      Aug 18 09:49:51 uml1 drbd0: No resync, but 3 bits in bitmap!
      Aug 18 09:49:51 uml1 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( Inconsistent -> UpToDate )
      
      Aug 18 09:49:51 uml2 drbd0: Handshake successful: Agreed network protocol version 90
      Aug 18 09:49:51 uml2 drbd0: conn( WFConnection -> WFReportParams )
      Aug 18 09:49:51 uml2 drbd0: drbd_sync_handshake:
      Aug 18 09:49:51 uml2 drbd0: self 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994 bits:0 flags:0
      Aug 18 09:49:51 uml2 drbd0: peer 81DAF2FF6134FC1E:CB7A2BEB83B25C28:16EF5753AD5FA994:95B9E9AD329C137B bits:3 flags:0
      Aug 18 09:49:51 uml2 drbd0: uuid_compare()=0 by rule 40
      Aug 18 09:49:51 uml2 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
      
      => No resync, but 3 bits in bitmap! message on uml1.
      
      rule 3.4:
        If Cs = Cp & Bs != 0 & Bp = 0 & Bs = H1p & H1s = H2p
       => I have not realized end of resync. I was SyncSource, target saw the end of resync.
      
          Correct my UUIDs: Bs = 0 (with rotate)
      
      rule 3.5:
        If Cs = Cp & Bs = 0 & Bp != 0 & H1s = Bp & H2s = H1p
       => Peer has not realized end of resync. I was SyncTarget, resync is actually done.
      
          Correct peer's UUIDS: Bp = 0 (with rotate)
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
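Rules 3.4 and 3.5 from the commit above are mirror images of each other, which a small sketch may make easier to see. This is plain user-space C, not the driver's code: `struct uuid_set`, `uuid_eq`, `clear_bitmap_uuid`, `rule_3_4` and `rule_3_5` are hypothetical names. Two things are assumptions: that "with rotate" means shifting the bitmap UUID into the history slots, and that the lowest bit is ignored when comparing (log values elsewhere in this history differ only in that bit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for one node's generation UUIDs, in the order
 * the log lines print them: current, bitmap, history 1, history 2. */
struct uuid_set {
    uint64_t c, b, h1, h2;
};

/* Assumption: the lowest bit is a flag and is ignored when comparing. */
bool uuid_eq(uint64_t x, uint64_t y)
{
    return (x & ~UINT64_C(1)) == (y & ~UINT64_C(1));
}

/* Assumed meaning of "B = 0 (with rotate)": push the bitmap UUID into
 * the history slots, dropping the oldest history entry. */
void clear_bitmap_uuid(struct uuid_set *u)
{
    u->h2 = u->h1;
    u->h1 = u->b;
    u->b = 0;
}

/* Rule 3.4: I was SyncSource and missed the end of resync.
 * On a match, corrects my own UUIDs and returns true. */
bool rule_3_4(struct uuid_set *self, const struct uuid_set *peer)
{
    if (uuid_eq(self->c, peer->c) && self->b != 0 && peer->b == 0 &&
        uuid_eq(self->b, peer->h1) && uuid_eq(self->h1, peer->h2)) {
        clear_bitmap_uuid(self);
        return true;
    }
    return false;
}

/* Rule 3.5: the peer was SyncSource and missed the end of resync.
 * On a match, corrects my copy of the peer's UUIDs and returns true. */
bool rule_3_5(const struct uuid_set *self, struct uuid_set *peer)
{
    if (uuid_eq(self->c, peer->c) && self->b == 0 && peer->b != 0 &&
        uuid_eq(self->h1, peer->b) && uuid_eq(self->h2, peer->h1)) {
        clear_bitmap_uuid(peer);
        return true;
    }
    return false;
}
```

Fed with the uml1 values from the 09:49:51 handshake, `rule_3_4` matches and leaves uml1's set equal to what uml2 reported: B = 0, H1 = CB7A2BEB83B25C28, H2 = 16EF5753AD5FA994.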
    • fix theoretical imbalance in ldev refcount · 8089af76
      Lars Ellenberg authored
      never observed, unlikely to be real. Still a coding bug.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
    • drbd_uuid_compare(): Also undo the changes of the last unsuccessful start of resync on the peer's UUIDs · 794112a0
      Philipp Reisner authored
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
    • drbd_uuid_compare(): Do not full sync in case a P_SYNC_UUID packet gets lost. · 5600d7d4
      Philipp Reisner authored
      Additional errata to
      http://www.drbd.org/fileadmin/drbd/publications/drbd8.pdf
      
      5 Data Generation UUIDs
      
      Algorithm:
      
      Rule 5a:
      Cs = H1p & H1s = H2p  ==>
        Connection was lost before SyncUUID Packet came through. Become Sync target.
      
      Rule 7a:
      Cp = H1s & H1p = H2s  ==>
        Connection was lost before SyncUUID Packet came through. Correct my UUIDs:
        B  = H1
        H1 = H2
        H2 = 0
        Become Sync source.
      
      Here are the relevant log lines, showing the issue:
      
      Aug 12 16:20:48 garcon1 kernel: [4941237.376998] drbd10: self 68CFBC7C5C0E6D4F:1C4BD1C2E7B77E75:F1812682E82B178A:3E381E47943D1E4B bits:85 flags:0
      Aug 12 16:20:48 garcon1 kernel: [4941237.377003] drbd10: peer 1C4BD1C2E7B77E74:0000000000000000:F1812682E82B178A:3E381E47943D1E4B bits:0 flags:0
      Aug 12 16:20:48 garcon1 kernel: [4941237.377006] drbd10: uuid_compare()=1 by rule 7
      Aug 12 16:20:48 garcon1 kernel: [4941237.378131] drbd10: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
      Aug 12 16:20:50 garcon1 kernel: [4941238.911877] drbd10: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
      Aug 12 16:20:50 garcon1 kernel: [4941238.911889] drbd10: Began resync as SyncSource (will sync 340 KB [85 bits set]).
      Aug 12 16:21:10 garcon1 kernel: [4941259.500053] drbd10: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )
      
      Aug 12 07:20:48 portman kernel: [11341861.890379] drbd10: self 1C4BD1C2E7B77E74:0000000000000000:F1812682E82B178A:3E381E47943D1E4B bits:0 flags:0
      Aug 12 07:20:48 portman kernel: [11341861.890379] drbd10: peer 68CFBC7C5C0E6D4F:1C4BD1C2E7B77E75:F1812682E82B178A:3E381E47943D1E4B bits:85 flags:0
      Aug 12 07:20:48 portman kernel: [11341861.890379] drbd10: uuid_compare()=-1 by rule 5
      Aug 12 07:20:48 portman kernel: [11341861.890379] drbd10: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
      Aug 12 07:20:48 portman kernel: [11341862.146623] drbd10: conn( WFBitMapT -> WFSyncUUID )
      Aug 12 07:21:10 portman kernel: [11341884.923517] drbd10: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
      
      Aug 12 16:21:34 garcon1 kernel: [4941283.720095] drbd10: self 68CFBC7C5C0E6D4F:6AA2B905AD177D20:1C4BD1C2E7B77E75:F1812682E82B178A bits:91 flags:0
      Aug 12 16:21:34 garcon1 kernel: [4941283.720099] drbd10: peer 1C4BD1C2E7B77E74:0000000000000000:F1812682E82B178A:3E381E47943D1E4B bits:85 flags:0
      Aug 12 16:21:34 garcon1 kernel: [4941283.720102] drbd10: uuid_compare()=2 by rule 8
      Aug 12 16:21:34 garcon1 kernel: [4941283.720105] drbd10: Writing the whole bitmap, full sync required after drbd_sync_handshake.
      
      Aug 12 07:22:08 portman kernel: [11341943.908341] drbd10: self 1C4BD1C2E7B77E74:0000000000000000:F1812682E82B178A:3E381E47943D1E4B bits:85 flags:0
      Aug 12 07:22:08 portman kernel: [11341943.908390] drbd10: peer 68CFBC7C5C0E6D4F:6AA2B905AD177D20:1C4BD1C2E7B77E75:F1812682E82B178A bits:7864320 flags:0
      Aug 12 07:22:08 portman kernel: [11341943.908437] drbd10: uuid_compare()=-2 by rule 6
      Aug 12 07:22:08 portman kernel: [11341943.908461] drbd10: Writing the whole bitmap, full sync required after drbd_sync_handshake.
      Aug 12 07:22:08 portman kernel: [11341943.925199] drbd10: 30 GB (7864320 bits) marked out-of-sync by on disk bit-map.
      Aug 12 07:22:08 portman kernel: [11341943.925199] drbd10: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
      Aug 12 07:22:09 portman kernel: [11341944.896914] drbd10: conn( WFBitMapT -> WFSyncUUID )
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
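Rules 5a and 7a from the commit above can be sketched the same way. This is a hedged illustration in user-space C rather than the driver's implementation. `struct uuid_set`, `enum sync_hint`, `uuid_eq` and `lost_sync_uuid_rules` are hypothetical names. The two rules are evaluated in isolation, without the earlier rules of the algorithm, and ignoring the lowest bit is an assumption drawn from the logs (1C4BD1C2E7B77E75 and 1C4BD1C2E7B77E74 differ only there).

```c
#include <stdint.h>

/* Hypothetical container: current, bitmap, history 1, history 2. */
struct uuid_set {
    uint64_t c, b, h1, h2;
};

/* Hypothetical result type for this sketch only. */
enum sync_hint { NO_MATCH = 0, BECOME_SYNC_TARGET, BECOME_SYNC_SOURCE };

/* Assumption: the lowest bit is a flag and is ignored when comparing. */
static int uuid_eq(uint64_t x, uint64_t y)
{
    return (x & ~UINT64_C(1)) == (y & ~UINT64_C(1));
}

enum sync_hint lost_sync_uuid_rules(struct uuid_set *self,
                                    const struct uuid_set *peer)
{
    /* Rule 5a: Cs = H1p and H1s = H2p.
     * The SyncUUID packet never reached me; become sync target. */
    if (uuid_eq(self->c, peer->h1) && uuid_eq(self->h1, peer->h2))
        return BECOME_SYNC_TARGET;

    /* Rule 7a: Cp = H1s and H1p = H2s.
     * The SyncUUID packet never reached the peer; undo my UUID
     * rotation (B = H1, H1 = H2, H2 = 0) and become sync source. */
    if (uuid_eq(peer->c, self->h1) && uuid_eq(peer->h1, self->h2)) {
        self->b = self->h1;
        self->h1 = self->h2;
        self->h2 = 0;
        return BECOME_SYNC_SOURCE;
    }

    return NO_MATCH;
}
```

With the UUIDs from the 16:21:34 handshake, garcon1 matches rule 7a and ends up with B = 1C4BD1C2E7B77E75 again, while portman matches rule 5a against garcon1's reported values.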
    • update comment: you must not hold the req_lock in drbd_free_ee · 07e7f673
      Lars Ellenberg authored
      You do not NEED to hold the req_lock since
      107a7ca67f9fffd0c0c94018f5a1f61a0afe7bf8 (2005-03-15).
      Since the drbd_pp_lock is not spin_lock_irqsave anymore, you MUST NOT
      hold the req_lock when trying to acquire the drbd_pp_lock.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
    • move drbd_free_ee outside of req_lock · ebfcb47b
      Lars Ellenberg authored
      Fixes a recently introduced potential spinlock deadlock.
      Reduces the size of various critical sections.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
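The general pattern behind the commit above, detaching entries while holding a lock and freeing them only after it is dropped, can be sketched in user space. This is an illustration with a POSIX mutex, not kernel code. `struct entry`, `done_list`, `queue_done`, `free_entry` and `reclaim_done` are hypothetical stand-ins, and the `req_lock` here only borrows the name from the commit message.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object that must be freed later. */
struct entry {
    struct entry *next;
};

/* User-space stand-in for the lock named in the commit message. */
static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *done_list;   /* protected by req_lock */

/* Queue a finished entry; only the list manipulation is locked. */
void queue_done(struct entry *e)
{
    pthread_mutex_lock(&req_lock);
    e->next = done_list;
    done_list = e;
    pthread_mutex_unlock(&req_lock);
}

/* Stand-in for the free routine. If it had to take a second lock,
 * calling it under req_lock would impose a lock order on every caller. */
static void free_entry(struct entry *e)
{
    free(e);
}

/* Detach the whole list under the lock, then free outside of it.
 * Returns the number of entries freed. */
int reclaim_done(void)
{
    struct entry *batch, *next;
    int n = 0;

    pthread_mutex_lock(&req_lock);
    batch = done_list;            /* short critical section: two stores */
    done_list = NULL;
    pthread_mutex_unlock(&req_lock);

    for (; batch; batch = next) {
        next = batch->next;       /* read before the entry is freed */
        free_entry(batch);
        n++;
    }
    return n;
}
```

The critical section shrinks to a pointer swap, and whatever the free routine needs to lock is never nested inside `req_lock`.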
    • don't hardcode the timeout for the handshake packet · 1cec97ee
      Lars Ellenberg authored
      2 seconds may be too short for high-latency, long-distance links with moderate packet loss
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>