- 14 Sep, 2009 6 commits
-
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This may have led to spurious "IO Errors" on a receiving node, when barriers are not disabled, and not supported by the lower level device.
Regression was introduced by:
commit a2ad6507a0b5a2dc1f5a6a823b9a28d9c4430200
Author: Lars Ellenberg <lars.ellenberg@linbit.com>
Date: Thu Aug 27 20:40:39 2009 +0200
fix potential endless retry loop on IO error
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
- 09 Sep, 2009 10 commits
-
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
That is necessary to detect the loss of connection before doing the after resync UUID modifications. The bug was usually triggered by failing before-resync-target handlers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
no need to wait for the network stack to get its act together Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
fix for bug#252 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
- 28 Aug, 2009 9 commits
-
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Otherwise, if we do a very quick detach/attach, and the after-state-change send_state() of the detach got delayed for some reason, it may be processed after the new diskless->attaching state change, probably confusing the peer. Paranoia fix, not yet observed in real life. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
On barrier request on receiving node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Because _drbd_clear_done_ee() was just an ancient special case of drbd_process_done_ee(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Unfortunately the user now needs to give the dead-time of their cluster stack as an argument to the crm-fence-peer.sh script. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Besides removing some lines of code, this also removes a deadlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Florian Haas authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
- 19 Aug, 2009 15 commits
-
-
Stephen Rothwell authored
Today's linux-next build (x86_64 allmodconfig) failed like this:
drivers/block/drbd/drbd_nl.c: In function 'drbd_setup_queue_param':
drivers/block/drbd/drbd_nl.c:707: error: implicit declaration of function 'blk_queue_stack_limits'
Caused by commit 6dc986e736ca1e76a45d025a920f3a66855fc2aa ("block: Deprecate blk_queue_stack_limits") from the block tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
drbd_uuid_compare(): Correctly handle loss of the last P_WRITE_ACK packet of a resync. (Caused missing resyncs.)
Bugz 246
Connection drop while transmitting last ack: SyncSource loses connection, SyncTarget sees the end of resync.
Aug 18 08:39:42 uml1 drbd0: Handshake successful: Agreed network protocol version 90
Aug 18 08:39:42 uml1 drbd0: conn( WFConnection -> WFReportParams )
Aug 18 08:39:42 uml1 drbd0: drbd_sync_handshake:
Aug 18 08:39:42 uml1 drbd0: self 81DAF2FF6134FC1E:16EF5753AD5FA994:95B9E9AD329C137B:A4B1B25AC5927436 bits:4255 flags:0
Aug 18 08:39:42 uml1 drbd0: peer 16EF5753AD5FA994:0000000000000000:95B9E9AD329C137A:A4B1B25AC5927436 bits:0 flags:0
Aug 18 08:39:42 uml1 drbd0: uuid_compare()=1 by rule 70
Aug 18 08:39:42 uml1 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
Aug 18 08:39:42 uml1 drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Aug 18 08:39:42 uml1 drbd0: Began resync as SyncSource (will sync 17020 KB [4255 bits set]).
Aug 18 08:39:43 uml1 drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )
Aug 18 08:39:42 uml2 drbd0: Handshake successful: Agreed network protocol version 90
Aug 18 08:39:42 uml2 drbd0: conn( WFConnection -> WFReportParams )
Aug 18 08:39:42 uml2 drbd0: drbd_sync_handshake:
Aug 18 08:39:42 uml2 drbd0: self 16EF5753AD5FA994:0000000000000000:95B9E9AD329C137A:A4B1B25AC5927436 bits:0 flags:0
Aug 18 08:39:42 uml2 drbd0: peer 81DAF2FF6134FC1E:16EF5753AD5FA994:95B9E9AD329C137B:A4B1B25AC5927436 bits:4255 flags:0
Aug 18 08:39:42 uml2 drbd0: uuid_compare()=-1 by rule 50
Aug 18 08:39:42 uml2 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Aug 18 08:39:42 uml2 drbd0: conn( WFBitMapT -> WFSyncUUID )
Aug 18 08:39:42 uml2 drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Aug 18 08:39:43 uml2 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Only uml2 recognised the end of resync.
Aug 18 09:49:51 uml1 drbd0: Handshake successful: Agreed network protocol version 90
Aug 18 09:49:51 uml1 drbd0: conn( WFConnection -> WFReportParams )
Aug 18 09:49:51 uml1 drbd0: drbd_sync_handshake:
Aug 18 09:49:51 uml1 drbd0: self 81DAF2FF6134FC1E:CB7A2BEB83B25C28:16EF5753AD5FA994:95B9E9AD329C137B bits:3 flags:0
Aug 18 09:49:51 uml1 drbd0: peer 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994 bits:0 flags:0
Aug 18 09:49:51 uml1 drbd0: uuid_compare()=0 by rule 40
Aug 18 09:49:51 uml1 drbd0: No resync, but 3 bits in bitmap!
Aug 18 09:49:51 uml1 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( Inconsistent -> UpToDate )
Aug 18 09:49:51 uml2 drbd0: Handshake successful: Agreed network protocol version 90
Aug 18 09:49:51 uml2 drbd0: conn( WFConnection -> WFReportParams )
Aug 18 09:49:51 uml2 drbd0: drbd_sync_handshake:
Aug 18 09:49:51 uml2 drbd0: self 81DAF2FF6134FC1E:0000000000000000:CB7A2BEB83B25C28:16EF5753AD5FA994 bits:0 flags:0
Aug 18 09:49:51 uml2 drbd0: peer 81DAF2FF6134FC1E:CB7A2BEB83B25C28:16EF5753AD5FA994:95B9E9AD329C137B bits:3 flags:0
Aug 18 09:49:51 uml2 drbd0: uuid_compare()=0 by rule 40
Aug 18 09:49:51 uml2 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
=> "No resync, but 3 bits in bitmap!" message on uml1.
rule 3.4: If Cs = Cp & Bs != 0 & Bp = 0 & Bs = H1p & H1s = H2p => I have not realized end of resync. I was SyncSource, target saw the end of resync. Correct my UUIDs: Bs = 0 (with rotate)
rule 3.5: If Cs = Cp & Bs = 0 & Bp != 0 & H1s = Bp & H2s = H1p => Peer has not realized end of resync. I was SyncTarget, resync is actually done. Correct peer's UUIDs: Bp = 0 (with rotate)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
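Rules 3.4 and 3.5 above can be sketched roughly as follows. This is an illustration in Python, not the kernel code: the names Uuids, same_generation, missed_end_of_resync and clear_bitmap_with_rotate are hypothetical. Two points are assumptions: UUIDs are compared with the lowest bit masked off (the logged values in this series sometimes differ only in that bit), and "with rotate" is read as shifting the bitmap UUID into the first history slot.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the lowest bit is a flag and is ignored when comparing UUIDs.
LOW_BIT_MASK = 0xFFFFFFFFFFFFFFFE


@dataclass(frozen=True)
class Uuids:
    """Hypothetical container for the four UUIDs printed in the logs."""
    current: int   # C
    bitmap: int    # B
    history1: int  # H1
    history2: int  # H2


def same_generation(a: int, b: int) -> bool:
    """Compare two UUIDs, ignoring the lowest bit (an assumption)."""
    return (a & LOW_BIT_MASK) == (b & LOW_BIT_MASK)


def missed_end_of_resync(me: Uuids, peer: Uuids) -> Optional[str]:
    """Return "self" if rule 3.4 matches, "peer" if rule 3.5 matches, else None."""
    if not same_generation(me.current, peer.current):
        return None
    # Rule 3.4: Bs != 0 & Bp = 0 & Bs = H1p & H1s = H2p
    if (me.bitmap != 0 and peer.bitmap == 0
            and same_generation(me.bitmap, peer.history1)
            and same_generation(me.history1, peer.history2)):
        return "self"
    # Rule 3.5: Bs = 0 & Bp != 0 & H1s = Bp & H2s = H1p
    if (me.bitmap == 0 and peer.bitmap != 0
            and same_generation(me.history1, peer.bitmap)
            and same_generation(me.history2, peer.history1)):
        return "peer"
    return None


def clear_bitmap_with_rotate(u: Uuids) -> Uuids:
    """Assumed meaning of "B = 0 (with rotate)": H2 <- H1, H1 <- B, B <- 0."""
    return Uuids(u.current, 0, u.bitmap, u.history1)
```

Fed with the UUIDs from the 09:49:51 handshake above, the uml1 side would match rule 3.4 and the uml2 side rule 3.5; under the assumed rotation, the corrected uml1 UUIDs come out identical to the ones uml2 reported.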
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
never observed, unlikely to be real. Still a coding bug. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
drbd_uuid_compare(): Also undo the changes of the last unsuccessful start of resync on the peer's UUIDs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Additional errata to http://www.drbd.org/fileadmin/drbd/publications/drbd8.pdf
5 Data Generation UUIDs
Algorithm:
Rule 5a: Cs = H1p & H1s = H2p ==> Connection was lost before SyncUUID Packet came through. Become Sync target.
Rule 7a: Cp = H1s & H1p = H2s ==> Connection was lost before SyncUUID Packet came through. Correct my UUIDs: B = H1 H1 = H2 H2 = 0 Become Sync source.
Here are the relevant log lines, showing the issue:
Aug 12 16:20:48 garcon1 kernel: [4941237.376998] drbd10: self 68CFBC7C5C0E6D4F:1C4BD1C2E7B77E75:F1812682E82B178A:3E381E47943D1E4B bits:85 flags:0
Aug 12 16:20:48 garcon1 kernel: [4941237.377003] drbd10: peer 1C4BD1C2E7B77E74:0000000000000000:F1812682E82B178A:3E381E47943D1E4B bits:0 flags:0
Aug 12 16:20:48 garcon1 kernel: [4941237.377006] drbd10: uuid_compare()=1 by rule 7
Aug 12 16:20:48 garcon1 kernel: [4941237.378131] drbd10: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Aug 12 16:20:50 garcon1 kernel: [4941238.911877] drbd10: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Aug 12 16:20:50 garcon1 kernel: [4941238.911889] drbd10: Began resync as SyncSource (will sync 340 KB [85 bits set]).
Aug 12 16:21:10 garcon1 kernel: [4941259.500053] drbd10: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )
Aug 12 07:20:48 portman kernel: [11341861.890379] drbd10: self 1C4BD1C2E7B77E74:0000000000000000:F1812682E82B178A:3E381E47943D1E4B bits:0 flags:0
Aug 12 07:20:48 portman kernel: [11341861.890379] drbd10: peer 68CFBC7C5C0E6D4F:1C4BD1C2E7B77E75:F1812682E82B178A:3E381E47943D1E4B bits:85 flags:0
Aug 12 07:20:48 portman kernel: [11341861.890379] drbd10: uuid_compare()=-1 by rule 5
Aug 12 07:20:48 portman kernel: [11341861.890379] drbd10: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Aug 12 07:20:48 portman kernel: [11341862.146623] drbd10: conn( WFBitMapT -> WFSyncUUID )
Aug 12 07:21:10 portman kernel: [11341884.923517] drbd10: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 12 16:21:34 garcon1 kernel: [4941283.720095] drbd10: self 68CFBC7C5C0E6D4F:6AA2B905AD177D20:1C4BD1C2E7B77E75:F1812682E82B178A bits:91 flags:0
Aug 12 16:21:34 garcon1 kernel: [4941283.720099] drbd10: peer 1C4BD1C2E7B77E74:0000000000000000:F1812682E82B178A:3E381E47943D1E4B bits:85 flags:0
Aug 12 16:21:34 garcon1 kernel: [4941283.720102] drbd10: uuid_compare()=2 by rule 8
Aug 12 16:21:34 garcon1 kernel: [4941283.720105] drbd10: Writing the whole bitmap, full sync required after drbd_sync_handshake.
Aug 12 07:22:08 portman kernel: [11341943.908341] drbd10: self 1C4BD1C2E7B77E74:0000000000000000:F1812682E82B178A:3E381E47943D1E4B bits:85 flags:0
Aug 12 07:22:08 portman kernel: [11341943.908390] drbd10: peer 68CFBC7C5C0E6D4F:6AA2B905AD177D20:1C4BD1C2E7B77E75:F1812682E82B178A bits:7864320 flags:0
Aug 12 07:22:08 portman kernel: [11341943.908437] drbd10: uuid_compare()=-2 by rule 6
Aug 12 07:22:08 portman kernel: [11341943.908461] drbd10: Writing the whole bitmap, full sync required after drbd_sync_handshake.
Aug 12 07:22:08 portman kernel: [11341943.925199] drbd10: 30 GB (7864320 bits) marked out-of-sync by on disk bit-map.
Aug 12 07:22:08 portman kernel: [11341943.925199] drbd10: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Aug 12 07:22:09 portman kernel: [11341944.896914] drbd10: conn( WFBitMapT -> WFSyncUUID )
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
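Rules 5a and 7a above can be sketched as follows, again as a Python illustration and not the kernel implementation. The names Uuids, same_generation, rule_5a, rule_7a and undo_failed_resync_start are hypothetical. Masking the lowest bit before comparing is an assumption, suggested by the logged values 1C4BD1C2E7B77E74 and 1C4BD1C2E7B77E75, which differ only in that bit.

```python
from dataclasses import dataclass, replace

# Assumption: the lowest bit is a flag and is ignored when comparing UUIDs.
LOW_BIT_MASK = 0xFFFFFFFFFFFFFFFE


@dataclass(frozen=True)
class Uuids:
    """Hypothetical container for the four UUIDs printed in the logs."""
    current: int   # C
    bitmap: int    # B
    history1: int  # H1
    history2: int  # H2


def same_generation(a: int, b: int) -> bool:
    """Compare two UUIDs, ignoring the lowest bit (an assumption)."""
    return (a & LOW_BIT_MASK) == (b & LOW_BIT_MASK)


def rule_5a(me: Uuids, peer: Uuids) -> bool:
    """Cs = H1p & H1s = H2p: I should become sync target."""
    return (same_generation(me.current, peer.history1)
            and same_generation(me.history1, peer.history2))


def rule_7a(me: Uuids, peer: Uuids) -> bool:
    """Cp = H1s & H1p = H2s: I should become sync source."""
    return (same_generation(peer.current, me.history1)
            and same_generation(peer.history1, me.history2))


def undo_failed_resync_start(me: Uuids) -> Uuids:
    """Apply the rule 7a correction: B = H1, H1 = H2, H2 = 0."""
    return replace(me, bitmap=me.history1, history1=me.history2, history2=0)
```

With the UUIDs from the second handshake in the log (16:21:34 on garcon1, 07:22:08 on portman), garcon1 would match rule 7a and portman rule 5a. After the correction, garcon1's bitmap UUID again matches portman's current UUID, which is the situation the first handshake resolved with a bitmap-based resync, not a full sync.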
-
Lars Ellenberg authored
You do not NEED to hold the req_lock since 107a7ca67f9fffd0c0c94018f5a1f61a0afe7bf8 (2005-03-15). Since the drbd_pp_lock is not spin_lock_irqsave anymore, you MUST NOT hold the req_lock when trying to acquire the drbd_pp_lock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Fixes a recently introduced potential spinlock deadlock. Reduces the size of various critical sections. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
2 seconds may be too short for high-latency, long-distance links with medium packet loss. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-