- 26 Apr, 2007 28 commits
-
-
Mark Fasheh authored
The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when metadata block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no unnecessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
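As a rough illustration of the kind of structure such a scheme implies (not the actual ocfs2 code), here is a minimal sketch of a fixed-size "last few extents seen" cache in plain C; small_extent_cache and its helpers are hypothetical names:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical, simplified stand-in for a cached extent. */
    struct cached_extent {
        uint32_t c_cpos;        /* first virtual cluster covered */
        uint32_t c_clusters;    /* length in clusters */
        uint64_t c_blkno;       /* physical start block */
    };

    #define NR_CACHED_EXTENTS 3

    /* Keep only the last few extents seen; newest entry lives in slot 0. */
    struct small_extent_cache {
        struct cached_extent ents[NR_CACHED_EXTENTS];
        int nr;                 /* how many slots are valid */
    };

    /* Record an extent we just looked up, evicting the oldest one if full. */
    static void extent_cache_insert(struct small_extent_cache *ec,
                                    const struct cached_extent *ins)
    {
        memmove(&ec->ents[1], &ec->ents[0],
                (NR_CACHED_EXTENTS - 1) * sizeof(ec->ents[0]));
        ec->ents[0] = *ins;
        if (ec->nr < NR_CACHED_EXTENTS)
            ec->nr++;
    }

    /* Return the cached extent containing virtual cluster 'cpos', or NULL. */
    static struct cached_extent *extent_cache_lookup(struct small_extent_cache *ec,
                                                     uint32_t cpos)
    {
        int i;

        for (i = 0; i < ec->nr; i++) {
            struct cached_extent *ce = &ec->ents[i];

            if (cpos >= ce->c_cpos && cpos < ce->c_cpos + ce->c_clusters)
                return ce;
        }
        return (void *)0;
    }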
-
Mark Fasheh authored
Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Initially, we had wired things up to return a size of '1' for holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
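Schematically, and assuming a sorted in-memory list of allocated extents (a simplification of the real on-disk tree), the hole-size calculation amounts to something like this sketch:

    #include <stdint.h>

    struct extent_rec {
        uint32_t e_cpos;        /* first virtual cluster of this extent */
        uint32_t e_clusters;    /* extent length in clusters */
    };

    /*
     * Given extents sorted by e_cpos and a virtual cluster 'v_cluster' known
     * to sit in a hole, return the hole length in clusters: the distance to
     * the next allocated extent, or total_clusters - v_cluster if there is
     * no further allocation.
     */
    static uint32_t hole_clusters(const struct extent_rec *recs, int nr_recs,
                                  uint32_t v_cluster, uint32_t total_clusters)
    {
        int i;

        for (i = 0; i < nr_recs; i++) {
            if (recs[i].e_cpos > v_cluster)
                return recs[i].e_cpos - v_cluster;
        }
        return total_clusters - v_cluster;
    }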
-
Mark Fasheh authored
Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up so that leaf nodes can get a flags field where we can mark unwritten extents. Interior nodes, whose length covers all the child nodes beneath them, can't split their e_clusters field, so we use a union to preserve sizing there. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
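A hypothetical C layout in the spirit of that description, with a union keeping leaf and interior records the same size (field names only loosely resemble the real ocfs2 structures):

    #include <stdint.h>
    #include <assert.h>

    #define EXTENT_FLAG_UNWRITTEN 0x01

    /* Hypothetical on-disk-style extent record, modeled on the description. */
    struct extent_rec {
        uint32_t e_cpos;                  /* first virtual cluster */
        union {
            uint32_t e_int_clusters;      /* interior node: full 32 bits */
            struct {
                uint16_t e_leaf_clusters; /* leaf node: fits in 16 bits */
                uint8_t  e_flags;         /* e.g. "unwritten" marker */
                uint8_t  e_reserved;
            };
        };
        uint64_t e_blkno;                 /* physical start block */
    };

    int main(void)
    {
        /* The union keeps leaf and interior records the same size. */
        assert(sizeof(struct extent_rec) == 16);
        return 0;
    }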
-
Mark Fasheh authored
We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Do this instead of filemap_fdatawrite() - this way we sync only the range between i_size and the cluster boundary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
do_sync_file_range() accepts a file * from which it takes an address_space to sync. Abstract out the bulk of the function into do_sync_mapping_range() which takes the address_space directly. This way callers who want to sync an address_space directly can take advantage of the functionality provided. do_sync_file_range() is preserved as a small wrapper around do_sync_mapping_range(). Ocfs2 in particular would like to use this to initiate a sync of a specific inode range during truncate, where a file * may not be available. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
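A userspace analogue of the refactor's shape, with hypothetical names standing in for the kernel types (struct handle for the file *, struct store for the address_space): the old entry point becomes a thin wrapper that extracts the underlying object and delegates.

    #include <stdio.h>

    struct store  { const char *name; };    /* stands in for address_space */
    struct handle { struct store *store; }; /* stands in for struct file   */

    /* Core: works on the backing store directly, no handle needed. */
    static int sync_store_range(struct store *s, long off, long end, unsigned flags)
    {
        printf("sync %s: [%ld, %ld) flags=%#x\n", s->name, off, end, flags);
        return 0;
    }

    /* Old entry point becomes a small wrapper that just extracts the store. */
    static int sync_handle_range(struct handle *h, long off, long end, unsigned flags)
    {
        return sync_store_range(h->store, off, end, flags);
    }

    int main(void)
    {
        struct store st = { "inode-42" };
        struct handle fh = { &st };

        sync_handle_range(&fh, 0, 4096, 0x1);  /* via the file-style handle */
        sync_store_range(&st, 0, 4096, 0x1);   /* or on the store directly  */
        return 0;
    }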
-
Mark Fasheh authored
Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and the end of its cluster. Otherwise a subsequent extend could expose bad data. This introduces a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
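The arithmetic implied by "zero from i_size to the end of its cluster" can be sketched as follows, assuming a power-of-two cluster size (a simplification, not the ocfs2 helper itself):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Range that needs zeroing after a truncate when blocks are no longer
     * zeroed on extend: from the new i_size up to the end of the cluster
     * that contains it.
     */
    static void zero_tail_range(uint64_t i_size, uint32_t cluster_size,
                                uint64_t *start, uint64_t *len)
    {
        uint64_t cluster_end = (i_size + cluster_size - 1) &
                               ~((uint64_t)cluster_size - 1);

        *start = i_size;
        *len = cluster_end - i_size;    /* 0 if i_size is cluster-aligned */
    }

    int main(void)
    {
        uint64_t start, len;

        zero_tail_range(100000, 65536, &start, &len);
        printf("zero [%llu, +%llu)\n",
               (unsigned long long)start, (unsigned long long)len);
        return 0;
    }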
-
Mark Fasheh authored
ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
This will be turned back on once we can do allocation in ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Right now, file allocation for ocfs2 is done within ocfs2_extend_file(), which is either called from ->setattr() (for an i_size change), or at the top of ocfs2_file_aio_write(). Inodes on file systems with sparse file support will want to do their allocation during the actual write call. In either case the cluster locking decisions are the same. We abstract out that code into a new function, ocfs2_lock_allocators() which will be used by a later patch to enable writing to sparse files. This also provides a nice cleanup of ocfs2_extend_allocation(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
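A hypothetical sketch of the kind of decision such a helper centralizes, assuming the caller knows how many clusters it wants to add and how many free extent-record slots the tree still has (names and policy here are illustrative only, not the ocfs2 implementation):

    #include <stdint.h>
    #include <stddef.h>

    struct alloc_context { int reserved_bits; };    /* hypothetical placeholder */

    /*
     * Both the i_size-extend path and a write path need to (a) reserve space
     * in the data allocator for the new clusters and (b) reserve metadata
     * blocks if the extent tree may need to grow. Pulling that decision into
     * one helper lets both callers share it.
     */
    static int lock_allocators(uint32_t clusters_to_add, int free_extent_recs,
                               struct alloc_context **data_ac,
                               struct alloc_context **meta_ac)
    {
        static struct alloc_context data, meta;

        *data_ac = NULL;
        *meta_ac = NULL;

        if (!clusters_to_add)
            return 0;

        /* New clusters always need a data allocator reservation. */
        data.reserved_bits = clusters_to_add;
        *data_ac = &data;

        /* If the tree has no free record slots, it may have to grow, so
         * reserve metadata (extent block) allocations up front as well. */
        if (free_extent_recs == 0) {
            meta.reserved_bits = 1;
            *meta_ac = &meta;
        }
        return 0;
    }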
-
Mark Fasheh authored
For ocfs2_truncate_file(), we eliminate the "simple" truncate case, which no longer exists since i_size is not tied to i_clusters. In ocfs2_extend_file(), we skip the allocation / page zeroing code for file systems which understand sparse files. The core truncate code is changed to do a bottom-up tree traversal, which gets abstracted out into its own function. To make things more readable, most of the special-case handling for in-inode extents is also removed from ocfs2_do_truncate(). Though write support for sparse files comes in a later patch, we at least update ocfs2_prepare_inode_for_write() to skip allocation for sparse files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
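A generic sketch of a path-based lookup over a hypothetical in-memory b-tree, walking from the root on every call rather than relying on cached subtree roots:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical in-memory b-tree node, for illustration only. */
    struct btree_node {
        int nr_recs;
        int is_leaf;
        struct {
            uint32_t cpos;              /* first cluster covered by this record */
            struct btree_node *child;   /* non-NULL in interior nodes */
        } recs[16];
    };

    /*
     * Simple path-based lookup: at each interior level, descend into the
     * rightmost record whose cpos is <= the target cluster.
     */
    static struct btree_node *path_lookup_leaf(struct btree_node *root,
                                               uint32_t v_cluster)
    {
        struct btree_node *node = root;

        while (node && !node->is_leaf) {
            struct btree_node *next = NULL;
            int i;

            for (i = 0; i < node->nr_recs; i++) {
                if (node->recs[i].cpos <= v_cluster)
                    next = node->recs[i].child;
                else
                    break;
            }
            node = next;    /* NULL means the cluster isn't covered */
        }
        return node;
    }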
-
Mark Fasheh authored
Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
There are two checks in there (one for inode newness, one for other mounted nodes) which are unnecessary, so remove them. The DLM will allow the trylock in either case without any messaging overhead. Removing these makes ocfs2_request_delete() a one-line function, so just move the trylock out one level into ocfs2_query_inode_wipe(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Tiger Yang authored
Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Tiger Yang authored
Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
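A userspace analogue of the idea, using a pthreads rwlock in place of the DLM lock (purely illustrative; the real mechanism is a cluster lock): openers hold the lock shared, and a deleter infers liveness from whether an exclusive trylock succeeds.

    #include <pthread.h>
    #include <stdio.h>

    /* Each opener holds the "open lock" shared for as long as the inode is
     * in use, and drops it in its clear_inode() equivalent. */
    static pthread_rwlock_t open_lock = PTHREAD_RWLOCK_INITIALIZER;

    static void inode_opened(void)  { pthread_rwlock_rdlock(&open_lock); }
    static void inode_cleared(void) { pthread_rwlock_unlock(&open_lock); }

    /* A deleter tests liveness by trying to take the lock exclusively:
     * success means no opener holds it, so the inode can be wiped. */
    static int inode_is_unused(void)
    {
        if (pthread_rwlock_trywrlock(&open_lock) == 0) {
            pthread_rwlock_unlock(&open_lock);
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        inode_opened();
        printf("while open: unused=%d\n", inode_is_unused());   /* 0 */
        inode_cleared();
        printf("after clear: unused=%d\n", inode_is_unused());  /* 1 */
        return 0;
    }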
-
Mark Fasheh authored
We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Sunil Mushran authored
We have noticed panic() hanging, leading to a situation in which the node, while otherwise dead, is still disk heartbeating. This leads to a hung cluster, as the other nodes are waiting for this node to stop disk heartbeating. The situation is only resolved by power resetting the box. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Sunil Mushran authored
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
We don't want the extent map and uptodate cache destruction in ocfs2_meta_lock_update() on a local mount, so skip that. This fixes several bugs with uptodate being cleared on buffers and extent maps being corrupted. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Sunil Mushran authored
In dlm_migrate_all_locks(), we currently call cond_resched_lock() after processing each lockres in a hash bucket. Move it outside the loop so as to call it only after the entire hash bucket has been processed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
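A userspace analogue of the change, with maybe_resched() standing in for the kernel's cond_resched_lock(): the yield point sits after each bucket's inner loop rather than after every item.

    #include <pthread.h>
    #include <sched.h>

    #define NR_BUCKETS 64

    struct bucket { int nr_items; };

    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct bucket hash[NR_BUCKETS];

    /* Stand-in for cond_resched_lock(): drop the lock, yield, retake it. */
    static void maybe_resched(pthread_mutex_t *lock)
    {
        pthread_mutex_unlock(lock);
        sched_yield();
        pthread_mutex_lock(lock);
    }

    static void process_item(struct bucket *b, int i) { (void)b; (void)i; }

    static void migrate_all(void)
    {
        int b, i;

        pthread_mutex_lock(&big_lock);
        for (b = 0; b < NR_BUCKETS; b++) {
            for (i = 0; i < hash[b].nr_items; i++)
                process_item(&hash[b], i);
            /* After the fix: one resched point per bucket, not per item. */
            maybe_resched(&big_lock);
        }
        pthread_mutex_unlock(&big_lock);
    }

    int main(void)
    {
        migrate_all();
        return 0;
    }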
-
Srinivas Eeda authored
There is a possibility that dlm_remaster_locks could override node->state with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler has set node->state to DLM_RECO_NODE_DATA_DONE. This could leave recovery stuck, requiring a cluster reboot. Synchronize the update using the dlm_reco_state_lock spinlock. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
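The shape of that fix, sketched in userspace C with a mutex standing in for the dlm_reco_state_lock spinlock and a hypothetical state enum: the request path takes the lock and refuses to overwrite a DONE state.

    #include <pthread.h>
    #include <stdio.h>

    enum reco_node_state {          /* hypothetical mirror of the DLM states */
        RECO_NODE_DATA_INIT,
        RECO_NODE_DATA_REQUESTED,
        RECO_NODE_DATA_DONE,
    };

    static pthread_mutex_t reco_state_lock = PTHREAD_MUTEX_INITIALIZER;
    static enum reco_node_state node_state = RECO_NODE_DATA_INIT;

    /* Data-done handler: records that the node finished sending its data. */
    static void reco_data_done(void)
    {
        pthread_mutex_lock(&reco_state_lock);
        node_state = RECO_NODE_DATA_DONE;
        pthread_mutex_unlock(&reco_state_lock);
    }

    /* Remaster path: must not clobber DONE with REQUESTED, or recovery would
     * wait forever for a completion that has already happened. */
    static void reco_request_data(void)
    {
        pthread_mutex_lock(&reco_state_lock);
        if (node_state != RECO_NODE_DATA_DONE)
            node_state = RECO_NODE_DATA_REQUESTED;
        pthread_mutex_unlock(&reco_state_lock);
    }

    int main(void)
    {
        reco_data_done();
        reco_request_data();
        printf("state=%d (still DONE)\n", node_state);
        return 0;
    }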
-
Linus Torvalds authored
.. ok, enough waffling about it already. "Just do it!" Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 Apr, 2007 7 commits
-
-
Linus Torvalds authored
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [PARPORT] SUNBPP: Fix OOPS when debugging is enabled.
  [SPARC] openprom: Switch to ref counting PCI API
-
Linus Torvalds authored
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NETLINK]: Infinite recursion in netlink.
-
Andrew Morton authored
The packet driver is assuming (reasonably) that the (undocumented) request.errors is an errno. But it is in fact some mysterious bitfield. When things go wrong we return weird positive numbers to the VFS as pointers and it goes oops. Thanks to William Heimbigner for reporting and diagnosis. (It doesn't oops, but this driver still doesn't work for William) Cc: William Heimbigner <icxcnika@mar.tar.cc> Cc: Peter Osterlund <petero2@telia.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Kuznetsov authored
Replies to NETLINK_FIB_LOOKUP messages were misrouted back to the kernel, which resulted in infinite recursion and stack overflow. The bug is present in all kernel versions since the feature appeared. The patch also makes some minimal cleanup:
1. Return something consistent (-ENOENT) when the fib table is missing.
2. Do not crash when the queue is empty (should not happen, but just in case).
3. Put the result of the lookup.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jens Axboe authored
There's a really rare and obscure bug in CFQ that causes a crash in cfq_dispatch_insert() due to rq == NULL. One example of the resulting oops is seen here: http://lkml.org/lkml/2007/4/15/41 Neil correctly diagnosed how this can happen: if two concurrent requests arrive with the exact same sector number (due to direct I/O, or aliasing between MD and the raw device access), the alias handling will add the request to the sortlist, but next_rq remains NULL. Read the more complete analysis at: http://lkml.org/lkml/2007/4/25/57 This looks like it requires md to trigger, even though it should potentially be possible to do with O_DIRECT (at least if you edit the kernel and doctor some of the unplug calls). The fix is to move the ->next_rq update to when we add a request to the rbtree. That removes the possibility of a request existing in the rbtree without ->next_rq being correctly updated. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
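A small sketch of the invariant the fix restores, using a hypothetical sorted request list rather than CFQ's rbtree: every insertion path also refreshes the cached "next to dispatch" pointer, so an aliased insert can never leave it NULL.

    #include <stddef.h>
    #include <stdio.h>

    struct request {
        long sector;
        struct request *next;    /* simple sorted singly linked list */
    };

    struct queue {
        struct request *head;    /* sorted by sector */
        struct request *next_rq; /* cached "dispatch next" hint */
    };

    /*
     * Every insertion path updates next_rq here, in one place. The CFQ bug
     * had the opposite shape: the alias path added to the sort structure
     * without refreshing the cached pointer, leaving it NULL at dispatch.
     */
    static void queue_add_sorted(struct queue *q, struct request *rq)
    {
        struct request **pp = &q->head;

        while (*pp && (*pp)->sector < rq->sector)
            pp = &(*pp)->next;
        rq->next = *pp;
        *pp = rq;

        if (!q->next_rq || rq->sector < q->next_rq->sector)
            q->next_rq = rq;
    }

    int main(void)
    {
        struct queue q = { NULL, NULL };
        struct request a = { 100, NULL }, b = { 100, NULL };  /* aliases */

        queue_add_sorted(&q, &a);
        queue_add_sorted(&q, &b);
        printf("next_rq sector: %ld\n", q.next_rq->sector);
        return 0;
    }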
-
YOSHIFUJI Hideaki authored
Oops, thinko. The test for accepting an RH0 was exactly the wrong way around. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [BNX2]: Fix occasional NETDEV WATCHDOG on 5709.
  [IPV6]: Disallow RH0 by default.
  [XFRM]: beet: fix pseudo header length value
  [TCP]: Congestion control initialization.
-
- 24 Apr, 2007 5 commits
-
-
Michael Chan authored
Tweak a register setting to prevent the tx mailbox from halting. Update version to 1.5.8. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
YOSHIFUJI Hideaki authored
A security issue is emerging. Disallow Routing Header Type 0 by default as we have been doing for IPv4. Note: We allow RH2 by default because it is harmless. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ralf Baechle authored
This caused oprofile to fail on non-multithreaded systems with more than 2 processors, such as the BCM1480. Reported by Manish Lachwani (mlachwani@mvista.com). Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
-
Linus Torvalds authored
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
  drivers/net/hamradio/baycom_ser_fdx build fix
  usb-net/pegasus: fix pegasus carrier detection
  sis900: Allocate rx replacement buffer before rx operation
  [netdrvr] depca: handle platform_device_add() failure
-
Andrew Morton authored
sparc64:
  drivers/net/hamradio/baycom_ser_fdx.c: In function `ser12_open':
  drivers/net/hamradio/baycom_ser_fdx.c:417: error: `NR_IRQS' undeclared (first use in this function)
  drivers/net/hamradio/baycom_ser_fdx.c:417: error: (Each undeclared identifier is reported only once
  drivers/net/hamradio/baycom_ser_fdx.c:417: error: for each function it appears in.)
Cc: Folkert van Heusden <folkert@vanheusden.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-