- 26 Apr, 2007 28 commits
-
-
Mark Fasheh authored
The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when metadata block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no unnecessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
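As a rough illustration of the kind of structure such a scheme implies (not the actual ocfs2 code), here is a minimal sketch of a fixed-size "last few extents seen" cache in plain C; small_extent_cache and its helpers are hypothetical names:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical, simplified stand-in for a cached extent. */
    struct cached_extent {
        uint32_t c_cpos;        /* first virtual cluster covered */
        uint32_t c_clusters;    /* length in clusters */
        uint64_t c_blkno;       /* physical start block */
    };

    #define NR_CACHED_EXTENTS 3

    /* Keep only the last few extents seen; newest entry lives in slot 0. */
    struct small_extent_cache {
        struct cached_extent ents[NR_CACHED_EXTENTS];
        int nr;                 /* how many slots are valid */
    };

    /* Record an extent we just looked up, evicting the oldest one if full. */
    static void extent_cache_insert(struct small_extent_cache *ec,
                                    const struct cached_extent *ins)
    {
        memmove(&ec->ents[1], &ec->ents[0],
                (NR_CACHED_EXTENTS - 1) * sizeof(ec->ents[0]));
        ec->ents[0] = *ins;
        if (ec->nr < NR_CACHED_EXTENTS)
            ec->nr++;
    }

    /* Return the cached extent containing virtual cluster 'cpos', or NULL. */
    static struct cached_extent *extent_cache_lookup(struct small_extent_cache *ec,
                                                     uint32_t cpos)
    {
        int i;

        for (i = 0; i < ec->nr; i++) {
            struct cached_extent *ce = &ec->ents[i];

            if (cpos >= ce->c_cpos && cpos < ce->c_cpos + ce->c_clusters)
                return ce;
        }
        return (void *)0;
    }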
-
Mark Fasheh authored
Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Initially, we had wired things up to return a size of '1' for holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
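Schematically, and assuming a sorted in-memory list of allocated extents (a simplification of the real on-disk tree), the hole-size calculation amounts to something like this sketch:

    #include <stdint.h>

    struct extent_rec {
        uint32_t e_cpos;        /* first virtual cluster of this extent */
        uint32_t e_clusters;    /* extent length in clusters */
    };

    /*
     * Given extents sorted by e_cpos and a virtual cluster 'v_cluster' known
     * to sit in a hole, return the hole length in clusters: the distance to
     * the next allocated extent, or total_clusters - v_cluster if there is
     * no further allocation.
     */
    static uint32_t hole_clusters(const struct extent_rec *recs, int nr_recs,
                                  uint32_t v_cluster, uint32_t total_clusters)
    {
        int i;

        for (i = 0; i < nr_recs; i++) {
            if (recs[i].e_cpos > v_cluster)
                return recs[i].e_cpos - v_cluster;
        }
        return total_clusters - v_cluster;
    }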
-
Mark Fasheh authored
Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up so that leaf nodes can get a flags field where we can mark unwritten extents. Interior nodes, whose length covers all the child nodes beneath them, can't split their e_clusters field, so we use a union to preserve sizing there. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
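A hypothetical C layout in the spirit of that description, with a union keeping leaf and interior records the same size (field names only loosely resemble the real ocfs2 structures):

    #include <stdint.h>
    #include <assert.h>

    #define EXTENT_FLAG_UNWRITTEN 0x01

    /* Hypothetical on-disk-style extent record, modeled on the description. */
    struct extent_rec {
        uint32_t e_cpos;                  /* first virtual cluster */
        union {
            uint32_t e_int_clusters;      /* interior node: full 32 bits */
            struct {
                uint16_t e_leaf_clusters; /* leaf node: fits in 16 bits */
                uint8_t  e_flags;         /* e.g. "unwritten" marker */
                uint8_t  e_reserved;
            };
        };
        uint64_t e_blkno;                 /* physical start block */
    };

    int main(void)
    {
        /* The union keeps leaf and interior records the same size. */
        assert(sizeof(struct extent_rec) == 16);
        return 0;
    }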
-
Mark Fasheh authored
We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Do this instead of filemap_fdatawrite() - this way we sync only the range between i_size and the cluster boundary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
do_sync_file_range() accepts a file * from which it takes an address_space to sync. Abstract out the bulk of the function into do_sync_mapping_range() which takes the address_space directly. This way callers who want to sync an address_space directly can take advantage of the functionality provided. do_sync_file_range() is preserved as a small wrapper around do_sync_mapping_range(). Ocfs2 in particular would like to use this to initiate a sync of a specific inode range during truncate, where a file * may not be available. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
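A userspace analogue of the refactor's shape, with hypothetical names standing in for the kernel types (struct handle for the file *, struct store for the address_space): the old entry point becomes a thin wrapper that extracts the underlying object and delegates.

    #include <stdio.h>

    struct store  { const char *name; };    /* stands in for address_space */
    struct handle { struct store *store; }; /* stands in for struct file   */

    /* Core: works on the backing store directly, no handle needed. */
    static int sync_store_range(struct store *s, long off, long end, unsigned flags)
    {
        printf("sync %s: [%ld, %ld) flags=%#x\n", s->name, off, end, flags);
        return 0;
    }

    /* Old entry point becomes a small wrapper that just extracts the store. */
    static int sync_handle_range(struct handle *h, long off, long end, unsigned flags)
    {
        return sync_store_range(h->store, off, end, flags);
    }

    int main(void)
    {
        struct store st = { "inode-42" };
        struct handle fh = { &st };

        sync_handle_range(&fh, 0, 4096, 0x1);  /* via the file-style handle */
        sync_store_range(&st, 0, 4096, 0x1);   /* or on the store directly  */
        return 0;
    }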
-
Mark Fasheh authored
Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and the end of its cluster. Otherwise a subsequent extend could expose bad data. This introduces a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
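The arithmetic implied by "zero from i_size to the end of its cluster" can be sketched as follows, assuming a power-of-two cluster size (a simplification, not the ocfs2 helper itself):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Range that needs zeroing after a truncate when blocks are no longer
     * zeroed on extend: from the new i_size up to the end of the cluster
     * that contains it.
     */
    static void zero_tail_range(uint64_t i_size, uint32_t cluster_size,
                                uint64_t *start, uint64_t *len)
    {
        uint64_t cluster_end = (i_size + cluster_size - 1) &
                               ~((uint64_t)cluster_size - 1);

        *start = i_size;
        *len = cluster_end - i_size;    /* 0 if i_size is cluster-aligned */
    }

    int main(void)
    {
        uint64_t start, len;

        zero_tail_range(100000, 65536, &start, &len);
        printf("zero [%llu, +%llu)\n",
               (unsigned long long)start, (unsigned long long)len);
        return 0;
    }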
-
Mark Fasheh authored
ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
This will be turned back on once we can do allocation in ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
Right now, file allocation for ocfs2 is done within ocfs2_extend_file(), which is either called from ->setattr() (for an i_size change), or at the top of ocfs2_file_aio_write(). Inodes on file systems with sparse file support will want to do their allocation during the actual write call. In either case the cluster locking decisions are the same. We abstract out that code into a new function, ocfs2_lock_allocators() which will be used by a later patch to enable writing to sparse files. This also provides a nice cleanup of ocfs2_extend_allocation(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
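A hypothetical sketch of the kind of decision such a helper centralizes, assuming the caller knows how many clusters it wants to add and how many free extent-record slots the tree still has (names and policy here are illustrative only, not the ocfs2 implementation):

    #include <stdint.h>
    #include <stddef.h>

    struct alloc_context { int reserved_bits; };    /* hypothetical placeholder */

    /*
     * Both the i_size-extend path and a write path need to (a) reserve space
     * in the data allocator for the new clusters and (b) reserve metadata
     * blocks if the extent tree may need to grow. Pulling that decision into
     * one helper lets both callers share it.
     */
    static int lock_allocators(uint32_t clusters_to_add, int free_extent_recs,
                               struct alloc_context **data_ac,
                               struct alloc_context **meta_ac)
    {
        static struct alloc_context data, meta;

        *data_ac = NULL;
        *meta_ac = NULL;

        if (!clusters_to_add)
            return 0;

        /* New clusters always need a data allocator reservation. */
        data.reserved_bits = clusters_to_add;
        *data_ac = &data;

        /* If the tree has no free record slots, it may have to grow, so
         * reserve metadata (extent block) allocations up front as well. */
        if (free_extent_recs == 0) {
            meta.reserved_bits = 1;
            *meta_ac = &meta;
        }
        return 0;
    }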
-
Mark Fasheh authored
For ocfs2_truncate_file(), we eliminate the "simple" truncate case, which no longer exists since i_size is not tied to i_clusters. In ocfs2_extend_file(), we skip the allocation / page zeroing code for file systems which understand sparse files. The core truncate code is changed to do a bottom-up tree traversal, which gets abstracted out into its own function. To make things more readable, most of the special-case handling for in-inode extents is also removed from ocfs2_do_truncate(). Though write support for sparse files comes in a later patch, we at least update ocfs2_prepare_inode_for_write() to skip allocation for sparse files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
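A generic sketch of a path-based lookup over a hypothetical in-memory b-tree, walking from the root on every call rather than relying on cached subtree roots:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical in-memory b-tree node, for illustration only. */
    struct btree_node {
        int nr_recs;
        int is_leaf;
        struct {
            uint32_t cpos;              /* first cluster covered by this record */
            struct btree_node *child;   /* non-NULL in interior nodes */
        } recs[16];
    };

    /*
     * Simple path-based lookup: at each interior level, descend into the
     * rightmost record whose cpos is <= the target cluster.
     */
    static struct btree_node *path_lookup_leaf(struct btree_node *root,
                                               uint32_t v_cluster)
    {
        struct btree_node *node = root;

        while (node && !node->is_leaf) {
            struct btree_node *next = NULL;
            int i;

            for (i = 0; i < node->nr_recs; i++) {
                if (node->recs[i].cpos <= v_cluster)
                    next = node->recs[i].child;
                else
                    break;
            }
            node = next;    /* NULL means the cluster isn't covered */
        }
        return node;
    }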
-
Mark Fasheh authored
Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
There are two checks in there (one for inode newness, one for other mounted nodes) which are unnecessary, so remove them. The DLM will allow the trylock in either case without any messaging overhead. Removing these makes ocfs2_request_delete() a one-line function, so just move the trylock out one level into ocfs2_query_inode_wipe(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Tiger Yang authored
Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Tiger Yang authored
Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
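A userspace analogue of the idea, using a pthreads rwlock in place of the DLM lock (purely illustrative; the real mechanism is a cluster lock): openers hold the lock shared, and a deleter infers liveness from whether an exclusive trylock succeeds.

    #include <pthread.h>
    #include <stdio.h>

    /* Each opener holds the "open lock" shared for as long as the inode is
     * in use, and drops it in its clear_inode() equivalent. */
    static pthread_rwlock_t open_lock = PTHREAD_RWLOCK_INITIALIZER;

    static void inode_opened(void)  { pthread_rwlock_rdlock(&open_lock); }
    static void inode_cleared(void) { pthread_rwlock_unlock(&open_lock); }

    /* A deleter tests liveness by trying to take the lock exclusively:
     * success means no opener holds it, so the inode can be wiped. */
    static int inode_is_unused(void)
    {
        if (pthread_rwlock_trywrlock(&open_lock) == 0) {
            pthread_rwlock_unlock(&open_lock);
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        inode_opened();
        printf("while open: unused=%d\n", inode_is_unused());   /* 0 */
        inode_cleared();
        printf("after clear: unused=%d\n", inode_is_unused());  /* 1 */
        return 0;
    }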
-
Mark Fasheh authored
We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Sunil Mushran authored
We have noticed panic() hanging, leading to a situation in which the node, while otherwise dead, is still disk heartbeating. This leads to a hung cluster, as the other nodes are waiting for this node to stop disk heartbeating. The situation is only resolved by power resetting the box. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Sunil Mushran authored
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Mark Fasheh authored
We don't want the extent map and uptodate cache destruction in ocfs2_meta_lock_update() on a local mount, so skip that. This fixes several bugs with uptodate being cleared on buffers and extent maps being corrupted. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
-
Sunil Mushran authored
In dlm_migrate_all_locks(), we currently call cond_resched_lock() after processing each lockres in a hash bucket. Move it outside the loop so as to call it only after the entire hash bucket has been processed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
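A userspace analogue of the change, with maybe_resched() standing in for the kernel's cond_resched_lock(): the yield point sits after each bucket's inner loop rather than after every item.

    #include <pthread.h>
    #include <sched.h>

    #define NR_BUCKETS 64

    struct bucket { int nr_items; };

    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct bucket hash[NR_BUCKETS];

    /* Stand-in for cond_resched_lock(): drop the lock, yield, retake it. */
    static void maybe_resched(pthread_mutex_t *lock)
    {
        pthread_mutex_unlock(lock);
        sched_yield();
        pthread_mutex_lock(lock);
    }

    static void process_item(struct bucket *b, int i) { (void)b; (void)i; }

    static void migrate_all(void)
    {
        int b, i;

        pthread_mutex_lock(&big_lock);
        for (b = 0; b < NR_BUCKETS; b++) {
            for (i = 0; i < hash[b].nr_items; i++)
                process_item(&hash[b], i);
            /* After the fix: one resched point per bucket, not per item. */
            maybe_resched(&big_lock);
        }
        pthread_mutex_unlock(&big_lock);
    }

    int main(void)
    {
        migrate_all();
        return 0;
    }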
-
Srinivas Eeda authored
There is a possibility that dlm_remaster_locks could override node->state with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler has set node->state to DLM_RECO_NODE_DATA_DONE. This could leave recovery stuck, requiring a cluster reboot. Synchronize the update using the dlm_reco_state_lock spinlock. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
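The shape of that fix, sketched in userspace C with a mutex standing in for the dlm_reco_state_lock spinlock and a hypothetical state enum: the request path takes the lock and refuses to overwrite a DONE state.

    #include <pthread.h>
    #include <stdio.h>

    enum reco_node_state {          /* hypothetical mirror of the DLM states */
        RECO_NODE_DATA_INIT,
        RECO_NODE_DATA_REQUESTED,
        RECO_NODE_DATA_DONE,
    };

    static pthread_mutex_t reco_state_lock = PTHREAD_MUTEX_INITIALIZER;
    static enum reco_node_state node_state = RECO_NODE_DATA_INIT;

    /* Data-done handler: records that the node finished sending its data. */
    static void reco_data_done(void)
    {
        pthread_mutex_lock(&reco_state_lock);
        node_state = RECO_NODE_DATA_DONE;
        pthread_mutex_unlock(&reco_state_lock);
    }

    /* Remaster path: must not clobber DONE with REQUESTED, or recovery would
     * wait forever for a completion that has already happened. */
    static void reco_request_data(void)
    {
        pthread_mutex_lock(&reco_state_lock);
        if (node_state != RECO_NODE_DATA_DONE)
            node_state = RECO_NODE_DATA_REQUESTED;
        pthread_mutex_unlock(&reco_state_lock);
    }

    int main(void)
    {
        reco_data_done();
        reco_request_data();
        printf("state=%d (still DONE)\n", node_state);
        return 0;
    }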
-
Linus Torvalds authored
.. ok, enough waffling about it already. "Just do it!" Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 Apr, 2007 7 commits
-
-
Linus Torvalds authored
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [PARPORT] SUNBPP: Fix OOPS when debugging is enabled.
  [SPARC] openprom: Switch to ref counting PCI API
-
Linus Torvalds authored
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NETLINK]: Infinite recursion in netlink.
-
Andrew Morton authored
The packet driver is assuming (reasonably) that the (undocumented) request.errors is an errno. But it is in fact some mysterious bitfield. When things go wrong we return weird positive numbers to the VFS as pointers and it goes oops. Thanks to William Heimbigner for reporting and diagnosis. (It doesn't oops, but this driver still doesn't work for William) Cc: William Heimbigner <icxcnika@mar.tar.cc> Cc: Peter Osterlund <petero2@telia.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Kuznetsov authored
Replies to NETLINK_FIB_LOOKUP messages were misrouted back to the kernel, which resulted in infinite recursion and stack overflow. The bug is present in all kernel versions since the feature appeared. The patch also makes some minimal cleanup:
1. Return something consistent (-ENOENT) when the fib table is missing.
2. Do not crash when the queue is empty (should not happen, but just in case).
3. Put the result of the lookup.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jens Axboe authored
There's a really rare and obscure bug in CFQ that causes a crash in cfq_dispatch_insert() due to rq == NULL. One example of the resulting oops is seen here: http://lkml.org/lkml/2007/4/15/41 Neil correctly diagnosed how this can happen: if two concurrent requests arrive with the exact same sector number (due to direct I/O, or aliasing between MD and the raw device access), the alias handling will add the request to the sortlist, but next_rq remains NULL. Read the more complete analysis at: http://lkml.org/lkml/2007/4/25/57 This looks like it requires md to trigger, even though it should potentially be possible to do with O_DIRECT (at least if you edit the kernel and doctor some of the unplug calls). The fix is to move the ->next_rq update to when we add a request to the rbtree. That removes the possibility of a request existing in the rbtree without ->next_rq being correctly updated. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
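A small sketch of the invariant the fix restores, using a hypothetical sorted request list rather than CFQ's rbtree: every insertion path also refreshes the cached "next to dispatch" pointer, so an aliased insert can never leave it NULL.

    #include <stddef.h>
    #include <stdio.h>

    struct request {
        long sector;
        struct request *next;    /* simple sorted singly linked list */
    };

    struct queue {
        struct request *head;    /* sorted by sector */
        struct request *next_rq; /* cached "dispatch next" hint */
    };

    /*
     * Every insertion path updates next_rq here, in one place. The CFQ bug
     * had the opposite shape: the alias path added to the sort structure
     * without refreshing the cached pointer, leaving it NULL at dispatch.
     */
    static void queue_add_sorted(struct queue *q, struct request *rq)
    {
        struct request **pp = &q->head;

        while (*pp && (*pp)->sector < rq->sector)
            pp = &(*pp)->next;
        rq->next = *pp;
        *pp = rq;

        if (!q->next_rq || rq->sector < q->next_rq->sector)
            q->next_rq = rq;
    }

    int main(void)
    {
        struct queue q = { NULL, NULL };
        struct request a = { 100, NULL }, b = { 100, NULL };  /* aliases */

        queue_add_sorted(&q, &a);
        queue_add_sorted(&q, &b);
        printf("next_rq sector: %ld\n", q.next_rq->sector);
        return 0;
    }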
-
YOSHIFUJI Hideaki authored
Oops, thinko. The test for accepting an RH0 was exactly the wrong way around. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [BNX2]: Fix occasional NETDEV WATCHDOG on 5709.
  [IPV6]: Disallow RH0 by default.
  [XFRM]: beet: fix pseudo header length value
  [TCP]: Congestion control initialization.
-
- 24 Apr, 2007 5 commits
-
-
Michael Chan authored
Tweak a register setting to prevent the tx mailbox from halting. Update version to 1.5.8. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
YOSHIFUJI Hideaki authored
A security issue is emerging. Disallow Routing Header Type 0 by default as we have been doing for IPv4. Note: We allow RH2 by default because it is harmless. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ralf Baechle authored
This caused oprofile to fail on non-multithreaded systems with more than 2 processors, such as the BCM1480. Reported by Manish Lachwani (mlachwani@mvista.com). Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
-
Linus Torvalds authored
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
  drivers/net/hamradio/baycom_ser_fdx build fix
  usb-net/pegasus: fix pegasus carrier detection
  sis900: Allocate rx replacement buffer before rx operation
  [netdrvr] depca: handle platform_device_add() failure
-
Andrew Morton authored
sparc64:
  drivers/net/hamradio/baycom_ser_fdx.c: In function `ser12_open':
  drivers/net/hamradio/baycom_ser_fdx.c:417: error: `NR_IRQS' undeclared (first use in this function)
  drivers/net/hamradio/baycom_ser_fdx.c:417: error: (Each undeclared identifier is reported only once
  drivers/net/hamradio/baycom_ser_fdx.c:417: error: for each function it appears in.)
Cc: Folkert van Heusden <folkert@vanheusden.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
-