Commits · c806e68f5647109350ec546fee5b526962970fd2 · linux / linux-davinci

10 Oct, 2008 4 commits

ext4: fix initialization of UNINIT bitmap blocks · c806e68f

Frederic Bohe authored Oct 10, 2008

This fixes a bug which caused on-line resizing of filesystems with a
1k blocksize to fail.  The root cause of this bug was the fact that if
an uninitialized bitmap block gets read in by userspace (which
e2fsprogs does try to avoid, but can happen when the blocksize is less
than the pagesize and an adjacent block is read into memory),
ext4_read_block_bitmap() was erroneously depending on the buffer
uptodate flag to decide whether it needed to initialize the bitmap
block in memory --- i.e., to set the standard set of blocks in use by
a block group (superblock, bitmaps, inode table, etc.).  Essentially,
ext4_read_block_bitmap() assumed it was the only routine that might
try to read a block containing a block bitmap, which is simply not
true.

To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
must always initialize uninitialized bitmap blocks.  Once a block or
inode is allocated out of that bitmap, it will be marked as
initialized in the block group descriptor, so in general this won't
result in any extra unnecessary work.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c806e68f
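
The corrected decision in c806e68f can be modelled in userspace roughly as follows. This is only a sketch: `struct group_desc`, `struct buffer`, `BG_BLOCK_UNINIT` and both helpers are hypothetical stand-ins, not the kernel's types. The point it illustrates is that initialization is keyed on the group descriptor's "uninitialized" flag, not on whether the buffer already looks up to date.

```c
#include <stdint.h>
#include <string.h>

#define BG_BLOCK_UNINIT 0x1   /* hypothetical flag: bitmap was never initialized */

struct group_desc { unsigned flags; unsigned meta_blocks; };
struct buffer     { int uptodate; uint8_t data[128]; };

/* Mark the first meta_blocks bits as in use (superblock, bitmaps, inode table). */
static void init_bitmap(struct buffer *bh, const struct group_desc *gd)
{
    memset(bh->data, 0, sizeof(bh->data));
    for (unsigned i = 0; i < gd->meta_blocks; i++)
        bh->data[i / 8] |= (uint8_t)(1u << (i % 8));
    bh->uptodate = 1;
}

static void read_block_bitmap(struct buffer *bh, const struct group_desc *gd)
{
    if (gd->flags & BG_BLOCK_UNINIT) {
        /* Always initialize, even if someone else already read the block in. */
        init_bitmap(bh, gd);
        return;
    }
    if (!bh->uptodate) {
        /* Real code would read the block from disk; the model only marks it. */
        bh->uptodate = 1;
    }
}
```

With a stale buffer that is already marked up to date, `read_block_bitmap()` still rebuilds the bitmap when the descriptor carries the uninitialized flag, and leaves the contents alone otherwise.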

ext4: Remove old legacy block allocator · c2ea3fde

Theodore Ts'o authored Oct 10, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c2ea3fde

ext4: Use readahead when reading an inode from the inode table · 240799cd

Theodore Ts'o authored Oct 09, 2008

With modern hard drives, reading 64k takes roughly the same time as
reading a 4k block.  So request readahead for adjacent inode table
blocks to reduce the time it takes when iterating over directories
(especially when doing this in htree sort order) in a cold cache case.
With this patch, the time it takes to run "git status" on a kernel
tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches"
is reduced by 21%.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

240799cd
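
One way to picture the readahead request in 240799cd is as an aligned window of blocks around the inode's block, clamped to the bounds of the inode table. The sketch below assumes a power-of-two window size (16 blocks of 4k would give the 64k mentioned above); the struct and function names are hypothetical and the actual patch may compute the range differently.

```c
#include <stdint.h>

struct ra_window { uint64_t start; uint64_t end; };   /* half-open: [start, end) */

/*
 * Compute which inode table blocks to read ahead for `block`.
 * ra_blocks must be a power of two.
 */
static struct ra_window inode_readahead_window(uint64_t block, uint64_t ra_blocks,
                                               uint64_t table_start, uint64_t table_len)
{
    struct ra_window w;

    w.start = block & ~(ra_blocks - 1);        /* align down to the window size */
    w.end = w.start + ra_blocks;
    if (w.start < table_start)                  /* do not read before the table */
        w.start = table_start;
    if (w.end > table_start + table_len)        /* or past its end */
        w.end = table_start + table_len;
    return w;
}
```

For an inode table occupying blocks 1030 to 1129, a request for block 1037 with a 16-block window yields the range [1030, 1040).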

ext4: Improve the documentation for ext4's /proc tunables · 37515fac

Theodore Ts'o authored Oct 09, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alex Tomas <bzzz@sun.com>
Cc: Andreas Dilger <adilger@sun.com>

37515fac

23 Sep, 2008 2 commits

ext4: Combine proc file handling into a single set of functions · 5e8814f2

Theodore Ts'o authored Sep 23, 2008

Previously mballoc created a separate set of functions for each proc
file.  This combines the tunables into a single set of functions which
gets used for all of the per-superblock proc files, saving
approximately 2k of compiled object code.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5e8814f2

ext4: move /proc setup and teardown out of mballoc.c · 9f6200bb

Theodore Ts'o authored Sep 23, 2008

...and into the core setup/teardown code in fs/ext4/super.c so that
other parts of ext4 can define tuning parameters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9f6200bb

22 Sep, 2008 1 commit

ext4: Don't use 'struct dentry' for internal lookups · f702ba0f

Theodore Ts'o authored Sep 22, 2008

This is a port of a patch from Linus which fixes a 200+ byte stack
usage problem in ext4_get_parent().

It's more efficient to pass down only the actual parts of the dentry
that matter: the parent inode and the name, instead of allocating a
struct dentry on the stack.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f702ba0f

07 Oct, 2008 1 commit

ext4/jbd2: Avoid WARN() messages when failing to write to the superblock · 914258bf

Theodore Ts'o authored Oct 06, 2008

This fixes some very common warnings reported by kerneloops.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

914258bf

13 Sep, 2008 2 commits

ext4: use percpu data structures for lg_prealloc_list · 730c213c

Eric Sandeen authored Sep 13, 2008

lg_prealloc_list seems to cry out for a per-cpu data structure; on a large
smp system I think this should be better.  I've lightly tested this change
on a 4-cpu system.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

730c213c

ext4: Renumber EXT4_IOC_MIGRATE · 8eea80d5

Theodore Ts'o authored Sep 13, 2008

Pick an ioctl number for EXT4_IOC_MIGRATE that won't conflict with
other ext4 ioctl's.  Since there haven't been any major userspace
users of this ioctl, we can afford to change this now, to avoid
potential problems later.

Also, reorder the ioctl numbers in ext4.h to avoid this sort of
mistake in the future.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8eea80d5

09 Oct, 2008 1 commit

ext4: hook the ext3 migration interface to the EXT4_IOC_SETFLAGS ioctl · 4db46fc2

Aneesh Kumar K.V authored Oct 08, 2008

This patch hooks the ext3 to ext4 migrate interface to the
EXT4_IOC_SETFLAGS ioctl. The userspace interface is via chattr +e.  We
only allow setting the extent flag.  Clearing the extent flag (migrating
from ext4 to ext3) is not supported.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4db46fc2

13 Sep, 2008 1 commit

ext4: elevate write count for migrate ioctl · 2a43a878

Aneesh Kumar K.V authored Sep 13, 2008

The migrate ioctl writes to the filesystem, so we need to elevate the
write count.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2a43a878

08 Sep, 2008 1 commit

ext4: add missing unlock in ext4_check_descriptors() on error path · 7ee1ec4c

Li Zefan authored Sep 08, 2008

If the group descriptors are corrupted we need to unlock the block
group lock before returning from the function; else we will oops when
freeing a spinlock which is still being held.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

7ee1ec4c
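
The pattern behind 7ee1ec4c is the usual single-exit unlock. A small userspace sketch, using a C11 `atomic_flag` as a stand-in for the block group spinlock (all names here are hypothetical, not the kernel's):

```c
#include <stdatomic.h>

struct group { atomic_flag lock; int corrupted; };

static void group_lock(struct group *g)
{
    while (atomic_flag_test_and_set(&g->lock))
        ;   /* spin until the flag was previously clear */
}

static void group_unlock(struct group *g)
{
    atomic_flag_clear(&g->lock);
}

/* Returns 1 if the descriptor looks sane, 0 otherwise. */
static int check_descriptor(struct group *g)
{
    int ok = 1;

    group_lock(g);
    if (g->corrupted) {
        ok = 0;
        goto out;      /* the error path must still drop the lock */
    }
    /* further consistency checks would go here */
out:
    group_unlock(g);
    return ok;
}
```

Routing every return through `out:` means the lock is released on the failure path as well as on success.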

16 Sep, 2008 1 commit

jbd2: clean up how the journal device name is printed · 05496769

Theodore Ts'o authored Sep 16, 2008

Calculate the journal device name once and stash it away in the
journal_s structure.  This avoids needing to call bdevname()
everywhere and reduces stack usage by not needing to allocate an
on-stack buffer.  In addition, we eliminate the '/' that can appear in
device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that
can cause problems when creating proc directory names, and include the
inode number to support ocfs2 which creates multiple journals with
different inode numbers.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

05496769
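
A userspace sketch of the name handling described in 05496769. The replacement character `'!'` and the `":<ino>"` suffix format are assumptions made for illustration, as is the function name; the commit message only says that the slash is eliminated and that the inode number is included.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build a name that is safe to use as a single proc directory entry:
 * copy the block device name, replace every '/', and append the inode
 * number when the journal lives in an inode (ino != 0).
 */
static void make_journal_devname(char *out, size_t outlen,
                                 const char *bdev, unsigned long ino)
{
    snprintf(out, outlen, "%s", bdev);
    for (char *p = out; (p = strchr(p, '/')) != NULL; )
        *p = '!';                      /* assumed replacement character */
    if (ino != 0) {
        size_t len = strlen(out);
        snprintf(out + len, outlen - len, ":%lu", ino);   /* assumed format */
    }
}
```

Computing the string once and storing it avoids repeating both the lookup and the on-stack buffer at every call site.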

14 Sep, 2008 1 commit

ext4: fix #11321: create /proc/ext4/*/stats more carefully · 899fc1a4

Alexey Dobriyan authored Sep 14, 2008

ext4 creates a per-superblock directory in /proc/ext4/. The name used as
the basis is taken from bdevname, which, surprise, can contain a slash.

However, proc, while allowing the proc_create("a/b", parent) form of
PDE creation, assumes that parent/a was already created.

The bdevname in question is 'cciss/c0d0p9'; the directory is not created
and all this stuff goes directly into /proc (which is a real bug).

The warning comes when the _second_ partition is mounted.

http://bugzilla.kernel.org/show_bug.cgi?id=11321
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

899fc1a4

08 Sep, 2008 1 commit

Update flex_bg free blocks and free inodes counters when resizing. · c62a11fd

Frederic Bohe authored Sep 08, 2008

This fixes a bug which prevented the newly created inodes after a
resize from being used on filesystems with flex_bg.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c62a11fd

09 Oct, 2008 1 commit

ext4: Avoid printk floods in the face of directory corruption · 9d9f1775

Eric Sandeen authored Oct 09, 2008

Note: some people think this represents a security bug, since it
might make the system go away while it is printing a large number of
console messages, especially if a serial console is involved.  Hence,
it has been assigned CVE-2008-3528, but it requires that the attacker
either has physical access to your machine to insert a USB disk with a
corrupted filesystem image (at which point why not just hit the power
button), or is otherwise able to convince the system administrator to
mount an arbitrary filesystem image (at which point why not just
include a setuid shell or world-writable hard disk device file or some
such).  Me, I think they're just being silly. --tytso
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Cc: Eugene Teo <eugeneteo@kernel.sg>

9d9f1775

13 Sep, 2008 2 commits

ext4: Properly update i_disksize. · cf17fea6

Aneesh Kumar K.V authored Sep 13, 2008

With delayed allocation we use i_data_sem to update i_disksize. We need
to update i_disksize only if the new size specified is greater than the
current value and we need to make sure we don't race with other
i_disksize updates. With delayed allocation we will switch to the
write_begin function for non-delayed allocation if we are low on free
blocks. This means the write_begin function for non-delayed allocation
also needs to use the same locking.

We also need to check and update i_disksize even if the new size is less
than inode.i_size because of delayed allocation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cf17fea6

ext4: truncate block allocated on a failed ext4_write_begin · ae4d5372

Aneesh Kumar K.V authored Sep 13, 2008

For blocksize < pagesize we need to remove blocks that got allocated in
block_write_begin() if we fail with ENOSPC for later blocks.
block_write_begin() internally does this if it allocated pages locally.
This makes sure we don't have blocks outside inode.i_size during ENOSPC.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ae4d5372

09 Sep, 2008 3 commits

ext4: Retry block allocation if we have free blocks left · df22291f

Aneesh Kumar K.V authored Sep 08, 2008

When we truncate files, the meta-data blocks released are not reused
until we commit the truncate transaction.  That means a delayed get_block
request will return ENOSPC even if we have free blocks left.  Force a
journal commit and retry block allocation if we get ENOSPC with free
blocks left.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

df22291f
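
The retry described in df22291f can be sketched with a toy model in which blocks freed by a truncate only become allocatable after a commit. `struct fs_model`, the helpers and the retry budget of 3 are hypothetical and chosen for illustration only.

```c
#include <errno.h>

struct fs_model {
    long free_now;       /* blocks allocatable right away */
    long pending_free;   /* blocks released by truncate, usable after a commit */
};

/* Model of a journal commit: pending frees become real free blocks. */
static void force_commit(struct fs_model *fs)
{
    fs->free_now += fs->pending_free;
    fs->pending_free = 0;
}

static int try_alloc(struct fs_model *fs, long n)
{
    if (n > fs->free_now)
        return -ENOSPC;
    fs->free_now -= n;
    return 0;
}

/* On ENOSPC, commit and retry while there is still something to reclaim. */
static int alloc_with_retry(struct fs_model *fs, long n)
{
    int retries = 3;                 /* hypothetical retry budget */
    int err = try_alloc(fs, n);

    while (err == -ENOSPC && fs->pending_free > 0 && retries-- > 0) {
        force_commit(fs);
        err = try_alloc(fs, n);
    }
    return err;
}
```

If nothing is pending, the loop exits immediately and the caller sees a genuine ENOSPC.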

ext4: Don't add the inode to journal handle until after the block is allocated · 166348dd

Aneesh Kumar K.V authored Sep 08, 2008

Make sure we don't add the inode to the journal handle until after the
block allocation, so that a journal commit will not include the inode in
case of block allocation failure.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

166348dd

ext4: Fix ext4 nomballoc allocator for ENOSPC · 68629f29

Aneesh Kumar K.V authored Sep 08, 2008

We run into ENOSPC errors on nomballoc ext4, even when there are free blocks
on the filesystem.

The patch includes two changes:

a) Set reservation to NULL if we are trying to allocate near group_target_block
from the goal group and the free block count in the group is less than the
window size. This should give us a better chance to allocate near
group_target_block. This also ensures that if we are not allocating near
group_target_block then we don't turn off reservation. This should enable us
to allocate with reservation from other groups that have a large free block
count.

b) We don't need to check the window size if the block reservation is off.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

68629f29

09 Oct, 2008 2 commits

ext4: Signed arithmetic fix · 5c791616

Aneesh Kumar K.V authored Oct 08, 2008

This patch converts some uses of ext4_fsblk_t to s64.  This is needed
so that some of the sign conversions work as expected in if conditions.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5c791616
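
The class of bug addressed by 5c791616 is easy to reproduce in userspace: when a subtraction is done in an unsigned 64-bit type, a result that should be negative wraps to a huge positive value, and a comparison in an if condition silently goes the wrong way. The helper names below are hypothetical; `uint64_t` and `int64_t` stand in for an unsigned block-count type and s64.

```c
#include <stdint.h>

/* Buggy variant: if reserved > free_blocks the difference wraps around. */
static int low_on_space_unsigned(uint64_t free_blocks, uint64_t reserved,
                                 uint64_t needed)
{
    return free_blocks - reserved < needed;
}

/* Fixed variant: signed arithmetic lets the difference go negative. */
static int low_on_space_signed(int64_t free_blocks, int64_t reserved,
                               int64_t needed)
{
    return free_blocks - reserved < needed;
}
```

With 5 free blocks, 10 reserved and 1 needed, the unsigned version reports that space is available while the signed version correctly reports a shortage.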

ext4: Switch to non delalloc mode when we are low on free blocks count. · 79f0be8d

Aneesh Kumar K.V authored Oct 08, 2008

The delayed allocation code allocates blocks during writepages(), which
can not handle block allocation failures.  To deal with this, we switch
away from delayed allocation mode when we are running low on free
blocks.  This also allows us to avoid needing to reserve a large number
of meta-data blocks in case all of the requested blocks are
discontiguous.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

79f0be8d

10 Oct, 2008 1 commit

ext4: Add percpu dirty block accounting. · 6bc6e63f

Aneesh Kumar K.V authored Oct 10, 2008

This patch adds dirty block accounting using percpu_counters.  Delayed
allocation block reservation is now done by updating the dirty block
counter.  In a later patch we switch to non delalloc mode if the
filesystem free blocks count is less than 150% of the total filesystem
dirty blocks.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6bc6e63f
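
A sketch of the 150% comparison mentioned in 6bc6e63f, written without floating point. It assumes the intent is to fall back to non-delayed allocation when free blocks drop below one and a half times the dirty blocks, which is consistent with the "low on free blocks" commit (79f0be8d) in this log; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when delayed allocation should be avoided for now. */
static int should_switch_to_nonda(int64_t free_blocks, int64_t dirty_blocks)
{
    /* free < 1.5 * dirty, expressed as 2 * free < 3 * dirty */
    return 2 * free_blocks < 3 * dirty_blocks;
}
```

With 50 dirty blocks the switch happens once fewer than 75 blocks are free.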

09 Sep, 2008 1 commit

ext4: Retry block reservation · 030ba6bc

Aneesh Kumar K.V authored Sep 08, 2008

During block reservation if we don't have enough blocks left, retry
block reservation with smaller block counts. This makes sure we try
fallocate and DIO with smaller request size and don't fail early. The
delayed allocation reservation cannot try with smaller block count. So
retry block reservation to handle temporary disk full conditions. Also
print free blocks details if we fail block allocation during writepages.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

030ba6bc
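
The "retry with smaller block counts" idea in 030ba6bc can be sketched as below. Halving the request on each failure is an assumption of this sketch, and the function name and the plain counter are hypothetical simplifications.

```c
#include <stdint.h>

/*
 * Try to claim up to `want` blocks from *free_blocks, shrinking the
 * request on failure. Returns the number of blocks actually claimed,
 * or 0 if not even a single block could be reserved.
 */
static uint64_t reserve_with_retry(uint64_t *free_blocks, uint64_t want)
{
    while (want > 0) {
        if (want <= *free_blocks) {
            *free_blocks -= want;
            return want;
        }
        want >>= 1;    /* retry with half the request */
    }
    return 0;
}
```

A caller such as fallocate or DIO can then proceed with the smaller extent and come back for the remainder instead of failing early.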

09 Oct, 2008 1 commit

ext4: Make sure all the block allocation paths reserve blocks · a30d542a

Aneesh Kumar K.V authored Oct 09, 2008

With delayed allocation we need to make sure blocks are reserved before
we attempt to allocate them. Otherwise we get block allocation failures
(ENOSPC) during writepages which cannot be handled. This would mean
silent data loss (we do a printk stating data will be lost). This patch
updates the DIO and fallocate code paths to do block reservation before
block allocation. This is needed to make sure parallel DIO and fallocate
requests don't take blocks out of the delayed reserve space.

When the free blocks count goes below a threshold we switch to a slow path
which looks at other CPUs' accumulated percpu counter values.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a30d542a
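
The fast path and slow path mentioned in a30d542a can be modelled with a counter that keeps a global value plus per-CPU deltas. Everything here (`struct pcpu_counter`, `NCPUS`, `WATERMARK`, the function names) is a hypothetical userspace model; it only shows why the precise sum is consulted once the cheap estimate gets close to the limit.

```c
#include <stdint.h>

#define NCPUS     4
#define WATERMARK 64   /* hypothetical bound on how far the estimate may drift */

struct pcpu_counter { int64_t count; int64_t delta[NCPUS]; };

/* Cheap: ignores deltas not yet folded into the global count. */
static int64_t counter_read_fast(const struct pcpu_counter *c)
{
    return c->count;
}

/* Precise: walks every CPU's delta. */
static int64_t counter_sum_slow(const struct pcpu_counter *c)
{
    int64_t sum = c->count;

    for (int i = 0; i < NCPUS; i++)
        sum += c->delta[i];
    return sum;
}

/* Returns 1 if nblocks can be reserved on top of the dirty blocks. */
static int has_free_blocks(const struct pcpu_counter *freeb,
                           const struct pcpu_counter *dirty, int64_t nblocks)
{
    int64_t free_est  = counter_read_fast(freeb);
    int64_t dirty_est = counter_read_fast(dirty);

    if (free_est - (nblocks + dirty_est) < WATERMARK) {
        /* Too close to call: pay for the accurate values. */
        free_est  = counter_sum_slow(freeb);
        dirty_est = counter_sum_slow(dirty);
    }
    return free_est >= nblocks + dirty_est;
}
```

Far from the limit only the global counts are read; near it, per-CPU deltas can change the answer, so they are summed.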

20 Aug, 2008 1 commit

ext4: invalidate pages if delalloc block allocation fails. · c4a0c46e

Aneesh Kumar K.V authored Aug 19, 2008

We are a bit aggressive in invalidating all the pages. But
it is ok because we really don't know why the block allocation
failed and it is better to come out of the writeback path
so that the user can look for more info.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

c4a0c46e

09 Sep, 2008 3 commits

ext4: Fix whitespace checkpatch warnings/errors · af5bc92d

Theodore Ts'o authored Sep 08, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

af5bc92d

ext4: Fix long long checkpatch warnings · e5f8eab8

Theodore Ts'o authored Sep 08, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e5f8eab8

ext4: Add printk priority levels to clean up checkpatch warnings · 4776004f

Theodore Ts'o authored Sep 08, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4776004f

09 Oct, 2008 9 commits

percpu counter: clean up percpu_counter_sum_and_set() · 1f7c14c6

Mingming Cao authored Oct 09, 2008

percpu_counter_sum_and_set() and percpu_counter_sum() are the same except
the former updates the global counter after accounting.  Since we are
taking the fbc->lock to calculate the precise value of the counter in
percpu_counter_sum() anyway, it should simply set fbc->count too, as
percpu_counter_sum_and_set() does.

This patch merges these two interfaces into one.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1f7c14c6
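
A userspace model of the merged behaviour described in 1f7c14c6: computing the precise sum also folds the per-CPU values back into the global count. The struct layout and `NCPUS` are hypothetical, locking is omitted (the real code holds fbc->lock), and zeroing the per-CPU slots is this model's way of keeping the total consistent after the fold.

```c
#include <stdint.h>

#define NCPUS 4

struct pcpu_counter {
    int64_t count;          /* global value */
    int32_t pcpu[NCPUS];    /* per-CPU deltas not yet folded in */
};

/* Return the precise value and leave it stored in fbc->count. */
static int64_t pcpu_counter_sum(struct pcpu_counter *fbc)
{
    int64_t ret = fbc->count;

    for (int cpu = 0; cpu < NCPUS; cpu++) {
        ret += fbc->pcpu[cpu];
        fbc->pcpu[cpu] = 0;   /* folded into the global count below */
    }
    fbc->count = ret;
    return ret;
}
```

After one call, a cheap read of the global count matches the precise sum until the per-CPU slots accumulate new deltas.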

Linux 2.6.27 · 3fa8749e

Linus Torvalds authored Oct 09, 2008

3fa8749e

Don't allow splice() to files opened with O_APPEND · efc968d4

Linus Torvalds authored Oct 09, 2008

This is debatable, but while we're debating it, let's disallow the
combination of splice and an O_APPEND destination.

It's not entirely clear what the semantics of O_APPEND should be, and
POSIX apparently expects pwrite() to ignore O_APPEND, for example. So
we could make up any semantics we want, including the old ones.

But Miklos convinced me that we should at least give it some thought,
and that accepting writes at arbitrary offsets is wrong at least for
IS_APPEND() files (which always have O_APPEND set, even if the reverse
isn't true: you can obviously have O_APPEND set on a regular file).

So disallow O_APPEND entirely for now. I doubt anybody cares, and this
way we have one less gray area to worry about.
Reported-and-argued-for-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Jens Axboe <ens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

efc968d4

Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6 · 07f40554

Linus Torvalds authored Oct 09, 2008

* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  hwmon: (abituguru3) Enable DMI probing feature on Abit AT8 32X
  hwmon: (abituguru3) Enable reading from AUX3 fan on Abit AT8 32X
  hwmon: (adt7473) Fix some bogosity in documentation file
  hwmon: Define sysfs interface for energy consumption register
  hwmon: (it87) Prevent power-off on Shuttle SN68PT
  eeepc-laptop: Fix hwmon interface

07f40554

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq · 9283dfed

Linus Torvalds authored Oct 09, 2008

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] correct broken links and email addresses

9283dfed

SLOB: fix bogus ksize calculation fix · 70096a56

Matt Mackall authored Oct 08, 2008

This fixes the previous fix, which was completely wrong on closer
inspection. This version has been manually tested with a user-space
test harness and generates sane values. A nearly identical patch has
been boot-tested.

The problem arose from changing how kmalloc/kfree handled alignment
padding without updating ksize to match. This brings it in sync.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

70096a56

[CPUFREQ] correct broken links and email addresses · 8d592257

Németh Márton authored Oct 09, 2008

Replace the no longer working links and email address in the
documentation and in source code.
Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Dave Jones <davej@redhat.com>

8d592257

hwmon: (abituguru3) Enable DMI probing feature on Abit AT8 32X · 5e5cddbc

Alistair John Strachan authored Oct 09, 2008

Enable driver checking of the DMI product name (when enabled) on
an Abit AT8 32X, instead of falling back to a manual probe. This
eliminates false negatives and eventually will help avoid
unnecessary bus probes on unsupported mainboards.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Tested-by: Daniel Exner <dex@dragonslave.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

5e5cddbc

hwmon: (abituguru3) Enable reading from AUX3 fan on Abit AT8 32X · 8748a71e

Alistair John Strachan authored Oct 09, 2008

The table for the Abit AT8 32X was incorrectly missing an entry
for the sixth ("AUX3") fan. Add this entry, exporting the fan
reading to userspace.

Closes lm-sensors.org ticket #2339.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Tested-by: Daniel Exner <dex@dragonslave.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

8748a71e