- 17 Mar, 2009 1 commit
-
-
Eric Sandeen authored
This is for Red Hat bug 490026: EXT4 panic, list corruption in ext4_mb_new_inode_pa.

ext4_lock_group(sb, group) is supposed to protect this list for each group, and a common code flow to remove a pa is like this:

    ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
    ext4_lock_group(sb, grp);
    list_del(&pa->pa_group_list);
    ext4_unlock_group(sb, grp);

So it's critical that we get the right group number back for this prealloc context, to lock the right group (the one associated with this pa) and prevent concurrent list manipulation.

However, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a comment, "-1 is to protect from crossing allocation group". This makes sense for the group_pa, where pa_pstart is advanced by the length which has been used (in ext4_mb_release_context()), and when the entire length has been used, pa_pstart has been advanced to the first block of the next group.

However, for inode_pa, pa_pstart is never advanced; it's just set once to the first block in the group and not moved after that. So in this case, if we subtract one in ext4_mb_put_pa(), we are actually locking the *previous* group, and opening the race with the other threads which do not subtract off the extra block.

Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
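The off-by-one described in the commit above can be illustrated with a small user-space sketch. `group_of_block()` is a hypothetical, simplified stand-in for `ext4_get_group_no_and_offset()`; the real helper takes a superblock and also returns the offset. It only shows why subtracting one from a `pa_pstart` that sits on the first block of a group yields the previous group's number.

```c
#include <stdint.h>

/*
 * Hypothetical simplification: map an absolute block number to the index
 * of the block group that contains it. first_data_block and
 * blocks_per_group stand in for the values ext4 keeps in its superblock.
 */
static uint32_t group_of_block(uint64_t block, uint64_t first_data_block,
                               uint32_t blocks_per_group)
{
    return (uint32_t)((block - first_data_block) / blocks_per_group);
}
```

With 32768 blocks per group and a first data block of 0, `group_of_block(3 * 32768, 0, 32768)` is group 3, while the same call with one block subtracted lands in group 2, so a caller that locks the result of the second call holds the wrong group's lock.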
- 14 Mar, 2009 1 commit
-
-
Eric Sandeen authored
Thiemo Nagel reported that:

    # dd if=/dev/zero of=image.ext4 bs=1M count=2
    # mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \
        -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4
    # mount -o loop image.ext4 mnt/
    # dd if=/dev/zero of=mnt/file

oopsed, with a BUG_ON in ext4_mb_normalize_request because size == EXT4_BLOCKS_PER_GROUP.

It appears to me (esp. after talking to Andreas) that the BUG_ON is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should be allowed, though larger sizes do indicate a problem.

Fix that and another (apparently rare) codepath with a similar check.

Reported-by:
Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 14 Feb, 2009 1 commit
-
-
Aneesh Kumar K.V authored
When creating a new ext4_prealloc_space structure, we have to initialize its list_head pointers before we add them to any prealloc lists. Otherwise, with list debug enabled, we will get list corruption warnings. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
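As a rough illustration of the list_head initialization described above, here is a user-space sketch of a circular doubly linked list. The names `list_node`, `prealloc_space`, and the helper functions are hypothetical stand-ins, not the kernel's `struct list_head` API or the real `ext4_prealloc_space` layout. The point is that a node initialized to point at itself can be added, removed, or tested safely, whereas an uninitialized node holds garbage pointers.

```c
#include <stdbool.h>

/* Hypothetical minimal list node, modelled loosely on a circular list. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Hypothetical stand-in for a preallocation descriptor with two lists. */
struct prealloc_space {
    struct list_node inode_list;
    struct list_node group_list;
};

/* Make the node a valid empty list: it points at itself. */
static void list_node_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert n right after head. */
static void list_node_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n and leave it self-pointing, so a second removal is harmless. */
static void list_node_del(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_node_init(n);
}

static bool list_node_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Initialize every embedded list node before the structure is published. */
static void prealloc_space_init(struct prealloc_space *pa)
{
    list_node_init(&pa->inode_list);
    list_node_init(&pa->group_list);
}
```

Calling `prealloc_space_init()` first means that even a node which was never added to any list can be passed to `list_node_del()` without touching invalid memory.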
- 10 Feb, 2009 1 commit
-
-
Aneesh Kumar K.V authored
We should not call ext4_mb_add_n_trim while holding alloc_semp.

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.29-rc4-git1-dirty #124
    ---------------------------------------------
    ffsb/3116 is trying to acquire lock:
     (&meta_group_info[i]->alloc_sem){----}, at: [<ffffffff8035a6e8>] ext4_mb_load_buddy+0xd2/0x343

    but task is already holding lock:
     (&meta_group_info[i]->alloc_sem){----}, at: [<ffffffff8035a6e8>] ext4_mb_load_buddy+0xd2/0x343

http://bugzilla.kernel.org/show_bug.cgi?id=12672

Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 27 Jan, 2009 1 commit
-
-
Thadeu Lima de Souza Cascardo authored
When bg_free_blocks_count was renamed to bg_free_blocks_count_lo in 560671a0, its uses under EXT4FS_DEBUG were not changed to the helper ext4_free_blks_count. Another commit, 498e5f24, also did not change everything needed under EXT4FS_DEBUG, thus making it spill some warnings related to printing format. This commit fixes both issues and makes ext4 build again when EXT4FS_DEBUG is enabled. Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 04 Jan, 2009 1 commit
-
-
Theodore Ts'o authored
Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 06 Jan, 2009 5 commits
-
-
Aneesh Kumar K.V authored
With nodelalloc option we need to update the dirty block counter on block allocation failure. This is needed because we increment the dirty block counter early in the block allocation phase. Without the patch s_dirty_blocks_counter goes wrong so that filesystem's free blocks decreases incorrectly. Tested-by:
Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
Aneesh Kumar K.V authored
We need to init the complete page during buddy cache init by setting the contents to '1'. Otherwise we can see the following errors after doing an online resize of the filesystem: EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used: Allocating block 1040385 in system zone of 127 group Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
Aneesh Kumar K.V authored
After we mark the blocks in the buddy cache as allocated, we need to ensure that we don't reinit the buddy cache until the block bitmap is updated. This commit achieves this by holding the group_info alloc_semaphore until ext4_mb_release_context(). Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
Aneesh Kumar K.V authored
We need to mark the block/inode bitmap beyond the end of the group with '1'. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
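The bitmap padding described in the commit above (setting every bit past the end of the group to '1') can be sketched as follows. `pad_bitmap_tail()` and `bitmap_bit_is_set()` are hypothetical user-space helpers using a simple byte-wise, least-significant-bit-first layout; they are not the kernel's own bit operations.

```c
/*
 * Hypothetical helper: set bits [start_bit, end_bit) in a byte-addressed
 * bitmap so that positions beyond the last valid block or inode of a
 * group always look "in use" to an allocator scanning for zero bits.
 */
static void pad_bitmap_tail(unsigned int start_bit, unsigned int end_bit,
                            unsigned char *bitmap)
{
    unsigned int i;

    for (i = start_bit; i < end_bit; i++)
        bitmap[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Hypothetical helper: return 1 if the given bit is set, else 0. */
static int bitmap_bit_is_set(const unsigned char *bitmap, unsigned int bit)
{
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}
```

For a group with 20 valid entries stored in a 32-bit bitmap, `pad_bitmap_tail(20, 32, bitmap)` leaves bits 0 to 19 untouched and marks bits 20 to 31 as used.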
Aneesh Kumar K.V authored
For uninit block group, the on-disk bitmap is not initialized. That implies we cannot depend on the uptodate flag on the bitmap buffer_head to find bitmap validity. Use a new buffer_head flag which would be set after we properly initialize the bitmap. This also prevents (re-)initializing the uninit group bitmap every time we call ext4_read_block_bitmap(). Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
- 04 Jan, 2009 1 commit
-
-
Aneesh Kumar K.V authored
Rename some variables. We also unlock locks in the reverse order we acquired as a part of cleanup. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 06 Jan, 2009 3 commits
-
-
Aneesh Kumar K.V authored
Rename the lower bits with suffix _lo and add helper to access the values. Also rename bg_itable_unused_hi to bg_pad as in e2fsprogs. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Aneesh Kumar K.V authored
We need to make sure we update the block bitmap and clear EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide whether to initialize the block bitmap each time it is called (introduced by commit c806e68f), and this can race with block allocations in ext4_mb_mark_diskspace_used().

ext4_read_block_bitmap does:

    spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
    if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
        ext4_init_block_bitmap(sb, bh, block_group, desc);

Now on the block allocation side we do:

    mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
                ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
    ....
    spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
    if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
        gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);

That is, on allocation we update the bitmap, then we take the sb_bgl_lock and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is that a parallel ext4_read_block_bitmap() zeroes out the bitmap in between the above mb_set_bits() and spin_lock(sb_bgl_lock(...)).

The race results in the user-visible errors below:

    EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
    EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..

Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
Aneesh Kumar K.V authored
The mballoc code likes to call ext4_error while it is holding locked block groups. This can cause a "scheduling in atomic context" BUG. We can't just unlock the block group and relock it after/if ext4_error returns, since that might result in race conditions in the case where the filesystem is set to continue after finding errors. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 24 Nov, 2008 1 commit
-
-
Aneesh Kumar K.V authored
In ext4_mb_init_group(), if the filesystem block size is less than PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block groups in a loop. We need to allow for this by using down_write_nested() and passing in the loop index as a lock subclass number. This works because no other code path needs to take multiple alloc_sem's. Note that lockdep will fail for a filesystem blocksize smaller than PAGE_SIZE/16 (e.g., a 1k filesystem blocksize with a 32k page size, or a 2k filesystem blocksize with a 64k page size, etc.). Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
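The arithmetic behind the lockdep caveat in the commit above can be sketched in a few lines. Two things here are assumptions rather than statements from the commit text: that the buddy cache uses two blocks per group (which matches the PAGE_SIZE/2 threshold the message mentions), and that lockdep allows 8 lock subclasses. The function names are hypothetical.

```c
#include <stdbool.h>

/* Assumption: lockdep supports this many subclasses per lock class. */
#define SKETCH_LOCKDEP_SUBCLASSES 8u

/*
 * Assumption: each group takes two blocks in the buddy cache, so one
 * page covers page_size / blocksize / 2 groups (at least one).
 */
static unsigned int groups_per_page(unsigned int page_size,
                                    unsigned int blocksize)
{
    unsigned int groups = page_size / blocksize / 2;

    return groups ? groups : 1;
}

/* True if locking every group on a page needs more subclasses than allowed. */
static bool exceeds_lockdep_subclasses(unsigned int page_size,
                                       unsigned int blocksize)
{
    return groups_per_page(page_size, blocksize) > SKETCH_LOCKDEP_SUBCLASSES;
}
```

Under these assumptions, a 1k blocksize on a 32k page gives 16 groups per page, which is over the limit, while a 1k blocksize on a 4k page gives 2 groups and stays well within it.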
- 06 Jan, 2009 1 commit
-
-
Aneesh Kumar K.V authored
When we generate the buddy cache (especially during resize) we need to make sure we don't use blocks that have been freed but not yet committed. This makes sure we have the right value of free blocks count in the group info and also in the bitmap. This also ensures ordered mode consistency. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
- 25 Nov, 2008 1 commit
-
-
Aneesh Kumar K.V authored
Move some of the forward declaration of the static functions to mballoc.c where they are used. This enables us to include mballoc.h in other .c files. Also correct the buddy cache documentation. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 06 Jan, 2009 1 commit
-
-
Aneesh Kumar K.V authored
The new groups added during resize are flagged as need_init groups. Make sure we properly initialize these groups. When we have block size < page size and we are adding new groups, the page may still be marked uptodate even though we haven't initialized the group. While forcing the init of the buddy cache we need to make sure other groups that are part of the same buddy cache page are not using the cache. group_info->alloc_sem is added to ensure this. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> cc: stable@kernel.org
-
- 22 Nov, 2008 1 commit
-
-
Aneesh Kumar K.V authored
* Change EXT4_HAS_*_FEATURE to return a boolean
* Add a function prototype for ext4_fiemap() in ext4.h
* Make ext4_ext_fiemap_cb() and ext4_xattr_fiemap() be static functions
* Add lock annotations to mb_free_blocks()

Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 05 Nov, 2008 1 commit
-
-
Theodore Ts'o authored
Convert the unsigned longs that are most responsible for bloating the stack usage on 64-bit systems. Nearly all places in the ext3/4 code which use "unsigned long" are probably bugs, since on 32-bit systems a ulong is 32 bits, which means we are wasting stack space on 64-bit systems. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 06 Jan, 2009 3 commits
-
-
Theodore Ts'o authored
Nearly all places in the ext3/4 code which use "unsigned long" are probably bugs, since on 32-bit systems a ulong is 32 bits, which means we are wasting stack space on 64-bit systems. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Aneesh Kumar K.V authored
We can call ext4_mb_check_limits even after successfully allocating the requested blocks. In that case, make sure we don't overwrite ac_status if it already has the status AC_STATUS_FOUND. This fixes the lockdep warning:

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.28-rc6-autokern1 #1
    ---------------------------------------------
    fsstress/11948 is trying to acquire lock:
     (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
    .....
    stack backtrace:
    .....
     [<c04db974>] ext4_mb_regular_allocator+0xbb5/0xd44
    .....
    but task is already holding lock:
     (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278

Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
Theodore Ts'o authored
This removes annoying blank syslog entries emitted by ext4_error() or ext4_warning(), since these functions add their own newline. Signed-off-by:
Nick Warne <nick@ukfsn.org> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 07 Jan, 2009 1 commit
-
-
Frank Mayhar authored
A few weeks ago I posted a patch for discussion that allowed ext4 to run without a journal. Since that time I've integrated the excellent comments from Andreas and fixed several serious bugs. We're currently running with this patch and generating some performance numbers against both ext2 (with backported reservations code) and ext4 with and without a journal. It just so happens that running without a journal is slightly faster for most everything.

We did

    iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2

which creates 4 threads, each of which creates a 2G file and does reads and writes on it, with a buffer size of 256K, using O_DIRECT for all file opens to bypass the page cache. Results:

                     ext2        ext4, default   ext4, no journal
    initial writes   13.0 MB/s   15.4 MB/s       15.7 MB/s
    rewrites         13.1 MB/s   15.6 MB/s       15.9 MB/s
    reads            15.2 MB/s   16.9 MB/s       17.2 MB/s
    re-reads         15.3 MB/s   16.9 MB/s       17.2 MB/s
    random readers    5.6 MB/s    5.6 MB/s        5.7 MB/s
    random writers    5.1 MB/s    5.3 MB/s        5.4 MB/s

So it seems that, so far, this was a useful exercise.

Signed-off-by:
Frank Mayhar <fmayhar@google.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 17 Dec, 2008 1 commit
-
-
Yasunori Goto authored
I chased the cause of the following ext4 oops report, which was observed on an ia64 box:

http://bugzilla.kernel.org/show_bug.cgi?id=12018

The cause is the size of the s_mb_maxs array, which is defined as "unsigned short" in the ext4_sb_info structure. If the file system's block size is 8k or greater, an unsigned short is not wide enough to contain the value fs->blocksize << 3.

Signed-off-by:
Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: stable@kernel.org
-
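The overflow described in the commit above comes down to simple arithmetic, sketched here in user space. The helper names are hypothetical, and the sketch assumes the common case of a 16-bit `unsigned short`, where `USHRT_MAX` is 65535.

```c
#include <limits.h>
#include <stdbool.h>

/* Number of bits held by one bitmap block: blocksize << 3. */
static unsigned int bits_per_block(unsigned int blocksize)
{
    return blocksize << 3;
}

/* True if the value can be stored in an unsigned short without wrapping. */
static bool fits_in_ushort(unsigned int value)
{
    return value <= USHRT_MAX;
}
```

A 4k block gives 32768 bits, which fits; an 8k block gives 65536, one more than a 16-bit `unsigned short` can hold, so the stored value would wrap to 0.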
- 04 Nov, 2008 1 commit
-
-
Aneesh Kumar K.V authored
If we try to free a block which is already freed, the code was returning without first unlocking the group. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 17 Oct, 2008 2 commits
-
-
Theodore Ts'o authored
The multiblock allocator needs to be able to release blocks (and issue a blkdev discard request) when the transaction which freed those blocks is committed. Previously this was done via a polling mechanism when blocks are allocated or freed. A much better way of doing things is to create a jbd2 callback function and attach the list of blocks to be freed directly to the transaction structure. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Manish Katiyar authored
Signed-off-by:
Manish Katiyar <mkatiyar@gmail.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 16 Oct, 2008 2 commits
-
-
Theodore Ts'o authored
Let the block device know when unused blocks can be discarded, using the new sb_issue_discard() interface. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Aneesh Kumar K.V authored
With this patch we track the blocks freed during a transaction using a red-black tree. We also make sure contiguous freed blocks are collected in one node in the tree. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
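The merging of contiguous freed blocks described above can be sketched in user space. This sketch deliberately swaps the red-black tree for a small sorted array to stay short and self-contained, so it illustrates only the coalescing rule, not the tree handling. All names (`free_extent`, `free_set`, `free_set_add`) are hypothetical, and the code assumes freed ranges never overlap.

```c
#include <stddef.h>
#include <string.h>

#define MAX_EXTENTS 16

/* One run of contiguous freed blocks: [start, start + count). */
struct free_extent {
    unsigned int start;
    unsigned int count;
};

/* Sorted-array stand-in for the tree of freed extents. */
struct free_set {
    struct free_extent ext[MAX_EXTENTS];
    size_t n;
};

/*
 * Record a freed range, merging it with a neighbour on either side when
 * the ranges touch. Returns 0 on success, -1 if the array is full.
 */
static int free_set_add(struct free_set *s, unsigned int start,
                        unsigned int count)
{
    size_t i = 0;
    int merge_prev, merge_next;

    /* Find the first stored extent that begins after the new range. */
    while (i < s->n && s->ext[i].start < start)
        i++;

    merge_prev = i > 0 && s->ext[i - 1].start + s->ext[i - 1].count == start;
    merge_next = i < s->n && start + count == s->ext[i].start;

    if (merge_prev && merge_next) {
        /* The new range bridges two extents: fold all three into one. */
        s->ext[i - 1].count += count + s->ext[i].count;
        memmove(&s->ext[i], &s->ext[i + 1],
                (s->n - i - 1) * sizeof(s->ext[0]));
        s->n--;
    } else if (merge_prev) {
        s->ext[i - 1].count += count;
    } else if (merge_next) {
        s->ext[i].start = start;
        s->ext[i].count += count;
    } else {
        if (s->n == MAX_EXTENTS)
            return -1;
        memmove(&s->ext[i + 1], &s->ext[i],
                (s->n - i) * sizeof(s->ext[0]));
        s->ext[i].start = start;
        s->ext[i].count = count;
        s->n++;
    }
    return 0;
}
```

Freeing blocks 10 to 14, then 20 to 24, then 15 to 19 ends with a single extent covering blocks 10 to 24, which is the "one node" behaviour the commit message describes.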
-
- 13 Oct, 2008 1 commit
-
-
Aneesh Kumar K.V authored
We should use kmem_cache_free to free memory allocated via kmem_cache_alloc Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 10 Oct, 2008 2 commits
-
-
Frederic Bohe authored
This fixes a bug which caused on-line resizing of filesystems with a 1k blocksize to fail. The root cause of this bug was the fact that if an uninitialized bitmap block gets read in by userspace (which e2fsprogs does try to avoid, but can happen when the blocksize is less than the pagesize and an adjacent block is read into memory), ext4_read_block_bitmap() was erroneously depending on the buffer uptodate flag to decide whether it needed to initialize the bitmap block in memory, i.e., to set the standard set of blocks in use by a block group (superblock, bitmaps, inode table, etc.). Essentially, ext4_read_block_bitmap() assumed it was the only routine that might try to read a block containing a block bitmap, which is simply not true. To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap() must always initialize uninitialized bitmap blocks. Once a block or inode is allocated out of that bitmap, it will be marked as initialized in the block group descriptor, so in general this won't result in any extra unnecessary work. Signed-off-by:
Frederic Bohe <frederic.bohe@bull.net> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Theodore Ts'o authored
Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 23 Sep, 2008 2 commits
-
-
Theodore Ts'o authored
Previously mballoc created a separate set of functions for each proc file. This combines the tunables into a single set of functions which gets used for all of the per-superblock proc files, saving approximately 2k of compiled object code. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Theodore Ts'o authored
...and into the core setup/teardown code in fs/ext4/super.c so that other parts of ext4 can define tuning parameters. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 13 Sep, 2008 1 commit
-
-
Eric Sandeen authored
lg_prealloc_list seems to cry out for a per-cpu data structure; on a large smp system I think this should be better. I've lightly tested this change on a 4-cpu system. Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Acked-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 14 Sep, 2008 1 commit
-
-
Alexey Dobriyan authored
ext4 creates a per-superblock directory in /proc/ext4/. The name used as its basis is taken from bdevname, which, surprise, can contain a slash. However, proc, while allowing the proc_create("a/b", parent) form of PDE creation, assumes that parent/a was already created. The bdevname in question is 'cciss/c0d0p9'; the directory is not created and all this stuff goes directly into /proc (which is the real bug). The warning comes when the _second_ partition is mounted.

http://bugzilla.kernel.org/show_bug.cgi?id=11321

Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
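One way to avoid the problem described above is to make sure the directory name never contains a slash. The commit text does not say how the fix was done, so the following is only a sketch of one plausible approach: `sanitize_dir_name()` is a hypothetical helper, and the choice of '!' as the replacement character is an assumption.

```c
#include <string.h>

/*
 * Hypothetical helper: replace every '/' in a device name in place, so
 * the result can be used as a single path component.
 */
static void sanitize_dir_name(char *name)
{
    char *p;

    while ((p = strchr(name, '/')) != NULL)
        *p = '!';
}
```

Applied to a writable copy of `cciss/c0d0p9`, this yields `cciss!c0d0p9`, which no longer implies a parent directory that was never created.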
- 10 Oct, 2008 1 commit
-
-
Aneesh Kumar K.V authored
This patch adds dirty block accounting using percpu_counters. Delayed allocation block reservation is now done by updating dirty block counter. In a later patch we switch to non delalloc mode if the filesystem free blocks is greater than 150% of total filesystem dirty blocks Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 09 Sep, 2008 1 commit
-
-
Aneesh Kumar K.V authored
During block reservation if we don't have enough blocks left, retry block reservation with smaller block counts. This makes sure we try fallocate and DIO with smaller request size and don't fail early. The delayed allocation reservation cannot try with smaller block count. So retry block reservation to handle temporary disk full conditions. Also print free blocks details if we fail block allocation during writepages. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Mingming Cao <cmm@us.ibm.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
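The retry-with-a-smaller-count behaviour described in the commit above can be sketched as a loop that shrinks the request until it fits. Halving the request on each failure is an assumption made for illustration; the commit text only says "smaller block counts". `block_pool`, `pool_claim()`, and `claim_with_retry()` are hypothetical names, not ext4 functions.

```c
/* Hypothetical stand-in for the filesystem's free block accounting. */
struct block_pool {
    unsigned int free_blocks;
};

/* Try to reserve nblocks; return 0 on success, -1 if not enough are free. */
static int pool_claim(struct block_pool *pool, unsigned int nblocks)
{
    if (nblocks > pool->free_blocks)
        return -1;
    pool->free_blocks -= nblocks;
    return 0;
}

/*
 * Reserve up to `want` blocks, halving the request after each failure.
 * Returns the number of blocks actually reserved (0 if nothing fits).
 */
static unsigned int claim_with_retry(struct block_pool *pool,
                                     unsigned int want)
{
    while (want > 0 && pool_claim(pool, want) != 0)
        want >>= 1;
    return want;
}
```

With 100 free blocks, a request for 256 fails at 256 and 128 and succeeds at 64, so the caller proceeds with a shorter allocation instead of failing outright. As the commit notes, this shrinking only suits callers such as fallocate and DIO that can accept a smaller request; delayed allocation cannot.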