  1. 15 May, 2009 1 commit
  2. 03 May, 2009 1 commit
  3. 02 May, 2009 1 commit
  4. 01 May, 2009 2 commits
    • ext4: Don't avoid using BLOCK_UNINIT block groups in mballoc · 75507efb
      Theodore Ts'o authored
      By avoiding the use of not-yet-used block groups (i.e., block groups
      with the BLOCK_UNINIT flag), mballoc had a tendency to create large
      files with large non-contiguous gaps.  In addition, avoiding the use of
      new block groups had a tendency to push regular file data into the
      first block group in a flex_bg group, which slows down e2fsck pass 2,
      since it has to seek much more.  For example:
      
                     Before Patch                       After Patch
                    Time in seconds                   Time in seconds
                  Real /  User/  Sys   MB/s      Real /  User/  Sys    MB/s
      Pass 1      8.52 / 2.21 / 0.46  20.43      8.84 / 4.97 / 1.11   19.68
      Pass 2     21.16 / 1.02 / 1.86  11.30      6.54 / 1.77 / 1.78   36.39
      Pass 3      0.01 / 0.00 / 0.00 139.00      0.01 / 0.01 / 0.00  128.90
      Pass 4      0.16 / 0.15 / 0.00   0.00      0.17 / 0.17 / 0.00    0.00
      Pass 5      2.52 / 1.99 / 0.09   0.79      2.31 / 1.78 / 0.06    0.86
      Total      32.40 / 5.11 / 2.49  12.81     17.99 / 8.75 / 2.98   23.01
      
      This was on a sample 80 gig root filesystem which was approximately
      50% full.  Note the improved e2fsck pass 2 performance, by over a
      factor of 3, due to a decreased number of seeks.  (The total amount of
      I/O in pass 2 was unchanged; the layout of the directory blocks was
      simply much better from e2fsck's perspective.)
      
      Other changes as a result of this patch on this sample filesystem:
      
                                   Before Patch    After Patch
      # of non-contig files           762             779
      # of non-contig directories     571             570
      # of BLOCK_UNINIT bg's          307             293
      # of INODE_UNINIT bg's          503             503
      
      Out of 640 block groups, of which 333 were in use, this patch caused
      an extra 14 block groups to be utilized.  The number of non-contiguous
      files did go up slightly, but when measured against the 99.9% of the
      files (603,154) which were contiguously allocated, this is pretty
      insignificant.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andreas Dilger <adilger@sun.com>
    • ext4: Avoid races caused by on-line resizing and SMP memory reordering · 8df9675f
      Theodore Ts'o authored
      Ext4's on-line resizing adds a new block group and then, only at the
      last step, adjusts s_groups_count.  However, it's possible on SMP
      systems that another CPU could see the updated s_groups_count and
      not see the newly initialized data structures for the just-added block
      group.  For this reason, it's important to insert an SMP read barrier
      after reading s_groups_count and before reading any data (for example,
      the new block group descriptors) made accessible by the increased
      value of s_groups_count.
      
      Unfortunately, we rather blatantly violate this locking protocol
      documented in fs/ext4/resize.c.  Fortunately, (1) on-line resizes
      happen relatively rarely, and (2) it seems rare that the filesystem
      code will immediately try to use a just-added block group before any
      memory ordering issues resolve themselves.  So apparently problems
      here are relatively hard to hit, since ext3 has been vulnerable to the
      same issue for years with no one apparently complaining.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
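The ordering requirement described in this message can be sketched in userspace C. This is only an analogue, not the kernel code: it uses C11 release/acquire atomics where the kernel would pair a write barrier in the resizer with a read barrier in the reader, and `groups`, `groups_count`, `add_group()` and `get_group()` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_GROUPS 8    /* hypothetical capacity for this sketch */

struct group_desc {
    unsigned int free_blocks;   /* stand-in for real descriptor contents */
};

static struct group_desc groups[MAX_GROUPS];
static atomic_uint groups_count;    /* plays the role of s_groups_count */

/* Resizer: initialise the new descriptor first, then publish the new count. */
static int add_group(unsigned int free_blocks)
{
    unsigned int n = atomic_load_explicit(&groups_count, memory_order_relaxed);

    if (n >= MAX_GROUPS)
        return -1;
    groups[n].free_blocks = free_blocks;
    /* Release: the descriptor write above becomes visible before the count. */
    atomic_store_explicit(&groups_count, n + 1, memory_order_release);
    return 0;
}

/* Reader: load the count with acquire semantics before touching descriptors. */
static const struct group_desc *get_group(unsigned int group)
{
    unsigned int n = atomic_load_explicit(&groups_count, memory_order_acquire);

    if (group >= n)
        return NULL;
    return &groups[group];
}
```

A reader that observes the incremented count through `get_group()` is then guaranteed to also observe the descriptor contents written before the count was published, which is the property the commit message says the read barrier is needed for.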
  5. 27 Mar, 2009 3 commits
  6. 26 Mar, 2009 2 commits
  7. 17 Mar, 2009 1 commit
    • ext4: fix bb_prealloc_list corruption due to wrong group locking · d33a1976
      Eric Sandeen authored
      This is for Red Hat bug 490026: EXT4 panic, list corruption in
      ext4_mb_new_inode_pa
      
      ext4_lock_group(sb, group) is supposed to protect this list for
      each group, and a common code flow to remove a pa (preallocation)
      from the list is like this:
      
          ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
          ext4_lock_group(sb, grp);
          list_del(&pa->pa_group_list);
          ext4_unlock_group(sb, grp);
      
      so it's critical that we get the right group number back for
      this prealloc context, to lock the right group (the one 
      associated with this pa) and prevent concurrent list manipulation.
      
      However, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a
      comment, "-1 is to protect from crossing allocation group".
      
      This makes sense for the group_pa, where pa_pstart is advanced
      by the length which has been used (in ext4_mb_release_context()),
      and when the entire length has been used, pa_pstart has been
      advanced to the first block of the next group.
      
      However, for inode_pa, pa_pstart is never advanced; it's just
      set once to the first block in the group and not moved after
      that.  So in this case, if we subtract one in ext4_mb_put_pa(),
      we are actually locking the *previous* group, and opening up a
      race with the other threads, which do not subtract off the extra
      block.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
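The off-by-one described in this message can be reproduced with plain integer arithmetic. The sketch below is a simplified stand-in for the group lookup, not the kernel's ext4_get_group_no_and_offset(); `group_of_block()` and its parameters are hypothetical, and the layout is assumed to be fixed-size groups starting at `first_data_block`.

```c
#include <assert.h>

typedef unsigned long long blk_t;

/* Map a physical block number to the block group that contains it. */
static unsigned int group_of_block(blk_t block, blk_t first_data_block,
                                   unsigned int blocks_per_group)
{
    return (unsigned int)((block - first_data_block) / blocks_per_group);
}
```

With 32768 blocks per group, a pa whose pa_pstart sits on the first block of group 3 maps to group 3, while `pa_pstart - 1` maps to group 2. Subtracting one is therefore only harmless when pa_pstart is not on a group's first block or has been advanced past the end of its group, which is the group_pa case the message describes.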
  8. 14 Mar, 2009 1 commit
    • ext4: fix bogus BUG_ONs in mballoc code · 8d03c7a0
      Eric Sandeen authored
      Thiemo Nagel reported that:
      
      # dd if=/dev/zero of=image.ext4 bs=1M count=2
      # mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \
        -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4
      # mount -o loop image.ext4 mnt/
      # dd if=/dev/zero of=mnt/file
      
      oopsed, with a BUG_ON in ext4_mb_normalize_request because
      size == EXT4_BLOCKS_PER_GROUP
      
      It appears to me (esp. after talking to Andreas) that the BUG_ON
      is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should
      be allowed, though larger sizes do indicate a problem.
      
      Fix that and another (apparently rare) codepath with a similar check.
      Reported-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
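The boundary condition at issue can be written as a small predicate. This is a sketch of the intended bound only, not the patched kernel code: `request_size_valid()` is a hypothetical helper, and the lower bound (`size > 0`) is an assumption added for completeness that the message itself does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* A request is acceptable up to and including one full block group. */
static bool request_size_valid(long size, long blocks_per_group)
{
    return size > 0 && size <= blocks_per_group;
}
```

For the reported reproducer (mkfs.ext4 -g 512), a request of exactly 512 blocks passes this check, while 513 blocks would still be flagged as a problem.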
  9. 05 Mar, 2009 1 commit
  10. 31 Mar, 2009 1 commit
  11. 12 Mar, 2009 1 commit
    • ext4: New inode/block allocation algorithms for flex_bg filesystems · a4912123
      Theodore Ts'o authored
      The find_group_flex() inode allocator is now only used if the
      filesystem is mounted using the "oldalloc" mount option.  It is
      replaced with the original Orlov allocator that has been updated for
      flex_bg filesystems (it should behave the same way if flex_bg is
      disabled).  The inode allocator now functions by taking into account
      each flex_bg group, instead of each block group, when deciding whether
      or not it's time to allocate a new directory into a fresh flex_bg.
      
      The block allocator has also been changed so that the first block
      group in each flex_bg is preferred for use for storing directory
      blocks.  This keeps directory blocks close together, which is good for
      speeding up e2fsck since large directories are more likely to look
      like this:
      
      debugfs:  stat /home/tytso/Maildir/cur
      Inode: 1844562   Type: directory    Mode:  0700   Flags: 0x81000
      Generation: 1132745781    Version: 0x00000000:0000ad71
      User: 15806   Group: 15806   Size: 1060864
      File ACL: 0    Directory ACL: 0
      Links: 2   Blockcount: 2072
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x499c0ff4:164961f4 -- Wed Feb 18 08:41:08 2009
       atime: 0x499c0ff4:00000000 -- Wed Feb 18 08:41:08 2009
       mtime: 0x49957f51:00000000 -- Fri Feb 13 09:10:25 2009
      crtime: 0x499c0f57:00d51440 -- Wed Feb 18 08:38:31 2009
      Size of extra inode fields: 28
      BLOCKS:
      (0):7348651, (1-258):7348654-7348911
      TOTAL: 259
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  12. 14 Feb, 2009 1 commit
  13. 10 Feb, 2009 1 commit
  14. 27 Jan, 2009 1 commit
  15. 04 Jan, 2009 1 commit
  16. 06 Jan, 2009 5 commits
  17. 04 Jan, 2009 1 commit
  18. 06 Jan, 2009 3 commits
    • ext4: Use high 16 bits of the block group descriptor's free counts fields · 560671a0
      Aneesh Kumar K.V authored
      Rename the fields holding the lower bits with a _lo suffix and add
      helpers to access the values. Also rename bg_itable_unused_hi
      to bg_pad as in e2fsprogs.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
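The kind of helper this message refers to can be illustrated as follows. This is a simplified sketch: `struct group_desc_counts`, `free_blocks_count()` and `free_blocks_set()` are hypothetical names, and the sketch ignores the on-disk little-endian conversion and any check of the descriptor size that real code would need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor fragment with a split 32-bit counter. */
struct group_desc_counts {
    uint16_t free_blocks_count_lo;  /* low 16 bits */
    uint16_t free_blocks_count_hi;  /* high 16 bits */
};

/* Combine the two 16-bit halves into one 32-bit count. */
static uint32_t free_blocks_count(const struct group_desc_counts *gd)
{
    return (uint32_t)gd->free_blocks_count_lo |
           ((uint32_t)gd->free_blocks_count_hi << 16);
}

/* Split a 32-bit count back into its two 16-bit halves. */
static void free_blocks_set(struct group_desc_counts *gd, uint32_t count)
{
    gd->free_blocks_count_lo = (uint16_t)(count & 0xffff);
    gd->free_blocks_count_hi = (uint16_t)(count >> 16);
}
```

Routing every access through such helpers means callers never touch the _lo field directly, so counts above 65535 are not silently truncated.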
    • ext4: Fix race between read_block_bitmap() and mark_diskspace_used() · e8134b27
      Aneesh Kumar K.V authored
      We need to make sure we update the block bitmap and clear
      EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
      ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
      whether to initialize the block bitmap each time it is called
      (introduced by commit c806e68f), and this can race with block
      allocations in ext4_mb_mark_diskspace_used().
      
      ext4_read_block_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	ext4_init_block_bitmap(sb, bh, block_group, desc);
      
      Now on the block allocation side we do
      
      mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
      			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
      ....
      spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
      if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
      
      i.e., on allocation we update the bitmap, then take the sb_bgl_lock
      and clear the EXT4_BG_BLOCK_UNINIT flag.  What can happen is that a
      parallel ext4_read_block_bitmap() can zero out the bitmap in between
      the above mb_set_bits() and spin_lock(sb_bgl_lock(..)).
      
      The race results in the user-visible errors below:
      EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
      EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
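The locking rule stated at the top of this message (update the bitmap and clear the UNINIT flag under the same lock) can be sketched in userspace C. Everything here is a hypothetical stand-in: `struct group`, the spinlock built on `atomic_flag`, and the function names are invented for the illustration, and the lazy initialisation is reduced to zeroing the bitmap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define BG_BLOCK_UNINIT 0x0002u /* hypothetical flag value for this sketch */
#define BITMAP_BYTES    16

struct group {
    atomic_flag lock;           /* stand-in for the per-group spinlock */
    unsigned int flags;
    unsigned char bitmap[BITMAP_BYTES];
};

static void group_init(struct group *g)
{
    atomic_flag_clear(&g->lock);
    g->flags = BG_BLOCK_UNINIT;
    memset(g->bitmap, 0, sizeof(g->bitmap));
}

static void group_lock(struct group *g)
{
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;   /* spin until the lock is released */
}

static void group_unlock(struct group *g)
{
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/* Reader side: (re)initialise the bitmap only while the group is UNINIT. */
static void read_block_bitmap(struct group *g)
{
    group_lock(g);
    if (g->flags & BG_BLOCK_UNINIT)
        memset(g->bitmap, 0, sizeof(g->bitmap));
    group_unlock(g);
}

/* Allocation side: set the bits AND clear the flag in one critical section. */
static void mark_blocks_used(struct group *g, unsigned int start,
                             unsigned int len)
{
    assert(start + len <= BITMAP_BYTES * 8);
    group_lock(g);
    for (unsigned int i = start; i < start + len; i++)
        g->bitmap[i / 8] |= (unsigned char)(1u << (i % 8));
    g->flags &= ~BG_BLOCK_UNINIT;
    group_unlock(g);
}
```

Because the flag is cleared before the lock is dropped, a later `read_block_bitmap()` can no longer see UNINIT and wipe bits that an allocation has just set.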
    • ext4: fix BUG when calling ext4_error with locked block group · 5d1b1b3f
      Aneesh Kumar K.V authored
      The mballoc code likes to call ext4_error while it is holding locked
      block groups.  This can cause a scheduling-in-atomic-context BUG.  We
      can't just unlock the block group and relock it after/if ext4_error
      returns since that might result in race conditions in the case where
      the filesystem is set to continue after finding errors.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  19. 24 Nov, 2008 1 commit
    • ext4: Fix lockdep recursive locking warning · b7be019e
      Aneesh Kumar K.V authored
      In ext4_mb_init_group(), if the filesystem block size is less than
      PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
      groups in a loop.  We need to allow for this by using
      down_write_nested() and passing in the loop index as a lock subclass
      number.  This works because no other code path needs to take multiple
      alloc_sem's.  Note that lockdep will fail for a filesystem blocksize
      smaller than PAGE_SIZE/16 (e.g., a 1k filesystem blocksize with
      a 32k page size, or a 2k filesystem blocksize with a 64k page size,
      etc.).
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
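The block-size limits quoted in this message can be checked with a little arithmetic. The sketch below rests on two assumptions that the message does not spell out: that each block group occupies two blocks of a page (so a page covers PAGE_SIZE / blocksize / 2 groups), and that lockdep provides 8 lock subclasses. Both constants and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define LOCKDEP_SUBCLASSES 8u   /* assumption: number of lockdep subclasses */

/* Number of block groups whose alloc_sem would be taken for one page. */
static unsigned int groups_per_page(unsigned int page_size,
                                    unsigned int block_size)
{
    unsigned int blocks_per_page = page_size / block_size;
    unsigned int groups = blocks_per_page / 2;  /* assumption: 2 blocks/group */

    return groups ? groups : 1;
}

/* True if using the loop index as the subclass stays within lockdep's range. */
static bool subclasses_suffice(unsigned int page_size, unsigned int block_size)
{
    return groups_per_page(page_size, block_size) <= LOCKDEP_SUBCLASSES;
}
```

Under these assumptions a 1k blocksize on 4k pages needs 2 subclasses, a 4k blocksize on 64k pages needs exactly 8, and both examples given in the message (1k on 32k pages, 2k on 64k pages) need 16, which exceeds the limit.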
  20. 06 Jan, 2009 1 commit
  21. 25 Nov, 2008 1 commit
  22. 06 Jan, 2009 1 commit
  23. 22 Nov, 2008 1 commit
  24. 05 Nov, 2008 1 commit
    • ext4: Change unsigned long to unsigned int · 498e5f24
      Theodore Ts'o authored
      Convert the unsigned longs that are most responsible for bloating the
      stack usage on 64-bit systems.
      
      Nearly every place in the ext3/4 code which uses "unsigned long" is
      probably a bug, since on 32-bit systems a ulong is 32 bits, which means
      we are wasting stack space on 64-bit systems.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  25. 06 Jan, 2009 3 commits
  26. 07 Jan, 2009 1 commit
    • ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar authored
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates and does reads and writes
      on a 2G file, with a buffer size of 256K, using O_DIRECT for all file
      opens to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  27. 17 Dec, 2008 1 commit
  28. 04 Nov, 2008 1 commit