1. 11 Dec, 2009 7 commits
    • xfs: Wrapped journal record corruption on read at recovery · fc5bc4c8
      Andy Poling authored
      Summary of problem:
      
      If a journal record wraps at the physical end of the journal, it has to be
      read in two parts in xlog_do_recovery_pass(): a read at the physical end and a
      read at the physical beginning.  If xlog_bread() has to re-align the first
      read, the second read request does not take that re-alignment into account.
      If the first read was re-aligned, the second read over-writes the end of the
      data from the first read, effectively corrupting it.  This can happen either
      when reading the record header or reading the record data.
      
      The first sanity check in xlog_recover_process_data() is to check for a valid
      clientid, so that is the error reported.
      
      Summary of fix:
      
      If there was a first read at the physical end, XFS_BUF_PTR() returns where the
      data was requested to begin.  Conversely, because it is the result of
      xlog_align(), offset indicates where the requested data for the first read
      actually begins - whether or not xlog_bread() has re-aligned it.
      
      Using offset as the base for the calculation of where to place the second read
      data ensures that it will be correctly placed immediately following the data
      from the first read instead of sometimes over-writing the end of it.
      
      The attached patch has resolved the reported problem of occasional inability
      to recover the journal (reporting "bad clientid").
      Signed-off-by: Andy Poling <andy@realbig.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
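The placement problem described in this commit can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the kernel code: the log is a Python list of block labels, `bread()` is a hypothetical stand-in for a read that rounds its start down to a sector boundary and reports where the requested data begins, and the `use_offset` switch selects between the old and the fixed base for the second read. It also assumes the second read (at block 0) needs no re-alignment.

```python
# Toy model of reading a journal record that wraps at the physical end.
# All names here are illustrative; units are blocks, not bytes.

LOG = [f"b{i}" for i in range(8)]  # an 8-block circular log


def bread(log, blk_no, nblks, buf, sect_blks):
    """Read into the start of buf, rounding the start block down to a
    sector boundary. Returns the index in buf where the requested data
    actually begins (the role 'offset' plays in the commit message)."""
    start = blk_no - blk_no % sect_blks
    end = blk_no + nblks
    end += -end % sect_blks            # round the end up to a sector boundary
    end = min(end, len(log))
    buf[0:end - start] = log[start:end]
    return blk_no - start


def read_wrapped(log, blk_no, nblks, sect_blks, use_offset):
    """Read a record that starts at blk_no and wraps to block 0."""
    buf = [None] * (nblks + 2 * sect_blks)
    split = len(log) - blk_no          # blocks read at the physical end
    offset = bread(log, blk_no, split, buf, sect_blks)
    wrapped = nblks - split            # blocks read at the physical beginning
    # Old behaviour: base the second read on the start of the buffer.
    # Fixed behaviour: base it on where the first read's data really begins.
    base = offset if use_offset else 0
    dest = base + split
    buf[dest:dest + wrapped] = log[0:wrapped]
    return buf[offset:offset + nblks]


fixed = read_wrapped(LOG, 5, 5, 4, use_offset=True)
buggy = read_wrapped(LOG, 5, 5, 4, use_offset=False)
```

With a record at blocks 5, 6, 7, 0, 1 and a 4-block sector, the first read is re-aligned to start at block 4. In the buggy variant the second read lands one block too early and replaces `b7`, the tail of the first read.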
    • xfs: cleanup data end I/O handlers · 5ec4fabb
      Christoph Hellwig authored
      Currently we have different end I/O handlers for read vs the different
      types of write I/O.  But they are all very similar so we could just
      use one with a few conditionals and reduce code size a lot.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: use WRITE_SYNC_PLUG for synchronous writeout · 06342cf8
      Christoph Hellwig authored
      The VM and I/O schedulers now expect us to use WRITE_SYNC_PLUG for
      synchronous writeout.  Right now I can't see any changes in performance
      numbers with this, but we're getting some beating for not using it,
      and the knowledge definitely could help the block code to make better
      decisions.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: reset the i_iolock lock class in the reclaim path · 033da48f
      Christoph Hellwig authored
      The iolock is used for protecting reads, writes and block truncates
      against each other.  We have two classes of callers, the first one is
      induced by a file operation and requires a reference to the inode be
      held and not dropped after the operation is done:
      
       - xfs_vm_vmap, xfs_vn_fallocate, xfs_read, xfs_write, xfs_splice_read,
         xfs_splice_write and xfs_setattr are all implementations of VFS
         methods that require a live inode
       - xfs_getbmap and xfs_swap_extents are ioctl subcommands for which the
         same is true
       - xfs_truncate_file is only called on quota inodes just returned from
         xfs_iget
       - xfs_sync_inode_data does the lock just after an igrab()
       - xfs_filestream_associate and xfs_filestream_new_ag take the iolock
         on the parent inode of an inode which by VFS rules must be referenced
      
      And we have various calls to truncate blocks past EOF or the whole
      file when dropping the last reference to an inode.  Unfortunately
      lockdep complains when we do memory allocations that can recurse into
      the filesystem in the first class because the second class happens to
      take the same lock.  To avoid this, re-init the iolock at the beginning
      of xfs_fs_clear_inode to get a new lock class.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
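The effect of giving the iolock a new lock class can be shown with a deliberately tiny model of a lock-class checker. This is a loose sketch and not how lockdep is implemented: `ToyLock`, `ToyLockdep`, and the key strings are all hypothetical, and the only rule modelled is "complain if one class is both held across an allocation that may recurse into the filesystem and taken during inode teardown".

```python
# Toy lock-class checker. Locks initialised with the same key share a class,
# so the checker cannot tell a live inode's iolock from a dying inode's.

class ToyLock:
    def __init__(self, key):
        self.init(key)

    def init(self, key):
        # (Re-)initialising with a different key moves the lock to a new class.
        self.lock_class = key


class ToyLockdep:
    def __init__(self):
        self.held_over_fs_alloc = set()   # classes held while allocating
        self.taken_in_teardown = set()    # classes taken when tearing down

    def note_fs_alloc_while_holding(self, lock):
        self.held_over_fs_alloc.add(lock.lock_class)

    def note_taken_in_teardown(self, lock):
        self.taken_in_teardown.add(lock.lock_class)

    def complaints(self):
        return self.held_over_fs_alloc & self.taken_in_teardown


def run(reinit_in_clear_inode):
    dep = ToyLockdep()
    live = ToyLock("xfs_iolock")    # referenced inode, e.g. in a write
    dying = ToyLock("xfs_iolock")   # inode whose last reference was dropped
    dep.note_fs_alloc_while_holding(live)
    if reinit_in_clear_inode:
        # Hypothetical key name for the class used during teardown.
        dying.init("xfs_iolock_reclaimable")
    dep.note_taken_in_teardown(dying)
    return dep.complaints()
```

Without the re-init, `run(False)` reports the shared class even though the two locks belong to different inodes. With it, `run(True)` reports nothing.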
    • xfs: I/O completion handlers must use NOFS allocations · 80641dc6
      Christoph Hellwig authored
      When completing I/O requests we must not allow the memory allocator to
      recurse into the filesystem, as we might deadlock on waiting for the
      I/O completion otherwise.  The only thing currently allocating normal
      GFP_KERNEL memory is the allocation of the transaction structure for
      the unwritten extent conversion.  Add a memflags argument to
      _xfs_trans_alloc to allow controlling the allocator behaviour.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Thomas Neumann <tneumann@users.sourceforge.net>
      Tested-by: Thomas Neumann <tneumann@users.sourceforge.net>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
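The idea of passing allocation flags down to the transaction allocator can be sketched as follows. This is a toy model under loud assumptions: `GFP_FS`, `toy_alloc()`, and `trans_alloc()` are hypothetical names, and the deadlock is represented by an exception instead of a real hang. It only illustrates why the caller, which knows it is running in I/O completion, needs to choose the flags.

```python
GFP_FS = 0x1  # toy flag: the allocator may call back into the filesystem


class WouldDeadlock(Exception):
    """Raised by the toy allocator where a real system could hang."""


def toy_alloc(flags, in_io_completion, memory_tight):
    # Under memory pressure, an allocation that may enter the filesystem
    # can end up waiting for I/O completion. If we *are* the completion
    # handler, that wait can never finish.
    if memory_tight and (flags & GFP_FS) and in_io_completion:
        raise WouldDeadlock("reclaim would wait on the I/O being completed")
    return bytearray(16)


def trans_alloc(memflags, in_io_completion=False, memory_tight=False):
    """Allocate a toy transaction; the caller controls allocator behaviour."""
    mem = toy_alloc(memflags, in_io_completion, memory_tight)
    return {"memflags": memflags, "mem": mem}
```

An ordinary caller would pass `GFP_FS`, while a completion handler would pass flags without it, for example `trans_alloc(0, in_io_completion=True, memory_tight=True)`.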
    • xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocks · c56c9631
      Christoph Hellwig authored
      When xfs_free_eofblocks is called from ->release the VM might already
      hold the mmap_sem, but in the write path we take the iolock before
      taking the mmap_sem in the generic write code.
      
      Switch xfs_free_eofblocks to only trylock the iolock if called from
      ->release and skip trimming the preallocated blocks in that case.
      We'll still free them later on the final iput.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
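The trylock approach described in this commit follows a common pattern for avoiding lock-order inversions: when a path may already hold the lock that normally comes second, it attempts the first lock without blocking and skips optional work on failure. A minimal sketch, assuming a hypothetical `Inode` class and using Python's `threading.Lock` in place of the kernel's iolock:

```python
import threading


class Inode:
    def __init__(self, prealloc_blocks):
        self.iolock = threading.Lock()
        self.prealloc_blocks = prealloc_blocks  # blocks preallocated past EOF


def free_eofblocks(inode, from_release):
    """Trim preallocated blocks. Returns True if trimming happened."""
    if from_release:
        # The caller may already hold a lock that the write path takes
        # *after* the iolock, so blocking here could invert the order.
        if not inode.iolock.acquire(blocking=False):
            return False  # skip; the blocks are freed later instead
    else:
        inode.iolock.acquire()
    try:
        inode.prealloc_blocks = 0
        return True
    finally:
        inode.iolock.release()
```

Skipping is safe in this model because trimming is an optimisation at this point, matching the commit's note that the blocks are still freed on the final iput.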
    • xfs: simplify inode teardown · 848ce8f7
      Christoph Hellwig authored
      Currently the reclaim code for the case where we don't reclaim the
      final reclaim is overly complicated.  We know that the inode is clean
      but instead of just directly reclaiming the clean inode we go through
      the whole process of marking the inode reclaimable just to directly
      reclaim it from the calling context.  Besides being overly complicated
      this introduces a race where iget could recycle an inode between being
      marked reclaimable and actually being reclaimed, leading to panics.
      
      This patch gets rid of the existing reclaim path, and replaces it with
      a simple call to xfs_ireclaim if the inode was clean.  While we're at
      it we also use the slightly more lax xfs_inode_clean check we'd use
      later to determine if we need to flush the inode here.
      
      Finally, get rid of the xfs_reclaim function and place the remaining small
      bits of reclaim code directly into xfs_fs_destroy_inode.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Patrick Schreurs <patrick@news-service.com>
      Reported-by: Tommy van Leeuwen <tommy@news-service.com>
      Tested-by: Patrick Schreurs <patrick@news-service.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
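The simplified control flow described in this commit can be outlined roughly as below. This is a sketch, not the kernel code: the `Inode` class, its fields, and both functions are hypothetical stand-ins. The point is that a clean inode is reclaimed directly and is never marked reclaimable, so there is no window in which a lookup could pick it up again.

```python
class Inode:
    def __init__(self, ino, dirty):
        self.ino = ino
        self.dirty = dirty
        self.reclaimable = False  # visible to background reclaim and lookups
        self.freed = False


def ireclaim(inode):
    """Stand-in for the final teardown of a clean inode."""
    inode.freed = True


def destroy_inode(inode):
    if not inode.dirty:
        # Clean: reclaim immediately from the calling context.
        ireclaim(inode)
        return
    # Dirty: leave it for later reclaim, which must flush it first.
    inode.reclaimable = True
```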
  2. 03 Dec, 2009 1 commit
  3. 02 Dec, 2009 26 commits
  4. 01 Dec, 2009 6 commits