1. 14 Dec, 2009 2 commits
    • Frederic Weisbecker's avatar
      reiserfs: Fix reiserfs lock and journal lock inversion dependency · cb1c2e51
      Frederic Weisbecker authored
      When we were using the bkl, we didn't care about dependencies against
      other locks, but the mutex conversion created new ones, which is why
      we have reiserfs_mutex_lock_safe(), which unlocks the reiserfs lock
      before acquiring another mutex.
      
      But this trick actually fails if we have acquired the reiserfs lock
      recursively, as we try to unlock it to acquire the new mutex without
      inverted dependency, but we eventually only decrease its depth.
      
      This happens in the case of a nested inode creation/deletion.
      Say we have no space left on the device, we create an inode
      and tak the lock but fail to create its entry, then we release the
      inode using iput(), which calls reiserfs_delete_inode() that takes
      the reiserfs lock recursively. The path eventually ends up in
      journal_begin() where we try to take the journal safely but we
      fail because of the reiserfs lock recursion:
      
      [ INFO: possible circular locking dependency detected ]
      2.6.32-06486-g053fe57a #2
      -------------------------------------------------------
      vi/23454 is trying to acquire lock:
       (&journal->j_mutex){+.+...}, at: [<c110dac4>] do_journal_begin_r+0x64/0x2f0
      
      but task is already holding lock:
       (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
             [<c104f8f3>] validate_chain+0xa23/0xf70
             [<c1050325>] __lock_acquire+0x4e5/0xa70
             [<c105092a>] lock_acquire+0x7a/0xa0
             [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
             [<c11106a8>] reiserfs_write_lock+0x28/0x40
             [<c110dacb>] do_journal_begin_r+0x6b/0x2f0
             [<c110ddcf>] journal_begin+0x7f/0x120
             [<c10f76c2>] reiserfs_remount+0x212/0x4d0
             [<c1093997>] do_remount_sb+0x67/0x140
             [<c10a9ca6>] do_mount+0x436/0x6b0
             [<c10a9f86>] sys_mount+0x66/0xa0
             [<c1002c50>] sysenter_do_call+0x12/0x36
      
      -> #0 (&journal->j_mutex){+.+...}:
             [<c104fe38>] validate_chain+0xf68/0xf70
             [<c1050325>] __lock_acquire+0x4e5/0xa70
             [<c105092a>] lock_acquire+0x7a/0xa0
             [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
             [<c110dac4>] do_journal_begin_r+0x64/0x2f0
             [<c110ddcf>] journal_begin+0x7f/0x120
             [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140
             [<c10a55fc>] generic_delete_inode+0x9c/0x150
             [<c10a56ed>] generic_drop_inode+0x3d/0x60
             [<c10a4607>] iput+0x47/0x50
             [<c10e915c>] reiserfs_create+0x16c/0x1c0
             [<c109a9c1>] vfs_create+0xc1/0x130
             [<c109dbec>] do_filp_open+0x81c/0x920
             [<c109004f>] do_sys_open+0x4f/0x110
             [<c1090179>] sys_open+0x29/0x40
             [<c1002c50>] sysenter_do_call+0x12/0x36
      
      other info that might help us debug this:
      
      2 locks held by vi/23454:
       #0:  (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [<c109d64e>]
      do_filp_open+0x27e/0x920
       #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>]
      reiserfs_write_lock+0x28/0x40
      
      stack backtrace:
      Pid: 23454, comm: vi Not tainted 2.6.32-06486-g053fe57a #2
      Call Trace:
       [<c134b202>] ? printk+0x18/0x1e
       [<c104e960>] print_circular_bug+0xc0/0xd0
       [<c104fe38>] validate_chain+0xf68/0xf70
       [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10
       [<c1050325>] __lock_acquire+0x4e5/0xa70
       [<c105092a>] lock_acquire+0x7a/0xa0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c110ff80>] ? delete_one_xattr+0x0/0x1c0
       [<c110dac4>] do_journal_begin_r+0x64/0x2f0
       [<c110ddcf>] journal_begin+0x7f/0x120
       [<c11105b5>] ? reiserfs_delete_xattrs+0x15/0x50
       [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140
       [<c10a55bf>] ? generic_delete_inode+0x5f/0x150
       [<c10ef490>] ? reiserfs_delete_inode+0x0/0x140
       [<c10a55fc>] generic_delete_inode+0x9c/0x150
       [<c10a56ed>] generic_drop_inode+0x3d/0x60
       [<c10a4607>] iput+0x47/0x50
       [<c10e915c>] reiserfs_create+0x16c/0x1c0
       [<c1099a5d>] ? inode_permission+0x7d/0xa0
       [<c109a9c1>] vfs_create+0xc1/0x130
       [<c10e8ff0>] ? reiserfs_create+0x0/0x1c0
       [<c109dbec>] do_filp_open+0x81c/0x920
       [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10
       [<c134dc0d>] ? _spin_unlock+0x1d/0x20
       [<c10a6eea>] ? alloc_fd+0xba/0xf0
       [<c109004f>] do_sys_open+0x4f/0x110
       [<c1090179>] sys_open+0x29/0x40
       [<c1002c50>] sysenter_do_call+0x12/0x36
      
      To fix this, use reiserfs_lock_once() from reiserfs_delete_inode()
      which prevents from adding reiserfs lock recursion.
      Reported-by: default avatarAlexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      cb1c2e51
    • Frederic Weisbecker's avatar
      reiserfs: Fix possible recursive lock · 500f5a0b
      Frederic Weisbecker authored
      While allocating the bitmap using vmalloc, we hold the reiserfs lock,
      which makes lockdep later reporting a possible deadlock as we may
      swap out pages to allocate memory and then take the reiserfs lock
      recursively:
      
      inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      kswapd0/312 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11108a8>] reiserfs_write_lock+0x28/0x40
      {RECLAIM_FS-ON-W} state was registered at:
        [<c104e1c2>] mark_held_locks+0x62/0x90
        [<c104e28a>] lockdep_trace_alloc+0x9a/0xc0
        [<c108e396>] kmem_cache_alloc+0x26/0xf0
        [<c10850ec>] __get_vm_area_node+0x6c/0xf0
        [<c10857de>] __vmalloc_node+0x7e/0xa0
        [<c108597b>] vmalloc+0x2b/0x30
        [<c10e00b9>] reiserfs_init_bitmap_cache+0x39/0x70
        [<c10f8178>] reiserfs_fill_super+0x2e8/0xb90
        [<c1094345>] get_sb_bdev+0x145/0x180
        [<c10f5a11>] get_super_block+0x21/0x30
        [<c10931f0>] vfs_kern_mount+0x40/0xd0
        [<c10932d9>] do_kern_mount+0x39/0xd0
        [<c10a9857>] do_mount+0x2c7/0x6b0
        [<c10a9ca6>] sys_mount+0x66/0xa0
        [<c161589b>] mount_block_root+0xc4/0x245
        [<c1615a75>] mount_root+0x59/0x5f
        [<c1615b8c>] prepare_namespace+0x111/0x14b
        [<c1615269>] kernel_init+0xcf/0xdb
        [<c10031fb>] kernel_thread_helper+0x7/0x1c
      
      This is actually fine for two reasons: we call vmalloc at mount time
      then it's not in the swapping out path. Also the reiserfs lock can be
      acquired recursively, but since its implementation depends on a mutex,
      it's hard and not necessary worth it to teach that to lockdep.
      
      The lock is useless at mount time anyway, at least until we replay the
      journal. But let's remove it from this path later as this needs
      more thinking and is a sensible change.
      
      For now we can just relax the lock around vmalloc,
      Reported-by: default avatarAlexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      500f5a0b
  2. 07 Dec, 2009 1 commit
  3. 03 Dec, 2009 1 commit
  4. 02 Dec, 2009 26 commits
  5. 01 Dec, 2009 10 commits