1. 01 May, 2007 28 commits
      [DLM] fix mode munging · 7d3c1feb
      David Teigland authored
      There are flags to enable two specialized features in the dlm:
      1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
         changing the granted mode of locks to NL.
      2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR
         or CW to grant them if the normal requested mode can't be granted.
      
      GFS direct i/o exercises both of these features, especially when mixed
      with buffered i/o.  The dlm has problems with them.
      
      The first problem is on the master node. If it demotes a lock as a part of
      converting it, the actual step of converting the lock isn't being done
      after the demotion; the lock is just left sitting on the granted queue
      with a granted mode of NL.  I think the mistaken assumption was that the
      call to grant_pending_locks() would grant it, but that function naturally
      doesn't look at locks on the granted queue.
      
      The second problem is on the process node.  If the master either demotes
      or gives an altmode, the munging of the gr/rq modes is never done in the
      process copy of the lock, leaving the master/process copies out of sync.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
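
The second problem described above can be illustrated with a minimal sketch. It assumes the master's reply carries flags saying whether it demoted the lock or granted an alternate mode; all names here (lock_copy, REPLY_DEMOTED, REPLY_ALTMODE, munge_process_copy) are hypothetical and are not the dlm's own definitions.

```c
#include <assert.h>

/* Hypothetical names throughout; not the kernel's definitions. */
enum lock_mode { MODE_NL, MODE_CW, MODE_PR, MODE_EX };

#define REQ_ALTPR     0x1  /* caller accepts PR as an alternate mode */
#define REQ_ALTCW     0x2  /* caller accepts CW as an alternate mode */
#define REPLY_DEMOTED 0x1  /* master demoted the granted mode to NL */
#define REPLY_ALTMODE 0x2  /* master granted the alternate mode */

struct lock_copy {
    enum lock_mode grmode;    /* granted mode */
    enum lock_mode rqmode;    /* requested mode */
    unsigned int req_flags;   /* REQ_* flags given with the request */
};

/* Mirror the master's decisions in the process copy of the lock. */
static void munge_process_copy(struct lock_copy *lk, unsigned int reply_flags)
{
    assert(lk);

    if (reply_flags & REPLY_DEMOTED)
        lk->grmode = MODE_NL;

    if (reply_flags & REPLY_ALTMODE) {
        if (lk->req_flags & REQ_ALTPR)
            lk->rqmode = MODE_PR;
        else if (lk->req_flags & REQ_ALTCW)
            lk->rqmode = MODE_CW;
    }
}
```

Without a step like this, the process copy would keep its original gr/rq modes while the master's copy had changed, which is the out-of-sync state the message describes.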
      [GFS2] lockdump improvements · 5f882096
      Robert Peterson authored
      The patch below consists of the following changes (in code order):
      
      1. I fixed a minor compiler warning regarding the printing of
         a kernel symbol address.
      2. I implemented a suggestion from Dave Teigland that moves
         the debugfs information for gfs2 into a subdirectory so
         we can easily expand our use of debugfs in the future.
         The current code keeps the glock information in:
         /debug/gfs2/<fs>
         With the patch, the new code keeps the glock information in:
         /debug/gfs2/<fs>/glock
         That will allow us to create more debugfs files in the future.
      3. This fixes a bug whereby a failed mount attempt causes the
         debugfs file to not be deleted.  Failed mount attempts should
         always clean up after themselves, including deleting the
         debugfs file and/or directory.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks · bdd19a22
      Steven Whitehouse authored
      This patch detects when the number of entries in a leaf block or inode
      block (in the case of stuffed directories) is corrupt and informs the
      user. It prevents us from running off the end of the array that's been
      allocated for the sorting in this case.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
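
As a rough illustration of the kind of check described, the sketch below sizes a sort buffer from the entry count claimed on disk and stops with an error if the walk finds more entries than that. The function name, the error code and the use of plain ints for entries are assumptions for the example, not GFS2's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_CORRUPT (-1)   /* hypothetical error code */

/*
 * Copy entries into a sort buffer whose capacity comes from the on-disk
 * entry count.  `claimed` is the count read from the block header and
 * `actual` is how many entries the walk really finds.
 */
static int gather_for_sort(const int *entries, size_t actual,
                           int *sort_buf, size_t claimed)
{
    size_t n = 0;

    for (size_t i = 0; i < actual; i++) {
        if (n >= claimed)
            return ERR_CORRUPT;   /* header undercounts: stop before overflow */
        sort_buf[n++] = entries[i];
    }
    return (int)n;
}
```

A caller would report the corruption to the user when ERR_CORRUPT comes back instead of going on to sort.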
      [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) · 7a0079d9
      Robert Peterson authored
      This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx
      (lock dump) seen at the "gfs2 summit".  This also fixes the bug that caused
      garbage to be printed by the "initialized at" field.  I apologize for the
      kludge, but that code will all be ripped out anyway when the official
      sprint_symbol function becomes available in the Linux kernel.  I also
      changed some formatting so that spaces are replaced by proper tabs.
      Signed-off-by: Robert Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] fs/dlm/ast.c should #include "ast.h" · 8fa1de38
      Adrian Bunk authored
      Every file should include the headers containing the prototypes for
      its global functions.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] Consolidate transport protocols · 6ed7257b
      Patrick Caulfield authored
      This patch consolidates the TCP & SCTP protocols for the DLM into a single file
      and makes it switchable at run-time (well, at least before the DLM actually
      starts up!)
      
      For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel
      socket API but that has already been twice ACKed so it should be OK.
      
      The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c
      & lowcomms-tcp.c files.
      Signed-off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] Remove redundant assignment · fc7c44f0
      Patrick Caulfield authored
      This patch removes a redundant (and incorrect) assignment from compat_output.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] Fix bz 234168 (ignoring rgrp flags) · a43a4906
      Steven Whitehouse authored
      The following patch makes GFS2 use the rgrp flags properly. Although
      there are also separate flags for both data and metadata, I've
      not implemented these as there seems little use for them. On the
      other hand, the "noalloc" flag is generally useful for future changes we
      might wish to make, so this ensures that we interpret it correctly.
      
      In addition I fixed the comment above the function which was incorrect.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] change lkid format · ce03f12b
      David Teigland authored
      A lock id is a uint32 and is used as an opaque reference to the lock.  For
      userland apps, the lkid is passed up, through libdlm, as the return value
      from a write() on the dlm device.  This created a problem when the high
      bit was 1, making the lkid look like an error.  This is fixed by changing
      how the lkid is composed.  The low 16 bits identified the hash bucket for
      the lock and the high 16 bits were a per-bucket counter (which eventually
      hit 0x8000 causing the problem).  These are simply swapped around; the
      number of hash table buckets is far below 0x8000, making all lkid's
      positive when viewed as signed.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
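
The effect of swapping the two halves can be seen in a small sketch. The function names are made up for the example; only the bit layout (16-bit bucket, 16-bit per-bucket counter) is taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Old layout: counter in the high 16 bits, bucket in the low 16 bits. */
static uint32_t lkid_old(uint16_t bucket, uint16_t counter)
{
    return ((uint32_t)counter << 16) | bucket;
}

/* New layout: bucket in the high 16 bits, counter in the low 16 bits. */
static uint32_t lkid_new(uint16_t bucket, uint16_t counter)
{
    return ((uint32_t)bucket << 16) | counter;
}

/*
 * How the id looks when it comes back as a signed return value.
 * Converting an out-of-range value to int32_t is implementation-defined
 * in C; the usual two's complement behaviour is assumed here.
 */
static int32_t as_signed(uint32_t lkid)
{
    return (int32_t)lkid;
}
```

With a counter of 0x8000 in bucket 5, the old layout gives 0x80000005, which is negative when read as signed, while the new layout gives 0x00058000. The new form stays positive as long as the bucket number is below 0x8000.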
      [DLM] interface for purge (2/2) · 72c2be77
      David Teigland authored
      Add code to accept purge commands from userland.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] add orphan purging code (1/2) · 8499137d
      David Teigland authored
      Add code for purging orphan locks.  A process can also purge all of its
      own non-orphan locks by passing a pid of zero.  Code already exists for
      processes to create persistent locks that become orphans when the process
      exits, but the complementary capability for another process to then purge
      these orphans has been missing.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] split create_message function · 7e4dac33
      David Teigland authored
      This splits the current create_message() function into two parts so that
      later patches can call the new lower-level _create_message() function when
      they don't have an rsb struct.  No functional change in this patch.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] Set drop_count to 0 (off) by default · f01963f2
      Steven Whitehouse authored
      This sets the drop_count to 0 by default which is a better default
      for most people.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] use log_error before LM_OUT_ERROR · b9af8a78
      David Teigland authored
      We always want to see the details of the error returned to gfs, but
      log_debug is often turned off, so use log_error (printk).
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] overlapping cancel and unlock · ef0c2bb0
      David Teigland authored
      Full cancel and force-unlock support.  In the past, cancel and force-unlock
      wouldn't work if there was another operation in progress on the lock.  Now,
      both cancel and unlock-force can overlap an operation on a lock, meaning there
      may be 2 or 3 operations in progress on a lock in parallel.  This support is
      important not only because cancel and force-unlock are explicit operations
      that an app can use, but both are used implicitly when a process exits while
      holding locks.
      
      Summary of changes:
      
      - add-to and remove-from waiters functions were rewritten to handle situations
        with more than one remote operation outstanding on a lock
      
      - validate_unlock_args detects when an overlapping cancel/unlock-force
        can be sent and when it needs to be delayed until a request/lookup
        reply is received
      
      - processing request/lookup replies detects when cancel/unlock-force
        occurred during the op, and carries out the delayed cancel/unlock-force
      
      - manipulation of the "waiters" (remote operation) state of a lock moved under
        the standard rsb mutex that protects all the other lock state
      
      - the two recovery routines related to locks on the waiters list changed
        according to the way lkb's are now locked before accessing waiters state
      
      - waiters recovery detects when lkb's being recovered have overlapping
        cancel/unlock-force, and may not recover such locks
      
      - revert_lock (cancel) returns a value to distinguish cases where it did
        nothing vs cases where it actually did a cancel; the cancel completion ast
        should only be done when cancel did something
      
      - orphaned locks put on new list so they can be found later for purging
      
      - cancel must be called on a lock when making it an orphan
      
      - flag user locks (ENDOFLIFE) at the end of their useful life (to the
        application) so we can return an error for any further cancel/unlock-force
      
      - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
        either a completion or blocking ast
      
      - clear an unread bast on a lock that's become unlocked
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] fix coverity-spotted stupidity · 03206727
      Patrick Caulfield authored
      Replacement patch to remove redundant code rather than moving it around.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] Red Hat bz 228540: owner references · 04b933f2
      Robert Peterson authored
      In testing the previously posted and accepted patch for
      https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
      I uncovered some gfs2 badness.  It turns out that the current
      gfs2 code saves off a process pointer when glocks are taken
      in both the glock and glock holder structures.  Those
      structures will persist in memory long after the process has
      ended, leaving pointers to poisoned memory.
      
      This problem isn't caused by the 228540 fix; the new capability
      introduced by the fix just uncovered the problem.
      
      I wrote this patch that avoids saving process pointers
      and instead saves off the process pid.  Rather than
      referencing the bad pointers, it now does process lookups.
      There is special code that makes the output nicer for
      printing holder information for processes that have ended.
      
      This patch also adds a stub for the new "sprint_symbol"
      function that exists in Andrew Morton's -mm patch set, but
      won't go into the base kernel until 2.6.22, since it adds
      functionality but doesn't fix a bug.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] flush the log if a transaction can't allocate space · 172e045a
      Benjamin Marzinski authored
      This is a fix for bz #208514. When GFS2 frees up space, the freed blocks
      aren't available for reuse until the resource group is successfully written
      to the ondisk journal. So in rare cases, GFS2 operations will fail, saying
      that the filesystem is out of space, when in reality, you are just waiting for
      a log flush. For instance, on a 1Gig filesystem, if I continually write 10 Mb
      to a file, and then truncate it, after a hundred iterations, the write will
      fail with -ENOSPC, even though the filesystem is just 1% full.
      
      The attached patch calls a log flush in these cases.  I tested this patch
      fairly heavily to check if there were any locking issues that I missed, and
      it seems to work just fine. Also, this patch only does the log flush if
      get_local_rgrp makes a complete loop of resource groups without skipping
      any due to locking issues. The code would be slightly simpler if it just always
      did the log flush after the first failed pass, and you could only ever have
      to go through the loop twice, instead of up to three times. However, I guessed
      that failing to find a rg simply due to locking issues would be common enough
      to skip the log flush in that case, but I'm not certain that this is the right
      way to go. Either way, I don't suppose this code will be hit all that often.
      Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] Fix log entry list corruption · 68835625
      Benjamin Marzinski authored
      When glock_lo_add and rg_lo_add attempt to add an element to the log, they
      check to see if it has already been added before locking the log. If another
      process adds that element to the log in this window between the check and
      locking the log, the element will be added to the list twice. This causes
      the log element list to become corrupted in such a way that the log element
      can never be successfully removed from the list. This patch pulls the
      list_empty() check inside the log lock, to remove this window.
      Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
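
The check-then-lock window described above, and the fix of doing the check under the lock, can be sketched in user space as follows. This is only an illustration: the list helpers imitate the kernel's list_head idiom, the spinlock built on atomic_flag stands in for the real log lock, and every name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, in the style of list_head. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *n) { n->next = n->prev = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_after(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

struct log {
    atomic_flag lock;            /* hypothetical stand-in for the log lock */
    struct list_node elements;   /* queued log elements */
    unsigned int count;
};

static void log_init(struct log *lg)
{
    atomic_flag_clear(&lg->lock);
    list_init(&lg->elements);
    lg->count = 0;
}

static void log_lock(struct log *lg)
{
    while (atomic_flag_test_and_set_explicit(&lg->lock, memory_order_acquire))
        ;   /* spin until the lock is free */
}

static void log_unlock(struct log *lg)
{
    atomic_flag_clear_explicit(&lg->lock, memory_order_release);
}

/* Add `le` at most once: the "already queued?" test runs under the lock. */
static void log_add(struct log *lg, struct list_node *le)
{
    log_lock(lg);
    if (list_is_empty(le)) {     /* an unqueued element points at itself */
        list_add_after(le, &lg->elements);
        lg->count++;
    }
    log_unlock(lg);
}
```

If the list_is_empty() test were done before log_lock(), two callers could both see the element as unqueued and both add it, which is the double insertion the patch removes.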
      [GFS2] Speed up lock_dlm's locking (move sprintf) · f35ac346
      Steven Whitehouse authored
      The following patch speeds up lock_dlm's locking by moving the sprintf
      out from the lock acquisition path and into the lock creation path. This
      reduces the amount of CPU time used in acquiring locks by a fair amount.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Acked-by: David Teigland <teigland@redhat.com>
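
A minimal sketch of the idea, assuming the formatted string is a lock name built from a type and a number: format it once when the lock object is created and only read the cached copy on each acquisition. The struct, the function names, the buffer size and the format string are assumptions for the example rather than lock_dlm's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 25   /* hypothetical: 8 + 16 hex columns plus the NUL */

struct lock_obj {
    unsigned int type;
    unsigned long long number;
    char strname[NAME_LEN];   /* formatted once, reused on every acquire */
};

/* Creation path: pay for the formatting a single time. */
static void lock_create(struct lock_obj *lp, unsigned int type,
                        unsigned long long number)
{
    lp->type = type;
    lp->number = number;
    snprintf(lp->strname, sizeof(lp->strname), "%8x%16llx", type, number);
}

/* Acquisition path: only reads the cached name, no formatting. */
static const char *lock_acquire_name(const struct lock_obj *lp)
{
    return lp->strname;
}
```

The saving comes from locks usually being acquired many more times than they are created, so the formatting cost moves off the hot path.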
      [DLM] Don't delete misc device if lockspace removal fails · 254da030
      Patrick Caulfield authored
      Currently if the lockspace removal fails the misc device associated with a
      lockspace is left deleted. After that there is no way to access the orphaned
      lockspace from userland.
      
      This patch recreates the misc device if dlm_release_lockspace fails. I
      believe this is better than attempting to remove the lockspace first because
      that leaves an unattached device lying around. The potential gap in which there
      is no access to the lockspace between removing the misc device and recreating it
      is acceptable ... after all the application is trying to remove it, and only new
      users of the lockspace will be affected.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] Fix a bug on i386 due to evaluation order · 420d2a10
      Steven Whitehouse authored
      Since gcc didn't evaluate the last two terms of the expression in
      glock.c:1881 as a constant expression, it resulted in an error on
      i386 due to the lack of a 64bit divide instruction. This adds some
      brackets to fix the problem.
      
      This was reported by Andrew Morton.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      [GFS2] Fix bz 224480 and cleanup glock demotion code · 3b8249f6
      Steven Whitehouse authored
      This patch prevents the printing of a warning message in cases where
      the fs is functioning normally by handing off responsibility for
      unlinked, but still open inodes, to another node for eventual deallocation.
      Also, there is now an improved system for ensuring that such requests
      to other nodes do not get lost. The callback on the iopen lock is
      only ever called when i_nlink == 0 and when a node is unable to deallocate
      it due to it still being in use on another node. When a node receives
      the callback therefore, it knows that i_nlink must be zero, so we mark
      it as such (in gfs2_drop_inode) in order that it will then attempt
      deallocation of the inode itself.
      
      As an additional benefit, queuing a demote request no longer requires
      a memory allocation. This simplifies the code for dealing with gfs2_holders
      as it removes one special case.
      
      There are two new fields in struct gfs2_glock. gl_demote_state is the
      state which the remote node has requested and gl_demote_time is the
      time when the request came in. Both fields are only valid when the
      GLF_DEMOTE flag is set in gl_flags.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] Fix bz 231380, unlock page before dequeuing glocks in gfs2_commit_write · 1de91390
      Josef Whiter authored
      If we are writing a file, and in the middle of writing the file
      another node attempts to get a shared lock on that file (by doing a du for
      example) the process doing the writing will hang waiting on lock_page.  The
      reason is that when we have waiters on an exclusive glock, we will go
      through and flush out all dirty pages associated with that inode and release the
      lock.  The problem is that when we flush the dirty pages, we could hit a page
      that we have locked during the generic_file_buffered_write part of this
      operation.  This patch unlocks the page before we go to dequeue the lock and
      locks it immediately afterwards, since generic_file_buffered_write needs the page
      locked when the commit_write is completed.  This patch resolves the problem;
      however, if somebody sees a better way to do this please don't hesitate to yell.
      Signed-off-by: Josef Whiter <jwhiter@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [DLM] Fix uninitialised variable in receiving · 89adc934
      Patrick Caulfield authored
      The length of the second element of the kvec array was not initialised before
      being added to the first one. This could cause invalid lengths to be passed to
      kernel_recvmsg.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
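
The class of bug can be shown with a user-space sketch that describes the free space of a circular receive buffer as one or two vectors and then sums their lengths. It uses struct iovec as a stand-in for the kernel's kvec, and the function and parameter names are hypothetical; it is not the dlm's receive code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * `start` is the offset of the first free byte in `buf` and `free_len`
 * the number of free bytes (assumed to be at most `size`).  Returns the
 * total length to hand to the receive call and stores the number of
 * vectors used in `*nvec`.
 */
static size_t fill_recv_vecs(struct iovec iov[2], char *buf, size_t size,
                             size_t start, size_t free_len, int *nvec)
{
    iov[0].iov_base = buf + start;
    iov[0].iov_len = free_len;
    iov[1].iov_base = NULL;
    iov[1].iov_len = 0;              /* always initialised, even when unused */
    *nvec = 1;

    if (start + free_len > size) {   /* free region wraps around the end */
        iov[0].iov_len = size - start;
        iov[1].iov_base = buf;
        iov[1].iov_len = free_len - iov[0].iov_len;
        *nvec = 2;
    }
    return iov[0].iov_len + iov[1].iov_len;
}
```

If iov[1].iov_len were left unset in the single-vector case, the sum on the last line would include whatever happened to be in memory, which matches the invalid lengths mentioned in the message.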
      [GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option · 5c7342d8
      Josef Whiter authored
      If you specify an invalid mount option when trying to mount a gfs2 filesystem,
      gfs2 will oops.  The attached patch resolves this problem.
      Signed-off-by: Josef Whiter <jwhiter@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      [GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540) · 7c52b166
      Robert Peterson authored
      The attached patch resolves bz 228540.  This adds the capability
      for gfs2 to dump gfs2 locks through the debugfs file system.
      This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
      gfs2 because all the ioctls were stripped out.  Please see the bugzilla
      for more history about the fix.  This patch is also attached to the bugzilla
      record.
      
      The patch is against Steve Whitehouse's latest nmw git tree kernel
      (2.6.21-rc1) and has been tested on system trin-10.
      Signed-off-by: Robert Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      libata: honour host controllers that want just one host · dc87c398
      Linus Torvalds authored
      The Marvell IDE interface on my machine would hit a BUG_ON() in
      lib/iomem.c because it was calling ata_pci_init_one() specifying just a
      single port on the host, but that would actually end up trying to
      initialize two ports, the second one with bogus information.
      
      This fixes "ata_pci_init_one()" so that it actually passes down the
      n_ports variable that it got from the low-level driver to the host
      allocation routine ("ata_host_alloc_pinfo()"), which results in the ATA
      layer actually having the correct port number information.
      
      And in order to make it all work, I also needed to fix a few places that
      had incorrectly hard-coded the fact that a host always had exactly two
      ports (both ata_pci_init_bmdma() and ata_request_legacy_irqs() would
      just always iterate over both ports).
      Acked-by: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 30 Apr, 2007 12 commits