1. 06 Jan, 2009 40 commits
    • mm: write_cache_pages integrity fix · 05fe478d
      Nick Piggin authored
      In write_cache_pages, nr_to_write is heeded even for data-integrity syncs,
      so the function will return success after writing out nr_to_write pages,
      even if that was not sufficient to guarantee data integrity.
      
      The callers tend to set it to values that could easily break data integrity
      semantics in practice.  For example, nr_to_write can be set to
      mapping->nrpages * 2.  However, if a file has a single dirty page and
      fsync is called, subsequent pages might be concurrently added and dirtied;
      write_cache_pages might then write out two of these newly dirty pages
      while not writing out the old page that should have been written out.
      
      Fix this by ignoring nr_to_write if it is a data integrity sync.
      
      This is a data integrity bug.
      
      The reason this has been done in the past is to avoid stalling sync
      operations behind page dirtiers.
      
       "If a file has one dirty page at offset 1000000000000000 then someone
        does an fsync() and someone else gets in first and starts madly writing
        pages at offset 0, we want to write that page at 1000000000000000.
        Somehow."
      
      What we do today is return success after an arbitrary number of pages is
      written, whether or not we have provided the data-integrity semantics that
      the caller has asked for.  Even this doesn't actually fix all stall cases
      completely: in the above situation, if the file has a huge number of pages
      in pagecache (but not dirty), then mapping->nrpages is going to be huge,
      even if pages are being dirtied.
      
      This change does indeed make the possibility of long stalls larger, and
      that's not a good thing, but lying about data integrity is even worse.  We
      have to either perform the sync, or return -ELINUXISLAME so at least the
      caller knows what has happened.
      
      There are subsequent competing approaches in the works to solve the stall
      problems properly, without compromising data integrity.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      05fe478d
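
The behaviour described in this commit message can be modelled with a small user-space sketch. It is not the kernel code: `write_dirty_pages`, `enum sync_mode` and the boolean `dirty` array are hypothetical stand-ins for write_cache_pages, the writeback sync mode and the page cache. It only illustrates the rule that the nr_to_write budget stops the loop for ordinary writeback but is ignored for a data-integrity sync.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for "background writeback" vs "data-integrity sync". */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/*
 * Write out dirty pages in index order and return how many were written.
 * dirty[i] is cleared when page i is "written".  The nr_to_write budget
 * is honoured only for SYNC_NONE; an integrity sync keeps going until
 * every dirty page in the range has been written.
 */
static size_t write_dirty_pages(bool *dirty, size_t npages,
                                long nr_to_write, enum sync_mode mode)
{
    size_t written = 0;

    for (size_t i = 0; i < npages; i++) {
        if (!dirty[i])
            continue;
        dirty[i] = false;
        written++;
        /* Only non-integrity writeback may stop early on the budget. */
        if (mode == SYNC_NONE && --nr_to_write <= 0)
            break;
    }
    return written;
}
```

With a budget of 2 and four dirty pages, the SYNC_NONE call stops after two pages, while the SYNC_ALL call writes all four.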
    • mm: write_cache_pages writepage error fix · 00266770
      Nick Piggin authored
      In write_cache_pages, if ret signals a real error, but we still have some
      pages left in the pagevec, done would be set to 1, but the remaining pages
      would continue to be processed and ret will be overwritten in the process.
      
      It could easily be overwritten with success, and thus success will be
      returned even if there is an error.  Thus the caller is told all writes
      succeeded, whereas in reality some did not.
      
      Fix this by bailing immediately if there is an error, and retaining the
      first error code.
      
      This is a data integrity bug.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00266770
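
A minimal sketch of "bail immediately and retain the first error code", under the assumption of a simple per-page callback. The names `writepage_fn`, `write_all_pages` and `fail_on_third` are hypothetical and exist only for illustration; the value -5 stands in for an errno-style code such as -EIO.

```c
#include <stddef.h>

/* Hypothetical per-page write callback: returns 0 or a negative error code. */
typedef int (*writepage_fn)(size_t index);

/*
 * Call writepage on each page index in [0, npages).  Stop at the first
 * failure and return that error, so a later success cannot overwrite it.
 * *done_index reports the index just past the last page processed.
 */
static int write_all_pages(size_t npages, writepage_fn writepage,
                           size_t *done_index)
{
    for (size_t i = 0; i < npages; i++) {
        int ret = writepage(i);

        if (ret) {
            *done_index = i + 1;
            return ret;          /* first error wins */
        }
    }
    *done_index = npages;
    return 0;
}

/* Example callback: the third page (index 2) fails, all others succeed. */
static int fail_on_third(size_t index)
{
    return index == 2 ? -5 : 0;
}
```

Calling `write_all_pages(6, fail_on_third, &done)` returns the error from index 2 and never attempts indices 3 to 5.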
    • mm: write_cache_pages early loop termination · bd19e012
      Nick Piggin authored
      We'd like to break out of the loop early in many situations, however the
      existing code has been setting mapping->writeback_index past the final
      page in the pagevec lookup for cyclic writeback.  This is a problem if we
      don't process all pages up to the final page.
      
      Currently the code mostly keeps writeback_index reasonable and hacks
      around this by not breaking out of the loop or writing pages outside the
      range in these cases.  Keep track of a real "done index" that enables us
      to terminate the loop in a much more flexible manner.
      
      Needed by the subsequent patch to preserve writepage errors, and then
      further patches to break out of the loop early for other reasons.  However
      there are no functional changes with this patch alone.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bd19e012
    • mm: write_cache_pages cyclic fix · 31a12666
      Nick Piggin authored
      In write_cache_pages, scanned == 1 is supposed to mean that cyclic
      writeback has circled through zero, thus we should not circle again.
      However it gets set to 1 after the first successful pagevec lookup.  This
      leads to cases where not enough data gets written.
      
      Counterexample: file with first 10 pages dirty, writeback_index == 5,
      nr_to_write == 10.  Then the last 5 pages will be found and scanned will
      be set to 1; after writing those out, we will not cycle back to get the
      first 5.
      
      Rework this logic so that we always cycle unless we started off from index
      0.  When cycling, only write out as far as 1 page before the start page
      from the first cycle (so we don't write parts of the file twice).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      31a12666
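
The reworked cycling rule can be sketched as a function that lists the page indices in the order a cyclic scan would visit them. This is an illustrative model only; `cyclic_scan_order` is a hypothetical helper, not a kernel function. It scans from the start index to the end, then wraps to index 0 and stops one page before the start, so nothing is visited twice.

```c
#include <stddef.h>

/*
 * Fill order[] with the page indices a cyclic scan visits:
 * first [start, npages), then, only if start != 0, [0, start).
 * Returns the number of indices stored; order must hold npages entries.
 */
static size_t cyclic_scan_order(size_t start, size_t npages, size_t *order)
{
    size_t n = 0;

    if (start > npages)
        start = npages;          /* clamp so the second pass stays in range */

    for (size_t i = start; i < npages; i++)
        order[n++] = i;
    for (size_t i = 0; i < start; i++)   /* empty when start == 0 */
        order[n++] = i;
    return n;
}
```

For the counterexample in the commit message (10 dirty pages, start index 5), the order is 5..9 followed by 0..4, so all ten pages are covered.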
    • do_mpage_readpage(): don't submit lots of small bios on boundary · 38c8e618
      Miquel van Smoorenburg authored
      While tracing I/O patterns with blktrace (a great tool) a few weeks ago I
      identified a minor issue in fs/mpage.c
      
      As the comment above mpage_readpages() says, a fs's get_block function
      will set BH_Boundary when it maps a block just before a block for which
      extra I/O is required.
      
      Since get_block() can map a range of pages, for all these pages the
      BH_Boundary flag will be set.  But we only need to push what I/O we have
      accumulated at the last block of this range.
      
      This makes do_mpage_readpage() send out the largest possible bio instead
      of a bunch of page-sized ones in the BH_Boundary case.
      Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      38c8e618
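
As a rough illustration of why flushing only at the last block of the mapped range helps, the sketch below counts how many bios would be submitted for one boundary-flagged mapping. It assumes one block per page and uses a hypothetical `count_bios` helper; the real logic lives in do_mpage_readpage() and is more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Count bios submitted for one mapping of range_len pages that has the
 * boundary flag set.  Assumes one block per page (a simplification).
 * Pages accumulate into a pending bio; a flush submits what is pending.
 */
static size_t count_bios(size_t range_len, bool flush_only_at_range_end)
{
    size_t bios = 0;
    size_t pending = 0;

    for (size_t i = 0; i < range_len; i++) {
        bool last_in_range = (i + 1 == range_len);

        pending++;                       /* add page i to the current bio */
        if (!flush_only_at_range_end || last_in_range) {
            bios++;                      /* submit what has accumulated */
            pending = 0;
        }
    }
    if (pending)
        bios++;                          /* flush any leftover pages */
    return bios;
}
```

For an 8-page range, flushing on every flagged page yields eight page-sized bios, while flushing only at the end of the range yields one.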
    • oom: print triggering task's cpuset and mems allowed · 75aa1994
      David Rientjes authored
      When cpusets are enabled, it's necessary to print the triggering task's
      set of allowable nodes so the subsequently printed meminfo can be
      interpreted correctly.
      
      We also print the task's cpuset name for informational purposes.
      
      [rientjes@google.com: task lock current before dereferencing cpuset]
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      75aa1994
    • oom: fix zone_scan_mutex name · c7d4caeb
      David Rientjes authored
      zone_scan_mutex is actually a spinlock, so name it appropriately.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c7d4caeb
    • mm: invoke oom-killer from page fault · 1c0fe6e3
      Nick Piggin authored
      Rather than have the pagefault handler kill a process directly if it gets
      a VM_FAULT_OOM, have it call into the OOM killer.
      
      With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
      oom killing throttling, oom priority adjustment or selective disabling,
      panic on oom, etc), it's silly to unconditionally kill the faulting
      process at page fault time.  Create a hook for pagefault oom path to call
      into instead.
      
      Only converted x86 and uml so far.
      
      [akpm@linux-foundation.org: make __out_of_memory() static]
      [akpm@linux-foundation.org: fix comment]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c0fe6e3
    • mm: move_pages: no need to set pp->page to ZERO_PAGE(0) by default · 5bd1455c
      Brice Goglin authored
      pp->page is never used when not set to the right page, so there is no need
      to set it to ZERO_PAGE(0) by default.
      Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5bd1455c
    • mm: rework do_pages_move() to work on page_sized chunks · 3140a227
      Brice Goglin authored
      Rework do_pages_move() to work by page-sized chunks of struct page_to_node
      that are passed to do_move_page_to_node_array().  We now only have to
      allocate a single page instead of a possibly very large vmalloc area to store
      all page_to_node entries.
      
      As a result, new_page_node() will now have a very small lookup, hiding
      much of the overall sys_move_pages() overhead.
      Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: Nathalie Furmento <Nathalie.Furmento@labri.fr>
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3140a227
    • mm: don't mark_page_accessed in shmem_fault · 390722ba
      Hugh Dickins authored
      Following "mm: don't mark_page_accessed in fault path", which now
      places a mark_page_accessed() in zap_pte_range(), we should remove
      the mark_page_accessed() from shmem_fault().
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Johannes Weiner <hannes@saeurebad.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      390722ba
    • mm: don't mark_page_accessed in fault path · bf3f3bc5
      Nick Piggin authored
      Doing a mark_page_accessed at fault-time, then doing SetPageReferenced at
      unmap-time if the pte is young has a number of problems.
      
      mark_page_accessed is supposed to be roughly the equivalent of a young pte
      for unmapped references. Unfortunately it doesn't come with any context:
      after being called, reclaim doesn't know who or why the page was touched.
      
      So calling mark_page_accessed not only adds extra lru or PG_referenced
      manipulations for pages that are already going to have pte_young ptes anyway,
      but it also adds these references which are difficult to work with from the
      context of vma specific references (eg. MADV_SEQUENTIAL pte_young may not
      wish to contribute to the page being referenced).
      
      Then, simply doing SetPageReferenced when zapping a pte and finding it is
      young, is not a really good solution either. SetPageReferenced does not
      correctly promote the page to the active list for example. So after removing
      mark_page_accessed from the fault path, several mmap()+touch+munmap() would
      have a very different result from several read(2) calls for example, which
      is not really desirable.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Johannes Weiner <hannes@saeurebad.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bf3f3bc5
    • mm: report the MMU pagesize in /proc/pid/smaps · 3340289d
      Mel Gorman authored
      The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
      kernel to back a VMA.  This matches the size used by the MMU in the
      majority of cases.  However, one counter-example occurs on PPC64 kernels
      whereby a kernel using 64K as a base pagesize may still use 4K pages for
      the MMU on older processors.  To distinguish, this patch reports
      MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3340289d
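
A user-space tool could read the KernelPageSize and MMUPageSize entries with a small line parser along these lines. This is a sketch that assumes each smaps entry has the form "Key:   N kB"; `parse_smaps_kb` is a hypothetical helper name, and a parser built this way simply skips lines whose key it does not recognise.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one smaps line of the assumed form "<Key>:   <value> kB".
 * Returns 1 and stores the value in *kb if the line's key equals `key`,
 * otherwise returns 0 and leaves *kb untouched.
 */
static int parse_smaps_kb(const char *line, const char *key, unsigned long *kb)
{
    size_t klen = strlen(key);

    if (strncmp(line, key, klen) != 0 || line[klen] != ':')
        return 0;
    /* %lu skips the padding spaces between the colon and the number. */
    return sscanf(line + klen + 1, "%lu kB", kb) == 1;
}
```

Comparing the two parsed values for a VMA shows whether the kernel's backing page size differs from the size the MMU uses.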
    • mm: report the pagesize backing a VMA in /proc/pid/smaps · 08fba699
      Mel Gorman authored
      It is useful to verify a hugepage-aware application is using the expected
      pagesizes for its memory regions. This patch creates an entry called
      KernelPageSize in /proc/pid/smaps that is the size of page used by the
      kernel to back a VMA. The entry is not called PageSize as it is possible
      the MMU uses a different size. This extension should not break any sensible
      parser that skips lines containing unrecognised information.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      08fba699
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm · 238c6d54
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
        dm snapshot: extend exception store functions
        dm snapshot: split out exception store implementations
        dm snapshot: rename struct exception_store
        dm snapshot: separate out exception store interface
        dm mpath: move trigger_event to system workqueue
        dm: add name and uuid to sysfs
        dm table: rework reference counting
        dm: support barriers on simple devices
        dm request: extend target interface
        dm request: add caches
        dm ioctl: allow dm_copy_name_and_uuid to return only one field
        dm log: ensure log bitmap fits on log device
        dm log: move region_size validation
        dm log: avoid reinitialising io_req on every operation
        dm: consolidate target deregistration error handling
        dm raid1: fix error count
        dm log: fix dm_io_client leak on error paths
        dm snapshot: change yield to msleep
        dm table: drop reference at unbind
      238c6d54
    • dm snapshot: extend exception store functions · a159c1ac
      Jonathan Brassow authored
      Supply dm_add_exception as a callback to the read_metadata function.
      Add a status function ready for a later patch and name the functions
      consistently.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      a159c1ac
    • dm snapshot: split out exception store implementations · 4db6bfe0
      Alasdair G Kergon authored
      Move the existing snapshot exception store implementations out into
      separate files.  Later patches will place these behind a new
      interface in preparation for alternative implementations.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      4db6bfe0
    • dm snapshot: rename struct exception_store · 1ae25f9c
      Jonathan Brassow authored
      Rename struct exception_store to dm_exception_store.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      1ae25f9c
    • dm snapshot: separate out exception store interface · aea53d92
      Jonathan Brassow authored
      Pull structures that bridge the gap between snapshot and
      exception store out of dm-snap.h and put them in a new
      .h file - dm-exception-store.h.  This file will define the
      API for new exception stores.
      
      Ultimately, dm-snap.h is unnecessary, since only dm-snap.c
      should be using it.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      aea53d92
    • dm mpath: move trigger_event to system workqueue · fe9cf30e
      Alasdair G Kergon authored
      The same workqueue is used both for sending uevents and processing queued I/O.
      Deadlock has been reported in RHEL5 when sending a uevent was blocked waiting
      for the queued I/O to be processed.  Use schedule_work() for the asynchronous
      uevents instead.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      fe9cf30e
    • dm: add name and uuid to sysfs · 784aae73
      Milan Broz authored
      Implement simple read-only sysfs entry for device-mapper block device.
      
      This patch adds a simple sysfs directory named "dm" under block device
      properties and implements
      	- name attribute (string containing mapped device name)
      	- uuid attribute (string containing UUID, or empty string if not set)
      
      The kobject is embedded in mapped_device struct, so no additional
      memory allocation is needed for initializing sysfs entry.
      
      During the processing of sysfs attribute we need to lock mapped device
      which is done by a new function dm_get_from_kobj, which returns the md
      associated with kobject and increases the usage count.
      
      Each 'show attribute' function is responsible for its own locking.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      784aae73
    • dm table: rework reference counting · d5816876
      Mikulas Patocka authored
      Rework table reference counting.
      
      The existing code uses a reference counter. When the last reference is
      dropped and the counter reaches zero, the table destructor is called.
      Table reference counters are acquired/released from upcalls from other
      kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
      If the reference counter reaches zero in one of the upcalls, the table
      destructor is called from almost random kernel code.
      
      This leads to various problems:
      * dm_any_congested being called under a spinlock, which calls the
        destructor, which calls some sleeping function.
      * the destructor attempting to take a lock that is already taken by the
        same process.
      * stale reference from some other kernel code keeps the table
        constructed, which keeps some devices open, even after successful
        return from "dmsetup remove". This can confuse lvm and prevent closing
        of underlying devices or reusing device minor numbers.
      
      The patch changes reference counting so that the table destructor can be
      called only at predetermined places.
      
      The table has always exactly one reference from either mapped_device->map
      or hash_cell->new_map. After this patch, this reference is not counted
      in table->holders.  A pair of dm_table_create/dm_table_destroy functions
      is used for table creation/destruction.
      
      Temporary references from the other code increase table->holders. A pair
      of dm_table_get/dm_table_put functions is used to manipulate it.
      
      When the table is about to be destroyed, we wait for table->holders to
      reach 0. Then, we call the table destructor.  We use active waiting with
      msleep(1), because the situation happens rarely (to one user in 5 years)
      and removing the device isn't a performance-critical task: the user doesn't
      care if it takes one tick more or not.
      
      This way, the destructor is called only at specific points
      (dm_table_destroy function) and the above problems associated with lazy
      destruction can't happen.
      
      Finally remove the temporary protection added to dm_any_congested().
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      d5816876
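
The holders scheme described in this commit message can be modelled in a single-threaded sketch. All names here (`struct table`, `table_get`, `table_put`, `table_destroy`, `holder_finishes`) are hypothetical, a plain int replaces the kernel's atomic counter, and the `wait_a_bit` callback stands in for msleep(1): in this model it is the moment at which some other holder drops its temporary reference.

```c
#include <stdlib.h>

/* Simplified table: only the count of temporary references is modelled. */
struct table { int holders; };

static int wait_calls;   /* how many times the destroy path had to wait */

static struct table *table_create(void)
{
    return calloc(1, sizeof(struct table));   /* holders starts at 0 */
}

static void table_get(struct table *t) { t->holders++; }
static void table_put(struct table *t) { t->holders--; }

/*
 * Destroy the table only once no temporary holders remain.  The
 * destructor therefore runs here and nowhere else.
 */
static void table_destroy(struct table *t, void (*wait_a_bit)(struct table *))
{
    while (t->holders > 0)
        wait_a_bit(t);
    free(t);
}

/* Example wait callback: one outstanding holder finishes and drops its ref. */
static void holder_finishes(struct table *t)
{
    wait_calls++;
    table_put(t);
}
```

With two outstanding references, `table_destroy` waits twice before freeing the table; `table_put` itself never triggers destruction.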
    • dm: support barriers on simple devices · ab4c1424
      Andi Kleen authored
      Implement barrier support for single device DM devices
      
      This patch implements barrier support in DM for the common case of dm linear
      just remapping a single underlying device. In this case we can safely
      pass the barrier through because there can be no reordering between
      devices.
      
       NB. Any DM device might cease to support barriers if it gets
           reconfigured so code must continue to allow for a possible
           -EOPNOTSUPP on every barrier bio submitted.  - agk
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      ab4c1424
    • dm request: extend target interface · 7d76345d
      Kiyoshi Ueda authored
      This patch adds the following target interfaces for request-based dm.
      
        map_rq    : for mapping a request
      
        rq_end_io : for finishing a request
      
        busy      : for avoiding performance regression from bio-based dm.
                    Target can tell dm core not to map requests now, and
                    that may help requests in the block layer queue to be
                    bigger by I/O merging.
                    In bio-based dm, this behavior is done by device
                    drivers managing the block layer queue.
                    But in request-based dm, dm core has to do that
                    since dm core manages the block layer queue.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      7d76345d
    • dm request: add caches · 8fbf26ad
      Kiyoshi Ueda authored
      This patch prepares some kmem_caches for request-based dm.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      8fbf26ad
    • dm ioctl: allow dm_copy_name_and_uuid to return only one field · 23d39f63
      Milan Broz authored
      Allow NULL buffer in dm_copy_name_and_uuid if you only want to return one of
      the fields.
      
      (Required by a following patch that adds these fields to sysfs.)
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Reviewed-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      23d39f63
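
The "NULL buffer means the caller does not want this field" convention can be sketched as follows. This is not the signature of dm_copy_name_and_uuid; `struct dev_ids`, `copy_name_and_uuid` and the explicit buffer lengths are assumptions made so the example is safe to run in user space.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device record; uuid may be NULL when none is set. */
struct dev_ids {
    const char *name;
    const char *uuid;
};

/*
 * Copy the name and/or uuid into the caller's buffers.  Passing NULL for
 * either buffer skips that field, so a caller can ask for just one.
 * An unset uuid is reported as an empty string.
 */
static void copy_name_and_uuid(const struct dev_ids *d,
                               char *name, size_t name_len,
                               char *uuid, size_t uuid_len)
{
    if (name && name_len)
        snprintf(name, name_len, "%s", d->name);
    if (uuid && uuid_len)
        snprintf(uuid, uuid_len, "%s", d->uuid ? d->uuid : "");
}
```

A sysfs-style "show name" handler would pass NULL for the uuid buffer, and a "show uuid" handler would pass NULL for the name buffer.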
    • dm log: ensure log bitmap fits on log device · ac1f0ac2
      Milan Broz authored
      Check that the log bitmap will fit within the log device.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      ac1f0ac2
    • dm log: move region_size validation · 2045e88e
      Milan Broz authored
      Move log size validation from mirror target to log constructor.
      
      Removed PAGE_SIZE restriction we no longer think necessary.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      2045e88e
    • dm log: avoid reinitialising io_req on every operation · 6f3af01c
      Takahiro Yasui authored
      rw_header function updates three members of io_req data every time
      when I/O is processed. bi_rw and notify.fn are never modified once
      they get initialized, and so they can be set in advance.
      
      header_to_disk() can also be pulled out of write_header() since only one
      caller needs it and write_header() can be replaced by rw_header()
      directly.
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      6f3af01c
    • dm: consolidate target deregistration error handling · 10d3bd09
      Mikulas Patocka authored
      Change dm_unregister_target to return void and use BUG() for error
      reporting.
      
      dm_unregister_target can only fail because of programming bug in the
      target driver. It can't fail because of user's behavior or disk errors.
      
      This patch changes unregister_target to return void and use BUG if
      someone tries to unregister non-registered target or unregister target
      that is in use.
      
      This patch removes code duplication (testing of error codes in all dm
      targets) and reports bugs in just one place, in dm_unregister_target. In
      some target drivers, these return codes were ignored, which could lead
      to a situation where bugs could be missed.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      10d3bd09
    • dm raid1: fix error count · d460c65a
      Jonathan Brassow authored
      Always increase the error count when I/O on a leg of a mirror fails.
      
      The error count is used to decide whether to select an alternative
      mirror leg.  If the target doesn't use the "handle_errors" feature, the
      error count is not updated and the bio can get requeued forever by the
      read callback.
      
      Fix it by increasing error_count before the handle_errors feature
      checking.
      
      Cc: stable@kernel.org
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      d460c65a
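
The ordering fix can be illustrated with a reduced model in which the error count is bumped before the handle_errors check. The names `struct mirror_leg`, `fail_leg` and `leg_usable` are hypothetical, and the real driver tracks more state than a single counter.

```c
#include <stdbool.h>

/* Hypothetical mirror leg: only the error counter is modelled. */
struct mirror_leg { int error_count; };

/*
 * Record a failed I/O on a leg.  The count is incremented first, so it is
 * updated whether or not the target handles errors.  Returns true if the
 * caller should go on to do full error handling.
 */
static bool fail_leg(struct mirror_leg *m, bool handle_errors)
{
    m->error_count++;        /* always counted */
    if (!handle_errors)
        return false;        /* no further handling, but the count is kept */
    return true;
}

/* The read path can use the count to decide whether to try another leg. */
static bool leg_usable(const struct mirror_leg *m)
{
    return m->error_count == 0;
}
```

Because the count changes even without handle_errors, a read that failed on this leg is not requeued to the same leg indefinitely.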
    • dm log: fix dm_io_client leak on error paths · c7a2bd19
      Takahiro Yasui authored
      In create_log_context(), dm_io_client_destroy() needs to be called
      when memory allocation of disk_header, sync_bits or recovering_bits
      fails, but currently it is not called on those paths.
      
      Cc: stable@kernel.org
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      Acked-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      c7a2bd19
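
The kind of leak fixed here is commonly avoided in C with goto-based unwinding, where each error label releases what was acquired before it. The sketch below is a user-space model with hypothetical names (`io_client_create`, `io_client_destroy`, `create_log_ctx`, `destroy_log_ctx`) and a `fail_alloc` switch to simulate the allocation failure; it is not the dm-log code.

```c
#include <stdlib.h>

static int clients_live;   /* number of io clients currently allocated */

struct io_client { int unused; };

static struct io_client *io_client_create(void)
{
    struct io_client *c = malloc(sizeof(*c));

    if (c)
        clients_live++;
    return c;
}

static void io_client_destroy(struct io_client *c)
{
    free(c);
    clients_live--;
}

struct log_ctx {
    struct io_client *client;
    void *disk_header;
};

/* fail_alloc != 0 simulates the header allocation failing. */
static struct log_ctx *create_log_ctx(int fail_alloc)
{
    struct log_ctx *lc = malloc(sizeof(*lc));

    if (!lc)
        return NULL;
    lc->client = io_client_create();
    if (!lc->client)
        goto err_free_lc;
    lc->disk_header = fail_alloc ? NULL : malloc(512);
    if (!lc->disk_header)
        goto err_destroy_client;   /* the client must not be leaked */
    return lc;

err_destroy_client:
    io_client_destroy(lc->client);
err_free_lc:
    free(lc);
    return NULL;
}

static void destroy_log_ctx(struct log_ctx *lc)
{
    free(lc->disk_header);
    io_client_destroy(lc->client);
    free(lc);
}
```

After a simulated failure, `clients_live` is back at zero, which is the property the fix restores.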
    • dm snapshot: change yield to msleep · 90fa1527
      Mikulas Patocka authored
      Change yield() to msleep(1). If the thread had realtime priority,
      yield() doesn't really yield, so the yielding process would loop
      indefinitely and cause machine lockup.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      90fa1527
    • dm table: drop reference at unbind · a1b51e98
      Mikulas Patocka authored
      Move one dm_table_put() so that the last reference in the thread
      gets dropped in __unbind().
      
      This is required for a following patch,
      dm-table-rework-reference-counting.patch, which will change the logic in
      such a way that table destructor is called only at specific points in
      the code.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      a1b51e98
    • Merge branch 'for-next' of git://git.o-hand.com/linux-mfd · 8e128ce3
      Linus Torvalds authored
      * 'for-next' of git://git.o-hand.com/linux-mfd: (30 commits)
        mfd: Fix section mismatch in da903x
        mfd: move drivers/i2c/chips/menelaus.c to drivers/mfd
        mfd: move drivers/i2c/chips/tps65010.c to drivers/mfd
        mfd: dm355evm msp430 driver
        mfd: Add missing break from wm3850-core
        mfd: Add WM8351 support
        mfd: Support configurable numbers of DCDCs and ISINKs on WM8350
        mfd: Handle missing WM8350 platform data
        mfd: Add WM8352 support
        mfd: Use irq_to_desc in twl4030 code
        power_supply: Add Dialog DA9030 battery charger driver
        mfd: Dialog DA9030 battery charger MFD driver
        mfd: Register WM8400 codec device
        mfd: Pass driver_data onto child devices
        mfd: Fix twl4030-core.c build error
        mfd: twl4030 regulator bug fixes
        mfd: twl4030: create some regulator devices
        mfd: twl4030: cleanup symbols and OMAP dependency
        mfd: twl4030: simplified child creation code
        power_supply: Add battery health reporting for WM8350
        ...
      8e128ce3
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus · 0bbb2753
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
        module: convert to stop_machine_create/destroy.
        stop_machine: introduce stop_machine_create/destroy.
        parisc: fix module loading failure of large kernel modules
        module: fix module loading failure of large kernel modules for parisc
        module: fix warning of unused function when !CONFIG_PROC_FS
        kernel/module.c: compare symbol values when marking symbols as exported in /proc/kallsyms.
        remove CONFIG_KMOD
      0bbb2753
    • Merge branch 'core-fixes-for-linus' of... · 0578c3b4
      Linus Torvalds authored
      Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        swiotlb: Don't include linux/swiotlb.h twice in lib/swiotlb.c
        intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
        swiotlb: add missing __init annotations
      0578c3b4
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm · 7d8a804c
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
        dlm: fs/dlm/ast.c: fix warning
        dlm: add new debugfs entry
        dlm: add time stamp of blocking callback
        dlm: change lock time stamping
        dlm: improve how bast mode handling
        dlm: remove extra blocking callback check
        dlm: replace schedule with cond_resched
        dlm: remove kmap/kunmap
        dlm: trivial annotation of be16 value
        dlm: fix up memory allocation flags
      7d8a804c
    • Merge branch 'i2c-next' of git://aeryn.fluff.org.uk/bjdooks/linux · c58bd34d
      Linus Torvalds authored
      * 'i2c-next' of git://aeryn.fluff.org.uk/bjdooks/linux:
        i2c-omap: fix type of irq handler function
        i2c-s3c2410: Change IRQ to be plain integer.
        i2c-s3c2410: Allow more than one i2c-s3c2410 adapter
        i2c-s3c2410: Remove default platform data.
        i2c-s3c2410: Use platform data for gpio configuration
        i2c-s3c2410: Fixup style problems from checkpatch.pl
        i2c-omap: Enable I2C wakeups for 34xx
        i2c-omap: reprogram OCP_SYSCONFIG register after reset
        i2c-omap: convert 'rev1' flag to generic 'rev' u8
        i2c-omap: fix I2C timeouts due to recursive omap_i2c_{un,}idle()
        i2c-omap: Clean-up i2c-omap
        i2c-omap: Don't compile in OMAP15xx I2C ISR for non-OMAP15xx builds
        i2c-omap: Mark init-only functions as __init
        i2c-omap: Add support for omap34xx
        i2c-omap: FIFO handling support and broken hw workaround for i2c-omap
        i2c-omap: Add high-speed support to omap-i2c
        i2c-omap: Close suspected race between omap_i2c_idle() and omap_i2c_isr()
        i2c-omap: Do not use interruptible wait call in omap_i2c_xfer_msg
      
      Fix up apparently-trivial conflict in drivers/i2c/busses/i2c-s3c2410.c
      c58bd34d
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 8606ab6d
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (22 commits)
        HID: fix error condition propagation in hid-sony driver
        HID: fix reference count leak hidraw
        HID: add proper support for pensketch 12x9 tablet
        HID: don't allow DealExtreme usb-radio be handled by usb hid driver
        HID: fix default Kconfig setting for TopSpeed driver
        HID: driver for TopSeed Cyberlink quirky remote
        HID: make boot protocol drivers depend on EMBEDDED
        HID: avoid sparse warning in HID_COMPAT_LOAD_DRIVER
        HID: hiddev cleanup -- handle all error conditions properly
        HID: force feedback driver for GreenAsia 0x12 PID
        HID: switch specialized drivers from "default y" to !EMBEDDED
        HID: set proper dev.parent in hidraw
        HID: add dynids facility
        HID: use GFP_KERNEL in hid_alloc_buffers
        HID: usbhid, use usb_endpoint_xfer_int
        HID: move usbhid flags to usbhid.h
        HID: add n-trig digitizer support
        HID: add phys and name ioctls to hidraw
        HID: struct device - replace bus_id with dev_name(), dev_set_name()
        HID: automatically call usbhid_set_leds in usbhid driver
        ...
      8606ab6d