An error occurred fetching the project authors.
  1. 11 May, 2009 3 commits
    • Tejun Heo's avatar
      block: drop request->hard_* and *nr_sectors · 2e46e8b2
      Tejun Heo authored
      struct request has had a few different ways to represent some
      properties of a request.  ->hard_* represent block layer's view of the
      request progress (completion cursor) and the ones without the prefix
      are supposed to represent the issue cursor and allowed to be updated
      as necessary by the low level drivers.  The thing is that as block
      layer supports partial completion, the two cursors really aren't
      necessary and only cause confusion.  In addition, manual management of
      request detail from low level drivers is cumbersome and error-prone at
      the very least.
      
      Another interesting duplicate fields are rq->[hard_]nr_sectors and
      rq->{hard_cur|current}_nr_sectors against rq->data_len and
      rq->bio->bi_size.  This is more convoluted than the hard_ case.
      
      rq->[hard_]nr_sectors are initialized for requests with bio but
      blk_rq_bytes() uses it only for !pc requests.  rq->data_len is
      initialized for all request but blk_rq_bytes() uses it only for pc
      requests.  This causes good amount of confusion throughout block layer
      and its drivers and determining the request length has been a bit of
      black magic which may or may not work depending on circumstances and
      what the specific LLD is actually doing.
      
      rq->{hard_cur|current}_nr_sectors represent the number of sectors in
      the contiguous data area at the front.  This is mainly used by drivers
      which transfers data by walking request segment-by-segment.  This
      value always equals rq->bio->bi_size >> 9.  However, data length for
      pc requests may not be multiple of 512 bytes and using this field
      becomes a bit confusing.
      
      In general, having multiple fields to represent the same property
      leads only to confusion and subtle bugs.  With recent block low level
      driver cleanups, no driver is accessing or manipulating these
      duplicate fields directly.  Drop all the duplicates.  Now rq->sector
      means the current sector, rq->data_len the current total length and
      rq->bio->bi_size the current segment length.  Everything else is
      defined in terms of these three and available only through accessors.
      
      * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
        now handles pc and fs requests equally other than rq->sector update.
        This means that now pc requests can use partial completion too (no
        in-kernel user yet tho).
      
      * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
        now uses byte count as the primary data length.
      
      * blk_rq_pos() is now guranteed to be always correct.  In-block users
        converted.
      
      * blk_rq_bytes() is now guaranteed to be always valid as is
        blk_rq_sectors().  In-block users converted.
      
      * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
        More convenient one is used.
      
      * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
        pointer to request.
      
      [ Impact: API cleanup, single way to represent one property of a request ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      2e46e8b2
    • Tejun Heo's avatar
      block: convert to pos and nr_sectors accessors · 83096ebf
      Tejun Heo authored
      With recent cleanups, there is no place where low level driver
      directly manipulates request fields.  This means that the 'hard'
      request fields always equal the !hard fields.  Convert all
      rq->sectors, nr_sectors and current_nr_sectors references to
      accessors.
      
      While at it, drop superflous blk_rq_pos() < 0 test in swim.c.
      
      [ Impact: use pos and nr_sectors accessors ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarGeert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Tested-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      Acked-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      Tested-by: default avatarAdrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: default avatarAdrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: default avatarMike Miller <mike.miller@hp.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Eric Moore <Eric.Moore@lsi.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Dario Ballabio <ballabio_dario@emc.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: unsik Kim <donari75@gmail.com>
      Cc: Laurent Vivier <Laurent@lvivier.info>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      83096ebf
    • Tejun Heo's avatar
      block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones · 5b93629b
      Tejun Heo authored
      Implement accessors - blk_rq_pos(), blk_rq_sectors() and
      blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
      and rq->hard_cur_sectors respectively and convert direct references of
      the said fields to the accessors.
      
      This is in preparation of request data length handling cleanup.
      
      Geert	: suggested adding const to struct request * parameter to accessors
      Sergei	: spotted error in patch description
      
      [ Impact: cleanup ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarGeert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Acked-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Tested-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      Acked-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      Ackec-by: default avatarSergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      5b93629b
  2. 28 Apr, 2009 14 commits
    • Nikanth Karthikesan's avatar
    • Jens Axboe's avatar
      block: make blk_do_io_stat() do the full "is this rq accountable" checks · c2553b58
      Jens Axboe authored
      We currently check for file system requests outside of blk_do_io_stat(rq),
      but we may as well just include it.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      c2553b58
    • Tejun Heo's avatar
      block: kill rq->data · 731ec497
      Tejun Heo authored
      Now that all block request data transfer is done via bio, rq->data
      isn't used.  Kill it.
      
      While at it, make the roles of rq->special and buffer clear.
      
      [ Impact: drop now unncessary field from struct request ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      731ec497
    • Tejun Heo's avatar
      block: implement and use [__]blk_end_request_all() · 40cbbb78
      Tejun Heo authored
      There are many [__]blk_end_request() call sites which call it with
      full request length and expect full completion.  Many of them ensure
      that the request actually completes by doing BUG_ON() the return
      value, which is awkward and error-prone.
      
      This patch adds [__]blk_end_request_all() which takes @rq and @error
      and fully completes the request.  BUG_ON() is added to to ensure that
      this actually happens.
      
      Most conversions are simple but there are a few noteworthy ones.
      
      * cdrom/viocd: viocd_end_request() replaced with direct calls to
        __blk_end_request_all().
      
      * s390/block/dasd: dasd_end_request() replaced with direct calls to
        __blk_end_request_all().
      
      * s390/char/tape_block: tapeblock_end_request() replaced with direct
        calls to blk_end_request_all().
      
      [ Impact: cleanup ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      40cbbb78
    • Tejun Heo's avatar
      block: move rq->start_time initialization to blk_rq_init() · b243ddcb
      Tejun Heo authored
      rq->start_time was initialized in init_request_from_bio() so special
      requests didn't have start_time set.  This has been okay as start_time
      has been used only for fs requests; however, there is no indication of
      this actually is the case or not.  Set rq->start_time in blk_rq_init()
      and guarantee that all initialized rq's have its start_time set.  This
      improves consistency at virtually no cost and future changes will make
      use of the timestamp for !bio requests.
      
      [ Impact: rq->start_time is valid for all requests ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      b243ddcb
    • Tejun Heo's avatar
      block: clean up request completion API · 2e60e022
      Tejun Heo authored
      Request completion has gone through several changes and became a bit
      messy over the time.  Clean it up.
      
      1. end_that_request_data() is a thin wrapper around
         end_that_request_data_first() which checks whether bio is NULL
         before doing anything and handles bidi completion.
         blk_update_request() is a thin wrapper around
         end_that_request_data() which clears nr_sectors on the last
         iteration but doesn't use the bidi completion.
      
         Clean it up by moving the initial bio NULL check and nr_sectors
         clearing on the last iteration into end_that_request_data() and
         renaming it to blk_update_request(), which makes blk_end_io() the
         only user of end_that_request_data().  Collapse
         end_that_request_data() into blk_end_io().
      
      2. There are four visible completion variants - blk_end_request(),
         __blk_end_request(), blk_end_bidi_request() and end_request().
         blk_end_request() and blk_end_bidi_request() uses blk_end_request()
         as the backend but __blk_end_request() and end_request() use
         separate implementation in __blk_end_request() due to different
         locking rules.
      
         blk_end_bidi_request() is identical to blk_end_io().  Collapse
         blk_end_io() into blk_end_bidi_request(), separate out request
         update into internal helper blk_update_bidi_request() and add
         __blk_end_bidi_request().  Redefine [__]blk_end_request() as thin
         inline wrappers around [__]blk_end_bidi_request().
      
      3. As the whole request issue/completion usages are about to be
         modified and audited, it's a good chance to convert completion
         functions return bool which better indicates the intended meaning
         of return values.
      
      4. The function name end_that_request_last() is from the days when it
         was a public interface and slighly confusing.  Give it a proper
         internal name - blk_finish_request().
      
      5. Add description explaning that blk_end_bidi_request() can be safely
         used for uni requests as suggested by Boaz Harrosh.
      
      The only visible behavior change is from #1.  nr_sectors counts are
      cleared after the final iteration no matter which function is used to
      complete the request.  I couldn't find any place where the code
      assumes those nr_sectors counters contain the values for the last
      segment and this change is good as it makes the API much more
      consistent as the end result is now same whether a request is
      completed using [__]blk_end_request() alone or in combination with
      blk_update_request().
      
      API further cleaned up per Christoph's suggestion.
      
      [ Impact: cleanup, rq->*nr_sectors always updated after req completion ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      2e60e022
    • Tejun Heo's avatar
      block: kill blk_end_request_callback() · 0b302d5a
      Tejun Heo authored
      With recent IDE updates, blk_end_request_callback() doesn't have any
      user now.  Kill it.
      
      [ Impact: removal of unused convoluted interface ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      0b302d5a
    • Tejun Heo's avatar
      block: reorganize request fetching functions · 158dbda0
      Tejun Heo authored
      Impact: code reorganization
      
      elv_next_request() and elv_dequeue_request() are public block layer
      interface than actual elevator implementation.  They mostly deal with
      how requests interact with block layer and low level drivers at the
      beginning of rqeuest processing whereas __elv_next_request() is the
      actual eleveator request fetching interface.
      
      Move the two functions to blk-core.c.  This prepares for further
      interface cleanup.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      158dbda0
    • Tejun Heo's avatar
      block: reorder request completion functions · 5efccd17
      Tejun Heo authored
      Reorder request completion functions such that
      
      * All request completion functions are located together.
      
      * Functions which are used by only one caller is put right above the
        caller.
      
      * end_request() is put after other completion functions but before
        blk_update_request().
      
      This change is for completion function cleanup which will follow.
      
      [ Impact: cleanup, code reorganization ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      5efccd17
    • Tejun Heo's avatar
      block: cleanup REQ_SOFTBARRIER usages · 10732f56
      Tejun Heo authored
      blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER.
      Don't set it.  Combined with recent ide updates, REQ_SOFTBARRIER is
      now only used in elevator proper and for discard requests.
      
      [ Impact: cleanup ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      10732f56
    • Tejun Heo's avatar
      block: don't set REQ_NOMERGE unnecessarily · e4025f6c
      Tejun Heo authored
      RQ_NOMERGE_FLAGS already clears defines which REQ flags aren't
      mergeable.  There is no reason to specify it superflously.  It only
      adds to confusion.  Don't set REQ_NOMERGE for barriers and requests
      with specific queueing directive.  REQ_NOMERGE is now exclusively used
      by the merging code.
      
      [ Impact: cleanup ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      e4025f6c
    • Tejun Heo's avatar
      block: kill blk_start_queueing() · a7f55792
      Tejun Heo authored
      blk_start_queueing() is identical to __blk_run_queue() except that it
      doesn't check for recursion.  None of the current users depends on
      blk_start_queueing() running request_fn directly.  Replace usages of
      blk_start_queueing() with [__]blk_run_queue() and kill it.
      
      [ Impact: removal of mostly duplicate interface function ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      a7f55792
    • Tejun Heo's avatar
      block: merge blk_invoke_request_fn() into __blk_run_queue() · a538cd03
      Tejun Heo authored
      __blk_run_queue wraps blk_invoke_request_fn() such that it
      additionally removes plug and bails out early if the queue is empty.
      Both extra operations have their own pending mechanisms and don't
      cause any harm correctness-wise when they are done superflously.
      
      The only user of blk_invoke_request_fn() being blk_start_queue(),
      there isn't much reason to keep both functions around.  Merge
      blk_invoke_request_fn() into __blk_run_queue() and make
      blk_start_queue() use __blk_run_queue() instead.
      
      [ Impact: merge two subtly different internal functions ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      a538cd03
    • Tejun Heo's avatar
      block: clear req->errors on bio completion only for fs requests · 924cec77
      Tejun Heo authored
      Impact: subtle behavior change
      
      For fs requests, rq is only carrier of bios and rq error status as a
      whole doesn't mean much.  This is the reason why rq->errors is being
      cleared on each partial completion of a request as on each partial
      completion the error status is transferred to the respective bios.
      
      For pc requests, rq->errors is used to carry error status to the
      issuer and thus __end_that_request_first() doesn't clear it on such
      cases.
      
      The condition was fine till now as only fs and pc requests have used
      bio and thus the bio completion path.  However, future changes will
      unify data accesses to bio and all non fs users care about rq error
      status.  Clear rq->errors on bio completion only for fs requests.
      
      In general, the implicit clearing is a bit too subtle especially as
      the meaning of rq->errors is completely dependent on low level
      drivers.  Unifying / cleaning up rq->errors usage and letting llds
      manage it would be better.  TODO comment added.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarJens Axboe <axboe@kernel.dk>
      924cec77
  3. 24 Apr, 2009 1 commit
  4. 07 Apr, 2009 2 commits
  5. 06 Apr, 2009 3 commits
  6. 03 Apr, 2009 1 commit
    • Li Zefan's avatar
      blktrace: fix pdu_len when tracing packet command requests · e2494e1b
      Li Zefan authored
      Impact: output all of packet commands - not just the first 4 / 8 bytes
      
      Since commit d7e3c324 ("block: add
      large command support"), struct request->cmd has been changed from
      unsinged char cmd[BLK_MAX_CDB] to unsigned char *cmd.
      
      v1 -> v2: by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      
      - make sure rq->cmd_len is always intialized, and then we can use
        rq->cmd_len instead of BLK_MAX_CDB.
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: default avatarFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      LKML-Reference: <49D4507E.2060602@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e2494e1b
  7. 26 Mar, 2009 1 commit
  8. 24 Mar, 2009 2 commits
  9. 02 Feb, 2009 1 commit
  10. 30 Jan, 2009 3 commits
  11. 29 Dec, 2008 5 commits
    • Jens Axboe's avatar
      block: don't use plugging on SSD devices · a31a9738
      Jens Axboe authored
      We just want to hand the first bits of IO to the device as fast
      as possible. Gains a few percent on the IOPS rate.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      a31a9738
    • Tejun Heo's avatar
      block: remove duplicate or unused barrier/discard error paths · a7384677
      Tejun Heo authored
      * Because barrier mode can be changed dynamically, whether barrier is
        supported or not can be determined only when actually issuing the
        barrier and there is no point in checking it earlier.  Drop barrier
        support check in generic_make_request() and __make_request(), and
        update comment around the support check in blk_do_ordered().
      
      * There is no reason to check discard support in both
        generic_make_request() and __make_request().  Drop the check in
        __make_request().  While at it, move error action block to the end
        of the function and add unlikely() to q existence test.
      
      * Barrier request, be it empty or not, is never passed to low level
        driver and thus it's meaningless to try to copy back req->sector to
        bio->bi_sector on error.  In addition, the notion of failed sector
        doesn't make any sense for empty barrier to begin with.  Drop the
        code block from __end_that_request_first().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      a7384677
    • Cheng Renquan's avatar
      block: use cancel_work_sync() instead of kblockd_flush_work() · 64d01dc9
      Cheng Renquan authored
      After many improvements on kblockd_flush_work, it is now identical to
      cancel_work_sync, so a direct call to cancel_work_sync is suggested.
      
      The only difference is that cancel_work_sync is a GPL symbol,
      so no non-GPL modules anymore.
      Signed-off-by: default avatarCheng Renquan <crquan@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      64d01dc9
    • Keith Mannthey's avatar
      block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03
      Keith Mannthey authored
      Allow the scsi request REQ_QUIET flag to be propagated to the buffer
      file system layer. The basic ideas is to pass the flag from the scsi
      request to the bio (block IO) and then to the buffer layer.  The buffer
      layer can then suppress needless printks.
      
      This patch declutters the kernel log by removed the 40-50 (per lun)
      buffer io error messages seen during a boot in my multipath setup . It
      is a good chance any real errors will be missed in the "noise" it the
      logs without this patch.
      
      During boot I see blocks of messages like
      "
      __ratelimit: 211 callbacks suppressed
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242847
      Buffer I/O error on device sdm, logical block 1
      Buffer I/O error on device sdm, logical block 5242878
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242872
      "
      in my logs.
      
      My disk environment is multipath fiber channel using the SCSI_DH_RDAC
      code and multipathd.  This topology includes an "active" and "ghost"
      path for each lun. IO's to the "ghost" path will never complete and the
      SCSI layer, via the scsi device handler rdac code, quick returns the IOs
      to theses paths and sets the REQ_QUIET scsi flag to suppress the scsi
      layer messages.
      
       I am wanting to extend the QUIET behavior to include the buffer file
      system layer to deal with these errors as well. I have been running this
      patch for a while now on several boxes without issue.  A few runs of
      bonnie++ show no noticeable difference in performance in my setup.
      
      Thanks for John Stultz for the quiet_error finalization.
      Submitted-by: default avatarKeith Mannthey <kmannth@us.ibm.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      08bafc03
    • Jens Axboe's avatar
      block: leave the request timeout timer running even on an empty list · 70ed28b9
      Jens Axboe authored
      For sync IO, we'll often do them serialized. This means we'll be touching
      the queue timer for every IO, as opposed to only occasionally like we
      do for queued IO. Instead of deleting the timer when the last request
      is removed, just let continue running. If a new request comes up soon
      we then don't have to readd the timer again. If no new requests arrive,
      the timer will expire without side effect later.
      
      This improves high iops sync IO by ~1%.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      70ed28b9
  12. 03 Dec, 2008 2 commits
    • Milan Broz's avatar
      block: fix setting of max_segment_size and seg_boundary mask · 0e435ac2
      Milan Broz authored
      Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
      devices.
      
      When stacking devices (LVM over MD over SCSI) some of the request queue
      parameters are not set up correctly in some cases by default, namely
      max_segment_size and and seg_boundary mask.
      
      If you create MD device over SCSI, these attributes are zeroed.
      
      Problem become when there is over this mapping next device-mapper mapping
      - queue attributes are set in DM this way:
      
      request_queue   max_segment_size  seg_boundary_mask
      SCSI                65536             0xffffffff
      MD RAID1                0                      0
      LVM                 65536                 -1 (64bit)
      
      Unfortunately bio_add_page (resp.  bio_phys_segments) calculates number of
      physical segments according to these parameters.
      
      During the generic_make_request() is segment cout recalculated and can
      increase bio->bi_phys_segments count over the allowed limit.  (After
      bio_clone() in stack operation.)
      
      Thi is specially problem in CCISS driver, where it produce OOPS here
      
          BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
      
      (MAXSEGENTRIES is 31 by default.)
      
      Sometimes even this command is enough to cause oops:
      
        dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10
      
      This command generates bios with 250 sectors, allocated in 32 4k-pages
      (last page uses only 1024 bytes).
      
      For LVM layer, it allocates bio with 31 segments (still OK for CCISS),
      unfortunatelly on lower layer it is recalculated to 32 segments and this
      violates CCISS restriction and triggers BUG_ON().
      
      The patch tries to fix it by:
      
       * initializing attributes above in queue request constructor
         blk_queue_make_request()
      
       * make sure that blk_queue_stack_limits() inherits setting
      
       (DM uses its own function to set the limits because it
       blk_queue_stack_limits() was introduced later.  It should probably switch
       to use generic stack limit function too.)
      
       * sets the default seg_boundary value in one place (blkdev.h)
      
       * use this mask as default in DM (instead of -1, which differs in 64bit)
      
      Bugs related to this:
      https://bugzilla.redhat.com/show_bug.cgi?id=471639
      http://bugzilla.kernel.org/show_bug.cgi?id=8672Signed-off-by: default avatarMilan Broz <mbroz@redhat.com>
      Reviewed-by: default avatarAlasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      0e435ac2
    • Tejun Heo's avatar
      block: internal dequeue shouldn't start timer · 53a08807
      Tejun Heo authored
      blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
      both start the timeout timer.  Barrier code dequeues the original
      barrier request but doesn't passes the request itself to lower level
      driver, only broken down proxy requests; however, as the original
      barrier code goes through the same dequeue path and timeout timer is
      started on it.  If barrier sequence takes long enough, this timer
      expires but the low level driver has no idea about this request and
      oops follows.
      
      Timeout timer shouldn't have been started on the original barrier
      request as it never goes through actual IO.  This patch unexports
      elv_dequeue_request(), which has no external user anyway, and makes it
      operate on elevator proper w/o adding the timer and make
      blkdev_dequeue_request() call elv_dequeue_request() and add timer.
      Internal users which don't pass the request to driver - barrier code
      and end_that_request_last() - are converted to use
      elv_dequeue_request().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      53a08807
  13. 26 Nov, 2008 2 commits