1. 10 Dec, 2006 40 commits
    • NeilBrown's avatar
      [PATCH] md: fix innocuous bug in raid6 stripe_to_pdidx · b875e531
      NeilBrown authored
      stripe_to_pdidx finds the index of the parity disk for a given stripe.  It
      assumes raid5 in that it uses "disks-1" to determine the number of data disks.
      
      This is incorrect for raid6 but fortunately the two usages cancel each other
      out.  The only way that 'data_disks' affects the calculation of pd_idx in
      raid5_compute_sector is when it is divided into the sector number.  But as
      that sector number is calculated by multiplying in the wrong value of
      'data_disks' the division produces the right value.
      
      So it is innocuous but needs to be fixed.
      
      Also change the calculation of raid_disks in compute_blocknr to make it
      more obviously correct (it seems at first to always use disks-1 too).
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b875e531
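The cancellation described in this message can be checked with a small model. This is a sketch, not the kernel code: `pd_idx_for` is a hypothetical stand-in that keeps only the step the message mentions (dividing the sector number by data_disks), and the final modulo is a simplifying assumption in place of the real parity layout algorithms.

```python
def pd_idx_for(r_sector, raid_disks, data_disks, sectors_per_chunk):
    """Toy model: derive the stripe number, then a parity index from it."""
    chunk_number = r_sector // sectors_per_chunk
    stripe = chunk_number // data_disks      # the only use of data_disks
    return stripe % raid_disks               # simplified layout (assumption)


def stripe_to_pdidx(stripe, raid_disks, data_disks, sectors_per_chunk):
    """Build a sector for `stripe`, then map it back with the same data_disks."""
    r_sector = stripe * data_disks * sectors_per_chunk
    return pd_idx_for(r_sector, raid_disks, data_disks, sectors_per_chunk)


raid_disks = 6                 # example raid6 array: 4 data disks plus P and Q
wrong = raid_disks - 1         # the raid5 assumption applied to raid6
right = raid_disks - 2         # the correct raid6 data disk count

# The multiply and the divide use the same value, so it cancels either way.
results_match = all(
    stripe_to_pdidx(s, raid_disks, wrong, 128)
    == stripe_to_pdidx(s, raid_disks, right, 128)
    for s in range(1000)
)
```

Because the same (wrong) value is multiplied in and divided out, the recovered stripe number is unchanged, which is why the bug had no visible effect.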
    • Raz Ben-Jehuda(caro)'s avatar
      [PATCH] md: enable bypassing cache for reads · 52488615
      Raz Ben-Jehuda(caro) authored
      Call the chunk_aligned_read where appropriate.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      52488615
    • Raz Ben-Jehuda(caro)'s avatar
      [PATCH] md: allow reads that have bypassed the cache to be retried on failure · 46031f9a
      Raz Ben-Jehuda(caro) authored
      If a bypass-the-cache read fails, we simply try again through the cache.  If
      it fails again it will trigger normal recovery procedures.
      
      update 1:
      
      From: NeilBrown <neilb@suse.de>
      
      1/
        chunk_aligned_read and retry_aligned_read assume that
            data_disks == raid_disks - 1
        which is not true for raid6.
        So when an aligned read request bypasses the cache, we can get the wrong data.
      
      2/ The cloned bio is being used-after-free in raid5_align_endio
         (to test BIO_UPTODATE).
      
      3/ We forgot to add rdev->data_offset when submitting
         a bio for aligned-read
      
      4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
         so we need to invalidate the segment counts.
      
      5/ We don't de-reference the rdev when the read completes.
         This means we need to record the rdev so it is still
         available in the end_io routine.  Fortunately
         bi_next in the original bio is unused at this point so
         we can stuff it in there.
      
      6/ We leak a cloned bio if the target rdev is not usable.
      
      From: NeilBrown <neilb@suse.de>
      
      update 2:
      
      1/ When aligned requests fail (read error) they need to be retried
         via the normal method (stripe cache).  As we cannot be sure that
         we can process a single read in one go (we may not be able to
         allocate all the stripes needed) we store a bio-being-retried
         and a list of bios-that-still-need-to-be-retried.
         When we find a bio that needs to be retried, we should add it to
         the list, not to single-bio...
      
      2/ We were never incrementing 'scnt' when resubmitting failed
         aligned requests.
      
      [akpm@osdl.org: build fix]
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      46031f9a
    • Raz Ben-Jehuda(caro)'s avatar
      [PATCH] md: define raid5_mergeable_bvec · 23032a0e
      Raz Ben-Jehuda(caro) authored
      This will encourage read requests to be on only one device, so we will often be
      able to bypass the cache for read requests.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      23032a0e
    • NeilBrown's avatar
      [PATCH] md: tidy up device-change notification when an md array is stopped · 0d4ca600
      NeilBrown authored
      An md array can be stopped leaving all the settings still in place, or it can
      be torn down and destroyed.  set_capacity and other change notifications only
      happen in the latter case, but should happen in both.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0d4ca600
    • Paul Mackerras's avatar
      [PATCH] Fbdev driver for IBM GXT4500P videocards · a3d89983
      Paul Mackerras authored
      This is an fbdev driver for the IBM GXT4500P display card found in some IBM
      System P (pSeries) machines.  These cards have hardware 2D and 3D
      capabilities, but the driver does not use them; it just exports a dumb
      framebuffer.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: James Simmons <jsimmons@infradead.org>
      Cc: "Antonino A. Daplas" <adaplas@pol.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a3d89983
    • Alan Cox's avatar
      [PATCH] ide-cd: Handle strange interrupt on the Intel ESB2 · ee2f344b
      Alan Cox authored
      The ESB2 appears to emit spurious DMA interrupts when configured for native
      mode and handling ATAPI devices.  Stratus were able to pin this bug down and
      produce a patch.  This is a rework which applies the fixup only to the ESB2
      (for now).  We can apply it to other chips later if the same problem is found.
      
      This code has been tested and confirmed to fix the problem on the tested
      systems.
      Signed-off-by: Alan Cox <alan@redhat.com>
      (Most of the hard work done by Stratus however)
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ee2f344b
    • Miguel Ojeda Sandonis's avatar
      [PATCH] kernel/sched.c: whitespace cleanups · 33859f7f
      Miguel Ojeda Sandonis authored
      [akpm@osdl.org: additional cleanups]
      Signed-off-by: Miguel Ojeda Sandonis <maxextreme@gmail.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      33859f7f
    • Chen, Kenneth W's avatar
      [PATCH] sched: optimize activate_task for RT task · 62ab616d
      Chen, Kenneth W authored
      An RT task does not participate in interactiveness priority and thus shouldn't
      be bothered with timestamp and p->sleep_type manipulation when the task is
      being put on the run queue.  Bypass all of them with a single if (rt_task)
      test.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      62ab616d
    • Chen, Kenneth W's avatar
      [PATCH] sched: remove lb_stopbalance counter · 06066714
      Chen, Kenneth W authored
      Remove the scheduler stats lb_stopbalance counter.  This counter can be
      calculated by: lb_balanced - lb_nobusyg - lb_nobusyq.  There is no need to
      create a gazillion counters when we can derive the value.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      06066714
    • Siddha, Suresh B's avatar
      [PATCH] sched: decrease number of load balances · 783609c6
      Siddha, Suresh B authored
      Currently, at a particular domain, each cpu in the sched group does a
      load balance at the frequency of balance_interval.  The more cores and
      threads there are, the more cpus will be in each sched group at the SMP
      and NUMA domains, and we end up spending quite a bit of time doing load
      balancing in those domains.
      
      Fix this by making only one cpu (the first idle cpu, or the first cpu in
      the group if all the cpus are busy) in the sched group do the load balance
      at that particular sched domain.  This load will slowly percolate down to
      the other cpus within that group (when they do load balancing at lower
      domains).
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      783609c6
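The selection rule in this message (first idle cpu, otherwise the first cpu in the group) can be sketched as follows. The function name and the list/set representation of a sched group are hypothetical, chosen only for illustration; the kernel works on cpumasks.

```python
def pick_balancer(group_cpus, idle_cpus):
    """Return the single cpu in the group that should run the load balance.

    group_cpus: cpus in the sched group, in group order (hypothetical model).
    idle_cpus:  set of cpus that are currently idle.
    """
    for cpu in group_cpus:
        if cpu in idle_cpus:
            return cpu            # first idle cpu wins
    return group_cpus[0]          # all busy: fall back to the first cpu


def should_balance(this_cpu, group_cpus, idle_cpus):
    """Every other cpu in the group skips balancing at this domain."""
    return this_cpu == pick_balancer(group_cpus, idle_cpus)
```

For a group `[4, 5, 6, 7]` with cpus 6 and 7 idle, only cpu 6 balances at this domain; if none are idle, only cpu 4 does.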
    • Mike Galbraith's avatar
      [PATCH] sched: improve migration accuracy · b18ec803
      Mike Galbraith authored
      Co-opt rq->timestamp_last_tick to maintain a cache_hot_time evaluation
      reference timestamp at both tick and sched times to prevent said reference,
      formerly rq->timestamp_last_tick, from being behind task->last_ran at
      evaluation time, and to move said reference closer to current time on the
      remote processor, intent being to improve cache hot evaluation and
      timestamp adjustment accuracy for task migration.
      
      Fix minor sched_time double accounting error which occurs when a task
      passing through schedule() does not schedule off, and takes the next timer
      tick.
      
      [kenneth.w.chen@intel.com: cleanup]
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Ken Chen <kenneth.w.chen@intel.com>
      Cc: Don Mullis <dwm@meer.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b18ec803
    • Christoph Lameter's avatar
      [PATCH] sched: add option to serialize load balancing · 08c183f3
      Christoph Lameter authored
      Large sched domains can be very expensive to scan.  Add an option SD_SERIALIZE
      to the sched domain flags.  If that flag is set then we make sure that no
      other such domain is being balanced.
      
      [akpm@osdl.org: build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      08c183f3
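One way to picture the serialization is a single global lock that flagged domains must win before balancing. The sketch below is an illustration under stated assumptions: the flag value is made up, and skipping the balance when the lock is busy (rather than waiting) is an assumption, since the message only says that no other such domain may be balanced at the same time.

```python
import threading

SD_SERIALIZE = 0x1                 # hypothetical flag value for this sketch
_balancing = threading.Lock()      # one lock shared by all serialized domains


def balance_domain(flags, do_balance):
    """Run do_balance(); return False if a serialized domain had to skip."""
    if flags & SD_SERIALIZE:
        # Non-blocking attempt: give up if another serialized domain is busy.
        if not _balancing.acquire(blocking=False):
            return False
        try:
            do_balance()
        finally:
            _balancing.release()
        return True
    do_balance()                   # unflagged domains balance freely
    return True
```

Domains without the flag are unaffected, so the cost of serialization is paid only where scanning is expensive.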
    • Christoph Lameter's avatar
      [PATCH] sched: call tasklet less frequently · 1bd77f2d
      Christoph Lameter authored
      Trigger softirq less frequently
      
      We trigger the softirq before this patch using offset of sd->interval.
      However, if the queue is busy then it is sufficient to schedule the softirq
      with sd->interval * busy_factor.
      
      So we modify the calculation of the next time to balance by taking
      the interval added to last_balance again. This is only the
      right value if the idle/busy situation continues as is.
      
      There are two potential trouble spots:
      - If the queue was idle and now gets busy then we call rebalance
        early. However, that is not a problem because we will then use
        the longer interval for the next period.
      
      - If the queue was busy and becomes idle then we potentially
        wait too long before rebalancing. However, when the task
        goes idle then idle_balance is called. We add another calculation
        of the next balance time based on sd->interval in idle_balance
        so that we will rebalance soon.
      
      V2->V3:
      - Calculate rebalance time based on current jiffies and not
        based on the jiffies at the last time we load balanced.
        We no longer rely on staggering and therefore we can
        afford to do this now.
      
      V3->V4:
      - Use functions to do jiffy comparisons.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1bd77f2d
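The interval choice and the wrap-safe jiffy comparison mentioned in this message can be modelled roughly as below. This is a sketch, not the kernel code: the function names, the 32-bit counter width, and the sample numbers are assumptions for illustration.

```python
def next_balance_time(now, interval, busy_factor, busy):
    """Next time to balance, measured from the current jiffies value."""
    effective = interval * busy_factor if busy else interval
    return now + effective


def time_after_eq(a, b, bits=32):
    """True if counter value a is at or after b, tolerating wraparound.

    Models a signed comparison of (a - b) on a fixed-width counter.
    """
    mask = (1 << bits) - 1
    return ((a - b) & mask) < (1 << (bits - 1))
```

With `interval=8` and `busy_factor=32`, a busy queue at jiffies 1000 is next balanced at 1256, while an idle one is balanced at 1008. Comparing through `time_after_eq` keeps the check correct when the counter wraps past its maximum.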
    • Christoph Lameter's avatar
      [PATCH] sched: use softirq for load balancing · c9819f45
      Christoph Lameter authored
      Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced
      softirq.
      
      We calculate the earliest time for each layer of sched domains to be rescanned
      (this is the rescan time for idle) and use the earliest of those to schedule
      the softirq via a new field "next_balance" added to struct rq.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c9819f45
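The "earliest of those" rule can be sketched as a minimum over the per-domain rescan times. The dictionary fields `last_balance` and `interval` and both function names are hypothetical stand-ins, and the sketch ignores jiffies wraparound for brevity.

```python
def earliest_next_balance(domains):
    """Return the soonest rescan time over all sched-domain layers."""
    return min(sd["last_balance"] + sd["interval"] for sd in domains)


def softirq_due(now, next_balance):
    """True once the tick should raise the load-balancing softirq."""
    return now >= next_balance
```

Storing only this one value per runqueue means the tick path does a single comparison instead of walking every domain.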
    • Christoph Lameter's avatar
      [PATCH] sched: move idle status calculation into rebalance_tick() · e418e1c2
      Christoph Lameter authored
      Perform the idle state determination in rebalance_tick.
      
      If we separate balancing from sched_tick then we also need to determine the
      idle state in rebalance_tick.
      
      V2->V3
      	Remove useless idle != 0 check. Checking nr_running seems
      	to be sufficient. Thanks Suresh.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e418e1c2
    • Christoph Lameter's avatar
      [PATCH] sched: extract load calculation from rebalance_tick · 7835b98b
      Christoph Lameter authored
      A load calculation is always done in rebalance_tick() in addition to the real
      load balancing activities that only take place when certain jiffie counts have
      been reached.  Move that processing into a separate function and call it
      directly from scheduler_tick().
      
      Also extract the time slice handling from scheduler_tick and put it into a
      separate function.  Then we can clean up scheduler_tick significantly.  It
      will no longer have any gotos.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7835b98b
    • Christoph Lameter's avatar
      [PATCH] sched: disable interrupts for locking in load_balance() · fe2eea3f
      Christoph Lameter authored
      Interrupts must be disabled for runqueue locks if we want to run
      load_balance() with interrupts enabled.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      fe2eea3f
    • Christoph Lameter's avatar
      [PATCH] sched: remove staggering of load balancing · 4211a9a2
      Christoph Lameter authored
      Timer interrupts already are staggered.  We do not need an additional layer of
      time staggering for short load balancing actions that take a reasonably small
      portion of the time slice.
      
      For load balancing on large sched_domains we will add a serialization later
      that avoids concurrent load balance operations and thus has the same effect as
      load staggering.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4211a9a2
    • Christoph Lameter's avatar
      [PATCH] sched: avoid taking rq lock in wake_priority_sleeper · 571f6d2f
      Christoph Lameter authored
      Avoid taking the runqueue lock in wake_priority_sleeper if there are no
      running processes.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      571f6d2f
    • Siddha, Suresh B's avatar
      [PATCH] sched domain: increase the SMT busy rebalance interval · ac7d5504
      Siddha, Suresh B authored
      With SMT, if the logical processor is busy, load balancing happens every
      8 msec (min) to 16 msec (max).  There is no need to do this so often, as it
      is just for fairness (to maintain uniform runqueue lengths) and the default
      time slice is 100 msec anyhow.
      
      The appended patch increases this interval to 64 msec (min) to 128 msec (max)
      when the logical processor is busy.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ac7d5504
    • Kirill Korotaev's avatar
      [PATCH] move_task_off_dead_cpu() should be called with disabled ints · 054b9108
      Kirill Korotaev authored
      move_task_off_dead_cpu() requires interrupts to be disabled, while
      migrate_dead() calls it with enabled interrupts.  Added appropriate
      comments to functions and added BUG_ON(!irqs_disabled()) into
      double_rq_lock() and double_lock_balance() which are the origin sources of
      such bugs.
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      054b9108
    • Siddha, Suresh B's avatar
      [PATCH] sched domain: move sched group allocations to percpu area · 6711cab4
      Siddha, Suresh B authored
      Move the sched group allocations to percpu area.  This will minimize cross
      node memory references and also cleans up the sched groups allocation for
      allnodes sched domain.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6711cab4
    • Ralf Baechle's avatar
      [PATCH] Don't build some broken ISDN drivers on big endian MIPS · 596afa41
      Ralf Baechle authored
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      596afa41
    • Andrew Morton's avatar
      [PATCH] io-accounting: add to getdelays · cf709844
      Andrew Morton authored
      Wire up the IO accounting into getdelays.c.
      
      Usage:
      
      To display I/O stats for each exiting task:
      
      vmm:/home/akpm> ./getdelays -m0,1,2,3 -i -l
      cpumask 0 maskset 1
      printing IO accounting
      listen forever
      rm: read=8192, write=0, cancelled_write=0
      cvs: read=733184, write=4255744, cancelled_write=4096
      make: read=217088, write=0, cancelled_write=0
      cc1: read=4263936, write=12288, cancelled_write=0
      as: read=811008, write=8192, cancelled_write=0
      gcc: read=323584, write=0, cancelled_write=12288
      cc1: read=0, write=8192, cancelled_write=0
      as: read=4096, write=4096, cancelled_write=0
      gcc: read=16384, write=0, cancelled_write=4096
      as: read=4096, write=4096, cancelled_write=0
      gcc: read=16384, write=0, cancelled_write=8192
      ld: read=1011712, write=16384, cancelled_write=0
      collect2: read=626688, write=0, cancelled_write=0
      gcc: read=204800, write=0, cancelled_write=0
      cc1: read=0, write=8192, cancelled_write=0
      as: read=4096, write=4096, cancelled_write=0
      gcc: read=16384, write=0, cancelled_write=8192
      ld: read=8192, write=16384, cancelled_write=0
      collect2: read=49152, write=0, cancelled_write=0
      gcc: read=0, write=0, cancelled_write=0
      cc1: read=0, write=4096, cancelled_write=0
      ld: read=4096, write=12288, cancelled_write=0
      collect2: read=49152, write=0, cancelled_write=0
      gcc: read=0, write=0, cancelled_write=0
      
      To display I/O stats for a particular presently-running task:
      
      vmm:/home/akpm> ./getdelays -i -p $(pidof crond)
      printing IO accounting
      crond: read=61440, write=0, cancelled_write=0
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cf709844
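Output in the per-task format shown in this message is easy to aggregate. The helper below is a hypothetical post-processing sketch, not part of getdelays: it assumes each record looks exactly like `name: read=N, write=N, cancelled_write=N` and silently skips every other line.

```python
import re
from collections import defaultdict

# Matches one record of the form shown in the sample output.
LINE_RE = re.compile(r"^(\S+): read=(\d+), write=(\d+), cancelled_write=(\d+)$")


def total_by_command(lines):
    """Sum (read, write, cancelled_write) byte counts per command name."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue                      # banner lines such as "listen forever"
        name = match.group(1)
        for i, value in enumerate(match.groups()[1:]):
            totals[name][i] += int(value)
    return {name: tuple(vals) for name, vals in totals.items()}
```

Fed the first `rm` line and the first two `cc1` lines from the sample, it reports 8192 bytes read for `rm` and 4263936 read, 20480 written for `cc1`.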
    • Andrew Morton's avatar
      [PATCH] getdelays: various fixes · d2f7bf13
      Andrew Morton authored
      - Various cleanups
      
      - Report errors to stderr, not stdout
      
      - A printf was missing a \n and was hiding from me.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d2f7bf13
    • Andrew Morton's avatar
      [PATCH] io-accounting: via taskstats · 4a7864ca
      Andrew Morton authored
      Deliver IO accounting via taskstats.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4a7864ca
    • Andrew Morton's avatar
      [PATCH] cleanup taskstats.h · f2f1f8a3
      Andrew Morton authored
      Fix weird whitespace mangling in taskstats.h
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f2f1f8a3
    • Andrew Morton's avatar
      [PATCH] io-accounting: report in procfs · aba76fdb
      Andrew Morton authored
      Add a simple /proc/pid/io to show the IO accounting fields.
      
      Maybe this shouldn't be merged in mainline - the preferred reporting channel
      is taskstats.  But given the poor state of our userspace support for
      taskstats, this is useful for developer-testing, at least.  And it improves
      the chances that the procps developers will wire it up into top(1).  Opinions
      are sought.
      
      The patch also wires up the existing IO-accounting fields.
      
      It's a bit racy on 32-bit machines: if process A reads process B's
      /proc/pid/io while process B is updating one of those 64-bit counters, process
      A could see an intermediate result.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      aba76fdb
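A reader of /proc/pid/io only needs to split `name: value` lines. The sketch below parses text that has already been read from the file; the field names in the usage example (`read_bytes`, `write_bytes`, `cancelled_write_bytes`) are an assumption based on the format used by later kernels, as the message itself does not list them.

```python
def parse_proc_io(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def read_proc_io(pid="self"):
    """Read and parse /proc/<pid>/io (requires a kernel that provides it)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())
```

For example, `parse_proc_io("read_bytes: 61440\nwrite_bytes: 0\n")` yields a dict with those two counters. On 32-bit machines a single read may still observe a half-updated 64-bit counter, as the message notes; this parser does nothing to guard against that.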
    • Andrew Morton's avatar
      [PATCH] io-accounting: direct-io · 98c4d57d
      Andrew Morton authored
      Account for direct-io.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      98c4d57d
    • Andrew Morton's avatar
      [PATCH] io-accounting-read-accounting cifs fix · 6f88cc2e
      Andrew Morton authored
      CIFS implements ->readpages and doesn't use read_cache_pages().  So wire the
      read IO accounting up within CIFS.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6f88cc2e
    • Andrew Morton's avatar
      [PATCH] io-accounting-read-accounting nfs fix · 8bde37f0
      Andrew Morton authored
      nfs's ->readpages uses read_cache_pages().  Wire it up there.
      
      [wfg@mail.ustc.edu.cn: account only successful nfs/fuse reads]
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8bde37f0
    • Andrew Morton's avatar
      [PATCH] io-accounting: read accounting · faccbd4b
      Andrew Morton authored
      Wire up read accounting for block devices, within submit_bio().
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      faccbd4b
    • Andrew Morton's avatar
      [PATCH] io-accounting: write-cancel accounting · e08748ce
      Andrew Morton authored
      Account for the number of bytes of writes which this process caused to not
      happen after all.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e08748ce
    • Andrew Morton's avatar
      [PATCH] io-accounting: write accounting · 55e829af
      Andrew Morton authored
      Accounting writes is fairly simple: whenever a process flips a page from clean
      to dirty, we accuse it of having caused a write to underlying storage of
      PAGE_CACHE_SIZE bytes.
      
      This may overestimate the amount of writing: the page-dirtying may cause only
      one buffer_head's worth of writeout.  Fixing that is possible, but probably a
      bit messy and isn't obviously important.
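
      The rule above can be sketched as follows.  This is a minimal
      illustration, assuming a 4096-byte page in place of PAGE_CACHE_SIZE;
      sketch_task_io, sketch_page and sketch_set_page_dirty are hypothetical
      names, not the kernel's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical per-task counter; not the kernel's real structure. */
struct sketch_task_io {
    uint64_t write_bytes;
};

/* Hypothetical page that carries only a dirty flag. */
struct sketch_page {
    bool dirty;
};

/*
 * Mark a page dirty on behalf of a task. Only the clean-to-dirty
 * transition is charged, so re-dirtying an already dirty page is free.
 * Returns true if the page changed state.
 */
static bool sketch_set_page_dirty(struct sketch_task_io *io,
                                  struct sketch_page *page)
{
    if (page->dirty)
        return false;
    page->dirty = true;
    io->write_bytes += SKETCH_PAGE_SIZE;
    return true;
}
```

      Charging a whole page per transition is what produces the possible
      overestimate: the sketch has no notion of how much of the page is
      eventually written out.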
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      55e829af
    • Andrew Morton's avatar
      [PATCH] clean up __set_page_dirty_nobuffers() · 8c08540f
      Andrew Morton authored
      Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
      and a few other places.  No functional changes.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8c08540f
    • Andrew Morton's avatar
      [PATCH] io-accounting: core statistics · 7c3ab738
      Andrew Morton authored
      The present per-task IO accounting isn't very useful.  It simply counts the
      number of bytes passed into read() and write().  So if a process reads 1MB
      from an already-cached file, it is accused of having performed 1MB of I/O,
      which is wrong.
      
      (David Wright had some comments on the applicability of the present logical IO accounting:
      
        For billing purposes it is useless but for workload analysis it is very
        useful
      
        read_bytes/read_calls     average read request size
        write_bytes/write_calls   average write request size

        read_bytes/read_blocks    i.e. logical/physical; can indicate hit rate
                                  or thrashing
        write_bytes/write_blocks  i.e. logical/physical; a guess, since pdflush
                                  writes can be missed
      
        I often look for logical larger than physical to see filesystem cache
        problems.  And the bytes/cpusec can help find applications that are
        dominating the cache and causing slow interactive response from page cache
        contention.
      
        I want to find the IO intensive applications and make sure they are doing
        efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
        IO commands).
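
      The ratios in the quoted comments can be computed along these lines.
      This is only a sketch: the struct, the function names and the
      caller-supplied block size are assumptions made for illustration and
      do not come from any existing accounting tool.

```c
#include <stdint.h>

/* Hypothetical snapshot of the logical counters named in the quote. */
struct sketch_logical_io {
    uint64_t read_bytes;   /* bytes passed to read() */
    uint64_t read_calls;   /* number of read() calls */
    uint64_t read_blocks;  /* blocks fetched from storage */
};

/* Divide two counters, returning 0.0 when the denominator is zero. */
static double sketch_ratio(uint64_t num, uint64_t den)
{
    return den ? (double)num / (double)den : 0.0;
}

/* Average read request size in bytes. */
static double sketch_avg_read_size(const struct sketch_logical_io *s)
{
    return sketch_ratio(s->read_bytes, s->read_calls);
}

/*
 * Logical bytes per physical byte. block_size is an assumed,
 * caller-supplied value in bytes. A result above 1.0 means more
 * bytes were requested than were fetched from storage.
 */
static double sketch_logical_to_physical(const struct sketch_logical_io *s,
                                         uint64_t block_size)
{
    return sketch_ratio(s->read_bytes, s->read_blocks * block_size);
}
```

      The zero-denominator guard matters in practice because a process may
      have issued no read calls, or caused no physical reads, at the time
      the counters are sampled.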
      
      This patchset adds new accounting which tries to be more accurate.  We account
      for three things:
      
      reads:
      
        attempt to count the number of bytes which this process really did cause
        to be fetched from the storage layer.  Done at the submit_bio() level, so it
        is accurate for block-backed filesystems.  I also attempt to wire up NFS and
        CIFS.
      
      writes:
      
        attempt to count the number of bytes which this process caused to be sent
        to the storage layer.  This is done at page-dirtying time.
      
        The big inaccuracy here is truncate.  If a process writes 1MB to a file
        and then deletes the file, it will in fact perform no writeout.  But it will
        have been accounted as having caused 1MB of write.
      
        So...
      
      cancelled_writes:
      
        account the number of bytes which this process caused to not happen, by
        truncating pagecache.
      
        We _could_ just subtract this from the process's `write' accounting.  But
        that means that some processes would be reported to have done negative
        amounts of write IO, which is silly.
      
        So we just report the raw number and punt this decision up to userspace.
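
      A minimal sketch of that split between kernel and userspace follows.
      The struct and both functions are hypothetical names chosen for this
      illustration; clamping at zero is just one policy userspace might
      pick, not something the patchset prescribes.

```c
#include <stdint.h>

/* Hypothetical per-task counters mirroring the three quantities above. */
struct sketch_io_counters {
    uint64_t read_bytes;
    uint64_t write_bytes;
    uint64_t cancelled_write_bytes;
};

/*
 * Kernel side of the sketch: record a cancelled write as its own raw
 * counter and never subtract it from write_bytes.
 */
static void sketch_account_cancelled_write(struct sketch_io_counters *c,
                                           uint64_t bytes)
{
    c->cancelled_write_bytes += bytes;
}

/*
 * Userspace side of the sketch: one possible policy that reports net
 * written bytes, clamped at zero so the result is never negative.
 */
static uint64_t sketch_net_write_bytes(const struct sketch_io_counters *c)
{
    if (c->cancelled_write_bytes >= c->write_bytes)
        return 0;
    return c->write_bytes - c->cancelled_write_bytes;
}
```

      Keeping the two counters separate means a task whose cancelled bytes
      exceed its own written bytes can still be reported sensibly, whatever
      policy the reporting tool chooses.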
      
      Now, we _could_ account for writes at the physical I/O level.  But
      
      - This would require that we track memory-dirtying tasks at the per-page
        level (would require a new pointer in struct page).
      
      - It would mean that IO statistics for a process are usually only available
        long after that process has exited.  Which means that we probably cannot
        communicate this info via taskstats.
      
      This patch:
      
      Wire up the kernel-private data structures and the accessor functions to
      manipulate them.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7c3ab738
    • Sergei Shtylyov's avatar
      [PATCH] pdc202xx_new: fix PLL/timing issues · 47694bb8
      Sergei Shtylyov authored
      Fix the CRC errors in the higher UltraDMA modes with the Promise PDC20268
      and newer chips that always occur on non-x86 machines and when there are
      more than 2 adapters on x86 machines.  Fix the overclocking issue for
      PDC20269 and newer chips that occurs when an UltraDMA/133 capable drive is
      connected.  Here's the summary of changes:
      
      - add code to detect the PLL input clock and set up its output clock,
        remove the PowerMac hacks;
      
      - replace the macros accessing the indexed registers with functions, switch to
        using them where appropriate, gather the PIO/MWDMA/UDMA timings into tables;
      
      - rewrite the speedproc() handler to set the drive's transfer mode first, and
        then override the timing registers set by hardware on UltraDMA/133 chips;
      
      - use a better criterion for determining higher UltraDMA modes, and add a
        comment concerning the doubtful value of the code enabling IORDY/prefetch;
      
      - replace the stupid 'pdcnew_new_' prefixes with mere 'pdcnew_';
      
      - get rid of unneeded spaces, parens and type casts, clean up some printk's,
        add some new lines here and there...
      
      This work is loosely based on these former patches by Albert Lee:
      
      [1] http://marc.theaimsgroup.com/?l=linux-ide&m=110992442032300
      [2] http://marc.theaimsgroup.com/?l=linux-ide&m=110992457729382
      [3] http://marc.theaimsgroup.com/?l=linux-ide&m=110992474205555
      [4] http://marc.theaimsgroup.com/?l=linux-ide&m=111019224802939
      
      Some PLL clock detection code was backported from his pata_pdc2027x driver...
      
      This code has been successfully tested by me on PDC2026[89] chips.
      
      I tried to keep this rework as several patches but it made no sense: [2] was
      largely a modification of the non-working timing override code, [3] by itself
      extended the overclocking issue to the case of non-UltraDMA/133 drives, and
      finally, the cleanup patch based on [1] ended up rejected...
      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Albert Lee <albertcc@tw.ibm.com>
      Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      47694bb8