1. 27 Feb, 2009 8 commits
  2. 26 Feb, 2009 23 commits
    • Merge branch 'sched/clock' into tracing/ftrace · 5d0859ce
      Ingo Molnar authored
      Conflicts:
      	kernel/sched_clock.c
      5d0859ce
    • x86: set X86_FEATURE_TSC_RELIABLE · 83ce4009
      Ingo Molnar authored
      If the TSC is constant and non-stop, also set it reliable.
      
      (We will turn this off in DMI quirks for multi-chassis systems)
      
      The performance improvement on a 16-way Nehalem system running
      32 tasks that context-switch between each other is significant:
      
         sched_clock_stable=0		sched_clock_stable=1
         ....................         ....................
         22.456925 million/sec        24.306972 million/sec   [+8.2%]
      
      lmbench's "lat_ctx -s 0 2" goes from 0.63 microseconds to
      0.59 microseconds - a 6.7% increase in context-switching
      performance.
      
      Perfstat of 1 million pipe context switches between two tasks:
      
       Performance counter stats for './pipe-test-1m':
      
             [before]           [after]
         ............      ............
         37621.421089      36436.848378    task clock ticks     (msecs)
      
                    0                 0    CPU migrations       (events)
              2000274           2000189    context switches     (events)
                  194               193    pagefaults           (events)
           8433799643        8171016416    CPU cycles           (events) -3.21%
           8370133368        8180999694    instructions         (events) -2.31%
              4158565           3895941    cache references     (events) -6.74%
                44312             46264    cache misses         (events)
      
          2349.287976       2279.362465    wall-time            (msecs)  -3.06%
      
      The speedup comes straight from the reduction in the instruction
      count. sched_clock_cpu() got simpler and the whole workload thus
      executes faster.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      83ce4009
    • sched: allow architectures to specify sched_clock_stable · b342501c
      Ingo Molnar authored
      Allow CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures to still specify
      that their sched_clock() implementation is reliable.
      
      This will be used by x86 to switch on a faster sched_clock_cpu()
      implementation on certain CPU types.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b342501c
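Illustration of the idea behind the two sched_clock_stable commits above: a minimal user-space sketch, assuming a flag that architecture code may set when its raw clock is trustworthy. All names here (`clock_stable`, `read_raw_clock`, `filtered_clock`, `clock_cpu`) are hypothetical stand-ins, not the kernel's definitions, and the slow path is reduced to a simple "never go backwards" clamp.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's symbols. */
static int clock_stable;           /* set when the raw clock is known reliable */
static uint64_t raw_clock_ns;      /* simulated raw hardware clock reading */
static uint64_t last_returned_ns;  /* last value handed out by the slow path */

static uint64_t read_raw_clock(void)
{
    return raw_clock_ns;
}

/* Slow path: filter the raw reading so the result never goes backwards. */
static uint64_t filtered_clock(void)
{
    uint64_t now = read_raw_clock();

    if (now < last_returned_ns)
        now = last_returned_ns;
    last_returned_ns = now;
    return now;
}

/* Fast path when the clock is declared stable: skip the filtering. */
static uint64_t clock_cpu(void)
{
    if (clock_stable)
        return read_raw_clock();
    return filtered_clock();
}
```

With the flag set, each call does strictly less work, which is consistent with the commit's observation that the speedup comes from a lower instruction count.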
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · 64e71303
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
        Btrfs: try committing transaction before returning ENOSPC
        Btrfs: add better -ENOSPC handling
      64e71303
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · babb29b0
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
        xen/blkfront: use blk_rq_map_sg to generate ring entries
        block: reduce stack footprint of blk_recount_segments()
        cciss: shorten 30s timeout on controller reset
        block: add documentation for register_blkdev()
        block: fix bogus gcc warning for uninitialized var usage
      babb29b0
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 6fc79d40
      Linus Torvalds authored
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Fix 64bit __copy_tofrom_user() regression
        powerpc: Fix 64bit memcpy() regression
        powerpc: Fix load/store float double alignment handler
      6fc79d40
    • Make ieee1394_init a fs-initcall · 86883c27
      Linus Torvalds authored
      It needs to happen before any firewire driver actually registers itself,
      and that was previously handled by having the Makefile list the core
      ieee1394 files before the drivers.
      
      But now there are firewire drivers in drivers/media, and the Makefile
      games aren't enough.  So just make ieee1394_init happen earlier in the
      init sequence, the way all other bus layers already do.
      Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Henrik Kurelid <henrik@kurelid.se>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Ben Backx <ben@bbackx.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      86883c27
    • tracing: implement trace_clock_*() APIs · 14131f2f
      Ingo Molnar authored
      Impact: implement new tracing timestamp APIs
      
      Add three trace clock variants, with differing scalability/precision
      tradeoffs:
      
       -   local: CPU-local trace clock
       -  medium: scalable global clock with some jitter
       -  global: globally monotonic, serialized clock
      
      Make the ring-buffer use the local trace clock internally.
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      14131f2f
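Illustration of the scalability/precision tradeoff described in the trace clock commit above: a minimal user-space sketch contrasting a CPU-local clock with a globally monotonic, serialized one. The per-CPU clock array, `NR_CPUS`, and the function names are assumptions made for the example; the kernel's trace_clock_*() functions are not reproduced here.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 2  /* hypothetical CPU count for the simulation */

/* Simulated per-CPU clocks; in practice these may be skewed against each other. */
static uint64_t cpu_clock_ns[NR_CPUS];

static atomic_flag global_lock = ATOMIC_FLAG_INIT;
static uint64_t global_prev_ns;

/* Local variant: cheap and scalable, but only ordered within one CPU. */
static uint64_t clock_local(int cpu)
{
    return cpu_clock_ns[cpu];
}

/*
 * Global variant: serialize all readers on one lock and clamp against the
 * previous value, so results are strictly increasing across CPUs.
 */
static uint64_t clock_global(int cpu)
{
    uint64_t now;

    while (atomic_flag_test_and_set_explicit(&global_lock, memory_order_acquire))
        ;  /* spin until the lock is free */

    now = cpu_clock_ns[cpu];
    if (now <= global_prev_ns)
        now = global_prev_ns + 1;
    global_prev_ns = now;

    atomic_flag_clear_explicit(&global_lock, memory_order_release);
    return now;
}
```

The global variant pays for its ordering guarantee with a shared lock and a shared cache line, which is why a hot path such as the ring buffer would prefer the local one.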
    • sched: sched_clock() improvement: use in_nmi() · 6409c4da
      Ingo Molnar authored
      Make sure we don't execute more complex sched_clock() code in NMI context.
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6409c4da
    • tracing, genirq: add irq enter and exit trace events · af39241b
      Jason Baron authored
      Impact: add new tracepoints
      
      Add them to the generic IRQ code, that way every architecture
      gets these new tracepoints, not just x86.
      
      Using Steve's new 'TRACE_FORMAT', I can get function graph
      trace as follows using the original two IRQ tracepoints:
      
       3)               |    handle_IRQ_event() {
       3)               |    /* (irq_handler_entry) irq=28 handler=eth0 */
       3)               |    e1000_intr_msi() {
       3)   2.460 us    |      __napi_schedule();
       3)   9.416 us    |    }
       3)               |    /* (irq_handler_exit) irq=28 handler=eth0 return=handled */
       3) + 22.935 us   |  }
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      af39241b
    • tracing/core: make the per cpu trace files in per cpu directories · 8656e7a2
      Frederic Weisbecker authored
      Impact: restructure the VFS layout of per CPU trace buffers
      
      The per-CPU trace files are all in a single directory,
      /debug/tracing/per_cpu. With a large number of CPUs the
      content of this directory becomes messy, so we now create one
      directory per CPU inside /debug/tracing/per_cpu, each containing
      its own trace_pipe and trace files.
      
      I.e.:
      
       /debug/tracing$ ls -R per_cpu
       per_cpu:
       cpu0  cpu1
      
       per_cpu/cpu0:
       trace  trace_pipe
      
       per_cpu/cpu1:
       trace  trace_pipe
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8656e7a2
    • xen/blkfront: use blk_rq_map_sg to generate ring entries · 9e973e64
      Jens Axboe authored
      On occasion, the request will apparently have more segments than we
      fit into the ring. Jens says:
      
      > The second problem is that the block layer then appears to create one
      > too many segments, but from the dump it has rq->nr_phys_segments ==
      > BLKIF_MAX_SEGMENTS_PER_REQUEST. I suspect the latter is due to
      > xen-blkfront not handling the merging on its own. It should check that
      > the new page doesn't form part of the previous page. The
      > rq_for_each_segment() iterates all single bits in the request, not dma
      > segments. The "easiest" way to do this is to call blk_rq_map_sg() and
      > then iterate the mapped sg list. That will give you what you are
      > looking for.
      
      > Here's a test patch, compiles but otherwise untested. I spent more
      > time figuring out how to enable XEN than to code it up, so YMMV!
      > Probably the sg list wants to be put inside the ring and only
      > initialized on allocation, then you can get rid of the sg on stack and
      > sg_init_table() loop call in the function. I'll leave that, and the
      > testing, to you.
      
      [Moved sg array into info structure, and initialize once. -J]
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      9e973e64
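Illustration of the point made in the blkfront commit above, that iterating individual pieces of a request can yield more entries than iterating merged DMA segments: a minimal user-space sketch that merges physically contiguous pieces. The `struct piece` type and `map_segments()` are hypothetical; this is not the blk_rq_map_sg() implementation, which also has to honor limits such as maximum segment size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one contiguous chunk of memory. */
struct piece {
    uintptr_t addr;
    size_t len;
};

/*
 * Merge adjacent pieces that are physically contiguous into out[] and
 * return the number of resulting segments. out[] must have room for n entries.
 */
static size_t map_segments(const struct piece *in, size_t n, struct piece *out)
{
    size_t nseg = 0;

    for (size_t i = 0; i < n; i++) {
        if (nseg > 0 && out[nseg - 1].addr + out[nseg - 1].len == in[i].addr)
            out[nseg - 1].len += in[i].len;  /* extends the previous segment */
        else
            out[nseg++] = in[i];             /* starts a new segment */
    }
    return nseg;
}
```

Three pieces of which the first two are back to back in memory map to two segments, so a consumer that sizes its ring by segment count should iterate the merged list rather than the raw pieces.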
    • block: reduce stack footprint of blk_recount_segments() · 1e428079
      Jens Axboe authored
      blk_recalc_rq_segments() requires a request structure passed in, which
      we don't have from blk_recount_segments(). So the latter allocates one on
      the stack, using > 400 bytes of stack for that. This can cause us to spill
      over one page of stack from ext4 at least:
      
       0)     4560     400   blk_recount_segments+0x43/0x62
       1)     4160      32   bio_phys_segments+0x1c/0x24
       2)     4128      32   blk_rq_bio_prep+0x2a/0xf9
       3)     4096      32   init_request_from_bio+0xf9/0xfe
       4)     4064     112   __make_request+0x33c/0x3f6
       5)     3952     144   generic_make_request+0x2d1/0x321
       6)     3808      64   submit_bio+0xb9/0xc3
       7)     3744      48   submit_bh+0xea/0x10e
       8)     3696     368   ext4_mb_init_cache+0x257/0xa6a [ext4]
       9)     3328     288   ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
      10)     3040     160   ext4_mb_new_blocks+0x211/0x4b4 [ext4]
      11)     2880     336   ext4_ext_get_blocks+0xb61/0xd45 [ext4]
      12)     2544      96   ext4_get_blocks_wrap+0xf2/0x200 [ext4]
      13)     2448      80   ext4_da_get_block_write+0x6e/0x16b [ext4]
      14)     2368     352   mpage_da_map_blocks+0x7e/0x4b3 [ext4]
      15)     2016     352   ext4_da_writepages+0x2ce/0x43c [ext4]
      16)     1664      32   do_writepages+0x2d/0x3c
      17)     1632     144   __writeback_single_inode+0x162/0x2cd
      18)     1488      96   generic_sync_sb_inodes+0x1e3/0x32b
      19)     1392      16   sync_sb_inodes+0xe/0x10
      20)     1376      48   writeback_inodes+0x69/0xb3
      21)     1328     208   balance_dirty_pages_ratelimited_nr+0x187/0x2f9
      22)     1120     224   generic_file_buffered_write+0x1d4/0x2c4
      23)      896     176   __generic_file_aio_write_nolock+0x35f/0x393
      24)      720      80   generic_file_aio_write+0x6c/0xc8
      25)      640      80   ext4_file_write+0xa9/0x137 [ext4]
      26)      560     320   do_sync_write+0xf0/0x137
      27)      240      48   vfs_write+0xb3/0x13c
      28)      192      64   sys_write+0x4c/0x74
      29)      128     128   system_call_fastpath+0x16/0x1b
      
      Split the segment counting out into a __blk_recalc_rq_segments() helper
      to avoid allocating an onstack request just for checking the physical
      segment count.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      1e428079
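Illustration of the refactoring described in the commit above: a minimal user-space sketch in which the counting helper takes only the bio chain, so a caller holding a single bio no longer needs a large on-stack request. The structure layouts and function names are hypothetical and heavily reduced; they are not the block layer's definitions.

```c
#include <stddef.h>

/* Hypothetical, much-reduced stand-ins for the block layer structures. */
struct bio {
    unsigned int nr_vecs;
    struct bio *next;
};

struct request {
    struct bio *bio;
    char other_state[400];   /* stands in for the bulk of the structure */
    unsigned int nr_segments;
};

/* Helper that needs only the bio chain, not a whole request. */
static unsigned int count_segments(const struct bio *bio)
{
    unsigned int n = 0;

    for (; bio; bio = bio->next)
        n += bio->nr_vecs;
    return n;
}

/* Request-based entry point: a thin wrapper around the helper. */
static void recalc_request_segments(struct request *rq)
{
    rq->nr_segments = count_segments(rq->bio);
}

/* Single-bio entry point: no temporary request on the stack. */
static unsigned int recount_bio_segments(struct bio *bio)
{
    struct bio *saved_next = bio->next;
    unsigned int n;

    bio->next = NULL;        /* count this bio only */
    n = count_segments(bio);
    bio->next = saved_next;
    return n;
}
```

The single-bio path now uses a few words of stack instead of a full `struct request`, which is the kind of saving the stack trace above calls for.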
    • cciss: shorten 30s timeout on controller reset · 5e4c91c8
      Jens Axboe authored
      If reset_devices is set for kexec, then cciss will delay 30 seconds
      since the old 5i controller _may_ need that long to recover. Replace
      the long sleep with incremental sleeps and tests, so that 30 seconds
      is only the worst case for the 5i and other controllers proceed quickly.
      Reviewed-by: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      5e4c91c8
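Illustration of the "incremental sleep and test" approach from the cciss commit above: a minimal user-space sketch that polls in short steps up to a maximum wait. The simulated controller, its fields, and the step size are assumptions for the example and do not reflect the driver's actual readiness check.

```c
#include <stdbool.h>

/* Hypothetical simulated controller: becomes ready after ready_at_ms. */
struct fake_controller {
    unsigned int elapsed_ms;
    unsigned int ready_at_ms;
};

static bool controller_ready(const struct fake_controller *c)
{
    return c->elapsed_ms >= c->ready_at_ms;
}

/* Simulated sleep: just advances the controller's notion of time. */
static void fake_sleep(struct fake_controller *c, unsigned int ms)
{
    c->elapsed_ms += ms;
}

/*
 * Poll in steps of step_ms until the controller is ready or max_ms has
 * been spent. Returns the time waited in ms, or -1 on timeout.
 */
static int wait_for_controller(struct fake_controller *c,
                               unsigned int step_ms, unsigned int max_ms)
{
    unsigned int waited = 0;

    while (!controller_ready(c)) {
        if (waited >= max_ms)
            return -1;
        fake_sleep(c, step_ms);
        waited += step_ms;
    }
    return (int)waited;
}
```

A controller that recovers in 200 ms returns after 200 ms, while one that needs the full 30 seconds still gets them.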
    • block: add documentation for register_blkdev() · 9e8c0bcc
      Márton Németh authored
      Add documentation for register_blkdev() function and for the parameters.
      Signed-off-by: Márton Németh <nm127@freemail.hu>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      9e8c0bcc
    • block: fix bogus gcc warning for uninitialized var usage · b2bf9683
      Jens Axboe authored
      Newer gcc versions throw this warning:
      
              fs/bio.c: In function 'bio_alloc_bioset':
              fs/bio.c:305: warning: 'p' may be used uninitialized in this function
      
      since it cannot figure out that 'p' is only ever used if 'bs' is non-NULL.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      b2bf9683
    • powerpc: Fix 64bit __copy_tofrom_user() regression · f72b728b
      Mark Nelson authored
      This fixes a regression introduced by commit
      a4e22f02 ("powerpc: Update 64bit
      __copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").
      
      The same bug that existed in the 64bit memcpy() also exists here so fix
      it here too. The fix is the same as that applied to memcpy() with the
      addition of fixes for the exception handling code required for
      __copy_tofrom_user().
      
      This stops us reading beyond the end of the source region we were told
      to copy.
      Signed-off-by: Mark Nelson <markn@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      f72b728b
    • powerpc: Fix 64bit memcpy() regression · e423b9ec
      Mark Nelson authored
      This fixes a regression introduced by commit
      25d6e2d7 ("powerpc: Update 64bit memcpy()
      using CPU_FTR_UNALIGNED_LD_STD").
      
      This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
      feature bit present to do the memcpy() with unaligned load doubles. But,
      along with this came a bug where our final load double would read bytes
      beyond a page boundary and into the next (unmapped) page. This was caught
      by enabling CONFIG_DEBUG_PAGEALLOC.
      
      The fix was to read only the number of bytes that we need to store rather
      than reading a full 8-byte doubleword and storing only a portion of that.
      
      In order to minimise the amount of existing code touched we use the
      original do_tail for the src_unaligned case.
      
      Below is an example of the regression, as reported by Sachin Sant:
      
      Unable to handle kernel paging request for data at address 0xc00000003f380000
      Faulting instruction address: 0xc000000000039574
      cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
          pc: c000000000039574: .memcpy+0x74/0x244
          lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
          sp: c00000003baf32a0
         msr: 8000000000009032
         dar: c00000003f380000
       dsisr: 40000000
        current = 0xc00000003e54b010
        paca    = 0xc000000000a53680
          pid   = 1840, comm = readahead
      enter ? for help
      [link register   ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
      [c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliable)
      [c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
      [c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
      [c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
      [c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
      [c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
      [c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
      [c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
      [c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
      [c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
      [c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
      [c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
      [c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
      [c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
      [c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
      [c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
      [c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      e423b9ec
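Illustration of the fix described in the memcpy() commit above, reading only the bytes that will be stored instead of a full 8-byte doubleword at the tail: a minimal portable C sketch. The real fix is in powerpc assembly; the function name `copy_exact()` and the 4/2/1-byte tail breakdown are assumptions for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy n bytes in 8-byte chunks, then finish the tail with 4-, 2- and
 * 1-byte accesses so that no load ever touches a byte past src + n.
 */
static void copy_exact(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n >= 8) {
        uint64_t w;

        memcpy(&w, src, 8);  /* full doubleword is known to be in range */
        memcpy(dst, &w, 8);
        src += 8;
        dst += 8;
        n -= 8;
    }
    if (n & 4) {
        uint32_t w;

        memcpy(&w, src, 4);
        memcpy(dst, &w, 4);
        src += 4;
        dst += 4;
    }
    if (n & 2) {
        uint16_t w;

        memcpy(&w, src, 2);
        memcpy(dst, &w, 2);
        src += 2;
        dst += 2;
    }
    if (n & 1)
        *dst = *src;
}
```

If the source ends exactly at a page boundary, a tail handled this way never reads into the next page, whereas a final 8-byte load that is only partially stored could.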
    • powerpc: Fix load/store float double alignment handler · 49f297f8
      Michael Neuling authored
      When we introduced VSX, we changed the way FPRs are stored in the
      thread_struct.  Unfortunately we missed the load/store float double
      alignment handler code when updating how we access FPRs in the
      thread_struct.
      
      The patch below fixes this and merges the little- and big-endian cases.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      49f297f8
    • Merge branch 'tip/tracing/ftrace' of ssh://master.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace · f4abfb8d
      Ingo Molnar authored
      f4abfb8d
    • tracing: wrap arguments with PARAMS · 3cdfdf91
      Steven Rostedt authored
      Peter Zijlstra warned that TPPROTO and TPARGS might become something
      other than simple copies of their arguments. To prevent this from having
      side effects in the TRACE_FORMAT macro in tracepoint.h, we add a
      PARAMS() macro to be defined as just a wrapper.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      3cdfdf91
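Illustration of why a pass-through wrapper such as PARAMS() is useful when one macro forwards a comma-separated list to another: a minimal sketch in standard C. The `PROTO`, `ARGS`, `INNER` and `OUTER` macros are hypothetical and are not the tracepoint.h definitions; `PARAMS` is written here with `__VA_ARGS__` rather than any particular kernel spelling.

```c
/* Pass-through wrappers: each expands to exactly its arguments. */
#define PARAMS(...) __VA_ARGS__
#define PROTO(...)  __VA_ARGS__
#define ARGS(...)   __VA_ARGS__

/* Inner macro: expects the prototype and argument list as single parameters. */
#define INNER(name, proto, args) \
    static int name(proto) { return name##_impl(args); }

/*
 * Outer macro: by the time proto and args are forwarded they have already
 * expanded to bare comma-separated lists. Re-wrapping them in PARAMS()
 * keeps each list together as one macro argument for INNER.
 */
#define OUTER(name, proto, args)        \
    static int name##_impl(proto);      \
    INNER(name, PARAMS(proto), PARAMS(args))

/* Declares sum_impl() and defines sum() as a forwarding wrapper. */
OUTER(sum, PROTO(int a, int b), ARGS(a, b))

static int sum_impl(int a, int b)
{
    return a + b;
}
```

Without the PARAMS() wrapping, `INNER` would receive five arguments instead of three and the expansion would fail to compile.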
    • tracing: rename DEFINE_TRACE_FMT to just TRACE_FORMAT · eef62a68
      Steven Rostedt authored
      There's been a bit of confusion as to whether DEFINE/DECLARE_TRACE_FMT
      should be a DEFINE or a DECLARE. Ingo Molnar suggested simply calling it
      TRACE_FORMAT.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      eef62a68
  3. 25 Feb, 2009 9 commits