1. 17 Feb, 2010 9 commits
    • powerpc: Use lwarx hint in spinlocks · 4e14a4d1
      Anton Blanchard authored
      Recent versions of the PowerPC architecture added a hint bit to the larx
      instructions to differentiate between an atomic operation and a lock operation:
      
      > 0 Other programs might attempt to modify the word in storage addressed by EA
      > even if the subsequent Store Conditional succeeds.
      >
      > 1 Other programs will not attempt to modify the word in storage addressed by
      > EA until the program that has acquired the lock performs a subsequent store
      > releasing the lock.
      
      To avoid a binutils dependency this patch creates macros for the extended lwarx
      format and uses them in the spinlock code. To test this change I used a simple
      test case that acquires and releases a global pthread mutex:
      
      	pthread_mutex_lock(&mutex);
      	pthread_mutex_unlock(&mutex);
      
      On a 32 core POWER6, running 32 test threads we spend almost all our time in
      the futex spinlock code:
      
          94.37%     perf  [kernel]                     [k] ._raw_spin_lock
                     |
                     |--99.95%-- ._raw_spin_lock
                     |          |
                     |          |--63.29%-- .futex_wake
                     |          |
                     |          |--36.64%-- .futex_wait_setup
      
      This makes it a good test for this patch. The results (in lock/unlock operations
      per second) are:
      
      before: 1538203 ops/sec
      after:  2189219 ops/sec
      
      An improvement of 42%
      
      A 32 core POWER7 improves even more:
      
      before: 1279529 ops/sec
      after:  2282076 ops/sec
      
      An improvement of 78%
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
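The test case in the lwarx commit above is given only as a lock/unlock pair. A minimal user-space sketch of such a benchmark might look like the following. The names `run_lock_bench`, `improvement_pct`, and `MAX_THREADS` are hypothetical, and the shared counter is an addition made here so the result can be checked; the original test is not shown in the commit message and may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 64 /* hypothetical upper bound for this sketch */

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long shared_ops; /* protected by mutex; not in the original test */

/* Each worker acquires and releases the global mutex `iters` times. */
static void *worker(void *arg)
{
    unsigned long iters = *(unsigned long *)arg;

    for (unsigned long i = 0; i < iters; i++) {
        pthread_mutex_lock(&mutex);
        shared_ops++;
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

/* Runs nthreads workers, returns the total lock/unlock pairs completed
 * and stores the elapsed wall-clock time in *secs. */
unsigned long run_lock_bench(int nthreads, unsigned long iters, double *secs)
{
    pthread_t tid[MAX_THREADS];
    struct timespec t0, t1;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    shared_ops = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *secs = (double)(t1.tv_sec - t0.tv_sec) +
            (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return shared_ops;
}

/* Relative improvement in percent, as quoted in the commit message. */
double improvement_pct(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

Dividing the returned count by `*secs` gives operations per second. Applied to the figures in the commit message, `improvement_pct(1538203, 2189219)` is roughly 42 and `improvement_pct(1279529, 2282076)` is roughly 78.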
    • powerpc: Convert global "BAD" interrupt to per cpu spurious · 17081102
      Anton Blanchard authored
      I often get asked if BAD interrupts are really bad. On some boxes (e.g.
      IBM machines running a hypervisor) there are valid cases where we are
      presented with an interrupt that is not for us. These cases are common
      enough to show up as thousands of BAD interrupts a day.
      
      Tone them down by calling them spurious. Since they can be a significant cause
      of OS jitter, we may as well log them per cpu so we know where they are
      occurring.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Add timer, performance monitor and machine check counts to /proc/interrupts · 89713ed1
      Anton Blanchard authored
      With NO_HZ it is useful to know how often the decrementer is going off. The
      patch below adds an entry for it and also adds it into the /proc/stat
      summaries.
      
      While here, I added performance monitoring and machine check exceptions.
      I found it useful to keep an eye on the PMU exception rate
      when using the perf tool. Since it's possible to take a completely
      handled machine check on a System p box it also sounds like a good idea to
      keep a machine check summary.
      
      The event naming matches x86 to keep gratuitous differences to a minimum.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Remove whitespace in irq chip name fields · fc380c0c
      Anton Blanchard authored
      Now that we use printf-style alignment, there is no need to manually space
      these fields.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Rework /proc/interrupts · c86845ed
      Anton Blanchard authored
      On a large machine I noticed the columns of /proc/interrupts failed to line up
      with the header after CPU9. At sufficiently large numbers of CPUs it becomes
      impossible to line up the CPU number with the counts.
      
      While fixing this I noticed x86 has a number of updates that we may as well
      pull in. On PowerPC we currently omit an interrupt completely if there is no
      active handler, whereas on x86 it is printed if there is a non zero count.
      
      The x86 code also spaces the first column correctly based on nr_irqs.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
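The alignment problem described in the /proc/interrupts commit above comes down to fixed-width cells. A small user-space sketch of the idea, assuming a first-column width derived from `nr_irqs` and 11-character cells for both header and counts, could be written as below. The function names and the exact format strings are illustrative assumptions; the kernel code uses `seq_printf` and may differ in detail.

```c
#include <assert.h>
#include <stdio.h>

/* Width of the IRQ number column: at least 3 digits, growing with nr_irqs. */
int irq_label_width(unsigned int nr_irqs)
{
    int prec = 3;

    for (unsigned int j = 1000; prec < 10 && j <= nr_irqs; j *= 10)
        prec++;
    return prec;
}

/* Header cell: "CPU" plus a left-aligned, 8-wide CPU number (11 chars). */
int format_header_cell(char *buf, size_t len, int cpu)
{
    return snprintf(buf, len, "CPU%-8d", cpu);
}

/* Count cell: right-aligned, 10-wide count plus one space (11 chars). */
int format_count_cell(char *buf, size_t len, unsigned int count)
{
    return snprintf(buf, len, "%10u ", count);
}

/* Prints a table of counts[irq * ncpus + cpu] with aligned columns. */
void print_interrupts(FILE *out, unsigned int nr_irqs, int ncpus,
                      const unsigned int *counts)
{
    int prec = irq_label_width(nr_irqs);
    char cell[32];

    assert(out != NULL && counts != NULL && ncpus > 0);

    fprintf(out, "%*s", prec + 8, "");
    for (int cpu = 0; cpu < ncpus; cpu++) {
        format_header_cell(cell, sizeof(cell), cpu);
        fputs(cell, out);
    }
    fputc('\n', out);

    for (unsigned int irq = 0; irq < nr_irqs; irq++) {
        fprintf(out, "%*u: ", prec, irq);
        for (int cpu = 0; cpu < ncpus; cpu++) {
            format_count_cell(cell, sizeof(cell),
                              counts[irq * (unsigned int)ncpus + (unsigned int)cpu]);
            fputs(cell, out);
        }
        fputc('\n', out);
    }
}
```

Because header and count cells have the same width, the columns stay aligned whether the CPU number has one digit or four.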
    • powerpc: Reduce footprint of xics_ipi_struct · fda9d861
      Anton Blanchard authored
      Right now we allocate a cacheline sized NR_CPUS array for xics IPI
      communication. Use DECLARE_PER_CPU_SHARED_ALIGNED to put it in percpu
      data in its own cacheline since it is written to by other cpus.
      
      On a kernel with NR_CPUS=1024, this saves quite a lot of memory:
      
         text    data     bss      dec         hex    filename
      8767779 2944260 1505724 13217763         c9afe3 vmlinux.irq_cpustat
      8767555 2813444 1505724 13086723         c7b003 vmlinux.xics
      
      A saving of around 128kB.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Reduce footprint of irq_stat · 8c007bfd
      Anton Blanchard authored
      PowerPC is currently using asm-generic/hardirq.h which statically allocates an
      NR_CPUS irq_stat array. Switch to an arch specific implementation which uses
      per cpu data.
      
      On a kernel with NR_CPUS=1024, this saves quite a lot of memory:
      
         text    data     bss      dec         hex    filename
      8767938 2944132 1636796 13348866         cbb002 vmlinux.baseline
      8767779 2944260 1505724 13217763         c9afe3 vmlinux.irq_cpustat
      
      A saving of around 128kB.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
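The "around 128kB" figure in the irq_stat commit above can be checked against the size output it quotes. The sketch below does that arithmetic. The helper names are hypothetical, and the 128-byte entry size is an inference (131072 / 1024), consistent with a cacheline-aligned per-CPU entry but not stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes used by a statically allocated array with one entry per possible CPU. */
size_t static_array_bytes(size_t nr_cpus, size_t entry_bytes)
{
    assert(entry_bytes == 0 || nr_cpus <= (size_t)-1 / entry_bytes);
    return nr_cpus * entry_bytes;
}

/* Reduction in a section's size between two builds, in bytes. */
long section_delta(long before, long after)
{
    return before - after;
}
```

With the quoted bss values, `section_delta(1636796, 1505724)` is 131072 bytes, which equals `static_array_bytes(1024, 128)`, that is, exactly 128 KiB for NR_CPUS=1024.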
    • powerpc/eeh: Fix a bug when pci structure is null · 8d3d50bf
      Breno Leitao authored
      During an EEH recovery, the pci_dev structure can be null, mainly if an
      EEH event is detected during a PCI config operation. In this case, the
      pci_dev will not be known (and will be null), and the kernel will crash
      with the following message:
      
      Unable to handle kernel paging request for data at address 0x000000a0
      Faulting instruction address: 0xc00000000006b8b4
      Oops: Kernel access of bad area, sig: 11 [#1]
      
      NIP [c00000000006b8b4] .eeh_event_handler+0x10c/0x1a0
      LR [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0
      Call Trace:
      [c0000003a80dff00] [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0
      [c0000003a80dff90] [c000000000031f1c] .kernel_thread+0x54/0x70
      
      The bug occurs because pci_name() tries to access a null pointer.
      This patch just guarantees that pci_name() is not called on null pointers.
      Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
      Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
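The shape of the EEH fix above, guarding a name lookup against a null device, can be illustrated outside the kernel. In the sketch below `struct fake_pci_dev`, `fake_pci_name`, and `eeh_safe_name` are hypothetical stand-ins, not kernel APIs, and the `"<null>"` fallback string is an assumption; the actual patch may handle the null case differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pci_dev. */
struct fake_pci_dev {
    const char *name;
};

/* Like the failing path in the oops: dereferences its argument unconditionally. */
const char *fake_pci_name(const struct fake_pci_dev *pdev)
{
    assert(pdev != NULL);
    return pdev->name;
}

/* Guarded variant: only look up the name when a device is actually known. */
const char *eeh_safe_name(const struct fake_pci_dev *pdev)
{
    return pdev ? fake_pci_name(pdev) : "<null>";
}
```

A caller that logs the device name would use `eeh_safe_name(dev)` so that an event without a known device produces a placeholder instead of a fault.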
    • powerpc: Add coherent_dma_mask to mv64x60 devices · e0508b15
      Corey Minyard authored
      DMA ops require that coherent_dma_mask be set properly for a device,
      but this was not being done for devices on the MV64x60 that use DMA.
      Both the serial and ethernet devices need this or they won't be able
      to allocate memory.
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  2. 16 Feb, 2010 22 commits
  3. 15 Feb, 2010 6 commits
  4. 14 Feb, 2010 3 commits
    • reiserfs: Fix softlockup while waiting on an inode · 175359f8
      Frederic Weisbecker authored
      When we wait for an inode through reiserfs_iget(), we hold
      the reiserfs lock. And waiting for an inode may imply waiting
      for its writeback. But the inode writeback path may also require
      the reiserfs lock, which leads to a deadlock.
      
      We just need to release the reiserfs lock from reiserfs_iget()
      to fix this.
      Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Tested-by: Christian Kujau <lists@nerdbynature.de>
      Cc: Chris Mason <chris.mason@oracle.com>
    • firewire: ohci: retransmit isochronous transmit packets on cycle loss · 7f51a100
      Clemens Ladisch authored
      In isochronous transmit DMA descriptors, link the skip address pointer
      back to the descriptor itself.  When a cycle is lost, the controller
      will send the packet in the next cycle, instead of terminating the
      entire DMA program.
      
      There are two reasons for this:
      
      * This behaviour is compatible with the old IEEE1394 stack.  Old
        applications would not expect the DMA program to stop in this case.
      
      * Since the OHCI driver does not report any uncompleted packets, the
        context would stop silently; clients would not have any chance to
        detect and handle this error without a watchdog timer.
      Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
      
      Pieter Palmers notes:
      
      "The reason I added this retry behavior to the old stack is because some
      cards now and then fail to send a packet (e.g. the o2micro card in my
      dell laptop).  I couldn't figure out why exactly this happens, my best
      guess is that the card cannot fetch the payload data on time.  This
      happens much more frequently when sending large packets, which leads me
      to suspect that there are some contention issues with the DMA that fills
      the transmit FIFO.
      
      In the old stack it was a pretty critical issue as it resulted in a
      freeze of the userspace application.
      
      The omission of a packet doesn't necessarily have to be an issue.  E.g.
      in IEC61883 streams the DBC field can be used to detect discontinuities
      in the stream.  So as long as the other side doesn't bail when no
      [packet] is present in a cycle, there is not really a problem.
      
      I'm not convinced though that retrying is the proper solution, but it is
      simple and effective for what it had to do.  And I think there are no
      reasons not to do it this way.  Userspace can still detect this by
      checking the cycle the descriptor was sent in."
      
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (changelog, comment)
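The effect of linking the skip address back to the descriptor itself, as described in the firewire commit above, can be shown with a toy model. This is a simplified simulation under stated assumptions: `struct iso_desc`, `run_iso_context`, `link_skip_to_self`, and `DESC_STOP` are hypothetical names, descriptors are array indices rather than DMA addresses, and the real OHCI descriptor layout is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define DESC_STOP (-1) /* hypothetical marker: no branch target, context stops */

/* Toy descriptor: where to go after sending, and where to go on a lost cycle. */
struct iso_desc {
    int next;
    int skip;
};

/* Simulates ncycles bus cycles; returns how many packets were sent. */
int run_iso_context(const struct iso_desc *ring, int start,
                    const int *cycle_lost, int ncycles)
{
    int cur = start;
    int sent = 0;

    assert(ring != NULL && cycle_lost != NULL);

    for (int c = 0; c < ncycles && cur != DESC_STOP; c++) {
        if (cycle_lost[c]) {
            cur = ring[cur].skip; /* lost cycle: follow the skip branch */
        } else {
            sent++;               /* packet goes out in this cycle */
            cur = ring[cur].next;
        }
    }
    return sent;
}

/* Points every descriptor's skip branch back at itself, so that a lost
 * cycle retries the same packet in the next cycle. */
void link_skip_to_self(struct iso_desc *ring, int n)
{
    for (int i = 0; i < n; i++)
        ring[i].skip = i;
}
```

In this model, a three-descriptor program with no skip targets stops after the first lost cycle, while the same program after `link_skip_to_self` sends all three packets, one cycle later than planned.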
    • perf top: Fix help text alignment · 1a72cfa6
      Kirill Smelkov authored
      Print this:
      
      Mapped keys:
              [d]     display refresh delay.                  (2)
              [e]     display entries (lines).                (46)
              [f]     profile display filter (count).         (5)
              [F]     annotate display filter (percent).      (5%)
              [s]     annotate symbol.                        (NULL)
              [S]     stop annotation.
              [K]     hide kernel_symbols symbols.            (no)
              [U]     hide user symbols.                      (no)
              [z]     toggle sample zeroing.                  (0)
              [qQ]    quit.
      
      instead of:
      
      Mapped keys:
              [d]     display refresh delay.                  (2)
              [e]     display entries (lines).                (46)
              [f]     profile display filter (count).         (5)
              [F]     annotate display filter (percent).      (5%)
              [s]     annotate symbol.                        (NULL)
              [S]     stop annotation.
              [K]     hide kernel_symbols symbols.                    (no)
              [U]     hide user symbols.                      (no)
              [z]     toggle sample zeroing.                  (0)
              [qQ]    quit.
      Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100212162059.GA30041@landau.phys.spbu.ru>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>