1. 21 Feb, 2010 1 commit
  2. 19 Feb, 2010 2 commits
    • Brandon Philips's avatar
      x86, irq: Keep chip_data in create_irq_nr and destroy_irq · eb5b3794
      Brandon Philips authored
      Version 4: use get_irq_chip_data() in destroy_irq() to get rid of some
      local vars.
      
      When two drivers are setting up MSI-X at the same time via
      pci_enable_msix() there is a race.  See this dmesg excerpt:
      
      [   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
      [   85.170611]   alloc irq_desc for 99 on node -1
      [   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
      [   85.170614]   alloc kstat_irqs on node -1
      [   85.170616] alloc irq_2_iommu on node -1
      [   85.170617]   alloc irq_desc for 100 on node -1
      [   85.170619]   alloc kstat_irqs on node -1
      [   85.170621] alloc irq_2_iommu on node -1
      [   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
      [   85.170626]   alloc irq_desc for 101 on node -1
      [   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
      [   85.170630]   alloc kstat_irqs on node -1
      [   85.170631] alloc irq_2_iommu on node -1
      [   85.170635]   alloc irq_desc for 102 on node -1
      [   85.170636]   alloc kstat_irqs on node -1
      [   85.170639] alloc irq_2_iommu on node -1
      [   85.170646] BUG: unable to handle kernel NULL pointer dereference
      at 0000000000000088
      
      As you can see igb and ixgbe are both alternating on create_irq_nr()
      via pci_enable_msix() in their probe function.
      
      ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr(), ixgbe
      chooses irq_desc_ptrs[102], exits the loop, drops vector_lock and
      calls dynamic_irq_init(), which sets irq_desc_ptrs[102]->chip_data =
      NULL.
      
      igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
      via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
      
      	cfg_new = irq_desc_ptrs[102]->chip_data;
      	if (cfg_new->vector != 0)
      		continue;
      
      This hits the NULL deref.
      
      Another possible race exists via pci_disable_msix() in a driver or in
      the number of error paths that call free_msi_irqs():
      
      destroy_irq()
        dynamic_irq_cleanup() which sets desc->chip_data = NULL
        ...race window...
        desc->chip_data = cfg;
      
      Remove the save and restore code for cfg in create_irq_nr() and
      destroy_irq() and take the desc->lock when checking the irq_cfg.
      Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20100207210250.GB8256@jenkins.home.ifup.org>
      Signed-off-by: Brandon Philips <bphilips@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      eb5b3794
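The create_irq_nr() race described in this commit message can be modelled in user space. The following is a minimal single-threaded sketch, not the kernel code: `struct cfg`, `struct desc`, `scan_for_free()` and `second_scan_result()` are hypothetical names, and the interleaving of the two drivers is replayed sequentially instead of with real threads. It only illustrates why a scanner must never be able to see a cleared chip_data pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NR_DESC 4                 /* hypothetical table size for the sketch */

struct cfg  { int vector; };      /* stand-in for struct irq_cfg */
struct desc { struct cfg *chip_data; };

/* Scan like the loop quoted in the commit message. Returns the first free
 * index, -1 if nothing is free, or -2 at the point where the kernel code
 * would have dereferenced a NULL chip_data. */
static int scan_for_free(const struct desc *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct cfg *c = tbl[i].chip_data;
        if (c == NULL)
            return -2;
        if (c->vector != 0)
            continue;
        return (int)i;
    }
    return -1;
}

/* The first allocator claims a slot and initialises it, then a second
 * allocator scans the table. keep_chip_data selects the old behaviour (0,
 * pointer cleared during init) or the fixed behaviour (1, pointer kept). */
static int second_scan_result(int keep_chip_data)
{
    struct cfg cfgs[NR_DESC] = { {0x31}, {0}, {0}, {0} };
    struct desc tbl[NR_DESC];

    for (size_t i = 0; i < NR_DESC; i++)
        tbl[i].chip_data = &cfgs[i];

    int first = scan_for_free(tbl, NR_DESC);
    assert(first == 1);
    cfgs[first].vector = 0x32;          /* the slot is now taken */
    if (!keep_chip_data)
        tbl[first].chip_data = NULL;    /* old init path clears the pointer */

    return scan_for_free(tbl, NR_DESC); /* the second allocator's view */
}
```

In this model `second_scan_result(0)` reports the NULL hit (-2), while `second_scan_result(1)` skips the taken slot and returns the next free index.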
    • Eric W. Biederman's avatar
      xen: Remove unnecessary arch specific xen irq functions. · ca4dbc66
      Eric W. Biederman authored
      Right now xen's use of the x86 and ia64 handle_irq is bizarre and very
      fragile, as it is far from obvious that the function exists and is used by
      code out in drivers/....  Luckily using handle_irq is completely unnecessary,
      and we can just use the generic irq apis instead.
      
      This still leaves drivers/xen/events.c as a problematic user of the generic
      irq apis, since it has "static struct irq_info irq_info[NR_IRQS]", but that
      can be fixed some other time.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <4B7CAAD2.10803@kernel.org>
      Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      ca4dbc66
  3. 18 Feb, 2010 6 commits
  4. 17 Feb, 2010 1 commit
  5. 16 Feb, 2010 1 commit
  6. 10 Feb, 2010 15 commits
    • Brandon Philips's avatar
      x86: Avoid race condition in pci_enable_msix() · ced5b697
      Brandon Philips authored
      Keep chip_data in create_irq_nr and destroy_irq.
      
      When two drivers are setting up MSI-X at the same time via
      pci_enable_msix() there is a race.  See this dmesg excerpt:
      
      [   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
      [   85.170611]   alloc irq_desc for 99 on node -1
      [   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
      [   85.170614]   alloc kstat_irqs on node -1
      [   85.170616] alloc irq_2_iommu on node -1
      [   85.170617]   alloc irq_desc for 100 on node -1
      [   85.170619]   alloc kstat_irqs on node -1
      [   85.170621] alloc irq_2_iommu on node -1
      [   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
      [   85.170626]   alloc irq_desc for 101 on node -1
      [   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
      [   85.170630]   alloc kstat_irqs on node -1
      [   85.170631] alloc irq_2_iommu on node -1
      [   85.170635]   alloc irq_desc for 102 on node -1
      [   85.170636]   alloc kstat_irqs on node -1
      [   85.170639] alloc irq_2_iommu on node -1
      [   85.170646] BUG: unable to handle kernel NULL pointer dereference
      at 0000000000000088
      
      As you can see igb and ixgbe are both alternating on create_irq_nr()
      via pci_enable_msix() in their probe function.
      
      ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr(), ixgbe
      chooses irq_desc_ptrs[102], exits the loop, drops vector_lock and
      calls dynamic_irq_init(), which sets irq_desc_ptrs[102]->chip_data =
      NULL.
      
      igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
      via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
      
      	cfg_new = irq_desc_ptrs[102]->chip_data;
      	if (cfg_new->vector != 0)
      		continue;
      
      This hits the NULL deref.
      
      Another possible race exists via pci_disable_msix() in a driver or in
      the number of error paths that call free_msi_irqs():
      
      destroy_irq()
        dynamic_irq_cleanup() which sets desc->chip_data = NULL
        ...race window...
        desc->chip_data = cfg;
      
      Remove the save and restore code for cfg in create_irq_nr() and
      destroy_irq() and take the desc->lock when checking the irq_cfg.
      Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
      Signed-off-by: Brandon Philips <bphilips@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      ced5b697
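The second race in this commit message, the save/clear/restore sequence in destroy_irq(), can be sketched the same way. This is an illustrative user-space model under stated assumptions: the `observe()` callback stands in for another CPU inspecting the descriptor inside the race window, and all function names here are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

struct cfg  { int vector; };      /* stand-in for struct irq_cfg */
struct desc { struct cfg *chip_data; };

/* Callback type standing in for a concurrent reader of the descriptor. */
typedef void (*observer_fn)(const struct desc *d, int *saw_null);

static void observe(const struct desc *d, int *saw_null)
{
    *saw_null = (d->chip_data == NULL);
}

/* Old sequence: save cfg, clear the pointer, [race window], restore it. */
static void destroy_with_save_restore(struct desc *d, observer_fn obs,
                                      int *saw_null)
{
    struct cfg *cfg = d->chip_data;

    d->chip_data = NULL;          /* what the cleanup step is described as doing */
    obs(d, saw_null);             /* a reader running inside the window */
    d->chip_data = cfg;
    cfg->vector = 0;
}

/* Fixed sequence: chip_data is never cleared, so there is no window. */
static void destroy_keeping_chip_data(struct desc *d, observer_fn obs,
                                      int *saw_null)
{
    obs(d, saw_null);
    d->chip_data->vector = 0;
}

/* Returns 1 if the reader saw a NULL chip_data during teardown, else 0. */
static int window_sees_null(int use_old_sequence)
{
    struct cfg c = { 0x41 };
    struct desc d = { &c };
    int saw_null = -1;

    if (use_old_sequence)
        destroy_with_save_restore(&d, observe, &saw_null);
    else
        destroy_keeping_chip_data(&d, observe, &saw_null);

    assert(d.chip_data == &c && c.vector == 0);
    return saw_null;
}
```

Both variants leave the descriptor in the same final state. Only the old one exposes a NULL pointer to a reader in between, which is why the commit drops the save and restore code rather than adding a NULL check.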
    • Yinghai Lu's avatar
      x86: Fix SCI on IOAPIC != 0 · 18dce6ba
      Yinghai Lu authored
      Thomas Renninger <trenn@suse.de> reported that on an IBM x3330,
      booting the latest kernel results in:
      
      PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1
      PCI: Using configuration type 1 for base access
      bio: create slab <bio-0> at 0
      ACPI: SCI (IRQ30) allocation failed
      ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20090903/evevent-161)
      ACPI: Unable to start the ACPI Interpreter
      
      Later all kinds of devices fail...

      He bisected it down to this commit:
      commit b9c61b70
      
          x86/pci: update pirq_enable_irq() to setup io apic routing
      
      It turns out we need to set up irq routing for the SCI on ioapic1 early.
      
      -v2: make it work without sparseirq too.
      -v3: fix checkpatch.pl warning, and cc to stable
      Reported-by: Thomas Renninger <trenn@suse.de>
      Bisected-by: Thomas Renninger <trenn@suse.de>
      Tested-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-2-git-send-email-yinghai@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      18dce6ba
    • Jiri Slaby's avatar
      x86, ia32_aout: do not kill argument mapping · 318f6b22
      Jiri Slaby authored
      Do not set current->mm->mmap to NULL in 32-bit emulation on 64-bit
      load_aout_binary after flush_old_exec, as it would destroy the already
      set bprm mapping with arguments.
      
      Introduced by b6a2fea3 ("mm: variable length argument support"),
      where the argument mapping in bprm was added.
      
      [ hpa: this is a regression from 2.6.22... time to kill a.out? ]
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      LKML-Reference: <1265831716-7668-1-git-send-email-jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ollie Wild <aaw@google.com>
      Cc: x86@kernel.org
      Cc: <stable@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      318f6b22
    • Linus Torvalds's avatar
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 · 909ccdb4
      Linus Torvalds authored
      * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
        [S390] Fix struct _lowcore layout.
        [S390] qdio: prevent call trace if CHPID is offline
        [S390] qdio: continue polling for buffer state ERROR
      909ccdb4
    • Linus Torvalds's avatar
      Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 2cbd1883
      Linus Torvalds authored
      * 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: PIT: control word is write-only
        kvmclock: count total_sleep_time when updating guest clock
        Export the symbol of getboottime and monotonic_to_bootbased
      2cbd1883
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 · 5993fe31
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
        avr32: clean up memory allocation in at32_add_device_mci
        arch/avr32: Fix build failure for avr32 caused by typo
      5993fe31
    • Linus Torvalds's avatar
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 53910146
      Linus Torvalds authored
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Fix address masking bug in hpte_need_flush()
      53910146
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 · 5551638a
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
        cifs: fix dentry hash calculation for case-insensitive mounts
        [CIFS] Don't cache timestamps on utimes due to coarse granularity
        [CIFS] Maximum username length check in session setup does not match
        cifs: fix length calculation for converted unicode readdir names
        [CIFS] Add support for TCP_NODELAY
      5551638a
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 0ea45783
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
        drivers/net: Correct NULL test
        MAINTAINERS: networking drivers - Add git net-next tree
        net/sched: Fix module name in Kconfig
        cxgb3: fix GRO checksum check
        dst: call cond_resched() in dst_gc_task()
        netfilter: nf_conntrack: fix hash resizing with namespaces
        netfilter: xtables: compat out of scope fix
        netfilter: nf_conntrack: restrict runtime expect hashsize modifications
        netfilter: nf_conntrack: per netns nf_conntrack_cachep
        netfilter: nf_conntrack: fix memory corruption with multiple namespaces
        Bluetooth: Keep a copy of each HID device's report descriptor
        pktgen: Fix freezing problem
        igb: make certain to reassign legacy interrupt vectors after reset
        irda: add missing BKL in irnet_ppp ioctl
        irda: unbalanced lock_kernel in irnet_ppp
        ixgbe: Fix return of invalid txq
        ixgbe: Fix ixgbe_tx_map error path
        netxen: protect resource cleanup by rtnl lock
        netxen: fix tx timeout recovery for NX2031 chip
        Bluetooth: Enter active mode before establishing a SCO link.
        ...
      0ea45783
    • Suresh Siddha's avatar
      x86, apic: Don't use logical-flat mode when CPU hotplug may exceed 8 CPUs · 681ee44d
      Suresh Siddha authored
      We need to fall back from logical-flat APIC mode to physical-flat mode
      when we have more than 8 CPUs.  However, in the presence of CPU
      hotplug (with the BIOS listing not-enabled but possible CPUs as disabled
      CPUs in the MADT), we have to consider the number of possible CPUs rather
      than the number of current CPUs; otherwise we may cross the 8-CPU boundary
      when CPUs are added later.
      
      32bit apic code can use more cleanups (like the removal of vendor checks in
      32bit default_setup_apic_routing()) and more unification with 64bit code.
      Yinghai has some patches in the works already. This patch addresses the boot
      issue that was reported in the virtualization guest context.
      
      [ hpa: incorporated function annotation feedback from Yinghai Lu ]
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1265767304.2833.19.camel@sbs-t61.sc.intel.com>
      Acked-by: Shaohui Zheng <shaohui.zheng@intel.com>
      Reviewed-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      681ee44d
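The rule in this commit message comes down to counting possible CPUs rather than present ones. Logical-flat mode addresses CPUs through an 8-bit destination bitmap, which is where the limit of 8 comes from. The sketch below is an assumption-labelled illustration: `choose_apic_mode()` and its two parameters are hypothetical names and do not correspond to the kernel's actual APIC probing functions.

```c
#include <assert.h>

enum apic_mode { APIC_LOGICAL_FLAT, APIC_PHYSICAL_FLAT };

/* One bit per CPU in an 8-bit logical destination bitmap. */
#define LOGICAL_FLAT_MAX_CPUS 8u

/* present_cpus: CPUs enabled at boot.
 * disabled_hotpluggable: CPUs listed as disabled in the MADT that may be
 * added later. Both parameter names are illustrative. */
static enum apic_mode choose_apic_mode(unsigned present_cpus,
                                       unsigned disabled_hotpluggable)
{
    unsigned possible = present_cpus + disabled_hotpluggable;

    /* Decide on what the system may grow to, not on what is online now. */
    return possible > LOGICAL_FLAT_MAX_CPUS ? APIC_PHYSICAL_FLAT
                                            : APIC_LOGICAL_FLAT;
}
```

A guest that boots with 4 CPUs and 8 more hot-pluggable ones therefore gets physical-flat mode from the start, instead of crossing the boundary later.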
    • David Gibson's avatar
      powerpc: Fix address masking bug in hpte_need_flush() · 77058e1a
      David Gibson authored
      Commit f71dc176 'Make
      hpte_need_flush() correctly mask for multiple page sizes' introduced
      a bug, which is triggered when a kernel with a 64k base page size is run
      on a system whose hardware does not support 64k hash PTEs.  In this case,
      we emulate 64k pages with multiple 4k hash PTEs; however, in
      hpte_need_flush() we incorrectly mask only the hardware page size from
      the address, instead of the logical page size.  This causes things to
      go wrong when we later attempt to iterate through the hardware
      subpages of the logical page.
      
      This patch corrects the error.  It has been tested on pSeries bare
      metal by Michael Neuling.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      77058e1a
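The masking error described in this commit message can be shown with plain address arithmetic. This is a sketch under stated assumptions, not the powerpc code: the shifts of 12 (4k hardware page) and 16 (64k logical page) follow the sizes named in the message, and `mask_to_page()` and `subpages_inside()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define HW_PAGE_SHIFT      12u   /* 4k hardware hash PTE */
#define LOGICAL_PAGE_SHIFT 16u   /* 64k base page emulated with 4k PTEs */
#define SUBPAGES (1u << (LOGICAL_PAGE_SHIFT - HW_PAGE_SHIFT))   /* 16 */

/* Round addr down to the start of the page of the given size. */
static uint64_t mask_to_page(uint64_t addr, unsigned shift)
{
    return addr & ~((UINT64_C(1) << shift) - 1);
}

/* Walk SUBPAGES hardware pages starting at base and count how many of
 * them lie inside the 64k logical page that contains addr. */
static unsigned subpages_inside(uint64_t base, uint64_t addr)
{
    uint64_t lo = mask_to_page(addr, LOGICAL_PAGE_SHIFT);
    uint64_t hi = lo + (UINT64_C(1) << LOGICAL_PAGE_SHIFT);
    unsigned n = 0;

    for (unsigned i = 0; i < SUBPAGES; i++) {
        uint64_t sub = base + ((uint64_t)i << HW_PAGE_SHIFT);
        if (sub >= lo && sub < hi)
            n++;
    }
    return n;
}
```

For the address 0x12345, masking by the hardware page size gives a base of 0x12000, so a walk over 16 subpages runs past the end of the logical page. Masking by the logical page size gives 0x10000, and the walk covers exactly the 16 subpages that belong to it.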
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://neil.brown.name/md · ac73fddf
      Linus Torvalds authored
      * 'for-linus' of git://neil.brown.name/md:
        md: fix some lockdep issues between md and sysfs.
        md: fix 'degraded' calculation when starting a reshape.
      ac73fddf
    • NeilBrown's avatar
      md: fix some lockdep issues between md and sysfs. · ef286f6f
      NeilBrown authored
      ======
      This fix is related to
          http://bugzilla.kernel.org/show_bug.cgi?id=15142
      but does not address that exact issue.
      ======
      
      sysfs does not like attributes being removed while they are being accessed
      (i.e. read or written) and waits for the access to complete.
      
      As accessing some md attributes takes the same lock that is held while
      removing those attributes a deadlock can occur.
      
      This patch addresses 3 issues in md that could lead to this deadlock.
      
      Two relate to calling flush_scheduled_work while the lock is held.
      This is probably a bad idea in general and as we use schedule_work to
      delete various sysfs objects it is particularly bad.
      
      In one case flush_scheduled_work is called from md_alloc (called by
      md_probe) called from do_md_run which holds the lock.  This call is
      only present to ensure that ->gendisk is set.  However we can be sure
      that gendisk is always set (though possibly we couldn't when that code
      was originally written).  This is because do_md_run is called in three
      different contexts:
        1/ from md_ioctl.  This requires that md_open has succeeded, and it
           fails if ->gendisk is not set.
        2/ from writing a sysfs attribute.  This can only happen if the
           mddev has been registered in sysfs which happens in md_alloc
           after ->gendisk has been set.
        3/ from autorun_array which is only called by autorun_devices, which
           checks for ->gendisk to be set before calling autorun_array.
      So the call to md_probe in do_md_run can be removed, and the check on
      ->gendisk can also go.
      
      
      In the other case flush_scheduled_work is being called in do_md_stop,
      purportedly to wait for all md_delayed_delete calls (which delete the
      component rdevs) to complete.  However there really isn't any need to
      wait for them - they have already been disconnected in all important
      ways.
      
      The third issue is that raid5->stop() removes some attribute names
      while the lock is held.  There is already some infrastructure in place
      to delay attribute removal until after the lock is released (using
      schedule_work).  So extend that infrastructure to remove the
      raid5_attrs_group.
      
      This does not address all lockdep issues related to the sysfs
      "s_active" lock.  The rest can be addressed by splitting that lockdep
      context between symlinks and non-symlinks, which hopefully will happen.
      Signed-off-by: NeilBrown <neilb@suse.de>
      ef286f6f
    • Serge E. Hallyn's avatar
      x86-32: Make AT_VECTOR_SIZE_ARCH=2 · cf9db6c4
      Serge E. Hallyn authored
      Both x86-32 and x86-64 with 32-bit compat use ARCH_DLINFO_IA32,
      which defines two saved_auxv entries.  But system.h only defines
      AT_VECTOR_SIZE_ARCH as 2 for CONFIG_IA32_EMULATION, not for
      CONFIG_X86_32.  Fix that.
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      LKML-Reference: <20100209023502.GA15408@us.ibm.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      cf9db6c4
  7. 09 Feb, 2010 14 commits