  1. 24 Sep, 2009 6 commits
    • SCSI: fix oops during scsi scanning · 0ce24e27
      James Bottomley authored
      commit ea038f63 upstream.
      
      Chris Webb reported:
        p0# uname -a
        Linux f7ea8425-d45b-490f-a738-d181d0df6963.host.elastichosts.com 2.6.30.4-elastic-lon-p #2 SMP PREEMPT Thu Aug 20 14:30:50 BST 2009 x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
        p0# zgrep SCAN_ASYNC /proc/config.gz
        # CONFIG_SCSI_SCAN_ASYNC is not set
      
        p0# cat /var/log/kern/2009-08-20
        [...]
        15:27:10.485 kernel: scsi9 : iSCSI Initiator over TCP/IP
        15:27:11.493 kernel: scsi 9:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
        15:27:11.493 kernel: scsi 9:0:0:0: Attached scsi generic sg6 type 12
        15:27:11.495 kernel: scsi 9:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
        15:27:11.495 kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
        15:27:11.495 kernel: sd 9:0:0:1: [sdg] 4194304 512-byte hardware sectors: (2.14 GB/2.00 GiB)
        15:27:11.495 kernel: sd 9:0:0:1: [sdg] Write Protect is off
        15:27:11.495 kernel: sd 9:0:0:1: [sdg] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
        15:27:13.012 kernel: sdg:<6>scsi 9:0:0:1: [sdg] Unhandled error code
        15:27:13.012 kernel: scsi 9:0:0:1: [sdg] Result: hostbyte=0x07 driverbyte=0x00
        15:27:13.012 kernel: end_request: I/O error, dev sdg, sector 0
        15:27:13.012 kernel: Buffer I/O error on device sdg, logical block 0
        15:27:13.012 kernel: ldm_validate_partition_table(): Disk read failed.
        15:27:13.012 kernel: unable to read partition table
        15:27:13.014 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
        15:27:13.014 kernel: IP: [<ffffffff803f0d77>] disk_part_iter_next+0x74/0xfd
        15:27:13.014 kernel: PGD 82ad0b067 PUD 82cd7e067 PMD 0
        15:27:13.014 kernel: Oops: 0000 [#1] PREEMPT SMP
        15:27:13.014 kernel: last sysfs file: /sys/devices/platform/host9/session4/iscsi_session/session4/ifacename
        15:27:13.014 kernel: CPU 5
        15:27:13.014 kernel: Modules linked in:
        15:27:13.014 kernel: Pid: 13999, comm: async/0 Not tainted 2.6.30.4-elastic-lon-p #2 X7DBN
        15:27:13.014 kernel: RIP: 0010:[<ffffffff803f0d77>]  [<ffffffff803f0d77>] disk_part_iter_next+0x74/0xfd
        15:27:13.014 kernel: RSP: 0018:ffff88066afa3dd0  EFLAGS: 00010246
        15:27:13.014 kernel: RAX: ffff88082b58a000 RBX: ffff88066afa3e00 RCX: 0000000000000000
        15:27:13.014 kernel: RDX: 0000000000000000 RSI: ffff88082b58a000 RDI: 0000000000000000
        15:27:13.014 kernel: RBP: ffff88066afa3df0 R08: ffff88066afa2000 R09: ffff8806a204f000
        15:27:13.014 kernel: R10: 000000fb12c7d274 R11: ffff8806c2bf0628 R12: ffff88066afa3e00
        15:27:13.014 kernel: R13: ffff88082c829a00 R14: 0000000000000000 R15: ffff8806bc50c920
        15:27:13.014 kernel: FS:  0000000000000000(0000) GS:ffff88002818a000(0000) knlGS:0000000000000000
        15:27:13.014 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
        15:27:13.014 kernel: CR2: 0000000000000010 CR3: 000000082ade3000 CR4: 00000000000426e0
        15:27:13.014 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        15:27:13.014 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        15:27:13.014 kernel: Process async/0 (pid: 13999, threadinfo ffff88066afa2000, task ffff8806c2bf05e0)
        15:27:13.014 kernel: Stack:
        15:27:13.014 kernel: 0000000000000000 ffff88066afa3e00 ffff88066afa3e00 ffff88082c829a00
        15:27:13.014 kernel: ffff88066afa3e40 ffffffff80306feb ffff88082b58a000 0000000000000000
        15:27:13.014 kernel: 0000000000000001 ffff8806bc50c920 ffff88066afa3e40 ffff88082b58a000
        15:27:13.014 kernel: Call Trace:
        15:27:13.014 kernel: [<ffffffff80306feb>] register_disk+0x122/0x13a
        15:27:13.014 kernel: [<ffffffff803f0b0f>] add_disk+0xaa/0x106
        15:27:13.014 kernel: [<ffffffff80493609>] sd_probe_async+0x198/0x25b
        15:27:13.014 kernel: [<ffffffff80270482>] async_thread+0x10c/0x20d
        15:27:13.014 kernel: [<ffffffff802545ff>] ? default_wake_function+0x0/0xf
        15:27:13.014 kernel: [<ffffffff80270376>] ? async_thread+0x0/0x20d
        15:27:13.014 kernel: [<ffffffff8026ad89>] kthread+0x55/0x80
        15:27:13.014 kernel: [<ffffffff8022be6a>] child_rip+0xa/0x20
        15:27:13.014 kernel: [<ffffffff8026ad34>] ? kthread+0x0/0x80
        15:27:13.014 kernel: [<ffffffff8022be60>] ? child_rip+0x0/0x20
        15:27:13.014 kernel: Code: c8 ff 80 e1 0c b9 00 00 00 00 0f 44 c1 41 83 cd ff 48 8d 7a 20 48 be ff ff ff ff 08 00 00 00 48 b9 00 00 00 00 08 00 00 00 eb 50 <8b> 42 10 41 bd 01 00 00 00 eb db 4c 63 c2 4e 8d 04 c7 4d 8b 20
        15:27:13.015 kernel: RIP  [<ffffffff803f0d77>] disk_part_iter_next+0x74/0xfd
        15:27:13.015 kernel: RSP <ffff88066afa3dd0>
        15:27:13.015 kernel: CR2: 0000000000000010
        15:27:13.015 kernel: ---[ end trace 6104b56ef5590e25 ]---
      
      The problem arises because the async scanning split in sd.c doesn't hold
      any reference to the device when it kicks off the async piece.  What's
      happening is that an iSCSI disconnect is destroying the device again *before*
      the async sd scanning thread even starts.  Fix this by taking a reference
      before starting the thread and dropping it again when the thread completes.
      Reported-by: Chris Webb <chris@arachsys.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
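
      The fix described above (take a reference before starting the async
      piece, drop it when the async piece completes) can be illustrated with a
      small userspace sketch. This is not the sd.c code: struct demo_dev and all
      demo_* names are hypothetical, the "async" half is simply called later,
      and a plain int stands in for the atomic reference counts the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted device; not the kernel's struct device. */
struct demo_dev {
    int refcount;   /* number of outstanding references (non-atomic in this sketch) */
    bool probed;    /* set once the deferred half of the probe has run */
};

static int demo_live_devices;   /* devices allocated and not yet freed */

static struct demo_dev *demo_dev_create(void)
{
    struct demo_dev *dev = calloc(1, sizeof(*dev));
    if (!dev)
        return NULL;
    dev->refcount = 1;          /* the creator owns the first reference */
    demo_live_devices++;
    return dev;
}

static void demo_dev_get(struct demo_dev *dev)
{
    assert(dev->refcount > 0);  /* taking a reference on a dead object is a bug */
    dev->refcount++;
}

/* Drop one reference; free the device when the last one goes away. */
static void demo_dev_put(struct demo_dev *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        demo_live_devices--;
        free(dev);
    }
}

/* Synchronous half of the probe: pin the device for the deferred half. */
static struct demo_dev *demo_probe_start(struct demo_dev *dev)
{
    demo_dev_get(dev);
    return dev;                 /* handed to the deferred work as its argument */
}

/* Deferred half: touching dev is safe because a reference is still held. */
static void demo_probe_async(struct demo_dev *dev)
{
    dev->probed = true;
    demo_dev_put(dev);          /* matches the get in demo_probe_start() */
}
```

      In this sketch a "disconnect" is just the creator calling demo_dev_put()
      before demo_probe_async() runs. Because demo_probe_start() pinned the
      device, the object stays allocated until the deferred half drops its
      reference.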
    • mpt2sas: Raid 10 Volume is showing as Raid 1E in dmesg · f045fdff
      Kashyap, Desai authored
      commit ed79f128 upstream.
      
      This patch modifies the slave_configure callback so that the messages sent
      to the system log for RAID1E volumes contain the string "RAID10" instead of
      "RAID1E". These messages describe the kind of SCSI device being added.
      Certain OEMs can enable displaying the RAID10 string instead of RAID1E via
      manufacturing page 10. The driver reads this config page at driver load
      time, then determines from the GenericFlags0 bits whether to display the
      RAID10 or RAID1E string; an even drive count is also taken into
      consideration.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Reviewed-by: Eric Moore <Eric.moore@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
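
      The label selection described above could look roughly like the sketch
      below. It assumes that both conditions must hold for "RAID10" to be shown.
      The bit position DEMO_FLAG_DISPLAY_RAID10 and the function name are
      hypothetical; the commit text does not give the actual GenericFlags0
      layout or the driver's helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit position; the real GenericFlags0 layout is not given here. */
#define DEMO_FLAG_DISPLAY_RAID10  (1u << 0)

/*
 * Pick the string logged for a RAID1E volume.
 * Assumption: "RAID10" is shown only when the OEM flag is set AND the
 * volume has an even number of drives; otherwise "RAID1E" is kept.
 */
static const char *demo_raid1e_label(uint32_t generic_flags0,
                                     unsigned int drive_count)
{
    assert(drive_count > 0);    /* a volume has at least one member drive */

    bool oem_wants_raid10 = (generic_flags0 & DEMO_FLAG_DISPLAY_RAID10) != 0;
    bool even_drives = (drive_count % 2) == 0;

    return (oem_wants_raid10 && even_drives) ? "RAID10" : "RAID1E";
}
```

      With the flag set, a four-drive volume would be reported as "RAID10" and
      a five-drive volume as "RAID1E"; with the flag clear, both stay "RAID1E".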
    • mpt2sas: setting SDEV into RUNNING state from Interrupt context · ab58d16b
      Kashyap, Desai authored
      commit 34a03bef upstream.
      
      Change the SDEV state to RUNNING from interrupt context. Previously this
      was handled in the work queue thread. With this change the driver no longer
      waits for the work queue thread to execute scsih_ublock_io_device to put
      the SDEV into the RUNNING state. This reduces the delay before a device
      becomes RUNNING.

      This patch was modified in response to James's comment: do not change the
      SDEV state using the scsi_device_set_state API; instead use the
      scsi_internal_device_unblock and scsi_internal_device_block APIs.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Reviewed-by: Eric Moore <Eric.moore@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • mpt2sas: Prevent sending command to FW while Host Reset · fa278da6
      Kashyap, Desai authored
      commit 155dd4c7 upstream.
      
      This patch renames the flag indicating host reset from
      ioc_reset_in_progress to shost_recovery. It also removes the spin locks
      surrounding the setting of this flag, which are unnecessary. Sanity checks on
      the shost_recovery flag were added throughout the code to prevent sending
      firmware commands during host reset. Also, the setting of the shost state to
      SHOST_RECOVERY was removed to prevent deadlocks; this is better handled by
      the shost_recovery flag.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Reviewed-by: Eric Moore <Eric.moore@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
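
      The kind of sanity check described above can be sketched as a guard at
      the top of a command-submission path. Only the flag name shost_recovery
      comes from the commit text. The struct, the demo_* functions, and the
      choice of -EAGAIN as the error code are assumptions made for
      illustration, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical adapter state; only the flag name comes from the commit text. */
struct demo_ioc {
    bool shost_recovery;      /* true while a host reset is in progress */
    unsigned int cmds_sent;   /* commands actually handed to the firmware */
};

static void demo_host_reset_begin(struct demo_ioc *ioc)
{
    ioc->shost_recovery = true;
}

static void demo_host_reset_end(struct demo_ioc *ioc)
{
    ioc->shost_recovery = false;
}

/* Refuse to send a firmware command while the host is being reset. */
static int demo_send_fw_cmd(struct demo_ioc *ioc)
{
    assert(ioc != NULL);
    if (ioc->shost_recovery)
        return -EAGAIN;       /* assumed error code: caller may retry later */
    ioc->cmds_sent++;
    return 0;
}
```

      A caller that gets the error back can retry once the reset has finished.
      The sketch is single-threaded and says nothing about the locking
      question the commit message discusses.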
    • mpt2sas : Rescan topology from Interrupt context instead of work thread · 38032be1
      Kashyap, Desai authored
      commit cd4e12e8 upstream.
      
      Following a host reset, it's possible that the controller firmware could
      assign new handles to devices, as well as add or delete devices. There is
      code in the driver that rescans the topology following host reset, updating
      device handles and removing devices that are no longer responding. This patch
      improves responsiveness by moving this rescanning from the delayed hotplug
      worker thread to immediately following the host reset.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Reviewed-by: Eric Moore <Eric.moore@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • sg: fix oops in the error path in sg_build_indirect() · 57f4fc5e
      Michal Schmidt authored
      commit e71044ee upstream.
      
      When the allocation fails in sg_build_indirect(), an oops happens in
      the error path. It's caused by an obvious typo.
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Reported-by: Bob Tracy <rct@gherkin.frus.com>
      Acked-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  2. 09 Sep, 2009 3 commits
    • Linux 2.6.31 · 74fca6a4
      Linus Torvalds authored
    • aoe: allocate unused request_queue for sysfs · 7135a71b
      Ed Cashin authored
      Andy Whitcroft reported an oops in aoe triggered by use of an
      incorrectly initialised request_queue object:
      
        [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
      		an uninitialized object, something is seriously wrong.
        [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
        [ 2645.959107] Call Trace:
        [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70
        [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0
        [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160
        [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe]
      
      The request queue of an aoe device is not used but can be allocated in
      code that does not sleep.
      
      Bruno bisected this regression down to
      
        cd43e26f
      
        block: Expose stacked device queues in sysfs
      
      "This seems to generate /sys/block/$device/queue and its contents for
       everyone who is using queues, not just for those queues that have a
       non-NULL queue->request_fn."
      
      Addresses http://bugs.launchpad.net/bugs/410198
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942
      
      Note that embedding a queue inside another object has always been
      an illegal construct, since the queues are reference counted and
      must persist until the last reference is dropped. So aoe was
      always buggy in this respect (Jens).
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Bruno Premont <bonbons@linux-vserver.org>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • i915: disable interrupts before tearing down GEM state · e6890f6f
      Linus Torvalds authored
      Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
      when switching from graphics mode to the text console, or when
      suspending (which does the same thing). With netconsole, the oops
      turned out to be
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
      	IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
      
      and it's due to the i915_gem.c code doing drm_irq_uninstall() after
      having done i915_gem_idle(). And the i915_gem_idle() path will do
      
        i915_gem_idle() ->
          i915_gem_cleanup_ringbuffer() ->
            i915_gem_cleanup_hws() ->
              dev_priv->hw_status_page = NULL;
      
      but if an i915 interrupt comes in after this stage, it may want to
      access that hw_status_page, and gets the above NULL pointer dereference.
      
      And since the NULL pointer dereference happens from within an interrupt,
      and with the screen still in graphics mode, the common end result is
      simply a silently hung machine.
      
      Fix it by simply uninstalling the irq handler before idling rather than
      after. Fixes
      
          http://bugzilla.kernel.org/show_bug.cgi?id=13819

      Reported-and-tested-by: Reinette Chatre <reinette.chatre@intel.com>
      Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
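
      The ordering problem described above can be modelled in a few lines of
      userspace C. This is only a simulation under stated assumptions: struct
      demo_gpu and every demo_* function are hypothetical, an "interrupt" is an
      ordinary function call, and the NULL dereference is reported as a return
      value instead of actually happening.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; not the i915 driver's data structures. */
struct demo_gpu {
    bool irq_installed;     /* is an interrupt handler registered? */
    int *hw_status_page;    /* state the handler reads; NULL after idling */
    int status_storage;     /* backing storage for the status page */
};

static void demo_gpu_init(struct demo_gpu *gpu)
{
    assert(gpu != NULL);
    gpu->status_storage = 0;
    gpu->hw_status_page = &gpu->status_storage;
    gpu->irq_installed = true;
}

/*
 * Simulated interrupt delivery. Returns 1 if the handler ran safely,
 * 0 if no handler is installed (interrupt ignored), and -1 if the
 * handler would have dereferenced a NULL status page.
 */
static int demo_irq_fire(const struct demo_gpu *gpu)
{
    if (!gpu->irq_installed)
        return 0;
    if (gpu->hw_status_page == NULL)
        return -1;
    return 1;
}

static void demo_irq_uninstall(struct demo_gpu *gpu) { gpu->irq_installed = false; }
static void demo_gem_idle(struct demo_gpu *gpu)      { gpu->hw_status_page = NULL; }

/* Old order: idle first, so an interrupt in the window sees a NULL page. */
static int demo_teardown_old_order(struct demo_gpu *gpu)
{
    demo_gem_idle(gpu);
    int irq_result = demo_irq_fire(gpu);   /* interrupt arrives in the window */
    demo_irq_uninstall(gpu);
    return irq_result;
}

/* Fixed order: uninstall the handler first, then idle. */
static int demo_teardown_fixed_order(struct demo_gpu *gpu)
{
    demo_irq_uninstall(gpu);
    demo_gem_idle(gpu);
    return demo_irq_fire(gpu);             /* a late interrupt is now ignored */
}
```

      In this model the old order lets a late interrupt reach a handler whose
      status page is already gone, while the fixed order leaves no handler
      installed by the time the state is torn down.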
  3. 08 Sep, 2009 1 commit
  4. 07 Sep, 2009 7 commits
  5. 06 Sep, 2009 1 commit
    • gianfar: Fix build. · d9d8e041
      David S. Miller authored
      Reported by Michael Guntsche <mike@it-loops.com>
      
      --------------------
      Commit
      38bddf04 gianfar: gfar_remove needs to call unregister_netdev()
      
      breaks the build of the gianfar driver because "dev" is undefined in
      this function. To quickly test rc9 I changed this to priv->ndev but I do
      not know if this is the correct one.
      --------------------
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 05 Sep, 2009 22 commits