1. 09 Oct, 2008 40 commits
    • Greg Kroah-Hartman's avatar
      Linux 2.6.26.6 · afc84dac
      Greg Kroah-Hartman authored
      afc84dac
    • Jarod Wilson's avatar
      S390: CVE-2008-1514: prevent ptrace padding area read/write in 31-bit mode · 34f3c11b
      Jarod Wilson authored
      commit 3d6e48f4 upstream
      
      When running a 31-bit ptrace, on either an s390 or s390x kernel,
      reads and writes into a padding area in struct user_regs_struct32
      will result in a kernel panic.
      
      This is also known as CVE-2008-1514.
      
      Test case available here:
      http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/user-area-padding.c?cvsroot=systemtap
      
      Steps to reproduce:
      1) wget the above
      2) gcc -o user-area-padding-31bit user-area-padding.c -Wall -ggdb2 -D_GNU_SOURCE -m31
      3) ./user-area-padding-31bit
      <panic>
      
      Test status
      -----------
      Without patch, both s390 and s390x kernels panic. With patch, the test case,
      as well as the gdb testsuite, pass without incident, padding area reads
      returning zero, writes ignored.
      
      Nb: original version returned -EINVAL on write attempts, which broke the
      gdb test and made the test case slightly unhappy, Jan Kratochvil suggested
      the change to return 0 on write attempts.
      Signed-off-by: default avatarJarod Wilson <jarod@redhat.com>
      Tested-by: default avatarJan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Moritz Muehlenhoff <jmm@debian.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      34f3c11b
    • Balbir Singh's avatar
      mm owner: fix race between swapoff and exit · 553d7dd7
      Balbir Singh authored
      [Here's a backport of 2.6.27-rc8's 31a78f23
       to 2.6.26 or 2.6.26.5: I wouldn't trouble -stable for the (root only)
       swapoff case which uncovered the bug, but the /proc/<pid>/<mmstats> case
       is open to all, so I think worth plugging in the next 2.6.26-stable.
       - Hugh]
      
      
      There's a race between mm->owner assignment and swapoff, more easily
      seen when task slab poisoning is turned on.  The condition occurs when
      try_to_unuse() runs in parallel with an exiting task.  A similar race
      can occur with callers of get_task_mm(), such as /proc/<pid>/<mmstats>
      or ptrace or page migration.
      
      CPU0                                    CPU1
                                              try_to_unuse
                                              looks at mm = task0->mm
                                              increments mm->mm_users
      task 0 exits
      mm->owner needs to be updated, but no
      new owner is found (mm_users > 1, but
      no other task has task->mm = task0->mm)
      mm_update_next_owner() leaves
                                              mmput(mm) decrements mm->mm_users
      task0 freed
                                              dereferencing mm->owner fails
      
      The fix is to notify the subsystem via mm_owner_changed callback(),
      if no new owner is found, by specifying the new task as NULL.
      
      Jiri Slaby:
      mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
      must be set after that, so as not to pass NULL as old owner causing oops.
      
      Daisuke Nishimura:
      mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
      and its callers need to take account of this situation to avoid oops.
      
      Hugh Dickins:
      Lockdep warning and hang below exec_mmap() when testing these patches.
      exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
      so exec_mmap() now needs to do the same.  And with that repositioning,
      there's now no point in mm_need_new_owner() allowing for NULL mm.
      Reported-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: default avatarJiri Slaby <jirislaby@gmail.com>
      Signed-off-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      553d7dd7
    • Marcin Slusarz's avatar
      rtc: fix kernel panic on second use of SIGIO nofitication · eb07718d
      Marcin Slusarz authored
      commit 2e4a75cd upstream
      
      When userspace uses SIGIO notification and forgets to disable it before
      closing file descriptor, rtc->async_queue contains stale pointer to struct
      file.  When user space enables again SIGIO notification in different
      process, kernel dereferences this (poisoned) pointer and crashes.
      
      So disable SIGIO notification on close.
      
      Kernel panic:
      (second run of qemu (requires echo 1024 > /sys/class/rtc/rtc0/max_user_freq))
      
      general protection fault: 0000 [1] PREEMPT
      CPU 0
      Modules linked in: af_packet snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq usbhid tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tda9875 uhci_hcd ehci_hcd usbcore bttv snd_via82xx snd_ac97_codec ac97_bus snd_pcm snd_timer ir_common compat_ioctl32 snd_page_alloc videodev v4l1_compat snd_mpu401_uart snd_rawmidi v4l2_common videobuf_dma_sg videobuf_core snd_seq_device snd btcx_risc soundcore tveeprom i2c_viapro
      Pid: 5781, comm: qemu-system-x86 Not tainted 2.6.27-rc6 #363
      RIP: 0010:[<ffffffff8024f891>]  [<ffffffff8024f891>] __lock_acquire+0x3db/0x73f
      RSP: 0000:ffffffff80674cb8  EFLAGS: 00010002
      RAX: ffff8800224c62f0 RBX: 0000000000000046 RCX: 0000000000000002
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800224c62f0
      RBP: ffffffff80674d08 R08: 0000000000000002 R09: 0000000000000001
      R10: ffffffff80238941 R11: 0000000000000001 R12: 0000000000000000
      R13: 6b6b6b6b6b6b6b6b R14: ffff88003a450080 R15: 0000000000000000
      FS:  00007f98b69516f0(0000) GS:ffffffff80623200(0000) knlGS:00000000f7cc86d0
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000a87000 CR3: 0000000022598000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process qemu-system-x86 (pid: 5781, threadinfo ffff880028812000, task ffff88003a450080)
      Stack:  ffffffff80674cf8 0000000180238440 0000000200000002 0000000000000000
       ffff8800224c62f0 0000000000000046 0000000000000000 0000000000000002
       0000000000000002 0000000000000000 ffffffff80674d68 ffffffff8024fc7a
      Call Trace:
       <IRQ>  [<ffffffff8024fc7a>] lock_acquire+0x85/0xa9
       [<ffffffff8029cb62>] ? send_sigio+0x2a/0x184
       [<ffffffff80491d1f>] _read_lock+0x3e/0x4a
       [<ffffffff8029cb62>] ? send_sigio+0x2a/0x184
       [<ffffffff8029cb62>] send_sigio+0x2a/0x184
       [<ffffffff8024fb97>] ? __lock_acquire+0x6e1/0x73f
       [<ffffffff8029cd4d>] ? kill_fasync+0x2c/0x4e
       [<ffffffff8029cd10>] __kill_fasync+0x54/0x65
       [<ffffffff8029cd5b>] kill_fasync+0x3a/0x4e
       [<ffffffff80402896>] rtc_update_irq+0x9c/0xa5
       [<ffffffff80404640>] cmos_interrupt+0xae/0xc0
       [<ffffffff8025d1c1>] handle_IRQ_event+0x25/0x5a
       [<ffffffff8025e5e4>] handle_edge_irq+0xdd/0x123
       [<ffffffff8020da34>] do_IRQ+0xe4/0x144
       [<ffffffff8020bad6>] ret_from_intr+0x0/0xf
       <EOI>  [<ffffffff8026fdc2>] ? __alloc_pages_internal+0xe7/0x3ad
       [<ffffffff8033fe67>] ? clear_page_c+0x7/0x10
       [<ffffffff8026fc10>] ? get_page_from_freelist+0x385/0x450
       [<ffffffff8026fdc2>] ? __alloc_pages_internal+0xe7/0x3ad
       [<ffffffff80280aac>] ? anon_vma_prepare+0x2e/0xf6
       [<ffffffff80279400>] ? handle_mm_fault+0x227/0x6a5
       [<ffffffff80494716>] ? do_page_fault+0x494/0x83f
       [<ffffffff8049251d>] ? error_exit+0x0/0xa9
      
      Code: cc 41 39 45 28 74 24 e8 5e 1d 0f 00 85 c0 0f 84 6a 03 00 00 83 3d 8f a9 aa 00 00 be 47 03 00 00 0f 84 6a 02 00 00 e9 53 03 00 00 <41> ff 85 38 01 00 00 45 8b be 90 06 00 00 41 83 ff 2f 76 24 e8
      RIP  [<ffffffff8024f891>] __lock_acquire+0x3db/0x73f
       RSP <ffffffff80674cb8>
      ---[ end trace 431877d860448760 ]---
      Kernel panic - not syncing: Aiee, killing interrupt handler!
      Signed-off-by: default avatarMarcin Slusarz <marcin.slusarz@gmail.com>
      Acked-by: default avatarAlessandro Zummo <alessandro.zummo@towertech.it>
      Acked-by: default avatarDavid Brownell <dbrownell@users.sourceforge.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      eb07718d
    • David Winn's avatar
      fbcon: fix monochrome color value calculation · be38e82a
      David Winn authored
      commit 08650869 upstream
      
      Commit 22af89aa ("fbcon: replace mono_col
      macro with static inline") changed the order of operations for computing
      monochrome color values.  This generates 0xffff000f instead of 0x0000000f
      for a 4 bit monochrome color, leading to image corruption if it is passed
      to cfb_imageblit or other similar functions.  Fix it up.
      
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Cc: "Antonino A. Daplas" <adaplas@pol.net>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      be38e82a
    • Risto Suominen's avatar
      ALSA: snd-powermac: HP detection for 1st iMac G3 SL · ff37b8e1
      Risto Suominen authored
      commit 030b655b upstream
      
      Correct headphone detection for 1st generation iMac G3 Slot-loading (Screamer).
      
      This patch fixes the regression in the recent snd-powermac which
      doesn't support some G3/G4 PowerMacs:
          http://lkml.org/lkml/2008/10/1/220Signed-off-by: default avatarRisto Suominen <Risto.Suominen@gmail.com>
      Tested-by: default avatarMariusz Kozlowski <m.kozlowski@tuxland.pl>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      ff37b8e1
    • Risto Suominen's avatar
      ALSA: snd-powermac: mixers for PowerMac G4 AGP · 0433c92c
      Risto Suominen authored
      commit 4dbf95ba upstream
      
      Add mixer controls for PowerMac G4 AGP (Screamer).
      
      This patch fixes the regression in the recent snd-powermac which
      doesn't support some G3/G4 PowerMacs:
          http://lkml.org/lkml/2008/10/1/220Signed-off-by: default avatarRisto Suominen <Risto.Suominen@gmail.com>
      Tested-by: default avatarMariusz Kozlowski <m.kozlowski@tuxland.pl>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      0433c92c
    • Pascal Terjan's avatar
      braille_console: only register notifiers when the braille console is used · c6b06fdb
      Pascal Terjan authored
      commit c0c9209d upstream
      
      Only register the braille driver VT and keyboard notifiers when the
      braille console is used.  Avoids eating insert or backspace keys.
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11242Signed-off-by: default avatarPascal Terjan <pterjan@mandriva.com>
      Signed-off-by: default avatarSamuel Thibault <samuel.thibault@ens-lyon.org>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Moritz Muehlenhoff <jmm@inutil.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      c6b06fdb
    • David S. Miller's avatar
      sparc64: Fix missing devices due to PCI bridge test in of_create_pci_dev(). · 88e399f0
      David S. Miller authored
      [ Upstream commit 44b50e5a ]
      
      Just like in the arch/sparc64/kernel/of_device.c code fix commit
      071d7f4c3b411beae08d27656e958070c43b78b4 ("sparc64: Fix disappearing
      PCI devices on e3500.") we have to check the OF device node name for
      "pci" instead of relying upon the 'device_type' property being there
      on all PCI bridges.
      
      Tested by Meelis Roos, and confirmed to make the PCI QFE devices
      reappear on the E3500 system.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      88e399f0
    • David S. Miller's avatar
      sparc64: Fix disappearing PCI devices on e3500. · d78fdd8a
      David S. Miller authored
      [ Upstream commit 7ee766d8 ]
      
      Based upon a bug report by Meelis Roos.
      
      The OF device layer builds properties by matching bus types and
      applying 'range' properties as appropriate, up to the root.
      
      The match for "PCI" busses is looking at the 'device_type' property,
      and this does work %99 of the time.
      
      But on an E3500 system with a PCI QFE card, the DEC 21153 bridge
      sitting above the QFE network interface devices has a 'name' of "pci",
      but it completely lacks a 'device_type' property.  So we don't match
      it as a PCI bus, and subsequently we end up with no resource values at
      all for the devices sitting under that DEC bridge.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      d78fdd8a
    • David S. Miller's avatar
      sparc64: Fix OOPS in psycho_pcierr_intr_other(). · 28a65ba6
      David S. Miller authored
      [ Upstream commit f948cc6a ]
      
      We no longer put the top-level PCI controller device into the
      PCI layer device list.  So pbm->pci_bus->self is always NULL.
      
      Therefore, use direct PCI config space accesses to get at
      the PCI controller's PCI_STATUS register.
      
      Tested by Meelis Roos.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      28a65ba6
    • David S. Miller's avatar
      sparc64: Fix interrupt register calculations on Psycho and Sabre. · 284be31e
      David S. Miller authored
      [ Upstream commit ebfb2c63 ]
      
      Use the IMAP offset calculation for OBIO devices as documented in the
      programmer's manual.  Which is "0x10000 + ((ino & 0x1f) << 3)"
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      284be31e
    • David S. Miller's avatar
      sparc64: Fix PCI error interrupt registry on PSYCHO. · 24c5886b
      David S. Miller authored
      [ Upstream commit 80a56ab6 ]
      
      We need to pass IRQF_SHARED, otherwise we get things like:
      
      IRQ handler type mismatch for IRQ 33
      current handler: PSYCHO_UE
      Call Trace:
       [000000000048394c] request_irq+0xac/0x120
       [00000000007c5f6c] psycho_scan_bus+0x98/0x158
       [00000000007c2bc0] pcibios_init+0xdc/0x12c
       [0000000000426a5c] do_one_initcall+0x1c/0x160
       [00000000007c0180] kernel_init+0x9c/0xfc
       [0000000000427050] kernel_thread+0x30/0x60
       [00000000006ae1d0] rest_init+0x10/0x60
      
      on e3500 and similar systems.
      
      On a single board, the UE interrupts of two Psycho nodes
      are funneled through the same interrupt, from of_debug=3
      dump:
      
      /pci@b,4000: direct translate 2ee --> 21
       ...
      /pci@b,2000: direct translate 2ee --> 21
      
      Decimal "33" mentioned above is the hex "21" mentioned here.
      
      Thanks to Meelis Roos for dumps and testing.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      24c5886b
    • Herbert Xu's avatar
      udp: Fix rcv socket locking · fc69b36c
      Herbert Xu authored
      [ Upstream commit 93821778 ]
      
      The previous patch in response to the recursive locking on IPsec
      reception is broken as it tries to drop the BH socket lock while in
      user context.
      
      This patch fixes it by shrinking the section protected by the
      socket lock to sock_queue_rcv_skb only.  The only reason we added
      the lock is for the accounting which happens in that function.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      fc69b36c
    • Vlad Yasevich's avatar
      sctp: Fix oops when INIT-ACK indicates that peer doesn't support AUTH · ce8fd8b9
      Vlad Yasevich authored
      [ Upstream commit add52379 ]
      
      If INIT-ACK is received with SupportedExtensions parameter which
      indicates that the peer does not support AUTH, the packet will be
      silently ignore, and sctp_process_init() do cleanup all of the
      transports in the association.
      When T1-Init timer is expires, OOPS happen while we try to choose
      a different init transport.
      
      The solution is to only clean up the non-active transports, i.e
      the ones that the peer added.  However, that introduces a problem
      with sctp_connectx(), because we don't mark the proper state for
      the transports provided by the user.  So, we'll simply mark
      user-provided transports as ACTIVE.  That will allow INIT
      retransmissions to work properly in the sctp_connectx() context
      and prevent the crash.
      Signed-off-by: default avatarVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      ce8fd8b9
    • Vlad Yasevich's avatar
      sctp: do not enable peer features if we can't do them. · 43562861
      Vlad Yasevich authored
      [ Upstream commit 0ef46e28 ]
      
      Do not enable peer features like addip and auth, if they
      are administratively disabled localy.  If the peer resports
      that he supports something that we don't, neither end can
      use it so enabling it is pointless.  This solves a problem
      when talking to a peer that has auth and addip enabled while
      we do not.  Found by Andrei Pelinescu-Onciul <andrei@iptel.org>.
      Signed-off-by: default avatarVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      43562861
    • Herbert Xu's avatar
      ipsec: Fix pskb_expand_head corruption in xfrm_state_check_space · b047cf6d
      Herbert Xu authored
      [ Upstream commit d01dbeb6 ]
      
      We're never supposed to shrink the headroom or tailroom.  In fact,
      shrinking the headroom is a fatal action.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      b047cf6d
    • Vegard Nossum's avatar
      netlink: fix overrun in attribute iteration · 877755eb
      Vegard Nossum authored
      [ Upstream commit 1045b03e ]
      
      kmemcheck reported this:
      
        kmemcheck: Caught 16-bit read from uninitialized memory (f6c1ba30)
        0500110001508abf050010000500000002017300140000006f72672e66726565
         i i i i i i i i i i i i i u u u u u u u u u u u u u u u u u u u
                                         ^
      
        Pid: 3462, comm: wpa_supplicant Not tainted (2.6.27-rc3-00054-g6397ab9-dirty #13)
        EIP: 0060:[<c05de64a>] EFLAGS: 00010296 CPU: 0
        EIP is at nla_parse+0x5a/0xf0
        EAX: 00000008 EBX: fffffffd ECX: c06f16c0 EDX: 00000005
        ESI: 00000010 EDI: f6c1ba30 EBP: f6367c6c ESP: c0a11e88
         DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
        CR0: 8005003b CR2: f781cc84 CR3: 3632f000 CR4: 000006d0
        DR0: c0ead9bc DR1: 00000000 DR2: 00000000 DR3: 00000000
        DR6: ffff4ff0 DR7: 00000400
         [<c05d4b23>] rtnl_setlink+0x63/0x130
         [<c05d5f75>] rtnetlink_rcv_msg+0x165/0x200
         [<c05ddf66>] netlink_rcv_skb+0x76/0xa0
         [<c05d5dfe>] rtnetlink_rcv+0x1e/0x30
         [<c05dda21>] netlink_unicast+0x281/0x290
         [<c05ddbe9>] netlink_sendmsg+0x1b9/0x2b0
         [<c05beef2>] sock_sendmsg+0xd2/0x100
         [<c05bf945>] sys_sendto+0xa5/0xd0
         [<c05bf9a6>] sys_send+0x36/0x40
         [<c05c03d6>] sys_socketcall+0x1e6/0x2c0
         [<c020353b>] sysenter_do_call+0x12/0x3f
         [<ffffffff>] 0xffffffff
      
      This is the line in nla_ok():
      
        /**
         * nla_ok - check if the netlink attribute fits into the remaining bytes
         * @nla: netlink attribute
         * @remaining: number of bytes remaining in attribute stream
         */
        static inline int nla_ok(const struct nlattr *nla, int remaining)
        {
                return remaining >= sizeof(*nla) &&
                       nla->nla_len >= sizeof(*nla) &&
                       nla->nla_len <= remaining;
        }
      
      It turns out that remaining can become negative due to alignment in
      nla_next(). But GCC promotes "remaining" to unsigned in the test
      against sizeof(*nla) above. Therefore the test succeeds, and the
      nla_for_each_attr() may access memory outside the received buffer.
      
      A short example illustrating this point is here:
      
        #include <stdio.h>
      
        main(void)
        {
                printf("%d\n", -1 >= sizeof(int));
        }
      
      ...which prints "1".
      
      This patch adds a cast in front of the sizeof so that GCC will make
      a signed comparison and fix the illegal memory dereference. With the
      patch applied, there is no kmemcheck report.
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      877755eb
    • Santwona Behera's avatar
      niu: panic on reset · 99479c65
      Santwona Behera authored
      [ Upstream commit cff502a3 ]
      
      The reset_task function in the niu driver does not reset the tx and rx
      buffers properly. This leads to panic on reset. This patch is a
      modified implementation of the previously posted fix.
      Signed-off-by: default avatarSantwona Behera <santwona.behera@sun.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      99479c65
    • Neil Horman's avatar
      ipv6: Fix OOPS in ip6_dst_lookup_tail(). · 1e4c1698
      Neil Horman authored
      [ Upstream commit e550dfb0 ]
      
      This fixes kernel bugzilla 11469: "TUN with 1024 neighbours:
      ip6_dst_lookup_tail NULL crash"
      
      dst->neighbour is not necessarily hooked up at this point
      in the processing path, so blindly dereferencing it is
      the wrong thing to do.  This NULL check exists in other
      similar paths and this case was just an oversight.
      
      Also fix the completely wrong and confusing indentation
      here while we're at it.
      
      Based upon a patch by Evgeniy Polyakov.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      1e4c1698
    • Arnaud Ebalard's avatar
      XFRM,IPv6: initialize ip6_dst_blackhole_ops.kmem_cachep · 9c44da04
      Arnaud Ebalard authored
      [ Upstream commit 5dc121e9 ]
      
      ip6_dst_blackhole_ops.kmem_cachep is not expected to be NULL (i.e. to
      be initialized) when dst_alloc() is called from ip6_dst_blackhole().
      Otherwise, it results in the following (xfrm_larval_drop is now set to
      1 by default):
      
      [   78.697642] Unable to handle kernel paging request for data at address 0x0000004c
      [   78.703449] Faulting instruction address: 0xc0097f54
      [   78.786896] Oops: Kernel access of bad area, sig: 11 [#1]
      [   78.792791] PowerMac
      [   78.798383] Modules linked in: btusb usbhid bluetooth b43 mac80211 cfg80211 ehci_hcd ohci_hcd sungem sungem_phy usbcore ssb
      [   78.804263] NIP: c0097f54 LR: c0334a28 CTR: c002d430
      [   78.809997] REGS: eef19ad0 TRAP: 0300   Not tainted  (2.6.27-rc5)
      [   78.815743] MSR: 00001032 <ME,IR,DR>  CR: 22242482  XER: 20000000
      [   78.821550] DAR: 0000004c, DSISR: 40000000
      [   78.827278] TASK = eef0df40[3035] 'mip6d' THREAD: eef18000
      [   78.827408] GPR00: 00001032 eef19b80 eef0df40 00000000 00008020 eef19c30 00000001 00000000
      [   78.833249] GPR08: eee5101c c05a5c10 ef9ad500 00000000 24242422 1005787c 00000000 1004f960
      [   78.839151] GPR16: 00000000 10024e90 10050040 48030018 0fe44150 00000000 00000000 eef19c30
      [   78.845046] GPR24: eef19e44 00000000 eef19bf8 efb37c14 eef19bf8 00008020 00009032 c0596064
      [   78.856671] NIP [c0097f54] kmem_cache_alloc+0x20/0x94
      [   78.862581] LR [c0334a28] dst_alloc+0x40/0xc4
      [   78.868451] Call Trace:
      [   78.874252] [eef19b80] [c03c1810] ip6_dst_lookup_tail+0x1c8/0x1dc (unreliable)
      [   78.880222] [eef19ba0] [c0334a28] dst_alloc+0x40/0xc4
      [   78.886164] [eef19bb0] [c03cd698] ip6_dst_blackhole+0x28/0x1cc
      [   78.892090] [eef19be0] [c03d9be8] rawv6_sendmsg+0x75c/0xc88
      [   78.897999] [eef19cb0] [c038bca4] inet_sendmsg+0x4c/0x78
      [   78.903907] [eef19cd0] [c03207c8] sock_sendmsg+0xac/0xe4
      [   78.909734] [eef19db0] [c03209e4] sys_sendmsg+0x1e4/0x2a0
      [   78.915540] [eef19f00] [c03220a8] sys_socketcall+0xfc/0x210
      [   78.921406] [eef19f40] [c0014b3c] ret_from_syscall+0x0/0x38
      [   78.927295] --- Exception: c01 at 0xfe2d730
      [   78.927297]     LR = 0xfe2d71c
      [   78.939019] Instruction dump:
      [   78.944835] 91640018 9144001c 900a0000 4bffff44 9421ffe0 7c0802a6 bf810010 7c9d2378
      [   78.950694] 90010024 7fc000a6 57c0045e 7c000124 <83e3004c> 8383005c 2f9f0000 419e0050
      [   78.956464] ---[ end trace 05fa1ed7972487a1 ]---
      
      As commented by Benjamin Thery, the bug was introduced by
      f2fc6a54, while adding network
      namespaces support to ipv6 routes.
      Signed-off-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Acked-by: default avatarBenjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      9c44da04
    • Timo Teras's avatar
      af_key: Free dumping state on socket close · 1ead836b
      Timo Teras authored
      [ Upstream commit 05238204 ]
      
      Fix a xfrm_{state,policy}_walk leak if pfkey socket is closed while
      dumping is on-going.
      Signed-off-by: default avatarTimo Teras <timo.teras@iki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ead836b
    • Alan Cox's avatar
      pcmcia: Fix broken abuse of dev->driver_data · 400f9f32
      Alan Cox authored
      [ Upstream commit: cec5eb7b ]
      
      PCMCIA abuses dev->private_data in the probe methods. Unfortunately it
      continues to abuse it after calling drv->probe() which leads to crashes and
      other nasties (such as bogus probes of multifunction devices) giving errors like
      
      pcmcia: registering new device pcmcia0.1
      kernel: 0.1: GetNextTuple: No more items
      
      Extract the passed data before calling the driver probe function that way
      we don't blow up when the driver reuses dev->private_data as its right.
      Signed-off-by: default avatarAlan Cox <alan@redhat.com>
      Signed-off-by: default avatarDominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      400f9f32
    • Thomas Gleixner's avatar
      clockevents: remove WARN_ON which was used to gather information · bc3ac469
      Thomas Gleixner authored
      commit 61c22c34 upstream
      
      The issue of the endless reprogramming loop due to a too small
      min_delta_ns was fixed with the previous updates of the clock events
      code, but we had no information about the spread of this problem. I
      added a WARN_ON to get automated information via kerneloops.org and to
      get some direct reports, which allowed me to analyse the affected
      machines.
      
      The WARN_ON has served its purpose and would be annoying for a release
      kernel. Remove it and just keep the information about the increase of
      the min_delta_ns value.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      bc3ac469
    • Maciej W. Rozycki's avatar
      ntp: fix calculation of the next jiffie to trigger RTC sync · e0d725a2
      Maciej W. Rozycki authored
      commit 4ff4b9e1 upstream
      
      We have a bug in the calculation of the next jiffie to trigger the RTC
      synchronisation.  The aim here is to run sync_cmos_clock() as close as
      possible to the middle of a second.  Which means we want this function to
      be called less than or equal to half a jiffie away from when now.tv_nsec
      equals 5e8 (500000000).
      
      If this is not the case for a given call to the function, for this purpose
      instead of updating the RTC we calculate the offset in nanoseconds to the
      next point in time where now.tv_nsec will be equal 5e8.  The calculated
      offset is then converted to jiffies as these are the unit used by the
      timer.
      
      Hovewer timespec_to_jiffies() used here uses a ceil()-type rounding mode,
      where the resulting value is rounded up.  As a result the range of
      now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC
      rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2.
      
      As a result if for example sync_cmos_clock() happens to be called at the
      time when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 to 5e8 +
      TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the
      same range of now.tv_nsec again.  Similarly for cases offsetted by an
      integer multiple of TICK_NSEC.
      
      This change addresses the problem by subtracting TICK_NSEC / 2 from the
      nanosecond offset to the next point in time where now.tv_nsec will be
      equal 5e8, effectively shifting the following rounding in
      timespec_to_jiffies() so that it produces a rounded-to-nearest result.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      e0d725a2
    • Thomas Gleixner's avatar
      x86: HPET: read back compare register before reading counter · 9c57bca1
      Thomas Gleixner authored
      commit 72d43d9b upstream
      
      After fixing the u32 thinko I sill had occasional hickups on ATI chipsets
      with small deltas. There seems to be a delay between writing the compare
      register and the transffer to the internal register which triggers the
      interrupt. Reading back the value makes sure, that it hit the internal
      match register befor we compare against the counter value.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      9c57bca1
    • Thomas Gleixner's avatar
      x86: HPET fix moronic 32/64bit thinko · 45f9d522
      Thomas Gleixner authored
      commit f7676254 upstream
      
      We use the HPET only in 32bit mode because:
      1) some HPETs are 32bit only
      2) on i386 there is no way to read/write the HPET atomic 64bit wide
      
      The HPET code unification done by the "moron of the year" did
      not take into account that unsigned long is different on 32 and
      64 bit.
      
      This thinko results in a possible endless loop in the clockevents
      code, when the return comparison fails due to the 64bit/332bit
      unawareness.
      
      unsigned long cnt = (u32) hpet_read() + delta can wrap over 32bit.
      but the final compare will fail and return -ETIME causing endless
      loops.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      45f9d522
    • Thomas Gleixner's avatar
      clockevents: broadcast fixup possible waiters · 92741d2d
      Thomas Gleixner authored
      commit 7300711e upstream
      
      Until the C1E patches arrived there where no users of periodic broadcast
      before switching to oneshot mode. Now we need to trigger a possible
      waiter for a periodic broadcast when switching to oneshot mode.
      Otherwise we can starve them for ever.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      92741d2d
    • Thomas Gleixner's avatar
      HPET: make minimum reprogramming delta useful · f8a5d65f
      Thomas Gleixner authored
      commit 7cfb0435 upstream
      
      The minimum reprogramming delta was hardcoded in HPET ticks,
      which is stupid as it does not work with faster running HPETs.
      The C1E idle patches made this prominent on AMD/RS690 chipsets,
      where the HPET runs with 25MHz. Set it to 5us which seems to be
      a reasonable value and fixes the problems on the bug reporters
      machines. We have a further sanity check now in the clock events,
      which increases the delta when it is not sufficient.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarLuiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
      Tested-by: default avatarDmitry Nezhevenko <dion@inhex.net>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      f8a5d65f
    • Thomas Gleixner's avatar
      clockevents: prevent endless loop lockup · 9b498932
      Thomas Gleixner authored
      commit 1fb9b7d2 upstream
      
      The C1E/HPET bug reports on AMDX2/RS690 systems where tracked down to a
      too small value of the HPET minumum delta for programming an event.
      
      The clockevents code needs to enforce an interrupt event on the clock event
      device in some cases. The enforcement code was stupid and naive, as it just
      added the minimum delta to the current time and tried to reprogram the device.
      When the minimum delta is too small, then this loops forever.
      
      Add a sanity check. Allow reprogramming to fail 3 times, then print a warning
      and double the minimum delta value to make sure, that this does not happen again.
      Use the same function for both tick-oneshot and tick-broadcast code.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      9b498932
    • Thomas Gleixner's avatar
      clockevents: prevent multiple init/shutdown · 7f0a673a
      Thomas Gleixner authored
      commit 9c17bcda upstream
      
      While chasing the C1E/HPET bugreports I went through the clock events
      code inch by inch and found that the broadcast device can be initialized
      and shutdown multiple times. Multiple shutdowns are not critical, but
      useless waste of time. Multiple initializations are simply broken. Another
      CPU might have the device in use already after the first initialization and
      the second init could just render it unusable again.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      7f0a673a
    • Thomas Gleixner's avatar
      clockevents: enforce reprogram in oneshot setup · ea16e1b4
      Thomas Gleixner authored
      commit 7205656a upstream
      
      In tick_oneshot_setup we program the device to the given next_event,
      but we do not check the return value. We need to make sure that the
      device is programmed enforced so the interrupt handler engine starts
      working. Split out the reprogramming function from tick_program_event()
      and call it with the device, which was handed in to tick_setup_oneshot().
      Set the force argument, so the devices is firing an interrupt.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      ea16e1b4
    • Thomas Gleixner's avatar
      clockevents: prevent endless loop in periodic broadcast handler · cf25095c
      Thomas Gleixner authored
      commit d4496b39 upstream
      
      The reprogramming of the periodic broadcast handler was broken,
      when the first programming returned -ETIME. The clockevents code
      stores the new expiry value in the clock events device next_event field
      only when the programming time has not been elapsed yet. The loop in
      question calculates the new expiry value from the next_event value
      and therefor never increases.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      cf25095c
    • Venkatesh Pallipadi's avatar
      clockevents: prevent clockevent event_handler ending up handler_noop · 2a2bac60
      Venkatesh Pallipadi authored
      commit 7c1e7689 upstream
      
      There is a ordering related problem with clockevents code, due to which
      clockevents_register_device() called after tickless/highres switch
      will not work. The new clockevent ends up with clockevents_handle_noop as
      event handler, resulting in no timer activity.
      
      The problematic path seems to be
      
      * old device already has hrtimer_interrupt as the event_handler
      * new clockevent device registers with a higher rating
      * tick_check_new_device() is called
        * clockevents_exchange_device() gets called
          * old->event_handler is set to clockevents_handle_noop
        * tick_setup_device() is called for the new device
          * which sets new->event_handler using the old->event_handler which is noop.
      
      Change the ordering so that new device inherits the proper handler.
      
      This does not have any issue in normal case as most likely all the clockevent
      devices are setup before the highres switch. But, can potentially be affecting
      some corner case where HPET force detect happens after the highres switch.
      This was a problem with HPET in MSI mode code that we have been experimenting
      with.
      Signed-off-by: default avatarVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: default avatarShaohua Li <shaohua.li@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      2a2bac60
    • Prarit Bhargava's avatar
      x86: fix memmap=exactmap boot argument · 579b4b38
      Prarit Bhargava authored
      Backport of d6be118a by Chuck Ebbert
      
      When using kdump modifying the e820 map is yielding strange results.
      
      For example starting with
      
       BIOS-provided physical RAM map:
       BIOS-e820: 0000000000000100 - 0000000000093400 (usable)
       BIOS-e820: 0000000000093400 - 00000000000a0000 (reserved)
       BIOS-e820: 0000000000100000 - 000000003fee0000 (usable)
       BIOS-e820: 000000003fee0000 - 000000003fef3000 (ACPI data)
       BIOS-e820: 000000003fef3000 - 000000003ff80000 (ACPI NVS)
       BIOS-e820: 000000003ff80000 - 0000000040000000 (reserved)
       BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
       BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
       BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
       BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
      
      and booting with args
      
      memmap=exactmap memmap=640K@0K memmap=5228K@16384K memmap=125188K@22252K memmap=76K#1047424K memmap=564K#1047500K
      
      resulted in:
      
       user-defined physical RAM map:
       user: 0000000000000000 - 0000000000093400 (usable)
       user: 0000000000093400 - 00000000000a0000 (reserved)
       user: 0000000000100000 - 000000003fee0000 (usable)
       user: 000000003fee0000 - 000000003fef3000 (ACPI data)
       user: 000000003fef3000 - 000000003ff80000 (ACPI NVS)
       user: 000000003ff80000 - 0000000040000000 (reserved)
       user: 00000000e0000000 - 00000000f0000000 (reserved)
       user: 00000000fec00000 - 00000000fec10000 (reserved)
       user: 00000000fee00000 - 00000000fee01000 (reserved)
       user: 00000000ff000000 - 0000000100000000 (reserved)
      
      But should have resulted in:
      
       user-defined physical RAM map:
       user: 0000000000000000 - 00000000000a0000 (usable)
       user: 0000000001000000 - 000000000151b000 (usable)
       user: 00000000015bb000 - 0000000008ffc000 (usable)
       user: 000000003fee0000 - 000000003ff80000 (ACPI data)
      
      This is happening because of an improper usage of strcmp() in the
      e820 parsing code.  The strcmp() always returns !0 and never resets the
      value for e820.nr_map and returns an incorrect user-defined map.
      
      This patch fixes the problem.
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Chuck Ebbert <cebbert@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      579b4b38
    • Chuck Ebbert's avatar
      x86: add io delay quirk for Presario F700 · 130bdec9
      Chuck Ebbert authored
      commit e6a5652f upstream
      
      Manually adding "io_delay=0xed" fixes system lockups in ioapic
      mode on this machine.
      
      System Information
      	Manufacturer: Hewlett-Packard
      	Product Name: Presario F700 (KA695EA#ABF)
      
      Base Board Information
      	Manufacturer: Quanta
      	Product Name: 30D3
      
      Reference:
      https://bugzilla.redhat.com/show_bug.cgi?id=459546Signed-off-by: default avatarChuck Ebbert <cebbert@redhat.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      130bdec9
    • Zhao Yakui's avatar
      ACPI: Avoid bogus EC timeout when EC is in Polling mode · e6908f26
      Zhao Yakui authored
      commit 9d699ed9 upstream
      
      When EC is in Polling mode, OS will check the EC status continually by using
      the following source code:
             clear_bit(EC_FLAGS_WAIT_GPE, &ec->flags);
             while (time_before(jiffies, delay)) {
                     if (acpi_ec_check_status(ec, event))
             	            return 0;
                     msleep(1);
             }
      But msleep is realized by the function of schedule_timeout. At the same time
      although one process is already waken up by some events, it won't be scheduled
      immediately. So maybe there exists the following phenomena:
           a. The current jiffies is already after the predefined jiffies.
      	But before timeout happens, OS has no chance to check the EC
      	status again.
           b. If preemptible schedule is enabled, maybe preempt schedule will happen
      	before checking loop. When the process is resumed again, maybe
      	timeout already happens, which means that OS has no chance to check
      	the EC status.
      
      In such case maybe EC status is already what OS expects when timeout happens.
      But OS has no chance to check the EC status and regards it as AE_TIME.
      
      So it will be more appropriate that OS will try to check the EC status again
      when timeout happens. If the EC status is what we expect, it won't be regarded
      as timeout. Only when the EC status is not what we expect, it will be regarded
      as timeout, which means that EC controller can't give a response in time.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=9823
      http://bugzilla.kernel.org/show_bug.cgi?id=11141Signed-off-by: default avatarZhao Yakui <yakui.zhao@intel.com>
      Signed-off-by: default avatarZhang Rui  <rui.zhang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      
      e6908f26
    • Pekka Paalanen's avatar
      x86: fix SMP alternatives: use mutex instead of spinlock, text_poke is sleepable · fe1c8324
      Pekka Paalanen authored
      commit 2f1dafe5 upstream
      
      text_poke is sleepable.
      The original fix by Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>.
      Signed-off-by: default avatarPekka Paalanen <pq@iki.fi>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      fe1c8324
    • Ingo Molnar's avatar
      rtc: fix deadlock · 459872ff
      Ingo Molnar authored
      commit 38c052f8 upstream
      
      if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly
      in it, waiting for jiffies to increment.
      
      So make the code more robust by doing an explicit mdelay(20).
      
      This solves a very hard to reproduce/debug hard lockup reported
      by Mikael Pettersson.
      Reported-by: default avatarMikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      459872ff
    • Nick Piggin's avatar
      mm: dirty page tracking race fix · 2d5794c1
      Nick Piggin authored
      commit 479db0bf upstream
      
      There is a race with dirty page accounting where a page may not properly
      be accounted for.
      
      clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.
      
      page_mkclean walks the rmaps for that page, and for each one it cleans and
      write protects the pte if it was dirty.  It uses page_check_address to
      find the pte.  That function has a shortcut to avoid the ptl if the pte is
      not present.  Unfortunately, the pte can be switched to not-present then
      back to present by other code while holding the page table lock -- this
      should not be a signal for page_mkclean to ignore that pte, because it may
      be dirty.
      
      For example, powerpc64's set_pte_at will clear a previously present pte
      before setting it to the desired value.  There may also be other code in
      core mm or in arch which do similar things.
      
      The consequence of the bug is loss of data integrity due to msync, and
      loss of dirty page accounting accuracy.  XIP's __xip_unmap could easily
      also be unreliable (depending on the exact XIP locking scheme), which can
      lead to data corruption.
      
      Fix this by having an option to always take ptl to check the pte in
      page_check_address.
      
      It's possible to retain this optimization for page_referenced and
      try_to_unmap.
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Carsten Otte <cotte@freenet.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Chuck Ebbert <cebbert@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      2d5794c1