- 12 Oct, 2009 21 commits
-
-
Jean Delvare authored
commit b607bd90 upstream. Which is why I have always preferred sizeof(struct foo) over sizeof(var). Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
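A minimal sketch of the pitfall the remark refers to. The structure and helper names are hypothetical and are not taken from the patch. When `var` is a pointer, `sizeof(var)` is the size of the pointer, not of the object it points to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure, used only for illustration. */
struct foo {
    unsigned int fields[16];
};

/* sizeof applied to a pointer variable: yields the pointer size. */
size_t size_via_var(const struct foo *var)
{
    return sizeof(var);
}

/* sizeof applied to the type: yields the full structure size. */
size_t size_via_type(void)
{
    return sizeof(struct foo);
}
```

A `memset()` or copy sized with the first form would cover only the first few bytes of the structure.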
-
Joerg Roedel authored
commit 20824f30 upstream. When running nested, we need to touch the L1 guest's tsc_offset. Otherwise changes will be lost or a wrong value will be read. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Joerg Roedel authored
commit 77b1ab17 upstream. When svm_vcpu_load is called while the vcpu is running in guest mode, the tsc adjustment made there is lost on the next emulated #vmexit. This causes the tsc to run backwards in the guest. This patch fixes the issue by also adjusting the tsc_offset in the emulated hsave area so that it will not get lost. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Aurelien Jarno authored
commit b2d83cfa upstream. Don't overflow when computing the 64-bit period from 32-bit registers. Fixes sourceforge bug #2826486. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
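A sketch of this kind of overflow, under the assumption that the period is a product of 32-bit values. The function and parameter names are hypothetical. If every operand is 32-bit, the product wraps before it is widened to 64 bits. Casting the first operand avoids this.

```c
#include <assert.h>
#include <stdint.h>

/* All operands are 32-bit, so the product wraps modulo 2^32
 * before it is converted to the 64-bit return type. */
uint64_t period_truncated(uint32_t count, uint32_t ns_per_tick, uint32_t divide)
{
    return count * ns_per_tick * divide;
}

/* Widening the first operand makes every multiplication 64-bit. */
uint64_t period_widened(uint32_t count, uint32_t ns_per_tick, uint32_t divide)
{
    return (uint64_t)count * ns_per_tick * divide;
}
```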
-
Marcelo Tosatti authored
commit eb5109e3 upstream. It is possible that stale EPTP-tagged mappings are used, if a vcpu migrates to a different pcpu. Set KVM_REQ_TLB_FLUSH in vmx_vcpu_load, when switching pcpus, which will invalidate both VPID and EPT mappings on the next vm-entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Avi Kivity authored
commit 6a544355 upstream. The number of entries is multiplied by the entry size, which can overflow on 32-bit hosts. Bound the entry count instead. Reported-by: David Wagner <daw@cs.berkeley.edu> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
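A sketch of the approach described, with a hypothetical bound and entry size. `uint32_t` stands in for the size type on a 32-bit host. Clamping the count before the multiplication keeps the product from wrapping.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_ENTRIES 256u /* hypothetical upper bound on the count */
#define DEMO_ENTRY_SIZE  40u  /* hypothetical size of one entry in bytes */

/* 32-bit size computation without a bound: a large count wraps. */
uint32_t alloc_size_unchecked(uint32_t nent)
{
    return nent * DEMO_ENTRY_SIZE;
}

/* Bound the entry count first, then multiply. */
uint32_t alloc_size_bounded(uint32_t nent)
{
    if (nent > DEMO_MAX_ENTRIES)
        nent = DEMO_MAX_ENTRIES;
    return nent * DEMO_ENTRY_SIZE;
}
```

With a count of 107374183 the unchecked product is 2^32 + 24, which wraps to a 24-byte size for a far larger table.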
-
Mark Brown authored
commit 5b7dde34 upstream. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Clemens Ladisch authored
commit 2fb930b5 upstream. The "VIA DXS" controls are actually volume controls that apply to the four PCM substreams, so we better indicate this connection by moving the controls to the PCM interface. Commit b452e08e in 2.6.30 broke the restoring of these volumes by "alsactl restore" that most distributions use; the renaming in this patch cures that regression by preventing alsactl from applying the old, wrong volume levels to the new controls. http://bugzilla.kernel.org/show_bug.cgi?id=14151 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=532613 Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
commit 3b761d3d upstream. While trying to work around spurious detection retries for non-existent devices on slave links, commit 816ab897 incorrectly added link offline check logic before ata_eh_thaw() was called. This means that if an occupied link goes down briefly at the time that offline check was performed, device class will be cleared to ATA_DEV_NONE and libata wouldn't retry thus failing detection of the device. The offline check should be done after the port is thawed together with online check so that such link glitches can be detected by the interrupt handler and handled properly. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tim Blechmann <tim@klingt.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Mimi Zohar authored
commit 36520be8 upstream. The unencrypted files are being measured. Update the counters to get rid of the ecryptfs imbalance message. (http://bugzilla.redhat.com/519737) Reported-by: Sachin Garg Cc: Eric Paris <eparis@redhat.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: James Morris <jmorris@namei.org> Cc: David Safford <safford@watson.ibm.com> Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Eero Nurkkala authored
commit fdc6f192 upstream. Commit f2e21c96 had unfortunate side effects with cpufreq governors on some systems. If the system did not switch into NOHZ mode, ts->inidle is not set when tick_nohz_stop_sched_tick() is called from the idle routine. Therefore all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick() fail to call tick_nohz_start_idle(). This results in bogus idle accounting information which is passed to cpufreq governors. Set the inidle flag regardless of the NOHZ active state to keep the idle time accounting correct in any case. [ tglx: Added comment and tweaked the changelog ] Reported-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Eero Nurkkala <ext-eero.nurkkala@nokia.com> Cc: Rik van Riel <riel@redhat.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Steven Noonan <steven@uplinklabs.net> LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Thomas Gleixner authored
commit eaaea803 upstream. Rich reported a lock imbalance in the futex code: http://bugzilla.kernel.org/show_bug.cgi?id=14288 It's caused by the displacement of the retry_private label in futex_wake_op(). The code unlocks the hash bucket locks in the error handling path and retries without locking them again which makes the next unlock fail. Move retry_private so we lock the hash bucket locks when we retry. Reported-by: Rich Ercolany <rercola@acm.jhu.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Darren Hart <dvhltc@us.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
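A simplified model of the label placement problem, not the futex code itself. A counter stands in for the hash bucket locks and all names are hypothetical. With the label after the lock call, each retry unlocks without relocking. With the label before it, lock and unlock stay paired.

```c
#include <assert.h>

static int lock_depth; /* stand-in for the hash bucket locks */

static void lock_buckets(void)   { lock_depth++; }
static void unlock_buckets(void) { lock_depth--; }

/* Label placed after the lock: 'faults' failed attempts each drop
 * the lock and retry without taking it again. */
int wake_op_label_after_lock(int faults)
{
    lock_depth = 0;
    lock_buckets();
retry_private:
    if (faults-- > 0) {
        unlock_buckets();
        goto retry_private;
    }
    unlock_buckets();
    return lock_depth; /* negative: more unlocks than locks */
}

/* Label placed before the lock: every retry takes the lock again. */
int wake_op_label_before_lock(int faults)
{
    lock_depth = 0;
retry_private:
    lock_buckets();
    if (faults-- > 0) {
        unlock_buckets();
        goto retry_private;
    }
    unlock_buckets();
    return lock_depth; /* always balanced */
}
```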
-
Peter Zijlstra authored
commit fc6b177d upstream. The robust list pointers of user space held futexes are kept intact over an exec() call. When the exec'ed task exits exit_robust_list() is called with the stale pointer. The risk of corruption is minimal, but still it is incorrect to keep the pointers valid. Actually glibc should uninstall the robust list before calling exec() but we have to deal with it anyway. Nullify the pointers after [compat_]exit_robust_list() has been called. Reported-by: Anirban Sinha <ani@anirban.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Thomas Gleixner authored
commit 322a2c10 upstream. exit_pi_state() is called from do_exit() but not from do_execve(). Move it to release_mm() so it gets called from do_execve() as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Cc: Anirban Sinha <ani@anirban.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Darren Hart authored
commit da085681 upstream. If futex_wait_requeue_pi() wakes prior to requeue, we drop the reference to the source futex_key twice, once in handle_early_requeue_pi_wakeup() and once on our way out. Remove the drop from the handle_early_requeue_pi_wakeup() and keep the get/drops together in futex_wait_requeue_pi(). Reported-by: Helge Bahmann <hcb@chaoticmind.net> Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Helge Bahmann <hcb@chaoticmind.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <4ACCE21E.5030805@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Steven Rostedt authored
commit 3279ba37 upstream. Due to legacy code from back when the dynamic tracer used a daemon, only core kernel code was checking for failures. This is no longer the case. We must check for failures any time we perform text modifications. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
jolsa@redhat.com authored
commit e7247a15 upstream. When the module is about to unload we release its call records. The ftrace_release function was given wrong values representing the module core boundaries, thus not releasing its call records. Also make the ftrace_release function module specific. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <1254934835-363-3-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Manoj Iyer authored
commit 3db6c037 upstream. Patch was tested on Toshiba NB200 and is found to enable sound. Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Jan Beulich authored
commit 24e35800 upstream. While 32-bit processes can't directly access R8...R15, they can gain access to these registers by temporarily switching themselves into 64-bit mode. Therefore, registers not preserved anyway by called C functions (i.e. R8...R11) must be cleared prior to returning to user mode. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4AC34D73020000780001744A@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Linus Torvalds authored
commit 0b5759c6 upstream. A couple of people have hit the WARN_ON() in drivers/char/tty_io.c, tty_open() that is unhappy about seeing the tty line discipline go away during the tty hangup. See for example http://bugzilla.kernel.org/show_bug.cgi?id=14255 and the reason is that we do the tty_ldisc_halt() outside the ldisc_mutex in order to be able to flush the scheduled work without a deadlock with vhangup_work. However, it turns out that we can solve this particular case by - using "cancel_delayed_work_sync()" in tty_ldisc_halt(), which waits for just the particular work, rather than synchronizing with any random outstanding pending work. This won't deadlock, since the buf.work we synchronize with doesn't care about the ldisc_mutex, it just flushes the tty ldisc buffers. - realize that for this particular case, we don't need to wait for any hangup work, because we are inside the hangup codepaths ourselves. so as a result we can just drop the flush_scheduled_work() entirely, and then move the tty_ldisc_halt() call to inside the mutex. That way we never expose the partially torn down ldisc state to tty_open(), and hold the ldisc_mutex over the whole sequence. Reported-by: Ingo Molnar <mingo@elte.hu> Reported-by: Heinz Diehl <htd@fancy-poultry.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Samuel Thibault authored
commit 392d814d upstream. Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a memory clobber, as it is only passed the address of the buffer, not a memory reference to the buffer itself. This caused failures in Hurd's pfinetv4 when we tried to compile it with gcc-4.3 (bogus checksums). Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
- 07 Oct, 2009 2 commits
-
-
Greg Kroah-Hartman authored
-
Alan Stern authored
commit 1f5c13fa upstream. This patch (as1282) fixes some obvious typos in the TTY core. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
- 05 Oct, 2009 17 commits
-
-
Greg Kroah-Hartman authored
-
Wey-Yi Guy authored
This is commit 5bddf549 in linux-2.6. If NetworkManager is busy scanning when the user tries to unload the module, the driver cannot be unloaded because the HW is still scanning. Make sure the driver sends the abort scan host command to uCode if it is in the middle of scanning during driver unload. Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Wey-Yi Guy authored
commit 415e4993 upstream. For devices using OTP memory, the EEPROM image can start from any one of the OTP blocks. If shadow RAM is disabled, we need to traverse the linked list to find the last valid block, then start reading the EEPROM image. If OTP is not full, the valid block is the block _before_ the last block on the linked list; the last block on the linked list is the empty block ready for the next OTP refresh/update. If OTP is full, then the last block is the valid block to be used to configure the device. Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
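The selection rule can be sketched on a simplified model. This is not the driver's code. The array layout, the `DEMO_OTP_END` marker, the start at block 0 and the treatment of "full" as "the list uses every block" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_OTP_END (-1) /* hypothetical end-of-list marker */

/* next[i] is the index of the block linked after block i, or
 * DEMO_OTP_END if block i is the last one on the list.
 * nblocks is the total number of blocks the OTP can hold.
 * Returns the index of the valid block, or DEMO_OTP_END when the
 * list consists only of the empty block. */
int otp_find_valid_block(const int *next, int nblocks)
{
    int prev = DEMO_OTP_END;
    int cur = 0;  /* the list starts at block 0 in this model */
    int used = 1; /* blocks visited so far, including cur */

    assert(next != NULL && nblocks > 0);

    while (next[cur] != DEMO_OTP_END && used < nblocks) {
        prev = cur;
        cur = next[cur];
        used++;
    }

    if (used == nblocks)
        return cur; /* OTP full: the last block holds the image */
    return prev;    /* OTP not full: the last block is the empty one */
}
```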
-
Wey-Yi Guy authored
commit 02c06e4a upstream. On 1000, there are two Switching Voltage Regulators (SVR). The first one applies the digital voltage level (1.32V) for the PCIe block and core. We need to use this regulator to solve a stability issue related to the noisy DC2DC line in the silicon. Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Jay Sternberg authored
commit cce53aa3 upstream. The firmware file now contains the build number, so the API needs to be updated. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Jay Sternberg authored
commit cc0f555d upstream. Add a new API version to account for the change to the ucode file format. The new header includes the build number of the ucode. This build number is the SVN revision, thus allowing for exact correlation to the code that generated it. The header adds the build number so that older ucode images can also be enhanced to include the build in the future. Some cleanup in iwl_read_ucode was needed to ensure the old header is not used and to reduce unnecessary references through a pointer when the data is already in a heap variable. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
David Howells authored
commit 645d83c5 upstream. Fix MAP_PRIVATE mmap() of files and devices where the data in the backing store might be mapped directly. Use the BDI_CAP_MAP_DIRECT capability flag to govern whether or not we should be trying to map a file directly. This can be used to determine whether or not a region has been filled in at the point where we call do_mmap_shared() or do_mmap_private(). The BDI_CAP_MAP_DIRECT capability flag is cleared by validate_mmap_request() if there's any reason we can't use it. It's also cleared in do_mmap_pgoff() if f_op->get_unmapped_area() fails. Without this fix, attempting to run a program from a RomFS image on a non-mappable MTD partition results in a BUG as the kernel attempts XIP, and this can be caught in gdb: Program received signal SIGABRT, Aborted. 0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547 (gdb) bt #0 0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547 #1 0xc005f168 in do_mmap_pgoff (file=0xc31a6620, addr=<value optimized out>, len=3808, prot=3, flags=6146, pgoff=0) at mm/nommu.c:1373 #2 0xc00a96b8 in elf_fdpic_map_file (params=0xc33fbbec, file=0xc31a6620, mm=0xc31bef60, what=0xc0213144 "executable") at mm.h:1145 #3 0xc00aa8b4 in load_elf_fdpic_binary (bprm=0xc316cb00, regs=<value optimized out>) at fs/binfmt_elf_fdpic.c:343 #4 0xc006b588 in search_binary_handler (bprm=0x6, regs=0xc33fbce0) at fs/exec.c:1234 #5 0xc006c648 in do_execve (filename=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460, regs=0xc33fbce0) at fs/exec.c:1356 #6 0xc0008cf0 in sys_execve (name=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460) at arch/frv/kernel/process.c:263 #7 0xc00075dc in __syscall_call () at arch/frv/kernel/entry.S:897 Note that this fix does the following commit differently: commit a190887b Author: David Howells <dhowells@redhat.com> Date: Sat Sep 5 11:17:07 2009 -0700 nommu: fix error handling in do_mmap_pgoff() Reported-by: Graff Yang <graff.yang@gmail.com> 
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Kashyap, Desai authored
commit c55b89fb upstream. This patch solves a problem with DMA operation on PAE kernels. On a PAE system, dma_addr and unsigned long will have different values. dma_addr is no longer cast to unsigned long. Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: Jan Beulich <JBeulich@novell.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
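A sketch of why such a cast can lose information, assuming the mismatch is one of width: a 64-bit DMA address and a 32-bit `unsigned long`. The typedefs are stand-ins chosen for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_dma_addr_t; /* stand-in for a 64-bit DMA address */
typedef uint32_t demo_ulong32_t;  /* stand-in for a 32-bit unsigned long */

/* Casting through the narrower type drops the upper 32 bits. */
uint64_t dma_via_ulong(demo_dma_addr_t addr)
{
    return (demo_ulong32_t)addr;
}

/* Keeping the address in its own type preserves it. */
uint64_t dma_preserved(demo_dma_addr_t addr)
{
    return addr;
}
```

Addresses below 4 GB survive the cast, which is why such a bug can stay hidden on machines with less memory.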
-
Jan Scholz authored
commit 42960a13 upstream. Commit fa047e4f "HID: fix inverted wheel for bluetooth version of apple mighty mouse" is incomplete. If we remove Apple MightyMouse (bluetooth version) from the list of apple_devices in drivers/hid/hid-apple.c we have to remove it from hid_blacklist in drivers/hid/hid-core.c as well. Signed-off-by: Jan Scholz <Scholz@fias.uni-frankfurt.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Weirich, Bernhard authored
[I'm going to fix upstream differently, by having all CPU types actually support _PAGE_SPECIAL, but I prefer the simple and obvious fix for -stable. -- Ben] The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on powerpc is bogus and will end up always defining it, even when _PAGE_SPECIAL is not supported (in which case it's 0) such as on 8xx or 40x processors. Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
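The message does not show the original test, so the following only illustrates one way a preprocessor test can be true when a flag is defined as 0. The macro names are hypothetical. `#ifdef` checks whether a macro is defined, not whether its value is non-zero.

```c
#include <assert.h>

/* Hypothetical flag on a CPU type that does not support it. */
#define DEMO_PAGE_SPECIAL 0

/* Tests only that the macro exists: true even though the value is 0. */
#ifdef DEMO_PAGE_SPECIAL
#define DEMO_HAVE_SPECIAL_BY_IFDEF 1
#else
#define DEMO_HAVE_SPECIAL_BY_IFDEF 0
#endif

/* Tests the value: false when the flag is 0. */
#if DEMO_PAGE_SPECIAL != 0
#define DEMO_HAVE_SPECIAL_BY_VALUE 1
#else
#define DEMO_HAVE_SPECIAL_BY_VALUE 0
#endif

int have_special_by_ifdef(void) { return DEMO_HAVE_SPECIAL_BY_IFDEF; }
int have_special_by_value(void) { return DEMO_HAVE_SPECIAL_BY_VALUE; }
```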
-
Rex Feany authored
commit e0908085 upstream. After upgrading to the latest kernel on my mpc875 userspace started running incredibly slow (hours to get to a shell, even!). I tracked it down to commit 8d30c14c, that patch removed a work-around for the 8xx. Adding it back makes my problem go away. Signed-off-by: Rex Feany <rfeany@mrv.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Brian Rogers authored
commit 7aedd5ec upstream. Tested on MSI TV@nywhere Plus. Original commit message: ir-kbd-i2c's ir_probe() function can be called much later (i.e. at ir-kbd-i2c module load) than the lifetime of a struct IR_i2c_init_data allocated off of the stack in cx18_i2c_new_ir() at registration time. Make sure we pass a pointer to a persistent IR_i2c_init_data object at i2c registration time. Thanks to Brian Rogers, Dustin Mitchell, Andy Walls and Jean Delvare for raising this question. Before this patch, if ir-kbd-i2c were probed after SAA7134, trash data were used. Compile tested only, but the patch is identical to the em28xx one, so it should work properly. Original-patch-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [brian@xyzw.org: backported for 2.6.31] Signed-off-by: Brian Rogers <brian@xyzw.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
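A sketch of the persistent-object pattern, with hypothetical types and names rather than the driver's own. The init data is embedded in the long-lived device structure, so the pointer recorded at registration time is still valid when a later probe dereferences it. A pointer to a stack variable of the registering function would not be.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_ir_init_data {
    const char *name;
};

/* Long-lived device object that owns the init data. */
struct demo_dev {
    struct demo_ir_init_data ir_init_data;
};

/* Pointer recorded at registration time and used by a later probe. */
static const struct demo_ir_init_data *registered_init_data;

struct demo_dev demo_device;

/* Registration: store a pointer into the device, not into this
 * function's stack frame, which is gone once the function returns. */
int demo_register_ir(struct demo_dev *dev)
{
    assert(dev != NULL);
    dev->ir_init_data.name = "demo IR";
    registered_init_data = &dev->ir_init_data;
    return 0;
}

/* Probe that may run much later, e.g. when another module loads. */
const char *demo_probe_later(void)
{
    return registered_init_data ? registered_init_data->name : NULL;
}
```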
-
Brian Rogers authored
commit d2ebd0f8 upstream. Original commit message: ir-kbd-i2c's ir_probe() function can be called much later (i.e. at ir-kbd-i2c module load) than the lifetime of a struct IR_i2c_init_data allocated off of the stack in cx18_i2c_new_ir() at registration time. Make sure we pass a pointer to a persistent IR_i2c_init_data object at i2c registration time. Thanks to Brian Rogers, Dustin Mitchell, Andy Walls and Jean Delvare for raising this question. Before this patch, if ir-kbd-i2c were probed after em28xx, trash data were used. After the patch, no matter what order, it is properly reported as tested by me: input: i2c IR (i2c IR (EM2840 Hauppaug as /class/input/input10 ir-kbd-i2c: i2c IR (i2c IR (EM2840 Hauppaug detected at i2c-4/4-0030/ir0 [em28xx #0] Original-patch-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [brian@xyzw.org: backported for 2.6.31] Signed-off-by: Brian Rogers <brian@xyzw.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Chris Wilson authored
commit c715089f upstream. During a page fault and rebinding the buffer there exists a window for a signal to arrive during the i915_wait_request() and trigger an ERESTARTSYS. This used to be handled by returning SIGBUS and thereby killing the application. Try 'cairo-perf-trace & cairo-test-suite' and watch X go boom! The solution as suggested by H. Peter Anvin is to simply return NOPAGE and leave the higher layers to spot we did not fill the page and resubmit the page fault. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [anholt: Mostly squash it with another commit] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Michael Abbott authored
commit 96830a57 upstream. Git commit 79741dd3 changes idle cputime accounting, but unfortunately the /proc/uptime file hasn't caught up. Here the idle time calculation from /proc/stat is copied over. Signed-off-by: Michael Abbott <michael.abbott@diamond.ac.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Lee Schermerhorn authored
commit 252c5f94 upstream. We noticed very erratic behavior [throughput] with the AIM7 shared workload running on recent distro [SLES11] and mainline kernels on an 8-socket, 32-core, 256GB x86_64 platform. On the SLES11 kernel [2.6.27.19+] with Barcelona processors, as we increased the load [10s of thousands of tasks], the throughput would vary between two "plateaus"--one at ~65K jobs per minute and one at ~130K jpm. The simple patch below causes the results to smooth out at the ~130k plateau. But wait, there's more: We do not see this behavior on smaller platforms--e.g., 4 socket/8 core. This could be the result of the larger number of cpus on the larger platform--a scalability issue--or it could be the result of the larger number of interconnect "hops" between some nodes in this platform and how the tasks for a given load end up distributed over the nodes' cpus and memories--a stochastic NUMA effect. The variability in the results are less pronounced [on the same platform] with Shanghai processors and with mainline kernels. With 31-rc6 on Shanghai processors and 288 file systems on 288 fibre attached storage volumes, the curves [jpm vs load] are both quite flat with the patched kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K jpm] than the unpatched kernel. Profiling indicated that the "slow" runs were incurring high[er] contention on an anon_vma lock in vma_adjust(), apparently called from the sbrk() system call. The patch: A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the anon_vma lock when we're only adjusting the end of a vma, as is the case for brk(). The comment questions whether it's worth while to optimize for this case. Apparently, on the newer, larger x86_64 platforms, with interesting NUMA topologies, it is worth while--especially considering that the patch [if correct!] is quite simple. We can detect this condition--no overlap with next vma--by noting a NULL "importer". 
The anon_vma pointer will also be NULL in this case, so simply avoid loading vma->anon_vma to avoid the lock. However, we DO need to take the anon_vma lock when we're inserting a vma ['insert' non-NULL] even when we have no overlap [NULL "importer"], so we need to check for 'insert', as well. And Hugh points out that we should also take it when adjusting vm_start (so that rmap.c can rely upon vma_address() while it holds the anon_vma lock). akpm: Zhang Yanmin reports a 150% throughput improvement with aim7, so it might be -stable material even though this isn't a regression: "this issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but is severe on large machine (4*8*2 cpu)" [hugh.dickins@tiscali.co.uk: test vma start too] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Nick Piggin <npiggin@suse.de> Cc: Eric Whitney <eric.whitney@hp.com> Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
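The three conditions above can be restated as a single predicate. This is a sketch built from the description, not the patched code. The function name and the boolean parameters are hypothetical stand-ins for the pointers and fields the message mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the anon_vma lock should be taken:
 * - the vma has an anon_vma at all, and
 * - there is an importer (overlap with the next vma), or
 *   a vma is being inserted, or
 *   the start of the vma is being adjusted. */
bool needs_anon_vma_lock(bool has_anon_vma, bool has_importer,
                         bool has_insert,
                         unsigned long old_start, unsigned long new_start)
{
    if (!has_anon_vma)
        return false;
    return has_importer || has_insert || new_start != old_start;
}
```

The brk() case from the message, where only the end of the vma moves, is the one combination that skips the lock.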
-
Hugh Dickins authored
commit 1ac0cb5d upstream. do_anonymous_page() has been wrong to dirty the pte regardless. If it's not going to mark the pte writable, then it won't help to mark it dirty here, and clogs up memory with pages which will need swap instead of being thrown away. Especially wrong if no overcommit is chosen, and this vma is not yet VM_ACCOUNTed - we could exceed the limit and OOM despite no overcommit. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Acked-by: Rik van Riel <riel@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
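A sketch of the change in behaviour, using hypothetical flag values and helper names instead of the kernel's pte helpers. Previously the entry was dirtied regardless. After the fix it is dirtied only when it is also made writable.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_VM_WRITE  0x2u /* hypothetical vma flag: mapping is writable */
#define DEMO_PTE_WRITE 0x1u /* hypothetical pte bit: writable */
#define DEMO_PTE_DIRTY 0x2u /* hypothetical pte bit: dirty */

/* Old behaviour: dirty the entry whether or not it becomes writable. */
uint32_t anon_pte_flags_old(uint32_t vm_flags)
{
    uint32_t pte = DEMO_PTE_DIRTY;

    if (vm_flags & DEMO_VM_WRITE)
        pte |= DEMO_PTE_WRITE;
    return pte;
}

/* New behaviour: dirty the entry only together with making it writable. */
uint32_t anon_pte_flags_new(uint32_t vm_flags)
{
    uint32_t pte = 0;

    if (vm_flags & DEMO_VM_WRITE)
        pte |= DEMO_PTE_WRITE | DEMO_PTE_DIRTY;
    return pte;
}
```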
-