Commits · 3945c4161ccb0423757e331145144835b3e85b57 · linux / linux-davinci

14 Dec, 2007 19 commits

KVM: SVM: Intercept the 'invd' and 'wbinvd' instructions · 3945c416

Avi Kivity authored Dec 02, 2007

patch cf5a94d1 in mainline.

'invd' can destroy host data, and 'wbinvd' allows the guest to induce
long (milliseconds) latencies.

Noted by Ben Serebrin.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

3945c416

KVM: x86 emulator: invd instruction · 141f41dd

Avi Kivity authored Dec 02, 2007

patch 651a3e29 in mainline.

Emulate the 'invd' instruction (opcode 0f 08).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

141f41dd

KVM: x86 emulator: fix access registers for instructions with ModR/M byte and Mod = 3 · fb2fc4cf

Aurelien Jarno authored Dec 02, 2007

patch 4e62417b in mainline.

The patch belows changes the access type to register from memory for
instructions that are declared as SrcMem or DstMem, but have a
ModR/M byte with Mod = 3.

It fixes (at least) the lmsw and smsw instructions on an AMD64 CPU,
which are needed for FreeBSD.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

fb2fc4cf

KVM: x86 emulator: implement 'movnti mem, reg' · 117b22ff

Sheng Yang authored Dec 02, 2007

patch a012e65a in mainline.

Implement emulation of instruction:
    movnti m32/m64, r32/r64
    opcode: 0x0f 0xc3

Needed to support Linux 2.6.16 as guest (used for mmio).
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

117b22ff

hrtimers: avoid overflow for large relative timeouts (-5966) · ee6abc25

Thomas Gleixner authored Dec 07, 2007

patch 62f0f61e in mainline

Relative hrtimers with a large timeout value might end up as negative
timer values, when the current time is added in hrtimer_start().

This in turn is causing the clockevents_set_next() function to set an
huge timeout and sleep for quite a long time when we have a clock
source which is capable of long sleeps like HPET. With PIT this almost
goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
sorts this out in the next timer interrupt, so we never noticed that
problem which has been there since the first day of hrtimers.

This bug became more apparent in 2.6.24 which activates HPET on more
hardware.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ee6abc25

forcedeth boot delay fix · 8d88a7d7

Ayaz Abdulla authored Nov 21, 2007

patch 9e555930 in mainline.

Fix a long boot delay in the forcedeth driver.  During initialization, the
timeout for the handshake between mgmt unit and driver can be very long.
The patch reduces the timeout by eliminating a extra loop around the
timeout logic.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9308Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Alex Howells <astinus@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

8d88a7d7

forcedeth: new mcp79 pci ids · 2cf220bb

Ayaz Abdulla authored Nov 23, 2007

patch 490dde89 in mainline.

This patch adds new device ids and features for mcp79 devices into the
forcedeth driver.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

index 92ce2e3..f9ba0ac 100644

2cf220bb

I4L: fix isdn_ioctl memory overrun vulnerability · 27b39667

Karsten Keil authored Dec 01, 2007

patch eafe1aa3 in mainline.

Fix possible memory overrun issue in the isdn ioctl code.  Found by ADLAB
<adlab@venustech.com.cn>
Signed-off-by: Karsten Keil <kkeil@suse.de>
Cc: ADLAB <adlab@venustech.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

27b39667

tmpfs: restore missing clear_highpage · 831ac1f2

Hugh Dickins authored Nov 28, 2007

patch e84e2e13 in mainline

tmpfs was misconverted to __GFP_ZERO in 2.6.11.  There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

831ac1f2

USB: fix up EHCI startup synchronization · d57b4ae3

David Brownell authored Nov 28, 2007

patch 1cb52658 in mainline.

A recent patch added software synchronization during EHCI startup,
so ports aren't switched away from the companion controllers after
resets have started.  This patch adds a short delay letting hardware
finish that port switching before any new resets begin ... so both
ends of that hardware race window are closed.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Dave Miller <davem@davemloft.net>
Cc: Dely Sy <dely.l.sy@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

d57b4ae3

USB: make the microtek driver and HAL cooperate · dd5cca4e

Oliver Neukum authored Nov 28, 2007

patch 5cf1973a in mainline

to make HAL like the microtek driver's devices the parent must be
correctly set.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dd5cca4e

Input: ALPS - add support for model found in Dell Vostro 1400 · 955ab48d

William Pettersson authored Nov 21, 2007

changeset dac4ae0d in mainline.

Input: ALPS - add support for model found in Dell Vostro 1400
Signed-off-by: William Pettersson <william.pettersson@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

955ab48d

Fix synchronize_irq races with IRQ handler · bedda54e

Herbert Xu authored Nov 21, 2007

patch a98ce5c6 in mainline.

Fix synchronize_irq races with IRQ handler

As it is some callers of synchronize_irq rely on memory barriers
to provide synchronisation against the IRQ handlers.  For example,
the tg3 driver does

tp->irq_sync = 1;
smp_mb();
synchronize_irq();

and then in the IRQ handler:

if (!tp->irq_sync)
	netif_rx_schedule(dev, &tp->napi);

Unfortunately memory barriers only work well when they come in
pairs.  Because we don't actually have memory barriers on the
IRQ path, the memory barrier before the synchronize_irq() doesn't
actually protect us.

In particular, synchronize_irq() may return followed by the
result of netif_rx_schedule being made visible.

This patch (mostly written by Linus) fixes this by using spin
locks instead of memory barries on the synchronize_irq() path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

bedda54e

PKT_SCHED: Check subqueue status before calling hard_start_xmit · 96219ec9

Peter P Waskiewicz Jr authored Nov 21, 2007

[PKT_SCHED]: Check subqueue status before calling hard_start_xmit

[ Upstream commit: 5f1a485d ]

The only qdiscs that check subqueue state before dequeue'ing are PRIO
and RR.  The other qdiscs, including the default pfifo_fast qdisc,
will allow traffic bound for subqueue 0 through to hard_start_xmit.
The check for netif_queue_stopped() is done above in pkt_sched.h, so
it is unnecessary for qdisc_restart().  However, if the underlying
driver is multiqueue capable, and only sets queue states on subqueues,
this will allow packets to enter the driver when it's currently unable
to process packets, resulting in expensive requeues and driver
entries.  This patch re-adds the check for the subqueue status before
calling hard_start_xmit, so we can try and avoid the driver entry when
the queues are stopped.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

96219ec9

sched: some proc entries are missed in sched_domain sys_ctl debug code · ca21e46b

Zou Nan hai authored Oct 15, 2007

patch ace8b3d6 in mainline.

cache_nice_tries and flags entry do not appear in proc fs sched_domain
directory, because ctl_table entry is skipped.

This patch fixes the issue.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ca21e46b

Future of Linux 2.6.22.y series · 5ade3f9f

Christian Borntraeger authored Nov 06, 2007

commit 5d0360ee upstream.

We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddendly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping
them dirty all the time.

It turns out that there is a case, where the VM makes a ramdisk page
clean, without telling the ramdisk driver.  On memory pressure
shrink_zone runs and it starts to run shrink_active_list.  There is a
check for buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no
releasepage callback, try_to_free_buffers is called. try_to_free_buffers
has now a special logic for some file systems to make a dirty page
clean, if all buffers are clean. Thats what happened in our test case.

The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

5ade3f9f

NETFILTER: Fix NULL pointer dereference in nf_nat_move_storage() · 992a69a7

Evgeniy Polyakov authored Nov 21, 2007

[NETFILTER]: Fix NULL pointer dereference in nf_nat_move_storage()

[ Upstream commit: 77996525 ]

Reported by Chuck Ebbert as:

	https://bugzilla.redhat.com/show_bug.cgi?id=259501#c14

This routine is called each time hash should be replaced, nf_conn has
extension list which contains pointers to connection tracking users
(like nat, which is right now the only such user), so when replace takes
place it should copy own extensions. Loop above checks for own
extension, but tries to move higer-layer one, which can lead to above
oops.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

992a69a7

NET: random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR · 671369b6

Eric Dumazet authored Nov 21, 2007

[NET] random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR

[ Upstream commit: 6dd10a62 ]

All 32 bits machines but i386 dont have CONFIG_KTIME_SCALAR. On these
machines, ktime.tv64 is more than 4 times the (correct) result given
by ktime_to_ns()

Again on these machines, using ktime_get_real().tv64 >> 6 give a
32bits rollover every 64 seconds, which is not wanted (less than the
120 s MSL)

Using ktime_to_ns() is the portable way to get nsecs from a ktime, and
have correct code.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

671369b6

libertas: properly account for queue commands · 857c440a

Marcelo Tosatti authored Nov 20, 2007

patch 29f5f2a1 in mainline.

Properly account for queue commands, this fixes a problem reported
by Holger Schurig when using the debugfs interface.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

857c440a

26 Nov, 2007 21 commits

Linux 2.6.23.9 · 8996d0af
Greg Kroah-Hartman authored Nov 26, 2007

8996d0af

ipw2200: batch non-user-requested scan result notifications · 6bb559a3

Dan Williams authored Oct 09, 2007

patch 0b531676 in mainline.

ipw2200 makes extensive use of background scanning when unassociated or
down.  Unfortunately, the firmware sends scan completed events many
times per second, which the driver pushes directly up to userspace.
This needlessly wakes up processes listening for wireless events many
times per second.  Batch together scan completed events for
non-user-requested scans and send them up to userspace every 4 seconds.
Scan completed events resulting from an SIOCSIWSCAN call are pushed up
without delay.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

6bb559a3

USB: Nikon D40X unusual_devs entry · 590ca6cb

Ortwin Glück authored Oct 11, 2007

patch d466a919 in mainline.

Not surprisingly the Nikon D40X DSC needs the same quirks as the D40,
but it has a separate ID.
See http://bugs.gentoo.org/show_bug.cgi?id=191431

From: Ortwin Glück <odi@odi.ch>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

590ca6cb

USB: unusual_devs modification for Nikon D200 · 1ed8fc8e

Phil Dibowitz authored Sep 22, 2007

patch 16eb345f in mainline.

Upgrade the unusual_devs.h file to support the Nikon D200
Signed-off-by: Mike Pagano <mpagano-kernel@mpagano.com>
Signed-off-by: Phil Dibowitz <phil@ipom.com>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

1ed8fc8e

softlockup: use cpu_clock() instead of sched_clock() · fd5ec14e

Ingo Molnar authored Oct 16, 2007

patch a3b13c23 in mainline.

sched_clock() is not a reliable time-source, use cpu_clock() instead.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

fd5ec14e

softlockup watchdog fixes and cleanups · 00aceb50

Ingo Molnar authored Nov 18, 2007

This is a merge of commits a5f2ce3c and
43581a10 in mainline to fix a warning in
the 2.6.23.3 kernel release.

softlockup watchdog: style cleanups

kernel/softirq.c grew a few style uncleanlinesses in the past few
months, clean that up. No functional changes:

text    data     bss     dec     hex filename
1126      76       4    1206     4b6 softlockup.o.before
1129      76       4    1209     4b9 softlockup.o.after

( the 3 bytes .text increase is due to the "<1>" appended to one of
the printk messages. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


softlockup: improve debug output

Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.

The old format was:

BUG: soft lockup detected on CPU#1!
[<c0105e4a>] show_trace_log_lvl+0x19/0x2e
[<c0105f43>] show_trace+0x12/0x14
[<c0105f59>] dump_stack+0x14/0x16
[<c015f6bc>] softlockup_tick+0xbe/0xd0
[<c013457d>] run_local_timers+0x12/0x14
[<c01346b8>] update_process_times+0x3e/0x63
[<c0145fb8>] tick_sched_timer+0x7c/0xc0
[<c0140a75>] hrtimer_interrupt+0x135/0x1ba
[<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
[<c0105aa3>] apic_timer_interrupt+0x33/0x38
[<c0104f8a>] syscall_call+0x7/0xb
=======================

The new format is:

BUG: soft lockup detected on CPU#1! [prctl:2363]

Pid: 2363, comm:                prctl
EIP: 0060:[<c013915f>] CPU: 1
EIP is at sys_prctl+0x24/0x18c
EFLAGS: 00000213    Not tainted  (2.6.22-cfs-v20 #26)
EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000
ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0
[<c0105e4a>] show_trace_log_lvl+0x19/0x2e
[<c0105f43>] show_trace+0x12/0x14
[<c01040be>] show_regs+0x1ab/0x1b3
[<c015f807>] softlockup_tick+0xef/0x108
[<c013457d>] run_local_timers+0x12/0x14
[<c01346b8>] update_process_times+0x3e/0x63
[<c0145fcc>] tick_sched_timer+0x7c/0xc0
[<c0140a89>] hrtimer_interrupt+0x135/0x1ba
[<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
[<c0105aa3>] apic_timer_interrupt+0x33/0x38
[<c0104f8a>] syscall_call+0x7/0xb
=======================

Note that in the old format we only knew that some system call locked
up, we didnt know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [which was where i created an artificial
kernel lockup to test the new format.]

This is also useful if the lockup happens in user-space - the user-space
EIP (and other registers) will be printed too. (such a lockup would
either suggest that the task was running at SCHED_FIFO:99 and looping
for more than 10 seconds, or that the softlockup detector has a
false-positive.)

The task name is printed too first, just in case we dont manage to print
a useful backtrace.

[satyam@infradead.org: fix warning]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

00aceb50

x86: fix freeze in x86_64 RTC update code in time_64.c · 653e60e2

David P. Reed authored Nov 14, 2007

patch c399da0d in mainline.

x86: fix freeze in x86_64 RTC update code in time_64.c

Fix hard freeze on x86_64 when the ntpd service calls
update_persistent_clock()

A repeatable but randomly timed freeze has been happening in Fedora 6
and 7 for the last year, whenever I run the ntpd service on my AMD64x2
HP Pavilion dv9000z laptop.  This freeze is due to the use of
spin_lock(&rtc_lock) under the assumption (per a bad comment) that
set_rtc_mmss is called only with interrupts disabled.  The call from
ntp.c to update_persistent_clock is made with interrupts enabled.

[ tglx@linutronix.de: ported to 2.6.23.stable ]
Signed-off-by: David P. Reed <dpreed@reed.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

653e60e2

ntp: fix typo that makes sync_cmos_clock erratic · 2d429c89

David P. Reed authored Nov 14, 2007

patch fa6a1a55 in mainline.

ntp: fix typo that makes sync_cmos_clock erratic

Fix a typo in ntp.c that has caused updating of the persistent (RTC)
clock when synced to NTP to behave erratically.

When debugging a freeze that arises on my AMD64 machines when I
run the ntpd service, I added a number of printk's to monitor the
sync_cmos_clock procedure.  I discovered that it was not syncing to
cmos RTC every 11 minutes as documented, but instead would keep trying
every second for hours at a time.  The reason turned out to be a typo
in sync_cmos_clock, where it attempts to ensure that
update_persistent_clock is called very close to 500 msec. after a 1
second boundary (required by the PC RTC's spec). That typo referred to
"xtime" in one spot, rather than "now", which is derived from "xtime"
but not equal to it.  This makes the test erratic, creating a
"coin-flip" that decides when update_persistent_clock is called - when
it is called, which is rarely, it may be at any time during the one
second period, rather than close to 500 msec, so the value written is
needlessly incorrect, too.
Signed-off-by: David P. Reed <dpreed@reed.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2d429c89

x86: return correct error code from child_rip in x86_64 entry.S · 26b880d6

Andrey Mirkin authored Oct 17, 2007

patch 1c5b5cfd in mainline.

x86: return correct error code from child_rip in x86_64 entry.S

Right now register edi is just cleared before calling do_exit.
That is wrong because correct return value will be ignored.
Value from rax should be copied to rdi instead of clearing edi.

AK: changed to 32bit move because it's strictly an int

[ tglx: arch/x86 adaptation ]
Signed-off-by: Andrey Mirkin <major@openvz.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

26b880d6

x86: NX bit handling in change_page_attr() · 5b7e28db

Huang, Ying authored Oct 17, 2007

patch 84e0fdb1 in mainline.

x86: NX bit handling in change_page_attr()

This patch fixes a bug of change_page_attr/change_page_attr_addr on
Intel x86_64 CPUs.  After changing page attribute to be executable with
these functions, the page remains un-executable on Intel x86_64 CPU.
Because on Intel x86_64 CPU, only if the "NX" bits of all four level
page tables are cleared, the corresponding page is executable (refer to
section 4.13.2 of Intel 64 and IA-32 Architectures Software Developer's
Manual).  So, the bug is fixed through clearing the "NX" bit of PMD when
splitting the huge PMD.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

5b7e28db

x86: mark read_crX() asm code as volatile · 420b463a

Kirill Korotaev authored Oct 17, 2007

patch c1217a75 in mainline.

x86: mark read_crX() asm code as volatile

Some gcc versions (I checked at least 4.1.1 from RHEL5 & 4.1.2 from gentoo)
can generate incorrect code with read_crX()/write_crX() functions mix up,
due to cached results of read_crX().

The small app for x8664 below compiled with -O2 demonstrates this
(i686 does the same thing):
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

420b463a

x86: fix off-by-one in find_next_zero_string · 2e6792e3

Andrew Hastings authored Oct 17, 2007

patch 801916c1 in mainline.

x86: fix off-by-one in find_next_zero_string

Fix an off-by-one error in find_next_zero_string which prevents
allocating the last bit.

[ tglx: arch/x86 adaptation ]
Signed-off-by: Andrew Hastings <abh@cray.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2e6792e3

i386: avoid temporarily inconsistent pte-s · df84bfba

Jan Beulich authored Oct 17, 2007

patch aa506dc7 in mainline.

i386: avoid temporarily inconsistent pte-s

One more of these issues (which were considered fixed a few releases
back): other than on x86-64, i386 allows set_fixmap() to replace
already present mappings. Consequently, on PAE, care must be taken to
not update the high half of a pte while the low half is still holding
the old value.

 [tglx: arch/x86 adaptation]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

df84bfba

libcrc32c: keep intermediate crc state in cpu order · 332a20e3

Herbert Xu authored Nov 15, 2007

It's upstream changeset ef19454b.

[LIB] crc32c: Keep intermediate crc state in cpu order

crypto/crc32.c:chksum_final() is computing the digest as
*(__le32 *)out = ~cpu_to_le32(mctx->crc);
so the low-level crc32c_le routines should just keep
the crc in cpu order, otherwise it is getting swabbed
one too many times on big-endian machines.
Signed-off-by: Benny Halevy <bhalevy@fs1.bhalevy.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

332a20e3

geode: Fix not inplace encryption · ca1b1e5c

Sebastian Siewior authored Nov 10, 2007

patch 2e21630d in mainline.

Currently the Geode AES module fails to encrypt or decrypt if
the coherent bits are not set what is currently the case if the
encryption does not occur inplace. However, the encryption works
on my Geode machine _only_ if the coherent bits are always set.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ca1b1e5c

Fix divide-by-zero in the 2.6.23 scheduler code · 23a5e6a5

Chuck Ebbert authored Nov 14, 2007

No patch in mainline as this logic has been removed from 2.6.24 so it is
not necessary.


https://bugzilla.redhat.com/show_bug.cgi?id=340161

The problem code has been removed in 2.6.24. The below patch disables
SCHED_FEAT_PRECISE_CPU_LOAD which causes the offending code to be skipped
but does not prevent the user from enabling it.

The divide-by-zero is here in kernel/sched.c:

static void update_cpu_load(struct rq *this_rq)
{
	u64 fair_delta64, exec_delta64, idle_delta64, sample_interval64, tmp64;
	unsigned long total_load = this_rq->ls.load.weight;
	unsigned long this_load =  total_load;
	struct load_stat *ls = &this_rq->ls;
	int i, scale;

	this_rq->nr_load_updates++;
	if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
		goto do_avg;

	/* Update delta_fair/delta_exec fields first */
	update_curr_load(this_rq);

	fair_delta64 = ls->delta_fair + 1;
	ls->delta_fair = 0;

	exec_delta64 = ls->delta_exec + 1;
	ls->delta_exec = 0;

	sample_interval64 = this_rq->clock - ls->load_update_last;
	ls->load_update_last = this_rq->clock;

	if ((s64)sample_interval64 < (s64)TICK_NSEC)
		sample_interval64 = TICK_NSEC;

	if (exec_delta64 > sample_interval64)
		exec_delta64 = sample_interval64;

	idle_delta64 = sample_interval64 - exec_delta64;

======>	tmp64 = div64_64(SCHED_LOAD_SCALE * exec_delta64, fair_delta64);
	tmp64 = div64_64(tmp64 * exec_delta64, sample_interval64);

	this_load = (unsigned long)tmp64;

do_avg:

	/* Update our load: */
	for (i = 0, scale = 1; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
		unsigned long old_load, new_load;

		/* scale is effectively 1 << i now, and >> i divides by scale */

		old_load = this_rq->cpu_load[i];
		new_load = this_load;

		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
	}
}

For stable only; the code has been removed in 2.6.24.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

23a5e6a5

ACPI: VIDEO: Adjust current level to closest available one. · 77675886

Alexey Starikovskiy authored Nov 15, 2007

patch 63f0edfc in mainline.

ACPI: VIDEO: Adjust current level to closest available one.
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

77675886

libata: sata_sis: use correct S/G table size · 2fcce6c9

Jeff Garzik authored Nov 15, 2007

patch 96af1547 in mainline.

[libata] sata_sis: use correct S/G table size

sata_sis has the same restrictions as other SFF controllers, and so must
use LIBATA_MAX_PRD to denote that SCSI may only fill ATA_MAX_PRD/2
entries, due to our need to handle IOMMU merging.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2fcce6c9

sata_sis: fix SCR read breakage · 458c3a1a

Tejun Heo authored Nov 15, 2007

patch aaa092a1 in mainline.

sata_sis: fix SCR read breakage

SCR read for controllers which uses PCI configuration space for SCR
access got broken while adding @val argument to SCR accessors.  Fix
it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

458c3a1a

reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty file · a47a72d3

Fengguang Wu authored Nov 14, 2007

patch c06a018f in mainline.

This is not a new problem in 2.6.23-git17.  2.6.22/2.6.23 is buggy in the
same way.

Reiserfs could accumulate dirty sub-page-size files until umount time. 
They cannot be synced to disk by pdflush routines or explicit `sync'
commands.  Only `umount' can do the trick.

The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
Call trace:
	 [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0
	 [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710
	 [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530
	 [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
	 [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340
	 [<ffffffff802a187c>] __fput+0xcc/0x1b0
	 [<ffffffff802a1ba6>] fput+0x16/0x20
	 [<ffffffff8029e676>] filp_close+0x56/0x90
	 [<ffffffff8029fe0d>] sys_close+0xad/0x110
	 [<ffffffff8020c41e>] system_call+0x7e/0x83

Fix the bug by removing the cancel_dirty_page() call. Tests show that
it causes no bad behaviors on various write sizes.

=== for the patient ===
Here are more detailed demonstrations of the problem.

1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
   and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.

------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# cat > /test/tiny
[T1] hi
[T2] root /home/wfg#

------------------------------ screen 1 ------------------------------
[T1] root /home/wfg# echo /test/tiny > /proc/filecache
[T1] root /home/wfg# cat /proc/filecache
     # file /test/tiny
     # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
     # idx   len     state   refcnt
     0       1       ___UD__Bd_      2
[T2] root /home/wfg# cat /proc/filecache
     # file /test/tiny
     # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
     # idx   len     state   refcnt
     0       1       ___U___Bd_      2

2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied.

------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# echo hi > /tmp/hi
[T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
[T2] hi
[T3] root /home/wfg#

------------------------------ screen 1 ------------------------------
[T1] root /proc/4397# cd /proc/`pidof cp`
[T1] root /proc/4713# cat io
     rchar: 8396
     wchar: 3
     syscr: 20
     syscw: 1
     read_bytes: 0
     write_bytes: 20480
     cancelled_write_bytes: 4096
[T2] root /proc/4713# cat io
     rchar: 8399
     wchar: 6
     syscr: 21
     syscw: 2
     read_bytes: 0
     write_bytes: 24576
     cancelled_write_bytes: 4096

//Question: the 'write_bytes' is a bit more than expected ;-)
Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Reviewed-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

a47a72d3

x86: disable preemption in delay_tsc() · 778b656e

Andrew Morton authored Nov 14, 2007

patch 35d5d08a in mainline.

Marin Mitov points out that delay_tsc() can misbehave if it is preempted and
rescheduled on a different CPU which has a skewed TSC.  Fix it by disabling
preemption.

(I assume that the worst-case behaviour here is a stall of 2^32 cycles)

Cc: Andi Kleen <ak@suse.de>
Cc: Marin Mitov <mitov@issp.bas.bg>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

778b656e