Commits · 316343e2cfd9a4bb4c70d0e1991e7a74840fe29e · linux / linux-davinci

03 Sep, 2008 40 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 316343e2

Linus Torvalds authored Sep 03, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  bnx2x: Accessing un-mapped page
  ath9k: Fix TX control flag use for no ACK and RTS/CTS
  ath9k: Fix TX status reporting
  iwlwifi: fix STATUS_EXIT_PENDING is not set on pci_remove
  iwlwifi: call apm stop on exit
  iwlwifi: fix Tx cmd memory allocation failure handling
  iwlwifi: fix rx_chain computation
  iwlwifi: fix station mimo power save values
  iwlwifi: remove false rxon if rx chain changes
  iwlwifi: fix hidden ssid discovery in passive channels
  iwlwifi: W/A for the TSF correction in IBSS
  netxen: Remove workaround for chipset quirk
  pcnet-cs, axnet_cs: add new IDs, remove dup ID with less info
  ixgbe: initialize interrupt throttle rate
  net/usb/pegasus: avoid hundreds of diagnostics
  tipc: Don't use structure names which easily globally conflict.

316343e2

Merge branch 'davem-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 · fca1287a
David S. Miller authored Sep 03, 2008

fca1287a

bnx2x: Accessing un-mapped page · 437cf2f1

Eilon Greenstein authored Sep 03, 2008

The allocated RX buffer size was 64 bytes bigger than the PCI mapped
size with no good reason. If the packet was actually using the buffer up
to its limit and if the last 64 bytes of the buffer crossed 4KB boundary
then an unmapped PCI page was accessed. The fix is to use only one
parameter for the buffer size - there is no need to differentiate
between the buffer size and the PCI mapping size since the extra 64
bytes can actually be used by the FW to align the Ethernet payload to
64 bytes.

Also updating the driver version and date
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

437cf2f1

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 · 56e9c0a6
David S. Miller authored Sep 03, 2008

56e9c0a6

ath9k: Fix TX control flag use for no ACK and RTS/CTS · 9aab3e3e

Jouni Malinen authored Aug 11, 2008

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

9aab3e3e

ath9k: Fix TX status reporting · 43f30ae0

Jouni Malinen authored Aug 11, 2008

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

43f30ae0

iwlwifi: fix STATUS_EXIT_PENDING is not set on pci_remove · 0b124c31

Gregory Greenman authored Sep 03, 2008

This patch sets STATUS_EXIT_PENDING on pci_remove. Otherwise
iwl4965_down may fail to uninitialize the driver.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

0b124c31

iwlwifi: call apm stop on exit · d535311e

Gregory Greenman authored Sep 03, 2008

This patch calls apm stop on exit and suspend. Without this patch
hardware consumes power even after driver is removed or suspended.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

d535311e

iwlwifi: fix Tx cmd memory allocation failure handling · 73b7d742

Tomas Winkler authored Sep 03, 2008

This patch "iwlwifi: do not use GFP_DMA in iwl_tx_queue_init" removes
GFP_DMA from allocation tx command buffers. GFP_DMA allows allocation
only for memory under 16M which causes allocation problems
suspend/resume flows.

Using kmalloc is temporal solution and some consistent/coherent
allocation schema will be more correct. Since iwlwifi hardware
supports 64bit address this solution should work on x86 (32 and
64bit) for now.

This patch fixes memory freeing problem in the previous patch.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ian Schram <ischram@telenet.be>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

73b7d742

iwlwifi: fix rx_chain computation · 28a6b07a

Tomas Winkler authored Sep 03, 2008

This patch fixes rx_chain computation. The code that adjusts number of
rx chains to number supported by HW was missing. Miss configuration
causes firmware error.  Note: iwlwifi supports HW with up to 3 RX
chains (2x2, 2x3, 1x2, and 3x3 MIMO). This patch also simplifies the
whole RX chain computation.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

28a6b07a

iwlwifi: fix station mimo power save values · 4834c73f

Ron Rindjunsky authored Sep 03, 2008

This patch fixes the wrong use MIMO power save values. Our TX was
configured with our MIMO power save values instead of peer's MIMO power
save values, this may affect connectivity. The peer STA/AP may not sense
our traffic at all as it doesn't have all RX chains opened.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

4834c73f

iwlwifi: remove false rxon if rx chain changes · 54559703

Mohamed Abbas authored Sep 03, 2008

Rx chain might change during power save transitions but it doesn't
require sending Full-ROXN command to the firmware. Full-RXON requires
reconnection to an AP and thus affects user experience. The patch
avoids the Full-RXON by removing the rx_chain modification check in
iwl_full_rxon_required function.

Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

54559703

iwlwifi: fix hidden ssid discovery in passive channels · 0ffe014a

Ron Rindjunsky authored Sep 03, 2008

This enables sending of direct probes on passive channels, as long as
traffic was detected on that channel. This enables connectivity to
hidden/non broadcasting SSIDs APs on passive channels. Note 5000 HW
declares all 5.2 spectrum as passive.
Signed-off-by: Cahill Ben <ben.m.cahill@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

0ffe014a

iwlwifi: W/A for the TSF correction in IBSS · b94d8eea

Assaf Krauss authored Sep 03, 2008

This patch is a W/A for the TSF sync issue in IBSS merging. HW is not
capable to sync TSF (it's constantly little behind). This creates
constant IBSS merging upon reception of each beacon, adding and removing
station which in turn creates above 50% packet loss and thus dramatically
degrade the throughput. The W/A simply stops the driver from declaring it
has a reliable TSF value and thus eliminates IBSS merging.
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

b94d8eea

Split up PIT part of TSC calibration from native_calibrate_tsc · ec0c15af

Linus Torvalds authored Sep 03, 2008

The TSC calibration function is still very complicated, but this makes
it at least a little bit less so by moving the PIT part out into a
helper function of its own.
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>

ec0c15af

netxen: Remove workaround for chipset quirk · 0b62afb4

Dhananjay Phadke authored Aug 27, 2008

Remove chipset-specific quirk workaround; the workaround caused
unrecoverable DMA lockups when the driver was loaded following a
PXE boot.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

0b62afb4

pcnet-cs, axnet_cs: add new IDs, remove dup ID with less info · 2dcc9ff7

Komuro authored Aug 30, 2008

pcnet_cs:
    add new ID: "corega Ether PCC-TD".
    remove duplicate ID: "IC-CARD".

axnet_cs:
    add new ID: "IO DATA ETXPCM".
Signed-off-by: Komuro <komurojun-mbn@nifty.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

2dcc9ff7

ixgbe: initialize interrupt throttle rate · 15e79f24

Andy Gospodarek authored Aug 27, 2008

This commit dropped the setting of the default interrupt throttle rate.

commit 021230d4
Author: Ayyappan Veeraiyan <ayyappan.veeraiyan@intel.com>
Date:   Mon Mar 3 15:03:45 2008 -0800

    ixgbe: Introduce MSI-X queue vector code

The following patch adds it back.  Without this the default value of 0
causes the performance of this card to be awful.  Restoring these to the
default values yields much better performance.

This regression has been around since 2.6.25.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: stable@kernel.org [2.6.25 and later]
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

15e79f24

net/usb/pegasus: avoid hundreds of diagnostics · e000ea13

David Brownell authored Sep 02, 2008

Make the "pegasus" driver scream less loudly in the face of
problems as it initializes, avoiding hundreds of messages:

 - ratelimit some key error messages
 - avoid some spurious diagnostics caused by strange codeflow

And fix one instance of goofy indentation.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

e000ea13

tipc: Don't use structure names which easily globally conflict. · 6c00055a

David S. Miller authored Sep 02, 2008

Andrew Morton reported a build failure on sparc32, because TIPC
uses names like "struct node" and there is a like named data
structure defined in linux/node.h

This just regexp replaces "struct node*" to "struct tipc_node*"
to avoid this and any future similar problems.
Signed-off-by: David S. Miller <davem@davemloft.net>

6c00055a

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · d26acd92

Linus Torvalds authored Sep 02, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ipsec: Fix deadlock in xfrm_state management.
  ipv: Re-enable IP when MTU > 68
  net/xfrm: Use an IS_ERR test rather than a NULL test
  ath9: Fix ath_rx_flush_tid() for IRQs disabled kernel warning message.
  ath9k: Incorrect key used when group and pairwise ciphers are different.
  rt2x00: Compiler warning unmasked by fix of BUILD_BUG_ON
  mac80211: Fix debugfs union misuse and pointer corruption
  wireless/libertas/if_cs.c: fix memory leaks
  orinoco: Multicast to the specified addresses
  iwlwifi: fix 64bit platform firmware loading
  iwlwifi: fix apm_stop (wrong bit polarity for FLAG_INIT_DONE)
  iwlwifi: workaround interrupt handling no some platforms
  iwlwifi: do not use GFP_DMA in iwl_tx_queue_init
  net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS
  net: Unbreak userspace usage of linux/mroute.h
  pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock()
  ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0

d26acd92

[x86] Fix TSC calibration issues · fbb16e24

Thomas Gleixner authored Sep 03, 2008

Larry Finger reported at http://lkml.org/lkml/2008/9/1/90:
An ancient laptop of mine started throwing errors from b43legacy when
I started using 2.6.27 on it. This has been bisected to commit bfc0f594
"x86: merge tsc calibration".

The unification of the TSC code adopted mostly the 64bit code, which
prefers PMTIMER/HPET over the PIT calibration.

Larrys system has an AMD K6 CPU. Such systems are known to have
PMTIMER incarnations which run at double speed. This results in a
miscalibration of the TSC by factor 0.5. So the resulting calibrated
CPU/TSC speed is half of the real CPU speed, which means that the TSC
based delay loop will run half the time it should run. That might
explain why the b43legacy driver went berserk.

On the other hand we know about systems, where the PIT based
calibration results in random crap due to heavy SMI/SMM
disturbance. On those systems the PMTIMER/HPET based calibration logic
with SMI detection shows better results.

According to Alok also virtualized systems suffer from the PIT
calibration method.

The solution is to use a more wreckage aware aproach than the current
either/or decision.

1) reimplement the retry loop which was dropped from the 32bit code
during the merge. It repeats the calibration and selects the lowest
frequency value as this is probably the closest estimate to the real
frequency

2) Monitor the delta of the TSC values in the delay loop which waits
for the PIT counter to reach zero. If the maximum value is
significantly different from the minimum, then we have a pretty safe
indicator that the loop was disturbed by an SMI.

3) keep the pmtimer/hpet reference as a backup solution for systems
where the SMI disturbance is a permanent point of failure for PIT
based calibration

4) do the loop iteration for both methods, record the lowest value and
decide after all iterations finished.

5) Set a clear preference to PIT based calibration when the result
makes sense.

The implementation does the reference calibration based on
HPET/PMTIMER around the delay, which is necessary for the PIT anyway,
but keeps separate TSC values to ensure the "independency" of the
resulting calibration values.

Tested on various 32bit/64bit machines including Geode 266Mhz, AMD K6
(affected machine with a double speed pmtimer which I grabbed out of
the dump), Pentium class machines and AMD/Intel 64 bit boxen.
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fbb16e24

ipsec: Fix deadlock in xfrm_state management. · 37b08e34

David S. Miller authored Sep 02, 2008

Ever since commit 4c563f76
("[XFRM]: Speed up xfrm_policy and xfrm_state walking") it is
illegal to call __xfrm_state_destroy (and thus xfrm_state_put())
with xfrm_state_lock held.  If we do, we'll deadlock since we
have the lock already and __xfrm_state_destroy() tries to take
it again.

Fix this by pushing the xfrm_state_put() calls after the lock
is dropped.
Signed-off-by: David S. Miller <davem@davemloft.net>

37b08e34

drivers/char/random.c: fix a race which can lead to a bogus BUG() · 8b76f46a

Andrew Morton authored Sep 02, 2008

Fix a bug reported by and diagnosed by Aaron Straus.

This is a regression intruduced into 2.6.26 by

    commit adc782da
    Author: Matt Mackall <mpm@selenic.com>
    Date:   Tue Apr 29 01:03:07 2008 -0700

        random: simplify and rename credit_entropy_store

credit_entropy_bits() does:

	spin_lock_irqsave(&r->lock, flags);
	...
	if (r->entropy_count > r->poolinfo->POOLBITS)
		r->entropy_count = r->poolinfo->POOLBITS;

so there is a time window in which this BUG_ON():

static size_t account(struct entropy_store *r, size_t nbytes, int min,
		      int reserved)
{
	unsigned long flags;

	BUG_ON(r->entropy_count > r->poolinfo->POOLBITS);

	/* Hold lock while accounting */
	spin_lock_irqsave(&r->lock, flags);

can trigger.

We could fix this by moving the assertion inside the lock, but it seems
safer and saner to revert to the old behaviour wherein
entropy_store.entropy_count at no time exceeds
entropy_store.poolinfo->POOLBITS.
Reported-by: Aaron Straus <aaron@merfinllc.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <stable@kernel.org>		[2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8b76f46a

pm_qos_requirement might sleep · 9d359357

John Kacur authored Sep 02, 2008

Make PM_QOS and CPU_IDLE play nicer when run with the RT-Preempt kernel.

The purpose of the patch is to remove the spin_lock around the read in the
function pm_qos_requirement - since spinlocks can sleep in -rt and this
function is called from idle.

CPU_IDLE polls the target_value's of some of the pm_qos parameters from
the idle loop causing sleeping locking warnings.  Changing the
target_value to an atomic avoids this issue.

Remove the spinlock in pm_qos_requirement by making target_value an atomic
type.
Signed-off-by: mark gross <mgross@linux.intel.com>
Signed-off-by: John Kacur <jkacur@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9d359357

rtc-cmos: wake again from S5 · 74c4633d

Rafael J. Wysocki authored Sep 02, 2008

Update rtc-cmos shutdown handling to leave RTC alarms active, resolving
http://bugzilla.kernel.org/show_bug.cgi?id=11411 on several boards.  There
are still some systems where the ACPI event handling doesn't cooperate.
(Possibly related to bugid 11312, reporting the spontaneous disabling of
RTC events.)

Bug 11411 reported that changes to work around some ACPI event issues
broke wake-from-S5 handling, as used for DVR applications.  (They like to
power off, then wake later to record programs.)

[yakui.zhao@intel.com: add shutdown for PNP devices]
[dbrownell@users.sourceforge.net: update comments]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Stefan Bauer <stefan.bauer@cs.tu-chemnitz.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

74c4633d

sysfs: document files in /sys/firmware/sgi_uv/ · 8b3a8944

Russ Anderson authored Sep 02, 2008

Document files in /sys/firmware/sgi_uv/.
Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8b3a8944

ibft: fix target info parsing in ibft module · bb8fb4e6

Mike Christie authored Sep 02, 2008

I got this patch through Red Hat's bugzilla from the bug submitter and
patch creator.  I have just fixed it up so it applies without fuzz to
upstream kernels.

Original patch and description from Shyam kumar Iyer:

The issue [ibft module not displaying targets with short names] is because
of an offset calculatation error in the iscsi_ibft.c code.  Due to this
error directory structure for the target in /sys/firmware/ibft does not
get created and so the initiator is unable to connect to the target.

Note that this bug surfaced only with an name that had a short section at
the end.  eg: "iqn.1984-05.com.dell:dell".  It did not surface when the
iqn's had a longer section at the end.  eg:
"iqn.2001-04.com.example:storage.disk2.sys1.xyz"

So, the eot_offset was calculated such that an extra 48 bytes i.e.  the
size of the ibft_header which has already been accounted was subtracted
twice.

This was not evident with longer iqn names because they would overshoot
the total ibft length more than 48 bytes and thus would escape the bug.
Signed-off-by: Shyam Kumar Iyer <shyam_iyer@dell.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Konrad Rzeszutek <konrad@virtualiron.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bb8fb4e6

rtc_time_to_tm: fix signed/unsigned arithmetic · 73442daf

Jan Altenberg authored Sep 02, 2008

commit 945185a6 ("rtc: rtc_time_to_tm: use
unsigned arithmetic") changed the some types in rtc_time_to_tm() to
unsigned:

 void rtc_time_to_tm(unsigned long time, struct rtc_time *tm)
 {
-       register int days, month, year;
+       unsigned int days, month, year;

This doesn't work for all cases, because days is checked for < 0 later
on:

if (days < 0) {
	year -= 1;
	days += 365 + LEAP_YEAR(year);
}

I think the correct fix would be to keep days signed and do an appropriate
cast later on.
Signed-off-by: Jan Altenberg <jan.altenberg@linutronix.de>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

73442daf

tdfxfb: fix frame buffer name overrun · b4a49b12

Krzysztof Helt authored Sep 02, 2008

If there are more then one graphics card handled by the tdfxfb driver the
name of the frame buffer overruns reserved size.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b4a49b12

tdfxfb: fix SDRAM memory size detection · bf6910c0

Krzysztof Helt authored Sep 02, 2008

Fix memory detection on Voodoo3 cards with SDRAM memory.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bf6910c0

hp-wmi: add proper hotkey support · a8823aef

Matthew Garrett authored Sep 02, 2008

It turns out that event 0x4 merely indcates that a hotkey has been
pressed, not which one.  A further query is required in order to determine
the actual keypress.  The following patch adds support for that along with
the known keycodes.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a8823aef

hp-wmi: update to match current rfkill semantics · 3f6e2f13

Matthew Garrett authored Sep 02, 2008

hp-wmi currently changes the RFKill state by altering the struct members
rather than using the dedicated interface, meaning that update events
won't be pushed to userspace.  This patch fixes that, along with fixing
the declared type of the WWAN kill switch.  It also ensures that rfkill
interfaces are only registered for hardware that exists.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Ivo van Doorn <ivdoorn@gmail.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3f6e2f13

ipc: document the new auto_msgmni proc file · 61e55d05

Nadia Derbey authored Sep 02, 2008

Update Documentation/filesystems/proc.txt: it describes the file
auto_msgmni intoduced to enable/disable msgmni automatic recomputing upon
memory add/remove (see thread http://lkml.org/lkml/2008/7/4/27).  Also
added a description for msgmni (this filex is only listed in
Documentation/sysctl/kernel.txt).
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

61e55d05

mm: size of quicklists shouldn't be proportional to the number of CPUs · b9541852

KOSAKI Motohiro authored Sep 02, 2008

Quicklists store pages for each CPU as caches.  (Each CPU can cache
node_free_pages/16 pages)

It is used for page table cache.  exit() will increase the cache size,
while fork() consumes it.

So for example if an apache-style application runs (one parent and many
child model), one CPU process will fork() while another CPU will process
the middleware work and exit().

At that time, the CPU on which the parent runs doesn't have page table
cache at all.  Others (on which children runs) have maximum caches.

	QList_max = (#ofCPUs - 1) x Free / 16
	=> QList_max / (Free + QList_max) = (#ofCPUs - 1) / (16 + #ofCPUs - 1)

So, How much quicklist memory is used in the maximum case?

This is proposional to # of CPUs because the limit of per cpu quicklist
cache doesn't see the number of cpus.

Above calculation mean

	 Number of CPUs per node            2    4    8   16
	 ==============================  ====================
	 QList_max / (Free + QList_max)   5.8%  16%  30%  48%

Wow! Quicklist can spend about 50% memory at worst case.

My demonstration program is here
--------------------------------------------------------------------------------
#define _GNU_SOURCE

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define BUFFSIZE 512

int max_cpu(void)	/* get max number of logical cpus from /proc/cpuinfo */
{
  FILE *fd;
  char *ret, buffer[BUFFSIZE];
  int cpu = 1;

  fd = fopen("/proc/cpuinfo", "r");
  if (fd == NULL) {
    perror("fopen(/proc/cpuinfo)");
    exit(EXIT_FAILURE);
  }
  while (1) {
    ret = fgets(buffer, BUFFSIZE, fd);
    if (ret == NULL)
      break;
    if (!strncmp(buffer, "processor", 9))
      cpu = atoi(strchr(buffer, ':') + 2);
  }
  fclose(fd);
  return cpu;
}

void cpu_bind(int cpu)	/* bind current process to one cpu */
{
  cpu_set_t mask;
  int ret;

  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);
  ret = sched_setaffinity(0, sizeof(mask), &mask);
  if (ret == -1) {
    perror("sched_setaffinity()");
    exit(EXIT_FAILURE);
  }
  sched_yield();	/* not necessary */
}

#define MMAP_SIZE (10 * 1024 * 1024)	/* 10 MB */
#define FORK_INTERVAL 1	/* 1 second */

main(int argc, char *argv[])
{
  int cpu_max, nextcpu;
  long pagesize;
  pid_t pid;

  /* set max number of logical cpu */
  if (argc > 1)
    cpu_max = atoi(argv[1]) - 1;
  else
    cpu_max = max_cpu();

  /* get the page size */
  pagesize = sysconf(_SC_PAGESIZE);
  if (pagesize == -1) {
    perror("sysconf(_SC_PAGESIZE)");
    exit(EXIT_FAILURE);
  }

  /* prepare parent process */
  cpu_bind(0);
  nextcpu = cpu_max;

loop:

  /* select destination cpu for child process by round-robin rule */
  if (++nextcpu > cpu_max)
    nextcpu = 1;

  pid = fork();

  if (pid == 0) { /* child action */

    char *p;
    int i;

    /* consume page tables */
    p = mmap(0, MMAP_SIZE, PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
    i = MMAP_SIZE / pagesize;
    while (i-- > 0) {
      *p = 1;
      p += pagesize;
    }

    /* move to other cpu */
    cpu_bind(nextcpu);
/*
    printf("a child moved to cpu%d after mmap().\n", nextcpu);
    fflush(stdout);
 */

    /* back page tables to pgtable_quicklist */
    exit(0);

  } else if (pid > 0) { /* parent action */

    sleep(FORK_INTERVAL);
    waitpid(pid, NULL, WNOHANG);

  }

  goto loop;
}
----------------------------------------

When above program which does task migration runs, my 8GB box spends
800MB of memory for quicklist.  This is not memory leak but doesn't seem
good.

% cat /proc/meminfo

MemTotal:        7701568 kB
MemFree:         4724672 kB
(snip)
Quicklists:       844800 kB

because

- My machine spec is
	number of numa node: 2
	number of cpus:      8 (4CPU x2 node)
        total mem:           8GB (4GB x2 node)
        free mem:            about 5GB

- Then, 4.7GB x 16% ~= 880MB.
  So, Quicklist can use 800MB.

So, if following spec machine run that program

   CPUs: 64 (8cpu x 8node)
   Mem:  1TB (128GB x8node)

Then, quicklist can waste 300GB (= 1TB x 30%).  It is too large.

So, I don't like cache policies which is proportional to # of cpus.

My patch changes the number of caches
from:
   per-cpu-cache-amount = memory_on_node / 16
to
   per-cpu-cache-amount = memory_on_node / 16 / number_of_cpus_on_node.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Tested-by: David Miller <davem@davemloft.net>
Acked-by: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b9541852

mm: show quicklist usage in /proc/meminfo · 4b856152

KOSAKI Motohiro authored Sep 02, 2008

Quicklists can consume several GB of memory.  We should provide a means of
monitoring this.

After this patch is applied, /proc/meminfo will output the following:

% cat /proc/meminfo

MemTotal:      7715392 kB
MemFree:       5401600 kB
Buffers:         80384 kB
Cached:         300800 kB
SwapCached:          0 kB
Active:         235584 kB
Inactive:       262656 kB
SwapTotal:     2031488 kB
SwapFree:      2031488 kB
Dirty:            3520 kB
Writeback:           0 kB
AnonPages:      117696 kB
Mapped:          38528 kB
Slab:          1589952 kB
SReclaimable:    23104 kB
SUnreclaim:    1566848 kB
PageTables:      14656 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:   5889152 kB
Committed_AS:   393152 kB
VmallocTotal: 17592177655808 kB
VmallocUsed:     29056 kB
VmallocChunk: 17592177626432 kB
Quicklists:     130944 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:    262144 kB
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4b856152

devcgroup: fix race against rmdir() · 36fd71d2

Li Zefan authored Sep 02, 2008

During the use of a dev_cgroup, we should guarantee the corresponding
cgroup won't be deleted (i.e.  via rmdir).  This can be done through
css_get(&dev_cgroup->css), but here we can just get and use the dev_cgroup
under rcu_read_lock.

And also remove checking NULL dev_cgroup, it won't be NULL since a task
always belongs to a cgroup.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

36fd71d2

cirrusfb: check_par fixes · 09a2910e

Krzysztof Helt authored Sep 02, 2008

1. Check if virtual resolution fits into memory.
   Otherwise, Linux hangs during panning.
2. When selected use all available memory to
    maximize yres_virtual to speed up panning
   (previously also xres_virtual was increased).
3. Simplify memory restriction calculations.
Signed-off-by: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09a2910e

pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits · 950bbabb

Oleg Nesterov authored Sep 02, 2008

We don't change pid_ns->child_reaper when the main thread of the
subnamespace init exits.  As Robert Rex <robert.rex@exasol.com> pointed
out this is wrong.

Yes, the re-parenting itself works correctly, but if the reparented task
exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the
main thread is zombie its ->nsproxy was already cleared by
exit_task_namespaces().

Introduce the new function, find_new_reaper(), which finds the new
->parent for the re-parenting and changes ->child_reaper if needed.  Kill
the now unneeded exit_child_reaper().

Also move the changing of ->child_reaper from zap_pid_ns_processes() to
find_new_reaper(), this consolidates the games with ->child_reaper and
makes it stable under tasklist_lock.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11391Reported-by: Robert Rex <robert.rex@exasol.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

950bbabb

pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing · add0d4df

Oleg Nesterov authored Sep 02, 2008

zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.

Yes, we have already killed all tasks in this namespace, and sys_wait4()
doesn't see any child. But this doesn't mean ->children list is empty, we
may have EXIT_DEAD tasks which are not visible to do_wait(). In that case
the subsequent forget_original_parent() will crash the kernel because it
will try to re-parent these tasks to the NULL reaper.

Even if there are no childs, it is not good that forget_original_parent()
uses reaper == NULL.

Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
We could use pid_ns->parent->child_reaper as well, I think this does not
really matter. These EXIT_DEAD tasks are not visible to the new ->parent
after re-parenting, they will silently do release_task() eventually.

Note that we must change ->child_reaper, otherwise
forget_original_parent() will use reaper == father, and in that case we
will hit the (correct) BUG_ON(!list_empty(&father->children)).
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

add0d4df