Commits · d2a28ad9fa7bf16761d070d8a3338375e1574b32 · linux / linux-davinci-2.6.23

24 Mar, 2006 1 commit

[IA64] MCA recovery: kernel context recovery table · d2a28ad9

Russ Anderson authored Mar 24, 2006

Memory errors encountered by user applications may surface
when the CPU is running in kernel context.  The current code
will not attempt recovery if the MCA surfaces in kernel
context (privilege mode 0).  This patch adds a check for cases
where the user initiated the load that surfaces in kernel
interrupt code.

An example is a user process launching a load from memory
and the data in memory had bad ECC.  Before the bad data
gets to the CPU register, an interrupt comes in.  The
code jumps to the IVT interrupt entry point and begins
execution in kernel context.  The process of saving the
user registers (SAVE_REST) causes the bad data to be loaded
into a CPU register, triggering the MCA.  The MCA surfaces in
kernel context, even though the load was initiated from
user context.

As suggested by David and Tony, this patch uses an exception
table-like approach, putting the tagged recovery addresses in
a searchable table.  One difference from the exception table
is that MCAs do not surface in precise places (such as with
a TLB miss), so instead of tagging specific instructions,
address ranges are registered.  A single macro is used to do
the tagging, with the input parameter being the label
of the starting address and the macro's own location being
the ending address.  This limits clutter in the code.
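
The range lookup described above can be sketched in user-space C as
follows.  This is only an illustration: the struct name, the table
contents, and the addresses are hypothetical and not taken from the
patch, where the table is populated by the tagging macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one tagged [start, end) code range. */
struct recovery_range {
    uintptr_t start;   /* label passed to the tagging macro */
    uintptr_t end;     /* address where the macro was placed */
};

/* Illustrative contents, sorted by start address and non-overlapping. */
static const struct recovery_range recovery_table[] = {
    { 0x1000, 0x1040 },
    { 0x2000, 0x2100 },
};

/* Binary search: return 1 if ip falls inside any tagged range, else 0. */
int ip_in_recovery_range(uintptr_t ip)
{
    size_t lo = 0;
    size_t hi = sizeof(recovery_table) / sizeof(recovery_table[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ip < recovery_table[mid].start)
            hi = mid;
        else if (ip >= recovery_table[mid].end)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

Because a range is matched rather than a single instruction, an MCA
that surfaces anywhere between the label and the macro is found by
the same lookup.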

This patch only tags one spot, the interrupt ivt entry.
Testing showed that spot to be a "heavy hitter" with
MCAs surfacing while saving user registers.  Other spots
can be added as needed by adding a single macro.

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>

23 Mar, 2006 7 commits

IA64: Use early_parm to handle mvec_name and nomca · a5b00bb4

Horms authored Mar 23, 2006

I'm not sure of the worthiness of this idea, so please consider it an RFC.
Its key merits are:

* Reuse existing infrastructure
* Greatly tightens up the parsing of nomca
* Greatly simplifies the parsing of machvec

Additional cleanup (moving setup_mvec() to machvec.c) by Ken Chen.
Signed-Off-By: Horms <horms@verge.net.au>
Signed-Off-By: Tony Luck <tony.luck@intel.com>

[IA64] move patchlist and machvec into init section · 39e18de8

Chen, Kenneth W authored Mar 12, 2006

ia64_mv is initialized based on the platform detected or specified.
However, there is one instantiation of each platform type.  We
don't expect to switch platform vectors during run time.  Move
those platform-specific types into the init section, since a copy
is made into the global ia64_mv at initialization.

Also move the instruction patch list into the init section.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] add init declaration - nolwsys · 03906ea0

Chen, Kenneth W authored Mar 12, 2006

Add __initdata to nolwsys.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] add init declaration - gate page functions · 914a4ea4

Chen, Kenneth W authored Mar 12, 2006

Add init declaration to a bunch of patch functions and the gate
page setup function.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] add init declaration to memory initialization functions · dae28066

Chen, Kenneth W authored Mar 22, 2006

Add init declaration to variables/functions used for memory
initialization.  I don't think they would clash with memory
hotplug.  If they do, please yell.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] add init declaration to cpu initialization functions · 244fd545

Chen, Kenneth W authored Mar 12, 2006

Add init declaration to cpu initialization functions.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] add __init declaration to mca functions · 0881fc8d

Chen, Kenneth W authored Mar 12, 2006

Mark init-related variables and functions in the mca code with the
appropriate __init* declarations.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

22 Mar, 2006 32 commits

[IA64] Ignore disabled Local SAPIC Affinity Structure in SRAT · d903cea3

Kenji Kaneshige authored Mar 15, 2006

According to the ACPI spec, the OSPM must ignore the contents of the
Processor Local APIC/SAPIC Affinity Structure in System Resource
Affinity Table (SRAT) if its enable flag is cleared. However, ia64
Linux refers to all of the Processor Local APIC/SAPIC Affinity Structures
in SRAT regardless of the enable flag. This is obviously against the
ACPI spec. This patch fixes this bug.
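
A minimal sketch of the rule, assuming a simplified entry layout.  The
struct and function names below are hypothetical and do not match the
kernel's ACPI definitions; only the idea of skipping entries whose
enabled bit (bit 0 of the flags field) is clear comes from the text
above.

```c
#include <stddef.h>
#include <stdint.h>

#define SRAT_ENTRY_ENABLED 0x1u   /* bit 0 of the flags field */

/* Simplified, hypothetical stand-in for an SRAT processor affinity entry. */
struct srat_cpu_affinity_sketch {
    uint8_t  proximity_domain;
    uint8_t  apic_id;
    uint32_t flags;
};

/* Count the entries the OS should honor; disabled entries are skipped. */
size_t count_enabled_entries(const struct srat_cpu_affinity_sketch *e,
                             size_t n)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(e[i].flags & SRAT_ENTRY_ENABLED))
            continue;   /* contents must be ignored when not enabled */
        used++;
    }
    return used;
}
```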
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] sn_check_intr: use ia64_get_irr() · 9a4e5549

Bjorn Helgaas authored Mar 21, 2006

Use the recently-added ia64_get_irr() rather than duplicating the code.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] fix ia64 is_hugepage_only_range · 2332c9ae

Chen, Kenneth W authored Mar 22, 2006

fix is_hugepage_only_range() definition to be "overlaps"
instead of "within architectural restricted hugetlb address
range".  Simplify the ia64 specific code that used to use
is_hugepage_only_range() to just check which region the
address is in.
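
The difference between the two definitions can be sketched as below.
The window bounds and function names are assumptions chosen for
illustration, not the ia64 implementation, and the sketch ignores
overflow of addr + len.

```c
#include <stdint.h>

/* Hypothetical reserved window [HUGE_START, HUGE_END) for hugepages. */
#define HUGE_START 0x8000000000000000ULL
#define HUGE_END   0xa000000000000000ULL

/* "Within": the whole range [addr, addr + len) lies inside the window. */
int range_within_huge(uint64_t addr, uint64_t len)
{
    return addr >= HUGE_START && addr + len <= HUGE_END;
}

/* "Overlaps": at least one byte of the range lies inside the window. */
int range_overlaps_huge(uint64_t addr, uint64_t len)
{
    return addr < HUGE_END && addr + len > HUGE_START;
}
```

A range that straddles the lower bound is rejected by the "within"
test but caught by the "overlaps" test.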
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/perex/alsa · 1c2e0275

Linus Torvalds authored Mar 22, 2006

* git://git.kernel.org/pub/scm/linux/kernel/git/perex/alsa: (124 commits)
  [ALSA] version 1.0.11rc4
  [PATCH] Intruduce DMA_28BIT_MASK
  [ALSA] hda-codec - Add support for ASUS P4GPL-X
  [ALSA] hda-codec - Add support for HP nx9420 laptop
  [ALSA] Fix memory leaks in error path of control.c
  [ALSA] AMD Au1x00: AC'97 controller is memory mapped
  [ALSA] AMD Au1x00: fix DMA init/cleanup
  [ALSA] hda-codec - Fix generic auto-configurator
  [ALSA] hda-codec - Fix BIOS auto-configuration
  [ALSA] Fixes typos in Audiophile-USB.txt
  [ALSA] ice1712 - typo fixes for dxr_enable module option
  [ALSA] AMD Au1x00: make driver build after cleanup
  [ALSA] ice1712 - Fix wrong value types for enum items
  [ALSA] fix resource leak in usbmixer
  [ALSA] Fix gus_pcm dereference before NULL
  [ALSA] Fix seq_clientmgr dereferences before NULL check
  [ALSA] hda-codec - Fix for Samsung R65 and ASUS A6J
  [ALSA] hda-codec - Add support for VAIO FE550G and SZ110
  [ALSA] usb-audio: add Maya44 mixer control names
  [ALSA] usb-audio: add Casio PL-40R support
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial · 8b4b6707

Linus Torvalds authored Mar 22, 2006

* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  fixed path to moved file in include/linux/device.h
  Fix spelling in E1000_DISABLE_PACKET_SPLIT Kconfig description
  Documentation/dvb/get_dvb_firmware: fix firmware URL
  Documentation: Update to BUG-HUNTING
  Remove superfluous NOTIFY_COOKIE_LEN define
  add "tags" to .gitignore
  Fix "frist", "fisrt", typos
  fix rwlock usage example
  It's UTF-8

Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 · d04ef3a7

Linus Torvalds authored Mar 22, 2006

* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Add a secondary TSB for hugepage mappings.
  [SPARC]: Respect vm_page_prot in io_remap_page_range().

Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 · 36177ba6

Linus Torvalds authored Mar 22, 2006

* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [TG3]: Bump driver version and reldate.
  [TG3]: Skip phy power down on some devices
  [TG3]: Fix SRAM access during tg3_init_one()
  [X25]: dte facilities 32 64 ioctl conversion
  [X25]: allow ITU-T DTE facilities for x25
  [X25]: fix kernel error message 64 bit kernel
  [X25]: ioctl conversion 32 bit user to 64 bit kernel
  [NET]: socket timestamp 32 bit handler for 64 bit kernel
  [NET]: allow 32 bit socket ioctl in 64 bit kernel
  [BLUETOOTH]: Return negative error constant

Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 · 2152f853

Linus Torvalds authored Mar 22, 2006

* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (138 commits)
  [SCSI] libata: implement minimal transport template for ->eh_timed_out
  [SCSI] eliminate rphy allocation in favour of expander/end device allocation
  [SCSI] convert mptsas over to end_device/expander allocations
  [SCSI] allow displaying and setting of cache type via sysfs
  [SCSI] add scsi_mode_select to scsi_lib.c
  [SCSI] 3ware 9000 add big endian support
  [SCSI] qla2xxx: update MAINTAINERS
  [SCSI] scsi: move target_destroy call
  [SCSI] fusion - bump version
  [SCSI] fusion - expander hotplug suport in mptsas module
  [SCSI] fusion - exposing raid components in mptsas
  [SCSI] fusion - memory leak, and initializing fields
  [SCSI] fusion - exclosure misspelled
  [SCSI] fusion - cleanup mptsas event handling functions
  [SCSI] fusion - removing target_id/bus_id from the VirtDevice structure
  [SCSI] fusion - static fix's
  [SCSI] fusion - move some debug firmware event debug msgs to verbose level
  [SCSI] fusion - loginfo header update
  [SCSI] add scsi_reprobe_device
  [SCSI] megaraid_sas: fix extended timeout handling
  ...

[PATCH] SELinux: add slab cache for inode security struct · 7cae7e26

James Morris authored Mar 22, 2006

Add a slab cache for the SELinux inode security struct, one of which is
allocated for every inode instantiated by the system.

The memory savings are considerable.

On 64-bit, instead of the size-128 cache, we have a slab object of 96
bytes, saving 32 bytes per object.  After booting, I see about 4000 of
these and then about 17,000 after a kernel compile.  With this patch, we
save around 530KB of kernel memory in the latter case.  On 32-bit, the
savings are about half of this.
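
The 64-bit figure can be checked with a small helper (the function
name is hypothetical): 17,000 objects times (128 - 96) bytes is
544,000 bytes, or about 531 KiB, which agrees with "around 530KB".

```c
/* Bytes saved when `count` objects move from a cache of old_size-byte
 * objects to a dedicated cache of new_size-byte objects. */
unsigned long slab_savings_bytes(unsigned long count,
                                 unsigned long old_size,
                                 unsigned long new_size)
{
    return count * (old_size - new_size);
}
```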
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] SELinux: cleanup stray variable in selinux_inode_init_security() · cf01efd0

James Morris authored Mar 22, 2006

Remove an unneeded pointer variable in selinux_inode_init_security().
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] SELinux: fix hard link count for selinuxfs root directory · edb20fb5

James Morris authored Mar 22, 2006

A further fix is needed for selinuxfs link count management, to ensure that
the count is correct for the parent directory when a subdirectory is
created.  This is only required for the root directory currently, but the
code has been updated for the general case.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] selinuxfs cleanups: sel_make_avc_files · d6aafa65

James Morris authored Mar 22, 2006

Fix copy & paste error in sel_make_avc_files(), removing a spurious call to
d_genocide() in the error path.  All of this will be cleaned up by
kill_litter_super().
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] selinuxfs cleanups: sel_make_bools · 253a8b1d

James Morris authored Mar 22, 2006

Remove the call to sel_make_bools() from sel_fill_super(), as policy needs to
be loaded before the boolean files can be created.  Policy will never be
loaded during sel_fill_super() as selinuxfs is kernel mounted during init and
the only means to load policy is via selinuxfs.

Also, the call to d_genocide() on the error path of sel_make_bools() is
incorrect and replaced with sel_remove_bools().
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] selinuxfs cleanups: sel_fill_super exit path · 161ce45a

James Morris authored Mar 22, 2006

Unify the error path of sel_fill_super() so that all errors pass through the
same point and generate an error message.  Also, remove a spurious dput() in
the error path which breaks the refcounting for the filesystem
(kill_litter_super() will correctly clean things up itself on error).
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] selinuxfs cleanups: use sel_make_dir() · cde174a8

James Morris authored Mar 22, 2006

Use existing sel_make_dir() helper to create booleans directory rather than
duplicating the logic.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] selinuxfs cleanups: fix hard link count · 40e906f8

James Morris authored Mar 22, 2006

Fix the hard link count for selinuxfs directories, which are currently one
short.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] selinux: simplify sel_read_bool · 68bdcf28

Stephen Smalley authored Mar 22, 2006

Simplify sel_read_bool to use the simple_read_from_buffer helper, like the
other selinuxfs functions.
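
For readers unfamiliar with the helper, the read semantics it provides
can be approximated in user space as below.  This is a stand-in, not
the kernel code: the function name is hypothetical, and memcpy() takes
the place of the copy to user memory.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Copy up to `count` bytes from `from` (of size `available`) starting at
 * *ppos, advance *ppos, and return the number of bytes copied. */
ssize_t read_from_buffer_sketch(void *to, size_t count, off_t *ppos,
                                const void *from, size_t available)
{
    off_t pos = *ppos;

    if (pos < 0)
        return -EINVAL;
    if ((size_t)pos >= available)
        return 0;                           /* end of buffer */
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;    /* clamp short reads */
    memcpy(to, (const char *)from + pos, count);
    *ppos = pos + (off_t)count;
    return (ssize_t)count;
}
```

A read handler then only has to format its value into a small buffer
and hand that buffer to the helper, instead of tracking offsets
itself.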
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sem2mutex: security/ · bb003079

Ingo Molnar authored Mar 22, 2006

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Cc: James Morris <jmorris@namei.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] selinux: Disable automatic labeling of new inodes when no policy is loaded · 8aad3875

Stephen Smalley authored Mar 22, 2006

This patch disables the automatic labeling of new inodes on disk
when no policy is loaded.

Discussion is here:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180296

In short, we're changing the behavior so that when no policy is loaded,
SELinux does not label files at all.  Currently it does add an 'unlabeled'
label in this case, which we've found causes problems later.

SELinux always maintains a safe internal label if there is none, so with this
patch, we just stick with that and wait until a policy is loaded before adding
a persistent label on disk.

The effect is simply that if you boot with SELinux enabled but no policy
loaded and create a file in that state, SELinux won't try to set a security
extended attribute on the new inode on the disk.  This is the only sane
behavior for SELinux in that state, as it cannot determine the right label to
assign in the absence of a policy.  That state usually doesn't occur, but the
rawhide installer seemed to be misbehaving temporarily so it happened to show
up on a test install.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] page migration reorg · b20a3503

Christoph Lameter authored Mar 22, 2006

Centralize the page migration functions in anticipation of additional
tinkering.  Creates a new file mm/migrate.c

1. Extract buffer_migrate_page() from fs/buffer.c

2. Extract central migration code from vmscan.c

3. Extract some components from mempolicy.c

4. Export pageout() and remove_from_swap() from vmscan.c

5. Make it possible to configure NUMA systems without page migration
   and non-NUMA systems with page migration.

I had to do some #ifdeffing in mempolicy.c that may need a cleanup.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] mm: slab cache interleave rotor fix · 442295c9

Paul Jackson authored Mar 22, 2006

The alien cache rotor in mm/slab.c assumes that the first online node is
node 0.  Eventually for some archs, especially with hotplug, this will no
longer be true.

Fix the interleave rotor to handle the general case of node numbering.
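
A sketch of a rotor that does not assume node 0 is online, using a
plain bitmask as a stand-in for the online node map.  The names and
the 64-node limit are assumptions made for this example, not the
mm/slab.c code.

```c
#include <assert.h>

#define MAX_NODES 64   /* hypothetical limit for this sketch */

/* Return the lowest online node in `online`, or MAX_NODES if none. */
int first_online_node(unsigned long long online)
{
    for (int n = 0; n < MAX_NODES; n++)
        if (online & (1ULL << n))
            return n;
    return MAX_NODES;
}

/* Return the next online node after `node`, wrapping around to the
 * first online node rather than to node 0. */
int next_online_node_wrap(int node, unsigned long long online)
{
    assert(online != 0);   /* at least one node must be online */
    for (int n = node + 1; n < MAX_NODES; n++)
        if (online & (1ULL << n))
            return n;
    return first_online_node(online);
}
```

With only nodes 2 and 5 online, the rotor alternates between 2 and 5
and never lands on an offline node.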
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] mm: hugetlb alloc_fresh_huge_page bogus node loop fix · fdb7cc59

Paul Jackson authored Mar 22, 2006

Fix bogus node loop in hugetlb.c alloc_fresh_huge_page(), which was
assuming that nodes are numbered contiguously from 0 to num_online_nodes().
Once the hotplug folks get this far, that will be false.
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix swap cluster offset · 9b65ef59

Akinobu Mita authored Mar 22, 2006

When we've allocated SWAPFILE_CLUSTER pages, ->cluster_next should be the
first index of the swap cluster.  But the current code probably sets it to
the wrong offset.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] drain_node_pages: interrupt latency reduction / optimization · 879336c3

Christoph Lameter authored Mar 22, 2006

1. Only disable interrupts if there is actually something to free

2. Only dirty the pcp cacheline if we actually freed something.

3. Disable interrupts for each single pcp and not for cleaning
  all the pcps in all zones of a node.

drain_node_pages is called every 2 seconds from cache_reap. This
fix should avoid most disabling of interrupts.
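
The three points can be sketched in user space as follows.  The
struct, the counter, and the irq stand-ins are hypothetical; they only
model how often an interrupts-off section is entered.

```c
/* Hypothetical per-cpu pageset: just the number of pages it holds. */
struct pcp_sketch {
    int count;
};

int irq_off_sections;   /* how many times "interrupts" were disabled */

/* Stand-ins for local_irq_save() / local_irq_restore(). */
void sketch_irq_save(void)    { irq_off_sections++; }
void sketch_irq_restore(void) { }

/* Drain every list, entering an irq-off section once per non-empty list.
 * Returns the number of pages freed. */
int drain_pcps(struct pcp_sketch *pcps, int n)
{
    int freed = 0;

    for (int i = 0; i < n; i++) {
        if (pcps[i].count == 0)
            continue;            /* nothing to free: no irq-off, no write */
        sketch_irq_save();
        freed += pcps[i].count;
        pcps[i].count = 0;       /* cacheline dirtied only when needed */
        sketch_irq_restore();
    }
    return freed;
}
```

When all lists are empty, which is the common case for a periodic
caller, the loop never disables interrupts at all.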
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] slab: fix drain_array() so that it works correctly with the shared_array · b18e7e65

Christoph Lameter authored Mar 22, 2006

The list_lock also protects the shared array and we call drain_array() with
the shared array.  Therefore we cannot go as far as I wanted to but have to
take the lock in a way so that it also protects the array_cache in
drain_pages.

(Note: maybe we should make the array_cache locking more consistent?  I.e.
always take the array cache lock for shared arrays and disable interrupts
for the per cpu arrays?)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] slab: remove drain_array_locked · 1b55253a

Christoph Lameter authored Mar 22, 2006

Remove drain_array_locked and use that opportunity to limit the time the l3
lock is taken further.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] slab: make drain_array more universal by adding more parameters · aab2207c

Christoph Lameter authored Mar 22, 2006

Add a parameter to drain_array() to control the freeing of all objects, and
then replace instances of drain_array_locked() with drain_array().  Doing so
will avoid taking locks in those locations if the arrays are empty.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] slab: cache_reap(): further reduction in interrupt holdoff · 35386e3b

Christoph Lameter authored Mar 22, 2006

cache_reap takes the l3->list_lock (disabling interrupts) unconditionally
and then does a few checks and maybe does some cleanup.  This patch makes
cache_reap() only take the lock if there is work to do and then the lock is
taken and released for each cleaning action.

The checking of when to do the next reaping is done without any locking and
becomes racy.  Should not matter since reaping can also be skipped if the
slab mutex cannot be acquired.

The same is true for the touched processing.  If we get this wrong once in
a while then we will mistakenly clean or not clean the shared cache.  This
will impact performance slightly.

Note that the additional drain_array() function introduced here will fall
out in a subsequent patch since array cleaning will now be very similar
from all callers.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] mm: make shrink_all_memory try harder · 248a0301

Rafael J. Wysocki authored Mar 22, 2006

Make shrink_all_memory() repeat the attempts to free more memory if there
seem to be no pages to free.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] optimize follow_hugetlb_page · d5d4b0aa

Chen, Kenneth W authored Mar 22, 2006

follow_hugetlb_page() walks a range of user virtual addresses and fills in
a list of struct page * in an array that is passed in through the argument
list. It also takes a reference via get_page(). For a compound page,
get_page() actually traverses back to the head page via the page_private()
macro and then adds a reference to the head page. Since we are doing a virt
to pte lookup, the kernel already has a struct page pointer to the head page.
So instead of traversing into the small unit page struct and then following
a link back to the head page, optimize that by incrementing the reference
count directly on the head page.
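
The two paths can be contrasted with a toy page structure.  The
struct layout and function names are hypothetical, and a plain int
stands in for the atomic reference count.

```c
#include <stddef.h>

/* Toy page descriptor: tail pages point back to their head page. */
struct page_sketch {
    int refcount;
    struct page_sketch *head;   /* NULL for a head page */
};

/* Old path: touch the sub-page, then chase its link to the head page. */
void get_page_via_subpage(struct page_sketch *page)
{
    if (page->head)
        page = page->head;
    page->refcount++;
}

/* New path: the caller already holds the head page pointer. */
void get_head_page_direct(struct page_sketch *head)
{
    head->refcount++;
}
```

Both paths bump the same counter, but only the first one has to read
the sub-page's descriptor.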

The benefit is that we don't take a cache miss on accessing the page struct
for the corresponding user address and, more importantly, we don't pollute
the cache with a "not very useful" round trip of pointer chasing. This adds
a moderate performance gain on an I/O intensive database transaction
workload.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] convert hugetlbfs_counter to atomic · bba1e9b2

Chen, Kenneth W authored Mar 22, 2006

Implementation of hugetlbfs_counter() is functionally equivalent to
atomic_inc_return().  Use the simpler atomic form.
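
The "increment and return the new value" behavior can be illustrated
with C11 atomics in user space.  This is an analogy rather than the
kernel API, and the names are hypothetical.

```c
#include <stdatomic.h>

atomic_int counter_sketch;   /* zero-initialized at program start */

/* atomic_fetch_add() returns the old value, so add 1 to get the new one. */
int counter_inc_return(void)
{
    return atomic_fetch_add(&counter_sketch, 1) + 1;
}
```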
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] hugepage: is_aligned_hugepage_range() cleanup · 42b88bef

David Gibson authored Mar 22, 2006

Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.

Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range().  On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.

In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).

This patch cleans up by abolishing is_aligned_hugepage_range().  Instead
prepare_hugepage_range() is defined directly.  Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage.  ia64 and powerpc define custom versions.  The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned.  The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.
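
A sketch of what a default alignment check with the 0-or-error
convention might look like.  The 2 MiB hugepage size and the _SKETCH
names are assumptions made for this example; the real size is
architecture dependent.

```c
#include <errno.h>

#define HPAGE_SIZE_SKETCH (2UL * 1024 * 1024)        /* assumed 2 MiB */
#define HPAGE_MASK_SKETCH (~(HPAGE_SIZE_SKETCH - 1))

/* Return 0 if both addr and len are hugepage aligned, -EINVAL otherwise. */
int prepare_hugepage_range_sketch(unsigned long addr, unsigned long len)
{
    if (len & ~HPAGE_MASK_SKETCH)
        return -EINVAL;
    if (addr & ~HPAGE_MASK_SKETCH)
        return -EINVAL;
    return 0;
}
```

Note the sense of the return value: 0 means the range is acceptable,
which is why a name starting with "is_" was misleading.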

No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
