1. 02 Mar, 2009 1 commit
    • xen: deal with virtually mapped percpu data · 9976b39b
      Jeremy Fitzhardinge authored
      The virtually mapped percpu space causes us two problems:
      
       - for hypercalls which take an mfn, we need to do a full pagetable
         walk to convert the percpu va into an mfn, and
      
       - when a hypercall requires a page to be mapped RO via all its aliases,
         we need to make sure it's RO in both the percpu mapping and in the
         linear mapping
      
      This primarily affects the gdt and the vcpu info structure.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      Cc: Gerd Hoffmann <kraxel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 01 Mar, 2009 3 commits
    • bootmem, x86: further fixes for arch-specific bootmem wrapping · d0c4f570
      Tejun Heo authored
      Impact: fix new breakages introduced by previous fix
      
      Commit c1329375 tried to clean up the
      bootmem arch wrapper but it wasn't quite correct.  Before the commit,
      the following were broken.
      
      * Low level interface functions prefixed with __ ignored arch
        preference.
      
      * reserve_bootmem(...) can't be mapped into
        reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
        not a preference here.  The region specified MUST fall into the
        specified node; otherwise, it will panic.
      
      After the commit,
      
      * If allocation fails for the arch preferred node, it should fall back
        to whatever is available.  Instead, it simply failed allocation.
      
      There are too many internal details to allow generic wrapping and
      still keep things simple for archs.  Plus, all that an arch wants is a
      way to prefer a certain node over another.
      
      This patch drops the generic wrapping around alloc_bootmem_core() and
      adds alloc_bootmem_core() instead.  If necessary, an arch can define a
      bootmem_arch_preferred_node() macro or function which takes all
      allocation information and returns the preferred node.  bootmem
      generic code will always try the preferred node first and then
      fall back to other nodes as usual.
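
The preferred-node-then-fall-back policy described above can be illustrated with a small userspace sketch. Everything here (`struct node`, `node_alloc()`, `arch_preferred_node()`, `alloc_prefer()`) is hypothetical and only models the control flow; it is not the kernel's bootmem code.

```c
#include <stddef.h>

#define NR_NODES 3

/* Hypothetical model of a memory node: just a count of free bytes. */
struct node {
    size_t free_bytes;
};

static struct node nodes[NR_NODES] = { { 64 }, { 4096 }, { 8192 } };

/* Try to carve `size` bytes out of one node; returns 1 on success, 0 on failure. */
static int node_alloc(struct node *n, size_t size)
{
    if (n->free_bytes < size)
        return 0;
    n->free_bytes -= size;
    return 1;
}

/* Hypothetical arch hook: this toy arch always prefers node 0. */
static int arch_preferred_node(size_t size)
{
    (void)size;
    return 0;
}

/*
 * Try the arch-preferred node first, then fall back to the other
 * nodes in order.  Returns the node index used, or -1 if all failed.
 */
static int alloc_prefer(size_t size)
{
    int pref = arch_preferred_node(size);
    int i;

    if (node_alloc(&nodes[pref], size))
        return pref;

    for (i = 0; i < NR_NODES; i++) {
        if (i != pref && node_alloc(&nodes[i], size))
            return i;
    }
    return -1;
}
```

In this model a small request lands on node 0, a request too large for node 0 lands on the next node with room, and only a request no node can satisfy fails outright.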
      
      Breakages noted and changes reviewed by Johannes Weiner.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    • alpha: fix typo in recent early vmalloc change · af6326d7
      Tejun Heo authored
      Impact: fix build
      
      Add missing 'o' in variable name.  Compile tested.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
    • percpu: kill compile warning in pcpu_populate_chunk() · 02d51fdf
      Tejun Heo authored
      Impact: remove compile warning
      
      Mark local variable map_end in pcpu_populate_chunk() with
      uninitialized_var().  The variable is always used in tandem with
      map_start and guaranteed to be initialized before use but gcc doesn't
      understand that.
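
For readers unfamiliar with the annotation: with gcc, the kernel of this era defined `uninitialized_var(x)` as a self-assignment, which silences the "may be used uninitialized" warning. The sketch below shows the paired-variable pattern the message describes; `find_range()` and its arguments are hypothetical and are not the body of `pcpu_populate_chunk()`.

```c
/* gcc flavor of the kernel macro: a self-assignment that silences the warning. */
#define uninitialized_var(x) x = x

/*
 * Hypothetical example: find the first run of nonzero entries in `v`.
 * `end` is only meaningful once `start` has been set, which the
 * compiler cannot always prove.
 */
static int find_range(const int *v, int n, int *out_start, int *out_end)
{
    int start = -1;
    int uninitialized_var(end);
    int i;

    for (i = 0; i < n; i++) {
        if (v[i] && start < 0)
            start = i;          /* run begins here */
        if (v[i] && start >= 0)
            end = i + 1;        /* run extends through index i */
        if (!v[i] && start >= 0)
            break;              /* run is over */
    }
    if (start < 0)
        return 0;               /* nothing found; `end` is never read */
    *out_start = start;
    *out_end = end;
    return 1;
}
```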
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Ingo Molnar <mingo@elte.hu>
  3. 26 Feb, 2009 2 commits
  4. 25 Feb, 2009 4 commits
    • alloc_percpu: fix UP build · d2b02615
      Ingo Molnar authored
      Impact: build fix
      
      the !SMP branch had a 'gfp' leftover:
      
       include/linux/percpu.h: In function '__alloc_percpu':
       include/linux/percpu.h:160: error: 'gfp' undeclared (first use in this function)
       include/linux/percpu.h:160: error: (Each undeclared identifier is reported only once
       include/linux/percpu.h:160: error: for each function it appears in.)
      
      Use GFP_KERNEL like the SMP version does.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • alloc_percpu: add align argument to __alloc_percpu, fix · 0dcec8c2
      Ingo Molnar authored
      Impact: build fix
      
      API was changed, but not all usage sites were converted:
      
       net/ipv4/route.c: In function ‘ip_rt_init’:
       net/ipv4/route.c:3379: error: too few arguments to function ‘__alloc_percpu’
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: convert cacheflush macros inline functions · d3251005
      Tejun Heo authored
      Impact: cleanup
      
      Unused macro parameters cause spurious unused variable warnings.
      Convert all cacheflush macros to inline functions to avoid the
      warnings and achieve better type checking.
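
The difference can be sketched in plain C. The names below (`flush_range_macro`, `flush_range_inline`, `struct region`) are hypothetical stand-ins, not the actual x86 cacheflush API. A no-op macro discards its arguments, so a variable passed only to it looks unused to the compiler and an argument of the wrong type goes unnoticed; an empty inline function both uses and type-checks its arguments.

```c
/* Hypothetical stand-in for a structure describing a memory region. */
struct region {
    unsigned long start;
    unsigned long end;
};

/* Macro form: arguments vanish at preprocessing time and are never checked. */
#define flush_range_macro(r, start, end) do { } while (0)

/*
 * Inline form: the body is still empty, but the arguments count as used
 * and must match the declared types.
 */
static inline void flush_range_inline(struct region *r,
                                      unsigned long start,
                                      unsigned long end)
{
    (void)r;
    (void)start;
    (void)end;
}

/* Caller: every argument is evaluated and type-checked by the inline form. */
static unsigned long region_len(struct region *r)
{
    unsigned long len = r->end - r->start;

    flush_range_inline(r, r->start, r->end);
    return len;
}
```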
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86, percpu: fix minor bugs in setup_percpu.c · 24ff9542
      Tejun Heo authored
      Recent changes in setup_percpu.c made a now meaningless DBG()
      statement fail to compile and introduced a
      comparison-of-different-types warning.  Fix them.
      
      Compile failure is reported by Ingo Molnar.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Ingo Molnar <mingo@elte.hu>
  5. 24 Feb, 2009 21 commits
    • Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu · 0edcf8d6
      Ingo Molnar authored
      Conflicts:
      	arch/x86/include/asm/pgtable.h
    • Merge branch 'x86/core' into core/percpu · 87b20307
      Ingo Molnar authored
    • Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm',... · a852cbfa
      Ingo Molnar authored
      Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm', 'x86/signal' and 'x86/urgent'; commit 'v2.6.29-rc6' into x86/core
    • x86: efi_stub_32,64 - add missing ENDPROCs · 9f331119
      Cyrill Gorcunov authored
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: head_64.S - use GLOBAL macro · bc8b2b92
      Cyrill Gorcunov authored
      Impact: cleanup
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: entry_64.S - add missing ENDPROC · b3baaa13
      Cyrill Gorcunov authored
      native_usergs_sysret64 is described as
      
      	extern void native_usergs_sysret64(void)
      
      so let's add ENDPROC here
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: invalid_vm86_irq -- use predefined macros · 57e37293
      Cyrill Gorcunov authored
      Impact: cleanup
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: head_64.S - use IDT_ENTRIES instead of hardcoded number · 5e112ae2
      Cyrill Gorcunov authored
      Impact: cleanup
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: head_64.S - remove useless balign · 2a0b1001
      Cyrill Gorcunov authored
      Impact: cleanup
      
      NEXT_PAGE already has 'balign' so no
      need to keep this redundant one.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix performance regression in write() syscall · 30d697fa
      Salman Qazi authored
      While the introduction of __copy_from_user_nocache (see commit:
      0812a579) may have been an improvement
      for sufficiently large writes, there is evidence to show that it is
      detrimental for small writes.  Unixbench's fstime test gives the
      following results for 256 byte writes with MAX_BLOCK of 2000:
      
          2.6.29-rc6 ( 5 samples, each in KB/sec ):
          283750, 295200, 294500, 293000, 293300
      
          2.6.29-rc6 + this patch (5 samples, each in KB/sec):
          313050, 3106750, 293350, 306300, 307900
      
          2.6.18
          395700, 342000, 399100, 366050, 359850
      
          See w_test() in src/fstime.c in unixbench version 4.1.0.  Basically, the above test
          consists of counting how much we can write in this manner:
      
          alarm(10);
          while (!sigalarm) {
                  for (f_blocks = 0; f_blocks < 2000; ++f_blocks) {
                         write(f, buf, 256);
                  }
                  lseek(f, 0L, 0);
          }
      
      Note, there are other components to the write syscall regression
      that are not addressed here.
      Signed-off-by: Salman Qazi <sqazi@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • percpu: add __read_mostly to variables which are mostly read only · 40150d37
      Tejun Heo authored
      Most global variables in percpu allocator are initialized during boot
      and read only from that point on.  Add __read_mostly as per Rusty's
      suggestion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
    • x86: add remapping percpu first chunk allocator · 8ac83757
      Tejun Heo authored
      Impact: add better first percpu allocation for NUMA
      
      On NUMA, the embedding allocator can't be used as different units can't
      be made to fall in the correct NUMA nodes.  To use large page mapping,
      each unit needs to be remapped.  However, percpu areas are usually
      much smaller than the large page size, and unused space hurts a lot as
      the number of cpus grows.  This allocator remaps large pages for each
      chunk but gives back the unused part to the bootmem allocator, making
      the large pages mapped twice.
      
      This adds slightly to the TLB pressure but is much better than using
      4k mappings while still being NUMA-friendly.
      
      Ingo suggested that this would be the correct approach for NUMA.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
    • x86: add embedding percpu first chunk allocator · 89c92151
      Tejun Heo authored
      Impact: add better first percpu allocation for !NUMA
      
      On !NUMA, we can simply allocate contiguous memory and use it for the
      first chunk without mapping it into the vmalloc area.  As the memory
      area is covered by the large page physical memory mapping, the
      dynamic percpu allocator adds no TLB overhead for the static
      percpu area and whatever falls into the first chunk.  The
      implementation is very simple too.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: separate out setup_pcpu_4k() from setup_per_cpu_areas() · 5f5d8405
      Tejun Heo authored
      Impact: modularize percpu first chunk allocation
      
      x86 is gonna have a few different strategies for the first chunk
      allocation.  Modularize it by separating out the current allocation
      mechanism into pcpu_alloc_bootmem() and setup_pcpu_4k().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: give more latitude to arch specific first chunk initialization · 8d408b4b
      Tejun Heo authored
      Impact: more latitude for first percpu chunk allocation
      
      The first percpu chunk serves the kernel static percpu area and may or
      may not contain extra room for further dynamic allocation.
      Initialization of the first chunk needs to be done before normal
      memory allocation service is up, so it has its own init path -
      pcpu_setup_static().
      
      It seems archs need more latitude while initializing the first chunk
      for example to take advantage of large page mapping.  This patch makes
      the following changes to allow this.
      
      * Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
        to reserve in the first chunk for further dynamic allocation.
      
      * Rename pcpu_setup_static() to pcpu_setup_first_chunk().
      
      * Make pcpu_setup_first_chunk() much more flexible by fetching the page
        pointer by callback and adding optional @unit_size, @free_size and
        @base_addr arguments which allow archs to selectively customize
        parts of chunk initialization to their liking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: remove unit_size power-of-2 restriction · d9b55eeb
      Tejun Heo authored
      Impact: allow unit_size to be arbitrary multiple of PAGE_SIZE
      
      In the dynamic percpu allocator, there is no reason the unit size
      should be a power of two.  Remove the restriction.
      
      A non-power-of-two unit size means that empty chunks fall into the
      same slot index as lightly occupied chunks, which is bad for
      reclaiming.  Reserve an extra slot for empty chunks.
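
A minimal sketch of why the extra slot helps, assuming chunks are bucketed by the highest set bit of their free size. The names `highest_bit()`, `size_to_slot_raw()`, `size_to_slot()` and the constant `SLOT_BASE_SHIFT` are hypothetical and chosen for illustration only.

```c
#define SLOT_BASE_SHIFT 5   /* hypothetical: tiny free sizes share slot 1 */

/* 1-based position of the highest set bit (0 for v == 0), like fls(). */
static int highest_bit(unsigned int v)
{
    int n = 0;

    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Bucket purely by magnitude of the free size. */
static int size_to_slot_raw(unsigned int free_size)
{
    int slot = highest_bit(free_size) - SLOT_BASE_SHIFT + 2;

    return slot < 1 ? 1 : slot;
}

/* Same bucketing, but a completely empty chunk gets a slot of its own. */
static int size_to_slot(unsigned int free_size, unsigned int unit_size)
{
    if (free_size == unit_size)
        return size_to_slot_raw(unit_size) + 1;
    return size_to_slot_raw(free_size);
}
```

With a unit size of 12288 bytes (three 4k pages), an empty chunk and a chunk with 12000 bytes free share the same raw slot, whereas with a 16384-byte unit they do not; the dedicated slot restores that separation for arbitrary unit sizes.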
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: update populate_extra_pte() and add populate_extra_pmd() · 458a3e64
      Tejun Heo authored
      Impact: minor change to populate_extra_pte() and addition of pmd flavor
      
      Update populate_extra_pte() to return pointer to the pte_t for the
      specified address and add populate_extra_pmd() which only populates
      till the pmd and returns pointer to the pmd entry for the address.
      
      For 64bit, pud/pmd/pte fill functions are separated out from
      set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
      populate_extra_{pte|pmd}().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • vmalloc: add @align to vm_area_register_early() · c0c0a293
      Tejun Heo authored
      Impact: allow larger alignment for early vmalloc area allocation
      
      Some early vmalloc users might want larger alignment, for example, for
      custom large page mapping.  Add @align to vm_area_register_early().
      While at it, drop docbook comment on non-existent @size.
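
As a rough illustration of what an alignment argument means for an early bump-style allocator, the sketch below rounds the next free address up to the requested power-of-two alignment before handing out a region. `early_area_alloc()`, `align_up()`, `AREA_BASE` and `next_off` are hypothetical names; this is not the body of vm_area_register_early().

```c
#define AREA_BASE 0x100000UL    /* hypothetical start of the early area */

static unsigned long next_off;  /* bytes of the area already consumed */

/* Round `addr` up to a multiple of `align` (which must be a power of two). */
static unsigned long align_up(unsigned long addr, unsigned long align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Reserve `size` bytes at an address aligned to `align`; returns the address. */
static unsigned long early_area_alloc(unsigned long size, unsigned long align)
{
    unsigned long addr = align_up(AREA_BASE + next_off, align);

    /* Everything up to the end of this region is now consumed. */
    next_off = addr + size - AREA_BASE;
    return addr;
}
```

A caller that wants a large-page-aligned region would pass the large page size as `align`; the gap skipped to reach that alignment is simply lost in this simple model.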
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
    • bootmem: reorder interface functions and add a missing one · 2d0aae41
      Tejun Heo authored
      Impact: cleanup and addition of missing interface wrapper
      
      The interface functions in bootmem.h were ordered in a not so orderly
      manner.  Reorder them such that
      
      * functions allocating the same area group together -
        ie. alloc_bootmem group and alloc_bootmem_low group.
      
      * functions w/o node parameter come before the ones w/ node parameter.
      
      * nopanic variants are immediately below their panicky counterparts.
      
      While at it, add alloc_bootmem_pages_node_nopanic() which was missing.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
    • bootmem: clean up arch-specific bootmem wrapping · c1329375
      Tejun Heo authored
      Impact: cleaner and consistent bootmem wrapping
      
      By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
      arch-specific wrappers for bootmem allocation.  However, this is done
      a bit strangely in that only the high level convenience macros can be
      changed while lower level, but still exported, interface functions
      can't be wrapped.  This is not only messy but also leads to a strange
      situation where alloc_bootmem() does what the arch wants it to do but
      the equivalent __alloc_bootmem() call doesn't, although they should be
      able to be used interchangeably.
      
      This patch updates bootmem such that archs can override / wrap the
      backend function - alloc_bootmem_core() instead of the highlevel
      interface functions to allow simpler and consistent wrapping.  Also,
      HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
    • percpu: fix pcpu_chunk_struct_size · cb83b42e
      Tejun Heo authored
      Impact: fix short allocation leading to memory corruption
      
      While dropping rvalue wrapping macros around global parameters,
      pcpu_chunk_struct_size was set incorrectly, resulting in a shorter
      page pointer array.  Fix it.
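
The class of bug is easy to picture in miniature. In the sketch below, `struct chunk`, `chunk_struct_size()` and its parameters are hypothetical; the point is only that a structure ending in a page pointer array must be sized for every page of every unit, and that an undersized allocation lets writes to the tail of the array run past the buffer.

```c
#include <stddef.h>

struct page;    /* opaque stand-in for the kernel type */

/* Hypothetical chunk header followed by one page pointer per page per unit. */
struct chunk {
    int free_size;
    struct page *pages[];   /* flexible array member */
};

/* Bytes needed for a chunk covering `nr_units` units of `unit_pages` pages each. */
static size_t chunk_struct_size(size_t nr_units, size_t unit_pages)
{
    return sizeof(struct chunk) +
           nr_units * unit_pages * sizeof(struct page *);
}
```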
      Signed-off-by: Tejun Heo <tj@kernel.org>
  6. 23 Feb, 2009 9 commits