1. 02 May, 2007 40 commits
    • [PATCH] x86: remove constant_tsc reporting from /proc/cpuinfo' power flags · d824395c
      Joerg Roedel authored
      Remove the reporting of the constant_tsc flag from the "power management"
      field in /proc/cpuinfo.  The NULL value there was replaced by "" because
      the former would result in a printout of [8] if the flag is set.
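
The difference between a NULL and an empty-string entry can be sketched as follows. This is only an illustration under assumptions: the table contents (`f0` to `f7`) and the function name `format_power_flags` are hypothetical, and the fallback to a bracketed bit number is inferred from the description above rather than copied from the kernel source.

```python
# Hypothetical flag-name table, indexed by bit number.
#   None -> no name known: fall back to printing the bit number as "[n]"
#   ""   -> flag is known but intentionally not reported
NAMES_WITH_EMPTY = ["f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", ""]
NAMES_WITH_NONE = ["f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", None]


def format_power_flags(bits, names):
    """Return the space-separated names of all set bits in `bits`."""
    parts = []
    for i in range(32):
        if not bits & (1 << i):
            continue
        name = names[i] if i < len(names) else None
        if name is None:
            parts.append(f"[{i}]")   # unnamed bit: print its index
        elif name:
            parts.append(name)       # named bit: print its name
        # empty string: print nothing at all
    return " ".join(parts)
```

With bit 8 set, the `None` table yields `[8]` while the empty-string table yields nothing, which matches the behaviour the commit message describes.
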
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: fixed size remaining fake nodes · 382591d5
      David Rientjes authored
      Extends the numa=fake x86_64 command-line option to split the remaining system
      memory into nodes of fixed size.  Any leftover memory is allocated to a final
      node unless the command-line ends with a comma.
      
      For example:
        numa=fake=2*512,*128	gives two 512M nodes and the remaining system
      			memory is split into nodes of 128M each.
      
      This is beneficial for systems where the exact size of RAM is unknown or not
      necessarily relevant, but the size of the remaining nodes to be allocated is
      known based on their capacity for resource management.
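
A minimal sketch of the fixed-size split described above, working in megabytes. The function name and the `keep_leftover` parameter are hypothetical; `keep_leftover=False` stands in for the case where the command line ends with a comma. It does not model how the kernel counts available pages.

```python
def split_remaining_fixed(remaining_mb, node_mb, keep_leftover=True):
    """Split remaining memory into nodes of node_mb each.

    Any leftover smaller than node_mb becomes a final node unless
    keep_leftover is False (the trailing-comma case).
    """
    full_nodes, leftover = divmod(remaining_mb, node_mb)
    nodes = [node_mb] * full_nodes
    if leftover and keep_leftover:
        nodes.append(leftover)
    return nodes
```

For example, with 300M remaining and 128M nodes this gives two 128M nodes plus a 44M final node, or just the two 128M nodes in the trailing-comma case.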
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: split remaining fake nodes equally · 14694d73
      David Rientjes authored
      Extends the numa=fake x86_64 command-line option to split the remaining
      system memory into equal-sized nodes.
      
      For example:
      numa=fake=2*512,4*	gives two 512M nodes and the remaining system
      			memory is split into four approximately equal
      			chunks.
      
      This is beneficial for systems where the exact size of RAM is unknown or not
      necessarily relevant, but the granularity with which nodes shall be allocated
      is known.
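
The "approximately equal chunks" behaviour can be sketched like this, in megabytes. The function name is hypothetical and the rounding policy (earlier nodes absorb the remainder) is an assumption made for illustration; the kernel balances nodes by available pages rather than by raw size.

```python
def split_remaining_equal(remaining_mb, count):
    """Split remaining memory into `count` nodes of nearly equal size.

    The first `remaining_mb % count` nodes get one extra megabyte so the
    sizes always add up to remaining_mb.
    """
    base, extra = divmod(remaining_mb, count)
    return [base + 1 if i < extra else base for i in range(count)]
```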
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: configurable fake numa node sizes · 8b8ca80e
      David Rientjes authored
      Extends the numa=fake x86_64 command-line option to allow for configurable
      node sizes.  These nodes can be used in conjunction with cpusets for coarse
      memory resource management.
      
      The old command-line option is still supported:
        numa=fake=32	gives 32 fake NUMA nodes, ignoring the NUMA setup of the
      		actual machine.
      
      But now you may configure your system for the node sizes of your choice:
        numa=fake=2*512,1024,2*256
      		gives two 512M nodes, one 1024M node, two 256M nodes, and
      		the rest of system memory to a sixth node.
      
      The existing hash function is maintained to support the various node sizes
      that are possible with this implementation.
      
      Each node of the same size receives roughly the same amount of available
      pages, regardless of any reserved memory within its address range.  The total
      available pages on the system is calculated and divided by the number of equal
      nodes to allocate.  These nodes are then dynamically allocated and their
      borders extended until such time as their number of available pages reaches
      the required size.
      
      Configurable node sizes are recommended when used in conjunction with cpusets
      for memory control because it eliminates the overhead associated with scanning
      the zonelists of many smaller full nodes on page_alloc().
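
As a rough illustration of the size-list syntax, the sketch below expands a string such as `2*512,1024,2*256` into per-node sizes in megabytes and gives the rest of memory to a final node. The function names and the 4096M total are assumptions. It handles only the `count*size` and plain `size` forms, not the older `numa=fake=32` node-count form, and it does not model the page-based balancing described above.

```python
def parse_fake_sizes(spec):
    """Expand 'count*size' and 'size' items into a list of sizes in MB."""
    sizes = []
    for item in spec.split(","):
        if "*" in item:
            count, size = item.split("*")
            sizes.extend([int(size)] * int(count))
        else:
            sizes.append(int(item))
    return sizes


def layout(spec, total_mb):
    """Return node sizes, with any remaining memory as a final node."""
    sizes = parse_fake_sizes(spec)
    rest = total_mb - sum(sizes)
    if rest > 0:
        sizes.append(rest)
    return sizes
```

On a hypothetical 4096M machine, `layout("2*512,1024,2*256", 4096)` yields six nodes: two of 512M, one of 1024M, two of 256M, and a final node holding the remaining 1536M.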
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: fix GDT's number of quadwords in comment · 8280c0c5
      Ahmed S. Darwish authored
      Fix comments to represent the true number of quadwords in GDT.
      Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: vmi_pmd_clear() static · 8eb68fae
      Adrian Bunk authored
      This patch makes the needlessly global vmi_pmd_clear() static.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: make simnow_init() static · 786142fa
      Adrian Bunk authored
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: remove extra smp_processor_id calling · f0e13ae7
      Yinghai Lu authored
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: fix ia32_binfmt.c build error · 9f7290ed
      Ralf Baechle authored
      Reorder code to avoid multiple inclusion of elf.h.
      
      #undef several symbols to avoid build errors over redefinitions.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86: Log reason why TSC was marked unstable · 5a90cf20
      john stultz authored
      Change mark_tsc_unstable() so it takes a string argument, which holds the
      reason the TSC was marked unstable.
      
      This is then displayed the first time mark_tsc_unstable is called.
      
      This should help us better debug why the TSC was marked unstable on certain
      systems and allow us to make sure we're not being overly paranoid when
      throwing out this troublesome clocksource.
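
The "report the reason only on the first call" behaviour can be sketched as below. The class, its attributes, and the message text are hypothetical stand-ins; the sketch shows the control flow described above, not the kernel's implementation.

```python
class TscState:
    """Toy model of a clocksource that can be marked unstable once."""

    def __init__(self):
        self.unstable = False
        self.log = []

    def mark_tsc_unstable(self, reason):
        # Only the first caller's reason is recorded; later calls are
        # no-ops because the TSC is already marked unstable.
        if not self.unstable:
            self.unstable = True
            self.log.append(f"TSC marked unstable: {reason}")
```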
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: workaround for a -Wmissing-prototypes warning · 27142219
      Adrian Bunk authored
      Work around a warning with -Wmissing-prototypes in
      arch/i386/kernel/asm-offsets.c
      
      The warning isn't gcc's fault - asm-offsets.c is simply a special file.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: type cast clean up for find_next_zero_bit · e48b30c1
      Ken Chen authored
      Clean up an unneeded type cast by properly declaring the data type.
      Signed-off-by: Ken Chen <kenchen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: make struct vmi_ops static · 30a1528d
      Adrian Bunk authored
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: modpost apic related warning fixes · 1833d6bc
      Vivek Goyal authored
      o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y
      
      WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode'
      WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode'
      WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode'
      WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'
      WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check'
      WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present'
      
      o Some functions which are inline (acpi_madt_oem_check) are not inlined by
        the compiler because they are accessed through a function pointer. These
        functions are put in the .text section and in turn access __init
        functions, hence modpost generates warnings.
      
      o Do not inline acpi_madt_oem_check; instead make it __init.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: Set HASHDIST_DEFAULT to 1 for x86_64 NUMA · e073ae1b
      Ravikiran G Thirumalai authored
      Enable system hashtable memory to be distributed among nodes on x86_64 NUMA
      
      Forcing the kernel to use node interleaved vmalloc instead of bootmem for
      the system hashtable memory (alloc_large_system_hash) reduces the memory
      imbalance on node 0 by around 40MB on an 8-node x86_64 NUMA box:
      
      Before the following patch, on bootup of an 8-node box:
      
      Node 0 MemTotal:      3407488 kB
      Node 0 MemFree:       3206296 kB
      Node 0 MemUsed:        201192 kB
      Node 0 Active:           7012 kB
      Node 0 Inactive:          512 kB
      Node 0 Dirty:               0 kB
      Node 0 Writeback:           0 kB
      Node 0 FilePages:        1912 kB
      Node 0 Mapped:            420 kB
      Node 0 AnonPages:        5612 kB
      Node 0 PageTables:        468 kB
      Node 0 NFS_Unstable:        0 kB
      Node 0 Bounce:              0 kB
      Node 0 Slab:             5408 kB
      Node 0 SReclaimable:      644 kB
      Node 0 SUnreclaim:       4764 kB
      
      After the patch (or using hashdist=1 on the kernel command line):
      
      Node 0 MemTotal:      3407488 kB
      Node 0 MemFree:       3247608 kB
      Node 0 MemUsed:        159880 kB
      Node 0 Active:           3012 kB
      Node 0 Inactive:          616 kB
      Node 0 Dirty:               0 kB
      Node 0 Writeback:           0 kB
      Node 0 FilePages:        2424 kB
      Node 0 Mapped:            380 kB
      Node 0 AnonPages:        1200 kB
      Node 0 PageTables:        396 kB
      Node 0 NFS_Unstable:        0 kB
      Node 0 Bounce:              0 kB
      Node 0 Slab:             6304 kB
      Node 0 SReclaimable:     1596 kB
      Node 0 SUnreclaim:       4708 kB
      
      I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA
      since x86_64 has no dearth of vmalloc space?  Or maybe enable hash
      distribution for all 64bit NUMA arches?  The following patch does it only
      for x86_64.
      
      I ran a HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite a bit of
      memory and uses up tlb entries.  This was on a 4 way, 2 socket
      Tyan AMD box (non vsmp), with 8G total memory (4G pernode).
      
      The results with and without hash distribution are:
      
      1. Vanilla - runtime of 1188.000s
      2. With hashdist=1 runtime of 1154.000s
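
As a quick check of the figures quoted above, the arithmetic can be worked through directly. The numbers are taken from the Node 0 MemUsed lines and the two runtimes in this message; the variable names are only illustrative.

```python
# Node 0 MemUsed before and after the patch, in kB (from the listings above).
mem_used_before_kb = 201192
mem_used_after_kb = 159880

saved_kb = mem_used_before_kb - mem_used_after_kb
saved_mb = saved_kb / 1024          # roughly the "around 40MB" quoted above

# Benchmark runtimes in seconds (from the results above).
runtime_vanilla_s = 1188.0
runtime_hashdist_s = 1154.0

# Relative runtime reduction with hashdist=1, as a percentage.
speedup_pct = (runtime_vanilla_s - runtime_hashdist_s) / runtime_vanilla_s * 100

print(f"saved {saved_kb} kB ({saved_mb:.2f} MB), runtime down {speedup_pct:.2f}%")
```

This single benchmark run shows a runtime reduction of a little under 3%, alongside the roughly 40MB drop in memory used on node 0.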
      
      Oprofile output for the duration of run is:
      
      1. Vanilla:
      CPU: AMD64 processors, speed 2411.16 MHz (estimated)
      Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
      mask of 0x00 (No unit mask) count 500
      samples  %        app name                 symbol name
      163054    6.5513  libansys1.so             MultiFront::decompose(int, int,
      Elemset *, int *, int, int, int)
      162061    6.5114  libansys3.so             blockSaxpy6L_fd
      162042    6.5107  libansys3.so             blockInnerProduct6L_fd
      156286    6.2794  libansys3.so             maxb33_
      87879     3.5309  libansys1.so             elmatrixmultpcg_
      84857     3.4095  libansys4.so             saxpy_pcg
      58637     2.3560  libansys4.so             .st4560
      46612     1.8728  libansys4.so             .st4282
      43043     1.7294  vmlinux-t                copy_user_generic_string
      41326     1.6604  libansys3.so             blockSaxpyBackSolve6L_fd
      41288     1.6589  libansys3.so             blockInnerProductBackSolve6L_fd
      
      2. With hashdist=1
      CPU: AMD64 processors, speed 2411.13 MHz (estimated)
      Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
      mask of 0x00 (No unit mask) count 500
      samples  %        app name                 symbol name
      162993    6.9814  libansys1.so             MultiFront::decompose(int, int,
      Elemset *, int *, int, int, int)
      160799    6.8874  libansys3.so             blockInnerProduct6L_fd
      160459    6.8729  libansys3.so             blockSaxpy6L_fd
      156018    6.6826  libansys3.so             maxb33_
      84700     3.6279  libansys4.so             saxpy_pcg
      83434     3.5737  libansys1.so             elmatrixmultpcg_
      58074     2.4875  libansys4.so             .st4560
      46000     1.9703  libansys4.so             .st4282
      41166     1.7632  libansys3.so             blockSaxpyBackSolve6L_fd
      41033     1.7575  libansys3.so             blockInnerProductBackSolve6L_fd
      35762     1.5318  libansys1.so             inner_product_sub
      35591     1.5245  libansys1.so             inner_product_sub2
      28259     1.2104  libansys4.so             addVectors
      Signed-off-by: Pravin B. Shelar <pravin.shelar@calsoftinc.com>
      Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: Minor white space cleanup in traps.c · d039c688
      Andi Kleen authored
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Allow sys_uselib unconditionally · fb60b839
      Andi Kleen authored
      Previously it wasn't enabled when binfmt_aout was built as a module.
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Don't disable basic block reordering · 1652fcbf
      Andi Kleen authored
      When compiling with -Os (which is default) the compiler defaults to it
      anyways. And with -O2 it probably generates somewhat better (although
      also larger) code.
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: fix x86_64-mm-sched-clock-share · 184c44d2
      Andrew Morton authored
      Fix for the following patch. Provide dummy cpufreq functions when
      CPUFREQ is not compiled in.
      
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Move cpu verification code to common file · a4831e08
      Vivek Goyal authored
      o This patch moves the code to verify long mode and SSE to a common file.
        This code is now shared by trampoline.S, wakeup.S, boot/setup.S and
        boot/compressed/head.S
      
      o So far we did only a very limited check in trampoline.S, wakeup.S and
        in the 32bit entry point. Now all the entry paths are forced to do the
        exhaustive check, including SSE, because verify_cpu is shared.
      
      o I am keeping this patch as last in the x86 relocatable series because
        previous patches have got quite some amount of testing done and I don't want
        to disturb that. So if there is a problem introduced by this patch, at
        least it can be easily isolated.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Extend bzImage protocol for relocatable bzImage · 8035d3ea
      Vivek Goyal authored
      o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
        load the protected mode kernel at a non-1MB address. The protected mode
        component is now relocatable and can be loaded at non-1MB addresses.
      
      o As of today kdump uses it to run a second kernel from a reserved memory
        area.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: build-time checking · 6a50a664
      Vivek Goyal authored
      o X86_64 kernel should run from 2MB aligned address for two reasons.
      	- Performance.
      	- For relocatable kernels, page tables are updated based on difference
      	  between compile time address and load time physical address.
      	  This difference should be multiple of 2MB as kernel text and data
      	  is mapped using 2MB pages and PMD should be pointing to a 2MB
      	  aligned address. Life is simpler if both compile time and load time
      	  kernel addresses are 2MB aligned.
      
      o Flag the error at compile time if one is trying to build a kernel which
        does not meet alignment restrictions.
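
The alignment rule above can be sketched as a small check. The function names are hypothetical and the check is written as ordinary runtime code purely for illustration; the patch itself flags the error at build time.

```python
ALIGN_2MB = 2 * 1024 * 1024


def is_2mb_aligned(value):
    """True if value is a multiple of 2MB."""
    return (value & (ALIGN_2MB - 1)) == 0


def check_kernel_addresses(compile_time_phys, load_time_phys):
    """Return load - compile delta, or raise if alignment rules are broken."""
    if not is_2mb_aligned(compile_time_phys):
        raise ValueError("compile-time address is not 2MB aligned")
    delta = load_time_phys - compile_time_phys
    if not is_2mb_aligned(delta):
        raise ValueError("load/compile address delta is not a multiple of 2MB")
    return delta
```

If both addresses are 2MB aligned, the delta is automatically a multiple of 2MB, which is why aligning both keeps the page-table update simple.
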
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: Relocatable Kernel Support · 1ab60e0f
      Vivek Goyal authored
      This patch modifies the x86_64 kernel so that it can be loaded and run
      at any 2M aligned address, below 512G.  The technique used is to
      compile the decompressor with -fPIC and modify it so the decompressor
      is fully relocatable.  For the main kernel the page tables are
      modified so the kernel remains at the same virtual address.  In
      addition a variable phys_base is kept that holds the physical address
      the kernel is loaded at.  __pa_symbol is modified to add that when
      we take the address of a kernel symbol.
      
      When loaded with a normal bootloader the decompressor will decompress
      the kernel to 2M and it will run there.  This both ensures the
      relocation code is always working, and makes it easier to use 2M
      pages for the kernel and the cpu.
      
      AK: changed to not make RELOCATABLE default in Kconfig
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: __pa and __pa_symbol address space separation · 0dbf7028
      Vivek Goyal authored
      Currently __pa_symbol is for use with symbols in the kernel address
      map and __pa is for use with pointers into the physical memory map.
      But the code is implemented so you can usually interchange the two.
      
      __pa, which is much more common, can be implemented much more cheaply
      if it doesn't have to worry about any other kernel address
      spaces.  This is especially true with a relocatable kernel as
      __pa_symbol needs to perform an extra variable read to resolve
      the address.
      
      There is a third macro that is added for the vsyscall data,
      __pa_vsymbol, for finding the physical addresses of vsyscall pages.
      
      Most of this patch is simply sorting through the references to
      __pa or __pa_symbol and using the proper one.  A little of
      it is continuing to use a physical address when we have it
      instead of recalculating it several times.
      
      swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
      and init_mm.pgd is initialized at boot (instead of compile time)
      to the physmem virtual mapping of init_level4_pgd.  The
      physical address changed.
      
      Except for the EMPTY_ZERO page all of the remaining references
      to __pa_symbol appear to be during kernel initialization.  So this
      should reduce the cost of __pa in the common case, even on a relocated
      kernel.
      
      As this is technically a semantic change we need to be on the lookout
      for anything I missed.  But it works for me (tm).
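
The separation between the two translations can be sketched as follows. The two base constants are assumptions about the x86_64 layout of that era, and the functions are simplified stand-ins for the macros, showing only why `__pa` is cheaper: it subtracts a constant, while `__pa_symbol` must also read a variable.

```python
# Assumed base addresses (illustrative, not taken from this commit):
PAGE_OFFSET = 0xFFFF810000000000        # start of the linear physical-memory map
START_KERNEL_MAP = 0xFFFFFFFF80000000   # start of the kernel text/data mapping


def pa(vaddr):
    """Physical address of a pointer into the linear memory map."""
    return vaddr - PAGE_OFFSET


def pa_symbol(vaddr, phys_base):
    """Physical address of a kernel symbol.

    phys_base is the relocation adjustment that has to be read at run time
    on a relocatable kernel; it is 0 if the kernel was not relocated.
    """
    return vaddr - START_KERNEL_MAP + phys_base
```

Passing a kernel-symbol address to `pa()` would give a wrong result in this model, which is the kind of misuse the patch sorts out.
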
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: do not use virt_to_page on kernel data address · 1b29c164
      Vivek Goyal authored
      o virt_to_page() call should be used on kernel linear addresses and not
        on kernel text and data addresses. Swsusp code uses it on kernel data
        (statically allocated swsusp_header).
      
      o Allocate swsusp_header dynamically so that virt_to_page() can be used
        safely.
      
      o I am changing this because in next few patches, __pa() on x86_64 will
        no longer support kernel text and data addresses and hibernation breaks.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: Move swsusp __pa() dependent code to arch portion · 49c3df6a
      Vivek Goyal authored
      o __pa() should be used only on kernel linearly mapped virtual addresses
        and not on kernel text and data addresses.
      
      o Hibernation code needs to determine the physical address associated
        with kernel symbol to mark a section boundary which contains pages which
        don't have to be saved and restored during hibernate/resume operation.
      
      o Move this piece of code in arch dependent section. So that architectures
        which don't have kernel text/data mapped into kernel linearly mapped
        region can come up with their own ways of determining physical addresses
        associated with a kernel text.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Remove the identity mapping as early as possible · cfd243d4
      Vivek Goyal authored
      With the rewrite of the SMP trampoline and the early page
      allocator there is nothing that needs identity mapped pages,
      once we start executing C code.
      
      So add zap_identity_mappings into head64.c and remove
      zap_low_mappings() from much later in the code.  The functions
      are subtly different, thus the name change.
      
      This also kills boot_level4_pgt which was from an earlier
      attempt to move the identity mappings as early as possible,
      and is now no longer needed.  Essentially I have replaced
      boot_level4_pgt with trampoline_level4_pgt in trampoline.S
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: 64bit ACPI wakeup trampoline · d8e1baf1
      Vivek Goyal authored
      o Moved wakeup_level4_pgt into the wakeup routine so we can
        run the kernel above 4G.
      
      o Now we first go to 64bit mode and continue to run from the trampoline and
        then start accessing kernel symbols and restore processor context.
        This enables us to resume even in relocatable kernel context when
        kernel might not be loaded at physical addr it has been compiled for.
      
      o Removed the need for modifying any existing kernel page table.
      
      o Increased the size of the wakeup routine to 8K. This is required as
        the wakeup page tables are on the trampoline itself and have to be at a
        4K boundary, hence one page is not sufficient.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: wakeup.S misc cleanups · 275f5517
      Vivek Goyal authored
      o Various cleanups. One of the main purposes of the cleanups is to make
        wakeup.S as close as possible to trampoline.S.
      
      o Following are the changes
      	- Indentations for comments.
      	- Changed the gdt table to compact form and to resemble the
      	  one in trampoline.S
      	- Take the jump to 32bit from real mode using ljmpl. Makes code
      	  more readable.
      	- After enabling long mode, directly take a long jump for 64bit
      	  mode. No need to take an extra jump to "reach_comaptibility_mode"
      	- Stack is not used after real mode. So don't load stack in
       	  32 bit mode.
      	- No need to enable PGE here.
      	- No need to do extra EFER read, anyway we trash the read contents.
      	- No need to enable system call (EFER_SCE). Anyway it will be
      	  enabled when original EFER is restored.
      	- No need to set MP, ET, NE, WP, AM bits in cr0. Very soon we will
      	  reload the original cr0 while restoring the processor state.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: wakeup.S rename registers to reflect right names · 7db681d7
      Vivek Goyal authored
      o Use appropriate names for 64bit registers.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Get rid of dead code in suspend resume · 7c17e706
      Vivek Goyal authored
      o Get rid of dead code in wakeup.S
      
      o We never restore from saved_gdt, saved_idt, saved_ltd, saved_tss, saved_cr3,
        saved_cr4, saved_cr0, real_save_gdt, saved_efer, saved_efer2. Get rid
        of the associated code.
      
      o Get rid of bogus_magic, bogus_31_magic and bogus_magic2. They are no
        longer being used.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: 64bit PIC SMP trampoline · 90b1c208
      Vivek Goyal authored
      This modifies the SMP trampoline and all of the associated code so
      it can jump to a 64bit kernel loaded at an arbitrary address.
      
      The dependencies on having an identity mapped page in the kernel
      page tables for SMP bootup have all been removed.
      
      In addition the trampoline has been modified to verify
      that long mode is supported.  Asking if long mode is implemented is
      downright silly but we have traditionally had some of these checks,
      and they can't hurt anything.  So when the totally ludicrous happens
      we just might handle it correctly.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Add EFER to the register set saved by save_processor_state · 3c321bce
      Vivek Goyal authored
      EFER varies like %cr4 depending on the cpu capabilities, and which cpu
      capabilities we want to make use of.  So save/restore it to make certain
      we have the same EFER value when we are done.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: cleanup segments · 30f47289
      Vivek Goyal authored
      Move __KERNEL32_CS up into the unused gdt entry.  __KERNEL32_CS is
      used when entering the kernel so putting it first is useful when
      trying to keep boot gdt sizes to a minimum.
      
      Set the accessed bit on all gdt entries.  We don't care
      so there is no need for the cpu to burn the extra cycles,
      and it potentially allows the pages to be immutable.  Plus
      it is confusing when debugging and your gdt entries mysteriously
      change.
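
To illustrate what pre-setting the accessed bit means, the sketch below treats a GDT entry as a 64-bit integer. In an x86 code or data segment descriptor the accessed flag is the lowest bit of the type field, which is bit 40 of the descriptor. The sample value is an illustrative 64-bit code segment descriptor, not one quoted from this patch.

```python
ACCESSED_BIT = 1 << 40  # lowest bit of the descriptor's type field


def set_accessed(descriptor):
    """Return the descriptor with its accessed bit already set."""
    return descriptor | ACCESSED_BIT


def is_accessed(descriptor):
    return bool(descriptor & ACCESSED_BIT)


# Illustrative descriptor: access byte 0x9a becomes 0x9b once the bit is set.
example = 0x00AF9A000000FFFF
print(hex(set_accessed(example)))
```

With the bit set in advance, the CPU has no reason to write the descriptor back when the segment is first loaded, so the entries stay unchanged in memory.
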
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • Vivek Goyal's avatar
      [PATCH] x86-64: modify copy_bootdata to use virtual addresses · 278c0eb7
      Vivek Goyal authored
      Use virtual addresses instead of physical addresses
      in copy bootdata.  In addition fix the implementation
      of the old bootloader convention.  Everything is
      at real_mode_data always.  It is just that sometimes
      real_mode_data was relocated by setup.S to not sit at
      0x90000.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      278c0eb7
    • Vivek Goyal's avatar
      [PATCH] x86-64: Clean up the early boot page table · 67dcbb6b
      Vivek Goyal authored
      - Merge physmem_pgt and ident_pgt, removing physmem_pgt.  The merge
        is broken as soon as mm/init.c:init_memory_mapping is run.
      - As physmem_pgt is gone don't export it in pgtable.h.
      - Use defines from pgtable.h for page permissions.
      - Fix the physical memory identity mapping so it is at the correct
        address.
      - Remove the physical memory mapping from wakeup_level4_pgt; it
        is at the wrong address so we can't possibly be using it.
      - Simplify NEXT_PAGE.  The work to calculate the phys_ alias
        of the labels was very cool.  Unfortunately it was a brittle
        special purpose hack that makes maintenance more difficult.
        Instead just use label - __START_KERNEL_map like we do
        everywhere else in assembly.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      67dcbb6b
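      The "label - __START_KERNEL_map" idiom from the commit above can be
      sketched in C.  This is only an illustration: the function name is
      hypothetical, and the base value is an assumption (the x86-64
      kernel text mapping base of that era), not something stated in the
      commit message.

```c
#include <stdint.h>

/* Assumed base of the kernel text mapping; labelled as a sketch value. */
#define SKETCH_START_KERNEL_MAP UINT64_C(0xffffffff80000000)

/*
 * Physical alias of a kernel-text virtual address: subtract the mapping
 * base, the same arithmetic as "label - __START_KERNEL_map" in assembly.
 */
static inline uint64_t sketch_kernel_text_phys(uint64_t virt)
{
	return virt - SKETCH_START_KERNEL_MAP;
}
```

      Under that assumption, a label linked at 0xffffffff80200000 has the
      physical alias 0x200000.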
    • Vivek Goyal's avatar
      [PATCH] x86-64: Kill temp boot pmds · dafe41ee
      Vivek Goyal authored
      Early in the boot process we need the ability to set
      up temporary mappings, before our normal mechanisms are
      initialized.  Currently this is used to map pages that
      are part of the page tables we are building and pages
      during the dmi scan.
      
      The core problem is that we are using the user portion of
      the page tables to implement this, which means that while
      this mechanism is active we cannot catch NULL pointer dereferences
      and we deviate from the normal ways of handling things.
      
      In this patch I modify early_ioremap to map pages into
      the kernel portion of address space, roughly where
      we will later put modules, and I make the discovery of
      which addresses we can use dynamic, which removes all
      kinds of static limits and removes the dependencies
      on implementation details between different parts of the code.
      
      Now alloc_low_page() and unmap_low_page() use
      early_ioremap() and early_iounmap() to allocate/map and
      unmap a page.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      dafe41ee
    • Vivek Goyal's avatar
      [PATCH] x86-64: Assembly safe page.h and pgtable.h · 9d291e78
      Vivek Goyal authored
      This patch makes pgtable.h and page.h safe to include
      in assembly files like head.S, allowing us to use
      symbolic constants instead of hard coded numbers when
      referring to the page tables.
      
      This patch copies asm-sparc64/const.h to asm-x86_64 to
      get a definition of _AC(), a very convenient macro that
      allows us to force the type when we are compiling the
      code in C and to drop all of the type information when
      we are using the constant in assembly.  Previously this
      was done with multiple definitions of the same constant.
      const.h was modified slightly so that it works when given
      CONFIG options as arguments.
      
      This patch adds #ifndef __ASSEMBLY__ ... #endif
      and _AC(1,UL) where appropriate so the assembler won't
      choke on the header files.  Otherwise nothing
      should have changed.
      
      AK: added const.h to exported headers to fix headers_check
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      9d291e78
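      A minimal sketch of the _AC() idea described in the commit above:
      paste a type suffix onto the constant when compiling C, and drop
      it when the header is pulled into an assembly file.  The two-level
      expansion is an assumption about how the macro copes with CONFIG
      options as arguments (the argument is expanded before the suffix
      is pasted on).  All SKETCH_ names are hypothetical.

```c
#ifdef __ASSEMBLY__
/* The assembler does not understand C suffixes, so drop the type. */
#define _AC(X, Y)	X
#else
/* In C, paste the suffix on after the argument has been expanded. */
#define __AC(X, Y)	(X##Y)
#define _AC(X, Y)	__AC(X, Y)
#endif

/* Stand-in for a CONFIG option that expands to a bare number. */
#define SKETCH_CONFIG_BASE	0x200000

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(_AC(1, UL) << SKETCH_PAGE_SHIFT)
#define SKETCH_BASE		_AC(SKETCH_CONFIG_BASE, UL)
```

      In C, SKETCH_PAGE_SIZE has type unsigned long; with __ASSEMBLY__
      defined the same header yields a plain (1 << 12) that the
      assembler can evaluate.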