1. 12 Jul, 2008 2 commits
    • Roland McGrath's avatar
      x86_64: fix delayed signals · eca91e78
      Roland McGrath authored
      On three of the several paths in entry_64.S that call
      do_notify_resume() on the way back to user mode, we fail to properly
      check again for newly-arrived work that requires another call to
      do_notify_resume() before going to user mode.  These paths set the
      mask to check only _TIF_NEED_RESCHED, but this is wrong.  The other
      paths that lead to do_notify_resume() do this correctly already, and
      entry_32.S does it correctly in all cases.
      
      All paths back to user mode have to check all the _TIF_WORK_MASK
      flags at the last possible stage, with interrupts disabled.
      Otherwise, we miss any flags (TIF_SIGPENDING for example) that were
      set any time after we entered do_notify_resume().  More work flags
      can be set (or left set) synchronously inside do_notify_resume(), as
      TIF_SIGPENDING can be, or asynchronously by interrupts or other CPUs
      (which then send an asynchronous interrupt).
      
      There are many different scenarios that could hit this bug, most of
      them races.  The simplest one to demonstrate does not require any
      race: when one signal has done handler setup at the check before
      returning from a syscall, and there is another signal pending that
      should be handled.  The second signal's handler should interrupt the
      first signal handler before it actually starts (so the interrupted PC
      is still at the handler's entry point).  Instead, it runs away until
      the next kernel entry (next syscall, tick, etc).
      
      This test behaves correctly on 32-bit kernels, and fails on 64-bit
      (either 32-bit or 64-bit test binary).  With this fix, it works.
      
          #define _GNU_SOURCE
          #include <stdio.h>
          #include <signal.h>
          #include <string.h>
          #include <sys/ucontext.h>
      
          #ifndef REG_RIP
          #define REG_RIP REG_EIP
          #endif
      
          static sig_atomic_t hit1, hit2;
      
          static void
          handler (int sig, siginfo_t *info, void *ctx)
          {
            ucontext_t *uc = ctx;
      
            if ((void *) uc->uc_mcontext.gregs[REG_RIP] == &handler)
              {
                if (sig == SIGUSR1)
                  hit1 = 1;
                else
                  hit2 = 1;
              }
      
            printf ("%s at %#lx\n", strsignal (sig),
                    uc->uc_mcontext.gregs[REG_RIP]);
          }
      
          int
          main (void)
          {
            struct sigaction sa;
            sigset_t set;
      
            sigemptyset (&sa.sa_mask);
            sa.sa_flags = SA_SIGINFO;
            sa.sa_sigaction = &handler;
      
            if (sigaction (SIGUSR1, &sa, NULL)
                || sigaction (SIGUSR2, &sa, NULL))
              return 2;
      
            sigemptyset (&set);
            sigaddset (&set, SIGUSR1);
            sigaddset (&set, SIGUSR2);
            if (sigprocmask (SIG_BLOCK, &set, NULL))
              return 3;
      
            printf ("main at %p, handler at %p\n", &main, &handler);
      
            raise (SIGUSR1);
            raise (SIGUSR2);
      
            if (sigprocmask (SIG_UNBLOCK, &set, NULL))
              return 4;
      
            if (hit1 + hit2 == 1)
              {
                puts ("PASS");
                return 0;
              }
      
            puts ("FAIL");
            return 1;
          }
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      eca91e78
    • Rafael J. Wysocki's avatar
      x86: remove conflicting nx6325 and nx6125 quirks · da1f29f5
      Rafael J. Wysocki authored
      We have two conflicting DMA-based quirks in there for the same set of
      boxes (HP nx6325 and nx6125) and one of them actually breaks my box.
      
      So remove the extra code.
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: =?iso-8859-1?q?T=F6r=F6k_Edwin?= <edwintorok@gmail.com>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      da1f29f5
  2. 11 Jul, 2008 16 commits
    • Ingo Molnar's avatar
      6c82a000
    • Maciej W. Rozycki's avatar
      x86: Recover timer_ack lost in the merge of the NMI watchdog · 5b4d2386
      Maciej W. Rozycki authored
      In the course of the recent unification of the NMI watchdog an assignment
      to timer_ack to switch off unnecesary POLL commands to the 8259A in the
      case of a watchdog failure has been accidentally removed.  The statement
      used to be limited to the 32-bit variation as since the rewrite of the
      timer code it has been relevant for the 82489DX only.  This change brings
      it back.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      5b4d2386
    • Maciej W. Rozycki's avatar
      x86: I/O APIC: Never configure IRQ2 · af174783
      Maciej W. Rozycki authored
      There is no such entity as ISA IRQ2.  The ACPI spec does not make it
      explicitly clear, but does not preclude it either -- all it says is ISA
      legacy interrupts are identity mapped by default (subject to overrides),
      but it does not state whether IRQ2 exists or not.  As a result if there is
      no IRQ0 override, then IRQ2 is normally initialised as an ISA interrupt,
      which implies an edge-triggered line, which is unmasked by default as this
      is what we do for edge-triggered I/O APIC interrupts so as not to miss an
      edge.
      
      To the best of my knowledge it is useless, as IRQ2 has not been in use
      since the PC/AT as back then it was taken by the 8259A cascade interrupt
      to the slave, with the line position in the slot rerouted to newly-created
      IRQ9.  No device could thus make use of this line with the pair of 8259A
      chips.  Now in theory INTIN2 of the I/O APIC may be usable, but the
      interrupt of the device wired to it would not be available in the PIC mode
      at all, so I seriously doubt if anybody decided to reuse it for a regular
      device.
      
      However there are two common uses of INTIN2.  One is for IRQ0, with an
      ACPI interrupt override (or its equivalent in the MP table).  But in this
      case IRQ2 is gone entirely with INTIN0 left vacant.  The other one is for
      an 8959A ExtINTA cascade.  In this case IRQ0 goes to INTIN0 and if ACPI is
      used INTIN2 is assumed to be IRQ2 (there is no override and ACPI has no
      way to report ExtINTA interrupts).  This is where a problem happens.
      
      The problem is INTIN2 is configured as a native APIC interrupt, with a
      vector assigned and the mask cleared.  And the line may indeed get active
      and inject interrupts if the master 8959A has its timer interrupt enabled
      (it might happen for other interrupts too, but they are normally masked in
      the process of rerouting them to the I/O APIC).  There are two cases where
      it will happen:
      
      * When the I/O APIC NMI watchdog is enabled.  This is actually a misnomer
        as the watchdog pulses are delivered through the 8259A to the LINT0
        inputs of all the local APICs in the system.  The implication is the
        output of the master 8259A goes high and low repeatedly, signalling
        interrupts to INTIN2 which is enabled too!
      
        [The origin of the name is I think for a brief period during the
        development we had a capability in our code to configure the watchdog to
        use an I/O APIC input; that would be INTIN2 in this scenario.]
      
      * When the native route of IRQ0 via INTIN0 fails for whatever reason -- as
        it happens with the system considered here.  In this scenario the timer
        pulse is delivered through the 8259A to LINT0 input of the local APIC of
        the bootstrap processor, quite similarly to how is done for the watchdog
        described above.  The result is, again, INTIN2 receives these pulses
        too.  Rafael's system used to escape this scenario, because an incorrect
        IRQ0 override would occupy INTIN2 and prevent it from being unmasked.
      
      My conclusion is IRQ2 should be excluded from configuration in all the
      cases and the current exception for ACPI systems should be lifted.  The
      reason being the exception not only being useless, but harmful as well.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      af174783
    • Maciej W. Rozycki's avatar
      x86: L-APIC: Always fully configure IRQ0 · c88ac1df
      Maciej W. Rozycki authored
      Unlike the 32-bit one, the 64-bit variation of the LVT0 setup code for
      the "8259A Virtual Wire" through the local APIC timer configuration does
      not fully configure the relevant irq_chip structure.  Instead it relies on
      the preceding I/O APIC code to have set it up, which does not happen if
      the I/O APIC variants have not been tried.
      
      The patch includes corresponding changes to the 32-bit variation too
      which make them both the same, barring a small syntactic difference
      involving sequence of functions in the source.  That should work as an aid
      with the upcoming merge.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c88ac1df
    • Maciej W. Rozycki's avatar
      x86: L-APIC: Set IRQ0 as edge-triggered · 1baea6e2
      Maciej W. Rozycki authored
       IRQ0 is edge-triggered, but the "8259A Virtual Wire" through the local
      APIC configuration in the 32-bit version uses the "fasteoi" handler
      suitable for level-triggered APIC interrupt.  Rewrite code so that the
      "edge" handler is used.  The 64-bit version uses different code and is
      unaffected.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      1baea6e2
    • Glauber Costa's avatar
      x86: merge dwarf2 headers · 392a0fc9
      Glauber Costa authored
      Merge dwarf2_32.h and dwarf2_64.h into dwarf2.h.
      Signed-off-by: default avatarGlauber Costa <gcosta@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      392a0fc9
    • Glauber Costa's avatar
      x86: use AS_CFI instead of UNWIND_INFO · d73a731a
      Glauber Costa authored
      In dwarf2_32.h, test for CONFIG_AS_CFI instead of
      CONFIG_UNWIND_INFO. Turns out that searching for UNWIND_INFO
      returns no match in any Kconfig or Makefile, so we're really
      just throwing everything away regarding dwarf frames for i386.
      
      The test that generates CONFIG_AS_CFI does not have anything
      x86_64-specific, and right now, checking V=1 builds shows me
      that the flags is there anyway, although unused.
      Signed-off-by: default avatarGlauber Costa <gcosta@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d73a731a
    • Glauber Costa's avatar
      x86: use ignore macro instead of hash comment · 70f1bba4
      Glauber Costa authored
      In dwarf_64.h header, use the "ignore" macro the way
      i386 does.
      Signed-off-by: default avatarGlauber Costa <gcosta@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      70f1bba4
    • Glauber Costa's avatar
      x86: use matching CFI_ENDPROC · 557d7d4e
      Glauber Costa authored
      The RING0_INT_FRAME macro defines a CFI_STARTPROC.
      So we should really be using CFI_ENDPROC after it.
      Signed-off-by: default avatarGlauber Costa <gcosta@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      557d7d4e
    • Ingo Molnar's avatar
      x86: fix savesegment() bug causing crashes on 64-bit · d9fc3fd3
      Ingo Molnar authored
      i spent a fair amount of time chasing a 64-bit bootup crash that manifested
      itself as bootup segfaults:
      
        S10network[1825]: segfault at 7f3e2b5d16b8 ip 00000031108748c9 sp 00007fffb9c14c70 error 4 in libc-2.7.so[3110800000+14d000]
      
      eventually causing init to die and panic the system:
      
        Kernel panic - not syncing: Attempted to kill init!
        Pid: 1, comm: init Not tainted 2.6.26-rc9-tip #13878
      
      after a maratonic bisection session, the bad commit turned out to be:
      
      | b7675791859075418199c7af86a116ea34eaf5bd is first bad commit
      | commit b7675791859075418199c7af86a116ea34eaf5bd
      | Author: Jeremy Fitzhardinge <jeremy@goop.org>
      | Date:   Wed Jun 25 00:19:00 2008 -0400
      |
      |     x86: remove open-coded save/load segment operations
      |
      |     This removes a pile of buggy open-coded implementations of savesegment
      |     and loadsegment.
      
      after some more bisection of this patch itself, it turns out that what
      makes the difference are the savesegment() changes to __switch_to().
      
      Taking a look at this portion of arch/x86/kernel/process_64.o revealed
      this crutial difference:
      
      | good:    99c:       8c e0                   mov    %fs,%eax
      |          99e:       89 45 cc                mov    %eax,-0x34(%rbp)
      |
      | bad:     99c:       8c 65 cc                mov    %fs,-0x34(%rbp)
      
      which is due to:
      
      |                 unsigned fsindex;
      | -               asm volatile("movl %%fs,%0" : "=r" (fsindex));
      | +               savesegment(fs, fsindex);
      
      savesegment() is implemented as:
      
       #define savesegment(seg, value)                                \
                asm("mov %%" #seg ",%0":"=rm" (value) : : "memory")
      
      note the "m" modifier - it allows GCC to generate the segment move
      into a memory operand as well.
      
      But regarding segment operands there's a subtle detail in the x86
      instruction set: the above 16-bit moves are zero-extend, but only
      if it goes to a register.
      
      If it goes to a memory operand, -0x34(%rbp) in the above case, there's
      no zero-extend to 32-bit and the instruction will only save 16 bits
      instead of the intended 32-bit.
      
      The other 16 bits is random data - which can cause problems when that
      value is used later on.
      
      The solution is to only allow segment operands to go to registers.
      This fix allows my test-system to boot up without crashing.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d9fc3fd3
    • Jeremy Fitzhardinge's avatar
      x86_64: vdso32 cleanup using feature flags · b6ad92d4
      Jeremy Fitzhardinge authored
      Use the X86_FEATURE_SYSENTER32 to remove hard-coded CPU vendor check.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b6ad92d4
    • Jeremy Fitzhardinge's avatar
      x86_64: add pseudo-features for 32-bit compat syscall · 8d28aab5
      Jeremy Fitzhardinge authored
      Add pseudo-feature bits to describe whether the CPU supports sysenter
      and/or syscall from ia32-compat userspace.  This removes a hardcoded
      test in vdso32-setup.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8d28aab5
    • Ingo Molnar's avatar
      x86: fix tsc unification buglet with ftrace and stackprotector · 3d0decc4
      Ingo Molnar authored
      Yinghai Lu reported crashes on 64-bit x86:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       IP: [<ffffffff80253b17>] hrtick_start_fair+0x89/0x173
       [...]
      
      And with a long session of debugging and a lot of difficulty, tracked it down
      to this commit:
      
       --------------->
       8fbbc4b4 is first bad commit
       commit 8fbbc4b4
       Author: Alok Kataria <akataria@vmware.com>
       Date:   Tue Jul 1 11:43:34 2008 -0700
      
           x86: merge tsc_init and clocksource code
       <--------------
      
      The problem is that the TSC unification missed these Makefile rules
      in arch/x86/kernel/Makefile:
      
        # Do not profile debug and lowlevel utilities
        CFLAGS_REMOVE_tsc_64.o = -pg
        CFLAGS_REMOVE_tsc_32.o = -pg
        ...
        CFLAGS_tsc_64.o         := $(nostackp)
        ...
      
      which rules make sure that various instrumentation and debugging
      facilities are disabled for code that might end up in a VDSO - such as
      the TSC code.
      Reported-and-bisected-by: default avatarYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      
      Conflicts:
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3d0decc4
    • Yinghai Lu's avatar
      x86: introduce max_low_pfn_mapped for 64-bit · f361a450
      Yinghai Lu authored
      when more than 4g memory is installed, don't map the big hole below 4g.
      Signed-off-by: default avatarYinghai Lu <yhlu.kernel@gmail.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      f361a450
    • Yinghai Lu's avatar
      x86: reserve SLIT · f302a5bb
      Yinghai Lu authored
      save the SLIT, in case we are using fixmap to read it, and that fixmap
      could be cleared by others.
      Signed-off-by: default avatarYinghai Lu <yhlu.kernel@gmail.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      f302a5bb
    • Yinghai Lu's avatar
      x86: e820: user-defined memory maps: remove the range instead of update it to reserved · 69a7704d
      Yinghai Lu authored
      also let mem= to print out modified e820 map too
      Signed-off-by: default avatarYinghai Lu <yhlu.kernel@gmail.com>
      Cc: Bernhard Walle <bwalle@suse.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      69a7704d
  3. 10 Jul, 2008 22 commits