- 03 Jun, 2009 9 commits
-
-
Andi Kleen authored
The exception handler should behave differently if the exception is fatal versus one that can be returned from. In the first case it should never clear any registers because these need to be preserved for logging after the next boot. Otherwise it should clear them on each CPU step by step so that other CPUs sharing the same bank don't see duplicate events. Otherwise we risk reporting events multiple times on any CPUs which have shared machine check banks, which is a common problem on Intel Nehalem which has both SMT (two CPU threads sharing banks) and shared machine check banks in the uncore. Determine early in a special pass if any event requires a panic. This uses the mce_severity() function added earlier. This is needed for the next patch. Also fixes a problem together with an earlier patch that corrected events weren't logged on a fatal MCE. [ Impact: Feature ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Andi Kleen authored
The machine check grading (as in deciding what should be done for a given register value) has to be done multiple times soon and it's also getting more complicated. So it makes sense to consolidate it into a single function. To get smaller and more straight forward and possibly more extensible code I opted towards a new table driven method. The various rules are put into a table when is then executed by a very simple interpreter. The grading engine is in a new file mce-severity.c. I also added a private include file mce-internal.h, because mce.h is already a bit too cluttered. This is dead code right now, but will be used in followon patches. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Andi Kleen authored
Previously mce_panic used a simple heuristic to avoid printing old so far unreported machine check events on a mce panic. This worked by comparing the TSC value at the start of the machine check handler with the event time stamp and only printing newer ones. This has a couple of issues, in particular on systems where the TSC is not fully synchronized between CPUs it could lose events or print old ones. It is also problematic with full system synchronization as it is added by the next patch. Remove the TSC heuristic and instead replace it with a simple heuristic to print corrected errors first and after that uncorrected errors and finally the worst machine check as determined by the machine check handler. This simplifies the code because there is no need to pass the original TSC value around. Contains fixes from Ying Huang [ Impact: bug fix, cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Ying Huang <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Andi Kleen authored
Normally the machine check handler ignores corrected errors and leaves them to machine_check_poll(). But when panicing mcp won't run, so log all errors. Note: this can still miss some cases until the "early no way out" patch later is applied too. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Andi Kleen authored
Experience has shown that struct mce which is used to pass an machine check to the user space daemon currently a few limitations. Also some data which is useful to print at panic level is also missing. This patch addresses most of them. The same information is also printed out together with mce panic. struct mce can be painlessly extended in a compatible way, the mcelog user space code just ignores additional fields with a warning. - It doesn't provide a wall time timestamp. There have been a few complaints about that. Fix that by adding a 64bit time_t - It doesn't provide the exact CPU identification. This makes it awkward for mcelog to decode the event correctly, especially when there are variations in the supported MCE codes on different CPU models or when mcelog is running on a different host after a panic. Previously the administrator had to specify the correct CPU when mcelog ran on a different host, but with the more variation in machine checks now it's better to auto detect that. It's also useful for more detailed analysis of CPU events. Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead. - Socket ID and initial APIC ID are useful to report because they allow to identify the failing CPU in some (not all) cases. This is also especially useful for the panic situation. This addresses one of the complaints from Thomas Gleixner earlier. - The MCG capabilities MSR needs to be reported for some advanced error processing in mcelog Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Andi Kleen authored
The old struct mce had a limitation to 256 CPUs. But x86 Linux supports more than that now with x2apic. Add a new field extcpu to report the extended number. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Andi Kleen authored
This makes it easier for tools who want to extract the mcelog out of crash images or memory dumps to adapt to changing struct mce size. The length field replaces padding, so it's fully compatible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Andi Kleen authored
Keep a count of the machine check polls (or CMCI events) in /proc/interrupts. Andi needs this for debugging, but it's also useful in general to see what's going in by the kernel. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Andi Kleen authored
Useful for debugging, but it's also good general policy to have a counter for all special interrupts there. This makes it easier to diagnose where a CPU is spending its time. [ Impact: feature, debugging tool ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
- 01 Jun, 2009 13 commits
-
-
H. Peter Anvin authored
Merge reason: arch/x86/kernel/irqinit_{32,64}.c unified in irq/numa and modified in x86/mce3; this merge resolves the conflict. Conflicts: arch/x86/kernel/irqinit.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Ingo Molnar authored
Merge reason: irq/numa didnt build because this commit: 2759c328: x86: don't call read_apic_id if !cpu_has_apic Had a dependency on x86/cpufeature changes. Pull in that (small) branch to fix the dependency. Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Ingo Molnar authored
Conflicts: arch/mips/sibyte/bcm1480/irq.c arch/mips/sibyte/sb1250/irq.c Merge reason: we gathered a few conflicts plus update to latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/stagingLinus Torvalds authored
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: hwmon: Update documentation on fan_max hwmon: (lm78) Add missing __devexit_p()
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc64: Fix section attribute warnings. sparc64: Fix SET_PERSONALITY to not clip bits outside of PER_MASK.
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: 3c509: Add missing EISA IDs MAINTAINERS: take maintainership of the cpmac Ethernet driver net/firmare: Ignore .cis files ath1e: add new device id for asus hardware mlx4_en: Fix a kernel panic when waking tx queue rtl8187: add USB ID for Linksys WUSB54GC-EU v2 USB wifi dongle at76c50x-usb: avoid mutex deadlock in at76_dwork_hw_scan mac8390: fix build with NET_POLL_CONTROLLER cxgb3: link fault fixes cxgb3: fix dma mapping regression netfilter: nfnetlink_log: fix wrong skbuff size calculation netfilter: xt_hashlimit does a wrong SEQ_SKIP bfin_mac: fix build error due to net_device_ops convert atlx: move modinfo data from atlx.h to atl1.c gianfar: fix babbling rx error event bug cls_cgroup: read classid atomically in classifier netfilter: nf_ct_dccp: add missing DCCP protocol changes in event cache netfilter: nf_ct_tcp: fix accepting invalid RST segments
-
git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/headers-check-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/headers-check-2.6: headers_check fix: linux/net_dropmon.h headers_check fix: linux/auto_fs.h
-
Christian Engelmayer authored
Add fan_max description. Add fan limit alarm 'max_alarm' to the alarm section. Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
-
Mike Frysinger authored
The remove function uses __devexit, so the .remove assignment needs __devexit_p() to fix a build error with hotplug disabled. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
-
Maciej W. Rozycki authored
Several EISA device IDs for 3c509 family network cards are missing from the driver, making the cards unusable in their EISA mode. Here's a fix to add them based on the EISA configuration files distributed by 3Com and our eisa.ids database. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
This patch adds me as the maintainer of the CPMAC (AR7) Ethernet driver. Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jaswinder Singh Rajput authored
fix the following 'make headers_check' warnings: usr/include/linux/net_dropmon.h:7: found __[us]{8,16,32,64} type without #include <linux/types.h> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
-
Jaswinder Singh Rajput authored
fix the following 'make headers_check' warnings: usr/include/linux/auto_fs.h:17: include of <linux/types.h> is preferred over <asm/types.h> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
-
- 30 May, 2009 18 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: ide_pci_generic: add quirk for Netcell ATA RAID
-
Bartlomiej Zolnierkiewicz authored
We need to explicitly mark words 85-87 as valid ones since firmware doesn't do it. This should fix support for LBA48 and FLUSH CACHE [EXT] command which stopped working after we applied more strict checking of identify words in: commit 942dcd85 ("ide: idedisk_supports_lba48() -> ata_id_lba48_enabled()") and commit 4b58f17d ("ide: ide_id_has_flush_cache() -> ata_id_flush_enabled()") Reported-and-tested-by: "Trevor Hemsley" <trevor.hemsley@ntlworld.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2Linus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix bh leak in nilfs_cpfile_delete_checkpoints function
-
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6Linus Torvalds authored
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI, i915: build fix (v2) acpi-cpufreq: fix printk typo and indentation ACPI processor: remove spurious newline from warning message drm/i915: acpi/video.c fix section mismatch warning ACPI: video: DMI workaround broken Acer 5315 BIOS enabling display brightness ACPI: video: DMI workaround broken eMachines E510 BIOS enabling display brightness ACPI: sanity check _PSS frequency to prevent cpufreq crash i7300_idle: allow testing on i5000-series hardware w/o re-compile PCI/ACPI: fix wrong ref count handling in acpi_pci_bind() cpuidle: fix AMD C1E suspend hang cpuidle: makes AMD C1E work in acpi_idle
-
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_txLinus Torvalds authored
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: fsldma: Fix compile warnings fsldma: fix memory leak on error path in fsl_dma_prep_memcpy() fsldma: snooping is not enabled for last entry in descriptor chain fsldma: fix infinite loop on multi-descriptor DMA chain completion fsldma: fix "DMA halt timeout!" errors fsldma: fix check on potential fdev->chan[] overflow fsldma: update mailling list address in MAINTAINERS
-
Ryusuke Konishi authored
The nilfs_cpfile_delete_checkpoints() wrongly skips brelse() for the header block of checkpoint file in case of errors. This fixes the leak bug. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
Matt Kraai authored
Signed-off-by: Matt Kraai <kraai@ftbfs.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Greg Kroah-Hartman authored
Gary Lin reports that a new device id needs to be added to the atl1e in order to get some new Asus hardware to work properly. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yevgeny Petrilin authored
When the transmit queue gets full we enable interrupts for TX completions There was a race that we handled the TX queue both from the interrupt context and from the transmit function. Using "spin_trylock_irq()" ensures this doesn't happen. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Len Brown authored
Merge branches 'bugzilla-13121+', 'bugzilla-13233', 'redhat-bugzilla-500311', 'pci-bind-oops', 'misc-2.6.30' and 'i7300_idle' into release
-
Len Brown authored
drivers/built-in.o: In function `intel_opregion_init': (.text+0x9d540): undefined reference to `acpi_video_register' v2: move under DRM_I915 from DRM_I915_KMS Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
-
Joe Perches authored
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
-
Frans Pop authored
Commit 4973b22a ("ACPI processor: reset the throttling state once it's invalid") introduced a new warning which prints a spurious newline. The ACPI_WARNING macro that is used already takes care of adding a newline, after adding ACPI_CA_VERSION to the message. Remove the newline to avoid the message getting split into two lines. Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: Len Brown <len.brown@intel.com>
-
Jaswinder Singh Rajput authored
Currently acpi_video_exit() is exported as well as using __exit which causes: WARNING: drivers/acpi/video.o(__ksymtab+0x0): Section mismatch in reference from the variable __ksymtab_acpi_video_exit to the function .exit.text:acpi_video_exit() The symbol acpi_video_exit is exported and annotated __exit Fix this by removing the __exit annotation of acpi_video_exit or drop the export. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Zhang Rui authored
http://bugzilla.kernel.org/show_bug.cgi?id=13121Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Zhang Rui authored
http://bugzilla.kernel.org/show_bug.cgi?id=13376Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Len Brown authored
When BIOS SETUP is changed to disable EIST, some BIOS hand the OS an un-initialized _PSS: Name (_PSS, Package (0x06) { Package (0x06) { 0x80000000, // frequency [MHz] 0x80000000, // power [mW] 0x80000000, // latency [us] 0x80000000, // BM latency [us] 0x80000000, // control 0x80000000 // status }, ... These are outrageous values for frequency, power and latency, raising the question where to draw the line between legal and illegal. We tend to survive garbage in the power and latency fields, but we can BUG_ON when garbage is in the frequency field. Cpufreq multiplies the frequency by 1000 and stores it in a u32 KHz. So disregard a _PSS with a frequency so large that it can't be represented by cpufreq. https://bugzilla.redhat.com/show_bug.cgi?id=500311Signed-off-by: Len Brown <len.brown@intel.com>
-