Commit d4220f98 authored by Linus Torvalds

Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6

* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (34 commits)
  HWPOISON: Remove stray phrase in a comment
  HWPOISON: Try to allocate migration page on the same node
  HWPOISON: Don't do early filtering if filter is disabled
  HWPOISON: Add a madvise() injector for soft page offlining
  HWPOISON: Add soft page offline support
  HWPOISON: Undefine short-hand macros after use to avoid namespace conflict
  HWPOISON: Use new shake_page in memory_failure
  HWPOISON: Use correct name for MADV_HWPOISON in documentation
  HWPOISON: mention HWPoison in Kconfig entry
  HWPOISON: Use get_user_page_fast in hwpoison madvise
  HWPOISON: add an interface to switch off/on all the page filters
  HWPOISON: add memory cgroup filter
  memcg: add accessor to mem_cgroup.css
  memcg: rename and export try_get_mem_cgroup_from_page()
  HWPOISON: add page flags filter
  mm: export stable page flags
  HWPOISON: limit hwpoison injector to known page types
  HWPOISON: add fs/device filters
  HWPOISON: return 0 to indicate success reliably
  HWPOISON: make semantics of IGNORED/DELAYED clear
  ...
parents 61cf6931 f2c03deb
What: /sys/devices/system/memory/soft_offline_page
Date: Sep 2009
KernelVersion: 2.6.33
Contact: andi@firstfloor.org
Description:
Soft-offline the memory page containing the physical address
written into this file. Input is a hex number specifying the
physical address of the page. The kernel will then attempt
to soft-offline it, by moving the contents elsewhere or
dropping it if possible. The page will then be placed
on the bad page list and never be reused.
The offlining is done in kernel specific granularity.
Normally it's the base page size of the kernel, but
this might change.
The page must still be accessible, not poisoned. The
kernel will never kill anything for this, but rather
fail the offline. Return value is the size of the
number, or an error when the offlining failed. Reading
the file is not allowed.
What: /sys/devices/system/memory/hard_offline_page
Date: Sep 2009
KernelVersion: 2.6.33
Contact: andi@firstfloor.org
Description:
Hard-offline the memory page containing the physical
address written into this file. Input is a hex number
specifying the physical address of the page. The
kernel will then attempt to hard-offline the page, by
trying to drop the page, killing any owner, or
triggering IO errors if needed. Note this may kill
any processes owning the page. The kernel will avoid
accessing this page, assuming it is poisoned by the
hardware.
The offlining is done in kernel specific granularity.
Normally it's the base page size of the kernel, but
this might change.
Return value is the size of the number, or an error when
the offlining failed.
Reading the file is not allowed.
@@ -92,16 +92,62 @@ PR_MCE_KILL_GET
Testing:
madvise(MADV_HWPOISON, ....)
	(as root)
	Poison a page in the process for testing

hwpoison-inject module through debugfs /sys/debug/hwpoison/

corrupt-pfn
Inject hwpoison fault at PFN echoed into this file. This does
some early filtering to avoid corrupting unintended pages in test suites.
unpoison-pfn
Software-unpoison page at PFN echoed into this file. This
way a page can be reused again.
This only works for Linux injected failures, not for real
memory failures.
Note these injection interfaces are not stable and might change between
kernel versions.
corrupt-filter-dev-major
corrupt-filter-dev-minor
Only handle memory failures on pages associated with the file system
defined by the block device major/minor numbers. -1U is the wildcard value.
This should only be used for testing with artificial injection.
corrupt-filter-memcg
Limit injection to pages owned by a memory cgroup, specified by the
inode number of the memcg.
Example:
mkdir /cgroup/hwpoison
usemem -m 100 -s 1000 &
echo `jobs -p` > /cgroup/hwpoison/tasks
memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ')
echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg
page-types -p `pidof init` --hwpoison # shall do nothing
page-types -p `pidof usemem` --hwpoison # poison its pages
corrupt-filter-flags-mask
corrupt-filter-flags-value
When specified, only poison pages if ((page_flags & mask) == value).
This allows stress testing of many kinds of pages. The page_flags
are the same as in /proc/kpageflags. The flag bits are defined in
include/linux/kernel-page-flags.h and documented in
Documentation/vm/pagemap.txt
Architecture specific MCE injector
/*
 * page-types: Tool for querying page flags
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the Free
 * Software Foundation; version 2.
 *
 * This program is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
 * more details.
 *
 * You should find a copy of v2 of the GNU General Public License somewhere on
 * your Linux system; if not, write to the Free Software Foundation, Inc., 59
 * Temple Place, Suite 330, Boston, MA 02111-1307 USA.
 *
 * Copyright (C) 2009 Intel corporation
 *
 * Authors: Wu Fengguang <fengguang.wu@intel.com>
 */

#define _LARGEFILE64_SOURCE
@@ -2377,6 +2377,15 @@ W: http://www.kernel.org/pub/linux/kernel/people/fseidel/hdaps/
S:	Maintained
F:	drivers/hwmon/hdaps.c
HWPOISON MEMORY FAILURE HANDLING
M: Andi Kleen <andi@firstfloor.org>
L: linux-mm@kvack.org
L: linux-kernel@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6.git hwpoison
S: Maintained
F: mm/memory-failure.c
F: mm/hwpoison-inject.c
HYPERVISOR VIRTUAL CONSOLE DRIVER
L:	linuxppc-dev@ozlabs.org
S:	Odd Fixes
@@ -341,6 +341,64 @@ static inline int memory_probe_init(void)
}
#endif
#ifdef CONFIG_MEMORY_FAILURE
/*
* Support for offlining pages of memory
*/
/* Soft offline a page */
static ssize_t
store_soft_offline_page(struct class *class, const char *buf, size_t count)
{
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (strict_strtoull(buf, 0, &pfn) < 0)
return -EINVAL;
pfn >>= PAGE_SHIFT;
if (!pfn_valid(pfn))
return -ENXIO;
ret = soft_offline_page(pfn_to_page(pfn), 0);
return ret == 0 ? count : ret;
}
/* Forcibly offline a page, including killing processes. */
static ssize_t
store_hard_offline_page(struct class *class, const char *buf, size_t count)
{
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (strict_strtoull(buf, 0, &pfn) < 0)
return -EINVAL;
pfn >>= PAGE_SHIFT;
ret = __memory_failure(pfn, 0, 0);
return ret ? ret : count;
}
static CLASS_ATTR(soft_offline_page, 0644, NULL, store_soft_offline_page);
static CLASS_ATTR(hard_offline_page, 0644, NULL, store_hard_offline_page);
static __init int memory_fail_init(void)
{
int err;
err = sysfs_create_file(&memory_sysdev_class.kset.kobj,
&class_attr_soft_offline_page.attr);
if (!err)
err = sysfs_create_file(&memory_sysdev_class.kset.kobj,
&class_attr_hard_offline_page.attr);
return err;
}
#else
static inline int memory_fail_init(void)
{
return 0;
}
#endif
/*
 * Note that phys_device is optional. It is here to allow for
 * differentiation between which *physical* devices each
@@ -471,6 +529,9 @@ int __init memory_dev_init(void)
	}

	err = memory_probe_init();
	if (!ret)
		ret = err;
	err = memory_fail_init();
	if (!ret)
		ret = err;
	err = block_size_init();
@@ -8,6 +8,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/hugetlb.h>
#include <linux/kernel-page-flags.h>
#include <asm/uaccess.h>

#include "internal.h"

@@ -71,52 +72,12 @@ static const struct file_operations proc_kpagecount_operations = {
 * physical page flags.
 */
/* These macros are used to decouple internal flags from exported ones */
#define KPF_LOCKED 0
#define KPF_ERROR 1
#define KPF_REFERENCED 2
#define KPF_UPTODATE 3
#define KPF_DIRTY 4
#define KPF_LRU 5
#define KPF_ACTIVE 6
#define KPF_SLAB 7
#define KPF_WRITEBACK 8
#define KPF_RECLAIM 9
#define KPF_BUDDY 10
/* 11-20: new additions in 2.6.31 */
#define KPF_MMAP 11
#define KPF_ANON 12
#define KPF_SWAPCACHE 13
#define KPF_SWAPBACKED 14
#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16
#define KPF_HUGE 17
#define KPF_UNEVICTABLE 18
#define KPF_HWPOISON 19
#define KPF_NOPAGE 20
#define KPF_KSM 21
/* kernel hacking assistances
* WARNING: subject to change, never rely on them!
*/
#define KPF_RESERVED 32
#define KPF_MLOCKED 33
#define KPF_MAPPEDTODISK 34
#define KPF_PRIVATE 35
#define KPF_PRIVATE_2 36
#define KPF_OWNER_PRIVATE 37
#define KPF_ARCH 38
#define KPF_UNCACHED 39
static inline u64 kpf_copy_bit(u64 kflags, int ubit, int kbit)
{
	return ((kflags >> kbit) & 1) << ubit;
}

u64 stable_page_flags(struct page *page)
{
	u64 k;
	u64 u;
@@ -219,7 +180,7 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
		else
			ppage = NULL;

		if (put_user(stable_page_flags(ppage), out)) {
			ret = -EFAULT;
			break;
		}
@@ -40,6 +40,7 @@
#define MADV_DONTFORK	10		/* don't inherit across fork */
#define MADV_DOFORK	11		/* do inherit across fork */
#define MADV_HWPOISON	100		/* poison a page for testing */
#define MADV_SOFT_OFFLINE 101		/* soft offline page for testing */

#define MADV_MERGEABLE	12		/* KSM may merge identical pages */
#define MADV_UNMERGEABLE 13		/* KSM may not merge identical pages */
#ifndef LINUX_KERNEL_PAGE_FLAGS_H
#define LINUX_KERNEL_PAGE_FLAGS_H
/*
* Stable page flag bits exported to user space
*/
#define KPF_LOCKED 0
#define KPF_ERROR 1
#define KPF_REFERENCED 2
#define KPF_UPTODATE 3
#define KPF_DIRTY 4
#define KPF_LRU 5
#define KPF_ACTIVE 6
#define KPF_SLAB 7
#define KPF_WRITEBACK 8
#define KPF_RECLAIM 9
#define KPF_BUDDY 10
/* 11-20: new additions in 2.6.31 */
#define KPF_MMAP 11
#define KPF_ANON 12
#define KPF_SWAPCACHE 13
#define KPF_SWAPBACKED 14
#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16
#define KPF_HUGE 17
#define KPF_UNEVICTABLE 18
#define KPF_HWPOISON 19
#define KPF_NOPAGE 20
#define KPF_KSM 21
/* kernel hacking assistances
* WARNING: subject to change, never rely on them!
*/
#define KPF_RESERVED 32
#define KPF_MLOCKED 33
#define KPF_MAPPEDTODISK 34
#define KPF_PRIVATE 35
#define KPF_PRIVATE_2 36
#define KPF_OWNER_PRIVATE 37
#define KPF_ARCH 38
#define KPF_UNCACHED 39
#endif /* LINUX_KERNEL_PAGE_FLAGS_H */
@@ -73,6 +73,7 @@ extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page);
extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);

static inline
@@ -85,6 +86,8 @@ int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
	return cgroup == mem;
}

extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem);

extern int
mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr);
extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
@@ -202,6 +205,11 @@ mem_cgroup_move_lists(struct page *page, enum lru_list from, enum lru_list to)
{
}
static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
{
return NULL;
}
static inline int mm_match_cgroup(struct mm_struct *mm, struct mem_cgroup *mem)
{
	return 1;
}
@@ -213,6 +221,11 @@ static inline int task_in_mem_cgroup(struct task_struct *task,
	return 1;
}
static inline struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem)
{
return NULL;
}
static inline int
mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
{
@@ -1331,11 +1331,17 @@ extern int account_locked_memory(struct mm_struct *mm, struct rlimit *rlim,
			size_t size);
extern void refund_locked_memory(struct mm_struct *mm, size_t size);
enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
};
extern void memory_failure(unsigned long pfn, int trapno);
extern int __memory_failure(unsigned long pfn, int trapno, int flags);
extern int unpoison_memory(unsigned long pfn);
extern int sysctl_memory_failure_early_kill;
extern int sysctl_memory_failure_recovery;
extern void shake_page(struct page *p, int access);
extern atomic_long_t mce_bad_pages;
extern int soft_offline_page(struct page *page, int flags);
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
@@ -275,13 +275,15 @@ PAGEFLAG_FALSE(Uncached)
#ifdef CONFIG_MEMORY_FAILURE
PAGEFLAG(HWPoison, hwpoison)
TESTSCFLAG(HWPoison, hwpoison)
#define __PG_HWPOISON (1UL << PG_hwpoison)
#else
PAGEFLAG_FALSE(HWPoison)
#define __PG_HWPOISON 0
#endif
u64 stable_page_flags(struct page *page);
static inline int PageUptodate(struct page *page)
{
	int ret = test_bit(PG_uptodate, &(page)->flags);
@@ -251,8 +251,9 @@ config MEMORY_FAILURE
	  special hardware support and typically ECC memory.

config HWPOISON_INJECT
	tristate "HWPoison pages injector"
	depends on MEMORY_FAILURE && DEBUG_KERNEL
	select PROC_PAGE_MONITOR

config NOMMU_INITIAL_TRIM_EXCESS
	int "Turn on mmap() excess space trimming before booting"
@@ -3,18 +3,68 @@
#include <linux/debugfs.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/pagemap.h>
#include "internal.h"
static struct dentry *hwpoison_dir;

static int hwpoison_inject(void *data, u64 val)
{
	unsigned long pfn = val;
	struct page *p;
	int err;

	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;

	if (!hwpoison_filter_enable)
		goto inject;
if (!pfn_valid(pfn))
return -ENXIO;
p = pfn_to_page(pfn);
/*
* This implies unable to support free buddy pages.
*/
if (!get_page_unless_zero(p))
return 0;
if (!PageLRU(p))
shake_page(p, 0);
/*
* This implies unable to support non-LRU pages.
*/
if (!PageLRU(p))
return 0;
/*
* do a racy check with elevated page count, to make sure PG_hwpoison
* will only be set for the targeted owner (or on a free page).
* We temporarily take page lock for try_get_mem_cgroup_from_page().
* __memory_failure() will redo the check reliably inside page lock.
*/
lock_page(p);
err = hwpoison_filter(p);
unlock_page(p);
if (err)
return 0;
inject:
printk(KERN_INFO "Injecting memory failure at pfn %lx\n", pfn);
return __memory_failure(pfn, 18, MF_COUNT_INCREASED);
}
static int hwpoison_unpoison(void *data, u64 val)
{
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
return unpoison_memory(val);
}

DEFINE_SIMPLE_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
DEFINE_SIMPLE_ATTRIBUTE(unpoison_fops, NULL, hwpoison_unpoison, "%lli\n");
static void pfn_inject_exit(void)
{
@@ -24,16 +74,63 @@ static void pfn_inject_exit(void)
static int pfn_inject_init(void)
{
struct dentry *dentry;
	hwpoison_dir = debugfs_create_dir("hwpoison", NULL);
	if (hwpoison_dir == NULL)
		return -ENOMEM;
/*
* Note that the below poison/unpoison interfaces do not involve
* hardware status change, hence do not require hardware support.
* They are mainly for testing hwpoison in software level.
*/
dentry = debugfs_create_file("corrupt-pfn", 0600, hwpoison_dir,
				NULL, &hwpoison_fops);
	if (!dentry)
goto fail;
dentry = debugfs_create_file("unpoison-pfn", 0600, hwpoison_dir,
NULL, &unpoison_fops);
if (!dentry)
goto fail;
dentry = debugfs_create_u32("corrupt-filter-enable", 0600,
hwpoison_dir, &hwpoison_filter_enable);
if (!dentry)
goto fail;
dentry = debugfs_create_u32("corrupt-filter-dev-major", 0600,
hwpoison_dir, &hwpoison_filter_dev_major);
if (!dentry)
goto fail;
dentry = debugfs_create_u32("corrupt-filter-dev-minor", 0600,
hwpoison_dir, &hwpoison_filter_dev_minor);
if (!dentry)
goto fail;
dentry = debugfs_create_u64("corrupt-filter-flags-mask", 0600,
hwpoison_dir, &hwpoison_filter_flags_mask);
if (!dentry)
goto fail;
dentry = debugfs_create_u64("corrupt-filter-flags-value", 0600,
hwpoison_dir, &hwpoison_filter_flags_value);
if (!dentry)
goto fail;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
dentry = debugfs_create_u64("corrupt-filter-memcg", 0600,
hwpoison_dir, &hwpoison_filter_memcg);
if (!dentry)
goto fail;
#endif
return 0;
fail:
	pfn_inject_exit();
	return -ENOMEM;
}
module_init(pfn_inject_init);
@@ -50,6 +50,9 @@ extern void putback_lru_page(struct page *page);
 */
extern void __free_pages_bootmem(struct page *page, unsigned int order);
extern void prep_compound_page(struct page *page, unsigned long order);
#ifdef CONFIG_MEMORY_FAILURE
extern bool is_free_buddy_page(struct page *page);
#endif
/*
@@ -247,3 +250,12 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
#define ZONE_RECLAIM_SOME 0
#define ZONE_RECLAIM_SUCCESS 1
#endif
extern int hwpoison_filter(struct page *p);
extern u32 hwpoison_filter_dev_major;
extern u32 hwpoison_filter_dev_minor;
extern u64 hwpoison_filter_flags_mask;
extern u64 hwpoison_filter_flags_value;
extern u64 hwpoison_filter_memcg;
extern u32 hwpoison_filter_enable;
@@ -9,6 +9,7 @@
#include <linux/pagemap.h>
#include <linux/syscalls.h>
#include <linux/mempolicy.h>
#include <linux/page-isolation.h>
#include <linux/hugetlb.h>
#include <linux/sched.h>
#include <linux/ksm.h>
@@ -222,7 +223,7 @@ static long madvise_remove(struct vm_area_struct *vma,
/*
 * Error injection support for memory error handling.
 */
static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
{
	int ret = 0;
@@ -230,15 +231,21 @@ static int madvise_hwpoison(unsigned long start, unsigned long end)
		return -EPERM;
	for (; start < end; start += PAGE_SIZE) {
		struct page *p;
		int ret = get_user_pages_fast(start, 1, 0, &p);
		if (ret != 1)
			return ret;
if (bhv == MADV_SOFT_OFFLINE) {
printk(KERN_INFO "Soft offlining page %lx at %lx\n",
page_to_pfn(p), start);
ret = soft_offline_page(p, MF_COUNT_INCREASED);
if (ret)
break;
continue;
}
		printk(KERN_INFO "Injecting memory failure for page %lx at %lx\n",
		       page_to_pfn(p), start);
		/* Ignore return value for now */
		__memory_failure(page_to_pfn(p), 0, MF_COUNT_INCREASED);
	}
	return ret;
}
@@ -335,8 +342,8 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
	size_t len;

#ifdef CONFIG_MEMORY_FAILURE
	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
		return madvise_hwpoison(behavior, start, start+len_in);
#endif
	if (!madvise_behavior_valid(behavior))
		return error;
@@ -283,6 +283,11 @@ mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
	return &mem->info.nodeinfo[nid]->zoneinfo[zid];
}
struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem)
{
return &mem->css;
}
static struct mem_cgroup_per_zone *
page_cgroup_zoneinfo(struct page_cgroup *pc)
{
@@ -1536,25 +1541,22 @@ static struct mem_cgroup *mem_cgroup_lookup(unsigned short id)
	return container_of(css, struct mem_cgroup, css);
}

struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
{
	struct mem_cgroup *mem = NULL;
	struct page_cgroup *pc;
	unsigned short id;
	swp_entry_t ent;

	VM_BUG_ON(!PageLocked(page));

	pc = lookup_page_cgroup(page);
	lock_page_cgroup(pc);
	if (PageCgroupUsed(pc)) {
		mem = pc->mem_cgroup;
		if (mem && !css_tryget(&mem->css))
			mem = NULL;
	} else if (PageSwapCache(page)) {
		ent.val = page_private(page);
		id = lookup_swap_cgroup(ent);
		rcu_read_lock();
@@ -1874,7 +1876,7 @@ int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
 */
	if (!PageSwapCache(page))
		goto charge_cur_mm;
	mem = try_get_mem_cgroup_from_page(page);
	if (!mem)
		goto charge_cur_mm;
	*ptr = mem;
@@ -2555,6 +2555,10 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
		ret = VM_FAULT_MAJOR;
		count_vm_event(PGMAJFAULT);
	} else if (PageHWPoison(page)) {
		/*
		 * hwpoisoned dirty swapcache pages are kept for killing
		 * owner processes (which may be unknown at hwpoison time)
		 */
		ret = VM_FAULT_HWPOISON;
		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
		goto out_release;
@@ -5091,3 +5091,24 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
	spin_unlock_irqrestore(&zone->lock, flags);
}
#endif
#ifdef CONFIG_MEMORY_FAILURE
bool is_free_buddy_page(struct page *page)
{
struct zone *zone = page_zone(page);
unsigned long pfn = page_to_pfn(page);
unsigned long flags;
int order;
spin_lock_irqsave(&zone->lock, flags);
for (order = 0; order < MAX_ORDER; order++) {
struct page *page_head = page - (pfn & ((1 << order) - 1));
if (PageBuddy(page_head) && page_order(page_head) >= order)
break;
}
spin_unlock_irqrestore(&zone->lock, flags);
return order < MAX_ORDER;
}
#endif