Commit 3ad33b24 authored by Lee Schermerhorn, committed by Linus Torvalds

Migration: find correct vma in new_vma_page()

We hit the BUG_ON() in mm/rmap.c:vma_address() when trying to migrate via
mbind(MPOL_MF_MOVE) a non-anon region that spans multiple vmas.  For
anon regions, we just fail to migrate any pages beyond the first vma in
the range.
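
For illustration, the failure mode can be set up from userspace along
these lines (hypothetical reproducer sketch, not the original test case;
assumes a NUMA machine with node 1 online, a writable /tmp, and linking
against libnuma for mbind()):

/* Hypothetical reproducer sketch -- not the original test case.
 * Map a file-backed region, split it into multiple vmas, then ask
 * mbind(MPOL_MF_MOVE) to migrate pages spanning those vmas. */
#include <numaif.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t len = 4 * pagesz;
	int fd = open("/tmp/migtest", O_RDWR | O_CREAT, 0600);
	unsigned long nodemask = 1UL << 1;	/* assumed target: node 1 */

	ftruncate(fd, len);
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	memset(p, 0, len);			/* fault the pages in */

	/* Changing protection on a middle page splits the single
	 * mapping into multiple vmas over the same file. */
	mprotect(p + pagesz, pagesz, PROT_READ);

	/* Pre-patch, migrating this multi-vma, non-anon range could
	 * trip the BUG_ON() in vma_address(). */
	mbind(p, len, MPOL_BIND, &nodemask,
	      8 * sizeof(nodemask), MPOL_MF_MOVE);
	return 0;
}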

This occurs because do_mbind() collects a list of pages to migrate by
calling check_range().  check_range() walks the task's mm, spanning vmas as
necessary, to collect the migratable pages into a list.  Then, do_mbind()
calls migrate_pages() passing the list of pages, a function to allocate new
pages based on vma policy [new_vma_page()], and a pointer to the first vma
of the range.
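
Abridged, the relevant flow in do_mbind() looks roughly like this
(paraphrased sketch of the 2.6.23-era mm/mempolicy.c, not a verbatim
quote):

	LIST_HEAD(pagelist);

	/* Walk [start, end), spanning vmas as needed, collecting the
	 * migratable pages onto pagelist; on success this returns the
	 * first vma of the range. */
	vma = check_range(mm, start, end, nmask,
			  flags | MPOL_MF_INVERT, &pagelist);

	/* Only that FIRST vma is handed down as 'private' -- the value
	 * new_vma_page() will cast back to a vma for every page. */
	if (!IS_ERR(vma) && !list_empty(&pagelist))
		nr_failed = migrate_pages(&pagelist, new_vma_page,
					  (unsigned long)vma);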

For each page in the list, new_vma_page() calls page_address_in_vma(),
passing the page and the vma [first in range], to obtain the address to
pass to alloc_page_vma().  The page address is needed to get the
interleave policy correct.  If the pages in the list come from multiple
vmas, eventually new_vma_page() will pass a page to
page_address_in_vma() with the incorrect vma.  For !PageAnon pages,
this trips the BUG_ON() in rmap.c:vma_address().  For anon pages,
vma_address() will just return -EFAULT and fail the migration.
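
To see why the wrong vma produces -EFAULT, plug hypothetical numbers
into the computation vma_address() performs (self-contained userspace
model, not kernel code):

#include <stdio.h>

int main(void)
{
	/* Hypothetical layout: two adjacent vmas mapping one file.
	 * first:  [0x1000, 0x5000), vm_pgoff 0 -> file pages 0-3
	 * second: [0x5000, 0x9000), vm_pgoff 4 -> file pages 4-7 */
	unsigned long vm_start = 0x1000, vm_end = 0x5000, vm_pgoff = 0;
	unsigned long pgoff = 5;	/* a page owned by 'second' */

	/* The same arithmetic as rmap.c:vma_address()... */
	unsigned long address = vm_start + ((pgoff - vm_pgoff) << 12);

	/* ...overshoots vm_end for the first vma: 0x6000 >= 0x5000.
	 * Pre-patch, a !PageAnon page here hit BUG_ON(!PageAnon(page));
	 * an anon page got -EFAULT and the migration just failed. */
	printf("address 0x%lx is %s [0x%lx, 0x%lx)\n", address,
	       (address < vm_start || address >= vm_end) ?
	       "outside" : "inside", vm_start, vm_end);
	return 0;
}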

This patch modifies new_vma_page() to check the return value from
page_address_in_vma().  If the return value is -EFAULT, new_vma_page()
searches forward via vm_next for the vma that maps the page--i.e., the
one for which page_address_in_vma() does not return -EFAULT.  This
assumes that the pages in the list handed to migrate_pages() are in
virtual address order, which is currently the case.  The patch
documents this assumption in a new comment block for new_vma_page().

If new_vma_page() cannot locate the vma mapping the page in a forward
search in the mm, it will pass a NULL vma to alloc_page_vma().  This will
result in the allocation using the task policy, if any, else system default
policy.  This situation is unlikely, but the patch documents this behavior
with a comment.
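
End to end, the patched lookup behaves like the following
self-contained userspace sketch (toy types and made-up addresses, not
kernel code):

#include <stdio.h>

struct vma {
	unsigned long vm_start, vm_end, vm_pgoff;
	struct vma *vm_next;
};

/* Toy stand-in for page_address_in_vma(): the address of file page
 * 'pgoff' within this vma, or -14 (-EFAULT) if it lies outside. */
static long page_address_in(unsigned long pgoff, const struct vma *v)
{
	unsigned long addr = v->vm_start + ((pgoff - v->vm_pgoff) << 12);
	return (addr < v->vm_start || addr >= v->vm_end) ? -14 : (long)addr;
}

/* The patched lookup: walk vm_next until some vma maps the page.
 * A NULL result models the fallback to task/system default policy. */
static const struct vma *find_vma_for_page(unsigned long pgoff,
					   const struct vma *v, long *addr)
{
	for (; v; v = v->vm_next) {
		*addr = page_address_in(pgoff, v);
		if (*addr != -14)
			return v;
	}
	return NULL;
}

int main(void)
{
	struct vma second = { 0x5000, 0x9000, 4, NULL };
	struct vma first  = { 0x1000, 0x5000, 0, &second };
	long addr;

	/* Page 5 lies beyond 'first'; the walk lands on 'second'. */
	const struct vma *v = find_vma_for_page(5, &first, &addr);
	printf("%s vma, address 0x%lx\n",
	       v == &second ? "second" : "no", addr);
	return 0;
}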

Note, this patch results in restarting from the first vma in a multi-vma
range each time new_vma_page() is called.  If this is not acceptable,
we can make the vma argument a pointer, both in new_vma_page() and its
caller unmap_and_move(), so that the loop in migrate_pages() always
passes down the last vma in which a page was found.  This would require
changes to all new_page_t functions passed to migrate_pages().  Is this
necessary?
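
For illustration only, that alternative would look roughly like this
(hypothetical sketch, not part of this patch):

/* HYPOTHETICAL -- sketch of the alternative discussed above, not
 * part of this patch.  'private' would carry a pointer to the vma
 * pointer, so each lookup resumes where the previous one stopped. */
static struct page *new_vma_page(struct page *page, unsigned long private,
				 int **x)
{
	struct vm_area_struct **vmap = (struct vm_area_struct **)private;
	struct vm_area_struct *vma = *vmap;
	unsigned long uninitialized_var(address);

	while (vma) {
		address = page_address_in_vma(page, vma);
		if (address != -EFAULT)
			break;
		vma = vma->vm_next;
	}
	*vmap = vma;	/* next call resumes here, not at the first vma */

	return alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
}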

For this patch to work, we can't bug check in vma_address() for pages
outside the argument vma.  This patch removes the BUG_ON().  All other
callers [besides new_vma_page()] already check the return status.

Tested on x86_64, 4 node NUMA platform.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent e1a1c997
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -722,12 +722,29 @@ out:
 
 }
 
+/*
+ * Allocate a new page for page migration based on vma policy.
+ * Start assuming that page is mapped by vma pointed to by @private.
+ * Search forward from there, if not.  N.B., this assumes that the
+ * list of pages handed to migrate_pages()--which is how we get here--
+ * is in virtual address order.
+ */
 static struct page *new_vma_page(struct page *page, unsigned long private, int **x)
 {
 	struct vm_area_struct *vma = (struct vm_area_struct *)private;
+	unsigned long uninitialized_var(address);
 
-	return alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
-			page_address_in_vma(page, vma));
+	while (vma) {
+		address = page_address_in_vma(page, vma);
+		if (address != -EFAULT)
+			break;
+		vma = vma->vm_next;
+	}
+
+	/*
+	 * if !vma, alloc_page_vma() will use task or system default policy
+	 */
+	return alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
 }
 
 #else
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -183,7 +183,9 @@ static void page_unlock_anon_vma(struct anon_vma *anon_vma)
 }
 
 /*
- * At what user virtual address is page expected in vma?
+ * At what user virtual address is page expected in @vma?
+ * Returns virtual address or -EFAULT if page's index/offset is not
+ * within the range mapped by @vma.
  */
 static inline unsigned long
 vma_address(struct page *page, struct vm_area_struct *vma)
@@ -193,8 +195,7 @@ vma_address(struct page *page, struct vm_area_struct *vma)
 
 	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
-		/* page should be within any vma from prio_tree_next */
-		BUG_ON(!PageAnon(page));
+		/* page should be within @vma mapping range */
 		return -EFAULT;
 	}
 	return address;