mm: dirty page tracking race fix

There is a race with dirty page accounting where a page may not properly
be accounted for.

clear_page_dirty_for_io() calls page_mkclean, then TestClearPageDirty.

page_mkclean walks the rmaps for that page, and for each one it cleans and
write-protects the pte if it was dirty. It uses page_check_address to find
the pte. That function has a shortcut to avoid the ptl if the pte is not
present. Unfortunately, the pte can be switched to not-present and then
back to present by other code while holding the page table lock; this
should not be a signal for page_mkclean to ignore that pte, because it may
be dirty.

For example, powerpc64's set_pte_at will clear a previously present pte
before setting it to the desired value. There may also be other code in
core mm or in arch code that does similar things.

The consequence of the bug is loss of msync data integrity and loss of
dirty page accounting accuracy. XIP's __xip_unmap could easily also be
unreliable (depending on the exact XIP locking scheme), which can lead to
data corruption.

Fix this by adding an option to page_check_address to always take the ptl
when checking the pte. It's possible to retain the optimization for
page_referenced and try_to_unmap.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Carsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit: 479db0bf408e65baa14d2a9821abfcbc0804b847
author: Nick Piggin <npiggin@suse.de> Wed Aug 20 14:09:18 2008 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> Wed Aug 20 15:40:32 2008 -0700
tree: acdaaed567afefa36ac2fe27cfe22cfefeb50cd5
parent: 2d70b68d42b5196a48ccb639e3797f097ef5bea3
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 69407f8..fed6f5e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h

@@ -102,7 +102,7 @@
  * Called from mm/filemap_xip.c to unmap empty zero page
  */
 pte_t *page_check_address(struct page *, struct mm_struct *,
-				unsigned long, spinlock_t **);
+				unsigned long, spinlock_t **, int);
 
 /*
  * Used by swapoff to help locate where page is expected in vma.

diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index 380ab40..8b710ca 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c

@@ -185,7 +185,7 @@
 		address = vma->vm_start +
 			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
-		pte = page_check_address(page, mm, address, &ptl);
+		pte = page_check_address(page, mm, address, &ptl, 1);
 		if (pte) {
 			/* Nuke the page table entry. */
 			flush_cache_page(vma, address, pte_pfn(*pte));

diff --git a/mm/rmap.c b/mm/rmap.c
index 0597747..0383acf 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c

@@ -224,10 +224,14 @@
 /*
  * Check that @page is mapped at @address into @mm.
  *
+ * If @sync is false, page_check_address may perform a racy check to avoid
+ * the page table lock when the pte is not present (helpful when reclaiming
+ * highly shared pages).
+ *
  * On success returns with pte mapped and locked.
  */
 pte_t *page_check_address(struct page *page, struct mm_struct *mm,
-			  unsigned long address, spinlock_t **ptlp)
+			  unsigned long address, spinlock_t **ptlp, int sync)
 {
 	pgd_t *pgd;
 	pud_t *pud;
@@ -249,7 +253,7 @@
 
 	pte = pte_offset_map(pmd, address);
 	/* Make a quick check before getting the lock */
-	if (!pte_present(*pte)) {
+	if (!sync && !pte_present(*pte)) {
 		pte_unmap(pte);
 		return NULL;
 	}
@@ -281,7 +285,7 @@
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
 
@@ -450,7 +454,7 @@
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &ptl, 1);
 	if (!pte)
 		goto out;
 
@@ -704,7 +708,7 @@
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;