[kernel] r19101 - in dists/sid/linux/debian: . patches patches/bugfix/x86
Ben Hutchings
benh at alioth.debian.org
Sat Jun 9 00:10:32 UTC 2012
Author: benh
Date: Sat Jun 9 00:10:28 2012
New Revision: 19101
Log:
[i386] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE (Closes: #676360)
Added:
dists/sid/linux/debian/patches/bugfix/x86/thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch
Modified:
dists/sid/linux/debian/changelog
dists/sid/linux/debian/patches/series
Modified: dists/sid/linux/debian/changelog
==============================================================================
--- dists/sid/linux/debian/changelog Fri Jun 8 13:16:33 2012 (r19100)
+++ dists/sid/linux/debian/changelog Sat Jun 9 00:10:28 2012 (r19101)
@@ -26,6 +26,8 @@
* net: Backport team driver from Linux 3.5-rc1
* proc: Backport hidepid mount option from Linux 3.4 (Closes: #669028)
* NFSv4: Reduce the footprint of the idmapper (Closes: #657078)
+ * [i386] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
+ (Closes: #676360)
-- Ben Hutchings <ben at decadent.org.uk> Sat, 02 Jun 2012 20:31:53 +0100
Added: dists/sid/linux/debian/patches/bugfix/x86/thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ dists/sid/linux/debian/patches/bugfix/x86/thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch Sat Jun 9 00:10:28 2012 (r19101)
@@ -0,0 +1,112 @@
+From: Andrea Arcangeli <aarcange at redhat.com>
+Subject: thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
+
+In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
+mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
+Xen.
+
+So instead of dealing only with "consistent" pmdvals in
+pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
+simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
+where the low 32bit and high 32bit could be inconsistent (to avoid having
+to use cmpxchg8b).
+
+The only guarantee we get from pmd_read_atomic is that if the low part of
+the pmd was found null, the high part will be null too (so the pmd will be
+considered unstable). And if the low part of the pmd is found "stable"
+later, then it means the whole pmd was read atomically (because after a
+pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
+and we read the high part after the low part).
+
+In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
+atomically to declare the pmd as "stable" and that's true for THP and no
+THP, furthermore in the THP case we also have a barrier() that will
+prevent any inconsistent pmdvals to be cached by a later re-read of the
+*pmd.
+
+Signed-off-by: Andrea Arcangeli <aarcange at redhat.com>
+Cc: Jonathan Nieder <jrnieder at gmail.com>
+Cc: Ulrich Obergfell <uobergfe at redhat.com>
+Cc: Mel Gorman <mgorman at suse.de>
+Cc: Hugh Dickins <hughd at google.com>
+Cc: Larry Woodman <lwoodman at redhat.com>
+Cc: Petr Matousek <pmatouse at redhat.com>
+Cc: Rik van Riel <riel at redhat.com>
+Cc: Jan Beulich <jbeulich at suse.com>
+Cc: KOSAKI Motohiro <kosaki.motohiro at gmail.com>
+Cc: <stable at vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
+---
+
+ arch/x86/include/asm/pgtable-3level.h | 30 +++++++++++++-----------
+ include/asm-generic/pgtable.h | 10 ++++++++
+ 2 files changed, 27 insertions(+), 13 deletions(-)
+
+--- a/arch/x86/include/asm/pgtable-3level.h
++++ b/arch/x86/include/asm/pgtable-3level.h
+@@ -47,16 +47,26 @@
+ * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
+ * operations.
+ *
+- * Without THP if the mmap_sem is hold for reading, the
+- * pmd can only transition from null to not null while pmd_read_atomic runs.
+- * So there's no need of literally reading it atomically.
++ * Without THP if the mmap_sem is hold for reading, the pmd can only
++ * transition from null to not null while pmd_read_atomic runs. So
++ * we can always return atomic pmd values with this function.
+ *
+ * With THP if the mmap_sem is hold for reading, the pmd can become
+- * THP or null or point to a pte (and in turn become "stable") at any
+- * time under pmd_read_atomic, so it's mandatory to read it atomically
+- * with cmpxchg8b.
++ * trans_huge or none or point to a pte (and in turn become "stable")
++ * at any time under pmd_read_atomic. We could read it really
++ * atomically here with a atomic64_read for the THP enabled case (and
++ * it would be a whole lot simpler), but to avoid using cmpxchg8b we
++ * only return an atomic pmdval if the low part of the pmdval is later
++ * found stable (i.e. pointing to a pte). And we're returning a none
++ * pmdval if the low part of the pmd is none. In some cases the high
++ * and low part of the pmdval returned may not be consistent if THP is
++ * enabled (the low part may point to previously mapped hugepage,
++ * while the high part may point to a more recently mapped hugepage),
++ * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
++ * of the pmd to be read atomically to decide if the pmd is unstable
++ * or not, with the only exception of when the low part of the pmd is
++ * zero in which case we return a none pmd.
+ */
+-#ifndef CONFIG_TRANSPARENT_HUGEPAGE
+ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
+ {
+ pmdval_t ret;
+@@ -74,12 +84,6 @@
+
+ return (pmd_t) { ret };
+ }
+-#else /* CONFIG_TRANSPARENT_HUGEPAGE */
+-static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
+-{
+- return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
+-}
+-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+ static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
+ {
+--- a/include/asm-generic/pgtable.h
++++ b/include/asm-generic/pgtable.h
+@@ -484,6 +484,16 @@
+ /*
+ * The barrier will stabilize the pmdval in a register or on
+ * the stack so that it will stop changing under the code.
++ *
++ * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
++ * pmd_read_atomic is allowed to return a not atomic pmdval
++ * (for example pointing to an hugepage that has never been
++ * mapped in the pmd). The below checks will only care about
++ * the low part of the pmd with 32bit PAE x86 anyway, with the
++ * exception of pmd_none(). So the important thing is that if
++ * the low part of the pmd is found null, the high part will
++ * be also null or the pmd_none() check below would be
++ * confused.
+ */
+ #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ barrier();
Modified: dists/sid/linux/debian/patches/series
==============================================================================
--- dists/sid/linux/debian/patches/series Fri Jun 8 13:16:33 2012 (r19100)
+++ dists/sid/linux/debian/patches/series Sat Jun 9 00:10:28 2012 (r19101)
@@ -304,6 +304,7 @@
features/all/AppArmor-compatibility-patch-for-v5-interface.patch
bugfix/x86/mm-pmd_read_atomic-fix-32bit-pae-pmd-walk-vs-pmd_populate-smp-race.patch
+bugfix/x86/thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch
bugfix/all/hugetlb-fix-resv_map-leak-in-error-path.patch
bugfix/all/mm-fix-vma_resv_map-null-pointer.patch
bugfix/all/hugepages-fix-use-after-free-bug-in-quota-handling.patch
More information about the Kernel-svn-changes
mailing list