[linux] 01/01: x86/mm: Add barriers and document switch_mm()-vs-flush synchronization (CVE-2016-2069)

Fri Jan 29 03:44:01 UTC 2016

This is an automated email from the git hooks/post-receive script.

benh pushed a commit to branch wheezy-security
in repository linux.

commit 61aa0802efa4c9250199c5e3922851a5e0dacd90
Author: Ben Hutchings <ben at decadent.org.uk>
Date:   Fri Jan 29 03:27:03 2016 +0000

    x86/mm: Add barriers and document switch_mm()-vs-flush synchronization (CVE-2016-2069)
    
    Plus a follow-up fix to the comments.
---
 debian/changelog                                   |   3 +
 ...barriers-and-document-switch_mm-vs-flush-.patch | 138 +++++++++++++++++++++
 ...x86-mm-Improve-switch_mm-barrier-comments.patch |  62 +++++++++
 debian/patches/series                              |   2 +
 4 files changed, 205 insertions(+)
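
The ordering constraint documented in the switch_mm() comment in the first patch has the shape of the classic "store buffering" litmus test: each side stores to one location, then loads from the other. The following user-space sketch models only that shape, using C11 atomics and POSIX threads. The names pte, cpumask_bit, flusher(), switcher() and count_forbidden_outcomes() are hypothetical stand-ins and are not kernel code. It is an analogy for ordinary loads only: as the follow-up patch notes, TLB fills are not ordered by LOCK or MFENCE, which is why the kernel relies on the serializing load_cr3() instead of a plain fence.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-ins (assumptions, not kernel symbols):
 *   pte         ~ the page-table entry that the flushing CPU modifies
 *   cpumask_bit ~ the switching CPU's bit in mm_cpumask(next)
 */
static atomic_int pte;
static atomic_int cpumask_bit;
static int flusher_saw_bit;   /* what the flusher loaded from cpumask_bit */
static int switcher_saw_pte;  /* what the switcher loaded from pte */

/* Models the flushing CPU: store PTE, full barrier, load cpumask bit. */
static void *flusher(void *arg)
{
	(void)arg;
	atomic_store_explicit(&pte, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	flusher_saw_bit = atomic_load_explicit(&cpumask_bit,
					       memory_order_relaxed);
	return NULL;
}

/* Models the switching CPU: store cpumask bit, full barrier, load PTE. */
static void *switcher(void *arg)
{
	(void)arg;
	atomic_store_explicit(&cpumask_bit, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	switcher_saw_pte = atomic_load_explicit(&pte, memory_order_relaxed);
	return NULL;
}

/*
 * Runs the two sides concurrently 'iterations' times and counts how often
 * both loads missed the other side's store.  With a full fence on both
 * sides, the C11 memory model forbids that outcome, so the count stays 0.
 */
int count_forbidden_outcomes(int iterations)
{
	int bad = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t a, b;
		int rc;

		atomic_store(&pte, 0);
		atomic_store(&cpumask_bit, 0);

		rc = pthread_create(&a, NULL, flusher, NULL);
		assert(rc == 0);
		rc = pthread_create(&b, NULL, switcher, NULL);
		assert(rc == 0);
		(void)rc;

		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (flusher_saw_bit == 0 && switcher_saw_pte == 0)
			bad++;
	}
	return bad;
}
```

Calling count_forbidden_outcomes() with a few thousand iterations should return 0 when built with -pthread. Removing either fence makes the both-loads-miss outcome permitted by the memory model, although a run this simple may not actually observe it.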

diff --git a/debian/changelog b/debian/changelog
index 2ca0463..fa563eb 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -7,6 +7,9 @@ linux (3.2.73-2+deb7u3) UNRELEASED; urgency=medium
     (CVE-2015-8767)
   * tty: Fix unsafe ldisc reference via ioctl(TIOCGETD) (CVE-2016-0723)
   * fuse: break infinite loop in fuse_fill_write_pages() (CVE-2015-8785)
+  * [x86] mm: Add barriers and document switch_mm()-vs-flush synchronization
+    (CVE-2016-2069)
+  * [x86] mm: Improve switch_mm() barrier comments
 
   [ Salvatore Bonaccorso ]
   * unix: properly account for FDs passed over unix sockets (CVE-2013-4312)
diff --git a/debian/patches/bugfix/x86/x86-mm-Add-barriers-and-document-switch_mm-vs-flush-.patch b/debian/patches/bugfix/x86/x86-mm-Add-barriers-and-document-switch_mm-vs-flush-.patch
new file mode 100644
index 0000000..d2e3314
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-mm-Add-barriers-and-document-switch_mm-vs-flush-.patch
@@ -0,0 +1,138 @@
+From: Andy Lutomirski <luto at kernel.org>
+Date: Wed, 6 Jan 2016 12:21:01 -0800
+Subject: x86/mm: Add barriers and document switch_mm()-vs-flush
+ synchronization
+Origin: https://git.kernel.org/linus/71b3c126e61177eb693423f2e18a1914205b165e
+
+When switch_mm() activates a new PGD, it also sets a bit that
+tells other CPUs that the PGD is in use so that TLB flush IPIs
+will be sent.  In order for that to work correctly, the bit
+needs to be visible prior to loading the PGD and therefore
+starting to fill the local TLB.
+
+Document all the barriers that make this work correctly and add
+a couple that were missing.
+
+Signed-off-by: Andy Lutomirski <luto at kernel.org>
+Cc: Andrew Morton <akpm at linux-foundation.org>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Dave Hansen <dave.hansen at linux.intel.com>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Rik van Riel <riel at redhat.com>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: linux-mm at kvack.org
+Cc: stable at vger.kernel.org
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+[bwh: Backported to 3.2:
+ - There's no flush_tlb_mm_range(), only flush_tlb_mm() which does not use
+   INVLPG
+ - Adjust context]
+---
+--- a/arch/x86/include/asm/mmu_context.h
++++ b/arch/x86/include/asm/mmu_context.h
+@@ -87,7 +87,32 @@ static inline void switch_mm(struct mm_s
+ #endif
+ 		cpumask_set_cpu(cpu, mm_cpumask(next));
+ 
+-		/* Re-load page tables */
++		/*
++		 * Re-load page tables.
++		 *
++		 * This logic has an ordering constraint:
++		 *
++		 *  CPU 0: Write to a PTE for 'next'
++		 *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
++		 *  CPU 1: set bit 1 in next's mm_cpumask
++		 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
++		 *
++		 * We need to prevent an outcome in which CPU 1 observes
++		 * the new PTE value and CPU 0 observes bit 1 clear in
++		 * mm_cpumask.  (If that occurs, then the IPI will never
++		 * be sent, and CPU 0's TLB will contain a stale entry.)
++		 *
++		 * The bad outcome can occur if either CPU's load is
++		 * reordered before that CPU's store, so both CPUs much
++		 * execute full barriers to prevent this from happening.
++		 *
++		 * Thus, switch_mm needs a full barrier between the
++		 * store to mm_cpumask and any operation that could load
++		 * from next->pgd.  This barrier synchronizes with
++		 * remote TLB flushers.  Fortunately, load_cr3 is
++		 * serializing and thus acts as a full barrier.
++		 *
++		 */
+ 		load_cr3(next->pgd);
+ 
+ 		/* stop flush ipis for the previous mm */
+@@ -108,6 +133,10 @@ static inline void switch_mm(struct mm_s
+ 			/* We were in lazy tlb mode and leave_mm disabled
+ 			 * tlb flush IPI delivery. We must reload CR3
+ 			 * to make sure to use no freed page tables.
++			 *
++			 * As above, this is a barrier that forces
++			 * TLB repopulation to be ordered after the
++			 * store to mm_cpumask.
+ 			 */
+ 			load_cr3(next->pgd);
+ 			load_mm_ldt(next);
+--- a/arch/x86/mm/tlb.c
++++ b/arch/x86/mm/tlb.c
+@@ -278,7 +278,9 @@ void flush_tlb_current_task(void)
+ 
+ 	preempt_disable();
+ 
++	/* This is an implicit full barrier that synchronizes with switch_mm. */
+ 	local_flush_tlb();
++
+ 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
+ 		flush_tlb_others(mm_cpumask(mm), mm, TLB_FLUSH_ALL);
+ 	preempt_enable();
+@@ -289,10 +291,20 @@ void flush_tlb_mm(struct mm_struct *mm)
+ 	preempt_disable();
+ 
+ 	if (current->active_mm == mm) {
+-		if (current->mm)
++		if (current->mm) {
++			/*
++			 * This is an implicit full barrier (MOV to CR) that
++			 * synchronizes with switch_mm.
++			 */
+ 			local_flush_tlb();
+-		else
++		} else {
+ 			leave_mm(smp_processor_id());
++			/* Synchronize with switch_mm. */
++			smp_mb();
++		}
++	} else {
++		/* Synchronize with switch_mm. */
++		smp_mb();
+ 	}
+ 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
+ 		flush_tlb_others(mm_cpumask(mm), mm, TLB_FLUSH_ALL);
+@@ -307,10 +319,18 @@ void flush_tlb_page(struct vm_area_struc
+ 	preempt_disable();
+ 
+ 	if (current->active_mm == mm) {
+-		if (current->mm)
++		if (current->mm) {
++			/*
++			 * Implicit full barrier (INVLPG) that synchronizes
++			 * with switch_mm.
++			 */
+ 			__flush_tlb_one(va);
+-		else
++		} else {
+ 			leave_mm(smp_processor_id());
++
++			/* Synchronize with switch_mm. */
++			smp_mb();
++		}
+ 	}
+ 
+ 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
diff --git a/debian/patches/bugfix/x86/x86-mm-Improve-switch_mm-barrier-comments.patch b/debian/patches/bugfix/x86/x86-mm-Improve-switch_mm-barrier-comments.patch
new file mode 100644
index 0000000..dbf8a9c
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-mm-Improve-switch_mm-barrier-comments.patch
@@ -0,0 +1,62 @@
+From: Andy Lutomirski <luto at kernel.org>
+Date: Tue, 12 Jan 2016 12:47:40 -0800
+Subject: x86/mm: Improve switch_mm() barrier comments
+Origin: https://git.kernel.org/linus/4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b
+
+My previous comments were still a bit confusing and there was a
+typo. Fix it up.
+
+Reported-by: Peter Zijlstra <peterz at infradead.org>
+Signed-off-by: Andy Lutomirski <luto at kernel.org>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Dave Hansen <dave.hansen at linux.intel.com>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Rik van Riel <riel at redhat.com>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: stable at vger.kernel.org
+Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
+Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+---
+ arch/x86/include/asm/mmu_context.h | 15 ++++++++-------
+ 1 file changed, 8 insertions(+), 7 deletions(-)
+
+--- a/arch/x86/include/asm/mmu_context.h
++++ b/arch/x86/include/asm/mmu_context.h
+@@ -103,14 +103,16 @@ static inline void switch_mm(struct mm_s
+ 		 * be sent, and CPU 0's TLB will contain a stale entry.)
+ 		 *
+ 		 * The bad outcome can occur if either CPU's load is
+-		 * reordered before that CPU's store, so both CPUs much
++		 * reordered before that CPU's store, so both CPUs must
+ 		 * execute full barriers to prevent this from happening.
+ 		 *
+ 		 * Thus, switch_mm needs a full barrier between the
+ 		 * store to mm_cpumask and any operation that could load
+-		 * from next->pgd.  This barrier synchronizes with
+-		 * remote TLB flushers.  Fortunately, load_cr3 is
+-		 * serializing and thus acts as a full barrier.
++		 * from next->pgd.  TLB fills are special and can happen
++		 * due to instruction fetches or for no reason at all,
++		 * and neither LOCK nor MFENCE orders them.
++		 * Fortunately, load_cr3() is serializing and gives the
++		 * ordering guarantee we need.
+ 		 *
+ 		 */
+ 		load_cr3(next->pgd);
+@@ -134,9 +136,8 @@ static inline void switch_mm(struct mm_s
+ 			 * tlb flush IPI delivery. We must reload CR3
+ 			 * to make sure to use no freed page tables.
+ 			 *
+-			 * As above, this is a barrier that forces
+-			 * TLB repopulation to be ordered after the
+-			 * store to mm_cpumask.
++			 * As above, load_cr3() is serializing and orders TLB
++			 * fills with respect to the mm_cpumask write.
+ 			 */
+ 			load_cr3(next->pgd);
+ 			load_mm_ldt(next);
diff --git a/debian/patches/series b/debian/patches/series
index bdbdab5..4fc84bd 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1196,3 +1196,5 @@ bugfix/all/tty-fix-unsafe-ldisc-reference-via-ioctl-tiocgetd.patch
 bugfix/all/unix-properly-account-for-FDs-passed-over-unix-socke.patch
 debian/unix-fix-abi-change-for-cve-2013-4312-fix.patch
 bugfix/all/fuse-break-infinite-loop-in-fuse_fill_write_pages.patch
+bugfix/x86/x86-mm-Add-barriers-and-document-switch_mm-vs-flush-.patch
+bugfix/x86/x86-mm-Improve-switch_mm-barrier-comments.patch

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/kernel/linux.git