[linux] 01/02: [amd64] efi: Build our own page table structure

debian-kernel at lists.debian.org debian-kernel at lists.debian.org
Tue Feb 16 02:54:49 UTC 2016


This is an automated email from the git hooks/post-receive script.

benh pushed a commit to branch sid
in repository linux.

commit 5b76884dc265e8166262bc834bdad7af13f57881
Author: Ben Hutchings <ben at decadent.org.uk>
Date:   Mon Feb 15 22:56:18 2016 +0000

    [amd64] efi: Build our own page table structure
    
    This avoids adding W+X pages to the default page table, which is not
    only bad for security but also now triggers a warning on boot.
---
 debian/changelog                                   |   2 +
 ...6-efi-build-our-own-page-table-structures.patch | 319 +++++++++++++++++++++
 ...st-page-table-switching-code-into-efi_cal.patch | 215 ++++++++++++++
 ...-ram-into-the-identity-page-table-for-mix.patch |  71 +++++
 ...up-separate-efi-page-tables-in-kexec-path.patch |  83 ++++++
 ...-align-the-_end-symbol-to-avoid-pfn-conve.patch |  53 ++++
 ...ensure-cpa-pfn-only-contains-page-frame-n.patch | 144 ++++++++++
 debian/patches/series                              |   6 +
 8 files changed, 893 insertions(+)

diff --git a/debian/changelog b/debian/changelog
index dd26cde..25f8739 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -31,6 +31,8 @@ linux (4.4.1-1) UNRELEASED; urgency=medium
   * udeb: Move most USB wireless drivers from nic-usb-modules to
     nic-wireless-modules
   * udeb: Really add virtio_input to virtio-modules (not input-modules)
+  * [x86] Fix issues resulting in W+X pages:
+    - [amd64] efi: Build our own page table structure
 
   [ Roger Shimizu ]
   * Enable TTY_PRINTK as module (Closes: #814540).
diff --git a/debian/patches/bugfix/x86/x86-efi-build-our-own-page-table-structures.patch b/debian/patches/bugfix/x86/x86-efi-build-our-own-page-table-structures.patch
new file mode 100644
index 0000000..2c61994
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-efi-build-our-own-page-table-structures.patch
@@ -0,0 +1,319 @@
+From: Matt Fleming <matt at codeblueprint.co.uk>
+Date: Fri, 27 Nov 2015 21:09:34 +0000
+Subject: [5/5] x86/efi: Build our own page table structures
+Origin: https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit?id=67a9108ed4313b85a9c53406d80dc1ae3f8c3e36
+
+With commit e1a58320a38d ("x86/mm: Warn on W^X mappings") all
+users booting on 64-bit UEFI machines see the following warning,
+
+  ------------[ cut here ]------------
+  WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780()
+  x86/mm: Found insecure W+X mapping at address ffff88000005f000/0xffff88000005f000
+  ...
+  x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found.
+  ...
+
+This is caused by mapping EFI regions with RWX permissions.
+There isn't much we can do to restrict the permissions for these
+regions due to the way the firmware toolchains mix code and
+data, but we can at least isolate these mappings so that they do
+not appear in the regular kernel page tables.
+
+In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
+mapping") we started using 'trampoline_pgd' to map the EFI
+regions because there was an existing identity mapping there
+which we use during the SetVirtualAddressMap() call and for
+broken firmware that accesses those addresses.
+
+But 'trampoline_pgd' shares some PGD entries with
+'swapper_pg_dir' and does not provide the isolation we require.
+Notably the virtual address for __START_KERNEL_map and
+MODULES_START are mapped by the same PGD entry so we need to be
+more careful when copying changes over in
+efi_sync_low_kernel_mappings().
+
+This patch doesn't go the full mile, we still want to share some
+PGD entries with 'swapper_pg_dir'. Having completely separate
+page tables brings its own issues such as synchronising new
+mappings after memory hotplug and module loading. Sharing also
+keeps memory usage down.
+
+Signed-off-by: Matt Fleming <matt at codeblueprint.co.uk>
+Reviewed-by: Borislav Petkov <bp at suse.de>
+Acked-by: Borislav Petkov <bp at suse.de>
+Cc: Andrew Morton <akpm at linux-foundation.org>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Andy Lutomirski <luto at kernel.org>
+Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Dave Jones <davej at codemonkey.org.uk>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya at intel.com>
+Cc: Stephen Smalley <sds at tycho.nsa.gov>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: Toshi Kani <toshi.kani at hp.com>
+Cc: linux-efi at vger.kernel.org
+Link: http://lkml.kernel.org/r/1448658575-17029-6-git-send-email-matt@codeblueprint.co.uk
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+---
+ arch/x86/include/asm/efi.h     |  1 +
+ arch/x86/platform/efi/efi.c    | 39 ++++++-----------
+ arch/x86/platform/efi/efi_32.c |  5 +++
+ arch/x86/platform/efi/efi_64.c | 97 +++++++++++++++++++++++++++++++++++-------
+ 4 files changed, 102 insertions(+), 40 deletions(-)
+
+diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
+index 347eeacb06a8..8fd9e637629a 100644
+--- a/arch/x86/include/asm/efi.h
++++ b/arch/x86/include/asm/efi.h
+@@ -136,6 +136,7 @@ extern void __init efi_memory_uc(u64 addr, unsigned long size);
+ extern void __init efi_map_region(efi_memory_desc_t *md);
+ extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
+ extern void efi_sync_low_kernel_mappings(void);
++extern int __init efi_alloc_page_tables(void);
+ extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
+ extern void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned num_pages);
+ extern void __init old_map_region(efi_memory_desc_t *md);
+diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
+index ad285404ea7f..3c1f3cd7b2ba 100644
+--- a/arch/x86/platform/efi/efi.c
++++ b/arch/x86/platform/efi/efi.c
+@@ -869,7 +869,7 @@ static void __init kexec_enter_virtual_mode(void)
+  * This function will switch the EFI runtime services to virtual mode.
+  * Essentially, we look through the EFI memmap and map every region that
+  * has the runtime attribute bit set in its memory descriptor into the
+- * ->trampoline_pgd page table using a top-down VA allocation scheme.
++ * efi_pgd page table.
+  *
+  * The old method which used to update that memory descriptor with the
+  * virtual address obtained from ioremap() is still supported when the
+@@ -879,8 +879,8 @@ static void __init kexec_enter_virtual_mode(void)
+  *
+  * The new method does a pagetable switch in a preemption-safe manner
+  * so that we're in a different address space when calling a runtime
+- * function. For function arguments passing we do copy the PGDs of the
+- * kernel page table into ->trampoline_pgd prior to each call.
++ * function. For function arguments passing we do copy the PUDs of the
++ * kernel page table into efi_pgd prior to each call.
+  *
+  * Specially for kexec boot, efi runtime maps in previous kernel should
+  * be passed in via setup_data. In that case runtime ranges will be mapped
+@@ -895,6 +895,12 @@ static void __init __efi_enter_virtual_mode(void)
+ 
+ 	efi.systab = NULL;
+ 
++	if (efi_alloc_page_tables()) {
++		pr_err("Failed to allocate EFI page tables\n");
++		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
++		return;
++	}
++
+ 	efi_merge_regions();
+ 	new_memmap = efi_map_regions(&count, &pg_shift);
+ 	if (!new_memmap) {
+@@ -954,28 +960,11 @@ static void __init __efi_enter_virtual_mode(void)
+ 	efi_runtime_mkexec();
+ 
+ 	/*
+-	 * We mapped the descriptor array into the EFI pagetable above but we're
+-	 * not unmapping it here. Here's why:
+-	 *
+-	 * We're copying select PGDs from the kernel page table to the EFI page
+-	 * table and when we do so and make changes to those PGDs like unmapping
+-	 * stuff from them, those changes appear in the kernel page table and we
+-	 * go boom.
+-	 *
+-	 * From setup_real_mode():
+-	 *
+-	 * ...
+-	 * trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
+-	 *
+-	 * In this particular case, our allocation is in PGD 0 of the EFI page
+-	 * table but we've copied that PGD from PGD[272] of the EFI page table:
+-	 *
+-	 *	pgd_index(__PAGE_OFFSET = 0xffff880000000000) = 272
+-	 *
+-	 * where the direct memory mapping in kernel space is.
+-	 *
+-	 * new_memmap's VA comes from that direct mapping and thus clearing it,
+-	 * it would get cleared in the kernel page table too.
++	 * We mapped the descriptor array into the EFI pagetable above
++	 * but we're not unmapping it here because if we're running in
++	 * EFI mixed mode we need all of memory to be accessible when
++	 * we pass parameters to the EFI runtime services in the
++	 * thunking code.
+ 	 *
+ 	 * efi_cleanup_page_tables(__pa(new_memmap), 1 << pg_shift);
+ 	 */
+diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
+index ed5b67338294..58d669bc8250 100644
+--- a/arch/x86/platform/efi/efi_32.c
++++ b/arch/x86/platform/efi/efi_32.c
+@@ -38,6 +38,11 @@
+  * say 0 - 3G.
+  */
+ 
++int __init efi_alloc_page_tables(void)
++{
++	return 0;
++}
++
+ void efi_sync_low_kernel_mappings(void) {}
+ void __init efi_dump_pagetable(void) {}
+ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
+index b19cdac959b2..4897f518760f 100644
+--- a/arch/x86/platform/efi/efi_64.c
++++ b/arch/x86/platform/efi/efi_64.c
+@@ -40,6 +40,7 @@
+ #include <asm/fixmap.h>
+ #include <asm/realmode.h>
+ #include <asm/time.h>
++#include <asm/pgalloc.h>
+ 
+ /*
+  * We allocate runtime services regions bottom-up, starting from -4G, i.e.
+@@ -121,22 +122,92 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
+ 	early_code_mapping_set_exec(0);
+ }
+ 
++static pgd_t *efi_pgd;
++
++/*
++ * We need our own copy of the higher levels of the page tables
++ * because we want to avoid inserting EFI region mappings (EFI_VA_END
++ * to EFI_VA_START) into the standard kernel page tables. Everything
++ * else can be shared, see efi_sync_low_kernel_mappings().
++ */
++int __init efi_alloc_page_tables(void)
++{
++	pgd_t *pgd;
++	pud_t *pud;
++	gfp_t gfp_mask;
++
++	if (efi_enabled(EFI_OLD_MEMMAP))
++		return 0;
++
++	gfp_mask = GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO;
++	efi_pgd = (pgd_t *)__get_free_page(gfp_mask);
++	if (!efi_pgd)
++		return -ENOMEM;
++
++	pgd = efi_pgd + pgd_index(EFI_VA_END);
++
++	pud = pud_alloc_one(NULL, 0);
++	if (!pud) {
++		free_page((unsigned long)efi_pgd);
++		return -ENOMEM;
++	}
++
++	pgd_populate(NULL, pgd, pud);
++
++	return 0;
++}
++
+ /*
+  * Add low kernel mappings for passing arguments to EFI functions.
+  */
+ void efi_sync_low_kernel_mappings(void)
+ {
+-	unsigned num_pgds;
+-	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
++	unsigned num_entries;
++	pgd_t *pgd_k, *pgd_efi;
++	pud_t *pud_k, *pud_efi;
+ 
+ 	if (efi_enabled(EFI_OLD_MEMMAP))
+ 		return;
+ 
+-	num_pgds = pgd_index(MODULES_END - 1) - pgd_index(PAGE_OFFSET);
++	/*
++	 * We can share all PGD entries apart from the one entry that
++	 * covers the EFI runtime mapping space.
++	 *
++	 * Make sure the EFI runtime region mappings are guaranteed to
++	 * only span a single PGD entry and that the entry also maps
++	 * other important kernel regions.
++	 */
++	BUILD_BUG_ON(pgd_index(EFI_VA_END) != pgd_index(MODULES_END));
++	BUILD_BUG_ON((EFI_VA_START & PGDIR_MASK) !=
++			(EFI_VA_END & PGDIR_MASK));
++
++	pgd_efi = efi_pgd + pgd_index(PAGE_OFFSET);
++	pgd_k = pgd_offset_k(PAGE_OFFSET);
++
++	num_entries = pgd_index(EFI_VA_END) - pgd_index(PAGE_OFFSET);
++	memcpy(pgd_efi, pgd_k, sizeof(pgd_t) * num_entries);
+ 
+-	memcpy(pgd + pgd_index(PAGE_OFFSET),
+-		init_mm.pgd + pgd_index(PAGE_OFFSET),
+-		sizeof(pgd_t) * num_pgds);
++	/*
++	 * We share all the PUD entries apart from those that map the
++	 * EFI regions. Copy around them.
++	 */
++	BUILD_BUG_ON((EFI_VA_START & ~PUD_MASK) != 0);
++	BUILD_BUG_ON((EFI_VA_END & ~PUD_MASK) != 0);
++
++	pgd_efi = efi_pgd + pgd_index(EFI_VA_END);
++	pud_efi = pud_offset(pgd_efi, 0);
++
++	pgd_k = pgd_offset_k(EFI_VA_END);
++	pud_k = pud_offset(pgd_k, 0);
++
++	num_entries = pud_index(EFI_VA_END);
++	memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
++
++	pud_efi = pud_offset(pgd_efi, EFI_VA_START);
++	pud_k = pud_offset(pgd_k, EFI_VA_START);
++
++	num_entries = PTRS_PER_PUD - pud_index(EFI_VA_START);
++	memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
+ }
+ 
+ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+@@ -150,8 +221,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ 	if (efi_enabled(EFI_OLD_MEMMAP))
+ 		return 0;
+ 
+-	efi_scratch.efi_pgt = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
+-	pgd = __va(efi_scratch.efi_pgt);
++	efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
++	pgd = efi_pgd;
+ 
+ 	/*
+ 	 * It can happen that the physical address of new_memmap lands in memory
+@@ -216,16 +287,14 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ 
+ void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ {
+-	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+-
+-	kernel_unmap_pages_in_pgd(pgd, pa_memmap, num_pages);
++	kernel_unmap_pages_in_pgd(efi_pgd, pa_memmap, num_pages);
+ }
+ 
+ static void __init __map_region(efi_memory_desc_t *md, u64 va)
+ {
+-	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+ 	unsigned long flags = 0;
+ 	unsigned long pfn;
++	pgd_t *pgd = efi_pgd;
+ 
+ 	if (!(md->attribute & EFI_MEMORY_WB))
+ 		flags |= _PAGE_PCD;
+@@ -334,9 +403,7 @@ void __init efi_runtime_mkexec(void)
+ void __init efi_dump_pagetable(void)
+ {
+ #ifdef CONFIG_EFI_PGT_DUMP
+-	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+-
+-	ptdump_walk_pgd_level(NULL, pgd);
++	ptdump_walk_pgd_level(NULL, efi_pgd);
+ #endif
+ }
+ 
diff --git a/debian/patches/bugfix/x86/x86-efi-hoist-page-table-switching-code-into-efi_cal.patch b/debian/patches/bugfix/x86/x86-efi-hoist-page-table-switching-code-into-efi_cal.patch
new file mode 100644
index 0000000..4b3230e
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-efi-hoist-page-table-switching-code-into-efi_cal.patch
@@ -0,0 +1,215 @@
+From: Matt Fleming <matt at codeblueprint.co.uk>
+Date: Fri, 27 Nov 2015 21:09:33 +0000
+Subject: [4/5] x86/efi: Hoist page table switching code into efi_call_virt()
+Origin: https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit?id=c9f2a9a65e4855b74d92cdad688f6ee4a1a323ff
+
+This change is a prerequisite for pending patches that switch to
+a dedicated EFI page table, instead of using 'trampoline_pgd'
+which shares PGD entries with 'swapper_pg_dir'. The pending
+patches make it impossible to dereference the runtime service
+function pointer without first switching %cr3.
+
+It's true that we now have duplicated switching code in
+efi_call_virt() and efi_call_phys_{prolog,epilog}() but we are
+sacrificing code duplication for a little more clarity and the
+ease of writing the page table switching code in C instead of
+asm.
+
+Signed-off-by: Matt Fleming <matt at codeblueprint.co.uk>
+Reviewed-by: Borislav Petkov <bp at suse.de>
+Acked-by: Borislav Petkov <bp at suse.de>
+Cc: Andrew Morton <akpm at linux-foundation.org>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Andy Lutomirski <luto at kernel.org>
+Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Dave Jones <davej at codemonkey.org.uk>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya at intel.com>
+Cc: Stephen Smalley <sds at tycho.nsa.gov>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: Toshi Kani <toshi.kani at hp.com>
+Cc: linux-efi at vger.kernel.org
+Link: http://lkml.kernel.org/r/1448658575-17029-5-git-send-email-matt@codeblueprint.co.uk
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+---
+ arch/x86/include/asm/efi.h          | 25 +++++++++++++++++++++
+ arch/x86/platform/efi/efi_64.c      | 24 ++++++++++-----------
+ arch/x86/platform/efi/efi_stub_64.S | 43 -------------------------------------
+ 3 files changed, 36 insertions(+), 56 deletions(-)
+
+diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
+index 0010c78c4998..347eeacb06a8 100644
+--- a/arch/x86/include/asm/efi.h
++++ b/arch/x86/include/asm/efi.h
+@@ -3,6 +3,7 @@
+ 
+ #include <asm/fpu/api.h>
+ #include <asm/pgtable.h>
++#include <asm/tlb.h>
+ 
+ /*
+  * We map the EFI regions needed for runtime services non-contiguously,
+@@ -64,6 +65,17 @@ extern u64 asmlinkage efi_call(void *fp, ...);
+ 
+ #define efi_call_phys(f, args...)		efi_call((f), args)
+ 
++/*
++ * Scratch space used for switching the pagetable in the EFI stub
++ */
++struct efi_scratch {
++	u64	r15;
++	u64	prev_cr3;
++	pgd_t	*efi_pgt;
++	bool	use_pgd;
++	u64	phys_stack;
++} __packed;
++
+ #define efi_call_virt(f, ...)						\
+ ({									\
+ 	efi_status_t __s;						\
+@@ -71,7 +83,20 @@ extern u64 asmlinkage efi_call(void *fp, ...);
+ 	efi_sync_low_kernel_mappings();					\
+ 	preempt_disable();						\
+ 	__kernel_fpu_begin();						\
++									\
++	if (efi_scratch.use_pgd) {					\
++		efi_scratch.prev_cr3 = read_cr3();			\
++		write_cr3((unsigned long)efi_scratch.efi_pgt);		\
++		__flush_tlb_all();					\
++	}								\
++									\
+ 	__s = efi_call((void *)efi.systab->runtime->f, __VA_ARGS__);	\
++									\
++	if (efi_scratch.use_pgd) {					\
++		write_cr3(efi_scratch.prev_cr3);			\
++		__flush_tlb_all();					\
++	}								\
++									\
+ 	__kernel_fpu_end();						\
+ 	preempt_enable();						\
+ 	__s;								\
+diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
+index 102976dda8c4..b19cdac959b2 100644
+--- a/arch/x86/platform/efi/efi_64.c
++++ b/arch/x86/platform/efi/efi_64.c
+@@ -47,16 +47,7 @@
+  */
+ static u64 efi_va = EFI_VA_START;
+ 
+-/*
+- * Scratch space used for switching the pagetable in the EFI stub
+- */
+-struct efi_scratch {
+-	u64 r15;
+-	u64 prev_cr3;
+-	pgd_t *efi_pgt;
+-	bool use_pgd;
+-	u64 phys_stack;
+-} __packed;
++struct efi_scratch efi_scratch;
+ 
+ static void __init early_code_mapping_set_exec(int executable)
+ {
+@@ -83,8 +74,11 @@ pgd_t * __init efi_call_phys_prolog(void)
+ 	int pgd;
+ 	int n_pgds;
+ 
+-	if (!efi_enabled(EFI_OLD_MEMMAP))
+-		return NULL;
++	if (!efi_enabled(EFI_OLD_MEMMAP)) {
++		save_pgd = (pgd_t *)read_cr3();
++		write_cr3((unsigned long)efi_scratch.efi_pgt);
++		goto out;
++	}
+ 
+ 	early_code_mapping_set_exec(1);
+ 
+@@ -96,6 +90,7 @@ pgd_t * __init efi_call_phys_prolog(void)
+ 		vaddress = (unsigned long)__va(pgd * PGDIR_SIZE);
+ 		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), *pgd_offset_k(vaddress));
+ 	}
++out:
+ 	__flush_tlb_all();
+ 
+ 	return save_pgd;
+@@ -109,8 +104,11 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
+ 	int pgd_idx;
+ 	int nr_pgds;
+ 
+-	if (!save_pgd)
++	if (!efi_enabled(EFI_OLD_MEMMAP)) {
++		write_cr3((unsigned long)save_pgd);
++		__flush_tlb_all();
+ 		return;
++	}
+ 
+ 	nr_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
+ 
+diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
+index 86d0f9e08dd9..32020cb8bb08 100644
+--- a/arch/x86/platform/efi/efi_stub_64.S
++++ b/arch/x86/platform/efi/efi_stub_64.S
+@@ -38,41 +38,6 @@
+ 	mov %rsi, %cr0;			\
+ 	mov (%rsp), %rsp
+ 
+-	/* stolen from gcc */
+-	.macro FLUSH_TLB_ALL
+-	movq %r15, efi_scratch(%rip)
+-	movq %r14, efi_scratch+8(%rip)
+-	movq %cr4, %r15
+-	movq %r15, %r14
+-	andb $0x7f, %r14b
+-	movq %r14, %cr4
+-	movq %r15, %cr4
+-	movq efi_scratch+8(%rip), %r14
+-	movq efi_scratch(%rip), %r15
+-	.endm
+-
+-	.macro SWITCH_PGT
+-	cmpb $0, efi_scratch+24(%rip)
+-	je 1f
+-	movq %r15, efi_scratch(%rip)		# r15
+-	# save previous CR3
+-	movq %cr3, %r15
+-	movq %r15, efi_scratch+8(%rip)		# prev_cr3
+-	movq efi_scratch+16(%rip), %r15		# EFI pgt
+-	movq %r15, %cr3
+-	1:
+-	.endm
+-
+-	.macro RESTORE_PGT
+-	cmpb $0, efi_scratch+24(%rip)
+-	je 2f
+-	movq efi_scratch+8(%rip), %r15
+-	movq %r15, %cr3
+-	movq efi_scratch(%rip), %r15
+-	FLUSH_TLB_ALL
+-	2:
+-	.endm
+-
+ ENTRY(efi_call)
+ 	SAVE_XMM
+ 	mov (%rsp), %rax
+@@ -83,16 +48,8 @@ ENTRY(efi_call)
+ 	mov %r8, %r9
+ 	mov %rcx, %r8
+ 	mov %rsi, %rcx
+-	SWITCH_PGT
+ 	call *%rdi
+-	RESTORE_PGT
+ 	addq $48, %rsp
+ 	RESTORE_XMM
+ 	ret
+ ENDPROC(efi_call)
+-
+-	.data
+-ENTRY(efi_scratch)
+-	.fill 3,8,0
+-	.byte 0
+-	.quad 0
diff --git a/debian/patches/bugfix/x86/x86-efi-map-ram-into-the-identity-page-table-for-mix.patch b/debian/patches/bugfix/x86/x86-efi-map-ram-into-the-identity-page-table-for-mix.patch
new file mode 100644
index 0000000..8d2bddc
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-efi-map-ram-into-the-identity-page-table-for-mix.patch
@@ -0,0 +1,71 @@
+From: Matt Fleming <matt at codeblueprint.co.uk>
+Date: Fri, 27 Nov 2015 21:09:32 +0000
+Subject: [3/5] x86/efi: Map RAM into the identity page table for mixed mode
+Origin: https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit?id=b61a76f8850d2979550abc42d7e09154ebb8d785
+
+We are relying on the pre-existing mappings in 'trampoline_pgd'
+when accessing function arguments in the EFI mixed mode thunking
+code.
+
+Instead let's map memory explicitly so that things will continue
+to work when we move to a separate page table in the future.
+
+Signed-off-by: Matt Fleming <matt at codeblueprint.co.uk>
+Reviewed-by: Borislav Petkov <bp at suse.de>
+Acked-by: Borislav Petkov <bp at suse.de>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya at intel.com>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: Toshi Kani <toshi.kani at hp.com>
+Cc: linux-efi at vger.kernel.org
+Link: http://lkml.kernel.org/r/1448658575-17029-4-git-send-email-matt@codeblueprint.co.uk
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+---
+ arch/x86/platform/efi/efi_64.c | 20 ++++++++++++++++++++
+ 1 file changed, 20 insertions(+)
+
+diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
+index 5aa186db59e3..102976dda8c4 100644
+--- a/arch/x86/platform/efi/efi_64.c
++++ b/arch/x86/platform/efi/efi_64.c
+@@ -144,6 +144,7 @@ void efi_sync_low_kernel_mappings(void)
+ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ {
+ 	unsigned long pfn, text;
++	efi_memory_desc_t *md;
+ 	struct page *page;
+ 	unsigned npages;
+ 	pgd_t *pgd;
+@@ -177,6 +178,25 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ 	if (!IS_ENABLED(CONFIG_EFI_MIXED))
+ 		return 0;
+ 
++	/*
++	 * Map all of RAM so that we can access arguments in the 1:1
++	 * mapping when making EFI runtime calls.
++	 */
++	for_each_efi_memory_desc(&memmap, md) {
++		if (md->type != EFI_CONVENTIONAL_MEMORY &&
++		    md->type != EFI_LOADER_DATA &&
++		    md->type != EFI_LOADER_CODE)
++			continue;
++
++		pfn = md->phys_addr >> PAGE_SHIFT;
++		npages = md->num_pages;
++
++		if (kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, npages, 0)) {
++			pr_err("Failed to map 1:1 memory\n");
++			return 1;
++		}
++	}
++
+ 	page = alloc_page(GFP_KERNEL|__GFP_DMA32);
+ 	if (!page)
+ 		panic("Unable to allocate EFI runtime stack < 4GB\n");
diff --git a/debian/patches/bugfix/x86/x86-efi-setup-separate-efi-page-tables-in-kexec-path.patch b/debian/patches/bugfix/x86/x86-efi-setup-separate-efi-page-tables-in-kexec-path.patch
new file mode 100644
index 0000000..b95ee8f
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-efi-setup-separate-efi-page-tables-in-kexec-path.patch
@@ -0,0 +1,83 @@
+From: Matt Fleming <matt at codeblueprint.co.uk>
+Date: Thu, 21 Jan 2016 14:11:59 +0000
+Subject: x86/efi: Setup separate EFI page tables in kexec paths
+Origin: https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit?id=753b11ef8e92a1c1bbe97f2a5ec14bdd1ef2e6fe
+
+The switch to using a new dedicated page table for EFI runtime
+calls in commit commit 67a9108ed431 ("x86/efi: Build our own
+page table structures") failed to take into account changes
+required for the kexec code paths, which are unfortunately
+duplicated in the EFI code.
+
+Call the allocation and setup functions in
+kexec_enter_virtual_mode() just like we do for
+__efi_enter_virtual_mode() to avoid hitting NULL-pointer
+dereferences when making EFI runtime calls.
+
+At the very least, the call to efi_setup_page_tables() should
+have existed for kexec before the following commit:
+
+  67a9108ed431 ("x86/efi: Build our own page table structures")
+
+Things just magically worked because we were actually using
+the kernel's page tables that contained the required mappings.
+
+Reported-by: Srikar Dronamraju <srikar at linux.vnet.ibm.com>
+Tested-by: Srikar Dronamraju <srikar at linux.vnet.ibm.com>
+Signed-off-by: Matt Fleming <matt at codeblueprint.co.uk>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Dave Young <dyoung at redhat.com>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Link: http://lkml.kernel.org/r/1453385519-11477-1-git-send-email-matt@codeblueprint.co.uk
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+---
+ arch/x86/platform/efi/efi.c | 15 +++++++++++++++
+ 1 file changed, 15 insertions(+)
+
+diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
+index 3c1f3cd7b2ba..bdd9477f937c 100644
+--- a/arch/x86/platform/efi/efi.c
++++ b/arch/x86/platform/efi/efi.c
+@@ -815,6 +815,7 @@ static void __init kexec_enter_virtual_mode(void)
+ {
+ #ifdef CONFIG_KEXEC_CORE
+ 	efi_memory_desc_t *md;
++	unsigned int num_pages;
+ 	void *p;
+ 
+ 	efi.systab = NULL;
+@@ -829,6 +830,12 @@ static void __init kexec_enter_virtual_mode(void)
+ 		return;
+ 	}
+ 
++	if (efi_alloc_page_tables()) {
++		pr_err("Failed to allocate EFI page tables\n");
++		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
++		return;
++	}
++
+ 	/*
+ 	* Map efi regions which were passed via setup_data. The virt_addr is a
+ 	* fixed addr which was used in first kernel of a kexec boot.
+@@ -843,6 +850,14 @@ static void __init kexec_enter_virtual_mode(void)
+ 
+ 	BUG_ON(!efi.systab);
+ 
++	num_pages = ALIGN(memmap.nr_map * memmap.desc_size, PAGE_SIZE);
++	num_pages >>= PAGE_SHIFT;
++
++	if (efi_setup_page_tables(memmap.phys_map, num_pages)) {
++		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
++		return;
++	}
++
+ 	efi_sync_low_kernel_mappings();
+ 
+ 	/*
diff --git a/debian/patches/bugfix/x86/x86-mm-page-align-the-_end-symbol-to-avoid-pfn-conve.patch b/debian/patches/bugfix/x86/x86-mm-page-align-the-_end-symbol-to-avoid-pfn-conve.patch
new file mode 100644
index 0000000..0fb24e0
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-mm-page-align-the-_end-symbol-to-avoid-pfn-conve.patch
@@ -0,0 +1,53 @@
+From: Matt Fleming <matt at codeblueprint.co.uk>
+Date: Fri, 27 Nov 2015 21:09:30 +0000
+Subject: [1/5] x86/mm: Page align the '_end' symbol to avoid pfn conversion
+ bugs
+Origin: https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit?id=21cdb6b568435738cc0b303b2b3b82742396310c
+
+Ingo noted that if we can guarantee _end is aligned to PAGE_SIZE
+we can automatically avoid bugs along the lines of,
+
+	size = _end - _text >> PAGE_SHIFT
+
+which is missing a call to PFN_ALIGN(). The EFI mixed mode
+contains this bug, for example.
+
+_text is already aligned to PAGE_SIZE through the use of
+LOAD_PHYSICAL_ADDR, and the BSS and BRK sections are explicitly
+aligned in the linker script, so it makes sense to align _end to
+match.
+
+Reported-by: Ingo Molnar <mingo at kernel.org>
+Signed-off-by: Matt Fleming <matt at codeblueprint.co.uk>
+Acked-by: Borislav Petkov <bp at suse.de>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Dave Hansen <dave.hansen at intel.com>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya at intel.com>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: Toshi Kani <toshi.kani at hp.com>
+Cc: linux-efi at vger.kernel.org
+Link: http://lkml.kernel.org/r/1448658575-17029-2-git-send-email-matt@codeblueprint.co.uk
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+---
+ arch/x86/kernel/vmlinux.lds.S | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
+index 74e4bf11f562..4f1994257a18 100644
+--- a/arch/x86/kernel/vmlinux.lds.S
++++ b/arch/x86/kernel/vmlinux.lds.S
+@@ -325,6 +325,7 @@ SECTIONS
+ 		__brk_limit = .;
+ 	}
+ 
++	. = ALIGN(PAGE_SIZE);
+ 	_end = .;
+ 
+         STABS_DEBUG
diff --git a/debian/patches/bugfix/x86/x86-mm-pat-ensure-cpa-pfn-only-contains-page-frame-n.patch b/debian/patches/bugfix/x86/x86-mm-pat-ensure-cpa-pfn-only-contains-page-frame-n.patch
new file mode 100644
index 0000000..b7db9a3
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-mm-pat-ensure-cpa-pfn-only-contains-page-frame-n.patch
@@ -0,0 +1,144 @@
+From: Matt Fleming <matt at codeblueprint.co.uk>
+Date: Fri, 27 Nov 2015 21:09:31 +0000
+Subject: [2/5] x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
+Origin: https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit?id=edc3b9129cecd0f0857112136f5b8b1bc1d45918
+
+The x86 pageattr code is confused about the data that is stored
+in cpa->pfn, sometimes it's treated as a page frame number,
+sometimes it's treated as an unshifted physical address, and in
+one place it's treated as a pte.
+
+The result of this is that the mapping functions do not map the
+intended physical address.
+
+This isn't a problem in practice because most of the addresses
+we're mapping in the EFI code paths are already mapped in
+'trampoline_pgd' and so the pageattr mapping functions don't
+actually do anything in this case. But when we move to using a
+separate page table for the EFI runtime this will be an issue.
+
+Signed-off-by: Matt Fleming <matt at codeblueprint.co.uk>
+Reviewed-by: Borislav Petkov <bp at suse.de>
+Acked-by: Borislav Petkov <bp at suse.de>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Dave Hansen <dave.hansen at intel.com>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya at intel.com>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: Toshi Kani <toshi.kani at hp.com>
+Cc: linux-efi at vger.kernel.org
+Link: http://lkml.kernel.org/r/1448658575-17029-3-git-send-email-matt@codeblueprint.co.uk
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+---
+ arch/x86/mm/pageattr.c         | 17 ++++++-----------
+ arch/x86/platform/efi/efi_64.c | 16 ++++++++++------
+ 2 files changed, 16 insertions(+), 17 deletions(-)
+
+diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
+index a3137a4feed1..c70e42014101 100644
+--- a/arch/x86/mm/pageattr.c
++++ b/arch/x86/mm/pageattr.c
+@@ -905,15 +905,10 @@ static void populate_pte(struct cpa_data *cpa,
+ 	pte = pte_offset_kernel(pmd, start);
+ 
+ 	while (num_pages-- && start < end) {
+-
+-		/* deal with the NX bit */
+-		if (!(pgprot_val(pgprot) & _PAGE_NX))
+-			cpa->pfn &= ~_PAGE_NX;
+-
+-		set_pte(pte, pfn_pte(cpa->pfn >> PAGE_SHIFT, pgprot));
++		set_pte(pte, pfn_pte(cpa->pfn, pgprot));
+ 
+ 		start	 += PAGE_SIZE;
+-		cpa->pfn += PAGE_SIZE;
++		cpa->pfn++;
+ 		pte++;
+ 	}
+ }
+@@ -969,11 +964,11 @@ static int populate_pmd(struct cpa_data *cpa,
+ 
+ 		pmd = pmd_offset(pud, start);
+ 
+-		set_pmd(pmd, __pmd(cpa->pfn | _PAGE_PSE |
++		set_pmd(pmd, __pmd(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
+ 				   massage_pgprot(pmd_pgprot)));
+ 
+ 		start	  += PMD_SIZE;
+-		cpa->pfn  += PMD_SIZE;
++		cpa->pfn  += PMD_SIZE >> PAGE_SHIFT;
+ 		cur_pages += PMD_SIZE >> PAGE_SHIFT;
+ 	}
+ 
+@@ -1042,11 +1037,11 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
+ 	 * Map everything starting from the Gb boundary, possibly with 1G pages
+ 	 */
+ 	while (end - start >= PUD_SIZE) {
+-		set_pud(pud, __pud(cpa->pfn | _PAGE_PSE |
++		set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
+ 				   massage_pgprot(pud_pgprot)));
+ 
+ 		start	  += PUD_SIZE;
+-		cpa->pfn  += PUD_SIZE;
++		cpa->pfn  += PUD_SIZE >> PAGE_SHIFT;
+ 		cur_pages += PUD_SIZE >> PAGE_SHIFT;
+ 		pud++;
+ 	}
+diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
+index a0ac0f9c307f..5aa186db59e3 100644
+--- a/arch/x86/platform/efi/efi_64.c
++++ b/arch/x86/platform/efi/efi_64.c
+@@ -143,7 +143,7 @@ void efi_sync_low_kernel_mappings(void)
+ 
+ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ {
+-	unsigned long text;
++	unsigned long pfn, text;
+ 	struct page *page;
+ 	unsigned npages;
+ 	pgd_t *pgd;
+@@ -160,7 +160,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ 	 * and ident-map those pages containing the map before calling
+ 	 * phys_efi_set_virtual_address_map().
+ 	 */
+-	if (kernel_map_pages_in_pgd(pgd, pa_memmap, pa_memmap, num_pages, _PAGE_NX)) {
++	pfn = pa_memmap >> PAGE_SHIFT;
++	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX)) {
+ 		pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
+ 		return 1;
+ 	}
+@@ -185,8 +186,9 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ 
+ 	npages = (_end - _text) >> PAGE_SHIFT;
+ 	text = __pa(_text);
++	pfn = text >> PAGE_SHIFT;
+ 
+-	if (kernel_map_pages_in_pgd(pgd, text >> PAGE_SHIFT, text, npages, 0)) {
++	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, 0)) {
+ 		pr_err("Failed to map kernel text 1:1\n");
+ 		return 1;
+ 	}
+@@ -204,12 +206,14 @@ void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned num_pages)
+ static void __init __map_region(efi_memory_desc_t *md, u64 va)
+ {
+ 	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+-	unsigned long pf = 0;
++	unsigned long flags = 0;
++	unsigned long pfn;
+ 
+ 	if (!(md->attribute & EFI_MEMORY_WB))
+-		pf |= _PAGE_PCD;
++		flags |= _PAGE_PCD;
+ 
+-	if (kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
++	pfn = md->phys_addr >> PAGE_SHIFT;
++	if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
+ 		pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
+ 			   md->phys_addr, va);
+ }
diff --git a/debian/patches/series b/debian/patches/series
index 1368076..eaf3ec4 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -124,3 +124,9 @@ bugfix/all/revert-workqueue-make-sure-delayed-work-run-in-local-cpu.patch
 bugfix/all/af_unix-don-t-set-err-in-unix_stream_read_generic-unless-there-was-an-error.patch
 bugfix/all/bpf-fix-branch-offset-adjustment-on-backjumps-after-.patch
 bugfix/all/alsa-usb-audio-avoid-freeing-umidi-object-twice.patch
+bugfix/x86/x86-mm-page-align-the-_end-symbol-to-avoid-pfn-conve.patch
+bugfix/x86/x86-mm-pat-ensure-cpa-pfn-only-contains-page-frame-n.patch
+bugfix/x86/x86-efi-map-ram-into-the-identity-page-table-for-mix.patch
+bugfix/x86/x86-efi-hoist-page-table-switching-code-into-efi_cal.patch
+bugfix/x86/x86-efi-build-our-own-page-table-structures.patch
+bugfix/x86/x86-efi-setup-separate-efi-page-tables-in-kexec-path.patch
