[linux] 02/08: [amd64] Fix more regressions due to "efi: Build our own page table structure"
debian-kernel at lists.debian.org
debian-kernel at lists.debian.org
Tue Mar 15 02:04:02 UTC 2016
This is an automated email from the git hooks/post-receive script.
benh pushed a commit to branch sid
in repository linux.
commit 328762c833dc5274b168936d15222338c06fecbc
Author: Ben Hutchings <ben at decadent.org.uk>
Date: Mon Mar 14 23:45:22 2016 +0000
[amd64] Fix more regressions due to "efi: Build our own page table structure"
- efi: Fix boot crash by always mapping boot service regions into new EFI
page tables (Closes: #815125)
- mm/pat: Fix boot crash when 1GB pages are not supported by cpu
Although I don't have confirmation from Norbert Preining for #815125, it
does look like these fix all the remaining regressions.
---
debian/changelog | 5 +
...-boot-crash-by-always-mapping-boot-servic.patch | 197 +++++++++++++++++++++
...ot-crash-when-1gb-pages-are-not-supported.patch | 58 ++++++
debian/patches/series | 2 +
4 files changed, 262 insertions(+)
diff --git a/debian/changelog b/debian/changelog
index ebb6063..8626fbb 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -73,6 +73,11 @@ linux (4.4.5-1) UNRELEASED; urgency=medium
* [x86] drm/i915: Fix oops caused by fbdev initialization failure
* module: Fix ABI change in 4.4.5
* Revert "libata: Align ata_device's id on a cacheline" to avoid ABI change
+ * [amd64] Fix more regressions due to "efi: Build our own page table
+ structure":
+ - efi: Fix boot crash by always mapping boot service regions into new EFI
+ page tables (Closes: #815125)
+ - mm/pat: Fix boot crash when 1GB pages are not supported by cpu
[ Ian Campbell ]
* [arm64] Enable ARCH_HISI (Hisilicon) and the set of currently available
diff --git a/debian/patches/bugfix/x86/x86-efi-fix-boot-crash-by-always-mapping-boot-servic.patch b/debian/patches/bugfix/x86/x86-efi-fix-boot-crash-by-always-mapping-boot-servic.patch
new file mode 100644
index 0000000..56e8485
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-efi-fix-boot-crash-by-always-mapping-boot-servic.patch
@@ -0,0 +1,197 @@
+From: Matt Fleming <matt at codeblueprint.co.uk>
+Date: Fri, 11 Mar 2016 11:19:23 +0000
+Subject: x86/efi: Fix boot crash by always mapping boot service regions into
+ new EFI page tables
+Origin: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit?id=452308de61056a539352a9306c46716d7af8a1f1
+Bug-Debian: https://bugs.debian.org/815125
+
+Some machines have EFI regions in page zero (physical address
+0x00000000) and historically that region has been added to the e820
+map via trim_bios_range(), and ultimately mapped into the kernel page
+tables. It was not mapped via efi_map_regions() as one would expect.
+
+Alexis reports that with the new separate EFI page tables some boot
+services regions, such as page zero, are not mapped. This triggers an
+oops during the SetVirtualAddressMap() runtime call.
+
+For the EFI boot services quirk on x86 we need to memblock_reserve()
+boot services regions until after SetVirtualAddressMap(). Doing that
+while respecting the ownership of regions that may have already been
+reserved by the kernel was the motivation behind this commit:
+
+ 7d68dc3f1003 ("x86, efi: Do not reserve boot services regions within reserved areas")
+
+That patch was merged at a time when the EFI runtime virtual mappings
+were inserted into the kernel page tables as described above, and the
+trick of setting ->numpages (and hence the region size) to zero to
+track regions that should not be freed in efi_free_boot_services()
+meant that we never mapped those regions in efi_map_regions(). Instead
+we were relying solely on the existing kernel mappings.
+
+Now that we have separate page tables we need to make sure the EFI
+boot services regions are mapped correctly, even if someone else has
+already called memblock_reserve(). Instead of stashing a tag in
+->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
+generally makes no sense to mark a boot services region as required at
+runtime, it's pretty much guaranteed the firmware will not have
+already set this bit.
+
+For the record, the specific circumstances under which Alexis
+triggered this bug was that an EFI runtime driver on his machine was
+responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
+SetVirtualAddressMap().
+
+The event handler for this driver looks like this,
+
+ sub rsp,0x28
+ lea rdx,[rip+0x2445] # 0xaa948720
+ mov ecx,0x4
+ call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720)
+ mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
+ xor eax,eax
+ mov BYTE PTR [r11+0x1],0x1
+ add rsp,0x28
+ ret
+
+Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
+handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting
+instruction because ConvertPointer() was being called to convert the
+address 0x0000000000000000, which when converted is left unchanged and
+remains 0x0000000000000000.
+
+The output of the oops trace gave the impression of a standard NULL
+pointer dereference bug, but because we're accessing physical
+addresses during ConvertPointer(), it wasn't. EFI boot services code
+is stored at that address on Alexis' machine.
+
+Reported-by: Alexis Murzeau <amurzeau at gmail.com>
+Signed-off-by: Matt Fleming <matt at codeblueprint.co.uk>
+Cc: Andy Lutomirski <luto at amacapital.net>
+Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
+Cc: Ben Hutchings <ben at decadent.org.uk>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: H. Peter Anvin <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Maarten Lankhorst <maarten.lankhorst at canonical.com>
+Cc: Matthew Garrett <mjg59 at srcf.ucam.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Raphael Hertzog <hertzog at debian.org>
+Cc: Roger Shimizu <rogershimizu at gmail.com>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: linux-efi at vger.kernel.org
+Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk
+Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125
+Signed-off-by: Ingo Molnar <mingo at kernel.org>
+---
+ arch/x86/platform/efi/quirks.c | 79 +++++++++++++++++++++++++++++++++---------
+ 1 file changed, 62 insertions(+), 17 deletions(-)
+
+--- a/arch/x86/platform/efi/quirks.c
++++ b/arch/x86/platform/efi/quirks.c
+@@ -130,6 +130,27 @@ efi_status_t efi_query_variable_store(u3
+ EXPORT_SYMBOL_GPL(efi_query_variable_store);
+
+ /*
++ * Helper function for efi_reserve_boot_services() to figure out if we
++ * can free regions in efi_free_boot_services().
++ *
++ * Use this function to ensure we do not free regions owned by somebody
++ * else. We must only reserve (and then free) regions:
++ *
++ * - Not within any part of the kernel
++ * - Not the BIOS reserved area (E820_RESERVED, E820_NVS, etc)
++ */
++static bool can_free_region(u64 start, u64 size)
++{
++ if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
++ return false;
++
++ if (!e820_all_mapped(start, start+size, E820_RAM))
++ return false;
++
++ return true;
++}
++
++/*
+ * The UEFI specification makes it clear that the operating system is free to do
+ * whatever it wants with boot services code after ExitBootServices() has been
+ * called. Ignoring this recommendation a significant bunch of EFI implementations
+@@ -146,26 +167,50 @@ void __init efi_reserve_boot_services(vo
+ efi_memory_desc_t *md = p;
+ u64 start = md->phys_addr;
+ u64 size = md->num_pages << EFI_PAGE_SHIFT;
++ bool already_reserved;
+
+ if (md->type != EFI_BOOT_SERVICES_CODE &&
+ md->type != EFI_BOOT_SERVICES_DATA)
+ continue;
+- /* Only reserve where possible:
+- * - Not within any already allocated areas
+- * - Not over any memory area (really needed, if above?)
+- * - Not within any part of the kernel
+- * - Not the bios reserved area
+- */
+- if ((start + size > __pa_symbol(_text)
+- && start <= __pa_symbol(_end)) ||
+- !e820_all_mapped(start, start+size, E820_RAM) ||
+- memblock_is_region_reserved(start, size)) {
+- /* Could not reserve, skip it */
+- md->num_pages = 0;
+- memblock_dbg("Could not reserve boot range [0x%010llx-0x%010llx]\n",
+- start, start+size-1);
+- } else
++
++ already_reserved = memblock_is_region_reserved(start, size);
++
++ /*
++ * Because the following memblock_reserve() is paired
++ * with free_bootmem_late() for this region in
++ * efi_free_boot_services(), we must be extremely
++ * careful not to reserve, and subsequently free,
++ * critical regions of memory (like the kernel image) or
++ * those regions that somebody else has already
++ * reserved.
++ *
++ * A good example of a critical region that must not be
++ * freed is page zero (first 4Kb of memory), which may
++ * contain boot services code/data but is marked
++ * E820_RESERVED by trim_bios_range().
++ */
++ if (!already_reserved) {
+ memblock_reserve(start, size);
++
++ /*
++ * If we are the first to reserve the region, no
++ * one else cares about it. We own it and can
++ * free it later.
++ */
++ if (can_free_region(start, size))
++ continue;
++ }
++
++ /*
++ * We don't own the region. We must not free it.
++ *
++ * Setting this bit for a boot services region really
++ * doesn't make sense as far as the firmware is
++ * concerned, but it does provide us with a way to tag
++ * those regions that must not be paired with
++ * free_bootmem_late().
++ */
++ md->attribute |= EFI_MEMORY_RUNTIME;
+ }
+ }
+
+@@ -182,8 +227,8 @@ void __init efi_free_boot_services(void)
+ md->type != EFI_BOOT_SERVICES_DATA)
+ continue;
+
+- /* Could not reserve boot area */
+- if (!size)
++ /* Do not free, someone else owns it: */
++ if (md->attribute & EFI_MEMORY_RUNTIME)
+ continue;
+
+ free_bootmem_late(start, size);
diff --git a/debian/patches/bugfix/x86/x86-mm-pat-fix-boot-crash-when-1gb-pages-are-not-supported.patch b/debian/patches/bugfix/x86/x86-mm-pat-fix-boot-crash-when-1gb-pages-are-not-supported.patch
new file mode 100644
index 0000000..bae631e
--- /dev/null
+++ b/debian/patches/bugfix/x86/x86-mm-pat-fix-boot-crash-when-1gb-pages-are-not-supported.patch
@@ -0,0 +1,58 @@
+From: Matt Fleming <matt at codeblueprint.co.uk>
+Date: Mon, 14 Mar 2016 10:33:01 +0000
+Subject: x86/mm/pat: Fix boot crash when 1GB pages are not supported by cpu
+Origin: http://mid.gmane.org/1457951581-27353-2-git-send-email-matt@codeblueprint.co.uk
+
+Scott reports that with the new separate EFI page tables he's seeing
+the following error on boot, caused by setting reserved bits in the
+page table structures (fault code is PF_RSVD | PF_PROT),
+
+ swapper/0: Corrupted page table at address 17b102020
+ PGD 17b0e5063 PUD 1400000e3
+ Bad pagetable: 0009 [#1] SMP
+
+On first inspection the PUD is using a 1GB page size (_PAGE_PSE) and
+looks fine but that's only true if support for 1GB PUD pages
+("pdpe1gb") is present in the cpu.
+
+Scott's Intel Celeron N2820 does not have that feature and so the
+_PAGE_PSE bit is reserved. Fix this issue by making the 1GB mapping
+code in conditional on "cpu_has_gbpages".
+
+This issue didn't come up in the past because the required mapping for
+the faulting address (0x17b102020) will already have been setup by the
+kernel in early boot before we got to efi_map_regions(), but we no
+longer use the standard kernel page tables during EFI calls.
+
+Reported-by: Scott Ashcroft <scott.ashcroft at talk21.com>
+Tested-by: Scott Ashcroft <scott.ashcroft at talk21.com>
+Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
+Cc: Ben Hutchings <ben at decadent.org.uk>
+Cc: Borislav Petkov <bp at alien8.de>
+Cc: Brian Gerst <brgerst at gmail.com>
+Cc: Denys Vlasenko <dvlasenk at redhat.com>
+Cc: "H. Peter Anvin" <hpa at zytor.com>
+Cc: Linus Torvalds <torvalds at linux-foundation.org>
+Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
+Cc: Matthew Garrett <mjg59 at srcf.ucam.org>
+Cc: Peter Zijlstra <peterz at infradead.org>
+Cc: Raphael Hertzog <hertzog at debian.org>
+Cc: Roger Shimizu <rogershimizu at gmail.com>
+Cc: Thomas Gleixner <tglx at linutronix.de>
+Cc: linux-efi at vger.kernel.org
+Signed-off-by: Matt Fleming <matt at codeblueprint.co.uk>
+---
+ arch/x86/mm/pageattr.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/x86/mm/pageattr.c
++++ b/arch/x86/mm/pageattr.c
+@@ -1036,7 +1036,7 @@ static int populate_pud(struct cpa_data
+ /*
+ * Map everything starting from the Gb boundary, possibly with 1G pages
+ */
+- while (end - start >= PUD_SIZE) {
++ while (cpu_has_gbpages && end - start >= PUD_SIZE) {
+ set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
+ massage_pgprot(pud_pgprot)));
+
diff --git a/debian/patches/series b/debian/patches/series
index 51d5816..e4b136c 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -137,3 +137,5 @@ bugfix/powerpc/powerpc-fix-dedotify-for-binutils-2.26.patch
bugfix/x86/drm-i915-Fix-oops-caused-by-fbdev-initialization-fai.patch
debian/revert-libata-align-ata_device-s-id-on-a-cacheline.patch
debian/module-fix-abi-change-in-4.4.5.patch
+bugfix/x86/x86-efi-fix-boot-crash-by-always-mapping-boot-servic.patch
+bugfix/x86/x86-mm-pat-fix-boot-crash-when-1gb-pages-are-not-supported.patch
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/kernel/linux.git
More information about the Kernel-svn-changes
mailing list