[linux] 01/01: ext4: Fix checksum validation for inodes with small i_extra_isize
debian-kernel at lists.debian.org
debian-kernel at lists.debian.org
Sun Sep 25 21:01:33 UTC 2016
This is an automated email from the git hooks/post-receive script.
benh pushed a commit to branch sid
in repository linux.
commit 7a0f81fe53007e90678365344dd839946f7dd969
Author: Ben Hutchings <ben at decadent.org.uk>
Date: Sun Sep 25 21:59:53 2016 +0100
ext4: Fix checksum validation for inodes with small i_extra_isize
Closes: #838544, regression in 4.7.4
---
debian/changelog | 2 +
.../patches/bugfix/all/ext4-fix-bug-838544.patch | 218 +++++++++++++++++++++
debian/patches/series | 1 +
3 files changed, 221 insertions(+)
diff --git a/debian/changelog b/debian/changelog
index c781418..e82695a 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -139,6 +139,8 @@ linux (4.7.5-1) UNRELEASED; urgency=medium
* [hppa] tracing: Re-enable FTRACE
* [powerpc,powerpcspe,ppc64] linux-image: Suppress automatic dbgsym packages
* uaccess,uio: Fix ABI changes in 4.7.5
+ * ext4: Fix checksum validation for inodes with small i_extra_isize
+ (Closes: #838544, regression in 4.7.4)
-- Ben Hutchings <ben at decadent.org.uk> Fri, 23 Sep 2016 00:50:40 +0100
diff --git a/debian/patches/bugfix/all/ext4-fix-bug-838544.patch b/debian/patches/bugfix/all/ext4-fix-bug-838544.patch
new file mode 100644
index 0000000..2bdf2b5
--- /dev/null
+++ b/debian/patches/bugfix/all/ext4-fix-bug-838544.patch
@@ -0,0 +1,218 @@
+From: "Darrick J. Wong" <darrick.wong at oracle.com>
+Date: Mon, 19 Sep 2016 22:52:16 -0700
+Subject: Re: Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
+Origin: https://www.spinics.net/lists/linux-fsdevel/msg101888.html
+Bug-Debian: https://bugs.debian.org/838544
+
+[cc Ted and the ext4 list]
+
+On Mon, Sep 19, 2016 at 03:19:03PM +0100, Nix wrote:
+> So I ran into spurious metadata corruption warnings in v4.7.2 due to the
+> problem fixed by c9274d8. I applied an early version of the fix,
+> rebooted, and oh dear root filesystem mount failure with invalid
+> checksum errors.
+>
+> The problem persists in v4.7.4, as seen here in qemu emulation on a raw
+> image dd'ed directly from the thing that won't boot, with a couple of
+> printk()s:
+>
+> # mount /dev/vda /new-root/
+> [ 8.124692] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
+> [ 8.126977] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
+> [ 9.017980] Inode size 256 > good old size 128; fits in inode: 0
+> [ 8.134897] inode 8: provided: 5c50l; calculated: 36e1i
+> [ 8.135098] EXT4-fs error (device vda): ext4_iget:4479: inode #8: comm mount: checksum invalid
+> [ 8.138992] EXT4-fs (vda): no journal found
+> [ 8.165744] UDF-fs: warning (device vda): udf_fill_super: No partition found (2)
+> mount: mounting /dev/vda on /new-root/ failed: Invalid argument
+>
+> I added a bit of printking to show the failure of the journal inode
+> checksum to pass muster. e2fsck (from e2fsprogs 1.43.1-14) is quite
+> happy with this filesystem. Reverting c9274d8 makes everything happy
+> again (well, it does bring the original bug back, which is a rather
+> serious one, but other than that...):
+>
+> [ 9.823032] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
+> [ 9.824647] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
+> [ 9.832593] inode 8: provided: 5c50l; calculated: 5c50i
+> [ 9.839253] inode 2: provided: d6ea92e9l; calculated: d6ea92e9i
+> [ 9.846947] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
+>
+> So c9274d8 is clearly messing up the calculation of the checksum.
+>
+> The problem becomes more evident if we add more printk()s to the old
+> code, so we can see what region is being checksummed:
+>
+> # mount /dev/vda /new-root
+> [ 6.827297] inode 8: unadjusted csum of 256 bytes with seed a5df92a7: 449a5c50
+> [ 6.827596] adjusted csum: 5c50
+> [ 6.835993] inode 2: unadjusted csum of 256 bytes with seed 759c6c33: d6ea92e9
+> [ 6.836173] adjusted csum: d6ea92e9
+> [ 6.844801] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts:
+>
+> and the new:
+>
+> [ 11.098013] inode 8: csum of first 124 bytes with seed a5df92a7: f375b663
+> [ 11.098205] inode 8: added csum of 2 dummy_csum bytes with seed a5df92a7: 20cfebcb
+> [ 11.098420] inode 8: added csum of 2 bytes from offset 126 -- 128 to existing: d79e7432
+> [ 11.098646] inode 8: > GOOD_OLD_INODE_SIZE; added csum of 2 bytes from 128 -- 130 to existing: d10936e1
+> [ 11.098890] 8: adjusted csum: 36e1
+> [ 11.099133] EXT4-fs error (device vda): ext4_iget:4483: inode #8: comm mount: checksum invalid
+>
+> We are not checksumming enough bytes! We used to checksum the entire
+> 256-byte inode: now, we checksum only 130 bytes of it, which isn't even
+> enough to cover the 28-byte extra_isize on this filesystem and is more
+> or less guaranteed to give the wrong answer. I'd fix the problem, but I
+> frankly can't see how the new code is meant to be equivalent to the old
+> code in any sense -- most particularly what the stuff around dummy_csum
+> is meant to do -- so I thought it better to let the people who wrote it
+> fix it :)
+
+Sh*t, I missed this during the review. So your filesystem image (thank
+you!) had this to say:
+
+debugfs> stats
+Inode size: 256
+debugfs> stat <8>
+Size of extra inode fields: 0
+
+Basically, a 128-byte inode inside a filesystem that allocated 256 bytes
+for each inode. As you point out, the old code would checksum the entire
+allocated space, whether or not the inode core used it. Obviously, you
+want this since inline extended attributes live in that space:
+
+csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw,
+ EXT4_INODE_SIZE(inode->i_sb));
+
+The new code, on the other hand, carefully checksums around the
+i_checksum fields and only bothers to checksum the space between the end
+of i_checksum_hi and the end of the allocated space if the inode core is
+big enough to store i_checksum_hi. Since we allocated 256 bytes for
+each inode, we checksum the first two bytes after byte 128
+(EXT4_GOOD_OLD_INODE_SIZE), but then we see that i_extra_size == 0 so we
+never bother to checksum anything after that. This is of course wrong
+since we no longer checksum the xattr space and we've deviated from the
+pre-4.7.4 (documented) on-disk format.
+
+*FORTUNATELY* since the root inode fails the read verifier, the file (in
+this case the root dir) can't be modified and therefore nothing has been
+corrupted. Especially fortunate for you, the fs won't mount, reducing
+the chances that any serious damage has occurred.
+
+I /think/ the fix in this case is to hoist the last ext4_chksum call
+out of the EXT4_FITS_IN_INODE blob:
+
+if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+ offset = offsetof(struct ext4_inode, i_checksum_hi);
+ csum = ext4_chksum(sbi, csum, (__u8 *)raw +
+ EXT4_GOOD_OLD_INODE_SIZE,
+ offset - EXT4_GOOD_OLD_INODE_SIZE);
+ if (EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
+ csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
+ csum_size);
+ offset += csum_size;
+ }
+ csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
+ EXT4_INODE_SIZE(inode->i_sb) - offset);
+}
+
+Can you give that a try?
+
+> tune2fs output for this filesystem, particularly the extra_isize and
+> inode size fields are likely relevant:
+>
+> tune2fs 1.43.1 (08-Jun-2016)
+> Filesystem volume name: root
+> Last mounted on: /
+> Filesystem UUID: 6c0f7fa7-d6c2-4054-bff3-3a878460bdc7
+> Filesystem magic number: 0xEF53
+> Filesystem revision #: 1 (dynamic)
+> Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
+> Filesystem flags: signed_directory_hash
+> Default mount options: (none)
+> Filesystem state: clean
+> Errors behavior: Continue
+> Filesystem OS type: Linux
+> Inode count: 65536
+> Block count: 262144
+> Reserved block count: 13107
+> Free blocks: 227009
+> Free inodes: 59499
+> First block: 0
+> Block size: 4096
+> Fragment size: 4096
+> Group descriptor size: 64
+> Reserved GDT blocks: 63
+> Blocks per group: 32768
+> Fragments per group: 32768
+> Inodes per group: 8192
+> Inode blocks per group: 512
+> RAID stripe width: 16
+> Flex block group size: 64
+> Filesystem created: Tue May 26 21:28:46 2009
+> Last mount time: Sun Sep 18 23:34:41 2016
+> Last write time: Mon Sep 19 13:51:59 2016
+> Mount count: 0
+> Maximum mount count: 36
+> Last checked: Mon Sep 19 13:51:59 2016
+> Check interval: 15552000 (6 months)
+> Next check after: Sat Mar 18 12:51:59 2017
+> Lifetime writes: 16 GB
+> Reserved blocks uid: 0 (user root)
+> Reserved blocks gid: 0 (group root)
+> First inode: 11
+> Inode size: 256
+> Required extra isize: 28
+> Desired extra isize: 28
+> Journal inode: 8
+> Default directory hash: half_md4
+> Directory Hash Seed: f1da2da0-057e-4ba0-a021-3d56db5b24ab
+> Journal backup: inode blocks
+> Checksum type: crc32c
+> Checksum: 0x92acf115
+>
+> This is an old, upgraded fs from before the days of checksums, but even
+> so I'm surprised that with a 256-byte inode and no xattrs in use,
+> EXT4_FITS_IN_INODE is false. Maybe the extra_isize isn't big enough?
+
+It's zero, so yes. :)
+
+> An lzipped e2image of the problematic filesystem is available from
+> <http://www.esperi.org.uk/~nix/temporary/csum-corruption.img.lz>:
+> I have verified that the problem recurs with this image.
+>
+> I can also replicate the problem in literally seconds if you need more
+> debugging output.
+>
+>
+> ... The mystery is why this isn't going wrong anywhere else: I have
+> metadata_csum working on every fs on a bunch of other ext4-using
+> systems, and indeed on every filesystem on this machine as well, as long
+> as c9274d8 is not applied. Many of them are similarly upgraded pre-csum
+> fses with the same inode size and extra_isize, but they work...
+
+Hard to say, but this bug only affects inodes with 0 < i_extra_size <= 4
+i.e. 128 < inode-core-size <= 130. Maybe an old ext3 fs that was
+created with 256 bytes per inode but inode-core-size of 128?
+
+Uck. Sorry about this mess.
+
+--D
+
+[bwh: Converted to a patch]
+---
+--- a/fs/ext4/inode.c
++++ b/fs/ext4/inode.c
+@@ -71,10 +71,9 @@ static __u32 ext4_inode_csum(struct inod
+ csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
+ csum_size);
+ offset += csum_size;
+- csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
+- EXT4_INODE_SIZE(inode->i_sb) -
+- offset);
+ }
++ csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
++ EXT4_INODE_SIZE(inode->i_sb) - offset);
+ }
+
+ return csum;
diff --git a/debian/patches/series b/debian/patches/series
index 52fdabd..9561b26 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -74,6 +74,7 @@ bugfix/all/kbuild-use-nostdinc-in-compile-tests.patch
bugfix/all/disable-some-marvell-phys.patch
bugfix/all/fs-add-module_softdep-declarations-for-hard-coded-cr.patch
bugfix/all/kbuild-do-not-use-hyphen-in-exported-variable-name.patch
+bugfix/all/ext4-fix-bug-838544.patch
# Miscellaneous features
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/kernel/linux.git
More information about the Kernel-svn-changes
mailing list