[linux] 01/01: ext4: Fix checksum validation for inodes with small i_extra_isize

debian-kernel at lists.debian.org debian-kernel at lists.debian.org
Sun Sep 25 21:01:33 UTC 2016


This is an automated email from the git hooks/post-receive script.

benh pushed a commit to branch sid
in repository linux.

commit 7a0f81fe53007e90678365344dd839946f7dd969
Author: Ben Hutchings <ben at decadent.org.uk>
Date:   Sun Sep 25 21:59:53 2016 +0100

    ext4: Fix checksum validation for inodes with small i_extra_isize
    
    Closes: #838544, regression in 4.7.4
---
 debian/changelog                                   |   2 +
 .../patches/bugfix/all/ext4-fix-bug-838544.patch   | 218 +++++++++++++++++++++
 debian/patches/series                              |   1 +
 3 files changed, 221 insertions(+)

diff --git a/debian/changelog b/debian/changelog
index c781418..e82695a 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -139,6 +139,8 @@ linux (4.7.5-1) UNRELEASED; urgency=medium
   * [hppa] tracing: Re-enable FTRACE
   * [powerpc,powerpcspe,ppc64] linux-image: Suppress automatic dbgsym packages
   * uaccess,uio: Fix ABI changes in 4.7.5
+  * ext4: Fix checksum validation for inodes with small i_extra_isize
+    (Closes: #838544, regression in 4.7.4)
 
  -- Ben Hutchings <ben at decadent.org.uk>  Fri, 23 Sep 2016 00:50:40 +0100
 
diff --git a/debian/patches/bugfix/all/ext4-fix-bug-838544.patch b/debian/patches/bugfix/all/ext4-fix-bug-838544.patch
new file mode 100644
index 0000000..2bdf2b5
--- /dev/null
+++ b/debian/patches/bugfix/all/ext4-fix-bug-838544.patch
@@ -0,0 +1,218 @@
+From:  "Darrick J. Wong" <darrick.wong at oracle.com>
+Date: Mon, 19 Sep 2016 22:52:16 -0700
+Subject: Re: Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
+Origin: https://www.spinics.net/lists/linux-fsdevel/msg101888.html
+Bug-Debian: https://bugs.debian.org/838544
+
+[cc Ted and the ext4 list]
+
+On Mon, Sep 19, 2016 at 03:19:03PM +0100, Nix wrote:
+> So I ran into spurious metadata corruption warnings in v4.7.2 due to the
+> problem fixed by c9274d8. I applied an early version of the fix,
+> rebooted, and oh dear root filesystem mount failure with invalid
+> checksum errors.
+> 
+> The problem persists in v4.7.4, as seen here in qemu emulation on a raw
+> image dd'ed directly from the thing that won't boot, with a couple of
+> printk()s:
+> 
+> # mount /dev/vda /new-root/
+> [    8.124692] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
+> [    8.126977] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
+> [    9.017980] Inode size 256 > good old size 128; fits in inode: 0
+> [    8.134897] inode 8: provided: 5c50l; calculated: 36e1i
+> [    8.135098] EXT4-fs error (device vda): ext4_iget:4479: inode #8: comm mount: checksum invalid
+> [    8.138992] EXT4-fs (vda): no journal found
+> [    8.165744] UDF-fs: warning (device vda): udf_fill_super: No partition found (2)
+> mount: mounting /dev/vda on /new-root/ failed: Invalid argument
+> 
+> I added a bit of printking to show the failure of the journal inode
+> checksum to pass muster. e2fsck (from e2fsprogs 1.43.1-14) is quite
+> happy with this filesystem. Reverting c9274d8 makes everything happy
+> again (well, it does bring the original bug back, which is a rather
+> serious one, but other than that...):
+> 
+> [    9.823032] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
+> [    9.824647] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
+> [    9.832593] inode 8: provided: 5c50l; calculated: 5c50i
+> [    9.839253] inode 2: provided: d6ea92e9l; calculated: d6ea92e9i
+> [    9.846947] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
+> 
+> So c9274d8 is clearly messing up the calculation of the checksum.
+> 
+> The problem becomes more evident if we add more printk()s to the old
+> code, so we can see what region is being checksummed:
+> 
+> # mount /dev/vda /new-root
+> [    6.827297] inode 8: unadjusted csum of 256 bytes with seed a5df92a7: 449a5c50
+> [    6.827596] adjusted csum: 5c50
+> [    6.835993] inode 2: unadjusted csum of 256 bytes with seed 759c6c33: d6ea92e9
+> [    6.836173] adjusted csum: d6ea92e9
+> [    6.844801] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts:
+> 
+> and the new:
+> 
+> [   11.098013] inode 8: csum of first 124 bytes with seed a5df92a7: f375b663
+> [   11.098205] inode 8: added csum of 2 dummy_csum bytes with seed a5df92a7: 20cfebcb
+> [   11.098420] inode 8: added csum of 2 bytes from offset 126 -- 128 to existing: d79e7432
+> [   11.098646] inode 8: > GOOD_OLD_INODE_SIZE; added csum of 2 bytes from 128 -- 130 to existing: d10936e1
+> [   11.098890] 8: adjusted csum: 36e1
+> [   11.099133] EXT4-fs error (device vda): ext4_iget:4483: inode #8: comm mount: checksum invalid
+> 
+> We are not checksumming enough bytes! We used to checksum the entire
+> 256-byte inode: now, we checksum only 130 bytes of it, which isn't even
+> enough to cover the 28-byte extra_isize on this filesystem and is more
+> or less guaranteed to give the wrong answer. I'd fix the problem, but I
+> frankly can't see how the new code is meant to be equivalent to the old
+> code in any sense -- most particularly what the stuff around dummy_csum
+> is meant to do -- so I thought it better to let the people who wrote it
+> fix it :)
+
+Sh*t, I missed this during the review.  So your filesystem image (thank
+you!) had this to say:
+
+debugfs> stats
+Inode size:               256
+debugfs> stat <8>
+Size of extra inode fields: 0
+
+Basically, a 128-byte inode inside a filesystem that allocated 256 bytes
+for each inode.  As you point out, the old code would checksum the entire
+allocated space, whether or not the inode core used it.  Obviously, you
+want this since inline extended attributes live in that space:
+
+csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw,
+		   EXT4_INODE_SIZE(inode->i_sb));
+
+The new code, on the other hand, carefully checksums around the
+i_checksum fields and only bothers to checksum the space between the end
+of i_checksum_hi and the end of the allocated space if the inode core is
+big enough to store i_checksum_hi.  Since we allocated 256 bytes for
+each inode, we checksum the first two bytes after byte 128
+(EXT4_GOOD_OLD_INODE_SIZE), but then we see that i_extra_size == 0 so we
+never bother to checksum anything after that.  This is of course wrong
+since we no longer checksum the xattr space and we've deviated from the
+pre-4.7.4 (documented) on-disk format.
+
+*FORTUNATELY* since the root inode fails the read verifier, the file (in
+this case the root dir) can't be modified and therefore nothing has been
+corrupted.  Especially fortunate for you, the fs won't mount, reducing
+the chances that any serious damage has occurred.
+
+I /think/ the fix in this case is to hoist the last ext4_chksum call
+out of the EXT4_FITS_IN_INODE blob:
+
+if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+	offset = offsetof(struct ext4_inode, i_checksum_hi);
+	csum = ext4_chksum(sbi, csum, (__u8 *)raw +
+			   EXT4_GOOD_OLD_INODE_SIZE,
+			   offset - EXT4_GOOD_OLD_INODE_SIZE);
+	if (EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
+		csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
+				   csum_size);
+		offset += csum_size;
+	}
+	csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
+			   EXT4_INODE_SIZE(inode->i_sb) - offset);
+}
+
+Can you give that a try?
+
+> tune2fs output for this filesystem, particularly the extra_isize and
+> inode size fields are likely relevant:
+> 
+> tune2fs 1.43.1 (08-Jun-2016)
+> Filesystem volume name:   root
+> Last mounted on:          /
+> Filesystem UUID:          6c0f7fa7-d6c2-4054-bff3-3a878460bdc7
+> Filesystem magic number:  0xEF53
+> Filesystem revision #:    1 (dynamic)
+> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
+> Filesystem flags:         signed_directory_hash
+> Default mount options:    (none)
+> Filesystem state:         clean
+> Errors behavior:          Continue
+> Filesystem OS type:       Linux
+> Inode count:              65536
+> Block count:              262144
+> Reserved block count:     13107
+> Free blocks:              227009
+> Free inodes:              59499
+> First block:              0
+> Block size:               4096
+> Fragment size:            4096
+> Group descriptor size:    64
+> Reserved GDT blocks:      63
+> Blocks per group:         32768
+> Fragments per group:      32768
+> Inodes per group:         8192
+> Inode blocks per group:   512
+> RAID stripe width:        16
+> Flex block group size:    64
+> Filesystem created:       Tue May 26 21:28:46 2009
+> Last mount time:          Sun Sep 18 23:34:41 2016
+> Last write time:          Mon Sep 19 13:51:59 2016
+> Mount count:              0
+> Maximum mount count:      36
+> Last checked:             Mon Sep 19 13:51:59 2016
+> Check interval:           15552000 (6 months)
+> Next check after:         Sat Mar 18 12:51:59 2017
+> Lifetime writes:          16 GB
+> Reserved blocks uid:      0 (user root)
+> Reserved blocks gid:      0 (group root)
+> First inode:              11
+> Inode size:               256
+> Required extra isize:     28
+> Desired extra isize:      28
+> Journal inode:            8
+> Default directory hash:   half_md4
+> Directory Hash Seed:      f1da2da0-057e-4ba0-a021-3d56db5b24ab
+> Journal backup:           inode blocks
+> Checksum type:            crc32c
+> Checksum:                 0x92acf115
+> 
+> This is an old, upgraded fs from before the days of checksums, but even
+> so I'm surprised that with a 256-byte inode and no xattrs in use,
+> EXT4_FITS_IN_INODE is false. Maybe the extra_isize isn't big enough?
+
+It's zero, so yes. :)
+
+> An lzipped e2image of the problematic filesystem is available from
+> <http://www.esperi.org.uk/~nix/temporary/csum-corruption.img.lz>:
+> I have verified that the problem recurs with this image.
+> 
+> I can also replicate the problem in literally seconds if you need more
+> debugging output.
+> 
+> 
+> ... The mystery is why this isn't going wrong anywhere else: I have
+> metadata_csum working on every fs on a bunch of other ext4-using
+> systems, and indeed on every filesystem on this machine as well, as long
+> as c9274d8 is not applied. Many of them are similarly upgraded pre-csum
+> fses with the same inode size and extra_isize, but they work...
+
+Hard to say, but this bug only affects inodes with 0 < i_extra_size <= 4
+i.e. 128 < inode-core-size <= 130.  Maybe an old ext3 fs that was
+created with 256 bytes per inode but inode-core-size of 128?
+
+Uck.  Sorry about this mess.
+
+--D
+
+[bwh: Converted to a patch]
+---
+--- a/fs/ext4/inode.c
++++ b/fs/ext4/inode.c
+@@ -71,10 +71,9 @@ static __u32 ext4_inode_csum(struct inod
+ 			csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
+ 					   csum_size);
+ 			offset += csum_size;
+-			csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
+-					   EXT4_INODE_SIZE(inode->i_sb) -
+-					   offset);
+ 		}
++		csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
++				   EXT4_INODE_SIZE(inode->i_sb) - offset);
+ 	}
+ 
+ 	return csum;
diff --git a/debian/patches/series b/debian/patches/series
index 52fdabd..9561b26 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -74,6 +74,7 @@ bugfix/all/kbuild-use-nostdinc-in-compile-tests.patch
 bugfix/all/disable-some-marvell-phys.patch
 bugfix/all/fs-add-module_softdep-declarations-for-hard-coded-cr.patch
 bugfix/all/kbuild-do-not-use-hyphen-in-exported-variable-name.patch
+bugfix/all/ext4-fix-bug-838544.patch
 
 # Miscellaneous features
 

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/kernel/linux.git



More information about the Kernel-svn-changes mailing list