[OT] Re: [Debtags-devel] libtagcoll updates

Enrico Zini enrico@enricozini.org
Tue, 5 Apr 2005 17:32:31 +0200



On Tue, Apr 05, 2005 at 01:35:35PM +0200, Benjamin Mesing wrote:

A small note about Unix filesystems: a file is an inode, which is
identified by a number.  What takes space on disk is the inode and the
data blocks it points to.  If you do "ls -i" you see inode numbers.  A
directory has an inode as well, and it's like a file containing
(name, inode#) pairs.  That is, the file attributes such as type,
permissions, dates, whatever, are stored in the inode and not in the
directory.
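
To make the (name, inode#) idea concrete, here is a rough Python
equivalent of "ls -i".  It is only a sketch: list_inodes is a helper
name I made up, and it simply asks the kernel for the inode number
behind each name in a directory.

```python
import os
import tempfile


def list_inodes(directory):
    """Return sorted (name, inode number) pairs for the entries of a directory."""
    pairs = []
    for name in os.listdir(directory):
        # lstat reads the inode the name points to, without following symlinks.
        info = os.lstat(os.path.join(directory, name))
        pairs.append((name, info.st_ino))
    return sorted(pairs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(scratch, name), "w") as handle:
                handle.write("hello\n")
        for name, inode in list_inodes(scratch):
            print(inode, name)
```

The numbers printed depend on the filesystem, so don't expect any
particular values.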

An inode can be referenced in more than one directory.  You do that with
the ln command: ln <existing file> <new file> will link <new file> to
the inode of <existing file>.  The linked file will have all attributes
of the original file, and if you change them in one, they'll change in
the other.  That shows that attributes are in the inode, not in the
directory.  You can't have a file that is owned by enrico in a directory
and by benjamin in another directory.
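
You can check this from Python too.  The sketch below assumes a
filesystem that supports hard links; the function name and the file
names "original" and "alias" are just for illustration.

```python
import os
import stat
import tempfile


def hard_link_shares_attributes(directory):
    """Link a second name to a file, chmod one name, and stat both names."""
    original = os.path.join(directory, "original")
    alias = os.path.join(directory, "alias")
    with open(original, "w") as handle:
        handle.write("data\n")
    os.link(original, alias)      # same as: ln original alias
    os.chmod(alias, 0o600)        # change permissions through one name only
    first, second = os.stat(original), os.stat(alias)
    return {
        "same_inode": first.st_ino == second.st_ino,
        "original_mode": stat.S_IMODE(first.st_mode),
        "alias_mode": stat.S_IMODE(second.st_mode),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(hard_link_shares_attributes(scratch))
```

Both names report the same inode number and the same mode, even though
chmod was only called on one of them.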

Every inode has a reference count (the link count).  It's the second
column in "ls -l".  It shows how many times the inode is linked from
directories.  Note that
the link count for a directory is at least 2: one for linking it into
some place, and one for the "." link contained in it.  It's usually very
high for directories with lots of subdirectories because of all the
links made with the '..' entries.
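
The same number is available as st_nlink from stat(2).  Here is a small
sketch that records it for a file and for a directory; link_counts and
the file names are made up.  The directory figures follow the rule
above on traditional filesystems such as ext2/ext3, but some
filesystems report directory link counts differently, so take those two
values as indicative only.

```python
import os
import tempfile


def link_counts(directory):
    """Collect st_nlink for a file and a directory at a few points in time."""
    counts = {}
    target = os.path.join(directory, "file")
    with open(target, "w") as handle:
        handle.write("data\n")
    counts["file"] = os.stat(target).st_nlink
    os.link(target, os.path.join(directory, "second-name"))
    counts["file_after_ln"] = os.stat(target).st_nlink

    parent = os.path.join(directory, "parent")
    os.mkdir(parent)
    counts["empty_dir"] = os.stat(parent).st_nlink
    for name in ("sub1", "sub2", "sub3"):
        # Each subdirectory contains a ".." entry pointing back at parent.
        os.mkdir(os.path.join(parent, name))
    counts["dir_with_3_subdirs"] = os.stat(parent).st_nlink
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for key, value in link_counts(scratch).items():
            print(key, value)
```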

When you do "ln", you increase the link count of the existing file.
When you remove ("unlink") the file, the link count is decreased.  The
inode is freed in these two cases I know of:
 - The link count reaches 0 during unlink, and no process has a file
   descriptor open on that inode
 - The last process with an open file descriptor to that inode calls
   close(2) on it, and the link count is 0.
This by the way means that a simple-looking close(2) operation can
in some cases unexpectedly take a long time to complete, since it may
be the call that ends up freeing all the blocks of a large file.
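
Here is a sketch of the second case, assuming a local Linux filesystem
(network filesystems such as NFS may behave differently).  The function
name is made up.  It removes the only name of a file while a descriptor
is still open, then shows that the inode is still usable until close.

```python
import os
import tempfile


def unlink_while_open(directory):
    """Remove the only name of an open file and keep using the descriptor."""
    path = os.path.join(directory, "victim")
    with open(path, "w") as handle:
        handle.write("still here\n")
    fd = os.open(path, os.O_RDONLY)
    try:
        os.unlink(path)                    # the last directory link goes away
        links = os.fstat(fd).st_nlink      # link count as seen through the fd
        name_exists = os.path.exists(path)
        data = os.read(fd, 100)            # the contents are still readable
    finally:
        os.close(fd)                       # only now can the inode be freed
    return links, name_exists, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(unlink_while_open(scratch))
```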

If you do something like this:
  (sleep 60 ; cat) < file &
  rm file
  mount -o remount,ro .    (or cut the power)
you won't really have stuff coming up in lost+found: the link count for
'file' will be 0, and fsck will take care of deleting it when it finds
it.

Stuff ends up in lost+found when inodes are found with the link count >
0 but not linked from any directory.  That is only supposed to happen
in case something screws up: power loss when rewriting a directory, who
knows what other oddities.
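
To spell out the rule, here is a toy model in Python.  It has nothing
to do with how a real fsck is implemented: the dictionaries stand in
for on-disk structures, and classify_inodes is a name I invented.

```python
def classify_inodes(link_counts, directories):
    """Toy fsck pass over a made-up in-memory model, not a real filesystem.

    link_counts: {inode number: link count stored in the inode}
    directories: {directory name: {entry name: inode number}}
    """
    referenced = set()
    for entries in directories.values():
        referenced.update(entries.values())

    verdict = {}
    for inode, count in link_counts.items():
        if inode in referenced:
            verdict[inode] = "keep"
        elif count == 0:
            verdict[inode] = "delete"        # e.g. removed while still open
        else:
            verdict[inode] = "lost+found"    # claims links no directory holds
    return verdict


if __name__ == "__main__":
    counts = {10: 1, 11: 0, 12: 1}
    dirs = {"/": {"a": 10}}
    print(classify_inodes(counts, dirs))
```

In this example inode 11 is the "rm while still open" case, and inode
12 is the one that would show up in lost+found.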

Journaling filesystems prevent that, except when they have bugs, such
as in the case that Benjamin experienced.

The "block bitmap" is a bitmap (one bit per disk block) telling which
blocks are used and which ones are free.  The system uses the bitmap to
allocate space for files.  When an inode (not a directory entry, as you
know now :) is deleted, its blocks are eventually marked free in the
block bitmap.  If you cut power before the system does it, then you'd
have a lost block.  However, fsck scans all the inodes checking the
blocks they occupy, in order to validate or regenerate the free block
bitmap.
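
The regeneration step can be sketched like this.  Again it is a toy
model with invented names: a list of booleans plays the bitmap, and a
dictionary says which blocks each inode claims.

```python
def rebuild_block_bitmap(total_blocks, inode_blocks):
    """Recompute a used/free bitmap from the blocks each inode claims.

    inode_blocks: {inode number: list of block numbers}; a made-up model.
    """
    bitmap = [False] * total_blocks
    for blocks in inode_blocks.values():
        for block in blocks:
            bitmap[block] = True
    return bitmap


def lost_blocks(stored_bitmap, inode_blocks):
    """Blocks marked used in the stored bitmap that no inode claims."""
    rebuilt = rebuild_block_bitmap(len(stored_bitmap), inode_blocks)
    return [n for n, used in enumerate(stored_bitmap) if used and not rebuilt[n]]


if __name__ == "__main__":
    stored = [True, True, True, False, True, False]
    claimed = {7: [0, 1], 9: [4]}
    print(lost_blocks(stored, claimed))
```

Block 2 is marked used in the stored bitmap but no inode claims it, so
it is the lost block; the rebuilt bitmap marks it free again.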

Journaling filesystems prevent this problem as well.  What they don't
prevent (unless they journal both metadata and file contents, which has
performance problems) is losing updates to the file data: if you do a
write(2) and cut the power right away, the contents of that write can
either be:
 1) lost like the write had never been done (and that'd be ok)
 2) kept like the write succeeded (and that'd be ok)
 3) something in between (and that may cause problems)
Journaling file data could prevent case 3.

> the block in use. If a program fails to toggle this bit when the last
> inode pointing to this block is deleted (or more likely the OS does),
> this would lead to a unreferenced (lost) datablock.
> On the other hand there is the filename to inode mapping. Deleting a
> file without deleting the inode it is referring to would lead to an
> unreferenced (lost) inode floating around.
> Is this correct?

Yes, precisely, except for two things:

 - it's not a program responsibility to update the block bitmap or to
   delete the inode: these tasks are performed by the filesystem code in
   the kernel;
 - the kernel fails to do it in case of kernel bugs or hardware failures
   such as hard drive failure or power loss.
   In any of these cases, fsck can easily delete unreferenced inodes
   with 0 reference count, link unreferenced inodes with >0 reference
   count into lost+found and rebuild the block bitmap.

Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>
