[kernel] r20062 - in dists/sid/linux/debian: . patches/features/all/aufs3
Ben Hutchings
benh at alioth.debian.org
Thu May 9 00:41:04 UTC 2013
Author: benh
Date: Thu May 9 00:41:03 2013
New Revision: 20062
Log:
linux-doc: Include aufs documentation
Modified:
dists/sid/linux/debian/changelog
dists/sid/linux/debian/patches/features/all/aufs3/aufs3-add.patch
dists/sid/linux/debian/patches/features/all/aufs3/gen-patch
Modified: dists/sid/linux/debian/changelog
==============================================================================
--- dists/sid/linux/debian/changelog Thu May 9 00:37:34 2013 (r20061)
+++ dists/sid/linux/debian/changelog Thu May 9 00:41:03 2013 (r20062)
@@ -58,6 +58,7 @@
- new copyup implementation
- pin the branch dir
- convert the plink list into hlist
+ * linux-doc: Include aufs documentation
-- Ben Hutchings <ben at decadent.org.uk> Mon, 06 May 2013 03:59:09 +0100
Modified: dists/sid/linux/debian/patches/features/all/aufs3/aufs3-add.patch
==============================================================================
--- dists/sid/linux/debian/patches/features/all/aufs3/aufs3-add.patch Thu May 9 00:37:34 2013 (r20061)
+++ dists/sid/linux/debian/patches/features/all/aufs3/aufs3-add.patch Thu May 9 00:41:03 2013 (r20062)
@@ -1,3 +1,1390 @@
+--- a/Documentation/ABI/testing/debugfs-aufs 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/ABI/testing/debugfs-aufs 2013-05-09 01:36:20.741181631 +0100
+@@ -0,0 +1,50 @@
++What: /debug/aufs/si_<id>/
++Date: March 2009
++Contact: J. R. Okajima <hooanon05 at yahoo.co.jp>
++Description:
++ Under /debug/aufs, a directory named si_<id> is created
++ per aufs mount, where <id> is a unique id generated
++ internally.
++
++What: /debug/aufs/si_<id>/plink
++Date: Apr 2013
++Contact: J. R. Okajima <hooanon05 at yahoo.co.jp>
++Description:
++ It has three lines and shows information about the
++ pseudo-links. The first line is a single number
++ representing the number of buckets. The second line is the
++ number of pseudo-links per bucket (separated by a
++ blank). The last line is a single number representing the
++ total number of pseudo-links.
++ When the aufs mount option 'noplink' is specified, it
++ will show "1\n0\n0\n".
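
As an illustration of the three-line format described above, here is a minimal Python sketch of a reader for it. The function name parse_plink and the returned dict keys are hypothetical, not part of aufs; only the line layout is taken from the description.

```python
def parse_plink(text):
    """Parse the three-line plink text into a dict (hypothetical helper)."""
    lines = text.splitlines()
    if len(lines) != 3:
        raise ValueError("expected exactly three lines")
    # Line 1: a single number, the number of buckets.
    buckets = int(lines[0])
    # Line 2: pseudo-link counts per bucket, separated by blanks.
    per_bucket = [int(field) for field in lines[1].split()]
    # Line 3: a single number, the total number of pseudo-links.
    total = int(lines[2])
    return {"buckets": buckets, "per_bucket": per_bucket, "total": total}
```

Under 'noplink', the documented output "1\n0\n0\n" would parse to one bucket holding zero pseudo-links.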
++
++What: /debug/aufs/si_<id>/xib
++Date: March 2009
++Contact: J. R. Okajima <hooanon05 at yahoo.co.jp>
++Description:
++ It shows the consumed blocks by xib (External Inode Number
++ Bitmap), its block size and file size.
++ When the aufs mount option 'noxino' is specified, it
++ will be empty. About XINO files, see the aufs manual.
++
++What: /debug/aufs/si_<id>/xino0, xino1 ... xinoN
++Date: March 2009
++Contact: J. R. Okajima <hooanon05 at yahoo.co.jp>
++Description:
++ It shows the consumed blocks by xino (External Inode Number
++ Translation Table), its link count, block size and file
++ size.
++ When the aufs mount option 'noxino' is specified, it
++ will be empty. About XINO files, see the aufs manual.
++
++What: /debug/aufs/si_<id>/xigen
++Date: March 2009
++Contact: J. R. Okajima <hooanon05 at yahoo.co.jp>
++Description:
++ It shows the consumed blocks by xigen (External Inode
++ Generation Table), its block size and file size.
++ If CONFIG_AUFS_EXPORT is disabled, this entry will not
++ be created.
++ When the aufs mount option 'noxino' is specified, it
++ will be empty. About XINO files, see the aufs manual.
+--- a/Documentation/ABI/testing/sysfs-aufs 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/ABI/testing/sysfs-aufs 2012-01-10 02:15:56.000000000 +0000
+@@ -0,0 +1,24 @@
++What: /sys/fs/aufs/si_<id>/
++Date: March 2009
++Contact: J. R. Okajima <hooanon05 at yahoo.co.jp>
++Description:
++ Under /sys/fs/aufs, a directory named si_<id> is created
++ per aufs mount, where <id> is a unique id generated
++ internally.
++
++What: /sys/fs/aufs/si_<id>/br0, br1 ... brN
++Date: March 2009
++Contact: J. R. Okajima <hooanon05 at yahoo.co.jp>
++Description:
++ It shows the absolute path of a member directory (which
++ is called a branch) in aufs, and its permission.
++
++What: /sys/fs/aufs/si_<id>/xi_path
++Date: March 2009
++Contact: J. R. Okajima <hooanon05 at yahoo.co.jp>
++Description:
++ It shows the absolute path of XINO (External Inode Number
++ Bitmap, Translation Table and Generation Table) file
++ even if it is the default path.
++ When the aufs mount option 'noxino' is specified, it
++ will be empty. About XINO files, see the aufs manual.
+--- a/Documentation/filesystems/aufs/README 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/README 2013-05-09 01:36:20.741181631 +0100
+@@ -0,0 +1,337 @@
++
++Aufs3 -- advanced multi layered unification filesystem version 3.x
++http://aufs.sf.net
++Junjiro R. Okajima
++
++
++0. Introduction
++----------------------------------------
++In the early days, aufs was an entirely re-designed and re-implemented
++Unionfs Version 1.x series. After many original ideas, approaches,
++improvements and implementations, it became totally different from
++Unionfs while keeping the basic features.
++Recently, the Unionfs Version 2.x series began taking some of the same
++approaches as aufs1.
++Unionfs is being developed by Professor Erez Zadok at Stony Brook
++University and his team.
++
++Aufs3 supports linux-3.0 and later.
++If you want older kernel version support, try aufs2-2.6.git or
++aufs2-standalone.git repository, aufs1 from CVS on SourceForge.
++
++Note: it became clear that "Aufs was rejected. Let's give it up."
++According to Christoph Hellwig, linux rejects all union-type filesystems
++but UnionMount.
++<http://marc.info/?l=linux-kernel&m=123938533724484&w=2>
++
++
++1. Features
++----------------------------------------
++- unite several directories into a single virtual filesystem. The member
++ directory is called a branch.
++- you can specify the permission flags of a branch, which are 'readonly',
++ 'readwrite' and 'whiteout-able.'
++- with an upper writable branch, internal copyup and whiteout, files/dirs
++ on a readonly branch are logically modifiable.
++- dynamic branch manipulation, add, del.
++- etc...
++
++Also there are many enhancements in aufs1, such as:
++- readdir(3) in userspace.
++- keep inode number by external inode number table
++- keep the timestamps of file/dir in internal copyup operation
++- seekable directory, supporting NFS readdir.
++- whiteout is hardlinked in order to reduce the consumption of inodes
++ on branch
++- do not copyup, nor create a whiteout when it is unnecessary
++- revert a single systemcall when an error occurs in aufs
++- remount interface instead of ioctl
++- maintain /etc/mtab by an external command, /sbin/mount.aufs.
++- loopback mounted filesystem as a branch
++- kernel thread for removing a dir which has plenty of whiteouts
++- support copyup sparse file (a file which has a 'hole' in it)
++- default permission flags for branches
++- selectable permission flags for ro branch, whether whiteout can
++ exist or not
++- export via NFS.
++- support <sysfs>/fs/aufs and <debugfs>/aufs.
++- support multiple writable branches, some policies to select one
++ among multiple writable branches.
++- new semantics for link(2) and rename(2) to support multiple
++ writable branches.
++- no glibc changes are required.
++- pseudo hardlink (hardlink over branches)
++- allow a direct access manually to a file on branch, e.g. bypassing aufs.
++ including NFS or remote filesystem branch.
++- userspace wrapper for pathconf(3)/fpathconf(3) with _PC_LINK_MAX.
++- and more...
++
++Currently these features are temporarily dropped from aufs3.
++See design/08plan.txt for details.
++- test only the highest one for the directory permission (dirperm1)
++- copyup on open (coo=)
++- nested mount, i.e. aufs as readonly no-whiteout branch of another aufs
++ (robr)
++- statistics of aufs thread (/sys/fs/aufs/stat)
++- delegation mode (dlgt)
++ a delegation of the internal branch access to support task I/O
++ accounting, which also supports Linux Security Modules (LSM) mainly
++ for Suse AppArmor.
++- intent.open/create (file open in a single lookup)
++
++Features or just an idea in the future (see also design/*.txt),
++- reorder the branch index without del/re-add.
++- permanent xino files for NFSD
++- an option for refreshing the opened files after add/del branches
++- 'move' policy for copy-up between two writable branches, after
++ checking free space.
++- light version, without branch manipulation. (unnecessary?)
++- copyup in userspace
++- inotify in userspace
++- readv/writev
++- xattr, acl
++
++
++2. Download
++----------------------------------------
++There are three GIT trees for aufs3, aufs3-linux.git,
++aufs3-standalone.git, and aufs-util.git. Note that there is no "3" in
++"aufs-util.git."
++While the aufs-util is always necessary, you need either of aufs3-linux
++or aufs3-standalone.
++
++The aufs3-linux tree includes the whole linux mainline GIT tree,
++git://git.kernel.org/.../torvalds/linux.git.
++And you cannot select CONFIG_AUFS_FS=m for this version, i.e. you cannot
++build aufs3 as an external kernel module.
++
++On the other hand, the aufs3-standalone tree has only aufs source files
++and necessary patches, and you can select CONFIG_AUFS_FS=m.
++
++You will find GIT branches whose name is in form of "aufs3.x" where "x"
++represents the linux kernel version, "linux-3.x". For instance,
++"aufs3.0" is for linux-3.0. For latest "linux-3.x-rcN", use
++"aufs3.x-rcN" branch.
++
++o aufs3-linux tree
++$ git clone --reference /your/linux/git/tree \
++ git://git.code.sf.net/p/aufs/aufs3-linux \
++ aufs3-linux.git
++- if you don't have linux GIT tree, then remove "--reference ..."
++$ cd aufs3-linux.git
++$ git checkout origin/aufs3.0
++
++o aufs3-standalone tree
++$ git clone git://git.code.sf.net/p/aufs/aufs3-standalone \
++ aufs3-standalone.git
++$ cd aufs3-standalone.git
++$ git checkout origin/aufs3.0
++
++o aufs-util tree
++$ git clone git://git.code.sf.net/p/aufs/aufs-util \
++ aufs-util.git
++$ cd aufs-util.git
++$ git checkout origin/aufs3.0
++
++Note: The 3.x-rcN branch is to be used with `rc' kernel versions ONLY.
++The minor version number, 'x' in '3.x', of aufs may not always
++follow the minor version number of the kernel.
++Because changes in the kernel that cause the use of a new
++minor version number do not always require changes to aufs-util.
++
++Since aufs-util has its own minor version number, you may not be
++able to find a GIT branch in aufs-util for your kernel's
++exact minor version number.
++In this case, you should git-checkout the branch for the
++nearest lower number.
++
++For (an unreleased) example:
++If you are using "linux-3.10" and the "aufs3.10" branch
++does not exist in aufs-util repository, then "aufs3.9", "aufs3.8"
++or something numerically smaller is the branch for your kernel.
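
The "nearest lower number" rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name pick_util_branch is hypothetical, branch names are assumed to have the form "aufs3.x", and anything else (such as "aufs3.x-rcN") is skipped.

```python
def pick_util_branch(kernel_version, branches):
    """Return the aufs-util branch with the highest version that does not
    exceed the kernel version, or None when no such branch exists."""
    major, minor = (int(part) for part in kernel_version.split(".")[:2])
    candidates = []
    for name in branches:
        if not name.startswith("aufs"):
            continue
        try:
            b_major, b_minor = (int(p) for p in name[len("aufs"):].split("."))
        except ValueError:
            continue  # not a plain "aufsN.M" name, e.g. an -rcN branch
        if (b_major, b_minor) <= (major, minor):
            candidates.append(((b_major, b_minor), name))
    # Compare numerically, so that 3.10 sorts above 3.9.
    return max(candidates)[1] if candidates else None
```

The numeric comparison matters because a plain string comparison would place "aufs3.10" below "aufs3.9".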
++
++Also you can view all branches by
++ $ git branch -a
++
++
++3. Configuration and Compilation
++----------------------------------------
++Make sure you have git-checkout'ed the correct branch.
++
++For aufs3-linux tree,
++- enable CONFIG_EXPERIMENTAL and CONFIG_AUFS_FS.
++- set other aufs configurations if necessary.
++
++For aufs3-standalone tree,
++There are several ways to build.
++
++1.
++- apply ./aufs3-kbuild.patch to your kernel source files.
++- apply ./aufs3-base.patch too.
++- apply ./aufs3-proc_map.patch too, if you want to make /proc/PID/maps (and
++ others including lsof(1)) show the file path on aufs instead of the
++ path on the branch fs.
++- apply ./aufs3-standalone.patch too, if you have a plan to set
++ CONFIG_AUFS_FS=m. otherwise you don't need ./aufs3-standalone.patch.
++- copy ./{Documentation,fs,include/uapi/linux/aufs_type.h} files to your
++ kernel source tree. Never copy $PWD/include/uapi/linux/Kbuild.
++- enable CONFIG_EXPERIMENTAL and CONFIG_AUFS_FS, you can select either
++ =m or =y.
++- and build your kernel as usual.
++- install the built kernel.
++- install the header files too by "make headers_install" to the
++ directory where you specify. By default, it is $PWD/usr.
++ "make help" shows a brief note for headers_install.
++- and reboot your system.
++
++2.
++- module only (CONFIG_AUFS_FS=m).
++- apply ./aufs3-base.patch to your kernel source files.
++- apply ./aufs3-proc_map.patch too to your kernel source files,
++ if you want to make /proc/PID/maps (and others including lsof(1)) show
++ the file path on aufs instead of the path on the branch fs.
++- apply ./aufs3-standalone.patch too.
++- build your kernel, don't forget "make headers_install", and reboot.
++- edit ./config.mk and set other aufs configurations if necessary.
++ Note: You should carefully read $PWD/fs/aufs/Kconfig, which describes
++ every aufs configuration option.
++- build the module by simple "make".
++- you can specify ${KDIR} make variable which points to your kernel
++ source tree.
++- install the files
++ + run "make install" to install the aufs module, or copy the built
++ $PWD/aufs.ko to /lib/modules/... and run depmod -a (or reboot simply).
++ + run "make install_headers" (instead of headers_install) to install
++ the modified aufs header file (you can specify DESTDIR which is
++ available in aufs standalone version's Makefile only), or copy
++ $PWD/usr/include/linux/aufs_type.h to /usr/include/linux or wherever
++ you like manually. By default, the target directory is $PWD/usr.
++- no need to apply aufs3-kbuild.patch, nor copying source files to your
++ kernel source tree.
++
++Note: The header file aufs_type.h is necessary to build aufs-util,
++ as is "make headers_install" in the kernel source tree.
++ headers_install is easily forgotten, but it is essential,
++ and not only for building aufs-util.
++ You may not meet problems without headers_install in some older
++ versions though.
++
++And then,
++- read README in aufs-util, build and install it
++- note that your distribution may contain an obsolete version of
++ aufs_type.h in /usr/include/linux or something. When you build aufs
++ utilities, make sure that your compiler refers to the correct aufs header
++ file which is built by "make headers_install."
++- if you want to use readdir(3) in userspace or pathconf(3) wrapper,
++ then run "make install_ulib" too. And refer to the aufs manual for
++ details.
++
++
++4. Usage
++----------------------------------------
++At first, make sure aufs-util is installed, and please read the aufs
++manual, aufs.5 in aufs-util.git tree.
++$ man -l aufs.5
++
++And then,
++$ mkdir /tmp/rw /tmp/aufs
++# mount -t aufs -o br=/tmp/rw:${HOME} none /tmp/aufs
++
++Here is another example. The result is equivalent.
++# mount -t aufs -o br=/tmp/rw=rw:${HOME}=ro none /tmp/aufs
++ Or
++# mount -t aufs -o br:/tmp/rw none /tmp/aufs
++# mount -o remount,append:${HOME} /tmp/aufs
++
++Then, you can see the whole tree of your home dir through /tmp/aufs. If
++you modify a file under /tmp/aufs, the one on your home directory is
++not affected, instead the same named file will be newly created under
++/tmp/rw. And all of your modification to a file will be applied to
++the one under /tmp/rw. This is called the file based Copy on Write
++(COW) method.
++Aufs mount options are described in aufs.5.
++If you run chroot or something and make your aufs as a root directory,
++then you need to customize the shutdown script. See the aufs manual for
++details.
++
++Additionally, there are some sample usages of aufs which are a
++diskless system with network booting, and LiveCD over NFS.
++See sample dir in CVS tree on SourceForge.
++
++
++5. Contact
++----------------------------------------
++When you have any problems or strange behaviour in aufs, please let me
++know with:
++- /proc/mounts (instead of the output of mount(8))
++- /sys/module/aufs/*
++- /sys/fs/aufs/* (if you have them)
++- /debug/aufs/* (if you have them)
++- linux kernel version
++ if your kernel is not plain, for example modified by distributor,
++ the url where I can download its source is necessary too.
++- aufs version which was printed at loading the module or booting the
++ system, instead of the date you downloaded.
++- configuration (define/undefine CONFIG_AUFS_xxx)
++- kernel configuration or /proc/config.gz (if you have it)
++- behaviour which you think to be incorrect
++- actual operation, reproducible one is better
++- mailto: aufs-users at lists.sourceforge.net
++
++Usually, I don't watch the Public Areas(Bugs, Support Requests, Patches,
++and Feature Requests) on SourceForge. Please join and write to
++aufs-users ML.
++
++
++6. Acknowledgements
++----------------------------------------
++Thanks to everyone who has tried and is using aufs, and to whoever
++has reported a bug or given any feedback.
++
++Especially donators:
++Tomas Matejicek(slax.org) made a donation (much more than once).
++ Since Apr 2010, Tomas M (the author of Slax and Linux Live
++ scripts) is making "doubling" donations.
++ Unfortunately I cannot list all of the donators, but I really
++ appreciate.
++ It ends Aug 2010, but the ordinary donation URL is still available.
++ <http://sourceforge.net/donate/index.php?group_id=167503>
++Dai Itasaka made a donation (2007/8).
++Chuck Smith made a donation (2008/4, 10 and 12).
++Henk Schoneveld made a donation (2008/9).
++Chih-Wei Huang, ASUS, CTC donated Eee PC 4G (2008/10).
++Francois Dupoux made a donation (2008/11).
++Bruno Cesar Ribas and Luis Carlos Erpen de Bona, C3SL serves public
++ aufs2 GIT tree (2009/2).
++William Grant made a donation (2009/3).
++Patrick Lane made a donation (2009/4).
++The Mail Archive (mail-archive.com) made donations (2009/5).
++Nippy Networks (Ed Wildgoose) made a donation (2009/7).
++New Dream Network, LLC (www.dreamhost.com) made a donation (2009/11).
++Pavel Pronskiy made a donation (2011/2).
++Iridium and Inmarsat satellite phone retailer (www.mailasail.com), Nippy
++ Networks (Ed Wildgoose) made a donation for hardware (2011/3).
++Max Lekomcev (DOM-TV project) made a donation (2011/7, 12, 2012/3, 6 and
++11).
++Sam Liddicott made a donation (2011/9).
++Era Scarecrow made a donation (2013/4).
++Bor Ratajc made a donation (2013/4).
++Alessandro Gorreta made a donation (2013/4).
++POIRETTE Marc made a donation (2013/4).
++
++Thank you very much.
++Donations are always, including future donations, very important and
++helpful for me to keep on developing aufs.
++
++
++7.
++----------------------------------------
++If you are an experienced user, no explanation is needed. Aufs is
++just a linux filesystem.
++
++
++Enjoy!
++
++# Local variables: ;
++# mode: text;
++# End: ;
+--- a/Documentation/filesystems/aufs/design/01intro.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/01intro.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,162 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Introduction
++----------------------------------------
++
++aufs [ei ju: ef es] | [a u f s]
++1. abbrev. for "advanced multi-layered unification filesystem".
++2. abbrev. for "another unionfs".
++3. abbrev. for "auf das" in German which means "on the" in English.
++ Ex. "Butter aufs Brot"(G) means "butter onto bread"(E).
++ But "Filesystem aufs Filesystem" is hard to understand.
++
++AUFS is a filesystem with features:
++- multi layered stackable unification filesystem, the member directory
++ is called a branch.
++- branch permission and attribute, 'readonly', 'real-readonly',
++ 'readwrite', 'whiteout-able', 'link-able whiteout' and their
++ combination.
++- internal "file copy-on-write".
++- logical deletion, whiteout.
++- dynamic branch manipulation, adding, deleting and changing permission.
++- allow bypassing aufs, user's direct branch access.
++- external inode number translation table and bitmap which maintains the
++ persistent aufs inode number.
++- seekable directory, including NFS readdir.
++- file mapping, mmap and sharing pages.
++- pseudo-link, hardlink over branches.
++- loopback mounted filesystem as a branch.
++- several policies to select one among multiple writable branches.
++- revert a single systemcall when an error occurs in aufs.
++- and more...
++
++
++Multi Layered Stackable Unification Filesystem
++----------------------------------------------------------------------
++Most people already know what it is.
++It is a filesystem which unifies several directories and provides a
++merged single directory. When users access a file, the access will be
++passed/re-directed/converted (sorry, I am not sure which English word is
++correct) to the real file on the member filesystem. The member
++filesystem is called a 'lower filesystem' or 'branch' and has a mode,
++either 'readonly' or 'readwrite.' The deletion of a file on the lower
++readonly branch is handled by creating 'whiteout' on the upper writable
++branch.
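
To make the merged view and the whiteout concrete, here is a toy in-memory Python model. It is a sketch only, not how aufs is implemented: the class Union, its methods, and the use of plain dicts for branches and a set for whiteouts are all hypothetical simplifications.

```python
class Union:
    """Toy model of one writable upper branch over one readonly lower branch."""

    def __init__(self, rw, ro):
        self.rw = rw            # upper writable branch: name -> data
        self.ro = ro            # lower readonly branch: name -> data
        self.whiteouts = set()  # names logically deleted, recorded on rw side

    def read(self, name):
        # The upper branch wins; a whiteout hides the lower file.
        if name in self.rw:
            return self.rw[name]
        if name in self.whiteouts or name not in self.ro:
            raise FileNotFoundError(name)
        return self.ro[name]

    def unlink(self, name):
        visible_below = name in self.ro and name not in self.whiteouts
        if name not in self.rw and not visible_below:
            raise FileNotFoundError(name)
        self.rw.pop(name, None)
        if name in self.ro:
            # The readonly branch is never modified; only a whiteout is added.
            self.whiteouts.add(name)

    def listdir(self):
        lower = {n for n in self.ro if n not in self.whiteouts}
        return sorted(set(self.rw) | lower)
```

After unlink(), the file is gone from the merged view while the readonly branch still holds its data, which is the "logical deletion" described above.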
++
++On LKML, there have been discussions about UnionMount (Jan Blunck,
++Bharata B Rao and Valerie Aurora) and Unionfs (Erez Zadok). They took
++different approaches to implement the merged-view.
++The former tries putting it into VFS, and the latter implements as a
++separate filesystem.
++(If I misunderstand about these implementations, please let me know and
++I shall correct it. Because it is a long time ago when I read their
++source files last time).
++
++UnionMount's approach will be able to be small, but it may be hard to share
++branches between several UnionMounts since the whiteout in it is
++implemented in the inode on the branch filesystem and is always
++shared. According to Bharata's post, readdir does not seem to be
++finished yet.
++There are several missing features known in this implementation, such as
++- for users, the inode number may change silently. eg. copy-up.
++- link(2) may break by copy-up.
++- read(2) may get an obsoleted filedata (fstat(2) too).
++- fcntl(F_SETLK) may be broken by copy-up.
++- unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
++ open(O_RDWR).
++
++Unionfs has a longer history. When I started implementing a stacking filesystem
++(Aug 2005), it already existed. It has virtual super_block, inode,
++dentry and file objects, and each has an array pointing to the lower objects
++of the same kind. After contributing many patches for Unionfs, I re-started my
++project AUFS (Jun 2006).
++
++In AUFS, the structure of the filesystem resembles Unionfs, but I
++implemented my own ideas, approaches and enhancements and it became
++a totally different one.
++
++Comparing DM snapshot and fs based implementation
++- the number of bytes to be copied between devices is much smaller.
++- the type of filesystem must be one and only.
++- the fs must be writable, no readonly fs, even for the lower original
++ device. so the compression fs will not be usable. but if we use
++ loopback mount, we may address this issue.
++ for instance,
++ mount /cdrom/squashfs.img /sq
++ losetup /dev/loop0 /sq/ext2.img
++ losetup /dev/loop1 /somewhere/cow
++ dmsetup "snapshot /dev/loop0 /dev/loop1 ..."
++- it will be difficult (or needs more operations) to extract the
++ difference between the original device and COW.
++- DM snapshot-merge may help a lot when users try merging. in the
++ fs-layer union, users will use rsync(1).
++
++
++Several characters/aspects of aufs
++----------------------------------------------------------------------
++
++Aufs has several characters or aspects.
++1. a filesystem, callee of VFS helper
++2. sub-VFS, caller of VFS helper for branches
++3. a virtual filesystem which maintains persistent inode number
++4. reader/writer of files on branches, like an application
++
++1. Callee of VFS Helper
++As an ordinary linux filesystem, aufs is a callee of VFS. For instance,
++unlink(2) from an application reaches sys_unlink() kernel function and
++then vfs_unlink() is called. vfs_unlink() is one of the VFS helpers and it
++calls filesystem specific unlink operation. Actually aufs implements the
++unlink operation but it behaves like a redirector.
++
++2. Caller of VFS Helper for Branches
++aufs_unlink() passes the unlink request to the branch filesystem as if
++it were called from VFS. So the called unlink operation of the branch
++filesystem acts as usual. As a caller of VFS helper, aufs should handle
++every necessary pre/post operation for the branch filesystem.
++- acquire the lock for the parent dir on a branch
++- lookup in a branch
++- revalidate dentry on a branch
++- mnt_want_write() for a branch
++- vfs_unlink() for a branch
++- mnt_drop_write() for a branch
++- release the lock on a branch
++
++3. Persistent Inode Number
++One of the most important issues for a filesystem is to maintain inode
++numbers. This is particularly important to support exporting a
++filesystem via NFS. Aufs is a virtual filesystem which doesn't have a
++backend block device of its own. But some storage is necessary to
++maintain inode numbers. It may need a large space and may not be suitable
++to keep in memory. Aufs rents some space from its first writable branch
++filesystem (by default) and creates file(s) on it. These files are
++created by aufs internally and (currently) unlinked soon while kept open.
++Note: Because these files are removed, they are totally gone after
++ unmounting aufs. It means the inode numbers are not persistent
++ across unmount or reboot. I have a plan to make them really
++ persistent which will be important for aufs on NFS server.
++
++4. Read/Write Files Internally (copy-on-write)
++Because a branch can be readonly, when you write a file on it, aufs will
++"copy-up" it to the upper writable branch internally. And then write the
++originally requested thing to the file. Generally the kernel doesn't
++open/read/write files actively. In aufs, even a single write may cause an
++internal "file copy". This behaviour is very similar to the cp(1) command.
++
++Some people may think it is better to pass such work to user space
++helper, instead of doing in kernel space. Actually I am still thinking
++about it. But currently I have implemented it in kernel space.
+--- a/Documentation/filesystems/aufs/design/02struct.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/02struct.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,226 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Basic Aufs Internal Structure
++
++Superblock/Inode/Dentry/File Objects
++----------------------------------------------------------------------
++Like an ordinary filesystem, aufs has its own
++superblock/inode/dentry/file objects. All these objects have a
++dynamically allocated array and store the same kind of pointers to the
++lower filesystem, branch.
++For example, when you build a union with one readwrite branch and one
++readonly, mounted /au, /rw and /ro respectively.
++- /au = /rw + /ro
++- /ro/fileA exists but /rw/fileA does not
++
++Aufs lookup operation finds /ro/fileA and gets the dentry for it. This
++pointer is stored in an aufs dentry. The array in the aufs dentry will be,
++- [0] = NULL
++- [1] = /ro/fileA
++
++This style of array is essentially the same in all of the aufs
++superblock/inode/dentry/file objects.
++
++Because aufs supports manipulating branches, i.e. add/delete/change
++dynamically, each of these objects has its own generation. When branches
++are changed, the generation in the aufs superblock is incremented. The
++generation in any other object is compared with it when the object is
++accessed. When the generation in an object is obsolete, aufs refreshes
++its internal array.
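
The per-branch array and the generation check can be illustrated with a small Python model. This is a sketch under stated assumptions, not the kernel data structures: the classes SuperBlock and Dentry, their methods, and the use of dicts as branches are hypothetical.

```python
class SuperBlock:
    """Holds the branch list and a generation bumped on every branch change."""

    def __init__(self, branches):
        self.branches = list(branches)  # each branch: dict of name -> object
        self.generation = 0

    def add_branch(self, branch):
        self.branches.append(branch)
        self.generation += 1


class Dentry:
    """Keeps one slot per branch, refreshed lazily when the generation differs."""

    def __init__(self, sb, name):
        self.sb = sb
        self.name = name
        self._refresh()

    def _refresh(self):
        # One entry per branch; None where the name does not exist.
        self.lower = [b.get(self.name) for b in self.sb.branches]
        self.generation = self.sb.generation

    def lower_array(self):
        # Compare generations on access and rebuild the array if obsolete.
        if self.generation != self.sb.generation:
            self._refresh()
        return self.lower
```

With branches /rw (empty) and /ro (holding fileA), lower_array() gives [None, "/ro/fileA"], matching the [0] = NULL, [1] = /ro/fileA layout above; adding a branch makes the stored generation stale until the next access.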
++
++
++Superblock
++----------------------------------------------------------------------
++Additionally aufs superblock has some data for policies to select one
++among multiple writable branches, XIB files, pseudo-links and kobject.
++See below in detail.
++About the policies which support copy-down of a directory, see policy.txt
++too.
++
++
++Branch and XINO(External Inode Number Translation Table)
++----------------------------------------------------------------------
++Every branch has its own xino (external inode number translation table)
++file. The xino file is created and unlinked by aufs internally. When two
++members of a union exist on the same filesystem, they share the single
++xino file.
++The structure of a xino file is simple, just a sequence of aufs inode
++numbers which is indexed by the lower inode number.
++In the above sample, assume the inode number of /ro/fileA is i111 and
++aufs assigns the inode number i999 for fileA. Then aufs writes 999 as
++4(8) bytes at 111 * 4(8) bytes offset in the xino file.
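
The offset arithmetic in this example can be sketched in Python with an in-memory file. The helper names are hypothetical, and two details are assumptions made only for the illustration: the little-endian byte order and the reading of an unwritten slot as 0.

```python
import io
import struct


def xino_offset(lower_ino, width=4):
    """Byte offset of the slot for a lower inode number (width is 4 or 8)."""
    return lower_ino * width


def xino_write(f, lower_ino, aufs_ino, width=4):
    fmt = "<I" if width == 4 else "<Q"  # byte order is an assumption
    f.seek(xino_offset(lower_ino, width))
    f.write(struct.pack(fmt, aufs_ino))


def xino_read(f, lower_ino, width=4):
    fmt = "<I" if width == 4 else "<Q"
    f.seek(xino_offset(lower_ino, width))
    raw = f.read(width)
    if len(raw) < width:
        return 0  # slot beyond the end of the file: nothing assigned
    return struct.unpack(fmt, raw)[0]


# The example from the text: lower inode 111 maps to aufs inode 999.
table = io.BytesIO()
xino_write(table, 111, 999)
```

With 4-byte entries the value 999 lands at offset 111 * 4 = 444; with 8-byte entries it would land at 888.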
++
++When the inode numbers are not contiguous, the xino file will be sparse
++which has a hole in it and doesn't consume as much disk space as it
++might appear. If your branch filesystem consumes disk space for such
++holes, then you should specify 'xino=' option at mounting aufs.
++
++Also a writable branch has three kinds of "whiteout bases". All of these
++exist from the time the branch is joined to aufs and the names are
++whiteout-ed doubly, so that users will never see their names in aufs
++hierarchy.
++1. a regular file which will be linked to all whiteouts.
++2. a directory to store a pseudo-link.
++3. a directory to store an "orphan-ed" file temporarily.
++
++1. Whiteout Base
++ When you remove a file on a readonly branch, aufs handles it as a
++ logical deletion and creates a whiteout on the upper writable branch
++ as a hardlink of this file in order not to consume inode on the
++ writable branch.
++2. Pseudo-link Dir
++ See below, Pseudo-link.
++3. Step-Parent Dir
++ When "fileC" exists on the lower readonly branch only and it is
++ opened and removed with its parent dir, and then user writes
++ something into it, then aufs copies-up fileC to this
++ directory. Because there is no other dir to store fileC. After
++ creating a file under this dir, the file is unlinked.
++
++Because aufs supports manipulating branches, ie. add/delete/change
++dynamically, a branch has its own id. When the branch order changes, aufs
++finds the new index by searching the branch id.
++
++
++Pseudo-link
++----------------------------------------------------------------------
++Assume "fileA" exists on the lower readonly branch only and it is
++hardlinked to "fileB" on the branch. When you write something to fileA,
++aufs copies it up to the upper writable branch. Additionally aufs
++creates a hardlink under the Pseudo-link Directory of the writable
++branch. The inode of a pseudo-link is kept in aufs super_block as a
++simple list. If fileB is read after unlinking fileA, aufs returns
++filedata from the pseudo-link instead of the lower readonly
++branch. Because the pseudo-link is based upon the inode, to keep the
++inode number by xino (see above) is important.
++
++All the hardlinks under the Pseudo-link Directory of the writable branch
++should be restored in a proper location later. Aufs provides a utility
++to do this. By default the userspace helpers run it when aufs is
++remounted or unmounted.
++While this utility is running, it puts aufs into the pseudo-link
++maintenance mode. In this mode, only the process which began the
++maintenance mode (and its child processes) is allowed to operate in
++aufs. Some other processes which are not related to the pseudo-link will
++be allowed to run too, but the rest have to return an error or wait
++until the maintenance mode ends. If a process has already acquired an
++inode mutex (in VFS), it has to return an error.
++
++
++XIB (external inode number bitmap)
++----------------------------------------------------------------------
++In addition to the xino file per branch, aufs has an external inode number
++bitmap in a superblock object. It is also a file, like a xino file.
++It is a simple bitmap to mark whether the aufs inode number is in-use or
++not.
++To reduce the file I/O, aufs prepares a single memory page to cache xib.
++
++Aufs implements a feature to truncate/refresh both of xino and xib to
++reduce the number of consumed disk blocks for these files.
++
++
++Virtual or Vertical Dir, and Readdir in Userspace
++----------------------------------------------------------------------
++In order to support multiple layers (branches), aufs readdir operation
++constructs a virtual dir block in memory. For readdir, aufs calls
++vfs_readdir() internally for each dir on branches, merges their entries
++while eliminating the whiteout-ed ones, and sets the result to the file
++(dir) object. So the file object has its entry list until it is closed.
++The entry list will be updated when the file position is zero and the
++list has become old. This decision is made in aufs automatically.
++
++The dynamically allocated memory block for the name of entries has a
++unit of 512 bytes (by default) and stores the names contiguously (no
++padding). Another block for each entry is handled by kmem_cache too.
++While building dir blocks, aufs creates a hash list and judges whether
++the entry is whiteouted by its upper branch or already listed.
++The merged result is cached in the corresponding inode object and
++maintained by a customizable life-time option.
++
++Some people may say it can be a security hole or invite a DoS attack
++since the opened and once readdir-ed dir (file object) holds its entry
++list and puts pressure on system memory. But I'd say it is similar
++to files under /proc or /sys. The virtual files in them also hold a
++memory page (generally) while they are opened. When an idea to reduce
++memory for them is introduced, it will be applied to aufs too.
++For those who really hate this situation, I've developed a readdir(3)
++library which performs this merging in userspace. You just need to set
++the LD_PRELOAD environment variable, and aufs will not consume any
++memory in kernel space for readdir(3).
++
++
++Workqueue
++----------------------------------------------------------------------
++Aufs sometimes requires privileged access to a branch, for instance in
++a copy-up/down operation. Suppose a user process is going to make
++changes to a file which exists in the lower readonly branch only, and
++one of its ancestor directories may not be writable by the user
++process. Here aufs copies-up the file with its ancestors, and they may
++require privilege to set their owner/group/mode/etc.
++This is a typical case of the application character of aufs (see
++Introduction).
++
++Aufs uses workqueue synchronously for this case. It creates its own
++workqueue. The workqueue is a kernel thread and has privilege. Aufs
++passes the request to call mkdir or write (for example), and waits for
++its completion. This approach solves a problem of a signal handler
++simply.
++If aufs didn't adopt the workqueue and changed the privilege of the
++process instead, and if the mkdir/write call raised SIGXFSZ or another
++signal, then the user process might gain a privilege or the generated
++core file might be owned by a superuser.
++
++Also aufs uses the system global workqueue ("events" kernel thread) too
++for asynchronous tasks, such as handling inotify/fsnotify, re-creating a
++whiteout base, etc. This is unrelated to privilege.
++Most aufs operations try to acquire a rw_semaphore for the aufs
++superblock at the beginning, and at the same time wait for the
++completion of all queued asynchronous tasks.
++
++
++Whiteout
++----------------------------------------------------------------------
++The whiteout in aufs is very similar to Unionfs's. It is represented
++by its filename. UnionMount takes an approach of a file mode, but I am
++afraid several utilities (find(1) or something) will have to support it.
++
++Basically the whiteout represents "logical deletion" which stops aufs
++from looking up further, but it also represents "dir is opaque" which
++also stops lookup.
++
++In aufs, rmdir(2) and rename(2) for dir use whiteout alternatively.
++In order to make several functions in a single systemcall revertible,
++aufs adopts an approach to rename a directory to a temporary
++unique whiteouted name.
++For example, in rename(2) dir where the target dir already existed, aufs
++renames the target dir to a temporary unique whiteouted name before the
++actual rename on a branch and then handles other actions (make it opaque,
++update the attributes, etc). If an error happens in these actions, aufs
++simply renames the whiteouted name back and returns an error. If all
++succeed, aufs registers a function to remove the whiteouted unique
++temporary name completely and asynchronously to the system global
++workqueue.
++
++
++Copy-up
++----------------------------------------------------------------------
++It is a well-known feature or concept.
++When a user modifies a file on a readonly branch, aufs performs "copy-up"
++internally and makes the change to the new file on the upper writable
++branch. When the trigger systemcall does not update the timestamps of
++the parent dir, aufs reverts them after copy-up.
+--- a/Documentation/filesystems/aufs/design/03lookup.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/03lookup.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,106 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Lookup in a Branch
++----------------------------------------------------------------------
++Since aufs has a character of sub-VFS (see Introduction), it operates
++lookup for branches as VFS does. It may be heavy work. Generally
++speaking struct nameidata is a big structure and includes much
++information. But almost all lookup operations in aufs are the simplest
++case, i.e. lookup only an entry directly connected to its parent.
++Digging down the directory hierarchy is unnecessary.
++
++VFS has a function lookup_one_len() for that use, but it is not usable
++for a branch filesystem which requires struct nameidata. So aufs
++implements a simple lookup wrapper function. When a branch filesystem
++allows NULL as nameidata, it calls lookup_one_len(). Otherwise it builds
++the simplest nameidata and calls lookup_hash().
++Here aufs applies "a principle in NFSD", ie. if the filesystem supports
++NFS-export, then it has to support NULL as a nameidata parameter for
++->create(), ->lookup() and ->d_revalidate(). So the lookup wrapper in
++aufs tests if ->s_export_op in the branch is NULL or not.
++
++When a branch is a remote filesystem, aufs basically trusts its
++->d_revalidate(), and also forces the hardest revalidate tests for
++it.
++For d_revalidate, aufs implements three levels of revalidate tests. See
++"Revalidate Dentry and UDBA" for details.
++
++
++Loopback Mount
++----------------------------------------------------------------------
++Basically aufs supports any type of filesystem and block device for a
++branch (actually there are some exceptions). But it is prohibited to add
++a loopback mounted one whose backend file exists in a filesystem which is
++already added to aufs. The reason is to protect aufs from a recursive
++lookup. If it were allowed, the aufs lookup operation might re-enter a
++lookup for the loopback mounted branch in the same context, and would
++cause a deadlock.
++
++
++Revalidate Dentry and UDBA (User's Direct Branch Access)
++----------------------------------------------------------------------
++Generally VFS helpers re-validate a dentry as a part of lookup.
++0. digging down the directory hierarchy.
++1. lock the parent dir by its i_mutex.
++2. lookup the final (child) entry.
++3. revalidate it.
++4. call the actual operation (create, unlink, etc.)
++5. unlock the parent dir
++
++If the filesystem implements its ->d_revalidate() (step 3), then it is
++called. Actually aufs implements it and checks that the dentry on a
++branch is still valid.
++But it is not enough. Because aufs has to release the lock for the
++parent dir on a branch at the end of ->lookup() (step 2) and
++->d_revalidate() (step 3) while the i_mutex of the aufs dir is still
++held by VFS.
++If the file on a branch is changed directly, e.g. bypassing aufs, after
++aufs released the lock, then the subsequent operation may cause
++some unpleasant result.
++
++This situation is a result of the VFS architecture, where ->lookup() and
++->d_revalidate() are separated. But I never say it is wrong. It is a good
++design from VFS's point of view. It is just not suitable for sub-VFS
++character in aufs.
++
++Aufs supports such a case by three levels of revalidation which are
++selectable by the user.
++1. Simple Revalidate
++ In addition to the native flow in VFS, confirm the child-parent
++ relationship on the branch just after locking the parent dir on the
++ branch in the "actual operation" (step 4). When this validation
++ fails, aufs returns EBUSY. ->d_revalidate() (step 3) in aufs still
++ checks the validation of the dentry on branches.
++2. Monitor Changes Internally by Inotify/Fsnotify
++ In addition to the above, in the "actual operation" (step 4) aufs
++ re-looks up the dentry on the branch, and returns EBUSY if it finds a
++ different dentry.
++ Additionally, aufs sets the inotify/fsnotify watch for every dir on branches
++ while it is in cache. When the event is notified, aufs registers a
++ function to kernel 'events' thread by schedule_work(). And the
++ function sets some special status to the cached aufs dentry and inode
++ private data. If they are not cached, then aufs has nothing to
++ do. When the same file is accessed through aufs (step 0-3) later,
++ aufs will detect the status and refresh all necessary data.
++ In this mode, aufs has to ignore the event which is fired by aufs
++ itself.
++3. No Extra Validation
++ This is the simplest test and doesn't add any additional revalidation
++ test, and skips the revalidation in step 4. It is useful and improves
++ aufs performance when the system surely hides the aufs branches from
++ the user, by over-mounting something (or another method).
+--- a/Documentation/filesystems/aufs/design/04branch.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/04branch.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,76 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Branch Manipulation
++
++Since aufs supports dynamic branch manipulation, i.e. adding/removing a
++branch and changing its permission/attribute, there is a lot of work to do.
++
++
++Add a Branch
++----------------------------------------------------------------------
++o Confirm the adding dir exists outside of aufs, including loopback
++ mount.
++- and other various attributes...
++o Initialize the xino file and whiteout bases if necessary.
++ See struct.txt.
++
++o Check the owner/group/mode of the directory
++ When the owner/group/mode of the adding directory differs from the
++ existing branch, aufs issues a warning because it may pose a
++ security risk.
++ For example, when an upper writable branch has a world writable empty
++ top directory, a malicious user can create any files on the writable
++ branch directly, like copy-up and modify manually. If something like
++ /etc/{passwd,shadow} exists on the lower readonly branch but not on the
++ upper writable branch, and the writable branch is world-writable, then
++ a malicious user may create /etc/passwd on the writable branch directly
++ and the infected file will be valid in aufs.
++ I am afraid it can be a security issue, but there is nothing to do
++ except produce a warning.
++
++
++Delete a Branch
++----------------------------------------------------------------------
++o Confirm the deleting branch is not busy
++ Generally speaking, there is one merit in adopting the "remount"
++ interface to manipulate branches: it discards caches. When deleting a
++ branch, aufs checks the still cached (and connected) dentries and
++ inodes. If there are any, then they are all in-use. An inode without its
++ corresponding dentry can be alive alone (for example, inotify/fsnotify case).
++
++ For the cached one, aufs checks whether the same named entry exists on
++ other branches.
++ If the cached one is a directory, because aufs provides a merged view
++ to users, as long as one dir is left on any branch aufs can show the
++ dir to users. In this case, the branch can be removed from aufs.
++ Otherwise aufs rejects deleting the branch.
++
++ If any file on the deleting branch is opened by aufs, then aufs
++ rejects deleting.
++
++
++Modify the Permission of a Branch
++----------------------------------------------------------------------
++o Re-initialize or remove the xino file and whiteout bases if necessary.
++ See struct.txt.
++
++o rw --> ro: Confirm the modifying branch is not busy
++ Aufs rejects the request if any of these conditions are true.
++ - a file on the branch is mmap-ed.
++ - a regular file on the branch is opened for write and there is no
++ same named entry on the upper branch.
+--- a/Documentation/filesystems/aufs/design/05wbr_policy.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/05wbr_policy.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,65 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Policies to Select One among Multiple Writable Branches
++----------------------------------------------------------------------
++When the number of writable branches is more than one, aufs has to decide
++the target branch for file creation or copy-up. By default, the highest
++writable branch which has the parent (or ancestor) dir of the target
++file is chosen (top-down-parent policy).
++At the user's request, aufs implements some other policies to select
++the writable branch: for file creation, the round-robin and
++most-free-space policies; for copy-up, the top-down-parent,
++bottom-up-parent and bottom-up policies.
++
++As expected, the round-robin policy selects the branch in circular
++order. When you have two writable branches and create 10 new files, 5
++files will be created on each branch. mkdir(2) systemcall is an
++exception. When you create 10 new directories, all will be created on
++the same branch. And the most-free-space policy selects the one which
++has the most free space among the writable branches. The amount of free
++space is checked by aufs internally, and users can specify the time interval.
++
++The policies for copy-up are simpler:
++top-down-parent is equivalent to the same-named one in the create policy,
++bottom-up-parent selects the writable branch where the parent dir
++exists and the nearest upper one from the copyup-source,
++bottom-up selects the nearest upper writable branch from the
++copyup-source, regardless of the existence of the parent dir.
++
++There are some rules or exceptions to apply these policies.
++- If there is a readonly branch above the policy-selected branch and
++ the parent dir is marked as opaque (a variation of whiteout), or the
++ target (creating) file is whiteout-ed on the upper readonly branch,
++ then the result of the policy is ignored and the target file will be
++ created on the nearest writable branch above the readonly branch.
++- If there is a writable branch above the policy-selected branch and
++ the parent dir is marked as opaque or the target file is whiteouted
++ on the branch, then the result of the policy is ignored and the target
++ file will be created on the highest one among the upper writable
++ branches which has diropq or whiteout. In case of whiteout, aufs removes
++ it as usual.
++- link(2) and rename(2) systemcalls are exceptions in every policy.
++ They try selecting the branch where the source exists if possible,
++ since copying-up a large file will take a long time. If it can't be,
++ ie. the branch where the source exists is readonly, then they will
++ follow the copyup policy.
++- There is an exception for rename(2) when the target exists.
++ If the rename target exists, aufs compares the index of the branches
++ where the source and the target exist and selects the higher
++ one. If the selected branch is readonly, then aufs follows the
++ copyup policy.
+--- a/Documentation/filesystems/aufs/design/06mmap.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/06mmap.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,47 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++mmap(2) -- File Memory Mapping
++----------------------------------------------------------------------
++In aufs, the file-mapped pages are handled by a branch fs directly, no
++interaction with aufs. It means aufs_mmap() calls the branch fs's
++->mmap().
++This approach is simple and good, but there is one problem.
++Under /proc, several entries show the mmap-ped files by their path (with
++device and inode number), and the printed path will be the path on the
++branch fs instead of the virtual one on aufs.
++This is not a problem in most cases, but some utilities such as lsof(1)
++(and their users) may expect the path on aufs.
++
++To address this issue, aufs adds a new member called vm_prfile in struct
++vm_area_struct (and struct vm_region). The original vm_file points to
++the file on the branch fs in order to handle everything correctly as
++usual. The new vm_prfile points to a virtual file in aufs, and the
++show-functions in procfs refer to vm_prfile if it is set.
++Also we need to maintain several other places which touch vm_file,
++such as
++- fork()/clone() copies vma and the reference count of vm_file is
++ incremented.
++- merging vma maintains the ref count too.
++
++This is not a good approach. It just fakes the printed path. But it
++leaves all behaviour around f_mapping unchanged. This is surely an
++advantage.
++Actually aufs had adopted another complicated approach which calls
++generic_file_mmap() and handles struct vm_operations_struct. In this
++approach, aufs met a hard problem and I could not solve it without
++switching the approach.
+--- a/Documentation/filesystems/aufs/design/07export.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/07export.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,59 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Export Aufs via NFS
++----------------------------------------------------------------------
++Here is an approach.
++- like xino/xib, add a new file 'xigen' which stores aufs inode
++ generation.
++- iget_locked(): initialize aufs inode generation for a new inode, and
++ store it in xigen file.
++- destroy_inode(): increment aufs inode generation and store it in xigen
++ file. it is necessary even if it is not unlinked, because any data of
++ inode may be changed by UDBA.
++- encode_fh(): for a root dir, simply return FILEID_ROOT. otherwise
++ build file handle by
++ + branch id (4 bytes)
++ + superblock generation (4 bytes)
++ + inode number (4 or 8 bytes)
++ + parent dir inode number (4 or 8 bytes)
++ + inode generation (4 bytes)
++ + return value of exportfs_encode_fh() for the parent on a branch (4
++ bytes)
++ + file handle for a branch (by exportfs_encode_fh())
++- fh_to_dentry():
++ + find the index of a branch from its id in handle, and check that
++ it still exists in aufs.
++ + 1st level: get the inode number from handle and search it in cache.
++ + 2nd level: if not found, get the parent inode number from handle and
++ search it in cache. and then open the parent dir, find the matching
++ inode number by vfs_readdir() and get its name, and call
++ lookup_one_len() for the target dentry.
++ + 3rd level: if the parent dir is not cached, call
++ exportfs_decode_fh() for a branch and get the parent on a branch,
++ build a pathname of it, convert it to a pathname in aufs, call
++ path_lookup(). now aufs gets a parent dir dentry, then handle it as
++ the 2nd level.
++ + to open the dir, aufs needs struct vfsmount. aufs keeps vfsmount
++ for every branch, but not itself. to get this, (currently) aufs
++ searches in current->nsproxy->mnt_ns list. it may not be a good
++ idea, but I could not find another approach.
++ + test the generation of the gotten inode.
++- every inode operation: they may get EBUSY due to UDBA. in this case,
++ convert it into ESTALE for NFSD.
++- readdir(): call lockdep_on/off() because filldir in NFSD calls
++ lookup_one_len(), vfs_getattr(), encode_fh() and others.
+--- a/Documentation/filesystems/aufs/design/08shwh.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/08shwh.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,53 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Show Whiteout Mode (shwh)
++----------------------------------------------------------------------
++Generally aufs hides the name of whiteouts. But in some cases, to show
++them is very useful for users. For instance, creating a new middle layer
++(branch) by merging existing layers.
++
++(borrowing aufs1 HOW-TO from a user, Michael Towers)
++When you have three branches,
++- Bottom: 'system', squashfs (underlying base system), read-only
++- Middle: 'mods', squashfs, read-only
++- Top: 'overlay', ram (tmpfs), read-write
++
++The top layer is loaded at boot time and saved at shutdown, to preserve
++the changes made to the system during the session.
++When larger changes have been made, or smaller changes have accumulated,
++the size of the saved top layer data grows. At this point, it would be
++nice to be able to merge the two overlay branches ('mods' and 'overlay')
++and rewrite the 'mods' squashfs, clearing the top layer and thus
++restoring save and load speed.
++
++This merging is simplified by the use of another aufs mount, of just the
++two overlay branches using the 'shwh' option.
++# mount -t aufs -o ro,shwh,br:/livesys/overlay=ro+wh:/livesys/mods=rr+wh \
++ aufs /livesys/merge_union
++
++A merged view of these two branches is then available at
++/livesys/merge_union, and the new feature is that the whiteouts are
++visible!
++Note that in 'shwh' mode the aufs mount must be 'ro', which will disable
++writing to all branches. Also the default mode for all branches is 'ro'.
++It is now possible to save the combined contents of the two overlay
++branches to a new squashfs, e.g.:
++# mksquashfs /livesys/merge_union /path/to/newmods.squash
++
++This new squashfs archive can be stored on the boot device and the
++initramfs will use it to replace the old one at the next boot.
+--- a/Documentation/filesystems/aufs/design/10dynop.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/10dynop.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,47 @@
++
++# Copyright (C) 2010-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Dynamically customizable FS operations
++----------------------------------------------------------------------
++Generally FS operations (struct inode_operations, struct
++address_space_operations, struct file_operations, etc.) are defined as
++"static const", but it never means that an FS has only one set of
++operations. Some FS have multiple sets of them. For instance, ext2 has
++three sets, one for XIP, for NOBH, and for normal.
++Since aufs overrides and redirects these operations, sometimes aufs has
++to change its behaviour according to the branch FS type. More importantly
++VFS acts differently if a function (member in the struct) is set or
++not. It means aufs should have several sets of operations and select one
++among them according to the branch FS definition.
++
++In order to solve this problem and not to affect the behaviour of VFS,
++aufs defines these operations dynamically. For instance, aufs defines
++aio_read function for struct file_operations, but it may not be set to
++the file_operations. When the branch FS doesn't have it, aufs doesn't
++set it to its file_operations while the function definition itself is
++still alive. So the behaviour of io_submit(2) will not change, and it
++will return an error when aio_read is not defined.
++
++The lifetime of these dynamically generated operation object is
++maintained by aufs branch object. When the branch is removed from aufs,
++the reference counter of the object is decremented. When it reaches
++zero, the dynamically generated operation object will be freed.
++
++This approach is designed to support AIO (io_submit), Direct I/O and
++XIP mainly.
++Currently this approach is applied to file_operations and
++vm_operations_struct for regular files only.
+--- a/Documentation/filesystems/aufs/design/99plan.txt 1970-01-01 01:00:00.000000000 +0100
++++ b/Documentation/filesystems/aufs/design/99plan.txt 2013-03-10 01:48:58.459093058 +0000
+@@ -0,0 +1,96 @@
++
++# Copyright (C) 2005-2013 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
++
++Plan
++
++Restoring some features which were implemented in aufs1.
++They were dropped in aufs2 in order to make source files simpler and
++easier to be reviewed.
++
++
++Test Only the Highest One for the Directory Permission (dirperm1 option)
++----------------------------------------------------------------------
++Let's try a case study.
++- aufs has two branches, upper readwrite and lower readonly.
++ /au = /rw + /ro
++- "dirA" exists under /ro, but not under /rw, and its mode is 0700.
++- user invoked "chmod a+rx /au/dirA"
++- then "dirA" becomes world readable?
++
++In this case, /ro/dirA is still 0700 since it exists on a readonly
++branch, or it may be on a natively readonly filesystem. If aufs
++respects the lower branch, it should not respond to readdir requests
++from other users. But the user allowed it by chmod. Should aufs really
++refuse to show the entries under /ro/dirA?
++
++To be honest, I don't have a best solution for this case. So I
++implemented the 'dirperm1' and 'nodirperm1' options in aufs1, and left
++the choice to users.
++When dirperm1 is specified, aufs checks only the highest one for the
++directory permission, and shows the entries. Otherwise, as usual, it
++checks every dir existing on all branches and rejects the request.
++
++As a side effect, the dirperm1 option improves the performance of aufs
++because the number of permission checks is reduced.
++
++
++Being Another Aufs's Readonly Branch (robr)
++----------------------------------------------------------------------
++Aufs1 allows aufs to be another aufs's readonly branch.
++This feature was developed at a user's request, but it may not be used
++currently.
++
++
++Copy-up on Open (coo=)
++----------------------------------------------------------------------
++By default the internal copy-up is executed only when it is really
++necessary. It is not done when a file is opened for writing, but when
++write(2) is done. Users who have many (over 100) branches want to know
++and analyse when and which files are copied up. Inserting a new upper
++branch containing only such files may improve the performance of aufs.
++
++Aufs1 implemented the "coo=none | leaf | all" option.
++
++
++Refresh the Opened File (refrof)
++----------------------------------------------------------------------
++This option is implemented in aufs1 but incomplete.
++
++When a user reads from a file, he generally expects to get its latest
++filedata. If the file is removed and a new file with the same name is
++created, the content he gets is unchanged, i.e. the unlinked filedata.
++
++Let's try a case study again.
++- aufs has two branches.
++ /au = /rw + /ro
++- "fileA" exists under /ro, but not under /rw.
++- user opened "/au/fileA".
++- he or someone else inserts a branch (/new) between /rw and /ro.
++ /au = /rw + /new + /ro
++- the new branch has "fileA".
++- user reads from the opened "fileA"
++- which filedata should aufs return, from /ro or /new?
++
++Some people say it has to be "from /ro" since that is Unix semantics.
++Others say it should be "from /new" because the file is not removed
++and it is equivalent to the case where someone else modifies the file.
++
++Here again I don't have a best and final answer. I got an idea to
++implement the 'refrof' and 'norefrof' options. When 'refrof' (REFResh
++the Opened File) is specified (by default), aufs returns the filedata
++from /new.
++Otherwise from /ro.
--- a/fs/aufs/Kconfig 1970-01-01 01:00:00.000000000 +0100
+++ b/fs/aufs/Kconfig 2012-01-10 02:15:56.000000000 +0000
@@ -0,0 +1,203 @@
Modified: dists/sid/linux/debian/patches/features/all/aufs3/gen-patch
==============================================================================
--- dists/sid/linux/debian/patches/features/all/aufs3/gen-patch Thu May 9 00:37:34 2013 (r20061)
+++ dists/sid/linux/debian/patches/features/all/aufs3/gen-patch Thu May 9 00:41:03 2013 (r20062)
@@ -9,7 +9,8 @@
{
cd "$aufs_dir" && \
- { find fs -type f; ls include/{uapi/,}linux/aufs_type.h; } | \
+ { find Documentation fs -type f; \
+ ls include/{uapi/,}linux/aufs_type.h; } | \
LC_ALL=C sort | \
while read file; do
diff -uN a/"$file" "$file" | filterdiff --addnewprefix=b/