martin f. krafft: update md.txt from linus' HEAD

Sat Sep 10 10:58:06 UTC 2011

Module: mdadm
Branch: contrib/docs/md.txt
Commit: 93842bf7b4f1beb5aac95bbf683944a96ced6ce2
URL:    http://git.debian.org/?p=pkg-mdadm/mdadm.git;a=commit;h=93842bf7b4f1beb5aac95bbf683944a96ced6ce2

Author: martin f. krafft <madduck at debian.org>
Date:   Mon Aug  1 11:39:14 2011 +0200

update md.txt from linus' HEAD

Signed-off-by: martin f. krafft <madduck at debian.org>

---

 docs/md.txt |  113 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 101 insertions(+), 12 deletions(-)

diff --git a/docs/md.txt b/docs/md.txt
index 727238d..fc94770 100644
--- a/docs/md.txt
+++ b/docs/md.txt
@@ -1,7 +1,5 @@
-# From: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=Documentation/md.txt;hb=v2.6.31
-
 Tools that manage md devices can be found at
-   http://www.<country>.kernel.org/pub/linux/utils/raid/....
+   http://www.kernel.org/pub/linux/utils/raid/ 
 
 
 Boot time assembly of RAID arrays
@@ -138,7 +136,7 @@ raid_disks != 0.
 
 Then uninitialized devices can be added with ADD_NEW_DISK.  The
 structure passed to ADD_NEW_DISK must specify the state of the device
-and it's role in the array.
+and its role in the array.
 
 Once started with RUN_ARRAY, uninitialized spares can be added with
 HOT_ADD_DISK.
@@ -235,9 +233,9 @@ All md devices contain:
 
   resync_start
      The point at which resync should start.  If no resync is needed,
-     this will be a very large number.  At array creation it will
-     default to 0, though starting the array as 'clean' will
-     set it much larger.
+     this will be a very large number (or 'none' since 2.6.30-rc1).  At
+     array creation it will default to 0, though starting the array as
+     'clean' will set it much larger.
 
    new_dev
      This file can be written but not read.  The value written should
@@ -298,6 +296,51 @@ All md devices contain:
      active-idle
          like active, but no writes have been seen for a while (safe_mode_delay).
 
+  bitmap/location
+     This indicates where the write-intent bitmap for the array is
+     stored.
+     It can be one of "none", "file" or "[+-]N".
+     "file" may later be extended to "file:/file/name"
+     "[+-]N" means that many sectors from the start of the metadata.
+       This is replicated on all devices.  For arrays with externally
+       managed metadata, the offset is from the beginning of the
+       device.
+  bitmap/chunksize
+     The size, in bytes, of the chunk which will be represented by a
+     single bit.  For RAID456, it is a portion of an individual
+     device. For RAID10, it is a portion of the array.  For RAID1, it
+     is both (they come to the same thing).
+  bitmap/time_base
+     The time, in seconds, between looking for bits in the bitmap to
+     be cleared. In the current implementation, a bit will be cleared
+     between 2 and 3 times "time_base" after all the covered blocks
+     are known to be in-sync.
+  bitmap/backlog
+     When write-mostly devices are active in a RAID1, write requests
+     to those devices proceed in the background - the filesystem (or
+     other user of the device) does not have to wait for them.
+     'backlog' sets a limit on the number of concurrent background
+     writes.  If there are more than this, new writes will by
+     synchronous.
+  bitmap/metadata
+     This can be either 'internal' or 'external'.
+     'internal' is the default and means the metadata for the bitmap
+     is stored in the first 256 bytes of the allocated space and is
+     managed by the md module.
+     'external' means that bitmap metadata is managed externally to
+     the kernel (i.e. by some userspace program)
+  bitmap/can_clear
+     This is either 'true' or 'false'.  If 'true', then bits in the
+     bitmap will be cleared when the corresponding blocks are thought
+     to be in-sync.  If 'false', bits will never be cleared.
+     This is automatically set to 'false' if a write happens on a
+     degraded array, or if the array becomes degraded during a write.
+     When metadata is managed externally, it should be set to true
+     once the array becomes non-degraded, and this fact has been
+     recorded in the metadata.
+     
+     
+     
 
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
@@ -317,18 +360,20 @@ Each directory contains:
         A file recording the current state of the device in the array
 	which can be a comma separated list of
 	      faulty   - device has been kicked from active use due to
-                         a detected fault
+                         a detected fault or it has unacknowledged bad
+                         blocks
 	      in_sync  - device is a fully in-sync member of the array
 	      writemostly - device will only be subject to read
 		         requests if there are no other options.
 			 This applies only to raid1 arrays.
-	      blocked  - device has failed, metadata is "external",
-	                 and the failure hasn't been acknowledged yet.
+	      blocked  - device has failed, and the failure hasn't been
+			 acknowledged yet by the metadata handler.
 			 Writes that would write to this device if
 			 it were not faulty are blocked.
 	      spare    - device is working, but not a full member.
 			 This includes spares that are in the process
 			 of being recovered to
+	      write_error - device has ever seen a write error.
 	This list may grow in future.
 	This can be written to.
 	Writing "faulty"  simulates a failure on the device.
@@ -336,8 +381,11 @@ Each directory contains:
 	Writing "writemostly" sets the writemostly flag.
 	Writing "-writemostly" clears the writemostly flag.
 	Writing "blocked" sets the "blocked" flag.
-	Writing "-blocked" clear the "blocked" flag and allows writes
-		to complete.
+	Writing "-blocked" clears the "blocked" flags and allows writes
+		to complete and possibly simulates an error.
+	Writing "in_sync" sets the in_sync flag.
+	Writing "write_error" sets writeerrorseen flag.
+	Writing "-write_error" clears writeerrorseen flag.
 
 	This file responds to select/poll. Any change to 'faulty'
 	or 'blocked' causes an event.
@@ -374,6 +422,37 @@ Each directory contains:
         array.  If a value less than the current component_size is
         written, it will be rejected.
 
+      recovery_start
+        When the device is not 'in_sync', this records the number of
+	sectors from the start of the device which are known to be
+	correct.  This is normally zero, but during a recovery
+	operation is will steadily increase, and if the recovery is
+	interrupted, restoring this value can cause recovery to
+	avoid repeating the earlier blocks.  With v1.x metadata, this
+	value is saved and restored automatically.
+
+	This can be set whenever the device is not an active member of
+	the array, either before the array is activated, or before
+	the 'slot' is set.
+
+	Setting this to 'none' is equivalent to setting 'in_sync'.
+	Setting to any other value also clears the 'in_sync' flag.
+	
+      bad_blocks
+	This gives the list of all known bad blocks in the form of
+	start address and length (in sectors respectively). If output
+	is too big to fit in a page, it will be truncated. Writing
+	"sector length" to this file adds new acknowledged (i.e.
+	recorded to disk safely) bad blocks.
+
+      unacknowledged_bad_blocks
+	This gives the list of known-but-not-yet-saved-to-disk bad
+	blocks in the same form of 'bad_blocks'. If output is too big
+	to fit in a page, it will be truncated. Writing to this file
+	adds bad blocks without acknowledging them. This is largely
+	for testing.
+
+
 
 An active md device will also contain and entry for each active device
 in the array.  These are named
@@ -490,6 +569,16 @@ also have
      within the array where IO will be blocked.  This is currently
      only supported for raid4/5/6.
 
+   sync_min
+   sync_max
+     The two values, given as numbers of sectors, indicate a range
+     within the array where 'check'/'repair' will operate. Must be
+     a multiple of chunk_size. When it reaches "sync_max" it will
+     pause, rather than complete.
+     You can use 'select' or 'poll' on "sync_completed" to wait for
+     that number to reach sync_max.  Then you can either increase
+     "sync_max", or can write 'idle' to "sync_action".
+
 
 Each active md device may also have attributes specific to the
 personality module that manages it.