[h5py] 159/455: Docs update

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Thu Jul 2 18:19:28 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a commit to annotated tag 1.3.0
in repository h5py.

commit 479f22c33a102d5e99703c7b2499c13b07ad4241
Author: andrewcollette <andrew.collette at gmail.com>
Date:   Sun Nov 23 08:32:16 2008 +0000

    Docs update
---
 docs/source/guide/hl.rst     | 88 ++++++++++++++++++++++++++++++++++----------
 h5py/highlevel.py            | 18 ++++-----
 h5py/tests/test_highlevel.py |  1 +
 3 files changed, 78 insertions(+), 29 deletions(-)

diff --git a/docs/source/guide/hl.rst b/docs/source/guide/hl.rst
index f5e0654..28f554b 100644
--- a/docs/source/guide/hl.rst
+++ b/docs/source/guide/hl.rst
@@ -401,7 +401,7 @@ directly is not recommended.
 
 A subset of the NumPy indexing techniques is supported, including the
 traditional extended-slice syntax, named-field access, and boolean arrays.
-Discrete coordinate selection are also supported via an special indexer class.
+Discrete coordinate selection is also supported via a special indexer class.
 
 Properties
 ----------
@@ -413,6 +413,7 @@ Like Numpy arrays, Dataset objects have attributes named "shape" and "dtype":
     >>> dset.shape
     (4L, 5L)
 
+
 .. _slicing_access:
 
 Slicing access
@@ -447,26 +448,54 @@ numeric slices:
     >>> dset[0,:,4:5, "FieldA", "FieldB"]
     >>> dset[0, ..., "FieldC"]
 
-Advanced indexing
------------------
+Coordinate lists
+----------------
+
+For any axis, you can provide an explicit list of points you want; for a
+dataset with shape (10, 10)::
+
+    >>> dset.shape
+    (10, 10)
+    >>> result = dset[0, [1,3,8]]
+    >>> result.shape
+    (3,)
+    >>> result = dset[1:6, [5,8,9]]
+    >>> result.shape
+    (5, 3)
+
+The following restrictions exist:
 
-Boolean "mask" arrays can also be used to specify a selection.  The result of
+* List selections may not be empty
+* Selection coordinates must be given in increasing order
+* Duplicate selections are ignored
+
+Sparse selection
+----------------
+
+Two mechanisms exist for scattered and/or sparse selections, for which
+slab or row-based techniques may not be appropriate.
+
+Boolean "mask" arrays can be used to specify a selection.  The result of
 this operation is a 1-D array with elements arranged in the standard NumPy
 (C-style) order:
 
-    >>> arr = numpy.random.random((10,10))
+    >>> arr = numpy.arange(100).reshape((10,10))
     >>> dset = f.create_dataset("MyDataset", data=arr)
-    >>> result = dset[arr > 0.5]
+    >>> result = dset[arr > 50]
+    >>> result.shape
+    (49,)
 
 If you have a set of discrete points you want to access, you may not want to go
 through the overhead of creating a boolean mask.  This is especially the case
 for large datasets, where even a byte-valued mask may not fit in memory.  You
-can pass a list of points to the dataset selector via a custom "CoordsList"
-instance:
+can pass a sequence object containing points to the dataset selector via a
+custom "CoordsList" instance:
 
     >>> mycoords = [ (0,0), (3,4), (7,8), (3,5), (4,5) ]
     >>> coords_list = CoordsList(mycoords)
     >>> result = dset[coords_list]
+    >>> result.shape
+    (5,)
 
 Like boolean-array indexing, the result is a 1-D array.  The order in which
 points are selected is preserved.
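
The two sparse-selection mechanisms above, side by side in a minimal
sketch.  It assumes CoordsList is importable from the top-level h5py
namespace (the exact import location is an assumption), and the file
name is a placeholder:

    >>> import numpy, h5py
    >>> from h5py import CoordsList              # assumed import location
    >>> arr = numpy.arange(100).reshape((10, 10))
    >>> f = h5py.File("sparse.hdf5", "w")        # placeholder file name
    >>> dset = f.create_dataset("MyDataset", data=arr)
    >>> dset[arr > 90].shape                     # boolean mask: 9 matches
    (9,)
    >>> list(dset[CoordsList([(0, 0), (3, 4), (7, 8)])])  # order preserved
    [0, 34, 78]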
@@ -483,15 +512,26 @@ points are selected is preserved.
 Special features
 ----------------
 
-Unlike memory-resident NumPy arrays, HDF5 dataset support a number of optional
+Unlike memory-resident NumPy arrays, HDF5 datasets support a number of optional
 features.  These are enabled by the keywords provided to
 :meth:`Group.create_dataset`.  Some of the more useful are:
 
+Compression
+    Transparent GZIP compression (keyword *compression*) can substantially
+    reduce the storage space needed for the dataset.  Supply an integer
+    between 0 and 9.  Using the *shuffle* filter along with this option can
+    improve the compression ratio further.
+
 Resizing
-    You can specify a maximum size for the dataset when you create it, by
-    providing a "maxshape" tuple.  Elements with the value ``None`` indicate
-    unlimited dimensions.  Later calls to :meth:`Dataset.resize` will
-    modify the shape in-place::
+    Datasets can be resized, up to a maximum value provided at creation time.
+    You can specify this maximum size via the *maxshape* argument to
+    :meth:`create_dataset <Group.create_dataset>` or
+    :meth:`require_dataset <Group.require_dataset>`. Shape elements with the
+    value ``None`` indicate unlimited dimensions.
+
+    Later calls to :meth:`Dataset.resize` will modify the shape in-place::
 
        >>> dset = grp.create_dataset("MyDataset", (10,10), '=f8', maxshape=(None, None))
         >>> dset.shape
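
A quick sketch of the compression keywords described above, continuing
with the file object from the sketch earlier; the dataset name is a
placeholder, and level 6 with *shuffle* is just one reasonable choice:

    >>> data = numpy.zeros((1000, 1000))
    >>> zdset = f.create_dataset("Zeros", data=data,
    ...                          compression=6, shuffle=True)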
@@ -500,11 +540,18 @@ Resizing
         >>> dset.shape
         (20, 20)
 
-Compression
-    Transparent GZIP compression can substantially reduce the storage space
-    needed for the dataset.  Supply an integer between 0 and 9.  Using the
-    *shuffle* filter along with this option can improve the compression ratio
-    further.
+    You can also resize a single axis at a time::
+
+        >>> dset.resize(35, axis=1)
+        >>> dset.shape
+        (20, 35)
+
+    .. note::
+        Only datasets stored in "chunked" format can be resized.  This format
+        is automatically selected when any of the advanced storage options is
+        used, or a *maxshape* tuple is provided.  You can also force it to be
+        used by specifying ``chunks=True`` at creation time.
+
 
 Value attribute and scalar datasets
 -----------------------------------
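
And the resizing keywords end to end, continuing with the same file
object; ``chunks=True`` forces the chunked format as the note explains,
and ``None`` entries in *maxshape* leave those axes unlimited:

    >>> gdset = f.create_dataset("Growable", (10, 10), '=f8',
    ...                          chunks=True, maxshape=(None, None))
    >>> gdset.resize((20, 20))
    >>> gdset.resize(35, axis=1)
    >>> gdset.shape
    (20, 35)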
@@ -532,8 +579,9 @@ axis.  Since Python's ``len`` is limited by the size of a C long, it's
 recommended you use the syntax ``dataset.len()`` instead of ``len(dataset)``
 on 32-bit platforms, if you expect the length of the first axis to exceed 2**32.
 
-Iterating over a dataset iterates over the first axis.  As with NumPy arrays,
-mutating the yielded data has no effect.
+Iterating over a dataset iterates over the first axis.  However, modifications
+to the yielded data are not recorded in the file.  Resizing a dataset while
+iterating has undefined results.
 
 Reference
 ---------
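
The iteration rules in the last hunk, sketched with the same file object;
each step yields one row along the first axis, and ``dataset.len()``
sidesteps the C-long limit of ``len()``:

    >>> rows = f.create_dataset("Rows", data=numpy.arange(6).reshape((3, 2)))
    >>> rows.len()
    3
    >>> for row in rows:
    ...     print row
    [0 1]
    [2 3]
    [4 5]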
diff --git a/h5py/highlevel.py b/h5py/highlevel.py
index 9de07e7..e6c957b 100644
--- a/h5py/highlevel.py
+++ b/h5py/highlevel.py
@@ -711,22 +711,22 @@ class Dataset(HLObject):
                     dtype = numpy.dtype(dtype)
 
                 # Generate chunks if necessary
-                if chunks is True or \
-                   (any((compression, shuffle, fletcher32, maxshape)) and chunks is None):
-                    chunks = guess_chunk(shape, dtype.itemsize)
-                elif chunks is not None:
-                    chunks = tuple(chunks)
+                if any((compression, shuffle, fletcher32, maxshape)) or chunks is True:
+                    if chunks is False:
+                        raise ValueError("Chunked format required for given storage options")
+                    if chunks in (True, None):
+                        chunks = guess_chunk(shape, dtype.itemsize)
 
-                if chunks is not None and shape == ():
+                if chunks and shape == ():
                     raise ValueError("Filter options cannot be used with scalar datasets.")
 
                 plist = h5p.create(h5p.DATASET_CREATE)
-                if chunks is not None:
-                    plist.set_chunk(chunks)
+                if chunks:
+                    plist.set_chunk(tuple(chunks))
                     plist.set_fill_time(h5d.FILL_TIME_ALLOC)
                 if shuffle:
                     plist.set_shuffle()
-                if compression is not None:
+                if compression:
                     if compression is True:
                         compression = 6
                     plist.set_deflate(compression)
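
The new branch is easier to read in isolation.  Below is a standalone
sketch that mirrors its decision table; ``resolve_chunks`` and the trivial
``guess_chunk`` stand-in are hypothetical names for illustration, not part
of h5py:

    def guess_chunk(shape, itemsize):
        # Stand-in for h5py's internal heuristic: cap each axis at 64.
        return tuple(min(x, 64) for x in shape)

    def resolve_chunks(shape, itemsize, chunks=None, compression=None,
                       shuffle=False, fletcher32=False, maxshape=None):
        # Any storage option (or chunks=True) forces the chunked layout;
        # an explicit chunks=False is then a contradiction and raises.
        if any((compression, shuffle, fletcher32, maxshape)) or chunks is True:
            if chunks is False:
                raise ValueError("Chunked format required for given storage options")
            if chunks in (True, None):
                chunks = guess_chunk(shape, itemsize)
        if chunks and shape == ():
            raise ValueError("Filter options cannot be used with scalar datasets.")
        return tuple(chunks) if chunks else None

    # No filters and no chunks requested: the layout stays contiguous.
    assert resolve_chunks((100, 100), 8) is None
    # An unlimited maxshape forces chunking via the guessing heuristic.
    assert resolve_chunks((100, 100), 8, maxshape=(None, None)) == (64, 64)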
diff --git a/h5py/tests/test_highlevel.py b/h5py/tests/test_highlevel.py
index f715eb1..ef1608a 100644
--- a/h5py/tests/test_highlevel.py
+++ b/h5py/tests/test_highlevel.py
@@ -448,6 +448,7 @@ class TestDataset(HDF5TestCase):
         slices += [ s[0], s[1], s[9], s[0,0], s[4,5], s[:] ]
         slices += [ s[3,...], s[3,2,...] ]
         slices += [ numpy.random.random((10,10,50)) > 0.5 ]  # Truth array
+        slices += [ numpy.zeros((10,10,50), dtype='bool') ]
         slices += [ s[0, 1, [2,3,6,7]], s[:,[1,2]], s[[1,2]], s[3:7,[1]]]
 
         for slc in slices:
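
The added case is the all-``False`` mask.  A sketch of what it implies,
assuming an empty mask mirrors NumPy and yields a zero-length 1-D array
(names are placeholders):

    >>> import numpy, h5py
    >>> f = h5py.File("mask.hdf5", "w")
    >>> dset = f.create_dataset("Data", data=numpy.ones((10, 10, 50)))
    >>> mask = numpy.zeros((10, 10, 50), dtype='bool')
    >>> dset[mask].shape                 # nothing selected
    (0,)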

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/h5py.git


