[h5py] 112/455: Correctly handle zero-length selections; doc updates for upcoming 0.3.1 release

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Thu Jul 2 18:19:23 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a commit to annotated tag 1.3.0
in repository h5py.

commit d63d27ded04b16185f7a34adbd6a5bd834838bc4
Author: andrewcollette <andrew.collette at gmail.com>
Date:   Sat Aug 30 00:24:12 2008 +0000

    Correctly handle zero-length selections; doc updates for upcoming 0.3.1 release
---
 docs/source/build.rst        |  16 +++-
 docs/source/datasets.rst     | 205 ++++++++++++++++++++++++++++++++++++++++---
 docs/source/index.rst        |   1 +
 docs/source/low.rst          |  73 ++++++++++++---
 docs/source/quick.rst        | 192 ++++++++--------------------------------
 h5py/highlevel.py            |  19 +---
 h5py/tests/test_highlevel.py |  21 +++--
 h5py/utils_hl.py             |  40 +++++----
 8 files changed, 349 insertions(+), 218 deletions(-)

diff --git a/docs/source/build.rst b/docs/source/build.rst
index 52496c2..17912c3 100644
--- a/docs/source/build.rst
+++ b/docs/source/build.rst
@@ -10,9 +10,17 @@ Python and a C compiler, for distutils to build the extensions.  Pyrex_ is
 required only if you want to change compile-time options, like the
 debugging level.
 
-It's strongly recommended you use the versions of these packages provided
-by your operating system's package manager/finder.  In particular, HDF5 can
-be painful to install manually.
+Getting HDF5
+------------
+
+HDF5 versions 1.6.5 and later are supported, including 1.8.X.  Since h5py
+consists of multiple modules, HDF5 *must* be available as a dynamic library.
+**The best solution is to install HDF5 via a package manager.**
+`The HDF Group`__ provides several "dumb" (untar in "/") binary distributions
+for Linux, but traditionally only static libraries for Mac.  Mac OS X users
+should use something like Fink, or compile HDF5 from source.
+
+__ http://www.hdfgroup.com/HDF5
 
 Requires
 --------
@@ -24,7 +32,7 @@ Requires
 - (Optionally) Pyrex_ 0.9.8.4 or higher
 
 .. _Numpy: http://numpy.scipy.org/
-.. _HDF5: http://www.hdfgroup.com/HDF5/
+.. _HDF5: http://www.hdfgroup.com/HDF5
 .. _Pyrex: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/
 
 Procedure
diff --git a/docs/source/datasets.rst b/docs/source/datasets.rst
index d688571..526a5f5 100644
--- a/docs/source/datasets.rst
+++ b/docs/source/datasets.rst
@@ -1,6 +1,8 @@
-****************
-Datasets in HDF5
-****************
+.. _Datasets:
+
+**************
+Using Datasets
+**************
 
 Datasets are where most of the information in an HDF5 file resides.  Like
 NumPy arrays, they are homogeneous collections of data elements, with an
@@ -35,9 +37,9 @@ of datasets are immutable.
 Creating a dataset
 ==================
 
-There are two ways to create a dataset, with nearly identical syntax.  The
-recommended procedure is to use a method on the Group object in which the
-dataset will be stored:
+There are two ways to explicitly create a dataset, with nearly identical
+syntax.  The recommended procedure is to use a method on the Group object in
+which the dataset will be stored:
 
     >>> dset = grp.create_dataset("Dataset Name", ...options...)
 
@@ -116,16 +118,199 @@ taken for both methods.  Default values are in *italics*.
     size of each axis.  You can provide a value of "None" for any axis to
     indicate that the maximum size of that dimension is unlimited.
 
+Automatic creation
+------------------
+
+If you've already got a NumPy array you want to store, you can let h5py guess
+these options for you.  Simply assign the array to a Group entry:
+
+    >>> arr = numpy.ones((100,100), dtype='=f8')
+    >>> my_group["MyDataset"] = arr
+
+The object you provide doesn't even have to be an ndarray; if it isn't, h5py
+will create an intermediate NumPy representation before storing it.
+The resulting dataset is stored contiguously, with no compression or chunking.
+
+.. note::
+    Arrays are auto-created using the NumPy ``asarray`` function.  This means
+    that if you try to create a dataset from a string, you'll get a *scalar*
+    dataset containing the string itself!  To get a char array, pass in
+    something like ``numpy.fromstring(mystring, '|S1')`` instead.
+
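+For example (a sketch; assume ``my_group`` is an open Group and ``numpy`` has
+been imported):
+
+    >>> my_group["Hello"] = "Hello there"           # scalar string dataset!
+    >>> my_group["Chars"] = numpy.fromstring("Hello there", '|S1')
+    >>> my_group["Chars"].shape
+    (11L,)
+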
 
-Slicing and data access
+Data Access and Slicing
 =======================
 
-A subset of the NumPy extended slicing is supported.  Slice specifications are
-translated directly to HDF5 *hyperslab* selections, and are are a fast and
-efficient way to access data in the file.
+A subset of the NumPy indexing techniques is supported, including the
+traditional extended-slice syntax, named-field access, and boolean arrays.
+Discrete coordinate selections are also supported via a special indexer class.
+
+Properties
+----------
+
+Like Numpy arrays, Dataset objects have attributes named "shape" and "dtype":
+
+    >>> dset.dtype
+    dtype('complex64')
+    >>> dset.shape
+    (4L, 5L)
+
+Slicing access
+--------------
+
+The best way to get at data is to use the traditional NumPy extended-slicing
+syntax.  Slice specifications are translated directly to HDF5 *hyperslab*
+selections, and are a fast and efficient way to access data in the file.
+The following slicing arguments are recognized:
+
+    * Numbers: anything that can be converted to a Python long
+    * Slice objects: please note negative values are not allowed
+    * Field names, in the case of compound data
+    * At most one ``Ellipsis`` (``...``) object
+
+Here are a few examples (output omitted):
+
+    >>> dset = f.create_dataset("MyDataset", data=numpy.ones((10,10,10),'=f8'))
+    >>> dset[0,0,0]
+    >>> dset[0,2:10,1:9:3]
+    >>> dset[0,...]
+    >>> dset[:,::2,5]
+
+Simple array broadcasting is also supported:
+
+    >>> dset[0]   # Equivalent to dset[0,...]
+
+For compound data, you can specify multiple field names alongside the
+numeric slices:
+
+    >>> dset["FieldA"]
+    >>> dset[0,:,4:5, "FieldA", "FieldB"]
+    >>> dset[0, ..., "FieldC"]
+
+Advanced indexing
+-----------------
+
+Boolean "mask" arrays can also be used to specify a selection.  The result of
+this operation is a 1-D array with elements arranged in the standard NumPy
+(C-style) order:
+
+    >>> arr = numpy.random.random((10,10))
+    >>> dset = f.create_dataset("MyDataset", data=arr)
+    >>> result = dset[arr > 0.5]
+
+If you have a set of discrete points you want to access, you may not want to go
+through the overhead of creating a boolean mask.  This is especially the case
+for large datasets, where even a byte-valued mask may not fit in memory.  You
+can pass a list of points to the dataset selector via a custom "CoordsList"
+instance:
+
+    >>> mycoords = [ (0,0), (3,4), (7,8), (3,5), (4,5) ]
+    >>> coords_list = CoordsList(mycoords)
+    >>> result = dset[coords_list]
+
+Like boolean-array indexing, the result is a 1-D array.  The order in which
+points are selected is preserved.
+
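+Zero-length selections are also handled gracefully; an all-``False`` mask or
+an empty coordinate list simply yields a length-0 array (a sketch, reusing
+``dset`` and ``arr`` from above):
+
+    >>> dset[arr > 2.0]        # random values never exceed 1.0
+    array([], dtype=float64)
+    >>> dset[CoordsList([])].shape
+    (0,)
+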
+.. note::
+    These two techniques rely on an HDF5 construct which explicitly enumerates
+    the points to be selected.  It's very flexible but most appropriate for
+    reasonably-sized (or sparse) selections.  The coordinate list takes at
+    least 8*<rank> bytes per point, and may need to be internally copied.  For
+    example, a one-million point selection on a rank-3 array takes at least
+    8*3*10**6 bytes = 24MB.  Be careful, especially with boolean masks.
+
+Value attribute and scalar datasets
+-----------------------------------
+
+HDF5 allows you to store "scalar" datasets.  These have the shape "()".  You
+can use the syntax ``dset[...]`` to recover the value as a 0-dimensional
+array.  Also, the special attribute ``value`` will return a scalar for a
+0-dim array, and a full n-dimensional array for all other cases:
+
+    >>> f["ArrayDS"] = numpy.ones((2,2))
+    >>> f["ScalarDS"] = 1.0
+    >>> f["ArrayDS"].value
+    array([[ 1.,  1.],
+           [ 1.,  1.]])
+    >>> f["ScalarDS"].value
+    1.0
+
+Extending Datasets
+------------------
+
+If the dataset is created with the *maxshape* option set, you can later expand
+its size.  Simply call the *extend* method:
+
+    >>> dset = f.create_dataset("MyDataset", (5,5), maxshape=(None,None))
+    >>> dset.shape
+    (5, 5)
+    >>> dset.extend((15,20))
+    >>> dset.shape
+    (15, 20)
+
+More on Datatypes
+=================
+
+Storing compound data
+---------------------
+
+You can store "compound" data (struct-like, using named fields) using the Numpy
+facility for compound data types.  For example, suppose we have data that takes
+the form of (temperature, voltage) pairs::
+
+    >>> import numpy
+    >>> mydtype = numpy.dtype([('temp','=f4'),('voltage','=f8')])
+    >>> dset = f.create_dataset("MyDataset", (20,30), mydtype)
+    >>> dset
+    Dataset "MyDataset": (20L, 30L) dtype([('temp', '<f4'), ('voltage', '<f8')])
+    
+These types may contain any supported type, and be arbitrarily nested.
+
+.. _supported:
+
+Supported types
+---------------
+
+The HDF5 type system is mostly a superset of its NumPy equivalent.  The
+following are the NumPy types currently supported by the interface:
+
+    ========    ==========  ==========  ===============================
+    Datatype    NumPy kind  HDF5 class  Notes
+    ========    ==========  ==========  ===============================
+    Integer     i, u        INTEGER
+    Float       f           FLOAT
+    Complex     c           COMPOUND    Stored as an HDF5 struct
+    Array       V           ARRAY       NumPy array with "subdtype"
+    Opaque      V           OPAQUE      Stored as HDF5 fixed-length opaque
+    Compound    V           COMPOUND    May be arbitrarily nested
+    String      S           STRING      Stored as HDF5 fixed-length C-style strings
+    ========    ==========  ==========  ===============================
+
+Byte order is always preserved.  The following additional features are known
+not to be supported:
+
+    * Read/write HDF5 variable-length (VLEN) data
+
+      No obvious way exists to handle variable-length data in NumPy.
+
+    * NumPy object types (dtype "O")
+
+      This could potentially be solved by pickling, but requires low-level
+      VLEN infrastructure.
+
+    * HDF5 enums
 
+      There's no NumPy dtype support for enums.  Enum data is read as plain
+      integer data.  However, the low-level conversion routine
+      ``h5t.py_create`` can create an HDF5 enum from an integer dtype and a
+      dictionary of names.
+    
+    * HDF5 "time" datatype
 
+      This datatype is deprecated, and has no close NumPy equivalent.
 
 
 
 
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 1c93ec0..7ed6f6d 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -36,6 +36,7 @@ Contents:
 
     build
     quick
+    datasets
     low
     threads
     licenses
diff --git a/docs/source/low.rst b/docs/source/low.rst
index 16bb962..1c8ab9e 100644
--- a/docs/source/low.rst
+++ b/docs/source/low.rst
@@ -34,7 +34,7 @@ Identifier wrapping
 -------------------
 
 No matter how complete, a library full of C functions is not very fun to use.
-Additionally, since HDF5 identifiers are natively handled as integers, their
+Additionally, since HDF5 identifiers are natively expressed as integers, their
 lifespan must be manually tracked by the library user.  This quickly becomes
 impossible for applications of even a moderate size; errors will lead to
 resource leaks or (in the worst case) accidentally invalidating identifiers.
@@ -45,12 +45,23 @@ for an integer identifier, which allows Python reference counting to manage
 the lifespan of the identifier.  When no more references exist to the Python
 object, the HDF5 identifier is automatically closed.
 
+    >>> from h5py import h5s
+    >>> sid = h5s.create_simple( (2,3) )
+    >>> sid
+    67108866 [1] (U) SpaceID
+    >>> sid.id
+    67108866
+
 A side benefit is that many HDF5 functions take an identifier as their first
-argument.  They are naturally expressed as methods on an identifier object.
+argument.  These are naturally expressed as methods on an identifier object.
 For example, the HDF5 function ``H5Dwrite`` becomes the method
 ``h5d.DatasetID.write``.  Code using this technique is easier to write and
 maintain.
 
+    >>> sid.select_hyperslab((0,0),(2,2))
+    >>> sid.get_select_bounds()
+    ((0L, 0L), (1L, 1L))
+
 State & Hashing
 ---------------
 
@@ -60,9 +71,27 @@ itself.  A side effect of this is that the hash and equality operations on
 ObjectID instances are determined by the status of the underlying HDF5 object.
 For example, if two GroupID objects with different HDF5 integer identifiers
 point to the same group, they will have identical hashes and compare equal.
-Among other things, this means that if you can use
-ObjectID/GroupID/DatasetID/etc. instances as keys in a dictionary.
-
+Among other things, this means that you can reliably use identifiers as keys
+in a dictionary.
+
+    >>> from h5py import h5f, h5g
+    >>> fid = h5f.open('foo.hdf5')
+    >>> grp1 = h5g.open(fid, '/')
+    >>> grp2 = h5g.open(fid, '/')
+    >>> grp1.id == grp2.id
+    False
+    >>> grp1 == grp2
+    True
+    >>> hash(grp1) == hash(grp2)
+    True
+    >>> x = {grp1: "The root group"}
+    >>> x[grp2]
+    'The root group'
+
+.. note::
+    Currently all subclasses of ObjectID are hashable, including "transient"
+    identifiers like datatypes.  A future version may restrict hashing to
+    "committed", file-resident objects.
 
 Data Conversion
 ===============
 is a good mapping between NumPy dtypes and HDF5 basic types.
 The actual conversion between datatypes is performed by the optimised routines
 inside the HDF5 library; all h5py does is provide the mapping between NumPy
 and HDF5 type objects.  Because the HDF5 typing system is more comprehensive
-than the NumPy system, this is an asymmetrical process. While translating
-from ``NumPy => HDF5`` always results in a bit-for-bit identical description,
-the reverse process ``HDF5 => NumPy`` cannot be guaranteed to result in an
-exact description.  In the vast majority of cases, this does not matter; HDF5
-can natively auto-convert a huge variety of representations.
+than the NumPy system, this is an asymmetrical process. 
+
+Translating from an HDF5 datatype object to a dtype results in the closest
+standard NumPy representation of the datatype:
+
+    >>> from h5py import h5t
+    >>> h5t.STD_I32LE
+    50331712 [1] (L) TypeIntegerID int32
+    >>> h5t.STD_I32LE.dtype 
+    dtype('int32')
+
+In the vast majority of cases the two datatypes will have exactly identical
+binary layouts, but not always.  For example, an HDF5 integer can have
+additional leading or trailing padding, which has no NumPy equivalent.  In
+this case the dtype will capture the logical intent of the type (as a 32-bit
+signed integer), but not its layout.
+
+The reverse transformation (NumPy type to HDF5 type) is handled by a separate
+function.  It's guaranteed to result in an exact, binary-compatible
+representation:
+
+    >>> tid = h5t.py_create('=u8')
+    >>> tid
+    50331956 [1] (U) TypeIntegerID uint64
+
+The HDF5 library contains translation routines which can handle almost any
+conversion between types of the same class, including odd precisions and
+padding combinations.  This process is entirely transparent to the user.
+
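+For example, a little-endian array in memory can be written straight into a
+big-endian dataset on disk; a sketch (``f`` is an open File object):
+
+    >>> dset = f.create_dataset("BigEnd", (3,), '>i4')   # big-endian file type
+    >>> dset[...] = numpy.ones(3, dtype='<i4')           # little-endian in memory
+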
 
 API Versioning
 ==============
diff --git a/docs/source/quick.rst b/docs/source/quick.rst
index 06d9c0e..45f55e9 100644
--- a/docs/source/quick.rst
+++ b/docs/source/quick.rst
@@ -15,8 +15,8 @@ imports the three classes ``File``, ``Group`` and ``Dataset``, which will cover
 99% of your needs.
 
 
-Storing simple data
-===================
+Getting data into HDF5
+======================
 
 Create a new file
 -----------------
@@ -32,6 +32,8 @@ Files are opened using a Python-file-like syntax::
 Create a dataset
 ----------------
 
+(Main chapter: :ref:`Datasets`)
+
 Datasets are like Numpy arrays which reside on disk; they are identified by
 a unique name, shape, and a Numpy dtype.  The easiest way to create them is
 with a method of the File object you already have::
@@ -45,6 +47,19 @@ with a method of the File object you already have::
 This creates a new 2-d 6-element (2x3) dataset containing 32-bit signed integer
 data, in native byte order, located in the file at "/MyDataset".
 
+Or you can auto-create a dataset from an array, just by giving it a name:
+
+    >>> arr = numpy.ones((2,3), '=i4')
+    >>> f["MyDataset"] = arr
+    >>> dset = f["MyDataset"]
+
+Shape and dtype information is always available via properties:
+
+    >>> dset.dtype
+    dtype('int32')
+    >>> dset.shape
+    (2L, 3L)
+
 Read & write data
 -----------------
 
@@ -74,89 +89,9 @@ will automatically close all the open HDF5 objects::
     Invalid dataset
 
 
-More about datasets
-===================
-
-Automatic creation
-------------------
-
-If you already have an array you want to store, you don't even need to call
-``create_dataset``.  Simply assign it to a name::
-
-    >>> myarr = numpy.ones((50,75))
-    >>> f["MyDataset"] = myarr
-    >>> f["MyDataset"]
-    Dataset "MyDataset": (50L, 75L) dtype('float64')
-
-Storing compound data
----------------------
-
-You can store "compound" data (struct-like, using named fields) using the Numpy
-facility for compound data types.  For example, suppose we have data that takes
-the form of (temperature, voltage) pairs::
-
-    >>> import numpy
-    >>> mydtype = numpy.dtype([('temp','=f4'),('voltage','=f8')])
-    >>> dset = f.create_dataset("MyDataset", (20,30), mydtype)
-    >>> dset
-    Dataset "MyDataset": (20L, 30L) dtype([('temp', '<f4'), ('voltage', '<f8')])
-    
-You can also access data using Numpy recarray-style indexing.  The following
-are all legal slicing syntax for the above array (output omitted for brevity)::
-
-    >>> dset[0,0]
-    >>> dset[0,:]
-    >>> dset[...]
-    >>> dset['temp']
-    >>> dset[0,0,'temp']
-    >>> dset[8:14:2, ::2, 'voltage']
-
-Shape and data type
--------------------
-
-Like Numpy arrays, Dataset objects have attributes named "shape" and "dtype"::
-
-    >>> dset = f.create_dataset("MyDataset", (4,5), '=c8')
-    >>> dset.dtype
-    dtype('complex64')
-    >>> dset.shape
-    (4L, 5L)
-
-These attributes are read-only.
-
-Values and 0-dimensional datasets
----------------------------------
-
-HDF5 allows you to store "scalar" datasets.  These have the shape "()".  You
-can use the syntax ``dset[...]`` to recover the value as an 0-dimensional
-array.  Also, the special attribute ``value`` will return a scalar for an 0-dim
-array, and a full n-dimensional array for all other cases:
-
-    >>> f["ArrayDS"] = numpy.ones((2,2))
-    >>> f["ScalarDS"] = 1.0
-    >>> f["ArrayDS"].value
-    array([[ 1.,  1.],
-           [ 1.,  1.]])
-    >>> f["ScalarDS"].value
-    1.0
-
-
-Using HDF5 options
-------------------
-
-You can specify a number of HDF5 features when creating a dataset.  See the
-Dataset constructor for a complete list.  For example, to create a (100,100)
-dataset stored as (100,10) size chunks, using GZIP compression level 6::
-
-    >>> dset = f.create_dataset("MyDataset", (100,100), chunks=(100,10), compression=6)
-
-
 Groups & multiple objects
 =========================
 
-The root group
---------------
-
 Like a filesystem, HDF5 supports the concept of storing multiple objects in
 containers, called "groups".  The File object behaves as one of these
 groups (it's actually the *root group* "``/``", again like a UNIX filesystem).
@@ -167,94 +102,39 @@ You store objects by giving them different names:
     >>> f
     File "myfile.hdf5", root members: "DS1", "DS2"
 
-Beware, you need to delete an existing object; as HDF5 won't do this automatically::
-
-    >>> f["DS3"] = numpy.ones((2,2))
-    >>> f["DS3"] = numpy.ones((2,2))
-    Traceback (most recent call last):
-    ... snip traceback ... 
-    h5py.h5.DatasetError: Unable to create dataset (H5Dcreate)
-    HDF5 Error Stack:
-        0: "Unable to create dataset" at H5Dcreate
-        1: "Unable to name dataset" at H5D_create
-        2: "Already exists" at H5G_insert
-        3: "Unable to insert name" at H5G_namei
-        4: "Unable to insert entry" at H5G_stab_insert
-        5: "Unable to insert key" at H5B_insert
-        6: "Can't insert leaf node" at H5B_insert_helper
-        7: "Symbol is already present in symbol table" at H5G_node_insert
-
-Removing objects
-----------------
+As with other Python container objects, they support iteration and membership
+testing:
+
+    >>> list(f)
+    ['DS1', 'DS2']
+    >>> dict((x, y.shape) for x, y in f.iteritems())
+    {'DS1': (2L, 3L), 'DS2': (1L, 2L)}
+    >>> "DS1" in f
+    True
+    >>> "FOOBAR" in f
+    False
 
 You can "delete" (unlink) an object from a group::
 
     >>> f["DS"] = numpy.ones((10,10))
     >>> f["DS"]
     Dataset "DS": (10L, 10L) dtype('float64')
+    >>> "DS" in f
+    True
     >>> del f["DS"]
-    >>> f["DS"]
-    Traceback (most recent call last):
-    ... snip traceback ...
-    h5py.h5.ArgsError: Cannot stat object (H5Gget_objinfo)
-    HDF5 Error Stack:
-        0: "Cannot stat object" at H5Gget_objinfo
-        1: "Unable to stat object" at H5G_get_objinfo
-        2: "Component not found" at H5G_namei
-        3: "Not found" at H5G_stab_find
-        4: "Not found" at H5G_node_found
-
-Creating subgroups
-------------------
+    >>> "DS" in f
+    False
 
-You can create subgroups by giving them names:
+You create additional subgroups by giving them names:
 
     >>> f.create_group('subgrp')
     Group "subgrp" (0 members)
     
-Be careful, as most versions of HDF5 don't support "automatic" (recursive)
-creation of intermediate groups.  Instead of doing::
-
-    >>> f.create_group('foo/bar/baz')  # WRONG
-
-you have to do:
-
-    >>> f.create_group('foo')
-    >>> f.create_group('foo/bar')
-    >>> f.create_group('foo/bar/baz')
-
-This restriction will be raised in the future, as HDF5 1.8.X provides a feature
-that does this automatically.
+.. note::
 
+    Most HDF5 versions don't support automatic creation of intermediate
+    groups; you can't yet do ``f.create_group('foo/bar/baz')``.
 
-Group tricks
-------------
-
-Groups support iteration (yields the member names), len() (gives the number
-of members), and membership testing:
-
-    >>> g = f.create_group('subgrp')
-    >>> g["DS1"] = numpy.ones((2,2))
-    >>> g["DS2"] = numpy.ones((1,2))
-    >>> g["DS3"] = numpy.ones((10,10))
-    >>> for x in g:
-    ...     print x
-    ...
-    DS1
-    DS2
-    DS3
-    >>> for x, ds in g.iteritems():
-    ...     print x, ds.shape
-    ...
-    DS1 (2L, 2L)
-    DS2 (1L, 2L)
-    DS3 (10L, 10L)
-    >>> len(g)
-    3
-    >>> "DS1" in g
-    True
-    >>> "DS4" in g
-    False
 
 Group caveats
 -------------
diff --git a/h5py/highlevel.py b/h5py/highlevel.py
index 67f1499..a9af080 100644
--- a/h5py/highlevel.py
+++ b/h5py/highlevel.py
@@ -567,22 +567,11 @@ class Dataset(HLObject):
     def __getitem__(self, args):
         """ Read a slice from the HDF5 dataset.  Takes slices and
             recarray-style field names (more than one is allowed!) in any
-            order.
-
-            For a compound dataset ds, with shape (10,10,5) and fields "a", "b" 
-            and "c", the following are all legal syntax:
-
-            ds[1,2,3]
-            ds[1,2,:]
-            ds[...,3]
-            ds[1]
-            ds[:]
-            ds[1,2,3,"a"]
-            ds[0:5:2, ..., 0:2, "a", "b"]
+            order.  Obeys basic NumPy broadcasting rules.
 
             Also supports:
 
-            * Boolean array indexing (True/False)
+            * Boolean "mask" array indexing
             * Discrete point selection via CoordsList instance
 
             Beware; these last two techniques work by explicitly enumerating
@@ -654,9 +643,9 @@ class Dataset(HLObject):
             if val.shape == ():
                 mspace = h5s.create(h5s.SCALAR)
             else:
-                mspace = h5s.create_simple(val.shape)
+                mspace = h5s.create_simple(val.shape, (h5s.UNLIMITED,)*len(val.shape))
 
-            slice_select(fspace, args)
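+            # slice_select applies the selection to fspace in-place; the
+            # returned memory space and scalar flag are not needed for a write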
+            result, scalar = slice_select(fspace, args)
 
             self.id.write(mspace, fspace, val)
 
diff --git a/h5py/tests/test_highlevel.py b/h5py/tests/test_highlevel.py
index 96bc3e9..d856740 100644
--- a/h5py/tests/test_highlevel.py
+++ b/h5py/tests/test_highlevel.py
@@ -295,6 +295,7 @@ class TestDataset(unittest.TestCase):
         slices += [ s[0], s[1], s[9], s[0,0], s[4,5], s[:] ]
         slices += [ s[3,...], s[3,2,...] ]
         slices += [ numpy.random.random((10,10,50)) > 0.5 ]  # Truth array
+        slices += [ s[0,0,0:0], s[1:1,:,:], numpy.zeros((10,10,50),dtype='bool')] # Empty selections
         for dt in TYPES1:
 
             srcarr = numpy.arange(10*10*50, dtype=dt).reshape(10,10,50)
@@ -339,6 +340,7 @@ class TestDataset(unittest.TestCase):
         # These need to be increasing to make it easy to compare to the
         # NumPy reference array, which uses a boolean mask.
         selections = [0,1,15,101,102, 557, 664, 1024,9999]
+        selections_list = [ selections, []]  # empty selection
 
         arr = numpy.arange(10000).reshape(space)
         
@@ -350,15 +352,20 @@ class TestDataset(unittest.TestCase):
             self.assertEqual(dset[sel], arr.flat[x])
             self.assert_(not isinstance(dset[sel], numpy.ndarray))
 
-        # Coordinate list selection
-        sel = CoordsList([numpy.unravel_index(x,space) for x in selections])
+        for lst in selections_list:
+            # Coordinate list selection
+            sel = CoordsList([numpy.unravel_index(x,space) for x in lst])
 
-        npy_sel = numpy.zeros(space, dtype='bool')
-        for x in selections:
-            npy_sel.flat[x] = True
+            npy_sel = numpy.zeros(space, dtype='bool')
+            for x in lst:
+                npy_sel.flat[x] = True
 
-        self.assert_(numpy.all(dset[sel] == arr[npy_sel]))
-        self.assert_(isinstance(dset[sel], numpy.ndarray))
+            hresult = dset[sel]
+            nresult = arr[npy_sel]
+            self.assert_(numpy.all(hresult == nresult))
+            self.assert_(isinstance(hresult, numpy.ndarray))
+            self.assertEqual(hresult.dtype, nresult.dtype)
+            self.assertEqual(hresult.shape, nresult.shape)
 
     def test_Dataset_exceptions(self):
         # These trigger exceptions in H5Dread
diff --git a/h5py/utils_hl.py b/h5py/utils_hl.py
index 269d85a..ca6ef34 100644
--- a/h5py/utils_hl.py
+++ b/h5py/utils_hl.py
@@ -74,6 +74,9 @@ class CoordsList(object):
         CoordsList( [ (1,2,3), (7,8,9) ] )  # Multiple indices
     """
 
+    npoints = property(lambda self: len(self.coords),
+        doc = "Number of selected points")
+
     def __init__(self, points):
         """ Create a new list of explicitly selected points.
 
@@ -87,7 +90,12 @@ class CoordsList(object):
             raise ValueError("Selection should be an index or a sequence of equal-rank indices")
 
         if len(self.coords) == 0:
-            raise ValueError("Selection may not be empty")
+            pass # This will be caught at index-time
+        elif self.coords.ndim == 1:
+            self.coords.resize((1,len(self.coords)))
+        elif self.coords.ndim != 2:
+            raise ValueError("Selection should be an index or a sequence of equal-rank indices")
+
 
 def slice_select(space, args):
     """ Perform a selection on the given HDF5 dataspace, using a tuple
@@ -108,9 +116,9 @@ def slice_select(space, args):
         1. Appropriate memory dataspace to use for new array
         2. Boolean indicating if the slice should result in a scalar quantity
     """
-
     shape = space.shape
     rank = len(shape)
+    space.set_extent_simple(shape, (h5s.UNLIMITED,)*rank)
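+    # Give every axis an unlimited maxdim so zero-length extents stay legal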
 
     if len(args) == 0 or (len(args) == 1 and args[0] is Ellipsis):
         # The only safe way to access a scalar dataspace
@@ -128,22 +136,22 @@ def slice_select(space, args):
             # It never results in a scalar value
             indices = numpy.transpose(argval.nonzero())
             if len(indices) == 0:
-                raise ValueError("Selection may not be empty")
-            space.select_elements(indices)
-            return h5s.create_simple((len(indices),)), False
+                space.select_none()
+            else:
+                space.select_elements(indices)
+            return h5s.create_simple((len(indices),), (h5s.UNLIMITED,)), False
 
         if isinstance(argval, CoordsList):
             # Coords indexing also uses discrete selection
-            c_ndim = argval.coords.ndim
-            if c_ndim != rank:
-                if c_ndim == 1:
-                    argval.coords.resize((1,len(argval.coords)))
-                else:
-                    raise ValueError("Coordinate list must contain %d-rank indices (not %d-rank)" % (rank, c_ndim))
-
-            space.select_elements(argval.coords)
-            npoints = space.get_select_elem_npoints()
-            return h5s.create_simple((npoints,)), len(argval.coords) == 1
+            if len(argval.coords) == 0:
+                space.select_none()
+                npoints = 0
+            elif argval.coords.ndim != 2 or argval.coords.shape[1] != rank:
+                raise ValueError("Coordinate list incompatible with %d-rank dataset" % rank)
+            else:
+                space.select_elements(argval.coords)
+                npoints = space.get_select_elem_npoints()
+            return h5s.create_simple((npoints,), (h5s.UNLIMITED,)), len(argval.coords) == 1
 
     # Proceed to hyperslab selection
 
@@ -223,7 +231,7 @@ def slice_select(space, args):
     # do not result in a length-1 axis.
     mem_shape = tuple(x for x, smpl in zip(count, simple) if not smpl) 
 
-    return h5s.create_simple(mem_shape), all(simple)
+    return h5s.create_simple(mem_shape, (h5s.UNLIMITED,)*len(mem_shape)), all(simple)
 
 def strhdr(line, char='-'):
     """ Print a line followed by an ASCII-art underline """

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/h5py.git


