[h5py] 154/455: Finished user guide
Ghislain Vaillant
ghisvail-guest at moszumanska.debian.org
Thu Jul 2 18:19:27 UTC 2015
This is an automated email from the git hooks/post-receive script.
ghisvail-guest pushed a commit to annotated tag 1.3.0
in repository h5py.
commit 30cdef04158dad7ea3830cab9745a5abeb5ed637
Author: andrewcollette <andrew.collette at gmail.com>
Date: Sat Nov 15 04:56:37 2008 +0000
Finished user guide
---
README.txt | 2 +-
docs/source/_static/h5py.css | 15 +-
docs/source/guide/datasets.rst | 320 ------------------------
docs/source/guide/hl.rst | 551 +++++++++++++++++++++++++++++++++++++++--
docs/source/guide/index.rst | 2 -
docs/source/guide/quick.rst | 2 +
docs/source/guide/threads.rst | 89 -------
h5py/highlevel.py | 2 +-
setup.py | 13 +-
9 files changed, 549 insertions(+), 447 deletions(-)
diff --git a/README.txt b/README.txt
index 467a18b..2c5298a 100644
--- a/README.txt
+++ b/README.txt
@@ -2,7 +2,7 @@ README for the "h5py" Python/HDF5 interface
===========================================
Copyright (c) 2008 Andrew Collette
-Version 0.3.1
+Version 1.0.0
* http://h5py.alfven.org Main site, docs, quick-start guide
* http://h5py.googlecode.com Downloads, FAQ and bug tracker
diff --git a/docs/source/_static/h5py.css b/docs/source/_static/h5py.css
index 58ee017..cf5da6d 100644
--- a/docs/source/_static/h5py.css
+++ b/docs/source/_static/h5py.css
@@ -97,7 +97,7 @@ table.highlighttable td {
cite, code, tt {
font-family: 'Consolas', 'Deja Vu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
font-size: 0.95em;
- letter-spacing: 0.01em;
+ /*letter-spacing: 0.01em;*/
}
hr {
@@ -106,9 +106,10 @@ hr {
}
tt {
- background-color: #f2f2f2;
- border-bottom: 1px solid #ddd;
- color: #333;
+ /*background-color: #f2f2f2;
+ border-bottom: 1px solid #ddd;*/
+ font-weight: bold;
+ /*color: #333;*/
}
tt.descname {
@@ -116,11 +117,13 @@ tt.descname {
font-weight: bold;
font-size: 1.2em;
border: 0;
+ letter-spacing: 0.01em;
}
tt.descclassname {
background-color: transparent;
border: 0;
+ letter-spacing: 0.01em;
}
tt.xref {
@@ -173,7 +176,7 @@ dd {
dt:target,
.highlight {
- background-color: #fbe54e;
+ background-color: #eee; /*#fbe54e;*/
}
dl.glossary dt {
@@ -259,7 +262,7 @@ div.bodywrapper {
}
div.body a {
- text-decoration: underline;
+ text-decoration: none;
}
div.sphinxsidebar {
diff --git a/docs/source/guide/datasets.rst b/docs/source/guide/datasets.rst
deleted file mode 100644
index 526a5f5..0000000
--- a/docs/source/guide/datasets.rst
+++ /dev/null
@@ -1,320 +0,0 @@
-.. _Datasets:
-
-**************
-Using Datasets
-**************
-
-Datasets are where most of the information in an HDF5 file resides. Like
-NumPy arrays, they are homogeneous collections of data elements, with an
-immutable datatype and (hyper)rectangular shape. Unlike NumPy arrays, they
-support a variety of transparent storage features such as compression,
-error-detection, and chunked I/O.
-
-Metadata can be associated with an HDF5 dataset in the form of an "attribute".
-It's recommended that you use this scheme for any small bits of information
-you want to associate with the dataset. For example, a descriptive title,
-digitizer settings, or data collection time are appropriate things to store
-as HDF5 attributes.
-
-
-Opening an existing dataset
-===========================
-
-Since datasets reside in groups, the best way to retrieve a dataset is by
-indexing the group directly:
-
- >>> dset = grp["Dataset Name"]
-
-You can also open a dataset by passing the group and name directly to the
-constructor:
-
- >>> dset = Dataset(grp, "Dataset Name")
-
-No options can be specified when opening a dataset, as almost all properties
-of datasets are immutable.
-
-
-Creating a dataset
-==================
-
-There are two ways to explicitly create a dataset, with nearly identical
-syntax. The recommended procedure is to use a method on the Group object in
-which the dataset will be stored:
-
- >>> dset = grp.create_dataset("Dataset Name", ...options...)
-
-Or you can call the Dataset constructor. When providing more than just the
-group and name, the constructor will try to create a new dataset:
-
- >>> dset = Dataset(grp, "Dataset name", ...options...)
-
-Bear in mind that if an object of the same name already exists in the group,
-you will have to manually unlink it first:
-
- >>> "Dataset Name" in grp
- True
- >>> del grp["Dataset Name"]
- >>> dset = grp.create_dataset("Dataset Name", ...options...)
-
-Logically, there are two ways to specify a dataset; you can tell HDF5 its
-shape and datatype explicitly, or you can provide an existing ndarray from
-which the shape, dtype and contents will be determined. The following options
-are used to communicate this information.
-
-
-Arguments and options
----------------------
-
-All options below can be given to either the Dataset constructor or the
-Group method create_dataset. They are listed in the order the arguments are
-taken for both methods. Default values are in *italics*.
-
-* **shape** = *None* or tuple(<dimensions>)
-
- A Numpy-style shape tuple giving the dataset dimensions. Required if
- option **data** isn't provided.
-
-* **dtype** = *None* or NumPy dtype
-
- A NumPy dtype, or anything from which a dtype can be determined.
- This sets the datatype. If this is omitted, the dataset will
- consist of single-precision floats, in native byte order ("=f4").
-
-* **data** = *None* or ndarray
-
- A NumPy array. The dataset shape and dtype will be determined from
- this array, and the dataset will be initialized to its contents.
- Required if option **shape** isn't provided.
-
-* **chunks** = *None* or tuple(<chunk dimensions>)
-
- Manually set the HDF5 chunk size.
-
- When using any of the following options like compression or error-
- detection, the dataset is stored in chunked format, as small atomic
- pieces of data on which the filters operate. These chunks are then
- indexed by B-trees. Ordinarily h5py will guess a chunk value. If you
- know what you're doing, you can override that value here.
-
-* **compression** = *None* or int(0-9)
-
- Enable the use of GZIP compression, at the given integer level. The
- dataset will be stored in chunked format.
-
-* **shuffle** = True / *False*
-
- Enable the shuffle filter, possibly increasing the GZIP compression
- ratio. The dataset will be stored in chunked format.
-
-* **fletcher32** = True / *False*
-
- Enable Fletcher32 error-detection. The dataset will be stored in
- chunked format.
-
-* **maxshape** = *None* or tuple(<dimensions>)
-
- If provided, the dataset will be stored in a chunked and extendible fashion.
- The value provided should be a tuple of integers indicating the maximum
- size of each axis. You can provide a value of "None" for any axis to
- indicate that the maximum size of that dimension is unlimited.
-
-Automatic creation
-------------------
-
-If you've already got a NumPy array you want to store, you can let h5py guess
-these options for you. Simply assign the array to a Group entry:
-
- >>> arr = numpy.ones((100,100), dtype='=f8')
- >>> my_group["MyDataset"] = arr
-
-The object you provide doesn't even have to be an ndarray; if it isn't, h5py
-will create an intermediate NumPy representation before storing it.
-The resulting dataset is stored contiguously, with no compression or chunking.
-
-.. note::
- Arrays are auto-created using the NumPy ``asarray`` function. This means
- that if you try to create a dataset from a string, you'll get a *scalar*
- dataset containing the string itself! To get a char array, pass in
- something like ``numpy.fromstring(mystring, '|S1')`` instead.
-
-
-Data Access and Slicing
-=======================
-
-A subset of the NumPy indexing techniques is supported, including the
-traditional extended-slice syntax, named-field access, and boolean arrays.
-Discrete coordinate selections are also supported via a special indexer class.
-
-Properties
-----------
-
-Like Numpy arrays, Dataset objects have attributes named "shape" and "dtype":
-
- >>> dset.dtype
- dtype('complex64')
- >>> dset.shape
- (4L, 5L)
-
-Slicing access
---------------
-
-The best way to get at data is to use the traditional NumPy extended-slicing
-syntax. Slice specifications are translated directly to HDF5 *hyperslab*
-selections, and are a fast and efficient way to access data in the file.
-The following slicing arguments are recognized:
-
- * Numbers: anything that can be converted to a Python long
- * Slice objects: please note negative values are not allowed
- * Field names, in the case of compound data
- * At most one ``Ellipsis`` (``...``) object
-
-Here are a few examples (output omitted):
-
- >>> dset = f.create_dataset("MyDataset", data=numpy.ones((10,10,10),'=f8'))
- >>> dset[0,0,0]
- >>> dset[0,2:10,1:9:3]
- >>> dset[0,...]
- >>> dset[:,::2,5]
-
-Simple array broadcasting is also supported:
-
- >>> dset[0] # Equivalent to dset[0,...]
-
-For compound data, you can specify multiple field names alongside the
-numeric slices:
-
- >>> dset["FieldA"]
- >>> dset[0,:,4:5, "FieldA", "FieldB"]
- >>> dset[0, ..., "FieldC"]
-
-Advanced indexing
------------------
-
-Boolean "mask" arrays can also be used to specify a selection. The result of
-this operation is a 1-D array with elements arranged in the standard NumPy
-(C-style) order:
-
- >>> arr = numpy.random.random((10,10))
- >>> dset = f.create_dataset("MyDataset", data=arr)
- >>> result = dset[arr > 0.5]
-
-If you have a set of discrete points you want to access, you may not want to go
-through the overhead of creating a boolean mask. This is especially the case
-for large datasets, where even a byte-valued mask may not fit in memory. You
-can pass a list of points to the dataset selector via a custom "CoordsList"
-instance:
-
- >>> mycoords = [ (0,0), (3,4), (7,8), (3,5), (4,5) ]
- >>> coords_list = CoordsList(mycoords)
- >>> result = dset[coords_list]
-
-Like boolean-array indexing, the result is a 1-D array. The order in which
-points are selected is preserved.
-
-.. note::
- These two techniques rely on an HDF5 construct which explicitly enumerates the
- points to be selected. It's very flexible but most appropriate for
- reasonably-sized (or sparse) selections. The coordinate list takes at
- least 8*<rank> bytes per point, and may need to be internally copied. For
- example, it takes 40MB to express a 1-million point selection on a rank-3
- array. Be careful, especially with boolean masks.
-
-Value attribute and scalar datasets
------------------------------------
-
-HDF5 allows you to store "scalar" datasets. These have the shape "()". You
-can use the syntax ``dset[...]`` to recover the value as a 0-dimensional
-array. Also, the special attribute ``value`` will return a scalar for a 0-dim
-array, and a full n-dimensional array for all other cases:
-
- >>> f["ArrayDS"] = numpy.ones((2,2))
- >>> f["ScalarDS"] = 1.0
- >>> f["ArrayDS"].value
- array([[ 1., 1.],
- [ 1., 1.]])
- >>> f["ScalarDS"].value
- 1.0
-
-Extending Datasets
-------------------
-
-If the dataset is created with the *maxshape* option set, you can later expand
-its size. Simply call the *extend* method:
-
- >>> dset = f.create_dataset("MyDataset", (5,5), maxshape=(None,None))
- >>> dset.shape
- (5, 5)
- >>> dset.extend((15,20))
- >>> dset.shape
- (15, 20)
-
-More on Datatypes
-=================
-
-Storing compound data
----------------------
-
-You can store "compound" data (struct-like, using named fields) using the Numpy
-facility for compound data types. For example, suppose we have data that takes
-the form of (temperature, voltage) pairs::
-
- >>> import numpy
- >>> mydtype = numpy.dtype([('temp','=f4'),('voltage','=f8')])
- >>> dset = f.create_dataset("MyDataset", (20,30), mydtype)
- >>> dset
- Dataset "MyDataset": (20L, 30L) dtype([('temp', '<f4'), ('voltage', '<f8')])
-
-These types may contain any supported type, and be arbitrarily nested.
-
-.. _supported:
-
-Supported types
------------------
-
-The HDF5 type system is mostly a superset of its NumPy equivalent. The
-following are the NumPy types currently supported by the interface:
-
- ======== ========== ========== ===============================
- Datatype NumPy kind HDF5 class Notes
- ======== ========== ========== ===============================
- Integer i, u INTEGER
- Float f FLOAT
- Complex c COMPOUND Stored as an HDF5 struct
- Array V ARRAY NumPy array with "subdtype"
- Opaque V OPAQUE Stored as HDF5 fixed-length opaque
- Compound V COMPOUND May be arbitrarily nested
- String S STRING Stored as HDF5 fixed-length C-style strings
- ======== ========== ========== ===============================
-
-Byte order is always preserved. The following additional features are known
-not to be supported:
-
- * Read/write HDF5 variable-length (VLEN) data
-
- No obvious way exists to handle variable-length data in NumPy.
-
- * NumPy object types (dtype "O")
-
- This could potentially be solved by pickling, but requires low-level
- VLEN infrastructure.
-
- * HDF5 enums
-
- There's no NumPy dtype support for enums. Enum data is read as plain
- integer data. However, the low-level conversion routine
- ``h5t.py_create`` can create an HDF5 enum from an integer dtype and a
- dictionary of names.
-
- * HDF5 "time" datatype
-
- This datatype is deprecated, and has no close NumPy equivalent.
-
-
-
-
-
-
-
-
-
-
diff --git a/docs/source/guide/hl.rst b/docs/source/guide/hl.rst
index f60c832..3feb871 100644
--- a/docs/source/guide/hl.rst
+++ b/docs/source/guide/hl.rst
@@ -1,16 +1,180 @@
-*********
-Reference
-*********
+*************
+Documentation
+*************
.. module:: h5py.highlevel
+The high-level interface is the most convenient way to talk to HDF5. There
+are three main abstractions: files, groups, and datasets. Each is documented
+separately below.
+
+You may want to read the :ref:`quick start guide <quick>` to get a general
+overview.
+
+Everything useful in this module is automatically exported to the `h5py`
+package namespace; you can do::
+
+ >>> from h5py import * # or from h5py import File, etc.
+
+General info
+============
+
+Paths in HDF5
+-------------
+
+HDF5 files are organized like a filesystem. :class:`Group` objects work like
+directories; they can contain other groups, and :class:`Dataset` objects. Like
+a POSIX filesystem, objects are specified by ``/``-separated names, with the
+root group ``/`` (represented by the :class:`File` class) at the base.
+
+Wherever a name or path is called for, it may be relative or absolute.
+Constructs like ``..`` (parent group) are allowed.
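+
+For example (an illustrative sketch; assume ``file_obj`` is an open
+:class:`File` whose root group contains ``subgroup``)::
+
+ >>> grp = file_obj['/subgroup'] # absolute path from the root group
+ >>> same_grp = file_obj['subgroup'] # relative path; same object
+ >>> parent = grp['..'] # ".." names the parent group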
+
+Metadata
+--------
+
+Every object in HDF5 supports metadata in the form of "attributes", which are
+small, named bits of data. :class:`Group`, :class:`Dataset` and even
+:class:`File` objects each carry a dictionary-like object which exposes this
+behavior, named ``<obj>.attrs``. This is the correct way to store metadata
+in HDF5 files.
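+
+For example (a minimal sketch; ``dset`` is any open dataset)::
+
+ >>> dset.attrs['title'] = 'Raw digitizer output'
+ >>> dset.attrs['title']
+ 'Raw digitizer output'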
+
+Library configuration
+---------------------
+
+A few options are available to change the behavior of the library.
+You can get a reference to the global library configuration object via the
+function ``h5py.get_config()``. This object supports the following attributes:
+
+ **complex_names**
+ Set to a 2-tuple of strings (real, imag) to control how complex numbers
+ are saved. The default is ('r','i').
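+
+For example (illustrative only)::
+
+ >>> import h5py
+ >>> cfg = h5py.get_config()
+ >>> cfg.complex_names = ('re', 'im') # complex parts saved as "re"/"im"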
+
+Threading
+---------
+
+H5py is now always thread-safe.
+
+
+Files
+=====
+
+To open an HDF5 file, just instantiate the File object directly::
+
+ >>> from h5py import File # or import *
+ >>> file_obj = File('myfile.hdf5','r')
+
+Valid modes (like Python's file() modes) are:
+
+ === ================================================
+ r Readonly, file must exist
+ r+ Read/write, file must exist
+ w Create file, truncate if exists
+ w- Create file, fail if exists
+ a Read/write if exists, create otherwise (default)
+ === ================================================
+
+Like Python files, you must close the file when done::
+
+ >>> file_obj.close()
+
+File objects can also be used as "context managers" along with the new Python
+``with`` statement. When used in a ``with`` block, they will be closed at
+the end of the block regardless of what exceptions have been raised::
+
+ >>> with File('myfile.hdf5', 'r') as file_obj:
+ ... # do stuff with file_obj
+ ...
+ >>> # file_obj is closed at end of block
+
+.. note::
+
+ In addition to the methods and properties listed below, File objects also
+ have all the methods and properties of :class:`Group` objects. In this
+ case the group in question is the HDF5 *root group* (``/``).
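+
+For example, Group methods work directly on a File object (a minimal
+sketch)::
+
+ >>> grp = file_obj.create_group('subgroup') # created in the root group
+ >>> 'subgroup' in file_obj
+ True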
+
+Reference
+---------
+
+.. class:: File
+
+ Represents an HDF5 file on disk.
+
+ .. attribute:: name
+
+ HDF5 filename
+
+ .. attribute:: mode
+
+ Mode used to open file
+
+ .. method:: __init__(name, mode='a')
+
+ Open or create an HDF5 file.
+
+ .. method:: close()
+
+ Close the file. You MUST do this before quitting Python or data may
+ be lost.
+
+ .. method:: flush()
+
+ Ask the HDF5 library to flush its buffers for this file.
+
+
Groups
======
+Groups are the container mechanism by which HDF5 files are organized. From
+a Python perspective, they operate somewhat like dictionaries. In this case
+the "keys" are the names of group entries, and the "values" are the entries
+themselves (:class:`Group` and :class:`Dataset` objects). Objects are
+retrieved from the file using the standard indexing notation::
+
+ >>> file_obj = File('myfile.hdf5')
+ >>> subgroup = file_obj['/subgroup']
+ >>> dset = subgroup['MyDataset'] # full name /subgroup/MyDataset
+
+Objects can be deleted from the file using the standard syntax::
+
+ >>> del subgroup["MyDataset"]
+
+However, new groups and datasets should generally be created using method calls
+like :meth:`create_group <Group.create_group>` or
+:meth:`create_dataset <Group.create_dataset>`.
+Assigning a name to an existing Group or Dataset
+(e.g. ``group['name'] = another_group``) will create a new link in the file
+pointing to that object. Assigning dtypes and NumPy arrays results in
+different behavior; see :meth:`Group.__setitem__` for details.
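+
+For example (a brief sketch; ``subgroup`` as retrieved above)::
+
+ >>> file_obj['alias'] = subgroup # second hard link to the same group
+ >>> grp2 = file_obj['alias'] # opens the same HDF5 group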
+
+In addition, the following behavior approximates the Python dictionary API
+(see the example after this list):
+
+ - Container syntax (``if name in group``)
+ - Iteration yields member names (``for name in group``)
+ - Length (``len(group)``)
+ - :meth:`listnames <Group.listnames>`
+ - :meth:`iternames <Group.iternames>`
+ - :meth:`listobjects <Group.listobjects>`
+ - :meth:`iterobjects <Group.iterobjects>`
+ - :meth:`listitems <Group.listitems>`
+ - :meth:`iteritems <Group.iteritems>`
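+
+For example (a minimal sketch; output illustrative)::
+
+ >>> len(file_obj) # number of members in the root group
+ 1
+ >>> list(file_obj) # iteration yields member names
+ ['subgroup']
+ >>> file_obj.listnames()
+ ['subgroup']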
+
+Reference
+---------
+
.. class:: Group
- .. method:: __getitem__(name)
+ .. attribute:: name
+
+ Full name of this group in the file (e.g. ``/grp/thisgroup``)
+
+ .. attribute:: attrs
+
+ Dictionary-like object which provides access to this group's
+ HDF5 attributes. See :ref:`attributes` for details.
+
+ .. method:: __getitem__(name) -> Group or Dataset
Open an object in this group.
@@ -20,19 +184,20 @@ Groups
The action taken depends on the type of object assigned:
- Named HDF5 object (Dataset, Group, Datatype)
+ **Named HDF5 object** (Dataset, Group, Datatype)
A hard link is created in this group which points to the
given object.
- Numpy ndarray
+ **Numpy ndarray**
The array is converted to a dataset object, with default
settings (contiguous storage, etc.). See :meth:`create_dataset`
for a more flexible way to do this.
- Numpy dtype
- Commit a copy of the datatype as a named datatype in the file.
+ **Numpy dtype**
+ Commit a copy of the datatype as a
+ :ref:`named datatype <named_types>` in the file.
- Anything else
+ **Anything else**
Attempt to convert it to an ndarray and store it. Scalar
values are stored as scalar datasets. Raise ValueError if we
can't understand the resulting array dtype.
@@ -44,18 +209,6 @@ Groups
Remove (unlink) this member.
- .. method:: __len__
-
- Number of group members
-
- .. method:: __iter__
-
- Yields the names of group members
-
- .. method:: __contains__(name)
-
- See if the given name is in this group.
-
.. method:: create_group(name) -> Group
Create a new HDF5 group.
@@ -137,7 +290,9 @@ Groups
Destination. Must be either Group or path. If a Group object, it may
be in a different file.
- .. method:: visit(func)
+ **Only available with HDF5 1.8.X**
+
+ .. method:: visit(func) -> None or return value from func
Recursively iterate a callable over objects in this group.
@@ -145,7 +300,7 @@ Groups
will be called exactly once for each link in this group and every
group below it. Your callable must conform to the signature::
- func(<member name>) => <None or return value>
+ func(<member name>) -> <None or return value>
Returning None continues iteration, returning anything else stops
and immediately returns that value from the visit method. No
@@ -160,7 +315,7 @@ Groups
**Only available with HDF5 1.8.X.**
- .. method:: visititems(func)
+ .. method:: visititems(func) -> None or return value from func
Recursively visit names and objects in this group and subgroups.
@@ -168,7 +323,7 @@ Groups
will be called exactly once for each link in this group and every
group below it. Your callable must conform to the signature::
- func(<member name>, <object>) => <None or return value>
+ func(<member name>, <object>) -> <None or return value>
Returning None continues iteration, returning anything else stops
and immediately returns that value from the visit method. No
@@ -187,6 +342,18 @@ Groups
**Only available with HDF5 1.8.X.**
+ .. method:: __len__
+
+ Number of group members
+
+ .. method:: __iter__
+
+ Yields the names of group members
+
+ .. method:: __contains__(name)
+
+ See if the given name is in this group.
+
.. method:: listnames
Get a list of member names
@@ -212,9 +379,343 @@ Groups
Get an iterator over (name, object) pairs for the members of this group.
+Datasets
+========
+
+Datasets are where most of the information in an HDF5 file resides. Like
+NumPy arrays, they are homogeneous collections of data elements, with an
+immutable datatype and (hyper)rectangular shape. Unlike NumPy arrays, they
+support a variety of transparent storage features such as compression,
+error-detection, and chunked I/O.
+
+Metadata can be associated with an HDF5 dataset in the form of an "attribute".
+It's recommended that you use this scheme for any small bits of information
+you want to associate with the dataset. For example, a descriptive title,
+digitizer settings, or data collection time are appropriate things to store
+as HDF5 attributes.
+
+Datasets are created using either :meth:`Group.create_dataset` or
+:meth:`Group.require_dataset`. Existing datasets should be retrieved using
+the group indexing syntax (``dset = group["name"]``). Calling the constructor
+directly is not recommended.
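+
+For example (a brief sketch; ``grp`` is any open group)::
+
+ >>> dset = grp.create_dataset("MyDataset", (100, 100), '=f8')
+ >>> dset = grp["MyDataset"] # re-open it later by name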
+
+A subset of the NumPy indexing techniques is supported, including the
+traditional extended-slice syntax, named-field access, and boolean arrays.
+Discrete coordinate selections are also supported via a special indexer class.
+
+Properties
+----------
+
+Like Numpy arrays, Dataset objects have attributes named "shape" and "dtype":
+
+ >>> dset.dtype
+ dtype('complex64')
+ >>> dset.shape
+ (4L, 5L)
+
+.. _slicing_access:
+
+Slicing access
+--------------
+
+The best way to get at data is to use the traditional NumPy extended-slicing
+syntax. Slice specifications are translated directly to HDF5 *hyperslab*
+selections, and are a fast and efficient way to access data in the file.
+The following slicing arguments are recognized:
+
+ * Numbers: anything that can be converted to a Python long
+ * Slice objects: please note negative values are not allowed
+ * Field names, in the case of compound data
+ * At most one ``Ellipsis`` (``...``) object
+
+Here are a few examples (output omitted):
+
+ >>> dset = f.create_dataset("MyDataset", data=numpy.ones((10,10,10),'=f8'))
+ >>> dset[0,0,0]
+ >>> dset[0,2:10,1:9:3]
+ >>> dset[0,...]
+ >>> dset[:,::2,5]
+
+Simple array broadcasting is also supported:
+
+ >>> dset[0] # Equivalent to dset[0,...]
+
+For compound data, you can specify multiple field names alongside the
+numeric slices:
+
+ >>> dset["FieldA"]
+ >>> dset[0,:,4:5, "FieldA", "FieldB"]
+ >>> dset[0, ..., "FieldC"]
+
+Advanced indexing
+-----------------
+
+Boolean "mask" arrays can also be used to specify a selection. The result of
+this operation is a 1-D array with elements arranged in the standard NumPy
+(C-style) order:
+
+ >>> arr = numpy.random.random((10,10))
+ >>> dset = f.create_dataset("MyDataset", data=arr)
+ >>> result = dset[arr > 0.5]
+
+If you have a set of discrete points you want to access, you may not want to go
+through the overhead of creating a boolean mask. This is especially the case
+for large datasets, where even a byte-valued mask may not fit in memory. You
+can pass a list of points to the dataset selector via a custom "CoordsList"
+instance:
+
+ >>> mycoords = [ (0,0), (3,4), (7,8), (3,5), (4,5) ]
+ >>> coords_list = CoordsList(mycoords)
+ >>> result = dset[coords_list]
+
+Like boolean-array indexing, the result is a 1-D array. The order in which
+points are selected is preserved.
+
+.. note::
+ Boolean-mask and CoordsList indexing rely on an HDF5 construct which
+ explicitly enumerates the points to be selected. It's very flexible but
+ most appropriate for reasonably-sized (or sparse) selections. The
+ coordinate list takes at
+ least 8*<rank> bytes per point, and may need to be internally copied. For
+ example, it takes 40MB to express a 1-million point selection on a rank-3
+ array. Be careful, especially with boolean masks.
+
+Value attribute and scalar datasets
+-----------------------------------
+
+HDF5 allows you to store "scalar" datasets. These have the shape "()". You
+can use the syntax ``dset[...]`` to recover the value as a 0-dimensional
+array. Also, the special attribute ``value`` will return a scalar for a 0-dim
+array, and a full n-dimensional array for all other cases:
+
+ >>> f["ArrayDS"] = numpy.ones((2,2))
+ >>> f["ScalarDS"] = 1.0
+ >>> f["ArrayDS"].value
+ array([[ 1., 1.],
+ [ 1., 1.]])
+ >>> f["ScalarDS"].value
+ 1.0
+
+Extending Datasets
+------------------
+
+If the dataset is created with the *maxshape* option set, you can later expand
+its size. Simply call the *extend* method:
+
+ >>> dset = f.create_dataset("MyDataset", (5,5), maxshape=(None,None))
+ >>> dset.shape
+ (5, 5)
+ >>> dset.extend((15,20))
+ >>> dset.shape
+ (15, 20)
+
+Length and iteration
+--------------------
+
+As with NumPy arrays, the ``len()`` of a dataset is the length of the first
+axis. Since Python's ``len`` is limited by the size of a C long, it's
+recommended you use the syntax ``dataset.len()`` instead of ``len(dataset)``
+on 32-bit platforms, if you expect the length of the first axis to exceed 2**32.
+
+Iterating over a dataset iterates over the first axis. As with NumPy arrays,
+mutating the yielded data has no effect.
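+
+For example (illustrative)::
+
+ >>> dset = f.create_dataset("Rows", data=numpy.ones((3, 5)))
+ >>> [row.shape for row in dset] # iterates over the first axis
+ [(5,), (5,), (5,)]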
+
+Reference
+---------
+
+.. class:: Dataset
+
+ Represents an HDF5 dataset. All properties are read-only.
+
+ .. attribute:: name
+
+ Full name of this dataset in the file (e.g. ``/grp/MyDataset``)
+
+ .. attribute:: attrs
+
+ Provides access to HDF5 attributes; see :ref:`attributes`.
+
+ .. attribute:: shape
+
+ Numpy-style shape tuple with dataset dimensions
+
+ .. attribute:: dtype
+
+ Numpy dtype object representing the dataset type
+
+ .. attribute:: value
+
+ Special read-only property; for a regular dataset, it's equivalent to
+ dset[:] (an ndarray with all points), but for a scalar dataset, it's
+ a NumPy scalar instead of a 0-dimensional ndarray.
+
+ .. attribute:: chunks
+
+ Dataset chunk size, or None if chunked layout isn't used.
+
+ .. attribute:: compression
+
+ GZIP compression level, or None if compression isn't used.
+
+ .. attribute:: shuffle
+
+ Is the shuffle filter being used? (T/F)
+
+ .. attribute:: fletcher32
+
+ Is the fletcher32 filter (error detection) being used? (T/F)
+
+ .. attribute:: maxshape
+
+ Maximum allowed size of the dataset, as specified when it was created.
+
+ .. method:: __getitem__(*args) -> NumPy ndarray
+
+ Read a slice from the dataset. See :ref:`slicing_access`.
+
+ .. method:: __setitem__(*args, val)
+
+ Write to the dataset. See :ref:`slicing_access`.
+
+ .. method:: extend(shape)
+
+ Expand the size of the dataset to this new shape. Must be compatible
+ with the *maxshape* as specified when the dataset was created.
+
+ .. method:: __len__
+
+ The length of the first axis in the dataset (TypeError if scalar).
+ This **does not work** on 32-bit platforms if the axis in question
+ is longer than 2**32. Use :meth:`len` instead.
+
+ .. method:: len()
+
+ The length of the first axis in the dataset (TypeError if scalar).
+ Works on all platforms.
+
+ .. method:: __iter__
+
+ Iterate over rows (first axis) in the dataset. TypeError if scalar.
+
+
+.. _attributes:
+
+Attributes
+==========
+
+Groups and datasets can have small bits of named information attached to them.
+This is the official way to store metadata in HDF5. Each of these objects
+has a small proxy object (:class:`AttributeManager`) attached to it as
+``<obj>.attrs``. This dictionary-like object works like a :class:`Group`
+object, with the following differences:
+
+ - Entries may only be scalars and NumPy arrays
+ - Each attribute must be small (recommended < 64k for HDF5 1.6)
+ - No partial I/O (i.e. slicing) is allowed for arrays
+
+They support the same dictionary API as groups, including the following
+(see the example after this list):
+
+ - Container syntax (``if name in obj.attrs``)
+ - Iteration yields member names (``for name in obj.attrs``)
+ - Number of attributes (``len(obj.attrs)``)
+ - :meth:`listnames <AttributeManager.listnames>`
+ - :meth:`iternames <AttributeManager.iternames>`
+ - :meth:`listobjects <AttributeManager.listobjects>`
+ - :meth:`iterobjects <AttributeManager.iterobjects>`
+ - :meth:`listitems <AttributeManager.listitems>`
+ - :meth:`iteritems <AttributeManager.iteritems>`
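+
+For example (a minimal sketch; output illustrative)::
+
+ >>> dset.attrs['temp'] = 23.5
+ >>> 'temp' in dset.attrs
+ True
+ >>> dset.attrs.listitems()
+ [('temp', 23.5)]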
+
+Reference
+---------
+
+.. class:: AttributeManager
+
+ .. method:: __getitem__(name) -> NumPy scalar or ndarray
+
+ Retrieve an attribute given a string name.
+
+ .. method:: __setitem__(name, value)
+
+ Set an attribute. Value must be convertible to a NumPy scalar
+ or array.
+
+ .. method:: __delitem__(name)
+
+ Delete an attribute.
+
+ .. method:: __len__
+
+ Number of attributes
+
+ .. method:: __iter__
+
+ Yields the names of attributes
+
+ .. method:: __contains__(name)
+
+ See if the given attribute is present
+
+ .. method:: listnames
+
+ Get a list of attribute names
+
+ .. method:: iternames
+
+ Get an iterator over attribute names
+
+ .. method:: listobjects
+
+ Get a list with all attribute values
+
+ .. method:: iterobjects
+
+ Get an iterator over attribute values
+
+ .. method:: listitems
+
+ Get a list of (name, value) pairs for all attributes.
+
+ .. method:: iteritems
+
+ Get an iterator over (name, value) pairs
+
+.. _named_types:
+
+Named types
+===========
+
+There is one last kind of object stored in an HDF5 file. You can store
+datatypes (not associated with any dataset) in a group, simply by assigning
+a NumPy dtype to a name::
+
+ >>> group["name"] = numpy.dtype("<f8")
+
+and to get it back::
+
+ >>> named_type = group["name"]
+ >>> mytype = named_type.dtype
+
+Objects of this class are immutable and have no methods, just read-only
+properties.
+
+Reference
+---------
+
+.. class:: Datatype
+
+ .. attribute:: name
+
+ Full name of this object in the HDF5 file (e.g. ``/grp/MyType``)
+
+ .. attribute:: attrs
+
+ Attributes of this object (see :ref:`attributes section <attributes>`)
+
+ .. attribute:: dtype
+
+ NumPy dtype representation of this type
+
diff --git a/docs/source/guide/index.rst b/docs/source/guide/index.rst
index fbd9413..6ed0f3a 100644
--- a/docs/source/guide/index.rst
+++ b/docs/source/guide/index.rst
@@ -10,9 +10,7 @@ User Guide
build
quick
- datasets
hl
- threads
licenses
diff --git a/docs/source/guide/quick.rst b/docs/source/guide/quick.rst
index 0ce2108..17b9e1c 100644
--- a/docs/source/guide/quick.rst
+++ b/docs/source/guide/quick.rst
@@ -1,3 +1,5 @@
+.. _quick:
+
*****************
Quick Start Guide
*****************
diff --git a/docs/source/guide/threads.rst b/docs/source/guide/threads.rst
deleted file mode 100644
index 9cb8352..0000000
--- a/docs/source/guide/threads.rst
+++ /dev/null
@@ -1,89 +0,0 @@
-*********
-Threading
-*********
-
-Threading is an issue in h5py because HDF5 doesn't support thread-level
-concurrency. Some versions of HDF5 are not even thread-safe. The package
-tries to hide as many of these problems as possible using a combination of
-the GIL and Python-side reentrant locks.
-
-High-level
-----------
-
-The objects in h5py.highlevel (File, Dataset, etc) are always thread-safe. You
-don't need to do any explicit locking, regardless of how the library is
-configured.
-
-Low-level
----------
-
-The low-level API (h5py.h5*) is also thread-safe, unless you use the
-experimental non-blocking option to compile h5py. Then, and only then, you
-must acquire a global lock before calling into the low-level API. This lock
-is available on the global configuration object at "h5py.config.lock". The
-decorator "h5sync" in h5py.extras can wrap functions to do this automatically.
-
-
-Non-Blocking Routines
----------------------
-
-By default, all low-level HDF5 routines will lock the entire interpreter
-until they complete, even in the case of lengthy I/O operations. This is
-unnecessarily restrictive, as it means even non-HDF5 threads cannot execute
-while a lengthy HDF5 read or write is in progress.
-
-When the package is compiled with the option ``--io-nonblock``, a few C methods
-involving I/O will release the global interpreter lock. These methods always
-acquire the global HDF5 lock before yielding control to other threads. While
-another thread seeking to acquire the HDF5 lock will block until the write
-completes, other Python threads (GUIs, pure computation threads, etc) will
-execute in a normal fashion.
-
-However, this defeats the thread safety provided by the GIL. If another thread
-skips acquiring the HDF5 lock and blindly calls a low-level HDF5 routine while
-such I/O is in progress, the results are undefined. In the worst case,
-irreversible data corruption and/or a crash of the interpreter is possible.
-
-**You must acquire the global lock (h5py.config.lock) when all the following
-are true:**
-
- 1. You are using the low-level (h5py.h5*) API
- 2. More than one thread is performing HDF5 operations
- 3. Non-blocking I/O (``--io-nonblock``) is enabled
-
-This is not an issue for the h5py.highlevel components (Dataset, Group,
-File objects, etc.) as they acquire the lock automatically.
-
-The following operations will release the GIL during I/O:
-
- * DatasetID.read
- * DatasetID.write
-
-
-Customizing the lock type
--------------------------
-
-Applications that use h5py may have their own threading systems. Since the
-h5py locks are acquired and released alongside application code, you can
-set the type of lock used internally by h5py. The lock is stored as the settable
-property "h5py.config.lock" and should be a lock instance (not a constructor)
-which provides the following methods:
-
- * __enter__
- * __exit__
- * acquire
- * release
-
-The default lock type is the native Python threading.RLock, but h5py makes no
-assumptions about the behavior or implementation of locks beyond reentrance and
-the existence of the four required methods above.
-
-It remains to be seen whether this is even necessary. In future versions of
-h5py, this attribute may disappear or become non-writable.
-
-
-
-
-
-
-
diff --git a/h5py/highlevel.py b/h5py/highlevel.py
index 947731c..fb9cbc2 100644
--- a/h5py/highlevel.py
+++ b/h5py/highlevel.py
@@ -73,7 +73,7 @@ class LockableObject(object):
Base class which provides rudimentary locking support.
"""
- _lock = h5.get_phil()
+ _lock = threading.RLock()
class HLObject(LockableObject):
diff --git a/setup.py b/setup.py
index 0a4c560..7c5d8c7 100644
--- a/setup.py
+++ b/setup.py
@@ -48,7 +48,7 @@ from distutils.cmd import Command
# Basic package options
NAME = 'h5py'
-VERSION = '0.4.0'
+VERSION = '1.0.0'
MIN_NUMPY = '1.0.3'
MIN_CYTHON = '0.9.8.1.1'
SRC_PATH = 'h5py' # Name of directory with .pyx files
@@ -60,6 +60,13 @@ MODULES = {16: ['h5', 'h5f', 'h5g', 'h5s', 'h5t', 'h5d', 'h5a', 'h5p', 'h5z',
18: ['h5', 'h5f', 'h5g', 'h5s', 'h5t', 'h5d', 'h5a', 'h5p', 'h5z',
'h5i', 'h5r', 'h5fd', 'utils', 'h5o', 'h5l']}
+def version_check(vers, required):
+
+ def _tpl(istr):
+ return tuple(int(x) for x in istr.split('.'))
+
+ return _tpl(vers) >= _tpl(required)
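+
+# Example (illustrative): dotted version strings compare elementwise as
+# integer tuples, so version_check('0.9.9', '0.9.8.1.1') is True, while
+# version_check('0.9.8', '0.9.8.1.1') is False (a shorter tuple compares less).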
+
def fatal(instring, code=1):
print >> sys.stderr, "Fatal: "+instring
exit(code)
@@ -274,8 +281,8 @@ DEF H5PY_THREADS = %(THREADS)d # Enable thread-safety and non-blocking reads
except ImportError:
fatal("Cython recompilation required, but Cython >=%s not installed." % MIN_CYTHON)
- if Version.version < MIN_CYTHON:
- fatal("Old Cython version detected; at least %s required" % MIN_CYTHON)
+ if not version_check(Version.version, MIN_CYTHON):
+ fatal("Old Cython %s version detected; at least %s required" % (Version.version, MIN_CYTHON))
print "Running Cython (%s)..." % Version.version
print " API level: %d" % self.api
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/h5py.git