[h5py] 286/455: Major docs update
Ghislain Vaillant
ghisvail-guest at moszumanska.debian.org
Thu Jul 2 18:19:42 UTC 2015
This is an automated email from the git hooks/post-receive script.
ghisvail-guest pushed a commit to annotated tag 1.3.0
in repository h5py.
commit 4ce13ecf8ca19f0704fea982a8daac7857bb3dcf
Author: andrewcollette <andrew.collette at gmail.com>
Date: Thu Jun 18 23:22:06 2009 +0000
Major docs update
---
docs/source/api/low/h5.rst | 13 --------
docs/source/conf.py | 4 +--
docs/source/guide/attr.rst | 7 ++--
docs/source/guide/build.rst | 22 +++++++++----
docs/source/guide/dataset.rst | 73 +++++++++++++++++++++++++++--------------
docs/source/guide/file.rst | 21 +++++++++---
docs/source/guide/group.rst | 53 ++++++++++++++++++++++++------
docs/source/guide/quick.rst | 76 +++++++++++++++++--------------------------
docs/source/guide/vl.rst | 64 +++++++++++++++++++++++++++---------
setup.py | 2 +-
10 files changed, 208 insertions(+), 127 deletions(-)
diff --git a/docs/source/api/low/h5.rst b/docs/source/api/low/h5.rst
index 4d0c430..30e0dd1 100644
--- a/docs/source/api/low/h5.rst
+++ b/docs/source/api/low/h5.rst
@@ -21,19 +21,6 @@ Base classes for library
.. autoclass:: PHIL
-Error handling routines
------------------------
-
-Keep in mind that errors are already handled by Python exceptions. These
-functions exist for low-level inspection of the HDF5 error stack.
-
-.. autofunction:: error_stack
-.. autofunction:: error_string
-.. autofunction:: clear
-
-.. autoclass:: ErrorStackElement
- :members:
-
Module constants
----------------
diff --git a/docs/source/conf.py b/docs/source/conf.py
index a2d245a..b6c0974 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -48,7 +48,7 @@ copyright = '2008, Andrew Collette'
# The short X.Y version.
version = '1.2'
# The full version, including alpha/beta/rc tags.
-release = '1.2.0-beta'
+release = '1.2.0'
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
@@ -91,7 +91,7 @@ html_style = 'h5py.css'
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
-html_title = "HDF5 for Python 1.2 BETA"
+html_title = "HDF5 for Python 1.2"
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
diff --git a/docs/source/guide/attr.rst b/docs/source/guide/attr.rst
index db9ae5e..b07c6c8 100644
--- a/docs/source/guide/attr.rst
+++ b/docs/source/guide/attr.rst
@@ -7,12 +7,11 @@ Attributes
Groups and datasets can have small bits of named information attached to them.
This is the official way to store metadata in HDF5. Each of these objects
has a small proxy object (:class:`AttributeManager`) attached to it as
-``<obj>.attrs``. This dictionary-like object works like a :class:`Group`
-object, with the following differences:
+``<obj>.attrs``. Attributes have the following properties:
-- Entries may only be scalars and NumPy arrays
+- They may be created from any scalar or NumPy array
- Each attribute must be small (recommended < 64k for HDF5 1.6)
-- No partial I/O (i.e. slicing) is allowed for arrays
+- There is no partial I/O (i.e. slicing); the entire attribute must be read.
They support the same dictionary API as groups.
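A minimal sketch of the dictionary-style attribute API described above (the file path, dataset name, and attribute names here are invented for illustration; it assumes h5py and NumPy are installed):

```python
import os
import tempfile

import numpy as np
import h5py

# Hypothetical file name; any writable path will do.
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.hdf5")
f = h5py.File(path, "w")
dset = f.create_dataset("MyDataset", (10,), dtype="i")

# Attributes use dictionary-style syntax, just like groups.
dset.attrs["units"] = "meters"        # a scalar
dset.attrs["offsets"] = np.arange(3)  # a small NumPy array

units = dset.attrs["units"]           # the whole attribute is read at once
names = sorted(dset.attrs)            # dictionary-style iteration over names
f.close()
```

Note that there is no way to read only part of `offsets`: as the text says, the entire attribute comes back in one operation.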
diff --git a/docs/source/guide/build.rst b/docs/source/guide/build.rst
index c18dbac..06057b2 100644
--- a/docs/source/guide/build.rst
+++ b/docs/source/guide/build.rst
@@ -12,6 +12,7 @@ Tar files are available for UNIX-like systems (Linux and Mac OS-X), and
a binary installer for Windows which includes HDF5 1.8. As of version 1.1,
h5py can also be installed via easy_install.
+
Getting HDF5
============
@@ -122,13 +123,22 @@ The standard command::
will clean up all temporary files, including the output of ``configure``.
-Problems
-========
+Testing
+=======
+
+Running unit tests can help diagnose problems unique to your platform or
+software configuration. For the Unix version of h5py, running the command:
+
+ $ python setup.py test
+
+before installing will run the h5py test suite. On both Unix and Windows
+platforms, the tests may be run after installation:
+
+ >>> import h5py.tests
+ >>> h5py.tests.runtests()
-If you have trouble installing or using h5py, first read the FAQ at
-http://h5py.googlecode.com for common issues. You are also welcome to
-open a new bug there, or email me directly at "h5py at alfven dot org".
-Enjoy!
+Please report any failing tests to "h5py at alfven dot org", or file an issue
+report at http://h5py.googlecode.com.
diff --git a/docs/source/guide/dataset.rst b/docs/source/guide/dataset.rst
index e9a6775..6ad52ed 100644
--- a/docs/source/guide/dataset.rst
+++ b/docs/source/guide/dataset.rst
@@ -50,7 +50,7 @@ Error-Detection
All versions of HDF5 include the *fletcher32* checksum filter, which enables
read-time error detection for datasets. If part of a dataset becomes
corrupted, a read operation on that section will immediately fail with
- H5Error.
+ an exception.
Resizing
When using HDF5 1.8,
@@ -90,15 +90,14 @@ The following slicing arguments are recognized:
Here are a few examples (output omitted)
- >>> dset = f.create_dataset("MyDataset", data=numpy.ones((10,10,10),'=f8'))
+ >>> dset = f.create_dataset("MyDataset", (10,10,10), 'f')
>>> dset[0,0,0]
>>> dset[0,2:10,1:9:3]
- >>> dset[0,...]
>>> dset[:,::2,5]
-
-Simple array broadcasting is also supported:
-
- >>> dset[0] # Equivalent to dset[0,...]
+ >>> dset[0]
+ >>> dset[1,5]
+ >>> dset[0,...]
+ >>> dset[...,6]
For compound data, you can specify multiple field names alongside the
numeric slices:
@@ -107,6 +106,27 @@ numeric slices:
>>> dset[0,:,4:5, "FieldA", "FieldB"]
>>> dset[0, ..., "FieldC"]
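As a runnable sketch of field-name access (the field names `FieldA`/`FieldB` and the dataset name are made up; single-field reads work as shown, though support for multi-field selections may vary between h5py versions):

```python
import os
import tempfile

import numpy as np
import h5py

path = os.path.join(tempfile.mkdtemp(), "compound_demo.hdf5")
f = h5py.File(path, "w")

# Hypothetical compound type with two named fields.
dt = np.dtype([("FieldA", "f4"), ("FieldB", "i4")])
dset = f.create_dataset("Compound", (5,), dtype=dt)

data = np.zeros((5,), dtype=dt)
data["FieldB"] = np.arange(5)
dset[...] = data

# Reading a single named field yields a plain array of that field's type.
col = dset["FieldB"]
f.close()
```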
+Broadcasting
+------------
+
+For simple slicing, broadcasting is supported:
+
+ >>> dset[0,:,:] = np.arange(10) # Broadcasts to (10,10)
+
+Importantly, h5py does *not* use NumPy to do broadcasting before the write.
+Broadcasting is implemented using repeated hyperslab selections, and is
+safe to use with very large target selections. In the following example, a
+write from a (1000, 1000) array is broadcast to a (1000, 1000, 1000) target
+selection as a series of 1000 writes:
+
+ >>> dset2 = f.create_dataset("MyDataset", (1000,1000,1000), 'f')
+ >>> data = np.arange(1000*1000, dtype='f').reshape((1000,1000))
+ >>> dset2[:] = data # Does NOT allocate 3.8 G of memory
+
+Broadcasting is supported for "simple" (integer, slice and ellipsis) slicing
+only.
+
+
Coordinate lists
----------------
@@ -136,7 +156,7 @@ Sparse selection
Additional mechanisms exist for the case of scattered and/or sparse selection,
for which slab or row-based techniques may not be appropriate.
-Boolean "mask" arrays can be used to specify a selection. The result of
+NumPy boolean "mask" arrays can be used to specify a selection. The result of
this operation is a 1-D array with elements arranged in the standard NumPy
(C-style) order:
@@ -146,24 +166,33 @@ this operation is a 1-D array with elements arranged in the standard NumPy
>>> result.shape
(49,)
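A self-contained version of the mask selection above (the dataset contents are chosen so that exactly 49 elements exceed 50, matching the `(49,)` result shown):

```python
import os
import tempfile

import numpy as np
import h5py

path = os.path.join(tempfile.mkdtemp(), "mask_demo.hdf5")
f = h5py.File(path, "w")
dset = f.create_dataset("MyDataset", data=np.arange(100).reshape((10, 10)))

# A boolean mask picks out scattered elements; the result is always a
# 1-D array in C (row-major) order.
result = dset[dset[...] > 50]
f.close()
```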
-Advanced selection
-------------------
+In addition, the ``selections`` module contains classes which provide
+access to native HDF5 dataspace selection techniques. These include
+explicit point-based selection and hyperslab selections combined with logical
+operations (AND, OR, XOR, etc.). Any instance of a ``selections.Selection``
+subclass can be used for indexing directly:
-The ``selections`` module contains additional classes which provide access to
-HDF5 dataspace selection techniques, including point-based selection. These
-are especially useful for read_direct and write_direct.
+ >>> dset = f.create_dataset("MyDS2", (100,100), 'i')
+ >>> dset[...] = np.arange(100*100).reshape((100,100))
+ >>> sel = h5py.selections.PointSelection((100,100))
+ >>> sel.append([(1,1), (57,82)])
+ >>> dset[sel]
+ array([ 101, 5782])
Length and iteration
--------------------
As with NumPy arrays, the ``len()`` of a dataset is the length of the first
-axis. Since Python's ``len`` is limited by the size of a C long, it's
-recommended you use the syntax ``dataset.len()`` instead of ``len(dataset)``
-on 32-bit platforms, if you expect the length of the first row to exceed 2**32.
+axis, and iterating over a dataset iterates over the first axis. However,
+modifications to the yielded data are not recorded in the file. Resizing a
+dataset while iterating has undefined results.
-Iterating over a dataset iterates over the first axis. However, modifications
-to the yielded data are not recorded in the file. Resizing a dataset while
-iterating has undefined results.
+.. note::
+
+ Since Python's ``len`` is limited by the size of a C long, it's
+ recommended that you use the syntax ``dataset.len()`` instead of
+ ``len(dataset)`` on 32-bit platforms, if you expect the length of the
+ first axis to exceed 2**32.
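The length and iteration behavior can be sketched as follows (file and dataset names are invented; note the yielded rows are in-memory copies, so modifying them does not touch the file):

```python
import os
import tempfile

import numpy as np
import h5py

path = os.path.join(tempfile.mkdtemp(), "iter_demo.hdf5")
f = h5py.File(path, "w")
dset = f.create_dataset("MyDataset", data=np.arange(12).reshape((4, 3)))

n = len(dset)        # length of the first axis
n2 = dset.len()      # same value; safe for very long axes on 32-bit builds

# Iterating over a dataset yields one first-axis slice (row) at a time.
row_sums = [int(row.sum()) for row in dset]
f.close()
```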
Reference
---------
@@ -196,12 +225,6 @@ Reference
Numpy dtype object representing the dataset type
- .. attribute:: value
-
- Special read-only property; for a regular dataset, it's equivalent to
- dset[:] (an ndarray with all points), but for a scalar dataset, it's
- a NumPy scalar instead of an 0-dimensional ndarray.
-
.. attribute:: chunks
Dataset chunk size, or None if chunked layout isn't used.
diff --git a/docs/source/guide/file.rst b/docs/source/guide/file.rst
index e8dc02e..2606c55 100644
--- a/docs/source/guide/file.rst
+++ b/docs/source/guide/file.rst
@@ -23,6 +23,8 @@ Valid modes are:
a Read/write if exists, create otherwise (default)
=== ================================================
+The file name may also be a Unicode string.
+
File drivers
------------
@@ -39,17 +41,19 @@ supported drivers are:
None
Use the standard HDF5 driver appropriate for the current platform.
On UNIX, this is the H5FD_SEC2 driver; on Windows, it is
- H5FD_WINDOWS.
+ H5FD_WINDOWS. This driver is almost always the best choice.
'sec2'
- Unbuffered, optimized I/O using standard POSIX functions.
+ Optimized I/O using standard POSIX functions. Default on UNIX platforms.
'stdio'
- Buffered I/O using functions from stdio.h.
+ I/O uses functions from stdio.h. This introduces an additional layer
+ of buffering between the HDF5 library and the filesystem.
'core'
- Memory-map the entire file; all operations are performed in
- memory and written back out when the file is closed. Keywords:
+ Creates a memory-resident file. With HDF5 1.8, you may specify an
+ existing file on disk. When the file is closed, by default it is
+ written back to disk with the given name. Keywords:
backing_store
If True (default), save changes to a real file
@@ -76,6 +80,13 @@ In addition to the properties and methods defined here, File objects inherit
the full API of Group objects; in this case, the group in question is the
*root group* (/) of the file.
+.. note::
+
+ Unlike Python file objects (and h5py.File objects from h5py 1.1), the
+ attribute ``File.name`` does *not* refer to the file name on disk.
+ ``File.name`` gives the HDF5 name of the root group, "``/``". To access
+ the on-disk name, use ``File.filename``.
+
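The distinction between ``File.name`` and ``File.filename`` can be demonstrated directly (the path is invented):

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "name_demo.hdf5")
f = h5py.File(path, "w")

root_name = f.name       # '/', the HDF5 name of the root group
disk_name = f.filename   # the actual name of the file on disk
f.close()
```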
.. class:: File
Represents an HDF5 file on disk, and provides access to the root
diff --git a/docs/source/guide/group.rst b/docs/source/guide/group.rst
index cdccb7b..aa08204 100644
--- a/docs/source/guide/group.rst
+++ b/docs/source/guide/group.rst
@@ -2,15 +2,49 @@
Group Objects
=============
+Creating and using groups
+-------------------------
+
Groups are the container mechanism by which HDF5 files are organized. From
a Python perspective, they operate somewhat like dictionaries. In this case
the "keys" are the names of group entries, and the "values" are the entries
-themselves (:class:`Group` and :class:`Dataset`) objects. Objects are
-retrieved from the file using the standard indexing notation::
+themselves (:class:`Group` and :class:`Dataset` objects).
+
+Group objects also contain most of the machinery which makes HDF5 useful.
+The :ref:`File object <hlfile>` does double duty as the HDF5 `root group`, and
+serves as your entry point into the file:
+
+ >>> f = h5py.File('foo.hdf5','w')
+ >>> f.name
+ '/'
+ >>> f.keys()
+ []
+
+New groups are easy to create:
+
+ >>> grp = f.create_group("bar")
+ >>> grp.name
+ '/bar'
+ >>> subgrp = grp.create_group("baz")
+ >>> subgrp.name
+ '/bar/baz'
+
+Datasets are also created by a Group method:
+
+ >>> dset = subgrp.create_dataset("MyDS", (100,100), dtype='i')
+ >>> dset.name
+ '/bar/baz/MyDS'
+
+Accessing objects
+-----------------
+
+Groups implement a subset of the Python dictionary convention. They have
+methods like ``keys()`` and ``values()``, and they support iteration. Most
+importantly, they support the indexing syntax and standard exceptions:
- >>> file_obj = File('myfile.hdf5')
- >>> subgroup = file_obj['/subgroup']
- >>> dset = subgroup['MyDataset'] # full name /subgroup/Mydataset
+ >>> myds = subgrp["MyDS"]
+ >>> missing = subgrp["missing"]
+ KeyError: "Name doesn't exist (Symbol table: Object not found)"
Objects can be deleted from the file using the standard syntax::
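For instance, a minimal sketch of deletion (group and file names are made up):

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "del_demo.hdf5")
f = h5py.File(path, "w")
f.create_group("MyGroup")

del f["MyGroup"]              # unlink the object from the file
still_there = "MyGroup" in f  # membership testing now fails
f.close()
```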
@@ -74,8 +108,7 @@ Reference
for a more flexible way to do this.
**Numpy dtype**
- Commit a copy of the datatype as a
- :ref:`named datatype <named_types>` in the file.
+ Commit a copy of the datatype as a named type in the file.
**Anything else**
Attempt to convert it to an ndarray and store it. Scalar
@@ -93,13 +126,13 @@ Reference
Create a new HDF5 group.
- Fails with H5Error if the group already exists.
+ Fails with ValueError if the group already exists.
.. method:: require_group(name) -> Group
Open the specified HDF5 group, creating it if it doesn't exist.
- Fails with H5Error if an incompatible object (dataset or named type)
+ Fails with TypeError if an incompatible object (dataset or named type)
already exists.
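A sketch of the ``require_group`` behavior described here (names are invented; the conflict case assumes the TypeError behavior the text documents):

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "require_demo.hdf5")
f = h5py.File(path, "w")

grp1 = f.require_group("results")  # created, since it doesn't exist yet
grp2 = f.require_group("results")  # opened without error the second time
same = (grp1.name == grp2.name)

# Asking for a group where a dataset already lives raises TypeError.
f.create_dataset("data", (3,), dtype="i")
try:
    f.require_group("data")
    conflict = False
except TypeError:
    conflict = True
f.close()
```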
.. method:: create_dataset(name, [shape, [dtype]], [data], **kwds) -> Dataset
@@ -175,7 +208,7 @@ Reference
creating a dataset; they are ignored for the comparison.
If an existing incompatible object (Group or Datatype) already exists
- with the given name, fails with H5Error.
+ with the given name, fails with ValueError.
.. method:: copy(source, dest, name=None)
diff --git a/docs/source/guide/quick.rst b/docs/source/guide/quick.rst
index 1508589..70d4236 100644
--- a/docs/source/guide/quick.rst
+++ b/docs/source/guide/quick.rst
@@ -47,31 +47,28 @@ efficient multidimensional indexing and nested compound datatypes.
One additional benefit of h5py is that the files it reads and writes are
"plain-vanilla" HDF5 files. No Python-specific metadata or features are used.
-You can read HDF5 files created by any application, and write files that any
-HDF5-aware application can understand.
+You can read files created by most HDF5 applications, and write files that
+any HDF5-aware application can understand.
Getting data into HDF5
======================
First, install h5py by following the :ref:`installation instructions <build>`.
-Since an example is worth a thousand words, here's how to create a new file,
-create a dataset, and store some data::
+Since an example is worth a thousand words, here's how to make a new file,
+and create an integer dataset inside it. The new dataset has shape (100, 100),
+is located in the file at "/MyDataset", and is initialized to the value 42.
- import numpy as np
- import h5py
+ >>> import h5py
+ >>> f = h5py.File('myfile.hdf5')
+ >>> dset = f.create_dataset("MyDataset", (100, 100), 'i')
+ >>> dset[...] = 42
- mydata = np.arange(10).reshape((5,2))
+The :ref:`File <hlfile>` constructor accepts modes similar to Python file modes,
+including "r", "w", and "a" (the default):
- f = h5py.File('myfile.hdf5', 'w')
-
- dset = f.create_dataset("MyDataset", (10, 2), 'i')
-
- dset[0:5,:] = mydata
-
-
-The `File <hlfile>`_ constructor accepts modes similar to Python file modes,
-including "r", "w", and "a" (the default).
+ >>> f = h5py.File('file1.hdf5', 'w') # overwrite any existing file
+ >>> f = h5py.File('file2.hdf5', 'r') # open read-only
The dataset object ``dset`` here represents a new 2-d HDF5 dataset. Some
features will be familiar to NumPy users::
@@ -81,14 +78,14 @@ features will be familiar to NumPy users::
>>> dset.dtype
dtype('int32')
-If you already have a NumPy array you want to store, just hand it off to h5py::
-
- arr = numpy.ones((2,3), '=i4')
- dset = f.create_dataset('AnotherDataset', data=arr)
+You can even automatically create a dataset from an existing array:
-Additional features like transparent compression are also available::
+ >>> import numpy as np
+ >>> arr = np.ones((2,3), '=i4')
+ >>> dset = f.create_dataset('AnotherDataset', data=arr)
- dset2 = f.create_dataset("CompressedDatset", data=arr, compression='lzf')
+HDF5 datasets support many other features, like chunking and transparent
+compression.
Getting your data back
----------------------
@@ -150,22 +147,22 @@ POSIX-style paths::
Groups (including File objects; "f" in this example) support other
dictionary-like operations::
- >>> list(f) # iteration
+ >>> list(f)
['MyDataset', 'SubGroup']
- >>> 'MyDataset' in f # membership testing
+ >>> 'MyDataset' in f
True
- >>> 'Subgroup/MyOtherDataset' in f # even for arbitrary paths!
+ >>> 'Subgroup/MyOtherDataset' in f
True
- >>> del f['MyDataset'] # Delete (unlink) a group member
+ >>> del f['MyDataset']
As a safety feature, you can't create an object with a pre-existing name;
you have to manually delete the existing object first::
>>> grp = f.create_group("NewGroup")
- >>> grp2 = f.create_group("NewGroup") # wrong
- (H5Error raised)
+ >>> grp = f.create_group("NewGroup")
+ ValueError: Name already exists (Symbol table: Object already exists)
>>> del f['NewGroup']
- grp2 = f.create_group("NewGroup")
+ >>> grp = f.create_group("NewGroup")
This restriction reflects HDF5's lack of transactional support, and will not
change.
@@ -203,24 +200,11 @@ unlike group members, you can directly overwrite existing attributes:
>>> dset.attrs["Name"] = "New Name"
-Named datatypes
-===============
-
-There is in fact one additional, rarely-used kind of object which can be
-permanently stored in an HDF5 file. You can permanently store a *datatype*
-object in any group, simply by assigning a NumPy dtype to a name:
-
- >>> f["MyIntegerDatatype"] = numpy.dtype('<i8')
- >>> htype = f["MyIntegerDatatype"]
- >>> htype
- <HDF5 named type "MyIntegerDatatype" (dtype <i8)>
- >>> htype.dtype
- dtype('int64')
-
-This isn't ordinarily useful because each dataset already carries its own
-dtype attribute. However, if you want to store datatypes which are not used
-in any dataset, this is the right way to do it.
+More information
+================
+Full documentation on files, groups, datasets and attributes is available
+in the section ":ref:`h5pyreference`".
diff --git a/docs/source/guide/vl.rst b/docs/source/guide/vl.rst
index 191f2f3..8173dfc 100644
--- a/docs/source/guide/vl.rst
+++ b/docs/source/guide/vl.rst
@@ -1,14 +1,14 @@
-=========================
-VL and Enum types in h5py
-=========================
+=====================
+Special types in h5py
+=====================
HDF5 supports a few types which have no direct NumPy equivalent. Among the
most useful and widely used are *variable-length* (VL) types, and enumerated
types. As of version 1.2, h5py fully supports HDF5 enums, and has partial
support for VL types.
-VL strings
-----------
+Variable-length strings
+-----------------------
In HDF5, data in VL format is stored as arbitrary-length vectors of a base
type. In particular, strings are stored C-style in null-terminated buffers.
@@ -21,24 +21,22 @@ dtype. In h5py 1.2, variable-length strings are mapped to object arrays. A
small amount of metadata attached to an "O" dtype tells h5py that its contents
should be converted to VL strings when stored in the file.
-VL functions
-------------
Existing VL strings can be read and written to with no additional effort;
Python strings and fixed-length NumPy strings can be auto-converted to VL
data and stored. However, creating VL data requires the use of a special
-"hinted" dtype object. Two functions are provided at the package level to
-perform this function:
+"hinted" dtype object. Two functions are provided at the package level for
+this purpose:
- .. function:: h5py.new_vlen(basetype)
+.. function:: h5py.new_vlen(basetype) -> dtype
- Create a new object dtype which represents a VL type. Currently
- *basetype* must be the Python string type (str).
+ Create a new object dtype which represents a VL type. Currently
+ *basetype* must be the Python string type (str).
- .. function:: h5py.get_vlen(dtype)
+.. function:: h5py.get_vlen(dtype) -> dtype or None
- Get the base type of a variable-length dtype, or None if *dtype*
- doesn't represent a VL type.
+ Get the base type of a variable-length dtype, or None if *dtype*
+ doesn't represent a VL type.
Here's an example showing how to create a VL array of strings::
@@ -50,6 +48,42 @@ Here's an example showing how to create a VL array of strings::
>>> h5py.get_vlen(ds.dtype)
... <type 'str'>
+Enumerated types
+----------------
+
+HDF5 has the concept of an *enumerated type*, which is an integer datatype
+with a restriction to certain named values. Since NumPy has no such datatype,
+HDF5 ENUM types are read and written as integers. As with variable-length
+strings, you can create a new enumerated type from a NumPy integer base type
+by using convenience functions to attach a small amount of metadata:
+
+.. function:: h5py.new_enum(dtype, values) -> dtype
+
+ Create a new enumerated type, from a NumPy integer dtype and a dictionary
+ of {"name": value} pairs. Keys must be strings, and values must be
+ integers.
+
+.. function:: h5py.get_enum(dtype) -> dict or None
+
+ Extract the name/value dictionary from an existing enum dtype. Returns
+ None if the dtype does not contain metadata.
+
+Here's an example::
+
+ >>> dt = h5py.new_enum('i', {"RED": 0, "GREEN": 1, "BLUE": 42})
+ >>> h5py.get_enum(dt)
+ {'BLUE': 42, 'GREEN': 1, 'RED': 0}
+ >>> f = h5py.File('foo.hdf5','w')
+ >>> ds = f.create_dataset("EnumDS", (100,100), dtype=dt)
+ >>> ds.dtype.kind
+ 'i'
+ >>> ds[0,:] = 42
+ >>> ds[0,0]
+ 42
+ >>> ds[1,0]
+ 0
+
+
diff --git a/setup.py b/setup.py
index 2b89dd2..d0212cb 100644
--- a/setup.py
+++ b/setup.py
@@ -26,7 +26,7 @@ import os.path as op
import pickle
NAME = 'h5py'
-VERSION = '1.2.0-beta'
+VERSION = '1.2.0'
MIN_NUMPY = '1.0.3'
MIN_CYTHON = '0.9.8.1.1'
SRC_PATH = 'h5py' # Name of directory with .pyx files
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/h5py.git