[python-hdf5storage] 135/152: Added a development information page in the documentation.

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Mon Feb 29 08:24:42 UTC 2016


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a commit to annotated tag 0.1
in repository python-hdf5storage.

commit 8145a42c571b2145be939c7338fcd6662e02c0e7
Author: Freja Nordsiek <fnordsie at gmail.com>
Date:   Sat Feb 15 21:58:00 2014 -0500

    Added a development information page in the documentation.
---
 doc/source/development.rst | 126 +++++++++++++++++++++++++++++++++++++++++++++
 doc/source/index.rst       |   1 +
 2 files changed, 127 insertions(+)

diff --git a/doc/source/development.rst b/doc/source/development.rst
new file mode 100644
index 0000000..70d8b22
--- /dev/null
+++ b/doc/source/development.rst
@@ -0,0 +1,126 @@
+.. currentmodule:: hdf5storage
+
+=======================
+Development Information
+=======================
+
+The source code can be found on GitHub at
+https://github.com/frejanordsiek/hdf5storage
+
+Package Overview
+================
+
+The package is currently a pure Python package, using no Cython, C/C++,
+or other languages.
+
+The :py:mod:`hdf5storage` module contains the high level reading and
+writing functions, as well as the :py:class:`Options` class for
+encapsulating all the various options governing how data is read and
+written. The high level reading and writing functions can either be
+given an :py:class:`Options` object, or be given the keyword arguments
+that its constructor takes (they will make one from those
+arguments). There is also the :py:class:`MarshallerCollection` which
+holds all the Marshallers (more below) and provides functions to find
+the appropriate Marshaller given the ``type`` of a Python object, the
+type string used for the 'Python.Type' Attribute, or the MATLAB class
+string (contained in the 'MATLAB_class' Attribute). One can give the
+collection additional user-provided Marshallers.
+
+:py:mod:`hdf5storage.lowlevel` contains the low level reading and
+writing functions :py:func:`lowlevel.read_data` and
+:py:func:`lowlevel.write_data`. They can only work on already opened
+HDF5 files (the high level ones handle file creation/opening), can only
+be given options using an :py:class:`Options` object, and read/write
+individual Groups/Datasets and Python objects. Any Marshaller (more
+below) that needs to read or write a nested object within a Group or
+Python object must call these functions.
+
+:py:mod:`hdf5storage.Marshallers` contains all the Marshallers for the
+different Python data types that can be read from or written to an HDF5
+file. They are all automatically added to any
+:py:class:`MarshallerCollection`, which inspects this module and grabs
+all classes within it (if a class other than a Marshaller is added to
+this module, :py:class:`MarshallerCollection` will need to be
+modified). All Marshallers need to provide the same interface as
+:py:class:`Marshallers.TypeMarshaller`, which is the base class for all
+Marshallers in this module, and should probably be inherited from by any
+custom Marshallers that one would write (while it can't marshall any
+types, it does have some useful built-in functionality). The main
+Marshaller in the module is
+:py:class:`Marshallers.NumpyScalarArrayMarshaller`, which can marshall
+most NumPy types. All the other built-in Marshallers except
+:py:class:`Marshallers.PythonDictMarshaller` inherit from it since they
+convert their types to and from NumPy types and use the inherited
+functions to do the actual work with the HDF5 file.
+
+:py:mod:`hdf5storage.utilities` contains many functions that are used
+throughout the package, especially by the Marshallers. There are several
+functions to get, set, and delete different kinds of HDF5 Attributes
+(handling cases such as the Attribute already existing or not existing).
+There are also functions to convert between different string
+representations, and to encode complex types for writing and decode
+them after reading. Finally, there is the function
+:py:func:`utilities.next_unused_name_in_group`, which produces a random
+unused name in a Group.
+
+
+TODO
+====
+
+There are several features that need to be added, bugs that need to be
+fixed, etc.
+
+Standing Bugs
+-------------
+
+* Complex numbers where one of the parts (real or imaginary) is ``nan``
+  but the other part is not are read from file as
+  ``(nan + nanj)``. See :py:func:`utilities.decode_complex`.
+* Structured ``np.ndarray`` with no elements, when
+  :py:attr:`Options.structured_numpy_ndarray_as_struct` is set, are not
+  written in a way that the dtypes for the fields can be restored when
+  they are read back from file.
+* The Attribute 'MATLAB_fields' is not currently set when writing
+  data that should be imported into MATLAB as structures, and is ignored
+  when reading data from file. This is because the h5py package cannot
+  work with its format. If a structure with fields 'a' and 'cd' is
+  saved, the Attribute looks like the following when using the
+  ``h5dump`` utility::
+
+    ATTRIBUTE "MATLAB_fields" {
+       DATATYPE  H5T_VLEN { H5T_STRING {
+          STRSIZE 1;
+          STRPAD H5T_STR_NULLTERM;
+          CSET H5T_CSET_ASCII;
+          CTYPE H5T_C_S1;
+       }}
+       DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
+       DATA {
+       (0): ("a"), ("c", "d")
+       }
+    }
+
+  MATLAB doesn't strictly require this Attribute, but supporting it will
+  help with reading/writing empty MATLAB structs. Fixing this would
+  probably require writing a custom Cython or C function.
+
+Features to Add
+---------------
+
+* Marshallers for more Python types.
+* Marshallers to be able to read the following MATLAB types:
+
+  * Categorical Arrays
+  * Tables
+  * Maps
+  * Time Series
+  * Classes (could be hard if they don't look like a struct in file)
+  * Function Handles (wouldn't be able to run in Python, but could at
+    least be manipulated)
+
+* A ``whosmat`` function like SciPy's :py:func:`scipy.io.whosmat`.
+* A function to find and delete Datasets and Groups inside the Group
+  :py:attr:`Options.group_for_references` that are not referenced by
+  other Datasets in the file.
+* Optional compression for large Datasets.
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 700f171..a0c006e 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -14,6 +14,7 @@ Contents:
    information
    introduction
    storage_format
+   development
    api
 
 Indices and tables
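
Illustration (not part of the commit): the new page says that
utilities.next_unused_name_in_group produces a random unused name in a
Group. A minimal sketch of that idea follows. It is not the package's
implementation: the function name next_unused_name, its parameters, and
the use of a plain set in place of an HDF5 Group are all assumptions
made for this example.

```python
import random
import string


def next_unused_name(existing_names, length=16, rng=random):
    """Return a random lowercase name that is not in existing_names.

    Hypothetical stand-in for the idea described on the page:
    existing_names is any container supporting ``in`` (a set here).
    """
    while True:
        # Draw a candidate and retry if it collides with a used name.
        candidate = "".join(rng.choice(string.ascii_lowercase)
                            for _ in range(length))
        if candidate not in existing_names:
            return candidate


taken = {"a", "b"}
name = next_unused_name(taken, length=1)
```

Drawing at random and retrying on collision keeps the sketch short; with
a reasonably long name, collisions are rare and the loop normally exits
on the first pass.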

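Illustration (not part of the commit): the first standing bug concerns
complex numbers with exactly one ``nan`` part being read back as
``(nan + nanj)``. The page does not say what causes this in
utilities.decode_complex. The sketch below shows only one way such a
result can arise in plain Python, for the case where the imaginary part
is ``nan``; that this is the actual cause is an assumption.

```python
import math

real_part = 1.0
imag_part = float("nan")

# Multiplying 1j by nan makes both parts of the product nan, so the
# finite real part is lost once the two terms are summed.
via_arithmetic = real_part + 1j * imag_part

# Building the value from its two parts keeps each part as given.
via_constructor = complex(real_part, imag_part)
```

If the decoding step combines separately stored real and imaginary
parts, passing both to ``complex`` avoids the intermediate product.
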
-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/python-hdf5storage.git


