[python-hdf5storage] 135/152: Added a development information page in the documentation.
Ghislain Vaillant
ghisvail-guest at moszumanska.debian.org
Mon Feb 29 08:24:42 UTC 2016
This is an automated email from the git hooks/post-receive script.
ghisvail-guest pushed a commit to annotated tag 0.1
in repository python-hdf5storage.
commit 8145a42c571b2145be939c7338fcd6662e02c0e7
Author: Freja Nordsiek <fnordsie at gmail.com>
Date: Sat Feb 15 21:58:00 2014 -0500
Added a development information page in the documentation.
---
doc/source/development.rst | 126 +++++++++++++++++++++++++++++++++++++++++++++
doc/source/index.rst | 1 +
2 files changed, 127 insertions(+)
diff --git a/doc/source/development.rst b/doc/source/development.rst
new file mode 100644
index 0000000..70d8b22
--- /dev/null
+++ b/doc/source/development.rst
@@ -0,0 +1,126 @@
+.. currentmodule:: hdf5storage
+
+=======================
+Development Information
+=======================
+
+The source code can be found on GitHub at
+https://github.com/frejanordsiek/hdf5storage
+
+Package Overview
+================
+
+The package is currently a pure Python package, using no Cython, C/C++,
+or other languages.
+
+The :py:mod:`hdf5storage` module contains the high level reading and
+writing functions, as well as the :py:class:`Options` class for
+encapsulating all the various options governing how data is read and
+written. The high level reading and writing functions can either be
+given an :py:class:`Options` object, or be given the keyword arguments
+that its constructor takes (they will make one from those
+arguments). There is also the :py:class:`MarshallerCollection` which
+holds all the Marshallers (more below) and provides functions to find
+the appropriate Marshaller given the ``type`` of a Python object, the
+type string used for the 'Python.Type' Attribute, or the MATLAB class
+string (contained in the 'MATLAB_class' Attribute). One can give the
+collection additional user provided Marshallers.
+
+:py:mod:`hdf5storage.lowlevel` contains the low level reading and
+writing functions :py:func:`lowlevel.read_data` and
+:py:func:`lowlevel.write_data`. They can only work on already opened
+HDF5 files (the high level ones handle file creation/opening), can only
+be given options using an :py:class:`Options` object, and read/write
+individual Groups/Datasets and Python objects. Any Marshaller (more
+below) that needs to read or write a nested object within a Group or
+Python object must call these functions.
+
+:py:mod:`hdf5storage.Marshallers` contains all the Marshallers for the
+different Python data types that can be read from or written to an HDF5
+file. They are all automatically added to any
+:py:class:`MarshallerCollection` which inspects this module and grabs
+all classes within it (if a class other than a Marshaller is added to
+this module, :py:class:`MarshallerCollection` will need to be
+modified). All Marshallers need to provide the same interface as
+:py:class:`Marshallers.TypeMarshaller`, which is the base class for all
+Marshallers in this module, and should probably be inherited from by any
+custom Marshallers that one would write (while it can't marshall any
+types, it does have some useful built in functionality). The main
+Marshaller in the module is
+:py:class:`Marshallers.NumpyScalarArrayMarshaller`, which can marshall
+most Numpy types. All the other built in Marshallers other than
+:py:class:`Marshallers.PythonDictMarshaller` inherit from it since they
+convert their types to and from Numpy types and use the inherited
+functions to do the actual work with the HDF5 file.
+
+:py:mod:`hdf5storage.utilities` contains many functions that are used
+throughout the package, especially by the Marshallers. There are several
+functions to get, set, and delete different kinds of HDF5 Attributes
+(handling cases such as them already existing, not existing, etc.). Then
+there are functions to convert between different string representations,
+as well as to encode complex types for writing and decode them after
+reading. And then there is the function
+:py:func:`utilities.next_unused_name_in_group`, which produces a random
+unused name in a Group.
+
+
+TODO
+====
+
+There are several features that need to be added, bugs that need to be
+fixed, etc.
+
+Standing Bugs
+-------------
+
+* Complex numbers where one of the parts (real or imaginary) is ``nan``
+ but the other part is not, are read from file as
+ ``(nan + nanj)``. See :py:func:`utilities.decode_complex`.
+* Structured ``np.ndarray`` with no elements, when
+ :py:attr:`Options.structured_numpy_ndarray_as_struct` is set, are not
+ written in a way that the dtypes for the fields can be restored when
+ they are read back from file.
+* The Attribute 'MATLAB_fields' is not currently set when writing
+ data that should be imported into MATLAB as structures, and is ignored
+ when reading data from file. This is because the h5py package cannot
+ work with its format. If a structure with fields 'a' and 'cd' is
+ saved, the Attribute looks like the following when using the
+ ``h5dump`` utility::
+
+ ATTRIBUTE "MATLAB_fields" {
+ DATATYPE H5T_VLEN { H5T_STRING {
+ STRSIZE 1;
+ STRPAD H5T_STR_NULLTERM;
+ CSET H5T_CSET_ASCII;
+ CTYPE H5T_C_S1;
+ }}
+ DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
+ DATA {
+ (0): ("a"), ("c", "d")
+ }
+ }
+
+ MATLAB doesn't strictly require this field, but supporting it will
+ help with reading/writing empty MATLAB structs. Fixing this would
+ probably require writing a custom Cython or C function.
+
+Features to Add
+---------------
+
+* Marshallers for more Python types.
+* Marshallers to be able to read the following MATLAB types:
+
+ * Categorical Arrays
+ * Tables
+ * Maps
+ * Time Series
+ * Classes (could be hard if they don't look like a struct in file)
+ * Function Handles (wouldn't be able to run them in Python, but could
+ at least manipulate them)
+
+* A ``whosmat`` function like SciPy's :py:func:`scipy.io.whosmat`.
+* A function to find and delete Datasets and Groups inside the Group
+ :py:attr:`Options.group_for_references` that are not referenced by
+ other Datasets in the file.
+* Optional compression for large Datasets.
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 700f171..a0c006e 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -14,6 +14,7 @@ Contents:
information
introduction
storage_format
+ development
api
Indices and tables
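
A note on the first standing bug listed in development.rst above: the
commit does not show how utilities.decode_complex combines the real and
imaginary parts, so the following is only a sketch of one way a partly
nan complex number can come out as (nan + nanj) in Python. It is not a
description of the package's code. The helper names combine_naive and
combine_direct are hypothetical, and only the case where the imaginary
part alone is nan is shown.

```python
import math


def combine_naive(real, imag):
    # Hypothetical helper. Multiplying nan by 1j yields nan in both
    # parts (0 * nan is nan), so the finite real part is lost in the sum.
    return real + 1j * imag


def combine_direct(real, imag):
    # Hypothetical helper. Building the complex number from its two
    # parts leaves each part exactly as given.
    return complex(real, imag)


naive = combine_naive(1.0, math.nan)
direct = combine_direct(1.0, math.nan)
```

Under these assumptions, naive has nan in both parts while direct keeps
the real part 1.0 and only its imaginary part is nan.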
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/python-hdf5storage.git