[python-hdf5storage] 72/84: Added documentation page for compression. (cherry picked from commit e8b33a0bf1b4c0897fc7ae7491143ed2000d4325)
Ghislain Vaillant
ghisvail-guest at moszumanska.debian.org
Mon Feb 29 08:25:05 UTC 2016
This is an automated email from the git hooks/post-receive script.
ghisvail-guest pushed a commit to annotated tag 0.1.10
in repository python-hdf5storage.
commit 55992e285703e81f3226895deb8575c92b074b83
Author: Freja Nordsiek <fnordsie at gmail.com>
Date: Tue Sep 1 01:44:22 2015 -0400
Added documentation page for compression.
(cherry picked from commit e8b33a0bf1b4c0897fc7ae7491143ed2000d4325)
---
doc/source/compression.rst | 180 +++++++++++++++++++++++++++++++++++++++++++++
doc/source/index.rst | 1 +
2 files changed, 181 insertions(+)
diff --git a/doc/source/compression.rst b/doc/source/compression.rst
new file mode 100644
index 0000000..13b9c04
--- /dev/null
+++ b/doc/source/compression.rst
@@ -0,0 +1,180 @@
+.. currentmodule:: hdf5storage
+
+===========
+Compression
+===========
+
+.. versionadded:: 0.2
+
+    HDF5 compression features were added, along with several options
+    in :py:class:`Options` to control them.
+
+
+.. versionadded:: 0.1.7
+
+    :py:class:`Options` accepts the compression options but ignores
+    them.
+
+
+.. warning::
+
+    Passing the compression options to versions earlier than
+    ``0.1.7`` will result in an error.
+
+
+The HDF5 libraries and the :py:mod:`h5py` module support transparent
+compression of data in HDF5 files.
+
+Compression can sometimes drastically reduce file size, often makes
+reading the data from the file faster, and sometimes makes writing
+the data faster as well. Not all data compresses well, though, and
+some data can occasionally end up larger after compression than it
+was uncompressed. Compression also costs CPU time, both when
+compressing the data and when decompressing it. The reason it can
+nevertheless lead to faster read and write times is that disks are
+slow compared to the CPU, so the space savings can save enough disk
+access time to make up for the CPU time spent.
+
+
+Enabling Compression
+====================
+
+Compression, which is enabled by default, is controlled by setting
+:py:attr:`Options.compress` or by passing ``compress=X`` to
+:py:func:`write` and :py:func:`savemat`, where ``X`` is ``True`` or
+``False``.
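+
+For example, a minimal sketch of both approaches (the file name,
+paths, and data are placeholders for illustration; the second call
+assumes :py:func:`write` accepts a pre-built :py:class:`Options`
+instance via its ``options`` argument):
+
+.. code-block:: python
+
+    import numpy as np
+    import hdf5storage
+
+    data = np.random.rand(1000, 1000)
+
+    # Disable compression for a single call.
+    hdf5storage.write(data, path='/a', filename='data.h5',
+                      compress=False)
+
+    # Or control it through a reusable Options instance.
+    options = hdf5storage.Options(compress=True)
+    hdf5storage.write(data, path='/b', filename='data.h5',
+                      options=options)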
+
+
+.. note::
+
+    Not all Python objects written to the HDF5 file will be
+    compressed, and some do not support compression at all. In
+    particular, :py:mod:`numpy` scalars, and any type that is stored
+    as one, do not support compression due to limitations of the
+    HDF5 library; compressing them would be a waste anyway (hence
+    the lack of support).
+
+
+Setting The Minimum Data Size for Compression
+=============================================
+
+Compressing small pieces of data often wastes space (the compressed
+size is larger than the uncompressed size) as well as CPU time.
+Because of this, Python objects have to be larger than a particular
+size before this package will compress them. The threshold, in
+bytes, is controlled by setting
+:py:attr:`Options.compress_size_threshold` or passing
+``compress_size_threshold=X`` to :py:func:`write` and
+:py:func:`savemat` where ``X`` is a non-negative integer. The
+default value is 16 KB.
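+
+For example, a minimal sketch that raises the threshold to 32 KB
+(the file name, path, and data are placeholders for illustration):
+
+.. code-block:: python
+
+    import numpy as np
+    import hdf5storage
+
+    options = hdf5storage.Options(compress=True,
+                                  compress_size_threshold=32768)
+
+    # This array is only 20 kB (50 * 50 * 8 bytes), which is below
+    # the 32 kB threshold, so it is stored uncompressed.
+    hdf5storage.write(np.zeros((50, 50)), path='/small',
+                      filename='data.h5', options=options)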
+
+
+Controlling The Compression Algorithm And Level
+===============================================
+
+Many compression algorithms can be used with HDF5 files, though only
+three are common. The Deflate algorithm (also known as the GZIP
+algorithm), the LZF algorithm, and the SZIP algorithm are the ones
+that the HDF5 library is explicitly set up to support. The library
+also has a mechanism for adding additional algorithms; popular ones
+include the BZIP2 and BLOSC algorithms.
+
+The compression algorithm used is controlled by setting
+:py:attr:`Options.compression_algorithm` or passing
+``compression_algorithm=X`` to :py:func:`write` and :py:func:`savemat`,
+where ``X`` is the ``str`` name of the algorithm. The default is
+``'gzip'``, corresponding to the Deflate/GZIP algorithm.
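+
+For example, a minimal sketch selecting the LZF algorithm (the file
+name, path, and data are placeholders for illustration):
+
+.. code-block:: python
+
+    import numpy as np
+    import hdf5storage
+
+    # Trade some compression ratio for speed by using LZF instead
+    # of the default GZIP/Deflate.
+    hdf5storage.write(np.random.rand(1000, 1000), path='/data',
+                      filename='data.h5',
+                      compression_algorithm='lzf')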
+
+.. note::
+
+ As of version ``0.2``, only the Deflate (``X = 'gzip'``), LZF
+ (``X = 'lzf'``), and SZIP (``X = 'szip'``) algorithms are supported.
+
+
+.. note::
+
+    If MATLAB compatibility is enabled
+    (:py:attr:`Options.matlab_compatible` is ``True``), only the
+    Deflate algorithm is supported.
+
+
+The algorithms, in more detail:
+
+GZIP / Deflate (``'gzip'``)
+ The common Deflate algorithm seen in the Unix and Linux ``gzip``
+ utility and the most common compression algorithm used in ZIP files.
+ It is the most compatible algorithm. It achieves good compression and
+ is reasonably fast. It has no patent or license restrictions.
+
+LZF (``'lzf'``)
+ A very fast algorithm but with inferior compression to GZIP/Deflate.
+ It is less commonly used than GZIP/Deflate, but similarly has no
+ patent or license restrictions.
+
+SZIP (``'szip'``)
+ This compression algorithm isn't always available and has patent
+ and license restrictions. See
+ `SZIP License <https://www.hdfgroup.org/doc_resource/SZIP/Commercial_szip.html>`_.
+
+
+If GZIP/Deflate compression is being used, the compression level can be
+adjusted by setting :py:attr:`Options.gzip_compression_level` or passing
+``gzip_compression_level=X`` to :py:func:`write` and :py:func:`savemat`
+where ``X`` is an integer between ``0`` and ``9`` inclusive. ``0`` is
+the lowest compression, but is the fastest. ``9`` gives the best
+compression, but is the slowest. The default is ``7``.
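+
+For example, a minimal sketch writing a file at the maximum Deflate
+level (the file and variable names are placeholders for
+illustration):
+
+.. code-block:: python
+
+    import numpy as np
+    import hdf5storage
+
+    # Trade CPU time for the best compression Deflate can achieve.
+    hdf5storage.savemat('data.mat', {'a': np.random.rand(100, 100)},
+                        gzip_compression_level=9)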
+
+For all compression algorithms, there is an additional filter, the
+shuffle filter, which can help achieve better compression at
+relatively low cost in CPU time. It is controlled by setting
+:py:attr:`Options.shuffle_filter` or passing ``shuffle_filter=X`` to
+:py:func:`write` and :py:func:`savemat` where ``X`` is ``True`` or
+``False``. The default is ``True``.
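+
+For example, a minimal sketch disabling the shuffle filter (the
+file name, path, and data are placeholders for illustration):
+
+.. code-block:: python
+
+    import numpy as np
+    import hdf5storage
+
+    # Data that is essentially random bytes gains little from the
+    # shuffle filter, so skip it here.
+    options = hdf5storage.Options(compress=True,
+                                  shuffle_filter=False)
+    hdf5storage.write(np.random.rand(1000, 1000), path='/data',
+                      filename='data.h5', options=options)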
+
+
+Using Checksums
+===============
+
+Fletcher32 checksums can be calculated and stored for most types of
+data stored in an HDF5 file. They are then checked when the data is
+read in order to catch file corruption; if the data has been
+corrupted, reading it raises an error informing the user of the
+corruption. The filter can be enabled or disabled separately for
+data that is compressed and data that is not compressed (e.g. when
+compression is disabled or the Python object's data is smaller than
+the compression threshold).
+
+For compressed data, it is controlled by setting
+:py:attr:`Options.compressed_fletcher32_filter` or passing
+``compressed_fletcher32_filter=X`` to :py:func:`write` and
+:py:func:`savemat` where ``X`` is ``True`` or ``False``. The default is
+``True``.
+
+For uncompressed data, it is controlled by setting
+:py:attr:`Options.uncompressed_fletcher32_filter` or passing
+``uncompressed_fletcher32_filter=X`` to :py:func:`write` and
+:py:func:`savemat` where ``X`` is ``True`` or ``False``. The default is
+``False``.
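+
+For example, a minimal sketch enabling checksums for both compressed
+and uncompressed data (the file name, path, and data are
+placeholders for illustration):
+
+.. code-block:: python
+
+    import numpy as np
+    import hdf5storage
+
+    # Checksum everything, whether or not it ends up compressed.
+    options = hdf5storage.Options(
+        compressed_fletcher32_filter=True,
+        uncompressed_fletcher32_filter=True)
+    hdf5storage.write(np.arange(10**6), path='/data',
+                      filename='data.h5', options=options)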
+
+
+.. note::
+
+ Fletcher32 checksums are not computed for anything that is stored
+ as a :py:mod:`numpy` scalar.
+
+
+Further Reading
+===============
+
+.. seealso::
+
+ `HDF5 Datasets Filter pipeline <http://docs.h5py.org/en/latest/high/dataset.html#filter-pipeline>`_
+        Description of the dataset filter pipeline in the
+        :py:mod:`h5py` documentation.
+
+ `Using Compression in HDF5 <http://www.hdfgroup.org/HDF5/faq/compression.html>`_
+ FAQ on compression from the HDF Group.
+
+ `SZIP License <https://www.hdfgroup.org/doc_resource/SZIP/Commercial_szip.html>`_
+ The license for using the SZIP compression algorithm.
+
+ `SZIP COMPRESSION IN HDF PRODUCTS <https://www.hdfgroup.org/doc_resource/SZIP>`_
+ Information on using SZIP compression from the HDF Group.
+
+ `3rd Party Compression Algorithms for HDF5 <https://www.hdfgroup.org/services/contributions.html>`_
+ List of common additional compression algorithms.
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
index a0c006e..ea07a16 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -13,6 +13,7 @@ Contents:
information
introduction
+ compression
storage_format
development
api
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/python-hdf5storage.git