[h5py] 85/455: Revert to global locking; nonblocking IO is optional

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Thu Jul 2 18:19:20 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a commit to annotated tag 1.3.0
in repository h5py.

commit d888507c945b018eb3d7463d8b9e5c096380492b
Author: andrewcollette <andrew.collette at gmail.com>
Date:   Tue Jul 29 02:36:53 2008 +0000

    Revert to global locking; nonblocking IO is optional
---
 docs/info.txt          | 94 ++++++++++++++++++++++++++++++++++++++------------
 h5py/__init__.py       |  2 ++
 h5py/extras.py         | 26 ++++++++++++++
 h5py/h5.pxd            |  5 ++-
 h5py/h5.pyx            | 89 ++++++++++++++++-------------------------------
 h5py/h5d.pyx           | 82 +++++++++++++++++++++++--------------------
 h5py/highlevel.py      |  6 ++--
 h5py/tests/__init__.py |  4 +--
 setup.py               | 14 ++++++--
 9 files changed, 192 insertions(+), 130 deletions(-)

diff --git a/docs/info.txt b/docs/info.txt
index 3109a80..b36ab13 100644
--- a/docs/info.txt
+++ b/docs/info.txt
@@ -3,41 +3,87 @@
 Threading
 =========
 
-All h5py routines are thread-safe in the sense that they are implemented in
-C, and (with a few exceptions) hold the global intepreter lock until they
-finish.  From the standpoint of a Python programmer, they are atomic operations;
-the execution of all threads blocks until they complete.  This means you
-can call the same method on the same object from two different threads, and the
-two calls will execute serially without interfering with one another.
-
-Additionally, each ObjectID instance provides a reentrant lock via the property
-"pylock".  If you acquire this lock, the HDF5 structure you've got hold of
-is guaranteed not to be modified by another thread until you release it.  This
-is the case even your HDF5 structure is "pointed to" by different ObjectID
-instances.
- 
+All h5py routines are intended to be thread-safe. In this context "thread-safe"
+means that if a method is called on an object by one thread, no other thread
+can execute the same method on the same object until the first one finishes.
+
+Most routines are written in C, and accomplish this by holding the global
+interpreter lock (GIL) until they finish, automatically guaranteeing that no
+other thread can execute.  Native-Python routines like those in h5py.highlevel
+enforce thread safety through the use of reentrant locks.
+
+Since HDF5 does not (yet) provide concurrency support for threads, the same
+global lock is used for all objects.  It is available in the low-level
+component as h5.config.lock, and comes attached to all high-level objects
+(Dataset, etc.) as "<obj>.lock".  
+
+You are encouraged to use this locking mechanism for blocks of Python
+statements that need to be executed in a thread-safe manner (i.e. atomically).
+
+A few examples are:
+
+    # Example 1 (h5py.highlevel API)
+    def fiddle_with_data(ds):  # ds is a highlevel.Dataset object
+        with ds.lock:
+            ds[0,0] = 4.0
+            ds[1,2] = 8.0
+            ... more stuff ...
+        # lock released at end of block
+
+    # Example 2 (h5py.h5* API)
+    def write_some_data(shared_file, data_array):
+
+        with h5.config.lock:
+            dset = h5d.open(shared_file, "dsname")
+            dset.write(h5s.ALL, h5s.ALL, data_array)
+
+    # Example 3 (using a decorator)
+
+    from h5py.extras import h5sync
+
+    @h5sync  # Does exactly the same thing as "with" in example 2
+    def write_some_data(shared_file, data_array):
+
+        dset = h5d.open(shared_file, "dsname")
+        dset.write(h5s.ALL, h5s.ALL, data_array)
+    
 Non-Blocking Routines
 ---------------------
 
-A few methods will release the global interpreter lock around I/O operations
-which can take a long time to complete.  These methods always acquire their
-own "pylock" lock before beginning.  They are still thread-safe in that
-multiple threads attempting to run the same routine will execute serially,
-although threads that do other things can run unimpeded.
+By default, all low-level HDF5 routines will lock the entire interpreter
+until they complete, even in the case of lengthy I/O operations.  This is
+unnecessarily restrictive, as it means even non-HDF5 threads cannot execute.
+
+When the package is compiled with the option "--io-nonblock", a few C methods
+involving I/O will release the global interpreter lock.  These methods always
+acquire the global HDF5 lock before yielding control to other threads.  While
+no other HDF5 operation can acquire the HDF5 lock until the write completes,
+other Python threads (GUIs, pure computation threads, etc) will execute in
+a normal fashion.
 
-The following operations will release the GIL:
+However, if another thread skips acquiring the HDF5 lock and blindly calls a
+low-level HDF5 routine while such I/O is in progress, the results are
+undefined.  In the worst case, irreversible data corruption and/or a crash of
+the interpreter is possible.  Therefore, it's very important to always acquire
+the global HDF5 lock before calling into the h5py.h5* API when (1) more than
+one thread is performing HDF5 operations, and (2) non-blocking I/O is enabled.
+
+This is not an issue for the h5py.highlevel components (Dataset, Group,
+File objects, etc.) as they acquire the lock automatically.
+
+The following operations will release the GIL during I/O:
     
     * DatasetID.read
     * DatasetID.write
 
+
 Customizing Locks
 -----------------
 
 Because applications that use h5py may have their own threading systems, the
-type of lock used is settable at runtime.  The settable property
-h5.config.RLock determines the lock class used.  This can be set to any
-callable which produces a reentrant lock.  It must implement the following
-methods:
+lock used is settable at runtime.  The lock is stored as settable property
+"h5py.config.lock" and should be a lock instance (not a constructor) which
+provides the following methods:
 
     __enter__(), __exit__()     For the Python context manager protocol
     acquire(), release()        For manual lock management
@@ -46,6 +92,8 @@ The default lock type is the native Python threading.RLock, but h5py makes no
 assumptions about the behavior or implementation of locks beyond reentrance and
 the existence of the four required methods above.
 
+todo: make this a new section
+
 ObjectID Hashing
 ----------------
 
diff --git a/h5py/__init__.py b/h5py/__init__.py
index f02274a..cb1499c 100644
--- a/h5py/__init__.py
+++ b/h5py/__init__.py
@@ -22,7 +22,9 @@ __doc__ = \
     HDF5 %s (using %s API)
 """
 
+from h5 import _config as config
 import utils, h5, h5a, h5d, h5f, h5g, h5i, h5p, h5r, h5s, h5t, h5z, highlevel
+import extras
 
 from highlevel import File, Group, Dataset, Datatype, AttributeManager
 
diff --git a/h5py/extras.py b/h5py/extras.py
new file mode 100644
index 0000000..8a2e9a3
--- /dev/null
+++ b/h5py/extras.py
@@ -0,0 +1,26 @@
+#+
+# 
+# This file is part of h5py, a low-level Python interface to the HDF5 library.
+# 
+# Copyright (C) 2008 Andrew Collette
+# http://h5py.alfven.org
+# License: BSD  (See LICENSE.txt for full license)
+# 
+# $Date$
+# 
+#-
+from __future__ import with_statement
+
+from h5py import config
+
+# Decorator utility for threads
+from functools import update_wrapper
+def h5sync(func):
+    
+    def wrap(*args, **kwds):
+        with config.lock:
+            return func(*args, **kwds)
+
+    update_wrapper(wrap, func)
+    return wrap
+
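
Note on the new h5py/extras.py above: the h5sync decorator runs the wrapped function while holding a single package-wide reentrant lock. The following is a minimal, self-contained sketch of that pattern in modern Python. It does not import h5py; GLOBAL_LOCK, synchronized, increment and increment_twice are hypothetical stand-ins chosen for illustration, and functools.wraps is used in place of the update_wrapper call in the diff.

```python
import threading
from functools import wraps

# Stand-in for the package-wide lock; this is NOT h5py's config object.
GLOBAL_LOCK = threading.RLock()

def synchronized(func):
    """Run func while holding GLOBAL_LOCK (hypothetical analogue of h5sync)."""
    @wraps(func)
    def wrapper(*args, **kwds):
        with GLOBAL_LOCK:
            return func(*args, **kwds)
    return wrapper

counter = {"value": 0}

@synchronized
def increment():
    # This read-modify-write sequence is only safe while the lock is held.
    current = counter["value"]
    counter["value"] = current + 1

@synchronized
def increment_twice():
    # The same thread re-acquires the lock here; an RLock permits that.
    increment()
    increment()

def worker(n):
    for _ in range(n):
        increment_twice()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # 8000
```

The nested call in increment_twice is the reason the lock has to be reentrant: with a plain threading.Lock the inner acquisition would block forever.
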
diff --git a/h5py/h5.pxd b/h5py/h5.pxd
index 762051f..aa1a3c7 100644
--- a/h5py/h5.pxd
+++ b/h5py/h5.pxd
@@ -253,16 +253,15 @@ cdef object standard_richcmp(object self, object other, int how)
 
 cdef class H5PYConfig:
 
-    cdef object _rlock_type         # RLock constructor or compatible
     cdef object _complex_names      # ('r','i')
-    cdef public object _lockdict    # Weakref dict for RLock instances
+    cdef object _lock               # Primary HDF5 Interface Lock (PHIL)
+    cdef readonly object compile_opts
 
 cdef class ObjectID:
     """ Base wrapper class for HDF5 object identifiers """
     cdef object __weakref__
     cdef readonly hid_t id
     cdef readonly int _locked
-    cdef H5PYConfig _cfg        # Used to cache a reference to the global config object
     cdef object _hash           # Used by subclasses to cache a hash value,
                                 # which may be expensive to compute.
 
diff --git a/h5py/h5.pyx b/h5py/h5.pyx
index 69b808e..ee9e2cd 100644
--- a/h5py/h5.pyx
+++ b/h5py/h5.pyx
@@ -58,19 +58,6 @@ def get_libversion():
 
     return (major, minor, release)
 
-# --- Public versioning info ---
-
-hdf5_version_tuple = get_libversion()        
-hdf5_version = "%d.%d.%d" % hdf5_version_tuple
-api_version_tuple = (H5PY_API_MAJ, H5PY_API_MIN)
-api_version = "%d.%d" % api_version_tuple
-
-version = H5PY_VERSION
-version_tuple = []   # no list comprehensions in Pyrex
-for _x in version.split('.'):
-    version_tuple.append(int(_x))
-version_tuple = tuple(version_tuple)
-
 def _close():
     """ Internal function; do not call unless you want to lose all your data.
     """
@@ -85,17 +72,22 @@ cdef class H5PYConfig:
 
     """
         Global configuration object for the h5py package.
+
+        Properties:
+        lock            Global reentrant lock for threading
+        RLock           Constructor for lock
+        compile_opts    Dictionary of compile-time flags
     """
 
     def __init__(self):
-        self._lockdict = WeakKeyDictionary()  # ObjectID weakref => RLock instance
         self._complex_names = ('r','i')
-        self.RLock = threading.RLock
+        self.compile_opts = {'IO_NONBLOCK': H5PY_NONBLOCK}
+        self.lock = threading.RLock()  # Use the property to double-check its behavior
 
-    property RLock:
-        """ Callable returning a reentrant lock (default is threading.RLock).
+    property lock:
+        """ Reentrant lock for threading (default is threading.RLock()).
             
-            Whatever you provide must support the Python context manager
+            Whatever you provide MUST support the Python context manager
             protocol, and provide the methods acquire() and release().  It
             also MUST be reentrant, or dataset reads/writes will deadlock.
         """
@@ -103,12 +95,15 @@ cdef class H5PYConfig:
             return self._rlock_type
 
         def __set__(self, val):
-            testlock = val()
-            if not (hasattr(testlock, 'acquire') and hasattr(testlock, 'release') and\
-                    hasattr(testlock, '__enter__') and hasattr(testlock, '__exit__')):
+            if not (hasattr(val, 'acquire') and hasattr(val, 'release') and\
+                    hasattr(val, '__enter__') and hasattr(val, '__exit__')):
                 raise ValueError("Generated locks must provide __enter__, __exit__, acquire, release")
-            self._rlock_type = val
-            self._lockdict.clear()
+            current_lock = self._lock
+            current_lock.acquire()
+            try:
+                self._lock = val
+            finally:
+                current_lock.release()
 
     property complex_names:
         """ Tuple (real, img) indicating names used to save complex types.
@@ -120,22 +115,7 @@ cdef class H5PYConfig:
             # TODO: validation
             self._complex_names = val
 
-    def _get_lock(self, ObjectID key not None):
-        """ (ObjectID key) => LOCK 
-
-            Obtain a reentrant lock instance.  Guaranteed to be the same lock
-            for the same key.  Keys are kept as weak references; when they
-            disappear, so do the lock objects.
-        """
-        # ObjectID instances which are both equal and hash to the same value
-        # are guaranteed to point to the same underlying HDF5 object.
-        lock = self._lockdict.get(key, None)
-        if lock is None:
-            lock = self._rlock_type()
-            self._lockdict[key] = lock
-        return lock
 
-config = H5PYConfig()
 
 cdef object standard_richcmp(object self, object other, int how):
     # This needs to be shared because of weird CPython quirks involving
@@ -173,16 +153,6 @@ cdef class ObjectID:
 
         The truth value of an ObjectID (i.e. bool(obj_id)) indicates whether
         the underlying HDF5 identifier is valid.
-
-        Rudimentary thread safety is provided by the property pylock, which is
-        an RLock instance shared by objects that point to the same underlying
-        HDF5 structure.  In multithreaded programs, you should acquire this
-        lock before modifying the structure.  Locks have no relationship;
-        locking a file does not prevent access to its objects, nor a group to
-        its members.
-
-        ObjectID subclasses which release the GIL (e.g. around blocking I/O
-        operations) will lock themselves first.
     """
 
     property _valid:
@@ -191,15 +161,6 @@ cdef class ObjectID:
         def __get__(self):
             return H5Iget_type(self.id) != H5I_BADID
 
-    property pylock:
-        """ RLock or equivalent for threads.  The same lock is returned for
-            objects which point to the same HDF5 structure.
-        """
-        def __get__(self):
-            if self._cfg is None:
-                self._cfg = config
-            return self._cfg._get_lock(self)
-
     def __nonzero__(self):
         """ Truth value for object identifiers (like _valid) """
         return self._valid
@@ -642,11 +603,21 @@ cdef int import_hdf5() except -1:
     return 0
 
 import_hdf5()
+ 
+# --- Public versioning info ---
 
+hdf5_version_tuple = get_libversion()        
+hdf5_version = "%d.%d.%d" % hdf5_version_tuple
+api_version_tuple = (H5PY_API_MAJ, H5PY_API_MIN)
+api_version = "%d.%d" % api_version_tuple
 
+version = H5PY_VERSION
+version_tuple = []   # no list comprehensions in Pyrex
+for _x in version.split('.'):
+    version_tuple.append(int(_x))
+version_tuple = tuple(version_tuple)
 
-
-
+_config = H5PYConfig()
 
 
 
diff --git a/h5py/h5d.pyx b/h5py/h5d.pyx
index 3e71a2e..9e89354 100644
--- a/h5py/h5d.pyx
+++ b/h5py/h5d.pyx
@@ -13,6 +13,7 @@
 """
     Provides access to the low-level HDF5 "H5D" dataset interface.
 """
+include "conditions.pxi"
 
 # Pyrex compile-time imports
 from h5 cimport standard_richcmp
@@ -31,6 +32,7 @@ import h5
 import h5t
 import h5s
 import h5g
+from h5 import _config as config
 
 import_array()
 
@@ -183,28 +185,30 @@ cdef class DatasetID(ObjectID):
         cdef void* data
         cdef int oldflags
 
-        self.pylock.acquire()
-        try:
-            oldflags = arr_obj.flags
-            arr_obj.flags = oldflags & (~NPY_WRITEABLE) # Wish-it-was-a-mutex approach
-
-            mtype = h5t.py_create(arr_obj.dtype)
-            check_numpy_write(arr_obj, -1)
+        mtype = h5t.py_create(arr_obj.dtype)
+        check_numpy_write(arr_obj, -1)
 
-            self_id = self.id
-            mtype_id = mtype.id
-            mspace_id = mspace.id
-            fspace_id = fspace.id
-            plist_id = pdefault(dxpl)
-            data = PyArray_DATA(arr_obj)
+        self_id = self.id
+        mtype_id = mtype.id
+        mspace_id = mspace.id
+        fspace_id = fspace.id
+        plist_id = pdefault(dxpl)
+        data = PyArray_DATA(arr_obj)
 
-            with nogil:
-                H5PY_H5Dread(self_id, mtype_id, mspace_id, fspace_id, plist_id, data)
+        IF H5PY_NONBLOCK:
+            lock = config.lock
+            lock.acquire()
+            oldflags = arr_obj.flags
+            arr_obj.flags = oldflags & (~NPY_WRITEABLE) # Wish-it-was-a-mutex approach
+            try:
+                with nogil:
+                    H5PY_H5Dread(self_id, mtype_id, mspace_id, fspace_id, plist_id, data)
+            finally:
+                arr_obj.flags = oldflags
+                lock.release()
+        ELSE:
+            H5PY_H5Dread(self_id, mtype_id, mspace_id, fspace_id, plist_id, data)
 
-        finally:
-            arr_obj.flags = oldflags
-            self.pylock.release()
-        
     def write(self, SpaceID mspace not None, SpaceID fspace not None, 
                     ndarray arr_obj not None, PropDXID dxpl=None):
         """ (SpaceID mspace, SpaceID fspace, NDARRAY arr_obj, 
@@ -227,27 +231,29 @@ cdef class DatasetID(ObjectID):
         cdef void* data
         cdef int oldflags
 
-        self.pylock.acquire()
-        try:
-            oldflags = arr_obj.flags
-            arr_obj.flags = oldflags & (~NPY_WRITEABLE) # Wish-it-was-a-mutex approach
-
-            mtype = h5t.py_create(arr_obj.dtype)
-            check_numpy_read(arr_obj, -1)
-
-            self_id = self.id
-            mtype_id = mtype.id
-            mspace_id = mspace.id
-            fspace_id = fspace.id
-            plist_id = pdefault(dxpl)
-            data = PyArray_DATA(arr_obj)
+        mtype = h5t.py_create(arr_obj.dtype)
+        check_numpy_read(arr_obj, -1)
 
-            with nogil:
-                H5PY_H5Dwrite(self_id, mtype_id, mspace_id, fspace_id, plist_id, data)
+        self_id = self.id
+        mtype_id = mtype.id
+        mspace_id = mspace.id
+        fspace_id = fspace.id
+        plist_id = pdefault(dxpl)
+        data = PyArray_DATA(arr_obj)
 
-        finally:
-            arr_obj.flags = oldflags
-            self.pylock.release()
+        IF H5PY_NONBLOCK:
+            lock = config.lock
+            lock.acquire()
+            oldflags = arr_obj.flags
+            arr_obj.flags = oldflags & (~NPY_WRITEABLE) # Wish-it-was-a-mutex approach
+            try:
+                with nogil:
+                    H5PY_H5Dwrite(self_id, mtype_id, mspace_id, fspace_id, plist_id, data)
+            finally:
+                arr_obj.flags = oldflags
+                lock.release()
+        ELSE:
+            H5PY_H5Dwrite(self_id, mtype_id, mspace_id, fspace_id, plist_id, data)
 
     def extend(self, object shape):
         """ (TUPLE shape)
diff --git a/h5py/highlevel.py b/h5py/highlevel.py
index 58ff28c..198c3f6 100644
--- a/h5py/highlevel.py
+++ b/h5py/highlevel.py
@@ -49,7 +49,7 @@ import inspect
 import threading
 from weakref import WeakValueDictionary
 
-from h5py import h5, h5f, h5g, h5s, h5t, h5d, h5a, h5p, h5z, h5i
+from h5py import h5, h5f, h5g, h5s, h5t, h5d, h5a, h5p, h5z, h5i, config
 from h5py.h5 import H5Error
 from utils_hl import slicer, hbasename, strhdr, strlist
 from browse import _H5Browser
@@ -70,8 +70,8 @@ class LockableObject(object):
         Base class which provides rudimentary locking support.
     """
 
-    lock = property(lambda self: self.id.pylock,
-        doc = "A reentrant lock associated with this HDF5 structure")
+    lock = property(lambda self: config.lock,
+        doc = "A reentrant lock for thread-safe use of this object")
 
 class HLObject(LockableObject):
 
diff --git a/h5py/tests/__init__.py b/h5py/tests/__init__.py
index cdafb39..f9e91a5 100644
--- a/h5py/tests/__init__.py
+++ b/h5py/tests/__init__.py
@@ -15,7 +15,7 @@ import sys
 import test_h5a, test_h5d, test_h5f, \
        test_h5g, test_h5i, test_h5p, \
        test_h5s, test_h5t, test_h5, \
-       test_highlevel
+       test_highlevel, test_threads
 
 from h5py import *
 
@@ -23,7 +23,7 @@ TEST_CASES = (test_h5a.TestH5A, test_h5d.TestH5D, test_h5f.TestH5F,
               test_h5g.TestH5G, test_h5i.TestH5I, test_h5p.TestH5P,
               test_h5s.TestH5S, test_h5t.TestH5T, test_h5.TestH5,
               test_highlevel.TestFile, test_highlevel.TestDataset,
-              test_highlevel.TestGroup)
+              test_highlevel.TestGroup, test_threads.TestThreads)
 
 def buildsuite(cases):
 
diff --git a/setup.py b/setup.py
index 2bc7a63..4cf0d0d 100644
--- a/setup.py
+++ b/setup.py
@@ -94,6 +94,7 @@ PYREX_FORCE_OFF = False     # Flag: Don't run Pyrex, no matter what
 API_VERS = (1,6)
 DEBUG_LEVEL = 0
 HDF5_DIR = None
+IO_NONBLOCK = False
 
 for arg in sys.argv[:]:
     if arg == '--pyrex':
@@ -123,7 +124,10 @@ for arg in sys.argv[:]:
         sys.argv.remove(arg)
     elif arg.find('--debug=') == 0:
         ENABLE_PYREX=True
-        DEBUG_LEVEL = int(arg[8:])
+        try:
+            DEBUG_LEVEL = int(arg[8:])
+        except:
+            fatal('Debuglevel not understood (wants --debug=<n>)')
         sys.argv.remove(arg)
     elif arg.find('--hdf5=') == 0:
         splitarg = arg.split('=',1)
@@ -131,6 +135,10 @@ for arg in sys.argv[:]:
             fatal("HDF5 directory not understood (wants --hdf5=/path/to/hdf5)")
         HDF5_DIR = splitarg[1]
         sys.argv.remove(arg)
+    elif arg.find('--io-nonblock') == 0:
+        ENABLE_PYREX=True
+        IO_NONBLOCK = True
+        sys.argv.remove(arg)
 
 if 'sdist' in sys.argv and os.path.exists('MANIFEST'):
     warn("Cleaning up stale MANIFEST file")
@@ -211,8 +219,10 @@ DEF H5PY_DEBUG = %d
 
 DEF H5PY_16API = %d
 DEF H5PY_18API = %d
+
+DEF H5PY_NONBLOCK = %d
 """ % (AUTO_HDR, VERSION, API_VERS[0], API_VERS[1], DEBUG_LEVEL,
-       1 if API_VERS==(1,6) else 0, 1 if API_VERS==(1,8) else 0)
+       1 if API_VERS==(1,6) else 0, 1 if API_VERS==(1,8) else 0, int(IO_NONBLOCK))
 
             try:
                 cond_file = open(cond_path,'r')
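
Note on the "lock" property in the h5py/h5.pyx hunk: the setter checks that the new object provides acquire, release, __enter__ and __exit__, then swaps it in while holding the current lock. A rough pure-Python sketch of that idea follows. LockConfig and REQUIRED are hypothetical names, not part of h5py. Two details are assumptions rather than facts from the commit: the getter returns _lock (the getter shown as unchanged context in the diff still returns _rlock_type, which this commit removes from h5.pxd), and the first assignment is special-cased because no lock exists yet.

```python
import threading

# Method names the setter in the diff checks for.
REQUIRED = ("acquire", "release", "__enter__", "__exit__")

class LockConfig:
    """Hypothetical pure-Python analogue of the lock property in the diff."""

    def __init__(self):
        self._lock = None
        self.lock = threading.RLock()  # goes through the validating setter

    @property
    def lock(self):
        # Assumption: the getter is meant to return the installed lock.
        return self._lock

    @lock.setter
    def lock(self, val):
        missing = [name for name in REQUIRED if not hasattr(val, name)]
        if missing:
            raise ValueError("lock must provide: " + ", ".join(missing))
        current = self._lock
        if current is None:
            # Assumption: on first assignment there is no old lock to hold.
            self._lock = val
            return
        # Hold the old lock so no other thread is inside a critical
        # section guarded by it at the moment of the swap.
        with current:
            self._lock = val

config = LockConfig()
new_lock = threading.RLock()
config.lock = new_lock          # accepted: provides all four methods

try:
    config.lock = object()      # rejected: a bare object has none of them
    rejected = False
except ValueError:
    rejected = True
```

A hasattr check of this kind cannot verify reentrance, so that requirement stays with the caller, as the docstring in the diff says.
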

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/h5py.git


