[numexpr] 01/01: Imported Upstream version 2.5

Sat Feb 6 11:40:08 UTC 2016

This is an automated email from the git hooks/post-receive script.

a_valentino-guest pushed a commit to annotated tag upstream/2.5
in repository numexpr.

commit 20c1062dd5f894c9c6c737edb93559d6ae2a8be9
Author: Antonio Valentino <antonio.valentino at tiscali.it>
Date:   Sat Feb 6 11:04:54 2016 +0000

    Imported Upstream version 2.5
---
 ANNOUNCE.rst                  |   9 ++--
 AUTHORS.txt                   |   2 +
 README.rst                    | 110 +++++++++++++++++++-----------------------
 RELEASE_NOTES.rst             |  12 ++++-
 numexpr/cpuinfo.py            |  26 +++++-----
 numexpr/expressions.py        |  31 +++++-------
 numexpr/interp_body.cpp       |  10 ++++
 numexpr/interpreter.cpp       |  52 +++++++++++++++-----
 numexpr/module.cpp            |   1 +
 numexpr/module.hpp            |   5 +-
 numexpr/necompiler.py         |  74 ++++++++++++++--------------
 numexpr/opcodes.hpp           |  39 +++++++++------
 numexpr/tests/test_numexpr.py |  40 ++++++++++++++-
 numexpr/version.py            |   2 +-
 14 files changed, 251 insertions(+), 162 deletions(-)
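
Note on the README change below: the new "How Numexpr achieves high performance" section describes operands being split into cache-sized chunks, with temporaries kept at chunk size. The following is only a rough sketch of that idea, assuming plain Python lists in place of NumPy arrays; `CHUNK` and `evaluate_chunked` are hypothetical names, and this is not numexpr's actual virtual machine.

```python
# Illustrative sketch only: plain lists stand in for NumPy arrays, and
# CHUNK is a hypothetical block size, not the one numexpr really uses.
CHUNK = 4


def evaluate_chunked(a, b):
    """Compute a*b - 4.1*a > 2.5*b one block at a time."""
    out = [False] * len(a)
    for start in range(0, len(a), CHUNK):
        stop = min(start + CHUNK, len(a))
        # Each temporary holds at most CHUNK items, never a full array.
        t0 = [x * y for x, y in zip(a[start:stop], b[start:stop])]
        t1 = [4.1 * x for x in a[start:stop]]
        t2 = [2.5 * y for y in b[start:stop]]
        out[start:stop] = [p - q > r for p, q, r in zip(t0, t1, t2)]
    return out


a = [float(i) for i in range(10)]
b = [float(10 - i) for i in range(10)]
result = evaluate_chunked(a, b)

# Same answer as the unchunked expression, without full-size temporaries.
expected = [x * y - 4.1 * x > 2.5 * y for x, y in zip(a, b)]
assert result == expected
```

The point of the sketch is the loop shape: the per-block temporaries `t0`, `t1` and `t2` are reused for every block, so peak extra memory depends on the chunk size rather than on the operand length.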

diff --git a/ANNOUNCE.rst b/ANNOUNCE.rst
index 664ee33..807878f 100644
--- a/ANNOUNCE.rst
+++ b/ANNOUNCE.rst
@@ -1,5 +1,5 @@
 =========================
- Announcing Numexpr 2.4.6
+ Announcing Numexpr 2.5
 =========================
 
 Numexpr is a fast numerical expression evaluator for NumPy.  With it,
@@ -21,9 +21,10 @@ don't want to adopt other solutions requiring more heavy dependencies.
 What's new
 ==========
 
-This is a quick maintenance version that offers better handling of
-MSVC symbols (#168, Francesc Alted), as well as fising some
-UserWarnings in Solaris (#189, Graham Jones).
+In this version, a lock has been added so that numexpr can now be
+called safely from multithreaded apps.  Note that this does not prevent
+numexpr from using multiple cores internally.  Also, new min() and max()
+functions have been added.  Thanks to contributors!
 
 In case you want to know more in detail what has changed in this
 version, see:
diff --git a/AUTHORS.txt b/AUTHORS.txt
index d727193..f43b249 100644
--- a/AUTHORS.txt
+++ b/AUTHORS.txt
@@ -20,3 +20,5 @@ enhancements.
 Antonio Valentino contributed the port to Python 3.
 
 Google Inc. contributed bug fixes.
+
+David Cox improved readability of the Readme.
diff --git a/README.rst b/README.rst
index 509bcfc..2c0a37c 100644
--- a/README.rst
+++ b/README.rst
@@ -25,7 +25,7 @@ expressions that operate on arrays (like "3*a+4*b") are accelerated
 and use less memory than doing the same calculation in Python.
 
 In addition, its multi-threaded capabilities can make use of all your
-cores, which may accelerate computations, most specially if they are
+cores -- which may accelerate computations, especially if they are
 not memory-bounded (e.g. those using transcendental functions).
 
 Last but not least, numexpr can make use of Intel's VML (Vector Math
@@ -33,6 +33,34 @@ Library, normally integrated in its Math Kernel Library, or MKL).
 This allows further acceleration of transcendent expressions.
 
 
+How Numexpr achieves high performance 
+================================================
+
+The main reason why Numexpr achieves better performance than NumPy 
+is that it avoids allocating memory for intermediate results. This 
+results in better cache utilization and reduces memory access in
+general. Due to this, Numexpr works best with large arrays. 
+
+Numexpr parses expressions into its own op-codes that are then used by
+an integrated computing virtual machine. The array operands are split
+into small chunks that easily fit in the cache of the CPU and passed to 
+the virtual machine. The virtual machine then applies the operations 
+on each chunk. It's worth noting that all temporaries and constants 
+in the expression are also chunked.
+
+The result is that Numexpr can get the most out of your machine's computing
+capabilities for array-wise computations. Common speed-ups with regard
+to NumPy are usually between 0.95x (for very simple expressions
+like 'a + 1') and 4x (for relatively complex ones like 'a*b-4.1*a > 2.5*b'),
+although much higher speed-ups can be achieved (up to 15x in some cases).
+
+Numexpr performs best on matrices that do not fit in the CPU cache.
+To get a better idea of the speed-ups that can be achieved on your
+platform, run the provided benchmarks.
+
+See more info about how Numexpr works in the `wiki <https://github.com/pydata/numexpr/wiki>`_.
+
+
 Examples of use
 ===============
 
@@ -79,9 +107,9 @@ type inference rules, see below).  Have this in mind when doing
 estimations about the memory consumption during the computation of
 your expressions.
 
-Also, the types in Numexpr conditions are somewhat stricter than those
-of Python.  For instance, the only valid constants for booleans are
-`True` and `False`, and they are never automatically cast to integers.
+Also, the types in Numexpr conditions are somewhat more restrictive 
+than those of Python.  For instance, the only valid constants for booleans 
+are `True` and `False`, and they are never automatically cast to integers.
 
 
 Casting rules
@@ -128,7 +156,7 @@ Numexpr supports the set of operators listed below::
 Supported functions
 ===================
 
-The next are the current supported set::
+Supported functions are listed below::
 
   * where(bool, number1, number2): number
       Number1 if the bool condition is true, number2 otherwise.
@@ -171,13 +199,13 @@ The next are the current supported set::
 
   + `contains()` only works with bytes strings, not unicode strings.
 
-More functions can be added if you need them.
+You may add additional functions as needed.
 
 
 Supported reduction operations
 ==============================
 
-The next are the current supported set:
+The following reduction operations are currently supported::
 
   * sum(number, axis=None): Sum of array elements over a given axis.
     Negative axis are not supported.
@@ -185,6 +213,12 @@ The next are the current supported set:
   * prod(number, axis=None): Product of array elements over a given
     axis.  Negative axis are not supported.
 
+  * min(number, axis=None): Minimum of array elements over a given
+    axis.  Negative axis are not supported.
+
+  * max(number, axis=None): Maximum of array elements over a given
+    axis.  Negative axis are not supported.
+
 
 General routines
 ================
@@ -211,7 +245,7 @@ General routines
     `set_vml_num_threads(nthreads)` to perform the parallel job with
     VML instead.  However, you should get very similar performance
     with VML-optimized functions, and VML's parallelizer cannot deal
-    with common expresions like `(x+1)*(x-2)`, while Numexpr's one
+    with common expressions like `(x+1)*(x-2)`, while Numexpr's one
     can.
 
   * detect_number_of_cores(): Detects the number of cores in the
@@ -222,9 +256,9 @@ Intel's VML specific support routines
 =====================================
 
 When compiled with Intel's VML (Vector Math Library), you will be able
-to use some additional functions for controlling its use. These are:
+to use some additional functions for controlling its use. These are outlined below::
 
-* set_vml_accuracy_mode(mode):  Set the accuracy for VML operations.
+  * set_vml_accuracy_mode(mode):  Set the accuracy for VML operations.
 
 The `mode` parameter can take the values:
   - 'low': Equivalent to VML_LA - low accuracy VML functions are called
@@ -234,66 +268,20 @@ The `mode` parameter can take the values:
 It returns the previous mode.
 
 This call is equivalent to the `vmlSetMode()` in the VML library.
-See:
-
-http://www.intel.com/software/products/mkl/docs/webhelp/vml/vml_DataTypesAccuracyModes.html
 
-for more info on the accuracy modes.
+:: 
 
-* set_vml_num_threads(nthreads): Suggests a maximum number of
-  threads to be used in VML operations.
+  * set_vml_num_threads(nthreads): Suggests a maximum number of
+    threads to be used in VML operations.
 
 This function is equivalent to the call
 `mkl_domain_set_num_threads(nthreads, MKL_DOMAIN_VML)` in the MKL library.
-See:
-
-http://www.intel.com/software/products/mkl/docs/webhelp/support/functn_mkl_domain_set_num_threads.html
 
-for more info about it.
+See the Intel documentation on `VM Service Functions <https://software.intel.com/en-us/node/521831>`_ for more information.
 
 * get_vml_version():  Get the VML/MKL library version.
 
 
-How Numexpr can achieve such a high performance?
-================================================
-
-The main reason why Numexpr achieves better performance than NumPy (or
-even than plain C code) is that it avoids the creation of whole
-temporaries for keeping intermediate results, so saving memory
-bandwidth (the main bottleneck in many computations in nowadays
-computers). Due to this, it works best with arrays that are large
-enough (typically larger than processor caches).
-
-Briefly, it works as follows. Numexpr parses the expression into its
-own op-codes, that will be used by the integrated computing virtual
-machine. Then, the array operands are split in small chunks (that
-easily fit in the cache of the CPU) and passed to the virtual
-machine. Then, the computational phase starts, and the virtual machine
-applies the op-code operations for each chunk, saving the outcome in
-the resulting array. It is worth noting that all the temporaries and
-constants in the expression are kept in the same small chunk sizes
-than the operand ones, avoiding additional memory (and most specially,
-memory bandwidth) waste.
-
-The result is that Numexpr can get the most of your machine computing
-capabilities for array-wise computations.  Just to give you an idea of
-its performance, common speed-ups with regard to NumPy are usually
-between 0.95x (for very simple expressions, like ’a + 1’) and 4x (for
-relatively complex ones, like 'a*b-4.1*a > 2.5*b'), although much
-higher speed-ups can be achieved (up to 15x can be seen in not too
-esoteric expressions) because this depends on the kind of the
-operations and how many operands participates in the expression.  Of
-course, Numexpr will perform better (in comparison with NumPy) with
-larger matrices, i.e. typically those that does not fit in the cache
-of your CPU.  In order to get a better idea on the different speed-ups
-that can be achieved for your own platform, you may want to run the
-benchmarks in the directory bench/.
-
-See more info about how Numexpr works in:
-
-https://github.com/pydata/numexpr/wiki
-
-
 Authors
 =======
 
@@ -303,7 +291,7 @@ See AUTHORS.txt
 License
 =======
 
-Numexpr is distributed under the MIT license (see LICENSE.txt file).
+Numexpr is distributed under the MIT license.
 
 
 
diff --git a/RELEASE_NOTES.rst b/RELEASE_NOTES.rst
index c376846..398d010 100644
--- a/RELEASE_NOTES.rst
+++ b/RELEASE_NOTES.rst
@@ -1,7 +1,17 @@
 ======================================
- Release notes for Numexpr 2.4 series
+ Release notes for Numexpr 2.5 series
 ======================================
 
+Changes from 2.4.6 to 2.5
+=========================
+
+- Added locking to allow the use of numexpr in multi-threaded
+  callers (this does not prevent numexpr from using multiple cores
+  simultaneously).  (PR #199, Antoine Pitrou, PR #200, Jenn Olsen).
+
+- Added new min() and max() functions (PR #195, CJ Carey).
+
+
 Changes from 2.4.5 to 2.4.6
 ===========================
 
diff --git a/numexpr/cpuinfo.py b/numexpr/cpuinfo.py
index 962ae9b..f11cf5f 100755
--- a/numexpr/cpuinfo.py
+++ b/numexpr/cpuinfo.py
@@ -38,7 +38,7 @@ def getoutput(cmd, successful_status=(0,), stacklevel=1):
         p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
         output, _ = p.communicate()
         status = p.returncode
-    except EnvironmentError, e:
+    except EnvironmentError as e:
         warnings.warn(str(e), UserWarning, stacklevel=stacklevel)
         return False, ''
     if os.WIFEXITED(status) and os.WEXITSTATUS(status) in successful_status:
@@ -99,7 +99,7 @@ class CPUInfoBase(object):
                     return lambda func=self._try_call, attr=attr: func(attr)
             else:
                 return lambda: None
-        raise AttributeError, name
+        raise AttributeError(name)
 
     def _getNCPUs(self):
         return 1
@@ -128,7 +128,7 @@ class LinuxCPUInfo(CPUInfoBase):
             info[0]['uname_m'] = output.strip()
         try:
             fo = open('/proc/cpuinfo')
-        except EnvironmentError, e:
+        except EnvironmentError as e:
             warnings.warn(str(e), UserWarning)
         else:
             for line in fo:
@@ -600,12 +600,16 @@ class Win32CPUInfo(CPUInfoBase):
     # mean?
 
     def __init__(self):
+        try:
+            import _winreg
+        except ImportError:  # Python 3
+            import winreg as _winreg
+
         if self.info is not None:
             return
         info = []
         try:
             #XXX: Bad style to use so long `try:...except:...`. Fix it!
-            import _winreg
 
             prgx = re.compile(r"family\s+(?P<FML>\d+)\s+model\s+(?P<MDL>\d+)" \
                               "\s+stepping\s+(?P<STP>\d+)", re.IGNORECASE)
@@ -636,8 +640,7 @@ class Win32CPUInfo(CPUInfoBase):
                                     info[-1]["Model"] = int(srch.group("MDL"))
                                     info[-1]["Stepping"] = int(srch.group("STP"))
         except:
-            print
-            sys.exc_value, '(ignoring)'
+            print(sys.exc_value, '(ignoring)')
         self.__class__.info = info
 
     def _not_impl(self):
@@ -796,16 +799,13 @@ if __name__ == "__main__":
     cpu.is_Intel()
     cpu.is_Alpha()
 
-    print
-    'CPU information:',
+    info = []
     for name in dir(cpuinfo):
         if name[0] == '_' and name[1] != '_':
             r = getattr(cpu, name[1:])()
             if r:
                 if r != 1:
-                    print
-                    '%s=%s' % (name[1:], r),
+                    info.append('%s=%s' % (name[1:], r))
                 else:
-                    print
-                    name[1:],
-    print
+                    info.append(name[1:])
+    print('CPU information: ' + ' '.join(info))
diff --git a/numexpr/expressions.py b/numexpr/expressions.py
index 803a98e..635bfdf 100644
--- a/numexpr/expressions.py
+++ b/numexpr/expressions.py
@@ -231,22 +231,15 @@ def encode_axis(axis):
     return RawNode(axis)
 
 
-def sum_func(a, axis=None):
-    axis = encode_axis(axis)
-    if isinstance(a, ConstantNode):
-        return a
-    if isinstance(a, (bool, int_, long_, float, double, complex)):
-        a = ConstantNode(a)
-    return FuncNode('sum', [a, axis], kind=a.astKind)
-
-
-def prod_func(a, axis=None):
-    axis = encode_axis(axis)
-    if isinstance(a, (bool, int_, long_, float, double, complex)):
-        a = ConstantNode(a)
-    if isinstance(a, ConstantNode):
-        return a
-    return FuncNode('prod', [a, axis], kind=a.astKind)
+def gen_reduce_axis_func(name):
+    def _func(a, axis=None):
+        axis = encode_axis(axis)
+        if isinstance(a, ConstantNode):
+            return a
+        if isinstance(a, (bool, int_, long_, float, double, complex)):
+            a = ConstantNode(a)
+        return FuncNode(name, [a, axis], kind=a.astKind)
+    return _func
 
 
 @ophelper
@@ -373,8 +366,10 @@ functions = {
     'complex': func(complex, 'complex'),
     'conj': func(numpy.conj, 'complex'),
 
-    'sum': sum_func,
-    'prod': prod_func,
+    'sum': gen_reduce_axis_func('sum'),
+    'prod': gen_reduce_axis_func('prod'),
+    'min': gen_reduce_axis_func('min'),
+    'max': gen_reduce_axis_func('max'),
     'contains': contains_func,
 }
 
diff --git a/numexpr/interp_body.cpp b/numexpr/interp_body.cpp
index ec7e529..475a89f 100644
--- a/numexpr/interp_body.cpp
+++ b/numexpr/interp_body.cpp
@@ -456,6 +456,16 @@
                                    ci_reduce = cr_reduce*c1i + ci_reduce*c1r;
                                    cr_reduce = da);
 
+        case OP_MIN_IIN: VEC_ARG1(i_reduce = fmin(i_reduce, i1));
+        case OP_MIN_LLN: VEC_ARG1(l_reduce = fmin(l_reduce, l1));
+        case OP_MIN_FFN: VEC_ARG1(f_reduce = fmin(f_reduce, f1));
+        case OP_MIN_DDN: VEC_ARG1(d_reduce = fmin(d_reduce, d1));
+
+        case OP_MAX_IIN: VEC_ARG1(i_reduce = fmax(i_reduce, i1));
+        case OP_MAX_LLN: VEC_ARG1(l_reduce = fmax(l_reduce, l1));
+        case OP_MAX_FFN: VEC_ARG1(f_reduce = fmax(f_reduce, f1));
+        case OP_MAX_DDN: VEC_ARG1(d_reduce = fmax(d_reduce, d1));
+
         default:
             *pc_error = pc;
             return -3;
diff --git a/numexpr/interpreter.cpp b/numexpr/interpreter.cpp
index 4d1576e..b622e7e 100644
--- a/numexpr/interpreter.cpp
+++ b/numexpr/interpreter.cpp
@@ -19,6 +19,13 @@
 #include "interpreter.hpp"
 #include "numexpr_object.hpp"
 
+#ifdef _MSC_VER
+/* Some missing symbols and functions for Win */
+#define fmax max
+#define fmin min
+#define INFINITY (DBL_MAX+DBL_MAX)
+#define NAN (INFINITY-INFINITY)
+#endif
 
 #ifndef SIZE_MAX
 #define SIZE_MAX ((size_t)-1)
@@ -46,7 +53,6 @@
 #endif
 
 
-
 using namespace std;
 
 // Global state
@@ -691,13 +697,19 @@ vm_engine_iter_parallel(NpyIter *iter, const vm_params& params,
                         bool need_output_buffering, int *pc_error,
                         char **errmsg)
 {
-    int i;
+    int i, ret = -1;
     npy_intp numblocks, taskfactor;
 
     if (errmsg == NULL) {
         return -1;
     }
 
+    /* Ensure only one parallel job is running at a time (otherwise
+       the global th_params get corrupted). */
+    Py_BEGIN_ALLOW_THREADS;
+    pthread_mutex_lock(&gs.parallel_mutex);
+    Py_END_ALLOW_THREADS;
+
     /* Populate parameters for worker threads */
     NpyIter_GetIterIndexRange(iter, &th_params.start, &th_params.vlen);
     /*
@@ -723,7 +735,7 @@ vm_engine_iter_parallel(NpyIter *iter, const vm_params& params,
             for (; i > 0; --i) {
                 NpyIter_Deallocate(th_params.iter[i]);
             }
-            return -1;
+            goto end;
         }
     }
     th_params.memsteps[0] = params.memsteps;
@@ -739,7 +751,7 @@ vm_engine_iter_parallel(NpyIter *iter, const vm_params& params,
             for (i = 0; i < gs.nthreads; ++i) {
                 NpyIter_Deallocate(th_params.iter[i]);
             }
-            return -1;
+            goto end;
         }
         memcpy(th_params.memsteps[i], th_params.memsteps[0],
                 sizeof(npy_intp) *
@@ -778,7 +790,11 @@ vm_engine_iter_parallel(NpyIter *iter, const vm_params& params,
         PyMem_Del(th_params.memsteps[i]);
     }
 
-    return th_params.ret_code;
+    ret = th_params.ret_code;
+
+end:
+    pthread_mutex_unlock(&gs.parallel_mutex);
+    return ret;
 }
 
 static int
@@ -1362,16 +1378,26 @@ NumExpr_run(NumExprObject *self, PyObject *args, PyObject *kwds)
     /* Initialize the output to the reduction unit */
     if (is_reduction) {
         PyArrayObject *a = NpyIter_GetOperandArray(iter)[0];
-        if (last_opcode(self->program) >= OP_SUM &&
-            last_opcode(self->program) < OP_PROD) {
-                PyObject *zero = PyLong_FromLong(0);
-                PyArray_FillWithScalar(a, zero);
-                Py_DECREF(zero);
+        PyObject *fill;
+        int op = last_opcode(self->program);
+        if (op < OP_PROD) {
+            /* sum identity is 0 */
+            fill = PyLong_FromLong(0);
+        } else if (op >= OP_PROD && op < OP_MIN) {
+            /* product identity is 1 */
+            fill = PyLong_FromLong(1);
+        } else if (PyArray_DESCR(a)->kind == 'f') {
+            /* floating point min/max identity is NaN */
+            fill = PyFloat_FromDouble(NAN);
+        } else if (op >= OP_MIN && op < OP_MAX) {
+            /* integer min identity */
+            fill = PyLong_FromLong(LONG_MAX);
         } else {
-                PyObject *one = PyLong_FromLong(1);
-                PyArray_FillWithScalar(a, one);
-                Py_DECREF(one);
+            /* integer max identity */
+            fill = PyLong_FromLong(LONG_MIN);
         }
+        PyArray_FillWithScalar(a, fill);
+        Py_DECREF(fill);
     }
 
     /* Get the sizes of all the operands */
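
Note on the hunk above: the output is pre-filled with a starting value for each reduction (0 for sum, 1 for prod, NaN for floating-point min/max, LONG_MAX or LONG_MIN for integer min/max). The sketch below is a hedged Python model of why those starting values work. The `fmin` and `fmax` helpers are hand-written stand-ins that assume the C library convention of returning the non-NaN argument, and `sys.maxsize` is only an assumed substitute for the platform-dependent LONG_MAX.

```python
import math
import sys


def fmin(a, b):
    """Stand-in for C fmin(): if one argument is NaN, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def fmax(a, b):
    """Stand-in for C fmax(): if one argument is NaN, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def float_reduce(values, op):
    # NaN acts as the starting value: the first real element replaces it.
    acc = math.nan
    for v in values:
        acc = op(acc, v)
    return acc


def int_reduce_min(values):
    # Assumption: sys.maxsize plays the role of LONG_MAX here.
    acc = sys.maxsize
    for v in values:
        acc = min(acc, v)
    return acc


def int_reduce_max(values):
    # Assumption: -sys.maxsize - 1 plays the role of LONG_MIN here.
    acc = -sys.maxsize - 1
    for v in values:
        acc = max(acc, v)
    return acc


assert float_reduce([3.0, 1.5, 2.0], fmin) == 1.5
assert float_reduce([3.0, 1.5, 2.0], fmax) == 3.0
assert int_reduce_min([5, 2, 9]) == 2
assert int_reduce_max([5, 2, 9]) == 9
```

One consequence worth keeping in mind, at least in this model: because the helpers discard a NaN argument, NaN values in the input are skipped rather than propagated, and an empty float reduction leaves the NaN starting value in place.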
diff --git a/numexpr/module.cpp b/numexpr/module.cpp
index af9ce34..25a371d 100644
--- a/numexpr/module.cpp
+++ b/numexpr/module.cpp
@@ -187,6 +187,7 @@ int init_threads(void)
 
     /* Initialize mutex and condition variable objects */
     pthread_mutex_init(&gs.count_mutex, NULL);
+    pthread_mutex_init(&gs.parallel_mutex, NULL);
 
     /* Barrier initialization */
     pthread_mutex_init(&gs.count_threads_mutex, NULL);
diff --git a/numexpr/module.hpp b/numexpr/module.hpp
index 0234e12..b5397ea 100644
--- a/numexpr/module.hpp
+++ b/numexpr/module.hpp
@@ -27,12 +27,15 @@ struct global_state {
     int force_serial;                /* force serial code instead of parallel? */
     int pid;                         /* the PID for this process */
 
-    /* Syncronization variables */
+    /* Synchronization variables for threadpool state */
     pthread_mutex_t count_mutex;
     int count_threads;
     pthread_mutex_t count_threads_mutex;
     pthread_cond_t count_threads_cv;
 
+    /* Mutual exclusion for access to global thread params (th_params) */
+    pthread_mutex_t parallel_mutex;
+
     global_state() {
         nthreads = 1;
         init_threads_done = 0;
diff --git a/numexpr/necompiler.py b/numexpr/necompiler.py
index ee11aec..89716e8 100644
--- a/numexpr/necompiler.py
+++ b/numexpr/necompiler.py
@@ -11,6 +11,7 @@
 import __future__
 import sys
 import numpy
+import threading
 
 from numexpr import interpreter, expressions, use_vml, is_cpu_amd_intel
 from numexpr.utils import CacheDict
@@ -261,7 +262,8 @@ def stringToExpression(s, types, context):
 
 
 def isReduction(ast):
-    return ast.value.startswith(b'sum_') or ast.value.startswith(b'prod_')
+    prefixes = (b'sum_', b'prod_', b'min_', b'max_')
+    return any(ast.value.startswith(p) for p in prefixes)
 
 
 def getInputOrder(ast, input_order=None):
@@ -684,6 +686,7 @@ def getExprNames(text, context):
 _names_cache = CacheDict(256)
 _numexpr_cache = CacheDict(256)
 
+evaluate_lock = threading.Lock()
 
 def evaluate(ex, local_dict=None, global_dict=None,
              out=None, order='K', casting='safe', **kwargs):
@@ -729,39 +732,40 @@ def evaluate(ex, local_dict=None, global_dict=None,
             like float64 to float32, are allowed.
           * 'unsafe' means any data conversions may be done.
     """
-    if not isinstance(ex, (str, unicode)):
-        raise ValueError("must specify expression as a string")
-    # Get the names for this expression
-    context = getContext(kwargs, frame_depth=1)
-    expr_key = (ex, tuple(sorted(context.items())))
-    if expr_key not in _names_cache:
-        _names_cache[expr_key] = getExprNames(ex, context)
-    names, ex_uses_vml = _names_cache[expr_key]
-    # Get the arguments based on the names.
-    call_frame = sys._getframe(1)
-    if local_dict is None:
-        local_dict = call_frame.f_locals
-    if global_dict is None:
-        global_dict = call_frame.f_globals
-
-    arguments = []
-    for name in names:
+    with evaluate_lock:
+        if not isinstance(ex, (str, unicode)):
+            raise ValueError("must specify expression as a string")
+        # Get the names for this expression
+        context = getContext(kwargs, frame_depth=1)
+        expr_key = (ex, tuple(sorted(context.items())))
+        if expr_key not in _names_cache:
+            _names_cache[expr_key] = getExprNames(ex, context)
+        names, ex_uses_vml = _names_cache[expr_key]
+        # Get the arguments based on the names.
+        call_frame = sys._getframe(1)
+        if local_dict is None:
+            local_dict = call_frame.f_locals
+        if global_dict is None:
+            global_dict = call_frame.f_globals
+    
+        arguments = []
+        for name in names:
+            try:
+                a = local_dict[name]
+            except KeyError:
+                a = global_dict[name]
+            arguments.append(numpy.asarray(a))
+    
+        # Create a signature
+        signature = [(name, getType(arg)) for (name, arg) in zip(names, arguments)]
+    
+        # Look up numexpr if possible.
+        numexpr_key = expr_key + (tuple(signature),)
         try:
-            a = local_dict[name]
+            compiled_ex = _numexpr_cache[numexpr_key]
         except KeyError:
-            a = global_dict[name]
-        arguments.append(numpy.asarray(a))
-
-    # Create a signature
-    signature = [(name, getType(arg)) for (name, arg) in zip(names, arguments)]
-
-    # Look up numexpr if possible.
-    numexpr_key = expr_key + (tuple(signature),)
-    try:
-        compiled_ex = _numexpr_cache[numexpr_key]
-    except KeyError:
-        compiled_ex = _numexpr_cache[numexpr_key] = \
-            NumExpr(ex, signature, **context)
-    kwargs = {'out': out, 'order': order, 'casting': casting,
-              'ex_uses_vml': ex_uses_vml}
-    return compiled_ex(*arguments, **kwargs)
+            compiled_ex = _numexpr_cache[numexpr_key] = \
+                NumExpr(ex, signature, **context)
+        kwargs = {'out': out, 'order': order, 'casting': casting,
+                  'ex_uses_vml': ex_uses_vml}
+        return compiled_ex(*arguments, **kwargs)
diff --git a/numexpr/opcodes.hpp b/numexpr/opcodes.hpp
index 6d02459..086c98e 100644
--- a/numexpr/opcodes.hpp
+++ b/numexpr/opcodes.hpp
@@ -150,19 +150,30 @@ OPCODE(106, OP_REDUCTION, NULL, T0, T0, T0, T0)
 /* Last argument in a reduction is the axis of the array the
    reduction should be applied along. */
 
-OPCODE(107, OP_SUM, NULL, T0, T0, T0, T0)
-OPCODE(108, OP_SUM_IIN, "sum_iin", Ti, Ti, Tn, T0)
-OPCODE(109, OP_SUM_LLN, "sum_lln", Tl, Tl, Tn, T0)
-OPCODE(110, OP_SUM_FFN, "sum_ffn", Tf, Tf, Tn, T0)
-OPCODE(111, OP_SUM_DDN, "sum_ddn", Td, Td, Tn, T0)
-OPCODE(112, OP_SUM_CCN, "sum_ccn", Tc, Tc, Tn, T0)
-
-OPCODE(113, OP_PROD, NULL, T0, T0, T0, T0)
-OPCODE(114, OP_PROD_IIN, "prod_iin", Ti, Ti, Tn, T0)
-OPCODE(115, OP_PROD_LLN, "prod_lln", Tl, Tl, Tn, T0)
-OPCODE(116, OP_PROD_FFN, "prod_ffn", Tf, Tf, Tn, T0)
-OPCODE(117, OP_PROD_DDN, "prod_ddn", Td, Td, Tn, T0)
-OPCODE(118, OP_PROD_CCN, "prod_ccn", Tc, Tc, Tn, T0)
+OPCODE(107, OP_SUM_IIN, "sum_iin", Ti, Ti, Tn, T0)
+OPCODE(108, OP_SUM_LLN, "sum_lln", Tl, Tl, Tn, T0)
+OPCODE(109, OP_SUM_FFN, "sum_ffn", Tf, Tf, Tn, T0)
+OPCODE(110, OP_SUM_DDN, "sum_ddn", Td, Td, Tn, T0)
+OPCODE(111, OP_SUM_CCN, "sum_ccn", Tc, Tc, Tn, T0)
+
+OPCODE(112, OP_PROD, NULL, T0, T0, T0, T0)
+OPCODE(113, OP_PROD_IIN, "prod_iin", Ti, Ti, Tn, T0)
+OPCODE(114, OP_PROD_LLN, "prod_lln", Tl, Tl, Tn, T0)
+OPCODE(115, OP_PROD_FFN, "prod_ffn", Tf, Tf, Tn, T0)
+OPCODE(116, OP_PROD_DDN, "prod_ddn", Td, Td, Tn, T0)
+OPCODE(117, OP_PROD_CCN, "prod_ccn", Tc, Tc, Tn, T0)
+
+OPCODE(118, OP_MIN, NULL, T0, T0, T0, T0)
+OPCODE(119, OP_MIN_IIN, "min_iin", Ti, Ti, Tn, T0)
+OPCODE(120, OP_MIN_LLN, "min_lln", Tl, Tl, Tn, T0)
+OPCODE(121, OP_MIN_FFN, "min_ffn", Tf, Tf, Tn, T0)
+OPCODE(122, OP_MIN_DDN, "min_ddn", Td, Td, Tn, T0)
+
+OPCODE(123, OP_MAX, NULL, T0, T0, T0, T0)
+OPCODE(124, OP_MAX_IIN, "max_iin", Ti, Ti, Tn, T0)
+OPCODE(125, OP_MAX_LLN, "max_lln", Tl, Tl, Tn, T0)
+OPCODE(126, OP_MAX_FFN, "max_ffn", Tf, Tf, Tn, T0)
+OPCODE(127, OP_MAX_DDN, "max_ddn", Td, Td, Tn, T0)
 
 /* Should be the last opcode */
-OPCODE(119, OP_END, NULL, T0, T0, T0, T0)
+OPCODE(128, OP_END, NULL, T0, T0, T0, T0)
diff --git a/numexpr/tests/test_numexpr.py b/numexpr/tests/test_numexpr.py
index 59fd19c..a971a95 100644
--- a/numexpr/tests/test_numexpr.py
+++ b/numexpr/tests/test_numexpr.py
@@ -93,21 +93,29 @@ class test_numexpr(TestCase):
                       (b'add_ddd', b't3', b't3', b'c2[2.0]'),
                       (b'prod_ddn', b'r0', b't3', 2)])
         # Check that full reductions work.
-        x = zeros(1e5) + .01  # checks issue #41
+        x = zeros(100000) + .01  # checks issue #41
         assert_allclose(evaluate("sum(x+2,axis=None)"), sum(x + 2, axis=None))
         assert_allclose(evaluate("sum(x+2,axis=0)"), sum(x + 2, axis=0))
         assert_allclose(evaluate("prod(x,axis=0)"), prod(x, axis=0))
+        assert_allclose(evaluate("min(x)"), np.min(x))
+        assert_allclose(evaluate("max(x,axis=0)"), np.max(x, axis=0))
 
         x = arange(10.0)
         assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x ** 2 + 2, axis=0))
         assert_allclose(evaluate("prod(x**2+2,axis=0)"), prod(x ** 2 + 2, axis=0))
+        assert_allclose(evaluate("min(x**2+2,axis=0)"), np.min(x ** 2 + 2, axis=0))
+        assert_allclose(evaluate("max(x**2+2,axis=0)"), np.max(x ** 2 + 2, axis=0))
 
         x = arange(100.0)
         assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x ** 2 + 2, axis=0))
         assert_allclose(evaluate("prod(x-1,axis=0)"), prod(x - 1, axis=0))
+        assert_allclose(evaluate("min(x-1,axis=0)"), np.min(x - 1, axis=0))
+        assert_allclose(evaluate("max(x-1,axis=0)"), np.max(x - 1, axis=0))
         x = linspace(0.1, 1.0, 2000)
         assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x ** 2 + 2, axis=0))
         assert_allclose(evaluate("prod(x-1,axis=0)"), prod(x - 1, axis=0))
+        assert_allclose(evaluate("min(x-1,axis=0)"), np.min(x - 1, axis=0))
+        assert_allclose(evaluate("max(x-1,axis=0)"), np.max(x - 1, axis=0))
 
         # Check that reductions along an axis work
         y = arange(9.0).reshape(3, 3)
@@ -117,15 +125,25 @@ class test_numexpr(TestCase):
         assert_allclose(evaluate("prod(y**2, axis=1)"), prod(y ** 2, axis=1))
         assert_allclose(evaluate("prod(y**2, axis=0)"), prod(y ** 2, axis=0))
         assert_allclose(evaluate("prod(y**2, axis=None)"), prod(y ** 2, axis=None))
+        assert_allclose(evaluate("min(y**2, axis=1)"), np.min(y ** 2, axis=1))
+        assert_allclose(evaluate("min(y**2, axis=0)"), np.min(y ** 2, axis=0))
+        assert_allclose(evaluate("min(y**2, axis=None)"), np.min(y ** 2, axis=None))
+        assert_allclose(evaluate("max(y**2, axis=1)"), np.max(y ** 2, axis=1))
+        assert_allclose(evaluate("max(y**2, axis=0)"), np.max(y ** 2, axis=0))
+        assert_allclose(evaluate("max(y**2, axis=None)"), np.max(y ** 2, axis=None))
         # Check integers
         x = arange(10.)
         x = x.astype(int)
         assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x ** 2 + 2, axis=0))
         assert_allclose(evaluate("prod(x**2+2,axis=0)"), prod(x ** 2 + 2, axis=0))
+        assert_allclose(evaluate("min(x**2+2,axis=0)"), np.min(x ** 2 + 2, axis=0))
+        assert_allclose(evaluate("max(x**2+2,axis=0)"), np.max(x ** 2 + 2, axis=0))
         # Check longs
         x = x.astype(long)
         assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x ** 2 + 2, axis=0))
         assert_allclose(evaluate("prod(x**2+2,axis=0)"), prod(x ** 2 + 2, axis=0))
+        assert_allclose(evaluate("min(x**2+2,axis=0)"), np.min(x ** 2 + 2, axis=0))
+        assert_allclose(evaluate("max(x**2+2,axis=0)"), np.max(x ** 2 + 2, axis=0))
         # Check complex
         x = x + .1j
         assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x ** 2 + 2, axis=0))
@@ -841,6 +859,7 @@ class test_threading_config(TestCase):
 
 # Case test for threads
 class test_threading(TestCase):
+
     def test_thread(self):
         import threading
 
@@ -851,6 +870,25 @@ class test_threading(TestCase):
 
         test = ThreadTest()
         test.start()
+        test.join()
+
+    def test_multithread(self):
+        import threading
+
+        # Running evaluate() from multiple threads shouldn't crash
+        def work(n):
+            a = arange(n)
+            evaluate('a+a')
+
+        work(10)  # warm compilation cache
+
+        nthreads = 30
+        threads = [threading.Thread(target=work, args=(1e5,))
+                   for i in range(nthreads)]
+        for t in threads:
+            t.start()
+        for t in threads:
+            t.join()
 
 
 # The worker function for the subprocess (needs to be here because Windows
diff --git a/numexpr/version.py b/numexpr/version.py
index 400d234..4393ee8 100644
--- a/numexpr/version.py
+++ b/numexpr/version.py
@@ -8,4 +8,4 @@
 #  rights to use.
 ####################################################################
 
-version = '2.4.6'
+version = '2.5'

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/numexpr.git