[SCM] Packaging for numexpr branch, upstream, updated. b73b60b9a4464e295f837f531c92d212825192fc
Antonio Valentino
antonio.valentino at tiscali.it
Sun Mar 11 17:36:41 UTC 2012
The following commit has been merged in the upstream branch:
commit b73b60b9a4464e295f837f531c92d212825192fc
Author: Antonio Valentino <antonio.valentino at tiscali.it>
Date: Sat Mar 10 19:14:03 2012 +0100
Imported Upstream version 2.0.1
diff --git a/ANNOUNCE.txt b/ANNOUNCE.txt
index 1e6cf69..5d4563a 100644
--- a/ANNOUNCE.txt
+++ b/ANNOUNCE.txt
@@ -1,18 +1,21 @@
==========================
- Announcing Numexpr 1.4.2
+ Announcing Numexpr 2.0.1
==========================
Numexpr is a fast numerical expression evaluator for NumPy. With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.
+It has multi-threaded capabilities, as well as support for Intel's
+VML library, which allows for squeezing the last drop of performance
+out of your multi-core processors.
+
What's new
==========
-This is a maintenance release. The most annying issues have been
-fixed (including the reduction mal-function introduced in 1.4 series).
-Also, several performance enhancements (specially for VML and small
-array operation) are included too.
+In this release, better docstrings for `evaluate` and the reduction
+methods (`sum`, `prod`) are in place. Also, compatibility with Python
+2.5 has been restored (2.4 is definitely not supported anymore).
In case you want to know more in detail what has changed in this
version, see:
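
For context, a minimal sketch of the kind of expression evaluation the
announcement describes (based on the README; speedups show up on much
larger arrays, and the printed values are illustrative):

>>> import numpy as np
>>> import numexpr as ne
>>> a = np.arange(5.0)
>>> b = np.arange(5.0)
>>> ne.evaluate("3*a + 4*b")   # accelerated, multi-threaded evaluation
array([  0.,   7.,  14.,  21.,  28.])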
diff --git a/AUTHORS.txt b/AUTHORS.txt
new file mode 100644
index 0000000..5ff9ab3
--- /dev/null
+++ b/AUTHORS.txt
@@ -0,0 +1,18 @@
+Numexpr was initially written by David Cooke, and extended to more
+types by Tim Hochberg.
+
+Francesc Alted contributed support for booleans and single-precision
+floating point types, efficient strided and unaligned array operations
+and multi-threading code.
+
+Ivan Vilata contributed support for strings.
+
+Gregor Thalhammer implemented the support for Intel VML (Vector Math
+Library).
+
+Mark Wiebe added support for the new iterator in NumPy, which allows
+for better performance in more scenarios (like broadcasting,
+Fortran-ordered arrays or non-native byte orderings).
+
+Gaëtan de Menten contributed important bug fixes and speed
+enhancements.
diff --git a/INSTALL.txt b/INSTALL.txt
index f12866d..5881d90 100644
--- a/INSTALL.txt
+++ b/INSTALL.txt
@@ -7,7 +7,9 @@ VML support.
Building
========
-`Numexpr` requires Python 2.4 or greater, and NumPy 1.0 or greater.
+This version of `Numexpr` requires Python 2.5 or greater,
+and NumPy 1.6 or greater.
+
It's built in the standard Python way:
$ python setup.py build
@@ -21,10 +23,10 @@ $ python -c "import numexpr; numexpr.test()"
Enabling Intel's VML support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Starting from release 1.2 on, numexpr includes support for Intel's VML
-library. This allows for better performance on Intel architectures,
-mainly when evaluating transcendental functions (trigonometrical,
-exponential...). It also enables numexpr using several CPU cores.
+numexpr includes support for Intel's VML library. This allows for
+better performance on Intel architectures, mainly when evaluating
+transcendental functions (trigonometric, exponential...). It also
+enables numexpr to use several CPU cores.
If you have Intel's MKL (the library that embeds VML), just copy the
`site.cfg.example` that comes in the distribution to `site.cfg` and
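
A quick way to verify whether VML support actually got compiled in; a
sketch assuming the `use_vml` flag exported by the package (output is
illustrative):

>>> import numexpr
>>> numexpr.use_vml            # True only if built against MKL/VML
True
>>> numexpr.set_vml_num_threads(2)
>>> numexpr.print_versions()   # also reports VML availability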
diff --git a/LICENSE.txt b/LICENSE.txt
index c583780..de9a582 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -1,5 +1,6 @@
Copyright (c) 2007,2008 David M. Cooke <david.m.cooke at gmail.com>
-Copyright (c) 2009 Francesc Alted <faltet at pytables.org>
+Copyright (c) 2009,2010 Francesc Alted <faltet at pytables.org>
+Copyright (c) 2011- See AUTHORS.txt
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
diff --git a/PKG-INFO b/PKG-INFO
index f6a0d7b..23e96a2 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,6 +1,6 @@
Metadata-Version: 1.0
Name: numexpr
-Version: 1.4.2
+Version: 2.0.1
Summary: Fast numerical expression evaluator for NumPy
Home-page: http://code.google.com/p/numexpr/
Author: David M. Cooke, Francesc Alted and others
diff --git a/README.txt b/README.txt
index da2c7ee..1267c4d 100644
--- a/README.txt
+++ b/README.txt
@@ -5,6 +5,14 @@ Numexpr is a fast numerical expression evaluator for NumPy. With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.
+In addition, its multi-threaded capabilities can make use of all your
+cores, which may accelerate computations, especially if they are
+not memory-bound (e.g. those using transcendental functions).
+
+Last but not least, numexpr can make use of Intel's VML (Vector Math
+Library, normally integrated in its Math Kernel Library, or MKL).
+This allows further acceleration of transcendental expressions.
+
Examples of use
===============
@@ -101,49 +109,63 @@ Supported functions
The next are the current supported set:
- * where(bool, number1, number2): number
- Number1 if the bool condition is true, number2 otherwise.
- * {sin,cos,tan}(float|complex): float|complex
- Trigonometric sine, cosine or tangent.
- * {arcsin,arccos,arctan}(float|complex): float|complex
- Trigonometric inverse sine, cosine or tangent.
- * arctan2(float1, float2): float
- Trigonometric inverse tangent of float1/float2.
- * {sinh,cosh,tanh}(float|complex): float|complex
- Hyperbolic sine, cosine or tangent.
- * {arcsinh,arccosh,arctanh}(float|complex): float|complex
- Hyperbolic inverse sine, cosine or tangent.
- * {log,log10,log1p}(float|complex): float|complex
- Natural, base-10 and log(1+x) logarithms.
- * {exp,expm1}(float|complex): float|complex
- Exponential and exponential minus one.
- * sqrt(float|complex): float|complex
- Square root.
- * abs(float|complex): float|complex
- Absolute value.
- * {real,imag}(complex): float
- Real or imaginary part of complex.
- * complex(float, float): complex
- Complex from real and imaginary parts.
+ * where(bool, number1, number2): number
+ Number1 if the bool condition is true, number2 otherwise.
+ * {sin,cos,tan}(float|complex): float|complex
+ Trigonometric sine, cosine or tangent.
+ * {arcsin,arccos,arctan}(float|complex): float|complex
+ Trigonometric inverse sine, cosine or tangent.
+ * arctan2(float1, float2): float
+ Trigonometric inverse tangent of float1/float2.
+ * {sinh,cosh,tanh}(float|complex): float|complex
+ Hyperbolic sine, cosine or tangent.
+ * {arcsinh,arccosh,arctanh}(float|complex): float|complex
+ Hyperbolic inverse sine, cosine or tangent.
+ * {log,log10,log1p}(float|complex): float|complex
+ Natural, base-10 and log(1+x) logarithms.
+ * {exp,expm1}(float|complex): float|complex
+ Exponential and exponential minus one.
+ * sqrt(float|complex): float|complex
+ Square root.
+ * abs(float|complex): float|complex
+ Absolute value.
+ * {real,imag}(complex): float
+ Real or imaginary part of complex.
+ * complex(float, float): complex
+ Complex from real and imaginary parts.
.. Notes:
- + `abs()` for complex inputs returns a ``complex`` output too.
- This is a departure from NumPy where a ``float`` is returned
- instead. However, Numexpr is not flexible enough yet so as to
- allow this to happen. Meanwhile, if you want to mimic NumPy
- behaviour, you may want to select the real part via the ``real``
- function (e.g. "real(abs(cplx))") or via the ``real`` selector
- (e.g. "abs(cplx).real").
+ + `abs()` for complex inputs returns a ``complex`` output too. This
+ is a departure from NumPy, where a ``float`` is returned instead.
+ However, Numexpr is not yet flexible enough to allow this.
+ Meanwhile, if you want to mimic NumPy behaviour, you may
+ want to select the real part via the ``real`` function
+ (e.g. "real(abs(cplx))") or via the ``real`` selector
+ (e.g. "abs(cplx).real").
More functions can be added if you need them.
+Supported reduction operations
+==============================
+
+The following reductions are currently supported:
+
+ * sum(number, axis=None): Sum of array elements over a given axis.
+ Negative axes are not supported.
+
+ * prod(number, axis=None): Product of array elements over a given
+ axis. Negative axes are not supported.
+
+
General routines
================
- * evaluate(expression, local_dict=None, global_dict=None, **kwargs):
- Evaluate a simple array expression element-wise. See examples above.
+ * evaluate(expression, local_dict=None, global_dict=None,
+ out=None, order='K', casting='safe', **kwargs):
+ Evaluate a simple array expression element-wise. See docstrings
+ for more info on parameters. Also, see examples above.
* test(): Run all the tests in the test suite.
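
A hedged illustration of the reductions and the extended `evaluate`
signature described above (parameter names as in the docstrings; the
numeric output is just what one would expect from NumPy):

>>> import numpy as np
>>> import numexpr as ne
>>> a = np.arange(10, dtype=np.float64)
>>> out = np.empty_like(a)
>>> ne.evaluate("2*a + 1", out=out)      # result stored in `out`
array([  1.,   3.,   5.,   7.,   9.,  11.,  13.,  15.,  17.,  19.])
>>> ne.evaluate("sum(a**2, axis=0)")     # negative axes raise ValueError
array(285.0)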
@@ -246,12 +268,7 @@ http://code.google.com/p/numexpr/wiki/Overview
Authors
=======
-Numexpr was initially written by David Cooke, and extended to more
-types by Tim Hochberg. Francesc Alted contributed support for
-booleans and simple-precision floating point types, efficient strided
-and unaligned array operations and multi-threading code. Ivan Vilata
-contributed support for strings. Gregor Thalhammer implemented the
-support for Intel VML (Vector Math Library).
+See AUTHORS.txt
License
diff --git a/RELEASE_NOTES.txt b/RELEASE_NOTES.txt
index 7ae1294..88beda3 100644
--- a/RELEASE_NOTES.txt
+++ b/RELEASE_NOTES.txt
@@ -1,7 +1,48 @@
======================================
- Release notes for Numexpr 1.4 series
+ Release notes for Numexpr 2.0 series
======================================
+Changes from 2.0 series to 2.0.1
+================================
+
+* Added compatibility with Python 2.5 (2.4 is definitely not supported
+ anymore).
+
+* `numexpr.evaluate` is fully documented now, in particular the new
+ `out`, `order` and `casting` parameters.
+
+* Reduction operations are fully documented now.
+
+* Negative axes in reductions are not supported (they never actually
+ have been), and a `ValueError` will be raised if they are used.
+
+
+Changes from 1.x series to 2.0
+==============================
+
+- Added support for the new iterator object in NumPy 1.6 and later.
+
+ This allows for better performance with operations that imply
+ broadcasting, Fortran-ordered arrays or non-native byte orderings.
+ Performance for other scenarios is preserved (except for very small
+ arrays).
+
+- Division in numexpr is consistent now with Python/NumPy. Fixes #22
+ and #58.
+
+- Constants like "2." or "2.0" are now evaluated as floats, not
+ integers. Fixes #59.
+
+- The `evaluate()` function has received a new parameter, `out`, for
+ storing the result in already allocated arrays. This is very useful
+ when dealing with large arrays and allocating new space for the
+ result is not acceptable. Closes #56.
+
+- Maximum number of threads raised from 256 to 4096. Machines with a
+ higher number of cores will still be able to import numexpr, but
+ will be limited to 4096 threads (already an absurdly high number).
+
+
Changes from 1.4.1 to 1.4.2
===========================
diff --git a/bench/boolean_timing.py b/bench/boolean_timing.py
index cd0e81e..711287e 100644
--- a/bench/boolean_timing.py
+++ b/bench/boolean_timing.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
import sys
import timeit
import numpy
diff --git a/bench/issue-47.py b/bench/issue-47.py
new file mode 100644
index 0000000..31c68a6
--- /dev/null
+++ b/bench/issue-47.py
@@ -0,0 +1,7 @@
+import numpy
+import numexpr
+
+numexpr.set_num_threads(8)
+x0,x1,x2,x3,x4,x5 = [0,1,2,3,4,5]
+t = numpy.linspace(0,1,44100000).reshape(-1,1)
+numexpr.evaluate('(x0+x1*t+x2*t**2)* cos(x3+x4*t+x5**t)')
diff --git a/bench/latency.py b/bench/latency.py
deleted file mode 100644
index 9fe7802..0000000
--- a/bench/latency.py
+++ /dev/null
@@ -1,33 +0,0 @@
-# Benchmark for checking if numexpr leaks memory when evaluating
-# expressions that changes continously. It also serves for computing
-# the latency of numexpr when working with small arrays.
-
-import sys
-from time import time
-import numpy as np
-import numexpr as ne
-
-N = 1000*10
-M = 1000
-
-print "Number of iterations %s. Length of the array: %s " % (N, M)
-
-a = np.arange(M)
-
-print "Expressions that *cannot* be cached:"
-t1 = time()
-for i in xrange(N):
- r = ne.evaluate("2 * a + (a + 1) ** 2 == %d" % i)
- if i % 1000 == 0:
- sys.stdout.write('.')
-
-print "\nEvaluated %s iterations in: %s seconds" % (N, round(time()-t1,3))
-
-print "Expressions that *can* be cached:"
-t1 = time()
-for i in xrange(N):
- r = ne.evaluate("2 * a + (a + 1) ** 2 == i")
- if i % 1000 == 0:
- sys.stdout.write('.')
-
-print "\nEvaluated %s iterations in: %s seconds" % (N, round(time()-t1,3))
diff --git a/bench/multidim.py b/bench/multidim.py
index 5bc6ae6..c3ddffe 100644
--- a/bench/multidim.py
+++ b/bench/multidim.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
# Script to check that multidimensional arrays are speed-up properly too
# Based on a script provided by Andrew Collette.
diff --git a/bench/poly.py b/bench/poly.py
index 56b9d88..68e295e 100644
--- a/bench/poly.py
+++ b/bench/poly.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
#######################################################################
# This script compares the speed of the computation of a polynomial
# for different (numpy and numexpr) in-memory paradigms.
diff --git a/bench/timing.py b/bench/timing.py
index 7c310fa..6c02614 100644
--- a/bench/timing.py
+++ b/bench/timing.py
@@ -1,7 +1,17 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
import timeit, numpy
array_size = 1e6
-iterations = 10
+iterations = 2
# Choose the type you want to benchmark
#dtype = 'int8'
diff --git a/bench/unaligned-simple.py b/bench/unaligned-simple.py
index 9bed22d..9edfb59 100644
--- a/bench/unaligned-simple.py
+++ b/bench/unaligned-simple.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
"""Very simple test that compares the speed of operating with
aligned vs unaligned arrays.
"""
@@ -13,7 +23,7 @@ shape = (1000, 10000) # multidimensional test
ne.print_versions()
Z_fast = np.zeros(shape, dtype=[('x',np.float64),('y',np.int64)])
-Z_slow = np.zeros(shape, dtype=[('x',np.float64),('y',np.bool)])
+Z_slow = np.zeros(shape, dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])
x_fast = Z_fast['x']
t = Timer("x_fast * x_fast", "from __main__ import x_fast")
diff --git a/bench/varying-expr.py b/bench/varying-expr.py
index a851f31..b9e1250 100644
--- a/bench/varying-expr.py
+++ b/bench/varying-expr.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
# Benchmark for checking if numexpr leaks memory when evaluating
# expressions that changes continously. It also serves for computing
# the latency of numexpr when working with small arrays.
@@ -7,33 +17,37 @@ from time import time
import numpy as np
import numexpr as ne
-N = 1000*10
-M = 1000
+N = 100
+M = 10
+
+def timed_eval(eval_func, expr_func):
+ t1 = time()
+ for i in xrange(N):
+ r = eval_func(expr_func(i))
+ if i % 10 == 0:
+ sys.stdout.write('.')
+ print " done in %s seconds" % round(time() - t1, 3)
print "Number of iterations %s. Length of the array: %s " % (N, M)
a = np.arange(M)
-print "Expressions that *cannot* be cached:"
-t1 = time()
-for i in xrange(N):
- r = ne.evaluate("2 * a + (a + 1) ** 2 == %d" % i)
- if i % 1000 == 0:
- sys.stdout.write('.')
-print "\nEvaluated %s iterations in: %s seconds" % (N, round(time()-t1,3))
-
-print "Expressions that *can* be cached:"
-t1 = time()
-for i in xrange(N):
- r = ne.evaluate("2 * a + (a + 1) ** 2 == i")
- if i % 1000 == 0:
- sys.stdout.write('.')
-print "\nEvaluated %s iterations in: %s seconds" % (N, round(time()-t1,3))
-
-print "Using python virtual machine with non-cacheable expressions:"
-t1 = time()
-for i in xrange(N):
- r = eval("2 * a + (a + 1) ** 2 == %d" % i)
- if i % 1000 == 0:
- sys.stdout.write('.')
-print "\nEvaluated %s iterations in: %s seconds" % (N, round(time()-t1,3))
+# lots of duplicates to collapse
+#expr = '+'.join('(a + 1) * %d' % i for i in range(50))
+# no duplicate to collapse
+expr = '+'.join('(a + %d) * %d' % (i, i) for i in range(50))
+
+def non_cacheable(i):
+ return expr + '+ %d' % i
+
+def cacheable(i):
+ return expr + '+ i'
+
+print "* Numexpr with non-cacheable expressions: ",
+timed_eval(ne.evaluate, non_cacheable)
+print "* Numexpr with cacheable expressions: ",
+timed_eval(ne.evaluate, cacheable)
+print "* Numpy with non-cacheable expressions: ",
+timed_eval(eval, non_cacheable)
+print "* Numpy with cacheable expressions: ",
+timed_eval(eval, cacheable)
diff --git a/bench/vml_timing.py b/bench/vml_timing.py
index 1f8b3c3..7775d11 100644
--- a/bench/vml_timing.py
+++ b/bench/vml_timing.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
import sys
import timeit
import numpy
@@ -71,7 +81,8 @@ def compare_times(expr, nexpr):
setupNP = """\
from numpy import arange, linspace, arctan2, sqrt, sin, cos, exp, log
from numpy import rec as records
-from numexpr import evaluate
+#from numexpr import evaluate
+from numexpr import %s
# Initialize a recarray of 16 MB in size
r=records.array(None, formats='a%s,i4,f4,f8', shape=%s)
@@ -85,11 +96,14 @@ f3[:] = linspace(0,1,len(i2))
f4[:] = f3*1.23
"""
-setupNP_contiguous = setupNP % ((4, array_size,) + \
+eval_method = "evaluate"
+setupNP_contiguous = setupNP % ((eval_method, 4, array_size,) + \
(".copy()",)*4 + \
(array_size,))
-setupNP_strided = setupNP % (4, array_size, "", "", "", "", array_size)
-setupNP_unaligned = setupNP % (1, array_size, "", "", "", "", array_size)
+setupNP_strided = setupNP % (eval_method, 4, array_size,
+ "", "", "", "", array_size)
+setupNP_unaligned = setupNP % (eval_method, 1, array_size,
+ "", "", "", "", array_size)
expressions = []
@@ -117,6 +131,8 @@ if __name__ == '__main__':
import numexpr
numexpr.print_versions()
+ numpy.seterr(all='ignore')
+
numexpr.set_vml_accuracy_mode('low')
numexpr.set_vml_num_threads(2)
@@ -132,6 +148,7 @@ if __name__ == '__main__':
ntratios = numpy.array(numpy_nttime) / numpy.array(numexpr_nttime)
+ print "eval method: %s" % eval_method
print "*************** Numexpr vs NumPy speed-ups *******************"
# print "numpy total:", sum(numpy_ttime)/iterations
# print "numpy strided total:", sum(numpy_sttime)/iterations
diff --git a/numexpr.egg-info/PKG-INFO b/numexpr.egg-info/PKG-INFO
index f6a0d7b..23e96a2 100644
--- a/numexpr.egg-info/PKG-INFO
+++ b/numexpr.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
Metadata-Version: 1.0
Name: numexpr
-Version: 1.4.2
+Version: 2.0.1
Summary: Fast numerical expression evaluator for NumPy
Home-page: http://code.google.com/p/numexpr/
Author: David M. Cooke, Francesc Alted and others
diff --git a/numexpr.egg-info/SOURCES.txt b/numexpr.egg-info/SOURCES.txt
index ea3af86..278a7ba 100644
--- a/numexpr.egg-info/SOURCES.txt
+++ b/numexpr.egg-info/SOURCES.txt
@@ -1,4 +1,5 @@
ANNOUNCE.txt
+AUTHORS.txt
INSTALL.txt
LICENSE.txt
MANIFEST.in
@@ -10,7 +11,7 @@ site.cfg.example
./numexpr/tests/test_numexpr.py
bench/boolean_timing.py
bench/issue-36.py
-bench/latency.py
+bench/issue-47.py
bench/multidim.py
bench/poly.py
bench/timing.py
diff --git a/numexpr/__init__.py b/numexpr/__init__.py
index 69b5d10..24c0b41 100644
--- a/numexpr/__init__.py
+++ b/numexpr/__init__.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
"""
Numexpr is a fast numerical expression evaluator for NumPy. With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
@@ -34,6 +44,9 @@ from numexpr.utils import (
# Initialize the number of threads to be used
ncores = detect_number_of_cores()
+# Check that we don't surpass the MAX_THREADS in interpreter.c
+if ncores > 4096:
+ ncores = 4096
set_num_threads(ncores)
# The default for VML is 1 thread (see #39)
set_vml_num_threads(1)
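
A sketch of how the thread pool is meant to be driven from user code
(both helpers are exported above; assuming, as in later releases, that
`set_num_threads` returns the previous setting):

>>> import numexpr as ne
>>> ne.detect_number_of_cores()    # e.g. on a quad-core machine
4
>>> old = ne.set_num_threads(2)    # restrict the pool, e.g. for benchmarks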
diff --git a/numexpr/complex_functions.inc b/numexpr/complex_functions.inc
index b3971a7..77e2be2 100644
--- a/numexpr/complex_functions.inc
+++ b/numexpr/complex_functions.inc
@@ -1,3 +1,11 @@
+/*********************************************************************
+ Numexpr - Fast numerical array expression evaluator for NumPy.
+
+ License: MIT
+ Author: See AUTHORS.txt
+
+ See LICENSE.txt for details about copyright and rights to use.
+**********************************************************************/
/* constants */
diff --git a/numexpr/cpuinfo.py b/numexpr/cpuinfo.py
index c353382..9fb3fe9 100644
--- a/numexpr/cpuinfo.py
+++ b/numexpr/cpuinfo.py
@@ -1,4 +1,15 @@
#!/usr/bin/env python
+
+###################################################################
+# cpuinfo - Get information about CPU
+#
+# License: BSD
+# Author: Pearu Peterson <pearu at cens.ioc.ee>
+#
+# See LICENSES/cpuinfo.txt for details about copyright and
+# rights to use.
+####################################################################
+
"""
cpuinfo
@@ -25,7 +36,7 @@ def getoutput(cmd, successful_status=(0,), stacklevel=1):
status, output = commands.getstatusoutput(cmd)
except EnvironmentError, e:
warnings.warn(str(e), UserWarning, stacklevel=stacklevel)
- return False, output
+ return False, ''
if os.WIFEXITED(status) and os.WEXITSTATUS(status) in successful_status:
return True, output
return False, output
diff --git a/numexpr/expressions.py b/numexpr/expressions.py
index bee19e8..abd1a9a 100644
--- a/numexpr/expressions.py
+++ b/numexpr/expressions.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
__all__ = ['E']
import operator
@@ -60,7 +70,7 @@ def ophelper(f):
if isConstant(x):
args[i] = x = ConstantNode(x)
if not isinstance(x, ExpressionNode):
- raise TypeError("unsupported object type: %s" % (type(x),))
+ raise TypeError("unsupported object type: %s" % type(x))
return f(*args)
func.__name__ = f.__name__
func.__doc__ = f.__doc__
@@ -92,6 +102,7 @@ def commonKind(nodes):
max_int32 = 2147483647
min_int32 = -max_int32 - 1
+
def bestConstantType(x):
if isinstance(x, str): # ``numpy.string_`` is a subclass of ``str``
return str
@@ -103,7 +114,7 @@ def bestConstantType(x):
# ``double`` objects are kept as is to allow the user to force
# promotion of results by using double constants, e.g. by operating
# a float (32-bit) array with a double (64-bit) constant.
- if isinstance(x, (double)):
+ if isinstance(x, double):
return double
# Numeric conversion to boolean values is not tried because
# ``bool(1) == True`` (same for 0 and False), so 0 and 1 would be
@@ -111,23 +122,24 @@ def bestConstantType(x):
# supported.
if isinstance(x, (bool, numpy.bool_)):
return bool
- # ``long`` is not explicitly needed since ``int`` automatically
- # returns longs when needed (since Python 2.3).
+ if isinstance(x, (int, numpy.integer)):
+ # Constants needing more than 32 bits are always
+ # considered ``long``, *regardless of the platform*, so we
+ # can clearly tell 32- and 64-bit constants apart.
+ if not (min_int32 <= x <= max_int32):
+ return long
+ return int
# The duality of float and double in Python avoids that we have to list
# ``double`` too.
- for converter in int, float, complex:
+ for converter in float, complex:
try:
y = converter(x)
except StandardError, err:
continue
- if x == y:
- # Constants needing more than 32 bits are always
- # considered ``long``, *regardless of the platform*, so we
- # can clearly tell 32- and 64-bit constants apart.
- if converter is int and not (min_int32 <= x <= max_int32):
- return long
+ if y == x:
return converter
+
def getKind(x):
converter = bestConstantType(x)
return type_to_kind[converter]
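
One reading of the constant-typing rules above, as seen from the user
side (the 32-bit/long split is platform-independent by design; the
dtypes shown are illustrative):

>>> import numpy as np
>>> import numexpr as ne
>>> a = np.arange(3, dtype=np.int32)
>>> ne.evaluate("a + 100").dtype           # constant fits in 32 bits
dtype('int32')
>>> ne.evaluate("a + 5000000000").dtype    # needs more bits: kind 'long'
dtype('int64')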
@@ -154,6 +166,9 @@ def func(func, minkind=None, maxkind=None):
kind = commonKind(args)
if kind in ('int', 'long'):
# Exception for following NumPy casting rules
+ #FIXME: this is not always desirable. The following
+ # functions return ints (for int inputs) in numpy
+ # but not in numexpr: copy, abs, fmod, ones_like
kind = 'double'
else:
# Apply regular casting rules
@@ -167,6 +182,7 @@ def func(func, minkind=None, maxkind=None):
@ophelper
def where_func(a, b, c):
if isinstance(a, ConstantNode):
+ #FIXME: This prevents where(True, a, b)
raise ValueError("too many dimensions")
if allConstantNodes([a,b,c]):
return ConstantNode(numpy.where(a, b, c))
@@ -179,12 +195,12 @@ def encode_axis(axis):
axis = interpreter.allaxes
else:
if axis < 0:
- axis = interpreter.maxdims - axis
+ raise ValueError("negative axis are not supported")
if axis > 254:
raise ValueError("cannot encode axis")
return RawNode(axis)
-def sum_func(a, axis=-1):
+def sum_func(a, axis=None):
axis = encode_axis(axis)
if isinstance(a, ConstantNode):
return a
@@ -192,7 +208,7 @@ def sum_func(a, axis=-1):
a = ConstantNode(a)
return FuncNode('sum', [a, axis], kind=a.astKind)
-def prod_func(a, axis=-1):
+def prod_func(a, axis=None):
axis = encode_axis(axis)
if isinstance(a, (bool, int, long, float, double, complex)):
a = ConstantNode(a)
@@ -210,8 +226,24 @@ def div_op(a, b):
return OpNode('div', [a,b])
@ophelper
+def truediv_op(a, b):
+ if get_optimization() in ('moderate', 'aggressive'):
+ if (isinstance(b, ConstantNode) and
+ (a.astKind == b.astKind) and
+ a.astKind in ('float', 'double', 'complex')):
+ return OpNode('mul', [a, ConstantNode(1./b.value)])
+ kind = commonKind([a, b])
+ if kind in ('bool', 'int', 'long'):
+ kind = 'double'
+ return OpNode('div', [a, b], kind=kind)
+
+@ophelper
+def rtruediv_op(a, b):
+ return truediv_op(b, a)
+
+@ophelper
def pow_op(a, b):
- if allConstantNodes([a,b]):
+ if allConstantNodes([a, b]):
return ConstantNode(a**b)
if isinstance(b, ConstantNode):
x = b.value
@@ -237,7 +269,8 @@ def pow_op(a, b):
p = OpNode('mul', [p,p])
if ishalfpower:
kind = commonKind([a])
- if kind in ('int', 'long'): kind = 'double'
+ if kind in ('int', 'long'):
+ kind = 'double'
r = multiply(r, OpNode('sqrt', [a], kind))
if r is None:
r = OpNode('ones_like', [a])
@@ -248,7 +281,7 @@ def pow_op(a, b):
if x == -1:
return OpNode('div', [ConstantNode(1),a])
if x == 0:
- return FuncNode('ones_like', [a])
+ return OpNode('ones_like', [a])
if x == 0.5:
kind = a.astKind
if kind in ('int', 'long'): kind = 'double'
@@ -345,12 +378,22 @@ class ExpressionNode(object):
def __pos__(self):
return self
+ # The next check is commented out. See #24 for more info.
+
+ # def __nonzero__(self):
+ # raise TypeError("You can't use Python's standard boolean operators in "
+ # "NumExpr expressions. You should use their bitwise "
+ # "counterparts instead: '&' instead of 'and', "
+ # "'|' instead of 'or', and '~' instead of 'not'.")
+
__add__ = __radd__ = binop('add')
__sub__ = binop('sub')
__rsub__ = binop('sub', reversed=True)
__mul__ = __rmul__ = binop('mul')
- __div__ = div_op
+ __div__ = div_op
__rdiv__ = binop('div', reversed=True)
+ __truediv__ = truediv_op
+ __rtruediv__ = rtruediv_op
__pow__ = pow_op
__rpow__ = binop('pow', reversed=True)
__mod__ = binop('mod')
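
Since this hunk wires `__truediv__`/`__rtruediv__` into the expression
nodes, a sketch of the now Python/NumPy-consistent division (per the
release notes; the constant-reciprocal rewrite in `truediv_op` only
fires under 'moderate' or 'aggressive' optimization with matching
kinds):

>>> import numpy as np
>>> import numexpr as ne
>>> a = np.arange(1, 5, dtype=np.float64)
>>> np.allclose(ne.evaluate("a / 3.0"), a / 3.0)
True
>>> # with optimization 'moderate' or 'aggressive', "a / 3.0" may be
>>> # compiled as "a * (1.0/3.0)", trading the division for a multiply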
diff --git a/numexpr/functions.inc b/numexpr/functions.inc
index 3466770..4db313d 100644
--- a/numexpr/functions.inc
+++ b/numexpr/functions.inc
@@ -1,4 +1,12 @@
// -*- c-mode -*-
+/*********************************************************************
+ Numexpr - Fast numerical array expression evaluator for NumPy.
+
+ License: MIT
+ Author: See AUTHORS.txt
+
+ See LICENSE.txt for details about copyright and rights to use.
+**********************************************************************/
/* These #if blocks make it easier to query this file, without having
to define every row function before #including it. */
diff --git a/numexpr/interp_body.c b/numexpr/interp_body.c
index 050e039..11be655 100644
--- a/numexpr/interp_body.c
+++ b/numexpr/interp_body.c
@@ -1,7 +1,25 @@
+/*********************************************************************
+ Numexpr - Fast numerical array expression evaluator for NumPy.
+
+ License: MIT
+ Author: See AUTHORS.txt
+
+ See LICENSE.txt for details about copyright and rights to use.
+**********************************************************************/
+
+
{
#define VEC_LOOP(expr) for(j = 0; j < BLOCK_SIZE; j++) { \
expr; \
}
+
+#define VEC_ARG0(expr) \
+ BOUNDS_CHECK(store_in); \
+ { \
+ char *dest = mem[store_in]; \
+ VEC_LOOP(expr); \
+ } break
+
#define VEC_ARG1(expr) \
BOUNDS_CHECK(store_in); \
BOUNDS_CHECK(arg1); \
@@ -9,7 +27,7 @@
char *dest = mem[store_in]; \
char *x1 = mem[arg1]; \
intp ss1 = params.memsizes[arg1]; \
- intp sb1 = params.memsteps[arg1]; \
+ intp sb1 = memsteps[arg1]; \
/* nowarns is defined and used so as to \
avoid compiler warnings about unused \
variables */ \
@@ -17,6 +35,7 @@
nowarns += 1; \
VEC_LOOP(expr); \
} break
+
#define VEC_ARG2(expr) \
BOUNDS_CHECK(store_in); \
BOUNDS_CHECK(arg1); \
@@ -25,14 +44,14 @@
char *dest = mem[store_in]; \
char *x1 = mem[arg1]; \
intp ss1 = params.memsizes[arg1]; \
- intp sb1 = params.memsteps[arg1]; \
+ intp sb1 = memsteps[arg1]; \
/* nowarns is defined and used so as to \
avoid compiler warnings about unused \
variables */ \
intp nowarns = ss1+sb1+*x1; \
char *x2 = mem[arg2]; \
intp ss2 = params.memsizes[arg2]; \
- intp sb2 = params.memsteps[arg2]; \
+ intp sb2 = memsteps[arg2]; \
nowarns += ss2+sb2+*x2; \
VEC_LOOP(expr); \
} break
@@ -46,17 +65,17 @@
char *dest = mem[store_in]; \
char *x1 = mem[arg1]; \
intp ss1 = params.memsizes[arg1]; \
- intp sb1 = params.memsteps[arg1]; \
+ intp sb1 = memsteps[arg1]; \
/* nowarns is defined and used so as to \
avoid compiler warnings about unused \
variables */ \
intp nowarns = ss1+sb1+*x1; \
char *x2 = mem[arg2]; \
intp ss2 = params.memsizes[arg2]; \
- intp sb2 = params.memsteps[arg2]; \
+ intp sb2 = memsteps[arg2]; \
char *x3 = mem[arg3]; \
intp ss3 = params.memsizes[arg3]; \
- intp sb3 = params.memsteps[arg3]; \
+ intp sb3 = memsteps[arg3]; \
nowarns += ss2+sb2+*x2; \
nowarns += ss3+sb3+*x3; \
VEC_LOOP(expr); \
@@ -95,37 +114,19 @@
expr; \
} break
-
- unsigned int pc, j, k, r;
+ unsigned int pc, j;
/* set up pointers to next block of inputs and outputs */
- mem[0] = params.output + index * params.memsteps[0];
- for (r = 0; r < params.n_inputs; r++) {
- struct index_data id = params.index_data[r+1];
- if (id.count) {
- mem[1+r] = params.inputs[r];
- for (j = 0; j < BLOCK_SIZE; j++) {
- unsigned int flatindex = 0;
- for (k = 0; k < id.count; k ++)
- flatindex += id.strides[k] * id.index[k];
- memcpy(mem[1+r] + (j*id.size), id.buffer + flatindex, id.size);
- k = id.count - 1;
- id.index[k] += 1;
- if (id.index[k] >= id.shape[k])
- while (id.index[k] >= id.shape[k]) {
- id.index[k] -= id.shape[k];
- if (k < 1) break;
- id.index[k-1] += 1;
- k -= 1;
- }
- }
- } else {
- mem[1+r] = params.inputs[r] + index * params.memsteps[1+r];
- }
- }
+#ifdef SINGLE_ITEM_CONST_LOOP
+ mem[0] = params.output;
+#else
+ /* use the iterator's inner loop data */
+ memcpy(mem, iter_dataptr, (1+params.n_inputs)*sizeof(char*));
+ memcpy(memsteps, iter_strides, (1+params.n_inputs)*sizeof(intp));
+#endif
/* WARNING: From now on, only do references to mem[arg[123]]
- & params.memsteps[arg[123]] inside the VEC_ARG[123] macros,
+ & memsteps[arg[123]] inside the VEC_ARG[123] macros,
or you will risk accessing invalid addresses. */
for (pc = 0; pc < params.prog_len; pc += 4) {
@@ -134,14 +135,22 @@
unsigned int arg1 = params.program[pc+2];
unsigned int arg2 = params.program[pc+3];
#define arg3 params.program[pc+5]
- #define store_index params.index_data[store_in]
- #define reduce_ptr (dest + flat_index(&store_index, j))
- #define i_reduce *(int *)reduce_ptr
- #define l_reduce *(long long *)reduce_ptr
- #define f_reduce *(float *)reduce_ptr
- #define d_reduce *(double *)reduce_ptr
- #define cr_reduce *(double *)ptr
- #define ci_reduce *((double *)ptr+1)
+ /* Iterator reduce macros */
+#ifdef REDUCTION_INNER_LOOP /* Reduce is the inner loop */
+ #define i_reduce *(int *)dest
+ #define l_reduce *(long long *)dest
+ #define f_reduce *(float *)dest
+ #define d_reduce *(double *)dest
+ #define cr_reduce *(double *)dest
+ #define ci_reduce *((double *)dest+1)
+#else /* Reduce is the outer loop */
+ #define i_reduce i_dest
+ #define l_reduce l_dest
+ #define f_reduce f_dest
+ #define d_reduce d_dest
+ #define cr_reduce cr_dest
+ #define ci_reduce ci_dest
+#endif
#define b_dest ((char *)dest)[j]
#define i_dest ((int *)dest)[j]
#define l_dest ((long long *)dest)[j]
@@ -149,7 +158,7 @@
#define d_dest ((double *)dest)[j]
#define cr_dest ((double *)dest)[2*j]
#define ci_dest ((double *)dest)[2*j+1]
- #define s_dest ((char *)dest + j*params.memsteps[store_in])
+ #define s_dest ((char *)dest + j*memsteps[store_in])
#define b1 ((char *)(x1+j*sb1))[0]
#define i1 ((int *)(x1+j*sb1))[0]
#define l1 ((long long *)(x1+j*sb1))[0]
@@ -177,7 +186,6 @@
/* Some temporaries */
double da, db;
cdouble ca, cb;
- char *ptr;
switch (op) {
@@ -194,6 +202,7 @@
case OP_COPY_DD: VEC_ARG1(memcpy(&d_dest, s1, sizeof(double)));
case OP_COPY_CC: VEC_ARG1(memcpy(&cr_dest, s1, sizeof(double)*2));
+ /* Bool */
case OP_INVERT_BB: VEC_ARG1(b_dest = !b1);
case OP_AND_BBB: VEC_ARG2(b_dest = (b1 && b2));
case OP_OR_BBB: VEC_ARG2(b_dest = (b1 || b2));
@@ -202,6 +211,7 @@
case OP_NE_BBB: VEC_ARG2(b_dest = (b1 != b2));
case OP_WHERE_BBBB: VEC_ARG3(b_dest = b1 ? b2 : b3);
+ /* Comparisons */
case OP_GT_BII: VEC_ARG2(b_dest = (i1 > i2));
case OP_GE_BII: VEC_ARG2(b_dest = (i1 >= i2));
case OP_EQ_BII: VEC_ARG2(b_dest = (i1 == i2));
@@ -227,8 +237,9 @@
case OP_EQ_BSS: VEC_ARG2(b_dest = (stringcmp(s1, s2, ss1, ss2) == 0));
case OP_NE_BSS: VEC_ARG2(b_dest = (stringcmp(s1, s2, ss1, ss2) != 0));
+ /* Int */
case OP_CAST_IB: VEC_ARG1(i_dest = (int)(b1));
- case OP_ONES_LIKE_I: VEC_ARG1(i_dest = 1);
+ case OP_ONES_LIKE_II: VEC_ARG0(i_dest = 1);
case OP_NEG_II: VEC_ARG1(i_dest = -i1);
case OP_ADD_III: VEC_ARG2(i_dest = i1 + i2);
@@ -240,8 +251,9 @@
case OP_WHERE_IBII: VEC_ARG3(i_dest = b1 ? i2 : i3);
+ /* Long */
case OP_CAST_LI: VEC_ARG1(l_dest = (long long)(i1));
- case OP_ONES_LIKE_L: VEC_ARG1(l_dest = 1);
+ case OP_ONES_LIKE_LL: VEC_ARG0(l_dest = 1);
case OP_NEG_LL: VEC_ARG1(l_dest = -l1);
case OP_ADD_LLL: VEC_ARG2(l_dest = l1 + l2);
@@ -253,10 +265,10 @@
case OP_WHERE_LBLL: VEC_ARG3(l_dest = b1 ? l2 : l3);
- /* Float */
+ /* Float */
case OP_CAST_FI: VEC_ARG1(f_dest = (float)(i1));
case OP_CAST_FL: VEC_ARG1(f_dest = (float)(l1));
- case OP_ONES_LIKE_F: VEC_ARG1(f_dest = 1.0);
+ case OP_ONES_LIKE_FF: VEC_ARG0(f_dest = 1.0);
case OP_NEG_FF: VEC_ARG1(f_dest = -f1);
case OP_ADD_FFF: VEC_ARG2(f_dest = f1 + f2);
@@ -303,11 +315,11 @@
VEC_ARG2(f_dest = functions_fff[arg3](f1, f2));
#endif
- /* Double */
+ /* Double */
case OP_CAST_DI: VEC_ARG1(d_dest = (double)(i1));
case OP_CAST_DL: VEC_ARG1(d_dest = (double)(l1));
case OP_CAST_DF: VEC_ARG1(d_dest = (double)(f1));
- case OP_ONES_LIKE_D: VEC_ARG1(d_dest = 1.0);
+ case OP_ONES_LIKE_DD: VEC_ARG0(d_dest = 1.0);
case OP_NEG_DD: VEC_ARG1(d_dest = -d1);
case OP_ADD_DDD: VEC_ARG2(d_dest = d1 + d2);
@@ -354,7 +366,7 @@
VEC_ARG2(d_dest = functions_ddd[arg3](d1, d2));
#endif
- /* Complex */
+ /* Complex */
case OP_CAST_CI: VEC_ARG1(cr_dest = (double)(i1);
ci_dest = 0);
case OP_CAST_CL: VEC_ARG1(cr_dest = (double)(l1);
@@ -363,8 +375,8 @@
ci_dest = 0);
case OP_CAST_CD: VEC_ARG1(cr_dest = d1;
ci_dest = 0);
- case OP_ONES_LIKE_C: VEC_ARG1(cr_dest = 1;
- ci_dest = 0);
+ case OP_ONES_LIKE_CC: VEC_ARG0(cr_dest = 1;
+ ci_dest = 0);
case OP_NEG_CC: VEC_ARG1(cr_dest = -c1r;
ci_dest = -c1i);
@@ -415,20 +427,19 @@
case OP_COMPLEX_CDD: VEC_ARG2(cr_dest = d1;
ci_dest = d2);
+ /* Reductions */
case OP_SUM_IIN: VEC_ARG1(i_reduce += i1);
case OP_SUM_LLN: VEC_ARG1(l_reduce += l1);
case OP_SUM_FFN: VEC_ARG1(f_reduce += f1);
case OP_SUM_DDN: VEC_ARG1(d_reduce += d1);
- case OP_SUM_CCN: VEC_ARG1(ptr = reduce_ptr;
- cr_reduce += c1r;
+ case OP_SUM_CCN: VEC_ARG1(cr_reduce += c1r;
ci_reduce += c1i);
case OP_PROD_IIN: VEC_ARG1(i_reduce *= i1);
case OP_PROD_LLN: VEC_ARG1(l_reduce *= l1);
case OP_PROD_FFN: VEC_ARG1(f_reduce *= f1);
case OP_PROD_DDN: VEC_ARG1(d_reduce *= d1);
- case OP_PROD_CCN: VEC_ARG1(ptr = reduce_ptr;
- da = cr_reduce*c1r - ci_reduce*c1i;
+ case OP_PROD_CCN: VEC_ARG1(da = cr_reduce*c1r - ci_reduce*c1i;
ci_reduce = cr_reduce*c1i + ci_reduce*c1r;
cr_reduce = da);
diff --git a/numexpr/interpreter.c b/numexpr/interpreter.c
index 82a0214..b58d2c7 100644
--- a/numexpr/interpreter.c
+++ b/numexpr/interpreter.c
@@ -1,7 +1,19 @@
+/*********************************************************************
+ Numexpr - Fast numerical array expression evaluator for NumPy.
+
+ License: MIT
+ Author: See AUTHORS.txt
+
+ See LICENSE.txt for details about copyright and rights to use.
+**********************************************************************/
+
+
#include "Python.h"
#include "structmember.h"
#include "numpy/noprefix.h"
#include "numpy/arrayscalars.h"
+#include "numpy/ndarrayobject.h"
+#include "numpy/npy_cpu.h"
#include "math.h"
#include "string.h"
#include "assert.h"
@@ -27,12 +39,17 @@
#endif
#ifdef _WIN32
-#define inline __inline
-#include "missing_posix_functions.inc"
-#include "msvc_function_stubs.inc"
+ #define inline __inline
+ #ifndef __MINGW32__
+ #include "missing_posix_functions.inc"
+ #endif
+ #include "msvc_function_stubs.inc"
#endif
-#define L1_SIZE 32*1024 /* The average L1 cache size */
+/* x86 platform works with unaligned reads and writes */
+#if (defined(NPY_CPU_X86) || defined(NPY_CPU_AMD64))
+# define USE_UNALIGNED_ACCESS 1
+#endif
#ifdef USE_VML
/* The values below have been tuned for a nowadays Core2 processor */
@@ -51,9 +68,10 @@
#define BLOCK_SIZE2 16
#endif
-
-/* The maximum number of threads (for some static arrays) */
-#define MAX_THREADS 256
+/* The maximum number of threads (for some static arrays).
+ * Choose this large enough for most monsters out there.
+ * Keep this in sync with the number in __init__.py. */
+#define MAX_THREADS 4096
/* Global variables for threads */
int nthreads = 1; /* number of desired threads in pool */
@@ -413,11 +431,15 @@ get_return_sig(PyObject* program) {
int sig;
char last_opcode;
int end = PyString_Size(program);
+ char *program_str = PyString_AS_STRING(program);
+
do {
end -= 4;
if (end < 0) return 'X';
+ last_opcode = program_str[end];
}
- while ((last_opcode = PyString_AS_STRING(program)[end]) == OP_NOOP);
+ while (last_opcode == OP_NOOP);
+
sig = op_signature(last_opcode, 0);
if (sig <= 0) {
return 'X';
@@ -906,34 +928,18 @@ struct thread_data {
struct vm_params params;
int ret_code;
int *pc_error;
+ char **errmsg;
+ /* One memsteps array per thread */
+ intp *memsteps[MAX_THREADS];
+ /* One iterator per thread */
+ NpyIter *iter[MAX_THREADS];
+ /* When doing nested iteration for a reduction */
+ NpyIter *reduce_iter[MAX_THREADS];
+ /* Flag indicating reduction is the outer loop instead of the inner */
+ int reduction_outer_loop;
} th_params;
-static inline unsigned int
-flat_index(struct index_data *id, unsigned int j) {
- int i, k = id->count - 1;
- unsigned int findex = id->findex;
- if (k < 0) return 0;
- if (findex == -1) {
- findex = 0;
- for (i = 0; i < id->count; i++)
- findex += id->strides[i] * id->index[i];
- }
- id->index[k] += 1;
- if (id->index[k] >= id->shape[k]) {
- while (id->index[k] >= id->shape[k]) {
- id->index[k] -= id->shape[k];
- if (k < 1) break;
- id->index[--k] += 1;
- }
- id->findex = -1;
- } else {
- id->findex = findex + id->strides[k];
- }
- return findex;
-}
-
-
#define DO_BOUNDS_CHECK 1
#if DO_BOUNDS_CHECK
@@ -991,52 +997,153 @@ free_temps_space(struct vm_params params, char **mem)
}
}
-/* Serial version of VM engine */
-static inline int
-vm_engine_serial(intp start, intp vlen, intp block_size,
- struct vm_params params, int *pc_error)
+/* Serial/parallel task iterator version of the VM engine */
+static int
+vm_engine_iter_task(NpyIter *iter, intp *memsteps, struct vm_params params, int *pc_error, char **errmsg)
{
- intp index;
char **mem = params.mem;
- get_temps_space(params, mem, block_size);
- for (index = start; index < vlen; index += block_size) {
-#define BLOCK_SIZE block_size
+ NpyIter_IterNextFunc *iternext;
+ intp block_size, *size_ptr;
+ char **iter_dataptr;
+ intp *iter_strides;
+
+ iternext = NpyIter_GetIterNext(iter, errmsg);
+ if (iternext == NULL) {
+ return -1;
+ }
+
+ size_ptr = NpyIter_GetInnerLoopSizePtr(iter);
+ iter_dataptr = NpyIter_GetDataPtrArray(iter);
+ iter_strides = NpyIter_GetInnerStrideArray(iter);
+
+ /*
+ * First do all the blocks with a compile-time fixed size.
+ * This makes a big difference (30-50% on some tests).
+ */
+ block_size = *size_ptr;
+ while (block_size == BLOCK_SIZE1) {
+#define REDUCTION_INNER_LOOP
+#define BLOCK_SIZE BLOCK_SIZE1
#include "interp_body.c"
#undef BLOCK_SIZE
+#undef REDUCTION_INNER_LOOP
+ iternext(iter);
+ block_size = *size_ptr;
}
- free_temps_space(params, mem);
+
+ /* Then finish off the rest */
+ if (block_size > 0) do {
+#define REDUCTION_INNER_LOOP
+#define BLOCK_SIZE block_size
+#include "interp_body.c"
+#undef BLOCK_SIZE
+#undef REDUCTION_INNER_LOOP
+ } while (iternext(iter));
+
return 0;
}
-/* Serial version of VM engine (specific for BLOCK_SIZE1) */
-static inline int
-vm_engine_serial1(intp start, intp vlen,
- struct vm_params params, int *pc_error)
+static int
+vm_engine_iter_outer_reduce_task(NpyIter *iter, intp *memsteps, struct vm_params params, int *pc_error, char **errmsg)
{
- intp index;
char **mem = params.mem;
- get_temps_space(params, mem, BLOCK_SIZE1);
- for (index = start; index < vlen; index += BLOCK_SIZE1) {
+ NpyIter_IterNextFunc *iternext;
+ intp block_size, *size_ptr;
+ char **iter_dataptr;
+ intp *iter_strides;
+
+ iternext = NpyIter_GetIterNext(iter, errmsg);
+ if (iternext == NULL) {
+ return -1;
+ }
+
+ size_ptr = NpyIter_GetInnerLoopSizePtr(iter);
+ iter_dataptr = NpyIter_GetDataPtrArray(iter);
+ iter_strides = NpyIter_GetInnerStrideArray(iter);
+
+ /*
+ * First do all the blocks with a compile-time fixed size.
+ * This makes a big difference (30-50% on some tests).
+ */
+ block_size = *size_ptr;
+ while (block_size == BLOCK_SIZE1) {
#define BLOCK_SIZE BLOCK_SIZE1
#include "interp_body.c"
#undef BLOCK_SIZE
+ iternext(iter);
+ block_size = *size_ptr;
}
- free_temps_space(params, mem);
+
+ /* Then finish off the rest */
+ if (block_size > 0) do {
+#define BLOCK_SIZE block_size
+#include "interp_body.c"
+#undef BLOCK_SIZE
+ } while (iternext(iter));
+
return 0;
}
-/* Parallel version of VM engine */
-static inline int
-vm_engine_parallel(intp start, intp vlen, intp block_size,
- struct vm_params params, int *pc_error)
+/* Parallel iterator version of VM engine */
+static int
+vm_engine_iter_parallel(NpyIter *iter, struct vm_params params, int *pc_error,
+ char **errmsg)
{
+ unsigned int i;
+ npy_intp numblocks, taskfactor;
+
+ if (errmsg == NULL) {
+ return -1;
+ }
+
/* Populate parameters for worker threads */
- th_params.start = start;
- th_params.vlen = vlen;
- th_params.block_size = block_size;
+ NpyIter_GetIterIndexRange(iter, &th_params.start, &th_params.vlen);
+ /*
+ * Try to make it so each thread gets 16 tasks. This is a compromise
+ * between 1 task per thread and one block per task.
+ */
+ taskfactor = 16*BLOCK_SIZE1*nthreads;
+ numblocks = (th_params.vlen - th_params.start + taskfactor - 1) /
+ taskfactor;
+ th_params.block_size = numblocks * BLOCK_SIZE1;
+
th_params.params = params;
th_params.ret_code = 0;
th_params.pc_error = pc_error;
+ th_params.errmsg = errmsg;
+ th_params.iter[0] = iter;
+ /* Make one copy for each additional thread */
+ for (i = 1; i < nthreads; ++i) {
+ th_params.iter[i] = NpyIter_Copy(iter);
+ if (th_params.iter[i] == NULL) {
+ --i;
+ for (; i > 0; --i) {
+ NpyIter_Deallocate(th_params.iter[i]);
+ }
+ return -1;
+ }
+ }
+ th_params.memsteps[0] = params.memsteps;
+ /* Make one copy of memsteps for each additional thread */
+ for (i = 1; i < nthreads; ++i) {
+ th_params.memsteps[i] = PyMem_New(intp,
+ 1 + params.n_inputs + params.n_constants + params.n_temps);
+ if (th_params.memsteps[i] == NULL) {
+ --i;
+ for (; i > 0; --i) {
+ PyMem_Del(th_params.memsteps[i]);
+ }
+ for (i = 0; i < nthreads; ++i) {
+ NpyIter_Deallocate(th_params.iter[i]);
+ }
+ return -1;
+ }
+ memcpy(th_params.memsteps[i], th_params.memsteps[0],
+ sizeof(intp) *
+ (1 + params.n_inputs + params.n_constants + params.n_temps));
+ }
+
+ Py_BEGIN_ALLOW_THREADS;
/* Synchronization point for all threads (wait for initialization) */
pthread_mutex_lock(&count_threads_mutex);
@@ -1060,40 +1167,26 @@ vm_engine_parallel(intp start, intp vlen, intp block_size,
}
pthread_mutex_unlock(&count_threads_mutex);
- return th_params.ret_code;
-}
+ Py_END_ALLOW_THREADS;
-/* VM engine for each thread (specific for BLOCK_SIZE1) */
-static inline int
-vm_engine_thread1(char **mem, intp index,
- struct vm_params params, int *pc_error)
-{
-#define BLOCK_SIZE BLOCK_SIZE1
-#include "interp_body.c"
-#undef BLOCK_SIZE
- return 0;
-}
+ /* Deallocate all the iterator and memsteps copies */
+ for (i = 1; i < nthreads; ++i) {
+ NpyIter_Deallocate(th_params.iter[i]);
+ PyMem_Del(th_params.memsteps[i]);
+ }
-/* VM engine for each threadi (general) */
-static inline int
-vm_engine_thread(char **mem, intp index, intp block_size,
- struct vm_params params, int *pc_error)
-{
-#define BLOCK_SIZE block_size
-#include "interp_body.c"
-#undef BLOCK_SIZE
- return 0;
+ return th_params.ret_code;
}
/* Do the worker job for a certain thread */
-void *th_worker(void *tids)
+void *th_worker(void *tidptr)
{
- /* int tid = *(int *)tids; */
- intp index; /* private copy of gindex */
+ int tid = *(int *)tidptr;
/* Parameters for threads */
intp start;
intp vlen;
intp block_size;
+ NpyIter *iter;
struct vm_params params;
int *pc_error;
int ret;
@@ -1102,6 +1195,9 @@ void *th_worker(void *tids)
int n_temps;
size_t memsize;
char **mem;
+ intp *memsteps;
+ intp istart, iend;
+ char **errmsg;
while (1) {
@@ -1138,47 +1234,72 @@ void *th_worker(void *tids)
/* XXX malloc seems thread safe for POSIX, but for Win? */
mem = malloc(memsize);
memcpy(mem, params.mem, memsize);
- /* Get temporary space for each thread */
- ret = get_temps_space(params, mem, block_size);
- if (ret < 0) {
- pthread_mutex_lock(&count_mutex);
- giveup = 1;
- /* Propagate error to main thread */
- th_params.ret_code = ret;
- pthread_mutex_unlock(&count_mutex);
- }
+
+ errmsg = th_params.errmsg;
+
+ params.mem = mem;
/* Loop over blocks */
pthread_mutex_lock(&count_mutex);
if (!init_sentinels_done) {
/* Set sentinels and other global variables */
gindex = start;
- index = gindex;
- init_sentinels_done = 1; /* sentinels have been initialised */
+ istart = gindex;
+ iend = istart + block_size;
+ if (iend > vlen) {
+ iend = vlen;
+ }
+ init_sentinels_done = 1; /* sentinels have been initialised */
giveup = 0; /* no giveup initially */
} else {
gindex += block_size;
- index = gindex;
+ istart = gindex;
+ iend = istart + block_size;
+ if (iend > vlen) {
+ iend = vlen;
+ }
+ }
+ /* Grab one of the iterators */
+ iter = th_params.iter[tid];
+ if (iter == NULL) {
+ th_params.ret_code = -1;
+ giveup = 1;
+ }
+ memsteps = th_params.memsteps[tid];
+ /* Get temporary space for each thread */
+ ret = get_temps_space(params, mem, BLOCK_SIZE1);
+ if (ret < 0) {
+ /* Propagate error to main thread */
+ th_params.ret_code = ret;
+ giveup = 1;
}
pthread_mutex_unlock(&count_mutex);
- while (index < vlen && !giveup) {
- if (block_size == BLOCK_SIZE1) {
- ret = vm_engine_thread1(mem, index, params, pc_error);
- }
- else {
- ret = vm_engine_thread(mem, index, block_size,
- params, pc_error);
+
+ while (istart < vlen && !giveup) {
+ /* Reset the iterator to the range for this task */
+ ret = NpyIter_ResetToIterIndexRange(iter, istart, iend,
+ errmsg);
+ /* Execute the task */
+ if (ret >= 0) {
+ ret = vm_engine_iter_task(iter, memsteps, params, pc_error, errmsg);
}
+
if (ret < 0) {
pthread_mutex_lock(&count_mutex);
giveup = 1;
/* Propagate error to main thread */
th_params.ret_code = ret;
pthread_mutex_unlock(&count_mutex);
+ break;
}
+
pthread_mutex_lock(&count_mutex);
gindex += block_size;
- index = gindex;
+ istart = gindex;
+ iend = istart + block_size;
+ if (iend > vlen) {
+ iend = vlen;
+ }
pthread_mutex_unlock(&count_mutex);
}
@@ -1203,87 +1324,155 @@ void *th_worker(void *tids)
return(0);
}
-/* Compute expresion in [start:vlen], if possible with threads */
-static inline int
-vm_engine_block(intp start, intp vlen, intp block_size,
- struct vm_params params, int *pc_error)
+static int
+run_interpreter(NumExprObject *self, NpyIter *iter, NpyIter *reduce_iter,
+ int reduction_outer_loop, int *pc_error)
{
int r;
+ intp plen;
+ struct vm_params params;
+ char *errmsg = NULL;
- /* From now on, we can release the GIL */
- Py_BEGIN_ALLOW_THREADS;
- /* Run the serial version when nthreads is 1 or when the total
- length to compute is small */
- if ((nthreads == 1) || (vlen <= L1_SIZE) || force_serial) {
- if (block_size == BLOCK_SIZE1) {
- r = vm_engine_serial1(start, vlen, params, pc_error);
+ *pc_error = -1;
+ if (PyString_AsStringAndSize(self->program, (char **)&(params.program),
+ &plen) < 0) {
+ return -1;
+ }
+
+ params.prog_len = plen;
+ params.output = NULL;
+ params.inputs = NULL;
+ params.index_data = NULL;
+ params.n_inputs = self->n_inputs;
+ params.n_constants = self->n_constants;
+ params.n_temps = self->n_temps;
+ params.mem = self->mem;
+ params.memsteps = self->memsteps;
+ params.memsizes = self->memsizes;
+ params.r_end = PyString_Size(self->fullsig);
+
+ if ((nthreads == 1) || force_serial) {
+ /* Can do it as one "task" */
+ if (reduce_iter == NULL) {
+ /* Reset the iterator to allocate its buffers */
+ if(NpyIter_Reset(iter, NULL) != NPY_SUCCEED) {
+ return -1;
+ }
+ get_temps_space(params, params.mem, BLOCK_SIZE1);
+ Py_BEGIN_ALLOW_THREADS;
+ r = vm_engine_iter_task(iter, params.memsteps,
+ params, pc_error, &errmsg);
+ Py_END_ALLOW_THREADS;
+ free_temps_space(params, params.mem);
}
else {
- r = vm_engine_serial(start, vlen, block_size, params, pc_error);
+ if (reduction_outer_loop) {
+ char **dataptr;
+ NpyIter_IterNextFunc *iternext;
+
+ dataptr = NpyIter_GetDataPtrArray(reduce_iter);
+ iternext = NpyIter_GetIterNext(reduce_iter, NULL);
+ if (iternext == NULL) {
+ return -1;
+ }
+
+ get_temps_space(params, params.mem, BLOCK_SIZE1);
+ Py_BEGIN_ALLOW_THREADS;
+ do {
+ r = NpyIter_ResetBasePointers(iter, dataptr, &errmsg);
+ if (r >= 0) {
+ r = vm_engine_iter_outer_reduce_task(iter,
+ params.memsteps, params,
+ pc_error, &errmsg);
+ }
+ if (r < 0) {
+ break;
+ }
+ } while (iternext(reduce_iter));
+ Py_END_ALLOW_THREADS;
+ free_temps_space(params, params.mem);
+ }
+ else {
+ char **dataptr;
+ NpyIter_IterNextFunc *iternext;
+
+ dataptr = NpyIter_GetDataPtrArray(iter);
+ iternext = NpyIter_GetIterNext(iter, NULL);
+ if (iternext == NULL) {
+ return -1;
+ }
+
+ get_temps_space(params, params.mem, BLOCK_SIZE1);
+ Py_BEGIN_ALLOW_THREADS;
+ do {
+ r = NpyIter_ResetBasePointers(reduce_iter, dataptr,
+ &errmsg);
+ if (r >= 0) {
+ r = vm_engine_iter_task(reduce_iter, params.memsteps,
+ params, pc_error, &errmsg);
+ }
+ if (r < 0) {
+ break;
+ }
+ } while (iternext(iter));
+ Py_END_ALLOW_THREADS;
+ free_temps_space(params, params.mem);
+ }
}
}
else {
- r = vm_engine_parallel(start, vlen, block_size, params, pc_error);
+ if (reduce_iter == NULL) {
+ r = vm_engine_iter_parallel(iter, params, pc_error, &errmsg);
+ }
+ else {
+ errmsg = "Parallel engine doesn't support reduction yet";
+ r = -1;
+ }
+ }
+
+ if (r < 0 && errmsg != NULL) {
+ PyErr_SetString(PyExc_RuntimeError, errmsg);
}
- /* Get the GIL again */
- Py_END_ALLOW_THREADS;
- return r;
-}
-static inline int
-vm_engine_rest(intp start, intp blen,
- struct vm_params params, int *pc_error)
-{
- intp index = start;
- intp block_size = blen - start;
- char **mem = params.mem;
- get_temps_space(params, mem, block_size);
-#define BLOCK_SIZE block_size
-#include "interp_body.c"
-#undef BLOCK_SIZE
- free_temps_space(params, mem);
return 0;
}
static int
-run_interpreter(NumExprObject *self, intp len, char *output, char **inputs,
- struct index_data *index_data, int *pc_error)
+run_interpreter_const(NumExprObject *self, char *output, int *pc_error)
{
- int r;
- intp plen;
- intp blen1, blen2;
struct vm_params params;
+ intp plen;
+ char **mem;
+ intp *memsteps;
*pc_error = -1;
if (PyString_AsStringAndSize(self->program, (char **)&(params.program),
&plen) < 0) {
return -1;
}
+ if (self->n_inputs != 0) {
+ return -1;
+ }
params.prog_len = plen;
params.output = output;
- params.inputs = inputs;
- params.index_data = index_data;
+ params.inputs = NULL;
+ params.index_data = NULL;
params.n_inputs = self->n_inputs;
params.n_constants = self->n_constants;
params.n_temps = self->n_temps;
params.mem = self->mem;
- params.memsteps = self->memsteps;
+ memsteps = self->memsteps;
params.memsizes = self->memsizes;
params.r_end = PyString_Size(self->fullsig);
- blen1 = len - len % BLOCK_SIZE1;
- r = vm_engine_block(0, blen1, BLOCK_SIZE1, params, pc_error);
- if (r < 0) return r;
- if (len != blen1) {
- blen2 = len - len % BLOCK_SIZE2;
- /* A block is generally too small for threading to be an advantage */
- r = vm_engine_serial(blen1, blen2, BLOCK_SIZE2, params, pc_error);
- if (r < 0) return r;
- if (len != blen2) {
- r = vm_engine_rest(blen2, len, params, pc_error);
- if (r < 0) return r;
- }
- }
+ mem = params.mem;
+ get_temps_space(params, mem, 1);
+#define SINGLE_ITEM_CONST_LOOP
+#define BLOCK_SIZE 1
+#include "interp_body.c"
+#undef BLOCK_SIZE
+#undef SINGLE_ITEM_CONST_LOOP
+ free_temps_space(params, mem);
return 0;
}
@@ -1419,18 +1608,28 @@ void numexpr_free_resources(void)
}
}
-/* keyword arguments are ignored! */
static PyObject *
NumExpr_run(NumExprObject *self, PyObject *args, PyObject *kwds)
{
- PyObject *output = NULL, *a_inputs = NULL;
- struct index_data *inddata = NULL;
- unsigned int n_inputs, n_dimensions = 0;
- intp shape[MAX_DIMS];
- int i, j, r, pc_error;
- intp size;
- char **inputs = NULL;
- intp strides[MAX_DIMS]; /* clean up XXX */
+ PyArrayObject *operands[NPY_MAXARGS];
+ PyArray_Descr *dtypes[NPY_MAXARGS], **dtypes_tmp;
+ PyObject *tmp, *ret;
+ npy_uint32 op_flags[NPY_MAXARGS];
+ NPY_CASTING casting = NPY_SAFE_CASTING;
+ NPY_ORDER order = NPY_KEEPORDER;
+ unsigned int i, n_inputs;
+ int r, pc_error = 0;
+ int reduction_axis = -1, reduction_size = 1;
+ int ex_uses_vml = 0, is_reduction = 0, reduction_outer_loop = 0;
+
+ /* To specify axes when doing a reduction */
+ int op_axes_values[NPY_MAXARGS][NPY_MAXDIMS],
+ op_axes_reduction_values[NPY_MAXARGS];
+ int *op_axes_ptrs[NPY_MAXDIMS];
+ int oa_ndim = 0;
+ int **op_axes = NULL;
+
+ NpyIter *iter = NULL, *reduce_iter = NULL;
/* Check whether we need to restart threads */
if (!init_threads_done || pid != getpid()) {
@@ -1440,292 +1639,420 @@ NumExpr_run(NumExprObject *self, PyObject *args, PyObject *kwds)
/* Don't force serial mode by default */
force_serial = 0;
+ /* Check whether there's a reduction as the final step */
+ is_reduction = last_opcode(self->program) > OP_REDUCTION;
+
n_inputs = PyTuple_Size(args);
if (PyString_Size(self->signature) != n_inputs) {
return PyErr_Format(PyExc_ValueError,
"number of inputs doesn't match program");
}
- if (kwds && PyObject_Length(kwds) > 0) {
+ else if (n_inputs+1 > NPY_MAXARGS) {
return PyErr_Format(PyExc_ValueError,
- "keyword arguments are not accepted");
+ "too many inputs");
}
- /* This is overkill - we shouldn't need to allocate all of this space,
- but this makes it easier to figure out */
- a_inputs = PyTuple_New(3*n_inputs);
- if (!a_inputs) goto cleanup_and_exit;
-
- inputs = PyMem_New(char *, n_inputs);
- if (!inputs) goto cleanup_and_exit;
+ memset(operands, 0, sizeof(operands));
+ memset(dtypes, 0, sizeof(dtypes));
- inddata = PyMem_New(struct index_data, n_inputs+1);
- if (!inddata) goto cleanup_and_exit;
- for (i = 0; i < n_inputs+1; i++)
- inddata[i].count = 0;
-
- /* First, make sure everything is some sort of array so that we can work
- with their shapes. Count dimensions concurrently. */
+ if (kwds) {
+ tmp = PyDict_GetItemString(kwds, "casting"); /* borrowed ref */
+ if (tmp != NULL && !PyArray_CastingConverter(tmp, &casting)) {
+ return NULL;
+ }
+ tmp = PyDict_GetItemString(kwds, "order"); /* borrowed ref */
+ if (tmp != NULL && !PyArray_OrderConverter(tmp, &order)) {
+ return NULL;
+ }
+ tmp = PyDict_GetItemString(kwds, "ex_uses_vml"); /* borrowed ref */
+ if (tmp == NULL) {
+ return PyErr_Format(PyExc_ValueError,
+ "ex_uses_vml parameter is required");
+ }
+ if (tmp == Py_True) {
+ ex_uses_vml = 1;
+ }
+ /* borrowed ref */
+ operands[0] = (PyArrayObject *)PyDict_GetItemString(kwds, "out");
+ if (operands[0] != NULL) {
+ if ((PyObject *)operands[0] == Py_None) {
+ operands[0] = NULL;
+ }
+ else if (!PyArray_Check(operands[0])) {
+ return PyErr_Format(PyExc_ValueError,
+ "out keyword parameter is not an array");
+ }
+ else {
+ Py_INCREF(operands[0]);
+ }
+ }
+ }
for (i = 0; i < n_inputs; i++) {
PyObject *o = PyTuple_GET_ITEM(args, i); /* borrowed ref */
PyObject *a;
char c = PyString_AS_STRING(self->signature)[i];
int typecode = typecode_from_char(c);
- if (typecode == -1) goto cleanup_and_exit;
- /* Convert it just in case of a non-swapped array */
- if (!PyArray_Check(o) || PyArray_TYPE(o) != PyArray_STRING) {
+ /* Convert it if it's not an array */
+ if (!PyArray_Check(o)) {
+ if (typecode == -1) goto fail;
a = PyArray_FROM_OTF(o, typecode, NOTSWAPPED);
- } else {
- Py_INCREF(PyArray_DESCR(o)); /* typecode is not enough */
- a = PyArray_FromAny(o, PyArray_DESCR(o), 0, 0, NOTSWAPPED, NULL);
}
- if (!a) goto cleanup_and_exit;
- PyTuple_SET_ITEM(a_inputs, i, a); /* steals reference */
- if (PyArray_NDIM(a) > n_dimensions)
- n_dimensions = PyArray_NDIM(a);
- }
+ else {
+ Py_INCREF(o);
+ a = o;
+ }
+ operands[i+1] = (PyArrayObject *)a;
+ dtypes[i+1] = PyArray_DescrFromType(typecode);
- /* Broadcast all of the inputs to determine the output shape (this will
- require some modifications if we later allow a final reduction
- operation). If an array has too few dimensions its shape is padded
- with ones from the left. All array dimensions must match, or be one. */
+ if (operands[i+1] == NULL || dtypes[i+1] == NULL) {
+ goto fail;
+ }
+ op_flags[i+1] = NPY_ITER_READONLY|
+#ifdef USE_VML
+ (ex_uses_vml ? (NPY_ITER_CONTIG|NPY_ITER_ALIGNED) : 0)|
+#endif
+#ifndef USE_UNALIGNED_ACCESS
+ NPY_ITER_ALIGNED|
+#endif
+ NPY_ITER_NBO
+ ;
+ }
- for (i = 0; i < n_dimensions; i++)
- shape[i] = 1;
- for (i = 0; i < n_inputs; i++) {
- PyObject *a = PyTuple_GET_ITEM(a_inputs, i);
- unsigned int ndims = PyArray_NDIM(a);
- int delta = n_dimensions - ndims;
- for (j = 0; j < ndims; j++) {
- unsigned int n = PyArray_DIM(a, j);
- if (n == 1 || n == shape[delta+j]) continue;
- if (shape[delta+j] == 1)
- shape[delta+j] = n;
+ if (is_reduction) {
+ /* A reduction cannot result in a string,
+ so we don't need to worry about item sizes here. */
+ char retsig = get_return_sig(self->program);
+ reduction_axis = get_reduction_axis(self->program);
+
+ /* Need to set up op_axes for the non-reduction part */
+ if (reduction_axis != 255) {
+ /* Get the number of broadcast dimensions */
+ for (i = 0; i < n_inputs; ++i) {
+ intp ndim = PyArray_NDIM(operands[i+1]);
+ if (ndim > oa_ndim) {
+ oa_ndim = ndim;
+ }
+ }
+ if (reduction_axis < 0 || reduction_axis >= oa_ndim) {
+ PyErr_Format(PyExc_ValueError,
+ "reduction axis is out of bounds");
+ goto fail;
+ }
+ /* Fill in the op_axes */
+ op_axes_ptrs[0] = NULL;
+ op_axes_reduction_values[0] = -1;
+ for (i = 0; i < n_inputs; ++i) {
+ intp j = 0, idim, ndim = PyArray_NDIM(operands[i+1]);
+ for (idim = 0; idim < oa_ndim-ndim; ++idim) {
+ if (idim != reduction_axis) {
+ op_axes_values[i+1][j++] = -1;
+ }
+ else {
+ op_axes_reduction_values[i+1] = -1;
+ }
+ }
+ for (idim = oa_ndim-ndim; idim < oa_ndim; ++idim) {
+ if (idim != reduction_axis) {
+ op_axes_values[i+1][j++] = idim-(oa_ndim-ndim);
+ }
+ else {
+ intp size = PyArray_DIM(operands[i+1],
+ idim-(oa_ndim-ndim));
+ if (size > reduction_size) {
+ reduction_size = size;
+ }
+ op_axes_reduction_values[i+1] = idim-(oa_ndim-ndim);
+ }
+ }
+ op_axes_ptrs[i+1] = op_axes_values[i+1];
+ }
+ /* op_axes has one less than the broadcast dimensions */
+ --oa_ndim;
+ if (oa_ndim > 0) {
+ op_axes = op_axes_ptrs;
+ }
else {
- PyErr_SetString(PyExc_ValueError,
- "cannot broadcast inputs to common shape");
- goto cleanup_and_exit;
+ reduction_size = 1;
+ }
+ }
+ /* A full reduction can be done without nested iteration */
+ if (oa_ndim == 0) {
+ if (operands[0] == NULL) {
+ intp dim = 1;
+ operands[0] = (PyArrayObject *)PyArray_SimpleNew(0, &dim,
+ typecode_from_char(retsig));
+ if (!operands[0])
+ goto fail;
+ } else if (PyArray_SIZE(operands[0]) != 1) {
+ PyErr_Format(PyExc_ValueError,
+ "out argument must have size 1 for a full reduction");
+ goto fail;
}
}
- }
- size = PyArray_MultiplyList(shape, n_dimensions);
- /* Broadcast indices of all of the arrays. We could improve efficiency
- by keeping track of what needs to be broadcast above */
+ dtypes[0] = PyArray_DescrFromType(typecode_from_char(retsig));
- for (i = 0; i < n_inputs; i++) {
- PyObject *a = PyTuple_GET_ITEM(a_inputs, i);
- PyObject *b;
- intp strides[MAX_DIMS];
- int delta = n_dimensions - PyArray_NDIM(a);
- if (PyArray_NDIM(a)) {
- for (j = 0; j < n_dimensions; j++)
- strides[j] = (j < delta || PyArray_DIM(a, j-delta) == 1) ?
- 0 : PyArray_STRIDE(a, j-delta);
- Py_INCREF(PyArray_DESCR(a));
- b = PyArray_NewFromDescr(a->ob_type,
- PyArray_DESCR(a),
- n_dimensions, shape,
- strides, PyArray_DATA(a), 0, a);
- if (!b) goto cleanup_and_exit;
- } else { /* Leave scalars alone */
- b = a;
- Py_INCREF(b);
+ op_flags[0] = NPY_ITER_READWRITE|
+ NPY_ITER_ALLOCATE|
+ /* Copy, because it can't buffer the reduction */
+ NPY_ITER_UPDATEIFCOPY|
+ NPY_ITER_NBO|
+#ifndef USE_UNALIGNED_ACCESS
+ NPY_ITER_ALIGNED|
+#endif
+ (oa_ndim == 0 ? 0 : NPY_ITER_NO_BROADCAST);
+ }
+ else {
+ char retsig = get_return_sig(self->program);
+ if (retsig != 's') {
+ dtypes[0] = PyArray_DescrFromType(typecode_from_char(retsig));
+ } else {
+ /* Since the *only* supported operation returning a string
+ * is a copy, the size of returned strings
+ * can be directly gotten from the first (and only)
+ * input/constant/temporary. */
+ if (n_inputs > 0) { /* input, like in 'a' where a -> 'foo' */
+ dtypes[0] = PyArray_DESCR(operands[1]);
+ Py_INCREF(dtypes[0]);
+ } else { /* constant, like in '"foo"' */
+ dtypes[0] = PyArray_DescrNewFromType(PyArray_STRING);
+ dtypes[0]->elsize = self->memsizes[1];
+ } /* no string temporaries, so no third case */
+ }
+ if (dtypes[0] == NULL) {
+ goto fail;
}
- /* Store b so that it stays alive till we're done */
- PyTuple_SET_ITEM(a_inputs, i+n_inputs, b);
+ op_flags[0] = NPY_ITER_WRITEONLY|
+ NPY_ITER_ALLOCATE|
+ NPY_ITER_CONTIG|
+ NPY_ITER_NBO|
+#ifndef USE_UNALIGNED_ACCESS
+ NPY_ITER_ALIGNED|
+#endif
+ NPY_ITER_NO_BROADCAST;
}
+ /* Check for empty arrays in expression */
+ if (n_inputs > 0) {
+ char retsig = get_return_sig(self->program);
- for (i = 0; i < n_inputs; i++) {
- PyObject *a = PyTuple_GET_ITEM(a_inputs, i+n_inputs);
- PyObject *b;
- char c = PyString_AS_STRING(self->signature)[i];
- int typecode = typecode_from_char(c);
- if (PyArray_NDIM(a) == 0) {
- /* Broadcast scalars */
- intp dims[1] = {BLOCK_SIZE1};
- Py_INCREF(PyArray_DESCR(a));
- b = PyArray_SimpleNewFromDescr(1, dims, PyArray_DESCR(a));
- if (!b) goto cleanup_and_exit;
- self->memsteps[i+1] = 0;
- self->memsizes[i+1] = PyArray_ITEMSIZE(a);
- PyTuple_SET_ITEM(a_inputs, i+2*n_inputs, b); /* steals reference */
- inputs[i] = PyArray_DATA(b);
- if (typecode == PyArray_BOOL) {
- char value = ((char*)PyArray_DATA(a))[0];
- for (j = 0; j < BLOCK_SIZE1; j++)
- ((char*)PyArray_DATA(b))[j] = value;
- } else if (typecode == PyArray_INT) {
- int value = ((int*)PyArray_DATA(a))[0];
- for (j = 0; j < BLOCK_SIZE1; j++)
- ((int*)PyArray_DATA(b))[j] = value;
- } else if (typecode == PyArray_LONGLONG) {
- long long value = ((long long*)PyArray_DATA(a))[0];
- for (j = 0; j < BLOCK_SIZE1; j++)
- ((long long*)PyArray_DATA(b))[j] = value;
- } else if (typecode == PyArray_FLOAT) {
- float value = ((float*)PyArray_DATA(a))[0];
- for (j = 0; j < BLOCK_SIZE1; j++)
- ((float*)PyArray_DATA(b))[j] = value;
- } else if (typecode == PyArray_DOUBLE) {
- double value = ((double*)PyArray_DATA(a))[0];
- for (j = 0; j < BLOCK_SIZE1; j++)
- ((double*)PyArray_DATA(b))[j] = value;
- } else if (typecode == PyArray_CDOUBLE) {
- double rvalue = ((double*)PyArray_DATA(a))[0];
- double ivalue = ((double*)PyArray_DATA(a))[1];
- for (j = 0; j < 2*BLOCK_SIZE1; j+=2) {
- ((double*)PyArray_DATA(b))[j] = rvalue;
- ((double*)PyArray_DATA(b))[j+1] = ivalue;
- }
- } else if (typecode == PyArray_STRING) {
- int itemsize = PyArray_ITEMSIZE(a);
- char *value = (char*)(PyArray_DATA(a));
- for (j = 0; j < itemsize*BLOCK_SIZE1; j+=itemsize)
- memcpy((char*)(PyArray_DATA(b)) + j, value, itemsize);
- } else {
- PyErr_SetString(PyExc_RuntimeError, "illegal typecode value");
- goto cleanup_and_exit;
- }
- } else {
- /* Check that discontiguous strides appear only on the last
- dimension. If not, the arrays should be copied.
- Furthermore, such arrays can appear when doing
- broadcasting above, so this check really needs to be
- here, and not in Python space. */
- intp inner_size;
- for (j = PyArray_NDIM(a)-2; j >= 0; j--) {
- inner_size = PyArray_STRIDE(a, j+1) * PyArray_DIM(a, j+1);
- if (PyArray_STRIDE(a, j) != inner_size) {
- intp dims[1] = {BLOCK_SIZE1};
- inddata[i+1].count = PyArray_NDIM(a);
- inddata[i+1].findex = -1;
- inddata[i+1].size = PyArray_ITEMSIZE(a);
- inddata[i+1].shape = PyArray_DIMS(a);
- inddata[i+1].strides = PyArray_STRIDES(a);
- inddata[i+1].buffer = PyArray_BYTES(a);
- inddata[i+1].index = PyMem_New(int, inddata[i+1].count);
- for (j = 0; j < inddata[i+1].count; j++)
- inddata[i+1].index[j] = 0;
- Py_INCREF(PyArray_DESCR(a));
- a = PyArray_SimpleNewFromDescr(1, dims, PyArray_DESCR(a));
- /* steals reference below */
- PyTuple_SET_ITEM(a_inputs, i+2*n_inputs, a);
- /* Broadcasting code only seems to work well for
- serial code (don't know exactly why) */
- force_serial = 1;
- break;
- }
+ /* Check length for all inputs */
+ int zeroi, zerolen = 0;
+ for (i=0; i < n_inputs; i++) {
+ if (PyArray_SIZE(operands[i+1]) == 0) {
+ zerolen = 1;
+ zeroi = i+1;
+ break;
}
+ }
- self->memsteps[i+1] = PyArray_STRIDE(a, PyArray_NDIM(a)-1);
- self->memsizes[i+1] = PyArray_ITEMSIZE(a);
- inputs[i] = PyArray_DATA(a);
+ if (zerolen != 0) {
+ /* Allocate the output */
+ intp ndim = PyArray_NDIM(operands[zeroi]);
+ intp *dims = PyArray_DIMS(operands[zeroi]);
+ operands[0] = (PyArrayObject *)PyArray_SimpleNew(ndim, dims,
+ typecode_from_char(retsig));
+ if (operands[0] == NULL) {
+ goto fail;
+ }
+ ret = (PyObject *)operands[0];
+ Py_INCREF(ret);
+ goto cleanup_and_exit;
}
}
- if (last_opcode(self->program) > OP_REDUCTION) {
- /* A reduction can not result in a string,
- so we don't need to worry about item sizes here. */
+
+ /* A case with a single constant output */
+ if (n_inputs == 0) {
char retsig = get_return_sig(self->program);
- int axis = get_reduction_axis(self->program);
- /* Reduction ops only works with 1 thread */
- force_serial = 1;
- self->memsteps[0] = 0; /*size_from_char(retsig);*/
- if (axis == 255) {
- intp dims[1];
- for (i = 0; i < n_dimensions; i++)
- strides[i] = 0;
- output = PyArray_SimpleNew(0, dims, typecode_from_char(retsig));
- if (!output) goto cleanup_and_exit;
- } else {
- intp dims[MAX_DIMS];
- if (axis < 0)
- axis = n_dimensions + axis;
- if (axis < 0 || axis >= n_dimensions) {
- PyErr_SetString(PyExc_ValueError, "axis out of range");
- goto cleanup_and_exit;
+
+ /* Allocate the output */
+ if (operands[0] == NULL) {
+ intp dim = 1;
+ operands[0] = (PyArrayObject *)PyArray_SimpleNew(0, &dim,
+ typecode_from_char(retsig));
+ if (operands[0] == NULL) {
+ goto fail;
}
- for (i = j = 0; i < n_dimensions; i++) {
- if (i != axis) {
- dims[j] = shape[i];
- j += 1;
- }
+ }
+ else {
+ PyArrayObject *a;
+ if (PyArray_SIZE(operands[0]) != 1) {
+ PyErr_SetString(PyExc_ValueError,
+ "output for a constant expression must have size 1");
+ goto fail;
}
- output = PyArray_SimpleNew(n_dimensions-1, dims,
- typecode_from_char(retsig));
- if (!output) goto cleanup_and_exit;
- for (i = j = 0; i < n_dimensions; i++) {
- if (i != axis) {
- strides[i] = PyArray_STRIDES(output)[j];
- j += 1;
- } else {
- strides[i] = 0;
- }
+ else if (!PyArray_ISWRITEABLE(operands[0])) {
+ PyErr_SetString(PyExc_ValueError,
+ "output is not writeable");
+ goto fail;
+ }
+ Py_INCREF(dtypes[0]);
+ a = (PyArrayObject *)PyArray_FromArray(operands[0], dtypes[0],
+ NPY_ALIGNED|NPY_UPDATEIFCOPY);
+ if (a == NULL) {
+ goto fail;
}
+ Py_DECREF(operands[0]);
+ operands[0] = a;
+ }
+
+ r = run_interpreter_const(self, PyArray_DATA(operands[0]), &pc_error);
+
+ ret = (PyObject *)operands[0];
+ Py_INCREF(ret);
+ goto cleanup_and_exit;
+ }
+ /* Allocate the iterator or nested iterators */
+ if (reduction_size == 1) {
+ /* When there's no reduction, reduction_size is 1 as well */
+ iter = NpyIter_AdvancedNew(n_inputs+1, operands,
+ NPY_ITER_BUFFERED|
+ NPY_ITER_REDUCE_OK|
+ NPY_ITER_RANGED|
+ NPY_ITER_DELAY_BUFALLOC|
+ NPY_ITER_EXTERNAL_LOOP,
+ order, casting,
+ op_flags, dtypes,
+ 0, NULL, NULL,
+ BLOCK_SIZE1);
+ if (iter == NULL) {
+ goto fail;
+ }
+ } else {
+ npy_uint32 op_flags_outer[NPY_MAXDIMS];
+ /* The outer loop is unbuffered */
+ op_flags_outer[0] = NPY_ITER_READWRITE|
+ NPY_ITER_ALLOCATE|
+ NPY_ITER_NO_BROADCAST;
+ for (i = 0; i < n_inputs; ++i) {
+ op_flags_outer[i+1] = NPY_ITER_READONLY;
}
- /* TODO optimize strides -- in this and other inddata cases, strides and
- shape can be tweaked to minimize the amount of looping */
- inddata[0].count = n_dimensions;
- inddata[0].findex = -1;
- inddata[0].size = PyArray_ITEMSIZE(output);
- inddata[0].shape = shape;
- inddata[0].strides = strides;
- inddata[0].buffer = PyArray_BYTES(output);
- inddata[0].index = PyMem_New(int, n_dimensions);
- for (j = 0; j < inddata[0].count; j++)
- inddata[0].index[j] = 0;
+ /* Arbitrary threshold for choosing the inner loop... worth benchmarking */
+ if (reduction_size < 64) {
+ reduction_outer_loop = 1;
+ iter = NpyIter_AdvancedNew(n_inputs+1, operands,
+ NPY_ITER_BUFFERED|
+ NPY_ITER_RANGED|
+ NPY_ITER_DELAY_BUFALLOC|
+ NPY_ITER_EXTERNAL_LOOP,
+ order, casting,
+ op_flags, dtypes,
+ oa_ndim, op_axes, NULL,
+ BLOCK_SIZE1);
+ if (iter == NULL) {
+ goto fail;
+ }
+
+ /* If the output was allocated, get it for the second iterator */
+ if (operands[0] == NULL) {
+ operands[0] = NpyIter_GetOperandArray(iter)[0];
+ Py_INCREF(operands[0]);
+ }
+
+ op_axes[0] = &op_axes_reduction_values[0];
+ for (i = 0; i < n_inputs; ++i) {
+ op_axes[i+1] = &op_axes_reduction_values[i+1];
+ }
+ op_flags_outer[0] &= ~NPY_ITER_NO_BROADCAST;
+ reduce_iter = NpyIter_AdvancedNew(n_inputs+1, operands,
+ NPY_ITER_REDUCE_OK,
+ order, casting,
+ op_flags_outer, NULL,
+ 1, op_axes, NULL,
+ 0);
+ if (reduce_iter == NULL) {
+ goto fail;
+ }
+ }
+ else {
+ PyArray_Descr *dtypes_outer[NPY_MAXDIMS];
+
+ /* If the output is being allocated, need to specify its dtype */
+ dtypes_outer[0] = dtypes[0];
+ for (i = 0; i < n_inputs; ++i) {
+ dtypes_outer[i+1] = NULL;
+ }
+ iter = NpyIter_AdvancedNew(n_inputs+1, operands,
+ NPY_ITER_RANGED,
+ order, casting,
+ op_flags_outer, dtypes_outer,
+ oa_ndim, op_axes, NULL,
+ 0);
+ if (iter == NULL) {
+ goto fail;
+ }
+ /* If the output was allocated, get it for the second iterator */
+ if (operands[0] == NULL) {
+ operands[0] = NpyIter_GetOperandArray(iter)[0];
+ Py_INCREF(operands[0]);
+ }
+
+ op_axes[0] = &op_axes_reduction_values[0];
+ for (i = 0; i < n_inputs; ++i) {
+ op_axes[i+1] = &op_axes_reduction_values[i+1];
+ }
+ op_flags[0] &= ~NPY_ITER_NO_BROADCAST;
+ reduce_iter = NpyIter_AdvancedNew(n_inputs+1, operands,
+ NPY_ITER_BUFFERED|
+ NPY_ITER_REDUCE_OK|
+ NPY_ITER_DELAY_BUFALLOC|
+ NPY_ITER_EXTERNAL_LOOP,
+ order, casting,
+ op_flags, dtypes,
+ 1, op_axes, NULL,
+ BLOCK_SIZE1);
+ if (reduce_iter == NULL) {
+ goto fail;
+ }
+ }
+ }
+
+ /* Initialize the output to the reduction unit */
+ if (is_reduction) {
+ PyArrayObject *a = NpyIter_GetOperandArray(iter)[0];
if (last_opcode(self->program) >= OP_SUM &&
last_opcode(self->program) < OP_PROD) {
PyObject *zero = PyInt_FromLong(0);
- PyArray_FillWithScalar((PyArrayObject *)output, zero);
+ PyArray_FillWithScalar(a, zero);
Py_DECREF(zero);
} else {
PyObject *one = PyInt_FromLong(1);
- PyArray_FillWithScalar((PyArrayObject *)output, one);
+ PyArray_FillWithScalar(a, one);
Py_DECREF(one);
}
}
- else {
- char retsig = get_return_sig(self->program);
- if (retsig != 's') {
- self->memsteps[0] = self->memsizes[0] = size_from_char(retsig);
- output = PyArray_SimpleNew(
- n_dimensions, shape, typecode_from_char(retsig));
- } else {
- /* Since the *only* supported operation returning a string
- * is a copy, the size of returned strings
- * can be directly gotten from the first (and only)
- * input/constant/temporary. */
- PyArray_Descr *descr;
- if (n_inputs > 0) { /* input, like in 'a' where a -> 'foo' */
- descr = PyArray_DESCR(PyTuple_GET_ITEM(a_inputs, 1));
- Py_INCREF(descr);
- } else { /* constant, like in '"foo"' */
- descr = PyArray_DescrFromType(PyArray_STRING);
- descr->elsize = self->memsizes[1];
- } /* no string temporaries, so no third case */
- self->memsteps[0] = self->memsizes[0] = self->memsizes[1];
- output = PyArray_SimpleNewFromDescr(n_dimensions, shape, descr);
- }
- if (!output) goto cleanup_and_exit;
+
+ /* Get the sizes of all the operands */
+ dtypes_tmp = NpyIter_GetDescrArray(iter);
+ for (i = 0; i < n_inputs+1; ++i) {
+ self->memsizes[i] = dtypes_tmp[i]->elsize;
}
+ /* For small calculations, just use 1 thread */
+ if (NpyIter_GetIterSize(iter) < 2*BLOCK_SIZE1) {
+ force_serial = 1;
+ }
- r = run_interpreter(self, size, PyArray_DATA(output), inputs, inddata,
- &pc_error);
+ /* Reductions do not support parallel execution yet */
+ if (is_reduction) {
+ force_serial = 1;
+ }
+
+ r = run_interpreter(self, iter, reduce_iter,
+ reduction_outer_loop, &pc_error);
if (r < 0) {
- Py_XDECREF(output);
- output = NULL;
if (r == -1) {
- PyErr_SetString(PyExc_RuntimeError,
- "an error occurred while running the program");
+ if (!PyErr_Occurred()) {
+ PyErr_SetString(PyExc_RuntimeError,
+ "an error occurred while running the program");
+ }
} else if (r == -2) {
PyErr_Format(PyExc_RuntimeError,
"bad argument at pc=%d", pc_error);
@@ -1736,19 +2063,37 @@ NumExpr_run(NumExprObject *self, PyObject *args, PyObject *kwds)
PyErr_SetString(PyExc_RuntimeError,
"unknown error occurred while running the program");
}
+ goto fail;
+ }
+
+ /* Get the output from the iterator */
+ ret = (PyObject *)NpyIter_GetOperandArray(iter)[0];
+ Py_INCREF(ret);
+
+ NpyIter_Deallocate(iter);
+ if (reduce_iter != NULL) {
+ NpyIter_Deallocate(reduce_iter);
}
cleanup_and_exit:
- Py_XDECREF(a_inputs);
- PyMem_Del(inputs);
- if (inddata) {
- for (i = 0; i < n_inputs+1; i++) {
- if (inddata[i].count) {
- PyMem_Del(inddata[i].index);
- }
- }
+ for (i = 0; i < n_inputs+1; i++) {
+ Py_XDECREF(operands[i]);
+ Py_XDECREF(dtypes[i]);
}
- PyMem_Del(inddata);
- return output;
+
+ return ret;
+fail:
+ for (i = 0; i < n_inputs+1; i++) {
+ Py_XDECREF(operands[i]);
+ Py_XDECREF(dtypes[i]);
+ }
+ if (iter != NULL) {
+ NpyIter_Deallocate(iter);
+ }
+ if (reduce_iter != NULL) {
+ NpyIter_Deallocate(reduce_iter);
+ }
+
+ return NULL;
}
static PyMethodDef NumExpr_methods[] = {
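At the Python level, the rewritten NumExpr_run above means that reductions
now run through NumPy's (nested) iterators: the output is pre-filled with the
reduction unit (0 for the sum-type opcodes, 1 for the prod-type ones) and,
for now, forced onto a single thread. A minimal usage sketch (array values
are only illustrative):

    import numpy, numexpr
    a = numpy.arange(9.0).reshape(3, 3)
    numexpr.evaluate("sum(a, axis=0)")     # accumulates from 0, serially
    numexpr.evaluate("prod(a, axis=1)")    # accumulates from 1, serially
    numexpr.evaluate("sum(a, axis=None)")  # full reduction, no nested iteration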
diff --git a/numexpr/missing_posix_functions.inc b/numexpr/missing_posix_functions.inc
index eee6dae..a3ef26a 100644
--- a/numexpr/missing_posix_functions.inc
+++ b/numexpr/missing_posix_functions.inc
@@ -1,3 +1,12 @@
+/*********************************************************************
+ Numexpr - Fast numerical array expression evaluator for NumPy.
+
+ License: MIT
+ Author: See AUTHORS.txt
+
+ See LICENSE.txt for details about copyright and rights to use.
+**********************************************************************/
+
/* These functions are not included in some non-POSIX compilers,
like MSVC 7.1 */
diff --git a/numexpr/msvc_function_stubs.inc b/numexpr/msvc_function_stubs.inc
index e9b2c8c..dc40367 100644
--- a/numexpr/msvc_function_stubs.inc
+++ b/numexpr/msvc_function_stubs.inc
@@ -1,121 +1,130 @@
-/* Declare stub functions for MSVC. It turns out that single precision
- definitions in <math.h> are actually #define'd and are not usable
- as function pointers :-/ */
-
-#if _MSC_VER < 1400 // 1310 == MSVC 7.1
-/* Apparently, single precision functions are not included in MSVC 7.1 */
-
-#define sqrtf(x) ((float)sqrt((double)(x)))
-#define sinf(x) ((float)sin((double)(x)))
-#define cosf(x) ((float)cos((double)(x)))
-#define tanf(x) ((float)tan((double)(x)))
-#define asinf(x) ((float)asin((double)(x)))
-#define acosf(x) ((float)acos((double)(x)))
-#define atanf(x) ((float)atan((double)(x)))
-#define sinhf(x) ((float)sinh((double)(x)))
-#define coshf(x) ((float)cosh((double)(x)))
-#define tanhf(x) ((float)tanh((double)(x)))
-#define asinhf(x) ((float)asinh((double)(x)))
-#define acoshf(x) ((float)acosh((double)(x)))
-#define atanhf(x) ((float)atanh((double)(x)))
-#define logf(x) ((float)log((double)(x)))
-#define log1pf(x) ((float)log1p((double)(x)))
-#define log10f(x) ((float)log10((double)(x)))
-#define expf(x) ((float)exp((double)(x)))
-#define expm1f(x) ((float)expm1((double)(x)))
-#define fabsf(x) ((float)fabs((double)(x)))
-#define fmodf(x, y) ((float)fmod((double)(x), (double)(y)))
-#define atan2f(x, y) ((float)atan2((double)(x), (double)(y)))
-
-/* The next are directly called from interp_body.c */
-#define powf(x, y) ((float)pow((double)(x), (double)(y)))
-#define floorf(x) ((float)floor((double)(x)))
-
-#endif // _MSC_VER < 1400
-
-
-/* Now the actual stubs */
-
-inline float sqrtf2(float x) {
- return sqrtf(x);
-}
-
-inline float sinf2(float x) {
- return sinf(x);
-}
-
-inline float cosf2(float x) {
- return cosf(x);
-}
-
-inline float tanf2(float x) {
- return tanf(x);
-}
-
-inline float asinf2(float x) {
- return asinf(x);
-}
-
-inline float acosf2(float x) {
- return acosf(x);
-}
-
-inline float atanf2(float x) {
- return atanf(x);
-}
-
-inline float sinhf2(float x) {
- return sinhf(x);
-}
-
-inline float coshf2(float x) {
- return coshf(x);
-}
-
-inline float tanhf2(float x) {
- return tanhf(x);
-}
-
-inline float asinhf2(float x) {
- return asinhf(x);
-}
-
-inline float acoshf2(float x) {
- return acoshf(x);
-}
-
-inline float atanhf2(float x) {
- return atanhf(x);
-}
-
-inline float logf2(float x) {
- return logf(x);
-}
-
-inline float log1pf2(float x) {
- return log1pf(x);
-}
-
-inline float log10f2(float x) {
- return log10f(x);
-}
-
-inline float expf2(float x) {
- return expf(x);
-}
-
-inline float expm1f2(float x) {
- return expm1f(x);
-}
-
-inline float fabsf2(float x) {
- return fabsf(x);
-}
-
-inline float fmodf2(float x, float y) {
- return fmodf(x, y);
-}
-
-inline float atan2f2(float x, float y) {
- return atan2f(x, y);
-}
+/*********************************************************************
+ Numexpr - Fast numerical array expression evaluator for NumPy.
+
+ License: MIT
+ Author: See AUTHORS.txt
+
+ See LICENSE.txt for details about copyright and rights to use.
+**********************************************************************/
+
+/* Declare stub functions for MSVC. It turns out that single precision
+ definitions in <math.h> are actually #define'd and are not usable
+ as function pointers :-/ */
+
+#if _MSC_VER < 1400 // 1310 == MSVC 7.1
+/* Apparently, single precision functions are not included in MSVC 7.1 */
+
+#define sqrtf(x) ((float)sqrt((double)(x)))
+#define sinf(x) ((float)sin((double)(x)))
+#define cosf(x) ((float)cos((double)(x)))
+#define tanf(x) ((float)tan((double)(x)))
+#define asinf(x) ((float)asin((double)(x)))
+#define acosf(x) ((float)acos((double)(x)))
+#define atanf(x) ((float)atan((double)(x)))
+#define sinhf(x) ((float)sinh((double)(x)))
+#define coshf(x) ((float)cosh((double)(x)))
+#define tanhf(x) ((float)tanh((double)(x)))
+#define asinhf(x) ((float)asinh((double)(x)))
+#define acoshf(x) ((float)acosh((double)(x)))
+#define atanhf(x) ((float)atanh((double)(x)))
+#define logf(x) ((float)log((double)(x)))
+#define log1pf(x) ((float)log1p((double)(x)))
+#define log10f(x) ((float)log10((double)(x)))
+#define expf(x) ((float)exp((double)(x)))
+#define expm1f(x) ((float)expm1((double)(x)))
+#define fabsf(x) ((float)fabs((double)(x)))
+#define fmodf(x, y) ((float)fmod((double)(x), (double)(y)))
+#define atan2f(x, y) ((float)atan2((double)(x), (double)(y)))
+
+/* The next are directly called from interp_body.c */
+#define powf(x, y) ((float)pow((double)(x), (double)(y)))
+#define floorf(x) ((float)floor((double)(x)))
+
+#endif // _MSC_VER < 1400
+
+
+/* Now the actual stubs */
+
+inline float sqrtf2(float x) {
+ return sqrtf(x);
+}
+
+inline float sinf2(float x) {
+ return sinf(x);
+}
+
+inline float cosf2(float x) {
+ return cosf(x);
+}
+
+inline float tanf2(float x) {
+ return tanf(x);
+}
+
+inline float asinf2(float x) {
+ return asinf(x);
+}
+
+inline float acosf2(float x) {
+ return acosf(x);
+}
+
+inline float atanf2(float x) {
+ return atanf(x);
+}
+
+inline float sinhf2(float x) {
+ return sinhf(x);
+}
+
+inline float coshf2(float x) {
+ return coshf(x);
+}
+
+inline float tanhf2(float x) {
+ return tanhf(x);
+}
+
+inline float asinhf2(float x) {
+ return asinhf(x);
+}
+
+inline float acoshf2(float x) {
+ return acoshf(x);
+}
+
+inline float atanhf2(float x) {
+ return atanhf(x);
+}
+
+inline float logf2(float x) {
+ return logf(x);
+}
+
+inline float log1pf2(float x) {
+ return log1pf(x);
+}
+
+inline float log10f2(float x) {
+ return log10f(x);
+}
+
+inline float expf2(float x) {
+ return expf(x);
+}
+
+inline float expm1f2(float x) {
+ return expm1f(x);
+}
+
+inline float fabsf2(float x) {
+ return fabsf(x);
+}
+
+inline float fmodf2(float x, float y) {
+ return fmodf(x, y);
+}
+
+inline float atan2f2(float x, float y) {
+ return atan2f(x, y);
+}
diff --git a/numexpr/necompiler.py b/numexpr/necompiler.py
index b077fcc..899bcf6 100644
--- a/numexpr/necompiler.py
+++ b/numexpr/necompiler.py
@@ -1,3 +1,14 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
+import __future__
import sys
import numpy
@@ -17,6 +28,7 @@ type_to_kind = expressions.type_to_kind
kind_to_type = expressions.kind_to_type
default_type = kind_to_type[expressions.default_kind]
+
class ASTNode(object):
"""Abstract Syntax Tree node.
@@ -54,9 +66,7 @@ class ASTNode(object):
def __hash__(self):
if self.astType == 'alias':
self = self.value
- # Fast hash (see issue #43)
- return ( hash(self.astType) ^ hash(self.astKind) ^
- hash(self.value) ^ hash(self.children) )
+ return hash((self.astType, self.astKind, self.value, self.children))
def __str__(self):
return 'AST(%s, %s, %s, %s, %s)' % (self.astType, self.astKind,
@@ -88,9 +98,8 @@ def expressionToAST(ex):
This is necessary as ExpressionNode overrides many methods to act
like a number.
"""
- this_ast = ASTNode(ex.astType, ex.astKind, ex.value,
- [expressionToAST(c) for c in ex.children])
- return this_ast
+ return ASTNode(ex.astType, ex.astKind, ex.value,
+ [expressionToAST(c) for c in ex.children])
def sigPerms(s):
@@ -150,10 +159,8 @@ def typeCompileAst(ast):
else:
value = ast.value
children = ast.children
- new_ast = ASTNode(ast.astType, ast.astKind, value,
- [typeCompileAst(c) for c in children])
- return new_ast
-
+ return ASTNode(ast.astType, ast.astKind, value,
+ [typeCompileAst(c) for c in children])
class Register(object):
"""Abstraction for a register in the VM.
@@ -202,7 +209,11 @@ def stringToExpression(s, types, context):
try:
expressions._context.set_new_context(context)
# first compile to a code object to determine the names
- c = compile(s, '<expr>', 'eval')
+ if context.get('truediv', False):
+ flags = __future__.division.compiler_flag
+ else:
+ flags = 0
+ c = compile(s, '<expr>', 'eval', flags)
# make VariableNode's for the names
names = {}
for name in c.co_names:
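The extra `flags` argument above is the standard compiler-flag mechanism for
opting an embedded compile() call into true division. A self-contained
illustration (the integer result assumes Python 2 semantics):

    import __future__
    classic = compile("3/2", "<expr>", "eval")
    true_div = compile("3/2", "<expr>", "eval",
                       __future__.division.compiler_flag)
    print eval(classic)   # 1 under Python 2's classic (floor) division
    print eval(true_div)  # 1.5 once the __future__.division flag is set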
@@ -241,7 +252,10 @@ def getInputOrder(ast, input_order=None):
if input_order:
if variable_names != set(input_order):
- raise ValueError("input names don't match those found in expression")
+ raise ValueError(
+ "input names (%s) don't match those found in expression (%s)"
+ % (input_order, variable_names))
+
ordered_names = input_order
else:
ordered_names = list(variable_names)
@@ -310,28 +324,34 @@ def collapseDuplicateSubtrees(ast):
for a in aliases:
while a.value.astType == 'alias':
a.value = a.value.value
- a.reg = a.value.reg
+ return aliases
def optimizeTemporariesAllocation(ast):
"""Attempt to minimize the number of temporaries needed, by
reusing old ones.
"""
- nodes = list(x for x in ast.postorderWalk() if x.reg.temporary)
+ nodes = [n for n in ast.postorderWalk() if n.reg.temporary]
users_of = dict((n.reg, set()) for n in nodes)
+
+ node_regs = dict((n, set(c.reg for c in n.children if c.reg.temporary))
+ for n in nodes)
if nodes and nodes[-1] is not ast:
- for c in ast.children:
- if c.reg.temporary:
- users_of[c.reg].add(ast)
- for n in reversed(nodes):
+ nodes_to_check = nodes + [ast]
+ else:
+ nodes_to_check = nodes
+ for n in nodes_to_check:
for c in n.children:
if c.reg.temporary:
users_of[c.reg].add(n)
- unused = {'bool' : set(), 'int' : set(), 'long': set(), 'float' : set(),
- 'double': set(), 'complex' : set(), 'str': set()}
+
+ unused = {'bool': set(), 'int': set(), 'long': set(), 'float': set(),
+ 'double': set(), 'complex': set(), 'str': set()}
for n in nodes:
- for reg, users in users_of.iteritems():
- if n in users:
- users.remove(n)
+ for c in n.children:
+ reg = c.reg
+ if reg.temporary:
+ users = users_of[reg]
+ users.discard(n)
if not users:
unused[reg.node.astKind].add(reg)
if unused[n.astKind]:
@@ -361,7 +381,7 @@ def setRegisterNumbersForTemporaries(ast, start):
node.reg.n = node.value
continue
reg = node.reg
- if reg.n < 0:
+ if reg.n is None:
reg.n = start + seen
seen += 1
signature += reg.node.typecode()
@@ -378,13 +398,8 @@ def convertASTtoThreeAddrForm(ast):
I suppose this should be called three register form, but three
address form is found in compiler theory.
"""
- program = []
- for node in ast.allOf('op'):
- children = node.children
- instr = (node.value, node.reg) \
- + tuple([c.reg for c in children])
- program.append(instr)
- return program
+ return [(node.value, node.reg) + tuple([c.reg for c in node.children])
+ for node in ast.allOf('op')]
def compileThreeAddrForm(program):
"""Given a three address form of the program, compile it a string that
@@ -394,7 +409,7 @@ def compileThreeAddrForm(program):
if reg is None:
return '\xff'
elif reg.n < 0:
- raise ValueError("negative value for register number %s" % (reg.n,))
+ raise ValueError("negative value for register number %s" % reg.n)
else:
return chr(reg.n)
@@ -405,10 +420,10 @@ def compileThreeAddrForm(program):
ca2 = nToChr(a2)
return cop + cs + ca1 + ca2
- def toString(*args):
+ def toString(args):
while len(args) < 4:
args += (None,)
- opcode, store, a1, a2 = args[0:4]
+ opcode, store, a1, a2 = args[:4]
s = quadrupleToString(opcode, store, a1, a2)
l = [s]
args = args[4:]
@@ -418,31 +433,38 @@ def compileThreeAddrForm(program):
args = args[3:]
return ''.join(l)
- prog_str = ''.join([toString(*t) for t in program])
+ prog_str = ''.join([toString(t) for t in program])
return prog_str
context_info = [
('optimization', ('none', 'moderate', 'aggressive'), 'aggressive'),
+ ('truediv', (False, True, 'auto'), 'auto')
]
-def getContext(map):
+
+def getContext(kwargs, frame_depth=1):
+ d = kwargs.copy()
context = {}
for name, allowed, default in context_info:
- value = map.pop(name, default)
+ value = d.pop(name, default)
if value in allowed:
context[name] = value
else:
raise ValueError("'%s' must be one of %s" % (name, allowed))
- if map:
- raise ValueError("Unknown keyword argument '%s'" % map.popitem()[0])
- return context
+ if d:
+ raise ValueError("Unknown keyword argument '%s'" % d.popitem()[0])
+ if context['truediv'] == 'auto':
+ caller_globals = sys._getframe(frame_depth + 1).f_globals
+ context['truediv'] = \
+ caller_globals.get('division', None) == __future__.division
-def precompile(ex, signature=(), copy_args=(), **kwargs):
+ return context
+
+def precompile(ex, signature=(), context={}):
"""Compile the expression to an intermediate form.
"""
types = dict(signature)
input_order = [name for (name, type_) in signature]
- context = getContext(kwargs)
if isinstance(ex, str):
ex = stringToExpression(ex, types, context)
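With the new 'truediv' context entry, the division mode can be forced per
call or left as 'auto', in which case getContext() inspects the caller's
globals for a `from __future__ import division` binding. A usage sketch
mirroring test_true_div further below:

    from __future__ import division  # truediv='auto' now resolves to True
    import numpy, numexpr
    x = numpy.arange(10, dtype='i4')
    numexpr.evaluate("x/2")                 # floats, per the __future__ import
    numexpr.evaluate("x/2", truediv=False)  # force classic (floor) division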
@@ -452,29 +474,20 @@ def precompile(ex, signature=(), copy_args=(), **kwargs):
ast = expressionToAST(ex)
- # Add a copy for strided or unaligned unidimensional arrays
- for a in ast.postorderWalk():
- if a.astType == "variable" and a.value in copy_args:
- newVar = ASTNode(*a.key())
- a.astType, a.value, a.children = ('op', 'copy', (newVar,))
-
- if ex.astType not in ('op'):
+ if ex.astType != 'op':
ast = ASTNode('op', value='copy', astKind=ex.astKind, children=(ast,))
ast = typeCompileAst(ast)
- reg_num = [-1]
- def registerMaker(node, temporary=False):
- reg = Register(node, temporary=temporary)
- reg.n = reg_num[0]
- reg_num[0] -= 1
- return reg
+ aliases = collapseDuplicateSubtrees(ast)
assignLeafRegisters(ast.allOf('raw'), Immediate)
- assignLeafRegisters(ast.allOf('variable', 'constant'), registerMaker)
- assignBranchRegisters(ast.allOf('op'), registerMaker)
+ assignLeafRegisters(ast.allOf('variable', 'constant'), Register)
+ assignBranchRegisters(ast.allOf('op'), Register)
- collapseDuplicateSubtrees(ast)
+ # assign registers for aliases
+ for a in aliases:
+ a.reg = a.value.reg
input_order = getInputOrder(ast, input_order)
constants_order, constants = getConstants(ast)
@@ -511,8 +524,21 @@ def NumExpr(ex, signature=(), copy_args=(), **kwargs):
Returns a `NumExpr` object containing the compiled function.
"""
+ # NumExpr can be called either directly by the end-user, in which case
+ # kwargs need to be sanitized by getContext, or by evaluate,
+ # in which case kwargs have already been sanitized.
+ # In that case frame_depth is wrong (it should be 2) but it doesn't matter
+ # since it will not be used (because truediv='auto' has already been
+ # translated to either True or False).
+
+ # NOTE: `copy_args` is not necessary from 2.0 on. It remains here
+ # basically because PyTables relied on it for certain operations.
+ # I have filed a ticket for PyTables asking for its removal:
+ # https://github.com/PyTables/PyTables/issues/117
+
+ context = getContext(kwargs, frame_depth=1)
threeAddrProgram, inputsig, tempsig, constants, input_names = \
- precompile(ex, signature, copy_args, **kwargs)
+ precompile(ex, signature, context)
program = compileThreeAddrForm(threeAddrProgram)
return interpreter.NumExpr(inputsig, tempsig, program, constants,
input_names)
@@ -608,20 +634,56 @@ def getExprNames(text, context):
_names_cache = CacheDict(256)
_numexpr_cache = CacheDict(256)
-def evaluate(ex, local_dict=None, global_dict=None, **kwargs):
- """Evaluate a simple array expression element-wise.
+def evaluate(ex, local_dict=None, global_dict=None,
+ out=None, order='K', casting='safe', **kwargs):
+ """Evaluate a simple array expression element-wise, using the new iterator.
ex is a string forming an expression, like "2*a+3*b". The values for "a"
and "b" will by default be taken from the calling function's frame
(through use of sys._getframe()). Alternatively, they can be specified
using the 'local_dict' or 'global_dict' arguments.
+
+ Parameters
+ ----------
+
+ local_dict : dictionary, optional
+ A dictionary that replaces the local operands in the current frame.
+
+ global_dict : dictionary, optional
+ A dictionary that replaces the global operands in the current frame.
+
+ out : NumPy array, optional
+ An existing array where the outcome is going to be stored. Care is
+ required so that this array has the same shape and type than the
+ actual outcome of the computation. Useful for avoiding unnecessary
+ new array allocations.
+
+ order : {'C', 'F', 'A', or 'K'}, optional
+ Controls the iteration order for operands. 'C' means C order, 'F'
+ means Fortran order, 'A' means 'F' order if all the arrays are
+ Fortran contiguous, 'C' order otherwise, and 'K' means as close to
+ the order the array elements appear in memory as possible. For
+ efficient computations, typically 'K'eep order (the default) is
+ desired.
+
+ casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
+ Controls what kind of data casting may occur when making a copy or
+ buffering. Setting this to 'unsafe' is not recommended, as it can
+ adversely affect accumulations.
+
+ * 'no' means the data types should not be cast at all.
+ * 'equiv' means only byte-order changes are allowed.
+ * 'safe' means only casts which can preserve values are allowed.
+ * 'same_kind' means only safe casts or casts within a kind,
+ like float64 to float32, are allowed.
+ * 'unsafe' means any data conversions may be done.
"""
if not isinstance(ex, str):
raise ValueError("must specify expression as a string")
# Get the names for this expression
- expr_key = (ex, tuple(sorted(kwargs.items())))
+ context = getContext(kwargs, frame_depth=1)
+ expr_key = (ex, tuple(sorted(context.items())))
if expr_key not in _names_cache:
- context = getContext(kwargs)
_names_cache[expr_key] = getExprNames(ex, context)
names, ex_uses_vml = _names_cache[expr_key]
# Get the arguments based on the names.
@@ -630,61 +692,25 @@ def evaluate(ex, local_dict=None, global_dict=None, **kwargs):
local_dict = call_frame.f_locals
if global_dict is None:
global_dict = call_frame.f_globals
- arguments = []
- copy_args = []
+ arguments = []
for name in names:
try:
a = local_dict[name]
except KeyError:
a = global_dict[name]
- b = numpy.asarray(a)
- # Byteswapped arrays are dealt with in the extension
- # All the opcodes can deal with strided arrays directly as
- # long as they are unidimensional (strides in other
- # dimensions are dealt with within the extension), so we don't
- # need a copy for the strided case.
-
- if not b.flags.aligned:
- # Only take actions if CPU is different from AMD and Intel
- # as they can deal with unaligned arrays very efficiently.
- # If using VML, do the copy as the VML functions works
- # much faster with aligned arrays.
- if not is_cpu_amd_intel or use_vml:
- # For the unaligned case, we have two cases:
- if b.ndim == 1:
- # For unidimensional arrays we can use the copy
- # opcode because it can deal with unaligned arrays
- # as long as they are unidimensional with a
- # possible stride (very common case for
- # recarrays). This can be up to 2x faster than
- # doing a copy using NumPy.
- copy_args.append(name)
- else:
- # For multidimensional unaligned arrays do a plain
- # copy. We could refine this more and do a plain
- # copy only in the case that strides don't exist
- # in dimensions other than the last one (whose
- # case is supported by the copy opcode).
- b = b.copy()
- elif use_vml and ex_uses_vml: #only make a copy of strided arrays if
- #vml is in use
- if not b.flags.contiguous:
- if b.ndim == 1:
- copy_args.append(name)
- else:
- b = b.copy()
-
- arguments.append(b)
+ arguments.append(numpy.asarray(a))
# Create a signature
signature = [(name, getType(arg)) for (name, arg) in zip(names, arguments)]
- # Look up numexpr if possible. copy_args *must* be added to the key,
- # just in case a non-copy expression is already in cache.
- numexpr_key = expr_key + (tuple(signature),) + tuple(copy_args)
+
+ # Look up numexpr if possible.
+ numexpr_key = expr_key + (tuple(signature),)
try:
compiled_ex = _numexpr_cache[numexpr_key]
except KeyError:
compiled_ex = _numexpr_cache[numexpr_key] = \
- NumExpr(ex, signature, copy_args, **kwargs)
- return compiled_ex(*arguments)
+ NumExpr(ex, signature, **context)
+ kwargs = {'out': out, 'order': order, 'casting': casting,
+ 'ex_uses_vml': ex_uses_vml}
+ return compiled_ex(*arguments, **kwargs)
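Taken together, the new evaluate() keyword parameters allow reusing output
buffers and controlling iteration order and casting. A short sketch of the
new calling convention (array contents are arbitrary):

    import numpy, numexpr
    a = numpy.arange(1e6)
    b = numpy.arange(1e6) * 0.1
    out = numpy.empty_like(a)
    # Store into a preallocated buffer, keeping memory order for speed
    numexpr.evaluate("a + 2*b", out=out, order='K', casting='safe')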
diff --git a/numexpr/opcodes.inc b/numexpr/opcodes.inc
index 846633e..7534a57 100644
--- a/numexpr/opcodes.inc
+++ b/numexpr/opcodes.inc
@@ -1,3 +1,12 @@
+/*********************************************************************
+ Numexpr - Fast numerical array expression evaluator for NumPy.
+
+ License: MIT
+ Author: See AUTHORS.txt
+
+ See LICENSE.txt for details about copyright and rights to use.
+**********************************************************************/
+
/*
OPCODE(n, enum_name, exported, return_type, arg1_type, arg2_type, arg3_type)
@@ -45,7 +54,7 @@ OPCODE(26, OP_NE_BSS, "ne_bss", Tb, Ts, Ts, T0)
OPCODE(27, OP_CAST_IB, "cast_ib", Ti, Tb, T0, T0)
OPCODE(28, OP_COPY_II, "copy_ii", Ti, Ti, T0, T0)
-OPCODE(29, OP_ONES_LIKE_I, "ones_like_i", Ti, T0, T0, T0)
+OPCODE(29, OP_ONES_LIKE_II, "ones_like_ii", Ti, T0, T0, T0)
OPCODE(30, OP_NEG_II, "neg_ii", Ti, Ti, T0, T0)
OPCODE(31, OP_ADD_III, "add_iii", Ti, Ti, Ti, T0)
OPCODE(32, OP_SUB_III, "sub_iii", Ti, Ti, Ti, T0)
@@ -57,7 +66,7 @@ OPCODE(37, OP_WHERE_IBII, "where_ibii", Ti, Tb, Ti, Ti)
OPCODE(38, OP_CAST_LI, "cast_li", Tl, Ti, T0, T0)
OPCODE(39, OP_COPY_LL, "copy_ll", Tl, Tl, T0, T0)
-OPCODE(40, OP_ONES_LIKE_L, "ones_like_l", Tl, T0, T0, T0)
+OPCODE(40, OP_ONES_LIKE_LL, "ones_like_ll", Tl, T0, T0, T0)
OPCODE(41, OP_NEG_LL, "neg_ll", Tl, Tl, T0, T0)
OPCODE(42, OP_ADD_LLL, "add_lll", Tl, Tl, Tl, T0)
OPCODE(43, OP_SUB_LLL, "sub_lll", Tl, Tl, Tl, T0)
@@ -70,7 +79,7 @@ OPCODE(48, OP_WHERE_LBLL, "where_lbll", Tl, Tb, Tl, Tl)
OPCODE(49, OP_CAST_FI, "cast_fi", Tf, Ti, T0, T0)
OPCODE(50, OP_CAST_FL, "cast_fl", Tf, Tl, T0, T0)
OPCODE(51, OP_COPY_FF, "copy_ff", Tf, Tf, T0, T0)
-OPCODE(52, OP_ONES_LIKE_F, "ones_like_f", Tf, T0, T0, T0)
+OPCODE(52, OP_ONES_LIKE_FF, "ones_like_ff", Tf, T0, T0, T0)
OPCODE(53, OP_NEG_FF, "neg_ff", Tf, Tf, T0, T0)
OPCODE(54, OP_ADD_FFF, "add_fff", Tf, Tf, Tf, T0)
OPCODE(55, OP_SUB_FFF, "sub_fff", Tf, Tf, Tf, T0)
@@ -87,7 +96,7 @@ OPCODE(64, OP_CAST_DI, "cast_di", Td, Ti, T0, T0)
OPCODE(65, OP_CAST_DL, "cast_dl", Td, Tl, T0, T0)
OPCODE(66, OP_CAST_DF, "cast_df", Td, Tf, T0, T0)
OPCODE(67, OP_COPY_DD, "copy_dd", Td, Td, T0, T0)
-OPCODE(68, OP_ONES_LIKE_D, "ones_like_d", Td, T0, T0, T0)
+OPCODE(68, OP_ONES_LIKE_DD, "ones_like_dd", Td, T0, T0, T0)
OPCODE(69, OP_NEG_DD, "neg_dd", Td, Td, T0, T0)
OPCODE(70, OP_ADD_DDD, "add_ddd", Td, Td, Td, T0)
OPCODE(71, OP_SUB_DDD, "sub_ddd", Td, Td, Td, T0)
@@ -107,7 +116,7 @@ OPCODE(82, OP_CAST_CI, "cast_ci", Tc, Ti, T0, T0)
OPCODE(83, OP_CAST_CL, "cast_cl", Tc, Tl, T0, T0)
OPCODE(84, OP_CAST_CF, "cast_cf", Tc, Tf, T0, T0)
OPCODE(85, OP_CAST_CD, "cast_cd", Tc, Td, T0, T0)
-OPCODE(86, OP_ONES_LIKE_C, "ones_like_c", Tc, T0, T0, T0)
+OPCODE(86, OP_ONES_LIKE_CC, "ones_like_cc", Tc, T0, T0, T0)
OPCODE(87, OP_COPY_CC, "copy_cc", Tc, Tc, T0, T0)
OPCODE(88, OP_NEG_CC, "neg_cc", Tc, Tc, T0, T0)
OPCODE(89, OP_ADD_CCC, "add_ccc", Tc, Tc, Tc, T0)
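The renamings above make every opcode suffix spell the result type followed
by the operand types, using the same one-letter codes as the T* tags in
opcodes.inc (b=bool, i=int, l=long, f=float, d=double, c=complex, s=str).
A hypothetical helper (not part of numexpr) showing how the convention reads:

    KIND = {'b': 'bool', 'i': 'int', 'l': 'long', 'f': 'float',
            'd': 'double', 'c': 'complex', 's': 'str'}
    def describe(opname):
        # e.g. "add_ddd" -> double result from two double arguments
        base, sig = opname.rsplit('_', 1)
        return base, KIND[sig[0]], [KIND[ch] for ch in sig[1:]]
    print describe("ones_like_ii")  # ('ones_like', 'int', ['int'])
    print describe("add_ddd")       # ('add', 'double', ['double', 'double'])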
diff --git a/numexpr/tests/__init__.py b/numexpr/tests/__init__.py
index 0f2391e..3fff411 100644
--- a/numexpr/tests/__init__.py
+++ b/numexpr/tests/__init__.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
from numexpr.tests.test_numexpr import test, print_versions
if __name__ == '__main__':
diff --git a/numexpr/tests/test_numexpr.py b/numexpr/tests/test_numexpr.py
index 52a8129..4f65463 100644
--- a/numexpr/tests/test_numexpr.py
+++ b/numexpr/tests/test_numexpr.py
@@ -1,4 +1,15 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
import new, sys, os
+import warnings
import numpy
from numpy import (
@@ -9,7 +20,7 @@ from numpy import (
sinh, cosh, tanh, arcsinh, arccosh, arctanh,
log, log1p, log10, exp, expm1)
from numpy.testing import *
-from numpy import shape, allclose, ravel, isnan, isinf
+from numpy import shape, allclose, array_equal, ravel, isnan, isinf
import numexpr
from numexpr import E, NumExpr, evaluate, disassemble, use_vml
@@ -19,8 +30,9 @@ TestCase = unittest.TestCase
double = numpy.double
+
# Recommended minimum versions
-minimum_numpy_version = "1.2"
+minimum_numpy_version = "1.6"
class test_numexpr(TestCase):
@@ -73,29 +85,42 @@ class test_numexpr(TestCase):
('prod_ddn', 'r0', 't3', 2)])
# Check that full reductions work.
x = zeros(1e5)+.01 # checks issue #41
- assert_equal(evaluate("sum(x+2,axis=0)"), sum(x+2,axis=0))
- assert_equal(evaluate("prod(x,axis=0)"), prod(x,axis=0))
+ assert_allclose(evaluate("sum(x+2,axis=None)"), sum(x+2,axis=None))
+ assert_allclose(evaluate("sum(x+2,axis=0)"), sum(x+2,axis=0))
+ assert_allclose(evaluate("prod(x,axis=0)"), prod(x,axis=0))
+
+ x = arange(10.0)
+ assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
+ assert_allclose(evaluate("prod(x**2+2,axis=0)"), prod(x**2+2,axis=0))
+
+ x = arange(100.0)
+ assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
+ assert_allclose(evaluate("prod(x-1,axis=0)"), prod(x-1,axis=0))
+ x = linspace(0.1,1.0,2000)
+ assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
+ assert_allclose(evaluate("prod(x-1,axis=0)"), prod(x-1,axis=0))
+
# Check that reductions along an axis work
y = arange(9.0).reshape(3,3)
- assert_equal(evaluate("sum(y**2, axis=1)"), sum(y**2, axis=1))
- assert_equal(evaluate("sum(y**2, axis=0)"), sum(y**2, axis=0))
- assert_equal(evaluate("sum(y**2, axis=None)"), sum(y**2, axis=None))
- assert_equal(evaluate("prod(y**2, axis=1)"), prod(y**2, axis=1))
- assert_equal(evaluate("prod(y**2, axis=0)"), prod(y**2, axis=0))
- assert_equal(evaluate("prod(y**2, axis=None)"), prod(y**2, axis=None))
+ assert_allclose(evaluate("sum(y**2, axis=1)"), sum(y**2, axis=1))
+ assert_allclose(evaluate("sum(y**2, axis=0)"), sum(y**2, axis=0))
+ assert_allclose(evaluate("sum(y**2, axis=None)"), sum(y**2, axis=None))
+ assert_allclose(evaluate("prod(y**2, axis=1)"), prod(y**2, axis=1))
+ assert_allclose(evaluate("prod(y**2, axis=0)"), prod(y**2, axis=0))
+ assert_allclose(evaluate("prod(y**2, axis=None)"), prod(y**2, axis=None))
# Check integers
x = arange(10.)
x = x.astype(int)
- assert_equal(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
- assert_equal(evaluate("prod(x**2+2,axis=0)"), prod(x**2+2,axis=0))
+ assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
+ assert_allclose(evaluate("prod(x**2+2,axis=0)"), prod(x**2+2,axis=0))
# Check longs
x = x.astype(long)
- assert_equal(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
- assert_equal(evaluate("prod(x**2+2,axis=0)"), prod(x**2+2,axis=0))
+ assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
+ assert_allclose(evaluate("prod(x**2+2,axis=0)"), prod(x**2+2,axis=0))
# Check complex
- x = x + 5j
- assert_equal(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
- assert_equal(evaluate("prod(x**2+2,axis=0)"), prod(x**2+2,axis=0))
+ x = x + .1j
+ assert_allclose(evaluate("sum(x**2+2,axis=0)"), sum(x**2+2,axis=0))
+ assert_allclose(evaluate("prod(x-1,axis=0)"), prod(x-1,axis=0))
def test_axis(self):
y = arange(9.0).reshape(3,3)
@@ -111,12 +136,16 @@ class test_numexpr(TestCase):
pass
else:
raise ValueError("should raise exception!")
-
-
-
+ try:
+ # Negative axis are not supported
+ evaluate("sum(y, axis=-1)")
+ except ValueError:
+ pass
+ else:
+ raise ValueError("should raise exception!")
def test_r0_reuse(self):
- assert_equal(disassemble(NumExpr("x**2+2", [('x', double)])),
+ assert_equal(disassemble(NumExpr("x * x + 2", [('x', double)])),
[('mul_ddd', 'r0', 'r1[x]', 'r1[x]'),
('add_ddd', 'r0', 'r0', 'c2[2.0]')])
@@ -156,6 +185,25 @@ class test_evaluate(TestCase):
x2[1] = 1
assert_array_equal(x2, y)
+ # Test for issue #22
+ def test_true_div(self):
+ x = arange(10, dtype='i4')
+ assert_array_equal(evaluate("x/2"), x / 2)
+ assert_array_equal(evaluate("x/2", truediv=False), x / 2)
+ assert_array_equal(evaluate("x/2", truediv='auto'), x / 2)
+ assert_array_equal(evaluate("x/2", truediv=True), x / 2.0)
+
+ # PyTables uses __nonzero__ among ExpressionNode objects internally
+ # so this test is kept disabled for the moment. See #24.
+ def _test_boolean_operator(self):
+ x = arange(10, dtype='i4')
+ try:
+ evaluate("(x > 1) and (x < 9)")
+ except TypeError:
+ pass
+ else:
+ raise ValueError("should raise exception!")
+
def test_rational_expr(self):
a = arange(1e6)
b = arange(1e6) * 0.1
@@ -201,7 +249,7 @@ class test_evaluate(TestCase):
def test_all_scalar(self):
a = 3.
b = 4.
- assert_equal(evaluate("a+b"), a+b)
+ assert_allclose(evaluate("a+b"), a+b)
expr = NumExpr("2*a+3*b",[('a', double),('b', double)])
assert_equal(expr(a,b), 2*a+3*b)
@@ -255,7 +303,7 @@ tests = [
'1+1',
'1',
'cos(a2)',
- '(a+1)**0'])]
+ ])]
optests = []
for op in list('+-*/%') + ['**']:
@@ -282,31 +330,35 @@ for func in ['copy', 'ones_like', 'sqrt',
'sinh', 'cosh', 'tanh', 'arcsinh', 'arccosh', 'arctanh',
'log', 'log1p', 'log10', 'exp', 'expm1', 'abs']:
func1tests.append("a + %s(b+c)" % func)
-tests.append(('1-ARG FUNCS', func1tests))
+tests.append(('1_ARG_FUNCS', func1tests))
func2tests = []
for func in ['arctan2', 'fmod']:
func2tests.append("a + %s(b+c, d+1)" % func)
func2tests.append("a + %s(b+c, 1)" % func)
func2tests.append("a + %s(1, d+1)" % func)
-tests.append(('2-ARG FUNCS', func2tests))
+tests.append(('2_ARG_FUNCS', func2tests))
powtests = []
-for n in (-2.5, -1.5, -1.3, -.5, 0, 0.5, 1, 0.5, 1, 2.3, 2.5):
+# n = -1, 0.5, 2, 4 already handled in section "OPERATIONS"
+for n in (-7, -2.5, -1.5, -1.3, -.5, 0, 0.0, 1, 2.3, 2.5, 3):
powtests.append("(a+1)**%s" % n)
-tests.append(('POW TESTS', powtests))
+tests.append(('POW_TESTS', powtests))
def equal(a, b, exact):
+ if array_equal(a, b):
+ return True
+
if hasattr(a, 'dtype') and a.dtype in ['f4','f8']:
nnans = isnan(a).sum()
- if isnan(a).sum() > 0:
+ if nnans > 0:
# For results containing NaNs, just check that the number
# of NaNs is the same in both arrays. This check could be
# made more exhaustive, but checking element by element in
# python space is very expensive in general.
return nnans == isnan(b).sum()
ninfs = isinf(a).sum()
- if isinf(a).sum() > 0:
+ if ninfs > 0:
# Ditto for Inf's
return ninfs == isinf(b).sum()
if exact:
@@ -324,22 +376,32 @@ class Skip(Exception): pass
def test_expressions():
test_no = [0]
def make_test_method(a, a2, b, c, d, e, x, expr,
- test_scalar, dtype, optimization, exact):
+ test_scalar, dtype, optimization, exact, section):
this_locals = locals()
def method():
- npval = eval(expr, globals(), this_locals)
+ # We don't want to be distracted by RuntimeWarnings like
+ # "overflow" or "divide by zero". Feel free to expand
+ # the range of this filter, if needed.
+ if dtype.__name__ == "float32" or 'arctanh' in expr:
+ warnings.simplefilter("ignore")
+ npval = eval(expr, globals(), this_locals)
+ warnings.simplefilter("always")
+ else:
+ npval = eval(expr, globals(), this_locals)
try:
neval = evaluate(expr, local_dict=this_locals,
optimization=optimization)
assert equal(npval, neval, exact), """%r
(test_scalar=%r, dtype=%r, optimization=%r, exact=%r,
- npval=%r (%r)\n neval=%r (%r))""" % (expr, test_scalar, dtype.__name__,
+ npval=%r (%r - %r)\n neval=%r (%r - %r))""" % (expr, test_scalar, dtype.__name__,
optimization, exact,
- npval, type(npval), neval, type(neval))
+ npval, type(npval), shape(npval),
+ neval, type(neval), shape(neval))
except AssertionError:
raise
except NotImplementedError:
- print('%r not implemented for %s' % (expr,dtype.__name__))
+ print('%r not implemented for %s (scalar=%d, opt=%s)'
+ % (expr, dtype.__name__, test_scalar, optimization))
except:
print('numexpr error for expression %r' % (expr,))
raise
@@ -347,11 +409,15 @@ def test_expressions():
'dtype=%r, optimization=%r, exact=%r)') \
% (expr, test_scalar, dtype.__name__, optimization, exact)
test_no[0] += 1
- method.__name__ = 'test_%04d' % (test_no[0],)
+ method.__name__ = 'test_scalar%d_%s_%s_%s_%04d' % (test_scalar,
+ dtype.__name__,
+ optimization,
+ section,
+ test_no[0])
return method
x = None
- for test_scalar in [0,1,2]:
- for dtype in [int, long, numpy.float32, double, complex]:
+ for test_scalar in (0, 1, 2):
+ for dtype in (int, long, numpy.float32, double, complex):
array_size = 100
a = arange(2*array_size, dtype=dtype)[::2]
a2 = zeros([array_size, array_size], dtype=dtype)
@@ -365,9 +431,9 @@ def test_expressions():
x += 1j
x *= 1+1j
if test_scalar == 1:
- a = a[array_size/2]
+ a = a[array_size / 2]
if test_scalar == 2:
- b = b[array_size/2]
+ b = b[array_size / 2]
for optimization, exact in [
('none', False), ('moderate', False), ('aggressive', False)]:
for section_name, section_tests in tests:
@@ -375,16 +441,18 @@ def test_expressions():
if dtype == complex and (
'<' in expr or '>' in expr or '%' in expr
or "arctan2" in expr or "fmod" in expr):
- # skip complex comparisons or functions not
- # defined in complex domain.
+ # skip complex comparisons or functions not
+ # defined in complex domain.
continue
if (dtype in (int, long) and test_scalar and
expr == '(a+1) ** -1'):
continue
+
m = make_test_method(a, a2, b, c, d, e, x,
expr, test_scalar, dtype,
- optimization, exact)
- yield m,
+ optimization, exact,
+ section_name)
+ yield m
class test_int64(TestCase):
def test_neg(self):
@@ -561,6 +629,28 @@ class test_irregular_stride(TestCase):
assert_array_equal(f0[i0], arange(5, dtype=int32))
assert_array_equal(f1[i1], arange(5, dtype=float64))
+# Cases for testing arrays with dimensions that can be zero.
+class test_zerodim(TestCase):
+
+ def test_zerodim1d(self):
+ a0 = array([], dtype=int32)
+ a1 = array([], dtype=float64)
+
+ r0 = evaluate('a0 + a1')
+ r1 = evaluate('a0 * a1')
+
+ assert_array_equal(r0, a1)
+ assert_array_equal(r1, a1)
+
+ def test_zerodim3d(self):
+ a0 = array([], dtype=int32).reshape(0,2,4)
+ a1 = array([], dtype=float64).reshape(0,2,4)
+
+ r0 = evaluate('a0 + a1')
+ r1 = evaluate('a0 * a1')
+
+ assert_array_equal(r0, a1)
+ assert_array_equal(r1, a1)
# Case test for threads
class test_threading(TestCase):
@@ -585,7 +675,10 @@ def _worker(qout = None):
# Case test for subprocesses (via multiprocessing module)
class test_subprocess(TestCase):
def test_multiprocess(self):
- import multiprocessing as mp
+ try:
+ import multiprocessing as mp
+ except ImportError:
+ return
# Check for two threads at least
numexpr.set_num_threads(2)
#print "**** Running from main process:"
@@ -640,12 +733,16 @@ def suite():
class TestExpressions(TestCase):
pass
- for m, in test_expressions():
+
+ def add_method(func):
def method(self):
- return m()
- setattr(TestExpressions, m.__name__,
+ return func()
+ setattr(TestExpressions, func.__name__,
new.instancemethod(method, None, TestExpressions))
+ for func in test_expressions():
+ add_method(func)
+
for n in range(niter):
theSuite.addTest(unittest.makeSuite(test_numexpr1))
theSuite.addTest(unittest.makeSuite(test_numexpr2))
@@ -656,6 +753,7 @@ def suite():
theSuite.addTest(unittest.makeSuite(test_strings))
theSuite.addTest(
unittest.makeSuite(test_irregular_stride) )
+ theSuite.addTest(unittest.makeSuite(test_zerodim))
theSuite.addTest(unittest.makeSuite(test_subprocess))
# I need to put this test after test_subprocess because
# otherwise, the test suite locks up immediately before test_subprocess.
@@ -668,3 +766,5 @@ def suite():
if __name__ == '__main__':
print_versions()
unittest.main(defaultTest = 'suite')
+# suite = suite()
+# unittest.TextTestRunner(verbosity=2).run(suite)
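The suite() rewrite above routes each generated test through an
add_method helper instead of binding inside the loop. That sidesteps
Python's late-binding closures, where every loop-defined wrapper would
end up calling the last test yielded by test_expressions(). A minimal
illustration of the pitfall and the fix (not numexpr code):

    # Late binding: all three lambdas see the final value of i.
    fns = [lambda: i for i in range(3)]
    print([f() for f in fns])        # [2, 2, 2]

    # Binding through a helper freezes each value, as add_method does.
    def make(i):
        return lambda: i
    fns = [make(i) for i in range(3)]
    print([f() for f in fns])        # [0, 1, 2]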
diff --git a/numexpr/utils.py b/numexpr/utils.py
index e32f3d4..0cfa7e5 100644
--- a/numexpr/utils.py
+++ b/numexpr/utils.py
@@ -1,3 +1,13 @@
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
import os
from numexpr.interpreter import _set_num_threads
diff --git a/numexpr/version.py b/numexpr/version.py
index cf03429..8e0f9ac 100644
--- a/numexpr/version.py
+++ b/numexpr/version.py
@@ -1,4 +1,14 @@
-version='1.4.2'
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
+version='2.0.1'
release=True
if not release:
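The release flag above follows a common two-state versioning pattern;
the body of the `if not release:` branch falls outside this diff's
context, but the usual shape of such a branch (hypothetical here, not
the file's actual code) is to tag non-release builds:

    version = '2.0.1'
    release = True
    if not release:
        version += '.dev'   # hypothetical suffix marking in-between builds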
diff --git a/setup.py b/setup.py
index ece0531..26c9150 100644
--- a/setup.py
+++ b/setup.py
@@ -1,10 +1,32 @@
#!/usr/bin/env python
+###################################################################
+# Numexpr - Fast numerical array expression evaluator for NumPy.
+#
+# License: MIT
+# Author: See AUTHORS.txt
+#
+# See LICENSE.txt and LICENSES/*.txt for details about copyright and
+# rights to use.
+####################################################################
+
import shutil
-import os
+import os, sys
import os.path as op
from distutils.command.clean import clean
+import numpy
from numpy.distutils.command.build_ext import build_ext as numpy_build_ext
+
+minimum_numpy_version = "1.6"
+
+if sys.version_info < (2, 5):
+    raise RuntimeError("must use python 2.5 or greater")
+
+if numpy.__version__ < minimum_numpy_version:
+ print "*Error*: NumPy version is lower than needed: %s < %s" % \
+ (numpy.__version__, minimum_numpy_version)
+ sys.exit(1)
+
try:
import setuptools
except ImportError:
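One caveat with the NumPy check above: comparing numpy.__version__
against minimum_numpy_version as plain strings is lexicographic, so
e.g. "1.10" sorts before "1.6". A sketch of a more robust check (an
assumption, not the package's code), using the stdlib's LooseVersion:

    from distutils.version import LooseVersion
    import numpy

    # LooseVersion compares numeric components, so 1.10 > 1.6 as intended
    if LooseVersion(numpy.__version__) < LooseVersion("1.6"):
        raise SystemExit("NumPy version is lower than needed: %s < 1.6"
                         % numpy.__version__)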
@@ -51,6 +73,7 @@ def configuration():
'depends': ['numexpr/interp_body.c',
'numexpr/complex_functions.inc',
'numexpr/msvc_function_stubs.inc'],
+ 'libraries': ['m'],
'extra_compile_args': ['-funroll-all-loops',],
}
dict_append(extension_config_data, **mkl_config_data)
@@ -120,23 +143,13 @@ def setup_package():
class build_ext(numpy_build_ext):
def build_extension(self, ext):
# at this point we know what the C compiler is.
- c = self.compiler
- old_compile_options = None
- # For MS Visual C, we use /O1 instead of the default /Ox,
- # as /Ox takes a long time (~5 mins) to compile.
- # The speed of the code isn't noticeably different.
- if c.compiler_type == 'msvc':
- if not c.initialized:
- c.initialize()
- old_compile_options = c.compile_options[:]
- if '/Ox' in c.compile_options:
- c.compile_options.remove('/Ox')
- c.compile_options.append('/O1')
+ if self.compiler.compiler_type == 'msvc':
ext.extra_compile_args = []
+ # also remove extra linker arguments msvc doesn't understand
+ ext.extra_link_args = []
+ # also remove gcc math library
+ ext.libraries.remove('m')
numpy_build_ext.build_extension(self, ext)
- if old_compile_options is not None:
- self.compiler.compile_options = old_compile_options
if __name__ == '__main__':
setup_package()
-
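The build_ext override above exists because the 'm' entry added to
'libraries' is a gcc/Unix concern: libm is a separate link library
there, while MSVC folds the math routines into its C runtime, so
linking 'm' on Windows fails. An alternative, platform-keyed way to
express the same idea (a sketch, not the package's code):

    import sys
    # libm is a separate library on Unix toolchains; MSVC bundles it in the CRT
    libraries = [] if sys.platform == 'win32' else ['m']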
--
Packaging for numexpr