[arrayfire-cuda] annotated tag upstream/v3.2.1+dfsg1 created (now 0148df4)

Sun Dec 20 10:37:30 UTC 2015

This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a change to annotated tag upstream/v3.2.1+dfsg1
in repository arrayfire-cuda.

        at  0148df4   (tag)
   tagging  29797ffd9150fa3494a6088c0263e3fe6f4636b7 (commit)
 tagged by  Ghislain Antony Vaillant
        on  Thu Dec 17 18:31:49 2015 +0000

- Log -----------------------------------------------------------------
Upstream version v3.2.1+dfsg1

Brian Kloppenborg (97):
      CMakeLists.txt does not need to be executable.
      Add definition and directive to fix BOOST_INLINE not being defined on nvcc / CUDA < 6.5. Move FIND command for CUDA into backend/cuda/CMakeLists.txt
      Restore default compile state.
      Merge branch 'cuda_6_0_compile_fix'
      Use ArrayFire's CUDA_VERSION instead of CMake-specific detection.
      Add install steps.
      Move macro to the top of the file.
      Added description, renamed Requirements to Prerequisites, put clBLAS and clFFT subheadings as h4 under OpenCL backend, point clFFT/clBLAS to ArrayFire fork, formatting.
      Use standard package handling, search in both system and local paths for clBLAS, add CLBLAS_ROOT_DIR hint.
      Add install steps for CUDA and OpenCL. Add workaround for clFFT and clBLAS installers.
      Move OpenCL into backend CMakeLists.txt
      Prefer system libraries.
      Leave CUDA and OpenCL off by default.
      Adding install instructions for tests and examples.
      Only install .h and .hpp files. Exclude the .gitignore file from the installation step.
      Include .hpp files.
      Remove installation of tests and examples.
      Add CMake find script from arrayfire_benchmark.
      Fixes #107
      - Add blas as CPU backend dependency
      Fixed merge conflicts.
      Strip whitespace from OpenCL device information.
      Merge pull request #161 from easuter/devel
      Add FindSubversion.cmake to ensure Subversion_SVN_EXECUTABLE is set.
      Quote user-supplied paths. Search CMAKE_INSTALL_PREFIX last to prefer local installs.
      Fix signed to unsigned comparison warning.
      Fix signed vs. unsigned integer comparison.
      Bugfix for signed vs. unsigned comparison error.
      Fix no return from non-void function.
      Fix out-of-bounds memory access in array-based indexing.
      Fix pedantic compiler warning.
      Add function to test/find first non-zero dimension
      Return user-specified dimension.
      Install ArrayFire version file (version.h)
      Fix documentation source directory for install
      Merge with upstream, fix conflict.
      Merge pull request #470 from glehmann/cmake-config
      Remove execution bit.
      Create .tar.gz package for libraries and documentation using CPack
      Make example CMakeList standalone
      Merge branch 'devel' into cmake_packaging
      Fix missing asset definition.
      Package examples
      Fix incorrect reference to ArrayFire libraries from FIND script.
      Restore example naming convention and output directories.
      Add missing includes for stand-alone compilation of examples.
      Add copyright to header.
      Install example assets along with examples.
      Add OS and architecture information to generated installer files.
      Fix OpenCL not found for examples.
      Remove FindArrayFire, script is automatically generated.
      Add source package. Create using
      Merge pull request #579 from pavanky/ocl_helpers
      Doc for using ArrayFire with external OpenCL code
      Merge pull request #583 from pentschev/err_cufft_fix
      Add self-extracting zip for Linux.
      Add DEB and RPM packages for Linux.
      Build DEB and RPM packages only on Linux.
      Pull packaging out of main CMake file.
      Remove package directory, move packaging to CPack.txt file.
      Add installation documentation for Windows, OSX, and Linux.
      Moved installation instructions to Doxygen. Updated links.
      Fix markdown for Doxygen.
      Add Windows (MSVS) and Linux (CMake/Make) usage instructions.
      Add lapacke dependency for Debian-based distributions.
      Document AF CMake variables. Add non-standard install instructions.
      Add lapack(e) dependency on Fedora.
      Add compilation instructions to examples. This resolves #606.
      Only build examples if full backend is found, fixes #738.
      Fix missing includes in standalone compilation.
      Sort member functions alphabetically.
      Revamp the getting started guide.
      Remove extra FindOpenCL.cmake file.
      Fix incorrect array construction.
      Fix missing init of vector.
      Skip complex<float> object creation in favor of direct init.
      Move example install directory.
      Fix typo in installer mode.
      Fix path for CMake config files.
      Added citation and acknowledgements.
      Move acknowledgements to separate file.
      Point all acknowledgement documents to GitHub repo.
      Automatically enable and build CUDA and OpenCL backends by default.
      Suppress FIND output for non-essential libraries.
      Update installation documentation to match current methods.
      Include special instructions for Windows.
      Add OpenGL requirements for Forge.
      Update documentation.
      Add Ubuntu 14.04 installation quirk
      Update INSTALL.md
      Update CMake and Make examples.
      Update section titles, helloworld exe.
      Update link to example projects.
      Add XCode instructions, assets.
      Add link to ArrayFire project template repo.
      Add PPA for glfw3 on Ubuntu 14.04
      Update INSTALL.md

Casey Goodlett (1):
      Fix googletest build on other cmake build types

Filipe Maia (14):
      Fix missing dereference.
      Ensure af_get_last_error and af_err_to_string are exported by including the public header.
      Add missing definitions for array_proxy operators
      Add missing constant() declaration
      Add support for s64 and u64 arrays
      Requires some knowledge of array_proxy_impl. Fixes #787
      Implement CUDA complex dot product
      Add complex dot product for CPU backend
      Add complex dot product for OpenCL backend
      Add forgotten instantiations
      A few functions (e.g. setunique) were not being exported to the library
      Fix double free issue.
      Add test for index copy assignment.
      Add move constructor and move assignment op

Gaëtan Lehmann (11):
      fix gtest build with ninja and simplify gtest external project
      simplify freeimage cmake code
      fix gtest byproducts
      install arrayfire cmake configuration and version files
      display a warning when the assets can't be found in the source dir
      set the lib version
      fix 1000+ warnings about unused function with clang
      replace tabs with 4 spaces
      fix the missing external projects byproducts
      fix the condition used to set the external projects by products
      fix missing include

Gallagher Pryor (2):
      fix for building gtest on systems w/ svn < v1.8
      Merge branch 'test' into 'master'

Gaëtan Lehmann (1):
      download and build the opencl dependencies with cmake

Ghislain Antony Vaillant (13):
      TYPO: fix ArrayFire URL
      add module centralizing all install paths
      use new install path variables
      make paths overridable
      fix #828: remove unnecessary include of a cmake module
      fix #957: remove unused dtype trait for size_t
      update cl.hpp to upstream version 1.2.7
      fix instantiation of Platform objects
      fix instantiation of cl::Platform objects
      Build and install documentation in a separate output folder.
      Add missing linkage with libdl
      Fix examples target.
      Clean tree from non DFSG-compliant files.

John Melonakos (1):
      Adding the license info for all source files in ArrayFire

Keno Fischer (5):
      Allow building against 64bit index OpenBlas
      Add support for BLAS symbol renaming
      Also use the void* interface for MKL on windows
      Don't link the installed version of forge on Linux/OSX
      Merge pull request #1 from umar456/JuliaComputing-kf/openblas64

Kumar Aatish (1):
      Changed OpenCL library search path order

Kyle Lutz (3):
      Add setUnique()/setUnion()/setIntersection() for OpenCL
      Use OpenCL error codes from Boost.Compute
      Reduce Boost.Compute header includes

Marius Brehler (2):
      Try PkgConf first to find CBLAS
      Try PkgConf first to find LAPACKE

Michael Carilli (1):
      Changed THREADS to 256 and BLOCKS to 64 in random.hpp.

Michael Nowotny (1):
      Added new example: heston_model in financial

Miguel Lloreda (2):
      fixed formatting
      Fixed typos in documentation.

Muneer Macbook (3):
      Added a mutex in the memory alloc and dealloc of the CPU backend to facilitate multi-threaded use
      Fixed memory leak in cpu/where.cpp
      Temporary workaround for incorrect weak copy semantics of af_array

Nathan Jackson (5):
      Added mean calculation support for complex types.
      Added tests for computing the mean of complex values.
      Moved division function into math utilities.  Fixed mean function.
      Added variance interface and var_all implementation.
      Fix race condition in reduce_first_kernel.

Pavan Yalamanchili (1255):
      Renaming the helper files
      API changes for data transfer functions
      Create context using DEFAULT instead of CPU for OpenCL backend
      Data type changes to make the backends self contained
      Merge branch 'testHelper' into 'master'
      BUG fix: Fixing CalcBaseStride for greater than 2 dimensions
      Merge branch 'cpu_diff1' into 'master'
      Merge branch 'moddims'
      Fixing build errors from diff branch
      Fixing warnings from the new tests
      Cleanup of src/backend
      Merge branch 'transpose' into 'master'
      Fixing warnings in transpose tests
      Merge branch 'random'
      Fixing constructors for CUDA and OpenCL
      Cleaning up CMAKE files to automatically pick up source files
      CMAKE Fix: Explicitly state source extensions
      Fixing the formula for baseoffset.
      FEAT: Reductions for CPU added
      Updating the formula to work with negative and strided offsets
      Cleaning up diff kernels to have unified functions
      Merge branch 'diff_cuda' into 'master'
      Merge branch 'cuda_trs' into 'master'
      Merge branch 'consistant_headers'
      Changed the headers from earlier merges to be consistent as well
      Merge branch 'reduce'
      Merge branch 'rand'
      Adding explicit methods to modify ArrayInfo
      Removed all references to af_array inside src/backend/*/.
      Make all Array<T> constructors private
      FEAT: randu and randn for CUDA implemented
      Merge branch 'ocl_fixes' into 'master'
      Changing loop iterators to proper type
      A better way to handle the template specialization of size_t
      Merge remote-tracking branch 'origin/diff_opencl'
      BUG Fix: calcStride was accessing out of bounds.
      Fixing the bugfix to calcStrides
      Separate out reduce and transform functors
      Merge remote-tracking branch 'origin/ocl_transpose'
      Using dim_type in diff for opencl instead of size_t
      Removing trailing whitespaces
      Getting rid of .cu files in src/backend/cuda/kernel/
      Remove the unnecessary template instantiations
      Minor changes post merge to imageio.cpp
      Merge remote-tracking branch 'origin/cpu_histogram'
      Fixes for tests to compile properly by adding std:: prefix
      Style change to fix the compiler warnings on gcc 4.9
      Indexing support for CUDA backend
      Indexing support for OpenCL
      Removing unnecessary print from test/index.cpp
      Enabling diff tests and minor fix to work with indexing
      Enabling transpose tests and fixes to make transpose pass the tests
      Merge remote-tracking branch 'origin/resize'
      Merge remote-tracking branch 'origin/cpu_morph'
      Merge remote-tracking branch 'origin/cpu_bilateral'
      Changing variable name to be consistent with the rest of the file
      Moving the functions in cuda/complex.hpp to global namespace
      Changing ops.hpp and */backend.hpp to work nicely with NVCC
      Removing unnecessary include file
      Reductions for CUDA backend
      Accum implementation for CPU backend
      Merge remote-tracking branch 'origin/info_helpers'
      Merge remote-tracking branch 'origin/cuda_morph'
      Bug fix to random number generation in CUDA
      Adding random number generation support to OpenCL backend
      Merge remote-tracking branch 'origin/ocl_morph'
      Merge remote-tracking branch 'origin/transform'
      Merge remote-tracking branch 'origin/blas'
      Merge remote-tracking branch 'origin/cuda_bilateral'
      Merge remote-tracking branch 'origin/ocl_bilateral'
      Merge branch 'ocl_morph_opt' into 'master'
      Merge remote-tracking branch 'origin/cuda_histogram'
      Merge remote-tracking branch 'origin/approx'
      Merge remote-tracking branch 'origin/ocl_histogram'
      Merge remote-tracking branch 'origin/random'
      Changing the location of the data repository
      Added Param and CParam structs that can be passed to the GPU
      Renaming helper functions and functors
      Unified print function for all backends
      Cleaning up header files
      Merge remote-tracking branch 'origin/header'
      Merge remote-tracking branch 'origin/master' into unify
      Merge remote-tracking branch 'origin/intel_histfix'
      Adding Param<T> to the remaining functions in CUDA backend
      Merge remote-tracking branch 'origin/unify'
      Merge branch 'ocl_dselector' into 'master'
      Adding a missing std:: in opencl/platform.cpp
      Adding __CL_ENABLE_EXCEPTIONS to the build process
      Merge branch 'origin/ocl_kernel_caching'
      Merge remote-tracking branch 'origin/tile'
      Add caching support for tile in xOpenCL
      Fixing copy paste error
      Introduced a common struct and build function for OpenCL kernels
      Changing beta == 0 instead of memsetting C to 0 in gemm
      Merge branch 'origin/blas_fix'
      Merge remote-tracking branch 'origin/unify'
      extended CATCHALL to include Type and Support errors
      AfError now supports line numbers and user specified af_errs
      Added *_NOT_SUPPORTED macros for each backend
      Added macro CUDA_CHECK that checks for cudaError and throws AfError
      change cldebug to debug_opencl
      Added POST_LAUNCH_CHECK to CUDA backend
      Added new error type --> ArgumentError
      Changed backend/reduce.cpp to include the new error mechanisms
      Changed backend/diff.cpp to use new error checks
      Changed backend/morph.cpp to use new error checks
      Changed SHOW_CL_ERROR() to CL_TO_AF_ERROR() in opencl backend
      Fixing a minor bug for ArgumentError
      Fixing the dimension checks for backend/morph.cpp
      Fixing the morph tests to check for correct errors
      Moving ARG_ASSERT to within try catch blocks
      Merge remote-tracking branch 'origin/error'
      Cleaning up a couple of lines
      buildProgram now accepts multiple source files
      added iscplx to backend/opencl/traits.hpp
      Reductions backend for OpenCL
      Merge branch 'clreduce' into 'master'
      Merge remote-tracking branch 'origin/cuda_device'
      Merge remote-tracking branch 'origin/matmul_fixes'
      Merge remote-tracking branch 'origin/reorder'
      Fixing issues with CUDA reductions
      Fixing typo in scan tests
      Cleaning up reductions code a bit more
      Scan algorithm for CUDA implemented
      Change to make sure autogenerated string headers are only included once
      Style clean up of OpenCL reduction code
      Scan algorithm for OpenCL backend
      Merge remote-tracking branch 'origin/shift'
      Merge remote-tracking branch 'origin/scan'
      Merge remote-tracking branch 'origin/gradient'
      Merge remote-tracking branch 'origin/cpu_medfilt'
      Updating gitignore to include unwanted emacs files
      Merge remote-tracking branch 'origin/cuda_medfilt'
      Merge remote-tracking branch 'origin/ocl_medfilt'
      Bug fix to gradient in CUDA and OpenCL backends
      Merge remote-tracking branch 'origin/intel_scan_fix'
      Cleaning up buggy strided dimensions in scan for CUDA and OpenCL
      Merge branch 'scan_bugfix'
      Launch configuration fix for AMD GPUs
      Merge remote-tracking branch 'origin/cpu_fft'
      Cleaning up backend/scan.cpp to include proper error checks
      Adding support for where for CPU backend
      Fixing corner cases in scan algorithm for CUDA and OpenCL backends
      Exceptions now display file names instead of function names
      Added a new function to create Array<T> from Param<T>
      Where implemented for CUDA backend
      Making the double buffering in OpenCL backend more explicit
      Change scan tests to run on OpenCL devices available on the system
      Adding support to create Array<T> from Param in OpenCL backend
      Changing where in CUDA backend to pass out by reference
      Style changes to OpenCL scan function
      Tentative support for where in OpenCL backend
      Merge remote-tracking branch 'origin/fft'
      Modifying FindclFFT.cmake to look in clFFT build directory
      Change required to suppress comparison warnings
      Removing unnecessary variadic templates
      Removing "static" from template specializations
      Changes to cuda and OpenCL backends to improve parallel compiles
      Merge remote-tracking branch 'origin/compile'
      Making sure ret in imageio is initialized before returning
      Adding dl libs explicitly to the OpenCL backend
      Merge branch 'cpu_where'
      Fix header locations to fix compilation in debug mode
      Merge remote-tracking branch 'origin/meanshift'
      Merge remote-tracking branch 'origin/pad'
      Merge remote-tracking branch 'origin/master' into where
      BUG: Fixed boundary checks for scan_first in CUDA and OpenCL
      Passing Params as references to where_* in OpenCL backend
      Merge remote-tracking branch 'origin/where'
      Added simple JIT kernel generation for OpenCL backend
      Updated the OpenCL backend to have simpler kernel name generation
      Kernel compilation and Launching added for OpenCL JIT backend
      Reorganizing files in src/backend
      Cleaning up jit.cpp
      Adding logical functions to the external API
      Adding the last few binary functions
      Adding cast function to OpenCL JIT backend
      Unary functions added to OpenCL JIT backend
      Adding new binary functions to OpenCL JIT backend
      Adding support for ScalarNodes in OpenCL JIT backend
      ndims() now returns at least 1 instead of 0 from before.
      Added proper error checking to af_print
      Fixing the API of af_cast
      Adding cache support for OpenCL JIT kernels
      Merge branch 'bug_fixes' into 'master'
      Changing the implicit cast behavior to mimic c/c++
      Adding CUDA/CPU_NOT_SUPPORTED macros for elementary operations
      JIT kernel generation support for OpenCL backend
      Removing unnecessary member variables from BufferNode
      BUG fix in JIT kernel generation in OpenCL backend
      Merge branch 'jit'
      Change required to make blas compile on centos 6
      Changing the cpu blas to depend on CBLAS instead of blas
      Merge remote-tracking branch 'origin/blas_fix'
      Fixes for OpenCL backend for gcc 4.7.2
      Updating the README.md
      Merge remote-tracking branch 'origin/ocl_fix'
      Adding FindMKL.cmake to ArrayFire repo
      Merge branch 'rotate_fix' into 'master'
      Merge branch 'fft_fix' into 'master'
      Bug fix in specializations for max<cfloat> and max<cdouble>
      Enabling double precision support for JIT kernels
      Fixing typo in opencl/jit.cpp
      Fixing the initial value for max on complex numbers
      Merge branch 'sort' into 'master'
      Merge branch 'warning-fixes' into 'master'
      Merge branch 'random_fix' into 'master'
      Merge branch 'platform_fixes' into 'master'
      Update README.md to have better formatting.
      Merge branch 'conv' into 'master'
      Stripping end of line characters from README.md
      Removing unnecessary line from cuda/CMakeLists.txt
      First draft of CUDA JIT
      Added FindNVVM.cmake
      Adding libcuda as a dependency
      Changes to make nvvm code to compile and execute
      Making the child nodes decide the types when calling functions
      Removing untracked folder from the repository
      Removing untracked folder from the repository
      Adding support for CAST and COMPLEX operations in CUDA backend
      Adding back tests in `basic.cpp` for CUDA backend
      Merge branch 'sort_split' into 'master'
      Merge branch 'bilateral_fixes' into 'master'
      Merge branch 'compute_cmake_fix' into 'master'
      Merge branch 'subref_assign' into 'master'
      Merge remote-tracking branch 'origin/jit'
      Element wise support for CPU backend
      Merge branch 'header-files' into 'master'
      Merge remote-tracking branch 'origin/TNJ'
      CPU backend now uses std::shared_ptr for holding data
      Using boost::shared_ptr for reference counting in CUDA backend
      Renaming files in CUDA backend
      Adding support for weak copy in src/backend/*.cpp
      Merge branch 'ocl_cmake_changes' into 'master'
      Merge branch 'ref'
      Bug fix for OpenCL backend when creating empty Arrays
      Adding basic functions to the C++ API
      Merge branch 'cuda_limit' into 'master'
      BUG_FIX: bin2cpp now adds NULL character towards the end of string
      bin2cpp now adds newline for CUDA but does not for OpenCL
      Merge branch 'regions' into 'master'
      Adding the license file to the repo
      Merge remote-tracking branch 'origin/master'
      Updating README.md to include clone command and fftw dependency
      Updating the arrayfire_data repo URL
      Updating README.md
      Merge pull request #25 from arrayfire/cpp_tests
      Merge pull request #28 from arrayfire/nan_inf_fix
      Merge pull request #33 from arrayfire/sort_cpp
      Merge pull request #31 from arrayfire/cmake_cuda_compute
      Merge pull request #35 from arrayfire/api_changes
      Merge data.h and reduce.h into algorithm.h
      Moving constant, randu and randn into af/data.h
      Moving approx1 and approx2 to af/signal.h
      Merge pull request #38 from arrayfire/header
      Moving important utility functions from data.cpp to handle.hpp
      Cleaning up the sort functions
      Adding set functions for the CPU backend
      Merge pull request #42 from kylelutz/opencl-set
      Fixing the iterators for union and intersect in OpenCL backend
      Adding set operations for CUDA backend
      Reducing the memory footprint for set_intersect
      Style changes
      Merge pull request #43 from arrayfire/set
      Merge pull request #48 from arrayfire/seq
      Making arrayfire_data a submodule
      Scoping out unimplemented code
      Adding af_eval() and array::eval()
      Adding af_get_device and af_sync to all backends
      Merge pull request #56 from arrayfire/data
      Merge pull request #58 from arrayfire/eval
      Merge pull request #59 from arrayfire/win_fixes
      Merge pull request #64 from arrayfire/timer
      Merge pull request #65 from 9prady9/additional_api
      Fixing a bug with +=, -=, *=, /=
      BUG fix: evaluate array before assignment operator
      Cleaning up moddims
      Changing enum so it does not clash with functions
      Merge pull request #66 from pavanky/misc
      Update CONTRIBUTING.md
      Merge pull request #69 from arrayfire/devel
      Merge pull request #72 from kylelutz/opencl-error-codes
      Merge pull request #74 from bkloppenborg/cmake_install
      Merge pull request #76 from arrayfire/minor-fixes
      Fixing compilation warnings in CPU and CUDA backends
      Merge pull request #79 from pavanky/warnings
      Merge pull request #75 from arrayfire/devel
      Reorganizing helloworld.cpp
      Updating README.md
      Update README.md
      Merge pull request #85 from 9prady9/format_array_print
      Merge pull request #87 from arrayfire/devel
      Adding new functions to src/backend
      Adding new methods to af::array class
      Merge pull request #91 from shehzan10/unary-fix
      Updating the commit hash of test/data submodule
      Destroy temporary variables from binary.cpp
      Make sure functions are not being declared more than once in NVVM IR
      Fixing a bug in CUDA backend to reset flags properly
      Merge pull request #94 from 9prady9/win_cblas_fixes
      Merge pull request #96 from arrayfire/devel
      Update README.md
      Merge pull request #98 from pavanky/readme
      Merge pull request #97 from kaatish/ocl_cmake_changes
      Merge pull request #99 from mlloreda/patch-1
      Fixed formatting
      Merge pull request #104 from firemanphil/master
      Merge pull request #102 from gcasey/buildfixes
      Merge pull request #105 from shehzan10/devel
      Merge pull request #112 from shehzan10/cuda_build_fix
      Merge branch 'issue_107' of https://github.com/bkloppenborg/arrayfire into devel
      Merge pull request #106 from arrayfire/devel
      Merge pull request #116 from shehzan10/transpose_perf
      Fixed bugs in ScalarNode for CUDA and OpenCL JIT backends
      Unary math functions convert arrays to floating point arrays
      Merge pull request #124 from 9prady9/index_tests
      Bug fix for reductions in CUDA backend
      Bug fix to random number generation in CUDA backend
      Removing deprecated files from the repo
      Suppress them warnings
      Fixing leaks in CUDA JIT backend
      Fixing Leaks in OpenCL JIT backend
      Fixing Memory leaks in CPU TNJ
      Merge branch 'pavanky/jit_fixes' into bugfixes
      Merge pull request #131 from pavanky/bugfixes
      Moving src/array to src/frontend/cpp
      Adding support for binary functions with scalar inputs
      Removed s8. Changed b8 to be of type char
      cast to b8 now results in arrays made up of 1s or 0s
      Add a debug version of CU_CHECK
      updated math operations for all backends
      Merge remote-tracking branch 'origin/arith' into devel
      Merge pull request #148 from arrayfire/devel
      Fixing complex function support in arrayfire
      Adding data check functions: isNaN, isInf, iszero
      Unifying af_constant_c32/c64 into af_constant_complex
      Support for global reductions in CPU backend
      Merge pull request #150 from bkloppenborg/findarrayfire_fixes
      Merge pull request #157 from shehzan10/devel
      Support for global reductions in CUDA backend
      Global reduction support for OpenCL backend
      changing reduce_global --> reduce_all
      Merge remote-tracking branch 'pavanky/algos' into devel
      Reorganizing the directory structure
      Unified the way complex numbers are printed
      Merge pull request #163 from shehzan10/transform_linear
      Merge pull request #165 from shehzan10/devel
      PERF: improvements to random number generation in CPU backend
      Wrapping af_get functions in AF_CHECK macro
      PERF: improvements to random number generation in CUDA backend
      Merge pull request #170 from umar456/devel
      PERF: improvements to random number generation in OpenCL backend
      Merge pull request #173 from bkloppenborg/FindclFFTImprovements
      Merge pull request #174 from shehzan10/devel
      Merge pull request #175 from pentschev/fast
      Merge pull request #179 from pentschev/fast_return_fix
      Merge branch 'devel' into perf
      PERF: improvements to CUDA JIT when memory is linear
      PERF: improvements to OpenCL JIT when memory is linear
      EXAMPLE: Monte Carlo estimation of PI
      BUGFIX: in JIT for CUDA backend
      Merge branch 'devel' into ocl_win_fixes
      correctly adding USE_DOUBLE to OpenCL JIT
      Merge pull request #188 from arrayfire/ocl_win_fixes
      Fixing the commit id of test/data submodule
      Merge pull request #190 from shehzan10/devel
      Merge pull request #181 from arrayfire/devel
      Merge pull request #191 from 9prady9/ocl_dev_sort
      PERF: Added memory manager for CUDA backend
      PERF: Added memory manager for CPU backend
      Merge pull request #192 from shehzan10/join
      Merge pull request #200 from shehzan10/imageio_fixes
      Merge pull request #202 from shehzan10/devel
      Merge pull request #205 from shehzan10/sort_fixes
      BUG: Fix in memory manager for CUDA backend with multiple devices
      Changing new/delete to malloc/free for CPU backend
      Adding variable names for MAX_BUFFERS and MAX_BYTES
      Changing Array.data from cl::Buffer to cl::Buffer *
      Fixing memory leak inside af_print_array
      PERF: Added memory manager for OpenCL backend
      Adding C api calls for malloc and free
      Changing the error message for pinned memory alloc / free
      Merge branch 'devel' into memory
      Fixing typo / bug in implicit.cpp
      Merge pull request #209 from shehzan10/devel
      Merge pull request #219 from shehzan10/devel
      PERF: using cuda::mem{Alloc,Free} instead of cuda{Malloc,Free}
      PERF: Improvements to reductions in CUDA and OpenCL
      BUG: Fixed issues with binary operations with scalar on LHS
      Updating math_ptx submodule
      Adding abs support for complex numbers
      Merge pull request #227 from 9prady9/conv2d_perf_fixes
      Merge pull request #229 from shehzan10/devel
      Updating CONTRIBUTING.md
      Merge pull request #238 from shehzan10/devel
      Merge pull request #239 from bkloppenborg/devel
      BUG: Fixed issues with atan2 in CUDA and OpenCL backends
      TEST: Adding global reduction tests
      Changing af::af_cfloat to af::cfloat for C++ API
      Properly catching and returning errors from af_sort*
      TESTS: Adding tests for math functions
      TEST: Adding tests for binary functions
      FEAT: Adding support for hypot
      TEST: Adding tests for complex binary functions
      FEAT: Adding identity function for all backends
      BUG: Fixed problem in cast for OpenCL backend
      BUG: Fixed a problem when casting complex numbers
      FEAT: Adding diag for all backends
      BUG: Fixed memory leak in C++ API when doing indexing
      SubArrays now contain reference to shared_ptr instead of parent
      Merge pull request #252 from shehzan10/devel
      Fixing problems with isOwner() in all backends
      Adding support for casting seq to array
      Default constructor now creates array of size (0,0,0,0)
      Minor changes to API
      Merge pull request #258 from mcclanahoochie/osx_fixes
      BUGFIX: Hotfix for cast in opencl backend
      Adding proper checks to tests
      Merge pull request #259 from arrayfire/devel
      Merge pull request #270 from umar456/clean_ocl_morph
      Merge pull request #271 from umar456/osx_build
      Fixing compilation errors
      Merge pull request #276 from shehzan10/devel
      adding math constants to ArrayFire
      Changing api of few functions to match v2.1
      BUG: Fixed issues with metadata while indexing
      cleaning up bugs created by previous commit
      Remove warnings when running fft in OpenCL backend
      Initial commit with gfor support
      Merge pull request #285 from shehzan10/devel
      Adding dimension checks for cplx2
      Binary functions in C API now have batchMode parameter
      binaryNode now accepts output dimension size
      Adding support for batch mode in all backends
      Merge remote-tracking branch 'upstream/devel' into gfor
      Adding proper error checking macros to src/api/c/index.cpp
      Adding batchFunc support for CPP backend
      FEAT: Adding GFOR support with for indexing
      EXAMPLE: Adding vectorize example to arrayfire
      Changing batchMode to batch
      Merge pull request #303 from 9prady9/match_template
      Cleaning up error handling in src/api/c/
      Adding error messages when necessary for CPP API functions
      Adding bounds checks for index and assign
      Merge pull request #307 from shehzan10/devel
      Merge pull request #311 from shehzan10/devel
      Merge pull request #312 from 9prady9/perf_fixes
      Merge pull request #317 from shehzan10/devel
      Merge pull request #319 from arrayfire/devel
      Exposing ArrayFire OpenCL internals for interoperability
      Fixing compile issues in OSX when using af/opencl.h
      BUGFIX: Fixing GFOR bug during assign
      Cleaning up the error checking in api/c/binary.cpp
      BUGFIX: seq --> array inside GFOR creates batche array
      BUGFIX: Fixed OpenCL JIT bug when variables were going out of scope
      Fixing typo in ToNum()
      Merge pull request #334 from pentschev/devel
      Merge pull request #337 from pentschev/fix_windows_cuda_math
      Merge pull request #335 from pentschev/devel
      Fixing warnings in ORB implementation and tests
      BUGFIX: Fixing data access patterns in OpenCL backend for diag
      BUGFIX: Fixing data access patterns in OpenCL backend for identity
      Fixing commit id for test/data
      BUGFIX: dims() now gets dimensions properly after indexing
      BUGFIX: Fixing issues with indexing after JIT operation
      BUGFIX/FEAT: Adding support for more 4d indexing operations
      FEAT: Adding support for negative offsets from end in CPP API
      BUGFIX: Fixed memory leak in af_copy_array
      Merge branch 'sobel' of https://github.com/9prady9/arrayfire into devel
      Merge pull request #344 from pentschev/fix_windows_orb
      BUGFIX: Fixing indexing to support reverse indexing
      Merge pull request #354 from pentschev/orb_fixes
      BUGFIX: Assignment operators now properly implement copy on write
      TEST: Adding additional tests for CPP indexing
      TEST: Adding new tests for CPP assign operators
      FEAT: Added support for bitand, bitor and bitxor for all backends
      FEAT: Adding preliminary support for 64 bit integers
      FEAT: reorder, transpose, moddims support for 64 bit ints
      FEAT: Adding binary function support for 64 bit ints
      BUGFIX: for numeric operations on integer types in OpenCL backend
      FEAT: CUDA backend support for numerical operations on 64 bit ints
      BUGFIX: Enabling mod / rem for integer types
      BUGFIX: Changing % to mean remainder instead of modulus
      Cleaning up mod and rem for integer types
      FEAT: Adding bitshiftl, bitshiftr
      TEST: Adding tests for 64 bit ints and bit shift functions
      Compile fix for windows
      Merge pull request #360 from pentschev/fix_missing_deleter
      BUGFIX: Adding target triple for when generating NVVM IR
      Merge pull request #366 from pentschev/fix_fast_zerofeat
      Merge pull request #368 from shehzan10/devel
      Merge pull request #367 from arrayfire/cuda7
      Removing math_ptx submodule as a dependency
      Fixing dependency issues during ptx generation
      Bugfix: fixed improper caching when casting in CUDA backend
      Bugfix: fixed improper caching when casting in OpenCL backend
      Merge remote-tracking branch 'upstream/devel' into ptxgen
      Merge pull request #369 from shehzan10/devel
      Changing std::string inputs to be references
      Changing DeviceManager in OpenCL backend to use one context per device
      Fixing copy paste error in sobel kernels in OpenCL backend
      Cleaning up af::info for OpenCL backend
      Sanitizing af::array class and constructor
      Merge pull request #372 from arrayfire/devel
      BUG: Fixed problem with JIT caching in CUDA backend
      BUG: Fixed problem with JIT caching in OpenCL backend
      TEST: Adding preliminary test for JIT
      STYLE: Removing unnecessary include files
      Renaming tests in test/jit.cpp
      Hashing the kernel names for CUDA and OpenCL
      BUILD: auto generated PTX files are copied instead of renaming them
      Merge pull request #381 from arrayfire/devel
      Use decimal notation instead of hex for OpenCL JIT names
      BUGFIX: Enable double precision support properly in OpenCL backend
      BUGFIX: Fixing randu for complex numbers in OpenCL backend
      Cleaning up opencl/kernel/random.cl
      FEAT: Adding support for randu(.., b8)
      Disabling OpenCL CPU and Accelerator support for OSX
      Adding skeleton code for indexed min and max
      FEAT: Indexed min and max for CPU backend
      FEAT: Indexed min and max for CUDA backend
      Removing unnecessary files from OpenCL backend
      FEAT: Indexed min and max for OpenCL backend
      Reorganizing features.cpp
      Adding proper checks in src/api/c/gradient.cpp
      Bit operations now supported for scalar integers and bools
      BUG: Fixed kernel compile issues with ireduce_dim.cl
      BUG: Fixed typo in ireduce_dim.cl
      STYLE: Fixed typos in test/reduce.cpp
      TEST: Adding tests for indexed min and max
      Fixing issues with min and max on boolean arrays
      Merge pull request #387 from 9prady9/colorspace
      Merge pull request #389 from 9prady9/statistics
      FEAT: Adding flat for all backends
      TEST: Adding tests for flat
      Enable scalar(real, imag) in all backends
      Changing overloaded createHandle appropriate function names
      Moving AF_THROW(af_init()) inside try/catch blocks
      af_constant_complex does not use temporary variables anymore
      FEAT: constant(val,...) now accepts val from all types
      TEST: Adding tests for constants of various types
      Merge pull request #396 from 9prady9/histeq
      Merge pull request #395 from shehzan10/devel
      FEAT: Adding binary operations for each type
      BUGFIX: memcopy kernel was creating indices incorrectly
      Adding isLinear() to ArrayInfo
      PERF: moddims no longer performs a copy if Input is Linear
      Code clean up in FAST and ORB for all backends
      Cleaning up the CPP features class
      Cleaning up memory.cpp in cuda backend
      Reverting a dumb commit I made to the code
      Destroy af_array at the end of tests
      Changing the internal API
      Making assign exception safe
      Destroying af_arrays properly in reduce and scan tests
      Organizing the examples directory
      Adding back examples from arrayfire_examples repo
      FEAT: Adding gaussian kernel to all backends
      Merge pull request #416 from shehzan10/devel
      Merge remote-tracking branch 'upstream/devel' into examples
      Merge pull request #408 from 9prady9/perf_conv
      Merge remote-tracking branch 'upstream/devel' into examples
      Changing the API of separable convolution to match 2.1
      Fixing the dimensions of separable convolution
      Fixing dim checks for separable convolve in CUDA and OpenCL backends
      Fixing convolve example
      Fixing the rainfall example
      FEAT: Adding "product" for all backends
      FEAT: Adding flip for all backends
      Enabling commented parts of integer.cpp and monte_carlo_options.cpp
      Merge pull request #419 from 9prady9/sep_conv_fixes
      Changing the order of dimensions for monte carlo example
      Merge branch 'examples' of github.com:arrayfire/arrayfire into examples
      BUGFIX: in moddims when input is a jit node
      Merge pull request #423 from umar456/docs
      Merge pull request #424 from umar456/gtest
      Merge pull request #425 from shehzan10/devel
      BUGFIX for cascaded indexing.
      TEST: Adding cascaded indexing tests
      TEST: Adding back commented out tests from flip
      Merge pull request #427 from 9prady9/hsv_rgb
      Merge branch 'devel' into docs
      Merge pull request #428 from 9prady9/colorspace
      Fixing path of arrayfire/assets
      Build docs when docs is enabled and "make all" is used
      Merge pull request #430 from umar456/devel
      FEAT: Adding lookup
      Adding new instantiations for reductions
      STYLE: Making the function "where" more explicit in C API
      Changing the dimension checks for index in C API
      EXAMPLES: All machine learning examples now compile
      BUGFIX: in ArrayIndex aka lookup for CUDA backend
      Merge pull request #432 from 9prady9/conv_changes
      BUGFIX, EXAMPLE, Fixing a mistake in mnist_common
      Merge pull request #433 from 9prady9/ocl_fix
      Adding deep belief net example to ArrayFire
      Changing neural network example to use batches and epochs
      Merge pull request #441 from 9prady9/lookup_fixes
      EXAMPLE: Cleaning up DBN and ANN examples
      Adding new functions matmulNT, matmulTN, matmulTT
      Cleaning up DBN example to use new matmul functions
      Adding RBM example for ArrayFire
      Merge pull request #449 from shehzan10/devel
      PERF: Break large JIT trees into smaller nodes
      Fixing test names in complex.cpp
      Merge pull request #456 from shehzan10/devel
      STYLE: Changing cast operations in all backends
      FEAT: filter in convolutions is cast to the accum type
      BUILD: Adding /usr/local/include and /usr/include to FindOpenCL
      Merge branch 'gtest-ninja' into devel
      Merge pull request #472 from umar456/clang
      Renaming logit to logistic_regression
      BUGFIX: corrected the dimensions passed to gemv for transpose(A)
      BUGFIX: var and stdev now use the getFNSD from common.hpp
      EXAMPLE: Cleaning up rbm example
      Example: Naive bayes example now uses prior probabilities
      Merge pull request #464 from arrayfire/devel
      Merge pull request #475 from bkloppenborg/cmake_install
      BUILD: Changes to suppress warnings in tests
      Example: clean up logistic regression
      Example: Adding comments to naive bayes
      Example: Adding new example to demo perceptron
      PERF: Making the isLinear() to only look upto ndims()
      PERF: Perform an async copy when data is linear
      Merge pull request #491 from pentschev/example_harris
      Removing OPENCL_LIBRARIES from CLBLAS_LIBRARIES in FindCLBLAS.cmake
      Merge pull request #494 from bkloppenborg/cmake_packaging
      Linear indexing now flattens the arrays before the operation
      Changing the layout of the documentation
      cleaning up the groups structure
      Merge pull request #503 from shehzan10/devel
      Removing empty file reduce.h
      Minor tweaks to blas documentation
      Added documentation for reductions
      Adding doxygen briefs for image processing functions
      Function groups organized
      Adding the remaining documentation for functions in algorithm.h
      Added documentation for part of arith.h
      Merge pull request #507 from glehmann/assets-submodule-msg
      DOCS: documentation for statistics.h
      DOCS: Adding brief descriptions for all documented functions
      DOCS: Adding documentation for remaining functions in image.h
      DOCS: Remove src/api/c from header path
      DOCS: Fixing code in getting_started
      DOCS: Fixing the formatting in image.h
      DOCS: Adding documentation for all functions in arith.h
      Merge pull request #510 from 9prady9/signal_docs
      DOCS: Fixing warnings
      DOCS: Adding examples tab to the generated documentation
      DOCS: Adding documentation for device.h and array.h
      DOCS: Adding documentation for manip_mat in index.h
      DOCS: Adding documentation for data.h
      DOCS: Fixing documentation errors for arith functions
      DOCS: Adding documentation for arith and logical operators in array.h
      DOCS: Adding documentation for indexing operations
      DOCS: Fixing links in the documentation landing page
      DOCS: Adding download links for arrayfire
      Merge pull request #512 from arrayfire/devel
      Merge pull request #517 from shehzan10/devel
      Merge branch 'general_index' of https://github.com/9prady9/arrayfire into general_index
      Merge pull request #521 from mcarilli/randufix
      EXAMPLE: Cleaning up machine learning examples
      EXAMPLE: Added softmax regression
      FEAT: Changing indexing to the generalized APIs
      TEST: Adding test for a(idx) = b where idx is array
      FEAT: array::isbool() is now implemented
      BUGFIX: Fixed bug in seq to array casting inside gfor
      STYLE: Cleaning up rainfall example
      BUGFIX: Logical operations now return b8 instead of u8
      BUGFIX: Making sure the array index is locked
      TEST: Test for logical assignment
      TEST: added for scoped out indexing with arrays
      Merge pull request #523 from 9prady9/batch4gfor
      Merge pull request #534 from glehmann/lib-version
      STYLE: Removing AF_VERSION_MINOR output in af_info()
      Merge branch 'cmake-freeimage' of https://github.com/glehmann/arrayfire into devel
      Merge pull request #535 from umar456/coverall
      Merge pull request #536 from shehzan10/devel
      FEAT: Adding combinations of arrays and sequences for indexing
      TEST: Additional tests for gen_index
      TEST: added tests for gen_assign
      FEAT: Adding functions to expose memory info and garbage collection
      Add the id of the device the memory is allocated on
      BUGFIX: Decrement the used buffer and byte count only once
      BUGFIX: Binary operations with scalars create proper types
      BUGFIX: Cast logical and bitwise operators to the right type
      TEST: get the proper output type for binary operations
      Merge pull request #540 from 9prady9/conv_batch4gfor
      Merge pull request #543 from umar456/rm_del
      BUGFIX: for accum along non-first dimension
      Retiring warps early for accum along first dimension
      Merge pull request #547 from umar456/cppcheck
      PERF: Minor improvements to accum in CUDA
      BUGFIX: in accum for OpenCL backend
      Merge pull request #551 from arrayfire/revert-550-devel
      Revert "Revert "download and build the opencl dependencies with cmake""
      Merge branch 'devel' of https://github.com/shehzan10/arrayfire into devel
      BUILD: Making sure boost compute path is included properly
      Merge pull request #555 from umar456/multigpublas
      Merge pull request #559 from 9prady9/fft_fixes
      BUGFIX: The 3rd and 4th dimensions offsets were flipped for identity
      TEST: enabling tests for 3D identity
      Merge branch 'umar456-identity' into devel
      BUGFIX: Maintaining the proper data type in binary functions
      Merge pull request #564 from pgovind/identity_fix
      Call garbageCollect() when cufft plan creation fails and try again
      Call garbageCollect() when CLFFT plan creation fails and try again
      Renaming cuda/fft.cu to cuda/fft.cpp
      Removing trailing whitespaces
      Merge pull request #569 from pentschev/err_cufft
      Merge pull request #570 from umar456/reduce_refactor
      Changes to fix warnings on gcc
      Style changes in array.cpp
      BUGFIX: Assignment inside GFOR with start and end points
      BUGFIX: batched mode assignment inside GFOR
      TEST: Adding tests for GFOR
      Style fixes in src/api/c/assign.cpp
      Merge branch 'arrayfire/devel' into lin_algebra
      Changes to remove compiler warnings
      TEST: Adding tests for solve
      Cleaning up error handling for cublas functions
      Merge pull request #575 from umar456/xcode
      Cleaning up cublas and cusolve managers
      Fixing memory leaks for linear algebra in cuda backend
      BUGFIX: QR decomposition for CUDA backend when M >= N
      FEAT: Adding solve for CUDA backend
      FFTWL libraries are not required
      Moving FindMKL to FindCBLAS.cmake
      Merge pull request #577 from umar456/rel_dir
      Merge pull request #578 from pentschev/err_clfft
      Helper functions in OpenCL to create af::array from cl_mem
      Removing unnecessary cl_device_id check
      Merge pull request #580 from umar456/assign64
      Adding support to retain and release cl mem objects
      Adding an optional retain parameter for getContext and getQueue
      Merge pull request #582 from munnybearz/threading
      Merge pull request #587 from pentschev/fftconvolve
      STYLE: Removing switch case from FFT
      STYLE: Cleaning up the FFT code in CPU backend
      STYLE: cleaning up fft_common in CUDA backend
      Fixing a warning when compiling cufft
      STYLE: Fixing fft_common in OpenCL backend
      STYLE: cleanup computeDims and computePaddedDims in all backends
      Merge pull request #593 from shehzan10/join
      Merge pull request #602 from umar456/diagonal
      Merge pull request #599 from pentschev/fftconvolve_cuda_cache
      FEAT: Adding multi-dimensional batch support for fft
      TEST: Updating the fft tests with batch mode support
      Merge remote-tracking branch 'upstream/devel' into fft_fixes
      BUILD: Fixing uncaught merge conflicts
      BUGFIX: Median should now use floating point numbers for outputs
      BUGFIX: Median was using the improper count for number of elements
      TEST: Added tests to cover all cases of median
      BUGFIX: Changing the behavior of indexing to reflect old arrayfire
      BUGFIX: Assignment of scalars to values now uses the storage type
      BUGFIX: Removing the evaluations before they are done
      BUGFIX: Fixed assign when both out and rhs are vectors
      TEST: Adding tests for assignment when special vector cases are involved
      TEST: Adding tests for indexing when inputs are vectors
      Moving dim4 to the common backend area
      BUGFIX: calculating ndims from dim4 properly when elements() == 0
      TEST: Re-writing the random test properly after earlier bugfix
      BUGFIX: Adding proper dim asserts for all data creation functions
      STYLE: Making sure af_destroy_array is called on non empty array
      Added proper error checking for special indexing cases
      STYLE: Moving dim asserts in data creation functions to a single place
      Copying FindLAPACK from cmake distribution to CMakeModules
      Merge pull request #612 from umar456/array_attrib
      Merge pull request #618 from pentschev/add_missing_opencl_checks
      Importing the first set of clmagma functions
      FEAT: LU decomposition for OpenCL backend
      Merge remote-tracking branch 'upstream/devel' into lin_algebra
      Merge pull request #619 from umar456/index_refactor
      BUGFIX: Making sure the lu_split kernel is working properly
      BUGFIX: Bugfixes in the magma code for getrf
      BUGFIX: input and output were flipped in magma/transpose.cpp
      BUGFIX: Fixing out of bound accesses in createPivot when M > N
      Merge pull request #620 from pentschev/remove_exec_permissions
      Merge remote-tracking branch 'upstream/devel' into lin_algebra
      Adding CLBLAS_CHECK to clblas.cpp and magma_blas.h
      Making sure initBlas is called before calling magma functions
      BUGFIX: Call the clblas routines only for non zero dimension lengths
      FEAT: real and imag now work with non complex numbers
      TEST: Adding extended tests for LU decomposition
      Merge pull request #623 from pentschev/fftconvolve_tests_large
      Adding magma files necessary for cholesky decomposition
      FEAT: Adding cholesky for OpenCL
      Removing unused variable from CPU and CUDA cholesky functions
      STYLE: Reduce redundant code in LU for all backends
      STYLE: Reduce redundant code in cholesky for all backends
      STYLE: Reuse qr_inplace inside qr for CPU backend
      Merge pull request #624 from pentschev/fftconvolve_opencl_cache
      Adding triangle matrix extraction to CUDA
      Adding triangle matrix extracting to CPU backend
      Adding triangle matrix extraction to OpenCL backend
      BUGFIX: Extracting lower triangle now works as expected in all backends
      Out of place cholesky now returns triangular matrix for all backends
      TEST: Adding tests for cholesky for large matrix sizes
      Merge remote-tracking branch 'upstream/devel' into lin_algebra
      Merge pull request #628 from umar456/cpp_idioms
      Merge pull request #629 from pentschev/fix_ml_examples
      Remove the need for volatile memory by always using __syncthreads();
      Adding shuffle instructions for __CUDA_ARCH__ > 300 in reduce
      Removing volatile memory and race conditions by adding ireduce
      Removing warnings from test/assign.cpp
      Merge pull request #635 from umar456/index_fix
      Merge pull request #639 from pentschev/fix_scan_osx
      Merge pull request #641 from pentschev/opencl_osx_fixes
      Adding magma files required for QR decomposition
      FEAT: Adding QR decomposition for OpenCL backend
      Updating err_clblas
      TEST: Adding tests for QR decomposition
      TEST: Fixing tests in LU decomposition
      FEAT: Solve for square systems
      Merge pull request #640 from umar456/proxy
      Merge pull request #647 from umar456/simplify_idx
      Merge pull request #648 from bkloppenborg/install_docs
      BUGFIX / STYLE: Cleaning up of indexing code to fix minor bugs
      STYLE: Minor style changes to indexing structs and classes
      Merge pull request #651 from umar456/lookup
      Merge pull request #652 from pentschev/increase_fft_coverage
      FEAT: Solve for non square systems in OpenCL
      Minor style changes
      FEAT: inverse for opencl backend
      STYLE changes in CPU and CUDA inverse functions
      STYLE minor changes to cuda/solve.cu
      Fixing a minor bug in swapdblk
      TEST: Updated solve tests
      BUGFIX: memory leak in opencl linear algebra routine
      TEST: Added tests for inverse
      Merge pull request #654 from umar456/idx_errchk
      FEATURE / STYLE: Changes to af::exception. Added af_err_to_string
      Moving non index functions from index.h to data.h
      af_print no longer uses af_reorder
      BUGFIX: Fixing a minor bug in CUDA solve
      TEST: Changes to dense linear algebra tests
      BUGFIX: Reduce the local memory usage for transpose_inplace in OpenCL
      Removing default option from C API
      Fixing error checks in cholesky and solve
      FEAT: Adding lower and upper for all backends
      BUGFIX: base_type of intl and uintl is now fixed
      TEST: tests for lower and upper triangle matrices
      Removing new and delete from cuda and opencl linear algebra functions
      *Inplace --> *InPlace
      Removing commented out test
      Random number generators now have unified states for all types
      Merge pull request #659 from bkloppenborg/usage_instructions
      FEAT: Adding setSeed and getSeed for all backends
      TEST: for setSeed and getSeed
      BUGFIX: af_print now prints 1D arrays properly
      Merge remote-tracking branch 'upstream/devel' into minor_fixes
      Merge pull request #660 from shehzan10/devel
      TEST: Updating random test to make sure there are no clashes
      Merge pull request #666 from FilipeMaia/fix_get_last_error
      Merge pull request #667 from shehzan10/devel
      Merge pull request #669 from FilipeMaia/fix_non_exported_symbols
      BUGFIX in random number generation for multiple GPUS in CUDA backend
      Exporting the symbols from dim4.hpp
      Instantiating array::unlock, array_proxy::unlock()
      Fixing error handling in seqToDims
      Changing af_blas_transpose to af_transpose_t
      Changing enums in af_pad_type
      FEAT: 2D spatial convolution now supported until 17x17
      Making the expand parameter an enum for convolutions
      Making all the inputs consts
      SOLVE, MATMUL and INVERSE now use af_mat_prop
      Adding argument checks for matmul, dot, inverse and solve
      FEAT: convolve automatically switches to frequency domain when necessary
      Fixing compile warnings in tests
      FEAT: Chaining matrix multiplications
      FEAT: IIR and FIR for all backends
      Revert "BUGFIX: af_print now prints 1D arrays properly"
      Revert "af_print no longer uses af_reorder"
      BUGFIXES: fixing batch mode in IIR filter for all backends
      Merge pull request #675 from FilipeMaia/where_64bit_int_support
      FEAT: Add complex support for fftconvolve
      TEST: Adding complex tests for fftconvolve
      TEST: Initial tests for IIR filter
      FEAT: Add short circuit code for iir when only a0 is available
      TEST: Adding tests for iir when only a0 is available
      BUGFIX in iir for all backends
      TEST: Adding more tests for iir filter
      Merge pull request #679 from arrayfire/var_tests
      Merge remote-tracking branch 'upstream/devel' into features
      Fix warnings in test/var.cpp
      BUGFIX: Fixed random number generation for CPU backend
      Merge pull request #683 from pentschev/add_log2_support
      STYLE: Changing dim_type to dim_t
      Merge pull request #686 from shehzan10/devel
      Merge pull request #689 from munnybearz/memleak
      API: dim_t is now a signed type
      BUILD: Making the default build type to be Release
      BUGFIX: IIR now uses lower local memory in opencl backend
      Updating licenses and copyright
      Updating the gtest submodule commit
      BUGFIX: fixed memleak in indexing
      Merge pull request #695 from shehzan10/devel
      STYLE: af_err enums now have hard coded values
      TEST: Check if output type is double in var tests
      FEAT,TEST: Adding 64 bit int support and tests for mean
      FEAT: Adding 64 bit support for stdev, corrcoef, covariance
      Merge pull request #699 from munnybearz/indexmemleak
      STYLE: changing AF_ERR_NOMEM to AF_ERR_NO_MEM
      STYLE: Fixing the way debug info is displayed when building opencl programs
      Added comments to differentiate af_err enum ranges
      Merge branch 'gs_tests' of https://github.com/umar456/arrayfire into devel
      STYLE: removing templates from gen_indexing
      Changing the geqrf to geqrf3 to reflect function name from magma
      Merge pull request #701 from bkloppenborg/example_docs
      FEAT/TEST: lower and upper support making the diagonal == 1
      FEAT: Adding QR in place for OpenCL backend
      Changing the API of linear algebra functions
      TEST: Adding tests for luInPlace, qrInPlace, choleskyInPlace
      Fixing a minor style issue for solve in CPU backend
      Merge branch 'devel' of https://github.com/shehzan10/arrayfire into lapack
      Fixing matrix_manipulation test
      First draft of lapack documentation
      DOC: Updated lapack documentation with code snippets
      DOC: cleaning up af/opencl.h documentation
      DOC: Adding documentation for missing parameters
      API: Removing domain parameter from separable convolution
      Merge branch 'devel' of https://github.com/shehzan10/arrayfire into lapack
      Merge pull request #706 from umar456/rowcol
      Merge pull request #705 from shehzan10/devel
      Removing redundant enum: af_source_t was just a clone of af_source
      Marking arrayfire constants to be externs
      Adding documentation for enums in af/defines
      Moving unsanitized C++ API to be inside __cplusplus checks
      API: changing (af_pad_type, padType) --> (af_border_type, borderType)
      Fixing compile warnings
      DOC: Adding documentation for IIR and FIR
      Merge pull request #710 from pentschev/hamming
      Moving af_features to be an internal structure
      BUILD fix for test/hamming.cpp
      Functions that don't support GFOR / batch mode now return errors
      Changing the messages af::exception generates
      BUGFIX: Fix for assign with linear indexing
      TEST: Adding tests for linear assign and index
      BUGFIX: Fix GFOR support for convolve in CPU backend
      BUGFIX: Convolve in OpenCL now work when filters are sub arrays
      TEST: Adding GFOR tests for convolve
      BUGFIX/TEST: medfilt in CPU backend now works inside GFOR
      BUGFIX/TEST: morph fixed for GFOR in CPU backend.
      BUGFIX/TEST: Fixing bug inside GFOR for bilateral in CPU backend
      BUGFIX/TEST: histogram inside GFOR fixed for CPU backend
      BUGFIX/TEST: fixed meanshift in GFOR for CPU backend
      BUGFIX/TEST: Fixed lower and upper in OpenCL backend for sub arrays
      TEST: GFOR tests for resize, diagonal and transpose
      Merge pull request #718 from umar456/api_update
      Inplace LU now returns pivot in lapack compliant format
      Fixing warning messages during compilation
      FEAT: solveLU has been added to all backends
      TEST: Adding tests for solveLU
      DOCS: Adding documentation details for solveLU
      Fixing the cuda error check in cuda/platform.cpp
      Minor fixes to solveLU documentation
      Adding getrs to backend/lapacke.cpp
      Merge pull request #725 from shehzan10/devel
      Merge pull request #724 from pentschev/hamming_fixes
      Merge pull request #723 from 9prady9/fixes
      Fixes for building on OSX
      Minor style changes
      FEAT: Adding support for triangle matrix solve
      TEST: Adding tests for solve
      TEST: Updating documentation for af::solve
      Merge remote-tracking branch 'upstream/devel' into solvers
      Removing unnecessary const from af_lu_inplace
      BUILD fixes for OSX
      BUGFIX: solve with upper triangular matrices fixed for NVIDIA gpus
      Merge pull request #732 from umar456/array_docs
      Merge pull request #734 from unbornchikken/pr1
      BUGFIX: anytrue / alltrue now return b8 instead of u8
      FEAT: Adding det and rank for all backends
      Updating documentation for rank and det
      Merge pull request #722 from ghisvail/enh/cmake-install-paths
      FEAT: Adding support for matrix norm in all backends
      FEAT: Add missing element wise functions
      Adding array.nonzeros()
      FEAT: Adding minfilt and maxfilt
      Adding "tests" for missing functions
      Fixing up documentation for image processing and lapack
      Merge pull request #737 from umar456/style_docs
      Change uint to unsigned for fixing builds on windows
      Merge pull request #741 from bkloppenborg/improve_docs
      array.device<T>() now hands complete control of the memory to user
      BUGFIX: cpu memory manager incorrectly calculating total_bytes
      FEAT: Adding functions to get and set memory chunk resolution
      TEST: Adding tests for memory manager
      Fixing code for af/opencl.h
      TEST: Updating memory test for indexing and assignment
      BUILD fix for OSX.
      Fixing fractal example
      BUGFIX: Fixing the reset methods in backend/CPU/TNJ/*
      Removing exceptions being thrown from clean up code
      Adding a specialization for array.device<cl_mem>() in af/opencl.h
      BUGFIX: element wise operations on sub arrays work now in CPU backend
      Merge pull request #748 from bkloppenborg/example_install_path
      Merge pull request #747 from shehzan10/devel
      Merge pull request #749 from 9prady9/ImageEditingExample
      Moving header files to fix build issues
      Merge pull request #751 from shehzan10/devel
      Merge pull request #752 from pentschev/fftconvolve_warnings
      Fixing issues with doxygen
      Changes to AF_VERSION*
      Fixing documentation to have pre-requisites
      Fixing path of the installed examples directory
      Removing LINK_INTERFACE_LIBRARIES and INTERFACE_LINK_LIBRARIES
      Merge pull request #756 from pentschev/doc_typos
      Merge pull request #754 from shehzan10/devel
      Merge pull request #755 from umar456/constant
      Merge pull request #753 from umar456/proxy_fix
      Fixing build failures for OpenCL on a few machines
      Merge branch 'dep_func' of https://github.com/umar456/arrayfire into devel
      Commenting out broken test
      BUILD: Adding INSTALLER_MODE flags for all backends
      Fixing documentation and release notes
      Fixing CPack.txt
      Fixing CPack.txt
      Merge pull request #773 from shehzan10/devel
      Merge pull request #770 from umar456/basic_c
      Merge pull request #774 from bkloppenborg/cmake_fix
      Merge pull request #777 from pentschev/vision_docs
      Merge pull request #778 from pentschev/matchtemplate_vision
      Merge branch 'fftconvolve_fix' of https://github.com/pentschev/arrayfire into hotfixes-3.0.1
      BUGFIX/TEST: fftConvolve now does multi dimensional batching properly
      BUGFIX/TEST: fftConvolve now does multi dimensional batching properly
      Merge pull request #789 from FilipeMaia/array_proxy_constructor
      Merge pull request #808 from shehzan10/hotfixes-3.0.1
      Merge pull request #807 from pentschev/ml_examples_fix
      Adding using_on_osx document
      Update release notes and version details
      Merge pull request #814 from arrayfire/hotfixes-3.0.1
      Merge pull request #817 from FilipeMaia/complex_dot_product
      Merge pull request #819 from FilipeMaia/missing_header
      Merge pull request #820 from shehzan10/CUDA-fixes
      Merge pull request #831 from ghisvail/hotfix/unnecessary-include
      Merge pull request #835 from 9prady9/dog
      Merge pull request #841 from shehzan10/gradient_fix
      Merge branch 'unwrap' of shehzan10/arrayfire into devel
      Merge pull request #842 from umar456/pod
      Merge pull request #843 from umar456/offset_check
      FEAT: Implementing array::lock() and array::unlock()
      DOCS: Adding documentation for constant
      Adding documentation for array::scalar
      DOCS: Updating arith functions to specify their input limitations
      FEAT / TEST: Adding af::copy()
      Display if a backend is enabled or not when building examples and tests
      DOCS: Fixing documentation for lock and unlock
      Merge pull request #852 from shehzan10/resize_lower
      Merge pull request #851 from shehzan10/linear_fix
      Merge pull request #849 from bkloppenborg/citation
      Merge pull request #855 from umar456/long_long
      Merge pull request #853 from shehzan10/nearestNeighbour
      DOCS: Updating release notes
      Merge pull request #857 from arrayfire/hotfixes-3.0.2
      Merge remote-tracking branch 'upstream/master' into devel
      Updating version number
      Merge branch 'master' into devel
      BUGFIX: Remove memory leak in af::copy()
      Updating forge tag to fix build issues with ninja
      FEAT: Adding function to get use_count of shared pointers
      PERF: Do not make copies if the number of references is only 1
      Renaming enums for convolve batch modes
      BUGFIX: Ignoring NaN values in min and max for all backends
      Merge pull request #866 from JuliaComputing/kf/forgerpath
      Merge pull request #867 from JuliaComputing/kf/blasrename
      Merge pull request #873 from shehzan10/more-lower
      Merge pull request #865 from JuliaComputing/kf/openblas64
      FEAT: Added support to substitute nan values for sum and product
      Adding missing instantiations for compat functions
      TEST: Adding tests for reductions when using NaNs
      BUGFIX: Fixing casting to and from complex numbers in CPU backend
      Adding more operator overloading for af::cfloat and af::cdouble
      BUGFIX: Making sure c32/c64 imitate f32/f64 when operating with scalars
      TEST: Add mixed type tests with complex inputs
      PERF: Improvements for non linear JIT kernels in OpenCL backend
      TEST: Adding batched mode tests
      PERF: Speeding up JIT for 3D arrays in OpenCL backends
      BUGFIX: median of all elements is now fixed
      PERF: Improvements to tile when tiling along singleton dimensions
      PERF: Improvements to CUDA JIT for non linear 3D and 4D arrays
      FEAT: Adding support for non overlapping batched convolution
      TEST: Adding tests for non overlapping convolves
      DOCS: Updating the documentation for convolution
      Adding missing license for a few files
      Updating version to 3.1.0
      Merge pull request #891 from 9prady9/assets_changes
      Merge pull request #892 from 9prady9/cmake_fixes
      Merge pull request #893 from shehzan10/devel
      BUGFIX: Check for NULL values when allocating memory on CPU backend
      Ensure CUDA and OpenCL return proper errors when out of memory
      TEST: Adding test to trash the memory manager and see if it recovers
      FEAT,TEST: Adding sigmoid function for all backends
      Adding the option to remove tests from ctest
      BUGFIX/TEST: Fixing bug in rank. Added appropriate tests
      BUGFIX/TEST: Fixing not for C API. Added relevant tests.
      BUGFIX: Fixing a bug in randn for CPU backend
      BUGFIX: Fixing setSeed for randu
      TEST: Updating and fixing the randu/randn tests
      Merge pull request #882 from klemmster/cusolver_svd
      TEST: Updating random tests to properly reset seeds
      TEST: Fixing out of bounds access in fft tests
      BUGFIX in randn for apple systems
      Renaming rank test to rank_dense
      Merge pull request #910 from shehzan10/devel
      Merge pull request #909 from bkloppenborg/autobuild-backends
      BUGFIX: Fixed issues with mixed real and complex types
      Fixing the checks for skew
      BUGFIX: conjg no longer errors out for real inputs
      af_scale now checks for default parameters properly
      Extended support for interleaved convolution
      Updating the COPYRIGHT.md document
      Updating the language in COPYRIGHT document
      Merge pull request #922 from 9prady9/match_template_example
      Merge pull request #921 from 9prady9/susan
      Moving af_array info gathering functions from util.h to array.h
      Making fft_inplace consistent across all backends
      FEAT / TEST: Adding support for inplace fft
      Moving general fft implementation to src/api/c
      Merge pull request #931 from 9prady9/sat
      Merge pull request #929 from 9prady9/cuda_default_stream
      Merge pull request #928 from pavanky/minor_changes
      Removing consts from the fftInplace API
      FEAT/TEST: Adding R2C and C2R FFT transforms for all backends
      DOCS: Adding documentation for real to complex transforms
      Fixing a minor issue in ArrayFireConfigVersion.cmake file
      Fixing issue in documentation
      Cleaning up multiply_inplace in cpu backend
      Enabling memory manager back in cuda backend
      Merge pull request #937 from 9prady9/wind_resize
      FEAT: Adding select for CPU backend
      FEAT: Select added for CUDA backend
      FEAT: Select for opencl backend
      FEAT: replace for all backends
      TEST: Adding tests for select and replace
      FEAT: adding complex support for exp
      Binary operations with floating point scalars default to single precision
      BUGFIX: Fixing offset issue with CPU element wise operations
      PERF: improvements to element wise operations in CPU backend
      Merge pull request #942 from 9prady9/ycbcr
      Merge branch 'devel' into svd
      Changes to style and fix compile errors
      Merge pull request #943 from shehzan10/stream
      Style changes to code in unwrap
      FEAT: Adding support to unwrap along rows as well as columns
      BUGFIX: Fixed a bug for unwrap in all backends
      FEAT/TEST/DOC: Adding wrap for CPU backend
      FEAT: wrap for CUDA backend
      Adding atomics.hpp file for CUDA that can be used in the future
      Moving the kernel cache map to a centralized location
      FEAT: wrap for OpenCL backend
      Removing faulty test
      Cleaning up unwrap code in OpenCL by using cache store
      Merge pull request #952 from 9prady9/cpuinfo
      Fixing the compile error on windows
      Cleaning up cpu blas / lapack in OpenCL backend
      Fixes to suppress annoying compiler warnings in OpenCL backend
      Adding functions from clMagma necessary for OpenCL SVD:
      Initial support for SVD in OpenCL backend
      Adding proper error checking in magma
      BUGFIX: in array_proxy::get() const
      Merge pull request #961 from 9prady9/cl_hpp_fixes
      Merge pull request #953 from umar456/fix_951
      Fixing svd params to reflect clmagma
      Merge pull request #962 from shehzan10/update-deps
      Fixing the output of af::info() for OpenCL backend
      Updating documentation and adding version guards for 3.1
      Merge pull request #948 from pentschev/sift
      Reorganizing non free build process.
      Changing build flags to build non free algorithms
      Templated options are now runtime compile options for opencl reductions
      Templated options are now runtime compile options for opencl convolutions
      Templated options are now runtime compile options for opencl indexed min/max
      Templated options are now runtime compile options for opencl scan
      Templated options are now runtime compile options for opencl nearest neighbor
      Removing unnecessary switch case from opencl ireduce
      Templated options are now runtime compile options for opencl FAST
      Splitting up opencl sort_by_key files to compile in parallel
      Splitting sort_by_key across too many files slows down compile times
      Fixing a bug introduced a couple of commits ago in OpenCL SIFT
      Merge branch 'devel' into nonfree_fixes
      Merge pull request #968 from shehzan10/opencl_fixes
      Merge remote-tracking branch 'upstream/devel' into svd
      Work around for issues in OpenCL svd
      API clean up and adding support for complex numbers for SVD
      Fixing various typos and bug fixes for SVD in CUDA and OpenCL
      TEST: for SVD
      DOCS: Updating the documentation for SVD
      Adding version guards for svd
      Adding more pragma directives to suppress GCC warnings
      TEST: updating SVD tests to contain all four floating point data types
      Fixing svd example to reflect the change in API
      Revert "Updated boost compute version tags"
      Compilation fixes for OSX
      Use xGESVD instead of xGESDD for ARM platforms
      Merge pull request #970 from shehzan10/rel_31
      FEAT: Adding support for linear assignment in C API
      Restore original shape after flattening input for linear indexing
      FEAT: Adding support for linear indexing in C API
      Merge pull request #974 from arrayfire/devel
      Merge pull request #979 from shehzan10/approx-batch
      BUGFIX: For calculating number of elements for a buffer in CUDA backend
      BUGFIX: For calculating number of elements for a buffer in OpenCL backend
      TEST: Adding tests for indexed reductions
      Renaming src/api/hapi to src/api/unified
      Fixing CMakeFiles for unified backend
      Changes required to make unified library build the cpp bindings
      Changes to examples and test CMakeLists to build *_unified binaries
      Merge pull request #990 from marbre/hotfixes-3.1.1-cblas
      Merge pull request #991 from marbre/hotfixes-3.1.1-lapacke
      Changes to Heston model to remove c++11 dependencies
      BUGFIX: seq.begin can now use negative offsets just like seq.end
      Updating Release notes for 3.1.1
      Merge pull request #977 from arrayfire/heterogeneous_api
      BUG: Fixing seq when passing af::end to af::seq
      Merge pull request #997 from 9prady9/gfx_examples_fixes
      Updating release notes
      Merge pull request #998 from arrayfire/hotfixes-3.1.1
      Merge branch 'arrayfire/master' into 'arrayfire/devel'
      Merge pull request #1001 from arrayfire/hotfixes-3.1.2
      BUGFIX: Fix indexed reductions with complex types in OpenCL backend
      BUGFIX: Fix kernel name generation in ireduce for OpenCL backend
      BUGFIX: Converting non-linear indices to linear indices in ireduce
      TEST: Adding tests for bugs in indexed reductions
      Merge pull request #1007 from shehzan10/unified_doc
      Reduction fixes for smaller arrays (<4096 elements)
      Merge pull request #1012 from 9prady9/histogram_fixes
      Merge pull request #1016 from 9prady9/cpuid_fixes
      Merge pull request #1017 from shehzan10/hotfixes-3.1.2
      Merge pull request #1014 from shehzan10/16bit
      Removing ARCH_32 and ARCH_64 flags
      Fixing missing symbol issues when freeimage is not found
      Merge pull request #1053 from shehzan10/cudalapack
      Adding missing offsets for various OpenCL kernels
      Merge pull request #1065 from umar456/swe
      Merge pull request #1070 from bkloppenborg/devel
      Merge pull request #1069 from umar456/devel
      Merge pull request #1064 from shehzan10/devel
      Merge pull request #1089 from shehzan10/devel2
      Merge pull request #1087 from shehzan10/devel
      Merge pull request #1097 from shehzan10/unified_checks
      Merge pull request #1099 from syurkevi/maniparr_docupdate
      Merge pull request #1100 from shehzan10/docs-3.2
      Merge pull request #1115 from shehzan10/hotfixes-3.2.1
      BUGFIX: GFOR assignment when other dimensions have step indices
      BUGFIX: Issue with vector indexing when using spans
      Do not perform copies in moddims if memory is contiguous
      TEST: Adding test for GFOR assign bug
      BUGFIX: Getting the device pointer performs memory copy when needed
      TEST: Adding tests to verify unnecessary copies aren't being done
      Compile fixes for older compilers
      Merge pull request #1137 from shehzan10/hotfixes-3.2.1
      Merge pull request #1125 from syurkevi/tutorials
      Merge pull request #1138 from shehzan10/hotfixes-3.2.1
      Merge pull request #1139 from arrayfire/hotfixes-3.2.1

Peter Andreas Entschev (270):
      Bug fix in OpenCL scan for Intel.
      Added regions API and CUDA backend.
      Added regions CPU backend as not supported.
      Added regions OpenCL backend as not supported.
      Added unit tests for regions.
      Improved regions for CUDA, faster on large regions.
      Merge branch 'master' into regions
      Added OpenCL implementation of regions.
      Added CPU implementation of regions.
      Fixed template on CUDA regions.
      Fixed limits of double type on CUDA backend.
      Minor improvements to CPU regions.
      Added enum for regions connectivity type.
      Fixed regions unit tests
      Added struct af_features to store image features (aka keypoints).
      Added features class to manage af_features structs.
      Added FAST feature detector frontend.
      Added FAST feature detector CPU backend.
      Added FAST feature detector CUDA backend.
      Added FAST feature detector OpenCL backend.
      Added handlers for array type in features class.
      Fixed failing abs() call for int/unsigned types on CUDA backend of FAST.
      Added test reader for image input with array output.
      Added FAST unit tests.
      Merge remote-tracking branch 'upstream/devel' into fast
      Fixed FAST files to comply with new directory structure.
      Fixed data filename on FAST unit test.
      Updating test/data submodule
      Fixed wrong memory type allocation on OpenCL backend of FAST.
      FAST will return (af_)features instead of (af_)features *
      Merge pull request #323 from pavanky/ocl
      Changed CUDA convolve to avoid issues with constant memory.
      Added ORB API.
      Added ORB CPU backend.
      Added ORB CUDA backend.
      Changed thread variable names of some OpenCL functions.
      Added ORB OpenCL backend.
      Added test helper to read feature/descriptor test data.
      Added ORB unit tests.
      Added missing STL algorithm include to CUDA math.hpp.
      Added pi definition to fix ORB on Windows.
      Added check before freeing Gaussian filters in ORB OpenCL backend.
      ORB to return empty arrays when no features exist.
      Merge pull request #355 from pavanky/index_fixes
      Added missing shared_ptr deleter in OpenCL backend.
      Added missing destructor for features class.
      Fixed FAST C++ API, added proper destructor calls.
      Fixed ORB C++ API to properly destroy af_features
      Fixed FAST memory leaks on CPU backend
      Fixed ORB memory leaks on CPU backend
      Fixed FAST memory leaks on CUDA backend
      Fixed ORB memory leaks on CUDA backend
      Fixed FAST memory leaks on OpenCL backend
      Fixed ORB memory leaks on OpenCL backend
      Added missing memory deletions on FAST unit test.
      Merge branch 'devel' into orb_fixes
      Passing argument as reference to features operator=
      Renamed feature.cpp to features.cpp to match class name
      Fixed FAST CUDA backend case when no features are found
      Fixed FAST CPU backend case when no features are found
      Added image blur argument to ORB API
      Added image blurring to ORB CPU backend
      Added image blurring to ORB CUDA backend
      Added image blurring to ORB OpenCL backend
      Added image blurring argument to ORB unit tests
      Improved ORB performance and memory usage on CUDA backend
      Improved FAST performance on CUDA backend
      Added argument to define length of edge discard in FAST.
      Changed the way FAST handles different datatypes internally
      Removed cudaMemset from FAST
      Added documentation for FAST
      Moved FAST description to docs directory.
      Fixed FAST edge assertions
      Added ORB documentation
      Updated test data
      Made FAST CPU results match CUDA results
      Made FAST OpenCL results match CUDA results
      Merge pull request #485 from pavanky/examples
      Added Harris corner detector example
      Fixed FAST type comparison mismatch warning
      Merge pull request #500 from bkloppenborg/cmake_packaging
      Merge pull request #504 from 9prady9/TemplateFunction
      Merge pull request #509 from pavanky/docs
      Added Hamming Distance API and CUDA backend
      Changed regions to accept b8 input only
      Fixed regions unit tests to use b8 as input
      Updated regions docs
      Merge pull request #563 from pavanky/binary_fixes
      Added CUFFT_CHECK() to check for cuFFT errors
      Changed CUDA FFT functions to use CUFFT_CHECK()
      Added fftconvolve() C API
      Added fftconvolve() C++ API
      Added fftconvolve() prototypes to signal.h
      Added documentation for fftconvolve()
      Added template parameter to fftconvolve() C API
      Added expand flag to fftconvolve()
      Added CLFFT_CHECK() to check for clFFT errors
      Merge branch 'devel' into fftconvolve
      Added fftconvolve() support for CUDA
      Added CUDA version check to err_cufft.hpp
      Added fftconvolve() support for OpenCL
      Added unit tests for fftconvolve()
      Added fftconvolve() support for CPU
      Merge remote-tracking branch 'upstream/devel' into fftconvolve
      Remove usage of max() in fftconvolve()
      Fixed bugs on fftconvolve()
      Fixed fftconvolve() bug for large input sizes
      Added fftconvolve() interface to fft() plan cache
      Updated OpenCL and CPU fftconvolve() templates
      Merge remote-tracking branch 'upstream/devel' into devel
      Removed unused template parameters from CUDA fftconvolve()
      Added u64 support to join()
      Merge remote-tracking branch 'upstream/devel' into add_join_u64
      Added s64 support to join()
      Moved large FFT unit tests to fft_large.cpp
      Added missing checks to OpenCL functions
      Removed execution permissions of source files
      Fixed size calculation of packed array in fftconvolve()
      Added large unit tests for fftconvolve()
      Fixed wrong fftconvolve() large unit tests
      Exposed fft_common() in OpenCL backend
      Integrated OpenCL fftconvolve() with fft() plan cacher
      Fixed wrong labels printing in ML examples
      Merge pull request #630 from 9prady9/win_fixes
      Fixed FAST on Mac OS X
      Merge pull request #631 from umar456/assign_tests
      Fixed build warnings
      Merge pull request #637 from shehzan10/compute_update
      Fixed scan for OS X.
      Added missing calls to OPENCL_DEBUG_FINISH()
      Added missing OpenCL exception handling
      Fixed assign()/index() on OSX
      Added missing POST_LAUNCH_CHECK() calls on CUDA
      Moved conv_image() to testHelpers.hpp
      Fixed meanshift() unit tests for double data type
      Merge pull request #643 from shehzan10/af_write
      Increase resize() test coverage
      Removed unnecessary checks in resize()
      Added new test helper for output only functions
      Added tests for gaussiankernel()
      Removed test helper for output only functions
      Changed how gaussiankernel() size is handled in tests
      Updated test data
      Increased fft() test coverage
      Fixed fft() tests to properly call dft() wrapper
      Fixed af_mul_t OpenCL operation
      Added log2() support
      Added missing copyright headers
      Merge pull request #685 from ghisvail/typo/copyrights-af-url
      Merge remote-tracking branch 'origin/hamming_distance' into devel
      Changed dim_type to dim_t in hamming_matcher
      Added CPU backend for hamming_matcher()
      Added OpenCL backend for hamming_matcher()
      Make proper use of local memory for OpenCL hamming_matcher()
      Fixed race condition in CUDA hamming_matcher()
      Reusing test condition in OpenCL hamming_matcher() kernel
      Added missing syncs to hamming_matcher()
      Added unit tests for hamming_matcher()
      Updated test data
      Added Hamming matcher documentation
      Moved computer vision functions to vision.h
      Prevent CUDA hamming_matcher from allocating additional device memory
      Added more syncs to hamming_matcher()
      Changed C++ hamming_matcher() to hammingMatcher()
      Added C++ unit test for hammingMatcher()
      Prevent OpenCL hamming_matcher from allocating additional device memory
      Cleaned up some hamming_matcher code
      Fixed memory leak in unit test helper's conv_image()
      Fixed OpenCL hamming_matcher() local memory size query
      Fixed hamming_matcher() OpenCL kernel for Intel devices
      Changed CUDA hamming_matcher switch case to if condition
      Added FAST feature detector example
      Changed exception handling in examples to stderr.
      Fixed CUDA fftconvolve warning
      Added missing return statement to OpenCL getQueue()
      Fixed several docs typos and confusing sentences
      Moved CV documentation to vision.dox
      Added missing Hamming matcher documentation
      Moved matchTemplate definition to vision.h
      Merge remote-tracking branch 'upstream/hotfixes-3.0.1' into vision_docs
      Fixed matchTemplate includes
      Fixed fftconvolve() bug, resulting in wrong output
      Fixed wrong indexing in some machine learning examples
      Added C API for Harris corner detector
      Added C++ API for Harris corner detector
      Added CPU backend for Harris corner detector
      Added CUDA backend for Harris corner detector
      Added OpenCL backend for Harris corner detector
      Added Harris corner detector unit tests
      Added documentation and function definitions for Harris
      Fixed FAST unit tests
      Updated data
      Removed unused variables from OpenCL's sort_index
      Merge remote-tracking branch 'upstream/devel' into harris
      Fixed bug affecting Harris on AMD GPUs
      Added SIFT prototypes and parameter documentation
      Added C API for SIFT
      Added C++ API for SIFT
      Added CUDA backend for SIFT
      Added OpenCL backend for SIFT
      Added CPU backend for SIFT
      Added AF_ERR_NONFREE to defines.h
      Added BUILD_NONFREE option to CMake
      Added SIFT Copyright information
      Added OpenSIFT License
      Added SIFT documentation
      Updated test data
      Added SIFT unit tests
      Made SIFT image indexing more readable in CPU backend
      SIFT fix for CUDA on Windows, made it more readable
      Made SIFT image indexing more readable in OpenCL backend
      Templated SIFT gaussianElimination() in CPU and CUDA backends
      Added missing CUDA_LAUNCH and THRUST_SELECT to SIFT
      Improved CUDA SIFT coalescing and performance
      Improved OpenCL SIFT coalescing and performance
      Passing shared size memory dynamically to CUDA SIFT
      Moved OpenCL's conv2Helper to kernel directory
      Improved SIFT OpenCL code
      Using pre-defined constants for workgroup sizes in CUDA SIFT
      Using 3D arrays for Gaussian/DoG pyramids in CUDA SIFT
      Using cudaMemsetAsync for SIFT
      Fixed OpenCL SIFT bug causing segmentation faults on Intel
      Added missing buffer freeing call to OpenCL SIFT
      Fixed CUDA SIFT on unused memory buffer
      Changed SIFT unit test to use std::stable_sort()
      Moved syncthreads/barriers out of thread conditionals
      Moving CUDA SIFT syncthreads calls out of thread conditionals
      Improved SIFT descriptor scaling
      Update test data
      Fixed SIFT on CPU backend when double_input is false
      Fixed several memory leaks in CUDA and OpenCL SIFT
      Fixed min/max values of sigma in SIFT scale levels
      Added several SIFT unit tests
      Merge remote-tracking branch 'upstream/hotfixes-3.1.3' into sift_tests
      Updated test data
      Merge pull request #1048 from shehzan10/hotfixes-3.1.3
      Added GLOH function prototypes
      Added C API for GLOH
      Added C++ API for GLOH
      Added CPU implementation of GLOH
      Added CUDA implementation of GLOH
      Added OpenCL implementation of GLOH
      Added GLOH documentation
      Added GLOH unit tests
      Merge branch 'devel' into gloh
      Added missing 'AFAPI' to C++ API
      Added unified API for GLOH
      Updated test data
      Changed std::sort to std::stable_sort in CPU SIFT
      Updated SIFT/GLOH test thresholds
      Merge remote-tracking branch 'upstream/devel' into sift_fixes
      Fixed out-of-bounds memory access in CUDA/OpenCL SIFT
      Added homography function prototype and API
      Added CPU backend for homography
      Added CUDA backend for homography
      Added OpenCL backend for homography
      Added homography documentation
      Added homography unit tests
      Updated test data
      Merge remote-tracking branch 'upstream/devel' into homography
      Fixed homography for Intel OpenCL
      Disabled homography LMedS unit tests
      Split vision.h prototypes into multiple lines
      Fixed __syncthreads() calls in homography
      Added AF_HOMOGRAPHY prefix to af_homography_t enum
      Fixed homography documentation
      Removed unnecessary __syncthreads() on homography
      Removed unnecessary barrier from homography
      Fixed and improved OpenCL's homography
      Fixed and improved CUDA's homography

Pradeep (281):
      af_transpose and corresponding unit tests
      Style changes in transpose
      Invalid arguments unit test for transpose
      BUG Fix: af_print in CUDA backend was directly using device pointer.
      CUDA backend transpose implementation
      Changes to transpose kernel
      changes to include all cl kernels in build
      type fixes for opencl
      af_print implementation for opencl backend
      opencl buffer read/write fixes in Array
      Added traits specialization for size_t
      opencl backend implementation for af_transpose
      Macro fix in transpose opencl backend
      Reverted dim_type to long long
      af_histogram cpu backend implementation
      Changed readTests helper function to accept multiple input arrays
      cpu implementations for af_[erode|dilate] and af_[erode3d|dilate3d]
      Added readImageTests and compareArraysRMSD helpers for unit tests
      af_bilateral API and cpu backend implementation
      Adding missing namespace qualifiers
      exp equation modification in bilateral cpu backend
      BugFix: type issue fix in compareArraysRMSD
      Disable unit test for int type in bilateral
      morph cuda backend
      cuda backend implementation for [af_erode3d|af_dilate3d]
      erode/dilate unit tests using images
      morph cuda kernel optimizations
      opencl morph implementation
      opencl backend for volumetric morphological ops
      cuda backend bilateral
      bilateral opencl backend implementation
      morph kernel optimization for supported window sizes
      histogram cuda backend
      histogram opencl backend
      Bug Fix in histogram cuda kernel
      min call in histogram kernel was ambiguous for intel compilers
      opencl device selection feature
      Replaced member funcs with friend funcs in opencl::DeviceManager
      added opencl kernel caching for transpose
      Removed cl.hpp from af/opencl.h
      Modified transpose tests to run for all devices for opencl backend
      enabled ocl kernel caching in transform
      enabled ocl kernel caching for all existing functions
      style changes in ocl transpose
      unify kernel params changes for opencl morph
      Renamed CL_FINISH to CL_DEBUG_FINISH
      unify kernel param changes to opencl bilateral
      unify kernel param changes to opencl histogram
      corrected typo in cpu_err header
      Proper error handling added to transpose
      Proper error handling added to erode/dilate
      added error handling for bilateral
      added error handling for histogram
      median filter cpu backend and cuda/opencl placeholders
      modified symmetric pad equation in medfilt cpu backend
      median filter implementation in cuda backend
      median filter opencl backend implementation
      fft/ifft functions in cpu backend
      fft framework changes
      fft/ifft cuda backend
      fft/ifft opencl backend
      meanshift API and cpu backend
      meanshift cuda backend implementation
      meanshift opencl backend
      createPaddedArray optimizations for cuda and opencl backend
      BUGFIX: copy kernel
      convolve cpu backend
      convolve cuda backend
      convolve opencl backend
      renamed ConvolveBatchKind variables
      Changed output array type for bilateral function
      subscript assignment feature for cpu, cuda and opencl backends
      cmake changes for opencl backend
      C++ wrappers for functions, includes a bugfix as well
      C++ wrappers for image and indexing functions
      Merge branch 'cpp' of ssh://mule/area51/arrayfire into cpp
      Bugfix: corrected array handle check in destructor
      Added index support
      Adding assign operator overloading in CPP
      Merge branch 'origin/cpp' to cpp
      Fixing copy assignment operator
      convenience member functions for array indexing
      changed separable convolve cpp API
      changed gradient cpp API
      additional unit tests for cpp wrapper
      Bugfix: af_assign
      Moved cpp wrapper functions to appropriate files
      regions cpp wrapper
      cpp wrapper unit tests
      Merge fft & convolve headers
      convolve API changes
      Added new cpp wrappers for ffts
      windows fixes for cuda backend
      windows fixes for opencl backend
      fix for google test build command
      Visual Studio File Grouping for Projects
      windows and *nix OS compatibility fixes
      windows fixes for cpu backend
      boost compute fixes for windows, had to undef min and max macros
      Merge branch 'master' into win_fixes
      undef min,max macros before boost/compute headers inclusion
      Commenting out cpu blas functions temporarily on windows
      Additional fixes in cpu backend for windows platform
      Additional cpp convenience functions for moddims
      added compatibility APIs
      Added NOMINMAX definition for windows platform
      Removing PIC compiler flag for windows platform
      Removing undef min, max as NOMINMAX is added for windows
      Merge remote-tracking branch 'origin/master' into ocl_win_fixes
      Corrected gtest library path for debug mode
      Added missing template specializations for copy
      Corrected visual studio link libraries for test build process
      Merge remote-tracking branch 'upstream/devel' into ocl_win_fixes
      Updated template specializations for copy in cuda/cpu backends
      Merge remote-tracking branch 'origin/ocl_win_fixes' into devel
      add formatting to array print functions
      Windows compatibility changes for BLAS on cpu backend
      Merge branch 'devel' into win_cblas_fixes
      windows compatibility fixes
      style changes in cpu blas functions
      typo corrections
      changed dim_type typedef to int from long long
      Fix for copy function in cpu backend
      indexing unit tests for 3d and 4d arrays
      bugfix for cuda on windows
      changing variables names in reduce kernel for cuda backend
      cmake changes for windows MSVC Projects
      correcting test data commit number
      correcting test data commit number
      Changed setContext function scope
      added isDoubleSupported func for opencl backend
      Added double precision checks in opencl
      function to check double precision availability
      handle double precision in opencl transpose
      adding missing header in testHelper hpp
      mean function
      added getDevice internal function for opencl
      modified buildProgram opencl helper function
      moved ocl kernel resources from stack to heap
      Moved cl_khr_fp64 extension
      Removed cl_khr_fp64 from individual cl files
      opencl device sorting
      Merge branch 'devel' into statistics
      2d convolve performance improvements
      feature: indexing array using array
      cuda backend for indexing array using array
      opencl backend for indexing array using array
      cpp wrapper for array based index
      Removed indices size check in array-index
      using 0 as default for dim to array index cpp wrapper
      bugfix: fixes complex types for mean on cuda/opencl backend
      Merge branch 'devel' into statistics
      Merge branch 'devel' into statistics
      feature: match template
      cpp wrapper for match template
      Corrected typo in median filter opencl kernel wrapper
      bugfix: match template cpp unit test
      Moved match template c api to apt location
      changed shared mem access pattern for conv3d
      Removed long long numeric qualifier for constants
      perf: minor performance improvements for bilateral
      Removed an obsolete condition in af_assign
      Merge branch 'devel' into array_idx
      perffix: 3d separable convolve
      Merge branch 'devel' into statistics
      feature: af_sobel_dxdy
      af_sobel_dxdy CUDA backend
      af_sobel_dxdy OpenCL backend
      cpp wrapper for sobel derivatives
      Changed c api for sobel operator
      Merge branch 'devel' into statistics
      Corrected test data hash tag
      Multiple func definition fix for arith operations: mod and rem
      BUGFIX: added same complex type cast noop
      FEATURE: convenience functions for weighted mean
      FEATURE: variance
      FEATURE: standard deviation
      BUGFIX: added static qualifier for helper arithmetic functions
      FEATURE: RGB to GRAY and vice versa color space conversion
      FEATURE: covariance
      Code cleanup for mean, var, stdev
      FEATURE: median function
      FEATURE: correlation coefficient function
      BUGFIX: corrected scalar constant typo in median
      type correction in median removes warnings
      Merge branch 'devel' into statistics
      Code cleanup mean, median, stdev
      BUGFIX: windows fix for division helper function
      BUGFIX: fixed multiple definition error for unaryName function
      FEATURE: histogram equalization for images
      BUGFIX: increased filter/mask length for convolve kernels
      BUGFIX: modified default normalization factor
      PERFFIX: convolution perf improved by 2-4%
      PERFFIX: improved 2d convolve perf in cuda by 33%
      Renamed separable conv cuda kernel file
      Merge branch 'devel' into perf_conv
      PERFFIX: improved opencl 2d convolution performance by 4%
      modified expand param to default to false for convolution
      BUGFIX: 2d separable convolution
      FEATURE: hsv to rgb and vice versa conversion functions
      FEATURE: colorspace function
      Reduced convolution compilation time
      BUGFIX: added type check for tests on opencl backend
      Adding copyright to examples
      namespace fix in machine learning examples
      Renamed af_array_index backend files to match new name af_lookup
      Documentation for colorspace conversion functions
      Documentation for histogram & histequal
      Moved repeat function docs content to common location for image.h
      Reuse unit tests to write documentation examples
      Removed duplicate lines in mean & var tests
      BUGFIX: fix in af_mean_all for cdouble type
      Removed USE_SYSTEM_GTEST cmake option
      FEATURE: generalized indexing function
      Renamed af_assign to af_assign_seq
      BUGFIX: corrected conv2 filter length constant
      FEATURE: af_assign_gen
      af_assign_gen cuda backend implementation
      af_assign_gen opencl backend implementation
      Merge branch 'devel' into general_index
      Documented code related to 'How to add function to ArrayFire' wiki
      Style and typo corrections in exampleFunction
      Regions documentation and code example
      Renamed image processing titles for morph & filters subgroups
      Documentation for gaussian kernel functions
      Documentation for Sobel Operator functions
      Documentation for matchTemplate function
      Documentation for medfilt function
      Documentation for meanshift & bilateral functions
      Documentation for Morphological Operator functions
      Documentation for Convolution functions
      Documentation for fft & ifft functions
      Documentation for approx1 & approx2 functions
      Documentation corrections
      Added 4th dimension batch for erode,dilate
      Added 4th dimension batch for erode3d,dilate3d
      Revised documentation for convolve
      Added 4th dimension batch for bilateral
      Added 4th dimension batch for meanshift
      Added 4th dimension batch for histogram
      Added 4th dimension batch for median filter
      matchTemplate 4th dimension batch support
      4th dimension batch support for rgb-gray transformations
      Enabled 4th dim batch support for hsv-rgb conversions
      Added 4th dim batch support for 2d separable convolve
      Added 4-connectivity code sample
      batch support for indexed arrays in separable 2d convolution
      BUGFIX: batch support for indexed arrays in morphological functions
      batch support for indexed arrays in bilateral, meanshift
      batch support for indexed arrays in histogram
      batch support for indexed arrays in medfilt function
      batch support for indexed arrays in matchTemplate
      batch support for indexed arrays in sobel functions
      Moved cuda::trimIndex to utility header
      BUGFIX: fixed a condition check in cpp wrapper for convolve
      Renamed ConvolveBatchKind::[ONE2ALL to ONE2MANY]
      BUGFIX: added .as(u8) call for input in regions
      Multidimensional batch support for convolve
      Changed the shared memory loading pattern in 3d convolve
      HOTFIX: fixes normalization factor bug in ifft
      additional cpp convenience wrappers for fft and ifft
      renamed unified fft wrapper API
      BUGFIX corrected size_t format specifier
      BUGFIX fixed 3rd dimension index in cpu histogram implementation
      BUGFIX fixed Windows Debug mode issue
      Windows specific fixes for graphics
      Changes to reflect forge API changes
      Removed unnecessary CheckGL calls
      OSX graphics fixes
      Replaced static arrays of cl::program/kernels with maps
      OS X interop context creation changes
      Corrected no graphics enum in graphics functions
      Additional Image Processing Examples
      Windows specific changes to HAPI Symbol Manager
      Cleaned up function call in hapi functions
      Fixed typo in data, device & index wrapper source files
      Corrected BUILD_ALL cmake macro arguments
      fix: opencl backend alone build fails due to this missing header
      Fixed CMake source bugs for windows platform in unified api sources
      Removed AFAPI attribute declaration where not needed for func definitions
      Another cmake fix for windows platform in unified api project

Pradeep Garigipati (69):
      Read me redirection to repository wiki.
      Basic contribution guidelines for pull requests
      Merge pull request #83 from pavanky/readme
      Adding unit tests related info
      Merge pull request #146 from pavanky/gtest
      Merge pull request #152 from shehzan10/devel
      Merge pull request #160 from arrayfire/devel
      Merge pull request #162 from pavanky/reorg
      Merge pull request #207 from pavanky/memory
      Merge pull request #222 from shehzan10/devel
      Merge pull request #223 from arrayfire/devel
      Merge pull request #226 from pavanky/cplx
      Merge pull request #244 from pavanky/jit_fixes
      Merge pull request #248 from shehzan10/devel
      Merge pull request #256 from pavanky/iota
      Merge pull request #257 from shehzan10/devel
      Merge pull request #297 from shehzan10/devel
      Merge pull request #315 from bkloppenborg/devel
      Merge pull request #330 from pavanky/ocl_jit_fix
      Merge pull request #375 from pavanky/jit_fixes
      Merge pull request #379 from shehzan10/devel
      Merge pull request #380 from shehzan10/devel
      Merge pull request #383 from pavanky/clcontext
      Merge pull request #384 from pavanky/random
      Merge pull request #386 from pavanky/ireduce
      Merge pull request #407 from arrayfire/memory
      Merge pull request #415 from umar456/cxx_fix
      Merge pull request #417 from pavanky/gausskern
      Merge pull request #435 from bkloppenborg/warning_fix
      Merge pull request #439 from bkloppenborg/array_indexing
      Merge pull request #447 from pentschev/improve_orb_perf
      Merge pull request #448 from pentschev/improve_fast_perf
      Merge pull request #450 from pentschev/fast_edge
      Merge pull request #455 from bkloppenborg/remove_unneeded_chars
      Merge pull request #460 from bkloppenborg/get_non-zero_dims
      Merge pull request #459 from ogreen/MRead
      Merge pull request #463 from pavanky/minor_fixes
      Merge pull request #462 from shehzan10/devel
      Merge pull request #466 from pentschev/doc_fast
      Merge pull request #471 from pentschev/doc_orb
      Merge pull request #478 from pavanky/bug_fixes
      Merge pull request #495 from pentschev/fix_fast_warning
      Merge pull request #497 from shehzan10/devel
      Merge pull request #502 from bkloppenborg/standalone_examples
      Merge pull request #505 from pavanky/docs
      Merge pull request #515 from bkloppenborg/cmake_opencl_fix
      Merge pull request #519 from shehzan10/devel
      Merge pull request #537 from pavanky/indexing
      Merge pull request #550 from shehzan10/devel
      Merge pull request #560 from pentschev/regions_b8
      Merge branch 'plot' into 'graphics'
      Merge branch 'border' into 'graphics'
      Merge pull request #603 from pavanky/fft_fixes
      Merge branch 'api_changes' into 'graphics'
      Merge branch 'cpu_hist' into 'graphics'
      Merge pull request #634 from pavanky/bug_fixes
      Merge pull request #645 from pentschev/increase_resize_coverage
      Merge pull request #649 from pavanky/index_cleanup
      Merge pull request #671 from pavanky/fixes
      Merge pull request #863 from pavanky/bugfix
      Merge pull request #906 from pavanky/bugfixes
      Merge pull request #936 from pavanky/fft
      Merge pull request #941 from pavanky/new_funcs_31
      Merge pull request #945 from pavanky/jit_fixes
      Merge pull request #944 from pavanky/minor
      Merge pull request #994 from vakopian/fi-leak-fix
      Merge pull request #996 from pavanky/hotfixes-3.1.1
      Merge pull request #1023 from 9prady9/cpuid_lp64_marco_fix
      Merge pull request #1031 from shehzan10/unified_fixes

Prashanth Govindarajan (4):
      Build fix for Windows
      FEAT Added plot to cpu backend
      AF changes for forge borders and ticks
      AF CPU backend changes for forge 1dhistogram

Richard Klemm (5):
      Add SVD API
      Add SVD Cuda backend
      Add SVD CPU Backend
      Add SVD OpenCL Stump
      Add SVD example

Shehzan Mohammed (768):
      Added af_diff1 function with cpu backend implementation.
      Fixing typo in cuda/opencl placeholders for diff1
      Added af_diff2 function with cpu backend implementation.
      Added randu and randn functions to cpu
      Added AF_<backend> definitions to test
      Added CUDA backend for diff1 and diff2
      Optimized diff to use just two kernels
      Change launch configuration when inputs are just vectors
      Added OpenCL backend for diff1 and diff2
      Fixing ostream << operator for uchar to print numbers
      Added image IO functions to all backends (code is independent of backend)
      Removed flags from CMAKE for cuda build
      Using static channel_split in imageio
      Created image.h header file
      Added CPU backend for resize
      Added CUDA backend for resize
      Added OpenCL backend for resize
      Merge branch 'master' into resize
      Updated OpenCL and CPU with offset changes
      Updated diff CPU with offset changes
      Code cleanup for Resize (all backends)
      Large tests for resize. Minor type fixes for resize.
      Added Transform and Rotate for CPU, CUDA and OpenCL backends
      Merge branch 'master' into transform
      Merge branch 'master' into transform
      Added wrappers for translate, scale and skew. Added tests for rotate
      Added helper functions to ArrayInfo
      Using failure count for rotate tests. Minor type corrections in resize.
      Added == and != operators for dim4
      Added base_type to traits
      Added Approx1 and Approx2 to all backends
      Kernel code cleanup for Approx1,2 linear interp
      Make random test deterministic
      Using Params in cuda kernels
      Added tile to CPU, CUDA, OpenCL backends
      Performance improvement to tile in CUDA, OpenCL
      Fix buildProgram multiple definition error
      Unified kernel arguments for approx
      Unified kernel arguments for diff
      Unified kernel arguments for resize
      Unified kernel arguments for transform
      Unified kernel arguments for tile
      Added error framework to approx
      Added error framework to diff
      Added error framework to resize
      Added error framework to tile
      Added error framework to transform
      Device management for CUDA
      Moved dimension checks for matmul to src/backend
      Added reorder to all 3 backends
      Added circular shift to all backends
      Create af_create_handle wrapper for createEmptyArray
      Added gradient to all backends
      Added empty wrappers for sort
      Added CUDA and OpenCL backends for Sort on dim0
      Added multi-dimensional support for sort on dim 0
      Fixed cudaGetDriverVersion for Mac and ARM
      BUGFIX const correctness in ArrayInfo functions
      Separated sort into sort (only values) and sort_index (values and indices)
      Merge remote-tracking branch 'origin/master' into sort
      Increase tolerance for rotate test
      API change for sort
      Fix for FindCBlas.cmake on debian based OS
      Fixing blas uninitialized warnings
      Optimizing loops in sort
      Added sort_by_key to all backends
      Merge remote-tracking branch 'origin/master' into sort
      Added Boost 1.48 requirement to OpenCL CMake (for compute)
      Fixing warnings for ostream in opencl/kernel
      BUGFIX for cuda random number generation on multiple devices
      Update for device manager for OpenCL and CPU
      Added test to print af_info
      Fix for erroring out if Boost.Compute not found for OpenCL
      Split sort* functions into separate files
      Updated README.md
      Separated functions into header files
      Disable large sort tests
      Added af_copy_array function for deep copy
      Added helper functions to c++ wrapper
      Added implementation for af::print
      Added examples folder
      Merge branch 'cpp' of mule:area51/arrayfire into cpp
      Merge remote-tracking branch 'origin/master' into cpp
      Added CPP wrappers for blas, device, data, reduce header files
      Added Version.h functionality. Updated info()
      Added c-api wrapper for weak copy
      Updated operator= for array class
      Added constant function to create complex arrays
      Added wrapper for unary and binary operations
      Added operator overloading for arithmetic and relational operations
      Moved operator overloading to src/array/arith.cpp
      Fixed af_print to print regular array (not transposed)
      Added iota function to C and C++ API
      Merge branch 'cpp' into 'master'
      Fix for isnan, isinf compilation error
      Overload sort C++ API
      Change dim4 from struct to class
      Added cmake code for CUDA Compute variables
      Removed usage of ArrayInfo from C++ API
      Merge pull request #40 from arrayfire/info_cpp_fixes
      OSX Compilation Fixes
      Using *operator instead of pow
      Merge pull request #41 from arrayfire/osx_fixes
      BUGFIX Fix sort on CPU
      Added seq class for C++ API
      Updating helloworld example
      Merge pull request #44 from bkloppenborg/cuda_6_0_compile_fix
      Make FreeImage library optional
      Change FREEIMAGE_FOUND to WITH_FREEIMAGE
      Added AF_ERR_NOT_CONFIGURED, added to imageio
      Merge pull request #49 from arrayfire/freeimage-optional
      Upgrading C++11 flag from 0x to 11
      Added #if __cplusplus around utils
      BUGFIX op= in array class
      Added timing code
      Fixed CPU random generator
      Merge pull request #73 from kylelutz/reduce-header-includes
      Fixed whitespaces, unsigned warning, removed printf
      Merge pull request #80 from bkloppenborg/cmake_additional_install
      Merge pull request #82 from bkloppenborg/cmake_additional_install
      Update README.md with Jenkins Build tags
      Updated README.md
      Choose CUDA default device using AF_CUDA_DEFAULT_DEVICE
      Choose OPENCL default device using AF_OPENCL_DEFAULT_DEVICE
      Merge pull request #89 from arrayfire/default_device
      Added round, floor, ceil and abs instances to CPU
      Commit for CUDA PTX update fix for abs
      Merge pull request #90 from pavanky/device
      Merge pull request #93 from pavanky/bugfixes
      Merge pull request #95 from pavanky/bugfixes
      Merge pull request #100 from 9prady9/readme
      Remove use of cmake variable CUDA_DRIVER_LIBRARY
      Merge pull request #110 from bkloppenborg/add_cmake_find_script
      Added option to not run CUDACmputeCheck.cmake
      Formatting changes in cuda/CMakeLists.txt
      Removing -DWINDOWS_REMOTE and just checking CUDA_COMPUTE_CAPABILITY
      Improved performance of transpose
      Updated README.md with windows build tag
      Update README.md
      Remove cmake whitespace warning
      Merge pull request #121 from 9prady9/ocl_perf_fix
      Update README.md
      Merge pull request #123 from pavanky/jit_fixes
      Updated README.md
      Increase tolerance for approx tests
      Merge pull request #130 from arrayfire/devel
      Compilation fix for null pointer in constructor
      Fix for get device ptr
      Merge pull request #147 from pentschev/regions_tests_fix
      Update submodules
      Added deviceprop functionality to CUDA and OpenCL.
      Changed REVISION to AF_REVISION
      Merge pull request #153 from arrayfire/update_test_submodule
      Update submodules
      Added deviceprop functionality to CUDA and OpenCL.
      Changed REVISION to AF_REVISION
      Merge pull request #154 from pavanky/arith
      Calling af_init to initialize contexts
      Merge pull request #158 from bkloppenborg/platform_format_fix
      Added linear interpolation for transform (adds for rotate, scale, skew etc)
      Added tests for bilinear transforms (rotate)
      Fixes for rotate bilinear tests
      Removed device selection from transpose test
      Use & operator for array in resize definition
      Added image graphics to CPU and CUDA backend
      Merge branch 'devel' into graphics
      Fix for no graphics builds
      Move graphics include files into GRAPHICS_FOUND in CMake
      Merge pull request #177 from pentschev/fast_opencl_fix
      Merge branch 'devel' into graphics
      Merge pull request #180 from pavanky/perf
      Added path for GLEWmx for Tegra
      Added OpenGL error checks, better window close handling
      Added Conway's Game of Life example
      Added OpenGL errors, better window closing to CPU
      Added AF_ERR_GL_ERROR and error checking in graphics
      Fix cmake error caused by same filename in test and example
      PERF Improvements to transform and rotate
      Merge branch 'devel' into graphics
      Wrappers for resize
      Add missing license headers to transform_interp
      Added tests for iota
      Added placeholders for join
      Added CPU backend for join
      Added CUDA backend for join
      Added OpenCL backend for join
      Added tests and linking test data for join
      Merge branch 'devel' into graphics
      Better error handling in ImageIO
      PERF Batching + Blocks images in rotate and transform
      Fixed tests for imageio after error changes
      Split sort_by_key instantiation into multiple files
      Changed dir to isAscending in all sort functions
      Merge branch 'devel' into graphics
      Update image API to accept scaling factors
      Change CMAKE_SOURCE_DIR/common to CMAKE_MODULE_PATH
      Moved common to CMakeModules
      Default arguments for medfilt
      Change int to dim_type in cuda transpose
      PERF Improvement to transpose opencl
      Added conjugate option to transpose
      Added transpose .T() and conjugate transpose .H() to array class
      Changed minor version from .200 to .beta
      Merge pull request #220 from pavanky/perf
      Adding noDoubleTests condition to all tests
      Merge branch 'devel' into graphics
      Compilation fix for cpu complex.hpp
      Added pinned memory functionality
      Added warning messages for when submodules are not cloned
      Merge pull request #240 from pavanky/atan2
      Merge pull request #241 from pavanky/test
      Merge pull request #242 from pavanky/hypot
      Merge branch 'devel' into graphics
      BUGFIX Fixed memory leak in image io, performance improvements
      Merge pull request #246 from pavanky/new_funcs
      Added dim checks for binary element wise ops
      BUGFIX Fix segfault in copy array
      Merge branch 'memory' of github.com:pavanky/arrayfire into devel
      Fixed references to shared_ptr for cpu and opencl backend
      Fixed compilation fix for identity
      Correcting typo in FindFreeImage
      Merge pull request #249 from arrayfire/devel
      Added memory manager for pinned memory
      Using pinned memory in imageio
      Changed API for iota
      BUGFIX Fixed index-based array operators
      Merge remote-tracking branch 'origin/devel' into graphics
      Removing boost chrono required from opencl
      Merge pull request #277 from pavanky/constants
      Merge pull request #278 from pavanky/api
      Merge pull request #279 from umar456/clang_warn
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge pull request #283 from pavanky/minor_fixes
      BUGFIX Reorder condition fix
      BUGFIX for generating array from seq using negative step
      Merge pull request #299 from pavanky/gfor
      Compilation fix for Windows
      Merge pull request #304 from pavanky/bug_fixes
      Call init from pinned memory and load image
      Added 4th dimension support to resize
      Changed tiling block from y to x in rotate kernels
      Added 4th dimension support to rotate
      Merge pull request #310 from 9prady9/array_idx
      Added 4th dimension support to transpose
      BUGFIX in set device for cuda
      BUGFIX Fixes seq generation for positive numbers
      Merge branch 'devel' of github.com:arrayfire/arrayfire into devel
      Merge pull request #316 from bkloppenborg/devel
      Split sort_by_key instantiation into multiple files for CUDA backend
      Moved sort_by_key instantiation files into directory
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge pull request #345 from pavanky/more_fixes
      Merge remote-tracking branch 'origin/devel' into graphics
      Compilation fix for windows
      Merge remote-tracking branch 'origin/devel' into graphics
      Fixed clBLAS/clFFT libs install for Windows and OSX
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge pull request #371 from pavanky/fixes
      Updated README.md
      Fix windows compile issue for opencl context handling
      Merge remote-tracking branch 'origin/devel' into graphics
      BUGFIX Fixed sobel output types
      Sobel returns int for integer types instead of float
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge pull request #382 from pavanky/ocl_build
      Added support for AFGFX
      Merge remote-tracking branch 'origin/devel' into graphics
      Using colorspace conv in graphics. Fixed memory leaks
      Mirroring change in handle names from AFGFX
      Merge remote-tracking branch 'origin/devel' into graphics
      Fixed memory leak in image c-api
      Merge remote-tracking branch 'upstream/devel' into graphics
      Fixed backend API for join
      Merge remote-tracking branch 'upstream/devel' into graphics
      Using direct backend calls for join, tile reorder in af_image
      Merge pull request #392 from pavanky/flat
      Disable key testing for sort_index and sort_by_key for OpenCL
      Merge remote-tracking branch 'origin/devel' into graphics
      Fix opencl build errors:
      Merge pull request #399 from pavanky/copy_fixes
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge remote-tracking branch 'upstream/devel' into graphics
      Merge pull request #401 from 9prady9/conv_fixes
      Merge remote-tracking branch 'upstream/devel' into graphics
      Added pretty version of conway
      Add pragma once to copy.hpp
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge remote-tracking branch 'upstream/graphics' into graphics
      Better templating convert_and_copy_image in graphics
      Better backend API for iota (allow default argument for reps)
      BUGFIX seq ops
      Changed DIRECTORY to PATH in examples/CMakeList.txt
      Add warning for not cloning gtest submodule
      Remove GIT_SUBMODULES from build_gtest. Not supported on older Cmake
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge remote-tracking branch 'upstream/graphics' into graphics
      Changed graphics names to match Forge nomenclature
      Merge pull request #438 from umar456/double_test
      BUGFIX for indexing after JIT ops
      Fixes to save image
      Merge pull request #452 from pavanky/jit_fixes
      Fixes and code optimization to join
      Merge pull request #458 from 9prady9/win_fix
      API Change iota to range
      Update test data for orb
      Merge pull request #465 from pentschev/remove_fast_memset
      BUGFIX Fix windows is_same ambiguity
      Merge pull request #481 from pentschev/match_fast_results
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge remote-tracking branch 'upstream/graphics' into graphics
      Merge pull request #490 from umar456/mnist
      API Change order of data that range generates
      Added operators for dim4
      Added new functionality iota
      Removing test/range. Will add back when corrected for new functionality
      BUGFIX Fix offsets and strides when using moddims
      Merge pull request #506 from bkloppenborg/example_fix
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge remote-tracking branch 'upstream/graphics' into graphics
      Merge pull request #511 from umar456/assets
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge pull request #514 from bkloppenborg/cmake_packaging
      Merge remote-tracking branch 'origin/devel' into graphics
      Merge remote-tracking branch 'upstream/graphics' into graphics
      BUGFIX Fixed bug in range
      Added test for range
      API Add overloaded api for tile
      BUGFIX Fixed tiling bug in iota
      Added test for iota
      Updated README.md with binary downloads link
      Using FindOpenCL from CMake 3.2
      Merge branch 'build-opencl-dependencies' of https://github.com/glehmann/arrayfire into devel
      Update clFFT commit id
      Merge pull request #545 from 9prady9/fft_api_additions
      Revert "download and build the opencl dependencies with cmake"
      Using commits as version for Boost.Compute external build
      Fixed building clBLAS external
      Fixed building clFFT external
      Using wget if cmake downloads empty file for boost compute
      Formatting and case in cmake files
      Merge remote-tracking branch 'origin/devel' into graphics
      Change download link
      Merge pull request #553 from glehmann/fix-ep-byproducts
      Merge remote-tracking branch 'upstream/graphics' into graphics
      Renaming lib_deps to FreeImage_LIBS
      Added Linear Algebra for CUDA using cusolver
      Fix compiler errors for lapack
      Added Cholesky for CPU
      Added LU to CPU
      Fixed dimensions in QR CUDA
      Added QR to CPU
      Added brief linear algebra example files
      Fixed cholesky upper-lower issue
      Fixed return arrays for lapack functions
      Returning index value array for LU as pivot
      Added CPP tests for linear algebra
      Added inplace transpose API
      Merge branch 'name_changes' into 'graphics'
      Added framework for solve
      Added CPU implementation for solve
      Merge pull request #571 from pavanky/gfor_fix
      Merge branch 'lin_algebra' of github.com:alltheflops/arrayfire into lin_algebra
      Added TRSM to cuda backend
      Added convert pivot option to cuda lu
      Solve CUDA and CPU fixes. Work in progress
      Compilation fixes for trsm
      Change lapack api leading dims to use strides
      Added convert_pivot option for LU to all backends
      CMake fixes for CUDA 7 with CMake 3.2. Change WITH_LINEAR_ALGEBRA to WITH_<backend>_LINEAR_ALGEBRA
      Remove AFAPI from cpp/lapack.cpp
      Added inverse to all backends
      Merge branch 'graphics' into plot
      Split plot from image
      Merge remote-tracking branch 'mule/opencl_image' into plot
      Merge remote-tracking branch 'upstream/devel' into plot
      Move CUDA InteropManager into its own file
      Move OpenCL InteropManager into its own file
      Added plot to CUDA backend
      Updated plot example
      Added plot to OpenCL backend
      Removed size from fg_plot2d in accordance with change in forge
      FEAT Added 3 and 4 array join functions
      Changed af_join3/4 into af_join_many
      Added CPP test for join many
      Fixes for number of arrays in join
      Merge pull request #609 from pentschev/add_join_u64
      Merge pull request #617 from pentschev/move_fft_large_tests
      Update boost.compute repository and commit
      Merge pull request #636 from pentschev/fix_build_warnings
      Added write and af_write_array functions to all backends
      Added tests for write
      Merge pull request #642 from pentschev/fix_meanshift_tests
      Merge remote-tracking branch 'upstream/devel' into lin_algebra
      Fix for min max in magma files
      Fixes for qr and cholesky tests
      Changes to accommodate f77 blas on windows for lapack
      Adding AF_CPU definition to CPU backend
      Separate variable for f77 blas for CPU and OpenCL
      Fixes for CUDA linear algebra on windows
      Add AFAPI to proxy class for windows
      Remove setenv from inverse test
      Using MKL CBLAS, LAPACKE
      Increase tolerance for LU float test
      Added provisional lapacke wrapper for lapack
      Fixes for lapacke wrapper on OSX
      Fixes for lapack tests
      Added return in swapblk opencl, fixed size of qr test
      Fix CUDA cusolver library find in CMake
      Fix compiler warnings in tests
      Disable lapacke.cpp for OSX CUDA
      Added class for FreeImage such that init and deinit are called once
      Reduced size for triangle tests - Used > 1.5GB mem
      Added options for static FreeImage in CMAKE
      Calling freeimage init/deinit in constructor/destructor
      Updated FindFreeImage.cmake to handle switch of lib type
      Added COPYRIGHT.md for software credits and corresponding licenses
      Fix freeimage_include_path
      CMake Fix for broken build due to freeimage static
      Fixed Freeimage initialization bug
      Changed FI, MKL toggle variables from *_USE_* to USE_*
      Using simpler FindLAPACKE script
      Add double test disabling to random getseed
      Type corrections in COPYRIGHT.md
      Added default arguments to translate, scale, skew
      Fixed linear interpolation for transform/rotate
      Merge remote-tracking branch 'upstream/devel' into devel
      Merge branch 'devel' of github.com:arrayfire/arrayfire into devel
      Build fixed for windows for b685fa6
      Merge remote-tracking branch 'upstream/devel' into graphics
      Fix Forge CMake scripts
      Warning fixes in hist and morphing
      Building forge as external project
      Build fixes for rotate and abs
      abs fix for size_t on tegra
      Merge remote-tracking branch 'upstream/devel' into graphics
      Fixed to build_forge for building on windows
      Installs forge includes/libs with install command
      Divide VS projects into VS Filters for Tests and Examples
      Update build_forge.cmake
      Add Find Package GLFW back for include directories
      Change translate test verification to ratio
      Fixes for Solve OpenCL for NVIDIA GPUs
      Disable solve/qr test failing on windows opencl
      Fixed qr and solve test to compile
      Merge remote-tracking branch 'upstream/devel' into graphics
      Fix build_forge to build properly on windows
      Adding BUILD_GRAPHICS Option to find/build Forge
      CMakeList Tabbing/Formatting
      Added GLEW and GLFW along with licenses to copyright
      Changing forge-external to forge-ext
      Change macro for dim_t definition
      Expanded macro for definition of dim_t
      Adding LP64 macro for dim_t
      Merge remote-tracking branch 'upstream/devel' into graphics
      Compilation fixes and getting started test fixes
      Documentation for data.h
      Documentation for util.h
      Documentation for print
      Documentation for inplace transpose
      Fix double tests in getting started and matrix manip
      Added c32/64 s/u/32/64 to resize
      DOC: Added documentation for image
      Compilation fixes in resize for osx and windows
      CPack filename based on graphics support
      Compilation fix for afHost in conway examples
      Change conway loop condition to window.close()
      Fixing double tests in getting started
      Making the triangle test slightly smaller in size
      Removing forge headers from install command
      Merge pull request #711 from 9prady9/no_gfx_fix
      Option to use static GLEWmx
      Remove print statements
      Add option to use system GTest
      Fixing GLEWmx for build forge
      Fix __builtin_popcount for windows
      Fix e5d2cf9
      Add AF_DISABLE_GRAPHICS to disable OpenCL window creation on init
      Correct alternate functions for deprecated functions
      Replace deprecated calls with new APIs
      Fix reduce test files
      Added more paths to FindGLFW
      Add descriptions to Conway, update conway_pretty
      Merge pull request #726 from 9prady9/examples
      Added wrapper for setTitle for window
      Added arrayfire pro tips page
      Renamed arrayfire_pro_tips to configuring_arrayfire_environment
      Add option for user to set relative test data directory
      More uint to unsigned
      Merge pull request #739 from 9prady9/cmap_additions
      Merge pull request #742 from bkloppenborg/examples_fix
      Merge pull request #745 from pavanky/fractal_fixes
      Adding capability to build multiple compute versions
      Change CUDA_COMPUTE_CAPABILITY to CUDA_DETECTED_COMPUTE
      Added CUDA_COMPUTE_DETECT Option to disable auto detection
      For single compute, set PTX to compute version
      Replace deprecated functions with new API
      Deprecated functions will throw errors in examples
      Disable cuda compute check from being run every time cmake is run
      Fix deprecated warning to error for visual studio
      Fix headers for c-only compilation
      Merge pull request #757 from pavanky/final_fixes
      Tagging build_forge for af3.0
      Added Step 0 to windows to run pre-built executables
      Merge pull request #759 from arrayfire/devel
      Fixing conflicts between std::array and af::array
      Move snprintf/static macros from defines.h to backend
      Disable RPM and DEB packages
      Merge pull request #776 from 9prady9/gfx_changes
      Change forge branch to master
      Add defines.hpp to opencl files where snprintf is required on windows
      Enable multiple CUDA computes to be detected and enabled
      Disable fallback computes if any CUDA_COMPUTE_XY is set
      Change build status to reflect master status
      Disable fallback only if any compute is set to ON
      Merge branch 'fix-missing-include' of git://github.com/glehmann/arrayfire into multiple_computes
      Merge pull request #796 from umar456/osx_install
      Merge pull request #798 from umar456/fractal
      Optimized compute detection code
      Merge pull request #803 from bkloppenborg/hotfixes
      Merge remote-tracking branch 'upstream/hotfixes-3.0.1' into devel
      Merge branch 'private_lib' of git://github.com/umar456/arrayfire into hotfixes-3.0.1
      Fixed examples cmake for unix
      Merge remote-tracking branch 'upstream/devel' into hotfixes-3.0.1
      Fix header file in basic_c
      Merge pull request #805 from 9prady9/gfx_image_fixes
      Add gitter label
      Fix CUDACheckCompute when error is returned
      Minor typo fixes in windows doc
      Fixed gtest git link
      Merge pull request #811 from 9prady9/printf_cleanup
      Merge pull request #810 from umar456/osx_install
      Merge pull request #815 from arrayfire/hotfixes-3.0.1
      Remove cache path from NVVM path
      Function return type fix for blas
      Compilation fix for windows when graphics disabled
      FEAT Add CPU backend for unwrap function
      Added CUDA backend for Unwrap
      Added OpenCL backend for Unwrap
      Bugfixes, comments
      Adding padding for strides > 1
      Added test for unwrap, updated data submodule
      Added documentation for unwrap
      Changing behavior of unwrap using padding
      Changed unwrap tests to new behavior
      Updated documentation for unwrap
      Unwrap: Remove duplicate asserts, add intl, uintl to test
      Remove unused lapack definition
      BUGFIX for gradient when single element is in new block
      Added AF_INTERP_LOWER and implementation for resize
      Merge pull request #847 from pavanky/docs
      Fixing bug in linear interpolation functions
      FEAT Added nearest neighbour with SSD, SAD and SHD
      Added tests for nearest neighbour
      Added documentation for nearest neighbour
      Merge remote-tracking branch 'upstream/devel' into nearestNeighbour
      Fix double compilation
      Remove redefinition warning for blas
      Added options for dotc and dotu to dot function
      Bug fixes for nearest neighbour and hamming
      Added tests for dot
      Compilation and warning fixes
      Add lower interpolation to rotate and transform
      Allow users to set precision when using print
      Changed af_print macro, added documentation
      Fix print macro
      Change build labels to be for devel branch
      Added noDoubleTest for nearest neighbour and dot test
      Added AF_API_VERSION
      Added 64-bit integer type support for functions
      Added print errors to documentation
      Merge pull request #915 from pavanky/fixes
      Fix type in linux doc
      Fix signed-unsigned comp warnings
      Fixes to print functions
      FEAT Added saveArray and readArray functions for file read/write
      Removed af_print_array_c. af::print now calls af_print_array_p
      saveArray returns index of array
      Rename af_print_array_p to af_print_array_gen
      FEAT added to string function
      FEAT Added image IO using memory functions and tests
      Fix conjugate transpose for vectors
      Changed saveImageMem API. Added image format enum
      Add typedef af_image_format af::imageFormat
      Updated boost compute version tags
      Remove set_scalar(x, 0) instructions
      Remove unused opencl/kernel/set files
      Fix typo
      Merge pull request #969 from pavanky/svd
      Added missing 3.1 version guards
      BUGFIX SVD use gesdd only with MKL, use gesvd with atlas
      Updated release notes for v3.1.0
      Version guards for complex operators
      Added SIFT license info to release notes
      SVD using gesdd on Apple
      Merge pull request #971 from pavanky/assign
      Merge pull request #973 from 9prady9/upstream_updates
      Increment version to 3.1.1
      Fixes for snprintf on windows
      FEAT Added batch support for approx1 and approx2
      Changing int to dim_t in approx kernels
      Added any dimension batching and gfor support for approx1 and approx2
      Change condition structure in approx
      Merge pull request #983 from pavanky/indexed_reduce_fixes
      DOC fix for AF_PATH rendering missing %
      Read me fixes
      BUGFIX in assign
      Add missing compute2cores versions
      Increment version to 3.1.2
      BUGFIX convolve frequency condition is now based on kernel size
      Add missing AFAPIs
      Fix sizes for approx batch tests
      Use af_print_array_gen in unified basic example
      Change unified backend priority. Add af/backend.h to arrayfire.h
      Changed unified/basic.cpp to use C++ api
      Add unified backend details to using on pages and cmake.in file
      Documentation for unified backend
      Reduced size of approx1 batched linear test
      Change output of DOG to floating type
      Increment version to 3.2.0
      Added AF_MSG macro
      Added short (s16) and ushort (u16) types for CPU
      Added short, ushort support for CUDA backend
      Fix memory alloc for fast opencl
      Added short and ushort support for CUDA backend
      Remove ushort redefinition from imageio
      Change ushort to unsigned short in cpp
      Add typedef for ushort in tests
      Add missing examples to documentation
      Fix quoting text in readme docs
      Minor fixes in documentation. Fix cmake command for docs
      Corrections in unified backend doc
      Add -L to lib path in using on pages
      Add 16-bit enums to docs
      Fix tests for 32-bit systems
      Added environment variable to disable multi gpu tests
      Fix median test (again)
      Merge branch 'osx_inst_fix' of git://github.com/umar456/arrayfire into hotfixes-3.1.2
      Updated release notes
      Merge branch 'master' into devel
      Fix dlopen string for OSX
      Documentation fixes
      Increment version to 3.1.3
      Merge pull request #1034 from 9prady9/set_native_device
      Merge pull request #1041 from pavanky/fixes
      Return CUDA Driver version on windows too
      Add Paths to FindOpenCL for linux
      Merge branch 'sift_scale' of git://github.com/pentschev/arrayfire into hotfixes-3.1.3
      Fix memory leak in median
      Fix windows builds when not using MKL
      Merge pull request #1050 from pentschev/sift_tests
      Add paths to examples FindOpenCL.cmake file
      Fix documentation groups for select and replace
      Merge remote-tracking branch 'upstream/hotfixes-3.1.3' into devel
      Merge pull request #1052 from pentschev/gloh
      Added CPU fallback for CUDA LU when CUDA older than 7
      Added CPU fallback for CUDA QR when CUDA older than 7
      Added CPU fallback for CUDA QR when CUDA older than 7
      Added CPU fallback for CUDA Solve when CUDA older than 7
      Added CPU fallback for CUDA Inverse when CUDA older than 7
      Added CPU fallback for CUDA SVD when CUDA older than 7
      Call deviceGC before solve tests to minimize memory (tegra)
      Default CPU fallback for CUDA LAPACK to OFF. Use CUDA_LAPACK_CPU_FALLBACK=ON
      Change condition when nonfree are removed from ctest
      Fix comparison warnings
      Fixes for building without lapack
      Call submodule update if submodules are missing
      Merge pull request #7 from pentschev/sift_fixes
      Updated SIFT/GLOH test thresholds
      Fix doc for af_isnan
      Updated release notes for 3.1.3
      Merge pull request #1061 from pavanky/bugfixes-3.1.3
      Add return type to cuda blas (for windows)
      Add change to release notes for 3.1.3
      Merge pull request #1059 from arrayfire/hotfixes-3.1.3
      Merge pull request #1062 from arrayfire/master
      Provide option for MKL use for CUDA lapack cpu fallback
      Fix compilation fixes for VS2015
      Add return type docs for functions with varying return type
      Fix warnings
      Change clBLAS tag to the corrected commit
      Added function to get available backends
      Optimizations to backends available computation
      Optimization for JPEG, cleanup
      Moved common functions from imageio into header file
      FEAT add loadImageT and saveImageT. Provides loading in different types
      Change loop in surface example
      Fix enum value conversion in image
      Fix imageio load order in case of bitmap and not bitmap
      Add s16 and u16 types to image (graphics)
      Add s16 and u16 types to surface (graphics)
      Add s16 and u16 types to histogram (graphics)
      Add s16 and u16 types to plot (graphics)
      Add s16 and u16 types to plot3 (graphics)
      Update forge build tag
      Add load_image_t and save_image_t to unified
      Doc for loadImageT and saveImageT
      Fixes for ushort on windows
      Update test data
      Add intl/uintl to sort, sort_index, sort_by_key
      use cl_long and cl_ulong in sort functions
      Add intl/uintl to lookup
      Add intl/uintl to histogram and histeq
      Add intl/uintl to convolve and fftconvolve
      Add intl/uintl to set functions
      Add intl/uintl to meanshift
      Fix cuda shared memory instantiation for s64 and u64
      Fix comparison warning
      Compilation fix for non-imageio builds
      Add install page to layout
      More documentation updates for tutorials
      API Change loadImageT -> loadImageNative
      Add support for c32/c64 for isInf, isNaN, iszero
      Update links
      Fix iota dims check
      Fix af_device_array dims check
      Typo AFF_ERR_NONFREE -> AF_ERR_NONFREE
      Add version guards for v3.2
      Encode backend info into ArrayInfo::devId
      Added array/backend checks to unified backend
      Add getBackendId function to get backend info of an array
      Update unified api docs
      CHECK_ARRAYS lets C-API return errors in case of arr = 0
      Merge branch 'doc-updates' of https://github.com/bkloppenborg/arrayfire into unified_checks
      Merge branch 'devel' of https://github.com/shehzan10/arrayfire into unified_checks
      Added version checks for getBackendId
      Fix triangle test failures
      Merge branch 'homography' of https://github.com/pentschev/arrayfire into unified_checks
      Merge pull request #1096 from 9prady9/susan_fixes
      Merge branch 'homography' of https://github.com/pentschev/arrayfire into unified_checks
      Moved det to rank test file. Removed rank and det from missing test
      Removed gfor unsupported functionality
      Added new examples
      Added release notes for 3.2.0
      Update forge tag for af3.2.0
      Add Tegra X1 badges to readme
      Transpose build table in readme
      Added groups for graphics func documentation
      Fixes for examples when used with installer
      Fix documentation when using older doxygen
      Merge branch 'patch-2' of https://github.com/mlloreda/arrayfire into hotfixes-3.2.1
      Merge branch 'fix/missing-libdl-linkage' of https://github.com/ghisvail/arrayfire into hotfixes-3.2.1
      Merge branch 'minor-docs' of https://github.com/shehzan10/arrayfire into hotfixes-3.2.1
      Merge branch 'gfx_surface_fix' of https://github.com/9prady9/arrayfire into hotfixes-3.2.1
      Merge branch 'enh/docs-target-settings' of https://github.com/ghisvail/arrayfire into hotfixes-3.2.1
      Merge branch 'fix/examples-target' of https://github.com/ghisvail/arrayfire into hotfixes-3.2.1
      Fix type in documentation
      Fixes for examples cmakelists for dl lib
      Tests are now available as standalone
      Fix examples/cmakelist arguments for osx and windows
      Documentation for seq class
      Fix possible divide by zero case in cpu info
      Add enable_testing to test/CMakeLists.txt
      Merge pull request #1120 from shehzan10/tests-standalone
      Merge pull request #1130 from pavanky/bugfixes-3.2.1
      Merge pull request #1136 from pentschev/homography_fixes
      Merge pull request #1132 from shehzan10/seq_docs
      Fix examples installation directory
      Use folders (VS sln) for examples/tests when built out of source
      Install examples source irrespective of value of BUILD_EXAMPLES
      Updated forge tag
      CMake generates the list of examples
      Generate examples as dir/filename.cpp
      Update examples refs to match updated example style
      Updated release notes for 3.2.1
      Fix typo
      DOC Add background and bold to inline code tags
      DOC corrections, proper linking and syntaxes
      Increment version to 3.2.1

Umar Arshad (297):
      Initial Commit
      FIX: Don't build test files if BUILD_TEST is OFF
      Removed MESSAGE from opencl CMakeLists.txt
      Fixed gcc47 parsing error
      Removed tagged struct initialization to support older compilers
      Fixed #1: Building OpenCL library fails on Linux
      Fix #16: Patch command need to be used instead of svn patch
      Merge branch '16' into 'master'
      Merge branch 'data' into 'master'
      Automatic download of arrayfire test data
      forgot GetTestData.cmake
      Only run tests whose binaries are being built
      Fixed errors which popped up on Ubuntu related to pthreads
      Merge branch 'ubuntu' into 'master'
      Merge branch 'datatypes' into 'master'
      Merge branch 'test_definition' into 'master'
      Consistent naming for headers in src/backend
      Cleanup rand functions. Remove macros
      Instantiate Array destructors
      Merge branch 'master' into rand
      Cleanup. Improve readability in random.
      Merge branch 'ocl_cmake' into 'master'
      Merge branch 'random' into 'master'
      Fix various things to get clang working on OSX
      Merge branch 'master' into clang_fix
      Compile on Linux
      Replace operator overload with ToNum.
      Merge branch 'fix_print_uchar' into 'master'
      Merge branch 'simple_index' into 'master'
      CPU GEMM, GEMV, and DOT
      Tests for GEMM and GEMV
      CUDA GEMM, GEMV and DOT
      Merge branch 'cuda_reduce' into 'master'
      Merge branch 'cpuscan' into 'master'
      Merge branch 'testHelper_fix' into 'master'
      BLAS on OpenCL using clBLAS library.
      Cleanup CMake files(i.e. remove messages)
      Changes to compile on Linux. Fix warnings on g++.
      Merge branch 'clrand' into 'master'
      A more robust FindCLBLAS.cmake file
      Removed unnecessary code from tests.
      Merge branch 'master' into blas
      Formatting changes. Fix leak in CPU. Enable dot
      Initial Error commit
      Updated interface for error checking.
      Created a common exception handling format
      Merge pull request #140 from 9prady9/cuda_fix
      Merge pull request #142 from 9prady9/msvc_filters
      Fixes #167: Check if driver is unloaded when freeing array
      Fix warnings in CPU backend on clang
      Added the __ANSI_STRICT definition for OSX
      Compile using g++ on OSX
      Remove unnecessary instantiations of morph in OpenCL
      Merge pull request #267 from mlloreda/macosx_rpath_fix
      Fix g++ warnings on OSX
      Fix linker warning on OSX
      Fixed warning due to if/switch
      Merge branch 'clang_fixes' into clang_warn
      Merge pull request #327 from pavanky/bugfix
      Merge pull request #328 from pavanky/gfor_seq
      Merge pull request #340 from pavanky/bug_fixes
      Merge pull request #356 from pavanky/math_funcs
      Merge pull request #361 from pentschev/orb_fixes
      Merge pull request #359 from jramapuram/devel
      Merge pull request #370 from arrayfire/ptxgen
      Initial documentation for ArrayFire 3.0
      Simple description for array constructor and BLAS
      Merge pull request #390 from shehzan10/devel
      Fixed incorrect use of std::map::erase in OpenCL
      Merge pull request #398 from pavanky/64bit
      Merge pull request #400 from arrayfire/devel
      Merge pull request #402 from pavanky/cleanup
      Merge pull request #405 from 9prady9/fft_fix
      Remove C++11 conditional from src/api/c.
      Moved assets folder/submodule to the root dir.
      Added basic C interface functions.
      Merge branch 'devel' into docs
      Merge pull request #422 from arrayfire/examples
      Use the chromium repository to build gtest.
      Merge pull request #426 from pavanky/bug_fixes
      Fix unused variable warnings in convolve_separable
      Merge pull request #434 from pavanky/examples
      Double precision checks in testing
      Merge pull request #440 from pavanky/dbn
      Merge pull request #442 from pentschev/orb_blurring
      Merge pull request #446 from pavanky/rbm
      Merge pull request #457 from pentschev/change_fast_datatype_internals
      Merge pull request #443 from 9prady9/cspace_hist_docs
      Faster DBN convergence. Test updates
      Merge branch 'devel' of github.com:arrayfire/arrayfire into dbn_rand
      Merge pull request #467 from pentschev/fix_fast_edge_assert
      Make tests C++03 compliant.
      Merge branch 'devel' into clang
      Remove additional c++11 features from test
      Make tests using libstdc++ for clang builds on OSX
      Removed messages and unnecessary functions
      Merge pull request #484 from glehmann/fix-gtest-byproducts
      Merge pull request #489 from pavanky/perf
      Added ASCII display function for MNIST.
      Merge pull request #496 from pavanky/dox
      Merge pull request #508 from 9prady9/image_docs
      Fix ASSET_DIR path
      Remove macro templates in constant. Use Gtest templates
      Merge pull request #527 from 9prady9/docs_improvements
      Add code coverage flags for UNIX platforms
      Add coverall configuration file
      Merge pull request #531 from pavanky/indexing
      Merge pull request #530 from pavanky/softmax
      Added coveralls target to cmake
      Remove old token from coveralls config
      Merge pull request #539 from pavanky/memory
      Remove delete calls
      vector -> unique_ptr for uninitialized data. Removed init loops
      Merge pull request #546 from glehmann/fix-unused-function
      Changes based on cppcheck static analysis
      Merge pull request #548 from pavanky/scan_fixes
      Create one cuBlasHandle per GPU. Tests
      Merge pull request #557 from bkloppenborg/cmake_packaging
      Identity support for s64 and u64. Unit Tests
      Fix identity for char and complex types
      Identity unit tests for C++
      Remove unnecessary operator overloads for complex in OpenCL
      Sort functions in docs. Update coveralls
      Merge pull request #568 from pavanky/fft_mem_fix
      Minor cpu/reduce refactor
      Add unit test coverage information in the README
      Fix CMake to create Xcode projects
      Print error messages based on environment vars
      Remove iostream.
      Use relative path for unit test data path
      Add s64 and u64 support for assign
      Merge pull request #581 from bkloppenborg/docs_external_code
      Remove unnecessary template parameters from tests
      Use explicit for af::array constructors
      Typed tests for alltrue and anytrue
      Diagonal Tests
      Fix char ZERO option bugs in diagonal. Tests. Refactor
      Added support for s64 and u64 to diagonal. Tests
      Merge pull request #604 from pavanky/bug_fixes
      Merge pull request #605 from pavanky/index_fixes
      BUGFIX: adjust a few is* functions
      Unit tests for several af::array member functions
      Adjusted failure tests for constant and random
      Minor indexing refactor
      Destructors must not throw exceptions
      Merge pull request #610 from pavanky/dim4
      Merge branch 'devel' of github.com:arrayfire/arrayfire into array_attrib
      Scalar is not a vector
      Simplify isRow/isColumn
      Remove gen_index member function
      Move initBlas to the blas header.
      Including missing headers for call_once in blas.hpp
      Merge pull request #622 from bkloppenborg/linux_installers
      Use non-member non-friend functions to perform binary operations
      Use forward declarations instead of including headers
      Merge branch 'devel' into proxy
      Include array.h in opencl.h: forward decl doesn't work
      Indexing using proxy class. Initial implementation.
      Assign unit tests
      Assignment tests for row and col member functions.
      Assign unit tests
      Assignment tests for row and col member functions.
      Update test data commit
      Merge pull request #632 from pentschev/fix_fast_osx
      Fixed faulty float comparison
      Merge branch 'devel' into proxy
      Unit tests for array_proxy to array_proxy assignment
      Fixes indexed array to indexed array assignment
      Simplify indexing. Fix const correctness.
      Remove unused functions. Cleanup
      Image save tests
      Merge pull request #650 from pentschev/gaussiankernel_tests
      Add s64/u64 type support for lookup
      array_proxy member func tests. Test additional types
      Tests assignment with the slice function.
      Test rows and cols functions.
      remove scalar test.
      Error checks for indexing operations
      Merge pull request #653 from bkloppenborg/install_docs
      Merge pull request #657 from alltheflops/lin_algebra
      Remove unit tests from docs examples
      Merge pull request #664 from pavanky/minor_fixes
      Merge pull request #673 from shehzan10/devel
      Merge pull request #672 from FilipeMaia/fix_missing_operators
      Merge pull request #674 from pavanky/minor_features
      Add tests for where for u64 and s64
      Add support to var for u64 and s64.
      Additional var tests
      Merge pull request #680 from pavanky/features
      Merge pull request #684 from pavanky/api_changes
      Merge pull request #691 from pavanky/bug_fixes
      Test Getting started code examples
      Remove unused gfor parameter from array constructor
      Move matrix_manipulation examples to unit tests.
      Add missing constructors for af::array
      Fix indexing ND using linear indexing
      Partial fix for linear indexing on ND array
      Create indexing unit tests for indexing documentation.
      Apply Rule of 3/5 to array_proxy class.
      Merge branch 'devel' into gs_tests
      BUGFIX: Memory leak in array_proxy operator array()
      BUGFIX/STYLE: Refactor moddims/assign;  Memory leak in assign.
      API: rename af_weak_copy/af_destroy_array to af_retain/af_release
      Merge pull request #698 from pavanky/osx_fixes
      Document index and enums in defines.h
      Merge pull request #703 from pavanky/lapack
      DOCS: Added docs for operator(). Style updates in array.h
      Fix const correctness for row(s)/col(s)
      Merge branch 'devel' into array_docs
      Merge pull request #708 from pavanky/docs
      Merge pull request #707 from arrayfire/graphics
      Merge pull request #714 from pavanky/fixes
      renamed uint to unsigned
      Merge pull request #716 from shehzan10/devel
      API: Update set and reduce API
      API: fftconvolve, meanshift, colorspace, erode/dilate
      API: histequal->histEqual, deviceprop->deviceInfo
      API: Add old names to compatibility header
      Create a deprecated macro.
      Merge pull request #720 from pavanky/bug_fixes
      Merge pull request #717 from shehzan10/devel
      Merge branch 'devel' into array_docs
      DOCS: Additional operator overload docs
      Merge pull request #729 from pavanky/solvers
      Merge pull request #730 from pentschev/fast_example
      Merge pull request #728 from shehzan10/devel
      DOCS: Document rest of array.h. Fix doxygen labels
      Merge pull request #733 from shehzan10/devel
      DOCS: Style updates. Deprecated list
      Updated doxygen to show deprecated list.
      Fix documentation warnings
      Merge fixes
      Merge branch 'pavanky-missing_functions' into devel
      Convert constant into a template
      Added the row(s)/col(s)/slice(s) member functions to proxy
      std::complex -> af_cfloat/af_cdouble
      Add deprecated functions from AF2.1
      Merge branch 'devel' into dep_func
      c++ checks in complex
      Added test to check the validity of the headers in C
      Remove git folder from docs folder
      Resolve ambiguous pow error on osx cuda(6.5)
      Add missing parameter to CREATE_TESTS macro
      Updated fractal example
      Merge pull request #797 from shehzan10/multiple_computes
      Make all link interfaces private
      Fix cuda linking.
      Fix cmake errors in linux for private linking
      Remove references to installer_mode
      Created OSX Installer
      Make osx installer based on targets. Style
      Finalize osx Installer
      Put guards around osx install scripts
      Renamed cpack.txt to cpack.cmake
      Update the installer name
      Make dim4 a POD object
      Make ArrayInfo a POD object
      Add static checks for POD for dim4 and ArrayInfo
      Make Array<T> a standard layout type
      Ensure Array<T> is standard layout using static_asserts
      Revert "Make dim4 a POD object"
      Reduce size of Array<T> by rearranging mem vars
      Add static checks to make sure ArrayInfo is the first mem var
      Merge pull request #848 from pavanky/new_additions
      FEAT: intl/uintl for random
      FEAT: intl/uintl support for all reduce functions
      TEST: Refactor reduce tests. Test intl/uintl
      BUILD: Fix redefinition warning in blas.
      Revert "BUILD: Fix redefinition warning in blas."
      Merge pull request #870 from pavanky/refs
      STYLE: Remove macros; Simplify templates;
      Merge pull request #875 from pavanky/reduce-nan
      Merge pull request #874 from pentschev/harris
      Merge pull request #877 from pavanky/cplx_fixes
      Merge pull request #889 from pavanky/features-3.1
      Merge pull request #895 from FilipeMaia/devel
      Merge pull request #897 from pavanky/minor_changes
      Merge pull request #933 from 9prady9/ker_fixes
      Merge pull request #935 from 9prady9/cudaMem_stream
      Create instances for const index member functions
      Merge pull request #954 from shehzan10/imageio_mem
      Merge pull request #955 from pavanky/wrap
      Merge pull request #958 from shehzan10/ctrans_fix
      Merge pull request #959 from ghisvail/bugfix/remove-unused-dtype-traits
      Merge pull request #966 from pavanky/compile_fixes
      Merge pull request #964 from pavanky/fixes_310
      Merge pull request #965 from pavanky/nonfree_fixes
      Merge pull request #1009 from pavanky/reduce_fixes
      Merge pull request #1018 from shehzan10/hotfixes-3.1.2
      Send err messages to file for OSX installer
      Merge pull request #1026 from arrayfire/hotfixes-3.1.2
      Merge pull request #1056 from shehzan10/devel
      Port shallow water eq example from 2.1
      Add unified backend binaries to the OSX installer
      Merge pull request #1076 from 9prady9/cuda_memcpy_stream_fixes
      Merge pull request #1077 from shehzan10/imageio
      Merge pull request #1102 from arrayfire/devel

Vardan Akopian (1):
      use RAII to avoid freeimage bitmap resource leaks

chris (3):
      opencl build program fixes for osx
      optional double support for jit.cl
      fix double support in jit.cl

easuter (2):
      Use "Subversion_SVN_EXECUTABLE" explicitly instead of "svn patch".
      Correction to commit e7ab4f6

firemanphil (1):
      Fix "Could not read from remote repository" issue

greenman (2):
      Fixed typo
      Modified readme file with additional ArrayFire contact info

jramapuram (3):
      Update CMakeLists.txt
      allow for cuda7+ and backwards changes
      comma to space

mlloreda (9):
      Initial 1D sort implementation for cpu.
      Sorting across first dimension
      Merged origin sort to local
      no more async; non-global indexing
      test/sort.cpp: test on working types
      src/backend/sort.cpp: disable s8 for sort
      src/backend/sort.cpp: modify dim assertion for 2D
      test/sort.cpp: 2D test
      CMakeLists.txt: Update OSX RPath settings

ogreen (2):
      Update README.md
      Update README.md

orbitcowboy (1):
      Fixed potential memory leaks. Each array allocated with new [n] must be deallocated using delete [].

pavan at arrayfire.com (3):
      BUGFIX: Buffer nodes from subArrays now use the parent ptr and offsets
      Tests use google test present on system if BUILD_GTEST=OFF
      Adding a new option to use system GTEST

pradeep (192):
      Moved helper functions to common source file to enable reuse
      af_moddims function
      Bugfix for opencl resource cleanup on windows
      Code cleanup
      BUGFIX: used a wrong offset for 0th dim in kernels
      Updated contribution guidelines with new wiki page link
      BUGFIX: corrected padding kernel offset
      Unit test for fft on padded Arrays
      HOTFIX:Corrected kernel window lengths meanshift
      Renamed files with image related functions to image
      cuda image rendering resource manager changes
      opencl backend graphics
      BUGFIX: Corrected PBO binding for image cpu
      BUGFIX: fixed vbo index in plot, cuda & opencl backends
      Changes to reflect API change in Forge
      BUGFIX: build/compile changes
      Merge branch 'devel' into graphics
      Updated assets submodule
      Merge remote-tracking branch 'upstream/graphics' into graphics
      Replaced loadFont with loadSystemFont call
      Merge branch 'graphics' into cpu_hist
      Modified histogram draw function
      CUDA backend copy_hist function
      OpenCL backend copy_hist function
      Style fixes for string macro concatenation in sprintf
      BUGFIX fixed signed-unsigned comparison warnings
      BUGFIX added /bigobj CXX flag for opencl on windows
      Added missing cpp wrappers for set functions
      Merge branch 'devel' into graphics
      Merge remote-tracking branch 'upstream/graphics' into graphics
      Added multiview examples: edge filters and morphing
      Added titles for multiple view mode graphics examples
      Modified CATCHALL macro to handle forge exceptions
      ArrayFire Graphics API changes
      BUGFIX fixed histogram draw params in example
      Modified histogram & plot examples to use render loop
      Added default axes labels for plot, hist functions
      Merge branch 'devel' into graphics
      Merge remote-tracking branch 'upstream/graphics' into graphics
      bug fixes: related to convolve API change and OpenCL headers
      BUGFIX fixed graphics namespace when not needed
      Removed unnecessary dependency GLFW from ArrayFire
      Merge branch 'devel' into graphics
      Merge branch 'devel' into graphics
      Added copy-via-host fallback option for opencl graphics
      Merge branch 'devel' into graphics
      Style fixes
      Fixed segfault error when graphics is not used for opencl
      Changed af::Window::operator() to return reference
      Added checks for graphics calls
      Fixed histequal to return result with same dims as input
      BUGFIX fixed ForgeManager caching mechanism
      BUGFIX: corrected af_min_t to min for char specialization
      Added image processing examples
      Added set position function to af::Window
      Offset window start position in pyramids examples
      Modified harris example to use graphics when appropriate
      Added fractal examples
      Fixed math header in optical flow example
      Modified fractal array to be normalized before rendering
      Added colormap attribute to af::Window
      Added colormap option to examples to make them look pretty
      Corrected doxygen group tag for matchTemplate C++ API
      Documentation for af::Window class and graphics C API
      Merge branch 'devel' into cmap_additions
      BUGFIX in jit opencl
      Graphics changes to reflect changes in upstream library forge
      Fixed glew header search hints for cmake
      Removed the need to Find glfw/glew from build_forge script
      updated external project forge tag
      Fix for displaying images of type uchar, int and uint
      Removed a printf from opencl backend
      Changed forge tag to af3.0.1 in build_forge.cmake
      FEAT: Difference of Gaussians
      Additional operator* overloads for cfloat, cdouble
      Added mean<T> instantiations for int64 and uint64 in C++ API
      Clean up mean helper functions & typo fix in af_mean_all_weighted
      Additional unit tests for mean
      Updating assets commit tag
      Changes in examples to reflect asset modifications
      Merge branch 'devel' into stats_tests
      turned off clFFT examples in external project build
      fix forge dependency target errors in cpu, opencl backends
      Merge branch 'devel' into stats_tests
      Corrected path typo in mean tests
      Made cpu::Array constructor consistent with CUDA & OpenCL
      Fixed filenames for the files used in computer vision examples
      SUSAN Corner Detector
      Added error checks non-array parameters of susan API
      CUDA backend for SUSAN detector
      OpenCL backend for SUSAN detector
      fixes: typos; specific to windows; additional unit test
      Corrected memory allocation bug in cpu backend for orb, where
      Removed unnecessary corner sorting for SUSAN
      matchTemplate example
      matchTemplate fix in opencl to support indexed template images
      Removed unnecessary copy in matchTemplate example
      Added heat colormap to display disparity values for matching
      Changed CUDA/OpenCL kernels to use zero leading dimension
      typo fix in cuda SUSAN kernel
      SUSAN CUDA/OpenCL: Added bound checks to load shared/local Memory
      Replaced static shared memory with dynamic in SUSAN CUDA kernel
      Merge branch 'devel' into stats_tests
      Changed default cuda stream to be non-zero
      Wrapped cuda kernel launches with CUDA_LAUNCH macro
      FEAT: Summed Area Tables (sat, af_sat) a.k.a integral images
      Added stream parameter for upstream{thrust, cufft, cublas} calls
      Added check for skipping double type test when not supported
      Added uintl, intl support for jit operations in cuda backend
      thrust fixes for cuda stream selection on cuda < 7.0
      shared/local memory loading fix
      namespace fix for POST_LAUNCH_CHECK macro: cuda backend
      Replaced cuda Memcopy/Memset with async versions
      typo fix in opencl morph kernel
      graphics window set size functions
      style fixes in graphics examples
      Merge branch 'devel' into stats_tests
      YCbCr <-> RGB conversion functions
      unit tests for YCbCr <-> RGB conversion
      Documentation for ycbcr_rgb conversion functions
      Modified colorspace function to handle new colorspace - YCbCr
      Modified colorspace function wrapper code for efficiency
      Replaced padArray calls with Jit operations in ycbcr conversions
      Updated forge tag
      modified cpu::getInfo to display CPU information
      Fixes for cpu backend getInfo on Tegra platform
      Fixed missing header error for windows platform
      adding GL headers in platform.cpp
      Moved GL headers in platform.cpp inside WITH_GRAPHICS block
      Initial commit for heterogeneous api for ArrayFire backends
      Heterogeneous API for arith and algorithm header functions
      OpenCL backend af_info function string fixes
      Documentation for CUDA backend specific API
      Documentation fixes for 3.1 release
      Updated forge upstream tag for 3.1 release
      Documentation for missing index header functions
      Added missing docs for complex and opencl backend specific fns
      Added missing docs for operator%, array::H and array::T
      Merge branch 'devel' into stats_tests
      Merge branch 'devel' into heterogeneous_api
      Updated copyright year in hapi source files
      backend-independent api wrapper for image & vision headers
      backend-independent wrapper for arrayfire functions
      Renamed cmake file hapi build-identifier
      Cleaned up symbol manager class in HAPI wrapper
      Changed default backend enum to point to zero
      Documentation for runtime backend selection functions
      Wrapper work around for af_make_seq function in hapi
      set_backend and get_backend_count functions
      Moved HAPI examples into standard examples location
      Merge branch 'devel' into heterogeneous_api
      Utility functions for generating af_index_t array objects
      Moved indexing utility functions to common location
      Added missing functions hapi wrapper
      changed unified api to load libraries using prioritized list of paths
      Fixed cmake bug in examples also
      Added error display strings for unified api error codes
      fix in unified api for af_save_image
      Disabled Sort1000 & SortMed tests for sort_by_key and sort_index
      bug fix in image_editing example
      Updated forge tag for 3.1 release
      Merge branch 'devel' into stats_tests
      typo fixes in mean unit test
      Fixed histogram cuda/opencl kernels for indexed arrays
      Removed unnecessary memory overhead in histogram cuda/opencl kernels
      type cast fix in histogram unit test
      Restricts cpuid usage to only 64 & 32 bit architectures
      Removed __LP64__ macro from checks related to valid cpuid usability
      Forge tag update for ArrayFire 3.1.2
      Merge branch 'devel' into stats_tests
      Corrected a typo in statistics functions documentation
      basic unit tests for `af::cov` and `af_cov`
      unit tests for standard deviation function
      unit tests for correlation coefficient function
      Enabled integral types to float/double reduction
      statistics functions fixes
      Merge branch 'devel' into stats_tests
      function to set active cuda device using native id
      function to set active opencl device using cl_device_id
      Fixed template specialization for MSVC compiler in mean function
      Added check to verify f64 support in covariance unit test
      Specialization for Binary functor for cdouble type in cpu backend
      Merge branch 'surface_plot' of git://github.com/syurkevi/arrayfire into syurkevi-surface_plot
      Fix for cuda backend surface rendering function
      Merge branch 'syurkevi-surface_plot' into devel
      Replaced deviceSychronize calls with async versions
      Removed unnecessary stream synchronizes in device pointer functions
      Style fixes
      Memory leak fix in SUSAN feature detector
      specializations for abs math function for int & char
      Indexing test for out of bounds access
      Added missing symbol export for af_draw_surface

syurkevi (13):
      3D line plot feature
      3d surface rendering features
      updates matrix manipulation documentation
      fix code formatting in doxygen
      initial vectorization tutorial
      forge visualization tutorial
      initial opencl, cuda interop tutorials
      initial interop tutorials
      doxygen formatting and reference fixes
      interop tweaks
      interop formatting tweaks
      additional vectorization content
      remove extra information from vectorization

unbornchikken (2):
      This fixes that serious memory leak in OpenCL backend, and gets rid of unnecessary CLBuffer referencing.
      Retain buffer added.

xumbu (1):
      Fixed 'snprintf' definition conflict in Visual Studio 2015

-----------------------------------------------------------------------

No new revisions were added by this update.

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/arrayfire-cuda.git