[arrayfire] 32/61: initial interop tutorials

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Tue Dec 8 11:55:06 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a commit to branch dfsg-clean
in repository arrayfire.

commit b6e75429e0fa5dc6672c89a4f7f55a59890b4cce
Author: syurkevi <stefan at arrayfire.com>
Date:   Tue Dec 1 12:30:34 2015 -0500

    initial interop tutorials
---
 docs/layout.xml              |   1 +
 docs/pages/interop_cuda.md   |  19 +++++-
 docs/pages/interop_opencl.md | 150 ++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 153 insertions(+), 17 deletions(-)

diff --git a/docs/layout.xml b/docs/layout.xml
index 720e9d6..76b6bcc 100644
--- a/docs/layout.xml
+++ b/docs/layout.xml
@@ -12,6 +12,7 @@
       <tab type="user" url="\ref matrixmanipulation" visible="yes" title="Matrix Manipulation"/>
       <tab type="user" url="\ref vectorization" visible="yes" title="Vectorization"/>
       <tab type="user" url="\ref forge_visualization" visible="yes" title="Forge Visualization"/>
+      <tab type="user" url="\ref interop_opencl" visible="yes" title="OpenCL Interoperability"/>
       <tab type="user" url="\ref indexing" visible="yes" title="Indexing"/>
       <tab type="user" url="\ref timing" visible="yes" title="Timing ArrayFire"/>
       <tab type="user" url="\ref configuring_environment" visible="yes" title="Configuring ArrayFire Environment"/>
diff --git a/docs/pages/interop_cuda.md b/docs/pages/interop_cuda.md
index 822a887..e20cf66 100644
--- a/docs/pages/interop_cuda.md
+++ b/docs/pages/interop_cuda.md
@@ -1,7 +1,7 @@
 Interoperability with CUDA {#interop_cuda}
 ========
 
-As extensive as ArrayFire is, there are a few cases where you are still working with custom [CUDA] (@ref interop_cuda) or [OpenCL] (@ref interop_opencl) kernels. For example, you may want to integrate ArrayFire into an existing code base for productivity or you may want to keep it around the old implementation for testing purposes. In this post we are going to talk about how to integrate your custom kernels into ArrayFire in a seamless fashion.
+As extensive as ArrayFire is, there are a few cases where you are still working with custom [CUDA] (@ref interop_cuda) or [OpenCL] (@ref interop_opencl) kernels. For example, you may want to integrate ArrayFire into an existing code base for productivity, or you may want to keep the old implementation around for testing purposes. ArrayFire provides a number of functions that allow it to work alongside native CUDA commands. In this tutorial we are going to talk about how to use native CUDA memory operations and custom CUDA kernels alongside ArrayFire in a seamless fashion.
 
 # In and Out of ArrayFire
 
@@ -45,6 +45,7 @@ In this example, the output is the same size as in the input. Note that the actu
     float *d_x = x.device<float>();
     float *d_y = y.device<float>();
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Accessing the device pointer in this manner internally sets a flag prohibiting the ArrayFire object from further managing the memory. Ownership will need to be returned to the af::array object once we are finished using it.
 
 Before launching your custom kernel, it is best to make sure that all ArrayFire computations have finished. This can be ensured by calling af::sync(). The function makes sure you are not unintentionally executing operations out of order.
 af::sync() is not strictly required if you are not using streams in CUDA.
@@ -55,7 +56,9 @@ af::sync() is not strictly required if you are not using streams in CUDA.
     // y = sin(x)^2 + cos(x)^2
     launch_simple_kernel(d_y, d_x, num);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The function **launch_simple_kernel** handles the launching of your custom kernel. We will have a look at how to do this in CUDA and OpenCL later in the post. Once you have finished your computations, you have to tell ArrayFire to take control of the memory objects.
+The function **launch_simple_kernel** handles the launching of your custom kernel. We will have a look at how to do this in CUDA later in this tutorial.
+
+Once you have finished your computations, you have to tell ArrayFire to take control of the memory objects.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     x.unlock();
     y.unlock();
@@ -101,3 +104,15 @@ void inline launch_simple_kernel(float *d_y,
 }
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+# Additional interop functions and CUDA Streams
+
+ArrayFire provides a collection of CUDA interoperability functions for additional capabilities when working with custom CUDA code. To use them, we need to include the appropriate header.
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+#include <af/cuda.h>
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The first thing this header allows us to do is get and set the active device
+using native CUDA device ids. This is achieved through the following functions:
+
+- **static int getNativeId(int id)** -- Get the native CUDA device id of the device with the given ArrayFire id.
+- **static void setNativeId(int nativeId)** -- Set the CUDA device with the given native id as the active device for ArrayFire.
+
+These functions are available within the afcu:: namespace, and equivalent C variants can be found in the full [CUDA interop documentation](group__cuda__mat.htm).
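+
+For illustration, here is a minimal sketch that exercises these functions together
+with afcu::getStream(), which is also declared in af/cuda.h; the device ids are
+machine-dependent, so treat this as a sketch rather than authoritative usage.
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+#include <arrayfire.h>
+#include <af/cuda.h>
+#include <cuda_runtime.h>
+#include <cstdio>
+
+int main() {
+    // Map ArrayFire's device 0 to its native CUDA device id
+    int af_id = 0;
+    int native_id = afcu::getNativeId(af_id);
+    printf("ArrayFire device %d has native CUDA id %d\n", af_id, native_id);
+
+    // Make the device with that native id the active device for ArrayFire
+    afcu::setNativeId(native_id);
+
+    // Fetch the CUDA stream ArrayFire uses on the current device, so custom
+    // kernels can be enqueued on the same stream as ArrayFire's own work
+    cudaStream_t af_stream = afcu::getStream(af::getDevice());
+    (void)af_stream; // launch custom kernels on af_stream here
+    return 0;
+}
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~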
diff --git a/docs/pages/interop_opencl.md b/docs/pages/interop_opencl.md
index 8a8acef..2b81972 100644
--- a/docs/pages/interop_opencl.md
+++ b/docs/pages/interop_opencl.md
@@ -1,11 +1,18 @@
-Interoperability with OpenCL {#interop_cuda}
+Interoperability with OpenCL {#interop_opencl}
 ========
 
-As extensive as ArrayFire is, there are a few cases where you are still working with custom [CUDA] (@ref interop_cuda) or [OpenCL] (@ref interop_opencl) kernels. For example, you may want to integrate ArrayFire into an existing code base for productivity or you may want to keep it around the old implementation for testing purposes. In this post we are going to talk about how to integrate your custom kernels into ArrayFire in a seamless fashion.
+As extensive as ArrayFire is, there are a few cases where you are still working
+with custom [CUDA] (@ref interop_cuda) or [OpenCL] (@ref interop_opencl) kernels.
+For example, you may want to integrate ArrayFire into an existing code base for
+productivity, or you may want to keep the old implementation around for testing
+purposes. ArrayFire provides a number of functions that allow it to work alongside
+native OpenCL commands. In this tutorial we are going to talk about how to use
+native OpenCL memory operations and custom OpenCL kernels alongside ArrayFire
+in a seamless fashion.
 
-# In and Out of Arrayfire
-
-First, let's consider the following code and then break it down bit by bit.
+# OpenCL Kernels with ArrayFire arrays
+First, we will see how custom OpenCL kernels can be integrated into ArrayFire code.
+Let's consider the following code and then break it down bit by bit.
 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 int main() {
@@ -26,7 +33,7 @@ int main() {
 
     // check for errors, should be 0,
     // since sin(x)^2 + cos(x)^2 == 1
-    float err = af::sum(af::abs(y-1));
+    float err = af::sum<float>(af::abs(y-1));
     printf("Error: %f\n", err);
     return 0;
 }
@@ -34,21 +41,30 @@ int main() {
 
 ## Breakdown
 Most kernels require an input. In this case, we created a random uniform array **x**.
-We also go ahead and prepare the output array. The necessary memory required is allocated in array **y** before the kernel launch.
+We also go ahead and prepare the output array. The memory required is allocated
+in array **y** before the kernel launch.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     af::array x = randu(num);
     af::array y = randu(num);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In this example, the output is the same size as in the input. Note that the actual output data type is not specified. For such cases, ArrayFire assumes the data type is single precision floating point ( af::f32 ). If necessary, the data type can be specified at the end of the array(..) constructor. Once you have the input and output arrays, you will need to extract the device pointers / objects using array::device() method in the following manner.
+In this example, the output is the same size as the input. Note that the actual
+output data type is not specified. For such cases, ArrayFire assumes the data type
+is single precision floating point ( af::f32 ). If necessary, the data type can
+be specified at the end of the array(..) constructor. Once you have the input and
+output arrays, you will need to extract the device pointers / objects using the
+array::device() method in the following manner.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     float *d_x = x.device<float>();
     float *d_y = y.device<float>();
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Accesing the device pointer in this manner internally sets a flag prohibiting the arrayfire object from further managing the memory. Ownership will need to be returned to the af::array object once we are finished using it.
+Accessing the device pointer in this manner internally sets a flag prohibiting
+the ArrayFire object from further managing the memory. Ownership will need to be
+returned to the af::array object once we are finished using it.
 
-Before  launching your custom kernel, it is best to make sure that all ArrayFire computations have finished. This can be called by using af::sync(). The function ensures you are not unintentionally doing out of order executions.
-af::sync() is not strictly required if you are not using streams in CUDA.
+Before launching your custom kernel, it is best to make sure that all ArrayFire
+computations have finished. This can be ensured by calling af::sync(). The function
+makes sure you are not unintentionally executing operations out of order.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     af::sync();
 
@@ -56,12 +72,20 @@ af::sync() is not strictly required if you are not using streams in CUDA.
     // y = sin(x)^2 + cos(x)^2
     launch_simple_kernel(d_y, d_x, num);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The function **launch_simple_kernel** handles the launching of your custom kernel. We will have a look at how to do this in CUDA and OpenCL later in the post. Once you have finished your computations, you have to tell ArrayFire to take control of the memory objects.
+The function **launch_simple_kernel** handles the launching of your custom kernel.
+We will have a look at the specific functions ArrayFire provides to interface with
+OpenCL later in the post.
+
+Once you have finished your computations, you have to tell ArrayFire to take control
+of the memory objects.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     x.unlock();
     y.unlock();
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This is a very crucial step as ArrayFire believes the user is still in control of the pointer. This means that ArrayFire will not perform garbage collection on these objects resulting in memory leaks. You can now proceed with the rest of the program. In our particular example, we are just performing an error check and exiting.
+This is a crucial step: until unlock() is called, ArrayFire believes the user is
+still in control of the pointer and will not perform garbage collection on these
+objects, which results in memory leaks. You can now proceed with the rest of the
+program. In our particular example, we are just performing an error check and exiting.
 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     // check for errors, should be 0,
@@ -70,9 +94,105 @@ This is a very crucial step as ArrayFire believes the user is still in control o
     printf("Error: %f\n", err);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-# Launching an OpenCL kernel
-If you are integrating an OpenCL kernel into your ArrayFire code base, launching a kernel is slightly complicated. Since ArrayFire uses its own context internally, you need to get the context from a memory object. Once you have access to the same context ArrayFire is using, the rest of the process is exactly the same as launching a stand alone OpenCL context.
+## Launching an OpenCL kernel
+If you are integrating an OpenCL kernel into your ArrayFire code base you will
+need several additional steps to access ArrayFire's internal OpenCL context.
+Once you have access to the same context ArrayFire is using, the rest of the
+process is exactly the same as in a standalone OpenCL program.
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+void inline launch_simple_kernel(float *d_y,
+                                 const float *d_x,
+                                 const int num)
+{
+    std::string simple_kernel_str = CONST_KERNEL_STRING;
+
+    // Get ArrayFire's OpenCL context and queue
+    cl::Context context(afcl::getContext(true));
+    cl::CommandQueue queue(afcl::getQueue(true));
+
+    // Build the program and get the required kernel
+    cl::Program prog = cl::Program(context, simple_kernel_str, true);
+    cl::Kernel  kern = cl::Kernel(prog, "simple_kernel");
+
+    // set the global work dimensions
+    const cl::NDRange global(num);
+
+    // prepare the kernel arguments
+    kern.setArg(0, d_y);
+    kern.setArg(1, d_x);
+    kern.setArg(2, num);
+
+    // run the kernel
+    queue.enqueueNDRangeKernel(kern, cl::NullRange, global);
+    queue.finish();
+
+    return;
+}
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+First of all, to access OpenCL and the interoperability functions we need to
+include the appropriate headers.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+#include <af/opencl.h>
+#include <CL/cl.hpp>
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The **opencl.h** header includes a number of functions for getting and setting
+the context, queue, and device ids used internally in ArrayFire. There are also
+a number of methods to construct an af::array from an OpenCL cl_mem buffer object.
+There are both C and C++ versions of these functions, and the C++ versions are
+wrapped inside the afcl:: namespace. See the full details of these functions in the
+[OpenCL interop documentation] (\ref opencl_mat).
+
 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+cl::Context context(afcl::getContext(true));
+cl::CommandQueue queue(afcl::getQueue(true));
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We begin by retrieving ArrayFire's context and queue. For the C++ API, a **true**
+flag must be passed for the retain parameter, which calls the clRetainQueue() and
+clRetainContext() functions before returning. The retain is needed because the
+cl::Context and cl::CommandQueue wrappers release their handles on destruction;
+retaining first lets us safely use ArrayFire's internal OpenCL structures through
+the C++ API objects.
+
+Once we have them, we can proceed to set up and enqueue the kernel like we would
+in any other OpenCL program. The kernel we are using is quite simple and is shown
+below.
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+std::string CONST_KERNEL_STRING = R"(
+__kernel
+void simple_kernel(__global float *d_y,
+                   __global const float *d_x,
+                   const int num)
+{
+    const int id = get_global_id(0);
+
+    if (id < num) {
+        float x = d_x[id];
+        float sin_x = sin(x);
+        float cos_x = cos(x);
+        d_y[id] = (sin_x * sin_x) + (cos_x * cos_x);
+    }
+}
+)";
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+# Reversing the workflow: ArrayFire arrays from OpenCL Memory
+
+ArrayFire's interoperability functions don't limit us to working with memory
+managed by ArrayFire. We could take the reverse route and start with completely
+custom OpenCL code, then transfer our results into an af::array object. This is
+done rather simply with a special set of construction functions.
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+cl::Buffer my_cl_buffer(context, CL_MEM_READ_WRITE, sizeof(float) * SIZE);
+// work and computations with the OpenCL buffer, e.g.
+//kernel(my_cl_buffer, queue);
+
+// construct an af::array from the OpenCL buffer
+af::array my_array = afcl::array(SIZE, my_cl_buffer(), f32);
+af_print(my_array);
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Please note: the \ref af::array constructors are not thread-safe. You may create
+and upload data to `cl_mem` objects from separate threads, but the thread which
+instantiated ArrayFire must do the `cl_mem` to \ref af::array conversion.
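+
+As a minimal sketch of this constraint (the worker thread and buffer size here
+are illustrative, not part of the ArrayFire API), uploading may happen on a
+separate thread while the conversion stays on the main thread:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+#include <arrayfire.h>
+#include <af/opencl.h>
+#include <CL/cl.hpp>
+#include <thread>
+#include <vector>
+
+int main() {
+    const size_t SIZE = 1024;
+    cl::Context context(afcl::getContext(true));
+    cl::CommandQueue queue(afcl::getQueue(true));
+    cl::Buffer buf(context, CL_MEM_READ_WRITE, sizeof(float) * SIZE);
+
+    std::vector<float> host(SIZE, 1.0f);
+    std::thread uploader([&]() {
+        // Filling the cl_mem object from a separate thread is fine
+        queue.enqueueWriteBuffer(buf, CL_TRUE, 0, sizeof(float) * SIZE,
+                                 host.data());
+    });
+    uploader.join();
+
+    // ...but the cl_mem to af::array conversion must run on the thread
+    // that instantiated ArrayFire (here, the main thread)
+    af::array arr = afcl::array(SIZE, buf(), f32);
+    af_print(arr);
+    return 0;
+}
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~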
