[arrayfire] 57/61: DOC corrections, proper linking and syntaxes

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Tue Dec 8 11:55:12 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a commit to branch dfsg-clean
in repository arrayfire.

commit 11830298366b856f143200e80c0406a0bab0e584
Author: Shehzan Mohammed <shehzan at arrayfire.com>
Date:   Fri Dec 4 11:21:00 2015 -0500

    DOC corrections, proper linking and syntaxes
---
 docs/pages/interop_cuda.md    | 70 +++++++++++++++++++++++++------------------
 docs/pages/interop_opencl.md  | 66 +++++++++++++++++++++++-----------------
 docs/pages/timing.md          |  4 +--
 docs/pages/unified_backend.md |  2 ++
 docs/pages/vectorization.md   | 34 ++++++++++++++++-----
 5 files changed, 110 insertions(+), 66 deletions(-)

diff --git a/docs/pages/interop_cuda.md b/docs/pages/interop_cuda.md
index 5ce92d2..a131bbc 100644
--- a/docs/pages/interop_cuda.md
+++ b/docs/pages/interop_cuda.md
@@ -10,7 +10,6 @@ native CUDA commands. In this tutorial we are going to talk about how to use nat
 CUDA memory operations and integrate custom CUDA kernels into ArrayFire in a seamless fashion.
 
 # In and Out of Arrayfire
-
 First, let's consider the following code and then break it down bit by bit.
 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
@@ -37,9 +36,10 @@ int main() {
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 ## Breakdown
-Most kernels require an input. In this case, we created a random uniform array **x**.
-We also go ahead and prepare the output array. 
-The necessary memory required is allocated in array **y** before the kernel launch.
+Most kernels require an input. In this case, we created a random uniform array `x`.
+We also go ahead and prepare the output array. The necessary memory required is
+allocated in array `y` before the kernel launch.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     af::array x = randu(num);
     af::array y = randu(num);
@@ -47,14 +47,16 @@ The necessary memory required is allocated in array **y** before the kernel laun
 
 In this example, the output is the same size as the input. Note that the actual
 output data type is not specified. For such cases, ArrayFire assumes the data type
-is single precision floating point ( af::f32 ). If necessary, the data type can
-be specified at the end of the array(..) constructor. Once you have the input and
-output arrays, you will need to extract the device pointers / objects using 
-array::device() method in the following manner.
+is single precision floating point (\ref af::f32). If necessary, the data type can be
+specified at the end of the array(..) constructor. Once you have the input and
+output arrays, you will need to extract the device pointers / objects using
+the af::array::device() method in the following manner.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     float *d_x = x.device<float>();
     float *d_y = y.device<float>();
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Accessing the device pointer in this manner internally sets a flag prohibiting the
 arrayfire object from further managing the memory. Ownership will need to be
 returned to the af::array object once we are finished using it.
@@ -64,18 +66,23 @@ returned to the af::array object once we are finished using it.
     // y = sin(x)^2 + cos(x)^2
     launch_simple_kernel(d_x, d_y, num);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The function **launch_simple_kernel** handles the launching of your custom kernel.
+
+The function `launch_simple_kernel` handles the launching of your custom kernel.
 We will have a look at how to do this in CUDA later in the post.
 
-Once you have finished your computations, you have to tell ArrayFire to take 
+Once you have finished your computations, you have to tell ArrayFire to take
 control of the memory objects.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     x.unlock();
     y.unlock();
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This is a very crucial step as ArrayFire believes the user is still in control 
+
+This is a very crucial step as ArrayFire believes the user is still in control
 of the pointer. This means that ArrayFire will not perform garbage collection on
-these objects resulting in memory leaks. You can now proceed with the rest of the program.
+these objects resulting in memory leaks. You can now proceed with the rest of the
+program.
+
 In our particular example, we are just performing an error check and exiting.
 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
@@ -88,38 +95,43 @@ In our particular example, we are just performing an error check and exiting.
 # Launching a CUDA kernel
 Arrayfire provides a collection of CUDA interoperability functions for additional
 capabilities when working with custom CUDA code. To use them, we need to include
-the appropriate header.
+the af/cuda.h header.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 #include <af/cuda.h>
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The first thing these headers allow us to do is to get and set the active device
 using native CUDA device ids. This is achieved through the following functions:
-> **static int getNativeId (int id)** 
-> -- Get the native device id of the CUDA device with **id** in the ArrayFire context.
 
-> **static void setNativeId (int nativeId)**  
-> -- Set the CUDA device with given native **id** as the active device for ArrayFire.
+> `static int afcu::getNativeId (int id)`
+> -- Get the native device id of the CUDA device with `id` in the ArrayFire context.
+
+> `static void afcu::setNativeId (int nativeId)`
+> -- Set the CUDA device with given native `id` as the active device for ArrayFire.
+
 The headers also allow us to retrieve the CUDA stream used internally inside Arrayfire.
-> **static cudaStream_t afcu::getStream(int id)**  
-> -- Get the stream for the CUDA device with **id** in ArrayFire context.
-These functions are available within the afcu:: namespace and equal C variants 
-can be fund in the full [cuda interop documentation.](\ref cuda_mat.htm)
+
+> `static cudaStream_t afcu::getStream(int id)`
+> -- Get the stream for the CUDA device with `id` in ArrayFire context.
+
+These functions are available within the \ref afcu namespace and equivalent C variants
+can be found in the full [af/cuda.h documentation](\ref cuda_mat).
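+
+For illustration, a minimal sketch combining these calls for the active
+ArrayFire device (obtained with af::getDevice()) to fetch ArrayFire's stream
+and wait on any pending work might look like this:
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+    int af_id = af::getDevice();                      // active device in ArrayFire's context
+    int cuda_id = afcu::getNativeId(af_id);           // native CUDA id, usable with CUDA runtime calls
+    cudaStream_t af_stream = afcu::getStream(af_id);  // ArrayFire's internal CUDA stream
+    cudaStreamSynchronize(af_stream);                 // block until pending ArrayFire work is done
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~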
 
 To integrate a CUDA kernel into an ArrayFire code base, we first need to get the
 CUDA stream associated with arrayfire. Once we have this stream, we need to make
 sure Arrayfire is done with all computation before we can call our custom kernel
-to avoid out of order execution. We can do this with some variant of 
-**cudaStreamQuery(af_stream)** or **cudaStreamSynchronize(af_stream)** or instead,
+to avoid out of order execution. We can do this with some variant of
+`cudaStreamQuery(af_stream)` or `cudaStreamSynchronize(af_stream)` or instead,
 we could add our kernel launch to Arrayfire's stream as shown below. Once we get
 the associated stream, all that is left is setting up the usual launch configuration
 parameters, launching the kernel and waiting for the computations to finish:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
- 
- __global__
- static void simple_kernel(float *d_y,
-                           const float *d_x,
-                           const int num)
+__global__
+static void simple_kernel(float *d_y,
+                          const float *d_x,
+                          const int num)
 {
     const int id = blockIdx.x * blockDim.x + threadIdx.x;
 
@@ -143,7 +155,7 @@ void inline launch_simple_kernel(float *d_y,
     const int threads = 256;
     const int blocks = (num / threads) + ((num % threads) ? 1 : 0);
 
-    // execute kernel on Arrayfire's stream, 
+    // execute kernel on Arrayfire's stream,
     // ensuring all previous arrayfire operations complete
     simple_kernel<<<blocks, threads, 0, af_stream>>>(d_y, d_x, num);
 }
diff --git a/docs/pages/interop_opencl.md b/docs/pages/interop_opencl.md
index 93361d0..74c7167 100644
--- a/docs/pages/interop_opencl.md
+++ b/docs/pages/interop_opencl.md
@@ -3,10 +3,10 @@ Interoperability with OpenCL {#interop_opencl}
 
 As extensive as ArrayFire is, there are a few cases where you are still working
 with custom [CUDA](@ref interop_cuda) or [OpenCL](@ref interop_opencl) kernels.
-For example, you may want to integrate ArrayFire into an existing code base for 
+For example, you may want to integrate ArrayFire into an existing code base for
 productivity or you may want to keep the old implementation around for testing
-purposes. Arrayfire provides a number of functions that allow it to work alongside 
-native OpenCL commands. In this tutorial we are going to talk about how to use 
+purposes. Arrayfire provides a number of functions that allow it to work alongside
+native OpenCL commands. In this tutorial we are going to talk about how to use
 native OpenCL memory operations and custom OpenCL kernels alongside ArrayFire
 in a seamless fashion.
 
@@ -38,9 +38,10 @@ int main() {
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 ## Breakdown
-Most kernels require an input. In this case, we created a random uniform array **x**.
+Most kernels require an input. In this case, we created a random uniform array `x`.
 We also go ahead and prepare the output array. The necessary memory required is
-allocated in array **y** before the kernel launch.
+allocated in array `y` before the kernel launch.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     af::array x = randu(num);
     af::array y = randu(num);
@@ -48,15 +49,17 @@ allocated in array **y** before the kernel launch.
 
 In this example, the output is the same size as the input. Note that the actual
 output data type is not specified. For such cases, ArrayFire assumes the data type
-is single precision floating point ( af::f32 ). If necessary, the data type can
+is single precision floating point (\ref af::f32). If necessary, the data type can
 be specified at the end of the array(..) constructor. Once you have the input and
-output arrays, you will need to extract the device pointers / objects using 
-array::device() method in the following manner.
+output arrays, you will need to extract the device pointers / objects using
+the af::array::device() method in the following manner.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     float *d_x = x.device<float>();
     float *d_y = y.device<float>();
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Accesing the device pointer in this manner internally sets a flag prohibiting 
+
+Accessing the device pointer in this manner internally sets a flag prohibiting
 the arrayfire object from further managing the memory. Ownership will need to be
 returned to the af::array object once we are finished using it.
 
@@ -65,18 +68,21 @@ returned to the af::array object once we are finished using it.
     // y = sin(x)^2 + cos(x)^2
     launch_simple_kernel(d_x, d_y, num);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The function **launch_simple_kernel** handles the launching of your custom kernel.
+
+The function `launch_simple_kernel` handles the launching of your custom kernel.
 We will have a look at the specific functions Arrayfire provides to interface with
-OpenCL later in the post. 
+OpenCL later in the post.
 
 Once you have finished your computations, you have to tell ArrayFire to take control
 of the memory objects.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     x.unlock();
     y.unlock();
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 This is a very crucial step as ArrayFire believes the user is still in control
-of the pointer. This means that ArrayFire will not perform garbage collection 
+of the pointer. This means that ArrayFire will not perform garbage collection
 on these objects resulting in memory leaks. You can now proceed with the rest of
 the program. In our particular example, we are just performing an error check and exiting.
 
@@ -89,9 +95,10 @@ the program. In our particular example, we are just performing an error check an
 
 ## Launching an OpenCL kernel
 If you are integrating an OpenCL kernel into your ArrayFire code base you will
-need several additional steps to access Arrayfire's internal OpenCL context. 
-Once you have access to the same context ArrayFire is using, the rest of the 
+need several additional steps to access Arrayfire's internal OpenCL context.
+Once you have access to the same context ArrayFire is using, the rest of the
 process is exactly the same as launching a stand alone OpenCL context.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 void inline launch_simple_kernel(float *d_y,
                                  const float *d_x,
@@ -122,6 +129,7 @@ void inline launch_simple_kernel(float *d_y,
     return;
 }
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 First of all, to access OpenCL and the interoperability functions we need to
 include the appropriate headers.
 
@@ -129,25 +137,28 @@ include the appropriate headers.
 #include <af/opencl.h>
 #include <CL/cl.hpp>
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The **opencl.h** header includes a number of functions for getting and setting
-the context, queue, and device ids used internally in Arrayfire. There are also
-a number of methods to construct an af::array from an OpenCL cl_mem buffer object.
-There are both C and C++ versions of these functions, and the C++ versions are
-wrapped inside the afcl:: namespace. See full datails of these functions in the
-[opencl interop documentation.] (\ref opencl_mat)
+
+The af/opencl.h header includes a number of functions for getting and setting the
+context, queue, and device ids used internally in Arrayfire. There are also a
+number of methods to construct an af::array from an OpenCL `cl_mem` buffer
+object. There are both C and C++ versions of these functions, and the C++
+versions are wrapped inside the \ref afcl namespace. See full details of these
+functions in the [af/opencl.h documentation](\ref opencl_mat).
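+
+As a rough sketch only (the exact overloads are listed in the af/opencl.h
+documentation linked above; the afcl::array(dim, buffer, type, retain) form
+used here is an assumption), wrapping a `cl_mem` buffer allocated on
+ArrayFire's context might look like this:
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+    cl_context af_ctx = afcl::getContext(false);     // ArrayFire's OpenCL context, not retained
+    cl_mem buf = clCreateBuffer(af_ctx, CL_MEM_READ_WRITE,
+                                num * sizeof(float), NULL, NULL);
+    // assumed overload: dimension, buffer, type, retain flag
+    af::array b = afcl::array(num, buf, f32, true);
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~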
 
 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 cl::Context context(afcl::getContext(true));
 cl::CommandQueue queue(afcl::getQueue(true));
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We start to use these functions by getting Arrayfire's context and queue. For the
-C++ api, a **true** flag must be passed for the retain parameter which calls the
-clRetainQueue() and clRetainContext() functions before returning. This allows us
-to use Arrayfire's internal OpenCL structures inside of the cl::Context and
-cl::CommandQueue objects from the C++ api. Once we have them, we can proceed to 
-set up and enqueue the kernel like we would in any other OpenCL program. 
-The kernel we are using is actually simple and can be seen below.
+
+We start to use these functions by getting Arrayfire's context and queue. For
+the C++ api, a `true` flag must be passed for the retain parameter which calls
+the `clRetainQueue()` and `clRetainContext()` functions before returning. This
+allows us to use Arrayfire's internal OpenCL structures inside of the
+cl::Context and cl::CommandQueue objects from the C++ api. Once we have them,
+we can proceed to set up and enqueue the kernel like we would in any other
+OpenCL program.  The kernel we are using is actually simple and can be seen
+below.
 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 std::string CONST_KERNEL_STRING = R"(
@@ -169,7 +180,6 @@ void simple_kernel(__global float *d_y,
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 # Reversing the workflow: Arrayfire arrays from OpenCL Memory
-
 Unfortunately, Arrayfire's interoperability functions don't yet allow us to work with
 external OpenCL contexts. This is currently an open issue and can be tracked here:
 https://github.com/arrayfire/arrayfire/issues/1002.
diff --git a/docs/pages/timing.md b/docs/pages/timing.md
index 675043f..4949c4e 100644
--- a/docs/pages/timing.md
+++ b/docs/pages/timing.md
@@ -60,5 +60,5 @@ int main() {
 
 This produces:
 
-	pi_function took 0.007252 seconds
-	(test machine: Core i7 920 @ 2.67GHz with a Tesla C2070)
+    pi_function took 0.007252 seconds
+    (test machine: Core i7 920 @ 2.67GHz with a Tesla C2070)
diff --git a/docs/pages/unified_backend.md b/docs/pages/unified_backend.md
index e11f75d..67e340f 100644
--- a/docs/pages/unified_backend.md
+++ b/docs/pages/unified_backend.md
@@ -51,6 +51,7 @@ DYLD_LIBRARY_PATH.
 
 On Windows, you can set up a post-build event that copies the NVVM DLLs to
 the executable directory by using the following commands:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.c}
 echo copy "$(CUDA_PATH)\nvvm\bin\nvvm64*.dll" "$(OutDir)"
 copy "$(CUDA_PATH)\nvvm\bin\nvvm64*.dll" "$(OutDir)"
@@ -59,6 +60,7 @@ if errorlevel 1 (
     exit /B 0
 )
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 This ensures that the NVVM DLLs are copied if present, but does not fail the
 build if the copy fails. This is how ArrayFire ships its examples.
 
diff --git a/docs/pages/vectorization.md b/docs/pages/vectorization.md
index cb4f529..8805ecf 100644
--- a/docs/pages/vectorization.md
+++ b/docs/pages/vectorization.md
@@ -15,6 +15,7 @@ arrays as a whole -- on all elements in parallel. Wherever possible, existing
 vectorized functions should be used as opposed to manually indexing into arrays.
 For example, consider this valid, yet misleading code that attempts to increment
 each element of an array:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array a = af::range(10); // [0,  9]
 for(int i = 0; i < a.dims(0); ++i)
@@ -24,6 +25,7 @@ for(int i = 0; i < a.dims(0); ++i)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Instead, the existing vectorized Arrayfire overload of the + operator should have been used:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array a = af::range(10);  // [0,  9]
 a = a + 1;                    // [1, 10]
@@ -37,19 +39,22 @@ Operator Category                                           | Functions
 [Complex operations](\ref complex_mat)                      | real(), imag(), conj(), etc.
 [Exponential and logarithmic functions](\ref explog_mat)    | exp(), log(), expm1(), log1p(), etc.
 [Hyperbolic functions](\ref hyper_mat)                      | sinh(), cosh(), tanh(), etc.
-[Logical operations](\ref logic_mat)                        | [&&](\ref arith_func_and), [\|\|](\ref arith_func_or), [<](\ref arith_func_lt), [>](\ref arith_func_gt), [==](\ref arith_func_eq), [!=](\ref arith_func_neq) etc.
+[Logical operations](\ref logic_mat)                        | [&&](\ref arith_func_and), \|\|[(or)](\ref arith_func_or), [<](\ref arith_func_lt), [>](\ref arith_func_gt), [==](\ref arith_func_eq), [!=](\ref arith_func_neq) etc.
 [Numeric functions](\ref numeric_mat)                       | abs(), floor(), round(), min(), max(), etc.
 [Trigonometric functions](\ref trig_mat)                    | sin(), cos(), tan(), etc.
 
-Not only elementwise arithmetic operations are vectorized in Arrayfire.
+In addition to element-wise operations, many other functions are also
+vectorized in Arrayfire.
 
 Vector operations such as min() support vectorization:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array arr = randn(100);
 std::cout << min<float>(arr) << std::endl;
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Signal processing functions like convolve() support vectorization:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 float g_coef[] = { 1, 2, 1,
                    2, 4, 2,
@@ -62,6 +67,7 @@ af::array conv = convolve2(signal, filter);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Image processing functions such as rotate() support vectorization:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array imgs = randu(WIDTH, HEIGHT, 100); // 100 (WIDTH x HEIGHT) images
 af::array rot_imgs = rotate(imgs, 45); // 100 rotated images
@@ -72,18 +78,19 @@ algebra functions. Using the built in vectorized operations should be the first
 and preferred method of vectorizing any code written with Arrayfire.
 
 # GFOR: Parallel for-loops
-Another novel method of vectorization present in Arrayfire is the GFOR loop 
+Another novel method of vectorization present in Arrayfire is the GFOR loop
 replacement construct. GFOR allows launching all iterations of a loop in parallel
-on the GPU or device, as long as the iterations are independent. While the 
+on the GPU or device, as long as the iterations are independent. While the
 standard for-loop performs each iteration sequentially, ArrayFire's gfor-loop
 performs each iteration at the same time (in parallel). ArrayFire does this by
-tiling out the values of all loop iterations and then performing computation on 
+tiling out the values of all loop iterations and then performing computation on
 those tiles in one pass. You can think of gfor as performing auto-vectorization
 of your code, e.g. you write a gfor-loop that increments every element of a vector
 but behind the scenes ArrayFire rewrites it to operate on the entire vector in
 parallel.
 
 We can remedy our first example with GFOR:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array a = af::range(10);
gfor(seq i, a.dims(0))
@@ -92,14 +99,17 @@ gfor(seq i, n)
 
 To see another example, you could run an accum() on every slice of a matrix in a
 for-loop, or you could "vectorize" and simply do it all in one gfor-loop operation:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 for (int i = 0; i < N; ++i)
    B(span,i) = accum(A(span,i)); // runs each accum() in sequence
 gfor (seq i, N)
    B(span,i) = accum(A(span,i)); // runs N accums in parallel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 However, returning to our previous vectorization technique, accum() is already
 vectorized and the operation could be completely replaced with merely:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
     B = accum(A);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -110,6 +120,7 @@ in the narrow case of broadcast-style operations. Consider the case when we have
 a vector of constants that we wish to apply to a collection of variables, such as
 expressing the values of a linear combination for multiple vectors. The broadcast
 of one set of constants to many vectors works well with gfor-loops:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 const static int p=4, n=1000;
 af::array consts = af::randu(p);
@@ -128,29 +139,36 @@ functions to multiple sets of data. Effectively, batchFunc() allows Arrayfire
 functions to execute in "batch processing" mode. In this mode, functions will
 find a dimension which contains "batches" of data to be processed and will
 parallelize the procedure.
+
 Consider the following example:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array filter = randn(1, 5);
 af::array weights = randu(5, 5);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 We have a filter that we would like to apply to each of several weights vectors.
 The naive solution would be using a loop as we've seen before:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array filtered_weights = constant(0, 5, 5);
 for(int i=0; i<weights.dims(1); ++i){
     filtered_weights.col(i) = filter * weights.col(i);
 }
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 However we would like a vectorized solution. The following syntax begs to be used:
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array filtered_weights = filter * weights; // fails due to dimension mismatch
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-but it fails due to the (5x1), (5x5) dimension mismatch. Wouldn't it be nice if
+
+This fails due to the (1x5), (5x5) dimension mismatch. Wouldn't it be nice if
 Arrayfire could figure out along which dimension we intend to apply the batch
 operation? That is exactly what batchFunc() does!
 The signature of the function is:
 
-> array batchFunc(const array &lhs, const array &rhs, batchFunc_t func);
+`array batchFunc(const array &lhs, const array &rhs, batchFunc_t func);`
 
 where __batchFunc_t__ is a function pointer of the form:
 `typedef array (*batchFunc_t) (const array &lhs, const array &rhs);`
@@ -159,6 +177,7 @@ where __batchFunc_t__ is a function pointer of the form:
 So, to use batchFunc(), we need to provide the function we will be applying as a
 batch operation. For illustration's sake, let's "implement" a multiplication
 function following the format.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array my_mult (const af::array &lhs, const af::array &rhs){
     return lhs * rhs;
@@ -167,6 +186,7 @@ af::array my_mult (const af::array &lhs, const af::array &rhs){
 
 Our final batch call is not much more difficult than the ideal
 syntax we imagined.
+
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
 af::array filtered_weights = batchFunc( filter, weights, my_mult );
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/arrayfire.git


