[clfft] 124/128: documentation updates

Thu Oct 22 14:54:48 UTC 2015

This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a commit to branch master
in repository clfft.

commit 92997f506923aa6ab601f869695bf4ea72d2f20d
Author: bragadeesh <bragadeesh.natarajan at amd.com>
Date:   Mon Oct 19 16:12:15 2015 -0700

    documentation updates
---
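Note on the supported-radices text in this patch: the updated mainpage.h says clFFT supports lengths that are any combination of powers of 2, 3, 5, and 7. The C sketch below shows only that arithmetic check. The helper name is_supported_length is hypothetical and is not part of the clFFT API; it does not query the library and ignores any device or transform size limits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a clFFT function): returns true when n can be
 * written as 2^a * 3^b * 5^c * 7^d with non-negative exponents. */
bool is_supported_length(size_t n)
{
    static const size_t radices[] = {2, 3, 5, 7};

    if (n == 0)
        return false;

    /* Divide out every factor of each supported radix. */
    for (size_t i = 0; i < sizeof radices / sizeof radices[0]; ++i)
        while (n % radices[i] == 0)
            n /= radices[i];

    /* Anything left over is a prime factor other than 2, 3, 5, or 7. */
    return n == 1;
}
```

Under this check, lengths such as 128 (2^7) and 5625 (3^2*5^4) pass, while 11 and 26 (2*13) are rejected.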
 src/include/clFFT.h    |  12 +-
 src/library/mainpage.h | 650 ++++++++++++++++++++-----------------------------
 2 files changed, 269 insertions(+), 393 deletions(-)
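
Note on the strides and distances text in mainpage.h: for tightly packed data, the documentation gives a stride of 1 between packed elements and dist=LenX, LenX*LenY, and LenX*LenY*LenZ for 1D, 2D, and 3D data. The C sketch below computes such a packed layout. packed_layout is a hypothetical helper, not a clFFT function, and it assumes that lengths[0] is the fastest-varying (X) dimension; the strides it produces are the kind of values that would be passed to clfftSetPlanInStride().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a clFFT function): fills strides[] for tightly
 * packed data of 1 to 3 dimensions and returns the distance between
 * consecutive FFT primitives in a batch. Assumes lengths[0] is the
 * fastest-varying dimension. */
size_t packed_layout(size_t dim, const size_t lengths[], size_t strides[])
{
    assert(dim >= 1 && dim <= 3);

    size_t step = 1;
    for (size_t i = 0; i < dim; ++i) {
        strides[i] = step;   /* elements to skip per unit step in dimension i */
        step *= lengths[i];  /* accumulate the packed extent so far */
    }
    return step;             /* LenX, LenX*LenY, or LenX*LenY*LenZ */
}
```

For lengths {64, 32, 16} this yields strides {1, 64, 2048} and a distance of 32768.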

diff --git a/src/include/clFFT.h b/src/include/clFFT.h
index 8c8dd3d..2b4305a 100644
--- a/src/include/clFFT.h
+++ b/src/include/clFFT.h
@@ -555,12 +555,12 @@ extern "C" {
 	 *  clFFT library incorporates the callback function string into the main FFT kernel. This function is used
 	 *  by client to set the necessary parameters for callback
 	 *  @param[in] plHandle Handle to a previously created plan
-	 *  @param[funcName] Callback function name
-	 *  @param[funcString] Callback function in string form
-	 *  @param[localMemSize] Optional - Size (bytes) of the local memory used by callback function; pass 0 if no local memory is used
-	 *  @param[callbackType] Type of callback - Pre-Callback or Post-Callback
-	 *  @param[userdata] Supplementary data if any used by callback function
-	 *  @param[numUserdataBuffers] Number of userdata buffers
+	 *  @param[in] funcName Callback function name
+	 *  @param[in] funcString Callback function in string form
+	 *  @param[in] localMemSize Optional - Size (bytes) of the local memory used by callback function; pass 0 if no local memory is used
+	 *  @param[in] callbackType Type of callback - Pre-Callback or Post-Callback
+	 *  @param[in] userdata Supplementary data if any used by callback function
+	 *  @param[in] numUserdataBuffers Number of userdata buffers
 	 */
 	CLFFTAPI clfftStatus clfftSetPlanCallback(clfftPlanHandle plHandle, const char* funcName, const char* funcString,
 										int localMemSize, clfftCallbackType callbackType, cl_mem *userdata, int numUserdataBuffers);
diff --git a/src/library/mainpage.h b/src/library/mainpage.h
index 060c1af..a0dc6b5 100644
--- a/src/library/mainpage.h
+++ b/src/library/mainpage.h
@@ -1,5 +1,5 @@
 /* ************************************************************************
- * Copyright 2013 Advanced Micro Devices, Inc.
+ * Copyright 2013-2015 Advanced Micro Devices, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -20,307 +20,181 @@ This file contains all documentation, no code, in the form of comment text.  It'
 chapter 1 of the documentation we produce with doxygen.  This included the title page, installation instructions
 and prose on the nature of FFT's and their use in our library.
 
-@mainpage OpenCL Fast Fourier Transforms (FFT's)
-
-The clFFT library is an OpenCL library implementation of discrete Fast Fourier Transforms. It:
-@li Provides a fast and accurate platform for calculating discrete FFTs.
-@li Works on CPU or GPU backends.
-@li Supports in-place or out-of-place transforms.
-@li Supports 1D, 2D, and 3D transforms with a batch size that can be greater than 1.
-@li Supports planar (real and complex components in separate arrays) and interleaved (real and complex
-components as a pair contiguous in memory) formats.
-@li Supports dimension lengths that can be any mix of powers of 2, 3, and 5.
-@li Supports single and double precision floating point formats.
-
-@section InstallFFT Installation of clFFT library
-
-@subsection DownBinaries Downloadable Binaries
-clFFT library pre-compiled packages for recent versions of Microsoft Windows operating systems
-and several flavors of Linux are available.
-
-The downloadable binary packages are freely available at
-https://github.com/clMathLibraries/clFFT/releases
-
-Once the appropriate package for the respective OS has finished downloading,
-uncompress the package using the native tools available on the platform in a
-directory of the user's choice. Everything needed to build a program using
-clFFT is included in the directory tree, including documentation, header files,
-binary library components, and sample programs for programming illustration.
-
-@subsubsection CMakeDependancy CMake
-After the clFFT package is uncompressed on the user's hard drive, a samples directory exists with source code,
-but no Visual Studio project files, Unix makefiles, or other native build system exist. Instead, it contains a
-\c CMakeLists.txt file. clFFT uses CMake as its build system, and other build files, such as Visual Studio projects,
-nmake makefiles, or Unix makefiles, are generated by the CMake build system, during configuration. CMake is freely
-available for download from: http://www.cmake.org/
-
-@note CMake generates the native OS build files, so any changes made to the native build files are overwritten the
-next time CMake is run.
-
-CMake is written to pull compiler information from environment variables, and to look in default install
-directories for tools. Once installed, a popular interface to control the process of creating native build
-files is CMake-gui. When the GUI is launched, two text boxes appear at the top of the dialog: a path to
-source and a separate path to generate binaries. For the \c browse source... box, find the path to where you
-unzipped clFFT, and select the root \c samples directory that contains the CMakeLists.txt; for clFFT,
-this should be \c clFFT/samples.  For \c browse \c build..., select an appropriate directory where the build
-environment generates build files; a convenient location is a sibling directory to the source. This makes
-it easy to wipe all the binaries and start a fresh build. For instance, for a debug configuration of NMake,
-an example directory could be \c clFFT/bin/NMakeDebug. This is where the generated makefile, native build
-files, and intermediate object files are built. These generated files are kept separate from the source;
-this is referred to as 'out-of-source' builds, and is very similar in concept to what 'autotools' does for Linux.
-To build using NMake, simply type NMake in the build directory containing the makefile. To build using
-Visual Studio, generate the solution and project files into a directory such as \c clFFT/bin/vs10, find the
-generated \c .sln file, and open the solution.
-
-The first time the \c configure button near the bottom of the screen is clicked, it causes CMake to prompt for
-what type of native build files to make. Various properties appear in red in the \c properties box. Red indicates
-that the value has changed since last time \c configure was clicked. (The first time configure is clicked,
-everything is red.) CMake tries to configure itself automatically to the client's system by looking at a systems
-environment variables and by searching through default install locations for project dependencies. Take a moment to
-verify the settings and paths that are displayed on the configuration screen; if any changes must be made, you can
-provide correct paths or adjust settings by typing directly into the CMake configuration screen. Click the
-\c configure button a second time to 'bake' those settings and serialize them to disk.
-
-Options relevant to the clFFT project include:
-
-@li \c 'AMDAPPSDKROOT': Location of the Stream SDK installation. This value is already populated if CMake
-could determine the location by looking at the environment variables. If not, the user must provide a path to
-the root installation of the Stream SDK here.
-
-@li \c 'BOOST_ROOT':  Location of the Boost SDK installation. This value is already populated if CMake could
-determine the location by looking at the environment variables or default install locations. If not, the user must
-provide a path to the root installation of the Stream SDK here. This dependency is only relevant to the sample
-client; the FFT library does not depend on Boost.
-
-@li \c 'CMAKE_BUILD_TYPE':  Defines the build type (default is debug). For Visual Studio projects, this does
-not appear (modifiable in IDE); for makefile-based builds, this is set in CMake.
-
-@li \c 'CMAKE_INSTALL_PREFIX':  The path to install all binaries and headers generated from the build. This is
-used when the user types \c make \c install or builds the INSTALL project in Visual Studio. All generated binaries and
-headers are copied into the path prefixed with \c CMAKE_INSTALL_PREFIX.  The Visual Studio projects are self
-explanatory, but a few other projects are autogenerated; these might be unfamiliar.
-
-The Visual Studio projects are self explanatory, but a few other projects are autogenerated; these might be unfamiliar.
-
-@li \c 'ALL_BUILD': A project that is empty of files, but since it depends on all user projects, it provides a
-convenient way to rebuild everything.
-
-@li \c 'ZERO_CHECK':  A CMake-specific project that checks to see if the generated solution and project files are in sync
-with the \c CMakeLists.txt file. If these files are modified, the solutions and projects are now out-of-sync, and this
-project prompts the user to regenerate their environment.
-
-@note If the user chooses to build on Windows with a NMake based build, it is important to launch CMake from within a
-Visual Studio Command Prompt (20xx).  This is because CMake must be able to parse environment variables to properly
-initialize NMake. This is not necessary if a Visual Studio solution is generated, because solution files contain their
-own environmental setup.
-
-@subsubsection BoostDependancy Boost
-clFFT includes one sample project that has source dependencies on Boost: the sample client project. Boost is
-freely available from:  http://www.boost.org/.
-
-The command-line clFFT sample client links with the \c program_options library, which provides functionality for
-parsing command-line parameters and \c .ini files in a cross-platform manner. Once Boost is downloaded and
-extracted on the hard drive, the \c program_options library must be compiled. The Boost build system
-uses the BJam builder (a project for a CMake-based Boost build is available for separate download). This is
-available for download from the Boost website, or the user can build BJam; Boost includes the source to BJam
-in its distribution, and the user can execute \c bootstrap.bat (located in the root boost directory) to build it.
-
-After BJam is either built or installed, an example BJam command-line is given below for building a 64-bit
-\c program_options binary, for both static and dynamic linking:
-@code
-bjam --with-program_options address-model=64 link=static,shared stage
-@endcode
-
-The last step to make boost readily available and usable by CMake and the native compiler is to add an environment
-variable to the system called \c BOOST_ROOT. In Windows, right click on the computer icon and go to
-@code
-'Properties|Advanced system settings|Advanced|Environment Variables...'
-@endcode
-Remember to relaunch any new processes that are open, in order to inherit the new environment variable. On Linux,
-consider modifying the \c .bash_rc file (or shell equivalent) to export a new environment variable every time you log in.
+@mainpage OpenCL Fast Fourier Transforms (FFTs)
 
-If you are on a Linux system and have used a package manager to install Boost, you may have to confirm where the Boost
-\c include and \c library files have been placed. For example, after installing Boost with the Ubuntu Synaptic Package
-Manager, the Boost \c include files are in \c /usr/include/boost, and the library files either \c /usr/lib or \c /usr/lib64.
-The \c CMakeLists.txt file in this project defaults the \c BOOST_ROOT value to \c /usr on Linux; so, if the system is set up
-similarly, no further action is necessary. If the system is set up differently, you may have to set the \c BOOST_ROOT
-environmental variable accordingly.
+The clFFT library is an OpenCL library implementation of discrete Fast Fourier Transforms. The library:
+@li provides a fast and accurate platform for calculating discrete FFTs.
+@li works on CPU or GPU backends.
+@li supports in-place or out-of-place transforms.
+@li supports 1D, 2D, and 3D transforms with a batch size that can be greater than or equal to 1.
+@li supports planar (real and imaginary components are stored in separate arrays) and interleaved (real and imaginary
+components are stored as a pair in the same array) formats.
+@li supports lengths that are any combination of powers of 2, 3, 5, and 7.
+@li supports single and double precision floating point formats.
 
-@note Note that CMake does not recognize version numbers at the end of the library filename; so, if the package
-manager only created a \c libboost_module_name.so.x.xx.x file (where x.xx.x is the version of Boost),
-the user may need to manually create a soft link called \c libboost_module_name.so to the versioned
-\c libboost_module_name.so.x.xx.x. See the clFFT binary artifacts in the install directory for an example.
 
 @section IntroFFT Introduction to clFFT
 
 The FFT is an implementation of the Discrete Fourier Transform (DFT) that makes use of symmetries in the FFT
 definition to reduce the mathematical intensity required from O(\f$N^2\f$) to O(\f$ N \log N\f$) when the
-sequence length, \c N, is the product of small prime factors.  Currently, there is no standard API for FFT
+sequence length, *N*, is the product of small prime factors.  Currently, there is no standard API for FFT
 routines. Hardware vendors usually provide a set of high-performance FFTs optimized for their systems:
 no two vendors employ the same interfaces for their FFT routines. clFFT provides a set of FFT routines that
-are optimized for AMD graphics processors, and that also functional across CPU and other compute devices.
+are optimized for AMD graphics processors, and that are also functional across CPU and other compute devices.
 
-@subsection SupportRadix Supported Radices
-clFFT supports powers of 2, 3 and 5 sizes. This means that the vector lengths that can be
-configured through a plan can be any length that is a power of two, three, and five; examples include \f$2^7, 2^1*3^1, 3^2*5^4, 2^2*3^3*5^5\f$,
+@subsection SupportRadix Supported radices
+clFFT supports transform sizes that are powers of 2, 3, 5, and 7. This means that the vector lengths that can be
+configured can be of any length that is a combination of powers of two, three, five, and seven; examples include \f$2^7, 2^1*3^1, 3^2*5^4, 2^2*3^3*5^5\f$,
 up to the limit that the device can support.
 
-@subsection SizeLimit Transform Size Limits
-Currently, there is an upper bound on the transform size the library supports. This
-limit is \f$2^{24}\f$ for single precision and \f$2^{22}\f$ for double precision. This means that the
-product of transform lengths must not exceed these values. As an example, a
-1D single-precision FFT of size 1024 is valid since 1024 \f$<= 2^{24}\f$. Similarly, a 2D
-double-precision FFT of size 1024x1024 is also valid, since 1024*1024 \f$<= 2^{22}\f$.
-But, a 2D single-precision FFT of size 4096x8192 is not valid because
-4096*8192 > 224.
+@subsection SizeLimit Transform size limits
+Currently, there is an upper bound on the transform size that the library can support for certain transforms. This
+limit is \f$2^{24}\f$ for real 1D single precision and \f$2^{22}\f$ for real 1D double precision.
 
 @subsection EnumDim Dimensionality
-clFFT currently supports FFTs of up to three dimensions, given by the enum \c clFFT-Dim. This enum
-is a required parameter into \c clfftCreateDefaultPlan() to create an initial plan; there is no default for
-this parameter. Depending on the dimensionality that the client requests, clFFT uses the formulations
-shown below to compute the DFT.
+clFFT currently supports FFTs of (up to) three dimensions, given by the enum @ref clfftDim. This enum
+is a required parameter of @ref clfftCreateDefaultPlan() to create an initial plan, where a plan is the collection of
+(almost) all the parameters needed to specify an FFT computation. For more information about clFFT plans, see the section \ref clFFTPlans.
+Depending on the dimensionality that the client requests, clFFT uses the following formulations to compute the DFT:
 
-The definition of a 1D complex DFT used by clFFT is given by:
+@li For a 1D complex DFT
 \f[
 {\tilde{x}}_j = {{1}\over{scale}}\sum_{k=0}^{n-1}x_k\exp\left({\pm i}{{2\pi jk}\over{n}}\right)\hbox{ for } j=0,1,\ldots,n-1
 \f]
-where \f$x_k\f$ are the complex data to be transformed, \f$\tilde{x}_j\f$ are the transformed data, and the sign
-of \f$\pm\f$ determines the direction of the transform: \f$-\f$ for forward and \f$+\f$ for backward. Note that
-the user must provided the scaling factor.  Typically, the scale is set to 1 for forward transforms, and
-\f${{1}\over{N}}\f$ for backwards transforms.
+where \f$x_k\f$ are the complex data to be transformed, \f$\tilde{x}_j\f$ are the transformed data, and the sign \f$\pm\f$
+determines the direction of the transform: \f$-\f$ for forward and \f$+\f$ for backward. Note that
+you must provide the scaling factor.  By default, the scale is set to 1 for forward transforms, and
+\f${{1}\over{N}}\f$ for backward transforms, where *N* is the size of the transform.
 
-The definition of a complex 2D DFT used by clFFT is given by:
+@li For a 2D complex DFT
 \f[
 {\tilde{x}}_{jk} = {{1}\over{scale}}\sum_{q=0}^{m-1}\sum_{r=0}^{n-1}x_{rq}\exp\left({\pm i} {{2\pi jr}\over{n}}\right)\exp\left({\pm i}{{2\pi kq}\over{m}}\right)
 \f]
-for \f$j=0,1,\ldots,n-1\hbox{ and } k=0,1,\ldots,m-1\f$, where \f$x_{rq}\f$ are the complex data to be transformed,
-\f$\tilde{x}_{jk}\f$ are the transformed data, and the sign of \f$\pm\f$ determines the direction of the
-transform.  Typically, the scale is set to 1 for forwards transforms and \f${{1}\over{M \cdot N}}\f$ for backwards transforms.
+for \f$j=0,1,\ldots,n-1\hbox{ and } k=0,1,\ldots,m-1\f$, where \f$x_{rq}\f$ are the complex data to be transformed,
+\f$\tilde{x}_{jk}\f$ are the transformed data, and the sign \f$\pm\f$ determines the direction of the
+transform.  By default, the scale is set to 1 for forward transforms and \f${{1}\over{MN}}\f$ for backward transforms,
+where *M* and *N* are the 2D size of the transform.
 
-The definition of a complex 3D DFT used by clFFT is given by:
+@li For a 3D complex DFT
 \f[
 \tilde{x}_{jkl} = {{1}\over{scale}}\sum_{s=0}^{p-1}\sum_{q=0}^{m-1}\sum_{r=0}^{n-1}
 x_{rqs}\exp\left({\pm i} {{2\pi jr}\over{n}}\right)\exp\left({\pm i}{{2\pi kq}\over{m}}\right)\exp\left({\pm i}{{2\pi ls}\over{p}}\right)
 \f]
 for \f$j=0,1,\ldots,n-1\hbox{ and } k=0,1,\ldots,m-1\hbox{ and } l=0,1,\ldots,p-1\f$, where \f$x_{rqs}\f$ are the complex data
-to be transformed, \f$\tilde{x}_{jkl}\f$ are the transformed data, and the sign of \f$\pm\f$ determines the direction of the
-transform.  Typically, the scale is set to 1 for forwards transforms and \f${{1}\over{M \cdot N \cdot P}}\f$ for backwards transforms.
+to be transformed, \f$\tilde{x}_{jkl}\f$ are the transformed data, and the sign \f$\pm\f$ determines the direction of the
+transform. By default, the scale is set to 1 for forward transforms and \f${{1}\over{MNP}}\f$ for backward transforms,
+where *M*, *N*, and *P* are the 3D size of the transform.
 
 @subsection InitLibrary Setup and Teardown of clFFT
-clFFT is initialized by a call to \c clfftSetup(), which must be called before any other API exported
-from clFFT. This allows the library to create resources used to manage the plans that are created and
-destroyed by the user. This API also takes a structure \c clfftInitSetupData that is initialized by the
-client to control the behavior of the library. The corresponding \c clfftTeardown() method must be called
-by the client when it is done using the library. This instructs clFFT to release all resources, including
-any acquired references to any OpenCL objects that may have been allocated or passed to it through the
-API.
+clFFT is initialized by the API @ref clfftSetup(), which must be called before any other API of
+clFFT. This allows the library to create resources needed to manage the plans that you create and
+destroy. This API also takes a @ref clfftSetupData structure, which the client initializes (for example,
+with @ref clfftInitSetupData()) to control the behavior of the library.
+
+After you finish using the library, you must call @ref clfftTeardown(). This function instructs clFFT to release all resources allocated
+internally, and resets acquired references to any OpenCL objects.
 
 @subsection ThreadSafety Thread safety
-The clFFT API is designed to be thread-safe. It is safe to create plans from multiple threads, and to
-destroy those plans in separate threads. Multiple threads can call \c clfftEnqueueTransform() to place work
-into a command queue at the same time. clFFT does not provide a single-threaded version of the library.
-It is expected that the overhead of the synchronization mechanisms inside of clFFT thread safe is minor.
+The clFFT API is designed to be thread-safe. It is safe to create plans from multiple threads and to
+destroy those plans in separate threads. Multiple threads can call @ref clfftEnqueueTransform() to place work
+in a command queue at the same time. clFFT does not provide a single-threaded version of the library.
+The overhead of the synchronization mechanisms that make clFFT thread-safe is expected to be minor.
 
-Currently, multi-device operation must be managed by the user. OpenCL contexts can be created that are
+Currently, you must manage multi-device operation yourself. You can create OpenCL contexts that are
 associated with multiple devices, but clFFT only uses a single device from that context to transform
-the data. Multi-device operation can be managed by the user by creating multiple contexts, where each
-context contains a different device, and the user is responsible for scheduling and partitioning the work
+the data. You can manage a multi-device operation by creating multiple contexts, in which each
+context contains a different device; you are responsible for scheduling and partitioning the work
 across multiple devices and contexts.
 
-@subsection MajorFormat Row Major formats
+@subsection MajorFormat Row major formats
 clFFT expects all multi-dimensional input passed to it to be in row-major format. This is compatible
-with C-based languages. However, clFFT is very flexible in the input and output data organization it
-accepts by allowing the user to specify a stride for each dimension. This feature can be used to process
-data in column major arrays, and other non-contiguous data formats. See \ref clfftSetPlanInStride and
-\ref clfftSetPlanOutStride.
+with C-based languages. However, clFFT is very flexible in the organization of the input and output data that it
+accepts, letting you specify a stride for each dimension. This feature can be used to process
+data in column major arrays and other non-contiguous data formats. See @ref clfftSetPlanInStride() and
+@ref clfftSetPlanOutStride().
 
 @subsection Object OpenCL object creation
-OpenCL objects, such as contexts, \c cl_mem buffers, and command queues, are the responsibility of the
-user application to allocate and manage. All of the clFFT interfaces that must interact with OpenCL
-objects take those objects as references through the API. Specifically, the plan creation function
-@ref clfftCreateDefaultPlan() takes an OpenCL context as a parameter reference, increments the reference
-count on that object, and keeps the object alive until the corresponding plan has been destroyed through
-a call to @ref clfftDestroyPlan().
+Your application must allocate and manage OpenCL objects, such as contexts, *cl_mem* buffers, and command queues.
+All the clFFT interfaces that interact with OpenCL objects take those objects as references through the API.
+Specifically, the plan creation function @ref clfftCreateDefaultPlan() takes an OpenCL context as a parameter
+reference, increments the reference count on that object, and keeps the object alive until the corresponding plan
+is destroyed by a call to @ref clfftDestroyPlan().
 
 @subsection FlushQueue Flushing of command queues
-The clFFT API operates asynchronously, and with the exception of thread safety locking with multiple
+The clFFT API operates asynchronously; with the exception of thread safety locking with multiple
 threads, all APIs return immediately. Specifically, the @ref clfftEnqueueTransform() API does not
-explicitly flush the command queues that are passed by reference to it; it pushes the transform work onto the
+explicitly flush the command queues that are passed by reference to it. It pushes the transform work onto the
 command queues and returns the modified queues to the client. The client is free to issue its own blocking
-logic, using OpenCL synchronization mechanisms, or push further work onto the queue to continue processing.
+logic using OpenCL synchronization mechanisms or push further work onto the queue to continue processing.
 
-@section clFFTPlans clFFT Plans
+@section clFFTPlans clFFT plans
 
-A plan is the collection of (almost) all of the parameters needed to specify an FFT computation.
-This includes:
+A plan is the collection of (almost) all the parameters needed to specify an FFT computation.
+A clFFT plan includes the following parameters:
 <ul>
-<li> What OpenCL context executes the transform?
-<li> Is this a 1D, 2D or 3D transform?
-<li> What are the lengths or extents of the data in each dimension?
-<li> How many datasets are being transformed?
-<li> What is the data precision?
-<li> Should a scaling factor be applied to the transformed data?
-<li> Does the output transformed data replace the original input data in the same buffer (or
-buffers), or is the output data written to a different buffer (or buffers).
-<li> How is the input data stored in its data buffers?
-<li> How is the output data stored in its data buffers?
+<li> The OpenCL context that executes the transform
+<li> Dimension of the transform (1D, 2D or 3D)
+<li> Length or extent of data in each dimension
+<li> Number of datasets that are transformed
+<li> Precision of the data
+<li> Scaling factor applied to the transformed data
+<li> In-place or Out-of-place transform
+<li> Format of the input data - interleaved, planar or real
+<li> Format of the output data - interleaved, planar or real
 </ul>
 
-The plan does not include:
+The clFFT plan does not include the following parameters:
 <ul>
 <li> The OpenCL handles to the input and output data buffers.
 <li> The OpenCL handle to a temporary scratch buffer (if needed).
-<li> Whether to execute a forward or reverse transform.
+<li> Direction of execution of the transform (forward or reverse transform).
 </ul>
-These are specified when the plan is executed.
+These parameters are specified when the plan is executed.
 
-@subsection Default Default Plan Values
+@subsection Default Default plan values
 
-When a new plan is created by calling @ref clfftCreateDefaultPlan, its parameters are initialized as
+When a new plan is created by calling @ref clfftCreateDefaultPlan(), its parameters are initialized as
 follows:
 
 <ul>
-<li> Dimensions: as provided by the caller.
-<li> Lengths: as provided by the caller.
-<li> Batch size: 1.
-<li> Precision: \c CLFFT_SINGLE.
+<li> Dimensions: as provided by the caller
+<li> Lengths: as provided by the caller
+<li> Batch size: 1
+<li> Precision: *CLFFT_SINGLE*
 <li> Scaling factors:
     <ol>
-    <li> For the forward transform, the default is 1.0, or no scale factor is applied.
-    <li> For the reverse transform, the default is 1.0 / P, where P is the product of the FFT lengths.
+    <li> for the forward transform, the default is 1.0, or no scale factor is applied
+    <li> for the reverse transform, the default is 1.0 / P, where P is the product of the FFT lengths
     </ol>
-<li> Location: \c CLFFT_INPLACE.
-<li> Input layout: \c CLFFT_COMPLEX_INTERLEAVED.
+<li> Location: *CLFFT_INPLACE*
+<li> Input layout: *CLFFT_COMPLEX_INTERLEAVED*
 <li> Input strides: the strides of a multidimensional array of the lengths specified, where the data is
-compactly stored using the row-major convention.
-<li> Output layout: \c CLFFT_COMPLEX_INTERLEAVED.
-<li> Output strides: same as input strides.
+compactly stored using the row-major convention
+<li> Output layout: *CLFFT_COMPLEX_INTERLEAVED*
+<li> Output strides: same as input strides
 </ul>
 
 Writing client programs that depend on these initial values is <b> not </b> recommended.
 
-@subsection EnumLayout Supported Memory Layouts
-There are two main families of Discrete Fourier Transform (DFT):
-<ul>
-<li> Routines for the transformation of complex data. clFFT supports two layouts to store complex numbers:
-a 'planar' format, where the real and imaginary components are kept in separate arrays:
+@subsection EnumLayout Supported memory layouts
+There are two main types of Discrete Fourier Transform (DFT) in clFFT:
 <ol>
-	<li> Buffer1: \c RRRRR
-	<li> Buffer2: \c IIIII
-</ol>
-and an interleaved format, where the real and imaginary components are stored as contiguous pairs:
-<ol>
-	<li> Buffer1: \c RIRIRIRIRIRI
-</ol>
-<li> Routines for the transformation of real to complex data and vice versa; clFFT provides enums to define
-these formats. For transforms involving real data, there are two possibilities:
+<li> Transformation of complex data - clFFT supports the following two layouts to store complex numbers:
 <ul>
-<li> Real data being subject to forward FFT transform that results in complex
-data.
-<li> Complex data being subject to backward FFT transform that results in
-real data. See the Section "FFTs of Real Data".
+  <li> Planar format - where the real and imaginary components are kept in separate arrays: \n
+	   Buffer1: **RRRRR**  \n
+	   Buffer2: **IIIII**
+  <li> Interleaved format - where the real and imaginary components are stored as contiguous pairs:  \n
+	   Buffer1: **RIRIRIRIRIRI**
 </ul>
+<li> Transformation of real to complex data and vice versa - clFFT provides enums to define these formats.
+For transforms involving real data, there are two possibilities:
+<ul>
+<li> Real data being subject to forward FFT transform that results in complex data.
+<li> Complex data being subject to backward FFT transform that results in
+real data. See the section \ref RealFFT.
 </ul>
+</ol>
 
 @subsubsection DistanceStridesandPitches Strides and Distances
 For one-dimensional data, if clStrides[0] = strideX = 1, successive elements in the first dimension are stored contiguously
@@ -337,151 +211,152 @@ row-major or column-major arrays. Data can be extracted from arrays of structure
 data storage pattern can be accommodated.
 
 Distance is the amount of memory that exists between corresponding elements
-in an FFT primitive in a batch. Distance is measured in the units of the FFT
+in an FFT primitive in a batch. Distance is measured in units of the FFT
 primitive; complex data measures in complex units, and real data measures in
-real data. Stride between tightly packed elements is 1 in either case. Typically,
+real units. Stride between tightly packed elements is 1 in either case. Typically,
 one can measure the distance between any two elements in a batch primitive,
 be it 1D, 2D, or 3D data. For tightly packed data, the distance between FFT
 primitives is the size of the FFT primitive, such that dist=LenX for 1D data,
 dist=LenX*LenY for 2D data, and dist=LenX*LenY*LenZ for 3D data. It is
 possible to set the distance of a plan to be less than the size of the FFT vector;
 most often 1 for this case. When computing a batch of 1D FFT vectors, if
-distance == 1, and strideX == length( vector ), a transposed output is produced
-for a batch of 1D vectors. It is left to the user to verify that the distance and
-strides are valid (not intersecting); if not valid, undefined results can occur.
+distance == 1, and strideX == length(vector), a transposed output is produced
+for a batch of 1D vectors. You must verify that the distance and
+strides are valid (not intersecting); if not valid, undefined results may occur.
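
The relationship between strides and distance can be sketched as a single address computation. The helper below is hypothetical (it is not a clFFT function) and assumes offsets are counted in elements of the FFT primitive, as the paragraph above describes.

```c
#include <stddef.h>

/* Hypothetical helper (not part of clFFT): offset, in elements, of point
 * (x, y, z) of transform number `batch`, given the per-dimension strides
 * and the distance between consecutive transforms in the batch. */
size_t element_offset(size_t batch, size_t x, size_t y, size_t z,
                      const size_t strides[3], size_t dist)
{
    return batch * dist + x * strides[0] + y * strides[1] + z * strides[2];
}
```

For tightly packed 2D data with LenX = 4 and LenY = 3, the strides are {1, 4} and dist = 12. For the transposed batch of 1D vectors mentioned above, strideX equals the vector length and dist = 1.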
 
-A simple example is to perform a 1D length 4096 on each row of an array of 1024 rows x 4096 columns of
+A simple example would be to perform a 1D length 4096 FFT on each row of an array of 1024 rows x 4096 columns of
 values stored in a column-major array, such as a FORTRAN program might provide. (This would be equivalent
-to a C or C++ program that had an array of 4096 rows x 1024 columns stored in a row-major manner, and
-you wanted to perform a 1-D length 4096 transform on each column.) In this case, specify the strides
+to a C or C++ program that has an array of 4096 rows x 1024 columns stored in a row-major manner, where
+you want to perform a 1-D length 4096 transform on each column.) In this case, specify the strides
 [1024, 1].
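
One way to read the strides [1024, 1] in this example is that consecutive points of one transform sit 1024 elements apart, and consecutive transforms start 1 element apart. That reading is an assumption made for this sketch; the function names are hypothetical.

```c
#include <stddef.h>

enum { ROWS = 1024, COLS = 4096 };

/* Column-major offset of element (row, col) in a ROWS x COLS array. */
size_t column_major_offset(size_t row, size_t col)
{
    return col * ROWS + row;
}

/* Offset of point j of the 1D transform taken along row `row`,
 * assuming a step of 1024 inside a transform and a step of 1
 * between the starting points of successive transforms. */
size_t row_fft_offset(size_t row, size_t j)
{
    return j * 1024 + row * 1;
}
```

Under that assumption the two functions agree for every (row, j) pair, which is why no data needs to be rearranged before the transform.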
 
-For a more complex example, an input buffer contained a raster grid of 1024 x 1024 monochrome pixel
-values, and you want to compute a 2D FFT for each 64 x 64 subtile of the grid. Specifying strides
+A more complex example would be to compute a 2D FFT for each 64 x 64 subtile of the grid that has an input
+buffer with a raster grid of 1024 x 1024 monochrome pixel values. Specifying strides
 allows you to treat each horizontal band of 1024 x 64 pixels as an array of 16 64 x 64 matrixes,
-and process an entire band with a single call to @ref clfftEnqueueTransform. (Specifying strides is not
+and process an entire band with a single call to @ref clfftEnqueueTransform(). (Specifying strides is not
 quite flexible enough to transform the entire grid of this example with a single kernel execution.)
 It is possible to create a Plan to compute arrays of 64 x 64 2D FFTs, then specify three strides:
 [1, 1024, 64]. The first stride, 1, indicates that the rows of each matrix are stored consecutively;
 the second stride, 1024, gives the distance between rows, and the third stride, 64, defines the
-distance from matrix to matrix. Then call @ref clfftEnqueueTransform 16 times: once for each
+distance between two matrices. Then call @ref clfftEnqueueTransform() 16 times – once for each
 horizontal band of pixels.
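
The addressing implied by the strides [1, 1024, 64] can be sketched as follows. The function is hypothetical; it assumes the grid is stored row by row and that each of the 16 calls is given the start of its band.

```c
#include <stddef.h>

enum { GRID = 1024, TILE = 64 };

/* Hypothetical helper: offset of pixel (x, y) inside tile `tile` (0..15)
 * of horizontal band `band` (0..15), using strides [1, 1024, 64]. */
size_t subtile_offset(size_t band, size_t tile, size_t x, size_t y)
{
    size_t band_start = band * TILE * GRID;  /* each band spans 64 full rows */
    return band_start
         + x * 1            /* first stride: neighbours within a row   */
         + y * GRID         /* second stride: next row of the tile     */
         + tile * TILE;     /* third stride: next 64 x 64 tile in band */
}
```

The last pixel of the last tile in band 0 lands at offset 65535, the final element of that band, so the 16 tiles of a band cover it exactly once.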
 
-@subsection EnumPrecision Supported Precisions in clFFT
-Both \c CLFFT_SINGLE and \c CLFFT_DOUBLE precisions are supported by the library
-for all supported radices. With both of these enums the host computer's math
-functions are used to produce tables of sines and cosines for use by the OpenCL
-kernel.
-
-Both \c CLFFT_SINGLE_FAST and \c CLFFT_DOUBLE_FAST are meant to generate faster
-kernels with reduced accuracy, but are disabled in the current build..
+@subsection EnumPrecision Supported precisions in clFFT
+Both *CLFFT_SINGLE* and *CLFFT_DOUBLE* precisions are supported by the library
+for all supported radices. For both these enums the math functions of the host computer are used to
+produce the sine and cosine tables that are used by the OpenCL kernel.
+Both *CLFFT_SINGLE_FAST* and *CLFFT_DOUBLE_FAST* generate faster kernels with reduced accuracy,
+but are disabled in the current build.
 
-See @ref clfftPrecision, @ref clfftSetPlanPrecision, and @ref clfftGetPlanPrecision.
+See @ref clfftPrecision, @ref clfftSetPlanPrecision(), and @ref clfftGetPlanPrecision().
 
 @subsection FftDirection clfftDirection
-The direction of the transform is not baked into the plan; the same plan can be used to specify both forward
-and backward transforms. Instead, @ref clfftDirection is passed as a parameter into @ref clfftEnqueueTransform.
-
-@subsection EnumResultLocation In-Place and Out-of-Place
-The clFFT API supports both in-place and out-of-place transforms. With inplace
-transforms, only input buffers are provided to the @ref clfftEnqueueTransform() API,
-and the resulting data is written in the same buffers, overwriting the input data.
+The direction of the transform is not baked into the plan for complex transforms; the same plan can be used to specify both forward
+and backward transforms. To specify the direction, @ref clfftDirection is passed as a parameter into @ref clfftEnqueueTransform().
+In the case of real transforms, the plan's input and output layouts determine the direction.
+
+@subsection EnumResultLocation In-place and out-of-place transforms
+The clFFT API supports both in-place and out-of-place transforms. With in-place
+transforms, only the input buffers are provided to the @ref clfftEnqueueTransform() API,
+and the resulting data is written in the same buffer, overwriting the input data.
 With out-of-place transforms, distinct output buffers are provided to the
-@ref clfftEnqueueTransform() API, and the inputdata is preserved.
-In-place transforms require that the \c cl_mem objects the client
-creates have both \c read and \c write permissions. This is given in the nature of the
-in-place algorithm. Out-of-place transforms require that the destination buffers
-have \c read and \c write permissions, but input buffers can still be created with
+@ref clfftEnqueueTransform() API, and the input data is preserved.
+In-place transforms require that the *cl_mem* objects the client
+creates have both read and write permissions. This follows from the nature of the
+in-place algorithm. Out-of-place transforms require that the destination buffers
+have read and write permissions, but input buffers can still be created with
 read-only permissions. This is a clFFT requirement because internally the
 algorithms may go back and forth between the destination buffers and internally
 allocated temp buffers. For out-of-place transforms, clFFT never writes back
-to the input buffers.
+to input buffers.
 
 @subsection clFFTEff Batches
 The efficiency of clFFT is improved by utilizing transforms in batches. Sending
 as much data as possible in a single transform call leverages the parallel
 compute capabilities of OpenCL devices (and GPU devices in particular), and
-minimizes the penalty of transfer overhead. It's best to think of an OpenCL device
+minimizes the penalty of transfer overhead. It is best to think of an OpenCL device
 as a high-throughput, high-latency device. Using a networking analogy as an
-example, it's similar to having a massively high-bandwidth pipe with very high
+example, this approach is similar to having a massively high-bandwidth pipe with very high
 ping response times. If the client is ready to send data to the device for compute,
-it should be sent in as few API calls as possible. This can be done by batching.
-clFFT plans have a parameter to describe the number of transforms being
-batched: @ref clfftSetPlanBatchSize(), and to describe how those batches are
-laid out and spaced in memory: @ref clfftSetPlanDistance(). 1D, 2D, or 3D
-transforms can be batched.
+it should be sent in as few API calls as possible, which can be done by batching.
+clFFT plans have one parameter, set with @ref clfftSetPlanBatchSize(), that describes the number of transforms being
+batched, and another, set with @ref clfftSetPlanDistance(), that describes how those batches are
+laid out and spaced in memory. 1D, 2D, or 3D transforms can be batched.
 
-@section Outline  Using clFFT on a Client Application
+@section Outline  Using clFFT in a client application
 
-To perform FFT calculations using clFFT, the client program must:
+To perform FFT calculations using clFFT, the client program must perform the following tasks:
 <ul>
-	<li> Initialize the library by calling @ref clfftSetup. </li>
-	<li> For each distinct type of FFT needed: </li>
+	<li> Initialize the library by calling @ref clfftSetup(). </li>
+	<li> For each distinct type of FFT needed:
 	<ol>
-		<li> Create an FFT Plan object. This usually is done by calling the factory function @ref clfftCreateDefaultPlan.
-		Some of the most fundamental parameters are specified at this time, and others assume default values.  The OpenCL
-		context must be provided when the plan is created; it cannot be changed. Another way is to call @ref clfftCopyPlan.
-		In either case, the function returns an opaque handle to the Plan object. </li>
-		<li> Complete the specification of all of the Plan parameters by calling the various parameter-setting functions,
-		\c clAmdFFtSet_____. </li>
-		<li> Optionally, "bake" or finalize the plan, calling @ref clfftBakePlan. This signals to the library the end
-		of the specification phase, and causes it to generate and compile the exact OpenCL kernels needed to perform the
-		specified FFT on the OpenCL device provided.
-
-		At this point, all performance-enhancing optimizations are applied, possibly including executing benchmark kernels
-		on the OpenCL device context in order to maximize runtime performance.
-
-		Although this step is optional, most users probably want to include it so that they can control when this work is
-		done. Usually, this time consuming step is done when the application is initialized. If the user does not call
-		@ref clfftBakePlan, this work is done during the first call to @ref clfftEnqueueTransform.
+    <li> Create an FFT Plan object. To create an FFT Plan object, do either of the following. In both cases,
+	the function returns an opaque handle to the plan object.
+		<ul>
+			<li>Call the factory function @ref clfftCreateDefaultPlan() and specify the values of the
+		   		most fundamental parameters, such as plHandle, context, dim, and clLengths, while other parameters assume
+		   		default values.  The OpenCL context must be provided when the plan is created; it cannot be changed. </li>
+		 	<li>Call @ref clfftCopyPlan(). </li>
+		</ul>
+    <li> Complete the specification of all the Plan parameters by calling various parameter-setting functions that
+		     have the prefix *clfftSet*. </li>
+	  <li> Optionally, "bake" or finalize the plan by calling the @ref clfftBakePlan() function. This signals to the library the end
+		     of the specification phase, and causes it to generate and compile the exact OpenCL kernels that perform the
+		     specified FFT on the provided OpenCL device.
+
+		     At this point, all performance-enhancing optimizations are applied, possibly including executing benchmark kernels
+		     on the OpenCL device context to maximize runtime performance.
+
+		     Although this step is optional, calling it explicitly is recommended so that you control when this work is done.
+			   Usually, this time-consuming step is done when the application is initialized. If you do not call
+			   @ref clfftBakePlan(), this work is done during the first call to @ref clfftEnqueueTransform().
 		</li>
 	</ol>
 
-	<li> The OpenCL FFT kernels now are ready to execute as many times as needed. </li>
+	<li> Execute the OpenCL FFT kernels as many times as needed. </li>
 	<ol>
-		<li>  Call @ref clfftEnqueueTransform. At this point, specify whether you want to execute a forward or reverse
-		transform; also, provide the OpenCL \c cl_mem handles for the input buffer(s), output buffer(s)--unless you want
-		the transformed data to overwrite the input buffers, and (optionally) scratch buffer.
-
-		@ref clfftEnqueueTransform performs one or more calls to the OpenCL function clEnqueueNDRangeKernel.
-		Like clEnqueueNDRangeKernel, @ref clfftEnqueueTransform is a non-blocking call. The commands to
-		execute the FFT compute kernel(s) are added to the OpenCL context queue to be executed asynchronously.
-		An OpenCL event handle is returned to the caller. If multiple NDRangeKernel operations are queued,
-		the final event handle is returned.
+		<li>  Call @ref clfftEnqueueTransform(). At this point, specify whether you want to execute a forward or reverse
+		      transform; also, provide the OpenCL *cl_mem* handles for the input buffer(s), output buffer(s)--unless you want
+		      the transformed data to overwrite the input buffers, and (optionally) scratch buffer.
+
+		      @ref clfftEnqueueTransform() performs one or more calls to the OpenCL function clEnqueueNDRangeKernel.
+		      Like clEnqueueNDRangeKernel, @ref clfftEnqueueTransform() is a non-blocking call. The commands to
+		      execute the FFT compute kernel(s) are added to the OpenCL context queue to be executed asynchronously.
+		      An OpenCL event handle is returned to the caller. If multiple NDRangeKernel operations are queued,
+		      the final event handle is returned.
 		</li>
-		<li>  The application now can add additional OpenCL tasks to the OpenCL context's queue. For example, if the
-		next step in the application's process is to apply a filter to the transformed data, the application would generate
-		that clEnqueueNDRangeKernel, specifying the transform's output buffer(s) as the input to the filter kernel,
-		and providing the transform's event handle to ensure proper synchronization. </li>
-		<li>  If the application must access the transformed data directly, it must call one of the OpenCL functions
-		for synchronizing the host computer's execution with the OpenCL device (for example: clFinish()). </li>
+		<li>  Add any application OpenCL tasks to the OpenCL context queue. For example, if the
+			  next step in the application process is to apply a filter to the transformed data, the application calls
+		      clEnqueueNDRangeKernel, specifies the transform's output buffer(s) as the input to the filter kernel,
+		      and provides the transform's event handle to ensure proper synchronization. </li>
+		<li>  If the application must access the transformed data directly, it must call one of the OpenCL functions
+		      for synchronizing the host computer execution with the OpenCL device (for example: clFinish()). </li>
 	</ol>
-	<li> Terminate the library by calling @ref clfftTeardown.
+	<li> Terminate the library by calling @ref clfftTeardown().
 </ul>
 
-@section RealFFT  FFTs of Real Data
+@section RealFFT  FFTs of real data
 
-When real data is subject to DFT transformation, the resulting complex output
+When real data is subject to DFT transformation, the resulting complex output data
 follows a special property. About half of the output is redundant because they are
 complex conjugates of the other half. This is called the Hermitian redundancy.
 So, for space and performance considerations, it is only necessary to store the
 non-redundant part of the data. Most FFT libraries use this property to offer
-specific storage layouts for FFTs involving real data. clFFT provides 3
+specific storage layouts for FFTs involving real data. clFFT provides three
 enumerated types to deal with real data FFTs:
 
 <ul>
-	<li> \c CLFFT_REAL
-	<li> \c CLFFT_HERMITIAN_INTERLEAVED
-	<li> \c CLFFT_HERMITIAN_PLANAR
+	<li> *CLFFT_REAL*
+	<li> *CLFFT_HERMITIAN_INTERLEAVED*
+	<li> *CLFFT_HERMITIAN_PLANAR*
 </ul>
 
-The first enum specifies that the data is purely real. This can be used to feed
-real input or get back real output. The second and third enums specify layouts
-for storing FFT output. They are similar to the corresponding full complex enums
-in the way they store real and imaginary components. The difference is that they
-store only about half of the complex output. Client applications can do just a
-forward transform and analyze the output. Or they can do some processing of
-the output and do a backward transform to get back real data. This is illustrated
+The *CLFFT_REAL* enum specifies that the data is purely real. This can be used to feed
+real input or get back real output. The *CLFFT_HERMITIAN_INTERLEAVED* and
+*CLFFT_HERMITIAN_PLANAR* enums are similar to the corresponding full complex enums
+in the way they store real and imaginary components, but store only about half of the
+complex output. Client applications can do just a forward transform and analyze the output
+or they can process the output and do a backward transform to get back real data. This is illustrated
 in the following figure.
 
 @image html realfft_fwdinv.jpg "Forward and Backward Transform Processes"
@@ -491,7 +366,7 @@ following figure.
 
 @image html realfft_1dlen.jpg "1D Real FFT of Length N"
 
-Here, C* denotes the complex conjugate of. Since the values at indices greater
+Here, C* denotes the complex conjugate. Since the values at indices greater
 than N/2 can be deduced from the first half of the array, clFFT stores data
 only up to the index N/2. This means that the output contains only 1 + N/2
 complex elements, where the division N/2 is rounded down. Examples for even
@@ -513,7 +388,7 @@ the output complex numbers are stored, with the index ranging from 0 through 3.
 For 2D and 3D FFTs, the FFT length along the least dimension is used to
 compute the (1 + N/2) value. This is because the FFT along the least dimension
 is what is computed first and is logically a real-to-hermitian transform. The FFTs
-along other dimensions are computed afterwards; they are simply 'complex-tocomplex'
+along other dimensions are computed afterwards; they are simply 'complex-to-complex'
 transforms. For example, assuming clLengths[2] is used to set up a 2D
 real FFT, let N1 = clLengths[1], and N0 = clLengths[0]. The output FFT has
 N1*(1 + N0/2) complex elements. Similarly, for a 3D FFT with clLengths[3] and
@@ -525,25 +400,25 @@ N2*N1*(1 + N0/2) complex elements.
 Out-of-place transforms:
 
 <ul>
-	<li> \c CLFFT_REAL to \c CLFFT_HERMITIAN_INTERLEAVED
-	<li> \c CLFFT_REAL to \c CLFFT_HERMITIAN_PLANAR
-	<li> \c CLFFT_HERMITIAN_INTERLEAVED to \c CLFFT_REAL
-	<li> \c CLFFT_ CLFFT_HERMITIAN_PLANAR to \c CLFFT_REAL
+	<li> *CLFFT_REAL* to *CLFFT_HERMITIAN_INTERLEAVED*
+	<li> *CLFFT_REAL* to *CLFFT_HERMITIAN_PLANAR*
+	<li> *CLFFT_HERMITIAN_INTERLEAVED* to *CLFFT_REAL*
+	<li> *CLFFT_HERMITIAN_PLANAR* to *CLFFT_REAL*
 </ul>
 
 In-place transforms:
 
 <ul>
-	<li> \c CLFFT_REAL to \c CLFFT_HERMITIAN_INTERLEAVED
-	<li> \c CLFFT_HERMITIAN_INTERLEAVED to \c CLFFT_REAL
+	<li> *CLFFT_REAL* to *CLFFT_HERMITIAN_INTERLEAVED*
+	<li> *CLFFT_HERMITIAN_INTERLEAVED* to *CLFFT_REAL*
 </ul>
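
The element counts quoted earlier for Hermitian output, 1 + N/2 for 1D and N1*(1 + N0/2) for 2D, can be computed with a small helper when sizing output buffers. The function name is hypothetical, and `lengths` is assumed to be ordered like clLengths, with index 0 as the least dimension.

```c
#include <stddef.h>

/* Hypothetical helper: number of complex elements stored for the Hermitian
 * output of a real transform with the given lengths (dim = 1, 2 or 3). */
size_t hermitian_element_count(const size_t *lengths, size_t dim)
{
    size_t count = 1 + lengths[0] / 2;  /* integer division rounds down */
    for (size_t d = 1; d < dim; ++d)
        count *= lengths[d];            /* other dimensions keep full length */
    return count;
}
```

For example, a 1D transform of length 8 stores 5 complex elements, and one of length 7 stores 4.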
 
 @subsection ExplicitStrides Setting strides
 
-The library currently <b> requires the user to explicitly set input and output strides for real transforms.</b> See
-the following examples to understand what values to use for input and output strides under different scenarios. The
-examples only show typical usages. The user has flexibility in allocating their buffers and laying out data according
-to their needs.
+The library currently <b> requires you to explicitly set input and output strides for real transforms.</b> See
+the following examples to understand what values to use for input and output strides under different scenarios. These
+examples show typical usages, but you can allocate the buffers and lay out data according
+to your needs.
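
As a sketch of one plausible stride choice for a forward 2D real transform, the helper below assumes tightly packed rows. For the in-place case it pads each real row to 2*(1 + N0/2) elements so that the Hermitian output fits in the same memory. Both the padding convention and the names are assumptions made for illustration; they are not taken from the text above.

```c
#include <stddef.h>

/* Hypothetical container for the input and output strides of a 2D plan. */
typedef struct {
    size_t in[2];   /* strides of the real input, in real elements        */
    size_t out[2];  /* strides of the Hermitian output, in complex units  */
} real_strides_2d;

/* Assumed convention: rows are tightly packed; for in-place transforms the
 * real rows are padded so each one can hold 1 + n0/2 complex values. */
real_strides_2d forward_real_strides_2d(size_t n0, int in_place)
{
    size_t half = 1 + n0 / 2;            /* complex elements per output row */
    real_strides_2d s;
    s.in[0]  = 1;
    s.in[1]  = in_place ? 2 * half : n0; /* padded row pitch when in-place  */
    s.out[0] = 1;
    s.out[1] = half;
    return s;
}
```

With N0 = 8 this gives an input row pitch of 8 for out-of-place and 10 for in-place, and an output row pitch of 5 complex elements in both cases.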
 
 @subsection RealExamples Examples
 
@@ -561,34 +436,35 @@ FFT features of this library.
 
 @section Callbacks  clFFT Callbacks
 
-Callback feature of clFFT provides ability to invoke user provided OpenCL inline functions from within FFT kernel, 
-to do custom processing of input or output data. The inline OpenCL function is passed as a string to the library 
-which would then be incorporated into the generated FFT kernel.This helps in avoiding additional kernel launches 
+The callback feature of clFFT can invoke user-provided OpenCL inline functions from within the FFT kernel
+to perform custom processing of the input or output data. The inline OpenCL function is passed as a string to the library,
+which incorporates it into the generated FFT kernel. This avoids additional kernel launches
 to carry out the pre/post processing tasks.
 
-There are 2 types of callback; Pre-callback and Post-callback. Pre-callback invokes user callback function to do 
-custom pre-processing of input data before FFT is executed. Post-callback invokes user callback function to do custom 
-post-processing of output data after FFT is executed.
+There are two types of callback: Pre-callback and Post-callback. Pre-callback invokes the user callback function to
+perform custom pre-processing of the input data before the FFT is executed. Post-callback invokes the user callback function to
+perform custom post-processing of the output data after the FFT is executed.
 
-The current release of clFFT includes Pre-callback feature. Post-callback will be supported in a future release.
+The current release of clFFT includes the Pre-callback feature. Post-callback will be supported in a future release.
 
 @subsection CallbackWorkflow Callback Workflow
 
 The workflow of FFT execution using callback feature of clFFT is as follows
 
 <ol>
-	<li> Create clFFT Plan and initialize standard clFFT parameters
-	<li> Use \c clfftSetPlanCallback API to register the callback function with library
+	<li> Create clFFT Plan and initialize standard clFFT parameters.
+	<li> Use the @ref clfftSetPlanCallback() API to register the callback function with the library
 		@code
-		clfftStatus clFFTSetPlanCallback(clfftPlanHandle plHandle, 
-											const char* funcName, 
-											const char* funcString, 
-											int localMemSize, 
-											clfftCallbackType callbackType, 
+		clfftStatus clfftSetPlanCallback(clfftPlanHandle plHandle,
+											const char* funcName,
+											const char* funcString,
+											int localMemSize,
+											clfftCallbackType callbackType,
 											cl_mem *userdata,
 											int numUserdataBuffers)
 		@endcode
-		The library uses the arguments passed to this API, including callback function string, to stitch the callback code into the generated FFT kernel. The arguments for clfftSetPlanCallback are
+		The library uses the arguments passed to this API, including callback function string, to stitch the callback code
+		into the generated FFT kernel. The arguments for clfftSetPlanCallback are
 		<ul>
 			<li> clFFT plan handle
 			<li> Name of the callback function
@@ -599,29 +475,29 @@ The workflow of FFT execution using callback feature of clFFT is as follows
 			<li> Number of user data buffers
 		</ul>
 	<li> Invoke Bake Plan step
-	<li> Library inserts the callback code into main FFT kernel during bake plan and compiles it. If there are any
-	compilation errors caused by syntax or incompatible callback function prototype, the failure is reported to user.
+	<li> Library inserts the callback code into the main FFT kernel during bake plan and compiles it. If there are any
+	compilation errors caused by syntax or incompatible callback function prototype, the library reports failure.
 	<li> Enqueue clFFT transform
 </ol>
 
-The caller is responsible to provide a callback function that matches the function prototype based on the type of 
+The caller is responsible for providing a callback function that matches the function prototype based on the type of
 callback(pre/post), type of transform(real/complex) and whether LDS is used. The bake plan step does the function prototype checking.
 
 @subsection CallbackFunctionPrototype Callback Function Prototypes
 
 clFFT expects the callback function to be of a specific prototype depending on the type of callback(pre/post), type of transform(real/complex)
-and whether LDS is used. These are as following.
+and whether LDS is used. These are as follows:
 
 @subsubsection PrecallbackProtyotype Pre-callback Prototypes
 
- FFT Type                               | Function Prototype 
+ FFT Type                               | Function Prototype
 ----------------------------------------| -------------
-C2C/C2R – Interleaved Single Precision  | \c Without \c LDS <br />float2  <precallback_func>  (  __global void *input, uint inoffset, __global void *userdata) <br /> \c With \c LDS <br />float2  <precallback_func>  (  __global void *input, uint inoffset, __global void *userdata, __local void *localmem) 
-C2C/C2R – Interleaved Double Precision  | \c Without \c LDS <br />double2  <precallback_func>  (  __global void *input, uint inoffset, __global void *userdata) <br /> \c With \c LDS <br />double2  <precallback_func>  (  __global void *input, uint inoffset, __global void *userdata, __local void *localmem)
-C2C – Planar Single Precision			| \c Without \c LDS <br />float2  <precallback_func>  (  __global void *inputRe, __global void *inputIm, uint inoffset, __global void *userdata)<br /> \c With \c LDS <br />float2  <precallback_func>  (  __global void *inputRe, __global void *inputIm, int inoffset, __global void *userdata, __local void *localmem)
-C2C – Planar Double Precision			| \c Without \c LDS <br />double2  <precallback_func>  (  __global void *inputRe, __global void *inputIm, uint inoffset, __global void *userdata)<br /> \c With \c LDS <br />double2  <precallback_func>  (  __global void *inputRe, __global void *inputIm, uint inoffset, __global void *userdata, __local void *localmem)
-R2C Single Precision					| \c Without \c LDS <br />float  <precallback_func>   (  __global void *input, uint inoffset, __global void *userdata)<br /> \c With \c LDS <br />float  <precallback_func>   (  __global void *input, uint inoffset, __global void *userdata, __local void *localmem)
-R2C Double Precision					| \c Without \c LDS <br />double  <precallback_func>   (  __global void *input, uint inoffset, __global void *userdata)<br /> \c With \c LDS <br />double  <precallback_func>   (  __global void *input, uint inoffset, __global void *userdata, __local void *localmem)
+C2C/C2R – Interleaved Single Precision  | Without LDS <br />float2  <precallback_func>  (  __global void *input, uint inoffset, __global void *userdata) <br /> With LDS <br />float2  <precallback_func>  (  __global void *input, uint inoffset, __global void *userdata, __local void *localmem)
+C2C/C2R – Interleaved Double Precision  | Without LDS <br />double2  <precallback_func>  (  __global void *input, uint inoffset, __global void *userdata) <br /> With LDS <br />double2  <precallback_func>  (  __global void *input, uint inoffset, __global void *userdata, __local void *localmem)
+C2C – Planar Single Precision			| Without LDS <br />float2  <precallback_func>  (  __global void *inputRe, __global void *inputIm, uint inoffset, __global void *userdata)<br /> With LDS <br />float2  <precallback_func>  (  __global void *inputRe, __global void *inputIm, uint inoffset, __global void *userdata, __local void *localmem)
+C2C – Planar Double Precision			| Without LDS <br />double2  <precallback_func>  (  __global void *inputRe, __global void *inputIm, uint inoffset, __global void *userdata)<br /> With LDS <br />double2  <precallback_func>  (  __global void *inputRe, __global void *inputIm, uint inoffset, __global void *userdata, __local void *localmem)
+R2C Single Precision					| Without LDS <br />float  <precallback_func>   (  __global void *input, uint inoffset, __global void *userdata)<br /> With LDS <br />float  <precallback_func>   (  __global void *input, uint inoffset, __global void *userdata, __local void *localmem)
+R2C Double Precision					| Without LDS <br />double  <precallback_func>   (  __global void *input, uint inoffset, __global void *userdata)<br /> With LDS <br />double  <precallback_func>   (  __global void *input, uint inoffset, __global void *userdata, __local void *localmem)
 
 
 Parameters
@@ -630,50 +506,50 @@ Parameters
 	<li> \c inputRe : The base pointer of the “Real” input buffer for Planar C2C transforms
 	<li> \c inputIm : The base pointer of the “Imaginary” part input buffer for Planar C2C transforms
 	<li> \c inoffset : Index of the current element  of the input buffer from the start
-	<li> \c userdata : Buffer containing optional caller specified data. The userdata pointer is useful 
-	for passing any supplementary data to the callback function. For example, buffer having convolution 
-	filter data or any scalar value. The userdata can be of any custom data type/structure, in which case, 
-	the user has to declare the custom data type and include it along with the callback function string. 
-	<li> \c localmem : Pointer to local memory. This memory is allocated by library based on the size specified
-	by user and subject to local memory availability
+	<li> \c userdata : Buffer containing optional caller specified data. The userdata pointer is useful
+	for passing any supplementary data to the callback function. For example, buffer having convolution
+	filter data or any scalar value. The userdata can be of any custom data type/structure, in which case,
+	you have to declare the custom data type and include it along with the callback function string. </li>
+	<li> \c localmem : Pointer to local memory. This memory is allocated by library based on the size you specify
+	and is subject to local memory availability. </li>
 </ul>
 
-For Planar C2C, the return type of callback is a vector (float2/double2) whose elements contain the result for Real 
+For Planar C2C, the return type of callback is a vector (float2/double2) whose elements contain the result for Real
 and Imaginary as computed in the callback
 
 @subsection SamplePrecallbackCode Sample Pre-Callback Code
 
 @code
 //**************************************************************************
-//* Step 1 : Store the callback function in a string 
+//* Step 1 : Store the callback function in a string.
 //**************************************************************************
-const char* precallbackstr = “float2 mulval(__global void* in, 
-                                  uint inoffset, 
-                                  __global void* userdata, 
-                                  __local void* localmem) 					 \n
-				{															 \n
-				int scalar = *((__global int*)userdata + offset);			 \n			
-				float2 ret = *((__global float2*)(float2) + offset) * scalar;\n 
-				return ret; 												 \n
-				} \n”;												
-
-				
+const char* precallbackstr = "float2 mulval(__global void* in,                               \n"
+                             "              uint inoffset,                                  \n"
+                             "              __global void* userdata,                        \n"
+                             "              __local void* localmem)                         \n"
+                             "{                                                             \n"
+                             "    int scalar = *((__global int*)userdata + inoffset);       \n"
+                             "    float2 ret = *((__global float2*)in + inoffset) * scalar; \n"
+                             "    return ret;                                               \n"
+                             "}                                                             \n";
+
+
 //**************************************************************************
-//* Step 2 : Initialize arguments if any required by the callback 
+//* Step 2 : Initialize arguments if any required by the callback.
 //**************************************************************************
 int h_userdata[N] = {  };
 cl_mem userdata = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int) * N,  (void*)h_userdata, NULL);
 
 
 //**************************************************************************
-//* Step 3 : Register the callback
+//* Step 3 : Register the callback.
 //**************************************************************************
 
 status = clfftSetPlanCallback(plan_handle, "mulval", precallbackstr, 0, PRECALLBACK, &userdata, 1);
 
 
 //**************************************************************************
-//* Step 4 : Bake plan and enqueue transform
+//* Step 4 : Bake plan and enqueue transform.
 //**************************************************************************
 status = clfftBakePlan( plan_handle, 1, &queue, NULL, NULL );
 
@@ -685,9 +561,9 @@ status = clfftEnqueueTransform( plan_handle, dir, 1, &queue, 0, NULL, &outEvent,
 
 <ol>
 	<li> The caller is responsible to provide a callback function in string form that matches the function prototype based on the type of callback, type of transform(real/complex) and whether LDS is used
-	<li> clFFT considers the value returned by pre-callback function to be the new value of the input at the index corresponding to the \c inoffset argument
-	<li> Pre-callback function can request for local memory for its own use. If the requested amount of local memory is available on the device, clFFT will pass a pointer to it when it invokes the callback function
-	<li> clFFT may invoke FFT kernels several times depending on the input parameters. However the pre-callback function provided by caller, will be invoked only once for each point in the input
+	<li> clFFT considers the value returned by the pre-callback function as the new value of the input at the index corresponding to the *inoffset* argument
+	<li> The pre-callback function can request local memory for its own use. If the requested amount of local memory is available on the device, clFFT passes a pointer to the local memory when it invokes the callback function
+	<li> clFFT may invoke FFT kernels several times depending on the input parameters. However, the pre-callback function provided by the caller is invoked only once for each point in the input
 </ol>
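
When checking a pre-callback such as the sample `mulval`, a plain C reference of its intended per-element effect can be useful on the host. The sketch below assumes the callback is meant to scale interleaved complex element i by the integer in userdata[i]. The function name is hypothetical, and the code does not call clFFT or OpenCL.

```c
#include <stddef.h>

/* Hypothetical host-side reference (not a clFFT function): compute the values
 * an FFT would consume if a pre-callback multiplied complex element i of an
 * interleaved single-precision buffer by scalars[i]. */
void reference_mulval(const float *interleaved, const int *scalars,
                      size_t n, float *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[2 * i]     = interleaved[2 * i]     * (float)scalars[i];  /* real */
        out[2 * i + 1] = interleaved[2 * i + 1] * (float)scalars[i];  /* imag */
    }
}
```

Running a transform without the callback on the output of this function should give the same result as running it with the callback on the original input, up to floating-point rounding.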
 
  */

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/clfft.git