[clblas] branch master updated (9731ea2 -> 27ab572)

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Tue Oct 27 08:02:08 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a change to branch master
in repository clblas.

      from  9731ea2   Merge pull request #119 from TimmyLiu/master
       new  6d7dcf8   bump develop branch version to 2.7
       new  63ca259   Merge pull request #120 from TimmyLiu/develop
       new  64d0ba3   add w9100 performance
       new  7f01bdf   Removed printfs in case of invalid matrices from SYMM
       new  edb7a50   Changed the order of matrix-size checking to A-B-C (consistent with other routines)
       new  4367daa   Guarded debug printf statements in several level 1 and 2 routines
       new  5b5eb03   Merge pull request #126 from CNugteren/errorcheck_order
       new  9b14e88   Updated the computation of the matrix-buffer memory used in case of no unused tail
       new  d114844   Simplified the calculation of the memory used for matrices
       new  ce984ae   Merge pull request #127 from CNugteren/mem_used_calculation
       new  d3d36e0   fix sgemm NT perf drop when lda=ldb=6144 and k>1536
       new  458c9da   fix sgemm NT perf drop when fix sgemm NT perf drop when lda=ldb=4096 or 5120 and k>lda/4
       new  f3a10ab   fix sgemm NT perf drop when fix sgemm NT perf drop when lda=ldb=7168 or 8192 and k>lda/4
       new  5a74faf   code clean up
       new  ecd89f9   Merge pull request #133 from TimmyLiu/develop
       new  45bb325   add missing includes on stdlib
       new  dcf60db   Merge pull request #134 from ghisvail/bugfix/missing-stdlib
       new  549df17   Fix redefinition warnings when using with clFFT
       new  8dd05f7   Merge pull request #135 from shehzan10/redef_fixes
       new  f496d1c   typo fix
       new  4b9a341   adding performance data
       new  40098f4   adding auto-gemm script
       new  33c5ca0   Merge pull request #138 from guacamoleo/develop-squash2
       new  b8ed4fd   release cl program
       new  4b34283   updating README
       new  2c5ab03   Merge pull request #139 from guacamoleo/develop
       new  e7e01ad   AutoGemm performance data; sgemm add unroll=8 for benchmarking; gemm compile kernel prints build log
       new  f6ae9ac   AutoGemm KernelOpenCL can generate standalone kernels
       new  644df17   Merge pull request #140 from guacamoleo/develop
       new  acc6889   added numQueues to performance data
       new  a6bdc3d   Merge pull request #141 from guacamoleo/develop
       new  0a08f16   Integrating new travis and appveyor build yaml scripts
       new  f496afa   Fixed badge links for appveyor clmathlibraries project 5
       new  c47ef12   Merge pull request #143 from kknox/ci
       new  ba1bbdd   dtrsm 192 trtri
       new  31c9214   mod192 dtrsm using dtrtri
       new  4d67a9e   enable big dgemm with split calls
       new  afe8fc0   enable output result with -p 1 in client
       new  d6e6a78   dtrsm reenablment 192
       new  c4e7964   bug fix
       new  4067d14   fix linux build
       new  5ee9e5f   dtrsm lower left
       new  8e9bc4d   dtrsm right side
       new  a08507d   attempt to fix macos build
       new  18404c0   Merge pull request #144 from TimmyLiu/develop
       new  4f204b2   finished dtrsm offline compile dev
       new  1ffeb0f   fix linux build
       new  7cbbf9c   Merge pull request #145 from TimmyLiu/develop
       new  7877094   fix VS 2015 build
       new  ccb8bec   Merge pull request #146 from TimmyLiu/develop
       new  55921b5   fix a install issue
       new  8299035   Merge branch 'develop' of https://github.com/TimmyLiu/clBLAS into develop
       new  a459976   Merge pull request #147 from TimmyLiu/develop
       new  705d16e   add dtrsm perf (clblas 2.7.1) on w9100 with 14502 driver
       new  b942250   add k40 cublas 7.5 dtrsm data
       new  c9a02b8   Merge pull request #148 from TimmyLiu/develop
       new  d0b106e   adding cublas data
       new  f40ed62   Merge pull request #149 from guacamoleo/develop
       new  ef47dda   add link to windows master branch badge
       new  feadbbb   Merge pull request #150 from TimmyLiu/develop
       new  0482e1c   merged develop to master; bumped version to 2.8.0
       new  8b5f7a0   Merge pull request #151 from guacamoleo/master
       new  b68e8bd   Update .travis.yml
       new  16973bf   Update appveyor.yml
       new  16744bf   fix 'array initializer must be an initializer list', https://github.com/clMathLibraries/clBLAS/issues/153
       new  c56c725   Fix https://github.com/clMathLibraries/clBLAS/issues/159 , teardown/setup when using autogemm causes next call to gemm to fail (segfault)
       new  27ab572   Merge pull request #163 from hughperkins/fix-teardown

The 67 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .gitignore                                         |    3 +
 .travis.yml                                        |  168 ++-
 README.md                                          |   57 +-
 appveyor.yml                                       |  105 ++
 .../S9150/cgemmNT_S9150_14.50.2_2.6.0_8.csv        |  721 +++++++++
 .../S9150/dgemmNT_S9150_14.50.2_2.6.0_8.csv        |  721 +++++++++
 doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv   |   60 +-
 .../S9150/sgemmNT_S9150_14.50.2_2.6.0_8.csv        |  721 +++++++++
 doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv    |  360 ++---
 .../S9150/zgemmNT_S9150_14.50.2_2.6.0_8.csv        |  721 +++++++++
 .../clBLAS_2.6.0/{S9150 => W9100}/README.txt       |    2 +-
 .../W9100/clblas_sgemmNT_w9100_14502.csv           |  181 +++
 .../{S9150/sgemm_32.csv => W9100/dgemm_32.csv}     |  360 ++---
 .../clBLAS_2.6.0/{S9150 => W9100}/dgemm_96.csv     |  120 +-
 .../dtrsm_192.csv => W9100/dtrsm_w9100_14502.csv}  |   60 +-
 .../Tesla_K40 => clBLAS_2.6.0/W9100}/peak_dp.csv   |  360 ++---
 .../clBLAS_2.6.0/{S9150 => W9100}/peak_sp.csv      |  360 ++---
 .../clBLAS_2.6.0/{S9150 => W9100}/zgemm_32.csv     |  360 ++---
 .../clBLAS_2.6.0/{S9150 => W9100}/zgemm_64.csv     |  180 +--
 .../S9150/cgemmNT_S9150_14.50.2_2.7.1_8.csv        |  721 +++++++++
 .../S9150/dgemmNT_S9150_14.50.2_2.7.1_8.csv        |  721 +++++++++
 .../S9150/sgemmNT_S9150_14.50.2_2.7.1_8.csv        |  721 +++++++++
 .../S9150/zgemmNT_S9150_14.50.2_2.7.1_8.csv        |  721 +++++++++
 ...as271_w9100_dtrsm_col_left_lower_unit_14502.csv |   31 +
 ...as271_w9100_dtrsm_col_left_upper_unit_14502.csv |   31 +
 ...s271_w9100_dtrsm_col_right_lower_unit_14502.csv |   31 +
 ...s271_w9100_dtrsm_col_right_upper_unit_14502.csv |   31 +
 doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv     |   60 +-
 doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv     |  360 ++---
 .../cublas75_k40_dtrsm_col_left_lower_unit.csv     |   31 +
 .../cublas75_k40_dtrsm_col_left_upper_unit.csv     |   31 +
 .../cublas75_k40_dtrsm_col_right_lower_unit.csv    |   31 +
 .../cublas75_k40_dtrsm_col_right_upper_unit.csv    |   31 +
 .../cuBLAS_7.5/Tesla_K40/cublas_cgemm_8.csv        |  721 +++++++++
 .../cuBLAS_7.5/Tesla_K40/cublas_dgemm_8.csv        |  721 +++++++++
 .../cuBLAS_7.5/Tesla_K40/cublas_sgemm_8.csv        |  721 +++++++++
 .../cuBLAS_7.5/Tesla_K40/cublas_zgemm_8.csv        |  721 +++++++++
 .../Tesla_K40/peak_dp.csv                          |    0
 .../Tesla_K40/peak_sp.csv                          |    0
 src/CMakeLists.txt                                 |   87 +-
 src/client/clfunc_common.hpp                       |   18 +-
 src/client/clfunc_xgemm.hpp                        |  192 ++-
 src/client/clfunc_xgemv.hpp                        |   22 +-
 src/client/clfunc_xger.hpp                         |   16 +-
 src/client/clfunc_xgerc.hpp                        |   12 +-
 src/client/clfunc_xgeru.hpp                        |   12 +-
 src/client/clfunc_xhemm.hpp                        |   34 +-
 src/client/clfunc_xhemv.hpp                        |   12 +-
 src/client/clfunc_xher.hpp                         |   10 +-
 src/client/clfunc_xher2.hpp                        |   12 +-
 src/client/clfunc_xher2k.hpp                       |   20 +-
 src/client/clfunc_xherk.hpp                        |   20 +-
 src/client/clfunc_xsymm.hpp                        |   58 +-
 src/client/clfunc_xsymv.hpp                        |   12 +-
 src/client/clfunc_xsyr.hpp                         |   10 +-
 src/client/clfunc_xsyr2.hpp                        |   12 +-
 src/client/clfunc_xsyr2k.hpp                       |   34 +-
 src/client/clfunc_xsyrk.hpp                        |   32 +-
 src/client/clfunc_xtrmm.hpp                        |   48 +-
 src/client/clfunc_xtrmv.hpp                        |   14 +-
 src/client/clfunc_xtrsm.hpp                        |   50 +-
 src/client/clfunc_xtrsv.hpp                        |   14 +-
 src/client/client.cpp                              |   12 +-
 src/include/msvc.h                                 |    2 +
 src/library/CMakeLists.txt                         |  477 +++++-
 src/library/OCLBinaryGenerator.cmake               |   86 ++
 src/library/bingen.cmake                           |    1 +
 src/library/blas/AutoGemm/.gitignore               |    4 +
 src/library/blas/AutoGemm/AutoGemm.py              |   47 +
 src/library/blas/AutoGemm/AutoGemmParameters.py    |  149 ++
 .../AutoGemmTools/AutoGemmPreCompileKernels.cpp    |  925 ++++++++++++
 .../AutoGemm/AutoGemmTools/AutoGemmUtil.h}         |   54 +-
 .../AutoGemm/AutoGemmTools/ProfileAutoGemm.cpp     | 1392 ++++++++++++++++++
 .../blas/AutoGemm/AutoGemmTools/TestAutoGemm.cpp   |  995 +++++++++++++
 src/library/blas/AutoGemm/Common.py                |   60 +
 src/library/blas/AutoGemm/Includes.py              |  494 +++++++
 src/library/blas/AutoGemm/KernelOpenCL.py          |  587 ++++++++
 src/library/blas/AutoGemm/KernelParameters.py      |  253 ++++
 src/library/blas/AutoGemm/KernelSelection.py       |  683 +++++++++
 src/library/blas/AutoGemm/KernelsToPreCompile.py   |   91 ++
 src/library/blas/AutoGemm/README.txt               |    0
 .../UserGemmKernelSources/UserGemmClKernels.h      |   23 +
 .../UserGemmKernelSourceIncludes.cpp               |   57 +
 .../UserGemmKernelSourceIncludes.h                 |   80 +
 .../dgemm_Col_NN_B0_MX048_NX048_KX08_src.cpp       |  203 +++
 .../dgemm_Col_NN_B1_MX048_NX048_KX08_src.cpp       |  203 +++
 .../dgemm_Col_NT_B0_MX048_NX048_KX08_src.cpp       |  196 +++
 .../dgemm_Col_NT_B1_MX048_NX048_KX08_src.cpp       |  193 +++
 .../dgemm_Col_TN_B0_MX048_NX048_KX08_src.cpp       |  195 +++
 .../dgemm_Col_TN_B1_MX048_NX048_KX08_src.cpp       |  195 +++
 .../sgemm_Col_NN_B0_MX032_NX032_KX16_src.cpp       |  129 ++
 .../sgemm_Col_NN_B0_MX064_NX064_KX16_src.cpp       |  160 ++
 .../sgemm_Col_NN_B0_MX096_NX096_KX16_src.cpp       |  208 +++
 ...sgemm_Col_NN_B1_MX032_NX032_KX16_BRANCH_src.cpp |  149 ++
 .../sgemm_Col_NN_B1_MX032_NX032_KX16_src.cpp       |  129 ++
 .../sgemm_Col_NN_B1_MX064_NX064_KX16_src.cpp       |  161 +++
 .../sgemm_Col_NN_B1_MX096_NX096_KX16_src.cpp       |  207 +++
 .../sgemm_Col_NT_B0_MX032_NX032_KX16_src.cpp       |  126 ++
 .../sgemm_Col_NT_B0_MX064_NX064_KX16_src.cpp       |  165 +++
 .../sgemm_Col_NT_B0_MX096_NX096_KX16_src.cpp       |  210 +++
 ...sgemm_Col_NT_B1_MX032_NX032_KX16_BRANCH_src.cpp |  148 ++
 ...sgemm_Col_NT_B1_MX032_NX032_KX16_SINGLE_src.cpp |  158 ++
 .../sgemm_Col_NT_B1_MX032_NX032_KX16_src.cpp       |  126 ++
 .../sgemm_Col_NT_B1_MX032_NX064_KX16_ROW_src.cpp   |  161 +++
 .../sgemm_Col_NT_B1_MX064_NX032_KX16_COL_src.cpp   |  157 ++
 .../sgemm_Col_NT_B1_MX064_NX064_KX16_src.cpp       |  160 ++
 .../sgemm_Col_NT_B1_MX096_NX096_KX16_src.cpp       |  208 +++
 .../sgemm_Col_NT_B1_MX128_NX128_KX16_src.cpp       |  290 ++++
 .../sgemm_Col_TN_B0_MX032_NX032_KX16_src.cpp       |  128 ++
 .../sgemm_Col_TN_B0_MX064_NX064_KX16_src.cpp       |  165 +++
 .../sgemm_Col_TN_B0_MX096_NX096_KX16_src.cpp       |  209 +++
 ...sgemm_Col_TN_B1_MX032_NX032_KX16_BRANCH_src.cpp |  148 ++
 .../sgemm_Col_TN_B1_MX032_NX032_KX16_src.cpp       |  127 ++
 .../sgemm_Col_TN_B1_MX064_NX064_KX16_src.cpp       |  165 +++
 .../sgemm_Col_TN_B1_MX096_NX096_KX16_src.cpp       |  209 +++
 src/library/blas/functor/functor.cc                |    3 +-
 src/library/blas/functor/hawaii.cc                 |   19 +
 .../blas/functor/hawaii_sgemmBig1024Kernel.cc      |  506 +++++++
 .../blas/functor/hawaii_sgemmSplitKernel.cc        |  147 ++
 ...mBranchKernel.h => hawaii_sgemmBig1024Kernel.h} |   18 +-
 src/library/blas/generic/binary_lookup.cc          |    6 +-
 src/library/blas/generic/common.c                  |   16 +-
 .../blas/gens/clTemplates/sgemm_gcn_bigMatrices.cl |  264 ++++
 src/library/blas/gens/clTemplates/zgemm_gcn.cl     |    2 +-
 src/library/blas/include/xgemm.h                   |   39 +
 src/library/blas/init.c                            |    7 +
 src/library/blas/ixamax.c                          |   16 +-
 src/library/blas/specialCases/GemmSpecialCases.cpp |  994 +++++++++++++
 .../blas/specialCases/include/GemmSpecialCases.h   |   42 +
 src/library/blas/trtri/TrtriClKernels.h            |   44 +
 .../blas/trtri/TrtriKernelSourceIncludes.cpp       |   81 ++
 src/library/blas/trtri/TrtriKernelSourceIncludes.h |  124 ++
 .../blas/trtri/diag_dtrtri_lower_128_16.cpp        |  172 +++
 .../blas/trtri/diag_dtrtri_upper_128_16.cpp        |  151 ++
 .../blas/trtri/diag_dtrtri_upper_192_12.cpp        |  149 ++
 .../trtri/triple_dgemm_update_128_16_PART1_L.cpp   |  161 +++
 .../trtri/triple_dgemm_update_128_16_PART2_L.cpp   |  143 ++
 .../blas/trtri/triple_dgemm_update_128_16_R.cpp    |  239 +++
 .../trtri/triple_dgemm_update_128_32_PART1_L.cpp   |  150 ++
 .../trtri/triple_dgemm_update_128_32_PART1_R.cpp   |  151 ++
 .../trtri/triple_dgemm_update_128_32_PART2_L.cpp   |  135 ++
 .../trtri/triple_dgemm_update_128_32_PART2_R.cpp   |  136 ++
 .../trtri/triple_dgemm_update_128_64_PART1_L.cpp   |  145 ++
 .../trtri/triple_dgemm_update_128_64_PART1_R.cpp   |  145 ++
 .../trtri/triple_dgemm_update_128_64_PART2_L.cpp   |  133 ++
 .../trtri/triple_dgemm_update_128_64_PART2_R.cpp   |  134 ++
 .../triple_dgemm_update_128_ABOVE64_PART1_L.cpp    |  146 ++
 .../triple_dgemm_update_128_ABOVE64_PART1_R.cpp    |  144 ++
 .../triple_dgemm_update_128_ABOVE64_PART2_L.cpp    |  134 ++
 .../triple_dgemm_update_128_ABOVE64_PART2_R.cpp    |  135 ++
 .../triple_dgemm_update_128_ABOVE64_PART3_L.cpp    |   91 ++
 .../triple_dgemm_update_128_ABOVE64_PART3_R.cpp    |   94 ++
 .../blas/trtri/triple_dgemm_update_192_12_R.cpp    |  194 +++
 .../trtri/triple_dgemm_update_192_24_PART1_R.cpp   |  117 ++
 .../trtri/triple_dgemm_update_192_24_PART2_R.cpp   |  112 ++
 .../trtri/triple_dgemm_update_192_48_PART1_R.cpp   |  144 ++
 .../trtri/triple_dgemm_update_192_48_PART2_R.cpp   |  145 ++
 .../trtri/triple_dgemm_update_192_96_PART1_R.cpp   |  156 ++
 .../trtri/triple_dgemm_update_192_96_PART2_R.cpp   |  157 ++
 src/library/blas/xasum.c                           |   16 +-
 src/library/blas/xaxpy.c                           |    6 +
 src/library/blas/xcopy.c                           |    6 +
 src/library/blas/xdot.c                            |   20 +-
 src/library/blas/xgemm.cc                          |  872 ++++++++---
 src/library/blas/xger.c                            |    8 +
 src/library/blas/xher.c                            |    8 +-
 src/library/blas/xher2.c                           |    8 +
 src/library/blas/xrot.c                            |   12 +-
 src/library/blas/xrotg.c                           |   24 +-
 src/library/blas/xrotm.c                           |    8 +
 src/library/blas/xrotmg.c                          |   14 +
 src/library/blas/xscal.c                           |    8 +-
 src/library/blas/xswap.c                           |    6 +
 src/library/blas/xsymm.c                           |   19 +-
 src/library/blas/xsyr.c                            |    8 +-
 src/library/blas/xsyr2.c                           |    8 +
 src/library/blas/xtbmv.c                           |   16 +-
 src/library/blas/xtrmv.c                           |   16 +-
 src/library/blas/xtrsm.cc                          | 1525 ++++++++++++++++++++
 .../{bingen => OCLBinaryGenerator}/CMakeLists.txt  |   12 +-
 .../OCLBinaryGenerator/OCLBinaryGenerator.cpp      |  347 +++++
 src/scripts/perf/blasPerformanceTesting.py         |   14 +-
 src/tests/common.cpp                               |   29 +-
 src/tests/correctness/corr-gemm.cpp                |   12 +-
 src/tests/include/gemm.h                           |    6 +-
 185 files changed, 31773 insertions(+), 2163 deletions(-)
 create mode 100644 appveyor.yml
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/cgemmNT_S9150_14.50.2_2.6.0_8.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemmNT_S9150_14.50.2_2.6.0_8.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/sgemmNT_S9150_14.50.2_2.6.0_8.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemmNT_S9150_14.50.2_2.6.0_8.csv
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/README.txt (99%)
 create mode 100644 doc/performance/clBLAS_2.6.0/W9100/clblas_sgemmNT_w9100_14502.csv
 copy doc/performance/clBLAS_2.6.0/{S9150/sgemm_32.csv => W9100/dgemm_32.csv} (62%)
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/dgemm_96.csv (62%)
 copy doc/performance/clBLAS_2.6.0/{S9150/dtrsm_192.csv => W9100/dtrsm_w9100_14502.csv} (58%)
 copy doc/performance/{cuBLAS_7.0/Tesla_K40 => clBLAS_2.6.0/W9100}/peak_dp.csv (63%)
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/peak_sp.csv (63%)
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/zgemm_32.csv (62%)
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/zgemm_64.csv (62%)
 create mode 100644 doc/performance/clBLAS_2.7.1/S9150/cgemmNT_S9150_14.50.2_2.7.1_8.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/S9150/dgemmNT_S9150_14.50.2_2.7.1_8.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/S9150/sgemmNT_S9150_14.50.2_2.7.1_8.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/S9150/zgemmNT_S9150_14.50.2_2.7.1_8.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_left_lower_unit_14502.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_left_upper_unit_14502.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_right_lower_unit_14502.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_right_upper_unit_14502.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_left_lower_unit.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_left_upper_unit.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_right_lower_unit.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_right_upper_unit.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_cgemm_8.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_dgemm_8.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_sgemm_8.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_zgemm_8.csv
 copy doc/performance/{cuBLAS_7.0 => cuBLAS_7.5}/Tesla_K40/peak_dp.csv (100%)
 copy doc/performance/{cuBLAS_7.0 => cuBLAS_7.5}/Tesla_K40/peak_sp.csv (100%)
 create mode 100644 src/library/OCLBinaryGenerator.cmake
 create mode 100644 src/library/blas/AutoGemm/.gitignore
 create mode 100644 src/library/blas/AutoGemm/AutoGemm.py
 create mode 100644 src/library/blas/AutoGemm/AutoGemmParameters.py
 create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/AutoGemmPreCompileKernels.cpp
 copy src/library/{tools/ktest/naive/naive_blas.cpp => blas/AutoGemm/AutoGemmTools/AutoGemmUtil.h} (95%)
 create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/ProfileAutoGemm.cpp
 create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/TestAutoGemm.cpp
 create mode 100644 src/library/blas/AutoGemm/Common.py
 create mode 100644 src/library/blas/AutoGemm/Includes.py
 create mode 100644 src/library/blas/AutoGemm/KernelOpenCL.py
 create mode 100644 src/library/blas/AutoGemm/KernelParameters.py
 create mode 100644 src/library/blas/AutoGemm/KernelSelection.py
 create mode 100644 src/library/blas/AutoGemm/KernelsToPreCompile.py
 create mode 100644 src/library/blas/AutoGemm/README.txt
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmClKernels.h
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmKernelSourceIncludes.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmKernelSourceIncludes.h
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NN_B0_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NN_B1_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NT_B0_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NT_B1_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_TN_B0_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_TN_B1_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX032_NX032_KX16_BRANCH_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_BRANCH_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_SINGLE_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX064_KX16_ROW_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX064_NX032_KX16_COL_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX128_NX128_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX032_NX032_KX16_BRANCH_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/functor/hawaii_sgemmBig1024Kernel.cc
 copy src/library/blas/functor/include/{hawaii_sgemmBranchKernel.h => hawaii_sgemmBig1024Kernel.h} (61%)
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn_bigMatrices.cl
 create mode 100644 src/library/blas/include/xgemm.h
 create mode 100644 src/library/blas/specialCases/GemmSpecialCases.cpp
 create mode 100644 src/library/blas/specialCases/include/GemmSpecialCases.h
 create mode 100644 src/library/blas/trtri/TrtriClKernels.h
 create mode 100644 src/library/blas/trtri/TrtriKernelSourceIncludes.cpp
 create mode 100644 src/library/blas/trtri/TrtriKernelSourceIncludes.h
 create mode 100644 src/library/blas/trtri/diag_dtrtri_lower_128_16.cpp
 create mode 100644 src/library/blas/trtri/diag_dtrtri_upper_128_16.cpp
 create mode 100644 src/library/blas/trtri/diag_dtrtri_upper_192_12.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_PART1_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_PART2_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART1_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART2_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART1_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART2_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART1_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART2_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART3_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART3_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_12_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_24_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_24_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_48_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_48_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_96_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_96_PART2_R.cpp
 copy src/library/tools/{bingen => OCLBinaryGenerator}/CMakeLists.txt (62%)
 create mode 100644 src/library/tools/OCLBinaryGenerator/OCLBinaryGenerator.cpp

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/clblas.git



More information about the debian-science-commits mailing list