[clblas] branch debian/sid updated (763f39c -> b5bd678)

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Wed Oct 28 11:52:33 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a change to branch debian/sid
in repository clblas.

      from  763f39c   release to unstable
      adds  6d7dcf8   bump develop branch version to 2.7
      adds  63ca259   Merge pull request #120 from TimmyLiu/develop
      adds  64d0ba3   add w9100 performance
      adds  7f01bdf   Removed printfs in case of invalid matrices from SYMM
      adds  edb7a50   Changed the order of matrix-size checking to A-B-C (consistent with other routines)
      adds  4367daa   Guarded debug printf statements in several level 1 and 2 routines
      adds  5b5eb03   Merge pull request #126 from CNugteren/errorcheck_order
      adds  9b14e88   Updated the computation of the matrix-buffer memory used in case of no unused tail
      adds  d114844   Simplified the calculation of the memory used for matrices
      adds  ce984ae   Merge pull request #127 from CNugteren/mem_used_calculation
      adds  d3d36e0   fix sgemm NT perf drop when lda=ldb=6144 and k>1536
      adds  458c9da   fix sgemm NT perf drop when fix sgemm NT perf drop when lda=ldb=4096 or 5120 and k>lda/4
      adds  f3a10ab   fix sgemm NT perf drop when fix sgemm NT perf drop when lda=ldb=7168 or 8192 and k>lda/4
      adds  5a74faf   code clean up
      adds  ecd89f9   Merge pull request #133 from TimmyLiu/develop
      adds  45bb325   add missing includes on stdlib
      adds  dcf60db   Merge pull request #134 from ghisvail/bugfix/missing-stdlib
      adds  549df17   Fix redefinition warnings when using with clFFT
      adds  8dd05f7   Merge pull request #135 from shehzan10/redef_fixes
      adds  f496d1c   typo fix
      adds  4b9a341   adding performance data
      adds  40098f4   adding auto-gemm script
      adds  33c5ca0   Merge pull request #138 from guacamoleo/develop-squash2
      adds  b8ed4fd   release cl program
      adds  4b34283   updating README
      adds  2c5ab03   Merge pull request #139 from guacamoleo/develop
      adds  e7e01ad   AutoGemm performance data; sgemm add unroll=8 for benchmarking; gemm compile kernel prints build log
      adds  f6ae9ac   AutoGemm KernelOpenCL can generate standalone kernels
      adds  644df17   Merge pull request #140 from guacamoleo/develop
      adds  acc6889   added numQueues to performance data
      adds  a6bdc3d   Merge pull request #141 from guacamoleo/develop
      adds  0a08f16   Integrating new travis and appveyor build yaml scripts
      adds  f496afa   Fixed badge links for appveyor clmathlibraries project 5
      adds  c47ef12   Merge pull request #143 from kknox/ci
      adds  ba1bbdd   dtrsm 192 trtri
      adds  31c9214   mod192 dtrsm using dtrtri
      adds  4d67a9e   enable big dgemm with split calls
      adds  afe8fc0   enable output result with -p 1 in client
      adds  d6e6a78   dtrsm reenablment 192
      adds  c4e7964   bug fix
      adds  4067d14   fix linux build
      adds  5ee9e5f   dtrsm lower left
      adds  8e9bc4d   dtrsm right side
      adds  a08507d   attempt to fix macos build
      adds  18404c0   Merge pull request #144 from TimmyLiu/develop
      adds  4f204b2   finished dtrsm offline compile dev
      adds  1ffeb0f   fix linux build
      adds  7cbbf9c   Merge pull request #145 from TimmyLiu/develop
      adds  7877094   fix VS 2015 build
      adds  ccb8bec   Merge pull request #146 from TimmyLiu/develop
      adds  55921b5   fix a install issue
      adds  8299035   Merge branch 'develop' of https://github.com/TimmyLiu/clBLAS into develop
      adds  a459976   Merge pull request #147 from TimmyLiu/develop
      adds  705d16e   add dtrsm perf (clblas 2.7.1) on w9100 with 14502 driver
      adds  b942250   add k40 cublas 7.5 dtrsm data
      adds  c9a02b8   Merge pull request #148 from TimmyLiu/develop
      adds  d0b106e   adding cublas data
      adds  f40ed62   Merge pull request #149 from guacamoleo/develop
      adds  ef47dda   add link to windows master branch badge
      adds  feadbbb   Merge pull request #150 from TimmyLiu/develop
      adds  0482e1c   merged develop to master; bumped version to 2.8.0
      adds  8b5f7a0   Merge pull request #151 from guacamoleo/master
       new  c432c8e   Merge tag 'upstream/v2.8' into debian/sid
       new  29f9e79   d/patches:
                        * Remove fix-missing-stdlib.patch, applied upstream.
                        * Remove debian-enable-multiarch.patch, use SUFFIX_LIB cmake option.
                        * Refresh disable-multilib-cflags.patch, fix-pthread-linkage.patch
                          and use-boost-dynamic-libs.patch.
       new  8b3de5c   d/rules: use SUFFIX_LIB cmake option to provide multiarch install path
       new  2327b3b   Build documentation with arch-indep rules
       new  77b3cb8   Remove system jquery symlinks in HTML documentation
       new  82e0455   d/rules: further simplification and cleaning
       new  669b93b   d/rules: add missing call to dh_doxygen
       new  18b8d1e   Run `cme fix` on d/control and d/copyright
       new  f453463   d/changelog: add release information
       new  b5bd678   release to unstable

The 10 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.
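
The "from" / "adds" / "new" distinction above can be checked mechanically. The following is a minimal sketch, not part of the hook script: it assumes each revision row has the shape "<kind>  <short hash>  <subject>" as in this notice, and the names ROW, parse_revisions and count_by_kind are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Assumed row shape, e.g. "      adds  6d7dcf8   bump develop branch version to 2.7".
ROW = re.compile(r"^\s*(from|adds|new)\s+([0-9a-f]{7,40})\s+(.*)$")


def parse_revisions(text):
    """Return (kind, short_hash, subject) tuples for every revision row found."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((match.group(1), match.group(2), match.group(3).strip()))
    return rows


def count_by_kind(rows):
    """Count how many rows are 'from', 'adds' and 'new'."""
    return Counter(kind for kind, _, _ in rows)


# A few rows copied from the notice above, used as sample input.
sample = """\
      from  763f39c   release to unstable
      adds  6d7dcf8   bump develop branch version to 2.7
       new  c432c8e   Merge tag 'upstream/v2.8' into debian/sid
       new  b5bd678   release to unstable
"""

rows = parse_revisions(sample)
counts = count_by_kind(rows)
print(counts["from"], counts["adds"], counts["new"])
```

Run over the full notice, the "new" count would be expected to match the number quoted in the paragraph above (10), since only those revisions are new to the repository.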


Summary of changes:
 .gitignore                                         |    3 +
 .travis.yml                                        |  168 ++-
 README.md                                          |   57 +-
 appveyor.yml                                       |  105 ++
 debian/changelog                                   |   22 +
 debian/control                                     |   27 +-
 debian/libclblas-doc.links                         |    1 -
 debian/patches/debian-enable-multiarch.patch       |   25 -
 debian/patches/disable-multilib-cflags.patch       |    2 +-
 debian/patches/fix-missing-stdlib.patch            |   33 -
 debian/patches/fix-pthread-linkage.patch           |    2 +-
 debian/patches/series                              |    2 -
 debian/patches/use-boost-dynamic-libs.patch        |    2 +-
 debian/rules                                       |   27 +-
 doc/clBLAS.doxy                                    |    6 +-
 .../S9150/cgemmNT_S9150_14.50.2_2.6.0_8.csv        |  721 +++++++++
 .../S9150/dgemmNT_S9150_14.50.2_2.6.0_8.csv        |  721 +++++++++
 doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv   |   60 +-
 .../S9150/sgemmNT_S9150_14.50.2_2.6.0_8.csv        |  721 +++++++++
 doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv    |  360 ++---
 .../S9150/zgemmNT_S9150_14.50.2_2.6.0_8.csv        |  721 +++++++++
 .../clBLAS_2.6.0/{S9150 => W9100}/README.txt       |    2 +-
 .../W9100/clblas_sgemmNT_w9100_14502.csv           |  181 +++
 .../{S9150/sgemm_32.csv => W9100/dgemm_32.csv}     |  360 ++---
 .../clBLAS_2.6.0/{S9150 => W9100}/dgemm_96.csv     |  120 +-
 .../dtrsm_192.csv => W9100/dtrsm_w9100_14502.csv}  |   60 +-
 .../Tesla_K40 => clBLAS_2.6.0/W9100}/peak_dp.csv   |  360 ++---
 .../clBLAS_2.6.0/{S9150 => W9100}/peak_sp.csv      |  360 ++---
 .../clBLAS_2.6.0/{S9150 => W9100}/zgemm_32.csv     |  360 ++---
 .../clBLAS_2.6.0/{S9150 => W9100}/zgemm_64.csv     |  180 +--
 .../S9150/cgemmNT_S9150_14.50.2_2.7.1_8.csv        |  721 +++++++++
 .../S9150/dgemmNT_S9150_14.50.2_2.7.1_8.csv        |  721 +++++++++
 .../S9150/sgemmNT_S9150_14.50.2_2.7.1_8.csv        |  721 +++++++++
 .../S9150/zgemmNT_S9150_14.50.2_2.7.1_8.csv        |  721 +++++++++
 ...as271_w9100_dtrsm_col_left_lower_unit_14502.csv |   31 +
 ...as271_w9100_dtrsm_col_left_upper_unit_14502.csv |   31 +
 ...s271_w9100_dtrsm_col_right_lower_unit_14502.csv |   31 +
 ...s271_w9100_dtrsm_col_right_upper_unit_14502.csv |   31 +
 doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv     |   60 +-
 doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv     |  360 ++---
 .../cublas75_k40_dtrsm_col_left_lower_unit.csv     |   31 +
 .../cublas75_k40_dtrsm_col_left_upper_unit.csv     |   31 +
 .../cublas75_k40_dtrsm_col_right_lower_unit.csv    |   31 +
 .../cublas75_k40_dtrsm_col_right_upper_unit.csv    |   31 +
 .../cuBLAS_7.5/Tesla_K40/cublas_cgemm_8.csv        |  721 +++++++++
 .../cuBLAS_7.5/Tesla_K40/cublas_dgemm_8.csv        |  721 +++++++++
 .../cuBLAS_7.5/Tesla_K40/cublas_sgemm_8.csv        |  721 +++++++++
 .../cuBLAS_7.5/Tesla_K40/cublas_zgemm_8.csv        |  721 +++++++++
 .../Tesla_K40/peak_dp.csv                          |    0
 .../Tesla_K40/peak_sp.csv                          |    0
 src/CMakeLists.txt                                 |   95 +-
 src/client/clfunc_common.hpp                       |   18 +-
 src/client/clfunc_xgemm.hpp                        |  192 ++-
 src/client/clfunc_xgemv.hpp                        |   22 +-
 src/client/clfunc_xger.hpp                         |   16 +-
 src/client/clfunc_xgerc.hpp                        |   12 +-
 src/client/clfunc_xgeru.hpp                        |   12 +-
 src/client/clfunc_xhemm.hpp                        |   34 +-
 src/client/clfunc_xhemv.hpp                        |   12 +-
 src/client/clfunc_xher.hpp                         |   10 +-
 src/client/clfunc_xher2.hpp                        |   12 +-
 src/client/clfunc_xher2k.hpp                       |   20 +-
 src/client/clfunc_xherk.hpp                        |   20 +-
 src/client/clfunc_xsymm.hpp                        |   58 +-
 src/client/clfunc_xsymv.hpp                        |   12 +-
 src/client/clfunc_xsyr.hpp                         |   10 +-
 src/client/clfunc_xsyr2.hpp                        |   12 +-
 src/client/clfunc_xsyr2k.hpp                       |   34 +-
 src/client/clfunc_xsyrk.hpp                        |   32 +-
 src/client/clfunc_xtrmm.hpp                        |   48 +-
 src/client/clfunc_xtrmv.hpp                        |   14 +-
 src/client/clfunc_xtrsm.hpp                        |   50 +-
 src/client/clfunc_xtrsv.hpp                        |   14 +-
 src/client/client.cpp                              |   12 +-
 src/include/msvc.h                                 |    2 +
 src/library/CMakeLists.txt                         |  468 +++++-
 src/library/OCLBinaryGenerator.cmake               |   86 ++
 src/library/bingen.cmake                           |    1 +
 src/library/blas/AutoGemm/.gitignore               |    4 +
 src/library/blas/AutoGemm/AutoGemm.py              |   47 +
 src/library/blas/AutoGemm/AutoGemmParameters.py    |  149 ++
 .../AutoGemmTools/AutoGemmPreCompileKernels.cpp    |  925 ++++++++++++
 .../AutoGemm/AutoGemmTools/AutoGemmUtil.h}         |   54 +-
 .../AutoGemm/AutoGemmTools/ProfileAutoGemm.cpp     | 1392 ++++++++++++++++++
 .../blas/AutoGemm/AutoGemmTools/TestAutoGemm.cpp   |  995 +++++++++++++
 src/library/blas/AutoGemm/Common.py                |   60 +
 src/library/blas/AutoGemm/Includes.py              |  465 ++++++
 src/library/blas/AutoGemm/KernelOpenCL.py          |  587 ++++++++
 src/library/blas/AutoGemm/KernelParameters.py      |  253 ++++
 src/library/blas/AutoGemm/KernelSelection.py       |  683 +++++++++
 src/library/blas/AutoGemm/KernelsToPreCompile.py   |   91 ++
 src/library/blas/AutoGemm/README.txt               |    0
 .../UserGemmKernelSources/UserGemmClKernels.h      |   23 +
 .../UserGemmKernelSourceIncludes.cpp               |   57 +
 .../UserGemmKernelSourceIncludes.h                 |   80 +
 .../dgemm_Col_NN_B0_MX048_NX048_KX08_src.cpp       |  203 +++
 .../dgemm_Col_NN_B1_MX048_NX048_KX08_src.cpp       |  203 +++
 .../dgemm_Col_NT_B0_MX048_NX048_KX08_src.cpp       |  196 +++
 .../dgemm_Col_NT_B1_MX048_NX048_KX08_src.cpp       |  193 +++
 .../dgemm_Col_TN_B0_MX048_NX048_KX08_src.cpp       |  195 +++
 .../dgemm_Col_TN_B1_MX048_NX048_KX08_src.cpp       |  195 +++
 .../sgemm_Col_NN_B0_MX032_NX032_KX16_src.cpp       |  129 ++
 .../sgemm_Col_NN_B0_MX064_NX064_KX16_src.cpp       |  160 ++
 .../sgemm_Col_NN_B0_MX096_NX096_KX16_src.cpp       |  208 +++
 ...sgemm_Col_NN_B1_MX032_NX032_KX16_BRANCH_src.cpp |  149 ++
 .../sgemm_Col_NN_B1_MX032_NX032_KX16_src.cpp       |  129 ++
 .../sgemm_Col_NN_B1_MX064_NX064_KX16_src.cpp       |  161 +++
 .../sgemm_Col_NN_B1_MX096_NX096_KX16_src.cpp       |  207 +++
 .../sgemm_Col_NT_B0_MX032_NX032_KX16_src.cpp       |  126 ++
 .../sgemm_Col_NT_B0_MX064_NX064_KX16_src.cpp       |  165 +++
 .../sgemm_Col_NT_B0_MX096_NX096_KX16_src.cpp       |  210 +++
 ...sgemm_Col_NT_B1_MX032_NX032_KX16_BRANCH_src.cpp |  148 ++
 ...sgemm_Col_NT_B1_MX032_NX032_KX16_SINGLE_src.cpp |  158 ++
 .../sgemm_Col_NT_B1_MX032_NX032_KX16_src.cpp       |  126 ++
 .../sgemm_Col_NT_B1_MX032_NX064_KX16_ROW_src.cpp   |  161 +++
 .../sgemm_Col_NT_B1_MX064_NX032_KX16_COL_src.cpp   |  157 ++
 .../sgemm_Col_NT_B1_MX064_NX064_KX16_src.cpp       |  160 ++
 .../sgemm_Col_NT_B1_MX096_NX096_KX16_src.cpp       |  208 +++
 .../sgemm_Col_NT_B1_MX128_NX128_KX16_src.cpp       |  290 ++++
 .../sgemm_Col_TN_B0_MX032_NX032_KX16_src.cpp       |  128 ++
 .../sgemm_Col_TN_B0_MX064_NX064_KX16_src.cpp       |  165 +++
 .../sgemm_Col_TN_B0_MX096_NX096_KX16_src.cpp       |  209 +++
 ...sgemm_Col_TN_B1_MX032_NX032_KX16_BRANCH_src.cpp |  148 ++
 .../sgemm_Col_TN_B1_MX032_NX032_KX16_src.cpp       |  127 ++
 .../sgemm_Col_TN_B1_MX064_NX064_KX16_src.cpp       |  165 +++
 .../sgemm_Col_TN_B1_MX096_NX096_KX16_src.cpp       |  209 +++
 src/library/blas/functor/functor.cc                |    3 +-
 src/library/blas/functor/hawaii.cc                 |   19 +
 .../blas/functor/hawaii_sgemmBig1024Kernel.cc      |  506 +++++++
 .../blas/functor/hawaii_sgemmSplitKernel.cc        |  147 ++
 ...mBranchKernel.h => hawaii_sgemmBig1024Kernel.h} |   18 +-
 src/library/blas/generic/binary_lookup.cc          |    6 +-
 src/library/blas/generic/common.c                  |   16 +-
 .../blas/gens/clTemplates/sgemm_gcn_bigMatrices.cl |  264 ++++
 src/library/blas/include/xgemm.h                   |   39 +
 src/library/blas/ixamax.c                          |   16 +-
 src/library/blas/specialCases/GemmSpecialCases.cpp |  994 +++++++++++++
 .../blas/specialCases/include/GemmSpecialCases.h   |   42 +
 src/library/blas/trtri/TrtriClKernels.h            |   44 +
 .../blas/trtri/TrtriKernelSourceIncludes.cpp       |   81 ++
 src/library/blas/trtri/TrtriKernelSourceIncludes.h |  124 ++
 .../blas/trtri/diag_dtrtri_lower_128_16.cpp        |  172 +++
 .../blas/trtri/diag_dtrtri_upper_128_16.cpp        |  151 ++
 .../blas/trtri/diag_dtrtri_upper_192_12.cpp        |  149 ++
 .../trtri/triple_dgemm_update_128_16_PART1_L.cpp   |  161 +++
 .../trtri/triple_dgemm_update_128_16_PART2_L.cpp   |  143 ++
 .../blas/trtri/triple_dgemm_update_128_16_R.cpp    |  239 +++
 .../trtri/triple_dgemm_update_128_32_PART1_L.cpp   |  150 ++
 .../trtri/triple_dgemm_update_128_32_PART1_R.cpp   |  151 ++
 .../trtri/triple_dgemm_update_128_32_PART2_L.cpp   |  135 ++
 .../trtri/triple_dgemm_update_128_32_PART2_R.cpp   |  136 ++
 .../trtri/triple_dgemm_update_128_64_PART1_L.cpp   |  145 ++
 .../trtri/triple_dgemm_update_128_64_PART1_R.cpp   |  145 ++
 .../trtri/triple_dgemm_update_128_64_PART2_L.cpp   |  133 ++
 .../trtri/triple_dgemm_update_128_64_PART2_R.cpp   |  134 ++
 .../triple_dgemm_update_128_ABOVE64_PART1_L.cpp    |  146 ++
 .../triple_dgemm_update_128_ABOVE64_PART1_R.cpp    |  144 ++
 .../triple_dgemm_update_128_ABOVE64_PART2_L.cpp    |  134 ++
 .../triple_dgemm_update_128_ABOVE64_PART2_R.cpp    |  135 ++
 .../triple_dgemm_update_128_ABOVE64_PART3_L.cpp    |   91 ++
 .../triple_dgemm_update_128_ABOVE64_PART3_R.cpp    |   94 ++
 .../blas/trtri/triple_dgemm_update_192_12_R.cpp    |  194 +++
 .../trtri/triple_dgemm_update_192_24_PART1_R.cpp   |  117 ++
 .../trtri/triple_dgemm_update_192_24_PART2_R.cpp   |  112 ++
 .../trtri/triple_dgemm_update_192_48_PART1_R.cpp   |  144 ++
 .../trtri/triple_dgemm_update_192_48_PART2_R.cpp   |  145 ++
 .../trtri/triple_dgemm_update_192_96_PART1_R.cpp   |  156 ++
 .../trtri/triple_dgemm_update_192_96_PART2_R.cpp   |  157 ++
 src/library/blas/xasum.c                           |   16 +-
 src/library/blas/xaxpy.c                           |    6 +
 src/library/blas/xcopy.c                           |    6 +
 src/library/blas/xdot.c                            |   20 +-
 src/library/blas/xgemm.cc                          |  872 ++++++++---
 src/library/blas/xger.c                            |    8 +
 src/library/blas/xher.c                            |    8 +-
 src/library/blas/xher2.c                           |    8 +
 src/library/blas/xrot.c                            |   12 +-
 src/library/blas/xrotg.c                           |   24 +-
 src/library/blas/xrotm.c                           |    8 +
 src/library/blas/xrotmg.c                          |   14 +
 src/library/blas/xscal.c                           |    8 +-
 src/library/blas/xswap.c                           |    6 +
 src/library/blas/xsymm.c                           |   19 +-
 src/library/blas/xsyr.c                            |    8 +-
 src/library/blas/xsyr2.c                           |    8 +
 src/library/blas/xtbmv.c                           |   16 +-
 src/library/blas/xtrmv.c                           |   16 +-
 src/library/blas/xtrsm.cc                          | 1525 ++++++++++++++++++++
 .../{bingen => OCLBinaryGenerator}/CMakeLists.txt  |   12 +-
 .../OCLBinaryGenerator/OCLBinaryGenerator.cpp      |  347 +++++
 src/scripts/perf/blasPerformanceTesting.py         |   14 +-
 src/tests/common.cpp                               |   29 +-
 src/tests/correctness/corr-gemm.cpp                |   12 +-
 src/tests/include/gemm.h                           |    6 +-
 194 files changed, 31788 insertions(+), 2258 deletions(-)
 create mode 100644 appveyor.yml
 delete mode 100644 debian/libclblas-doc.links
 delete mode 100644 debian/patches/debian-enable-multiarch.patch
 delete mode 100644 debian/patches/fix-missing-stdlib.patch
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/cgemmNT_S9150_14.50.2_2.6.0_8.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemmNT_S9150_14.50.2_2.6.0_8.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/sgemmNT_S9150_14.50.2_2.6.0_8.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemmNT_S9150_14.50.2_2.6.0_8.csv
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/README.txt (99%)
 create mode 100644 doc/performance/clBLAS_2.6.0/W9100/clblas_sgemmNT_w9100_14502.csv
 copy doc/performance/clBLAS_2.6.0/{S9150/sgemm_32.csv => W9100/dgemm_32.csv} (62%)
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/dgemm_96.csv (62%)
 copy doc/performance/clBLAS_2.6.0/{S9150/dtrsm_192.csv => W9100/dtrsm_w9100_14502.csv} (58%)
 copy doc/performance/{cuBLAS_7.0/Tesla_K40 => clBLAS_2.6.0/W9100}/peak_dp.csv (63%)
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/peak_sp.csv (63%)
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/zgemm_32.csv (62%)
 copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/zgemm_64.csv (62%)
 create mode 100644 doc/performance/clBLAS_2.7.1/S9150/cgemmNT_S9150_14.50.2_2.7.1_8.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/S9150/dgemmNT_S9150_14.50.2_2.7.1_8.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/S9150/sgemmNT_S9150_14.50.2_2.7.1_8.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/S9150/zgemmNT_S9150_14.50.2_2.7.1_8.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_left_lower_unit_14502.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_left_upper_unit_14502.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_right_lower_unit_14502.csv
 create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_right_upper_unit_14502.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_left_lower_unit.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_left_upper_unit.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_right_lower_unit.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_right_upper_unit.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_cgemm_8.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_dgemm_8.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_sgemm_8.csv
 create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_zgemm_8.csv
 copy doc/performance/{cuBLAS_7.0 => cuBLAS_7.5}/Tesla_K40/peak_dp.csv (100%)
 copy doc/performance/{cuBLAS_7.0 => cuBLAS_7.5}/Tesla_K40/peak_sp.csv (100%)
 create mode 100644 src/library/OCLBinaryGenerator.cmake
 create mode 100644 src/library/blas/AutoGemm/.gitignore
 create mode 100644 src/library/blas/AutoGemm/AutoGemm.py
 create mode 100644 src/library/blas/AutoGemm/AutoGemmParameters.py
 create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/AutoGemmPreCompileKernels.cpp
 copy src/library/{tools/ktest/naive/naive_blas.cpp => blas/AutoGemm/AutoGemmTools/AutoGemmUtil.h} (95%)
 create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/ProfileAutoGemm.cpp
 create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/TestAutoGemm.cpp
 create mode 100644 src/library/blas/AutoGemm/Common.py
 create mode 100644 src/library/blas/AutoGemm/Includes.py
 create mode 100644 src/library/blas/AutoGemm/KernelOpenCL.py
 create mode 100644 src/library/blas/AutoGemm/KernelParameters.py
 create mode 100644 src/library/blas/AutoGemm/KernelSelection.py
 create mode 100644 src/library/blas/AutoGemm/KernelsToPreCompile.py
 create mode 100644 src/library/blas/AutoGemm/README.txt
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmClKernels.h
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmKernelSourceIncludes.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmKernelSourceIncludes.h
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NN_B0_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NN_B1_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NT_B0_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NT_B1_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_TN_B0_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_TN_B1_MX048_NX048_KX08_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX032_NX032_KX16_BRANCH_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_BRANCH_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_SINGLE_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX064_KX16_ROW_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX064_NX032_KX16_COL_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX128_NX128_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX032_NX032_KX16_BRANCH_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX032_NX032_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX064_NX064_KX16_src.cpp
 create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX096_NX096_KX16_src.cpp
 create mode 100644 src/library/blas/functor/hawaii_sgemmBig1024Kernel.cc
 copy src/library/blas/functor/include/{hawaii_sgemmBranchKernel.h => hawaii_sgemmBig1024Kernel.h} (61%)
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn_bigMatrices.cl
 create mode 100644 src/library/blas/include/xgemm.h
 create mode 100644 src/library/blas/specialCases/GemmSpecialCases.cpp
 create mode 100644 src/library/blas/specialCases/include/GemmSpecialCases.h
 create mode 100644 src/library/blas/trtri/TrtriClKernels.h
 create mode 100644 src/library/blas/trtri/TrtriKernelSourceIncludes.cpp
 create mode 100644 src/library/blas/trtri/TrtriKernelSourceIncludes.h
 create mode 100644 src/library/blas/trtri/diag_dtrtri_lower_128_16.cpp
 create mode 100644 src/library/blas/trtri/diag_dtrtri_upper_128_16.cpp
 create mode 100644 src/library/blas/trtri/diag_dtrtri_upper_192_12.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_PART1_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_PART2_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART1_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART2_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART1_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART2_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART1_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART2_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART3_L.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART3_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_12_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_24_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_24_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_48_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_48_PART2_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_96_PART1_R.cpp
 create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_96_PART2_R.cpp
 copy src/library/tools/{bingen => OCLBinaryGenerator}/CMakeLists.txt (62%)
 create mode 100644 src/library/tools/OCLBinaryGenerator/OCLBinaryGenerator.cpp

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/clblas.git