[clblas] branch debian/sid updated (763f39c -> b5bd678)
Ghislain Vaillant
ghisvail-guest at moszumanska.debian.org
Wed Oct 28 11:52:33 UTC 2015
This is an automated email from the git hooks/post-receive script.
ghisvail-guest pushed a change to branch debian/sid
in repository clblas.
from 763f39c release to unstable
adds 6d7dcf8 bump develop branch version to 2.7
adds 63ca259 Merge pull request #120 from TimmyLiu/develop
adds 64d0ba3 add w9100 performance
adds 7f01bdf Removed printfs in case of invalid matrices from SYMM
adds edb7a50 Changed the order of matrix-size checking to A-B-C (consistent with other routines)
adds 4367daa Guarded debug printf statements in several level 1 and 2 routines
adds 5b5eb03 Merge pull request #126 from CNugteren/errorcheck_order
adds 9b14e88 Updated the computation of the matrix-buffer memory used in case of no unused tail
adds d114844 Simplified the calculation of the memory used for matrices
adds ce984ae Merge pull request #127 from CNugteren/mem_used_calculation
adds d3d36e0 fix sgemm NT perf drop when lda=ldb=6144 and k>1536
adds 458c9da fix sgemm NT perf drop when fix sgemm NT perf drop when lda=ldb=4096 or 5120 and k>lda/4
adds f3a10ab fix sgemm NT perf drop when fix sgemm NT perf drop when lda=ldb=7168 or 8192 and k>lda/4
adds 5a74faf code clean up
adds ecd89f9 Merge pull request #133 from TimmyLiu/develop
adds 45bb325 add missing includes on stdlib
adds dcf60db Merge pull request #134 from ghisvail/bugfix/missing-stdlib
adds 549df17 Fix redefinition warnings when using with clFFT
adds 8dd05f7 Merge pull request #135 from shehzan10/redef_fixes
adds f496d1c typo fix
adds 4b9a341 adding performance data
adds 40098f4 adding auto-gemm script
adds 33c5ca0 Merge pull request #138 from guacamoleo/develop-squash2
adds b8ed4fd release cl program
adds 4b34283 updating README
adds 2c5ab03 Merge pull request #139 from guacamoleo/develop
adds e7e01ad AutoGemm performance data; sgemm add unroll=8 for benchmarking; gemm compile kernel prints build log
adds f6ae9ac AutoGemm KernelOpenCL can generate standalone kernels
adds 644df17 Merge pull request #140 from guacamoleo/develop
adds acc6889 added numQueues to performance data
adds a6bdc3d Merge pull request #141 from guacamoleo/develop
adds 0a08f16 Integrating new travis and appveyor build yaml scripts
adds f496afa Fixed badge links for appveyor clmathlibraries project 5
adds c47ef12 Merge pull request #143 from kknox/ci
adds ba1bbdd dtrsm 192 trtri
adds 31c9214 mod192 dtrsm using dtrtri
adds 4d67a9e enable big dgemm with split calls
adds afe8fc0 enable output result with -p 1 in client
adds d6e6a78 dtrsm reenablment 192
adds c4e7964 bug fix
adds 4067d14 fix linux build
adds 5ee9e5f dtrsm lower left
adds 8e9bc4d dtrsm right side
adds a08507d attempt to fix macos build
adds 18404c0 Merge pull request #144 from TimmyLiu/develop
adds 4f204b2 finished dtrsm offline compile dev
adds 1ffeb0f fix linux build
adds 7cbbf9c Merge pull request #145 from TimmyLiu/develop
adds 7877094 fix VS 2015 build
adds ccb8bec Merge pull request #146 from TimmyLiu/develop
adds 55921b5 fix a install issue
adds 8299035 Merge branch 'develop' of https://github.com/TimmyLiu/clBLAS into develop
adds a459976 Merge pull request #147 from TimmyLiu/develop
adds 705d16e add dtrsm perf (clblas 2.7.1) on w9100 with 14502 driver
adds b942250 add k40 cublas 7.5 dtrsm data
adds c9a02b8 Merge pull request #148 from TimmyLiu/develop
adds d0b106e adding cublas data
adds f40ed62 Merge pull request #149 from guacamoleo/develop
adds ef47dda add link to windows master branch badge
adds feadbbb Merge pull request #150 from TimmyLiu/develop
adds 0482e1c merged develop to master; bumped version to 2.8.0
adds 8b5f7a0 Merge pull request #151 from guacamoleo/master
new c432c8e Merge tag 'upstream/v2.8' into debian/sid
new 29f9e79 d/patches:
  * Remove fix-missing-stdlib.patch, applied upstream.
  * Remove debian-enable-multiarch.patch, use SUFFIX_LIB cmake option.
  * Refresh disable-multilib-cflags.patch, fix-pthread-linkage.patch and use-boost-dynamic-libs.patch.
new 8b3de5c d/rules: use SUFFIX_LIB cmake option to provide multiarch install path
new 2327b3b Build documentation with arch-indep rules
new 77b3cb8 Remove system jquery symlinks in HTML documentation
new 82e0455 d/rules: further simplification and cleaning
new 669b93b d/rules: add missing call to dh_doxygen
new 18b8d1e Run `cme fix` on d/control and d/copyright
new f453463 d/changelog: add release information
new b5bd678 release to unstable
The 10 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.
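The distinction above between "adds" and "new" revisions can be checked mechanically. The following is only a minimal sketch, assuming each revision line has the form `<kind> <abbreviated-sha> <subject>` as in this notice; the function name `classify_revisions` and the regular expression are illustrative and are not part of the git hook.

```python
import re

# Assumed line shape: "<kind> <abbreviated sha> <subject>", e.g.
# "adds 6d7dcf8 bump develop branch version to 2.7".
REVISION_LINE = re.compile(r"^\s*(from|adds|new)\s+([0-9a-f]{7,40})\s+(.*)$")


def classify_revisions(lines):
    """Group revision lines by kind; lines that do not match are ignored."""
    groups = {"from": [], "adds": [], "new": []}
    for line in lines:
        match = REVISION_LINE.match(line)
        if match:
            kind, sha, subject = match.groups()
            groups[kind].append((sha, subject))
    return groups


# A few lines copied from the revision list in this notice.
sample = [
    "from 763f39c release to unstable",
    "adds 6d7dcf8 bump develop branch version to 2.7",
    "adds 8b5f7a0 Merge pull request #151 from guacamoleo/master",
    "new c432c8e Merge tag 'upstream/v2.8' into debian/sid",
    "new b5bd678 release to unstable",
]

groups = classify_revisions(sample)
for kind in ("from", "adds", "new"):
    print(kind, [sha for sha, _ in groups[kind]])
```

Run over the full revision list, the length of the "new" group should match the count stated in the paragraph above.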
Summary of changes:
.gitignore | 3 +
.travis.yml | 168 ++-
README.md | 57 +-
appveyor.yml | 105 ++
debian/changelog | 22 +
debian/control | 27 +-
debian/libclblas-doc.links | 1 -
debian/patches/debian-enable-multiarch.patch | 25 -
debian/patches/disable-multilib-cflags.patch | 2 +-
debian/patches/fix-missing-stdlib.patch | 33 -
debian/patches/fix-pthread-linkage.patch | 2 +-
debian/patches/series | 2 -
debian/patches/use-boost-dynamic-libs.patch | 2 +-
debian/rules | 27 +-
doc/clBLAS.doxy | 6 +-
.../S9150/cgemmNT_S9150_14.50.2_2.6.0_8.csv | 721 +++++++++
.../S9150/dgemmNT_S9150_14.50.2_2.6.0_8.csv | 721 +++++++++
doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv | 60 +-
.../S9150/sgemmNT_S9150_14.50.2_2.6.0_8.csv | 721 +++++++++
doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv | 360 ++---
.../S9150/zgemmNT_S9150_14.50.2_2.6.0_8.csv | 721 +++++++++
.../clBLAS_2.6.0/{S9150 => W9100}/README.txt | 2 +-
.../W9100/clblas_sgemmNT_w9100_14502.csv | 181 +++
.../{S9150/sgemm_32.csv => W9100/dgemm_32.csv} | 360 ++---
.../clBLAS_2.6.0/{S9150 => W9100}/dgemm_96.csv | 120 +-
.../dtrsm_192.csv => W9100/dtrsm_w9100_14502.csv} | 60 +-
.../Tesla_K40 => clBLAS_2.6.0/W9100}/peak_dp.csv | 360 ++---
.../clBLAS_2.6.0/{S9150 => W9100}/peak_sp.csv | 360 ++---
.../clBLAS_2.6.0/{S9150 => W9100}/zgemm_32.csv | 360 ++---
.../clBLAS_2.6.0/{S9150 => W9100}/zgemm_64.csv | 180 +--
.../S9150/cgemmNT_S9150_14.50.2_2.7.1_8.csv | 721 +++++++++
.../S9150/dgemmNT_S9150_14.50.2_2.7.1_8.csv | 721 +++++++++
.../S9150/sgemmNT_S9150_14.50.2_2.7.1_8.csv | 721 +++++++++
.../S9150/zgemmNT_S9150_14.50.2_2.7.1_8.csv | 721 +++++++++
...as271_w9100_dtrsm_col_left_lower_unit_14502.csv | 31 +
...as271_w9100_dtrsm_col_left_upper_unit_14502.csv | 31 +
...s271_w9100_dtrsm_col_right_lower_unit_14502.csv | 31 +
...s271_w9100_dtrsm_col_right_upper_unit_14502.csv | 31 +
doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv | 60 +-
doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv | 360 ++---
.../cublas75_k40_dtrsm_col_left_lower_unit.csv | 31 +
.../cublas75_k40_dtrsm_col_left_upper_unit.csv | 31 +
.../cublas75_k40_dtrsm_col_right_lower_unit.csv | 31 +
.../cublas75_k40_dtrsm_col_right_upper_unit.csv | 31 +
.../cuBLAS_7.5/Tesla_K40/cublas_cgemm_8.csv | 721 +++++++++
.../cuBLAS_7.5/Tesla_K40/cublas_dgemm_8.csv | 721 +++++++++
.../cuBLAS_7.5/Tesla_K40/cublas_sgemm_8.csv | 721 +++++++++
.../cuBLAS_7.5/Tesla_K40/cublas_zgemm_8.csv | 721 +++++++++
.../Tesla_K40/peak_dp.csv | 0
.../Tesla_K40/peak_sp.csv | 0
src/CMakeLists.txt | 95 +-
src/client/clfunc_common.hpp | 18 +-
src/client/clfunc_xgemm.hpp | 192 ++-
src/client/clfunc_xgemv.hpp | 22 +-
src/client/clfunc_xger.hpp | 16 +-
src/client/clfunc_xgerc.hpp | 12 +-
src/client/clfunc_xgeru.hpp | 12 +-
src/client/clfunc_xhemm.hpp | 34 +-
src/client/clfunc_xhemv.hpp | 12 +-
src/client/clfunc_xher.hpp | 10 +-
src/client/clfunc_xher2.hpp | 12 +-
src/client/clfunc_xher2k.hpp | 20 +-
src/client/clfunc_xherk.hpp | 20 +-
src/client/clfunc_xsymm.hpp | 58 +-
src/client/clfunc_xsymv.hpp | 12 +-
src/client/clfunc_xsyr.hpp | 10 +-
src/client/clfunc_xsyr2.hpp | 12 +-
src/client/clfunc_xsyr2k.hpp | 34 +-
src/client/clfunc_xsyrk.hpp | 32 +-
src/client/clfunc_xtrmm.hpp | 48 +-
src/client/clfunc_xtrmv.hpp | 14 +-
src/client/clfunc_xtrsm.hpp | 50 +-
src/client/clfunc_xtrsv.hpp | 14 +-
src/client/client.cpp | 12 +-
src/include/msvc.h | 2 +
src/library/CMakeLists.txt | 468 +++++-
src/library/OCLBinaryGenerator.cmake | 86 ++
src/library/bingen.cmake | 1 +
src/library/blas/AutoGemm/.gitignore | 4 +
src/library/blas/AutoGemm/AutoGemm.py | 47 +
src/library/blas/AutoGemm/AutoGemmParameters.py | 149 ++
.../AutoGemmTools/AutoGemmPreCompileKernels.cpp | 925 ++++++++++++
.../AutoGemm/AutoGemmTools/AutoGemmUtil.h} | 54 +-
.../AutoGemm/AutoGemmTools/ProfileAutoGemm.cpp | 1392 ++++++++++++++++++
.../blas/AutoGemm/AutoGemmTools/TestAutoGemm.cpp | 995 +++++++++++++
src/library/blas/AutoGemm/Common.py | 60 +
src/library/blas/AutoGemm/Includes.py | 465 ++++++
src/library/blas/AutoGemm/KernelOpenCL.py | 587 ++++++++
src/library/blas/AutoGemm/KernelParameters.py | 253 ++++
src/library/blas/AutoGemm/KernelSelection.py | 683 +++++++++
src/library/blas/AutoGemm/KernelsToPreCompile.py | 91 ++
src/library/blas/AutoGemm/README.txt | 0
.../UserGemmKernelSources/UserGemmClKernels.h | 23 +
.../UserGemmKernelSourceIncludes.cpp | 57 +
.../UserGemmKernelSourceIncludes.h | 80 +
.../dgemm_Col_NN_B0_MX048_NX048_KX08_src.cpp | 203 +++
.../dgemm_Col_NN_B1_MX048_NX048_KX08_src.cpp | 203 +++
.../dgemm_Col_NT_B0_MX048_NX048_KX08_src.cpp | 196 +++
.../dgemm_Col_NT_B1_MX048_NX048_KX08_src.cpp | 193 +++
.../dgemm_Col_TN_B0_MX048_NX048_KX08_src.cpp | 195 +++
.../dgemm_Col_TN_B1_MX048_NX048_KX08_src.cpp | 195 +++
.../sgemm_Col_NN_B0_MX032_NX032_KX16_src.cpp | 129 ++
.../sgemm_Col_NN_B0_MX064_NX064_KX16_src.cpp | 160 ++
.../sgemm_Col_NN_B0_MX096_NX096_KX16_src.cpp | 208 +++
...sgemm_Col_NN_B1_MX032_NX032_KX16_BRANCH_src.cpp | 149 ++
.../sgemm_Col_NN_B1_MX032_NX032_KX16_src.cpp | 129 ++
.../sgemm_Col_NN_B1_MX064_NX064_KX16_src.cpp | 161 +++
.../sgemm_Col_NN_B1_MX096_NX096_KX16_src.cpp | 207 +++
.../sgemm_Col_NT_B0_MX032_NX032_KX16_src.cpp | 126 ++
.../sgemm_Col_NT_B0_MX064_NX064_KX16_src.cpp | 165 +++
.../sgemm_Col_NT_B0_MX096_NX096_KX16_src.cpp | 210 +++
...sgemm_Col_NT_B1_MX032_NX032_KX16_BRANCH_src.cpp | 148 ++
...sgemm_Col_NT_B1_MX032_NX032_KX16_SINGLE_src.cpp | 158 ++
.../sgemm_Col_NT_B1_MX032_NX032_KX16_src.cpp | 126 ++
.../sgemm_Col_NT_B1_MX032_NX064_KX16_ROW_src.cpp | 161 +++
.../sgemm_Col_NT_B1_MX064_NX032_KX16_COL_src.cpp | 157 ++
.../sgemm_Col_NT_B1_MX064_NX064_KX16_src.cpp | 160 ++
.../sgemm_Col_NT_B1_MX096_NX096_KX16_src.cpp | 208 +++
.../sgemm_Col_NT_B1_MX128_NX128_KX16_src.cpp | 290 ++++
.../sgemm_Col_TN_B0_MX032_NX032_KX16_src.cpp | 128 ++
.../sgemm_Col_TN_B0_MX064_NX064_KX16_src.cpp | 165 +++
.../sgemm_Col_TN_B0_MX096_NX096_KX16_src.cpp | 209 +++
...sgemm_Col_TN_B1_MX032_NX032_KX16_BRANCH_src.cpp | 148 ++
.../sgemm_Col_TN_B1_MX032_NX032_KX16_src.cpp | 127 ++
.../sgemm_Col_TN_B1_MX064_NX064_KX16_src.cpp | 165 +++
.../sgemm_Col_TN_B1_MX096_NX096_KX16_src.cpp | 209 +++
src/library/blas/functor/functor.cc | 3 +-
src/library/blas/functor/hawaii.cc | 19 +
.../blas/functor/hawaii_sgemmBig1024Kernel.cc | 506 +++++++
.../blas/functor/hawaii_sgemmSplitKernel.cc | 147 ++
...mBranchKernel.h => hawaii_sgemmBig1024Kernel.h} | 18 +-
src/library/blas/generic/binary_lookup.cc | 6 +-
src/library/blas/generic/common.c | 16 +-
.../blas/gens/clTemplates/sgemm_gcn_bigMatrices.cl | 264 ++++
src/library/blas/include/xgemm.h | 39 +
src/library/blas/ixamax.c | 16 +-
src/library/blas/specialCases/GemmSpecialCases.cpp | 994 +++++++++++++
.../blas/specialCases/include/GemmSpecialCases.h | 42 +
src/library/blas/trtri/TrtriClKernels.h | 44 +
.../blas/trtri/TrtriKernelSourceIncludes.cpp | 81 ++
src/library/blas/trtri/TrtriKernelSourceIncludes.h | 124 ++
.../blas/trtri/diag_dtrtri_lower_128_16.cpp | 172 +++
.../blas/trtri/diag_dtrtri_upper_128_16.cpp | 151 ++
.../blas/trtri/diag_dtrtri_upper_192_12.cpp | 149 ++
.../trtri/triple_dgemm_update_128_16_PART1_L.cpp | 161 +++
.../trtri/triple_dgemm_update_128_16_PART2_L.cpp | 143 ++
.../blas/trtri/triple_dgemm_update_128_16_R.cpp | 239 +++
.../trtri/triple_dgemm_update_128_32_PART1_L.cpp | 150 ++
.../trtri/triple_dgemm_update_128_32_PART1_R.cpp | 151 ++
.../trtri/triple_dgemm_update_128_32_PART2_L.cpp | 135 ++
.../trtri/triple_dgemm_update_128_32_PART2_R.cpp | 136 ++
.../trtri/triple_dgemm_update_128_64_PART1_L.cpp | 145 ++
.../trtri/triple_dgemm_update_128_64_PART1_R.cpp | 145 ++
.../trtri/triple_dgemm_update_128_64_PART2_L.cpp | 133 ++
.../trtri/triple_dgemm_update_128_64_PART2_R.cpp | 134 ++
.../triple_dgemm_update_128_ABOVE64_PART1_L.cpp | 146 ++
.../triple_dgemm_update_128_ABOVE64_PART1_R.cpp | 144 ++
.../triple_dgemm_update_128_ABOVE64_PART2_L.cpp | 134 ++
.../triple_dgemm_update_128_ABOVE64_PART2_R.cpp | 135 ++
.../triple_dgemm_update_128_ABOVE64_PART3_L.cpp | 91 ++
.../triple_dgemm_update_128_ABOVE64_PART3_R.cpp | 94 ++
.../blas/trtri/triple_dgemm_update_192_12_R.cpp | 194 +++
.../trtri/triple_dgemm_update_192_24_PART1_R.cpp | 117 ++
.../trtri/triple_dgemm_update_192_24_PART2_R.cpp | 112 ++
.../trtri/triple_dgemm_update_192_48_PART1_R.cpp | 144 ++
.../trtri/triple_dgemm_update_192_48_PART2_R.cpp | 145 ++
.../trtri/triple_dgemm_update_192_96_PART1_R.cpp | 156 ++
.../trtri/triple_dgemm_update_192_96_PART2_R.cpp | 157 ++
src/library/blas/xasum.c | 16 +-
src/library/blas/xaxpy.c | 6 +
src/library/blas/xcopy.c | 6 +
src/library/blas/xdot.c | 20 +-
src/library/blas/xgemm.cc | 872 ++++++++---
src/library/blas/xger.c | 8 +
src/library/blas/xher.c | 8 +-
src/library/blas/xher2.c | 8 +
src/library/blas/xrot.c | 12 +-
src/library/blas/xrotg.c | 24 +-
src/library/blas/xrotm.c | 8 +
src/library/blas/xrotmg.c | 14 +
src/library/blas/xscal.c | 8 +-
src/library/blas/xswap.c | 6 +
src/library/blas/xsymm.c | 19 +-
src/library/blas/xsyr.c | 8 +-
src/library/blas/xsyr2.c | 8 +
src/library/blas/xtbmv.c | 16 +-
src/library/blas/xtrmv.c | 16 +-
src/library/blas/xtrsm.cc | 1525 ++++++++++++++++++++
.../{bingen => OCLBinaryGenerator}/CMakeLists.txt | 12 +-
.../OCLBinaryGenerator/OCLBinaryGenerator.cpp | 347 +++++
src/scripts/perf/blasPerformanceTesting.py | 14 +-
src/tests/common.cpp | 29 +-
src/tests/correctness/corr-gemm.cpp | 12 +-
src/tests/include/gemm.h | 6 +-
194 files changed, 31788 insertions(+), 2258 deletions(-)
create mode 100644 appveyor.yml
delete mode 100644 debian/libclblas-doc.links
delete mode 100644 debian/patches/debian-enable-multiarch.patch
delete mode 100644 debian/patches/fix-missing-stdlib.patch
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/cgemmNT_S9150_14.50.2_2.6.0_8.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemmNT_S9150_14.50.2_2.6.0_8.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/sgemmNT_S9150_14.50.2_2.6.0_8.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemmNT_S9150_14.50.2_2.6.0_8.csv
copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/README.txt (99%)
create mode 100644 doc/performance/clBLAS_2.6.0/W9100/clblas_sgemmNT_w9100_14502.csv
copy doc/performance/clBLAS_2.6.0/{S9150/sgemm_32.csv => W9100/dgemm_32.csv} (62%)
copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/dgemm_96.csv (62%)
copy doc/performance/clBLAS_2.6.0/{S9150/dtrsm_192.csv => W9100/dtrsm_w9100_14502.csv} (58%)
copy doc/performance/{cuBLAS_7.0/Tesla_K40 => clBLAS_2.6.0/W9100}/peak_dp.csv (63%)
copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/peak_sp.csv (63%)
copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/zgemm_32.csv (62%)
copy doc/performance/clBLAS_2.6.0/{S9150 => W9100}/zgemm_64.csv (62%)
create mode 100644 doc/performance/clBLAS_2.7.1/S9150/cgemmNT_S9150_14.50.2_2.7.1_8.csv
create mode 100644 doc/performance/clBLAS_2.7.1/S9150/dgemmNT_S9150_14.50.2_2.7.1_8.csv
create mode 100644 doc/performance/clBLAS_2.7.1/S9150/sgemmNT_S9150_14.50.2_2.7.1_8.csv
create mode 100644 doc/performance/clBLAS_2.7.1/S9150/zgemmNT_S9150_14.50.2_2.7.1_8.csv
create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_left_lower_unit_14502.csv
create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_left_upper_unit_14502.csv
create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_right_lower_unit_14502.csv
create mode 100644 doc/performance/clBLAS_2.7.1/W9100/clblas271_w9100_dtrsm_col_right_upper_unit_14502.csv
create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_left_lower_unit.csv
create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_left_upper_unit.csv
create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_right_lower_unit.csv
create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas75_k40_dtrsm_col_right_upper_unit.csv
create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_cgemm_8.csv
create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_dgemm_8.csv
create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_sgemm_8.csv
create mode 100644 doc/performance/cuBLAS_7.5/Tesla_K40/cublas_zgemm_8.csv
copy doc/performance/{cuBLAS_7.0 => cuBLAS_7.5}/Tesla_K40/peak_dp.csv (100%)
copy doc/performance/{cuBLAS_7.0 => cuBLAS_7.5}/Tesla_K40/peak_sp.csv (100%)
create mode 100644 src/library/OCLBinaryGenerator.cmake
create mode 100644 src/library/blas/AutoGemm/.gitignore
create mode 100644 src/library/blas/AutoGemm/AutoGemm.py
create mode 100644 src/library/blas/AutoGemm/AutoGemmParameters.py
create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/AutoGemmPreCompileKernels.cpp
copy src/library/{tools/ktest/naive/naive_blas.cpp => blas/AutoGemm/AutoGemmTools/AutoGemmUtil.h} (95%)
create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/ProfileAutoGemm.cpp
create mode 100644 src/library/blas/AutoGemm/AutoGemmTools/TestAutoGemm.cpp
create mode 100644 src/library/blas/AutoGemm/Common.py
create mode 100644 src/library/blas/AutoGemm/Includes.py
create mode 100644 src/library/blas/AutoGemm/KernelOpenCL.py
create mode 100644 src/library/blas/AutoGemm/KernelParameters.py
create mode 100644 src/library/blas/AutoGemm/KernelSelection.py
create mode 100644 src/library/blas/AutoGemm/KernelsToPreCompile.py
create mode 100644 src/library/blas/AutoGemm/README.txt
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmClKernels.h
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmKernelSourceIncludes.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/UserGemmKernelSourceIncludes.h
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NN_B0_MX048_NX048_KX08_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NN_B1_MX048_NX048_KX08_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NT_B0_MX048_NX048_KX08_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_NT_B1_MX048_NX048_KX08_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_TN_B0_MX048_NX048_KX08_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/dgemm_Col_TN_B1_MX048_NX048_KX08_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX032_NX032_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX064_NX064_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B0_MX096_NX096_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX032_NX032_KX16_BRANCH_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX032_NX032_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX064_NX064_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NN_B1_MX096_NX096_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX032_NX032_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX064_NX064_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B0_MX096_NX096_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_BRANCH_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_SINGLE_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX032_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX032_NX064_KX16_ROW_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX064_NX032_KX16_COL_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX064_NX064_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX096_NX096_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_NT_B1_MX128_NX128_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX032_NX032_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX064_NX064_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B0_MX096_NX096_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX032_NX032_KX16_BRANCH_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX032_NX032_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX064_NX064_KX16_src.cpp
create mode 100644 src/library/blas/AutoGemm/UserGemmKernelSources/sgemm_Col_TN_B1_MX096_NX096_KX16_src.cpp
create mode 100644 src/library/blas/functor/hawaii_sgemmBig1024Kernel.cc
copy src/library/blas/functor/include/{hawaii_sgemmBranchKernel.h => hawaii_sgemmBig1024Kernel.h} (61%)
create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn_bigMatrices.cl
create mode 100644 src/library/blas/include/xgemm.h
create mode 100644 src/library/blas/specialCases/GemmSpecialCases.cpp
create mode 100644 src/library/blas/specialCases/include/GemmSpecialCases.h
create mode 100644 src/library/blas/trtri/TrtriClKernels.h
create mode 100644 src/library/blas/trtri/TrtriKernelSourceIncludes.cpp
create mode 100644 src/library/blas/trtri/TrtriKernelSourceIncludes.h
create mode 100644 src/library/blas/trtri/diag_dtrtri_lower_128_16.cpp
create mode 100644 src/library/blas/trtri/diag_dtrtri_upper_128_16.cpp
create mode 100644 src/library/blas/trtri/diag_dtrtri_upper_192_12.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_PART1_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_PART2_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_16_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART1_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART1_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART2_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_32_PART2_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART1_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART1_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART2_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_64_PART2_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART1_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART1_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART2_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART2_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART3_L.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_128_ABOVE64_PART3_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_12_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_24_PART1_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_24_PART2_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_48_PART1_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_48_PART2_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_96_PART1_R.cpp
create mode 100644 src/library/blas/trtri/triple_dgemm_update_192_96_PART2_R.cpp
copy src/library/tools/{bingen => OCLBinaryGenerator}/CMakeLists.txt (62%)
create mode 100644 src/library/tools/OCLBinaryGenerator/OCLBinaryGenerator.cpp
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/clblas.git