[clblas] branch master updated (a6b3f9d -> 9731ea2)
Ghislain Vaillant
ghisvail-guest at moszumanska.debian.org
Fri Jul 24 22:49:42 UTC 2015
This is an automated email from the git hooks/post-receive script.
ghisvail-guest pushed a change to branch master
in repository clblas.
from a6b3f9d Merge pull request #98 from clMathLibraries/revert-97-master
new 434b38e enable offline compilation of a subset of GEMM and TRSM on targeted device
new 2dce4f5 minor bug fix
new 595c63b fix bug for small matrix when beta is 0
new 5c3f082 minor bug fix in client code
new 8dc95f9 Merge pull request #81 from TimmyLiu/develop
new d00b59a do not build bingen if offline compilation is disabled
new 38b342a Merge pull request #82 from TimmyLiu/develop
new 1795886 correctness fix
new eff87f9 fix travis CI build
new 0a6d431 Merge pull request #85 from TimmyLiu/develop
new a55d3ae Merge branch 'develop' of https://github.com/clMathLibraries/clBLAS into develop
new fda48a7 replacing barrier with memfence in the inner most loop requires an extra barrier at the beginning of the outer loop.
new 39b324d improve big sgemm column NN perf. improve small sgemm NN perf.
new f9e0160 Merge pull request #87 from TimmyLiu/develop
new 413819f bump develop version to 2.5
new fdcf987 Merge pull request #88 from TimmyLiu/develop
new 8ef0a43 some static kernel code clean up
new a280c96 improve sgemm column major TN small matrix perf. some type/bug fixes
new 5137231 Merge pull request #90 from TimmyLiu/develop
new 93b5b69 fix a very silly bug in compuing s/dtrsm flops.
new 8b41d5e Merge pull request #91 from TimmyLiu/develop
new c084b47 Ben : fixing bonaire path for sgemm using CL2.0 path
new 2ad3664 fixing a typo
new aa972ec chanching the heuristic to detect the small matrices
new d4163f4 Merge pull request #93 from BenjaminCoquelle/develop
new 7302f86 some typo fixes
new 573b487 Merge pull request #95 from TimmyLiu/develop
new 1972170 Fix install location of samples
new 9edf929 Merge pull request #75 from marbre/samples
new d8419d8 Install scripts/perf to share/clBLAS on non WIN32 systems
new f8af95c Merge pull request #74 from marbre/develop
new 2f845e2 fix cmake bug introduced by pull request #75
new 17b22e8 Merge pull request #96 from TimmyLiu/develop
new 46389ac added test for OSX detection to turn off CORR_TEST_WITH_ACML, refactored CMakeLists.txt in BUILD_TEST block
new f5d5adc Merge pull request #99 from lzamparo/cmake_fix
new 6d1e3c4 stop checking opencl major number in some routines
new f4af838 better handle sgemm NT where M and N are mod32 and not mod64. M and N are within range from 1184 to 3872
new 4447bfe Merge pull request #100 from TimmyLiu/develop
new 701210c fix undefined reference to symbol 'pthread_key_delete@@GLIBC_2.2.5'
new 1136350 Merge pull request #102 from lunochod/develop
new 60092c2 delete appendix in license file
new 2621814 Merge pull request #106 from TimmyLiu/develop
new b83750a Install cmake configuration to lib/cmake/clBLAS
new 77b3245 Merge pull request #105 from marbre/develop
new 6623809 adding zgemm kernel for hawaii
new 8580cdb fixed including gcn_zgemm.h
new 6f476b8 Merge pull request #107 from guacamoleo/develop
new bd13b7b enables apiCallCount for zgemm within client
new 03ae187 fixed zgemm offset bug; removed profiling from client
new f9a2250 Merge pull request #111 from guacamoleo/develop
new f7c6536 add codepath for dtrsm when M and N are mod192
new 828aff1 Merge pull request #112 from TimmyLiu/develop
new 262a1e1 add x86_64/sdk suffix as search location for libOpenCL.so when AMDAPPSDKROOT is used
new 2137cae Merge pull request #113 from lunochod/develop
new 5b922a7 python scripts should call clBLAS-client instead of client
new f3471bf Merge pull request #116 from TimmyLiu/develop
new 6311c6b adding performance data
new e058f67 fixed graph script
new 5005205 Merge pull request #118 from guacamoleo/develop
new 3f032e7 merge develop branch to master branch. Bump master branch version number to 2.6
new 9731ea2 Merge pull request #119 from TimmyLiu/master
The 61 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.
Summary of changes:
.gitignore | 3 +
.travis.yml | 22 +-
LICENSE | 25 -
README.md | 65 +-
doc/README-BinaryCacheOnDisk.txt | 69 +
doc/README-FunctorConcepts.txt | 100 +
doc/README-HowToIntroduceFunctors.txt | 402 ++
doc/README-TransformASolverIntoAFunctor.txt | 382 ++
doc/performance/clBLAS_2.6.0/S9150/README.txt | 35 +
doc/performance/clBLAS_2.6.0/S9150/dgemm_32.csv | 181 +
doc/performance/clBLAS_2.6.0/S9150/dgemm_96.csv | 61 +
doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv | 31 +
.../clBLAS_2.6.0/S9150/generate_graphs.sh | 92 +
doc/performance/clBLAS_2.6.0/S9150/peak_dp.csv | 181 +
doc/performance/clBLAS_2.6.0/S9150/peak_sp.csv | 181 +
doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv | 181 +
doc/performance/clBLAS_2.6.0/S9150/zgemm_32.csv | 181 +
doc/performance/clBLAS_2.6.0/S9150/zgemm_64.csv | 91 +
doc/performance/cuBLAS_7.0/Tesla_K40/README.txt | 35 +
doc/performance/cuBLAS_7.0/Tesla_K40/dgemm.csv | 181 +
doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv | 31 +
doc/performance/cuBLAS_7.0/Tesla_K40/peak_dp.csv | 181 +
doc/performance/cuBLAS_7.0/Tesla_K40/peak_sp.csv | 181 +
doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv | 181 +
doc/performance/cuBLAS_7.0/Tesla_K40/zgemm.csv | 181 +
src/CMakeLists.txt | 83 +-
src/FindOpenCL.cmake | 3 +-
src/clBLAS.def | 28 +
src/clBLAS.h | 622 ++
src/client/clfunc_common.hpp | 1 +
src/client/clfunc_xgemm.hpp | 53 +-
src/client/clfunc_xtrsm.hpp | 14 +-
src/client/client.cpp | 21 +-
src/flags_public.txt | 4 +
src/include/binary_lookup.h | 273 +
src/include/devinfo.h | 2 +
src/include/md5sum.h | 50 +
src/include/rwlock.h | 117 +
src/library/CMakeLists.txt | 282 +-
src/library/bingen.cmake | 144 +
src/library/blas/fill.cc | 272 +
src/library/blas/functor/bonaire.cc | 90 +
src/library/blas/functor/functor.cc | 117 +
src/library/blas/functor/functor_fill.cc | 156 +
src/library/blas/functor/functor_selector.cc | 344 ++
src/library/blas/functor/functor_xgemm.cc | 323 +
src/library/blas/functor/functor_xscal.cc | 410 ++
src/library/blas/functor/functor_xscal_generic.cc | 439 ++
src/library/blas/functor/functor_xtrsm.cc | 336 ++
src/library/blas/functor/gcn_dgemm.cc | 1035 ++++
src/library/blas/functor/gcn_dgemmCommon.cc | 997 +++
src/library/blas/functor/gcn_dgemmSmallMatrices.cc | 654 ++
src/library/blas/functor/gcn_sgemm.cc | 556 ++
src/library/blas/functor/gcn_sgemmSmallMatrices.cc | 558 ++
src/library/blas/functor/gcn_zgemm.cc | 354 ++
src/library/blas/functor/gpu_dtrsm.cc | 823 +++
src/library/blas/functor/gpu_dtrsm192.cc | 596 ++
src/library/blas/functor/hawaii.cc | 223 +
.../blas/functor/hawaii_dgemmChannelConflict.cc | 159 +
.../blas/functor/hawaii_dgemmSplitKernel.cc | 670 ++
.../blas/functor/hawaii_sgemmBranchKernel.cc | 442 ++
src/library/blas/functor/hawaii_sgemmSplit64_32.cc | 423 ++
.../blas/functor/hawaii_sgemmSplitKernel.cc | 858 +++
src/library/blas/functor/include/BinaryBuild.h | 10 +
src/library/blas/functor/include/atomic_counter.h | 173 +
src/library/blas/functor/include/bonaire.h | 41 +
src/library/blas/functor/include/functor.h | 496 ++
src/library/blas/functor/include/functor_fill.h | 99 +
.../functor/include/functor_hawaii_dgemm_NT_MN48.h | 210 +
.../blas/functor/include/functor_selector.h | 149 +
src/library/blas/functor/include/functor_utils.h | 116 +
src/library/blas/functor/include/functor_xgemm.h | 213 +
src/library/blas/functor/include/functor_xscal.h | 207 +
.../blas/functor/include/functor_xscal_generic.h | 173 +
src/library/blas/functor/include/functor_xtrsm.h | 203 +
src/library/blas/functor/include/gcn_dgemm.h | 59 +
src/library/blas/functor/include/gcn_dgemmCommon.h | 22 +
.../blas/functor/include/gcn_dgemmSmallMatrices.h | 27 +
src/library/blas/functor/include/gcn_sgemm.h | 62 +
.../blas/functor/include/gcn_sgemmSmallMatrices.h | 27 +
src/library/blas/functor/include/gcn_zgemm.h | 62 +
src/library/blas/functor/include/gpu_dtrsm.h | 28 +
src/library/blas/functor/include/gpu_dtrsm192.h | 28 +
src/library/blas/functor/include/hawaii.h | 42 +
.../functor/include/hawaii_dgemmChannelConflict.h | 22 +
.../blas/functor/include/hawaii_dgemmSplitKernel.h | 46 +
.../functor/include/hawaii_sgemmBranchKernel.h | 50 +
.../blas/functor/include/hawaii_sgemmSplit64_32.h | 46 +
.../blas/functor/include/hawaii_sgemmSplitKernel.h | 46 +
src/library/blas/functor/include/tahiti.h | 41 +
src/library/blas/functor/tahiti.cc | 120 +
src/library/blas/generic/binary_lookup.cc | 685 +++
src/library/blas/generic/common.c | 25 +-
src/library/blas/generic/common2.cc | 98 +
src/library/blas/generic/functor_cache.cc | 80 +
src/library/blas/generic/solution_seq_make.c | 4 +-
src/library/blas/gens/blas_kgen.h | 3 -
src/library/blas/gens/blas_subgroup.c | 6 +-
src/library/blas/gens/clTemplates/dgemm_NT_MN48.cl | 347 ++
.../gens/clTemplates/dgemm_gcn_SmallMatrices.cl | 1159 ++++
src/library/blas/gens/clTemplates/dgemm_hawai.cl | 6371 ++++++++++++++++++++
.../clTemplates/dgemm_hawaiiChannelConfilct.cl | 152 +
.../gens/clTemplates/dgemm_hawaiiSplitKernel.cl | 5043 ++++++++++++++++
src/library/blas/gens/clTemplates/dtrsm_gpu.cl | 2004 ++++++
src/library/blas/gens/clTemplates/dtrsm_gpu192.cl | 1031 ++++
src/library/blas/gens/clTemplates/sgemm_gcn.cl | 2083 +++++++
.../gens/clTemplates/sgemm_gcn_SmallMatrices.cl | 1036 ++++
.../gens/clTemplates/sgemm_hawaiiSplit64_32.cl | 530 ++
.../gens/clTemplates/sgemm_hawaiiSplitKernel.cl | 6179 +++++++++++++++++++
src/library/blas/gens/clTemplates/zgemm_gcn.cl | 319 +
src/library/blas/include/clblas-internal.h | 28 +
src/library/blas/init.c | 12 +
src/library/blas/matrix.c | 979 +++
src/library/blas/xgemm.c | 783 ---
src/library/blas/xgemm.cc | 328 +
src/library/blas/xscal.cc | 340 ++
src/library/blas/xtrsm.c | 249 -
src/library/blas/xtrsm.cc | 333 +
src/library/common/devinfo.c | 6 +
src/library/common/md5sum.c | 378 ++
src/library/common/rwlock.c | 172 +
.../tools/{tplgen => bingen}/CMakeLists.txt | 17 +-
src/library/tools/bingen/bingen.cpp | 512 ++
src/library/tools/ktest/CMakeLists.txt | 34 +-
src/library/tools/tplgen/tplgen.cpp | 85 +-
src/library/tools/tune/CMakeLists.txt | 33 +-
src/library/tools/tune/tune.c | 5 +-
src/samples/CMakeLists.txt | 21 +-
src/samples/example_csscal.c | 3 +-
src/scripts/perf/CMakeLists.txt | 6 +-
src/scripts/perf/blasPerformanceTesting.py | 4 +-
src/tests/CMakeLists.txt | 28 +-
src/tests/correctness/test-correctness.cpp | 3 +-
src/tests/performance/test-performance.cpp | 5 +-
134 files changed, 48857 insertions(+), 1266 deletions(-)
create mode 100644 doc/README-BinaryCacheOnDisk.txt
create mode 100644 doc/README-FunctorConcepts.txt
create mode 100644 doc/README-HowToIntroduceFunctors.txt
create mode 100644 doc/README-TransformASolverIntoAFunctor.txt
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/README.txt
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemm_32.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemm_96.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv
create mode 100755 doc/performance/clBLAS_2.6.0/S9150/generate_graphs.sh
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/peak_dp.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/peak_sp.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemm_32.csv
create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemm_64.csv
create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/README.txt
create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/dgemm.csv
create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv
create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/peak_dp.csv
create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/peak_sp.csv
create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv
create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/zgemm.csv
create mode 100644 src/flags_public.txt
create mode 100644 src/include/binary_lookup.h
create mode 100644 src/include/md5sum.h
create mode 100644 src/include/rwlock.h
create mode 100644 src/library/bingen.cmake
create mode 100644 src/library/blas/fill.cc
create mode 100644 src/library/blas/functor/bonaire.cc
create mode 100644 src/library/blas/functor/functor.cc
create mode 100644 src/library/blas/functor/functor_fill.cc
create mode 100644 src/library/blas/functor/functor_selector.cc
create mode 100644 src/library/blas/functor/functor_xgemm.cc
create mode 100644 src/library/blas/functor/functor_xscal.cc
create mode 100644 src/library/blas/functor/functor_xscal_generic.cc
create mode 100644 src/library/blas/functor/functor_xtrsm.cc
create mode 100644 src/library/blas/functor/gcn_dgemm.cc
create mode 100644 src/library/blas/functor/gcn_dgemmCommon.cc
create mode 100644 src/library/blas/functor/gcn_dgemmSmallMatrices.cc
create mode 100644 src/library/blas/functor/gcn_sgemm.cc
create mode 100644 src/library/blas/functor/gcn_sgemmSmallMatrices.cc
create mode 100644 src/library/blas/functor/gcn_zgemm.cc
create mode 100644 src/library/blas/functor/gpu_dtrsm.cc
create mode 100644 src/library/blas/functor/gpu_dtrsm192.cc
create mode 100644 src/library/blas/functor/hawaii.cc
create mode 100644 src/library/blas/functor/hawaii_dgemmChannelConflict.cc
create mode 100644 src/library/blas/functor/hawaii_dgemmSplitKernel.cc
create mode 100644 src/library/blas/functor/hawaii_sgemmBranchKernel.cc
create mode 100644 src/library/blas/functor/hawaii_sgemmSplit64_32.cc
create mode 100644 src/library/blas/functor/hawaii_sgemmSplitKernel.cc
create mode 100644 src/library/blas/functor/include/BinaryBuild.h
create mode 100644 src/library/blas/functor/include/atomic_counter.h
create mode 100644 src/library/blas/functor/include/bonaire.h
create mode 100644 src/library/blas/functor/include/functor.h
create mode 100644 src/library/blas/functor/include/functor_fill.h
create mode 100644 src/library/blas/functor/include/functor_hawaii_dgemm_NT_MN48.h
create mode 100644 src/library/blas/functor/include/functor_selector.h
create mode 100644 src/library/blas/functor/include/functor_utils.h
create mode 100644 src/library/blas/functor/include/functor_xgemm.h
create mode 100644 src/library/blas/functor/include/functor_xscal.h
create mode 100644 src/library/blas/functor/include/functor_xscal_generic.h
create mode 100644 src/library/blas/functor/include/functor_xtrsm.h
create mode 100644 src/library/blas/functor/include/gcn_dgemm.h
create mode 100644 src/library/blas/functor/include/gcn_dgemmCommon.h
create mode 100644 src/library/blas/functor/include/gcn_dgemmSmallMatrices.h
create mode 100644 src/library/blas/functor/include/gcn_sgemm.h
create mode 100644 src/library/blas/functor/include/gcn_sgemmSmallMatrices.h
create mode 100644 src/library/blas/functor/include/gcn_zgemm.h
create mode 100644 src/library/blas/functor/include/gpu_dtrsm.h
create mode 100644 src/library/blas/functor/include/gpu_dtrsm192.h
create mode 100644 src/library/blas/functor/include/hawaii.h
create mode 100644 src/library/blas/functor/include/hawaii_dgemmChannelConflict.h
create mode 100644 src/library/blas/functor/include/hawaii_dgemmSplitKernel.h
create mode 100644 src/library/blas/functor/include/hawaii_sgemmBranchKernel.h
create mode 100644 src/library/blas/functor/include/hawaii_sgemmSplit64_32.h
create mode 100644 src/library/blas/functor/include/hawaii_sgemmSplitKernel.h
create mode 100644 src/library/blas/functor/include/tahiti.h
create mode 100644 src/library/blas/functor/tahiti.cc
create mode 100644 src/library/blas/generic/binary_lookup.cc
create mode 100644 src/library/blas/generic/common2.cc
create mode 100644 src/library/blas/generic/functor_cache.cc
create mode 100644 src/library/blas/gens/clTemplates/dgemm_NT_MN48.cl
create mode 100644 src/library/blas/gens/clTemplates/dgemm_gcn_SmallMatrices.cl
create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawai.cl
create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawaiiChannelConfilct.cl
create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawaiiSplitKernel.cl
create mode 100644 src/library/blas/gens/clTemplates/dtrsm_gpu.cl
create mode 100644 src/library/blas/gens/clTemplates/dtrsm_gpu192.cl
create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn.cl
create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn_SmallMatrices.cl
create mode 100644 src/library/blas/gens/clTemplates/sgemm_hawaiiSplit64_32.cl
create mode 100644 src/library/blas/gens/clTemplates/sgemm_hawaiiSplitKernel.cl
create mode 100644 src/library/blas/gens/clTemplates/zgemm_gcn.cl
create mode 100644 src/library/blas/matrix.c
delete mode 100644 src/library/blas/xgemm.c
create mode 100644 src/library/blas/xgemm.cc
create mode 100644 src/library/blas/xscal.cc
delete mode 100644 src/library/blas/xtrsm.c
create mode 100644 src/library/blas/xtrsm.cc
create mode 100644 src/library/common/md5sum.c
create mode 100644 src/library/common/rwlock.c
copy src/library/tools/{tplgen => bingen}/CMakeLists.txt (61%)
create mode 100644 src/library/tools/bingen/bingen.cpp
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/clblas.git