[clblas] branch master updated (a6b3f9d -> 9731ea2)

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Fri Jul 24 22:49:42 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a change to branch master
in repository clblas.

      from  a6b3f9d   Merge pull request #98 from clMathLibraries/revert-97-master
       new  434b38e   enable offline compilation of a subset of GEMM and TRSM on targeted device
       new  2dce4f5   minor bug fix
       new  595c63b   fix bug for small matrix when beta is 0
       new  5c3f082   minor bug fix in client code
       new  8dc95f9   Merge pull request #81 from TimmyLiu/develop
       new  d00b59a   do not build bingen if offline compilation is disabled
       new  38b342a   Merge pull request #82 from TimmyLiu/develop
       new  1795886   correctness fix
       new  eff87f9   fix travis CI build
       new  0a6d431   Merge pull request #85 from TimmyLiu/develop
       new  a55d3ae   Merge branch 'develop' of https://github.com/clMathLibraries/clBLAS into develop
       new  fda48a7   replacing barrier with memfence in the inner most loop requires an extra barrier at the beginning of the outer loop.
       new  39b324d   improve big sgemm column NN perf. improve small sgemm NN perf.
       new  f9e0160   Merge pull request #87 from TimmyLiu/develop
       new  413819f   bump develop version to 2.5
       new  fdcf987   Merge pull request #88 from TimmyLiu/develop
       new  8ef0a43   some static kernel code clean up
       new  a280c96   improve sgemm column major TN small matrix perf. some type/bug fixes
       new  5137231   Merge pull request #90 from TimmyLiu/develop
       new  93b5b69   fix a very silly bug in compuing s/dtrsm flops.
       new  8b41d5e   Merge pull request #91 from TimmyLiu/develop
       new  c084b47   Ben : fixing bonaire path for sgemm using CL2.0 path
       new  2ad3664   fixing a typo
       new  aa972ec   chanching the heuristic to detect the small matrices
       new  d4163f4   Merge pull request #93 from BenjaminCoquelle/develop
       new  7302f86   some typo fixes
       new  573b487   Merge pull request #95 from TimmyLiu/develop
       new  1972170   Fix install location of samples
       new  9edf929   Merge pull request #75 from marbre/samples
       new  d8419d8   Install scripts/perf to share/clBLAS on non WIN32 systems
       new  f8af95c   Merge pull request #74 from marbre/develop
       new  2f845e2   fix cmake bug introduced by pull request #75
       new  17b22e8   Merge pull request #96 from TimmyLiu/develop
       new  46389ac   added test for OSX detection to turn off CORR_TEST_WITH_ACML, refactored CMakeLists.txt in BUILD_TEST block
       new  f5d5adc   Merge pull request #99 from lzamparo/cmake_fix
       new  6d1e3c4   stop checking opencl major number in some routines
       new  f4af838   better handle sgemm NT where M and N are mod32 and not mod64. M and N are within range from 1184 to 3872
       new  4447bfe   Merge pull request #100 from TimmyLiu/develop
       new  701210c   fix undefined reference to symbol 'pthread_key_delete@@GLIBC_2.2.5'
       new  1136350   Merge pull request #102 from lunochod/develop
       new  60092c2   delete appendix in license file
       new  2621814   Merge pull request #106 from TimmyLiu/develop
       new  b83750a   Install cmake configuration to lib/cmake/clBLAS
       new  77b3245   Merge pull request #105 from marbre/develop
       new  6623809   adding zgemm kernel for hawaii
       new  8580cdb   fixed including gcn_zgemm.h
       new  6f476b8   Merge pull request #107 from guacamoleo/develop
       new  bd13b7b   enables apiCallCount for zgemm within client
       new  03ae187   fixed zgemm offset bug; removed profiling from client
       new  f9a2250   Merge pull request #111 from guacamoleo/develop
       new  f7c6536   add codepath for dtrsm when M and N are mod192
       new  828aff1   Merge pull request #112 from TimmyLiu/develop
       new  262a1e1   add x86_64/sdk suffix as search location for libOpenCL.so when AMDAPPSDKROOT is used
       new  2137cae   Merge pull request #113 from lunochod/develop
       new  5b922a7   python scripts should call clBLAS-client instead of client
       new  f3471bf   Merge pull request #116 from TimmyLiu/develop
       new  6311c6b   adding performance data
       new  e058f67   fixed graph script
       new  5005205   Merge pull request #118 from guacamoleo/develop
       new  3f032e7   merge develop branch to master branch. Bump master branch version number to 2.6
       new  9731ea2   Merge pull request #119 from TimmyLiu/master

The 61 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .gitignore                                         |    3 +
 .travis.yml                                        |   22 +-
 LICENSE                                            |   25 -
 README.md                                          |   65 +-
 doc/README-BinaryCacheOnDisk.txt                   |   69 +
 doc/README-FunctorConcepts.txt                     |  100 +
 doc/README-HowToIntroduceFunctors.txt              |  402 ++
 doc/README-TransformASolverIntoAFunctor.txt        |  382 ++
 doc/performance/clBLAS_2.6.0/S9150/README.txt      |   35 +
 doc/performance/clBLAS_2.6.0/S9150/dgemm_32.csv    |  181 +
 doc/performance/clBLAS_2.6.0/S9150/dgemm_96.csv    |   61 +
 doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv   |   31 +
 .../clBLAS_2.6.0/S9150/generate_graphs.sh          |   92 +
 doc/performance/clBLAS_2.6.0/S9150/peak_dp.csv     |  181 +
 doc/performance/clBLAS_2.6.0/S9150/peak_sp.csv     |  181 +
 doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv    |  181 +
 doc/performance/clBLAS_2.6.0/S9150/zgemm_32.csv    |  181 +
 doc/performance/clBLAS_2.6.0/S9150/zgemm_64.csv    |   91 +
 doc/performance/cuBLAS_7.0/Tesla_K40/README.txt    |   35 +
 doc/performance/cuBLAS_7.0/Tesla_K40/dgemm.csv     |  181 +
 doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv     |   31 +
 doc/performance/cuBLAS_7.0/Tesla_K40/peak_dp.csv   |  181 +
 doc/performance/cuBLAS_7.0/Tesla_K40/peak_sp.csv   |  181 +
 doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv     |  181 +
 doc/performance/cuBLAS_7.0/Tesla_K40/zgemm.csv     |  181 +
 src/CMakeLists.txt                                 |   83 +-
 src/FindOpenCL.cmake                               |    3 +-
 src/clBLAS.def                                     |   28 +
 src/clBLAS.h                                       |  622 ++
 src/client/clfunc_common.hpp                       |    1 +
 src/client/clfunc_xgemm.hpp                        |   53 +-
 src/client/clfunc_xtrsm.hpp                        |   14 +-
 src/client/client.cpp                              |   21 +-
 src/flags_public.txt                               |    4 +
 src/include/binary_lookup.h                        |  273 +
 src/include/devinfo.h                              |    2 +
 src/include/md5sum.h                               |   50 +
 src/include/rwlock.h                               |  117 +
 src/library/CMakeLists.txt                         |  282 +-
 src/library/bingen.cmake                           |  144 +
 src/library/blas/fill.cc                           |  272 +
 src/library/blas/functor/bonaire.cc                |   90 +
 src/library/blas/functor/functor.cc                |  117 +
 src/library/blas/functor/functor_fill.cc           |  156 +
 src/library/blas/functor/functor_selector.cc       |  344 ++
 src/library/blas/functor/functor_xgemm.cc          |  323 +
 src/library/blas/functor/functor_xscal.cc          |  410 ++
 src/library/blas/functor/functor_xscal_generic.cc  |  439 ++
 src/library/blas/functor/functor_xtrsm.cc          |  336 ++
 src/library/blas/functor/gcn_dgemm.cc              | 1035 ++++
 src/library/blas/functor/gcn_dgemmCommon.cc        |  997 +++
 src/library/blas/functor/gcn_dgemmSmallMatrices.cc |  654 ++
 src/library/blas/functor/gcn_sgemm.cc              |  556 ++
 src/library/blas/functor/gcn_sgemmSmallMatrices.cc |  558 ++
 src/library/blas/functor/gcn_zgemm.cc              |  354 ++
 src/library/blas/functor/gpu_dtrsm.cc              |  823 +++
 src/library/blas/functor/gpu_dtrsm192.cc           |  596 ++
 src/library/blas/functor/hawaii.cc                 |  223 +
 .../blas/functor/hawaii_dgemmChannelConflict.cc    |  159 +
 .../blas/functor/hawaii_dgemmSplitKernel.cc        |  670 ++
 .../blas/functor/hawaii_sgemmBranchKernel.cc       |  442 ++
 src/library/blas/functor/hawaii_sgemmSplit64_32.cc |  423 ++
 .../blas/functor/hawaii_sgemmSplitKernel.cc        |  858 +++
 src/library/blas/functor/include/BinaryBuild.h     |   10 +
 src/library/blas/functor/include/atomic_counter.h  |  173 +
 src/library/blas/functor/include/bonaire.h         |   41 +
 src/library/blas/functor/include/functor.h         |  496 ++
 src/library/blas/functor/include/functor_fill.h    |   99 +
 .../functor/include/functor_hawaii_dgemm_NT_MN48.h |  210 +
 .../blas/functor/include/functor_selector.h        |  149 +
 src/library/blas/functor/include/functor_utils.h   |  116 +
 src/library/blas/functor/include/functor_xgemm.h   |  213 +
 src/library/blas/functor/include/functor_xscal.h   |  207 +
 .../blas/functor/include/functor_xscal_generic.h   |  173 +
 src/library/blas/functor/include/functor_xtrsm.h   |  203 +
 src/library/blas/functor/include/gcn_dgemm.h       |   59 +
 src/library/blas/functor/include/gcn_dgemmCommon.h |   22 +
 .../blas/functor/include/gcn_dgemmSmallMatrices.h  |   27 +
 src/library/blas/functor/include/gcn_sgemm.h       |   62 +
 .../blas/functor/include/gcn_sgemmSmallMatrices.h  |   27 +
 src/library/blas/functor/include/gcn_zgemm.h       |   62 +
 src/library/blas/functor/include/gpu_dtrsm.h       |   28 +
 src/library/blas/functor/include/gpu_dtrsm192.h    |   28 +
 src/library/blas/functor/include/hawaii.h          |   42 +
 .../functor/include/hawaii_dgemmChannelConflict.h  |   22 +
 .../blas/functor/include/hawaii_dgemmSplitKernel.h |   46 +
 .../functor/include/hawaii_sgemmBranchKernel.h     |   50 +
 .../blas/functor/include/hawaii_sgemmSplit64_32.h  |   46 +
 .../blas/functor/include/hawaii_sgemmSplitKernel.h |   46 +
 src/library/blas/functor/include/tahiti.h          |   41 +
 src/library/blas/functor/tahiti.cc                 |  120 +
 src/library/blas/generic/binary_lookup.cc          |  685 +++
 src/library/blas/generic/common.c                  |   25 +-
 src/library/blas/generic/common2.cc                |   98 +
 src/library/blas/generic/functor_cache.cc          |   80 +
 src/library/blas/generic/solution_seq_make.c       |    4 +-
 src/library/blas/gens/blas_kgen.h                  |    3 -
 src/library/blas/gens/blas_subgroup.c              |    6 +-
 src/library/blas/gens/clTemplates/dgemm_NT_MN48.cl |  347 ++
 .../gens/clTemplates/dgemm_gcn_SmallMatrices.cl    | 1159 ++++
 src/library/blas/gens/clTemplates/dgemm_hawai.cl   | 6371 ++++++++++++++++++++
 .../clTemplates/dgemm_hawaiiChannelConfilct.cl     |  152 +
 .../gens/clTemplates/dgemm_hawaiiSplitKernel.cl    | 5043 ++++++++++++++++
 src/library/blas/gens/clTemplates/dtrsm_gpu.cl     | 2004 ++++++
 src/library/blas/gens/clTemplates/dtrsm_gpu192.cl  | 1031 ++++
 src/library/blas/gens/clTemplates/sgemm_gcn.cl     | 2083 +++++++
 .../gens/clTemplates/sgemm_gcn_SmallMatrices.cl    | 1036 ++++
 .../gens/clTemplates/sgemm_hawaiiSplit64_32.cl     |  530 ++
 .../gens/clTemplates/sgemm_hawaiiSplitKernel.cl    | 6179 +++++++++++++++++++
 src/library/blas/gens/clTemplates/zgemm_gcn.cl     |  319 +
 src/library/blas/include/clblas-internal.h         |   28 +
 src/library/blas/init.c                            |   12 +
 src/library/blas/matrix.c                          |  979 +++
 src/library/blas/xgemm.c                           |  783 ---
 src/library/blas/xgemm.cc                          |  328 +
 src/library/blas/xscal.cc                          |  340 ++
 src/library/blas/xtrsm.c                           |  249 -
 src/library/blas/xtrsm.cc                          |  333 +
 src/library/common/devinfo.c                       |    6 +
 src/library/common/md5sum.c                        |  378 ++
 src/library/common/rwlock.c                        |  172 +
 .../tools/{tplgen => bingen}/CMakeLists.txt        |   17 +-
 src/library/tools/bingen/bingen.cpp                |  512 ++
 src/library/tools/ktest/CMakeLists.txt             |   34 +-
 src/library/tools/tplgen/tplgen.cpp                |   85 +-
 src/library/tools/tune/CMakeLists.txt              |   33 +-
 src/library/tools/tune/tune.c                      |    5 +-
 src/samples/CMakeLists.txt                         |   21 +-
 src/samples/example_csscal.c                       |    3 +-
 src/scripts/perf/CMakeLists.txt                    |    6 +-
 src/scripts/perf/blasPerformanceTesting.py         |    4 +-
 src/tests/CMakeLists.txt                           |   28 +-
 src/tests/correctness/test-correctness.cpp         |    3 +-
 src/tests/performance/test-performance.cpp         |    5 +-
 134 files changed, 48857 insertions(+), 1266 deletions(-)
 create mode 100644 doc/README-BinaryCacheOnDisk.txt
 create mode 100644 doc/README-FunctorConcepts.txt
 create mode 100644 doc/README-HowToIntroduceFunctors.txt
 create mode 100644 doc/README-TransformASolverIntoAFunctor.txt
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/README.txt
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemm_32.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemm_96.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv
 create mode 100755 doc/performance/clBLAS_2.6.0/S9150/generate_graphs.sh
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/peak_dp.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/peak_sp.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemm_32.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemm_64.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/README.txt
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/dgemm.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/peak_dp.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/peak_sp.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/zgemm.csv
 create mode 100644 src/flags_public.txt
 create mode 100644 src/include/binary_lookup.h
 create mode 100644 src/include/md5sum.h
 create mode 100644 src/include/rwlock.h
 create mode 100644 src/library/bingen.cmake
 create mode 100644 src/library/blas/fill.cc
 create mode 100644 src/library/blas/functor/bonaire.cc
 create mode 100644 src/library/blas/functor/functor.cc
 create mode 100644 src/library/blas/functor/functor_fill.cc
 create mode 100644 src/library/blas/functor/functor_selector.cc
 create mode 100644 src/library/blas/functor/functor_xgemm.cc
 create mode 100644 src/library/blas/functor/functor_xscal.cc
 create mode 100644 src/library/blas/functor/functor_xscal_generic.cc
 create mode 100644 src/library/blas/functor/functor_xtrsm.cc
 create mode 100644 src/library/blas/functor/gcn_dgemm.cc
 create mode 100644 src/library/blas/functor/gcn_dgemmCommon.cc
 create mode 100644 src/library/blas/functor/gcn_dgemmSmallMatrices.cc
 create mode 100644 src/library/blas/functor/gcn_sgemm.cc
 create mode 100644 src/library/blas/functor/gcn_sgemmSmallMatrices.cc
 create mode 100644 src/library/blas/functor/gcn_zgemm.cc
 create mode 100644 src/library/blas/functor/gpu_dtrsm.cc
 create mode 100644 src/library/blas/functor/gpu_dtrsm192.cc
 create mode 100644 src/library/blas/functor/hawaii.cc
 create mode 100644 src/library/blas/functor/hawaii_dgemmChannelConflict.cc
 create mode 100644 src/library/blas/functor/hawaii_dgemmSplitKernel.cc
 create mode 100644 src/library/blas/functor/hawaii_sgemmBranchKernel.cc
 create mode 100644 src/library/blas/functor/hawaii_sgemmSplit64_32.cc
 create mode 100644 src/library/blas/functor/hawaii_sgemmSplitKernel.cc
 create mode 100644 src/library/blas/functor/include/BinaryBuild.h
 create mode 100644 src/library/blas/functor/include/atomic_counter.h
 create mode 100644 src/library/blas/functor/include/bonaire.h
 create mode 100644 src/library/blas/functor/include/functor.h
 create mode 100644 src/library/blas/functor/include/functor_fill.h
 create mode 100644 src/library/blas/functor/include/functor_hawaii_dgemm_NT_MN48.h
 create mode 100644 src/library/blas/functor/include/functor_selector.h
 create mode 100644 src/library/blas/functor/include/functor_utils.h
 create mode 100644 src/library/blas/functor/include/functor_xgemm.h
 create mode 100644 src/library/blas/functor/include/functor_xscal.h
 create mode 100644 src/library/blas/functor/include/functor_xscal_generic.h
 create mode 100644 src/library/blas/functor/include/functor_xtrsm.h
 create mode 100644 src/library/blas/functor/include/gcn_dgemm.h
 create mode 100644 src/library/blas/functor/include/gcn_dgemmCommon.h
 create mode 100644 src/library/blas/functor/include/gcn_dgemmSmallMatrices.h
 create mode 100644 src/library/blas/functor/include/gcn_sgemm.h
 create mode 100644 src/library/blas/functor/include/gcn_sgemmSmallMatrices.h
 create mode 100644 src/library/blas/functor/include/gcn_zgemm.h
 create mode 100644 src/library/blas/functor/include/gpu_dtrsm.h
 create mode 100644 src/library/blas/functor/include/gpu_dtrsm192.h
 create mode 100644 src/library/blas/functor/include/hawaii.h
 create mode 100644 src/library/blas/functor/include/hawaii_dgemmChannelConflict.h
 create mode 100644 src/library/blas/functor/include/hawaii_dgemmSplitKernel.h
 create mode 100644 src/library/blas/functor/include/hawaii_sgemmBranchKernel.h
 create mode 100644 src/library/blas/functor/include/hawaii_sgemmSplit64_32.h
 create mode 100644 src/library/blas/functor/include/hawaii_sgemmSplitKernel.h
 create mode 100644 src/library/blas/functor/include/tahiti.h
 create mode 100644 src/library/blas/functor/tahiti.cc
 create mode 100644 src/library/blas/generic/binary_lookup.cc
 create mode 100644 src/library/blas/generic/common2.cc
 create mode 100644 src/library/blas/generic/functor_cache.cc
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_NT_MN48.cl
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_gcn_SmallMatrices.cl
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawai.cl
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawaiiChannelConfilct.cl
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawaiiSplitKernel.cl
 create mode 100644 src/library/blas/gens/clTemplates/dtrsm_gpu.cl
 create mode 100644 src/library/blas/gens/clTemplates/dtrsm_gpu192.cl
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn.cl
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn_SmallMatrices.cl
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_hawaiiSplit64_32.cl
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_hawaiiSplitKernel.cl
 create mode 100644 src/library/blas/gens/clTemplates/zgemm_gcn.cl
 create mode 100644 src/library/blas/matrix.c
 delete mode 100644 src/library/blas/xgemm.c
 create mode 100644 src/library/blas/xgemm.cc
 create mode 100644 src/library/blas/xscal.cc
 delete mode 100644 src/library/blas/xtrsm.c
 create mode 100644 src/library/blas/xtrsm.cc
 create mode 100644 src/library/common/md5sum.c
 create mode 100644 src/library/common/rwlock.c
 copy src/library/tools/{tplgen => bingen}/CMakeLists.txt (61%)
 create mode 100644 src/library/tools/bingen/bingen.cpp

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/clblas.git


