[clblas] branch debian/sid updated (c531d41 -> 516a02e)

Ghislain Vaillant ghisvail-guest at moszumanska.debian.org
Tue Aug 4 15:35:42 UTC 2015


This is an automated email from the git hooks/post-receive script.

ghisvail-guest pushed a change to branch debian/sid
in repository clblas.

      from  c531d41   d/changelog: release to unstable
      adds  434b38e   enable offline compilation of a subset of GEMM and TRSM on targeted device
      adds  2dce4f5   minor bug fix
      adds  595c63b   fix bug for small matrix when beta is 0
      adds  5c3f082   minor bug fix in client code
      adds  8dc95f9   Merge pull request #81 from TimmyLiu/develop
      adds  d00b59a   do not build bingen if offline compilation is disabled
      adds  38b342a   Merge pull request #82 from TimmyLiu/develop
      adds  1795886   correctness fix
      adds  eff87f9   fix travis CI build
      adds  0a6d431   Merge pull request #85 from TimmyLiu/develop
      adds  a55d3ae   Merge branch 'develop' of https://github.com/clMathLibraries/clBLAS into develop
      adds  fda48a7   replacing barrier with memfence in the inner most loop requires an extra barrier at the beginning of the outer loop.
      adds  39b324d   improve big sgemm column NN perf. improve small sgemm NN perf.
      adds  f9e0160   Merge pull request #87 from TimmyLiu/develop
      adds  413819f   bump develop version to 2.5
      adds  fdcf987   Merge pull request #88 from TimmyLiu/develop
      adds  8ef0a43   some static kernel code clean up
      adds  a280c96   improve sgemm column major TN small matrix perf. some type/bug fixes
      adds  5137231   Merge pull request #90 from TimmyLiu/develop
      adds  93b5b69   fix a very silly bug in compuing s/dtrsm flops.
      adds  8b41d5e   Merge pull request #91 from TimmyLiu/develop
      adds  c084b47   Ben : fixing bonaire path for sgemm using CL2.0 path
      adds  2ad3664   fixing a typo
      adds  aa972ec   chanching the heuristic to detect the small matrices
      adds  d4163f4   Merge pull request #93 from BenjaminCoquelle/develop
      adds  7302f86   some typo fixes
      adds  573b487   Merge pull request #95 from TimmyLiu/develop
      adds  1972170   Fix install location of samples
      adds  9edf929   Merge pull request #75 from marbre/samples
      adds  d8419d8   Install scripts/perf to share/clBLAS on non WIN32 systems
      adds  f8af95c   Merge pull request #74 from marbre/develop
      adds  2f845e2   fix cmake bug introduced by pull request #75
      adds  17b22e8   Merge pull request #96 from TimmyLiu/develop
      adds  46389ac   added test for OSX detection to turn off CORR_TEST_WITH_ACML, refactored CMakeLists.txt in BUILD_TEST block
      adds  f5d5adc   Merge pull request #99 from lzamparo/cmake_fix
      adds  6d1e3c4   stop checking opencl major number in some routines
      adds  f4af838   better handle sgemm NT where M and N are mod32 and not mod64. M and N are within range from 1184 to 3872
      adds  4447bfe   Merge pull request #100 from TimmyLiu/develop
      adds  701210c   fix undefined reference to symbol 'pthread_key_delete@@GLIBC_2.2.5'
      adds  1136350   Merge pull request #102 from lunochod/develop
      adds  60092c2   delete appendix in license file
      adds  2621814   Merge pull request #106 from TimmyLiu/develop
      adds  b83750a   Install cmake configuration to lib/cmake/clBLAS
      adds  77b3245   Merge pull request #105 from marbre/develop
      adds  6623809   adding zgemm kernel for hawaii
      adds  8580cdb   fixed including gcn_zgemm.h
      adds  6f476b8   Merge pull request #107 from guacamoleo/develop
      adds  bd13b7b   enables apiCallCount for zgemm within client
      adds  03ae187   fixed zgemm offset bug; removed profiling from client
      adds  f9a2250   Merge pull request #111 from guacamoleo/develop
      adds  f7c6536   add codepath for dtrsm when M and N are mod192
      adds  828aff1   Merge pull request #112 from TimmyLiu/develop
      adds  262a1e1   add x86_64/sdk suffix as search location for libOpenCL.so when AMDAPPSDKROOT is used
      adds  2137cae   Merge pull request #113 from lunochod/develop
      adds  5b922a7   python scripts should call clBLAS-client instead of client
      adds  f3471bf   Merge pull request #116 from TimmyLiu/develop
      adds  6311c6b   adding performance data
      adds  e058f67   fixed graph script
      adds  5005205   Merge pull request #118 from guacamoleo/develop
      adds  3f032e7   merge develop branch to master branch. Bump master branch version number to 2.6
      adds  9731ea2   Merge pull request #119 from TimmyLiu/master
       new  8be809d   Merge tag 'upstream/v2.6' into debian/experimental
       new  4a6859c   d/changelog: bump dversion, switch to unreleased
       new  9be369f   d/p: refresh patches
       new  b42faf3   d/rules: update cmake build options
       new  a71e799   d/p: add patch fixing missing pthread linkage
       new  bcc1339   d/p: break doxygen patch down to more specific patches
       new  516a02e   d/changelog: release to unstable

The 7 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .gitignore                                         |    3 +
 .travis.yml                                        |   22 +-
 LICENSE                                            |   25 -
 README.md                                          |   65 +-
 debian/changelog                                   |   11 +
 debian/patches/debian-enable-multiarch.patch       |    6 +-
 debian/patches/disable-multilib-cflags.patch       |    6 +-
 debian/patches/fix-docs-output.patch               |   18 +
 debian/patches/fix-doxygen-settings.patch          |   34 -
 debian/patches/fix-pthread-linkage.patch           |   18 +
 debian/patches/reproducible-build.patch            |   17 +
 debian/patches/series                              |    5 +-
 debian/patches/use-system-mathjax.patch            |   16 +
 debian/rules                                       |    3 +-
 doc/README-BinaryCacheOnDisk.txt                   |   69 +
 doc/README-FunctorConcepts.txt                     |  100 +
 doc/README-HowToIntroduceFunctors.txt              |  402 ++
 doc/README-TransformASolverIntoAFunctor.txt        |  382 ++
 doc/performance/clBLAS_2.6.0/S9150/README.txt      |   35 +
 doc/performance/clBLAS_2.6.0/S9150/dgemm_32.csv    |  181 +
 doc/performance/clBLAS_2.6.0/S9150/dgemm_96.csv    |   61 +
 doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv   |   31 +
 .../clBLAS_2.6.0/S9150/generate_graphs.sh          |   92 +
 doc/performance/clBLAS_2.6.0/S9150/peak_dp.csv     |  181 +
 doc/performance/clBLAS_2.6.0/S9150/peak_sp.csv     |  181 +
 doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv    |  181 +
 doc/performance/clBLAS_2.6.0/S9150/zgemm_32.csv    |  181 +
 doc/performance/clBLAS_2.6.0/S9150/zgemm_64.csv    |   91 +
 doc/performance/cuBLAS_7.0/Tesla_K40/README.txt    |   35 +
 doc/performance/cuBLAS_7.0/Tesla_K40/dgemm.csv     |  181 +
 doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv     |   31 +
 doc/performance/cuBLAS_7.0/Tesla_K40/peak_dp.csv   |  181 +
 doc/performance/cuBLAS_7.0/Tesla_K40/peak_sp.csv   |  181 +
 doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv     |  181 +
 doc/performance/cuBLAS_7.0/Tesla_K40/zgemm.csv     |  181 +
 src/CMakeLists.txt                                 |   83 +-
 src/FindOpenCL.cmake                               |    3 +-
 src/clBLAS.def                                     |   28 +
 src/clBLAS.h                                       |  622 ++
 src/client/clfunc_common.hpp                       |    1 +
 src/client/clfunc_xgemm.hpp                        |   53 +-
 src/client/clfunc_xtrsm.hpp                        |   14 +-
 src/client/client.cpp                              |   21 +-
 src/flags_public.txt                               |    4 +
 src/include/binary_lookup.h                        |  273 +
 src/include/devinfo.h                              |    2 +
 src/include/md5sum.h                               |   50 +
 src/include/rwlock.h                               |  117 +
 src/library/CMakeLists.txt                         |  282 +-
 src/library/bingen.cmake                           |  144 +
 src/library/blas/fill.cc                           |  272 +
 src/library/blas/functor/bonaire.cc                |   90 +
 src/library/blas/functor/functor.cc                |  117 +
 src/library/blas/functor/functor_fill.cc           |  156 +
 src/library/blas/functor/functor_selector.cc       |  344 ++
 src/library/blas/functor/functor_xgemm.cc          |  323 +
 src/library/blas/functor/functor_xscal.cc          |  410 ++
 src/library/blas/functor/functor_xscal_generic.cc  |  439 ++
 src/library/blas/functor/functor_xtrsm.cc          |  336 ++
 src/library/blas/functor/gcn_dgemm.cc              | 1035 ++++
 src/library/blas/functor/gcn_dgemmCommon.cc        |  997 +++
 src/library/blas/functor/gcn_dgemmSmallMatrices.cc |  654 ++
 src/library/blas/functor/gcn_sgemm.cc              |  556 ++
 src/library/blas/functor/gcn_sgemmSmallMatrices.cc |  558 ++
 src/library/blas/functor/gcn_zgemm.cc              |  354 ++
 src/library/blas/functor/gpu_dtrsm.cc              |  823 +++
 src/library/blas/functor/gpu_dtrsm192.cc           |  596 ++
 src/library/blas/functor/hawaii.cc                 |  223 +
 .../blas/functor/hawaii_dgemmChannelConflict.cc    |  159 +
 .../blas/functor/hawaii_dgemmSplitKernel.cc        |  670 ++
 .../blas/functor/hawaii_sgemmBranchKernel.cc       |  442 ++
 src/library/blas/functor/hawaii_sgemmSplit64_32.cc |  423 ++
 .../blas/functor/hawaii_sgemmSplitKernel.cc        |  858 +++
 src/library/blas/functor/include/BinaryBuild.h     |   10 +
 src/library/blas/functor/include/atomic_counter.h  |  173 +
 src/library/blas/functor/include/bonaire.h         |   41 +
 src/library/blas/functor/include/functor.h         |  496 ++
 src/library/blas/functor/include/functor_fill.h    |   99 +
 .../functor/include/functor_hawaii_dgemm_NT_MN48.h |  210 +
 .../blas/functor/include/functor_selector.h        |  149 +
 src/library/blas/functor/include/functor_utils.h   |  116 +
 src/library/blas/functor/include/functor_xgemm.h   |  213 +
 src/library/blas/functor/include/functor_xscal.h   |  207 +
 .../blas/functor/include/functor_xscal_generic.h   |  173 +
 src/library/blas/functor/include/functor_xtrsm.h   |  203 +
 src/library/blas/functor/include/gcn_dgemm.h       |   59 +
 src/library/blas/functor/include/gcn_dgemmCommon.h |   22 +
 .../blas/functor/include/gcn_dgemmSmallMatrices.h  |   27 +
 src/library/blas/functor/include/gcn_sgemm.h       |   62 +
 .../blas/functor/include/gcn_sgemmSmallMatrices.h  |   27 +
 src/library/blas/functor/include/gcn_zgemm.h       |   62 +
 src/library/blas/functor/include/gpu_dtrsm.h       |   28 +
 src/library/blas/functor/include/gpu_dtrsm192.h    |   28 +
 src/library/blas/functor/include/hawaii.h          |   42 +
 .../functor/include/hawaii_dgemmChannelConflict.h  |   22 +
 .../blas/functor/include/hawaii_dgemmSplitKernel.h |   46 +
 .../functor/include/hawaii_sgemmBranchKernel.h     |   50 +
 .../blas/functor/include/hawaii_sgemmSplit64_32.h  |   46 +
 .../blas/functor/include/hawaii_sgemmSplitKernel.h |   46 +
 src/library/blas/functor/include/tahiti.h          |   41 +
 src/library/blas/functor/tahiti.cc                 |  120 +
 src/library/blas/generic/binary_lookup.cc          |  685 +++
 src/library/blas/generic/common.c                  |   25 +-
 src/library/blas/generic/common2.cc                |   98 +
 src/library/blas/generic/functor_cache.cc          |   80 +
 src/library/blas/generic/solution_seq_make.c       |    4 +-
 src/library/blas/gens/blas_kgen.h                  |    3 -
 src/library/blas/gens/blas_subgroup.c              |    6 +-
 src/library/blas/gens/clTemplates/dgemm_NT_MN48.cl |  347 ++
 .../gens/clTemplates/dgemm_gcn_SmallMatrices.cl    | 1159 ++++
 src/library/blas/gens/clTemplates/dgemm_hawai.cl   | 6371 ++++++++++++++++++++
 .../clTemplates/dgemm_hawaiiChannelConfilct.cl     |  152 +
 .../gens/clTemplates/dgemm_hawaiiSplitKernel.cl    | 5043 ++++++++++++++++
 src/library/blas/gens/clTemplates/dtrsm_gpu.cl     | 2004 ++++++
 src/library/blas/gens/clTemplates/dtrsm_gpu192.cl  | 1031 ++++
 src/library/blas/gens/clTemplates/sgemm_gcn.cl     | 2083 +++++++
 .../gens/clTemplates/sgemm_gcn_SmallMatrices.cl    | 1036 ++++
 .../gens/clTemplates/sgemm_hawaiiSplit64_32.cl     |  530 ++
 .../gens/clTemplates/sgemm_hawaiiSplitKernel.cl    | 6179 +++++++++++++++++++
 src/library/blas/gens/clTemplates/zgemm_gcn.cl     |  319 +
 src/library/blas/include/clblas-internal.h         |   28 +
 src/library/blas/init.c                            |   12 +
 src/library/blas/matrix.c                          |  979 +++
 src/library/blas/xgemm.c                           |  783 ---
 src/library/blas/xgemm.cc                          |  328 +
 src/library/blas/xscal.cc                          |  340 ++
 src/library/blas/xtrsm.c                           |  249 -
 src/library/blas/xtrsm.cc                          |  333 +
 src/library/common/devinfo.c                       |    6 +
 src/library/common/md5sum.c                        |  378 ++
 src/library/common/rwlock.c                        |  172 +
 .../tools/{tplgen => bingen}/CMakeLists.txt        |   17 +-
 src/library/tools/bingen/bingen.cpp                |  512 ++
 src/library/tools/ktest/CMakeLists.txt             |   34 +-
 src/library/tools/tplgen/tplgen.cpp                |   85 +-
 src/library/tools/tune/CMakeLists.txt              |   33 +-
 src/library/tools/tune/tune.c                      |    5 +-
 src/samples/CMakeLists.txt                         |   21 +-
 src/samples/example_csscal.c                       |    3 +-
 src/scripts/perf/CMakeLists.txt                    |    6 +-
 src/scripts/perf/blasPerformanceTesting.py         |    4 +-
 src/tests/CMakeLists.txt                           |   28 +-
 src/tests/correctness/test-correctness.cpp         |    3 +-
 src/tests/performance/test-performance.cpp         |    5 +-
 144 files changed, 48949 insertions(+), 1308 deletions(-)
 create mode 100644 debian/patches/fix-docs-output.patch
 delete mode 100644 debian/patches/fix-doxygen-settings.patch
 create mode 100644 debian/patches/fix-pthread-linkage.patch
 create mode 100644 debian/patches/reproducible-build.patch
 create mode 100644 debian/patches/use-system-mathjax.patch
 create mode 100644 doc/README-BinaryCacheOnDisk.txt
 create mode 100644 doc/README-FunctorConcepts.txt
 create mode 100644 doc/README-HowToIntroduceFunctors.txt
 create mode 100644 doc/README-TransformASolverIntoAFunctor.txt
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/README.txt
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemm_32.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dgemm_96.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/dtrsm_192.csv
 create mode 100755 doc/performance/clBLAS_2.6.0/S9150/generate_graphs.sh
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/peak_dp.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/peak_sp.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/sgemm_32.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemm_32.csv
 create mode 100644 doc/performance/clBLAS_2.6.0/S9150/zgemm_64.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/README.txt
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/dgemm.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/dtrsm.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/peak_dp.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/peak_sp.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/sgemm.csv
 create mode 100644 doc/performance/cuBLAS_7.0/Tesla_K40/zgemm.csv
 create mode 100644 src/flags_public.txt
 create mode 100644 src/include/binary_lookup.h
 create mode 100644 src/include/md5sum.h
 create mode 100644 src/include/rwlock.h
 create mode 100644 src/library/bingen.cmake
 create mode 100644 src/library/blas/fill.cc
 create mode 100644 src/library/blas/functor/bonaire.cc
 create mode 100644 src/library/blas/functor/functor.cc
 create mode 100644 src/library/blas/functor/functor_fill.cc
 create mode 100644 src/library/blas/functor/functor_selector.cc
 create mode 100644 src/library/blas/functor/functor_xgemm.cc
 create mode 100644 src/library/blas/functor/functor_xscal.cc
 create mode 100644 src/library/blas/functor/functor_xscal_generic.cc
 create mode 100644 src/library/blas/functor/functor_xtrsm.cc
 create mode 100644 src/library/blas/functor/gcn_dgemm.cc
 create mode 100644 src/library/blas/functor/gcn_dgemmCommon.cc
 create mode 100644 src/library/blas/functor/gcn_dgemmSmallMatrices.cc
 create mode 100644 src/library/blas/functor/gcn_sgemm.cc
 create mode 100644 src/library/blas/functor/gcn_sgemmSmallMatrices.cc
 create mode 100644 src/library/blas/functor/gcn_zgemm.cc
 create mode 100644 src/library/blas/functor/gpu_dtrsm.cc
 create mode 100644 src/library/blas/functor/gpu_dtrsm192.cc
 create mode 100644 src/library/blas/functor/hawaii.cc
 create mode 100644 src/library/blas/functor/hawaii_dgemmChannelConflict.cc
 create mode 100644 src/library/blas/functor/hawaii_dgemmSplitKernel.cc
 create mode 100644 src/library/blas/functor/hawaii_sgemmBranchKernel.cc
 create mode 100644 src/library/blas/functor/hawaii_sgemmSplit64_32.cc
 create mode 100644 src/library/blas/functor/hawaii_sgemmSplitKernel.cc
 create mode 100644 src/library/blas/functor/include/BinaryBuild.h
 create mode 100644 src/library/blas/functor/include/atomic_counter.h
 create mode 100644 src/library/blas/functor/include/bonaire.h
 create mode 100644 src/library/blas/functor/include/functor.h
 create mode 100644 src/library/blas/functor/include/functor_fill.h
 create mode 100644 src/library/blas/functor/include/functor_hawaii_dgemm_NT_MN48.h
 create mode 100644 src/library/blas/functor/include/functor_selector.h
 create mode 100644 src/library/blas/functor/include/functor_utils.h
 create mode 100644 src/library/blas/functor/include/functor_xgemm.h
 create mode 100644 src/library/blas/functor/include/functor_xscal.h
 create mode 100644 src/library/blas/functor/include/functor_xscal_generic.h
 create mode 100644 src/library/blas/functor/include/functor_xtrsm.h
 create mode 100644 src/library/blas/functor/include/gcn_dgemm.h
 create mode 100644 src/library/blas/functor/include/gcn_dgemmCommon.h
 create mode 100644 src/library/blas/functor/include/gcn_dgemmSmallMatrices.h
 create mode 100644 src/library/blas/functor/include/gcn_sgemm.h
 create mode 100644 src/library/blas/functor/include/gcn_sgemmSmallMatrices.h
 create mode 100644 src/library/blas/functor/include/gcn_zgemm.h
 create mode 100644 src/library/blas/functor/include/gpu_dtrsm.h
 create mode 100644 src/library/blas/functor/include/gpu_dtrsm192.h
 create mode 100644 src/library/blas/functor/include/hawaii.h
 create mode 100644 src/library/blas/functor/include/hawaii_dgemmChannelConflict.h
 create mode 100644 src/library/blas/functor/include/hawaii_dgemmSplitKernel.h
 create mode 100644 src/library/blas/functor/include/hawaii_sgemmBranchKernel.h
 create mode 100644 src/library/blas/functor/include/hawaii_sgemmSplit64_32.h
 create mode 100644 src/library/blas/functor/include/hawaii_sgemmSplitKernel.h
 create mode 100644 src/library/blas/functor/include/tahiti.h
 create mode 100644 src/library/blas/functor/tahiti.cc
 create mode 100644 src/library/blas/generic/binary_lookup.cc
 create mode 100644 src/library/blas/generic/common2.cc
 create mode 100644 src/library/blas/generic/functor_cache.cc
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_NT_MN48.cl
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_gcn_SmallMatrices.cl
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawai.cl
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawaiiChannelConfilct.cl
 create mode 100644 src/library/blas/gens/clTemplates/dgemm_hawaiiSplitKernel.cl
 create mode 100644 src/library/blas/gens/clTemplates/dtrsm_gpu.cl
 create mode 100644 src/library/blas/gens/clTemplates/dtrsm_gpu192.cl
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn.cl
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_gcn_SmallMatrices.cl
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_hawaiiSplit64_32.cl
 create mode 100644 src/library/blas/gens/clTemplates/sgemm_hawaiiSplitKernel.cl
 create mode 100644 src/library/blas/gens/clTemplates/zgemm_gcn.cl
 create mode 100644 src/library/blas/matrix.c
 delete mode 100644 src/library/blas/xgemm.c
 create mode 100644 src/library/blas/xgemm.cc
 create mode 100644 src/library/blas/xscal.cc
 delete mode 100644 src/library/blas/xtrsm.c
 create mode 100644 src/library/blas/xtrsm.cc
 create mode 100644 src/library/common/md5sum.c
 create mode 100644 src/library/common/rwlock.c
 copy src/library/tools/{tplgen => bingen}/CMakeLists.txt (61%)
 create mode 100644 src/library/tools/bingen/bingen.cpp

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/clblas.git


