[libclc] 71/79: amdgcn, popcount: Workaround broken llvm.ctpop intrinsic on some GCN ASICs
Andreas Boll
aboll-guest at moszumanska.debian.org
Mon Mar 19 16:51:02 UTC 2018
This is an automated email from the git hooks/post-receive script.
aboll-guest pushed a commit to branch master
in repository libclc.
commit 034def98aad1f5314a4ec56a07e66f6a8a0cb0db
Author: Jan Vesely <jan.vesely at rutgers.edu>
Date: Thu Mar 8 18:58:07 2018 +0000
amdgcn,popcount: Workaround broken llvm.ctpop intrinsic on some GCN ASICs
This is only really needed for VI+ ASICs. However, LLVM would cast the value to
i32 for older ASICs anyway. The proper fix is in LLVM-7 (r326535).
Fixes CTS popcount on carrizo.
Reviewer: Aaron Watry <awatry at gmail.com>
Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/libclc/trunk@327044 91177308-0d34-0410-b5e6-96231b3b80d8
---
amdgcn/lib/SOURCES | 1 +
amdgcn/lib/integer/popcount.cl | 6 ++++++
amdgcn/lib/integer/popcount.inc | 17 +++++++++++++++++
3 files changed, 24 insertions(+)
diff --git a/amdgcn/lib/SOURCES b/amdgcn/lib/SOURCES
index 8e14ce2..6a5ce00 100644
--- a/amdgcn/lib/SOURCES
+++ b/amdgcn/lib/SOURCES
@@ -1,4 +1,5 @@
cl_khr_int64_extended_atomics/minmax_helpers.ll
+integer/popcount.cl
math/ldexp.cl
mem_fence/fence.cl
synchronization/barrier.cl
diff --git a/amdgcn/lib/integer/popcount.cl b/amdgcn/lib/integer/popcount.cl
new file mode 100644
index 0000000..ebd167d
--- /dev/null
+++ b/amdgcn/lib/integer/popcount.cl
@@ -0,0 +1,6 @@
+#include <clc/clc.h>
+#include <utils.h>
+#include <integer/popcount.h>
+
+#define __CLC_BODY "popcount.inc"
+#include <clc/integer/gentype.inc>
diff --git a/amdgcn/lib/integer/popcount.inc b/amdgcn/lib/integer/popcount.inc
new file mode 100644
index 0000000..402ddb7
--- /dev/null
+++ b/amdgcn/lib/integer/popcount.inc
@@ -0,0 +1,17 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE popcount(__CLC_GENTYPE x) {
+/* LLVM-4+ implements i16 ops for VI+ ASICs. However, ctpop implementation
+ * is missing until r326535. Therefore we have to convert sub i32 types to uint
+ * as a workaround. */
+#if __clang_major__ < 7 && __clang_major__ > 3 && __CLC_GENSIZE < 32
+ /* Prevent sign extension on uint conversion */
+ const __CLC_U_GENTYPE y = __CLC_XCONCAT(as_, __CLC_U_GENTYPE)(x);
+ /* Convert to uintX */
+ const __CLC_XCONCAT(uint, __CLC_VECSIZE) z = __CLC_XCONCAT(convert_uint, __CLC_VECSIZE)(y);
+ /* Call popcount on uintX type */
+ const __CLC_XCONCAT(uint, __CLC_VECSIZE) res = __clc_native_popcount(z);
+ /* Convert the result back to gentype. */
+ return __CLC_XCONCAT(convert_, __CLC_GENTYPE)(res);
+#else
+ return __clc_native_popcount(x);
+#endif
+}
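
The widening steps in popcount.inc can be illustrated outside of libclc. The
following is a minimal sketch in plain C, not the libclc code itself: it assumes
a signed 8-bit input, and the names popcount32, popcount_char_widened,
popcount_char_sign_extended and check_popcount_paths are hypothetical stand-ins
(popcount32 plays the role of __clc_native_popcount on a 32-bit value). It shows
why the patch reinterprets the value as unsigned before converting to uint:
widening the signed value directly would sign-extend it and count the extra
high bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a native 32-bit popcount. */
static uint32_t popcount32(uint32_t v) {
    uint32_t count = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Mirrors the workaround: reinterpret as unsigned, widen, count, narrow. */
static int8_t popcount_char_widened(int8_t x) {
    uint8_t y = (uint8_t)x;        /* same bit pattern, now unsigned */
    uint32_t z = y;                /* zero-extends to 32 bits */
    uint32_t res = popcount32(z);  /* count on the 32-bit value */
    return (int8_t)res;            /* result is at most 8, fits in int8_t */
}

/* What happens without the unsigned step: the sign bit is replicated. */
static uint32_t popcount_char_sign_extended(int8_t x) {
    return popcount32((uint32_t)(int32_t)x);
}

/* Self-check for x = -1 (bit pattern 0xFF). */
static void check_popcount_paths(void) {
    assert(popcount_char_widened((int8_t)-1) == 8);
    assert(popcount_char_sign_extended((int8_t)-1) == 32);
}
```

Calling check_popcount_paths() passes: the zero-extended path reports 8 set bits
for 0xFF, while the sign-extended path reports 32, which is the wrong answer the
"Prevent sign extension" step in the patch avoids.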
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/pkg-opencl/libclc.git